NAME

webwatcher - A tool for keeping track of web content


SYNOPSIS

webwatcher [-uhdm] [-c configfile]


DESCRIPTION

Web sites whose content is managed by many different people get stale quickly, because people tend to forget the documents for which they are responsible. Webwatcher is designed to help with that problem. It keeps track of owners and expiration times for a configurable collection of files in a configurable collection of directories. Each directory can be configured for its own behavior, including the filename extensions investigated and the number of days a content owner will be notified that a document needs attention before a configurable ``boss'' is notified that they've been slacking.

Output is mailed to the document owners on a per-document basis, or a full report on all the files can be generated.

Normal usage is to call webwatcher from cron on a daily basis with the -m setting.

During operation webwatcher parses its configuration file and determines which directories it should look in and which files (based on extension, such as .html or .htm) it should examine. A file chosen for examination is opened and the program searches its contents for the following tags:

<meta name=``expires'' content=``timeformat''>

<meta name=``owner'' content=``owner''>

and attempts to pull out the HTML <title>. If the program cannot find both the expires and owner meta tags within the first max_lines lines of the file, it moves on to the next file. Webwatcher descends through the directories recursively.

If it does find the tags, owner and timeformat are collected and some comparisons are made.
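
The tag search can be pictured with a small Python sketch. This is an illustration only, not webwatcher's own code: the function name scan_head and the regular expressions are hypothetical, and the real program's matching rules may well differ.

```python
import re

# Hypothetical patterns: accept quoted or unquoted attribute values.
META_RE = re.compile(
    r"""<meta\s+name\s*=\s*["']?(expires|owner)["']?\s+content\s*=\s*["']?([^"'>]+)["']?\s*/?>""",
    re.IGNORECASE,
)
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE)


def scan_head(lines, max_lines):
    """Return (owner, timeformat, title), or None if either meta tag is
    missing from the first max_lines lines."""
    found = {}
    title = None
    for line in lines[:max_lines]:
        for name, content in META_RE.findall(line):
            found[name.lower()] = content.strip()
        match = TITLE_RE.search(line)
        if match and title is None:
            title = match.group(1).strip()
    if "owner" in found and "expires" in found:
        return found["owner"], found["expires"], title
    return None
```

A file whose tags sit below the max_lines cutoff is treated the same as a file with no tags at all, which is why max_lines may need raising for content with a long preamble.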

timeformat is either a relative number of days, '+<number of days>', or an absolute timestamp, 'YYYYMMDD'. Webwatcher checks the modification time of the file and compares that against the expires time. If the file is expired and the -m flag is set, the owner is mailed a notification. If owner does not contain an '@', the mail_domain configuration setting is appended to the owner and the mail is sent to the resulting address.

If -m is not set, the information is output to STDOUT.
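
As a rough illustration of the two timeformat forms, here is a Python sketch. It is one plausible reading of the description, not the program's actual logic: it assumes a relative value counts days from the file's modification time and that a file counts as expired on or after its expiry date. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def expiry_date(timeformat, mtime):
    """Turn a timeformat string into an expiry date.

    mtime is the file's modification date (a datetime.date).
    """
    if timeformat.startswith("+"):
        # Relative form: assumed to count days from the modification time.
        return mtime + timedelta(days=int(timeformat[1:]))
    # Absolute form: YYYYMMDD.
    return datetime.strptime(timeformat, "%Y%m%d").date()


def is_expired(timeformat, mtime, today):
    # Assumption: the expiry day itself already counts as expired.
    return today >= expiry_date(timeformat, mtime)
```

Under these assumptions, touching a file with a relative timeformat pushes its expiry forward, while a file with an absolute timeformat stays expired until the tag itself is edited.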


OPTIONS

-d

Turn on debugging. This shows some of the program's progress and indicates which files it is checking.

-u

Show brief usage information.

-h

Show this help information.

-m

Have reporting go out through the mail on a per file basis to the owner of each file.

-c configfile

Specify a config file different from the default. This can be used, for example, to allow individual users to manage their own content with their own webwatcher configuration files.


CONFIGURATION

The webwatcher.conf file sets the configuration for webwatcher. Configuration is centered around determining what directories are to be searched for files and what actions are to be taken in each of those directories.

The configuration file is line based, with a directory per line and one 'Default' line. Blank lines and lines that start with # are ignored. The Default line sets defaults which may be overridden on a per directory basis. There are some examples in the distributed configuration file.

A Default line is not required, but without it you must set all the configuration options for each directory.

The format of the file is, on one single line per directory, as follows:

directory:extensions:expire_time:boss:mail_domain:max_lines:warn_time

directory

A directory in which webwatcher should look for files. Any directories listed will be recursively searched, so there's no need to list a directory within one you have already listed.

extensions

A space separated list of extensions to files that should be searched. For example: ``html htm php phtml''.

expire_time

A number of days. Once a content owner has been notified about a file for this many days without acting on it, the boss is mailed as well as the content owner.

boss

Either a fully qualified mail address, or a username. When expire_time has passed for a particular file this person will be mailed as well. If there is no '@' in boss, an '@' and mail_domain will be appended.

mail_domain

A real domain that can receive mail. When webwatcher attempts to send out a mail message, if the recipient has no domain, this is the domain that will be added to qualify the address. This means that the owner meta tag may contain just a username.
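
The address rule shared by owner and boss is simple enough to show in a few lines of Python. The function name qualify is hypothetical, and example.com stands in for a real mail_domain.

```python
def qualify(recipient, mail_domain):
    """Append '@' and mail_domain to a recipient that has no domain."""
    if "@" in recipient:
        return recipient
    return recipient + "@" + mail_domain
```

With mail_domain set to example.com, an owner tag of just jane would be mailed as jane@example.com, while jane@example.org would be left untouched.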

max_lines

An integer. Defines how many lines into a file webwatcher should search for the meta tags. This is configurable on a per directory basis in case there is content which for some reason has a lot of stuff at the top of the file. It exists to prevent entire files being read in the event that there are very large HTML files.

warn_time

An integer, representing days. If you would like to receive a mail message some number of days before a file expires, instead of after it expires, set this item. Depending on your requirements, this may not be very useful to you.

follow_sym

A boolean (1 or 0) that states whether or not symlinks should be followed. If you set this to zero you are making the assumption that a file that is symlinked is being webwatched elsewhere at the real location of the file.
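
To show how the Default line and the per directory lines might fit together, here is a Python sketch of a parser for the seven-field format given above. It rests on assumptions that the documentation does not confirm: that an empty field means "use the Default value", and that the Default line may appear anywhere in the file. follow_sym is left out because its position on the line is not given in the format. The names parse_config and FIELDS are hypothetical.

```python
# Field order after the directory name, taken from the format line above.
FIELDS = ["extensions", "expire_time", "boss", "mail_domain", "max_lines", "warn_time"]


def parse_config(text):
    """Return {directory: settings} with Default values filled in."""
    defaults = {}
    dirs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        parts = line.split(":")
        name, values = parts[0], parts[1:]
        # Pad short lines so missing trailing fields count as empty.
        values += [""] * (len(FIELDS) - len(values))
        # Assumption: an empty field means "not set on this line".
        settings = {k: v for k, v in zip(FIELDS, values) if v != ""}
        if name == "Default":
            defaults = settings
        else:
            dirs[name] = settings
    # Merge after reading everything, so Default may come last.
    return {d: {**defaults, **s} for d, s in dirs.items()}
```

Values are kept as strings here. A fuller version would split extensions on spaces and convert the day and line counts to integers.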


GENERAL OPERATION

To make webwatcher work for you, define appropriate Default settings in the configuration file and list whatever directories you would like to have webwatched. In those directories, edit the files to include the meta tags described above in DESCRIPTION.

If you get stuck, contact the author and he'll help out and fix the docs as well.


BUGS

At this time there is no facility for excluding a directory from being searched.


FILES

webwatcher.conf

The configuration file. By default it is found in /etc/webwatcher.


COPYRIGHT

The program and associated documentation is Copyright 2000 Chris Dent <cdent@kiva.net> all rights reserved. This program is free software; you can redistribute it and/or modify it under the terms of either the GNU Public License or the Perl Artistic License. The author would appreciate notification if you find bugs, make improvements or otherwise have comments.