GROMIT

Commet Links Validator Manual

NAME

commet - links validator program.

VERSION

These notes relate to Version 2.3 (20 April 1998).

SYNOPSIS

commet URL [ URL... ] [ options ]
or
commet host [ -database db_name ] [ -update ] [ options ]

where options are: [ -checklinks ] [ -volume num ] [ -baddies filename ] [ -goodies filename ]

DESCRIPTION

Commet is a links validator that is designed to work either as a standalone application (taking a list of URLs to check) or as part of the Feathers(tm) URL database application. The first argument must be either a URL to check (standalone) or the name of a remote host on which the Feathers mSQL server is running (Feathers mode).

Commet checks the validity of a link by issuing an HTTP HEAD request for the URL. Optionally, Commet will also download the body of the document, extract all of its links, and validate those as well (to one level only). It is designed to be a quick, lightweight version of Gromit.

OPTIONS

URL or host name
The first argument must be either a URL or a host name. If it is a URL, that URL (and any subsequent URLs on the command line) is checked for validity. Invalid URLs are printed on STDOUT; valid URLs are not displayed at all (but see -volume and -goodies below).

If the first argument is a host name, it is assumed to be the name of a host on which an mSQL daemon is running, usually a Feathers(tm) database server. Specify the database name with the -database argument (below). If the mSQL server runs on the local machine rather than on a remote one, use localhost as the host name.
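The manual does not say how Commet tells a URL apart from a host name. The following is a minimal Python sketch assuming a simple heuristic (an argument with an http or https scheme and a host part is a URL; anything else is taken to be an mSQL host name). The function names and mode labels are hypothetical.

```python
from urllib.parse import urlsplit


def is_url_argument(arg: str) -> bool:
    # Assumed heuristic, not Commet's documented rule:
    # an http/https scheme plus a network location means "URL".
    parts = urlsplit(arg)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def select_mode(args: list) -> str:
    # Standalone mode when the first argument looks like a URL,
    # Feathers (database) mode otherwise.
    if not args:
        raise ValueError("a URL or host name is required")
    return "standalone" if is_url_argument(args[0]) else "feathers"
```

For example, `select_mode(["localhost"])` selects Feathers mode, while a first argument such as `http://www.austlii.edu.au/` selects standalone mode.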

-database db_name
The name of the Feathers(tm) database on the remote system. This will be the database name you created in mSQL on the remote system. The default database name is linker.

-checklinks
If present, the body of each document is downloaded and all of its links are extracted. The extracted links are checked for validity as if they had been entered on the command line. Only one level of extraction is performed, lest Commet try to validate the entire World Wide Web. By default, -checklinks is off.
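A minimal Python sketch of the extraction step, using the standard html.parser module. It assumes that only the href attributes of <a> tags count as links, that only http and https links are worth a HEAD check, and that duplicates are dropped; these choices, and the names LinkExtractor and extract_links, are illustrative rather than taken from Commet. The one-level limit falls out naturally: the returned links are only HEAD-checked, never parsed again.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect absolute http/https URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # HTMLParser lower-cases tag and attribute names
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Resolve relative links against the page URL and drop any #fragment.
            absolute = urldefrag(urljoin(self.base_url, value)).url
            if urlsplit(absolute).scheme in ("http", "https") and absolute not in self.links:
                self.links.append(absolute)


def extract_links(base_url: str, html: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```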

-update
If present, the corresponding record for each URL checked is updated in the Feathers(tm) database. The updates are:

  1. The record_status field is set to reflect the current status (2=active; 3=unable to contact), that is, whether or not the URL was successfully located.

  2. The date_contact field is set to the current date and time (in UNIX system time format).

  3. The date_modified field is set to the date and time that the server reported as this file's last modification. If the server does not return a Last-Modified header, it is set to 0.

You must have write access to the database. By default, updates are off.
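The three field updates can be sketched with Python's DB-API, using sqlite3 as a stand-in for mSQL (Commet itself talks to the database through a Perl DBD driver, as noted under NOTES). This is a minimal sketch: the table name urls and the key column url are hypothetical, while the field names and status values come from the list above. HTTP dates without a zone are assumed to be UTC.

```python
import sqlite3
import time
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# record_status values documented for -update.
STATUS_ACTIVE = 2
STATUS_UNREACHABLE = 3


def last_modified_to_unix(header: Optional[str]) -> int:
    """Convert a Last-Modified header to UNIX time; 0 if missing or unparsable."""
    if not header:
        return 0
    try:
        stamp = parsedate_to_datetime(header)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return 0
    if stamp.tzinfo is None:  # assume UTC for zone-less dates
        stamp = stamp.replace(tzinfo=timezone.utc)
    return int(stamp.timestamp())


def update_record(conn: sqlite3.Connection, url: str, reachable: bool,
                  last_modified: Optional[str], now: Optional[int] = None) -> None:
    """Apply the three -update changes to the (hypothetical) urls table."""
    date_contact = int(time.time()) if now is None else now
    conn.execute(
        "UPDATE urls SET record_status = ?, date_contact = ?, date_modified = ? "
        "WHERE url = ?",
        (STATUS_ACTIVE if reachable else STATUS_UNREACHABLE,
         date_contact, last_modified_to_unix(last_modified), url),
    )
    conn.commit()
```

Because the SQL uses plain field names and placeholders, swapping the connection for another DB-API driver should need few changes, although the placeholder style can differ between drivers.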

-volume 0 to 4
Sets the verbosity of Commet. The default is 2. The levels are:
-baddies filename
-goodies filename
These options record Commet's work. By default, a copy of each bad link found is stored in commet.bad in the current directory; you can redirect this by giving a different file name after the -baddies switch. To turn recording off, specify /dev/null as the file name. Good links are not normally recorded, but you can save the list of valid links by giving a file name after the -goodies argument.

URL VALIDATION

A URL is considered valid by Commet if the server returns a status code between 200 and 299 (inclusive) in response to a HEAD request for the file. Otherwise, the URL is taken to be invalid.
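A minimal Python sketch of this rule (not Commet's own code), sending the HEAD request with the standard http.client module and applying the 200 to 299 test. Reporting an unreachable host or malformed URL as status 0 is an assumption made for the sketch. Unlike urllib's urlopen, http.client does not follow redirects, so a 3xx reply is returned as-is and counts as invalid, which is what the rule above implies.

```python
import http.client
from urllib.parse import urlsplit


def is_valid_status(status: int) -> bool:
    # Valid means a 2xx status code.
    return 200 <= status <= 299


def head_status(url: str, timeout: float = 30.0) -> int:
    """Send an HTTP HEAD request and return the status; 0 (assumed sentinel) on failure."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return 0
    connection_class = (http.client.HTTPSConnection if parts.scheme == "https"
                        else http.client.HTTPConnection)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    connection = None
    try:
        connection = connection_class(parts.hostname, parts.port, timeout=timeout)
        connection.request("HEAD", path)
        return connection.getresponse().status
    except (OSError, ValueError, http.client.HTTPException):
        return 0
    finally:
        if connection is not None:
            connection.close()


def is_valid(url: str) -> bool:
    return is_valid_status(head_status(url))
```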

The status code and URL of each bad link are stored in the file commet.bad (unless overridden with the -baddies argument) and displayed on standard output (unless -volume has been set below 2). The following codes should not be taken as indicating a permanent error; links failing with these codes should be checked again later, or checked manually using a browser with the appropriate authentication credentials:

NOTES

If Commet has to check two documents in a row from the same server, it waits 10 seconds between the requests so as not to flood the remote server with HEAD requests.
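A minimal Python sketch of the per-server pause, assuming that "the same server" means the same host name and that the pause is a fixed sleep before the second of two consecutive requests. The class name HostThrottle is hypothetical; the sleep function is injectable so the behaviour can be checked without actually waiting.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Sleep `delay` seconds before a request whose host matches the previous one."""

    def __init__(self, delay: float = 10.0, sleep=time.sleep) -> None:
        self.delay = delay
        self.sleep = sleep
        self.last_host = None

    def before_request(self, url: str) -> None:
        host = urlsplit(url).hostname
        # Only back-to-back requests to the same host are delayed.
        if host is not None and host == self.last_host:
            self.sleep(self.delay)
        self.last_host = host
```

Calling `before_request(url)` just before each HEAD request is enough; interleaving URLs from different hosts avoids the pause entirely.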

Although it was designed for Feathers, Commet could easily be adapted to any database backend that has a Perl DBD driver (including Oracle, Sybase and others). Because Commet uses symbolic field names and SQL statements, only minimal changes would be needed to adapt the code to other URL database systems.

FILES

commet.bad
List of "bad" URLs, each preceded by the server error code that caused it to be declared "bad". Created by default in the current working directory when Commet is invoked; the location can be overridden with the -baddies argument (see above).
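The manual states only that each entry holds the server error code followed by the URL. A minimal Python sketch of writing and reading such records, assuming one record per line with a single space as the separator (the exact layout of commet.bad is an assumption). Using os.devnull rather than a literal /dev/null resolves to the right null device on each platform, which avoids the OS/2 and Windows problem described under BUGS.

```python
import os

DEFAULT_BADDIES = "commet.bad"
DEFAULT_GOODIES = os.devnull  # portable equivalent of /dev/null (NUL on Windows)


def format_record(status: int, url: str) -> str:
    # Assumed layout: "<status> <url>".
    return f"{status} {url}"


def parse_record(line: str) -> tuple:
    status, url = line.rstrip("\n").split(" ", 1)
    return int(status), url


def append_record(path: str, status: int, url: str) -> None:
    # Append mode keeps records from earlier runs.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_record(status, url) + "\n")
```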

EXIT STATUS

Commet returns 0 if no errors were encountered (all links were OK) and 2 if there were syntax errors on the command line. Otherwise, it returns 1.
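A minimal Python sketch of this exit-status convention for standalone mode only, assuming Python's argparse, which already prints a usage message and exits with status 2 on a command-line syntax error. The option names and defaults mirror SYNOPSIS and OPTIONS; the check argument is a hypothetical stand-in for any function that returns True for a valid URL.

```python
import argparse
import os


def parse_args(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="commet")
    parser.add_argument("targets", nargs="+", help="URLs to check, or an mSQL host name")
    parser.add_argument("-database", default="linker")
    parser.add_argument("-update", action="store_true")
    parser.add_argument("-checklinks", action="store_true")
    parser.add_argument("-volume", type=int, choices=range(5), default=2)
    parser.add_argument("-baddies", default="commet.bad")
    parser.add_argument("-goodies", default=os.devnull)
    # On a syntax error argparse prints usage and raises SystemExit(2).
    return parser.parse_args(argv)


def main(argv: list, check) -> int:
    """Return 0 if every URL passes `check`, 1 otherwise (standalone mode only)."""
    args = parse_args(argv)
    failures = [url for url in args.targets if not check(url)]
    return 1 if failures else 0
```

A script would typically finish with `raise SystemExit(main(sys.argv[1:], check))`, so the shell sees 0, 1 or 2.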

SEE ALSO

gromit(1); wallace(1).

BUGS

Commet could be greatly sped up if modified to run under wallace(1), which would allow it to be "threaded". Checking multiple entries at once, especially in the Feathers(tm) database, would cut down the time required to validate all links.

It appears that some servers return error codes on HEAD requests that they do not return on a proper GET (check mcni.net).

On OS/2, Win95 or WinNT, "file not found" errors are generated if "/dev/null" is used as an argument to -baddies or -goodies. Note that /dev/null is used for -goodies by default. Use NUL instead, or edit the defaults.

AUTHOR

Daniel Austin (dan at austlii.edu.au)
Australasian Legal Information Institute (AustLII)

Gromit Web Toolbox / http://avoca.austlii.edu.au/~dan/gromit/