commet - link validator program.
These notes relate to Version 2.3 (20 April 1998).
commet URL [ URL... ] [ options ]
or
commet host [ -database db_name ] [ -update ] [ options ]
where options is: [ -checklinks ] [ -volume num ] [ -baddies filename ] [ -goodies filename ]
Commet is a link validator designed to work either as a standalone application (taking a list of URLs to check) or as part of the Feathers(tm) URL database application. The first argument must be either a URL to check (standalone mode) or the name of a remote host on which the Feathers mSQL server is running (Feathers mode).

Commet checks the validity of a link by issuing an HTTP HEAD request for its URL. Commet will optionally also download the body of the document, extract all of its links, and validate those as well (to one level only). It is designed to be a quick, lightweight version of Gromit.
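The HEAD check and the optional one-level link extraction can be sketched in Python. This is not Commet's own code (Commet is a Perl program): the names head_status, extract_links and check_page are hypothetical, fetch_body stands for whatever routine downloads a page body, and only http/https links found in <a href="..."> attributes are considered. Redirects are deliberately not followed, so a 302 is reported as a 302.

```python
from html.parser import HTMLParser
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit


def head_status(url, timeout=30):
    """Send an HTTP HEAD request for url and return the status code.

    Redirects are not followed, so a 302 is reported as 302.
    """
    parts = urlsplit(url)
    conn_class = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


def check_page(url, fetch_body, check=head_status):
    """Check url and, if it is valid, every http/https link in its body.

    Links are followed to one level only: linked pages are checked
    with HEAD, but their own links are not extracted.
    """
    results = {url: check(url)}
    if 200 <= results[url] <= 299:
        for link in extract_links(url, fetch_body(url)):
            if urlsplit(link).scheme in ("http", "https") and link not in results:
                results[link] = check(link)
    return results
```

Passing check and fetch_body as parameters keeps the network access separate from the traversal logic, so the one-level rule can be exercised without contacting any server.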
If the first argument is a host name, it is assumed to be the name of a host where an mSQL daemon is running; this daemon is usually a Feathers(tm) database server. Specify the database name with the -database argument (below). If the mSQL server runs on the local machine rather than on a remote one, use localhost as the host name.
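These notes do not say how Commet tells a URL apart from a host name. One plausible rule, shown here as a hypothetical Python sketch, is to treat the first argument as a URL when it carries a scheme such as http: and as an mSQL host name otherwise. The scheme list and the function name run_mode are assumptions, not Commet's actual behaviour.

```python
from urllib.parse import urlsplit

# Assumed set of schemes that mark the first argument as a URL.
URL_SCHEMES = ("http", "https", "ftp")


def run_mode(first_arg):
    """Return "standalone" for a URL argument, "feathers" for a host name."""
    # urlsplit lower-cases the scheme, so "HTTP://..." also matches.
    if urlsplit(first_arg).scheme in URL_SCHEMES:
        return "standalone"
    # Anything else (e.g. "localhost" or "db.example.org") is taken as
    # the host where the mSQL daemon runs.
    return "feathers"
```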
A URL is considered valid by Commet if the server returns a status code in the range 200 to 299 when a HEAD request is made for it. Any other status code marks the URL as invalid.

The status code and URL of each bad link are normally stored in the file commet.bad (unless overridden with the -baddies argument) and displayed on standard output (unless -volume has been set below 2). The following codes should not be taken to indicate a permanent error: links failing with one of these codes should be checked again later, or checked manually using a browser that meets the appropriate authentication requirements:
- 3XX: 300 'Multiple Choices', 302 'Moved Temporarily', 303 'See Other', 304 'Not Modified', 305 'Use Proxy'
- 4XX: 402 'Payment Required', 407 'Proxy Authentication Required', 408 'Request Timeout', 409 'Conflict', 415 'Unsupported Media Type'
- 5XX: 500 'Internal Server Error', 502 'Bad Gateway', 503 'Service Unavailable', 504 'Gateway Timeout', 505 'HTTP Version Not Supported'
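The validity rule and the list of codes worth re-checking can be expressed as a small Python function. This is a sketch, not Commet's code: Commet itself reports every non-2XX response as a bad link, and the three labels returned here (in particular the separate "recheck" category) are illustrative assumptions.

```python
# Status codes that, per the list above, may indicate a temporary problem.
RECHECK_CODES = frozenset({
    300, 302, 303, 304, 305,
    402, 407, 408, 409, 415,
    500, 502, 503, 504, 505,
})


def classify(status):
    """Label an HTTP status code as "valid", "recheck" or "invalid"."""
    if 200 <= status <= 299:
        return "valid"
    if status in RECHECK_CODES:
        return "recheck"
    return "invalid"
```

Under this scheme a 404 'Not Found' or a 301 'Moved Permanently' is reported as invalid, while a 503 is flagged for a later retry.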
If Commet has to check two documents in a row that come from the same server, it waits 10 seconds between the requests so as not to flood the remote server with HEAD requests.

Although designed for Feathers, Commet could easily be adapted to work with any database backend that has a Perl DBD driver (including Oracle, Sybase and others). Because Commet uses symbolic field names and SQL statements, only minimal changes would be required to adapt the code for use with other URL database systems.
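A minimal Python sketch of the same-server pause, assuming that "same server" means the same host name (ports are ignored) and that only consecutive requests matter. The class name PoliteDelay is hypothetical, and the sleep function is injectable so the behaviour can be checked without actually waiting.

```python
import time
from urllib.parse import urlsplit


class PoliteDelay:
    """Pause before a request if the previous request went to the same host."""

    def __init__(self, delay=10.0, sleep=time.sleep):
        self.delay = delay
        self.sleep = sleep
        self.last_host = None

    def before_request(self, url):
        host = urlsplit(url).hostname  # lower-cased host name, or None
        if host is not None and host == self.last_host:
            self.sleep(self.delay)
        self.last_host = host
```

Because the pause applies only to consecutive requests, a list that alternates between servers incurs no waiting at all.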
Commet returns 0 if no errors were encountered (all links were OK), 2 if there were syntax errors on the command line, and 1 otherwise.
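Scripts that run Commet (from cron, for example) can branch on these exit codes. The following is a hypothetical Python wrapper, assuming the commet program is on the PATH; the function names describe_exit and run_commet are illustrative only.

```python
import subprocess

# Meanings of Commet's exit codes, as documented above.
EXIT_MEANINGS = {
    0: "no errors: all links were OK",
    1: "errors were encountered",
    2: "syntax error on the command line",
}


def describe_exit(code):
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def run_commet(*args):
    """Run commet with the given arguments and return (code, description)."""
    code = subprocess.run(["commet", *args]).returncode
    return code, describe_exit(code)
```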
gromit(1); wallace(1).
Commet could be greatly sped up if modified to run under wallace(1), which would allow Commet to be "threaded". Checking multiple entries at once, especially for a Feathers(tm) database, would cut down the time required to validate all links.

It appears that some servers return error codes for HEAD requests that they do not return for a proper GET request (check mcni.net).
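The idea of checking several entries at once can be sketched with Python's concurrent.futures thread pool. This is a swapped-in technique, not the wallace(1) approach suggested above. URLs are grouped by host name so that each server is still checked sequentially with the 10-second pause, while different servers are checked in parallel. The function name, the worker count and the grouping by host name are assumptions; check is any single-URL checker, such as one that issues a HEAD request.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def check_concurrently(urls, check, delay=10.0, max_workers=8, sleep=time.sleep):
    """Check urls in parallel across hosts, sequentially within each host."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).hostname].append(url)

    def check_host(host_urls):
        results = {}
        for i, url in enumerate(host_urls):
            if i > 0:
                # Same server as the previous request: pause first.
                sleep(delay)
            results[url] = check(url)
        return results

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each task handles one host's queue of URLs.
        for host_results in pool.map(check_host, by_host.values()):
            results.update(host_results)
    return results
```

Total run time is then bounded roughly by the busiest single server rather than by the whole list, which is where most of the saving for a large database would come from.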
On OS/2, Win95 or WinNT, "file not found" errors are generated if "/dev/null" is used as an argument to -baddies or -goodies. Note that /dev/null is used for -goodies by default. Use NUL instead, or edit the defaults.
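Rather than hard-coding /dev/null, a program can ask the platform for its null device. In Python this is os.devnull ("/dev/null" on POSIX systems, "nul" on Windows); Perl's File::Spec module provides a comparable devnull method. The sketch below uses the hypothetical names DEFAULT_GOODIES and open_goodies to show the idea.

```python
import os

# Platform-appropriate null device: "/dev/null" on POSIX, "nul" on Windows.
DEFAULT_GOODIES = os.devnull


def open_goodies(path=DEFAULT_GOODIES):
    """Open the -goodies file for appending; defaults to the null device."""
    return open(path, "a")
```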
Daniel Austin (dan at austlii.edu.au)
Australasian Legal Information Institute (AustLII)