AustLII Web Toolbox Manual
NAME
gromit - recursively download HTML pages.
wallaced - control multiple gromit processes.
commet - validate URLs read from an mSQL(tm) database.
VERSION
These notes relate to Version 2.3 (20 April 1998).
SYNOPSIS
See gromit(1), wallaced(1), and commet(1).
GENERAL DESCRIPTION
The AustLII Web Toolbox consists of two web "robots" (also called
spiders, or sometimes "bots"), Gromit and Commet, and a harness
program, Wallace. A number of small demonstration or utility
scripts are also provided in the distribution. Almost all of the
programs are written in Perl 5 and rely extensively on the LWP
library.
GROMIT
Gromit is the main workhorse. It recursively downloads URLs,
starting from a specific page and working down. It obeys the
Robots Exclusion Protocol (robots.txt) and waits for a set delay
between downloads so that it does not flood remote servers. See
the help file gromit(1).
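The behaviour described above can be illustrated with a minimal
sketch. It is written in Python for brevity and is not Gromit's
actual Perl code. The names crawl, fetch, robots_lines and
example-bot are hypothetical, and reading "working down" as
"stay under the starting page's directory" is an assumption.

```python
import time
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, robots_lines, delay=1.0,
          agent="example-bot", sleep=time.sleep):
    """Download start_url and the pages below it (hypothetical helper).

    fetch(url) is supplied by the caller and returns HTML text, or
    None on failure. robots_lines holds the lines of the site's
    robots.txt. Assumes start_url contains a path component.
    """
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_lines)

    # Assumption: "working down" means staying under the start directory.
    prefix = start_url.rsplit("/", 1)[0] + "/"
    queue, seen, pages = [start_url], {start_url}, {}
    attempts = 0

    while queue:
        url = queue.pop(0)
        if not rules.can_fetch(agent, url):
            continue  # excluded by robots.txt
        if attempts:
            sleep(delay)  # fixed pause between downloads
        attempts += 1
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href))[0]
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing fetch and sleep in as arguments keeps the sketch free of
network access, so the traversal logic can be tried against a
dictionary of canned pages.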
COMMET
Commet checks the URLs held in an mSQL database for validity.
Alternatively, it can read its list of URLs from an ASCII file
(one URL per line). It will typically print bad URLs to STDOUT;
if -update is specified, it will also update the mSQL database
with appropriate status fields. Help is provided in commet(1).
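The file-based mode can be sketched as follows. This is an
illustration in Python, not Commet's own code. The function
names read_urls, http_status and bad_urls are hypothetical, the
mSQL update step is left out, and treating any answer outside
the 2xx range (or no answer at all) as "bad" is an assumption.

```python
import urllib.error
import urllib.request


def read_urls(lines):
    """Yield URLs from an iterable of lines, one per line; skip blanks."""
    for line in lines:
        url = line.strip()
        if url:
            yield url


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # unreachable host, timeout, or malformed URL


def bad_urls(lines, status_of=http_status):
    """Return (url, status) pairs for URLs judged bad (assumed: not 2xx)."""
    bad = []
    for url in read_urls(lines):
        status = status_of(url)
        if status is None or not 200 <= status < 300:
            bad.append((url, status))
    return bad
```

Calling bad_urls on an open file and printing each returned pair
gives the "bad URLs to STDOUT" behaviour. The status_of argument
allows the network check to be replaced by a stub.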
WALLACE
Wallace is the human interface for Gromit, and can control
multiple gromit processes. It also interacts with an mSQL
database to determine which URLs are to be downloaded with
Gromit, and which options should be passed to Gromit for each
download. Only one gromit process is used for each site. You
can configure how many processes run in total. See wallaced(1).
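The two scheduling rules above (one process per site, and a
configurable total) can be sketched as a small selection
function. This is an illustrative Python sketch, not Wallace's
implementation. The name pick_jobs and its parameters are
hypothetical, and identifying a "site" by the host part of the
URL is an assumption.

```python
from urllib.parse import urlsplit


def pick_jobs(pending, running_sites, max_processes):
    """Choose which pending URLs may be started now (hypothetical helper).

    pending: URLs waiting for download, in priority order.
    running_sites: host names that already have a process.
    Returns (to_start, still_pending).
    """
    # One process per site, so len(busy) is also the process count.
    busy = set(running_sites)
    to_start, still_pending = [], []
    for url in pending:
        site = urlsplit(url).netloc.lower()  # assumption: site == host
        if len(busy) < max_processes and site not in busy:
            busy.add(site)
            to_start.append(url)
        else:
            still_pending.append(url)
    return to_start, still_pending
```

A controlling loop would call such a function whenever a process
exits, start one downloader per returned URL, and keep the rest
queued.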
AUTHOR
The AustLII Web Toolbox is written and maintained by Daniel
Austin (Australasian Legal Information Institute).
Gromit Web Toolbox / http://avoca.austlii.edu.au/~dan/gromit/