2. The problems of finding law on the internet
Despite the abundance of valuable legal materials already on the web, and the
rapidity with which they are expanding, these materials are often very
difficult to find, since they are scattered across thousands of web sites
located all around the world.
There are essentially only two types of tools which help users find legal
materials on the internet, `intellectual' indexes and `robot' or automated
indexes:
* `Intellectual' indexes, where individual web sites are classified
by hand according to various classificatory schemes. Usually, such indices only
provide the title, URL and perhaps a brief
description of each site indexed. Yahoo! (http://www.yahoo.com/)
is a well known example of a general intellectual index of the web (ie one
which is not law-specific).
* `Robot' / automated indexes, where a program (variously called a
`web robot' or `web spider') traverses the web, downloading every page it
encounters, so that every word on every page can be indexed by a remotely
located search engine. When the search engine displays a URL as a result of a
search, that URL is to the original site, not to a mirror on the remote site.
Alta Vista is perhaps the best known general example of such an
`internet-wide' search engine that searches an index created by a web spider.
The principal advantage of this approach is that it is possible to search every
word that has been indexed, not just the titles and brief summaries of what is
on each site.
Despite the existence of these research aids, finding legal information on the
internet is difficult. The problems of finding legal
materials world-wide are that it is both difficult to find which useful sites
exist for a particular country or subject, and also difficult to find what is
on such sites as are known. These research problems are very substantial even
for the most expert `internet savvy' lawyers and law librarians. They are much
worse for inexperienced users. There are at least the following reasons for
these difficulties:
- Intellectual indexes are hard to maintain. As the quantity of legal
material on the internet grows, the sites that contain significant legal
information grow so numerous, and some of the sites are so large, that it is
difficult to maintain intellectual indexes, at least with any depth of indexing
of each site. The best that can be hoped for is that sites with significant
legal materials are identified in the index, even though there is no detailed
description of their content. For example, it soon becomes impossible to
include in an intellectual index the content of each piece of legislation, each
case, or each journal article included on a large site.
- Good intellectual indexes for law are hard to find. While there are
many multi-country intellectual indices to law on the internet
(http://www.austlii.edu.au/links/World/Indices/),
none are even remotely comprehensive, and many are US-oriented with a slight
international gloss. Some very good indices do exist for particular countries
(eg Canada, the USA, Germany and Australia), and for some subject matter areas,
but there are few of them and they are often difficult to find from the
multi-country indices. It is therefore difficult to find a good place to start!
The coverage of legal materials in general-purpose internet indexes is no more
helpful, as an inspection of the paltry coverage of legal materials in an index
such as Yahoo! (the largest general-purpose index) will show.
- Robot indexes are not comprehensive. There are very good
internet-wide robot indexes, such as Alta Vista, but they are not as
comprehensive as people often assume. For example, Alta Vista apparently
only indexes about 600 pages of even the largest web site. Furthermore,
well-behaved robots adhere to the robot
exclusion standard, by which web servers tell robots which pages they may not
index on a site. Because of the effects of some robots on server performance,
and for other reasons, many servers exclude robots. Such factors lead to
estimates that even the largest internet-wide search engines only index about
20% of the estimated 150 million web pages.
- Robot indexes contain too much `noise'. It is difficult to make
searches precise enough to find only legal materials using internet-wide robot
indexes, because they index predominantly non-legal material. It is usually
necessary to try to impose some ad hoc search limitation (in addition to the
real search terms) such as `law or legislation or code or court' or some such,
to try to stem the flood of irrelevant information (or more likely, to fool the
relevance ranking into putting legally oriented material first).
- Robot indexes are difficult to search for particular countries. It
is also difficult for most users to limit searches to materials concerning laws
of particular countries, and failure to do so
will usually result in the search being flooded with material from North
America and other `content rich' parts of the internet.
- Many significant law sites can't be searched. When you do find a
site containing valuable legal information it will often not have a search
engine at all, so searching at word level is not possible. Of the more than 30
internet sites around the world containing significant quantities of
legislation, fewer than half have any search engine. It requires considerably
greater technical ability to run a search engine than it does to simply put
pages of legal material onto the internet where they can be browsed.
- Using different search engines can be confusing. Even if a law site
does have its own search engine, users who wish to find legal materials on
different sites can also be easily confused by the need to use different search
engines with different search commands.
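
The effect of the robot exclusion standard mentioned above can be illustrated
with a minimal sketch. It assumes a well-behaved robot written in Python using
the standard library's urllib.robotparser module. The site name
law.example.org, the robot name `example-spider' and the exclusion rules are
all hypothetical, chosen only to show how a server can keep its legislation
pages out of a robot index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file for an imaginary law site: every robot is
# asked to stay out of the CGI scripts and the whole legislation collection.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /legislation/
"""


def allowed_pages(robots_txt, user_agent, urls):
    """Return only those URLs a well-behaved robot may download and index."""
    parser = RobotFileParser()
    # parse() takes the lines of an exclusion file; no network access needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical pages the robot has discovered by following links.
urls = [
    "http://law.example.org/index.html",
    "http://law.example.org/legislation/act1.html",
    "http://law.example.org/cases/1997/1.html",
]

indexable = allowed_pages(ROBOTS_TXT, "example-spider", urls)
for url in indexable:
    print(url)
```

In this sketch the legislation page is never downloaded, so none of its words
can appear in the robot's index, however relevant they may be to a search.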
URL: `Universal Resource Locator', or internet address of a web page.
 See for many examples.
Email from John Pike, webmaster of the
Federation of American Scientists, quotes confirmation from Alta Vista that 600
is about the maximum for any one site.
 For example, on Alta Vista, a search for
Vietnamese legal materials requires a search which is limited to materials
which are located on a server in Vietnam (the `domain:vn' delimiter) or contain
`Vietnam or Viet Nam' - and this is still somewhat hit or miss.