6. Why legal research on the internet is difficult
Despite the existence of these research aids, finding legal information on the
internet is difficult. The problems of finding legal materials world-wide are
twofold: it is difficult to find which useful sites exist for a particular
country or subject, and it is also difficult to find what is on such sites as
are known. These research problems are very substantial even for the most
expert `internet savvy' lawyers and law librarians. They are much worse for
inexperienced users. There are at least the following reasons for this:
- Intellectual indexes are hard to maintain. As the quantity of legal
material on the internet grows, the sites that contain significant legal
information grow so numerous, and some sites are so large, that it is
difficult to maintain intellectual indexes, at least with any depth of indexing
of each site. The best that can be hoped for is that sites with significant
legal materials are identified in the index, even though there is no detailed
description of their content. For example, it soon becomes impossible to
include in an intellectual index the content of each piece of legislation, each
case, or each journal article included on a large site.
- Good intellectual indexes for law are hard to find. While there are
many multi-country intellectual indexes to law on the internet, none is even
remotely comprehensive, and many are US-oriented with a slight international
gloss. Some very good indexes do exist for particular countries (eg Canada,
the USA, Germany and Australia), and for some subject matter areas, but there
are few of them and they are often difficult to find from the multi-country
indexes. It is therefore difficult to
find a good place to start! The coverage of legal materials in general-purpose
internet indexes is no more helpful, as an inspection of the limited coverage
of legal materials in an index such as Yahoo! (the largest
general-purpose index) will show.
- Robot indexes are not comprehensive. There are very good
internet-wide robot indexes, such as Alta Vista, but they are not as
comprehensive as people often assume [http://searchenginewatch.com/features.htm].
There are a number of reasons for this [http://searchenginewatch.com/size.htm]:
- Some robots only index a sample of pages on a particular site (at least at
any one time), and do not continue indexing until they complete all pages on a
site in one session. In 1996 it was claimed, and not denied by Alta Vista,
that Alta Vista indexed only about 10% of the pages of moderately large web
sites (600 of 6,000 pages in the example cited)
[http://www5.zdnet.com/anchordesk/talkback/talkback_11638.html].
Alta Vista now claims to index sites without any limit on pages.
- Well-behaved robots [http://info.webcrawler.com/mak/projects/robots/robots.html]
adhere to the robot exclusion standard
[http://info.webcrawler.com/mak/projects/robots/exclusion.html],
by which web servers tell robots which pages they may not index on a site.
Because of the effects of some robots on server performance, and for other
reasons, many servers exclude robots. All major search engines observe robot
exclusion.
- There are some technical problems with frames and with dynamically created
web pages that mean that such pages cannot be included in web spider indexing.
- Some web spiders (including Alta Vista) re-index some sites as
infrequently as every three months, so pages added during that period may not
yet be indexed.
- Robot indexes contain too much `noise'. It is difficult to make
searches precise enough to find only legal materials using internet-wide robot
indexes, because they index predominantly non-legal material. It is usually
necessary to try to impose some ad hoc search limitation (in addition to the
real search terms) such as `law or legislation or code or court' or some such,
to try to stem the flood of irrelevant information (or more likely, to fool the
relevance ranking into putting legally oriented material first).
- Robot indexes are difficult to search for particular countries. It
is also difficult for most users to limit searches to materials concerning laws
of particular countries, and failure to do
so will usually result in the search being flooded with material from North
America and other `content rich' parts of the internet.
- Many significant law sites can't be searched. When you do find a
site containing valuable legal information it will often not have a search
engine at all, so searching at word level is not possible. Of the more than 30
internet sites around the world containing significant quantities of
legislation, fewer than half have any search engine. It requires considerably
greater technical ability to run a search engine than it does to simply put
pages of legal material onto the internet where they can be browsed.
- Using different search engines can be confusing. Even if a law site
does have its own search engine, users who wish to find legal materials on
different sites can easily be confused by the need to use different search
engines with different search commands.
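The robot exclusion standard mentioned above can be illustrated with a short sketch. It uses Python's standard `urllib.robotparser` module to decide which pages a well-behaved robot would be permitted to index. The robots.txt content, the site `law.example.org`, the robot name `ExampleRobot` and the helper `allowed_urls` are all hypothetical and chosen only for illustration; they do not describe any real site or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary law site (an assumption,
# not taken from any real server). It excludes the dynamically
# generated parts of the site from all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs that a well-behaved robot may fetch."""
    parser = RobotFileParser()
    # parse() takes the lines of a robots.txt file; no network access.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Hypothetical pages on the imaginary site.
CANDIDATES = [
    "http://law.example.org/legislation/act1.html",
    "http://law.example.org/cgi-bin/query?term=contract",
    "http://law.example.org/search/results.html",
]

INDEXABLE = allowed_urls(ROBOTS_TXT, "ExampleRobot", CANDIDATES)
print(INDEXABLE)
```

Under these assumptions only the static legislation page remains indexable. A server that lists `Disallow: /` for all robots would leave nothing for an internet-wide index to include, which is one way significant legal material can be absent from robot indexes.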
 See http://www.austlii.edu.au/links/World/Indices/ for many examples.
 Such factors led to estimates in 1996
that even the largest internet-wide search engines indexed only about 20% of
the estimated 150 million web pages. However, the most recent figures published
by Search Engine Watch (at http://searchenginewatch.com/features.htm) include
claims by Alta Vista to index 100 M pages, HotBot 80 M, Excite 55 M and others
less than that. The estimated total number of web pages is now in excess of
200 M, so, whatever the exact situation may be, it is still the case that no
search engine can claim to index all pages on the web.
 See `How big are the search engines?'
(Search Engine Watch) at http://searchenginewatch.com/size.htm and references
linked therefrom, for detailed discussion of all these matters.
 The claim by John Pike, webmaster of the
Federation of American Scientists, and the reply by Alta Vista are available at
http://www5.zdnet.com/anchordesk/talkback/talkback_11638.html
and discussed in `The Alta Vista Size Controversy' on Search Engine Watch.
 See Martin Koster `The Web Robots Pages'
at http://info.webcrawler.com/mak/projects/robots/robots.html for details of
the operation of web robots.
 See the `Robots Exclusion' page, dealing
with both the standard and the Meta Tag for robot exclusion, at
http://info.webcrawler.com/mak/projects/robots/exclusion.html
 For example, on Alta Vista, a search for
Vietnamese legal materials requires a search which is limited to materials
which are located on a server in Vietnam (the `domain:vn' delimiter) or contain
`Vietnam or Viet Nam' - and this is still somewhat hit or miss.
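
The ad hoc limitations described in this section (adding `law or legislation or code or court' to a search, and restricting by country as in the Vietnamese example) can be sketched as a small query builder. This is only an illustration: the function `build_legal_query` and the generic AND/OR syntax it emits are assumptions and are not the exact query language of Alta Vista or any other engine; only the `domain:vn' delimiter and the filter words come from the text above.

```python
def build_legal_query(terms,
                      legal_filter=("law", "legislation", "code", "court"),
                      country_domain=None,
                      country_names=()):
    """Combine real search terms with ad hoc legal and country filters.

    The boolean syntax produced here is a generic, hypothetical one.
    """
    def quote(word):
        # Quote multi-word phrases so they are treated as one unit.
        return f'"{word}"' if " " in word else word

    # The user's real search terms, all of which must match.
    parts = ["(" + " AND ".join(quote(t) for t in terms) + ")"]

    # Ad hoc filter to push legally oriented material to the top.
    if legal_filter:
        parts.append("(" + " OR ".join(quote(w) for w in legal_filter) + ")")

    # Country limitation: server domain or mention of the country name.
    country = []
    if country_domain:
        country.append(f"domain:{country_domain}")
    country.extend(quote(n) for n in country_names)
    if country:
        parts.append("(" + " OR ".join(country) + ")")

    return " AND ".join(parts)


QUERY = build_legal_query(["contract"],
                          country_domain="vn",
                          country_names=["Vietnam", "Viet Nam"])
print(QUERY)
```

Even a query built this way remains somewhat hit or miss, since it relies on pages either being hosted in the country's domain or happening to mention the country by name.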