6. Why legal research on the internet is difficult
Despite the existence of these research aids, finding legal information on the
internet is difficult, for at least the following reasons:
- Intellectual indexes are hard to maintain. As the quantity of legal
material on the internet grows, the sites containing significant legal
information become so numerous, and some sites so large, that it is
difficult to maintain intellectual indexes, at least with any depth of indexing
of each site. The best that can be hoped for is that sites with significant
legal materials are identified in the index, even though there is no detailed
description of their content. For example, it soon becomes impossible for an
intellectual index to cover each piece of legislation, each case, or each
journal article included on a large site.
- Good intellectual indexes for law are hard to find. While there are
many multi-country intellectual indexes to law on the internet[23], none are even remotely comprehensive, and many are
US-oriented with a slight international gloss. Some very good indexes do exist
for particular countries (eg Canada, the USA, Germany and Australia), and for
some subject matter areas, but there are few of them and they are often
difficult to find from the multi-country indexes. It is therefore difficult to
find a good place to start! The coverage of legal materials in general-purpose
internet indexes is no more helpful, as an inspection of the limited coverage
of legal materials in an index such as Yahoo! (the largest
general-purpose index) will show.
- Robot indexes are not comprehensive. There are very good
internet-wide robot indexes, such as Alta Vista, but they are not as
comprehensive as people often assume[24]. There
are a number of reasons for this[25]:
- Some robots only index a sample of pages on a particular site (at least at
any one time), rather than continuing until all pages on the site have been
indexed in one session. In 1996 it was claimed that Alta Vista indexed only
about 10% of the pages of moderately large web sites (600 of 6,000 pages in the
example cited), a claim not denied by Alta Vista[26].
Alta Vista now claims to index sites without any limit on pages.
- Well-behaved robots[27]
adhere to the robot exclusion standard[28],
by which web servers tell robots which pages they may not index on a site
(see the sketch after this list). Because of the effects of some robots on
server performance, and for other reasons, many servers exclude robots. All
major search engines observe robot exclusions.
- There are some technical problems with frames and with dynamically created
web pages that mean they cannot be included in web spider indexing.
- Some web spiders (including Alta Vista) re-index some sites as
infrequently as every three months, so pages added in the intervening
period are not indexed.
- Robot indexes contain too much `noise'. It is difficult to make
searches precise enough to find only legal materials using internet-wide robot
indexes, because they index predominantly non-legal material. It is usually
necessary to impose some ad hoc search limitation (in addition to the
real search terms), such as `law or legislation or code or court',
to try to stem the flood of irrelevant information (or, more likely, to fool the
relevance ranking into putting legally oriented material first).
- Robot indexes are difficult to search for particular countries. It
is also difficult for most users to limit searches to materials concerning the
laws of particular countries[29], and failure to do
so will usually result in the search being flooded with material from North
America and other `content rich' parts of the internet (an example of such a
country-limited query is sketched below).
- Many significant law sites can't be searched. When you do find a
site containing valuable legal information, it will often not have a search
engine at all, so searching at word level is not possible. Of the more than 30
internet sites around the world containing significant quantities of
legislation, fewer than half have any search engine. It requires considerably
greater technical ability to run a search engine than it does to simply put
pages of legal material onto the internet where they can be browsed.
- Using different search engines can be confusing. Even if a law site
does have its own search engine, users who wish to find legal materials on
different sites can easily be confused by the need to use different search
engines with different search commands.
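To make the robot exclusion standard mentioned above concrete, here is a minimal sketch using Python's standard urllib.robotparser module against a hypothetical exclusion file of the kind a legislation server might publish. The file's contents, the `LawCrawler' robot name and the example paths are all invented for illustration; only the /robots.txt mechanism itself belongs to the standard.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical /robots.txt for a legislation server. The paths and the
# `LawCrawler' robot name are invented; the format is the exclusion standard.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/

User-agent: LawCrawler
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A generic robot may not fetch the excluded paths...
print(parser.can_fetch("*", "http://example.org/drafts/bill1.html"))  # False
# ...but may index the rest of the site.
print(parser.can_fetch("*", "http://example.org/acts/act1.html"))     # True
# The hypothetical LawCrawler robot is explicitly allowed everywhere.
print(parser.can_fetch("LawCrawler", "http://example.org/drafts/bill1.html"))  # True
```

A well-behaved spider makes exactly this check before fetching any page, which is why a server's exclusions translate directly into gaps in a robot index.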
So the problems of finding legal
materials world-wide are twofold: it is difficult to find which useful sites
exist for a particular country or subject, and difficult to find what is
on those sites that are known. These research problems are very substantial even
for the most expert `internet savvy' lawyers and law librarians, and much
worse for inexperienced users.
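To illustrate the kind of ad hoc query limitation described in the list above, the following sketch assembles an Alta Vista-style query that wraps the searcher's real terms with the legal qualifier terms (`law or legislation or code or court') and a crude country restriction (the `domain:vn' delimiter and `Vietnam or Viet Nam' terms from note [29]). The helper function, its parameters and the exact boolean syntax are illustrative assumptions; each engine of the period had its own grammar.

```python
# A sketch of building the ad hoc `legal noise filter' query described above.
# The operator syntax is modelled loosely on Alta Vista advanced queries;
# the function and parameter names are invented for illustration.

LEGAL_QUALIFIERS = ["law", "legislation", "code", "court"]

def build_legal_query(search_terms, country_domain=None, country_names=()):
    """Wrap the real search terms with legal qualifiers and, optionally,
    a crude country restriction (cf. note [29])."""
    core = " AND ".join(search_terms)
    qualifiers = " OR ".join(LEGAL_QUALIFIERS)
    query = "({}) AND ({})".format(core, qualifiers)
    if country_domain or country_names:
        parts = []
        if country_domain:
            parts.append("domain:" + country_domain)
        # Quote multi-word country names as phrases.
        parts += ['"{}"'.format(n) if " " in n else n for n in country_names]
        query += " AND (" + " OR ".join(parts) + ")"
    return query

# The hit-or-miss Vietnamese search from note [29]:
print(build_legal_query(["foreign", "investment"],
                        country_domain="vn",
                        country_names=("Vietnam", "Viet Nam")))
# -> (foreign AND investment) AND (law OR legislation OR code OR court)
#    AND (domain:vn OR Vietnam OR "Viet Nam")
```

Even a query of this kind only fools the relevance ranking into surfacing legally oriented material first; it cannot guarantee that everything retrieved is law, which is the `noise' problem in a nutshell.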
[23] See
http://www.austlii.edu.au/links/World/Indices/ for many examples.
[24] Such factors led to estimates in 1996
that even the largest internet-wide search engines indexed only about 20% of
the estimated 150 million web pages. However, the most recent figures published
by Search Engine Watch (at http://searchenginewatch.com/features.htm) include
claims by Alta Vista to index 100 M pages, HotBot 80 M, Excite 55 M and others
less than that. The estimated total number of web pages is now in excess of
200 M, so, whatever the exact situation may be, it is still the case that no
search engine can claim to index all pages on the web.
[25] See `How big are the search engines?'
(Search Engine Watch) at http://searchenginewatch.com/size.htm, and the
references linked therefrom, for detailed discussion of all these matters.
[26] The claim by John Pike, webmaster of the
Federation of American Scientists, and the reply by Alta Vista are available at
http://www5.zdnet.com/anchordesk/talkback/talkback_11638.html, and discussed in
`The Alta Vista Size Controversy' on Search Engine Watch.
[27] See Martijn Koster's `The Web Robots Pages'
at http://info.webcrawler.com/mak/projects/robots/robots.html for details of
the operation of web robots.
[28] See the `Robots Exclusion' page, dealing
with both the standard and the Meta Tag for robot exclusion, at
http://info.webcrawler.com/mak/projects/robots/exclusion.html.
[29] For example, on Alta Vista, a search for
Vietnamese legal materials requires a search limited to materials located on a
server in Vietnam (the `domain:vn' delimiter) or containing `Vietnam or Viet
Nam' - and this is still somewhat hit or miss.