2. The problems of finding law on the internet

Despite the abundance of valuable legal materials already on the web, and the rapidity with which they are expanding, these materials are often very difficult to find, since they are scattered across thousands of web sites located all around the world.

Two types of tools - `Intellectual' and `robot' / automated indexes

There are essentially only two types of tools which help users find legal materials on the internet, `intellectual' indexes and `robot' or automated indexes.

* `Intellectual' indexes, where individual web sites are classified by hand according to various classificatory schemes. Usually, such indexes provide only the title, URL[3] and perhaps a brief description of each site indexed. Yahoo![4] (http://www.yahoo.com/) is a well-known example of a general intellectual index of the web (ie one which is not law-specific).

* `Robot' / automated indexes, where a program (variously called a `web robot' or `web spider') traverses the web, downloading every page it encounters, so that every word on every page can be indexed by a remotely located search engine. When the search engine displays a URL as a result of a search, that URL points to the original site, not to a mirror on the remote site. Alta Vista[5] (http://www.altavista.digital.com/) is perhaps the best-known general example of such an `internet-wide' search engine that searches an index created by a web spider. The principal advantage of this approach is that it is possible to search every word that has been indexed, not just the title and brief summary of what is on the site.
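
The word-level indexing that distinguishes a robot index from an intellectual index can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used by Alta Vista or any other search engine: the page texts, the example.org URLs and the function names are all invented for the example. It builds an index from every word to the pages containing it, then answers a search by returning the original URLs of the pages that contain all of the search words.

```python
import re
from collections import defaultdict

# Hypothetical pages that a robot has already downloaded (URL -> page text).
PAGES = {
    "http://example.org/acts/privacy.html": "Privacy Act: an Act to protect personal information",
    "http://example.org/cases/smith.html": "Smith v Jones: the court considered the Privacy Act",
    "http://example.org/recipes.html": "How to bake bread",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # A page matches only if it appears in the URL set of each query word.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(PAGES)
RESULTS = search(INDEX, "privacy act")
```

Here `RESULTS` holds the URLs of the two legal pages. An intellectual index, by contrast, would store only a hand-written title and description for each site, so a word appearing only in the body of a page could not be found.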

Why legal research on the internet is difficult

Despite the existence of these research aids, finding legal information on the internet is difficult, for at least the following reasons:

* Intellectual indexes are hard to maintain. As the quantity of legal material on the internet grows, the sites that contain significant legal information grow so numerous, and some sites are so large, that it is difficult to maintain intellectual indexes, at least with any depth of indexing of each site. The best that can be hoped for is that sites with significant legal materials are identified in the index, even though there is no detailed description of their content. For example, it soon becomes impossible to include in an intellectual index the content of each piece of legislation, each case, or each journal article included on a large site.

* Good intellectual indexes for law are hard to find. While there are many multi-country intellectual indexes to law on the internet[6] (http://www.austlii.edu.au/links/World/Indices/), none are even remotely comprehensive, and many are US-oriented with a slight international gloss. Some very good indexes do exist for particular countries (eg Canada, the USA, Germany and Australia), and for some subject matter areas, but there are few of them and they are often difficult to find from the multi-country indexes. It is therefore difficult to find a good place to start! The coverage of legal materials in general-purpose internet indexes is no more helpful, as an inspection of the paltry coverage of legal materials in an index such as Yahoo! (the largest general-purpose index) will show.

* Robot indexes are not comprehensive. There are very good internet-wide robot indexes, such as Alta Vista, but they are not as comprehensive as people often assume. For example, Alta Vista apparently indexes only about 600 pages of even the largest web site[7]. Furthermore, well-behaved robots adhere to the robot exclusion standard, by which web servers tell robots which pages they may not index on a site. Because of the effects of some robots on server performance, and for other reasons, many servers exclude robots. Such factors lead to estimates that even the largest internet-wide search engines index only about 20% of the estimated 150 million web pages (roughly 30 million pages).

* Robot indexes contain too much `noise'. It is difficult to make searches precise enough to find only legal materials using internet-wide robot indexes, because they index predominantly non-legal material. It is usually necessary to impose some ad hoc search limitation (in addition to the real search terms), such as `law or legislation or code or court', to try to stem the flood of irrelevant information (or more likely, to fool the relevance ranking into putting legally oriented material first).

* Robot indexes are difficult to search for particular countries. It is also difficult for most users to limit searches to materials concerning the laws of particular countries[8], and failure to do so will usually result in the search being flooded with material from North America and other `content rich' parts of the internet.

* Many significant law sites can't be searched. When you do find a site containing valuable legal information, it will often not have a search engine at all, so searching at word level is not possible. Of the more than 30 internet sites around the world containing significant quantities of legislation, fewer than half have any search engine. It requires considerably greater technical ability to run a search engine than it does simply to put pages of legal material onto the internet where they can be browsed.

* Using different search engines can be confusing. Even if a law site does have its own search engine, users who wish to find legal materials on different sites can easily be confused by the need to use different search engines with different search commands.
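
The robot exclusion standard mentioned in the list above works through a plain-text file named robots.txt on the web server, which a well-behaved robot reads before fetching pages. The following is a minimal sketch using the `urllib.robotparser` module from Python's standard library. The robots.txt content, the example.org URLs, the robot name `ExampleBot` and the helper `may_index` are assumptions made for the illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /cases/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cases/
"""

parser = RobotFileParser()
# Parse the rules from text rather than fetching them over the network.
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url, agent="ExampleBot"):
    """Return True if the exclusion rules allow this robot to fetch the URL."""
    return parser.can_fetch(agent, url)


ALLOWED = may_index("http://example.org/acts/privacy.html")
EXCLUDED = may_index("http://example.org/cases/smith.html")
```

In this sketch the legislation page may be indexed but the case law page may not, so a robot that honours the rules will never add the site's cases to an internet-wide index, however valuable they are.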

So the problems of finding legal materials world-wide are twofold: it is difficult to find which useful sites exist for a particular country or subject, and it is also difficult to find what is on such sites as are known. These research problems are very substantial even for the most expert `internet savvy' lawyers and law librarians. They are much worse for inexperienced users.

[3] `Uniform Resource Locator', the internet address of a web page

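[4] http://www.yahoo.com/

[5] http://www.altavista.digital.com/

[6] See http://www.austlii.edu.au/links/World/Indices/ for many examples.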

[7] Email from John Pike, webmaster of the Federation of American Scientists, quoting confirmation from Alta Vista that 600 is about the maximum for any one site.

[8] For example, on Alta Vista, a search for Vietnamese legal materials requires a search which is limited to materials which are located on a server in Vietnam (the `domain:vn' delimiter) or contain `Vietnam or Viet Nam' - and this is still somewhat hit or miss.
