Project DIAL Report - Chapter 2 - Legal research via the Internet

Chapter 2 - Legal research via the Internet - Potential and problems

Conclusions concerning the availability of law on the Internet
Conclusions concerning existing legal research tools on the Internet
Conclusions concerning the desirable features of a new approach
Recommendation

Conclusions concerning the availability of law on the Internet

There is already an abundance of legal materials, including legislative materials, available on the Internet, if only they can be accessed effectively. This material includes significant legislation collections from over 50 countries (some of which are comprehensive), case law from over 20 countries with decisions from some major courts in the world being available within hours, large collections of law reform reports, the texts of up to 200 law journals, and a vast but as yet unquantifiable body of research reports from specialist law research centres, legal academics and law firms.
The quantity of legal information available on the Internet can be expected to expand rapidly over the next few years, and it is reasonable to expect that a large proportion of it will continue to be available for free access as it is now. Even if more of it becomes `user pays', users will still need to locate information in order to purchase it, and vendors will encourage tools that facilitate this.
It is realistic to speak of the development of a `world law library' on the Internet, but it is one in which it is at present very difficult to find all of the available and relevant information. The Internet provides a unique opportunity for lawyers and legal researchers, particularly those in the developing member countries (DMCs) of the Bank who may not otherwise have access to international legal resources in print, to obtain affordable access to at least the basic elements of a world-wide law library.

Conclusions concerning existing legal research tools on the Internet

Existing intellectual indexes are inadequate and provide only very limited coverage of the available materials, and even the few that do provide substantial coverage lack important features such as an ability to search for entries.
Intellectual indexes are also inherently shallow and expensive to develop in any depth, and they cannot support searching as comprehensive as that offered by automated indexes, which allow word-occurrence searching.
Internet-wide search engines (not specific to law) based on web spider technology do provide an ability to search at word-occurrence level for documents located across the Internet. However, their coverage of legal materials (or other materials) is not comprehensive.
Another problem with using Internet-wide search engines for legal research is that it is very difficult (particularly for inexperienced searchers such as are likely to be found in the audiences for this project) to formulate searches which are specific enough to remove the `noise' of non-legal usages of search terms, or which limit the items retrieved to those concerning a particular country. Users risk not being able to find relevant items because of the bulk of irrelevant items retrieved.
Many valuable law sites, particularly in developing countries, do not have their own search engines and so cannot be searched at word-occurrence level even when located.
Even where a particular site has been found by a user and does have a search engine, the proliferation of different search engines that a user must master, perhaps for only occasional use, is likely to discourage use and result in poor quality searching.
Centralised collections of world-wide legal information, such as GLIN, do not at this stage provide a solution to the problems of Internet legal research, but may well be complementary to the approach taken by Project DIAL.
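
The contrast drawn above between intellectual indexes and word-occurrence searching can be illustrated with a minimal sketch of an automated index. This is not the indexing method of any particular search engine; the function names and the three sample documents are hypothetical, invented purely for illustration. The sketch maps each word to the documents in which it occurs, and then answers a query by intersecting those sets.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents, invented for illustration only.
documents = {
    "act-1": "An Act relating to the protection of privacy",
    "case-7": "The court held that the privacy claim failed",
    "news-3": "Tennis court bookings open next week",
}
index = build_index(documents)
print(sorted(search_all(index, "court")))          # includes the non-legal page
print(sorted(search_all(index, "court privacy")))  # narrower query
```

In this toy collection a search for `court' alone also retrieves the tennis page, a small instance of the `noise' of non-legal usages described above. Adding a second term removes it here, but formulating such narrowing terms is exactly what inexperienced searchers find difficult.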

Conclusions concerning the desirable features of a new approach

The above conclusions support the approach which has been taken in the development of the Project DIAL prototype, and in related facilities developed for the prototype host, AustLII. This approach to reducing the problems of legal research on the Internet rests on these propositions:

An intellectual index is essential to identify high value law sites and legal resources, but cannot and should not aim to be comprehensive, particularly in its depth of indexing particular sites once they have been identified.
Web spider indexing of remote law sites, and a sufficiently powerful search engine, are necessary to provide the depth of search capacity that intellectual indexing cannot provide.
This is particularly so when many significant law sites do not have search engines at all, and where there is no consistency among the search engines used by those that do.
Searching robot-indexed sites will work much better if (i) only law sites are indexed (to remove non-legal `noise' and improve precision); and (ii) such sites are indexed comprehensively (to improve recall). We call a web spider and search engine dedicated solely to legal materials a `targeted' web spider.
Significant law sites which normally exclude robots may allow a `targeted' law-oriented web spider to index them, by request. The number of requests may be manageable.
A comprehensive intellectual index is needed to identify the law sites worth indexing, and therefore to `target' the robot. The intellectual index therefore serves the double function of a useful resource in itself, and the essential means of `feeding' the search engine.
Once a law-oriented web spider has created a searchable index of key law sites, specific searches over that index for various types of subject matter can be `embedded' in the intellectual index, thereby making the intellectual index `self-updating' to a certain extent, and so reducing its maintenance costs. Such `embedded searches' also cater for inexperienced users who have difficulty in formulating searches.
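
The idea of `targeting' the robot from the intellectual index can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of the Project DIAL or AustLII software: the entry format, the `crawl` function, the `fetch_links` callback and the example.org addresses are all hypothetical. The sketch omits downloading, page parsing and robots exclusion handling, and shows only the scoping rule: the spider starts from the indexed sites and never follows a link to a host that the index does not list.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def hosts_from_index(entries):
    """Collect the host names of the sites listed in the intellectual index."""
    return {urlparse(entry["url"]).netloc.lower() for entry in entries}

def crawl(entries, fetch_links, limit=1000):
    """Breadth-first crawl that never leaves the indexed hosts.

    fetch_links(url) is a caller-supplied function returning the links
    found on a page; a real spider would download and parse the page.
    """
    hosts = hosts_from_index(entries)
    queue = deque(entry["url"] for entry in entries)
    seen = set(queue)
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            target = urljoin(url, link)  # resolve relative links
            if urlparse(target).netloc.lower() in hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

# Hypothetical index entry and link graph, invented for illustration only.
entries = [{"title": "Example legislation site", "url": "http://law.example.org/"}]
links = {
    "http://law.example.org/": ["acts/1.html", "http://news.example.com/sport.html"],
    "http://law.example.org/acts/1.html": ["/"],
}
visited = crawl(entries, lambda url: links.get(url, []))
print(visited)
```

Here the link to the non-legal host is discarded because that host is not in the index, while every reachable page on the listed law site is visited, which corresponds to the precision and recall aims stated above.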

The key to effective legal research on the Internet may therefore be a tight integration of an intellectual index and a search engine based on a web spider, a symbiotic relationship in which each builds on the features provided by the other.
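
One way to picture this integration is an index category that stores a search alongside its hand-picked links, so that the category page is assembled from both each time it is displayed. The sketch below is an assumption about how such `embedded searches' might be modelled, not the prototype's actual design; the field names, functions and addresses are hypothetical, and the matching is deliberately simplistic (whitespace-separated words, no punctuation handling).

```python
def page_matches(text, query):
    """True if every word of the query occurs in the page text."""
    words = set(text.lower().split())
    return all(word in words for word in query.lower().split())

def render_category(category, spidered_pages):
    """List a category's hand-picked links, then the current hits of its stored search."""
    curated = list(category["links"])
    hits = sorted(
        url for url, text in spidered_pages.items()
        if page_matches(text, category["embedded_query"])
    )
    return curated + [url for url in hits if url not in curated]

# Hypothetical category and spidered pages, invented for illustration only.
category = {
    "title": "Privacy law",
    "links": ["http://law.example.org/privacy-act.html"],
    "embedded_query": "privacy",
}
spidered_pages = {
    "http://law.example.org/privacy-act.html": "An Act relating to privacy",
    "http://law.example.org/tax-act.html": "An Act relating to income tax",
}
before = render_category(category, spidered_pages)

# A later spider run picks up a new document; the category entry is unchanged.
spidered_pages["http://law.example.org/cases/12.html"] = "The privacy claim failed"
after = render_category(category, spidered_pages)
print(before)
print(after)
```

The category entry is never edited between the two calls, yet the second listing includes the newly spidered case. This is the limited sense in which an index with stored searches can be `self-updating', and the user never has to formulate the search.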

Recommendation

The Bank should consider a further Technical Assistance to provide better access to legal information via the Internet to Bank DMCs because (i) the current and potential value of the legal information available via the Internet is clearly very high, and of particular value to DMCs; and (ii) the existing tools for legal research on the Internet are inadequate, but methods to improve them have been identified, and these improvements will create effective access for DMC users.
