Existing commercial retrieval systems still use the first generation of a technology that is thirty years old. While that technology may have been impressive before the invention of the microcomputer, most full text retrieval systems would today be classified as `user hostile'. Few lawyers have found the benefit of these techniques so compelling as to want to use them constantly. In part this may be due to the price of the services, and to the lack of sufficiently extensive databases, but there is also a constant refrain, even among experienced users, that the software is too difficult to use.
However, there is no consensus as to what form the `second generation' of retrieval software should take. This Chapter outlines some criticisms of existing retrieval systems and improvements that have been suggested.
Other technologies which are related to but go beyond `information retrieval' are also mentioned. New retrieval technologies, expert systems, `smart books', CD-ROM and electronic mail will change computerised legal information beyond recognition.
Bing comments that `Boolean retrieval is by far the most widely used strategy', and
The strength of the retrieval strategy is its flexibility, and the possibility it offers for experienced users to construct complicated and well performing requests. In principle, high retrieval performance is always possible.
The drawback ... is the high demands posed to the user ... An inexperienced user will have difficulties in exploiting the possibilities, and a request may therefore easily have a structure which is different from the one intended by the user (for instance the user mixes up the ORs and ANDs, or is unaware of the sequence in which they are executed). (J Bing (Ed) Handbook of Legal Information Retrieval , North Holland 1984, p163)
A recent Australian commentator on the use of STATUS has commented on the simple searches carried out by most users of a Hansard database in similar but more colourful terms:
... at training courses we often see eyes start to glaze over at the mere mention of Boolean logic. The concepts of algebraic precedence and the use of parentheses to enforce it then finishes the job. So, when asking questions such users tend to avoid the issue by adopting a stepping-stone approach to their enquiry ... (J Gouldstone, `How users really use databases' Information Online 87, Proceedings of the Second Australian Online Information Conference, Library Association of Australia 1987)
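The precedence problem described in these two passages can be illustrated with a minimal sketch. The three-document collection and the search terms are hypothetical, and the sketch assumes a retrieval language that, like Python, evaluates AND before OR; actual systems differ in their precedence rules.

```python
# Hypothetical three-document collection; each document is reduced to its set of words.
documents = {
    "doc1": {"negligence", "solicitor"},
    "doc2": {"negligence", "barrister"},
    "doc3": {"barrister", "contract"},
}

def search(predicate):
    """Return the sorted names of documents whose word set satisfies the predicate."""
    return sorted(name for name, words in documents.items() if predicate(words))

# Intended request: negligence AND (solicitor OR barrister)
intended = search(
    lambda w: "negligence" in w and ("solicitor" in w or "barrister" in w)
)

# The same request typed without parentheses. Because AND is evaluated
# before OR, it is read as (negligence AND solicitor) OR barrister.
unparenthesised = search(
    lambda w: "negligence" in w and "solicitor" in w or "barrister" in w
)

print(intended)         # ['doc1', 'doc2']
print(unparenthesised)  # ['doc1', 'doc2', 'doc3']
```

The second request silently retrieves doc3, which does not mention negligence at all. Nothing in the result warns the user that the request was structured differently from the one intended.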
Some research claims that existing computerised retrieval systems are superior to manual research (See, for example, M Iosipescu and J Yogis A Comparison of Automated and Manual Legal Research, Canadian Law Information Council, Ottawa, 1981), whereas other researchers have been very critical (See, for example, D Blair and M Maron, `An evaluation of retrieval effectiveness for a full-text document-retrieval system', (1985) Vol 28 No 3 Communications of the ACM).
              | Relevant | Not relevant |
Retrieved     | a        | b            |
Not retrieved | c        | d            |
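`Recall' is the ratio of a to (a + c), or the proportion of relevant material retrieved to all relevant material retrievable. Failure to retrieve relevant material (ie c) lowers this ratio. `Precision' is the ratio of a to (a + b), or the proportion of retrieved material that is relevant. Retrieval of irrelevant material (ie b) lowers this ratio.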
High recall and high precision (ie each as close to 1:1 as possible) are both desirable. There seems to be no optimum relationship between precision and recall. However, it is often observed that there is an inverse relationship between the two, with high recall resulting in low precision, and vice-versa (see Blair and Maron op cit and references cited therein).
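The two measures can be computed directly from the cells of the table above. The following is a minimal sketch: it takes precision in its usual sense of a / (a + b), and the counts for the `narrow' and `broad' queries are hypothetical figures chosen only to show the inverse relationship, not results from any study.

```python
def recall(a, c):
    """Proportion of all relevant documents that were retrieved: a / (a + c)."""
    return a / (a + c)

def precision(a, b):
    """Proportion of retrieved documents that are relevant: a / (a + b)."""
    return a / (a + b)

# Hypothetical database containing 100 relevant documents in total.
# A narrow query: 40 relevant retrieved (a), 10 irrelevant retrieved (b),
# 60 relevant missed (c).
narrow = {"a": 40, "b": 10, "c": 60}

# A broader query on the same database: more relevant documents are found,
# but far more irrelevant ones come with them.
broad = {"a": 80, "b": 120, "c": 20}

for label, q in (("narrow", narrow), ("broad", broad)):
    print(label, recall(q["a"], q["c"]), precision(q["a"], q["b"]))
# narrow 0.4 0.8
# broad 0.8 0.4
```

Note that recall cannot be calculated from the retrieved set alone: the value of c is known only if the unretrieved documents are also assessed for relevance.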
IBM's STAIRS retrieval system was used. Experienced para-legals constructed the queries in conjunction with the lawyers working on the case, often refining a query through a number of attempts. The lawyers had specified that, for them, an acceptable level of recall required that at least 75% of all relevant documents must be retrieved. They were less concerned with precision. Their subjective assessment of the results of the queries was that this level of recall had been achieved -- that is, they were happy with what they retrieved.
All of the documents were then analysed according to relevance, and the actual precision and recall measurements made. To the great surprise of the lawyers concerned, it was found that the value of recall was only 0.2 (20%), whereas precision was 0.75 (75%). They believed that they were retrieving more than 75% of all relevant documents, but in fact were only retrieving 20%.
Blair and Maron concluded that the main assumption underlying automatic indexing was incorrect, namely that the likely users of these systems can `foresee the exact words and phrases that will be used in the documents they will find useful, and only in those documents'. In other words, they question whether, in practice, likely users can capture concepts in word-occurrence combinations. If not, it is irrelevant whether it is possible to do so in theory. They suggest that if high recall is desired, manually assigned index terms (`intellectual indexing') are necessary.
There are a number of reasons why we should exercise caution before extrapolating this result into a general condemnation of full text retrieval systems.
First, the experiment was on a litigation support application, where one would expect to find a database consisting of a wide variety of document types, ranging from letters between parties and affidavits of witnesses to technical documents and formal court documents. Such documents will have no consistent structure, so that search functions based on specific sections of documents will be unavailable. Nor will such documents have any regularity in style or precision in use of language, such as one can reasonably expect to find in the formal style of reported cases or, a fortiori, in legislation. Such formality must make it easier for a lawyer who is sensitive to judicial or statutory language to predict what words will be used in relation to which concepts in these texts (See P Flass's letter in 28(11) Comm ACM, 1985 for a similar point).
It is also questionable whether having paralegals translate lawyers' requests is the best model of legal information retrieval, whether a manually-constructed index would have behaved any better, and whether at least some of the fault lies with the lack of adequate ranking functions in STAIRS, rather than with automatic indexing per se.
The sensible but unsurprising conclusion is that, if it is possible on pragmatic grounds such as cost to include both full text (with automatic indexing) and abstracts, catchwords or indices (intellectual indexing), then retrieval performance will be enhanced. Both are needed to overcome the limitations of each.
Most online retrieval systems are command-driven, and present the user with a prompt (such as >) when another command is expected, but no suggestion as to what commands are possible or desirable at that point. This was the norm for all programs until the mass-marketing of computer programs from the late 1970s. However, most microcomputer users now expect a greater degree of `user friendliness', such as on-screen menus of commands which indicate which commands are possible at that time. As Bing says, the relative user-friendliness of text retrieval systems has deteriorated. In part this is due to the restrictions imposed by communications protocols and speeds, but these problems could be overcome to some degree by local interface software running on the user's computer. The LEXIS communications software goes some way toward this goal.
Help facilities are usually passive, waiting for the user to invoke them. Error messages in most systems are extremely cryptic, of the `Command not valid' variety. Bing points out that what is needed is `active intervention of the system. The system should monitor the dialogue, and butt in with advice when detecting something is wrong.' LEXIS does this to some extent.
Other problems (Discussed in J.Bing `Legal text retrieval systems -- the unsatisfactory state of the art' (1987) Vol 2 No 1 Journal of Law & Information Science) which could be lessened by such active system intervention include:
A by-product of this `identity' function, as Bing calls it, is that logical and positional connectors provide no method of ranking those documents which do satisfy a search request in terms of how likely it is that they will be relevant to the user's request. A document either satisfies the request or it does not.
A number of alternative retrieval methods are being developed based on `nearness' functions, rather than `identity', in order to counter both of the above problems (For the Norwegian approach of `conceptor based retrieval', implemented in NOVA*STATUS and SIFT, see J.Bing `The law of the books and the law of the files' (Part 1) (1987) 54 Computers and Law 31). Fairly simple word-frequency ranking was included in the Canadian QL system, and is included in the STAIRS retrieval system, but has not proved to be sophisticated enough.
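The difference between an `identity' function and a simple `nearness' function can be sketched as follows. This is an illustration of word-frequency ranking in general, under the assumption that a document's score is simply the number of times the query terms occur in it; it is not the ranking method actually used by QL or STAIRS, and the documents are hypothetical.

```python
from collections import Counter

def rank_by_frequency(documents, query_terms):
    """Rank documents by total occurrences of the query terms, highest first.

    Documents containing none of the terms are omitted. Ties are broken
    alphabetically by document name so that the order is predictable.
    """
    scores = {}
    for name, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical documents, already stripped of punctuation.
documents = {
    "case_a": "the lease was void because the lease was never registered by the tenant",
    "case_b": "the tenant breached the lease",
    "case_c": "a contract for the sale of goods",
}

print(rank_by_frequency(documents, ["lease", "tenant"]))
# [('case_a', 3), ('case_b', 2)]
```

A Boolean request for lease AND tenant would return case_a and case_b as an unordered pair. The frequency score at least orders them, although raw counts of this kind favour long documents, which is one reason such simple ranking may not be sophisticated enough.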
The approach taken by STATUS with IQ may be summarised briefly as follows (See D L Pape Status with IQ User's Guide, Computer Power Applied Research and Development Division, Canberra, 1985).
STATUS with IQ has not yet been included as an alternative search method on the CLIRS or SCALE databases. It has been tested with a database of High Court cases.
Such an approach could require intellectual or `conceptual' indexing of legal materials, although Norwegian research suggests that it may be possible instead to construct a `norm-based thesaurus', a conceptual model of an area of law, with groups of terms associated with each conceptual node in the structure, but with no intellectual indexing of the actual documents. The norm-based thesaurus would suggest additional search terms to users and, by utilising user feedback, could be self-maintaining (See Bing `The text retrieval system as a conversation partner' op cit; J Bing `Designing text retrieval systems for "conceptual searching"' Proceedings of the First International Conference on Artificial Intelligence and Law, ACM, Boston, 1988).
There are many other research projects in this area at present (See J Bing `Designing text retrieval systems for "conceptual searching"' ibid for a summary of some of the most important; see also the papers by Tong et al, Hafner, Dick and Belew in those Proceedings), but none involve commercial implementation with the exception of the Italian ITALGIURE system.
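The idea of a thesaurus that suggests additional search terms can be sketched very simply. The node names and term groups below are hypothetical, and the sketch models only the suggestion step; it makes no attempt to represent the structure of norms or the user-feedback mechanism described in the Norwegian research.

```python
# Hypothetical fragment of a conceptual model of an area of law.
# Each conceptual node has a group of associated search terms.
thesaurus = {
    "termination of lease": {"forfeiture", "re-entry", "surrender", "notice to quit"},
    "rent": {"rent", "arrears", "rent review"},
}

def suggest_terms(query_terms, thesaurus):
    """Suggest further terms from every node that shares a term with the query."""
    query = set(query_terms)
    suggestions = set()
    for node_terms in thesaurus.values():
        if query & node_terms:                # the query touches this node
            suggestions |= node_terms - query  # offer the node's other terms
    return sorted(suggestions)

print(suggest_terms(["forfeiture"], thesaurus))
# ['notice to quit', 're-entry', 'surrender']
```

The documents themselves are untouched: only the user's request is expanded, which is what distinguishes this approach from intellectual indexing of each document.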
The CCH ACCESS system will be run by CCH on its own network, and will use a combination of retrieval software on the host computer, and interface software on the user's own PC. The retrieval software will allow both Boolean retrieval and menus based on the normal CCH numbered tables of contents and paragraphs. The key to the system, however, will be that all paragraphs will be coded to cross-refer to the statutory provisions, caselaw and other commentary referred to in that paragraph (See Sperling, 1987, pp 3-4).
It will therefore be possible to have immediate and automatic access to the primary materials to which a commentary refers, and vice versa. This approach will constitute an integration of primary and secondary databases on a topic. A similar approach has already been taken in the development of some legal expert systems.
Such an approach differs from existing information retrieval techniques in that it requires considerably more complex and labour-intensive coding or `marking up' of the data to be included in the retrieval system.
The second element in the CCH approach will be the interface software on the user's PC. It will make extensive use of user-manipulable windows to allow different categories of data (eg commentary and statutes) to be viewed simultaneously, and for note-taking while on line. In other words, it will apply what by now are conventional microcomputer techniques to the creation of a more friendly and responsive information retrieval interface than remote systems can yet provide.
CCH Australia does not expect to offer an online CCH ACCESS system `in the immediate future', but considers that the software being developed is what is needed for searching CD-ROM disks, and that it may be offered in this way in future (id p 8).
CCH's proposals do not seem to involve any startling computing innovations, but appear to be very sound.
The only notable exception in Australia has been the sale of `precedent packages', collections of useful forms and precedent documents in such areas as conveyancing and litigation (S Lewis Australasian Legal Software Directory, Legal Management Consultancy Services, Sydney, 1987, contains details of some packages). Precedents require relatively little disk storage, need only word processing software, which lawyers require in any event, and need only intermittent updating.
New storage technologies are likely to change completely the economics of such distribution of computerised legal information.
Reference works are starting to become available on CD-ROM. The twenty volume Grolier Encyclopedia is available on a CD-ROM for under $400.
The Index to Legal Periodicals is available as a database on the LEXIS, Westlaw and Wilsonline online retrieval services in the United States, and is therefore searchable by LEXIS users in Australia. It is also available on CD-ROM from the company operating Wilsonline. The data is kept current in two ways. The CD-ROM is replaced quarterly by an updated cumulative disk. Purchasers of the CD-ROM are offered free search time on the online version of the database on Wilsonline, although they must still pay telecommunications charges.
The Legal Resources Index, the other principal legal bibliographic reference work, is also available as a database on LEXIS (LGLIND), Westlaw and other online services. It is also available as a videodisc called LEGALTRAC, available from Information Access Company, the publishers of the Legal Resources Index. Videodisc is a technology requiring more complex equipment than CD-ROM, but comparable in its storage capacity. The LEGALTRAC system does not use a full text retrieval system. Searches are based on a subject index (L Scott Rawnsley `Making Tracs: Road Testing the INFOTRAC and LEGALTRAC video-Disk Databases' (1986) Vol 6 Nos 3/4 Legal Reference Services Quarterly 168).
No Australian legal materials are yet available on CD-ROM.
The Weldon-Hardie Group of companies, publishers of the Macquarie Dictionary, have announced that a subsidiary company, Megaword, is to market a Book Reader. It is to be battery powered, book size, with a backlit screen the size of a page, and six keys to control retrieval functions. Each `book' comes on a credit card sized storage device. The retrieval system involves a concordance, so search techniques similar to those in online retrieval systems will be possible. (see Online Currents Vol 2(6), 1987, p1).
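A concordance of the kind mentioned here is, in essence, a list of every word in the text together with the places where it occurs. The following minimal sketch shows how such a structure supports searches like those of online systems. The page texts are hypothetical, and nothing is known from the announcement about how the Book Reader itself stores or searches its concordance.

```python
from collections import defaultdict

def build_concordance(pages):
    """Map each word to the sorted list of page numbers on which it occurs."""
    concordance = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            concordance[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in concordance.items()}

# Hypothetical three-page `book', already stripped of punctuation.
pages = {
    1: "income tax is assessed annually",
    2: "a rebate reduces tax payable",
    3: "the rebate is claimed annually",
}

concordance = build_concordance(pages)
print(concordance["tax"])  # [1, 2]

# An AND search is the intersection of the page lists of the two words.
both = sorted(set(concordance["rebate"]) & set(concordance["annually"]))
print(both)  # [3]
```

Because the concordance is built once, when the text is prepared, a search need only consult the word list and never has to scan the text itself, which suits a small battery-powered device.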
A prototype smart book using -- what else but? -- the Bible was demonstrated in 1987. CCH Australia has announced that the second smart book to be released will be its Master Tax Guide 1989. Many would see this as an appropriate encore.
Online Services or optical discs or CD-ROM will never replace books... because as yet neither form is portable. You are tied to a computer and/or telephone. You can't take one to bed, or read on the train. (Editorial Online Currents Vol 2(6), 1987).
A legal expert system is a computer program that gives legal advice. Whereas a database program merely retrieves information which is potentially relevant to an individual legal problem, leaving its application to specific problems to the user, an expert system applies information to the specific problem. Databases store the `raw material' on which legal advice and decisions are based: cases, statutes and `textbook law'. `Knowledge bases' store legal information in a more highly structured form which also represents the interrelation between the different items of information, and allows it to be applied to individual problems. A legal `knowledge base' is an attempt to represent the structure of an area of law.
SSES was developed by P Johnson (a lawyer) and D Mead (a programmer), initially for the A.C.T. Welfare Rights Centre. Distribution to other welfare agencies is now being organised. The SSES application was developed using a `shell' called Knowledge Base System Management System (KBMS-1) written as part of the project in the logic programming language, Prolog. The shell may be used to produce applications in other areas of law. See D Mead and P Johnson The Social Security Enquiry System -- Background Paper, the authors, January 1988.
Both the SSES and DataLex projects received substantial assistance from the Law Foundation of New South Wales, which has shown a consistent interest in the development of computerised legal information services.
The Tax Breakdown of Termination Payments program calculates taxable amounts of termination payments. The Tax Return Package prepares tax returns in a form approved by the Australian Taxation Office. The Assets Register and Depreciation Package records details of an organisation's assets and produces an assets register listing those assets and schedules showing depreciation expenses against individual assets. The Fringe Benefits Tax Package prepares FBT returns and answers questions on whether or not a benefit is subject to FBT.
CCH provides a `Hotline', a telephone advice service to assist with the use of Solvware.
Once created by an expert in a particular field, these `templates' may be used by less expert users to create documents for their own transactions. The marketing of templates has recently commenced in Australia, but few good templates are yet available.