Reading Guide: Hypertext and Retrieval
3. Text retrieval - principles and evaluation

[This Part is complete for 2000]

3.1. Introduction

3.1.1. What is text retrieval?

'Text retrieval' (also sometimes known as 'full text searching' or 'free text searching') refers to computerised systems which allow users to find particular combinations of words in large bodies of text, whether or not those texts have any uniform structure.

Howard Turtle (see below) gives a more technical (and accurate) definition: 'a retrieval system applies some matching function to the representation of the information needs and the representation of each document to determine which documents to retrieve'.

Definitions

For those unfamiliar with text retrieval, the following glossaries may be helpful, either as a starting point for reading, or to come back to later:

3.1.2. Basic principles - concordances and 'exact match models'

In the late 1950s, the basic technique underlying most computerised searching of large bodies of text was developed, variously known as the 'concordance', 'inverted file' or 'word occurrence index'. In summary, it involves the construction of an alphabetical list of every different word in each document in a set of documents, with the locations of each occurrence of that word recorded next to it. 'Searches' of the documents are in fact carried out over this 'concordance', not over the actual texts. (See a short history of text retrieval in law for more details.)
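
As a rough illustration of the idea (a minimal sketch, not the design of any particular system), the following Python builds a concordance that records the document and word position of every occurrence of each word, and then answers a search from the concordance alone. The sample documents, the function names build_concordance and search_all, and the simple lower-casing tokeniser are all assumptions made for this example.

```python
import re
from collections import defaultdict

def build_concordance(documents):
    """Map each word to a list of (document id, word position) pairs."""
    concordance = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            concordance[word].append((doc_id, position))
    return concordance

def search_all(concordance, *terms):
    """Return ids of documents containing every term, consulting only the concordance."""
    doc_sets = [{doc_id for doc_id, _ in concordance.get(term.lower(), [])}
                for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()

# Hypothetical sample documents, invented for illustration.
documents = {
    1: "The lessee must pay rent to the lessor.",
    2: "Rent review clauses bind the lessee.",
    3: "The vendor must give notice.",
}
concordance = build_concordance(documents)
print(sorted(concordance))                        # the alphabetical word list
print(concordance["rent"])                        # where 'rent' occurs
print(search_all(concordance, "rent", "lessee"))  # documents containing both words
```

Note that the original texts are read only once, when the concordance is built; every later search touches only the word list and its recorded locations.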

There seem to be few explanations of concordances or the underlying operation of search engines available on the web. The following reference, although it is old (1988), does explain the essentials of concordance-based searching:

Graham Greenleaf, Andrew Mowbray and David Lewis 'Retrieval Techniques' (extract from Chapter 2 'Basic principles of legal information retrieval', Australasian Computerised Legal Information Handbook, Butterworths, 1988).

The following points should be noted:

The text retrieval system described in this book uses a relatively complex five-place concordance. Many text retrieval systems in use on the internet use far less complex concordances, sometimes only recording the number of a document in which a word is found, but not the location of the word within the document (much less the location of paragraphs or sentences).

Other references

Technical sources

3.2. Evaluation of text retrieval performance

How do you determine the effectiveness of a text retrieval system? Although Jon Bing says 'The science of information retrieval lacks a comprehensive theoretical foundation', a lot of experimental effort has been made, with 'recall' and 'precision' as the two main measurements of the quality of retrieval results.

3.2.1. Precision and recall

The effectiveness of full text retrieval is often measured in terms of precision and recall. The following table illustrates this. Assume that there are 100 documents in a collection being searched. The search retrieves 10 documents, only 4 of which (after inspection) are found to be relevant. However, after inspecting all the other 90 documents, we find that there are 2 relevant documents not found by the search.

                 Relevant           Not relevant           Total
Retrieved        Hits: a = 4        False drops: b = 6     a + b = 10
Not retrieved    Misses: c = 2      Dodged: d = 88         c + d = 90
Total            a + c = 6          b + d = 94             a + b + c + d = 100

Precision and recall table (derived from Miranda Pao - reference below)

Ideally, both precision and recall should be as close to 1 (ie 100%) as possible. We want to retrieve all relevant documents (perfect recall) and no irrelevant documents (perfect precision).
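
To make the two measures concrete: precision is the proportion of retrieved documents that are relevant, a / (a + b), and recall is the proportion of relevant documents that were retrieved, a / (a + c). The short Python sketch below simply applies these standard definitions to the figures in the table above; the function and variable names are chosen for this example only.

```python
def precision(hits, false_drops):
    """Proportion of retrieved documents that are relevant: a / (a + b)."""
    return hits / (hits + false_drops)

def recall(hits, misses):
    """Proportion of relevant documents that were retrieved: a / (a + c)."""
    return hits / (hits + misses)

a, b, c, d = 4, 6, 2, 88   # hits, false drops, misses, dodged (from the table)

print(f"precision = {precision(a, b):.2f}")   # 4 / 10
print(f"recall    = {recall(a, c):.2f}")      # 4 / 6
```

In this example the search has a precision of 0.4 (40%) and a recall of about 0.67 (67%). Note that d, the number of irrelevant documents correctly left alone, does not enter into either measure.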

See Precision and recall (in Greenleaf Mowbray and Lewis cited above)

For a lengthier discussion, see Dabney 'IV. Key Ideas in the Evaluation of Document Retrieval Systems' (in Dabney cited below)

3.2.2. Relationship between precision and recall - the retrieval problem

The principal problem in the science of information retrieval is caused by the simple fact that no information retrieval system delivers both perfect recall and perfect precision.

Many who have studied information retrieval have alleged that there is an inverse relationship between precision and recall: the more we do to improve precision, the more recall drops; the more we do to increase recall, the more precision drops. Among others, this is asserted by Pao (reference below, p12), and suggested by Blair and Maron (reference below). Summarising 30 years of information retrieval research, Pao says 'recall and precision are bound to lie within the 40 to 60 percent range', but adds that the inverse relationship is often noted but not proven.

This leads to two main issues:

3.2.3. `Recall devices' and `precision devices'

Bing classifies the search features of text retrieval systems as 'recall devices' or 'precision devices' (Jon Bing in an extract in LAWS 4609 Course Materials 1996 #3 p37 at the Law Reserve Desk) depending on whether their use enhances recall or precision. For example, truncation and thesauri enhance recall, as does the use of synonyms (and the OR connector), whereas use of the AND connector, and limiting a search to specified databases or parts of databases (eg headnotes), improves precision.
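
The effect of the two connectors can be sketched with Python sets, where OR is a union of the documents containing each term and AND is an intersection. The document numbers and the relevance judgements below are invented for illustration, and were chosen so that the tendency is easy to see; real searches will not behave this neatly.

```python
# Hypothetical postings: which documents contain each search term.
postings = {
    "lease": {1, 2, 3, 4},
    "tenancy": {3, 5, 6},
    "rent": {2, 3, 7},
}
relevant = {2, 3, 5}   # assumed relevance judgements, for illustration only

def evaluate(retrieved):
    """Return (precision, recall) of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant)
    return precision, recall

narrow = postings["lease"] & postings["rent"]      # lease AND rent
broad = postings["lease"] | postings["tenancy"]    # lease OR tenancy (a synonym)

print("AND:", sorted(narrow), evaluate(narrow))
print("OR: ", sorted(broad), evaluate(broad))
```

Here the AND search retrieves only relevant documents but misses one of them, while the OR search finds every relevant document at the cost of retrieving three irrelevant ones.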

3.2.4. Further references

The following materials are not available on the internet but provide valuable discussion:

3.3. Arguments about text retrieval effectiveness in law

3.3.1. Blair and Maron's experiment

D Blair and M E Maron 'An evaluation of retrieval effectiveness for a full-text document retrieval system' (1985) 28(3) Communications of the ACM 289 (see LAWS 4609 Course Materials 1996 #3 p38 at the Law Reserve Desk for an extract) is the most famous and controversial article in the field of text retrieval and law. This is simply because it is one of very few large-scale empirical studies in text retrieval, and just about the only one related to legal information. Blair and Maron attacked conventional wisdom by arguing that recall was much lower than the 75% recall that users of this system (lawyers) demanded and thought they were getting (because that is when they stopped searching). They were in fact only getting 20% recall, but with 75% precision. This gives some support for the alleged inverse relationship between precision and recall.

Some additional comments on the experiment:

It is possible that Blair & Maron's experiment is mainly significant in showing that Boolean retrieval or 'exactness' retrieval (as Bing calls it) without a relevance ranking method gives psychological encouragement to attempt to reduce large potentially relevant sets by inappropriate means. The use of relevance ranking to overcome the deficiencies of Boolean retrieval will be discussed in later parts of this topic.

Additional references

3.3.2. What retrieval results do lawyers require?

There is considerable dispute, prompted to a large extent by Blair and Maron's experiment, over what lawyers actually want and demand from computerised retrieval: do they prefer to maximise recall or precision? Dabney and Burson have contributed to this debate.

Daniel Dabney 'The Curse of Thamus' - see 'Ramifications for the Users of CALR Systems' in 'The Curse of Thamus: An Analysis of Full-Text Legal Document Retrieval' (1986) 78(5) Law Library Journal 5-40 (was on Yale Law School web site - No longer on the web)

Some notes on Dabney's approach (GG):

Scott F Burson 'A reconstruction of Thamus: comments on the evaluation of legal information retrieval systems' (1987) 79(134) Law Library Journal (see LAWS 4609 Course Materials 1996 #3 p94 at the Law Reserve Desk)

Some notes on Burson's approach (GG):

As a result of these discussions, it is clear that the supposed inverse relationship between precision and recall poses a very difficult dilemma for the use of text retrieval systems in law. It may well be that the use of relevance ranking systems, either in conjunction with boolean retrieval or separately from it, provides the way out of this dilemma. This will be discussed in the subsequent readings on legal research via the internet, where the greatest use of relevance ranking systems has been made.

Further references

3.4. Relevance ranking

One of the most important innovations in text retrieval, which can be used both as an alternative and as an enhancement to boolean retrieval, is relevance ranking. Relevance ranking systems are also sometimes called 'best match' or 'probability models' (Turtle) or a 'statistical interface' (Feldman). Relevance ranking is also discussed in Part 5 of this Reading Guide in the context of internet legal research, where it has had the most effect.

3.4.1. Essential features - inverse document frequency and within document frequency

The common feature of the various approaches to relevance ranking is the attempt to rank documents by the probability that they will be relevant to the query, such that the most relevant document is first in the list of retrieved documents, the next most relevant is displayed second, and so on.

The simplest relevance ranking system would be to count the number of times each search term occurred in a document, calculate the sum of all search terms occurring in the document, and list the document with the highest total as 'most relevant'. This would be a very crude measure.
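
A minimal sketch of this crude measure, using invented documents and a hypothetical crude_score function, might look like this in Python:

```python
import re

def crude_score(text, search_terms):
    """Total number of occurrences of all search terms in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(words.count(term.lower()) for term in search_terms)

# Hypothetical documents, invented for illustration.
documents = {
    "A": "Copyright in software. Software copyright is infringed by copying software.",
    "B": "The contract mentions software once.",
    "C": "Copyright and copyright licences.",
}
terms = ["copyright", "software"]

ranking = sorted(documents,
                 key=lambda doc_id: crude_score(documents[doc_id], terms),
                 reverse=True)
for doc_id in ranking:
    print(doc_id, crude_score(documents[doc_id], terms))
```

The measure is crude because it takes no account of how long each document is or how common each search term is across the whole collection, so long documents and very common words tend to dominate the ranking.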

Howard Turtle 'Text retrieval in the legal world' Parts 5.5 and 5.8 in Artificial Intelligence & Law, (1995) Vol 3 Nos 1-2 discusses the basic elements of relevance ranking systems.

The main question is how to calculate the 'significance' of a particular occurrence of a word in a document, as a means of assessing the relative likely relevance of the document in which it occurs. The second question is how to compound the measures of significance of each occurrence of each search term so as to give an overall measure of significance of a document.

Turtle says that two measures of word occurrence significance are used in most relevance ranking systems:

(i) Inverse document frequency = a measure that falls as a term becomes more common across the whole database: the inverse of the number of occurrences of the term in the whole database divided by the number of words in the whole database. This means (very roughly) that terms which occur in relatively fewer documents in the whole collection are given greater weight (also called the 'discrimination value' of the term).

(ii) Within document frequency = the number of occurrences of a term in a document divided by the number of words in the whole document. Therefore (again in rough terms) a word that occurs a lot in a short document gets a high score on this criterion.

One approach is then that the weighting to be given to an occurrence of a term in a particular document is the product of these measures for that term. As a result, for example, a term that occurs a lot in a short document (and so has high within document frequency), but doesn't occur very often in the whole database (and so has a high inverse document frequency) will get a very high overall weighting.

Finally, each occurrence of a search term in a document is multiplied by its appropriate weighting. The measure of relevance of the whole document would then be the sum of the weights of each occurrence of each search term in the document. So, a document which has many occurrences of search terms which have high overall weightings will be regarded as a document high in relevance.
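
The weighting scheme described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the documents are invented, the function names are hypothetical, and the inverse document frequency is computed with one widely used formulation (the logarithm of the number of documents divided by the number of documents containing the term), which is not necessarily the formula used by Turtle or by any particular search engine.

```python
import math
import re

def tokenise(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(documents, search_terms):
    """Return (document id, score) pairs, highest score first."""
    tokens = {doc_id: tokenise(text) for doc_id, text in documents.items()}
    n_docs = len(tokens)
    scores = {}
    for doc_id, words in tokens.items():
        score = 0.0
        for term in search_terms:
            term = term.lower()
            count = words.count(term)
            if count == 0:
                continue
            docs_with_term = sum(1 for other in tokens.values() if term in other)
            # Assumed formulation: idf = log(N / documents containing the term).
            idf = math.log(n_docs / docs_with_term)
            wdf = count / len(words)     # within document frequency
            weight = wdf * idf           # product of the two measures
            score += count * weight      # each occurrence carries that weight
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical documents, invented for illustration.
documents = {
    "A": "estoppel estoppel contract",
    "B": "contract contract contract contract law and more law of contract",
    "C": "contract law",
}
results = rank(documents, ["estoppel", "contract"])
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this example 'contract' occurs in every document, so under the assumed formula it receives a weighting of zero and contributes nothing, however often it appears in document B. Document A is ranked first because 'estoppel' is rare in the collection and frequent within a short document.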

There are many measures of relevance implemented in search engines more complex than this simple one.

3.4.2. Example in an internet system - AustLII's relevance ranking

AustLII implements relevance ranking in its SINO search engine in two ways (selected from the 'Find' setting), as both an alternative to boolean searching and as an enhancement to it. The availability of relevance ranking can alter a user's search strategy using boolean retrieval. It is often wise to use a broad boolean search in order to maximise recall, relying upon the relevance ranking to supply the precision that the boolean search lacks.

There is a brief discussion of how popular internet search engines rank web pages in How Search Engines Rank Web Pages (on Search Engine Watch).

3.4.3. Effect of relevance ranking on evaluation of search results

If a search engine uses relevance ranking in any way, it is no longer possible to evaluate its results using simple measures of recall and precision. For example, if the first 50 documents retrieved by a relevance ranked search are all or most of the relevant documents in a database, it doesn't much matter that the search retrieves 100 documents of which the last 50 are not very relevant, because the user can just keep working down the list until the degree of relevance is no longer high enough to warrant reading further items.

Turtle (see 3.2 in his article) suggests methods of constructing a precision / recall curve, which shows precision values at pre-defined recall points (eg from 0.1 to 1.0 of all relevant documents). One way of thinking about such a curve is that it measures such things as: 'if a user stops browsing when precision drops below 50% (ie only one in two documents is regarded as relevant), does this occur when recall of only 30% of relevant documents has occurred, or not until 90% of relevant documents have been recalled?'
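
One simple way of computing the points on such a curve is sketched below in Python. It is an illustration only, not Turtle's method: the ranked result list is invented, the function name is hypothetical, and precision is taken at the first rank where each recall point is reached, without the interpolation that some evaluation methods apply.

```python
def precision_at_recall_points(ranked_relevance, total_relevant, recall_points):
    """Map each recall point to the precision at the first rank reaching it.

    ranked_relevance: list of booleans, True where the document at that
    position in the ranked list is relevant. A recall point that is never
    reached maps to None.
    """
    curve = {}
    for point in recall_points:
        curve[point] = None
        hits = 0
        for rank, is_relevant in enumerate(ranked_relevance, start=1):
            hits += is_relevant
            if hits / total_relevant >= point:
                curve[point] = hits / rank
                break
    return curve

# Hypothetical ranked list of 10 retrieved documents; the whole database is
# assumed to contain 5 relevant documents.
ranked = [True, True, False, True, False, False, True, False, False, True]
curve = precision_at_recall_points(ranked, total_relevant=5,
                                   recall_points=[0.2, 0.4, 0.6, 0.8, 1.0])
for point, value in curve.items():
    print(f"recall {point:.1f}: precision {value:.2f}")
```

In this invented example precision is still 75% when 60% of the relevant documents have been seen, and falls to 50% only when the last relevant document is reached, so a user who stops browsing once precision drops below 50% would have seen all of the relevant documents.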

3.4.4. Examples in pre-internet systems - WestLaw's WIN and Lexis 'Freestyle'

Westlaw's WIN and Lexis' 'Freestyle' were two pre-internet examples of so-called 'natural language' searching, which were early applications of relevance ranking to legal documents. In the LAWS 4609 Course Materials #3 at the UNSW Law Reserve Desk there are extracts from search manuals - WestLaw 'Retrieving documents with WIN - WestLaw is natural' (Chapter 5 of Introduction to WestLaw (1993), WestLaw) at p105 and Lexis 'Freestyle' searching (Information from Lexis help screens) at p110.

For discussion of evaluations, see:

3.5. Integration of hypertext and text retrieval

The purpose of these extracts is to act as an introduction to how hypertext and text retrieval may be combined in unusual ways to produce a result where 'the sum is greater than the parts'. In section 5 following, dealing with legal research on the internet, there is a similar theme in how it is necessary to use 'intellectual indexing' (essentially structures created using hypertext) in addition to full text retrieval, in order to get the best results.

3.5.1. The 'DataLex' approach to integrating hypertext and text retrieval

The main reading for this section is extracts from Graham Greenleaf, Andrew Mowbray and Peter van Dijk 'Representing and using legal knowledge in integrated decision support systems: DataLex WorkStations' Artificial Intelligence and Law, Kluwer, Vol 3, Nos 1-2, 1995, 97-124. This pre-AustLII paper outlines an approach to the integration of hypertext and text retrieval in legal applications, much but not all of which has subsequently been implemented on AustLII. In a later reading guide, the integration of both these technologies with inferencing systems is discussed.

It is only necessary to read the parts listed below at this stage.

3.5.2. Relative costs of different technologies

Paquin, Blanchard, and Thomasset 'Loge-expert: From a legal expert system to an information system for non-lawyers' Proceedings of the 4th International Conference on Artificial Intelligence and Law, ACM Press 1991, p254 (copy available at the UNSW Law Reserve Desk in LAWS 4609 Materials #3 (1996) Theories of computerising law: hypertext and text retrieval at p20 - ignore the expert systems aspects for the moment).

This article contains one of the few attempts to compare the benefits and costs of using text retrieval, hypertext and expert systems to computerise legal information.

Diagram of relative costs and effectiveness of different forms of computerisation of law (following Paquin, Blanchard, and Thomasset 1991)

3.5.3. Additional reading

3.6. Other approaches to text retrieval in law

There are many enhancements to basic boolean ('exact match') retrieval systems, and quite a few alternative approaches to boolean retrieval. Some have been tested for legal information retrieval.

Two surveys of the various approaches that have been taken to text retrieval in law are:

3.6.1. Howard Turtle's survey - 'Text retrieval in the legal world'

Howard Turtle 'Text retrieval in the legal world' Artificial Intelligence & Law, (1995) Vol 3 Nos 1-2 (see LAWS 4609 Course Materials #3 at the UNSW Law Reserve Desk) is one of the most systematic surveys of pre-internet approaches to legal information retrieval. It is often quite technical but is recommended highly. Howard Turtle played an important role in developing the WIN (WestLaw is Natural) relevance ranking retrieval system for WestLaw.

3.6.2. Erich Schweighofer's survey - 'The Revolution in Legal Information Retrieval'

Erich Schweighofer 'The Revolution in Legal Information Retrieval or: The Empire Strikes Back' - 1999 (1) The Journal of Information, Law and Technology (JILT). Schweighofer's paper is for the most part less technical than Turtle's. Among the many projects summarised and discussed, Schweighofer includes summaries of:

