7. Desirable features of a new approach
These factors support the approach taken by AustLII, which has been given its
most extensive testing to date in the development of the Project DIAL prototype.
This approach to reducing the problems of legal research on the internet rests
on these propositions:
- An intellectual index is essential to identify high value law sites and
legal resources, but cannot and should not aim to be comprehensive,
particularly in its depth of indexing particular sites once they have been
identified.
- Web spider indexing of remote law sites, and a sufficiently powerful
search engine, are necessary to provide the depth of search capacity that
intellectual indexing cannot provide.
- This is particularly so when many significant law sites do not have search
engines at all, and where there is no consistency among the search engines used
by those that do.
- Searching robot-indexed sites will work much better if (i) only law sites
are indexed (to remove non-legal `noise' and improve precision); and (ii) such
sites are indexed comprehensively (to improve recall). We call a web spider
and search engine dedicated solely to legal materials a `targeted' web
spider.
- Significant law sites which normally exclude robots may allow a `targeted'
law-oriented web spider to index them, by request. The number of requests may
be manageable.
- A comprehensive intellectual index is needed to identify the law sites
worth indexing, and therefore to `target' the robot. The intellectual index
therefore serves the double function of a useful resource in itself, and the
essential means of `feeding' the search engine.
- Once a law-oriented web spider has created a searchable index of key law
sites, specific searches over that index for various types of subject matter
can be `embedded' in the intellectual index, thereby making the intellectual
index `self-updating' to a certain extent, and so reducing its maintenance
costs. Such `embedded searches' also cater for inexperienced users who have
difficulty in formulating searches.
The key to effective legal research on
the internet may therefore be a tight integration of an intellectual index and
a search engine based on a web spider, a symbiotic relationship in which each
builds on the features provided by the other[30].
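The relationship between the two components can be illustrated with a minimal sketch. It is not AustLII's implementation: the `IndexEntry` structure, its field names, the `example` hosts and the one-word stored query are all assumptions made for illustration. It shows index entries `targeting' the spider at selected sites, and an entry whose links come from a stored search rather than a fixed URL.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class IndexEntry:
    """One hand-made entry in the intellectual index (hypothetical structure)."""
    title: str
    url: str = ""
    target_spider: bool = False  # an editor has marked this site for full indexing
    embedded_search: str = ""    # stored query to run over the spider's index


def targeted_hosts(entries):
    """Hosts the spider may crawl: only sites that editors have marked."""
    return {urlparse(e.url).netloc.lower() for e in entries if e.target_spider}


def in_scope(url, hosts):
    """True if a discovered link stays inside a targeted law site."""
    return urlparse(url).netloc.lower() in hosts


def resolve(entry, run_search):
    """Links shown for an entry: its fixed URL, or the live results of its stored search."""
    if entry.embedded_search:
        return run_search(entry.embedded_search)
    return [entry.url]


entries = [
    IndexEntry("Example law site", "http://law.example.org/", target_spider=True),
    IndexEntry("Example court", "http://court.example.net/cases/", target_spider=True),
    IndexEntry("General news", "http://news.example.com/"),
    IndexEntry("Privacy law", embedded_search="privacy"),
]

# The intellectual index decides where the spider goes.
hosts = targeted_hosts(entries)
found = ["http://law.example.org/acts/1988.html", "http://news.example.com/sport.html"]
crawl_queue = [u for u in found if in_scope(u, hosts)]

# A toy stand-in for the spider's searchable index (invented data).
spider_index = {"privacy": ["http://law.example.org/acts/privacy.html"]}
privacy_links = resolve(entries[3], lambda q: spider_index.get(q, []))
```

When the spider later indexes a new privacy page on a targeted site, the results for the `Privacy law' entry change without anyone editing the entry, which is the sense in which embedded searches make the index partly self-updating.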
AustLII personnel[31] have developed the
following software which has been used to implement this approach:
- internet indexing software (`Feathers') which allows remote updating by
multiple contributors to an index, full search facilities over the index
entries, and a facility to `target' a robot to fully index specific sites
identified in the index;
- a robot or `web spider' (called `Gromit'), and a `harness' or means of
controlling it (called `Wallace');
- a search engine (SINO), which has the full range of boolean and proximity
search commands, optional relevance ranking of search results, and a facility
for limiting the scope of searches to specific databases or collections of
databases; a new interface to SINO (`Shaun') has been developed for Project
DIAL[32].
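The kinds of search facility listed for the search engine can be illustrated with a toy example. This is a sketch under stated assumptions, not SINO's query language or implementation: the function names, the collection names and the sample texts are all invented. It shows a boolean AND search, a proximity search (two words within a given number of words of each other), and a search whose scope is limited to named collections.

```python
import re

# Toy collections standing in for separately searchable databases (invented data).
DOCS = {
    "cases": {
        "c1": "the court held that the privacy of the plaintiff was breached",
        "c2": "copyright in the database was held to subsist",
    },
    "legislation": {
        "l1": "an act to protect the privacy of individuals and for related purposes",
    },
}


def positions(text):
    """Map each word to the list of positions at which it occurs."""
    pos = {}
    for i, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        pos.setdefault(word, []).append(i)
    return pos


def scan(matches, scope=None):
    """Return ids of documents satisfying `matches`, limited to `scope` if given."""
    hits = []
    for collection in (scope or DOCS):
        for doc_id, text in DOCS[collection].items():
            if matches(positions(text)):
                hits.append(doc_id)
    return hits


def search_and(a, b, scope=None):
    """Boolean AND: both words occur somewhere in the document."""
    return scan(lambda p: a in p and b in p, scope)


def search_near(a, b, distance, scope=None):
    """Proximity: some occurrence of `a` lies within `distance` words of `b`."""
    return scan(
        lambda p: any(abs(i - j) <= distance
                      for i in p.get(a, []) for j in p.get(b, [])),
        scope,
    )


everywhere = search_and("privacy", "the")                       # all collections
statutes_only = search_and("privacy", "the", scope=["legislation"])
close_together = search_near("court", "privacy", 4)
```

Limiting the scope to the hypothetical `legislation' collection drops the case-law hit, and tightening the proximity distance from 4 to 3 excludes the document in which `court' and `privacy' are four words apart.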
Project DIAL provides the first opportunity for extensive testing of the
targeted web spider. It will soon play a significant role in other aspects of
AustLII's future developments, in relation both to Australian legal materials
and to international materials such as indigenous law materials and a world
library of case law. Some of the research on internet law indexing is supported
by Australian Research Council grants.
A history of the development of this approach, and of the early stages of
Project DIAL, can be found in `Future-proofing a global internet index by a
targeted web spider and embedded searches'[33].
The rest of this paper provides examples of the more interesting aspects of the
project to date.
[30] Aspects of such an approach, in a pre-Internet context, are explored in
Greenleaf G, Mowbray A and van Dijk P (1995) 'Representing and using legal
knowledge in integrated decision support systems - DataLex WorkStations'
Artificial Intelligence and Law, Kluwer, Vol 3, Nos 1-2, 97-124.
[31] In particular, Geoffrey King, Daniel Austin and Andrew Mowbray.
[32] The names `Wendolyn' and `Wensleydale' have been reserved for future
software developed for this project. For further details see
http://www.wallaceandgromit.com/
[33] Australian Society of Indexers Annual Conference 'The Futureproof
Indexer', 27-28 September 1997, Katoomba, Australia:
http://www2.austlii.edu.au/~graham/Futureproof/indexers.html