8. World Law Index - the intellectual index

World Law Index, the intellectual index or catalogue aspect of World Law, has a `Yahoo-like' interface of browsable categories and entries. Both categories and individual entries can be searched. Editing is done through a web interface, so contributing editors can work from anywhere with web access.
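A Yahoo-like index can be thought of as a tree of categories, each holding entries, and searching it means matching a term against both category names and entry titles. The Python sketch below illustrates that idea only. The `Category` class, the `search` function and the sample data are hypothetical and are not the World Law Index's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical Yahoo-like category tree."""
    name: str
    entries: list[str] = field(default_factory=list)  # titles of linked sites
    children: list["Category"] = field(default_factory=list)


def search(root: Category, term: str) -> tuple[list[str], list[str]]:
    """Return (matching category paths, matching entry titles), case-insensitively."""
    term = term.lower()
    cat_hits: list[str] = []
    entry_hits: list[str] = []
    # Depth-first walk, carrying the path so category hits read like breadcrumbs.
    stack = [(root, root.name)]
    while stack:
        node, path = stack.pop()
        if term in node.name.lower():
            cat_hits.append(path)
        entry_hits.extend(e for e in node.entries if term in e.lower())
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(node.children):
            stack.append((child, f"{path} > {child.name}"))
    return cat_hits, entry_hits


# Hypothetical sample index.
index = Category("World Law", children=[
    Category("Human Rights", entries=["Human Rights Library"]),
    Category("Trade Law", entries=["WTO dispute reports", "Human rights and trade"]),
])
print(search(index, "human"))
# (['World Law > Human Rights'], ['Human Rights Library', 'Human rights and trade'])
```

Returning category hits separately from entry hits mirrors the distinction the index draws between searching categories and searching individual entries.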

Automated translation of pages

Every page in the index has a [Translate] button that takes the user to Alta Vista's automated translation service (which uses Systran translation software) and ensures that the Systran page already contains the correct URL of the DIAL page the user was just viewing (in the example below, the DIAL Index page). The user then only has to select the language into which the DIAL page is to be translated and press the `Translate' button, and is returned to the DIAL page translated into the language of choice.

The resulting translation seems adequate to convey the meaning of most of the items on the page.

The DIAL Index page translated automatically to French by Systran
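The key step behind the [Translate] button is passing the current page's own URL to the translation service, so that the service can fetch the page and the user only needs to pick a language. The sketch below shows that step. The endpoint `http://translator.example/translate` and the `url` parameter name are hypothetical placeholders, since the actual Alta Vista/Systran form fields are not given here.

```python
import html
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, standing in for the real
# Alta Vista/Systran translation form.
TRANSLATOR_URL = "http://translator.example/translate"


def translate_link(page_url: str) -> str:
    """Build the target of a [Translate] button for the page at page_url.

    The user still chooses the target language on the translator's form,
    so only the address of the current page is passed along.
    """
    return TRANSLATOR_URL + "?" + urlencode({"url": page_url})


def translate_button(page_url: str) -> str:
    """Return the HTML anchor to place in the page's button bar."""
    href = html.escape(translate_link(page_url), quote=True)
    return f'<a href="{href}">[Translate]</a>'


print(translate_link("http://www.austlii.edu.au/links/new.html"))
# http://translator.example/translate?url=http%3A%2F%2Fwww.austlii.edu.au%2Flinks%2Fnew.html
```

Because each page generates the link from its own address, the same button code works on every page of the index without per-page configuration.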

The Alta Vista/Systran translation facility is at present limited to translation from English into French, Spanish, Portuguese, Italian or German, and from those languages into English. It is also only a prototype, and sometimes lacks the processing power to translate very long pages. Nor is it recommended for documents with complex grammar, or where accuracy is vital (such as legislation). However, for pages such as menus or lists of search results it is usually extremely helpful. For a world index the result is revolutionary: instead of being an `English only' facility, it is now effectively available in six of the most widely used European languages.
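The supported pairs form a hub around English, which is why an English-language index becomes readable in six languages while, for example, French to German is not a listed pair. A small sketch of that structure follows; the ISO 639-1 codes and the function names are illustrative assumptions, not part of the translation service.

```python
# The five partner languages listed above, as ISO 639-1 codes.
PARTNER_LANGUAGES = {"fr", "es", "pt", "it", "de"}

# English into each partner language, and each partner language into English.
SUPPORTED_PAIRS = {("en", lang) for lang in PARTNER_LANGUAGES} | {
    (lang, "en") for lang in PARTNER_LANGUAGES
}


def is_supported(source: str, target: str) -> bool:
    """True if the service offers a direct translation from source to target."""
    return (source, target) in SUPPORTED_PAIRS


def languages_available(source: str = "en") -> set[str]:
    """Every language a page written in `source` can be read in, itself included."""
    return {source} | {t for (s, t) in SUPPORTED_PAIRS if s == source}


print(len(languages_available("en")))  # 6
print(is_supported("fr", "de"))        # False
```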

Targeting the web spider from the index

Editing an entry in the index also involves the editor deciding whether to send the Gromit web spider to index every word on the site being `targeted'. The harness program (Wallace) reads the list of instructions from the web indexing software and then sends off multiple instances of the web spider, each to download the content of a particular web site. The harness ensures that only one instance of the spider is ever downloading from a given site, so that the site is not saturated with spider requests and access is not denied to other users. In this way the spider is `well behaved', with minimum impact on the sites from which it downloads pages. We call this a targeted web spider because it is not designed to traverse the web generally: its downloading is limited to the site given in the URL specified when it is invoked.
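The harness rule of at most one spider per site, with different sites crawled in parallel, can be met by grouping targets by host and giving each group to a single worker. The sketch below takes that approach. It is an illustration, not Wallace's actual design: `run_harness` and the `spider` callable are hypothetical names, and a real spider would download and index the site rather than return a string.

```python
from collections import defaultdict
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


def run_harness(targets: list[str], spider: Callable[[str], T],
                max_workers: int = 4) -> list[T]:
    """Run spider(url) for every target, never two at once on the same host.

    Targets are grouped by host; each group runs sequentially inside one
    worker thread, while groups for different hosts run in parallel.
    """
    by_host: dict[str, list[str]] = defaultdict(list)
    for url in targets:
        by_host[urlsplit(url).hostname].append(url)

    def crawl_host(urls: list[str]) -> list[T]:
        # One spider at a time for this host, so the site is not saturated.
        return [spider(u) for u in urls]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        grouped = pool.map(crawl_host, by_host.values())
        # Flatten per-host results, keeping hosts in first-seen order.
        return [result for group in grouped for result in group]


# Hypothetical targets and a stand-in spider.
targets = ["http://a.example/law/", "http://b.example/cases/", "http://a.example/acts/"]
print(run_harness(targets, spider=lambda url: f"indexed {url}"))
# ['indexed http://a.example/law/', 'indexed http://a.example/acts/', 'indexed http://b.example/cases/']
```

Grouping up front avoids per-host locks entirely; the trade-off is that one very large site occupies a single worker for its whole crawl.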

Targeting the web spider is a complex task. The spider indexes every page to which its starting page is directly or indirectly linked, provided that page is equal to or below the start page in the server's file hierarchy, so the start page must be chosen so that this indexes all and only the desired pages. Some desired sets of data cannot be indexed because of the `noise' they would bring with them. For others, it is impossible to find an appropriately located `table of contents' page to use as the `start page'. Other `problems of targeting' have also been identified [40] (http://www2.austlii.edu.au/~graham/Futureproof/indexers-6.html#Heading22).
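The targeting rule combines two tests: a page must be reachable by links from the start page, and it must sit on the same server at or below the start page's directory. The sketch below expresses both, using an in-memory link map in place of fetching and parsing pages. The function names, the URLs and the choice not to follow links through out-of-scope pages are assumptions for illustration, not Gromit's documented behaviour.

```python
from collections import deque
from urllib.parse import urlsplit


def in_scope(start_url: str, url: str) -> bool:
    """True if url is on the start page's server, at or below its directory."""
    start, cand = urlsplit(start_url), urlsplit(url)
    # Directory of the start page, always ending in "/": "/au/legis/index.html" -> "/au/legis/".
    base = start.path.rsplit("/", 1)[0] + "/"
    return cand.netloc == start.netloc and cand.path.startswith(base)


def targeted_crawl(start_url: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from start_url, following only in-scope links.

    `links` maps each page to the URLs it links to (a stand-in for fetching
    and parsing each page). Out-of-scope pages are neither indexed nor followed.
    """
    seen = {start_url}
    order: list[str] = []
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen and in_scope(start_url, target):
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site: one link leaves the directory, one leaves the server.
site = {
    "http://www.example.org/au/legis/index.html": [
        "http://www.example.org/au/legis/a.html",
        "http://www.example.org/au/cases/x.html",
        "http://other.example/au/legis/b.html",
    ],
    "http://www.example.org/au/legis/a.html": [
        "http://www.example.org/au/legis/sub/b.html",
        "http://www.example.org/au/legis/index.html",
    ],
}
print(targeted_crawl("http://www.example.org/au/legis/index.html", site))
# ['http://www.example.org/au/legis/index.html',
#  'http://www.example.org/au/legis/a.html',
#  'http://www.example.org/au/legis/sub/b.html']
```

The sketch also shows why choosing the start page matters: pick one directory too high and unrelated pages (`noise') fall within scope; pick one too low and desired pages fall outside it.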

The `New Additions' page

Users can see at any time what content has recently been added to World Law Index (and to World Law Search) by checking the `New Additions' page, reached from the World Law home page or from the [New] button on the button bars throughout the system. In the page shown below, a marker beside an entry means that Gromit has been sent to index that site.

The `New Additions' page - http://www.austlii.edu.au/links/new.html
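A New Additions listing amounts to filtering entries by the date they were added and flagging those that Gromit has been sent to index. The sketch below shows one way to produce such a listing. The `Entry` record, the 14-day window and the `*` marker are hypothetical choices, not details of the actual page.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    """A hypothetical index entry record."""
    title: str
    added: date
    spidered: bool  # True if Gromit has been sent to index the site


def new_additions(entries: list[Entry], today: date, days: int = 14) -> list[Entry]:
    """Entries added within the last `days` days, newest first."""
    cutoff = today - timedelta(days=days)
    recent = [e for e in entries if e.added > cutoff]
    return sorted(recent, key=lambda e: e.added, reverse=True)


def render(entries: list[Entry]) -> str:
    """Plain-text listing; a trailing '*' flags sites Gromit has been sent to index."""
    return "\n".join(
        f"{e.added.isoformat()}  {e.title}{' *' if e.spidered else ''}"
        for e in entries
    )


entries = [
    Entry("Site A", date(1999, 3, 1), True),
    Entry("Site B", date(1999, 2, 1), False),
    Entry("Site C", date(1999, 3, 10), False),
]
print(render(new_additions(entries, today=date(1999, 3, 12))))
# 1999-03-10  Site C
# 1999-03-01  Site A *
```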