University of New South Wales, Faculty of Law - LAWS4610 Information Technology Law


Cyberspace Law 1997


G Greenleaf 'Privacy and cyberspace: An ambiguous relationship' 3 PLPR 88

Graham Greenleaf

This article was published in Privacy Law & Policy Reporter (Prospect Publishing), 3 PLPR 88, August 1996.

This is the first part of a series of articles (Pt I of `Privacy - lost in cyberspace?') surveying the privacy implications of the internet for users and service providers, starting with some general considerations.


Introduction

It has been said with appropriate irony that `in cyberspace, everyone will be anonymous for 15 minutes'[1]. Cyberspace presents both an unexpected opportunity for private (and even anonymous) communications and transactions over distance, and the potential for a panopticon, surveillance more extensive than any previous form of social control.

This paper surveys some of the main elements of this relationship, starting with a look at some apparently new issues raised by the internet, and then by examining how existing privacy laws, particularly those dealing with interception and with privacy Principles, deal with cyberspace issues.

Throughout the paper, the term `internet service provider' or `ISP' is used in a rather loose fashion, to encompass a variety of functionally distinct (though sometimes overlapping) parties such as internet access providers, content servers, content creators, and even carriers. As a general principle, liabilities should only fall on those with appropriate functional responsibilities, but this paper does not seek to draw out fine distinctions on that point.

The inevitability of life in cyberspace

Whatever the 21st century reveals about the increasingly problematic status of life in outer space, there is no doubt that it will see life teeming in cyberspace. We need to start with a sketch, from a privacy perspective, of some key elements of cyberspace in order to understand its importance to the future of privacy (if there is one).

The pervasive network

`Cyberspace' is at least the internet and the non-TCP/IP networks connected to it, or its successor as the global information infrastructure, whatever that may be[2]. Irrespective of their level of computer literacy, education, interest or consent, everyone in at least the advanced industrial economies will spend a significant portion of time `in' cyberspace by the early years of the 21st century. People may not always realise that what they are doing is `on the internet', but the reality appears from a few simple factors. Transactions with business and government will much more commonly take place via information systems that are connected to the internet. Many tools that people use in their work - inventory control systems, medical diagnostic equipment and the like - will be connected to the internet as a means of distributing data to remote parts of an organisation. These tools will tend to require information about who their user is, for security and accountability purposes. We will communicate with others in various public, semi-public and private ways via cyberspace, and obtain some portion of our entertainment from it.

The consequence is simply that vast quantities of personal information about all of us will be collected via a pervasive, world-wide network (and stored on machines connected to it), whether we know or care - an event new in world history. The accessibility or interconnectedness of this information is contingent on many factors - including custom, public opinion and law - but is unlikely to be contingent on any serious technical considerations. Because the information will have been collected by processes related to one pervasive network, any impediments to it being found, published, or related to other data elsewhere on the internet are easily removed if those who control the information wish to remove them.

The past's great protectors of privacy - cost, distance, incompatibility, undiscoverability and the like - are all disappearing in the face of the internet and its protocols, the great equalisers of the 21st century.

The digital persona

So we will all have a digital persona, which Roger Clarke describes as 'a model of an individual's public personality based on data and maintained by transactions, and intended for use as a proxy for the individual'[3] - a representation in cyberspace of who and what we are. He makes a useful distinction between the `passive digital persona', the cumulation of details of our transactions and communications that are discoverable on the internet (our snail tracks), and the `active digital persona', the computerised `agents' of various types that actively affect what information the user receives or discloses (ranging from filters rejecting or classifying or replying to incoming mail, to 'knowbots' regularly trawling for information that the user wants).

We also need to distinguish between those parts of a person's digital persona which are in `public' spaces in the sense of being able to be found by internet search engines or other means, and those parts which are in non-public spaces, either `proprietary' (the databases of a government or company) or `personal' (information found only on the networked computers of the person the subject of the information, or those that person has provided it to, such as by e-mail).

An important point is that those who hold parts of our digital persona in proprietary (or `closed') systems can easily cumulate that information with our `public' digital persona, as well as combining it with that held in other proprietary systems to which they have access. From the cumulative effect of our digital personae, others will draw inferences about our personalities, behaviour etc. The extent to which we will be able (technically and/or legally) to have multiple digital personae will be an important privacy issue.

Identification - the cyberspace / meatspace interface

We only exist virtually in cyberspace - the digital persona is only a representation of the physical person that, as John Perry Barlow puts it, exists in `meatspace'. Identification occurs at the cyberspace / meatspace interface.

A well known feature of cyberspace has been that it has often been relatively easy to impersonate someone. Recognising individuals over distance and time without recourse to human memory has always been a key organisational challenge to bureaucracies[4]. Tokens, knowledge and biometrics, or combinations of these, provide the links between the physical person and the file. Identification in cyberspace intensifies the challenge because it removes any physical settings or proximity which assist identification, and it often requires real-time responses. The reliability of electronic commerce, or e-mail and other internet transactions, or the believability of a person's digital persona, depends to a very large extent on the continuing reliability of links between the virtual and physical person.

Biometric identifiers entered directly into networked devices will in the longer run provide a main means of identification. In the more immediate future, smart cards are likely to provide one of the main bridges between physical and virtual identity. They have many potential advantages because they can include in the one token (i) digital representations of value (e-cash or credit); (ii) digital signatures (to provide authentication of messages transmitted); and (iii) digital biometric identifiers (to guarantee security / access to networks). Their portability means they can be the link between mobile people and pervasive networks.

`New' privacy problems in cyberspace

Cyberspace is bringing forward new privacy issues that were not directly anticipated when sets of Privacy principles were first formulated. Here are a few examples that have been topical during 1996.

Search engines, robots and internet indexes

One of the most difficult privacy problems of the internet is the power of search engines and indexing facilities. One of the main protectors of privacy on the internet, as elsewhere, was inefficiency - that it was very difficult to find anything unless someone told you where it was[5]. This changed somewhat with very extensive indexes of internet sites like Yahoo[6], but has gone forever with the release in December 1995 of DEC's Alta Vista search engine[7], and with the subsequent proliferation of e-mail, telephone, address and Usenet directories.

Web pages and Usenet news posts

John Hilvert explains the travails of one user of the Alta Vista search engine[8]:

When Internet user, Ed Chilton heard about the hot new search engine, Alta Vista, from Digital Equipment Corporation (DEC), he had to try it out. Alta Vista was introduced as a free service back in December last year to show-case DEC's ability to handle the Internet, no matter how it scaled. Using high end DEC Alpha systems and sophisticated software, Alta Vista gobbles and disgorges in a very accessible way, the entire catalogue of some 22 million web pages (11 billion words) and about the last two months of the content of 13,000 news groups. It handles 5 million search requests a day.

Impressed with Alta Vista's remarkable speed, Chilton tried Alta Vista on the news groups and was sickened. ''What I found with the newsgroups, using my name or email address as search parameters, was a copy of almost every post I've made to Usenet newsgroups since the first week in January,'' he wrote on 6 March. ''That includes my posts to these two newsgroups, and all rejoinders from anyone here who included my name in his or her reply. Make out of that what you wish. My reaction to it is somewhere between disgust and fury.''

Chilton said it was an important feature of newsgroups that users get to know each other's themes, axes to grind, and pet peeves. ''What I do not expect is that the newsgroup clubhouse is bugged, and that what is said there, by any of us, will be recorded and made available to any person on the Internet, for whatever reason persons might have.'' Chilton said DEC's Usenet search engine should be banned and its developers publicly brought to their knees.

The irony of all this is: I came across Chilton's privacy lament using the Alta Vista search engine.

Alta Vista uses robots (also known as spiders or webcrawlers)[9] to trawl the internet, creating complete word occurrence indexes of every web page and every item posted to every News group that it is allowed to access. As a result it is now possible to search for any occurrence of a name or phrase occurring anywhere in the text of any web page, or in any News posting.
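To make the mechanism concrete, the following is a minimal sketch (in Python, using hypothetical example.org addresses) of the kind of word occurrence index a robot builds: every word encountered on a fetched page is mapped back to the pages on which it appears, so that a later query for a name or e-mail address reduces to a simple lookup. It illustrates the general technique only; it is not a description of Alta Vista's software.

    import re
    import urllib.request
    from collections import defaultdict

    # Hypothetical starting pages; a real robot follows links from page to page
    # and (as discussed below) consults the Robot Exclusion Standard first.
    SEED_URLS = ["http://www.example.org/page1.html",
                 "http://www.example.org/page2.html"]

    def build_index(urls):
        """Map every word to the set of pages on which it occurs."""
        index = defaultdict(set)
        for url in urls:
            try:
                with urllib.request.urlopen(url) as response:
                    text = response.read().decode("latin-1")
            except OSError:
                continue          # skip pages that cannot be fetched
            for word in re.findall(r"[a-z0-9'@.\-]+", text.lower()):
                index[word].add(url)
        return index

    # A search for a person's name or e-mail address is then a dictionary lookup:
    # index = build_index(SEED_URLS)
    # print(sorted(index.get("chilton", set())))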

As Mr Chilton lamented, the privacy issue here is that, although you must technically make such information available to all on the internet (either by posting it to a newsgroup or putting it in a public_html directory) before robots can index it, you do not necessarily expect that it will be read by anyone outside those with whom you have some common experience, or that the information will be used for purposes completely outside the intended purposes for which it was provided. For example, those involved in creating web pages, or involved in newsgroup discussions, concerning (say) gay and lesbian issues or issues relating to minority religious groups, could find that information about them was being systematically compiled and disseminated so as to harm them. Those who once valued the net as an escape from the values of small communities may find there is no longer any escape except behind barricades of secret communications.

Should there be some privacy right not to be indexed? It is a difficult issue which involves freedom of speech and freedom of the press considerations in a new context, and any legislative intervention could be dangerous indeed.

Robot exclusion standards

There is a very significant customary limitation on the operation of robots, which at present provides part of the answer to privacy problems here. The Robot Exclusion Standard[10] is not an official internet standard, but rather `a common facility the majority of robot authors offer the WWW community to protect WWW server against unwanted accesses by their robots'. The Standard allows a server administrator to define which parts of a web site are allowed to be indexed by robots[11], but the designers recognise that this has its limitations for privacy protection:

A possible drawback of this single-file approach is that only a server administrator can maintain such a list, not the individual document maintainers on the server. This can be resolved by a local process to construct the single file from a number of others, but if, or how, this is done is outside of the scope of this document.
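To make the single-file approach concrete, the exclusion list is simply a text file served at the URL `/robots.txt' on the site concerned. The following is an illustrative sketch only (the paths and the named robot are hypothetical):

    # /robots.txt - consulted by well-behaved robots before they index a site
    User-agent: *              # these rules apply to all robots
    Disallow: /staff/home/     # do not visit or index personal home pages
    Disallow: /drafts/         # do not visit or index this directory

    User-agent: ExampleBot     # rules addressed to one named robot
    Disallow: /                # exclude it from the whole site

A robot which honours the Standard fetches this file before indexing anything else on the site and skips the listed paths; a robot which chooses to ignore it faces no technical barrier.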

If there was a change to the html mark-up standard so that pages could contain information in their header that excluded robot indexing on a page-by-page basis, then such a technical solution would largely solve the problem - provided all robots obeyed the Robot Exclusion Standard. This would in effect be an `opt out' solution to the problem.
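A page-by-page flag of the kind contemplated would most naturally sit in the html header of each document. The following sketch shows roughly what such an `opt out' instruction might look like (the tag shown is illustrative of the approach, not a feature of the existing html standard):

    <html>
    <head>
    <title>A personal home page</title>
    <!-- hypothetical page-level exclusion: ask robots not to index this page -->
    <meta name="robots" content="noindex">
    </head>
    <body>
    ...
    </body>
    </html>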

Such a solution has already been adopted with Usenet news posts. Deja News, Alta Vista and some other search facilities allow users to insert the flag `x-no-archive:yes' at the beginning of each post, and they are then not indexed.
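In practice the flag is placed as the first line of the body of the post (or, with some news software, as an `X-No-Archive: Yes' header). A hypothetical post using it might look like this:

    From: A User <user@example.org>
    Newsgroups: aus.general
    Subject: Re: search engines and old posts

    x-no-archive: yes
    Indexing services that honour the flag will not archive this post at all.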

`Living down' old internet information is still possible. Web indexing engines only maintain details of current versions of pages. Most Usenet indexes (eg Alta Vista) only retain postings for a few weeks or months, but DejaNews intends to archive all Usenet posts as far back as it can. However, it does accept requests for old posts to be deleted - again, an opt-out solution.

Locator services: E-mail addresses, phone numbers and street addresses

Susan Stellin explains the current state of play of personal locator services on the internet[12]:

Several sites have launched recently that allow you to search for email addresses. At this stage, the results tend to be hit or miss, but it won't be long before these services are as comprehensive as other search engines. While being able to find an old friend or a distant relative is a valuable service, it can raise some privacy issues. First, you might not even know that you're listed. Second, if you want to remove your name, you have to write to each service and request that your name be deleted.

The Four11 White Page Directory FAQ, the Bigfoot FAQ, and the Internet Address Finder FAQ all explain how you can remove your name from their list. However, you may have to be persistent. When I sent a message to Four11 asking to have my name removed, I got an email back asking me to reconsider. (Weeks later, Four11 still hasn't removed my name.) Notably, most of these sites built their databases with names culled from Usenet.

In addition to cataloging online information, there are several directories on the Web that allow you to search for offline information, such as phone numbers and addresses--one site even links successful matches to a map showing how to find a person's home! Switchboard offers both a business and a residential directory. Switchboard also publishes a policy statement explaining where the company gets its data, and how you can remove your name.

Yahoo's People Search also finds individuals' phone numbers and addresses, as well as email addresses and home pages. The service created a furor because it included unlisted information (culled from product registration cards and magazine subscriptions), but it has since deleted this data. Yahoo has also posted a privacy policy explaining how you can suppress your personal information. Other than writing to each service and asking that your name be removed, there's not much you can do to keep yourself out of these databases.

Again, where such location information is culled from a wide variety of sources and aggregated, the surveillance capacity of the internet could severely hamper participation in it by individuals who did not wish such location information concerning them to be instantly available, centrally stored, and regularly updated.

`Generally available publications' and the Privacy Act

The application of existing privacy Principles will be discussed in detail later. It is arguable that a web page, a Usenet post, an index like Alta Vista, and the various location services would all be `generally available publications' in the terms of the Commonwealth Privacy Act 1988, and therefore only the Collection Principles would apply to them. However, it might be questioned whether expectations of very limited circulation defeat this conclusion.

If so, can the mere fact of external indexing by a search engine then turn something into a generally available publication, destroying otherwise existing privacy rights? That would be somewhat paradoxical. It might be argued that the act of indexing by an index like Alta Vista or a location service was, in some cases, an unfair collection practice (IPP 1) - which does apply to generally available publications.

Between you and your browser

Users of the world-wide-web sometimes thought that the fact that they did not have to enter their names or other details in order to access web pages meant that there was a high degree of privacy in the use of the web - that it was virtually anonymous. Far fewer people would be likely to believe that any longer, as the net's surveillance capacity has become more notorious, but it is still worth cataloguing some of the information that your browser typically reveals about you.

With most web browsing software, such as Netscape or Microsoft Explorer, any request to a web site discloses to the web server accessed[13]:

Current browsers don't allow these disclosure mechanisms to be turned off, although it is not obvious why users could not be given the option to turn off any other than the first one listed. A user can delete cookies from his or her machine[14], but they are like (if mixed Mediterranean metaphors are allowed) a Medusa-like Trojan horse that keeps reappearing inside your PC, no matter how often you trash it. Commenting on cookies, but with comments equally applicable to other forms of disclosure, Marc Rotenberg identifies the privacy issue as `data collection practices should be fully visible to the individual ... Any feature which results in the collection of personally identifiable information should be made known prior to operation and ... the individual should retain the ability to disengage the feature if he or she so chooses.'[16]
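By way of illustration of the kinds of disclosure at issue, a single page request from a browser of this era typically carries header lines along the following lines (the addresses and values here are hypothetical):

    GET /newsletter/issue3.html HTTP/1.0
    Host: www.example.com
    Referer: http://www.example.org/search?q=privacy
    User-Agent: Mozilla/3.0 (Win95; I)
    Accept: text/html, image/gif, image/jpeg
    Cookie: visitor_id=8473AF2

The Referer line tells the server which page (here, a search query) the user came from, the User-Agent line identifies the browser and operating system, and the Cookie line returns an identifier the server itself set on an earlier visit; in addition, the server logs the internet address from which the request arrives.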

Another area in which web users may have little awareness of who is capable of finding out details of their browsing habits arises from the use of proxy servers and proxy caches: an internet service provider (ISP), in order to conserve bandwidth and reduce costs, caches the pages accessed by its users, so that subsequent users access copies of a page held in the ISP's cache rather than on the `original' site. However, this means that an ISP which is potentially local to the user - and with whom the user is a client - can record information about the user's browsing habits which the user would rather have known only by a server on the other side of the world. There are many other aspects of monitoring of network usage that also raise privacy issues. The effect of telecommunications interception laws will be considered later.
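For example, a proxy cache typically keeps an access log. A hypothetical log line (loosely modelled on common proxy software, not any particular ISP's system) records, for every page fetched through the cache, the time, the subscriber's machine, whether the copy came from the cache, and the full address of the page:

    851042330.412 204 10.1.2.3 TCP_HIT/200 4771 GET http://www.example.org/support-group/meetings.html - NONE/- text/html

In other words, a click-by-click record of browsing habits accumulates at the local ISP, even where the destination servers never see the requests because they are answered from the cache.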

Later parts of this article will consider the applicability of existing Information Privacy Principles to cyberspace, and the implications of telecommunications interception laws.

[1] I stole this quip from John Hilvert, via Andy Warhol and who knows who else ...

[2] Science fiction writers and others make it a lot more than that! - see http://www.anu.edu.au/people/Roger.Clarke/EscVel.html

[3] Roger Clarke 'The Digital Persona and its Application to Data Surveillance', The Information Society, March 1994; for abstract only - http://www.anu.edu.au/people/Roger.Clarke/DV/AbstractDigPersona.html

[4] Roger Clarke 'Human Identification in Information Systems: Management Challenges and Public Policy Issues', Information Technology and People - http://www.anu.edu.au/people/Roger.Clarke/DV/HumanID.html

[5] A year ago I heard the world-wide-web compared to removing the spines from all the books in the Library of Congress, just as a tornado hit. You wouldn't say that any more.

[6] http://www.yahoo.com/

[7] http://www.altavista.digital.com/

[8] John Hilvert "Private Lies", Information Age, May 1996, pp 18-23

[9] See `World Wide Web Robots, Wanderers, and Spiders' - http://info.webcrawler.com/mak/projects/robots/robots.html

[10] `A Standard for Robot Exclusion' - http://128.163.69.70/notes/robots/norobots.htm

[11] In the file on the local URL "/robots.txt" - which only the server administrator could normally access.

[12] Susan Stellin `How private is your personal information?' in C-Net's Digital Life series, 1996 - http://www.cnet.com/Content/Features/Dlife/Privacy2/

[13] Geoff King, Manager of AustLII (http://www.austlii.edu.au/), assisted with some of these details.

[14] for example, with Netscape, by way of a file called COOKIES.TXT on Windows machines or "MagicCookie" on the Apple Macintosh; see to inspect the contents of the cookie on your computer, and other information

[15] The most likely use of cookie information is therefore for sites you have previously visited to customise the appearance of their site to take into account what they already know about you.

[16] Quoted in John Hilvert, op cit