Address omitted in published version.

29 April 2009

Dear Sir,

I write in response to the consultation document “Protecting the Public in a Changing Environment”. I am a computer scientist with a detailed knowledge of the Internet and its protocols, and I have watched the growth of information and communication services on the Internet for over 20 years. I believe this gives me an insight into the likely impact of these services on the ability of the authorities to access this data and the mechanisms required for them to do so.

As the consultation document explains, existing legislation draws a clear distinction between communications content and communications data. The latter is the data used to identify the particular equipment (e.g. a computer or cell phone) and hence the person doing the communicating. The case studies in the consultation paper make it clear that the Government requires information about who people are associating with on-line.

The problem faced by the Government is that on the Internet a communication is no longer a simple point-to-point connection that exists at a defined time. For instance, if I use a web-based email service then all the Internet packets might go between my computer and the email server (which might be outside the UK). But this is no use to the Government: it needs to know who I am emailing. This information is embedded in the data exchanged between my computer and the email server. At present this is considered to be part of the communication between my computer and the email server, and hence cannot be collected. Therefore the Government is seeking to expand the definition of communications data to include this information.

Q1: On the basis of this evidence and subject to current safeguards and oversight arrangements, do you agree that communications data is vital for law enforcement, security and intelligence agencies and emergency services in tackling serious crime, preventing terrorism and protecting the public?
I agree that communications data is of great utility. However I disagree that it is “vital”. If this capability were lost then the proportion of crimes solved would decrease. I do not believe that this presents a significant threat to the rule of law in this country.

The Government argues that if this were to occur then lives would be lost, implying that no degree of loss of privacy is worth a life. However this argument can be applied to any invasion of privacy. For instance the “telescreens” of Orwell's 1984 might well have saved the life of Baby P by detecting and providing clear evidence of abuse. Does this justify mass surveillance of people in their homes? Of course not.

Furthermore the existence of the data in question also creates opportunities for serious crimes against innocent people, including blackmail, stalking and intimidation of witnesses. The consultation paper proposes various legal and technical safeguards that will certainly constrain public servants who seek to do their work within the law, but will have little impact on criminals. A practical policy cannot assume that all those entrusted with data will be honest.

Q2: Is it right for Government to maintain this capability by responding to the new communications environment?

The question assumes that maintaining this capability is a feasible proposition. However there is good reason to suppose that this is not the case, at least for serious crime conducted by intelligent criminals. Of course many criminals are not intelligent and will not have the ability or foresight to hide their identities and associations. However such people are likely to be caught by other means in any case.

The consultation document considers a scenario in which three friends use different communications systems in the course of a single evening without any intention to conceal their activities, but it does not consider the options available to criminals with a strong incentive to remain hidden. The following is a list of some of these options:

• Conspirators might use public message drops for encrypted messages. Many services in many countries allow users to place data for others to download later. Perhaps the best option for a conspiracy would be the venerable Usenet service (see http://en.wikipedia.org/wiki/Usenet for a technical overview). This service was originally created in the 1980s as a distributed discussion forum, but in practice today most of the Usenet data by volume is audio and video, and most of that is pirated. A group of conspirators could agree on a newsgroup and then leave encrypted messages for each other. The Government could tell that they had accessed Usenet but not that they had communicated with each other. Alternatively encrypted messages could be embedded using “steganography” within image files on photo sharing services such as Flickr.
• The “Freenet” system (http://en.wikipedia.org/wiki/Freenet) is a peer-to-peer data sharing system designed to be resistant to government censorship and monitoring. Again, a group of conspirators could easily use it to exchange secret messages without their association becoming visible to the Government.
• The Tor network routes connections, including requests for web pages, through a randomly chosen series of volunteer-run relays in many countries, so that the origin of a request for a web page is extremely difficult to determine.
• Anyone wishing to use the Internet without being identified can sign up to one of a large number of cheap “SSH tunnel” services. Examples include http://www.guardster.com/, https://secure-tunnel.com/ and http://www.privacy.li/, and there are many others. The service works by setting up an encrypted “tunnel” between the client and the provider and routing all Internet data through it. The client can then use any Internet software as normal. If, for instance, the client accesses a website through the tunnel then the web server will see a connection from the tunnel provider rather than the client. The client's ISP can see that the client has connected to the provider by an encrypted tunnel, and may be able to infer something from the pattern of data transfers (e.g. web browsing versus VoIP phone), but nothing more.
Plainly there are many ways for a moderately careful conspiracy to conceal itself, and for many forms of serious crime to continue undetected. The only way for the Government to maintain full coverage of communication data would be to tightly regulate cryptography, banning all use of codes that the Government cannot decipher. This approach has already been rejected as impractical and dangerously illiberal throughout the Free World. Too much Internet commerce depends on strong encryption, and encrypted connections are difficult to identify.

It might be argued that most people have no particular interest in hiding from the Government, and hence those that do stand out from the crowd. However this is not the case. The music industry has conducted a sustained legal campaign against file sharing, and this has led many otherwise law-abiding people to hide some of their online activities. Some ISPs have used packet inspection to block or slow down particular Internet protocols, and this has led software developers to mask the protocols that their programs use, or to use HTTP (the Web protocol) as the foundation. Both of these trends will make it more difficult for ISPs to implement effective monitoring on behalf of the Government.

Therefore I believe that the Government’s options are more limited than the question assumes. If communication providers could easily record all communication data on the Internet then it might be right to do so. However in practice the Government can only obtain partial coverage that is easy to evade, and as I shall explain in the next section it can only do so by creating an expensive and intrusive surveillance system. I do not believe that the benefits of this system will justify the costs, both in financial terms, and also from intrusion into privacy and increased opportunity for crime created by the data itself. Therefore I do not believe it is right for the Government to attempt to maintain its current capability in this area.
Q3: Do you support the Government’s approach to maintaining our capabilities? Which of the solutions should it adopt?

There are a number of fundamental issues with the proposed approach:

1. As explained above, it will not provide coverage of anyone with an incentive to hide and a modicum of technical knowledge. Many of these are exactly the people that the Government most needs to catch. The remainder are those who either lack all technical expertise (a shrinking minority, especially amongst the young) or believe their crimes to be too minor to be worth investigating. Hence the Government will increasingly find itself using an expensive and intrusive surveillance system to detect mundane offences while serious crimes are clearly going undetected.

2. As the consultation document points out, existing law makes a clear distinction between data about a communication (such as the parties and the time) and the contents of the communication itself. However the proposed data retention regime would find it difficult to maintain that clear distinction. Take, for instance, the URL of a web page. This consists of three main components:

a) The identity of the machine that serves the website.
b) The name of the page within the website.
c) Optionally, a query string that is processed to generate a “dynamic” web page, such as the result of a search for a keyword.

Clearly the identity of the server is “communications data” since it is a physical party to the communication, but what of the other components? If I am browsing a website such as NHS Direct where a page stores information about a specific disease and searches are made for disease names or symptoms, then a good analogy would be a telephone consultation in which I ask a medical practitioner for information. So it seems clear to me that in this case the (b) and (c) components are part of the communication (as current law states).
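
The three components (a), (b) and (c) can be separated mechanically. The following is a minimal sketch in Python using the standard urllib.parse module; the helper name url_components and the example address are hypothetical and chosen purely for illustration.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into the three parts discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "server": parts.hostname,  # (a) the machine that serves the website
        "page": parts.path,        # (b) the name of the page within the website
        "query": parts.query,      # (c) the optional query string, without the "?"
    }


# Hypothetical address, used only for illustration.
example = "http://www.example.org/conditions/search?term=asthma"
print(url_components(example))
```

The split is purely syntactic. Nothing in the result indicates whether the page name or query string should be treated as communications content or as communications data; that depends on what the website uses them for.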
However if I browse a website such as Google Groups, where groups of people can form simple websites with newsletters and mailing lists, then browsing a page such as “http://groups.google.com/group/privacy” (a fictional example) will bring me into contact with other privacy activists while “http://groups.google.com/group/dog-owners” will not. So here the page name shows who the user is associating with, and hence would be sought by the Government. In legal terms, when someone posts information to a Google Group, the group name is used as an identifier for all the equipment that subsequently accesses the group, and hence it falls under the definition of communications data.

In this case Google have chosen to use the page name to distinguish between groups, but they could just as easily have used a query string instead, with URLs like “http://groups.google.com/group?privacy” (the question mark denotes the end of the page name and the start of the query). Thus there is a grey area of data that can be communications content in some contexts and communications data in others. Automatic collection systems cannot distinguish between the two. Even if some websites can be given special rules (e.g. NHS Direct page names are content, but Google Groups names are communications data) the Web is too big and changes too rapidly for such a system to work in general. Furthermore this is only a single example of a much more general dilemma: many communications protocols have combinations of communications data and content that fall into this grey area.

3. New communications technologies are being introduced all the time, and people find new ways to use them even faster. Every time a new protocol is introduced the collection technology will have to be adapted to correctly identify it and collect the right data, and as the use of a protocol shifts the technology will have to be modified. This would present ISPs with an onerous burden. They will never be able to collect everything without also intercepting some communications content, but it will be difficult to determine how much effort will be enough to be deemed compliant with the regulations.
For instance, multi-player games such as World of Warcraft, Runescape and Second Life exchange game data in proprietary formats between the player's computer and a central server, including “location” within the game (i.e. is the player in the Castle or the Inn?). Games have “chat” functions where typed messages can be read by those nearby in the game, and in the future this is likely to be extended to voice as well. So a criminal conspiracy might decide to meet at a virtual location within a game, in which case their locations in the game would be communications data of legitimate interest to the police.

Thus if ISPs are to implement an effective system to capture and store communications data then there will have to be a central organisation dedicating considerable resources to tracking and analysing all the protocols and web sites on the Internet. Given the scale of the task it would be very inefficient for every ISP to duplicate this work in-house; even a large and well-funded organisation will find this a challenge. In the case of proprietary protocols such as those used for game playing they will also have to write special-purpose software to extract the communication data. This software will need to be very efficient in order to cope with the large amounts of data in question. It will also need to be written in multiple versions to work with all the different equipment used by ISPs, and be rapidly developed and maintained to keep pace with the Internet. All of these are challenging technical requirements; doing them all at once is probably not feasible.

Therefore I do not believe that it is feasible for the Government to maintain its current capability to capture most communications data. It must therefore resign itself to a steadily decreasing capability in this area, and should take steps to build up other capabilities to compensate.

Q4: Do you believe that the safeguards outlined are sufficient for communications data in the future?
Privacy violation is not an abstract harm; it can lead directly to serious personal consequences including identity theft, victimisation, blackmail, stalking, loss of reputation, unemployment and in extreme cases even kidnapping and death. Communications data can also be of economic importance; if lots of people from company A are accessing the website and financial reports of company B then this may indicate a planned takeover. The potential for political abuse should also not be underestimated. This would be illegal, but history shows that not every individual in every government is above temptation. One can imagine, for instance, the potential for political damage if the pornographic browsing of a member of the opposition were made public.

As shown above, the Government proposals will force communication providers to capture and store a great deal of sensitive data. Because of the unclear boundary between communications data and communications content, some of this data will be content that has been inadvertently collected.

The consultation document proposes that this data will be held by communication providers in order to avoid creating a single big database. However this is a distinction without a difference. The consultation document makes it clear that a government agency will act as a clearing house, and hence will have rapid and easy access to the various databases created by communication providers, presumably on-line. This places all the data at the disposal of this agency just as if it were stored in a single big database; once the data are on-line the ownership and location of the actual hard drives is irrelevant.

The consultation document envisages that when a suitably senior person signs an official request for data the government agency in charge of the data will then search the relevant databases and issue whatever data they deem to be relevant and proportionate to the goal of the investigation.
However this agency will itself have unlimited access to the various databases held by the different communication providers. This will not be a small agency; identifying and reviewing the data relevant to even a simple request will require significant manual work. Furthermore the consultation document strongly implies that this service will be available around the clock. Hence a significant number of people will have essentially unlimited access to the data. In addition people at communication providers will also have access to the data they collect.

It is unreasonable to suppose that all of these people will be entirely honest, especially given the value and importance of the data they are supposed to guard. If any of them do decide to engage in a criminal scheme to sell or exploit this data then they will also be well placed to identify weak spots in the data retention system, and therefore evade detection. Indeed this knowledge itself would be valuable to criminals.

It might be argued that all data accesses will be logged and linked to requests. However in practice this is difficult to enforce for all cases; system administrators, for instance, generally need unlimited access. In addition if the data are stored by communication providers then their staff will also have access to the data they have collected.

Given the scope for harm created by the data it is clear that the proposed safeguards are not sufficient. In fact a centralised database would be more secure because fewer people would have access and security policies would be easier to implement and enforce.

In conclusion, I believe that the proposals in the consultation document are seriously flawed and should be rethought.

Yours sincerely,

Paul Johnson.