April 2006 issue of e-Discovery Law & Strategy
37,000 Libraries of Congress created in one year

No wonder data management budgets are bursting and the right data is so hard to find

By Sonya Sigler, Cataphora

How much would it cost to house 37,000 Libraries of Congress? That is how much new electronic data is created in a single year, according to one study. Fortunately, the data is stored in computers, not in a monumental building, but the costs of creating, storing and managing data nevertheless are significant and growing for all businesses.

Smart data retention policies can help decrease the costs of managing all of this data. Knowing exactly what information you have and where it is reaps benefits for any organization, allowing this knowledge to be shared efficiently. Good data management can help an organization avoid having to wade through any more data than absolutely necessary in the event of legal action. Finally, well-defined and repeatable processes can decrease the costs of repeatedly reviewing the same data in the event of repeated pattern litigation, and are more readily defensible.

The Costs of Mushrooming Data

Electronic documents are mushrooming with the spread of e-mail and instant messaging (IM) and, of course, the use of electronic documents in general. A University of California Berkeley study estimated that new stored information grew about 30% a year between 1999 and 2002 [1]. The same study concluded that five exabytes of new information were created in a single year.

Most of us are familiar with megabytes and gigabytes. A reasonable hard drive on a PC may have a capacity in the region of 100 gigabytes. An exabyte is one billion gigabytes. Another way of looking at it: five exabytes is approximately 37,000 times greater than all of the information in the Library of Congress. 40% of this information was produced in the United States and 92% of it was stored on magnetic media, mostly on hard disks.
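To put these figures in perspective, the arithmetic can be worked through in a few lines. The sketch below uses only the numbers quoted above and assumes decimal units (1 terabyte = 1,000 gigabytes); the function names are illustrative and the results are rough orders of magnitude, not figures from the study itself.

```python
# Figures quoted in the article; decimal (SI) units are an assumption.
GIGABYTES_PER_EXABYTE = 1_000_000_000   # "An exabyte is one billion gigabytes"
NEW_DATA_EXABYTES = 5                   # new information created in one year
LIBRARY_OF_CONGRESS_MULTIPLE = 37_000   # "approximately 37,000 times greater"
PC_DRIVE_GIGABYTES = 100                # "in the region of 100 gigabytes"
ANNUAL_GROWTH = 0.30                    # "about 30% a year"


def new_data_gigabytes() -> int:
    """Total new information in one year, expressed in gigabytes."""
    return NEW_DATA_EXABYTES * GIGABYTES_PER_EXABYTE


def pc_drives_needed() -> int:
    """Number of 100 GB PC drives needed to hold one year of new data."""
    return new_data_gigabytes() // PC_DRIVE_GIGABYTES


def implied_library_of_congress_terabytes() -> float:
    """Size of one Library of Congress implied by the 37,000 multiple."""
    return new_data_gigabytes() / LIBRARY_OF_CONGRESS_MULTIPLE / 1_000


def cumulative_growth(years: int) -> float:
    """Growth factor after compounding 30% a year for the given years."""
    return (1 + ANNUAL_GROWTH) ** years


if __name__ == "__main__":
    print(f"PC drives needed: {pc_drives_needed():,}")
    print(f"Implied Library of Congress size: "
          f"{implied_library_of_congress_terabytes():.0f} TB")
    print(f"Growth factor, 1999 to 2002: {cumulative_growth(3):.2f}x")
```

On these assumptions, one year of new data would fill about 50 million typical PC drives, and growth of 30% a year more than doubles the annual volume in three years.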
Another study forecasts that 84 billion e-mails, more than 33 billion of which will be spam messages, will be sent daily worldwide in 2006 [2]. Of course, all of this electronic communication includes personal as well as business communications, but there can be little doubt that a significant proportion of these numbers refers to purely business communications. A slew of associated costs are growing right along with the increase in numbers of documents.

[1] How Much Information? University of California School of Information Management and Systems, 2003. http://www.sims.berkeley.edu:8000/research/projects/how-much-info-2003/
[2] Worldwide Email Usage 2005-2009 Forecast: Email's Future Depends on Keeping Its Value High and Its Cost Low, IDC.
The most direct and obvious cost associated with all of this data is, of course, the cost of storing it. Per-gigabyte storage costs have fallen dramatically in recent years, but the cost of the media for storing more and more data, coupled with the cost of physical space for the media and the associated management logistics, has led to overall increased costs of storage. This is further fueled by more stringent legal and regulatory requirements governing the retention of business records.

Additional costs result when an organization gets involved in legal action, where the larger volume of potentially relevant electronic evidence has ripple effects. All other things being equal, more people must spend more time going through the vast volumes of data involved. Someone has to make sense of the data and find what is actually relevant. The legal risks and implications in fact outrun the growth in data itself. Legal costs risk spiraling upward at a rate that is even greater than the actual growth in volume, as more and more data disproportionately adds to the complexity and risk of the legal response. Thus, there are not only the directly quantifiable costs of legal work, but also the higher risk of failure to defend successfully as a result of the volume and complexity of potentially relevant data.

Furthermore, in many cases, pattern litigation results in the same (or nearly the same) data being subject to discovery over and over again. Lack of repeatable analysis and processes can result in significant duplication of effort and related costs.

Crucial Steps

While all of these electronic documents have a negative impact on the corporate bottom line, they can also provide a golden opportunity to proactively use the data to reduce costs and meet strategic goals throughout the organization. A few crucial steps can help an organization to minimize its costs and maximize the advantages of the data. The very first step is to know what data you have.
This consists not only of knowing where all copies of backup tapes and so forth are to be found. It also involves understanding just what information is stored on each tape or other medium. For example, if it is known that a tape does not contain data from any relevant custodian, it may not be necessary to go to the expense of restoring that tape at all. At a more sophisticated level, many documents can only be really understood in the context of other related documents, so any technology that hopes to provide true insight should support the concept of context between documents rather than merely within documents. The more efficient processes and decreased volumes of stored data resulting from this step offer an immediate cost benefit.
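The backup tape example can be sketched in code. The following is a minimal illustration, not a description of any particular product: it assumes a hypothetical index recording which custodians' data each tape holds, and it treats a tape with no index entry as one that must still be restored, since nothing is known that would rule it out. The names `BackupTape` and `tapes_to_restore` are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackupTape:
    """Hypothetical index entry for one backup tape."""
    label: str
    # Custodians whose data is on the tape; None means the tape was never indexed.
    custodians: Optional[frozenset[str]]


def tapes_to_restore(tapes: Iterable[BackupTape],
                     relevant_custodians: Iterable[str]) -> list[str]:
    """Return labels of tapes that cannot be ruled out for this matter."""
    relevant = frozenset(relevant_custodians)
    selected = []
    for tape in tapes:
        if tape.custodians is None:
            # Unknown contents: the tape cannot be excluded, so restore it.
            selected.append(tape.label)
        elif tape.custodians & relevant:
            # At least one relevant custodian has data on this tape.
            selected.append(tape.label)
    return selected


if __name__ == "__main__":
    inventory = [
        BackupTape("TAPE-001", frozenset({"alice", "bob"})),
        BackupTape("TAPE-002", frozenset({"carol"})),
        BackupTape("TAPE-003", None),
    ]
    print(tapes_to_restore(inventory, {"alice"}))
```

In this sketch, only the tape known to hold no relevant custodian's data (TAPE-002) is skipped; the unindexed tape still has to be restored, which is precisely the cost that knowing what data you have avoids.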
Once the ability to truly understand the information has been developed, it is a vital cost-saving measure to build a repeatable process. There are two aspects to this. First, the way relevant evidence is produced should be consistent; such consistency is the bedrock of a defensible process. An inconsistent, non-repeatable process is dangerously susceptible to attack by the opposing party, which may succeed in making the argument that the response has been inadequate. This can lead to further, far-reaching discovery, which will be costly in simple monetary terms and which exposes the respondent to increased danger that something damaging may be unearthed. Second, in the case of pattern litigation, any relevant information that was previously produced should be readily available for subsequent cases, eliminating the unnecessary cost of repeatedly searching for it.

A sound retention policy, coupled with good insight into what data resides where, can allow an organization to delete data that it is not legally required to retain. This reduces the cost of storage, and can also significantly reduce the cost of legal discovery, since there will be less data to review.

A further cost management step is to address employees’ use of e-mail and instant messaging. A 2003 survey showed some troubling statistics on the topic of employee use of e-mail [3]. In that year, the average employee reportedly spent 25% of the workday on e-mail, with 8% of workers devoting over four hours a day to this activity. Equally disturbingly, 14% of respondents had been ordered by a court or regulatory body to produce employee e-mail. While e-mail is a legitimate business tool, managing its use (both in terms of time and content) has the potential to increase productivity and to decrease many of the costs associated with large volumes of inadequately controlled communications.
The first step here is to put in place an effective policy for e-mail and instant message (IM) use, coupled with policies and processes for systematic and defensible retention, and deletion, of e-mail messages.

Departmental Responsibilities

Such actions are the basic building blocks for cost reductions and business intelligence benefits across the enterprise. In addition, individual departments can play a role. The overarching trend is that no single group is solely responsible for an organization’s information assets; every group must play its part. Managerial functions are the owners of most of the data. Legal staff are most intimately acquainted with laws, regulations, and the implications of poor data management policies. And, finally, the IT organization is in charge of the nuts and bolts of putting in place and running the systems on which the data resides and which must support the organization’s business and legal imperatives.

Information Technology

There is increasing need for IT to understand the business and legal implications of corporate information. Some people have characterized this as a change in the role of the Chief Information Officer (CIO) to that of a Chief Information Security Officer (CISO), giving that position a new and significant emphasis. In general, the IT department will need to be more proactively involved in the business, understanding other functional areas better in addition to following the relevant evolving technology trends.

[3] 2003 E-Mail Rules, Policies and Practices Survey, American Management Association. http://www.gwtools.com/policy/articles/Email_Policies_Practices.pdf

Human Resources

A significant potential source of liability relates to personnel issues. A carefully crafted policy can be reinforced by analyzing data for evidence of sexual harassment or any kind of illegal discriminatory behavior. This allows the organization to be proactive in dealing with potential liabilities and the costs that they may entail.

Business Intelligence

A variety of cost benefits can be realized from a more smoothly and efficiently operating organization. Especially in larger organizations, it can often be very hard to locate information that is known, or believed, to exist. According to one report, 29% of professionals spend upwards of eight hours per week trying to find electronic information [4]. In other words, almost one third of these people are spending the equivalent of one day per week in this way. And over 70% spend half a day or more per week. It is not hard to believe that neither the people involved nor their employers are particularly happy with this situation. Any tool that can help with truly understanding what information is where, and with cutting out unnecessary material to narrow the search space, has to be a valuable and welcome contribution to reducing an organization’s costs.

Customer Relationship Management

There are numerous ways that good insight into an organization’s communications with its customers, and vice versa, can provide benefits, including tangible cost savings. Broad, high-level analysis of communication patterns can help reveal who is communicating with whom.
This in itself can provide unprecedented insight and can lead to improvements in customer relationships, as customers can be more readily put in touch with the right people to help them. The actual content of the communications can be automatically reviewed to make sure that it accords with relevant regulations and policies. All of this can improve customer relations and lower costs through more efficient practices.

Conclusion

In summary, it is important for all departments, not just the legal team, to think about the broad implications of information creation, storage, and use. At a time when organizations continue to be particularly cost-conscious, such initiatives can save costs not only in the short term, for example by reducing the need for resources such as data storage, but also down the road, through reduced discovery costs or legal liability in the event of litigation.

[4] Information Intelligence: Content Classification and the Enterprise Taxonomy Practice, Delphi Research, 2004. http://www.delphiweb.com/knowledgebase/newsflash_guest.htm?nid=953

Sonya L. Sigler is Vice President of Business Development and General Counsel at Cataphora, Inc. She is a member of the California State Bar and is a frequent speaker on various topics, including electronic discovery. She is a member of the Sedona Conference Working Group 1 on Electronic Document Retention and Production.