Problems in the preservation of electronic records

The Authors
Lim Siew Lin, Student, Division of Information Studies, School of Communication Studies, Nanyang Technological University, Singapore
Chennupati K. Ramaiah, Assistant Professor, Division of Information Studies, School of Communication Studies, Nanyang Technological University, Singapore
Pitt Kuan Wal, Director, National Archives of Singapore, Singapore

Abstract
Terms such as medium preservation and technology preservation are now widely used when discussing issues related to the preservation of electronic records. The advent of electronic information introduces new preservation requirements. Medium preservation has been addressed in discussions on environmental and handling concerns for tapes, magnetic disks, optical disks, and the like. Greater attention should instead be directed to the obsolescence of technologies. It is a challenge to imagine not only how to technically preserve electronic records indefinitely, but also how to choose what to preserve and how to guarantee the electronic record’s reliability and authenticity in the future. The combined problems of immense volume, unstable storage media, and obsolete hardware and software add up to some very tough problems, which have to be dealt with. Digital preservation is becoming a business issue. Not only are historians, librarians and archivists alarmed by the loss of cultural and government records due to a lack of digital preservation, but certain industries have also realised that they need to keep data for longer and longer periods for regulatory or business reasons.
Article Type: Research paper
Keyword(s): Preservation; Conservation; Library materials; Digital mapping; Records management; Digital documents.
Journal: Library Review
Volume: 52
Number: 3
Year: 2003
pp: 117-125
Copyright © MCB UP Ltd
ISSN: 0024-2535

Introduction
As organisations rely more and more on digital technology to produce, process, store, communicate, and use information in their activities, the quantity of records being created in electronic form will increase exponentially. The technological challenge is compounded by the continuing extension of information technology, making records increasingly diverse and complex. This creates an impact not only on individual records but also on the archival fonds as a structured whole. Betts (1999) felt that digital information is at risk of disappearing or becoming inaccessible because of the deterioration of storage media like magnetic tapes. Other concerns include ever-changing data formats and the fact that software and hardware become obsolete quickly. The greatest challenge to electronic record keeping is the evolution of technology (Coombs, 1999). New hardware and software are replacing the products and methods used to record, store, and retrieve digital information on cycles of two to five years. No system is currently capable of more than 30 years of retention and access. With the constant upgrade of operating systems, applications, and storage technology, obsolescence of access will overtake any durable assortment of hardware and software.

Stephens (2000a) indicated that for many years, archival institutions around the world have operated programs to preserve records possessing archival value that are worthy of permanent preservation. These programs have been organised around physical paper records, microfilm, and other visible record media. Now, the archival repositories are in a long transition from paper to electronic records as the predominant record-keeping medium. Electronic records of archival
value are getting older by the day and they may be threatened with attacks on their integrity and retrievability. Archival institutions and libraries worldwide must have plans for dealing with this situation and must implement such plans as aggressively as possible before data are lost. The two primary contributors that fuel the debate in managing and preserving society’s documentary heritage are: first, the rapid technological changes of the last two decades, and second, the increasing dependence of society on electronic or digital documentation.

Bantin (2001) discussed the various research projects that have addressed the challenges presented by electronic records. The most prominent are those devoted to developing basic requirements for record-keeping systems, and identifying documentation or metadata that must be present to create reliable and authentic records. The field of information technology has, by and large, ignored the problems of long-term preservation and now needs to find an efficient and effective way to preserve electronic records. Records are distinguished by two characteristics, namely:
(1) records that reflect business processes or individual activities, i.e. a record is not just a collection of data but the consequences or product of an event; and
(2) records that provide evidence of these transactions or activities, i.e. recorded documentation cannot qualify as a record unless certain evidence about the content and structure of the document and the context of its creation are present and accessible.
Records collected and preserved by libraries cannot be defined in the same way, but the preservation problems that they present are the same. A structured unit of recorded information, published or unpublished, in hard copy or in electronic form is called a document.

Difference between physical and electronic records
The essence of the digital preservation problem is how to support the ability to digitally process and read many years from now the electronic records being created today (Stephens, 2000a,b). With physical records, the information is contained on durable media that can be read by sight or with relatively simple viewing devices. Deterioration is visibly apparent, and there is a time window during which conservation measures can be undertaken if required. Once appraised as archival, they are accessioned into the archives, finding aids are prepared for them and they are housed in environmentally appropriate space. Records should retain their integrity and usability for many decades, even centuries, under proper environmental conditions. Preservation of physical records is generally a once-and-done task and no further measures are required. The electronic record is entirely different as the recording media are not durable and records are not human readable. The hardware and software required to read electronic records have even less life expectancy than the media on which they are recorded. The tasks required to ensure long-term preservation of digital records are technically difficult as specialised expertise, the
requisite hardware, software and systems documentation will be required. Data preservation tasks must be performed periodically for as long as ongoing retention is required.

Preservation of electronic records
An electronic record is preserved if and only if it continues to exist in a form that allows it to be retrieved, providing reliable and authentic evidence of the activity that produced the record. Demonstrating the authenticity of electronic records depends on verifying that:
• the right data was put into storage properly;
• either nothing happened in storage to change this data or alternatively any changes in the data over time are insignificant;
• all the right data and only the right data were retrieved from storage;
• the retrieved data were subjected to an appropriate process; and
• the processing was executed correctly to output an authentic reproduction of the record.
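
The storage-related checks in the list above can be illustrated with a fixity check. The sketch below is only an illustration under stated assumptions: the article does not name any checksum technique, and the function names `sha256_of` and `verify_fixity` are hypothetical. It records a SHA-256 digest when a record enters storage and compares it with a fresh digest on retrieval. It does not address the last two points in the list, which concern the correctness of later processing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large records do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the stored bits still match the digest recorded at ingest.

    Assumption: the digest was computed when the record was put into storage
    and has been kept separately from the record itself.
    """
    return sha256_of(path) == recorded_digest
```

A matching digest shows only that the stored bits are unchanged; whether those bits can still be rendered as an authentic record depends on the software and metadata issues discussed in the rest of the article.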

Preserving electronic records
Stephens (2000b) mentioned that archives are now receiving information in digital form, in endless sizes and formats, up to 15 to 20 years old. Most of the material is impossible to read today although it has permanent value; because it is not available on paper, it is effectively lost. This is considered a “great and grave problem”, as we need to preserve our information today for the future. Digital preservation has now become a global information management problem. The large archival repositories, which keep thousands of electronic record-keeping systems with long-term retention requirements but no formal policy or practices in place to assure their preservation, will experience problems of reading and processing the older digital media. The continuing progress and rapid obsolescence of information technology is the most often cited, but perhaps not the most significant, challenge archives face in the endeavour to preserve electronic records.

Betts (1999) felt that the pharmaceutical companies are a prime example of the importance of digital preservation. They need to keep records about new drugs as long as the drug is on the market as well as records about clinical trials through the life span of the patients. Besides proper stewardship of storage media, they need to have a records-management team that maintains a central repository of metadata and schedules the conversion to new media. Short-term thinking and sloppy record keeping could lead to data disasters for corporations if they lose valuable information due to decomposition of the magnetic tape or because they no longer have the software or hardware required to retrieve it. Another problem is the lack of organisational commitment and the willingness to allocate sufficient resources to preservation efforts. Companies are seen not to be doing enough to make sure that data that was archived several years ago is retrievable. They need to ensure that data recovery plans are in place to enhance digital preservation.

Volume of electronic records
The use of electronic information technologies has brought impressive gains in efficiency and productivity as well as accumulating volumes of information that provide a wealth of records to monitor and manage (Coombs, 1999). The downside of this electronic revolution is the staggering accumulation of information in all media, from paper to microfilm and electronic formats, that has dramatically accelerated the need to manage this information. Information is produced in ever-increasing volumes of electronic records in the form of e-mail, e-documents, e-forms including e-journals and e-zines, e-transactions, databases and Web pages, which makes the application of records management principles and practices all the more urgent.

Physical media
Books can be seen to be crumbling but there is no way of looking at magnetic tape and seeing errors in it. The tape will need to be run, and this is certainly time- and labor-intensive. Early attempts to solve the difficulties in preserving digital information focused on the longevity of the physical media on which the information is stored (Coombs, 1999). Even under the best storage conditions, digital media still have a very limited shelf life. Given such rates of technological change, even the most fragile media may well outlive the continued availability of equipment to read those media. Though the digital systems provide an excellent means to create and widely communicate information quickly, digital file formats and media are not optimal for long-term, multi-generational preservation (Lawrence, 2001a,b). For instance, today’s CDs, DVDs, and optical jukeboxes are valid for file back-ups and short-term storage, but they may become obsolete, even forgotten, in a very short time as new storage media types and specifications are developed. Likewise, file formats experience the same dilemma, since today’s “standard” for any application will certainly shift or disappear within a decade.
Simply stated, data is only as permanent as the hardware or software that gives it life. It seems that technological obsolescence represents a far greater threat to the preservation of digital archives than does media longevity. Thus, efforts to preserve physical media provide only a short-term, partial solution.

Authenticity of electronic records
In general, the closer one stays to the original technology and original digital format of the records, the less the problem of authenticity. However, the closer one stays to the original technology, the more complex and impractical the approach becomes over time. It is more complex because, as records continue to accumulate over time, there will be more and more varieties of technology that the archives would have to maintain. It is more impractical because, first, support for obsolete technologies will eventually disappear and, second, the distance and difference between the preserved technology or technical artefacts including the records and the best available technology for preserving, managing, retrieving and delivering the records will
increase continuously. On the other hand, moving ahead as technology progresses can eliminate such practical problems, but that can entail loss or corruption of records. Electronic records, unlike paper records, are susceptible to undetectable changes in content and format unless they are held securely and under defined and auditable procedures. This is essential if electronic records are to be acceptable as evidence in legal proceedings. These procedures must ensure that electronic records are an authentic and accurate representation of the transaction and have been safe from alteration. The fragile nature of the electronic medium, as well as the dynamic way in which information technology is deployed, threatens the reliability and authenticity of the record if appropriate information management disciplines are not applied.

Migration of electronic records
One solution to the problem of media decay is to copy the data to newer storage media but this kind of migration is not perfect. Migration of data to the latest media and software versions can be done on a two to three-year cycle but will require a significant monetary investment for each conversion, constant human attention and personnel training. Digital migration requires the staff to have knowledge of the old formats as well as the new, the ability to analyse and recommend the best of new formats, the time to implement and test migration pilot programs, and the capacity to develop and continually refine migration processes. Moreover, during each conversion, the file is susceptible to corruption. Formatting can shift and data can drop out bit by bit over time. This can result from machine, software or human error leading to strange renderings of documents from which the original information cannot be recovered. Ideal policies, standards and strategies need to be formulated to ensure that authentic electronic records can be preserved over long periods of time.

Metadata in electronic records
From a record-keeping perspective, the absence of some critical documentation on the structure of the record in the form of system metadata will hinder digital preservation. Of particular importance is the structural metadata describing how to open and read a record as it was originally created and viewed. The absence of critical metadata has meant that most collections of electronic data, electronic documents, or information are not records because they cannot qualify as evidence.

The importance of metadata in organisations that employ the transaction processing system (TPS) was stressed by Bantin (2001). The primary goal of TPS is to provide a computerised system that performs and records the daily routine transactions to conduct business. The guiding principle of these systems focuses on creating data that is current, accurate, and consistent, employing database management system (DBMS) software to achieve this. These databases are dynamic, volatile systems, in a state of continual change. Historical data, if kept at all, is usually incomplete or summarised. Consequently, historical snapshots of a database do not routinely capture the data values needed to reconstruct a specific record. Even if all data values are captured in historical versions of the database, archivists argue that the system is still not capturing and preserving records. Retaining database tables does preserve data but not records.
In most automated systems, the physical relationship between the record content and the metadata that gives the content meaning often does not exist. Vital links between metadata and the record content may exist only in a computer software program or may not be a part of the automated system at all. The metadata may exist only as a paper document totally dissociated from the records it describes. Of particular concern is the relative lack of metadata related to the context of creation and use, that is, metadata that addresses the questions of why the record was created, who were the users of the record, and who had custody of the record. The availability of this contextual metadata, archivists argue, could make the difference between a useful and a useless record, particularly when it is viewed over longer periods of time.

Legal issues on preserving electronic records
Tennant (2000) mentioned that selecting collections to digitize begins with the issue of copyright, since no digitization project can get off the ground without permission to make it digitally available. Issues relating to the intellectual nature of the source materials; current and potential users; anticipated use; the format and nature of the digital product; describing, delivering and retaining the digital product; relationships to other efforts; and costs and benefits have to be considered in preserving electronic records.

Approaches to preserving electronic records
Current initiatives are pursuing a variety of approaches that can be put broadly into five categories:
(1) preserving the original technology used to create or store the records;
(2) emulating the original technology on new platforms;
(3) migrating the software necessary to retrieve, deliver, and use the records;
(4) migrating the records to up-to-date formats; and
(5) converting records to standard forms.
These approaches define a spectrum or range, in broad terms, from no change in the records or the technological context in which they exist to one in which the original hardware and software have disappeared and the digital format of the records has changed. Each of these methods has pros and cons and none of them is entirely satisfactory.

Preserving the original technology
Stephens (2000a,b) suggested the adoption of the museum approach to preserving digital data. Such an approach, according to Stephens, requires the fulfillment of the following:
• digital archives should be transcribed every 10 to 20 years;
• recording systems and media should be preserved;
• as should system hardware and software;
• operating system;
• operation manuals; and
• ample spare parts.
However, saving these processing components indefinitely may not be necessary.

Improved storage media
There has been a significant improvement in recent years, with a movement away from digital storage media, which are more fragile and less stable over time, towards the introduction of more stable and reliable media. The improved options for long-term storage of digital information include areas such as ion-milling and holographic media. The International Council on Archives (ICA) Guide to Managing Electronic Records sets out seven criteria for media used for preserving electronic records (ICA, 1997):
(1) open standards for digital recording on the medium;
(2) robust methods for preventing, detecting and reporting errors;
(3) sufficient market penetration;
(4) known longevity;
(5) known susceptibility to degradation or deterioration;
(6) a favorable cost/benefit ratio; and
(7) availability of methods for recovering from loss.
The current best practices in digital preservation employed by many of the archival institutions include:
• selecting storage media most appropriate for long-term data retention;
• converting data to standard formats to facilitate its processing on a variety of computing platforms;
• migrating data to new technology platforms when the computing environment is upgraded;
• preserving systems documentation required to process the data;
• copying or recopying the data onto new storage media at regular intervals; and
• taking steps to store and maintain these media properly.
These measures are labor-intensive, error-prone and costly and must be performed periodically for as long as the data are retained, making the long-term preservation of digital data a substantial undertaking.

Emulation and migration of electronic records
Coombs (1999) indicates that there are two techniques to maintain the readability and accessibility of electronic records: emulation and migration. There are limitations to the record refreshing and emulation techniques.

Emulation of electronic records
Refreshing of digital information is done by copying it onto new media and this process presumes stability of the underlying application and operating systems. Digital information today
is produced in highly varying degrees of dependence on particular hardware and software. Moreover, it is difficult for vendors to ensure that their products are either backwardly compatible with previous versions or that they can interoperate with competing products. Thus, refreshing cannot serve as a solution for preserving digital information. An alternative strategy called emulation requires the development of software applications on newer systems that can read and process records from earlier systems, thus delaying obsolescence. Owing to the ever increasing complexity and variety of systems, this approach has been largely abandoned.

Migration of electronic records
Migration is the periodic transfer of digital information from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation, but differs from refreshing in that it is not always possible to make an exact digital copy or replica of a database or other information object (due to hardware and software changes) and still maintain the compatibility of the object with the new generation of technology. Some content, functionality, or structure may be lost during migration. Migration of the information to a new standard or application program is time consuming, costly, and much more complex than simple refreshing.

Preservation metadata
This involves creating a snapshot of the document, not merely of its content, but also of its surrounding technical environment: operating system, peripherals, browsers, etc. The document and its accompanying “preservation metadata” are stored in a format that will allow access by subsequent generations of software and hardware.
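
As a rough illustration of such a snapshot, the sketch below collects a few facts about the technical environment and serialises them as JSON to be stored alongside the document. The field names and the functions `build_preservation_metadata` and `to_sidecar_json` are hypothetical, and the choice of JSON is an assumption; the article does not prescribe a metadata schema or storage format.

```python
import json
import platform
from datetime import datetime, timezone


def build_preservation_metadata(filename: str, file_format: str, created_with: str) -> dict:
    """Describe a document together with the environment it was captured in.

    All keys are illustrative; a real programme would follow an agreed schema.
    """
    return {
        "filename": filename,
        "format": file_format,
        "created_with": created_with,            # application name and version
        "operating_system": platform.system(),   # e.g. the OS family name
        "os_release": platform.release(),
        "machine": platform.machine(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def to_sidecar_json(metadata: dict) -> str:
    """Serialise the metadata as plain-text JSON for storage beside the record."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

A plain-text serialisation is used here on the assumption that text is more likely than a proprietary binary format to remain readable by later generations of software.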
To provide for ongoing user accessibility, the system will have the ability to develop new interfaces or to allow any kind of user interface for discovery as technology changes.

Betts (1999) mentioned that besides proper stewardship of storage media, companies need to have a record-management team that maintains a central repository of metadata (a sort of catalog of the company’s data and formats) and schedules the conversion to new media. The trick is to schedule conversions well before the particular storage medium is expected to deteriorate. Managers should log the software and hardware versions required to read or manipulate the data, and keep an eye out for discontinued models. Companies need to keep an “application archive”, e.g. keep a copy of the old applications so they can read the legacy data or they may use a hardware emulator.

Coombs (1999) states that successful management of electronic records requires a well-documented administrative context and information describing how the document is structured and used. This is best done at the time of creation using index terms (metadata) and classifications. Information about electronic records is as important to manage as are the records themselves.
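
The catalog and conversion schedule that Betts describes can be sketched as a small data structure plus a check for holdings that are approaching the end of their expected media life. Everything named here is an assumption made for illustration: the `Holding` fields, the `due_for_conversion` function, the two-year safety margin, and the expected-life figures, which would have to come from the records team rather than from this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Holding:
    """One catalog entry: what is stored, on what, and what is needed to read it."""
    identifier: str
    medium: str
    written_on: date
    expected_life_years: int   # assumed figure supplied by the records team
    required_software: str     # logged so discontinued products can be spotted


def due_for_conversion(holdings, today, safety_margin_years=2):
    """Return holdings whose media age is within the safety margin of expected life."""
    due = []
    for holding in holdings:
        age_years = (today - holding.written_on).days / 365.25
        if age_years >= holding.expected_life_years - safety_margin_years:
            due.append(holding)
    return due


catalog = [
    Holding("trial-data-1995", "magnetic tape", date(1995, 1, 1), 10, "legacy DBMS v3"),
    Holding("minutes-2001", "CD-R", date(2001, 1, 1), 10, "word processor v8"),
]
overdue = due_for_conversion(catalog, today=date(2003, 6, 1))
```

With these assumed values, only the 1995 tape is flagged, since it is more than eight years old against an assumed ten-year life. The safety margin reflects the advice to schedule conversions well before the medium is expected to deteriorate.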

Integrated archival approach
Lawrence (2001a,b) feels a promising solution is to use an integrated archival approach. This involves using digital and analog systems together to gain the benefits of both. Digital allows information to be kept accessible and sharable. Analog provides insurers with the ability to keep digital documents and files available for decades upon decades, transcending technological advances without concern for alteration or loss of readability.

New perspectives on preserving electronic records
All these approaches to preserving electronic records have a common objective of solving technological problems related to the passage of time. None of them actually focuses on the objective of preserving records. Logically, archival principles and objectives should dictate the requirements that technical solutions must satisfy. Archival requirements for preservation must be based on the conception of electronic records, not as a product of computer applications, but as the instruments and by-products of the practical activity of a record creator. For success in the preservation of electronic records, it is not important whether they remain true to some given technological materialisation, but whether they continue to provide authentic evidence of the activities in which they were created. The archival profession needs to determine the specific requirements for the preservation of different types of records, and also to guarantee respect for provenance and the integrity of archival fonds over time. There must be a concerted effort to delineate specific archival requirements for preserving authentic electronic records. Strictly speaking, it is not possible to preserve electronic records; it is only possible to maintain the ability to reproduce electronic records. It is always necessary to retrieve from storage the binary digits that make up the record and process them through some software for delivery or presentation.
There is a shift in priority in preservation of electronic records from their storage over time to the integral processes of putting the records into archival storage, getting them out of storage, and delivering them to future researchers. The recognition that electronic records must inevitably be reproduced accentuates the importance of being able to demonstrate the integrity and authenticity of the records.

New partnerships
Some archivists and record managers need to form partnerships with other information professionals for effective management of electronic records. The most valuable partnerships are between information professionals and the decision support personnel, system analysts and internal auditors. Working with these partnerships is an effective strategy for inserting the archives/records management program into the mainstream process of designing, analysing, and modifying electronic information systems.

New skills
According to Bantin (2001), the archival profession needs to add some new skills to its “tool kit” to be effective in a world of increasingly automated records. In addition to a solid grounding in basic archival principles and techniques, new and different managerial and technical skills are required. The new skill sets can be grouped into three basic categories:
(1) Basic knowledge of automated systems and how they process data: first, good working knowledge of the most prevalent systems presently being employed in most organisations, i.e. transaction processing systems, database management systems, management information systems, decision support systems, data warehouses, and electronic document management systems; and second, thorough understanding of all metadata systems such as data dictionaries, information resource dictionary systems, and transaction logs.
(2) Information system analysis and design: ability to create conceptual models for representing records and system requirements such as the business process models.
(3) Management skills to translate this knowledge into a strategic plan: for example, translating a set of goals and objectives into a realistic and effective implementation project.

Projects for preservation of electronic records

InterPARES project
The broad goal of the InterPARES project is to develop the theoretical and methodological knowledge essential for the permanent preservation of electronic records, and, on the basis of this knowledge, to formulate models, policies, strategies and standards to ensure their preservation in and over time. The InterPARES project is guided by the principles of archival science and by diplomatics.

Web document digital archive project
OCLC is working with a number of other organisations to develop a digital archive to preserve Web published, electronic-only documents.
The Web Document Digital Archive (WDDA) project is intended to help meet libraries’ basic needs for identification, selection, capture, description, preservation and access to documents that would not otherwise be available in the future. The WDDA extends the digital preservation techniques to Web content (O’Leary, 2001). WDDA is a program to develop technologies and standards to preserve and use Web content that must last forever. It uses the emerging standard called the Open Archival Information System (OAIS) reference model, which is on track to be an ISO standard. It will be able to handle rich media and may eventually be adaptable to intranet content, and possibly even to print content that has been digitized.

Betts (1999) observes that OAIS is an information management architecture that addresses the basic problem of continuing change in technology over a period of time. The architecture postulates that archival information systems should be independent of the particular technology used to implement them at any time. That is, an archival information system should be built in such a way that it is possible to replace any component of hardware or software used in the
system with a minimal impact on the rest of the system and with no impact on the preserved collections of records.

Conclusion
Other long-term preservation plans include migration of source software versions, migrating records to appropriate data standards, and purchasing “virtual” computers to run obsolete software via emulation protocols. Organisations of all types use the Web for disseminating content, making preservation of electronic records critical. Currently, there is a great emphasis on creating content, but on the back end of the process, there is information that simply cannot afford to be lost. It must be accessible indefinitely, by whatever technology comes along. The digital world provides great opportunities but great risks as well for record professionals. Plenty of evidence exists of what appears to be an international concern that automated systems are out of control and need to be better managed. Record professionals are increasingly viewed as part of the solution, even if their role is not yet truly defined or clearly understood.

Betts (1999) thinks that if we can design system and data standards while thinking of multiple generations, we are in better shape. Stephens (2000a,b) suggests that archivists, record managers, and other information management specialists need to reinvent their professional practices to ensure permanent or long-term preservation of electronic records. Archivists must be willing to experiment with creative combinations of ideas, old and new, as well as be courageous enough to seek out and form partnerships with information specialists whose language and methodologies are presently foreign to them. They must be motivated enough to learn new skills and be committed to developing realistic strategies for managing electronic records (Bantin, 2001). More studies will be required in the area of the World Wide Web as Internet connections are so widely used in most organizations.
The Internet in particular is a potential source of uncontrolled documents, and if appropriate planning is not done, the difficulties of preserving electronic records would greatly increase.

References
Bantin, P.C. (2001), "The Indiana University electronic records project: lessons learned", Information Management Journal, Vol. 35 No.1, pp.16-24.
Betts, M. (1999), "Businesses worry about long-term data losses", Computerworld, Vol. 33 No.38, pp.22-4.
Coombs, P. (1999), "The crisis in electronic government record keeping: a strategy for long-term storage", Library Computing, Vol. 18 No.3, pp.196-202.
ICA (1997), Guide for Managing Electronic Records from an Archival Perspective, International Council on Archives, Paris.
Lawrence, H.A. (2001a), "New perspectives on preserving documents", National Underwriter, Vol. 105 No.23, pp.3-5.
Lawrence, H.A. (2001b), "When analog outpaces digital: new perspectives on preserving documents", National Underwriter, Vol. 105 No.23, pp.14-16.
O’Leary, M. (2001), "Web document digital archive assures constant permanence", Econtent, Vol. 24 No.9, pp.60-1.
Stephens, D.O. (2000a), "Digital preservation: a global information management problem", Information Management Journal, Vol. 34 No.3, pp.68-73.
Stephens, D.O. (2000b), "Digital preservation in the United Kingdom", Information Management Journal, Vol. 34 No.4, pp.68-71.
Tennant, R. (2000), "Selecting collections to digitize", Library Journal, Vol. 125 No.19, pp.26-8.

Further reading
Brand, S., Sanders, T. (1999), "Escaping the digital dark age", Library Journal, Vol. 124 No.2, pp.46-8.
Cambell, C.R. (2001), "NCUA rule gives electronic imaging a boost", Credit Union Magazine, Vol. 67 No.11, pp.46-7.
Duranti, L. (2001), "The InterPARES international research project", Information Management Journal, Vol. 35 No.1, pp.44-50.
Duranti, L., Thibodeau, K. (2001), "The InterPARES international research project", Information Management Journal, Vol. 35 No.1, pp.44-50.
Ewalt, D.M. (2001), "Innovation: preserving data virtually", Informationweek, Vol. 860 pp.20.
Jezzard, H. (2001), "OCLC leads drive to create archive of electronic-only documents", Information World Review, Vol. 172 No.2, pp.2-3.
Kampffmeyer, U., Llewellyn, A. (1999), "Are e-documents legal in Europe?", Document World, Vol. 4 No.3, pp.45-8.
Kranch, D.A. (1998), "Beyond migration: preserving electronic documents with digital tablets", Information Technology and Libraries, Vol. 17 No.3, pp.138-48.
Phillips, J.T. (2001), "Should PDF be used for archiving electronic records?", Information Management Journal, Vol. 35 No.1, pp.60-3.
Rogers, M. (2000), "For business preservation … get it on tape", Computer Technology Review, Vol. 20 No.4, pp.38-41.
Sanett, S., Trace, C. (2000), "InterPARES: securing the future of our electronic records", Information Science, Vol. 27 No.1, pp.24-6.
Skaggs, D. (1999), "Electronic records preservation: an emerging technology to augment electronic documents management and electronic record keeping systems", Inform, Vol. 13 No.1, pp.35-7.
Sullivan, C. (1999), "E-mail archiving: why is it important to keep track of all your e-mails?", Inform, Vol. 13 No.8, pp.24-6.
Tennant, R. (1999), "Time is not on our side: the challenge of preserving digital materials", Library Journal, Vol. 124 No.5, pp.30-1.