INTRODUCTION
At the TEI and XML in Digital Libraries Workshop, held at the Library of Congress in July 1998, several working groups were formed to consider various aspects of the Text Encoding Initiative. Group 1 was charged with recommending best practices for TEI header content and with reviewing the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October to develop a recommended practice guide. Our work drew on similar efforts undertaken in the United Kingdom the previous year under the auspices of the Oxford Text Archive. The following document represents a draft of those recommended practices. It has been submitted to various constituencies for comment.
Definition: The Text Encoding Initiative defines a general-purpose scheme that makes it possible to encode different textual views. It “grew out of technology based textual analysis applications employed by Humanities scholars”, e.g., tracing the use of the word ‘love’ in poems of a particular genre within a specific historical period. Its focus has been on text capture (putting text that already exists in another medium into electronic form) rather than text creation (producing text of which no other copy exists). It assumes that texts, and works on texts, share a common core of textual features.
Encoding: SGML (ISO 8879) and ISO 646 (7-bit character set standard). The scheme provides encodings for different views of a text (simultaneously, if necessary), alternative encodings for the same textual features, and mechanisms for user-defined extensions to the scheme. The TEI Guidelines are not prescriptive: few features are mandatory, although the Guidelines define a core set of tags, and the scheme is extensible. The TEI Header is a set of descriptions prefixed to a TEI-encoded document. It specifies four components:
• file description (a full bibliographic description);
• encoding description (the level of detail of the analysis, i.e. the aim or purpose for which the electronic file was encoded, and the editorial principles and practices used during the encoding of the text);
• text profile (classificatory and contextual information such as the text’s subject matter, the languages and sublanguages used, the situation in which it was produced, and the participants and their setting);
• revision history (the history of changes made during the electronic file’s development).
The header thus contains bibliographic information supporting resource discovery, as well as data management information supporting use of the resource.
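The four-part header structure described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration rather than a validated TEI document: it assumes the element names and namespace of the XML-based TEI P5 Guidelines, the helper names `tei`, `sub`, and `build_header` are hypothetical, and all titles, dates, and descriptions are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: TEI P5 namespace and element names; not validated against a TEI schema.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def sub(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a TEI child element with optional text content and attributes."""
    el = ET.SubElement(parent, tei(tag), attrs)
    if text is not None:
        el.text = text
    return el


def build_header(title: str, author: str, source: str, purpose: str,
                 lang_code: str, lang_name: str,
                 change_date: str, change_note: str) -> ET.Element:
    """Build a hypothetical minimal teiHeader with its four main components."""
    header = ET.Element(tei("teiHeader"))

    # 1. File description: bibliographic description of the file and its source
    file_desc = sub(header, "fileDesc")
    title_stmt = sub(file_desc, "titleStmt")
    sub(title_stmt, "title", title)
    sub(title_stmt, "author", author)
    pub_stmt = sub(file_desc, "publicationStmt")
    sub(pub_stmt, "p", "Unpublished draft (placeholder).")
    source_desc = sub(file_desc, "sourceDesc")
    sub(source_desc, "p", source)

    # 2. Encoding description: purpose for which the file was encoded
    enc_desc = sub(header, "encodingDesc")
    project = sub(enc_desc, "projectDesc")
    sub(project, "p", purpose)

    # 3. Text profile: contextual information such as the language used
    profile = sub(header, "profileDesc")
    lang_usage = sub(profile, "langUsage")
    sub(lang_usage, "language", lang_name, ident=lang_code)

    # 4. Revision history: changes made during the file's development
    rev = sub(header, "revisionDesc")
    sub(rev, "change", change_note, when=change_date)

    return header


if __name__ == "__main__":
    h = build_header("A Sample Poem", "Anonymous",
                     "Transcribed from a printed edition.",
                     "Demonstration of header structure.",
                     "en", "English", "1998-10-01", "Header created.")
    print(ET.tostring(h, encoding="unicode"))
```

A real header would usually carry much more detail in each component; the point here is only that the four parts sit side by side as children of `teiHeader`.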
http://libraries.mit.edu/guides/subjects/metadata/standards/tei.html
HISTORY
The TEI was established in 1987 to develop, maintain, and promulgate hardware- and software-independent methods for encoding humanities data in electronic form. Over nearly three decades the TEI has been extraordinarily successful in achieving this objective, and it is now widely used by scholarly projects and libraries around the world. Although a comprehensive history of the TEI has not yet been written, all known documentary resources about the TEI are stored in the TEI Archive. If you (or others you know) have electronic copies of any original TEI documents not available there, please get in touch. The archive of the TEI-L discussion list is a rich resource for historical information, as is the archive of the now-defunct TEI-TECH mailing list, which can be downloaded in its entirety.
Origins of the TEI
When the Text Encoding Initiative (TEI) was originally established, scholarly projects and libraries attempting to take advantage of digital technology faced what seemed an overwhelming obstacle to creating sustainable and shareable archives and tools: the proliferation of systems for representing textual material. These systems were almost always incompatible, often poorly designed, and multiplying at nearly the same rate as the electronic text projects themselves. This situation kept computers from reaching their full potential to support humanistic inquiry by erecting barriers to access, creating new problems for preservation, making the sharing of data (and theories) difficult, and making the development of common tools impractical. Part of the problem was simply a lack of opportunity for sustained communication and coordination, but there were more systemic forces at work as well. Longevity and re-usability were clearly not high on the priority lists of software vendors and electronic publishers, and proprietary formats were often part of a business strategy that might benefit a particular company, but at the expense of the broader scholarly and cultural community. At the end of the eighties there was a real concern that the entrepreneurial forces which (then as now) drive information technology forward would impede integration through a proliferation of mutually incompatible technical standards.
In November 1987 a meeting at Vassar College was convened to address these problems. Sponsored by the Association for Computers and the Humanities and funded by the National Endowment for the Humanities, it brought together a diverse group of scholars from many disciplines, representing leading professional societies, libraries, archives, and projects in a number of countries in Europe, North America, and Asia. At this meeting the intellectual foundation for the Text Encoding Initiative was articulated. The organization of the actual work of developing the TEI Guidelines was then undertaken by the three TEI sponsoring organizations: the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics. A Steering Committee was organized from representatives of the sponsoring organizations, and an Advisory Board of delegates from various professional societies was formed. To lead the actual work, two editors were chosen and four working committees appointed. By the end of 1989 well over 50 scholars were directly involved, and the effort was growing rapidly. The initial phase resulted in the release of the first draft (known as "P1") of the Guidelines in June 1990. A second phase, involving an additional 15 working groups making revisions and extensions, began immediately and released its results throughout 1990–1993. Then, after another round of revisions, extensions, and supplements, the first official version of the Guidelines ("P3") was released in May 1994. Early in this process a number of leading humanities textbase projects adopted the Guidelines as their encoding scheme, while the Guidelines were still very much a moving target of rapidly changing drafts, identifying problems and needs and contributing proposed solutions.
In addition, workshops and seminars were conducted to introduce the wider community to the Guidelines and ensure a steady source of experience to support continuing development. As more scholars became acquainted with the Guidelines, comments, corrections, and requests for extensions arrived from around the world. In the end there were nearly 200 scholars from many disciplines, professions, and countries in the core group that was developing the TEI Guidelines.
The TEI Consortium
In January 1999, the University of Virginia and the University of Bergen (Norway) presented a proposal to the TEI Executive Committee for the creation of an international membership organization, to be known as the TEI Consortium, which would maintain, continue developing, and promote the TEI. The TEI Executive Committee accepted this proposal, and shortly thereafter Virginia and Bergen added two other host institutions with longstanding ties to the TEI: Brown University and Oxford University. This group then formulated an Agreement to Establish a Consortium for the Maintenance of the Text Encoding Initiative. On the basis of this agreement, a transition group comprising representatives of the three original sponsoring organizations of the TEI (as custodians of rights in the TEI) and of the incoming host organizations set about drafting and incorporating the TEI Consortium during 2000. Incorporation was completed in December 2000, and the first Board members took office in January 2001. The goal of establishing the TEI Consortium was to provide a permanent home for the TEI as a democratically constituted, academically and economically independent, self-sustaining, non-profit organization. In addition, the TEI Consortium was intended to foster a broad-based user community with sustained involvement in the future development and widespread use of the TEI Guidelines. In both of these goals the creation of the Consortium has proven a positive step. Inasmuch as the original goal of the TEI was to promote collaborative research on electronic texts by making the encoding system no longer an obstacle to such work, the Consortium's efforts are similarly directed towards making the TEI encoding system as effective a tool as possible for creating, archiving, and sharing textual data. For its members, the TEI Consortium provides services to assist them in the creation and use of digital resources, and to help them stay abreast of rapidly changing technologies and practices.
Following the establishment of the TEI Consortium, a critical priority was the release of an XML version of the TEI Guidelines, updating P3 to enable users to work with the emerging XML toolset. The P4 version of the Guidelines was published in June 2002. It was essentially an XML version of P3, making no substantive changes to the constraints expressed in the schemas apart from those necessitated by the shift to XML, and correcting only corrigible errors identified in the prose of the P3 Guidelines. However, given that P3 had by this time been in steady use since 1994, it was clear that a substantial revision of its content was necessary, and work began immediately on the P5 version of the Guidelines. This was planned as a thorough overhaul, involving a public call for features and new development in a set of crucial areas including character encoding, graphics, manuscript description, standoff markup, and the language in which the TEI Guidelines themselves are written. The P5 version of the Guidelines is scheduled to be released at the end of 2007.
OBJECTIVE
1) Review notes and documents prepared by the Manuscript Description work group concerning collation.
2) Review the needs and practices of those parts of the TEI community (and relevant parts of the potential TEI community: i.e. those who would use the TEI if it included provision for this kind of encoding) likely to use facilities for encoding collation and physical document structure.
3) Propose a detailed work plan to improve and extend the recommendations currently provided by TEI P4 in these areas. The work plan will be determined by agreement of the working group but is expected to address at least the following:
• provision for encoding basic structural information about each page in the document (i.e. its identification with respect to the collation of the entire document), this information being associated directly with the individual page;
• provision for encoding a summary of structural information about the document as a whole (i.e. an equivalent of a collational formula, encoded in the TEI header);
• provision for several types of commentary on the physical document structure (e.g. information, both structured and unstructured, such as measurements, identification, and description of features of paper or typography; summaries of printing history; identification of cancels, etc.);
• provision for several types of derived analytical perspectives on the physical document structure (e.g. reconstructions of individual formes, bifolia, and other higher-order structures) using stand-off markup (e.g. <join>), and provision for where this information should be located within the encoded document;
• in concert with the Manuscript Description work group, harmonization of the treatment of collation and physical document structure for printed books and manuscripts, at least to ensure that no redundant or incompatible recommendations are made in either section of the Guidelines.
4) Respond to comments on relevant other work that may be routed to this work group by the editors.
FUNCTION
1) A TEI Header can serve many publics. Headers can be created in a text center and reflect the center's standards, or they can serve as the basis for other types of metadata records produced by other agencies. Headers can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays.
2) In addition, a header may describe a collection of documents, a single item, or a portion of an item. Variances in TEI Header content can result from different choices about what is being described.
3) A TEI Header may not have a one-to-one correspondence with a MARC record. One TEI Header may correspond to multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents, each with its own header.
4) A TEI Header serves several purposes. It may contain historical background on how the file has been treated. It can extend the information of a classic catalog record. The text center and/or cataloging agency can act as gatekeeper for creators by providing standards for content.
5) Does the TEI Header act as the electronic title page or as a catalog record? Is it integral to the document it describes, or independent of it? Depending on the community being served, the content of TEI elements will reflect the interests of that community. Nonetheless, it is possible to describe a set of "best practices" that will produce compatible content while accommodating this variety of purposes. Compatibility of content makes results easier to understand when information about assorted items is displayed as a set of search results, a contents list, or an index, and it allows for more reliable conversion of content from TEI tags to elements of other metadata sets when such conversion seems advisable.
6) It is a traditional practice of librarianship to agree upon where in a document, and in what order of preference, one should look to identify the title, author, etc., of that document. This permits a certain consistency in terminology and allows for a certain amount of authentication of content. We recommend the following preferences to those who create headers and to those who attempt to use headers to create traditional catalog records that are compliant with AACR2 and ISBD(ER) rules.
7) As a member of the academic community, the header creator/editor has a responsibility to verify, whenever humanly possible, the intellectual source of an electronic document that presents itself without any information regarding its source or authorship.
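Point 5 notes that compatible header content makes it easier to convert TEI tags into elements of other metadata sets. The Python sketch below illustrates such a conversion by mapping a few header fields to Dublin Core element names. The mapping table, the function name `header_to_dc`, and the sample header are hypothetical and deliberately simplified; this is not an official TEI-to-Dublin-Core (or TEI-to-MARC) crosswalk, and it assumes namespaced TEI P5 XML markup.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # assumption: TEI P5 namespace

# Hypothetical, simplified mapping from TEI header paths to Dublin Core element names.
# Not an official crosswalk; a real one would follow local cataloguing practice.
CROSSWALK = [
    ("tei:fileDesc/tei:titleStmt/tei:title", "title"),
    ("tei:fileDesc/tei:titleStmt/tei:author", "creator"),
    ("tei:fileDesc/tei:publicationStmt/tei:publisher", "publisher"),
    ("tei:profileDesc/tei:langUsage/tei:language", "language"),
]


def header_to_dc(header_xml: str) -> dict[str, list[str]]:
    """Extract a flat, Dublin-Core-like record from a teiHeader string."""
    header = ET.fromstring(header_xml)
    record: dict[str, list[str]] = {}
    for path, dc_element in CROSSWALK:
        for el in header.findall(path, NS):
            # Prefer a machine-readable @ident (used on <language>), else the element text.
            value = el.get("ident") or (el.text or "").strip()
            if value:
                record.setdefault(dc_element, []).append(value)
    return record


# Hypothetical sample header used only for illustration.
SAMPLE_HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>A Sample Poem: an electronic edition</title>
      <author>Anonymous</author>
    </titleStmt>
    <publicationStmt><publisher>Hypothetical Text Center</publisher></publicationStmt>
    <sourceDesc><p>Transcribed from a printed edition.</p></sourceDesc>
  </fileDesc>
  <profileDesc><langUsage><language ident="en">English</language></langUsage></profileDesc>
</teiHeader>"""

if __name__ == "__main__":
    print(header_to_dc(SAMPLE_HEADER))
```

Each Dublin Core element maps to a list because a header may name several authors or languages. Divergent header practices (for example, putting the author only in free-text `<p>` elements) would simply drop out of such a mapping, which is one concrete way in which inconsistent content hinders conversion.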
http://wwwpersonal.umich.edu/~jaheim/teiguide.html
BENEFITS
There are several tangible benefits of membership in the TEI Consortium, and the TEI is in the process of developing additional benefits as well. One of the most important benefits, though difficult to quantify, is that support for the TEI helps ensure that this important community standard will continue to be available and supported in the future, and that its development keeps pace with the needs of the text encoding community. Other, more specific, benefits include the following:
1) TEI annual meeting and conference. The TEI annual meeting and conference is a central event in the TEI community and an excellent opportunity to meet other TEI projects and users and learn more about new developments in the TEI world. Registration is free to current members and subscribers.
2) Voting in TEI elections. All TEI member institutions have a vote in TEI elections, which is cast by their designated elector at the TEI annual meeting.
3) Discounts on software. The TEI works to negotiate discounts with software vendors. Currently, TEI members and subscribers are entitled to a 20% discount on a popular XML editor that comes bundled with TEI schemas and stylesheets. Members and subscribers may obtain a discount code by contacting the TEI.
4) Discounts on training and consultation. TEI members and subscribers are entitled to receive discounts from participating institutions on TEI training workshops and consultation.
5) Free printed copy of the TEI Guidelines. All TEI members receive a free copy of each new printed release of the TEI Guidelines.
The TEI continues to explore additional opportunities for membership benefits, such as discounts on vendor rates for digitization services. Any new benefits will be announced on TEI-L and on the TEI website.
http://www.tei-c.org/Membership/benefits.xml?style=printable
CONCLUSION
The above overview will, we hope, demonstrate the comprehensive nature of the TEI Header as a mechanism for documenting electronic texts. The emergence of the electronic text over the past decade has presented librarians and cataloguers with many new challenges. Existing library cataloguing procedures, while inadequate to document all the features of electronic texts properly, were used as a secure foundation onto which additional features directly relevant to the electronic text could be grafted. Chapter Nine of AACR2 (Anglo-American Cataloguing Rules) requires substantial updating and revision, as it assumes that all electronic texts are published through a publishing company and cannot adequately catalogue texts which are published only on the Internet. The TEI Header has proved to be an invaluable tool for those concerned with documenting electronic resources; its prominence in this field is reflected in the increasing number of electronic text centres, libraries, and archives which have adopted its framework. The Oxford Text Archive has found it indispensable as a means of managing its large collection of disparate electronic texts, not only as a mechanism for creating its searchable catalogue, but also as a means of creating other forms of metadata which can communicate with other information systems.
Ironically, it is the very generality and flexibility offered by the TEI Guidelines (P3) on creating a header that have hindered the progress of one of the main goals of the TEI, and the hopes of the electronic text community as a whole: the interoperability and interchangeability of metadata. Unlike the Dublin Core element set, which has a defined set of rules governing its content, the TEI Header has a set of guidelines that allow for widely divergent approaches to header creation. While this is not a major problem for individual texts, or for texts within a single collection, the varied ways in which the guidelines are interpreted and put into practice make interoperability with other systems using TEI Headers more difficult than first imagined. As with the Dublin Core element set, what is required is the wholesale adoption of a mutually acceptable code of practice which header creators could implement.
One final aspect of the TEI Header is a cause of irritation to those creating and managing TEI Headers and texts: the apparent dearth of affordable and user-friendly software aimed specifically at header production. While this has long been a general criticism of SGML applications as a whole, the TEI can in no way be held to blame for this absence, as creating software was not part of the TEI remit. However, it has contributed to the relatively slow uptake and implementation of the TEI Header as the predominant method of providing well-structured metadata to the electronic text community as a whole. Until this situation is adequately resolved, the tools on offer will tend to be either freeware products designed by people within the SGML community itself, or large and very expensive purpose-built SGML-aware products aimed at the commercial market.
http://www.slais.ubc.ca/COURSES/libr500/2000-2001-wt1/www/L_Little-Wolfe/tei.htm
Goals of the TEI:
1. To specify a common interchange format for machine-readable texts.
2. To provide a set of recommendations for encoding new textual materials. The recommendations would specify both what features are to be encoded and how those features are to be represented.
3. To document the major existing encoding schemes, and develop a metalanguage in which to describe them.
(From The ACH/ACL/ALLC Text Encoding Initiative: An Overview, by Susan Hockey.)