XML for Libraries
Roy Tennant
eScholarship, California Digital Library
escholarship.cdlib.org
Introduction
• Goal: introduce you to XML, explain what it can do in general terms, and highlight particular uses
• Caveat: you will not learn enough to do it without further study
04:24 AM 04:24 AM
Outline
• Introduction to XML
• Serving XML to the Web
• Case Studies
• Tips & Advice
• Resources
Introduction to XML
• Extensible Markup Language
• A method of creating and using tags to identify the structure and contents of a document, not how it should be displayed
• The tags used can be arbitrary or can come from a specification
What it Looks Like
Tennant Roy The Great American Novel It Was Dark and Stormy “I’m scared,” I said.
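A record like the one above might be tagged along these lines. This is only a minimal sketch: every tag name (`book`, `author`, `lastname`, `firstname`, `title`, `chapter`, `heading`, `para`) is an assumption made for illustration, and only the text values come from the slide (with the quotation marks straightened).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names are assumptions; only the text values
# are taken from the slide.
BOOK_XML = """<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter>
    <heading>It Was Dark and Stormy</heading>
    <para>"I'm scared," I said.</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)

# The tags identify what each piece of text is, not how it should look.
print(book.findtext("author/lastname"))
print(book.findtext("title"))
print(book.findtext("chapter/heading"))
```

Because each value is labeled, software can pull out the author or the chapter heading without guessing from position or typography.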
Two Types of XML
• Well-Formed
• Valid
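Well-Formed XML
• Follows general tagging rules:
  – All tags begin and end
    • But an empty element can be minimized to a single self-closing tag instead of a separate start tag and end tag
  – Tag names are case-sensitive, so start and end tags must match exactly (all-lowercase tags are a requirement of XHTML, not of XML itself)
  – All tags are properly nested, so an element opened inside another closes before its parent does (for example, first-name and last-name elements for Mark Twain nested inside an enclosing name element)
  – All attribute values are quoted:
    • <subject scheme="LCSH">Music</subject>
• Usually begins with identification and declaration tags (such as the XML declaration, <?xml version="1.0"?>)
• Software can make sure a document follows these rules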
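The rules above can be checked by any XML parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample tag names (`page`, `br`, `author`, `first`, `last`) are assumptions made for illustration. Only the `subject` example comes from the slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule; the labels describe what is being tested.
samples = {
    "empty element, minimized": "<page><br/></page>",
    "empty element, long form": "<page><br></br></page>",
    "properly nested": "<author><first>Mark</first><last>Twain</last></author>",
    "badly nested": "<author><first>Mark<last></first>Twain</last></author>",
    "case mismatch": "<Author>Mark Twain</author>",
    "quoted attribute": '<subject scheme="LCSH">Music</subject>',
    "unquoted attribute": "<subject scheme=LCSH>Music</subject>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The parser only confirms that the tagging rules are followed. It says nothing about whether the tags themselves are the right ones, which is the job of validation.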
Valid XML
• Uses only specific tags and rules as codified by one of:
  – A document type definition (DTD)
  – A schema definition
• Only the tags listed by the schema or DTD can be used
• Software can take a DTD or schema and verify that a document adheres to the rules
• Editing software can prevent an author from using anything except allowed tags
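The idea behind validation can be shown with a toy check. This is not a DTD or schema validator, and Python's standard library does not include one. It is a much-simplified stand-in in which a hypothetical rule table (`ALLOWED_CHILDREN`) lists each permitted tag and the child tags it may contain. The tag names and the helper `find_violations` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each allowed tag maps to the child tags it may contain.
# A real DTD or schema can express far more (order, repetition, attributes).
ALLOWED_CHILDREN = {
    "book": {"author", "title"},
    "author": set(),
    "title": set(),
}


def find_violations(element, rules=ALLOWED_CHILDREN):
    """Return a list of messages for tags the rule table does not permit."""
    if element.tag not in rules:
        return [f"unknown tag <{element.tag}>"]
    problems = []
    for child in element:
        # A known tag can still be in the wrong place.
        if child.tag in rules and child.tag not in rules[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        problems.extend(find_violations(child, rules))
    return problems


good = ET.fromstring(
    "<book><author>Roy Tennant</author><title>The Great American Novel</title></book>"
)
bad = ET.fromstring(
    "<book><author>Roy Tennant</author><publisher>Unknown</publisher></book>"
)

print(find_violations(good))
print(find_violations(bad))
```

Both sample documents are well-formed. Only the first sticks to the tags the rule table allows, which is the distinction between well-formed and valid.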
Ways to Use XML
• Behind the scenes as a standard and easily transformed format for information
• As a transfer syntax, to exchange information in a machine-parseable form
• As a method of delivery direct to the user (not recommended)
Why is XML Important?
• It is a standard, easily extensible way to encode loosely structured as well as highly structured information
• Due to its easy parseability, software can transform it in countless ways, thereby allowing:
  – Easy migration paths
  – Alternative displays
  – On-the-fly response to user needs
XML vs. Databases (a simplistic formula)
• If your information is…
  – Tightly structured
  – Fixed field length
  – Massive numbers of individual items
• You need a database
• If your information is…
  – Loosely structured
  – Variable field length
  – Massive record size
• You need XML
Serving XML to the Web
• Directly in native form
• Transformed to static HTML
• Transformed to HTML dynamically
Transforming XML: XSLT
• Extensible Stylesheet Language Transformations (XSLT)
• A markup language and programming syntax for processing XML
• Is most often used to:
  – Transform XML to HTML for delivery to standard web clients
  – Transform XML from one set of XML tags to another
  – Transform XML into another syntax/system
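The first use, XML to HTML, can be imitated in a few lines. The sketch below is not XSLT, because Python's standard library has no XSLT processor. It only mimics the idea of pairing each source tag with output markup, as a stylesheet's template rules do. The source tag names, the `TAG_MAP` table, and the `to_html` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup; the tag names are assumptions.
SOURCE = (
    "<book><title>The Great American Novel</title>"
    "<para>It Was Dark and Stormy</para></book>"
)

# Each source tag is paired with the HTML tag that replaces it, loosely like
# an XSLT template rule pairing a match pattern with output markup.
TAG_MAP = {"book": "div", "title": "h1", "para": "p"}


def to_html(element):
    """Recursively copy `element`, renaming tags according to TAG_MAP."""
    # Tags missing from the map fall back to a neutral <span>.
    out = ET.Element(TAG_MAP.get(element.tag, "span"))
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(to_html(child))
    return out


html = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode", method="html")
print(html)
```

Swapping in a different `TAG_MAP` gives the second use, transforming one set of XML tags into another, without touching the source document.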
Required Components for Serving XML to the Web
• An XML-encoded “document”
• An XSLT stylesheet to…
• …transform it to HTML or XHTML:
  – Static
  – Dynamic
• A CSS stylesheet (optional)
XML Web Publishing Software
• Required to:
  – Apply dynamic transformations to XML content
  – Render HTML dynamically for standard web browsers
• Just beginning to be available:
  – Cocoon: http://xml.apache.org/cocoon/
  – AxKit: http://axkit.org/
Case Study: Publishing Books @ the California Digital Library
• Goals:
  – To create highly usable online versions of books
  – To create versions that will migrate easily as technology changes
  – To create an infrastructure that will support dynamic presentations of the same content
Case Study: Publishing Books @ the California Digital Library
• Strategy:
  – Mark up the texts in XML
  – Serve them dynamically using XML web publishing software (currently Cocoon)
  – Create different displays for different purposes, and a mechanism for allowing users to select their preferred view
  – Find and apply an XML-aware search engine
  – Create a method by which users can create their own Adobe Acrobat versions
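The "different displays, user-selected view" idea can be sketched as one XML source rendered by interchangeable functions. This is an illustration only, not how Cocoon or the California Digital Library implements it. The tag names, the two views, and the `render` helper are all assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical book markup; only the text values come from the slides.
BOOK = ET.fromstring("""<book>
  <title>The Great American Novel</title>
  <chapter>
    <heading>It Was Dark and Stormy</heading>
    <para>"I'm scared," I said.</para>
  </chapter>
</book>""")


def text_of(element, path):
    """Return the escaped text found at `path`, or an empty string."""
    return escape(element.findtext(path, default=""), quote=False)


def full_view(book):
    """Render the title, chapter headings, and paragraphs."""
    parts = [f"<h1>{text_of(book, 'title')}</h1>"]
    for chapter in book.findall("chapter"):
        parts.append(f"<h2>{text_of(chapter, 'heading')}</h2>")
        for para in chapter.findall("para"):
            parts.append(f"<p>{escape(para.text or '', quote=False)}</p>")
    return "\n".join(parts)


def contents_view(book):
    """Render only a table of contents built from the chapter headings."""
    items = "".join(
        f"<li>{text_of(chapter, 'heading')}</li>" for chapter in book.findall("chapter")
    )
    return f"<h1>{text_of(book, 'title')}</h1>\n<ul>{items}</ul>"


VIEWS = {"full": full_view, "contents": contents_view}


def render(book, view="full"):
    """Return the HTML for the requested view; unknown views get the full text."""
    return VIEWS.get(view, full_view)(book)


print(render(BOOK, "contents"))
```

Adding a new display means adding one function to `VIEWS`. The marked-up text itself never changes.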
[Diagram: serving an XML document dynamically]
• The web server hands the request (“I want this XML doc…”) to XML publishing software: Cocoon running under Tomcat, or AxKit running under mod_perl
• Information: XML doc
• Transformation: XSLT stylesheet
• Presentation: XHTML document (no display markup)* plus HTML stylesheet (CSS)
* Dynamic document
Case Study: ILL ASAP
[Diagram labels: ILL ASAP, OCLC, Downloaded Requests, XML File, XSL Stylesheet, Local Catalog, Internet Explorer, Printable XHTML File]
Service Tasmania Architecture
Case Study: Univ. of Michigan
Tips and Advice
• Begin transitioning to XML now:
  – XHTML and CSS for web files, XML for static documents with long-term worth
• Do not rely on browser support of XML
• DTDs? We don’t need no stinkin’ DTDs!
• Get on the XML4Lib discussion list: http://sunsite.berkeley.edu/XML4Lib/
• Buy my book!
Resources
• Web sites
• Electronic discussions
• Books
• Magazines and journals
• Individuals