The Simple API for XML (SAX) Part II
Ethan Cerami
New York University
©Copyright 2003-2004. These slides are based on material from the upcoming book, “XML and Bioinformatics” (Springer-Verlag) by Ethan Cerami. Please email [email protected] for permission to copy.
10/17/08
Simple API for XML (SAX), Part II
Road Map
Validating Documents with SAX
    SAXValidator.java
Introduction to XML Namespaces
    Declaring Namespaces
    Qualified Names
    Default Namespaces
Working with SAX Elements, Attributes and Namespaces
    SAXElementAttribute.java
Validating Documents with SAX
Error Categories
SAX reports three categories of problems. The first two correspond to the error classes defined by the XML 1.0 specification:
Fatal Errors: these are usually errors in well-formedness. The parser must stop normal processing if a fatal error is encountered.
Errors: these are non-fatal errors, usually related to document validity.
Warnings: a catch-all category for other minor problems.
Defaults
By default, the Xerces XML parser (and most other parsers) will check for well-formedness, but they will not automatically check for validity. To check for validity, you must follow three steps:
1. Turn the SAX validation feature on.
2. Implement the ErrorHandler interface.
3. Register your error handler.
Validating Documents
Turn the SAX validation feature on.
    try {
        parser.setFeature("http://xml.org/sax/features/validation", true);
    } catch (SAXNotRecognizedException e) {
        System.out.println("SAX Feature Not Recognized: " + e.getMessage());
    } catch (SAXNotSupportedException e) {
        System.out.println("SAX Feature Not Supported: " + e.getMessage());
    }
Working with SAX Features
The XMLReader interface defines setFeature()/getFeature() methods. Setting a property or feature may throw one of two exceptions:
SAXNotRecognizedException: the requested feature is not recognized.
SAXNotSupportedException: the feature is recognized, but not supported. For example, not all parsers support the validation feature.
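The two exceptions can be seen in a small probe. This is a minimal sketch, not part of the original slides: it assumes the parser bundled with the JDK (obtained through javax.xml.parsers.SAXParserFactory rather than the Xerces class name used elsewhere in these slides), and the class name FeatureProbe and the feature URI under example.org are made up for illustration.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureProbe {

    /** Tries to set a feature and describes the outcome as a short string. */
    public static String trySetFeature(XMLReader reader, String featureUri, boolean value) {
        try {
            reader.setFeature(featureUri, value);
            return "set";
        } catch (SAXNotRecognizedException e) {
            // The parser has never heard of this feature URI.
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            // The parser knows the feature but cannot honor the requested value.
            return "not supported";
        }
    }

    /** Assumption: use the JDK's default SAX parser instead of naming Xerces explicitly. */
    public static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        String validation = "http://xml.org/sax/features/validation";
        System.out.println("validation: " + trySetFeature(reader, validation, true));
        System.out.println("validation is now: " + reader.getFeature(validation));
        // Hypothetical feature URI, used only to provoke SAXNotRecognizedException.
        System.out.println("made-up feature: "
            + trySetFeature(reader, "http://example.org/no-such-feature", true));
    }
}
```

Returning a status string keeps the two failure modes visible to the caller, which is handy when the same code has to run against parsers with different capabilities.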
Validating Documents
Implement the SAX ErrorHandler interface. The ErrorHandler interface defines three methods, one for each of the three error categories:
fatalError()
error()
warning()
Error Handler Implementation
Each of the ErrorHandler methods receives a SAXParseException parameter. The exception encapsulates the error and its location within the document. The ErrorHandler implementation has two main options:
throw the SAXParseException it received (a subclass of SAXException), and thereby stop normal parsing.
record the exception (for example, write it to a log file) and return without throwing. The parser will then continue normal parsing.
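The two options can be compared side by side. The following is a sketch under stated assumptions, not code from the slides: the class names, the inline DTD and the SEQUENCES root element are hypothetical, and the parser is the JDK's default one obtained through SAXParserFactory. The document is well-formed but invalid, because the first SEQUENCE lacks a required id attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorStrategies {

    // Hypothetical document with an inline DTD; the first SEQUENCE has no id.
    static final String INVALID_DOC =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE SEQUENCES [\n"
      + "  <!ELEMENT SEQUENCES (SEQUENCE+)>\n"
      + "  <!ELEMENT SEQUENCE (#PCDATA)>\n"
      + "  <!ATTLIST SEQUENCE id CDATA #REQUIRED>\n"
      + "]>\n"
      + "<SEQUENCES>\n"
      + "  <SEQUENCE>taat</SEQUENCE>\n"
      + "  <SEQUENCE id=\"2\">ggcc</SEQUENCE>\n"
      + "</SEQUENCES>\n";

    /** Option 1: rethrow the exception, which stops parsing at the first validity error. */
    public static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    /** Option 2: record the exception and return, so the parser keeps going. */
    public static class CollectingHandler extends DefaultHandler {
        public final List<SAXParseException> errors = new ArrayList<>();
        public int elementsSeen = 0;

        @Override
        public void error(SAXParseException e) {
            errors.add(e);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
        }
    }

    /** Parses INVALID_DOC with validation on, using the handler for content and errors. */
    static void parse(DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(INVALID_DOC)));
    }

    /** Returns true if the strict handler caused parse() to end with a SAXException. */
    public static boolean strictParseStops() throws Exception {
        try {
            parse(new StrictHandler());
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static CollectingHandler parseCollecting() throws Exception {
        CollectingHandler handler = new CollectingHandler();
        parse(handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Strict handler stopped the parse: " + strictParseStops());
        CollectingHandler collected = parseCollecting();
        System.out.println("Errors recorded: " + collected.errors.size());
        System.out.println("Start tags seen: " + collected.elementsSeen);
    }
}
```

The collecting variant suits tools that should report every validity problem in one run; the strict variant suits pipelines that must never process an invalid document.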
Error Handler Implementation
For example, the following implementation stops parsing when regular errors are encountered:
    /**
     * Receives notification of a recoverable error.
     * Validation Errors are reported here.
     * In this case, validation errors result in SAXExceptions.
     */
    public void error(SAXParseException exception) throws SAXException {
        logError(exception);
        throw exception;
    }
Note: Regardless of your implementation, fatal errors always result in a SAXException being thrown.
Registering your Error Handler
Once you have an implementation of ErrorHandler, you must register it with your parser:
parser.setErrorHandler(errorHandler);
Note: DefaultHandler also provides a default implementation of the ErrorHandler interface: warning() and error() do nothing, while fatalError() throws the exception it receives. A complete example follows on the next few slides.
Example: SAXValidator.java

    package com.oreilly.bioxml.sax;

    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.SAXNotRecognizedException;
    import org.xml.sax.SAXNotSupportedException;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;
    import java.io.IOException;

    /**
     * SAX Validator.
     * Illustrates Basic Error Handling.
     */
    public class SAXValidator extends DefaultHandler {
        private boolean isValid = true;

        // Log errors and warnings

        /**
         * Receives notification of a recoverable error.
         * Validation Errors are reported here.
         */
        public void error(SAXParseException exception) throws SAXException {
            isValid = false;
            reportError("Error", exception);
        }

        /**
         * Receives notification of a warning.
         */
        public void warning(SAXParseException exception) throws SAXException {
            reportError("Warning", exception);
        }

        /**
         * Reports SAXParseException Information
         */
        private void reportError(String errorType, SAXParseException exception) {
            System.out.println(errorType + ": " + exception.getMessage());
            System.out.println(" Line: " + exception.getLineNumber());
            System.out.println(" Column: " + exception.getColumnNumber());
        }

        public boolean isValid() {
            return isValid;
        }

        /**
         * Prints Command Line Usage
         */
        private static void printUsage() {
            System.out.println("usage: SAXValidator xml-file");
            System.exit(0);
        }

        /**
         * Main Method
         */
        public static void main(String[] args) {
            if (args.length != 1) {
                printUsage();
            }
            try {
                SAXValidator errorHandler = new SAXValidator();
                XMLReader parser = XMLReaderFactory.createXMLReader(
                    "org.apache.xerces.parsers.SAXParser");

                // Turn Validation On and Set Error Handler
                turnValidationOn(parser);
                parser.setErrorHandler(errorHandler);
                parser.parse(args[0]);

                // If SAXException has not been thrown,
                // document must be well-formed
                System.out.println("The Document is well-formed.");
                if (errorHandler.isValid()) {
                    System.out.println("The Document is valid.");
                }
            } catch (SAXException e) {
                System.out.println(e.getMessage());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /**
         * Turns the SAX validation feature on.
         */
        private static void turnValidationOn(XMLReader parser) {
            try {
                parser.setFeature("http://xml.org/sax/features/validation", true);
            } catch (SAXNotRecognizedException e) {
                System.out.println("SAX Feature Not Recognized: " + e.getMessage());
            } catch (SAXNotSupportedException e) {
                System.out.println("SAX Feature Not Supported: " + e.getMessage());
            }
        }
    }
Example Invalid Document
<SEQUENCE version="8.30" start="1000" stop="1050">
taatttctcccattttgtaggttatcacttcactctgttgactttcttttg
</SEQUENCE>
<SEQUENCE id="2" version="8.30" start="1000" stop="1050">
taatgcaactaaatccaggcgaagcatttcagcttaaccccgagacttttg
</SEQUENCE>
This document is invalid because I deleted the id attribute for the first sequence element.
Example Output
Error: Attribute "id" is required and must be specified for element type "SEQUENCE".
 Line: 5
 Column: 53
The Document is well-formed.
Introduction to XML Namespaces
The biggest difference between SAX 1.0 and SAX 2.0: support for XML Namespaces. We therefore need to digress for a while to introduce the basics of Namespaces. Attribution: These namespace slides come from the XML Namespaces Tutorial at: http://www.w3schools.com/xml/xml_namespaces
Name Conflicts
Name conflicts can occur. For example, consider these two documents. This XML document carries information in an XHTML table:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
Name Conflicts
This XML document carries information about a table (a piece of furniture):
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
Name Conflicts
If these two XML documents were added together, there would be an element name conflict: both documents contain a <table> element with different content and definition.
To solve the problem, we use XML Namespaces. Namespaces enable us to distinguish elements even if they have the same name. The Namespaces specification was published about a year after the XML 1.0 specification.
Adding Namespaces
To add a namespace, you must specify a namespace attribute:
xmlns:namespace-prefix="namespace"
For example:
This declares a namespace for XHTML:
xmlns:h="http://www.w3.org/TR/html4/"
This declares a namespace for furniture:
xmlns:f="http://www.w3schools.com/furniture"
The namespace value is usually a URL, but it doesn’t have to be.
Qualified Names
Once you have declared a namespace, you specify elements and attributes with Qualified Names:
element-prefix:local-name
The next two slides show complete examples.
Example #1
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
All qualified names that begin with “h” are within the XHTML namespace.
Example #2
<f:table xmlns:f="http://www.w3schools.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
All qualified names that begin with “f” are within the W3Schools Furniture namespace. You can now combine both examples into one document, and you no longer have name conflicts.
Default Namespaces
You can also specify a default namespace like this:
<table xmlns="http://www.w3.org/TR/html4/">
All elements within the <table> element are considered part of the XHTML namespace.
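To see how a parser tells same-named elements apart, the sketch below parses one small document that mixes two prefixes and a default namespace, then lists the namespace URI, local name and qualified name of every start tag. It is an illustration only: the class name NamespaceDemo and the example.org namespace URIs are placeholders, and it assumes the JDK's default parser obtained through SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDemo {

    // Illustrative document; both namespace URIs are placeholders for this sketch.
    static final String DOC =
        "<root xmlns:h=\"http://example.org/xhtml\" xmlns:f=\"http://example.org/furniture\">"
      + "<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>"
      + "<f:table><f:name>African Coffee Table</f:name></f:table>"
      + "<table xmlns=\"http://example.org/xhtml\"><tr/></table>"
      + "</root>";

    /** Parses the XML and returns one "{uri}localName (qName)" entry per start tag. */
    public static List<String> describeElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Parsers created through SAXParserFactory are not namespace-aware by default.
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> seen = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add("{" + uri + "}" + localName + " (" + qName + ")");
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeElements(DOC)) {
            System.out.println(line);
        }
    }
}
```

The h:table and f:table elements share the local name table but carry different namespace URIs, while the unprefixed table and its child tr pick up the URI of the default namespace declared on that table element.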
Working with SAX Elements, Attributes and Namespaces
ContentHandler
Now that we understand Namespaces, we return to the SAX ContentHandler. The example on the next few slides illustrates how to handle elements, attributes and namespaces.
Example: SAXElementAttribute.java

    package com.oreilly.bioxml.sax;

    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.Attributes;
    import org.xml.sax.XMLReader;
    import org.xml.sax.Locator;
    import java.io.IOException;

    /**
     * SAXElementAttribute.
     * Illustrates Elements, Attributes and Namespace Functionality.
     * Also illustrates use of Document Locator object.
     */
    public class SAXElementAttribute extends DefaultHandler {
        private Locator _locator;

        /**
         * Prints out all three name/namespace parameters.
         * Also prints out all attribute information.
         */
        public void startElement(String namespaceURI, String localName,
                String qName, Attributes atts) throws SAXException {
            System.out.println("Start Element: ");
            System.out.println("... Line: " + _locator.getLineNumber());
            System.out.println("... Column: " + _locator.getColumnNumber());
            System.out.println("... Namespace URI: " + namespaceURI);
            System.out.println("... Local Name: " + localName);
            System.out.println("... qName: " + qName);
            for (int i = 0; i < atts.getLength(); i++) {
                System.out.println("> Attribute: ");
                System.out.println("... URI: " + atts.getURI(i));
                System.out.println("... Local Name: " + atts.getLocalName(i));
                System.out.println("... QName: " + atts.getQName(i));
                System.out.println("... Type: " + atts.getType(i));
                System.out.println("... Value: " + atts.getValue(i));
            }
        }

        /**
         * Start Prefix Mapping for XML Namespaces
         */
        public void startPrefixMapping(String prefix, String uri)
                throws SAXException {
            System.out.println("Start Prefix Mapping: ");
            System.out.println("... Prefix: " + prefix);
            System.out.println("... URI: " + uri);
        }

        /**
         * End Prefix Mapping for XML Namespaces
         */
        public void endPrefixMapping(String prefix) throws SAXException {
            System.out.println("End Prefix Mapping: " + prefix);
        }

        /**
         * Stores Document Locator
         */
        public void setDocumentLocator(Locator locator) {
            this._locator = locator;
        }

        /**
         * Prints Command Line Usage
         */
        private static void printUsage() {
            System.out.println("usage: SAXElementAttribute xml-file");
            System.exit(0);
        }

        /**
         * Main Method
         */
        public static void main(String[] args) {
            if (args.length != 1) {
                printUsage();
            }
            try {
                SAXElementAttribute saxHandler = new SAXElementAttribute();
                XMLReader parser = XMLReaderFactory.createXMLReader(
                    "org.apache.xerces.parsers.SAXParser");
                parser.setContentHandler(saxHandler);
                parser.parse(args[0]);
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
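Besides iterating by index as above, the Attributes interface also allows lookup by name. The sketch below is a hedged illustration rather than part of the original example: the class name AttributeLookup and the example.org namespace URI are hypothetical, and the JDK's default parser is assumed. It also shows a point that is easy to miss: an unprefixed attribute is in no namespace, even when its element is covered by a default namespace.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeLookup {

    // Placeholder namespace URI, bound both as the default namespace and to prefix "t".
    static final String NS = "http://example.org/t";

    static final String DOC =
        "<table xmlns=\"" + NS + "\" xmlns:t=\"" + NS + "\" t:width=\"100%\" border=\"1\"/>";

    /** Parses the XML and records several attribute lookups made on the root element. */
    public static Map<String, String> inspect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final Map<String, String> out = new LinkedHashMap<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.put("elementUri", uri);
                // Lookup by namespace URI plus local name.
                out.put("widthByUriAndLocal", atts.getValue(NS, "width"));
                // Lookup by qualified (prefixed) name.
                out.put("widthByQName", atts.getValue("t:width"));
                // The unprefixed attribute has an empty namespace URI.
                out.put("borderUri", atts.getURI(atts.getIndex("border")));
                out.put("borderByEmptyUri", atts.getValue("", "border"));
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> e : inspect(DOC).entrySet()) {
            System.out.println(e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Lookup by URI and local name is the more robust choice in namespace-aware code, because it keeps working when a document author picks a different prefix for the same namespace.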
Sample Document
<xhtml:html xmlns:xhtml='http://www.w3.org/TR/REC-html40'>
  <xhtml:head>
    <xhtml:title>XML and Bioinformatics</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:table xhtml:width="100%">
      <xhtml:tr><xhtml:td>Welcome!</xhtml:td></xhtml:tr>
    </xhtml:table>
  </xhtml:body>
</xhtml:html>
Note: All elements and attributes are within the XHTML namespace.
Sample Output
Start Prefix Mapping:
... Prefix: xhtml
... URI: http://www.w3.org/TR/REC-html40
Start Element:
... Line: 2
... Column: 59
... Namespace URI: http://www.w3.org/TR/REC-html40
... Local Name: html
... qName: xhtml:html
Start Element:
... Line: 3
... Column: 16
... Namespace URI: http://www.w3.org/TR/REC-html40
... Local Name: head
... qName: xhtml:head
… (output continues…)
Summary
To validate an XML document with SAX, you must explicitly turn validation on, then implement and register a SAX ErrorHandler.
XML Namespaces enable you to differentiate elements, even if they have the same names.
The startElement() SAX method passes all namespace information. Review SAXElementAttribute.java for details.