XML Schema Neeraj Singh October 2009
© 2008 MindTree Consulting
Agenda
- XML Validation
- Introduction to XML Schema
- Examples / Demo
XML Validation
An Introduction to XML Validation
One of the important innovations of XML is the ability to place preconditions on the data that programs read, and to do this in a simple, declarative way.
XML allows you to say that every Order element must contain exactly one Customer element, that each Customer element must have an id attribute that contains an XML name token, that every ShipTo element must contain one or more Streets, one City, one State, and one Zip, and so forth.
Checking an XML document against this list of conditions is called validation.
Validation is an optional step, but an important one.
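As a sketch of how the Order conditions above could be written down declaratively in W3C XML Schema (introduced later in this deck), consider the schema below. It rests on assumptions the text does not state: ShipTo is taken to be a child of Order that follows Customer, the street element is assumed to be named Street, Customer is assumed to carry only its id, and all text values are treated as plain strings.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly one Customer: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="Customer">
          <xs:complexType>
            <!-- assumption: Customer holds nothing but a required id,
                 which must be an XML name token -->
            <xs:attribute name="id" type="xs:NMTOKEN" use="required"/>
          </xs:complexType>
        </xs:element>
        <!-- assumption: ShipTo is a child of Order that follows Customer -->
        <xs:element name="ShipTo">
          <xs:complexType>
            <xs:sequence>
              <!-- one or more streets (the name Street is assumed) -->
              <xs:element name="Street" type="xs:string" maxOccurs="unbounded"/>
              <xs:element name="City" type="xs:string"/>
              <xs:element name="State" type="xs:string"/>
              <xs:element name="Zip" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validator checking a document against this schema rejects, for example, an Order with two Customer elements or a ShipTo without a City.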
Validation
There are many reasons and opportunities to validate an XML document, for example:
- when we receive one, before importing its data into a legacy system;
- when we have produced or hand-edited one;
- to test the output of an application.
Validation can act as a “firewall”: it can serve as an actual firewall when we receive documents from the external world (as is commonly the case with Web Services and other XML communications), and it can provide checkpoints when we design processes as pipelines of transformations.
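Validation can take place at several levels:
- structural validation, which checks the elements and attributes and how they are nested;
- data validation, which checks the values they contain.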
Schema Languages
There is more than one language in which you can express such validation conditions. Generically, these are called schema languages, and the documents that list the constraints are called schemas.
Different schema languages have different strengths and weaknesses.
The document type definition (DTD) is the only schema language built into most XML parsers and endorsed as a standard part of XML.
The W3C XML Schema Language (schemas for short, though it’s hardly the only schema language) addresses several limitations of DTDs.
Many other schema languages, such as RELAX NG and Schematron, have been invented and can be integrated with your systems.
XML Schema
XML Schema Introduction
W3C XML Schema (Schema) is an XML-based technology that is considered a replacement for DTDs. Just like DTDs, schemas are used to define the constraints of an XML document. Unlike DTDs, however, they provide strong data typing and support for namespaces, and since they are written in XML, they are also extensible.
Advantages of XML Schema over DTDs:
- Schemas are written in XML instance document syntax, using tags, elements, and attributes.
- Schemas are fully namespace aware.
- Schemas can assign data types such as integer and date to elements, and validate documents based not only on the element structure but also on the contents of the elements.
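To make the data-typing advantage concrete, here is a minimal comparison using a hypothetical birthDate element. A DTD can only say that the element holds character data; a schema can also require that the text is a valid date.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A DTD can only declare the content as text:
         <!ELEMENT birthDate (#PCDATA)>
       The declaration below also checks that the text is a date
       in the xs:date lexical form YYYY-MM-DD. -->
  <xs:element name="birthDate" type="xs:date"/>
</xs:schema>

<!-- Valid:   <birthDate>2009-10-01</birthDate>
     Invalid: <birthDate>1st October 2009</birthDate> -->
```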
Schema definition
A schema is defined in a separate file and generally stored with the .xsd extension.
Every schema definition has a schema root element that belongs to the http://www.w3.org/2001/XMLSchema namespace. The schema element can also carry optional attributes, such as targetNamespace and elementFormDefault.
For example, the following root element indicates that the elements used in the schema come from the http://www.w3.org/2001/XMLSchema namespace, bound here to the prefix xs:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- element, attribute and type declarations go here -->
</xs:schema>
Schema linking when the document root element is from the null namespace
Let's start with our first document. It must contain only the "root" element, and this element can contain only text. The element is from the null namespace. Valid document:
<root>aaa</root>
If you want to validate this document with XML Schema, you have to associate a schema document with it. If the root element is from the null namespace, use the xsi:noNamespaceSchemaLocation attribute:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="...">test</root>
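A minimal, self-contained version of this linking could look like the pair below. The file name first.xsd is hypothetical. Note that the xsi prefix must be bound to the XML Schema instance namespace, and that the location is only a hint: a validator may be configured to use a different schema.

```xml
<!-- first.xsd (hypothetical file name) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "root" is declared in no namespace and may contain text only -->
  <xs:element name="root" type="xs:string"/>
</xs:schema>

<!-- Instance document that points to the schema above -->
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="first.xsd">aaa</root>
```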
Schema linking when the document root element is from a particular namespace
Now let's take the same document as in the previous example, but the "root" element must be from a concrete namespace, say "http://foo". Valid document:
<root xmlns="http://foo">aaa</root>
If the root element is from a particular namespace, you associate the schema using the xsi:schemaLocation attribute. Its value is a pair separated by whitespace: first the target namespace, then the URL of the schema file.
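<root xmlns="http://foo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://foo ...">test</root>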
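A matching sketch for the namespaced case, again with a hypothetical file name (foo.xsd). The schema declares its targetNamespace; global element declarations always belong to that namespace, so "root" is only valid when it is in http://foo.

```xml
<!-- foo.xsd (hypothetical file name) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://foo">
  <!-- this global declaration defines {http://foo}root -->
  <xs:element name="root" type="xs:string"/>
</xs:schema>

<!-- Instance document: xsi:schemaLocation holds "namespace location" pairs -->
<root xmlns="http://foo"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://foo foo.xsd">aaa</root>
```

If you need schemas for several namespaces, xsi:schemaLocation can list several such pairs, all separated by whitespace.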
Examples / Demo
01_FirstXMLSchema.xsd Writing your first XML Schema and a valid XML file based on it. This also demonstrates how to link an XML file to an XML schema.
02_FirstNameSpace.xsd This example demonstrates the use of namespaces: if you have an XML document that belongs to a certain namespace, how do you connect it to an XML Schema?
Schema elements
A schema file contains definitions for elements and attributes, as well as data types for elements and attributes. It is also used to define the structure, or content model, of an XML document.
Elements in a schema file can be classified as either simple or complex.
Schema elements: Simple type
A simple type element is an element that cannot contain any attributes or child elements; it can only contain data of the type specified in its declaration. The syntax for defining a simple element is:
<xs:element name="ELEMENT_NAME" type="DATA_TYPE" default="VALUE"/>
or, with a fixed value:
<xs:element name="ELEMENT_NAME" type="DATA_TYPE" fixed="VALUE"/>
where DATA_TYPE is one of the built-in schema data types (such as xs:string, xs:integer, or xs:date) or a user-defined simple type.
Schema elements: Simple type Contd…
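You can also specify a default or a fixed value for an element, using either the default or the fixed attribute (but not both). A default value is used when the element is present but empty; a fixed value is likewise used when the element is empty, and otherwise the element's content must match it. Specifying a fixed or default value is optional.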
An example of a simple type element is: <xs:element name="Author" type="xs:string" default="Whizlabs"/>
All attributes are simple types, so they are defined in the same way that simple elements are defined. For example:
<xs:attribute name="title" type="xs:string" />
Schema data types
All data types in XML Schema derive from anyType; this includes both simple and complex data types. Simple types derive from anySimpleType and can be further classified into built-in primitive types and built-in derived types.
Built-in datatype hierarchy
The figure below is the complete hierarchy diagram from the XML Schema Datatypes Recommendation.
[Figure: built-in datatype hierarchy, with anyType at the root branching into all complex types and the simple types. Legend: ur types, built-in primitive types, built-in derived types, complex types; derived by restriction, derived by list, derived by extension or restriction.]
Schema elements: Complex types
Complex type elements are elements that:
- contain other elements,
- contain attributes,
- are empty (empty elements), or
- contain text together with attributes or child elements.
To define a complex type in a schema, use a complexType element. You can specify the order of occurrence and the number of times an element can occur (cardinality) by using the order and occurrence indicators, respectively.
For example: <xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Name" type="xs:string" /> <xs:element name="Author" type="xs:string" maxOccurs="4"/> <xs:element name="ID" type="xs:string"/> <xs:element name="Price" type="xs:string"/>
In this example, the order indicator is xs:sequence, and the occurrence indicator is maxOccurs in the Author element name.
Slide 16
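For illustration, a Book instance that this declaration accepts might look like the following. All values are hypothetical.

```xml
<!-- Valid: children in the declared order, Author repeated (1 to 4 allowed) -->
<Book>
  <Name>XML Schema</Name>
  <Author>First Author</Author>
  <Author>Second Author</Author>
  <ID>B-001</ID>
  <Price>39.95</Price>
</Book>
<!-- Invalid variants: ID placed before Author, a fifth Author,
     or a missing Price element -->
```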
Schema elements: Complex types (Mixed content)
W3C XML Schema supports mixed content through the mixed attribute of the xs:complexType element. Consider:
<xs:element name="book">
  <xs:complexType mixed="true">
    <xs:all>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:all>
    <xs:attribute name="isbn" type="xs:string"/>
  </xs:complexType>
</xs:element>
It will validate an XML element such as:
<book>
  Funny book by <author>Charles M. Schulz</author>.
  Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all!
</book>
Examples / Demo
07_ComplexType01.xsd Your first complex type, in which an element contains a mixture of child elements. We want the element "root" to contain the elements "aaa", "bbb", and "ccc" in any order, so we use the "all" element.
11_EmptyElementUsingAnyType.xsd Empty element. We want the root element to be named "AAA", to be from the null namespace, and to be empty. The empty element is defined as a "complexType" with a "complexContent" that is a restriction of "anyType" without any elements.
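The demo files are not reproduced in these slides, so here is a hedged sketch of what schemas matching the two descriptions above could look like. They are two separate schema files; the xs:string types of aaa, bbb, and ccc are assumptions.

```xml
<!-- Sketch for 07: "root" holds aaa, bbb and ccc once each, in any order -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:all>
        <xs:element name="aaa" type="xs:string"/>
        <xs:element name="bbb" type="xs:string"/>
        <xs:element name="ccc" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Sketch for 11: "AAA" must be empty -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AAA">
    <xs:complexType>
      <xs:complexContent>
        <!-- restricting anyType without declaring any elements or
             attributes leaves an empty content model -->
        <xs:restriction base="xs:anyType"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With the second schema, <AAA/> and <AAA></AAA> are valid, while <AAA>text</AAA> is not.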
Occurrence indicators
Occurrence indicators specify the number of times an element can occur in an XML document. You specify them with the minOccurs and maxOccurs attributes of the element in the element definition.
As the names suggest, minOccurs specifies the minimum number of times an element can occur in an XML document while maxOccurs specifies the maximum number of times the element can occur.
To allow an element to occur any number of times in an XML document, set its maxOccurs value to unbounded.
The default value for both minOccurs and maxOccurs is 1, which means that by default an element must appear exactly once. Attributes do not use these indicators: an attribute is optional unless it is declared with use="required".
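A compact sketch of the common minOccurs/maxOccurs combinations; Library, Description, Book, and Owner are hypothetical names chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Library">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs="0": optional, at most once (maxOccurs defaults to 1) -->
        <xs:element name="Description" type="xs:string" minOccurs="0"/>
        <!-- maxOccurs="unbounded": one or more (minOccurs defaults to 1) -->
        <xs:element name="Book" type="xs:string" maxOccurs="unbounded"/>
        <!-- no indicators: exactly once -->
        <xs:element name="Owner" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```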
Order indicators
Order indicators define the order in which elements can occur in an XML document. There are three order indicators:
All: If All is the order indicator, the child elements can appear in any order, and each can appear at most once. In XSD 1.0, maxOccurs on each child of xs:all must be 1, while minOccurs can be 0 (optional) or 1 (required, the default).
Sequence: If Sequence is the order indicator, then the elements must appear in the order specified.
Choice: If Choice is the order indicator, one (and only one) of the listed elements must appear in the XML document.
Example: Occurrence and order indicators
<xs:element name="Book">
  <xs:complexType>
    <xs:all>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="ID" type="xs:string"/>
      <xs:element name="Authors" type="authorType"/>
      <xs:element name="Price" type="priceType"/>
    </xs:all>
  </xs:complexType>
</xs:element>
<xs:complexType name="authorType">
  <xs:sequence>
    <xs:element name="Author" type="xs:string" maxOccurs="4"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="priceType">
  <xs:choice>
    <xs:element name="dollars" type="xs:double"/>
    <xs:element name="pounds" type="xs:double"/>
  </xs:choice>
</xs:complexType>
The <xs:all> indicator specifies that the Book element, if present, must contain exactly one instance of each of the following four elements: Name, ID, Authors, Price.
The xs:sequence indicator in the authorType declaration specifies that elements of this particular type (Authors element) contain at least one Author element and can contain up to four Author elements.
The xs:choice indicator in the priceType declaration specifies that elements of this particular type (Price element) can contain either a dollars element or a pounds element, but not both.
Restriction
A major advantage of XML Schema is the ability to control the values of XML attributes and elements.
A restriction can be applied to any simple data type: it lets you define your own data type, tailored to your requirements, by constraining the facets available for that simple type.
To achieve this, use the restriction element defined in the schema namespace.
W3C XML Schema defines 12 facets for simple data types: enumeration, maxExclusive, minExclusive, maxInclusive, minInclusive, maxLength, minLength, pattern, length, whiteSpace, fractionDigits, and totalDigits.
Example: restricting the length of the text node
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Example: removing an attribute from the element
To remove the note attribute from the element title, we declare note to be prohibited in the list of attributes in the restriction:
<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" use="prohibited"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
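Both title examples restrict a base type, tokenWithLangAndNote, that the slides do not define. A plausible definition, assumed here rather than taken from the source, is a complex type with xs:token content extended by the two optional attributes:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed base type: token text plus optional lang and note attributes -->
  <xs:complexType name="tokenWithLangAndNote">
    <xs:simpleContent>
      <xs:extension base="xs:token">
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
```

Under this assumption, <title lang="en">XML Schema</title> satisfies both restricted title declarations, while a title carrying a note attribute is rejected by the second one.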
Facets
enumeration - The value of the data type is constrained to a specific set of values.
<xs:simpleType name="Subjects">
  <xs:restriction base="xs:string">
    <xs:enumeration value="History"/>
    <xs:enumeration value="Geology"/>
    <xs:enumeration value="Biology"/>
  </xs:restriction>
</xs:simpleType>
maxExclusive - The numeric value of the data type is less than the value specified.
minExclusive - The numeric value of the data type is greater than the value specified.
<xs:simpleType name="id">
  <xs:restriction base="xs:integer">
    <xs:maxExclusive value="101"/>
    <xs:minExclusive value="1"/>
  </xs:restriction>
</xs:simpleType>
Facets Contd…
maxInclusive - Numeric value of the data type is less than or equal to the value specified.
minInclusive - The numeric value of the data type is greater than or equal to the value specified.
<xs:simpleType name="id">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="100"/>
  </xs:restriction>
</xs:simpleType>
maxLength - Specifies the maximum number of characters or list items allowed in the value.
minLength - Specifies the minimum number of characters or list items allowed in the value.
pattern - The value of the data type is constrained to a specific sequence of characters, expressed as a regular expression. The pattern must match the whole value.
<xs:simpleType name="nameFormat">
  <xs:restriction base="xs:string">
    <xs:minLength value="3"/>
    <xs:maxLength value="10"/>
    <xs:pattern value="[a-z][A-Z]*"/>
  </xs:restriction>
</xs:simpleType>
Facets Contd…
length - Specifies the exact number of characters or list items allowed in the value.
<xs:simpleType name="secretCode">
  <xs:restriction base="xs:string">
    <xs:length value="5"/>
  </xs:restriction>
</xs:simpleType>
whiteSpace - Specifies how white space is handled. The allowed values for the value attribute are preserve, replace, and collapse.
<xs:simpleType name="FirstName">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="preserve"/>
  </xs:restriction>
</xs:simpleType>
fractionDigits - Constrains the maximum number of decimal places allowed in the value.
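totalDigits - Specifies the maximum total number of digits allowed in the value. Both totalDigits and fractionDigits apply only to xs:decimal and the types derived from it (not to xs:float or xs:double).
<xs:simpleType name="reducedPrice">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="4"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>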
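As a worked illustration, assume reducedPrice restricts xs:decimal with totalDigits="4" and fractionDigits="2", as these facets require. A hypothetical price element of that type then accepts at most 99.99:

```xml
<!-- hypothetical element; assumes reducedPrice is an xs:decimal
     restricted with totalDigits="4" and fractionDigits="2" -->
<xs:element name="price" type="reducedPrice"/>

<!-- Valid:   <price>12.34</price>  <price>99.99</price>  <price>5</price>
     Invalid: <price>123.45</price>  (5 digits in total)
              <price>1.234</price>   (3 fraction digits) -->
```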
Multiple restrictions using ‘union’
The union is applied to the two embedded simple types so that values from both data types are allowed: the new data type accepts a string of exactly ten digits, as well as the values of an enumeration with two possible values (TBD and NA).
<xs:simpleType name="isbnType">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:NMTOKEN">
        <xs:enumeration value="TBD"/>
        <xs:enumeration value="NA"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
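To see what the union accepts, here is a hypothetical isbn element of type isbnType with a few sample values:

```xml
<!-- hypothetical element using the isbnType union -->
<xs:element name="isbn" type="isbnType"/>

<!-- Valid:   <isbn>1234567890</isbn>  (matches the ten-digit pattern)
              <isbn>TBD</isbn>  <isbn>NA</isbn>  (enumerated values)
     Invalid: <isbn>123456789</isbn>  (only nine digits)
              <isbn>N/A</isbn>  (neither ten digits nor an enumerated value) -->
```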
Examples / Demo
03_RestrictSimpleType01.xsd This example restricts a simple type: the value of the element "root" must be an integer less than 25.
04_RestrictUsingUnion01.xsd We want the element "root" to be from the range 0-100 or 300-400 (including the border values). We will make a union from two intervals.
06_RestrictUnionEnum02.xsd The element contains a string from an enumerated set: the element "root" must have the value "N/A" or "#REF!".
14_RestrictionOfSequence.xsd The schema declares the type "AAA", which can contain up to two sequences of "x" and "y" elements. Then we declare the type "BBB", which is a restriction of the type "AAA" and contains only one x-y sequence.
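A hedged sketch of how the first two demos could be written, following their descriptions; these are two separate schema files, not the actual demo sources.

```xml
<!-- Sketch for 03: "root" must be an integer less than 25 -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxExclusive value="25"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

<!-- Sketch for 04: "root" is in 0-100 or 300-400, bounds included -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:simpleType>
      <xs:union>
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="100"/>
          </xs:restriction>
        </xs:simpleType>
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="300"/>
            <xs:maxInclusive value="400"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:union>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

With the second schema, 100 and 300 are accepted while 250 is rejected.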
Extension
The extension element defines complex types that derive from other complex or simple types.
If the base type is a simple type, then the complex type can only add attributes. If the base type is a complex type, then it is possible to add attributes and elements.
To derive from a complex type, you have to use the complexContent element in conjunction with the base attribute of the extension element.
Extensions are particularly useful when you need to reuse complex element definitions in other complex element definitions.
For example, it is possible to define a Name element that contains two child elements (First and Last) and then reuse it in other complex element definitions.
An example of extensions
<xs:complexType name="Name">
<xs:complexType name="Student">
<xs:sequence> <xs:element name="First"/> <xs:element name="Last"/>
<xs:complexContent> <xs:extension base="Name"> <xs:sequence> <xs:element name="school" type="xs:string"/> <xs:element name="year" type="xs:string"/>
<xs:complexType name="Customer"> <xs:complexContent> <xs:extension base="Name">
<xs:sequence> <xs:element name="phone" type="xs:string"/>
Slide 31
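To show the effect of the extension on instance documents, assume a hypothetical global declaration <xs:element name="student" type="Student"/>. The content inherited from Name comes first, followed by the elements the extension adds; the values below are invented.

```xml
<student>
  <!-- inherited from the base type Name -->
  <First>Asha</First>
  <Last>Rao</Last>
  <!-- appended by the extension in Student -->
  <school>Example School</school>
  <year>2009</year>
</student>
```

Putting school before First would be invalid, because the derived content model is the base sequence followed by the added sequence.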
Examples / Demo
12_ExtensionOfSequence.xsd Extension of a sequence. When we extend a complexType that contains a sequence A with a sequence B, sequence B is appended to sequence A.
Groups
W3C XML Schema also allows the definition of groups of elements and attributes. These groups are not datatypes but containers holding a set of elements or attributes that can be used to describe complex types.
<xs:group name="mainBookElements">
  <xs:sequence>
    <xs:element name="title" type="nameType"/>
    <xs:element name="author" type="nameType"/>
  </xs:sequence>
</xs:group>
<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="isbn" type="isbnType" use="required"/>
  <xs:attribute name="available" type="xs:string"/>
</xs:attributeGroup>
<xs:complexType name="bookType">
  <xs:sequence>
    <xs:group ref="mainBookElements"/>
    <xs:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attributeGroup ref="bookAttributes"/>
</xs:complexType>
Examples / Demo
08_AttributeGroup01.xsd Defining a group of attributes. Suppose we want to define a group of common attributes that will be reused. The root element is named "root"; it must contain the "aaa" and "bbb" elements, and these elements must have the attributes "x" and "y".
12_SequenceChoiceGroup.xsd An element that contains two "patterns" (sequences) in any order. We want the root element to be named "AAA", to be from the null namespace, and to contain two patterns in any order. The first pattern is a sequence of "BBB" and "CCC" elements; the second is a sequence of "XXX" and "YYY" elements. The "choice" element allows one of two cases: either the sequence "myFirstSequence"-"mySecondSequence" or "mySecondSequence"-"myFirstSequence".
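A sketch of how the second demo could be expressed with named groups, following its description; the group names come from the text, while the xs:string element types are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AAA">
    <xs:complexType>
      <!-- either first-then-second or second-then-first -->
      <xs:choice>
        <xs:sequence>
          <xs:group ref="myFirstSequence"/>
          <xs:group ref="mySecondSequence"/>
        </xs:sequence>
        <xs:sequence>
          <xs:group ref="mySecondSequence"/>
          <xs:group ref="myFirstSequence"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <xs:group name="myFirstSequence">
    <xs:sequence>
      <xs:element name="BBB" type="xs:string"/>
      <xs:element name="CCC" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <xs:group name="mySecondSequence">
    <xs:sequence>
      <xs:element name="XXX" type="xs:string"/>
      <xs:element name="YYY" type="xs:string"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
```

The two branches start with different elements (BBB or XXX), so a validator can always tell which branch applies. This keeps the content model deterministic, as XML Schema requires.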
List Datatypes
List datatypes are special cases in which a structure is defined within the content of a single attribute or element.
IDREFS, ENTITIES, and NMTOKENS are predefined list datatypes.
As these three datatypes show, all list datatypes must be whitespace-separated; no other separator is accepted.
A list datatype can be defined by reference to an existing type through the itemType attribute:
<xs:simpleType name="integerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>
A list datatype can also be defined by embedding an xs:simpleType element:
<xs:simpleType name="myIntegerList">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>
This datatype can be used to define attributes or elements that accept a whitespace-separated list of integers smaller than or equal to 100, such as "1 -25000 100".
Examples / Demo
09_ListDataType01.xsd Attribute contains a list of values. Now, we want the "root" element to have attribute "xyz", which contains a list of three integers. We will define a general list (element "list") of integers and then restrict it (element "restriction") to have exact length (element "length") of three items.
10_ListDataType02.xsd Element contains a list of values. Now, we want the "root" element to contain a list of three integers. We will define a general list (element "list") of integers and then restrict it (element "restriction") to have exact length (element "length") of three items.
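A hedged sketch of the approach both list demos describe: a general list of integers, restricted to exactly three items. The type names intList and threeInts are hypothetical, and making xyz required is an assumption. For list types, the length facet counts items, not characters.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a general list of integers -->
  <xs:simpleType name="intList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
  <!-- restricted to exactly three items -->
  <xs:simpleType name="threeInts">
    <xs:restriction base="intList">
      <xs:length value="3"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- 09: the list as an attribute; for 10, declare
       <xs:element name="root" type="threeInts"/> instead -->
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="xyz" type="threeInts" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Valid: <root xyz="1 2 3"/>    Invalid: <root xyz="1 2"/> -->
```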
Examples / Demo
More Examples
15_CustomSimpleType.xsd Definition of a custom simpleType: the temperature must be greater than -273.15. The element "T" must contain a number greater than -273.15. We will define a custom type for temperature named "Temperature" and require the element "T" to be of that type.
16_PatternElement.xsd The string must contain an e-mail address. The element "A" must contain an e-mail address. We will define a custom type that checks, at least approximately, the validity of the address, using the "pattern" element to restrict the string with a regular expression.
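A possible sketch of both demos in one schema. The xs:decimal base for Temperature, the type name emailType, and the exact e-mail pattern are assumptions; the pattern is deliberately rough, as the description says.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- 15: assumption: a decimal strictly greater than -273.15 -->
  <xs:simpleType name="Temperature">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="-273.15"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="T" type="Temperature"/>

  <!-- 16: rough check: characters other than "@" and space, one "@",
       then a domain part containing at least one dot.
       Patterns are anchored, so the whole value must match. -->
  <xs:simpleType name="emailType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@ ]+@[^@ ]+\.[^@ ]+"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="A" type="emailType"/>
</xs:schema>
```

With this schema, <T>-273.15</T> is rejected and <T>-273.14</T> accepted; <A>user@example.com</A> passes while <A>user@localhost</A> does not.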
Summary
W3C XML Schema has become the de facto standard for defining the structure of an XML document and for checking the validity of XML documents. Using schema, it is possible to define:
- Elements (simple and complex)
- Attributes
- Facets for XML elements
- The structure of a document (order indicators)
- The allowable number of elements in an XML document (occurrence indicators)
References
- IBM developerWorks (ibm.com/developerWorks): IBM XML certification success, Part 1:
- W3Schools (W3schools.com)
- XML.com (www.Xml.com)
- XML Schema (O'Reilly)
- ZVON XML Schema tutorial: http://www.zvon.org/xxl/XMLSchemaTutorial
Examples used in the presentation are attached here: XML-Schema-Project.zip
Questions
Thank you
XML Technology, Semester 4 SICSR Executive MBA(IT) @ MindTree, Bangalore, India
By Neeraj Singh (toneeraj(AT)gmail(DOT)com)