XML SCHEMA CHAPTER 3
Introduction

In the previous chapter we learned about DTDs, the traditional way of validating an XML document, which XML inherited from SGML. Over the years, many people complained to the W3C about the complexity of DTDs and asked for something simpler. In response, the W3C assigned a committee to work on the problem. The committee came up with XML Schemas, a solution that is actually more complex than DTDs. On the other hand, XML Schemas are also far more powerful than DTDs ever were. What can DTDs not provide? Specific data types for attribute values, for one, whereas schemas do support data types. A schema is a set of rules for constraining the structure and articulating the information set of XML documents.
Advantages of Schema over DTDs

XML Schema is based on XML, not on some specialized syntax, so a schema can be parsed and manipulated just like any other XML document.
XML Schemas support a variety of data types (integers, floats, Booleans, dates, strings, and so on).
XML Schemas present an open-ended data model, which allows you to extend vocabularies and establish inheritance relationships between elements without invalidating documents.
XML Schemas support namespace integration, which allows you to associate individual nodes of a document with type declarations in a schema.
XML Schemas support attribute groups, which allow you to logically combine attributes.

One of the original proponents of XML Schemas was Microsoft. Microsoft documentation on XML frequently decried DTDs as being too complex and said that schemas would fix the problem. In fact, the Microsoft implementation of XML Schemas in Internet Explorer (IE) became outdated not long after it was introduced.

XML Schemas in Internet Explorer

As with many other developers, Microsoft got caught basing its software on a relatively early XML specification, which promptly changed. As implemented in IE, Microsoft's schemas are based on the XML-Data note.
Writing XML Schema

The DTD for XML Schema is very straightforward, primarily because XML Schema is a pretty simple vocabulary by most standards. The root element of all XML Schema documents is Schema, which is declared in the DTD as potentially containing three child elements: AttributeType, ElementType and description. In addition to these elements, the XML Schema vocabulary declares several other elements that are used to describe document schemas. The following are the elements that make up the XML Schema vocabulary:
Schema
Serves as the root element for XML Schema documents
datatype
Describes data types for elements and attributes
ElementType
Describes a type of element
element
Identifies an element that can occur within another element type
group
Organizes elements into groups for ordering purposes
AttributeType
Describes a type of attribute
attribute
Identifies an attribute that can occur within an element type
description
Provides documentation for an element or attribute
The Schema Element

The Schema element serves as the root (document) element for XML Schema documents and acts as a container for all other schema content. The Schema element includes two attributes:
name
The name of the schema
xmlns
The namespace for the schema
The name attribute establishes the name of the schema. The xmlns attribute is very important in that it establishes the namespace for the schema. This attribute must be set to urn:schemas-microsoft-com:xml-data in order to use Microsoft's XML Schema implementation.

<Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data">
</Schema>

NOTE Namespaces are used in XML documents to guarantee uniqueness among element and attribute names associated with a given XML vocabulary. Namespaces take the form of URIs, which are often the familiar URLs used to identify resources on the Web.

In addition to specifying the namespace for the schema, it is usually also necessary to specify the namespace for XML Schema data types. The data type namespace is typically assigned to the xmlns:dt attribute and is set to urn:schemas-microsoft-com:datatypes. You must set this namespace in order to use any of the XML Schema data types, such as date, time, int and float.

<Schema name="myschema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

The Schema element can contain child elements of type AttributeType, ElementType and description. The AttributeType and ElementType elements define attribute types and element types.

The ElementType Element

The ElementType element is used to define element types that establish the schema of documents. The ElementType element can contain datatype, element, group, AttributeType, attribute and description child elements. The element element identifies an instance of a child element within the element type; you use the element element to establish the content model for an element type. Attributes for an element type are established using the AttributeType and attribute elements. The AttributeType element defines a type of attribute, while the attribute element identifies an actual attribute of the element type. Any attribute types defined within an ElementType element are considered local to that element type.

The ElementType element includes several attributes for defining the specific parameters of the element type:
name
The name of the element
model
Whether the content model is open or closed
content
The type of content contained within the element
order
The order of the child elements and groups contained within the element
dt:type
The data type of the element
The following are examples of element types defined using the ElementType element:

<ElementType name="name" content="textOnly" dt:type="string"/>
<ElementType name="type" content="textOnly" dt:type="string"/>
<ElementType name="product" content="eltOnly" model="closed" order="seq">
  <element type="name"/>
  <element type="type"/>
</ElementType>
<ElementType name="products" content="eltOnly" model="closed" order="seq">
  <element type="product"/>
</ElementType>

Notice that the name and type elements are first declared using the ElementType element, and then are identified within the content model of the product element using the element element.

The name and model Attributes

The name attribute is used to specify the name of the element type and is a required attribute. This value must be unique among element types within the scope in which it is defined. The model attribute specifies whether the element type adheres to an open or closed content model. An open model allows an element of this type to contain additional elements that aren't declared in the schema, which makes for a very extensible schema; a closed model allows only the declared content. Element types assume an open model by default.

The content Attribute

The content attribute of ElementType is used to establish the type of content contained within the element type. The following are acceptable values for this attribute:
empty
The element type doesn't contain any content
textOnly
The element type can only contain text (if the content model is open, the element type may also contain other unspecified elements)
eltOnly
The element type can only contain the specified child elements
mixed
The element type can contain a mixture of text and specified child elements (if the content model is open, the element type may also contain other unspecified elements)
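As a rough sketch of how the four content values might look side by side, consider the schema below. The schema name and the element names br, note, para and memo are hypothetical and chosen only for illustration.

```xml
<Schema name="contentDemo"
        xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Hypothetical element names, for illustration only. -->

  <!-- empty: no text and no child elements -->
  <ElementType name="br" content="empty"/>

  <!-- textOnly: character data only -->
  <ElementType name="note" content="textOnly"/>

  <!-- mixed: text interleaved with the declared child elements -->
  <ElementType name="para" content="mixed" model="closed">
    <element type="br"/>
    <element type="note"/>
  </ElementType>

  <!-- eltOnly: only the declared child elements, no text -->
  <ElementType name="memo" content="eltOnly" model="closed" order="seq">
    <element type="para" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```

Under these assumptions, a memo may hold one or more para elements but no loose text, while each para may freely mix text with br and note children.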
The order Attribute

The order attribute is used to establish the order and frequency of the group of child elements contained within the element type. The following are acceptable values for this attribute:
one
Only one of a set of elements is allowed
seq
The elements must occur in the specified sequence
many
The elements can occur any number of times in any order
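The three order values might be used as in the following sketch. All element names here (phone, email, first, last, contact, person, contacts) are hypothetical.

```xml
<Schema name="orderDemo"
        xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Hypothetical element names, for illustration only. -->
  <ElementType name="phone" content="textOnly"/>
  <ElementType name="email" content="textOnly"/>
  <ElementType name="first" content="textOnly"/>
  <ElementType name="last" content="textOnly"/>

  <!-- one: either a phone or an email, but not both -->
  <ElementType name="contact" content="eltOnly" order="one">
    <element type="phone"/>
    <element type="email"/>
  </ElementType>

  <!-- seq: first must come before last -->
  <ElementType name="person" content="eltOnly" order="seq">
    <element type="first"/>
    <element type="last"/>
  </ElementType>

  <!-- many: phone and email in any order, any number of times -->
  <ElementType name="contacts" content="eltOnly" order="many">
    <element type="phone"/>
    <element type="email"/>
  </ElementType>
</Schema>
```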
The dt:type Attribute

The dt:type attribute is used to establish the data type of the content contained within the element type. The types allowed in the dt:type attribute match those that are allowed in the datatype element. XML Schema data types are covered later in this chapter.

The element Element

The element element is used to declare an instance of an element within a group or element type. The element element includes three attributes for describing additional information about an element instance:
type
The type of element
minOccurs
The minimum number of times the element must occur
maxOccurs
The maximum number of times the element can occur
The type attribute is used to specify the type of the element. The value assigned to the type attribute must be the name of an element type already declared in the schema.
The minOccurs and maxOccurs attributes are used to establish the number of times an element can occur within a group or element type. Both attributes have default values of 1 in the XML-Data note, which means that an element must occur exactly once by default.

The relationship between the minOccurs and maxOccurs attributes and the number of times an element or group can occur

minOccurs      maxOccurs      Number of times element/group can occur
0              1              0 or 1
1              1              1
0              *              Any number of times
1              *              At least once
>0             *              At least minOccurs times
>maxOccurs     >0             0
Any value      <minOccurs     0

Note The table also applies to the group element, because groups have minOccurs and maxOccurs attributes that serve the same purpose.

The following is an example of the element element used to declare element instances within an element type:
<ElementType name="location" content="textOnly"/>
<ElementType name="comments" content="textOnly"/>
<ElementType name="session" model="closed" content="eltOnly" order="seq">
  <element type="location" minOccurs="1" maxOccurs="1"/>
  <element type="comments" minOccurs="0" maxOccurs="1"/>
</ElementType>

The Group Element

The group element is used to group elements for organizational purposes and for establishing complex content models. A complex content model consists of more than one group of elements. The group element includes three attributes for fine-tuning groups:
order
The order of the child elements contained within the group
minOccurs
The minimum number of times the group must occur
maxOccurs
The maximum number of times the group must occur
The order attribute works exactly like its counterpart in the ElementType element. The following are acceptable values for this attribute:
one
Only one of a set of elements is allowed within the group.
seq
The elements must occur in the specified sequence in the group.
many
The elements can occur any number of times and in any order in the group.
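A group can be sketched as follows. This is a minimal illustration, assuming a session element that always has a location and may optionally carry either a duration or a distance; the schema name and this particular content model are assumptions, not part of the original example.

```xml
<Schema name="groupDemo"
        xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="location" content="textOnly"/>
  <ElementType name="duration" content="textOnly"/>
  <ElementType name="distance" content="textOnly"/>

  <!-- Assumed content model, for illustration only. -->
  <ElementType name="session" content="eltOnly" model="closed" order="seq">
    <element type="location"/>
    <!-- Optional group: at most one of duration or distance -->
    <group order="one" minOccurs="0" maxOccurs="1">
      <element type="duration"/>
      <element type="distance"/>
    </group>
  </ElementType>
</Schema>
```

Here the group carries its own order value, so the enclosing element type can require a sequence while the group offers a choice.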
The minOccurs and maxOccurs attributes play exactly the same role in the group element as they do in the element element: constraining the number of times the group can occur.

The AttributeType Element

The AttributeType element is used to define attribute types for use in elements. Like the ElementType element, the AttributeType element simply defines a type. To actually declare an attribute as part of an element, you must use the attribute element, which references an attribute type. Attribute types may be defined at the top level of a schema document or within an individual element type. This allows you to create either global attributes or attributes that are local to a given scope. Global attributes are handy because they can be used in multiple elements. Local attributes, on the other hand, can be used within a given scope to supersede another attribute of the same name.
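The difference between global and local attribute types might look like the sketch below. The attribute name label and the schema name are hypothetical; the point is only where each AttributeType is placed.

```xml
<Schema name="scopeDemo"
        xmlns="urn:schemas-microsoft-com:xml-data">
  <!-- Global attribute type (hypothetical name): any element type may use it -->
  <AttributeType name="label"/>

  <ElementType name="location" content="textOnly">
    <!-- Refers to the global definition of label -->
    <attribute type="label"/>
  </ElementType>

  <ElementType name="session" content="eltOnly">
    <!-- Local attribute type: supersedes the global label inside session only -->
    <AttributeType name="label" required="yes"/>
    <attribute type="label"/>
    <element type="location"/>
  </ElementType>
</Schema>
```

In this sketch the label attribute is optional on location but required on session, because session sees its own local definition first.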
The AttributeType element includes the following attributes to allow you to fully describe an attribute type:
name
The name of the attribute type
dt:type
The data type of the attribute type
dt:values
The list of possible values for an enumerated attribute; only applicable when dt:type is set to enumeration
default
The default value for the attribute
required
Flag indicating whether the attribute must be provided in the element
The name attribute specifies the name of the attribute type and is a required attribute. This name must be unique among attribute types within a given scope. The dt:type attribute specifies the data type of the attribute. The dt:values attribute is used to specify a list of possible values for enumerated attributes. This attribute is applicable only when dt:type is set to enumeration. The list of enumerated attribute values is specified as a single string with spaces between each possible value.
The following is an example of an enumerated attribute definition:

<AttributeType name="type" dt:type="enumeration" dt:values="running cycling swimming"/>

In this example, the available values that can be assigned to the type attribute are running, cycling and swimming. Any value other than one of these three will be considered an error during validation.

The default attribute of the AttributeType element is used to establish the default value for the attribute type. The following is an example of establishing the default value of an attribute:

<AttributeType name="type"
               dt:type="enumeration"
               dt:values="running cycling swimming"
               default="running"/>

The required attribute is basically a flag that specifies whether the attribute must be provided in any element that uses the attribute type. Acceptable values for the required attribute are yes and no.
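The required flag could be combined with the enumerated type attribute as in this sketch. The schema name and the choice to attach the attribute to a text-only session element are assumptions made for illustration.

```xml
<Schema name="requiredDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- The type attribute must be present on every element that uses it -->
  <AttributeType name="type"
                 dt:type="enumeration"
                 dt:values="running cycling swimming"
                 required="yes"/>

  <!-- Assumed element type, for illustration only -->
  <ElementType name="session" content="textOnly">
    <attribute type="type"/>
  </ElementType>
</Schema>
```

With such a schema, a session element written with type="running" would be expected to validate, while a session element that omits the type attribute, or uses a value outside the three listed, would be expected to fail validation.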
The Attribute Element

The attribute element is used to declare an instance of an attribute for an element type. The attribute element includes three attributes for describing additional information about an attribute instance:
type
The type of the attribute
default
The default value for the attribute
required
Flag indicating whether the attribute must be provided in the element
The type attribute is used to specify the type of the attribute. The value assigned to the type attribute must be the name of an attribute type already declared in the schema. The type attribute is what ties attribute instances to their associated attribute types. The default and required attributes serve the same purposes as their equivalents in the AttributeType element, and they supersede the equivalent attributes if those are set in the attribute type. The following is an example of the attribute element used to declare attribute instances within an element type:

<AttributeType name="date"/>
<AttributeType name="type" dt:type="enumeration" dt:values="running cycling swimming"/>
<ElementType name="session" content="eltOnly" order="seq">
  <element type="duration" minOccurs="1" maxOccurs="1"/>
  <element type="distance" minOccurs="1" maxOccurs="1"/>
  <element type="location" minOccurs="1" maxOccurs="1"/>
  <element type="comments" minOccurs="0" maxOccurs="1"/>
  <attribute type="date"/>
  <attribute type="type" default="running"/>
</ElementType>

In this example, the type and date attributes are first declared using the AttributeType element and then associated with an element type using the attribute element. Notice that the default value of the type attribute is set in the attribute element instead of the AttributeType element.

Note There is no constraint on the order of attributes within an element, but there can be no more than one attribute of a given name per element.
The description Element

The last element used in XML Schema documents is the description element, which simply provides a means of placing a text description within a schema. The description element is a text-only element that is designed for documentation purposes. You can use the description element in any way you choose to provide documentation about an XML Schema construct. The following is an example of how you might add documentation to an element type:
<ElementType name="trainlog" content="eltOnly">
  <description>
    This element type represents a training log consisting of one or more training sessions.
  </description>
  <element type="session" minOccurs="1" maxOccurs="*"/>
</ElementType>

XML Schema Data Types

As you know, XML DTDs offer a limited number of data types, and they are rather primitive. For all practical purposes, XML really only supports a string data type, which is extremely limiting if you're creating structured document schemas. The XML-Data note defines a number of rich data types that can be used to specify familiar kinds of data, such as integers, floating-point numbers, dates, and times, to name a few. As of Internet Explorer 5.0, XML Schema supports all of these data types in elements and hopefully will support them for attributes at some point in the future.

XML Schema data types are referenced from the urn:schemas-microsoft-com:datatypes namespace. To make referencing the data types easier, you must declare this namespace at the document level of your schema documents. The data type namespace is typically assigned to the xmlns:dt attribute, which means that you reference XML Schema data types by preceding them with dt:.

Example

<Schema name="Myschema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

The whole point of declaring the XML Schema data type namespace is so that you can use the data types it supports. The following is a list of these data types, which go far beyond the limited data types supported in XML 1.0:
char
Character (text string with a length of one)
boolean
Boolean (0 or 1)
int
Whole number (integer)
float
Real (floating point) number with fractional part and optional exponent
number
Real number (same as float)
fixed.14.4
Real number with 14 whole digits and 4 fractional digits
i1
One-byte integer
i2
Two-byte integer
i4
Four-byte integer
r4
Four-byte real number
r8
Eight-byte real number (same as float)
ui1
One-byte unsigned integer
ui2
Two-byte unsigned integer
ui4
Four-byte unsigned integer
bin.hex
Hexadecimal (base 16) number
bin.base64
Base 64 number
date
Date (without time or zone)
dateTime
Date with optional time (without time zone)
dateTime.tz
Date with optional time and time zone
time
Time (without date and time zone)
time.tz
Time with time zone (without date)
uri
Universal Resource Identifier (URI)
uuid
Global identifier
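A few of these data types could be applied to element types as in the following sketch. The sal element comes from the employee example in this chapter; hired, active and homepage are hypothetical names added for illustration.

```xml
<Schema name="typesDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Salary as a fixed-point number: up to 14 whole and 4 fractional digits -->
  <ElementType name="sal" content="textOnly" dt:type="fixed.14.4"/>

  <!-- Hypothetical elements showing other data types -->
  <ElementType name="hired" content="textOnly" dt:type="date"/>
  <ElementType name="active" content="textOnly" dt:type="boolean"/>
  <ElementType name="homepage" content="textOnly" dt:type="uri"/>

  <ElementType name="employee" content="eltOnly" model="closed" order="seq">
    <element type="sal"/>
    <element type="hired"/>
    <element type="active"/>
    <element type="homepage" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
```

Under this sketch a value such as 50000.00 would suit sal, a date written in ISO 8601 form such as 1999-03-15 would suit hired, and 0 or 1 would suit active.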
The following are the primitive data types available for use in XML Schema:
string
A string type
enumeration
An enumerated type (attributes only)
notation
A NOTATION type
entity
The ENTITY type
entities
The ENTITIES type
id
The ID type
idref
The IDREF type
idrefs
The IDREFS type
nmtoken
The NMTOKEN type
nmtokens
The NMTOKENS type
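The primitive types are mostly useful on attributes. As a tentative sketch, the id and idref types might be used to link one element to another; the attribute names empid and manager and the schema name are hypothetical.

```xml
<Schema name="idDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical attribute names, for illustration only -->
  <!-- empid: a unique identifier for each employee -->
  <AttributeType name="empid" dt:type="id" required="yes"/>
  <!-- manager: must match the empid of some other employee -->
  <AttributeType name="manager" dt:type="idref"/>

  <ElementType name="employee" content="textOnly">
    <attribute type="empid"/>
    <attribute type="manager"/>
  </ElementType>

  <ElementType name="employees" content="eltOnly" model="closed">
    <element type="employee" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```

These types carry the same meaning as the ID and IDREF attribute types in a DTD, so a manager value that matches no empid in the document would be expected to fail validation.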
Employees.xml

<employees xmlns="x-schema:empSchema.xml">
  <employee>
    <eid id="A100">A100</eid>
    <ename>Surya</ename>
    <sal>50000.00</sal>
    <desig>CEO</desig>
    3751135
    <email>[email protected]</email>
  </employee>
  <employee>
    <eid id="A101">A101</eid>
    <ename>Rajesh</ename>
    <sal>30000.00</sal>
    <desig>Director</desig>
    3751238
    <email>[email protected]</email>
  </employee>
</employees>

empSchema.xml

Note The Microsoft schema file extension is .xml, whereas the W3C schema file extension is .xsd.

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
<!-- XML namespaces: urn => Uniform Resource Name, dt => datatype -->

How do you associate a schema with a document as far as Internet Explorer is concerned? You do so by specifying a default namespace attribute in the root element and prefacing the name of the schema file with x-schema:, like this:

<programming_team xmlns="x-schema:schema1.xml">
  <programmer>Fred Samson</programmer>
  <programmer>Edward</programmer>
</programming_team>

Here, I'm naming the schema file schema1.xml (the IE schema implementation does not insist on any special extension for schema files).

Creating Schema file

You can name the schema using the name attribute in Schema.

<Schema name="schema1"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="programmer" content="textOnly" model="closed"/>
  <ElementType name="programming_team" content="eltOnly" model="closed">
    <element type="programmer" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>

One of the advantages of using schemas is that they allow you to specify the actual data types that you want to use, but those data types weren't fully fleshed out at the time Microsoft decided to implement schemas, so Microsoft implemented its own. To create a schema for Internet Explorer, you set up a default namespace