Charteris White Paper:

Deserializing Individual Elements in XML Documents

Version 1.0

Thomas Manson

12 May 2003

2003 Charteris plc

CONTENTS

1. INTRODUCTION
2. BACKGROUND
3. LIMITATIONS OF MONOLITHIC DESERIALIZATION
4. DESERIALIZING INDIVIDUAL ELEMENTS IN XML DOCUMENTS
5. TEST RESULTS
6. CONCLUSIONS

20 May 2003 Version 1.0

Deserializing Individual Elements in XML Documents Charteris White Paper:


1. INTRODUCTION

This paper discusses one of the limitations of common implementations of XML deserialization using the .NET Framework. It then proposes a possible solution and highlights the differences between the standard approach and the proposed one. It is assumed that the reader has some familiarity with XML Serialization, when it is used, and how it is implemented within the .NET Framework. This paper focuses on the current version of the .NET Framework at the time of writing, version 1.1. However, the recommendations should also be valid for version 1.0.

2. BACKGROUND

XML Serialization is the process of converting an object into a form that can be easily transported; for example, an object can be serialized and transported over HTTP. XML Deserialization is used on the receiving system to create an object tree from the XML. The serializing and deserializing systems need not be related: they may be on different platforms and/or use different technologies to process the requests. The object created by deserialization is not the object that was serialized; it is a new object whose public fields and properties hold the same values.

Typically, when a .NET system is built to handle incoming XML, it has a number of classes that correspond to the XML Schema (XSD) definition of the incoming XML. These classes can be written by hand or generated using XSD.exe. When XML is received, the .NET Framework creates instances of these classes and populates their public fields and properties from the XML. By default this is a monolithic process: the XmlSerializer reads the entire XML stream and populates all of the objects before returning.
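To make the monolithic model concrete, the sketch below deserializes a whole document with a single XmlSerializer.Deserialize call. ExchangeDoc, AgentDoc and MonolithicDemo are hypothetical, cut-down stand-ins for the XSD.exe-generated classes (only an Agents array with a Name per agent is modelled), and the sample assumes a current C# compiler rather than .NET 1.1.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical, cut-down stand-ins for the XSD.exe-generated classes:
// only the Agents array and each agent's Name are modelled.
[XmlType(Namespace = MonolithicDemo.Ns)]
[XmlRoot("PropertyExchange", Namespace = MonolithicDemo.Ns)]
public class ExchangeDoc
{
    [XmlArrayItem("Agent", IsNullable = false)]
    public AgentDoc[] Agents;
}

[XmlType(Namespace = MonolithicDemo.Ns)]
public class AgentDoc
{
    public string Name;
}

public static class MonolithicDemo
{
    public const string Ns = "www.charteris.com/namespaces/propertyexchange";

    // One Deserialize call reads the whole document: every AgentDoc is
    // created, and held by the returned tree, before the caller sees any of them.
    public static ExchangeDoc LoadAll(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(ExchangeDoc));
        using (StringReader text = new StringReader(xml))
        {
            return (ExchangeDoc)serializer.Deserialize(text);
        }
    }
}
```

The caller only gets control back once the entire tree exists, which is the behaviour the next section examines.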

3. LIMITATIONS OF MONOLITHIC DESERIALIZATION

Depending on the XML being received, deserializing the entire stream may not be appropriate. If the stream is large, the resulting in-memory representation may consume significant amounts of memory. The processing logic may also decide to stop after processing only a relatively small number of the created entities. This results in suboptimal performance, as the system has had to create all the entities only to leave most of them unused. Also, as each object created is part of the entire object tree, none of them is available for garbage collection until the entire tree has been processed, even if many of them have already been processed.

This paper proposes that, in some cases, it is better to deserialize entities only as they are required. If processing needs to stop, the rest of the XML has not been deserialized, so there are no unnecessary objects. Once an individual item has been deserialized and processed, it is not part of an object tree, and is thus available for garbage collection.

4. DESERIALIZING INDIVIDUAL ELEMENTS IN XML DOCUMENTS

To demonstrate deserializing individual elements, this paper uses a fictitious example of an estate agent system that exchanges agent and property details with other systems. The system allows estate agents across the country to see properties outside their own areas. To facilitate the exchange of agent and property details, the following schema has been drawn up.

    <xs:schema targetNamespace="www.charteris.com/namespaces/propertyexchange"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns="www.charteris.com/namespaces/propertyexchange"
               elementFormDefault="qualified">
      <xs:element name="PropertyExchange" type="PropertyExchangeType"/>
      <xs:complexType name="PropertyExchangeType">
        <xs:sequence>
          <xs:element name="Agents" type="AgentsType"/>
          <xs:element name="Properties" type="PropertiesType"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="AgentsType">
        <xs:sequence>
          <xs:element name="Agent" type="AgentType" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="PropertiesType">
        <xs:sequence>
          <xs:element name="Property" type="PropertyType" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
      <xs:simpleType name="AgentID">
        <xs:restriction base="xs:string">
          <xs:minLength value="5"/>
          <xs:maxLength value="30"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:complexType name="AddressType">
        <xs:sequence>
          <xs:element name="Line1" type="xs:string"/>
          <xs:element name="Line2" type="xs:string"/>
          <xs:element name="Line3" type="xs:string"/>
          <xs:element name="Line4" type="xs:string"/>
          <xs:element name="PostalCode" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="AgentType">
        <xs:sequence>
          <xs:element name="AgentID" type="AgentID" minOccurs="0"/>
          <xs:element name="Name" type="xs:string" minOccurs="0"/>
          <xs:element name="Address" type="AddressType" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute name="action" type="Action" use="required"/>
      </xs:complexType>
      <xs:complexType name="PropertyType">
        <xs:sequence>
          <xs:element name="PropertyID" type="PropertyID" minOccurs="0"/>
          <xs:element name="OwningAgentID" type="AgentID" minOccurs="0"/>
          <xs:element name="Address" type="AddressType" minOccurs="0"/>
          <xs:element name="Price" type="xs:int" minOccurs="0"/>
          <xs:element name="DateListed" type="xs:date" minOccurs="0"/>
          <xs:element name="PropertyDetails" minOccurs="0">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Type">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="Detached House"/>
                      <xs:enumeration value="Semi Detached House"/>
                      <xs:enumeration value="Terraced House"/>
                      <xs:enumeration value="End of terrace House"/>
                      <xs:enumeration value="Flat"/>
                      <xs:enumeration value="Bungalow"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="BedRooms" type="xs:int"/>
                <xs:element name="Bathrooms" type="xs:int"/>
                <xs:element name="Kitchen" type="xs:int"/>
                <xs:element name="ReceptionRooms" type="xs:int"/>
                <xs:element name="Garage" type="xs:int" minOccurs="0"/>
                <xs:element name="OffRoadParking" type="xs:int" minOccurs="0"/>
                <xs:element name="Built" type="xs:gYear" minOccurs="0"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="Viewings" minOccurs="0">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="Appointment"/>
                <xs:enumeration value="PhoneBefore"/>
                <xs:enumeration value="Weekdays"/>
                <xs:enumeration value="Weekends"/>
                <xs:enumeration value="AllWeek"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
        </xs:sequence>
        <xs:attribute name="action" type="Action" use="required"/>
      </xs:complexType>
      <xs:simpleType name="Action">
        <xs:restriction base="xs:string">
          <xs:enumeration value="Add"/>
          <xs:enumeration value="Update"/>
          <xs:enumeration value="Delete"/>
          <xs:enumeration value="Query"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="PropertyID">
        <xs:restriction base="xs:string">
          <xs:minLength value="5"/>
          <xs:maxLength value="30"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>

Running XSD.exe against this schema generates, among others, the following classes for the PropertyExchangeType, AgentType and PropertyType types (the complete sample application source code can be downloaded from http://www.charteris.com/Publications/WhitePapers/Downloads/PropertyExchange.zip).

    [System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
    [System.Xml.Serialization.XmlRootAttribute("PropertyExchange", Namespace="www.charteris.com/namespaces/propertyexchange", IsNullable=false)]
    public class PropertyExchangeType
    {
        [System.Xml.Serialization.XmlArrayItemAttribute("Agent", IsNullable=false)]
        public AgentType[] Agents;

        [System.Xml.Serialization.XmlArrayItemAttribute("Property", IsNullable=false)]
        public PropertyType[] Properties;
    }

    [System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
    public class AgentType
    {
        public string AgentID;
        public string Name;
        public AddressType Address;

        [System.Xml.Serialization.XmlAttributeAttribute()]
        public Action action;
    }

    [System.Xml.Serialization.XmlTypeAttribute(Namespace="www.charteris.com/namespaces/propertyexchange")]
    public class PropertyType
    {
        public string PropertyID;
        public string OwningAgentID;
        public AddressType Address;
        public int Price;

        [System.Xml.Serialization.XmlIgnoreAttribute()]
        public bool PriceSpecified;

        [System.Xml.Serialization.XmlElementAttribute(DataType="date")]
        public System.DateTime DateListed;

        [System.Xml.Serialization.XmlIgnoreAttribute()]
        public bool DateListedSpecified;

        public PropertyTypePropertyDetails PropertyDetails;
        public PropertyTypeViewings Viewings;

        [System.Xml.Serialization.XmlIgnoreAttribute()]
        public bool ViewingsSpecified;

        [System.Xml.Serialization.XmlAttributeAttribute()]
        public Action action;
    }

If the standard model of deserialization were used, the PropertyExchangeType would be the root of the object tree, and it would contain arrays of AgentType and PropertyType. Before any of the AgentTypes could be processed, all of the PropertyTypes would also have to be deserialized, and there could be any number of them. It would potentially be more efficient to process each AgentType independently before the next one is deserialized, and then to do the same for the PropertyTypes.

To do this, an XmlNodeDeserializer class has been added to the project. Given an XmlReader positioned on an element, this class deserializes that single element into an object.

To deserialize an element independently of the rest of the XML document, the XmlSerializer has to treat the element as the root node of the document. This could be done by modifying the code generated by XSD.exe and changing the XmlTypeAttribute to an XmlRootAttribute. However, if the schema changed and the code needed to be regenerated, the change would have to be made again. It would also affect the serialization of the PropertyExchangeType for outgoing messages.

The same effect can be achieved by overriding the XmlTypeAttribute applied to the AgentType during deserialization only. This is done using the XmlAttributeOverrides class, as shown in the code below. When the XmlSerializer deserializes the XML, it uses the attributes in the XmlAttributeOverrides in place of the ones applied to the classes.
    // As we are only deserializing a fragment, we need to add
    // the XmlRootAttribute.
    // This is done by overriding the current attribute
    XmlAttributes xmlAttribs = new XmlAttributes();

    // Create the new XmlRootAttribute and set its
    // name and namespace
    XmlRootAttribute rootAttrib = new XmlRootAttribute(elementName);
    rootAttrib.Namespace = ns;
    xmlAttribs.XmlRoot = rootAttrib;

    // Create the overrides object and add the attributes
    XmlAttributeOverrides overrides = new XmlAttributeOverrides();
    overrides.Add(objectType, xmlAttribs);

    // Use the overrides to deserialize
    XmlSerializer xmlSer = new XmlSerializer(objectType, overrides);

    // Now actually deserialize the object
    object returnData = xmlSer.Deserialize(xmlReader);
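The root override can be tried in isolation on a single Agent fragment. In this sketch, FragmentAgent and FragmentDemo are hypothetical names, FragmentAgent is a cut-down AgentType carrying only an XmlType attribute (as XSD.exe emits), and a new serializer is built on each call purely for brevity; the next paragraph explains why real code should cache it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical cut-down AgentType: like the XSD.exe output it has only
// [XmlType], so by default it cannot be read from an <Agent> root element.
[XmlType(Namespace = FragmentDemo.Ns)]
public class FragmentAgent
{
    public string AgentID;
    public string Name;
}

public static class FragmentDemo
{
    public const string Ns = "www.charteris.com/namespaces/propertyexchange";

    // Builds a serializer that treats <Agent xmlns="Ns"> as the document root,
    // without editing the class: the override replaces the type's attributes.
    public static XmlSerializer CreateAgentSerializer()
    {
        XmlAttributes attribs = new XmlAttributes();
        attribs.XmlRoot = new XmlRootAttribute("Agent") { Namespace = Ns };
        XmlAttributeOverrides overrides = new XmlAttributeOverrides();
        overrides.Add(typeof(FragmentAgent), attribs);
        return new XmlSerializer(typeof(FragmentAgent), overrides);
    }

    // Not cached here for brevity; each call generates a new serializer.
    public static FragmentAgent ReadAgent(string fragment)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment)))
        {
            return (FragmentAgent)CreateAgentSerializer().Deserialize(reader);
        }
    }
}
```

A serializer built without the override expects a root element named after the class, so it reports that it cannot read the same fragment.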

Creating an XmlSerializer is an expensive operation. If this is done repeatedly for each element to deserialize, it will cripple the performance of the application. To prevent this, XmlNodeDeserializer stores each XmlSerializer it creates in a private static Hashtable. The XmlSerializer is not thread safe for instance methods, so each instance is stored in the Hashtable by type and thread ID: a new XmlSerializer is created for each thread and dedicated to that thread. In a single-threaded application, this results in one XmlSerializer per type to deserialize; in a multithreaded application, one per type per thread. Because a Hashtable supports only one writing thread at a time, access to the shared table is guarded by a lock. The complete code for the class is shown below.

    using System;
    using System.Collections;
    using System.Xml;
    using System.Xml.Serialization;

    namespace PropertyExchange
    {
        /// <summary>
        /// Summary description for XmlNodeDeserializer.
        /// </summary>
        internal class XmlNodeDeserializer
        {
            static Hashtable serializerCache = new Hashtable(2);
            static object serializerCacheLock = new object();
            const string ns = "www.charteris.com/namespaces/propertyexchange";

            internal XmlNodeDeserializer()
            {
            }

            internal object Deserialize(XmlReader xmlReader)
            {
                XmlReader localReader;
                // If xmlReader is an XmlValidatingReader, inspect the
                // reader it wraps
                if (xmlReader is XmlValidatingReader)
                {
                    localReader = ((XmlValidatingReader)xmlReader).Reader;
                }
                else
                {
                    localReader = xmlReader;
                }

                string elementName = string.Empty;
                Type objectType = null;
                if (localReader.NodeType == XmlNodeType.Element)
                {
                    if (localReader.NamespaceURI == ns)
                    {
                        switch (localReader.LocalName)
                        {
                            case "Agent":
                                elementName = "Agent";
                                objectType = typeof(AgentType);
                                break;
                            case "Property":
                                elementName = "Property";
                                objectType = typeof(PropertyType);
                                break;
                            default:
                                throw new XmlException("Unrecognised node: " + localReader.LocalName);
                        }
                    }
                    else
                    {
                        throw new XmlException("Unrecognised namespace on node: " + localReader.Name);
                    }
                }
                else
                {
                    throw new XmlException("xmlReader must be on the element to deserialize");
                }

                XmlSerializer xmlSer = GetXmlSerializer(elementName, objectType);
                object returnData = xmlSer.Deserialize(xmlReader);
                return returnData;
            }

            private XmlSerializer GetXmlSerializer(string elementName, Type objectType)
            {
                // Cache key: type name plus current thread ID, with a
                // separator so two different pairs cannot produce the same key
                string key = objectType.FullName + "|" + AppDomain.GetCurrentThreadId().ToString();

                // Hashtable allows only one writer at a time, so guard the
                // lookup and insertion with the lock object
                lock (serializerCacheLock)
                {
                    // Attempt to retrieve the XmlSerializer from the hashtable
                    XmlSerializer xmlSer = (XmlSerializer)serializerCache[key];
                    if (xmlSer == null)
                    {
                        // As we are only deserializing a fragment, we need to
                        // add the XmlRootAttribute.
                        // This is done by overriding the current attribute
                        XmlAttributes xmlAttribs = new XmlAttributes();

                        // Create the new XmlRootAttribute and set its
                        // name and namespace
                        XmlRootAttribute rootAttrib = new XmlRootAttribute(elementName);
                        rootAttrib.Namespace = ns;
                        xmlAttribs.XmlRoot = rootAttrib;

                        // Create the overrides object and add the attributes
                        XmlAttributeOverrides overrides = new XmlAttributeOverrides();
                        overrides.Add(objectType, xmlAttribs);

                        // Use the overrides to deserialize
                        xmlSer = new XmlSerializer(objectType, overrides);

                        // Store the XmlSerializer in the Hashtable
                        // by type and thread ID
                        serializerCache.Add(key, xmlSer);
                    }
                    return xmlSer;
                }
            }
        }
    }
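The same one-serializer-per-type-per-thread policy can also be expressed with a [ThreadStatic] field, which gives every thread its own dictionary and therefore needs no lock at all. This is an alternative sketch, not the class above: PerThreadSerializerCache and CachedAgent are hypothetical names, it assumes .NET 2.0 or later for Dictionary<TKey,TValue>, and it keys by type alone on the assumption that each type maps to a single element name, as Agent and Property do here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical element class used only to illustrate the cache.
public class CachedAgent
{
    public string Name;
}

public static class PerThreadSerializerCache
{
    // [ThreadStatic]: each thread sees its own copy of this field, so no
    // lock is needed. The field is null the first time each thread reads it.
    [ThreadStatic]
    private static Dictionary<Type, XmlSerializer> cache;

    // Returns this thread's serializer for objectType, creating it (with an
    // overriding XmlRootAttribute) on first use.
    public static XmlSerializer Get(Type objectType, string elementName, string ns)
    {
        if (cache == null)
        {
            cache = new Dictionary<Type, XmlSerializer>();
        }

        XmlSerializer serializer;
        if (!cache.TryGetValue(objectType, out serializer))
        {
            XmlAttributes attribs = new XmlAttributes();
            attribs.XmlRoot = new XmlRootAttribute(elementName) { Namespace = ns };
            XmlAttributeOverrides overrides = new XmlAttributeOverrides();
            overrides.Add(objectType, attribs);
            serializer = new XmlSerializer(objectType, overrides);
            cache.Add(objectType, serializer);
        }
        return serializer;
    }
}
```

As with the Hashtable version, the number of cached serializers grows with the number of threads that deserialize, so this suits a bounded set of worker threads.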

With this class in place, the Deserialize method of XmlNodeDeserializer deserializes the current element in the XML and returns it, leaving the XmlReader positioned just after that element's closing tag. For this to work, however, the XmlReader has to be at the start of the required element (in this case, either an Agent or a Property element) when the Deserialize method is called.

The code below sets up the XmlReader, moves it to the correct position, and then calls the Deserialize method. An XmlValidatingReader is used here, but if schema validation is not required, an XmlTextReader can be used instead. Note that an XmlValidatingReader validates only those nodes that are actually read. So in this case, if the XML is invalid somewhere within the Properties element, the agents will still be deserialized. This may or may not be appropriate, depending on the application requirements.

    const string ns = "www.charteris.com/namespaces/propertyexchange";

    // Deserialize the agents, each one in turn
    XmlTextReader reader = new XmlTextReader(fileName);
    XmlValidatingReader valReader = new XmlValidatingReader(reader);
    valReader.Schemas.Add(ns, schemaFile);

    // Move the reader to the start of the agents
    valReader.ReadStartElement("PropertyExchange", ns);
    valReader.ReadStartElement("Agents", ns);

    XmlNodeDeserializer ser = new XmlNodeDeserializer();
    AgentType agent = null;
    // MoveToContent skips any whitespace between the Agent elements
    while (valReader.MoveToContent() == XmlNodeType.Element
        && valReader.LocalName == "Agent"
        && valReader.NamespaceURI == ns)
    {
        agent = (AgentType)ser.Deserialize(valReader);
        // Simulate the processing of the agent
        Console.WriteLine("Agent Name: {0}", agent.Name);
    }

The significant lines of code are the two ReadStartElement calls. They move the current position within the reader past the PropertyExchange and Agents start tags, so that when the Deserialize method is called, the reader is on the first Agent element. Each agent is deserialized, one at a time, until the current node is no longer an Agent element. This also takes care of the situation where there are no Agent elements in the XML. Once all the agents have been processed, the reader is moved to the Properties element, and the process is repeated for each of the Property elements.

If Agent elements have been processed, the current node will be of type EndElement. However, if no Agent elements have been processed (because there were none in the XML), the current node will be of type Element. So before moving on, the current node type is checked, and if it is EndElement, the ReadEndElement method is called. Once all the Property elements have been read, the Close method on the XmlTextReader is called to close the file.
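The complete traversal described above (agents, then properties, with the EndElement check between them) can be sketched as follows. This is an illustration under stated assumptions rather than the sample application: StreamAgent, StreamProperty and StreamingDemo are hypothetical names, it uses XmlReader.Create with IgnoreWhitespace (available from .NET 2.0) instead of XmlTextReader and XmlValidatingReader, so nothing is validated, and a callback returning false stands in for processing being halted early.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical cut-down versions of AgentType and PropertyType.
[XmlType(Namespace = StreamingDemo.Ns)]
public class StreamAgent { public string Name; }

[XmlType(Namespace = StreamingDemo.Ns)]
public class StreamProperty { public string PropertyID; }

public static class StreamingDemo
{
    public const string Ns = "www.charteris.com/namespaces/propertyexchange";

    // Serializer that treats elementName as the document root (the same
    // XmlAttributeOverrides technique as XmlNodeDeserializer). Not cached
    // here for brevity; real code should cache it as the paper describes.
    static XmlSerializer RootedSerializer(Type type, string elementName)
    {
        XmlAttributes attribs = new XmlAttributes();
        attribs.XmlRoot = new XmlRootAttribute(elementName) { Namespace = Ns };
        XmlAttributeOverrides overrides = new XmlAttributeOverrides();
        overrides.Add(type, attribs);
        return new XmlSerializer(type, overrides);
    }

    // Deserializes the itemName children of containerName one at a time,
    // passing each to process before reading the next. Returns false if
    // process returns false (processing halted early).
    static bool ReadSection<T>(XmlReader reader, string containerName,
                               string itemName, Func<T, bool> process)
    {
        XmlSerializer serializer = RootedSerializer(typeof(T), itemName);
        reader.ReadStartElement(containerName, Ns);

        // MoveToContent skips any non-content nodes between items.
        while (reader.MoveToContent() == XmlNodeType.Element
               && reader.LocalName == itemName && reader.NamespaceURI == Ns)
        {
            // Deserialize leaves the reader just after the item's end tag.
            T item = (T)serializer.Deserialize(reader);
            if (!process(item)) return false;
        }

        // A container with content ends on its end tag; an empty element such
        // as <Agents/> was consumed by ReadStartElement, leaving an Element.
        if (reader.NodeType == XmlNodeType.EndElement)
        {
            reader.ReadEndElement();
        }
        return true;
    }

    // Returns true if every agent and property was processed.
    public static bool Process(string xml, Func<StreamAgent, bool> onAgent,
                               Func<StreamProperty, bool> onProperty)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.ReadStartElement("PropertyExchange", Ns);
            if (!ReadSection(reader, "Agents", "Agent", onAgent)) return false;
            return ReadSection(reader, "Properties", "Property", onProperty);
        }
    }
}
```

Returning false from the agent callback stops the run before any Property element is deserialized, which is the early-exit case measured in the next section.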

5. TEST RESULTS

To test memory usage and execution times, a large XML file containing 113 agents and 180 properties was used, read with both the XmlValidatingReader and the XmlTextReader. The total execution time and memory usage (working set) were recorded for three runs of each of the following:

♦ Standard deserialization, where the entire XML is deserialized before processing starts;
♦ Item deserialization, where each agent and property is deserialized and processed before the next one is deserialized.

Measurements were taken both for full processing of the entire XML ("Full" in the graphs) and for runs in which processing was halted after the first agent had been processed ("Failed"). The results are shown in the graphs below.

Figure: Execution Times (seconds), Full and Failed runs, for Standard Deserialization, Item Deserialization, Standard Deserialization with validation and Item Deserialization with validation.

Figure: Working Set (bytes), Full and Failed runs, for the same four configurations.

The graphs above show that deserializing each item just before processing it is more expensive in execution time, by 32% for both the XmlTextReader and the XmlValidatingReader. However, if processing fails early on, the execution time for item deserialization drops significantly, whereas for standard deserialization it hardly changes.

The working set required by the process shows a very similar picture, although the differences are smaller. Standard deserialization requires less memory if everything completes (by 1.4% for an XmlTextReader and 1.1% for an XmlValidatingReader). Again, if processing fails early on, item deserialization saves memory (3% for an XmlTextReader and 3.2% for an XmlValidatingReader).

The objects created in these tests were simple entities that were easy to serialize and deserialize. It is doubtful whether the results above could be extrapolated to more complex objects without further study. Also, as the memory required is small and the tests were short, it is unlikely that the garbage collector (GC) ran during them. How much memory the GC can recover will be application-dependent, but it is expected that in most cases, once an item has been deserialized and processed, it will be available for garbage collection. If the entire XML has been deserialized, each object is still referenced from the object tree, so none of the objects are available for garbage collection. This would tend to increase the memory savings of item deserialization beyond those shown above.
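The paper does not say how the execution times and working set were captured. One possible harness is sketched below, assuming .NET 2.0 or later for Stopwatch; RunMeasurer and Measure are hypothetical names, and reading Process.WorkingSet64 once after each run is an assumption about what the "Working Set" graph records.

```csharp
using System;
using System.Diagnostics;

public static class RunMeasurer
{
    // Hypothetical helper: runs 'work' once, then reports the elapsed time and
    // the process working set sampled immediately afterwards.
    public static void Measure(Action work, out TimeSpan elapsed, out long workingSetBytes)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        elapsed = watch.Elapsed;

        using (Process self = Process.GetCurrentProcess())
        {
            // Refresh discards any cached process information before reading.
            self.Refresh();
            workingSetBytes = self.WorkingSet64;
        }
    }
}
```

Each configuration (standard or item deserialization, with or without validation, full or halted) would be passed in as 'work' and measured over several runs, preferably in a fresh process each time so that earlier runs do not inflate the working set.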

6. CONCLUSIONS

If an application deserializes large XML documents, this can consume significant amounts of memory. If the whole document is deserialized before any of the resulting objects are processed, some of those objects may never be used, because processing may stop before it reaches them. As this paper has shown, significant gains can be made in that situation by deserializing objects only as they are required. However, this will not always be appropriate, especially if all the entities are expected to be processed most of the time, since item deserialization was slower and used slightly more memory when processing ran to completion.
