C24 White Paper — High performance scalable XML

XML is great, but… How do you achieve High Performance Scalable XML processing?

A C24 White Paper by John Davies, Technical Director of C24

John Davies summarizes the key points that make the approach taken by the C24 Integration Objects toolkit faster, more efficient, more scalable and more robust, and it works straight out of the box.

The background

We've had XML (Extensible Markup Language), as we know it, for almost 6 years. It is not a new idea, having evolved from Standard Generalized Markup Language (SGML), which itself evolved from GML. These two predecessors date from 1986 and 1969 respectively. W3C finally produced the XML 1.0 standard in early 1998; at last we had a common path for HTML (XHTML) and programmers could finally work towards a standard.

But somewhere along the line, things got a little out of hand. XML hit the "dot com" boom in its heyday; tens and even hundreds of millions were spent on promoting technologies that used XML, and it became the "must have" technology of the era. Products were laughed at if they didn't support XML in some form or another. New versions of Java came out with XML support; C# did the same a little later. XML was now the markup language of choice for inter-process and inter-application communication. New standards like SOAP and XML-RPC evolved, and before we knew it there were suggestions that it could even replace RMI (Java's Remote Method Invocation), IIOP (CORBA's Internet Inter-ORB Protocol) and DCOM (Microsoft's Distributed COM) as the "standard" for distributed computing.

XML is great, but…

XML is great. It has made so many previously flat tag/value forms actually usable. Certainly from a data-oriented rather than a document-oriented perspective, there is no better way to store application data and configuration files. All this doesn't come without compromise and cost though. You can't, for example, execute XML (OK, excluding one of my favourite technologies, Jelly, but it’s basically tags for Java). Furthermore, how can we specify in XML Schema that a number conforms to a checksum routine?

The main problem comes with complexity. In specialist vertical market applications such as retail, telcos, pharmaceuticals and banking, things get very complex. These verticals have their own “languages”, some not too dissimilar to XML and many derived directly from it. The majority of these verticals are moving towards XML as a standard; while this in itself doesn't give them much over the existing languages, it does make third party integration much easier given the vast number of tools available. The complexity of these standards creates new problems: first the implementation of the standard itself, and then its usage. Is XML Schema up to the job? If so, how do we manage the resulting XML instance documents? We can define them, but they start to get complex and bloated.

Can we standardise on XML Schema?

Many of the industry standards have rules and conditions; these are often so complex that they can't be represented in Schema. The message and sub-message types are often so numerous that each one needs its own schema. The end result is often an unmanageable tangle of schema documents that implement only 95% of the original standard. We are actually losing definability by moving to XML. Having said that, XML Schema is about the best language we have for defining data models and constraints; all the others are proprietary. It’s just lacking precision.

But we need human readable messages. Or do we?

One much-quoted advantage of XML is that it is "human readable". While this is still technically true for complex XML, your poor human will need a PhD in the particular industry vertical before being able to "read" the instance document. The document will have dozens of namespace prefixes, frequently lack “pretty” formatting and start to look like something out of The Matrix.

www.C24.biz

Applied Solutions in Finance

copyright © 2004 Century 24 Solutions. All rights reserved

Grinding to a halt

Finally, the show stopper! You'll have watched with dismay as your XML documents grow in size and complexity, and throughput declines in proportion. You need to parse each document and validate it against the rules in the schema (at least the 95% you managed to implement), and these are dispersed across a dozen or so .XSD files. These schema files themselves contain XML, and all have to be parsed. The alternative, choosing not to validate, is pointless since XML without a schema has no concept of types. Someone could easily put "NaN" into a numeric field and it will be perfectly valid without the schema. Your unvalidated XML will parse but your program logic won't be so happy. Inevitably your application source code becomes polluted with the very constraint tests that XML Schema should be doing for you.
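To make the "NaN" point concrete, here is a minimal sketch using the standard JAXP DOM parser with no schema attached. The `<trade>` and `<amount>` element names and the `isUsableAmount` rule are assumptions made up for illustration; they do not come from any particular standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class UnvalidatedAmount {

    // Parses a hypothetical <trade><amount>...</amount></trade> message with a
    // non-validating DOM parser and converts the element text to a double.
    static double readAmount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String text = doc.getElementsByTagName("amount").item(0).getTextContent();
            return Double.parseDouble(text.trim());
        } catch (Exception e) {
            throw new IllegalStateException("could not read amount", e);
        }
    }

    // The kind of hand-written constraint test that ends up in application
    // code when no schema is enforcing the rule (the rule itself is assumed).
    static boolean isUsableAmount(double amount) {
        return !Double.isNaN(amount) && !Double.isInfinite(amount) && amount >= 0.0;
    }

    public static void main(String[] args) {
        // Well-formed XML, so the parser accepts it without complaint.
        double amount = readAmount("<trade><amount>NaN</amount></trade>");
        System.out.println("parsed amount: " + amount);
        System.out.println("usable: " + isUsableAmount(amount));
    }
}
```

The parser is satisfied because the document is well-formed; the only thing standing between the bad value and the business logic is the extra check written by hand.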

XML is legacy

XML data interchange is the fax machine of contemporary software architecture. "Fax?" I hear you say, "surely XML is up to date and leading edge, not like the outmoded technology of fax, which has been superseded by email". Let’s backtrack for a moment and look at how XML got to its position as the jewel in the crown of data interchange architecture, then see if I can justify this analogy.

Using XML for complex message exchange is like using a fax machine to exchange Word documents even though you've got email. You enter data into your document and then serialize the document on the printer, the universal human readable "standard". You then fax the document to your colleague, who passes it through his OCR (Optical Character Recognition) reader. After what could be as little as 5 minutes your colleague has the same document and can begin proofreading it. He makes the changes in his version of Word (or something else for that matter) and faxes it back. Voilà: two-way document flow! About the only advantage gained over email is that we've solved the problem of which of the umpteen versions of Word to use and reduced the risk of getting a virus to virtually nothing. This might sound like a daft way of working but it is an accurate description of the process. It’s very inefficient on both sides of the message exchange.

The real world

I have spent most of my working life in large wholesale banks, architecting complex systems. About 5 to 6 years ago we started to hear "XML is our standard", but very few systems actually "spoke" XML. As time moved on, we started to see XML message buses and even messaging frameworks built around XML, e.g. OpenAdapter. Trade volume went up as efficiency improved, and XML complexity increased as we got better tools. More applications came on line, the network volume went through the roof and servers started to red-line. Throwing hardware at the problem rarely helped; the bottleneck was parsing and validating the XML, and it's not easy to share that across multiple machines without a framework in place.

It is not unusual to see thousands of messages a second come from the front or middle office in a bank. A small change in a base rate can force a re-valuation of tens of thousands of trades, each one having multiple "legs" (parts), and delays can cost millions. Speed can be absolutely business-critical.

The XML bus

Over the last 2 to 3 years entire architectures have been built around an XML bus. A deal is entered into the front office trading system; it is transformed into XML and sent on to the middle office. From there we have risk systems, P&L, limit checking, counterparty confirmation, settlement instructions, matching, ticket printing and anti-laundering checking, to mention just a few. Some results are sent back up to the front, some stored, some sent to brokers as confirmation and some sent to the back office. In the back office the documents are parsed (from XML again) and transformed once again for confirmation, reconciliation and settlement.

Volume and XML complexity are increasing faster than Moore's law is helping (by producing faster hardware). We find ourselves closer and closer to gridlock; it takes just one small surprise on the markets and tens of millions can be lost “in process”, waiting for XML messages to filter through the systems.

What about SOA as a solution?

Does a Service Oriented Architecture (SOA) help? Yes it can, but not on its own. By reducing the number of discrete, stand-alone processes and placing them in one manageable "box", parts of the system become easier to manage and at the same time the amount of XML being passed from place to place is reduced. What's happening though is that we are undoing the XML-bus architecture and going back to a single server model (or at least a cluster). It’s very hard to get the various parties to agree to put everything into one box. Traditionally these systems have evolved over a number of years and each contains decades of expertise. It’s not unusual for each application to be in a physically different location, and we’re back to having XML flowing between them. SOA is good for expressing the connections between application clusters but it is not the silver bullet for large enterprise scenarios. Rather, it is important to see SOA as one part of a wider architectural solution, not the dominant one.

What is the solution to XML performance?

There are a lot of quick fixes. Hardware helps, but it's an expensive way to fix performance problems and it's unlikely that doubling up on the hardware will double your capacity. Move from .NET to J2EE, or from J2EE to .NET? This is exactly what you're likely to hear if you speak to one of the big application server vendors or a major consulting firm. If you just change vendor and not your architecture you are unlikely to see much return other than empty pockets after you've paid the consulting firm for recommending the change and implementing it for you.

There are a number of companies offering significant performance gains with XML. Confirmative Systems claims “The company estimates that its solution provides a greater than 20X improvement over conventional servers by addressing XML data processing including parsing, validation, and transformation in its proprietary chipset.” PolarLake claims “PolarLake overcomes the performance issues often associated with processing XML by employing a number of innovative technologies, typically increasing throughput by 30-50 times compared with other servers.”, and goes on to list “XML-streaming, Multi-threading, Single scan and Selective processing” as key factors. So, these companies have obviously seen the problem but have different solutions: one will sell you yet more hardware and the other will sell you a closed server using “innovative technologies”.

We have a simpler and cheaper solution!

Don't use XML for inter-process communication and data transfer. Use XML for what it was designed for, document-oriented mark-up, and use Java objects for complex data-oriented messages. This isn’t a new idea; we just provide the tools to facilitate it, so that you can devote more time to your business.

Replace XML with Java?

Document-oriented: no. Data-oriented: yes. What’s the difference? The difference is simple. If your data was, is or at some time will be a document, e.g. a web form, web page or report, then stick to XML. The tools around XML are mainly designed for documents. If however your message is part of an exchange of data, e.g. FpML, FIX or SWIFT, then use Java to exchange data and not XML. Note that XML/SOAP still has its place in inter-company messaging; it just makes more sense to use Java objects in many internal scenarios.

How do I replace XML with Java?

The goal here is not to change something that already works, just to fix the problems. XML works, but it's slow and inefficient. XML Schema is a good way to define data models and it has been the main drive behind standardisation initiatives in the vertical industries (e.g. FpML, MDDL). The answer is to generate Java classes from the Schema: a Model Driven Architecture (MDA) where XML Schema is the model. The generated Java object model then not only functions as a template for the Schema model but also undertakes the validation. It is type-safe, self-validating, in most cases quite a bit smaller than the XML equivalent and requires no parsing. Since it has knowledge of its own structure, it knows how many instances of a particular element can be added, as defined by the XML Schema model. What's more, when you want to get XML back out of it you just ask the object to output itself as XML.
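As a rough illustration of what such a generated class might look like, the sketch below hand-writes a tiny self-validating object. It is not output from any real generator: the `GeneratedTrade` name, the `[A-Z]{3}` currency pattern and the limit of one to two `leg` elements are all assumptions standing in for a schema's pattern facet and `minOccurs`/`maxOccurs` attributes.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical stand-in for a class generated from a schema complex type with
// a "currency" element restricted to [A-Z]{3} and a repeating "leg" element
// declared with minOccurs="1" maxOccurs="2".
final class GeneratedTrade {
    private static final Pattern CURRENCY = Pattern.compile("[A-Z]{3}");
    private static final int MIN_LEGS = 1;
    private static final int MAX_LEGS = 2;

    private String currency;
    private final List<BigDecimal> legs = new ArrayList<>();

    // The schema pattern restriction becomes a check in the setter.
    void setCurrency(String value) {
        if (value == null || !CURRENCY.matcher(value).matches()) {
            throw new IllegalArgumentException("currency must match [A-Z]{3}: " + value);
        }
        currency = value;
    }

    // The object knows its own structure, so it can refuse a leg too many.
    void addLeg(BigDecimal notional) {
        if (legs.size() >= MAX_LEGS) {
            throw new IllegalStateException("at most " + MAX_LEGS + " legs allowed");
        }
        legs.add(Objects.requireNonNull(notional, "notional"));
    }

    boolean isComplete() {
        return currency != null && legs.size() >= MIN_LEGS;
    }

    // XML is produced on demand; nothing is parsed while the object is in use.
    String toXml() {
        if (!isComplete()) {
            throw new IllegalStateException("trade is incomplete");
        }
        StringBuilder xml = new StringBuilder("<trade><currency>")
                .append(currency).append("</currency>");
        for (BigDecimal leg : legs) {
            xml.append("<leg>").append(leg.toPlainString()).append("</leg>");
        }
        return xml.append("</trade>").toString();
    }

    public static void main(String[] args) {
        GeneratedTrade trade = new GeneratedTrade();
        trade.setCurrency("USD");
        trade.addLeg(new BigDecimal("1000000"));
        System.out.println(trade.toXml());
    }
}
```

Invalid values are rejected at the point where they are set, so a consumer of the object never has to repeat the schema's checks.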

C24’s Integration Objects

Century 24 Solutions, C24, is a well established software house selling integration solutions, predominantly for the financial services industry. We have tier one banks and clearing houses all over the world as clients using our SWIFT, FIX, XML and other message format objects.

C24’s Integration Objects (IO) is quite simply a model driven code generator. You can design the data model using the rich Swing based GUI, import it from an external source (e.g. XML Schema, DTD, RDBMS), or use one of the library models that we’ve painstakingly created from the original specification (as was done with SWIFT, for example). Taking FpML as an example, the user can simply load the main FpML 4.0 schema into the IO-Editor in a second or two. From there the FpML IO model is an exact copy of the FpML Schema, so much so that if re-exported it is binary identical to the original. Changes can now be made to the model under the control of the version management system, although in the example of FpML it might not be the best idea to do so. You can of course use the IO-Editor to create and manage XML Schema data models rather than just import them.

The resultant models can be deployed as Java code; in the case of FpML this results in something between 350 and 1050 source files. The range is due to the number of deployment options. You can for example produce an interface with each complex type implementation; this provides the user with a fixed API that resists change rather like XML without a schema does. Namespaces become packages, complex types become classes, and schema restrictions and regular expressions are implemented as validation methods.

The deployed Java is not simply a directory full of source files; messages are deployed along with Ant scripts, dependent JARs, JavaDoc (including the Schema annotations) and even a Maven project file for the brave. In less than 5 minutes you can go from FpML Schema to a Java component in the form of a JAR with a richly documented API. Anything simpler than FpML is obviously much quicker.

The classes implement “hand coded” externalization routines whereby they serialize themselves with near perfect efficiency. All deployed IO components have utility classes for reading and writing XML instances into and out of the IO component and all include an XPath implementation. This also works for non-XML based models, including SWIFT and FIX. The serialized objects can be easily decoded to allow simple and effective debugging as well as XSL-style facilities implemented directly in Java. This increases both performance and the coherence of your code.
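For readers unfamiliar with externalization, the sketch below shows the general technique using the standard `java.io.Externalizable` interface: the class writes and reads its own fields explicitly instead of relying on default reflective serialization. The `PaymentMessage` class and its two fields are hypothetical and are not taken from the IO toolkit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical message class with hand-coded externalization.
final class PaymentMessage implements Externalizable {
    private String currency;
    private long amountMinorUnits;

    // Externalizable requires a public no-argument constructor.
    public PaymentMessage() {
    }

    PaymentMessage(String currency, long amountMinorUnits) {
        this.currency = Objects.requireNonNull(currency, "currency");
        this.amountMinorUnits = amountMinorUnits;
    }

    String currency() { return currency; }
    long amountMinorUnits() { return amountMinorUnits; }

    // Each field is written explicitly, in a fixed order.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(currency);
        out.writeLong(amountMinorUnits);
    }

    // Fields are read back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        currency = in.readUTF();
        amountMinorUnits = in.readLong();
    }

    static byte[] toBytes(PaymentMessage message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static PaymentMessage fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PaymentMessage) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not decode message", e);
        }
    }

    public static void main(String[] args) {
        PaymentMessage copy = fromBytes(toBytes(new PaymentMessage("EUR", 125_000L)));
        System.out.println(copy.currency() + " " + copy.amountMinorUnits());
    }
}
```

Because the field layout is under the class's control, a generator can emit exactly the bytes each type needs, which is the property the text attributes to the deployed components.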

XML and beyond…

With C24’s IO you now have a Java object model that is very close to what would have been written if you had had to code it yourself. It is small, efficient and powerful and yet it retains all of the features of XML. It goes a lot further though: rather than simply taking on XML Schema restrictions, it can be extended. We can now write real Java code for checking the value of elements; these checks can reference other elements or even external sources. We have clients, for example, that check counterparties and currencies against live databases, and these checks are actually built into the deployed code. We can fully validate things like IBAN (ISO 13616) codes, ISO currency, country and BIC codes, postcodes, zip codes, credit card numbers, payment dates and holiday dates, all things that are impossible in XML Schema.

Because IOs are small and totally self-contained, a lot of other interesting possibilities arise. We can apply rules to the components and send them off on their way. These rules can be executed remotely without having to be centralised. Components can be truly distributed by using technologies like RuleML and Enigmatec’s RIF.

C24’s IO provides the components needed for Grid computing and Jini’s JavaSpaces. IOs and JavaSpaces were made for each other. The IO-Editor can deploy code that implements (for example) net.jini.core.entry.Entry, so the components can all be written into JavaSpaces. C24’s IO in JavaSpaces is like having a database that works on native XML but with everything in memory, indeed shared memory across a number of machines. This database is transactional and scalable, and with IO it can contain not only data but executable validation and workflow rules. It makes high performance, scalable processing using a Grid computing model practical.
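The IBAN check named in this section is a good example of a rule that needs real code. A minimal sketch of the ISO 13616 mod-97 checksum follows; it covers only the checksum and a basic shape test, not the country-specific lengths and formats that a full validation would also need, and the `IbanCheck` class name is an assumption.

```java
import java.util.Locale;

final class IbanCheck {

    // Returns true when the IBAN passes the ISO 13616 mod-97 checksum.
    static boolean hasValidChecksum(String iban) {
        if (iban == null) {
            return false;
        }
        String s = iban.replace(" ", "").toUpperCase(Locale.ROOT);
        // 15 to 34 characters covers the range of national IBAN lengths.
        if (s.length() < 15 || s.length() > 34) {
            return false;
        }
        // Move the country code and check digits to the end.
        String rearranged = s.substring(4) + s.substring(0, 4);
        int remainder = 0;
        for (int i = 0; i < rearranged.length(); i++) {
            char c = rearranged.charAt(i);
            if (c >= '0' && c <= '9') {
                // A digit contributes one decimal place.
                remainder = (remainder * 10 + (c - '0')) % 97;
            } else if (c >= 'A' && c <= 'Z') {
                // A letter maps to 10..35 and contributes two decimal places.
                remainder = (remainder * 100 + (c - 'A' + 10)) % 97;
            } else {
                return false;
            }
        }
        return remainder == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasValidChecksum("GB82 WEST 1234 5698 7654 32"));
        System.out.println(hasValidChecksum("GB83 WEST 1234 5698 7654 32"));
    }
}
```

Computing the remainder incrementally avoids big-number arithmetic; the running value never exceeds a few thousand, so a plain `int` is enough.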
By applying matching rules to IO components and writing them into JavaSpaces we are able to provide XML-to-XML matching and reconciliation orders of magnitude faster than traditional “flat” matching engines. Using GigaSpace’s Embedded Spaces, for example, we are able to achieve throughput more than 2, and in some cases 3, orders of magnitude faster than with “raw” XML messaging.
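JavaSpaces finds entries by template: a template field left as null acts as a wildcard, and every non-null field must equal the corresponding field of the entry. The sketch below imitates that matching rule over a plain in-memory list so it runs without any Jini libraries; it is a stand-in for the idea, not the JavaSpaces API, and the `TradeEntry` fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TemplateMatching {

    // Hypothetical entry; in a real space this would implement
    // net.jini.core.entry.Entry and be written with the space's own API.
    static final class TradeEntry {
        final String tradeId;
        final String counterparty;
        final String currency;

        TradeEntry(String tradeId, String counterparty, String currency) {
            this.tradeId = tradeId;
            this.counterparty = counterparty;
            this.currency = currency;
        }
    }

    // A null template field matches anything; a non-null field must be equal.
    private static boolean fieldMatches(String templateValue, String candidateValue) {
        return templateValue == null || templateValue.equals(candidateValue);
    }

    static boolean matches(TradeEntry template, TradeEntry candidate) {
        return fieldMatches(template.tradeId, candidate.tradeId)
                && fieldMatches(template.counterparty, candidate.counterparty)
                && fieldMatches(template.currency, candidate.currency);
    }

    // Returns every entry in the in-memory "space" that matches the template.
    static List<TradeEntry> readAll(List<TradeEntry> space, TradeEntry template) {
        List<TradeEntry> hits = new ArrayList<>();
        for (TradeEntry entry : space) {
            if (matches(template, entry)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TradeEntry> space = Arrays.asList(
                new TradeEntry("T1", "BANKA", "USD"),
                new TradeEntry("T2", "BANKB", "USD"),
                new TradeEntry("T3", "BANKA", "EUR"));
        // Template: any trade with counterparty BANKA, whatever the id or currency.
        List<TradeEntry> hits = readAll(space, new TradeEntry(null, "BANKA", null));
        System.out.println(hits.size() + " matching entries");
    }
}
```

Matching happens on typed fields that are already in memory, so no message has to be parsed in order to be compared.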

Conclusion

Using XML for publishing standards and their associated rules is good. But in message based integration, as XML becomes more popular it becomes proportionally less practical. XML tends to get bloated and inefficient when used in large, complex inter-process communications. Parsing and validating standardised XML instances every time they move is inefficient in terms of the computational horsepower needed to meet throughput and latency demands, and expensive in terms of development resources. The existing integration infrastructure can be made more efficient using open C24 IO components, without any investment in proprietary “go faster” solutions. C24 IO also enables the practical application of Grid type architectures. With the increasing availability of Blade type high density, low cost computing platforms, the Grid model provided by JavaSpaces implementations comes of age. Put simply, the C24 IO toolkit provides a more efficient, model driven architecture approach to XML integration.

Glossary of terms

FpML (Financial products Markup Language) is the business information exchange standard for electronic dealing and processing of financial derivatives instruments. (http://www.fpml.org)

OpenAdapter can be loosely classified as EAI (Enterprise Application Integration) software based on Java and XML. (http://www.openadapter.org)

S.W.I.F.T. (Society for Worldwide Interbank Financial Telecommunication), one of the main standards for financial messaging, handles over 2 billion messages per year. (http://www.c24.biz/swift.htm)

FIX (Financial Information eXchange) is another de facto standard in the banking industry. (http://www.c24.biz/fix.htm)

Jelly, from Apache, is an excellent tool for turning XML into executable code. (http://jakarta.apache.org/commons/jelly/)

RuleML (Rule Markup Language) is an open standard for rules in XML. (http://www.ruleml.org/)

CodeMesh is a company that provides leading edge C++ to Java integration tools. (http://www.codemesh.com)

Enigmatec is an innovative company working on leading edge technologies like Grid computing and Distributed Rules. (http://www.enigmatec.net)

IBAN (International Bank Account Number) is an ISO standard that sounds simple but is actually rather complex. (http://www.ecbs.org/iban.htm)

JavaSpaces is part of Jini: Grid computing for Java, enabling truly distributed systems with minimal overhead. It is the perfect framework for Blade hardware. Jini has been around since the 90s and has now finally come of age as the Grid technology for the future. (http://www.jini.org)