Dmreview - Sunopsis Overview Etl

  • November 2019
  • PDF

Sunopsis Integration Suite

An evaluation by Bloor Research

Author: Philip Howard

Fast facts

The Sunopsis Integration Suite is a set of applications that, between them, aim to offer every conceivable way that you might want to integrate data, whether in batch or real-time, whether persistent or non-persistent, whether in an event-driven environment or one which is data-driven, whether synchronously or asynchronously. To put that in terms of product types, this means ETL (extract, transform and load), replication, event-driven application (data) integration via a service bus, Web Services, support for federated data and more.

However, this does not tell the whole story. It is also important to appreciate that everything in the Sunopsis suite is implemented just once. Thus there is one transformation mechanism that supports all four of the front-end products that the company markets. Similarly, there is a single design tool, a single operator console, a single set of data quality capabilities, a single set of database connectors, and a single metadata repository that supports all of the facilities. The importance of this integrated, platform approach cannot be emphasised too much.

There are, in fact, four products in the Sunopsis Integration Suite:

  • Sunopsis ETL (extract, transform and load), which does what its name suggests;
  • Sunopsis DataBus, which provides an event-driven integration solution based on Sunopsis MQ, the company’s message-oriented middleware;
  • Sunopsis ActiveData, which provides data-driven application integration to support a variety of functions such as business activity monitoring, data federation, master data management and real-time replication; and
  • Sunopsis DataServices, which provides facilities to encapsulate data as Web Services.

That said, not all of these capabilities are released as of the time of writing (March 2005). Both Sunopsis ETL and Sunopsis DataBus are currently available, while Sunopsis ActiveData and Sunopsis DataServices are scheduled to be released during the next six months. For this reason, this report focuses more on the first two of these products than on those that have yet to be released.

Key findings

In the opinion of Bloor Research the following represent the key facts of which prospective users should be aware:

© Bloor Research 2005



  • We are hugely impressed with the architectural approach taken by Sunopsis in building an integration platform. In our opinion it is clearly the market leader in the extent and breadth of its capabilities.
  • Sunopsis ETL is a code-free environment that generates native SQL for the source and target systems it supports. As such, you can process transformations wherever in the environment is most sensible, which means that, if it is appropriate, you can dispense with the need to have an intermediate processor. In other words, you can optimise performance with reduced hardware costs.
  • Change data capture is provided by Sunopsis to support real-time capabilities in ETL environments, real-time replication, or both. Sunopsis MQ (which forms a part of Sunopsis DataBus) may be used in conjunction with these capabilities, either as a strategic choice or because a heavy duty (and expensive) message queuing solution is not required.
  • Data quality facilities, including data profiling, are provided by Sunopsis but these are only intended to ensure the quality of the data that is moved. For more sophisticated capabilities a third-party solution may be required.
  • The repository provides metadata management capabilities and is shared by all applications. This means that you can take transformations and mappings defined in one tool and reuse them in another. It also means that things like impact and dependency analyses span all of the Sunopsis solutions.
  • Similarly, there is a single design environment and a single management environment for the entire Integration Suite.

The bottom line

Over the last few years Sunopsis has developed a growing reputation with its Sunopsis ETL product. However, it is only with this release that the company has expanded into providing a full Integration Suite. This is an ambitious undertaking. On the other hand it is clearly what the market needs. Where is the sense in having different integration tools with different transformation engines, different sets of connectors, different metadata repositories, and so on and so forth? Yet that is precisely the situation in which most companies find themselves.

As far as we know Sunopsis is the first company in this market to commit itself to providing a platform that spans the entire data (as opposed to process) integration space. We think that Sunopsis has made the right decision and, based as it is on its successful and well-regarded ETL tool, we believe that it will be successful. It is certainly setting out its stall as the market leader.

Vendor information

Background information

Sunopsis was founded in 1998 and introduced its first product, Sunopsis ETL, the following year, with the company claiming that this was the first 3rd generation ETL tool to be made available in the market. We will discuss this claim further in due course. Since that time, the company has been expanding its product set so that it provides (or will provide) a complete integration solution that spans multiple technologies, all based on a single platform. In our view, this is the direction in which the market is, and ought to be, headed: it is just that Sunopsis is running ahead of the pack.

However, we should make it clear precisely what market Sunopsis is addressing. Historically, the integration market has been regarded as being split between application integration and data integration. That was always something of a nebulous distinction since application integration was about passing data between applications. It is now clear that we can separate the integration market in another way: between data integration and process integration. What used to be EAI (enterprise application integration) now encompasses both business process integration and, via enterprise service buses, data integration. Sunopsis is addressing the market for data integration in all its aspects, no matter where the data originates.

Sunopsis web address: www.sunopsis.com

Product availability

At the time of writing the Sunopsis Integration Suite is in version 4.0, which was made generally available in March 2005. However, the Integration Suite consists of four modules, only two of which, Sunopsis ETL and Sunopsis DataBus, are currently available. The remaining modules are Sunopsis ActiveData, which is scheduled for release in spring 2005, and Sunopsis DataServices, which will be released during the summer of 2005. While we will discuss the last two of these modules in some detail, we are not in a position to discuss them in the same depth as the products that are available today.

Figure 1: The Suite as a platform

However, to think of the Sunopsis Integration Suite as simply a set of four products would be incorrect. In practice the suite consists of a platform upon which these four products can be delivered, as illustrated in Figure 1. Thus there are a number of both back-end and front-end facilities that are common across the product suite. Moreover, these include facilities that are not explicitly mentioned in this diagram, such as the transformation engine, alerting capabilities, security, auditing, version control, data lineage and so on. All of these have been implemented just once within the Sunopsis Integration Suite, so you do not need multiple instances of these technologies, as you would if you implemented independent solutions to the various integration needs that Sunopsis supports. We will discuss these common features separately, some of them before we consider the individual Sunopsis modules and some of them afterwards.

The only hardware required for a Sunopsis implementation is a suitable device for operators and developers. Since the whole environment has been written in Java, you can deploy it more or less anywhere (Windows, UNIX, Linux and so on). There is also a browser-based interface for operators. The system does require a repository but this can use whatever relational database you currently have installed. That is, you do not need a new instance.

Financial results

Sunopsis is a privately-owned company with some 50 employees. It has offices in France (head office), the United States, Germany, Italy, Singapore and the UK. It also has a number of distributors representing the company in other European and Far Eastern countries, as well as Latin America. The company also has a number of partnerships with both systems integrators and other technology vendors.

The platform – 1

It may seem strange to consider some of the common features of the Sunopsis Integration Suite before we consider the products themselves, and then to consider the rest afterwards. However, it is necessary to understand what some of the common facilities do in order to understand how the products work, while in other cases it makes more sense to work the other way round.

Connectors

Connectors are provided by Sunopsis in what the company calls ‘Knowledge Modules’. The idea behind these modules is that each one understands the data source to which it relates (for example, the Teradata Knowledge Module understands any extensions to SQL that Teradata has implemented, supports parallel loading, and so on) and provides native capabilities for addressing that source. Further, the concept is that you tell the Knowledge Module what you want to do (the data that you want to access) and it will work out how to do that. In other words, relevant Knowledge Modules are used to generate the SQL necessary to access the relevant data. Each Knowledge Module is extensible and you can also manually edit the generated SQL if required.

Standard Knowledge Modules that are available out of the box include Oracle, DB2, Sybase ASE, Sybase ASA, Sybase IQ, SQL Server, Teradata, Hyperion and Netezza. There are also specific connectors for various applications and tools environments, such as SAP, PeopleSoft, Oracle Applications, J.D. Edwards, and Siebel on the one hand, and Business Objects on the other. In addition, Sunopsis also has customers using both Informix and Ingres and native capabilities are also available for these databases at no charge.

Further, there are a number of additional Knowledge Modules available that are more limited in scope but still have detailed understanding of the host environment. These are available for UniVerse, Progress, PostgreSQL, Paradox, MySQL, NetRexx (for invocation only; scripting is not available), Jython (ditto), Interbase (and, presumably, Firebird, since that is based on Interbase), Hypersonic SQL, Dbase and Btrieve. If you use some other database technology then JDBC and ODBC are available (and you can build specific Knowledge Modules) as are JMS, LDAP and Web Services connectors. There is also an XML capability that allows the software to extract from XML sources in a relational format.
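
To make the idea of a connector that "works out how" concrete, the following is a minimal sketch in Python of dialect-aware SQL generation. It is not the Sunopsis Knowledge Module format, which this report does not document; the function name `generate_extract_sql` and the dialect labels are assumptions made purely for illustration. The caller states what it wants (a table, some columns, a row cap) and the generator chooses the row-limiting syntax that the target database understands.

```python
def generate_extract_sql(dialect: str, table: str, columns: list[str], max_rows: int) -> str:
    """Return a row-limited SELECT in the syntax of the given dialect.

    Hypothetical illustration only: real connectors would also handle
    quoting, data types, bulk loaders and so on.
    """
    cols = ", ".join(columns)
    if dialect in ("postgresql", "mysql"):
        # Both accept a trailing LIMIT clause.
        return f"SELECT {cols} FROM {table} LIMIT {max_rows}"
    if dialect == "sqlserver":
        # SQL Server puts the cap next to SELECT.
        return f"SELECT TOP ({max_rows}) {cols} FROM {table}"
    if dialect == "oracle":
        # Classic Oracle idiom using the ROWNUM pseudo-column.
        return f"SELECT {cols} FROM {table} WHERE ROWNUM <= {max_rows}"
    raise ValueError(f"no generator registered for dialect {dialect!r}")
```

The same request therefore yields different but equivalent statements per source, which is the property the report attributes to Knowledge Modules.
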
In addition to standard connectors, Sunopsis also provides change data capture as an optional extension to these connectors. That is, the ability to capture changes as they are made to the source system. This can be used for various purposes, including trickle feeding data into a data warehouse and real-time replication. Change data capture may be either trigger-based (in conjunction with message queuing—see later) or log-based. In the latter case, Sunopsis typically provides its own connectors though you can use third-party adapters if you need to access obscure legacy systems that Sunopsis does not support.
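
As an illustration of the trigger-based variant, here is a minimal sketch using Python's standard `sqlite3` module. The table names (`customer`, `customer_journal`) and the helper functions are hypothetical, and a journal table stands in for the message queue that the report says Sunopsis uses; this shows the general technique, not the Sunopsis implementation.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create a source table whose changes are journalled by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer_journal (
            seq  INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves change order
            op   TEXT NOT NULL,                      -- I, U or D
            id   INTEGER NOT NULL,
            name TEXT
        );
        CREATE TRIGGER customer_ins AFTER INSERT ON customer
        BEGIN
            INSERT INTO customer_journal (op, id, name) VALUES ('I', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customer_upd AFTER UPDATE ON customer
        BEGIN
            INSERT INTO customer_journal (op, id, name) VALUES ('U', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customer_del AFTER DELETE ON customer
        BEGIN
            INSERT INTO customer_journal (op, id, name) VALUES ('D', OLD.id, NULL);
        END;
    """)
    return conn


def drain_changes(conn: sqlite3.Connection) -> list[tuple]:
    """Read pending changes in order, then clear the journal."""
    rows = conn.execute(
        "SELECT seq, op, id, name FROM customer_journal ORDER BY seq"
    ).fetchall()
    conn.execute("DELETE FROM customer_journal")
    return [(op, row_id, name) for _seq, op, row_id, name in rows]
```

A consumer would call `drain_changes` periodically and apply each change to the target, which is how a warehouse could be trickle fed. Log-based capture avoids the trigger overhead on the source but depends on database-specific log readers, so it is not sketched here.
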

Repository

While a number of the features (such as version control) provided by the repository will be discussed later, it is important to understand how the repository works with respect to the architecture of the Sunopsis platform. The architecture is essentially agent-based, with agents residing wherever is appropriate (for example, you might want an agent on a source system to intercept operating system calls). These agents run Sunopsis jobs and do so by referencing the repository, in which all job definitions are stored. Note that agents have scheduling functionality built into them, though you can also use third-party scheduling tools, as you can call agents via the command line or through Java code.

As far as the repository itself is concerned, it is important to remember that the same repository is used by all of the Sunopsis products. This means that you can reuse transformations and data quality rules, for example, across the product set. It also means that cross-product capabilities such as data lineage and dependencies are supported, as illustrated in Figure 2.
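
The relationship between an agent and the repository can be pictured with a small sketch. Everything here is an assumption for illustration: the `job_step` table layout, the job name and the `run_job` function are invented, and the real repository schema is not described in this report. The point is only that the agent holds no job logic of its own; it looks the definition up and executes it where it sits.

```python
import sqlite3


def create_repository() -> sqlite3.Connection:
    """Build a toy repository holding one job made of ordered SQL steps."""
    repo = sqlite3.connect(":memory:")
    repo.execute("CREATE TABLE job_step (job_name TEXT, step_no INTEGER, sql_text TEXT)")
    repo.executemany(
        "INSERT INTO job_step VALUES (?, ?, ?)",
        [
            ("load_regions", 1, "CREATE TABLE region (code TEXT PRIMARY KEY)"),
            ("load_regions", 2, "INSERT INTO region VALUES ('EU'), ('US')"),
        ],
    )
    return repo


def run_job(repo: sqlite3.Connection, target: sqlite3.Connection, job_name: str) -> int:
    """Act as an agent: fetch the job's steps from the repository and run them."""
    steps = repo.execute(
        "SELECT sql_text FROM job_step WHERE job_name = ? ORDER BY step_no",
        (job_name,),
    ).fetchall()
    if not steps:
        raise LookupError(f"job {job_name!r} is not defined in the repository")
    for (sql_text,) in steps:
        target.execute(sql_text)
    return len(steps)
```

Because the definition lives in one place, a scheduler, a command line wrapper or another program could all trigger the same job simply by naming it.
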

Figure 2: Repository Explorer

The repository can be based on any database that supports the ANSI ’92 standard. It is not CWM (Common Warehouse Metamodel) compliant though it can export (but not import at present) data in a compliant format.

Sunopsis ETL

Introduction

Sunopsis describes its ETL as a 3rd generation tool. The first generation of such tools generated code (COBOL and so forth) that you ran on the source system. The problem with this approach was that it did not support heterogeneous environments very well and it imposed a significant load on the source system. This was then superseded (though code generating products are still available) by so-called ‘black box’ solutions whereby you have an intermediate processor that takes over the processing load. However, these black boxes can become a bottleneck in their own right or, in order to avoid this, you have to spend a lot of money on extra hardware in order to preserve performance.

The idea behind a 3rd generation approach is that where you process the data should depend on the situation. It might be most efficient to transform some data on the source system, it might be better to perform other processes on the target, and it may also be appropriate (or it may not be) to perform some operations on an intermediate processor. In other words, you need to have a choice and this is what Sunopsis ETL provides. Indeed, this means that Sunopsis ETL is a misnomer: it could equally well be ELT or TEL or TELT or even TETLT. You can have an intermediate processor if you want one, but you don’t have to and you can also implement in-memory staging if that is appropriate.

In order to provide this sort of flexibility, Sunopsis ETL generates SQL relevant to the source systems being accessed, as discussed previously, which means that Sunopsis can leverage the power of whatever existing systems are in place without requiring you to add new hardware.
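
The choice of where to transform can be shown with a small sketch. It uses two in-memory SQLite databases as stand-ins for a source and a target system; the table names and functions are assumptions for illustration and say nothing about the SQL that Sunopsis ETL actually generates. The same aggregation is run either on the source (so only the summary crosses the wire) or on the target (so raw rows are staged first and the target's engine does the work).

```python
import sqlite3

# One transformation, parameterised by the table it reads from.
AGG_SQL = "SELECT region, SUM(amount) AS total FROM {table} GROUP BY region ORDER BY region"


def make_source() -> sqlite3.Connection:
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    src.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 10), ("EU", 5), ("US", 7)])
    return src


def transform_on_source(src: sqlite3.Connection, tgt: sqlite3.Connection) -> int:
    """Aggregate on the source; move only the summary rows. Returns rows moved."""
    rows = src.execute(AGG_SQL.format(table="sales")).fetchall()
    tgt.execute("CREATE TABLE sales_by_region (region TEXT, total INTEGER)")
    tgt.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    return len(rows)


def transform_on_target(src: sqlite3.Connection, tgt: sqlite3.Connection) -> int:
    """Stage raw rows on the target, then aggregate there. Returns rows moved."""
    rows = src.execute("SELECT region, amount FROM sales").fetchall()
    tgt.execute("CREATE TABLE stg_sales (region TEXT, amount INTEGER)")
    tgt.executemany("INSERT INTO stg_sales VALUES (?, ?)", rows)
    tgt.execute("CREATE TABLE sales_by_region AS " + AGG_SQL.format(table="stg_sales"))
    return len(rows)
```

Both routes leave the same `sales_by_region` table on the target; they differ in how many rows travel and which machine carries the load. Neither needs an intermediate processor.
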

Design

All the Sunopsis products use a common front-end environment, which is one of the strengths of the suite. For design purposes, this is the Sunopsis Designer. This operates in a fairly conventional way, as illustrated in Figures 3 and 4, with drag and drop interfaces that allow you to build what Sunopsis refers to as an ‘interface’, by which it means a single interface job. You can then combine multiple interfaces into packages by means of a process flow (with support for variables, branching and so on).

Figure 3: Sunopsis Designer screen

Notable features available during the design stage include an annotation facility, though this applies only to the flow as a whole (we would prefer it if it were possible to append notes to individual elements within the flow); the ability to set error thresholds for each interface that will make the interface either a ‘pass’ or a ‘fail’; and a logging option. In addition, there are facilities to support such things as slowly changing dimensions within OLAP implementations. It is also pertinent to note that there is an alerting engine that can not only be used to send emails and other notifications, but which can also be used to trigger further processes. However, this facility is not limited to Sunopsis ETL but can also be applied by the other Sunopsis products.

Sunopsis ETL has reverse engineering capabilities built into it for tables, columns, relationships and so on, though it does not generate graphical (entity-relationship) models from this information. In principle you could import appropriate models from third-party tools but this is not an automated facility.

Figure 4: Sunopsis Designer screen showing relationships
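
For readers unfamiliar with slowly changing dimensions, the sketch below shows the common "type 2" treatment, in which a changed attribute closes the current dimension row and opens a new one so that history is preserved. The report does not say which variants Sunopsis supports or how it implements them; the `dim_customer` layout and the `apply_scd2` function are assumptions for illustration.

```python
import sqlite3


def create_dimension() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer ("
        " customer_id INTEGER, city TEXT, valid_from TEXT, valid_to TEXT)"
    )
    return conn


def apply_scd2(conn: sqlite3.Connection, customer_id: int, city: str, load_date: str) -> bool:
    """Apply one incoming record; return True if a new version was written."""
    current = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND valid_to IS NULL",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return False  # nothing changed, keep the open row as it is
    if current is not None:
        # Close the row that was current until this load.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (load_date, customer_id),
        )
    # Open a new current row (valid_to stays NULL).
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (customer_id, city, load_date),
    )
    return True
```

After a customer moves from Paris to Lyon, the table holds two rows for that customer: a closed Paris row and an open Lyon row, so reports can be run against either point in time.
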

Sunopsis DataBus

Sunopsis DataBus is based around Sunopsis MQ, which is the company’s message queuing system. This was introduced with version 3.2 of the product suite for customers wanting to implement trigger-based replication but who could not afford, or did not want, to go to the extent of implementing a mainstream message queuing system such as WebSphere MQ. Nevertheless, Sunopsis MQ was previously treated as an optional add-on rather than an independent product in its own right. Indeed, in a sense that remains the case, since DataBus is more than just the message queuing software.

Figure 5: Common Format Designer screen

Sunopsis DataBus is what is otherwise known as an enterprise service bus (ESB) in that it consists of message queuing software and a transformation engine to provide event-driven integration. In other words, it is the sort of tool that you would use when an incoming document (for example, an invoice or a SWIFT message) needs to be transformed into your own standard format in order to be processed. It offers a non-persistent environment (unlike the Sunopsis Active Data Hub; see next section) that guarantees the completion of a transaction, with delivery and proper usage capabilities. There is also an asynchronous integration mode for applications that are located on remote sites or for which connectivity can be intermittent. Along with the ActiveData product it shares a Common Format Designer (see Figure 5), which is a physical schema design tool that is new in this release. It is not a logical design tool, though you could link to a third-party modelling tool if you need that facility.
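
The pattern described here, a queue feeding a transformation that emits a standard format, can be sketched in a few lines of Python. The in-process `queue.Queue` merely stands in for message queuing software, and the partner field names (`InvNo`, `Total`, `Cur`) and the in-house layout are invented for the example. The sketch does not attempt guaranteed delivery or transactions, which a real bus would provide.

```python
import queue
from decimal import Decimal


def to_canonical(message: dict) -> dict:
    """Map a hypothetical partner invoice layout onto a hypothetical house format."""
    return {
        "invoice_id": str(message["InvNo"]),
        # Hold money as integer cents to avoid floating point drift.
        "amount_cents": int(Decimal(str(message["Total"])) * 100),
        "currency": str(message.get("Cur", "EUR")).upper(),
    }


def pump(inbound: queue.Queue, outbound: queue.Queue) -> int:
    """Transform every waiting message; return how many were handled."""
    handled = 0
    while True:
        try:
            message = inbound.get_nowait()
        except queue.Empty:
            return handled
        outbound.put(to_canonical(message))
        handled += 1
```

Downstream applications then only ever see the house format, whatever shape the incoming document had.
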

Sunopsis ActiveData

Whereas Sunopsis DataBus provides application integration on an event-driven basis, Sunopsis ActiveData provides complementary capabilities for data-driven environments that allow you to replicate data (in real-time) from production applications into a centralised database (using the RDBMS of your choice) called the Active Data Hub. This is, in effect, an operational data store (ODS) that stores data persistently and which can be used to propagate changes back to production applications, thanks to the fact that the replication facilities provided are bi-directional. Updates support two-phase commit but are not XA compliant. Data in the Active Data Hub can be accessed either via SQL or through Web Services using Sunopsis DataServices (see next section).

Sunopsis ActiveData can be used for a variety of purposes: for example, it might be used as a traditional ODS. However, it also might be used to store master reference data, or to form the repository for business activity monitoring (BAM), or as a data federation platform for enterprise information integration.

Sunopsis DataServices

As Sunopsis DataServices is some months away from being generally available, little detail of what it will provide is currently available. However, the principle is that the product will allow you to expose your data through Web Services, so that any data held by the organisation (irrespective of the applications that use that data) can participate in a services-oriented architecture (SOA).

The platform – 2

A number of supporting functions, available to all of the applications within the Sunopsis suite, remain to be discussed. We take them in turn.

Data quality

Sunopsis does not claim to provide a comprehensive data quality solution but, rather, what is necessary to support data integration. That is, it aims to prevent bad data from being moved from one location to another, to write exception records that capture instances of bad data, and to reprocess rejected data. Thus, for example, its profiling capabilities are not capable of discovering such things as referential integrity errors. Similarly, it does not do things such as name and address cleansing. On the other hand, it allows you to define your rules (in SQL) so that you can extend the environment to capture industry or business-specific errors.
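
A minimal sketch of rule-driven rejection follows, again using SQLite from Python. Each rule is an SQL predicate that is true for a bad row; violating rows are written to an exception table together with the rule they broke, and only clean rows are moved. The rule names, table names and the `move_with_checks` function are assumptions for illustration and do not reflect how Sunopsis stores or evaluates its rules.

```python
import sqlite3

# Hypothetical business rules: name -> SQL predicate that flags a violation.
RULES = {
    "amount_not_negative": "amount < 0",
    "currency_present": "currency IS NULL OR currency = ''",
}


def create_tables() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (id INTEGER, amount INTEGER, currency TEXT);
        CREATE TABLE tgt_orders (id INTEGER, amount INTEGER, currency TEXT);
        CREATE TABLE err_orders (id INTEGER, amount INTEGER, currency TEXT, rule TEXT);
    """)
    return conn


def move_with_checks(conn: sqlite3.Connection) -> tuple[int, int]:
    """Route bad rows to err_orders and good rows to tgt_orders.

    Returns (rows in target, exception records written).
    """
    for rule, violation in RULES.items():
        conn.execute(
            "INSERT INTO err_orders (id, amount, currency, rule) "
            f"SELECT id, amount, currency, ? FROM src_orders WHERE {violation}",
            (rule,),
        )
    conn.execute(
        "INSERT INTO tgt_orders (id, amount, currency) "
        "SELECT id, amount, currency FROM src_orders "
        "WHERE id NOT IN (SELECT id FROM err_orders)"
    )
    moved = conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
    rejected = conn.execute("SELECT COUNT(*) FROM err_orders").fetchone()[0]
    return moved, rejected
```

Adding an industry-specific check is then a matter of adding one more predicate to `RULES`. Reprocessing would consist of correcting the rejected rows and running them through the same checks again.
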

Metadata management

Figure 6: Developer capabilities

Again, the metadata management provided through the Sunopsis repository is not intended to provide a general-purpose repository capability but a set of features that have been specifically designed to support an integration environment. Thus there are features such as impact analysis, dependency diagrams, where-used capabilities, data lineage graphs and so on. There are also particularly good multi-developer capabilities, which extend beyond the traditional check-in/check-out and versioning (illustrated in Figure 6) to such things as support for baselines, which you would more commonly expect in configuration management systems.
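
Impact analysis over shared metadata amounts to a graph traversal: given an object, find everything downstream of it. The sketch below assumes a hypothetical lineage map (the object names are invented) and uses a breadth-first search; it illustrates the concept rather than the repository's actual data model.

```python
from collections import deque

# Hypothetical lineage: each object maps to the objects fed directly from it.
LINEAGE = {
    "crm.customer": ["stg.customer"],
    "stg.customer": ["dw.dim_customer"],
    "erp.orders": ["dw.fact_sales"],
    "dw.dim_customer": ["dw.fact_sales", "report.churn"],
    "dw.fact_sales": ["report.revenue"],
}


def impact_of(obj: str, lineage: dict[str, list[str]] = LINEAGE) -> list[str]:
    """Return every object affected, directly or indirectly, by a change to obj."""
    affected: set[str] = set()
    pending = deque([obj])
    while pending:
        current = pending.popleft()
        for downstream in lineage.get(current, []):
            if downstream not in affected:
                affected.add(downstream)
                pending.append(downstream)
    return sorted(affected)
```

Reversing the direction of the edges gives the complementary data lineage and where-used questions. Because all products share one repository, a single traversal of this kind can span objects defined in any of them.
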

Administration

As one might expect by now, administration is also a single facility no matter which application is implemented. An operator console (see Figure 7) presents a dashboard of all integration processes and allows you to drill down within the execution results (provided that logging is turned on). Other administrative functions, such as security, are also platform rather than product-specific components.

Figure 7: Operator Console

Summary

We are very impressed with Sunopsis’ vision and, to date, the reality that it has implemented. We believe that it is time that users stopped looking for point solutions to integration problems and started considering strategic platform choices: Sunopsis would be a good place to start.

Copyright & Disclaimer

This document is subject to copyright. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research. Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research’s intent to claim these names or trademarks as our own. Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.

Suite 4, Town Hall, 86 Watling Street East
TOWCESTER, Northamptonshire, NN12 6BS, United Kingdom
Tel: +44 (0)870 345 9911 – Fax: +44 (0)870 345 9922
Web: www.bloor-research.com – email: [email protected]
