metagroup.com



800-945-META [6382]

February 2004

The Future of Data Integration Technologies
A META Group White Paper Sponsored by Sunopsis

“Today, IT organizations have a wealth of technologies from which to choose to meet their various data integration requirements, including database replication, ETL, EAI, EII, database gateways, data quality and cleansing, metadata management, portals, and information logistics agents. Yet the sheer number of choices, usually provided by different vendors, actually compounds the challenges. Which technology is the most appropriate for the problem at hand?”


Contents

Introduction
  Pattern 1: Business Continuity
  Pattern 2: Business Intelligence
  Pattern 3: Multistep Business Process Coordination
  Pattern 4: Synchronization of Distributed Information
  Pattern 5: Visibility of Information Assets
Data Integration Architectures
EAI and Data Integration
The Future of Data Integration
Bottom Line

208 Harbor Drive • Stamford, CT 06902 • (203) 973-6700 • Fax (203) 359-8066 • metagroup.com, www.sunopsis.com Copyright © 2004 META Group, Inc. All rights reserved.


Introduction

During the last 10-15 years, as information systems have become increasingly decentralized and distributed, the challenges of data integration have grown. The volume of data that IT organizations manage, the variety of data formats, and users' velocity requirements for access to information have all increased dramatically. Likewise, data integration technology has advanced to meet the more challenging data consistency management requirements created by these demands for access to information.

Today, IT organizations have a wealth of technologies from which to choose to meet their various data integration requirements, including database replication, ETL, EAI, EII, database gateways, data quality and cleansing, metadata management, portals, and information logistics agents. Yet the sheer number of choices, usually provided by different vendors, actually compounds the challenge. Which technology is the most appropriate for the problem at hand? How many different data integration technology specialists are needed to maintain all these products? Each technology satisfies different data volume/velocity/variety and business requirements, but a more pragmatic and unified approach is needed to simplify this complexity.

Business user requirements for access to and manipulation of corporate information dictate requirements to the IT organization (ITO) for data integration technologies. From the ITO's perspective, these requirements fall into five patterns.

Pattern 1: Business Continuity

Particularly in larger enterprises with users dispersed across many geographies and time zones, ITOs may be expected to provide access to some information 24x7, 365 days a year. To accommodate all users' needs for access, data for one application may be artificially distributed because the telecommunications network is insufficient or too unreliable to meet data access needs. In these scenarios, data from the source system is typically moved on a regular schedule to the receiving target system. Once updates to the source are complete and maintenance is finished, a new replica copy can be delivered to the receiving system again, overwriting the previous version. IT can create duplicate copies so that one can be maintained offline while users continue to access the other. In addition, for disaster recovery reasons, duplicate copies of data may be kept in separate locations.
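A minimal sketch of the replica-refresh step described above, using Python's built-in sqlite3 module purely for illustration; the database files, table, and columns are hypothetical, and a production setup would rely on the platform's native replication or bulk-copy utilities:

```python
import sqlite3

def refresh_replica(source_path: str, replica_path: str) -> None:
    """Overwrite the replica's copy of the table with a fresh snapshot from the source."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(replica_path)
    try:
        rows = src.execute("SELECT id, name, region FROM customers").fetchall()
        dst.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
        dst.execute("DELETE FROM customers")                       # discard the previous replica copy
        dst.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        dst.commit()                                               # the new copy becomes visible only once complete
    finally:
        src.close()
        dst.close()

# A scheduler (e.g., cron or the maintenance window's own job stream) would call this
# once updates and maintenance on the source are finished:
# refresh_replica("orders_source.db", "orders_replica.db")
```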


Pattern 2: Business Intelligence

Different users require access to data for their own analyses in support of their decision making. Even the same subject area of information may need to be organized and refined in different ways (such as varying degrees of detail versus summarized records) to meet different analytical requirements. To meet a broad and diverse set of business intelligence (BI) requirements, IT will consolidate, reorganize, and refine operational data from various sources into data warehouses, data marts, operational data stores (ODSs), reporting systems, and other applications that support analytical needs. Here again, data typically moves in a unidirectional manner, but from multiple sources into one target. A key characteristic of this pattern is that the data is transformed or aggregated along the way to meet the specific needs of BI.
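A minimal sketch of this pattern: detailed rows from two hypothetical operational sources are normalized and aggregated into a single summarized structure bound for a data mart. The source layouts and field names are invented for illustration:

```python
from collections import defaultdict

# Hypothetical extracts: detailed order records from two separate operational systems.
orders_erp = [{"region": "EMEA", "amount": 1200.0},
              {"region": "AMER", "amount": 800.0}]
orders_web = [{"region": "emea", "amount": 150.0},
              {"region": "APAC", "amount": 300.0}]

# Transform: normalize source-specific formatting, then aggregate to the grain the mart needs.
revenue_by_region = defaultdict(float)
for row in orders_erp + orders_web:
    region = row["region"].upper()              # reconcile formatting differences across sources
    revenue_by_region[region] += row["amount"]  # summarize detail records for analysis

# Load: in practice this lands in a warehouse or mart table; printing stands in for the load step.
for region, total in sorted(revenue_by_region.items()):
    print(region, total)
```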

Pattern 3: Multistep Business Process Coordination

Because operational applications have been developed over many years independently of one another, most organizations have a number of siloed (or stovepiped) applications. Over the years, many of these silos have been integrated by shipping files from one to another, often via manually written scripts that move data from point to point, reflecting IT's recognition that the output of one silo is in fact the input required by another. Any single system may create multiple output files for various receiving systems, each designed independently, so that the output must be reformatted into multiple unique files. The same systems may also be accessed by multiple scripts, processing the same information over and over again. Consequently, many enterprises today find their IT operations hampered by the resulting spaghetti network of cross-application interdependencies manifested as point-to-point interfaces.

However, what these interface networks accomplish is the automation of a multistep business process, and potentially even multiple business processes. Information is moved from system to system in the sequence required by this overarching business process. Traditionally, this information moved once a day as a file to the receiving system, though in recent years it has increasingly moved as messages throughout the day. Key to this pattern, however, is that the sent information is used as the input to a particular business function (i.e., application logic). The receiving system rarely accepts this input as direct updates to its database or data model. Rather, by using native application-layer interfaces (e.g., APIs, screens, objects, IDocs) and its native validation and editing logic for input, the receiving system's data integrity and transactional integrity are preserved (see Figure 1).
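A minimal sketch of the integrity point made above: the upstream silo's output is replayed through the receiving system's application-layer interface so that its own validation runs before anything touches the database. The SupplierApp class and its rules are invented for illustration:

```python
class SupplierApp:
    """Stand-in for the receiving system's native application-layer interface."""

    def __init__(self):
        self._db = {}  # the application privately owns its database/data model

    def create_supplier(self, supplier_id: str, name: str) -> None:
        # Native editing and validation logic runs before any database change,
        # which is what preserves data and transactional integrity.
        if not supplier_id or supplier_id in self._db:
            raise ValueError(f"rejected supplier id {supplier_id!r}")
        if not name.strip():
            raise ValueError("supplier name is required")
        self._db[supplier_id] = {"name": name.strip()}

# The sending silo's output file is fed through the API record by record,
# rather than being bulk-inserted into the receiving database.
app = SupplierApp()
for record in [("S-001", "ABC Corporation"), ("S-002", "   ")]:
    try:
        app.create_supplier(*record)
    except ValueError as err:
        print("record routed to error handling:", err)
```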


Figure 1 — Distinguishing Business Events vs. Data vs. Messages vs. Transactions

In any integration scenario, what to use as the basis for integration — data, events, transactions, messages, business objects — and how to integrate applications (e.g., APIs, screens, files, messages, database calls) are the major sources of confusion when choosing an integration technology. Users must differentiate between the what and the how aspects of their interfacing decisions. Toward that objective, we offer the following definitions:

• Data: Facts that are recorded on non-volatile storage media. Data represents the state of the business transaction or event at the time of recording. Example: ABC Corporation is recorded as the name of a new supplier.
• Transaction: A programming construct consisting of one of the elements of CRUD (create, read, update, delete) and a payload (the data) to act upon. Example: Create ABC Corporation (in the supplier master file).
• Business event: A set of conditions or circumstances that have meaning as a group. The occurrence of the conditions can be simultaneous, but does not have to be. Example: The approval of a new supplier.
• Message: An envelope-based transport format. Many things can be put inside this envelope (data, transactions, events, XML documents, EDI transactions, etc.), depending on size limitations. Example: A SOAP message holding a PO acknowledgment from supplier ABC.

Source: META Group
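These distinctions can also be sketched as simple structures, reusing the figure's supplier example; the field names below are illustrative only, not part of any standard:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Data:
    """Recorded facts: the state of the business at the time of recording."""
    supplier_name: str                 # e.g., "ABC Corporation"

@dataclass
class Transaction:
    """One CRUD verb plus the payload (the data) it acts upon."""
    verb: str                          # "create", "read", "update", or "delete"
    payload: Data

@dataclass
class BusinessEvent:
    """A set of conditions that has meaning as a group."""
    name: str                          # e.g., "new supplier approved"
    conditions: dict

@dataclass
class Message:
    """An envelope-based transport format; many kinds of content can ride inside."""
    headers: dict
    body: Any                          # a Data record, a Transaction, an event, an XML document...

msg = Message(headers={"content-type": "application/xml"},
              body=Transaction(verb="create", payload=Data(supplier_name="ABC Corporation")))
print(msg)
```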

Pattern 4: Synchronization of Distributed Information

With distributed copies of similar or related data and requirements for reducing the latency of the shared information, IT may have to synchronize various copies of data in a near-real-time fashion. This occurs when the processes of the various systems are unrelated, yet there are common data elements. Logic must be written to reconcile both formatting (syntax) and semantic variations in the copies (see Figure 2). This logic is encapsulated as an information logistics agent (ILA), running independently of any single application. The ILA essentially represents a new process laid over existing data structures to resolve inconsistencies in the data resulting from prior distribution. The risk in this design, however, is that this new logic lives outside of any business process or application and creates an alternative method by which data state is changed. For example, this kind of update logic circumvents any editing and validation rules coded into the business application that natively owns the data model/database. Through this circumvention, transactional integrity can be corrupted.


Figure 2 — Syntax vs. Semantics

Syntax: Systematic, orderly arrangement. Example: The syntax of a customer name may be "last name, first name."
Semantics: Of or relating to meaning, especially meaning in language. Example: In a business that sells products or services to other businesses, the term "customer" may refer to another business as a buyer of the products or services, or it may refer to individual people who work at that business customer.

Source: META Group
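A minimal sketch of the reconciliation logic an ILA encapsulates, reusing the name-format example above. The record layouts are hypothetical, and writing the result straight into another system's database is exactly the integrity risk Pattern 4 describes:

```python
def reconcile_customer(crm_record: dict, billing_record: dict) -> dict:
    """Resolve syntax and semantic differences between two copies of a customer."""
    # Syntax: the CRM stores "last name, first name"; the canonical form is "first last".
    last, first = (part.strip() for part in crm_record["name"].split(","))
    canonical_name = f"{first} {last}"

    # Semantics: in the CRM, "customer" means the buying business;
    # in billing, "customer" means the individual contact at that business.
    return {
        "company": crm_record["customer"],       # the business that buys
        "contact_name": canonical_name,          # the person at that business
        "contact_email": billing_record["customer_email"],
    }

print(reconcile_customer(
    {"name": "Doe, Jane", "customer": "ABC Corporation"},
    {"customer_email": "jane.doe@abc.example"},
))
```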

Pattern 5: Visibility of Information Assets

Senior-level managers and executives, in particular, want certain pieces of information about business operations available at their fingertips. Usually the need is for access only (i.e., reading), not updating. Thus, IT must provide online, direct access to information sources in a way that is secure and that does not unduly affect the performance of the native application and the users of the data source.

Data Integration Architectures

What is common to most of these scenarios is the need for data — data that is up-to-date, relevant, and provided in a timely manner. Data integration technology, in all of the flavors mentioned earlier, can be appropriate for each of these patterns. In Figure 3, we match these patterns to the most appropriate data integration technology. Pink indicates architecturally where the data integration technology resides.


Figure 3 — Deploying Discrete Data Integration Technologies

[Diagram matching each pattern to a technology: database replication (DBR) for business continuity; scheduled extract/transform/load (ETL) for business intelligence; enterprise application integration (EAI) with message queues and event-based APIs for business process coordination; information logistics agents (ILAs) for business synchronization; and demand-based enterprise information integration (EII) views over OLTP or query applications for business visibility.]

Source: META Group

Readers should note that, in each case, the movement of information starts (with data extraction) or ends (with data loading) at the database level of the application architecture. Furthermore, the data integration engine can reside close to the source, on the target, or on a distinct platform altogether — whichever is most appropriate given the physical connectivity between them, the available skills, and the available resources to host the middleware.

EAI and Data Integration

EAI technology (also known as an integration server) is one form of integration technology that can be used for data integration, as depicted above. In this scenario, data is written into a message queue (acting as the envelope and mailbox) and forwarded to the EAI broker. The intelligence in the EAI broker decides how to route the information to the target database. However, the information is received via a message queue or the target application's native API before the receiving database is updated. For this reason, EAI is not a true data integration technology.


However, its most frequent use is in such data integration scenarios.
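A minimal sketch of that flow, with Python's standard queue module standing in for the message queue and a routing table standing in for the broker's intelligence. All of the queue and message-type names are illustrative; a real deployment would use a messaging product and its native adapters:

```python
import json
import queue

outbound = queue.Queue()   # the "envelope and mailbox" on the sending side

# 1. The source application writes its data into a message and places it on the queue.
outbound.put(json.dumps({"type": "supplier.created",
                         "payload": {"id": "S-001", "name": "ABC Corporation"}}))

# 2. The broker's routing intelligence decides where each message type should go.
routes = {"supplier.created": "erp_inbound_queue"}

def broker(inbox: queue.Queue) -> None:
    while not inbox.empty():
        message = json.loads(inbox.get())
        target = routes.get(message["type"], "dead_letter_queue")
        # 3. The target receives the message via its own queue or native API
        #    before its database is ever updated.
        print(f"routing {message['type']} -> {target}")

broker(outbound)
```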

The Future of Data Integration

To date, all these technologies have been offered by various independent vendors and products. Even if the ITO can appropriately match the right integration technology to the problems described above, it is still left to determine how many skilled, specialist resources are needed to implement these technologies and integration scenarios, and how many different vendors must be contracted. Each of these disparate technologies has its own user interface for the developer, brings its own development environment (often graphical), and has its own metadata repository to document the interface, its own security framework, and its own management framework. Rather than maintaining multiple data integration technologies, the ITO would ideally like one product that supports all data integration requirements with a single (in other words, rationalized) user interface, development language or tool, metadata repository, security framework, and management framework.

A consolidated data integration platform should support data consistency management requirements across a broad, diverse set of DBMSs, platforms, and data types. To do this, it should be able to leverage native DBMS-provided interfacing mechanisms (such as bulk loaders, APIs, and load utilities) as well as support portable, standards-based access mechanisms (e.g., JDBC and ODBC drivers) for reading from and writing to the databases.

In addition, to accommodate both the ITO's shrinking maintenance window and users' appetite for increasingly current information, a unified data integration platform should be able to capture net changes at the source and reduce the volume of incremental updates to the target. This changed data capture (CDC) capability can be achieved through various means (e.g., timestamps, database logs, event-based flagging of data, DBMS triggers, DBMS stored procedures). It is important in business continuity scenarios, BI scenarios, business process coordination patterns, and information synchronization patterns (a brief sketch appears below).

Furthermore, the unified platform must provide an adequate metadata management layer to define and easily refine the reconciliation of semantic and syntactic differences between sources and targets. The data integration technology should not pretend to replace centralized metadata management for the enterprise. Rather, the enterprise metadata management architecture should interface with the metadata management layer of the data integration platform. This metadata management layer is of utmost importance for information consistency.
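A minimal sketch of the simplest of those CDC approaches, timestamp-based capture, with an invented in-memory orders table standing in for the source; log-, trigger-, or flag-based capture would replace the filter shown here:

```python
from datetime import datetime, timedelta

now = datetime.now()

# Hypothetical source rows, each carrying a last-modified timestamp.
orders = [
    {"id": 1, "status": "shipped",  "updated_at": now - timedelta(hours=30)},
    {"id": 2, "status": "packed",   "updated_at": now - timedelta(minutes=20)},
    {"id": 3, "status": "received", "updated_at": now - timedelta(minutes=5)},
]

def capture_changes(rows, last_extraction: datetime):
    """Timestamp-based CDC: return only the rows changed since the previous extraction."""
    return [row for row in rows if row["updated_at"] > last_extraction]

last_extraction = now - timedelta(hours=1)
delta = capture_changes(orders, last_extraction)
print(f"{len(delta)} of {len(orders)} rows move to the target")   # only the net change is shipped
```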


The metadata repository should house the data definitions and the business rules that drive semantic reconciliation.

Of course, a unified product would present a single user interface to developers, meaning fewer specialists are needed. A data integration specialist would have one tool, one learning curve, and one development language (such as RDBMS SQL) or paradigm to learn that can be applied to various data integration projects. Ideally, the development environment should be graphical, iconic, and easy to learn and use, generating a standard programming language (such as SQL or Java) to further simplify data access.

The latency of data is also important to users, and there is increasing interest in real-time or near-real-time information (see Figure 4). Enterprises must define "real time" relative to business event integrity, not to absolute time. The ITO must deliver, and make available, data that is "fresh enough" (or "current within an established degree of certainty") for the individual or process using it. Thus, a consolidated data integration platform must be able to support multiple degrees of latency, ranging from batch to real time, and multiple transport technologies such as direct database connections, file transfer, message queues, event-driven operations, and even e-mail attachments.
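One way to make "fresh enough" concrete is as a per-consumer freshness tolerance checked at delivery time; the consumers and tolerances below are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical tolerances: each consumer defines what "fresh enough" means for itself.
tolerances = {
    "fraud_detection": timedelta(seconds=30),    # needs near-real-time data
    "daily_sales_report": timedelta(hours=24),   # batch latency is perfectly acceptable
}

def fresh_enough(consumer: str, last_loaded: datetime) -> bool:
    """True if the data's age is within the consumer's agreed degree of latency."""
    return (datetime.now() - last_loaded) <= tolerances[consumer]

last_loaded = datetime.now() - timedelta(minutes=10)
print(fresh_enough("fraud_detection", last_loaded))     # False: too stale for this consumer
print(fresh_enough("daily_sales_report", last_loaded))  # True: well within tolerance
```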

Figure 4 — Applying Real-Time Analytics

[Chart plotting decision class (e.g., operations, planning, strategy) against data latency (real time, near real time, periodic, ad hoc); cell labels include responsiveness, optimization, interaction, innovation, auditing, detection, info overload, confusion, and oversteering.]

Source: META Group


Finally, when data is accessed for analytical purposes, as in the BI and information visibility patterns, it represents a secondary, independent usage of the information: new, analytical processes are executed against the data. Through 2008, data integration technology will continue to dominate integration patterns where user access is predominantly read only, such as the BI and information visibility patterns. As META Group predicted, data integration technologies began to converge in 2003, providing a unified approach to these set-oriented integration patterns, and this trend will increase through 2004. However, many additional data integration processes will be required, such as data cleansing, deduplication, and data profiling, as well as handling of new data formats (e.g., XML schemas). Emerging unified data integration platforms need to be extensible, through APIs and Web services, to accommodate this ever-increasing set of requirements.

Data management will increasingly be viewed as part of overall integration architectures and designs (2004/05). This will cause organizations to consolidate these efforts organizationally (i.e., create a center of excellence for integration services) to take advantage of common technologies and skills. Enterprise information integration (EII), extraction/transformation/loading (ETL), and enterprise application integration (EAI) will ultimately disappear as distinct categories and become a single "information integration" category (2007).

Bottom Line

With the escalation of data management issues (e.g., increasing velocity of data through information supply chains, increasing variety of data sources to be integrated, increasing volume of data to be managed), enterprises must adopt data integration solutions that accommodate a broader set of enterprise needs. Niche data integration tools with singular abilities (e.g., data warehouse population, application-level integration, database replication, data synchronization, virtual data integration) will survive (if not thrive) through 2004/05. However, by 2006/07, hybrid data integration technologies will dominate if they provide a single mapping/transformation interface, development environment, and metadata repository, and if they share a common security framework and support multiple communications modes.

Janelle Hill is a program director with Integration & Development Strategies, a META Group advisory service. Doug Laney is a vice president and director with Enterprise Analytics Strategies, a META Group advisory service. For additional information on this topic or other META Group offerings, contact [email protected].


About META Group

Return On Intelligence℠

META Group is a leading provider of information technology research, advisory services, and strategic consulting. Delivering objective and actionable guidance, META Group's experienced analysts and consultants are trusted advisors to IT and business executives around the world. Our unique collaborative models and dedicated customer service help clients be more efficient, effective, and timely in their use of IT to achieve their business goals. Visit metagroup.com for more details on our high-value approach.
