A Survey of Strategies for Object Persistence

Aaron Fontaine, Anh-Thu Truong, Thomas Manley
CSci 5802 - Spring 2006

Abstract

Storing objects from an object-oriented application so that the objects can persist from one execution of the application to the next requires additional development overhead. In this paper we examine two methodologies that attempt to reduce this overhead. These two methodologies are object/relational mapping and object-oriented databases. Each one has its own set of benefits and drawbacks, and hence neither solution will work best for all applications. Nevertheless, many tools that currently exist implement these methodologies and can provide many advantages to application developers who need to utilize persistent objects.

1 Introduction

Since the inception of the object-oriented programming paradigm, application developers have been faced with the task of persisting objects. A persistent object continues to exist even when the application that created or used the object has finished execution. There are multiple ways that an object can be persisted. One way is with a database. Early database research led to the relational model, which is currently in widespread use in products like Oracle and DB2 and is primarily suited to the quick processing of simple data types. Although relational databases are well suited to their task of quick information retrieval, they lack many of the constructs necessary for true object persistence. Relational databases store data in the form of tables with rows and columns. Storing objects in such a database requires translating persistent objects into data in the tables, a process known as object-relational mapping. In a basic object-to-relational mapping scheme, each object type is mapped to a separate table. Each row represents one instantiation of that object type and each column represents one of that object's fields. Foreign keys¹ are used to relate rows from different tables, replacing the actual object references that would exist within the program. This scheme is about as straightforward as possible, but depending on the complexity of the project and the nature of the tools being used, it may represent significant development overhead, both in terms of design and implementation. This difficulty in mapping persistent objects into relational databases is known as the object-relational impedance mismatch problem. Our goal in this paper is to study the nature of this problem and what is currently being done to solve it.

¹ Foreign and primary keys are used to relate information in separate database records, often between tables. In this way, disparate information can be linked together.

2 Background

Relational Databases

Relational databases represent the de facto standard for persistent data storage and are used across a wide range of industries that require storage and manipulation of large and complex sets of data. Airline reservation systems, stock tracking systems, inventory systems, banking systems, and point-of-sale systems all typically rely on relational database management systems (RDBMSs) to manage their information. According to the National Research Council, "thousands of small businesses and organizations use databases to track everything from inventory and personnel to DNA and pottery shards from archeological digs."

The relational model was not always an obvious choice for data persistence. As computers began their rise to prominence in the 1950s and 1960s, and industry and government began to rely on them more and more, the need for reliable and efficient data storage became apparent. It was becoming too costly for businesses to maintain staff for the purpose of storing and indexing information in flat files. The early database systems offered by companies like IBM were not much of an improvement, however, as they still required users to do most of the legwork in indexing and searching the data. In 1970, E. F. Codd, an IBM researcher, published a paper titled "A Relational Model of Data for Large Shared Data Banks." This landmark paper outlined a method for using relational calculus and algebra to enable the storage and retrieval of large amounts of information, and laid the foundation for the relational model. This model had two major advantages: first, the data gained independence from hardware and storage implementation; second, it provided a high-level nonprocedural language for querying data. Thus, the burden of searching and indexing potentially large volumes of data was removed from the user and placed on the database management system itself.

The importance of Codd's paper and the relational model was not recognized at first, however. IBM had already invested in existing database technology and was not eager to transition to something untested and unproven. Thus the first commercially available database based on the relational model was released in 1976 by Honeywell, and the first database built on the SQL standard, which IBM also invented, was released by Oracle in the early 1980s.

The real power of the relational model comes from the fact that it uses a formal mathematical representation for data. Relations exist within the data, but the data itself is not tied to any particular view. That is, it is not necessary to rely on any built-in navigational information that may be part of the domain model. The data can be freely indexed and queried as a whole. This provides tremendous power to businesses that warehouse large volumes of information and often need to do ad-hoc queries or data mining on that data.

Object-Oriented Programming

The object-oriented programming (OOP) model arose to address major deficiencies in the functional approach to designing software. The functional approach is good from the perspective of how computers operate, but it does not fundamentally address the manner in which humans think about the world around them. As software projects have grown more complex, this contrast has become more apparent. People think in terms of subject, object, and action: there is a subject that performs an action, there is the action it performs, and possibly there is an object the action is performed on. This is readily apparent from observing the structure and grammar of almost any spoken language. The giant leap forward provided by OOP was to organize data together with the actions that can be performed on it into discrete objects. For example, a bank object may contain customer objects and account objects, and a customer object may apply the withdraw action on an account object. The available funds (data) and withdraw function (action) are both owned by the account object, and the customer object contains associations to all accounts to which that customer is allowed access. This shift in programming style allowed designers to focus on design models that contain collections of interacting objects and their corresponding associations, which simplified the design process into something that is much more naturally understandable. Since, for any reasonably large development effort, a large portion of development time is invested in the design phase, this paradigm shift represents large gains in terms of development effort, maintainability, and understandability.

3 The Problem

The difficulty of storing object-oriented data in a relational database is known as the object-relational impedance mismatch problem, and it still has not been completely solved. Its existence is evident in almost any software development project that requires some form of persistent storage. Relational databases are currently the de facto standard for storing information. Likewise, OO methodologies are the de facto standard for designing many types of applications. As long as things stay this way, developers will be faced with the impedance mismatch problem.

Exactly why this problem is difficult is not intuitive to someone who has not had to deal with it directly. The problem is due to a difference in worldviews. Object-oriented design is based on proven software engineering principles, whereas relational databases are based on proven mathematical principles. That is to say, object-oriented modeling techniques were developed for different purposes than relational modeling techniques. Object-oriented development is designed to help with abstract views of a system. It necessarily contains semantically rich information to help describe the associations and interactions between the objects in a system. Relational models, on the other hand, do not contain such information about the data they model. Translating between these two domain models is a non-trivial task and, in cases where no tools or methods are used to aid the process, can account for up to 20%-30% of the total development effort. Worse yet, this act of translation restricts both the object model that the application is designed from and the relational model that describes how the persistent data is to be stored. Since the object-oriented model is semantically richer, this often means extra constructs are needed within the database to capture information that the relational model cannot naturally handle. As an example of the kind of semantic information a relational database cannot implicitly store, consider the following two common examples.

Example 1

An object contains another object as one of its attributes. Normally we might map each attribute to a column in the table. This works as long as the attributes are simple data types. User-defined types, however, are poorly supported by relational databases and not very portable. Consider the Customer class, shown below, which contains an Address as one of its attributes.

Customer: -firstname : String, -lastname : String, -phone : String, -SSN : String, +getXXX() : Object, +setXXX(in object : Object)
Address: -street1 : String, -street2 : String, -city : String, -state : String, -zipcode : String, +getXXX() : Object, +setXXX(in object : Object)
(Each Customer is associated with exactly one Address.)

Figure 1: A UML class diagram of the Customer and Address entities
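Expressed in Java, the two entities in Figure 1 might look like the following sketch. The field names follow the diagram; the accessor bodies are omitted and the classes are otherwise hypothetical.

    // Minimal Java sketch of the entities in Figure 1 (illustrative only).
    class Address {
        private String street1, street2, city, state, zipcode;
        // getters and setters (getXXX/setXXX) omitted for brevity
    }

    class Customer {
        private String firstname, lastname, phone, ssn;
        private Address address;   // the single Address association from Figure 1
        // getters and setters (getXXX/setXXX) omitted for brevity
    }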

We could use two tables with a foreign key to link the customer to the correct address. Since each customer has only one address, and Address objects are not shared amongst Customer objects, it may be more economical to store all the address information in the Customer table, like so:

Customer table:

    Column Name    Data Type
    firstname      String
    lastname       String
    phone          String
    SSN            String
    street1        String
    street2        String
    city           String
    state          String
    zipcode        String
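To make the translation burden concrete, a rough JDBC-style sketch of reading one such row back into separate Customer and Address objects might look like the following. The customer_id key column and the setter names are assumptions (conventional setXXX accessors implied by Figure 1); error handling and resource cleanup are omitted.

    import java.sql.*;

    // Hypothetical sketch: rebuild a Customer and its Address from one flat row.
    class CustomerReader {
        Customer load(Connection conn, long id) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT firstname, lastname, phone, SSN, street1, street2, city, state, zipcode"
              + " FROM customer WHERE customer_id = ?");
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            if (!rs.next()) return null;

            Address addr = new Address();          // the Address object is rebuilt by hand
            addr.setStreet1(rs.getString("street1"));
            addr.setStreet2(rs.getString("street2"));
            addr.setCity(rs.getString("city"));
            addr.setState(rs.getString("state"));
            addr.setZipcode(rs.getString("zipcode"));

            Customer c = new Customer();
            c.setFirstname(rs.getString("firstname"));
            c.setLastname(rs.getString("lastname"));
            c.setPhone(rs.getString("phone"));
            c.setSSN(rs.getString("SSN"));
            c.setAddress(addr);                    // the association is restored manually
            return c;
        }
    }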

Already this implies extra work on the part of the application developer to translate the data back into separate objects when it is read. Worse, the information about the object model is not stored in the database. If any other applications need to use this data, they will also need to be designed with the same semantic information in mind. Also, if the design model is ever modified to change the nature of the association between Customer and Address (allowing one customer to have multiple addresses, for example), there may be substantial development effort involved in converting the database to the new schema.

Example 2

Consider a university course registration system. A student can enroll in more than one course, and a course can have more than one student. This association is shown below in Figure 2. Since a course has an unspecified number of students, we do not want to use foreign keys in the course table because we do not know how many columns to reserve. If we wished to be safe, we would have to set aside enough columns to account for the maximum class size. Most courses would have far fewer students, and this would result in a lot of waste. The same problem also exists for the student table. Unfortunately, there is no simple, direct, and economical way to map a many-to-many association into a relational database.

Course: -CourseID : Integer
Student: -StudentID : Integer
(Course and Student are linked by a many-to-many association; multiplicity 1..* on both ends.)

Figure 2: A many-to-many association between students and courses


The accepted solution to this problem is to create a separate junction table (shown below). This table simply contains pairs of foreign keys. Each pair represents one of the links of a many-to-many association. This extra information does not appear anywhere in the object model. It is an extraneous artifact necessitated by the nature of relational databases.

Course Table:

    Column Name    Data Type
    CourseID       Integer (primary key)

Student Table:

    Column Name    Data Type
    StudentID      Integer (primary key)

CourseToStudentJunctionTable:

    Column Name    Data Type
    CourseID       Integer (foreign key to Course Table)
    StudentID      Integer (foreign key to Student Table)
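How this extra table surfaces in application code can be seen in a query against it. A rough JDBC-style sketch, using the table and column names above and omitting error handling, might be:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: find the IDs of all students enrolled in one course
    // by reading the junction table directly.
    class EnrollmentQuery {
        List<Integer> studentsInCourse(Connection conn, int courseId) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT StudentID FROM CourseToStudentJunctionTable WHERE CourseID = ?");
            ps.setInt(1, courseId);
            ResultSet rs = ps.executeQuery();
            List<Integer> studentIds = new ArrayList<Integer>();
            while (rs.next()) {
                studentIds.add(rs.getInt("StudentID"));
            }
            return studentIds;   // the many-to-many link lives only in the junction table
        }
    }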

In addition, it is often necessary for the application to contain information about the associations between objects within the database. Consider that the foreign keys in the database do not really contain the knowledge of which tables they cross-reference or the columns within those tables to which they are matched. This information generally comes in the form of the SQL queries that access the data. Thus, the application contains navigability information it would not otherwise need if the data were stored in a true object-oriented fashion.

Perhaps the biggest difference between object-oriented and relational data models is that object-oriented models are inherently navigational. That is, as an object-oriented program executes, it generally follows references between objects one at a time. This implies a piecemeal approach to data access that is not practical for many business needs. Since the only provision that OOP makes for navigating objects is the references between those objects, there is little inherent support for manipulating and processing large disjoint sets of data. Consider a task such as searching for all employees who received an "excellent" in their performance review and applying a 5% pay increase. This is a non-trivial task starting from an object model. It implies we need to know the associations between the data so that we can navigate to all the employee objects, and then do a manual traversal of all these objects to locate those that received an "excellent." The relational model has no such restrictions: a single SQL statement can locate those records and perform the update on them. This is where the object model really falls short. It provides a semantically rich and natural way to model objects, but it also locks the developer into the view that is chosen. The objects that are chosen and how they are associated are not always clear or straightforward. Decisions may be made that have clear understandability or performance benefits from the point of view of one application. If a different application needs to access that same data in a different manner, however, it may find the navigability very sub-optimal.

One of the key differences to note here is the manner in which object-oriented programs access data, since it has performance implications. Since object-oriented applications access objects one at a time as they are referenced, some sort of prefetch² and caching scheme is necessary for performance reasons. Not prefetching enough implies a performance penalty due to excessive database accesses. Prefetching too much, on the other hand, implies a performance penalty due to the retrieval of unnecessary data. Determining which information needs to be prefetched is not a straightforward task. The cohesion between different objects, or their likelihood of being used together, is not apparent in the object model and thus not present in the database. This implies that some form of additional meta-information may be necessary to classify objects into subgroups or provide some form of application-aware prefetch information. This meta-information, which does exist for some solutions such as TopLink, implies additional effort on the part of the developer. It also results in a muddying of the object domain model, since it implies that the goal of separating the object model from the relational model cannot be fully achieved. Alternatively, a dynamic solution may be used, such as that proposed by Bernstein, Pal, and Shutt in "Context-Based Prefetch for Implementing Objects on Relations." However, even they use some static metadata to aid their algorithm.

² In this context, prefetching means anticipating the use of objects and loading them before the program references them. This can be done based on the references contained in objects already loaded from the database, or on a metadata schema that says which objects are often used together.
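To illustrate the earlier employee example in code, compare a hand-rolled object traversal with a single set-oriented SQL statement. The Employee class and the employee table and column names below are hypothetical.

    import java.sql.*;
    import java.util.Collection;

    // Hypothetical sketch contrasting navigational and set-oriented access
    // for the "5% raise for excellent reviews" task.
    class PayRaiseSketch {
        static class Employee {
            String reviewRating;
            double salary;
        }

        // Object-navigation style: visit every reachable Employee object one at a time.
        void raiseViaObjects(Collection<Employee> employees) {
            for (Employee e : employees) {
                if ("excellent".equals(e.reviewRating)) {
                    e.salary *= 1.05;   // each change must still be written back to storage
                }
            }
        }

        // Set-oriented relational style: one SQL statement performs the whole update.
        void raiseViaSql(Connection conn) throws SQLException {
            Statement st = conn.createStatement();
            st.executeUpdate(
                "UPDATE employee SET salary = salary * 1.05 WHERE review_rating = 'excellent'");
        }
    }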

4 Solutions

There are multiple ways the impedance mismatch problem can be addressed. We focused our research on two options: object/relational mapping and object-oriented databases. We chose these solutions because there is a large amount of applicable literature available and a number of commercial products using these methodologies are present in the marketplace.

4.1 Object/Relational Mapping

Object/relational mapping (ORM) is a technique that maps the object-oriented data in software applications to the relational data in databases. In other words, the main purpose of ORM is to allow an application written in an object-oriented language to deal with the information it manipulates in terms of objects, rather than in terms of database-specific concepts such as rows, columns, and tables. An ORM implementation typically consists of the following pieces:

• An application program interface (API) for performing basic Create, Read, Update, and Delete (CRUD) operations on objects of persistent classes
• A language or API for specifying queries that refer to classes and properties of classes
• A facility for specifying mapping metadata
• A technique for the ORM implementation to interact with transactional objects to perform dirty checking, lazy association fetching, and other optimization functions
• A persistence layer that handles the object/relational mapping

Without ORM, the mapping between the persistent objects and a relational database is performed by calling getter and setter methods. The application code issues SQL statements to the database via a specific database API such as Java Database Connectivity (JDBC). The SQL statements might be written by hand and embedded in the code, or generated dynamically at runtime, and the application retrieves data from the result set. This implementation is tedious and error prone. See Figure 3 for an example.

Figure 3: Application Architecture without ORM
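As a concrete illustration of this hand-coded path, a rough sketch of saving the Customer from Example 1 over plain JDBC might look like this. The table layout follows Example 1; the getter names are assumptions, and error handling and cleanup are omitted.

    import java.sql.*;

    // Hypothetical sketch of hand-coded persistence without ORM:
    // the application builds the SQL itself and copies each field by hand.
    class CustomerDao {
        void save(Connection conn, Customer c) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO customer (firstname, lastname, phone, SSN, "
              + "street1, street2, city, state, zipcode) VALUES (?,?,?,?,?,?,?,?,?)");
            ps.setString(1, c.getFirstname());
            ps.setString(2, c.getLastname());
            ps.setString(3, c.getPhone());
            ps.setString(4, c.getSSN());
            Address a = c.getAddress();
            ps.setString(5, a.getStreet1());
            ps.setString(6, a.getStreet2());
            ps.setString(7, a.getCity());
            ps.setString(8, a.getState());
            ps.setString(9, a.getZipcode());
            ps.executeUpdate();
        }
    }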

When ORM is introduced into the application, the ORM component can automatically translate operations on an object model into operations on a relational model and vice versa. This is accomplished through the use of mapping metadata that relates persistent attributes and associations in the object model with tables and columns in the relational model. As a result, the application developers do not have to worry about implementing the "messy" SQL statements. The fewer lines of code they write, the fewer programming bugs in the application, and the more time they have to focus on the application's business problems.

Figure 4: Application Architecture with ORM

Embedding the ORM in the application provides a clean and managed component architecture. ORM separates the application's business layer from its data service layer. Therefore, the application is more maintainable and adaptable to requirement changes. As mentioned above, the application developers do not have to handle the SQL statements directly, which saves a significant amount of development cost. Much of the work in implementing persistence by hand is redundant and mundane, since the work required to persist one object is similar to that required to persist other objects: for example, adding object IDs and writing methods to store and retrieve items from the database. This kind of repetitive overhead introduces a large opportunity for errors on the part of the developers. A good tool will relieve the developers of this burden, thereby increasing productivity and helping to make sure the code remains consistent. This has a secondary benefit as well: the reduction in errors will also speed up test time. Since the ORM handles all the data access, the application has fewer lines of code, which makes it easier to refactor. In addition, the separation between the relational model and the object model makes the application flexible for future software changes.

One of the common limitations of ORM lies in its loading strategy (lazy loading versus prefetching). With prefetching, an object and its associated object hierarchy are fetched from the database when the object is accessed. Prefetching can improve application performance if the prefetched data are going to be used in the application in some way; otherwise, it significantly hurts performance and generates data bottlenecks. Applications with embedded ORM perform on average 15%-20% slower than raw JDBC without caching. For example, Hibernate has four different types of fetching strategies: immediate fetching, lazy fetching, eager fetching, and batch fetching. All of these strategies concern how associations are fetched. TopLink requires the application programmer to enter static prefetch information into the metadata before compilation. At runtime, this information is used to determine which references need to be followed and loaded along with the current object being loaded. According to Ali Ibrahim and William R. Cook, it can be difficult to determine exactly which related objects should be prefetched. Since the prefetch annotations only affect performance, it is very difficult to test or validate whether they are correct.

Tools

Hibernate

Hibernate is a popular open-source ORM implementation. It relies on an XML (Extensible Markup Language) document written by the application programmer to describe the mapping between the object view and the relational database. Hibernate provides constructs within this document that allow for such things as auto-incrementing object IDs and many-to-one associations. Hibernate provides a basic framework to handle accesses to databases from the object view. The three main interfaces in Hibernate are the Session interface, the Transaction interface, and the Query interface. The Session interface is the primary interface. It performs the storing and retrieving operations and is created and destroyed on every request. Once a session is opened, however, the developer does not need to be concerned with the persistence of objects from the application's perspective; Hibernate takes care of automatically updating and storing information. The Transaction interface is an optional API designed to manage transaction-related concerns such as concurrency and caching. The Query interface performs queries against the database and controls how the query is executed. Queries are written in Hibernate Query Language (HQL) or in the native SQL dialect of the specific database. Hibernate not only takes care of the mapping from application objects to the database tables but also provides data query and retrieval facilities, which reduce development time. However, Hibernate still has two main problems:

1) Only one person can perform the mapping at a time, which creates a project bottleneck.
2) Lazy loading can slow down the application.


Figure 5: Application Architecture with Hibernate
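To give a feel for the three interfaces described above, a minimal usage sketch might look like the following. It assumes the Customer class from Example 1 has already been mapped in Hibernate's XML metadata; the query and property names are illustrative.

    import org.hibernate.*;
    import org.hibernate.cfg.Configuration;

    // Minimal sketch of the Session/Transaction/Query interfaces described above.
    class HibernateSketch {
        public static void main(String[] args) {
            SessionFactory factory = new Configuration().configure().buildSessionFactory();
            Session session = factory.openSession();
            Transaction tx = session.beginTransaction();

            Customer c = new Customer();          // a mapped persistent class (Figure 1)
            session.save(c);                      // no hand-written SQL required

            // HQL query expressed against classes and properties, not tables and columns
            Query q = session.createQuery("from Customer c where c.lastname = :name");
            q.setParameter("name", "Jones");
            java.util.List customers = q.list();

            tx.commit();
            session.close();
        }
    }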

TopLink

TopLink is a commercially licensed ORM product owned by Oracle. TopLink is a persistence framework that manages relational, object-relational, Enterprise Information System, and XML mapping. TopLink's main benefits are:

1) It can persist Java objects to virtually any relational database.
2) It can persist Java objects to virtually any non-relational data source using indexed, mapped, or XML records.
3) It can perform in-memory conversions between Java objects and XML Schema.
4) It can map an object model to both relational and non-relational schemas.

TopLink is one step ahead of Hibernate in terms of creating the object mapping. TopLink Workbench allows developers to capture and define object-to-data-source and object-to-data-representation mappings in a flexible and efficient metadata format, which solves the project bottleneck problem. However, TopLink still has an object fetching problem. It allows the developers to enter the prefetch information before compilation, which means the developers have to know the object relationships ahead of time. According to Ibrahim and Cook, it is very difficult to determine exactly what those relationships will be in advance. In addition, one published account describes a number of performance problems its author encountered with TopLink. First, his application was severely hurting database performance because lazy loading was not turned on by default, and even simple object retrievals were resulting in half the database being retrieved due to the extended object references. After turning on lazy loading he experienced the other extreme: every single object reference required a database access, which also hurt database performance.


4.2 Object-Oriented Databases

In 1970, E. F. Codd laid out a clear specification for relational database systems in his paper "A Relational Model of Data for Large Shared Data Banks." This marked the beginning of the use of the relational model in the database management systems that proliferate today. Eventually the relational standard was published by ANSI as the X3H2 specification. However, no such universally accepted standard exists for object-oriented database systems. Research on object-oriented databases (OOD) began in the late 1970s, and early products employing OOD techniques started to show up in the late 1980s. The Object Data Management Group (ODMG) was created in 1991 by a group of object database vendors to craft a set of specifications for object database and object/relational mapping products. This group produced the ODMG version 3.0 specification in 2000 but disbanded the following year. The specification included definitions of an object model, object specification languages, object query languages, and language bindings for various object-oriented programming languages like C++ and Java. Not all parts of the specification were accepted by database vendors, but most claimed conformance to the Java language binding component. That portion of the specification was submitted to the Java Community Process to become the beginning of the Java Data Objects specification. Currently the Object Management Group (OMG) is in charge of updating the ODMG version 3.0 specification in order to create the standards for the next generation of object-oriented databases.

Although there currently is not a clear standard for object-oriented database management systems (ODBMS), the literature proposes some qualities that such systems must possess. These are essentially a superset of features taken from both the object-oriented programming (OOP) world and the database world. Features consistent with OOP paradigms include support for complex objects, inheritance, object identity, object encapsulation, and polymorphism. From the database world come features like persistence, concurrency, recovery, ad-hoc queries, integrity, and security. Various object-oriented database systems will differ in the degree to which they implement each of these features, but they must have most, if not all, of these qualities to really be considered an ODBMS.

Interest in ODBMSs was initially driven by the needs of design support systems like computer-aided drafting (CAD), computer-aided manufacturing (CAM), and computer-aided software engineering (CASE) applications. One of the main reasons is that those types of applications rely heavily on the ability to manage complex and highly interrelated information. The ability to efficiently persist complex objects is one of the main advantages that object-oriented database systems have over relational systems.


RDBMSs typically support only a small number of primitive data types, such as strings, integers, and dates. In contrast, ODBMSs use a data model that supports encapsulation, inheritance, object identifiers, and abstract data types. These are highly beneficial properties for certain types of applications, as already mentioned above. For example, by encapsulating both the data and the methods that operate on that data within the object stored by the database, the system guarantees that only the allowed operations can be performed on the object. It also means that information can be hidden, so that the implementation can change without necessitating changes in other parts of the application. This reflects one of the dynamic properties of the OOD data model: changes of state in one object can automatically cause changes of state in other objects. This is different from the relational model because, in those systems, the application performs all of the operations on the database. Since the usage scenario of the data is stored outside the database, it is not possible to guarantee that all applications will use the database in a consistent manner.

Object-oriented database systems support the concept of object identity, which means that objects have an existence separate from their values. For example, an Employee object for employee "Bob Jones" is always the same object even if the Salary data field of the object increases or decreases. This property means that two objects can be considered identical if they are actually the same instance, or equal if the values of the objects are the same. This concept enables object sharing, where multiple objects all reference a common object; updates to the referenced common object only need to be made once. Value-based systems like the relational model can approximate object identity by assigning explicit identifiers (keys) to all objects. This is suboptimal, however, because it puts the burden on the programmer to ensure uniqueness and maintain referential integrity. Referential integrity is always maintained by an ODBMS because object relationships are directly represented.

One of the main disadvantages of using object-oriented programming languages to interact with relational databases is that it forces the programmer to write additional code to interact with the database. This extra code is not written in the native OO language, but instead in a query language like SQL. This is a problem for a number of reasons. One reason is that this additional code can account for 25%-35% of the total application. Another reason is that it forces the programmer to use objects that will map well to the relational model, which may result in less-than-optimal data structures. Also, since the SQL statements often exist as strings, the statements themselves cannot be verified by the compiler, which makes debugging more difficult and wastes productivity. When there is a high congruence between the data model used in the application and the data model used in the database, as with ODBMSs, all of these problems are addressed. The programmer writes less code because there is much less need to explicitly interact with the database, can use more natural data structures, and the maintainability and reusability of the code is increased.
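In Java terms, the identity/equality distinction described above corresponds roughly to reference comparison versus value comparison. A small, self-contained, and entirely hypothetical sketch:

    // Hypothetical sketch of object identity versus value equality in Java.
    class IdentitySketch {
        static class Employee {
            String name; double salary;
            Employee(String name, double salary) { this.name = name; this.salary = salary; }
        }
        public static void main(String[] args) {
            Employee a = new Employee("Bob Jones", 50000);
            Employee b = new Employee("Bob Jones", 50000);
            Employee c = a;
            System.out.println(a == c);   // true: identical, the same instance
            System.out.println(a == b);   // false: distinct instances, even with equal values
            a.salary = 55000;             // a (and c) still denote the same object
            System.out.println(c.salary); // 55000.0: an update made once is seen by all sharers
        }
    }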


Rather than doing table searches as in a relational system, objects are found in an ODBMS by navigating reference pointers in other objects. This can often lead to improved performance and makes an ODBMS superior for certain types of applications. It can also be a disadvantage, because it complicates the use of ad-hoc queries over the data: forming a pointer-based query in response to a general search request can be difficult and slow. This is part of the reason ODBMSs are best for dealing with highly interrelated complex objects.

Another disadvantage of ODBMSs is that changes to class definitions can require changes to the database schema. Depending on the type of definition change and the product's support for schema migration, these modifications can be quite costly. For example, the database may need to be taken offline while the migration takes place, which may negatively impact revenue. Application support may need to be provided to perform the migration, which requires additional time from the developers. Some ODB products support multiple concurrent schema versions, which may help alleviate some of the problems, at least until a good time can be found to take the system offline.

Object-oriented database systems can be a good solution to the impedance-mismatch problem. Certain types of applications will benefit the most, and could experience significant performance gains while decreasing development time when compared to relational systems.

Tools

GemStone

GemStone Systems currently makes a couple of different flavors of its GemStone product. One of these is GemStone Facets, which "provides transactional, easy to implement, highly scalable transparent persistence by reachability for native Java objects - no O/R mapping required." The other is GemStone/S, which "is the state-of-the-art platform for developing, deploying, and managing scalable, high-performance, multi-tier applications based on business objects." We found descriptions of two older GemStone products in the published literature, GemStone Version 3.2 and GemStone/J. We thought that these product descriptions would be less biased than the official product literature, and the general conclusions they draw might still apply to the current line of products.

GemStone 3.2

Although GemStone 3.2 supports development in C, C++, or Smalltalk, the published description focused on C++. Classes describe the database schema, and may include predefined GemStone classes. In order to indicate that a particular class should be persisted, it must inherit from the GS_Object class. User classes can contain standard C++ data types, other user-defined or GemStone-defined classes, or single-dimensional arrays of these types, but multiple inheritance is not supported. The database schema is created by a GemStone utility called the registrar. This program parses the header files for class definitions and automatically generates the extra classes and methods necessary for making the persisted objects available to the application. Additional properties of the class can be passed to the registrar using keywords embedded in comments in the code. When the application accesses a persisted object, GemStone moves the object from the database into the address space of the application. If the object state is modified, the database is also updated. The application may also retain control over when database updates are done in order to improve performance. One-to-one and one-to-many unidirectional relationships are supported by GemStone, but many-to-many and bidirectional relationships are not. Object versioning is also not supported. Certain types of schema changes (due to changes in the persistent class definitions) can be applied automatically; other types of class definition changes require programmatic migration of the database to the new schema.

GemStone/J

The paper on GemStone/J was more of a case study than an objective analysis. A team was tasked with building a customer loyalty portal for FirstRand Bank called eBucks.com. They were able to deliver a working system on time, which they attribute to their choice of an ODBMS to provide storage for the application. They found the ODBMS very easy to use. No extra work was required to map classes to tables as would be necessary with an RDBMS. Inheritance was supported, so they did not have to make difficult decisions about how to map variables across multiple tables. Both the application and the database used Java, so the developers did not have to learn another language to create the database schema. They were also able to isolate over 97% of the 1000 classes from having any knowledge of the underlying database, which would probably not have been possible with an RDBMS. Because of the support for complex object models, they were able to create an easily extendable application while improving the maintainability of the system and reducing code complexity.

They did run into some problems due to the use of an ODBMS. One problem was the lack of good support for arbitrary queries. They attempted to work around this issue by building collections of objects indexed according to every possible key they could think of, which allows queries to work more like they do on a traditional RDBMS. This solution worked, except that every few months they had to add a new key which they had not thought of before.


Another problem was caused by changing class definitions. When this happened, every instance of the persisted class had to be found and modified. The system could not be running while the schema was being changed, which meant lost revenue. Overall, they believed they were very successful at developing the system using GemStone/J and will continue to use it for new applications in the future. The benefits they received by using an ODBMS far outweighed the disadvantages.

Jasmine

The Jasmine ODB object database "transforms enterprise information into powerful, interactive objects for use in any application." A few published case studies put this ODBMS in a very positive light. The product literature says that Jasmine is the "industry's first visual development environment designed explicitly for object-oriented, multimedia applications." That is a pretty strong statement, but we were unable to confirm or deny it. Toyota used Jasmine to create a multimedia-rich application for use on dealership kiosks. They found the Jasmine Application Development System very easy to use, due to Jasmine Studio's drag-and-drop, point-and-click development scheme. They were also impressed with Jasmine's ability to deploy the same application in multiple ways: as a standalone application, on client/server systems, and on the Web. L'Oreal appreciated the many features that make Jasmine an object-oriented database, such as inheritance and the ability to work with complex data objects like video, audio, and images.

4.3 ORM vs. ODBMS

In this paper we have examined both object-relational mapping and object-oriented databases for the purpose of persisting objects. It may appear, given the complexity of ORM and the clear benefits of ODBMSs, that developers should always use an ODBMS. That is not always the case. Which method to use depends on the nature of the program or programs persisting the data and the requirements of the organization using that data. If the relationships in the data are relatively simple, and query-based access is a high priority, an RDBMS, along with an ORM tool, is probably the most appropriate implementation. If the data contains complex relationships, and its use is more specialized and specific, then an ODBMS is probably the appropriate solution. In addition, many organizations have legacy RDBMS systems that cannot be easily upgraded. This may necessitate the use of ORM techniques even in situations where an ODBMS would otherwise be a better fit. To help give a general overview, we have created a table detailing the pros and cons of both solutions.


Relational Databases

Pros:
• Data exists independently from any object model. Thus, it is not tied to any specific implementation, giving more portability between languages.
• Excellent support for ad-hoc queries.
• Very mature technology with broad industry support.

Cons:
• 25% of coding time is spent mapping objects into the relational schema.
• Lack of encapsulation of methods with data means data may be used inconsistently between programs, leading to data corruption.
• Poor support for handling complex data.

Object-Oriented Databases

Pros:
• No extra work is required to map the domain model to the database schema; objects are supported in their native format.
• Encapsulation allows better sharing and reuse between programs, which helps ensure data integrity.
• Easily allows arbitrary complexity, since the data model is carried directly into the database.
• Allows access to the database to be done transparently, directly in a language's own native syntax.

Cons:
• Data is tied heavily to the implementation; a change in the domain model implies a refactoring of the database.
• Data is tied to the language used to store it, which limits implementation choices for future programs that access the data.
• Support for ad-hoc queries is marginal, since data must be navigated using the references stored in the data.

The largest benefit that object-oriented databases offer here is the ability to handle data persistence transparently, in a language's own native syntax. This alone is a very compelling reason to use an ODBMS. That does not mean the relational model should be thrown out: relational databases were developed for different reasons and for different purposes. Opinions on the effectiveness of various ORM tools are mixed, but if used correctly, they do help simplify the development effort.

5 Conclusion

The ability to persist data is clearly an important one. Object-oriented applications deal with complex objects, which often model real world entities. Storing these types of objects in a database may initially sound like a simple task, but upon investigation we discovered that it is actually extremely complicated. Even mapping the object data to columns in a relational database brings up difficult questions, such as "how many tables are required to store two objects with an inheritance relationship?" There are many other issues that make this a tough problem to solve, but two methodologies do a fairly good job at addressing them.

One of these methodologies is object/relational mapping (ORM), which refers to the mapping between OO application constructs and relational database constructs. Countless tools exist that simplify the job of the application programmer by hiding many of the difficult aspects of the mapping process. This frees up the programmer to focus more time on developing the application without worrying as much about how to map the objects into the database. The fact that the object data will be stored in a relational database cannot be forgotten completely, however, and the programmer may still have to make some less-than-optimal design decisions in order to work well with the database.

The other solution we researched was object-oriented databases. These databases have advantages over relational databases because they are actually composed of objects instead of tables. This is especially valuable to OO application programmers because they do not have to worry about how their objects will get mapped into the database constructs. Instead, there is no mapping required because the objects stored in the database are indistinguishable from the objects that existed in memory. This means that the objects still contain data and operations, the inheritance hierarchies and object references are still intact, and all without sacrificing the benefits of traditional databases like concurrency, security and integrity. Unfortunately, database schema migrations due to changes in class definitions can be costly, and support for ad-hoc queries is typically not as good as it is with relational databases.

These are not the only two options for addressing the problem of persisting objects, but they are the two main ones. A number of tools exist in the marketplace that use these methodologies, such as TopLink, Hibernate, Jasmine and GemStone. It appears that some of these tools work quite well, but each one also has its own disadvantages. As time goes on we expect object-oriented databases to become more popular with OO application developers because of the many benefits they can provide that relational databases cannot. On the other hand, there is a lot of dependence on relational databases and many companies will not be quick to throw away their existing applications or be willing to give up the ability to make efficient ad-hoc queries of their data.


Bibliography

[1] A Brief History of Databases, http://wwwdb.web.cern.ch/wwwdb/aboutdbs/history/industry.html
[2] A. Keller, R. Jensen and S. Agrawal: Persistence Software: Bridging Object-Oriented Programming and Relational Databases, In Proceedings of the International Conference on Management of Data (SIGMOD 93): 523-528, Washington, May 1993.
[3] Ali Ibrahim and William R. Cook: Automatic Prefetching by Traversal Profiling in Object Persistence Architectures, to be presented at ECOOP 2006.
[4] Charly Kleissner: Enterprise Objects Framework, A Second Generation Object-Relational Enabler, SIGMOD Conference 1995: 455-459.
[5] Christian Bauer and Gavin King: Hibernate in Action, Manning Publications, August 2004.
[6] Christopher Keene: An ounce of prevention: Avoid J2EE data layer bottlenecks, http://www.javaworld.com/javaworld/jw-04-2004/jw-0405-bottleneck_p.html
[7] Computer Science and Telecommunications Board, (U.S.) National Research Council: Funding a Revolution: Government Support for Computing Research, National Academy Press, Washington, DC, 1999, http://www7.nationalacademies.org/cstb/pub_revolution.html
[8] Dare Obasanjo: An Exploration of Object Oriented Database Management Systems, http://www.25hoursaday.com/WhyArentYouUsingAnOODBMS.html
[9] David Chamberlin: Has the Objects/Relational Impedance Mismatch been Solved?, http://www.javalobby.org/java/forums/t54475.html
[10] E. F. Codd: A relational model of data for large shared data banks, Communications of the ACM, 13(6): 377-387, 1970.
[11] GemStone Systems: http://www.GemStone.com/
[12] Gregory McFarland, Andres Rudmik, and David Lange: Object-Oriented Database Management Systems Revisited, DACS report, 1999.
[13] Gustav Fahl and Tore Risch: Query Processing Over Object Views of Relational Data, VLDB Journal, 6(4): 261-281, 1997.


[14] Industry Leading Performance For Dynamic POJO Data Persistence™ With CocoBase® Enterprise O/R, whitepaper from http://www.thoughtinc.com/cber_whitepaper_perf.html
[15] Jack A. Orenstein and D. N. Kamber: Accessing a Relational Database through an Object-Oriented Database Interface, Proc. 21st VLDB Conf.: 702-705, 1995.
[16] Jasmine Object Database: http://www3.ca.com/solutions/Product.aspx?ID=3008
[17] Joseph M. Hellerstein: Optimization Techniques for Queries with Expensive Methods, ACM Trans. Database Systems, 23(2): 113-157, 1998.
[18] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik: The Object-Oriented Database System Manifesto, Proc. 1st Intl. Conf. on Deductive and Object Oriented Databases, Kyoto: 40-57, 1989.
[19] Mansour Zand, Val Collins, and Dale Caviness: A survey of current object-oriented databases, ACM SIGMIS Database, 26(1): 14-29, 1995.
[20] Merlin Hughes and Michael Shoffner: Build an object database, http://www.javaworld.com/javaworld/jw-01-2000/jw-01-step_p.html
[21] mirleid: Java ORM: lessons learned, http://www.kuro5hin.org/story/2006/3/11/1001/81803
[22] Object Data Management Group, http://www.odmg.org/
[23] Object Data Management Group, http://en.wikipedia.org/wiki/Object_Data_Management_Group
[24] Oracle TopLink Developer's Guide, 10g Release 3 (10.1.3), Volumes 1-5, http://www.oracle.com/technology/products/ias/toplink/doc/1013/main/_pdf/index.htm, 2006.
[25] Philip A. Bernstein, Shankar Pal, David Shutt: Context-based prefetch - an optimization for implementing objects on relations, VLDB Journal, 9(3): 177-189, 2000.
[26] Probir Goyal: Mapping Data Objects Simply, http://www.ftponline.com/javapro/2004_08/online/pgoyal_08_04_04/
[27] Saitha S.V: Understanding Hibernate ORM for Java/J2EE, Whitepaper: VLAN Best Practices, July 12, 2005, http://www.codetoad.com/java_Hibernate.asp
[28] Scott Ambler: Agile Database Techniques, John Wiley & Sons, 2004.


[29] Seth Grimes: Modeling Object/Relational Databases, DBMS, April 1998.
[30] Steve McClure: Object Databases vs. Object-Relational Databases, IDC Bulletin #14821E, August 1997.
[31] Vincent Coetzee, Robert Walker: Experiences using an ODBMS for a high-volume internet banking system, OOPSLA Companion 2003: 334-338.

