PDM: An Object-Oriented Data Model*

Frank Manola, Umeshwar Dayal
Computer Corporation of America
Cambridge, Massachusetts, U.S.A.
Abstract

This paper describes the development of the data model of PROBE, a knowledge-oriented DBMS being developed at CCA [DAYA85, DAYA86]. The data model, called PDM, is an extension of the Daplex functional data model [SHIP81, FOX84] that illustrates an integration of functional, relational, and object-oriented approaches. The extensions are primarily those required to handle the requirements of new database applications, such as engineering applications and cartography, having spatial or temporal semantics.

1. Introduction

The application of database technology to new applications, such as CAD/CAM, geographic information systems, software engineering, and office automation, is an extremely active area of database research. Many of these new applications deal with highly structured objects that are composed of other objects. For example, a part in a part hierarchy may be composed of other parts; an integrated circuit module may be composed of other modules, pins, and wires; a geographic feature such as an industrial park may be a complex composed of other features such as buildings, smokestacks, and rail lines; and so forth. Moreover, many of the objects dealt with in these applications have spatial or temporal properties, or have conventional properties that vary in time or space. For example, a mechanical part has a geometric shape; the altitude of the earth's surface varies with latitude and longitude.

A number of papers have proposed extensions to conventional record- or tuple-oriented data models to deal with these applications, e.g., [LORI83]. However, another approach has been the development of entity- or object-oriented data models for these applications [LOCH86]. This paper describes the development of the data model of PROBE, a knowledge-oriented DBMS being developed at CCA [DAYA85, DAYA86]. Our approach has been to start from DAPLEX, an entity-oriented functional data model and query language [SHIP81, SMIT81] which already provides some of the necessary basic features, and to enhance DAPLEX to meet the requirements of these new database applications. We refer to the enhanced model as PDM (PROBE Data Model).

Our interest in starting with DAPLEX was due not only to our prior experience with both the design of DAPLEX and its implementation in several DBMSs (e.g., LDM [CHAN82]), but also because DAPLEX incorporates many of the features generally associated with object-oriented models, and considered advantages in the new DBMS applications, including:

• the concept of an entity or object that has existence independent of any properties or relationships with other entities it might have.
• object- and set-valued properties of entities (in DAPLEX, entity- and set-valued functions); this in turn allows the structure of complex objects to be modeled directly.
• an entity class or type concept, together with generalization (subtype) hierarchies among classes with property inheritance.
• a method for incorporating entity behavior and derived properties in the model; this is provided by functions in DAPLEX (these correspond to messages in other object-oriented models).

The implementation of the basic DAPLEX model, including support for entities, functions (typically implemented by pointers), referential integrity, and generalization hierarchies, has been described in [CHAN82]. Our specific goals for PDM extensions were to provide:

• a smooth incorporation of multiargument functions and computed functions (general procedures) into the model.
• an algebra-based formal data model definition (along the lines of the relational model) for use as a foundation for query optimization, view mapping, and query language development studies.
• a clean way of extending the model with objects having nonconventional semantics, particularly spatial and temporal semantics.

Multiargument and computed functions are important as representations for time- and space-varying entity properties, and in the integration of derived data, general procedures, and operations into the model. A variant of relational algebra has been used previously as an informal target interpretation of Daplex in the study of query optimization techniques (e.g., [RIES83]). It was felt that a more explicit definition of such an algebra would be useful for PDM.

* This work was supported by the Defense Advanced Research Projects Agency and by the Space and Naval Warfare Systems Command under Contract No. N0003985-C-0263. The views and conclusions contained in this paper are those of the authors and do not necessarily represent the official policies of the Defense Advanced Research Projects Agency, the Space and Naval Warfare Systems Command, or the U.S. Government.

TH0161-0/86/0000/0018$01.00 © 1986 IEEE

The structure of this paper is as follows. First, we describe the basic characteristics of the model. We then describe an algebraic definition of the model and some issues related to the definition. Next, we briefly describe other facilities of the model, such as rule facilities and recursion, and finally we describe current work in progress. Our intent is to describe the general characteristics of PDM objects. We do not consider in this paper the special characteristics of particular object classes that have been investigated for PROBE, such as spatial/temporal entities, or their functions (for a description of such entities and functions, see [MANO86]).
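To make the object-oriented features listed above concrete, the following Python sketch (ours, not part of DAPLEX or PDM) models entities as bare surrogates, entity types as sets of entities, and subtyping as set containment, so that anything defined for a supertype applies to members of its subtypes. All names here are illustrative assumptions.

```python
import itertools

# Hypothetical illustration (not from the paper): entities are bare
# surrogates; an entity type is just a set of entities; subtyping is
# set containment, so functions defined on a supertype apply to
# members of its subtypes automatically.

_surrogates = itertools.count(1)

def new_entity():
    """Create an entity: nothing but a distinct identity (a surrogate)."""
    return next(_surrogates)

PERSON = set()
STUDENT = set()          # intended as a subtype of PERSON

def assert_type(e, *types):
    """An entity is asserted into types by operations; it is not 'typed'
    by its representation."""
    for t in types:
        t.add(e)

ADVISOR = {}             # entity-valued function on STUDENT, stored extent

p = new_entity()
assert_type(p, PERSON, STUDENT)   # a STUDENT is also a PERSON
q = new_entity()
assert_type(q, PERSON)
ADVISOR[p] = q

# Subtype membership is containment, so a function keyed on PERSON
# (e.g., a NAME function) would apply to p as well.
print(STUDENT <= PERSON)   # True: every STUDENT entity is a PERSON entity
```

The point of the sketch is only that identity, type membership, and property values are three independent assertions, which is the separation the bullets above describe.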
2. PDM Data Objects

2.1 Entities

There are two basic types of data objects in PDM, entities and functions. An entity is a data object that denotes some individual thing. The basic characteristic of an entity that must be preserved in the model is its distinct identity. An entity is represented by a surrogate.

In PDM, entities are grouped into collections called entity types. Examples of entity types might be PERSON (representing people) and MATERIAL (representing types of materials). An entity is not "typed" in the sense that it necessarily has a representation distinct from that of other types. An entity must be asserted to be of one or more types by operations. The types of an entity serve to define what functions may be applied with the entity as a parameter value.

Generalization hierarchies may be defined among entity types, as in Daplex, with the usual function inheritance semantics. This allows inheritance of operations as well as properties. The same entity may be associated with one or more entity types in the database, as defined by database declarations, and may move in and out of types as a result of database operations. Thus, one might define STUDENT and INSTRUCTOR as subtypes of PERSON. An entity that is a STUDENT is also a PERSON, and any function that is defined for an entity of type PERSON may also be applied to an entity of type STUDENT. Moreover, an entity might move from subtype STUDENT to INSTRUCTOR (e.g., if a former STUDENT were hired as an INSTRUCTOR).

2.2 Functions

Properties of entities, relationships between entities, and operations on entities are represented in PDM by functions (operations). In order to access properties and relationships of an entity, one must evaluate a function over the entity, rather than imagining that the properties and relationships are "there", as in a record-oriented approach. In Daplex, a function is a mapping from entities to either individual entities, sets of entities, or scalar values. Thus, for type STUDENT, one might define:

AGE(STUDENT) → INTEGER
ADVISOR(STUDENT) → INSTRUCTOR

PDM generalizes this concept by defining a function as a relationship between collections of entities and scalar values. For example, one might define:

QUANTITY_ON_HAND(DEPOT) → (MATERIAL, INTEGER)

or

FINITE_ELEMENT_ANALYSIS(MODEL, LOADS, OPTION) → DEFORMED_MODEL

Functions may be defined that have no input arguments. For example, an entity type is, in some contexts, considered as a function of no arguments that returns all entities of that type (see below). Also, functions can be defined that have only boolean (truth-valued) results.

The PDM generalization of Daplex functions allowing multiple input and output arguments has a number of notational and semantic effects. First, because it is possible for functions to have more than one output argument, the function name and its input parameters do not always denote a single output value, as in conventional functional languages: sometimes they denote a tuple of output argument values. Because it is necessary to be able to refer to individual output values within such a tuple, PDM functional notation provides each argument with a label. This acts as a formal parameter name, as in a procedure defined in a programming language.

Second, because of the way PDM functions are defined as relationships between collections of entities and scalar values, it is possible to have function arguments that serve as both input and output parameters. For example, given a 3-ary relationship QUANTITY_ON_HAND between depots, materials, and amounts of materials stored at depots, one could imagine both the functions

QUANTITY_ON_HAND_1(DEPOT, MATERIAL) → INTEGER

and

QUANTITY_ON_HAND_2(DEPOT) → (MATERIAL, INTEGER)

being useful, even though they are based on the same 3-ary relationship. This notation appears to define two different functions, but the only real difference is that MATERIAL is an input variable in one and an output variable in the other. The PDM notation for the above two functions would be:

QUANTITY_ON_HAND(D: in DEPOT, M: in out MATERIAL, QUAN: out integer)

This notation is essentially that used in declaring an Ada* subprogram, using "in" and "out" declarations to specify how the parameter may be used, as well as giving a parameter label and the required parameter type. (As in Ada, parameters may be supplied to functions using either "keyword" or positional references.) The general notation is:

function_name(label1: {in/out/in out} type1, ..., labeln: {in/out/in out} typen)

(This notation applies to the concrete syntax being used to define the PDM model. A query language based on the model might very well adopt a different convention.) Using this notation, general procedures can be included in the model, e.g.,

function FE_ANALYSIS(MODEL: in FE_MODEL, LOADS: in FE_LOAD, OPTION: in integer, DEFORMED_MODEL: out FE_MODEL)

* Ada is a trademark of the Department of Defense (Ada Joint Program Office).
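The idea that QUANTITY_ON_HAND_1 and QUANTITY_ON_HAND_2 are the same relationship queried with different bound arguments can be sketched as follows (a Python illustration of ours; the depot/material data and the `evaluate` helper are assumptions, not from the paper):

```python
# Hypothetical sketch (ours): one stored 3-ary relationship
# QUANTITY_ON_HAND(D, M, QUAN) serves both access patterns from the
# text, because a stored extent can be evaluated with any combination
# of arguments bound ("in out" parameters).

QUANTITY_ON_HAND = {
    ("depot1", "bolts", 100),
    ("depot1", "nuts", 250),
    ("depot2", "bolts", 30),
}

def evaluate(extent, **bound):
    """Evaluate a stored function: return tuples matching the bound
    parameters; unbound parameters act as outputs."""
    labels = ("D", "M", "QUAN")
    return {t for t in extent
            if all(t[labels.index(k)] == v for k, v in bound.items())}

# QUANTITY_ON_HAND_1(DEPOT, MATERIAL) -> INTEGER: both D and M bound
print(evaluate(QUANTITY_ON_HAND, D="depot1", M="bolts"))
# QUANTITY_ON_HAND_2(DEPOT) -> (MATERIAL, INTEGER): only D bound
print(evaluate(QUANTITY_ON_HAND, D="depot1"))
```

Either call pattern runs against the same extent; only the set of bound labels differs, which is exactly the distinction the "in out" mode expresses.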
The definition of this function indicates that parameters labeled MODEL, LOADS, and OPTION are input-only parameters, while DEFORMED_MODEL is an output-only parameter. Similarly, a conventional entity property might be defined by:

function AGE(PERSON: in out PERSON, RESULT: in out integer)

a function in which either argument can be an input or output parameter (note in this notation that the first appearance of PERSON is a formal parameter name, the second is an entity type). In this case, the function may be invoked with a value for either parameter, to produce a value for the other. Functions can also be used to model arbitrary operations on entity types. Examples of such operations are "the center of gravity of a part" (computed from the shape of the part), and "rotate part X through angle A."

2.3 Computed versus Stored Functions

PDM functions may either be intensionally-defined (with output values computed by procedures) or extensionally-defined (having stored database extents, as is usually the case in conventional Daplex). If the function is intensionally-defined, the body of the function is stored in the DBMS metadata. If the function has a stored extent, the extent is conceptually a stored table (a database relation) containing a tuple for each combination of legal input and output values. Set-valued functions are viewed as being "flattened" into tuples in such relations in the normal relational style. Conceptually, the column names of this table are the labels of the parameters defined in the function definition.

In PDM, references to all functions are treated syntactically as if they were references to computed functions, even when a stored extent exists, rather than either treating all functions as if they corresponded to stored relations, or treating the two classes of functions differently. (The next section describes the effect this has on the definition of algebraic operations.) This functional interpretation matches the functional syntax intended for query languages based on the model (similar to Daplex) more closely than a relational interpretation would. It has also assisted in understanding problems related to integrating computed functions into the model (discussed in the next section).

Effectively, the difference between the two classes of functions is that, if a function has a stored extent, the function can actually be evaluated with any combination of input or output arguments assigned values. This is reflected in parameters of stored functions being generally declared as "in out".

It may be observed that mathematically these "functions" are relations, and those with stored extents are relations in the sense of the conventional tabular view of relations in the relational data model. We will continue to refer to these objects as "functions", because syntactically they will be "applied" like (computed) functions. However, as the definition of the algebra will make clear, what has been defined here can be considered a form of "entity relational" model, along the lines of RM/T [CODD79].

3. PDM Algebra

3.1 Relationship to Relational Algebra

An algebra can be formed for the objects (entities and functions) defined in the last section by adopting a "functional interpretation" for operations of a modified relational algebra, and considering these operations as built-in operations (functions) on function objects. (For functions with stored extents, this interpretation is consistent with the one normally associated with the relational model.) These operators are viewed as operating on functions and returning functions (effectively relations, as noted above). The arguments of these returned functions are defined in terms of the input functions in the same way as columns of output relations are defined for conventional relational algebra operations. In addition, operators are provided for creating and destroying entities.

To illustrate the "functional interpretation" of relational operators, the projection of F(X,Y) on X can be interpreted as asking "what values of argument X are defined for function F(X,Y)?" It is possible to imagine that in some cases projection would be defined for intensionally-defined functions as well as for conventional relations. Metadata specifications can be used to indicate whether or not such a built-in function applies to a given function in the database.

In PDM algebra, both functions with stored extents and functions with computed extents are treated as subroutines. Instead of a join operation, an "apply and append" operation is provided that applies a function to arguments contained in tuples of another function. The result is a new function formed conceptually by appending columns to the argument function containing the results of the function evaluation. In the case of functions with stored extents, the interpretation of an "apply and append" is that of a relational join of the relation containing the arguments (the argument function) to the relation representing the function extent.

The appearance of a function in an algebraic expression is a call for its evaluation, using actual parameters substituted for its formal parameters. The form for this is: function-name(Li1:Lo1, ..., Lin:Lon), where Li is a formal parameter label. For input parameters, Lo will be the label of an existing column in the argument function that contains the values to be supplied for that parameter. For output parameters, Lo will be the label to be assigned to that output parameter for use within the result function.

An advantage of including function application within an algebraic framework like this is that the resulting function "built up" by an algebraic expression serves as a context for the evaluation of subsequent functions. In particular, it retains associations (possibly indirect) between input and output arguments that can be useful in constructing complex results, such as those found in view definitions and other types of computed functions. The result appears to be an effective integration of strictly functional capabilities such as those in [BUNE82] with relational and object-oriented ones. The identification of functional and object-oriented capabilities with relational ones is particularly important in suggesting implementation strategies and query-optimization techniques.
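The uniform treatment of stored and computed functions can be sketched as follows (a Python illustration of ours; the `stored` and `computed` wrappers and the AGE/DOUBLE functions are assumptions for illustration, not PDM's actual mechanism):

```python
# Hypothetical sketch (ours): stored and computed functions behind one
# "apply" interface, mirroring PDM's treatment of all function
# references as if they were computed-function references.

def stored(extent, labels):
    """Wrap a stored extent (a set of tuples) as an applicable function."""
    def apply(**inputs):
        rows = [dict(zip(labels, t)) for t in extent]
        return [r for r in rows
                if all(r[k] == v for k, v in inputs.items())]
    return apply

def computed(proc, out_label):
    """Wrap a procedure as an applicable function (input-only args)."""
    def apply(**inputs):
        return [dict(inputs, **{out_label: proc(**inputs)})]
    return apply

AGE = stored({("ann", 9), ("bob", 12)}, ("PERSON", "RESULT"))
DOUBLE = computed(lambda N: 2 * N, "RESULT")

# Both are invoked identically; the caller cannot tell which is stored.
print(AGE(PERSON="ann"))   # [{'PERSON': 'ann', 'RESULT': 9}]
print(DOUBLE(N=21))        # [{'N': 21, 'RESULT': 42}]
# "Projection" of AGE on PERSON: which argument values are defined?
print({row["PERSON"] for row in AGE()})
```

Note that only the stored wrapper can be called with any combination of bound arguments (including none, as the projection shows); the computed wrapper requires its inputs, which is exactly the asymmetry the text describes.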
The algebra contains operations for projection, Cartesian product, and set operations (and their "outer" variants [CODD79]). The usual union-compatibility constraints apply. Any columns containing entities are considered union-compatible (allowing formation of new generalizations). Selection is defined as in the relational algebra, where the selection condition may be either a Boolean function evaluation (which may be nested), including such functions as the usual predicates allowed in relational θ-joins, logical connectives, or function arguments ("columns") containing truth values.

To illustrate these ideas, consider the following example (in a Daplex-like query language):

print(NAME({P in PERSON where AGE(P) < 10}))

PDM algebra:

T1: apply_append(, PERSON:P)
T2: apply_append(T1, AGE(PERSON:P, RESULT:A))
T3: select(T2, LT(A,10))
T4: apply_append(T3, NAME(PERSON:P, RESULT:N))
T5: print(T4, N)

(The operations have been separated for clarity.) The sequence of operations is similar to that which would be used in the relational algebra in an RM/T-like database, except for the use of "apply_append" instead of join. The initial "apply_append" in T1 denotes the evaluation of entity type PERSON as a 0-argument function returning a set (column) of entities of that type. The functional interpretation of this is that each new "apply_append" of a user-defined function appends additional columns to the function denoted by the algebraic expression, as a relational join would do to a relation.

3.2 Function Application

The more interesting operations of the algebra are those involving function application, and those involving individual entities. The primary operation of interest here is "apply (function) and append". This plays a role in the PDM algebra corresponding to that of join in the relational algebra (and for functions having stored extents, the intended interpretation is the same). The generic version of "apply and append" is defined as follows:

apply and append: Given a function P having an existing set of labels LIi (the "input" labels), a set of column labels LOi (the "output" labels) not currently defined in P, and a function F(L1,L2,...,Ln), where the Li are formal parameter labels and LIi is type-compatible with Li, the application of F to P, apply_append(P, F(..., Lj:LIj, ..., Lk:LOk, ...)), is the set of tuples t such that t is the concatenation of a tuple t1 of P and a distinct tuple of values labeled LOk returned by evaluating the function using parameters taken from the LIi columns of t1. If, for a given set of parameter values, function F is set-valued, each member of the set will be concatenated with t1 to produce a new tuple t.

If computed functions are to be integrated into PDM algebra with the same generality that relational algebras such as that found in [CODD79] provide for relations, a number of issues must be addressed. These include issues relating to the semantics of partial functions and incomplete information. The issues can be identified in functional terms as follows:

a. What should be the result when the function is not defined for a particular tuple of input arguments?
b. What should be the result when the function is defined for tuples of input arguments that are not actually supplied as arguments?
c. Should the function return both the input argument values and results, or just results?
d. How are functions to be updated (primarily of concern for stored functions)?

These issues have been partially dealt with in the relational model by providing different types of joins, and by adding null values and "outer" operations. In addition, Codd has noted that the differences between "closed world" and "open world" assumptions about relations could be dealt with in the model by determining which of two types of null values should be returned by outer joins [CODD79]. Codd has suggested that this be determined on a per-relation basis, although the type of null could also be indicated by a parameter or by different operation variants. Provision of these join variants gives the user control over the appropriate interpretation of the functions represented by relations, and what the user intends to happen under various circumstances. For completeness, similar capabilities must either be provided for computed functions, or their absence explicitly recognized and dealt with. That is why the above definition of "apply and append" was referred to as "generic": like join, a number of variants have been identified, and are defined in the model. Note that one always has the option to "construct" appropriate results using more conventional operations, rather than defining multiple "apply" options. Again, this is just like the outer join case: the results of outer joins can be constructed using more conventional operations, but it is often more convenient to have the special operations.

Case (a) refers, in relational terms, to the choice between an ordinary join and a "left outer join" (using terminology from [MERR84]). In a conventional join, the input tuple for which the function was undefined would be eliminated. However, even though this might be desirable, it would be an unconventional side effect of trying to apply a computed function to a set of arguments. A more normal result for computed functions would be that the function aborts in some way. This can currently occur in Daplex if, for example, STUDENT and INSTRUCTOR entities are defined as subtypes of PERSON, ADVISOR(STUDENT) → INSTRUCTOR is defined, and one attempts to evaluate ADVISOR over a set of PERSON entities, some of which are not students. A third possible result would be for the function to return some form of null value, as in an outer join. This allows the user to ensure that the result contains all the original arguments, and that the query does not abort in the middle of a set-oriented expression.

To support the requirements of these operations, two types of "null values" are currently supported, as described in [MERR84]. The "don't care" null, denoted * here, behaves like a special value, with properties similar to those of non-null values. Thus, every domain contains * as a potential value, and a three-valued logic is not required. The "don't know" null, denoted ? here, can be used as the default value of stored functions for which no explicit value has been supplied. The use of the ? null requires the definition of a three-valued logic. The one currently being considered is that defined in [MERR84], although this is under investigation.
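The T1 through T5 pipeline above can be rendered executably as follows (a Python sketch of ours; functions are simplified to callables returning lists of output tuples, and the PEOPLE data is an assumption):

```python
# Hypothetical executable rendering (ours) of the T1..T5 pipeline from
# the text: apply_append appends output columns produced by evaluating
# a function on each tuple; select filters rows.

def apply_append(P, func, in_labels, out_labels):
    """For each tuple (dict) of P, evaluate func on the in_labels
    columns; concatenate each returned output tuple (set-valued
    functions may yield several)."""
    result = []
    for t in P:
        for outs in func(*[t[l] for l in in_labels]):
            result.append({**t, **dict(zip(out_labels, outs))})
    return result

def select(P, pred):
    return [t for t in P if pred(t)]

PEOPLE = {"ann": 9, "bob": 12, "cal": 7}    # person -> age (illustrative)

PERSON = lambda: [(p,) for p in sorted(PEOPLE)]   # entity type as 0-arg function
AGE = lambda p: [(PEOPLE[p],)]
NAME = lambda p: [(p.capitalize(),)]

T1 = apply_append([{}], PERSON, [], ["P"])    # evaluate entity type PERSON
T2 = apply_append(T1, AGE, ["P"], ["A"])      # append an age column A
T3 = select(T2, lambda t: t["A"] < 10)        # LT(A, 10)
T4 = apply_append(T3, NAME, ["P"], ["N"])     # append a name column N
print([t["N"] for t in T4])
```

Each apply_append widens the working function by one column while keeping the P column available, which is the "context for the evaluation of subsequent functions" described in Section 3.1.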
Case (b) tellers, in relational terms, to the choice between an ordinary join and a "right outer join". In a conventional join, the effect would be the "normal" one: the function is unevaluated for that argument tuple. However, an option corresponding to a right outer join might apply here too, i.e. supplying "null columns" of input arguments for unsupplied arguments for which the function is defined. (This would certainly be required for stored functions). Cases (a) and (b) also involve the choice of what kind of null would be returned if that were required.
and a set of labels Li denoting other columns of P, the removal of duplicates among the entities labeled L1 in P is denoted by unique(P,L1,Li). It is the function obtained by replacing all entities in the same label L1 position in tuples having the same values for columns Li by a single entity. It should be noted that using "createobi" to form a new entity (or a column of new entities in a function) effectively constructs a dynamically-created entity type (since all an entity type is in PDM is a set of entities). Such entities may exist in algebraic expressions without functions t h a t can be applied to them (these would have to be created by separate operations).
Case (c) refers, in relational terms, to the choice between a natural join and a O-join. For computed functions, the problem is relatively straightforward for equijoins, but is less so for other forms of e-join. The obvious solution here, as in the previous cases, is to forbid forms of function application that make no sense for specific functions. The point is that all these variants are useful for stored functions (relations), and it is these sorts of semantic distinctions that create problems in trying to hide the fact that certain functions are computed.
Metadata is represented using entities and functions, so that PDM algebra can also be used in operating on metadata. Currently, the structure of this metadata is a hybrid between that used in the LDM [CHAN82] and R M / T [CODD79], although there is obviously a great deal more potential flexibility in the use of an object-oriented approach for metadata. The need for some of the special operators defined in R M / T for use in mixed m e t a d a t a / d a t a queries is currently being investigated. Unlike R M / T , which requires operators to translate between names and relations, P D M allows direct functions between metaentities and extensional objects (e.g., database entities). Thus, some of these operators may not be needed. The full model definition also includes operations for updating database entity types and functions (where this is permissable).
Case (d) refers to a number of problems. For example, if a multiargument function is intended to be a total function, i.e., defined for all possible values of the input arguments, this means that all possible n-tuples of input arguments would ordinarily have to be present in a stored extent. Moreover, when a user added a new argument value involved in such a function, the user would theoretically have to supply a value for each combination of that value with all combinations of the other arguments, in order to comply with the requirement that the stored extent reflect the "totalness" of the function. This is clearly unrealistic, from both the user and implementation perspectives. One solution would be to allow functions to be declared as either partial or total. A function declared as total might then have a less-than complete extent, and would by default generate ? nulls if applied with input £rgument values not present in its stored extent. The same function declared as partial, with the same stored extent, would generate * nulls under similar circumstances.
3.4 A l g e b r a P o w e r The algebra as currently defined is powerful enough to perform arbitrary manipulations of PDM databases, including formation of generalization hierarchies, new entity types, and multiargument functions. A number of examples have been worked out, although space precludes presenting one here. The technique is essentially that used in performing database "restructuring" in the relational model. Existing entity types and functions can be combined into complex functions using apply_append or Cartesian product, and "new" entity types and functions can be projected from those functions.
The general problem of updating functions in the model is coextensive with that of updating views. In general, functions may be defined that are partially stored and partially computed. For example, the function
4. O t h e r P D M C a p a b i l i t i e s
ALTITUDE(LAT,LONG) ~ HEIGHT 4.1 Ext ensibility
in a geographic database would typically be defined by interpolation over a set of (LAT, LONG, HEIGHT) tuples, while using a stored value where one existed for a particular latitude and longitude.
New data types and operations can be added by defining them as new entity types and functions. The physical realization of these types and their operations may be made invisible to the users through an abstract data type mechanism (as in [STON83]). Essentially this provides a straightforward way of interfacing specialized hardware or software processors to the DBMS. The data types and operations implemented by these processors are defined in the schema as entity types and functions that can be used in queries and transactions exactly like the entity types and functions t h a t are "directly" implemented by the DBMS.
In addition to the forms of functions described already, PDM also supports various other forms, such as aggregate functions. Again, the techniques used borrow heavily from those familiar in the relational model. • 3.3 O t h e r O p e r a t i o n s Operations for operating on single entities are also provided in the model. For example:
Recognizing that many of the objects that occur in the applications to be supported by PROBE deal with spatial and temporal data, we have given special treatment to spatial and temporal semantics. This is accomplished through the definition of special entity types (which are subtypes of the generic type ENTITY), with special behavior, to model spatial and temporal properties of objects. How this works in general is shown in the following simplified (spatial)
Createob] - - Given a function P(L1,...,Ln), createobj(P,L) returns a function PI(L1,...,Ln,L) obtained by concatenating each tuple of P with a new entity (surrogate) in column L. Duplicates R e m o v a l (Unique) - - Given a function P, one of its labels L1 denoting a column containing entity surrogates,
22
example. A [MANO86].
thorough
discussion can
be
found
Constraints themselves are specified either as rules that assert some invariants over the database or indirectly through procedures that check whether some desired conditions are satisfied.
in
entity PART function P A R T # ( P T : in PART,NUM: out integer) function SHAPE(PT: in PART,SHP: out PTSET) function COLOR(PT: in PART,EXTENT: in PTSET, CLR: out COLORVALUE)
4.3 R e c u r s i o n The inclusion of recursion in DBMSs, especially to support PROLOG-style deduction over relational databases, has become an extremely active area of research. Rather than adding general recursive PROLOG-style rule processing, which is potentially inefficient and not guaranteed to terminate, to a DBMS, we have identified a special class of recursion, called traversal recursion, that has two crucial properties. First, it is powerful enough to express the recursive computations needed for complex objects. Second, very efficient algorithms, based on graph traversal, exist for performing these computations. Traversal recursions generalize transitive closure computations on the database viewed as a graph.
entity PTSET function CONTAINS(P:PTSET,Q:PTSET) Briefly, entities of type PTSET that have the semantics of points or point sets (such as lines, areas, or volumes) are defined in the model. These serve as the values of spatial attributes, such as "shape" or "boundary", of ordinary database entities, such as "parts". Subtypes of PTSET can be defined to represent general classes of point sets (e.g., 2D and 3D point sets), and specific types of objects within those general classes (e.g., 2D lines and curves, 3D solids). (Points and intervals can also be defined in the same way to represent temporal objects). For each subtype, appropriate functions can be defined to represent the properties and operations appropriate to the type of object being represented. Note that both spatial and non-spatial attributes (such as " P A R T ~ " ) to be associated with the same database entities in a straightforward way.
Our implementation approach is to tailor fast graph traversal algorithms for various flavors of traversal recursion, which differ depending upon properties of the graph (is it acyclic or cyclic? is it disjoint or not?), properties of the functions (e.g., whether they are monotonic, whether components are spatially enclosed within their parent objects), and properties of the computations, which are gleaned from the user's query and the metadata. See [ROSE86] for details of these techniques.
Attributes, such as COLOR or DENSITY, that vary over the shape of the part may be handled in two ways. First, the attribute, e.g. COLOR, can be defined as a multiargument function (in this case, as taking a part and a portion of its boundary, and returning a color value). Alternatively, a separate entity could be defined. In this model, object versions (which have received special treatment in some models) are treated as temporal objects. The model is general enough to support many different notions of space and time.
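The first alternative, COLOR as a multiargument function over (part, extent) pairs, can be sketched as follows (the parts, extents, and backing table are illustrative assumptions):

```python
# Colors recorded per (part, portion-of-boundary) pair.
color_map = {
    ("part-1", "top-face"): "red",
    ("part-1", "bottom-face"): "blue",
}

def COLOR(part, extent):
    """Return the color of one portion (extent) of the part's boundary."""
    return color_map[(part, extent)]

assert COLOR("part-1", "top-face") == "red"
assert COLOR("part-1", "bottom-face") == "blue"
```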
Specific forms of recursion are integrated into the model as multiargument functions. These functions can then be invoked in the usual way. The parameters for these functions, and their semantics, are given by the "recursion templates" defined in [ROSE86].
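A minimal sketch of the traversal-recursion idea, in Python rather than the algebra: a transitive closure over a part-component graph, packaged as an ordinary function the way an instantiated recursion template would be invoked. The graph and names are illustrative assumptions, not the [ROSE86] notation.

```python
def transitive_closure(edges, start):
    """All parts reachable from `start` via the component relation."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

component_of = {"engine": ["piston", "valve"], "piston": ["ring"]}
assert transitive_closure(component_of, "engine") == {"piston", "valve", "ring"}
```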
Our approach to implementing spatial objects is to use simple approximate geometric representations (based on space-filling curves) and corresponding fast query processing algorithms that produce approximate answers to spatial queries. These approximate techniques work with a set of objects at a time and can dramatically reduce the amount of work that must subsequently be done by specialized processors that work with one or two objects at a time and manipulate more detailed representations to refine the approximate answers. Temporal data is handled as a one-dimensional special case [OREN86].
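The bit-interleaving behind one common space-filling curve, the Z-order (Morton) curve, can be sketched as follows. This illustrates the general idea of mapping multidimensional points to one-dimensional keys so that a coarse 1D range scan can filter candidates before exact spatial tests; it is not necessarily the specific encoding used in [OREN86].

```python
def z_order(x, y, bits=16):
    """Interleave the bits of non-negative x and y into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key

assert z_order(0, 0) == 0
assert z_order(1, 0) == 1
assert z_order(0, 1) == 2
assert z_order(1, 1) == 3
```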
4.4 Global Query Optimization

Query optimization in PROBE will adopt a compromise between the approach to query optimization in distributed optimizers [DANI82], which can obtain some knowledge of the file structures, query processing algorithms, and cost statistics of the individual local processors, and the pure abstract data type or object-oriented approach, in which the implementation of every abstract type and its operations is a black box to the optimizer. In the PROBE design, while the implementations of the abstract types and their operations are indeed hidden from the end user, each abstract type definition includes an interface to the global optimizer in which some information is revealed. This information includes the algebraic properties of the operations supported (e.g., commutativity, associativity, distributivity), and cost and result size estimates. The global optimizer will utilize this information in constructing efficient global execution plans. This will make the system extensible, in that new types and new operations can be added easily, while at the same time permitting the exploitation of new functionally specialized components for improved performance. Another area of interest is the optimization of the multiple related queries that can result during the construction of complex output objects using the "restructuring" capabilities of the algebra.
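The optimizer interface described above might look roughly like this: each operation registers its algebraic properties and a cost estimate while its implementation stays hidden, and the optimizer exploits commutativity to pick a cheaper argument order. The class, registry, and cost model are assumptions for illustration, not the PROBE design.

```python
class OperatorInfo:
    """What an abstract type reveals to the global optimizer about one operation."""
    def __init__(self, name, commutative, associative, cost_fn):
        self.name = name
        self.commutative = commutative
        self.associative = associative
        self.cost_fn = cost_fn      # estimated cost given input sizes

registry = {}

def register(op):
    registry[op.name] = op

# Illustrative nested-loop-style cost: outer scan plus outer * inner probes.
register(OperatorInfo("spatial_join", commutative=True, associative=False,
                      cost_fn=lambda outer, inner: outer + outer * inner))

def cheaper_order(op_name, n, m):
    """Reorder the arguments of a commutative operator if that lowers cost."""
    op = registry[op_name]
    if op.commutative and op.cost_fn(m, n) < op.cost_fn(n, m):
        return (m, n)
    return (n, m)

assert cheaper_order("spatial_join", 1000, 10) == (10, 1000)
```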
Since metadata is also represented by entities and functions, extensibility can also be applied to metadata. This might be used to create multiple metadata-data levels, although this has not yet been investigated in any great detail.

4.2 Rule Specification

A general mechanism for specifying rules is also included in the PROBE data model and language. This mechanism is used to specify rules for propagating changes to derived functions, rules for propagating operations on some entities to other entities to which they are related (hence, for propagating operations from a complex object to its components, and vice versa), rules for specifying how and when to check constraints, rules for invoking exception handlers, etc.
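One of these rule kinds, propagating a delete from a complex object to its components, can be sketched as follows (the rule table, component relation, and names are illustrative assumptions, not PROBE rule syntax):

```python
entities = {"car", "engine", "wheel"}
components = {"car": ["engine", "wheel"]}

# Rule: a delete on an entity yields delete operations on its components.
rules = {"delete": lambda obj: [("delete", c) for c in components.get(obj, [])]}

def delete(obj):
    entities.discard(obj)
    for op, target in rules["delete"](obj):   # propagate per the rule
        if op == "delete":
            delete(target)

delete("car")
assert entities == set()   # the delete cascaded to every component
```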
4.5 Contexts

The PDM view mechanism involves a generalization that we call "contexts". The motivation for the facility is that adding spatial/temporal capabilities may greatly increase database sizes and their structural complexity. It is felt that an improved method of organizing this data for users will be required. For example, in a conventional database system, only "current" data is available. Using PDM temporal facilities, data about the state at multiple points in time will be available. The simultaneous availability of data as of multiple points in time may prove rather confusing. Similar comments apply regarding multiple spatial perspectives on the same entities.
In our approach, a context is a generalization of the conventional database view facility that allows switching between views during operations, while retaining some data and metadata from previous views in subsequent ones. There are similarities with the "consult" facility in Prolog. At various points in a Prolog program, new rules and facts may be read into the active database of the program by specifying "consult(filename)" in a rule. These new facts and rules constitute additions (or changes) to the "knowledge" available.

Thus, at any time, a user would be operating in a specific "context". This context is a collection of data and metadata that defines, for that user:
a. what entities exist;
b. what functions can be applied to them;
c. what the definitions of those functions are.

Users may change contexts, and in doing so acquire access to additional or different entities, additional functions that may be applied to entities, or changed definitions of various functions. For example, an entity may be of one type in one context, and of a different type in another. A context change could be used, for example, to change the scale of a map, or to zoom in on a portion of it. In this case, the same entities might exist in both contexts, but the definitions of their functions having spatial values (and thus the values themselves) would change. Also, new entities (representing features not visible at the first scale) might "become visible" in the new context.

The context facility is currently under development. It requires at minimum the ability to define contexts and relationships between them, and to switch contexts during database operations.

5. Other Current Work

Work related to PDM is ongoing in a number of areas, some of them mentioned in previous sections. We are continuing to investigate the semantics of function application in the context of set-oriented queries. We feel particularly that being able to deal consistently with partial functions will be of assistance in integrating user-defined extensions (including functions defined for individual objects as opposed to whole types) into the framework. A breadboard implementation of the PROBE data model and algebra, and of some query processing algorithms, is under way. This involves the definition of a number of specific extended entity types involving spatial semantics [MANO86].

Acknowledgements

We express our thanks to Alex Buchmann, David Goldhirsch, Sandra Heiler, Jack Orenstein, and Arnie Rosenthal for their contributions to the development of PDM.

6. References

[BUNE82] Buneman, P., R.E. Frankel, and R. Nikhil, "An Implementation Technique for Database Query Languages," ACM Trans. Database Systems, 7, No. 2 (June 1982).
[CHAN82] Chan, A., et al., "Storage and Access Structures to Support a Semantic Data Model," Proc. 8th Intl. Conf. on Very Large Data Bases, Mexico City, 1982.
[CODD79] Codd, E.F., "Extending the Database Relational Model to Capture More Meaning," ACM Trans. Database Systems, 4, No. 4 (December 1979).
[DANI82] Daniels, D., et al., "An Introduction to Distributed Query Compilation in R*," Proc. 2nd Intl. Symp. on Distributed Databases, Berlin, September 1982.
[DAYA85] Dayal, U., et al., "PROBE - A Research Project in Knowledge-Oriented Database Systems: Preliminary Analysis," Technical Report CCA-85-03, Computer Corporation of America, July 1985.
[DAYA86] Dayal, U., and J.M. Smith, "PROBE: A Knowledge-Oriented Database Management System," to appear in M.L. Brodie and J. Mylopoulos (eds.), On Knowledge Base Management Systems: Integrating Artificial Intelligence and Database Technologies, Springer-Verlag, 1986.
[LOCH86] Lochovsky, F. (ed.), Database Engineering, Vol. 8, No. 4, Special Issue on Object-Oriented Systems, 1986.
[LORI83] Lorie, R.A., and W. Plouffe, "Complex Objects and Their Use in Design Transactions," Proc. 1983 ACM Engineering Design Applications, San Jose, CA (May 1983).
[MANO86] Manola, F.A., and J.A. Orenstein, "Toward a General Spatial Data Model for an Object-Oriented DBMS," Proc. Twelfth Intl. Conf. on Very Large Data Bases, Kyoto, August 1986.
[MERR84] Merrett, T.H., Relational Information Systems, Reston, 1984.
[OREN86] Orenstein, J., "Spatial Query Processing in an Object-Oriented Database System," Proc. 1986 ACM-SIGMOD Intl. Conf. on Management of Data.
[RIES83] Ries, D.R., et al., "Decompilation and Optimization for ADAPLEX: A Procedural Database Language," Technical Report CCA-82-04, Computer Corporation of America, September 1983.
[ROSE86] Rosenthal, A., et al., "A DBMS Approach to Recursion," Proc. 1986 ACM-SIGMOD Intl. Conf. on Management of Data.
[SHIP81] Shipman, D., "The Functional Data Model and the Data Language DAPLEX," ACM Trans. Database Systems, 6, No. 1 (March 1981).
[SMIT81] Smith, J.M., et al., "ADAPLEX Rationale and Reference Manual," Technical Report CCA-83-08, Computer Corporation of America (May 1983).
[STON83] Stonebraker, M., B. Rubenstein, and A. Guttman, "Application of Abstract Data Types and Abstract Indices to CAD Databases," Proc. Database Week: Engineering Design Applications, IEEE Computer Society, 1983.