Generic Model Management

  • Uploaded by: vthung
  • 0
  • 0
  • April 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Generic Model Management as PDF for free.

More details

  • Words: 1,946
  • Pages: 38
Generic Model Management: A Database Infrastructure for Schema Manipulation Philip A. Bernstein Microsoft Research Sept. 15, 2003

© 2003 Microsoft Corporation

1

Meta Data Management 

Meta data = structural information 

DB schema, interface defn, web site map, form defns, … UML Architecture

Forms ER Diagram Customer

Order

Product

Scheduled Delivery

Salesperson

Business Rules Emp.Sal < Emp.Mgr.Sal

Business Process Update Marketing

Authorize Credit

Order Entry

Bill Customer

Table Defns

Schedule Delivery Inventory

VB interfaces

Sept. 15, 2003

C++ interfaces

© 2003 Microsoft Corporation

2

Meta Data Problems  

They all involve schemas and mappings E.g., data translation between data models Hierarchical Schema PO PO # POdate

Relational Schema PurchaseOrder OrdIDOrderDate

POLines Prod# PName Sept. 15, 2003

Items Item#OrdIDI_Name © 2003 Microsoft Corporation

3

Such Problems are Pervasive 

Data translation



Schema evolution & data migration



XML message translation for e-commerce



Integrate custom apps with commercial apps



Data warehouse loading (clean & transform)



Design tool support (DB, UML, …)



OO or XML wrapper generation for SQL DB



Semantic web

Sept. 15, 2003

© 2003 Microsoft Corporation

4

Meta Data Solutions 

Solutions strongly resemble each other, but  usually are problem-specific  usually are language-specific SQL, ODMG, UML, XML, RDF, ….  usually involve a lot of object-at-a-time programming



Goals  

Generic solutions “Set”-at-a-time programming

Sept. 15, 2003

© 2003 Microsoft Corporation

5

Model Management 

A generic approach to meta data mgmt



Model Mgmt operators manipulate models and mappings as bulk objects 

Their representation is generic



Operators - Match, Merge, Diff, Compose



Avoids problem-specific and languagespecific solutions



Avoids object-at-a-time programming

Sept. 15, 2003

© 2003 Microsoft Corporation

6

Models and Mappings A model is a rooted directed graph, which represents a complex information structure. Relational Emp Schema

map1

E#

Emp E#

Dept#

Dept#

Name

Name

A mapping is a model that represents a transformation Sept. 15, 2003 © 2003 Microsoft Corporation

XSD

First Last 7

Models and Mappings A model is a rooted directed graph, which represents a complex information structure. map1

Relational Emp Schema

Emp E#

E# Dept#

Dept#

Name

Name

Or it could be a binary table (a morphism) Sept. 15, 2003

XSD

© 2003 Microsoft Corporation

First Last 8

Model Mgmt Algebra 

map = Match (M1, M2)



<M3, map13, map23> = Merge (M1, M2, map)



map3 = Compose(map1, map2)



<M2, map12> = Diff(M1, map)



<M2, map12> = ModelGen(M1, metamodel2)



Sept. 15, 2003

© 2003 Microsoft M = Copy( M )Corporation

9

Outline  Introduction to Model Management  Using MM to solve meta data problems  Matching anatomy ontologies  Model merging  Wrap-up

Sept. 15, 2003

© 2003 Microsoft Corporation

10

Categorizing Meta Data Problems 

Model mapping

M1

M2



Data translation



XML message translation for e-commerce



Integrate custom apps with commercial apps Data warehouse loading (clean & transform)





map12

Solution is the match “operator” 

Sept. 15, 2003

Really a CAD system for mapping generation

© 2003 Microsoft Corporation

11

Categorizing M D Problems (2) 

Model integration

map12

M1 m

ap

23

ap

M2

m

13

M3





View integration



Data integration

Solution is the Merge operator

Sept. 15, 2003

© 2003 Microsoft Corporation

12

Categorizing M D Problems (3) 

Model and mapping generation

M1



map12

M2



Design tools (ER → SQL)



Wrapper generation (SQL → OO or XML)

Solution is the ModelGen operator 

<M2, map12> = ModelGen(M1, metamodel2)

Sept. 15, 2003

© 2003 Microsoft Corporation

13

Categorizing M D Problems (4) 



Change propagation

M1

map12

M2

M1′

map12

M2′



Schema evolution



Required maintenance for all meta data problems

Solution requires the rest of MM algebra

Sept. 15, 2003

© 2003 Microsoft Corporation

14

Change Propagation 

Given  



map1 between xsd1 and SQL schema rdb1 xsd2, a modified version of xsd1

Produce  

rdb2 to store instances of xsd2 a mapping between xsd2 and rdb2

1. map2

xsd1

map1 m . 2

rdb1

2. map3 = map2 • map1

3 p a

3. <map4, rdb3 > = Copy(map3)

map4 rdb3 rdb2 xsd2 3.map Sept. 15, 2003

1. map2= Match(xsd1, xsd2)

Now we need to merge Diff(xsd2,map4) into rdb3

© 2003 Microsoft Corporation

15

Change Propagation (cont’d) map1 m . 2

3 p a

xsd2 3. map4 4. map5

rdb1

7. = Merge(rdb3, rdb4, map7) rdb3 7. m 6. map7

1. map2

xsd1

4. <xsd2′, map5> = Diff(xsd2,map4) 5. = ModelGen(xsd2′, SQL) 6. map7 = map4 • map5 • map6

xsd2′ 5. map6 rdb4 7. Sept. 15, 2003

ap

9

rdb2

p8 a m

© 2003 Microsoft Corporation

16

Complete Script in Rondo Operator Definition: PropagateChanges(s1, d1, s1_d1, s2, c, s2_c) 1. s1_s2 = Match(s1, s2); 2. 〈d1′, d1′_d1〉 = Delete(d1, Traverse(All(s1) − Domain(s1_s2), s1_d1)); 3. 〈c′, c′_c〉 = Extract(c, Traverse(All(s2) − Range(s1_s2), s2_c)); 4. c′_d1′ = c′_c ∗ Invert(s2_c) ∗ Invert(s1_s2) ∗ s1_d1 ∗ Invert(d1′_d1); 5. 〈d2, c′_d2, d1′_d2〉 = Merge(c′, d1′, c′_d1′); •

s2_d2 = s2_c ∗ Invert(c′_c) ∗ c′_d2 + Invert(s1_s2) ∗ s1_d1 ∗ Invert(d1′_d1) ∗ d1′_d2;

7. return 〈d2, s2_d2〉; Operator Use: SQLXSD: PropagateChanges(s1, d1, s1_d1, s2, ModelGen(s2, XSD));

Sept. 15, 2003

© 2003 Microsoft Corporation

17

Status Report 

Previous scenario is executable in Rondo, the first complete MM prototype 



There are many prototypes for Match 



[Rahm & Bernstein, VLDB J., Dec. 2001]

Detailed design for Merge 



[Melnik et al, SIGMOD 2003]

[Pottinger & Bernstein, VLDB 2003]

There are several efforts on a formal semantics for MM operators

Sept. 15, 2003

© 2003 Microsoft Corporation

18

Outline  Introduction to Model Management  Using MM to solve meta data problems  Matching anatomy ontologies  Model merging  Wrap-up

Sept. 15, 2003

© 2003 Microsoft Corporation

19

Schema Matching Algorithms 

About a dozen published algorithms     



Many good ideas, but none are robust 



Schema-based vs. content-based Per-element vs. structural Linguistic vs. constraint-based Independently-developed schemas vs. incrementally-modified schemas Hybrid vs. composite Human review and input is essential

User interface is also quite important

Sept. 15, 2003

© 2003 Microsoft Corporation

20

Matching Anatomy Ontologies  Match

two human anatomy ontologies

 FMA

– Univ. of Washington  Galen CRM – Univ. of Manchester (UK)  By Peter Mork (Univ. of Washington)  Both models are big  Ultimate

goal was finding differences  Like most match algorithms, ours calculates a similarity score for the m× n pairs of elements Sept. 15, 2003

© 2003 Microsoft Corporation

21

Aligning Representations generic part

FMA: generic Cardiac Heart part valve

Heart

CRM: Heart Heart sensibly hasStructuralComponent ValveInHeart sensibly Sept. 15, 2003

© 2003 Microsoft Corporation

Cardiac valve

h-S-C

Valve In Heart 22

Anatomy Matching Algorithm Lexical Match

1. •

• • •

Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus String comparison → 306 matches Adding spaces, ignoring case → 1834 matches Lexical tools → 3503 matches

Sept. 15, 2003

© 2003 Microsoft Corporation

23

Anatomy Matching Example S: similarity score

generic part

S = 2/15

Heart

Cardiac valve

Heart

h-S-C

sensibly

Valve In Heart

S=1

Sept. 15, 2003

© 2003 Microsoft Corporation

S=1

24

Anatomy Matching Algorithm Lexical Match

1. •

Structure Match

2. • •



Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus Similarity(reified nodes) = Average(neighbors) Back-propagate to neighbors

Adds 64 matches (to previous 3503) • Implies 875 reified relationship matches Sept. 15, 2003

© 2003 Microsoft Corporation

25

Anatomy Matching Example S: similarity score

generic part Cardiac valve

Heart S=1

Sept. 15, 2003

S = 2/15

S = 2/3 Heart

h-S-C

sensibly

Valve In Heart

© 2003 Microsoft Corporation

S=1

26

Anatomy Matching Algorithm Lexical Match

1. •

Structure Match

2. • •

Similarity(reified nodes) = Average(neighbors) Back-propagate to neighbors

Align Super-classes

3. • •

Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus

Super-class similarity = average similarity of children, grandchildren, great-grandchildren Adds 213 matches (to 3567)

Sept. 15, 2003

© 2003 Microsoft Corporation

27

Some Lessons 

A common encoding of models is hard and involves compromises  



Match needs to invent generalizations  



Different styles of reifying relationships CRM stores transitive relationships In FMA, arterial supply, venous drainage, nerve supply, lymphatic drainage In CRM, these all map to isServedBy

On big models, Match is expensive

Some steps required days to execute  Cross-product filled 80 GB (< 1GB input). Sept. 15, 2003 © 2003 Microsoft Corporation 

28

Outline  Introduction to Model Management  Using MM to solve meta data problems  Matching anatomy ontologies  Model merging  Wrap-up

Sept. 15, 2003

© 2003 Microsoft Corporation

29

Merge(M1, M2, map) Return the union of models M1 and M2



 

Use map to guide the Merge If elements x = y in map, then collapse them into one element



Emp

map

Emp

AddrName

=

NamePhone

Sept. 15, 2003

© 2003 Microsoft Corporation

Emp Addr NamePhone

30

Merge(M1, M2, map)  [Buneman,   

Meta-model has aggregation & generalization only Union, and collapse objects having the same name Fix-up step for inconsistencies created by merging X a Y

  

Davidson, Kosky, EDBT 92]

a

a

X a

Y

Z

W

X a Z

X

Y

Z

Successive fixups lead to different results  Batch them at the end, to get a unique minimal result Now enrich the meta-model (containment, complex mappings, …) & merge semantics (conflicts, deletes) Sept. 15, 2003

© 2003 Microsoft Corporation

31

Resolving Merge Conflicts

Meta Meta Model Conflict Model Conflict

1

EmployeeID

Name

2

FirstName

3

6

Sept. 15, 2003

Employee

Emp#

5

Meta Model Conflict

mapee

Emp

7

4

LastName

Emp Emp# Name FirstNameLastName © 2003 Microsoft Corporation

8 9 10 11 32

Contributions to Merge

[Pottinger & Bernstein, VLDB 03]      

Generic correctness criteria for Merge Use of first-class input mapping (not just correspondences) Taxonomy of conflicts & resolution strategies Characterize when Merge can be automatic A merge algorithm for an EER representation Experimental evaluation

Sept. 15, 2003

© 2003 Microsoft Corporation

33

An Approach to ModelGen [Atzeni & Torlone, EDBT '96] 

Meta-models are made of patterns Object has sub-object (a)



Aggregation has key (c)

Define pattern transformations as rules 



Aggregation has attributes (b)

For XSD→SQL,

+

To translate Ms into meta-modelt (MMt), 

Apply rules that replace patterns in MS that are not in MMt by patterns that are in MMt

Sept. 15, 2003

© 2003 Microsoft Corporation

34

ModelGen Research    

More complete repertoire of patterns Make patterns more generic Integrate with rules engine (avoid cycles, control search) Implement it

Sept. 15, 2003

© 2003 Microsoft Corporation

35

What Next?       

Add semantics to mappings Thorough formal semantics of operators Industrial strength schema matching More and bigger applications More prototypes More operators Better user interfaces

Sept. 15, 2003

© 2003 Microsoft Corporation

36

References  

http://www.research.microsoft.com/~philbe Overview  



Implementation 





Survey: Rahm & Bernstein , VLDB J., Dec. 2001 Prototype: Madhavan, Bernstein, & Rahm, VLDB 2001

Merge Operation 



Bernstein & Rahm, ER 2000

Match Operation 



Melnik, Rahm, & Bernstein, SIGMOD 2003

Data Warehouse Examples 



Bernstein, CIDR 2003 Bernstein, Halevy, & Pottinger, SIGMOD Record, Dec. 2000

Pottinger & Bernstein, VLDB 200337

Theory  

Alagić & Bernstein, DBPL 2001 Madhavan et al, AAAI 2002

Sept. 15, 2003

© 2003 Microsoft Corporation

37

Sept. 15, 2003

© 2003 Microsoft Corporation

38

Related Documents


More Documents from ""