Qd Dl

  • Uploaded by: api-19981450
  • 0
  • 0
  • July 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Qd Dl as PDF for free.

More details

  • Words: 2,349
  • Pages: 37
L05: Distributed Query Processing & Optimization  Query

Processing  Query Decomposition  Data Localization  Query Optimization

H.Lu/HKUST

Query Processing    



Any high-level query (SQL) on a database must be processed, optimized and executed by the DBMS The high-level query is scanned, and parsed to check for syntactic correctness An internal representation of a query is created, which is either a query tree or a query graph The DBMS then devises an execution strategy for retrieving the result of the query. (An execution strategy is a plan for executing the query by accessing the data, and storing the intermediate results) The process of choosing one out of the many execution strategies is known as query optimization

H.Lu/HKUST

L05: Distributed Query Processing -- 2

Query Processor A query processor is a module in the DBMS that performs the tasks to process, to optimize, and to generate execution strategy for a high-level query  For a DDBMS, the QP also does data localization for the query based on the fragmentation scheme and generates the execution strategy that incorporates the communication operations involved in processing the query 

H.Lu/HKUST

L05: Distributed Query Processing -- 3

Query Optimizer Queries expressed in SQL can have multiple equivalent relational algebra query expressions  The distributed query optimizer must select the ordering of relational algebra operations, sites to process data, and possibly the way data should be transferred. This makes distributed query processing significantly more difficult 

H.Lu/HKUST

L05: Distributed Query Processing -- 4

Complexity of Relational Algebra Operations 



The relational algebra is used to express the output of the query. The complexity of relational algebra operations play a role in defining some of the principles of query optimization. All complexity measures are based on the cardinality of the relation Operations Complexity  Select, Project (w/o duplicate elimination) O(n)  Project (with duplicate elimination), Group O(n logn)  Join, Semi-join, Division, Set Operators O(n logn)  Cartesian Product O(n2 )  This was given in the book (p194). It is over simplified.

H.Lu/HKUST

L05: Distributed Query Processing -- 5

Characteristics of Query Processors 



Languages  Input language can be relational algebra or calculus; output language is relational algebra (annotated with communication primitives). The query processor must efficiently map input language to output language Types of Optimization  The output language specification represents the execution strategy. There can be many such strategies, the best one can be selected through exhaustive search, or by applying heuristic (minimize size of intermediate relations). For distributed databases semi-joins can be applied to reduce data transfer.

H.Lu/HKUST

L05: Distributed Query Processing -- 6

When to Optimize 





Static: done before executing the query (at compilation time), cost of optimization amortized over multiple executions, mostly based on exhaustive search. Since sizes of intermediate relations need to be estimated, it can result in sub-optimal strategies. Dynamic: done at run time; every time the query is executed, can make use of exact sizes of intermediate relations, expensive, based on heuristics Hybrid: mixes static and dynamic approaches; the approach is mainly static, but dynamic query optimization may take place when high difference between predicted and actual sizes are detected

H.Lu/HKUST

L05: Distributed Query Processing -- 7

Characteristics of Query Processors 





Statistics  fragment cardinality and size  size and number of distinct values for each attribute. detailed histograms of attribute values for better selectivity estimation. Decision Sites  one site or several sites participate in selection of strategy Exploitation of network topology  wide area network - communication cost  local area network - parallel execution

H.Lu/HKUST

L05: Distributed Query Processing -- 8

Characteristics of Query Processors 

Exploitation of replicated fragments  larger number of possible strategies  Use of Semijoins  reduce size of data transfer  increase # of messages and local processing  good for fast or slow networks?

H.Lu/HKUST

L05: Distributed Query Processing -- 9

Layers of Query Processing Calculus Query on Distributed Relations QUERY DECOMPOSITION

CONTROL SITE

GLOBAL SCHEMA

Algebra Query on Distributed Relations DATA LOCALIZATION

FRAGMENT SCHEMA

Fragment Query GLOBAL OPTIMIZATION

LOCAL SITE H.Lu/HKUST

STATISTICS ON FRAGMENTS

Optimized Fragment Query With Communication Operations LOCAL OPTIMIZATION

LOCAL SCHEMA

Optimized Local Queries L05: Distributed Query Processing

-- 10

L05: Distributed Query Processing & Optimization Query Processing  Query Decomposition  Data Localization  Query Optimization 

H.Lu/HKUST

Query Decomposition 







Normalization  The calculus query is written in a normalized form (CNF or DNF) for subsequent manipulation Analysis  The query is analyzed for semantic correctness Simplification  Redundant predicates are eliminated to obtain simplified queries Restructuring  The calculus query is translated to optimal algebraic query representation

H.Lu/HKUST

L05: Distributed Query Processing -- 12

Query Decomposition: Normalization 



Lexical and syntactic analysis  check validity  check for attributes and relations  type checking on the qualification There are two possible forms of representing the predicates in query qualification: Conjunctive Normal Form (CNF) or Disjunctive Normal Form (DNF)  CNF: (p11 ∨p12 ∨... ∨p1n) ∧... ∧(pm1 ∨pm2 ∨... ∨pmn)  DNF: (p11 ∧p12 ∧... ∧p1n) ∨... ∨(pm1 ∧pm2 ∧... ∧pmn)  OR's mapped into union  AND's mapped into join or selection

H.Lu/HKUST

L05: Distributed Query Processing -- 13

Query Decomposition: Analysis 





Queries are rejected because  the attributes or relations are not defined in the global schema; or  operations used in qualifiers are semantically incorrect For only those queries that do not use disjunction or negation semantic correctness can be determined by using query graph One node of the query graph represents result sites, others operand relations, edge between nodes operand nodes represent joins, and edge between operand node and result node represents project

H.Lu/HKUST

L05: Distributed Query Processing -- 14

Query Graph and Join Graph SELECT Ename, Resp FROM E, G, J WHERE E. ENo = G. ENO AND G.JNO = J.JNO AND JNAME = ``CAD'' AND DUR >= 36 AND Title = ``Prog'' JNAME = ``CAD''

PROJ

EMP

PROJ

E. ENo = G. ENO

G.JNO = J.JNO

ASG Result Resp

Ename EMP

Title = ``Prog''

E. ENo = G. ENO G.JNO = J.JNO ASG DUR >= 36 H.Lu/HKUST

L05: Distributed Query Processing -- 15

Disconnected Query Graph 

Semantically incorrect conjunctive multivariable query without negation have query graphs which are not connected

SELECT Ename, Resp FROM E, G, J WHERE E. ENo = G. ENO AND JNAME = ``CAD'' AND DUR >= 36 AND Title = ``Prog''

Result

Ename

Resp

EMP

E. ENo = G. ENO ASG

PROJ

DUR >= 36 H.Lu/HKUST

Title = ``Prog''

JNAME = ``CAD''

L05: Distributed Query Processing -- 16

Simplification: Eliminating Redundancy 

Elimination of redundant predicates using well known idempotency rules: p ∧p = p; p ∨p = p; p ∨true = true; p ∨false = p; p ∧true = p; p ∨false = false; p1 ∧(p1 ∨p 2 ) = p1; p1 ∨(p1 ∧p 2 ) = p1



Such redundant predicates arise when user query is enriched with several predicates to incorporate view­ relation correspondence, and ensure semantic integrity and security

H.Lu/HKUST

L05: Distributed Query Processing -- 17

Eliminating Redundancy-- An Example SELECT TITLE FROM E WHERE (NOT (TITLE = ``Programmer'') AND (TITLE = ``Programmer'' OR TITLE = ``Elec.Engr'') AND NOT (TITLE = ``Elec.Engr'')) OR ENAME = ``J.Doe''; SELECT TITLE FROM E WHERE ENAME = ``J.Doe''; H.Lu/HKUST

L05: Distributed Query Processing -- 18

Eliminating Redundancy-- An Example p1 = <TITLE = ``Programmer''> p2 = <TITLE = ``Elec. Engr''> p3 = <ENAME = ``J.Doe''> Let the query qualification is (¬ p1 ∧(p1 ∨p2) ∧¬ p2) ∨p3 The disjunctive normal form of the query is = (¬ p1 ∧p1 ∧¬p2) ∨(¬ p1 ∨p2 ∧¬ p2) ∨p3 = (false ∧¬ p2) ∨(¬ p1 ∧false) Ú p3 = false ∨false ∨p3 = p3 H.Lu/HKUST

L05: Distributed Query Processing -- 19

Query Decomposition: Rewriting 

Rewriting calculus query in relational algebra;  straightforward transformation from relational calculus to relational algebra, and  restructuring relational algebra expression to improve performance

H.Lu/HKUST

L05: Distributed Query Processing -- 20

Rewriting -- Transformation Rules (I) 





Commutativity of binary operations: R× S ⇔ S× R R S ⇔S R R∪S ⇔ S∪R Associativity of binary operations: (R × S) × T ⇔ R × ( S × T ) (R S) T ⇔ R (S T) Idempotence of unary operations: grouping of projections and selections  Π A’ ( Π A’’ (R )) ⇔ Π A’ (R ) for A’⊆A’’⊆ A σ

H.Lu/HKUST

p1(A1)



p2(A2)

(R )) ⇔ σ

p1(A1) ∧p2(A2)

(R ) L05: Distributed Query Processing -- 21

Rewriting -- Transformation Rules (II) 

Commuting selection with projection Π A1,





…, An



p(Ap)

(R )) ⇔ Π A1,



…, An

p(Ap)

( Π A1,

…, An, Ap

(R )))

Commuting selection with binary operations σ

p(Ai)

(R × S) ⇔ (σ

σ

p(Ai)

(R

σ

p(Ai)

(R ∪ S) ⇔ σ

S) ⇔ (σ

p(Ai) p(Ai) p(Ai)

(R)) × S (R)) (R) ∪ σ

S p(Ai)

(S)

Commuting projection with binary operations Π C(R × S) ⇔ Π A(R) × Π B (S) C = A ∪ B Π C(R

S) ⇔ Π C(R)

Π C (S)

Π C (R ∪ S) ⇔ Π C (R) ∪ Π C (S) H.Lu/HKUST

L05: Distributed Query Processing -- 22

An SQL Query and Its Query Tree Π ENAME σ

(ENAME<>“J.DOE”)

SELECT Ename FROM J, G, E WHERE G.Eno=E.ENo AND G.JNo = J.JNo AND ENAME <> `J.Doe' AND JName = `CAD' AND (Dur=12 or Dur=24) H.Lu/HKUST

∧(JNAME=“CAD/CAM”)

∧(Dur=12

∨Dur=24)

JNO

PROJ

ENO

ASG

EMP

L05: Distributed Query Processing -- 23

Query Decomposition: Rewriting Π ENAME JNO

Π σ

Π

JNO

ENO JNAME=“CAD/CAM”

Π σ

PROJ H.Lu/HKUST

JNO,ENAME

Π

JNO,ENO

Dur=12

∨Dur=24

ASG

σ

ENO,ENAME

ENAME<>“J.DOE”

EMP L05: Distributed Query Processing -- 24

L05: Distributed Query Processing & Optimization Query Processing  Query Decomposition  Data Localization  Query Optimization 

H.Lu/HKUST

Data Localization Localization program  Given an algebraic query on global schema  Determine which fragments are involved  A naïve way to localize a distributed query  Substitute each global relation with its localization program  generic query  Use reduction techniques for efficiency 

H.Lu/HKUST

L05: Distributed Query Processing -- 26

Reduction for HF Remove empty relations generated by contradicting selection on horizontal fragments;  Remove useless relations generated by projections on vertical fragments;  Distribute joins over unions in order to isolate and remove useless joins 

H.Lu/HKUST

L05: Distributed Query Processing -- 27

Data Localization-- An Example EMP is fragmented into EMP1 = σ ENO ≤ “E3” (EMP)

Π ENAME σ σ σ

Dur=12

EMP2 = σ

∨Dur=24

EMP3 = σ

ENAME<>“J.DOE”

“E3”<ENO ENO>“E6”

≤ “E6”

(EMP)

(EMP)

ASG is fragmented into ASG1 = σ ENO ≤ “E3” (ASG)

JNAME=“CAD/CAM”

ASG2 = σ

JNO

ENO>“E3”

(ASG)

ENO

PROJ ASG1 H.Lu/HKUST



∪ ASG2 EMP1

EMP1

EMP1

L05: Distributed Query Processing -- 28

Reduction with Selection EMP is fragmented into EMP1 = σ ENO ≤ “E3” (EMP)

SELECT * FROM EMP WHERE ENO=“E5”

EMP2 = σ EMP3 = σ

≤ “E6”

“E3”<ENO ENO>“E6”

(EMP)

(EMP)

Given Relation R, FR={R1, R2, …, Rn} where Rj =σ pj (R) σ pj (Rj) = ∅ if ∀x ∈ R: ¬(pi(x)∧pj(x)) σ

σ

H.Lu/HKUST

σ



ENO=“E5”

EMP

ENO=“E5”

EMP1

EMP2

EMP3

ENO=“E5”

EMP2

L05: Distributed Query Processing -- 29

Reduction with join SELECT * FROM EMP, ASG WHERE EMP.ENO=ASG.ENO ENO

ASG



∪ ASG1

ASG2 EMP1

EMP1

EMP1

EMP

ASG is fragmented into ASG1 = σ ENO ≤ “E3” (ASG) ASG2 = σ

ENO

ENO>“E3”

H.Lu/HKUST

(ASG)

EMP is fragmented into EMP1 = σ ENO ≤ “E3” (EMP) EMP2 = σ EMP3 = σ

“E3”<ENO ENO>“E6”

≤ “E6”

(EMP)

(EMP)

L05: Distributed Query Processing -- 30

Reduction with Join (I) (R1 ∪ R2) S ⇔ (R1 S) ∪ (R2

ENO



∪ ASG1

ASG2 EMP1

EMP2

S)

EMP3

∪ ENO

ENO

ENO

ENO

ENO

ENO

EMP1 ASG1 EMP1 ASG2 EMP2 ASG1 EMP2 ASG2 EMP3 ASG1 EMP3 ASG2

H.Lu/HKUST

L05: Distributed Query Processing -- 31

Reduction with Join (II) ∪ ENO

EMP1 ASG1

ENO

EMP2 ASG2

ENO

EMP3 ASG2

Given Ri =σ pi (R) and Rj =σ pj (R) Ri

Rj = ∅ if ∀x ∈ Ri , ∀y∈ Rj: ¬(pi(x)∧pj(y)) Reduction with join 1. Distribute join over union 2. Eliminate unnecessary work

H.Lu/HKUST

L05: Distributed Query Processing -- 32

Reduction for VF 

Find useless intermediate relations Relation R defined over attributes A = {A1, A2, …, An} vertically fragmented as Ri =Π A’ (R) where A’⊆ A Π K,D (Ri) is useless if the set of projection attributes D is not in A’

EMP1= Π ENO,ENAME EMP2= Π ENO,TITLE SELECT ENAME FROM EMP H.Lu/HKUST

(EMP) (EMP)

Π ENAME

Π ENAME

ENO

EMP1

EMP2

EMP1

L05: Distributed Query Processing -- 33

Reduction for DHF Distribute joins over union Apply the join reduction for horizontal fragmentation EMP1: σ

TITLE=“Programmer”

(EMP)

EMP2: σ

TITLE ≠ “Programmer”

(EMP)

ASG1: ASG ASG2: ASG

ENO

EMP1

ENO

EMP2

SELECT * ASG1 FROM EMP, ASG WHERE ASG.ENO = EMP.ENO AND EMP.TITLE = “Mech. Eng.” H.Lu/HKUST

ENO

σ

TITLE=“MECH.Eng.”



∪ ASG2 EMP1

EMP2

L05: Distributed Query Processing -- 34

Reduction for DHF (II) Selection first

ENO

σ

Joins over union TITLE=“Mech. Eng.”

∪ ASG1

ASG2



EMP2

ENO

σ

ENO

σ

TITLE=“Mech. Eng.”

ASG2 ASG1 H.Lu/HKUST

EMP2

ASG1

ENO

σ

TITLE=“Mech. Eng.”

EMP2

ASG2 ASG1

TITLE=“Mech. Eng.”

EMP2

L05: Distributed Query Processing -- 35

Reduction for HF --An Example EMP1 = σ

ENO ≤ “E4”

(Π ENO,ENAME

(EMP))

EMP2 = σ

ENO>“E4”

(Π ENO,ENAME

(EMP))

EMP3 = Π ENO,TITLE

Π ENAME

(EMP)

σ

Π ENAME

QUERY SELECT ENAME FROM EMP WHERE ENO = “E5”

σ

ENO=“E5”

EMP2 ENO=“E5”

ENO

∪ EMP1 ASG1 H.Lu/HKUST

EMP2

EMP3

L05: Distributed Query Processing -- 36

Summary Query decomposition is done as in a centralized DBMS  A lot of logic rules about transformations  Data localization happens only in a distributed DBMS  Use fragmentation characteristics to simplify the queries on fragments 

H.Lu/HKUST

L05: Distributed Query Processing -- 37

Related Documents

Qd Dl
July 2020 7
Dl
June 2020 53
Dl
June 2020 56
Qd Doanctac
November 2019 12
Qd-416
October 2019 14
Dl-101c / Dl-102c
May 2020 62