SQL TUNING
TABLE OF CONTENTS PURPOSE OF THE DOCUMENT...................................................................... .........................3 DOCUMENT USAGE............................................................................................................ ........3 DETAILS OF DOCUMENT............................................................................................ ..............3 SQL TUNING OVERVIEW....................................................................................... ...................3 1. WHY IS SQL TUNING IMPORTANT?....................................................................................................3 2. SOME THINGS TO KNOW ABOUT SQL TUNING...................................................................................... 3 3. GENERAL TECHNIQUES FOR IMPROVING SQL PERFORMANCE................................................................4 4. SQL PROCESSING...........................................................................................................................5 SQL STANDARDS..................................................................................................... ....................6 THE INAPPROPRIATE WAY OF WRITING THE QUERY.....................................................................................6 THE APPROPRIATE WAY OF WRITING THE QUERY ...................................................................................... 6 1. MAKE THE MOST OFTEN USED COLUMN IN A WHERE CLAUSE AS THE LEADING COLUMN IN A CONCATENATED INDEX................................................................................ 8 2. AVOID USING UPPER OR LOWER FUNCTIONS ON COLUMNS WHICH ARE INDEXED.......................................................................................................................................9 3. AVOID USING SUBSTR FUNCTION ON INDEXED COLUMNS........................................9 4. USE EXISTS IN PLACE OF DISTINCT................................................................................ 10 5. WHICH IS BETTER - UNION ALL OR UNION?..................................................................11 6. IS HAVING CLAUSE COSTLY?............................................................................................12 7. USE CLUSTERS FOR TABLES WHICH ARE OFTEN USED TOGETHER IN A JOIN CONDITION................................................................................................................................13 8. USE DUMMY WHERE CLAUSES TO FORCE USE OF INDEXES ON CHARACTER COLUMNS...................................................................................................................................14 9. USE FUNCTION BASED INDEXES ON COLUMNS WHICH ARE OFTEN USED WITH A FUNCTION. ............................................................................................................................15 10. AVOID USING FUNCTIONS LIKE TRUNC ON INDEXED COLUMNS......................... 16 11. CONSIDER THE USE OF BIND VARIABLES ...................................................................17 12. CREATE INDEXES ON COLUMNS, WHICH ARE OFTEN USED FOR SORTING PURPOSES...................................................................................................................................17 13. USE EXISTS IN PLACE OF JOINS WHERE THE NUMBER OF SUCCESSFUL ROWS RETURNED FROM THE DRIVING TABLE IS SMALL..........................................................18 14. USE EITHER ROWID OR AN INDEXED COLUMN IN COUNT FUNCTION................ 19 15. USE DECODE WHEREVER POSSIBLE.............................................................................20 16. USE PARTITIONS FOR LARGE TABLES FOR FASTER RETRIEVAL OF DATA..........21 17. USE INDEX-ORGANIZED TABLES FOR QUERIES INVOLVING EXACT MATCH AND RANGE SEARCHES ESPECIALLY ON THE PRIMARY KEY. .................................... 22 ADDENDUM........................................................................................................... .....................23 1. HASH JOINS:................................................................................................................................23 2. CLUSTERS:...................................................................................................................................23 3. FUNCTION BASED INDEXES:............................................................................................................24 4. PARTITIONED TABLES:....................................................................................................................25 5. INDEX-ORGANIZED TABLES:...........................................................................................................25
Purpose of the document The document details the most cost effective way of writing SQL statements. Poorly written SQL statements account for 80% of performance problems. Using the guidelines detailed in this document the cost of SQL statements can be reduced drastically.
Document Usage The target audience of this document are : •
SQL Programmers/Developers
•
DBA’s
•
Technical Leads
•
Testing/Quality Assurance
Details of Document SQL Tuning Overview 1. Why is SQL tuning important? • • •
First in the series of steps undertaken for tuning the database. One of the least expensive steps. Generally highest possible performance gain
2. Some things to know about SQL tuning • The DBA & Developers should work together when tuning statements. During design and development, the application Developers can determine which combination of system resources and Oracle features best meet the performance goals and requirements as established by the business rules. The best possible execution time occurs with the usage of least amount of resources, which include I/O (logical and physical) to and from the database, CPU usage. • There are many tools that can assist in the tuning endeavor such as TKPROF, the SQL trace facility, SQL Analyze, Oracle Trace, and the Enterprise Manager Tuning Pack. But tuning SQL statements remains a trial and error process. • SQL tuning should be done prior to looking at database tuning effort. Until it is certain that all the applicable SQL statements are tuned, the system parameters should not be modified to adjust for poorly written SQL. • SQL tuning involves the least expense compared to hardware changes to increase the memory capacity of the systems.
• For SQL tuning to be accomplished, it is necessary to know what the current execution time is and what resources are needed to successfully execute the statement there by improving its performance. • It is very essential to write optimal queries so that the CPU resources can be equally shared among the various users connected to the systems in a fair manner. • Oracle execution path may not be the source of the performance problem; it is very likely the wrong approach to the problem has been followed. This may not even be the best path to follow. It is helpful if alternative ways of accomplishing the same tasks are looked into. High performance improvements may be achieved by making a simple statement work from a different angle. • The initial focus should be on the most offending SQL, as they will yield the best immediate returns.
3. General Techniques For Improving SQL Performance
Views • Optimizing the performance of views involves optimizing the SQL statement upon which the view is based and also the SQL that is likely to result when selection criteria are pushed up into the view. • Creating views containing hints can be a useful technique for optimizing queries and SQL generated by query tools. • Partition views can be created which reduces the overhead of scanning a substantive range of large tables. • Snapshots can be used to store the results of complex queries and allow the rapid retrieval of results, which may somewhat be out of range. • Performance of a simple snapshot refresh is improved by the creation of a snapshot log, providing that only a small proportion of the source table has been changed. • Snapshots can reduce the overhead of SQL dramatically for static tables. • Oracle sequences are an efficient mechanism of generating primary key values and should be used in preference to sequence tables or other mechanisms. • The DECODE operator can be used to perform complex aggregations, which might otherwise need to be performed via multiple queries.
Indexes
• Indexes on small tables (less than 100 rows) are totally useless, unless of course they are here to implement unique constraints (primary or unique keys). • Indexes for which there are less than 8 different key values should come under very close scrutiny (this of course refers to 'traditional' tree-type indexes, not bitmap ones). Here, the distribution of values is of course extremely important: If the values are roughly equally distributed, the index can be scrapped. If, however, searches are for keys, which belong to a small minority, then the index can be kept. It can make sense to have an index on a personnel table on the 'SEX' column if you are frequently searching for females in a predominantly male environment or vice versa. • Concatenated indexes should be concatenated from the most selective to the least selective column. • Indexes on single columns, which also appear as the first column in a concatenated index, are redundant. Database engines can use the concatenated index only when the first columns in the index appear in the search condition.
4. SQL Processing Here's an overview about the SQL processing which will help appreciate the need for the better-written SQL statements. SQL Processing Phases 1. 2. 3. 4. 5. 6.
Open Parse Bind Execute Fetch (Select statements) Close
The PARSE phase • • • • • •
Checks to see if the statement already exists in the shared pool Checks to see if the statement is valid SQL language Oracle will then look to see if the referenced objects exist. Oracle will see if the parsing user has permission on all underlying objects. Explodes any queries on views and attempts to simplify the statement by using a join. Picks an execution plan and stores it with the cursor
The BIND phase • Searches statement for bind variables • Assigns a value to the parsed statement? The execute phase • Oracle will then run the execution plan against the block buffers. • Oracle performs all the required input and output calls to gather the data for the DMLstatement. This phase may require Oracle to perform sorting operations. The FETCH phase
• • •
Retrieve all the rows for the select statement Possibly sort the retrieved records All returned rows are retrieved using an array fetch
SQL Standards It is essential to standardize the way the SQL statements are written. This not only ensures that identical statements performing the same tasks are shared but also provides the ease of reading the statements. If the list of employees whose salary is greater than the average salary of their respective department is to be found: -
The inappropriate way of writing the query select ename,job,sal,comm,hiredate,deptno from emp e1 where sal > (select avg(sal) from emp e2 where e2.deptno=e1.deptno);
The appropriate way of writing the query SELECT ename,job,sal,comm,hiredate,deptno FROM emp E1 WHERE sal > (SELECT AVG(sal) FROM emp E2 WHERE E2.deptno=E1.deptno); There is also a technical reason to use a set style throughout the organization. If the statement it is trying to parse, does not match a statement in the shared pool precisely, character by character, and case for case, Oracle will not re-use the parsed statement and instead will opt to create a new cursor. This means your statement will have to undergo the Oracle parse, bind, execute, and fetch phases again. Also the extra resources in the library cache will be added to maintain this copy of a statement. You may actually age out a statement frequently used if the shared pool is too small.
In the following section we would be looking into various situations. Under each scenario the appropriate way to write the query and an inappropriate way to write the query is discussed. It should be noted that both the compared queries return the same results, the only difference being the efficiency. The whole section deals with two tables, namely EMP and DEPT. Their structures are given below: EMP table EMPNO NOT NULL NUMBER(4) ENAME VARCHAR2(10) JOB VARCHAR2(9) MGR NUMBER(4) HIREDATE DATE SAL NUMBER(7,2) COMM NUMBER(7,2) DEPTNO NOT NULL NUMBER(2) DEPT Table DEPTNO DNAME LOC
NOT NULL NUMBER(2) VARCHAR2(14) VARCHAR2(13)
DEPTNO from the EMP table references the DEPT table. EMPNO column is the primary key in the EMP table and DEPTNO is the primary key in the DEPT table. EMP table has 9.5 lac rows. DEPT table has 4 rows. It is assumed that none of the columns have indexes unless specified. The cost depicted in the results may vary according to the size of the table and the number of distinct values in the columns. Also note that the tables used here are created in a test environment and more real time examples will be added later as and when they come up considering the fact that this is the first iteration of the document. It is also worth mentioning that Indexes should be made use of on column, only after carefully considering the frequency and type of activities on the column in question. The COST, OPERATIONS, OBJECT_NAME, OPTIONS columns are selected from the table, PLAN_TABLE, which is created by running the script utlxplan.sql (contact your DBA for creating this table).
GUIDELINE 1:
1. MAKE THE MOST OFTEN USED COLUMN IN A WHERE CLAUSE AS THE LEADING COLUMN IN A CONCATENATED INDEX. A composite index exists on the empno and deptno columns of the emp table(ie EMPNO_DEPTNO_INDX in which empno is the leading column). All employees belonging to department 10 have to be selected. Query 1 SELECT * FROM emp WHERE deptno = 10; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS -----------------------------FULL
OBJECT_NAME COST ------------------------------ ---------822 EMP 822
Query 2 SELECT * FROM emp WHERE empno > 0 AND deptno = 10; OPERATION -------------------------------------SELECT STATEMENT TABLE ACCESS INDEX
OPTIONS OBJECT_NAME COST ---------------------------------------------------------------BY INDEX ROWID RANGE SCAN
293 EMP 293 EMPNO_DEPTNO_INDX 26
Inference There is a huge cost difference between the above two queries, almost a 62% reduction in the cost when the second approach is used. Thus, the leading column should be the most selective column, and it should also be the column most often used by limiting conditions in queries. So it is advisable to include a dummy WHERE clause for such queries as shown in the second query.
GUIDELINE 2:
2. AVOID USING UPPER OR LOWER FUNCTIONS ON COLUMNS WHICH ARE INDEXED. A non unique index exists on the ename column of the emp table (ie ENAME_INDX). All employees having the name 'KING' have to be selected. Query 1 SELECT ename FROM emp WHERE UPPER (ename) = 'KING'; or SELECT ename FROM emp WHERE LOWER(ename) = 'king'; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS OBJECT_NAME ------------------------------ -----------------------------FULL
EMP
COST ---------822 822
Query 2 SELECT ename FROM emp WHERE ename='KING'; OPERATION -----------------------------SELECT STATEMENT INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------2 RANGE SCAN ENAME_INDX 2
Inference The cost is reduced by an incredible 99% when the use of functions in the WHERE clause is eliminated. Thus, even though the leading column of the index forms a part of the WHERE clause, the available index will not be used. This is because, an SQL function is applied to that column. In such cases, the Oracle Optimizer will not make use of the available index and will opt for a FULL scan on the table. The optimizer has to apply the function to the value in each row which satisfy the condition specified which increases the cost incurred.
GUIDELINE 3:
3. AVOID USING SUBSTR FUNCTION ON INDEXED COLUMNS.
A non unique index exists on the ename column of the emp table(ie ENAME_INDX). All employees with their names having the first two letters as 'KI' have to be selected. Query 1 SELECT ename FROM emp WHERE SUBSTR (ename,1,2) = 'KI'; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------822 FULL EMP 822
Query 2 SELECT ename FROM emp WHERE ename LIKE 'KI%'; OPERATION -----------------------------SELECT STATEMENT INDEX
OPTIONS -------------------RANGE SCAN
OBJECT_NAME COST -------------------------------148 ENAME_INDX 148
Inference Here again, the cost is reduced by more than 82% by avoiding the use of SUBSTR function in the WHERE clause. As explained above, the Optimizer will not make use of the index even though it is available, since there is a function on the indexed column in the WHERE condition.
GUIDELINE 4:
4. USE EXISTS IN PLACE OF DISTINCT. The DISTINCT combination of deptno and dname columns have to be displayed. Only those rows need be displayed, whose deptno exists in the EMP table. Query 1 SELECT DISTINCT E.deptno,D.dname FROM emp E,dept D WHERE D.deptno = E.deptno;
OPERATION OPTIONS ------------------------------ -----------------------------SELECT STATEMENT SORT UNIQUE HASH JOIN TABLE ACCESS FULL TABLE ACCESS FULL
OBJECT_NAME COST ------------------------------ ---------14659 14659 1000 DEPT 1 EMP 822
Query 2 SELECT D.deptno,D.dname FROM dept D WHERE EXISTS ( SELECT 'X' FROM emp E WHERE E.deptno = D.deptno ); OPERATION -----------------------------SELECT STATEMENT FILTER TABLE ACCESS TABLE ACCESS
OPTIONS OBJECT_NAME ------------------------------ -----------------------------FULL FULL
DEPT EMP1
COST ---------1 1 822
Inference There is a reduction of more than 97% in estimated cost.In the query using the DISTINCT clause, the explain plan shows that there is a HASH join between the two tables. In this, the smaller of the two tables (in this case the DEPT table) is converted into a hash table and stored in memory. For every row of the larger table (EMP table) retrieved, the hash table is scanned to check for the join condition. Click here for a brief explanation on hash joins. The query using EXISTS uses a FILTER operation. The FILTER operation accepts a set of rows, eliminates some of them, and returns the rest. In the above case, the rows from the DEPT and EMP table are filtered out based on the join condition. EXISTS is a faster and a cheaper alternative because the optimizer realizes that when the sub-query has been satisfied once, there is no need to proceed any further, and the next driving row can be fetched. In the example shown only one row needs to be returned from dept even though many rows in emp might match the subquery. GUIDELINE 5:
5. WHICH IS BETTER - UNION ALL OR UNION? Two tables EMP1 and DEPT having different data have to be displayed in a combined fashion. Query 1
SELECT empno FROM emp1 UNION SELECT depno FROM dept; OPERATION -----------------------------SELECT STATEMENT SORT UNION-ALL INDEX TABLE ACCESS
OPTIONS OBJECT_NAME ------------------------------ -----------------------------UNIQUE FAST FULL SCAN FULL
EMP1_EMPNO_IDX DEPT
COST ---------7378 7378 291 1
Query 2 SELECT empno FROM emp1 UNION ALL SELECT deptno FROM dept; OPERATION -----------------------------SELECT STATEMENT UNION-ALL INDEX TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------292 FAST FULL SCAN FULL
EMP1_EMPNO_IDX DEPT
291 1
Inference The estimated cost of processing the query is reduced by 96% when UNION ALL is used instead of UNION. UNION ALL simply returns all rows including duplicates and does not have to perform any sort, merge or filter Operations (costly operations performed by UNION, which are unnecessary in the above case especially since the unique columns are being selected).
GUIDELINE 6:
6. IS HAVING CLAUSE COSTLY? The maximum SALARY in each JOB type has to be displayed. JOB types of 'CLERK' and 'SALESMAN' are NOT to be considered. Query 1 SELECT JOB,MAX(sal) FROM emp
GROUP BY job HAVING job NOT IN ('CLERK','SALESMAN'); OPERATION -----------------------------SELECT STATEMENT FILTER SORT TABLE ACCESS
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------11443 GROUP BY FULL
EMP
11443 822
Query 2 SELECT JOB,MAX(sal) FROM emp WHERE job NOT IN ('CLERK','SALESMAN') GROUP BY job; OPERATION -----------------------------SELECT STATEMENT SORT TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------7623 GROUP BY 7623 FULL EMP 822
Inference The estimated cost of processing the second query is less than that of the first one, by 33%. In the first case, the GROUP operation is performed on the specified columns and the required rows are FILTERED out based on the condition specified (eliminating those rows where JOB is CLERK or SALESMAN). Since the filtering is performed after the group operation, the cost increases. In the second case, the required rows are first selected by a FULL table scan on the table EMP1. These rows are then grouped according to the specification. Since, the HAVING clause filters selected records only after all rows have been fetched. Thus, it is advisable to use the WHERE clause in combination with a GROUP BY clause to reduce these overheads.
GUIDELINE 7:
7. USE CLUSTERS FOR TABLES WHICH ARE OFTEN USED TOGETHER IN A JOIN CONDITION. Select the EMPNO, ENAME, DNAME from the EMP and DEPT tables. This example demonstrates the efficiency of a CLUSTER in case of joined table. Query 1 (no cluster exists)
SELECT empno, ename, dname FROM emp, dept WHERE emp.deptno=dept.deptno; OPERATION -----------------------------SELECT STATEMENT HASH JOIN TABLE ACCESS TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------1100 1100 FULL DEPT 1 FULL EMP 822
Query 2 (a cluster exists with deptno as the cluster key) SELECT empno,ename,dname FROM emp,dept WHERE emp.deptno=dept.deptno; OPERATION -----------------------------SELECT STATEMENT NESTED LOOPS TABLE ACCESS TABLE ACCESS
OPTIONS OBJECT_NAME COST ------------------------------ --------------------------------------879 879 FULL DEPT 879 CLUSTER EMP
Inference The above execution plans for the join query shows a reduction of cost by 33% in favour of clustered tables. Hence it is advisable to use clusters for tables, which are predominantly used together using a join condition. Click here for limitations and capabilities of clusters. GUIDELINE 8:
8. USE DUMMY WHERE CLAUSES TO FORCE USE OF INDEXES ON CHARACTER COLUMNS. Generating an ordered output of the ENAME column (a varchar column). The ENAME column has a non-unique index (ENAME_INDX). Query 1 SELECT ename FROM emp ORDER BY ename; OPERATION -----------------------------SELECT STATEMENT SORT
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------8915 ORDER BY 8915
TABLE ACCESS
FULL
EMP
822
Query 2 SELECT ename FROM emp WHERE ename > TO_CHAR(0); OPERATION -----------------------------SELECT STATEMENT INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------4 FAST FULL SCAN ENAME_INDX 4
Inference The cost is reduced by an incredible 99% when the second query is used.It is clear from the above example that the Oracle Optimizer ignores indexes created on character columns. Hence it is necessary to explicitly specify the usage of indexes to reduce the I/O costs. Use of dummy WHERE clause as shown in the second query is effective in such cases. NOTE: - This property is true for null columns (even null number columns). However, this is not true for non-null number columns. If an index is available in a number column oracle uses it if necessary.
GUIDELINE 9:
9. USE FUNCTION BASED INDEXES ON COLUMNS WHICH ARE OFTEN USED WITH A FUNCTION. Details of all those employees whose SALARY*COMMISION is greater than 5000, is to be displayed. Query 1 (No Function based index [FBI] created) SELECT * FROM emp WHERE (sal * NVL(comm,0)) > 5000; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ --------822 FULL EMP 822
Query 2 (A FBI has been created)
SELECT * FROM emp WHERE (sal * NVL(comm,0)) > 5000; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------10 BY INDEX ROWID EMP 10 RANGE SCAN SAL_COMM_FINX 2
Inference With a FBI, the processing cost is reduced by 98%.Avoid doing calculations on indexed columns. When the optimizer encounters a calculation on an indexed column, it will not use the index and will perform a full table scan. If a need arises to perform such calculations and specify them in the WHERE clause, create a FBI. Click here for capabilities of FBI’s.
GUIDELINE 10:
10. AVOID USING FUNCTIONS LIKE TRUNC ON INDEXED COLUMNS. The detailed list of all those employees who have been hired today has to be displayed. An index exists on the HIREDATE column of the emp table. Query 1 SELECT * FROM emp WHERE TRUNC(hiredate) = TRUNC(sysdate); OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------822 FULL EMP 822
Query 2 SELECT * FROM emp WHERE hiredate BETWEEN TRUNC(sysdate) AND TRUNC(sysdate)+.99999; [ie hiredate between 12:00:00 AM to 11:59.59 PM ] OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------6 BY INDEX ROWID EMP 6 RANGE SCAN EMP_HDT_IDX 2
Inference The cost in the second case is reduced by 99% than in the first case.As said before using of functions disables the use of indexes. Hence the above query shows a way to over come such an problem. GUIDELINE 11:
11. CONSIDER THE USE OF BIND VARIABLES It is always a good practice to use bind variables. It is also advisable to have consistency in the manner (i.e. consistent use of upper and lower case of alphabets in the query, uniform use of spaces etc.) the statements are typed. These practices facilitate oracle in saving up time and resources spent on processing the same statements. Example1: Select * from emp; Select * from emp; SELECT * FROM EMP; For the above queries, which perform the same task, parsing will be done three times because Oracle considers them as three separate queries. Therefore the given queries should be written in a consistent manner as shown below so that next time an identical query is issued Oracle uses the already parsed statement. SELECT * FROM emp; Example2: SELECT * FROM emp WHERE empno=7839; SELECT * FROM emp WHERE empno=7844; For the above queries bind variables should be used so as to make Oracle reuse the same statement. SELECT * FROM emp WHERE empno=:bind_var;
GUIDELINE 12:
12. CREATE INDEXES ON COLUMNS, WHICH ARE OFTEN USED FOR SORTING PURPOSES.
Use of Indexes should be followed while writing queries for sorting the data. Consider the following situation where in the first query there is no index on EMPNO and the query sorts the rows on the EMPNO. Query 1 (without index on EMPNO) SELECT empno FROM emp ORDER BY empno DESC; OPERATION -----------------------------SELECT STATEMENT SORT TABLE ACCESS
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------7906 ORDER BY 7906 FULL EMP 822
Query 2 (with index on EMPNO) SELECT empno FROM emp ORDER BY empno DESC;
OPERATION -----------------------------SELECT STATEMENT INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ----------------------------- ---------26 FULL SCAN EMP _EMPNO_IDX 26
Inference In the above example, Oracle does a full table scan for the table, as it does not find any index for the EMPNO column. The cost incurred here is whopping. After the index is created, Oracle does a full scan of the index and the cost incurred is reduced by more than 99%. GUIDELINE 13:
13. USE EXISTS IN PLACE OF JOINS WHERE THE NUMBER OF SUCCESSFUL ROWS RETURNED FROM THE DRIVING TABLE IS SMALL. All employees who have a valid department ID are to be displayed (i.e. their corresponding DEPTNO must exist in the dept table). Here a primary key index is present in the DEPTNO column of the DEPT table. NOTE: - The situation to run this query may occur if the relational constraints are not in place. Query 1
SELECT E.empno,E.ename,D.deptno FROM emp E,dept D WHERE E.deptno = D.deptno; OPERATION -----------------------------SELECT STATEMENT NESTED LOOPS TABLE ACCESS INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------822 822 FULL EMP 822 UNIQUE SCAN DEPT_PRIMARY_KEY
Query 2 SELECT E.empno,E.ename,E.deptno FROM emp E WHERE EXISTS (SELECT deptno FROM dept D WHERE D.deptno = E.deptno); OPERATION -----------------------------SELECT STATEMENT FILTER TABLE ACCESS INDEX
OPTIONS OBJECT_NAME COST ----------------------------------------------------------- ---------822 FULL UNIQUE SCAN
EMP 822 DEPT_PRIMARY_KEY 1
Inference The cost of processing the query is reduced by almost 33% when EXISTS is used in the query instead of a join condition. Consider using EXISTS instead of joining the tables if the percentage of successful rows returned from the driving table (i.e. the number of rows that need to be validated against the subquery) is small. In such a case, a table join will be inefficient. GUIDELINE 14:
14. USE EITHER ROWID OR AN INDEXED COLUMN IN COUNT FUNCTION. The number of employees in the EMP table is to be found. Query 1 SELECT COUNT(job) FROM emp; OPERATION ------------------------------
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ----------
SELECT STATEMENT SORT TABLE ACCESS
822 AGGREGATE FULL
EMP
822
Query 2 SELECT COUNT(ROWID) FROM emp; or SELECT COUNT(empno) FROM emp; OPERATION -----------------------------SELECT STATEMENT SORT INDEX
OPTIONS -----------------------------AGGREGATE FAST FULL SCAN
OBJECT_NAME COST ------------------------------ ---------4 EMP_EMPNO_IDX
4
Inference The estimated processing costs are reduced by more than 99% in the second case.While using the COUNT function in the select statement make sure that you use either ROWID or the indexed column, using these the performance is increased and the cost incurred is low. This is because if COUNT is made on a non-indexed column, the Oracle optimizer will opt for a FULL table scan rather than use an index existing on some other column.Also if COUNT(*) is used Oracle will first resolve the columns of the table and then decide on the appropriate path for the query. GUIDELINE 15:
15. USE DECODE WHEREVER POSSIBLE. The employee details along with their respective department names are to be displayed. This theory may only be feasible if the second table has a very small number of rows. Query 1 SELECT E.empno,E.ename,D.dname DEPT_NAME FROM emp E,dept D WHERE E.deptno=D.deptno; OPERATION -----------------------------SELECT STATEMENT HASH JOIN TABLE ACCESS TABLE ACCESS
OPTIONS -----------------------------FULL FULL
OBJECT_NAME COST ------------------------------ ---------1100 1100 DEPT 1 EMP 822
Query 2 SELECT E.empno, E.ename, DECODE(E.deptno,10,'ACCOUNTING',20,'RESEARCH',30,'SALES','OPERATIO NS') DEPT_NAME FROM emp E; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------822 FULL EMP 822
Inference The reduction in the estimated cost is almost 50% in the second case.Use DECODE statement to reduce processing and to avoid having to scan the same rows repetitively. This increases the performance and reduces the cost incurred. This method however is feasible only if the second table is small. GUIDELINE 16:
16. USE PARTITIONS FOR LARGE TABLES FOR FASTER RETRIEVAL OF DATA. Select the EMPNO, ENAME, DNAME from the EMP and DEPT tables where the deptno is 10 or 30. This example demonstrates the efficiency of paritioning large tables. Query 1 SELECT E.empno,E.ename,D.deptno,D.dname FROM emp E,dept D WHERE E.deptno=D.deptno AND E.deptno IN (10,30); OPERATION -----------------------------SELECT STATEMENT HASH JOIN TABLE ACCESS TABLE ACCESS
OPTIONS -----------------------------FULL FULL
OBJECT_NAME COST ------------------------------ ---------1047 1047 DEPT 1 EMP 822
Query 2 SELECT E.empno,E.ename,D.deptno,D.dname FROM emp_part E,dept D WHERE E.deptno=D.deptno AND E.deptno IN (10,30);
OPERATION -----------------------------SELECT STATEMENT HASH JOIN TABLE ACCESS PARTITION RANGE TABLE ACCESS
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------851 851 FULL DEPT 1 INLIST FULL EMP_PART 612
Inference The cost comes down by 20% when a table is partitioned.Large tables can be partitioned based either on RANGE or HASH. Range partitioning is useful when the table data can be distributed among many logical ranges based on ranges of a particular column. In the above example, EMP_PART is the partitioned table, which is partitioned by range based on the column DEPTNO into four partitions. The cost is reduced when the partitioned table is used since the rows are retrieved in parallel from the partitions. But such a feature is useful only if tables are very large and their selectivity is done more often on the partitioned column. Also, the number of partitions should not be too high. This performance can be further improved by incorporating partitioned indexes. This use of this feature is very optimal in case of warehouse Databases. Click here for syntax of creation of partitions.
GUIDELINE 17:
17. USE INDEX-ORGANIZED TABLES FOR QUERIES INVOLVING EXACT MATCH AND RANGE SEARCHES ESPECIALLY ON THE PRIMARY KEY. Select the employee details of all employees having EMPNO's less than 1000. Query 1 SELECT empno,ename FROM emp WHERE empno < 1000; OPERATION -----------------------------SELECT STATEMENT TABLE ACCESS
OPTIONS -----------------------------FULL
Query 2 SELECT empno,ename FROM empi WHERE empno < 1000;
OBJECT_NAME COST ------------------------------ ---------822 EMP 822
OPERATION -----------------------------SELECT STATEMENT INDEX
OPTIONS OBJECT_NAME COST ------------------------------ ------------------------------ ---------10 RANGE SCAN EMPI_EMPNO_PK 2
Inference The processing costs are reduced by almost 99% in case of the second query. In the above example, both EMP and EMPI have 9.5 lac rows with the same data and have EMPNO as the primary key. The only difference being that the EMPI table is an INDEX ORGANIZED TABLE. Click here for syntax and information on Index Organized tables.
Addendum Note1
1. Hash Joins: Hash joins are used for joining large data sets. The optimizer uses the smaller of the two tables/data sources to build a hash table on the join key in memory.It then scans the larger table, probing the hash table to find the joined rows.This operation involves joining two sets of rows and returning the result. Apart from this the result needs to be SORTED for unique values i.e. an operation involving sorting a set of rows to eliminate duplicates.If the table is small enough to fit into the memory,then the cost is limited to a single read pass over the data for the two tables.But if,the hash table is too big to fit in to the memory,optimizer chooses partitioning. When the Optimizer Uses Hash Joins: The optimizer uses a hash join to join two tables if they are joined using an equijoinand if either of the following are true: 1. A large amount of data needs to be joined. 2. A large fraction of the table needs to be joined. Hash joins Example: [BACK] Note2
2. Clusters: You can create a CLUSTER as follows, Suppose you want to create a cluster named EMP_DEPT for EMP and DEPT tables. 1)
First create a cluster named EMP_DEPT.
CREATE CLUSTER emp_dept(deptno number(2)); 2)
Now create the cluster tables as follows, CREATE TABLE dept ( deptno NUMBER(2) PRIMARY KEY, . . . ) CLUSTER emp_dept (deptno); CREATE TABLE emp ( empno NUMBER(5) PRIMARY KEY, ename VARCHAR2(15) NOT NULL, ... deptno NUMBER(2) REFERENCES dept) CLUSTER emp_dept (deptno);
In this cluster DEPTNO is the cluster key. Be sure to choose your cluster key carefully and it should represent the join condition between the cluster tables. Some Points to remember about Clusters: (i) Clusters should be used for tables predominantly used in join queries, they help in increasing the performance. They should be preferred in Data Warehouse Environments where the data in the cluster tables is not likely to change often. (ii) Clusters can degrade the performance of your database when used in OLTP (Online Transaction Processing) environments where the data in the cluster tables is likely to be modified often. You are advised make use clusters only after making a proper evaluation of the environment your application will be working in. [BACK]
Note3
3. Function Based Indexes: Function Based Index can be created as shown below: CREATE INDEX sal_comm_finx ON EMP ( sal * nvl(comm,0)); Function-based indexes facilitate queries that qualify a value returned by a function or expression. The value of the function or expression is pre-computed and stored in the index. Specific features of function-based indexing include: (1) Indexes can be created where the search key is an expression. (2) They provide an efficient mechanism for evaluating predicates involving functions. (3) Case-insensitive sorts can be Performed. (4) Descending order indexes can be created. They are treated as a special case of
function-based indexes. [BACK] Note4
4. Partitioned Tables: Partitioned tables can be created as follows.
CREATE TABLE sales ( invoice_no NUMBER, sale_year INT NOT NULL, sale_month INT NOT NULL, sale_day INT NOT NULL ) STORAGE (INITIAL 100K NEXT 50K) PARTITION BY RANGE ( sale_year, sale_month, sale_day) ( PARTITION sales_q1 VALUES LESS THAN ( 1999, 04, 01 ) TABLESPACE tsa , PARTITION sales_q2 VALUES LESS THAN ( 1999, 07, 01 ) TABLESPACE tsb, PARTITION sales_q3 VALUES LESS THAN ( 1999, 10, 01 ) TABLESPACE tsc, PARTITION sales q4 VALUES LESS THAN ( 2000, 01, 01 ) TABLESPACE tsd) ; Partitioned tables allow your data to be broken down into smaller, more manageable pieces called partitions, or even subpartitions. Indexes may be partitioned in similar fashion. Each partition can be managed individually, and can operate independently of the other partitions, thus providing a structure that can be better tuned for availability and performance. [BACK] Note5
5. Index-Organized Tables: You can create index-organized tables in the following way.
CREATE TABLE docindex( token CHAR(20), doc_id NUMBER, token_frequency NUMBER, token_offsets VARCHAR2(512), CONSTRAINT pk_docindex PRIMARY KEY (token, doc_id)) ORGANIZATION INDEX TABLESPACE ind_tbs; Index-organized tables are like regular tables with a primary key index on one or more of its columns. However, instead of maintaining two separate storage spaces for the table and B*tree index, an index-organized table only maintains a single B*tree index containing the primary key of the table and other column values.
Index-organized tables provide fast key-based access to table data for queries involving exact match and range searches especially on the primary key as seen above. Changes to the table data (such as adding new rows, updating rows, or deleting rows) result only in updating the index structure (because there is no separate table storage area). Index-organized tables are suitable for accessing data by way of primary key or any key that is a valid prefix of the primary key. There is no duplication of key values and storage requirements are reduced because a separate index structure containing the key values and ROWID is not created. [BACK]