Raising Index Awareness
Date 20th Jan 2005
19th Jan 2005 25th Oct 2004
Version Notes Listed full details all ERs and bugs that have been raised as a result of this document’s creation Removed some of the TODO (outstanding items) but left some there for later checking and to make sure readers knows there is some ambiguity Corrected note on bottom of page 29 to clarify when GROUP BY clause is added First version released
Permission to use this document is authorized, provided that A) the use of the document is for informational and noncommercial purposes, and B) the document retain the copyright notice contained in this disclaimer. Business Objects may modify this document at any time and without notice. THIS DOCUMENT IS DELIVERED “AS IS” AND WITHOUT WARRANTY OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL BUSINESS OBJECTS, ITS SUPPLIERS OR OTHER THIRD PARTIES MENTIONED HEREIN BE LIABLE FOR ANY DAMAGES ARISING FROM THE USE OF THIS DOCUMENT INCLUDING, BUT NOT LIMITED TO, SPECIAL, INDIRECT, PUNATIVE OR CONSEQUENTIAL DAMAGES. The information in this document is subject to change without notice. If you find any problems with this article, please report them using the feedback link on this site or at
[email protected]. Business Objects does not warrant that this document is error free. Copyright © Business Objects 2000. All rights reserved. Portions © Copyright 1996, Microsoft Corporation. All rights reserved. Trademarks: The Business Objects logo, BusinessMiner, BusinessQuery, and WebIntelligence are registered trademarks of Business Objects S.A. The Business Objects tagline, Broadcast Agent, BusinessObjects, Personal Trainer, Rapid Deployment Templates, and Set Analyzer are trademarks of Business Objects S.A. Microsoft, Windows, Windows NT, Access, Microsoft VBA, the Visual Basic Logo and other names of Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Oracle is a registered trademark of Oracle Corporation. All other names of Oracle products referenced herein are trademarks or registered trademarks of Oracle Corporation. All other product and company names mentioned herein are the trademarks of their respective owners. This software and documentation is commercial computer software under Federal Acquisition regulations, and is provided only under the Restricted Rights of the Federal Acquisition Regulations applicable to commercial computer software provided at private expense. The use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.2277013. U.S. Patent No. 5,555,403
Steven White
Raising Index Awareness
Page 2 of 73
Table of Contents Introduction...................................................... ............5 Database performance concepts....................................6 Examples in this document............................................8 Example 1 – Use of a table’s primary key........................9 Example 2 – Avoiding joins or tables.............................15 Example 3 – Multiple Foreign Key entries......................20 Example 4 – Unique and non-unique values..................22 Example 5 – Using a Primary Key object in RESULTS pane31 Example 6 – Using an Index Awareness WHERE clause. . .35 Example 7 – Multi-column primary keys ........................38 Example 8 –Affect of Row Based Restrictions.................40 KEYS dialog explained.................................................42 SQL editor dialog.................................................. .......48 Limitations and rules ........................ ..........................49 Appendix A – Bugs and Enhancement Requests.............51 Appendix B – Accessing Index Aware information..........52 ............................................... ..................................53 Appendix C – Outstanding Items...................................54 The Secrets of Oracle Bitmap Indexes ..........................54 Overview ...................................................................55
Steven White
Raising Index Awareness
Page 3 of 73
Bitmap Indexes and Deadlocks ....................................55
Steven White
Raising Index Awareness
Page 4 of 73
Introduction This document explores the new feature available in Designer 6.5.1 called Index Awareness. This feature can help make SQL queries faster by redirecting WHERE clause restrictions to columns that are indexed and/or to generate a query that references less tables.
Who should read this? Business Objects universe and report designers who wish to improve the performance on their reports. Database schema designers and DBAs who support Business Objects developers would also benefit from understanding the concepts in this document.
Environments used for this article • • • •
Windows 2000 SP3 Business Objects 6.5.1 GA (no Hotfixes or CSPs were installed) Microsoft Access 2002 SP2 (for most db interactions) SQL*Server 2000 (for some database interactions)
Pre-requisites The reader should be very familiar with Business Objects universe and report building concepts. A familiarization with RDBMS and database performance tuning would also be advantageous.
Disclaimer The functionality and features detailed herein are based on the author’s understanding of the software version stated earlier. Every attempt has been made to create an accurate document by testing workflows and confirming understanding with corporate development, but the reader assumes all risk in the using the information contained herein.
Resources needed The following documents or files will be needed to perform the task: Item description Universe for reporting on security and universe domains of the repository Demo Islands Resort Marketing MS Access database that is delivered with Business Objects 5.x and 6.x
Filename Combined.unv Club.mdb
Other universe performance techniques Index Awareness is only one of the many techniques available in universe design to improve the performance of queries. Please check out the following in the relevant documentation and Knowledge Base articles: • Aggregate Awarenes • Short cut joins
Steven White
Raising Index Awareness
Page 5 of 73
Database performance concepts This document does not intend to explain database performance tuning concepts in any depth but terminology used in later sections of this documents should be explained to avoid any confusion. • Database index • Primary Key • Foreign Key • Surrogate keys • Partitions • Referential Integrity • Influencing the database optimizer • Tuning philosophy
Database Index A database construct which allows faster access to rows in a table. Indexes are defined by the database designer and maintained automatically by the RDBMS. Since indexes are maintained in a sorted sequence, the database can quickly locate a row or rows in the table.
Primary Key A database designer defined column or columns which uniquely identify a row in a table. If the RDBMS is told about the Primary Key during database design, it will automatically maintain an index to help it guarantee uniqueness and provide referential integrity.
Foreign Key A database designer defined column or columns which match the Primary Key of another table. This facilitates referential integrity and provides likely columns for joins to a table that contains the Primary Key or other tables with Foreign Key instances.
Surrogate keys A common Data Warehouse type of column where the actual Primary Key and Foreign Key values are referenced via a smaller, faster column type i.e. a number instead of a longer character type This allows joins and indexes to be faster because indexes are smaller in size (same rows, less width) and therefore less memory and/or disk space is consumed during their navigation.
Partitions A way of dividing the physical layout of a table across separate disk locations to avoid contention when accessing the table via the disk system. Usually the partitioning is based on a commonly accessed dimension such as time e.g. break a fact table into separate partitions based on the year\month of the transactions.
Referential Integrity This is not really a performance tuning term but it is related because this RDBMS feature requires Foreign Key and Primary Keys to exist.
Influencing the database optimizer The optimizer, if it exists at all in your database, follows a set of internal rules which help it govern which path would result in the fastest query execution. It uses the database constructs detailed earlier and knowledge of the tables and query to makes its decision. There is no guarantee that the database optimizer will use any specific database performance improving construct just because the user has provided the opportunity for it to do so e.g. just because a index could be used does not mean it will be used. The database optimizer may decide that other rules, limits and conditions apply and it will choose a different route to the data. Although Index Awareness will help the optimizer make a better decision, it is no guarantee that a Primary Key or Foreign Key entry (see later in document) will cause the index to be used.
Steven White
Raising Index Awareness
Page 6 of 73
As with any performance tuning exercise, monitoring of the RDBMS path to the data (sometimes called the ‘explain plan’) is vital. The DBA must also keep RDBMS statistics up to date e.g. how large are the tables? What is the cardinality of the values?
Tuning philosophy Although no one will argue that using the Primary Key of a table is a good thing there seems to be 2 opinions on whether tuning should avoid dimension tables or use them explicitly Avoid dimension tables This method suggests that all conditions should be directed to the ‘weightiest table’ i.e. largest which is usually the fact table in database. This helps reduce the number of tables and therefore joins in a query because joins are considered very database resource intensive. Once the restrictions have been processed by the fact table, the database will join out to the smaller dimension tables. If this is your intention that you should setup your Index Awareness with Foreign Key entries that will reduce the number of tables when the weightiest tables are involved. Pro Con
: less joins and tables : possibly more rows in the join between fact and dimension tables
Use dimension tables Alternatively this methods suggests that aim of the optimizer is to reduce the number of rows that join into the fact table as a priority over the number of tables and joins. Leaving conditions on the dimension tables causes the database to restrict the smaller dimension tables first and then join into the fact table with less rows. Pro Con
: less rows in the join to the larger fact table : more tables and joins in the query
If this is your intention that you should setup your Index Awareness with less Foreign Key entries that directly point to the larger tables in the schema
Getting more information on tuning Ask your DBA (database administrator) for more information on these concepts.
Steven White
Raising Index Awareness
Page 7 of 73
Examples in this document Most of the examples in this document use a variation of the ‘club.mdb’ (aka Islands Resort Marketing) and ‘beach.unv’ the latter has been which has been renamed ‘IndxAwre.unv’ The performance improvements discussed in this document are theoretical. Because the ‘club.mdb’ is so small and Microsoft Access is not considered a production capable database, it is unlikely you will notice any performance improvement in running the small queries that ‘club.mdb’ is capable of. Any LOV that are shown in this document that involve editing the LOV (DESIGNER\OBJECT PROPERTIES\PROPERTIES\EDIT) have sorts applied to the objects to allow the hierarchical view of the LOV dialog to display values without duplicates. This step is discussed in another document by the same author. Note : MS Access may have a ‘explain plan’ feature (mentioned earlier) but this was not investigated by the author
Schema and its indexes The following schema (club.mdb) will be used for the examples: club.mdb
Steven White
Raising Index Awareness
Page 8 of 73
Example 1 – Use of a table’s primary key Create a query that returns Customer and Revenue
The SQL generated will be: SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price) FROM Customer, Invoice_Line, Service, Sales WHERE ( Customer.cust_id=Sales.cust_id ) AND ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) GROUP BY Customer.last_name
Now add a condition that restricts it to return only customers from Dallas, Houston, Los Angeles, San Diego and San Francisco.
Steven White
Raising Index Awareness
Page 9 of 73
Steven White
Raising Index Awareness
Page 10 of 73
The SQL will be: SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price) FROM Customer, Invoice_Line, Service,
City, Sales WHERE ( City.city_id=Customer.city_id ) AND ( Customer.cust_id=Sales.cust_id ) AND ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id AND (
)
City.city IN ('Dallas', 'Houston', 'Los Angeles', 'San Diego', 'San Francisco') ) GROUP BY Customer.last_name
Everything is as it should be. But what if we knew that the ‘city’ column is not indexed and therefore is not a good candidate for WHERE clause restrictions? (see schema description earlier) We also know that column ‘city_id’ is the primary key of the city table but more so, it is indexed. From our discussions on good performance earlier we determined that using the primary key of a table is one of the fastest ways of retrieving the rows of that table. So if we could get the SQL to automatically use the Primary Key of the city table, we could achieve a faster retrieval of the records. Note : Of course we could change the object’s SELECT definition to point to that Primary Key column (city_id) but that means we expect users to know what those Primary Key values mean. Since Primary Key and Foreign Key values are typically based on surrogate keys, and surrogate keys are automatically allocated by the ETL or RDBMS itself, an end user will not normally know what Primary Key or Foreign Key value relates to a real live value that they understand
Steven White
Raising Index Awareness
Page 11 of 73
Setting up the Index Awareness 1. 2. 3. 4. 5. 6. 7. 8. 9.
Open Designer Open the relevant universe e.g. IndxAwre.UNV Locate the City object (in the Customer class) Right click on the City object and choose ‘Object Properties’ Select KEYS tab Press INSERT In SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog Select or type the SQL ‘City.city_id’ note : more on this dialog later in the document OK
Steven White
Raising Index Awareness
Page 12 of 73
Now let’s return to the query we had before, edit data provider and then view SQL and we discover that the SQL has changed: SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price) FROM Customer, Invoice_Line, Service, City, Sales WHERE ( City.city_id=Customer.city_id ) AND ( Customer.cust_id=Sales.cust_id ) AND ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) AND (
City.city_id
IN
(11, 10, 13, 14, 12)
) GROUP BY Customer.last_name
Notice that the ‘city.city’ column reference has been replaced by the ‘city.city_id’ column and that the city names have been replaced with the primary key values that represent their name. This query will likely run much faster because the main condition is now on an indexed column. Note : if the SQL does not show any change this could be a sporadic bug noticed by the author where the SQL would only change when the LOV dialog was visited and OK’d whether any changes were made to the LOV or not.
What happened behind the scenes? When we told the City object about its Primary Key, we offered Business Objects an alternative piece of SQL to use in any WHERE clause should that object be involved in a condition. It would have used ‘City.city’ but with Index Awareness it used ‘City.city_id’ instead. It determined this from the KEYS entries you made. But how did Business Objects convert the City names you selected in the LOV into the primary key values necessary for the SQL? Remember that the City table has the following values (that are relevant to our example) :
City table City_id (Primary Key) 11 10 13 14 12
City Dallas Houston Los Angeles San Diego San Francisco
Region 20 20 21 21 21
It did this by adding the Primary Key SELECT to the query it normally generates for the LOV and when you selected the City names, Business Objects matched the City names to the Primary Key values then.
Steven White
Raising Index Awareness
Page 13 of 73
The List of Values SQL
The SQL generated for a non-Index Aware LOV involving City would be: SELECT DISTINCT City.city FROM City But the SQL generated for the LOV on the Index Aware City object was: SELECT DISTINCT City.city, City.city_id FROM City Notice that the ‘City.city_id’ column has been added.
Steven White
Raising Index Awareness
Page 14 of 73
This SQL is visible by either editing the LOV for the City object and/or tracing SQL that passes thru the middleware to the database. So when you selected the City names from the LOV dialog, you were indirectly selecting the Primary Key values as well. Think of the Primary Key values as a hidden column in the LOV dialog.
Summary Index Awareness allowed us to automatically redirect a WHERE clause condition to another column (on the same table for this example) that we know would provide better performance at query time. We determined which column to choose as an alternative based on our knowledge of the database schema and the RDBMS optimizer. The LOV values we select actually tell Business Objects what Primary Key values to substitute in final query SQL. The KEYS tab tells Business Objects which SQL syntax to substitute in the final query SQL. Rule – when an Index Aware object is used in the CONDITIONS pane, the Primary Key entry will replace the object’s SELECT Rule – the operand ‘Show list of values’ dialog returns the Primary Key values that match the visible values in the LOV dialog
Example 2 – Avoiding joins or tables Steven White
Raising Index Awareness
Page 15 of 73
Another use of Index Awareness is to reduce the number of tables or joins involved in a query. This reduces the number of joins in a query which can improve query performance. From previous sections we discussed that joins are very expensive in performance terms for the RDBMS, so any possibility at avoiding them should be sought. The SQL from the last example improved performance by restricting on the Primary Key (indexed) of the city table. SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price) FROM Invoice_Line, Sales, City, Customer, Service WHERE ( City.city_id=Customer.city_id ) AND ( Customer.cust_id=Sales.cust_id ) AND ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) AND ( City.city_id IN (11, 10, 13, 14, 12) ) GROUP BY Customer.last_name
This query has the Customer and City tables in it which both can provide the City_id column. The City table is only needed to satisfy the WHERE clause and is not needed in the SELECT or GROUP BY clauses. Is it possible to remove the City table from the query completely? i.e. tell Business Objects to use the Customer table to get City_id from if it can City_id can come from more than one table
1.
Locate the City object (in the Customer class)
2.
Right click on the City object and choose ‘Object Properties’
Steven White
Raising Index Awareness
Page 16 of 73
3.
Select KEYS tab
4.
Skip to step 9 if the Primary Key from the previous example is still in place
5.
Press INSERT
6.
In the SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog
7.
Select or type the SQL ‘City.city_id’
8.
OK
9.
Start here if the Primary Key of ‘City.City_id’ is still in place from the previous example
10. Press INSERT 11. In the Key Type column, choose ‘Foreign Key’ if not already chosen 12. In the SELECT column, click on the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog 13. Select or type the SQL ‘Customer.city_id’ 14. OK the dialog
Steven White
Raising Index Awareness
Page 17 of 73
Let’s return to the query panel and see what changes have taken place in the SQL: SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price) FROM Invoice_Line, Sales, Customer, Service WHERE ( Customer.cust_id=Sales.cust_id ) AND ( Sales.inv_id=Invoice_Line.inv_id ) AND ( Invoice_Line.service_id=Service.service_id ) AND (
Customer.city_id
) GROUP BY Customer.last_name
IN
(11, 10, 13, 14, 12)
Note that the City table is no longer referenced in the query and that the City_id is being restricted using the Customer table instead.
What affect did this have on our LOV? None When we check the SQL of the LOV for the City object we see it remains as it was in example 1 i.e. it has the Primary Key SELECT added to it. So we can conclude that the LOV SQL is only affected by the Primary Key entry and that the LOV selection will always return the Primary Key entries ‘behind the scenes’. Therefore all columns you provide in Foreign Key entries must have the same type of values and column type as the column referenced by the Primary Key entry.
Steven White
Raising Index Awareness
Page 18 of 73
Will the Foreign Key entry always be applied? No If the Primary Key table is still needed in the query’s RESULTS, SORTS or CONDITIONS which don’t involve Index Aware on the same table, then the table will remain in the query and the Foreign Key entry will not be applied City object used in the RESULTS causing the Foreign Key entry not to be used i.e. City table remains in the FROM clause and is used in the WHERE clause (restricting on its Primary Key)
But if the Primary Key table (the original object’s table) is not needed anywhere but the WHERE clause, then the Foreign Key entry will be applied: Example of Foreign Key being applied because Primary Key table not being needed anywhere else in the query
Rule – a Foreign Key entry will be ignored if it does not result in less tables being used. Business Objects can only use less tables if the Primary Key table is referenced only inside the WHERE clause.
Steven White
Raising Index Awareness
Page 19 of 73
Example 3 – Multiple Foreign Key entries The KEYS dialog allows more than one Foreign Key entry which allows us to deal with values which are repeated (denormalized in DBA speak) in multiple tables. Database designers may denormalize Primary Key values beyond whats necessary to satisfy constraints to assist in performance and simplification of SQL generation. It is expected to see an entity repeated twice. Initially in the entity own table as its Primary Key and then in a table that refers to this Primary Key thru its Foreign Key. Denormalization implies that the database designer has gone beyond this duplication and has more than 2 instances of the value in a schema. If we search the ‘club.mdb’ (Islands Resorts Marketing) database to find such entities, we will find only one i.e. sales person Tables list Sponsor.sales_id Sales_Person.sales_id Customer.sales_id That gives us three places that we can get sales_id from. So if we setup the KEYS for the ‘Sales Person’ object (in Sales class) properly we can give Business Objects 3 choices from which it can retrieve these values for the purposes of restricting data in the WHERE clause (and hopefully speeding up the query). KEYS entries for Sales Person object
Steven White
Raising Index Awareness
Page 20 of 73
Customer, Sponsor and Revenue object with Sales Person condition
The Sponsor table is used to restrict on Sales Person (sales_id) because it is one of the Foreign Key entries and the Sales_Person table is not used anywhere else in the query so can be dropped by Business Objects. But why does Business Objects use the Sponsor table and not the Customer table? Both tables exist in the FROM clause? Perhaps it’s the sequence of Foreign Key entries in the KEYS dialog? If we rearrange them so that they are listed as follows:
And check the query SQL we see: The SQL has changed again, this time using the Customer table to restrict Sales Person via sales_id. Rule – the sequence of Foreign Key entries in the KEYS dialog determines which entry is given preference i.e. the last enabled entry in the list that results in the least number of tables. Rule - If the ‘best’ Foreign Key entry does not result in a table count reduction compared to the original query, the Primary Key entry will be the only one that applies
Steven White
Raising Index Awareness
Page 21 of 73
Example 4 – Unique and non-unique values In the previous examples all the values that were restricted were unique e.g. Sales Person and City. But what happens when we use a value that is not unique e.g. Service First setup the Primary Key entry:
And the build an example query (using Customer and Revenue as RESULTS and Service IN LIST <list of values):
Why does the LOV has duplicates? Normally LOV have a DISTINCT on them and no duplicates are allowed as shown in the screenshot on the right:
Steven White
Raising Index Awareness
Page 22 of 73
Let’s look at the LOV SQL and note that as expected (as we discovered in example 1) the Primary Key entry has been added to the SELECT of the LOV SQL.
When we look at the service table, we see the duplicates there as well e.g. Hotel Room is repeated for SL_ID = 21, 31 and 41. SL_ID refers to Service Line.
Steven White
Raising Index Awareness
Page 23 of 73
This tell us that Service is not completely unique in the Service table. In fact it is only unique within a Service Line (SL_ID). So how can we help a user navigate to the correct value in a LOV when there are duplicates? Easy, we use normal customized LOV in Designer. EDIT the LOV, add the Service Line object, apply a primary ascending sort to the Service Line and a secondary ascending sort to the Service object note : the sorts are only necessary if the LOV is to be viewed in Hierarchical View
This results in a LOV that looks like this:
Steven White
Raising Index Awareness
Page 24 of 73
As expected the Primary Key value does not show in the LOV dialog but note that the Hierarchical View and the Tabular View show a different number of values. The Tabular View lists the values that are returned from the LOV query, excluding the Primary Key entry ofcourse. The Hierarchical View applies a further ‘distinct’ on the display values before it allows the user to navigate the LOV hierarchy. This means that hierarchy view ignores the impact of the hidden Primary Key value in the LOV. If we use this LOV as it is with the Primary Key entry, we’ll see the following SQL generated (based on original query RESULTS=Customer, Revenue CONDITIONS=Service IN LIST
): The Primary Key values from the Service table has been selected (behind the scenes) and put into the SQL. These values represent all the ‘Accommodation’ Service Line values. This is as expected.
Although the duplication is annoying, selecting all the Accomodation values while in Tabular View (in this example) does result in the correct SQL and therefor results.
Steven White
Raising Index Awareness
Page 25 of 73
Let’s see what happens when we use the Hierarchical View of the LOV dialog. This time only 3 Primary Key values have been selected e.g. 212, 211 and 213 which represent Bungalow, Hotel Room and Hotel Suite where SL_ID=21!! This means using the Hierarchical View compared to the Tabular View would give different results.
Clearly it would be unacceptable to users that depending on whether they used Tabular View or Hierarchical View they got the correct results or not. Note : Even before Index Aware, the Hierarchical View would always apply a further DISTINCT on the display values. The problem only arises now because using Index Aware we are not actually limting the query by the values we select in the dialog but by the Primary Key values indirectly being selected. See a separate article by the author on the LOV dialog So how do we make this LOV work correctly every time? We look at the data and see that the Service name is not really unique in its own table. That table is actually a Service within Service Line table. So we look at Service Line table and discover the Service Line name is not unique without being qualified by which Resort the Service Line is available at.
Service Line table
We could look beyond Resort because it is shown with its country but the Resort name is acutally unique across all countries. Although we determined this by looking at the data, we really should confirm it by talking to the schema designer as well. Since this may be test data which not reflect all the true relationships in the lifetime of the database So we need to determine what values should be displayed in the LOV to help guarantee that the unique Primary Key values the user indirectly selects from the hidden column represent the values that the user can see on screen.
Steven White
Raising Index Awareness
Page 26 of 73
Steven White
Raising Index Awareness
Page 27 of 73
We do this by further customizing the LOV for the Service object to the point that the Service name is fully qualified by all the values we think are necessary i.e. Service Line and Resort Tabular View – selecting the Bungalow, Hotel Room and Hote Suite values for Bahamas Beach and French Riveria generates the correct SQL much like before. 1
Steven White
Raising Index Awareness
Page 28 of 73
But note what happens you when you return the LOV to review your selections i.e. only the Bahamas Beach values are highlighted and if you OK the dialog only the Primary Key values representing those values are put in the SQL. Rule - The LOV dialog matches the values on the first match it can find and does not take into consideration that previous Primary Key values were remembered.
Steven White
Raising Index Awareness
Page 29 of 73
With Hierarchical View things are no different. Let’s see the affect of Hierarchical View on the LOV and SQL now that we’ve customized the LOV to allow the user to navigate correctly to the values. Hierarchical View gives the correct SQL
But when we return to the LOV dialog:
The same thing happens. Clearly the problem is that the LOV dialog matches on the first occurrence of the values selected and does not take the Primary Key value into account. This limits our use of the Primary Key feature because a. depending on Tabular View or Hierarchical View we could get different results b. we can navigate to the correct values select them and then use them but the next time we return to the LOV dialog, it will remove duplicates and only select the first instance of our value selected
Rule – use of Primary Key entries in LOV can only be used with accuracy when the value is unique e.g. customer social security number, product code.
Steven White
Raising Index Awareness
Page 30 of 73
Example 5 – Using a Primary Key object in RESULTS pane So far in our examples we’ve placed Index Aware objects in the CONDITIONS pane of the qeury. This is the main use of Index Awareness but what happens when those same objects are used in the RESULTS pane? Let’s build a query with Customer and Revenue in the RESULTS pane.
First of all we’ll see what the SQL will be without Index Aware being active: SQL without Index Aware being active
Now, let’s set the Customer object (in the Customer class) to have a Primary Key entry of ‘Customer.cust_id’ i.e. And the SQL is now:
Notice that the SELECT of the query has changed, a MAX function has been added to the Customer.last_name (the Customer object did not have a MAX function in its SELECT definition). Also the GROUP BY clause has been changed i.e. Customer.last_name replaced with Customer.cust_id
Steven White
Raising Index Awareness
Page 31 of 73
Note : If a query does not have a GROUP BY clause (no measures etc) then the MAX function is not applied i.e. the Index Awareness itself does NOT cause a GROUP BY to be generated, that is done by normal Business Objects SQL generation
Steven White
Raising Index Awareness
Page 32 of 73
What affect does this have on our results? None for this query. With or without Index Aware this query will return the same results. Why? Because Customer.last_name is unique across the whole data set i.e. cust_id has a 1-to-1 relationship to last_name. Now in a real database (I apologize to the designer of the ‘club.mdb’ demo database!) would have last names that are not unique. The row counts for the 2 refreshes are indentical. The first refresh had Index Aware disabled and the second has it enabled.
What happens when values are not unique? Let’s make the last name of existing record in the Customer table equal to another last name: Customer with last name ‘Oneda’ was changed to ‘Okumura’
Then run the same query with and without Index Awareness and see what the row counts are:
20 rows are returned when Index Awareness is not applied but 21 are returned when it is applied. The difference is the result set is that Okumura last name has been aggregated at the database level when Index Aware is not enabled but when Index Awareness was enabled, both Okumura records were returned from the database and optionally aggregated locally within the report.
Steven White
Raising Index Awareness
Page 33 of 73
The reporter feature ‘Avoid duplicate row aggregation’ is used in the screenshot below. The reporter tables below with the header ‘Dupicates Aggregated’ have this feature unchecked i.e. OFF
So this means that Index Awareness can affect result sets and can cause less aggregation to be performed on the database, which is exactly where we’d want aggregation to be peformed to reduce the load on Reporter (BUSOBJ.EXE). Rule – when Index Aware objects are used in the RESULTS pane their Primary Key entries can cause additional rows to be returned from the database. Report results should remain the same if no local reporter calculations depend on the row count and level of uniqueness of the result set
Steven White
Raising Index Awareness
Page 34 of 73
Example 6 – Using an Index Awareness WHERE clause So far all our examples have ignored the WHERE clause part of the Primary Key and Foreign Key entries within the KEYS dialog. First, let’s see what happens when you add a WHERE clause to the Primary Key entry. The WHERE clause SQL gets added to the LOV query for that object:
Steven White
Raising Index Awareness
Page 35 of 73
And it gets added to the query that the object is used in, whether the Index Aware object is used in the RESULTS pane or the CONDITIONS pane:
So far I cannot think of a use for adding WHERE clauses to the Primary Key entries. I would assume if the original object has a WHERE clause you would want the Primary Key WHERE clause to replace that much as the Primary Key SELECT clause replaces the original SELECT clause but this means the universe developer would need to look up the Primary Key values in advance Note : TODO CHECK THIS STATEMENT If these Primary Key values are based on surrogate keys there is no guarantee that a database reload will maintain the same surrogate key values.
Steven White
Raising Index Awareness
Page 36 of 73
What happens to the Foreign Key WHERE clause? Assuming the Foreign Key entry is applied (depending on rules discussed elsewhere) its WHERE clause is simply ANDed to the query once the Foreign Key replacement has taken place. Based on:
We get:
Again, right now I cannot think of a use for the addition of the WHERE clause entry. Rules: •
WHERE clause entries for Primary Key and Foreign Key entries are ANDed to the query once the SELECT entry has been implemented.
•
WHERE clauses must reference the same tables as the WHERE clause for Primary Key entries
•
WHERE clauses should try to reference the same tables as the WHERE clause for Foreign Key entries to avoid introducing more complexity into the query
Steven White
Raising Index Awareness
Page 37 of 73
Example 7 – Multi-column primary keys Where a table does not have one column which uniquely identifies the rows but instead relies on more than one column, this is a ‘multi-column primary key’ Note : this is not be to confused with ‘concatenated index’ or ‘composite index’ which is a database index that is made up of more than one column in a table. There are no examples of this in the ‘club.mdb’ so we have to create our own schema or make believe using ‘club.mdb’. To save time let’s make believe… Let’s pretend that the Resort table has a multi-column primary key based on Country_id and Resort_id.
Since the KEYS dialog only allows one Primary Key entry we have to make that one entry relate to our multicolumn primary key. So we define a Primary Key entry as “Country_id & Resort.Resort_id” (the & is MS
The value of 61 comes from the concantenation of ‘6’ and ‘1’. ‘6’ represents the country_id for Australia and ‘1’ represents the resort_id for ‘Australian Reef’
Access syntax for concatenation).
Steven White
Raising Index Awareness
Page 38 of 73
Since this concatenation will be stated in the final query, it will likely negate any indexes that exist (typically any manipulation of values at SQL query run time negate indexes that could have been used on the columns affected by the manipulation) unless that index has been setup as a ‘function based index’ (see Database Performance section earlier for an explanation) So although the Index Awareness feature allows SELECT clauses that span multiple columns (or even tables) it is unlikely that the database indexes will do the same. You should avoid multi-column primary key entries and/or discuss indexes with your DBA. Surrogate keys are a good way around multi-column primary keys in database design and your database designer should consider them Rule – Index Awareness is best used with single column Primary Keys. Multicolumn primary keys should be avoided and/or your DBA should consider ‘function based indexes’ or surrogate keys
Steven White
Raising Index Awareness
Page 39 of 73
Example 8 –Affect of Row Based Restrictions Row Based Restrictions (RBR) are settings created in Supervisor that allow additional WHERE clauses to be ANDed to the final query SQL should that table be involved in the query. See Designer documentation for more information. Let’s see what happens: City object with Primary Key and Foreign Key entries and no RBR applied to the City table. As expected the restriction of ‘City EQUAL TO Albertville’ has been converted to a Primary Key value (25) pointing to the Customer.City_id Foreign Key
Steven White
Raising Index Awareness
Page 40 of 73
Now with a RBR active on the City table: The Customer.City_id is not used, instead the original Primary Key entry is used and the RBR is applied to the City table (as expected)
Rule - If an Index Aware object tables list references a table that has RBR, we cannot use any Foreign Key entries. Why? because the RBR in Supervisor will be invalid for the Foreign Key table.
Steven White
Raising Index Awareness
Page 41 of 73
KEYS dialog explained
The KEYS tab of the Object Properties dialog contains the following controls: • ‘type of values in PK and FK’ • DETECT • PARSE • Key Type • Select • Where • Enable • OK • Cancel • Apply • Help • Insert • Delete Note : Buttons INSERT, DELETE, Cancel and Help are not detailed below because they are standard Designer (and Windows) buttons and their purpose should be self explanatory
Steven White
Raising Index Awareness
Page 42 of 73
Drop down - Type of values in PK and FK This drop down lists all the standard data types that Business Objects handles i.e. Character, Date, Long Text and Number. You have to inform Business Objects which data type is the relevant for the series of Primary Key and Foreign Keys because the data type may be different from the object you are defining Index Awareness for e.g. column ‘City.city’ is a character column and the object ‘City’ is also a Character type but its Primary Key column ‘City.city_id’ is a numeric type.
All Primary Key and Foreign Key entries must be of the same value type because there is only one drop down for all entries. This makes sense because depending on the rules applied, any of the Primary Key and Foreign Key entries could be used in the final query. If you select the wrong type and then try to PARSE then you will receive the error ‘The expression type is not compatible with the object type’ as shown below:
Steven White
Raising Index Awareness
Page 43 of 73
DETECT Detects Primary Key and Foreign Keys if the middleware and database support their detection. Some databases do not support their detection e.g. MS Access. Should the database be unable to supply key information, the error ‘Key is not supported’ is raised by Business Objects.
Should the Index Aware object reference no tables or more than one table, the error ‘No detected key for
objects involving several tables’ will be raised: Note : this error description is slightly misleading because it is raised even when the object makes reference to no tables i.e. the Index Aware object SELECT and WHERE clauses result in no tables being referenced in the TABLES list Any Primary Key detected will over-write without warning any existing Primary Key entry Once the Primary Key has been detected then the Foreign Keys are sought. All Foreign Key entries will be added to the list if they didn’t previously exist i.e. no duplicates will be allowed
Steven White
Raising Index Awareness
Page 44 of 73
Knowing if the database allows key detection The PRM parameter KEY_INFO_SUPPORTED states whether the target database allows key detection. This parameter has a default value of Y (for Yes) so if it is missing from a PRM file, Business Objects will assume it can supply key info and attempt it. PRM file parameter that states if database allows key detection
Steven White
Raising Index Awareness
Page 45 of 73
PARSE Sends the SELECT and WHERE SQL of each entry, enabled or not, to the target database to confirm its syntax. Errors are listed in a single dialog:
Apart from the Business Objects caught error ‘The expression type is not compatible with the object type’ all other errors will be database specific, usually relating to errors in the building of the SELECT and WHERE SQL statements.
The following limitations of the PARSE detection should be noted: a.
Primary Key SELECT that does not reference any table is not detected (but the WHERE does reference a table)
b.
Does not detect the potential Cartesian Product in the final query when the Primary Key or Foreign Key SELECT or WHERE clauses are not joined to at least one of the Index Aware object’s tables list
c.
Does not detect the potential ‘Incompatible combination of objects’ in the final query when the tables referenced by the Primary Key or Foreign Key SELECT or WHERE clauses are not on the same context as the Index Aware object’s tables list.
Warning – In case you’re using MS Access for testing this you should note that MS Access does not always detect syntax errors e.g. WHERE clause with a single word ‘steven’ is not detected. Expressions with words separated by a space are detected as incorrect
Key Type There are only 2 key types i.e. Primary Key and Foreign Key. Only one Primary Key entry can exist for a single Index Aware object. There can be zero or more Foreign Key entries but before even 1 Foreign Key entry can be used a Primary Key entry must exist. Note : Should you try to create more than one Primary Key entry, an existing Primary Key entry will be converted automatically without warning to a Foreign Key. Its SQL will remain unchanged only its type. This could lead to incorrect SQL in the final query if the changed Primary Key entry is not reviewed by the universe developer
Select Enter or build the SELECT SQL which will be used to: a.
Replace the Index Aware object’s original SELECT SQL should the object be used in the CONDITIONS pane of a query (subject to other rules stated elsewhere)
b.
Added to the SQL of the Index Aware object’s LOV query
c.
Replace the Index Aware object’s SELECT SQL in a GROUP BY clause should the Index Aware object be used in the RESULTS pane of a query
Tip – add SQL comments to the Primary Key and Foreign Key entries so that they easily identifiable in the final query SQL e.g. [ Customer.City_id /* Primary Key */ ] for Oracle, where /* comment */ is Oracle comment syntax
Steven White
Raising Index Awareness
Page 46 of 73
Where Enter or build the WHERE SQL which will be ANDed to any query where the SELECT part is also used. ANDing additional conditions can cause unexpected results. You cannot chose whether the condition ORed instead. It is always ANDed. Consider the effect of ANDing the conditions carefully. Tip – add SQL comments to the Primary Key and Foreign Key entries so that they are easily identifiable in the final query SQL e.g. [ Customer.City_id IN (10, 11, 12) /* Foreign Key */ ]
Enable Only Primary Key or Foreign Key entries that are enabled will be taken into consideration during SQL generation. But enabled or not, the entries will be checked when the PARSE button is used.
Apply or OK Applies the changes and optionally closes the dialog. Further checking is done to detect the following errors: •
“There is at least one key without a Select statement. Each key requires at least one Select statement. You need to verify your keys” this matches the rules discussed earlier
•
“You defined multiple foreign keys without a primary key. Each foreign key requires at least one primary key. You need to redefine your foreign keys” this wording is a little confusing because it should not say “requires at least one SELECT” but should say “requires one SELECT” instead
These errors should be self explanatory based on our understanding of the rules determined so far.
Steven White
Raising Index Awareness
Page 47 of 73
SQL editor dialog The SQL editor available to the management of Index Aware is the same SQL editor dialog available to predefined conditions, objects SELECT and objects WHERE clauses. This dialog provides normal SQL constructs as well as more complex items such as:
If your target database allows them, you can add SQL comments to the Primary Key and Foreign Key entries so that they easily identifiable in the final query SQL e.g. [ Customer.city_id /* Index Awareness Foreign Key */ ] for Oracle note : for Primary Key and Foreign Key WHERE clauses, the PARSE button within this dialog always presents the error ‘Parse failed : Invalid Definition (UNV0023)’ not matter what you enter. Ignore this error for now.
TODO – check what affect @Aggregate_Aware which also changes tables involved in a query has on Index Awareness
Steven White
Raising Index Awareness
Page 48 of 73
Limitations and rules Like any product functionality there are limits to its design. The following issues have been identified below.
Limited set of operands Index Awareness will only apply with the following operands: • Equal to • Different From • In List • Not In List
Right hand side of operand must be selected When the operands above are used, only value(s) selected from the ‘Show list of values’ dialog will trigger Index Awareness functionality. Typing or entering a value will not trigger it even if the values are identical to those in the LOV Why? Because Index Awareness uses the selection from the LOV dialog to grab the Primary Key value for each entry. If you entered your selection using ‘Type a new constant’ (for example) Index Awareness would then have to go out to the database again and locate the matching Primary Key value. (or at least interrogate the cached LOV which it currently does not do) Also, creating a condition by using the ‘Simple Condition’ toolbar button within the query panel does not permit Index Awareness either
Only works with Full Client reports Disappointing but we have to start somewhere!
Does not work with prompts Disappointing but expected because of the following reasons: a.
Complexity – prompt population takes place after all the tables in a query has been decided upon. Whereas Index Awareness helps build the overall query, changing tables etc. The point in the query building workflow where these things happen are entirely different.
b.
Shared LOVs – if more than one prompt exists in the report and/or universe with the same name it is reduced to a single prompt display and common LOV. But what if the prompt was related to different objects with completely different LOVs and KEY definitions? This complication has not been overcome in the current version of Index Awareness
c.
Prompt definition – prompts can be defined in the universe with hard coded LOV or LOVs that come from another object entirely. This complication has not been overcome in this feature.
Steven White
Raising Index Awareness
Page 49 of 73
List of rules found in examples The following are repetition of rules discovered and explained within the examples earlier in the document. They are listed here for reference: •
A Foreign Key entry will be ignored if it does not result in less tables being used. Business Objects can only use less tables if the Primary Key table is referenced only inside the WHERE clause.
•
The sequence of Foreign Key entries in the KEYS dialog determines which entry is given preference i.e. the last enabled entry in the list that results in the least number of tables. But if the ‘best’ Foreign Key entry does not result in a table count reduction compared to the original query, the Primary Key entry will be the only one that applies
•
Use of Primary Key entries in LOV can only be used with accuracy when the value is unique e.g. customer social security number, product code.
•
When Index Aware objects are used in the RESULTS pane their Primary Key entries can cause additional rows to be returned from the database. Report results should remain the same if no local reporter calculations depend on the row count and level of uniqueness of the result set
•
WHERE clause entries for Primary Key and Foreign Key entries are ANDed to the query once the SELECT entry has been implemented.
•
For Primary Key entries the WHERE clauses must reference the same tables as the Index Aware object’s original tables list
•
For Foreign Key entries the WHERE clauses should try to reference the same tables as the Index Aware original object’s tables list avoid introducing more complexity into the query
•
Index Awareness is best used with single column Primary Keys. Multi-column primary keys should be avoided and/or your DBA should consider ‘function based indexes’ and multi-column indexes.
•
If an Index Aware object tables list references a table that has RBR, we cannot use any Foreign Key entries. Why? because the RBR in Supervisor will be invalid for the Foreign Key table.
Steven White
Raising Index Awareness
Page 50 of 73
Appendix A – Bugs and Enhancement Requests Please find below a list of bugs and enhancement requests that the author has raised during the creation of this document. If you notice the same problems, please contact Business Objects Customer Support at www.techsupport.businessobjects,com and have them raise support cases which are linked to the bugs and enhancement requests listed below. The more feedback the developers receive, will help guide their coding efforts. If they don’t know its broke, they wont fix it! Bug n/a 1099793
ER number 30602 30615 30613
n/a 30612 30614 30610
30616
Steven White
List of bugs noticed in product Description Query SQL sometimes not being updated when returning to query NOTE : could not raise as a bug because could not replicate it consistently SQL editor dialog - for Primary Key and Foreign Key WHERE clauses, the PARSE button within this dialog always presents the error ‘Parse failed : Invalid Definition (UNV0023)’ not matter what you enter. Ignore this error for now.
List of Enhancement Requests Description When editing WHERE the dialog says ‘edit SELECT’ Allow Primary Key and Foreign Key entries in KEYS tab to be reordered Error on Apply/OK wording “You defined multiple foreign keys without a primary key. Each foreign key requires at least one primary key. You need to redefine your foreign keys” this wording is a little confusing. Basically you cannot have one Foreign Key without first having a single Primary Key entry. Work with prompts Note : did not raise it as a ER because I realize the difficulty involved Work with BOTH and EXCEPT Applying a simple filter to an object in results pane does not trigger Index Aware (only LOV from ‘list of values’ operand does) within KEYS property of an object, the DETECT button raises an error "No detected key for objects involving several tables". but the error is also raised when the object does not refer to any table. Suggest the error text is "Keys can only be detected for objects that refer to one and only one table" Should you try to create more than one Primary Key entry, an existing Primary Key entry will be converted automatically without warning to a Foreign Key. Its SQL will remain unchanged only its type. This could lead to incorrect SQL in the final query if the changed Primary Key entry is not reviewed by the universe developer suggest a dialog is raised warning the user to review the changed PK entry (which is now a FK)
Raising Index Awareness
Page 51 of 73
Appendix B – Accessing Index Aware information There are 2 ways to automate access to universe information: 1. Designer SDK 2. Business Objects repository tables
Designer SDK There seems to be no way using the SDK to access Index Aware information i.e. the properties and methods are not exposed.
Business Objects repository tables The new repository universe domain table that contains Index Aware information is UNV_OBJECT_KEY but existing UNV_OBJECT_DATA and UNIVERSE_OBJECT tables are also affected. Table UNV_OBJECT_KEY
A universe exists, called ‘combined.unv’ which was written by the author of this document, will include a first attempt at reporting on the KEY information. The additions to the universe are not perfect, its limitations are: 1. outer joins are not handled – so trying to list objects whether they have KEY records or not will not work i.e. only objects with KEY records will be returned 2. returning the SELECT and WHERE clauses in a single Business Objects query will generate 2 SELECT statements because these are on separate contexts. This may not be correct but it works for now
Steven White
Raising Index Awareness
Page 52 of 73
The following is a piece of SQL that shows the SELECT clauses of any Index Aware objects: SELECT base_class.CLS_NAME, UNV_OBJECT.OBJ_NAME, object_key_links.KEY_STATES, object_key_links.KEY_POSITION, object_keys.OBJ_NAME, key_select_clause.OBJ_DATAVALUE FROM UNV_OBJECT_KEY object_key_links, UNV_CLASS base_class, UNV_OBJECT, UNV_OBJECT object_keys, UNV_OBJECT_DATA key_select_clause WHERE ( UNV_OBJECT.OBJECT_ID=object_key_links.OBJECT_ID and UNV_OBJECT.UNIVERSE_ID=object_key_links.UNIVERSE_ID ) AND ( key_select_clause.OBJ_DATATYPE = 'S' ) AND ( key_select_clause.OBJ_SLICE=1 ) AND ( object_key_links.UNIVERSE_ID=object_keys.UNIVERSE_ID and object_key_links.KEY_ID=object_keys.OBJECT_ID ) AND ( object_keys.OBJECT_ID=key_select_clause.OBJECT_ID and object_keys.UNIVERSE_ID=key_select_clause.UNIVERSE_ID ) AND base_class.CLASS_ID=UNV_OBJECT.CLASS_ID AND base_class.UNIVERSE_ID=UNV_OBJECT.UNIVERSE_ID
It is suggested you experiment with the 3 tables highlighted and the combined.unv universe.
Steven White
Raising Index Awareness
Page 53 of 73
Appendix C – Outstanding Items The list below are items that need to be clarified or completed in this document. This will not teach the reader anything about Index Awareness, they are simply a reminder to the author to complete them for the next release of the document. 1.
The use Primary Key to solve the old problem of hierarchical prompts only returning the value selected and not the path
2.
Consider prefixing object value with Primary Key value
3.
What happens when you ignore ‘the expression type us not compatible with the object type’
The Secrets of Oracle Bitmap Indexes
Steven White
Raising Index Awareness
Page 54 of 73
Overview
Oracle's two major index types are Bitmap indexes and B-Tree indexes. B-Tree indexes are the regu OLTP systems make much use of, and bitmap indexes are a highly compressed index type that tends primarily for data warehouses. Characteristic of Bitmap Indexes •
For columns with very few unique values (low cardinality)
Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1 % th column is ideal candidate, consider also 0.2% – 1%) •
Tables that have no or little insert/update are good candidates (static data in warehouse)
•
Stream of bits: each bit relates to a column value in a single row of table
create bitmap index person_region on person (region); Row 1 2 3 4 5 6
Region North East West South North 1 0 0 0 East 0 1 0 0 West 0 0 1 0 West 0 0 1 0 South 0 0 0 1 North 1 0 0 0
Advantage of Bitmap Indexes
The advantages of them are that they have a highly compressed structure, making them fast to read structure makes it possible for the system to combine multiple indexes together for fast access to the table.
Compressed indexes, like bitmap indexes, represent a trade-off between CPU usage and disk space u compressed structure is faster to read from disk but takes additional CPU cycles to decompress for a uncompressed structure imposes a lower CPU load but requires more bandwidth to read in a short tim
One belief concerning bitmap indexes is that they are only suitable for indexing low-cardinality data necessarily true, and bitmap indexes can be used very successfully for indexing columns with many different values. Disadvantage of Bitmap Indexes
The reason for confining bitmap indexes to data warehouses is that the overhead on maintaining th enormous. A modification to a bitmap index requires a great deal more work on behalf of the system modification to a b-tree index. In addition, the concurrency for modifications on bitmap indexes is dr Bitmap Indexes and Deadlocks
Bitmap indexes are not appropriate for tables that have lots of single row DML operations (inserts) a concurrent single row DML operations. Deadlock situations are the result of concurrent inserts as the example shows: Open two windows, one for Session 1 and one for Session 2 Session 1
Session 2
create table bitmap_index_demo ( value varchar2(20) );
Steven White
Raising Index Awareness
Page 55 of 73
Bitmap Index vs. B-tree Index: Which and When? by Vivek Sharma Understanding the proper application of each index can have a big impact on performance. Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index. In this article I'll provide some examples, along with optimizer decisions, that are common for both types of indexes on a low-cardinality column as well as a high-cardinality one. These examples will help DBAs understand that the usage of bitmap indexes is not in fact cardinality dependent but rather application dependent. Comparing the Indexes There are several disadvantages to using a bitmap index on a unique column—one being the need for sufficient space (and Oracle does not recommend it). However, the size of the bitmap index depends on the cardinality of the column on which it is created as well as the data distribution. Consequently, a bitmap index on the GENDER column will be smaller than a B-tree index on the same column. In contrast, a bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this column. But because fewer users access decision-support systems (DSS) systems than would access transaction-processing (OLTP) ones, resources are not a problem for these applications. To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order: Create table test_normal (empno number(10), ename varchar2(30), sal number(10)); Begin For i in 1..1000000 Loop Insert into test_normal values(i, dbms_random.string('U',30), dbms_random.value(1000,7000)); If mod(i, 10000) = 0 then Commit; End if; End loop; End; / Create table test_random as select /*+ append */ * from test_normal order by dbms_random.random; SQL> select count(*) "Total Rows" from test_normal; Total Rows ---------1000000
Steven White
Raising Index Awareness
Page 56 of 73
Elapsed: 00:00:01.09 SQL> select count(distinct empno) "Distinct Values" from test_normal; Distinct Values --------------1000000 Elapsed: 00:00:06.09 SQL> select count(*) "Total Rows" from test_random; Total Rows ---------1000000 Elapsed: 00:00:03.05 SQL> select count(distinct empno) "Distinct Values" from test_random; Distinct Values --------------1000000 Elapsed: 00:00:12.07
Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the above table, column EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this column as a primary key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes. To analyze the behavior of these indexes, we will perform the following steps: 1. On TEST_NORMAL: A. Create a bitmap index on the EMPNO column and execute some queries with equality predicates. B. Create a B-tree index on the EMPNO column, execute some queries with equality predicates, and compare the logical and physical I/Os done by the queries to fetch the results for different sets of values. 2. On TEST_RANDOM: A. Same as Step 1A. B. Same as Step 1B. 3. On TEST_NORMAL: A. Same as Step 1A, except that the queries are executed within a range of predicates. B. Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics. 4. On TEST_RANDOM: A. Same as Step 3A. B. Same as Step 3B. 5. On TEST_NORMAL: Steven White
Raising Index Awareness
Page 57 of 73
A. Create a bitmap index on the SAL column, and then execute some queries with equality predicates and some with range predicates. B. Create a B-tree index on the SAL column, and then execute some queries with equality predicates and some with range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results. 6.
Add a GENDER column to both of the tables, and update the column with three possible values: M for male, F for female, and null for N/A. This column is updated with these values based on some condition.
7. Create a bitmap index on this column, and then execute some queries with equality predicates. 8. Create a B-tree index on the GENDER column, and then execute some queries with equality predicates. Compare to results from Step 7. Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normalcardinality column, and Steps 7 and 8 a low-cardinality column. Step 1A (on TEST_NORMAL) In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index. SQL> create bitmap index normal_empno_bmx on test_normal(empno); Index created. Elapsed: 00:00:29.06 SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns; Table analyzed. Elapsed: 00:00:19.01 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX'); SEGMENT_NAME -----------------------------------TEST_NORMAL NORMAL_EMPNO_BMX
Size in MB --------------50 28
Elapsed: 00:00:02.00 SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME -----------------------------NORMAL_EMPNO_BMX
CLUSTERING_FACTOR --------------------------------1000000
Elapsed: 00:00:00.00
You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the table. Now let's execute the queries with equality predicates for different sets of values: SQL> set autotrace only SQL> select * from test_normal where empno=&empno; Enter value for empno: 1000 old 1: select * from test_normal where empno=&empno
Steven White
Raising Index Awareness
Page 58 of 73
new
1: select * from test_normal where empno=1000
Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car d=1 Bytes=34) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 515 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Step 1B (on TEST_NORMAL) Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As before, we will check for the size of the index and its clustering factor and execute the same queries for the same set of values, to compare the I/Os. SQL> drop index NORMAL_EMPNO_BMX; Index dropped. SQL> create index normal_empno_idx on test_normal(empno); Index created. SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns; Table analyzed. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX'); SEGMENT_NAME ---------------------------------TEST_NORMAL NORMAL_EMPNO_IDX
Size in MB --------------50 18
SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME ---------------------------------NORMAL_EMPNO_IDX
CLUSTERING_FACTOR ---------------------------------6210
It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO column. The clustering factor of the B-tree index is much nearer to the number of blocks in a table; for that reason, the B-tree index is efficient for range predicate queries. Now we'll run the same queries for the same set of values, using our B-tree index.
Steven White
Raising Index Awareness
Page 59 of 73
SQL> set autot trace SQL> select * from test_normal where empno=&empno; Enter value for empno: 1000 old 1: select * from test_normal where empno=&empno new 1: select * from test_normal where empno=1000 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car d=1 Bytes=34) 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C ost=3 Card=1) Statistics ---------------------------------------------------------29 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 515 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
As you can see, when the queries are executed for different set of values, the number of consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100percent unique column. BITMAP B-TREE EMPNO Consistent Consistent Physical Physical Reads Reads Reads Reads 5
0
1000
5
0
5
2
2398
5
2
5
2
8545
5
2
5
2
98008
5
2
5
2
85342
5
2
5
2
128444
5
2
5
2
858
5
2
Step 2A (on TEST_RANDOM) Now we'll perform the same experiment on TEST_RANDOM: SQL> create bitmap index random_empno_bmx on test_random(empno); Index created. SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns; Table analyzed. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
Steven White
Raising Index Awareness
Page 60 of 73
2 from user_segments 3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX'); SEGMENT_NAME -----------------------------------TEST_RANDOM RANDOM_EMPNO_BMX
Size in MB --------------50 28
SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME -----------------------------RANDOM_EMPNO_BMX
CLUSTERING_FACTOR --------------------------------1000000
Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table: SQL> select * from test_random where empno=&empno; Enter value for empno: 1000 old 1: select * from test_random where empno=&empno new 1: select * from test_random where empno=1000 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 515 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Step 2B (on TEST_RANDOM) Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO column. SQL> drop index RANDOM_EMPNO_BMX; Index dropped. SQL> create index random_empno_idx on test_random(empno); Index created. SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns; Table analyzed. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');
Steven White
Raising Index Awareness
Page 61 of 73
SEGMENT_NAME ---------------------------------TEST_RANDOM RANDOM_EMPNO_IDX
Size in MB --------------50 18
SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME ---------------------------------RANDOM_EMPNO_IDX
CLUSTERING_FACTOR ---------------------------------999830
This table shows that the size of the index is equal to the size of this index on TEST_NORMAL table but the clustering factor is much nearer to the number of rows, which makes this index inefficient for range predicate queries (which we'll see in Step 4). This clustering factor will not affect the equality predicate queries because the rows have 100-percent distinct values and the number of rows per key is 1. Now let's run the queries with equality predicates and the same set of values. SQL> select * from test_random where empno=&empno; Enter value for empno: 1000 old 1: select * from test_random where empno=&empno new 1: select * from test_random where empno=1000 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34) 2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 515 bytes sent via SQL*Net to client 499 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the amount of consistent gets and physical reads for a unique column. Step 3A (on TEST_NORMAL) In this step, we will create the bitmap index (similar to Step 1A). We know the size and the clustering factor of the index, which equals the number of rows in the table. Now let's run some queries with range predicates. SQL> select * from test_normal where empno between &range1 and &range2; Enter value for range1: 1 Enter value for range2: 2300 old 1: select * from test_normal where empno between &range1 and &range2 new 1: select * from test_normal where empno between 1 and 2300 2300 rows selected.
Steven White
Raising Index Awareness
Page 62 of 73
Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 331 consistent gets 0 physical reads 0 redo size 111416 bytes sent via SQL*Net to client 2182 bytes received via SQL*Net from client 155 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2300 rows processed
Step 3B (on TEST_NORMAL) In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it. SQL> select * from test_normal where empno between &range1 and &range2; Enter value for range1: 1 Enter value for range2: 2300 old 1: select * from test_normal where empno between &range1 and &range2 new 1: select * from test_normal where empno between 1 and 2300 2300 rows selected. Elapsed: 00:00:00.02 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166) 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 329 consistent gets 15 physical reads 0 redo size 111416 bytes sent via SQL*Net to client 2182 bytes received via SQL*Net from client 155 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2300 rows processed
When these queries are executed for different sets of ranges, the results below show:
Steven White
Raising Index Awareness
Page 63 of 73
BITMAP Consistent Reads
Physical Reads
331
0
285
EMPNO (Range)
B-TREE Consistent Reads
Physical Reads
1-2300
329
0
0
8-1980
283
0
346
19
1850-4250
344
16
427
31
28888-31850
424
28
371
27
82900-85478
367
23
2157
149
984888-1000000 2139
35
As you can see, the number of consistent gets and physical reads with both indexes is again nearly identical. The last range (984888-1000000) returned almost 15,000 rows, which was the maximum number of rows fetched for all the ranges given above. So when we asked for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239 and 5,663, respectively. Step 4A (on TEST_RANDOM) In this step, we will run the queries with range predicates on the TEST_RANDOM table with bitmap index and check for consistent gets and physical reads. Here you'll see the impact of the clustering factor. SQL>select * from test_random where empno between &range1 and &range2; Enter value for range1: 1 Enter value for range2: 2300 old 1: select * from test_random where empno between &range1 and &range2 new 1: select * from test_random where empno between 1 and 2300 2300 rows selected. Elapsed: 00:00:08.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 2463 consistent gets 1200 physical reads 0 redo size 111416 bytes sent via SQL*Net to client 2182 bytes received via SQL*Net from client 155 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2300 rows processed
Step 4B (on TEST_RANDOM)
Steven White
Raising Index Awareness
Page 64 of 73
In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of this index was very close to the number of rows in a table (and thus inefficient). Here's what the optimizer has to say about that: SQL> select * from test_random where empno between &range1 and &range2; Enter value for range1: 1 Enter value for range2: 2300 old 1: select * from test_random where empno between &range1 and &range2 new 1: select * from test_random where empno between 1 and 2300 2300 rows selected. Elapsed: 00:00:03.04 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166) 1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 6415 consistent gets 4910 physical reads 0 redo size 111416 bytes sent via SQL*Net to client 2182 bytes received via SQL*Net from client 155 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2300 rows processed
The optimizer opted for a full table scan rather than using the index because of the clustering factor: BITMAP Consistent Reads
Physical Reads
2463
1200
2114
EMPNO (Range)
B-TREE Consistent Reads
Physical Reads
1-2300
6415
4910
31
8-1980
6389
4910
2572
1135
1850-4250
6418
4909
3173
1620
28888-31850
6456
4909
2762
1358
82900-85478
6431
4909
7254
3329
984888-1000000 7254
4909
For the last range (984888-1000000) only, the optimizer opted for a full table scan for the bitmap index, whereas for all ranges, it opted for a full table scan for the B-tree index. This disparity is due to the clustering factor: The optimizer does not consider the value of the clustering factor when generating execution plans using a bitmap index, whereas for a Btree index, it does. In this scenario, the bitmap index performs more efficiently than the Btree index. The following steps reveal more interesting facts about these indexes. Step 5A (on TEST_NORMAL)
Steven White
Raising Index Awareness
Page 65 of 73
Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has normal cardinality. SQL> create bitmap index normal_sal_bmx on test_normal(sal); Index created. SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns; Table analyzed.
Now let's get the size of the index and the clustering factor. SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2* from user_segments 3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX'); SEGMENT_NAME -----------------------------TEST_NORMAL NORMAL_SAL_BMX
Size in MB -------------50 4
SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME -----------------------------NORMAL_SAL_BMX
CLUSTERING_FACTOR ---------------------------------6001
Now for the queries. First run them with equality predicates:
SQL> set autot trace SQL> select * from test_normal where sal=&sal; Enter value for sal: 1869 old 1: select * from test_normal where sal=&sal new 1: select * from test_normal where sal=1869 164 rows selected. Elapsed: 00:00:00.08 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 165 consistent gets 0 physical reads 0 redo size 8461 bytes sent via SQL*Net to client 609 bytes received via SQL*Net from client 12 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 164 rows processed
and then with range predicates: SQL> select * from test_normal where sal between &sal1 and &sal2; Enter value for sal1: 1500 Enter value for sal2: 2000
Steven White
Raising Index Awareness
Page 66 of 73
old new
1: select * from test_normal where sal between &sal1 and &sal2 1: select * from test_normal where sal between 1500 and 2000
83743 rows selected. Elapsed: 00:00:05.00 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes =2001024) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 11778 consistent gets 5850 physical reads 0 redo size 4123553 bytes sent via SQL*Net to client 61901 bytes received via SQL*Net from client 5584 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 83743 rows processed
Now drop the bitmap index and create a B-tree index on TEST_NORMAL. SQL> create index normal_sal_idx on test_normal(sal); Index created. SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns; Table analyzed.
Take a look at the size of the index and the clustering factor.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX'); SEGMENT_NAME -----------------------------TEST_NORMAL NORMAL_SAL_IDX
Size in MB --------------50 17
SQL> select index_name, clustering_factor from user_indexes; INDEX_NAME -----------------------------NORMAL_SAL_IDX
CLUSTERING_FACTOR ---------------------------------986778
In the above table, you can see that this index is larger than the bitmap index on the same column. The clustering factor is also near the number of rows in this table. Now for the tests; equality predicates first: SQL> set autot trace SQL> select * from test_normal where sal=&sal; Enter value for sal: 1869 old 1: select * from test_normal where sal=&sal new 1: select * from test_normal where sal=1869 164 rows selected.
Steven White
Raising Index Awareness
Page 67 of 73
Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032) 2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 177 consistent gets 0 physical reads 0 redo size 8461 bytes sent via SQL*Net to client 609 bytes received via SQL*Net from client 12 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 164 rows processed
...and then, range predicates: SQL> select * from test_normal where sal between &sal1 and &sal2; Enter value for sal1: 1500 Enter value for sal2: 2000 old 1: select * from test_normal where sal between &sal1 and &sal2 new 1: select * from test_normal where sal between 1500 and 2000 83743 rows selected. Elapsed: 00:00:04.03 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes =2001024) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 11778 consistent gets 3891 physical reads 0 redo size 4123553 bytes sent via SQL*Net to client 61901 bytes received via SQL*Net from client 5584 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 83743 rows processed
When the queries were executed for different set of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent gets and physical reads are identical. BITMAP B-TREE SAL Rows Consistent Physical (Equality) Consistent Reads Physical Fetched Reads Reads Reads 165 Steven White
0
1869
177
Raising Index Awareness
164 Page 68 of 73
169
163
3548
181
167
174
166
6500
187
172
75
69
7000
81
73
177
163
2500
190
175
BITMAP Consistent Reads
Physical Reads
11778
5850
11765
SAL (Range)
B-TREE
Rows Fetched
Consistent Reads
Physical Reads
1500-2000
11778
3891
83743
5468
2000-2500
11765
3879
83328
11753
5471
2500-3000
11753
3884
83318
17309
5472
3000-4000
17309
3892
166999
39398
5454
4000-7000
39398
3973
500520
For range predicates the optimizer opted for a full table scan for all the different set of values—it didn't use the indexes at all—whereas for equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical. Consequently, you can conclude that for a normal-cardinality column, the optimizer decisions for the two types of indexes were the same and there were no significant differences between the I/Os. Step 6 (add a GENDER column) Before performing the test on a low-cardinality column, let's add a GENDER column to this table and update it with M, F, and null values. SQL> alter table test_normal add GENDER varchar2(1); Table altered. SQL> select GENDER, count(*) from test_normal group by GENDER; S F M
COUNT(*) ---------333769 499921 166310
3 rows selected.
The size of the bitmap index on this column is around 570KB, as indicated in the table below: SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER); Index created. Elapsed: 00:00:02.08 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX'); SEGMENT_NAME -----------------------------TEST_NORMAL
Steven White
Size in MB --------------50
Raising Index Awareness
Page 69 of 73
NORMAL_GENDER_BMX
.5625
2 rows selected.
In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the bitmap index on this column. SQL> create index normal_GENDER_idx on test_normal(GENDER); Index created. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 2 from user_segments 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX'); SEGMENT_NAME -----------------------------TEST_NORMAL NORMAL_GENDER_IDX
Size in MB --------------50 13
2 rows selected.
Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan. SQL> select * from test_normal where GENDER is null; 166310 rows selected. Elapsed: 00:00:06.08 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750) SQL> select * from test_normal where GENDER='M'; 499921 rows selected. Elapsed: 00:00:16.07 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921Bytes=12498025) SQL>select * from test_normal where GENDER='F' / 333769 rows selected. Elapsed: 00:00:12.02 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte s=8344225) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769 Bytes=8344225)
Conclusions
Steven White
Raising Index Awareness
Page 70 of 73
Now that we understood how the optimizer reacts to these techniques, let's examine a scenario that clearly demonstrates the best respective applications of bitmap indexes and Btree indexes. With a bitmap index on the GENDER column in place, create another bitmap index on the SAL column and then execute some queries. The queries will be re-executed with B-tree indexes on these columns. From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the following values: 1000 1500 2000 2500 3000 3500 4000 4500
Thus: SQL>select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
This is a typical data warehouse query, which, of course, you should never execute on an OLTP system. Here are the results with the bitmap index in place on both columns: SQL>select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M'; 1453 rows selected. Elapsed: 00:00:02.03 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850) 2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP AND 4 3 BITMAP OR 5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' 14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX' Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 1353 consistent gets 920 physical reads 0 redo size 75604 bytes sent via SQL*Net to client 1555 bytes received via SQL*Net from client 98 SQL*Net roundtrips to/from client
Steven White
Raising Index Awareness
Page 71 of 73
0 0 1453
sorts (memory) sorts (disk) rows processed
And with the B-tree index in place:
SQL>select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M'; 1453 rows selected. Elapsed: 00:00:03.01 Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850) 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850) Statistics ---------------------------------------------------------0 recursive calls 0 db block gets 6333 consistent gets 4412 physical reads 0 redo size 75604 bytes sent via SQL*Net to client 1555 bytes received via SQL*Net from client 98 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1453 rows processed
As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas in the case of the bitmap index, it used the index to answer the query. You can deduce performance by the number of I/Os required to fetch the result. In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons: •
With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be inefficient.)
•
With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).
•
Most important, bitmap indexes in DSS systems support ad hoc queries, whereas Btree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them—either the combination of all 10 columns or sometimes a single column—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns, all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10, or on a single column. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be used by a query. This limit is not imposed with bitmap indexes.
In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are relatively routine (and well tuned before deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours. Because data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause a serious locking problem in these situations. Steven White
Raising Index Awareness
Page 72 of 73
The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use should depend purely on the type of application, not on the level of cardinality.
Steven White
Raising Index Awareness
Page 73 of 73