What is a collection? A collection is nothing more than an array. The most basic collection is a single dimension array. Later on in this article, we will get into some examples of very complex arrays but we'll start with the simple stuff. Associative Array In Oracle 7, Oracle introduced an index by table, which was also called a PL/SQL table. This table was different from a SQL table in that it existed only in PL/SQL and you could NOT perform SQL against it. The most basic example of an index by table: TYPE a_number IS TABLE OF NUMBER INDEX BY BINARY_INTEGER. This creates an array of unlimited size (up to your OS and DB version limitations) of NUMBER which is indexed by a BINARY_INTEGER datatype. The index is just the subscript and BINARY_INTEGER is just a numeric data type. An index by table does NOT have to be initialized. Below is a small example showing how to create an array, create a variable from the array and then populate it with 5 values and display them in a loop:
DECLARE -- Declare the type TYPE a_char_data IS TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER; --Declare a variable of that type v_char_data a_char_data;
BEGIN -- Populate some data v_char_data(1) := 'ABC'; v_char_data(2) := 'DEF'; v_char_data(3) := '1234567890'; v_char_data(4) := 'X+Y=Z'; v_char_data(5) := 'DE NADA';
-- Loop thru the data and display via dbms_pitput FOR i IN v_char_data.FIRST..v_char_data.LAST LOOP dbms_output.put_line( v_char_data(i) ); END LOOP; END; / This example introduced two Collection Methods. If you've read my Sorcerer of OO series, you know that a method is just a function assigned to an object type. The FIRST method returns the first subscript in a collection, in this case a 1, and the LAST method returns the last subscript in a collection, in this case a 5. I'll talk about additional methods later in this article. Records Before I go any further with collections, I think it's important to understand another feature Oracle introduced in v7, records. You can think of a PL/SQL record just like you would the column layout in a table. A PL/SQL record is for use in PL/SQL programs.
DECLARE -- Declare the record type TYPE r_data IS RECORD ( ssn VARCHAR2(9) NOT NULL := -1, name VARCHAR2(30), dob DATE ); --Declare a variable of that type v_data r_data;
BEGIN -- Set some values v_data.ssn := '123456789'; v_data.name := 'Lewis';
v_data.dob := '01-JAN-1900'; END; / You can also declare a record using the structure of a table, i.e. SCOTT.EMP%ROWTYPE. That's a subject for a different discussion, however. When you combine a record type with a collection type, you have a very powerful set of tools for building in memory data structures. With a minor change to the above examples, we can create a ragged, multi-dimensional array:
DECLARE -- Declare a basic table type type TYPE a_char_data IS TABLE OF VARCHAR2(10) INDEX BY BINARY_INTEGER; -- Declare a complex record type TYPE r_data IS RECORD ( ssn VARCHAR2(9) NOT NULL := -1, name a_char_data,
-- Notice the table_type used here
dob DATE ); -- Declare a index by table using the complex record type TYPE a_multi IS TABLE OF r_data INDEX BY BINARY_INTEGER; -- Declare a variable using the complex array v_data a_multi;
BEGIN -- Set some values v_data(1).ssn := '123456789'; v_data(1).dob := '01-JAN-1900';
-- Notice the second subscript v_data(1).name(1) := 'Lewis'; v_data(1).name(2) := 'Joe';
dbms_output.put_line(v_data(1).ssn); -- Loop through the v_data(1).name table FOR i IN v_data(1).name.FIRST..v_data(1).name.LAST LOOP dbms_output.put_line(v_data(1).name(i)); END LOOP; END; / Let me walk you through exactly what this example is doing: •
•
In the o o o
declare section: First we declare our very basic array of VARCHAR2(10). Next we declare a record type with an embedded index by table. So now we have a ragged record type. A single record with a dimensional name column. o Then we declare a new table based on the complex record type. This creates an array of ragged records. o And finally, we create the complex variable. Executeable section: o We populated the ssn and dob columns of the first record of the v_data variable. o Next we populated the first and second rows of the name table in the first row of the v_data variable. o Finally, we displayed the ssn of the first row of the v_data variable and then looped through the name table of the first row of the v_data variable.
Phew, again! If you are not quite getting this, stick with it. Cut and paste it into TOAD or PL/SQL Developer or something like that. It is confusing at first but it is so hugely powerful that it's worth the time. Don't try this in v7 though. Oracle has continuously made improvements to both PL/SQL in general and to collections specifically. As a matter of a fact, it was a while before you could have anything but single dimension tables (you could not combine records and collections). And I'm not sure exactly which release added multi-dimensional tables but it was very recent.
Oracle 10g also added a new feature to collections that has been a long time coming. Associative arrays! Wait a minute. Isn't that just index by tables with a new name? Sort of. What they added was index by varchar2. So now you can have something like this: DECLARE -- Declare the type using varchar2 TYPE a_depts IS TABLE OF NUMBER INDEX BY VARCHAR2(10); --Declare a variable of that type v_depts a_depts; -- Declare a counter i VARCHAR2(10); BEGIN -- Loop thru scott.emp FOR c_depts IN ( SELECT deptno, ename
FROM SCOTT.emp ) LOOP
-- Load up the array with values from scott.emp v_depts(c_depts.ename) := c_depts.deptno; END LOOP;
-- Initialize our CHAR counter, i.e. ename i := v_depts.FIRST; -- Loop thru the data LOOP -- Exit when the next value is null, i.e. does not exist EXIT WHEN NOT v_depts.EXISTS(i); dbms_output.put_line( i || ' is in dept: ' || to_char(v_depts(i)) ); -- Increment the counter i := v_depts.next(i);
END LOOP;
END; / I introduced two new collection methods here. EXISTS and NEXT. EXISTS accepts a subscript and returns TRUE if the element of the array at that subscript exists. NEXT returns the subscript of the next elements. PRIOR was not used here but it would return the subscript of the previous element in the array. By using PRIOR and NEXT, you can loop through arrays indexed by nonnumeric data as well as loop through sparse arrays. I'll cover sparse arrays in one of the next entries on this topic.
Turbocharged DML with FORALL Oracle introduced two new data manipulation language (DML) statements for PL/SQL in Oracle8i: BULK COLLECT and FORALL. Both statements implement a form of array processing inside PL/SQL; BULK COLLECT facilitates high-speed retrieval of data, and FORALL greatly improves performance of INSERT, UPDATE, and DELETE operations. Oracle Database achieves significant performance gains with these statements by greatly reducing the number of context switches between the PL/SQL and SQL statement execution engines. With BULK COLLECT, you fetch multiple rows into one or more collections, rather than individual variables or records. The following example of BULK COLLECT retrieves all books containing "PL/SQL" in their titles into an associative array of records, all in a single pass to the database. DECLARE TYPE books_aat IS TABLE OF book%ROWTYPE books books_aat; BEGIN SELECT * BULK COLLECT INTO book FROM books WHERE title LIKE '%PL/SQL%'; ... END;
INDEX BY PLS_INTEGER;
Similarly, FORALL transfers data from a PL/SQL collection to the specified table using collections. The following code example shows a procedure that accepts a nested table of book information and inserts the full contents of that collection (the binding array) into the book table. Note that this example also takes advantage of Oracle9i's FORALL enhancement that allows the INSERT of a record directly into a table.
and FORALL are very helpful, not only in improving performance but also in simplifying the code you need to write for SQL operations in PL/SQL. The following FORALL INSERT of multiple rows demonstrates rather clearly why PL/SQL is considered to be the best programming language for the Oracle database. BULK COLLECT
CREATE TYPE books_nt IS TABLE OF book%ROWTYPE; / CREATE OR REPLACE PROCEDURE add_books ( books_in IN books_nt ) IS BEGIN FORALL book_index IN books_in.FIRST .. books_in.LAST INSERT INTO book VALUES books_in(book_index); ... END;
Prior to Oracle Database 10g, however, there was an important restriction on using collections with FORALL: The database read the contents of the collection sequentially from the first to the last row in the IN range clause. If it encountered a row within that range that was undefined, Oracle Database would raise the ORA-22160 exception: ORA-22160: element at index [N] does not exist
For simple applications of FORALL, this rule would not cause any difficulties. If, however, you want to take full advantage of FORALL in the widest possible range of circumstances, the requirement that any FORALL driving array be sequentially populated might increase program complexity and degrade performance. In Oracle Database 10g, PL/SQL now offers two new clauses in the FORALL statement, INDICES OF and VALUES OF, which allow you to choose very selectively which rows from the driving array should be processed by the extended DML statement. comes in very handy when your binding array is sparse or contains gaps. The syntax for this statement is: INDICES OF
FORALL indx IN INDICES OF sparse_collection INSERT INTO my_table VALUES sparse_collection (indx);
responds to a different situation: My binding array may or may not be sparse, but I want to use only a subset of the elements in that array. I can then use VALUES OF to point to just the values I wish to use in my DML operation. The syntax for this statement is: VALUES OF
FORALL indx IN VALUES OF pointer_array INSERT INTO my_table VALUES binding_array (indx);
In the remainder of this article, I will take a look at how to move from a CURSOR FOR loop to a FORALL implementation and how the VALUES OF clause can ease your programming tasks. Moving FOR Loops to FORALL Suppose I need to write a procedure that gives raises to employees who are eligible (as determined by the comp_analysis.is_eligible function) and writes reports to the employee_history table for any employees who are ineligible for that raise. I work for a very large corporation; we have lots and lots of employees. This is not a terribly challenging task for a PL/SQL developer. I don't even need to use BULK COLLECT or FORALL to get the job done, as I demonstrate in Listing 1. Instead, I use a CURSOR FOR loop and individual INSERT and UPDATE statements. This is straightforward code; unfortunately, I have been given 10 minutes in which to run this code, and my "oldfashioned" approach is running 30 minutes or more. Fortunately, my company upgraded to Oracle9i and, even more fortunately, I learned about bulk processing at a recent Oracle seminar (as well as from the excellent demonstrations provided on Oracle Technology Network). So I decide to rewrite my program, using collections and bulk processing. The result is shown in Listing 2. A quick glance at Listings 1 and 2 should make one point very clear: converting to collections and bulk processing can increase the volume and complexity of your code. If you need a serious boost in performance, however, that increase is well-justified. Now let's move beyond the quick glance and see what it takes to handle the conditional logic inside a CURSOR FOR loop when using FORALL. Defining Collection Types and Collections In Listing 2, the first part of the declaration section (lines 6 through 11) defines several different collection TYPEs, one for each of the columns I will be retrieving from the employee table. I would rather declare a single collection TYPE based on employee%ROWTYPE, but FORALL does not yet support operations on collections of records in which I reference individual fields. So I still must declare a separate collection for each column for employee IDs, salaries, and hire dates. Next, I will declare the collections I need for each of those columns (lines 13 through 21). First, I define the collections corresponding to the queried columns (lines 13 through 15): employee_ids employee_aat; salaries salary_aat; hire_dates hire_date_aat;
I then need a new collection that holds the IDs only of those employees who are approved for a raise (line 17):
approved_employee_ids employee_aat;
Finally, I declare one collection for each column that will be used to record those who were ineligible for raises (lines 19 through 21): denied_employee_ids employee_aat; denied_salaries salary_aat; denied_hire_dates hire_date_aat;
Drilling Down Into the Code With these data structures in place, let's now skip down to the executable section of this procedure (lines 72 through 75), to begin to understand how the collections are used to speed up my process. retrieve_employee_info; partition_by_eligibility; add_to_history; give_the_raise;
I have used step-wise refinement (also known as "top-down design") to write this program. So rather than have a very long and hard-to-follow executable section, I have a four-line section that describes each step in my procedure by name. First, I retrieve the employee information (all the employees in the specified department). Then I partition or separate out those employees who will receive a raise and those who are denied. Once partitioned, I can add the denied employees to the employee history table and give a raise to the rest. Writing code in this fashion makes the end result much more readable. I can then drill down into whichever area of the program is of concern to me. With my collections declared, I can now use BULK COLLECT to retrieve the employee information (lines 23 through 30). This effectively replaces the CURSOR FOR loop. Now my data is loaded into my collections. The partition logic (lines 32 through 46) requires that I go through each row in the collections I just populated and check the eligibility for a raise. If approved, then I copy the employee ID from my query-populated collection to the approved collection. If the person is not eligible, I copy the employee ID, salary, and hire date, because all are needed for the insert into the employee_history table. With my original data now separated into two collections, I can use each of those collections as drivers for two different FORALL statements (starting on lines 51 and 66). I insert the denied collection data in bulk to the employee_history table (add_to_history), and I update the approved employee information in bulk in the employee table in the give_the_raise procedure.
Let's finish our examination of this rewrite with a closer look at add_to_history (lines 48 through 61). The FORALL statement (line 51) contains an IN clause that specifies the range of row numbers to be used in the bulk INSERT. In my explanation of the second rewrite of my procedure, I will refer to the collection used to define that range as the "driving collection." In this version of add_to_history, however, I simply say: use all the rows defined in denied_employee_ids. Within the INSERT itself, I use all three of the denied collections; I will refer to these collections as the "data collections." As you can see, the driving and data collections do not need to match. This is a critical point when learning to use the new features in Oracle Database 10g. The bottom line, then, of my rewrite in Listing 2 is that I have approximately twice the number of lines of code as in Listing 1, but the Listing 2 code runs within the required window of time. Prior to Oracle Database 10g, in such a situation I would simply be glad that I was able to fit within the window and move on to my next tasks. With the latest version of PL/SQL in Oracle Database 10g, however, I now have further opportunities for improvement, in terms of performance, readability, and code volume. Applying VALUES OF to the Process In Oracle Database 10g, you can now specify a subset of rows in the driving collection to be used by a FORALL statement. You can define that subset in one of two ways: • •
Match the row numbers in the data collection(s) with the row numbers in the driving collection. You would do this with the INDICES OF clause. Match the row numbers in the data collection(s) with the values found in the defined rows of the driving collection. You would do this with the VALUES OF clause.
I will use the VALUES OF clause in my second and final rewrite of give_raises_in_department. Listing 3 contains the full code for this version. I will skip over the parts of this program that are unchanged from the previous version. Starting with the declaration of collections, notice that I no longer define additional collections to hold the approved and denied employee information. Instead, in Listing 3 (lines 17 through 21) I declare two "guide" collections: one for approved employees and another for those denied raises. The datatype of both these collections is Boolean; as you will soon see, the datatype of the collections doesn't matter in the least. The only information the FORALL statement cares about is which rows are defined.
Implementation Elapsed Time CURSOR FOR loop
00:00:38.01
The elapsed times for the three implementations of Pre-Oracle give_raises_in_department Database 10g 00:00:06.09 with 50,000 rows in the bulk processing employee table. Oracle Database 00:00:02.06 10g bulk processing
Implementation Elapsed Time CURSOR FOR loop
00:00:58.01
The elapsed times for the three implementations of Pre-Oracle give_raises_in_department Database 10g 00:00:12.00 with 100,000 rows in the bulk processing employee table Oracle Database 00:00:05.05 10g bulk processing Table 1: Testing the timing with 50,000 and 100,000 rows The retrieve_employee_info subprogram is the same as before, but the way I partition my data is now quite different (lines 32 through 44). Rather than copy records of information from one collection to another (a relatively slow operation), I simply define the row (by assigning it a value of TRUE) in the appropriate guide collection that matches the row number of the employee IDs collection. I can now use the approved_list and denied_list collections as the driving collections in two different FORALL statements (starting on lines 49 and 65). To insert into the employee_history table, I use this formulation: FORALL indx IN VALUES OF denied_list
And to perform the updates (to give employees their raises), I use this formulation: FORALL indx IN VALUES OF approved_list
Within both DML statements, the data collections are the original collections populated in the BULK COLLECT retrieval step; no copies were made. With VALUES OF, Oracle Database chooses among the rows in those data collections, using only those rows whose row numbers match those of the driving collection. By taking advantage of VALUES OF in this program, I was able to avoid copying entire records of information, replacing them with a simple list of row numbers. For large arrays, the overhead of performing these copies can be quite noticeable. To test the advantages of the Oracle Database 10g approach, I loaded the employee table and ran tests with 50,000 and then 100,000 rows of data. I modified the pre-Oracle Database 10g bulk-processing implementation to perform many more copies of collection contents in order to emulate more of a real-world scenario. Then I used SQL*Plus SET TIMING ON to show the elapsed time for running each of my various implementations. Table 1 shows the results. The conclusions from these timings are clear: Switching from individual DML statements to bulk processing results in greatly reduced elapsed time, from 38 to 6 seconds with 50,000 rows and from 58 to 12 seconds with 100,000 rows. Furthermore, by avoiding the copies of data with VALUES OF, I was able to cut my elapsed time roughly in half. Even without the improvement in performance, VALUES OF and its sister clause, INDICES OF, increase the flexibility of the PL/SQL language and make it easier for developers to write more-intuitive and more-maintainable code. PL/SQL is, at this point in the product's life, a mature and powerful language. As a consequence, many of its new features are incremental and evolutionary. Nevertheless, these new features can make a significant difference in an application's performance and a developer's productivity. VALUES OF is a fine example of such a feature.