SQL: Joins A join is used to combine rows from multiple tables. A join is performed whenever two or more tables is listed in the FROM clause of an SQL statement. There are different kinds of joins. Let's take a look at a few examples.
Inner Join (simple join) Chances are, you've already written an SQL statement that uses an inner join. It is is the most common type of join. Inner joins return all rows from multiple tables where the join condition is met. For example, SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM suppliers, orders WHERE suppliers.supplier_id = orders.supplier_id; This SQL statement would return all rows from the suppliers and orders tables where there is a matching supplier_id value in both the suppliers and orders tables. Let's look at some data to explain how inner joins work: We have a table called suppliers with two fields (supplier_id and supplier_ name). It contains the following data: supplier_id
supplier_name
10000
IBM
10001
Hewlett Packard
10002
Microsoft
10003
Nvidia
We have another table called orders with three fields (order_id, supplier_id, and order_date). It contains the following data: order_id
supplier_id
order_date
500125
10000
2003/05/12
500126
10001
2003/05/13
If we ran the SQL statement below: SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM suppliers, orders WHERE suppliers.supplier_id = orders.supplier_id; Our result set would look like this: supplier_id
name
order_date
10000
IBM
2003/05/12
10001
Hewlett Packard
2003/05/13
The rows for Microsoft and Nvidia from the supplier table would be omitted, since the supplier_id's 10002 and 10003 do not exist in both tables. Mahendra Sajjala
Page 1 of 12
Joins
Outer Join Another type of join is called an outer join. This type of join returns all rows from one table and only those rows from a secondary table where the joined fields are equal (join condition is met). For example, select suppliers.supplier_id, suppliers.supplier_name, orders.order_date from suppliers, orders where suppliers.supplier_id = orders.supplier_id(+); This SQL statement would return all rows from the suppliers table and only those rows from the orders table where the joined fields are equal. The (+) after the orders.supplier_id field indicates that, if a supplier_id value in the suppliers table does not exist in the orders table, all fields in the orders table will display as in the result set. The above SQL statement could also be written as follows: select suppliers.supplier_id, suppliers.supplier_name, orders.order_date from suppliers, orders where orders.supplier_id(+) = suppliers.supplier_id Let's look at some data to explain how outer joins work: We have a table called suppliers with two fields (supplier_id and name). It contains the following data: supplier_id
supplier_name
10000
IBM
10001
Hewlett Packard
10002
Microsoft
10003
Nvidia
We have a second table called orders with three fields (order_id, supplier_id, and order_date). It contains the following data: order_id
supplier_id
order_date
500125
10000
2003/05/12
500126
10001
2003/05/13
If we ran the SQL statement below: select suppliers.supplier_id, suppliers.supplier_name, orders.order_date from suppliers, orders where suppliers.supplier_id = orders.supplier_id(+); Our result set would look like this: supplier_id
supplier_name
order_date
10000
IBM
2003/05/12
10001
Hewlett Packard
2003/05/13
10002
Microsoft
10003
Nvidia
Mahendra Sajjala
Page 2 of 12
Joins
The rows for Microsoft and Nvidia would be included because an outer join was used. However, you will notice that the order_date field for those records contains a value.
What's the difference between an inner join and an outer join? When you wish to get data from two related tables, you can use an inner join or an outer join to define how the data is related. For my examples, let's assume where is a table called "student" to hold student information, a table called "attendance" to hold student's attendance, and a table called StudentSchedule that holds a student's current schedule: Student has the following fields: ID FirstName LastName BirthDate Attendance has the following fields: ID AttendanceDate AttendanceCode MinutesAbsent StudentSchedule as the following fields: ID Course An inner join gets data from both tables where the specified data exists in both tables. For example, if you wanted a list of students in your database that were absent on December 4, 2003, you would use an inner join between the two examples tables "Student" and "Attendance": SELECT Student.ID, Student.FirstName, Student.LastName, Attendance.AttendanceCode, Attendance.MinutesAbsent FROM Student INNER JOIN Attendance ON Student.ID=Attendance.ID WHERE Attendance.AttendanceDate='12/4/2003' The above statement will only return students with attendance information on the specified date. Students who do not have attendance would not display. Like so: ID FirstName 10 Steve 32 Dale
LastName AttendanceCode Bartman Tardy Thropmorton ExcAbsent
MinutesAbsent 22 200
(maybe there are 200 kids in the database, but only Steve and Dale where absent on 12/4/2003. They are the only students to display) An outer join gets data from the source table at all times, and returns data from the outer joined table ONLY if it matches the criteria. You would use this type of join using my examples tables if you wanted a list of all students in a specified course, and you wanted attendance information if it existed. You would use an inner join between Student and StudentSchedule to
Mahendra Sajjala
Page 3 of 12
Joins
only get the students in the speicified course (for example 'ENGLISH 9'), but you would use an outer join against Attendance because you want ALL students in the course, not just students with attendance information on 12/4/2003. When using outer joins, fields will be set to NULL if data does not exist in the outer-joined table. SELECT Student.ID, Student.FirstName, Student.LastName, Attendance.AttendanceCode, Attendance.MinutesAbsent FROM Student INNER JOIN StudentSchedule ON StudentSchedule.ID=Student.ID LEFT OUTER JOIN Attendance ON Student.ID=Attendance.ID AND Attendance.AttendanceDate='12/4/2003' WHERE StudentSchedule.Course='ENGLISH 9' ID FirstName LastName AttendanceCode MinutesAbsent 10 Steve Bartman Tardy 22 32 Dale Thropmorton ExcAbsent 200 44 Jennifer Lopez NULL NULL Description A join is an operation that combines two tables to produce a third, subject to a restrictive condition. Every row of the new table must satisfy the restrictive condition. Joins provide the means of linking data in one table with data in another table and are frequently used in defining reports and queries. A join is used in a SELECT statement as part of the FROM clause. Caché supports many different syntactical forms of JOIN. However, these many formulations refer to the following four types of joins. Type of Join
Description
CROSS JOIN
Same as using a comma to separate two table names in a FROM clause.
INNER JOIN
Same as JOIN. Symbolic representation: "=" (in a WHERE clause).
LEFT OUTER JOIN
Same as LEFT JOIN. Symbolic representation: "=*" (in a WHERE clause).
RIGHT OUTER JOIN
Same as RIGHT JOIN. Symbolic representation: "*=" (in a WHERE clause).
The LEFT OUTER JOIN and the RIGHT OUTER JOIN are in most respects functionally identical (with reversed syntax), and thus are frequently referred to collectively as one-way outer joins. The INNER JOIN, LEFT OUTER JOIN, and RIGHT OUTER JOIN can take one (and only one) of the following qualifiers: a NATURAL keyword prefix, an ON clause, or a USING clause. Thus a NATURAL JOIN is any of these three types of join prefixed with the NATURAL keyword.
Mahendra Sajjala
Page 4 of 12
Joins
The following joins are only supported as the first join within a join expression: a RIGHT OUTER JOIN, a NATURAL JOIN (any type), or a JOIN with a USING clause (any type). Note that a JOIN can be recursive, such as involving the join argument in the last line of syntax, which can include any form of JOIN at that point. JOIN Definitions A CROSS JOIN is a join that crosses every row of the first table with every row of the second table. This results in a Cartesian product, a large, logically comprehensive table with much data duplication. An INNER JOIN is a join that links rows of the first table with rows of a second table, excluding any row in the first table that finds no corresponding row in the second table. This results in a small table in which every field contains unique data. A One-Way Outer Join (LEFT OUTER JOIN or RIGHT OUTER JOIN) is a join that links rows of the first (source) table with rows of a second table, including all rows from the first table even if there is no match in the second table. This results in a table in which some fields of the first (source) table may be paired with NULL data. When specifying a one-way outer join, the order in which you name the tables in the FROM clause is very important. For a LEFT OUTER JOIN, the first table you specify is the source table for the join. For a RIGHT OUTER JOIN the second table you specify is the source table for the join. These two types of one-way outer joins have different restrictions. Appending the word NATURAL to an inner or outer join specifies that you are joining on all the columns of the tables that have the same name. JOIN Restrictions For an INNER JOIN, there is no restriction on the conditional expression — anything works. For LEFT or RIGHT joins, the conditional expression in the ON clause must consist of ANDs or comparisons meeting one of the following criteria (given "A LEFT JOIN B ON conditional-expression" or "A RIGHT JOIN B ON conditional-expression"): •
A column from A equals a column from B (or vice versa).
•
For a LEFT JOIN, a field from B is a constant expression.
•
For a RIGHT JOIN, a field from A is a constant expression.
•
For a RIGHT OUTER JOIN, the left operand cannot be a join expression, a view, or a (query).
•
For a LEFT OUTER JOIN, the right operand cannot be a join expression. It can, however, be a view or a (query).
For further details, see OUTER JOIN Limitations in Using Caché SQL. For a NATURAL join, or a join with a USING clause, only simple base table references are supported for either operand. One-Way Outer Joins Caché supports one-way outer joins: LEFT OUTER JOIN and RIGHT OUTER JOIN. Caché does not support two-way outer joins. With standard "inner" joins, when rows of one table are linked with rows of a second table, a row in the first table that finds no corresponding row in the second table is excluded from the output table.
Mahendra Sajjala
Page 5 of 12
Joins
With one-way outer joins, all rows from the first table are included in the output table even if there is no match in the second table. With one-way outer joins, the first table pulls relevant information out of the second table but never sacrifices its own rows for lack of a match in the second table. For example, if a query lists Table1 first and creates a left outer join, then it should be able to see all the rows in Table1 even if they don't have corresponding records in Table2. When specifying a one-way outer join, the order in which you name the tables in the FROM clause is very important. For a left outer join, the first table you specify is the source table for the join. For a right outer join the second table you specify is the source table for the join. Outer Join Syntax Caché supports three formats for representing outer joins: 1. The ANSI standard syntax: LEFT OUTER JOIN and RIGHT OUTER JOIN. SQL Standard syntax puts the outer join in the FROM clause of the SELECT statement, rather than the WHERE clause, as shown in the following example: FROM tbl1 LEFT OUTER JOIN tbl2 ON (tbl1.key = tbl2.key) 2. The ODBC Specification outer join extension syntax, using the escape-syntax {oj ... } 3. Symbolic outer join extension syntax, using a condition such as A=*B in the WHERE clause. A left outer join is specified using the symbol =* in place of = in the WHERE clause. (Note that this is the reverse of the syntax used by Microsoft SQL Server.) A right outer join is specified using the symbol *= in place of = in the WHERE clause. These three formats are interchangeable and can also be mixed. However, we recommend the use of ANSI standard syntax, as it is the only one compatible with ODBC (and portable to the latest Microsoft products). Additionally, it can specify many operations not specifiable in principle with symbolic syntax. Examples of Outer Joins As an illustration of the difference between an inner join and a one-way outer join, consider the query: SELECT Patient.PName,Doctor.DName FROM Patient,Doctor WHERE Patient.Doctor = Doctor.Doctor The FROM clause specifies a cross join, and the WHERE clause restricts the results to an inner join. Thus, this query returns one row for each row of the "Patient" table that has a non-null "Doctor" field. By contrast, examine the following left outer join query: SELECT Patient.PName,Doctor.DName FROM Patient,Doctor WHERE Patient.Doctor =* Doctor.Doctor Though it differs only in the =* operator, this query returns all the rows of the previous query plus the rows of the "Patient" table that have null value for the
Mahendra Sajjala
Page 6 of 12
Joins
"Patient.Doctor" field. For such rows, "Doctor.Name" is returned as null. This is known as "null padding." Caché automatically performs null padding. The left outer join condition is expressed by the following syntax: A.x LEFT OUTER JOIN B.y This specifies that every row in A is returned. For each A row returned, if there is a B row such that A.x=B.y, all of the corresponding B values are also returned. If there is no B row such that A.x=B.y, null padding causes all B values for that A row to return as null. Order of Operations One-way outer join conditions, including the necessary null padding, are applied before other conditions. Therefore, a condition in the WHERE clause that cannot be satisfied by a null-padded value (for example, a range or equality condition on a field in B) effectively converts the one-way outer join of A and B into a regular join (an inner join). For example, if you add the clause "AND Doctor.Age < 45" to the two "Patient" table queries above, it makes them equivalent. However, if you add the clause "AND (Doctor.Age < 45 OR Doctor.Age IS NULL)", it preserves the difference between the two queries. Mixing Outer and Inner Joins In general, Caché supports all syntax of mixed inner and outer join, in any order, as long as no parenthesis is used. There are a few restrictions on the conditions within the ON clause of an outer join. For example, for a left outer join these conditions must be of the form A=B or of the form X=B, where A is a field in the left table, B is a field in the right table, and X is a constant or a host variable. The following examples display the results of the JOIN commands performed on Table1 and Table2. Table1 Table2 Column1 Column2 Column1 Column3 aaa bbb ggg hhh ccc ccc xxx zzz xxx yyy hhh zzz CROSS JOIN Example The statement: SELECT * FROM Table1 CROSS JOIN Table2 yields the table: Column1 Column2 Column1 Column3 aaa bbb ggg hhh aaa bbb xxx zzz ccc ccc ggg hhh ccc ccc xxx zzz xxx yyy ggg hhh xxx yyy xxx zzz hhh zzz ggg hhh hhh zzz xxx zzz NATURAL JOIN Example The statement: SELECT * FROM Table1 NATURAL JOIN Table2
Mahendra Sajjala
Page 7 of 12
Joins
yields the table Column1 Column2 Column1 Column3 xxx yyy xxx zzz Note that the Caché implementation of NATURAL JOIN does not merge columns with the same name. JOIN with an ON Clause Example The statement: SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Column1=Table2.Column3 yields the table: Column1 Column2 Column1 Column3 hhh zzz ggg hhh JOIN with a USING Clause Example The statement: SELECT * FROM Table1 INNER JOIN Table2 USING (Column1) yields the table: Column1 Column2 Column1 Column3 xxx yyy xxx zzz LEFT OUTER JOIN Example The statement: SELECT * FROM Table1 LEFT OUTER JOIN Table2 ON Table1.Column1=Table2.Column3 yields the table: Column1 Column2 Column1 Column3 aaa bbb null null ccc ccc null null xxx yyy null null hhh zzz ggg hhh RIGHT OUTER JOIN Example The statement: SELECT * FROM Table1 RIGHT OUTER JOIN Table2 ON Table1.Column1=Table2.Column3 yields the table: Column1 Column2 Column1 Column3 hhh zzz ggg hhh null null xxx zzz
So what are the other types of Joins? As I told you initially, there are many other types of Joins. Inner Joins and Outer Joins are 2 of them.
What are Inner Joins? Inner Joins are the most commonly used type of Joins are are almost the same thing that you just learnt above. Check the following query SELECT * FROM (authors INNER JOIN books ON authors.author_id = books.author_id)
Mahendra Sajjala
Page 8 of 12
Joins
This would return 4 rows of data having all the information about the 4 books. This gives exactly the same result as the following query SELECT * FROM authors,books WHERE authors.author_id = books.author_id Basically Inner Joins combines all the records in the first table with all the records in the second table and then selects those rows depending on the criteria that is present after the the ON keyword in the query. The most important thing to remember in Inner Joins is that only those records from both the tables are combined where there is a corresponding value in both the tables. You will understand this point clearly when you read about Outer Joins.
What are Outer Joins? Outer Joins are almost similar to Inner Joins, but in Outer Joins all the records in one table are combined with records in the other table even if there is no corresponding common value. To get this point assume that our authors table is now the one shown below instead of the original one. The books table remains the same. author_id lastname 1 Hunter 2 Kanetkar 3 Pai new row added.
firstname Jason Yashwant Kiran
company O' Reilly BPB Publications Pai Ltd. Note that there is a
There are 2 types of Outer Joins - Left Join and Right Join. Consider an example of Left Join SELECT * FROM (authors LEFT JOIN books ON authors.author_id = books.author_id) This query would now (considering the new authors table) return 5 rows of data. The last row would correspond to the record of the author named Kiran Pai but there would be NULL values for the fields such as title,pages,book_id... since there are no books written by Kiran Pai in our books table. This is exactly what Left Joins do - they combine all the records in the first table with those in the second table irrespective of whether there are corresponding values in 2 tables. By corresponding I mean like in our case since there were no books by Kiran Pai in our books table there are no corresponding values for Kiran Pai in our books table. But inspite of there being no corresponding values all the values were joined and thus you got the NULL values in your final query result. This point is very important to understand Joins.. so please do read it again in case you have not understood it clearly.
Could you explain the concept of Inner Joins once again with respect to this Left Joins? Yeah.. now you will understand better. As I mentioned in Left Joins all the records in the first table are combined with data from the second table. And if there is no corresponding data in the second table then a NULL value is inserted into the results. Whereas in Inner Joins, records from the 2 tables are combined only if there are corresponding values in both the tables. For example consider
Mahendra Sajjala
Page 9 of 12
Joins
the following Inner Join on the books table and our new authors table SELECT * FROM (authors INNER JOIN books ON authors.author_id = books.author_id) This query would return only 4 rows of data and not 5 inspite of the new Kiran Pai entry being present. This is because it could find no corresponding entry in the books table for the author named Kiran Pai (there were no books written by Kiran Pai)
There are are 2 types of Outer Joins ..right?? Yeah.. there are 2 types. One is Left Join and the other obviously is Right Join. Consider the same example as shown previously for Left Join, but this time use a Right Join instead of a Left Join SELECT * FROM (authors RIGHT JOIN books ON authors.author_id = books.author_id) This time you would NOT GET 5 rows of data, instead you would get only 4 rows of data. Why so?? Since while joining the data from the books table with the authors table, the books table data is considered first.. since it is a Right Join. Thus for every row in the books table a check for corresponding value in the authoirs table would be made. Thus while adding all the rows of the books table to the joined table, since there were no books written by Kiran Pai, so that particular record from the authors table was not added at all to the joined table. Remember that the joined table is the kind of temporary table that is created while using Joins. That table is refined depending on the criteria present in the query.
Inner Joins and Outer Joins aren't really clear in my mind right now. Why don't you explain it once again in short ? Actually, all that is required is a re-reading of the previous 4-5 questions. In case you still haven't got it, there is only way out. Create actual tables using some software and then run some sample queries on those tables. Remember not to put corresponding entries for all your data in both of your tables. By that I mean in case you are making 2 tables - one for artists and one for songs. See to it that in case you have 10 artists names in the artists table, you do not have songs sung by all of those 10 artists in your songs table. Add songs by only 6-7 artists. When you have such a kind of database where there isn't corresponding information for all the records in one table in the other table, Inner Joins and Outer Joins show their real working. In case you have songs sung by all 10 artists in your Songs table, most of your query results would be the same irrespective of what kind of Join you use. Only the order of the rows in the resultset might be altered. So you would be confused..you wouldn't understand the difference between the working of different Joins.
Are there other ways in which I can use Joins? Yeah you can combine 2 Joins to create more complex Joins. I shall not go into the details of those kind of Joins, but I shall outline the structure of the query which you have to make. This kind of thing is required when you are combining
Mahendra Sajjala
Page 10 of 12
Joins
data from say 3 tables. Your query would look something like the following SELECT ... , ... FROM ( .... INNER JOIN .... ON .... = .... ) INNER JOIN .... ON .... = .... WHERE .... = ..... This is not really for beginners, but if you can get it.. you are not a beginner.. you are already on your way to be an expert.
So basically there are 3 Joins right?? Cross Join, Inner Join and Outer Join? There is also a Self Join, which I shall not be explaining in this article. Just remember that it exists. It is a fairly complex topic which you may not need right away.
Give me more examples of Joins with new tables I could do that.. but that would basically amount to waste of time. On the contrary I shall explain how to make effective use of Joins along with another SQL feature called Sub Query.
What's Sub Query? Use the same tables that I have used before. In case you want to find the titles of the books written by the Jason. What would you do?? Give it a thought and try to figure some way to get it done.. then read on.
Yes I can do it.. I shall first make a query and find his author_id from the authors table and then I shall make another query and get a list of books having that id associated with it from the books table. Ok.. good. but aren't you using 2 queries? Can you get it in one query? That where Sub Queries come in use. You could find the author_id for that particular author and then also find the books written by him in the same query.
Mahendra Sajjala
Page 11 of 12
Joins
This sounds exciting.. How do I do it? Ok here is what I am trying to do now. In English I could state it as follows - "I want a list of the books that have been written by author whose firstname is Jason" . In SQL I could possibly use the following SELECT title FROM authors,books WHERE ( books.author_id = (SELECT author_id from authors where firstname = 'Jason') AND books.author_id = authors.author_id) Seems complex.. but its actually simple.. (This is after all SQL In Simple English :) The first part is simple.. you select the title of the books whenever the criteria is met. Lets analyze the criteria.. The inner most part SELECT author_id from authors where firstname = 'Jason' evaluates to the 1 (its just a simple query which you learnt right at the beginning). Thus the entire query now becomes SELECT title FROM authors,books WHERE (books.author_id = 1 AND books.author_id = authors.author_id) Here we are using the 2 original tables the authors table with 2 rows and the books table with 4 rows. Now when the joined table is created it would be consisting of 8 rows (2x4 as explained earlier). From these 8 rows only 4 would be containing correct information and the rest of them would be just present as a result of the various combinations that were possible. Now the books.author_id = authors.author_id part of the WHERE clause separates these 4 correct rows and discards the remaining rows. And finally the other part of the WHERE clause books.author_id = 1 would selects the final single row from the 4 correct rows. Thus you would get the names of the books written by the author named Jason Hunter.
Mahendra Sajjala
Page 12 of 12
Joins