This document was uploaded by user and they confirmed that they have the permission to share
it. If you are author or own the copyright of this book, please report to us by using this DMCA
report form. Report DMCA
Overview
Download & View Sunny Sql Data Manipulation Language Help as PDF for free.
SQL Data Manipulation Language Help When writing this help, there were only used those syntax constructions of SQL-92 standard that were supported by MS SQL Server. We hope this help would suffice to master in wide and varied possibilities for query formulations of arbitrary complexity to relational databases. Although this help is primarily oriented at assisting in solving the suggested exercises, it actually spans all the syntax constructions of the data manipulation statements. Compatibility with the SQL-92 standard will make the acquired knowledge a universal tool of interacting with databases of different manufacturers. Some structures specific for MS SQL Server are also considered.
Simple SELECT statements. Eliminating duplicate rows. The DISTINCT keyword. Sorting the result set. The ORDER BY clause. Filtering Rows. The WHERE clause. Comparison predicates. Logical operations AND, OR, and NOT and three-valued logic (3VL). Checking the value to fall within a range. The BETWEEN predicate. Checking the value to be in a set. The IN predicate. Renaming the fields, and calculations in result set. Finding a substring. The LIKE predicate. Checking the availability/absence of a value. The IS NULL predicate. Summary values. Aggregate functions. Grouping rows. The GROUP BY clause. Filtering data by summary values. The HAVING clause. Ordering clauses in SELECT statement. The SELECT statement syntax. Using multiple tables in a query. The FROM clause. Cartesian product and equi-joins. Renaming tables. Using qualifying names for identification of fields. Join operations. The JOIN clauses. The realization of set-theoretic operations with the SELECT statement. Checking availability of records. The EXISTS predicate. Using the SOME|ANY and ALL keywords. Subqueries in the WHERE clause. Subqueries in the FROM and SELECT clauses. Type conversions. CASE statement. Functions Transact-SQL handling datetime data types.
Appendices 32. Appendix 1. About educational databases.
33. Appendix 2. The list of exercises.
SELECT statement SELECT statement retrieves rows from the database and has the most complex structure among other SQL statements. Almost any database user is capable of writing a simplest SELECT statement such as SELECT * FROM PC;
which retrieves all records from the table-type object PC; in so doing rows and columns of the result set have no order. To order columns of the result set they should be listed and separated by commas in the required order after the SELECT keyword: SELECT price, speed, hd, ram, cd, model, code FROM Pc;
Here is the result set of this query. price
speed
hd
ram
cd
model
code
600.0
500
5
64
12x 1232
1
850.0
750
14
128
40x 1121
2
600.0
500
5
64
12x 1233
3
850.0
600
14
128
40x 1121
4
850.0
600
8
128
40x 1121
5
950.0
750
20
128
50x 1233
6
400.0
500
10
32
12x 1232
7
350.0
450
8
64
24x 1232
8
350.0
450
10
32
24x 1232
9
350.0
500
10
32
12x 1260
10
980.0
900
40
128
40x 1233
11
The vertical projection of the РC table is obtained by listing the necessary fields only. For example, to get information about the processor speed and the amount of RAM in the computer run the following query: SELECT speed, ram FROM Pc;
which returns the data:
speed
ram
500
64
750
128
500
64
600
128
600
128
750
128
500
32
450
64
450
32
500
32
900
128
It should be noted that a vertical sample may include duplicate rows in case where the sample does not include any potential key with the values uniquely identify each row in the table. In the PC table, the code field is a potential key, which is specified in addition as primary key. Since this field is not included in the query, there are listed some duplicate rows in the above result set (for example, rows 1 and 3). If unique rows are needed (say, we only need different combinations of processor speed and RAM amount, not specifications of all available PCs), use the DISTINCT keyword: SELECT DISTINCT speed, ram FROM Pc;
Here's the result set: speed
ram
450
32
450
64
500
32
500
64
600
128
750
128
900
128
Apart from DISTINCT, the ALL keyword, which explicitly ask for all rows, may also be applicable. However, ALL keyword is accepted by default. It is possible to sort out the result set by a number of columns pointed out in the SELECT statement. For this purpose, the clause ORDER BY <list of fields> is used which is always the latest clause in the SELECT statement. In so doing, the sort column in list of fields may be specified as a name or a nonnegative integer representing the position of the name in SELECT list. For example, to sort the result set by RAM in descending order we can write
SELECT DISTINCT speed, ram FROM Pc ORDER BY ram DESC or SELECT DISTINCT speed, ram FROM Pc ORDER BY 2 DESC
The following result is the same. speed
ram
600
128
750
128
900
128
450
64
500
64
450
32
500
32
The result set can be sorted in ascending order (ASC is assumed by default) or in descending order (DESC). Sorting by two columns SELECT DISTINCT speed, ram FROM Pc ORDER BY ram DESC, speed DESC gives the following result: speed
ram
900
128
750
128
600
128
500
64
450
64
500
32
450
32
Horizontal restriction is realized by the clause WHERE <predicate> after the FROM clause. Now the result set will only include the rows from the record source for each of those the
predicate returns TRUE. In other words, the predicate for each row is checked . For example, the query "get information about processor's speed and RAM amount for computers priced below $500" can be written as follows: SELECT DISTINCT speed, ram FROM Pc WHERE price<500 ORDER BY 2 DESC
speed
ram
450
64
450
32
500
32
The latter query uses a comparison predicate with operator "<" (less than). Beside this operator, the following operators may be used: "=" (equal), ">" (greater than), ">="(greater or equal), "<=" (less or equal) and "<>" (not equal). Expressions in comparison predicates may include any columns from the tables listed in the FROM clause. Character strings and date/time constants are enclosed in single quotation marks. Here are some examples of simple comparison predicates: price < 1000
Price is less than $1000.
type = 'laptop'
Product type is Laptop.
cd = '24x'
24-speed CD-ROM.
color <>'y'
Not-color printer.
ram - 128 >0
RAM amount is over 128 Mb.
price <= speed*2
Price does not exceed twice processor's speed.
Predicates I Predicate is an expression taking truth value. It may be both a single expression or any combination of a numberless expressions built by means of Boolean operators AND, OR, and NOT. Besides, these combinations may inclide SQL-operator IS, and parentheses that define the order in which operations are to be executed. SQL predicate evaluates to TRUE, FALSE and UNKNOWN. The following predicates are exeptions to this rule: NULL, EXISTS, UNIQUE, and MATCH, which could not evaluate to UNKNOWN. To remember combination rules for these three truth values, let us denote TRUE as 1, FALSE as 0, and UNKNOWN as 1/2 (somewhere between true and false).
AND with two truth values gives minimum of these values. For example, TRUE AND UNKNOWN is UNKNOWN. OR with two truth values gives maximum of these values. For example, FALSE OR UNKNOWN is UNKNOWN. Negation of truth value is 1 minus this truth value. For example, NOT UNKNOWN is UNKNOWN.
Comparison predicates Comparison predicate is two expressions separated by a comparison operator. There are six conventional comparison operators: =, >, <, >=, <=, <>. The data of NUMERIC type (numbers) are compared in accordance with their algebraic values. The data of CHARACTER STRING type are compared in accordance with their alphabetic sequences. If a1a2…an and b1b2…bn are two character sequences, the first of these is "less" than the second if а1= 500 AND price < 800;
The query returns the following data: code
model
speed
ram
hd
cd
price
1
1232
500
64
5
12x 600.0
3
1233
500
64
5
12x 600.0
7
1232
500
32
10
12x 400.0
10
1260
500
32
10
12x 350.0
Example. Get information on all those printers that are not matrix and priced below $300: SELECT * FROM Printer WHERE NOT (type = 'matrix') AND price < 300;
Here is the result of that query: code
model
color
type
price
2
1433
y
Jet
270.0
3
1434
y
Jet
290.0
BETWEEN predicate BETWEEN predicate specifies the inclusive range to test the expression values. The range is defined by boundary expressions with AND keyword between them. Naturally, all the expressions in BETWEEN predicate must be the same data type, as in the case of comparison predicate. Syntax BETWEEN::= <expression to test> [NOT] BETWEEN AND <end expression> The predicate exp1 BETWEEN exp2 AND exp3 is equal to the predicate exp1>=exp2 AND exp1<=exp3 And the predicate exp1 NOT BETWEEN exp2 AND exp3 is equal to the predicate NOT (exp1 BETWEEN exp2 AND exp3) If the value of the predicate exp1 BETWEEN exp2 AND exp3 is TRUE, it does not generally mean that the value of predicate exp1 BETWEEN exp3 AND exp2 is TRUE also, because the first one may be interpreted as the predicate exp1>=exp2 AND exp1<=exp3 while the second one may be considered as exp1>=exp3 AND exp1<=exp2 Example. Find model and processor speed of computers priced between and including $400 through $600: SELECT model, speed FROM Pc WHERE price BETWEEN 400 AND 600;
model
speed
1232
500
1233
500
1232
500
IN predicate IN predicate determines whether the value of expression to test matches any value in the subquery or the list. Subquery is an ordinary SELECT statement that has a result set of one column and one or more rows. This column or all expressions in the list must have the same data type as the expression to test. If the target object is equal to any value returned by
subquery or is equal to any expression from the comma separated list, the Boolean value of IN predicate is TRUE. If target object <>X for each X in IN clause, the result value is FALSE. If a subquery does not return any rows (empty table), the predicate is FALSE. If none of the above conditions is valid, the predicate is UNKNOWN. Syntax IN::= <expression to | (<expression1>,...)
test>
[NOT]
IN
(<subquery>)
Example. Find the model, processor speed and hard drive capacity for those computers having the hard drive of 10Mb or 20Mb: SELECT model, speed, hd FROM Pc WHERE hd IN (10, 20);
model
speed
hd
1233
750
20
1232
500
10
1232
450
10
1260
500
10
Example. Find the model, processor speed and hard drive capacity for those computers having hard drive of 10Mb or 20Mb and produced by the manufacturer A: SELECT FROM WHERE
model,
speed, hd Pc hd IN (10, 20) AND model IN (SELECT model FROM product WHERE maker = 'A');
model
speed
hd
1233
750
20
1232
500
10
1232
450
10
It is possible to sort out the result set by a number of columns pointed out in the SELECT statement. For this purpose, the clause ORDER BY <list of fields> is used which is always the latest clause in the SELECT statement. In so doing, the sort column in list of fields may be specified as a name or a nonnegative integer representing the position of the name in SELECT list. For example, to sort the result set by RAM in descending order we can write
SELECT DISTINCT speed, ram FROM Pc ORDER BY ram DESC
or SELECT DISTINCT speed, ram FROM Pc ORDER BY 2 DESC
The following result is the same. speed
ram
600
128
750
128
900
128
450
64
500
64
450
32
500
32
The result set can be sorted in ascending order (ASC is assumed by default) or in descending order (DESC). Sorting by two columns SELECT DISTINCT speed, ram FROM Pc ORDER BY ram DESC, speed DESC
gives the following result: speed
ram
900
128
750
128
600
128
500
64
450
64
500
32
450
32
Horizontal restriction is realized by the clause WHERE <predicate> after the FROM clause. Now the result set will only include the rows from the record source for each of those the predicate returns TRUE. In other words, the predicate for each row is checked . For example, the query "get information about processor's speed and RAM amount for computers priced below $500" can be written as follows: SELECT DISTINCT speed, ram FROM Pc WHERE price<500 ORDER BY 2 DESC
speed
ram
450
64
450
32
500
32
The latter query uses a comparison predicate with operator "<" (less than). Beside this operator, the following operators may be used: "=" (equal), ">" (greater than), ">="(greater or equal), "<=" (less or equal) and "<>" (not equal). Expressions in comparison predicates may include any columns from the tables listed in the FROM clause. Character strings and date/time constants are enclosed in single quotation marks. Here are some examples of simple comparison predicates: price < 1000
Price is less than $1000.
type = 'laptop'
Product type is Laptop.
cd = '24x'
24-speed CD-ROM.
color <>'y'
Not-color printer.
ram - 128 >0
RAM amount is over 128 Mb.
price <= speed*2
Price does not exceed twice processor's speed.
Renaming columns and calculations in the result set The column names listed in the SELECT clause can be changed. This can be done to increase readability because the column names are frequently abbreviated to simplify typing. The keyword AS being used for renaming can be neglected in accordance with the standard, since it implicit. For example, the following query SELECT ram AS Mb, hd Gb FROM pc WHERE cd = '24x';
renames the ram column as Mb (megabytes) and the hd column as Gb (gigabytes). This query
will return the base memory amount and the hard drive storage on computers with a 24-speed CD-ROM: Mb
Gb
64
8
32
10
Renaming is especially desired when using value-calculation expressions in SELECT clause. These expressions give the data which may not be directly derived from database tables. An expression including the column names of the table contained in the FROM clause is calculated for each row of the result set. For example, RAM amount in kilobytes can be obtained when writing: SELECT ram * 1024 AS Kb, hd Gb FROM Pc WHERE cd = '24x';
Now the following result will be obtained: Kb
Gb
65536
8
32768
10
Sometimes, an explanation needs to be placed along with the corresponding value. To make this, add string expression as complementary column. For example, the query SELECT ram, 'Mb' AS ram_units, hd, 'Gb' AS hd_units FROM pc WHERE cd = '24x';
gives the following result: ram
ram_units
hd
hd_units
64
Mb
8
Gb
32
Mb
10
Gb
Should the expression have no specified name, a default naming way will be used, dependable on DBMS used. For example, the names like expression1 and so on will be used in MS Access, while the column returned in MS SQL Server has no title at all.
Predicates II LIKE predicate Syntax LIKE::=
<expression for calculating the string value> [NOT] LIKE <expression for calculating the string value> [ESCAPE <symbol>] The LIKE predicate compares the string specified in the first expression for calculating the string value, which is refered to as a value to test, with the pattern that is defined in the second expression for calculating the string value. The pattern may contain any combination of wildcards such as:
Underline symbol (_), which can be used instead of any single character in the value to test. Percent sign (%), which replaces any string of zero or more characters in the value to test.
If the value to test matches the pattern with the wildcards, the preducate equals TRUE. Below are some examples of the patterns. Pattern
Description
'abc%'
Any rows that begin with the characters "abc".
'abc_'
The rows with the length of strictly 4 characters, the first characters of the string being "abc".
'%z'
Any character sequence that necessarily ends with the character "z".
'%Rostov%'
Any character sequence including the word "Rostov" at any position of the string.
Example. Find all the ships the class names of those ends with the character 'о': SELECT * FROM Ships WHERE class LIKE '%o';
The result of that query is the following table: name
class
launched
Haruna
Kongo
1916
Hiei
Kongo
1914
Kirishima
Kongo
1915
Kongo
Kongo
1913
Musashi
Yamato
1942
Yamato
Yamato
1941
Example. Find all the ships the class names ending with the character 'о' but not with the characters 'go': SELECT FROM
* Ships
WHERE class NOT LIKE '%go' AND class LIKE '%o';
name
class
launched
Musashi
Yamato
1942
Yamato
Yamato
1941
If the string to search includes a wildcard as a character, then specify a control symbol the ESCAPE clause. This control symbol is to be used in the pattern prior to the wildcard to warn that the wildcard here should be considered as an ordinary character. For example, if the value to search includes the character "_", the pattern '%_%' results in returning all the record from the table. Hence, the pattern should be written as '%#_%' ESCAPE '#' To search the string "25%", the following predicate may be used: LIKE '25|%' ESCAPE '|' The truth value of LIKE predicate is assigned in accordance with the following rules:
If either value to test or the pattern, or control symbol is NULL, the truth value is UNKNOWN. Otherwise, if the value to test and the pattern have zero-length, the truth value is TRUE. Otherwise, if the value to test matches the pattern, LIKE predicate evaluates to TRUE. If none of the above conditions is met, LIKE predicate evaluates to FALSE.
Using NULL-value in the search conditions Predicate IS [NOT] NULL allows for checking the absence (availability) of value in the table columns. In such cases, using ordinary comparison predicates can lead to incorrect results, because comparing with NULL value evaluates to UNKNOWN. So, searching for no-price records in the PC table (typist errors) may be performed with the following operator: SELECT * FROM Pc WHERE price IS NULL;
Getting summarizing values
How many PC models does a particular supplier produce? How the average price is defined for computers with the same specifications? The answers to these and other questions associated with some statistic information may be obtained by means of summarizing (aggregate) functions. The following aggregate functions are assumed as standard: Function
Description
COUNT(*)
Returns the number of rows of the record source.
COUNT()
Returns the number of values in the specified column.
SUM()
Returns the sum of values in the specified column.
AVG()
Returns the average value in the specified column.
MIN()
Returns the minimal value in the specified column.
MAX()
Returns the maximum value in the specified column.
All these functions return a single value. In so doing, the functions COUNT, MIN, and MAX are applicable to any data types, while the functions SUM and AVG are only used with numeric fields. The difference between the functions COUNT(*) and COUNT() is that the second does not calculate NULL-values. Example. Find the minimal and maximal prices for PCs: SELECT MIN(price) AS Min_price, MAX(price) AS Max_price FROM PC;
The result is a single row containing the aggregate values: Min_price
Max_price
350.0
980.0
Example. Find the number of available computers produced by the maker А: SELECT COUNT(*) AS Qty FROM PC WHERE model IN (SELECT model FROM Product WHERE maker = 'A');
As a result we get: Qty 7 Example. If the number of different models produced by the maker A is needed, the query may be written as follows (taking into account the fact that each model in the Product table is shown once):
SELECT COUNT(model) AS Qty_model FROM Product WHERE maker = 'A';
The coincidence in the results is fully accidental and is due to the number of computers produced by maker A in database being equal to the number of models produced by this maker: Qty_model 7 Example. Find the number of available different models produced by maker A. This query is similar to the preceding one for the total number of models produced by maker A. Now we need to find the number of different models in the PC table (available for sale). To use only unique values in calculating the statistic, the parameter DISTINCT with an aggregate function argument may be used. ALL is another (default) parameter and assumes that all the column values returned are calculated. The statement SELECT COUNT(DISTINCT model) AS Qty FROM PC WHERE model IN (SELECT model FROM Product WHERE maker = 'A');
gives the following result: Qty 2 If we need the number of PC models produced by each maker, we will need to use the GROUP BY clause, placed immediately after the WHERE clause, if any.
GROUP BY clause The GROUP BY clause is used to define the row groups for each of those the aggregate functions (COUNT, MIN, MAX, AVG, and SUM) may be applied. When aggregate functions are used without a GROUP BY clause, all the columns with names mentioned in SELECT clause must be included in the aggregate functions. Then these functions are applied to the total set of the rows that fit the query predicate. Otherwise, all the columns in the SELECT list not included into the aggregate functions must be listed in the GROUP BY clause. As a result, all the returned query rows distributed into groups are characterized by the same combinations of these column values. Later on, aggregate functions are applied to each group. It is essential that NULL-values are considered equal in this case, i.e. when grouping by the column including NULL-values, all these rows will be combined in one group. When a GROUP BY clause is used without any aggregate function in the SELECT clause, the query will simply return one row from each group. Beside the DISTINCT keyword, this opportunity may be used in eliminating the row duplicates from the result set. Let us consider a simple example:
SELECT model, COUNT(model) AS Qty_model, AVG(price) AS Avg_price FROM PC GROUP BY model; The number of computers and their average price are defined for each PC model in the query. All rows with the same model value are combined in a group with value count and the average price calculated for each group thereafter. Executing this query gives the following table: model
Qty_model
Avg_price
1121
3
850.0
1232
4
425.0
1233
3
843.33333333333337
1260
1
350.0
Should the SELECT clause include date column these characteristics may be calculated for each date specified. For that, the date should be added as grouping column with the aggregate functions be calculated for each combination of (model−date). There are some particular rules for executing aggregate functions:
If none of the rows was returned by the query (or none of the rows for the given group), the source data for any aggregate function to be calculated is missing. In such case, the COUNT function returns zero, while other functions return NULL. The argument of the aggregate function cannot include aggregate functions itself (the function of function) i.e. no maximum of average values is obtainable. The result of the COUNT function is integer. Other aggregate functions inherit the types of processing data. An error occurs where the result is over the maximal value of the used data type while executing SUM function.
In so doing, if the query does not include GROUP BY clause, the aggregate functions in the SELECT clause process all the result rows of this query. If the query includes the GROUP BY clause, each row set with the same value in the column or the same combination of values in several columns given in the GROUP BY clause forms a group with the aggregate functions being calculated for each group separately.
HAVING clause While WHERE clause gives predicate for filtering rows, the HAVING clause is applied after grouping that gives a similar predicate but filtering groups by the values of aggregate functions. This clause is nessesary for checking the values that are obtained by means of an aggregate function not from separate rows of record source in the FROM clause but from the groups of these rows. Therefore, this checking is not applicable to the WHERE clause. Example. Get the count of PC and the average price for each model providing average price is less than $800: SELECT model, COUNT(model) AS Qty_model, AVG(price) AS Avg_price FROM PC GROUP BY model HAVING AVG(price) < 800;
As a result, we get: model
Qty_model
Avg_price
1232
4
425.0
1260
1
350.0
Note that the alias (Avg_price) for naming values of the aggregate function in the SELECT clause may not be used in the HAVING clause. This is because the SELECT clause forming the query result set is executed last but before the ORDER BY clause. Below is the execution order of clauses in the SELECT operator:
FROM WHERE GROUP BY HAVING SELECT ORDER BY
This order does not correspond to the syntax order of SELECT operator generally formed as follows: SELECT [DISTINCT | ALL]{* | [ [[AS] ]] [,…]} FROM
[[AS] ] [,…] [WHERE <predicate>] [[GROUP BY ] [HAVING ] ] [ORDER BY ]
Using multiple tables in a query The SELECT operator syntax given at the end of the previous chapter shows that more than one table may be pointed in the FROM clause. A table listing that does not use WHERE clause is practically unused because this produces the relational operation of the Cartesian product of the tables involved. That is, each record in one table meshes with each record in another table. For example, the tables A
in the query SELECT * FROM A, B;
produce the following result:
B
a
b
c
d
1
2
2
4
2
1
3
3
a
b
c
d
1
2
2
4
1
2
3
3
2
1
2
4
2
1
3
3
As a rule, the table listing is used with the condition of joining records from different tables in the WHERE clause. The condition for the above tables may be a coincidence of the values, say, in the columns a and c: SELECT * FROM A, B WHERE a=c;
Now the result set of that query is the following table: a
b
c
d
2
1
2
4
i.e. only those table rows that have equal values in specified columns (equi-join) are joined. Although some arbitrary conditions may be used, nevertheless the equi-join is most commonly used because this operation reproduces a certain entity split by two entities as a result of a normalization process. Even though different tables have columns with the same names, the dot notation is needed for uniqueness:
. In cases where ambiguity is not available, that notation need not be used. Example. Find the model and the maker of PC priced below $600: SELECT DISTINCT PC.model, maker FROM PC, Product WHERE PC.model = Product.model AND price < 600;
As a result each model of the same maker occurs in the result set only once: model
maker
1232
A
1260
E
Sometimes the table in the FROM clause need to be pointed more than once. In this case renaming is indispensable.
Example. Find the model pairs with equal price: SELECT DISTINCT A.model AS model_1, B.model AS model_2 FROM PC AS A, PC B WHERE A.price = B.price AND A.model < B.model;
Here the condition A.model < B.model is to issue one of similar pairs that is only distinguished by rearrangement, for example: 1232, 1233 and 1233, 1232. The DISTINCT keyword is only used to eliminate duplicate rows because equal models with the same price are in the PC table. As a result, we get the following table: model_1
model_2
1232
1233
1232
1260
Renaming is also needed in case the FROM clause uses a subquery. So, the first example can be rewritten as follows: SELECT DISTINCT PC.model, maker FROM PC, (SELECT maker, model FROM Product) AS prod WHERE PC.model = prod.model AND price < 600;
Note that in this case the Product qualifier may not be already used in other clauses of the SELECT statement. This is because the Product table is just out of use. Instead of this name the prod alias is used. Moreover, references are only possible to those Product table columns listed in the subquery.
Explicit join operations Explicit join operation for two and more tables may be present in the FROM clause. Among the join operation series described in the SQL standard, the join-on-predicate operation is only supported by the number of database servers. A join-on-predicate syntax is: FROM [INNER] | {{LEFT | RIGHT | FULL } [OUTER]} JOIN [ON <predicate>] A join type may be either the inner or one of the outers. The INNER and OUTER keywords may be omitted, because the outer join is uniquely defined by its type: LEFT, RIGHT, or FULL, whereas the inner join is simply refered to as JOIN. A predicate specifies the condition of joining the rows from different tables. In so doing INNER JOIN means that the result set will only include those combinations of rows in two tables for which the predicate elavuates to TRUE. As a rule, the predicate specifies the equijoin on foreign and primary keys of the tables joined, although need not be so. Example. Find the maker, model number, and price for each computer in the database:
SELECT maker, Product.model AS model_1, PC.model AS model_2, price FROM Product INNER JOIN PC ON PC.model = Product.model ORDER BY maker, PC.model;
In this example, the query returns only the row combinations from the PC and Product tables with identical model numbers. For better control, the result includes the model number both in the PC table and in the Product table: maker
model_1
model_2
price
A
1232
1232
600.0
A
1232
1232
400.0
A
1232
1232
350.0
A
1232
1232
350.0
A
1233
1233
600.0
A
1233
1233
950.0
A
1233
1233
980.0
B
1121
1121
850.0
B
1121
1121
850.0
B
1121
1121
850.0
E
1260
1260
350.0
The LEFT JOIN implies that all the rows from the first (left) table are to be in the result set along with the rows for which the predicate evaluates to true. In so doing, the non-matching colunm values in the right table are returned as null values. Example. Find all PC models, makers, and prices: SELECT maker, Product.model AS model_1, PC.model AS model_2, price FROM Product LEFT JOIN PC ON PC.model = Product.model WHERE type = 'PC' ORDER BY maker, PC.model;
It is essential that using the WHERE clause is indispensable for sampling only PC makers as compared with the previous example. Otherwise, laptop and printer models will also get into the result set. This condition is negligible for the above example because the rows only were joined with identical model numbers, and the PC table was between the two joined tables. Here is the result set: maker
model_1
model_2
price
A
1232
1232
600.0
A
1232
1232
400.0
A
1232
1232
350.0
A
1232
1232
350.0
A
1233
1233
600.0
A
1233
1233
950.0
A
1233
1233
980.0
B
1121
1121
850.0
B
1121
1121
850.0
B
1121
1121
850.0
E
2111
NULL
NULL
E
2112
NULL
NULL
E
1260
1260
350.0
Because models 2111 and 2112 in the Product table are absent in the PC table, the columns of the PC table contain NULL. The RIGHT JOIN is the reverse of the LEFT JOIN, i.e. the result set will include all the rows from the second table and only those rows from the first table for which the join condition is met. In our case the left join Product LEFT JOIN PC ON PC.model = Product.model will be equivalent to the following right join PC RIGHT JOIN Product ON PC.model = Product.model Just the query SELECT maker, Product.model AS model_1, PC.model AS model_2, price FROM Product RIGHT JOIN PC ON PC.model = Product.model ORDER BY maker, PC.model;
gives the same results as those for inner join, because the right table (PC) has no models that would be missing from the left table (Product), which is reasonable for one-to-many relationship type that occurs between the PC and Product tables. Finally, the result set with FULL JOIN will include not only the matching rows in the both tables but also all conceivable nonmatching rows in initial tables. All the nonmatching colunm values in both tables are returned as null values. A full join presents the combination of the left and right outer joins. So, for the above mentioned A and B tables, the query SELECT A.*, B.* FROM A FULL JOIN B ON A.a = B.c;
gives the following result set: a
b
c
d
1
2
NULL
NULL
2
1
2
4
NULL
NULL
3
3
Note this join is symmetrical, i.e. "A FULL JOIN B" is equivalent to "B FULL JOIN A". Also, it should be noted that notation A.* implies "all column from the А table".
Conventional set-theoretic operations and the SELECT statement Conventional set-theoretic operations are union, intersect, exception, and Cartesian product.
Cartesian product The Cartesian product discussed previously is realized as a comma-separated list of table expressions (tables, views, subqueries) in the FROM clause. In addition, another explicit join operation may be used: SELECT Laptop.model, Product.model FROM Laptop CROSS JOIN Product;
Recall that the Cartesian product combines each row in the first table with each row in the second table. The number of the rows in the result set is equal to the number of the rows in the first table multiplied by the number of the rows in the second table. In the example under consideration, the Laptop table has 5 rows while the Product table has 16 rows. As a result, we get 5*16 = 80 rows. Hence, there is no result set of that query here. You may check this assertion executing above query on the academic database. In the uncombined state, the Cartesian product is hardly used in practice. As a rule, it presents an intermediate restriction (horizontal ptojection) operation where the WHERE clause is available in the SELECT statement.
Union The UNION keyword is used for integrating queries:
1> [ALL]
The UNION operator combines the results of two SELECT statements into a single result set. If the ALL parameter is given, all the duplicates of the rows returned are retained; otherwise the result set includes only unique rows. Note that any number of queries may be combined. Moreover, the union order can be changed with parentheses. The following conditions should be observed:
The number of columns of each query must be the same. Result set columns of each query must be compared by the data type to each other (as they follows).
The result set uses the column names in the first query. The ORDER BY clause is applied to the union result, so it may only be written at the end of the combined query.
Example. Find the model numbers and prices of the PCs and laptops: SELECT model, price FROM PC UNION SELECT model, price FROM Laptop ORDER BY price DESC;
model
price
1750
1200.0
1752
1150.0
1298
1050.0
1233
980.0
1321
970.0
1233
950.0
1121
850.0
1298
700.0
1232
600.0
1233
600.0
1232
400.0
1232
350.0
1260
350.0
Example. Find the product type, the model number, and the price of the PCs and laptops: SELECT Product .type, PC.model, price FROM PC INNER JOIN Product ON PC.model = Product .model UNION SELECT Product .type, Laptop.model, price FROM Laptop INNER JOIN Product ON Laptop.model = Product .model ORDER BY price DESC;
type
model
price
Laptop
1750
1200.0
Laptop
1752
1150.0
Laptop
1298
1050.0
PC
1233
980.0
Laptop
1321
970.0
PC
1233
950.0
PC
1121
850.0
Laptop
1298
700.0
PC
1232
600.0
PC
1233
600.0
PC
1232
400.0
PC
1232
350.0
PC
1260
350.0
Intersect and Exception The SQL standard offers SELECT statement clauses for operating with the intersect and exception of queries. These are INTERSECT and EXCEPT clauses, which work as the UNION clause. The result set will include only those rows that are present in each query (INTERSECT) or only those rows from the first query that are not present in the second query (EXCEPT). Many of the DBMS do not support these clauses in the SELECT statement. This is also true for MS SQL Server. There are also other means to be involved while performing intersect and exception operations. It should be noted here that the same result may be reached by differently formulating the SELECT statement. In the case of intersection and exception one could use the EXISTS predicate.
The EXISTS predicate EXISTS::= [NOT] EXISTS (
) The EXISTS predicate evaluates to TRUE providing the subquery contains any rows, otherwise it evaluates to FALSE. NOT EXISTS works the same as EXISTS being satisfied if no rows are returnable by the subquery. This predicate does not evaluate to UNKNOWN. As in our case, the EXISTS predicate is generally used with dependent subqueries. That subquery type has an outer reference to the value in the main query. The subquery result may be dependent on this value and must be separately evaluated for each row of the query that includes the subquery. Because of this, the EXISTS predicate may have different values for each row of the main query. Intersection example. Find those laptop makers who also produce printers: SELECT DISTINCT maker
FROM Product AS Lap_product WHERE type = 'Laptop' AND EXISTS (SELECT maker FROM Product WHERE type = 'Printer' AND maker = Lap_product.maker);
The printer makers are retrieved by the subquery and compared with the maker returned from the main query. The main query returns the laptop makers. So, for each laptop maker it is checked that the subquery returns any rows (i.e. this maker also produces printers). Because the two queries in the WHERE clause must simultaneously be satisfied (AND), the result set includes only wanted rows. The DISTINCT keyword is used to make sure each maker is in the returned data only once. As a result, we get: maker A Exception example. Find those laptop makers who do not produce printers: SELECT DISTINCT maker FROM Product AS Lap_product WHERE type = 'Laptop' AND NOT EXISTS (SELECT maker FROM Product WHERE type = 'Printer' AND maker = Lap_product.maker);
Here, it is sufficient to replace EXIST in the previous example with NOT EXIST. So, the returned data includes only those main query rows, for which the subquery return no rows. As a result we get: maker B C
Using the SOME | ANY and ALL keywords with comparison predicates <expression> SOME|ANY (<subquery>) SOME and ANY are synonyms, i.e. any of them may be used. The subquery results is a single value column. If any value V returned by the subquery evaluates the operation "<expression value> V" to TRUE, the ANY predicate is also equal to TRUE. <expression> ALL (<subquery>) is similar to that with ANY, except that all values returned by the subquery must meet the predicate "<expression> V ".
Example. Find the PC makers whose models are not presently sold (i.e. they are not available in the PC table): SELECT DISTINCT maker FROM Product WHERE type = 'PC' AND NOT model = ANY (SELECT model FROM PC);
It turns out that maker E has not supplied their models from sale: maker E Let us consider that example in details. The predicate model = ANY (SELECT model FROM PC); returns TRUE if the model specified by the model column in the main query will be in the PCtable model list (returned by the subquery). Because of the predicate using the NOT negation, TRUE will be obtained unless the model is in the list. That predicate is checked for each row in the main query that return all PC models (type = 'PC' predicate) in the Product table. The result set consists of single column - maker's name. The DISTINCT keyword is used to eliminate any maker duplicates that may occur when a maker produces more than one model absent from the PC table. DISTINCT. Example. Find the models and prices for laptops with priced above any PC: SELECT DISTINCT model, price FROM Laptop WHERE price > ALL (SELECT price FROM PC);
model
price
1298
1050.0
1750
1200.0
1752
1150.0
Here are the formal rules for evaluating the predicate with ANY|SOME and ALL parameters:
If ALL or SOME parameter are given and all the comparison results of expression value with each value returned by the subquery are equal to TRUE, truth value is TRUE. If the result set of the subquery does not have any rows with the ALL parameter specified, the result is TRUE. However, if the SOME parameter is specified, the result is equal to FALSE.
If the ALL parameter has been specified and comparison of the expression value with at least one value obtained from the subquery gives FALSE, the truth value is equal to FALSE. If the SOME parameter is specified and comparison of the expression value with at least one value obtained from the subquery gives TRUE, the truth value is equal to TRUE. If the SOME parameter is specified and each comparison of the expression value with the values obtained from the subquery gives FALSE, the truth value is also equal to FALSE. Otherwise, the result evaluates to UNKNOWN.
Suggested exercises: 17, 24, 51.
Again about subqueries It should be noted that a query returns generally a collection of values, so a run-time error may occur during the query execution if one uses the subquery in the WHERE clause without EXISTS, IN, ALL, and ANY operators, which result in Boolean value. Example. Find the models and the prices of PC priced above laptops at minimal price: SELECT DISTINCT model, price FROM PC WHERE price > (SELECT MIN(price) FROM Laptop);
This query is quite correct, i.e. the scalar value of the price is compared with the subquery which returns a single value. As a result we get three PC models: model
price
1121
850.0
1233
950.0
1233
980.0
However, if in answer to question regarding the models and the prices of PCs that cost the same as a laptop one writes the following query: SELECT DISTINCT model, price FROM PC WHERE price = (SELECT price FROM Laptop);
the following error message will be obtained while executing the above query:
This error is due to comparison of the scalar value to the subquery, which returns either more that single value or none. In its turn, subqueries may also include nested queries. On the other hand, it is natural that subquery returning a number of rows and consisting of multiple columns may as well be used in the FROM clause. This restricts a column/row set when joining tables. Example. Find the maker, the type, and the processor's speed of the laptops with speed above 600 MGz. For example, this query may be formulated as follows: SELECT prod.maker, lap.* FROM (SELECT 'Laptop' AS type, model, speed FROM Laptop WHERE speed > 600) AS lap INNER JOIN (SELECT maker, model FROM Product) AS prod ON lap.model = prod.model;
As a result we get: maker
type
model
speed
B
Laptop 1750
750
A
Laptop 1752
750
And finally, queries may be present in the SELECT clause. Sometimes, this allows a query to be formulated in a shorthand form. Example. Find the difference between the average prices of PCs and laptops, i.e. by how mach is the laptop price higher than that of PC in average. Generally speaking, a single SELECT clause is sufficient in this case: SELECT (SELECT AVG(price) FROM Laptop) (SELECT AVG(price) FROM PC) AS dif_price;
Here is the result set:
dif_price 365.81818181818187
Type conversions Implicit data type conversions are possible in SQL implementations. For example, if a smallint is compared to an int in T-SQL, the smallint is implicitly converted to int before the comparison proceeds. See BOL for details about implicit and explicit conversions in MS SQL Server. Example. Find the average price of laptops with the prefix text "Average price = ". If you run the query SELECT 'Average price = ' + AVG(price) FROM laptop;
the following error message will be obtained: Implicit conversion from data type varchar to money is not allowed. Use the CONVERT function to run this query. This message implies that the system cannot accomplish implicit conversion of the character constant "Average price = " to data type money of the field price. In such cases, explicit type conversion can help. In so doing, as the above message says, you can use the CONVERT function. However this function is not based on the SQL-92 standard. Because of this, it is preferable to use the standard function CAST. So, we start with CAST. If we rewrite above query as follows SELECT 'Average price = ' + CAST(AVG(price) AS CHAR(15)) FROM laptop;
we get desired result: Average price = 1410.44
We have used the expression for explicit type conversion CAST to bring the average value to the character view. The statement CAST has very simple syntax: CAST(<expression> AS ) Firstly, it should be noted that some data type conversions are not supported. (SQL-92 Standard involves the table of allowed conversions). Secondly, NULL-value is converted into NULL-value also. Let us consider another example: Define the average launching year from the Ships table. The query SELECT AVG(launched) FROM ships;
gives 1926. In principle, it is correct, because a year is integer number. However arithmetic
mean will be about 1926,2381. It should be noted that aggregate functions (except COUNT which always returns integer value) inherits the type of data to be processed. Because the launched field is integer-valued, we have gotten the average value without fractional part (not rounded off). What must we do if the result ought to be obtained with two digits after decimal point? As mentioned above, applying the CAST statement to the average value gives no result. Indeed, SELECT CAST(AVG(launched) AS NUMERIC(6,2)) FROM ships;
returns the value of 1926.00. Consequently, the CAST statement should be applied to the argument of the aggregate function: SELECT AVG(CAST(launched AS NUMERIC(6,2))) FROM ships;
The result - 1926.238095 - is not exactly correct. This is because of implicit conversion that was accomplished when calculating the average value. Another step: SELECT CAST(AVG(CAST(launched AS NUMERIC(6,2))) AS NUMERIC(6,2)) FROM ships;
It is the correct result - 1926.24. However this solution looks too cumbersome. Let implicit conversion to work for us: SELECT CAST(AVG(launched*1.0) AS NUMERIC(6,2)) FROM ships;
Thus, we use implicit conversion of the argument from integer to exact numeric type by multiplying it by real unity. After that, explicit conversion is applied to the result of the aggregate function. The same conversions can be made with aid of the CONVERT function: SELECT CONVERT(NUMERIC(6,2),AVG(launched*1.0)) FROM ships;
The CONVERT function has the following syntax: CONVERT ([()], <expression> [, <style>]) The main distinction of the CONVERT function from the CAST statement is that the first allows formatting data (for example, temporal data of datetime type) when converting them to character data and specifying the format when converting character data to datetime. The values of integer optional argument style correspond to different formats. Let us consider the following example SELECT CONVERT(char(25),CONVERT(datetime,'20030722'));
Here, the string representation of a date is converted to datetime following the reverse
conversion to demonstrate the result of formatting. Since the style argument is omitted, default value is used (0 or 100). As a result, we obtain Jul 22 2003 12:00AM
Below are some values of the style argument and corresponding results from the above example. Note, the style values greater than 100 give four-place year. style
format
1
07/22/03
11
03/07/22
3
22/07/03
121
2003-07-22 00:00:00.000
All possible values of the style argument are given in BOL.
CASE statement Let the list of all the models of PC is required along with their prices. Besides that, if the model is not on sale (not in PC table), in the place of price must be the text "Not available". The list of all the PC models with its prices we can obtain running the query: SELECT DISTINCT product.model, price FROM product LEFT JOIN pc c ON product.model=c.model WHERE product.type='pc';
Missing prices will be replaced by NULL-values: model
price
1121
850
1232
350
1232
400
1232
600
1233
600
1233
950
1233
980
1260
350
2111
NULL
2112
NULL
The CASE statement helps us to get required text instead of NULL: SELECT DISTINCT product.model, CASE WHEN price IS NULL THEN 'Not available' ELSE CAST(price AS CHAR(20)) END price FROM product LEFT JOIN pc c ON product.model=c.model WHERE product.type='pc'
Depending on defined conditions, the CASE statement returns one of the possible values. The condition in above example is the checking for NULL. If this condition is satisfied, the text "Not available" will be returned; otherwise (ELSE) it will be the price. One principal moment is here. As a table is always the result of the SELECT statement, all values from any column must be of the same data type (having regard to implicit type conversions). Then, we cannot combine character constant with a price (numeric type) within a single column. That is the reason why we use the type conversion to the price column to reduce its type to character. As a result, we get model
price
1121
850
1232
350
1232
400
1232
600
1233
600
1233
950
1233
980
1260
350
2111
Not available
2112
Not available
The CASE statement may be used in one of two syntax forms: The first form CASE WHEN <when expression 1> THEN … WHEN <when expression N> THEN [ELSE ] END Second form CASE WHEN <predicate 1> THEN … WHEN <predicate N> THEN
[ELSE ] END All WHEN clauses must be in the same syntax form, i.e. first and second forms cannot be mixed. When using the first syntax form, the WHEN condition is satisfied as soon as the value of when expression will be equal to the value of input expression. When using the second syntax form, the WHEN condition is satisfied as soon as the predicate evaluates to TRUE. When satisfying condition, the CASE statement returns the return expression from the corresponding THEN clause. If no WHEN expression is satisfied, the return expression from the ELSE clause will be used. If no ELSE clause is specified, a NULL value will be returned. If more than one condition are satisfied, the first return expression from them will be returned. The above example uses the second form of the CASE statement. It should be noted that checking for NULL could also be made using the standard function COALESCE, which is simpler. This function has arbitrary number of arguments and returns the first not-NULL expression among them. In the case of two arguments, COALESCE(A, B) is equivalent to the following CASE statement: CASE WHEN A IS NOT NULL THEN A ELSE B END When using the COALESCE function, the solution to the above example may be rewritten as follows SELECT DISTINCT product.model, COALESCE(CAST(price as CHAR(20)),'Not available') price FROM product LEFT JOIN pc c ON product.model=c.model WHERE product.type='pc';
Usage of the first syntax form of the CASE statement can be demonstrated by the following example: Get all available PC models, their prices, and information about the most expensive and cheap models. SELECT DISTINCT model, price CASE price WHEN (SELECT MAX(price) FROM pc) THEN 'Most expensive' WHEN (SELECT MIN(price) FROM pc) THEN 'Most cheap' ELSE 'Mean price' END comment FROM pc ORDER BY price;
The result set model
price
comment
1232
350
Most cheap
1260
350
Most cheap
1232
400
Mean price
1233
400
Mean price
1233
600
Mean price
1121
850
Mean price
1233
950
Mean price
1233
980
Most expensive
Transact-SQL Functions handling datetime data types Standard SQL-92 specifies only the functions, which return current system date/time. Thus, the CURRENT_TIMESTAMP function returns both the date and the time. In addition, there are functions, which return one of the components. In view of that restriction, language realizations expand standard by introducing the functions that make handling datetime data types more convenient for users. Here we consider datetime functions in T-SQL.
DATEADD Syntax DATEADD ( datepart , number, date ) Returns a datetime value based on adding a number of intervals of the datepart types to the specified date. For example, we can add to the specified date any number of years, days, hours, minutes etc. The table from BOL lists the dateparts and abbreviations. Datepart
Abbreviations
Year
yy, yyyy
Quarter
qq, q
Month
mm, m
Dayofyear
dy, y
Day
dd, d
Week
wk, ww
Hour
hh
Minute
mi, n
Second
ss, s
Millisecond
ms
If we interested in the day, which will be after 01/23/2004, we can write SELECT DATEADD(day, 7, current_timestamp) or SELECT DATEADD(ww, 1, current_timestamp) We get the same result; something like 2004-01-30 19:40:58.923. But we cannot write in that case as follows SELECT DATEADD(mm, 1/4, current_timestamp) Because of eliminating the fractional part of the argument datepart, we get 0 instead of one fourth and, as result, the current date. We can also use the T-SQL GETDATE() function instead of CURRENT_TIMESTAMP. Example (scheme 4). Find the day through a week after the last flight. SELECT DATEADD(day, 7, (SELECT MAX(date) max_date FROM pass_in_trip))
Usage of subquery as an argument is allowed, for that subquery returns a single value of datetime type.
DATEDIFF Syntax DATEDIFF ( datepart , startdate , enddate ) The function returns the interval of date and time elapsed between two specified dates startdate and enddate. This interval may be meassured in different units. Available values of the argument datepart listed above for the DATEADD function. Example (scheme 4). Find the number of days elapsed between the first and the last flights executed. SELECT DATEDIFF(dd, (SELECT MIN(date) FROM pass_in_trip), (SELECT MAX(date) FROM pass_in_trip)) Example (scheme 4). Find the duration in minutes of the flight for the trip no. 1123. It should be noted that the departure time (time_out) and the landing time (time_in) are stored in the fields of datetime type in the Trip table. Note, SQL Server up to version 2000 has not temporal data types for storing the date or time separately, which are expected in the next version (Yukon). Because of this, when inserting only the time in the datetime column (for example, UPDATE trip SET time_out = '17:24:00' WHERE trip_no=1123), the time will be supplemented by the default date value ('1900-01-01'). The simple solution SELECT DATEDIFF(mi, time_out, time_in) dur FROM trip WHERE trip_no=1123, (which gives -760) will be incorrect for two reasons. Firstly, the value obtained in such manner will be incorrect for the trips that depart in one day and land in another one. Secondly, it is unreliably to make any suggestions on a day that is only presented of necessity to correspond to datetime data type. How can we know that a plane lands in the next day? This knowledge comes from the subject area, which says the flight cannot be over 24 hours. So, if the landing time not more than the departure time, this is the case. The second question: how do we calculate only the time apart from the day? Now we turn to DATEPART function.
DATEPART Syntax DATEPART ( datepart , date ) This function returns an integer representing the specified datepart of the specified date. The above list of available values of datepart argument is added by the following Datepart
Abbreviations
Weekday
dw
Note that the value returning by the DATEPART function in this case (weekday) depends on the value set by SET DATEFIRST, which sets the first day of the week. Default value is Sunday = 1
Let us turn to above example. Under suggestions that the departure/landing times are measured with an accuracy of one minute, we can define it in minutes as minimal integer units. So, departure time of the trip no. 1123 in minutes is SELECT DATEPART(hh, time_out)*60 + DATEPART(mi, time_out) FROM trip WHERE trip_no=1123 and the landing time is SELECT DATEPART(hh, time_in)*60 + DATEPART(mi, time_in) FROM trip WHERE trip_no=1123 Now we need to compare whether the landing time exceeds the departure time. If so, we must subtract second time from the first time; otherwise, 1440 minutes (one day) need to be added to the remainder. SELECT CASE WHEN time_dep>=time_arr THEN time_arr-time_dep+1440 ELSE time_arrtime_dep END dur FROM ( SELECT DATEPART(hh, time_out)*60 + DATEPART(mi, time_out) time_dep, DATEPART(hh, time_in)*60 + DATEPART(mi, time_in) time_arr FROM trip WHERE trip_no=1123 ) tm Here, we use subquery to avoid repetition of cumbersome constructions. Despite of tedious form, the result is absolute correct in view of above remarks. Example (scheme 4). Define the departure date and time of the trip no. 1123. Only a date but not a time is stored in the Pass_in_trip table. This is because of any trip is executed only once a day. To solve this task, we need to combine the time from the Trip table with the date from the Pass_in_trip table. SELECT Dpt.trip_no, DATEADD(mi, DATEPART(hh,time_out)*60 + DATEPART(mi,time_out), date) [time] FROM pass_in_trip pt JOIN trip t ON pt.trip_no=t.trip_no WHERE t.trip_no=1123 If you run above query, the following result will be obtained Trip_no
Time
1123
2003-04-05 16:20:00.000
1123
2003-04-08 16:20:00.000
DISTINCT is used here to eliminate duplicates if any. It should be noted that trip number and day is duplicated in Pass_in_trip table for each passenger of the same trip.
DATENAME Syntax DATENAME ( datepart , date ) This function returns a character string representing the specified datepart of the specified date. An argument representing the datepart can be one of the values listed in above table. The function gives us a simple opportunity to concatenate the date components to get any required format. For example, the query SELECT DATENAME ( weekday , '2003-12-31' )+', '+DATENAME ( day , '2003-12-31' )+' '+ DATENAME ( month , '2003-12-31' )+' '+DATENAME ( year , '2003-12-31' ) gives the following result Wednesday, 31 December 2003
Notice that DATENAME function reveals the difference between day and dayofyear values of datepart argument. The first gives a character representation of the day of the specified date, whereas the second gives a character representation of this day from the origin of a year, i.e. SELECT DATENAME ( day , '2003-12-31' ) gives 31, but SELECT DATENAME ( dayofyear , '2003-12-31' ) gives 365. In some cases the DATEPART function can be replaced by more simple functions. Here they are: DAY ( date ) - integer representing the day of the specified date. This function is equivalent to the DATEPART(dd, date) function. MONTH ( date ) - integer representing the month of the specified date. This function is equivalent to the DATEPART(mm, date) function. YEAR ( date ) - integer representing the year of the specified date. This function is equivalent to the DATEPART(yy, date) function.
Data modification statements Data Manipulation Language (DML) besides of the SELECT statement that retrieves information from databases includes also statements modifying data state. These statements are: INSERT
Inserts rows into database table
UPDATE
Changes values in columns of database table
DELETE
Deletes rows from database table
INSERT statement The INSERT statement adds new rows to a table. In so doing, the column values may be literal constants or be formed from result of subquery execution. In the first case, one INSERT statement adds only one row; in the second case, one INSERT statement adds as many rows as the subquery returns. Syntax INSERT INTO
[(,...)] {VALUES (< column value>,…)} |
[type] [varchar] (7) NOT NULL ) It is needed to add to the above table the model 1157 of the maker B. This can be accomplished by the following statement: INSERT INTO Product VALUES ('B', 1157, 'PC'); With defining the column list, we can change a "natural" order of columns: INSERT INTO Product (type, model, maker) VALUES ('PC', 1157, 'B'); This is seemingly excess opportunity that makes writing more cumbersome. However it can be very useful if columns have default values. Consider a table of the following structure: CREATE TABLE [product_D] ( [maker] [char] (1) NULL , [model] [varchar] (4) NULL , [type] [varchar] (7) NOT NULL DEFAULT 'PC' ) Note that all above columns have default values (the first two - NULL, and the last - type column - 'PC'). Now we could write: INSERT INTO Product_D (model, maker) VALUES (1157, 'B'); In this case when adding row, the absent value will be replaced by the default value - 'PC'. Note, if neither a default value nor the NOT NULL constraint definition was specified for a column in the CREATE TABLE statement, NULL is implied as default value. This raises the question of whether we use default values but, nevertheless, do not specify the column list? The answer is positive. To do this, we can use DEFAULT keyword instead of specifying a value explicitly: INSERT INTO Product_D VALUES ('B', 1158, DEFAULT); As all the columns have default values, we could add a row with default values by the following statement: INSERT INTO Product_D VALUES (DEFAULT, DEFAULT, DEFAULT); However for this case there is the special DEFAULT VALUES keyword (see syntax), thus we can rewrite the above statement as follows INSERT INTO Product_D DEFAULT VALUES; Note that when inserting a row into a table, all restrictions for this table will be checked. These restrictions are primary key or unique constraints, CHECK or foreign key constraints. When any of these restrictions is violated, the addition of a row will be denied.
Let us consider the case of subquery use on the following example: add to the Product_D table all the rows from the Product table, which refer to the models of PC (type = 'PC'). Since the needed values are just in a table, we should not add them by typing, but by using a subquery: INSERT INTO Product_D SELECT * FROM Product WHERE type = 'PC'; Usage the "*" symbol in the subquery is enough warranted here because the column orders are identical for both the tables. If this is not being so, it should be use column list either in the INSERT statement or in the subquery, or in the both, bringing the column order in consistency: INSERT INTO Product_D(maker, model, type) SELECT * FROM Product WHERE type = 'PC'; or INSERT INTO Product_D SELECT maker, model, type FROM Product WHERE type = 'PC'; or INSERT INTO Product_D(maker, model, type) SELECT maker, model, type FROM Product WHERE type = 'PC'; Here, as before, all the columns could not be listed if available default values are required to be use. For example: INSERT INTO Product_D (maker, model) SELECT maker, model FROM Product WHERE type = 'PC'; In this case, the default value - 'PC' - will be inserted into the type column of the Product_D table for all added rows. When using subquery with predicate, it should be noted that only those rows will be added for which the predicate evaluates to TRUE (not UNKNOWN!). In other words, if the type column in the Product table adopts NULL-value and NULLs are in any rows, these rows will not be added to the Product_D table. To overcome the restriction of inserting a single row with VALUES clause, we can use an artificial trick of forming by subquery the row with the clause UNION ALL. INSERT INTO Product_D SELECT 'B' AS maker, 1158 AS model, 'PC' AS type UNION ALL SELECT 'C', 2190, 'Laptop' UNION ALL SELECT 'D', 3219, 'Printer'; Using the UNION ALL clause is preferable to the UNION clause, even though duplicates of rows assuredly absent. This is because checking of duplicate would not be accomplished.
Adding rows to a table with identity column Many commercial products allow using autoincrementable columns, i.e. columns that form their values automatically when adding new rows. Such columns are widely used as primary keys in tables by virtue of uniqueness of these values. A typical example of that column is sequential counter, which generates a value greater by 1 than previous value (the value obtained when adding previous row). Below, there is example of creating a table with identity column (code) in MS SQL Server. CREATE TABLE [Printer_Inc] ( [code] [int] IDENTITY(1,1) PRIMARY KEY , [model] [varchar] (4) NOT NULL , [color] [char] (1) NOT NULL , [type] [varchar] (6) NOT NULL , [price] [float] NOT NULL ) Autoincrementable column is defined trough IDENTITY (1, 1) function where the first parameter (seed value) is the value to be assigned to the first row in the table, and the second is the increment to add to the seed value for successive rows in the table. So, in that example, the first inserted row obtains in the code column the value of 1, the second row - 2 and so on. Since the value in the code column is formed automatically, the statement INSERT INTO Printer_Inc VALUES (15, 3111, 'y', 'laser', 2599); raises error, even though the table has not a row with the value of 15 in the code column. Thus we will not include that column when adding the row to the table just as we do when using default value, i.e. INSERT INTO Printer_Inc (model, color, type, price) VALUES (3111, 'y', 'laser', 2599); As a result, the information about model 3111 for color laser printer priced $2599 will be added to the Printer_Inc table. The value in the column code will be 15 only accidentally. In the most cases, specific value in an identity column is of no concern, for this value has no sense as a rule; the first moment is uniqueness of the value. Nevertheless, there are cases where a specific value needs to be inserted into autoincrementable column. This takes place, for example, when existing data must be transferred into a new structure; in so doing, these data are in the "one-to-many" relation from the "one " side. Thus, arbitrariness is not allowed here. On the other hand, we want use autoincrementable field for later. In view of absence of autoincrementable columns in the SQL Standard, single approach does not exist. Here, the realization in MS SQL Server. The statement SET IDENTITY_INSERT < table name > { ON | OFF }; turn off (ON value) or on (OFF value) auto increment use. Thus, to add a row with the value of 15 in the code column, we may write
SET IDENTITY_INSERT Printer_Inc ON; INSERT INTO Printer_Inc(code, model, color, type, price) VALUES (15, 3111, 'y', 'laser', 2599); Note that column list is necessary in this case, i.e. we can neither write so: SET IDENTITY_INSERT Printer_Inc ON; INSERT INTO Printer_Inc VALUES (15, 3111, 'y', 'laser', 599); nor more so SET IDENTITY_INSERT Printer_Inc ON; INSERT INTO Printer_Inc(model, color, type, price) VALUES(3111, 'y', 'laser', 2599); In the last case, missing value in the code column cannot be inserted automatically because auto increment is turned off. It should be noted that numbering would be continued from the value of 16 if the value of 15 were the maximum in the code column. Clearly if the auto increment will be turned on: SET IDENTITY_INSERT Printer_Inc OFF. At last, let us consider an example addition the data from the Product table to the Product_Inc table with conserving the values in the code column: SET IDENTITY_INSERT Printer_Inc ON; INSERT INTO Printer_Inc(code, model,color,type,price) SELECT * FROM Printer; It should be said the following on autoincrementable columns. Let the last value in the code column is 16. Then the row with that value is deleted. What the value is in this column after adding a new row? Correctly, 17, i.e. the last value of counter is conserved despite of deletion of the row with the value of 16. So, the numbering will not be sequential after deletion and addition of rows. This is another reason for inserting the row with a given (missed) value in the autoincrementable column.
UPDATE statement UPDATE statement changes existing data in a table. Syntax UPDATE
SET {column = {expression | NULL | DEFAULT},...} [ {WHERE <predicate>}]; With a single UPDATE statement, arbitrary number of columns can change their values. However a column cannot be changed in the same UPDATE statement more than once. All rows of a table will be affected if a WHERE clause is eliminated from an UPDATE statement.
If the column is defined to allow NULL values, NULL can be specified explicitly. Moreover, existing value can be changed by default value (DEFAULT) for a given column. An expression may refer to current values in a table to be changed. For example, we may decrease prices by 10 percent on all the laptops with the following statement: UPDATE Laptop SET price=price*0.9 The values of one column are allowed to assign to another column. For example, it is required to change HD less than 10 Gb in laptops. New HD capacity must be one-half of available RAM. The solution may be as follows: UPDATE Laptop SET hd=ram/2 WHERE hd<10 Clearly, data types of the hd and ram columns need to be compatible. The CAST expression may be used for conversion. If the data needs to be changed depending on contents of some column, we can use the CASE expression. Say, if it is needed to setup HD with capacity of 20 Gb in laptops with RAM less than 128 Mb and HD with capacity of 40 Gb in remaining laptops, we can write the query: UPDATE Laptop SET hd = CASE WHEN ram<128 THEN 20 ELSE 40 END Use of subquery is also allowed to define the column value. For example, we need to complete all laptops with most fast available processor. Then we write: UPDATE Laptop SET speed = (SELECT MAX(speed) FROM Laptop) Some words about autoincrementable columns ought to be said. If the code column in the Laptop table is defined as IDENTITY(1,1), then the following statement UPDATE Laptop SET code=5 WHERE code=4 will not be executed, since autoincrementable column does not allow modification, but execution error will be returned. To solve above task, we can proceed as follows. At first, let us insert the needed row using SET IDENTITY_INSERT, then delete the old row: SET IDENTITY_INSERT Laptop ON INSERT INTO Laptop_ID(code, model, speed, ram, hd, price, screen) SELECT 5, model, speed, ram, hd, price, screen FROM Laptop_ID WHERE code=4 DELETE FROM Laptop_ID WHERE code=4 Clearly, another row with the value code=5 must not be in the table. Transact-SQL UPDATE statement extends the Standard at the cost of using the optional FROM clause. This clause specifies a table that is used to provide the criteria for the update
operation. This extension gives additional flexibility specifying a join that can be used instead of a subquery in the WHERE clause to identify rows to be updated. Example. Let us write "No PC" in the type column for those PC models in the Product table that have not corresponding rows in the PC table. The solution through table join may be written as: UPDATE Product SET type='No PC' FROM Product pr LEFT JOIN PC ON pr.model=pc.model WHERE type='pc' AND pc.model IS NULL Here, we use outer join that results in the pc.model column contains NULL values for those PC models that are absent from the PC table, which is used for providing the criteria for the update operation. Clearly, this task has also a "standard" solution: UPDATE Product SET type='No PC' WHERE type='pc' AND model NOT IN (SELECT model FROM PC)
DELETE statement DELETE statement delete rows from temporary or basic tables, views and cursors, with the statement action in the two last cases is propagated on those basic tables, from which were extracted data into these views and cursors. Syntax: DELETE FROM
[WHERE <predicate>]; If the WHERE clause is absent, all rows from the table or view will be deleted (the view must be updatable). Deleting the all rows from a table in Transact-SQL can be also done (faster than a DELETE statement) by using the TRUNCATE TABLE statement. Syntax TRUNCATE TABLE
Some distinctions are in the realization of the TRUNCATE TABLE statement, which one should keep in mind: 1. TRUNCATE TABLE statement does not record an entry in the transaction log for each deleted row, only the page deallocations are recorded in the transaction log. 2. Triggers do not fire. As a result, this statement cannot be used on a table referenced by a FOREIGN KEY constraint. 3. The counter used by an IDENTITY for new rows is reset to the seed for the column. Example. Needed to delete from the Laptop table all the laptops with the screen size less than 12 in. DELETE FROM Laptop WHERE screen<12
All laptops we can delete with the query DELETE FROM Laptop or TRUNCATE TABLE Laptop Transact-SQL statement extends the DELETE statement over Standard introducing additional FROM clause. FROM
This extension allows us to specify data to be deleted from the table in the first FROM clause. This extension gives additional flexibility specifying a join that can be used instead of a subquery in the WHERE clause to identify rows to be deleted. Example. Let us need to delete the PC models from the Product table that have not corresponding rows in the PC table. Using standard syntax, the task may be solved with the query: DELETE FROM Product WHERE type='pc' AND model NOT IN (SELECT model FROM PC) Note that the predicate type='pc' is necessary here; otherwise printer and laptop models will be deleted. This task can be solved through the additional FROM clause as follows: DELETE FROM Product FROM Product pr LEFT JOIN PC ON pr.model=pc.model WHERE type='pc' AND pc.model IS NULL Here, we use outer join that results in the pc.model column contains NULL values for those PC models that are absent from the PC table, which is used for providing the criteria for the delete operation.
Appendix 1. About educational databases All exercises are performed on the databases described below.
1. Computer firm
The database scheme consists of four relations: Product(maker, model, type) PC(code, model, speed, ram, hd, cd, price) Laptop(code, model, speed, ram, hd, screen, price) Printer(code, model, color, type, price) The relation "Product" shows the maker, model number, and type (pc, laptop, or printer). It is assumed that model numbers are unique for all the makers and product types. For each model number specifying pc in the relation "PC", its listed speed (of the processor in MGz), total RAM (in MGb), hd capacity (in Gb), CD ROM speed (for example, '4x'), and the price. The relation "Laptop" is similar to that one of PCs except for the CD ROM speed which is replaced by screen size (in inches). For each printer model in the relation "Printer" it is pointed whether the printer is color or not (color attribute is 'y' for color printers; otherwise it is 'n'), printer type (laser, jet, or matrix), and the price.
2. Recycling firm The firm has a few outlets that receive items for recycling. Each of the shops receives funds to be paid to deliverers. Information on received funds is registered in a table: Income_o(point, date, inc) The primary key is (point, date), thus receipt of money (inc) takes place not more then once a day. Information on payments to deliverers is registered in the table: Outcome_o(point, date, out) In this table the primary key (point, date) also ensures bookkeeping of the funds distribution at each point not more then once a day. In case income and expenses may occur more then once a day, the tables (primary key is code) are used: Income(code, point, date, inc) Outcome(code, point, date, out)
3. Ships Under consideration is the database of naval ships that took part in World War II. The database has the following relations: Classes(class, type, country, numGuns, bore, displacement) Ships(name, class, launched) Battles(name, date) Outcomes(ship, battle, result) Ships in classes are arranged to a single project. A class is normally assigned the name of the first ship in the class under consideration (head ship); otherwise, class name does not coincide with any ship name in the database). The relation Classes includes the class name, type (bb for a battle ship, or bc for a battle cruiser), country the ship was built, number of main guns, gun caliber (diameter of the gun barrel, in inches), and displacement (weight in tons). The relation Ships includes the ship's name, its class name, and launch year. The relation Battles covers the name and date of the battle the ships participated; while the result of their participation in the battle (sunk, damaged, or unharmed - OK) is in the relation Outcomes. Note: the relation Outcomes may include the ships not included in the relation Ships.
4. Airport Database schema consists of 4 tables: Company(ID_comp, name)
Trip(trip_no, id_comp, plane, town_from, town_to, time_out, time_in) Passenger(ID_psg, name) Pass_in_trip(trip_no, date, ID_psg, place) Company table has ID and name of the company, which transports passengers. Trip table has information about trips: trip's number, company's ID, plane's type, departure city, arrival city, departure time, and arrival time. The passenger table has passenger's ID and passenger's name. Pass_in_trip table has information about the flights: trip's number, departure date, passenger's ID and his place during the flight. We should note that, - Any trip is being accomplished every day; duration of the flight is less than a calendar-day (24 hours) - Time and date are considered comparatively one time zone - There can be the passengers bearing the same surnames - Relationships and restrictions are shown in the data schema.
5. Painting Database schema consists of 3 tables: utQ (Q_ID int,Q_NAME varchar(35)) utV (V_ID int,V_NAME varchar(35),V_COLOR char(1)) utB (B_Q_ID int,B_V_ID int,B_VOL tinyint, B_DATETIME datetime) The table utQ includes square identifier, square name. Note that non-painted square is black. The table utV includes balloon identifier, balloon name, and paint color. The table utB shows information on painting square with paint balloon and includes the square identifier, the balloon identifier, the paint quantity, and time of painting. It should be noted that - balloon may be of one from three colors: red (V_COLOR='R'), green (V_COLOR='G'), or blue (V_COLOR='B'); - any balloon is originally full with volume of 255; - square color is defined in accordance with RGB rule, i.e. R=0, G=0, B=0 is black, whereas R=255, G=255, B=255 is white; - any record in the table utB decreases paint quantity in the balloon by B_VOL and increase paint quantity in the square by the same value; - B_VOL may be between 0 and 255; - paint quantity of the same color in one square may not be over 255; - paint quantity in a balloon may not be less than 0
Appendix 2. The list of exercises (SELECT) Below is the list of all the first-phase exercises with the number of each exercise stated in the tasks, the database number for these exercises, and the complexity level. Номер
База Уровень
Упражнение
1
1
1
Find the model number, speed and hard drive capacity for all the PCs with prices below $500. Result set: model, speed, hd.
2
1
1
Find printer makers. Result set: maker.
3
1
1
Find the model number, RAM and screen size of the laptops with prices over $1000.
4
1
1
Find all the records from the relation Printer for color printers.
5
1
1
Find the model number, speed and hard drive capacity of the PCs having 12x or 24x CD and prices less than $600.
6
1
1
Point out the maker and speed of the laptops having hard drive capacity more or equal to 10 Gb.
7
1
1
Find out the models and prices for all the products (of any type) produced by maker B.
8
1
2
Find out the makers that sale PCs but not laptops.
9
1
1
Find the makers of the PCs that have speed not less than 450MGz. Result set: Maker.
10
1
1
Find the printers having the highest price. Result set: model, price.
11
1
1
Find out the average speed of PCs.
12
1
1
Find out the average speed of the laptops priced over $1000.
13
1
1
Find out the average speed of the PCs produced by maker A.
14
1
1
For each value of speed find the average price of the PC with the same processor speed. Result set: speed, average price.
15
1
2
Find the hard drive sizes coinciding with two or more PCs. Result set: hd.
16
1
2
Find the pairs of PC models having similar speeds and RAM. As a result, each resulting pair is shown only once, i.e. (i, j) but not (j,
i). Result set: model with high number, model with low number, speed, and RAM. 17
1
2
Find the laptops having speeds less than all PC's. Result set: type, model, speed.
18
1
2
Find the makers of the cheapest color printers. Result set: maker, price.
19
1
2
For each maker find out the average screen size of the laptops produced by him. Result set: maker, average screen size.
20
1
2
Find the makers producing at least three distinct models of PCs. Result set: maker, number of models.
21
1
2
Define the maximal price of PC produced by each maker. Result set: maker, maximal price.
22
1
2
For each value of PC speed that exceeds 600 MGz, define the average price of the PCs with identical speeds. Result set: speed, average price.
23
1
3
Find the maker producing at least two different computers (both PC and laptop) having speed not less than 750 MGz. Result set: Maker.
24
1
3
Find the model number of the product (PC, laptop, or printer) with the highest price. Result set: model.
25
1
3
Find the printer makers which also produce PCs with the lowest RAM and the highest-speed processor among PCs with the lowest RAM. Result set: maker.
26
1
3
Define the average price of the PCs and laptops produced by maker A. Result set: single total price.
27
1
3
Define the average size of the PC hard drive for each maker that also produces printers. Result set: maker, average capacity of HD.
28
1
3
Define the average HD size (single value for all makers) of PCs produced by those makers that also produce printers. Result set: average size of HD.
3
Under the assumption that the income (inc) and expenses (out) of the money at each outlet are written not more than once a day, get a result set with the fields: point, date, income, expense. Use tables Income_o and Outcome_o.
29
2
30
2
3
Under the assumption that the income (inc) and expenses (out) of the money at each outlet (point) are registered any number of times a day, get a result set with fields: outlet, date, income, expense. Note that a single record must correspond to each outlet at each date. Use tables Income and Outcome.
31
3
1
Point out the names and countries of the ship classes the gun caliber of which is not less than 16 in.
32
3
1
Find the ships launched before 1921. Result set: name.
33
3
1
Define the ships sunken at the battles in the North Atlantic. Result set: ship.
34
3
1
In accordance with the Washington International Treaty on the begining of 1922, it was prohibited to build linear ships with
displacement more than 35 thousand tons. Define the ships which violate this treaty. Result set: ship name. 35
3
3
Point out the names, displacements and number of guns of ships participating in the battle at Guadalcanal.
36
3
2
List the names of head ships in the database (taking into account Outcomes table).
37
3
2
Find the classes including only one ship in the database (taking into account Outcomes table).
38
3
2
Find the countries having classes of both ordinary ships and cruisers.
39
3
3
Define the ships that "survived for future battles"; being damaged in one battle, they took part in another.
40
3
1
Find the class, name and country for all the ships having not less than 10 guns.
41
3
2
Define the names of all ships in the database launched before 1941.
42
3
1
Find the names of ships sunked in battles and the names of corresponding battles.
43
3
2
For each country, find the year, in which the maximal number of ships had been launched. In the case of more than one year in question, take a minimal year. Result set: country, number of ships, year
44
3
1
Find all ship names beginning with letter "R".
45
3
1
Find the all ship names consisting of three or more words (for example King George V)
46
3
3
Point out the battles in which at least three ships from the same country took part.
47
3
1
Find the countries the ships of which have the largest amount of guns.
48
3
2
Find the ship classes which have at least one ship sunked in a battle.
49
3
2
Find the names of the ships having the gun caliber of 16 in (taking into account Outcomes table).
50
3
1
Find the battles in which Kongo-class ships took part.
51
3
3
Find the names of the ships having the largest amount of guns among all the ships with the same displacement (taking into account Outcomes table).
52
3
1
Define the number of battleship classes.
53
3
1
To within two decimal digits, define the average amount of guns for the battleship classes.
54
3
2
To within two decimal digits, define the average amount of guns for all the battleships (taking into account Outcomes table).
55
3
1
For each class, define the year in which the first ship of this class was launched. Result set: class, year.
56
3
3
For each class, define the number of ships of this class that were sunked in a battles. Result set: class, number of sunked ships.
57
3
3
For each class, consisting of at least three ships, define the number of ships of this class sunked in battles. Result set: class, number of sunked ships.
58
3
3
The weight of a gun missile (in pounds) is almost equal to one-half cube of its caliber (in inches). Define the average weight of the missile for the ships of each country (taking into account Outcomes table).
59
2
2
Calculate the remainder at each outlet for the database with accounts drawn not more than once a day. Result set: point, remainder.