Normalization & Denormalization


Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

First normal form (1NF) sets the very basic rules for an organized database:
1. Eliminate duplicative columns from the same table.
2. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second normal form (2NF) further addresses the concept of removing duplicative data:
1. Meet all the requirements of the first normal form.
2. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
3. Create relationships between these new tables and their predecessors through the use of foreign keys.

Third normal form (3NF) goes one large step further:
1. Meet all the requirements of the second normal form.
2. Remove columns that are not dependent upon the primary key.

Finally, fourth normal form (4NF) has one additional requirement:
1. Meet all the requirements of the third normal form.
2. The relation must have no multi-valued dependencies.

Q A clustered table stores its rows physically on disk in order by a specified column or columns. The difference between the two kinds of index is that a clustered index is unique for any given table: we can have only one clustered index on a table. The leaf level of a clustered index is the actual data, and the data is re-sorted to match the clustered index. In a non-clustered index, the leaf level is a pointer to the data rows, so we can have as many non-clustered indexes as we need in the database. The clustering index forces table rows to be stored in ascending order by the indexed columns. There can be only one clustering sequence per table (because physically the data can be stored in only one sequence).
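As a minimal sketch (the Orders table and its columns are hypothetical), the two kinds of index are created like this:

-- Hypothetical table used for the index examples below.
CREATE TABLE Orders (
    OrderID    INT      NOT NULL,
    CustomerID INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
)
GO

-- Only one clustered index is allowed: the rows are physically ordered by OrderID.
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID)
GO

-- Any number of non-clustered indexes: their leaf level points to the data rows.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID)
GO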

Use clustered indexes for the following situations:
• Join columns, to optimize SQL joins where multiple rows match for one or both tables participating in the join
• Foreign key columns, because they are frequently involved in joins and the DBMS accesses foreign key values during declarative referential integrity checking
• Predicates in a WHERE clause
• Range columns
• Columns that do not change often (reduces physical reclustering)
• Columns that are frequently grouped or sorted in SQL statements
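Returning to the normal forms described at the start of this section, here is a minimal sketch (the tables and columns are hypothetical) of decomposing a design that repeats customer columns into third normal form:

-- Before: customer attributes repeated on every invoice row (not in 3NF).
-- Invoice (InvoiceID, CustomerName, CustomerCity, Amount)

-- After: customer attributes depend only on the Customer primary key,
-- and Invoice carries a foreign key instead of the repeated columns.
CREATE TABLE Customer (
    CustomerID   INT         NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(50) NOT NULL,
    CustomerCity VARCHAR(50) NOT NULL
)
GO

CREATE TABLE Invoice (
    InvoiceID  INT   NOT NULL PRIMARY KEY,
    CustomerID INT   NOT NULL REFERENCES Customer (CustomerID),
    Amount     MONEY NOT NULL
)
GO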

Q Triggers are basically used to implement business rules. A trigger is similar to a stored procedure; the difference is that it is activated automatically when data is added, edited, or deleted in a table in a database.
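As a minimal sketch (the Orders and OrderAudit tables and the auditing rule are hypothetical), a trigger that fires when rows are deleted might look like this:

-- Hypothetical audit table that records every deletion from Orders.
CREATE TABLE OrderAudit (
    OrderID   INT      NOT NULL,
    DeletedAt DATETIME NOT NULL
)
GO

CREATE TRIGGER trOrders_Delete ON Orders
FOR DELETE
AS
    -- The deleted pseudo-table holds the rows removed by the triggering statement.
    INSERT OrderAudit (OrderID, DeletedAt)
    SELECT OrderID, GETDATE()
    FROM deleted
GO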


Q The Delete command removes rows from a table based on the condition we provide in a WHERE clause. Truncate removes all the rows from a table, so there is no data in the table after we run the Truncate command. The Delete command requires a log entry for each deleted row, but the Truncate command does not, so the Truncate command is much faster than the Delete command.

Q A transaction is a sequence of SQL operations (commands) that works as a single atomic unit of work. To qualify as a transaction, this sequence of operations must satisfy four properties, known as the ACID test.
A (Atomicity): The sequence of operations must be atomic; either all or none of the operations are performed.
C (Consistency): When completed, the sequence of operations must leave the data in a consistent state. All defined relations and constraints must be maintained.
I (Isolation): A transaction must be isolated from all other transactions. A transaction sees the data either before the operations are performed or after all the operations have been performed; it cannot see the data in between.
D (Durability): All operations must be permanently recorded on the system. Even in the event of a system failure, the effects of all operations must persist.

Q Data integrity
A constraint is a property assigned to a column or a set of columns in a table that prevents certain types of inconsistent data values from being placed in the column(s). Constraints are used to enforce data integrity, which ensures the accuracy and reliability of the data in the database. Data integrity (DI) is an important feature in SQL Server. When used properly, it ensures that data is accurate, correct, and valid. It also acts as a trap for otherwise undetectable bugs within your applications. However, DI remains one of the most neglected SQL Server features. The following categories of data integrity exist:
1 Entity integrity
2 Domain integrity
3 Referential integrity
4 User-defined integrity
Entity integrity ensures that there are no duplicate rows in a table. Domain integrity enforces valid entries for a given column by restricting the type, the format, or the range of possible values. Referential integrity ensures that rows that are used by other records cannot be deleted (for example, corresponding data values between tables remain valid). User-defined integrity enforces specific business rules that do not fall into the entity, domain, or referential integrity categories. Each of these categories of data integrity can be enforced by the appropriate constraints. Microsoft SQL Server supports the following constraints: PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK, and NOT NULL.
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row, and only one primary key constraint can be created for each table. Primary key constraints are used to enforce entity integrity.


A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate values are entered. Unique key constraints are used to enforce entity integrity, as are primary key constraints.
A FOREIGN KEY constraint prevents any action that would destroy the link between tables with corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there are no primary keys with those values. Foreign key constraints are used to enforce referential integrity.
A CHECK constraint is used to limit the values that can be placed in a column. Check constraints are used to enforce domain integrity.
A NOT NULL constraint enforces that the column will not accept null values. Not null constraints are used to enforce domain integrity, as are check constraints.
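As a minimal sketch (the Department and Employee tables and their columns are hypothetical), the constraint types above can be declared like this:

CREATE TABLE Department (
    DepartmentID INT         NOT NULL PRIMARY KEY,   -- entity integrity
    Name         VARCHAR(50) NOT NULL                -- domain integrity (no NULLs)
)
GO

CREATE TABLE Employee (
    EmployeeID   INT          NOT NULL PRIMARY KEY,            -- entity integrity
    Email        VARCHAR(100) NOT NULL UNIQUE,                 -- entity integrity
    DepartmentID INT          NOT NULL
                 REFERENCES Department (DepartmentID),         -- referential integrity
    Salary       MONEY        NOT NULL CHECK (Salary > 0)      -- domain integrity
)
GO

Inserting a row that violates any of these declarations fails, which is how the integrity categories described above are enforced.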

Denormalization

Denormalization is the art of introducing database design mechanisms that enhance performance. Successful denormalization depends upon a solid database design normalized to at least the third normal form. Only then can you begin a process called responsible denormalization. The only reason to denormalize a database is to improve performance. In this section, we will discuss techniques for denormalization and when they should be used. The techniques are

• Creating redundant data
• Converting views to tables
• Using derived (summary) columns and tables
• Using contrived (identity) columns
• Partitioning data
  o Subsets/vertical partitioning
  o Horizontal partitioning
  o Server partitioning (multiple servers)

Creating Redundant Data

Repeating a column from one table in another table can help avoid a join. For example, because an invoice is for a customer, you could repeat the customer name in the invoice table so that when it is time to print the invoice you do not have to join to the customer table to retrieve the name. However, you also need a link to the customer table to get the billing address; therefore, it makes no sense to duplicate the customer name when you already need to join to the customer table to get other data. A good example of redundant data is to carry the customer name in a table that contains time-related customer activity, for example, annual or monthly summaries of activity. You can keep redundant data up-to-date with a trigger so that when the base table changes, the tables carrying the redundant data also change.

Converting Views to Tables

This technique creates redundant data on a grand scale. The idea is that you create a table from a view that joins multiple tables. You need complex triggers to keep this table synchronized with the base tables: whenever any of the base tables change, the corresponding data in the new table must also change. An example of this technique is a vendor tracking system that contains several tables, as shown in Figure 16.6, an entity-relationship diagram. A vendor can have multiple addresses, one of which is marked "billing"; a vendor can have multiple contacts, one marked "primary"; and each contact can have many phone numbers, one marked "Main."


The following Select statement creates a view that joins all the tables.

SELECT V.VendorID, V.Name, VA.Address1, VA.Address2, VA.City, VA.State,
       VA.PostalCode, VA.Country, VC.FirstName, VC.LastName, CP.Phone
FROM Vendor V, VendorAddress VA, VendorContact VC, ContactPhone CP
WHERE V.VendorID = VA.VendorID
  AND VA.AddressType = 'Billing'
  AND V.VendorID = VC.VendorID
  AND VC.ContactType = 'Primary'
  AND VC.ContactID = CP.ContactID
  AND CP.PhoneType = 'Main'

The above view is often too slow for quick lookup and retrieval. Converting that view to a table with the same columns and datatypes and using a trigger to keep it up-to-date could give you improved performance. More important, creating a table from a view gives developers a choice of retrieving data from either the base tables or the redundant table. The drawback to this technique is the performance penalty of keeping the table up-to-date; you would use it only if the retrieval speed outweighed the cost of updating the table.

Using Derived (Summary) Columns and Tables

The two main types of derived columns are calculated and summary. Calculated columns are usually extensions to the row containing the base field. For example, an invoice detail record has QtyOrdered and ItemCost fields. To show the cost multiplied by the quantity, you could either calculate it each time it is needed (for example, in screens, reports, or stored procedures) or you could calculate the value each time you save the record and keep the value in the database. However, in this particular example you would not use this form of denormalization, because calculated fields whose base data is in the same row don't take much time to calculate, and storing the calculation isn't worth the extra storage cost. Summary columns, on the other hand, are examples of denormalization that can have substantial performance gains. Good examples include adding MonthToDate, QtrToDate, and YearToDate columns in the Account table of a general ledger system. Each of these columns summarizes data in many different records. By adding these columns, you avoid the long calculation needed for the summary and get the same data from one read instead. Of course, this option is not very flexible, because if you need a total from a certain date range, you still have to go back to the base data to summarize the detail records. A poor example of using summary fields is carrying an invoice subtotal on the header record when your orders average five lines, for example, in a high-priced, low-volume inventory system, such as a major appliance store. It doesn't take much time or effort for SQL Server to add five rows. But what if you have a low-priced, high-turnover inventory, such as a grocery store, with hundreds of lines per order? Then it would make more sense to carry an InvoiceSubtotal field in the header record.


Summarizing data by time period (day, month, year, etc.) or across product lines and saving the summaries in a table or tables is a good strategy to shortcut the reporting process. This technique is sometimes called a data warehouse. It can be expensive to keep up-to-date, because the data collection process must also update the summary tables. One strategy to bypass the cost of the data collection process is to replicate the transaction data to another database on another server where the data warehouse is periodically updated.

Using Contrived (Identity) Columns

Sometimes called counter columns or identity columns, contrived columns are columns to which the database assigns the next available number. Contrived columns are the primary keys in these tables, but they are not necessarily the only unique indexes. They are usually a shortcut for wide, multi-column keys. The contrived values are carried in other tables as foreign key values and link to the original table.

Data Partitioning

You can partition data vertically or horizontally, depending on whether you want to split columns or rows. You can also partition data across servers. We consider each type of partitioning below.

Subsets/Vertical Partitioning

Also called subsets, vertical partitions split columns that are not used often into new tables. Subsets are zero- or one-to-one relationships, meaning that for each row in the main table, one record (at most) exists in the subset table. The advantage to vertical partitioning is that the main table has narrower records, which results in more rows per page. When you do need data from the subset table, a simple join can retrieve the record.

Horizontal Partitioning

Horizontal partitioning, also known as row partitioning, is a more difficult type of partitioning to program and manage. The table is split so that rows with certain values are kept in different tables. For example, the data for each branch office could be kept in a separate table. A common example in data warehousing, historical transactions, keeps each month of data in a separate table. The difficulty in horizontal partitioning comes in returning rows from each table as one result set. You use a Union statement to create this result set, as demonstrated below.

SELECT * FROM JanuaryTable
UNION
SELECT * FROM MarchTable

SQL 6.5 has a major new feature that allows Unions in views. Most systems that use horizontal partitioning build flexible SQL statements on the front-end, which do not take advantage of stored procedures.

Server Partitioning (Using Multiple Servers)

An enhancement in version 6.5 gives you even more incentive to distribute your data and spread the load among multiple servers. Now that you can return result sets from a remote procedure and replicate to ODBC databases, programming and maintaining distributed databases is getting easier. Mixing OLTP and data warehousing has rarely been successful. It is best to have your data collection systems on one server and your reporting database(s) on another. The keys to designing an architecture involving multiple servers are a thorough knowledge of replication and of returning results from remote procedures. For a complete discussion of replication, see Chapter 11.


Here is a simple example of returning a result set from a remote procedure and inserting it into a table.

INSERT MyTable (Field1, Field2, Field3)
EXECUTE OtherServer.OtherDatabase.Owner.prProcedure @iParm1, @sParm2

The remote procedure must return a result set with the same number of fields and the same data types. In the example above, three fields must be returned. If they are not exactly the same data types, they must be able to be implicitly converted by SQL Server. As you can see from the example, you can pass parameters to the remote procedure. You can also insert the results into an existing temporary table by placing the # symbol in front of the table name, as in #MyTable.

SUMMARY

Performance tuning can be a never-ending series of adjustments in which you alleviate one bottleneck only to find another immediately. If your server is performing poorly, you need to answer two basic questions: "Do I add more memory, disk space, or CPU?" and "Do I have enough capacity to handle the anticipated growth of the company?" This book gives you enough information to make informed decisions. Although more can be gained from tuning hardware than ever before, the real gains come from tuning your queries and making good use of indexes, so don't stop reading here.

Introduction
Transaction Isolation Levels
Lock types
Locking optimizer hints
Deadlocks
View locks (sp_lock)
Literature

Introduction

In this article, I want to tell you about SQL Server 7.0/2000 Transaction Isolation Levels: what kinds of Transaction Isolation Levels exist, how you can set the appropriate Transaction Isolation Level, about lock types and locking optimizer hints, about deadlocks, and about how you can view locks by using the sp_lock stored procedure.

Transaction Isolation Levels

There are four isolation levels:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE

Microsoft SQL Server supports all of these Transaction Isolation Levels and distinguishes between REPEATABLE READ and SERIALIZABLE. Let me describe each isolation level.

READ UNCOMMITTED


When it is used, SQL Server does not issue shared locks while reading data, so you can read an uncommitted transaction that might get rolled back later. This isolation level is also called dirty read. It is the lowest isolation level and ensures only that physically corrupt data will not be read.

READ COMMITTED

This is the default isolation level in SQL Server. When it is used, SQL Server uses shared locks while reading data. It ensures that physically corrupt data will not be read and that the transaction will never read data that another application has changed but not yet committed, but it does not ensure that the data will not be changed before the end of the transaction.

REPEATABLE READ

When it is used, dirty reads and nonrepeatable reads cannot occur. Locks are placed on all data that is used in a query, and other transactions cannot update that data. This is the definition of nonrepeatable read from SQL Server Books Online:

nonrepeatable read: When a transaction reads the same row more than one time, and between the two (or more) reads, a separate transaction modifies that row. Because the row was modified between reads within the same transaction, each read produces different values, which introduces inconsistency.

SERIALIZABLE

This is the most restrictive isolation level. When it is used, phantom values cannot occur. It prevents other users from updating or inserting rows into the data set until the transaction is completed. This is the definition of phantom from SQL Server Books Online:

phantom: Phantom behavior occurs when a transaction attempts to select a row that does not exist and a second transaction inserts the row before the first transaction finishes. If the row is inserted, the row appears as a phantom to the first transaction, inconsistently appearing and disappearing.

You can set the appropriate isolation level for an entire SQL Server session by using the SET TRANSACTION ISOLATION LEVEL statement. This is the syntax from SQL Server Books Online:

SET TRANSACTION ISOLATION LEVEL
{ READ COMMITTED | READ UNCOMMITTED | REPEATABLE READ | SERIALIZABLE }

You can use the DBCC USEROPTIONS statement to determine the Transaction Isolation Level currently set. This command returns the set options that are active for the current connection. This is the example:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
GO
DBCC USEROPTIONS
GO


Lock types

There are three main types of locks that SQL Server 7.0/2000 uses:

Shared locks
Update locks
Exclusive locks

Shared locks are used for operations that do not change or update data, such as a SELECT statement. Update locks are used when SQL Server intends to modify a page; it later promotes the update page lock to an exclusive page lock before actually making the changes. Exclusive locks are used for data modification operations, such as UPDATE, INSERT, or DELETE.

Shared locks are compatible with other shared locks or update locks. Update locks are compatible with shared locks only. Exclusive locks are not compatible with other lock types.

Let me describe this with a real example. There are four processes that attempt to lock the same page of the same table. These processes start one after another, so Process1 is the first process, Process2 is the second process, and so on.

Process1: SELECT
Process2: SELECT
Process3: UPDATE
Process4: SELECT

Process1 sets a shared lock on the page, because there are no other locks on this page. Process2 sets a shared lock on the page, because shared locks are compatible with other shared locks. Process3 wants to modify the data and wants to set an exclusive lock, but it cannot do so before Process1 and Process2 finish, because an exclusive lock is not compatible with other lock types. So Process3 sets an update lock. Process4 cannot set a shared lock on the page before Process3 finishes, so there is no lock starvation. Lock starvation occurs when read transactions can monopolize a table or page, forcing a write transaction to wait indefinitely. So Process4 waits until Process3 finishes. After Process1 and Process2 finish, Process3 converts the update lock into an exclusive lock to modify the data. After Process3 finishes, Process4 sets a shared lock on the page to select the data.
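As a minimal sketch (the Orders table is hypothetical, and the two batches are meant to be run from two separate connections), the blocking behavior described above can be observed like this:

-- Connection 1: take an exclusive lock by updating a row inside an open transaction.
BEGIN TRANSACTION
UPDATE Orders SET OrderDate = GETDATE() WHERE OrderID = 1
-- Leave the transaction open for a moment.

-- Connection 2: this SELECT requests a shared lock on the same data and is
-- blocked until connection 1 commits or rolls back.
SELECT OrderID, OrderDate FROM Orders WHERE OrderID = 1

-- Connection 1: release the exclusive lock; connection 2 then returns its result.
COMMIT TRANSACTION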


Locking optimizer hints

SQL Server 7.0/2000 supports the following locking optimizer hints:

NOLOCK
HOLDLOCK
UPDLOCK
TABLOCK
PAGLOCK
TABLOCKX
READCOMMITTED
READUNCOMMITTED
REPEATABLEREAD
SERIALIZABLE
READPAST
ROWLOCK

NOLOCK is also known as "dirty reads". This option directs SQL Server not to issue shared locks and not to honor exclusive locks. So, if this option is specified, it is possible to read an uncommitted transaction. This results in higher concurrency and lower consistency.

HOLDLOCK directs SQL Server to hold a shared lock until completion of the transaction in which HOLDLOCK is used. You cannot use HOLDLOCK in a SELECT statement that includes the FOR BROWSE option. HOLDLOCK is equivalent to SERIALIZABLE.

UPDLOCK instructs SQL Server to use update locks instead of shared locks while reading a table and holds them until the end of the command or transaction.

TABLOCK takes a shared lock on the table that is held until the end of the command. If you also specify HOLDLOCK, the lock is held until the end of the transaction.

PAGLOCK is used by default. It directs SQL Server to use shared page locks.

TABLOCKX takes an exclusive lock on the table that is held until the end of the command or transaction.

READCOMMITTED performs a scan with the same locking semantics as a transaction running at the READ COMMITTED isolation level. By default, SQL Server operates at this isolation level.

READUNCOMMITTED is equivalent to NOLOCK.

REPEATABLEREAD performs a scan with the same locking semantics as a transaction running at the REPEATABLE READ isolation level.

SERIALIZABLE performs a scan with the same locking semantics as a transaction running at the SERIALIZABLE isolation level. It is equivalent to HOLDLOCK.

READPAST skips locked rows. This option causes a transaction to skip over rows locked by other transactions that would ordinarily appear in the result set, rather than block the transaction waiting for the other transactions to release their locks on these rows. The READPAST lock hint applies only to transactions operating at READ COMMITTED isolation and will read only past row-level locks. It applies only to the SELECT statement, and you can specify the READPAST hint only in the READ COMMITTED or REPEATABLE READ isolation levels.

ROWLOCK uses row-level locks rather than the coarser-grained page- and table-level locks.


You can specify one of these locking options in a SELECT statement. This is the example:

SELECT au_fname FROM pubs..authors (holdlock)

Deadlocks

A deadlock occurs when two users have locks on separate objects and each user wants a lock on the other's object. For example, User1 has a lock on object "A" and wants a lock on object "B", while User2 has a lock on object "B" and wants a lock on object "A". In this case, SQL Server ends the deadlock by choosing one of the users as the deadlock victim. SQL Server rolls back that user's transaction, sends message number 1205 to notify the user's application that its transaction was chosen as the victim, and then allows the other user's process to continue. You can decide which connection will be the candidate for deadlock victim by using SET DEADLOCK_PRIORITY; otherwise, SQL Server selects the deadlock victim by choosing the process that completes the circular chain of locks. So, in a multiuser situation, your application should check for error 1205, which indicates that the transaction was rolled back, and if it occurs, restart the transaction.

Note. To reduce the chance of a deadlock, you should minimize the size of transactions and transaction times.

View locks (sp_lock)

Sometimes you need information about locks. Microsoft recommends using the sp_lock system stored procedure to report lock information. This very useful procedure returns the SQL Server process ID that holds the lock, the locked database, the locked table ID, the locked page, and the type of lock (the locktype column). This is an example of the output of the sp_lock system stored procedure:

spid   locktype    table_id    page   dbname
------ ----------- ----------- ------ -------
11     Sh_intent   688005482   0      master
11     Ex_extent   0           336    tempdb

The information returned by the sp_lock system stored procedure needs some clarification, because it is difficult to recognize the database name, object name, and index name from their ID numbers. Check the link below if you need to get the user name, host name, database name, index name, object name, and object owner instead of their ID numbers:

Detailed locking view: sp_lock2
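As a minimal sketch (the table_id value is taken from the sample output above), the built-in OBJECT_NAME function can translate the ID numbers that sp_lock reports into readable names:

-- Report current locks.
EXEC sp_lock
GO

-- Translate a table_id from the sp_lock output into an object name.
-- OBJECT_NAME resolves the ID in the current database, so run it in the
-- database named in the dbname column.
SELECT OBJECT_NAME(688005482)
GO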
