Ten Common Database Design Mistakes


26 February 2007 by Louis Davidson

Source: http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/

No list of mistakes is ever going to be exhaustive. People (myself included) do a lot of really stupid things, at times, in the name of "getting it done." This list simply reflects the database design mistakes that are currently on my mind, or in some cases, constantly on my mind.

NOTE: I have covered this topic twice before. If you're interested in hearing the podcast version, visit Greg Low's super-excellent SQL Down Under. I also presented a boiled-down, ten-minute version at PASS for the Simple-Talk booth. Originally there were ten, then six, and today back to ten. And these aren't exactly the same ten that I started with; these are ten that stand out to me as of today.

Before I start with the list, let me be honest for a minute. I used to have a preacher who made sure to tell us before some sermons that he was preaching to himself as much as he was to the congregation. When I speak, or when I write an article, I have to listen to that tiny little voice in my head that helps filter out my own bad habits, to make sure that I am teaching only the best practices. Hopefully, after reading this article, the little voice in your head will talk to you when you start to stray from what is right in terms of database design practices.


So, the list:

1.  Poor design/planning
2.  Ignoring normalization
3.  Poor naming standards
4.  Lack of documentation
5.  One table to hold all domain values
6.  Using identity/guid columns as your only key
7.  Not using SQL facilities to protect data integrity
8.  Not using stored procedures to access data
9.  Trying to build generic objects
10. Lack of testing


Poor design/planning


"If you don't know where you are going, any road will take you there" – George Harrison


Prophetic words for all parts of life and a description of the type of issues that plague many projects these days.

Let me ask you: would you hire a contractor to build a house and then demand that they start pouring a foundation the very next day? Even worse, would you demand that it be done without blueprints or house plans? Hopefully, you answered "no" to both of these. A design is needed to make sure that the house you want gets built, and that the land you are building it on will not sink into some underground cavern. If you answered yes, I am not sure if anything I can say will help you.

Like a house, a good database is built with forethought, and with proper care and attention given to the needs of the data that will inhabit it; it cannot be tossed together in some sort of reverse implosion. Since the database is the cornerstone of pretty much every business project, if you don't take the time to map out the needs of the project and how the database is going to meet them, then the chances are that the whole project will veer off course and lose direction. Furthermore, if you don't take the time at the start to get the database design right, then you'll find that any substantial changes in the database structures that you need to make further down the line could have a huge impact on the whole project, and greatly increase the likelihood of the project timeline slipping.

Far too often, a proper planning phase is ignored in favor of just "getting it done". The project heads off in a certain direction and when problems inevitably arise – due to the lack of proper designing and planning – there is "no time" to go back and fix them properly, using proper techniques. That's when the "hacking" starts, with the veiled promise to go back and fix things later, something that happens very rarely indeed.

Admittedly it is impossible to predict every need that your design will have to fulfill and every issue that is likely to arise, but it is important to mitigate potential problems as much as possible, by careful planning.

Ignoring Normalization


Normalization defines a set of methods to break down tables to their constituent parts until each table represents one and only one "thing", and its columns serve to fully describe only the one "thing" that the table represents.

The concept of normalization has been around for 30 years and is the basis on which SQL and relational databases are implemented. In other words, SQL was created to work with normalized data structures. Normalization is not just some plot by database programmers to annoy application programmers (that is merely a satisfying side effect!)

SQL is very additive in nature in that, if you have bits and pieces of data, it is easy to build up a set of values or results. In the FROM clause, you take a set of data (a table) and add (JOIN) it to another table. You can add as many sets of data together as you like, to produce the final set you need.

This additive nature is extremely important, not only for ease of development, but also for performance. Indexes are most effective when they can work with the entire key value. Whenever you have to use SUBSTRING, CHARINDEX, LIKE, and so on, to parse out a value that is combined with other values in a single column (for example, to split the last name of a person out of a full name column), the SQL paradigm starts to break down and data becomes less and less searchable.

So normalizing your data is essential to good performance, and ease of development, but the question always comes up: "How normalized is normalized enough?" If you have read any books about normalization, then you will have heard many times that 3rd Normal Form is essential, but 4th and 5th Normal Forms are really useful and, once you get a handle on them, quite easy to follow and well worth the time required to implement them. In reality, however, it is quite common that not even the first Normal Form is implemented correctly.

Whenever I see a table with repeating column names appended with numbers, I cringe in horror. And I cringe in horror quite often. Consider the following example Customer table:
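The original article shows this table as an image, which has not survived the conversion; a hypothetical sketch of the kind of design being described (table and column names are illustrative guesses, not the original figure) might be:

```sql
-- Anti-pattern: repeating payment columns in the Customer table,
-- violating First Normal Form. Names are illustrative only.
CREATE TABLE Customer
(
    CustomerId int         NOT NULL PRIMARY KEY,
    Name       varchar(60) NOT NULL,
    Payment1   money       NULL,
    Payment2   money       NULL,
    Payment3   money       NULL
    -- ...and so on, one numbered column per payment
);
```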


Are there always 12 payments? Is the order of payments significant? Does a NULL value for a payment mean UNKNOWN (not filled in yet), or a missed payment? And when was the payment made?!? A payment does not describe a Customer and should not be stored in the Customer table. Details of payments should be stored in a Payment table, in which you could also record extra information about the payment, like when the payment was made, and what the payment was for:
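The improved design is also shown as an image in the original; a hedged reconstruction along the lines described, with hypothetical names, might be:

```sql
-- One row per payment; names are illustrative, not from the original figure.
CREATE TABLE Payment
(
    PaymentId   int          NOT NULL PRIMARY KEY,
    CustomerId  int          NOT NULL REFERENCES Customer (CustomerId),
    Amount      money        NOT NULL,
    PaymentDate datetime     NOT NULL,  -- when the payment was made
    Purpose     varchar(100) NULL       -- what the payment was for
);
```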


In this second design, each column stores a single unit of information about a single "thing" (a payment), and each row represents a specific instance of a payment. This second design is going to require a bit more code early in the process, but it is far more likely that you will be able to figure out what is going on in the system without having to hunt down the original programmer and kick their butt… sorry… figure out what they were thinking.

Poor naming standards

"That which we call a rose, by any other name would smell as sweet"

This quote from Romeo and Juliet by William Shakespeare sounds nice, and it is true from one angle. If everyone agreed that, from now on, a rose was going to be called dung, then we could get over it and it would smell just as sweet. The problem is that if, when building a database for a florist, the designer calls it dung and the client calls it a rose, then you are going to have some meetings that sound far more like an Abbott and Costello routine than a serious conversation about storing information about horticulture products.

Names, while a personal choice, are the first and most important line of documentation for your application. I will not get into all of the details of how best to name things here – it is a large and messy topic. What I want to stress in this article is the need for consistency. The names you choose are not just to enable you to identify the purpose of an object, but to allow all future programmers, users, and so on to quickly and easily understand how a component part of your database was intended to be used, and what data it stores. No future user of your design should need to wade through a 500-page document to determine the meaning of some wacky name.

Consider, for example, a column named X304_DSCR. What the heck does that mean? You might decide, after some head scratching, that it means "X304 description". Possibly it does, but maybe DSCR means discriminator, or discretizator? Unless you have established DSCR as a corporate standard abbreviation for description, then X304_DESCRIPTION is a much better name, and one that leaves nothing to the imagination.

That just leaves you to figure out what the X304 part of the name means. On first inspection, to me, X304 sounds more like it should be data in a column rather than part of a column name. If I subsequently found that, in the organization, there was also an X305 and X306 then I would flag that as an issue with the database design. For maximum flexibility, data is stored in columns, not in column names.

Along these same lines, resist the temptation to include "metadata" in an object's name. A name such as tblCustomer or colVarcharAddress might seem useful from a development perspective, but to the end user it is just confusing. As a developer, you should rely on being able to determine that a table name is a table name by context in the code or tool, and present to the users clear, simple, descriptive names, such as Customer and Address.

A practice I strongly advise against is the use of spaces and quoted identifiers in object names. You should avoid column names such as "Part Number" or, in Microsoft style, [Part Number], which require your users to include these spaces and identifiers in their code. It is annoying and simply unnecessary. Acceptable alternatives would be part_number, partNumber or PartNumber. Again, consistency is key. If you choose PartNumber then that's fine – as long as the column containing invoice numbers is called InvoiceNumber, and not one of the other possible variations.

Lack of documentation

I hinted in the intro that, in some cases, I am writing for myself as much as for you. This is the topic where that is most true. By carefully naming your objects, columns, and so on, you can make it clear to anyone what it is that your database is modeling. However, this is only step one in the documentation battle. The unfortunate reality is, though, that "step one" is all too often the only step.

Not only will a well-designed data model adhere to a solid naming standard, it will also contain definitions on its tables, columns, relationships, and even default and check constraints, so that it is clear to everyone how they are intended to be used. In many cases, you may want to include sample values, where the need arose for the object, and anything else that you may want to know in a year or two when "future you" has to go back and make changes to the code.

NOTE: Where this documentation is stored is largely a matter of corporate standards and/or convenience to the developer and end users. It could be stored in the database itself, using extended properties. Alternatively, it might be maintained in the data modeling tools. It could even be in a separate data store, such as Excel or another relational database. My company maintains a metadata repository database, which we developed in order to present this data to end users in a searchable, linkable format. Format and usability are important, but the primary battle is to have the information available and up to date.

Your goal should be to provide enough information that when you turn the database over to a support programmer, they can figure out your minor bugs and fix them (yes, we all make bugs in our code!). I know there is an old joke that poorly documented code is a synonym for "job security." While there is a hint of truth to this, it is also a way to be hated by your coworkers and never get a raise. And no good programmer I know of wants to go back and rework their own code years later. It is best if the bugs in the code can be managed by a junior support programmer while you create the next new thing. Job security along with raises is achieved by being the go-to person for new challenges.
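As one concrete option, SQL Server's extended properties can hold such definitions directly in the database. A minimal sketch, assuming a hypothetical dbo.Payment table, might be:

```sql
-- Attach a human-readable definition to a column using extended properties.
-- The table, column, and description text here are illustrative assumptions.
EXEC sp_addextendedproperty
    @name  = N'MS_Description',
    @value = N'Date and time the payment was received',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Payment',
    @level2type = N'COLUMN', @level2name = N'PaymentDate';
```

Tools such as SQL Server Management Studio display these properties alongside the objects they describe, which keeps the documentation close to the schema it documents.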

One table to hold all domain values

"One Ring to rule them all and in the darkness bind them"

This is all well and good for fantasy lore, but it's not so good when applied to database design, in the form of a "ruling" domain table. Relational databases are based on the fundamental idea that every object represents one and only one thing. There should never be any doubt as to what a piece of data refers to. By tracing through the relationships, from column name, to table name, to primary key, it should be easy to examine the relationships and know exactly what a piece of data means.

The big myth perpetrated by architects who don't really understand relational database architecture (me included early in my career) is that the more tables there are, the more complex the design will be. So, conversely, shouldn't condensing multiple tables into a single "catch-all" table simplify the design? It does sound like a good idea, but at one time giving Pauly Shore the lead in a movie sounded like a good idea too.

For example, consider the following model snippet where I needed domain values for:

  • Customer CreditStatus
  • Customer Type
  • Invoice Status
  • Invoice Line Item BackOrder Status
  • Invoice Line Item Ship Via Carrier

On the face of it that would be five domain tables… but why not just use one generic domain table, like this?
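The generic table appears as a diagram in the original; a sketch of it, with column names inferred from the join sample that follows and the rest guessed, might be:

```sql
-- One "catch-all" table for every domain. GenericDomainId, RelatedToTable
-- and RelatedToColumn are inferred from the article's join example;
-- the remaining columns are illustrative assumptions.
CREATE TABLE GenericDomain
(
    GenericDomainId int          NOT NULL PRIMARY KEY,
    RelatedToTable  sysname      NOT NULL,
    RelatedToColumn sysname      NOT NULL,
    Code            varchar(20)  NOT NULL,  -- e.g. 'UPS'
    Description     varchar(100) NOT NULL   -- e.g. 'United Parcel Service'
);
```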

This may seem a very clean and natural way to design a table for all domain values, but the problem is that it is just not very natural to work with in SQL. Say we just want the domain values for the Customer table:

    SELECT *
    FROM   Customer
             JOIN GenericDomain as CustomerType
               ON Customer.CustomerTypeId = CustomerType.GenericDomainId
                  and CustomerType.RelatedToTable = 'Customer'
                  and CustomerType.RelatedToColumn = 'CustomerTypeId'
             JOIN GenericDomain as CreditStatus
               ON Customer.CreditStatusId = CreditStatus.GenericDomainId
                  and CreditStatus.RelatedToTable = 'Customer'
                  and CreditStatus.RelatedToColumn = 'CreditStatusId'

As you can see, this is far from being a natural join. It comes down to the problem of mixing apples with oranges. At first glance, domain tables are just an abstract concept of a container that holds text. And from an implementation-centric standpoint, this is quite true, but it is not the correct way to build a database. In a database, the process of normalization, as a means of breaking down and isolating data, takes every table to the point where one row represents one thing. And each domain of values is a distinctly different thing from all of the other domains (unless it is not, in which case the one table will suffice). So what you do, in essence, is normalize the data on each usage, spreading the work out over time, rather than doing the task once and getting it over with. So instead of the single table for all domains, you might model it as:
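The separate-tables model is also a diagram in the original; a hypothetical sketch of two of the five tables might be:

```sql
-- One small table per domain; all names are illustrative.
CREATE TABLE CustomerType
(
    CustomerTypeId int          NOT NULL PRIMARY KEY,
    Code           varchar(20)  NOT NULL,
    Description    varchar(100) NOT NULL
);

CREATE TABLE CreditStatus
(
    CreditStatusId int          NOT NULL PRIMARY KEY,
    Code           varchar(20)  NOT NULL,
    Description    varchar(100) NOT NULL
);
-- ...and likewise for InvoiceStatus, BackOrderStatus and ShipViaCarrier.
```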

Looks harder to do, right? Well, it is initially. Frankly it took me longer to flesh out the example tables. But there are quite a few tremendous gains to be had:

  • Using the data in a query is much easier:

        SELECT *
        FROM   Customer
                 JOIN CustomerType
                   ON Customer.CustomerTypeId = CustomerType.CustomerTypeId
                 JOIN CreditStatus
                   ON Customer.CreditStatusId = CreditStatus.CreditStatusId

  • Data can be validated using foreign key constraints very naturally, something not feasible for the other solution unless you implement ranges of keys for every table – a terrible mess to maintain.

  • If it turns out that you need to keep more information about a ShipViaCarrier than just the code, 'UPS', and description, 'United Parcel Service', then it is as simple as adding a column or two. You could even expand the table to be a full-blown representation of the businesses that are carriers for the item.

  • All of the smaller domain tables will fit on a single page of disk. This ensures a single read (and likely a single page in cache). In the other case, you might have your domain table spread across many pages, unless you cluster on the referring table name, which could then make it more costly to use a non-clustered index if you have many values.

  • You can still have one editor for all rows, as most domain tables will likely have the same base structure/usage. And while you would lose the ability to query all domain values in one query easily, why would you want to? (A UNION query over the tables could easily be created if needed, but this would seem an unlikely need.)

I should probably rebut the thought that might be in your mind: "What if I need to add a new column to all domain tables?" For example, you forgot that the customer wants to be able to do custom sorting on domain values and didn't put anything in the tables to allow this. This is a fair question, especially if you have 1000 of these tables in a very large database.

First, this rarely happens, and when it does it is going to be a major change to your database either way.


Second, even if this became a task that was required, SQL has a complete set of commands that you can use to add columns to tables, and using the system tables it is a pretty straightforward task to build a script to add the same column to hundreds of tables all at once. It will not be as easy a change, but it will not be so much more difficult as to outweigh the large benefits.

The point of this tip is simply that it is better to do the work upfront, making structures solid and maintainable, rather than attempting to do the least amount of work to start out a project. By keeping tables down to representing one "thing", most changes will only affect one table, after which it follows that there will be less rework for you down the road.
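To illustrate the kind of script-building just mentioned, a minimal sketch using SQL Server's catalog views might look like the following; the new column and the LIKE filters are assumptions about your naming convention, not a prescription:

```sql
-- Generate ALTER TABLE statements for every domain table;
-- inspect the generated script before executing any of it.
SELECT 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
       + ' ADD SortOrder int NULL;'
FROM   sys.tables t
         JOIN sys.schemas s
           ON t.schema_id = s.schema_id
WHERE  t.name LIKE '%Status'
   OR  t.name LIKE '%Type';
```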

Using identity/guid columns as your only key

First Normal Form dictates that all rows in a table must be uniquely identifiable. Hence, every table should have a primary key. SQL Server allows you to define a numeric column as an IDENTITY column, and then automatically generates a unique value for each row. Alternatively, you can use NEWID() (or NEWSEQUENTIALID()) to generate a random, 16-byte unique value for each row. These types of values, when used as keys, are what are known as surrogate keys. The word surrogate means "something that substitutes for" and in this case, a surrogate key should be the stand-in for a natural key.

The problem is that too many designers use a surrogate key column as the only key column on a given table. The surrogate key values have no actual meaning in the real world; they are just there to uniquely identify each row.

Now, consider the following Part table, whereby PartID is an IDENTITY column and is the primary key for the table:

    PartID   PartNumber   Description
    ------   ----------   -----------
    1        XXXXXXXX     The X part
    2        XXXXXXXX     The X part
    3        YYYYYYYY     The Y part

How many rows are there in this table? Well, there seem to be three, but are rows with PartIDs 1 and 2 actually the same row, duplicated? Or are they two different rows that should be unique but were keyed in incorrectly?

The rule of thumb I use is simple. If a human being could not pick which row they want from a table without knowledge of the surrogate key, then you need to reconsider your design. This is why there should be a key of some sort on the table to guarantee uniqueness, in this case likely on PartNumber.

In summary: as a rule, each of your tables should have a natural key that means something to the user, and can uniquely identify each row in your table. In the very rare event that you cannot find a natural key (perhaps, for example, a table that provides a log of events), then use an artificial/surrogate key.
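A minimal fix, assuming PartNumber is the natural key, keeps the surrogate primary key but adds a uniqueness constraint. The DDL below is a sketch with assumed data types, not from the original article:

```sql
-- Surrogate key for joins; the natural key is enforced by a UNIQUE
-- constraint, so duplicate part numbers can no longer be keyed in.
CREATE TABLE Part
(
    PartID      int          NOT NULL IDENTITY(1,1)
        CONSTRAINT pkPart PRIMARY KEY,
    PartNumber  varchar(20)  NOT NULL
        CONSTRAINT akPart_PartNumber UNIQUE,
    Description varchar(100) NOT NULL
);
```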

Not using SQL facilities to protect data integrity

All fundamental, non-changing business rules should be implemented by the relational engine. The base rules of nullability, string length, assignment of foreign keys, and so on, should all be defined in the database. There are many different ways to import data into SQL Server. If your base rules are defined in the database itself, you can guarantee that they will never be bypassed, and you can write your queries without ever having to worry whether the data you're viewing adheres to the base business rules.

Rules that are optional, on the other hand, are wonderful candidates to go into a business layer of the application. For example, consider a rule such as this: "For the first part of the month, no part can be sold at more than a 20% discount, without a manager's approval".

Taken as a whole, this rule smacks of being rather messy, not very well controlled, and subject to frequent change. For example, what happens when next week the maximum discount is 30%? Or when the definition of "first part of the month" changes from 15 days to 20 days? Most likely you won't want to go through the difficulty of implementing these complex temporal business rules in SQL Server code – the business layer is a great place to implement rules like this.

However, consider the rule a little more closely. There are elements of it that will probably never change, e.g.:


  • The maximum discount it is ever possible to offer
  • The fact that the approver must be a manager

These aspects of the business rule very much ought to be enforced by the database and design. Even if the substance of the rule is implemented in the business layer, you are still going to have a table in the database that records the size of the discount, the date it was offered, the ID of the person who approved it, and so on.

On the Discount column, you should have a CHECK constraint that restricts the values allowed in this column to between 0.00 and 0.90 (or whatever the maximum is). Not only will this implement your "maximum discount" rule, but it will also guard against a user entering a 200% or a negative discount by mistake. On the ManagerID column, you should place a foreign key constraint, which references the Managers table and ensures that the ID entered is that of a real manager (or, alternatively, a trigger that selects only EmployeeIds corresponding to managers).

Now, at the very least we can be sure that the data meets the very basic rules that the data must follow, so we never have to code something like this in order to check that the data is good:

    SELECT CASE WHEN discount < 0 THEN 0
                WHEN discount > 1 THEN 1 …

We can feel safe that data meets the basic criteria, every time.
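A sketch of the constraints just described, with hypothetical table and column names (the article does not show the DDL itself):

```sql
-- The CHECK constraint enforces the never-changing bounds; the FOREIGN KEY
-- ensures the approver is a real manager. All names are illustrative.
CREATE TABLE DiscountApproval
(
    DiscountApprovalId int          NOT NULL PRIMARY KEY,
    Discount           decimal(3,2) NOT NULL
        CONSTRAINT chkDiscountApproval_Range
            CHECK (Discount BETWEEN 0.00 AND 0.90),
    OfferedDate        datetime     NOT NULL,
    ManagerId          int          NOT NULL
        CONSTRAINT fkDiscountApproval_Manager
            REFERENCES Managers (ManagerId)
);
```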

Not using stored procedures to access data

Stored procedures are your friend. Use them whenever possible as a method to insulate the database layer from the users of the data. Do they take a bit more effort? Sure, initially, but what good thing doesn't take a bit more time? Stored procedures make database development much cleaner, and encourage collaborative development between your database and functional programmers. A few of the other interesting reasons that stored procedures are important include the following.

Maintainability

Stored procedures provide a known interface to the data, and to me, this is probably the largest draw. When code that accesses the database is compiled into a different layer, performance tweaks cannot be made without a functional programmer's involvement. Stored procedures give the database professional the power to change characteristics of the database code without additional resource involvement, making small changes, or large upgrades (for example changes to SQL syntax) easier to do.

Encapsulation

Stored procedures allow you to "encapsulate" any structural changes that you need to make to the database so that the knock-on effect on user interfaces is minimized. For example, say you originally modeled one phone number, but now want an unlimited number of phone numbers. You could leave the single phone number in the procedure call, but store it in a different table as a stopgap measure, or even permanently if you have a "primary" number of some sort that you always want to display. Then a stored proc could be built to handle the other phone numbers. In this manner the impact to the user interfaces could be quite small, while the code of stored procedures might change greatly.

Security

Stored procedures can provide specific and granular access to the system. For example, you may have 10 stored procedures that all update table X in some way. If a user needs to be able to update a particular column in a table and you want to make sure they never update any others, then you can simply grant to that user the permission to execute just the one procedure out of the ten that allows them to perform the required update.

Performance

There are a couple of reasons that I believe stored procedures enhance performance. First, if a newbie writes ratty code (like using a cursor to go row by row through an entire ten million row table to find one value, instead of using a WHERE clause), the procedure can be rewritten without impact to the system (other than giving back valuable resources). The second reason is plan reuse. Unless you are using dynamic SQL calls in your procedure, SQL Server can store a plan and not need to compile it every time it is executed. It's true that in every version of SQL Server since 7.0 this has become less and less significant, as SQL Server gets better at storing plans for ad hoc SQL calls (see note below). However, stored procedures still make it easier for plan reuse and performance tweaks. In the case where ad hoc SQL would actually be faster, this can be coded into the stored procedure seamlessly.

NOTE: In SQL Server 2005, there is a database setting (PARAMETERIZATION FORCED) that, when enabled, will cause all queries to have their plans saved. This does not cover more complicated situations that procedures would cover, but can be a big help. There is also a feature known as plan guides, which allow you to override the plan for a known query type. Both of these features are there to help out when stored procedures are not used, but stored procedures do the job with no tricks.

And this list could go on and on. There are drawbacks too, because nothing is ever perfect. It can take longer to code stored procedures than it does to just use ad hoc calls. However, the amount of time to design your interface and implement it is well worth it, when all is said and done.
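As a sketch of the "known interface" and granular-security ideas combined, a dedicated procedure (all object names here are hypothetical, not from the article) might look like:

```sql
-- A narrow, dedicated procedure: callers are granted EXECUTE on this,
-- not UPDATE on the table, so only this one column can be changed.
CREATE PROCEDURE Customer_UpdateCreditStatus
    @CustomerId     int,
    @CreditStatusId int
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE Customer
    SET    CreditStatusId = @CreditStatusId
    WHERE  CustomerId = @CustomerId;
END;
GO

-- Grant only the ability to run this procedure, nothing more.
GRANT EXECUTE ON Customer_UpdateCreditStatus TO AccountsRole;
```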

Trying to code generic T-SQL objects I touched on this subject earlier in the discussion of generic domain tables, but the problem is more prevalent than that. Every new T-SQL programmer, when they first start coding stored procedures, starts to think "I wish I could just pass a table name as a parameter to a procedure." It does sound quite attractive: one generic stored procedure that can perform its operations on any table you choose. However, this should be avoided as it can be very detrimental to performance and will actually make life more difficult in the long run. T-SQL objects do not do "generic" easily, largely because lots of design considerations in SQL Server have clearly been made to facilitate reuse of plans, not code. SQL Server works best when you minimize the unknowns so it can produce the best plan possible. The more it has to generalize the plan, the less it can optimize that plan. Note that I am not specifically talking about dynamic SQL procedures. Dynamic SQL is a great tool to use when you have procedures that are not optimizable / manageable otherwise. A good example is a search procedure with many different choices. A precompiled solution with multiple OR conditions might have to take a worst case scenario approach to the plan and yield weak results, especially if parameter usage is sporadic. However, the main point of this tip is that you should avoid coding very generic objects, such as ones that take a table name and twenty column names/value pairs as a parameter and lets you update the values in the table. For example, you could write a procedure that started out: CREATE PROCEDURE updateAnyTable @tableName sysname, @columnName1 sysname, @columnName1Value varchar(max) @columnName2 sysname, @columnName2Value varchar(max) … The idea would be to dynamically specify the name of a column and the value to pass to a SQL statement. This solution is no better than simply using ad hoc calls with an UPDATE statement. 
Instead, when building stored procedures, you should build specific, dedicated stored procedures for each task performed on a table (or on multiple tables). This gives you several benefits:

- Properly compiled stored procedures can have a single compiled plan attached to them and reused.
- Properly compiled stored procedures are more secure than ad hoc SQL or even dynamic SQL procedures, greatly reducing the surface area for an injection attack, because the only parameters to queries are search arguments or output values.
- Testing and maintenance of compiled stored procedures is far easier, since you generally only have to validate search arguments, rather than checking that tables/columns/etc. exist and handling the cases where they do not.

A nice technique is to build a code generation tool in your favorite programming language (even T-SQL), using SQL metadata to build very specific stored procedures for every table in your system. Generate all of the boring, straightforward objects, including all of the tedious code to perform error handling that is so essential, but painful to write more than once or twice. In my Apress book, Pro SQL Server 2005 Database Design and Optimization, I provide several such "templates" (mainly for triggers, but also stored procedures) that have all of the error handling built in; I would suggest you consider building your own (possibly based on mine) to use when you need to manually build a trigger/procedure or whatever.
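The metadata-driven generation idea can be sketched with the catalog views. This is an assumed illustration, not one of the templates from the book; a real generator would also emit the procedure bodies and error handling:

```sql
-- Hypothetical sketch: use the catalog views to generate a parameter
-- declaration for every column of every table, as raw material for
-- building dedicated per-table stored procedures.
SELECT t.name AS tableName,
       '@' + c.name + ' ' + ty.name +
       CASE WHEN ty.name IN ('varchar', 'nvarchar', 'char', 'nchar')
            THEN '(' + CASE c.max_length
                           WHEN -1 THEN 'max'
                           ELSE CAST(c.max_length AS varchar(10))
                       END + ')'
            ELSE ''
       END AS parameterDeclaration
FROM   sys.tables t
       JOIN sys.columns c ON c.object_id = t.object_id
       JOIN sys.types ty ON ty.user_type_id = c.user_type_id
ORDER  BY t.name, c.column_id;
-- (Simplification: max_length is in bytes, so a real generator would
-- halve it for nvarchar/nchar columns.)
```

The point is that the tedious, repetitive objects are generated once from the metadata rather than hand-typed per table.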

Lack of testing When the dial in your car says that your engine is overheating, what is the first thing you blame? The engine. Why don't you immediately assume that the dial is broken? Or something else minor? Two reasons:

- The engine is the most important component of the car, and it is common to blame the most important part of the system first.
- It is all too often true.

As database professionals know, the first thing to get blamed when a business system is running slow is the database. Why? First, because it is the central piece of almost any business system, and second, because it also is all too often true.

http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/[9/14/2009 7:08:02 PM]

Ten Common Database Design Mistakes

We can play our part in dispelling this notion, by gaining deep knowledge of the system we have created and understanding its limits through testing. But let's face it; testing is the first thing to go in a project plan when time slips a bit. And what suffers the most from the lack of testing? Functionality? Maybe a little, but users will notice and complain if the "Save" button doesn't actually work and they cannot save changes to a row they spent 10 minutes editing. What really gets the shaft in this whole process is deep system testing to make sure that the design you (presumably) worked so hard on at the beginning of the project is actually implemented correctly. But, you say, the users accepted the system as working, so isn't that good enough? The problem with this statement is that what user acceptance "testing" usually amounts to is the users poking around, trying out the functionality that they understand and giving you the thumbs up if their little bit of the system works. Is this reasonable testing? Not in any other industry would this be vaguely acceptable. Do you want your automobile tested like this? "Well, we drove it slowly around the block once, one sunny afternoon with no problems; it is good!" When that car subsequently "failed" on the first drive along a freeway, or during the first drive through rain or snow, then the driver would have every right to be very upset. Too many database systems get tested like that car, with just a bit of poking around to see if individual queries and modules work. The first real test is in production, when users attempt to do real work. This is especially true when it is implemented for a single client (even worse when it is a corporate project, with management pushing for completion more than quality). Initially, major bugs come in thick and fast, especially performance related ones. 
If the first time you have tried a full production set of users, background processes, workflow processes, system maintenance routines, ETL, etc, is on your system launch day, you are extremely likely to discover that you have not anticipated all of the locking issues that might be caused by users creating data while others are reading it, or hardware issues caused by a poorly configured server. It can take weeks to live down the cries of "SQL Server can't handle it" even after you have done the proper tuning. Once the major bugs are squashed, the fringe cases (which are pretty rare cases, like a user entering a negative amount for hours worked) start to raise their ugly heads. What you end up with at this point is software that irregularly fails in what seem like weird places (since large quantities of fringe bugs will show up in ways that aren't very obvious and are really hard to find). Now, it is far harder to diagnose and correct, because you have to deal with the fact that users are working with live data and trying to get work done. Plus you probably have a manager or two sitting on your back saying things like "when will it be done?" every 30 seconds, even though it can take days or weeks to discover the kinds of bugs that result in minor (yet important) data aberrations. Had proper testing been done, it would never have taken weeks to find these bugs, because a proper test plan takes into consideration all possible types of failures, codes them into an automated test, and tries them over and over. Good testing won't find all of the bugs, but it will get you to the point where most of the issues that correspond to the original design are ironed out. If everyone insisted on a strict testing plan as an integral and immutable part of the database development process, then maybe someday the database won't be the first thing to be fingered when there is a system slowdown.
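For instance, the "negative hours worked" fringe case mentioned above can be both prevented declaratively and coded into a repeatable automated test. The table and column names below are hypothetical:

```sql
-- Hypothetical example: guard a fringe case with a constraint...
ALTER TABLE dbo.timeEntry
    ADD CONSTRAINT chkTimeEntry$hoursWorkedNonNegative
        CHECK (hoursWorked >= 0);

-- ...and code the fringe case into a test that can be run over and over,
-- rather than leaving it to be discovered in production.
DECLARE @failed bit = 0;
BEGIN TRY
    INSERT INTO dbo.timeEntry (employeeId, workDate, hoursWorked)
    VALUES (1, '2007-02-26', -8);
    SET @failed = 1;  -- the insert should have been rejected
END TRY
BEGIN CATCH
    PRINT 'Test passed: negative hours were rejected';
END CATCH;

IF @failed = 1
    PRINT 'TEST FAILED: negative hours were accepted';
```

A test plan built from dozens of small checks like this one is what finds the bugs that "poking around" user acceptance testing never will.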

Summary Database design and implementation is the cornerstone of any data-centric project (read 99.9% of business applications) and should be treated as such when you are developing. This article, while probably a bit preachy, is as much a reminder to me as it is to anyone else who reads it. Some of the tips, like planning properly, using proper normalization, using strong naming standards and documenting your work: these are things that even the best DBAs and data architects have to fight to make happen. In the heat of battle, when your manager's manager's manager is being berated for things taking too long to get started, it is not easy to push back and remind them that they pay you now, or they pay you later. These tasks pay dividends that are very difficult to quantify, because to quantify success you must fail first. And even when you succeed in one area, all too often other minor failures crop up in other parts of the project so that some of your successes don't even get noticed. The tips covered here are ones that I have picked up over the years and that have turned me from a mediocre into a good data architect/database programmer. None of them takes an extraordinary amount of time (except perhaps design and planning), but they all take more time upfront than doing it the "easy way". Let's face it, if the easy way were that easy in the long run, I for one would abandon the harder way in a second. It is not until you see the end result that you realize that success comes from starting off right as much as finishing right.



Author profile: Louis Davidson Louis has been in the IT industry for 15 years as a corporate database developer and data architect. Currently he is the Data Architect for Compass Technology in Chesapeake, Virginia, supporting the Christian Broadcasting Network and NorthStar Studios in Nashville, Tennessee. Louis has been a Microsoft MVP since 2004, and is an active volunteer for the Professional Association for SQL Server working in their Special Interest Groups. He is the author of SQL Server 2005 Database Design and Optimization.


Have Your Say: Do you have an opinion on this article? Then add your comment below:


One minor disagreement Anonymous (not signed in) Wednesday, February 28, 2007 at 4:31 PM You say "In summary: as a rule, each of your tables should have a natural key that means something to the user, and can uniquely identify each row in your table. In the very rare event that you cannot find a natural key (perhaps, for example, a table that provides a log of events), then use an artificial/surrogate key." I would almost agree with that, but I would suggest that using an Identity column and another "person readable" unique key can provide benefits. For example, Supplier A changes their part number for Part x. Now when you order Part X you need to quote their part number. By having an Identity column used for foreign keys, the new unique part number can be included in all purchase orders.


RE: One minor disagreement Anonymous (not signed in) Wednesday, February 28, 2007 at 10:44 PM If you read that I don't like surrogate (identity) primary keys then I might need to make some edits. The point of the section was that you needed to *at least* have a natural key. I always set the primary key on every table that I create to be an integer, usually set by the identity property. The problem comes when that is the only key, not in the use of identity keys, one of the reasons being the one you stated about changing the natural key. Thanks for reading!
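The pattern the author describes in this reply, an identity surrogate as the primary key plus at least one enforced natural key, can be sketched like this (the table and column names are hypothetical):

```sql
-- Hypothetical sketch: a surrogate primary key AND a natural key.
-- The identity column is the target of foreign key references; the
-- unique constraint guarantees the natural key still identifies
-- each row, so the identity is never the *only* key.
CREATE TABLE dbo.part
(
    partId      int IDENTITY(1,1)
                CONSTRAINT pkPart PRIMARY KEY,
    partNumber  varchar(30)  NOT NULL
                CONSTRAINT akPart$partNumber UNIQUE,
    description varchar(200) NOT NULL
);
```

With this shape, a supplier changing a part number is a single-column update; every foreign key reference to partId is untouched.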


Surrogate Keys ASdanz (view profile) Tuesday, March 06, 2007 at 3:02 PM I, too, initially misread your opinion of identities as surrogate keys. I read the title of that section as: Using identity/guid columns as your only [type of primary] key. I completely agree on the need for stable keys. Most of the time, Identities are much easier to implement and manage. Although purists would say that you are shirking your duty as a DBA to find a stable natural key.


Thanks for the article.

except stored procedures:-) Anonymous (not signed in) Tuesday, March 06, 2007 at 3:54 PM Fully agree on all stuff except stored procedures. SPs are a good practice, but they do have shortcomings: 1. Can't be ported easily: if your application must work against another DB vendor, stay away from SPs. 2. Must sometimes use dynamic SQL: complex dynamic queries are much easier built in another language (or use a .NET SP). 3. Can't receive arrays of parameters (XML parameters don't count :-) 4. Awkward for some updates: if you need to update only a field but do not know which one in advance, binding 40 parameters when you need one + the PK is a bit stupid and quite inefficient. I DO recommend stored procedures, but they are not always a best practice


Good Article, Nice Use of Quotes Anonymous (not signed in) Tuesday, March 06, 2007 at 5:35 PM The one from Lord of the Rings is the best. I have a client that has bought software built around the "One Ring" domain idea, and not just for domains, but for all the business data, too.


Column names Anonymous (not signed in) Wednesday, March 07, 2007 at 5:56 AM I disagree on your assertion that column names should be end-user friendly. On applications that must be prepared to be translated to several languages (and most commercial applications should be), displaying the column name directly to the user isn't an option, and so names that are more meaningful to the developer can and should be used.


More on surrogate keys Anonymous (not signed in) Wednesday, March 07, 2007 at 8:38 AM I think the problem with "Using identity/guid columns as your only key" is that your described mistake isn't possible. The candidate keys (the full list of possible keys on the table including the natural keys for those who aren't familiar with the term) are always going to be keys. You can't forget to make them keys, they just are. You should rename the section "Not putting at least a unique constraint on all candidate keys" or something like that. Once you have the title right the rewrite should be easier. Oh, don't forget that the other common database design mistake is NOT using a surrogate key as the primary key.


Use of identity columns in foreign key references Anonymous (not signed in) Wednesday, March 07, 2007 at 9:03 AM Outstanding article. Most of your guidelines are indisputable. I often see IDENTITY columns used improperly and I'd be interested to get your thoughts. I was chief architect for Sybase back in the early 1990s when we first added identity columns to SQL Server, so I have a bit of history with this topic. At the time, we added identity columns to SQL Server (this was before the Microsoft-Sybase divorce when Sybase did all of the development on SQL Server) because Oracle had a counterpart called ROWID columns. We couldn't call them ROWID columns because, as you probably know, SQL Server moves rows around from page to page whereas Oracle doesn't. Hence the row id in SQL Server is constantly changing, even when you do certain types of UPDATE statements. In short, we added IDENTITY columns because we needed to be feature-compatible with Oracle and because application vendors like Peoplesoft and SAP demanded it.


Our design guideline at Sybase was that you only use IDENTITY columns when you truly do not care what the value in that field is. That means that if you BCP the contents of the table from one server to another and all the values in that table end up different, you're okay with it. Most of the time, that's not the case, particularly when there are foreign key references to the column on which you used IDENTITY. What happens if you use BCP to move data between databases? BCP doesn't override the identity column which means that all the values might change and your foreign key references will all be broken. The "purist" design approach is to use IDENTITY columns as a primary key but don't allow foreign key references to it. Make all foreign key references to the "natural" key and use the identity column as a surrogate.

Nuances Anonymous (not signed in) Wednesday, March 07, 2007 at 12:33 PM Great article! I fully agree with most of what you're saying. Depending on the system you're designing and the environment you're designing for, some nuances might be in order of course, but that doesn't matter. The point about normalization is a good one but sometimes it's necessary to denormalize a little again for performance's sake. In reply to a poster before me, the argument for not using IDENTITY columns as a primary key is not valid. Never BCP into "live" tables! Always use an intermediate table to BCP in (preferably with varchar columns) then process and check(!) the data before inserting into the final target table. You can use the natural key to do that. If there's really no way around BCP-ing into a "live" table then use the SET IDENTITY_INSERT ON statement (but be very very very careful).
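The staging-table approach this commenter recommends can be sketched as follows; the table and column names are hypothetical:

```sql
-- Hypothetical sketch of the commenter's advice: BCP into a loosely
-- typed staging table, validate against the natural key, then insert
-- only checked rows into the live table.
CREATE TABLE dbo.customerStaging
(
    customerNumber varchar(50),   -- natural key, loosely typed for loading
    customerName   varchar(200)
);

-- (bcp loads the flat file into dbo.customerStaging here)

INSERT INTO dbo.customer (customerNumber, customerName)
SELECT s.customerNumber, s.customerName
FROM   dbo.customerStaging s
WHERE  s.customerNumber IS NOT NULL
  AND  NOT EXISTS (SELECT 1
                   FROM   dbo.customer c
                   WHERE  c.customerNumber = s.customerNumber);

-- Only if identity values truly must be preserved would you wrap the
-- insert in SET IDENTITY_INSERT dbo.customer ON / OFF, with great care.
```

The live table's identity values are assigned on insert, so foreign key references built on the surrogate stay consistent within that database.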


One addition to naming conventions BugEyedMonster (view profile) Wednesday, March 07, 2007 at 2:04 PM I will add my personal pet peeve to the naming conventions; in part it is included with the don't name things as table, varchar, etc. Imagine a database table called servletDatabase; that's a good name, right? WRONG! The conversation went something like this: P1: That data is stored in servletDatabase. Me: Which server has the servlet database? P1: It is in the production database. Me: (Thinking, a database in a database?) Hey, I found a servlet table, but it doesn't have the data I want. P1: The data is in the table servletDatabase. Me: Of course, I should have realized that. (NOT!) Since databases are made up of multiple components (servers, databases, tables, columns), I would expand the rule to include: don't name two or more different components the same name. Why? What happens when you have a server named playpen with a database named playpen and you have to rehost the playpen database, or when someone asks for a new server so they can get another database besides playpen? Finally, can you imagine telling someone to look in the name column in the name database, running on the name server?


"purist" Anonymous (not signed in) Wednesday, March 07, 2007 at 3:53 PM What is the definition of a "purist" anyway? The definition of a "purist" that I know of says that the foreign key should never be based on natural keys, only surrogate keys. That is because natural keys mean something in the real world and therefore can change. As for having to BCP data around, that is when you switch to a GUID (or equivalent) surrogate key which is designed to be universally unique.



Saving Problem Anonymous (not signed in) Wednesday, March 07, 2007 at 4:50 PM Thank you for this great subject. I have a project with two tables: 1. Customers 2. Orders. My problem is that when I try to display the data on the form in "details", I can edit and save data in the main Customers table, but the related Orders table is not responding. It removes any new data I try to enter in any control such as a textbox. If the data is displayed in a datagridview for the Orders table, it works, although I can add a new record, but whenever I try to edit anything I get an "Update failed" message. If possible help me please. [email protected]


Replies Anonymous (not signed in) Thursday, March 08, 2007 at 8:05 PM 1. Except stored procedures... >>must sometimes use dynamic sql<< true enough, but it still makes performance tuning easier if you know what might be dynamically generated. CLR proc is a good idea here. >>ackward for some updates<< The only reason I slightly disagree with you there is that it takes longer to write the middle tier code to determine what has changed rather than just send all columns. And it usually isn't that expensive to update extra columns when setting the value to the same thing it was.

2. Column names >>and so names that are more meaningful to the developer can and should be used<< I can see that, sort of, but all of the gobbledygook that gets added really has no added value. TblFred? ColNameString? And do you include the length too? Too easy to query the metadata to waste time adding metadata to the name, in my opinion. 3. More on surrogate keys >>The candidate keys (the full list of possible keys on the table including the natural keys for those who aren't familiar with the term) are always going to be keys.<< You are being ironically naive. Too too too many times people just have an identity value, make it the primary key and they are done. So many forum posts start: I have duplicate key values... >>Oh, don't forget that the other common database design mistake is NOT using a surrogate key as the primary key.<< I disagree here. You can choose to use natural keys for primary keys if you want, there is nothing wrong with it. I personally favor identities, but there is nothing wrong with it at all. 4. Use of identity columns in foreign key references >>Our design guideline at Sybase was that you only use IDENTITY columns when you truly do not care what the value in that field is. << Amen to that my friend. >>The "purist" design approach is to use IDENTITY columns as a primary key but don't allow foreign key references to it. Make all foreign key references to the "natural" key and use the identity column as a surrogate.<<


In SQL Server, we can keep identity values when moving data around. But the question to me isn't about identity values, but about using surrogates. I want the FKs to be meaningless too, because it makes joins faster since relationships are built on small values rather than larger ones. And the PK being a clustered key helps joins out too. 5. "purist" >>The definition of a "purist" that I know of says that the foreign key should never be based on natural keys, only surrogate keys.<< I would disagree here. I would expect a lot of "purists" to cringe at the idea of using surrogates at all (I know of a certain purist that would, for sure). I agree with you that I never like keys with values that can change. 6. One addition to naming conventions >>Since databases are made up of multiple components, servers, databases, tables, columns, I would expand the rule to include don't name two or more different components the same name. << That is a good idea, but I think that for the most part this is not a problem if you have named your objects to match what they are. Also, I usually speak of things in terms of what they are. >>Finally can you imagine telling someone to look in the name column in name database, running on the name server?<< Even if it did happen, the way you put it makes it unambiguous what you mean. Is this likely to happen like this? Probably not, since you will be naming your objects in a manner that makes sense, and it is unlikely that the database, table, and column would all have the same name.


Mistakes Anonymous (not signed in) Saturday, March 10, 2007 at 11:12 AM Your books are getting better, your writing is getting better, and your face is getting more robust. :=)


Replies Anonymous (not signed in) Saturday, March 10, 2007 at 4:15 PM To Mistakes: My last book had a magnificent crew working behind the scenes, which included the editor of this site. He was simply excellent to work with, which is why I came back here. As for the face remark... yeah, that is something I have to work on (I wish it were just the face...) I have become obsessed with the home office of late and spend too much time sitting when I should be out doing (like right now :)


Excellent Article Anonymous (not signed in) Saturday, March 10, 2007 at 10:40 PM I have been studying normalization for some time now and have been fighting the concept. A search for a clearer understanding is what brought me here. Now, I am a newbie and a lot of the stuff you are talking about above and beyond normalization is over my head, but I have to compliment you on a fantastic article. I have gained a clearer understanding with regard to surrogate keys; something I never really understood. I always wondered why one couldn't use a part number instead of “some random key”. I was obviously taught incorrectly. It makes perfect sense. It is always easier to comprehend something when you back it up with a DETAILED example as you have done. I love developing databases and hope to be technically where you


are one day. Thank you for a great article and sharing your knowledge. -Frank

Books Anonymous (not signed in) Saturday, March 10, 2007 at 10:52 PM After reading a bit more about you, I see that you have authored some books on SQL Server. I was wondering if you have written anything on design techniques. Microsoft is not my game but I could sure use your design knowledge. Thanks again, -Frank captonline at yahoo


Excellent! Anonymous (not signed in) Sunday, March 11, 2007 at 12:15 AM Nice column. Particularly like the advice contained in "One table to hold all domain values". Been in the position where people have done this and would not be persuaded to see the light!


Great! Anonymous (not signed in) Wednesday, March 14, 2007 at 2:50 PM Great! I just do not agree with the 8th mistake. But that's a personal opinion about database stored procedures. Congrats!


Brilliant article Anonymous (not signed in) Thursday, March 15, 2007 at 5:06 AM Good argument on all rules, and I generally agree on all. Now all we need to do is use them ;) The domain value table rule might be overridden by dynamic database design, if that is needed (non-developers generating database objects/tables, as in Microsoft CRM). But the rules are most valuable and valid for common database design. Congratulations


You've Inspired me! Anonymous (not signed in) Friday, March 23, 2007 at 3:52 PM I write web spiders for aggregating content (Web 2.0 - Displaying the same data in a different way). My tables can get very large the longer the spider runs. I StumbledUpon your article and the "thought bubble" caught my eye, so I started reading. Fantastic stuff, after reading your article, I'm now going to go back and fix my databases and try to use good practices from now on. Thanks!


Thank you for info Anonymous (not signed in) Friday, March 30, 2007 at 9:47 AM I am no wizz kid on databases, your article is great as it opens my eyes to some of the mistakes on database designs etc. Information like this is invaluable to us humble humans that are not too technical. Thank you


Normalization Anonymous (not signed in) Tuesday, April 17, 2007 at 12:19 PM I have sat through two graduate-level classes in Database and Object Oriented Development. In most instances, I was led to believe that with today's computing performance the idea of normalization is not as critical.


It appears that you do not agree. Do we really need to be overly concerned with normalization?

re: Normalisation The SQL Server Thought Police (view profile) Wednesday, April 18, 2007 at 1:44 PM Yes! Goodness me Yes!


Generally, a quite good article, but... Anonymous (not signed in) Saturday, April 28, 2007 at 1:08 PM "As a developer, you should rely on being able to determine that a table name is a table name by context in the code or tool..." But in fact you can not. Consider this (frequent) scenario. To investigate the impact of a proposed change, you need to review the documentation (not just the code!) for references to table Customer. Not surprisingly, your documentation (external and internal) contains the word Customer, say, 4,000 times. (How long will it take to read those?) But if your table were named tblCustomer instead, the documentation might contain that word just 58 times, likely all of which are relevant. For decades we have known that maintenance is 70-80% of the lifetime expense of software. Thoughtful planning should minimize the expense of using documentation.

<< On the ManagerID column, you should place a foreign key constraint ... >> Unless that Manager table contains historical data, this leaves you open to problems. E.g., what happens when a manager recorded in the "table in the database that records the size of the discount" is demoted? (But the use of the trigger sounds good.)

The article would also be better by mentioning (formal/informal) code reviews as an adjunct to testing. Also, edit "(manly [sic] for triggers, abut [sic] also ...".

this article Anonymous (not signed in) Sunday, April 29, 2007 at 9:58 AM THIS ARTICAL STINKS LIKE ROTTEN EGGS!


Identity pk, not a good idea? Anonymous (not signed in) Thursday, May 03, 2007 at 3:16 AM I can't agree with the point about identities as primary keys. One of the greatest problems when developing applications is that when the client wants to modify some field that is part of the primary key as a requirement, you need to spend many hours making the change (a typical problem). I think that the best approach is to ALWAYS use an identity, or similar (Oracle/PostgreSQL sequences over a numeric field), and always use a unique index too, to restrict duplicates.


More replies from the author... Anonymous (not signed in) Thursday, May 10, 2007 at 10:18 PM Thanks for almost all of the comments :) To: Normalization >>I have sat through two graduate-level classes in Database and Object Oriented Development. In most instances, I was led to believe that with today's computing performance the idea of normalization is not as critical. << Mercy. Normalization is just as important as it has ever been, and because of today's computing performance it is much easier to implement. Normalization is nothing more than forming the data in the way that it is natural to work with in SQL and eliminating redundancies that you have to maintain. Performance-wise, you can expend energy keeping every redundancy straight, or you can expend energy calculating what you need when you need it.


I also should make the point that while I completely believe in a normalized database structure, what is far more important is a normalized design. If you understand the denormalizations you make, then they won't kill you. The problem comes in when you have structures that are not optimized for use in a relational database query processor and you have to do unnatural acts to get things to work, or you miss conditions that are super-important to your client...

To: Anonymous
>>To investigate the impact of a proposed change, you need to review the documentation (not just the code!) for references to table Customer in documentation (external and internal)<<
>>if your table were named tblCustomer instead, the documentation might contain that word just 58 times, likely all of which are relevant.<<
An interesting point. I would probably suggest a reasonable documentation pattern, such as using [customer], or something like this, perhaps "Customer table"...

>>On the ManagerID column, you should place a foreign key constraint...<<
>>Unless that Manager table contains historical data, this leaves you open to problems. E.g., what happens when a manager recorded in the "table in the database that records the size of the discount" is demoted? (But the use of the trigger sounds good.)<<
Good point. And truthfully, it would have been better to use EmployeeId there and have a check constraint/trigger to make sure that the person was a valid approver for a discount.

>>The article would also be better by mentioning (formal/informal) code reviews as an adjunct to testing.<<
Good point, and I can't disagree, though technically code reviews serve a far different purpose than the sort of testing I was meaning. Code reviews are an essential part of the process so that you have good coding practices, but the kind of testing I was referring to was making sure that the data is correct, which is more or less impractical in a code review.

To: this article
>>THIS ARTICAL STINKS LIKE ROTTEN EGGS!<<
Wow... Can you be more specific? Like exactly what an Artical is?
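The EmployeeId-plus-check-constraint/trigger idea in the reply above can be sketched with a foreign key, a CHECK constraint, and a trigger. This is a minimal, hypothetical illustration using Python's built-in sqlite3 module (the article's context is SQL Server, and all table and column names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        CanApprove INTEGER NOT NULL CHECK (CanApprove IN (0, 1))
    );
    CREATE TABLE Discount (
        DiscountId INTEGER PRIMARY KEY,
        ApproverId INTEGER NOT NULL REFERENCES Employee (EmployeeId),
        Amount     INTEGER NOT NULL
    );
    -- The trigger stands in for the "valid approver" rule the reply mentions:
    CREATE TRIGGER Discount_approver_check BEFORE INSERT ON Discount
    WHEN (SELECT CanApprove FROM Employee
          WHERE EmployeeId = NEW.ApproverId) <> 1
    BEGIN
        SELECT RAISE(ABORT, 'approver is not allowed to approve discounts');
    END;
""")

conn.execute("INSERT INTO Employee VALUES (1, 1), (2, 0)")
conn.execute("INSERT INTO Discount VALUES (10, 1, 500)")      # employee 1 may approve

try:
    conn.execute("INSERT INTO Discount VALUES (11, 2, 500)")  # employee 2 may not
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The foreign key catches references to nonexistent employees, while the trigger enforces the finer-grained rule that the referenced employee is actually allowed to approve.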

To: Identity pk, not a good idea?
>>I can't agree with the point of identities as primary keys<< >>I think that the best approach is to use ALWAYS an identity<<
I can only assume you are replying to another commenter, because I never design a table without an identity (or GUID, if forced) primary key.

Subject: Disagree with Stored Procs
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 15, 2007 at 3:28 AM
Message: With ORM tools rife, ActiveRecord patterns becoming popular, and database independence more important now than ever (I wouldn't dare sell a product that relied on proprietary DBs), stored procs are a pain, if not extinct.

Subject: One table...
Posted by: Anonymous (not signed in)
Posted on: Thursday, May 17, 2007 at 11:31 PM
Message: Does this issue illustrate the differences between application and database programming? For an application, performance is a direct function of code flexibility and efficiency, because data is given to the application at the moment of use. For databases, however, the data is already present, so code that anticipates the user's request gives higher performance.

http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/[9/14/2009 7:08:02 PM]

Subject: Good advice except for..
Posted by: Anonymous (not signed in)
Posted on: Wednesday, May 30, 2007 at 9:15 PM
Message: I agree with most of the points except... Stored procedures are NOT a best practice. Implementing too much business logic in them will result in a poorly maintainable system. You can get better encapsulation by implementing a proper business model and placing the business logic in there. Let's face it: most general-purpose programming languages (GPLs) are significantly more powerful than stored procedures, and because of that you will create different architectures that can handle more requirements than you would with SPs. With this freedom you'll express much more complicated logic more succinctly, and get greater flexibility, more maintainability, and just as good performance. Most GPLs are OO, and they have objects for encapsulation, inheritance, and components for reuse, and offer much more expressiveness through dynamic binding. That's not even getting into the mapping inconsistencies between your object model and procedure create. Stored procedures are a leftover from 1990s architecture modeling. It's much more efficient to use ORM/data-mapping tools to develop your code.

Subject: Underscores in names?
Posted by: Anonymous (not signed in)
Posted on: Friday, June 01, 2007 at 8:16 AM
Message: Hi, thank you so much for such an informative article! There is some debate in my project on naming conventions. One thing that I'd like to get your perspective on: is there any reason why one should not have an underscore in names? Above, you mention that "part_number", for example, is acceptable. I'm inclined to think that this makes things much more readable, especially when users need to go directly to the tables (no front-end app yet)... but what reason might someone have for saying that underscores make future development difficult? Much thanks!

Subject: No to stored procedures
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 19, 2007 at 3:56 PM
Message: I agree with the comment at Wednesday, May 30, 2007 at 9:15 PM. All the examples given to support NOT using stored procedures can be satisfied with non-stored-procedure solutions.

Subject: Complex Business Logic and Stored Procedures
Posted by: Makky (not signed in)
Posted on: Monday, July 09, 2007 at 5:05 AM
Message: What is your take on this? Performance-wise, wouldn't it be better to do this at the application level? Application languages like C# would outperform T-SQL, wouldn't they?

Subject: Using natural identity as PK is slow
Posted by: Makky (not signed in)
Posted on: Monday, July 09, 2007 at 5:10 AM
Message: Indexing on columns that have complex data and/or data types is not a good idea. SQL will always retrieve records quicker when searching on an indexed field that is of a simple type, like an IDENT key field. Yes or no?

Subject: Reply to more comments
Posted by: Anonymous (not signed in)
Posted on: Saturday, July 14, 2007 at 4:42 PM
Message: Thanks for the comments. So many of them are reasonable; it is great. I disagree with a few, but I understand your points too....


>>(I wouldn't dare sell a product which relied on proprietary DB's)<< >>stored procs are a pain, if not extinct.<<
I agree with you in essence about stored procedures, except for one thing. All database systems are different and have their own minor quirks. Putting things into stored procedures may take more work, but it will pay off in the end. I have more to say about procedures, after a few more replies...

>>Stored procedures are NOT a best practice. Implementing too much business logic in them will result in a poorly maintainable system. You can get better encapsulation by implementing a proper business model and placing the business logic in there.<<
And the difference is? I have yet to see a well-maintained middle layer of logic that made it reasonable to tune and maintain the DATA-oriented code. I agree 100% that business logic shouldn't be done in procedures, just data-oriented code and constraints that deal with the ALWAYS rules. If a column must always be non-null, then the column is built as NOT NULL. If a column must always be > 100, then a constraint is applied. Whenever you build a function that builds a SQL query, you are putting data logic in a difficult place. Just build a procedure with parameters.

>>Let's face it: most general-purpose programming languages (GPLs) are significantly more powerful than stored procedures<<
This is the big lie. It is also very, very true, hence the confusion. T-SQL (and other relational languages) is not a good programming language for working with data individually. Anytime you have to work with one thing at a time, T-SQL is slow and clunky. On the other hand, where functional programming languages fall apart is in working with thousands of rows. All I advocate is using the right tools in the right place.

>>That's not even getting into the mapping inconsistencies between your object model and procedure create.<<
This is generally a lack of creativity, where people try to build only procedures that match one-to-one with tables. And if you have a well-normalized set of tables, things get even more messy.

>>Stored procedures are a leftover from 1990s architecture modeling. It's much more efficient to use ORM/data-mapping tools to develop your code.<<
Efficient is the other challenging subject that comes up all of the time. Yes, it is more efficient in human terms to use tools to do your work.
But the problem is when the system is in production and you have to manage it. Systems built on stored procedures are a dream to work with: this procedure is slow? Well, let's tune it. When SQL is built in layers of application code, along with data protection code, you can spend hours debugging and tuning the application, just trying to locate the problem. And if you use the single-row-at-a-time mentality that many tools seem to encourage, then in a larger system you are going to be sifting through hundreds and hundreds of calls where one or two would have sufficed.

>>All the examples given to support NOT using stored procedures can be satisfied with non-stored-procedure solutions.<<
Sure, I don't dispute that stored procedures are not a requirement. But just because something can be done adequately in one manner doesn't mean that something else isn't better. You can build a house with your bare hands if you try hard enough, but with a hammer and a saw it is easier. That is not a perfect analogy, but the point is, stored procedures, when done right, are far easier to work with than code embedded in the application.

>>Performance-wise, wouldn't it be better to do this at the application level? Application languages like C# would outperform T-SQL, wouldn't they?<<
It depends on what the code does. Using stored procedures is not an advocacy of building client-server applications. I don't want to go back to the elaborate stored procedures we once wrote to do everything in the database 10+ years ago. All I advocate is that where your application might build a string like:


SELECT column1, column2 FROM table WHERE column3 = 2

a stored procedure could be built such that the call is simply:

table$retreiveAll @column3 = 2

Now, if I need to optimize that call somehow, it is accessible. It is easier to build each of these things in its own proper language, instead of building code on the fly for the SQL. I mean, if you are a good C programmer (any flavor), you probably clear all of your warnings before shipping, much less generate part of the code on the fly when the client pushes a button. Why not do the same for the data-oriented code? Procedures are just another layer of abstraction, enabling data programming to go on using data-oriented tools. You should use the functional languages for anything that is not set-based in nature. Like if column2 needs to be all uppercase, it would be best to do that in the data layer, or the date formatted YYYYMMDD, etc.

Look, it is not that database people want to take over and do all of the work. Not at all. It is just that when the rubber hits the road and the application is running slowly, who are you going to call? The DBA (http://drsql.spaces.live.com/blog/cns!80677FB08B3162E4!1691.entry). And they will be charged with discovering what is going on. If something is wrong in the data layer, the DBA can change things and get things up and running in no time.
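The contrast above, ad hoc string-building versus a single named, parameterized entry point, can be sketched in any client language. Here is a minimal, hypothetical Python/sqlite3 illustration (the function and table names are invented for the example; in the article's world the named entry point would be a T-SQL stored procedure):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, column3 INTEGER)")
conn.execute("INSERT INTO t VALUES ('a', 'b', 2), ('c', 'd', 3)")

# Fragile: the query text is assembled where it is used, so there is no
# single place to find, secure, or tune it.
value = 2
rows_ad_hoc = conn.execute(
    "SELECT column1, column2 FROM t WHERE column3 = " + str(value)
).fetchall()

# Better: one named, parameterized entry point (standing in for a stored
# procedure such as the reply's table$retreiveAll). If it is slow, there
# is exactly one definition to optimize.
def t_retrieve_all(conn, column3):
    return conn.execute(
        "SELECT column1, column2 FROM t WHERE column3 = ?", (column3,)
    ).fetchall()

rows_named = t_retrieve_all(conn, 2)
```

Both calls return the same rows; the difference is that the second gives the DBA (or anyone else) a single place to look when the call needs tuning.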

>>There is some debate in my project on naming conventions. One thing that I'd like to get your perspective on: is there any reason why one should not have an underscore in names?<<
Good luck. Naming conventions are freaking annoying to argue about, because there are no clear answers. I have actually started to like underscores again, because they are far more readable and easily parsed by tools. I have no good answer for you :)

>>Indexing on columns that have complex data and/or data types is not a good idea. SQL will always retrieve records quicker when searching on an indexed field that is of a simple type, like an IDENT key field. Yes or no?<<
To call this an oversimplification is a disservice to the word. I more or less agree with you that it is just the simplest pattern to always use an identity column for the key. And if you are always fetching things by the key (like so many ORM-style solutions would, though likely with a GUID), then this is going to be your best bet. But there is a lot of debate about this. For examples, check out Greg Linwood's blog: blogs.sqlserver.org.au/blogs/greg_linwood (or search Google with this link: http://www.google.com/search?hl=en&q=clustered+site%3Ablogs.sqlserver.org.au%2Fblogs%2Fgreg_linwood). He has an interesting series of posts about why clustered indexes may not be the best thing since sliced bread.
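The surrogate-versus-natural-key trade-off discussed above is often resolved by keeping both: an identity-style surrogate key for joins and lookups, plus a UNIQUE constraint so the natural key stays meaningful. A minimal, hypothetical sketch using Python's built-in sqlite3 (names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate integer key for cheap joins and lookups; the natural key
# (part_number) keeps its business meaning via a UNIQUE constraint.
conn.execute("""
    CREATE TABLE Part (
        PartId      INTEGER PRIMARY KEY,   -- surrogate, identity-style
        part_number TEXT NOT NULL UNIQUE   -- natural key, still enforced
    )
""")
conn.execute("INSERT INTO Part (part_number) VALUES ('X-100')")

try:
    # The UNIQUE constraint rejects a duplicate natural key even though
    # the table is keyed on the surrogate.
    conn.execute("INSERT INTO Part (part_number) VALUES ('X-100')")
except sqlite3.IntegrityError as e:
    print("duplicate rejected:", e)

row = conn.execute(
    "SELECT PartId FROM Part WHERE part_number = 'X-100'"
).fetchone()
```

The surrogate key never leaks business meaning, and the constraint guarantees the natural key is still unique, which addresses the "meaningless keys" objection raised later in the thread.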

Subject: Ten Common Database Design Mistakes
Posted by: Anonymous (not signed in)
Posted on: Wednesday, July 18, 2007 at 11:27 PM
Message: Very good article.

Subject: Anonymous comments disabled
Posted by: AnnaL (view profile)
Posted on: Monday, September 17, 2007 at 10:03 AM
Message: We've had to disable anonymous comments on this article due to relentless spamming. To post a comment, please sign in, or register if you are not already a member.

Subject: Meaningless keys
Posted by: ams101 (view profile)
Posted on: Friday, November 28, 2008 at 7:08 AM
Message: It is a very strong argument that every column in a table should be meaningful. But there is one important situation which I have faced more than often, especially in material data: if you have to import data of


limited quality and doubtful source (Excel, etc.), you will love your additional system key and an additional record-status column. Another horrible real-world example: the 1995 newly designed 5-digit postal zip codes (!!!) were not unique and are still not unique. You will need two additional columns, one for a proper id and one for East/West (the Wall still exists, at least in BI SW).

Subject: I don't know much but...
Posted by: sgtish (view profile)
Posted on: Monday, January 05, 2009 at 11:35 PM
Message: This article has been very informative for me. The only things I know about database design are the things that I'm learning / learned the hard way. I don't have a SQL Server and have to work with MS Access, but the beginning part, all the way down to "Lack of Documentation", did reinforce all the lessons I've learned so far. After that, things started to get fuzzy, because that was all new to me except for the testing part. I gave the newly built database to the users and they have found more bugs than I had anticipated and, fortunately or luckily, whichever you prefer, they were relatively easy to fix. I fully anticipate that when they really start to use it the serious bugs will rear their nasty heads. I think that in the future better planning, user input, and testing will help me a lot.

Subject: HELP!! Requirements Analysis
Posted by: rhasken (view profile)
Posted on: Sunday, March 15, 2009 at 7:33 PM
Message: I'm new to this, so please bear with me. I work at a start-up company which was started ~9 years ago. We're handling more work than we ever have before, and there's a strain on what is available in our database, as compared to what we want. While planning our database improvement projects for this year, we realized that we probably need to create a master plan to detail what we want from our database. From what I've recently read, we're going to go through the Requirements Analysis process of database planning. So, finally, my dilemma. How do I, as an engineer and a manager, communicate with our database team? Is there a format for creating a flowchart, or the like, that will help me to organize my ideas in a way that the implementors of our database will understand? Any advice is greatly appreciated. Thanks, Rob.

Subject: Thank you!
Posted by: PeggyScott84 (view profile)
Posted on: Friday, May 22, 2009 at 4:48 PM
Message: I am a newbie. It was a great article to get me started. It spoke about things I hadn't anticipated (the point on surrogate vs. natural keys). Thank you!

