This book contains the best material published on the site in 2003. It covers a variety of topics, from administration to advanced querying, XML to DTS, and security to performance tuning. And of course, the famous White Board, Flip Chart, or Notepad debate. So why print a book containing material you can get for free? Take a minute, read the introduction, and find out!
The Best of SQLServerCentral.com Vol. 2 Essays and Ideas from the SQL Server Community
Andy Jones, Andy Warren, Bob Musser, Brian Kelley, Brian Knight, Bruce Szabo, Chad Miller, Chris Cubley, Chris Kempster, Christopher Duncan, Christoffer Hedgate, Dale Elizabeth Corey, Darwin Hatheway, David Poole, David Sumlin, Dinesh Asanka, Dinesh Priyankara, Don Peterson, Frank Kalis, Gheorghe Ciubuc, Greg Robidoux, Gregory Larsen, Haidong Ji, Herve Roggero, James Travis, Jeremy Kadlec, Jon Reade, Jon Winer, Joseph Gama, Joseph Sack, Kevin Feit, M Ivica, Mike Pearson, Nagabhushanam Ponnapalli, Narayana Raghavendra, Rahul Sharma, Ramesh Gummadi, Randy Dyess, Robert Marda, Robin Back, Ryan Randall, Sean Burke, Sharad Nandwani, Stefan Popovski, Steve Jones, Tom Osoba, Viktor Gorodnichenko
Book printing partially sponsored by
Shelving: Database/SQL Server
The Best of SQLServerCentral.com — Vol. 2
In April 2001 six geeks banded together to form a more perfect site. Three years and 140,000+ members later, SQLServerCentral.com is one of the premier SQL Server communities in the world. We’ve got over 1,000 articles, 100s of scripts and FAQs, everything you would need as a SQL Server DBA or developer, and all at a great price — free.
$22.99 USA
The Central Publishing Group 3186 Michaels Ct Green Cove Springs, FL 32043 U.S.A
Copyright Notice
Copyright 2004 by The Central Publishing Group. All rights reserved. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced in any form, by any means, or by a database retrieval system, without the prior written consent of The Central Publishing Group. The publication is intended for the audience of the purchaser of the book and cannot be reproduced for the use of any person other than the purchaser. Authors of the material contained in this book retain copyright to their respective works.
Disclaimer
The Central Publishing Group, SQLServerCentral.com, and the authors of the articles contained in this book are not liable for any problems resulting from the use of techniques, source code, or compiled executables referenced in this book. Users should review all procedures carefully, test first on a non-production server, and always have a good backup before using them on a production server.
Trademarks
Microsoft, SQL Server, Windows, and Visual Basic are registered trademarks of Microsoft Corporation. Oracle is a trademark of Oracle Corporation.
Editors
Steve Jones and Andy Warren
Cover Art
Sylvia Peretz of PeretzDesign.com
The Best of SQLServerCentral.com – Vol. 2
Table of Contents
Introduction
8
About The Authors
9
Administration
15
Auto Close and Auto Shrink - Just Don't
Mike Pearson
16
Autoclose for Databases
Steve Jones
17
Autoclose for Databases - Part II
Jon Reade
18
AWE Adventures
Joseph Sack
19
Best Practices in an Adhoc Environment
Sharad Nandwani
20
Finding Real Dependencies
Stefan Popovski
21
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000
Jon Winer
25
Getting Rid of Excessive Files and Filegroups in SQL Server
Chad Miller
27
Importing And Analyzing Event Logs
Gheorghe Ciubuc
30
Initial Installation of the Production Database
Andy Jones
32
Scheduling SQL Server Traces - Part 2
Rahul Sharma
34
SQL Server Upgrade Recommendations and Best Practices - Part 1
Jeremy Kadlec
41
Who Needs Change Management?
Greg Robidoux
46
Who Owns that Database
Steve Jones
49
DTS
51
Auditing DTS Packages
Haidong Ji
52
Automate DTS Logging
Haidong Ji
54
Building Business Intelligence Data Warehouses
Tom Osoba
56
Comparison of Business Intelligence Strategies between SQL and Oracle
Dinesh Priyankara
58
Portable DTS Packages
Kevin Feit
61
Replacing BCP with SQLBulkLoad
Stefan Popovski
64
Security
67
Block the DBA?
Robert Marda
68
SQL Server Security: Login Weaknesses
Brian Kelley
70
SQL Server Security: Why Security Is Important
Brian Kelley
77
TSQL Virus or Bomb?
Joseph Gama
81
Performance
85
Cluster That Index
Christoffer Hedgate
86
Cluster That Index – Part 2
Christoffer Hedgate
88
Managing Max Degree of Parallelism
Herve Roggero
90
Monitoring Performance
Viktor Gorodnichenko
92
Squeezing Wasted Full Scans out of SQL Server Agent
Bob Musser
97
Troubleshooting SQL Server with the Sysperfinfo Table
Joseph Sack
98
T-SQL
100
A Lookup Strategy Defined
David Sumlin
101
Create Maintenance Job with a Click without using a Wizard
Robin Back
104
Creating a PDF from a Stored Procedure
M Ivica
110
Creating a Script from a Stored Procedure
Ryan Randall
112
Date Time Values and Time Zones
Dinesh Asanka
115
Find Min/Max Values in a Set
Dinesh Asanka
117
Gathering Random Data
Brian Knight
118
It Can't Be Done with SQL
Cade Bryant
120
Managing Jobs Using T-SQL
Randy Dyess
124
Multiple Table Insert
Narayana Raghavendra
128
Reusing Identities
Dinesh Priyankara
132
Sequential Numbering
Gregory Larsen
135
Using Built in Functions in User Defined Functions
Nagabhushanam Ponnapalli
138
Understanding the Difference Between IS NULL and =NULL
James Travis
139
Using Exotic Joins in SQL Part 1
Chris Cubley
141
Using Exotic Joins in SQL Part 2
Chris Cubley
144
Replication
147
Altering Replicated Tables (SQL 2000)
Andy Warren
148
XML
153
Is XML the Answer?
Don Peterson
154
Design and Strategies
157
Codd's Rules
Frank Kalis
158
Design A Database Using an Entity-Relationship Diagram
Ramesh Gummadi
160
Miscellaneous
164
A Brief History of SQL
Frank Kalis
165
Change Management
Chris Kempster
166
DBMS vs File Management System
Dale Elizabeth Corey
174
Introduction to English Query and Speech Recognition
Sean Burke
177
Lessons from my first Project as a Project Manager
David Poole
179
Pro Developer: This is Business
Christopher Duncan
182
Two Best Practices!
Darwin Hatheway
184
VBScript Class to Return Backup Information
Bruce Szabo
186
White Board, Flip Chart or Notepad
Andy Warren
192
INTRODUCTION
Welcome to The Best of SQLServerCentral.com – Vol. 2! Once again SQLServerCentral.com had another fantastic year, and we decided to reprint some of the best, most popular, and most read articles in dead-tree format. We wanted to give all our authors a chance to see their names in print as well as give you an offline resource that you can take with you wherever you may need it. Most likely at your bedside to help you drop off at night, but also for commutes, holding your coffee cup, whatever. And Red-Gate Software has once again sponsored the book and worked with us to bring you this great reference.
We would also like to thank everyone for their support, both on the website and by purchasing this book. Your visits to the site, clicking through to advertisers, purchasing products, and registering for PASS all help us continue this community and provide you with a valuable resource that hopefully helps you learn, perform better at your job, and grow your career.
We'd like to encourage all of you to submit an article in 2005! This is a community, and we aren't looking for only gurus to contribute. We love hearing about the real world you all live in and deal with on a daily basis. We plan to print at least one article from each author and send you a couple of copies of the book: great for your bookshelf, and they make a great Mother's Day present.
Once again, thanks so much for your support, and we look forward to 2005.
Andy Warren Brian Knight Steve Jones
About The Authors Andy Jones I am currently employed at a large UK software house and am working as a SQL Server 2000 DBA within a development environment. After previously working with Visual Basic and Oracle for three years, I chose to move over to solely concentrate on database development. I have been in my current position working with SQL Server for the past two years, my role encompasses daily administrative tasks like managing backups and users through to my main job of database design and development. I also have extensive experience of reporting against RDBMS using such tools as Crystal Reports and Visual Basic Reports. Initial Installation of the Production Database – pg. 33
Andy Warren Altering Replicated Tables – pg. 164 White Board, Flip Chart, or Notepad? – pg. 213
Bob Musser Bob Musser is the President of Database Services, Inc., an Orlando-based vertical market software provider. His company, in business since 1988, provides software and support primarily for process servers and couriers. They also run an exchange system, built on SQL Server, for process servers to trade work with each other. Squeezing Wasted Full Scans out of SQL Server Agent – pg. 107
Brian Kelley Brian is currently an Enterprise Systems Architect with AgFirst Farm Credit Bank (http://www.agfirst.com) in Columbia, SC. Prior to that he served as a senior DBA and web developer with AgFirst. His primary responsibilities include overhauling the current Windows NT infrastructure to provide for a highly available, network-optimized framework that is Active Directory ready. Brian assumed his Architect role in December 2001. He has been at AgFirst since January of 2000 when he originally came on-board as an Intranet web developer and database programmer. In addition to his role at AgFirst Farm Credit Bank, Brian heads Warp Drive Design Ministries (http://www.warpdrivedesign.org), a Christian ministry devoted to using technology for encouraging Christians in their faith as well as presenting the Gospel in a non-confrontational manner. The main focus is an email devotional ministry currently being penned by Michael Bishop (
[email protected]), a computer engineering student at Clemson University (http://www.clemson.edu). Prior to AgFirst, Brian worked as a system administrator and web developer for BellSouth's Yellow Pages group and served on active duty as an officer with the US Air Force. He has been a columnist at SQL Server Central since July 2001. Brian is the author of the eBook, Start to Finish Guide to SQL Server Performance Monitoring, on sale here at SQL Server Central: http://www.netimpress.com/shop/product.asp?ProductID=1 SQL Server Security: Login Weaknesses – pg. 71 SQL Server Security: Why Security is Important – pg 81
Brian Knight Brian Knight, MCSE, MCDBA, is on the Board of Directors for the Professional Association for SQL Server (PASS) and runs the local SQL Server users group in Jacksonville. Brian is a contributing columnist for SQL Magazine and also maintains a weekly column for the database website SQLServerCentral.com. He is the author of Admin911: SQL Server (Osborne/McGraw-Hill Publishing) and co-author of Professional SQL Server DTS (Wrox Press). Brian is a Senior SQL Server Database Consultant at Alltel in Jacksonville and spends most of his time deep in DTS and SQL Server. Gathering Random Data pg. 131
Bruce Szabo VBScript Class to Return Backup Information – pg. 212
Cade Bryant It Can’t Be Done With TSQL - Pg. 132
Chad Miller Getting Rid of Excessive Files and Filegroups in SQL Server – pg. 27
Chris Cubley Chris Cubley is an MCSD with over four years of experience designing and implementing SQL Server-based solutions in the education, healthcare, and telecommunications industries. He can be reached at
[email protected]. Using Exotic Joins in SQL Part 1 – pg. 155 Using Exotic Joins in SQL Part 2 – pg. 158
Chris Kempster Chris has been working in the computer industry for around 8 years as an Application Development DBA. He began his career as an Oracle AP then moved into a DBA role using Oracle. From there he has specialised in DBA technical consulting with a focus to both Oracle and SQL Server. Chris has been actively involved with SQL Server since 1999 for a variety of clients and is currently working with a range of large scale internet based development projects. Visit www.chriskempster.com for further information and his ebook titled "SQL Server 2k for the Oracle DBA". Change Management – pg. 186
Christoffer Hedgate I work in Lund, Sweden, at a company called Apptus Technologies. Apptus specializes in database research and consulting, including development of search engines. Most of the time I work with SQL Server as an architect, administrator and developer, but I have also done some work with other DBMSs such as Oracle and TimesTen. I also do some programming, mainly in Visual Basic, C# and Java (plus a couple of scripting languages). I am also the co-owner of sql.nu (http://www.sql.nu/) where you can find more articles from me. Cluster That Index – pg. 93 Cluster That Index – Part 2 – pg. 96
Christopher Duncan Christopher Duncan is an author, musician, veteran programmer and corporate troublemaker, and is also President of Show Programming of Atlanta, Inc. Irreverent, passionate, unconventional and sometimes controversial, his focus has always been less on the academic and more on simply delivering the goods, breaking any rules that happen to be inconvenient at the moment. Pro Developer: This is Business – pg. 205
Dale Elizabeth Corey Dale Elizabeth Corey has 16 years of professional experience including IT/IS Management, DBA, Programmer, System/Network Administrator, and Software Instructor for organizations such as the City of Titusville, FL, Brevard Community College, and McDonnell Douglas Space Systems at Kennedy Space Center. Currently, she is employed by Yellow Page Directory Services, Inc. (an affiliation of E-Marketing Technology, Inc.). Her education credentials include an M.S. in Management Information Systems from Nova Southeastern University (4.0 GPA) and a B.B.A. in Computer Information Systems. Her membership and certification credentials include Upsilon Pi Epsilon (international honor society in the computing sciences); Microsoft Certified Systems Engineer; Microsoft Certified Professional + Internet; Association of Computing Machinery; and IEEE. DBMS vs. File Management System – pg. 195
Darwin Hatheway Two Best Practices! – pg. 210
David Poole David Poole has been developing business applications since the days of the Commodore Pet. Those were the days when 8K was called RAM, not KEYBOARD BUFFER. He specialised in databases at an early stage of his career. He started developing marketing applications using Ashton-Tate dBase II/Clipper, progressing through a variety of PC database applications before working on HP Image/3000 systems. He has spent 5 years as a DBA within the Manchester (UK) office of the world's No. 1 advertising agency, where he focussed on data warehousing and systems integration. At present he is working within the web development department of a company specialising in information delivery. Lessons from my first project as a Project Manager – pg. 202
David Sumlin David Sumlin has run his own contracting business for the last 6 years and has been involved in SQL development for the last 8 years. He and his team of skilled developers have tried to stay up to date with Microsoft development technologies when it comes to SQL and .net. For the last few years, HeadWorks Inc. has been focused on data warehouses and marts and the reporting applications that interact with them for large financial institutions. When David isn't coding, he's out looking for birdies! A Lookup Strategy Defined – pg. 113
Dinesh Asanka I started my career in 1993 as an implementation officer at Jagath Robotics Pvt Ltd. After around three months I started my programming life in Cobol, Dbase 3+ and FoxBase. I was involved in developing accounting, plantation, audit and HR systems. I then graduated from the University of Moratuwa, Sri Lanka as an Electrical Engineer in 2001. Currently I am working as a Software Engineer for Eutech Cybertics Pte. Ltd., where I am involved in software development and database design. These days I am following an MBA in IT. In the field of databases I have experience in Dbase 3+, FoxBase, Clipper, MS Access, Oracle and SQL Server. My SQL Server career started in 2000. I am still learning, and there is a lot more to learn in SQL Server. I am a cricket lover and I would like to continue knowledge sharing in SQL Server. Date Time Values and Time Zones – pg. 126 Find Min/Max Values in a Set – pg. 128
Dinesh Priyankara Comparison of Business Intelligence Strategies between SQL and Oracle – pg. 142 Reusing Identities – pg. 57
Don Peterson Is XML the Answer? – pg. 171
Frank Kalis Codd’s Rules – pg. 176 A Brief History of SQL – pg. 184
Gheorghe Ciubuc Importing and Analyzing Event Logs – pg
Greg Robidoux Who Needs Change Management? – pg. 42
Gregory Larsen Currently a SQL Server DBA. I've been working with SQL Server since 1999. I'm an old-time mainframe DBA; my DBA career started in 1985. Currently studying to obtain the MCDBA. Sequential Numbering – pg. 146
Haidong Ji I was a developer, working with VB, SQL Server, Access and lots of other Microsoft stuff. I am currently a SQL Server DBA at my company in the Chicago area. I am MCSD and MCDBA certified. In my spare time, if I have any, I like to work on Linux, C and other open source projects. I can be reached at
[email protected] Auditing DTS Packages – pg. 48 Automate DTS Logging – pg. 50
Herve Roggero Herve Roggero (MCSD, MCDBA) is an Integration Architect with Unisys and works on Windows Datacenter and SQL Server running on 32 processors (and it rocks!). Herve has experience in several industries including retail, real estate and insurance. First contact with RDBMS was in the early 90's. Herve is a member of the Chicago SQL Server user group. Hobbies: Piano and Star Gazing. Managing Max Degree of Parallelism – pg. 98
James Travis I currently work for a major US bank in their internal support service area. I develop tools and applications used by call center, division, and corporate employees for various tasks. I work with the following development platforms: ASP, Visual Basic, ActiveX, COM/DCOM, .NET, Crystal Reports, VBScript, JavaScript, XML/XSL, HTML, DHTML, VBA with Office 97/2000/XP, SQL Server, Oracle, Informix, Visual C++ and Photoshop. I am admin over several servers; platforms include SQL 7 & 2000, IIS 4 & 5, and Windows NT 4 & 2000. Currently I have developed or am developing applications for project and employee time tracking, Financial Center/ATM information, web based password resets for various systems, call center scheduling and adherence, and inventory. Understanding the Difference Between IS NULL and = NULL – pg. 152
Jeremy Kadlec Jeremy Kadlec is the Principal Database Engineer at Edgewood Solutions, (www.edgewoodsolutions.com) a technology services company delivering full spectrum Microsoft SQL Server Services in North America. Jeremy can be reached at 410.591.4683 or
[email protected]. SQL Server Upgrade Recommendations and Best Practices – Part 1 – pg. 37
Jon Reade Based in the UK at present; a SQL Server DBA since early 1996 (SQL Server 6.0, so a relative novice!), and prior to that a database developer for ten years on various Microsoft based systems. Having started out with 6502 assembler on the Acorn Atom back in '78, I now work as a SQL Server database administrator / troubleshooter for various financial organisations, mainly speeding up very, very slow databases, which for some inexplicable reason I get enormous gratification from :)
Autoclose for Databases Part II – pg.16
Jon Winer Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000 – pg 24
Joseph Gama José (Joseph) Gama is a software engineer currently working with .NET and SQL Server 2000. He has contributed over 10,000 lines of source code to the public domain. "Open source is the future, a prosperous future based on trust, truthfulness and cooperation. Freedom is not only the right to choose but also the right to have unconditional access to tools and information. Learning, creating, communicating and leisure are rights bound to every human being." TSQL Virus or Bomb? – pg. 87
Joseph Sack Joseph Sack is a SQL Server consultant based in Minneapolis, Minnesota. Since 1997, he has been developing and supporting SQL Server environments for clients in financial services, IT consulting, and manufacturing. He is a Microsoft Certified Database Administrator (MCDBA). Joseph has written for SQL Server Magazine, and recently wrote a book called "SQL Server 2000 Fast Answers for DBAs and Developers". For questions or comments, feel free to contact him at www.joesack.com. AWE Adventures – pg 18 Troubleshooting SQL Server with the Sysperfinfo Table – pg. 109
Kevin Feit I have been involved with software development for the past 13 years. I started working with SQL Server in 1990, when Microsoft and Sybase were development partners. One major project I have worked was development of custom middleware for communications between OS/2 and mainframe using SQL Server as the data store for the directory service. (This was in 1991-2, before the term middleware was even coined.) More recently, I was the project manager and database architect for
the development of the Intercommercial Markets web site (www.intercommercial.com), an online exchange for the green coffee industry using a SQL Server 7 database. I am currently working for a large financial services company. Recent activities have included upgrading my division's servers from SQL Server 7 to 2000, and developing several complex DTS packages required for an accounting reconciliation process. Portable DTS Packages – pg. 60
M Ivica Creating a PDF from a Stored Procedure – pg. 123
Mike Pearson Auto Close and Auto Shrink - Just Don't – pg. 13
Nagabhushanam Ponnapalli Using Built in Functions in User Defined Functions – pg.. 151
Narayana Raghavendra Multiple Table Insert – pg. 140
Rahul Sharma Rahul Sharma is a senior database administrator for Manhattan Associates, Inc., and a columnist for databasejournal.com, dbazine.com and SQLServerCentral.com. He has a bachelor's and a master's degree in engineering, has been working with Microsoft SQL Server since the release of SQL Server 6.5, and is currently working with both SQL Server 2000 and Oracle 9i. He is a Microsoft Certified Professional with more than six years of experience in database development and administration. He is the author of Microsoft SQL Server 2000: A Guide to Enhancements and New Features (Addison-Wesley, ISBN: 0201752832). Scheduling SQL Server Traces – Part 2 – pg. 35
Ramesh Gummadi Design Using an Entity-Relationship Diagram – pg. 178
Randy Dyess I have been working with SQL Server for over 5 years as both a development and production DBA. Before SQL Server I spent time as both a Visual Basic developer and Microsoft Access developer. Numerous projects upsizing Access to SQL Server lead me to become a full-time SQL Server DBA. Currently I have the privilege of working on one of the world's largest SQL Server "read-world" production installations at Verizon Communications for Verizon's billing project. We have 11 main databases totaling over 9 Terabytes of data with the largest single database over 2.2 Terabytes. My current position is as a development DBA, developing new Transact-SQL code and enhancing existing code. Before working at Verizon, I worked at one of the largest advertising firms in America: Rapp Collins. There I supported up to 60 SQL Server web databases at a time, with some Oracle thrown in, doubling as both a development DBA and production DBA. Clients before Rapp Collins include: Auto One (a leading auto loan lender), Realpage, Inc. (leader in multi-housing management software) and BlueCross BlueShield of Texas (a large insurance company). You can find out more about me and my works by visiting my website. Managing Jobs Using T-SQL – pg. 138
Robert Marda I have worked for bigdough.com since 18 May 2000 as an SQL Programmer. My duties include backup management for all our SQL Servers, mentoring junior SQL Programmers, and serving as DBA while our DBA is on vacation. I develop, test, and deploy stored procedures and DTS packages as well as manage most major SQL projects. Our offices are located in Bethesda, Maryland. I have been married to Leoncia Guzman since 23 Jul 1994. We met in the Dominican Republic where I lived for about 2 years as a missionary. We have 4 children, Willem (age 8), Adonis (age 6), Liem (age 4 and a half), and Sharleen (age 3 and a half).
My hobbies include spending time with our 4 children (we play chess, dominos, mancala, and video or computer games together), keeping tropical freshwater fish, breeding and training parakeets, coin collecting (US and foreign), and genealogy. I have a 55 gallon tank and a 20 gallon tank with many kinds of fish (such as a pleco, tiger barbs, mollies, cichlids, tetras, and guppies). I also have a small aquatic turtle. Block the DBA – pg. 68
Robin Back Create Maintenance Job with a Click without using a Wizard – pg. 117
Ryan Randall Aged 28, from London. Just left my first job after Uni after a 6 year stint, most recently managing the development team for a group of financial companies. Currently having a much needed rest teaching myself some new stuff before getting back into it. Creating a Script from a Stored Procedure – pg 125
Sean Burke Sean has been working with databases for over 15 years, and has developed many custom solutions for dozens of high profile companies. While he thoroughly enjoys working with the latest and greatest SQL Server version, he still has a strange affinity for early DOS-based database products like R:Base. As an intermediate VB programmer, he is a staunch advocate of understanding the fundamentals of the database platform upon which your application relies for its data. He is currently the CIO for Hancock Information Group in Longwood, FL, and is passively pursuing MCDBA certification. Introduction to English Query and Speech Recognition – pg 198
Sharad Nandwani Best Practices in an Adhoc Environment – pg 20
Stefan Popovski Replacing BCP with SQLBulkLoad – pg 64
Steve Jones My background is I have been working with computers since I was about 12. My first "career" job in this industry was with network administration where I became the local DBA by default. I have also spent lots of time administering Netware and NT networks, developing software, managing smaller IT groups, making lots of coffee, ordering pizza for late nights, etc., etc. For those of you interested (or really bored), you can check out my resume.
Autoclose for Databases – pg 14
Tom Osoba Building Business Intelligence Data Warehouses – pg. 54
Viktor Gorodnichenko Monitoring Performance – pg. 102
ADMINISTRATION
This is what we do: administer servers and databases. Everyone has their own set of tricks, tips, and scripts tailored to the quirks of their own systems. We can each get sidetracked into dealing with our own systems and miss out on understanding some other sections of SQL Server that we don't work with. Here's a selection of articles to impart a little more information about the server: Autoclose, AWE, traces and more. In 2003, Microsoft has a very mature product that is advancing as hardware grows, loads increase, and different stresses occur. Nothing earthshattering here, just some good information that might help you save the day.
Auto Close and Auto Shrink - Just Don't
Mike Pearson
13
Autoclose for Databases
Steve Jones
14
Autoclose for Databases - Part II
Jon Reade
16
AWE Adventures
Joseph Sack
18
Best Practices in an Adhoc Environment
Sharad Nandwani
20
Finding Real Dependencies
Stefan Popovski
22
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000
Jon Winer
24
Getting Rid of Excessive Files and Filegroups in SQL Server
Chad Miller
27
Importing And Analyzing Event Logs
Gheorghe Ciubuc
31
Initial Installation of the Production Database
Andy Jones
33
Scheduling SQL Server Traces - Part 2
Rahul Sharma
35
SQL Server Upgrade Recommendations and Best Practices - Part 1
Jeremy Kadlec
37
Who Needs Change Management?
Greg Robidoux
42
Auto Close and Auto Shrink - Just Don't Mike Pearson 5/5/2003
I was on-site with a client who had a server which was performing very sluggishly. It was a beefy brute with heaps of memory and processing power, so clearly something was just not what it should have been. For me, step 1 in doing any sort of trouble-shooting is to look at the logs. Yup, always a good place to start, because the problem was ticking over at 5-8 entries per second…
Starting up Database ‘Tom’
Starting up Database ‘Dick’
Starting up Database ‘Harry’
So, what’s the problem then? Well, before I answer that, you can find these properties either by looking at the ‘Options’ tab of your database properties, or by running:
SELECT DATABASEPROPERTYEX('DatabaseName','IsAutoShrink')
GO
SELECT DATABASEPROPERTYEX('DatabaseName','IsAutoClose')
GO
If the option is ‘True’ (the T-SQL statement will return 0=False or 1=True), then there’s a performance hit just looking for a place to happen.
Auto_Close When Auto_Close is set to ON/TRUE, the database is closed and shut down when all processes in the database complete and the last user exits the database, thereby freeing up the resources held by that database. When a new connection calls the database again, it automatically reopens. This option is set to ON/TRUE when using the SQL Server Desktop Edition, but is set to OFF/FALSE for all other editions of SQL Server. The problem is that most servers sit behind applications that are repeatedly opening and closing connections to your databases, so the overhead of closing and reopening the databases between each connection is, well, “performance abuse”. The amount of memory that is saved by this is insignificant, and certainly does not make up for cost of repeatedly initializing the database. Admittedly, this option may have advantages on personal desktop scenarios as (when they are closed) you can treat these database files as any other files. You can move them and copy them, or even e-mail them to other users. However, when it comes to a proper server environment these points are fairly irrelevant. So as far as Auto_Close is concerned, don’t even be tempted. Just Don’t.
Auto_Shrink The Auto_Shrink option has its uses in scenarios such as development servers and the like, where disk space resources are usually limited and hotly contested, but (there’s always a ‘but’) there is a performance cost. Shrinking a database hogs the CPU and takes a long time. Plus, any indexes on the heaps (a table without a clustered index) affected by the shrink must be adjusted because the row locators will have changed. More work for the CPU. Like Auto_Close, this option is set to ON/TRUE for all databases when using SQL Server Desktop Edition, and OFF for all other editions, regardless of operating system. When this option is set to ON/TRUE, all of a database's files are marked for shrinking, and will be automatically shrunk by SQL Server. This option causes files to be shrunk automatically when more than 25 percent of the file contains unused space. Not a wise option for your production systems, which would suddenly suffer a performance hit when SQL decides it’s shrink-time. So, again – just don’t. Here is a quick script which will run on SQL 2000, giving you a list of your databases with the status of these options.
SET NOCOUNT ON
SELECT [name] AS DatabaseName
 , CONVERT(varchar(100), DATABASEPROPERTYEX([name], 'Recovery')) AS RecoveryType
 , CONVERT(varchar(10), DATABASEPROPERTYEX([name], 'IsAutoClose')) AS AutoClose
 , CONVERT(varchar(10), DATABASEPROPERTYEX([name], 'IsAutoShrink')) AS AutoShrink
FROM master.dbo.sysdatabases
ORDER BY DatabaseName
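For completeness, here is a minimal sketch (not part of the original article) of turning both options off once you find a database with them enabled. The database name is hypothetical, and the ALTER DATABASE SET syntax shown is the SQL Server 2000 form; on SQL Server 7.0 you would use sp_dboption instead.
-- 'Tom' is a hypothetical database name; substitute your own.
ALTER DATABASE [Tom] SET AUTO_CLOSE OFF
GO
ALTER DATABASE [Tom] SET AUTO_SHRINK OFF
GO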
Autoclose for Databases Steve Jones 2/10/2003
Introduction
Let's start with an easy one: what is Autoclose? Autoclose is one of the options that you can set for a database, along with autoshrink, auto create statistics, auto update statistics, etc. This option basically "closes" the database file whenever the last user disconnects from the database. The resources are freed up, but when a user connects to the server, the database is reopened. Note that I said database, not server. It is also an option that you should NEVER set on a production database.
Hmmmm, he said "never". As a general rule, I'm distrustful of someone who says "never". In this case, however, take my word for it. You never want to set this on a production database. In fact, I'm struggling to find a reason why you would ever want to set this, but let's take a look at what this option does. Normally when SQL Server boots, it opens each .mdf and .ldf file for all the databases; the databases are checked and some small amount of resources are consumed to keep track of the database files, handles, etc.
I decided to set the database option for Northwind to autoclose (SQL Server 2000 Standard). I next checked the SQL Server error log and found that there were a bunch of entries that all said "Starting up database 'Northwind'". I then ran sp_who2 to verify there were no users in Northwind, waited a minute, and connected to the server with Query Analyzer. Even though I have the Object Browser open, no queries are made of Northwind until I actually select the database. I next select the Northwind database in the drop down and re-check the errorlog in Enterprise Manager (requires a refresh). I see 4 more entries that say "Starting up database 'Northwind'". I disconnect and recheck the errorlog; no entries. I had expected a "close" message, but none appeared. I checked the error logs and no entries were there either.
I next ran Handle from SysInternals to check for open file handles. I saw all my .mdf and .ldf files open by the sqlsrvr.exe process. Reconnect and select Northwind, re-run Handle, and sure enough, there are the Northwind files being held open by sqlsrvr.exe. I repeat this a few times and verify that when I disconnect, the file handles are closed. I even open two connections and verify that the database opens on the first connection (nothing happens on the 2nd) and stays open when I close the first, until I close the 2nd connection.
If you search Books Online, you will get 6 results, all of which have to do with setting or checking the option. No guidance is given on strategies for using this option. A search on Technet returns the same BOL entries along with a few bugs. One thing to note is that in the Desktop edition, this option is true by default (set in model), but false in other editions. I guess this is to save resources on a desktop. It also allows you to easily move and copy the file when the database is not in use (mentioned in Ch 4 - Pocket Admin Consultant)(1). Of course, I don't know about you, but on my servers, if I move a db file (mdf, ldf), I usually have problems when I start the server back up or access the database.
This is a strange option to me, and I find myself wondering why Microsoft ever implemented it every time I find it set. After all, how many resources can an open database hold? Since SQL Server tends to grab a large amount of memory, it's hard to see if memory changes with this option being set. I decided to run a few experiments. On my test server, the SQL Server process, sqlsrvr.exe, has 154,732kb in use.
This is the steady state in general, with a few other things on this server. If I set Northwind to Autoclose on, then the memory usage drops to 154,672kb immediately. When I connect with QA to master, I see memory usage jump to 154,680kb - 8kb added for my connection, which is what I expect. I then select the Northwind database. Memory moves to 154,884kb, but when I change back to master, the memory is still in use by SQL Server. I disconnect and memory drops back to my baseline of 154,672kb. I repeat this a few times, adding some queries in there, and while the memory values change (they seem to fluctuate by about 20kb as a starting point), I don't see the memory usage increase when I select Northwind.
I know this isn't the most scientific test, but I don't see that many resources being used. I wonder if a large database, > 1GB, would show similar results, and I hope to get some time on a production system to test this over the next few months along with some more in-depth analysis, but for now, I'll repeat my advice: DO NOT SET THIS ON A PRODUCTION SYSTEM. In addition, there were some issues in SQL Server 7. Q309255 confirms that Autoclose may cause a stack dump in SQL Server 7. The fix? Turn it off. I did find a Q&A at http://www.microsoft.com/sql/techinfo/tips/administration/autoclose.asp that mentions it is used for databases that are rarely used, but in general it should not be used. If the database isn't used much, it probably isn't taking many resources and isn't worth setting this. If you search Google, you'll find quite a few people who have recommended you avoid this option entirely. I concur, and remind you to double check all your servers and shut this option down.
Steve Jones ©dkRanch.net January 2003
Autoclose for Databases - Part II Jon Reade 3/12/2003
Introduction I was answering a question posed by a trainee DBA, who asked about an odd error she was getting when trying to create a new database – one I'd not experienced before. The error was: Server: Msg 1807, Level 16, State 2, Line 1 Could not obtain exclusive lock on database 'MODEL'. Retry the operation later.
Looking this up in Books Online doesn't help much, nor was there much out on the web. So I started investigating…
As you may know, when SQL Server creates a new database, it uses the model database as a "template", which determines the data structures, initial file sizes and a few other things for the new database that is being created. Whether you use Enterprise Manager or the T-SQL CREATE DATABASE command (which is what executes in the background when you use the Enterprise Manager GUI to create a new database), SQL Server attempts to obtain an EXCLUSIVE lock on the model database. Presumably this is to prevent other processes from updating the model database's schema whilst the new database is being created, so that the new database's schema is in a known, consistent state. You can see this behavior in progress:
• Open a Query Analyzer window and select the model database in the drop down.
• Create a new database, either in Enterprise Manager or with the CREATE DATABASE command, and click on OK or execute it.
• Open a new window in Query Analyzer and execute sp_lock - you'll see an 'X' in the 'mode' column against the model database's database ID (I guess this is 3 on all installations, but execute a select name, dbid from master..sysdatabases if you want to check this).
You'll also get the Error 1807 message which sparked off this article. However, through trial and error, I found that if you have even a single connection open to the model database, it is not possible for SQL Server to obtain this exclusive lock. This can be caused by something as simple as having the model database selected in the database drop-down in Query Analyzer. This prevents the CREATE DATABASE command from creating a new database. Another reason is that if you have previously opened the model database in Enterprise Manager, then closed it, the connection to the database remains open, which means that the CREATE DATABASE command cannot acquire the exclusive access lock that it requires to execute successfully. Not so bad if you've just done it, but how about if it was opened and closed three months back?
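As a practical aside (my addition, not from the original article), the connections holding model open can be listed from the standard sysprocesses table before you retry the CREATE DATABASE; closing or killing those sessions releases the lock it needs.
-- Sessions currently connected to the model database.
SELECT spid, loginame, program_name
FROM master.dbo.sysprocesses
WHERE dbid = DB_ID('model')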
Solution? What has this got to do with Auto Close option? Well, if you have configured model to 'Auto Close' after opening it, then it will close, at least in Enterprise Manager, and prevent you from experiencing this error. So it might be very tempting to set Auto Close just on model to avoid encountering error 1807. But don't reach for that mouse just yet. Here's the real gotcha : Remember we said that SQL Server uses model as a template for every new database? Well, that includes all of the database options – including Auto Close. So if you set the Auto Close option to be on for the model database, every new database you create will inherit the Auto Close option – which as Steve Jones' original article pointed out, is not what we want.
Conclusion If you experience error 1807, remember that it's probably an open connection to the model db that's causing it. Drop the connection and try again. But don't be tempted to set the Auto Close option on model – at some point you'll forget you did it and all of your new databases will have it set, unless you manually reset it for each of them. As Steve said in his original article : "If the database isn't used much, it probably isn't taking many resources and isn't worth setting this." It's not – so don't. Jon Reade © Norb Technologies, March 2003.
AWE Adventures Joseph Sack 4/16/2003
Introduction Recently, 4GB of physical RAM was added to a SQL Server 2000 Enterprise edition instance I support. This brought the total physical RAM available on the machine up to 8GB. By using Windows 2000 Address Windowing Extensions (AWE), with SQL Server 2000 Enterprise or Developer Edition, on Windows 2000 Advanced Server or Windows 2000 Data Center, SQL Server can take advantage of physical memory exceeding 4GB of physical RAM. Although I had read a few articles about the AWE configuration process, this was the first time I had ever actually enabled this feature. After I completed the configuration, I discovered a few behaviors I had not read about, as well as configurations that could have caused issues had they not been addressed. In this article I will detail how I enabled the AWE functionality, as well as what behaviors I believe one should be aware of.
The scope of this article is the configuration of AWE for SQL Server 2000 Enterprise on a Windows 2000 Advanced Server machine. Configuring AWE on Windows 2000 Data Center is, I assume, quite similar to configuring it on Windows 2000 Advanced Server, but as I have not performed such an operation, I will not address it here. Also, this article assumes you are using a single-instance machine. Multiple instances and AWE settings require special planning not discussed here.
Why use AWE? Prior to adding the additional 4GB of memory, the application running against this particular SQL Server instance was experiencing significant I/O spikes throughout the day, and was running under maximum memory conditions. The buffer cache and procedure cache utilization was always 100%, with the procedure cache often being starved for memory. After adding the additional memory, and enabling AWE, I saw the I/O spikes decrease significantly. The extra memory allowed both the buffer cache and procedure cache to grab a sufficient amount of memory needed for the application queries (I’ll be writing another article describing how you can monitor such information). The bigger buffer decreased the number of times that read and write operations needed to read from disk. Keep in mind that extra memory will not solve all I/O and memory issues. The performance outcome after configuring AWE will vary depending on your application activity, read/write ratio, network throughput, hardware components (CPU, RAID settings), and database size.
Configuration Steps
1. Assuming 8GB of physical memory, after adding the extra RAM and prior to rebooting the server, your boot.ini should contain both the "/3GB" and "/PAE" switches. Not having /3GB in your boot.ini will translate to 2GB of RAM reserved for the operating system, instead of the 1GB reserved with the "/3GB" switch. The "/PAE" switch is required if you want SQL Server to support more than 4GB of RAM.
2. Make sure that the SQL Server service account has been granted the "Lock Pages in Memory" privilege. Just because your service account is a member of the administrators group does NOT mean that it has this policy setting already. I configured this setting by selecting Start | Run, typing gpedit.msc, and selecting OK to launch the Group Policy editor. I expanded Computer Configuration, then Windows Settings, Security Settings, Local Policies, and clicked User Rights Assignments. In the Policy pane (on the right), I double clicked "Lock pages in memory" and added the SQL Server service account used to run the SQL Server service. For domain member machines, be sure that no security policies at the site, domain, or organizational unit level overwrite your policy change. Also, the policy change does not affect permissions of the service account until the SQL Server service is restarted. But do not restart the service yet!
3. In Query Analyzer, connect as sysadmin to your SQL Server instance and enable AWE by executing the following script:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'awe enabled', 1
RECONFIGURE
GO
This setting does not take effect until you restart the SQL Server instance – but do not do it yet – there is more!
4. Once AWE is enabled, SQL Server will no longer dynamically manage memory. SQL Server will grab all available physical memory, leaving 128MB or less for the OS and other applications to use. This underscores the importance of setting a max server memory amount that SQL Server should be allowed to consume. Determine this upper limit based on memory consumption of other applications on your server. Also note that a lower limit (min server memory) is no longer relevant in the context of AWE. In this example, to set 7GB as the maximum memory SQL Server is allowed to consume, issue the following command:
sp_configure 'max server memory', 7168
RECONFIGURE
GO
sp_configure 'show advanced options', 0
RECONFIGURE
GO
5. NOW reboot your machine (assuming you have not rebooted since reconfiguring the boot.ini file). If you have already rebooted after configuring the boot.ini file, you need only restart the SQL Server instance.
6. After the restart, check the SQL log in Enterprise Manager right away. The most recent startup log should contain the words "Address Windowing Extensions enabled" early in the log. If you didn't do it right, the log will instead say "Cannot use Address Windowing Extensions because…", with the reason noted, such as not assigning "lock pages in memory". (A quick way to read the log from Query Analyzer is sketched below.)
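As a convenience (my addition, not part of the original article), the undocumented xp_readerrorlog procedure will display the current error log from Query Analyzer, so you can look for the AWE startup message without switching to Enterprise Manager.
-- Read the current SQL Server error log and scan the output for
-- "Address Windowing Extensions enabled".
EXEC master.dbo.xp_readerrorlog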
After the configuration
AWE awareness is not built into all Windows 2000 tools, so here are a few areas you should be aware of when monitoring memory utilization. The Windows Task Manager's "Processes" tab tells a misleading tale about how much memory the SQLSERVR.EXE process is using. I was alarmed to see that, after a few hours, the process was still just consuming 118MB, versus the maximum 6.5GB I configured it for. For a reality check, within the Windows Task Manager, switch to the Performance tab and check out the available physical memory. This amount should be the total memory available less the maximum amount you set for SQL Server, along with other applications running on your server. If you use Performance Monitor (System Monitor), keep in mind that for the SQLServer:Memory Manager object, the Target Server Memory (KB) and Total Server Memory (KB) counters will display the same number. This is because with AWE, SQL Server no longer dynamically manages the size of the memory used; it will consume the value of your 'max server memory', and this memory will be made up of physical RAM only, not the paging file. AWE memory can also be monitored in Performance Monitor (System Monitor) using the several AWE-related counters of the "SQLServer:Buffer Manager" performance object. One last note on memory configuration: if you have not left enough RAM for other processes on the machine, do consider lowering the max server memory setting. Keep in mind that this change will not take effect until you restart the SQL Server service. In a future article, I will review how to take a look at the memory utilization internals, so you can better monitor how the allocated memory is actually being used.
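Along the same lines, here is a minimal sketch (my addition) of watching these counters from T-SQL instead of System Monitor, using the standard master.dbo.sysperfinfo table that a later article in this book also covers. The object names are matched with LIKE because they carry an instance-specific prefix.
-- Target/Total Server Memory plus the AWE-related Buffer Manager counters.
SELECT object_name, counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE (object_name LIKE '%Memory Manager%'
       AND counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)'))
   OR (object_name LIKE '%Buffer Manager%'
       AND counter_name LIKE 'AWE%')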
Best Practices in an Adhoc Environment Sharad Nandwani 12/16/2003
In an environment where developers have free access to the production servers, they can make unintentional mistakes which result in server degradation and performance dipping drastically over a period of time. The DBA needs to be aware of these common mistakes, monitor for them, rectify them, and convey them back to the developers so that they are not repeated going forward.
DATABASE CREATION
The developer may choose to create a database with the default options as provided by SQL Server. The defaults take the initial size from the model database, which may be very small and may result in the creation of a database file that needs to be expanded every few transactions while in production. The DBA should make sure that developers who have admin access to SQL Server are aware of the implications this can have on the production environment. The developer should be able to estimate the initial size of the database, so that it does not start overloading the server soon after going live.
The default creation of the database also results in the file having unrestricted growth, which leaves a lot of scope for fragmentation. Always ask your developers to set a maximum size for the database file; this will help in avoiding fragmentation. Keep a maximum size and set a small percentage for the growth increment. The recovery model is Full by default, which may result in very large transaction logs over a period of time if backups are not scheduled on a regular basis through SQL Server. The recovery model should be set to Simple, or to whatever is appropriate for your environment, as sketched below.
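A minimal sketch of such a creation script (my illustration: the database name, file paths and sizes are made up and should be estimated from your own data volumes), using an explicit initial size, maximum size, growth increment, and the Simple recovery model:
-- Hypothetical database, file paths and sizes.
CREATE DATABASE SalesDB
ON PRIMARY (
    NAME = SalesDB_Data,
    FILENAME = 'D:\MSSQL\Data\SalesDB_Data.mdf',
    SIZE = 500MB,
    MAXSIZE = 2000MB,
    FILEGROWTH = 10% )
LOG ON (
    NAME = SalesDB_Log,
    FILENAME = 'E:\MSSQL\Log\SalesDB_Log.ldf',
    SIZE = 100MB,
    MAXSIZE = 500MB,
    FILEGROWTH = 10% )
GO
-- Keep the recovery model Simple unless the environment needs full log backups.
ALTER DATABASE SalesDB SET RECOVERY SIMPLE
GO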
TSQL, DATABASE DESIGN
Developers have a tendency to write "SELECT * FROM tblName" when they need to query just one or two columns, resulting in more processor time, memory and network traffic. This can have a huge impact on the performance of the server as well as the application. The developer or designer should also prefer the 'varchar' data type over 'char' where it fits; this saves a lot of memory and traffic across the network. Although it sounds very basic, one does come across many tables and database structures which do not have a primary key associated with them. Make sure that primary keys always exist. The database designer has to strike a balance between normalized and denormalized forms of a design; at times the database has to have the performance of an RDBMS and the flexibility of a warehouse. Once the database is in use, it is a good idea to capture a Profiler trace and record the events in order to fine tune the indexes using the Index Tuning Wizard. Make sure that the trace is captured during peak time and the Index Tuning Wizard is run during non-peak time. Developers often write stored procedures which contain dynamic SQL; they should always try to avoid it. The 'DROP' command in a stored procedure should be avoided for dropping a table and should either be replaced by a 'TRUNCATE' command or an inline table operator; another alternative can be a temporary table. Foreign key relationships should exist for data accuracy and also to ensure that the joining attributes share the same data type across tables; a query which joins columns of different data types can kill the system. The developer should be aware of code that can result in deadlocks: objects should be accessed in the same order in different stored procedures or triggers, and the transaction isolation level should be kept as low as possible.
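For illustration only (hypothetical tables, not taken from the article): explicit column lists, varchar columns, a primary key on each table, and a foreign key whose columns share the same data type.
-- Parent table with a primary key.
CREATE TABLE dbo.Customer (
    CustomerID int NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
)
GO
-- Child table; the foreign key keeps CustomerID the same data type in both tables.
CREATE TABLE dbo.CustomerOrder (
    OrderID int NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerID int NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer REFERENCES dbo.Customer (CustomerID),
    OrderDate datetime NOT NULL
)
GO
-- Query only the columns you need rather than SELECT *.
SELECT CustomerID, CustomerName
FROM dbo.Customer
WHERE CustomerID = 42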
Finding Real Dependencies Stefan Popovski 12/18/2003
Do you remember the differences between SQL Server 6.5 and SQL Server 2000 when creating a procedure that calls another procedure that doesn't exist? SQL Server 6.5 would not allow the procedure to be created when it depends upon a nonexistent procedure. On the other hand, SQL Server 7.0 and 2000 will allow the procedure to be created, and the SP_DEPENDS system procedure will not report correct results. If we run the following script:
USE Northwind
GO
CREATE PROCEDURE proc1 AS exec proc2
GO
CREATE PROCEDURE proc2 AS exec proc3
GO
CREATE PROCEDURE proc3 AS exec proc4
GO
CREATE PROCEDURE proc4 AS exec proc5
GO
CREATE PROCEDURE proc5 AS exec proc6
GO
we receive these SQL messages:
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'proc2'. The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'proc3'. The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'proc4'. The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'proc5'. The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends on the missing object 'proc6'. The stored procedure will still be created.
The dependencies for "proc(i) - proc(i+1)" will not exist in the sysdepends table. We can check that with this statement, which should yield zero records:
select * from sysdepends where object_name(id) like 'proc%'
So I can't trust the system table sysdepends. However, sometimes I need real information about dependencies, especially between stored procedures, to get the real processing flow. So I developed a SQL statement to show stored procedure dependencies in one database by searching the sysobjects and syscomments system tables. First I create a recursive function which will return the stored procedure text without comments. This function erases up to 160 line comments and 160 block comments - 32 nested levels of the recursive function and five replacements in every function call. We can increase this number if we need to.
CREATE FUNCTION funProcWithoutComments (@Input VARCHAR(4000))
RETURNS VARCHAR(4000)
-- to be checked here
BEGIN
  DECLARE @Output VARCHAR(4000)
  DECLARE @i INT
  IF @Input NOT LIKE '%--%' AND @Input NOT LIKE '%/*%*/%'
  BEGIN
    SET @Output = REPLACE(@Input, CHAR(10) + CHAR(13), '')
    RETURN @Output
  END
  ELSE
  BEGIN
    SET @Input = @Input + CHAR(13)
    SET @i = 1
    WHILE @i <= 5
    BEGIN
      -- Strip up to five block comments per call.
      IF CHARINDEX('/*', @Input) > 0
         AND CHARINDEX('*/', @Input, CHARINDEX('/*', @Input)) - CHARINDEX('/*', @Input) + 2 > 0
      BEGIN
        SET @Input = REPLACE(@Input,
              SUBSTRING(@Input, CHARINDEX('/*', @Input),
                CHARINDEX('*/', @Input, CHARINDEX('/*', @Input)) - CHARINDEX('/*', @Input) + 2),
              '')
      END
      SET @i = @i + 1
    END
    SET @i = 1
    WHILE @i <= 5
    BEGIN
      -- Strip up to five line comments per call.
      IF CHARINDEX('--', @Input) > 0
         AND CHARINDEX(CHAR(13), @Input, CHARINDEX('--', @Input)) - CHARINDEX('--', @Input) + 2 > 0
      BEGIN
        SET @Input = REPLACE(@Input,
              SUBSTRING(@Input, CHARINDEX('--', @Input),
                CHARINDEX(CHAR(13), @Input, CHARINDEX('--', @Input)) - CHARINDEX('--', @Input) + 2),
              '')
      END
      SET @i = @i + 1
    END
    SET @Output = dbo.funProcWithoutComments(@Input)
  END
  RETURN @Output
END
Then I find all the dependencies in the database with the following statement:
SELECT so1.id AS ID, so1.name AS ProcName,
       dbo.funProcWithoutComments(sc.text) AS ProcText
INTO #T1
FROM sysobjects so1
     INNER JOIN syscomments sc ON so1.id = sc.id
WHERE so1.type = 'P' AND so1.name NOT LIKE 'dt_%'
-------------------------------------------------------------
SELECT LEFT(#T1.ProcName, 30), LEFT(T2.DependOnProc, 30)
FROM #T1
     INNER JOIN (SELECT id, name AS DependOnProc
                 FROM sysobjects
                 WHERE type = 'P' AND name NOT LIKE 'dt_%') T2
        ON #T1.ID <> T2.ID
WHERE #T1.ProcText LIKE '%' + T2.DependOnProc + '[' + CHAR(9) + CHAR(10) + CHAR(13) + CHAR(32) + ']%'
  AND CHARINDEX(#T1.ProcName, #T1.ProcText)
      <> CHARINDEX(T2.DependOnProc, #T1.ProcText, CHARINDEX(#T1.ProcName, #T1.ProcText) + 1)
ORDER BY 1, 2
-------------------------------------------------------------
DROP TABLE #T1
Running this statement in the Northwind database will yield:
ProcName   DependOnProc
---------- ------------
proc1      proc2
proc2      proc3
proc3      proc4
proc4      proc5
In other words, ProcName calls DependOnProc. The statement excludes system procedures with the prefix 'dt_'. In addition, these dependencies can be used to create a hierarchical report useful for documentation and error handling, especially when we use nested transactions. I will explain such examples in my next article.
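As a quick way to see the contrast described above (my addition), compare the built-in report with the query: for the Northwind example procedures, sp_depends comes back empty while the custom statement lists every call.
-- Reports no dependencies here, because sysdepends holds no rows for these procedures.
EXEC sp_depends 'proc1'
GO
-- Optional cleanup of the example procedures.
DROP PROCEDURE proc1, proc2, proc3, proc4, proc5
GO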
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000 Jon Winer 9/25/2003 This article is a brief summary of several tips & tricks I have learned through working with the Full-Text features in SQL Server.
Invalid Catalog Path: After installing SQL Server 2000 on a new machine and re-attaching the databases from tape back-up, I experienced some difficulties in getting the Full-Text Indexing to work properly. I received an error referencing an invalid location for the Full-Text catalog from within Enterprise Manager when I tried to run a full population of the catalog. The stored procedures for disabling and removing Full-Text indexes and catalogs did not resolve the issue. In poking around, I found the problem stemmed from the fact that on the previous machine, the catalog was located on the F: drive. On the new machine, there was no F: drive. The Full-Text Indexing wizard in SQL 2000 does not allow the user to alter the location of an existing catalog; it only lets the user create a new catalog. I tried to create a new catalog in a new location as a work-around, but because SQL could not resolve the erroneous location of the previous catalog, I could not complete the wizard. To fix the problem, I changed the SQL Server behavior to allow modifications to the system catalogs. I looked in the sysfulltextcatalogs table in the current database and changed the 'path' field value to the new location. (If the value is NULL, it means a path was not given at setup and the default installation path was used.) This allowed me to modify the Full-Text Indexing on the new machine. (Remember to change the server behavior back to its original setting.)
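A rough sketch of that fix, not verbatim from the article: the catalog name 'MyCatalog' and the path 'D:\FTData' are placeholders, and editing system tables directly should be a last resort on a backed-up server.

-- Allow direct updates to the system catalogs (remember to turn this back off afterwards)
EXEC sp_configure 'allow updates', 1
RECONFIGURE WITH OVERRIDE
GO
-- Point the full-text catalog at a folder that actually exists on the new machine
UPDATE sysfulltextcatalogs
SET path = 'D:\FTData'
WHERE name = 'MyCatalog'
GO
-- Restore the original server behavior
EXEC sp_configure 'allow updates', 0
RECONFIGURE WITH OVERRIDE
GO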
Incremental Population Discrepancies: Incremental Full-Text Index population behaves differently in SQL 7.0 and SQL 2000. The behavior of SQL 7.0's Full-Text catalog population is documented, but sometimes hard to find, so I wanted to discuss it here. On my SQL 7.0 machine, I scheduled an incremental population. When I checked in to see if it had run, it had not. In SQL 2000, I can schedule an incremental population and not have to worry about first running a full population. SQL 2000 knows whether it is the first population or a subsequent population. SQL 7.0 does not have this feature. If you schedule an incremental population without running a full population first, the incremental population will not run. And when you run your full-text query, it will return nothing. (Helpful hint: Make sure you have a field of type TimeStamp in your table. Without one, you cannot properly run an incremental population.)
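On SQL 7.0, then, the first population has to be a full one before any incremental schedule will do anything. A hedged illustration with the full-text system procedures; the catalog, table and column names are placeholders:

-- Run one full population before scheduling incrementals (SQL 7.0 requirement)
EXEC sp_fulltext_catalog 'MyCatalog', 'start_full'
GO
-- The scheduled job can then use incremental population
EXEC sp_fulltext_catalog 'MyCatalog', 'start_incremental'
GO
-- Incremental population also needs a timestamp column on the indexed table (only if it does not already have one)
ALTER TABLE MyTable ADD LastChanged timestamp
GO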
Full-Text Querying Discrepancies: Issue 1 There are some differences between SQL 7.0 and 2000 in their Full-Text querying capabilities. SQL 7.0 has limitations in the number of CONTAINS clauses it can process at any one time (in my testing, it is around 16). I have not been able to find any specific documentation on this issue, but SQL 2000 does not seem to have this limit. Below is a brief reference from Microsoft on this issue: If you are using multiple CONTAINS or FREETEXT predicates in your SQL query and are experiencing poor full-text search query performance, reduce the number of CONTAINS or FREETEXT predicates or use "*" to use all full-text indexed columns in your query.
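For example, the same search can often be collapsed into one predicate over all the full-text indexed columns. The Northwind-style table and columns below are only for illustration and assume a full-text index exists on them:

-- Several separate predicates, which SQL 7.0 may struggle with as the count grows:
SELECT ProductID
FROM Products
WHERE CONTAINS(ProductName, '"chai"')
   OR CONTAINS(QuantityPerUnit, '"chai"')

-- One predicate across every full-text indexed column on the table:
SELECT ProductID
FROM Products
WHERE CONTAINS(*, '"chai"')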
Issue 2 In SQL 7.0, if there are excessive grouping parentheses (even though they match up), the query will hang. Even when the command timeout property is set, the query will hang past the command timeout value assigned, and you will receive an error message of 'Connection Failure -2147467259'. When the extra parentheses are removed, the query executes fine. In SQL 2000 the original query runs with no problems.
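A hypothetical example of the kind of redundant (but balanced) grouping that could trigger the hang, along with the simplified form that runs cleanly on both versions; the table and search terms are made up:

-- Redundant grouping inside the search condition:
SELECT ProductID
FROM Products
WHERE CONTAINS(ProductName, '((("chai" OR "chang")))')

-- Simplified equivalent:
SELECT ProductID
FROM Products
WHERE CONTAINS(ProductName, '"chai" OR "chang"')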
Issue 3 When a Full-Text query in SQL 7.0 contained a single noise word, I would receive the error 'Query contained only ignored words'. SQL 2000 handled the noise words and returned the query results. In SQL 7.0, I had to remove all noise words from the query for it to run successfully. Here is a recommendation from Microsoft pertaining to this issue: You also may encounter Error 7619, "The query contained only ignored words", when using any of the full-text predicates in a full-text query, such as CONTAINS(pr_info, 'between AND king'). The word "between" is an ignored or noise word and the full-text query parser considers this an error, even with an OR clause. Consider rewriting this query as a phrase-based query, removing the noise word, or using the options offered in Knowledge Base article Q246800, "INF: Correctly Parsing Quotation Marks in FTS Queries". Also, consider using Windows 2000 Server: there have been some enhancements to the word-breaker files for Indexing Services.
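Following that advice, the failing query can be rewritten as a phrase so that the noise word is no longer a standalone term. The pub_info table and pr_info column come from the pubs sample database used in Microsoft's example:

-- Fails on SQL 7.0 with error 7619 because 'between' is a noise word:
SELECT pub_id FROM pub_info WHERE CONTAINS(pr_info, 'between AND king')

-- Phrase-based rewrite that avoids the standalone noise word:
SELECT pub_id FROM pub_info WHERE CONTAINS(pr_info, '"between king"')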
For more information on Full-Text Indexing and Querying, visit Microsoft MSDN.
(Figures in the original article: the Server Properties dialog and the sysfulltextcatalogs table.)
Getting Rid of Excessive Files and Filegroups in SQL Server Chad Miller 2/11/2003

Recently I began supporting a database with 16 filegroups, which in and of itself is not an issue. However, this particular database is only 7GB in total used size, and all 16 files were located on a single EMC Symmetrix volume. Because of the use of filegroups the database had expanded to a total of 15GB, unnecessarily doubling its size. Although there are legitimate reasons to use filegroups, in this situation 16 filegroups were clearly excessive and did not create substantial value, since all of the files were located on a single volume. Although it could be argued that filegroups can aid in recovering certain tables without restoring the entire database, this type of recoverability was not needed. If you buy quality physical disks and backup/recovery software you can avoid using filegroups entirely. So, I set out to remove 15 filegroups in favor of a single PRIMARY filegroup. I wanted to remove the files/filegroups by using scripts, so I began by creating scripts to move all objects to the PRIMARY filegroup.

I began by backing up and restoring the existing production database to a QA environment, setting the recovery mode to simple, setting the default filegroup to primary, and expanding the primary filegroup to be large enough to hold all database objects and accommodate index rebuilds/future growth:

ALTER DATABASE MyDB SET RECOVERY SIMPLE
GO
ALTER DATABASE MyDB MODIFY FILEGROUP [PRIMARY] DEFAULT
GO
ALTER DATABASE MyDB MODIFY FILE (NAME = MYDB_fg_primary, SIZE = 10000MB, MAXSIZE = 20000MB, FILEGROWTH = 500MB)
GO

Once this had been accomplished I scripted out all clustered and non-clustered indexes for those tables with clustered indexes and then rebuilt those indexes on the PRIMARY filegroup. Tables without a clustered index or without any indexes must be handled differently.

Drop the non-clustered indexes of tables with clustered indexes:

DROP INDEX TblWithClustIDX.IX_NonClustCol

Drop the clustered index, which in this case is also the primary key constraint:

ALTER TABLE TblWithClustIDX DROP CONSTRAINT PK_TblWithClustIDX
GO

Once the non-clustered and clustered indexes have been dropped, rebuild the clustered and non-clustered indexes on the PRIMARY filegroup:

ALTER TABLE TblWithClustIDX ADD CONSTRAINT PK_TblWithClustIDX PRIMARY KEY CLUSTERED (ClustCol) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_NonClustCol ON TblWithClustIDX(NonClustCol) ON [PRIMARY]
GO

For tables without a clustered index, but with a non-clustered index, you can move the data pages to the primary filegroup by dropping the existing non-clustered index and recreating the index as clustered. Then you can return the index to its original, nonclustered state by dropping and recreating it again on the PRIMARY filegroup:
For example, TblWithClustIDXOnly does not have a clustered index, only nonclustered indexes.

Drop the existing nonclustered index, which in this case is a primary key, by rebuilding it as a clustered index:

CREATE UNIQUE CLUSTERED INDEX PK_TblWithClustIDXOnly ON dbo.TblWithClustIDXOnly (NonClustCol) WITH DROP_EXISTING ON [PRIMARY]
GO

Drop the clustered index you've just created:

ALTER TABLE TblWithClustIDXOnly DROP CONSTRAINT PK_TblWithClustIDXOnly
GO

Recreate the non-clustered index on the primary filegroup:

ALTER TABLE TblWithClustIDXOnly ADD CONSTRAINT PK_TblWithClustIDXOnly PRIMARY KEY NONCLUSTERED (NonClustCol) ON [PRIMARY]
GO

For tables without indexes, you can simply move their data pages to the primary filegroup by selecting them into another database, dropping the existing table from the original, and then selecting the table back into the original database:

SELECT * INTO DBObjCopy.dbo.NoIndexesTable FROM NoIndexesTable
GO
DROP TABLE NoIndexesTable
GO
SELECT * INTO dbo.NoIndexesTable FROM DBObjCopy.dbo.NoIndexesTable
GO

At this point I thought I was done and could safely drop the files. However, when I attempted to drop several of the filegroups, SQL Server returned an error message indicating the file could not be dropped because it was not empty (SQL Server will not let you drop a file or filegroup if it is not empty). So I set out to determine which objects were still located on a filegroup other than the primary filegroup. The undocumented stored procedure sp_objectfilegroup will list the filegroup for an object, provided you pass it the object_id, but I did not know the object_id, plus I wanted to run an object-to-filegroup query for all objects. Using sp_objectfilegroup as a starting point and building on the query used by sp_objectfilegroup, I came up with a query to show all of the table/object names that are located on a filegroup other than primary:

--Listing 1: T-SQL to display objects and filegroups not on the primary filegroup
select TableNAME = o.name, ObjectName = i.name, i.indid, s.groupname
from sysfilegroups s, sysindexes i, sysobjects o
where i.id = o.id
  and o.type in ('S ','U ') --system or user table
  and i.groupid = s.groupid
  AND s.groupname <> 'PRIMARY'
--indid values
-- 1 clustered index
-- 0 heap
-- 255 text/image
-- > 1 nonclustered index or statistic

After running the query I could see I had several hundred objects still located on the various filegroups. Since I'd taken care of the clustered indexes and the heaps (both with and without nonclustered indexes), the indid and name indicated the remaining objects were of two types: text/image columns and statistics. All but a few of them were system-generated statistics, and the rest were text/image columns. SQL Server had created hundreds of system-generated statistics (all are named _WA%), which appeared to be located on the same filegroup as the table's original filegroup. I could simply drop all the system-generated statistics, however I didn't want to take the performance hit of regenerating these statistics during production hours. So I created another script to drop all of the statistics and recreate them. When they were recreated, they were then located on the primary filegroup.

Listing 2: T-SQL to drop and recreate all statistics

SET NOCOUNT ON
GO
create table #stat (stat_name sysname, stat_keys varchar(1000), table_name varchar(100))
Go
DECLARE tab CURSOR READ_ONLY FOR SELECT table_name FROM information_schema.tables
DECLARE @name varchar(40)
OPEN tab
FETCH NEXT FROM tab INTO @name
WHILE (@@fetch_status <> -1)
BEGIN
   IF (@@fetch_status <> -2)
   BEGIN
      insert into #stat(stat_name, stat_keys) EXEC sp_helpstats @name
      update #stat set table_name = @name where table_name is null
   END
   FETCH NEXT FROM tab INTO @name
END
CLOSE tab
DEALLOCATE tab
GO
PRINT 'PRINT ''<<< DROPPING STATISTICS >>>'''
GO
select 'DROP STATISTICS ' + TABLE_NAME + '.' + STAT_NAME + ' ' + 'GO' from #stat
GO
PRINT 'PRINT ''<<< DONE DROPPING STATISTICS >>>'''
GO
PRINT 'PRINT ''<<< CREATING STATISTICS >>>'''
GO
select 'CREATE STATISTICS ' + STAT_NAME + ' ' + 'ON MyDB..' + TABLE_NAME + '(' + STAT_KEYS + ')' + ' ' + 'GO' from #stat
GO
PRINT 'PRINT ''<<< DONE CREATING STATISTICS >>>'''
GO

Once statistics were moved to the primary filegroup, all I had left to do was move the text/image columns. For this, and only this, portion I chose to use Enterprise Manager. Using the filegroup query in Listing 1, I identified 8 tables with text/image columns not located on the primary filegroup. To use EM to move text/image columns to a new filegroup, go to Design Table >> Properties and select the new filegroup from the Text Filegroup drop-down list. Since I had moved all tables, indexes and statistics to the primary filegroup by using scripts, I thought it would be nice to do the same with the text/image columns. Using Profiler, I traced the process of using Enterprise Manager to change a text/image column's filegroup. Unfortunately I discovered that SQL Server would drop and recreate the table in order to move the text/image column to the primary filegroup. Although I could have accomplished the move by backing up the table via "select into…" and dropping and recreating it, I felt it would be easier to just let Enterprise Manager do this; since it was only 8 tables I didn't mind manually changing these columns via Enterprise Manager.
Once the text/image columns had been moved, I ran the query in Listing 1 and finally there were no objects located on any filegroup other than the primary filegroup. Once again I attempted to drop the files, and SQL Server again returned an error message indicating the files were not empty. I then shrank the files using DBCC SHRINKFILE and was finally able to drop the files and filegroups. Mission accomplished! You're probably wondering if this all was worth it? Yes. I could have used Enterprise Manager entirely instead of scripts, however going through 170 tables manually changing filegroups did not appeal to me. Also, because of production refreshes of QA, ongoing testing of filegroup moves and the production implementation, I would have had to go through this whole process at least a half dozen more times. So, it did save me time. I would rather use testable, repeatable scripts to move changes into production instead of error-prone, labor-intensive processes utilizing Enterprise Manager.
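For reference, that final shrink-and-drop step looks roughly like the following for each file and filegroup. The logical file and filegroup names here are placeholders, and EMPTYFILE is one way (not necessarily the one the author used) to make sure the file is actually empty before removing it:

-- Move any remaining allocations off the file so it can be dropped
DBCC SHRINKFILE (MyDB_fg_data1, EMPTYFILE)
GO
-- Drop the now-empty file, then the filegroup itself
ALTER DATABASE MyDB REMOVE FILE MyDB_fg_data1
GO
ALTER DATABASE MyDB REMOVE FILEGROUP FG_DATA1
GO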
Importing And Analyzing Event Logs Gheorghe Ciubuc 5/28/2003

Most DBAs have many tasks belonging to the System Administrator (SA) in a Windows 2K network, either because there is a single person in the IT department or because the company is small. One of these tasks is to read and analyze the Event Viewer logs daily to see if there are any problems, errors, etc. As we know, the operating system has a way of announcing to the SA when a special event appears in the system. Moreover, if we want to keep a history of events, we can save these logs in a text file (example: open Event Viewer, click Security Log, Action, Save Log File As…). The maximum log size is 512 KB in Windows 2K, which makes a text file with ~2,500 rows to read.

Let's imagine a scenario: the company has 10 Windows 2K file servers. The network works, but the logs fill up in one day. In this case, the SA has to read a file with ~25,000 rows to reach a conclusion about how the machines are working. As DBAs, we can use SQL Server 2000 tools to build a picture of this repository of Event Viewer events. The steps for this goal are:

1. Automate creating the text log file in the Windows environment.

2. Run a scheduled DTS Package with the following tasks. To copy the text log file into a SQL Server database, insert a Transform Data Task: Text File, with Source = log text file (Ap2.txt) and Destination = SQL Server table (Ap2) with the following design:

Create Table [Ap2Rez] (
   [Col001] [varchar] (255) , -- Date in Event Viewer Log
   [Col002] [varchar] (255) , -- Time in Event Viewer Log
   [Col003] [varchar] (255) , -- Source in Event Viewer Log
   [Col004] [varchar] (255) , -- Type in Event Viewer Log
   [Col005] [varchar] (255) , -- Category in Event Viewer Log
   [Col006] [varchar] (255) , -- EventID in Event Viewer Log
   [Col007] [varchar] (255) , -- User in Event Viewer Log
   [Col008] [varchar] (255) , -- Computer in Event Viewer Log
   [Col009] [varchar] (456) ) -- Description in Event Viewer Log

To adjust the resulting SQL Server table (Ap2), which has an anomaly (Col009 is too big and a part of it is introduced in Col001), insert an Execute SQL Task that runs a script (or a procedure) to append the rows into a table Ap2Rez2 with the following design:

Create Table [Ap2Rez2] (
   [IDRow] [int] IDENTITY (1, 1) NOT NULL ,
   [_Date] [datetime] NULL , -- is Col001 + Col002
   [_Source] [varchar] (255) ,
   [_Type] [varchar] (255) ,
   [_Category] [varchar] (255) ,
   [_EventID] [int] NULL ,
   [_User] [varchar] (255) ,
   [_Computer] [varchar] (255) ,
   [_Description] [varchar] (1000) ) -- is Col009 + Col001 just in case

3. Run a scheduled DTS Package to reprocess an incrementally updating OLAP cube made in the following way:
- Cube called EventViewer.
- The Fact Table Source: Ap2Rez2.
- The Measure: EventID with Aggregate Function Count.

The structure of dimensions:
All dimensions are shared and built on dbo.Ap2Rez2; each level lists its Member Key Column and Member Name Column.

Dimension: Time
- Level Year: Key = DatePart(year,"dbo"."Ap2Rez2"."_Date"); Name = DatePart(year,"dbo"."Ap2Rez2"."_Date")
- Level Month: Key = DatePart(month,"dbo"."Ap2Rez2"."_Date"); Name = convert(CHAR, DateName(month,"dbo"."Ap2Rez2"."_Date"))
- Level Day: Key = convert(CHAR,"dbo"."Ap2Rez2"."_Date",112); Name = rtrim(convert(CHAR, DateName(day,"dbo"."Ap2Rez2"."_Date")))+'th'
- Level Hour: Key = DatePart(hour,"dbo"."Ap2Rez2"."_Date"); Name = right(convert(varchar(19),"dbo"."Ap2Rez2"."_Date",0),7)

Dimension: Computer
- Level Computer: Key = "dbo"."Ap2Rez2"."_Computer"; Name = "dbo"."Ap2Rez2"."_Computer"

Dimension: User
- Level User: Key = "dbo"."Ap2Rez2"."_User"; Name = "dbo"."Ap2Rez2"."_User"

Dimension: Type
- Level Type: Key = "dbo"."Ap2Rez2"."_Type"; Name = "dbo"."Ap2Rez2"."_Type"
After reprocessing, we can get a picture of the network activity (shown in the original article as a screenshot of the cube browser).
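Even without browsing the cube, a quick sanity check of the imported data can be run directly against the fact table; something along these lines, purely as an illustration using the Ap2Rez2 columns defined above:

-- Event counts per computer and event type over the last day
SELECT [_Computer], [_Type], COUNT(*) AS EventCount
FROM Ap2Rez2
WHERE [_Date] >= DATEADD(day, -1, GETDATE())
GROUP BY [_Computer], [_Type]
ORDER BY EventCount DESC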
Normally, only the SA (DBA) can browse the data. The DBA can use this cube to see whether the SQL Server activity can be balanced. For example, in replication a Distributor can be put on the Windows server with the lowest activity, or the cube can reveal an unsafe Windows server that could affect the SQL Server databases. I think some questions can be asked about using this cube in a network: 1. How can we build an OLAP cube to see the track of an illegal attack on a network? (I suppose it could be linked to the order of events.) 2. If a whole tool based on the OLAP cube engine can be developed, could it be attached to a new version of the Windows operating system?
Initial Installation of the Production Database Andy Jones 2/4/2003
Introduction Your software has passed all (your only?!) testing phase(s) and it is time to install your database into production. I will outline below how I accomplish this task. This article is concerned with an evolving system i.e. you will perform an initial installation, but subsequent installations may be required for such things as customer change requests (no faults – your testing was perfect!) while retaining all data inserted since the application began use.
Scripts I create all my database objects from scripts and not the other way around. I never use Enterprise Manager (EM) to create a stored procedure then reverse engineer a script. If you perform unit testing against a database where you have created objects via EM, how can you guarantee that your scripts are consistent and that when you install to an external site you won’t introduce a bug? Aside from this, reverse engineering can sometimes produce scripts with ugly formatting which have poor readability. After unit testing the script, we then copy it to Visual SourceSafe (VSS) from where all version control is governed.
Testing Our software has the following testing cycle:

• Unit testing (developer)
• Factory acceptance testing (FAT) (in-house test team)
• Site acceptance testing (SAT) (external test team)
For all test phases after unit testing I perform a full installation. The point is that your testing process is not only testing your software, but its installation too. Again, if you simply start FAT testing against your development database, you can not install to SAT with any confidence in your mechanism (objects missing out of build, necessary look up tables not populated etc…).
Initial installation After developing my first system, using SQL Server, I installed to production by simply restoring a backup of the test database. I now use a Windows command file to perform all installations, following the template from a previous excellent article by Steve Jones (Migrating Objects to Production); the file simply executes multiple scripts using the DOS command OSQL. I will outline below why I believe restoring a backup is the wrong approach.
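A minimal sketch of such a command file; the server parameter, database name and script names below are invented for illustration, and the real file would list every script in your library, in dependency order:

REM install_MyDB.cmd - initial installation of the MyDB database
REM Usage: install_MyDB.cmd <ServerName>
REM -E uses a trusted connection; -b makes osql abort the batch and return an errorlevel on error
osql -S %1 -d master -E -b -i 01_create_database.sql
osql -S %1 -d MyDB -E -b -i 02_create_tables.sql
osql -S %1 -d MyDB -E -b -i 03_create_procedures.sql
osql -S %1 -d MyDB -E -b -i 04_populate_lookup_tables.sql

Because each script is pulled straight from the version-controlled library, the same file can be rerun unchanged for FAT, SAT and production.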
Your library This is the main reason why I use this method. If you install from a backup you cannot guarantee you are installing what is in your library under source code control. What happens if you restore a backup, then for your first patch release you need to change a stored procedure? You check it out, make the change, test then install. Problem is, your script under version control was inconsistent with the version in the database you restored, and you have introduced a bug which causes another part of the system to fail with consequent down time. If you install from your scripts in the first place then test against that, you will eliminate any potential errors like these.
Reproducible You will need to perform the same installation time and again, for the test phases outlined above, and maybe multiple client sites have different versions of the same database. Surely it's better to have one command file which facilitates a completely reproducible build, which could be performed by anyone and has been pre-tested. If multiple people are performing a number of different installations by restoring a backup, can you be sure all installations are identical?
Documentation / Consistency Going back to the example above where you perform an initial installation, the system gets used for a bit, then one stored procedure needs to change following a change note. Presumably most people would perform this patch release by executing the one script into the database (via command file or not) – you cannot restore a backup in this case, as the customer would lose all the data they have entered already. If you had performed the initial release by the restore method, you would now have the situation where your two installations were by two different means. I prefer to have one consistent way to do things, which also makes documenting your procedures simpler if your company/client requires this.
Size of build I have found that, in a lot of cases, all the scripts required to produce the build will fit on a floppy disk, whereas taking a backup to install usually involves burning a CD. Not a great benefit here but it does make your life slightly simpler.
Commenting Using a command file allows you to add comments. This makes traceability better as you can document such things as who produced the build and the reason for it, etc.
Disadvantages The greatest disadvantage involved in this method is the overhead of creating the command file to execute the build. It’s less effort just to restore a backup of your test database straight into production. I believe the benefits outlined above offset this minimal effort which is required.
Conclusion This article outlines the methodology I use to perform my initial database release into production. How does everybody else perform this task? It's always interesting to see how other people do things.
Scheduling SQL Server Traces - Part 2 Rahul Sharma 9/16/2003
This is the second part of the article on how to schedule traces using stored procedures in SQL Server 2000; the previous article covered SQL Server 7.0. SQL Profiler uses system stored procedures to create traces and send the trace output to the appropriate destination. These system stored procedures can be used from within your own applications to create traces manually, instead of using SQL Profiler. This allows you to write custom applications specific to the needs of your enterprise. In SQL Server 2000, server side traces are no longer done using the extended stored procedures (as in SQL Server 7.0) but through system procedures which expose the underlying architecture used to create these traces. You can read more on that in BOL.

In this article, I will walk you through some sample scripts that will illustrate how you can add: a) tracing maintenance stored procedures to your DBA toolkit, and/or b) tracing capabilities to your application.

There are so many events and data columns in SQL Server 2000 that it is sometimes very easy to get lost as to what you really want to trace for a particular scenario. You can maintain trace tables holding the events and data columns for a given trace type, and then at run time select the trace type, which will take in the specified values and create the traces for you. In the scripts below, please note the following:

a) The trace types have been categorized into 7 main categories:
1 Slow running Queries.
2 General Performance.
3 Table/Index Scans.
4 Table/Index Usage.
5 Basic Trace for capturing the SQLs.
6 Locking/Deadlocking Issues.
7 Detailed Performance.

b) The trace table is maintained in the tempdb database. You can change it to be maintained in a user database if you wish. Otherwise, whenever you restart SQL Server, it will need to be re-created in the tempdb database (this can be done with start-up scripts as well).

c) The USP_Trace_Info stored procedure does all the trace work and generates the trace file for a specified trace type. You can specify your filter criteria, trace names, and different parameter values as you would otherwise do through the SQL Profiler GUI tool.

d) After running the scripts shown below, the explanation of the commands and the different options available can be obtained by executing exec usp_trace_info '/?' from Query Analyzer. This will display all the choices that are available to you.

/********************************** Start of Scripts. **********************************/
/*****************************************************************************
Trace_Scenario table: Contains the events and the data columns for the different Trace Scenarios
Creating it in TEMPDB since this is not an application table and we don't want this to hang around...
Can be created in the User Database(s) as well so that even when the service is re-started, it is available or it will need to be re-created every time the service is re-started. If more events and Data-Columns are needed, we can add/modify the values in here without even touching the trace templates. *****************************************************************************/ IF OBJECT_ID('TEMPDB.DBO.Trace_Scenario') IS NOT NULL DROP TABLE TEMPDB.DBO.Trace_Scenario GO /********************************************************************************** ******************* Different Trace Types: Trace_Type Description 1 Slow running Queries. 2 General Performance. 3 Table/Index Scans. 4 Table/Index Usage. 5 Basic Trace for capturing the SQLs. 6 Locking/Deadlocking Issues. 7 Detailed Performance. *********************************************************************************** ******************/ CREATE TABLE tempdb.dbo.Trace_Scenario (Trace_Type int, Trace_Description varchar(50), Events varchar(300), Data_Columns varchar(300), constraint pk_trace_scenario primary key (trace_type)) GO /********************************************************************************** ******************** NOTE: modify these enteries as per the finalized trace events and dala columns *********************************************************************************** ********************/ --Slow running queries insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (1, 'Slow Running Queries', '10,11,12,13,17,51,52,68', '1,2,3,6,8,10,11,12,13,14,15,16,17,18,22,25,26,40') --General Performance insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (2, 'General Performance', '75,76,16,21,22,33,67,69,55,79,80,61,25,27,59,58,14,15,81,17,10,11,34,35,36,37,38,3 9,50,11,12', '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,3 1,32,33,34,35,36,37,38,39,40,41,42,43,44,') --Table/Index Scans insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (3, 'Table/Index Scans', '10,11,12,13,17,51,52,68', '1,2,3,6,8,10,12,13,14,15,16,17,18,20,21,22,25,26,31,40') --Table/Index Usage insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (4, 'Table/Index Usage', '10,11,12,13,17,48,58,68', '1,2,3,6,8,10,12,13,14,15,16,17,18,20,21,22,24,25,26,31,40') --Basic Trace for capturing the SQLs insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (5, 'Basic Trace for capturing the SQLs', '10,11,12,13,16,17,23,24,26,27,33,51,52,55,58,60,61,67,68,69,79,80', '1,2,3,6,8,9,10,11,12,13,14,15,16,17,18,20,21,22,24,25,26,31,32,35,40') --Locking/Deadlocking Issues insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns) values (6, 'Locking/Deadlocking Issues', '10,11,14,15,17,23,24,25,26,27,33,44,45,51,52,59,60,68,79,80', '1,2,3,8,10,12,13,14,15,16,17,18,22,24,25,31,32') --Detailed Performance insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events, Data_Columns) values (7, 'Detailed Performance', '53,75,76,60,92,93,94,95,16,21,22,28,33,67,69,55,79,80,61,25,27,59,58,14,15,81,17,1 0,11,34,35,36,37,38,39,50,11,12,97,98,18,100,41', '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,3 1,32,33,34,35,36,37,38,39,40,41,42,43,44,') GO /********************************************************************************** **************** Stored Procedure: USP_TRACE_INFO *********************************************************************************** ****************/ USE master GO IF OBJECT_ID('TEMPDB.DBO.USP_TRACE_QUEUE') IS NOT NULL DROP TABLE TEMPDB.DBO.USP_TRACE_QUEUE GO IF OBJECT_ID('USP_TRACE_INFO') IS NOT NULL DROP PROC USP_TRACE_INFO GO CREATE PROC USP_TRACE_INFO @OnOff varchar(4)='/?', @file_name sysname=NULL, @TraceName sysname='Sample_Trace', @Options int=2, @MaxFileSize bigint=4000, @StopTime datetime=NULL, @TraceType int=0, @Events varchar(300)= -- Default values '11,13,14,15,16,17,33,42,43,45,55,67,69,79,80', @Cols varchar(300)= -- All columns '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,3 1,32,33,34,35,36,37,38,39,40,41,42,43,44,', @IncludeTextFilter sysname=NULL, @ExcludeTextFilter sysname=NULL, @IncludeObjIdFilter int=NULL, @ExcludeObjIdFilter int=NULL, @IncludeObjNameFilter sysname=NULL, @ExcludeObjNameFilter sysname=NULL, @IncludeHostFilter sysname=NULL, @ExcludeHostFilter sysname='%Query%', @TraceId int = NULL AS BEGIN SET NOCOUNT ON IF @OnOff='/?' GOTO Help SET @OnOff=UPPER(@OnOff) IF (@OnOff='LIST') BEGIN IF (OBJECT_ID('tempdb..USP_trace_Queue') IS NOT NULL) BEGIN IF (@TraceId IS NULL) BEGIN DECLARE tc CURSOR FOR SELECT * FROM tempdb..USP_trace_Queue FOR READ ONLY DECLARE @tid int, @tname varchar(20), @tfile sysname OPEN tc
FETCH tc INTO @tid, @tname, @tfile IF @@ROWCOUNT<>0 BEGIN WHILE @@FETCH_STATUS=0 BEGIN SELECT TraceId, TraceName, TraceFile FROM tempdb..USP_trace_Queue WHERE TraceId=@tid SELECT * FROM ::fn_trace_getinfo(@tid) FETCH tc INTO @tid, @tname, @tfile END END ELSE PRINT 'No traces in the trace queue.' CLOSE tc DEALLOCATE tc END ELSE BEGIN SELECT TraceId, TraceName, TraceFile FROM tempdb..USP_trace_Queue WHERE TraceId=@TraceId SELECT * FROM ::fn_trace_getinfo(@TraceId) END END ELSE PRINT 'No traces to list.' RETURN 0 END -- Declare variables DECLARE @OldQueueHandle int -- Queue handle of currently running trace queue DECLARE @QueueHandle int -- Queue handle for new running trace queue DECLARE @On bit DECLARE @OurObjId int -- Used to keep us out of the trace log DECLARE @OldTraceFile sysname -- File name of running trace DECLARE @res int -- Result var for sp calls SET @On=1 -- Stop the trace if running IF OBJECT_ID('tempdb..USP_trace_Queue') IS NOT NULL BEGIN IF EXISTS(SELECT * FROM tempdb..USP_trace_Queue WHERE TraceName = @TraceName) BEGIN SELECT @OldQueueHandle = TraceId, @OldTraceFile=TraceFile FROM tempdb..USP_trace_Queue WHERE TraceName = @TraceName IF @@ROWCOUNT<>0 BEGIN EXEC sp_trace_setstatus @TraceId=@OldQueueHandle, @status=0 EXEC sp_trace_setstatus @TraceId=@OldQueueHandle, @status=2 PRINT 'Deleted trace queue ' + CAST(@OldQueueHandle AS varchar(20))+'.' PRINT 'The trace output file name is: '+@OldTraceFile+'.trc.' DELETE tempdb..USP_trace_Queue WHERE TraceName = @TraceName END END ELSE PRINT 'No active traces named '+@TraceName+'.' END ELSE PRINT 'No active traces.' IF @OnOff='OFF' RETURN 0 -- We've stopped the trace (if it's running), so exit -- Do some basic param validation IF (@Cols IS NULL) BEGIN RAISERROR('You must specify the columns to trace.',16,10) RETURN -1 END IF ((@TraceType=0) AND (@Events IS NULL)) BEGIN RAISERROR('You must specify either @TraceType or @Events.',16,10) RETURN -1 END -- Append the datetime to the file name to create a new, unique file name. IF @file_name IS NULL
begin SELECT @file_name = 'c:\TEMP\tsqltrace_' + CONVERT(CHAR(8),getdate(),112) + REPLACE(CONVERT(varchar(15),getdate(),114),':','') end else begin SELECT @file_name = 'c:\TEMP\' +@tracename + CONVERT(CHAR(8),getdate(),112) + REPLACE(CONVERT(varchar(15),getdate(),114),':','') end -- Delete the file if it exists DECLARE @cmd varchar(8000) SET @cmd='DEL '+@file_name EXEC master..xp_cmdshell @cmd -- Create the trace queue EXEC @res=sp_trace_create @TraceId=@QueueHandle OUT, @options=@Options, @tracefile=@file_name, @maxfilesize=@MaxFileSize, @stoptime=@StopTime IF @res<>0 BEGIN IF @res=1 PRINT 'Trace not started. Reason: Unknown error.' ELSE IF @res=10 PRINT 'Trace not started. Reason: Invalid options. Returned when options specified are incompatible.' ELSE IF @res=12 PRINT 'Trace not started. Reason: Error creating file. Returned if the file already exists, drive is out of space, or path does not exist.' ELSE IF @res=13 PRINT 'Trace not started. Reason: Out of memory. Returned when there is not enough memory to perform the specified action.' ELSE IF @res=14 PRINT 'Trace not started. Reason: Invalid stop time. Returned when the stop time specified has already happened.' ELSE IF @res=15 PRINT 'Trace not started. Reason: Invalid parameters. Returned when the user supplied incompatible parameters.' RETURN @res END PRINT 'Trace started.' PRINT 'The trace file name is : '+@file_name+'.' select @events = events, @cols = data_columns from tempdb.dbo.Trace_Scenario where trace_type = @tracetype -- Specify the event classes and columns to trace IF @Events IS NOT NULL BEGIN -- Loop through the @Events and @Cols strings, parsing out each event & column number and adding them to the trace definition IF RIGHT(@Events,1)<>',' SET @Events=@Events+',' -- Append a comma to satisfy the loop IF RIGHT(@Cols,1)<>',' SET @Cols=@Cols+',' -- Append a comma to satisfy the loop DECLARE @i int, @j int, @Event int, @Col int, @ColStr varchar(300) SET @i=CHARINDEX(',',@Events) WHILE @i<>0 BEGIN SET @Event=CAST(LEFT(@Events,@i-1) AS int) SET @ColStr=@Cols SET @j=CHARINDEX(',',@ColStr) WHILE @j<>0 BEGIN SET @Col=CAST(LEFT(@ColStr,@j-1) AS int) EXEC sp_trace_setevent @TraceId=@QueueHandle, @eventid=@Event, @columnid=@Col, @on=@On SET @ColStr=SUBSTRING(@ColStr,@j+1,300) SET @j=CHARINDEX(',',@ColStr) END SET @Events=SUBSTRING(@Events,@i+1,300) SET @i=CHARINDEX(',',@Events) END
END

-- Set filters (default values avoid tracing the trace activity itself)
-- Specify other filters like application name etc. by supplying strings to the @IncludeTextFilter/@ExcludeTextFilter parameters, separated by semicolons
SET @OurObjId=OBJECT_ID('master..USP_TRACE_INFO')
EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=1, @logical_operator=0, @comparison_operator=7, @value=N'EXEC% USP_TRACE_INFO%'
IF @ExcludeTextFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=1, @logical_operator=0, @comparison_operator=7, @value=@ExcludeTextFilter
IF @IncludeTextFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=1, @logical_operator=0, @comparison_operator=6, @value=@IncludeTextFilter
IF @IncludeObjIdFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=22, @logical_operator=0, @comparison_operator=0, @value=@IncludeObjIdFilter
EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=22, @logical_operator=0, @comparison_operator=1, @value=@OurObjId
IF @ExcludeObjIdFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=22, @logical_operator=0, @comparison_operator=1, @value=@ExcludeObjIdFilter
IF @IncludeObjNameFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=34, @logical_operator=0, @comparison_operator=6, @value=@IncludeObjNameFilter
IF @ExcludeObjNameFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=34, @logical_operator=0, @comparison_operator=7, @value=@ExcludeObjNameFilter
IF @IncludeHostFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=8, @logical_operator=0, @comparison_operator=6, @value=@IncludeHostFilter
IF @ExcludeHostFilter IS NOT NULL
   EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=8, @logical_operator=0, @comparison_operator=7, @value=@ExcludeHostFilter
-- Turn the trace on EXEC sp_trace_setstatus @TraceId=@QueueHandle, @status=1 -- Record the trace queue handle for subsequent jobs. (This allows us to know how to stop our trace.) IF OBJECT_ID('tempdb..USP_trace_Queue') IS NULL BEGIN CREATE TABLE tempdb..USP_trace_Queue (TraceId int, TraceName varchar(20), TraceFile sysname) INSERT tempdb..USP_trace_Queue VALUES(@QueueHandle, @TraceName, @file_name) END ELSE BEGIN IF EXISTS(SELECT 1 FROM tempdb..USP_trace_Queue WHERE TraceName = @TraceName) BEGIN UPDATE tempdb..USP_trace_Queue SET TraceId = @QueueHandle, TraceFile=@file_name WHERE TraceName = @TraceName END ELSE BEGIN INSERT tempdb..USP_trace_Queue VALUES(@QueueHandle, @TraceName, @file_name) END END RETURN 0 Help: PRINT 'USP_TRACE_INFO -- Starts/stops a Profiler-like trace using Transact-SQL server side stored procedures.' DECLARE @crlf char(2), @tabc char(1) SET @crlf=char(13)+char(10) SET @tabc=char(9) PRINT @crlf+'Parameters:'
PRINT @crlf+@tabc+'@OnOff varchar(3) default: /? -- Help' PRINT @crlf+@tabc+'@file_name sysname default: c:\temp\YYYYMMDDhhmissmmm.trc -Specifies the trace file name (SQL Server always appends .trc extension)' PRINT @crlf+@tabc+'@TraceName sysname default: tsqltrace -- Specifies the name of the trace' PRINT @crlf+@tabc+'@TraceType int default: 0 -- Specifies the type of trace to run (obtained from the Trace table: tempdb.dbo.Trace_Scenario)' PRINT @crlf+@tabc+'@Options int default: 2 (TRACE_FILE_ROLLOVER)' PRINT @crlf+@tabc+'@MaxFileSize bigint default: 4000 MB' PRINT @crlf+@tabc+'@StopTime datetime default: NULL' PRINT @crlf+@tabc+'@Events varchar(300) default: SP-related events and errors/warnings -- Comma-delimited list specifying the events numbers to trace. (Obtained from the Trace table: tempdb.dbo.Trace_Scenario)' PRINT @crlf+@tabc+'@Cols varchar(300) default: All columns -- Comma-delimited list specifying the column numbers to trace. (obtained from the Trace table: tempdb.dbo.Trace_Scenario)' PRINT @crlf+@tabc+'@IncludeTextFilter sysname default: NULL -- String mask specifying what TextData strings to include in the trace' PRINT @crlf+@tabc+'@ExcludeTextFilter sysname default: NULL -- String mask specifying what TextData strings to filter out of the trace' PRINT @crlf+@tabc+'@IncludeObjIdFilter sysname default: NULL -- Specifies the id of an object to target with the trace' PRINT @crlf+@tabc+'@ExcludeObjIdFilter sysname default: NULL -- Specifies the id of an object to exclude from the trace' PRINT @crlf+@tabc+'@TraceId int default: NULL -- Specified the id of the trace to list when you specify the LIST option to @OnOff' PRINT @crlf+'Examples: ' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO -- Displays this help text' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'' -- Starts a trace' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''OFF'' -- Stops a trace' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'', @file_name=''E:\log\mytrace'' -Starts a trace with the specified file name' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@Events=''37,43'' -- Starts a trace the traps the specified event classes' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@Cols=''1,2,3'' -- Starts a trace that includes the specified columns' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@IncludeTextFilter=''EXEC% FooProc%'' -- Starts a trace that includes events matching the specified TextData mask' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@tracename=''Receiving_50_Ctns'' -- Starts a trace using the specified name' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''OFF'',@tracename=''Receiving_50_Ctns'' -- Stops a trace with the specified name' PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@file_name = ''E:\log\mytrace'', -- Starts a trace with the specified parameters' PRINT @tabc+@tabc+'@TraceName = ''Receiving_50_Ctns'',' PRINT @tabc+@tabc+'@Options = 2, ' PRINT @tabc+@tabc+'@TraceType = 0,' PRINT @tabc+@tabc+'@MaxFileSize = 500,' PRINT @tabc+@tabc+'@StopTime = NULL, ' PRINT @tabc+@tabc+'@Events = ''10,11,14,15,16,17,27,37,40,41,55,58,67,69,79,80,98'',' PRINT @tabc+@tabc+'@Cols = DEFAULT,' PRINT @tabc+@tabc+'@IncludeTextFilter = NULL,' PRINT @tabc+@tabc+'@IncludeObjIdFilter = NULL,' PRINT @tabc+@tabc+'@ExcludeObjIdFilter = NULL' PRINT @crlf+@tabc+'To list all the traces currently running:' PRINT @crlf+@tabc+@tabc+'USP_TRACE_INFO ''LIST''' PRINT @crlf+@tabc+'To list information about a particular trace:' PRINT @crlf+@tabc+@tabc+'USP_TRACE_INFO ''LIST'', @TraceId=n -- where n is the trace ID you want to list' PRINT @crlf+@tabc+'To 
stop a specific trace, supply the @TraceName parameter when you call USP_TRACE_INFO ''OFF''.' RETURN 0
SET NOCOUNT OFF END GO /********************************** End of Scripts. **********************************/ Using these scripts, you can add tracing maintenance stored procedures to your DBA toolkit. And in case you wish to add tracing capabilities in your application, then you can pass a filter value for the spid for which you want to trace. You can find out the calling application program’s SPID by using @@SPID and pass that in as one of the filter values to the stored procedure so that the trace is generated for the activity done by that spid only. When you do that though, also make sure that you provide means of switching the traces off as well, by making another call to the stored procedure after the trace has been generated. So, these will be the steps in that case: a) Call the stored procedure with the appropriate trace type, passing in the @@SPID value as the filter. b) Do the application steps which will get traced and a trace file will be generated. c) Call the stored procedure again to turn off the trace. Using this approach you can easily achieve an effective way of tracing events and their data columns for different trace scenarios and add rich functionality of run-time traces to your application as well.
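Putting the toolkit to work, a typical session might look like this; the trace name is arbitrary and the parameters come from the procedure's own help text:

-- Start a 'Slow Running Queries' trace (trace type 1 from tempdb.dbo.Trace_Scenario)
EXEC USP_TRACE_INFO 'ON', @TraceName = 'SlowQueries', @TraceType = 1, @MaxFileSize = 500
GO
-- List the traces that are currently running
EXEC USP_TRACE_INFO 'LIST'
GO
-- Stop the trace by name when finished; the resulting .trc file can then be opened in Profiler
EXEC USP_TRACE_INFO 'OFF', @TraceName = 'SlowQueries'
GO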
SQL Server Upgrade Recommendations and Best Practices - Part 1 Jeremy Kadlec 3/3/2003
Part 1 – Upgrade Overview and Project Planning

This article is the first of a multi-part series detailing the SQL Server Upgrade process from the technical, logistical and business perspective. In the coming weeks, expanded articles will be published in the following areas:

• Part 1 – Upgrade Overview and Project Planning
• SQL Server 6.5 and 7.0 Critical Upgrade Decisions and Redundant Upgrade Architecture
• SQL Server 6.5 and 7.0 Upgrade Checklist and Application Access
• Upgrades to SQL Server 2000
• Upgrade from SQL Server 2000 to Yukon
• Sybase, Oracle and Data Upgrades to SQL Server 2000
• Post SQL Server 2000 Upgrade Recommendations
Introduction – SQL Server Upgrades As the DBA in your organization, you are central to the success of the SQL Server environment. In the case of a system upgrade, you need to act as a 'driver' for an upgrade project to ensure success based on your technical expertise and role in the organization. Over this multi-part series, these articles will outline proven and recommended best practices for the upgrade process. This process is detailed from both technical and logistical perspectives, which are both critical to the success of the project. Needless to say, upgrading to SQL Server 2000 can be a daunting task based on the criticality of the systems, level of coordination and technical planning. As such, the series of articles will provide valuable explanations, charts and graphics to best illustrate the points to assist you in the project. With this being said, be prepared to work with new team members, wear new hats and resolve challenging issues in the course of upgrading to SQL Server 2000. The motivation for this article is the realization that in many companies applications are in place, but the right tool for the job is not being leveraged. Too often, piece-meal applications are supporting business critical functions that cannot be leveraged to save time nor generate revenue. To further elaborate:

• Companies are still running SQL Server 6.5 and limping along by having IT staff spend hours resolving server down, corruption and data integrity problems with minimal user productivity
• Microsoft Access has grown from a desktop database to a department of users that are severely stressing the database, ultimately leading to corruption and frustration
• 3rd party applications need to be upgraded in order to leverage new functionality released by the vendor and needed for the business
• Microsoft Excel is being used to run business critical functions, and important data is scattered across the organization and is sometimes mistakenly lost
The bottom-line contribution by the DBAs to the business is to improve efficiency and accuracy for the user community, as well as to save time and money for the business. The DBAs win by being able to focus on more challenging IT projects on the latest and greatest technology. I am sure you can agree this is a WIN-WIN scenario for everyone involved.
Business Justification - SQL Server 2000 Upgrade For those companies that have not migrated existing servers to SQL Server 2000, the rewards certainly outweigh the effort. The level of effort may be moderate to high, but the overall platform stability and feature-rich capabilities of SQL Server 2000 are unprecedented. As a DBA, your ultimate responsibility is to ensure your systems are available to support the business needs, including the proper platform to efficiently and accurately process transactions in a cost effective manner. The table below outlines the business justification for leveraging SQL Server 2000.
BUSINESS JUSTIFICATION

1. Total Cost of Ownership (3): Total Cost of Ownership (TCO) lower than any other DBMS in the market.

2. System Performance (3): Unprecedented system performance for both OLTP and OLAP environments; improved ability to scale up and out by leveraging expanded hardware resources (as much as 64 GB of memory and 32 processors).

3. Microsoft Support: As SQL Server 6.5 ages, Microsoft is providing less support for the product and will eventually have few Support Engineers available to address critical needs. Currently, if you have a business critical issue with SQL Server 6.5, the typical Microsoft Support recommendation is to 'Upgrade to SQL Server 2000'.

4. Regulated Industry Requirements: Upgrading to SQL Server 2000 becomes especially important for companies in regulated industries that may require a several year data retention period.

5. DBA Support: Relying on SQL Server 6.5 for the short term may not be an issue because staff is familiar with the technology. In five years, finding individuals to administer SQL Server 6.5 will be difficult and not attractive to DBAs, who are typically interested in the latest and greatest technologies.

6. Level of Automation: The level of automation from the SQL Server tool set: Enterprise Manager, Query Analyzer, Profiler, Data Transformation Services (DTS).

7. New Capabilities (2): Analysis Services, DTS, XML Integration, Optimizer Enhancements, Functions, DBCCs, Log Shipping, New Replication Models, Full Text Indexing, Database Recovery Models, Linked Servers.

8. Third Party Products: SQL LiteSpeed (compressed and encrypted backups, www.sqllitespeed.com (1)); Lumigent Entegra (enterprise auditing solution, www.lumigent.com/products/entegra/entegra.htm); Lumigent Log Explorer (review and roll back database transactions, www.lumigent.com/products/le_sql/le_sql.htm); Precise Indepth for SQL Server (performance tuning, www.precise.com/Products/Indepth/SQLServer/); NetIQ SQL Management Suite (enterprise monitoring and alerting, www.netiq.com/products/sql/default.asp).
Building the Upgrade Project Plan An Upgrade project that is critical to the business requires project planning in order to efficiently and accurately complete the project. Due to the sheer size of the project and the number of individuals involved, completing the project properly becomes more of a challenge. Although this can be challenging, as the DBA you are the cornerstone of the SQL Server environment. You can take on this project to benefit the company and showcase your skills to demonstrate that you can take on more responsibility. In order to break down the SQL Server Upgrade project, a DBA must:

• Identify the major project phases (1)
• Expand the project phases to granular tasks in the proper sequence (1)
• Determine time frame and responsibility per task (1)
• Incorporate meetings, sign-off and hyperlinks to existing information into the plan (1)
• Leverage a Project Management tool like Microsoft Project 2002 – for more information refer to http://www.microsoft.com/office/project/default.asp
The next section of the article provides a fundamental outline of the Upgrade Project Phases for the SQL Server 2000 project which can serve as a starting point for the Project Plan.
Upgrade Project Phases In order to properly address the SQL Server 2000 Upgrade, it is necessary to set up a project plan with the necessary components for your environment. Below is a set of recommendations for the upgrade project plan; each phase can be further broken down with dates and time frames.
UPGRADE PROJECT PHASES (1)

Phase 1 – Requirements Analysis
• Set up a comprehensive Project Plan with tasks granular enough to assign to a single individual on the project
• Hold a Kick-Off Meeting to properly start the project
• Determine the Upgrade Date and Time with the associated downtime
• Determine the Upgrade Freeze and Thaw Dates for testing purposes
• Set up Roles and Responsibilities in order to establish project accountability
• Submit a Change Management Request to notify key players in the corporation
• Determine SQL Server hardware needs via capacity planning (disks, memory, processors, etc.)

Phase 2 – Design and Development
• Sign-Off – Requirements Analysis
• Build an Upgrade Checklist to determine time frames and proposed processes to complete the Upgrade
• Test the Upgrade Checklist and verify the results
• Communicate the process to the team, especially in terms of configurations

Phase 3 – Functional, Integration, End User and Load Testing
• Sign-Off – Upgrade Methodology
• Set up a Test Environment to include the necessary SQL, Middle Tier and Web Servers as well as a Client PC; these machines should be configured as closely as possible to the Production Environment to ensure project success
• Implement a Load Testing Tool
• Build Test Plans for Functional, Integration, End User and Load Testing
• Complete Functional, Integration, End User and Load Testing
• Manage the Testing Exceptions until Completion for the Upgrade
• Determine if Front End or T-SQL code must be applied prior to or following the upgrade in order to determine the code roll-out coordination
• Update the previously submitted Change Management request based on Testing results

Phase 4 – Production Hardware Setup
• Sign-Off – Testing
• Server Assembly as well as Windows and SQL Server 2000 Installation
• Configure, set up and burn-in the new hardware

Phase 5 – Upgrade
• Sign-Off – Production Hardware
• GO | NO GO Meeting
• Execute the Upgrade Checklist
• Sign-Off – SQL Server 2000 Upgrade
• Monitor SQL Server Performance
• Sign-Off – SQL Server 2000 Upgrade
Part 2 – Critical Upgrade Decisions and Redundant Upgrade Architecture In the coming weeks, the next article in the series will detail the Critical Upgrade Decisions related to ANSI NULLS, Quoted Identifiers, etc., as well as a valuable Redundant Upgrade Architecture for the project. These decisions can make or break the upgrade and require forethought at the inception of the project. Further, find out
how to prevent management’s biggest fear during systems upgrades with a redundant architecture. Be sure to check it out!
Who Needs Change Management? Greg Robidoux 1/16/2003
The Story You’ve spent thousands of dollars on that cool technology: clustering, redundant controllers, redundant disks, redundant power supplies, redundant NIC cards, multiple network drops, fancy tape backup devices and the latest and greatest tape technology. You’re all set. There’s no way you’re going to have downtime. But one day something does go wrong. Is it the disks? No way, you’ve implemented RAID with hot swappable hard drives. Is it the server? Can’t be, you’ve got your servers clustered and any downtime due to failover would be so small that hardly anyone would even notice it. Well, if it’s not your stuff it must be the network. Those guys are always making changes and not letting you know about it until there’s a problem. No, checked with them, no changes. What’s left? It must be the application. Once again that application group rolled out a new version and no one informed the DBAs. Once again foiled, the application team says no changes went out.

A little investigation on the database and you’ve noticed that some of the create dates on a few stored procedures have yesterday’s date. You guess they could have been recompiled, but you didn’t do it. You take a look at one of the procedures and, lo and behold, someone was kind enough to actually put in comments. It was Dave, one of the DBAs in your group, and guess what? He’s on vacation this week. It turns out he created a scheduled job to run this past Sunday and forgot to let you know about it. He changed a bunch of procedures for a load that occurs each month. You don’t know much about the load except that it runs and he is responsible. You have no idea why the change was made nor, if you undo the change, what effect it might have. To make things worse you try to find the old code, but you can’t find it anywhere.

The heat starts to rise as the phones keep ringing with people wondering why the database is not responding. You start to panic trying to figure out what to do. This is a pretty critical database and a lot of people in your company need it to do their job. The last time something like this happened you caught hell from your boss, because her boss was breathing down her neck. Now you wish you were the one on vacation instead of Dave.

You take a deep breath and think things through. You remember Dave sent you an email about the load, around this time last year when he went on vacation. You quickly do a search and you find the email. The email gives you steps on how to undo the load and what, if any, consequences you may face by undoing things. You go to a previous night’s backup, do a database restore and script out the procedures. You’re taking a gamble that you’ve got the right procedures, but that’s your only course of action. After five or six hours of user complaints and a lot of sweating you’ve got the database back to normal again, or at least you think so. You say to yourself, “I need a vacation and Dave’s dead meat when he gets back.”
The Solution Have you ever found yourself in this situation? Something gets changed and you don’t know about it until there’s a problem. Or someone makes a change and says “Don’t worry, it’s a small change. No one will even notice.” I think we have all found ourselves in these situations. The only way to fix things like this is to bolt down your servers and make the users sign their life away if they want to use your server. Not too likely, but it’ll work if you could get it implemented. I think we need to look for a solution in the middle of the road. Something that works for you as a DBA and something that works for the rest of the world. People just don’t understand how stressful your job really is. You’re the bottom of the totem pole, well maybe the NT Engineers are the bottom, but still you’re pretty close. All types of changes occur outside of your control and the only time you are aware is when something goes wrong. Well you might not be able to fix the things outside of your control, but you are the DBA, the master of the databases. In order to implement change control company-wide it takes a lot of effort, coordination, and buy-in
from a lot of people. But that doesn’t mean you can’t start with your own domain and put control mechanisms in place for the databases. So where do you begin?
Start Simple For most changes to take effect and have a quick payback, implementing things slow and steady is the way to go. Identify a critical database, kind of like the one Dave screwed up, and start to create guidelines around how you would handle changes for this database. If you try to make a master plan to solve all of the problems, you will never get it implemented. Create a small project plan or a task list of things you need to accomplish and take care of one item at a time. If something doesn’t work, adjust the plan accordingly.
Evaluate and tighten database security There are probably other reasons why you want tight database security, but have you ever thought about it from a change control perspective? The greatest process in the world won’t stop people from sneaking things into production. You might be able to catch it and find out who did it, such as the change that DBA Dave made above, but at that point it’s too late. Take a look at your security settings and determine the ways people might be able to make changes to your database that they shouldn’t. Start with the server roles and see who has access that really shouldn’t have it. Then take a look at the database roles. If you are using the standard database roles, see how you can create customized roles with limited access. Usually when things don’t work due to security access, the fix is to grant too much access. Now it’s your turn to reverse the tide.
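As a starting point, the sketch below (assuming the standard SQL Server 2000 system procedures; the database name is hypothetical) lists who currently holds the most powerful server and database roles:

-- Members of the sysadmin fixed server role
EXEC sp_helpsrvrolemember 'sysadmin'
GO
-- Role membership in the database you are tightening
USE CriticalDB   -- hypothetical database name
EXEC sp_helprolemember 'db_owner'
EXEC sp_helprolemember 'db_ddladmin'
-- All users and their role memberships
EXEC sp_helpuser
GO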
Establish a release schedule Instead of making production changes on the fly, make changes on a periodic, controlled basis. If people know that changes are made on a schedule you set, they can adjust their own schedules accordingly. It doesn’t mean that you can never put changes into production outside of this schedule; it just means that somebody had better have a really good reason why they need something immediately instead of waiting for the scheduled release. This may sound like it will slow down the development process, and you know your users need those changes right away, but having a more controlled approach will actually benefit your end users as well. The users will know what’s coming and when it’s coming, instead of finding out about a change after something that used to work no longer works.
Document and communicate It is very important to let people know what you have done or what you are planning on doing. Verbal communication is great, but having things written down is even better. Generally, when you tell someone something, they hear what they want to hear and not the whole story. Giving them documentation allows them to review later what you have given them. I know documentation is probably not your favorite thing and it takes way too much time. It’s true, documentation does take time, but this is not because it is hard to do; it’s because most people put it off until the very end. At that point, instead of just having to fine-tune your documentation you have a mammoth task in front of you. Start small and document throughout the entire process instead of waiting until the very end. Also, make sure the documentation is meaningful to others. Have someone else take a look and provide feedback. Let them know the documentation is more for their benefit than yours, since you’re the one that implemented the change.
Define roles and guidelines In order for a change to be effective, people need to know the rules. You can’t just assume that since you think it is a great idea or that the process should work this way, others are going to feel the same way. Have you ever been involved in a company reorg? If so, I bet the one thing that you wanted to know was how it was going to affect you. Well this is kind of the same thing, just on a smaller scale. Define who can do what, your documentation needs, testing, signoffs, handoffs, etc… The better the roles are established the easier it will be to determine who is responsible and who needs to fix the problem if something goes wrong.
Always have a back out plan Whenever you move objects and configurations into production, always make sure you have a way to back out your changes, even if it is something as small as adding or changing one stored procedure. Your back out plan can range from something as simple as restoring a backup to complex back out scripts and a coordinated effort with other key groups (i.e. Applications, NT Engineering, Networking). The back out plan is just as important as your roll out plan. You may never have to use it for an implementation, but the one time you do you’ll be glad you took the time to put it together.
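One low-tech way to guarantee a back out path for code objects is to archive the current definition of anything you are about to replace. The sketch below assumes the SQL Server 2000 system catalog; the archive table and procedure name are hypothetical:

-- Release-archive table; long procedures span several syscomments rows,
-- so colid is kept to preserve their order.
CREATE TABLE dbo.ReleaseArchive (
   ArchiveID  int IDENTITY(1,1) NOT NULL,
   ObjectName sysname NOT NULL,
   ColId      smallint NOT NULL,
   ObjectText nvarchar(4000) NULL,
   ArchivedAt datetime NOT NULL DEFAULT (getdate())
)
GO
-- Capture the text of the procedure about to be replaced (hypothetical name)
INSERT INTO dbo.ReleaseArchive (ObjectName, ColId, ObjectText)
SELECT o.name, c.colid, c.text
FROM sysobjects o
JOIN syscomments c ON c.id = o.id
WHERE o.name = 'usp_MonthlyLoad' AND o.type = 'P'
GO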
Create a repeatable process Create a process that you can use over and over again for either the same application upgrade or for future projects. Take a look at all of the documents, processes, emails, etc… that you have used in the past and create a set of reusable documents that can be used for future projects. If you can recycle and reuse what you have already put together, it will simplify and streamline your procedures.
You start and then involve others If you really want to have more control over changes to your database, you need to first look at what you can do to get this done. You can’t keep blaming all those Developers or Users if you haven’t set the guidelines. After you have done your homework, then you can start to involve others. Take a look at the things that you have control over or the things you can bring to the surface that someone has to be aware of and manage. Face it, if you don’t do it, who will?
Convince others As a DBA this probably sounds good to you, but how do you convince others that will have to change the way they do things? Good question!
• Past mistakes – Take a look at past mistakes and how a process like this will keep the same issues from happening again.
• Management – Find someone above you that will champion the cause and take it to the next level.
• Find others that want a change – Find other people that see there has to be a better way and get them to join you in your effort.
• Collaborate – Talk to other people in an informal manner. Get feedback from them, so you can address their concerns early in the process. This is a sure way to get them to feel like part of the solution instead of part of the problem.
Summary It may seem like a daunting task to put a change management process in place for your environment, but it’s not impossible. Many companies are already doing it through years of trial and error. The only way to get a better process in place is to start making changes to your existing process. You might need to just formalize what you already have or you may need a complete overhaul. Whatever it may be, start small and think big.
References
Change Management for SQL Server, presentation by Greg Robidoux. Published 01.10.2003 – Greg Robidoux – Edgewood Solutions. All rights reserved 2003.
Who Owns That Database? Steve Jones 3/4/2003
Introduction Recently I was testing a new backup product, SQL Litespeed, and while setting up a stored procedure to back up my databases I ran into an interesting issue. Whenever I ran this statement:

insert #MyTempTable exec sp_helpdb

I received an error about not being able to insert a null value into the table. This was slightly confusing as I'd declared the temp table to accept null values. So I decided to dig in a bit further and execute only the sp_helpdb section. Surprisingly enough, this returned the same error: unable to insert a NULL value, column does not allow NULL values. Specifically the owner column. Hmmmmmm. After a moment of sitting there with a puzzled look on my face, I executed a simple:

select * from master.dbo.sysdatabases

to see where I might be having an issue. No NULLs in the sid column, which should map to the database owner. Now I was more annoyed than confused and needed to dig in further. Fortunately Microsoft allows us to examine the code behind the system stored procedures. Actually, maybe they couldn't figure out a way to protect that code either (since no one I know has), so you can look at it. In Query Analyzer, I browsed to the master database and selected the sp_helpdb procedure from the Object Browser. A quick right click and a "script as create" gave me the text of the procedure. Very quickly I zeroed in on this section of the code:

/*
** Initialize #spdbdesc from sysdatabases
*/
insert into #spdbdesc (dbname, owner, created, dbid, cmptlevel)
   select name, suser_sname(sid), convert (nvarchar(11), crdate), dbid, cmptlevel
   from master.dbo.sysdatabases
   where (@dbname is null or name = @dbname)

Since this is the area where the temp table is being populated and there is a column called "owner", I was guessing this was the problem area. And I was right. If I ran just the SELECT portion of this insert, I found that the owner column did indeed return a NULL value: for model and msdb! That didn't seem right. Check master: that has "sa" as the owner. Check a couple of other SQL 2000 servers and they have "sa" as the owners. There is a SID in this column, so what is the issue? The issue is the suser_sname() function, which returns the name from the domain controller for this SID. In my case, however, the domain account was deleted, so there is no matching SID. As a result, the function returns NULL. OK, kind of interesting. How do I fix this? Well, it turns out to be very simple, though not as simple as I expected. My first thought was to use sp_changedbowner to change the owner to "sa". This procedure is run from the database and takes the name of an account to change to. No big deal, give it a try. It runs and returns:

Server: Msg 15110, Level 16, State 1, Procedure sp_changedbowner, Line 46
The proposed new database owner is already a user in the database.

Not exactly what I was hoping for. A quick dig through Books Online confirmed that this is expected and either there is no easy workaround, or I'm not very good at searching Books Online. I'll let you decide which is more likely. I suppose I could have dropped sa, which is mapped to dbo for the databases, but that seemed risky to me and I wasn't really looking for a server outage to fix this little annoyance. Instead I decided to try a simple, albeit probably not always recommended, technique. I know that the SID for the "sa" account is always 0x01, and I know that I can run a simple command that will allow me to update the system tables.
My first test was on model because, well, it's not a critical database, and I know I can always grab model from another server and attach it here. I ran:
sp_configure 'allow updates', 1
reconfigure with override

update sysdatabases set sid = 0x01 where name = 'model'

sp_configure 'allow updates', 0
reconfigure with override

and it worked. A refresh in Enterprise Manager confirmed that the owner of "model" was now "sa". I repeated this for "msdb", and then sp_helpdb, my procedure, and my backups all started running!
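If you suspect other servers have the same problem, a small query along these lines (a sketch, relying on the same suser_sname() behavior described above) lists the databases whose owner SID no longer resolves to a login:

-- Databases whose owner SID cannot be resolved to a name
select name as DatabaseName, suser_sname(sid) as Owner
from master.dbo.sysdatabases
where suser_sname(sid) is null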
Conclusions Whew this was a fun day. Actually, the total time was only about an hour, but it is amazing how one little thing leads to another and you end up fixing some problems that you didn't know you had. Hopefully I've entertained you for a few minutes at my expense and stuck a thought in your head that you'll remember when you find a database with no owner. In the days of detach and attach, I'm sure you'll encounter this at some point. I welcome your comments, and look forward to reading them. Steve Jones ©dkRanch.net February 2003
DTS, ETL, and BI
The acronym chapter, dealing with the movement of data and large-scale analysis, a world most SQL Server DBAs deal with relatively little. DTS, Data Transformation Services, first introduced in SQL Server 7.0, changed the way the ETL (Extraction, Transformation, and Load) industry built products forever. Once exclusively for large enterprises with big pockets, ETL moved down to every desktop developer and administrator working with SQL Server, a radical shift for the entire BI industry. BI, Business Intelligence, usually deals with OLAP, Analysis Services, cubes, and various other mechanisms for examining large amounts of data and drawing conclusions.
Auditing DTS Packages – Haidong Ji – 52
Automate DTS Logging – Haidong Ji – 54
Building Business Intelligence Data Warehouses – Tom Osoba – 56
Comparison of Business Intelligence Strategies between SQL and Oracle – Dinesh Priyankara – 58
Portable DTS Packages – Kevin Feit – 61
Replacing BCP with SQLBulkLoad – Stefan Popovski – 64
Auditing DTS Packages Haidong Ji 10/6/2003
I received quite a few emails from readers who read my article on managing and scheduling DTS packages. Many readers like the idea of using a SQL Server login so that all packages are owned by the same account; that way a group working on the same project can edit each other's packages. However, many also asked about auditing. Sometimes we want to know which person in a group that shares the same SQL login edited a package. I think this can be useful in some situations. In this article, I will show you how we can create an audit trail for a DTS package. This method can provide information about who modified a package, when, and from what workstation. It is not only good for auditing DTS package changes; it can also be adapted to audit changes to other tables. In a lot of the databases we manage, we all know that some tables are more important than others, such as tables holding salaries, social security numbers, etc. With very little modification, you can use this method to audit changes to those tables as well.
Background information When we are searching for ways to do auditing, we naturally turn to triggers. Triggers are a special class of stored procedure defined to execute automatically when an UPDATE, INSERT, or DELETE statement is issued against a table or view. Triggers are powerful tools that sites can use to enforce their business rules automatically when data is modified. We will use triggers to create an audit trail for DTS packages. As most of you know, DTS packages are stored in the msdb database. Whenever a DTS package is created, a new record is created in the sysdtspackages table. Likewise, when a package is modified, a new record is inserted as well. The only difference is that this updated record retains the id of the package and generates a new versionid. Therefore we will just use an INSERT trigger. SQL Server automatically saves the old version of a package when it is updated and saved. This gives you great flexibility to go back to the old version if needed. Since SQL Server keeps old versions of a package, there is no need for us to keep the before-and-after states of the packages. Therefore, to keep track of who made modifications, at what time and from where, we need to use data stored in the sysprocesses table in the master database. Among the many columns in the table, the following are of particular interest: spid (SQL Server process ID), hostname (name of the workstation), program_name (name of the application program), cmd (command being executed, though not the full SQL statement), nt_domain (Windows domain for the client), nt_username (Windows user name), net_address (assigned unique identifier for the network interface card on each user's workstation), and loginame (SQL or Windows login name). How do we get that information from sysprocesses, you may ask? Fortunately, SQL Server provides the global variable @@SPID. Based on @@SPID, we can find out who-is-doing-what-when-from-where from the sysprocesses table.
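To see exactly what the trigger will capture for your own connection, you can run this small sketch of the lookup the trigger performs:

-- Who-is-doing-what-from-where for the current connection
select spid, hostname, program_name, cmd, nt_domain, nt_username, net_address, loginame
from master..sysprocesses
where spid = @@SPID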
Parts, pieces, and putting it all together First off, we need to create the audit trail table. You might be tempted to create this table in msdb. I personally am not fond of that idea. I usually leave all the system databases (master, msdb, model, and tempdb) alone. For one thing, it is not a good idea to mess around with system databases. For another, SQL Server patches, service packs, and hot fixes frequently update and modify system databases. You might lose anything you developed in those databases as a result. I have a database called DBA on every server that I manage. To the extent possible, all my admin-related activities, tables, and stored procedures are stored in this database. The DTS package audit table is no exception. However, INSERT authority needs to be granted to the login that makes DTS changes; otherwise you will get an error. Below is the script for the audit table.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[DTSAuditTrail]')
   and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[DTSAuditTrail]
GO
CREATE TABLE [dbo].[DTSAuditTrail] (
   [DTSAuditID] [int] IDENTITY (1, 1) NOT NULL,
   [PackageName] [varchar] (100) NULL,
   [spid] [smallint] NULL,
   [dbid] [smallint] NULL,
   [status] [char] (60) NULL,
   [hostname] [char] (256) NULL,
   [program_name] [char] (256) NULL,
   [cmd] [char] (32) NULL,
   [nt_domain] [char] (256) NULL,
   [nt_username] [char] (256) NOT NULL,
   [net_address] [char] (24) NULL,
   [loginame] [char] (256) NULL,
   [AuditTime] [smalldatetime] NOT NULL
)
GO
ALTER TABLE [dbo].[DTSAuditTrail] WITH NOCHECK
   ADD CONSTRAINT [DF_AuditTrail_AuditTime] DEFAULT (getdate()) FOR [AuditTime]
GO

As you can see, this table has an identity field, a package name field, and a timestamp field. The rest are data from sysprocesses. Below is the trigger script for the sysdtspackages table within the msdb database. One technique I want to highlight is the use of the SCOPE_IDENTITY() function. SCOPE_IDENTITY() returns the last IDENTITY value inserted into an IDENTITY column in the same scope. Since the audit table has an identity field, I use that to update the record with the package name. Using SCOPE_IDENTITY() makes the code look cleaner and simpler, and we save a few lines of code. Please note that, in my case, the audit table is in the DBA database. As I said earlier, you can put this table into the msdb database; in that case, a slight modification of this trigger is needed. In any case, you want to make sure that the ID that modifies DTS packages has INSERT authority on the newly created audit table.

CREATE TRIGGER [PackageChangeAudit] ON [dbo].[sysdtspackages]
FOR INSERT
AS
--Declare a variable for package name
declare @PackageName varchar(100)

--Insert values into the audit table. These values come from master..sysprocesses
--based on @@SPID
insert into dba..DTSAuditTrail (cmd, dbid, hostname, net_address, nt_domain,
   nt_username, program_name, spid, status, loginame)
select cmd, dbid, hostname, net_address, nt_domain, nt_username, program_name,
   spid, status, loginame
from master..sysprocesses
where spid = @@SPID

--Get the package name
select @PackageName = name from inserted

--Update audit table with package name. Note the SCOPE_IDENTITY() function is
--used here to make the code cleaner
update dba..DTSAuditTrail
set PackageName = @PackageName
where dba..DTSAuditTrail.DTSAuditID = SCOPE_IDENTITY()

After the trigger is created, all modifications will be recorded in the audit table. You can search that table by package name, login ID, workstation name, and timestamp. Hopefully it can give you a good picture of the changes made to the packages you manage.
Conclusion In this article, I showed you how to use @@SPID to create an audit trail for DTS packages. If you are interested in DTS automation, such as automating DTS logging, you can read the article I wrote last month.
Automate DTS Logging Haidong Ji 9/9/2003
Many DTS packages are written by developers who may not know much about SQL and/or SQL Server. With the popularity of DTS as an ETL tool increasing every day, many SQL Server DBAs are called in to debug and troubleshoot DTS packages that were poorly written and organized. One important tool to help with this is DTS logging. A DBA can use the package log to troubleshoot problems that occurred during the execution of a DTS package. The DTS package log, unlike the SQL Server error log and the DTS exception log, contains information about the success or failure of each step in a package and can help determine the step at which a package failure occurred. Each time a package executes, execution information is appended to the package log, which is stored in msdb tables in SQL Server or in SQL Server Meta Data Services. You can save package logs on any server running an instance of SQL Server 2000. If a package log does not exist, it will be created when a package is run. An executing package writes information to the package log about all steps in the package, whether or not an individual step runs. If a step runs, the log will retain the start and end times and the step execution time. For steps that do not run, the log lists the steps and notes that they were not executed. In addition, with the proliferation of packages on a server or servers, you can use the DTS logging records to determine which packages are no longer used and get rid of orphaned packages; I'll probably cover that in a different article. You can turn on DTS logging manually. However, it can be tedious and time-consuming, especially if you have many packages to manage. In this article, I will show you how to turn on DTS logging programmatically. Package logging is only available on servers running an instance of SQL Server 2000, so this article only applies to SQL Server 2000.
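Once logging is turned on, the log can be read straight from msdb. This is a minimal sketch, assuming the SQL Server 2000 log tables sysdtspackagelog and sysdtssteplog (SELECT * is used here because their exact column layout is not shown in this article):

-- Package-level execution history
select * from msdb.dbo.sysdtspackagelog

-- Step-level detail for troubleshooting a failed run
select * from msdb.dbo.sysdtssteplog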
Manually turn on DTS logging One way to turn on DTS logging is to open the package in the DTS designer and, with no object selected within the package, click the properties icon or go to the Package menu and select Properties. The package property window (not the property window of any individual component) will pop up. You can then click on the Logging tab and fill out the relevant information to start DTS logging. See the attached image.
However, if you have many packages to manage, manually turning on logging for each package can be tedious and time-consuming. That is why I wrote the following scripts to accomplish this task.
Use ActiveX scripts to turn on DTS package logging automatically With DTS package automation, we naturally turn to SQL-DMO. Using SQL-DMO, you can automate pretty much anything that is SQL Server related. The following code uses SQL-DMO to turn on package logging for a given server. You will need to create a package and, within the package, create an ActiveX task containing the code below. You then need to create 3 global variables (data type string) within this package: ServerName, SQLLoginID, and SQLLoginPassword. The variable names explain their purpose. After you give appropriate values to the three global variables, you are good to go. The key concept used here is the PackageInfos collection. The EnumPackageInfos method returns a PackageInfos collection containing information about all the packages stored in SQL Server 2000. We then use the PackageInfos.Next method to walk through every package in the collection and turn on the logging property of that package. After running this task, logging will be turned on for all packages. However, if you create this package on the same server as the other packages, this ActiveX package's own logging property will not be turned on, because it is in use; it cannot flip its own logging switch while it is open. Another thing you will notice is that the visual layout of the various package components will change after this is run, but the components remain the same.

'**********************************************************************
' Author:  Haidong "Alex" Ji
' Purpose: To turn on package execution logging for each and every DTS
'          package on a given server. This is especially useful
'          when there are many (hundreds of) packages to handle.
' Note:    1. This script uses DTS global variables called ServerName,
'             SQLLoginID and SQLLoginPassword;
'          2. ServerName defines the server whose DTS packages'
'             execution logging you want to change. The other 2 DTS global
'             variables' names explain their purposes. Change those
'             variables' values to suit your specific needs;
'          3. It seems that the layout of the various package components will
'             change after this is run, but the components remain the same.
'************************************************************************
Function Main()

    Dim oApplication        ' As DTS.Application
    Dim oPackageSQLServer   ' As DTS.PackageSQLServer
    Dim oPackageInfos       ' As DTS.PackageInfos
    Dim oPackageInfo        ' As DTS.PackageInfo
    Dim oPackage            ' As DTS.Package

    Set oApplication = CreateObject("DTS.Application")
    Set oPackageSQLServer = oApplication.GetPackageSQLServer(DTSGlobalVariables("ServerName").Value, DTSGlobalVariables("SQLLoginID").Value, DTSGlobalVariables("SQLLoginPassword").Value, 0)
    Set oPackageInfos = oPackageSQLServer.EnumPackageInfos("", True, "")
    Set oPackageInfo = oPackageInfos.Next

    'Note: It is IMPORTANT that oPackage be instantiated and destroyed within the loop.
    'Otherwise, previous package info will be carried over and snowballed into a bigger
    'package every time this loop is run. That is NOT what you want.
    Do Until oPackageInfos.EOF
        Set oPackage = CreateObject("DTS.Package2")
        oPackage.LoadFromSQLServer DTSGlobalVariables("ServerName").Value, DTSGlobalVariables("SQLLoginID").Value, DTSGlobalVariables("SQLLoginPassword").Value, DTSSQLStgFlag_Default, , , , oPackageInfo.Name
        oPackage.LogToSQLServer = True
        oPackage.LogServerName = DTSGlobalVariables("ServerName").Value
        oPackage.LogServerUserName = DTSGlobalVariables("SQLLoginID").Value
        oPackage.LogServerPassword = DTSGlobalVariables("SQLLoginPassword").Value
        oPackage.LogServerFlags = 0
        oPackage.SaveToSQLServer DTSGlobalVariables("ServerName").Value, DTSGlobalVariables("SQLLoginID").Value, DTSGlobalVariables("SQLLoginPassword").Value, DTSSQLStgFlag_Default
        Set oPackage = Nothing
        Set oPackageInfo = oPackageInfos.Next
    Loop

    'Clean up and free resources
    Set oApplication = Nothing
    Set oPackageSQLServer = Nothing
    Set oPackageInfos = Nothing
    Set oPackageInfo = Nothing
    Set oPackage = Nothing

    Main = DTSTaskExecResult_Success

End Function
Conclusion In this article, I showed you how to use SQL-DMO to turn on DTS package logging. This is especially useful when there are many (hundreds of) packages to handle. For DTS package ownership and scheduling issues, please see the article I wrote a while ago.
Building Business Intelligence Data Warehouses Tom Osoba 7/15/2003
Introduction Business intelligence can improve corporate performance in any information-intensive industry. With applications like target marketing, customer profiling, and product or service usage analysis, businesses can finally use their customer information as a competitive asset. They can enhance their customer and supplier relationships, improve the profitability of products and services, create worthwhile new offerings, better manage risk, and reduce expenses dramatically. In order to capture the power of business intelligence, a proper data warehouse needs to be built. A data warehouse project needs to avoid common project pitfalls, be business driven, and deliver proper functionality.
Data Warehouse Projects A data warehouse project is not immune to the pitfalls associated with any technology development project. Common mistakes can either immediately kill or severely cripple the project as it progresses through development. A data warehouse project is typically a consolidated solution across multiple systems that encourages data stewardship and advanced analytics from existing systems. However, it is possible to get trapped by the illusion that consolidating data will fix a problem or automatically answer a question. One mistake in a data warehouse project is the failure to define the business reason for the project. Are there problems with existing systems? Does the data need to be aggregated? Does one need point-in-time reporting? Critical questions need to be addressed before entering a project. Another common mistake is the assumption that a data warehouse load, often referred to as ETL (extract, transform, load), will fix source data. Although an ETL can be used to scrub existing source data, its general purpose is not to fix it. Dirty source data will contain missing, erroneous, or inconsistent data. “This is often not a problem for OLTP systems, but can be disastrous for decision support systems.” (Mimno pg 4) Further, a data warehouse should not be considered the quick fix. A data warehouse should be viewed as part of an overall solution for data storage and reporting needs. A common mistake is the “failure to anticipate scalability and performance issues.” (Mimno pg 4) A data warehouse needs proper architecture and application configuration, which would include RAID configuration and data normalization. RAID configuration determines how the data is placed across the disks of the server; in addition, it defines the performance and redundancy of data stored on the server. Data normalization refers to a data analysis technique that organizes data attributes such that they are grouped to form non-redundant, stable, flexible, and adaptive entities.
Business Driven Project Most companies enter a data warehouse project because they are experiencing problems with performance and integration of their source systems. The project should obtain proper executive sponsorship to help guide it. A good solution is to have a proper business case approved and to form a steering committee to gather requirements and guide the project from start to finish. In addition, users experience slower query response times coupled with the inflexible architecture of different systems. It is typical to see organizations whose different departments start their own data warehouse projects to solve their individual needs. This starts the “stovepipe” problem common with data warehouse projects. The term stovepipe refers to the problem seen when common data is taken by different groups that don’t apply the same business rules and definitions. What happens is that the same data is isolated between internal projects and produces different results. It is common to see the Human Resources department come up with a different company headcount number than Finance. In addition to fractured data mart projects, data warehouses can lack the ability to change properly as user requests change. Without a proper development lifecycle, changes are not well managed and become overbearing to the project. The data warehouse is built for a problem now and does not focus on the organization's end goal. It soon becomes inflexible and unreliable as data can become corrupted with erroneous changes. Finally, a data warehouse project needs to meet the needs of the business. The project should be geared toward answering a focused problem, whether the business need is reporting, consolidation of source data, or archived storage. Mimno defines business driven as “define the business requirements first; next specify architecture; and finally select tools.”
A Business Intelligence data warehouse Business Intelligence provides a single, clean, and consistent source of data for decision-making. It allows executives and business analysts to perform financial analysis, variance analysis, budgeting analysis, and other types of reporting. It also allows the discovery of patterns and trends in data built directly from everyday business actions. The primary goal of Business Intelligence is to increase productivity, reduce time, and reduce cost by establishing a single integrated source of information. Data warehouses exist to facilitate complex, data-intensive, and frequent ad hoc queries. Accordingly, data warehouses must provide far greater and more efficient query support than is demanded of transactional databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, ad hoc queries, data mining, and materialized views. In particular, enhanced spreadsheet functionality includes support for state-of-the-art spreadsheet applications (e.g., MS Excel) as well as for OLAP application programs. These offer preprogrammed functionalities such as the following:
1. Roll-up: Data is summarized with increasing generalization (e.g., weekly to quarterly to annually); a small T-SQL illustration appears at the end of this section.
2. Drill-down: Increasing levels of detail are revealed (the complement of roll-up).
3. Pivot: Cross tabulation (also referred to as rotation) is performed.
4. Slice and dice: Projection operations are performed on the dimensions.
5. Sorting: Data is sorted by ordinal value.
6. Selection: Data is available by value or range.
7. Derived (computed) attributes: Attributes are computed by operations on stored and derived values.
“The classic approach to BI system implementation, users and technicians construct a data warehouse (DW) that feeds data into functional data marts and/or "cubes" of data for query and analysis by various BI tools. The functional data marts represent business domains such as marketing, finance, production, planning, etc. At a conceptual level, the logical architecture of the DW attempts to model the structure of the external business environment that it represents.” (Kurtyka)
The business intelligence data warehouse allows users to access information with a reporting tool to execute business planning and analysis. The proper BI data warehouse will deliver a cyclical “process of data acquisition (capture), analysis, decision making and action.” (Kurtyka)
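To make the roll-up operation listed above concrete on the relational side, here is a small hypothetical T-SQL sketch (the Sales table and its columns are invented for illustration) that summarizes data at increasing levels of generalization using WITH ROLLUP:

-- Hypothetical Sales table: one row per sale, with year, quarter and amount
select SalesYear, SalesQuarter, sum(Amount) as TotalAmount
from Sales
group by SalesYear, SalesQuarter with rollup
-- WITH ROLLUP adds a subtotal row per year and a grand total row,
-- i.e. quarterly figures rolled up to yearly and overall totals.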
Conclusion A data warehouse can deliver a consolidated reporting source with increased flexibility and performance. However, in order to deliver a robust BI data warehouse solution, some guiding principles need to be followed. First, the data warehouse project should encompass company directives in order to avoid individual projects that lead to stovepipe data marts. In addition, the ETL program that loads the data warehouse should not be used as a tool to fix source data. Also, the data warehouse solution should be architected to provide superior performance and data normalization. Second, the warehouse solution should be business driven. The project should provide a solution for a specific problem. A proper business case should be approved and the project controlled by a steering committee. Third, the BI data warehouse should be built to increase productivity, increase data flexibility, and reduce company cost by integrating multiple sources of information.
References
• Kurtyka, Jerry. 2003. “The Limits of Business Intelligence: An Organizational Learning Approach.” http://www.dmreview.com/master.cfm?NavID=193&EdID=6800
• Mimno, Pieter. 2001. “Building a Successful Data Warehouse.” Boot Camp Presentation. http://www.mimno.com
Comparison of Business Intelligence Strategies between SQL and Oracle Dinesh Priyankara 5/19/2003
Recently one of our business clients wanted a Business Intelligence system for his company. Because of the availability of many BI platforms, he needed a comparison between the MS SQL Server 2000 and Oracle 9i BI platforms. So I surfed various web sites, read many white papers, and ended up with this document. I hope it will be useful for you too.
Business Intelligence Requirements
Data warehouse databases: The platform should support both relational and multidimensional data warehouse databases.
OLAP (Online Analytical Processing): This is the most widely used component for analysis. The platform should provide OLAP support within the databases, OLAP functionality, interfaces to that functionality, and OLAP build and manage capabilities.
Data Mining: The platform should include data mining functionality that offers a range of algorithms that can operate on the data.
Interfaces: The platform should provide interfaces to the data warehouse databases, OLAP, and data mining.
Build and Manage capabilities: The platform should support building and managing data warehouses in its data warehouse databases, such as implementing data warehouse models and performing extraction, movement, and transformation.
The Leadership
Microsoft: Microsoft is quantitatively the OLAP leader and its BI platform is the equal of any of the other leaders such as Hyperion, IBM, and Oracle. The pricing and packaging advantages demonstrated with OLAP in SQL Server 2000 are significant. As a result, the Microsoft BI platform delivers value that is not approached by the platforms of the other leaders.
Oracle: Oracle offers a more technologically consistent BI platform by delivering both OLAP and relational capabilities in its database. But its OLAP implementation has not been widely adopted by tool and application suppliers, and therefore has not yet achieved significant market share.
Build and Manage Capabilities
Microsoft: Toolsets: Analysis Manager provides comprehensive relational and OLAP build and manage capabilities. Extraction data sources: MS SQL Server, Oracle, ODBC, files, Access 2000, Excel 2000, MS Visual FoxPro, dBase, Paradox, MS Exchange Server and MS Active Directory.
Oracle: Toolsets: Oracle 9i Warehouse Builder provides relational build and manage capabilities. Oracle Enterprise Manager provides OLAP build and manage capabilities. Extraction data sources: IBM DB2, Informix, MS SQL Server, Sybase, Oracle, ODBC, flat files.
Packaging and Pricing
Microsoft: The entire BI platform costs $19,999 (SQL Server Enterprise Edition, per-processor license).
Oracle: A fee of $40,000 per processor is charged just for the Enterprise Edition of the relational database. Oracle 9i OLAP and Data Mining are packaged separately and are priced at $20,000 each per processor, and Warehouse Builder is priced at $5,000 per named user. As a result, the entire Oracle BI platform is priced at about $85,000.
OLAP Interfaces
Microsoft:
MDX (Multidimensional Expressions): This is Microsoft's native OLAP interface. In many ways it is very similar to Structured Query Language (SQL), but it is not an extension of the SQL language. MDX also provides Data Definition Language (DDL) syntax for managing data structures.
DSO (Decision Support Objects): This library supplies a hierarchical object model for use with any development environment that can support Component Object Model (COM) objects and interfaces, such as MS Visual C++ and MS Visual Basic. Its objects encapsulate the server platform, SQL Server databases, MDX functions, OLAP data structures, data mining models and user roles.
Pivot Table Service: This is a client-based OLE DB provider for Analysis Services OLAP and data mining functionality. It is a powerful but heavy client interface.
XML for Analysis: This is a Simple Object Access Protocol (SOAP)-based XML API designed by Microsoft for accessing SQL Server Analysis Services data and functionality from web client applications. This makes the SQL Server 2000 BI platform the first database to offer powerful data analysis over the web, and it allows application developers to provide analytic capabilities to any client on any device or platform, using any programming language.
Oracle:
OLAP DML: This is the native interface to Oracle 9i data and analytic functions. Through OLAP DML, an application can access, query, navigate, and manipulate multidimensional data as well as perform analytic functions.
Java OLAP API: An application can connect to multidimensional data and perform navigation, selection and analysis functions, but not all functions. For example, a Java application must execute OLAP DML commands when the functionality is not available through the API.
SQL and PL/SQL: OLAP data and functionality can be accessed by using predefined PL/SQL packages that access OLAP commands directly, through OLAP multidimensional views, or by accessing table functions directly.
Data Mining Interfaces
Microsoft:
DSO (Decision Support Objects): This library supplies a hierarchical object model for use with any development environment that can support Component Object Model (COM) objects and interfaces, such as MS Visual C++ and MS Visual Basic. Its objects encapsulate the server platform, SQL Server databases, MDX functions, OLAP data structures, data mining models and user roles.
Pivot Table Service: This is a client-based OLE DB provider for Analysis Services OLAP and data mining functionality. It is a powerful but heavy client interface.
Oracle:
Oracle 9i Data Mining API (Java): This is an open API and Oracle makes its published specification easily available.
Conclusion Microsoft and Oracle address all of our business intelligence platform requirements. They provide relational data warehousing, build and manage facilities, OLAP, data mining, and application interfaces to relational data warehouses, to OLAP data and analytic functionality, and to data mining. Microsoft provides a comprehensive business intelligence platform. Build and manage capabilities, OLAP capabilities, and application interfaces are its key strengths. Data mining is very new, although data mining integration and data mining tools are quite good. Oracle provides a comprehensive business intelligence platform. While this platform has a complete set of components, OLAP and data mining capabilities are unproven, data mining tools are low level, and build and manage capabilities are not consistently implemented for relational and OLAP data.
When considering price, Microsoft leaves Oracle behind. Microsoft's entire BI platform can be bought for $19,999, but it is about $80,000 for Oracle before adding the $5,000-per-named-user fee for build and manage capabilities. I highly appreciate all your comments about this article. You can reach me through [email protected]
Portable DTS Packages Kevin Feit 5/9/2003
Introduction Have you ever faced the situation where you needed to move a DTS package from one server to another, say from development to production? The typical approach might be to save it as a file from your development server, then connect to production, open the file, modify the database connection, and then save it on production. This works fairly well assuming you are only moving one or two packages and you have access to the production server. But if you need to move multiple packages across multiple environments, this will get tedious very quickly. It can also be error prone. For example, you can miss changing a connection, or the transformations can be inadvertently reset as the package is edited.
Running a package from the command line For our discussion, let’s assume you have a straightforward package to extract some data from your database based on a query, as shown in Figure 1 below. However the approach described will also work for nearly any activity using a DTS package, such as importing data from a file or moving data between databases.
Figure 1. A typical DTS package
Connection 1 is the database connection (source) and Connection 2 is the destination, in this case a text file. The first question to address is: can we avoid the need to save the package on different servers (development, QA, production)? Well, we can save the package as a file. But don’t you still have to open the package from Enterprise Manager to execute it? No. Microsoft provides a command line utility to run a DTS package. It is called dtsrun.exe. Dtsrun.exe accepts a file name and package name as arguments. So you can enter:
dtsrun /Fmydtspkg /Nmydtspkg
to run a package named mydtspkg.dts. Of course, we still have one major problem to overcome: the package is still executing against the database we created it on.
Making the package portable So, how do we deal with the fact that the server name and database name are in effect hard coded in the package? The DTS editor provides the “Dynamic Properties Task”. Add a Dynamic Properties Task to the package. The properties window for it will appear. Type in a description, such as “Set Data Source”, and then click the “Add…” button. Open the tree to Connections-Connection 1-Data Source. Click the checkbox “Leave this dialog box open after adding a setting”, then click the Set… button.
Figure 2. Set the Data Source to a Global Variable
In the next dialog box, set the source to Global Variable and then click the Create Global Variables… button. Enter a name, leave the type as String, and enter a default value. Now choose the variable that you just created. Repeat the process described for any other properties that you want to change, such as Initial Catalog (the database name) and User ID and Password if you are not using integrated security. If you are extracting to a text file, the Data Source for that connection will be the filename. Important: Now that you have added the Dynamic Properties task, make sure it is the first task to execute by adding an “On Success” workflow between it and Connection 1. If you don’t do this, the job will fail because the values are not yet set when it starts to execute the extraction step. Your DTS package should now look something like:
Figure 3. A DTS package with the Set Data Source task
At this point, save the package and execute a test run of the package from Enterprise Manager to confirm that the changes made have been successful.
Setting variables from the command line As you recall from the first section, we can run a DTS package from the command line using the dtsrun utility. But how do we set the global variables? To do this, use the /A switch. For example:
dtsrun /Fmydtspkg /Nmydtspkg /A"Server:8=devserver"
will set the global variable Server to devserver. The :8 is required to indicate that the data type is string. Tip: The global variable names are case-sensitive. Make sure you exactly match the name in your command line with the name used in the package. If they don’t match, no error is reported, but the command line setting is ignored and the default value set in the package is used instead.
Putting it all together Now that we have the building blocks, let’s build a simple batch file to run any DTS package. I will call it rundts.bat.

@ECHO Off
IF not exist %1.dts goto badfile
set myserver=devserver
set mydb=devdb
set extractdir=c:\Extracts
set outdir=.\output
CALL dtsrun /F%1 /N%1 /WTRUE /A"DB:8=%mydb%" /A"Server:8=%myserver%" /A"Outfile:8=%extractdir%\%~1.txt" > "%outdir%\%~1.txt"
IF /i %2==Y start notepad %outdir%\%~1.txt
IF /i %3==Y pause
goto end
:badfile
ECHO Please provide a filename without DTS extension, followed by Y to show output, and another Y to pause before returning
:end
TIME /T
ECHO %1 Completed
@ECHO ON
rundts.bat
Edit the values in the four set statements to reflect your server name, database name, directory for the extracted data, and directory for the extract log. The extract will use the same filename as the DTS package, but with a .txt extension. Setting the /W flag to TRUE in the CALL dtsrun line tells dtsrun to log the package completion status to the Windows event log. There are also two flags that rundts.bat accepts. The first indicates whether to start Notepad and open the output file after each step. The second determines whether to pause between each step. This allows the execution to be monitored or to run unattended. So if you need to run three DTS packages, you can create another batch file as:

CALL RUNDTS extract1 Y Y
CALL RUNDTS extract2 Y Y
CALL RUNDTS extract3 Y N

This will pause processing between each extract and open the output file for review.
Conclusion This article provided a straightforward approach to making DTS packages portable between servers or databases. By leveraging the SQL Server 2000 Dynamic Properties Task and the ability to run packages from the command line, a package can be migrated with almost no effort. Of course, what is presented is just a starting point, but the general technique can be modified to meet many needs.
Replacing BCP with SQLBulkLoad Stefan Popovski 7/21/2003
The good old BCP utility moves quietly to the side as a historical tool as XML becomes the worldwide adopted standard for data interchange. So if you are using BCP to import text data files, and you want to use XML import files instead of text files, now is a good time to change. First, here is a partial example of the BCP utility being called from the xp_cmdshell procedure:

DECLARE @cmd VARCHAR(256)
SET @cmd='bcp ' + @DbName + '..' + @TableName + ' in ' + @FullImportFileName
        + ' /S' + @ServerName + ' /f' + @FullFmtFileName + ' -T'
EXEC @Result_cmd = master..xp_cmdshell @cmd

BCP advantages:
• BCP enables fast imports of big data files.
• Using BCP, a database can import data without any other application having to be developed.

Although this method of importing data is very fast, it has several limitations when we want to use it in complex data processing systems in integration with other users' applications:
• BCP is not appropriate for importing XML data.
• Careless use of master..xp_cmdshell can seriously endanger SQL Server security.

BCP enables full control over transactions, from the application to the final insert into the database table. BCP was a very good utility that helped build application-independent databases. What does that mean? I want to see a clear border between the application and the database. Databases have to be independent in the sense that every action from the application against the database can be an “Execute Procedure” call. I don’t want any SQL statement from application code acting directly in the database and risking damage to the database logic. In that case application bugs will have less damaging effects in the database. This is especially important if you want clear developer responsibility in your development team. In the case of an unexpected application crash, some procedures can be executed by hand through SQL Query Analyzer. This is because I want to grant the task of importing data into the database to the database itself, instead of to some application or user interface. We therefore need an appropriate tool for importing XML files into a SQL Server database that can be called from a stored procedure. Although the OPENXML statement can be used for direct import into the database, I prefer this option:
Using SQLXMLBulkLoad.SQLXMLBulkLoad.3.0 On your SQL Server you have to install SQLXML 3.0 SP1 (http://msdn.microsoft.com). Then create the file ImportData.xml in 'C:\Folder\ImportData.xml' with the following data:

<ElementRow>
   <Row>
      <Field1>Row1_Filed1_Data</Field1>
      <Field2>Row1_Filed2_Data</Field2>
      <Field3>Row1_Filed3_Data</Field3>
   </Row>
   <Row>
      <Field1>Row2_Filed1_Data</Field1>
      <Field2>Row2_Filed2_Data</Field2>
      <Field3>Row2_Filed3_Data</Field3>
   </Row>
</ElementRow>

You also need to create a file called schema.xml in the same folder, as follows:

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:xml:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
   <ElementType name="Field1" dt:type="string"/>
   <ElementType name="Field2" dt:type="string"/>
   <ElementType name="Field3" dt:type="string"/>
   <ElementType name="ElementRow" sql:is-constant="1">
      <element type="Row"/>
   </ElementType>
   <ElementType name="Row" sql:relation="TableImport1">
      <element type="Field1" sql:field="TabField1"/>
      <element type="Field2" sql:field="TabField2"/>
      <element type="Field3" sql:field="TabField3"/>
   </ElementType>
</Schema>

Create TableImport1 in "YourDatabase" by executing the following script in SQL Query Analyzer:

CREATE TABLE [dbo].[TableImport1] (
   [TabField1] [varchar] (40) NULL,
   [TabField2] [varchar] (40) NULL,
   [TabField3] [varchar] (40) NULL
) ON [PRIMARY]
GO

Then you can create a new procedure in your database:

CREATE PROCEDURE BulkLoad
AS
DECLARE @object INT
DECLARE @hr INT
DECLARE @src VARCHAR(255)
DECLARE @desc VARCHAR(255)
DECLARE @Schema VARCHAR(128)
DECLARE @ImportData VARCHAR(128)
DECLARE @ErrorFile VARCHAR(128)

SET @Schema = 'C:\Folder\schema.xml'
SET @ImportData = 'C:\Folder\ImportData.xml'
SET @ErrorFile = 'C:\Folder\Error.log'

EXEC @hr = sp_OACreate 'SQLXMLBulkLoad.SQLXMLBulkLoad.3.0', @object OUT
IF @hr <> 0
BEGIN
   EXEC sp_OAGetErrorInfo @object, @src OUT, @desc OUT
   SELECT hr=convert(varbinary(4),@hr), Source=@src, Description=@desc
   RETURN
END
ELSE
EXEC @hr = sp_OASetProperty @object, 'ConnectionString',
   'provider=SQLOLEDB.1;data source=SERVERNAME;database=YourDatabase;Trusted_Connection=Yes'
IF @hr <> 0
BEGIN
   PRINT 'ERROR sp_OAMethod - ConnectionString'
   EXEC sp_OAGetErrorInfo @object
   RETURN
END
EXEC @hr = sp_OASetProperty @object, 'ErrorLogFile', @ErrorFile
IF @hr <> 0
BEGIN
   PRINT 'ERROR sp_OAMethod - ErrorLogFile'
   EXEC sp_OAGetErrorInfo @object
   RETURN
END
EXEC @hr = sp_OAMethod @object, 'Execute', NULL, @Schema, @ImportData
IF @hr <> 0
BEGIN
   PRINT 'ERROR sp_OAMethod - Execute'
   EXEC sp_OAGetErrorInfo @object
   RETURN
END
EXEC @hr = sp_OADestroy @object

SELECT 'OK'

IF @hr <> 0
BEGIN
   PRINT 'ERROR sp_OADestroy'
   EXEC sp_OAGetErrorInfo @object
   RETURN
END
GO

At last you are ready to import data into TableImport1 by executing the 'BulkLoad' procedure from SQL Query Analyzer. Don't forget to change the connection string to suit your server (the one shown uses a trusted connection, but you could change it to use a SQL login). After executing, you can check whether it was successful by doing a select * from TableImport1, which should return the rows from ImportData.xml.
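A quick hypothetical test run, using the objects created above:

-- Run the import and inspect the result
EXEC dbo.BulkLoad
SELECT * FROM dbo.TableImport1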
Security
Second only to the Information Security group, DBAs are usually very diligent in controlling security in the database. However, this is a complicated and difficult subject to deal with. 2002 saw a huge security push from Microsoft, which resulted in quite a few SQL Server patches. 2003 brought us SQL Slammer and ensured that security would remain at the forefront of DBAs' concerns.
Block the DBA? – Robert Marda – 68
SQL Server Security: Login Weaknesses – Brian Kelley – 70
SQL Server Security: Why Security Is Important – Brian Kelley – 77
TSQL Virus or Bomb? – Joseph Gama – 81
Block the DBA? Robert Marda 1/28/2003
Introduction What?! That can’t be done, can it? The final answer is no. However, you can certainly block an unknowledgeable DBA with the techniques I will describe in this article. The same techniques will block you and other users from forgetting business rules and doing tasks you shouldn’t, or simply block you from accidentally dropping the wrong table. I’m quite sure there are other ways to do the same things, ways that are considered better. My goal in this article is to use some extreme methods which could take a sequence of steps to undo and certainly will require some knowledge about system tables. The ways I plan to do this mean modifying system tables (also known as system catalogs) which many DBAs frown upon. Microsoft recommends you don’t modify system tables. Also, modifying system tables can mean Microsoft won’t help you if a problem is related to said modifications. Having said this, I likely brand myself as a rogue developer. I share them here now to show you what can be done should you feel the need to use these methods, say in your development environment as a joke. You can automatically include all the techniques I describe in this article in your list of worst practices and as such should not seriously consider any of them as viable solutions.
Sample Code The following sample code will be used for the examples in this article. I recommend that you create a new database and then execute the below code using the new database:

CREATE TABLE [dbo].[TestTable] (
   [col1] [char] (10) NULL
) ON [PRIMARY]
GO
Example 1: Block DROP TABLE Command Here is a way to completely block the DROP TABLE command. First you must execute the following commands on your test server to allow you to make changes to the system tables. Later in this section I will give you the code to disallow changes to the system tables.

EXEC sp_configure 'allow updates', '1'
GO
RECONFIGURE WITH OVERRIDE
GO

Now execute the sample code given in the previous section. Then execute the code below to mark the table you created as a table that is replicated:

UPDATE o
SET replinfo = 1
FROM sysobjects o
WHERE name = 'TestTable'

SQL Server will now block you from dropping TestTable since it is considered a replicated table. Upon execution of the command “DROP TABLE TestTable” you will receive the following error:

Server: Msg 3724, Level 16, State 2, Line 1
Cannot drop the table 'TestTable' because it is being used for replication.

I don’t think I would ever mark all my user tables as replicated tables; however, I have often considered changing some of them. This would definitely avoid the mistake of issuing a DROP TABLE command on a table in production when you thought you were connected to your development SQL Server. Use this code to return the table to its normal state:

UPDATE o
SET replinfo = 0
FROM sysobjects o
WHERE name = 'TestTable'

Now let's disallow modification of the system tables. Execute the code below:

EXEC sp_configure 'allow updates', '0'
GO
RECONFIGURE
GO

Feel free to use the SELECT, DELETE, UPDATE, and INSERT commands. They will all still work.
EXAMPLE 2: Block New Database User For this example we’ll do what you’ve either read or heard is not possible: we’re going to place a trigger on a system table. What?! Don’t blink or you might miss something. Execute the code from Example 1 to enable changes to system tables. Now execute the following code to let you place a trigger on the sysusers table:

UPDATE o
SET xtype = 'U'
FROM sysobjects o
WHERE name = 'sysusers'

Please note that this change does not completely cause SQL Server to count the table as a user table. It will still show up as a system table in Enterprise Manager. However, it will allow you to place a trigger on the table by using the following code:

CREATE TRIGGER BlockNewUsers ON sysusers
AFTER INSERT
AS
DELETE sysusers
FROM sysusers s
   INNER JOIN inserted i ON s.uid = i.uid
PRINT 'New users not allowed. Please ignore words on next line.'

Execute the update below to return the sysobjects table back to normal:

UPDATE o
SET xtype = 'S'
FROM sysobjects o
WHERE name = 'sysusers'

Now execute the code that will disallow changes to system tables. Next, create a new login for your SQL Server or use an existing login. Copy the code below, replace ‘u1’ with the login you are going to use, and execute it:

EXEC sp_adduser 'u1'

You should see the following in the Query Analyzer message window:

New users not allowed. Please ignore words on next line.
Granted database access to 'u1'.

You can view the users via Enterprise Manager or view the sysusers table and you will not find the new user, since the trigger fired and deleted the user after it was added. You can drop the trigger at any time without modifying the xtype of the system table.
Example 3: Creating Read Only Tables This example could be the one I view as most likely to be useful should you have a need for a read only table that even DBAs can’t change unless they undo what I show you here. Once we’re through you won’t be able to make any structural changes nor changes to the data in the table. You will be able to view the data in the table and that is about it. Again you will need to run the code from Example 1 to enable changes to the system tables. Now, rerun the sample code given at the beginning of this article. Execute the following INSERT statement to put some data in the table TestTable: INSERT INTO TestTable (col1) SELECT 'A' UNION SELECT 'B' UNION SELECT 'Robert' Once done, you can execute the below code so that SQL Server will count TestTable as a system table: UPDATE o SET xtype = 'S' FROM sysobjects o WHERE name = 'TestTable' Now execute the code from Example 1 to disable changes to system tables. Now you can run DELETE, UPDATE, and INSERT statements all you want and all you will get are errors like this one: Server: Msg 259, Level 16, State 2, Line 1 Ad hoc updates to system catalogs are not enabled. The system administrator must reconfigure SQL Server to allow this. You won’t be able to add or modify columns. You can’t add a primary key, nor a trigger, nor an index to the table any more. This technique will also block the DROP TABLE command and the TRUNCATE TABLE command.
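To see for yourself how SQL Server now classifies the table, you can query sysobjects directly:

-- xtype = 'S' means SQL Server treats TestTable as a system table; 'U' is a normal user table.
SELECT name, xtype, replinfo
FROM sysobjects
WHERE name = 'TestTable'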
Conclusions In this article I have shown you how to modify system tables (also called system catalogs). I have shown you a few ways you can modify the system table called sysobjects to block certain activities that a user with SA privileges could normally do. I have also indicated that these techniques are best used as jokes on a development SQL Server, even though there is a slim chance they could be useful for other purposes. Once more let me stress that everything described in this article can be considered a worst practice; it is shared to give you a brief look into a system table and how SQL Server blocks certain activities via replication and the system tables. I look forward to your comments even if you do choose to blast me verbally for daring to share such information in the way I did in this article. Feel free to let me know what you think.
SQL Server Security: Login Weaknesses Brian Kelley 8/14/2003
If you've read any Microsoft literature, you know the party line is to use Windows authentication whenever possible. This is sensible from a security perspective because it follows the concept of single sign-on. If you're not familiar with single sign-on, it's pretty easy to understand: a user should only have to sign on one time to gain access to any resources that user might need. By using SQL Server authentication, most users are going to have to go through two sign-ons. The first sign-on comes when they log on to their computer systems. The second comes when they log on to SQL Server. Admittedly, the user may not manually enter the username and password but there are still two logins. Each additional login is another one for the user to keep track of. Too many logins and eventually users will start writing down username and password combinations or storing them in unsecured Excel worksheets. However, if we can tie a user's rights within SQL Server (and the ability to access SQL Server) to the Windows login, we avoid the user having to keep track of his or her SQL Server logon. We can rely on the operating system and SQL Server to perform the tasks for authentication behind the scenes. It's all transparent to the user. Just what most people want! And there you have one reason why Windows authentication is preferred. Another one is administration. If I grant access for a Windows group, any members of that group have access. If a system administrator adds a person to the group, the person has access and I as a lazy DBA have not had to lift a finger. If a system administrator terminates a user account in the domain, that drops the user from the group. Said user no longer has access to my system. Again, I've not had to lift a finger. This is the life! Well, there's more to it than appeasing my laziness. Consider the scenario where you have a rogue user. The system administrator disables the account immediately and punts the user off any logged on sessions the user might have (there are ways to do this but I won't go into them). But said user has a laptop and has just enough time to plug up, connect to the SQL Server and grab some data. If the user has to rely on a Windows login the user is now out of luck. When the system administrator terminated said account, the user's access to SQL Server was subsequently terminated. However, if the user had access through a SQL Server login and the DBAs haven't been made aware of the situation, the user is in. Of course, this type of scenario is rare. More often it is the case that a user leaves a company and the Windows account is disabled, but the SQL Server logins aren't cleaned up on all systems. Slips in communication do happen. Even if you have nifty scripts to go and scrub all your database servers they are no good if you don't know to remove the user. So in the end, Windows authentication comes to the forefront for ease of use and ease of administration. But that's not what this article is about. This article is about two well-known weaknesses in the SQL Server login passwords. One is a weakness as the password is transmitted across the wire. The other is a weakness in how the password is stored. None of this is new material. What does that mean? It means those who care about breaking into SQL Server systems have known about them for a while. I do need to point out there haven't been any major compromises in the news because of these weaknesses. Keep that in mind as you read through the article (it puts things in perspective).
However, a good DBA is a well-informed one. So if you're not familiar with these two vulnerabilities, read on.
Weak Encryption Across the Wire Unless the communications between a SQL Server 2000 server and its client is encrypted, an attacker can watch the data exchanged by the two and be able to sift through and grab usernames and passwords. The one issue is
the attacker has to be in a position to capture the network traffic. In today’s switched environment, this isn’t as easy as it once was. Even in small office installations, most servers and workstations are connected via switches and not hubs. The one issue with hubs is that everyone sees everyone else’s traffic. Enter the switch. The switch isolates my traffic. The only traffic I should see is traffic intended for just my system as well as any traffic intended for all systems (broadcast traffic). If I look at things from a performance perspective, this is what I want. I do not want to contend for bandwidth with the other 15 people I'm plugged in with on a 16-port network hub. Give me a 16-port switch any day so I get my bandwidth! Because of the advantages of a switch strictly from a performance viewpoint, hubs are quickly being phased out. With the reduced prices on switches, even the most miserly individual has a hard time cost-justifying a hub rather than a switch for a particular environment. This is great from a security perspective because it isolates data streams so that I, by sniffing on my network port, can’t see what John is doing on his. This isolation mitigates the SQL Server password vulnerability with respect to most users. But it doesn’t eliminate everyone. The malicious attacker who also happens to be employed as a network engineer can still put himself in a position to capture that traffic, despite the isolation I’ve described above. While I don’t mean to suggest all network engineers are necessarily looking to do such things, the truth of the matter is there are some individuals who would. I don't have any hard numbers, but I wouldn't even put it at 1% (1 in 100). The problem is we have to deal with that one out of very many who would. Money is a big motivator and if such an individual can secure access to a critical SQL Server and walk away with valuable information, expect the individual to do just that. As a result, if the SQL Server passwords are easily decrypted (and they are), using SQL Server logins should be an option of last resort, one used when Windows authentication just doesn’t work. How weak is the encryption? Let’s find out!
Understanding XOR There are a lot of methods for encrypting data and quite a few of them are very secure. Some of these methods produce output such that, even if you know the encryption algorithm, you still need the key used to encrypt the data in order to decrypt it within your lifetime. There are also methods that can be pulled apart by ten-year-olds. Encryption by XOR fits the latter category. And that’s how a SQL Server password is encrypted when it goes across the network. Bruce Schneier, one of the best-known people in the field of cryptography, makes the comment, “An XOR might keep your kid sister from reading your files, but it won’t stop a cryptanalyst for more than a few minutes.” Ouch! But he’s absolutely right. XOR is a logic algorithm, not an encryption one. However, some applications tout having a fast encryption algorithm and it’s nothing more than XOR. So what’s wrong with XOR? To understand that, let’s look at the XOR operation in more detail. For those who may not be familiar with XOR, I’ll start with the OR operation. If I’m comparing two items, if either one is true, my end result is true. In other words: Item1 OR Item2 = ??? For instance, I may want to compare the following 2 statements: Item1: The United States flag has the colors red, white, and blue. Item2: The Canadian flag has the colors red, white, and blue.
If I look at these two items, I know Item1 is true. The US flag does have these three colors. Item2, however, is false because the Canadian flag only has red and white. The OR operation only cares if at least one of the two statements is true. Since Item1 is true, I can evaluate my expression: Item1 OR Item2 = TRUE The only way an OR operation evaluates to false is if both items are false. For instance: Item1: Tokyo is the capital of the United States. Item2: Richmond is the capital of Canada.
I have a scenario where both statements are false. In this case my OR operation will evaluate to false. So: Item1 OR Item2 = FALSE If both statements are true, I’ll get a true result as well. For instance: Item1: The United States flag has the colors red, white, and blue. Item2: The Canadian flag has the colors red and white.
Item1 and Item2 are both true. The OR operation only cares if at least one of the two statements is true. If both statements happen to be true, the OR operation will still evaluate to true. In fact, if the first statement happens to be true, there’s usually no reason to evaluate the second. Because of this, programming languages like C# and Java will shortcut the evaluation if Item1 is true. There are mechanisms to force the evaluation of the second item, because sometimes it’s necessary to carry out some programming code that’s part of the evaluation process. But typically, if the first statement is found to be true, there’s no need to look at the second. And here is where we see a big difference between OR and XOR. The XOR operation is also known as “exclusive or.” Unlike the OR operation, XOR will evaluate true only if one and only one of the two items is true. If both items are true, the XOR operation will return false. The exclusive part of the name means one and only one side can be true for the XOR operation to evaluate to true. So in the case of my previous example where both Item1 and Item2 were true, XOR will evaluate to false. Item1 XOR Item2 = FALSE
XOR at the Bit Level I’ve intentionally used the word “item” because I wanted to keep what was being XORed abstract. In logic, statements are typically compared. However, when dealing with computers we will sometimes compare statements but other times compare bits. That’s why you’ll sometimes see XOR referred to as a “bit-wise” operation. It is an operation that is often applied at the bit level because there are some definite uses for it. If you’ve not done much work with logic in school, this all may seem a bit confusing (pun intended). One of the helpful things I was shown in logic was the truth table. A truth table is simply a matrix of all the statements and what they evaluate to for all cases of true and false. Table 1 is a classic truth table for XOR from a logic class. Notice I’ve used p and q instead of item1 and item2. The letters p and q are often substituted for statements as a shortcut measure.
Table 1. Logic Truth Table for XOR

    p        q        p XOR q
    True     True     False
    True     False    True
    False    True     True
    False    False    False
By looking at the truth table given in Table 1, I can quickly see what happens when I XOR the two statements. I can do the same for bits. In Table 2 I show the values, except I’ll be using bit1 and bit2 instead of p and q. I’ll also use 1 and 0 instead of “True” and “False”
Table 2. Bit-wise Truth Table for XOR

    Bit1    Bit2    Bit1 XOR Bit2
    1       1       0
    1       0       1
    0       1       1
    0       0       0
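Incidentally, you can reproduce Table 2 directly in Query Analyzer, since T-SQL's ^ operator is a bitwise XOR:

-- Each column reproduces one row of Table 2.
SELECT 1 ^ 1 AS [1 XOR 1],   -- 0
       1 ^ 0 AS [1 XOR 0],   -- 1
       0 ^ 1 AS [0 XOR 1],   -- 1
       0 ^ 0 AS [0 XOR 0]    -- 0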
When we compare two sets of bits, we line them up and check each pair of bits individually. Table 3 shows this process:
Table 3. Bit-wise XOR on two bit streams

    Stream     8   7   6   5   4   3   2   1
    Stream1    1   0   1   0   1   1   0   0
    Stream2    0   1   1   0   0   1   1   0
    XOR        1   1   0   0   1   0   1   0
XOR is a simple operation to carry out. As a result, companies looking for “encryption” may decide to use it because simple operations tend to be very fast and XOR fits the bill perfectly. The common user won’t have any idea of how to decipher the data, so it appears secure. The operation is very quick, so the end user also doesn’t
see a huge performance hit. But the problem is it’s too simple. The average end user may not realize how to decrypt the data, but any attacker worth his or her salt will. To make things worse, we reverse the XOR operation by using XOR yet again. Observe in Table 4 I’ve taken the result of Table 3. I then XOR it with Stream2 from Table 3 and I have Stream1 again.
Table 4. Undoing XOR

    Stream     8   7   6   5   4   3   2   1
    XOR        1   1   0   0   1   0   1   0
    Stream2    0   1   1   0   0   1   1   0
    Stream1    1   0   1   0   1   1   0   0
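The same round trip is easy to verify with integers in T-SQL, treating Stream1 (10101100) as 172 and Stream2 (01100110) as 102:

-- Stream1 = 10101100 = 172 (0xAC), Stream2 = 01100110 = 102 (0x66)
SELECT 172 ^ 102 AS [Stream1 XOR Stream2]    -- 202 = 0xCA = 11001010
SELECT (172 ^ 102) ^ 102 AS [XOR undone]     -- 172 again, i.e. Stream1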
Applying XOR to SQL Server Passwords If I know the key that was used, reversing XOR is trivial. I simply XOR the “encrypted” data by the key and I get my original data back. If I don’t know the key, I do have methods available to determine the length of the key and figure out what the key is. Those methods are beyond the scope of this article and aren’t necessary in the case of SQL Server passwords. The reason they aren’t necessary is because when a SQL Server password is transmitted across the network, each byte is XORed with the character 0xA5 (in hexadecimal representation). So my key is a stream of 0xA5 characters, one for each character in the original password. Since I know ahead of time the password stream has been XORed with 0xA5, I simply perform an XOR using 0xA5 and I get the stream as it existed before the XOR.
Flipping Bits Microsoft does throw in a step prior to XOR when encrypting the password. That step involves flipping sets of bits. A byte is made up of eight bits. Half a byte, or four bits, is sometimes referred to as a nibble. If I look at a byte, I can split it down the middle and get two nibbles. For instance, a byte of 10101100 has a nibble 1010 and another nibble 1100. What Microsoft does is flip the nibbles. So my byte 10101100 becomes 11001010 and it is this second byte that gets XORed. Keep in mind that Unicode characters are represented by two bytes. Each byte is treated separately with regards to flipping the nibbles. But in the case where a byte is 00000000, the flipped byte would look the same. The reason I bring this up is while the password is passed as a Unicode string, the second byte for most Latin characters (A, B, c, d) is 00000000 or 0x00. This little bit of information can often help us find the right packet of a network trace, even if we know nothing of how to tell which frames are for logins and which are for data.
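In T-SQL the nibble flip can be expressed with integer arithmetic; the byte from the example above, 10101100 (0xAC, decimal 172), comes out as 11001010 (0xCA, decimal 202):

DECLARE @b int
SET @b = 172                                     -- 0xAC = 10101100
SELECT ((@b & 15) * 16) + (@b / 16) AS Flipped   -- 202 = 0xCA = 11001010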
Decrypting a SQL Server Password If you know how to read Tabular Data Stream (TDS) frames, you know what to look for to identify which ones correspond to logging in to SQL Server. Remember I said Latin characters would have the second byte as 0x00? Well, 0x00 XORed by 0xA5 is 0xA5. So even if you don't know the frame codes, you can look for a stream of hexadecimal codes that have A5 as every other code (if you're dealing with passwords requiring Unicode characters, you'll have to look for a certain code to identify what type of TDS packet - I'll cover this in a later article). An example would be this: A2 A5 B3 A5 92 A5 92 A5 D2 A5 53 A5 82 A5 E3 A5 If I'm dealing with Latin characters, I can drop the A5's and I get: A2 B3 92 92 D2 53 82 E3 Once I find the stream, I can decipher the password. I would start by XORing 0xA5 against each character. Then I flip the nibbles and I'm left with the ASCII value for the particular letter. Once I look up the ASCII value I have my letter. If I do this for all the letters, I have my password. Table 5 demonstrates the deciphering process. The first three streams are in hexadecimal.
Table 5. Deciphering the Password Stream

    Stream       1     2     3     4     5     6     7     8
    Trace        A2    B3    92    92    D2    53    82    E3
    XOR          07    16    37    37    77    F6    27    46
    Flipped      70    61    73    73    77    6F    72    64
    Decimal      112   97    115   115   119   111   114   100
    Character    p     a     s     s     w     o     r     d
The original password was simply "password" and I have a match. All SQL Server passwords go across the wire with this weak encryption algorithm. Once you have the right bit of info (the character stream), you can crack this thing with a scientific calculator and an ASCII table. Since the SQL Server password is so easily deciphered, encrypting the connection between client and server becomes a necessary evil. Even given the use of Windows authentication, I would suggest a secure connection using SSL or IPSec because even if the login information isn’t being passed in plaintext (unencrypted) or something nearly as weak, the data will be. David Litchfield’s paper “Threat Profiling Microsoft SQL Server” describes the XOR against 0xA5 but it doesn’t discuss the flipping of the bits, which is part of the password “encryption.” A company calling themselves Network Intelligence India Pvt. Ltd. posted a correction to Mr. Litchfield’s paper. You can find a link to both in the Additional Resources section at the end. Before I get ahead of myself, let me say that a secure channel is an important part of our overall security for SQL Server, but it, in and of itself, isn’t a cure-all for SQL Server logins. David Litchfield and crew of NGSSoftware also found a weakness in the hash SQL Server generates to secure SQL Server passwords. This hash weakness means anyone who manages to get sysadmin rights to our SQL Servers can potentially crack the passwords and use them against us.
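For the curious, the whole deciphering pass from Table 5 can be scripted in a few lines of T-SQL. This is only a sketch of the arithmetic just described (undo the XOR with 0xA5, then swap the nibbles); the captured bytes are hard-coded from the trace above:

DECLARE @trace varbinary(8), @i int, @b int, @x int, @result varchar(8)
SET @trace = 0xA2B39292D25382E3   -- the password bytes from the trace, A5 filler bytes removed
SET @i = 1
SET @result = ''
WHILE @i <= DATALENGTH(@trace)
BEGIN
    SET @b = CONVERT(int, SUBSTRING(@trace, @i, 1))
    SET @x = @b ^ 165                                             -- undo the XOR with 0xA5 (165)
    SET @result = @result + CHAR(((@x & 15) * 16) + (@x / 16))    -- swap the nibbles, map to ASCII
    SET @i = @i + 1
END
PRINT @result                                                     -- prints: password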
Hash Weakness in Password Storage SQL Server login passwords are stored in the sysxlogins system table but thankfully, only sysadmins have the ability to query against it directly. Therefore, this hash weakness isn’t one that can be exploited by a typical end user. The end user would have to somehow get sysadmin level privileges on a SQL Server. One of the ways a user could do this is by exploiting a vulnerability that leads to privilege escalation, something that's always a possibility. The hash weakness was reported in a paper entitled “Microsoft SQL Server Password (Cracking the password hashes)” that was released by NGSSoftware on June 24, 2002. You’ll find a link to the paper under Additional Resources. SQL Server doesn’t store user passwords in plaintext (unencrypted), but instead encrypts them. When SQL Server encrypts a password, it uses an undocumented function called pwdencrypt(). This function produces a hash. Since hash can mean different things based on context, let me define what I mean by a hash (also called a hash value) and a hash function. A hash or hash function is some function that takes a stream of bits or a string of characters and transforms them into another stream of bits or string of characters, usually smaller and of a fixedlength. A good hash function will return very few duplicate hashes, the fewer the better. The reason a good hash function should tend to return unique hashes is because these hashes are often used for comparison, such as with a password check. Hash functions are usually one-way functions, meaning I can’t reverse engineer the original bit stream or string of characters from the hash (as opposed to XOR which is a reverse function of itself). As a result, if I am doing a password check I’ll get the password from the user and I’ll then throw it through the hash function and generate a hash value. I’ll do a comparison of the hash value I’ve just generated against what I have previously stored for the user. If I have a match, I’ll let the user in. As a result, the less chance of a duplicate hash being generated, the better. The pwdencrypt() function is a hash function. It takes a plaintext password and converts it into a hash. Actually, it’s more correct to say two hashes. The pwdencrypt() function first generates what is called a “salt.” In cryptography what we mean by salt is a random string of data that is added to the plaintext before being sent through the hash function. In the case of our pwdencrypt() function, the salt is basically a random integer. It’s a bit more complicated that that, but not by a whole lot. The salt is time-dependent, however. How can we tell? If we execute SELECT pwdencrypt('PASSWORD')
We’ll get different results even if we’re only a second apart. For instance, the first time I ran this query, I received the following hash: 0x0100DE1E92554314EE57B322B8A89BF76E61A846A801D145FCAF4314EE57B322B8A89BF76E61A846A801D145FCAF The second time I ran the query, I received this hash (which is clearly different): 0x01000F1F5C4BFFEE2BEFFA7D8B8AF3B519F2D7D89F2D4DAEDF49FFEE2BEFFA7D8B8AF3B519F2D7D89F2D4DAEDF49 First, the pwdencrypt() function takes the password and converts it to Unicode if it isn’t already. It then adds the salt to the end of the password. This is the plaintext it sends through an algorithm known as the Secure Hashing Algorithm (SHA). SHA will generate a ciphertext (the encrypted characters) that pwdencrypt() will temporarily put to the side. Then pwdencrypt() takes the password and makes it all uppercase. Once again, it’ll append the salt to the end and send the resulting combination through SHA. Finally, pwdencrypt() will combine a standard static code (0x0100 in hexadecimal), the salt, the first ciphertext (password in the original case), and the second ciphertext (password in all uppercase) to create the password hash stored in sysxlogins. I’m not sure why the all-uppercase version of the password is included, but needless to say, it weakens the SQL Server password "hash." Since I only have to match against uppercase letters, I’ve eliminated 26 possible characters (the lowercase ones) to figure out what the password is. Granted, once I discover the password I won’t know the case of the individual characters, but to figure out the case is trivial. If I can find out that a user has a password of say “TRICERATOPS,” I can then build a quick little program to try every possibility of case for the word triceratops. Triceratops has 11 letters, so there are 2^11 possible combinations. That’s only 2048 different possibilities. A script or program can test each possibility until it gets a match. Remember SQL Server 7 and 2000 do not have account lockout policies for too many login failures. Consider that if I didn't have the all-uppercase version of the password I’d have to brute force every single dictionary word and every single possible case. That means just to test triceratops to see if it were the password (regardless of case), I’d have to run up to 2048 attempts instead of one. I would have to test every possible case combination for every single word. I couldn’t just test the word. But since the all-uppercase version is part of what is stored in sysxlogins, the number of attempts I may have to make to crack the password decreases drastically. Let's look at an example. An 8-character dictionary word has 256 (2^8) possible case combinations. I’ve been told the SQL Server account uses 1 of 8 words, all of them 8 characters in length (a controlled test). If I have to run through these 8 words and I have to potentially try every single case combination, I may have to try up to 256 x 8 = 2048 combinations. If I can test just all-uppercase words to find a match, I would have to test just 8 times to get the word. Then I’d have to run up to 256 combinations to find the exact password. Instead of 256 x 8, I’m looking at a maximum of 256 + 8 = 264 combinations. Now extrapolate this out to the entire Webster’s dictionary. The algorithm to attempt a dictionary attack against a SQL Server password hash isn’t very long or difficult. I’ve pretty much explained it in this section.
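The arithmetic is easy to check from Query Analyzer; an 11-letter word has 2^11 case combinations:

-- 'triceratops' has 11 letters, so 2 to the 11th power case combinations.
SELECT LEN('triceratops') AS Letters,
       POWER(2, LEN('triceratops')) AS CaseCombinations   -- 2048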
And when NGSSoftware put out the paper revealing the weakness, they also included source code in VC++ to attempt such a crack. The program isn’t hard and it isn’t very complex, but it does require the Windows 2000 Software Development Kit (SDK) because it needs the CryptoAPI that’s part of the SDK. Figure 1 shows an example of the compiled source code in action against one of the password hashes from earlier.
Figure 1. Cracking the password hash. NGSSoftware has additional tools that are GUI-based to perform similar tests but with a much nicer interface and a few more features. These two tools are called NGSSQLCrack and NGSSQuirrel. NGSSQLCrack does have the ability to perform a brute force attack should the dictionary attack fail. I've included a link to Steve Jones' reviews of both products in the Additional Resources section. Most password hacking programs will attempt a dictionary attack first. Since dictionaries are easy to find in electronic form, people who use a password found in a dictionary are opening themselves up to having their passwords hacked. Too many programs can run through an entire dictionary listing in a very short time. SQL Server passwords are no different. In reality, if a user chooses a strong password, one with alphabetic and numeric characters as well as a special character that’s at least six characters long, the password is reasonably secure. I say reasonably, because someone who can bring the proper computer resources to bear will eventually be able to crack the password. The mechanism encrypting SQL Server passwords isn’t such that it is unreasonable for an attacker to be able to crack them, should the hacker get a hold of the hash. Tip: When I attempted to compile the VC++ code presented in NGSSoftware’s article on cracking SQL Server passwords, VC++ did return 1 compile error with regards to the following line of code: wp = &uwttf; The error VC++ returned indicated that it wouldn’t carry out the implicit conversion. I had to modify the line to read: wp = (char *) &uwttf; in order to generate a successful compile. As they say on the newsgroups, “Your mileage may vary.”
Concluding Thoughts Microsoft recommends Windows authentication because of single sign-on and also to reduce administrative overhead. These two reasons are good enough to use Windows authentication whenever possible. However, there are times when DBAs are forced to use SQL Server logins because that's all a program will support. There's not a whole lot we can do about the authentication method in those cases. But in cases where we do have a choice, such as a home grown application, the choice should usually point in the direction of Windows authentication. In addition to Microsoft's two main reasons, another reason is the weaknesses in how the passwords are transmitted and how they are stored. I did say weaknesses but keep in mind to consider the mitigating circumstances. To dispel the FUD (Fear, Uncertainty, and Doubt), let's consider a couple of things. In the first case, you typically have to have a rogue network engineer. If that's the case, SQL Server access isn't the only, nor necessarily the most critical issue facing an organization. Anyone with half a lick of creativity can imagine what such an empowered individual could do. This doesn't mean we shouldn't take steps to reduce our vulnerability, but it also doesn't mean we should go around with our hands in the air screaming, "The sky is falling!" In the second case, you need sysadmin privileges to access the sysxlogins table. Yes, even without a rogue DBA, there is always the possibility of a privilege escalation where a process somehow gets itself to sysadmin rights. NGSSoftware has a paper on that very
possibility. But keep in mind that passwords aren't the only things that will be vulnerable. The data is there, too. Also, the more complex the password, the harder it is to crack, even if you do have an advantage of only having to get the upper-case letters. The fact is, if you don't mix in numbers and symbols, the passwords become relatively easy to crack. It's all about password complexity.
Additional Resources
• Threat Profiling Microsoft SQL Server - NGSSoftware
• Weak Password Obfuscation Scheme (Modified) in MS SQL Server - Network Intelligence India Pvt. Ltd.
• Microsoft SQL Server Passwords (Cracking the password hashes) - NGSSoftware
• Review: MSSQLCrack - Steve Jones
• Review: NGSSquirrel - Steve Jones
© 2003 by K. Brian Kelley. http://www.truthsolutions.com/ Author of Start to Finish Guide to SQL Server Performance Monitoring.
SQL Server Security: Why Security Is Important Brian Kelley 7/31/2003 Typically I write about technical solutions to SQL Server problems. This is true whether I'm writing about performance, security, or disaster recovery. This article will be atypical in that respect because it'll consist of case studies that point out why security is critical. All of the case studies will be on security incidents that have made the news. These incidents also involve compromised databases. Not all of them involve SQL Server but point out a fundamental axiom: databases are primary targets because they are information stores. As SQL Server DBAs, we have to realize that SQL Server is growing in market share. Microsoft’s SQL Server is an enterprise-class database platform more and more companies are using to store critical and sensitive data. SQL Server is an important cog in Microsoft’s .NET enterprise server architecture. It’s easy to use, takes fewer resources to maintain and performance tune than other comparable database platforms, and it’s reasonably priced. All these facts mean SQL Server has a big target painted on the side of your server that says, “I’m important. Come and get me.” If your SQL Servers aren’t secured, someone will. Others have found out the hard way. Here are some examples.
RealNames In February 2000, C|Net, one of the more prominent tech news organizations, reported the company RealNames informed customers that its customer information database had been breached and the attackers had walked off with valuable information, including credit card numbers. RealNames was in the business of making complex web addresses accessible by the use of keywords. Anyone could go to the RealNames site, register and pay via credit card, and thus get keywords associated with their website. It was this customer database the attackers broke into. RealNames is no longer in business, and though the reason for RealNames closing its doors has nothing to do with this security breach, obviously the breach caused numerous issues for RealNames’ customers. Credit card numbers were valuable then and they are now. RealNames’ customers most certainly had to go through the process of canceling any and all credit cards they might have used on the RealNames site and acquiring new ones. At least RealNames acted in a respectable manner, sending an email to its customer base within 24 hours of discovering the breach. RealNames then went and hired security firm Internet Security Systems (ISS) to conduct an audit and protect against future security breaches. But the fact remains that up to 50,000 customers might have had their credit card information readily accessible by an attacker, one who had been on the system undetected for at least a few days.
World Economic Forum About a year later (Feb 2001), crackers from the group Virtual Monkeywrench announced they had hacked into the registration database of the World Economic Forum (WEF). What did they get? The group captured credit card numbers, personal addresses and emails, home and cell phone numbers, and passport information for all who had attended the WEF in the previous three years. Virtual Monkeywrench then passed this information on to a Zurich, Switzerland, newspaper that published some of it on the newspaper’s website. Among the information published: Bill Gates’ email address, Amazon.com head Jeff Bezos’ home phone number, and a credit card number for PepsiCo Beverages CEO Peter Thompson. But the fun doesn’t stop there. The group was also able to grab the database passwords of participants such as former US President Bill Clinton, Russian President Vladimir Putin, and Palestinian Leader Yasser Arafat. The newspaper that had received all the information, SonntagsZeitung, reported the crackers had turned over a CD-ROM with 800,000 pages of data! Note: In the hacker community, hacker isn’t a negative word. Hacker is used to describe one who investigates out of curiosity. A hacker isn’t necessarily one who breaks into systems for personal gain, though that’s how the media uses the term. The community prefers the term crackers for those people who maliciously seek to break into or compromise systems.
Midwest Express and Others The one thing I will say about the previous two examples is SQL Server wasn’t explicitly mentioned. The first example, RealNames, had an NT 4.0 server running SP 5 compromised, but this was the front-end or web server, according to the InternetNews.com article on the hack (see the Additional Resources section). Neither report specifically pointed a finger at SQL Server, or I should say “Microsoft SQL,” which users will read as Microsoft SQL Server even if it isn’t. Midwest Express, an airline, was hacked in April 2002 and its flight schedule and passenger manifest were stolen. The people who carried out the attack, calling themselves the Deceptive Duo, then hit another site, the US Space and Naval Warfare Systems Command, and posted Midwest Express’ passenger manifest, complete with names and emails. But the Deceptive Duo didn’t stop there. They also hacked into several government agencies and banks. One of their methods of attack was to compromise SQL Servers with a “default” password. Since both SQL Server 7.0 and SQL Server 2000 allow for the sa account to have a blank password, this is probably what they meant since SQL Server 7.0’s install doesn’t even prompt for one (though to get a blank password in SQL Server 2000 I have to knowingly choose to leave the password blank during the install). So far as the Deceptive Duo was concerned, targeting SQL Server was part of their plan. If we put ourselves in the minds of the hacker, that plan makes perfect sense. If the default install of SQL Server is ripe for the plucking, why bother with anything else? And that’s probably what the Deceptive Duo thought, too. Note: SQL Server 7.0 isn’t alone in leaving the sa password blank, a frequent item of consternation for security experts. The open source database, MySQL, is installed by default with no password for root, the super user account for the database. I haven't kept up with the most recent versions, so if this behavior has changed, please let me know in the comments section. The bottom-line, regardless of database platform, is this: secure all privileged accounts with strong passwords immediately.
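If you want a quick sanity check of your own SQL Server 7.0/2000 servers, a query along these lines lists standard logins that have no password at all (a sketch only; weak but non-blank passwords need more than this):

-- Standard (non-Windows) logins that have no password set.
SELECT name
FROM master.dbo.syslogins
WHERE isntname = 0        -- exclude Windows logins and groups
  AND password IS NULL    -- NULL here means a blank password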
Attack of the Worms All three of these attacks were targeted. Attackers deliberately went after selected systems to try and break in. Could someone fashion an attack that goes after SQL Servers one-by-one? The answer to that question is a resounding and earth-shattering, “Yes!” In November 2001, the first of two worms targeted at SQL Server was released into the wild. Security researchers reported a new worm that attempted to log on to servers using the standard Microsoft SQL Server TCP port (1433), the sa account, and a blank password. Since the default port for a SQL Server 7 installation is 1433 and since the setup program doesn’t prompt for a sa password during installation, quite a few SQL Servers were vulnerable to this worm. The worm had successfully infected a small
number of systems before it was detected. Because security researchers discovered the worm very quickly, most of the systems vulnerable to it were never attacked. Note: Brian Knight alerted the SQLServerCentral.com community about what the worm did with his article Security Alert: SQL Server Worm Virus Attacking Systems. The security community reaction was swift and security experts quickly asked the owner of the FTP server to remove the file that was downloaded by this worm. The owner did so. Then security experts sent out announcements through sources such as CERT (http://www.cert.org), the SANS Institute (http://www.sans.org) and NTBugTraq (http://www.ntbugtraq.com). Because security teams responded so quickly, W32.Cblade.Worm was relatively minor in scope. It attacked in a predictable way and was easily defendable. This worm should have served as a wake-up call for the SQL Server community. Instead, it ended up being a warning shot before the main firefight. In early May 2002 security researchers started reporting increased port scanning for TCP port 1433. Chip Andrews put a notice on his SQLSecurity.com site on May 4, 2002. Then on May 28th, security researchers discovered another worm in the wild. The community gave the new worm several different names; among them were SQLSnake and Digispid.B.Worm. I’ll stick with the latter. The attack pattern for Digispid.B.Worm was the same as for W32.Cblade.Worm: make a connection to TCP port 1433 and attempt to log in as sa with a blank password. However, this worm was more aggressive and it was able to infect more systems than W32.Cblade.Worm. Various sources report the number of systems infected ranged from a hundred to thousands. Cblade was a warning but many DBAs, system administrators and network engineers didn’t heed it. These personnel weren’t prepared and hadn’t locked down their SQL Servers. As a result, Digispid.B.Worm was able to get a foothold. This new worm was more intrusive than W32.Cblade.Worm because it was far more aggressive. It required some cleanup, but it pales in comparison to the next major worm attack. Note: Brian once again covered this worm for the SQLServerCentral.com community with his article: SQLsnake Worm Hits SQL Servers and Networks Hard.
SQL Slammer If you are a SQL Server DBA and you haven't heard about SQL Slammer, now's the time for a quick education. In January 2003, the hammer came down. A SQL Server worm hit on Friday, January 24th, and moved so fast that it effectively became a Denial of Service attack against any network where it got a foothold, including portions of the Internet. SQL Slammer has been to date the most aggressive worm the world has seen, bar none. I've included a link to a study detailing its propagation in the Additional Resources section. SQL Slammer attacked UDP port 1434, the listener port for SQL Server 2000 (SQL Server 7.0 was not vulnerable). Clients use this port when they are trying to discover what SQL Servers are on the network, such as when you pull down the drop-down list in Query Analyzer. Clients also use this port to identify what TCP port to connect to for a named instance. Keep in mind the default instance typically listens on TCP 1433. If you have a named instance, it's not going to be able to use the same port. Since SQL Server 2000 will randomly assign a port number when you install the named instance, it could be anything. The way to deal with this issue is to have that listener service. The client contacts it and finds out what TCP port to use. It can then connect and all of this occurs seamlessly for the user. The problem is that this port is set for every single SQL Server 2000 installation. A target and it's not moving! If you are running a SQL Server that doesn't require network access (local use only), as of SP3a you can disable this listener port, but this wasn't the case when SQL Slammer hit. SQL Slammer took advantage of a buffer overflow attack on this listener service. What really drove the security community nuts was that a patch had been made available some six months before! NGSSoftware even had demonstration code that showed what the exploit could do (and the worm "writer" used it heavily). A lot of systems weren't patched. Some systems were patched but it was later found out that a certain Microsoft hotfix to repair an unrelated issue replaced critical files. The files changed to prevent the buffer overflow were overwritten with older versions that were vulnerable. In other words, the unrelated hotfix made systems vulnerable again. All in all, it was a very big mess that required a lot of people working a lot of long hours. Even for companies that weren't hit, every SQL Server in inventory had to be checked and verified. When you include MSDE builds, this was a very sizeable effort.
Note: Brian's write-up on this site is: Another SQL Server Virus Hits the Internet. He followed it up with Who's to Blame for the SQL Slammer Virus.
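Part of that verification is simply knowing which build and service pack each server is running, which SERVERPROPERTY reports directly on SQL Server 2000:

-- ProductLevel returns RTM, SP1, SP2, etc.; ProductVersion returns the build number.
SELECT SERVERPROPERTY('ProductVersion') AS Build,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel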
PetCo.Com This one is my new favorite when I talk to developers within my organization about security. Though SQL Injection has gotten a lot of press in recent days, Guess (the clothing manufacturer) and PetCo.Com fell victim to this now classic vulnerability. In February 2002, Guess' website was compromised by a SQL Injection attack which netted attackers an unknown number of customer credit card numbers. So you'd figure by June 2003 every major player on the Internet would have learned their lesson, right? Not quite. Not long after Guess settled with the FTC, Jeremiah Jacks discovered PetCo.com was vulnerable to the exact same SQL Injection attack he discovered on the Guess site. What would the payoff have been if he were a malicious cracker? The prize was a database with about 500,000 credit card entries complete with names, addresses, and order information. How did he do it? He used Google to search for pages that were likely to be vulnerable then tried to use an injection attack. He estimated it took less than a minute to be successful! Imagine finding a major vulnerability for one retailer and then almost a year and a half later finding the same type of vulnerability, one easily patched mind you, on another major retailer. It's enough to drive one insane. But that's what this security researcher found. Hopefully we've all received our wake-up call, but don't be surprised if another major target falls to SQL Injection in the near future.
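The common thread in these incidents is SQL built by string concatenation. As a minimal illustration (the Customers table and LastName column are made up for the example), compare a concatenated statement with one that passes the value as a parameter through sp_executesql:

-- Vulnerable: user input is pasted straight into the statement text.
DECLARE @name varchar(100), @sql nvarchar(500)
SET @name = 'Smith'' OR 1=1 --'     -- a classic injection payload
SET @sql = N'SELECT * FROM Customers WHERE LastName = ''' + @name + ''''
-- EXEC (@sql) would now return every row in the table.

-- Safer: the value is bound as a parameter and is never interpreted as SQL.
EXEC sp_executesql
    N'SELECT * FROM Customers WHERE LastName = @p',
    N'@p varchar(100)',
    @p = 'Smith'' OR 1=1 --'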
Concluding Remarks These six case studies represent a fraction of the literature available on cases where databases have been breached. All involve configurations that weren't secure. In most cases, simple security procedures would have stopped an attacker cold but for whatever reason these procedures weren't done. After several of these high profile cases, it would seem logical that security would receive a heavy focus from companies and organizations but the reality is things haven't changed a lot. A recent survey showed that even after the incidents on 9/11, when companies were forced to take a hard look at not only their disaster recovery but also their security procedures, very little change has occurred. This is somewhat disheartening, but not altogether surprising. Proper security takes time and effort. Often it's an afterthought on projects and seen as an impediment for delivery on-time and on-schedule. I've faced this issue myself when supporting recent projects. The difference between the ones that take security into account from the beginning as opposed to waiting until the last minute is like night and day. Somehow we have to effect a corporate culture change where security is of paramount concern. Hopefully these case studies start the discussions in your own circles that may bring about that change where you work. Remember the mantra (double negative intended): Just because I'm paranoid doesn't mean someone is not out to get me.
Additional Resources
• RealNames is Latest Hack Victim, InternetNews.com
• RealNames' Customer Database Hacked, C|Net News.com
• Davos Hack: 'Good' Sabotage, Wired News (World Economic Forum article)
• Hackers Say They Hack for Our Sake, PCWorld (Deceptive Duo article)
• Airline Database Posted on Defacement, InternetNews.com (Midwest Express article)
• Analysis of the Sapphire Worm, CAIDA (SQL Slammer)
• FTC settles with Guess on Web vulnerabilities, InfoWorld
• PetCo Plugs Credit Card Leak, SecurityFocus
• Nearly two years after 9/11, corporate security focus still lacking, ComputerWorld
© 2003 by K. Brian Kelley. http://www.truthsolutions.com/ Author of Start to Finish Guide to SQL Server Performance Monitoring.
TSQL Virus or Bomb? Joseph Gama 12/29/2003 Yes, the first virus made in TSQL has been created! But even more dangerous, worms can be made applying similar but simpler techniques. What could be worse than that? Time bombs hidden somewhere in the code, waiting…
Screenshots of the TSQL virus in action: before and after.
Before we get into the facts, some definitions from cybercrimes.net : Definition of virus "A computer virus is a self-replicating program that invades and attaches itself to computer programs. Virii can interfere with the operations of their host program or alter operations of their host computer." Definition of worm "A worm is a program whose primary task is to move copies of itself between computers connected by network. Though worms do not try to cause damage to a computer, by causing copies of itself to be made a worm can disrupt the operation of computers and computer networks." Definition of time bomb "A time or logic bomb is a specific program feature. A program with a time bomb will "explode" upon the occurrence of a certain event – often the occurrence of a predetermined date. The explosion can be anything from a display of messages on the computer screen to the complete wipe of the computer's system." The most complex of those three entities is the virus, which requires intrusion, execution and replication of its code. The intrusion is theoretically impossible in a SQL Server database properly secured. As TSQL has no "low level" features, port scanning and intrusion are not possible.
How can the virus infect a database? It will have to be executed by a user. There are three possible scenarios:
a) An unhappy user deliberately executes the code (probably before being laid off).
b) A user will execute some code of uncertain origin containing the virus.
c) An intruder gained access to the database and executed the viral code.
TSQL virii are not a threat This is very clear from the above scenarios. Scenario a) requires more effort and is more likely to be detected than a time bomb; it makes no sense to do an inside job that is so visible and complicated. Scenario b) would be possible if the user had permissions to run the code, knowing enough TSQL to create a stored procedure but not enough to understand what the code really does. This is very unlikely to happen. Scenario c) is obviously very far from reality. An intruder would go through a lot of work to gain access to the database, and dropping some tables or the entire database could be done immediately, so why wait? But there's more: TSQL data types used in stored procedures can't go over 8 KB. This is a great obstacle because the virus code takes some room, and so the virus can only replicate to "small" stored procedures, which makes it more visible and easier to detect.
TSQL worms are not a threat A worm would face the same problems that a virus would, but it would be detected much faster and easily stopped. The "standard" worm that replicates constantly would simply be coded as a stored procedure that makes copies of itself with a random name. That is easy to create and easy to remove. The best approach would be a stored procedure that consumes resources by constantly creating lots of temporary tables, filling them with data from system tables, and then dropping them. But why bother with a stored procedure that lies among the others, with code that wouldn't be easy to disguise, when the same code could be hidden in some other stored procedure?
Conclusion: time bombs are the most real and dangerous threat The three scenarios for delivering a virus are perfectly possible and quite easy and effective for a time bomb. Let's rewrite them for this situation:
a) An unhappy user deliberately hides the time bomb code in a section of a big stored procedure.
b) A careless user copies code from an uncertain origin that has the time bomb hidden.
c) An intruder was able to gain access to the database and, instead of causing an immediate destruction, the intruder decided to place a time bomb that would slowly and randomly corrupt data so that even the backups would be storing corrupted versions of the database.
This is the most dangerous and most realistic attack that I can think of; after all, bad coding can have an impact on the server as negative as a sneaky and pernicious worm.
How to prevent TSQL virii, worms and time bomb attacks:
• No guest accounts and no accounts with null passwords.
• Make sure all user passwords are safe.
• User permissions and roles are very effective when used wisely.
• Check database objects regularly.
• Do not allow user passwords that are not encrypted.
• Check the system databases and objects not only for changes but also for the inclusion of new objects that could have dangerous code. A user could create a system stored procedure by knowing of an exploit before it was patched and obtaining the necessary rights with privilege escalation. Later the user could run it from any database. Another possibility would be to use tempdb to store or execute malicious code.
Practical solutions for each of the above ideas:
1) Carefully examine user permissions and roles. Restrict access to everything but what the user needs to work. Look for null or non-encrypted passwords:
SELECT [name], dbname, [password], CONVERT(varbinary(256), password) FROM syslogins
2) Look at the size, complexity and effectiveness of user passwords. The best solution is to create random passwords for the users, but forcing the users to have long passwords is fine too. You can use some free tools from SQLServerCentral.com to assist you.
3) Create a huge and unreadable SA password. Make sure that your application is safe from SQL injection and be careful with granting permissions and managing roles. Carefully attribute roles and permissions. Even the "public" role can be dangerous.
4) Check stored procedures, UDFs and triggers for changes using CRC32 or a checksum (see the sketch below): http://www.sqlservercentral.com/memberservices/updatescript.asp?Approve=y&scriptid=655 Or changes in size: http://www.sqlservercentral.com/memberservices/updatescript.asp?Approve=y&scriptid=630
5) See 1)
6) See 4)
How to detect data corruption:
1) Use the Database Consistency Checker (DBCC).
2) Use TSQL BINARY_CHECKSUM or CHECKSUM. For multiple rows use CHECKSUM_AGG.
3) Compare tables from backups with new ones if they seem to have changed drastically.
4) Create views to verify that the numeric data is within "normal parameters"; look for max and min values and sums of values to find possibly corrupted data.
5) If the data is alphanumeric, look for ASCII or Unicode codes that should not be there, empty records or oversized ones.
6) Look for NULLs, zeros and repeated values.
7) Use CRC32 to validate historic data.
How to detect intrusion:
1) Enable login auditing in SQL Server.
2) Use triggers to track changes.
3) Explore the transaction log with a commercial tool.
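As a concrete sketch of practical solution 4, the baseline-and-compare approach might look like the following. The ProcChecksums table name is made up for the example, and only the first 4000-character chunk of each procedure (colid = 1 in syscomments) is checksummed to keep it short:

-- One-time baseline: a checksum of each unencrypted stored procedure's source text.
CREATE TABLE ProcChecksums (
    name sysname NOT NULL,
    checksum_value int NOT NULL,
    captured datetime NOT NULL DEFAULT GETDATE()
)

INSERT INTO ProcChecksums (name, checksum_value)
SELECT o.name, BINARY_CHECKSUM(c.text)
FROM sysobjects o
INNER JOIN syscomments c ON c.id = o.id
WHERE o.type = 'P' AND c.colid = 1 AND c.encrypted = 0

-- Later: list procedures whose text no longer matches the baseline.
SELECT o.name
FROM sysobjects o
INNER JOIN syscomments c ON c.id = o.id
INNER JOIN ProcChecksums p ON p.name = o.name
WHERE o.type = 'P' AND c.colid = 1 AND c.encrypted = 0
  AND BINARY_CHECKSUM(c.text) <> p.checksum_value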
The softest spot of SQL Server can be Windows Windows NT/2000 logins always have access granted to SQL Server. This is an extra security risk because breaking into Windows will provide access to SQL Server, and it might be easier (in very particular situations) to crack Windows security than SQL Server's. Windows 2000 is safer than NT, and even NT has very tight security if properly installed and with the latest service packs and patches applied. The problem arises from the fact that there might be one machine with SQL Server but dozens of machines in the network that can reach it, and the permissions are loose. It is easier to find out the password for one out of dozens of machines than the one for the machine with SQL Server. It is also possible to have one of the users download or receive by email a Trojan, or run ActiveX in a web page, or use any other technique to get access to a machine and, from there, attack SQL Server. Windows 9x/ME is very unlikely to be used as a server, and although it does not provide access granted to SQL Server, it can be hacked, and a brute force attack, sniffing or even key logging are all possible.
Avoid mixed mode and Windows 9x/ME Most real-life database implementations have a certain number of users, databases and database objects related to each other in a way that requires careful management in order to allow access without compromising security. Windows authentication is the recommended security mode, not only because of the Windows architecture but also because login names and passwords are not sent over the network. If the OS is not NT/2000 then mixed mode has to be used, but Windows 9x/ME have so many security flaws that they should be avoided at all cost!
Do not be permissive with permissions
a) Each database has specific user accounts; do not let users access databases they really have no need for.
b) Do not provide users with permissions and ownership of objects in the database that they really have no need for.
c) Do not allow one login to have associated users in different databases if several people share that login, unless absolutely necessary. Splitting the group of users into smaller groups each with a different login would be safer and easier to manage in the future.
In case of doubt, search the code It is very simple to create code to look for potentially dangerous keywords in stored procedures, triggers and UDFs. The following example code looks for "EXEC" in all stored procedures:

-- Walk every unencrypted user stored procedure and print any line that contains 'exec'.
DECLARE @i int, @j int, @current_proc varchar(255), @current_text varchar(8000)

DECLARE _Cursor CURSOR FOR
SELECT o.name, c.text
FROM sysobjects o
INNER JOIN syscomments c ON c.id = o.id
WHERE o.type = 'P' AND o.category = 0 AND c.encrypted = 0

OPEN _Cursor
FETCH NEXT FROM _Cursor INTO @current_proc, @current_text
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @i = 0
lblAgain:
    SET @i = CHARINDEX('exec', @current_text, @i)
    SET @j = CHARINDEX(CHAR(13), @current_text, @i) - @i   -- length up to the end of the line
    IF @j < 0 SET @j = DATALENGTH(@current_text) - @i + 1
    IF @i > 0
    BEGIN
        PRINT @current_proc
        PRINT '    ' + SUBSTRING(@current_text, @i, @j)
        SET @i = @i + 1
        GOTO lblAgain
    END
    FETCH NEXT FROM _Cursor INTO @current_proc, @current_text
END
CLOSE _Cursor
DEALLOCATE _Cursor
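A simpler, if blunter, variant lets the server do the pattern matching and covers triggers and user-defined functions as well (whether the match is case-sensitive depends on your collation):

-- Flag any procedure, trigger, or function whose source text mentions EXEC.
SELECT o.name, o.type
FROM sysobjects o
INNER JOIN syscomments c ON c.id = o.id
WHERE o.type IN ('P', 'TR', 'FN', 'IF', 'TF')
  AND o.category = 0
  AND c.encrypted = 0
  AND c.text LIKE '%exec%'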
References
http://vyaskn.tripod.com/sql_server_security_best_practices.htm
http://cybercrimes.net/98MSCCC/Article4/commentarysection403.html
Performance
The hottest topic in most every company, achieving better performance is an ongoing challenge and an essential part of any DBA's job. While Moore's Law helps with faster and faster hardware, code bloat, larger data sets and other factors mean that the DBA must have a few tricks up their sleeve in order to tune their databases to the optimum level.

Cluster That Index – Part 1 ........................... Christoffer Hedgate ....... 86
Cluster That Index – Part 2 ........................... Christoffer Hedgate ....... 88
Managing Max Degree of Parallelism .................... Herve Roggero ............. 90
Monitoring Performance ................................ Viktor Gorodnichenko ...... 92
Squeezing Wasted Full Scans out of SQL Server Agent ... Bob Musser ................ 97
Troubleshooting Dynamic SQL ........................... Lowell Smith .............. 98
Who Cares about FillFactor? ........................... Gregory Jackson ........... 100
Cluster That Index! – Part 1 Christoffer Hedgate 3/30/2003 One topic that is sometimes discussed in SQL Server communities is whether or not you should always have clustered indexes on your tables. Andy Warren discussed this briefly in one of his articles in the Worst Practices series (Not Using Primary Keys and Clustered Indexes); here I will give my view on this matter. I will show you why I think you should always have clustered indexes on your tables, and hopefully you might learn something new about clustered indexes as well.
What is a clustered index
First off, we'll go through what a clustered index is. SQL Server has two types of indexes, clustered indexes and non-clustered indexes. Both types are organized in the same way with a b-tree structure. The difference between them lies in what the leaf-level nodes – the lowest level of the tree – contain. In a clustered index the leaf level is the data, while the leaves of a non-clustered index contain bookmarks to the actual data. This means that for a table that has a clustered index, the data is actually stored in the order of the index. What the bookmarks of the non-clustered index point to depends on whether the table also has a clustered index or not. If it does have a clustered index then the leaves of non-clustered indexes will contain the clustering key – the specific value(s) of the column(s) that make up the clustered index – for each row. If the table does not have a clustered index it is known as a heap table and the bookmarks in non-clustered indexes are in RID format (File#:Page#:Slot#), i.e. direct pointers to the physical location the row is stored in. Later in this article we will see why this difference is important. To make sure that everyone understands the difference between a clustered index and a non-clustered index I have visualized them in these two images (clustered | non-clustered). The indexes correspond to those of this table:

CREATE TABLE EMPLOYEES (
    empid int NOT NULL CONSTRAINT ix_pkEMPLOYEES PRIMARY KEY NONCLUSTERED,
    name varchar(25) NOT NULL,
    age tinyint NOT NULL
)

CREATE CLUSTERED INDEX ixcEMPLOYEES ON EMPLOYEES (name)

INSERT INTO EMPLOYEES (empid, name, age) VALUES (1, 'David', 42)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (2, 'Tom', 31)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (3, 'Adam', 27)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (4, 'John', 22)

SELECT * FROM EMPLOYEES WHERE name = 'John'
SELECT * FROM EMPLOYEES WHERE empid = 1
In the real indexes these four rows would fit on the same page, but for this discussion I've just put one row on each page. So, to return results for the first query containing WHERE name = 'John' SQL Server will traverse the clustered index from the root down through the intermediate node levels until it finds the leaf page containing John, and it would have all the data available to return for the query. But to return results for the second query, it will traverse the non-clustered index until it finds the leaf page containing empid 1, then use the clustering key found there for empid 1 (David) for a lookup in the clustered index to find the remaining data (in this case just the column age is missing). You can see this for yourself by viewing the execution plan for the queries in Query Analyzer (press Ctrl-K to see the plan).
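If you prefer a script to the Ctrl-K graphical plan, a small sketch using the textual showplan gives the same information; the operator names in the comments are what you would typically expect to see in SQL Server 2000 for this table, not guaranteed output:

SET SHOWPLAN_TEXT ON
GO
SELECT * FROM EMPLOYEES WHERE name = 'John'   -- typically a Clustered Index Seek
SELECT * FROM EMPLOYEES WHERE empid = 1       -- typically an Index Seek plus a Bookmark Lookup
GO
SET SHOWPLAN_TEXT OFF
GO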
Disadvantages of having a clustered index
Although my general opinion is that you should always have a clustered index on a table, there are a few minor disadvantages with them that in some special circumstances might justify not having one. First of all, the lookup operation for bookmarks in non-clustered indexes is of course faster if the bookmark contains a direct pointer to the data in RID format, since looking up the clustering key in a clustered index requires extra page reads. However, since this operation is very quick it will only matter in some very specific cases.
The other possible disadvantage of clustered indexes is that inserts might suffer a little from the page splits that can be necessary to add a row to the table. Because the data is stored in the order of the index, to insert a new row SQL Server must find the page with the two rows between which the new row shall be placed. Then, if there is not room to fit the row on that page, a split occurs and some of the rows get moved from this page to a newly created one. If the table would have been a heap – a table without a clustered index – the row would just have been placed on any page with enough space, or a new page if none exists. Some people see this as a big problem with clustered indexes, but many of them actually misunderstand how they work. When we say that the data in clustered indexes are stored in order of the index, this doesn't mean that all the data pages are physically stored in order on disk. If it actually was this way, it would mean that in order to do a page split to fit a new row, all following pages would have to be physically moved one 'step'. As I said, this is of course not how it works. By saying that data is stored in order of the index we only mean that the data on each page is stored in order. The pages themselves are stored in a doubly linked list, with the pointers for the list (i.e. the page chain) in order. This means that if a page split does occur, the new page can still be physically placed anywhere on the disk, it's just the pointers of the pages prior and next to it that need to be adjusted. So once again, this is actually a pretty small issue, and as you will see later in the article there are possible problems of not having a clustered index that can have much more significance than these minor disadvantages.
Advantages of having a clustered index
Apart from avoiding the problems of not having a clustered index described later in this article, the real advantage you can get from a clustered index lies in the fact that it sorts the data. While this will not have any noticeable effect on some queries, i.e. queries that return a single row, it could have a big effect on other queries. You can normally expect that, apart from the disadvantages shown above, a clustered index will not perform worse than non-clustered indexes. And as I said, in some cases it will perform much better. Let's see why. Generally, the biggest performance bottleneck of a database is I/O. Reading data pages from disk is an expensive operation, and even if the pages are already cached in memory you always want to read as few pages as possible. Since the data in a clustered index is stored in order, this means that the rows returned by range searches on the column(s) that are part of the clustered index will be fetched from the same page, or at least from adjacent pages. In contrast, although a non-clustered index could help SQL Server find the rows that satisfy the search condition for the range search, since the rows might be placed on different pages many more data pages must be fetched from disk in order to return the rows for the result set. Even if the pages are cached in memory, each page needs to be read once for every bookmark lookup (one for each hit in the non-clustered index), probably with each page read several times. You can see this for yourself in Script 1 on the web. As you can see, a carefully placed clustered index can speed up specific queries, but since you can only have one clustered index per table (since it actually sorts the data) you need to think about which column(s) to use it for. Unfortunately the default index type when creating a primary key in SQL Server is a clustered index, so if you're using surrogate keys with an auto-incrementing counter, make sure you specify a non-clustered index for those primary keys as you will probably not do range searches on them. Also please note that ordering a result set is a great example of where a clustered index can be great, because, if the data is already physically stored in the same order as you are sorting the result set, the sort operation is (generally) free! However, make sure you don't fall into the trap of depending on the physical ordering of the data. Even though the data is physically stored in one order, this does not mean that the result set will be returned in the same order. If you want an ordered result set, you must always explicitly state the order in which you want it sorted.
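To make the point about surrogate keys and range searches concrete, here is a small sketch with hypothetical table and column names: the identity column stays the primary key but is declared nonclustered, and the clustered index goes on the column that is actually searched by range.

CREATE TABLE OrdersExample (
    OrderID   int IDENTITY(1,1) NOT NULL
        CONSTRAINT pk_OrdersExample PRIMARY KEY NONCLUSTERED,
    OrderDate datetime NOT NULL,
    Amount    money NOT NULL
)
CREATE CLUSTERED INDEX ixcOrdersExample ON OrdersExample (OrderDate)

-- The range search reads adjacent pages, and the sort order is still stated explicitly.
SELECT OrderID, OrderDate, Amount
FROM OrdersExample
WHERE OrderDate >= '20030101' AND OrderDate < '20030201'
ORDER BY OrderDate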
Problems with not having a clustered index
I have now shown the minor disadvantages that might occur from having a clustered index, plus shown how they can speed up some queries very much. However, neither of these facts are what really makes me recommend you to always have a clustered index on your tables. Instead it is the problems that you can run into when not having a clustered index that can really make a difference. There are two major problems with heap tables, fragmentation and forward-pointers. If the data in a heap table becomes fragmented there is no way to defragment it other than to copy all the data into a new table (or other data source), truncate the original table and then copy all data back into it. With a clustered index on the table you would simply either rebuild the index or better yet, simply run DBCC INDEXDEFRAG which is normally better since it is an online operation that doesn't block queries in the same way as rebuilding it. Of course in some cases rebuilding the index completely might actually suit your needs better. The next problem, forward-pointers, is a bit more complicated. As I mentioned earlier, in a non-clustered index on a heap table, the leaf nodes contain bookmarks to the physical location where the rows are stored on disk. This
means that if a row in a heap table must be moved to a different location (i.e. another data page), perhaps because the value of a column of variable length was updated to a larger value and no longer fits on the original page, SQL Server now has a problem. All non-clustered indexes on this table now have incorrect bookmarks. One solution would be to update all bookmarks for the affected row(s) to point at the new physical location(s), but this could take some time and would make the transaction unnecessarily long and would therefore hurt concurrency. Therefore SQL Server uses forward-pointers to solve this problem. What forward-pointers mean is that, instead of updating the bookmarks of non-clustered indexes to point to the new physical location, SQL Server places a reference message at the old location saying that the row has been moved and including a pointer to the new location. In this way the bookmarks of non-clustered indexes can still be used even though they point to the old location of the row. But, it also means that when doing a bookmark lookup from a non-clustered index for a row that has been moved, an extra page read is necessary to follow the forwardpointer to the new page. When retrieving a single row this probably won't even be noticed, but if you're retrieving multiple rows that have been moved from their original location it can have a significant impact. Note that even though the problem stems from the fact that SQL Server can't update the non-clustered index bookmarks, it is not limited to queries using the indexes. The worst case scenario is a query where SQL Server needs to do a table scan of a heap table containing lots of forward-pointers. For each row that has been forwarded SQL Server needs to follow the pointer to the new page to fetch the row, then go back to the page where the forward-pointer was (i.e. the page where row was originally located). So, for every forwarded row, SQL Server needs two extra page reads to complete the scan. If the table would have had a clustered index, the bookmarks of all non-clustered indexes would have been clustering keys for each row, and physically moving a row on disk would of course not have any effect on these. An extreme example of this is shown in Script 2(3). Even though this example may be a bit extreme, forward-pointers are likely to become a problem in tables where rows are sometimes moved, because there is no way in SQL Server to remove forward-pointers from a table.
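As a side note on the defragmentation options mentioned above for tables that do have a clustered index, both are one-line commands; the database, table and index names here are hypothetical, and DBCC DBREINDEX must be run from within the database that owns the table:

DBCC INDEXDEFRAG (MyDatabase, Orders, ixcOrders)   -- online defragmentation
DBCC DBREINDEX ('Orders', 'ixcOrders', 0)          -- full rebuild; 0 keeps the original fill factor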
Summary In this article I have described what a clustered index is and how they differ from non-clustered indexes, and I have also tried to show you why I think that you should always have a clustered index on every table. As I said there are, of course, exceptions, but these are so uncommon that I always check that all tables have clustered indexes as one of the first things I do when performing a database review. Please post your thoughts on this matter in the feedback section.
Cluster That Index – Part 2 Christoffer Hedgate 10/8/2003
I have previously discussed the issue of forward-pointers in the article Cluster that index! –Part 1. I described what forward-pointers are and how they are created by SQL Server. I also supplied a script that showed the effect of forward-pointers, but I did not discuss how to check for the existence of forward-pointers and how to remove them. This article will discuss this.
Recap of problem Forward-pointers are created by SQL Server to avoid making transactions longer than necessary. As described in the article mentioned above, the leaf level pages of non-clustered indexes contain pointers to the data that is indexed by them. If the table that the index is created on has a clustered index created for it, these 'pointers' are bookmark lookup values, each one containing a key value to look up in the clustered index. If the table does not have a clustered index, i.e. a heap table, these pointers point to the actual physical location of the rows in the data files. The problem is that data rows sometimes need to be moved to another data page. One reason is when the value of a variable length column is changed and the row no longer fits into the page where it is located. Now SQL Server must either change all of the pointers for this row (in all non-clustered indexes for the table) to its new location, or it can use forward-pointers. A forward-pointer is simply a pointer left in the original location of the row, pointing to the new location. This way no indexes need to be updated, SQL Server just follows the forward-pointer to the new location of the row when it needs to fetch it. As I said, instead of updating the pointers in all nonclustered indexes each time a row is moved, SQL Server uses forward-pointers to avoid making the transactions longer than necessary.
The problem with forward-pointers is that they can create a lot of extra I/O. When scanning a heap table containing forward-pointers, SQL Server needs two extra page reads for every forward-pointer, which in extreme situations might be very cumbersome. A script that showed this was supplied in the other article.
Checking for forward-pointers
There are two ways in SQL Server to check for the existence of forward-pointers in a heap table. Before we view how to do this, use the following code snippet to create a table to use later:

USE Northwind
GO
IF EXISTS (SELECT * FROM sysobjects WHERE name = 'Orders2')
    DROP TABLE Orders2
GO
SELECT * INTO Orders2 FROM Orders
GO
ALTER TABLE Orders2 ADD BigString varchar(4000)
GO
CREATE NONCLUSTERED INDEX ixOrders2CustomerID ON Orders2 (CustomerID)
GO

The first way to check for forward-pointers is by using DBCC SHOWCONTIG, and supplying the option WITH TABLERESULTS. This option adds extra columns to the output of DBCC SHOWCONTIG; one of them is called ForwardedRecords. This column shows how many records (rows) of the table have been moved to a new location and have a forward-pointer left in their original location. The syntax to run this is shown below, where @id represents the object id of the table you want to check (use OBJECT_ID() to retrieve this id):

DBCC SHOWCONTIG (@id, 0) WITH TABLERESULTS

At the moment the result of this command should show that Orders2 has 0 forwarded records. However, if you run the code below, the result will be very different:

UPDATE Orders2 SET BigString = REPLICATE('-', 4000)
DBCC SHOWCONTIG (@id, 0) WITH TABLERESULTS

This time the table contains 810 forwarded records. The other way to check for forwarded rows is by running DBCC CHECKTABLE for the table you want to check. To have CHECKTABLE return info about forwarded records, trace flag 2509 must be activated, so the following code will return info about forward-pointers in Orders2:

DBCC TRACEON (2509)
DBCC CHECKTABLE ('Orders2')
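For the DBCC SHOWCONTIG approach, a short self-contained sketch of the check for the Orders2 table created above might look like this:

DECLARE @id int
SET @id = OBJECT_ID('Orders2')
DBCC SHOWCONTIG (@id, 0) WITH TABLERESULTS   -- inspect the ForwardedRecords column in the output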
Removing forward-pointers from a table
In the article about clustered indexes I said that there is no way in SQL Server to remove forward-pointers. Although there is no system procedure or DBCC command to simply remove them, there is actually a way to get rid of them. It is a very simple and effective solution, but for large tables it might take some time. Simply create a clustered index for the table, which will update all leaf-level pages of non-clustered indexes to contain bookmark lookup values for the clustered index instead of physical file pointers. Since the leaf-level pages of a clustered index contain the actual data rows in sorted order (as described in the article about clustered indexes), the data rows will need to be resorted and moved, at the same time removing the forward-pointers. If you don't want to keep the clustered index, just drop it and the non-clustered indexes' leaf levels will be changed back into pointers to the physical location of the data rows; however, this time they will point to the actual location of the rows. As a final note, when a database is shrunk, the bookmarks of non-clustered indexes are reassigned and therefore any forward-pointers located on pages that are removed by the shrinking process are removed.
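A minimal sketch of that workaround for the Orders2 example, assuming you do not want to keep the clustered index afterwards; the index name is a hypothetical placeholder:

CREATE CLUSTERED INDEX ixcOrders2Temp ON Orders2 (OrderID)   -- rebuilds the table and removes the forward-pointers
DROP INDEX Orders2.ixcOrders2Temp                            -- optional: turn the table back into a heap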
Managing Max Degree of Parallelism Herve Roggero 6/23/2003
Introduction In situations where your tuned T-SQL statements are pushing the limits of your CPUs, more processing power may be needed. Deploying database servers on two, four or even eight SMP systems is rather straightforward. SQL Server usually scales almost in a linear fashion on up to eight processors. However, some SQL Server installations may require up to 32 processors. In this kind of environment, configuration parameters that are usually ignored in smaller configurations come into play and can offer significant performance improvements. We will take a look at the Maximum Degree of Parallelism (DOP) and see how and why it may make sense to change its default setting.
Parallel Queries Performance Limitations When adding processors to SQL Server, the database engine will evaluate how to best leverage the available processors through internal algorithms. In essence, when receiving a SQL statement to process, SQL Server determines which processors are available, what the overall cost of the query is and executes the query on as many processors as necessary if the cost of the query reaches a configurable threshold. When 4 processors are available on a server, the likelihood of SQL Server using all processors for a complex SELECT statement is pretty high. The same holds true in larger environments. For instance on 16 processors, SQL Server will frequently use 12 or more processors to execute complex SELECT statements. This may turn out to be an issue for a couple of reasons. First, using more processors means managing more threads and requires more cache synchronization. System -> Context Switches/Sec is a measure of this effort. The more processors are used for a process, the higher this counter will be. In addition, SQL Server has more coordination to perform since it needs to slice and regroup the work spread over the processors. Since by default SQL Server will use as many processors as it can, upgrading your SQL Server from 8 to 12 processors may actually degrade the overall performance of your database. Although there are no golden rules, it appears that in most cases using more than 8 processors for a SELECT statement can degrade performance (although this may vary greatly by system).
Enforcing a Maximum DOP
The DOP can be set in two ways. The first way is to include the OPTION (MAXDOP n) keyword in your T-SQL statement. For example, the following query will execute with a maximum of 4 processors, regardless of how many processors have been allocated to SQL Server:

SELECT * FROM master..sysprocesses OPTION (MAXDOP 4)

The other approach is to set the maximum DOP at the database instance level, hence limiting the maximum number of CPUs to be used for any given query. To set this option at the system level, run the following command from Query Analyzer:

EXEC sp_configure 'show advanced option', '1'
RECONFIGURE
GO
sp_configure 'max degree of parallelism', 0
RECONFIGURE
GO

Note that this can be set differently for each instance of SQL Server. So if you have multiple SQL Server instances in the same server, it is possible to specify a different Maximum DOP value for each one. On large SMP systems, setting the maximum DOP to 4 or 8 is not unusual. The default value for this parameter is 0, which allows SQL Server to use all allocated processors. The following test shows the Context Switches/Sec and average response time of a T-SQL statement running off a few million records. The server utilized for this test
was loaded with the /PAE boot.ini option, 16 processors and 8GB of RAM. The statement is as follows (the statement itself is of little importance, but notice the OPTION keyword):

SELECT SUM((UnitPrice - UnitCost) * TotalUnitsSold)
FROM Salesdb..salesdata (NOLOCK)
WHERE SalesYear = 2000
GROUP BY UPC
ORDER BY 1
OPTION (MAXDOP 2)

This statement was loaded 500 times in a table in a format that Profiler could understand. Then four Profilers were loaded on that same server, each running the content of the same table. So SQL Server was receiving four select statements at once. Note the (NOLOCK) hint that forces SQL Server to read the data without generating any locks. The results are as follows:

DOP    Context Switches/Sec    Avg Execution Time (sec)
 2            4,280                     12
 4            5,700                      7.5
 8           10,100                      6
12           11,200                      8.5
16           13,000                      9
As more processors are added to the query (by using the MAXDOP option), the Context Switches/Sec increases up to 13,000, which is expected behavior. This is really a low number, considering that we are only executing 4 statements at any single point in time. This graph shows that starting at 12 processors, the execution time degrades. Although it takes 12 seconds to execute this statement on 2 processors, it takes about 6 seconds on eight CPUs. However, we see that setting the DOP to 12 or 16 degrades the overall performance of our query when compared to a DOP of 8. Leaving the default Maximum Degree of Parallelism value of 0 would yield the same result as the DOP of 16 in our test. Hence, changing the DOP to 8 in our scenario would provide a 30% performance improvement over a DOP of 0 (or 16). Enforcing a system-wide Maximum DOP is a good practice since this allows you to control the maximum number of processors SQL Server will use at any given time, regardless of the statement, as long as the MAXDOP is not used in the query (which would override the global Maximum DOP setting).
Conclusion
SQL Server has many parameters that give you more control over the performance of your databases. Understanding how SQL Server behaves on servers with 8 processors or fewer gives a strong understanding of the capabilities of SQL Server. However, SQL Server offers specific configuration parameters that may give you extra performance on larger systems. The Maximum Degree of Parallelism is a key parameter for environments with 8 or more processors, and allows you to gain control over the maximum number of processors used for a query. When deciding which DOP you should use, careful evaluation of your environment is needed. Certain queries may perform better with a DOP of 4, or even 1. Testing your environment with multiple DOPs should give you the answer. In cases where your database environment functions in both OLTP and OLAP mode (for live reporting), you may consider setting a default DOP for SQL Server that works best for your OLTP system and using the OPTION keyword for your OLAP T-SQL to use the DOP that works best for these queries. Finally, SELECT statements are not the only types of statements that can take advantage of the DOP, especially if your action queries use correlated queries (in which a SELECT statement is found inside an UPDATE statement, for example). The Maximum DOP is an advanced setting, and as such it is wise to test it thoroughly before making a decision in your production environment.
Monitoring Performance Viktor Gorodnichenko 6/9/2003
DBAs are in charge of performance on production SQL Servers and, sure, they hate to hear complaints from end-users that the system is slow. But the truth is that often the complaints are not groundless. As long as performance is the last thing developers care about – after delivering the required functionality, providing a wonderful interface, debugging, and loudly celebrating a new release shipped to production – we have what we have. As a result, managers often ask "What's going on on the server?" and DBAs really need to have a clear and accurate answer.
Getting active processes
For myself I created a stored procedure sp_ActiveProcesses showing what's running on the SQL Server. The sp has one parameter – time interval – to retrieve a snapshot of running processes. Default is 5 sec. It can be decreased to make the snapshot more instantaneous or increased to catch consumers for a longer term. A process to hit into the snapshot must be running at the beginning and at the end of the period. That's why there is no sense in making the interval too long (though maximum allowed is 59 sec). I run the sp so often that I had to create a shortcut in my Query Analyzer (Tools/Customize/Custom ...). Install the sp, set the shortcut assigning the sp, say, to Ctrl-4, press these two buttons and you've got the picture (the list of processes wrapped to fit on the page):

ProcessId  TotalCPU  CPU_ConsumedInTheTimeFragment  TotalPhysical_IO  IO_InTheTimeFragment  ...
55         239       109                            21                10                    ...
85         31328     31                             7521              536                   ...
88         5678      1001                           795               164                   ...

Hostname  ApplicationName  NT_LoginName  DatabaseName  SPIDBuffer
BillS     MSP              Company\Bill  MSP           GetContacts
KirkA     MSP              Company\Kirk  MSP           ReassignTimeApprover
KimN      MSP              Company\Kim   MSP           InsertExpense

TheFragmentDuration  NumberOfCPUs  SUM_CPU_Consumed  SUM_Physical_IO_Committed
5123                 2             1141              710
Maximum SUM_CPU_Consumed can be TheFragmentDuration times NumberOfCPUs. In the example above it is 10246 milliseconds. Just one note about the accuracy of the numbers being showed by the sp. Microsoft Knowledge Base Article 309377 says: In Microsoft SQL Server 2000 (all editions) CPU time for a particular server process ID (SPID) may not accumulate correctly. It is explained as: SQL Server maintains a pool of worker threads that execute the tasks given to it. SQL Server may assign a different thread from this pool to a given SPID to execute a new query batch. When SQL Server assigns a different thread to a SPID, SQL Server does not properly calculate the CPU time to indicate the accumulated CPU time up to that point for the SPID. Microsoft has confirmed this to be a problem in SQL Server 2000. This problem was first corrected in Microsoft SQL Server 2000 Service Pack 2. The wrong calculation almost never happens to user processes, which are most important for us, even if there is no SP2 for SQL Server 2000 installed. On the contrary, I saw lots of such cases with system processes, for instance, replication agents.
Getting blocked processes
There are two kinds of SQL Server processes in the scope of a DBA's concern:
1. Consumers
2. Long-runners
Consumers eat up server resources. By identifying and optimizing the consuming codes you can free some room for other processes. Long-runners are codes having outstanding duration. They are exactly what makes end-users uncomfortable. However, for developers providing optimization, "what consumes more" is more important than "what works longer", because sometimes innocent processes take minutes while being blocked by other processes (or maybe not blocked, but just waiting for resources – CPU, I/O – to be released). And developers can do nothing about those cases. Not these codes but rather the blocking/consuming codes must be optimized. In other words: "consumption" indicates how healthy the code is, "duration" indicates how healthy the system is (hardware, surrounding processes etc). To identify processes blocked by others I created sp_BlockedProcesses (Listing 2). Assign it to the shortcut, say, Ctrl-5, press the buttons and here we go:

BlockedSPID  BlockedBuffer  BlockingSPID  BlockingBuffer  waitresource  BlockedHostname
21           GetLateTasks   65            GetImage        21            KimN
5            SetStatus      65            GetImage        21            JasonC
. . .
I bet you can recall cases when some simple code was started and a quick reply was expected, but it seemed to hang. It is quite probable that by pressing Ctrl-5 you'd see what the matter is.
Sending emails with top consumers Both sp_ActiveProcesses and sp_BlockedProcesses are instantaneous. Of course a DBA needs overall reports showing top consumers and long-runners. I can share how I organized it in the company I work for.
1. A job was scheduled to run every morning to start a trace on production servers. One of the trace parameters specifies the time to stop the trace. The job runs at 8:30 AM and the trace stops itself at 5:30 PM, when the peak of user activity on production servers is over. The name of the executed stored procedure: spTraceBuild (Listing 3). spBuildTrace is based on a very useful stored procedure, build_trace, from the Microsoft Knowledge Base Article Q283790. I made the following minor modifications:
a. Added server name and current date to the trace file name.
b. Added error handling. If a mistake was made (for example, an option value is incorrect or the date to stop the trace has already expired) and the trace wasn't created, it's nice to get a message.
c. Changed the code to expect only a time to stop the trace (instead of date/time) – 5:30 PM in my case. I.e. you never need to modify ActivityTrace.ini. The trace will always be stopped at the specified time on the same day it was started.
spTraceBuild gets configuration data from a text file named ActivityTrace.ini. Its contents could be like:

@tracefile   = \\Process02\D$\Program Files\Microsoft SQL Server\MSSQL\LOG\Trace
@maxfilesize = 15000
@stoptime    = 5:30
@options     = 2
@events      = 10,12
@columns     = 1,3,12,13,14,16,17,18
@filter1     = 10, 0, 7, N'SQL Profiler'
Apparently you need to modify at least the first parameter, @tracefile, to make it appropriate. Two types of events give us consumption numbers:
10 - RPC Completed
12 - Batch Completed
2. Another job was scheduled to run every night to absorb the trace file and process it, i.e. to insert the trace data into a SQL Server table and aggregate the information. Why collect the data into a file and bulk copy it afterward into a table instead of collecting it directly into the table? Firstly, collecting trace data into a file works faster; secondly, you cannot run a trace programmatically into a table as you do when starting a trace from Profiler. I created the following aggregations:
- top CPU consumers
- top long-runners
3. The processing job sends an email containing the reports to managers of development. Every morning development managers can find the "Top consumers" report in their Inbox. That is important as long as performance is a serious issue in my company. You can schedule the trace and processing/reporting jobs to run once a week, for example on Tuesday, if user activity and workload do not differ from day to day. The name of the processing/reporting stored procedure is spProcessTrace (Listing 4). An important part of the code is a UDF, fnExtractSPNameFromTextData (Listing 5). Definitely, you can aggregate data from a trace only if the codes running against the server are named codes, as stored procedures are. Ad hoc queries will be out of the scope. However, I do not think any interactive, frequently executed codes can be running as ad hoc queries, which would need compilation on the fly. Therefore, all real top consumers should be represented in the report. Run the sp from a scheduled job as:

EXEC spProcessTraceFile
    @ServerName = 'MyProductionServerName',
    @ReportDate = null,  -- date the trace file was created on (default - current)
    @TraceFilePath = '\\MyProcessingServerName\C$\Program Files\Microsoft SQL Server\MSSQL\LOG\',
    @recipients = '[email protected];[email protected]'

The value specified in @TraceFilePath should match @tracefile in ActivityTrace.ini. The resulting report for top CPU consumers looks like:
Top Process02 CPU consumers for Feb 22, 2003:

SP                             TimesExecuted  TotalCPU  AverageExecTime  MinCPU  MaxCPU
RemoveCoManager                7              615531    87933            0       595062
UpdateProspect                 110            474517    4313             2328    29062
AddStandardTaskTime            673            457651    680              594     829
TaskbyResource                 2480           130684    52               0       6656
GetAssetTypes                  5318           88720     16               0       78
SubmitExpenseById              1583           63696     40               0       719
BillingRatesByBillingOffices   110            63164     574              32      1312
SessionCleanUp                 1231           56099     45               0       19406
CheckSummaryTask               230            16443     71               46      110
RollupSummaryTask              207            15844     76               0       281
CreateBatchNumber              2720           14146     5                0       32
RejectTime                     1345           13396     9                0       79
DeleteBillingRole              12             12108     1009             578     1390
ProjectSummary                 143            10003     69               15      172
GetApprovedInvoices            12             9767      813              718     1032
ProgressProject                228            8322      36               0       94
AddSubProject                  280            7875      28               0       265
InsertExpense                  7              7422      1060             0       5906
LoadCustomer                   16             6953      434              312     688
PercentOfContractIncurred      164            5790      35               15      47
GetTaxes                       8              5469      683              640     828
RolesFeatures                  6              5330      888              750     1016
GetWorkflowTypes               246            4519      18               0       78
GetDraftInvoices               250            4439      17               0       63
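The aggregation behind such a report boils down to a GROUP BY over the loaded trace rows. A hedged sketch follows, assuming the trace has been loaded into a table here called TraceTable with the standard Profiler TextData and CPU columns, and that fnExtractSPNameFromTextData exists as described above; the real spProcessTrace may differ in detail:

SELECT TOP 25
    dbo.fnExtractSPNameFromTextData(CONVERT(varchar(8000), TextData)) AS SP,
    COUNT(*) AS TimesExecuted,
    SUM(CPU) AS TotalCPU,
    AVG(CPU) AS AverageExecTime,
    MIN(CPU) AS MinCPU,
    MAX(CPU) AS MaxCPU
FROM TraceTable
GROUP BY dbo.fnExtractSPNameFromTextData(CONVERT(varchar(8000), TextData))
ORDER BY SUM(CPU) DESC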
Consider locations of the codes and files as the following:

spBuildTrace                  msdb on the production server
spProcessTrace                DB Traces on the processing server
fnExtractSPNameFromTextData   DB Traces on the processing server
ActivityTrace.ini             C:\Program Files\Microsoft SQL Server\MSSQL\Binn\ on the production server
Activity Graph
The Results Pane of Query Analyzer does not allow us to represent results graphically, but spActivityGraph (Listing 6) challenges this limitation. The stored procedure uses the trace table created by spProcessTrace. spActivityGraph shows how processes running on the SQL Server interfere with each other. Looking at the graph you can see peak times, concurrency, and how this or that process takes longer than usual when surrounded by a tough company of other processes:

StartTime  Duration  Text
11:38:42   12033     InsertExpense
11:39:10   6220      GetAssetTypes
11:40:00   122810    GetRequestsToApprove
11:40:06   52826     GetMonthlyRevenue
11:40:11   30516     GetApproverTimes
11:40:16   30527     GetApproverTimes
11:40:17   30533     GetApproverTimes
11:40:25   30530     PopulatePlanFact
11:40:28   30543     ProjectSummary
11:40:28   30516     LoadLeadByResource
11:40:30   30513     ProjectSummary
11:40:36   11736     SetLockAllTask
11:40:38   21623     InvoiceByClient
11:40:42   103116    PopulatePlanFact
11:40:44   15780     GetDraftInvoices
11:40:49   10310     InsertAd
11:40:50   9513      ModifyCodeUpdatedExpense
11:40:51   8280      DeleteBillingRole
11:40:59   60966     ProjectSummary
11:41:04   30516     AutoEscalate
11:41:07   30446     GetLicenceUpdate
11:41:21   5046      GetImageBatch

(In the original output each row is followed by a bar of dashes drawn against an 11:30-11:59 time scale, showing when the process was running; the bars are omitted here.)
spActivityGraph has 6 input parameters:

@TraceTable        - name of the trace table created in spProcessTrace
@TraceStart        - start time of the period
@TraceEnd          - end time of the period
@SPNameLayoutLen   - width of the column with SP names in the report. Default - 20.
@DurationLayoutLen - width of the column with duration values in the report. Default - 6.
@LayoutWidth       - width of the report. Default - 115 symbols (wrapping would make the picture totally messed up).
I would suggest building graphs for 30-minute intervals. If you increase the interval trying to cover the entire day or at least a half of the day in one graph, duration of one unit (shown by dash '-') will be also increased and even processes with duration more than 10 sec will be missed in the graph. What if nonetheless you would like to see the activity for the entire day? No problem: the stored procedure spShowActivityGraphByChunks (Listing 7) will give you the full day picture divided into 0.5-hour pieces. The only 2 mandatory input parameters for the stored procedure (@ServerName, @ReportDate) serve to identify a trace table to work with.
Conclusion Stored procedures showing instantaneous and overall performance reports give us a clear picture of user activity on production SQL Servers and help us to find ways to make the performance better.
Squeezing Wasted Full Scans out of SQL Server Agent Bob Musser 2/13/2003
Introduction: The tweak in this article was done on a server running NT 4, SP6, with SQL Server 7 SP4 installed. The machine is a dual-processor 1GHz Pentium 3 with 2 GB of RAM. As always, make a backup first – your mileage may vary. This isn't for the faint of heart: it involves editing an MS-supplied system stored procedure. Additionally, if you're using SQL Server Agent Alerts you won't see any performance benefit.
The Problem: While using the NT Performance Monitor to check out our server this weekend, I decided to add Full Scans/second and Index Searches/second to the graph to give me some general feedback on data access and how well our indexes and queries were designed. (You'll find Full Scans/second and Index Searches/second under SQL Server/Access Methods in Performance Monitor.) I was disappointed to find an average of 16 Full Scans/second even with a light load. It peaks every 20 seconds at 161 Full Scans/second, the rest of the time it's pretty much at zero. All in all, a very regular "heartbeat" looking graph, although a very slow one. I was quite unhappy to think that our software was doing a full table or index scan on such a regular basis and decided to dive into QA and the SQL Profiler to find out who was at fault and get it fixed. I'll spare you the details of the time I spent with profiler and reviewing our code (way too long to admit) to find the culprit, here's the summary: It was SQL Server Agent doing the full scans. Now, like many of you, we use SQL Server Agent to manage backups and optimizations. We have nightly full backups, weekly optimizations and transaction log backups several times an hour. But not anything every 20 seconds. Regardless, it seems that Agent checks out something every 20 seconds. A couple of questions immediately came to mind. One: Full scans are supposed to be a bad thing. Is this amount worth chasing? Probably not, but I wanted the Full Scans/second to be a red flag for me. Besides, 161 Full Scans 3 times a minute adds up eventually and we have a relatively busy server. The second question: How do I fix it? Can I add an index to whatever table Agent is scanning? Can I turn off a piece of Agent that I don't use like Alerts? Using Profiler, I found that the scans occur when Agent runs msdb.dbo.sp_sqlagent_get_perf_counters to see what's going on with your server so it can decide if you need any alerts generated. This scan takes place whether you have any defined, active alerts or not. I decided to "improve" on MS's efforts just a bit. The SP does two things. It first makes a temp table of all your enabled, defined alerts. The check for (performance_condition IS NOT NULL) is most likely done because the sample Alerts that come installed are enabled, but don't have a performance condition. Secondly, the SP does a pretty involved Select statement against the Master DB to find the alerts in your temp table that have out of band numbers in the Master DB. This second section of the SP is complex enough that I didn't want to rewrite it and I immediately ruled out adding any indexes to the tables it was looking at because they are tables in the Master DB.
The Fix – Modify msdb.dbo.sp_sqlagent_get_perf_counters: It seems smarter to me to just avoid that second chunk entirely if you don't have any alerts defined. So that's what I did: I just added a check on @@RowCount after the Insert section. If you don't have any Alerts that are enabled, you don't add any rows to the Temp table. I included the possibility that SQL calls this SP with the "All_Counters" flag set to 1 because they designed it that way, but I haven't caught it being used yet. My admittedly simple modification is the If (@@RowCount > 0) check shown below.

--First Section
--Insert on temp table for each alert
IF (@all_counters = 0)
BEGIN
    INSERT INTO #temp
    SELECT DISTINCT SUBSTRING(performance_condition, 1,
        CHARINDEX('|', performance_condition,
            PATINDEX('%[_|_]%', performance_condition) + 1) - 1)
    FROM msdb.dbo.sysalerts
    WHERE (performance_condition IS NOT NULL)
      AND (enabled = 1)
END

If (@@RowCount > 0) or (@all_counters = 1)
Begin
    --Long Select Statement against master.dbo.sysperfinfo
    --that checks every performance counter SQL has
    --and has a "not equals" in the Where clause
End
Conclusion: It's working just fine. My every 20 second Full Scans are gone from the NT Performance monitor. Presumably if I add alerts in the future, I haven't broken the process. And my original goal, of treating any Full Scans as bad things that need to be investigated, is easier to monitor. Besides, over 695,000 full scans aren't taking place on my server every day now. MS probably wrote a good SP here in the first place. I think that they added the portion of the Where clause with the "not equals" later to avoid some problem. With the (spi1.cntr_type <> 1073939459) present in the second section, any index on the master.dbo.sysperfinfo table won't be used efficiently, resulting in the full scan.
Troubleshooting SQL Server with the Sysperfinfo Table Joseph Sack 5/14/2003
When given a choice between using GUI tools and using Transact-SQL, I choose the latter whenever possible or practical. This isn't from a sense of technical superiority, but rather a need to counteract my lazy nature. This article will briefly describe a few queries that I use to troubleshoot memory bottleneck issues that are normally identified using System Monitor (Performance Monitor). System Monitor is useful for tracking trends over time (using counter logs), however sometimes I like to see snapshots of the current state of a SQL Server instance. Using Query Analyzer, you can add or integrate the queries I detail into your own Transact-SQL script library or procedures as you see fit. SQL Server 2000 memory address space is made up of the memory pool and the executable code pool. The executable code pool contains memory objects such as loaded OLE DB provider DLLs for distributed queries, extended stored procedure DLLs, and executable files for the SQL Server engine and net-libraries. The memory pool contains the various system table data structures; buffer cache (where data pages are read), procedure cache (containing execution plans for Transact-SQL statements), log cache (each transaction log for each database has its own cache of buffer pages), and connection context information. The memory pool is often the highest consumer of memory for busy SQL Server instances. Generally speaking, I've identified most "true" memory bottleneck issues via errors that manifest in the SQL log. For example, a user may submit a prepared statement with an enormous IN clause. In such a scenario, we may see an error such as "Failed to reserve contiguous memory of Size=XXXX". When I see this error, I like to run a few different queries in Query Analyzer to pinpoint any abnormally high or low numbers. In all of these queries, I use the sysperfinfo system table. This table is used to store internal SQL Server performance counters – the very same counters that are retrieved by using System Monitor. When investigating a potential memory bottleneck scenario, I begin by checking the total memory used by the SQL Server executable. For a default instance of SQL Server I execute:

SELECT cntr_value/1024 as 'MBs used'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Memory Manager'
and counter_name = 'Total Server Memory (KB)'

For a Named instance, I use the following code instead, where InstanceName is the second part of your Named Instance name, for example SERVERNAME\INSTANCENAME:

SELECT cntr_value/1024 as 'MBs used'
from master.dbo.sysperfinfo
where object_name = 'MSSQL$InstanceName:Memory Manager'
and counter_name = 'Total Server Memory (KB)'

This query returns the total MBs used by SQL Server. Of course, this number can fluctuate from second to second. Using the System Monitor may become necessary in order to track trends in memory utilization, in which case you could create a counter log (not covered in this article). When viewing the total server memory, let's start with the obvious questions… Is the total MB used by SQL Server less than the maximum available? Maximum memory usage should cause you to dig further. Less than maximum should also cause concern if your SQL Server instance is on a machine with other applications (not recommended). SQL Server may not be reaching its potential if it has to compete for resources. This next query is used for returning the size of the buffer cache, procedure cache, and free pages in MBs for a Default instance. For querying Named Instances, remember to replace 'SQLServer:Buffer Manager' with 'MSSQL$InstanceName:Buffer Manager'.

SELECT 'Procedure Cache Allocated',
    CONVERT(int, ((CONVERT(numeric(10,2), cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager'
and counter_name = 'Procedure cache pages'
UNION
SELECT 'Buffer Cache database pages',
    CONVERT(int, ((CONVERT(numeric(10,2), cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager'
and counter_name = 'Database pages'
UNION
SELECT 'Free pages',
    CONVERT(int, ((CONVERT(numeric(10,2), cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager'
and counter_name = 'Free pages'

Regarding the results returned from this query, keep watch for very high or low numbers. For example, with "contiguous memory" errors look out for a large buffer cache coupled with a small procedure cache (small being relative to your query activity, of course). Sometimes prepared statements or other user queries may suffer when the procedure cache is unable to expand due to fully utilized buffer caches. This is by no means a full account of SQL Server memory bottleneck investigation methodology, but rather a helpful technique that you can use in your troubleshooting toolkit.
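One more pair of counters in sysperfinfo can be read the same way: the buffer cache hit ratio, which has to be divided by its base counter to be meaningful. A hedged sketch for a default instance (swap in the MSSQL$InstanceName object name for named instances):

SELECT CAST(100.0 * r.cntr_value / b.cntr_value AS numeric(5,2)) AS 'Buffer cache hit ratio %'
from master.dbo.sysperfinfo r
join master.dbo.sysperfinfo b on b.object_name = r.object_name
where r.object_name = 'SQLServer:Buffer Manager'
and r.counter_name = 'Buffer cache hit ratio'
and b.counter_name = 'Buffer cache hit ratio base'
and b.cntr_value > 0   -- avoid division by zero right after startup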
T-SQL
Each major database platform has its own slightly different version of SQL. Once you get past a basic select, insert, update, or delete, the vendors have added some unique features to their products to allow you to work with the data differently. This section looks at a number of interesting ways to use Transact-SQL, or T-SQL, SQL Server's version of the Structured Query Language.

A Lookup Strategy Defined                                      David Sumlin              101
Create Maintenance Job with a Click without using a Wizard     Robin Back                104
Creating a PDF from a Stored Procedure                         M Ivaca                   110
Creating a Script from a Stored Procedure                      Ryan Randall              112
Date Time Values and Time Zones                                Dinesh Asanka             115
Find Min/Max Values in a Set                                   Dinesh Asanka             117
Gathering Random Data                                          Brian Knight              118
It Cant be Done with SQL                                       Cade Bryant               120
Managing Jobs Using TSQL                                       Randy Dyess               124
Multiple Table Insert                                          Narayana Raghavendra      128
Reusing Identities                                             Dinesh Priyankara         132
Sequential Numbering                                           Gregory Larsen            135
Using Built in Functions in User Defined Functions             Nagabhushanam Ponnapalli  138
Using Exotic Joins in SQL Part 1                               Chris Cubley              139
Using Exotic Joins in SQL Part 2                               Chris Cubley              141
Understanding the Difference Between IS NULL and =NULL         James Travis              144
A Lookup Strategy Defined David Sumlin 2/20/2003 Most database designs nowadays seem to have at least a few if not many lookup or reference tables. (I’ll use these two terms interchangeably) These tables are those small tables in which you maintain your list of States, or CustomerTypes or JobStatus or any number of valid domain values used to maintain data integrity within your application. These reference tables usually have simple 2–4 columns with the naming convention usually following along the lines of ID, Value, and Description, and maybe Active. (e.g. CustomerTypeID, CustomerTypeValue, CustomerTypeDesc, CustomerTypeActive) I have seen database designs that have hundreds of these reference tables. There is nothing wrong with the mere existence of these tables, but they do bring some baggage along with them. One of the considerations that happens when you have these tables is that someone has to design and approve them. Someone then has to design, code, and approve any necessary views, and stored procedures around them. And most of these tables, views, and stored procedures are fairly simple. There’s usually very little insert, update, or delete (IUD) activity happening. They’re mostly used for lookups and for joins in views to represent the entire picture of a record. In a large application you can also clutter up your list of tables with so many of these that you begin to think that you need to have a special naming convention for objects that are lookup related. (e.g. lkpCustomerType, lkpStates, kp_GetCustomerType, etc). All of the previous issues that I presented were from the DBA or database developer’s perspective, but there’s another perspective to take into consideration. The application developer’s perspective. Whether it’s a traditional client server or internet application, the application developer usually has to create separate functions to access & modify the data within each table, often creating separate classes to represent each table. Then the developer needs to create an interface for the user to maintain the values within these tables. This naturally makes more work for the developer. I’ve created a lookup architecture that simplifies things a bit. This is not necessarily a brand new concept, but it is something I’ve rarely seen. What I’ve done is to create two tables. I call them Look and LookType. The structure is shown in Figure 1.
Figure 1.
Before I go any further, let me first explain another design and naming convention that I have. All tables that I design have a field that is unique and is named after the table with the suffix of GID (Generated ID). This value is usually an IDENTITY integer although can sometimes be a uniqueidentifier. This field is not necessarily the Primary Key, although in this instance it is. My other convention is that all Foreign Key fields have the suffix FID (Foreign ID). This field doesn’t necessarily have to have the same name as the Primary Key it references, but usually ends up that way. So that explains the LookTypeGID, LookGID, and LookTypeFID fields. Each of the GID fields are IDENTITY integer fields and are also the Primary Key. The LookTypeFID field is the foreign key to the LookTypeGID field. The other convention that I have is that all foreign key values in the tables that point to the Look table have the LID (Lookup ID) suffix. This makes it easier for me to at a glance realize where things are related to. The main fields are the Value fields. These are normally where the reference value is stored. There is also a description field which can be used for longer and more descriptive descriptions of the value. On both tables there is also an Active field which can be used to either inactivate a single value in the list or an entire list. The LookOrder field is used solely for display or sorting purposes. In lookup lists, there isn’t a standard way of sorting things. Usually somebody wants things sorted a particular way besides alphabetical, numerical, etc. This field is for that and is of integer data type. The other important field is the Constant field. This is a place where you can put an easier to remember value to reference that row from your application. You don’t want to go hard coding distinct values into your application code such as SELECT * FROM Customers WHERE CustomerTypeLID = 2. The reason that this is bad is 1) it has no meaning without doing a little more work, 2) moving development data to production or any number of events can reset your IDENTITY values and screw your code up, and 3)the client usually doesn’t have any control over the IDs, so if he wants to change what the code is pointing to, he has to edit either the code or go into the database at the table level. Now you’re either a bit confused or you’re possibly saying “so what”? Well, first let’s put a couple of values in the tables so that you can see an example of what I’m talking about.
Sample LookType data.
Sample Look data
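Since the sample-data figures do not reproduce well here, the following is a hypothetical T-SQL reconstruction of roughly what they contain, following the column-naming conventions described above; the exact column definitions, and the two shippers other than UPS, are assumptions for illustration only:

SET IDENTITY_INSERT LookType ON
INSERT INTO LookType (LookTypeGID, LookTypeValue, LookTypeDesc, LookTypeConst, LookTypeActive)
VALUES (37, 'Shippers', 'List of shipping companies', 'SHIPPERS', 1)
SET IDENTITY_INSERT LookType OFF

SET IDENTITY_INSERT Look ON
INSERT INTO Look (LookGID, LookTypeFID, LookValue, LookDesc, LookConst, LookOrder, LookActive)
VALUES (112, 37, 'UPS', 'United Parcel Service', 'SHIPPER_UPS', 1, 1)
INSERT INTO Look (LookGID, LookTypeFID, LookValue, LookDesc, LookConst, LookOrder, LookActive)
VALUES (113, 37, 'FedEx', 'Federal Express', 'SHIPPER_FEDEX', 2, 1)      -- placeholder shipper
INSERT INTO Look (LookGID, LookTypeFID, LookValue, LookDesc, LookConst, LookOrder, LookActive)
VALUES (114, 37, 'DHL', 'DHL Worldwide Express', 'SHIPPER_DHL', 3, 1)    -- placeholder shipper
SET IDENTITY_INSERT Look OFF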
Now, from a lookup perspective, all of the values will come from the Look table. The only table within the database that would reference the LookType table would be the Look table. Its sole purpose is to create a grouping identifier for the Look table values. So we can see that our List of Shippers has a LookTypeGID of 37 and has 3 shippers in it. We use the constant value of SHIPPER_UPS etc. to identify within the application which row we're referencing. A sample Order table would then have an integer field called ShipperLID with a possible value of 112 for UPS. If I wanted to get the list of shippers I'd call one of my stored procedures like "EXEC s_LookListByTypeConst 'SHIPPERS', NULL" (NULL representing my Active field. I can either get all of the active records, or all the records no matter whether active or not. It defaults to Null, which here means only Active). Now, I know that there are probably a number of you who immediately see that this design breaks the 1st form of normalization. I contend that there are always exceptions to the rule based upon applicability of the situation. With this design, you never need to create new lookup or reference tables. You just need to add data to preexisting tables. That then leads us to the next and most valuable aspect of this design. We can make generic procedures that allow us to do anything and everything with these tables with a small list of stored procedures or functions. These stored procedures or functions can then be used for all application development. This is where it now gets interesting. I do mostly web front end applications for my data marts. I've created a single asp page that has access to a VBScript class which accesses the stored procedures. This page then allows the application users or managers to manage their own lookup lists. Gone are the days when the application manager asked me to add the new CustomerType of Wholesale or add an entire lookup table and the corresponding stored procedures, or to sort the ProjectStatus of Open to the top of the list and Closed at the bottom. Here's a list of the stored procedures that I use. You'll get the gist of their meanings since I'm fairly verbose with my object names. I've also included a couple of samples.

s_LookAddEdit
s_LookTypeAddEdit
s_LookDelete
s_LookTypeDelete
s_LookListByGID
s_LookListByConst

CREATE PROCEDURE s_LookValueByConst
(
    @const varchar(100),
    @active int = NULL,
    @value varchar(1000) OUT
)
AS
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRAN
    SELECT @value = LookValue
    FROM Look
    WHERE LookConst = @const
      AND (@active IS NULL OR LookActive = @active)
COMMIT TRAN
GO

CREATE PROCEDURE s_LookListByTypeConst
(
    @const varchar(100),
    @active int = NULL
)
AS
SET NOCOUNT ON
DECLARE @id int
EXEC s_LookTypeGIDByConst @const, NULL, @id OUT
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRAN
    SELECT *
    FROM Look
    WHERE LookTypeFID = @id
      AND (@active IS NULL OR LookActive = @active)
    ORDER BY LookOrder
COMMIT TRAN
GO

I also have one view called v_Look which combines the two tables and all of the values. You'll notice that I set the transaction isolation levels to read uncommitted. I do this since these are fairly popular tables used in many different views and sometimes used in the same view more than once. These tables are also fairly static and so speed is my main concern here. Now realize that there are considerations and some requirements in order to implement this design.
1) All of your value & description fields need to be the same data type, most normally varchar. You can obviously store numbers or dates in the value fields, but they'll need to be cast correctly in the application for input and output.
2) You need to have unique LookConst values.
3) You should have check constraints on tables that reference the Look table so that you can validate that only allowable values are put into that field. (e.g. there would be nothing invalid about putting the value of 30 into the ShipperLID field in the Order table. Unfortunately, that would then mean that the Shipper was "December".)
4) All data access code should come through the stored procedures.
5) This design currently does not take into consideration different security rights for different reference domains (e.g. if Mary can change CustomerType values, but not Shipper values). This currently has to be done at the application level.
6) This design is limited in that if you have a domain of values that requires more than the designed fields, it doesn't work very well and you'll be better off making an individual table for it.
7) In some of my applications I have added CreatedDate & CreatedBy & ModifiedDate & ModifiedBy fields for auditing purposes.
8) I found this design to work very well for those application level setting tables or application / DTS global variable tables.
I have slowly over the years refined this module of functionality to include such things as reordering the display order if a user changes, inserts, or deletes a Look record, creating stored procedures to return not only the value but also the description (sometimes a user may want to see a drop down list of State abbreviations, other times the entire State name), and I am now working on the addition of functions to replicate a lot of the stored procedures so that a developer could use the functions in a SELECT query simply. I hope this gets you thinking about how to reduce the number of reference tables in your databases and if you have any feedback, please let me know. I'm always interested in hearing other developers' thoughts.
Create Maintenance Job with a Click without using a Wizard Robin Back 8/20/2003
I have found, working at a company using a few hundred SQL servers, how much time I spent to track if a database has required database- and transaction log backups. We also had a few different ways of getting reports of the jobs, to know if they had run successfully or not. We all know the importance of standards, don't we?
At our company we use a monitoring product called Tivoli (www.tivoli.com) to check, for example, that SQL Server is accessible and that disks are not filling up, and so on. I came to the conclusion that our DBA group should use this tool for monitoring our standard database maintenance jobs, and also to get our Control Center to call our emergency service whenever anything goes wrong. We have a policy of using SQL Server Agent for backing up both databases and transaction logs (when needed); we also provide a complete rebuild of all indexes once a week (or more often if a customer would like it), and an update of the statistics if for any reason we can't let SQL Server handle this itself. Tivoli can also report to us whenever a job has not run as scheduled for any reason. I have stripped the code of the special things that Tivoli needed so that you can use it in your environment. Now we take one server at a time and replace any existing maintenance jobs with the new ones. Doing this we can be sure we have control of the maintenance of all the SQL Servers out there that we have contracts to take care of. It's an enormous task to change all of the jobs manually, staggering the times so that transaction log backups do not cross any database backup, and scheduling index rebuilds at different times (or even on different days) so that the server does not have to rebuild indexes for more than one database at a time, for performance reasons. That is why I have developed this script: to save time and to avoid any typos or other human mistakes; no one makes those, right?
Before running the script The script has been tested on SQL 7 and SQL 2000: create_maintenance_jobs.sql (Script available at www.sqlservercentral.com)
Change parameters
The first time you run the script I suggest that you take a close look at all the parameters described below, and make any necessary changes before running the script. The section for changing the parameters is found in the script under "Variables that are OK to change".

@backuppath - The path, local or on a remote server, that will hold the backups.
@keep_databasebackup_days - Number of days database backups should be kept on disk. A full backup will be taken before deleting the old backup; the backup files are named differently every day.
@keep_transactionlog_days - Number of days transaction log backups should be kept on disk. Old transaction logs will be deleted before backing up the current one (different from database backups, where old backups are deleted after the current backup, because there should always be at least one database backup on disk).
@check_pubs_northwind - Whether the example databases pubs and Northwind are allowed to be installed. If yes (1) you will be prompted to delete those databases before being able to run the script (if one or both are present). If no (0), no maintenance jobs will be created for those databases in any case.
@check_maintenance_plan - Whether the script should check for existing maintenance plans. Since you are about to create new maintenance jobs, having more than one database backup per database, for example, might cause confusion.
@backup_mb_per_sek - Estimate of how many MB the server backs up per second. This parameter is used for scheduling, so that the different jobs do not conflict with each other. The parameter takes no notice of how much of the files is actually used, just how much space the files allocate on disk.
@start_backup - What time each day the database backup should start. Format has to be hhmmss.
@start_backup_trans - What time each day the transaction log backup should start. Format has to be hhmmss. Note: it will end 1 minute before the database backup starts.
@backup_trans_hour - How many hours between each transaction log backup.
@start_rebuild - What time each Sunday indexes should be rebuilt. Format has to be hhmmss.
@start_statistics - What time each Sunday statistics should be updated. Format has to be hhmmss.
@category_name - What category the scheduled jobs shall have.
@description - Description for all the scheduled jobs created.
@owner_login_name - Name of the user that will own and execute the scheduled jobs. If the user is a member of the sysadmin role, the rights of the account that runs SQL Server Agent will be used; if not, the proxy user (if present) will be used. The user that executes the scheduled job has to have write access to both the backup folder and the folder @workdir\LOG.
@notify_level_eventlog - Whether the jobs should write a record to the NT event log: 0=never, 1=on success, 2=on failure, 3=always.
@workdir - The script will check the registry for the SQL installation path, normally "C:\Program Files\Microsoft SQL Server\MSSQL". The account that executes the script has to have read permissions in the registry. If changing this variable, make sure you un-comment that row in the script.
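To give a feel for that section, here is a hypothetical excerpt showing how the variables might be set. The values and data types below are illustrations only; they are not taken from the actual create_maintenance_jobs.sql script.
-- Hypothetical example values; adjust everything to your own environment.
DECLARE @backuppath varchar(255), @keep_databasebackup_days int, @keep_transactionlog_days int
DECLARE @check_pubs_northwind bit, @check_maintenance_plan bit, @backup_mb_per_sek int
DECLARE @start_backup char(6), @start_backup_trans char(6), @backup_trans_hour int
DECLARE @start_rebuild char(6), @start_statistics char(6)
DECLARE @category_name sysname, @description varchar(255), @owner_login_name sysname
DECLARE @notify_level_eventlog int

SET @backuppath               = 'E:\MSSQL\BACKUP'      -- local or UNC path
SET @keep_databasebackup_days = 3
SET @keep_transactionlog_days = 3
SET @check_pubs_northwind     = 1                      -- 1 = prompt to delete the sample databases
SET @check_maintenance_plan   = 1
SET @backup_mb_per_sek        = 5                      -- only used to space out the schedules
SET @start_backup             = '200000'               -- hhmmss
SET @start_backup_trans       = '060000'               -- hhmmss
SET @backup_trans_hour        = 4
SET @start_rebuild            = '220000'               -- Sundays, hhmmss
SET @start_statistics         = '230000'               -- Sundays, hhmmss
SET @category_name            = 'Database Maintenance'
SET @description              = 'Standard maintenance job'
SET @owner_login_name         = 'sa'
SET @notify_level_eventlog    = 2                      -- 0=never 1=on success 2=on failure 3=always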
Permissions
Make sure that all the permissions mentioned in the parameter descriptions are met. The user that executes the script might also need the correct permissions to create the @workdir directory and its subfolders "JOBS" and "LOG" if they are not present on disk.
Running the script
Objects in the databases
The first thing the script will do is check that nothing it creates already exists on the server. If, for example, there is a scheduled job with the same name as any of the ones that will be created, you will be prompted to delete the scheduled job (or simply rename it). The same goes for the stored procedures that will be created in all the user databases. Make sure to read each line carefully before running the output in another window or, as I would suggest, delete (or rename) everything by hand. The output might look something like:
-- 1. Delete stored procedure 'REBUILD_INDEX' in database 'LSIPT100'
use LSIPT100
drop proc REBUILD_INDEX
go
-- 2. Delete stored procedure 'REBUILD_INDEX' in database 'DOCUMENTS'
use DOCUMENTS
drop proc REBUILD_INDEX
go
What does the script create
Folders on disk
The script fetches from the registry the path where SQL Server was installed (by default "C:\Program Files\Microsoft SQL Server\MSSQL"). If this path does not exist on the server, it will be created. There are also two sub-folders that are created by default when installing SQL Server, "JOBS" and "LOG"; these will also be created if they do not exist.
Scheduled Jobs
All scheduled jobs will stop if any of the steps fail, and report the error as set in the parameter @notify_level_eventlog. No scheduled jobs will be created for the example databases pubs and Northwind. Note that "database" stands for the name of the database the scheduled job affects.

BACKUP database - (DBCC)  (system and user databases)
  1. DBCC CHECKCATALOG - database: runs DBCC CHECKCATALOG.
  2. DBCC CHECKDB - database: runs DBCC CHECKDB.
  3. BACKUP - database - DATABASE: performs a full backup of the database to disk. The filename describes which database it is for and what day the backup started: database_BACKUP_20030718.BAK. Note that no backup will be performed if any error is found by one of the two DBCC checks.
  4. DELETE OLD DATABASE BACKUPS - database: all database backups for this database older than @keep_databasebackup_days will be deleted.

BACKUP database TRANSACTION  (user databases)
  1. DELETE OLD TRANSACTION LOGS - database: all transaction log backups for this database older than @keep_transactionlog_days will be deleted.
  2. BACKUP - database - TRANSACTION: performs a transaction log backup of the database to disk. The scheduled job will be created but disabled if the database option "Truncate Log On Checkpoint" is enabled. The filename describes which database it is for and what day the backup started: database_BACKUP_TRANSACTION_20030718.TRN. Note that the script appends all transaction log backups for each day into one file per day.

REBUILD INDEX - database  (user databases)
  1. REBUILD INDEX - database: runs the stored procedure REBUILD_INDEX in the database and rebuilds all indexes using the default fillfactor used to create each index.

UPDATE STATISTICS - database  (user databases)
  1. UPDATE STATISTICS - database: runs the stored procedure UPDATE_STATISTICS in the database and updates all the statistics for the database. Note that this job will only be created if for some reason the option for SQL Server to do this by itself has been disabled.
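To illustrate the kind of command those backup steps execute, here is a minimal sketch of a full backup that builds a dated file name in the format described above (database_BACKUP_YYYYMMDD.BAK). The database name and path are made up, and this is not the exact code the script generates.
-- Sketch: full backup of a hypothetical database 'MyDB' to a dated file.
DECLARE @backuppath varchar(255), @file varchar(400)
SET @backuppath = 'E:\MSSQL\BACKUP\'                        -- assumed path
SET @file = @backuppath + 'MyDB_BACKUP_'
          + CONVERT(char(8), GETDATE(), 112) + '.BAK'       -- e.g. ..._20030718.BAK

-- DBCC checks first; the generated job skips the backup if these report errors.
DBCC CHECKCATALOG ('MyDB')
DBCC CHECKDB ('MyDB')

BACKUP DATABASE MyDB TO DISK = @file WITH INIT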
Scheduled Job Category If the scheduled job category set in @category_name does not exist, it will be created.
Stored Procedures REBUILD_INDEX The stored procedure will re-create all the indexes in the database using the fillfactor used when creating the index. create procedure REBUILD_INDEX as declare @tablename varchar(255) declare @tableowner varchar(255) declare @tablename_header varchar(600) declare @sql varchar(600) declare tnames_cursor CURSOR FOR select 'tablename'=so.name, 'tableowner'=su.name from dbo.sysobjects so inner join dbo.sysusers su on so.uid = su.uid where so.type = 'U' open tnames_cursor fetch next from tnames_cursor into @tablename, @tableowner while (@@fetch_status <> -1) begin if (@@fetch_status <> -2) begin select @tablename_header = '***** Updating ' + rtrim(upper(@tablename)) + ' (' + convert(varchar, getdate(), 20) + ') *****'
print @tablename_header select @sql = 'dbcc dbreindex ( ''' + @tableowner + '.' + @tablename + ''','''',0 )' exec ( @sql ) end fetch next from tnames_cursor into @tablename, @tableowner end print '' print '' print '***** DBReindex have been updated for all tables (' + convert (varchar,getdate(),20) + ') *****' close tnames_cursor deallocate tnames_cursor
UPDATE_STATISTICS
The stored procedure update all the statistics in the database. Note that this stored procedure will only be created if for any reason the options has been disabled for SQL-server to perform this by itself. create procedure UPDATE_STATISTICS as declare @tablename varchar(255) declare @tableowner varchar(255) declare @tablename_header varchar(600) declare @sql varchar(600) declare tnames_cursor CURSOR FOR select 'tablename'=so.name, 'tableowner'=su.name from dbo.sysobjects so inner join dbo.sysusers su on so.uid = su.uid where so.type = 'U' open tnames_cursor fetch next from tnames_cursor into @tablename, @tableowner while (@@fetch_status <> -1) begin if (@@fetch_status <> -2) begin select @tablename_header = '***** Updating ' + rtrim(upper(@tablename)) + ' (' + convert(varchar, getdate(), 20) + ') *****' print @tablename_header select @sql = 'update statistics ' + @tableowner + '.' + @tablename exec ( @sql ) end fetch next from tnames_cursor into @tablename, @tableowner end print '' print '' print '***** Statistics has been updated for all tables (' + convert (varchar,getdate(),20) + ') *****' close tnames_cursor deallocate tnames_cursor
Tables
The following tables will be created in tempdb and dropped when the script finishes. Note that if one of the tables already exists in tempdb it will be dropped without any notification:
temporary_table_directory
temporary_table_db
temporary_table_dbsize
temporary_table_sproc
Logs Each step in every scheduled job will generate a log-file in the default SQL-server installation folder "LOG". The name convention is the name of the step followed by the file extension ".LOG". All white spaces in the name are replaced with an underscore "_". All the steps are set to overwrite any existing logfile with the same name. The easiest way to access the correct log is to right-click the job, select the Steps tab. Double-click the desired step and select the Advanced tab and click the button View.
After running the script
Check the jobs
Browse through the scheduled jobs to see that everything looks the way you expect.
Check job schedule
Double-check that the schedule each job will execute on is what you expect it to be.
Test run all jobs
You might call me schizophrenic, but I always check that all the scheduled jobs really do run without any errors.
Database backups
Does the database backup exist where you said it should? You might even want to do a test restore of the database backup.
Transaction log backup
Does the transaction log backup exist where you said it should? You might even want to do a test restore of the transaction log backup.
Check logs
The error logs might look like overkill, but they are really helpful whenever anything has gone wrong, and you simply don't want to re-run anything if it's not necessary to do so; it might be, for example, an index rebuild that takes a very long time to run, or one that you can only run outside office hours.
After a week or two
Check database and transaction log backups
Check that only the backups that should still exist, according to what @keep_databasebackup_days and @keep_transactionlog_days were set to, remain on disk.
Re-schedule jobs
Since the variable @backup_mb_per_sek was only an estimate, you might have to re-schedule some of the jobs if you find that they conflict with each other in a way that is not acceptable.
SUMMARY
You should of course customize the jobs created, or the script, to meet your company's needs. I have, as mentioned before, stripped the script of everything that I do not find useful for everyone. You could, for example, set the jobs to notify you by email or net send (Notification tab in the scheduled job properties). Note that the script uses the undocumented extended stored procedures "xp_regread" and "xp_fileexist". The article by Alexander Chigrik explains these two procedures along with some other undocumented extended stored procedures.
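For reference, here is a minimal sketch of how those two extended stored procedures are typically called. The registry key shown is the usual default-instance location on SQL Server 2000; your servers may differ.
-- Read the SQL Server installation path from the registry (default instance).
DECLARE @sqlpath nvarchar(512)
EXEC master.dbo.xp_regread
     @rootkey    = 'HKEY_LOCAL_MACHINE',
     @key        = 'SOFTWARE\Microsoft\MSSQLServer\Setup',
     @value_name = 'SQLPath',
     @value      = @sqlpath OUTPUT
SELECT @sqlpath AS InstallPath

-- Check whether a file exists (returns a one-row result set).
EXEC master.dbo.xp_fileexist 'C:\Program Files\Microsoft SQL Server\MSSQL\LOG\ERRORLOG'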
Creating a PDF from a Stored Procedure M Ivica 8/26/2003
This article explains how to create a stored procedure that will in turn create a simple column-based report in PDF without using any external tools or libraries (and their associated licensing costs!). SQL2PDF makes a PDF report from text inserted in the table psopdf (nvarchar(80)). First a table named psopdf should be created.
CREATE TABLE psopdf (code NVARCHAR(80))
After that, create the stored procedure SQL2PDF. The table psopdf then has to be filled with your data as shown in the examples below. At the end, the stored procedure is called with the file name only (no extension).
EXEC sql2pdf 'fileName'
The result is in your C:\ directory.
EXAMPLE 1:
INSERT psopdf(code) SELECT SPACE(60) + 'COMPANY LTD'
INSERT psopdf(code) SELECT SPACE(60) + 'COMPANY ADDRESS'
INSERT psopdf(code) SELECT SPACE(60) + 'STREET NAME & No'
INSERT psopdf(code) SELECT ' '
INSERT psopdf(code) SELECT SPACE(34) + 'BILL OF SALE'
INSERT psopdf(code) SELECT ' '
INSERT psopdf(code) SELECT 'Product' + SPACE(10) + 'Quantity' + SPACE(10) + 'Price' + SPACE(10) + 'Total'
INSERT psopdf(code) SELECT REPLACE(SPACE(56), ' ', '_')
INSERT psopdf(code) SELECT 'Product1' + SPACE(9) + '10.00 ' + SPACE(10) + '52.30' + SPACE(10) + '5230.0'
INSERT psopdf(code) SELECT 'Product2' + SPACE(9) + '2.00 ' + SPACE(10) + '10.00' + SPACE(10) + ' 20.0'
INSERT psopdf(code) SELECT REPLACE(SPACE(56), ' ', '_')
INSERT psopdf(code) SELECT SPACE(50) + '5250.0'
After the INSERTs, call the stored procedure with file name demo2.
EXEC sql2pdf 'demo2'
The result is in your C:\ directory.
EXAMPLE 2: The second example uses the pubs database.
USE pubs
INSERT psopdf(code)
SELECT t1.au_lname + ' ' + t1.au_fname + ' ' + t1.phone + ' ' + t1.address + ' ' + t1.city + ' ' + t1.state + ' ' + t1.zip
FROM authors t1, authors t2
After the INSERT, call the stored procedure with file name demo1.
EXEC sql2pdf 'demo1'
The result is in your C:\ directory.
Creating a Script from a Stored Procedure Ryan Randall 5/2/2003
A simple task, I thought, but it took me to some interesting places. The method is broadly this: 1) Create an instance of SQL-DMO SQL Server, and use the script method to save the create table text in a file. 2) Get the text from the file into a sp variable. 3) Delete the text file. Here are the details of the method, and a summary which puts it all together: 1) Create an instance of SQL-DMO SQL Server, and use the script method to save the create table text in a file. Here's the usage: exec run_script 'my_server', 'my_database', 'my_table', 74077, 'my_path_name' And here's the sp... CREATE proc run_script @server varchar(100), @database_name varchar(100), @table_name varchar(100), @script_id int, @path_name varchar(200) as --runs a sql server script and outputs it to a file. declare @i int declare @object int declare @return varchar(200) declare @q varchar(200) declare @is_error bit
set @is_error = 0 --create sql server object EXEC @i = sp_OACreate 'SQLDMO.SQLServer', @object OUT IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object --connect to sql server using windows nt and verify the connection EXEC @i = sp_OASetProperty @object, 'LoginSecure', 1 IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object EXEC @i = sp_OAMethod @object, 'Connect', NULL, @server IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object EXEC @i = sp_OAMethod @object, 'VerifyConnection', @return OUT IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object --run the script SET @q = 'Databases("' + @database_name + '").Tables("' + @table_name + '").Script(' + cast(@script_id as varchar(10)) + ', ' + @path_name + ')' IF NOT @i = 0 begin EXEC sp_OAGetErrorInfo @object set @is_error = 1 end EXEC @i = sp_OAMethod @object, @q, @return OUT IF NOT @i = 0 begin EXEC sp_OAGetErrorInfo @object set @is_error = 1 end --destroy sql server object EXEC @i = sp_OADestroy @object IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object return @is_error GO 2) Get the text from the file into a sp variable. My first try was to use the FileSystemObject... CREATE proc get_from_file @file_output varchar(8000) output, @path_name varchar(200) as --outputs all the text of a file concatenated into a single string. --Note - 255 character limitation. DECLARE @file_output varchar(8000) DECLARE @fso int DECLARE @ts int DECLARE @i int EXEC @i = sp_OACreate 'Scripting.FileSystemObject', @fso OUT IF NOT @i = 0 EXEC sp_OAGetErrorInfo @fso EXEC @i = sp_OAMethod @fso, 'OpenTextFile', @ts out, @path_name IF NOT @i = 0 EXEC sp_OAGetErrorInfo @fso EXEC @i = sp_OAMethod @ts, 'ReadAll', @file_output out IF NOT @i = 0 EXEC sp_OAGetErrorInfo @ts EXEC @i = sp_OADestroy @ts IF NOT @i = 0 EXEC sp_OAGetErrorInfo @ts EXEC @i = sp_OADestroy @fso IF NOT @i = 0 EXEC sp_OAGetErrorInfo @fso GO This, however, has a 255 character limitation – so it was back to the drawing board. I don't much like it, but I came up with this... declare @file_output varchar(8000) exec get_from_file @file_output output, 'my_path_name' select @file_output And here's the sp (with a simple supporting sp below it)... CREATE proc get_from_file @file_output varchar(8000) output, @path_name varchar(200) as --outputs all the text of a file concatenated into a single string.
set nocount on --get_unique_name for temporary table declare @unique_table_name varchar(100) exec get_unique_name @unique_table_name output set @unique_table_name = '##' + @unique_table_name --create concatenated string and puts it into the table exec('create table #t1 (c1 varchar(8000)) bulk insert #t1 from ''' + @path_name + ''' declare @s varchar(8000) set @s = '''' select @s = @s + isnull(c1, '''') + char(13) from #t1 select c1 = @s into ' + @unique_table_name ) --output the single value in the table to our output variable declare @q nvarchar(100) set @q = 'select @p1 = c1 from ' + @unique_table_name exec sp_executesql @q, N'@P1 varchar(8000) output', @file_output output --drop our temporary table exec ('drop table ' + @unique_table_name) set nocount off GO Supporting sp... CREATE proc get_unique_name @output varchar(50) output as --outputs a unique name based on the current user and the precise time the sp is run. --can be used for table names / file names etc. select @output = replace(system_user, '\', '_') + '_' + cast(datepart(yyyy, getdate()) as varchar(4)) + '_' + cast(datepart(mm, getdate()) as varchar(2)) + '_' + cast(datepart(dd, getdate()) as varchar(2)) + '_' + cast(datepart(hh, getdate()) as varchar(2)) + '_' + cast(datepart(mi, getdate()) as varchar(2)) + '_' + cast(datepart(ss, getdate()) as varchar(2)) + '_' + cast(datepart(ms, getdate()) as varchar(3)) GO 3) Delete the text file. This uses a familiar method. This time there are no limitations. Here's the usage... exec delete_file 'my_path_name' And here's the sp... CREATE proc delete_file @path_name varchar(200) as --deletes a file DECLARE @object int DECLARE @i int EXEC @i = sp_OACreate 'Scripting.FileSystemObject', @object OUT IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object EXEC @i = sp_OAMethod @object, 'DeleteFile', null, @FileSpec =@path_name IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object EXEC @i = sp_OADestroy @object IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object GO Putting it all together - here's the usage... declare @object_text varchar(8000) exec get_create_table_script @object_text output, 'my_server', 'my_database', 'my_table'
select @object_text
And here's the sp...
CREATE proc get_create_table_script @create_table_script varchar(8000) output, @server varchar(100), @database_name varchar(100), @table_name varchar(100) as
--outputs a create table script for a sql table. To do this, it runs a script to put it into a file,
--then gets it from the file and deletes the file
declare @return int
--get path name of temporary sql file
declare @path_name varchar(100)
exec get_unique_name @path_name output
set @path_name = '\\' + @server + '\c$\' + @path_name + '.sql'
--create the 'create table' script and put it into sql file
exec @return = run_script @server, @database_name, @table_name, 74077, @path_name
--return if above step errored.
if @return = 1 return
--get script results from sql file into output variable
exec get_from_file @create_table_script output, @path_name
--delete temporary sql file
exec delete_file @path_name
GO
And there's your final stored procedure, which does what we set out to do.
Date Time Values and Time Zones Dinesh Asanka 12/22/2003
Introduction
Datetime values always give headaches to database developers because of their many combinations. Here is another datetime problem. I have given a solution, and it is open to discussion; it would be highly appreciated if you could share your ideas on this and your own solutions.
What is the Use?
The problem arises when your servers are spread across multiple time zones and you are asked to combine all the data into a main server. Let's take a hotel system as an example. Assume a company in Hawaii owns two hotels, one in Nairobi, Kenya and the other in Kuala Lumpur, Malaysia. In each location a separate SQL Server is running, and all the data needs to be transferred to the main system, which is running in Hawaii. There will certainly be a problem with datetime values if you are saving the data with their respective local times, so we need a way to identify the actual time. One way of doing this is by keeping the location with each record; then we know we can work out the actual time. But, as you can imagine, that would be a tedious task. What if we could keep a common time for all of them? We can keep all the datetime values in Coordinated Universal Time (UTC), better known as Greenwich Mean Time.
As far as end users are concerned, it will be difficult for them to deal with GMT, as they are already used to their own system time. So the option is to display the system time, convert it to GMT when storing to the database, and convert it back when reading from the database.
How can we use SQL Server?
GetUTCDate() is a new function added to the function family in SQL Server 2000. This function returns the datetime value representing the current UTC time. The current UTC time is derived from the current local time and the time zone setting in the operating system of the computer on which SQL Server is running; note that it does not reflect where your query is run from, although many users think it returns the GMT of the PC where they run the query. If you are saving the current time (GetDate()), you now have to save the current GMT time (GetUTCDate()) instead. If you are saving a user-defined time, like the reservation time of a guest, then you must convert this time to GMT. The following stored procedure will convert a local time to GMT.
/*
Purpose : Convert DateTime to GMT Time Format
Author : PPG Dinesh Asanka
Create Date : 2003-08-23
Version Date Modification
*/
Create PROCEDURE [dbo].[Convert_DateTime_to_GMT]
@dt_Date_Time as datetime
AS
select DATEADD ( hh , (DATEDIFF ( hh , GetDate(), GetUTCDate() )) , @dt_Date_Time )
GO
Here is an example of using this procedure:
DECLARE @dt datetime
SET @dt = cast('2003/10/12 13:12' as datetime)
EXEC Convert_DateTime_to_GMT @dt
Now for reading the datetime field back; it is another simple procedure like the one above.
/*
Purpose : Convert GMT Time Format to System DateTime
Author : PPG Dinesh Asanka
Create Date : 2003-08-23
Version Date Modification
*/
CREATE PROCEDURE [dbo].[Convert_GMT_to_DateTime]
@dt_GMT as datetime
AS
select DATEADD ( hh , (DATEDIFF ( hh , GetUTCDate(), GetDate() )) , @dt_GMT )
GO
I don't think you need an example for this, as it is the same as above.
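To make this concrete, here is a minimal sketch that applies the same DATEADD/DATEDIFF conversion when writing and reading a guest's reservation time. The Reservations table and its columns are hypothetical and not part of the article.
-- Hypothetical table holding all datetimes in GMT/UTC.
CREATE TABLE Reservations (
    ReservationID  int IDENTITY(1,1) PRIMARY KEY,
    GuestName      varchar(100) NOT NULL,
    ReservedAtGMT  datetime NOT NULL
)
GO
-- Store: convert the local reservation time to GMT before inserting.
DECLARE @localTime datetime
SET @localTime = CAST('2003/10/12 13:12' AS datetime)
INSERT INTO Reservations (GuestName, ReservedAtGMT)
VALUES ('J. Smith', DATEADD(hh, DATEDIFF(hh, GETDATE(), GETUTCDATE()), @localTime))

-- Read: convert back to the local time of the server running the query.
SELECT GuestName,
       DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), GETDATE()), ReservedAtGMT) AS ReservedAtLocal
FROM Reservations
GO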
Comparison with Oracle
I have a habit (not sure whether it is good or bad) of comparing SQL Server with Oracle whenever I find a new feature in SQL Server. There are many functions in Oracle relating to time zones. DBTIMEZONE is the Oracle 9i function equivalent to SQL Server's GetUTCDate(). SYS_EXTRACT_UTC returns the UTC time for a given time. TZ_OFFSET returns the offset from UTC for a given time zone name. There is also a fantastic function in Oracle called NEW_TIME, which takes three arguments and converts a datetime from one zone to another; more importantly, there are 18 defined time zones, among them Atlantic Standard Time, Eastern Standard Time, Pacific Standard Time, Yukon Standard Time and many others. With it we don't have to convert the system time to GMT.
Conclusion
The next version of SQL Server needs to cater for more time zone functions. Then SQL Server will be more user friendly as far as end users and DBAs are concerned, and it will make the DBA's job much easier.
Find Min/Max Values in a Set Dinesh Asanka 11/21/2003
In Oracle Magazine there was a discussion about finding the Nth max or min value from a set of values. After three issues of the magazine, the following query emerged as the solution to the problem:
Select Min(Col1)
From (Select Col1
      From (Select Distinct Col1 From Tab1 Order By Col1 Desc)
      Where RowNum <= &N)
I was trying to do the same with SQL Server, but I found that there is no column called ROWNUM in SQL Server. So I posted the question to the discussion board of SQLServerCentral.com; you can find the thread named RowNum Function in SQLServer. After studying this discussion I felt that there is no direct way of doing it like in Oracle. This might be included in the next version of SQL Server! I commenced the work with a simple table called NMaxMin.
CREATE TABLE [NMaxMin] (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [Number] [int] NULL
) ON [PRIMARY]
GO
I filled some arbitrary data into the NMaxMin table. Figure 1 shows the set of values used for this discussion.
I wrote the query below to get the result set ordered from minimum to maximum along with a sequence number.
select rank=count(*), s1.number
from (Select distinct(number) from NMaxMin) s1,
     (Select distinct(number) from NMaxMin) s2
where s1.number >= s2.number
group by s1.number
order by 1
After running the query the output will look like Figure 2.
Now you can see there are only 11 records (previously there were 14 records; this has happened because there are 2 records of 1 and 3 records of 45). From the above result it is now easy to find the Nth maximum. If you want the 5th maximum value, the query will be:
Select number
From (select count(*) as rank, s1.number
      from (Select distinct(number) from NMaxMin) s1,
           (Select distinct(number) from NMaxMin) s2
      where s1.number <= s2.number
      group by s1.number ) s3
where s3.rank = 5
The answer will be 567, which is the 5th maximum number in the table. For the minimum you just have to make a small change to the query.
Select rank, number
From (select count(*) as rank, s1.number
      from (Select distinct(number) from NMaxMin) s1,
           (Select distinct(number) from NMaxMin) s2
      where s1.number >= s2.number
      group by s1.number ) s3
where rank = 5
The answer will be 78, which is the 5th minimum number in the table. Maximum and minimum numbers like these are useful when you are doing statistical calculations.
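The same ranking query is easy to parameterize. The sketch below is simply the article's query with a variable @N substituted for the literal 5.
-- Return the Nth maximum value from NMaxMin; @N = 1 gives the maximum itself.
DECLARE @N int
SET @N = 5

SELECT number
FROM (
    SELECT COUNT(*) AS rank, s1.number
    FROM (SELECT DISTINCT number FROM NMaxMin) s1,
         (SELECT DISTINCT number FROM NMaxMin) s2
    WHERE s1.number <= s2.number
    GROUP BY s1.number
) s3
WHERE s3.rank = @N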
Gathering Random Data Brian Knight 3/26/2003
I recently had the basic need to retrieve a record from the database at random. What seemed to be an easy task quickly became a complex one. This case showed an interesting quirk with T-SQL that was resolved in an equally quirky way. This quick article shows you a method to retrieve random data or randomize the display of data. Why would you ever want to retrieve random data?
• In my case, I wanted to pull a random article to display on this site's homepage
• Choose a random user to receive a prize
• Choose a random employee for a drug test
The problem with retrieving random data using the RAND() function is how it's actually used in the query. For example, if you run the below query against the Northwind database, you can see that you will see the same random value and date for each row in the results.
SELECT TOP 3 RAND(), GETDATE(), ProductID, ProductName FROM Products
Results:
0.54429273766415864  2003-03-19 15:06:27.327  17  Alice Mutton
0.54429273766415864  2003-03-19 15:06:27.327  3   Aniseed Syrup
0.54429273766415864  2003-03-19 15:06:27.327  40  Boston Crab Meat
This behavior prohibits the obvious way to retrieve random data by using a query like this:
SELECT TOP 3 ProductID, ProductName FROM products ORDER BY RAND()
Results in:
ProductID   ProductName
----------- ----------------------------------------
17          Alice Mutton
3           Aniseed Syrup
40          Boston Crab Meat
If you execute this query over and over again, you should see the same results each time. The trick then is to use a system function that doesn't use this type of behavior. The newid() function is a system function used in replication that produces a globally unique identifier (GUID). You can see in the following query that it produces unique values at the row level.
SELECT TOP 3 newid(), ProductID, ProductName FROM Products
Results in:
                                      ProductID   ProductName
------------------------------------  ----------- ----------------------------------------
8D0A4758-0C90-49DC-AF3A-3FC949540B45  17          Alice Mutton
E6460D00-A5D1-4ADC-86D5-DE8A08C2DCF0  3           Aniseed Syrup
FC0D00BF-F3A2-4341-A584-728DC8DDA513  40          Boston Crab Meat
You can also execute the following query to randomize your data (TOP clause optional):
SELECT TOP 1 ProductID, ProductName FROM products ORDER BY NEWID()
Results in:
ProductID   ProductName
----------- ----------------------------------------
7           Uncle Bob's Organic Dried Pears
Each time you fire it off, you should retrieve a different result. There's also an additional way to actually use the rand() function that Itzik Ben-Gan has discovered, using user-defined functions and views as a workaround. The secret there is to produce a view that uses the rand() function as shown below:
CREATE VIEW VRand AS SELECT RAND() AS rnd
GO
Then create a user-defined function (only works in SQL Server 2000) that selects from the view and returns the random value.
CREATE FUNCTION dbo.fn_row_rand() RETURNS FLOAT AS
BEGIN RETURN (SELECT rnd FROM VRand) END To use the function, you can use syntax as shown below to retrieve random records. SELECT TOP 1 ProductID, ProductName FROM Products ORDER BY dbo.fn_row_rand() GO This is also handy if you wish to use the getdate() function at the record level to display data. I have found that this method has slight performance enhancement but it is negligible. Make sure you test between the two methods before you use either.
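Applied to the "choose a random employee for a drug test" item from the list above, the NEWID() approach is a one-liner; the query below assumes the Northwind Employees table.
-- Pick one employee at random; re-running the query returns a different row.
SELECT TOP 1 EmployeeID, LastName, FirstName
FROM Employees
ORDER BY NEWID()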
It Can't be Done with SQL Cade Bryant 10/30/2003
How many times have you said this to yourself (or to your boss/colleagues)? How often have others said this to you? The fact is that, while it's true that T-SQL has its limitations when compared to a "real" programming language like C++, you might be amazed at the tricks you can pull off with T-SQL if you're willing to do some digging and trial-and-error experimentation. Such an opportunity to push the limits of T-SQL came to me just yesterday at my job. I work at a talent agency, and one of the databases I manage stores information relating to performing artists' and bands' live performances (or "personal appearances" as it's called in the industry). A personal appearance consists of such attributes as venue locations, venue capacities, show dates, and count of tickets sold (on a per-day as well as on an aggregate basis). The users wanted a Crystal Report that would display some basic header information about an artist's or band's upcoming shows. They also wanted one of the columns to display a carriage-return-delimited list (one item per line) of ticket count dates, along with (for each date shown) the total number of tickets sold for that date and the number sold on that particular day. The original spec called for a list of the last five ticket-count dates, with the intent that it would look something like this:
10/16/03 - 1181 (19)
10/15/03 - 1162 (14)
10/14/03 - 1148 (28)
10/13/03 - 1120 (9)
10/10/03 - 1111 (10)
The number to the immediate right of each date represents the total number sold to that date, and the number in parentheses represents the number sold on that day. The dates are in descending order, counting downward from the most recent ticket-sale-count date. You can see, for example, that on 10/16/2003, 19 tickets were sold, bringing the total up from the previous day's 1162 to 1181. I assumed that this would be fairly simple with SQL Server 2000, but it actually proved to be more complex than I thought. I wanted my solution to be set-oriented, if at all possible, avoiding cursors and loops. Luckily I was able to create a UDF to perform this task, as the following code shows:
CREATE FUNCTION GetLast5TicketCounts (@ShowID INT, @ClientName VARCHAR(200)) RETURNS VARCHAR(8000) AS BEGIN DECLARE @Counts VARCHAR(8000)
SELECT TOP 5 @Counts = ISNULL(@Counts + '
', '') + CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' + CAST(TicketCount - ISNULL( (SELECT MAX(tc2.TicketCount) FROM vTicketCount tc2 WHERE tc2.ShowID = @ShowID AND tc2.ClientName = @ClientName AND tc2.CountDate < tc1.CountDate), 0) AS VARCHAR(100)) + ')', '') FROM vTicketCount tc1 WHERE ShowID = @ShowID AND ClientName = @ClientName ORDER BY CountDate DESC RETURN ISNULL(@Counts, '') END As you can see, this took quite a bit of coding! Note the use of the inner subquery, whose result (the previous day’s total to-date ticket count) is subtracted from the current day’s total to-date count in order to obtain the ticket count for the current day only. If the date in question happens to be the first date of ticket sales (meaning that there is no previous ticket-count date), then ISNULL forces it to return 0, and nothing is subtracted. Also note the use of the HTML
<br> tags in the code in order to force a carriage-return/line-break after each item. The reason for this was that the T-SQL function CHAR(13) doesn't seem to work with fields in Crystal Reports, but Crystal fields can be set to be HTML-aware. Thus I make liberal use of HTML tags when I'm coding queries that are to be used in reports. (For some reason, I find that I need to use three <br>
tags in order to effect a line break). I incorporated this UDF into the stored procedure that drives the report and assumed that my work was done. Then my boss informed me that the users are probably not going to want to be limited to seeing just the last five ticket-count dates; they will probably want to specify how many “last ticket-count” dates to show! Uh oh. I knew that this would require dynamic SQL if I were to use the same basic code layout (in a TOP n clause, you can’t use a variable as n – you have to write “EXEC(‘SELECT TOP ‘ + @variable + ‘)”). I also knew that you cannot use dynamic SQL in a UDF – nor can you use the SET ROWCOUNT n statement. So I explained to my boss that the users’ request was probably not possible with SQL Server, but I would do my best to find a way. With a little experimentation, I discovered that this operation (allowing the user to specify the number of records to return) could indeed be performed in a UDF – but it required coding a WHILE loop, something I was trying to avoid (I try to stick with pure set-oriented operations as much as possible – WHILE loops and cursors are terribly inefficient in T-SQL as compared with set-oriented solutions). Here is the code I came up with: CREATE FUNCTION GetLastNTicketCounts (@ShowID INT, @ClientName VARCHAR(200), @NumCounts INT) RETURNS VARCHAR(8000) AS BEGIN DECLARE @t TABLE(ID INT IDENTITY, TicketCount INT, CountDate DATETIME) INSERT INTO @t SELECT * FROM ( SELECT TOP 100 PERCENT TicketCount, CountDate FROM vTicketCount WHERE ShowID = @ShowID AND ClientName = @ClientName ORDER BY CountDate DESC ) t DECLARE @Counts VARCHAR(8000), @Counter INT SET @Counter = 1 WHILE @Counter <= @NumCounts
BEGIN SELECT @Counts = ISNULL(@Counts + '
', '') + CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' + CAST(TicketCount - ISNULL( (SELECT MAX(tc2.TicketCount) FROM @t tc2 WHERE tc2.CountDate < tc1.CountDate), 0) AS VARCHAR(100)) + ')', '') FROM @t tc1 WHERE ID = @Counter SET @Counter = @Counter + 1 END RETURN ISNULL(@Counts, '') END Note the use of a table variable to store the results (ordered in descending order of the count date and including an IDENTITY column for convenience in incrementally stepping through the data in the loop). Since we are only dealing with a small amount of data (it’s unlikely that the report will contain more than 50 records or that the user will opt to see more than 10 count dates per record), the addition of the loop did not cause any real performance hit. Feeling like a hero as I presented this to my boss, I then got the bomb dropped on me when I was told that the user would not only want to see n number of last ticket-sale dates – they would also want to see (in the same field) n number of first ticket-sale dates, too (that is, dates counting up from the first date of ticket sales)! Oh, and could I also make sure a neat little line appears separating the set of “last” dates from the set of “first” dates? I knew I was in trouble on this one, because, in order for this to work, the first part of the query (which would return the “first” set of dates) would need to have the data sorted in ascending order by date in order to work properly – just like the second part of the query (which returns the “last” dates) would need the data sorted in descending order by date. After much experimentation, attempting to coalesce the two resultsets together into the @Counts variable via adding a second WHILE loop pass (and ending up each time with the dates in the wrong order and hence inaccurate count figures – even though I was using ORDER BY), I discovered that I could get around this by declaring two separate table variables – each sorted ascendingly or descendingly as the case required. Since I already had one half of the equation figured out (how to display the last n ticket count dates/figures), I only needed to “reverse” my logic in order to display the first n dates/figures. Rather than combining both operations into one monster UDF, I decided to create a second UDF to handle returning the “first” dates, and then concatenate the results of each together in the main stored procedure. Here is the code – which as you can see is nearly identical to that of the previous UDF, with the exception of the bolded text: CREATE FUNCTION GetFirstNTicketCounts (@ShowID INT, @ClientName VARCHAR(200), @NumCounts INT = NULL) RETURNS VARCHAR(8000) AS BEGIN DECLARE @t TABLE (ID INT IDENTITY, TicketCount INT, CountDate DATETIME) INSERT INTO @t SELECT * FROM (SELECT TOP 100 PERCENT TicketCount, CountDate FROM vTicketCount WHERE ShowID = @ShowID AND ClientName = @ClientName ORDER BY CountDate ASC ) t DECLARE @Counts VARCHAR(8000), @Counter INT SET @Counter = 1
WHILE @Counter <= @NumCounts BEGIN SELECT @Counts = ISNULL(@Counts + '
', '') + CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' + CAST(TicketCount - ISNULL( ( SELECT MAX(tc2.TicketCount) FROM @t tc2 WHERE tc2.CountDate < tc1.CountDate ), 0) AS VARCHAR(100)) + ')', '') FROM @t tc1 WHERE ID = @Counter SET @Counter = @Counter + 1 END RETURN ISNULL(@Counts, '') END As you can see, the only difference in the code is that the data is inserted into the table variable in ascending (rather than descending) order. Everything else is the same. I only needed to concatenate the results of both of these functions in order to return the data the users wanted. One other minor issue remained: how to display the separator line between the “first” dates and the “last” dates. This line should only be displayed if the user has opted to display both the “first” count dates and the “last” count dates (there would be no need for a separator line if only one set of count dates were being displayed, or if no dates at all were being displayed). I added the following code to the stored procedure: DECLARE @LineBreak VARCHAR(100) SET @LineBreak = CASE WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0 THEN '' ELSE '
----------------------------
' END Note that the @FCNT and @LCNT variables represent the number of “first” and “last” ticket count dates to display, respectively. I then added this line of code to the SELECT portion of the procedure to concatenate it all together: NULLIF( MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak + MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak ) AS TicketCount Here is the entire code of the resulting stored procedure: CREATE PROCEDURE rTicketCountSum @CN VARCHAR(200), @FCNT INT = NULL, -- Num of ticket counts to display from beginning of count period @LCNT INT = NULL, -- Num of ticket counts to display from end of count period @FDO TINYINT = 0 -- Display future show dates only AS DECLARE @LineBreak VARCHAR(100) SET @LineBreak = CASE WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0 THEN '' ELSE '
----------------------------
' END
SELECT ClientName, CONVERT(VARCHAR, ShowDate, 101) AS ShowDate, VenueName, Contact, Phone, VenueCityState, Capacity, CAST((MAX(TicketCount) * 100) / Capacity AS VARCHAR(10)) + '%' AS PctSold, NULLIF( MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak + MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak ) AS TicketCount FROM vTicketCount tc -- a view joining all the relevant tables together LEFT JOIN PATC_Contacts c ON tc.ShowID = c.ShowID WHERE ClientName = @CN AND (ShowDate >= GETDATE() OR ISNULL(@FDO, 0) = 0) GROUP BY ClientName, tc.ShowID, ShowDate, VenueName, Contact, Phone, VenueCityState, Capacity Result? The report now returns exactly the data that the users wanted (including the “neat little line break”), while still performing efficiently! Here is a partial screenshot showing a few columns of the Crystal Report (run with the user opting to see the first 5 and last 5 count dates). Notice the far left-hand column: The moral of this story is: I’ve learned to not be so quick to “write off” a programming challenge as being beyond the scope of T-SQL. I’ve learned not to palm coding tasks off onto the front-end developers without thoroughly experimenting to see if, by any possible way, the task can be performed efficiently on the database server. And in the process I’ve significantly minimized (if not eliminated altogether) those instances in which I’m tempted to raise my hands in frustration and declare those dreaded six words: “That can’t be done in SQL.”
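For completeness, here is a sketch of how the finished procedure might be called; the client name is made up, and the parameter values simply mirror the screenshot scenario (first five and last five count dates, future show dates only).
-- Hypothetical call of the stored procedure built in the article.
EXEC rTicketCountSum
     @CN   = 'Some Band',  -- client/artist name (made up)
     @FCNT = 5,            -- number of "first" ticket-count dates to show
     @LCNT = 5,            -- number of "last" ticket-count dates to show
     @FDO  = 1             -- 1 = future show dates only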
Managing Jobs Using TSQL Randy Dyess 4/2/2003
Having had the honor of working for quite a few companies that did not have the resources to buy any of the nice SQL Server toys that exist out there, nor the willingness to put an email client on the servers, I have found myself spending a good deal of time each morning checking the status of the numerous jobs running on my servers. Not a hard thing to accomplish, but very time consuming when you are talking about dozens of servers with hundreds of jobs. Maybe it was just me, but no matter how much I pleaded at some of these companies, they would not go through the red tape to get an email client put on the SQL Servers so I could use the job notification ability to send me a nice email each morning if a particular job failed. Being the poor companies' DBA, I had to come up with something else. The one computer that usually had email abilities was my local desktop; funny how they always made sure I could get the hundreds of emails telling me what to do each day. To solve my problem, I made use of my desktop and created a system that checked the outcome of all the jobs across all my servers and sent me a nice little report each morning. The first thing I did was to connect to my local msdb database and create a table to hold the report information. You can adjust the table how you want since I just included the basic information.
IF OBJECT_ID('tJobReport') IS NOT NULL
DROP TABLE tJobReport
GO
CREATE TABLE tJobReport ( lngID INTEGER IDENTITY(1,1) ,server VARCHAR(20) ,jobname VARCHAR(50) ,status VARCHAR(10) ,rundate VARCHAR(10) ,runtime CHAR(8) ,runduration CHAR(8) ) GO Given the nature of some the schedules for the job, I felt like this would grow into a sizable table in a very short time so I created a clustered index to speed the data retrieval up. CREATE CLUSTERED INDEX tJobReport_clustered ON tJobReport(server,jobname,rundate,runtime) GO Next, create a stored procedure that will populate your new table. This example makes use of linked servers to job information and job history from each of my servers; you could change the linked server format over to OPENDATASOURCE if you like. Example of using OPENDDATASOURCE FROM OPENDATASOURCE( 'SQLOLEDB', 'Data Source=DEV2;User ID=sa;Password=' ).msdb.dbo.sysjobs sj INNER JOIN OPENDATASOURCE( 'SQLOLEDB', 'Data Source=DEV2;User ID=sa;Password=' ).msdb.dbo.sysjobhistory sh ON sj.job_id = sh.job_id Otherwise, simply linked all your remote servers to your desktop, adjust the following stored procedure to account for the number of linked servers you have, and create the following stored procedure in your msdb database. IF OBJECT_ID('spJobReport') IS NOT NULL DROP PROCEDURE spJobReport GO CREATE PROCEDURE spJobReport AS SET NOCOUNT ON --Server 1 INSERT INTO tJobReport (server, jobname, status, rundate, runtime, runduration) SELECT sj.originating_server, sj.name, --What is it in English CASE sjh.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' WHEN 2 THEN 'Retry' WHEN 3 THEN 'Canceled' ELSE 'Unknown' END, --Convert Integer date to regular datetime SUBSTRING(CAST(sjh.run_date AS CHAR(8)),5,2) + '/' + RIGHT(CAST(sjh.run_date AS CHAR(8)),2) + '/' + LEFT(CAST(sjh.run_date AS CHAR(8)),4) --Change run time into something you can reecognize (hh:mm:ss) , LEFT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) + ':' +
SUBSTRING(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),3,2) + ':' + RIGHT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) --Change run duration into something you caan recognize (hh:mm:ss) , LEFT(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),2) + ':' + SUBSTRING(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),3,2) + ':' + RIGHT(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),2) FROM msdb.dbo.sysjobs sj --job id and name --Job history INNER JOIN msdb.dbo.sysjobhistory sjh ON sj.job_id = sjh.job_id --Join for new history rows left JOIN msdb.dbo.tJobReport jr ON sj.originating_server = jr.server AND sj.name = jr.jobname AND SUBSTRING(CAST(sjh.run_date AS CHAR(8)),5,2) + '/' + RIGHT(CAST(sjh.run_date AS CHAR(8)),2) + '/' + LEFT(CAST(sjh.run_date AS CHAR(8)),4) = jr.rundate AND LEFT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) + ':' + SUBSTRING(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),3,2) + ':' + RIGHT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) = jr.runtime --Only enabled jobs WHERE sj.enabled = 1 --Only job outcome not each step outcome AND sjh.step_id = 0 --Only completed jobs AND sjh.run_status <> 4 --Only new data AND jr.lngID IS NULL --Latest date first ORDER BY sjh.run_date DESC --Server 2 INSERT INTO tJobReport (server, jobname, status, rundate, runtime, runduration) SELECT sj.originating_server, sj.name, --What is it in English CASE sjh.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' WHEN 2 THEN 'Retry' WHEN 3 THEN 'Canceled' ELSE 'Unknown' END, --Convert Integer date to regular datetime SUBSTRING(CAST(sjh.run_date AS CHAR(8)),5,2) + '/' + RIGHT(CAST(sjh.run_date AS CHAR(8)),2) + '/' + LEFT(CAST(sjh.run_date AS CHAR(8)),4) --Change run time into something you can reecognize (hh:mm:ss) , LEFT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) + ':' + SUBSTRING(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),3,2) + ':' + RIGHT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) --Change run duration into something you caan recognize (hh:mm:ss) , LEFT(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),2) + ':' + SUBSTRING(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),3,2) + ':' +
RIGHT(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),2) FROM dev2.msdb.dbo.sysjobs sj --job id and name --Job history INNER JOIN dev2.msdb.dbo.sysjobhistory sjh ON sj.job_id = sjh.job_id --Join for new history rows left JOIN msdb.dbo.tJobReport jr ON sj.originating_server = jr.server AND sj.name = jr.jobname AND SUBSTRING(CAST(sjh.run_date AS CHAR(8)),5,2) + '/' + RIGHT(CAST(sjh.run_date AS CHAR(8)),2) + '/' + LEFT(CAST(sjh.run_date AS CHAR(8)),4) = jr.rundate AND LEFT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) + ':' + SUBSTRING(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),3,2) + ':' + RIGHT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2) = jr.runtime --Only enabled jobs WHERE sj.enabled = 1 --Only job outcome not each step outcome AND sjh.step_id = 0 --Only completed jobs AND sjh.run_status <> 4 --Only new data AND jr.lngID IS NULL --Latest date first ORDER BY sjh.run_date DESC GO Next, simply create a job on your desktop with whatever schedule you like to run the stored procedure. Once the table has data, it is a simple procedure to define reporting stored procedures to determine the outcome of jobs, average the run time of jobs, report on the history of the jobs, etc. If you want an automatic email sent to you, just configure SQL Mail on your desktop and create a new job or new job step that uses the xp_sendmail system stored procedure to run a basic query. EXEC master.dbo.xp_sendmail @recipients = '[email protected]', @message = 'Daily Job Report', @query = ' SELECT status,server, jobname FROM msdb.dbo.tJobReport WHERE status = 'Failed' AND rundate > DATEADD(hh,-25,GETDATE())', @subject = 'Job Report', @attach_results = 'TRUE' So, if you have the same bad luck in getting those great tools out there or want a centralized way to keep in control of your job outcomes and history, this simple technique can go along way in helping you quickly manage those hundreds of jobs we all seem to accumulate over time. You can find out more about sysjobs. sysjobhistory and xp_sendmail in my last book Transact-SQL Language Reference Guide. Copyright 2003 by Randy Dyess, All rights Reserved
www.TransactSQL.Com
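As a small follow-up to the reporting idea mentioned in the article, here is a minimal sketch of one such query against tJobReport; it is an illustration only, not code from the article.
-- How often each job has failed overall, with its most recent run date.
SELECT server, jobname,
       COUNT(*) AS total_runs,
       SUM(CASE WHEN status = 'Failed' THEN 1 ELSE 0 END) AS failed_runs,
       MAX(CONVERT(datetime, rundate, 101)) AS last_run_date
FROM msdb.dbo.tJobReport
GROUP BY server, jobname
ORDER BY failed_runs DESC, server, jobname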
Multiple Table Insert Narayana Raghavendra 11/18/2003
You Want To INSERT Data into More Than One Table. You want to include conditions to specify all tables that participates as “Destination” in Multi Table Insert part. This Stored Procedure can insert rows into any number of tables based on the source table with or without conditions. SP Script CREATE PROCEDURE SP_MULTI_INSERTS (@SUB_QUERY AS VARCHAR(2000), @INSERT_PART AS VARCHAR(2000), @DELIMITER AS VARCHAR(100), @ERRORMESSAGE AS VARCHAR(2000) ) AS --VARIABLES DECLARATION DECLARE @SAND AS VARCHAR(10) DECLARE @SSTR AS VARCHAR(2000) DECLARE @SSTR2 AS VARCHAR(2000) DECLARE @SSTR3 AS VARCHAR(2000) DECLARE @SSQL AS VARCHAR(2000) DECLARE @SUB_QUERY2 AS VARCHAR(2000) --VARIABLES TO CONSTRUCT INSERT SQL DECLARE @LASTPOS AS INT DECLARE @LASTPOS2 AS INT DECLARE @LASTPOS3 AS INT --DATA TRIMMING, AND DEFAULT VALUE SETTINGS SET @INSERT_PART = ltrim(rtrim(@INSERT_PART)) SET @SUB_QUERY = ltrim(rtrim(@SUB_QUERY)) IF LEN(@INSERT_PART) = 0 OR LEN(@SUB_QUERY) = 0 BEGIN SET @ERRORMESSAGE = 'INCOMPLETE INFORMATION' RETURN -1 END SET @LASTPOS = 0 SET @SAND = ' ' --CHECK WHETHER SUBQUERY I.E. SOURCE DATA QUERY HAS WHERE CONDITION IF CHARINDEX(' WHERE ', @SUB_QUERY) > 0 BEGIN IF CHARINDEX(' WHERE ', @SUB_QUERY) > CHARINDEX(' FROM ', @SUB_QUERY) SET @SAND = ' AND ' END ELSE SET @SAND = ' WHERE ' BEGIN TRANSACTION MULTIINSERTS --LOOP STARTS WHILE LEN(@SUB_QUERY) > 0 BEGIN SET @LASTPOS2 = @LASTPOS SET @LASTPOS = CHARINDEX(@DELIMITER, @INSERT_PART, @LASTPOS2) IF @LASTPOS = 0 SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2, 2001) ELSE
SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2, @LASTPOS-@LASTPOS2) --CHECK WHETHER 'WHERE' CONDITION REQUIRED FOR INSERT SQL IF LEFT(@SSTR, 5) = 'WHEN ' BEGIN SET @SUB_QUERY2 = @SUB_QUERY + @SAND + SUBSTRING(@SSTR, 5, 2001) SET @LASTPOS2 = @LASTPOS SET @LASTPOS3 = CHARINDEX(@DELIMITER, @INSERT_PART, @LASTPOS+LEN (@DELIMITER)) IF @LASTPOS3 = 0 SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2+LEN(@DELIMITER), 2001) ELSE SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2+LEN(@DELIMITER), @LASTPOS3 - (@LASTPOS2+LEN(@DELIMITER))) SET @LASTPOS = @LASTPOS3 END ELSE BEGIN SET @SUB_QUERY2 = @SUB_QUERY END --CONSTRUCT ACTUAL INSERT SQL STRING SET @SSTR2 = LEFT(@SSTR, CHARINDEX('VALUES', @SSTR)-1) SET @SSTR3 = SUBSTRING(@SSTR, LEN(LEFT(@SSTR, CHARINDEX('VALUES', @SSTR)))+6, 2000) SET @SSTR3 = REPLACE(@SSTR3, '(', '') SET @SSTR3 = REPLACE(@SSTR3, ')', '') SET @SSQL = 'INSERT ' + @SSTR2 + ' SELECT ' + @SSTR3 + ' FROM (' + @SUB_QUERY2 + ') ZXTABX1 ' --EXECUTE THE CONSTRUCTED INSERT SQL STRING EXEC (@SSQL) --CHECK FOR ERRORS, RETURN -1 IF ANY ERRORS IF @@ERROR > 0 BEGIN ROLLBACK TRANSACTION MULTIINSERTS SET @ERRORMESSAGE = 'Error while inserting the data' RETURN -1 END --CHECK WHETHER ALL THE TABLES IN 'MULTIPLE TABLE' LIST OVER IF @LASTPOS = 0 BREAK SET @LASTPOS = @LASTPOS + LEN(@DELIMITER) END --LOOP ENDS --FINISHED SUCCESSFULLY, COMMIT THE TRANSACTION COMMIT TRANSACTION MULTIINSERTS RETURN 0 GO
Parameters
@SUB_QUERY - Source data set: a query that returns the desired rows that you want to insert into multiple tables.
@INSERT_PART - Column names and values of the condition and insert parts of the INSERT SQL statement.
@DELIMITER - Delimiter value that delimits the multiple inserts and where conditions.
@ERRORMESSAGE - [INPUT/OUTPUT parameter] Any error during the SP execution.
Returns Returns 0 on successful execution. Returns –1 on unsuccessful execution with error message in @ErrorMessage input/output parameter
Algorithm
a) Accept parameters for the source dataset, the destination tables with or without conditions, and the delimiter string that delimits the tables, column names and where conditions.
b) Check the parameters passed; if the information is improper or incomplete, return an error.
c) Check whether the subquery (i.e. the source data set) has a WHERE condition in the query. This is to identify whether to add "AND" or "WHERE" as the condition keyword if the user has given any conditions in the source sub query itself.
d) Loop until the insertion of rows into the destination tables is completed:
• Get the substring of the multi-table insertion string by using the delimiter. The character position of the delimiter is recorded in a variable; later it is used to find the next delimiter, to extract either the "When" or the "Into" substring.
• If the extracted substring starts with 'When ', the user is giving a filter condition for inserting rows into that particular table. Add that filter condition to the source dataset query.
• The next delimited part contains the column name and value list that needs to be inserted into a table. Manipulate the destination table parameter to construct an INSERT SQL statement.
• Execute the constructed INSERT statement, and check for errors.
• Exit the loop when the last of the multiple table insertions has finished.
Base logic in SP
Inserting rows using INSERT...SELECT. The INSERT...SELECT statement is constructed from the @Insert_Part parameter with a little manipulation.
Example
This example uses the Employees table in the Northwind database. The structure (without constraints) of the Employees table is copied to Employees1 and Employees2 to try out the example. The example copies the LastName and FirstName data from the Employees table:
To Employees1 - if the EmployeeID in the Employees table is less than 5;
To Employees2 - if the EmployeeID in the Employees table is greater than 4.
DECLARE @DELIMITER AS VARCHAR(200)
DECLARE @INSERT_PART AS VARCHAR(2000)
SET @DELIMITER = 'ZZZYYYXXX'
SET @INSERT_PART = 'WHEN EMPLOYEEID < 5' + @DELIMITER +
    'INTO EMPLOYEES1 (LASTNAME, FIRSTNAME) VALUES (LASTNAME, FIRSTNAME)' + @DELIMITER +
    'WHEN EMPLOYEEID >4' + @DELIMITER +
    'INTO EMPLOYEES2 (LASTNAME, FIRSTNAME) VALUES (LASTNAME, FIRSTNAME)'
EXEC SP_MULTI_INSERTS 'SELECT EMPLOYEEID, LASTNAME, FIRSTNAME FROM EMPLOYEES', @INSERT_PART, @DELIMITER, ''
Result (the EmployeeID values in the Employees1 and Employees2 tables are generated automatically because EmployeeID is an Identity column that increments by 1)
In this example, rows will be inserted into the SalaryHistory table only when the value of Salary is greater than 30000 (the annual salary of the employee is more than 30,000). Rows will not be inserted into the ManagerHistory table unless the manager ID is 200.

DECLARE @DELIMITER AS VARCHAR(2000)
DECLARE @INSERT_PART AS VARCHAR(2000)
SET @DELIMITER = 'ZZZYYYXXX'
SET @INSERT_PART = 'WHEN Salary > 30000' + @DELIMITER
    + 'INTO SalaryHistory VALUES (empid, datehired, salary) ' + @DELIMITER
    + 'WHEN MgrID = 200' + @DELIMITER
    + 'INTO ManagerHistory VALUES (empid, mgrid, GETDATE())'
EXEC SP_MULTI_INSERTS 'SELECT EmployeeID EMPID, HireDate DATEHIRED, (Sal*12) SALARY, ManagerID MGRID FROM Employees WHERE DeptID = 100', @INSERT_PART, @DELIMITER, ''
Usage
• To achieve insertion into multiple tables in a single shot. As the functionality is written in a stored procedure, the task is performed a little faster.
• It has functionality similar to the Oracle 9i "Multi Table Insert" feature, so you can use it as an alternative if you are migrating from Oracle 9i to MS SQL Server 2000 (see the sketch below for the Oracle syntax being emulated). This SP is also tuned to accept column names of tables in the insert parameter, and you can give a condition to specific/all tables that participate in the multi-table insert destination part.
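For comparison, here is a rough sketch of the Oracle 9i conditional multi-table insert that the procedure emulates. The syntax below is Oracle SQL, not T-SQL, and the table names simply reuse the first example above, so treat it as illustrative only.

-- Oracle 9i conditional multi-table insert (not valid in SQL Server).
INSERT ALL
    WHEN EMPLOYEEID < 5 THEN
        INTO EMPLOYEES1 (LASTNAME, FIRSTNAME) VALUES (LASTNAME, FIRSTNAME)
    WHEN EMPLOYEEID > 4 THEN
        INTO EMPLOYEES2 (LASTNAME, FIRSTNAME) VALUES (LASTNAME, FIRSTNAME)
SELECT EMPLOYEEID, LASTNAME, FIRSTNAME FROM EMPLOYEES;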
Note
Maintain the sequence of the "When" (optional) and "Into" parts in the @Insert_Part parameter, with the proper delimiter after every "When" and "Into" keyword.
- RAGHAVENDRA NARAYANA [email protected]
Reusing Identities
Dinesh Priyankara
2/18/2003

In most table designs, identity columns are used to maintain the uniqueness of records. There is no problem with insertion and modification of data using an identity column. With deletions, though, gaps can occur between identity values. There are several ways to reuse these deleted (removed) identity values. You can find a good solution in Books Online, but I wanted to find a new way, and my research ended up with a good solution. After several comparisons, I decided to continue with my solution. So, I'd like to share my method with you all and let you decide which solution to use.

First of all, let's create a table called 'OrderHeader' that has three columns. Note that the first column, intID, is an identity column.

IF OBJECT_ID('OrderHeader') IS NOT NULL
    DROP TABLE OrderHeader
GO
CREATE TABLE OrderHeader
    (intID int IDENTITY(1,1) PRIMARY KEY,
     strOrderNumber varchar(10) NOT NULL,
     strDescription varchar(100))

Now let's add some records to the table. If you want, you can add a small number of records, but I added 10000 records because most tables have more than 10000 records and we must always try to make our testing environment realistic.

DECLARE @A smallint
SET @A = 1
WHILE (@A <> 10001)
BEGIN
    INSERT INTO OrderHeader (strOrderNumber, strDescription)
    VALUES ('OD-' + CONVERT(varchar(5), @A),          -- Adding something for Order Number
            'Description' + CONVERT(varchar(5), @A))  -- Adding something for Description
    SET @A = @A + 1
END

OK. Let's delete some randomly selected records from the table.

DELETE OrderHeader WHERE intID = 9212
DELETE OrderHeader WHERE intID = 2210
DELETE OrderHeader WHERE intID = 3200

If you now run a simple select query against the table, you will see some gaps between the intID column values. Now it is time to find these gaps and reuse them. As I mentioned above, there are two methods (or more, if you have already done this in some other way). First let's see the BOL example.

Method 1
DECLARE @NextIdentityValue int
SELECT @NextIdentityValue = MIN(IDENTITYCOL) + IDENT_INCR('OrderHeader')
FROM OrderHeader t1
WHERE IDENTITYCOL BETWEEN IDENT_SEED('OrderHeader') AND 32766
    AND NOT EXISTS
        (SELECT * FROM OrderHeader t2
         WHERE t2.IDENTITYCOL = t1.IDENTITYCOL + IDENT_INCR('OrderHeader'))
SELECT @NextIdentityValue AS NextIdentityValue
Output:
NextIdentityValue
-----------------
2210
This is a very simple query. You can find the first deleted identity value and reuse it. But remember that you have to set IDENTITY_INSERT ON, which allows explicit values to be inserted into the identity column.

SET IDENTITY_INSERT OrderHeader ON
INSERT INTO OrderHeader (intID, strOrderNumber, strDescription)
VALUES (@NextIdentityValue,
        'OD-' + CONVERT(varchar(5), @A),
        'Description' + CONVERT(varchar(5), @A))
SET IDENTITY_INSERT OrderHeader OFF

Method 2
Now I am going to create another table, called "tb_Numbers", that has only one column, which contains numbers in sequence. In most of my databases I have created and used this table for many tasks; let me come back to those in my future articles.

IF OBJECT_ID('tb_Numbers') IS NOT NULL
    DROP TABLE tb_Numbers
GO
CREATE TABLE tb_Numbers (intNumber int PRIMARY KEY)

Note that I have inserted 30000 records (numbers) into the table. The range depends on the usage of this table. In some of my databases, this range was 1 to 1000000.

DECLARE @A1 int
SET @A1 = 1
WHILE (@A1 <> 30000)
BEGIN
    INSERT INTO tb_Numbers (intNumber) VALUES (@A1)
    SET @A1 = @A1 + 1
END

Now let's query the gaps (or the first deleted identity value) in the OrderHeader table.

SELECT TOP 1 @NextIdentityValue = intNumber
FROM OrderHeader
    RIGHT OUTER JOIN tb_Numbers ON tb_Numbers.intNumber = OrderHeader.intID
WHERE intID IS NULL
    AND intNumber <= (SELECT MAX(intID) FROM OrderHeader)
SELECT @NextIdentityValue AS NextIdentityValue
Output:
NextIdentityValue
-----------------
2210

This is a very simple query too. I have used a RIGHT OUTER JOIN to join the OrderHeader table with tb_Numbers. This join causes all rows (numbers) from the tb_Numbers table to be returned. Then I have used some search conditions (the WHERE clause) to get the correct result set. This result set contains all missing values in the intID column. By using TOP 1, we get the desired result. You can do the insertion the same way as I did in Method 1. Now it is time to compare these two methods. I simply used STATISTICS IO and the execution time for the evaluation.
Comparison
DECLARE @StartingTime datetime, @EndingTime datetime

PRINT 'Method 1:'
SET STATISTICS IO ON
SET @StartingTime = getdate()
SELECT MIN(IDENTITYCOL) + IDENT_INCR('OrderHeader')
FROM OrderHeader t1
WHERE IDENTITYCOL BETWEEN IDENT_SEED('OrderHeader') AND 32766
    AND NOT EXISTS
        (SELECT * FROM OrderHeader t2
         WHERE t2.IDENTITYCOL = t1.IDENTITYCOL + IDENT_INCR('OrderHeader'))
SET @EndingTime = getdate()
SET STATISTICS IO OFF
SELECT DATEDIFF(ms, @StartingTime, @EndingTime) AS ExecTimeInMS

PRINT 'Method 2:'
SET STATISTICS IO ON
SET @StartingTime = getdate()
SELECT TOP 1 intNumber
FROM OrderHeader
    RIGHT OUTER JOIN tb_Numbers ON tb_Numbers.intNumber = OrderHeader.intID
WHERE intID IS NULL
    AND intNumber <= (SELECT MAX(intID) FROM OrderHeader)
SET @EndingTime = getdate()
SET STATISTICS IO OFF
SELECT DATEDIFF(ms, @StartingTime, @EndingTime) AS ExecTimeInMS
Output:
Method 1:
2210
Table 'OrderHeader'. Scan count 9998, logical reads 20086, physical reads 0, read-ahead reads 0.
ExecTimeInMS
------------
200

Method 2:
2210
Table 'tb_Numbers'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0.
Table 'OrderHeader'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0.
ExecTimeInMS
------------
0

As per the output, there are 20086 logical reads and it took 200 ms for the first method. For the second method there are only 19 logical reads and the execution time is considerably less. That's why I selected to continue with my method. But there may be a downside that I have not seen and you may spot. So, try this, and see whether and how this T-SQL solution will suit you. I highly appreciate your comments and suggestions. You can reach me through [email protected].
Sequential Numbering Gregory Larsen 12/5/2003
Microsoft SQL Server does not support a method of identifying the row numbers for records stored on disk, although there are a number of different techniques to associate a sequential number with a row. You might want to display a set of records where each record is listed with a generated number that identifies the record's position relative to the rest of the records in the set. The numbers might be sequential, starting at 1 and incremented by 1 for each following record, like 1, 2, 3, 4, etc. Or in another case you may want to sequentially number groupings of records, where each specific set of records is numbered starting at 1 and incremented by 1 until the next set is reached, where the sequence starts over. This article will show a number of different methods of assigning a record sequence number to records returned from a query.
Sequentially Numbering Records By Having an Identity Column
Even though Microsoft SQL Server does not physically store a row number with each record, you can include one of your own. To have your own record number, all you need to do is include an identity column in your table definition. When you define the identity column you can specify an initial seed value of 1 and an increment value of 1. By doing this the identity column will sequentially number each row inserted into the table. Let me show you a simple CREATE TABLE statement that defines a RECORD_NUMBER column, which will sequentially number records.

SET NOCOUNT ON
CREATE TABLE SEQ_NUMBER_EXAMPLE (
    RECORD_NUMBER INT IDENTITY (1,1),
    DESCRIPTION VARCHAR(40))
INSERT INTO SEQ_NUMBER_EXAMPLE VALUES('FIRST RECORD')
INSERT INTO SEQ_NUMBER_EXAMPLE VALUES('SECOND RECORD')
INSERT INTO SEQ_NUMBER_EXAMPLE VALUES('THIRD RECORD')
INSERT INTO SEQ_NUMBER_EXAMPLE VALUES('FOURTH RECORD')
INSERT INTO SEQ_NUMBER_EXAMPLE VALUES('FIFTH RECORD')
SELECT * FROM SEQ_NUMBER_EXAMPLE
DROP TABLE SEQ_NUMBER_EXAMPLE

When you run this code it produces the following output:

RECORD_NUMBER  DESCRIPTION
1              FIRST RECORD
2              SECOND RECORD
3              THIRD RECORD
4              FOURTH RECORD
5              FIFTH RECORD

Now as you can see, each record has been automatically numbered using the identity column RECORD_NUMBER. One thing to consider when using this method is that there is no guarantee that these numbers are physically stored next to each other on disk, unless there is a clustered index on the RECORD_NUMBER column. If you use this method, either create a clustered index, or add an ORDER BY RECORD_NUMBER clause to ensure that the records are returned in sequential order. Also remember that if you delete records, your sequential number will have missing values for each record deleted.
Sequentially Numbering Records by Using a Temporary Table
Now you might not have designed your table to have an identity column, or even want to place one on your existing table, so another option is to insert the records you want to have a sequence number into a temporary table. Here is some code that takes the Northwind.dbo.Employees table and copies only the Sales Representatives into a temporary table. This example uses this temporary table with a rank identity column to show a ranking of Sales Representatives by HireDate.
create table #HireDate (
    rank int identity,
    HireDate datetime,
    LastName nvarchar(20),
    FirstName nvarchar(20))

insert into #HireDate (HireDate, LastName, FirstName)
    select Hiredate, LastName, Firstname
    from northwind.dbo.employees
    where Title = 'Sales Representative'
    order by HireDate

Select cast(rank as char(4)) as Rank,
       cast(hiredate as varchar(23)) as HireDate,
       LastName,
       FirstName
from #HireDate

Drop table #HireDate

The output of this example looks like this:

Rank HireDate             LastName   FirstName
1    Apr 1 1992 12:00AM   Leverling  Janet
2    May 1 1992 12:00AM   Davolio    Nancy
3    May 3 1993 12:00AM   Peacock    Margaret
4    Oct 17 1993 12:00AM  Suyama     Michael
5    Jan 2 1994 12:00AM   King       Robert
6    Nov 15 1994 12:00AM  Dodsworth  Anne
Sequentially Numbering Records by Altering the Table
OK, so you don't want to create a temporary table, but instead you want to use the existing table to identify the row numbers for each record. You can still do this, provided you don't have a problem with altering the table. To have row numbers, all you need to do is alter the table to add an identity column with an initial seed value of 1 and an increment of 1. This will number your rows from 1 to N, where N is the number of rows in the table. Let's look at an example of this method using the pubs.dbo.titles table.

set nocount on
alter table pubs.dbo.titles add rownum int identity(1,1)
go
select rownum, title from pubs.dbo.titles
    where rownum < 6 order by rownum
go
alter table pubs.dbo.titles drop column rownum

Note this example first alters the table, then displays the first 5 rows, and lastly drops the identity column. This way the row numbers are produced, displayed and finally removed, so in effect the table is left as it was prior to running the script. The output from the above script would look like this.

rownum      title
----------- ----------------------------------------------------------------
1           But Is It User Friendly?
2           Computer Phobic AND Non-Phobic Individuals: Behavior Variations
3           Cooking with Computers: Surreptitious Balance Sheets
4           Emotional Security: A New Algorithm
5           Fifty Years in Buckingham Palace Kitchens
Sequentially Numbering Records by Using a Self Join
Now say your table does not have an identity column, you don't want to use a temporary table or alter your existing table, but you still would like to have a record number associated with each record. In this case you could use a self join to return a record number for each row. Here is an example that calculates a RecNum column and displays the LastName for each record in the Northwind.dbo.Employees table. This example uses count(*) to count the number of records that are greater than or equal to the LastName in this self join.

SELECT count(*) RecNum,
       a.LastName
FROM Northwind.dbo.Employees a
    join Northwind.dbo.Employees b
        on a.LastName >= b.LastName
group by a.LastName
order by a.LastName

The results from this query look like this:

RecNum LastName
1      Buchanan
2      Callahan
3      Davolio
4      Dodsworth
5      Fuller
6      King
7      Leverling
8      Peacock
9      Suyama
This method works well for a small number of records, a few hundred or less. The number of record comparisons produced by a self join grows quite large when big sets are involved, causing this technique to have slow response times for large sets. This method also does not work if there are duplicate values in the columns used in the self join. If there are duplicates, the RecNum column will contain missing values.
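One way around the duplicate problem, sketched below against the same Northwind Employees table, is to break ties on a second column (here EmployeeID) so that every row still gets its own number. The compound join condition is my addition and is not part of the original example.

-- Break ties on duplicate LastName values by also comparing EmployeeID,
-- so each row receives its own RecNum even when last names repeat.
SELECT count(*) AS RecNum,
       a.LastName,
       a.EmployeeID
FROM Northwind.dbo.Employees a
    join Northwind.dbo.Employees b
        on (b.LastName < a.LastName)
        or (b.LastName = a.LastName and b.EmployeeID <= a.EmployeeID)
group by a.LastName, a.EmployeeID
order by a.LastName, a.EmployeeID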
Sequentially Numbering Records by Using a Cursor
A cursor can be used to associate a sequential number with records. To use this method you would allocate a cursor, then process through each cursor record one at a time, associating a record number with each record. Here is an example that does just that. This example displays the author's last and first name with a calculated recnum value for each author in the pubs.dbo.authors table where the author's last name is less than 'G'. Each author is displayed in order by last name and first name, with the first author alphabetically being assigned a recnum of 1, and for each successive author the recnum is incremented by one.

declare @i int
declare @name varchar(200)

declare authors_cursor cursor for
    select rtrim(au_lname) + ', ' + rtrim(au_fname)
    from pubs.dbo.authors
    where au_lname < 'G'
    order by au_lname, au_fname

open authors_cursor
fetch next from authors_cursor into @name
set @i = 0
print 'recnum name'
print '------ -------------------------------'
while @@fetch_status = 0
begin
    set @i = @i + 1
    print cast(@i as char(7)) + rtrim(@name)
    fetch next from authors_cursor into @name
end
close authors_cursor
deallocate authors_cursor

Output from the cursor query looks like this:

RecNum  Name
1       Bennet, Abraham
2       Blotchet-Halls, Reginald
3       Carson, Cheryl
4       DeFrance, Michel
5       del Castillo, Innes
6       Dull, Ann
Sequentially Numbering Groups of Records
Another case I have run across for sequentially numbering records is where you want to number groups of records, where each group is numbered from 1 to N (N being the number of records in the group) and the numbering starts over again from 1 when the next group is encountered. For an example of what I am talking about, let's say you have a set of order detail records for different orders, where you want to associate a line number with each order detail record. The line number will range from 1 to N, where N is the number of order detail records per order. The following code produces line numbers for orders in the Northwind Order Details table.

select OD.OrderID, LineNumber, OD.ProductID, UnitPrice, Quantity, Discount
from Northwind.dbo.[Order Details] OD
    join (select count(*) LineNumber, a.OrderID, a.ProductID
          from Northwind.dbo.[Order Details] A
              Join Northwind.dbo.[Order Details] B
                  on A.ProductID >= B.ProductID and A.OrderID = B.OrderID
          group by A.OrderID, A.ProductID) N
        on OD.OrderID = N.OrderID and OD.ProductID = N.ProductID
where OD.OrderID < 10251
order by OD.OrderID, OD.ProductID

This code is similar to the prior self join example, except this code calculates the LineNumber as part of a subquery. This way the LineNumber calculated in the subquery can be joined with the complete Order Details record. The above query produces the following output:

OrderID  LineNumber  ProductID  UnitPrice  Quantity  Discount
10248    1           11         14.0000    12        0.0
10248    2           42         9.8000     10        0.0
10248    3           72         34.8000    5         0.0
10249    1           14         18.6000    9         0.0
10249    2           51         42.4000    40        0.0
10250    1           41         7.7000     10        0.0
10250    2           51         42.4000    35        0.15000001
10250    3           65         16.8000    15        0.15000001
Conclusion
These examples represent a number of different approaches to sequentially numbering sets of records. None of these methods is perfect, but hopefully they will give you some ideas on how you might tackle your own sequential record numbering issues.
Using Built in Functions in User Defined Functions Nagabhushanam Ponnapalli 8/7/2003
If you follow the various newsgroups on Microsoft SQL Server and other user groups, you often see people asking, "Is there any way to use GETDATE() inside a user defined function?". The answer to this simple question is NO, but there is a way to work around it. In this article I will explain how you can use built-in functions inside a UDF.

As we know, SQL Server does not allow you to use, inside a user-defined function, a built-in function that can return different data on each call. The built-in functions that are not allowed in user-defined functions are:

GETDATE, GETUTCDATE, NEWID, RAND, TEXTPTR, @@CONNECTIONS, @@CPU_BUSY, @@IDLE, @@IO_BUSY, @@MAX_CONNECTIONS, @@PACK_RECEIVED, @@PACK_SENT, @@PACKET_ERRORS, @@TIMETICKS, @@TOTAL_ERRORS, @@TOTAL_READ, @@TOTAL_WRITE

If you really want to use them inside a UDF, here is the way: create a view called v_Built_in_funs and call the view inside your UDF. Here is the example:

CREATE VIEW v_Built_in_funs
AS
select getdate() systemdate, @@spid spid

The UDF below returns the new objects created on the current day:

CREATE FUNCTION fnGetNewObjects()
RETURNS TABLE
AS
RETURN (
    SELECT name,
           CASE xtype
               WHEN 'C'  THEN 'CHECK constraint'
               WHEN 'D'  THEN 'Default or DEFAULT constraint'
               WHEN 'F'  THEN 'FOREIGN KEY constraint'
               WHEN 'L'  THEN 'Log'
               WHEN 'FN' THEN 'Scalar function'
               WHEN 'IF' THEN 'Inlined table-function'
               WHEN 'P'  THEN 'Stored procedure'
               WHEN 'PK' THEN 'PRIMARY KEY constraint (type is K)'
               WHEN 'RF' THEN 'Replication filter stored procedure'
               WHEN 'S'  THEN 'System table'
               WHEN 'TF' THEN 'Table function'
               WHEN 'TR' THEN 'Trigger'
               WHEN 'U'  THEN 'User table'
               WHEN 'UQ' THEN 'UNIQUE constraint (type is K)'
               WHEN 'V'  THEN 'View'
               WHEN 'X'  THEN 'Extended stored procedure'
               ELSE NULL
           END OBJECT_TYPE
    FROM sysobjects, v_Built_in_Funs
    WHERE CONVERT(VARCHAR(10), crdate, 101) = CONVERT(VARCHAR(10), systemdate, 101))

Call the UDF to get the new objects created for the day:

SELECT * FROM fnGetNewObjects()
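The same view-based trick works for scalar UDFs as well. As a minimal sketch (the function name fnCurrentDate is my own, not from the article), a scalar function can read the current date through the view even though it cannot call GETDATE() directly:

-- Sketch: a scalar UDF that returns the current date/time by selecting
-- from the view, since GETDATE() itself cannot be called inside the function.
CREATE FUNCTION fnCurrentDate()
RETURNS datetime
AS
BEGIN
    DECLARE @dt datetime
    SELECT @dt = systemdate FROM v_Built_in_funs
    RETURN @dt
END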
Understanding the difference between IS NULL and = NULL
James Travis
6/17/2003

When a variable is created in SQL with the DECLARE statement, it is created with no data and stored in the variable table (vtable) inside SQL Server's memory space. The vtable contains the name and memory address of the variable. However, when the variable is created no memory address is allocated to it, and thus the variable is not defined in terms of memory.
When you SET the variable it is allotted a memory address and the initial data is stored in that address. When you SET the value again the data in the memory address pointed to by the variable is then changed to the new value. Now for the difference and why each behaves the way it does.
"= NULL"
"= NULL" is an expression of value. Meaning, if the variable has been set and memory created for the storage of data, it has a value. A variable can in fact be set to NULL, which means the data value of the object is unknown. If the value has been set like so:

DECLARE @val CHAR(4)
SET @val = NULL

you have explicitly set the value of the data to unknown, and so when you do

IF @val = NULL

it will evaluate as a true expression. But if I do:

DECLARE @val CHAR(4)
IF @val = NULL

it will evaluate to false. The reason for this is the fact that I am checking for NULL as the value of @val. Since I have not SET the value of @val, no memory address has been assigned and therefore no value exists for @val.

Note: See the section on SET ANSI_NULLS (ON|OFF) due to differences in the SQL 7 and 2000 defaults that can cause these examples not to work. This is based on SQL 7.
"IS NULL"
Now "IS NULL" is a little trickier and is the preferred method for evaluating the condition of a variable being NULL. When you use the "IS NULL" clause, it checks both the address of the variable and the data within the variable as being unknown. So if I, for example, do:

DECLARE @val CHAR(4)

IF @val IS NULL
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

SET @val = NULL

IF @val IS NULL
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

both outputs will be TRUE. The reason is that in the first @val IS NULL I have only declared the variable and no address space for data has been set, which "IS NULL" checks for. And in the second, the value has been explicitly set to NULL, which "IS NULL" also checks.
SET ANSI_NULLS (ON|OFF)
Now let me throw a kink in the works. In the previous examples you see that = NULL will work as long as the value is explicitly set. However, when you SET ANSI_NULLS ON things will behave a little differently. For example:

DECLARE @val CHAR(4)
SET @val = NULL

SET ANSI_NULLS ON
IF @val = NULL
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'
SET ANSI_NULLS OFF
IF @val = NULL
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

You will note that the first time you run the = NULL statement, after doing SET ANSI_NULLS ON, you get FALSE, and after setting it OFF you get TRUE. The reason is as follows.

Excerpt from the SQL BOL article "SET ANSI_NULLS":

The SQL-92 standard requires that an equals (=) or not equal to (<>) comparison against a null value evaluates to FALSE. When SET ANSI_NULLS is ON, a SELECT statement using WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement using WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.

When SET ANSI_NULLS is OFF, the Equals (=) and Not Equal To (<>) comparison operators do not follow the SQL-92 standard. A SELECT statement using WHERE column_name = NULL returns the rows with null values in column_name. A SELECT statement using WHERE column_name <> NULL returns the rows with nonnull values in the column. In addition, a SELECT statement using WHERE column_name <> XYZ_value returns all rows that are not XYZ value and that are not NULL.

End Excerpt

So, as defined by SQL-92, "= NULL" should always evaluate to false. Even setting the value explicitly means you will never meet the = NULL condition, and your code may not work as intended. The biggest reason = NULL will shoot you in the foot is this: SQL 7, as shipped and installed, defaults to ANSI_NULLS OFF, but SQL 2000 defaults to ANSI_NULLS ON. Of course you can alter this in several ways, but if you upgraded a database from 7 to 2000 and found that = NULL worked only when you set the value explicitly, then when you roll out a default 2000 server your code breaks and can cause data issues. Yet another reason to use IS NULL instead: under the SQL-92 guidelines it will still evaluate to TRUE, and thus your code is safer for upgrading the server.
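If you are unsure which behavior a given connection or database will get, a quick check such as the following can help. This sketch is my own addition and assumes SQL Server 2000, where SESSIONPROPERTY and DATABASEPROPERTYEX are available:

-- Returns 1 if ANSI_NULLS is ON for the current session, 0 if OFF.
SELECT SESSIONPROPERTY('ANSI_NULLS') AS SessionAnsiNulls

-- Returns 1 if the database-level ANSI_NULLS default is enabled.
SELECT DATABASEPROPERTYEX('Northwind', 'IsAnsiNullsEnabled') AS DbAnsiNullsDefault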
Summary
In summary, unless you need to check that the value of a variable was explicitly set to NULL and you are running with ANSI_NULLS OFF, always use the "IS NULL" clause to validate whether a variable is NULL. By using = NULL instead you can cause yourself a lot of headaches in trying to troubleshoot issues that may arise from it, now or unexpectedly in the future.
Basis Some of the information provided comes from how C++ works and how SQL behaves under each circumstance. Unfortunately, SQL as far as I know does not have an addressof function to allow me to output the actual memory address to show what occurs under the hood. In C++ when a variable is created the variable has an address of 0xddddddd (in debug but it can be different non-real addresses as well). When you set the variable the first time checking the address will give you a valid memory address where the data is being stored. Also, more information can be obtained from SQL Books Online in the sections on IS NULL and SET ANSI_NULLS….
Using Exotic Joins in SQL Part 1
Chris Cubley
1/22/2003

When most developers think of joins, they think of "a.SomethingID = b.SomethingID". This type of join, the equijoin, is vitally important to SQL programming; however, it only scratches the surface of the power of the SQL join. This is the first in a series of articles that will look at several different types of "exotic" joins in SQL. This article will focus on using the BETWEEN operator in joins when dealing with range-based data.
Introducing the BETWEEN Join
When dealing with things like calendars, grading scales, and other range-based data, the BETWEEN operator comes in very handy in the WHERE clause. It is often forgotten that the BETWEEN operator can also be used in join criteria. In the WHERE clause, the BETWEEN operator is usually used to test whether some field is between two constants. However, the BETWEEN operator can take any valid SQL expression for any or all of its three arguments. This includes columns of tables.

One use of a BETWEEN join is to determine in which range a particular value falls. Joins of this nature tend to have the following pattern:

<fact data> BETWEEN <range minimum> AND <range maximum>

In this pattern, the "fact data" is contained in a table with instances of data such as payments, test scores, login attempts, or clock in/out events. The other table, the "range lookup table", is usually a smaller table which provides a range minimum and maximum and other data for the various ranges.

For example, consider a scenario in which a student is enrolled in a class. A student receives a numeric grade for a class on a scale of 0 to 100. This numeric grade corresponds to a letter grade of A, B, C, D, or E. However, the school does not use the traditional grading scale in which 90 to 100 corresponds to an A, 80 to 89 corresponds to a B, and so forth. Instead, the school uses the following grading scale:

Letter Grade    Numeric Grade
A               92 – 100
B               84 – 91
C               76 – 83
D               68 – 75
E               0 – 67
To accommodate the school's custom grading scale, their records database has the following table defined:

CREATE TABLE tb_GradeScale(
    LetterGrade char(1) NOT NULL,
    MinNumeric int NOT NULL,
    MaxNumeric int NOT NULL,
    IsFailing smallint NOT NULL,
    CONSTRAINT PK_GradeScale PRIMARY KEY(LetterGrade),
    CONSTRAINT CK_MinMax CHECK(MinNumeric <= MaxNumeric)
)

The students' numeric scores are stored in the following table:

CREATE TABLE tb_StudentGrade(
    StudentID int NOT NULL,
    ClassID varchar(5) NOT NULL,
    NumericGrade int NOT NULL,
    CONSTRAINT PK_StudentGrade PRIMARY KEY(StudentID, ClassID),
    CONSTRAINT CK_StudentGrade_NumericGrade CHECK(NumericGrade BETWEEN 0 AND 100)
)

In this scenario, the tb_StudentGrade table is the "fact table" and the tb_GradeScale table is the "range lookup table". The NumericGrade field serves as the "fact data" while the MinNumeric and MaxNumeric fields serve as the "range minimum" and "range maximum". Thus, following the fact-min-max pattern, we can construct the following join criteria:

NumericGrade BETWEEN MinNumeric AND MaxNumeric

If we put these join criteria into the context of a query which generates a report containing all the students' letter grades for English 101, we end up with the following:

SELECT
    s.StudentID,
    g.LetterGrade
FROM
    tb_StudentGrade s
    INNER JOIN tb_GradeScale g ON(
        s.NumericGrade BETWEEN g.MinNumeric AND g.MaxNumeric
    )
WHERE
    ClassID = 'EH101'

In this query, we join the student grade table with the grading scale table in order to translate a numeric grade to a letter grade. In order to accomplish this, we use the BETWEEN operator to specify the relationship between the two tables being joined.
Using BETWEEN With Temporal Data
Some of the trickiest queries to write are those that deal with temporal data like calendars, appointment times, and class schedules. For example, many businesses have a fiscal calendar that they use for accounting. Accounting periods may start on the 26th of the month and end on the 25th of the following month. The company may vary the starting and ending dates of each accounting period to even out the number of days in each accounting period. In order to generate reports by accounting period, you need to define a table that lays out the fiscal calendar being used. Such a table may look like this:

CREATE TABLE tb_FiscalCalendar(
    FiscalYear int NOT NULL,
    AcctPeriod int NOT NULL,
    StartDatetime datetime NOT NULL,
    EndDatetime datetime NOT NULL,
    CONSTRAINT PK_FiscalCalendar PRIMARY KEY(FiscalYear, AcctPeriod),
    CONSTRAINT CK_FiscalCalendar_DateCheck CHECK(StartDatetime < EndDatetime)
)

In this table, the FiscalYear column indicates the fiscal year to which the accounting period belongs. The AcctPeriod column identifies the accounting period within the fiscal year. The StartDatetime and EndDatetime columns specify the actual starting and ending date and time of the accounting period.

Suppose you are trying to write a report as part of a customer payment processing system. This report summarizes the total number and amount of payments by accounting period. The records of the customer payments are stored in the following table:

CREATE TABLE tb_Payment(
    PaymentID int NOT NULL IDENTITY(1, 1),
    AccountID int NOT NULL,
    PostedDatetime datetime NOT NULL DEFAULT(GETDATE()),
    PaymentAmt money NOT NULL,
    CONSTRAINT PK_Payment PRIMARY KEY(PaymentID)
)

In order to construct the query needed for the report, you must first determine the fiscal year and accounting period in which each payment occurred. You must then group by the fiscal year and accounting period, summing the PaymentAmt field and counting the number of records in each group. To determine each payment's accounting period, you can use a BETWEEN join to the tb_FiscalCalendar table:

FROM
    tb_Payment p
    INNER JOIN tb_FiscalCalendar c ON(
        p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
    )

As do many other joins using the BETWEEN operator, this join follows the fact-min-max pattern seen in the grading scale example. Each payment record (of which there are many) provides a "fact" stating that a certain payment occurred at a particular date and time. The fiscal calendar table acts more as a configuration table that specifies a range of datetime values and provides configuration data about this range.
To finish off the payment reporting query, we add the grouping, aggregate functions, and an ORDER BY clause to make the output more readable:

SELECT
    c.FiscalYear,
    c.AcctPeriod,
    COUNT(*) AS PaymentCount,
    SUM(PaymentAmt) AS TotalPaymentAmt
FROM
    tb_Payment p
    INNER JOIN tb_FiscalCalendar c ON(
        p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
    )
GROUP BY
    c.FiscalYear,
    c.AcctPeriod
ORDER BY
    c.FiscalYear,
    c.AcctPeriod

The output yields the needed report easily and efficiently. With proper indexing, this query should run quite well even against large sets of data.
Other Uses of BETWEEN Joins The BETWEEN join can be put to use in a number of other scenarios. Coupling the BETWEEN operator with a self-join can be a useful technique for concurrency-checking queries such as validating calendars and appointment schedules. BETWEEN joins can be used to produce histograms by aggregating ranges of data. In a situation where you must join precise data to rounded data, the BETWEEN operator can be used to perform a “fuzzy” join. Once you put the BETWEEN join in your query toolbox, you’ll find even more uses for it and wonder how you ever did without it.
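As a quick sketch of the concurrency-checking idea mentioned above, the query below uses a BETWEEN self-join to flag double-booked appointments. The tb_Appointment table and its columns are hypothetical, invented here for illustration only.

-- Find appointments whose start time falls inside another appointment
-- for the same resource (i.e. overlapping bookings).
SELECT
    a.AppointmentID,
    b.AppointmentID AS OverlapsWith
FROM
    tb_Appointment a
    INNER JOIN tb_Appointment b ON(
        a.ResourceID = b.ResourceID
        AND a.AppointmentID <> b.AppointmentID
        AND a.StartDatetime BETWEEN b.StartDatetime AND b.EndDatetime
    )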
Using Exotic Joins in SQL - Part 2
Chris Cubley
2/5/2003

In the previous article, you saw how the BETWEEN operator could be used in joins to solve problems dealing with range-based data. In this article, I will show you how to take joins even further by using multiple criteria in joins as well as using the greater than, less than, and not equals operators in joins.
Compound Joins
Compound joins are joins which use multiple criteria combined with a logical operator such as AND. This is a relatively simple concept and is commonly used in database systems that employ compound primary keys. For a simple example of a database schema in which compound joins are necessary, consider a school management system where one of the features is tracking which classes are taught in which classrooms. The system must match up the features of the classrooms to the needs of the classes. In order to perform these functions, the following two tables are defined:

CREATE TABLE tb_Classroom(
    BuildingName char(10) NOT NULL,
    RoomNumber int NOT NULL,
    RoomCapacity int NOT NULL,
    HasLabEquip smallint NOT NULL,
    CONSTRAINT PK_Classroom PRIMARY KEY(BuildingName, RoomNumber)
)

CREATE TABLE tb_ClassSection(
    CourseID char(5) NOT NULL,
    SectionNumber smallint NOT NULL,
    BuildingName char(10) NOT NULL,
    RoomNumber int NOT NULL,
    InstructorID int NOT NULL,
    ScheduleID int NOT NULL,
    SectionCapacity int NOT NULL,
    RequiresLabEquip smallint NOT NULL,
    CONSTRAINT PK_ClassSection PRIMARY KEY(CourseID, SectionNumber),
    CONSTRAINT FK_ClassSection_Classroom FOREIGN KEY(BuildingName, RoomNumber)
        REFERENCES tb_Classroom(BuildingName, RoomNumber)
)

In this example, the tb_Classroom table defines a list of classrooms in which classes are taught. The tb_ClassSection table contains instances of various courses taught at the school. A class section is taught in a particular classroom by an instructor according to a standard class schedule. Both the tb_Classroom and tb_ClassSection tables use natural compound primary keys.

One of the reports in the school management system lists the class sections being taught along with the capacity of their respective classrooms. In order to construct this report, the tb_ClassSection table must be joined to the tb_Classroom table based upon the compound primary key of the tb_Classroom table. This can be accomplished by using a compound join to return rows where both the BuildingName AND RoomNumber columns match.

SELECT
    s.CourseID,
    s.SectionNumber,
    c.RoomCapacity
FROM
    tb_ClassSection s
    INNER JOIN tb_Classroom c ON(
        s.BuildingName = c.BuildingName
        AND s.RoomNumber = c.RoomNumber
    )
Joins Using Inequality Comparison Operators
The school management system from the first example also contains a report listing all class sections in which the classroom is not large enough to accommodate the maximum number of students for the class section. To determine which class sections meet these criteria, the system must compare the class section's capacity to the capacity of the classroom in which it is being taught. If the classroom's capacity is less than the class section's capacity, then the class section should be included in the result set.

With this query, the trick is to first join each class section to the classroom in which it is being taught and then add the additional criterion that the classroom's capacity is less than that of the class section. To do this, simply take the query from the last example and add the additional criterion.

SELECT
    s.CourseID,
    s.SectionNumber,
    c.RoomCapacity,
    s.SectionCapacity
FROM
    tb_ClassSection s
    INNER JOIN tb_Classroom c ON(
        s.BuildingName = c.BuildingName
        AND s.RoomNumber = c.RoomNumber
        AND c.RoomCapacity < s.SectionCapacity
    )

A common mistake when constructing queries such as this is not including the equijoin criteria necessary to match up the rows to be compared by the inequality operator. If only the inequality comparison is included in the criteria, the query returns all the rows where a classroom's capacity is less than that of any class section, regardless of whether or not the class section was taught in that classroom.
Not Equals Joins
You may be wondering how in the world it could be useful to use the Not Equals operator in a join. For an example, consider another report in the school management system in which you list the misallocation of laboratory-equipped classrooms. This report must list all of the class sections that require laboratory equipment but are scheduled to be taught in classrooms that do not have it. The report should also include all non-lab class sections being taught in laboratory classrooms. In the tables, the class sections that require laboratory equipment are indicated with a RequiresLabEquip value of 1, and the classrooms equipped with laboratory equipment are indicated with a HasLabEquip value of 1.

This problem follows a similar pattern to that of the capacity problem. The only difference is the use of the Not Equals operator in place of the Less Than operator. After matching the class section with the classroom in which it is being taught, the value of the RequiresLabEquip column must be compared with the HasLabEquip column. If these values are not equal, there is a laboratory equipment allocation problem and the class section should be included on the report. Applying these criteria results in the following query:

SELECT
    s.CourseID,
    s.SectionNumber,
    c.HasLabEquip,
    s.RequiresLabEquip
FROM
    tb_ClassSection s
    INNER JOIN tb_Classroom c ON(
        s.BuildingName = c.BuildingName
        AND s.RoomNumber = c.RoomNumber
        AND c.HasLabEquip <> s.RequiresLabEquip
    )

When using the Not Equals operator in joins, it is even more vital to remember to use additional join criteria than it is when using the Greater Than and Less Than operators. In this case, if only the Not Equals criterion were specified, the query would perform a cross join and then exclude only the class section-classroom pairs where the laboratory indicator was not equal. If there were 100 classrooms and 500 class sections, this could possibly return a result set of 25,000 - 50,000 rows – definitely not what was intended.
Beyond the Basics Compound joins, especially those employing the Inequality and Not Equals operators, can be used with other advanced SQL techniques to construct queries that are even more powerful. The key to leveraging these advanced joins is to spell out the requirements of the problem clearly in terms of relating the available sets of data. Once the relationships between the sets of data are understood, these relationships can be encoded into join criteria. This technique, along with testing and validation of the output, will enable you to solve complicated problems effectively.
Replication
Replication is one of those topics that isn't as widely used, but can be mission critical to the success of an application or enterprise. Moving data seamlessly between systems, distributing it widely, making it available without the hassles of custom programming can be critical for a DBA. Unfortunately point and click does not always work and a deeper understanding is needed. Here are a few articles from 2003 from those dealing with real world replication.

Altering Replicated Tables (SQL 2000)    Andy Warren    148
Altering Replicated Tables (SQL 2000)
Andy Warren
8/8/2003

A few weeks ago I published an article about modifying replicated tables with SQL 7. If you haven't read that article, I encourage you to do so before continuing. With SQL 2000 you can now add a column to a table (and a publication) with very little effort. The only thing to remember is that if you want the new column to be added to the subscribers, you MUST make the change via the 'Filter Columns' tab of the publication properties. SQL still provides no help if you want to modify an existing column. You can drop a column as long as it is not part of the primary key or part of a filter (thanks to Jeff Cook for pointing this out to me). If you don't want the new column to be part of any existing publication you can add the column via Enterprise Manager or Query Analyzer.

For the following demo, I created two databases, ReplSource and ReplDestination, both on the same machine running an instance of SQL2K Developer Edition. I then imported the Authors table from Pubs into ReplSource and created a standard transactional publication, using the default options. Here is the original schema:
To use the Filter Columns tab you can either use 'Create & Manage Publications' found on the Tools|Replication menu, or you can right click the publication itself either under Databases or under Replication Monitor.
Click on Filter Columns. You'll see the Add Column to Table button. Clicking that brings up the following dialog. My one complaint here is that instead of the nice editing tools you normally get when making changes through Enterprise Manager, you have to type everything in. If you're not sure of the syntax, make a quick copy of the table schema and use Enterprise Manager to make the change, then script the changes out so you can copy the DDL for the column you're adding. If you make a mistake here, you'll have to apply the same process you would with SQL 7!
In this example I'm adding a column called country.
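If you prefer a script to the dialog, SQL Server 2000 also exposes this operation through the sp_repladdcolumn system procedure. The call below is my own sketch rather than part of the original article, and the column definition and 'all' publication value are illustrative only, so check the parameters in Books Online before relying on them.

-- Add the new column to the published authors table and propagate it to
-- every publication that contains the article ('all').
EXEC sp_repladdcolumn
    @source_object = 'authors',
    @column = 'country',
    @typetext = 'varchar(30) NULL',
    @publication_to_add = 'all'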
Once you add a column, it's automatically selected as part of the article. When you close the publication properties the change will be sent to each subscriber the next time the log reader & distribution agent run.
That's all there is to it. A big step up from SQL 7 and if you do use these changes often, probably worth the upgrade right there! You've probably noticed that there is also a 'Drop Selected Column' button. Let's look at what happens when you click it:
That's right, even though you're working on a publication, if you use this button it will actually drop the column from both the publisher and all the subscribers. Useful, but use with care! Another thing you can do from Filter Columns is to remove a column from the article. You just can't do this easily in SQL 7, but with SQL 2000 you just clear the checkbox – well, almost. It does most of the work for you, but unfortunately requires you to force a snapshot to occur. Until the snapshot is done, no transactions will be distributed to subscribers of that publication.
That's all there is to it. SQL 2000 greatly reduces the time needed to perform one of the more common tasks of adding a column to a published article. Maybe in a future release we'll see enhancements that will support modifying existing columns without having to do a snapshot.
XML
It's been a few years now that XML has been a hot buzzword in the computer industry. SQL Server 2000 added XML capabilities to SQL Server, but few of us ever use them judging by the number of articles and questions on the subject. We had relatively few articles in 2003 that dealt with XML, and here are a couple that we picked for publication.

Is XML the Answer?    Don Peterson    154
Is XML the Answer? Don Peterson 10/7/2003
Despite the breathless marketing claims being made by all the major vendors and the natural desire to keep skills up-to-date, it would be prudent to examine exactly what advantages are offered by XML and compare them to the costs before jumping headlong into the XML pool. The idea of XML is simple enough; basically just add tags to a data file. These tags are sometimes referred to as metadata. XML is inherently and strongly hierarchical. The main benefits are touted as being:
• Self describing data
• Facilitation of cross-platform data sharing or "loose coupling" of applications
• Ease of modeling "unstructured" data
Self-describing
At first the idea of self-describing data sounds great, but let's look at it in detail. A classic example of the self-describing nature of XML is given as follows:

<Shirt>
   <Color>Red</Color>
   <Size>L</Size>
   <Style>Hawaiian</Style>
   <InStock>Y</InStock>
</Shirt>

One possible equivalent text document could be as follows:

Red,L,Hawaiian,Y

Anyone can look at the XML document and infer the meaning of each item, not so for the equivalent simple text document. But is this truly an advantage? After all, it's not people we want to read the data files, it's a machine. Which is more efficient for a machine to read or generate? Which makes better use of limited network bandwidth? The XML file is more than six times the size of the plain text file. In my experience XML files will tend to be around 3–4 times the size of an equivalent delimited file. Due to the bloated nature of XML, hardware vendors are actually offering accelerators to compensate. Worse yet, there are more and more non-standard XML parsers being written to "optimize" XML, thus completely destroying any illusion of "compatibility." (See http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2896005,00.html)
Communication facilitation
The self-documenting nature of XML is often cited as facilitating cross-application communication because as humans we can look at an XML file and make reasonable guesses as to the data's meaning based on hints provided by the tags. Also, the format of the file can change without affecting that communication because it is all based on tags rather than position. However, if the tags change, or don't match exactly in the first place, the communication will be broken. Remember that, at least for now, computers are very bad at guessing.

In order to effect communication between systems with a text file, both the sender and receiver must agree in advance on what data elements will be sent (by extension, this mandates that the meaning of each attribute is defined), and the position of each attribute in the file. When using XML each element must be defined and the corresponding tags must be agreed upon. Note that tags in and of themselves are NOT sufficient to truly describe the data and its meaning (which, of necessity, includes the business rules that govern the data's use) unless a universal standard is created to define the appropriate tag for every possible thing that might be described in an XML document, and that standard is rigorously adhered to. (See http://www.well.com/~doctorow/metacrap.htm) That XML is self-describing has led many to wrongly assume that their particular tags would correctly convey the exact meaning of the data. At best, tags alone convey an approximate meaning, and approximate is not good enough. In fact, it has been noted that XML tags are metadata only if you don't understand what metadata really is. (http://www.tdan.com/i024hy01.htm).
No matter the method of data transmission, the work of correctly identifying data and its meaning is the same. The only thing XML “brings to the table” in that regard is a large amount of overhead on your systems.
Unstructured data
The very idea of unstructured or semi-structured data is an oxymoron. Without a framework in which the data is created, modified and used, data is just so much gibberish. At the risk of being redundant, data is only meaningful within the context of the business rules in which it is created and modified. This point cannot possibly be overemphasized. A very simple example to illustrate the point follows: the data '983779009-9937' is undecipherable without a rule that tells me that it is actually a valid part number. Another example often thrown about by XML proponents is that of a book. A book consists of sections, chapters, paragraphs, words, and letters all placed in a particular order, so don't tell me that a book is unstructured. Again, what benefit does XML confer? None. The data still must be modeled if the meaning is to be preserved, but XML is inherently hierarchical and imposes that nature on the data.

In fact it has been noted that XML is merely a return to the hierarchical databases of the past, or worse yet, a return to application-managed hierarchical data files. The problem is that not all data is actually hierarchical in nature. The relational model of data is not inherently hierarchical, but it is certainly capable of preserving hierarchies that actually do exist. Hierarchies are not neutral, so a hierarchy that works well for one application, or one way of viewing the data, could be totally wrong for another, thus further eroding data independence. (http://www.geocities.com/tablizer/sets1.htm).

Herein lies the real problem. No matter how bloated and inefficient XML may be for data transport, it is downright scary when it is used for data management. Hierarchical databases went the way of the dinosaur decades ago, and for good reason; they are inflexible and notoriously difficult to manage. I can understand why many object-oriented programmers tend to like XML. Both OO and XML are hierarchical, and if you are used to thinking in terms of trees and inheritance, sets can seem downright alien. This is one of the fundamental problems with the OO paradigm, and it's about time that data management professionals educate themselves about the fundamentals. Set theory and predicate logic (the foundations of the relational model of data) have been proven superior to hierarchical DBMSs, which are based on graph theory. Why is it that data integrity always seems to take a back seat whenever some programmer cries about the perceived "impedance mismatch" between OO and relational data? Why is it that the "fault" is automatically assumed to lie with the database rather than a flawed programming paradigm?

What I am seeing is a push from many development teams to store raw XML in the database as a large varchar, or text column. This turns the database into nothing more than a simple staging ground for XML. This, of course, violates one of the first principles of database design: atomicity, or one column, one value. How can a DBMS enforce any kind of integrity on a single column containing raw XML? How do I know that the various XML strings stored in a given table are even related? Indexing and optimization using such a scheme is impossible.
Vendors
Why are the major hardware and software vendors so excited about XML if it is so bad? There are several possibilities:
• Ignorance. Often times marketing departments drive the products, and marketing departments like nothing more than for their products to be fully buzzword compliant.
• Stupidity. The technical "experts" are often ignorant as well, only they have no excuse, so I call it stupidity. I spent several hours at the last SQL PASS Summit trying to find someone on the SQL Server product team who could provide a single good reason to use XML. By the end of the conversation there were at least five "experts" around the table, all unable to make their arguments hold up to the scrutiny of reason. Some of the answers they gave were shockingly stupid. One of these "experts" stated that the biggest benefit of XML is to allow programmers to easily adapt a database to changing needs by "loading" columns with multiple attributes of which the database is unaware! I'm sure they were glad to see me go so they could get back to their fantasy world of XML nirvana. I left that conversation with a growing sense of disquiet about the future direction of SQL Server. Instead of taking steps to more fully implement the relational model, they and other vendors are chasing their tails trying to implement a failed idea from decades past.
• Greed. I recently read an article extolling the virtues of XML. In it the author claimed that companies are finding "XML enriches their information capabilities, it also results in the need for major systems upgrades." Interestingly, the author does not define or quantify just how XML "enriches" anyone but the software and hardware vendors. However you choose to look at it, the major vendors do not have your best interests at heart, and when XML is finally recognized for the bad idea that it is, they will gladly help you clean up the mess…for a price.
Conclusion
Do not be fooled by the fuzzy language and glitzy marketing-speak. As data management professionals you have a responsibility to safeguard your company's data, and you can't possibly do that effectively if you don't know, or ignore, the fundamentals. Pick up An Introduction to Database Systems by Chris Date and Practical Issues in Database Management by Fabian Pascal and get yourself solidly grounded in sound data management principles. The alternative? Spend your time riding the merry-go-round chasing after the latest industry fad, which happens to be last year's fad and so on…throwing money at vendors and consultants with each cycle.
Design and Strategies
Every DBA needs to have guiding principles and rules. These may differ among individuals and organizations, but they will all be grounded in basic database principles.

Codd's Rules    Frank Kalis    158
Design A Database Using an Entity-Relationship Diagram    Ramesh Gummadi    160
Codd's Rules Frank Kalis 12/10/2003
These rules were formulated by E.F. Codd and published in 1985 1). They describe what a relational database system must support in order to call itself relational. So, without further introduction, let's dive into the gospel of relational databases!
1. Information Rule Data is presented in only one way: as values in columns within rows. Simple, consistent and versatile. A table (aka an entity or relation) is a logical grouping of related data in columns and rows. Each row (aka record or tuple) represents a single fact and contains information about just that one fact. Each column (aka field or attribute) describes a single property of an object. Each value (datum) is defined by the intersection of a column and a row.
2. Guaranteed Access Rule Each value can be accessed by specifying table name, primary key and column name. This way each and every value is uniquely identifiable and accessible. Basically this is a restatement of the fundamental requirement for primary keys.
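A one-line illustration (my own example, using the pubs sample database): naming the table, the primary key value and the column pins down exactly one value.

-- Table name + primary key value + column name identify a single value.
SELECT au_lname
FROM pubs.dbo.authors
WHERE au_id = '172-32-1176'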
3. Systematic treatment of Null values A fully relational database system must offer a systematic way to handle missing information. Null is always treated as unknown. Null means no value, or the absence of a value; because no value was entered, it follows that the value is unknown. The information is missing. Null is not the same as an empty string or 0. Any value, Null included, compared with Null, yields Null.
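A small illustration of this systematic treatment (my own example, assuming SQL Server 2000's default SET ANSI_NULLS ON behavior):

-- With ANSI_NULLS ON, any comparison with NULL evaluates to UNKNOWN,
-- so neither the 'true' nor the 'false' branch is taken.
SET ANSI_NULLS ON
SELECT CASE
           WHEN NULL = NULL THEN 'true'
           WHEN NOT (NULL = NULL) THEN 'false'
           ELSE 'unknown'
       END AS ComparisonResult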
4. Dynamic relational online catalogue In addition to user-defined data, a relational database also contains data about itself. So there are two kinds of tables:
• user-defined
• system-defined
Metadata is data which describes the structure of the database, its objects and how they are related. This catalogue is an integral part of the database and can be queried by authorized users just like any other table. Another name for this online catalogue is system catalogue or data dictionary.
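For instance (my own example, using SQL Server 2000's system tables), the catalogue can be queried with ordinary SELECT statements:

-- List the user tables recorded in the current database's catalogue.
SELECT name, crdate
FROM sysobjects
WHERE xtype = 'U'
ORDER BY name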
5. Comprehensive Data Sublanguage Rule Codd's intention was to have at least one language to communicate with the database. This language should be capable of handling data definition, data manipulation, authorization, integrity constraints and transactional processing. It can be used both interactively and embedded within applications. Although SQL is not the only data query language, it is by far the most common one. SQL is a linear, nonprocedural or declarative language. It allows the user to state what he wants from the database, without explicitly stating where to find the data or how to retrieve the data.
6. View Updating Rule When presenting data to the user, a relational database should not be limited to tables. Views are 'virtual tables' or abstractions of the source tables. They behave like tables, with the one exception that they are dynamically created when the query is executed. Defining a view does not duplicate data; views are current at runtime. All theoretically updateable views should be updateable by the system. If data is changed in a view, it should also be changed in the underlying table. Updateable views are not always possible. For example, there is a problem when a view addresses only that part of a table that includes no candidate key, which could mean that updates cause entity integrity violations. Some sources on the internet state that 'Codd himself did not fully understand this'. I haven't found any rationale for this.
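As a small sketch of an updateable view (my own example against the Northwind sample database; the view name is made up), a single-table view with no aggregation can be updated directly, and the change lands in the base table:

-- A single-table view: updates through it modify the Customers base table.
CREATE VIEW v_GermanCustomers
AS
SELECT CustomerID, CompanyName, ContactName
FROM Customers
WHERE Country = 'Germany'
GO

-- This UPDATE against the view changes the underlying Customers row.
UPDATE v_GermanCustomers
SET ContactName = 'Hanna Moos'
WHERE CustomerID = 'ALFKI'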
7. High-level Insert, Update and Delete A relational database system must support basic relational algebraic operations (Selection, Projection and Joins) as well as set operations like Union, Intersection, Division and Difference. Rows are treated like sets for data manipulation. Set operations and relational algebra are used to create new relations by operations on other tables.
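By way of a hedged example (the Customers and Suppliers tables are hypothetical), UNION is available directly in SQL Server 2000, while intersection and difference are usually written with EXISTS and NOT EXISTS on that version:

-- Union of two result sets (duplicates removed)
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers

-- Intersection: cities that appear in both tables
SELECT DISTINCT c.City
FROM Customers c
WHERE EXISTS (SELECT * FROM Suppliers s WHERE s.City = c.City)

-- Difference: cities with customers but no suppliers
SELECT DISTINCT c.City
FROM Customers c
WHERE NOT EXISTS (SELECT * FROM Suppliers s WHERE s.City = c.City)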
8. Physical Data Independence The physical layer of the architecture is mapped onto the logical. Users and applications do not depend upon the physical structure of a database. Implementation of the physical layer is the job of the storage engine of a RDBMS. The relational engine communicates with the relational store without any interaction by the user. An application that queries data from a relational database does not need to know how this data is physically stored. It only sends the data request and leaves the rest to the RDBMS. Applications should not be logically impaired when the physical storage or access methods change.
9. Logical Data Independence Users and applications are to a certain degree independent of the logical structure of a database. The logical structure can be modified without redeveloping the database and/or the application. The relations between tables can change without affecting the functionality of applications or ad hoc queries.
10. Integrity Independence To be viewed as a relational database, the RDBMS must implement data integrity as an internal part of the database. This is not the job of the application. Data integrity enforces the consistency and correctness of the data in the database. Simply put, it keeps the garbage out of the database. Changes to integrity constraints should not have an effect on applications. This simplifies applications, but is not always possible.
11. Distribution Independence The data manipulation (sub)language of the RDBMS should be able to work with distributed databases. Views should be able to join data from tables on different servers (distributed queries) as well as from different RDBMS (heterogeneous queries). The user should not have to be aware of whether a database is distributed or not.
12. Nonsubversion Rule If the RDBMS supports a low-level (single record at a time) language, this low-level language must not be able to bypass or subvert the integrity rules that are expressed in the high-level (multiple records at a time) relational language.
0. Foundation Rule Interestingly Codd defined a Rule 0 for relational database systems.
"For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities the system may support." (Codd, 1990) That means, no matter what additional features a relational database might support, in order to be truly called relational it must comply with the 12 rules. Codd added this rule in 1990. Also he expanded these 12 rules to 18 to include rules on catalogs, data types (domains), authorization and other. 2) Codd himself had to admit the fact that, based on the above rules, there is no fully relational database system available. This has not changed since 1990. To be more specific, rules 6, 9, 10, 11 and 12 seem to be difficult to satisfy.
REFERENCES:
1) Codd, E.F. "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" ComputerWorld, October 14, 1985 and October 21, 1985.
2) Codd, E.F. The Relational Model for Database Management, Version 2; Addison-Wesley; 1990.
Design A Database Using an Entity-Relationship Diagram Ramesh Gummadi 10/28/2003
Database developers involved in the task of designing a database have to translate real world data into relational data, i.e. data organized in the form of tables. First they have to understand the data, then represent it in a design view, and then translate it into an RDBMS. One technique that is great to use is the E-R diagram. Most developers who work with database systems are probably already familiar with it, or have at least heard about it. I am going to try to briefly explain the concept and give an example to help understand it.
So, What is the E-R Model? This model was first introduced by Dr. Peter Chen in 1976 in a paper titled "The Entity-Relationship Model – Toward a Unified View of Data". The most useful thing about this model is that it allows us to represent data in the form of a diagram, popularly known as an E-R diagram, and from this diagram we can map the data into a relational schema. First I will informally define some terms used in this model.

Entity
Definition: Any real world object that has some well defined characteristics.
Example: Employee, Professor, Department, Student, Course, etc.

Property (or) Attribute
Definition: Any characteristic of an entity.
Example: An employee has a name, ID, department, etc.; a professor has a name, the subject he teaches, the department he belongs to, etc.; a department has a name, number of employees, etc.; a student has a name, id and the class he belongs to; a course has a name, course number, etc.

Relationship
Definition: An association among entities, or an entity that helps to connect two or more other entities.
Example: Registration (connects the Student and Course entities); Project (connects the Employee and Department entities).

Regular Entity
Definition: An entity which is independent of other entities.
Example: Course is a regular entity.

Weak Entity
Definition: An entity which is dependent on some other entity.
Example: Section is a subtype of Course, i.e. a course has many sections. Without a course there is no section.
In the E-R model all the above listed terms are represented in a diagrammatic technique known as the E-R diagram.
Let's draw an E-R Diagram Say we are given the task of designing a database for a university system. We try to recognise the various entities that form a university system and then establish relationships among them. I have represented this data in the form of an E-R diagram.
Each entity is shown as a rectangle. For weak entities the rectangle has a double border. In the above diagram, the regular entities are University, College, Dean, Professor, Department, Student and Course. Section is a weak entity. Properties or attributes of an entity are shown in ellipses and are attached to their respective entity by a single solid line. In this diagram I am showing properties only for the Student entity, for the sake of clarity. The relationships between entities are shown as diamonds, and the entities which are part of a relationship are connected to the diamond by a solid line labeled either '1' or 'M', indicating whether the relationship is one-to-one, one-to-many or many-to-many.
Let's Map the E-R diagram to a Relational Schema (as tables)
• Regular Entity – map to a base table.
• Weak Entity – map to a base table with the primary key of the dominant entity as a foreign key in the table.
• One-to-Many relationship – no need to introduce a new table, but the primary key on the '1' side becomes a foreign key on the 'M' side.
• Many-to-Many relationship – new base table with two foreign keys corresponding to the two participants.
• One-to-One or Zero relationship – usually the primary key on one side is a foreign key on the other side.
Let me now derive tables from the above diagram using this set of rules. All the regular entities represented by a rectangle can be translated into base tables.
Table – University
  UID (primary key): int
  Name: varchar(20)
  Chancellor: varchar(20)
There is a 1–M relationship between University and College and a 1–1 relationship between Dean and College. So the primary key of the University table becomes a foreign key in the College table, and the primary key of the Dean table becomes a foreign key in the College table. The rest of the tables follow the same pattern.
Table – College
  CID (primary key): int
  University (foreign key references UID in University table): int
  Dean (foreign key references DeanID in Dean table): int
  Name: varchar(20)
Table – Dean
  DeanID (primary key): int
  Name: varchar(20)
  Age: int
Table – Department
  DID (primary key): int
  College (foreign key references CID in College table): int
  Chair (foreign key references PID in Professor table): int
  Name: varchar(20)
Table – Professor
  PID (primary key): int
  Department (foreign key references DID in Department table): int
  Name: varchar(20)
Table – Course
  CourseID (primary key): int
  Department (foreign key references DID in Department table): int
  Name: varchar(20)
Table – Section
  SectionID (primary key): int
  Course (foreign key references CourseID in Course table): int
  Professor (foreign key references PID in Professor table): int
  Name: varchar(20)
Table – Student
  StudentID (primary key): int
  Department (foreign key references DID in Department table): int
  Name: varchar(20)
  DateofEnrollment: smalldatetime
  TelephoneNumber: varchar(20)
There is only one many-to-many relationship in the above diagram and that is between Section and Student. That means a student can register for many sections and a section has many students. To establish this relationship we will create a new table called Student_Registration.
Table – Student_Registration
  Student (foreign key references StudentID in Student table): int
  Section (foreign key references SectionID in Section table): int
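To make the mapping concrete, here is a T-SQL sketch of the schema derived above. The constraint placement and the decision to add one of the circular foreign keys afterwards are my own choices for illustration, not part of the original design:

CREATE TABLE University (UID int PRIMARY KEY, Name varchar(20), Chancellor varchar(20))
CREATE TABLE Dean (DeanID int PRIMARY KEY, Name varchar(20), Age int)
CREATE TABLE College (CID int PRIMARY KEY,
    University int REFERENCES University (UID),   -- 1-M: a University has many Colleges
    Dean int REFERENCES Dean (DeanID),            -- 1-1: each College has one Dean
    Name varchar(20))
CREATE TABLE Professor (PID int PRIMARY KEY, Department int, Name varchar(20))
CREATE TABLE Department (DID int PRIMARY KEY,
    College int REFERENCES College (CID),
    Chair int REFERENCES Professor (PID),
    Name varchar(20))
-- Professor.Department and Department.Chair reference each other, so one of
-- the two foreign keys has to be added after both tables exist
ALTER TABLE Professor ADD FOREIGN KEY (Department) REFERENCES Department (DID)
CREATE TABLE Course (CourseID int PRIMARY KEY,
    Department int REFERENCES Department (DID),
    Name varchar(20))
CREATE TABLE Section (SectionID int PRIMARY KEY,
    Course int REFERENCES Course (CourseID),
    Professor int REFERENCES Professor (PID),
    Name varchar(20))
CREATE TABLE Student (StudentID int PRIMARY KEY,
    Department int REFERENCES Department (DID),
    Name varchar(20), DateofEnrollment smalldatetime, TelephoneNumber varchar(20))
CREATE TABLE Student_Registration (
    Student int REFERENCES Student (StudentID),
    Section int REFERENCES Section (SectionID),
    PRIMARY KEY (Student, Section))               -- many-to-many resolved by a junction table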
Cool! Now we have finished designing a database with the help of an E-R diagram. So, folks, tell me whether you find this technique useful and simple to use, and start using it for your projects.
Conclusion This example is simple, and you could design this database from common sense without actually using an E-R diagram. However, when you are given the task of designing a database, putting it in the form of a diagram first makes your job easier. When the task is designing a big data mart or data warehouse, this technique is indispensable. I welcome any comments or suggestions.
References An Introduction to Database Systems by C.J. Date.
Miscellaneous
Everything that’s left. Not the best or worst, not really in a theme, just everything that didn’t seem to fit in any of the other categories.
A Brief History of SQL
Frank Kalis
165
Change Management
Chris Kempster
166
DBMS vs File Management System
Dale Elizabeth Corey
174
Introduction to English Query and Speech Recognition
Sean Burke
177
Lessons from my first Project as a Project Manager
David Poole
179
Pro Developer: This is Business
Christopher Duncan
182
Two Best Practices!
Darwin Hatheway
184
VBScript Class to Return Backup Information
Bruce Szabo
186
White Board, Flip Chart, or Notepad?
Andy Warren
192
A Brief History of SQL Frank Kalis 9/10/2003
The original concept behind relational databases was first published by Edgar Frank Codd (an IBM researcher, commonly referred to as E. F. Codd) in a paper, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks" (RJ599), dated 08/19/1969. However, what is commonly viewed as the first milestone in the development of relational databases is a publication by Codd entitled "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM (Vol. 13, No. 6, June 1970, pp. 377–87). This was only a revised version of the 1969 paper. This article aroused massive interest in both the academic community and industry in the feasibility and usability of relational databases for commercial products. Several other articles by Codd throughout the seventies and eighties are still viewed almost as gospel for relational database implementation. One of these is the famous set of so-called 12 rules for relational databases, which was published in two parts in Computerworld. Part 1 was named "Is Your DBMS Really Relational?" (published 10/14/1985); Part 2 was called "Does Your DBMS Run By the Rules?" (10/21/1985). Codd continuously added new rules to these 12 originals and published them in his book "The Relational Model for Database Management, Version 2" (Addison-Wesley, 1990). But to continue with the evolution of SQL and relational databases we must take a step back in time to the year 1974. In 1974 Donald Chamberlin and others developed, for IBM, System R as a first prototype of a relational database. The query language was named SEQUEL (Structured English Query Language). System R also became part of IBM's prototype SEQUEL-XRM during 1974 and 1975. It was completely rewritten in 1976–1977. In addition, new features like multi-table and multi-user capabilities were implemented. The result of this revision was quickly named SEQUEL/2, but had to be renamed to SQL for legal reasons, because the Hawker Siddeley Aircraft Company claimed the trademark SEQUEL for itself. In 1978 systematic tests were performed to prove real world usability on customers' systems. It became a big success for IBM, because this new system proved both useful and practical. With this result, IBM began to develop commercial products based on System R. SQL/DS came out in 1981; DB2 hit the streets in 1983. But although IBM had done most of the research work, it was in fact a small, unknown software company named Relational Software that first released an RDBMS, in 1979, two years before IBM. This unknown software company was later renamed Oracle. As an interesting side note, Relational Software released its product as Version 2. Obviously a brilliant marketing move: no one had to worry about a buggy and/or unstable Version 1 of this new kind of software product. The victory of SQL was well on its way. Already the de facto standard, SQL also became an official standard through American National Standards Institute (ANSI) certification in 1986 (X3.135-1986). But this standard could only be viewed as a cleaned-up version of DB2's SQL dialect. Just one year later, in 1987, standardization by the International Organization for Standardization followed (ISO 9075-1987). Only two years later, ANSI released a revised standard (X3.135-1989). So did ISO, with ISO/IEC 9075:1989. Partially due to the commercial interests of the software firms, many parts of the standard were left vague and unclear. This standard was viewed as the least common denominator and missed its goal.
It was some 150 pages long. To strengthen and establish the standard, ANSI revised SQL89 thoroughly and in 1992 released the SQL2 standard (X3.135-1992). This time they did it right! Several weaknesses of SQL89 were eliminated. Further conceptual features were standardized, although at that time they were far beyond the capabilities of all relational databases. The new standard was some 500 pages long. But even today there is no single product available that fully complies with SQL92. Due to this disparity, three levels of conformance were introduced:
• Entry level conformance. Only small changes compared to SQL89.
• Intermediate conformance. At least 50% of the standard has to be implemented.
• Full conformance.
Recently SQL99, also known as SQL3, was published. This standard addresses some of the modern features that earlier standards ignored, such as object-relational database models, call-level interfaces and integrity management. SQL99 replaces the SQL92 levels of conformance with its own: Core SQL99 and Enhanced SQL99. SQL99 is split into 5 parts:
1. Framework (SQL/Framework)
2. Foundation (SQL/Foundation)
3. Call Level Interface (SQL/CLI)
4. Persistent Stored Modules (SQL/PSM)
5. Host Language Bindings (SQL/Bindings)
Another impressive feature of SQL99 is its size: some 2,000 pages. SQL has established itself as the standard database query language. So what's next? Well, at a minimum, SQL4 is due to be released in this century.
Change Management Chris Kempster 3/18/2003 One of the many core tasks of the DBA is change control management. This article discusses the processes I use from day to day and follows the cycle of change from development through test and into production. The core topics include:
a) formalising the process
b) script management
c) developer security privileges in extreme programming environments
d) going live with change
e) managing ad-hoc (hot fix) changes
Environment Overview With any serious, mission critical application development, we should always have three to five core environments in which the team is operating. They include:
a) development
   a. rarely rebuilt; a busy server in which the database reflects any number of change controls, some of which never get to test and others go all the way through.
b) test
   a. refreshed from production on a regular basis and in sync with a "batch" of change controls that are going to production within a defined change control window.
   b. ongoing user acceptance testing
   c. database security privileges reflect what will be (or is) in production
c) production support
   a. mirror of production at a point in time for user testing and the testing of fixes or debugging of critical problems, rather than working in production.
d) pre-production
   a. mirror of production
   b. used when "compiling code" into production and for the final pre-testing of production changes
e) production
The cycle of change is shown in the diagram below through some of these servers:
We will discuss each element of the change window cycle throughout this article. The whole change management system, be it built in-house or a third party product, has seen a distinct shift towards the whole CRM (customer relationship management) experience, tying in a variety of processes to form (where possible) this:
This ties in a variety of policy and procedures to provide end-to-end service delivery for the customer. The "IR database" shown in the previous diagram doesn't quite meet all requirements, but covers resource planning, IR and task management, and subsequent change window planning. With good policy and practice, paper based processes to document server configuration and application components assist in other areas of the services delivery and maintenance framework.
Pre-change window resource meeting Every fortnight the team leaders, DBAs and the development manager discuss new and existing work over the next two weeks. The existing team of 20 contract programmers works on a variety of tasks, from new development projects extending current application functionality (long term projects and mini projects) to standard bug (incident request) fixing and system enhancements. All of this is tracked in a small SQL Server database with an Access front end, known as the "IR (incident reporting)" system. The system tracks all new developments (3 month max cycle), mini projects (5–10 days), long term projects (measured and managed in 3 month blocks) and other enhancements and system bugs. This forms the heart and soul of the team in terms of task management and task tracking. As such, it also drives the change control windows and which of the tasks will be rolled into production each week (we have a scheduled downtime of 2 hours each Wednesday for change controls). The resource meeting identifies and deals with issues within the environments, tasks to be completed or nearing completion, and the work schedule over the next two weeks. The Manager will not dictate the content of the change window but guides resourcing and task allocation issues. The team leaders and the development staff will allocate their tasks to a change control window with a simple incrementing number representing the next change window. This number and the associated change information in the IR database are linked to a single report that the DBA will use on Tuesday afternoon to "lock" the change control away and use it to prepare for a production rollout.
Visual Source Safe (VSS) The key item underpinning any development project is source control software. There is a variety on the market, but on all client sites I have visited to date, all use Microsoft VSS. Personally, I can't stand the product; with its outdated interface, lacking functionality and unintuitive design, it's something most tend to put up with. Even so, a well managed and secured VSS database is critical to ongoing source management. Consider these items when using VSS:
a) Spend time looking around for better change management front-ends that leverage the VSS API / automation object model; if possible, a web-based application that allows remote development is a handy feature.
b) Consider separate root project folders for each environment
   a. $/
      i. development
      ii. test (unit test)
      iii. production
c) Understand what labeling and pinning mean in detail, along with the process of sharing files and repinning. These fundamentals are often ignored, and people simply make complete separate copies for each working folder or, worse still, have a single working folder for dev, test and production source code (i.e. one copy of the source).
d) All developers should check in files before leaving for the day to ensure backups cover all project files.
e) Take time to review the VSS security features and allocate permissions accordingly.
f) If pinning, labeling, branching etc. is all too complex, get back to basics with either three separate VSS databases covering development, test and production source code, or three project folders. Either way, the development staff need to be disciplined in their approach to source control management.
g) Apply the latest service packs.
Managing Servers Not a lot of the development teams I have come across have their own server administrators. It is also rare that the servers fall under any SOE or contractual agreement in terms of their ongoing administration on the LAN and the responsibility of the IT department. As such, the DBA should take the lead and be responsible for all server activities where possible, covering:
a) server backups – including a basic 20 tape cycle (daily full backups) and an associated audit log; try to get the tapes off site where possible and keep security in mind.
b) software installed – the DBA should log all installations and de-installations of software on the server. The process should be documented and proactively tracked. This is essential for the future rollout of application components in production and for server rebuilds.
c) licensing and terminal server administration
d) any changes to Active Directory (where applicable)
e) user management and password expiration
f) administrator account access
On the development and test servers I allow Administrator access to simplify the whole process. Before going live, security is locked down on the application and its OS access to mimic production as best we can. If need be, we will contact the company's systems administrators to review the work done and recommend changes. In terms of server specifications, aim for these at a minimum:
a) RAID-1 or RAID-5 for all disks – I had 4 disks fail on my development and test servers over a one year period; these servers really take a beating at times, and contractor downtime is an expensive business.
b) 1+ GB RAM minimum, with expansion to 4+ GB
c) dual PIII 800MHz CPUs as a minimum
Allowing administrative access to any server usually raises hairs on the back of people's necks, but in a managed environment with strict adherence to responsibilities and procedure, this sort of flexibility is appreciated by staff and works well with the team.
Development Server The DBA maintains a "database change control form", separate from the IR management system and any other change management documentation. The form includes the three core server environments (dev, test and prod)
and associated areas for developers to sign in order for generated scripts from dev to make their way between server environments. This form is shown below:
In terms of security and database source management, the developers are fully aware that:
a) naming conventions apply to all stored procedures and views
b) the DBA is the only person to make any database change
c) database roles are to be used by the application database components
d) DBO owns all objects; role and security settings will be verified and re-checked before code is promoted to test
e) developers are responsible for utilising Visual Source Safe for stored procedure and view management
f) the DBA manages and is responsible for all aspects of database auditing via triggers and their associated audit tables
g) production server administrators must be contacted for anything concerning file security, the proxy user accounts set up to run COM+ components, ftp access, security on shares, and the removal of virtual directory connections via IIS used by the application
h) strict NTFS security privileges apply
With this in mind, I am quite lenient with the server and database environment, giving the following privileges. Be aware that I am a change control nut and refuse to move any code into production unless the above is adhered to and standard practices are met throughout the server change cycle. There are no exceptions.
a) Server
   a. Administrator access is given via terminal services to manage any portion of the application
   b. the DBA is responsible for server backups to tape (including the OS, file system objects applicable to the application, and the databases)
b) Database
   a. ddl_admin access – to add, drop or alter stored procedures, views and user defined functions.
   b. db_securityadmin access – to deny or revoke security as need be on their stored procedures and views.
No user has db_owner or sysadmin access. Database changes are scripted and the scripts stored in Visual Source Safe. The form is updated with the script and its run order, along with any associated pre- or post-change manual tasks to be performed. To generate the scripts I am relatively lazy: I alter all structures via the diagrammer, generate the script, and alter it as needed to cover issues with triggers or very large tables that are better scripted by hand. This method (with good naming conventions) is simple, relatively fail-safe and, may I say, very quick. All scripts are stored in VSS. The database is refreshed from production at "quiet" times. This may only be a data refresh, but when possible (based on the status of changes between servers) a full database replacement from a production database backup is done. The timeline varies, but on average a data refresh occurs every 3–5 months and a complete replacement every 8–12 months.
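The shape of a generated change script will vary, but a hedged skeleton of the kind of script that ends up on the form and in VSS (the database name, table name and change number below are placeholders) looks something like this:

-- Change control 1234: add an audit column (names are placeholders)
USE MyAppDB
GO
BEGIN TRANSACTION
ALTER TABLE dbo.Orders ADD LastModifiedBy varchar(30) NULL
-- Error handling is simplified; severe errors will abort the batch on their own
IF @@ERROR <> 0
BEGIN
    RAISERROR ('CC 1234: ALTER TABLE failed, rolling back', 16, 1)
    ROLLBACK TRANSACTION
    RETURN
END
COMMIT TRANSACTION
GO
-- Post-change check recorded on the change control form
IF COL_LENGTH('dbo.Orders', 'LastModifiedBy') IS NULL
    RAISERROR ('CC 1234: column was not created', 16, 1)
GO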
Test Server The test server database configuration in relation to security, user accounts, OS privileges and database settings is as close to production as we can get it. Even so, it's difficult to mimic the environment in its entirety, as many production systems include web farms, clusters, disk arrays etc. that are too expensive to replicate in test. Here the DBA will apply scripts generated from completed change control forms that alter database structure, namely tables, triggers, schema bound views, full-text indexing, user defined data types and changes in security. The developers will ask the DBA to move stored procedures and views up from development into test as need be to complete UAT (user acceptance testing). The DBA will "refresh" the test server database on a regular basis from production. This tends to coincide with a production change control window rollout. On completion of the refresh, the DBA might need to re-apply database change control forms still "in test". All scripts are sourced from VSS.
Production Support The production support server box is similar to that of test, but is controlled by the person who is packaging up the next production release of scripts and other source code ready for production. This server is used for:
a) production support – restoring the production database to it at a point in time and debugging critical application errors, or pre-running end of month/quarter jobs.
b) pre-production testing – a final test before going live with code, especially handy when we have many DLLs with interdependencies and binary compatibility issues.
All database privileges are locked down, along with the server itself.
Production The big question here is, "who has access to the production servers and databases?". Depending on your SLAs, this can be wide and varied, from full access for the development team via internally managed processes all the way to having no idea where the servers are, let alone getting access to them. I will take the latter approach, with some mention of stricter access management. If the development team has access, it's typically under the guise of a network/server administration team that oversees all servers, their SOE configuration and network connectivity, OS/server security and, more importantly, OS backups and virus scanning. From here, the environment is "handed over" to the apps team for application configuration, set-up, final testing and "go live". In this scenario, a single person within the development team should manage change control in this environment. This tends to be the application architect or the DBA. When rolling out changes into production:
a) the web server is shut down
b) MSDTC is stopped
c) Crystal Reports and other batch routines scheduled to run are closed and/or disabled during the upgrade
d) prepare a staging area "c:\appbuild" to store incoming CC window files
e) back up all components being replaced to "c:\appatches\<system>\YYYYMMDD"
   a. I tend to include entire virtual directories (even if only 2 files are being altered)
   b. COM+ DLLs are exported and the DLL itself is also copied, just in case the export is corrupt
f) a full backup of the database is done if any scripts are being run (a sample backup command is shown after this list)
g) consider a system state backup and registry backup; emergency disks are a must and should always be kept up to date.
Take care with service packs of any software. The change (upgrade or downgrade) of MDAC, and the slight changes in system stored procedures and system catalogs with each SQL Server update, can grind parts (or all) of your application to a halt.
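For step f), the backup itself is straightforward; a hedged example (the database name, file path and backup description are placeholders) that would be run immediately before the first script is applied:

-- Full backup taken immediately before the change window scripts are applied
BACKUP DATABASE MyAppDB
    TO DISK = 'E:\MSSQL\Backup\MyAppDB_preCC.bak'
    WITH INIT, NAME = 'MyAppDB pre change-control full backup'
-- Confirm the backup set is readable before touching the database
RESTORE VERIFYONLY FROM DISK = 'E:\MSSQL\Backup\MyAppDB_preCC.bak'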
Hot Fixes Unless you are running a mission critical system, there will always be minor system bugs that result in hot fixes in production. The procedure is relatively simple but far from ideal for critical systems.
a) Warn all core users of the downtime; pre-empt it with a summary of the errors being caused and how to differentiate the error from other system messages.
b) If possible, re-test the hot fix on the support server.
c) Bring down the application in an orderly fashion (e.g. web server, component services, SQL Agent, database, etc.).
d) Back up all core components being replaced or altered.
Database hot fixes, namely statements rolling back the last change window's work, are tricky. If possible do not plan to kick users off, but at the same time careful testing is critical to prevent having to do a point in time recovery if things go from bad to worse. Finally, any hot fix should end with a half-page summary of the reasons why the change was made; this is documented in the monthly production system report. Accountability is of key importance in any environment.
Smarten up your applications (Autonomic Computing) Autonomic computing "is an approach to self-managed computing systems with a minimum of human interference" (IBM). In other words, self repairing, reporting and managing systems that look after the whole of the computing environment. So what has this got to do with change management? Everything, actually. The whole change management process is about customers and the service we provide them as IT professionals. To assist in problem detection and, ideally, resolution, the system architects of any application should consider:
a) an API for monitoring software to plug in error trapping/correction capability
b) a single application entry point for all system messages (errors, warnings, information) related to daily activity
c) a logging system that is relatively fault tolerant itself, i.e. if it can't write messages to a database it will try a file system or event log
d) where possible, pre-allocating a range of codes with a knowledge base description, resolution and rollback scenario if appropriate. Take care that the numbers allocated don't clash with sysmessages (and its ADO errors) and other OS related error codes, as you don't want to obscure the actual errors being returned.
A simplistic approach we have taken is shown below; it's far from self healing but meets some of the basic criteria so we can expand in the future:
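As a rough, hedged sketch of what such a single entry point might look like (all object names and the 50001+ code range are invented for illustration, not the author's actual implementation), a central logging table and procedure could be along these lines; keeping application codes above 50000 avoids colliding with sysmessages:

CREATE TABLE dbo.AppMessageLog (
    LogID       int IDENTITY (1, 1) PRIMARY KEY,
    LoggedAt    datetime      NOT NULL DEFAULT (GETDATE()),
    MessageCode int           NOT NULL,   -- application codes start at 50001
    Severity    varchar(12)   NOT NULL,   -- 'ERROR', 'WARNING' or 'INFO'
    SourceName  varchar(128)  NOT NULL,   -- component or procedure raising the message
    MessageText varchar(1000) NULL
)
GO
CREATE PROCEDURE dbo.usp_LogAppMessage
    @MessageCode int,
    @Severity    varchar(12),
    @SourceName  varchar(128),
    @MessageText varchar(1000) = NULL
AS
    -- Single entry point for all system messages; the calling tier falls back
    -- to a file or the event log if this insert fails
    INSERT INTO dbo.AppMessageLog (MessageCode, Severity, SourceName, MessageText)
    VALUES (@MessageCode, @Severity, @SourceName, @MessageText)
GO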
MRAC Principle of IR/Task Completion This is going off track a little in terms of change control, but I felt it's worth sharing with you. The MRAC (Manage, Resource, Approve, Complete) principle is a micro guide to task management for mini projects and incident requests spanning other teams/people over a short period of time. The idea here is to get the developers who own the task to engage in basic project management procedures. This not only assists in documenting their desired outcome, but also in communicating it to others involved and engaging the resources required to see the entire task through to its completion. The process is simple enough, as shown in the table below. The development manager may request this at any time based on the IR's complexity. The developer is expected to meet with the appropriate resources and drive the task and its processes accordingly. This is not used in larger projects, in which a skilled project manager will take control and responsibility of the process.
The table has one row per task or deliverable (Requirements, Design, Build, Test, Implement) and the columns: Task or deliverable, Planned Completion Date, Managed by, Resourced to, Approved by, Completed by.
The tasks of course will vary, but rarely sway from the standard requirements, design, build, test, implement lifecycle. Some of the key definitions related to the process are as follows:
Managed – Each task or deliverable is managed by the person who is given the responsibility of ensuring that it is completed.
Resourced – The person or persons who are to undertake a task or prepare a deliverable.
Accepted – The recorded decision that a product or part of a product has satisfied the requirements and may be delivered to the Client or used in the next part of the process.
Approved – The recorded decision that the product or part of the product has satisfied the quality standards.
Authorised – The recorded decision that the record or product has been cleared for use or action.
Variation – A formal process for identifying changes to the Support Release or its deliverables and ensuring appropriate control over variations to the Support Release scope, budget and schedule. It may be associated with one or more Service Requests.
This simple but effective process allows developers and associated management to better track change and its interdependencies throughout its lifecycle.
Summary No matter the procedures and policies in place, you still need commitment from development managers, project leaders/managers and the senior developers to drive the change management process. Accountability and strict adherence to the defined processes are critical to avoid the nightmare of any project: a source code version that we can never re-create, or a production environment for which we don't have the source. Laying down the law with development staff (including the DBA) is a task easily put in the 'too hard' basket. It is not easy, but you need to start somewhere. This article has presented a variety of ideas on the topic that may prompt you to take further action in this realm. The 21st century DBA, aka Technical Consultant, needs to focus on a variety of skills: not only database change, but change management processes as a complete picture.
DBMS vs File Management System Dale Elizabeth Corey 8/4/2003 A Data Management System (DMS) is a combination of computer software, hardware, and information designed to electronically manipulate data via computer processing. Two types of data management systems are DBMSs and FMSs. In simple terms, a File Management System (FMS) is a data management system that allows access to single files or tables at a time. FMSs accommodate flat files that have no relation to other files. The FMS was the predecessor of the Database Management System (DBMS), which allows access to multiple files or tables at a time (see Figure 1 below).
File Management Systems Typically, File Management Systems provide the following advantages and disadvantages.
Advantages:
• Simpler to use
• Less expensive
• Fits the needs of many small businesses and home users
• Popular FMSs are packaged along with the operating systems of personal computers (i.e. Microsoft Cardfile and Microsoft Works)
• Good for database solutions for hand held devices such as the Palm Pilot
Disadvantages:
• Typically does not support multi-user access
• Limited to smaller databases
• Limited functionality (i.e. no support for complicated transactions, recovery, etc.)
• Decentralization of data
• Redundancy and integrity issues
The goals of a File Management System can be summarized as follows (Calleri, 2001):
• Data Management. An FMS should provide data management services to the application.
• Generality with respect to storage devices. The FMS data abstractions and access methods should remain unchanged irrespective of the devices involved in data storage.
• Validity. An FMS should guarantee that at any given moment the stored data reflect the operations performed on them.
• Protection. Illegal or potentially dangerous operations on the data should be controlled by the FMS.
• Concurrency. In multiprogramming systems, concurrent access to the data should be allowed with minimal differences.
• Performance. A compromise between data access speed and data transfer rate on the one hand, and functionality on the other.
From the point of view of an end user (or application), an FMS typically provides the following functionality (Calleri, 2001):
• File creation, modification and deletion.
• Ownership of files and access control on the basis of ownership permissions.
• Facilities to structure data within files (predefined record formats, etc.).
• Facilities for maintaining data redundancies against technical failure (back-ups, disk mirroring, etc.).
• Logical identification and structuring of the data, via file names and hierarchical directory structures.
Database Management Systems Database Management Systems provide the following advantages and disadvantages:
Advantages:
• Greater flexibility
• Good for larger databases
• Greater processing power
• Fits the needs of many medium to large-sized organizations
• Storage for all relevant data
• Provides user views relevant to the tasks performed
• Ensures data integrity by managing transactions (ACID test = atomicity, consistency, isolation, durability)
• Supports simultaneous access
• Enforces design criteria in relation to data format and structure
• Provides backup and recovery controls
• Advanced security
Disadvantages:
• Difficult to learn
• Packaged separately from the operating system (i.e. Oracle, Microsoft Access, Lotus/IBM Approach, Borland Paradox, Claris FileMaker Pro)
• Slower processing speeds
• Requires skilled administrators
• Expensive
The goals of a Database Management System can be summarized as follows (Connolly, Begg, and Strachan, 1999, pp. 54–60):
• Data storage, retrieval, and update (while hiding the internal physical implementation details)
• A user-accessible catalog
• Transaction support
• Concurrency control services (multi-user update functionality)
• Recovery services (a damaged database must be returned to a consistent state)
• Authorization services (security)
• Support for data communication
• Integrity services (i.e. constraints)
• Services to promote data independence
• Utility services (i.e. importing, monitoring, performance, record deletion, etc.)
The components to facilitate the goals of a DBMS may include the following:
• Query processor
• Data Manipulation Language preprocessor
• Database manager (software components including authorization control, command processor, integrity checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager)
• Data Definition Language compiler
• File manager
• Catalog manager
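Several of these services surface directly in SQL rather than in application code. As a small illustrative sketch (the table below is invented), integrity services mean that the engine itself rejects data that violates a declared constraint:

-- Integrity services: the DBMS, not the application, rejects bad data
CREATE TABLE Product (
    ProductID int PRIMARY KEY,
    UnitPrice money NOT NULL CHECK (UnitPrice >= 0)
)
-- This insert fails with a constraint violation raised by the engine itself
INSERT INTO Product (ProductID, UnitPrice) VALUES (1, -5)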
Second and Third Generation DBMSs The second generation of DBMSs was developed after 1970, when E. F. Codd proposed the relational model, which replaced the hierarchical and network models. A Relational Database Management System (RDBMS) organizes the database into multiple simple tables, which are related to one another by common data fields. The third generation of DBMSs is represented by Object-Oriented Database Management Systems (OODBMS) and Object-Relational Database Management Systems (ORDBMS). "Object-oriented DBMS takes the database idea one step further: data becomes 'intelligent' in that it 'knows' how to behave – that is, data is associated not only with data format information but also with instructions for handling it" (Kreig, 1999). The Object-Relational Database Management System is a combination of the RDBMS and the OODBMS in that it extends the RDBMS to include a "user-extensible type system, encapsulation, inheritance, polymorphism, dynamic binding of methods, complex objects, etc., and object identity" (Connolly, Begg, and Strachan, 1999, p. 811). Examples of ORDBMSs are Informix and Oracle.
Conclusion The Database Management System evolved from the File Management System. Part of the driver for that evolution was the need for more complex, interrelated databases that the FMS could not support. Even so, there will always be a need for the File Management System as a practical tool and in support of small, flat file databases. Choosing a DBMS to support the development of interrelated databases can be a complicated and costly task. DBMSs are themselves evolving into another generation of object-oriented systems. The Object-Oriented Database Management System is expected to grow at a rate of 50% per year (Connolly, Begg, and Strachan, 1999, p. 755). Object-Relational Database Management System vendors such as Oracle, Informix, and IBM have been predicted to gain a 50% larger share of the market than the RDBMS vendors. Whatever the direction, the Database Management System has gained its permanence as a fundamental root source of the information system.
References
• Connolly, Thomas, Begg, Carolyn, and Strachan, Ann. (1999). Database Systems: A Practical Approach to Design, Implementation, and Management. Essex, UK: Addison Wesley Longman.
• Database Management. [Online]. Edith Cowan University. http://wwwbusiness.ecu.edu.au/users/girijak/MIS4100/Lecture7/index.htm. [2001, August 20].
• Database Management Systems. [Online]. Philip Greenspun. http://www.arsdigita.com/books/panda/databases-choosing. [2001, August 20].
• File Management Systems. [Online]. Franco Calleri. http://www.cim.mcgill.ca/~franco/OpSys-304427/lecture-notes/node50.html. [2001, August 21].
• Introductory Data Management Principles. [Online]. Laurence J. Kreig. Washtenaw Community College. http://www.wccnet.org/dept/cis/mod/f01c.htm. [2001, August 14].
Introduction to English Query and Speech Recognition Sean Burke 3/7/2003
A couple of weeks ago, I had just come away from yet another Microsoft XP marketing pitch about what a wonderfully robust operating system it is going to be, and how its cool new features were going to truly enrich the end user experience. I've used XP a bit and I like it, so don't get me wrong here. I'm not bashing Microsoft, but I couldn't help but be a bit cynical when I heard this. In particular, the specific hype session was centered on some very non-specific speech recognition capabilities that were destined to "render the keyboard obsolete". I don't remember the exact wording, but I can't be too far off in my paraphrasing. I would be very impressed if I could get speech recognition and activation on my machine, and make it so that it was truly a productivity booster. Supposedly Office XP is fully loaded with voice recognition capabilities, including menu commands and dictation. But I can't help but think back to the early and mid-nineties when a number of the speech-recognition software products came out with what I recall were the "keyboard killers" of their time. I don't remember if their features were really new, or just new to me, but I do remember how the scene played out. It went something like this (after about 2 hours of trying to set up the microphone, and too many mundane hours of training the software): "Ahem… File, new. File New. FILE NEW. No. F I LE N E W." (ok) [Some nonsensical dictation, just a few lines to test it out. Works pretty well, kind of clunky, still a bit irritating] "File Save. SAVE. S A V E! (no, not exit!) Save? YES! Not Exit! Cancel! CANCEL! @#$%&! I was promptly thrown out of the word processing application with nothing to show. Nothing, that is, unless you count the resulting distaste for speech-to-text. I know the technology was still in its infancy, and may still be for all intents and purposes, but for me, it turned out to be nothing more than an interesting distraction. I would be willing to bet that most of you haven’t found it too compelling either. In fact, I have yet to meet anyone who speaks to his or her machine for purposes other than the opportunity to tell it what it can go do with itself. Not too much after I saw the latest XP marketing pitch, I was on my way in to work, thinking about a database project that we had been working on for a while. Part of it has to do with essentially recreating the Query builder functionality that can be found in SQL Server or Access. We have a web-based mechanism that is used for other applications and mimics that functionality fairly well, but it is not quite sufficient for the needs of this particular application. I’ve played around with a number of the third-party tools and controls that are currently on the market, and they too have a fairly robust feature set. What was lacking in all of these tools was EXTREME simplicity for the end-user. Dreaming the impossible dream, I recalled the speech recognition capabilities of XP, and thought about how cool it would be if I could just TELL the application what data I needed to pull, and it actually went out and did it. A quick reality check reminded me that I knew nothing about speech to text technology, much less how to couple that with a database application. But I do know a little something about a particular database platform that supports a cool bit of technology. You guessed it – English Query. I’ve never actually used it before, and don’t even know for sure of anyone that has. 
However, one thing that I do know is that I live to learn about new technology, and this seemed to be the perfect opportunity to broaden my horizons.
Well, what is it? The focus of English Query centers on the use of Database Objects and Semantic Objects (entities and relationships). An entity is anything that you can refer to as an "item" or an "object" such as a customer, employee, or invoice. These conceptual entities are physically represented in the database as a field or table, or combinations of fields and tables joined together through views. The relationships are just what you would think – simply descriptions of how the entities are related, and how they interact with one another. For example, "Invoices have Line Items", and "Customers purchase Products" are ways of describing relationships between two entities.
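To make that concrete, the whole point of the model is that a plain-English question gets translated into SQL against those entities. Purely as an illustrative sketch (the table names are assumptions and EQ's actual generated statement will differ), a question such as "Which customers purchased products in June?" might come back as something like:

SELECT DISTINCT c.CompanyName
FROM Customers c
     JOIN Orders o ON o.CustomerID = c.CustomerID
     JOIN OrderDetails od ON od.OrderID = o.OrderID
WHERE o.OrderDate >= '20030601'
  AND o.OrderDate <  '20030701'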
So what about the speech recognition part of this? After a little research, I found out that Microsoft has released the latest version of its Speech SDK (5.1). This version is essentially a wrapper around the previous version (5.0) that allows any programming environment that supports automation to access its interface. In my case, this would be Visual Basic 6.0, as I'm still getting acclimated to .NET. However, I spent some time on the .NET Speech site, and it looks very promising. As I progress through the English Query project, I may end up focusing more on the .NET Speech tools rather than the current tool set. This will inevitably lengthen the learning curve, but it's something I want to do eventually anyway.
This diagram represents the components of an English Query application deployed on the Web
What does it take to get an English Query application up and running? My development environment is a Win2K Pro box running SQL Server 2K. I will go over the steps you must take to build and deploy an EQ application on the desktop and on the Web in more detail in subsequent articles; for now, here is a general overview of what needs to be done:
1. The first thing you have to do is install English Query on your machine. This can be found in the setup options for SQL Server on the installation CD.
2. To create an EQ application, you must create a new project in Visual Studio, either from scratch or using one of the wizards provided with the EQ install. Save your sanity and use the wizards.
3. Once the wizard completes its operation, refine the EQ model that it created.
   a. Enhance the entities in your model by providing "synonyms" for certain entities (ex: phone = phone number).
   b. Define additional relationships between and within your entities.
4. For any modifications you make, and for good programming practice in general, test your EQ model.
5. After testing, you can refine how data is displayed. This will inevitably be an iterative process.
6. Build (compile) the EQ application.
7. Use the EQ application in your VB project, or deploy it to the Web.
This is a representation of the parts of an English Query Project created in Visual Studio
Where do we go from here? My aim in this series of articles is to document my learning curve on English Query and present some workable examples that you may be able to creatively apply to some of your own applications or web pages. Hopefully by the time we are done, you will have a better understanding of how EQ complements the SQL Server 2000 platform. It is possible that I may get a couple of minor things wrong in an article as the series progresses, like the details of certain steps that need to be taken, or actions that may be performed in more than one way but represented by me as THE way to do it. Should this occur, it is my intention to keep updating each article in the series so that it will remain as current and accurate as possible, and to credit anyone who points out the mistake. I would appreciate any feedback, suggestions, and questions you may have on either English Query or the Speech SDK.
Lessons from my first Project as a Project Manager David Poole 6/4/2003
Introduction
Around this time last year I mentioned to my boss that I was interested in Project Management. I had worked for the company for two years as the principal DBA and felt that project management was the next career step for me. Well, be careful what you wish for! I thought that I had become suitably world weary and cynical. Not quite up to Michael Moore standards, but getting there. I felt ready for the task ahead, but my first project in the role of project manager was an eye opener. I thought I would share with you the main lessons I learnt on my first project.
Lesson One A customer has certain expectations of their project. If the project is worth $50,000 then the customer is likely to have $60,000 worth of expectations. If, through budgeting, that $50,000 project gets pruned to, say, $20,000 then you will find that the customer still has $60,000 worth of expectations. A project that has been gutted in this way at the start is called a Death March. Read "Death March" by Edward Yourdon for further details. Your first job will be to enter a bartering process with the customer to set priorities for the tasks within the project and to work out what you can deliver for the $20,000. This leads to Lesson Two.
Lesson Two – Put everything in writing. Make it clear that actions are only carried out against written and agreed tasks. The temptation is to slip things into the project to act as a sweetener, particularly if you know that you are going to have to give the customer some bad news. However, if these sweeteners are not written down then:
• You have no written proof demonstrating your flexibility.
• It raises false expectations in the customer, and things will get ugly later on when you have to say "no" further down the line.
• You will be penalized for project creep when the time taken implementing the sweeteners has detracted from the time spent on the meat of the project.
If you have a concern that needs addressing (i.e. the spec of the server is too low for the task it is expected to do) then you need to put this in writing. This leads to Lesson Three. My boss told me that he always volunteers to take the minutes of any meeting because he knows that the points he makes will be recorded. No-one can overlook the points that he raised because he always records those items. Of course someone could suggest that something be struck from the minutes after the first draft is issued, but it is unlikely to happen because:
• A written reminder tends to prompt people's memories.
• Anyone who wants something struck off is faced with having to explain why they want to do so to all attendees of the meeting.
Lesson Three – Keep a project journal. This will tell you not only when things happened but give you the chance to keep a narrative of why they happened. If a customer continually puts off a major decision then it helps if you document the date/times on which you chased them up for that decision. If you raised a concern with an aspect of the project, i.e. you expressed concern that your data warehousing project is going to be run on a low spec server that is doubling as a web server, then not only the concern, but the response to this concern needs to be recorded. This is to help you keep track of your project. It is serendipity that this also acts as protection in the event of project failure. The journal will also be vital in preparing for a project post mortem.
Lesson Four – Keep an issues log We keep a simple Word document with a table that lists:
• The issue.
• The date it was raised.
• Who raised it.
• Who is responsible for dealing with it.
• The resolution.
• The date it was closed.
This document is a global document that is circulated to all members of the project team and to the customer. It acts as a forum for all and sundry to raise their concerns.
Lesson Five Face to face meetings are relationship builders. The rapport that you build with your customer will help in weathering the ups and downs of the project. There are things that you can say in a meeting that you would never put in writing and you would be very wary of saying on the phone. This doesn’t contradict Lesson Two. You still write everything down, but you sanitize it for general consumption. Within the constraints of time and budget you need to meet with the customer often enough to keep abreast of how the customer perceives the progress of their project. You should also aim to have a project post mortem on completion of a project. This is usually the time when you ask the customer to sign off the project as being complete and to accept your final invoice.
Lesson Six A project post mortem is supposed to be a constructive affair in which both the positive and negative aspects of the project are examined from both your perspective and the customer's. In many ways it is like an annual employee appraisal. It is not an excuse for the employer/customer to give the employee/project manager what we British call "a right bollocking". If it is seen in that light then really the issues at stake should have been raised and dealt with earlier in the project. There is a danger that this final stage will degenerate but, frankly, there is little to be gained from such an experience.
Lesson Seven – Talk to your project team members. Have you ever, as a developer, had a conversation with a project manager and asked them "you promised the customer what?" If you are asked for a delivery time for a technical aspect of the project that is outside your experience, then agree to get back to the customer after consulting your developers. Don't improvise unless you absolutely have to; otherwise you are asking for egg on your face. This is the 21st century: you should be able to phone someone on your team during a meeting recess. This is a variation on "be nice to the people on your way up, you are sure to meet them again on your way down".
Summary They say that good judgement is the product of experience and that experience is the product of bad judgement. Well, shall we say that I gained a lot of experience on my first project. I was fortunate that a couple of my bosses are the sort of project managers that developers volunteer to work with and they really helped me through it. I’ve learnt that there is nothing soft about "soft" skills. Sometimes you have to smile and shake the customer’s hand when you would sooner break his/her fingers. Would I do it again? I would have to say 'yes'. With a good team behind you and a fair-minded customer it is challenging but fun. Much as I enjoy the problem solving aspect of DBA'ing my experience is that techy jobs tend not to earn much respect in the boardroom. My observation would be that technicians tend to have tasks thrust upon them – whereas people managers have at least some flexibility to control their own destiny.
Pro Developer: This is Business Christopher Duncan 2/25/2003
I've been paying the rent as a professional software developer since the 80s. I've also worked both full time and part time as a musician for longer than that. In my travels, I've come to recognize a great many similarities between programmers and musicians. Both have the fire, passion and soul of the artist. And all too often, both are incredibly naïve when it comes to the business end of things. Business – you know, that aspect of your work where they actually pay you at the end of the day? Whether you're up all night banging away at the next Killer App or you're cranking up the guitar in a smoky bar full of black leather jackets, chances are good that money isn't really what you're concentrating on. However, contrary to popular belief, that doesn't make you noble. At the end of the month, no matter how compelling your art may be, your landlord is only interested in cold, hard currency. It's just the way the world works. If you don't take the business aspect of your career every bit as seriously as you take your art, you're going to end up hungry. And just for the record, I've also done the starving artist routine. Trust me, it's not nearly as romantic as it looks in the movies. Give me a fat bank account and a two inch steak any day of the week. My art's much better when I'm not distracted by the constant rumblings of an empty stomach. Programmers by and large fare much better than their guitar playing brethren when payday rolls around. Even in the midst of the occasional economic slumps that the tech industry has weathered over the past few decades, a low paying coding job beats the heck out of a high paying bar gig. Nonetheless, all things are relative. If you make a living as a programmer, then you need computers, software, development tools, research books, and probably an extremely robust espresso machine. Spare change to tip your local pizza delivery person is also a good idea if you want to ensure that your pepperoni delight arrives while the cheese is still melted. All of this requires money. The difference between a hobbyist and a professional is that the professional lives off of that money. My best friend taught me that when I was but a fledgling, wannabe garage band musician, working for free. Believe me, getting paid is better.
You mean income is a bad thing? Now here's where the tale starts to get a little strange. In almost any other line of work, people who pay close attention to the financial aspects of their career are simply considered ambitious and motivated, attributes that actually garner respect in many circles. Indeed, in most industries, making money is a sign of success. However, when you hang out at the local coffee shop and listen to the musings of programmers and musicians (who for some reason tend to end up at the same espresso bars), you'll find that money is not only a secondary consideration, but that those who pursue it are frequently scorned by their fellow artists as being somehow less pure in their craft. Among musicians, referring to a song or style of music as "commercial" is intended as an insult, one that implies that the songwriter sold their artistic soul for a few bucks and is therefore beneath creative contempt. You'll find a similar attitude among programmers. Those who have financial and career goals as a priority are often held in disdain by the true software artists. In both cases, there is nothing wrong with being zealous about your craft. Indeed, show me someone who has no passion when it comes to their vocation, and I'll show you a very mediocre craftsman. However, if you're going to be a professional in an artistic field, you have to master the business aspects just as completely as you've mastered the creative ones. Failure to do so will bring dire consequences, not all of them immediately obvious.
Why do you go to work? First, let's take a look at why you became a professional programmer to begin with. Sure, coding is more fun than just about anything else that comes to mind, but you could code in your spare time for free. In fact, the programming you do in your spare time is often much more rewarding from a creative point of view because you're not tied to the constraints of business apps. You can write the programs and use the technologies that really excite you. So, come to think of it, why the heck would you want to spend all day writing Corporate Software that's not nearly as cool as you'd like to make it, when you could instead spend your time kicking out the really great, bleeding edge stuff that gets your motor running? Easy. Your company pays you money to write software, and even if it's not as sexy as what you do in your spare time, you need that money. Pizza ain't free.
And when you get right down to it, this really speaks to the heart of the matter. You get up each day, you shower (or so your co-workers hope, anyway), you jump into the transit vehicle of your choice, and you fight the masses to get to the office so that you can pursue your day as a professional software developer. Of course, once you get there, instead of coding, you spend a large portion of each day dealing with the fallout from unrealistic marketing schemes and ill informed decisions from clueless managers who think that semicolons are merely punctuation marks for sentences. You cope with an endless stream of pointless meetings, interminable bureaucracy, insipid mission statements, unrealistic deadline pressures and a general environment that seems to care about almost everything except the cool software you're trying, against all odds, to deliver. You don't have to cope with any of this nonsense when you're sitting at home on the weekend, coding away on your favorite pet project in your robe and bunny slippers. So, tell me again why you spend a significant portion of your waking hours fighting traffic and wearing uncomfortable clothes to spend time in an office environment that seems dead set on working against the very things in life that you hold dear? Oh, yeah, that's right. They pay you money to do so. Sorry. I forgot. Really I did.
We're in this for the money Now let's clear one thing up right off the bat. I'm not some starry eyed, naïve musician who would classify your art as "commercial" just because your primary purpose is making money. Oh, wait, what's that you say? That's not your primary purpose? Yeah, right. The word I would normally bark out in response to that relates to the end result of the digestive process of bulls, but I'm going to try my best to be a bit more eloquent here. So, let me try to put this another way. Rubbish! Every single hour of every single day that you spend in the corporate world as a professional software developer is driven by one, and only one thing. Money. Get warm and fuzzy with that, or find another career. Regardless of how passionate you may be about the art and science of software development, at the end of the day, it's highly unlikely that you'd spend five seconds of your time at the office if they weren't paying you to do so. You're there for the money. I don't make the rules. It's just the way it is. So, no matter how passionate you may be about your craft, at the end of the day, you're a hired gun. Maybe you're a full time employee. Or maybe, like me, you're a professional mercenary. It doesn't matter. Either way, it all boils down to the same thing. You show up to code only when people offer to pay you money to do so. Personally, I find no dishonor in this lifestyle. I deliver the very best I have to offer to my clients. They offer the very greenest American dollars they possess in return. From my point of view, everybody wins in this scenario. And so, I'm constantly baffled by programmers I encounter in everyday life who speak from the perspective that only the software is important, and nothing else. Really? Is that true? Then can I have your paycheck? I mean, only if you don't care about it, that is. Personally, I could find a lot of uses for it. But if the software is all that's important to you then shucks, let me give you my bank account number. I'd be happy to assist you in dealing with those pesky details that arise from the business end of the programming vocation. It's no trouble. Really. I'm happy to help.
Perspective is everything Of course, anyone who has by now labeled me an insufferable wise guy is completely unfamiliar with my work, be it coding, writing, speaking or training. Yes, this is an intentionally confrontational posture towards all who bury their heads in the sand and think of software and nothing but software. In fact, you happen to be my primary target for this particular conversation. But that doesn't mean that I don't like you. In fact, it's your very posterior that I'm trying to protect. Week after week, I either personally encounter or hear tales of you, or someone like you, being trashed in the workplace because you have no grip on the realities of the business world. You're taken advantage of and work ridiculous hours to no good end. Your software requirements change more often than your manager changes his socks. You suffer the consequences of releases that are absolute disasters because your company refuses to give you the time you need in order to do things the right way. You are completely unarmed in this melée if your only response speaks to the needs of the software. To your complete surprise and dismay, you'll find that nobody cares. Consequently, you're ignored, your project suffers an ill fate, and the skies just aren't as blue as they could be for one simple reason. You're trying to solve the right problems, but you're speaking the wrong language. And so, you lose. Over and over again.
A simple strategy for winning So do I have all the answers? Yeah, probably, but that's another conversation entirely (and should you doubt it, you can always take the matter up with our local attack Chihuahua – he has very strong feelings about such things). However, in this particular case, what you should really be questioning is whether or not I have a perspective on the software business that will help improve the things that you truly care about in our industry. And by the strangest of coincidences, I just happen to have some of those as well. But then, I guess you saw that coming, didn't you? I've been known to talk for hours on end about the specific tactics that we, as professional software developers, can employ to ensure the delivery of a Really Cool Software. In fact, you could say that it's my stock in trade. Today, however, my message is much, much simpler. I'm not talking about bits and bytes here. Okay, in fairness, I never spend much time at all talking about bits and bytes. You guys already know about that stuff, and you don't need me to teach you how to code. What I am talking about, in particular, is perspective, and I deem it a critical issue. In fact, I'd go so far as to say that if you don't have the proper perspective, you're screwed, and so is your project. So what's the perspective that I'm promoting here, and how will it help you? Just like the title says. This is business! Forget your technical religions. No one cares! Never mind how cool the app you just coded is. Nobody wants to know! Really! The people who are in a position of power and have the authority to influence the quality of software you deliver live in a completely different world than you do. Until you come to terms with this one simple fact of life, you're going to bang your head against the Corporate Wall for the rest of your career. And worst of all, the software you deliver will suck! Okay, maybe not suck in the eyes of Mere Mortals, but you and I both know that it could be way cooler than your management will let you make it.
Changing your approach And this is where the rubber meets the road. Are you tired of the stupid decisions that limit the quality of the software you deliver? Are you tired of the ridiculous and arbitrary deadlines you have to deal with that ultimately result in software going out the door, with your name on it, that you consider to be, to put it politely, sub-standard? And are you tired of losing argument after argument over this in countless meetings? Then it's time you pulled your head out of your, er, compiler! Companies who pay you to develop software are businesses, and they will only respond to arguments that have their basis in business! Learn a new perspective, and prevail! I never use one word where thirty will do. It's a personal shortcoming. Particularly because in this case, what I've taken many words to relate can be summarized quite succinctly. Your job is not about software. It's about business. Grasp this one simple concept, and apply it in all of your interactions. Every time you attempt to promote your agenda to those who have the power to do something about it, stop and ask yourself these questions. Does what you're proposing make sense from a monetary and business perspective? Will the person you're speaking with see value in it from their point of view? Or are you speaking only in terms of software? I realize that it seems a bit strange to de-emphasize technical issues when what you're trying to do is improve a technical product, but at the end of the day, everyone else shows up at the office for the same reason that you do. They're in it for the money, and business is the path to obtaining it. Speak from this perspective, and you'll be amazed at how much it improves your ability to deliver the next Killer App. Compared to dealing with people, debugging is the easy stuff. Copyright (c) 2003, Christopher Duncan. All rights reserved.
Two Best Practices! Darwin Hatheway 12/17/2003
As a DBA, one of the things that happens to me several times a day is finding a chunk of SQL in my inbox or, worse still, on a piece of paper dropped on my desk. Yes, it's SQL that performs poorly or doesn't do what the programmer expected, and now I'm asked to look at it. And it's often the case that this chunk of SQL is just plain ugly; hard to read and understand. There are two Best Practices that frequently get applied to such messes before I really start analyzing the problem…
BEST PRACTICE 1 – Use Mnemonic Table Aliases. I found this chunk of SQL in a Sybase group today:

select distinct a.clone_id,b.collection_name,a.source_clone_id,a.image_clone_id,c.library_name,c.vector_name,
c.host_name,d.plate,d.plate_row,d.plate_column,a.catalog_number,a.acclist,a.vendor_id,b.value,c.species,e.cluster
from clone a,collection b,library c,location d, sequence e
where a.collection_id = b.collection_id and a.library_id = c.source_lib_id and a.clone_id = d.clone_id
and a.clone_id = e.clone_id and b.short_collection_type='cDNA' and b.is_public = 1 and a.active = 1 and a.no_sale = 0
and e.cluster in (select cluster from master_xref_new where type='CLONE' and id='LD10094')

I'm sure the news client has damaged the formatting of this a little bit, but it's still obvious that the programmer didn't put any effort into making this SQL readable and easy to understand. And there it was in the newsgroups, and he wanted us to read and understand it. Wonderful. For me, the worst part of this query is the table aliases: A, B, C, D, E. I find that I must continually refer back to the "from" clause to try and remember what the heck A or E or whatever represents. Figuring out whether or not the programmer has gotten the relationships right is a real pain in the neck with this query. He's saved typing, sure, but at a tremendous cost in clarity. And I've had much worse end up on my desk: tables from A to P on at least one occasion and about three pages long, with some columns in the SELECT list that weren't qualified by table aliases at all. Let's rewrite this guy's query for him using this first Best Practice (I'm not going to do anything about his spacing):

select distinct clo.clone_id,clc.collection_name,clo.source_clone_id,clo.image_clone_id,lib.library_name,lib.vector_name,
lib.host_name,loc.plate,loc.plate_row,loc.plate_column,clo.catalog_number,clo.acclist,clo.vendor_id,clc.value,lib.species,seq.cluster
from clone clo,collection clc,library lib,location loc, sequence seq
where clo.collection_id = clc.collection_id and clo.library_id = lib.source_lib_id and clo.clone_id = loc.clone_id
and clo.clone_id = seq.clone_id and clc.short_collection_type='cDNA' and clc.is_public = 1 and clo.active = 1 and clo.no_sale = 0
and seq.cluster in (select cluster from master_xref_new where type='CLONE' and id='LD10094')

Without bothering to fix the spacing, isn't this already easier to understand? Which query lends itself to easier maintenance? Trust me, it's the latter, every time. In some situations, being able to easily identify the source table for a column in the select list can be a big help, too. You may have two different tables which have fields with identical names but which mean different things. Catching those will be easier with mnemonics. We can make another big improvement in this query with another best practice...
BEST PRACTICE 2 – Use ANSI JOIN Syntax. Do this to clearly demonstrate the separation between "How do we relate these tables to each other?" and "What rows do we care about in this particular query?" In this case, I can only guess what the programmer is up to but, if I were a DBA at his site and knew the relationships between the tables, I could use this "relating" vs. "qualifying" dichotomy to help troubleshoot his queries. Let's rewrite this query again (but I'm still not going to do much about his spacing):

select distinct clo.clone_id,clc.collection_name,clo.source_clone_id,clo.image_clone_id,lib.library_name,lib.vector_name,
lib.host_name,loc.plate,loc.plate_row,loc.plate_column,clo.catalog_number,clo.acclist,clo.vendor_id,clc.value,lib.species,seq.cluster
from clone clo
inner join collection clc on clo.collection_id = clc.collection_id
inner join library lib on clo.library_id = lib.source_lib_id
inner join location loc on clo.clone_id = loc.clone_id
inner join sequence seq on clo.clone_id = seq.clone_id
where clc.short_collection_type='cDNA' and clc.is_public = 1 and clo.active = 1 and clo.no_sale = 0
and seq.cluster in (select cluster from master_xref_new where type='CLONE' and id='LD10094')

I still can't say for sure that this query is right. However, the DBA that does know this database is going to find it much easier to spot a missing element of the relationship between, say, collection and clone. It's certainly much easier to spot a situation where the programmer failed to include any relationship to one of the tables (it would be obvious to us at this point), so you get fewer accidental Cartesian Products. In my experience, simply rewriting ugly queries according to these best practices has often pointed up the nature of the problem and made the solution a snap. This certainly happens often enough that taking the time to do the rewrite is worth the trouble. Another advantage of following this rule is that it allows you to readily steal an important chunk of your SQL statements from any nearby statement that already relates these tables. Just grab the FROM clause out of another statement, put in the WHERE that's customized for this situation, and you're ready, with some confidence, to run the query. Being a lazy sort, I find this feature a real plus. So, encourage mnemonic table aliases and use of ANSI JOIN syntax. As Red Green says: "I'm pullin' for ya. We're all in this together." He's right; your programmers might end up at my site or vice-versa someday.
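To make the accidental Cartesian Product point concrete, here is a small illustration using made-up tables (orders, customers and order_status are hypothetical and not part of the schema above). With old-style joins, a forgotten relationship slips past the parser silently; with ANSI syntax, every table arrives with its own ON clause, so the omission stands out.

-- Old-style joins: the missing relationship to order_status is easy to overlook,
-- and the query still runs, returning an accidental Cartesian Product.
select o.order_id, c.customer_name, s.status_desc
from orders o, customers c, order_status s
where o.customer_id = c.customer_id
  and o.order_date >= '20031201'

-- ANSI JOIN syntax: each table is introduced with its own ON clause,
-- so a table with no relationship to the rest sticks out immediately.
select o.order_id, c.customer_name, s.status_desc
from orders o
inner join customers c on o.customer_id = c.customer_id
inner join order_status s on o.status_id = s.status_id
where o.order_date >= '20031201'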
VBScript Class to Return Backup Information Bruce Szabo 8/28/2003
Introduction In the first part of this series, a script was used to query a SQL Server for databases that were being backed up as part of the maintenance plans. This allows one to determine whether a database is part of a maintenance plan. It would, in most cases, be nice to have the pertinent backup information on hand. The following class returns the relevant backup information from the maintenance plan so it can be viewed in a more user-friendly manner. If the script presented in the first part is combined with this script, one would be able to loop through all the databases in the maintenance plans and return their individual backup information to a user interface. By taking these classes and converting them to ASP scripts, a web page can be created to display the current backup situation on a given SQL Server.
Some of these techniques will be covered in upcoming articles. This article, however, presents the script that returns the backup information.
An Example The code for this article can be found at SQLServerCentral.com. The following is an example of the code needed to return the backup information for a given database. By entering the server and database name, one can query to find the last backup for a given database. There are two message boxes here that return the backup information; they demonstrate two ways information can be returned from the class. The first method is to use GetBackupHist. This method of the class returns a text string with all the backup information put together. The second method takes each individual element and builds the text string. This is useful for adding formatting, or for writing the information to a file if this class were used as part of an inventory-type script.

set objDBInfo = new clsDBBackupInfo
objDBInfo.SQLServer = "MYSERVER"
objDBInfo.UserID = "MYUSERID"
objDBInfo.Password = "MYPASSWORD"
objDBInfo.Database = "MYDATABASE"

msgbox objDBInfo.GetBackupHist

strDBMsg = ""
strDBMsg = strDBMsg & "Database " & objDBInfo.Database & vbCRLF
strDBMsg = strDBMsg & "Start Time " & objDBInfo.StartTime & vbCRLF
strDBMsg = strDBMsg & "EndTime " & objDBInfo.EndTime & vbCRLF
strDBMsg = strDBMsg & "Duration " & objDBInfo.Duration & vbCRLF
strDBMsg = strDBMsg & "Plan " & objDBInfo.Plan & vbCRLF
strDBMsg = strDBMsg & "Success " & objDBInfo.Success & vbCRLF
strDBMsg = strDBMsg & "Message " & objDBInfo.Message & vbCRLF
msgbox strDBMsg

set objDBInfo = nothing

The UserID and Password properties are optional. If the SQL Server is running with integrated security and the logged-in user is an administrator on the SQL Server, the information will be returned without the UserID and Password properties.
The Class The beginning of the class has an explanation of the properties and methods of the class; that comment section is not enumerated. The enumerated section of the code starts by declaring the needed variables (lines 1-18). The only code needed in the initialize routine sets the security variable to integrated security by default. The terminate routine closes the connection to the server. Lines 28-116 are where the Let properties are defined. These are the five settings the user has the ability to control: the SQLServer, the Database, the UserID, the Password, and the Security. When the SQLServer and Database properties are set, a check is made to see whether both properties have been set (lines 30 and 68). If both have been set, the rest of the Let property routine behaves the same for these two properties: a SQL statement is constructed, a connection is opened, and a recordset is returned. The recordset is checked to make sure it is not empty, and the values are read into the private variables, which are then available to the user via the Get properties discussed below. The UserID and Password properties need to be set, as mentioned above, if the server will not be accessible via integrated security. The Security setting does not need to be set, as it defaults to integrated security; it might be used if one wanted to change servers and databases, since one server may be able to use integrated security while another needs a SQL login. The class has eight Get properties, which are the properties the user can read once the object has been instantiated. The SQLServer and Database properties should already be known, so they may not need to be returned. The other six properties (lines 118 - 148) can be used to format the database backup information. StartTime, EndTime and Duration give the user an idea of how long a backup takes. The Success property lets the user know if the backup was successful, the Plan property lets the user know which database maintenance plan the backup is a member of, and the Message property lists where the backup was physically written. Lines 151 - 168 are a private routine that opens a connection to the database. Lines 170 - 172 are a private routine that closes the connection; it is called by the terminate routine. The final method is GetBackupHist. This method returns a string with the same information returned by the individual properties, and is used mostly for troubleshooting or in cases where a script needs to return information without regard to format.
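For reference, the T-SQL statement the class builds (lines 36-38, 74-76 and 180-182 of the listing below) can be tested on its own in Query Analyzer before wrapping it in script. A sketch of that query follows; the database name here is only a placeholder.

-- The class connects directly to msdb, so the same query can be run there.
-- 'Northwind' is a placeholder; substitute the database you are checking.
SELECT TOP 1 *
FROM msdb.dbo.sysdbmaintplan_history
WHERE activity LIKE 'backup database'
  AND database_name = 'Northwind'
ORDER BY end_time DESC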
'****************************************************
'*
'* CLASS clsDBBackupInfo
'*
'****************************************************
'* The purpose of this class is to list the backups for a given database.
'* The information can be retrieved via a text message using the GetBackupHist()
'* method or using the individual elements using the gets.
'*
'* LETS
'* SQLServer - Server whose maintenance plans you want to query
'* Database - Database we want to look up the last backup for
'*
'* GETS
'* SQLServer - Server Name
'* Database - Database Name
'* Plan - Plan name containing the backup
'* Success - was the last backup a success
'* EndTime - when the last backup ended
'* StartTime - when the last backup started
'* Duration - the length of time the last backup took
'* Message - message for the last backup, usually the location of the backup file
'*
'* Public Functions
'* GetBackupHist() Returns a string containing the backup information and populates the GETS.
1 class clsDBBackupInfo
2 private strSQLServer
3 private strDataBase
4 private objCon
5 private SQL2
6 private RS1
7 private str
8 private fd
9 private ConnectionString
10 private strPlan
11 private boolSuccess
12 private dtEndTime
13 private dtStartTime
14 private dtDuration
15 private strMessage
16 private boolSecurity
17 private strUserID
18 private strPassword
19
20 Private Sub Class_Initialize()
21 boolSecurity = TRUE
22 End Sub
23
24 Private Sub Class_Terminate()
25 Call closeConnection
26 End Sub
27
28 Public Property Let SQLServer ( byVal tmpSQLServer )
29 strSQLServer = tmpSQLServer
30 if len(strSQLServer) > 0 and len(strDatabase) > 0 then
31 Dim SQL2
32 Dim RS1
33 Dim str
34 Dim fd
35
36 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
37 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
38 SQL2 = SQL2 & "ORDER BY end_time Desc"
39
40 Call openConnection()
41
42 Set RS1 = objCon.Execute(SQL2)
43
44 if not RS1.eof then
45 for each fd in RS1.Fields
46 str = str & fd.name & " " & fd.value & vbCRLF
47 next
48 strPlan = RS1("Plan_name")
49 boolSuccess = RS1("Succeeded")
50 dtStartTime = RS1("Start_Time")
51 dtEndTime = RS1("End_time")
52 dtDuration = RS1("Duration")
53 strMessage = RS1("Message")
54 else
55 strPlan = ""
56 boolSuccess = ""
57 dtStartTime = ""
58 dtEndTime = ""
59 dtDuration = ""
60 strMessage = ""
61 end if
62 Set RS1 = Nothing
63 end if
64 End Property
65
66 Public Property Let Database ( byVal tmpDatabase )
67 strDatabase = tmpDatabase
68 if len(strSQLServer) > 0 and len(strDatabase) > 0 then
69 Dim SQL2
70 Dim RS1
71 Dim str
72 Dim fd
73
74 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
75 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
76 SQL2 = SQL2 & "ORDER BY end_time Desc"
77
78 Call openConnection()
79
80 Set RS1 = objCon.Execute(SQL2)
81
82 if not RS1.eof then
83 for each fd in RS1.Fields
84 str = str & fd.name & " " & fd.value & vbCRLF
85 next
86 strPlan = RS1("Plan_name")
87 boolSuccess = RS1("Succeeded")
88 dtStartTime = RS1("Start_Time")
89 dtEndTime = RS1("End_time")
90 dtDuration = RS1("Duration")
91 strMessage = RS1("Message")
92 else
93 strPlan = ""
94 boolSuccess = ""
95 dtStartTime = ""
96 dtEndTime = ""
97 dtDuration = ""
98 strMessage = ""
99 end if
100 Set RS1 = Nothing
101 end if
102 End Property
103
104 Public Property Let Security ( byVal tmpSecurity )
105 boolSecurity = tmpSecurity
106 End Property
107
108 Public Property Let UserID ( byVal tmpUserID )
109 strUserID = tmpUserID
110 boolSecurity = FALSE
111 End Property
112
113 Public Property Let Password ( byVal tmpPassword )
114 strPassword = tmpPassword
115 boolSecurity = FALSE
116 End Property
117
118 Public Property Get SQLServer
119 SQLServer = strSQLServer
120 End Property
121
122 Public Property Get Database
123 Database = strDatabase
124 End Property
125
126 Public Property Get Plan
127 Plan = strPlan
128 End Property
129
130 Public Property Get Success
131 Success = boolSuccess
132 End Property
133
134 Public Property Get EndTime
135 EndTime = dtEndTime
136 End Property
137
138 Public Property Get StartTime
139 StartTime = dtStartTime
140 End Property
141
142 Public Property Get Duration
143 Duration = dtDuration
144 End Property
145
146 Public Property Get Message
147 Message = strMessage
148 End Property
149
150
151 Private Sub openConnection()
152 Set objCon = WScript.CreateObject("ADODB.Connection")
153
154
155 ConnectionString = "Provider=sqloledb;"
156 ConnectionString = ConnectionString & "Data Source=" & strSQLServer & ";"
157 ConnectionString = ConnectionString & "Initial Catalog=MSDB;"
158 if boolSecurity = TRUE then
159 ConnectionString = ConnectionString & "Integrated Security=SSPI;"
160 else
161 ConnectionString = ConnectionString & "User Id=" & strUserID & ";"
162 ConnectionString = ConnectionString & "Password=" & strPassword & ";"
163 end if
164
165
166 objCon.Open ConnectionString
167
168 End Sub
169
170 Private Sub closeConnection()
171 objCon.Close
172 End Sub
173
174 Public Function GetBackupHist()
175 Dim SQL2
176 Dim RS1
177 Dim str
178 Dim fd
179
180 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
181 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
182 SQL2 = SQL2 & "ORDER BY end_time Desc"
183 Call openConnection()
184
185 Set RS1 = objCon.Execute(SQL2)
186
187 if not RS1.eof then
188 for each fd in RS1.Fields
189 str = str & fd.name & " " & fd.value & vbCRLF
190 next
191 strPlan = RS1("Plan_name")
192 boolSuccess = RS1("Succeeded")
193 dtStartTime = RS1("Start_Time")
194 dtEndTime = RS1("End_time")
195 dtDuration = RS1("Duration")
196 strMessage = RS1("Message")
197 else
198 str = "No Backups for " & strDatabase & " on " & strSQLServer
199 strPlan = ""
200 boolSuccess = ""
201 dtStartTime = ""
202 dtEndTime = ""
203 dtDuration = ""
204 strMessage = ""
205 end if
206
207 GetBackupHist = str
208 Set RS1 = Nothing
209
210 End Function
211
212 End Class
'****************************************************
'*
'* END CLASS clsDBBackupInfo
'*
'****************************************************
Conclusions This routine is used to query maintenance plans for information regarding backups. The routine allows one to draft formatted messages using the properties of the class. The class can be used in conjunction with other routines to create a reporting mechanism for SQL backup procedures. In the next article, both this script and the previous script will be used in conjunction with SQL-DMO to find servers and query the maintenance plans on those servers.
White Board, Flip Chart, or Notepad? Andy Warren 10/2/2003
Occasionally we stray just a little bit from pure SQL articles and delve into related areas. If you haven't guessed yet, this is one of those occasions. On a daily basis I meet with members of our development team to discuss problems they are working on, problems I need them to work on, or sometimes problems that I'm working on. It's an informal environment where we go to whichever office is convenient and talk things through until we get to where we need to be. Out of each of these discussions we often wind up with a list of todo items and/or a diagram showing some proposed changes, or maybe the flow of how a process will work. Not exactly a new process, I'm sure most of you do something similar. Where it gets interesting (to me anyway) is how to have that conversation effectively. It seems that we almost always look around for something to draw on so that we can present ideas visually – and then modify those ideas visually as well. When it is time to draw, we typically have a few choices:
• Dry erase board/white board/chalk board
• Flip chart/easel pad
• 8-1/2x11 pad
• PC
Leon has a dry erase board in his office. Not that he consciously decided that it was better than the other two options, that's just how it wound up. Dry erase is nice because you can change your drawing quickly and still keep it legible, but the downside is that once you have something complete there is no easy way to move that to a transportable medium. (Note: I've heard of people taking digital photos of the board and printing them, not a bad idea I guess, or maybe you're one of the lucky few who have a board with the built-in printing functionality). A lot of the time the problem is bigger than we can describe on a single board, so we have to start writing on something else, or start writing a lot smaller in the leftover space. Nine times out of ten when I'm in Leon's office I end up using an 8-1/2x11 pad because I can't erase what's on the board. Legal or letter notepads are about as low tech as you can get I guess, but they do work. If you have just two people working the problem it works pretty well, but with three or more the size limits its usability/viewability. Not as easy to change as dry erase of course, but paper is cheap so you can always redo it, plus the completed notes are easy to photocopy. Maybe it's just me but I don't think it works as effectively as either dry erase or a flip chart – I think because it is helpful to literally "take a step back" and have something you can look at from a few feet away. We almost never use a PC to convey ideas. At most we'll grab a chunk of source code, or look at a table design. Maybe we just haven't found the right tool?
That leaves the flip chart. It overcomes most of what I consider negatives of the notepad. Pretty hard to copy of course, and the paper is a lot more expensive. Not as easy to modify as the dry erase board. For discussions with developers, at the end of the session they tear the sheets off and tape them to the walls in their office while they work. Over the past year it has become my tool of choice for outlining problems/solutions, even for things I'm working on solo. I'll get up, add some stuff, sit back down, look at the chart and think on it. The interesting part about both dry erase and flip chart is they encourage discussion. When someone walks by or comes in about something else and sees a new drawing, they often ask questions or have comments that are useful. No one is going to walk in and see what I have written on my notepad without being asked. These sessions are really meetings, and well-run meetings always have minutes. For us, it's what winds up on the board/paper that we consider the minutes, no point in investing more time in it. This is a lot more effective than everyone taking notes while they try to think through the problem at the same time. A common scenario is for us to revisit the drawing to rethink a problem or reconsider why we went in a specific direction (a month or more later). Getting everyone looking at the original drawings seems to get us back into that mental position quickly – or at least more quickly than just talking about it with no visual reference. I'm not here to say that one method is better than the other, just that one works better for me. What I'm hoping you'll think about is how you convey ideas and information during these types of brainstorming/problem-solving sessions. A lot of what we (developers and DBAs) do is complex stuff. Getting everyone "on to the same page" isn't easy, but it is a useful metaphor.