Discussing Foreign Keys and their usage in the Database

Marcel Kratochvil – August 2009 (Version 1.1): http://foreignkeys.blogspot.com/

You know you are treading on hallowed turf when you dare to raise the specter that foreign keys aren't that great. Someone in the room turns around, points a pretend hand gun at you and pulls the trigger. So entrenched is the thinking about foreign keys that you wonder if there is any hope in this industry of changing how we tune and manage the database. The perception is that foreign keys equate to data integrity, and this is now so ingrained in thinking patterns that merely questioning the concept raises defensiveness.

My goal is to show that focusing on the misuse of foreign keys, and on what they actually mean, exposes fundamental flaws in the methodologies IT uses today for tuning, data management and application development. I will also show that just by looking at foreign keys our whole world view can be challenged and changed, revealing a new one containing concepts that will challenge your thinking. I will be demonizing foreign keys, but merely to show that if a core, unquestioned tenet of the database world can be exposed, then everything we cherish and take for granted needs to be reviewed.

The paper is broken down into numerous discussion points, covering the issues from different angles. Some of the points raised are addressed from different viewpoints in each discussion. The aim is to show why the world view covering foreign keys and the relational model needs to change. I will discuss why foreign keys represent a thinking strategy that typifies what is holding back the IT industry from establishing efficient practices, and I will aim to show how this relates to core issues found in database tuning. In the appendix I cover two scenarios showing why we need to challenge existing world views.

Discussion 1: The definition

When I raised the notion at a recent conference that foreign keys might not be needed, the response was "so you don't want data integrity then?". So ingrained is the notion that having foreign keys equates to having data integrity that it forms the core of a lot of the discussion points raised. It's therefore worthwhile looking at some of the definitions:

i. The relational model is based on first-order predicate calculus. "A major aim of relational database design is to group attributes into relations to minimize data redundancy and thereby reduce the file storage space required by the implemented base relations"1

The relational model stated as such does not include data semantics, or whether the meaning behind the data is correct. It's also important to highlight at this point "reduce the file storage space"; this will become important later.

1 Database Systems. Thomas Connolly. 3rd Edition.


ii. "The conceptual view is a representation of the entire information content of the database, again in a form that is somewhat abstract in comparison with the way in which the data is physically stored"2

The logical or conceptual model is separate from the physical model. Now let's cover some of the varying definitions of what data integrity or database integrity is:

iii. Database Integrity "refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints"3

iv. "Integrity refers to the accuracy and validity of the data ... Integrity involves ensuring that the things they are trying to do are correct"4

vi. So what then is data integrity? "The standard features to ensure data integrity are (1) data validation (2) automatic computations (3) verification of totals (4) control of user access (5) transaction integrity and (6) backup and recovery"5

2 Database Systems. C.J Date. 4th Edition. Chapter 2.
3 Database Systems. Thomas Connolly. 3rd Edition. Chapter 1.
4 Database Systems. C.J Date. 4th Edition. Chapter 19.
5 Database Management Systems. Gerald V. Post.

One could deduce that foreign keys are covered in part under point (1), data validation. I would counter that data validation has more to do with data semantics than with relationship rules. Integrity does try to ensure that queries, when performed, are correct. When you look more at the relational model and review the use of normalization, you begin to see that a key function is to provide protection from anomalies:

vii. "The process of normalization is a formal method … so that a relational schema can be normalized to a specific form to prevent the possible occurrence of update anomalies"6

So when we configure our schema to conform to the relational model we are in fact establishing an environment such that when we query the data we can be sure there are no anomalies arising. We are assured that the response from the query is mathematically exact. This is important because the relational model is based on predicate calculus. We can use this to mathematically prove a SQL statement (which I recall painfully doing in my final year of university).

Let's now look at constraints:

viii. "Why are primary keys important? The answer is that they provide the sole tuple-level addressing mechanism within the relational model... it follows, therefore, that primary keys are absolutely fundamental to the operation of the overall relational model"7

Date then importantly states:

ix. "Foreign-to-primary-key matches represent references from one relation to another; they are the glue that holds the database together … Note carefully, however, that not all such relationships are represented by foreign-to-primary-key matches."8

Date doesn't use the same strong words about foreign keys as he does about the primary key. He clearly points out that primary keys are fundamental to the relational model. He then implies that foreign keys are useful but can be represented using other methods. Date refers to the two core integrity rules: Entity Integrity (which is the primary key definition) and Referential Integrity (which is the foreign key definition). He refers to the rules as states. So having a foreign key does not mean you have data integrity. All you are doing is describing a mathematical relation between two entities. The mathematical relationship thus ensures that when queries are performed there are no anomalies.

The first key point I am raising is that what a foreign key actually is and does is being misrepresented. The statement about foreign keys implies that by using them we are ensuring we have data integrity. Examples abound in the English language where the meanings behind words change over time, especially when they meet political or marketing aims. Examples include free trade, globalization, proactive, empowerment and, more recently, detox. In the case of foreign keys I am suggesting that the meaning of the term has changed over time due to misuse and overuse, and, as I have seen, by uttering this term you command unquestioning attention. No one will dare question the authority of what the foreign key is.

So the crux is that a foreign key, when used, ensures that when a query is performed the results can be mathematically shown to be correct. Nowhere does it claim that the actual data that comes back is accurate. Nowhere does it claim that the data entered is correct. Correctness is covered by data semantics.

This doesn't dispel the notion that we have no need for foreign keys; all I have shown is that a foreign key and real data integrity are two independent concepts. Can I have data integrity without foreign keys? The answer is yes. Do I have data integrity with foreign keys? The answer is no - remember that foreign keys do not give you backup, security or data validation. Also, as I will show later, the whole notion of referential integrity is an outdated concept.

6 Database Systems. Thomas Connolly. 3rd Edition. Chapter 13.
7 Database Systems. C.J Date. 4th Edition. Chapter 12.
8 Database Systems. C.J Date. 4th Edition. Chapter 12.


Are there situations where the relational logical model says we must have a foreign key relationship and it must be managed, but the real life situation says it's fine to break it? The following example highlights one such case, and as you will read later on, the use of applications on the Internet changes the rules.

In this case we have an e-commerce application. We have an account which has many purchases. The two tables are linked together by a primary/foreign key relationship. An account can make many purchases and it does not make sense to have purchases without an account.

But let's now throw a spanner in the works. The accounts are created by the general public, and as per a Government privacy ruling, the owner of the account has the right to insist on their account being deleted. It is thus a legal requirement that on request the account is deleted. In addition, because of auditing requirements, purchase order information cannot be deleted. This is also a legal requirement, as introduced by the tax department: deleting purchase records can be seen as a fraudulent activity, because the government then cannot get an accurate figure for tax purposes.

So now we have an interesting situation. If a customer requests their account to be deleted, the actual record must be deleted. The parent record must be destroyed. But we also cannot delete the child records, for legal reasons. The solution requires either warping our definition of a foreign key (possibly by nullifying the parent record data but keeping it), re-architecting the database to somehow get around this conflict, or just not getting hung up on the notion that foreign key rules need to be rigidly maintained and living with the orphaned records. We then factor into reports that the parent might not exist, use different join conditions and generally work around it. The physical model does not have to match the logical model.
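A minimal sketch of the conflict and the nullify-the-parent workaround described above (table and column names invented for illustration):

    -- The textbook design: purchases must belong to an account.
    CREATE TABLE accounts (
        account_id NUMBER PRIMARY KEY,
        name       VARCHAR2(100),
        email      VARCHAR2(100)
    );

    CREATE TABLE purchases (
        purchase_id NUMBER PRIMARY KEY,
        account_id  NUMBER NOT NULL REFERENCES accounts (account_id),
        item        VARCHAR2(100),
        amount      NUMBER(10,2)
    );

    -- Privacy law: the account must go. Tax law: the purchases must stay.
    -- A straight DELETE FROM accounts fails on the constraint (and with
    -- ON DELETE CASCADE it would illegally destroy the purchases). One
    -- workaround keeps the parent row but strips it of personal data:
    UPDATE accounts
       SET name = NULL, email = NULL
     WHERE account_id = :doomed_account;

The alternative is to drop the constraint, delete the parent and let the reports cope with orphaned purchase rows - which is exactly the "work around it" option above.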

And now the big question: is my data accurate if I follow the relational model to the extreme? The answer is a firm no. Is it the end of the world if I don't? No. We break the rules all the time, yet we have this perception that relational is the way to go. And yet by daring to attack a core relational concept, the foreign key, I dare to attack the core belief system.

Why have the relational rules, and what do these rules actually achieve? This is going to be a crucial question to answer.

By this time in the discussion I hope you have been furiously scribbling notes in the margins highlighting some of the possible logical inconsistencies in the arguments raised above. You might even be thinking that it's going to be easy to dismiss the discussion points based on pure deductive reasoning and thus prove that foreign keys are the best thing ever.

It's best then that I highlight what might seem like a small and inconsequential point, and that is: Oracle is not a relational database. It is, and has been since Oracle 8.0, classed as an object/relational database.

When Oracle expanded the database to include objects the rules changed, and the way we work with data, tuning and our understanding of the database should have changed also. What I have observed is that the notion of being object/relational was an anathema to most, and rather than embracing and using the object features, data architects, DBAs and developers have steadfastly refused to, and have stuck to the relational model, hoping that by ignoring the object interloper or not dealing with it, it would somehow go away. What compounded this view was that most of the Oracle tools did not support objects, and the only interaction with them was by hiding them behind object views.

So if you have been thinking of having a debate based on the rules supporting the relational model, remember that the rules have changed. The arguments are irrelevant. Possibly fun to have, but in the greater picture they will not achieve anything worthwhile. Subsequent discussion points will refer to the relational model, but more to highlight how the use of the model has been misinterpreted and misused over the years, and how the core idea of separating the logical from the physical model was forgotten.

The introduction of objects breaks most of the fundamental rules of the relational model, and this is seen when it comes to normalization. 1NF: "A relation in which the intersection of each row and column contains one and only one value". A UNF table contains one or more repeating groups.9

9 Database Systems. Thomas Connolly. 3rd Edition.

So why introduce objects in the database when they break this cherished relational rule? A variety of reasons, but one of the main ones is performance. Using objects you can achieve order-of-magnitude improvements in performance over relational designs. This has been proven in the real world, where telecommunication systems use objects to achieve performance. The use of objects also enables easier translation to real world objects in the design. It's a lot easier to design a car as an object than as a relation.

So if objects are so great, why aren't we all using them? Performance and complexity. I see object usage in the database as one where, used correctly, you can achieve great performance improvements, but at a cost in other areas, in particular the ability to do cross reference queries, usability, minor updates and maintenance.


Also, it's very easy to misuse objects, and the core tenet behind objects, with its layered structure (hiding complexity), lends itself to inefficient programming practices resulting in compounding performance problems. So object usage can go from great performance to abysmal, whereas relational tends to promote more consistent performance. In relational, breaking the data apart and then reassembling it per query lends itself to consistent performance. If you think the performance of relational systems can vary, then you have never worked with objects.

The powerful capability of an object/relational database is that you can simultaneously have both models working, achieving the performance required. In this case the complexity arises in data management and replicating the data between the structures.

But back to foreign keys. In the object world, they are not required to achieve data integrity. In the object world, UNF is allowed. We can repeat fields to improve performance; we accept data redundancy in exchange for performance; we replicate data, warp it and change it to achieve better performance or to allow us to perform more complex queries against it. The same holds true when looking at XML. XML is hierarchical and foreign keys just do not fit into it. Spatial data is different again.
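For illustration, here is what a deliberate repeating group looks like in Oracle's object/relational world - a sketch using collection types, with invented names:

    -- A repeating group stored inside the row, which 1NF forbids.
    CREATE TYPE phone_list AS VARRAY(5) OF VARCHAR2(20);
    /
    CREATE TABLE customers (
        customer_id NUMBER PRIMARY KEY,
        name        VARCHAR2(100),
        phones      phone_list   -- the repeating group travels with the row
    );

    INSERT INTO customers
    VALUES (1, 'Jones', phone_list('555-0100', '555-0101'));

A single row fetch returns the customer and all their numbers: no join, no child table, no foreign key. The price is paid in query flexibility - asking "who owns 555-0101?" now requires unnesting the collection.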

When asking about foreign keys and data integrity, one comment I received was "don't you want your data secure?". As if having a foreign key is the be-all and end-all. As if I have perfect data because I have a foreign key. As the British might say: bollocks to that.

When discussing the topic of data integrity and foreign keys, the following real life example was given to me: "don't you want correct values entered for whether a credit card is Visa or MC?". This represents a typical comment that I have heard many a time, where people confuse data integrity and misrepresent what the relational model was trying to achieve. The answer to the above question has nothing to do with foreign keys, but it highlights how most people see the foreign key as the protector of the database, its guardian. There is a difference between data integrity and data semantics.


Discussion 2: The Multimedia World View

When you deal in the object/relational world and then move into the multimedia world, you find you are working in a niche. But this is a niche so new that the rules are defined retrospectively, only years after you build and design applications. In this world, it's a different game. The methodology for development, the way you tackle development, the speed and management of development show that you are in a new paradigm. Traditional logic and traditional methodologies just do not work in this world view; if you try to use them, failure is certain. This is one reason why there has been a slow adoption rate in the use of multimedia, especially multimedia in the database. When you look at images, audio and video you enter a world of fuzzy logic and thinking. You have to leave the safe world of binary logic behind.

When you deal with multimedia you enter a world where the concept of exactness changes. Ask two photographers if a photo is award winning and they will disagree. Just like judging wine and whether a Shiraz is better than a Cabernet, comparing photos, or digital images, is a fuzzy science. Multimedia data is not accurate. It can change over time. A video can be edited, its soundtracks changed, even the metadata embedded in it can change. It can be compressed, resized or adjusted. With multimedia you move into a world of uncertainties. Data accuracy isn't an achievable goal or the holy grail you aim for; it's just not in the equation. Welcome to the world of fuzzy data. In this world, foreign keys are left at the front door along with other relational concepts that do not fit.

The problem most people working with relational data have is that they perceive multimedia data as just another subtype, just an obscure or special case of relational data. It's just binary data. This is not the case. I will show in later discussions that multimedia is a superset of relational. Once you understand this, the way you treat multimedia data is different and you will start to see why relational concepts fall apart.

Once you abandon the safe, traditional world of relational and realize that attempting to reach the nirvana of a database with perfect data integrity is not possible, then you move into the world of fuzzy data management. This is the world where imperfection is not your enemy but just part of the equation, where you realize that having 100% data integrity means either having no data or no users. Putting users into the equation means you have to deal with data imperfection, or bad data on entry. The mentality programmers have today is to write programs that minimize or completely eradicate any chance of bad data.

When you move away from the relational and logical model towards the physical model, trying to match your business to the real world, you will start to move towards object relational. In doing so, some core relational concepts and beliefs get in the way of development. Add a web front end and security, and every cherished relational concept needs to be rethought.

Let's highlight this by looking at two multimedia examples. The first is the delayed delete and the second concerns Internet replication.

Let's set up a multimedia application. In it I have a collection, and a collection contains images and video. Let's say that one collection contains hundreds of thousands of images. There is a one to many relationship between the collection and image tables. A primary key in collection maps to a foreign key in the image table. Now let's say we need to delete the collection (in a real life scenario the collection could represent an area in a museum that has been made redundant). To delete the collection is simple, but to then delete all the hundreds of thousands of images isn't (one could state that with Securefiles in Oracle 11g it is faster, but let's assume it's an expensive I/O).


Should the whole delete be done in one transaction, immediately? If it was relational data it would be easy to do in one transaction, but with multimedia the attempt to delete such a large volume of data will impact disk, logs and rollback. On a single disk, or single I/O system, such a delete could result in major performance issues for other customers using the site.

A solution, and a fair solution (but not the only one), is to perform a delayed delete. In this case the collection parent record is deleted, then a batch job runs and deletes the records in the image table with a delay in between. So for the duration of the delete (which could be days, depending on performance), the foreign key integrity rule is broken. We have orphaned child records. We could adjust the logical design and include a physical change – mark the parent as deleted but only delete it when all child records are deleted. This raises the idea that our logical model needs to be changed to adapt to physical conditions (which is covered later). It also introduces the fuzzy concept of a record being marked as deleted without actually being deleted. For those who believe that constraints are crucial for maintaining consistency when multiple applications potentially access the data, it means each application has to be modified to deal with this partially deleted record, otherwise it could reference it when it shouldn't.
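A sketch of that delayed delete, assuming the collection row carries a status column so it can be marked as logically deleted first (names invented; the child rows are trickled out in throttled chunks):

    -- Mark the parent as logically deleted; applications treat
    -- status = 'DELETED' as non-existent from this moment on.
    UPDATE collections SET status = 'DELETED' WHERE collection_id = 42;
    COMMIT;

    -- A batch job then removes the children in small transactions,
    -- pausing between chunks so disk, logs and rollback are not swamped.
    DECLARE
        v_id collections.collection_id%TYPE := 42;  -- the doomed collection
    BEGIN
        LOOP
            DELETE FROM images
             WHERE collection_id = v_id
               AND ROWNUM <= 1000;
            EXIT WHEN SQL%ROWCOUNT = 0;
            COMMIT;
            DBMS_LOCK.SLEEP(60);  -- throttle; days of elapsed time is acceptable
        END LOOP;
        -- Only now does the parent row physically go.
        DELETE FROM collections WHERE collection_id = v_id;
        COMMIT;
    END;
    /

In this variant the constraint is never technically broken - the fuzziness has moved into the status column, where a record is deleted without being deleted.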

Let's extend the example and say that some of the images have been replicated to other sites on the Internet using an Internet distributed database (a concept discussed in greater detail in another discussion). In this case, the delete of the child images in the collection involves going through security and remote procedure calls. Even if the calls fail, a job could easily run and clean up orphaned records at a later time. The need to follow foreign key relationship rules just isn't required.

The delayed delete raises the issue: can we have a foreign key but have its implementation delayed? So rather than do a cascade update in the same transaction, do the update that night as part of a batch job? Or have a batch job run daily that removes orphaned records? A relational purist would say no, it has to be in the same transaction, but in reality this concept is applied all the time. It might not be with foreign keys, but it is done with other relationships or enforcement of constraints within the database. For example, if a new user is created in the database via public means (let's say a self service kiosk), are their details validated immediately, or is there a process, which could be manual, to validate them? And what happens on failure to validate? In this case we have invalid user records which are cleaned up at a later time. The developer will attach a status to the record to indicate in fuzzy terms whether the user record is valid or pending; that way, when reports are run they can ignore the pending records. That way they can achieve data integrity.

How many times have you thought of using a primary key/foreign key and didn't, for whatever reason? It's the old exception to the rule. Do you have primary keys on every table without any exceptions? You don't? Well, if you don't, then you have already started down the fuzzy path. And if you do, do you follow 3NF? 4NF? 5NF? Do you even know what those are? At what point do you stop, or is it after 1NF? There are different grades and levels of relational; are you just making arbitrary decisions as to which one is best suited for your environment?


For those who would like to dismiss fuzzy logic as just a fad (or some might even say a failed trend from AI development), keep in mind that fuzzy logic is a mathematical model within which binary logic has been proven to be just a subset (or special case). See: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/sbaa/article1.html

There are numerous books on the market covering fuzzy logic. A very good one is Fuzzy Logic by Daniel McNeill and Paul Freiberger. In addition, on the Internet you should be able to find discussions covering the notion that fuzzy is a superset of binary logic. This supports the case that one can state that multimedia is a superset of relational. From http://www.economicexpert.com/a/Fuzzy:logic.htm:

"Another common misconception is that fuzzy logic is a new way of expressing probability. However, Bart Kosko has shown that probability is a subset of fuzzy logic, as probability only handles one kind of uncertainty. He also proved a theorem demonstrating that Bayes' theorem can be derived from the concept of fuzzy subsethood. This should not by any means suggest that all those who study probability accept or even understand fuzzy logic, however: to many, fuzzy logic is still a curiosity."

Discussion 3: Reactive Thinking

Conventional thinking still focuses on reactive thinking. We are presented with a problem, then come up with the solution. In our environment we keep working until we hit a problem. Problem solving, like puzzle solving, can be fun, and as the industry supports this thinking pattern it has become ingrained in our culture. We don't anticipate. The work culture does not reward proactive thinking; solving problems before they occur is simply assumed: "We expect it to work until something happens".

The culture of an organization can determine whether abrasive or dominant personalities win arguments. Have you ever been involved in a discussion where a strong personality pushes the use of a tool because they think it's the best thing ever? You might encounter developers who swear by PHP, Perl or Ruby. In the Oracle community I have witnessed similar personalities pushing Apex, ADF or Java. Strong personalities who adopt a righteous attitude can be even more dangerous in an environment, as they will push their agenda without listening to the issues and usually end arguments using intimidation. The arguments might be along the lines of "I am the manager", "I have a degree", "We will use this tool unless you can show me your tool is better", "Your tool doesn't work/is slow/is expensive/has issues, or I just hate it".

Society has this habit of creating reactionary rules, or rules without expiry dates. This is typically seen in IT when it comes to database usage, tuning and application tuning. The rules are created as a reaction to some bad practice seen, or to technology limitations. We hear about the need to use bind variables because dynamic SQL is bad, or don't modify this database parameter because it will cause full table scans. All these rules are valid at the time of writing, but are stated without looking at a more general framework, or at the conditions under which the rules become invalid or expire. What actually happens is these rules get introduced, then written in stone: "thou shalt not do this". After some time their original meaning gets lost as different interpretations are applied to them. As technology changes, installs change, business practices change and staff change, these rules stay on and become ingrained in the business practice. And when asked why? The answer is invariably: that's how we have always done it. We now have a scenario where a rule is being blindly followed and, because of technological changes, could be redundant.


It could still be valid. It could be a major cause of tuning issues. The problem is that nobody knows, and until someone stands up and dares to ask "why are we following this rule?", we will never know. This is what I believe has happened with the relational model and in particular foreign keys.

One major benefit of the relational model was that when it was introduced it saved on data redundancy; it forced into the design that only one value needed to be stored. This was crucial as it saved on disk space, which was a precious resource in the 1970s, 80s and early 1990s. Now the rules have changed. Disk is incredibly cheap and plentiful, and DBAs are happy to spend huge amounts of disk space to achieve better performance. For data warehouses the standard rule is 8 to 1: we are now duplicating our data via indexes, materialized views, replication and through the use of data warehouses, to the point where for every byte of actual data entered we have at least 8 additional supporting bytes.

And now we see that foreign keys are being recruited as a new strategy for improving performance. This is not what foreign keys are about, and we should note that the equivalent performance gains could be achieved by implementing new constructs better suited for the purpose (my initial thought was to suggest better adherence to the SQL3 standard, but this requires further investigation). So now we have proponents of foreign keys who think they are the best thing because with them you could get better performance, but that's not why they were created. We have the potential scenario where in 10 years' time DBAs could be creating foreign key structures as readily as indexes, without understanding why they are there, except that we might get better performance if they are there.

Date was very careful to separate the physical design from the logical and to ensure the two stay separate. This was done for the key reason of keeping the logical model correct. Date states that "The DBA must also decide how the data is to be represented in the database – this process is referred to as the physical design"10. If you are trying to follow the relational model then this is an important point; it's critical to this discussion. You should do your database design, normalize it to a realistic point, then take that design and transpose it to the physical.

10 Database Systems. C.J Date. 4th Edition. Chapter 2.

Just because the logical design imposes foreign keys does not mean the physical design has to. The mentality of developers is to take the logical design straight into the database without considering the physical limitations. Date stresses time after time that there is a fundamental difference between the logical and physical design. It's a point we have forgotten in the mists of time.

This leads into another interesting issue caused by bad thinking strategies, and that is superstitious behavior. When something works and works well, the steps required to achieve it are usually noted, but rarely do we review those steps to see if there is an optimal way of achieving it. A DBA doing maintenance on a table might follow a well worn path to perform that maintenance without even looking at improving each step, or at times even asking why they are bothering with the whole maintenance routine. This can be seen with DBAs who routinely rebuild indexes, thinking this will balance them and improve performance, not realizing this is a redundant step. It can also be seen with developers who write their SQL statements a certain way or write routines in code. If it works, why change it? It's an issue we are always getting ourselves into, and it's why it's important to listen to peers, see how others program, attend conferences, or just generally review coding practices to see if they are redundant. I am constantly having to review code I wrote with every minor release to see if its mere existence is to be questioned. I have rewritten routines three or more times (tens of thousands of lines of code) because the database architecture in a new release changed the rules, meaning the code was dated. I have thrown out code when the database was enhanced and could do the feature I had once built (e.g. encryption). I have routines written that I am hoping to throw out when Oracle finally listens and introduces the features I am after. It's a struggle, it's a constant learning exercise; it's very, very hard to keep up to date with the rapidly changing database. Leave the game for six months and a lot of concepts learnt can become dated.

And yet what is seen in most organizations? Resistance to change, resistance to new ideas, resistance to reviewing old concepts. Let's keep the status quo. In the organization I am in, I try to encourage new staff to review coding practices and architectures and to challenge me to justify why they are there. Sometimes they put forward views so far out of left field that I am forced to review core components of the code base. I have even been known to rewrite or abandon key code areas overnight because of this.


And what ultimately causes this behavior, one where we do not question why, do not review old decisions, or are afraid to embrace change? The culture of an organization controls this, and it will be discussed in a later section as the core reason why there are database performance issues.

I am a DBA and Developer. Not a DBA who became a developer, or a developer who became a DBA; I am both and very active in both areas. I am also active in system administration, network administration, UI design and security, but do not class myself as sufficiently experienced in them to claim those titles, even though I have extensive experience in all those fields. My focus for the last twelve years has been on multimedia and the Internet.

Discussion 4: The Rules Change

When you have worked in the multimedia world you really do start to appreciate that the rules change. Everything you have taken for granted, all your preconceptions, everything that forms your core foundation of knowledge, your safety net - it all changes. It's scary, it's fun, it's a new world. Every core concept that was taken for granted needs to be reviewed, rethought and adapted for this new world view. It's one of the main reasons why I believe the uptake of multimedia has been so slow: it's been so frightening for DBAs and developers to deal with this new world view. It's such a radically new environment to work in, and you are so far out of your comfort zone, that most find it safer to stay in the current safe, cozy bed.

Some examples to highlight this. Let's look at tuning. With multimedia you don't deal with small, simple rows and nice safe atomic transactions. You can deal with records that are gigabytes in size. Ever tried to index an image? What does indexing an image actually mean? In the multimedia world you are not just dealing with binary data, but a whole new range of types, and interacting with those types changes. If you think changing from a varchar to a date is complex, try switching from a video to a photo. Try converting audio to text or creating a graphic from spatial coordinates. Now try searching on them. Think a where clause can get complex? Try doing a search on a video for a chair in it. Just trying to answer that one is so complex and difficult it's beyond current technology, so you need to employ workarounds.

Now on to more typical multimedia issues and tuning. How much CPU does it take to create a video snippet? What about rotating an image, or compressing it? Does zipping a JPEG actually achieve anything? It's at this point you realize how many features in the database are built for relational and not multimedia. Oracle pushes compression in the database, but they refer to compression of relational data. Multimedia compression is a science in its own right. Compression with RMAN and backups just doesn't achieve anything of note.

Working with multimedia is like being a left hander in a right handed world. All Oracle features are geared around relational. Enter multimedia and you feel like the poor second cousin. Try using multimedia with Streams, replication, partitions, SGA management or clusters and you begin to see the limitations inherent in them.


And yet the Oracle database is a generation ahead of all other databases when it comes to multimedia support. The 11g use of Securefiles shows a new mentality for binary management. It's a start, but multimedia management goes beyond Securefiles into new areas. Oracle has a very powerful story for multimedia and transactional support, and gets close with the ability to support a commit/rollback of multimedia in the one transaction. I am not aware of other databases being able to handle this. Most treat blob inserts separately. A number can't even handle a blob being larger than their equivalent of a redo log. And yet, after ten years of trying to convince people that having images in the database is the safest, most secure way to go, I am still constantly encountering developers, DBAs and managers who insist on storing them outside the database, as if that were the only possible way. It's frightening, it's frustrating, but it shows how ingrained relational is in people's psyches. It's a worry, it's a concern.

With Oracle, backup/recovery is mature. Yet in the multimedia world it's still in its infancy. So many features are needed to make backups more efficient and faster. Move to the Internet and there are no features in Oracle designed to enable the efficient replication of images between sites, because the mentality just does not exist for understanding that with multimedia the slow point is the network connection and not CPU, I/O or memory.

Let's say you have an audio file and want to allow thousands of users around the world to download it.

Do you put it on one central site and give the server a lot of CPU to allow retrieval? No, because it will not scale: downloading a large number of audio files will result in the Internet connection becoming clogged. How about caching the file? No, because once more the Internet connection becomes clogged. What about introducing RAC and allowing the file to be delivered from different instances? The answer is still no: the network becomes clogged, as all instances still deliver through the same connection. Traditional tuning methods are just not focusing on the right area. The solution is to replicate the audio files, securely via the Internet, to multiple sites in different countries around the world. Load balancing can then be achieved, but it's achieved using a different mindset. No one supports this architectural concept, yet it is core to scaling multimedia delivery to large numbers of users.

To re-emphasize: with multimedia and the Internet, the key performance dimension is not CPU, I/O, memory or internal networking performance, but rather the speed of the Internet connection to the outside world. A dimension that is forgotten by all database tuners, as it's beyond their supposed control, because in the current world DBAs and network managers do not work together. In most organizations they have probably never met. Which leads to a key discussion point, covered later, as to why the culture of an organization is crucial for tuning.

So to get around these incapabilities you have to live in a world where the rules change. For starters, realize this - multimedia is a superset of relational; i.e. relational data is just another multimedia type. Fuzzy logic is a superset of boolean; it's not a replacement. When you begin to appreciate that, you start to see that the multimedia world view is not a threat, not a new construct, but rather the next level in the database world.

Discussion 5: Feel threatened

Taking on foreign keys and critiquing them is so threatening a concept for nearly everyone that all the arguments I have raised, especially about the illusion of data integrity and consistency, are dismissed out of hand. No thought dares be given to this radical concept, because if it were true the fundamental pillar of their belief system would collapse. If such a belief system could collapse so easily, maybe the core on which it is founded is flawed. Maybe there is just too much dependent on it. As I am trying to show, the relational model does not match the real world. Or maybe, if you look at the original definition of relational, you see that the logical model was clearly separated from the physical model. Date was very clear on this point.

To that point, I happily state that foreign keys are fine in the relational world when used in the logical model. It is when moving to the physical model that their usage should be reviewed. When dealing with multimedia they should be reviewed. Foreign keys can be defined in the Oracle database as just definitions, which can prove useful when using certain tools. It's when you enable them and force the relationship that you move into a different world.

So to emphasize: the real world is different and will not always, or cannot, match our logical model. There are no data models so far in existence that work in the real world. None. It's not as if you can create UML or an E/R diagram and know that when you put your model together it addresses all the database issues. Some give the hope they are close, but so far all models assume infinite CPU, network bandwidth and storage. None take into account performance, none factor in user mistakes on data entry, or the entry of inaccurate data. None take into account availability, cross platform issues or the changing nature of technology. None. Not even close. Yet we adhere to these models as if they were biblical. If such blind adherence to models were adopted in the car industry for safety tests, the models would be thrown out. In the car and plane industries they create theoretical models which they then test in real world scenarios. They might test a crash between two cars to see if the air bags deploy, or push an airplane wing to its limits to see at what point it will crack. They then check their theoretical models against the real world and ensure they match. If they don't, they adjust, adapt or restart.


And thank goodness they do this, because would you want to fly a plane where the model used for wing structure and failure tolerance was never updated for new materials?

Yet in IT we try to fit our developers and users to the relational model. So in IT do we take our world view and mash it to fit the model? How many times would we like to change our users to fit the model? Our view is that it's not the model's fault but the users' fault.

If our database design fails, then it must be because our design wasn't right. The relational model is flexible and enables change, but how many times do DBAs have to take the model and then change it to achieve scalability or reliability? In the 1990s, when relational really started to appear, this was common practice and well understood. It was just par for the course.

Now think of how many new structures are in the Oracle database that have been put there because the model is flawed and can't represent the real world. When we look at foreign keys I see a model that is out of date and unrealistic for most scenarios. It doesn't work with multimedia, and its gaping holes are exposed by Internet applications. It needs to be updated, replaced or just abandoned. The object oriented model is an improvement but still fails miserably in most of the areas raised above. Blind adherence to the wonderment of foreign keys to me says: you are blinkered, uncompromising and a dinosaur.

The Internet introduced more users, a new market and a larger number of ways of highlighting the flaws in the model. When you build applications for the Internet, you soon realize how smart your internal users (intranet users) are in comparison with the general public. In the general population there are users at such radically different skill levels that it pushes the concept of UI design to new levels. You start to understand what true data accuracy is and what the real meaning of data integrity is. This was highlighted to me just recently when a customer changed direction on their interface to one which takes into account the changing nature of the Internet. They noted that "to get accurate data, don't force users".

Traditional thinking, when applied to Internet development, used models based on computer usage in an internal environment. To allow a user to make a purchase you get them to create an account and then buy an item. We can now track the user, keep them up to date, inform them and in effect "control them".

Internet users are getting fed up with being controlled and being asked to create accounts to buy something. When we go to a store we aren't forced to fill in a company membership; we just buy the item and leave. So the mentality changes to the point where users enter dummy data or just do not go to the site. Meaning: if you force users to enter data, the data you get is likely not accurate. If you don't force customers to enter it, but still give them the option, they will put in the data if they want to. If they do, it's likely going to be more accurate.


Another important point to consider is that as more and more people use browsers and get comfortable with the Internet and the simplicity of most screens, this is the standard they will expect. The notion for screen development is: if I have a basic idea of what I want to do, then the interface should be intelligent and well designed enough to allow me to do what I need without help. This concept is now making its way back into internal applications, and within five to ten years it will start to drive and dominate all application design. It's incredibly hard to do, but most users will expect it. "If it's so easy to do on the Internet, why can't it be easy to do the same on my internal system?" The only way of achieving this is by building applications that run in a web browser. As browser technology improves, they will be able to do more things easily (e.g. field validation, asynchronous updates, easier updates). So the outside world is going to be making its way into the intranet world.

It's a different world out there on the Internet, a different mindset, and all your current notions, models, assumptions and knowledge base will likely not work when applied to it. And you can't force the Internet population to adapt to your relational model; as the saying goes, it's easier to herd cats.

When you adjust your thinking patterns to be fuzzy, a lot changes. World perceptions change. On the technical side, a lot of core concepts once taken for granted are made irrelevant. Database tuning, security and application debugging are all different. From the way you trap errors to the reliance on accurate user feedback when an issue occurs, all of this changes.

Discussion 6: The practical side

Now for the practical side. As a DBA, what are some common issues seen with foreign keys? Any DBA who has to manage a large database, and who has to do maintenance on it in the early hours of the morning, will understand how painful having foreign keys is. There are other object structures in the database that are also painful to deal with, like materialized views, triggers and replication, but because foreign keys link tables together they enter a new level of maintenance torture. As developers and managers are usually oblivious to this, they attribute the rantings of the DBA who encounters these issues to the DBA's usual negative attitude. If I were a vindictive developer I would put self referencing constraints on tables, or constraints between tables in different schemas, and then ask the DBAs to perform a basic maintenance operation. The satisfaction that would bring.

Foreign key usage slows down maintenance and can cause major issues, especially if you forget the keys are there or, in most cases, are not aware they are there. In today's environment there are a lot of third party applications, and the DBA cannot deal with the developers of the application. But they might have to perform maintenance on applications for which they have no documentation or knowledge of the internals (most people tend to forget this little point: DBAs are responsible for all database applications, not just the ones built internally). So if a schema has foreign keys, a database import now takes longer because it has to be run multiple times. (Note: if you don't understand why, then you must be a developer or manager, and enough said.) The response is to document the keys and their usage, or just disable them before maintenance. Yes, if only that were done. And now I ask developers: how many times do you warn the DBAs about foreign keys? Or do you expect them to use some fancy WYSIWYG tool to detect them? So how do DBAs deal with third party applications that use them? Walking through a minefield is simpler. Basically, the DBA becomes a sapper and has to hunt down and defuse the foreign key mines.

Let's now focus on the terror of all foreign keys: the cascade delete. Such a powerful feature, such a core component of foreign keys (once more, refer to Date and his strong views on them) that not only do you link tables together, you hard code deletes between the two.


In theory it's a wonderful construct and achieves exactly what it aims to: it removes the concept of having orphaned records (which really isn't all that bad). I am aware of this feature being so powerful and useful that sites have banned its use, primarily because they have learnt the hard way that when doing maintenance they might accidentally lose data.

The relational model never factors in data maintenance. This requirement by sites to ban such a core feature must surely spark a light bulb in people's thinking that there is something fundamentally wrong with its usage? But no, because our thinking patterns prevent this. If we don't use foreign keys and cascade deletes, it's not because of the model, it's because of us. The classic breakup line: "I am breaking up this relationship not because of you, it's me". Sure. It's really you, but we haven't the stomach to say so.

The cascade delete scenario: a DBA deletes from a table that has the cascade delete constraint. Let's say they are running a maintenance script provided by the developers, with the aim of upgrading the table. The foreign key kicks in and deletes child records (let's make it fun and say this delete cascades and causes deletes in multiple tables). All well and good, except that the delete was part of maintenance and the rows in the original table need to go back in. Oh dear, we now have a very bad scenario – actual corruption. The child records are deleted, they are lost, they can't go back in. This is a very serious issue, so serious that I have heard most sites just don't allow cascade deletes because of it.

The table management scenario: want to drop a table to do some storage management? Sorry, you can't, because there is a foreign key on it (don't even think of doing a truncate). Well, the knowledgeable DBA, or the developer who is reading the manuals, counters by saying you should do the drop with cascade. Wonderful idea. But wait, what was just dropped? How do I put it back, and did I put it back correctly? The developers at this point are saying that this is all part of the DBA's job and they should just live with it. The DBAs should be extracting the definitions before they do maintenance, and yes, there are some wonderful tools for doing this. The manuals make it look simple, especially as their examples work on 2 or 3 tables. Try working on a schema, or schemas, with 4,000 or more tables. Then try to take into consideration tables with foreign keys linked to tables in another schema. It becomes a maintenance nightmare. Now try doing this maintenance in a time window which is only open for a couple of hours, usually at 2am, and you soon appreciate the pain DBAs suffer when it comes to working with the keys. But the DBAs are forced to do it, all because without foreign keys we supposedly have no data integrity. The developers had better hope the DBAs don't wise up and realize this is just a myth, and that integrity can be achieved equally well using other techniques.
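The cascade delete scenario is easy to reproduce. A minimal sketch, with invented tables:

    CREATE TABLE parents (
        parent_id NUMBER PRIMARY KEY
    );

    CREATE TABLE children (
        child_id  NUMBER PRIMARY KEY,
        parent_id NUMBER REFERENCES parents (parent_id) ON DELETE CASCADE
    );

    -- The maintenance script empties the parent table, intending to
    -- re-create and reload it...
    DELETE FROM parents;
    -- ...and the cascade has silently deleted every child row as well.
    -- Reinserting the parents does not bring the children back: without
    -- a separate backup of the child tables, that data is gone.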

Triggers are equally bad, but at least triggers can be disabled and enabled quickly. A trigger doesn't suffer from the locking and scalability issues on DML that foreign keys are prone to. Triggers are, though, prone to mutation, which is being addressed by Oracle. Triggers do offer advantages when it comes to working on third party applications where you don't have access to the source code. In this case triggers become an indispensable tool.

This raises a new topic worth covering. Most tuning books, manuals and presentations focus on tuning as if you have access to the source code. There are a lot of third party applications that companies are buying, resulting in the DBA having to manage and tune them, as well as integrate them into the environment. When looking at third party applications in your environment, the rules for tuning are different.

The greatest cause of downtime is that caused by the DBA.
Corollary: to ensure high availability, keep the DBA away from the database.
Please note this doesn't imply the DBA is at fault and to be blamed for the downtime.

How many safeguards need to be put in place to prevent this scenario? How much double checking and additional wasted time has to be put in place to protect the database from foreign keys? Why not just put them into the application? Get the application right. It's not a hard concept to come to terms with. Which scenario is worse – the potential of having an orphaned record, or the frightening specter that a DBA doing maintenance at two in the morning neglects to notice a cascade delete constraint and inadvertently corrupts the data by issuing it?

And as you scream out and say "we don't allow cascade deletes on foreign keys at our site", then you have as good as admitted that the foreign key model is flawed, because if it were correct there would be no need to prevent usage of this fundamental feature.

Hang on, you now ask, isn't that point pure black/white thinking? Yes it is, and I hope it highlights how flawed this binary thinking is when it comes to foreign keys. The real world is gray; it doesn't conform to the relational model, and trying to squeeze the real world to make it match the relational model just will not work.

ENABLE NOVALIDATE constraints:

Now we get to my favorite, a constraint enabled with NOVALIDATE. One major reason for introducing it was that it takes too long to enable a constraint, so it's just turned on (there is the more frightening thought that we have data that doesn't conform, but it will take too long to fix, so let's just ignore it and hope no one notices).

Let's now forget about the fact that the table could or could not be corrupted; it doesn't matter, because we now have our precious foreign key enabled and checking all new data going in. The concept of a "novalidated" foreign key is absurd when one blindly adheres to the relational model. But from a fuzzy view it's quite a sane concept. I'll repeat that because it's important: when you change your world view to the world of fuzzy logic, it's quite OK to adopt this concept.

Now ask the question "is the table correct?" when you enable a constraint without validation. The analogy here is quantum theory and whether the spin of a particle is either up or down. Before measurement, quantum theory says the particle is in both an up and a down state (it is both, not one or the other). Only when it is measured does it collapse to become either up or down. With this type of constraint, the table is both correct and not correct; it's only when we observe the table (look at the keys and check them) that we see what state it falls into. This is a fuzzy scenario, akin to Schroedinger's cat, where we create a third state from two seemingly binary states. The table isn't either correct or not correct; it's in a new state. The relational model has no hope of dealing with it, yet this new concept had to be invented to deal with the realities of bad data, or with the fact that we don't have enough time to properly enable the constraint. Wow, foreign keys can be fun, but that's the geek in me talking.
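For reference, a sketch of how this third state gets created (invented names):

    -- Existing rows are never checked; only new DML is. The constraint
    -- goes on instantly, however big the table and however bad the data.
    ALTER TABLE images
        ADD CONSTRAINT fk_images_collection
        FOREIGN KEY (collection_id)
        REFERENCES collections (collection_id)
        ENABLE NOVALIDATE;

    -- Only when someone runs the check below does the table "collapse"
    -- into being definitely correct or definitely broken.
    ALTER TABLE images MODIFY CONSTRAINT fk_images_collection VALIDATE;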


I would have thought by the time this feature came

in, this blind adherence to the binary, black/white thinking of Foreign Keys would have made everyone rethink their position on them. But no, the opposite happened. People just blocked them out, denied their existence. The model is fine, its our use of it which is wrong - if we could only change the world to fit our model, it would prove to be perfect. So now we see the relational purists put their head in the sand and mutter "they do not exist, they do not exist". Those that accepted this features existence are the ones that have started to go down the fuzzy path of thinking, the path for real world usage, the path to a newer model, one that works in the real world. So back to maintenance, what do we have for DBAs? Blind adherence, usage of the model as dictated by developers and data architects, thinking that by having their data in a Foreign Key that its safe? The DBAs are having to spend a lot of time managing and dealing with these constructs and they suffer because of them. Lets now do a thought exercise. Think about this. Imagine a world where Foreign Keys never existed (for the “relationists” that is hard, but please try). Imagine then that someone said to a DBA, lets introduce this feature: Its one that will slow down insert and updates, require a new method for breaking transactions, you will double, triple your maintenance time and there is still no guarantee that the data is safe, but you must put it on because I want to maintain an ad hoc relationship between my tables. The DBAs would say one of the following: • • • • • •

go away (but not that nice) build it into your application your nuts focus instead on getting the data quality correct get the interface better built write a batch job to clean up orphaned records
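A minimal sketch of the batch-job option, assuming hypothetical tables CLIENT (parent) and CLIENT_ORDER (child):

    -- Remove child rows whose parent no longer exists.
    -- Run at leisure, like garbage collection, rather than per transaction.
    DELETE FROM client_order co
     WHERE NOT EXISTS (SELECT 1
                         FROM client c
                        WHERE c.client_id = co.client_id);
    COMMIT;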

After much bickering to Oracle, they would then introduce an optimizer directive to define a relationship between tables, and that would be it. Maybe it could interpret the FROM clause join statements to work this out (such a novel concept). The optimizer would then be able to use this to get some new performance from queries. If this happened, no one would introduce the foreign key concept.

We have Foreign Keys embedded in our databases because of a relational purist's view and a focus on a model that has been superseded by the object relational, and that has been proven time and time again not to work in real world situations. We have had the belief that the logical model must match the physical model, and if it doesn't fit, we can just make the real world fit to match it. We need to abandon the notion that the relational logical model can match the physical one.

Scare tactics. If I remove foreign keys then my database will become a mess. Translation – I don't trust my developers to do the right thing, or I couldn't be bothered checking code and programs, so I'll do the easiest thing and put in Foreign Keys. My translation – the DBA works in a culture that prevents them from working with the Developers and ensuring they understand the relationships between tables better. The culture of the organization encourages haphazard programming techniques. The implementation of Foreign Keys is there as a band-aid solution covering an otherwise gaping wound. Focus on healing the wound, not on applying a band-aid which will not help.

Paper Titles

In putting together this paper I came up with 15 possible paper titles. It's worthwhile looking at each one in greater detail, as that in itself raises interesting issues concerning foreign keys.

1. Tackling the definition of what is data accuracy
As covered in the first discussion point, the idea is to try and address what data integrity actually means, and what it really means when looking at the physical model.

2. Exposing the inconsistencies of foreign keys
A simple title, but not catchy enough to get any serious interest in the underlying topic.

3. A skeptics view of foreign keys
A nice simple title, non-threatening, designed to appeal to those with a skeptical nature, without alluding to anything dangerous.

4. Ban foreign keys
De Bono released a great book titled "I am right you are wrong". The title was a deliberate attention getter, as it represented everything De Bono was trying to highlight as being flawed with thinking practices. It's there to provoke and annoy. DBAs like to do that. Hence this title.

5. Daring to tread on the Holy Grail of Relational – The Foreign Key
By elevating the Foreign Key to an unrealistic level and then debunking its use, this title shows the dangers of creating a straw-man argument.

6. Revealing the curtain behind the wizard of foreign keys
This title was designed to show that behind the wonderment of foreign keys was actually something very different.

7. Are you a foreign key dinosaur
A reference to one of the discussion points raised. Those that refer to foreign keys and use them blindly are living in the past.

8. Leaving the foreign key affair at the altar
So many people react so strongly when you question foreign keys, you think they are married to them.

9. Using fuzzy logic to show why foreign keys are a legacy concept
Though this paper only touches on fuzzy logic and why we need to start seriously embracing it, especially when working with multimedia, this title tried to unite the two concepts and show that the relational model is possibly dated.


10. Moving to a realistic data model by breaking the stranglehold of foreign keys
Again trying to highlight that current thinking practices lock people into believing that relational is the only way to go.

11. Foreign keys – sweet innocent child or slobbering, smelly ogre?
This title was put together to try and get you to think that possibly foreign keys are not as simple as we think.

12. I love foreign keys, I also love eating broken glass and rotten fish
This title was put together to elicit a sympathetic response from DBAs who are suffering from constantly having to do a lot of extra work because of them.

13. Moving to a better workplace by releasing the stranglehold of foreign keys
This title alludes to the locked-in thinking practices being seen in the workplace. By targeting foreign keys, the aim is to move the thinking to a different world view.

14. Switching from theoretical to practical by critiquing foreign keys
This title was designed to show that the physical model doesn't always need foreign keys, even though we push hard on their usage in the logical one.

15. A sane development environment by trashing foreign keys
Another attention-getting title designed to equate a healthy working environment with countering the arguments given for why we need foreign keys.

The final title for this paper was picked because it is bland and boring. It's non-threatening and implies a possibly interesting discussion could ensue. The choice was deliberate, because it was felt that any form of threatening title might be counterproductive.

Discussion Point 7: Logical Thinking

I attended a presentation at OpenWorld last year (2008) by a well known and respected company. Everything that was said was correct. If you looked individually at each statement made, it was correct. It was all factual and proven to be logically right and consistent. It was well received and everyone attending loved it. My only comment as I listened was to scribble in the margin that it was wrong. My world view, my experiences and perceptions could see that what was said was right but also wrong. The reason is that when you change your world view, the intent behind the arguments raised can be seen as false.

Let's say you attended a conference on mobile communication and all the speakers and attendees had analogue phones. All the discussions and papers would center on analogue phones. The technology discussed would be correct; the concepts raised would all be valid. If you had a digital phone, you would be sitting in these presentations going "what the...". You would know that with a digital phone the rules change. You can stream video, SMS, access your email and do so much more. The discussions and papers are all technically correct but still wrong, because they are based on a different (and outdated) world view.

I have to use a number of analogies, hoping one will allow you to see through the looking glass. If I can't get you to my world view, if I can't break you out of the 2D world you are in, it is impossible to convince you of the underlying argument concerning Foreign Key usage. Failure can be easily judged by the type of questions argued in response to this paper.

Too much specializing encourages binary thinking, which prevents us from breaking out of the thinking patterns we get stuck in, and prevents us from seeing that a different world view even exists, let alone appreciating it; we usually have to be dragged into it kicking and screaming.

Another analogy. In the 2D geometric world the fundamental rule is that parallel lines never meet. It can be mathematically proven. Extend your world view to 3D and we now have a situation where parallel lines meet, and yet they still conform to the 2D definition of what a parallel line is. Now replace parallel lines with Foreign Keys, and let using them (data integrity) be the rule that parallel lines never meet. Move to the 3D world, the real world, and introduce Multimedia: parallel lines can meet, the use of Foreign Keys changes, we can break the 2D world and yet still maintain the core rule. Add a web-based Multimedia world view and the rules change yet again. When you grasp the implications of this analogy you begin to see why Foreign Keys are a symptom, not a cure, of bad thinking. Let's look at this point further by daring to go into the fuzzy world of Multimedia, a world where the concept of exactness doesn't hold up.

Let's look at an interesting quirk in the Multimedia world, a simple case. Take two photographs (each can be referred to as a digital image) and ask the question: is Image A the same as Image B? The relationalist would say they are the same if each byte matches exactly (binary thinking – do a byte by byte comparison). It's technically correct but in the Multimedia world absolutely useless. Don't think binary, think fuzzy, and try this definition: "two images are classed as observably identical when they are perceived to be identical".

Let's look at the issues that arise in Multimedia. Get a person in a room and have them look at both images. Ask them if they are identical. Now change 1 bit in one image (not a byte, but a single bit). The relationalist says they are different, because a bit-to-bit, byte-to-byte comparison says they are not identical. Even a checksum would say they are different (apply an MD5 checksum to each). Now ask the person again if they are the same. The answer is yes, because it's highly unlikely that a person can differentiate a 1-bit change in the image. It gets even more challenging when you introduce error correction, in which case the pixel representation is identical; in fact, changing a bit in a JPEG image might not change the pixel representation at all. Print out two images, one a 10Mb TIFF, the other a 1Mb JPG extracted from the original TIFF. Print them on A4 and ask a person whether they are different. Two fundamentally different images, different sizes, different types, yet as far as the person comparing them is concerned, they are identical, because they are perceived as identical. In the Multimedia world, the real world, in the end comparisons like this have to be subjective. And with subjective comparison you get issues with personal views, external lighting factors and a whole range of other factors that can result in two different people stating that the images are the same, or not the same. There is no right or wrong answer. It is a similar situation when training neural nets: the training determines the outcome. So the question "are these two images identical?" is not the right question to ask. Asking it just alludes to binary thinking.
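For reference, the byte-level comparison the relationalist would reach for can be sketched in PL/SQL with DBMS_CRYPTO (the PHOTO table and its columns are hypothetical):

    -- 'Binary thinking': two images are identical only if their MD5
    -- hashes match. A single flipped bit makes them 'different'.
    DECLARE
      img1 BLOB;
      img2 BLOB;
    BEGIN
      SELECT img INTO img1 FROM photo WHERE photo_id = 1;
      SELECT img INTO img2 FROM photo WHERE photo_id = 2;
      IF DBMS_CRYPTO.HASH(img1, DBMS_CRYPTO.HASH_MD5) =
         DBMS_CRYPTO.HASH(img2, DBMS_CRYPTO.HASH_MD5) THEN
        DBMS_OUTPUT.PUT_LINE('byte identical');
      ELSE
        DBMS_OUTPUT.PUT_LINE('different');
      END IF;
    END;
    /

Useful for detecting corruption; useless for answering whether two images are observably identical.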


You need to start asking different questions, and take a completely different view when answering them – and this means moving down the fuzzy path. This scenario gets to the heart of data integrity and the argument about Foreign Keys. The relational world wants mathematical purity: the ability to prove a SQL statement, to remove anomalies because they upset our mathematical model. Now there is nothing wrong with this, but everything is wrong with this statement. We need the mathematical model to ensure the optimizer can work well and that SQL statements return correct results. The real world doesn't work that way. You need to switch between the 2D and 3D worlds, just as we switch from drawing sketches on paper (2D) to making a model (3D). Both can be used, both are right, both even overlap, yet both are very different with different rules.

The die-hard foreign key loving relationalist would say binary is just different. Let's then look at something closer to home, looking only at relational data, and see some flaws in this model. What is more important – having the address field correct for a client, or not having an orphaned record? Let's say a client has 3 records, but because of a bad program a record is orphaned, so now we only see 2 records (say, for argument's sake, the foreign key points to a non-existent primary key). Which situation is worse – an invalid customer address, a mistyped email address for a client, or an orphaned record? The answer depends on your business priority. There is no right answer. In an Internet business dependent on emailing customers, an invalid email address means lost business. An orphaned record, depending on where it is orphaned, could mean a variety of things. There is a good chance I can fix the Foreign Key; I might be able to use flashback to read an old value and determine where it should go; I might just delete it. It's not the end of the world, but if the email is wrong a customer could be lost, business lost and money lost. This example on its own shows a flaw in relational, a flaw so big that warning signs should be flashing, warning us about the dangers.

Relational databases attempted to use semantic management as a way of dealing with issues like addresses, but they never came close to dealing with the issue of ensuring an email address is correct (think how you would achieve it – using regular expressions is not the full solution, it involves a very complex methodology). The technology today really doesn't deal with correctly validating an email address in a way that can't be overcome by robots or crooks.

In the real world, banking systems rely on data accuracy: the figures must add up, the books must tally. But in reality they don't, so most bank systems deal with figures not adding up 100%, provided they are accurate to the 99.9% mark. If they are out more, flags might be raised. Banks round cents up or down, or not at all. Two banks talking to each other might have totals mismatching by a cent. The programs factor this in, because they know they cannot achieve full data integrity. Does this mean banks should throw out Foreign Keys? No. As I keep stating, Foreign Keys should be implemented case by case, used where it makes sense and where the implications and cost of using them are understood.

Specialization vs cross-pollination: specializing produces improvements within the world view; cross-pollination allows us to produce radical changes in direction and achieve new directions. For example, Einstein only achieved his theory of relativity (physics) when he saw a particular mathematical solution to a geometric issue (cross-pollination). Specializing in physics alone would never have allowed him to achieve this.

Move into the Multimedia world and you need to abandon traditional binary logic, embrace fuzzy systems and deal with neural networks. You deal with modeling to produce answers to complex questions, but as with chaotic systems, slight changes in the initial starting points can produce dramatically different results.
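On the email-validation point above, the commonly offered (and incomplete) approach is a regular-expression check constraint – a sketch, with the table, column and pattern being illustrative only:

    -- Catches gross formatting errors only. It cannot tell you whether
    -- the mailbox exists, is reachable, or belongs to the client.
    ALTER TABLE client ADD CONSTRAINT ck_client_email CHECK (
      REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
    );

A row can satisfy this constraint and still be commercially worthless, which is exactly the gap between declared integrity and real accuracy.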


For more on these Multimedia issues please refer to: http://www.eternaldonut.com/downloadp/004_intro.pdf (a podcast is also available, including on iTunes). That paper discusses Multimedia terminology, covers Multimedia definitions and then goes into greater detail about how to link Multimedia types. See the site http://www.eternaldonut.com/ and, to learn more about core Multimedia concepts, go to http://eternal-donut.blogspot.com/ and look at the historic notes, where I go into great detail about Multimedia basics from a technical view.

Discussion 8: Logical and Physical Design

The traditional strategy for application design has been one where a data architect comes in and, based on user requirements, designs a logical or conceptual model of the application. This is then given to the DBAs and converted to a physical design. With improvements in technology, and more due to lack of time and implementation pressures, the physical design step is often skipped and the logical model is put straight onto the database (possibly with some storage parameters thrown in).

Starting with a logical design and then moving to physical intuitively seems to make sense. It's a natural progression. The problem is that the real world does not work this way. In fact, I am happy to state that starting with logical and moving to physical, or starting with logical and applying it straight to the database, is flawed and not the right direction to go in. It encourages reactive tuning, and I'll cover this shortly – but first, where do foreign keys fit in with this?

So is there another world view beyond Multimedia? The answer is most definitely yes, and it's one which challenges our thinking patterns and moves us in a completely different direction. The concept is too threatening to most; just getting people to understand Multimedia is the first stage. To try and go beyond that is too challenging.

The foreign keys come from the process of doing the logical design of the relational model. As previously mentioned, the Oracle database is Object/Relational, and numerous other models can be used for design that do not require foreign keys. The relational model is dated, and even though it's based on sound mathematical principles, those principles are not relevant in the current technology environment. They are part of the 2D world when we are in the 3D world. They are relevant in their world, but as the playing field has changed, they represent obsolete thinking and concepts.

Starting with a logical model (and, if we are fortunate, moving to a physical model) always means the DBAs are put on the back foot, forced to react to the design given to them and constrained by that design when tuning. Reactive tuning is the mindset most DBAs are trained to understand. Most DBAs would not even be aware that there are better ways of doing things. They are forced to firefight and deal with performance issues as they appear. They are not in a position to prevent them.

The physical model dictates the logical. The two must be done together. The hardware the application runs on, the limitations of the database and the programming language, the finite capacity of disks, memory, CPU and the network are all critical for the design. They control it, they influence it. Maintaining performance might involve core changes in how


constraints are managed, as covered earlier with the delayed delete, where the core design was changed to ensure efficient performance for deletes.

There are no models at the moment that can efficiently deal with a combined physical and logical model. There should be, and that model should also deal with the multitude of data types including XML, Spatial, Objects and Multimedia. In the meantime we deal with the models we have, and because we do, we implement architectures that are inherently inefficient from the very beginning.

So I propose the following as an intermediate step to resolving it:

i. All Data Architects should have active DBA skills that are no more than six months old. This will ensure that when they design, they are aware of scalability and performance issues and can factor them into the design. The data architect will also be aware of database features that can be used, that work well, and that can be factored into the design.

ii. All DBAs need programming experience. They need to be skilled in all the languages used by the developers. That way they are in a position to review code and know when it is not going to run efficiently.

iii. All Developers need active DBA experience. They need security, scalability and performance techniques ingrained in their core thinking, so that when they build applications these three crucial elements are not ignored.

iv. DBAs need to be involved in the design of any new application from the very beginning. They need to constantly review the design for any performance, security and scalability flaws.

A Data Architect who proposes that the data design should be database independent is committing a design flaw that will result in the worst possible design coming about. The notion that the design has to be separate from the database was pushed during the 90's as a commercial ploy to try to even the playing field across all database vendors. It pushed the notion that SQL is SQL and all databases are now identical. Nothing could be further from the truth. The idea that the database is just a SQL engine has resulted in very bad application designs appearing across a number of vendors. Most have learnt the hard way and now provide port-specific solutions for each database. This myth is exposed in: http://www.eternaldonut.com/downloadp/003_intro.pdf

The belief that the database is just a commodity SQL engine has also resulted in a lot of middleware applications appearing. These applications split the data (database – back end tier) from the application logic (middleware tier).

In the current environment of 2009 there is justification for this architecture. It works well with legacy systems, and on paper it looks good. Unfortunately, unless it is designed correctly it can result in applications that will never be able to scale (see Appendix).


One of the strengths of a 3-tier application was that it simplified development: the data was separate from the application logic. It also offered scalability. Within 5 years the rules that formed this argument will be redundant because of changes in technology. Today a server with 8Gb of memory, 8 cores and gigabit ethernet is a common hardware standard. It has natural limitations, so for sites that wish to scale above this, a middleware layer looks attractive. In five years, the common server standard will be 128Gb of memory and 64 cores (maybe 128). It might be at 10Gb ethernet, but likely still at a gigabit. So the server will have improved in performance by a factor of 8 (or greater), but the network connection might not have. To scale, the bottleneck will not be the hardware but the interaction between the application and the database. It will make sense to keep the two close to each other, which means moving the application logic into the database. For Oracle this means building in Java or PL/SQL. With PL/SQL now able to be compiled to C, it will run faster than Java and scale better. So within five years we will start to see the move away from 3-tier and back to a 2-tier model (though not the client/server model) for most applications. It will prove to be cheaper, easier to manage and scalable. The middleware layer will move into the database and the database server will become more sophisticated. The data model used will again need to change to deal with this concept, and physical design will become even more important.
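The compiled PL/SQL referred to here is native compilation. A sketch of switching it on, assuming the 10g/11g-style PLSQL_CODE_TYPE parameter and a hypothetical procedure name:

    -- Compile subsequent PL/SQL natively rather than interpreted.
    ALTER SESSION SET plsql_code_type = 'NATIVE';
    ALTER PROCEDURE process_orders COMPILE;

    -- Note: older releases used PLSQL_COMPILER_FLAGS and an external
    -- C compiler to achieve the same effect.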

Discussion Point 9: Looking at the arguments for having Foreign keys

All rules are meant to be broken, including this one. Rules exist to serve, not enslave. – software programmer's axiom

1. "Your data won't be accurate" – also commonly stated as "don't you want data integrity?"

The first is a threat; the second is an implied threat designed to intimidate. The answer they want you to give is "Of course I want data integrity, I would be mad not to want it". And that is the end of the argument.

As previously discussed, foreign keys do not guarantee data integrity, but let's look at this issue from a different angle. If I turn off the constraint, is my data accurate? Could I adopt the Java model of garbage collection and remove orphaned records at my own leisure? It's a fuzzy concept, a bit radical in thinking. So now, to answer the question, I can state: "yes, I want data integrity, but I don't need to be anal (slang) about having it consistent all the time – are you a difficult person? You don't want to be difficult, do you?"

Foreign Keys enforce the concept and mentality that by having them, I have data integrity. Of course we can find exceptions, cases where it is absolutely crucial to maintain the relationship; I can name a number of them. Though from my perspective I would rather the programs be written correctly, ensuring there are no breaks in the relationship. Having the Foreign Key is the simple, quick and reactive way of enforcing the relationship. Remove the Foreign Key and you are now forced to check programs, ensure correct programming, maybe even get the DBAs to review the security and performance of the code at the same time. Again, the culture of the organization prevents the correct methodology from being used. The counter argument is that we don't have time to do this; it's quicker and simpler to put in the Foreign Key. So then, would you rather be in an elevator which only used an emergency brake, or one where the actual up/down cables were made from steel and not rope? Some would just take the stairs.

2. Optimizer works better with constraints

Does the definition of a constraint result in better performance, or does having the constraint enabled make it work? I mean really, is this more an instruction or a guide for the optimizer? This is a point worth further discussion.

If I preface a Foreign Key with the phrase "these two tables are linked together in a consistent manner", is that the same thing? Let's call it a linked index. What I am saying is that the performance gain for the optimizer is not because you have a foreign key; it's because there is a relationship between the tables. In the object world I could have a REF. Oracle replication works better and is easier to manage if you use a Primary Key. Previously it used rowids, but Primary Keys are easier to work with, so we put a Primary Key on the table to make replication easier to manage, not because we are following relational to the letter. I put a Primary Key on my spatial and text tables, not because I think relational is the best thing ever, but because those types require its usage.

The relational model doesn't require you to put indexes on a table (in the physical model that is a different story). The logical model ignores performance (hence why it's called logical). In other words, we use relational constructs such as Primary Key and Foreign Key for performance, not because they are relational and the perceived correct way to go. This is a complete reversal in thinking about their usage.
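Oracle in fact already provides something close to the "linked index" idea: a constraint can be declared purely as metadata for the optimizer, without being enforced. A sketch, with all names hypothetical:

    -- The parent key must itself be RELY before the FK can be.
    ALTER TABLE client MODIFY PRIMARY KEY RELY;

    -- Declare the relationship, enforce nothing. With
    -- QUERY_REWRITE_INTEGRITY = TRUSTED, the optimizer can exploit it
    -- (for example for query rewrite against materialized views).
    ALTER TABLE client_order
      ADD CONSTRAINT fk_order_client
      FOREIGN KEY (client_id) REFERENCES client (client_id)
      RELY DISABLE NOVALIDATE;

This is a relationship declaration rather than a policeman, which supports the point that the performance benefit comes from the relationship, not from the enforcement.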


Having a Primary Key/Foreign Key enabled is as good as having an index or an optimizer hint. It's just a tool to give the database better performance. But remember, the relational logical model doesn't give me better performance. The point I am trying to make is that when you are using Foreign Keys to seemingly obtain better performance, you are using them for the wrong reason. It would be better if Oracle used a different construct to achieve this. It's this core change in thinking that drives home the point I am trying to make about Foreign Keys and our use of them. They are just a component of the logical relational model, and a component that loses its significance as we move to a model that uses objects and multimedia.

Another analogy: Newton's law of universal gravitation. It works well; it has been used in physics and taught in schools as a base principle. So it must be right. Yes, it is right if we apply it to the world we are in (relational), but once we look at the general universe (multimedia) the rule fails. We then have to use Einstein's Theory of Relativity. Newton's law only works as an approximation, but it gives us the illusion that it is exact.

Foreign key adoption, and the blinkered, unrelenting, blind adherence to its usage, is the representative case for all of this. It's why I focus on it. People who unquestioningly stick to using foreign keys in the physical model reinforce the view that the IT world is full of close-minded individuals unable to perceive change. It shows how the ability to question, review and change our perceptions as technology and the rules change is an unknown concept. It shows how locked in DBAs and developers can become. Foreign Key usage represents a wake-up call. If I can shake such a core concept as Foreign Keys, if I can critique it and show it is not as core or central as it is made out to be, then think: what else in the database should we focus on and review? The answer is everything. When we step back and move into the more general-case world of Multimedia, the rules change and everything needs to be reviewed.

3. Maintenance

This is a section of concern to DBAs, and it must be said that most Developers and Managers would like to turn a blind eye to it. The perception is that maintenance isn't part of the application management equation. So what do we have for DBAs? A methodology, pushed by developers and data architects, of taking a logical model and applying it to the physical without change, under the illusion that by having foreign keys their data is safe? Any DBA knows that safety and integrity cover a whole range of


issues. Having to spend a lot of time managing and dealing with Foreign Keys, I realize that DBAs suffer a lot because of this strategy and have had to suffer in silence. We have Foreign Keys because of a relational purist's point of view, and too much effort is spent focusing on a model that has been superseded by the object/relational and has been proven time and time again not to work in real world situations.

I am sure you can come up with a scenario where it is crucial that the PK/FK relationship must exist. Great. With my DBA hat on I will say wonderful, I am happy for you. I can also come up with situations where it's critical, but I don't say – and this is important – that all relationships must be PK/FK. To me a PK/FK is a rare exception, like having a partition or a materialized view. It's only introduced when it is crucial that it be implemented. It's a tool that is used as required and in conjunction with the DBAs. It's not a be-all and end-all; it's not blind adherence.

A worthwhile discussion point is why there is the thinking that DBAs have to put in foreign keys to protect the database from the developer. This view is wrong on so many fronts.

"Many companies employ under-skilled and lazy developers that will always take the quick route." – anon

When I saw this, my view was that by hiring the worst possible developers, the DBAs then have to spend a lot of time protecting the database from them. They are reactive, and the result will usually be a poorly written system. My view is that rather than hiring five under-skilled developers who might build an application that ends up costing the company more – because what they built will not scale and will be expensive to maintain – it is better to spend more and hire one competent developer who can do it right. This highlights a point I make often: management are the cause of most performance issues in the database (see Appendix A).

A Personal History

Working in the multimedia world has given me a different perspective on things. I quickly learned that Multimedia is a paradigm shift involving tuning, database management, scoping, creating objects and user interaction. Even screen design and how you tackle working in a multimedia environment is different. I have learnt that multimedia is a threatening concept to all DBAs (not most); they just don't want to know about it. Developers are nearly as bad, but most treat multimedia as just an external file that needs to be accessed. Dealing with multimedia breaks all the rules. You are taken out of your comfort zone. To make it even harder, most Oracle tools and features either ignore or just do not properly support multimedia and the object types around it. It is improving, but since first introduced in Oracle8 it has been improving at a snail's pace. It's also worth noting that XML and Spatial can be considered subtypes of multimedia. When dealing with multimedia, you deal with objects, abstract data types and XML.

I have also dealt with RAC, heterogeneous gateways, PHP and a host of open source tools. On the PL/SQL side I have pushed Mod PL/SQL to its limits and am trying to take it further. Yet go to any conference and, outside of Apex, all talk on PL/SQL is about DML code constructs and triggers. The focus is not on the multimedia environment. Since 1995 I have been promoting Mod PL/SQL as a viable and powerful tool for web development. Integrate it with multimedia and you have a platform that makes it very efficient to access the database and deliver dynamic pages.

I have had to build FTP clients, servers, email management routines and raw HTTP requests. I have had to learn how to build networks, enable the application to work over dial-up, configure routers and firewalls, install SANs and work with all versions of Windows, most Linux variations and Solaris. I have had to integrate with LDAP and SSO and configure Apache and the embedded gateway. I have built, and rebuilt three times, an e-commerce engine, each time learning from the previous version and trying to adapt to the rapidly changing Internet environment without becoming locked into a model or trend that will become outdated. I have built a search engine, then rebuilt it more than four times, that is flexible, uses dynamic SQL with dynamic bind variables, and can access all object types as well as query external databases and Google. I have created an architecture that has


had to be flexible and easy to maintain – so easy that maintaining it can be done from anywhere in the world (and I have done this on many occasions). I have had to do compilation and integration of open source code across platforms, and yet am constantly made to feel inadequate because I haven't yet done a complete recompile of a Linux kernel. I have learnt that the Internet and multimedia are still the wild west; there are no right ways, just new technologies that one needs to adapt to very quickly. I have worked in a team, had many ideas scrutinized and crushed, have had to make many design decisions to meet sales objectives and been able to switch direction overnight. I have written parsing engines, compilers in PL/SQL, Ajax client interfaces, integrated with over ten different merchant banks, linked with iTunes, Google and a host of relational and non-relational databases. I have described the syntax for the circa data type and the fuzzy logic required to search on it, and have dealt with so many virtualizations that I am building a new type for Oracle so virtualizations can be stored inside a database. I have worked in industries that Oracle has never heard of or even looked at, done pioneering work in areas that are only now being seen, and written papers on topics so new that most Oracle users don't understand or care about them. If you are in multimedia you are in a niche. The thing is, when you look at the big picture, it's not a niche but an industry.

Discussion point 10: My top ten reasons for not using foreign key constraints

This is an extract from a paper I wrote 2 years ago addressing the issue of Foreign Keys. I have included it here because it provides a useful summary.

#10 Relational might say we need foreign keys, but it doesn't say when they should be enforced. If you say immediately, then you must enforce them always and immediately, meaning:
- you can never use constraints enabled without validation
- warehouse applications, bulk loading and data management are all compromised

#9 It's a hidden layer. Just like triggers, when they are enabled they add a hidden layer that can make it hard to diagnose and debug when problems occur. Let's say you add a foreign key to an existing application and problems occur inside the application. Is it because the foreign key has returned errors to the program in an area of code that was never expecting them? The same goes for tuning: when looking at a complex update statement, you have to look at the constraints around it as well as the actual SQL. Triggers are one of those blessings and curses. They can solve a lot of problems because they are transparent and hidden, but they can cause a lot of problems for the same reason. If you are not aware they are there, tuning can become very difficult and turns into a multi-layered exercise. In this case tuning isn't simple; it's very complex, takes longer and is more frustrating to resolve.

#8 Don't you trust your developers? Are they there as a safeguard because developers can't write efficient code? And if developers think they are so great, just check to see if they use them in their development environment. I bet they are turned off because they get in the way of development. A key database feature disabled because it gets in the way? They mustn't think highly of it then. If the developers want it so much, enforce them to the letter in all environments. Then, when the screaming stops, review their use.

#7 Performance problems. Inserts, updates, deletes, checking integrity, updating indexes – all to be done within the transaction. It can slow down transactions. In some cases I begrudgingly accept that having the constraint there can help the optimizer for some types of queries (to me that is the exception rather than the rule).

#6 Are they value for money? If they never existed and then someone invented them, would we use them? I think not. I think it would be a hard sell to convince anyone they were needed. And can we really trust them if we do have them? Look at constraints enabled without validation. Just by allowing them, it means we can create a constraint that might not be valid. It says, "I can live with possibly invalid data, just enforce anything new".

#5 The whole notion of having data accuracy is the "golden fleece" we are all chasing. This is not correct. When you start looking at the concept of fuzzy logic you begin to realize that perfect data accuracy is not only a myth; it can be inefficient trying to achieve it. Let's look at the humble date value. When we enter a date, should we store the seconds, the sub-seconds, the milliseconds? How accurate do we need it? In theory it should be as accurate as the computer allows, because that is exact. When we store someone's name, do we enforce that it is spelt correctly? What about the street? Can we live with typing errors in comment fields made by users? If we want perfect data then we have to be as anal about keeping the data accurate here as we are about keeping foreign keys accurate. There is no difference. The foreign key concept is one we are locked into thinking of as having to be perfect, whereas spell-check errors in comment fields are ones we can live with because that is what we are used to. My view is that a spelling mistake in a comment field is more dangerous than an orphaned record. There is a balance to be achieved when enforcing data accuracy, and it depends on the application. For a financial system, data accuracy might be seen as crucial – no mistakes allowed – so we should pull out all stops to maintain it, even at the expense of application management.

#4 Maintenance. Once foreign keys are enabled, the flexibility of the design of the application is weakened. Try using transportable tablespaces, Oracle Streams (Data Guard), replication, materialized views and RAC. With some blood, sweat and many tears you might get foreign keys working, but they will always be a point of failure, which will add additional costs to the maintenance of an application and increase the length of time maintenance takes.
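A sketch of what #4 means day to day (constraint and table names hypothetical): even simple data movement becomes constraint juggling:

    -- Maintenance window: get the constraint out of the way first.
    ALTER TABLE client_order DISABLE CONSTRAINT fk_order_client;

    -- ... move / reload / reorganize the data ...

    -- Re-enabling re-validates every row; if anything slipped in,
    -- this fails with ORA-02298 (parent keys not found).
    ALTER TABLE client_order ENABLE VALIDATE CONSTRAINT fk_order_client;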


And on the issue of maintenance: let's say a DBA has to do maintenance on an application. They want to move data out from a table to another area. They delete the data first (they can't do a truncate because of the constraints), and unknown to them a cascade fires. They then do their maintenance and re-insert the data. They have just caused major data failure without even knowing it. To prevent this from happening, safeguards have to be put in place, but all DBAs are now living in a minefield, where the wrong move during legitimate maintenance can result in the destruction of the application. Even moving a table between tablespaces (dropping it, or setting up partitioning) can be hampered by constraints. (A sketch of the cascade scenario follows below.)

#3 Those that want them are usually not the same group of people as those that have to manage them. Try doing database maintenance on a 300 table application with foreign keys. Try reorganizing a table, maintaining changes to it, even trying to move copies of data in it. You soon realize that you have to disable the keys. In some cases disabling the keys means dropping or disabling the indexes, which causes problems when they are re-enabled. Time. And do this at 3am, when you only have a 2 hour window to do the work and the pressure is on. Then let's see how keen on foreign keys you are.

#2 Locking. Has anyone seriously looked at how expensive it is to manage a locking strategy with cascade deletes? And for a RAC environment, how inefficient are multiple locks, across multiple blocks, in RAC? This is a complex and lengthy topic and I will not go into it further.

#1 They encourage relational thinking, when the database is object relational. I mean honestly, using object oriented structures in the database can be more efficient than using a traditional relational structure. We should be developing applications that use the best of both relational and objects, using each where it works well. And that brings up a side issue: why are people so resistant to using the object oriented features inside the Oracle database? It's not a battle between relational and OO; it's not that one is better than the other, or that one is right and the other wrong – and one should not just be using relational because you don't understand, or are not comfortable with, the OO features. Ignorance is dangerous, especially from a performance perspective.
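The cascade scenario from the maintenance story above, sketched with hypothetical tables:

    CREATE TABLE client (
      client_id NUMBER PRIMARY KEY
    );

    CREATE TABLE client_order (
      order_id  NUMBER PRIMARY KEY,
      client_id NUMBER REFERENCES client ON DELETE CASCADE
    );

    -- Maintenance step: 'temporarily' clear the parent table...
    DELETE FROM client;

    -- ...which silently deleted every row in CLIENT_ORDER as well.
    -- Re-inserting the CLIENT rows does not bring the orders back.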


Other constructs that need a good review, and possible paper titles:

1. The use of documentation in code
2. Developers who think they can program and yet cannot write a sort routine
3. Debugging code techniques
4. Reviewing code for security holes
5. Low fidelity design, and building an application that can adapt to the changing skill set of a user
6. Middleware vs two-tier architectures
7. Treating developers as just coders, and why automatic code generation tools have not lived up to their hype
8. PL/SQL features needed to turbo-charge it for Internet usage

Discussion Point 11: Constraints in the database or in the client?

Since constraints were first introduced this question has been asked, and it is worth a discussion in its own right. Do we let the database enforce integrity, or do we let the client enforce it? This question has been asked over and over for the last 15 years – I know, because I can recall discussions on it in the mid 1990s. My viewpoint on the answer changed over time to the point where the only answer was that the question should be changed from an "or" to an "and": constraints should be enforced in both, because each side can't trust the other. I have since changed my view again in light of the Multimedia world view. I now believe the underlying question isn't correct, and we need to look at this from a completely different angle. Let's review some of the arguments.

By having constraints enforced in the client application we:
• minimize CPU usage for this check on the server;
• can create client-friendly applications, because the client gets an instant response that they have made a mistake, rather than being told on insert.

This is even more pronounced if you have a 200 column table: it is more efficient to get the errors fixed as the data goes in, not on insert/update/commit, mainly because some errors can compound, causing errors in subsequent columns.

By having them enforced in the database we:
• protect the data from all applications trying to access it – we are not relying on every application to be correct and pass down valid data;
• let the Oracle features that rely on these constraints work;
• only need to do it once, not for every application accessing the data.

Both sides of the argument are valid and raise very powerful points. The problem is with the question being asked: it tries to apply a logical rule to real world situations. I might be in an environment where there is only one application accessing the data (and there will only ever be one), and it's a Mod PL/SQL application. Now all the arguments for having constraints enforced in the database are weakened. Creating a friendly interface is more important, so I am going to build JavaScript to enforce key constraints. But I could equally be in an environment where there is no client interface. It could be a batch processing environment, in which case the notion of constraints in the client is meaningless. Or what if I am in a data warehouse? What if I am getting data from XML DB?

The initial question actually implies we need to enforce all constraints: Primary Key, Foreign Key and check constraints, as well as database triggers and a host of other structures required to ensure data integrity. We should also consider constraints imposed by Object/Relational structures (think of XML, which is actually hierarchical). On top of this we need to add security checks. We also need to put in checks to ensure database calls do not cripple the database because they are badly written or return too much data (see the Appendix topic on SOA and Web Services).

By thinking a little fuzzy, and realizing that depending on the architecture maybe not all constraints are needed, we can now review the actual question. It is better reframed as: if I use this architecture (fill in the gap with your architecture of choice), which constraints, if any, are needed to achieve data integrity? There are so many different architectures in use that the answer is now more complex and will be different for each architecture chosen. This question also lets us review the physical model in comparison to the logical, and lets us check which constraints make sense to use. And remember, when you start working with Multimedia the notion of data accuracy and integrity changes. The initial question is dated and can only be asked when working in the relational model; even then it's hard to consider it a valid question. When this question is asked, it highlights that the world view of most DBAs and Developers is still caught in the limited relational world, and we now need to escape from it.
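To make the reframed question concrete, here is a simple (hypothetical) business rule expressed server-side; in a single-application Mod PL/SQL architecture the equivalent check might instead live in the client JavaScript, or in both places:

    -- One rule, several possible homes: client, middle tier, database,
    -- or a combination - the architecture decides which makes sense.
    ALTER TABLE client ADD CONSTRAINT ck_client_status
      CHECK (status IN ('ACTIVE', 'SUSPENDED', 'CLOSED'));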

Discussion Point 12: Escaping the world view

Edward De Bono wasn't just known for the phrase "lateral thinking"; he proposed a whole new methodology for thinking patterns. Unfortunately this term was attached to his name and it stuck. Too many times I hear people say, when it comes to problem solving and tuning, that you need to think laterally. Problem is, no one ever explains how to think laterally. How is it achieved? How can some DBAs come up with radically different ideas and concepts whilst others are struggling to work out the time of day? The answer to this covers the foreign key debate and highlights why it is so hard to change your world view and think "outside the box".

The general thinking is that truth comes through debate, or finding truth through confrontation. This is highlighted by the legal system: you get two sides, one for, one against, and let them battle it out. In the end the truth emerges. Except it doesn't, as debates do not allow us to escape the box and think outside it. Debates are part of the thinking process; they are not the thinking process. The analogy De Bono cited was that water is part of soup, but it is not soup. It has become all too easy to land in the world of debates and confrontation when talking about and resolving issues, because that is our comfort zone; it's what we are used to. De Bono also raised the Six Thinking Hats concept as a way of challenging people to view discussions as more than confrontations. Unfortunately the perception was that wearing colored hats in meetings was a silly idea, and it lost favor. The Six Thinking Hats was never about that, but it didn't come up with strategies for how to think laterally; it just covered how to manage ideas and ensure meetings and discussions didn't degenerate into debates.

In the well known TV show House, you might be forgiven for thinking that if you surround yourself with antagonists you would end up solving the problem. By surrounding yourself with "no" people (rather than "yes" men), you are able to discover and get to the truth of the problem. If you believe this, then you will have missed the point of the show. A person who is used to debating as the only method for getting to the truth of a matter might rightly believe this strategy is a good one, but if you recall, in the show, having a group of people around him only helped him resolve the facts; at no stage did they help him come to a resolution of the problem.

The antagonists ensured that all the obvious issues were covered, that nothing was missed. The heart of the show focused on the "House moment" – a moment where some random event, some comment, allowed House to escape his world view, see the problem anew and come up with the solution. He could escape his box. How did House do it then? What was the methodology he used to allow his character to achieve lateral thinking? It's the same one which, when used effectively, can also be used for debugging, tuning and coming up with completely new methodologies. Unfortunately it's a technique so chaotic, so foreign to most people, that they cannot do it, and mostly achieve it by sheer accidental luck rather than through a controlled thinking process.

So yes, in addition to discussing foreign keys I also raise the issue that the debating methodology is fundamentally flawed, which in my opinion represents the thinking strategies and patterns of all those in IT, at all levels. To understand why I constantly pursue the issue of Foreign Keys and why they are bad is to realize that I am tackling the thinking patterns of everyone in IT, and why new strategies need to be adopted. For the last 30 years, the same issues confronting IT – tuning, design, development – are not being resolved, not being addressed, just painted over and rebranded. Things are not changing because the world view has not changed. What we are seeing is the same methodologies being rehashed, and even though computers are running faster and improving each year, our own development philosophy and mentality has not changed to keep pace.

My experience has shown that the presenters, academics and proponents have come up with their powerful insights not through good thinking habits or a well-proven methodology, but rather through reactive methods: issues they have had to face and address forced them to view the problems in a new light. The well known presenters have had the pleasure of working in a large number of environments, giving them access to more problems and thus more solutions. This isn't a bad thing to have happen; it's just that currently it seems to be the only methodology being used to come up with new directions and resolutions of problems. I am equally guilty – it has happened to me many a time. I encounter a tuning problem, realize that feature X is bad or feature Y solves it, then write a paper on it. It produces results, and if you attend conferences, you will see most papers presented cover these results.

So to reiterate, this thinking strategy isn't necessarily bad; it's just reactive, unstructured and random in nature, which typifies the chaotic architectures we see today. The religious arguments which center on which product is better, or the rehashing of old ideas as new, show this is constantly happening.

So how did House come up with his out-of-the-box solutions to problems? He would have solved more problems, more quickly, if he had realized where his insights came from and used a better thinking methodology, but then the TV show would have been over in 5 minutes, and where would the fun have been in that? Let's just say that the solution to coming up with out-of-the-box ideas consistently, and turning your environment from reactive to proactive, is beyond the scope of this paper and worth further discussion in another forum. Some ways of doing this are mentioned in the appendix, but these are not the only ways. So treat this discussion point as a teaser, and if you are eager to understand the answer, first focus on why I am so constantly headstrong in always talking about Foreign Keys, even though it has consistently raised the ire of all those who hear it. In doing so, you will understand why focusing on Multimedia allows you to escape from the current relational model world view.

House was right, though, in pointing out that all people lie at different levels, but addressing that, and how it relates to tuning, is a completely different paper.

For more on lateral thinking methods see: http://www.pdfcoke.com/doc/7545855/Creativity-Lateral-Thinking

Conclusion

For those that have read this paper and believe it is just a debate on the relative merits of foreign keys, I will leave you with this conclusion. By all means use Foreign Keys, but don't use them blindly. Use them when it makes sense, when it solves a tuning issue, or when maintaining a relationship between tables is crucial to the application running. Just don't use them because that is what you were taught and the relational model says you need them. And don't think you have achieved data integrity in your application if you do use them. You never will achieve it; all you will achieve is different grades of adherence. As you move to the Multimedia world, be prepared to adopt a new world view, a fuzzy view, one that will challenge your core concepts. My view is that the best way to tune and ensure optimal database performance is for management to create a culture that promotes excellent Developer/DBA practices.

For those that recognize that this paper is more than just a paper on foreign keys, I invite you to read the appendix, where you will hopefully get a better understanding of some of the real reasons why this paper was written. If through this paper I can get you to question and rethink a pillar of the computing world, then hopefully I can get you to break out of the flat world you are in and adopt a new thinking strategy. I also hope that you will see Multimedia in a new light and realize that there is a whole new world out there awaiting discovery.

This paper, with a discussion blog, can be found at: http://foreignkeys.blogspot.com/
Links to further papers can be found at: http://www.eternal-donut.com/


Appendix A: Why donuts are the solution to the tuning problem

If this hasn't got you thinking, then nothing will. I'll answer the "what the?" question at the end of this discussion, but first let's see how one can come to this conclusion.

Moving to a new world view. Step back and look at tuning. Keep stepping back and let's see if we can get to the root cause of tuning issues. DBAs, when they step back, might end up saying: write better SQL. Managers would say: get a faster box. Developers would say: improve the design. There are many dimensions to resolving tuning, but when you really look at it, when you take the big step back and ask what fundamentally causes all tuning issues, then in my experience – and this has been shown and proven to me time and time again – database performance issues are caused by management. The best way to tune is to educate management. This is a fundamental principle of proactive tuning (a concept I have been advocating for over 15 years and have written papers on). Yet it is such a foreign concept to most DBAs and Developers that it is inconceivable. Their thought processes do not even allow them to contemplate it, so ingrained are they in their world view. And yet they all know it, they all talk about it and allude to it as being the core problem in papers and discussions.

Management are responsible for the culture of an organization. That is their primary goal. Forget about management being there to sign forms, be a leader and buy things. Establishing the culture is really the only true reason management exist. Management are not productive. They don't build anything, write code, do design work, answer problems, work with users, debug or tune. They make decisions: decisions to hire, fire, buy and manipulate. In short, they control the culture of an organization through the decisions they make. In turn, the culture dictates how people work and how they interact. Let's try to highlight this by looking at some cases. All these cases are based on real world scenarios, ones you have likely encountered. The goal is to highlight how bad practices have led to a culture promoting inefficiency. This is similar to Foreign Key usage.

Case 1: Separate Buildings

DBAs and Developers just do not get on. A good DBA will act as an auditor and review what the developers are doing, with the aim of exposing tuning, design and security flaws. A good DBA can do this in a way that doesn't make the developer feel inadequate. Problem is, this rarely happens, and conflict ensues. Given time, something nasty will happen. The management mentality is to decide it's best to keep the two groups apart. This just makes communication harder and slower. Developers will avoid dealing with the DBAs, and some will hope they can sneak code through without it being reviewed. Performance and security issues occur, the DBAs are forced to tune reactively, and the two groups are now in a conflict scenario. Management at this point will move the DBAs to separate floors, maybe different buildings, in some cases different cities. The smart thinking manager might even outsource one or both groups. Management set the culture. They don't see what is happening from a technology viewpoint; they see two groups fighting that need to be kept separate. Change management is then put in place to allow them to communicate. The culture is now set in place, ingrained in the psyche of everyone in the organization.
DBAs and Developers are never to speak or meet unless there is legal representation. So many sites have gone down this path, and yet this cultural setup is a leading cause of performance problems. Why? When DBAs and Developers can work with each other, they can solve performance and security problems before they occur. DBAs can review code quickly, identify security holes immediately and ensure they are fixed before they become part of the core code. The developers, if they know a DBA is reviewing their code and reviewing it well, will more than likely attempt to write efficient SQL and use efficient practices. If you know you are being audited, but audited in a good way, you are likely to do good work. Such a concept of having the DBAs and Developers working together leads to an environment of proactive performance and tuning. So why doesn't this happen? As mentioned, the culture imposed by management has insisted this cannot happen. If you are at a site where the DBAs belong to one department and the developers to another, then it is nearly impossible to arrange for them to work together. Also, DBAs and Developers are not trained to understand how working together is in their best interest. They are not trained in dealing with personalities and handling clashes when they occur. They are not trained in the basics of psychology and personality management. They are not trained in conflict management and basic communication skills. Management tend to view all DBAs and Developers as the same, usually just coders, some with bigger egos than others (see Case 5). How many environments outsource solutions elsewhere because it's less hassle than dealing with their own internal structure? Most. And why? Because of the culture.

Case 2: Nice Testers

Testers who are nice are dangerous. Testers who are afraid of hurting the developers' feelings, because they can see how much time and effort went into a program, are not doing their job. Management who hire nice testers, and who do not set up a culture where the testers can make honest comments without fear of retribution or of causing hurt, are ensuring that code is not properly tested. Testers should be able to comment about poor screen design and potentially inefficient application navigation, and should be able to test outside the scope of what they are told to look at. Also, at some point in the testing, DBAs and Developers should be able to review and see how the testers use the application: how they do searches, how they move around the application, and check for potential bottlenecks or tuning holes caused by a bad interface. This culture of testers, DBAs and Developers all working together, and yet being able to critique inefficient areas of the application, is a concept so foreign to most sites that the notion cannot even be raised for fear of being laughed at (which also highlights another management cultural issue).

Case 3: The cheap solution (or the expensive solution)

When management buy a hardware or software solution because it's either cheap and can just do the job, or very expensive and must therefore be good, they put in place the ground rules for how the Developers and DBAs have to work. If the DBAs and Developers cannot both be involved in the decision from the beginning (and not the situation where management say please pick Option A or B), then they are putting the DBAs in an environment where they now have to reactively tune.

[Figure: The Tuning Scale - a spectrum of tuning concerns running from Psychology, Fuzzy Logic, the User Interface, Users and Management at one end, through Code and Developers, then SQL, Security and the DBAs, down to Binary Logic, Configuration, the Database and the Hardware at the other.]
A good DBA and a good Developer will know where in the hardware the slow points are. These are likely based on the architecture and model they are working in. They should know what works and what will not work. They should be advising management on the hardware that should be bought and giving them the choice, not the other way around.

Case 4: The next trend

How many times has this happened in your environment? Management are sold on a new technology because it is seen as the next trend or the saviour of all their problems. SOA is the way to go, all screens must be carousels, ADF will solve all our issues, Apex will solve all our issues, Java will solve all our issues. You might hear the justification from management (or from developers/DBAs influencing management) using the adage "everyone is using it, so should we". We should use PHP, we should use Middleware, we should use Ruby.


Or maybe the negative situation happens: let's not use PL/SQL because we don't want to be locked into Oracle. Or maybe they justify a trend based on an impractical concept: let's build our application so it works against any database because we don't want to be locked in. Has it also ever happened that management have adopted a new technology, usually on the basis that it's cheaper, requires fewer developers or is simpler, purely so they can be seen as smart in their own manager's eyes and thus hope to get promoted and move up the corporate chain? I have seen this happen on a number of occasions and the results have always been disastrous. Managers who establish a culture where they determine the direction are creating an environment where the DBAs and Developers are forced to react. The development tool chosen might not work efficiently with the database; the methodology chosen might encourage inefficient programming habits. In the end the DBA will likely be put in a situation where they have to spend a lot of time doing reactive tuning, or jumping through hoops to get the application to scale (if that is even possible), all because of poor management decisions - decisions that resulted because the culture of the environment prevented the correct decision from being made.

Case 5: Personality driven environments

Have you been in a work environment where the Developers are in control? In this case, it is an environment where a dominant developer has been given control. In the worst case scenarios they will take the developer they like the least and make them the DBA. This is referred to as a puppet DBA. The developer then assigns the DBA the role of doing backups and taking their table definitions and adding storage parameters to them. In this case the DBA is not a real DBA. The developer then has free rein to impose whatever trend or design methodology they want. What will happen is that the developers will push through their own architectures and designs and impose their own methodologies without looking at the tuning or security ramifications. They will also likely use their puppet DBA to sign off on all designs, so when performance and scalability are not reached they can blame the DBAs because they have a proven paper trail.

The reverse can also happen, where an obnoxious DBA - one with a persecution complex, one who cannot handle being criticized - grabs the reins of power and then ruthlessly imposes a strict regime of terror over the developers. This DBA normally holds contempt for developers and views them as coders who have no idea how to write efficient code. The DBA then imposes stringent practices and change management procedures, ensuring that requests come in at a trickle, and any perceived ignorance from the developers is met with anger or heavy handed tones. The developers are then hamstrung and cannot work efficiently. The DBA makes them tune statements which either do not require it or are run so infrequently that it is not worth the effort to focus tuning resources on them.

In both these cases it might seem that the Developer or DBA is at fault, so what has this to do with management? Management are responsible for the culture, and they are responsible for the personalities and how the teams work with each other. If they have just sat back and watched these scenarios unfold, then they are guilty of incompetence through neglect.
In these situations management have been intimidated by the dominant players and, rather than addressing the conflict, have more than likely supported it by ensuring the dominant personalities get their way. Nowhere have they tried to train, teach or advise them to change their behavior. Nowhere have they tried to create a culture where the DBAs and Developers work with each other and not against each other.

Case 6: User database creep

Have you been in an environment where users sneak in their own databases because it is too hard to work within their own organization? You then get a situation where data is stored in a variety of open source and free databases which are not backed up or secured, and which prevent the organization from creating a global view of all its data. All because the users who need to store the data cannot deal with the players in their own internal structure.


All because the culture imposed by management prevents this from happening. Change the culture, change the habits, and turn things around so the users want to put the data in a controlled and central data store.

DBAs and Developers keep looking at a granular focus to find the solutions to performance issues. If I improve the locks, adjust the latches, the tuning issue will be solved. It's the mentality that fundamental tuning improvements can be made by going more granular. All that will happen is that you get inversely geometric improvements: it can take 10 hours of effort to tune a system from 80% to 90% optimal, but 100 hours to get it from 90% to 95%. No one is encouraged to think the other way, to look at the bigger picture - and why not? The thinking is that it's management's job - but management have no idea. Simply put, they are clueless (Scott Adams really has this right). We are so used to reactive tuning that it has become part of the core culture of most organizations, and as mentioned, management are responsible for the culture. So DBAs and Developers should start to go the other way and tune management. More specifically, they should encourage management to establish the correct culture so that the correct habits are put in place. The sites that I have seen with efficient and well tuned systems actually do this. In most cases they might not even realize they have done it, but a good, competent manager will establish the culture and then watch as good habits, practices and an efficient tuning architecture naturally form.

Let's now look at one of the most insidious methodologies created, one whose presence highlights that the culture is suffering: change management. Which brings things back to foreign keys. Seeing foreign keys being used is just like seeing change management in an environment. It's symptomatic of a bigger issue. It's very rare to see a site using change management correctly. A number of times I have heard people comment that they feel they invented change management, because it solved a core communication issue in their environment. For them change management was the solution.

Why pick on change management here? All it achieves is slowing the whole process down to its slowest point. It converts everyone into a protocol, and an inefficient one at that. Seriously, how many people think their change management architecture can run faster? Change management has a place when used intelligently. In most cases it highlights that the culture of an organization is so bad that DBAs and Developers can only communicate with each other via official communication protocols. Now let's jump to foreign keys in this analogy. They represent a stage in an otherwise inefficient process. They represent a failure in our tuning architecture.

Let's go back to change management. It might be seen as slow and cumbersome, so what is the solution that most people will come up with? They will look at each step and see where improvements can be made. Maybe introduce an automated change management system, maybe make better use of Service Level Agreements to get faster throughput, maybe even let the DBAs meet the developers in an officially controlled meeting. All you are doing is spending more time making the steps more efficient in a process that is flawed - a process created as a result of poor management who did not set up a culture to encourage better communication. "In my environment, I threaten people and say if you/we don't change our behavior I'll be forced to bring in change management - ooh, the bogeyman cometh".
Management are usually so ignorant about the culture in their environment that when one gets it right, it is an achievement worth celebrating. But so often all that is made are excuses. Yes, we could do it that way, but my manager won't let me, or they have their own agenda. So in the end, because of bad culture (bad decisions made), the DBAs and Developers are put on the back foot and asked to solve issues in conditions that don't lend themselves well to tuning and performing well. So the environment becomes reactive.

So what is a solution to this? This is where the donuts come in. Donuts are the universal communication device. Everyone knows about them. Most know they are bad. Most love to eat them, most feel guilty for eating them, and nearly everyone will cut a donut in half or into quarters and then eat each piece thinking that it doesn't count for calorie intake. They are equally loved and loathed, and they are a great way of breaking down barriers.


Walk into a room with donuts and everyone can be your friend. DBAs and Developers should be using this device as a way of re-establishing communication and contact, and of solving their issues in a more friendly environment, rather than through cumbersome change management. Managers should remove all barriers and get the groups working with each other; only then will an effective and efficient environment be seen.

So when tuning, if you do not factor in the culture of an organization - the psychology and the relationships between the DBAs, Developers, Managers, Users and other groups - if you do not factor in the usage of tools and the methodologies used, and instead insist on focusing only on the obvious performance issues and looking only at the database, then you are not tuning, you are just applying a band-aid.

Can one go more general than management? Potentially one can, but that depends on the organization type. Government employees can generalize to politically appointed ministers; private companies to the CEO, the board or even the shareholders.


Appendix B: From Bind Variables to Middleware

After sitting in a presentation run by a well known presenter, I came out of it very disappointed. Not with the presenter, not with how it was presented, nor with the technical nature of the topic. Its accuracy wasn't to be questioned. What was disappointing was that after 15 years we haven't changed. The topic was on why it's crucial to use bind variables for performance and as a safeguard against SQL code injection.

I distinctly recall that when Mod PL/SQL first came out in the 1995 time frame, the notion of URL modification and SQL code injection was raised. In fact, at the time it was an obvious issue because the support tools that Oracle provided all had this problem. The tools provided were designed to be user friendly and fast to develop with. It took some convincing, but eventually the security holes in the packages were cleaned up and the holes patched. At the time, the notion of using bind variables was sensible, it was obvious, you just did it, and PL/SQL encouraged their use. So after 15 years, to be present at a conference where this issue is seen as so serious that it deserves a topic in its own right makes me wonder: how did this happen, and why are we still having to drill home such obvious constructs? Is it because there is a huge turnover of programmers? Is it because developers are not being trained properly, or do most programmers just not care?

Let's now look at why, from a few angles. During the "which is the best database" wars a very dangerous notion was formed. The notion centered around the idea that databases are commodity items and all perform the same. Programmers were taught that if you knew SQL you could program against any database. It didn't matter how you programmed or how you wrote SQL; the database optimizer would sort it out. What we now know is that this notion is dangerous, and that when adopted it leads to poorly performing applications with gaping security holes in them. The idea that if the database isn't smart enough we should get a faster box or a different database has led to all sorts of problems. It has basically held back efficient application development.

When developers started to program without understanding tuning and scalability, that was when the PC Mentality became ingrained in the common culture. This is when a developer thinks that if my program runs OK on my computer, it will run fine with ten users or one hundred users. The PC mentality thinks that the database should be hidden and treated as just a data store. The PC mentality has resulted in simple applications like a word processor still not being able to scale to handle more than 60 pages, and taking 60 seconds to start up. The PC mentality results in it taking 60 seconds to log in to an internal network over a high speed line, and over five minutes to do a directory listing of more than ten thousand files. The PC mentality is a serious issue that has been reinforced by a poor culture.

Let's now try to look at this without focusing on the developers and ask: could it be the tools that they use? Do the programming languages used enforce efficient use of the database, and encourage good performance and security practices? It is well established and proven that PL/SQL has this built into it. It can take more effort to use dynamic SQL than to use a cursor with bind variables (try retrieving a large set of rows using the execute immediate statement). But what about other tools like Java, PHP, Perl, C and VB?
Do they have built in constructs that make it easy to use bind variables? Do they encourage good access to the database? I don't believe so. I have programmed in all these environments, and each has great strengths and capabilities, but all lack any smarts for dealing with a database. Most are database agnostic and all encourage inefficient database access. If you are not convinced on this point, there is a large amount of information on all the features built into PL/SQL which tightly integrate it with the database. Once you see these features, you realize what the other programming languages are really missing.

Is this article just a propaganda tool pushing PL/SQL? It might look that way, but I would rather focus on the core issue raised: why do we still have to focus on bind variable usage? Why aren't we focusing on the culture and changing it to ensure that these tools, when used, are used efficiently? They can be. It takes more effort, but it is possible.
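As a minimal sketch of the difference (the EMPLOYEES table and its columns are illustrative only, not from any particular system), the anonymous PL/SQL block below puts the injection-prone concatenation style next to the two bind-friendly styles the language makes natural:

DECLARE
  l_input VARCHAR2(20) := '7839';   -- imagine this value arrived from a URL
  l_name  VARCHAR2(100);
BEGIN
  -- Dangerous: concatenation builds new SQL text on every call, defeating
  -- cursor sharing and leaving the statement open to SQL injection.
  EXECUTE IMMEDIATE
    'SELECT last_name FROM employees WHERE employee_id = ' || l_input
    INTO l_name;

  -- Better: a bind variable keeps one shared cursor, and the input can
  -- never be re-interpreted as SQL.
  EXECUTE IMMEDIATE
    'SELECT last_name FROM employees WHERE employee_id = :id'
    INTO l_name USING l_input;

  -- Simplest: static SQL in PL/SQL binds the local variable automatically.
  SELECT last_name
    INTO l_name
    FROM employees
   WHERE employee_id = l_input;
END;
/

Notice that the safe versions are no longer than the unsafe one - which is exactly the point about PL/SQL encouraging the right habit by default.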


Are bind variables and SQL code injection the only issues? Interestingly, no. 15 years ago the issue of SQL code injection was seen as a trivial one to address. A good DBA would spot the issue when doing a code review and get the developer to fix it. In today's environment, this level of code review rarely happens. Most DBAs wouldn't even know where to look. Once again, a development culture that is not established correctly is leading to security and performance issues.

So are there other issues? Yes, and there are more serious ones. In particular, how do you stop someone who is correctly authorized from modifying the URL (or changing a POST) and accessing data they are not allowed to see? This isn't SQL code injection but standard URL modification. How do you stop someone from invoking a procedure call from the URL line that they are not meant to be able to call? How do we secure Web Service calls and ensure that all data and requests passed to the database are valid? How do we prevent denial of service attacks, or stop data flooding (when a large number of legitimate requests are made which fill the database with useless data, e.g. someone adds a million items to their shopping basket)? There are more important and more serious issues that need to be addressed, and being forced to keep reiterating the basics is very disappointing. It shows that the development community is locked into a world view that prevents it from seeing out. It needs to change.

As for Middleware, I will finish with the observation that most sites do not require Middleware. Scalability can be achieved by efficient database programming and by ensuring the database logic and objects reside next to each other. Middleware only offers solutions for sites that require serious scalability or have legacy systems. Generally, sites using a Middleware layer will produce a culture that encourages inefficient programming habits, because they insist on hiding the database from the programmer (who might not even know what the database is) and adopt the mentality that the bandwidth between the middle tier and the database is infinite. This will not lead to a scalable application, but rather the opposite. Only when used correctly and intelligently will a 3 tier architecture scale correctly. This topic is worth further discussion, which I am sure will happen. Oh, and yes, there are really good solutions to the issues raised above, but that is for another time.
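To make the URL modification problem above concrete, here is one minimal sketch of the kind of server side check involved - not a complete solution, and the ORDERS table, its OWNER column and the get_authenticated_user function are hypothetical stand-ins for however a site resolves the logged-in user:

-- Hypothetical guard in a web-callable procedure: never trust an id
-- arriving on the URL; re-verify ownership against the real session user.
CREATE OR REPLACE PROCEDURE show_order (p_order_id IN NUMBER) IS
  l_count PLS_INTEGER;
BEGIN
  SELECT COUNT(*)
    INTO l_count
    FROM orders
   WHERE order_id = p_order_id
     AND owner    = get_authenticated_user();  -- placeholder session lookup

  IF l_count = 0 THEN
    -- The id was tampered with, or the order belongs to someone else:
    -- fail closed rather than leak data.
    RAISE_APPLICATION_ERROR(-20001, 'Not authorized for this order');
  END IF;

  -- ... safe to render the order from here on ...
END show_order;
/

The design point is that the authorization decision lives next to the data, inside the database, where URL tampering cannot reach it.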


Appendix C: A very dangerous direction: Reviewing SOA with Web Services.

This is such a potentially serious issue regarding performance that it deserves a mention in its own right. Like foreign keys, the move to SOA is now progressing down a path where it is considered a trend and the new direction. SOA is a double edged sword. Its theoretical side is a good direction to be heading in. By using Web Services it becomes possible to integrate systems in different companies and in different places of the world. It allows a new direction where data can be shared between companies (whether that is good is another topic), and I believe it is a healthy and good way to go.

The potential for catastrophic failure is there if the SOA architecture is misused, and misused it will be. This is guaranteed to happen because the lessons from history say it will happen. Within five years we will be reading papers and going to conferences where the talk is how to tune SOA and get it working, because it runs so slow and doesn't scale. Consultants from major companies will be brought into sites to explain how to develop a scalable SOA. It will be a new industry. The reason why I highlight the potential for catastrophic failure, and not just your run of the mill failure, is that in SOA the DBAs are really out of the loop on the design side. They can only be brought in when it is shown not to scale, and all they will be able to say is, "I can't fix it". The reason is that with SOA, compared to other architectures, scalability and performance hinge on the network bandwidth.

The bleak picture that I am going to paint is that the Web Services used by SOA are going to be automatically generated by tools that have no notion of performance and assume infinite bandwidth. When you factor in that the bandwidth on the Internet is not only not infinite but rather slow, the methodology required to implement SOA requires a complete change in thinking. Tools that automatically generate Web Services should be banned from an organization. Do not use them. Avoid them. Unfortunately, managers and developers will fall for the allure of tools claiming they can generate these services quicker than it takes to ask the DBA if they are any good. This will happen because it has happened in the past. How many times have we heard companies claim that the tool they have allows rapid development? I recall this claim was made with Forms 2.0, then 2.3, and a complete set of tools after that, some of which don't exist any more. When you see product managers showing how quick it is to create a one to many relationship on a screen using point and click, and then claiming it's the new best thing, you realize there is no hope; it will just happen. No amount of warnings, no amount of shouting and going don't, don't, will help. The big companies will push it and they will not care about performance (I must acknowledge a glimmer of hope that exists with the Google philosophy, where they acknowledge the performance issue and have made performance a core part of their user usage strategy. This is seen in Chrome and the new Google O/S). The potential for achieving abysmal performance is now assured as we move down this path. The ray of sunshine is that the companies providing early Web Services have now realized this and are working to build intelligent Web Services that scale. They have to; their business depends on it. Now here is the important point.
Scaling with Web Services doesn't involve getting a faster box, putting more memory in it, or even moving to a cluster. As mentioned, the bottleneck, the slow point, is the network. Even though performance on the Internet is improving, the rise in spam and multimedia downloads is going to grow to saturate that improvement. Achieving scalability with Web Services involves a multi-pronged attack, and it involves building smart, compact Web Services. It's a very simple solution, but it requires hard work to achieve.

The current thinking strategy is that if we have a schema with tables, all I need to do is just put Web Services over them, and that's that. The first round of automatic Web Service generating tools will do just this. The end result will be a set of Web Services that are hard to use and understand, and which will require a lot of calls to achieve simple requests. The round of automatic generation tools after that might move up a layer and allow a PL/SQL or Java program call to be exposed as a Web Service. This is still going to fail in achieving scalability. It is also going to fail in usability and in writing efficient Web Service calls. At this point we will start to see automatic generation tools use WSDL to talk automatically with each other.


With the automatic generation of WSDL we can be assured that the problem of performance is now going to be hidden behind the scenes and near impossible to fix. The solution will not be to get a better, faster network; the result is a core component failing abysmally because it cannot be accessed due to insufficient network bandwidth. With Internet applications you can't demand that the customer gets a faster network to use your interface. That is naive thinking and will result in lost business.

The realization of why this is a problem can be explained by looking at this scenario. Let's say a Web Service is built and, because it is automatically generated, there is a lot of junk surrounding it, as well as comments embedded in it and obtuse words. A snippet of the call might look like:

  <mylargenamespace:averylongtagname>T
  <mylargenamespace:moredatarepeated>value
  <mylargenamespace:moredatarepeated>value

What we will see happening is programs invoking Web Service calls that return a large amount of information when only a single value was needed. In the above example, let's say all I was interested in was the first value. If a Web Service call is 100K in size, and one person requests it, then the amount of data to return is 100K plus overhead. The time to retrieve it grows to handle TCP/IP packet management, and additional overhead will likely be incurred as packets go through firewalls and are inspected. Let's say 100 people do this call. That is then 100K x 100. But what is more realistic is that the program will not be doing one call, but might be doing 10 to 100 in the one page display, with each request doing a different call. It might get more complex if there is a one to many relationship, and a call is requested for every result on the many side. In Forms, when this happens, the DBA might have a chance of detecting and possibly tuning the situation, but with Web Service calls it is going to be really difficult. So we now have 100K x 100 x 100 - roughly 1GB of traffic - and our network has reached capacity with only 100 users. What happens when we get to 1,000 or 10,000 or more? The architecture fails, and fails abysmally.

So to achieve efficient and scalable Web Services there are some core concepts we have to adopt and remember (a sketch of how rules 2, 4 and 6 might look appears at the end of this appendix):

1. Do not use automatic Web Service generation tools. Never use them. Ever.

2. Parsing XML does not scale linearly. A 10K Web Service could take 1ms to parse. A 100K Web Service will not take 10 x 1ms, but will take longer and consume more memory. It might take 15ms, and a 1Mb Web Service will take longer again; the growth can be worse than linear. This is when all the XML is parsed up front. If you point out that the DOM parser is different to the SAX parser, keep in mind the total time to parse: one parser uses the CPU cycles up front, the other as you navigate, but parsing the whole structure still takes time. The solution is to keep the Web Services as small as possible in data volume. In the above example, if the Web Service was written as

  <m a="value" b="value">T

the total size is dramatically reduced. Add an intelligent layer to it, and it could be possible to shorten it to just

  <m v="T"/>

but only if the application requested a compact version. Enable calling programs to request compact or detailed data views.

3. If there are two Web Service calls done in the one global page call, then combine both Web Services.
If a program has to make a call to retrieve data, and then make another call to retrieve a further set of data, it is more efficient to get both sets of data in the one call. This is where automatic Web Service generation tools fail. They cannot factor this in. It is just too complex to deal with, and it requires the skill of an intelligent programmer.


4. If one value (or a small subset) is required to be returned, then that is all that should be returned. Do not send back junk data unless the calling program requests it.

5. If the server can do the smarts efficiently, then let it do them. Is it best to return data to the program unsorted, because sorting is seen as an extravagant waste of CPU cycles, or is it better to send it back sorted? It is better for it to be returned sorted, because scalability occurs at both the client and the server end. If the program has to make a call and then run its own sort to deal with the result, that is just not efficient. This moves into the area where the Web Services generated must take into consideration the performance and usability of the calling program. The mentality of SOA drives home the point that we are providing a service and you either live with it or else.

6. Prevent large volumes of data from being returned unless requested. This is equivalent to the optimizer modes ALL_ROWS and FIRST_ROWS. If the calling program only wants an initial set of rows, then return a small set. Don't employ the strategy of "here's all the data, you deal with it". That mentality lends itself to massive volumes of data being returned and saturating the network. Sometimes it is efficient to return a large set of rows and avoid repeated calls; in that case, compact the data.

Review these rules every 12 months. These concepts might be new to some, but they are obvious to anyone programming against Web Services, and they are a blessing when sites support them. Adopt these concepts and you will achieve scalability of your application on the Internet using SOA. And remember, SOA Web Service usage is a double edged sword. You can either get great performance results by doing the hard work, or suffer catastrophic failure by adopting the easy option and using automatic generation tools.

There are methods that can be used to create an environment encouraging efficient Web Service creation (ones that establish a healthy culture focused on network efficiency):

A. Set up the environment so all developers are forced to work at modem speed for all Web Service calls. Follow the rule that ten concurrent users must be able to run the application over a shared modem line and achieve good response times. This will ensure that if inefficient calls are built, the developer will immediately know about it and feel the pain.

B. Set up a virtual charge system for the developers. Allocate a virtual salary to the developers. For every Web Service call, deduct $100 from their salary. For every 1K of data beyond a 10K call, charge $10. Treat multimedia data as a separate category. At the end of the project, compensate the developers using the virtual salary as a guide. Use a reward system: if they build a proven, functional Web Service call that is under 1K in size, then they achieve a bonus.

These two simple methods will change the culture of development and will ensure that scalable Web Services are built. They will motivate the developers to treat the network not as an infinite resource but as a limiting one that must be respected.
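As promised above, here is a minimal, hedged sketch of how rules 2, 4 and 6 might be honoured at the database end. The ORDERS table, its columns and the element names are invented for illustration; this is one possible approach, not a prescription. A single PL/SQL entry point offers both a compact and a detailed view, so a caller that needs one value is never forced to download junk:

-- Hypothetical example: one service entry point honouring a compact flag.
CREATE OR REPLACE FUNCTION get_order_xml (
  p_order_id IN NUMBER,
  p_compact  IN VARCHAR2 DEFAULT 'Y'   -- callers opt in to the detailed view
) RETURN XMLTYPE IS
  l_result XMLTYPE;
BEGIN
  IF p_compact = 'Y' THEN
    -- Rules 2 and 4: short names and attributes keep the wire payload tiny.
    SELECT XMLELEMENT("o", XMLATTRIBUTES(o.order_id AS "id", o.status AS "s"))
      INTO l_result
      FROM orders o
     WHERE o.order_id = p_order_id;
  ELSE
    -- Rule 6: the verbose view is only built when explicitly requested.
    SELECT XMLELEMENT("order",
             XMLFOREST(o.order_id    AS "order_id",
                       o.status      AS "status",
                       o.customer_id AS "customer_id",
                       o.order_date  AS "order_date"))
      INTO l_result
      FROM orders o
     WHERE o.order_id = p_order_id;
  END IF;
  RETURN l_result;
END get_order_xml;
/

A caller that just wants to poll an order's status requests the compact form and receives a payload of a few dozen bytes; the detailed form exists, but only for callers that explicitly ask for it - exactly the ALL_ROWS/FIRST_ROWS distinction rule 6 draws.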

-------- 00 --------

