database
database systems
logically coherent collection of related data with inherent meaning, representing some aspect of the real world, and designed, built, and populated with data for a special purpose, with an intended group of users and preconceived applications in which these users are interested [elma] collection of persistent data used by the application systems of some given enterprise [date]
database
database management system
traditionally used to store text and numeric data nowadays, databases find application in practically all disciplines in which computers are used, as well as in emerging consumer electronic devices multimedia applications geographic information systems data warehouses, data mining, OLAP industrial and manufacturing systems enterprise resource planning/management systems Internet and e-commerce applications online gaming and entertainment applications
electronic databases are created and maintained using a database management system (DBMS) general purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications used to manage databases of varying size and complexity provides tools, interfaces, and data languages for users of different skills and technical backgrounds often used in developing database applications (alone or in conjunction with other programming environments)
database system
database system
electronic record-keeping system components: database and DBMS [elma/silb] database, DBMS, hardware, and users [date]
DBA and staff
database
data administrator DBMS application/tool developers
hardware
end users
1
evolution of database systems
evolution of database systems
driving factors need for program-data independence support for complex data types and structures increasing number of users with non-technical backgrounds advancements in hardware and software technologies
1960s: traditional file processing systems use of special purpose DBMS for large and complex ventures (e.g., the Apollo moon landing) first efforts at standardization (Data Task Force)
evolution of database systems
evolution of database systems
1980s: 2nd generation DBMS (RDBMS) introduction and widespread acceptance of the relational DBMS (RDBMS) in business and industry RDBMS led to the introduction of the Structured Query Language (SQL) 1990s: 3rd generation DBMS (object-oriented) era of client-server computing, multimedia, data warehousing, and Internet applications requiring more complex data abstraction mechanisms object-oriented DBMS, object-relational DBMS
continuing trends for 2000 and beyond distributed databases, Web-enabled databases, “universal servers” fusion of database and artificial intelligence (AI) technologies for more “natural” query interfaces as well as “intelligent” applications data mining on complex multidimensional data for enterprise-wide strategic planning and decision support systems proliferation of personal databases for PDAs and other consumer electronic devices
traditional file processing
traditional file processing
Student Accounts System
Student Accounts
Accounting Personnel
Student Student Information System Registrar
Subject
Grades
1970s: 1st generation DBMS (legacy systems) commercial advent of DBMS in industries requiring the management of complex data structures introduction of the hierarchical DBMS and the network DBMS, characterized by record-at-a-time processing, limited data independence, and no widely accepted theoretical foundation
inherent difficulties program-data dependence data redundancy and associated problems of data consistency and integrity limited data sharing lengthy and costly application development and maintenance data accessibility problems, especially for nontechnical users proprietary data formats and constraints data security issues
2
database systems
benefits of database systems
Student Accounts System
Student DBMS
Accounting Personnel
Accounts Subject Grades
Student Information System Registrar
Student Monitoring System
Database
data independence (i.e., the immunity of application programs to changes in the way data is stored and accessed) controlled and minimal data redundancy (leading to improved consistency, integrity, and quality of data) provision of metadata (through the data dictionary or system catalog) allows for the “self-describing” nature of databases improved data sharing enforcement of standards and data constraints (such as data formats, business rules, “active” rules)
Guidance Office
benefits of database systems
costs and risks
data security (user authorization and authentication, encryption) improved data availability and reliability (e.g., distributed databases) improved data accessibility and responsiveness (variety of interfaces for different users, powerful tools for data manipulation) increased productivity of application development and reduced costs of maintenance “economies of scale”
hardware and software costs installation, management, and maintenance cost and complexity need for specialized personnel data conversion and/or migration costs organizational conflicts regarding the “ownership” of, privileges on, and priority of access to, shared data
the database approach..
database system users
.. is as much a way of managing organizational data.. .. as it is a set of related technologies for creating and maintaining the said data.. thus.. .. poor organizational planning or database design and implementation.. .. may hinder the realization of its potential benefits.
database administrators (DBAs) and staff, data administrators [date] application developers database and interface designers, software engrs, system analysts, programmers end users casual end users, naïve/parametric end users, sophisticated end users, stand-alone end users other users dbms system designers and implementers, tool developers, operators and maintenance personnel
3
dbms languages and interfaces
dbms languages and interfaces
data definition language (DDL) storage definition language, view definition language
privileged DBA interfaces programmer and application developer languages and interfaces precompiler approach, procedure call approach, object-based approach end user interfaces menu-based, forms-based (GUIs), natural language query interfaces, wizards-based, Web-based report generation languages and interfaces
data manipulation language (DML) retrieval, insertion, deletion, update stand-alone query language, data sublanguage embedded in a host language in typical modern DBMSs, these languages are not distinct languages; rather, a comprehensive integrated language is used (e.g., SQL)
typical dbms components
misc dbms utilities and tools
export and import utilities backup and recovery tools file and database structure reorganization utilities report generation utilities performance monitoring and tuning utilities CASE tools application development environments communication software
database schema
database instance
description of the database with respect to the structures that are used to store the data, the integrity constraints defined on the data, as well as operations on the data usually specified during the database design phase using a data model changes infrequently, if at all a.k.a. database intension
actual data contained in the database at a particular moment in time changes with each update that is made to the data in the database a.k.a. database extension, state, or snapshot
4
DBMS architecture the three schema (ANSI-SPARC) architecture is a proposed standard to facilitate data independence and support multiple user views of the data in the database levels of the three schema architecture conceptual (community view) level external (user view) level internal (physical) level
three schema architecture end users
external level
external view
external view
external/conceptual mapping
conceptual level
conceptual schema conceptual/internal mapping
internal level
internal schema
stored database
three schema architecture
three schema architecture
conceptual (community view) level uses a conceptual schema to describe the overall structure of the organizational data stored in the database from the perspective of a community of users hides details of physical data structures; focuses on the description of data entities, data types, relationships, user operations, and data constraints usually specified using a conceptual data model or an implementation data model e.g., an E-R diagram or a relational database scheme representing an organization’s database
external (user view) level uses a number of external schemas or user views to provide a logical description of some portion of the database that is required by a specific user to perform a task contains only a subset of the associated conceptual schema relevant to a particular user or group of users (other parts of the database are “hidden”) usually specified using the same data model used in the conceptual schema e.g., report for a manager, screen display for a clerk, account transfer transaction for a bank client
three schema architecture
data independence
internal (physical) level uses an internal schema to describe the physical storage structures of the database, with complete details of data storage and access paths specified using a physical data model e.g., physical characteristics of tables, indexes, file placement in secondary storage, physical storage device characteristics
the three schema architecture enables a DBMS to support data independence by providing mappings between schemas application programs refer to the external schemas, and the DBMS takes care of the necessary mapping to the conceptual and internal schemas whenever changes occur in a lower schema, only the mapping between the said schema and a higher level schema need to be changed
5
data independence
data models
logical data independence is the capacity to change the conceptual schema without having to change the external schemas and their application programs physical data independence is the capacity to change the internal schema without having to change the conceptual schema
a data model is a collection of concepts used to describe a database structures used to hold the data (data types and the relationships among data items) integrity constraints on the data operations permitted on the data categories of data models conceptual (high-level, semantic) models physical (low-level, internal) models implementation (representational) models
data models
data models
conceptual (high-level, semantic) models uses concepts that are close to the way many users perceive data a.k.a. entity-based or object-based data models physical (low-level, internal) models uses concepts that describe details of how data is stored in the computer’s secondary storage implementation (representational) models falls between conceptual and physical data models, balancing user views and storage details a.k.a. record-oriented data models
hierarchical data model organizes data in a tree structure implemented in a joint effort by IBM and the North American Aviation (Rockwell International) in 1965 resulted in the IMS family of systems; popular from the late 1960s through the 1970s network data model based on mathematical set theory formally defined in 1971 by the CODASYL implemented in a large variety of systems such as IDMS (Cullinet), DMS 1100 (Univac), IMAGE (HP), VAX-DBMS (DEC)
data models
database design
relational data model (RDBMS) organizes data as relations (tables) formulated and proposed in 1970 by E.F. Codd (IBM) currently implemented in several commercial products (e.g., Oracle, Sybase, Informix, CA-Ingres, MySQL, MS SQL) object-oriented data model (OODBMS) added database functionality (i.e., object persistence) to OOPLs such as C++, Smalltalk, and Java e.g., ONTOS DB, VERSANT, ObjectStore, GemStone object-relational data model (ORDBMS) hybrid RDBMS and OODBMS approach
functional requirements functional analysis high-level transaction specification
requirements collection and analysis
miniworld
database requirements conceptual design conceptual schema (high-level data model) logical design
application program design
transaction implementation
conceptual schema (implementation data model) physical design internal schema
application programs
6