DATABASE SYSTEMS ANALYSIS AND DESIGN
E.E. Tozer, Software Sciences Limited April 1976
ABSTRACT Developments in the database field have tended to emphasise programming technology, with a dearth of accompanying progress in systems analysis and design methods. This paper puts forward an overall view of system design which is intended to act as a constraining framework. It is based upon a pragmatic approach and is presented in a form which could be (and is being) used on large scale implementation projects. Orderly analysis and design procedures are encouraged. The taking of premature design decisions is discouraged, especially through the recognition of three distinct views of data: conceptual, implementation and storage, and through recognition of distinctions between design of each of these, specification of mappings between them, and design of programs and run sequences. It is envisaged that specific procedures developed elsewhere (see references) could be incorporated into the methodology described here.
Software Sciences Limited Abbey House 282/292 Farnborough Road Farnborough,
Hampshire
Telephone:
44321
Telex:
858228
194
i.
INTRODUCTION As the computer systems of an organisation develop, there is an increase in the degree of overlap between originally distinct systems.
Further,
increased confidence and familiarity leads
to the individual systems being enriched in power and sophistication.
The combined result is a vast increase in the
complexity of the system design task, at that very point in time when greater reliance is being placed upon the performance,
reliability and accuracy of the computer systems.
The programming aspects of this situation have been recognised for some time, and have been tackled with partial success in the form of "Database Management" systems analysis
systems.
and design procedures
both by the complexity of requirements, developing programming technology. overall view of the processes
Traditional
are being outstripped and also by the
This paper proposes an
involved in analysis and design,
with particular emphasis upon the data, as opposed to the processing aspects.
Different, partially overlapping,
aspects
of this field have been tackled in a theoretical manner by Bramhill
& Taylor
many others.
(1975), Brown
(1974), Robinson
(1974), and
Whether or not a theoretical basis exists for
the taking of a design decision,
a system designer "in the
field" has to take that decision,
and he had to live with the
results. Rather than offering precise formulae for each and every stage of the progress,
this paper puts forward an overall framework,
which identifies
important stages,
distinctions between them.
and draws the necessary
This fremework is most important,
as it ensures that premature design decisions are avoided, and gives direct guidance as to the actual sequence and purpose of particular analysis and design procedures. intention to over-constrain
There is no
the particular techniques or
their variants which individual practitioners would embed in this framework;
the only criterion is that the technique
should be adequate for the purpose, overall methdology.
However,
and consistent with the
in the interest of being as
specific and as helpful as possible,
descriptions of, or
195
references to, suitable techniques are included where appropriate.
The analysis and design approach proposed is seen as being applicable to DP systems in general.
Only in the latter
stages does it become dependent upon use of a particular DBMS.
Projects using conventional files or non-Codasyl
DBMS would benefit considerably from adoption of the design approach put forward, with appropriate variants in the latter stages. Section 2 explains the overall viewpoint taken, which is that of viewing systems analysis and design themselves
as a system.
This system is explained in detail in section
3, and each of
the main processing functions is examined in turn in more detail in sections 4 to ii. Except where it is explicitly stated otherwise,
flowcharts
in
sections 2 and 3 show data flow:-
Signifies a data container
(file,
document etc).
name
J
Signifies a process.
Signifies data flow, which may only be from container to process or process to container.
2.
VIEWS OF DATA
There are several distinct views of data which are relevant to the process of analysis and design.
This view is not
coincident with that adopted by ANSI SPARC/DBMS Study Group (1975). The areas o f classification chosen are:-
196
(i)
Level of a b s t r a c t i o n a)
Conceptual The v i e w of data h e l d by the organisation.
This
v i e w is p u r e l y a f u n c t i o n of the m o d e of o p e r a t i o n of the o r g a n i s a t i o n and the p o l i c i e s of its managers.
It is i n d e p e n d e n t of the e x i s t e n c e of
any c o m p u t e r systems,
and s h o u l d be e x p r e s s e d in a
form m o s t s u i t a b l e to e n d - u s e r s b)
in the organisation.
Impiementatign All or some subset of the c o n c e p t u a l d a t a v i e w m a y be u s e d by c o m p u t e r systems.
The i m p l e m e n t a t i o n
v i e w is a r e p r e s e n t a t i o n of that subset. d e s i g n e d and e n c o d e d into m a c h i n e - r e a d a b l e this purpose.
form for
The i m p l e m e n t a t i o n v i e w remains as
i n d e p e n d e n t as p o s s i b l e c)
It is
from c, the storage view.
St0rage A c t u a l data,
d e s c r i b e d to p r o g r a m s by b, m u s t be
h e l d on backing store in some form.
The storage v i e w
is a complete d e s c r i p t i o n of such d a t a on backing store.
(ii)
B r e a d t h of v i e w a)
Global,
or corporate.
The c o m p l e t e v i e w of data,
r e l e v a n t to all p r o c e s s e s
c a r r i e d out in the o r g a n i s a t i o n .
b)
Specific,
or functional.
The v i e w of data r e l e v a n t to a p a r t i c u l a r
function of
the o r g a n i s a t i o n .
(iii)
Existence W h e t h e r or not it is n e c e s s a r y for o c c u r r e n c e s of data items to exist in some p a r t i c u l a r form.
197
VIEWS OF DATA
FunctionalView
CorporateView
CONCEPTUAL
I
subset
I
~I
q CONCEPTUAL
I Selection/ design
design
subset
IMPL~TATION
I
Re-group~
I0N '~IIMPL~I~WfAT "SUBSCH~NA"
"SCHWA" /
I design
/
/
TRANSLATION
STORAGE
Fig.2.1
198
The c o m b i n a t i o n of these views of data is shown in Figure 2.1.
The terms S C H E M A and S U B S C H E M A are d e f i n e d by C O D A S Y L D D L C (1973).
They are used s u b s e q u e n t l y in this paper in the
sense shown in 2.1. All of the data d e s c r i b e d by the Storage v i e w m u s t actually e x i s t on b a c k i n g storage.
The d a t a d e s c r i b e d by the S U B S C H E M A
view must e x i s t on an a s - r e q u i r e d basis.
There is no
r e q u i r e m e n t that d a t a d e s c r i b e d by the other 3 views should p h y s i c a l l y exist at all. "translation"
D a t a b a s e access is e x p r e s s e d by the
arrow b e t w e e n the s u b s c h e m a and storage views.
The DBMS makes use of s p e c i f i e d m a p p i n g s b e t w e e n storage and schema and b e t w e e n schema and subschema, to carry out
(hopefully)
o p t i m i s e d translation.
Design should c o m m e n c e w i t h the c o n c e p t u a l view, to the storage view.
and progress
For any g i v e n c o n c e p t u a l v i e w it should
be p o s s i b l e to choose m a n y i m p l e m e n t a t i o n views.
For any
given i m p l e m e n t a t i o n v i e w it s h o u l d be p o s s i b l e to choose m a n y storage r e p r e s e n t a t i o n s .
A S U B S C H E M A v i e w is u n a f f e c t e d by any S C H E M A change which results in a new S C H E M A of w h i c h the S U B S C H E M A remains a subset.
C u r r e n t d e v e l o p m e n t s by the Codasyl DDLC include d i v i s i o n of S C H E M A DDL into a n u m b e r of categories. categories
Some of these
are r e l e v a n t to the d i s t i n c t i o n b e t w e e n i m p l e m e n t a t i o n
and storage views of data.
The m e a s u r e m e n t ,
tuning,
resource
a l l o c a t i o n and storage c a t e g o r i e s b e t w e e n them relate to the storage v i e w of data.
The CODASYL DDLC D B A W G
(1975) have p r o p o s e d a m o r e
complete
s e p a r a t i o n of the storage v i e w into a Storage D e f i n i t i o n Language
(DSDL).
199
3.
THE ANALYSIS AND DESIGN PROCESS
3.1 Introduction This section outlines the method of analysis and design which is discussed in more detail in the remainder of the paper. Figure 3.1 shows the process in flow-diagram form, and can be regarded as a guide to the paper. Analysis and design is iterative.
The need for iteration
pervades almost every data path shown in 3.1.
It is not shown
explicitly because to do so would make the diagram excessively complicated. 3.2 Relationship betwee n Anal[sis and Design Traditionally, where system design is recognised as taking place at all, it is regarded as a process which occurs after analysis is complete,
and before programming commences.
It
is more useful to regard both analysis and design as each taking place at a n u ~ e r
of different levels of refinement.
At each level, the order is analysis or some blend of the two) requirements,
(or hypothesis
leading to definition of functional
leading to design of a process to meet those
requirements.
During the design process, refinement,
at each successive stage of
additional opportunities may be realised,
and the
unpleasant truth may emerge concerning the severity of some constraints.
Either at one stage, or through a number of
stages, iteration may take place, by means of presenting the originator of the requirements with the highlights results of the design process, to modify the requirements
of the
and offering the opportunity
in the light of these.
3.3 The Nature of Design The design process consists of two distinct stages: of a range of possible solutions to a problem, from the range of a best-fit,
generation
and selection
according to a pre-defined set
200
of criteria.
Failure to achieve a g o o d enough solution may
result in i t e r a t i o n i n v o l v i n g a l t e r a t i o n of some of the initial conditions
and r e p e t i t i o n of both stages.
3.4 D e v e l o p m e n t Stages
The p r o c e s s e s d e s c r i b e d in 3.2 and 3.3 take place at each of a n u m b e r of stages, refinement.
representing successive
levels of
Some of these stages are r e l a t e d to the d i f f e r i n g
levels of a b s t r a c t i o n of v i e w s of data.
THEANALYSIS &DESIGN PROCESS THEORGANISATIONI iANDATAIs FUNNCTITINsAL~ ~~b,,~~ #"
! I
/ [!ALIDATION OFi DATAMODEL
I
DESIGN USER ~ INTEP~AC)
1
ND ~ ~ R U N DESG IA N
STORAGE SCHEMA STRUCTURE DESIGN
POP
SIMULATION OR LIVE RUNNING FIG.3-1
202
4.
DATA ANALYSIS
4.1 O b j e c t i v e s To a s c e r t a i n the objects
(concrete or abstract), w h i c h
are m e a n i n g f u l to the systems of an organisation. To d i s c o v e r the nature
and r e l e v a n t of all r e l a t i o n s h i p s
b e t w e e n such objects. To define the p r i v a t e language of the organisat~on. Thus to p r o v i d e a c o m m o n l y a c c e p t e d t e r m i n o l o g y w h i c h can be u s e d to specify systems and procedures. To i d e n t i f y each data item, different manifestations
and to d i s t i n g u i s h b e t w e e n
of the same item
(e.g. d i f f e r e n t
coded forms). To i d e n t i f y and e l i m i n a t e synonyms. To fully define, u n i f y and r a t i o n a l i s e To b u i l d a "where used"
index,
coding systems.
so that the
r a m i f i c a t i o n s of an a l t e r a n t i o n to a p r o c e d u r e or to the role of a piece of data can be easily explored.
4.2 T e r m i n o l o g y
and C o n v e n t i o n s
The m e t h o d of w o r k i n g p r o p o s e d takes the v i e w p o i n t that systems exist to serve the end-user.
Thus w h a t the e n d - u s e r wants,
and how he sees his e n v i r o n m e n t of information.
is the most i m p o r t a n t source
His p e r c e p t i o n of his role,
and his w o r k i n g
e n v i r o n m e n t is n e c e s s a r i l y p a r t l y subjective.
It follows that
the a p p r o a c h to data m o d e l l i n g needs to cope w i t h this subjectivity.
The c o r p o r a t e data m o d e l
is framed in terms of "things"
r e l a t i o n s h i p s b e t w e e n things.
and
The terms entity and
r e l a t i o n s h i ~ have b e e n adopted as being w i d e l y current for d e s c r i b i n g the c o n c e p t u a l v i e w of data.
E n t i t y is d e f i n e d as a person, place, to the enterprise.
a s u b j e c t i v e process. d i r e c t e d by e n d - u s e r s be meangful. entities, accepted.
thing or event of i n t e r e s t
As such, its i d e n t i f i c a t i o n m u s t be p a r t l y S e l e c t i o n of entities must be p r i m a r i l y of the p r o p o s e d systems,
to w h o m they m u s t
It is i m p o r t a n t to w r i t e clear definitions
and to ensure that these d e f i n i t i o n s
of
are g e n e r a l l y
203
It is necessary to distinguish between entity types and entity occurrences.
An example of an entity type is "Vehicle".
Occurrences of this type are: "Blue Ford Cortina XYZ 132K" "Red Honda, Motorbike ABC 789J" Entity types may be shown on data relationship diagrams as named rectangular boxes.
t VEHICLE J
e.g. Relationships
exist between entities.
(Use of the term
relationship does not imply any connection with the specialised terminology of the relational data model). Relationship may be 1 to i, 1 to many, or many to many where 1 or "many" refers to the number of occurrences of the entity type which may b e i n v o l v e d
in the relationship.
They can be shown in the following way:
J Eois ioI I
i to l
..[~0o~ ...
1 to many
PERSON
Vehicle owned
~I VEHICLE
]
(i.e. with the arrow-head on the many end)
~ i CLE k - - ~ ~OOATION ~O~I l" "ISERvIcE I
many to many
Very often, many to many relationships two one-to-many relationships.
e.!g.
or
I
VEHICLE
can be analysed into
~ Location~where it can be serviced
.......
Vehicles which can
~ L O C A T I O N FOR I SERVICE
. . . . . .
be serviced
VEHICLE
I
LOCATION SERVICE FOR I
Where~it can be serviced\ ~
/ V e h i c l e s which can be serviced
~E~VICING O7 A VEHIC~E | AT A LOCATION
204
may be more appropriate particular
application
representations
for the purpose
of a
system.
An entity must have at least one type of data value associated with it, and will usually have several.
These data items types
are termed attributes. For example
the entity type vehicle
types colour,
chassis-no,
registration-no,
seating
may have the attribute
engine-no, capacity,
date of purchase,
weight,
and possibly many
others. One or more of the attributes
of an entity may have unique values
which can be used to distinguish of the entity.
This attribute
between different occurrences
or collection
called the identit_~ of the entity. be identified by chassis-no, Participation
of attribures
For example,
vehicle may
or by registration-no.
of an entity in a specified
relationship
may be
~VEHICLE
I
optional. For example,
• F
in:
PERSON
I
Vehicles owned
a person may own no vehicles or in:
at all; REGISTRATION BOOK
~ VEHICLE
L a vehicle may be new and unregistered. In general, relationship
an entity has a condition
For a particular participates
associated with each
in which it may participate. entity occurrence
to true or false according
is
this condition
to whether
in the relationship.
evaluates
or not the occurrence
205
In one-to-many or many-to-many relationships, to attempt to quantify the "many". in which it participates
it is necessary
Thus for each relationship
in a "many" role, an entity has
associated with it a population.
Because data analysis itself
proceeds in a number of stages of refinement,
it is desirable
to permit the expression of this population in several forms, e.g. *
- many
m-n
- range
m,n
- average
n
- absolute value
The population may depend upon a specified population condition. For example,
in
PERSON
1
Vehicles °wned
J~
VEHICLE
I
1
the population
condition may be "any currently owned, or
owned during the past 5 years". There may be any number of relationships between entities. For example: Vehicles owne d Sold by f
R u n - o v e r by PERSON
VEHICLE Passenger in
.~
Previous owners Present occupants Shows only a small
range of
the known relationships between
person and vehicle.
For this reason, it is always important to write a clear definition of each relationship when it is identified.
206
Relationships
may exist between entities
~
e.g. VEHICLE Involved in ~ ' I accident w i t h ~ 4.3
of the same type.
ebuilt using omponents from
Method of Workin @
4.3.1 Develop and discuss
a series of draft entity diagrams,
showing only: Entity, 4.3.2 Maintain
Relationship/description.
in parallel
a working
set of entity and relationship
descriptions. 4.3.3 Commence
for each entity a list of important
4.3°4 Develop this material by examining of the most
important
interaction
of these by discussing
line management
application
at a sufficiently
a broad understanding
attributes.
in turn each of a series systems.
Check-out
the
the overall diagram with high level for there to be
of all the features.
4.3.5 Document what should happen,
rather than what actually
takes
place. 4°3.6 When entities identified,
and relationships
seem fairly stable and well-
refine and augment the data model by showing in
diagram form: .
Entities,
identifiers
Relationships,
description,
populations
and conditions.
Such a digram can be called an entity diagram or entity model. 4.3.7 Supporting
documentation
should now include:
Entity Definitions. Attribute
lists for entities°
Attribute
definitions,
Relationship
including meaning of coding systems.
descriptions,
An overall name-directory, and relationships,
populations
and conditions.
showing entities,
organised
attributes
for easy reference.
207
4.3.8 The complete set of information
required to document the
corporate data model is: Entity Data Name, description,
Definition
existence Ownership
synonyms,
rules
Where appropriate,
that division
of the
organisation
responsible
for all occurrences.
NB Ownership
is more often appropriate
specific occurrences;
this is an
application-dependent
requirement.
Statistics
Expected population
Relationships
Degree,
to
Overall growth rate Integrity
sequence,
nature/meaning
Privacy
requirements Archival
Availability Number of versions
or time-span
to be
covered Of these elements, approximate
only definition,
statistics
relationships
and
are relevant to initial formulation
of
the entity model. Attribute
Data
Definition
Name, description,
synonyms,
existence
data type and characteristics, structure,
permissible
rules,
coding
time-lag between
event and updating of the value. Ownership
(Where different
Statistics
from the entity).
Expected population Overall growth rate
Relationships
Cross-reference
to entities
Derivation/consistency External
formats
Inputs,
rules
outputs
Integrity
Privacy,
Controls
Default value, Tolerance
Availability,
Of this data, only definition the entity model.
Valida£ion
rules,
on accuracy.
is relevant to formulation
of
208
Relationship Data Populations
if l:n or m:n.
Conditions under which entity occurrrences particpate. Conditions governing populations. Description of the relationship.
5.
FUNCTIONAL ANALYSIS
5.1
Objectives To identify and define the requirements
for particular
application systems.
To produce for each system identified: i) ii)
A functional requirements
definition.
A definition of the data model appropriate to the application function.
iii)
A definition
of likely ranges for system performance
and usage traffic. This section concentrates on items 5.2
(ii) and (iii).
Relation to Other Processes This consists of definition of the processing requirements of each application function, conceptual data model. with data analysis,
and definition of the associated
This process proceeds in parallel
and close co-ordination
is necessary.
It is taken as read in this paper that initial selection of application areas is subject to rigorous examination by senior management.
A system is only developed if its value outweighs
the cost of having it;
in the
(normal)
optimum use of scarce resourcest urgencyt
situation of making
the systems having highest
and the most favourable value/cost ratios are of
course those developed.
209
5.3
Describ£ng the Functional Data Mode ! Each functional data model will be a subset of the corporate data model, with additional information added, concerning: -
-
-
-
patterns of usage; access paths; attribute subsets actually used; sequencing and selection criteria.
In detail, for each entity, it is accessed,
and for each function in which
information required is:
Attribute subset accessed
Frequency
How often, hit distribution
Turnaround
Time, tolerance
Access paths
Relationships used, key attributes, ordering,
selection/search
criteria
Traffic by
Retrieval rate, Creation rate, Modification
access path
rate, Deletion rate.
Integrity
(over and above those defined in data
Controls
analysis).
Of this data, attribute subset and access paths are relevant to refinement and validation of the entity model;
the
remainder have more bearing upon database and storage design. For each attribute,
and for each function in which the
attribute is used as a data item: Us age
Source, destination,
Traffic
Retrieval rate, Creation rate, Modification rate, Deletion rate.
usage mode
210
Input format
Decoding
Output format
Report heading~
Internal format
Base, scale, precision
Integrity
(Over and above those defined in
editing,
encoding
basic data analysis).
Controls Relationships
Derivation/Consistency
rules.
Format information is relevant to potential generation of text or code from the information if it is stored in a data dictionary system. 5.4
Functional
Data Model Diagrams
The entity diagram for the relevant subset of the corporate data model should be augmented to show: Access paths, sequencing and selection criteria, keys,
approximate populations,
approximate access traffic.
The remainder of the information specified in 5.2 should be documented in narrative or tabular form.
6.
VALIDATION AND REFINEMENT OF THE DATA MODEL
6.1
Purpose Although the specification of corporate and functional data models is co-ordinated, different reasons, ways.
their features are present for
and they may be inconsistent in several
For this reason they should be cross-checked rigorously.
211
It is also desirable, having d e v e l o p e d these models on a n e c e s s a r i l y p a r t l y s u b j e c t i v e basis, to apply r e l e v a n t formal m e t h o d s of analysis to the results.
In
this way it is p o s s i b l e
to find the s i m p l e s t versions of the data structures.
6.2
Stages of V a l i d a t i o n and R e f i n e m e n t
i)
Completeness Does the corporate data m o d e l contain features to a c c o m m o d a t e all r e q u i r e d functional data models?
ii)
Consistency Is the c.d.m,
self-consistent?
Is the c.d.m,
c o n s i s t e n t w i t h each f.d.m.?
Is each f.d.m,
iii)
consistent w i t h all the others?
Access paths Access paths and their relative importance should be r e p r e s e n t e d in the c.d.m.
R e l e v a n t features of an
access path are: -
Traffic for - add, modify,
-
Search keys
-
Selection criteria
-
iv)
retrieve,
delete.
P o p u l a t i o n from w h i c h s e l e c t i o n is to take place S e q u e n c i n g requirements
Normalisation Reduction of data structures to 3rd normal form is a p o w e r f u l tool for e l i m i n a t i o n of u n n e c e s s a r y complexity, and for finding the simplest p o s s i b l e form of those structures.
However,
such a form is not n e c e s s a r i l y the clearest
or the m o s t a p p r o p r i a t e either for e n d - u s e r s or for programming.
Hence after n o r m a l i s a t i o n has b e e n
carried out, it may be a p p r o p r i a t e to "de-normalise" the structures by recreation of h i e r a r c h i e s
and
r e p e a t i n g groups, w h e r e this can be firmly j u s t i f i e d on the grounds of usefulness
and clarity.
212
7.
SYSTEM DEFINITION
7.1
Objectives
This is the stage at w h i c h a p a r t i c u l a r system is d e s i g n e d at an o v e r a l l level.
C o n s t i t u t e n t w o r k units
are identified,
are e m b e d d e d in a control f r a m e w o r k w h i c h constrains
and
the
s y s t e m to operate in a m e a n i n g f u l manner.
7.2
I d e n t i f i c a t i o n of P r o c e s s e s
"Natural"
units of w o r k are i d e n t i f i e d from the r e q u i r e m e n t s
specification.
These units,
c o m p u t e r programs,
as yet, b e a r no r e l a t i o n to
but instead r e p r e s e n t the user's v i e w of
s p e c i f i c i n t e g r a l jobs w h i c h the system is p e r f o r m i n g for him.
E x a m p l e s of p r o c e s s e s are: p r o d u c t i o n of a report; .
the a p p l i c a t i o n of a s p e c i f i c
t r a n s a c t i o n to the
database; the computation,
e.g. of a sales forecast,
according
to a s p e c i f i c algorithm.
Characteristics
of p r o c e s s e s
are:
they operate upon data; outputs
thus process
inputs
and
are "logical records", w h i c h may be:
transactions,
reports,
units of the database,
t r a n s i e n t data item g r o u p i n g s
in memory.
Thus also W the o p e r a t i o n of the s y s t e m may be r e p r e s e n t e d by a data flow chart showing related p r o c e s s e s c o n n e c t i n g data paths.
and their
(Figure 3.1 in Section
3 is an
e x a m p l e of such a flow chart.)
W h e r e there is value
in doing so, p r o c e s s e s may be t h e m s e l v e s
s u b d i v i d e d into processes; of processes,
(e.g. c o n s i d e r i n g the example above
p r o d u c t i o n of a report may consist of
c o m p u t a t i o n of a n u m b e r of forecasts, some order).
and their ranking in
213
The subdivision of processes in a data-flow sense loses its values at a point where: the units of work lose meaning for the user; the relationships between the units of work is more strongly that of a procedural or scheduling nature.
7.3
Design of System Behaviour and Controls
Knowledge of the user's expectations of the system permits the design of a logical framework into which the system processes can be fitted, and which will control their operation in a secure manner.
Particular attention should be paid to useability of those system control parameters provided for users.
7.4
Specification of Processes
English plus decision tables is the most appropriate specification medium.
Data flow diagrams are also appropriate, but
procedural diagrams and charts should be eschewed except where vital, because they tend to impose premature design decisions.
Essential scheduling requirements between and within processes must b e identified and fully defined at this stage.
A strong
mandatory scheduling relationship between work units indicates that they should be regarded as part of a process.
Care should be exercised to avoid jumping the gun on detailed design work through inclusion at too early a stage of "how" decisions on mechanisms for achieving the results required of the processes.
However, certain "how" decisions are appropriate
at this stage:
e.g. selection of computational methods of
performing forecasting or optimisation calculations which are integral parts of the system.
Such decisions are necessary at
this stage because they affect the system's behaviour towards the user, and have an impact upon the data models.
214
7.5
Specificati0nof
Constraints
The value of each process to the system should be ascertained; some p r o c e s s e s justified.
are optional,
and i n c l u s i o n of these has to be
For each function performed,
a l t e r n a t i v e options~ e.g.
simple or sophisticated.
Some form of ranking b a s e d on priority, s h o u l d be carried out,
there may be
cost and f l e x i b i l i t y
in o r d e r to select the actual
c o n s t i t u e n t s of the first version of the system, the g u i d e l i n e s
and to lay
for s u b s e q u e n t d e v e l o p m e n t phases.
T o l e r a n c e s of ranges should be d e f i n e d for: a c c u r a c y of results, performance,
c o n s u m p t i o n of resources
(both d e v e l o p m e n t and operating),
scheduling.
8.
P R O G R A M AND RUN D E S I G N
8.1
Objectives
The p r i m a r y aim of this stage is to select suitable groupings of s y s t e m p r o c e s s e s , sequences.
to be formed into p r o g r a m s and p r o g r a m
The r e s u l t i n g w o r k - u n i t s
are i n t e n d e d to m a k e
e f f e c t i v e use of the resources of the c o m p u t e r by,
for example,
a v o i d i n g u n n e c e s s a r y r e p e a t e d t r a n s f e r of data, w h i l s t at the same time m e e t i n g the p e r f o r m a n c e
8.2
requirements
specified.
Method
E a c h s y s t e m process should have defined:
i) ii) iii)
iv)
Usage frequency; D e s i r e d t u r n a r o u n d time; Data access requirements,
i n c l u d i n g sequencing;
S c h e d u l i n g r e l a t i o n s h i p to - date/time, -
o t h e r processes.
215
There is a t r a d e - o f f to be e x e r c i s e d b e t w e e n k e e p i n g p r o c e s s e s separate, w h i c h makes i n e f f i c i e n t use of the computer, but w h i c h retains flexibility, and the binding t o g e t h e r of p r o c e s s e s into p r o g r a m s w h i c h can m a k e m o r e e f f e c t i v e use of the computer, but W h i c h w i l l need to be r e d e s i g n e d if new p r o c e s s e s are introduced.
Stages of the p r o c e d u r e are: i)
Choose p r o g r a m groupings a c c o r d i n g to: I/O - Database access -
Non database "batch" I/O; TP m e s s a g e h a n d l i n g
S c h e d u l i n g Functions Common service functions L i a b i l i t y to be invoked in a c o - o r d i n a t e d m a n n e r (i.e. function occurrences part of the same process). Close similarity of a s s o c i a t e d functional data models Similarity of scheduling requirements - e.g. end-month. ii)
Specify i n t e r - p r o g r a m s c h e d u l i n g relationships;
iii)
Specify overall frequency and scheduling requirements;
iv)
Specify the data-access needs of each p r o g r a m in terms of: -
-
-
-
subset of the schema data model, access paths, sequencing, selection criteria, frequency of access.
9.
S C H E M A DESIGN
9.1
Objective
The aim is to design an e f f e c t i v e i m p l e m e n t a t i o n v i e w of the corporate data model.
The e f f e c t i v e n e s s is judged in teinns of:
Clarity and a p p r o p r i a t e n e s s The p r o v i s i o n of facilities to enable the definition of e f f e c t i v e forms of the n e c e s s a r y sub-schemas. The schema's s u i t a b i l i t y as a basis for the d e f i n i t i o n of an e f f i c i e n t storage structure.
216
9.2
Relationshi~
The details the n a t u r e
Between
of the m e t h o d
of S C H E M A
for S C H E M A
of the f i l e - h a n d l e r
if a c o n v e n t i o n a l distinctions
and S U B S C H E M A
design
or DBMS
file-handler
are d e p e n d e n t
in use.
is used,
upon
In p a r t i c u l a r ,
the drawing
of m e a n i n g f u l
between
SUBSCHEMA
-
SCHEMA
-
-
Storage
may prove
difficult.
The s i m u l a t i o n is d e s c r i b e d
In this to use
The
Design
of s u b - s c h e m a
section~
design methods
in C o d a s y l - t y p e
DBMS.
Schema m u s t be d e s i g n e d
be designed.
This
SUBSCHEMA
the SCHEMA.
At the
Hence
it is m o s t
When m o d i f y i n g SUBSCHEMA
but w h i c h w o u l d
already
one of the m a i n of those
in r e s p o n s e
care must
can
faci$ities
present
inputs
in
to S C H E M A
SUBSCHEMAS.
design
cycles
of
before
care
the S U B S C H E M A S
to a l t e r e d
be e x e r c i s e d
other existing
to take g r e a t
leave
subschemas
of s t r u c t u r a l
that there w i l l be several
the S C H E M A
invalidate
it is n e c e s s a r y which would
1973).
before
to those
and s u b s c h e m a
are a p p r o p r i a t e
are achieved.
requirements,
which would
choise
of the needs
schema
designs
the
is limited
likely
between
satisfactory
file h a n d l e r
are p r o p o s e d w h i c h
in p r i n c i p l e
same time,
is a d e f i n i t i o n
iteration
a conventional
(CODASYL DDLC
is b e c a u s e
in a Codasyl
design
using
in 10.2.
or n e w
to a v o i d
SUBSCHEMAS.
to avoid
subtle
changes,
and DML s y n t a c t i c a l l y
a l t e r the semantics.
changes
In p a r t i c u l a r ,
valid,
217
9.3
M e t h o d for Design of a CODASYL SCHEMA
9.3.1 T r a n s l a t i o n of the entity r e p r e s e n t a t i o n of the corporate data m o d e l involves e x e r c i s e of the following choices: R e p r e s e n t a t i o n of entities: -
-
as record-types, as several record types, a s s o c i a t e d in some way, as group items w i t h i n a record type.
R e p r e s e n t a t i o n of relationships: -
-
as set types, as several set types, p o s s i b l y w i t h a d d i t i o n a l record types, e.g.
as is r e q u i r e d for m a n y - m a n y
relationships, -
as repeating groups, as loose associations of records, e.g. by common search keys.
A s s o c i a t i o n of attributes with entities.
9.3.2 The information p r o d u c e d in the design process can be c l a s s i f i e d by : a)
To be encoded in schema DDL i) ii) iii)
R e l e v a n t to structure e.g. RECORD,
SET, A R E A
R e l e v a n t to m o d e of use e.g. ORDER, KEY R e l e v a n t to storage design e.g. L O C A T I O N MODE, ACTUAL, V I R T U A L
b)
To be d o c u m e n t e d f o r reference by application programmers. e.g.
Record occurrence rules Record r e t r i e v a l modes Record s e l e c t i o n rules P r o c e s s i n g n e c e s s a r y to m a i n t a i n c o n s i s t e n c y w h e n each record is STORED, M O D I F Y e d or DELETEd.
A subset of the type a) information can u s e f u l l y be shown in d i a g r a m form.
Successive, not n e c e s s a r y complete,
forms of
such a d i a g r a m can be used as w o r k i n g notes during the design process.
218
9.3.3 Choice
of Records
The m o s t b a s i c the r e c o r d
and Sets
choice
types
in S C H E M A
design
is ~'what are to be
and set types".
Stages: i)
Make each e n t i t y - t y p e attributes Represent
ii)
relationships
Many-to-many will y i e l d a CODASYL
a record/set DBMS.
It is p o s s i b l e from stages
This
into
making
between
which
alternative
several (ii).
There
s h o u l d be c a r r i e d
for c h o o s i n g
between
them.
(i) above,
is i n v a l i d
by b r e a k i n g
two o n e - t o - m a n y
that
all
the record.
as set types.
as in
"intersection"
4.2.
structures
will
is no h a r m in this;
forward until
The design
and are n o w subject
in
any such
relationships,
use of an a d d i t i o n a l in Section
draft"
entities
structure
as i n d i c a t e d
(i) and
including
treated
is o v e r c o m e
alternatives
a "first
type,
data items w i t h i n
relationships,
relationship possibly record,
a record
of the e n t i t y
there
the
are grounds
(or designs)
to a n u m b e r
result
represent
of stages
of
refinement.
iii)
Where have
different different
subschemas,
access
items
into
iv)
Where~
within
variable
the e l e m e n t s
records
of a set,
record types
occurrences
structure type.
access
record
in d i f f e r e n t
paths),
types
there
consider
are redundant,
is a r e p e a t i n g
population,
of the r e p e a t i n g
group
the DBMS
group
it m a y be b e t t e r the m e m b e r
responsible
for
problem.
are v e r y
are c o m m o n d l y
be w o r t h w h i l e
are used
record types.
and thus make
the space m a n a g e m e n t
of an entity
in some way.
a r e c o r d type,
to make
Where
to these
consistent
w h i c h has a larger
v)
(i.e.
different
2 or m o r e
common
and m u s t b e kept
of the a t t r i b u t e s
needs
a n d / o r have
their d i v i s i o n
NB Data
subsets
and several
accessed
to c o n s o l i d a t e
is r e p r e s e n t e d
small,
together,
record
by g r o u p
types,
record
then
it may
so that the
items w i t h i n
a record
219
vi)
Where a r e l a t i o n s h i p is used rarely or not at all it may be that the overheads i n v o l v e d in sets are not justifiable. eliminated.
In this case, Instead,
the set-type can be
the record o c c u r r e n c e s can be
a s s o c i a t e d by a selection process b a s e d on like data item values. vii)
G r o u p i n g of several record types by m a k i n g t h e m members of the same set type is appropriate w h e r e this reflects the common patterns of access.
9.3.4 Choice of d e c l a r e d record keys, and record sequences in sets~ is g o v e r n e d by the most p o p u l a r sub-schema usage.
9.4
I n f o r m a t i o n from Data A n a l y s i s
Choice of record types, and of the access paths to them, is g o v e r n e d by the relative f r e q u e n c y w i t h w h i c h a group of items is a c c e s s e d in a p a r t i c u l a r manner.
Thus an analysis p r o c e d u r e is r e q u i r e d which: i)
For each item, determines the relative p o p u l a r i t y of d i f f e r e n t access paths to it.
ii)
For each access path,
lists, ordered by popularity,
the data items to w h i c h access is requested.
Storage structure design
(Section ii) requires that these
figures should be d i s s e c t e d by: i)
ii)
Type of access batch,
w h e r e economy of resources is p a r a m o u n t
online,
where response time is p a r a m o u n t
Insert/delete/modify/retrieve
operation.
220
i0.
SUBSCHEMA DESIGN
10.1
Objectives
The aim of this stage is to achieve a p r o g r a m - v i e w of data w h i c h is: i) ii)
A p p r o p r i a t e to the needs of the program. U n l i k e l y to be s e n s i t i v e to a l t e r a t i o n s in other systems,
iii)
10.2
the schema,
or the storage structure.
C o n s i s t e n t w i t h t h e schema,
and derivable from it.
Method
10.2.1 Using c o n v e n t i o n a l data , man agemen ~
By "conventional"
is m e a n any system in w h i c h p r o g r a m s n o r m a l l y
access d i r e c t l y real files r e s i d e n t on b a c k i n g storage. i)
For each r e c o r d - t y p e
in each file on backing storage~
w r i t e a s u b r o u t i n e to p e r f o r m each of four functions: -
create; delete;
-
modify; read.
These s u b r o u t i n e s s h o u l d embody any p r o c e s s i n g n e c e s s a r y to m a i n t a i n consistency. ii)
Design the p r o g r a m v i e w of data,
i n c l u d i n g selection
and s e a r c h procedures. iii)
Define the data m a i n p u l a t i o n o p e r a t i o n s to be carried out by the program.
iv)
Write subroutines
for
(ii), u s i n g
(i) as p r i m i t i v e
operations.
i0~2o2 U s i n g a Codas~l style DBMS
The p r o b l e m is to t r a n s l a t e a s u b - s c h e m a requirement, terms of access to the entities data models~
framed in
in p o s s i b l y several f u n c t i o n a l
into a r e c o r d / s e t subschema, w i t h a s s o c i a t e d
access path details.
As stated in Section 9, Schema and S u b s c h e m a design are closely interrelated.
221
Select a subset of the schema w h i c h m o s t closely meets the subschema needs.
(This is p r o b a b l y best shown in d i a g r a m
form, b a s e d upon the conventions shown in Figure 9.1).
If this is u n s a t i s f a c t o r y ,
d e t e r m i n e w h e t h e r the d i s c r e p a n c y is
to be met by m o d i f i c a t i o n of the schema, or by special p u r p o s e p r o g r a m m i n g to derive the desired data from a feasible subschema.
Any m o d i f i c a t i o n of the schema should be s u b j e c t to the safeguards m e n t i o n e d in Section 9.2.
There will always be a certain amount of p r o c e s s i n g w h i c h c a n n o t be e x p r e s s e d in schema or s u b s c h e m a DDL.
This falls into two classes: i)
P r o c e s s i n g w h i c h is g e n e r a l l y applicable to operations c a r r i e d out on a p a r t i c u l a r p a r t of the database; e.g. m a i n t e n a n c e of c o n s i s t e n c y w h e n a p a r t i c u l a r record is created or deleted.
This is best e m b o d i e d
in general purpose subroutines, which are made available as part of the database documentation. ii)
P r o c e s s i n g w h i c h is specific to a p a r t i c u l a r program; e.g. s e l e c t i o n criteria applied on a p a r t i c u l a r access path.
Both sorts of special purpose p r o c e s s i n g should be clearly documented;
the latter may be shown on the s u b s c h e m a structure
diagram.
ii.
STORAGE S T R U C T U R E DESIGN
ii.i
Objectives
The aim of this stage is to design a r e p r e s e n t a t i o n of the database on b a c k i n g storage which e f f i c i e n t l y meets the s p e c i f i e d p a t t e r n of access and usage,
and which also keeps
w i t h i n s p e c i f i e d constraints upon use of resources.
222
11.2
Statistics Required
The p e r f o r m a n c e of the
o v e r a l l c o m b i n a t i o n of database,
and a p p l i c a t i o n p r o g r a m s
is e x p e c t e d to be "optimal"
DBMS
in
some way.
Some c o m b i n e d function of: -
b a c k i n g storage; m a i n storage;
-
-
channel time; p r o c e s s o r time;
is to be minimised.
Of these only b a c k i n g storage is not d e p e n d e n t upon usage traffic.
The two n e c e s s a r y sets of i n f o r m a t i o n
for the d e s i g n process
are: The r e l a t i v e w e i g h t i n g s of the four m a i n resourees to be conserved. S t a t i s t i c s defining: -
for each database
record
frequency and hit d i s t r i b u t i o n of: add; delete; modify; retrieve o p e r a t i o n s -
for each access path used in the database, f r e q u e n c y of use in each of batch and on-line modes.
i1.3
M e t h o d of support__for access paths
Access paths should be ranked a c c o r d i n g to their level of usage. The m o s t h e a v i l y u s e d paths s h o u l d then be given h i g h e s t p r i o r i t y for e f f i c i e n t access.
On-line access paths
require rapid response,
cost in c o n s u m p t i o n of resources. require support
at a m i n i m a l
p o s s i b l y at some
B a t c h - m o d e access paths
c o n s u m p t i o n of resources,
expense of e l a p s e d time if necessary.
at the
223
Heavily
used
access
paths
s h o u l d be s u p p o r t e d
indexes
and pointer mechanisms
wh e r e
by a p p r o p r i a t e
the a c c o m p a n y i n g
overheads
can b e justified.
11.4
Summary
of Choices
For a Codasyl-style exercised
Database
DBMS,
the range
at the stage of storage
Access
design
mode
retrieval
Real or v i r t u a l
- CALC or VIA.
modes
to support,
representation
Set r e p r e s e n t a t i o n
mode
and linkage
S e a r c h keys.
Usage
A R E A placement. Schema
Storage
record
to storage mapping;
-
d a t a item r e p r e s e n t a t i o n ,
-
data item distribution.
Characteristics
Device Page Space
type.
size. allocation;
-
amount
of overflow,
-
growth
rate.
and h o w efficiently.
of d e r i v e d
Indexing.
Storage
are:
Strate~[
Record placement Record
of choices w h i c h m u s t be
data.
options.
224
ACKNOWLEDGEMENTS
Although
the impetus
for this w o r k has come from several years
e x p e r i e n c e g a i n e d on a v a r i e t y of p r o j e c t s , d o c u m e n t it in its p r e s e n t
u n d e r c o n t r a c t to the M i n i s t r y of Defence. my MOD c o l l e a g u e s
the e n c o u r a g e m e n t to
form o c c u r r e d while the author was Thanks are due to
for this encouragement.
The author is also e s p e c i a l l y i n d e b t e d to Bill W a g h o r n of SCICON and Ian P a l m e r of CACI, who have b o t h c o n t r i b u t e d s i g n i f i c a n t l y to the ideas t h r o u g h e x t e n s i v e of this paper,
discussions.
For comments upon drafts
thanks are due to m a n y c o l l e a g u e s w i t h i n the
M i n i s t r y of Defence,
and ICL.
REFERENCES
i.
ANSI X3 S P A R C / D B M S Study G r o u p I n t e r i m Reports F e b r u a r y 1975.
2.
C O D A S Y L DDLC Journal
3.
C O D A S Y L C O B O L J O D as m o d i f i e d by the D B L T G D a t a b a s e
4.
Brown, A.P.G.
5.
Robinson,
F a c i l i t y Proposal,
of D e v e l o p m e n t 1973.
R e v i s e d 1975.
M a r c h 1975.
"Entity M o d e l l i n g "
IFIP TC/2 C o n f e r e n c e 1975
(Pub. N o r t h H o l l a n d Book Co.) K.A.
" D e s c r i p t i o n of stored data, using m o d i f i e d
NCC Standards"~
Private
6.
Bramhill,
and Taylor G.
7.
B S C / C O D A S Y L DDLC DBAWG June 1975 Report.
P.S.,
to Codasy!".
communication,
1974.
"Database Design.
F r o m Codd
To be p u b l i s h e d in "Database Journal",
1976.