System Analysis Process

  • Uploaded by: Goli
  • 0
  • 0
  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View System Analysis Process as PDF for free.

More details

  • Words: 11,386
  • Pages: 32
DATABASE SYSTEMS ANALYSIS AND DESIGN

E.E. Tozer, Software Sciences Limited April 1976

ABSTRACT Developments in the database field have tended to emphasise programming technology, with a dearth of accompanying progress in systems analysis and design methods. This paper puts forward an overall view of system design which is intended to act as a constraining framework. It is based upon a pragmatic approach and is presented in a form which could be (and is being) used on large scale implementation projects. Orderly analysis and design procedures are encouraged. The taking of premature design decisions is discouraged, especially through the recognition of three distinct views of data: conceptual, implementation and storage, and through recognition of distinctions between design of each of these, specification of mappings between them, and design of programs and run sequences. It is envisaged that specific procedures developed elsewhere (see references) could be incorporated into the methodology described here.

Software Sciences Limited Abbey House 282/292 Farnborough Road Farnborough,

Hampshire

Telephone:

44321

Telex:

858228

194

i.

INTRODUCTION As the computer systems of an organisation develop, there is an increase in the degree of overlap between originally distinct systems.

Further,

increased confidence and familiarity leads

to the individual systems being enriched in power and sophistication.

The combined result is a vast increase in the

complexity of the system design task, at that very point in time when greater reliance is being placed upon the performance,

reliability and accuracy of the computer systems.

The programming aspects of this situation have been recognised for some time, and have been tackled with partial success in the form of "Database Management" systems analysis

systems.

and design procedures

both by the complexity of requirements, developing programming technology. overall view of the processes

Traditional

are being outstripped and also by the

This paper proposes an

involved in analysis and design,

with particular emphasis upon the data, as opposed to the processing aspects.

Different, partially overlapping,

aspects

of this field have been tackled in a theoretical manner by Bramhill

& Taylor

many others.

(1975), Brown

(1974), Robinson

(1974), and

Whether or not a theoretical basis exists for

the taking of a design decision,

a system designer "in the

field" has to take that decision,

and he had to live with the

results. Rather than offering precise formulae for each and every stage of the progress,

this paper puts forward an overall framework,

which identifies

important stages,

distinctions between them.

and draws the necessary

This fremework is most important,

as it ensures that premature design decisions are avoided, and gives direct guidance as to the actual sequence and purpose of particular analysis and design procedures. intention to over-constrain

There is no

the particular techniques or

their variants which individual practitioners would embed in this framework;

the only criterion is that the technique

should be adequate for the purpose, overall methdology.

However,

and consistent with the

in the interest of being as

specific and as helpful as possible,

descriptions of, or

195

references to, suitable techniques are included where appropriate.

The analysis and design approach proposed is seen as being applicable to DP systems in general.

Only in the latter

stages does it become dependent upon use of a particular DBMS.

Projects using conventional files or non-Codasyl

DBMS would benefit considerably from adoption of the design approach put forward, with appropriate variants in the latter stages. Section 2 explains the overall viewpoint taken, which is that of viewing systems analysis and design themselves

as a system.

This system is explained in detail in section

3, and each of

the main processing functions is examined in turn in more detail in sections 4 to ii. Except where it is explicitly stated otherwise,

flowcharts

in

sections 2 and 3 show data flow:-

Signifies a data container

(file,

document etc).

name

J

Signifies a process.

Signifies data flow, which may only be from container to process or process to container.

2.

VIEWS OF DATA

There are several distinct views of data which are relevant to the process of analysis and design.

This view is not

coincident with that adopted by ANSI SPARC/DBMS Study Group (1975). The areas o f classification chosen are:-

196

(i)

Level of a b s t r a c t i o n a)

Conceptual The v i e w of data h e l d by the organisation.

This

v i e w is p u r e l y a f u n c t i o n of the m o d e of o p e r a t i o n of the o r g a n i s a t i o n and the p o l i c i e s of its managers.

It is i n d e p e n d e n t of the e x i s t e n c e of

any c o m p u t e r systems,

and s h o u l d be e x p r e s s e d in a

form m o s t s u i t a b l e to e n d - u s e r s b)

in the organisation.

Impiementatign All or some subset of the c o n c e p t u a l d a t a v i e w m a y be u s e d by c o m p u t e r systems.

The i m p l e m e n t a t i o n

v i e w is a r e p r e s e n t a t i o n of that subset. d e s i g n e d and e n c o d e d into m a c h i n e - r e a d a b l e this purpose.

form for

The i m p l e m e n t a t i o n v i e w remains as

i n d e p e n d e n t as p o s s i b l e c)

It is

from c, the storage view.

St0rage A c t u a l data,

d e s c r i b e d to p r o g r a m s by b, m u s t be

h e l d on backing store in some form.

The storage v i e w

is a complete d e s c r i p t i o n of such d a t a on backing store.

(ii)

B r e a d t h of v i e w a)

Global,

or corporate.

The c o m p l e t e v i e w of data,

r e l e v a n t to all p r o c e s s e s

c a r r i e d out in the o r g a n i s a t i o n .

b)

Specific,

or functional.

The v i e w of data r e l e v a n t to a p a r t i c u l a r

function of

the o r g a n i s a t i o n .

(iii)

Existence W h e t h e r or not it is n e c e s s a r y for o c c u r r e n c e s of data items to exist in some p a r t i c u l a r form.

197

VIEWS OF DATA

FunctionalView

CorporateView

CONCEPTUAL

I

subset

I

~I

q CONCEPTUAL

I Selection/ design

design

subset

IMPL~TATION

I

Re-group~

I0N '~IIMPL~I~WfAT "SUBSCH~NA"

"SCHWA" /

I design

/

/

TRANSLATION

STORAGE

Fig.2.1

198

The c o m b i n a t i o n of these views of data is shown in Figure 2.1.

The terms S C H E M A and S U B S C H E M A are d e f i n e d by C O D A S Y L D D L C (1973).

They are used s u b s e q u e n t l y in this paper in the

sense shown in 2.1. All of the data d e s c r i b e d by the Storage v i e w m u s t actually e x i s t on b a c k i n g storage.

The d a t a d e s c r i b e d by the S U B S C H E M A

view must e x i s t on an a s - r e q u i r e d basis.

There is no

r e q u i r e m e n t that d a t a d e s c r i b e d by the other 3 views should p h y s i c a l l y exist at all. "translation"

D a t a b a s e access is e x p r e s s e d by the

arrow b e t w e e n the s u b s c h e m a and storage views.

The DBMS makes use of s p e c i f i e d m a p p i n g s b e t w e e n storage and schema and b e t w e e n schema and subschema, to carry out

(hopefully)

o p t i m i s e d translation.

Design should c o m m e n c e w i t h the c o n c e p t u a l view, to the storage view.

and progress

For any g i v e n c o n c e p t u a l v i e w it should

be p o s s i b l e to choose m a n y i m p l e m e n t a t i o n views.

For any

given i m p l e m e n t a t i o n v i e w it s h o u l d be p o s s i b l e to choose m a n y storage r e p r e s e n t a t i o n s .

A S U B S C H E M A v i e w is u n a f f e c t e d by any S C H E M A change which results in a new S C H E M A of w h i c h the S U B S C H E M A remains a subset.

C u r r e n t d e v e l o p m e n t s by the Codasyl DDLC include d i v i s i o n of S C H E M A DDL into a n u m b e r of categories. categories

Some of these

are r e l e v a n t to the d i s t i n c t i o n b e t w e e n i m p l e m e n t a t i o n

and storage views of data.

The m e a s u r e m e n t ,

tuning,

resource

a l l o c a t i o n and storage c a t e g o r i e s b e t w e e n them relate to the storage v i e w of data.

The CODASYL DDLC D B A W G

(1975) have p r o p o s e d a m o r e

complete

s e p a r a t i o n of the storage v i e w into a Storage D e f i n i t i o n Language

(DSDL).

199

3.

THE ANALYSIS AND DESIGN PROCESS

3.1 Introduction This section outlines the method of analysis and design which is discussed in more detail in the remainder of the paper. Figure 3.1 shows the process in flow-diagram form, and can be regarded as a guide to the paper. Analysis and design is iterative.

The need for iteration

pervades almost every data path shown in 3.1.

It is not shown

explicitly because to do so would make the diagram excessively complicated. 3.2 Relationship betwee n Anal[sis and Design Traditionally, where system design is recognised as taking place at all, it is regarded as a process which occurs after analysis is complete,

and before programming commences.

It

is more useful to regard both analysis and design as each taking place at a n u ~ e r

of different levels of refinement.

At each level, the order is analysis or some blend of the two) requirements,

(or hypothesis

leading to definition of functional

leading to design of a process to meet those

requirements.

During the design process, refinement,

at each successive stage of

additional opportunities may be realised,

and the

unpleasant truth may emerge concerning the severity of some constraints.

Either at one stage, or through a number of

stages, iteration may take place, by means of presenting the originator of the requirements with the highlights results of the design process, to modify the requirements

of the

and offering the opportunity

in the light of these.

3.3 The Nature of Design The design process consists of two distinct stages: of a range of possible solutions to a problem, from the range of a best-fit,

generation

and selection

according to a pre-defined set

200

of criteria.

Failure to achieve a g o o d enough solution may

result in i t e r a t i o n i n v o l v i n g a l t e r a t i o n of some of the initial conditions

and r e p e t i t i o n of both stages.

3.4 D e v e l o p m e n t Stages

The p r o c e s s e s d e s c r i b e d in 3.2 and 3.3 take place at each of a n u m b e r of stages, refinement.

representing successive

levels of

Some of these stages are r e l a t e d to the d i f f e r i n g

levels of a b s t r a c t i o n of v i e w s of data.

THEANALYSIS &DESIGN PROCESS THEORGANISATIONI iANDATAIs FUNNCTITINsAL~ ~~b,,~~ #"

! I

/ [!ALIDATION OFi DATAMODEL

I

DESIGN USER ~ INTEP~AC)

1

ND ~ ~ R U N DESG IA N

STORAGE SCHEMA STRUCTURE DESIGN

POP

SIMULATION OR LIVE RUNNING FIG.3-1

202

4.

DATA ANALYSIS

4.1 O b j e c t i v e s To a s c e r t a i n the objects

(concrete or abstract), w h i c h

are m e a n i n g f u l to the systems of an organisation. To d i s c o v e r the nature

and r e l e v a n t of all r e l a t i o n s h i p s

b e t w e e n such objects. To define the p r i v a t e language of the organisat~on. Thus to p r o v i d e a c o m m o n l y a c c e p t e d t e r m i n o l o g y w h i c h can be u s e d to specify systems and procedures. To i d e n t i f y each data item, different manifestations

and to d i s t i n g u i s h b e t w e e n

of the same item

(e.g. d i f f e r e n t

coded forms). To i d e n t i f y and e l i m i n a t e synonyms. To fully define, u n i f y and r a t i o n a l i s e To b u i l d a "where used"

index,

coding systems.

so that the

r a m i f i c a t i o n s of an a l t e r a n t i o n to a p r o c e d u r e or to the role of a piece of data can be easily explored.

4.2 T e r m i n o l o g y

and C o n v e n t i o n s

The m e t h o d of w o r k i n g p r o p o s e d takes the v i e w p o i n t that systems exist to serve the end-user.

Thus w h a t the e n d - u s e r wants,

and how he sees his e n v i r o n m e n t of information.

is the most i m p o r t a n t source

His p e r c e p t i o n of his role,

and his w o r k i n g

e n v i r o n m e n t is n e c e s s a r i l y p a r t l y subjective.

It follows that

the a p p r o a c h to data m o d e l l i n g needs to cope w i t h this subjectivity.

The c o r p o r a t e data m o d e l

is framed in terms of "things"

r e l a t i o n s h i p s b e t w e e n things.

and

The terms entity and

r e l a t i o n s h i ~ have b e e n adopted as being w i d e l y current for d e s c r i b i n g the c o n c e p t u a l v i e w of data.

E n t i t y is d e f i n e d as a person, place, to the enterprise.

a s u b j e c t i v e process. d i r e c t e d by e n d - u s e r s be meangful. entities, accepted.

thing or event of i n t e r e s t

As such, its i d e n t i f i c a t i o n m u s t be p a r t l y S e l e c t i o n of entities must be p r i m a r i l y of the p r o p o s e d systems,

to w h o m they m u s t

It is i m p o r t a n t to w r i t e clear definitions

and to ensure that these d e f i n i t i o n s

of

are g e n e r a l l y

203

It is necessary to distinguish between entity types and entity occurrences.

An example of an entity type is "Vehicle".

Occurrences of this type are: "Blue Ford Cortina XYZ 132K" "Red Honda, Motorbike ABC 789J" Entity types may be shown on data relationship diagrams as named rectangular boxes.

t VEHICLE J

e.g. Relationships

exist between entities.

(Use of the term

relationship does not imply any connection with the specialised terminology of the relational data model). Relationship may be 1 to i, 1 to many, or many to many where 1 or "many" refers to the number of occurrences of the entity type which may b e i n v o l v e d

in the relationship.

They can be shown in the following way:

J Eois ioI I

i to l

..[~0o~ ...

1 to many

PERSON

Vehicle owned

~I VEHICLE

]

(i.e. with the arrow-head on the many end)

~ i CLE k - - ~ ~OOATION ~O~I l" "ISERvIcE I

many to many

Very often, many to many relationships two one-to-many relationships.

e.!g.

or

I

VEHICLE

can be analysed into

~ Location~where it can be serviced

.......

Vehicles which can

~ L O C A T I O N FOR I SERVICE

. . . . . .

be serviced

VEHICLE

I

LOCATION SERVICE FOR I

Where~it can be serviced\ ~

/ V e h i c l e s which can be serviced

~E~VICING O7 A VEHIC~E | AT A LOCATION

204

may be more appropriate particular

application

representations

for the purpose

of a

system.

An entity must have at least one type of data value associated with it, and will usually have several.

These data items types

are termed attributes. For example

the entity type vehicle

types colour,

chassis-no,

registration-no,

seating

may have the attribute

engine-no, capacity,

date of purchase,

weight,

and possibly many

others. One or more of the attributes

of an entity may have unique values

which can be used to distinguish of the entity.

This attribute

between different occurrences

or collection

called the identit_~ of the entity. be identified by chassis-no, Participation

of attribures

For example,

vehicle may

or by registration-no.

of an entity in a specified

relationship

may be

~VEHICLE

I

optional. For example,

• F

in:

PERSON

I

Vehicles owned

a person may own no vehicles or in:

at all; REGISTRATION BOOK

~ VEHICLE

L a vehicle may be new and unregistered. In general, relationship

an entity has a condition

For a particular participates

associated with each

in which it may participate. entity occurrence

to true or false according

is

this condition

to whether

in the relationship.

evaluates

or not the occurrence

205

In one-to-many or many-to-many relationships, to attempt to quantify the "many". in which it participates

it is necessary

Thus for each relationship

in a "many" role, an entity has

associated with it a population.

Because data analysis itself

proceeds in a number of stages of refinement,

it is desirable

to permit the expression of this population in several forms, e.g. *

- many

m-n

- range

m,n

- average

n

- absolute value

The population may depend upon a specified population condition. For example,

in

PERSON

1

Vehicles °wned

J~

VEHICLE

I

1

the population

condition may be "any currently owned, or

owned during the past 5 years". There may be any number of relationships between entities. For example: Vehicles owne d Sold by f

R u n - o v e r by PERSON

VEHICLE Passenger in

.~

Previous owners Present occupants Shows only a small

range of

the known relationships between

person and vehicle.

For this reason, it is always important to write a clear definition of each relationship when it is identified.

206

Relationships

may exist between entities

~

e.g. VEHICLE Involved in ~ ' I accident w i t h ~ 4.3

of the same type.

ebuilt using omponents from

Method of Workin @

4.3.1 Develop and discuss

a series of draft entity diagrams,

showing only: Entity, 4.3.2 Maintain

Relationship/description.

in parallel

a working

set of entity and relationship

descriptions. 4.3.3 Commence

for each entity a list of important

4.3°4 Develop this material by examining of the most

important

interaction

of these by discussing

line management

application

at a sufficiently

a broad understanding

attributes.

in turn each of a series systems.

Check-out

the

the overall diagram with high level for there to be

of all the features.

4.3.5 Document what should happen,

rather than what actually

takes

place. 4°3.6 When entities identified,

and relationships

seem fairly stable and well-

refine and augment the data model by showing in

diagram form: .

Entities,

identifiers

Relationships,

description,

populations

and conditions.

Such a digram can be called an entity diagram or entity model. 4.3.7 Supporting

documentation

should now include:

Entity Definitions. Attribute

lists for entities°

Attribute

definitions,

Relationship

including meaning of coding systems.

descriptions,

An overall name-directory, and relationships,

populations

and conditions.

showing entities,

organised

attributes

for easy reference.

207

4.3.8 The complete set of information

required to document the

corporate data model is: Entity Data Name, description,

Definition

existence Ownership

synonyms,

rules

Where appropriate,

that division

of the

organisation

responsible

for all occurrences.

NB Ownership

is more often appropriate

specific occurrences;

this is an

application-dependent

requirement.

Statistics

Expected population

Relationships

Degree,

to

Overall growth rate Integrity

sequence,

nature/meaning

Privacy

requirements Archival

Availability Number of versions

or time-span

to be

covered Of these elements, approximate

only definition,

statistics

relationships

and

are relevant to initial formulation

of

the entity model. Attribute

Data

Definition

Name, description,

synonyms,

existence

data type and characteristics, structure,

permissible

rules,

coding

time-lag between

event and updating of the value. Ownership

(Where different

Statistics

from the entity).

Expected population Overall growth rate

Relationships

Cross-reference

to entities

Derivation/consistency External

formats

Inputs,

rules

outputs

Integrity

Privacy,

Controls

Default value, Tolerance

Availability,

Of this data, only definition the entity model.

Valida£ion

rules,

on accuracy.

is relevant to formulation

of

208

Relationship Data Populations

if l:n or m:n.

Conditions under which entity occurrrences particpate. Conditions governing populations. Description of the relationship.

5.

FUNCTIONAL ANALYSIS

5.1

Objectives To identify and define the requirements

for particular

application systems.

To produce for each system identified: i) ii)

A functional requirements

definition.

A definition of the data model appropriate to the application function.

iii)

A definition

of likely ranges for system performance

and usage traffic. This section concentrates on items 5.2

(ii) and (iii).

Relation to Other Processes This consists of definition of the processing requirements of each application function, conceptual data model. with data analysis,

and definition of the associated

This process proceeds in parallel

and close co-ordination

is necessary.

It is taken as read in this paper that initial selection of application areas is subject to rigorous examination by senior management.

A system is only developed if its value outweighs

the cost of having it;

in the

(normal)

optimum use of scarce resourcest urgencyt

situation of making

the systems having highest

and the most favourable value/cost ratios are of

course those developed.

209

5.3

Describ£ng the Functional Data Mode ! Each functional data model will be a subset of the corporate data model, with additional information added, concerning: -

-

-

-

patterns of usage; access paths; attribute subsets actually used; sequencing and selection criteria.

In detail, for each entity, it is accessed,

and for each function in which

information required is:

Attribute subset accessed

Frequency

How often, hit distribution

Turnaround

Time, tolerance

Access paths

Relationships used, key attributes, ordering,

selection/search

criteria

Traffic by

Retrieval rate, Creation rate, Modification

access path

rate, Deletion rate.

Integrity

(over and above those defined in data

Controls

analysis).

Of this data, attribute subset and access paths are relevant to refinement and validation of the entity model;

the

remainder have more bearing upon database and storage design. For each attribute,

and for each function in which the

attribute is used as a data item: Us age

Source, destination,

Traffic

Retrieval rate, Creation rate, Modification rate, Deletion rate.

usage mode

210

Input format

Decoding

Output format

Report heading~

Internal format

Base, scale, precision

Integrity

(Over and above those defined in

editing,

encoding

basic data analysis).

Controls Relationships

Derivation/Consistency

rules.

Format information is relevant to potential generation of text or code from the information if it is stored in a data dictionary system. 5.4

Functional

Data Model Diagrams

The entity diagram for the relevant subset of the corporate data model should be augmented to show: Access paths, sequencing and selection criteria, keys,

approximate populations,

approximate access traffic.

The remainder of the information specified in 5.2 should be documented in narrative or tabular form.

6.

VALIDATION AND REFINEMENT OF THE DATA MODEL

6.1

Purpose Although the specification of corporate and functional data models is co-ordinated, different reasons, ways.

their features are present for

and they may be inconsistent in several

For this reason they should be cross-checked rigorously.

211

It is also desirable, having d e v e l o p e d these models on a n e c e s s a r i l y p a r t l y s u b j e c t i v e basis, to apply r e l e v a n t formal m e t h o d s of analysis to the results.

In

this way it is p o s s i b l e

to find the s i m p l e s t versions of the data structures.

6.2

Stages of V a l i d a t i o n and R e f i n e m e n t

i)

Completeness Does the corporate data m o d e l contain features to a c c o m m o d a t e all r e q u i r e d functional data models?

ii)

Consistency Is the c.d.m,

self-consistent?

Is the c.d.m,

c o n s i s t e n t w i t h each f.d.m.?

Is each f.d.m,

iii)

consistent w i t h all the others?

Access paths Access paths and their relative importance should be r e p r e s e n t e d in the c.d.m.

R e l e v a n t features of an

access path are: -

Traffic for - add, modify,

-

Search keys

-

Selection criteria

-

iv)

retrieve,

delete.

P o p u l a t i o n from w h i c h s e l e c t i o n is to take place S e q u e n c i n g requirements

Normalisation Reduction of data structures to 3rd normal form is a p o w e r f u l tool for e l i m i n a t i o n of u n n e c e s s a r y complexity, and for finding the simplest p o s s i b l e form of those structures.

However,

such a form is not n e c e s s a r i l y the clearest

or the m o s t a p p r o p r i a t e either for e n d - u s e r s or for programming.

Hence after n o r m a l i s a t i o n has b e e n

carried out, it may be a p p r o p r i a t e to "de-normalise" the structures by recreation of h i e r a r c h i e s

and

r e p e a t i n g groups, w h e r e this can be firmly j u s t i f i e d on the grounds of usefulness

and clarity.

212

7.

SYSTEM DEFINITION

7.1

Objectives

This is the stage at w h i c h a p a r t i c u l a r system is d e s i g n e d at an o v e r a l l level.

C o n s t i t u t e n t w o r k units

are identified,

are e m b e d d e d in a control f r a m e w o r k w h i c h constrains

and

the

s y s t e m to operate in a m e a n i n g f u l manner.

7.2

I d e n t i f i c a t i o n of P r o c e s s e s

"Natural"

units of w o r k are i d e n t i f i e d from the r e q u i r e m e n t s

specification.

These units,

c o m p u t e r programs,

as yet, b e a r no r e l a t i o n to

but instead r e p r e s e n t the user's v i e w of

s p e c i f i c i n t e g r a l jobs w h i c h the system is p e r f o r m i n g for him.

E x a m p l e s of p r o c e s s e s are: p r o d u c t i o n of a report; .

the a p p l i c a t i o n of a s p e c i f i c

t r a n s a c t i o n to the

database; the computation,

e.g. of a sales forecast,

according

to a s p e c i f i c algorithm.

Characteristics

of p r o c e s s e s

are:

they operate upon data; outputs

thus process

inputs

and

are "logical records", w h i c h may be:

transactions,

reports,

units of the database,

t r a n s i e n t data item g r o u p i n g s

in memory.

Thus also W the o p e r a t i o n of the s y s t e m may be r e p r e s e n t e d by a data flow chart showing related p r o c e s s e s c o n n e c t i n g data paths.

and their

(Figure 3.1 in Section

3 is an

e x a m p l e of such a flow chart.)

W h e r e there is value

in doing so, p r o c e s s e s may be t h e m s e l v e s

s u b d i v i d e d into processes; of processes,

(e.g. c o n s i d e r i n g the example above

p r o d u c t i o n of a report may consist of

c o m p u t a t i o n of a n u m b e r of forecasts, some order).

and their ranking in

213

The subdivision of processes in a data-flow sense loses its values at a point where: the units of work lose meaning for the user; the relationships between the units of work is more strongly that of a procedural or scheduling nature.

7.3

Design of System Behaviour and Controls

Knowledge of the user's expectations of the system permits the design of a logical framework into which the system processes can be fitted, and which will control their operation in a secure manner.

Particular attention should be paid to useability of those system control parameters provided for users.

7.4

Specification of Processes

English plus decision tables is the most appropriate specification medium.

Data flow diagrams are also appropriate, but

procedural diagrams and charts should be eschewed except where vital, because they tend to impose premature design decisions.

Essential scheduling requirements between and within processes must b e identified and fully defined at this stage.

A strong

mandatory scheduling relationship between work units indicates that they should be regarded as part of a process.

Care should be exercised to avoid jumping the gun on detailed design work through inclusion at too early a stage of "how" decisions on mechanisms for achieving the results required of the processes.

However, certain "how" decisions are appropriate

at this stage:

e.g. selection of computational methods of

performing forecasting or optimisation calculations which are integral parts of the system.

Such decisions are necessary at

this stage because they affect the system's behaviour towards the user, and have an impact upon the data models.

214

7.5

Specificati0nof

Constraints

The value of each process to the system should be ascertained; some p r o c e s s e s justified.

are optional,

and i n c l u s i o n of these has to be

For each function performed,

a l t e r n a t i v e options~ e.g.

simple or sophisticated.

Some form of ranking b a s e d on priority, s h o u l d be carried out,

there may be

cost and f l e x i b i l i t y

in o r d e r to select the actual

c o n s t i t u e n t s of the first version of the system, the g u i d e l i n e s

and to lay

for s u b s e q u e n t d e v e l o p m e n t phases.

T o l e r a n c e s of ranges should be d e f i n e d for: a c c u r a c y of results, performance,

c o n s u m p t i o n of resources

(both d e v e l o p m e n t and operating),

scheduling.

8.

P R O G R A M AND RUN D E S I G N

8.1

Objectives

The p r i m a r y aim of this stage is to select suitable groupings of s y s t e m p r o c e s s e s , sequences.

to be formed into p r o g r a m s and p r o g r a m

The r e s u l t i n g w o r k - u n i t s

are i n t e n d e d to m a k e

e f f e c t i v e use of the resources of the c o m p u t e r by,

for example,

a v o i d i n g u n n e c e s s a r y r e p e a t e d t r a n s f e r of data, w h i l s t at the same time m e e t i n g the p e r f o r m a n c e

8.2

requirements

specified.

Method

E a c h s y s t e m process should have defined:

i) ii) iii)

iv)

Usage frequency; D e s i r e d t u r n a r o u n d time; Data access requirements,

i n c l u d i n g sequencing;

S c h e d u l i n g r e l a t i o n s h i p to - date/time, -

o t h e r processes.

215

There is a t r a d e - o f f to be e x e r c i s e d b e t w e e n k e e p i n g p r o c e s s e s separate, w h i c h makes i n e f f i c i e n t use of the computer, but w h i c h retains flexibility, and the binding t o g e t h e r of p r o c e s s e s into p r o g r a m s w h i c h can m a k e m o r e e f f e c t i v e use of the computer, but W h i c h w i l l need to be r e d e s i g n e d if new p r o c e s s e s are introduced.

Stages of the p r o c e d u r e are: i)

Choose p r o g r a m groupings a c c o r d i n g to: I/O - Database access -

Non database "batch" I/O; TP m e s s a g e h a n d l i n g

S c h e d u l i n g Functions Common service functions L i a b i l i t y to be invoked in a c o - o r d i n a t e d m a n n e r (i.e. function occurrences part of the same process). Close similarity of a s s o c i a t e d functional data models Similarity of scheduling requirements - e.g. end-month. ii)

Specify i n t e r - p r o g r a m s c h e d u l i n g relationships;

iii)

Specify overall frequency and scheduling requirements;

iv)

Specify the data-access needs of each p r o g r a m in terms of: -

-

-

-

subset of the schema data model, access paths, sequencing, selection criteria, frequency of access.

9.

S C H E M A DESIGN

9.1

Objective

The aim is to design an e f f e c t i v e i m p l e m e n t a t i o n v i e w of the corporate data model.

The e f f e c t i v e n e s s is judged in teinns of:

Clarity and a p p r o p r i a t e n e s s The p r o v i s i o n of facilities to enable the definition of e f f e c t i v e forms of the n e c e s s a r y sub-schemas. The schema's s u i t a b i l i t y as a basis for the d e f i n i t i o n of an e f f i c i e n t storage structure.

216

9.2

Relationshi~

The details the n a t u r e

Between

of the m e t h o d

of S C H E M A

for S C H E M A

of the f i l e - h a n d l e r

if a c o n v e n t i o n a l distinctions

and S U B S C H E M A

design

or DBMS

file-handler

are d e p e n d e n t

in use.

is used,

upon

In p a r t i c u l a r ,

the drawing

of m e a n i n g f u l

between

SUBSCHEMA

-

SCHEMA

-

-

Storage

may prove

difficult.

The s i m u l a t i o n is d e s c r i b e d

In this to use

The

Design

of s u b - s c h e m a

section~

design methods

in C o d a s y l - t y p e

DBMS.

Schema m u s t be d e s i g n e d

be designed.

This

SUBSCHEMA

the SCHEMA.

At the

Hence

it is m o s t

When m o d i f y i n g SUBSCHEMA

but w h i c h w o u l d

already

one of the m a i n of those

in r e s p o n s e

care must

can

faci$ities

present

inputs

in

to S C H E M A

SUBSCHEMAS.

design

cycles

of

before

care

the S U B S C H E M A S

to a l t e r e d

be e x e r c i s e d

other existing

to take g r e a t

leave

subschemas

of s t r u c t u r a l

that there w i l l be several

the S C H E M A

invalidate

it is n e c e s s a r y which would

1973).

before

to those

and s u b s c h e m a

are a p p r o p r i a t e

are achieved.

requirements,

which would

choise

of the needs

schema

designs

the

is limited

likely

between

satisfactory

file h a n d l e r

are p r o p o s e d w h i c h

in p r i n c i p l e

same time,

is a d e f i n i t i o n

iteration

a conventional

(CODASYL DDLC

is b e c a u s e

in a Codasyl

design

using

in 10.2.

or n e w

to a v o i d

SUBSCHEMAS.

to avoid

subtle

changes,

and DML s y n t a c t i c a l l y

a l t e r the semantics.

changes

In p a r t i c u l a r ,

valid,

217

9.3

M e t h o d for Design of a CODASYL SCHEMA

9.3.1 T r a n s l a t i o n of the entity r e p r e s e n t a t i o n of the corporate data m o d e l involves e x e r c i s e of the following choices: R e p r e s e n t a t i o n of entities: -

-

as record-types, as several record types, a s s o c i a t e d in some way, as group items w i t h i n a record type.

R e p r e s e n t a t i o n of relationships: -

-

as set types, as several set types, p o s s i b l y w i t h a d d i t i o n a l record types, e.g.

as is r e q u i r e d for m a n y - m a n y

relationships, -

as repeating groups, as loose associations of records, e.g. by common search keys.

A s s o c i a t i o n of attributes with entities.

9.3.2 The information p r o d u c e d in the design process can be c l a s s i f i e d by : a)

To be encoded in schema DDL i) ii) iii)

R e l e v a n t to structure e.g. RECORD,

SET, A R E A

R e l e v a n t to m o d e of use e.g. ORDER, KEY R e l e v a n t to storage design e.g. L O C A T I O N MODE, ACTUAL, V I R T U A L

b)

To be d o c u m e n t e d f o r reference by application programmers. e.g.

Record occurrence rules Record r e t r i e v a l modes Record s e l e c t i o n rules P r o c e s s i n g n e c e s s a r y to m a i n t a i n c o n s i s t e n c y w h e n each record is STORED, M O D I F Y e d or DELETEd.

A subset of the type a) information can u s e f u l l y be shown in d i a g r a m form.

Successive, not n e c e s s a r y complete,

forms of

such a d i a g r a m can be used as w o r k i n g notes during the design process.

218

9.3.3 Choice

of Records

The m o s t b a s i c the r e c o r d

and Sets

choice

types

in S C H E M A

design

is ~'what are to be

and set types".

Stages: i)

Make each e n t i t y - t y p e attributes Represent

ii)

relationships

Many-to-many will y i e l d a CODASYL

a record/set DBMS.

It is p o s s i b l e from stages

This

into

making

between

which

alternative

several (ii).

There

s h o u l d be c a r r i e d

for c h o o s i n g

between

them.

(i) above,

is i n v a l i d

by b r e a k i n g

two o n e - t o - m a n y

that

all

the record.

as set types.

as in

"intersection"

4.2.

structures

will

is no h a r m in this;

forward until

The design

and are n o w subject

in

any such

relationships,

use of an a d d i t i o n a l in Section

draft"

entities

structure

as i n d i c a t e d

(i) and

including

treated

is o v e r c o m e

alternatives

a "first

type,

data items w i t h i n

relationships,

relationship possibly record,

a record

of the e n t i t y

there

the

are grounds

(or designs)

to a n u m b e r

result

represent

of stages

of

refinement.

iii)

Where have

different different

subschemas,

access

items

into

iv)

Where~

within

variable

the e l e m e n t s

records

of a set,

record types

occurrences

structure type.

access

record

in d i f f e r e n t

paths),

types

there

consider

are redundant,

is a r e p e a t i n g

population,

of the r e p e a t i n g

group

the DBMS

group

it m a y be b e t t e r the m e m b e r

responsible

for

problem.

are v e r y

are c o m m o n d l y

be w o r t h w h i l e

are used

record types.

and thus make

the space m a n a g e m e n t

of an entity

in some way.

a r e c o r d type,

to make

Where

to these

consistent

w h i c h has a larger

v)

(i.e.

different

2 or m o r e

common

and m u s t b e kept

of the a t t r i b u t e s

needs

a n d / o r have

their d i v i s i o n

NB Data

subsets

and several

accessed

to c o n s o l i d a t e

is r e p r e s e n t e d

small,

together,

record

by g r o u p

types,

record

then

it may

so that the

items w i t h i n

a record

219

vi)

Where a r e l a t i o n s h i p is used rarely or not at all it may be that the overheads i n v o l v e d in sets are not justifiable. eliminated.

In this case, Instead,

the set-type can be

the record o c c u r r e n c e s can be

a s s o c i a t e d by a selection process b a s e d on like data item values. vii)

G r o u p i n g of several record types by m a k i n g t h e m members of the same set type is appropriate w h e r e this reflects the common patterns of access.

9.3.4 Choice of d e c l a r e d record keys, and record sequences in sets~ is g o v e r n e d by the most p o p u l a r sub-schema usage.

9.4

I n f o r m a t i o n from Data A n a l y s i s

Choice of record types, and of the access paths to them, is g o v e r n e d by the relative f r e q u e n c y w i t h w h i c h a group of items is a c c e s s e d in a p a r t i c u l a r manner.

Thus an analysis p r o c e d u r e is r e q u i r e d which: i)

For each item, determines the relative p o p u l a r i t y of d i f f e r e n t access paths to it.

ii)

For each access path,

lists, ordered by popularity,

the data items to w h i c h access is requested.

Storage structure design

(Section ii) requires that these

figures should be d i s s e c t e d by: i)

ii)

Type of access batch,

w h e r e economy of resources is p a r a m o u n t

online,

where response time is p a r a m o u n t

Insert/delete/modify/retrieve

operation.

220

i0.

SUBSCHEMA DESIGN

10.1

Objectives

The aim of this stage is to achieve a p r o g r a m - v i e w of data w h i c h is: i) ii)

A p p r o p r i a t e to the needs of the program. U n l i k e l y to be s e n s i t i v e to a l t e r a t i o n s in other systems,

iii)

10.2

the schema,

or the storage structure.

C o n s i s t e n t w i t h t h e schema,

and derivable from it.

Method

10.2.1 Using c o n v e n t i o n a l data , man agemen ~

By "conventional"

is m e a n any system in w h i c h p r o g r a m s n o r m a l l y

access d i r e c t l y real files r e s i d e n t on b a c k i n g storage. i)

For each r e c o r d - t y p e

in each file on backing storage~

w r i t e a s u b r o u t i n e to p e r f o r m each of four functions: -

create; delete;

-

modify; read.

These s u b r o u t i n e s s h o u l d embody any p r o c e s s i n g n e c e s s a r y to m a i n t a i n consistency. ii)

Design the p r o g r a m v i e w of data,

i n c l u d i n g selection

and s e a r c h procedures. iii)

Define the data m a i n p u l a t i o n o p e r a t i o n s to be carried out by the program.

iv)

Write subroutines

for

(ii), u s i n g

(i) as p r i m i t i v e

operations.

i0~2o2 U s i n g a Codas~l style DBMS

The p r o b l e m is to t r a n s l a t e a s u b - s c h e m a requirement, terms of access to the entities data models~

framed in

in p o s s i b l y several f u n c t i o n a l

into a r e c o r d / s e t subschema, w i t h a s s o c i a t e d

access path details.

As stated in Section 9, Schema and S u b s c h e m a design are closely interrelated.

221

Select a subset of the schema w h i c h m o s t closely meets the subschema needs.

(This is p r o b a b l y best shown in d i a g r a m

form, b a s e d upon the conventions shown in Figure 9.1).

If this is u n s a t i s f a c t o r y ,

d e t e r m i n e w h e t h e r the d i s c r e p a n c y is

to be met by m o d i f i c a t i o n of the schema, or by special p u r p o s e p r o g r a m m i n g to derive the desired data from a feasible subschema.

Any m o d i f i c a t i o n of the schema should be s u b j e c t to the safeguards m e n t i o n e d in Section 9.2.

There will always be a certain amount of p r o c e s s i n g w h i c h c a n n o t be e x p r e s s e d in schema or s u b s c h e m a DDL.

This falls into two classes: i)

P r o c e s s i n g w h i c h is g e n e r a l l y applicable to operations c a r r i e d out on a p a r t i c u l a r p a r t of the database; e.g. m a i n t e n a n c e of c o n s i s t e n c y w h e n a p a r t i c u l a r record is created or deleted.

This is best e m b o d i e d

in general purpose subroutines, which are made available as part of the database documentation. ii)

P r o c e s s i n g w h i c h is specific to a p a r t i c u l a r program; e.g. s e l e c t i o n criteria applied on a p a r t i c u l a r access path.

Both sorts of special purpose p r o c e s s i n g should be clearly documented;

the latter may be shown on the s u b s c h e m a structure

diagram.

ii.

STORAGE S T R U C T U R E DESIGN

ii.i

Objectives

The aim of this stage is to design a r e p r e s e n t a t i o n of the database on b a c k i n g storage which e f f i c i e n t l y meets the s p e c i f i e d p a t t e r n of access and usage,

and which also keeps

w i t h i n s p e c i f i e d constraints upon use of resources.

222

11.2

Statistics Required

The p e r f o r m a n c e of the

o v e r a l l c o m b i n a t i o n of database,

and a p p l i c a t i o n p r o g r a m s

is e x p e c t e d to be "optimal"

DBMS

in

some way.

Some c o m b i n e d function of: -

b a c k i n g storage; m a i n storage;

-

-

channel time; p r o c e s s o r time;

is to be minimised.

Of these only b a c k i n g storage is not d e p e n d e n t upon usage traffic.

The two n e c e s s a r y sets of i n f o r m a t i o n

for the d e s i g n process

are: The r e l a t i v e w e i g h t i n g s of the four m a i n resourees to be conserved. S t a t i s t i c s defining: -

for each database

record

frequency and hit d i s t r i b u t i o n of: add; delete; modify; retrieve o p e r a t i o n s -

for each access path used in the database, f r e q u e n c y of use in each of batch and on-line modes.

i1.3

M e t h o d of support__for access paths

Access paths should be ranked a c c o r d i n g to their level of usage. The m o s t h e a v i l y u s e d paths s h o u l d then be given h i g h e s t p r i o r i t y for e f f i c i e n t access.

On-line access paths

require rapid response,

cost in c o n s u m p t i o n of resources. require support

at a m i n i m a l

p o s s i b l y at some

B a t c h - m o d e access paths

c o n s u m p t i o n of resources,

expense of e l a p s e d time if necessary.

at the

223

Heavily

used

access

paths

s h o u l d be s u p p o r t e d

indexes

and pointer mechanisms

wh e r e

by a p p r o p r i a t e

the a c c o m p a n y i n g

overheads

can b e justified.

11.4

Summary

of Choices

For a Codasyl-style exercised

Database

DBMS,

the range

at the stage of storage

Access

design

mode

retrieval

Real or v i r t u a l

- CALC or VIA.

modes

to support,

representation

Set r e p r e s e n t a t i o n

mode

and linkage

S e a r c h keys.

Usage

A R E A placement. Schema

Storage

record

to storage mapping;

-

d a t a item r e p r e s e n t a t i o n ,

-

data item distribution.

Characteristics

Device Page Space

type.

size. allocation;

-

amount

of overflow,

-

growth

rate.

and h o w efficiently.

of d e r i v e d

Indexing.

Storage

are:

Strate~[

Record placement Record

of choices w h i c h m u s t be

data.

options.

224

ACKNOWLEDGEMENTS

Although

the impetus

for this w o r k has come from several years

e x p e r i e n c e g a i n e d on a v a r i e t y of p r o j e c t s , d o c u m e n t it in its p r e s e n t

u n d e r c o n t r a c t to the M i n i s t r y of Defence. my MOD c o l l e a g u e s

the e n c o u r a g e m e n t to

form o c c u r r e d while the author was Thanks are due to

for this encouragement.

The author is also e s p e c i a l l y i n d e b t e d to Bill W a g h o r n of SCICON and Ian P a l m e r of CACI, who have b o t h c o n t r i b u t e d s i g n i f i c a n t l y to the ideas t h r o u g h e x t e n s i v e of this paper,

discussions.

For comments upon drafts

thanks are due to m a n y c o l l e a g u e s w i t h i n the

M i n i s t r y of Defence,

and ICL.

REFERENCES

i.

ANSI X3 S P A R C / D B M S Study G r o u p I n t e r i m Reports F e b r u a r y 1975.

2.

C O D A S Y L DDLC Journal

3.

C O D A S Y L C O B O L J O D as m o d i f i e d by the D B L T G D a t a b a s e

4.

Brown, A.P.G.

5.

Robinson,

F a c i l i t y Proposal,

of D e v e l o p m e n t 1973.

R e v i s e d 1975.

M a r c h 1975.

"Entity M o d e l l i n g "

IFIP TC/2 C o n f e r e n c e 1975

(Pub. N o r t h H o l l a n d Book Co.) K.A.

" D e s c r i p t i o n of stored data, using m o d i f i e d

NCC Standards"~

Private

6.

Bramhill,

and Taylor G.

7.

B S C / C O D A S Y L DDLC DBAWG June 1975 Report.

P.S.,

to Codasy!".

communication,

1974.

"Database Design.

F r o m Codd

To be p u b l i s h e d in "Database Journal",

1976.

Related Documents

Process Analysis
May 2020 14
System Analysis
November 2019 11
System Analysis
May 2020 15
Process Analysis Modeling
December 2019 21

More Documents from "Ioannis Moutsatsos"