VSAM Virtual Storage Access Method Allows for interactive updates (adds,
changes, and deletes)
1-1
Three Types of VSAM files KSDS Keyed Sequence Data Set RRDS Relative Record Data Set ESDS Entry Sequence Data Set
1-2
KSDS Allows for users to access records
sequentially or randomly Includes an index and data components A CLUSTER consists of both the index and data components together Index relates a value in a key field to the actual location of the record on disk Index is only used in random (or dynamic) processing
1-3
VSAM Data Component Storage Concepts Control Interval (CI) Fixed amount of storage; Must be multiple of 512 Usually 2048 or 4096 bytes Data records are stored in a CI • A CI of 4096 bytes can store forty 100-byte records
Control Areas (CA) One cylinder in size Size of a cylinder varies by disk drive Contains numerous CIs 1-4
Approximate number of records in a CI (CI size - 10) / record length
1-5
Records in a CI
Calculate the number of records in a CI
formula on previous slide Determine the percentage of records in a CI to be added or insert in an average CI Concept of FreeSpace • Space where records can be added “on the fly”
1-6
CI in a CA A fixed number based on CI size and
disk drive
For disk drive with ACD001: • CI of 2048: 315 CI in a CA • CI of 4096: 180 CI in a CA
Determine the percent of the CI to be left open for additions Known as CA freespace 1-7
Freespace in a CI
Is used to add records which belong
in the CI Records in a CI are shifted automatically within the CI to accommodate the inserted record
1-8
CI Splits If there is no freespace in the CI in
which the record is to be inserted:
CI split Half of the records in the CI are moved to a free CI in the same CA The inserted record is then inserted in the proper CI These happen routinely and are accomplished quickly
1-9
CA splits If a CI needs to split and there is no free CI
in the CA: CA split Half of the CI are moved to a free CA (usually at the end of the file, there is unused space) Therefore, each of the two CI have 50% free space The original CI can now split CA splits have much overhead and should be avoided!!! 1-10
Avoid CA splits Reorganization of files Import: Copy the VSAM file to a sequential dataset Export: Delete and reload the VSAM file from the sequential data set: Resets the freespace as in the define cluster Allocate adequate freespace Analyze the primary key
1-11
Primary Key Analysis Is the pattern in the PK Example: The Financial Aid office must keep three years of data on-line. • Previous year: State reporting • Current year: Distributing aid to students • Next year: Granting/guaranteeing aid for
next year
1-12
Primary Key Analysis Key options for the financial aid file 1-digit year + SSN • Little on-line activity on first third of file • Most “adds” are in last third: CI/CS split
likely
SSN + 1 digit year • Activity is spread evenly throughout the file • Recommended
1-13
VSAM Index Component Every KSDS as an index component for
each primary key AND each foreign key Base cluster: Primary key index and Data components Foreign key index is known as the alternate index
1-14
Primary Key Index and Data Two parts of the index: Sequence set Index set
1-15
Primary Key Index and Data Sequence Set Lowest level of the index component Contains information that relates key values to a specific CI Links the highest PK in a CI to the address of that CI • Stores all the key-address pairs for the Cis in a CA
in one CI of the sequence set • There is a separate CI in the sequence set for each CA in the data
1-16
Primary Key Index and Data Index Set Highest level of the index component Key-address pairs stored in one CI (can be stored in main memory for processing efficiency) • Address Pointer links to address of the appropriate
sequence set • Based on size of data component (number of cylinders or CA needed to store data component), you may need a intermediate index
1-17
Alternate Index and Data Component Alternate index relates alternate key to
primary key Then uses the primary key index to locate the data Can have unique or non-unique AI Figure 2-4 (unique) Figure 2-5 (non-unique)
1-18
Alternate Index Systems analyst determines whether to
update the AI each time the cluster is changed Can cause much overhead to update all indices especially in the case of a CA split Other alternative is to re-build the index periodically (every night) 1-19
Relative Record Data Set (RRDS) Lets the user access each record at random
without the overhead of maintaining an index Instead each record in a RRDS is numbered, starting with 1 for the first record RRDS consists of a specified number of areas or slots Known as the relative record number (RRN)
1-20
RRDS May need a routine to convert the PK of a
record to a relative record number Hashing • Most common hashing routine: The
remainder option on the divide Can cause empty slots. Can waste storage; But we avoid CI/CA splits Difficult to HASH if PK is non-numeric
1-21
RRDS Collision If hashing routine results is same RRN for two different PK Must set secondary searching technique in case of collisions • Usually linear probing: check the next record up to a maximum number of tries. Needed to know if record to be added already exists Needed to determine if record to be retrieved exists without reading entire file
1-22
RRDS Advantages No index overhead Direct relationship between data and location of the data Permits both random and sequential processing If good hashing routine with minimal collisions: performance efficiency is excellent
1-23
RRDS Disadvantages Storage efficiency Collisions Difficulty in determining good hashing technique • Difficulty with alphabetic key Does NOT support the concept of FK or AI Not widely used
1-24
ESDS Entry Sequenced Data Set The simplest type of VSAM file Records are stored sequentially at time of
entry
1-25
ESDS Similar to sequential processing Does allow to OPEN EXTEND to add
records to the end of the file
1-26
ESDS
Author does not recommend it Says it is restricted to sequential processing BUT you can build an AI for a FK BUT as of my current COBOL manuals, COBOL cannot use the AI and processing must be sequential. This is my current topic for career day questions 1-27