Availability Management

- Premanand Lotlikar, 31st July, 2007

Agenda
• Introduction
• Objectives of Availability Mgmt
• Basic Concepts
• Benefits
• Relationship with other processes
• Activities in Availability Mgmt
• Process Control
• Key Performance Indicators
• Cost
• Possible Problems

Objectives
• Determining availability requirements in close collaboration with customers
• Guaranteeing the level of availability established for the IT services
• Monitoring the availability of the IT services
• Proposing improvements to the IT infrastructure and services with a view to increasing levels of availability
• Supervising compliance with the OLAs (Operational Level Agreements) and UCs (Underpinning Contracts) agreed with internal and external service providers

Basic Concepts

• High Availability means:
  – The IT service is continuously available to the customer
  – Little downtime
  – Rapid service recovery

• Availability of a service depends on:
  – Complexity of the IT infrastructure architecture
  – Reliability of the components
  – Ability to respond quickly and effectively to faults
  – Quality of maintenance by support organizations and suppliers

• Reliability means:
  – The service is available for an agreed period without interruptions
• Includes resilience
• Calculated using statistics
• Determined by:
  – Reliability of the components
  – Ability of the service/component to operate despite failure (resilience)
  – Preventive maintenance
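To see how component reliability, architecture complexity and resilience combine, here is a minimal sketch using the standard reliability-engineering rules for components in series (all must be up) and in parallel (redundant copies). It assumes component failures are independent, and the function names and 99% figures are hypothetical illustrations, not part of the original material.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Service needs every component up: availabilities multiply.

    Assumes component failures are independent of each other.
    """
    return prod(availabilities)


def redundant_availability(a: float, n: int) -> float:
    """n identical components in parallel: the group is down only if all n are down."""
    return 1 - (1 - a) ** n


# Hypothetical chain of three components (e.g. network, server, database), each 99% available.
chain = [0.99, 0.99, 0.99]
# The same chain with the middle component duplicated for resilience.
resilient_chain = [0.99, redundant_availability(0.99, 2), 0.99]

print(f"single components: {serial_availability(chain):.4f}")
print(f"with one redundant pair: {serial_availability(resilient_chain):.4f}")
```

With these figures the first line prints 0.9703 and the second 0.9800: every extra component in series lowers availability, while duplicating one component raises it.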

• Maintainability is needed to:
  – Keep the services in operation
  – Restore services when they fail
• Includes:
  – Taking measures to prevent faults
  – Detecting faults
  – Self-diagnosis by the components themselves
  – Resolving the fault
  – Restoring the service


• Mean Time to Repair (MTTR)
  – Average time between the occurrence of a fault and service recovery
• Mean Time Between Failures (MTBF)
  – Average time between recovery from one incident and the occurrence of the next
• Mean Time Between System Incidents (MTBSI)
  – Average time between the occurrence of two consecutive incidents
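The three definitions can be applied directly to an incident log. The sketch below assumes a hypothetical log of (fault occurred, service recovered) timestamps in chronological order; the log and function names are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (time the fault occurred, time the service was recovered),
# in chronological order.
incidents = [
    (datetime(2007, 7, 1, 8, 0), datetime(2007, 7, 1, 10, 0)),
    (datetime(2007, 7, 5, 14, 0), datetime(2007, 7, 5, 15, 0)),
    (datetime(2007, 7, 12, 9, 0), datetime(2007, 7, 12, 12, 0)),
]


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def mttr(log) -> float:
    """Average time from the occurrence of a fault to service recovery."""
    return mean(hours(recovered - occurred) for occurred, recovered in log)


def mtbf(log) -> float:
    """Average time from recovery of one incident to the occurrence of the next."""
    return mean(hours(next_occurred - recovered)
                for (_, recovered), (next_occurred, _) in zip(log, log[1:]))


def mtbsi(log) -> float:
    """Average time between the occurrence of two consecutive incidents."""
    return mean(hours(next_occurred - occurred)
                for (occurred, _), (next_occurred, _) in zip(log, log[1:]))


print(f"MTTR  = {mttr(incidents):.1f} h")
print(f"MTBF  = {mtbf(incidents):.1f} h")
print(f"MTBSI = {mtbsi(incidents):.1f} h")
```

For this log MTTR is 2.0 h, MTBF 131.0 h and MTBSI 132.5 h. MTBF and MTBSI need at least two incidents; with fewer, `statistics.mean` raises `StatisticsError`.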

Benefits
• Fulfillment of the agreed service levels
• Reduction in the costs associated with a given level of availability
• The customer perceives a better quality of service
• Levels of availability progressively increase
• The number of incidents is reduced


Relationship with other processes
• Service Level Mgmt is responsible for negotiating and managing availability
  – Availability is one of the most important elements in an SLA

• Configuration Mgmt has information about the infrastructure and can provide valuable information to Availability Mgmt

• Capacity Mgmt
  – Changes in capacity can often affect the availability of a service
  – Changes to availability will affect capacity
  – These two processes exchange information about:
    – Scenarios for upgrading
    – Phasing out IT components
    – Availability trends that may require changes to capacity

• Problem Mgmt is directly involved in identifying and resolving the causes of actual or potential availability problems

• Incident Mgmt provides reports with information such as recovery and repair times, which is used to determine the achieved availability

• Change Mgmt informs Availability Mgmt about the FSC (Forward Schedule of Changes)
• Availability Mgmt informs Change Mgmt about the maintenance required for new services and components

Activities
• Planning
• Monitoring

Planning
• Determining the availability requirements
• Designing for availability
• Designing for recoverability
• Security issues
• Maintenance management
• Developing the Availability Plan

Determining the availability requirements
• Must be undertaken before the SLA is concluded
• Should address both new IT services and changes to existing services
• Defining availability requirements clearly and early is essential to prevent confusion and disagreements later

• Should identify:
  – Key business functions
  – Agreed definition of IT service downtime
  – Quantifiable availability requirements
  – Quantifiable impact of unscheduled IT service downtime on the business functions
  – Business hours of the customer
  – Agreements about maintenance windows
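A quantifiable availability requirement only becomes testable once it is tied to the customer's business hours. Assuming the common definition availability % = (AST - DT) / AST × 100, where AST is the agreed service time and DT the downtime within it, the sketch below turns a target percentage into a downtime budget. The function name and the 99.5% / 22-working-day figures are hypothetical.

```python
def allowed_downtime_hours(target_pct: float, agreed_service_hours: float) -> float:
    """Largest downtime (DT) within the agreed service time (AST) that still meets the target.

    Rearranges availability % = (AST - DT) / AST * 100 into DT = AST * (1 - target / 100).
    """
    if not 0 <= target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    return agreed_service_hours * (1 - target_pct / 100)


# Hypothetical requirement: 99.5% availability during business hours,
# Monday to Friday 08:00-18:00, assuming 22 working days in the month.
ast_hours = 22 * 10
budget = allowed_downtime_hours(99.5, ast_hours)
print(f"Downtime budget: {budget:.1f} h per month within business hours")
```

Here the budget is 1.1 hours per month. Maintenance carried out in agreed windows outside business hours falls outside AST and does not consume this budget.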

Designing for availability
• Vulnerabilities affecting availability standards should be identified early
• This will prevent:
  – Excessive development costs
  – Unplanned expenditure at later stages
  – Additional costs from suppliers
  – Overall delays

Designing for recoverability
• Uninterrupted availability is rarely feasible
• Designing for recoverability involves:
  – Effective Incident Mgmt
  – Appropriate escalation
  – Communication
  – Backup and recovery procedures
  – Clearly defined tasks, responsibilities and authority

Key security issues
• Security and reliability are closely linked
• High availability can be supported by effective information security
• This includes:
  – Determining who is authorized to access secure areas
  – Determining which critical authorizations may be issued

Maintenance management
• There will always be scheduled windows of unavailability
• These periods can be used for preventive actions
• Maintenance must be carried out when its impact on services can be minimized
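One way to find such a period, sketched here under the assumption that hourly transaction volume is a fair proxy for business impact, is to slide a fixed-length window over a day's load profile and take the quietest slot. The load figures and the function name are hypothetical.

```python
def best_maintenance_window(hourly_load: list[int], length: int) -> tuple[int, int]:
    """Return (start_hour, total_load) of the quietest contiguous window of `length` hours.

    The day is treated as circular, so a window may run past midnight.
    Ties go to the earliest start hour.
    """
    n = len(hourly_load)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and the number of hours")
    candidates = [
        (sum(hourly_load[(start + i) % n] for i in range(length)), start)
        for start in range(n)
    ]
    total, start = min(candidates)
    return start, total


# Hypothetical transactions per hour, from 00:00 to 23:00.
load = [40, 25, 10, 5, 8, 20, 60, 150, 300, 320, 310, 290,
        280, 300, 310, 290, 270, 200, 150, 120, 100, 80, 60, 50]
start, total = best_maintenance_window(load, 3)
print(f"Quietest 3-hour window starts at {start:02d}:00 ({total} transactions affected)")
```

With this profile the quietest 3-hour window starts at 02:00. In practice the chosen window would still have to match the maintenance windows agreed with the customer.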

Developing the Availability Plan
• A long-term plan for availability over the next few years
• It is not the implementation plan for Availability Mgmt
• The plan requires liaison with areas such as:
  – Service Level Mgmt
  – IT Service Continuity Mgmt
  – Capacity Mgmt
  – Change Mgmt

Methods and Techniques
• Component Failure Impact Analysis (CFIA)
  – Uses an availability matrix listing strategic components and their role in each service
  – Horizontal analysis
  – Vertical analysis
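A minimal sketch of the availability matrix, assuming one common reading of the two analyses: horizontal analysis reads across a component's row to see which services its failure interrupts, and vertical analysis reads down a service's column to find its single points of failure. The cell codes ("X", "A"), components and services are hypothetical; ITIL's own matrix notation may differ.

```python
# Hypothetical availability matrix: rows are components, columns are services.
# "X" = failure of the component interrupts the service (no alternative),
# "A" = an alternative/backup takes over, "" = the service does not use the component.
SERVICES = ["Email", "Payroll", "Web shop"]
MATRIX = {
    "Core switch": ["X", "X", "X"],
    "Mail server": ["X", "", ""],
    "DB cluster": ["", "A", "A"],
    "Web server": ["", "", "X"],
}


def horizontal_analysis(matrix, services):
    """Per component (row): which services are interrupted if it fails."""
    return {
        comp: [svc for svc, cell in zip(services, row) if cell == "X"]
        for comp, row in matrix.items()
    }


def vertical_analysis(matrix, services):
    """Per service (column): which components are single points of failure for it."""
    return {
        svc: [comp for comp, row in matrix.items() if row[i] == "X"]
        for i, svc in enumerate(services)
    }


for comp, hit in horizontal_analysis(MATRIX, SERVICES).items():
    print(f"{comp}: interrupts {hit or 'nothing'}")
for svc, spofs in vertical_analysis(MATRIX, SERVICES).items():
    print(f"{svc}: single points of failure {spofs}")
```

The core switch stands out horizontally because it interrupts every service, while the web shop stands out vertically because it has two single points of failure.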


Fault Tree Analysis
• Used to identify the chain of events leading to the failure of an IT service
• Distinguishes the following events:
  – Basic event: e.g. power outages or operator error
  – Resulting event: results from a combination of earlier events
  – Conditional event: occurs only under certain conditions
  – Trigger event: causes other events
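The list above does not say how events are combined. Fault trees conventionally join events through logic gates, so the sketch below assumes AND and OR gates for resulting events and models a conditional event as one that only occurs while a named condition holds. The tree, event names and the `occurs` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A node in a fault tree.

    An event without inputs is a basic event. Otherwise it is a resulting event
    whose gate combines its inputs: "AND" (all inputs must occur) or "OR" (any one
    input is enough). If `condition` is set, the event is conditional: it only
    occurs when that condition also holds.
    """
    name: str
    gate: str = "OR"
    inputs: list["Event"] = field(default_factory=list)
    condition: str = ""


def occurs(event: Event, facts: set[str]) -> bool:
    """Decide whether `event` occurs, given the basic events and conditions in `facts`."""
    if event.inputs:
        results = [occurs(child, facts) for child in event.inputs]
        fired = all(results) if event.gate == "AND" else any(results)
    else:  # basic event: it either happened or it did not
        fired = event.name in facts
    # A conditional event only occurs when its condition also holds.
    return fired and (not event.condition or event.condition in facts)


# Hypothetical tree: the service fails on loss of power (mains outage AND UPS failure)
# OR on an operator error, which only matters while a change is in progress.
loss_of_power = Event("loss of power", "AND",
                      [Event("mains power outage"), Event("UPS failure")])
failed_change = Event("failed change", "OR", [Event("operator error")],
                      condition="change in progress")
service_down = Event("IT service unavailable", "OR", [loss_of_power, failed_change])

print(occurs(service_down, {"mains power outage"}))
print(occurs(service_down, {"mains power outage", "UPS failure"}))
```

A mains outage alone does not bring the service down because the UPS covers it; the outage combined with a UPS failure does.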


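Availability Calculations
• Availability is commonly defined as a percentage as follows:

$$\text{Availability}\ (\%) = \frac{AST - DT}{AST} \times 100$$

where AST is the agreed service time and DT is the downtime during the agreed service time.
• For example, if the service is 24/7 and over the last month the system has been down for four hours to carry out maintenance, the real availability of the system (taking a 30-day month, so AST = 30 × 24 = 720 hours) was:

$$\text{Availability} = \frac{720 - 4}{720} \times 100 \approx 99.44\%$$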

Process Control
• Critical Success Factors
  – The business must have clearly defined availability objectives
  – SLM must have been set up to formalize agreements
  – Both parties must use the same definitions of availability and downtime

• Key Performance Indicators
  – Percentage availability per service
  – Downtime duration
  – Downtime frequency
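The three KPIs can be derived from the same outage records. This sketch assumes a hypothetical list of (service, downtime hours) records for one reporting period and an agreed service time (AST) per service, and computes availability as (AST - DT) / AST × 100. All names and figures are illustrative.

```python
from collections import defaultdict

# Hypothetical outage records for one reporting period: (service, downtime in hours).
outages = [("Email", 4.0), ("Payroll", 1.5), ("Email", 0.5), ("Web shop", 2.0)]
# Hypothetical agreed service time (AST) per service for the same period, in hours.
agreed_service_time = {"Email": 720.0, "Payroll": 220.0, "Web shop": 720.0}


def kpis(outages, ast):
    """Per service: availability %, total downtime duration and downtime frequency."""
    downtime = defaultdict(float)
    frequency = defaultdict(int)
    for service, hours in outages:
        downtime[service] += hours
        frequency[service] += 1
    return {
        svc: {
            "availability_pct": (ast_hours - downtime[svc]) / ast_hours * 100,
            "downtime_hours": downtime[svc],
            "downtime_count": frequency[svc],
        }
        for svc, ast_hours in ast.items()
    }


for svc, k in kpis(outages, agreed_service_time).items():
    print(f"{svc}: {k['availability_pct']:.2f}% available, "
          f"{k['downtime_hours']} h down in {k['downtime_count']} outage(s)")
```

Keeping duration and frequency separate matters: two services with the same percentage availability can differ greatly in how often users are interrupted.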

Cost

Possible Problems
• The real availability of the service is not monitored correctly
• There is no commitment to the process in the IT organization
• The appropriate software tools and personnel are not available
• The availability objectives do not match the customer's needs
• There is a lack of coordination with other processes
• Internal and external service providers do not recognize the authority of the Availability Manager as a result of a lack of support from management

Thank you!
