University of Western Australia Subsea Technology module OENA8589
RISK, RELIABILITY AND AVAILABILITY Kevin Mullen
Risk
What is Risk?
• “The chance of something happening that will have an impact on the objective”
• Frequency x Consequence
• “Expected value of an unwanted outcome measured in dollars”, where the unwanted outcome may be:
 – Injury or death of personnel
 – Damage or destruction of the environment
 – Excessive production costs
 – Reduction or loss of production
 – Project delays
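If risk is read as frequency x consequence, the “expected value of an unwanted outcome measured in dollars” is the sum, over all unwanted outcomes, of annual frequency times cost. The Python sketch below is a minimal illustration of that reading. The outcome names, frequencies and costs are hypothetical and are not taken from this module.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    frequency_per_year: float  # expected occurrences per year
    consequence_usd: float     # cost of one occurrence, in dollars


def expected_annual_loss(outcomes: list[Outcome]) -> float:
    """Risk as an expected value: sum of frequency x consequence."""
    return sum(o.frequency_per_year * o.consequence_usd for o in outcomes)


# Hypothetical risk register, for illustration only.
register = [
    Outcome("Loss of production (well shut-in)", 0.5, 2_000_000),
    Outcome("Project delay (vessel standby)", 0.1, 5_000_000),
    Outcome("Major equipment damage", 0.01, 50_000_000),
]

print(f"Expected annual loss: ${expected_annual_loss(register):,.0f}")
```

Ranking the individual terms of the sum shows which outcomes dominate the expected loss.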
Typical Risk Matrix (Likelihood -> Consequence)

Consequence \ Likelihood | Never heard of in industry | Has occurred in industry | Has occurred in company | Occurs often in company | Occurs often at site
No injury | LOW | LOW | LOW | LOW | LOW
Slight injury | LOW | LOW | MED | MED | MED
Minor injury | LOW | MED | MED | HIGH | HIGH
Major injury | MED | MED | HIGH | HIGH | VERY HIGH
Fatality | MED | HIGH | HIGH | VERY HIGH | VERY HIGH
Multiple fatality | HIGH | HIGH | VERY HIGH | VERY HIGH | VERY HIGH

Risk rating and required action:
VERY HIGH: Rectify immediately
HIGH: Rectify with urgency, unless clearly impracticable
MED: Reduce risk as far as practicable
LOW: Accept, but manage through competency and awareness
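The typical risk matrix can be encoded as a simple lookup. The following is a minimal Python sketch. It assumes the likelihood column labels as reconstructed in the table above, which were recovered from a flattened slide. `assess` is a hypothetical helper name.

```python
# Likelihood columns, from least to most likely (as reconstructed in the table).
LIKELIHOOD = [
    "Never heard of in industry",
    "Has occurred in industry",
    "Has occurred in company",
    "Occurs often in company",
    "Occurs often at site",
]

# One row per consequence level, one entry per likelihood column.
MATRIX = {
    "No injury":         ["LOW", "LOW", "LOW", "LOW", "LOW"],
    "Slight injury":     ["LOW", "LOW", "MED", "MED", "MED"],
    "Minor injury":      ["LOW", "MED", "MED", "HIGH", "HIGH"],
    "Major injury":      ["MED", "MED", "HIGH", "HIGH", "VERY HIGH"],
    "Fatality":          ["MED", "HIGH", "HIGH", "VERY HIGH", "VERY HIGH"],
    "Multiple fatality": ["HIGH", "HIGH", "VERY HIGH", "VERY HIGH", "VERY HIGH"],
}

ACTION = {
    "VERY HIGH": "Rectify immediately",
    "HIGH": "Rectify with urgency, unless clearly impracticable",
    "MED": "Reduce risk as far as practicable",
    "LOW": "Accept, but manage through competency and awareness",
}


def assess(consequence: str, likelihood: str) -> tuple[str, str]:
    """Return (risk rating, required action) for a consequence/likelihood pair."""
    rating = MATRIX[consequence][LIKELIHOOD.index(likelihood)]
    return rating, ACTION[rating]


print(assess("Major injury", "Has occurred in company"))
```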
ENTERPRISE-WIDE RISK RANKING MATRIX

Severity of consequences:
(1) Threat to Enterprise (Catastrophic)
 – PERSONNEL: Multiple (five or more) fatalities.
 – COMMUNITY: Widespread impact to nearby communities.
 – ENVIRONMENTAL: Long-term environmental impact, and/or adverse, worldwide publicity.
 – FACILITY: Total destruction of installation(s) at an estimated cost greater than $100,000,000; extended facility shutdown and/or potential for permanent closure. For floating production systems, loss of the floating structure.
(2) Major
 – PERSONNEL: One or several fatalities, limited to the area of the incident.
 – COMMUNITY: One or more severe injuries.
 – ENVIRONMENTAL: Significant release with serious off-site impact, and more likely than not to cause immediate or long-term health effects.
 – FACILITY: Damage to installation(s) at an estimated cost greater than $10,000,000 but less than $100,000,000; downtime in excess of 90 days.
(3) Serious
 – PERSONNEL: One or more severe injuries, including permanently disabling injuries.
 – COMMUNITY: One or more minor injuries.
 – ENVIRONMENTAL: Significant release with serious off-site impact.
 – FACILITY: Damage to process area(s) at an estimated cost greater than $1,000,000 but less than $10,000,000; 10 to 90 days of downtime.
(4) Minor
 – PERSONNEL: Single injury, not severe, possible lost time.
 – COMMUNITY: Odor or noise complaint from the public.
 – ENVIRONMENTAL: Release which results in Agency notification or Permit violation.
 – FACILITY: Some equipment damage at an estimated cost greater than $100,000 but less than $1,000,000; 1 to 10 days of downtime.
(5) Incidental
 – PERSONNEL: Minor or no injury, no lost time.
 – COMMUNITY: No injury, hazard, or annoyance to the public.
 – ENVIRONMENTAL: Environmentally recordable event with no Agency notification or Permit violation.
 – FACILITY: Minimal equipment damage at an estimated cost less than $100,000; negligible downtime.

Likelihood of occurrence:
(1) Frequent: Incident is very likely to occur at this facility, possibly several times during its lifetime. Statistical probability P > 10^-2
(2) Occasional: Incident may occur at this facility some time during its lifetime. Statistical probability 10^-2 > P > 10^-3
(3) Seldom: Incident has occurred at a similar facility and may reasonably occur at this facility. Statistical probability 10^-3 > P > 10^-4
(4) Unlikely: Given current practices and procedures, this incident is not likely to occur at this facility. Statistical probability 10^-4 > P > 10^-6
(5) Remote: Highly unlikely, although statistics show that a similar event has happened. Statistical probability P < 10^-6

Risk ranking (1 = highest):
Likelihood \ Severity | (1) Catastrophic | (2) Major | (3) Serious | (4) Minor | (5) Incidental
(1) Frequent | 1 | 2 | 2 | 3 | 5
(2) Occasional | 2 | 2 | 3 | 4 | 6
(3) Seldom | 3 | 3 | 4 | 5 | 6
(4) Unlikely | 4 | 4 | 6 | 6 | 6
(5) Remote | 4 | 5 | 6 | 6 | 6

Primary driver (shown as bands on the matrix): Enterprise risk management driven; Safety management system driven; Occupational health and safety driven.
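The enterprise-wide matrix could be applied in two steps. First, map a statistical probability to a likelihood category. Second, read off the ranking for the severity category. Below is a minimal Python sketch of that procedure. The probability bands and rankings come from the tables above. Two points are assumptions of this sketch: a value exactly on a band edge falls into the less likely category, and the helper names are hypothetical.

```python
def likelihood_category(p: float) -> int:
    """Map a statistical probability to likelihood 1 (Frequent) .. 5 (Remote)."""
    if p > 1e-2:
        return 1  # Frequent
    if p > 1e-3:
        return 2  # Occasional
    if p > 1e-4:
        return 3  # Seldom
    if p > 1e-6:
        return 4  # Unlikely
    return 5      # Remote


# RANK[likelihood - 1][severity - 1]; severity 1 = Catastrophic .. 5 = Incidental.
RANK = [
    [1, 2, 2, 3, 5],
    [2, 2, 3, 4, 6],
    [3, 3, 4, 5, 6],
    [4, 4, 6, 6, 6],
    [4, 5, 6, 6, 6],
]


def risk_rank(p: float, severity: int) -> int:
    """Risk ranking (1 = highest) for a probability and a severity category."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be 1 (Catastrophic) to 5 (Incidental)")
    return RANK[likelihood_category(p) - 1][severity - 1]


# A Seldom event (P = 5e-4) with Major (2) consequences ranks 3.
print(risk_rank(5e-4, 2))
```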
Risk Assessments: QRA (Quantitative Risk Assessment)
1. Identifying what could go wrong
2. Estimating the likelihood of these events occurring
3. Examining the possible consequences of these events
 (Steps 1 to 3: Risk Analysis)
4. Deciding which risks are tolerable and which aren’t
 (Step 4: Risk Assessment)
5. Modifying the activity so that the intolerable risks are reduced or eliminated
 (Step 5: Risk Management, i.e. changes to design and operational practice)
Fatal Accident Rates
Implied Cost of Averting a Fatality (ICAF)
58. In making an assessment of reasonable practicability, there is a need to set criteria on the value of a life or implied cost of averting a statistical fatality (ICAF). HSE’s ‘Reducing Risks Protecting People’ document sets the value of a life at £1,000,000 and by implication therefore the level at which the costs are disproportionate to the benefits gained. In simplistic terms, a measure that costs less than £1,000,000 and saves a life over the lifetime of an installation is reasonably practicable, while one that costs significantly more than £1,000,000, is disproportionate and therefore is not justified. However case law indicates that costs should be grossly disproportionate and therefore costs in excess of this figure (usually multiples) are used in the offshore industry. In reality of course there is no simple cut-off and a whole range of factors, including uncertainty need to be taken account of in the decision making process.
59. In the offshore industry there is a need to take account of the increased focus on societal (or group) risk, i.e. the risk of multiple fatalities in a single event, as a result of society's perceptions of these types of accident. Therefore the offshore industry typically addresses this by using a high proportion factor for the maximum level of sacrifice that can be borne without it being judged ‘grossly disproportionate’; this has the effect of increasing the ICAF value used for decision-making. The typical ICAF value used by the offshore industry is around £6,000,000, i.e. a proportion factor of 6. HSE considers this to be the minimum level for the application of Cost Benefit Analysis (CBA) in the offshore industry.
60. Use of a proportion factor of 6 ensures that any CBA tends towards the conservative end of the spectrum and therefore takes account of the potential for multiple fatalities and uncertainty. Although a proportion factor of 6 tends to be used, there are no agreed standards and it is for each duty holder to apply higher levels if appropriate, for example in very novel designs.
Extract from Assessment Principles for Offshore Safety Cases (APOSC), issued March 2006, UK Health and Safety Executive
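Paragraphs 58 and 59 can be combined into a simple check. Divide a measure's cost by the number of statistical fatalities it averts, then compare the result with an ICAF of £1,000,000 x 6. The Python sketch below is a simplified illustration of that check, not the HSE method. The measure cost, annual risk reduction and installation life are hypothetical. As the extract notes, real decisions must also weigh uncertainty and other factors.

```python
VALUE_OF_LIFE_GBP = 1_000_000  # value of a life quoted in the extract
PROPORTION_FACTOR = 6          # typical offshore proportion factor quoted in the extract


def icaf_threshold(value_of_life: float = VALUE_OF_LIFE_GBP,
                   proportion_factor: float = PROPORTION_FACTOR) -> float:
    """ICAF used for decision-making: value of a life x proportion factor."""
    return value_of_life * proportion_factor


def implied_cost_per_fatality_averted(cost_gbp: float,
                                      fatalities_averted_per_year: float,
                                      life_years: float) -> float:
    """Cost of a measure divided by the statistical fatalities it averts over the installation life."""
    averted = fatalities_averted_per_year * life_years
    if averted <= 0:
        raise ValueError("the measure must avert some risk")
    return cost_gbp / averted


def is_justified(cost_gbp: float, fatalities_averted_per_year: float,
                 life_years: float) -> bool:
    """True if the implied cost does not exceed the ICAF (not grossly disproportionate)."""
    implied = implied_cost_per_fatality_averted(cost_gbp, fatalities_averted_per_year, life_years)
    return implied <= icaf_threshold()


# Hypothetical measure: costs £2m and reduces fatality risk by 0.01 per year over 25 years.
print(implied_cost_per_fatality_averted(2_000_000, 1e-2, 25))
print(is_justified(2_000_000, 1e-2, 25))
```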
Safety Terminology
• Risk Assessment: a subjective evaluation, involving judgment, intuition and experience, in which the level of risk is classified into four levels, with associated measures in Fatalities/Person/Year:
 – 1) Tolerable Risk: level prepared to accept, but will continue to seek reduction; 10^-3 to 10^-5
 – 2) Acceptable Risk: level prepared to accept without seeking further reduction; below 10^-5
 – 3) Unacceptable Risk: level prepared to reject for oneself and others; above 10^-3
 – 4) ALARP: As Low As Reasonably Practicable
• The usual measure of risk at a global level is Fatalities/Person/Year. From a local view, i.e. for your immediate corporate mission, risk can be viewed simply as the “failure of your product”.
• The usual format for the analysis of Risk Assessment is a “Cost-Benefit” Analysis: lives saved versus monetary costs.
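The four-level classification above maps onto thresholds on individual risk. Here is a minimal Python sketch using the quoted figures of 10^-3 and 10^-5 fatalities per person per year. How a value lying exactly on a threshold is treated is an assumption of this sketch.

```python
def classify_individual_risk(fatalities_per_person_year: float) -> str:
    """Classify individual risk using the bands quoted above.

    Above 1e-3: unacceptable. From 1e-5 to 1e-3: tolerable (continue to seek
    reduction). Below 1e-5: acceptable. Boundary handling is assumed.
    """
    if fatalities_per_person_year > 1e-3:
        return "Unacceptable"
    if fatalities_per_person_year >= 1e-5:
        return "Tolerable (continue to seek reduction)"
    return "Acceptable"


for risk in (2e-3, 1e-4, 1e-6):
    print(risk, classify_individual_risk(risk))
```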
What is Risk Management?
Risk Management is the effective identification, assessment and control of Risk:
• Establish Context and Scope
• Identify the Hazards
• Assess the Risk
 – frequency
 – consequences
 – safeguards
• Rank the Risks
• Eliminate / Minimise the Risk
• Ongoing review and monitoring
How is Risk Managed?
• Useful Tools:
 – QRA
 – RAM studies
 – FMECA
 – HAZID / HAZOP
 – Audits
• Best implemented during design
• Qualitatively first, then quantitatively
Why is Risk Management needed?
• Legislation / Standards:
 – Control of Major Hazard Facilities
 – Pipeline Acts
 – OS&H Regulations 1984
 – AS/NZS 4360 Risk Management
• Necessary for business optimisation ($)
• Increases value by:
 – minimising loss ($)
 – maximising opportunity ($)
• Optimises the performance of the facility
• Reduces the probability of becoming the next:
 – Piper Alpha
 – Longford
 – Exxon Valdez
History of Major Hazards Control
• 1960s: Prescriptive approach
 – Recommendations for design and operation
 – (USA) style statutory provisions
 – Consideration of the operation of safety procedures
• 1970s: The “Safety Report” approach
 – Operator has to describe safety management to the Regulator
• 1980s: Concept Safety Evaluations based on Quantified Risk Analysis (QRA) techniques
 – Aims to identify and quantify risks and reduce them to an acceptable level
• 1990s: The “Safety Case” approach
 – Operator has to convince the Regulator of its safety management
 – Companies are now responsible for their actions and must assess and determine the level of risk
• 2000s: Control of Major Hazards
 – Safety Integrity Levels (SILs)
Major incidents marked on the timeline: Flixborough, UK (1974, explosion and fire); Alexander L. Kielland (1980, accommodation platform capsize); Bhopal, India (1984, toxic release); Piper Alpha oil platform (1988, explosion and fire); Bombay High North platform (2005, explosion and fire).
Bowtie Diagram
[Figure: bowtie diagram, with the events leading to the critical event on the left and the events following the critical event on the right]
The process of risk analysis: a sequence of events leads to a hazardous situation (the critical event), which is followed by a series of events leading to a variety of possible consequences.
Identify the Control Measures
[Figure: control measures positioned against causes, hazards, incidents and outcomes]
• Proactive controls (before the incident): elimination measures, reduction measures, prevention measures
• Reactive controls (after the incident): mitigation measures, prevention of escalation, emergency response
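A bowtie can be quantified in two halves. This is one common approach, sketched here under simplifying assumptions:

- On the left, each cause's frequency is multiplied by the probability that each proactive barrier on its path fails. Independent causes then add, as a rare-event OR gate.
- On the right, an event tree splits the critical-event frequency according to whether each reactive barrier works.

Barrier independence is assumed throughout. The cause names, barrier names and all numbers are hypothetical.

```python
from itertools import product


def critical_event_frequency(causes: dict[str, float],
                             prevention_pfd: dict[str, list[float]]) -> float:
    """Left side: frequency (per year) of the critical event.

    Each cause propagates only if every preventive barrier on its path fails,
    so its frequency is multiplied by each barrier's failure probability;
    the causes are then summed (OR gate, rare-event approximation).
    """
    total = 0.0
    for cause, freq in causes.items():
        for pfd in prevention_pfd.get(cause, []):
            freq *= pfd
        total += freq
    return total


def outcome_frequencies(event_freq: float,
                        mitigation_pfd: dict[str, float]) -> dict[tuple[bool, ...], float]:
    """Right side (event tree): frequency of every success/failure combination
    of the mitigation barriers, keyed by a tuple of 'barrier failed' flags."""
    names = list(mitigation_pfd)
    branches: dict[tuple[bool, ...], float] = {}
    for failed in product((False, True), repeat=len(names)):
        p = 1.0
        for name, did_fail in zip(names, failed):
            p *= mitigation_pfd[name] if did_fail else 1.0 - mitigation_pfd[name]
        branches[failed] = event_freq * p
    return branches


# Hypothetical example.
causes = {"Internal corrosion": 1e-2, "Dropped object": 5e-3}
prevention = {"Internal corrosion": [0.1], "Dropped object": [0.1, 0.5]}
f_event = critical_event_frequency(causes, prevention)
tree = outcome_frequencies(f_event, {"ESD": 0.01, "Fire protection": 0.1})
worst_case = tree[(True, True)]  # both mitigation barriers fail
print(f_event, worst_case)
```

The branch frequencies always sum back to the critical-event frequency. This provides a quick consistency check when a real event tree is built.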
Safety Case
“A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment”
To implement a safety case we need to:
• make an explicit set of claims about the system
• produce the supporting evidence
• provide a set of safety arguments that link the claims to the evidence
• make clear the assumptions and judgements underlying the arguments
The Safety Case must demonstrate that the control measures are adequate to eliminate, or reduce as far as practicable, the risks associated with Major Incidents.
Demonstration is typically achieved through:
• reference to Codes of Practice, Standards, Guidance, etc.
• risk assessment (qualitative or quantitative)
The safety case is a “living document” which evolves over the safety life-cycle.
Reliability
RAM DEFINITIONS
• RAM: Reliability, Availability, Maintainability
• Reliability: The ability of an item to perform a required function under stated conditions for a stated period of time (BS4778) – UPTIME
• Failure: The termination of the ability of an item to perform a required function (BS4778) – FAILURE EVENT
• Maintainability: The ability of an item, under stated conditions of use, to be retained in, or restored to, a state in which it can perform its required functions, when maintenance is performed under stated conditions and using prescribed procedures and resources (BS4778) – DOWNTIME
• Availability: The ability of an item (under combined aspects of its reliability, maintainability and maintenance support) to perform a required function at a stated instant of time or over a stated period of time (BS4778) – UPTIME / (UPTIME + DOWNTIME), or MTTF / (MTTF + MTTR)
• Deliverability: The ability of a system to deliver gas to the LNG plant (under combined aspects of availability and capacity) under stated conditions and at a stated instant of time or over a stated period of time – AVAILABILITY x CAPACITY
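Both forms of the availability definition give the same answer on an operating record. The first form is total uptime over total time. The second is mean uptime (an MTTF estimate) divided by mean uptime plus mean downtime (an MTTR estimate). Here is a minimal Python sketch using a hypothetical record of three run/repair cycles. When the numbers of up and down periods are equal, the two calculations agree.

```python
def availability_from_log(uptimes_h: list[float], downtimes_h: list[float]) -> float:
    """Availability = UPTIME / (UPTIME + DOWNTIME)."""
    up, down = sum(uptimes_h), sum(downtimes_h)
    return up / (up + down)


def availability_from_means(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


# Hypothetical operating record (hours): alternating run and repair periods.
uptimes_h = [4000.0, 6000.0, 5000.0]
downtimes_h = [300.0, 200.0, 250.0]

mttf_h = sum(uptimes_h) / len(uptimes_h)      # mean time to failure
mttr_h = sum(downtimes_h) / len(downtimes_h)  # mean time to repair

a_log = availability_from_log(uptimes_h, downtimes_h)
a_means = availability_from_means(mttf_h, mttr_h)
print(round(a_log, 4), round(a_means, 4))
```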
Reliability: Key Design Requirement
• Reliability is as fundamental a design requirement as function and performance
• For every Functional requirement, a Reliability requirement can (in principle) be specified
 – Function: Seal A must not leak
 – Reliability: P(seal A does not leak) > 0.99
• For every Performance requirement, a Reliability requirement can (in principle) be specified
 – Function: Valve must close in less than 10 seconds
 – Reliability: P(time to close < 10 s) > 0.99
Failure Characteristics
• Different components fail in different patterns:
 – Flow components, chokes & valves: wear out
 – Mechanical components, wellheads: long life
 – Electronic components: fail early or last a long time
 – Pressure containment, pipes: the system fails its pressure test, or has a long life
 – Environmental influences (CO2, H2S, chlorides, over-protective CP and H2 build-up): corrode progressively or induce rapid cracking failures
• These patterns give rise to different failure distributions: Normal, Exponential, Weibull, etc.
• Simple prediction uses the Exponential distribution, R(t) = e^(-t/MTTF), as an approximation; this assumes a constant failure rate
• Complex simulation programs use distributions matched to the components
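The different failure patterns listed above are often represented with a Weibull distribution. Its shape parameter β controls whether the hazard (failure) rate falls, stays constant or rises with age. Here is a minimal Python sketch. The characteristic life η and the β values are illustrative assumptions, not data for the components named.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = (beta/eta) * (t/eta)^(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# beta < 1: decreasing hazard (early failures, e.g. electronics)
# beta = 1: constant hazard; the exponential case R(t) = exp(-t/MTTF) with eta = MTTF
# beta > 1: increasing hazard (wear-out, e.g. chokes and valves)
for beta in (0.5, 1.0, 3.0):
    hazards = [weibull_hazard(t, beta, eta=10.0) for t in (1.0, 5.0, 9.0)]
    print(beta, [round(h, 4) for h in hazards])
```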
Factors influencing failure rate
In general, the failure rate of a component or element depends on four main factors:
(a) Quality
(b) Temperature
(c) Environment
(d) Stress
These factors are influenced by:
• the design process
• manufacture
• the way the system is operated
Probabilistic Design
[Figure: probability distribution functions of load and resistance]
Stress and Strength
[Figure: overlapping of stress and strength distributions]
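The failure probability corresponds to the overlap of the stress and strength distributions. For independent, normally distributed stress S and strength R, the classic interference result is P_f = Φ(-(μR - μS) / sqrt(σR² + σS²)). Here is a minimal Python sketch. The normal assumption and the MPa figures are illustrative only.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interference_failure_probability(mu_strength: float, sd_strength: float,
                                     mu_stress: float, sd_stress: float) -> float:
    """P(stress > strength) for independent normal stress and strength."""
    margin_mean = mu_strength - mu_stress
    margin_sd = math.sqrt(sd_strength ** 2 + sd_stress ** 2)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical: strength 600 +/- 40 MPa, stress 450 +/- 30 MPa (margin = 3 standard deviations).
pf = interference_failure_probability(600, 40, 450, 30)
print(f"{pf:.2e}")
```

Widening either distribution, or moving their means closer together, increases the overlap and therefore P_f.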
Failure Rate and Mean Time To Failure
Example: constant failure rate
• Set $h(t) = \lambda$, a constant failure rate. Integrating gives the reliability:
$$R(t) = \exp\left(-\int_0^t h(u)\,du\right) = e^{-\lambda t}$$
• This is often used in the reliability analysis of systems.
Mean Time To Failure (MTTF): the average time a device or system will operate, without repair, before failure. From the expected value theorem, $E(x) = \int x f(x)\,dx$, and integrating by parts (with $f(t) = -dR/dt$), the MTTF can be determined as:
$$\mathrm{MTTF} = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$$
For the special case of a constant failure rate:
$$\mathrm{MTTF} = \frac{1}{\lambda}$$
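The result MTTF = ∫ R(t) dt = 1/λ can be checked numerically. Here is a minimal Python sketch using the trapezoidal rule. It truncates the integral where R(t) has fallen to about e^-20. The failure rate of 0.004 per year is an illustrative choice.

```python
import math
from typing import Callable


def mttf_numeric(reliability: Callable[[float], float], t_max: float,
                 steps: int = 100_000) -> float:
    """Approximate MTTF = integral of R(t) from 0 to infinity by the trapezoidal rule on [0, t_max]."""
    dt = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    for i in range(1, steps):
        total += reliability(i * dt)
    return total * dt


LAM = 0.004  # failures per year, i.e. MTTF = 250 years


def r_exponential(t: float) -> float:
    """Constant failure rate reliability R(t) = exp(-lambda t)."""
    return math.exp(-LAM * t)


approx = mttf_numeric(r_exponential, t_max=20 / LAM)
exact = 1 / LAM
print(approx, exact)
```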
Availability
Availability Improvement
• Availability = MTTF / (MTTF + MTTR)
• It is expressed as a fixed (steady-state) ratio, NOT as a function of time
• Availability can be improved in 2 ways:
 – Extend the failure-free operating period (reliability)
 – Reduce the time to restore the system (maintainability)
• Subsea time to repair must include: detection, location, analysis of repair, spares / repair kit, qualification, mobilisation, deployment, repair execution and commissioning
• There is increased value in driving for Reliability rather than Maintainability to achieve Availability
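This Python sketch compares the two routes to higher availability for a subsea item. MTTR is built up from the repair phases listed above, using hypothetical durations. Because A = 1 / (1 + MTTR/MTTF) depends only on the ratio MTTR/MTTF, doubling MTTF and halving MTTR give the same availability. The difference is that doubling MTTF also halves the number of failures, and hence vessel interventions, over the field life. That is one way to read the final bullet. All durations and the 20-year life are assumptions.

```python
# Hypothetical duration (days) of each subsea repair phase listed above.
REPAIR_PHASES_DAYS = {
    "Detection": 1,
    "Location": 2,
    "Analysis of repair": 5,
    "Spares / repair kit": 30,
    "Qualification": 10,
    "Mobilisation": 20,
    "Deployment": 3,
    "Repair execution": 4,
    "Commissioning": 2,
}


def availability(mttf_days: float, mttr_days: float) -> float:
    """Steady-state availability MTTF / (MTTF + MTTR)."""
    return mttf_days / (mttf_days + mttr_days)


mttr = sum(REPAIR_PHASES_DAYS.values())  # total subsea time to repair, days
mttf = 5 * 365.0                         # hypothetical 5-year MTTF

base = availability(mttf, mttr)
better_reliability = availability(2 * mttf, mttr)      # double MTTF
better_maintainability = availability(mttf, mttr / 2)  # halve MTTR

# Expected failures (interventions) over a hypothetical 20-year field life.
life_days = 20 * 365.0
interventions = {
    "base": life_days / mttf,
    "double MTTF": life_days / (2 * mttf),
    "halve MTTR": life_days / mttf,
}
print(round(base, 4), round(better_reliability, 4), round(better_maintainability, 4))
print(interventions)
```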
Reliability & Repair Data: Reliability / Availability of Repairable Items
Hydraulic System Elements
Assessment period t = 30 years; quantity of each item = 1; reliability over the period R = exp(-λt); availability A = μ / (λ + μ)

No. | Repairable item | MTTF (years) | Failure rate λ (per year) | Reliability over period R | Unreliability 1-R | MTTR (days) | Repair rate μ (per year) | Availability A | Unavailability 1-A
1 | Production piping | 10000 | 0.0001 | 0.99700 | 0.0030 | 100 | 3.650 | 0.999973 | 0.000027
2 | Test / vent piping | 5000 | 0.0002 | 0.99402 | 0.0060 | 100 | 3.650 | 0.999945 | 0.000055
3 | 10 inch 10 kpsi gate valve, isolation function | 1000 | 0.0010 | 0.97045 | 0.0296 | 70 | 5.214 | 0.999808 | 0.000192
4 | 10 inch 10 kpsi gate valve, HIPPS function | 250 | 0.0040 | 0.88692 | 0.1131 | 20 | 18.250 | 0.999781 | 0.000219
5 | 1/2" test valve | 250 | 0.0040 | 0.88692 | 0.1131 | 20 | 18.250 | 0.999781 | 0.000219
6 | 1/2" vent valve | 250 | 0.0040 | 0.88692 | 0.1131 | 20 | 18.250 | 0.999781 | 0.000219
7 | PZT sensor | 50 | 0.0200 | 0.54881 | 0.4512 | 20 | 18.250 | 0.998905 | 0.001095
8 | HIPPS hydraulic module | 210 | 0.0048 | 0.86688 | 0.1331 | 20 | 18.250 | 0.999739 | 0.000261
9 | Check valve | 500 | 0.0020 | 0.94176 | 0.0582 | 20 | 18.250 | 0.999890 | 0.000110
10 | HIPPS SEM | 42 | 0.0238 | 0.48954 | 0.5105 | 20 | 18.250 | 0.998697 | 0.001303
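Each column of the repair data table can be reproduced from MTTF and MTTR alone, taking the repair rate as μ = 365 / MTTR (days) per year. Here is a minimal Python sketch. The final line combines all ten items in series, meaning every item must be available. That series arrangement is an illustrative assumption; the table itself does not state it.

```python
import math

PERIOD_YEARS = 30.0  # assessment period t


def item_metrics(mttf_years: float, mttr_days: float, t: float = PERIOD_YEARS) -> dict:
    """Reliability over t and steady-state availability for one repairable item."""
    lam = 1.0 / mttf_years  # failure rate, per year
    mu = 365.0 / mttr_days  # repair rate, per year
    reliability = math.exp(-lam * t)
    avail = mu / (lam + mu)
    return {"lambda": lam, "R": reliability, "1-R": 1.0 - reliability,
            "mu": mu, "A": avail, "1-A": 1.0 - avail}


# (item, MTTF in years, MTTR in days), as listed in the repair data table.
ITEMS = [
    ("Production piping", 10000, 100),
    ("Test / vent piping", 5000, 100),
    ("10 inch 10 kpsi gate valve, isolation", 1000, 70),
    ("10 inch 10 kpsi gate valve, HIPPS", 250, 20),
    ('1/2" test valve', 250, 20),
    ('1/2" vent valve', 250, 20),
    ("PZT sensor", 50, 20),
    ("HIPPS hydraulic module", 210, 20),
    ("Check valve", 500, 20),
    ("HIPPS SEM", 42, 20),
]

for name, mttf, mttr in ITEMS:
    m = item_metrics(mttf, mttr)
    print(f"{name:40s} R={m['R']:.5f} A={m['A']:.6f}")

# Illustrative assumption: all ten items in series.
system_A = math.prod(item_metrics(mttf, mttr)["A"] for _, mttf, mttr in ITEMS)
print(f"Series availability of all items: {system_A:.5f}")
```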
Types of Redundancy
• Classified by how the redundant elements are introduced into the circuit
• Active or Static Redundancy: external components are not required to perform the functions of detection, decision and switching when an element or path in the structure fails.
• Standby or Dynamic Redundancy: external elements are required to detect, make a decision and switch to another element or path as a replacement for a failed element or path.
• Generally, subsea systems (e.g. umbilicals, the MCS) use active redundancy (hot standby)
• As an alternative to redundancy, consider Diversity: using alternative arrangements of a different kind, e.g. the Back-Up Intervention Control System (BUICS) available on Snohvit in case the umbilical fails
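To contrast the two redundancy types, here is a minimal Python sketch for two identical units with a constant failure rate λ over mission time t. The standby formulas rest on textbook simplifications rather than properties of any particular subsea system:

- The idle spare cannot fail.
- The external detect/decide/switch function is perfect, unless `p_switch` is set below 1.

```python
import math


def active_parallel_reliability(lam: float, t: float, n: int = 2) -> float:
    """Active (hot) redundancy: n identical units all running; the system
    fails only when all n have failed. R = 1 - (1 - e^(-lam t))^n."""
    return 1.0 - (1.0 - math.exp(-lam * t)) ** n


def cold_standby_reliability(lam: float, t: float) -> float:
    """Standby redundancy, one spare that cannot fail while idle and perfect
    switching: R = e^(-lam t) (1 + lam t)."""
    return math.exp(-lam * t) * (1.0 + lam * t)


def standby_with_switch(lam: float, t: float, p_switch: float) -> float:
    """As cold standby, but the external switch-over succeeds only with
    probability p_switch: R = e^(-lam t) (1 + p_switch lam t)."""
    return math.exp(-lam * t) * (1.0 + p_switch * lam * t)


lam, t = 0.1, 10.0  # illustrative: lam * t = 1
print("single      ", round(math.exp(-lam * t), 4))
print("active (2)  ", round(active_parallel_reliability(lam, t), 4))
print("standby     ", round(cold_standby_reliability(lam, t), 4))
print("standby 0.9 ", round(standby_with_switch(lam, t, 0.9), 4))
```

Standby looks better on paper, but only because of the idealised spare and switch. Active redundancy avoids depending on that external switching function.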
Simple Parallel Redundancy (Active, Type 1)
In its simplest form, redundancy consists of a simple parallel combination of elements. If any element fails open, identical paths exist through the parallel redundant elements.
Bimodal Parallel Redundancy (Active, Type 3)
(a) Bimodal Parallel/Series Redundancy
(b) Bimodal Series/Parallel Redundancy
A series connection of parallel redundant elements provides protection against shorts and opens. A direct short across the network due to a single element shorting is prevented by a redundant element in series. An open across the network is prevented by the parallel element. Network (a) is useful when the primary element failure mode is open. Network (b) is useful when the primary element failure mode is short.
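Series and Parallel Availability Calculations (SAP)
Series: availability is the product of the element availabilities
• Umbilical, PUP, Subsea: elements with availabilities of 90.000% (unavailability 10.000%) and 80.000% (unavailability 20.000%) in series give 90% x 80% = 72.000% availability (unavailability 28.000%)
Parallel: unavailability is the product of the element unavailabilities
• SCM A: availability 90.000% (unavailability 10.000%), MTTF 4.5 years, MTTR 0.5 years
• SCM B: availability 90.000% (unavailability 10.000%), MTTF 4.5 years, MTTR 0.5 years
• SCM A OR SCM B: unavailability 10% x 10% = 1.000%, availability 99.000%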
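Here are the series and parallel rules written out as code. Availabilities multiply in series, and unavailabilities multiply in parallel. This minimal Python sketch reproduces the 72% and 99% figures, assuming the elements fail independently.

```python
import math


def series_availability(avails: list[float]) -> float:
    """Series: every element must be available, so availabilities multiply."""
    return math.prod(avails)


def parallel_availability(avails: list[float]) -> float:
    """Parallel: the function is lost only if every element is unavailable,
    so unavailabilities multiply."""
    return 1.0 - math.prod(1.0 - a for a in avails)


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


series = series_availability([0.90, 0.80])
scm = availability(4.5, 0.5)                  # each SCM: MTTF 4.5 y, MTTR 0.5 y
dual_scm = parallel_availability([scm, scm])  # SCM A OR SCM B
print(f"series {series:.3%}, single SCM {scm:.3%}, dual SCM {dual_scm:.3%}")
```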
Maintainability
Maintainability
• Philosophy: preventative, corrective, opportunistic
• Actions to demonstrate that the function is in good condition:
 – In-service monitoring, testing and footprinting
 – Corrosion monitoring
 – Noise / vibration monitoring
 – Fluid monitoring, sand detection, SRBs, chlorides, scale
• Repair planning and contingencies, pipeline repair systems, spares stock holding, stand-by or call-off intervention contracts, alternative temporary systems
• Access systems and tooling
• All aim to reduce MTTR
• Reliability Centred Maintenance
• Historic records, trends, predictive capability and feedback loops
Maintenance Philosophy
• Subsea excess capacity (typical)
• Subsea high redundancy (typical):
 – spare wells
 – valves
 – spare control systems
• Mobilise maintenance when…?
Maintaining the Gorgon Field
Deliverability
Deliverability
• Deliverability = Availability x Capacity
• Useful terms:
 – DCQ: Daily Contract Quantity
 – Shortfall: quantity not supplied
• Security of supply
• Contract shape and style
• Business risk and exposure
• The best programs focus on the issue
• Used to understand and quantify risk, and to contract accordingly
• Shapes contract terms, e.g. from a DCQ to a rolling 24-hour average quantity
• “It’s about the money, stupid”
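The following is a minimal Python sketch of deliverability and shortfall. The capacity, availability, DCQ and daily deliveries are all hypothetical, in arbitrary units per day. It shows that a system whose average deliverability exceeds DCQ can still fall short on individual days, which is why contract shape and style matter.

```python
def deliverability(availability: float, capacity: float) -> float:
    """Deliverability = Availability x Capacity."""
    return availability * capacity


def shortfall(daily_delivered: list[float], dcq: float) -> float:
    """Quantity not supplied: sum of (DCQ - delivery) over days when delivery fell below DCQ."""
    return sum(max(dcq - q, 0.0) for q in daily_delivered)


# Hypothetical figures (arbitrary units per day).
capacity = 120.0
availability = 0.95
dcq = 100.0

expected_delivery = deliverability(availability, capacity)  # average daily deliverability
week = [100.0, 120.0, 80.0, 110.0, 95.0, 100.0, 0.0]         # hypothetical deliveries, one outage day
print(expected_delivery, shortfall(week, dcq))
```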
Deliverability
• How to get high deliverability:
 – System analysis & engineering
 – Understanding the frequency & duration of failures
 – Standard sizes and component ratings at no extra cost
 – De-bottlenecking & tuning the capacity of the system
 – Line pack and storage
 – Ability of downstream to respond to peak turn-up rates
 – Capacity and ullage as pressure drops due to well failure
 – Temporary increase of flow velocity / erosion limits with respect to life
 – N out of M philosophy and sparing insurance
• Operability studies & modelling
• Supply chain models based on “Just In Time” logistics
• Define the value of reliability, availability and deliverability (Re, Av, De) in relation to the project
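The “N out of M” philosophy can be quantified with a binomial model: the probability that at least N of M identical, independent units are available. Here is a minimal Python sketch. The well count and per-well availability are hypothetical.

```python
from math import comb


def k_out_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n identical, independent units are
    available, each with availability a (binomial model)."""
    return sum(comb(n, i) * a ** i * (1 - a) ** (n - i) for i in range(k, n + 1))


# Hypothetical: 8 wells are needed for full capacity, each 95% available.
required, installed, a_well = 8, 10, 0.95
with_spares = k_out_of_n_availability(required, installed, a_well)  # 8 out of 10
no_spares = k_out_of_n_availability(required, required, a_well)     # 8 out of 8
print(f"8 of 10: {with_spares:.4f}, 8 of 8: {no_spares:.4f}")
```

Two spare wells raise the probability of full capacity from about 66% to about 99%. This is the “sparing insurance” idea in numbers.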
Safety Integrity Levels
What is a Safety Integrity Level?
Safety Integrity Level is the required “reliability” of a safety function.

Safety Integrity Level | Low demand mode of operation (average probability of failure to perform its design function on demand) | High demand or continuous mode of operation (probability of a dangerous failure per annum)
4 | ≥ 10^-5 to < 10^-4 | ≥ 10^-5 to < 10^-4
3 | ≥ 10^-4 to < 10^-3 | ≥ 10^-4 to < 10^-3
2 | ≥ 10^-3 to < 10^-2 | ≥ 10^-3 to < 10^-2
1 | ≥ 10^-2 to < 10^-1 | ≥ 10^-2 to < 10^-1
• Risk reduction requiring a SIL 4 function should not be implemented. Rather, this should prompt a redistribution of the required risk reduction across other measures.
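Classifying a calculated PFD against the low-demand column of the SIL table is a simple band lookup. Here is a minimal Python sketch. Two conventions are chosen for this illustration only: a PFD of 10^-1 or more returns 0, and anything below 10^-5 is capped at SIL 4.

```python
def sil_from_pfd_avg(pfd_avg: float) -> int:
    """Return the SIL band (1-4) containing a low-demand PFDavg, or 0 if no SIL applies."""
    if pfd_avg <= 0:
        raise ValueError("PFDavg must be positive")
    if pfd_avg >= 1e-1:
        return 0  # worse than SIL 1: no SIL claim
    if pfd_avg >= 1e-2:
        return 1
    if pfd_avg >= 1e-3:
        return 2
    if pfd_avg >= 1e-4:
        return 3
    return 4  # the SIL 4 band starts at 1e-5; lower values are still capped at SIL 4


for pfd in (1.1e-2, 5.5e-3, 1.21e-4):
    print(pfd, "SIL", sil_from_pfd_avg(pfd))
```

Per the note above, a result of 4 should prompt redistribution of the required risk reduction rather than implementation of a SIL 4 function.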
Classic HIPPS Configuration
SIL 3 HIPPS example
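[Figure: risk reduction diagram along an axis of increasing risk]
• Initial risk of high pressure getting past the tree production choke (Pressure Regulating System): 10^0 pa (once per annum)
• Tolerable risk: 10^-5 pa (acceptable failure rate per DNV)
• Residual risk: 1.87 x 10^-6 pa
• Necessary risk reduction: from the initial risk down to the tolerable risk
• Actual risk reduction: from the initial risk down to the residual risk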
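The necessary and actual risk reductions in this example can be expressed as risk reduction factors (RRF): the ratio of the initial frequency to the tolerable or residual frequency. Here is a minimal Python sketch using the figures listed. The variable names are illustrative.

```python
initial_risk = 1.0       # per annum: 10^0, once per annum
tolerable_risk = 1e-5    # per annum: acceptable failure rate per DNV
residual_risk = 1.87e-6  # per annum: residual risk in the SIL 3 HIPPS example

necessary_rrf = initial_risk / tolerable_risk  # reduction the protection must provide
actual_rrf = initial_risk / residual_risk      # reduction actually achieved
max_overall_pfd = tolerable_risk / initial_risk  # allowed PFD for a once-per-annum demand

print(f"necessary RRF {necessary_rrf:.3g}, actual RRF {actual_rrf:.3g}")
print("tolerable risk met:", residual_risk <= tolerable_risk)
```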
Risk Reduction: Layers of Protection
Pressure Protection System for Pipeline
[Figure: layered risk reduction diagram along an axis of increasing risk]
• Initial risk of hydrate blockage and overpressuring the pipeline: 10^0 pa (once per annum)
• Tolerable risk: 10^-5 pa (acceptable failure rate per DNV)
• Necessary risk reduction: from the initial risk down to the tolerable risk; actual risk reduction: from the initial risk down to the residual risk
• Risk reduction achieved by all safety-related systems and external risk reduction facilities:
 – Risk reduction by the Pressure Regulating System (SIL 2)
 – Risk reduction by the Pressure Safety System (SIL 3)
 – Partial risk covered by other systems, e.g. manual shutdown, Pipeline Simulator, etc.
Equipment Failure Rates
Equipment PFDs
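PFD as a function of Test Interval
[Figure: probability of failure on demand (PFD) plotted against time, rising between proof tests at test interval τ_i; labels include TIF (test independent failure)]
PFD_avg = ½ λ τ_i, where λ is the failure rate and τ_i is the proof test interval.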
PFD for a simple system, proof test interval = 1 yr:
• For the pressure transmitter (sensing element), PFD_SE = 0.44 x 10^-3
• For the logic solving element, PFD_LS = 7.0 x 10^-3
• For the final element, PFD_FE = 3.5 x 10^-3
Therefore, for the safety function:
PFD_AVG = 0.44 x 10^-3 + 7.0 x 10^-3 + 3.5 x 10^-3 ≈ 1.1 x 10^-2 ≡ Safety Integrity Level 1

Change the proof test interval to 6 months:
• PFD_SE = 0.22 x 10^-3
• PFD_LS = 3.5 x 10^-3
• PFD_FE = 1.75 x 10^-3
PFD_AVG = 5.5 x 10^-3 ≡ Safety Integrity Level 2
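The 6-month figures follow from PFD_avg = ½ λ τ_i: PFD scales linearly with the proof test interval. This minimal Python sketch back-calculates each element's failure rate from its 1-year PFD and recomputes the PFD at 6 months. It assumes that the simplified formula applies to every element, and that the PFDs of elements acting in series add, which is a good approximation when each PFD is small.

```python
def pfd_avg(failure_rate_per_year: float, test_interval_years: float) -> float:
    """Simplified single-element PFDavg = 1/2 * lambda * tau_i."""
    return 0.5 * failure_rate_per_year * test_interval_years


# Element PFDs at a 1-year proof test interval (from the example above).
PFD_1YR = {
    "sensing element (pressure transmitter)": 0.44e-3,
    "logic solver": 7.0e-3,
    "final element": 3.5e-3,
}

# Back-calculate lambda from PFD = lambda * tau / 2 with tau = 1 year.
failure_rates = {name: 2.0 * pfd / 1.0 for name, pfd in PFD_1YR.items()}

# Recompute each element at a 6-month (0.5-year) proof test interval.
pfd_6m = {name: pfd_avg(lam, 0.5) for name, lam in failure_rates.items()}

# Elements in series for the safety function: their small PFDs add.
system_1yr = sum(PFD_1YR.values())
system_6m = sum(pfd_6m.values())
print(f"1 yr: {system_1yr:.3g}, 6 months: {system_6m:.3g}")
```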
Layered Protection System
[Figure: layered protection system; labels: Subsea Control Module, Dump Valve, Subsea Electronics Module, Gas Plant DCS PPS card]
• Single layer: PFD_AVG = 1.1 x 10^-2 ≡ Safety Integrity Level 1 (annual testing)
• Dual layers: PFD_AVG = (1.1 x 10^-2) x (1.1 x 10^-2) ≈ 1.2 x 10^-4 ≡ “Safety Integrity Level 3” (annual testing, assuming no common mode failure)
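The dual-layer figure is the product of the PFDs of the independent layers. Here is a minimal Python sketch. As stated, it assumes no common mode failure. If the layers share causes of failure, the product gives an optimistic result.

```python
def combined_pfd_independent(*layer_pfds: float) -> float:
    """Independent protection layers: every layer must fail on demand,
    so the layer PFDs multiply (no common mode failure assumed)."""
    result = 1.0
    for p in layer_pfds:
        result *= p
    return result


single = 1.1e-2  # PFDavg of one layer with annual testing
dual = combined_pfd_independent(single, single)
print(f"{dual:.3g}", "within SIL 3 band:", 1e-4 <= dual < 1e-3)
```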
Conclusion
The cost of failure: BP experience
These are the direct costs only. Foinaven also incurred:
• FPSO demurrage charges
• NPV of production (20% x 80,000 bbl/d x 300 days x 25 USD/bbl): 120 MUSD
• Share value erosion and significantly lower dividends for the period
• Loss of public / shareholder confidence in BP's ability to manage technology
• Reputation damage
• Tangible losses > 250 MUSD; measurable losses at least the same again
• Changed BP contracting philosophy: EPC to EPCM (managed engineering)
• Schiehallion SCMs were run at a single high pressure, but the DCV pilots were not requalified and were subsequently overstressed and leaked.
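The production-loss figure quoted for Foinaven can be checked with a few lines of Python. The figures are those in the bullet above; the variable names are illustrative.

```python
share_lost = 0.20          # fraction of production lost
rate_bbl_per_day = 80_000  # production rate
days = 300                 # duration
price_usd_per_bbl = 25     # oil price

lost_value_usd = share_lost * rate_bbl_per_day * days * price_usd_per_bbl
print(f"{lost_value_usd / 1e6:.0f} MUSD")
```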
The BP Bathtub Curve
Value of Performance An interesting echo from the 1970’s
or SAFETY