Summary
Reliability-Centered Maintenance (RCM) is the world’s dominant proactive method for determining maintenance requirements for physical assets. It is a thorough methodology that produces well-documented and fully justified decisions on asset management strategies. It has been highly successful at improving reliability and safety while reducing overall maintenance costs and freeing capacity for revenue generation. RCM had its beginnings in the commercial airline sector and has spread to virtually every other capital intensive industry. In many cases, it has a substantial payback on its initial investment in addition to several benefits that are not as easily quantified: safety, environmental compliance and product quality. RCM provides great benefit and is worthy of consideration if reliability is truly important to your business.
The Value of RCM in Business Today
Reliability-Centered Maintenance (RCM) is the world’s dominant proactive method for determining the maintenance requirements for physical assets. It is of particular value to capital intensive industries where business success depends heavily on assets operating reliably and safely.
RCM was developed in the commercial airline sector in the 1970s. At the time, the commercial aircraft industry was experiencing some 60 crashes per million take-offs. Roughly 40 of those were attributed to equipment failure. The industry was in the early stages of design of its jumbo jets – the Boeing 747, McDonnell Douglas DC-10 and Lockheed L-1011. It feared that if airliners continued to crash at the same rate it would not attract many customers, ultimately failing to reach its growth potential. The airlines attempted to cure the problem by increasing the amount of maintenance they were doing – after all, many failures were equipment related. To their dismay, they discovered that in many cases the increased maintenance even made things worse!
Stan Nowlan and Howard Heap¹ studied aircraft failures looking for correlations between those failures and the maintenance that was being performed. They recognized that maintenance was a contributing factor to many of the failures but in some other cases maintenance was able to improve the situation. They looked for patterns and found them. There were actually six patterns of Conditional Probability of Failure².
[Figure: the six patterns of conditional probability of failure plotted against operating age. A 4%, B 2%, C 5%, D 7%, E 14%, F 68%.]
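The idea of conditional probability of failure can be made concrete with a small calculation. The sketch below is only an illustration, not part of the Nowlan and Heap study: the function name and the fleet data are hypothetical. It follows the definition used in this paper, namely the chance that an item fails in an age interval given that it survived to the start of that interval.

```python
def conditional_failure_probabilities(initial_population, failures_per_interval):
    """For each age interval, return failures divided by survivors at its start."""
    survivors = initial_population
    probabilities = []
    for failures in failures_per_interval:
        if survivors <= 0:
            break  # nothing left in service, so nothing left to fail
        probabilities.append(failures / survivors)
        survivors -= failures
    return probabilities


# Hypothetical fleet of 1,000 items; failures counted in three equal age intervals.
failures = [100, 90, 81]
rates = conditional_failure_probabilities(1000, failures)
print([round(r, 3) for r in rates])  # [0.1, 0.1, 0.1]
```

Note that the raw number of failures falls from interval to interval while the conditional probability stays constant, because fewer items remain in service. That flat line is what a purely random pattern looks like.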
¹ Nowlan, F. Stanley, and Howard F. Heap, “Reliability-Centered Maintenance,” Department of Defense, Washington, DC, 1978. Report number AD-A066579.
² Conditional Probability of Failure is the probability of failure of an asset at any instant in time given the condition that it has survived to that point in time.
© 2006, Conscious Asset Management +1-705-431-6598 www.consciousasset.com
[email protected]
• Pattern A is the well-known bathtub curve. It begins with a high incidence of failure (known as infant mortality) followed by a constant or gradually increasing conditional probability of failure, then a wear-out zone. This pattern appears in biological systems (like us) and in simple systems that have only a few dominant failure modes.
• Pattern B – classic wear out, shows constant or slowly increasing conditional probability of failure, ending in a wear-out zone. Prior to the Nowlan and Heap study, this was the dominant view of equipment failure. It occurs in assets that are in contact with product, process fluids and slurries, and in drive components.
• Pattern C – gradual aging, shows slowly increasing conditional probability of failure, but there is no identifiable wear-out age. This occurs where there is erosion, corrosion or fatigue.
• Pattern D – best new, shows low conditional probability of failure when the item is new or just out of the shop, then a rapid increase to a constant level. This occurs in systems, usually complex, that are maintained and put into service by highly qualified technicians before being turned over to less qualified operators. Examples are hydraulic, fluid power and pneumatic systems.
• Pattern E – totally random, shows a constant conditional probability of failure at all ages. This pattern appears in many systems or components that are, on their own, not typically subject to maintenance work. Rolling element bearings and incandescent light bulbs are examples of this type of failure.
• Pattern F – starts with high infant mortality, dropping to a constant or slowly decreasing conditional probability of failure. This is common in complex systems that are subject to start up and shut down cycles, frequent overhaul type maintenance work and product cycle fluctuations.
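The weight of the six patterns is easier to appreciate when the percentages reported for civil aircraft are grouped. The short sketch below simply adds them up. The grouping of A, B and C as "age-related" is an assumption made here for illustration, based on the pattern descriptions above (wear-out zones or gradual aging); the variable and function names are hypothetical.

```python
# Share of items conforming to each pattern in the civil aircraft study (percent).
PATTERN_SHARE_PERCENT = {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68}

# Assumed grouping: A, B and C show wear-out or gradual aging; D, E and F do not.
AGE_RELATED = ("A", "B", "C")
NOT_AGE_RELATED = ("D", "E", "F")


def combined_share(patterns):
    """Add up the percentage share of the given patterns."""
    return sum(PATTERN_SHARE_PERCENT[p] for p in patterns)


total = combined_share(PATTERN_SHARE_PERCENT)  # iterating a dict yields its keys
age_related = combined_share(AGE_RELATED)
not_age_related = combined_share(NOT_AGE_RELATED)
print(total, age_related, not_age_related)  # 100 11 89
```

Under that grouping only about one item in nine shows any relationship between age and failure, which is why fixed-interval overhauls can be expected to help so few of them.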
Nowlan and Heap’s study on civil aircraft showed that 4% of the items conformed to pattern A, 2% to B, 5% to C, 7% to D, 14% to E and no fewer than 68% to pattern F. The number of times these patterns occur in aircraft is not necessarily the same as in industry. But there is no doubt that as assets become more complex, we see more and more of patterns E and F. Later studies³ have shown the same patterns with somewhat different (but similar) distributions.
³ Broberg (1973) also studied aircraft and two studies were performed on submarine failures (MSP in 1982 and SUBMEPP in 2001). All show similar patterns with somewhat different percentage distributions. In the submarine studies there were far fewer pattern F failures but more of patterns E and B. This is attributed to maintenance programs that include “run in” of assets after they are maintained in order to eliminate infant mortality once the asset is put into service.
These findings contradicted the then-current belief that there was a connection between reliability and operating age. This belief led to the idea that the more often an item is overhauled, the less likely it would be to fail. Nowadays, this is seldom true. Unless there is a dominant age-related failure mode, age limits do little or nothing to improve the reliability of complex items. In fact, scheduled overhauls often increase overall failure rates by introducing infant mortality into otherwise stable systems.
To make practical use of this information, Nowlan and Heap developed the RCM process and their findings were published by the US Department of Defense (1978). Various military standards⁴ (in the US and UK) and an aerospace industry standard⁵ (international) were then published. Two other excellent reference books on RCM soon followed for commercial application. John Moubray published the first edition of his book⁶ in 1991, followed by Anthony M. Smith⁷ in 1993. Smith focused primarily on the electric generation industry while Moubray’s work was more general in nature and has broader application. Throughout the 1990s a proliferation of various maintenance program development methods arose, all of them claiming to be RCM. For reasons of improved economy, the US military then decided to eliminate the requirement for use of its stringent MIL-STDs if suitable commercial alternatives were available. To clear the confusion that existed in the commercial market, and at the encouragement of the US DOD, the SAE developed its standard, JA1011⁸. The remainder of this paper discusses RCM as described in the SAE standard.
⁴ US DOD: MIL-STD 2173(AS), NAVAIR 0025-403, S9081-AB-GIB-010/MAINT (The USN’s RCM Handbook) and UK MOD: NES 45.
⁵ MSG-3, “Maintenance Program Development Document”, Air Transport Association, Washington, DC.
⁶ Moubray, John, “Reliability-centred Maintenance II”, 1991, Butterworth-Heinemann, Oxford, UK (now in 2005, it is in its 2nd edition).
⁷ Smith, Anthony M., “Reliability-Centered Maintenance”, 1993, McGraw-Hill, Inc., New York, NY.
⁸ SAE JA1011, “Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes”, Aug 1999.
RCM has proven to be highly successful. In commercial airlines it has reduced crashes from 60 per million take-offs to only 2 per million, a 30-fold improvement, and reduced equipment related crashes from 40 per million to 0.3 per million. Commercial air travel today is extremely safe with an average of one commercial airliner crashing per month. That may seem high, but relative to the industry’s performance in the 1960s and 70s it is a vast improvement. Your chances of being hit by lightning are 5 times greater!
The RCM process requires the answering of 7 questions in sequence:
1. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
2. In what ways can it fail to fulfill its functions (functional failures)?
3. What causes each functional failure (failure modes)?
4. What happens when each failure occurs (failure effects)?
5. In what ways does each failure matter (failure consequences)?
6. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
7. What should be done if a suitable proactive task cannot be found (default actions)?
These seem simple enough, but answering these questions satisfactorily requires a deep understanding of RCM that goes beyond the scope of this paper. The results of RCM analyses are documented on Information Worksheets and Decision Logic Worksheets. Many companies opt for computerized database systems to track their analyses, but print the worksheets for review purposes. Successful RCM analysis requires a detailed knowledge of the physical assets, what they are intended to do for the company along with performance standards, how they fail, what happens when they fail, how to repair them and of the variety of maintenance approaches that can be used.
RCM requires a team effort. Teams typically comprise 3 to 4 operators and maintainers plus a facilitator. The knowledge required to do RCM analysis is found in operators and maintainers of the assets. Periodically they are supported by engineers and other specialists in the design, construction, use and maintenance of the assets. The teams are facilitated by an analyst who is trained to a great depth in RCM and in how to facilitate the analysis.
RCM analyses are conducted in small projects. Each project deals with a system or piece of equipment. The projects are chosen so that they can be completed in no more than 15 three-hour meetings by the analysis team. Analysis of all the assets at a particular facility can take several months to a few years depending on the size and complexity of the facility and its systems. Toronto Hydro, the electric distribution utility for Canada’s largest city, did approximately 100 projects covering most of their assets within a 2 year period. GE Plastics in the Netherlands analyzed critical assets in some 32 projects over 2 ½ years. The Canadian Navy analyzed all systems (nearly 250) on its then new ships (in the late 1980s) in a four year period.
RCM is a proactive analysis process. It is used to determine what will happen when failures occur and to decide on appropriate measures to mitigate the consequences BEFORE they happen. The alternative to RCM is to allow the failures to happen, to suffer the consequences and then to decide on what to do to avoid them in the future.
There are a variety of approaches that are used: Root Cause Failure Analysis, Preventive Maintenance Optimization (PMO) and various engineering statistical analysis methods all deal with the failures after the fact. With the exception of PMO, they require statistical data that can only be collected for your facility after you have experienced the failures. These methods all work very well and they are, in the opinion of the author, excellent as enhancements to any existing maintenance program, but they are not a replacement for proactively avoiding the problems in the first place. Only RCM attempts and succeeds at doing that economically, yet even RCM has its weakness. RCM cannot detect multiple independent failure events that, when combined, lead to undesirable (often catastrophic) consequences. Fault-tree analysis deals with this successfully but it is very complicated and expensive to perform. Fault-tree analysis is usually reserved for highly critical systems such as nuclear power plants, avionics systems, high-tech weapons, and potentially hazardous chemical and biological processes.
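To give a feel for what fault-tree analysis adds, the following is a deliberately simplified sketch of how independent events combine through AND and OR gates. It is an illustration only and not a full fault-tree method: the gate functions, the example system and every probability in it are hypothetical, and real studies must also deal with dependent and common-cause failures.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical example: flow is lost if both pumps fail together,
# or if control power is lost.
p_duty_pump_fails = 0.05
p_standby_pump_fails = 0.02
p_control_power_lost = 0.001

p_both_pumps_fail = and_gate([p_duty_pump_fails, p_standby_pump_fails])
p_loss_of_flow = or_gate([p_both_pumps_fail, p_control_power_lost])
print(round(p_both_pumps_fail, 6), round(p_loss_of_flow, 6))
```

Even in this toy case the combined event (both pumps failing) contributes as much to the top event as the single power failure, which is the kind of interaction that is examined one failure mode at a time in an RCM analysis.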
One company shared its lessons from applying RCM publicly in a conference in late 2002. Toronto Hydro acknowledged that RCM was tougher to perform than they originally anticipated:
• Dealing with hidden failures proved challenging for many analysts not familiar with the concept.
• They found that their data collection wasn’t sufficient to support many of the calculations they had to make. A recent benchmarking study in the US⁹ revealed that over 50% of the 800 participants did not trust their data systems enough to perform reliability calculations. Although RCM doesn’t require accurate statistical data, it helps to improve the results.
• The use of untrained analysts slowed progress substantially on a few of their projects.
• Careful planning is critical to success. Analysts needed the training. It was important to make sure the people with the greatest knowledge of specific assets were trained in time for their scheduled projects.
• They found it challenging for equipment specialists to begin looking at systems from a systems perspective. Engineers and technicians used to dealing with minute details tended to get bogged down in details. This problem proved to be temporary – eventually they became very good system analysts.
• They found that they didn’t know their own systems as well as they thought. A great deal of time was spent discovering how some of their assets, particularly the older ones, really worked.
• Potential cost savings mean reductions in labor and materials use. As they were seeing many opportunities to save, they realized that it could mean loss of jobs. This led to increased resistance to implementing the results of the analyses. Eventually they got past this problem.
• RCM looks like a lot of work but the use of “templates” can substantially reduce the effort for fleets or “classes” of assets.
• Getting some analysts to participate proved problematic due to the time commitment it required. The analysis time competed with their other job responsibilities. Management intervention was sometimes needed to resolve the conflicting priorities.
• Project estimates were initially low and time commitments for individuals were greater than expected. Some schedule delays occurred as a result. Over time the experience gained through doing analyses helps in estimating new projects.
• For hidden failures RCM uses “risk-based” decision criteria. For an electric utility where there is a great deal of redundancy there are many hidden failures with purely operational consequences. Electrical equipment and systems are often designed to “fail safe” so the risk-based approach was confusing. Additional operational criteria were developed for use in these unique, but plentiful circumstances.
⁹ Terry Wireman, Genesis Solutions, presented at MARTS, 2005.
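Since project estimates were one of the stumbling blocks, a rough effort estimate is worth sketching. The calculation below is a minimal illustration using the figures this paper quotes for a large facility (100 system projects, 10 to 15 meetings averaging 12.5, three hours per meeting, a team of 4). The function name and the assumed working hours per person-year are hypothetical and would need to be replaced with your own numbers.

```python
def analysis_effort_hours(projects, meetings_per_project, hours_per_meeting, team_size):
    """Total person-hours spent in analysis meetings across all projects."""
    return projects * meetings_per_project * hours_per_meeting * team_size


# Assumption for illustration only: 47 working weeks of 40 hours per person-year.
HOURS_PER_PERSON_YEAR = 1880

hours = analysis_effort_hours(
    projects=100,
    meetings_per_project=12.5,
    hours_per_meeting=3,
    team_size=4,
)
person_years = hours / HOURS_PER_PERSON_YEAR
print(hours, round(person_years, 1))  # 15000.0 8.0
```

The estimate covers meeting time only. Preparation, training, facilitation and mentoring come on top of it, which is one reason early estimates tend to be low.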
For all the effort and the learning curve there is a substantial payback. The commercial airline industry saw huge gains in reliability and safety. Toronto Hydro saw an average of 22% cost reductions over their two years of analysis work.
Initially they experienced 34% on their 10 pilot projects – of course they had picked “high value” targets in order to really test the value of RCM. The Canadian Navy, using an earlier military version of RCM, experienced a substantial gain in ship availability from 60% to 70%, maintenance cost reductions estimated by the author to be on the order of CAD 200 million and the avoidance of CAD 2 billion in capital costs. They replaced 14 ships with only 12 due to the increase in ship availability that RCM delivered. GE Plastics in the Netherlands saw MTBF on their critical manufacturing systems increase from 8 hours to 80 days, 50% cost reductions and a 30% staff reduction in maintenance (they were clearly over-maintaining).
Note that staff reductions don’t necessarily mean layoffs and job losses. In the author’s experience, employees are usually re-assigned to other valuable work that was not being done due to the pressures to keep the formerly unreliable assets running.
In North America the ability of RCM to reduce the amount of work required is part of the solution to an increasingly serious problem with demographics. The pending retirement of much of the aging work force will leave many companies scrambling for talent that is increasingly unavailable and poorly qualified for the available jobs. There are fewer people in the younger generations and they have largely opted out of careers in the trades.
RCM takes time. A large facility such as a fossil fuel power plant (e.g. Castle Peak in Hong Kong) required 18 months to analyze all of its systems in a concerted effort. Toronto Hydro has taken 2 years for most of its assets, GE Plastics took 2.5 years, and the Canadian Naval project took almost 4 years but ran in parallel with the design effort. Fortunately, the benefits and payback begin to materialize early. As Toronto Hydro discovered, it is common for companies to discover a great deal about their physical assets that they didn’t already know.
This can lead to substantial savings and other benefits. The author worked with one paper making company in the southern USA on a paper towel converting line. They discovered that the function of a large guard was to protect the equipment from water drips. The guard had been installed early in the life of the plant when the air conditioning system used chilled water and un-insulated pipes. Condensate drips were a problem in those early days. The air conditioning system had been replaced some 10 years prior to the analysis and the drips eliminated. Unfortunately they didn’t eliminate the guard at the same time! No one had questioned its function. For 10 years, they continued removing and replacing it in order to do maintenance work on the towel line. That added 8 hours of downtime to each repair and it happened about once every two weeks. Over 10 years that amounted to roughly 2,080 hours of downtime on an otherwise very profitable production line. There were also several other towel lines with the same sort of guard. They removed all the guards after their first RCM team meeting and never replaced them!
There are other examples like this. Nearly everyone the author has met who has done RCM analysis work has had a similar experience from learning more about their installed assets.
RCM is not cheap. Training is provided to analysts and facilitators. The pilot projects must be planned carefully. The analysts’ time must be scheduled and in some cases replacements for those analysts must be found – at least part of the time. Facilitators can work on RCM full time but then their replacements are required on a full time basis, at least for the time when the analysis projects are in progress. Once the facilitators are trained they require mentoring. Most facilitators are not comfortable in their new roles until they have done one or two full analyses backed up by an experienced mentor who trained them. The training, planning support and mentoring are all provided by consultants, highly trained specialists who are in demand and don’t come cheaply. And then there is the cost of the time required to do the analysis work.
For example, a large facility with over 100 system projects, each requiring 10 to 15 meetings (average 12.5) of 3 hours each and a team of 4 people, requires 15,000 man hours (about 8 man years) of effort. Toronto Hydro paid approximately CAD 272,000 for the external support, training and their analysts’ time. Their payback was large – they had an internal rate of return on that investment of 180% over 10 years. The costs were covered in a few months and the benefits are returned over many years.
The most obvious benefit or value to business is generated by the increased reliability and availability of the physical assets that are analyzed. Assets running longer and with fewer breakdowns produce more. If you can sell everything you can make, then you generate more revenue. If you are market constrained, then you can’t sell any more product (or service). You can still benefit from reducing the total capacity of physical assets required to deliver what you can sell. In the case of several mining companies it has been possible to “park” haul trucks that were no longer needed because the remaining fleet availability provided the needed hauling capacity. This saved them both operating and maintenance costs and brought in revenue from the resale of their excess fleet. In a plant environment where it is practical to shut down for one or more shifts, it may be possible to increase shift production and reduce the number of required shifts.
RCM reduces the overall maintenance effort. If you are over-maintaining, as utilities and others with a strong reliability culture often do, then you can reduce your Preventive Maintenance program budget. If you are under-maintaining, then you are probably doing more repairs than you need to. By determining the right amount of proactive maintenance you can eliminate failures and the repair costs associated with them. For example, elimination of overhaul work where failure pattern F is present will eliminate premature failures. Doing more condition monitoring where random failures are most common (patterns D, E and F) will catch failures that have begun, but not yet progressed to a complete functional failure. This will eliminate costly breakdown repairs. The discovery of hidden failures through regular testing will eliminate situations where the backup or safety system fails to operate.
This invariably means a less expensive repair and avoidance of the cost of downtime when the primary device eventually fails.
RCM also reduces other, less quantifiable business risks. Product quality and continuity of service are often highly dependent on the smooth running of your physical assets. RCM keeps those assets running well and within product quality standards and limits. Those limits on, say, control systems and tolerances are used as desired standards of performance when defining system functions. Failure to meet those standards, no matter how minor, is treated as a functional failure and proactive steps are taken to avoid the consequences.
Where product quality problems can lead to product liability issues RCM helps avoid them. Pharmaceutical and food companies use highly automated processes to maintain stringent product quality tolerances. Keeping to tolerances on packaging lines means that you won’t deliver less than you are stating on the packaging and it keeps the over-fill to an absolute minimum. That keeps you from giving away product for free. Keeping product within its spec limits means less scrap, less waste and less risk of out-of-spec (and potentially dangerous) product making its way into the marketplace. This can also help to reduce product liability risk and insurance premiums.
Utilities delivering electricity, gas and water to customers strive to keep their product within tolerances on voltage, cycles and pressure. Imagine the insurance payout if a water utility over-pressurized water mains in a large city or failed to deliver water in the event of a major fire. What if a gas utility over-pressurized its mains? Electrical voltage spikes can also damage a great deal of equipment. Keeping within these various tolerances is important – RCM helps.
A business operating reliably is able to forecast its revenues more accurately. The financial industry appreciates this consistency and predictability. Banks will provide more favorable loan conditions to reliable businesses. The author knows one Canadian bank that already makes these offers if RCM is used. Stock market analysts treat reliable businesses more favorably, making it easier to raise capital in the investment markets.
Insurance companies will see the reduced claims potential and reduce premiums. The author knows of at least two major global insurers who have done this for major global clients. In one case the premium reduction for a global mining company was $US 2 million per year on their business loss insurance. And in some other cases, previously uninsurable risks can become insurable.
Warranties are another area of benefit to business. Manufacturers usually offer a warranty on materials and workmanship for new assets and the cost of that is built into their products. They also offer, for an additional cost, an extended warranty that covers a wider range of potential problems. Those warranties can cost on the order of 2 – 3 % of the capital outlay for the asset. If RCM is used to develop your maintenance program, then the asset is far less likely to fail and therefore it is far less likely you will have an opportunity to make a warranty claim. If you don’t mind accepting the risk, you can negotiate with your suppliers and save your company 2 – 3 % on the capital costs of new asset acquisitions. That’s usually enough to pay for the RCM analysis effort for a greenfield application and the risks are largely eliminated. In other cases, you might choose to pay the warranty costs and negotiate with the manufacturer to allow you to use your maintenance program rather than the one they recommend while they still honor the warranty if you make any claims. One coal mining company did this and the manufacturer was happy to comply with their request. The mining company used their RCM maintenance program and they never made a claim!
Safety is where RCM got started and it is an area where RCM excels. Safety is built into the RCM decision logic criteria and the “run-to-failure” option is not available if safety is in jeopardy. Most industrial accidents happen during maintenance or due to failures of physical assets. Failures resulting in safety hazards are dealt with proactively. The overall reduction of maintenance work that is required and the emphasis that RCM gives to condition monitoring techniques (which are usually non-intrusive) mean less exposure to risk for maintainers. The reductions in failures mean fewer repairs and again, less exposure to risk. Safety performance is enhanced over time following the implementation of programs developed using RCM.
The experience of the airline industry is perhaps the most graphic example of the safety benefits of RCM. The benefits to safety do not go unnoticed. Improved safety performance over time will also result in reductions in insurance premiums for workers’ compensation coverage.
Environmental non-compliance can result in fines and in partial or total shut down of operations. In some extreme cases the license to operate can be revoked. RCM helps to avoid these fines and shutdown related losses and to keep you in business. Like safety, environmental consequences are dealt with proactively and strictly – run-to-failure is not an option. It is no accident that the nuclear power industry uses RCM extensively!
Regulatory agencies enforce rules that are meant to ensure compliance with minimum standards. They are society’s way of telling us what it will tolerate before taking punitive action against your company. Regulations are most commonly applied where safety and the environment are concerned. Even product safety considerations, as in the pharmaceutical and food industries, give rise to extensive regulations. For example, in those industries calibration programs are mandated and strictly enforced. Typically, the regulations require you to meet a certain minimum standard but fall short of specifying what to do and how to do it. Those choices are left up to you. Inspectors for the regulatory agencies, like those for insurance companies and safety agencies, have a great deal of discretion in judging how well you are complying with the regulations.
RCM is a very solid methodology for determining how to meet regulatory requirements. Those requirements are treated as functions of the asset and any failure to comply is handled in the analysis quite easily. RCM produces extensive documentation that can be used to back up your maintenance program. It enables you to show deliberate proactive effort to comply, backed up with a bullet-proof logic for the choices you make. Regulators seldom have the knowledge to disagree with your program.
In the unlikely and unfortunate event that something does go wrong, you can also use the documentation as evidence that you did everything you could to avoid the problem. Of course you must also show that you complied with the maintenance program requirements you developed using RCM for this to work!
In summary, RCM is a well proven methodology to deliver a great deal of value to business in a wide range of areas:
• Increased output
• Reduced maintenance costs
• Reduction of production loss (opportunity costs)
• Improved product quality and service level compliance
• More predictable business performance
• Favorable treatment by lending institutions – better loan terms
• Favorable treatment by insurers – reduce premiums
• Favorable treatment by manufacturers – reduce or eliminate extended warranty costs
• Improved safety performance
• Improved environmental compliance
• Defensible and auditable maintenance program to satisfy regulators
RCM delivers. The reader is encouraged to explore the topic further and to consider RCM as a key strategic initiative for any company that relies on physical assets in carrying out its business.
About the Author
James V. Reyes-Picknell, P.Eng.
James V. Reyes-Picknell is founder and President of Conscious Asset Management of Toronto, Canada. He provides consulting in the field of physical asset / maintenance management. His services span strategy, operations, process improvement and executive development. James is a licensed professional engineer, an Aladon certified RCM II practitioner and an honors graduate of the University of Toronto in Mechanical Engineering (1977). He has studied at the Royal Navy Engineering College, Technical University of Nova Scotia and Dalhousie University. His career in operations and maintenance spans 28 years, 10 of those in physical asset and maintenance management consulting.
James’ experience includes plant, fleet and facility maintenance, strategy development and implementation, reliability management and engineering, spares and operating supplies, life cycle management and analysis, diagnostic assessments, benchmarking for best practices, business process design and enterprise asset management systems. James has been published in several books as a co-author and a contributing author, and is the author of numerous articles published in several periodicals. He is presently authoring the 2nd edition of the popular book on maintenance management, “Uptime, Strategies for Excellence in Maintenance Management”.
Prior to founding his company, James was leader of the global Enterprise Asset Management practice of IBM Business Consulting Services (formerly known as PwC Consulting, PricewaterhouseCoopers and Coopers & Lybrand). He has worked as a Marine Engineer in the Canadian Navy, a specialist machinery engineer with Exxon Chemicals in Canada, the Maintenance and Support Planning Manager for a large warship design and construction project and Logistics Support Manager for both Helicopter and Microwave Landing Systems projects.
His industry experience includes: aerospace, automotive, brewing, computers, consumer goods manufacturing, defense, electric power utility (generation, transmission, distribution), facilities management, forest products, gas processing and transmission, health care, higher education, marine, metals, mining, oil & gas (upstream, refining), petrochemical, pharmaceutical, postal services and water / waste water utilities.