VEM
DFR – Design for Reliability
DFR – Fundamentals for Engineers
Reliability Audit Lab
RAL
Topics that will be covered:
1. Need for DFR
2. DFR Process
3. Terminology
4. Weibull Plotting
5. System Reliability
6. DFR Testing
7. Accelerated Testing
1. Need for DFR
What Customers Care about:
1. Product Life, i.e., useful life before wear-out.
2. Minimum Downtime, i.e., maximum MTBF.
3. Endurance, i.e., # operations, robust to environmental changes.
4. Stable Performance, i.e., no degradation in CTQs.
5. On-time Startup, i.e., ease of system startup.
Reliable Product Vision
[Figure, panel 1: Failure Mode Identification. # Failure Modes versus Time (Pre-Launch), curves for "DFR" and "No DFR".]
Identify & “eliminate” inherent failure modes before launch. (Minimize Excursions!)
[Figure, panel 2: Failure Rate versus Time, curves for "DFR" and "No DFR", with Release and Goal marked.]
Start with lower “running rate”, then aggressively “grow” reliability. (Reduce Warranty Costs)
[Figure, panel 3: Resources/Costs versus Time, curves for "DFR" and "No DFR", with Release marked and levels labelled 50% and 5%.]
Reduce overall costs by employing DFR from the beginning.
Take control of our product quality and aggressively drive to our goals
2. DFR - Process
NPI Process
Phases: DP0 → Specify → DP1 → Design → DP2 → Implement → DP3 → Production / Field
• CTQ Identification
• Customer Metrics
• Field data analysis
Rel. Goal Setting
• Assess Customer needs
• Develop Reliability metrics
• Establish Reliability goals
System Model
• Construct functional block diagrams
• Define Reliability model
• ID critical comps. & failure potential
• Allocate reliability targets
Design
• Apply robust design tools
• DFSS tools
• Generate life predictions
• Begin Growth Testing
Verification
• Execute Reliability Test strategy
• Continue Growth Testing
• Accelerated Tests
• Demonstration Testing
• Agency / Compliance Testing
Production / Field
• Establish audit program
• FRACAS system using ‘Clarify’
• Correlate field data & test results
Legacy Product DFR Process
1. Review Historical Data
• Review historical reliability & field failure data
• Review field RMAs
• Review customer environments & applications
2. Analyze Field & In-house Endurance Test Data
• Develop product Fault Tree Analysis
• Identify and pareto observed failure modes
3. Develop Reliability Profile & Goals
• Develop P-Diagrams & System Block Diagram
• Generate Reliability Weibull plots for operational endurance
• Allocate reliability goals to key subsystems
• Identify reliability gaps between existing product & goals for each subsystem
4. Develop & Execute Reliability Growth Plan
• Determine root cause for all identified failures
• Redesign process or parts to address failure mode pareto
• Validate reliability improvement through accelerated life testing & field betas
5. Institute Reliability Validation Program
• Implement process firewalls & sensors to hold design robustness
• Develop and implement long-term reliability validation audit
Design For Reliability Program Summary
Keys to DFR:
• Customer reliability expectations & needs must be fully understood
• Reliability must be viewed from a “systems engineering” perspective
• Product must be designed for the intended use environment
• Reliability must be statistically verified (or risk must be accepted)
• Field data collection is imperative (environment, usage, failures)
• Manufacturing & supplier reliability “X’s” must be actively managed
DFR needs to be part of the entire product development cycle
3. DFR - Terminology
What do we mean by 1. Reliability 2. Failure 3. Failure Rate 4. Hazard Rate 5. MTTF / MTBF
1. Reliability R(t): The probability that an item will perform its intended function without failure under stated conditions for a specified period of time.
2. Failure: The termination of the ability of the product to perform its intended function.
3. Failure Rate (FR, λ): The ratio of the number of failures within a sample to the cumulative operating time of that sample.
4. Hazard Rate h(t): The instantaneous rate of failure of an item, given that it has survived until time t; also called the instantaneous failure rate.
Failure Rate Calculation Example
EXAMPLE: A sample of 1000 meters is tested for a week, and two of them fail. (assume they fail at the end of the week). What is the Failure Rate?
$\text{Failure Rate} = \frac{2 \text{ failures}}{1000 \times 24 \times 7 \text{ hours}} = \frac{2}{168{,}000} \text{ failures/hour} = 1.19 \times 10^{-5} \text{ failures/hour}$
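The arithmetic of the meter example can be checked with a short Python sketch. The function name and its parameters are illustrative only, and it assumes, as the example does, that every unit accumulates the full test time.

```python
def failure_rate(failures: int, units: int, hours_per_unit: float) -> float:
    """Failures per cumulative operating hour.

    Assumes every unit runs for the whole test (failures occur at the end),
    so cumulative operating time is simply units * hours_per_unit.
    """
    return failures / (units * hours_per_unit)


# 1000 meters tested for one week (24 * 7 hours), 2 failures
rate = failure_rate(failures=2, units=1000, hours_per_unit=24 * 7)
print(f"{rate:.3g} failures/hour")
```

If units failed partway through the test, the denominator would instead be the sum of the hours each unit actually ran.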
Probability Density Function (PDF):
The Probability Density Function (PDF), f(t), describes the distribution of times to failure. For a continuous failure time, f(t) is a density rather than a probability: the probability of the product failing in a short interval from t to t + Δt is approximately f(t)·Δt.
[Figure: Probability Density Function f(t) versus time t.]
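Common Distributions
Probability density function f(t) and range of the variate t for each distribution:
• Exponential: $f(t) = \lambda e^{-\lambda t}$, $0 \le t < \infty$
• Weibull: $f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}}$, $0 \le t < \infty$
• Normal: $f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}$, $-\infty < t < \infty$
• Log Normal: $f(t) = \frac{1}{\sigma t\sqrt{2\pi}}\, e^{-\frac{(\ln t-\mu)^2}{2\sigma^2}}$, $0 < t < \infty$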
Cumulative Distribution Function (CDF):
The Cumulative Distribution Function (CDF) represents the probability that the product fails at some time prior to t. It is the integral of the PDF evaluated from 0 to t.
$F(t) = \int_0^t f(\tau)\, d\tau$
[Figure: Probability Density Function f(t) versus time; the area under the curve up to t1 is the Cumulative Distribution Function value F(t1).]
Reliability Function R(t)
The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:
$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\, d\tau$
or
$R(t) = \int_t^{\infty} f(\tau)\, d\tau$
Typical characteristics:
• when t = 0, R(t) = 1
• when t → ∞, R(t) → 0
[Figure: Probability Density Function f(t) versus time; the area under the curve beyond t is R(t) = 1 - F(t).]
Hazard Function h(t)
The hazard function, or instantaneous failure rate, is defined as the limit of the failure rate over the interval t to (t + Δt) as Δt approaches zero:
$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t \, R(t)}$
The numerator R(t) - R(t + Δt) is the probability of failure in the interval t to (t + Δt). Dividing by R(t) conditions on there being no failure up to t, and dividing by Δt turns the conditional probability into a rate. Since f(t) = -dR(t)/dt, the limit is h(t) = f(t) / R(t).
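As a numerical sanity check, the sketch below compares the limit definition (with a small but finite Δt) against f(t) / R(t). It uses a Weibull item purely as an example; the parameter values BETA and ETA and all function names are assumptions made for illustration, not values from this material.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
BETA, ETA = 2.0, 1000.0


def reliability(t: float) -> float:
    """R(t) = exp(-(t/eta)^beta) for the example Weibull item."""
    return math.exp(-((t / ETA) ** BETA))


def pdf(t: float) -> float:
    """f(t) = -dR/dt for the same item."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * reliability(t)


def hazard_closed_form(t: float) -> float:
    """h(t) = f(t) / R(t)."""
    return pdf(t) / reliability(t)


def hazard_finite_difference(t: float, dt: float = 1e-3) -> float:
    """[R(t) - R(t + dt)] / [dt * R(t)] with a small, finite dt."""
    return (reliability(t) - reliability(t + dt)) / (dt * reliability(t))


for t in (100.0, 500.0, 1500.0):
    print(t, hazard_closed_form(t), hazard_finite_difference(t))
```

The two columns agree to several significant figures, and the gap shrinks further as dt is reduced (until floating-point rounding takes over).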
Hazard Functions
As shown, the hazard rate is a function of time. What shape does the hazard rate follow over time? The general answer is the bathtub-shaped function.
The sample will experience a high failure rate at the beginning of the operation time due to weak or substandard components, manufacturing imperfections, design errors and installation defects. This period of decreasing failure rate is referred to as the “infant mortality region”. It is undesirable from both the manufacturer's and the consumer's viewpoint, as it causes unnecessary repair cost for the manufacturer and an interruption of product usage for the consumer. Early failures can be minimized by burning in systems or components before shipment, by improving the manufacturing process and by improving the quality control of the products.
At the end of the early failure-rate region, the failure rate eventually reaches a constant value. During this constant failure-rate region the failures do not follow a predictable pattern but occur at random due to changes in the applied load. The randomness of material flaws or manufacturing flaws will also lead to failures during the constant failure-rate region.
The third and final region of the failure-rate curve is the wear-out region. The beginning of the wear-out region is noticed when the failure rate starts to increase significantly above the constant failure-rate value and the failures are no longer attributed to randomness but are due to the age and wear of the components. To minimize the effect of the wear-out region, one must use periodic preventive maintenance or consider replacement of the product.
Product's Hazard Rate vs. Time: “The Bathtub Curve”
[Figure: Hazard Rate h(t) versus Time, in three regions:
• Infant Mortality: h(t) decreasing (Manufacturing Defects)
• Random Failure (Useful Life): h(t) constant (Random Failures)
• Wear-out: h(t) increasing (Wear-out Failures)]
Mean Time To Failure [MTTF]
One of the measures of a system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). For a non-repairable system, the expected time to failure is the MTTF. For a repairable system, the expected time between two successive failures is the MTBF.
Now consider n identical non-repairable systems and observe the time to failure for each. Assume that the observed times to failure are t1, t2, ..., tn. The estimated mean time to failure is
$\text{MTTF} = \frac{1}{n} \sum_{i=1}^{n} t_i$
Useful Life Metrics: Mean Time Between Failures (MTBF)
Mean Time Between Failures [MTBF] - For a repairable item, the ratio of the cumulative operating time to the number of failures for that item. (also Mean Cycles Between Failures, MCBF, etc.)
EXAMPLE: A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.
$\text{MTBF} = \frac{\text{Total operating time}}{\text{\# of failures}} = \frac{45{,}000}{6} = 7{,}500 \text{ hours}$
MTBF or MTTF is a widely-used metric during the Useful Life period, when the hazard rate is constant
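A minimal Python sketch of the two estimators follows. The motor figures come from the example above; the list of observed times to failure used for MTTF is hypothetical, invented only to exercise the averaging formula.

```python
def mtbf(total_operating_hours: float, failures: int) -> float:
    """Repairable item: cumulative operating time divided by number of failures."""
    return total_operating_hours / failures


def mttf(times_to_failure: list[float]) -> float:
    """Non-repairable items: arithmetic mean of the observed times to failure."""
    return sum(times_to_failure) / len(times_to_failure)


# Motor example: 45,000 hours of service, 6 failures
motor_mtbf = mtbf(45_000, 6)

# Hypothetical observed times to failure (hours) for four identical units
sample_times = [1200.0, 950.0, 1430.0, 1010.0]
sample_mttf = mttf(sample_times)

print(motor_mtbf, sample_mttf)
```

The MTTF average only applies when every unit in the sample has been run to failure; a sample with unfailed units needs a different estimator.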
The Exponential Distribution
If the hazard rate is constant over time, then the product follows the exponential distribution. This is often used for electronic components.
$h(t) = \lambda = \text{constant}$
$\text{MTBF (mean time between failures)} = \frac{1}{\lambda}$
$f(t) = \lambda e^{-\lambda t}$
$F(t) = 1 - e^{-\lambda t}$
$R(t) = e^{-\lambda t}$
At t = MTBF: $R(t) = e^{-\lambda t} = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} = 36.8\%$
Appropriate tool if failure rate is known to be constant
[Figure: The Exponential Distribution. Top panel: PDF f(t) versus Time to Failure (0 to 5×10^4) for λ = 0.0001, 0.0002 and 0.0003. Bottom panel: CDF F(t) versus Time (0 to 5×10^4) for the same three values of λ.]
Useful Life Metrics: Reliability
Reliability can be described by the single-parameter exponential distribution when the Hazard Rate, λ, is constant (i.e. the “Useful Life” portion of the bathtub curve):
$R = e^{-t/\text{MTBF}} = e^{-\text{FR} \cdot t}$
Where:
t = Mission length (uptime or cycles in question)
FR = Failure rate (λ = 1/MTBF)
EXAMPLE: If MTBF for a motor is 7,500 hours, the probability of operating for 30 days without failure is
$R = e^{-\frac{30 \times 24 \text{ hours}}{7500 \text{ hours}}} = 0.908 = 90.8\%$
A mathematical model for reliability during Useful Life
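The motor example, and the earlier observation that reliability at t = MTBF is about 36.8%, can be reproduced with a few lines of Python. The function name is illustrative, and the sketch assumes a constant hazard rate throughout the mission.

```python
import math


def exponential_reliability(mission_hours: float, mtbf_hours: float) -> float:
    """R = exp(-t / MTBF); valid only while the hazard rate is constant."""
    return math.exp(-mission_hours / mtbf_hours)


# 30-day mission for a motor with MTBF = 7,500 hours
r_30_days = exponential_reliability(30 * 24, 7500)

# Reliability when the mission length equals the MTBF itself
r_at_mtbf = exponential_reliability(7500, 7500)

print(f"30 days: {r_30_days:.3f}, at MTBF: {r_at_mtbf:.3f}")
```

Note that only about 37% of units are expected to survive to the MTBF, so MTBF should not be read as a guaranteed life.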
4. DFR – Weibull Plotting
Weibull Probability Distribution
• Originally proposed by the Swedish engineer Waloddi Weibull (1887-1979) in the early 1950’s
• Statistically represented fatigue failures
• Weibull probability density function (PDF, distribution of values):
$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-(t/\eta)^{\beta}}$
Equation valid for minimum life = 0
t = Mission length (time, cycles, etc.)
β = Weibull Shape Parameter, “Slope”
η = Weibull Scale Parameter, “Characteristic Life”
The Weibull Distribution
This powerful and versatile reliability function is capable of modeling most real-life systems because the time dependency of the failure rate can be adjusted.
$h(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}$
$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-(t/\eta)^{\beta}}$
$R(t) = 1 - F(t) = e^{-(t/\eta)^{\beta}}$
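The three Weibull functions translate directly into code. The sketch below is one possible Python rendering; the function names are illustrative, and it assumes t > 0 and a minimum life of zero (the two-parameter form used here).

```python
import math


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = beta * t^(beta-1) / eta^beta, for t > 0."""
    return beta * t ** (beta - 1) / eta ** beta


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t: float, beta: float, eta: float) -> float:
    """f(t) = h(t) * R(t)."""
    return weibull_hazard(t, beta, eta) * weibull_reliability(t, beta, eta)


# beta = 1 gives a constant hazard rate of 1/eta (the exponential case)
print(weibull_hazard(200.0, 1.0, 1000.0), weibull_hazard(1800.0, 1.0, 1000.0))

# At t = eta, reliability is exp(-1) regardless of beta
print(weibull_reliability(1000.0, 0.5, 1000.0), weibull_reliability(1000.0, 3.44, 1000.0))
```

Writing the PDF as h(t)·R(t) keeps the three functions consistent with h(t) = f(t) / R(t) by construction.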
Weibull PDF
• Exponential when β = 1.0
• Approximately normal when β = 3.44
• Time dependent hazard rate
$f(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}\, e^{-(t/\eta)^{\beta}}$
[Figure: Weibull PDF f(t) versus time (0 to 2000) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
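Weibull Hazard Function
$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$
$h(t) = \frac{\frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]}{1 - \left\{1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\right\}}$
$h(t) = \frac{\beta\, t^{\beta-1}}{\eta^{\beta}}$
[Figure: Weibull hazard rate h(t) versus Time (0 to 2500) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
• β < 1: Highest failure rate early, “Infant Mortality”
• β = 1: Constant failure rate
• β > 1: Highest failure rate later, “Wear-Out”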
Weibull Reliability Function
Reliability is the probability that the part survives to time t.
$R(t) = 1 - F(t) = e^{-(t/\eta)^{\beta}}$
[Figure: Weibull reliability R(t) versus Time (0 to 2500) for β = 0.5, β = 1.0 and β = 3.44, each with η = 1000.]
Summary of Useful Definitions - Weibull Analysis
Beta (β): The slope of the Weibull CDF when plotted on Weibull paper.
B-life: A common way to express values of the cumulative distribution function. B10 refers to the time at which 10% of the parts are expected to have failed.
CDF: The Cumulative Distribution Function expresses the time-dependent probability that a failure occurs at some time before time t.
Eta (η): The characteristic life, or time at which 63.2% of the parts are expected to have failed. Also expressed as the B63.2 life. On Weibull paper, this is the time at which the fitted line crosses the 63.2% level, i.e. where ln(ln(1 / (1-CDF(t)))) = 0.
PDF: The Probability Density Function expresses the expected distribution of failures over time.
Weibull plot: A plot where the x-axis is scaled as ln(time) and the y-axis is scaled as ln(ln(1 / (1-CDF(t)))). The Weibull CDF plotted on Weibull paper is a straight line, y = β·ln(t) - β·ln(η), with slope β and y-intercept -β·ln(η); the line crosses y = 0 at t = η.
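To make the Weibull-paper construction concrete, the sketch below fits a straight line y = β·ln(t) - β·ln(η) to failure data by least squares and reads β, η and a B-life off the fit. Several things here are assumptions rather than part of this material: the plotting positions use Benard's median-rank approximation (i - 0.3)/(n + 0.4), the line is fitted by regressing y on x, all units in the sample are assumed to have failed (no suspensions), and the failure times are invented for illustration.

```python
import math


def median_ranks(n: int) -> list[float]:
    """Benard's approximation to the median rank of the i-th of n ordered failures."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(times: list[float]) -> tuple[float, float]:
    """Least-squares line on Weibull-plot axes; returns (beta, eta)."""
    ts = sorted(times)
    xs = [math.log(t) for t in ts]
    ys = [math.log(math.log(1.0 / (1.0 - f))) for f in median_ranks(len(ts))]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the line
    intercept = y_bar - beta * x_bar      # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


def b_life(percent_failed: float, beta: float, eta: float) -> float:
    """Time by which the given percentage of parts is expected to have failed."""
    return eta * (-math.log(1.0 - percent_failed / 100.0)) ** (1.0 / beta)


# Hypothetical times to failure (hours), for illustration only
failure_hours = [410.0, 620.0, 790.0, 950.0, 1120.0, 1340.0]
beta, eta = fit_weibull(failure_hours)
print(f"beta={beta:.2f}, eta={eta:.0f} h, B10={b_life(10, beta, eta):.0f} h")
```

Dedicated reliability software typically adds confidence bounds and handles suspended (unfailed) units, which this sketch does not attempt.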
Weibull Analysis
What is a Weibull Plot?
• Log-log plot of probability of failure versus age for a product or component
• Nominal “best-fit” line, plus confidence intervals
• Easily generated, easily interpreted graphical read-out
[Figure: Example Weibull plot showing Observed Failures, the Weibull Best Fit line and the Confidence on Fit.]
Comparison: test results for a redesigned product can be plotted against original product or against goals
Weibull Shape Parameter (β) and Scale Parameter (η) Defined
β is called the SLOPE
For the Weibull distribution, the slope describes the steepness of the Weibull best-fit line (see following slides for more details). β also has a relationship with the trend of the hazard rate, as shown on the “bathtub curves” on a subsequent slide.
η is called the CHARACTERISTIC LIFE
For the Weibull distribution, the characteristic life is equal to the scale parameter, η. This is the time at which 63.2% of the product will have failed.
Scale and Shape are the Key Weibull Parameters
β and the Bathtub Curve
β < 1
• Implies “infant mortality”
• If this occurs: failed products “not to print”; manufacturing or assembly defects; burn-in can be helpful
• If a component survives infant mortality phase, likelihood of failure decreases with age.
β = 1
• Implies failures are “random”, individually unpredictable
• An old part is as good as a new part (burn-in not appropriate)
• If this occurs: failures due to external stress, maintenance or human errors; possible mixture of failure modes
1 < β < 4
• Implies mild wearout
• If this occurs: low cycle fatigue; corrosion or erosion; scheduled replacement may be cost effective
β > 4
• Implies rapid wearout
• If this occurs, suspect: material properties; brittle materials like ceramics
• Not a bad thing if it happens after mission life has been exceeded.
5. DFR – System Reliability
System Reliability Evaluation
A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures. Clearly, the type of components used, their qualities, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability.
Once the system is configured, its reliability must be evaluated and compared with an acceptable reliability level. If it does not meet the required level, the system should be redesigned and its reliability should be re-evaluated.
Reliability Block Diagram (RBD) Technique
The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected. The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate System Reliability.
Benefits
• The pictorial representation means that models are easily understood and therefore readily checked.
• Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.
• Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e. provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.
Elementary models
Before beginning the model construction, consideration should be given to the best way of dividing the system into blocks. It is particularly important that each block should be statistically independent of all other blocks (i.e. no unit or component should be common to a number of blocks). The most elementary models are the following:
• Series
• Active parallel
• m-out-of-n
• Standby models
Typical RBD configurations and related formulae
Simple Series and Parallel System
Figure a shows the units A, B, C, …, Z constituting a system. The interpretation can be stated as ‘any unit failing causes the system as a whole to fail’, and the system is referred to as an active series system. Under these conditions, the reliability R(s) of the system is given by
R(s) = Ra * Rb * Rc * … * Rz
[Figure a) Series System: input I, blocks A, B, C, …, Z connected in series, output O.]
Figure b shows the units X and Y operating in such a way that the system will survive as long as at least one of the units survives. This type of system is referred to as an active parallel system.
R(s) = 1 – (1 – Rx)(1 – Ry)
[Figure b) Parallel System: input I, blocks X and Y connected in parallel, output O.]
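The series and active-parallel formulas can be sketched in Python as below. The function names and the block reliabilities in the example are illustrative assumptions, and both functions assume the blocks fail independently, as the RBD technique requires.

```python
from math import prod


def series_reliability(block_reliabilities: list[float]) -> float:
    """Series system: every block must survive, so reliabilities multiply."""
    return prod(block_reliabilities)


def parallel_reliability(block_reliabilities: list[float]) -> float:
    """Active parallel system: fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in block_reliabilities)


# Hypothetical block reliabilities, for illustration only
r_series = series_reliability([0.99, 0.98, 0.97])
r_parallel = parallel_reliability([0.90, 0.80])

print(f"series: {r_series:.4f}, parallel: {r_parallel:.4f}")
```

Because each function returns a single reliability, the results can be nested: a parallel arrangement of two series strings is `parallel_reliability([series_reliability(a), series_reliability(b)])`.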
A Series / Parallel System
When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram is of the type illustrated in figure c.
Rx = Ra1 * Rb1 * Rc1 * … * Rz1
Ry = Ra2 * Rb2 * Rc2 * … * Rz2
Rs = 1 – (1 – Rx)(1 – Ry)
[Figure c) Series / Parallel System: input I feeds two parallel branches, A1, B1, C1, …, Z1 in series and A2, B2, C2, …, Z2 in series, which rejoin at output O.]
m-out-of-n units
The figure represents instances where system success is assured whenever at least m of n identical units are in an operational state. Here m = 2, n = 3.
Rs = (Rx)^3 + 3*(Rx)^2*Fx, where Fx = 1 – Rx.
[Figure d) m-out-of-n System: input I, three identical blocks X in parallel feeding a 2/3 voting element, output O.]
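The 2-out-of-3 formula is a special case of a binomial sum over the number of surviving units. The Python sketch below generalizes it; the function name and the unit reliability of 0.9 are illustrative assumptions, and the units are taken to be identical and independent.

```python
from math import comb


def m_out_of_n_reliability(m: int, n: int, unit_reliability: float) -> float:
    """Probability that at least m of n identical, independent units survive."""
    r = unit_reliability
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))


# 2-out-of-3 with a hypothetical unit reliability of 0.9
rx = 0.9
rs_general = m_out_of_n_reliability(2, 3, rx)
rs_slide_formula = rx**3 + 3 * rx**2 * (1 - rx)

print(rs_general, rs_slide_formula)
```

Setting m = n recovers the series case and m = 1 the active parallel case, which gives a quick consistency check on any implementation.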
6. DFR – Reliability Testing
Reliability Testing - Why?
Reliability Testing allows us to:
• Determine if a product’s design is capable of performing its intended function for the desired period of time.
• Have confidence that our sample-based prediction will accurately reflect the performance of the entire population.
• Provide a path to “grow” a product’s reliability by identifying weak points in the design.
• Confirm the product’s performance in the field.
• Identify failures caused by severe applications that exceed the ratings, and recognize opportunities for the product to safely perform under more diverse applications.
Reliability Testing - Measures
Reliability Testing answers questions like …
• What is my product’s Failure Rate?
• What is the expected life?
• Which distribution does my data follow?
• What does my hazard function look like?
• What failure modes are present?
• How “mature” is my product’s reliability?
These metrics and more can be obtained with the right reliability test
Four Major Categories of Reliability Testing
• Reliability Growth Tests (RGT)
  - Normal Testing
  - Accelerated Testing
• Reliability Demonstration Tests (RDT)
• Production Reliability Acceptance Tests (PRAT)
• Reliability Validation (RV)
Reliability Testing - Growth Testing
Scope: To determine a product’s physical limitations, functional capabilities and inherent failure mechanisms.
• Emphasis is on discovering & “eliminating” failure modes
• Failures are welcome; they represent data sources
• Failures in development = fewer failures in field
• Used with a changing design to drive reliability growth
• Sample size is typically small
• Test Types: Normal or Accelerated Testing
• Can be very helpful early in process when done on competitor products which are sufficiently similar to the new design.
Used early & throughout the design process
Reliability Testing … Demonstration Testing
Scope: To demonstrate the product’s ability to fulfill reliability, availability & design requirements under realistic conditions.
• Failures are no longer hoped for, because they jeopardize compliance (though it’s still better to catch a problem before rather than after launch!)
• Management tool: provides means for verifying compliance
• Provides reliability measurement, typically performed on a static design (subsequent design changes may invalidate the demonstrated reliability results)
• Sample size is typically larger, due to need for degree of confidence in results and increased availability of samples.
Used at end of design stages to demonstrate compliance to specification
Reliability Testing … Production Reliability Acceptance Testing (PRAT)
Scope: To ensure that variation in materials, parts, & processes related to the move from prototypes to full production does not affect product reliability
• Performed during full production, verifies that predictions based on prototype results are valid in full production
• Provides feedback for continuous improvement in sourcing/manufacturing
• Sample size ranges from full (screen) to partial (audit)
• Test Types: Highly Accelerated Stress Screens/Audits (HASS/A), Environmental Stress Screening (ESS), Burn-in
Screens and Audits precipitate and detect hidden defects
Reliability Testing … Validation
Scope: To ensure that the product is performing reliably in the actual customer environment/application.
• “Testing results” based on actual field data sources
• Provides field feedback on the success of the design
• Helps to improve future design / redesign & prediction methods
• Requires effective data collection & corrective action process
• Sample size depends on the customer & product type
Reliability Validation tracks field data on Customer Dashboards
Reliability Testing … The Path
NPI (New Products):
1. Initial Design (Growth Testing): Set Reliability Goals; Develop Models; Initial Design; Accelerated Testing
2. Pilot Testing (Demonstration Testing): NPI Pilot Readiness; Mature Design
3. Implementation (Acceptance Testing): Implement Production; Reliability Demonstration; Audit Programs
4. Post-Sales Service (Validation Testing): Establish service schedule; Keep updated dashboards; Ensure Data Collection; Improve future design
Legacy Products:
1. Field Data Acquisition (Validation Testing): Complaint generated; Create case in Clarify
2. Verification (Growth Testing): Reproduce Failure; Reliability Verification
3. Product Redesign (Demonstration Testing): Revise goals; Redefine models; Product redesign
4. Implementation (Acceptance Testing): Implement changes; Reliability Demonstration; Audit Programs
Reliability Tests are critical at all stages!
7. DFR – Accelerated Testing
Accelerated Testing
Scope: Accelerated testing allows designers to make predictions about the life of a product by developing a model that correlates reliability under accelerated conditions to reliability under normal conditions.
BASIC CONCEPT
[Figure: Time to Failure versus Stress. To predict here (Normal stress level), we test here (Elevated stress level).]
Model: The model is how we extrapolate back to normal stress levels.
Common Models:
• Arrhenius: Thermal
• Inverse Power Law: Non-Thermal
• Eyring: Combined
Results @ high stress + stress-life relationship = Results @ normal stress
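As one illustration of a stress-life relationship, the sketch below computes an acceleration factor using the commonly quoted form of the Arrhenius model, AF = exp[(Ea/k)·(1/T_use - 1/T_stress)] with temperatures in kelvin. This formula is not spelled out in this material, and the activation energy and temperatures are hypothetical; in practice Ea depends on the failure mechanism and must be established for the product in question.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(
    activation_energy_ev: float, use_temp_c: float, stress_temp_c: float
) -> float:
    """AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], temperatures converted to kelvin."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    )


# Hypothetical values: Ea = 0.7 eV, use at 40 C, test at 85 C
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)

# Under this model, time at stress maps to time at use conditions by multiplying by AF
test_hours = 1000.0
print(f"AF = {af:.1f}; {test_hours:.0f} test hours ~ {af * test_hours:.0f} use hours")
```

The extrapolation is only meaningful if the elevated temperature excites the same failure mechanism that operates at normal stress.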
Accelerated Testing
Key steps in planning an accelerated test:
• Choose a stress to elevate: requires an understanding of the anticipated failure mechanism(s); the stress must be relevant (temp. & vibration usually apply)
• Determine the accelerating model: requires knowledge of the nature of the acceleration of this failure mechanism, as a function of the accelerating stress.
• Select elevated stress levels: requires a previous study of the product’s operating & destructive limits to ensure that the elevated stress level does not introduce new failure modes which would not occur at normal operating stress levels.
Applicability of technique depends on careful planning and execution
Parametric Reliability Models
One of the most important factors that influence the design process of a product or a system is the reliability values of its components.
In order to estimate the reliability of the individual components or the entire system, we may follow one or more of the following approaches:
➢ Historical Data
➢ Operational Life Testing
➢ Burn-In Testing
➢ Accelerated Life Testing
Approach 1: Historical Data
The failure data for the components can be found in data banks such as:
➢ GIDEP (Government-Industry Data Exchange Program)
➢ MIL-HDBK-217 (which includes failure data for components as well as procedures for reliability prediction)
➢ AT&T Reliability Manual
➢ Bell Communications Research Reliability Manual
In such data banks and manuals, the failure data are collected from different manufacturers and presented with a set of multiplying factors that relate to different manufacturers' quality levels and environmental conditions.