Halt To Hass To Hasa

The following presentation materials are copyright protected property of Ops A La Carte LLC.


How to Move from HALT to HASS to HASA for

by Mike Silverman, CRE, Managing Partner, Ops A La Carte LLC [email protected] // www.opsalacarte.com // (408) 472-3889

Presenter Biography

• Mike is founder and managing partner at Ops A La Carte, a professional consulting company that has an intense focus on helping customers with end-to-end reliability. Through Ops A La Carte, Mike has had extensive experience as a consultant to high-tech companies, and has consulted for over 125 companies including Cisco, Ciena, Siemens, Abbott Labs, and Applied Materials. He has consulted in a variety of industries including power electronics, telecommunications, networking, medical, semiconductor, semiconductor equipment, consumer electronics, and defense.

• Mike has 20 years of reliability and quality experience. He is also an expert in accelerated reliability techniques, including HALT and HASS, testing over 500 products for 100 companies in 40 different industries. Mike has authored and published 7 papers on reliability techniques and has presented these around the world, including in China, Germany, and Canada. He has also developed and currently teaches 10 courses on reliability techniques.

• Mike has a BS degree in Electrical and Computer Engineering from the University of Colorado at Boulder, and is both a Certified Reliability Engineer and a course instructor through the American Society for Quality (ASQ), IEEE, and Effective Training Associates. Mike is a member of ASQ, IEEE, SME, ASME, PATCA, and IEEE Consulting Society and is an officer in the IEEE Reliability Society for Silicon Valley.


Ops A La Carte assists clients in developing and executing any and all elements of Reliability through the Product Life Cycle.

Ops A La Carte has the unique ability to assess a product and understand the key reliability elements necessary to measure/improve product performance and customer satisfaction. Ops A La Carte pioneered “Reliability Integration”: using multiple tools in conjunction throughout each client's organization to greatly increase the power and value of any Reliability Program.

Ops A La Carte Services

Reliability Integration in the Concept Phase
• Gap Analysis
• Reliability Program and Integration Plan Development

Reliability Integration in the Design Phase
• Reliability Modeling and Predictions
• Derating Analysis/Component Selection
• Tolerance/Worst Case Analysis/Design of Experiments
• Risk Management / Failure Modes, Effects, & Criticality Analysis (FMECA)
• Fault Tree Analysis (FTA)
• Human Factors/Maintainability/Preventive Maintenance Analysis
• Software Reliability

Ops A La Carte Services, continued

Reliability Integration in the Prototype Phase
• Reliability Test Plan Development
• Highly Accelerated Life Testing (HALT)
• Design Verification Testing (DVT)
• Reliability Demonstration Testing
• Failure Analysis Process Setup

Reliability Integration in the Manufacturing Phase
• Highly Accelerated Stress Screening (HASS)
• On-Going Reliability Testing
• Repair Depot Setup
• Field Failure Tracking System Setup
• Reliability Performance Reporting
• End-of-Life Assessment


Ops A La Carte Services, continued

Reliability Training/Seminars
1. Reliability Tools and Integration for Overall Reliability Programs
2. Reliability Tools and Integration in the Concept Phase
3. Reliability Tools and Integration in the Design Phase
4. Reliability Tools and Integration in the Prototype Phase
5. Reliability Tools and Integration in the Manufacturing Phase
6. Reliability Techniques for Beginners
7. Reliability Statistics
8. FMECA
9. Certified Reliability Engineer (CRE) Preparation Course for ASQ
10. Certified Quality Engineer (CQE) Preparation Course for ASQ



Reliability Integration: “the process of seamlessly and cohesively integrating reliability tools together to maximize reliability at the lowest possible cost”


Reliability vs. Cost ♦

Intuitively, one recognizes that there is some minimum total cost that will be achieved when an emphasis in reliability increases development and manufacturing costs while reducing warranty and in-service costs. Use of the proper tools during the proper life cycle phase will help to minimize total Life Cycle Cost (LCC).

CRE Primer by QCI, 1998


Reliability vs. Cost, continued








Reliability vs. Cost, continued

In order to minimize total Life Cycle Costs (LCC), a Reliability Engineer must do two things:

♦ choose the best tools from all of the tools available and apply these tools at the proper phases of a product life cycle.

♦ properly integrate these tools together to assure that the proper information is fed forward and backward at the proper times.


Reliability vs. Cost, continued

As part of the integration process, we must choose a set of tools at the heart of our program in which all other tools feed to and are fed from. The tools we have chosen for this are:



HALT and HASS Summary ♦

Highly Accelerated Life Testing (HALT) and Highly Accelerated Stress Screening (HASS) are two of the best reliability tools developed to date, and every year engineers are turning to HALT and HASS to help them achieve high reliability.


HALT and HASS Summary, continued ♦

In HALT, a product is introduced to progressively higher stress levels in order to quickly uncover design weaknesses, thereby increasing the operating margins of the product, translating to higher reliability.

In HASS, a product is “screened” at stress levels above specification levels in order to quickly uncover process weaknesses, thereby reducing the infant mortalities, translating to higher quality.


HALT and HASS Summary, continued

This presentation shall review the best reliability tools to use in conjunction with HALT and HASS and how to integrate them together.



Reliability Integration Tools - Summary

♦ PHASE I: Concept Phase
• Reliability Integration in the CONCEPT Phase - Tools that are used in the concept phase of a project in order to define the reliability requirements of a program. Benchmarking is usually required.
• The output of this phase is the Reliability Program and Integration Plan. This plan will specify which tools to use and the goals and specifications of each. This is the plan that drives the rest of the program.

♦ PHASE II: Design Phase
• Reliability Integration in the DESIGN Phase - Tools that are used in the design phase of a project after the reliability has been defined.
• Predictions and other forms of reliability analysis are performed here.
• These tools will only have an impact on the design if they are done very early in the design process.

♦ PHASE III: Prototype Phase
• Reliability Integration in the PROTOTYPE Phase - Tools that are used after a working prototype has been developed.
• This represents the first time a product will be tested.
• The testing will mostly be focused at finding design

♦ PHASE IV: Manufacturing Phase
• Reliability Integration in the MANUFACTURING Phase - Tools here are a combination of analytical and test tools that are used in the manufacturing environment to continually assess the reliability of the product.
• The focus here will be mostly at finding process

♦ In this seminar, we shall concentrate on the tools used in the Prototype and Manufacturing Phases.
♦ We offer other seminars on tools used in the Concept and Design Phases.



Reliability Integration in the PROTOTYPE Phase
• Highly Accelerated Life Testing (HALT)
• Failure Reporting, Analysis and Corrective Action System (FRACAS)
• Reliability Demonstration Test




HALT - Highly Accelerated Life Test
• Quickly discover design issues.
• Evaluate & improve design margins.
• Release mature product at market introduction.
• Reduce development time & cost.
• Eliminate design problems before release.
• Evaluate cost reductions made to product.

Developmental HALT is not really a test you pass or fail; it is a process tool for the design engineers. There are no pre-established limits.

HALT, How It Works

1. Start low and step up the stress, testing the product during the stressing.
2. Gradually increase stress level until a failure occurs.
3. Analyze the failure.
4. Make temporary improvements.
5. Increase stress and start process over.

[Figure: repeating cycle of Stress (increase), Failure, Analysis, and Improve, ending at the Fundamental Technological Limit.]
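The step-stress loop described on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `step_stress`, `survives`, and the toy failure thresholds are hypothetical names and values, not part of the original presentation, and a real HALT monitors the product functionally at every step rather than calling a single pass/fail function.

```python
from typing import Callable, Optional, Tuple


def step_stress(start: float, step: float, stop: float,
                survives: Callable[[float], bool]
                ) -> Tuple[Optional[float], Optional[float]]:
    """Raise the stress from `start` in increments of `step` until the unit
    fails or `stop` (chamber or technology limit) would be exceeded.

    Returns (last level survived, first level that failed); either may be None.
    """
    last_ok = None
    level = start
    while level <= stop:
        if not survives(level):
            return last_ok, level
        last_ok = level
        level += step
    return last_ok, None


# Hypothetical unit that stops working above 85 (for example, degrees C).
def unit_before_fix(level: float) -> bool:
    return level <= 85


# Hypothetical unit after a temporary improvement raised the limit to 115.
def unit_after_fix(level: float) -> bool:
    return level <= 115


first_pass = step_stress(start=20, step=10, stop=150, survives=unit_before_fix)
second_pass = step_stress(start=20, step=10, stop=150, survives=unit_after_fix)
print(first_pass, second_pass)
```

Running the loop again after each temporary improvement mirrors the "increase stress and start process over" step: the second pass reports a higher surviving level than the first.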


HALT, Why It Works

[Figure: Classic S-N Diagram (stress vs. number of cycles). S0 = normal stress conditions; N0 = projected normal life; S1 and S2 are higher stress levels; the diagram also marks the point at which failures become non-relevant.]

Margin Improvement Process

[Figure: stress axis showing, in order, the Lower Destruct Limit, Lower Operating Limit, Product Operational Specs, Upper Operating Limit, and Upper Destruct Limit, with the Operating Margin and Destruct Margin marked.]
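As a small worked example of the margin diagram, the sketch below computes operating and destruct margins from a spec limit and the limits found in HALT. It assumes, as is common practice but not stated on the slide, that both margins are measured from the edge of the product spec; the function name and the numbers are hypothetical.

```python
def margins(spec: float, operating_limit: float, destruct_limit: float):
    """Return (operating margin, destruct margin), both measured from the
    product spec limit. Assumption: margins are distances from the spec."""
    return abs(operating_limit - spec), abs(destruct_limit - spec)


# Hypothetical thermal limits in degrees C for a product specified at 0 to 40.
upper = margins(spec=40, operating_limit=90, destruct_limit=110)
lower = margins(spec=0, operating_limit=-50, destruct_limit=-70)
print("upper margins:", upper)
print("lower margins:", lower)
```

Recomputing these values after each design fix gives a simple way to record how the margins grow over the course of HALT.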



Summary of Customers

      Industry Type                      Number of Companies   Product Type
  1   Networking Equipment               6                     Electrical
  2   Defense Electronics                4                     Electrical
  3   Microwave Equipment                4                     Electrical
  4   Fiberoptics                        2                     Electrical
  5   Remote Measuring Equipment         2                     Electrical
  6   Supercomputers                     2                     Electrical
  7   Teleconferencing Equipment         1                     Electro-mechanical
  8   Video Processing Equipment         1                     Electrical
  9   Commercial Aviation Electronics    1                     Electrical
 10   Hand-held Computers                1                     Electrical
 11   Hand-held Measuring Equipment      1                     Electrical
 12   Monitors                           1                     Electrical
 13   Medical Devices                    1                     Electro-mechanical
 14   Personal Computers                 1                     Electrical
 15   Printers and Plotters              1                     Electro-mechanical
 16   Portable Telephones                1                     Electrical
 17   Speakers                           1                     Electro-mechanical
 18   Telephone Switching Equipment      1                     Electrical
 19   Semiconductor Manufacturing        1                     Electro-mechanical
      TOTAL                              33

Summary of Products by Customer Field Environment

Environment Type    Thermal Environment   Vibration Environment
(not recovered)     0 to 40°C             Little or no vibration
Office with User    0 to 40°C             Vibration only from user of equipment
(not recovered)     -40 to +75°C          1-2 Grms vibration, 0-200 Hz frequency
(not recovered)     -40 to +60°C          Little or no vibration
Field with User     -40 to +60°C          Vibration only from user of equipment
(not recovered)     -40 to +75°C          1-2 Grms vibration, 0-500 Hz frequency

Total number of products: 47 (per-environment counts not recovered)

Summary of Results - by attribute

[Table: Average, Most Robust, Least Robust, and Median values for Thermal Data in °C (LDL, UOL) and Vibration Data in Grms (VOL, VDL). Values not recovered.]

Summary of Results - by field environment

[Table: Thermal Data in °C (LDL, UOL) and Vibration Data in Grms (VOL, VDL) by field environment, including Office with User and Field with User. Values not recovered.]

Summary of Results - by product application

[Table: Thermal Data in °C and Vibration Data in Grms by product application. Values not recovered.]

Summary of Results - by stress

• Cold Step Stress: 14%
• Hot Step Stress: 17%
• Rapid Thermal Transitions: 4%
• Vibration Step Stress: 45%
• Combined Environment: 20%

Significance: Without Combined Environment, 20% of all failures would have been missed.


Failure Details by Stress - Cold Step Stress

Failure Mode (quantities mostly not recovered)
• Failed component
• Circuit design issue
• Two samples had much different limits
• Intermittent component (Qty 1)

Failure Details by Stress - Hot Step Stress

Failure Mode (quantities not recovered)
• Failed component
• Circuit design issue
• Degraded component
• Warped cover



Failure Details by Stress - Rapid Temperature Transitions

Failure Mode (quantities mostly not recovered)
• Cracked component
• Intermittent component
• Failed component
• Connector separated from board (Qty 1)

Failure Details by Stress - Vibration Step Stress

Failure Mode                        Qty
Broken lead                         43
Screws backed out                   9
Socket interplay                    5
Connector backed out                5
Component fell out of socket        5
Tolerance issue                     4
Card backed out                     4
Shorted component                   2
Broken component                    2
Sheared screws                      1
RTV applied incorrectly             1
Potentiometer turned                1
Plastic cracked at stress point     1
Lifted pin                          1
Intermittent component              1
Failed component                    1
Connectors wearing                  1
Connector intermit. contact         1
Connector broke from board          1
Broken trace                        1
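Since Pareto charts are one of the failure analysis tools named later in the FRACAS section, the vibration step stress counts above make a convenient worked example. The sketch below ranks the failure modes and adds a cumulative percentage; the `pareto` helper is a hypothetical name, and only the counts come from the slide.

```python
# Failure mode counts from the vibration step stress slide.
vibration_failures = {
    "Broken lead": 43,
    "Screws backed out": 9,
    "Socket interplay": 5,
    "Connector backed out": 5,
    "Component fell out of socket": 5,
    "Tolerance issue": 4,
    "Card backed out": 4,
    "Shorted component": 2,
    "Broken component": 2,
    "Sheared screws": 1,
    "RTV applied incorrectly": 1,
    "Potentiometer turned": 1,
    "Plastic cracked at stress point": 1,
    "Lifted pin": 1,
    "Intermittent component": 1,
    "Failed component": 1,
    "Connectors wearing": 1,
    "Connector intermit. contact": 1,
    "Connector broke from board": 1,
    "Broken trace": 1,
}


def pareto(counts):
    """Return rows of (mode, qty, cumulative percent), largest count first."""
    total = sum(counts.values())
    running = 0
    rows = []
    for mode, qty in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += qty
        rows.append((mode, qty, round(100 * running / total, 1)))
    return rows


rows = pareto(vibration_failures)
for mode, qty, cumulative in rows[:5]:
    print(f"{mode:32s} {qty:3d} {cumulative:6.1f}%")
```

A ranking like this shows at a glance which few failure modes account for most of the vibration failures, which helps decide where corrective action effort goes first.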


Failure Details by Stress - Combined Environment (combination of vibration with rapid temp transitions)

Failure Mode                        Qty
Broken lead                         10
Component fell off (non-soldered)   4
Failed component                    3
Broken component                    1
Component shorted out               1
Cracked potting material            1
Detached wire                       1
Circuit design issue                1
Socket interplay                    1

HALT Flow Chart

Reliability - Highly Accelerated Life Testing (HALT) Flow (boxes as extracted; arrows not recovered)

• Use Reliability Modeling/Derating Data as Input
• Perform a Failure Modes and Effects Analysis (FMEA) to Determine Weakpoints in a Design
• Research Environmental Limitations on All "Exotic" Technologies Being Used
• Perform HALT, Taking Product Outside Environmental and Performance Specs to Find Weakpoints
• Evaluate Failures/Weaknesses and Fix Those That Are Relevant and Cost-Effective
• Send Failure Information to FRACAS
• Retest Product to Determine New Limits
• Are Margins Acceptable for Reliability Reqts?
• Use Results to Develop a HASS Profile
• Publish Results
• Use Results to Develop a Reliability Demonstration Test





A Failure Reporting, Analysis and Corrective Action System (FRACAS) is also sometimes referred to as Closed Loop Corrective Action (CLCA) or Corrective and Preventive Action (CAPA).

The purpose of the FRACAS is to provide a closed loop failure reporting system, procedures for analysis of failures to determine root cause, and documentation for recording corrective action.

CRE Primer by QCI, 1998


FRACAS, continued

♦ This closed loop system should include:
• Assurance that the root cause for each failure is found and clearly defined.
• Provisions to assure that effective corrective actions are taken on a timely basis.
• Follow-up audits for all open failure reports, failure analyses, suspense dates.
• Reporting all delinquencies to management.

♦ An integral part of the FRACAS is the failure review board (FRB). The FRB is responsible for initiating and reviewing corrective action to ensure reliability improvement.

CRE Primer by QCI, 1998


FRACAS: How to use in conjunction with a HALT

♦ When performing HALT, failures are identified and each must be taken to root cause. FRACAS is the perfect tool for this. A FRACAS can:
• Help classify failures as to their relevancy
• Help choose the appropriate analysis tool
• Keep track of the progress on each open issue
• Help communicate results with other departments and outside the company


FRACAS: How to use in conjunction with a HALT, continued

♦ A FRACAS can help classify failures as to their relevancy
• During HALT, many failures are likely to be uncovered. However, not all failures will be relevant. The FMECA process will find many of these non-relevant failures, but for those that are first found in HALT, a FRACAS will help determine their relevancy by use of a variety of tools.


FRACAS: How to use in conjunction with a HALT, continued

♦ When performing a failure analysis, there are many tools that can be helpful. Some of these are:
• Fault Tree Analyses (FTAs)
• Fishbone diagrams
• Pareto charts
• Designs of Experiments
• Tolerance Analyses


FRACAS: How to use in conjunction with a HALT, continued

♦ A FRACAS can keep track of the progress on each open issue
• Each failure is assigned a unique FRACAS Report ID
• Each report requires detailed information about the corrective action and must be signed off
• During critical stages in a project, regular FRACAS review meetings are typically held
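A minimal sketch of how such a report might be tracked is shown below. The field names, the `FR-0001` ID format, and the closure rule are all assumptions made for illustration; the slides only state that each failure gets a unique report ID, that corrective action details are required, and that the report must be signed off.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_id = count(1)  # hypothetical ID sequence, not a real FRACAS tool


@dataclass
class FracasReport:
    description: str
    source: str                          # e.g. "HALT", "HASS", "Repair Center"
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    signed_off_by: Optional[str] = None
    report_id: str = field(default_factory=lambda: f"FR-{next(_next_id):04d}")

    def is_closed(self) -> bool:
        """Assumed closure rule: root cause, corrective action, and sign-off."""
        return bool(self.root_cause and self.corrective_action
                    and self.signed_off_by)


def open_reports(reports: List[FracasReport]) -> List[FracasReport]:
    """Reports that still need attention at the next review meeting."""
    return [r for r in reports if not r.is_closed()]


r1 = FracasReport("Broken lead on capacitor", source="HALT")
r2 = FracasReport("Connector backed out", source="HALT")
r1.root_cause = "Unsupported tall component"
r1.corrective_action = "Add staking compound"
r1.signed_off_by = "Reliability engineer"
print([r.report_id for r in open_reports([r1, r2])])
```

Listing the open reports before each review meeting is one simple way to keep every HALT failure moving toward root cause.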


FRACAS: How to use in conjunction with a HALT, continued

♦ A FRACAS can help communicate results with other departments and outside the company
• FRACAS databases are typically kept on a network drive for general viewing
• FRACAS can be sent to a vendor to track failure analysis
• FRACAS can be used to communicate with customers on product development or field issues


FRACAS Flow Chart

Reliability - Failure Reporting Analysis and Corrective Action System (FRACAS) Flow (boxes as extracted; arrows not recovered)

Entry points:
• Trend Discovered in Repair Center
• Failure Discovered in HALT Process
• Failure Discovered in HASS Process

Process boxes:
• Gather Failure Information
• Duplicate Failure, if possible
• Develop Failure Analysis Plan for Specific Failure Including Resource Plan
• Report Findings and Recommendations
• Contact Customer or Supplier (if appropriate) to Inform Them of Plan
• Analyze Failure to Root Cause
• Send Sample of Failure Back to Component Manufacturer (if appropriate)
• Implement Corrective Action
• Test Solution
• Did Solution Fix Problem? (Yes/No)
• Report Solution and Close Failure Analysis
• Monitor Effectiveness of Solution / Perform Verification HALT
• Modify HASS Profile, if necessary




Reliability Demonstration Testing (RDT) ♦

A sample of units is tested at accelerated stresses for several months.

The stresses are a bit lower than the HALT stresses and they are held constant (or cycled constantly) rather than gradually increasing.

This enables us to calculate the acceleration factor for the test.

The RDT can be used to validate the reliability prediction analyses.

It is also useful in finding failure modes that are not easily detected in a high time compression test such as HALT.


RDT, continued

[Figures not recovered. Source: CRE Primer by QCI, 1998]

RDT, continued

[Figure: Classic S-N Diagram (stress vs. number of cycles). S0 = normal stress conditions; N0 = projected normal life; S1 and S2 are higher stress levels; the diagram also marks the point at which failures become non-relevant.]

RDT, continued

[Figures not recovered. Source: CRE Primer by QCI, 1998]


RDT: How to Use the Results of HALT in Planning an RDT

♦ Two of the most important decisions when planning an RDT are which stresses to apply and how much of each. From this, we can derive the acceleration factor for the test. HALT can help with both of these.
• HALT will identify the effects of each stress on the product to determine which are most applicable.
• HALT will identify the margins of the product with respect to each stress. This is critical so that the highest amount of stress is applied in the RDT to gain the most acceleration without applying too much, possibly causing non-relevant failures.


RDT: How to Use the Results of Reliability Predictions in Planning an RDT

♦ Another key factor in planning an RDT is the goal of the test. This is usually driven by marketing requirements, but the Reliability Prediction will help determine how achievable this is.
• Although the prediction may not be able to give an exact MTBF number, it will give a number close enough to help determine how long an RDT to run and what level of confidence in the numbers to expect.
• Many times, the reliability of the product will far exceed initial marketing requirements. If this is the case, the RDT can be planned to try to prove these higher levels. Once achieved, the published specs from marketing can be increased.
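To make the link between the MTBF goal, confidence level, number of units, and acceleration factor concrete, here is a minimal sketch of one common sizing calculation. It is not taken from the presentation: it assumes a constant failure rate (exponential model), a test that ends with zero failures, and an acceleration factor that is already known; the function names and numbers are hypothetical.

```python
import math


def zero_failure_test_hours(target_mtbf: float, confidence: float) -> float:
    """Equivalent field hours needed to demonstrate `target_mtbf` at the given
    confidence with zero failures, assuming an exponential failure model."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -target_mtbf * math.log(1.0 - confidence)


def hours_per_unit(target_mtbf: float, confidence: float,
                   units: int, acceleration_factor: float) -> float:
    """Chamber hours each unit must run, sharing the total across all units
    and crediting the assumed acceleration factor."""
    total = zero_failure_test_hours(target_mtbf, confidence)
    return total / (units * acceleration_factor)


# Hypothetical goal: 100,000 h MTBF at 90% confidence, 20 units, AF of 10.
total_hours = zero_failure_test_hours(100_000, 0.90)
per_unit = hours_per_unit(100_000, 0.90, units=20, acceleration_factor=10)
print(f"{total_hours:.0f} equivalent hours, {per_unit:.0f} h per unit")
```

Under these assumptions, a higher acceleration factor (made possible by the margins found in HALT) or a larger sample directly shortens the calendar time of the test.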

RDT Flow Chart

Reliability - Reliability Demonstration Testing Flow (boxes as extracted; arrows not recovered)

• Input From Reliability Modeling/Derating
• Input From HALT
• Review Reliability Goals Based on Marketing Input
• Develop Test Plan, including
  1. Number of Units
  2. Acceleration Factors
  3. Total Test Time
  4. Confidence Levels
• Have Reliability Goals Been Met?
• Set up and Begin Test
• Monitor Results
• Publish Results



Reliability Tools and Integration in the MANUFACTURING Phase
• Highly Accelerated Stress Screening (HASS)
• Highly Accelerated Stress Auditing (HASA)
• On-Going Reliability Testing (ORT)
• Repair Depot Setup
• Field Failure Tracking System
• Reliability Performance Reporting
• End-of-Life Assessment




HASS - What Is It?
• Detect & correct PROCESS changes.
• Reduce production time & cost.
• Increase out-of-box quality & field reliability.
• Decrease field service & warranty costs.
• Reduce infant mortality rate at product introduction.
• Finds failures that are not found with burn-in.
• Accelerates one's ability to discover process and component problems.

HASS is not a test, it’s a process. Each product has its own process. But before HASS can begin, we must first HALT!

Before HASS, We Must Characterize Product with HALT

♦ Before HASS, we must HALT
• Even for mature products in which HASS is the goal, HALT must be done first to characterize the product margins.


Steps Towards HASS ♦

Begin process during HALT stage (involve mfg)

HASS development

Production HASS

HASS Process Is Begun Early

♦ Even before HALT is complete, we should
• determine production needs and throughput
• start designing and building fixture
• determine which stresses to apply
• obtain functional and environmental equipment
• understand manpower needs
• determine at what level HASS will be performed (assembly or system)
• determine location of HASS (in-house or at an outside lab or contract manufacturer)
• for high volume products, determine when to switch to an audit and what goals should be put in place to trigger this

HASS Process

♦ After HALT is complete, we must
• assure Root Cause Analysis (RCA) is completed on all failures uncovered
• develop initial screen based on HALT results
• map production fixture (thermal/vibration)
• run proof-of-screen

HASS Process, continued

♦ Proof-of-Screen Criteria
• Assure that screen leaves sufficient life in product
• Assure that screen is effective

Assuring the Screen Leaves Sufficient Life

[Figure: screen profile versus time (t), repeated for a minimum of 20 passes, with the UDL marked on the stress axis.]

• Make dwells long enough to execute diagnostic suite. Execute diagnostics during entire profile.
• It is highly recommended to combine six-axis vibration, tickle vibration, power cycling, and other stresses with thermal. Powered-on monitoring is essential.

Assuring the Screen Leaves Sufficient Life

♦ We run for X times longer than the proposed screen
• When we reach end-of-life after X screens, we can say that one screen will leave 1 – 1/X of the life in the product.
• Example: We recommend testing for a minimum of 20 times the proposed screen length. A failure after 20 HASS screens tells us that one screen will leave the product with 1 – 1/20 or 95% of its life.
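The 1 – 1/X rule from this slide is easy to check with a few lines of Python. The function name is hypothetical, and the sketch assumes, as the slide does, that each pass of the screen consumes an equal share of product life.

```python
def life_remaining_after_one_screen(screens_to_failure: int) -> float:
    """Fraction of life left after one screen, given that the product reached
    end-of-life after `screens_to_failure` repeated screens."""
    if screens_to_failure <= 0:
        raise ValueError("screens_to_failure must be positive")
    return 1.0 - 1.0 / screens_to_failure


for x in (5, 20, 50):
    print(f"{x:3d} screens to failure -> "
          f"{life_remaining_after_one_screen(x):.0%} life remaining")
```

If the product survives all 20 passes without failing, the fraction left after one screen is at least the value computed for 20.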

HASS Process for Wide Operating Limits

[Figure: stress axis showing the Lower Destruct Limit, Lower Operating Limit, Product Specs, Upper Operating Limit, and Upper Destruct Limit.]

The “Ideal” HASS Profile for wide operating limits

[Figure: fast rate thermal profile.]

• Make dwells long enough to execute diagnostic suite. Execute diagnostics during entire profile.
• It is highly recommended to combine six-axis vibration, tickle vibration, power cycling, and other stresses with thermal. Powered-on monitoring is essential.


HASS Process for Narrow Operating Limits

[Figure: stress axis showing the Lower Destruct Limit, Lower Operating Limit, Product Specs, Upper Operating Limit, and Upper Destruct Limit, with the ranges of the Precipitation Screen, Detection Screen, and ESS marked.]

The “Ideal” HASS Profile for narrow operating limits

[Figure: profile with UDL, UOL, and SPEC levels marked, using a fast rate thermal segment and a slow rate thermal segment.]

• Make dwells long enough to execute diagnostic suite. Execute diagnostics during entire profile.
• It is highly recommended to combine six-axis vibration, tickle vibration, power cycling, and other stresses with thermal. Powered-on monitoring is essential.

HASS Process Is Begun Early

♦ Production HASS
• Start screening process with 4x the number of screen cycles intended for long-term HASS.
• During production screening (after each production run), adjust screen limits up and cycles down until 90% of the defects are discovered in the first 1-2 cycles.
• Monitor field results to determine effectiveness of screen. Again, adjust screen limits as necessary to decrease “escapes” to the field.
• Add other stresses, as necessary, if it is impractical to adjust screen limits any further. It is essential that the product being tested be fully exercised and monitored for problem detection.
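The 90% criterion for tuning the production screen can be checked from a simple tally of defects by the cycle in which they were found. The sketch below is an illustration only: the function names and the defect counts are hypothetical, and the 90% target and the "first 1-2 cycles" window are the only values taken from the slide.

```python
from typing import Optional, Sequence


def fraction_found_early(defects_per_cycle: Sequence[int],
                         first_n: int = 2) -> Optional[float]:
    """Share of all detected defects that appeared in the first `first_n`
    screen cycles. Returns None when no defects were found at all."""
    total = sum(defects_per_cycle)
    if total == 0:
        return None
    return sum(defects_per_cycle[:first_n]) / total


def meets_target(defects_per_cycle: Sequence[int],
                 first_n: int = 2, target: float = 0.90) -> bool:
    """True when the early-detection share reaches the target."""
    share = fraction_found_early(defects_per_cycle, first_n)
    return share is not None and share >= target


# Hypothetical tallies: defects found in cycle 1, 2, 3, 4 of the screen.
early_run = [8, 4, 5, 3]    # defects still appearing in later cycles
tuned_run = [14, 5, 1, 0]   # most defects precipitate in cycles 1-2
print(fraction_found_early(early_run), meets_target(early_run))
print(fraction_found_early(tuned_run), meets_target(tuned_run))
```

Once a run meets the target, the later cycles are finding very little, which is the signal the slide uses for reducing the cycle count.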

Typical HASS Failures ♦

Poor solder quality

Socket failures

Component failures

Bent IC leads

Incorrect components

Improper component placement

Test fixture/program errors

HASS Advantages over “Burn-In”

Finds flaws typically found by customers

Reduces production time and costs

Lowers warranty costs

HASS Defects by Environment

• Combined Temperature and 6 Degree-of-Freedom Vibration: 46%
• Low Temp Extreme: 29%
• High Temp: 13%
• Extreme Temperature Transitions: 12%

Data from Array Technology (1993).

HASS – Implementation Requirements

HALT for margin discovery

Screen development

Powered product with monitored tests

Fixturing to allow required throughput

QualMark, 1998

HASS Cost Benefits

Greatly reduced test time

Reduction in test equipment

Lower warranty costs

Minimized chance of product recalls

QualMark, 1998

HASS: How to Use the Results of FMECA and Reliability Predictions in Planning a HASS

♦ How to use the results of FMECA and a Reliability Prediction in planning a HASS
• FMECA results can identify possible wearout mechanisms that need to be taken into account for HASS.
• Reliability Prediction results can help determine how much screening is necessary.


HASS: How to Use the Results of FMECA and Reliability Predictions in Planning a HASS, continued

♦ Using FMECA results to identify possible wearout mechanisms that need to be taken into account for HASS
• As we discussed in the FMECA section, certain wearout failure modes are not easily detectable in HALT or even in HASS Development. Therefore, when wearout failure modes are present, we must rely on the results of a FMECA to help determine appropriate screen parameters.


HASS: How to Use the Results of FMECA and Reliability Predictions in Planning a HASS, continued

♦ Using Reliability Prediction results to determine how much screening is necessary
• One of the parameters of a reliability prediction is the First Year Multiplier factor. This is a factor applied to a product based on how much manufacturing screening is being performed (or is planned for) to take into account infant mortality failures.
• The factor is on a scale between 1 and 4. No screening yields a factor of 4, and 10,000 hours of “effective” screening yields a factor of 1 (the scale is logarithmic).


HASS: How to Use the Results of FMECA and Reliability Predictions in Planning a HASS, continued

♦ Using Reliability Prediction results to determine how much screening is necessary, continued
• Effective screening allows for accelerants such as temperature and temperature cycling.
• HASS offers the best acceleration of any known screen. Therefore, HASS is the perfect vehicle for helping to keep this factor low in a reliability prediction.


HASS: Using the Results of HALT to Develop a HASS Profile

♦ Using the HALT results, we then run a HASS Development process
• The process must prove there is significant life left in the product
• The process must prove that it is effective at finding



HASS: Linking the Repair Depot with HASS by Sending “NTF” Hardware Back through HASS

♦ During the repair process, we may identify a large number of “No Trouble Founds” (NTFs). HASS is the perfect vehicle for identifying whether these NTFs are truly intermittent hardware problems or due to something else, and so can assist with the NTF issue at the Repair Depot.


HASS Dilemma
• Difficult to implement without impacting production
• Expensive to implement across many CMs
• Difficult to cost-justify

HASA Solves All These Issues



What is HASA

HASA is a Highly Accelerated Stress Audit

HASA is an effective audit process for manufacturing.


HASA Advantages

HASA combines the best screening tools with the best auditing tools.

Better than ORT because it leverages off of HALT and HASS to apply a screen tailored to the product

Better than HASS because it is much cheaper and easier to implement and “almost” as effective.


Steps to HASA

♦ HALT ♦ HASS Development ♦ Pilot HASS ♦ HASA


What is HASA

HASA is an audit or sampling procedure

HASA is intended for high volume applications in which the emphasis is not on catching every defect but rather detecting process shifts


Advantage of switching from HASS to HASA? ♦

To achieve cost efficiency for High Volume production, reducing manpower, equipment, utilities, and space costs.

Risks in switching from HASS to HASA? ♦

Some defective units will be shipped to the field.

Corrective actions must be fast and accurate.


Review of HASA ♦

When can HASA be used?
• Design and processes are in control
• HASS failure rate has become acceptable

What is the advantage of a HASA program?
• Cost

What is the risk of a HASA program?
• Statistical confidence


When should HASA be considered as an option?

Only if:
• HALT is completed
• HASS Development is completed
• HASS is successful


Is HALT Completed?

If so:
• Design defects have been eliminated
• Margins are known
• Margins are large
• Design is robust


Is HASS Development completed?

If so:
• Screen is effective
• Screen is safe


Is HASS implementation complete?

If so:
• Failure rates are acceptable
• Manufacturing processes are under control


HASA Plan Goal is to catch shifts in processes

1. Detect degradations in process quality control
2. Compare with pre-established thresholds
3. Empower Corrective Action Team


HASA Example

Example from HP Vancouver
• Number of units shipped per day = 1000
• Number of units tested per day = 64
• 90% probability of detecting a rate shift from 1% to 3% by sampling 112 units in just under 2 days
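The trade-off behind an audit plan like this can be explored with a simple binomial model. The sketch below is not the method used in the HP example, whose decision rule is not given in the slides, and it does not attempt to reproduce the 90% figure. It assumes independent units, a fixed sample size, and a hypothetical decision limit (the number of failures in the sample that triggers action).

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below


def audit_risks(n: int, limit: int, good_rate: float, shifted_rate: float):
    """Return (false alarm probability, detection probability) for a plan that
    acts when `limit` or more failures are seen in a sample of `n` units."""
    false_alarm = prob_at_least(limit, n, good_rate)
    detection = prob_at_least(limit, n, shifted_rate)
    return false_alarm, detection


# Sample size and rates from the slide; the decision limits are hypothetical.
for limit in (1, 2, 3):
    false_alarm, detection = audit_risks(112, limit, 0.01, 0.03)
    print(f"limit {limit}: false alarm {false_alarm:.2f}, "
          f"detection {detection:.2f}")
```

Raising the decision limit lowers the false alarm rate but also lowers the chance of catching the shift, which is why sample size, decision limits, and acceptable risk have to be chosen together.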


HASA Choices
• Acceptable risk of allowing a “bad lot” to ship
• Probability of detecting a process shift of some amount
• Sample Size
• Decision Limits
• Commitment to Action


HASA Summary ♦

For high volume production, HASA is the best process monitoring tool


HASS/HASA Flow Chart

Reliability - Highly Accelerated Stress Screening (HASS) Flow (boxes as extracted; arrows and Yes/No branches not recovered)

• Data from HALT
• Develop a HASS Profile that Matches Product Performance Capabilities
• Prove Profile Using Iterative Process of Increasing Stress to Maximum Possible without Weakening Product
• Perform HASS and Collect Data
• Send Failures to Failure Analysis Process
• Do Results/Volumes Warrant Moving to Sample?
• Develop HASS Sampling Plan and Implement
• Perform Sample HASS and Collect Data
• Do Results Warrant Staying with Sample HASS?
• Has Product Undergone a Change that Could Affect Performance?
• Perform HASS on Material from Repair Center
• Analyze Repair Data to Determine if HASS Profile Needs to Be Strengthened





On-Going Reliability Testing (ORT)

♦ ORT is a process of taking a sample of products off a production line and testing them for a period of time, adding the cumulative test time to achieve a reliability target. The samples are rotated on a periodic basis to:
• get an on-going indication of the reliability
• assure that the samples are not wearing too much (because after the ORT is complete, the samples are shipped).


ORT vs RDT ♦

ORT is a very similar test to the Reliability Demonstration Test (RDT) except that the RDT is usually performed once just prior to release of the product, whereas the ORT is an on-going test rotating in samples from the manufacturing line.

An ORT consists of a Planning stage and a Testing and Continual Monitoring stage. The inputs from the customer are the number of units allocated to the test, the duration that each set of units will be in the test before being cycled through, and the stress factors to be applied.

ORT Parameters ♦

Just as in an RDT, we must choose a goal, sample size, acceleration factors, and confidence.

In addition, we must choose the length of time each sample will be in ORT. Because these are shippable units, we cannot risk taking significant life out of them.
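
To make the trade-off between these parameters concrete, here is a minimal sketch of how the cumulative test time could be sized. It assumes a zero-failure test and exponentially distributed times to failure (a constant failure rate), which the slides do not state; the function names and the example numbers are hypothetical.

```python
import math


def required_unit_hours(mtbf_goal_hours, confidence, acceleration_factor=1.0):
    """Cumulative unit-hours of test needed to demonstrate the MTBF goal.

    Assumptions (not from the slides): zero failures are allowed during the
    test and times to failure are exponentially distributed.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if mtbf_goal_hours <= 0 or acceleration_factor <= 0:
        raise ValueError("MTBF goal and acceleration factor must be positive")
    # P(no failures in T hours) = exp(-T / MTBF) must not exceed 1 - confidence
    equivalent_field_hours = -mtbf_goal_hours * math.log(1.0 - confidence)
    # Stress applied during the test shortens the real test time needed
    return equivalent_field_hours / acceleration_factor


def hours_per_unit(total_unit_hours, units_per_rotation, rotations):
    """Test time each shippable unit spends in ORT before it is rotated out."""
    return total_unit_hours / (units_per_rotation * rotations)


# Hypothetical example: 10,000 h MTBF goal, 90% confidence, acceleration factor 5
total = required_unit_hours(10_000, 0.90, acceleration_factor=5.0)
print(round(total, 1), "cumulative unit-hours")
print(round(hours_per_unit(total, units_per_rotation=10, rotations=4), 1), "hours per unit")
```

Spreading the cumulative time over more units or more rotations lowers the time each unit spends under test, which is how the "do not take significant life out" constraint can be traded against sample size.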


ORT Goal ♦

The goal of an ORT is to:
• Ensure that the defined reliability specification, including the MTBF, is met throughout the manufacturing life of the product.
• Verify that infant mortalities have been removed during the standard manufacturing process.


ORT Goal, a closer look ♦

The goal of an ORT is to:
• Ensure that the defined reliability specification, including the MTBF, is met throughout the manufacturing life of the product.

Does it do this? The answer is yes, but a better question is: Is this really the most effective method of doing this? Probably not. Wouldn’t a HALT/RDT be much more effective? HALT will make the product more robust, and then RDT will measure the reliability after that. Then we perform periodic HALTs to assure the product remains robust.

ORT Goal, a closer look ♦

The goal of an ORT is to:
• Ensure that the defined reliability specification, including the MTBF, is met throughout the manufacturing life of the product.
• Verify that infant mortalities have been removed during the standard manufacturing process.

Does it do this? The answer is “NO, IT DOES NOT!” ORT is ineffective for process monitoring because
a) ORT rarely finds problems, because acceleration factors are typically not aggressive enough.
b) when problems are found, it may be weeks later, and products from that lot have already shipped.

Comparison Between ORT and HASA ♦

ORT Benefits over HASA
• You can measure reliability at any given time

HASA Benefits over ORT
• Effective process monitoring tool due to its ability to find failures and to drive timely corrective actions
• Don’t need to measure on-going reliability because reliability measurement was already done once in RDT. Also, periodic HALT is a much better vehicle for continuously monitoring reliability over time after it has been baselined.




Repair Depot Setup ♦

A Repair Depot facility must be set up with the proper testing in place to reproduce the failures and to assure that the product has enough life left to be shipped back into the field.

But more importantly it must be set up in such a way as to learn from the failures and make changes to the design and manufacturing processes to assure the failures are not repeated.


Repair Depot Setup ♦

Repair Depot Plan

Lowest Replaceable Unit (LRU) Level Analysis

Repair Depot Location Strategy – In-house vs. 3rd Party

Repair Process


Repair Depot Setup ♦

Repair Depot Plan
• The plan will outline the Repair Depot process from beginning to end and the decisions that have to be made along the way, including identifying the Lowest Replaceable Units (LRUs), the location of the Repair Depot, and the Repair Process itself.


Repair Depot Setup ♦

Lowest Replaceable Unit (LRU) Level Analysis
• Through the use of a reliability prediction and a maintainability prediction, we shall help identify the LRUs for the product. The choice of which subassemblies will be LRUs is based on how often the subassembly will fail, how easy it is for the user to identify and replace the LRU, and how safe the operation is for the user and the product.


Repair Depot Setup ♦

Repair Depot Location Strategy – In-house vs. 3rd Party
• Depending on the complexity of the product, it may make sense to have a 3rd Party vendor act as the Repair Depot. We shall help perform this analysis and evaluate the needs vs. the capabilities to determine the strategy that makes the most sense.


Repair Depot Setup ♦

Repair Process
• Integrate with HALT results
• Integrate with HASS results
• Using HASS for “No Problem Found” issues
• “Three Strikes” Process
• Set up to feed data to the Field Failure Tracking System



Repair Depot Setup ♦

Integrate the Repair Depot Center with HALT results
• If we find design issues in the field and we can duplicate these issues in the Repair Depot Center, we must feed them back into the HALT process.
• Therefore, the Repair Depot must be set up to easily feed information back to the HALT process when issues like this arise.
• The Corrective Action System is the perfect vehicle for linking these two together.


Repair Depot Setup ♦

Integrate the Repair Depot Center with HALT results
• Once the issue is fed back into the HALT process, we will review the type of failure and why HALT was not able to find the problem.
  • Under what conditions did the failure occur?
  • Was this something that HALT found but for which corrective action was not implemented?
  • Perhaps we stopped short of reaching the fundamental limit of technology.
  • Perhaps additional stresses are required.
• This will then be used for future HALTs so that the process continually learns and adapts.

Repair Depot Setup ♦

Integrate the Repair Depot Center with HASS results
• If we find process issues in the field and we can duplicate these issues in the Repair Depot Center, we must feed them back into the HASS process.
• Therefore, the Repair Depot must be set up to easily feed information back to the HASS process when issues like this arise.
• The Corrective Action System is the perfect vehicle for linking these two together.


Repair Depot Setup ♦

Integrate the Repair Depot Center with HASS results
• Once the issue is fed back into the HASS process, we will review the type of failure and why HASS was not able to find the problem.
  • Was the issue a DOA or did it occur later in the life of the product? This will help determine why HASS did not catch it.
  • Perhaps the screen limits need to be adjusted.
  • Perhaps additional stresses are required.
• This information will then be used for future HASS runs so that the process continually learns and adapts.

Repair Depot Setup ♦

Using HASS for “No Problem Found” issues
• We shall help determine how to treat “No Problem Founds” (NPFs).
• We shall also help integrate the Repair Depot process with the HASS process so that “No Problem Found” items are sent through HASS, allowing intermittent failures to be discovered and repaired as well.


Repair Depot Setup ♦

“Three Strikes” Process
• We shall help identify how often to allow failed products to be sent back to the field.
• A typical rule of thumb is to apply a “three strikes” policy in which a failed product may not be returned to the field if it has been returned 3 times.
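
The rule of thumb above can be expressed as a simple check in the Repair Depot workflow. This is only a sketch: the serial-number history format, the function name, and the disposition labels are hypothetical, and the limit of 3 is the rule of thumb rather than a fixed requirement.

```python
from collections import Counter

MAX_RETURNS = 3  # "three strikes" rule of thumb; adjust per product policy


def disposition(serial_number, return_history, max_returns=MAX_RETURNS):
    """Decide whether a repaired unit may go back to the field.

    return_history is a list of serial numbers, one entry for every time a
    unit came back to the Repair Depot, including the current return.
    """
    times_returned = Counter(return_history)[serial_number]
    if times_returned >= max_returns:
        return "retire"           # third strike: do not ship again
    return "return_to_field"      # repair, retest, and ship


history = ["SN-001", "SN-002", "SN-001", "SN-001"]
print(disposition("SN-001", history))
print(disposition("SN-002", history))
```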


Repair Depot Setup ♦

Set up the Repair Depot System to feed data to the Field Failure Tracking System
• The Repair Depot Center retests products returned from the field to confirm failures and determine root cause.
• The confirmation is then fed back to the Field Failure Tracking System so that it can be properly categorized for reliability data reporting.




Field Failure Tracking System ♦

The purpose of the Field Failure Tracking System is to provide a system for evaluating a product’s performance in the field and for quickly identifying trends.


Field Failure Tracking System ♦

When setting up a Field Failure Tracking System, we must
• Identify key parameters to monitor
• Develop failure codes
• Choose an appropriate Tracking System
• Implement the Tracking System
• Integrate the Tracking System
• Track trends
• Educate others on its use


Field Failure Tracking System ♦

Identify key parameters to monitor
• Before setting up a system, we must first identify what key parameters need to be monitored (Date of Shipment, Date of Return, Failure Code, Further Actions, etc.) as well as what key metrics need to be calculated (DOA rate, MTBF, etc.)


Field Failure Tracking System ♦

Failure Code Definitions
• Failure code definitions are key to the tracking system because proper coding of failures is the only way to easily identify trends.
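
As an illustration of why consistent failure codes matter, the sketch below builds a simple Pareto table from a list of coded failures. The failure codes and the function name are hypothetical examples, not codes defined in this presentation.

```python
from collections import Counter


def pareto(failure_codes):
    """Return (code, count, cumulative percent) rows, most frequent first."""
    counts = Counter(failure_codes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for code, count in counts.most_common():
        cumulative += count
        rows.append((code, count, 100.0 * cumulative / total))
    return rows


# Hypothetical failure codes taken from returned-product records
codes = ["PSU", "FAN", "PSU", "NPF", "PSU", "FAN"]
for code, count, cum_pct in pareto(codes):
    print(f"{code:4s} {count:3d} {cum_pct:6.1f}%")
```

If the same failure is logged under several different codes, it is split across rows and the dominant contributor no longer stands out, which is the trend-spotting problem the slide warns about.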


Field Failure Tracking System ♦

Choice of Tracking Systems
• Once we have identified all of the key parameters to capture, we will help choose the best tracking system for your particular application. Anything from Excel to Oracle can be used. Some level of customization may be needed. Special modules are also available from reliability software houses such as ReliaSoft and Relex.
• This system must link well with the Failure Analysis



Field Failure Tracking System ♦

Implementation of Tracking System
• Once we have assured data integrity, then we can use the Field Failure Tracking System to
  • track trends
  • calculate reliability of the product on-going, including


Field Failure Tracking System ♦

Implementation of the System - Tracking Trends
• The most important step in the process is to evaluate the data, spot any trends that are developing, and provide corrective action as needed. We will show you proper techniques for identifying trends from the given data and how to follow through the entire process.


Field Failure Tracking System ♦

Implementation of the System - Metrics
• There are numerous metrics we can track, including
  • DOA rates
  • MTBF
  • Warranty returns
  • End-of-Life Issues (see EOL Assessment for more details)


Field Failure Tracking System ♦

Set up to easily collect data to calculate Field MTBF
• All of these reliability calculations can be presented using
  • total over time
  • point estimates
  • rolling averages
    • 3 month rolling average
    • 12 month rolling average


Field Failure Tracking System ♦

Set up to easily collect data to calculate Field MTBF
• Rolling averages are typically the best because
  • they show the reliability trend
  • they make for an easy comparison from the time a product starts shipping to present to show reliability growth
  • they can show effectiveness of a corrective action by comparing from the time a failure is discovered to after a corrective action has been implemented.
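
A rolling-average field MTBF could be computed along the following lines. This is a sketch under stated assumptions: field operating hours and confirmed failures are available per month, MTBF for a window is taken as total hours divided by total failures, and the function name and sample numbers are hypothetical.

```python
def rolling_mtbf(monthly_hours, monthly_failures, window=3):
    """Rolling-window MTBF, one value per month.

    Each value is (operating hours in the window) / (failures in the window).
    Months without a full window, or windows with zero failures, give None
    because no point estimate can be formed.
    """
    if len(monthly_hours) != len(monthly_failures):
        raise ValueError("inputs must cover the same months")
    if window < 1:
        raise ValueError("window must be at least 1 month")
    result = []
    for end in range(len(monthly_hours)):
        start = end - window + 1
        if start < 0:
            result.append(None)  # not enough history yet
            continue
        hours = sum(monthly_hours[start:end + 1])
        failures = sum(monthly_failures[start:end + 1])
        result.append(hours / failures if failures else None)
    return result


# Hypothetical fleet data: operating hours and confirmed failures per month
hours = [1000, 1000, 1000, 1000]
failures = [1, 0, 1, 2]
print(rolling_mtbf(hours, failures, window=3))
```

Passing `window=12` gives the 12 month rolling average; comparing the values before and after a corrective action date shows whether the action was effective.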


Field Failure Tracking System ♦

Setting up the Field Failure Tracking System to integrate with sales support and customer service to assure data integrity
• Data Integrity is key to a good Field Failure Tracking System. We must be able to accurately determine
  • Date product was put into service
  • Date of failure
  • Circumstances around failure and solution


Field Failure Tracking System ♦

Data Integrity – Date product was put into service
• This is not the same as the date of shipment
• Often we have an “adder” to the ship date (e.g. ship date + 30 days) but we must verify this is accurate
  • Sales support will help define this since they know the customer installation process best
• From this we need to be able to accurately distinguish DOAs from products that failed soon after installation.
• We need to come up with a “Definition of DOA”, or how many days after installation a failure is considered a DOA.
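
The ship-date adder and the DOA definition can be captured in a few lines. In this sketch the 30-day adder comes from the example above, while the 30-day DOA window, the function names, and the dates are assumptions chosen only for illustration.

```python
from datetime import date, timedelta


def days_in_service(ship_date, failure_date, install_adder_days=30):
    """Estimated days of use before failure.

    The in-service date is approximated as ship date + adder; a failure
    reported before that estimate counts as 0 days in service.
    """
    in_service_date = ship_date + timedelta(days=install_adder_days)
    return max((failure_date - in_service_date).days, 0)


def is_doa(ship_date, failure_date, install_adder_days=30, doa_window_days=30):
    """True if the failure falls inside the agreed "Definition of DOA" window."""
    return days_in_service(ship_date, failure_date, install_adder_days) <= doa_window_days


shipped = date(2004, 1, 1)
print(days_in_service(shipped, date(2004, 2, 10)), is_doa(shipped, date(2004, 2, 10)))
print(days_in_service(shipped, date(2004, 6, 1)), is_doa(shipped, date(2004, 6, 1)))
```

Both parameters should be confirmed with sales support and customer service, since an inaccurate adder shifts failures between the DOA and early-life categories.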

Field Failure Tracking System ♦

Data Integrity – Date of failure
• The failure date is equally critical and not always easy to determine, because the customer sometimes only indicates the date the product was returned and not the date it failed
  • Customer service can help work with the customers to assure that we get accurate information here


Field Failure Tracking System ♦

Data Integrity – Circumstances around failure and solution
• The key to each field issue is the actual failure itself and how it was solved.
  • Tags on the product with failure information are essential
• RMA numbers should also be on these tags and these RMA numbers should tie back to the database
• The database should be linked to the Field Failure Tracking Database so that all of the circumstances around the failure are known before trying to repair.
  • Often we will find that the true problem was identified after hardware was returned
  • Sometimes numerous pieces of hardware are pulled to solve a single problem – “Shotgunning”

Field Failure Tracking System ♦

Integrating the Field Failure Tracking System with the Repair Depot Center
• Failed products from the field are returned to the Repair Depot Center to confirm the failure and to determine root cause.
• The confirmation is then fed back to the Field Failure Tracking System so that it can be properly categorized for reliability data reporting.




Reliability Performance Reporting ♦

Reliability Performance Reporting in its simplest form is just reporting back how we are doing against our plan. In this report, we must capture
• how we are doing against our goals and against our schedule to meet our goals
• how well we are integrating each tool together
• what modifications we may need to make to our plan

In the report, we can also add information on specific issues, progress on failure analyses, and Paretos and trend charts.


Reliability Performance Reporting ♦

How are we doing against our goals and against our schedule to meet our goals?
• After collecting the field data, we then compare with our goals and estimate how we are doing.
• If we are achieving a specific goal element, we explain what pieces are working and the steps we are going to take to assure that this continues
• If we are not achieving a specific goal element, we must understand what contributed to this and what steps we are going to take to change this
  • As part of this, we must understand the major contributors to each goal element through trend plotting and failure analyses

Reliability Performance Reporting ♦

How well are we integrating each tool together?
• As part of understanding the effectiveness of our reliability program, we must look at the overall program
• For example, if we stated in the plan that we were going to use the results of the prediction as input to HALT, we must describe here how we accomplished this
  • This can help explain the effectiveness of the HALT so that its results can be repeated
  • This can help explain how the HALT can be more effective in future programs if we overlooked or skipped some of the integration
  • This will serve as documentation for future programs

Reliability Performance Reporting ♦

What modifications may we need to make to our plan?
• Occasionally, we may need to modify the plan
  • Goals may change due to new customer/marketing requirements
  • We may have discovered new tools or new approaches to using existing tools based on research
  • We may have developed new methods of integration based on experimentation and research
  • Schedule may have changed


Reliability Performance Reporting ♦

What modifications may we need to make to our plan?
• If this occurs, we need to
  • Re-write the plan
  • Summarize the changes in our Reliability Performance Report so that we can accurately capture these new elements going forward




End-of-Life (EOL) Assessment ♦

We perform End-of-Life Assessments to
• Determine when a product is starting to wear out in case the product needs to be discontinued
• Monitor preventive maintenance strategy and modify as needed
• Monitor spares requirements to determine if a change in allocation is necessary
• Tie back to End-of-Life Analysis done in the Design Phase to determine accuracy of analysis


End-of-Life (EOL) Assessment ♦

Determining when a product is starting to wear out in case the product needs to be discontinued
• In most market segments, customers don’t expect products to last forever and would gladly replace them if technological advances dictate it
• Therefore, discontinuing a product and offering an upgrade/replacement is common
• If we can calculate when a product is starting to reach end-of-life, this will help provide the cost justification for both us and our customer


End-of-Life (EOL) Assessment ♦

Monitor preventive maintenance strategy and modify as needed
• If we decide to continue supporting the product, we may determine our preventive maintenance strategy needs to be modified
  • Perhaps we didn’t anticipate a wearout mechanism that now needs to be dealt with
  • Perhaps we estimated incorrectly the length of time before wearout would begin


End-of-Life (EOL) Assessment ♦

Monitor spares requirements to determine if a change in allocation is necessary
• Our spares allocation is partly based on our initial Reliability Prediction and End-of-Life (EOL) Analysis.
• If our EOL Assessment is showing something that we did not predict in our EOL Analysis, then our spares allocation will need to be adjusted as a result.


End-of-Life (EOL) Assessment ♦

Tie back to End-of-Life Analysis done in the Design Phase to determine accuracy of analysis
• For future programs, this comparison between EOL analysis and EOL assessment is critical to understanding how to modify the analysis.
• In some cases, our EOL analysis may differ because of analytical techniques. If this is the case, we can develop an adjustment factor between EOL analysis and EOL assessment and carry it forward to new programs.


End-of-Life (EOL) Assessment ♦

So now that we understand how to use EOL Assessments, how do we actually perform one?
• An EOL Assessment uses Weibull-plotting techniques to determine where on the “bathtub” curve we are


End-of-Life (EOL) Assessment ♦

A review of the “bathtub” curve

[Figure: bathtub curve, with Failure Rate on the vertical axis. Annotations on the curve:]
• Infant Mortality level: driven by amount of screening in mfg.; characterized using a special factor in prediction
• Ideal reliability at time of ship
• Steady State Reliability Level: described by prediction
• Onset of end-of-life (EOL)



End-of-Life (EOL) Assessment ♦

To figure out where we are, we plot the field data
• We must “scrub” the data to
  • accurately determine the number of days in use before failure
  • properly categorize the failure
• We must be careful and plot data by assembly type, especially if different assemblies have different wearout mechanisms. Otherwise, it will be impossible to determine a pattern
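
Once the data for one assembly type has been scrubbed, Weibull parameters can be estimated from the days-in-use values. The sketch below uses median rank regression on X with Bernard's approximation for the median ranks, and assumes complete data (every unit in the list failed, no suspensions). The function name and the sample times are hypothetical; a commercial tool would normally be used for real data.

```python
import math


def fit_weibull_rrx(times_to_failure):
    """Estimate Weibull shape (beta) and scale (eta) by rank regression on X.

    Assumes complete failure data (no suspensions) and positive times.
    """
    ts = sorted(times_to_failure)
    n = len(ts)
    if n < 2 or ts[0] <= 0:
        raise ValueError("need at least two positive failure times")
    xs = [math.log(t) for t in ts]
    # Bernard's approximation to the median rank of the i-th ordered failure
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Regress x on y: ln(t) = ln(eta) + (1 / beta) * ln(-ln(1 - F))
    slope = sxy / syy
    intercept = mean_x - slope * mean_y
    return 1.0 / slope, math.exp(intercept)


# Hypothetical days-in-use before failure for one assembly type
beta, eta = fit_weibull_rrx([21, 35, 42, 50, 58, 66, 71, 85])
print(f"beta={beta:.2f}, eta={eta:.1f}")
```

Fitting each assembly type separately keeps different wearout mechanisms from being blended into a single, misleading shape parameter.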


End-of-Life (EOL) Assessment

[Figure: Failure Rate vs Time Plot, generated with ReliaSoft's Weibull++ 6.0 (www.Weibull.com). Vertical axis: Failure Rate, f(t)/R(t). Horizontal axis: Time, (t). Data set: Weibull Since Jan 28 - (NTF-knwnissues); W2 RRX - SRM MED; F=49 / S=0. Fitted parameters: β=2.9032, η=60.9188, ρ=0.8154. Mike Silverman, Company, 5/2/2004 07:58.]
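
The plotted quantity f(t)/R(t) is the hazard rate, which for a two-parameter Weibull distribution is h(t) = (β/η)(t/η)^(β−1). A minimal sketch of how the fitted β can be read against the bathtub curve follows; the ±0.1 band around β = 1 is an arbitrary assumption for illustration, not a rule from the slides.

```python
def weibull_hazard(t, beta, eta):
    """Weibull failure rate h(t) = f(t) / R(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_region(beta, tolerance=0.1):
    """Rough reading of the shape parameter against the bathtub curve."""
    if beta < 1.0 - tolerance:
        return "infant mortality"  # failure rate decreasing with time
    if beta > 1.0 + tolerance:
        return "wearout"           # failure rate increasing with time
    return "steady state"          # roughly constant failure rate


beta, eta = 2.9032, 60.9188  # parameters reported on the plot above
print(bathtub_region(beta))
for t in (20, 40, 60):
    print(t, round(weibull_hazard(t, beta, eta), 4))
```

With β well above 1, the failure rate rises with time, which is the signature of the onset of end-of-life on the bathtub curve.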


Presentation Summary
In this class, we learned about:
• The four phases of a reliability program
  • Concept
  • Design
  • Prototype
  • Manufacturing
• The reliability tools used in each phase and how to integrate all of the tools together
• HALT and HASS and their role in an overall reliability program


Presentation Summary
In the Prototype Phase, we learned about:
• Highly Accelerated Life Testing (HALT)
• Failure Reporting, Analysis and Corrective Action System (FRACAS)
• Reliability Demonstration Testing

and how to integrate these together and with tools from the other phases


Presentation Summary
In the Manufacturing Phase, we learned about:
• Highly Accelerated Stress Screening (HASS)
• Highly Accelerated Stress Screen Audit (HASA)
• On-Going Reliability Testing (ORT)
• Repair Depot Setup
• Field Failure Tracking System Setup
• Reliability Performance Reporting
• End-of-Life Assessment

and how to integrate these together and with tools from the other phases


Presentation Summary
In summary, we have learned:
• the power of developing realistic reliability goals early, planning an implementation strategy, and then executing the strategy, and...

the power of integration !!


Further Education
• For a more in-depth view of this topic and more, Mike will be teaching at:
  • May 18th-20th: Applied Reliability Symposium, San Diego
    • Ops A La Carte is a proud sponsor of the 2005 ARS at the Catamaran Resort on Mission Bay in San Diego, CA.
    • In addition to sponsoring, we shall be giving a presentation on “Reliability Integration Across the Product Life Cycle”
  • October 20th and 21st: “Essential Reliability Tools: A look at the best reliability tools being used”
  • Go to www.opsalacarte.com/pages/news/news_events.htm for more details


For more information...
♦ Contact Ops A La Carte (www.opsalacarte.com)
  • Mike Silverman
  • (408) 472-3889
  • [email protected]

