Syed-ksr

  • November 2019
  • PDF




CHAPTER 1 INTRODUCTION

1.1 MOTIVATION

The sustained improvement of deep-submicron technologies has led to an explosion in the number of transistors that can be integrated on a chip and, further, to the possibility of putting a whole system on a chip (SoC). Core-based design is one of the new paradigms used to reduce the complexity and cost of chip development. Nevertheless, test-related costs are still far from having a unified and satisfactory solution.

The external testing of integrated circuits (ICs) is the traditional approach, in which automated test equipment (ATE) provides all the necessary test data. This may set high requirements on the storage capacity and speed of the ATE. Furthermore, the ever-increasing transistor count per I/O pin and the low accessibility of internal blocks affect the tradeoff between the final fault coverage and the test application time. All of these, combined with the need for specially tuned testers for different types of cores and the growing need for periodic in-field maintenance and on-line testing capabilities, make external testing difficult, costly and insufficient.

These problems call for built-in self-test (BIST) solutions. In this context, BIST for random logic (LBIST) is becoming an attractive alternative in IC testing. The standard BIST architecture (H.-J. Wunderlich et al. 1998) [21] uses an LFSR that feeds pseudo-random patterns into the scan paths. It is easy to implement and minimizes both the hardware overhead and the impact on system performance. However, due to random pattern resistant (RPR) faults, pseudo-random patterns cannot always achieve sufficient fault coverage within an acceptable test time.


The fault coverage can be increased by biasing the pseudo-random test sequence towards the RPR faults (R. Rodríguez-Montanes et al. 1992) [16]. Conflicting input values required by different RPR faults may need different weighting sets. Unfortunately, the control logic and the storage requirements for the weighting sets can increase unacceptably. Pseudo-exhaustive testing (N. A. Touba et al. 1996) [17] achieves the benefits of exhaustive testing while usually requiring fewer test patterns. This reduction is obtained by splitting the circuit into segments that are each tested exhaustively. The efficiency of the method is limited by the size of the largest segment that has to be tested.

An alternative approach for increasing the fault coverage is the insertion of test points, which has been proposed for both LBIST and external testing. While the area increase due to test point insertion may be tolerable, test points can introduce additional signal delays, which could require a complete resynthesis and a new timing verification (R. K. Brayton et al. 1986) [2].

Deterministic LBIST (DLBIST) guarantees higher or complete fault coverage by embedding deterministic test cubes (test patterns with unspecified bits) into the pseudo-random sequence. There is a wide range of deterministic logic BIST methods that apply deterministic test patterns and hence improve the low fault coverage often obtained with pseudo-random patterns. In an initial deterministic BIST scheme, additional external patterns were applied on top of the pseudo-random test (B. Benware et al. 2003) [1]. In contrast to the above-mentioned BIST methods, pure DLBIST schemes try to avoid both the modification of the core under test (CUT) and the application of additional external test data. These methods can be classified into store-and-generate schemes and test set embedding schemes.

Store-and-generate schemes consist of hardware structures that store the test patterns on-chip in a compressed form and implement a decompression algorithm. Widely known representatives of this method are LFSR reseeding (M. Naruse et al. 2003) [6], multi-polynomial reseeding and folding counter based LBIST.

Test set embedding schemes rely on a pseudo-random test pattern generator plus some additional circuitry that modifies the pseudo-random sequence in such a way that a set of deterministic cubes is embedded. Widely known test set embedding techniques are bit-flipping and bit-fixing. In the bit-flipping approach, the output sequence of an LFSR is inverted at a few bit positions in order to increase the fault coverage (Figure 1.1), while in the bit-fixing approach constant values are applied (Figure 1.2). The test generation process is controlled by a bit-flipping function (BFF) or a bit-fixing function (BFX), respectively.

Figure 1.1 Bit-Flipping Logic

Figure 1.2 Bit-Fixing Logic
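The difference between the two embedding techniques can be illustrated with a minimal sketch. The function names, the list-of-bits representation and the example control vectors are assumptions made for illustration only; in hardware the control bits would come from the BFF or BFX logic.

```python
def bit_flip(prpg_bits, flip_mask):
    """Bit-flipping: XOR each pseudo-random bit with a control bit (1 = invert)."""
    return [b ^ f for b, f in zip(prpg_bits, flip_mask)]


def bit_fix(prpg_bits, fix_to_0, fix_to_1):
    """Bit-fixing: force selected positions to a constant (AND/OR gating)."""
    return [(b & (1 - z)) | o for b, z, o in zip(prpg_bits, fix_to_0, fix_to_1)]


# Hypothetical 6-bit pseudo-random pattern produced by the LFSR.
pattern = [1, 0, 1, 1, 0, 0]

# Invert positions 1 and 5.
flipped = bit_flip(pattern, [0, 1, 0, 0, 0, 1])

# Force position 0 to '0' and position 4 to '1'.
fixed = bit_fix(pattern, [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0])

print(flipped)  # [1, 1, 1, 1, 0, 1]
print(fixed)    # [0, 0, 1, 1, 1, 0]
```

Bit-flipping needs one XOR gate per controlled position, whereas bit-fixing needs gating that can force either constant, which is why the sketch uses two control vectors for it.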

The term pattern mapping will be used to refer to the assignment of a pseudo-random pattern to a given deterministic cube. The synthesis procedure of a DLBIST scheme consists of pattern mapping and the generation of the hardware structure used to implement the mapping, e.g. by means of a BFL or BFX. The embedded test sequences obtained by mapping deterministic cubes to pseudo-random sequences are also evaluated with respect to their coverage of non-target defects. Moreover, possible tradeoffs between the test length, hardware overhead, fault coverage and non-target defect coverage are analyzed.

1.2 GOAL OF THE WORK

In most embedded test approaches, the test responses are compressed by a multiple-input signature register (MISR), which delivers a signature containing the information about the correctness of the CUT. The test responses may contain unknown bits (Xs), which can appear due to multiple clock domains, floating buses or uninitialized memory elements. In order to obtain an uncorrupted signature at the end of the test, these Xs have to be masked to either logic ‘0’ or logic ‘1’ before they propagate into the MISR. This may be performed by combinational logic implementing a so-called don’t-care masking function, the don’t-care masking logic (DML). The DML can be kept quite small by carefully selecting which bits of the test responses, namely those carrying the information about the CUT correctness, have to remain unmasked.

Earlier work on DML concentrated on handling designs that produce unknown values, which could corrupt the signature and invalidate the results of a BIST run. There, the DML block is used to block the unknown values without compromising the stuck-at fault coverage of the test. In contrast to DML, which considers potential coverage loss between the circuit under test and the test response evaluator, this work focuses on the coverage of unmodeled defects achieved by the sequence that the test pattern generator applies to the circuit under test. Resistive bridging faults (RBFs) are used as a surrogate for unmodeled defects.


CHAPTER 2 AUTOMATIC TEST EQUIPMENT (ATE)

2.1 HISTORY

Figure 2.1 Automatic Test Equipment

Automatic test equipment (ATE) is any automated device that is used to quickly test printed circuit boards, integrated circuits, or other related electronic components or modules. An ATE system can be as simple as a digital multimeter (DMM) whose operating mode and measurements are controlled and analyzed by a computer, or as complex as a system containing dozens of test instruments capable of automatically testing and diagnosing faults in complex electronic systems, such as sophisticated flying-probe testers. ATE is widely used in the electronics manufacturing industry to test electronic components and systems after they are fabricated. It is also used to test avionics systems on commercial and military aircraft and the electronic modules in today’s automobiles. ATE systems typically interface with an automated placement tool, called a handler, that physically places the device under test so that it can be measured by the equipment, as shown in Figure 2.1.

2.2 LIMITATIONS OF ATE

  • ATE is expensive (typically several million US$).
  • Test application time increases.
  • Test data volume increases.


CHAPTER 3 BASIC CONCEPTS OF BUILT-IN SELF-TEST

3.1 ARCHITECTURE OF BIST

Built-in self-test (BIST) is a technique in which additional circuitry is added to a circuit under test (CUT) so that it can test itself with minimal external help. Figure 3.1 sketches the general structure of a self-testable circuit composed of a test pattern generator (TPG), a test response evaluator (TRE) and a BIST control unit (BCU).

Figure 3.1 BIST Architecture

Implementing BIST can improve the test quality due to its better support for at-speed testing, which is essential for detecting delay faults. BIST supports in-field and on-line testing, which helps to reduce the cost of system maintenance. It also offers the opportunity to improve reliability by means of burn-in testing.


The main idea of BIST is the capability of a circuit (chip, board, or system) to test itself. BIST techniques can be classified into two categories, as shown in Figure 3.2: on-line BIST, which includes concurrent and nonconcurrent techniques, and off-line BIST, which includes functional and structural approaches. In on-line BIST, testing occurs during normal functional operating conditions; i.e., the CUT is not placed into a test mode where normal functional operation is locked out. Concurrent on-line BIST is a form of testing that occurs simultaneously with normal functional operation. In nonconcurrent on-line BIST, testing is carried out while the system is in an idle state. This is often accomplished by executing diagnostic software or firmware routines. The test process can be interrupted at any time so that normal operation can resume.

Figure 3.2 Classification of Testing

Off-line BIST deals with testing a system when it is not carrying out its normal functions. Systems, boards, and chips can be tested in this mode. This form of testing is also applicable at the manufacturing, field and operational levels. Off-line testing is often carried out using on-chip or on-board test pattern generators (TPGs) and output response analyzers (ORAs), or microdiagnostic routines. Off-line testing does not detect errors in real time, i.e., when they first occur, as is possible with many on-line concurrent BIST techniques.


Functional off-line BIST deals with the execution of a test based on a functional description of the CUT and often employs a functional, or high-level, fault model. Normally such a test is implemented as diagnostic software or firmware. Structural off-line BIST deals with the execution of a test based on the structure of the CUT. An explicit structural fault model may be used. Fault coverage is based on detecting structural faults. Usually tests are generated and responses are compressed using some form of an LFSR. BIST approaches can be divided into test-per-scan and test-per-clock schemes, which are described in the following sections.

3.1.1 Test-Per-Scan Schemes

Test-per-scan BIST schemes require scan-based design. In the case of sequential circuits, this means that all the storage cells can be configured as one or several scan paths (scan chains), which are used as serial shift registers in test mode. In this way, each storage device of the CUT becomes easily controllable and observable. The test stimuli and responses are shifted into and out of the scan paths. Scan-based design helps to reduce the problem of testing sequential circuits to the simpler problem of testing combinational circuits.

The BCU in Figure 3.1 must contain at least a shift counter and a pattern counter. The shift counter controls the bit stream that is generated and shifted into the scan path by a TPG. The pattern counter controls the length of the test sequence. A system clock cycle (also called capture or functional clock cycle) is applied to load the CUT response to the current test pattern into the scan path. During the so-called shift mode (also called scan or test mode), a new test pattern is shifted into the scan path, while the CUT response to the previous pattern is shifted out and compressed by a TRE. A very common and effective parallel-serial mixed scheme is obtained by partitioning a full scan path into multiple scan chains (Figure 3.3).


In Figure 3.3, the test patterns are generated by a pseudo-random pattern generator (PRPG) and the responses are compacted by a multiple-input signature register (MISR). Both the PRPG and the MISR are typically implemented as linear feedback shift registers (LFSRs). Such a scheme is called Self-Test Using MISR and Parallel Shift register sequence generator (STUMPS) (T. Clouqueur et al. 2005) [4]. The basic design with multiple scan chains suffers from highly correlated patterns. To solve this problem, XOR trees (phase shifters, PS) may be inserted between the LFSR and the scan chain inputs (Figure 3.4).

Figure 3.3 Test-per-scan scheme

Figure 3.4 STUMPS architecture for parallel-serial mixed scheme
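The interplay of the shift counter, the pattern counter and the capture cycle in a test-per-scan session can be sketched as follows. This is a minimal illustration with a single scan chain: the 4-bit LFSR polynomial, the toy CUT and all function names are assumptions, not part of the architecture described above.

```python
def prpg_bits(seed=0b0001):
    """Endless bit stream from a 4-bit external-feedback LFSR
    (P(x) = x^4 + x + 1 is assumed here)."""
    state = seed
    while True:
        yield (state >> 3) & 1                  # serial output of the last stage
        feedback = ((state >> 3) ^ state) & 1   # XOR of stages 4 and 1
        state = ((state << 1) | feedback) & 0xF


def toy_cut(bits):
    """Hypothetical combinational CUT: XOR of neighbouring inputs."""
    return [bits[i] ^ bits[(i + 1) % len(bits)] for i in range(len(bits))]


def run_session(chain_length, num_patterns):
    source = prpg_bits()
    chain = [0] * chain_length
    responses, clock_cycles = [], 0
    for pattern_counter in range(num_patterns):
        shifted_out = []
        for shift_counter in range(chain_length):   # shift mode
            shifted_out.append(chain[-1])           # previous response leaves the chain
            chain = [next(source)] + chain[:-1]     # while the new pattern enters
            clock_cycles += 1
        if pattern_counter > 0:
            responses.append(shifted_out)           # would be compacted by the TRE
        chain = toy_cut(chain)                      # capture (system clock) cycle
        clock_cycles += 1
    return responses, clock_cycles


responses, cycles = run_session(chain_length=4, num_patterns=5)
print(cycles)  # 25
```

Each pattern costs `chain_length` shift cycles plus one capture cycle, which is the long test application time listed among the drawbacks of test-per-scan schemes. The response to the last pattern is still in the chain when the loop ends and would need one more shift phase.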


In order to reduce test time, power consumption and storage requirements, other scan structures such as scan forest or Illinois scan may be used. There are several approaches to transform the storage elements of the CUT into scan elements. For example, edge-triggered D-type flip-flops can be transformed into so-called scan flip-flops by adding a multiplexer in front of them (Figure 3.5.1). A scan enable signal is used to switch between shift and capture modes, and the same clock signal can be used for both modes. An example of a level-triggered storage element transformed into a scan element is shown in Figure 3.5.2. Here, the switching between shift and capture modes is made with the help of two clock signals that control the first of the two latches.

Figure 3.5.1 Edge-Triggered Scan Element (Scan Flip-Flop)

Figure 3.5.2 Level-Sensitive Scan Element (Shift Register Latch)

Figure 3.5 Storage cells for scan design

Test-per-scan schemes have several advantages: (a) high fault/defect coverage; (b) reduced test data size (compared to sequential test patterns); (c) relatively low test generation time; (d) reduced test costs (no special requirement for costly ATEs for functional testing); (e) low impact on the system behavior, as only scan paths are included into the mission logic; and (f) separation of the pattern generator from the CUT. The drawbacks of test-per-scan schemes are: (i) long test application time required by the scan mode; (ii) functionally untestable faults can be activated; (iii) reduced testability for faults whose detection requires pairs of test patterns; and (iv) reduced system performance if scan elements are introduced into the critical paths. If partial scan paths are used, these problems can be reduced and more test patterns may be applied within the same test time.

3.1.2 Test-Per-Clock Schemes

In a test-per-clock scheme, a test pattern is applied to the CUT every clock cycle. This scheme is best suited for register-based design. It employs a specific BIST architecture using the built-in logic block observer (BILBO), a more sophisticated register that can function as a normal state register, scan register, PRPG or MISR. All functionality of the BILBO depends on the mode input signals B0 and B1. Signal B0 controls all the registers to switch between the global and local modes (Figure 3.6). The global mode covers the functional and scan modes. In the local mode the registers may act as pattern generators or response evaluators. The signal B1 is used to select among the sub-modes associated with the global or local mode. In contrast to signal B0, which is common to all registers, the signal B1 depends upon the addressed register.

Figure 3.6: Control signals of a BILBO


Figure 3.7 Test-per-clock scheme

Figure 3.7 shows how testing is facilitated by changing the functionality of the BILBO registers. Initially, the registers R1 and R2 are initialized in scan mode. Then register R1 is set to PRPG mode to generate patterns for the combinational logic C1, and the test responses are observed by register R2, which operates in response evaluation mode as a MISR. The combinational logic C2 is tested after the test outcome contained in R2 has been shifted out and the functionalities of R1 and R2 have been interchanged. At the end, the new test outcome contained in R1 has to be shifted out.

Table 3.1 Modes of BILBO

Compared to test-per-scan schemes, the test-per-clock schemes have both advantages and disadvantages. The advantages of test-per-clock schemes are: (a) shorter test times and better support for two-pattern testing, as a new pattern can be applied in each clock cycle; and (b) better support for at-speed testing, as no pattern shifting, which is generally done at a lower speed, is required. The disadvantages of test-per-clock schemes may be the following: (i) larger hardware overhead and (ii) stronger impact on the system behavior and design flow. The overhead can also be affected by the increased complexity of the test-per-clock schedule, which requires the synthesis of a rather complex BCU. One reason for these disadvantages is that additional test registers have to be included, because normal BILBO registers cannot work as TPG and TRE simultaneously.

3.1.3 Test Pattern Generator (TPG)

Depending upon the desired fault coverage and the specific faults to be tested for, a sequence of test vectors (test vector suite) is developed for the CUT. It is the function of the TPG to generate these test vectors and apply them to the CUT in the correct sequence. A ROM with stored deterministic test patterns, counters, and linear feedback shift registers are some examples of the hardware implementation styles used to construct different types of TPGs.

3.1.4 Test Controller

The BIST controller orchestrates the transactions necessary to perform self-test. In large or distributed BIST systems, it may also communicate with other test controllers to verify the integrity of the system as a whole. The external interface of the test controller consists of a single input and a single output signal. The input signal is used to initiate the self-test sequence. The test controller then places the CUT in test mode by activating input isolation circuitry that allows the test pattern generator (TPG) and controller to drive the circuit’s inputs directly. Depending on the implementation, the test controller may also be responsible for supplying seed values to the TPG. During the test sequence, the controller interacts with the output response analyzer to ensure that the proper signals are being compared. To accomplish this task, the controller may need to know the number of shift commands necessary for scan-based testing. It may also need to remember the number of patterns that have been processed. The test controller asserts its single output signal to indicate that testing has completed and that the output response analyzer has determined whether the circuit is faulty or fault-free.

3.1.5 Output Response Analyzer (ORA)

The response of the system to the applied test vectors needs to be analyzed, and a decision made about whether the system is faulty or fault-free. This function of comparing the output response of the CUT with its fault-free response is performed by the ORA. The ORA compacts the output response patterns from the CUT into a single pass/fail indication. Response analyzers may be implemented in hardware by making use of a comparator along with a ROM-based lookup table that stores the fault-free response of the CUT. The use of multiple input signature registers (MISRs) is one of the most commonly used techniques for ORA implementations.
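A MISR-based ORA can be sketched in a few lines. The following is a minimal illustration, assuming a 4-bit MISR whose feedback follows P(x) = x^4 + x + 1 and hypothetical response words; the function names and the tap convention (exponent e taps stage e) are choices made for this sketch.

```python
def misr_step(state, response, taps=(4, 1), n=4):
    """One clock of an n-bit MISR: LFSR shift, then XOR in the parallel response."""
    feedback = 0
    for e in taps:                       # exponent e of P(x) taps stage e
        feedback ^= (state >> (e - 1)) & 1
    shifted = ((state << 1) | feedback) & ((1 << n) - 1)
    return shifted ^ response


def signature(responses):
    """Compact a sequence of response words into one signature."""
    state = 0
    for r in responses:
        state = misr_step(state, r)
    return state


# Hypothetical fault-free responses and a faulty run with one flipped bit.
fault_free = [0b1010, 0b0111, 0b0001, 0b1100]
faulty = [0b1010, 0b0101, 0b0001, 0b1100]

golden = signature(fault_free)           # stored on-chip as the reference
passed = signature(faulty) == golden     # single pass/fail indication
print(passed)  # False
```

Only the final signature is compared, so the ROM holds a single reference word instead of every fault-free response. The price is aliasing: some multi-bit error patterns can produce the fault-free signature, although a single flipped bit, as in this example, always changes it.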


CHAPTER 4 CLASSIFICATION OF TEST PATTERN GENERATOR

There are several classes of test patterns. TPGs are sometimes classified according to the class of test patterns that they produce. The different classes of test patterns are briefly described below.

4.1 DETERMINISTIC TEST PATTERNS

These test patterns are developed to detect specific faults and/or structural defects for a given CUT. The deterministic test vectors are stored in a ROM, and the test vector sequence applied to the CUT is controlled by memory access control circuitry. This approach is often referred to as the “stored test patterns” approach.

4.2 ALGORITHMIC TEST PATTERNS

Like deterministic test patterns, algorithmic test patterns are specific to a given CUT and are developed to test for specific fault models. Because of the repetition and/or sequence associated with algorithmic test patterns, they are implemented in hardware using finite state machines (FSMs) rather than being stored in a ROM like deterministic test patterns.

4.3 EXHAUSTIVE TEST PATTERNS

In this approach, every possible input combination for an N-input combinational logic block is generated. In all, the exhaustive test pattern set consists of 2^N test vectors. This number can be huge for large designs, causing the testing time to become significant. An exhaustive test pattern generator can be implemented using an N-bit counter.

4.4 PSEUDO-EXHAUSTIVE TEST PATTERNS

In this approach the large N-input combinational logic block is partitioned into smaller combinational logic sub-circuits. Each of the M-input sub circuits (M

CHAPTER 5 DETERMINISTIC LOGIC BIST (DLBIST)

5.1 DLBIST ARCHITECTURE WITHOUT DML

Figure 5.1 shows the basic DLBIST architecture (G. Kiefer et al. 2000) [8] without DML. An LFSR with a phase shifter is used as the source of random patterns. Some of the patterns are useless, i.e. they do not detect any (stuck-at) faults that are not already detected by other patterns. In order to achieve the desired fault coverage, some of the bits produced by the LFSR are inverted (flipped) under the control of the bit-flipping logic (BFL). The BFL is a combinational block that takes the LFSR state, the pattern number (from the pattern counter) and the bit number (from the bit counter) and selects the LFSR outputs to be inverted by driving a logic ‘1’ at the inputs of the corresponding XOR gates. Note that the architecture can be used for combinational and scan cores employing one or several chains. The responses of the CUT (F. Brglez et al. 1989) [3] are fed into a MISR.


Figure 5.1 DLBIST without DML


5.1.1 Linear Feedback Shift Registers

The linear feedback shift register (LFSR) (E. H. Volkerink et al. 2005) [18] is one of the most frequently used TPG implementations in BIST applications. This can be attributed to the fact that LFSR designs are more area-efficient than counters, requiring comparatively less combinational logic per flip-flop. An LFSR can be implemented using internal or external feedback. The former is also referred to as a TYPE1 LFSR, while the latter is referred to as a TYPE2 LFSR. The two implementations are shown in Figures 5.2 and 5.3.

Figure 5.2 Internal Feedback LFSR

Figure 5.3 External Feedback LFSR

The external feedback LFSR best illustrates the origin of the circuit name: a shift register with feedback paths that are linearly combined via XOR gates. Both implementations require the same amount of logic in terms of the number of flip-flops and XOR gates. In the internal feedback LFSR implementation, there is just one XOR gate between any two flip-flops, regardless of the LFSR size. Hence, an internal feedback implementation of a given LFSR specification will have a higher operating frequency than its external feedback implementation. For high-performance designs, the choice would be an internal feedback implementation, whereas an external feedback implementation would be the choice where a more symmetric layout is desired (since the XOR gates lie outside the shift register circuitry).

Looking at the state diagram, one can deduce that the sequence of patterns generated is a function of the initial state of the LFSR, i.e. the value with which it started generating the vector sequence. The value that the LFSR is initialized with before it begins generating a vector sequence is referred to as the seed. The seed can be any value other than the all-zeros vector. The all-zeros state is a forbidden state for an LFSR, as it causes the LFSR to loop in that state indefinitely. This can be seen from the state diagram of the example above.

If we consider an n-bit LFSR, the maximum number of unique test vectors that it can generate before any repetition occurs is 2^n - 1 (since the all-zeros state is forbidden). An n-bit LFSR implementation that generates a sequence of 2^n - 1 unique patterns is referred to as a maximal length sequence or m-sequence LFSR. The LFSR illustrated in the considered example is not an m-sequence LFSR. It generates a maximum of 6 unique patterns before repetition occurs.

The positioning of the XOR gates with respect to the flip-flops in the shift register is defined by what is called the characteristic polynomial of the LFSR, commonly denoted P(x). Each non-zero coefficient in it represents an XOR gate in the feedback network. The x^n and x^0 coefficients in the characteristic polynomial are always non-zero but do not represent the inclusion of an XOR gate in the design. Hence, the characteristic polynomial of the example illustrated in Figure 5.4 is P(x) = x^4 + x^3 + x + 1. The degree of the characteristic polynomial gives the number of flip-flops in the LFSR, whereas the number of non-zero coefficients (excluding x^n and x^0) gives the number of XOR gates used in the LFSR implementation.
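The statements about the seed, the forbidden all-zeros state and the sequence length can be checked with a short simulation. This is a minimal sketch of an external feedback LFSR; the bit ordering and the convention that the term x^e taps stage e are assumptions of the sketch, and the exponent list must include the degree n for the state sequence to be cyclic.

```python
def lfsr_next(state, exponents, n):
    """One shift of an n-bit external-feedback LFSR.

    `exponents` lists the non-zero terms of P(x) except x^0;
    the term x^e taps stage e, stored in bit e - 1 of `state`.
    """
    feedback = 0
    for e in exponents:
        feedback ^= (state >> (e - 1)) & 1
    return ((state << 1) | feedback) & ((1 << n) - 1)


def cycle_length(seed, exponents, n):
    """Number of clocks until the LFSR returns to its seed."""
    state, steps = lfsr_next(seed, exponents, n), 1
    while state != seed:
        state = lfsr_next(state, exponents, n)
        steps += 1
    return steps


# P(x) = x^4 + x^3 + x + 1: not an m-sequence LFSR.
longest = max(cycle_length(s, (4, 3, 1), 4) for s in range(1, 16))
print(longest)                          # 6

# P(x) = x^4 + x + 1 visits all 2^4 - 1 non-zero states.
print(cycle_length(0b0001, (4, 1), 4))  # 15

# The all-zeros seed never leaves its state.
print(cycle_length(0b0000, (4, 1), 4))  # 1
```

For P(x) = x^4 + x^3 + x + 1 the non-zero states split into several short cycles, so the seed determines which patterns are generated; no seed yields more than 6 of them.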

Figure 5.4 Test Vector Sequences

5.1.2 Primitive Polynomials


Characteristic polynomials that result in a maximal length sequence are called primitive polynomials, while those that do not are referred to as non-primitive polynomials. A primitive polynomial will produce a maximal length sequence irrespective of whether the LFSR is implemented using internal or external feedback. However, it is important to note that the order in which the vectors are generated differs between the two implementations. The sequence of test patterns generated using a primitive polynomial is pseudo-random. The internal and external feedback LFSR implementations for the primitive polynomial P(x) = x^4 + x + 1 are shown in Figure 5.5 and Figure 5.6, respectively.

Figure 5.5 Internal Feedback, P(x) = x^4 + x + 1

Figure 5.6 External Feedback, P(x) = x^4 + x + 1
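The claim that both implementations produce a maximal length sequence, but in a different order, can be illustrated as follows. This is a minimal sketch for a 4-bit LFSR; the shift directions and the XOR mask are assumptions of the sketch, and depending on the bit-ordering convention the internal version may correspond to P(x) = x^4 + x + 1 or to its reciprocal x^4 + x^3 + 1, which is also primitive.

```python
def external_step(state):
    """External feedback: one XOR network feeds the first stage."""
    feedback = ((state >> 3) ^ state) & 1      # stages 4 and 1
    return ((state << 1) | feedback) & 0xF


def internal_step(state):
    """Internal feedback: the output bit is XORed in between the stages."""
    out = state & 1
    state >>= 1
    return state ^ 0b1001 if out else state


def sequence(step, seed=0b0001):
    """States visited from `seed` until the LFSR returns to it."""
    states, s = [], seed
    while True:
        states.append(s)
        s = step(s)
        if s == seed:
            return states


ext = sequence(external_step)
intl = sequence(internal_step)

print(len(ext), len(intl))          # 15 15
print(sorted(ext) == sorted(intl))  # True
print(ext == intl)                  # False
```

Both versions visit all 15 non-zero states before repeating, so either can serve as a maximal length pattern source; only the order of the vectors differs.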


Figure 5.7 illustrates the case where the sequence generated by the feedback shift register has length 2^3 - 1. If one of these circuits generates a cyclic state sequence of length k, then the output sequence also repeats itself every k clock cycles. In the analysis of such circuits, all operations are done modulo 2. The truth table for modulo-2 addition and subtraction is shown below.

Figure 5.7 Feedback shift register with three states

Table 5.1 Truth table for modulo-2 gates

5.1.3 Bit-Flipping Logic

The bit-flipping DLBIST scheme (H.-J. Wunderlich et al. 1996) [19] provides both pseudo-random and deterministic test patterns. Some of the pseudo-random patterns generated by an LFSR are altered into deterministic test patterns. Most of the pseudo-random test patterns do not contribute to the fault coverage, since they only detect faults that were already detected by other pseudo-random patterns. Such useless pseudo-random test patterns may therefore be skipped or modified in any arbitrary way. The key idea is to modify some useless pseudo-random patterns into useful deterministic test patterns to improve the fault coverage. The deterministic test patterns are determined by an ATPG tool, and they target those faults that are not detected by the pseudo-random test patterns. In such a deterministic test pattern, only a few bits are actually specified, while most of the bits are don’t cares and hence can arbitrarily be set to 0 or 1.

The modification of the pseudo-random patterns is realized by inverting (flipping) some of the LFSR outputs, such that the deterministic patterns are obtained. The flipping is performed by combinational logic, the bit-flipping logic (BFL), which implements a so-called bit-flipping function. The BFL can be kept quite small by exploiting the large number of useless pseudo-random test patterns that may be modified, and by carefully selecting the pseudo-random test patterns onto which deterministic test patterns are mapped.

As shown in Figure 5.8, the BFL inputs are connected to the LFSR, the pattern counter (PC), and the shift counter (SC), while the BFL outputs are connected to the XOR gates at the scan inputs. The BFL determines whether a bit has to be flipped based on the states of the LFSR, the pattern counter and the shift counter. The pattern counter is part of the test control unit and counts the number of test patterns applied during the self-test. The shift counter is also part of the test control unit and counts the number of scan shift cycles for shifting data into and out of the scan chains. If a phase shifter (PS) is attached to the LFSR, its output is used to control the operation of the BFL as well. A correction logic (CRL) prevents useful patterns of the pseudo-random sequence from being destroyed by the BFL.

The BFL realizes the mapping of deterministic test patterns to pseudo-random patterns. Every specified bit (i.e. care bit) in a deterministic pattern either matches the corresponding bit in the pseudo-random pattern, in which case no bit-flipping is required, or does not match, in which case bit-flipping is required. For all unspecified bits (i.e. don’t-care bits) in the deterministic pattern, the corresponding bits in the pseudo-random pattern may be arbitrarily flipped or not. The BFL should ensure that (1) all conflicting bits are flipped, (2) all matching bits are not flipped, while (3) the don’t-care bits may be arbitrarily flipped or not.

At the end of the test, a signature is stored in the multiple-input signature register (MISR), containing the information about the correctness of the tested core. In order to obtain an uncorrupted signature, the logic of the circuit should be BIST-ready, which is equivalent to ensuring that no Xs (unknowns) are generated. If that is not possible, a don’t-care masking logic should be inserted before the MISR.
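The three requirements on the BFL can be made concrete with a minimal sketch that derives the flip requirements for embedding one test cube into one pseudo-random pattern. The string representation ('X' for an unspecified cube bit, '-' for a free flip decision), the function names and the example values are assumptions made for illustration.

```python
def flip_requirements(pseudo_random, cube):
    """Per-bit BFL requirement: '1' = must flip (conflicting care bit),
    '0' = must not flip (matching care bit), '-' = free (don't-care bit)."""
    out = []
    for p, c in zip(pseudo_random, cube):
        if c == "X":
            out.append("-")
        elif c == p:
            out.append("0")
        else:
            out.append("1")
    return "".join(out)


def embed(pseudo_random, cube):
    """Pattern applied to the CUT when only the conflicting bits are flipped."""
    return "".join(p if c == "X" else c for p, c in zip(pseudo_random, cube))


# Hypothetical 8-bit pseudo-random pattern and deterministic test cube.
pattern = "10110010"
cube = "1XX0X0X1"

req = flip_requirements(pattern, cube)
print(req)                    # 0--1-0-1
print(embed(pattern, cube))   # 10100011
print(req.count("1"))         # 2
```

Only two of the four care bits conflict here, so only two flips are needed. Choosing, among the useless pseudo-random patterns, a candidate with few conflicts is what keeps the BFL small.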

Figure 5.8 Bit Flipping Logic

5.1.4 Mapping Test Patterns to Random Patterns

In the bit-flipping DLBIST approach, the modification of the pseudo-random patterns is realized by inverting (flipping) some of the LFSR outputs, such that deterministic test stimuli are obtained. In the bit-fixing approach, the modification of the pseudo-random patterns is realized by fixing some of the LFSR outputs to either ‘1’ or ‘0’, such that deterministic test patterns are produced. In (Gherman et al. 2004) [7], it has been shown that the expected number of bits to be flipped in order to embed a precomputed test cube is significantly smaller than the number of specified bits. From now on, only pattern modification by means of bit-flipping will be considered. Nevertheless, the considerations presented here can be applied to both the bit-flipping and the bit-fixing approaches, with a few modifications.

The bit-flipping is realized by combinational logic, the so-called bit-flipping logic (BFL). The BFL realizes the mapping of a set of deterministic test cubes to a (larger) set of pseudo-random patterns. Every specified bit (i.e. care bit) in a deterministic test cube either matches the corresponding bit in the associated pseudo-random pattern, in which case bit-flipping should not be performed, or does not match, in which case bit-flipping is required. For all unspecified bits (i.e. don’t-care bits) in a deterministic test cube, the corresponding bits in the associated pseudo-random pattern may be flipped or not. The BFL must ensure that (1) all conflicting bits are flipped, (2) all matching bits are not flipped, while (3) the don’t-care bits may be flipped or not. The BFL can be kept quite small by carefully selecting the candidates for each deterministic test cube in the large set of useless pseudo-random patterns.

Without any loss of generality, consider a CUT with a single scan chain. Let S denote the set of all possible combinations of the states of the LFSR, the PC, the SC and the PS output (if any). The ON-set is the subset of S that corresponds to the clock cycles in which the LFSR (or PS) output must be flipped. Similarly, the OFF-set is the subset of S that corresponds to the clock cycles in which the LFSR (or PS) output must not be flipped. Obviously, the ON-set and OFF-set are disjoint (ON-set ∩ OFF-set = ∅). The don’t-care set (DC-set) contains those states of S that correspond to the clock cycles in which the LFSR (or PS) output may be flipped or not, i.e. the states that are neither in the ON-set nor in the OFF-set (DC-set = S \ (ON-set ∪ OFF-set)). The DC-set may be exploited to minimize the logic implementation of the BFL.


The ON-set, OFF-set and DC-set specify an incompletely specified function BFL: {0,1}^n → {0,1,X}, where the symbol ‘X’ indicates a don’t care and n is the total number of state bits of the LFSR, the PC and the SC plus the output bits of the PS (if any). For instance, consider a simple DLBIST scheme with a 2-bit LFSR, a 2-bit PC, a 2-bit SC and no PS (n = 6), and let the symbol ‘_’ denote the concatenation of the LFSR, PC and SC states. Then BFL(01_10_01) = 1 indicates that the pseudo-random bit must be flipped when the LFSR state is 01, the PC state is 10 and the SC state is 01. The state 01_10_01 is therefore part of the ON-set. BFL(01_10_11) = 0 indicates that the pseudo-random bit must not be flipped when the LFSR state is 01, the PC state is 10 and the SC state is 11. The state 01_10_11 is therefore part of the OFF-set. BFL(10_01_01) = ‘X’ indicates that the pseudo-random bit may be flipped or not when the LFSR state is 10, the PC state is 01 and the SC state is 01. The state 10_01_01 is therefore part of the DC-set.
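The example above can be written down as a small Python sketch. The tuple encoding of the states, the helper names and the use of an XOR to apply the flip are assumptions made for illustration; only the three example states and their BFL values come from the text.

```python
from itertools import product

# State = (LFSR, PC, SC), each a 2-bit string (assumed encoding, n = 6).
ON_SET = {("01", "10", "01")}    # pseudo-random bit must be flipped
OFF_SET = {("01", "10", "11")}   # pseudo-random bit must not be flipped

def bfl_spec(lfsr, pc, sc):
    """Incompletely specified BFL: returns 1, 0 or 'X' (don't care)."""
    state = (lfsr, pc, sc)
    if state in ON_SET:
        return 1
    if state in OFF_SET:
        return 0
    return "X"

def bfl_impl(lfsr, pc, sc):
    """One admissible implementation: every don't care is resolved to 0."""
    return 1 if bfl_spec(lfsr, pc, sc) == 1 else 0

def apply_bfl(pseudo_random_bit, lfsr, pc, sc):
    """Flip the pseudo-random bit with an XOR (assumed realization)."""
    return pseudo_random_bit ^ bfl_impl(lfsr, pc, sc)

two_bit = ["".join(bits) for bits in product("01", repeat=2)]
ALL_STATES = set(product(two_bit, repeat=3))   # S, 2**6 states
DC_SET = ALL_STATES - ON_SET - OFF_SET         # S - (ON-set U OFF-set)
```

A logic synthesis tool would resolve the don't cares so that the resulting logic is small; resolving all of them to 0, as done here, is only the simplest valid choice.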

Figure 5.9 Pattern Mapping Function


5.1.6 Multiple Input Signature Register

Besides test pattern generation, BIST architectures must also be able to compress and evaluate test responses. As the number of test patterns applied to the CUT is usually very large, it is infeasible to store all the expected values on-chip and compare them with the response values. It is much cheaper in terms of storage requirements and compacting circuitry to compress the test responses into short sequences, called signatures, which are delivered for analysis at the end of the test session (J. Rajski et al. 2003) [14]. A signature is obtained as the final state of a finite state machine whose inputs are fed with test responses. This type of compression, which addresses the length of the test response sequence, is also known as time compression. Examples of time compressors are accumulator-, LFSR- and counter-based compactors.

Boolean testing is performed by applying test patterns to an integrated circuit chip from the tester and observing the corresponding responses. A logic simulator is used to simulate the fault-free design to obtain the responses expected from a fault-free chip for the applied test patterns. The tested chip passes the test if and only if all observed test response bits match the simulated fault-free test response bits. Unfortunately, for complex designs, logic simulators cannot accurately predict the logic values of all test response bits. This is due to the presence of uninitialized and uncontrollable bistables, bus contention, floating buses, multiple clock domains, or simply because the simulation model is inaccurate. The test response bits whose logic values are not accurately predicted by the simulators are called unknown test response bits or Xs.

A major problem arises when test responses are compacted using on-chip hardware. Classical signature analyzers such as Multiple Input Signature Registers (MISRs) (I. Pomeranz et al. 2002) [12] are probably the most popular response compactors. The major problem with classical signature analysis is that the signature can be corrupted in the presence of Xs. In the example of Fig. 5.10, the outputs of four scan chains are connected to the inputs of the MISR. The initial MISR state is 0000. The states of the MISR during the first four clock cycles are shown in Fig. 5.10.
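The corruption of a signature by a single X can be illustrated with a three-valued simulation of a 4-bit MISR. This is only a sketch: the feedback polynomial x^4 + x + 1 and the response values are assumptions chosen for illustration and are not taken from Fig. 5.10.

```python
X = "X"  # unknown logic value

def xor3(*values):
    """Three-valued XOR: any unknown operand makes the result unknown."""
    if X in values:
        return X
    result = 0
    for v in values:
        result ^= v
    return result

def misr_step(state, inputs):
    """One clock of a 4-bit MISR with assumed feedback x^4 + x + 1."""
    s0, s1, s2, s3 = state
    d0, d1, d2, d3 = inputs
    return [xor3(s3, d0), xor3(s0, s3, d1), xor3(s1, d2), xor3(s2, d3)]

def run_misr(responses, state=(0, 0, 0, 0)):
    """Shift all response vectors into the MISR and return the signature."""
    state = list(state)
    for inputs in responses:
        state = misr_step(state, inputs)
    return state

# Made-up outputs of four scan chains over four clock cycles.
clean = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
with_x = [[1, 0, 1, 1], [0, X, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

signature_clean = run_misr(clean)
signature_with_x = run_misr(with_x)
```

In this pessimistic simulation the single X of the second cycle is still present in the final signature, so the signature can no longer be compared with a reference value.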

Figure 5.10 Multiple Input Signature Register

The other type of test response compaction, called space compression, is used to transform n test outputs into m < n outputs.

[...] of such successive divisions. Instead of comparing a large set of test outputs, only the signature, defined as the final state of the LFSR at the end of testing, needs to be compared. An ideal compaction algorithm has the following features: (a) it is easy to implement as part of the on-chip DFT circuitry; (b) it is not a limiting factor with respect to test time; (c) it provides a logarithmic compression of the test data; and (d) it does not lose information concerning the tested faults. However, no known compaction algorithm satisfies all of these criteria. In particular, it is difficult to ensure that a faulty circuit never produces the same compressed output as the fault-free circuit. When it does, the effect is referred to as error masking or aliasing, and it is measured in terms of the likelihood of its occurrence. Aliasing occurs because many compaction operations have an inherent filtering effect. Methods to design test response compactors with minimum aliasing probability are available in (M. Reddy et al. 1997) [15], among others. They use primitive feedback polynomials and assume that errors occur randomly. The probability of aliasing for MISR-based compression has been theoretically shown to be 2^-k, where k is the signature length. Note that this result is independent of the size and complexity of the CUT, so a long signature provides low aliasing. The use of accumulator-based structures for test response compaction leads to aliasing probabilities comparable to the MISR-based methodology. In the counter-based time compression approach, the number of ones or the number of 0-1 and 1-0 transitions in the test response sequences is counted. Depending on the situation, ones counting, transition counting or MISR-based time compression is the better solution (Mitra et al. 2002) [10].
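The counter-based compactors and the 2^-k aliasing figure mentioned above can be made concrete with a short Python sketch. The response sequences are made up for illustration, and the function names are hypothetical.

```python
def ones_count(bits):
    """Ones counting: the signature is the number of 1s in the response."""
    return sum(bits)

def transition_count(bits):
    """Transition counting: the signature is the number of 0-1 and 1-0 changes."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def misr_aliasing_probability(k):
    """Aliasing probability 2^-k of a k-bit MISR signature, as quoted in the text."""
    return 2.0 ** -k

# Made-up fault-free and faulty response sequences of one output.
good = [0, 1, 1, 0, 1, 0, 0, 1]
faulty = [0, 1, 1, 1, 0, 0, 0, 1]   # two adjacent bits are exchanged

ones_alias = ones_count(good) == ones_count(faulty)
transition_alias = transition_count(good) == transition_count(faulty)
```

For these two sequences ones counting aliases, because both contain four 1s, while transition counting distinguishes them. Other error patterns can reverse this outcome, which is why the better choice depends on the situation.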
The basic idea behind response analysis is to divide the data polynomial (the input to the LFSR, which is essentially the compressed response of the CUT) by the characteristic polynomial of the LFSR. The remainder of this division is the signature used to determine the faulty or fault-free status of the CUT at the end of the BIST sequence. Since the last bit of the CUT output response to enter the signature analysis register (SAR) is the coefficient of x^0, the data polynomial of the output response can be determined by counting backward from the last bit to the first. Thus the data polynomial for this example is given by K(x), as shown in Figure 5.11 (a). The contents of the SAR for each clock cycle of the output response from the CUT are shown in Figure 5.11 (b), along with the input data K(x) shifting into the SAR on the left-hand side and the data shifting out of the end of the SAR, Q(x), on the right-hand side. The signature contained in the SAR at the end of the BIST sequence is shown at the bottom of Figure 5.11 (b) and is denoted R(x). The polynomial division process is illustrated in Figure 5.11 (c), where the division of the CUT output data polynomial K(x) by the LFSR characteristic polynomial P(x) results in a quotient Q(x), which is shifted out of the right end of the SAR, and a remainder R(x), which is contained in the SAR at the end of the BIST sequence.

Figure 5.11 Example of fault detection and signature aliasing
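The polynomial division behind signature analysis can be sketched in Python with polynomials over GF(2) stored as integers, where bit i holds the coefficient of x^i. The values chosen for K(x) and P(x) are assumptions for illustration and are not the ones used in Figure 5.11.

```python
def gf2_divmod(dividend, divisor):
    """Divide two GF(2) polynomials; returns (quotient Q, remainder R)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= divisor_degree:
        shift = dividend.bit_length() - 1 - divisor_degree
        quotient ^= 1 << shift          # next quotient term x^shift
        dividend ^= divisor << shift    # subtract (XOR) the shifted divisor
    return quotient, dividend

def gf2_mul(a, b):
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

P = 0b10011        # assumed characteristic polynomial x^4 + x + 1
K = 0b110101101    # made-up fault-free data polynomial K(x)
Q, R = gf2_divmod(K, P)            # R plays the role of the signature

# An error polynomial that is a multiple of P(x) leaves the signature unchanged.
E_aliasing = gf2_mul(P, 0b101)
R_aliasing = gf2_divmod(K ^ E_aliasing, P)[1]

# A single-bit error in the last response bit changes the signature.
R_detected = gf2_divmod(K ^ 0b1, P)[1]
```

The sketch shows why aliasing occurs: the faulty response K(x) + E(x) yields the fault-free signature exactly when the error polynomial E(x) is divisible by P(x).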


CHAPTER 6 DON’T CARE MASKING 6.1 DLBIST ARCHITECTURE WITH DML

Figure 6.1 Architecture of DLBIST (With DML Logic)

In most embedded test approaches, the test responses are compressed by a multiple input signature register (MISR), which delivers a signature containing the information about the correctness of the CUT. The test responses may contain unknown bits (Xs), which can appear due to the existence of multiple clock domains, floating buses or uninitialized memory elements. In order to obtain an uncorrupted signature at the end of the test, these Xs have to be masked to either logic ‘0’ or logic ‘1’ before they propagate into the MISR. This may be performed by combinational logic called the don’t care masking logic (DML). The DML can be kept quite small by carefully selecting those bits of the test responses that carry the information about the CUT correctness and therefore have to remain unmasked.


The inputs of the BFL and the DML are the state bits of the pattern counter, the shift counter and the test pattern generator (TPG), which can be an LFSR, possibly followed by a phase shifter. Both functions are incompletely specified. Consequently, they can be described by an ON-set and an OFF-set, containing the input assignments for which these functions must take the values ‘1’ and ‘0’, respectively. The remaining input assignments form the DC-set.

6.2 DON’T CARE MASKING LOGIC

Built-in self-test (BIST) solves many of today’s testing problems, including pin throughput issues, complexity of test programs and test application at speed, and it enables in-field testing (Flmto et al. 2005) [6]. One class of circuits that is difficult to handle using logic BIST (LBIST) consists of those that produce unknown values (X values) at the outputs. Sources of unknown values include tri-stated or floating buses, uninitialized flip-flops or latches, signals that cross clock domains in circuits with multiple clock domains, and X values coming from analog or memory blocks that are embedded in the random logic circuit. If an unknown value is fed into a test response evaluator (TRE), the signature can be affected. For the most popular TRE, the multiple input signature register (MISR), a single X value invalidates the whole signature.

This problem has been attacked from two directions. First, X-tolerant compactors, i.e., TREs that are less vulnerable to X values, have been proposed, including X-Compact by Intel and the Convolutional Compactor by Mentor Graphics. The second solution puts no restriction on the type of TRE used. The unknown values that appear at the outputs of the circuit are masked out by additional logic, such that only known values are fed into the TRE.

X-tolerant compactors are space compactors. They are typically designed such that they can tolerate a certain number of Xs in addition to a number of faulty bits. While their area overhead is larger than for space compactors without X tolerance, the exact overhead is a function of the assumed maximal number of X values that can be present at the same time.


In contrast, masking is test set specific. It can be used with space or time compaction. Its overhead depends on the implementation, e.g., whether any information is stored in the tester. It can be employed in a scheme that protects intellectual property (IP). The technique proposed here is of the second type, although it tackles problems which also exist for X-tolerant compactors, as will be explained below.

The don’t care masking logic (DML) is introduced between the circuit under test (CUT) and the TRE. It consists of OR gates and synthesized control logic. The first input of each OR gate is connected to an output of the CUT, while the second input originates from the control logic. When the control logic produces a logic-1, the output of the OR gate is forced to logic-1, and hence the response of the CUT is masked. The control logic is a combinational function that uses as inputs the pattern counter and bit counter, which are generally part of the LBIST test control logic for controlling the number of applied patterns and the scan shift/capture cycles. In principle, it is possible to mask out only the unknown values in the response and to leave all the other values unchanged. However, masking exactly the unknown bits would result in a high silicon area cost of the DML. Furthermore, this is not necessary, as the vast majority of faults are detected by many different patterns.

The DLBIST architecture with DML is shown in Figure 6.1. Similarly to the BFL, the DML is a combinational logic block that has the LFSR state, the pattern number and the bit number as inputs. The DML provides control signals to the OR gates between the CUT and the MISR. A bit is masked iff the DML generates a logic-1 at the corresponding OR gate. Note that the DML is not on the critical path of the CUT. The impact on the circuit delay is due to the added OR gates only, as long as the delay of the DML does not exceed the delay of the CUT itself. Let the CUT have p outputs, and let the test set consist of q patterns.
Let the responses of the CUT be (r11, r12, …, r1p), (r21, r22, …, r2p), …, (rq1, rq2, …, rqp), where rij ∈ {0,1,X} is the value that appears at the jth output of the CUT as a response to the ith test pattern in the absence of any fault. The term “output” stands for primary outputs of combinational and non-scan sequential circuits, scan-out ports of full-scan circuits, and primary outputs and scan-outs of partial-scan circuits. We are looking for a function DML: N × N → B such that DML(i,j) = 1 if rij = X (i.e., all unknown values are masked). Furthermore, some known bits that are important for preserving the fault coverage (called relevant bits) must not be masked (DML(i,j) = 0 must hold for these bits). In general, there are several possibilities to select the set of relevant bits such that the desired fault coverage can be achieved. The size of the X-masking logic depends on the number and exact positions of the relevant bits. For values of (i,j) for which rij ≠ X and which are not among the relevant bits, DML may assume either 0 or 1. This degree of freedom is used to minimize the DML, as introduced next.

The problem of synthesizing the DML can be formulated as an instance of logic synthesis with don’t care (DC) sets. The value at the jth output of the CUT when the ith test pattern is applied is uniquely determined by the triple (LFSR state, pattern number, bit number), i.e., a state of (LFSR, pattern counter, bit counter). The logic synthesis instance is composed as follows: the ON set consists of the (LFSR, pattern counter, bit counter) state triples that correspond to bits rij with rij = X. The OFF set includes all those triples that correspond to relevant bits. All other triples constitute the DC set. Once the ON and OFF sets are known, logic synthesis can be run. In general, compact ON and OFF sets lead to smaller logic, because the logic synthesis tool has more degrees of freedom. While the ON set is given by the X values in the responses, there are several alternative OFF sets, depending on which bits are selected as relevant.
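The construction of the ON, OFF and DC sets and the effect of the OR gates can be sketched in Python. As a simplification, a bit is identified here by its index pair (i, j) instead of the (LFSR state, pattern number, bit number) triple; the response matrix, the choice of relevant bits and all function names are hypothetical.

```python
X = "X"  # unknown response value

# responses[i][j]: fault-free value at output j for pattern i (made-up data).
responses = [
    [0, 1, X, 1],
    [1, X, 0, 0],
    [0, 0, 1, X],
]
relevant_bits = {(0, 1), (1, 2)}   # hypothetical selection of relevant bits

def build_sets(responses, relevant_bits):
    """Split all bits into ON (must mask), OFF (must not mask) and DC sets."""
    on_set, off_set, dc_set = set(), set(), set()
    for i, row in enumerate(responses):
        for j, value in enumerate(row):
            if value == X:
                if (i, j) in relevant_bits:
                    raise ValueError("an unknown bit cannot be relevant")
                on_set.add((i, j))
            elif (i, j) in relevant_bits:
                off_set.add((i, j))
            else:
                dc_set.add((i, j))
    return on_set, off_set, dc_set

def mask_response(row, i, dml):
    """Model of the OR gates: a control value of 1 forces the output to 1."""
    return [1 if dml(i, j) else value for j, value in enumerate(row)]

on_set, off_set, dc_set = build_sets(responses, relevant_bits)

def simple_dml(i, j):
    """One admissible DML: mask every bit that is not in the OFF set."""
    return (i, j) not in off_set

masked = [mask_response(row, i, simple_dml) for i, row in enumerate(responses)]
```

Masking everything outside the OFF set is only one way to resolve the don't cares. A synthesis tool is free to pick, for each DC bit, whichever value gives the smallest logic.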
Thus, both the number of relevant bits and the number of patterns they belong to should be minimized.

6.3 SELECTION OF RELEVANT BITS

For the sake of simplicity, we call a value at an output of the circuit when a test pattern is applied a bit (so for p outputs and q patterns there are pq bits). A subset of these pq bits has to be selected as relevant bits that are excluded from masking. Remember that a triple (LFSR state, pattern number, bit number) corresponds to a bit. The triples corresponding to relevant bits are included in the OFF set of the logic synthesis problem formulated above. If more bits are selected as relevant, the number of fault detections grows, but so does the silicon area cost. As an additional constraint, there is a parameter n, defined as the minimal number of detections that must be preserved when known bits are masked out. Obviously, a higher value of n requires more bits to be selected as relevant.

The selection algorithm uses the fault isolation table to select relevant bits. The fault isolation table contains, for each stuck-at fault f, all bits at which it is detected when no DML is introduced (the number of such bits is denoted Nf). A bit is said to detect a fault if the fault’s effect is observed at the output of the circuit for the test pattern that corresponds to the bit. For each fault f, the number of detections Df must be guaranteed to be at least min{Nf, n}. Note that once bits detecting a fault have been selected as relevant, the actual number of detections will typically be higher, because the DML could (but is not guaranteed to) leave unmasked other bits that detect this fault but were not selected as relevant.

The algorithm constructs the set RB of relevant bits such that each fault f is detected by at least min{Nf, n} bits from RB. This is done iteratively. In each iteration (Lines 2–10), a fault f is picked and several bits are selected as relevant, such that the fault is detected by a sufficient number of bits (Df denotes the number of detections of the fault so far). The selected bits might also detect other faults. This is checked in Line 6. All faults g whose number of detections Dg is greater than or equal to the required number min{Ng, n} are excluded from the fault isolation table (Lines 7–8). Note that the fault f from Line 3 is always among these faults. The algorithm stops when the fault isolation table is empty (Line 2).

Procedure select_rel_bits
Input: fault isolation table FIT; parameter n
Output: compact set RB of relevant bits that fulfills the coverage requirements
(1) RB := ∅;
(2) while (FIT not empty) begin
(3)    f := fault from FIT with lowest number of detections;
(4)    RB := RB ∪ Select_bits_for_fault(f, min{Nf, n} − Df); // select bits to ensure sufficient detections
(5)    for each fault g from FIT begin
(6)       determine Dg with relevant bits selected so far;
(7)       if (Dg ≥ min{Ng, n})
(8)          then exclude g from FIT;
(9)    end for
(10) end while
(11) return RB;
end select_rel_bits;

Figure 6.2 Algorithm for selecting relevant bits.
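A runnable Python sketch of the procedure in the figure is given below. Several details are assumptions, because the pseudocode leaves them open: the fault isolation table is modeled as a dictionary from faults to sets of detecting bits, "lowest number of detections" is interpreted as the smallest Nf, and Select_bits_for_fault simply takes the first candidates in sorted order.

```python
def select_bits_for_fault(candidates, count):
    """Pick `count` bits from the candidates (sorted order is an arbitrary choice)."""
    return set(sorted(candidates)[:count])

def select_rel_bits(fit, n):
    """Return a set RB such that every fault f is detected by at least
    min(N_f, n) bits of RB, where N_f = len(fit[f])."""
    remaining = {fault: frozenset(bits) for fault, bits in fit.items()}
    rb = set()
    while remaining:                                        # Line 2
        # Line 3: fault with the fewest detecting bits (ties broken by name).
        f = min(remaining, key=lambda g: (len(remaining[g]), str(g)))
        detections = len(remaining[f] & rb)                 # D_f so far
        needed = min(len(remaining[f]), n) - detections
        rb |= select_bits_for_fault(remaining[f] - rb, needed)   # Line 4
        # Lines 5-9: drop every fault that is now detected often enough.
        remaining = {
            g: bits for g, bits in remaining.items()
            if len(bits & rb) < min(len(bits), n)
        }
    return rb                                               # Line 11

# Hypothetical fault isolation table: fault name -> set of (pattern, output) bits.
fit = {
    "f1": {(0, 0)},
    "f2": {(0, 0), (1, 2), (2, 1)},
    "f3": {(1, 2), (2, 1), (2, 3), (0, 3)},
}
rb = select_rel_bits(fit, n=2)
```

Processing the hardest-to-detect fault first means its few detecting bits are fixed early, and faults with many detecting bits can often reuse bits that are already in RB.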


CHAPTER 7 RESISTIVE BRIDGING FAULT MODEL 7.1 RESISTIVE BRIDGING FAULT

The main difficulty when dealing with resistive faults is that, unlike in the non-resistive case, there is an unknown value to be taken into account: the resistance (M. Renovell et al. 1994) [14]. It cannot be known in advance which particle will cause the short defect corresponding to the bridge. Parameters like its shape, size, conductivity, exact location on the die, evaporation behavior and electromigration can influence the resistance of the short defect. A short defect may be detected by a test pattern for one resistance value, while a short between the same nodes may not be detected by the same pattern for another resistance. This fundamentally changes the meaning of standard testing concepts like redundancy and fault coverage.

A logical fault representing an electrical connection between a pair of signal lines (nets) is referred to as a bridging fault. The non-resistive bridging fault model considers a short between the two nets. The logic value of the shorted nets may be modeled as 1-dominated (OR bridge), 0-dominated (AND bridge) or intermediate, depending on the implementation technology. More general and realistic is the resistive bridging fault model, in which the connection between the two nets is characterized by an arbitrary electrical resistance. The resistive bridging fault model will be used in the following chapters to account for non-target defects. The resistance is arbitrary because the cause of the bridging fault cannot be known in advance. Topological and physical parameters like shape, size, electrical conductivity, exact location on the die, evaporation behavior, electromigration and environmental temperature can influence the resistance of the short defect (P. Engelke et al. 2003) [5]. A test pattern may detect a bridging defect for one resistance value and not for another. This fundamentally changes the meaning of standard testing concepts like testability, redundancy and fault coverage (C. Lee et al. 2000) [9].

In order to illustrate this, consider the example sketched in Figure 7.1. The nets a and b in this example are bridged by a short defect with the resistance Rsh. The voltage Va on a and the voltage Vb on b both depend not only on the input pattern, but also on the bridge resistance Rsh.

Figure 7.1 Example circuit

Consider the input assignment 0011, where the logic values ‘1’ and ‘0’ are encoded by a high and a low voltage, respectively. A possible dependence of the voltages on Rsh is depicted by the solid curves in Figure 7.2. For Rsh = 0 Ω, both lines have the same intermediate voltage. With increasing Rsh, Va and Vb diverge, with Va approaching VDD and Vb approaching 0. The transistors succeeding the bridge interpret these voltages as logic-0 or logic-1, depending on their input threshold voltages Th. In Figure 7.2, the threshold voltages of transistors C, D and E are shown as horizontal lines labeled ThC, ThD and ThE, respectively. Hence, the resistive bridging fault may be observed at the drain of transistor C or E, and possibly at the output of the gates containing these transistors, iff Rsh ∈ [0, RC] or Rsh ∈ [0, RE], respectively. For transistor D, the threshold voltage ThD is below the curve, implying that transistor D will recognize the voltage on a as a logic-1 for any Rsh. Consequently, the fault effect is visible at one of the outputs iff Rsh ∈ [0, RC] ∪ ∅ ∪ [0, RE] = [0, RE].

Next, consider the input pattern 0111, which sets a high voltage on the second input of the NAND gate. In this case, only one p-transistor pulls up the voltage on the net a to the power supply. Thus, the net a is still driven to logic-1, but with less strength, while the logic-0 on the net b has the same strength as before. One possible voltage characteristic for Va and Vb is described in Figure 7.2 by the dashed curves situated underneath the solid ones. Hence, the fault effect is visible at one of the transistor drains, and possibly at the outputs of the corresponding gates, iff Rsh ∈ [0, RC’] ∪ [0, RD’] ∪ [0, RE’] = [0, RC’]. Consequently, a resistive bridging fault with Rsh ∈ [RC’, RE] may be detected by the pattern 0011, but not by the pattern 0111, although the logic values on all internal lines of the fault-free circuit are identical for these two patterns.
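The interval reasoning above can be reproduced with a few lines of Python. The critical resistances below are invented numbers that merely respect the ordering implied by the text (RE is the largest bound for pattern 0011, RC’ the largest for pattern 0111, and RC’ < RE); they are not read from Figure 7.2.

```python
def detection_bound(critical_resistances):
    """Union of intervals [0, R] that all start at 0 is [0, max R].
    None stands for the empty interval (transistor never sees the fault)."""
    bounds = [r for r in critical_resistances if r is not None]
    return max(bounds) if bounds else None

def detects(rsh, bound):
    """True iff a bridge of resistance rsh lies in the detection interval [0, bound]."""
    return bound is not None and 0 <= rsh <= bound

# Hypothetical critical resistances in ohms for transistors C, D and E.
pattern_0011 = {"C": 1000.0, "D": None, "E": 2500.0}    # RC, empty, RE
pattern_0111 = {"C": 800.0, "D": 300.0, "E": 500.0}     # RC', RD', RE'

bound_0011 = detection_bound(pattern_0011.values())     # RE
bound_0111 = detection_bound(pattern_0111.values())     # RC'

rsh = 1500.0   # a bridge resistance inside [RC', RE]
only_first_pattern = detects(rsh, bound_0011) and not detects(rsh, bound_0111)
```

With these assumed values, a bridge of 1500 Ω is detected by pattern 0011 only, which mirrors the statement that two patterns with identical fault-free logic values can differ in the defects they detect.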

Figure 7.2 Rsh-V Diagrams


CHAPTER 8 EXPERIMENTAL RESULTS 8.1 SIMULATION RESULTS FOR DLBIST WITHOUT DML

Figure 8.1 Simulated waveforms for DLBIST Without DML


8.2 SIMULATION RESULTS FOR DLBIST WITH DML LOGIC

Figure 8.2 Simulated waveforms for DLBIST With DML


8.3 SYNTHESIS REPORT FOR DLBIST WITHOUT DML LOGIC

=======================================================================
HDL Synthesis Report

Macro Statistics
# Registers                        : 44
  4-bit register                   : 3
  1-bit register                   : 41
# Counters                         : 2
  32-bit up counter                : 2
# Multiplexers                     : 4
  1-bit 16-to-1 multiplexer        : 4
# Comparators                      : 2
  32-bit comparator greatequal     : 1
  4-bit comparator not equal       : 1
# Xors                             : 9
  1-bit xor2                       : 9

=======================================================================
Starting low level synthesis...
Optimizing unit ...
Building and optimizing final netlist ...
Register free_fault_1 equivalent to fault_1 has been removed
Register free_fault_2 equivalent to fault_2 has been removed
Register free_fault_3 equivalent to fault_3 has been removed
FlipFlop temp_0 has been replicated 1 time(s)

=======================================================================
Final Results
Top Level Output File Name         : test1
Output Format                      : NGC
Optimization Criterion             : Speed
Target Technology                  : virtex2
Keep Hierarchy                     : No
Macro Generator                    : macro+

Macro Statistics
# Registers                        : 46
  32-bit register                  : 2
  4-bit register                   : 3
  1-bit register                   : 41
# Multiplexers                     : 4
  1-bit 16-to-1 multiplexer        : 4
# Adders/Subtractors               : 2
  32-bit adder                     : 2
# Comparators                      : 1
  32-bit comparator greatequal     : 1

Design Statistics
# IOs                              : 12

Cell Usage :
# BELS                             : 289
#      GND                         : 1
#      LUT1                        : 50
#      LUT1_L                      : 25
#      LUT2                        : 7
#      LUT3                        : 8
#      LUT3_L                      : 1
#      LUT4                        : 21
#      LUT4_D                      : 1
#      LUT4_L                      : 10
#      MUXCY                       : 74
#      MUXF5                       : 16
#      MUXF6                       : 8
#      MUXF7                       : 4
#      VCC                         : 1
#      XORCY                       : 62
# FlipFlops/Latches                : 113
#      FD                          : 12
#      FDE                         : 35
#      FDR                         : 65
#      FDRE                        : 1
# Clock Buffers                    : 1
#      BUFGP                       : 1
# IO Buffers                       : 11
#      IBUF                        : 1
#      OBUF                        : 10
=======================================================================

=======================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 113   |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -5

   Minimum period: 5.599ns (Maximum Frequency: 178.603MHz)
   Minimum input arrival time before clock: 2.612ns
   Maximum output required time after clock: 6.173ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-----------------------------------------------------------------------
Timing constraint: Default period analysis for Clock 'clk'
Delay:               5.599ns (Levels of Logic = 3)
  Source:            count_30
  Destination:       fault_0
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: count_30 to fault_0
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDR:C->Q               2   0.494   0.698  count_30 (count_30)
    LUT2:I0->O             1   0.382   0.360  I_1_LUT_13 (N253)
    LUT4:I0->O             5   0.382   0.987  I_0_LUT_18 (N285)
    LUT4:I2->O            20   0.382   1.706  I__n0018 (N357)
    FDE:CE                     0.208          fault_0
    ----------------------------------------
    Total                      5.599ns (1.848ns logic, 3.751ns route)
                                       (33.0% logic, 67.0% route)

-----------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'clk'
Offset:              2.612ns (Levels of Logic = 2)
  Source:            rst
  Destination:       lfsr_reg_0
  Destination Clock: clk rising

  Data Path: rst to lfsr_reg_0
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O              1   0.718   0.360  rst_IBUF (rst_IBUF)
    LUT1:I0->O             4   0.382   0.944  I_INV_rst (N226)
    FDE:CE                     0.208          lfsr_reg_0
    ----------------------------------------
    Total                      2.612ns (1.308ns logic, 1.304ns route)
                                       (50.1% logic, 49.9% route)

-----------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'clk'
Offset:              6.173ns (Levels of Logic = 1)
  Source:            t11
  Destination:       f13
  Source Clock:      clk rising

  Data Path: t11 to f13
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDE:C->Q               4   0.494   0.944  t11 (t11)
    OBUF:I->O                  4.735          f13_OBUF (f13)
    ----------------------------------------
    Total                      6.173ns (5.229ns logic, 0.944ns route)
                                       (84.7% logic, 15.3% route)

=======================================================================
CPU : 52.36 / 53.88 s | Elapsed : 53.00 / 53.00 s


8.4 SYNTHESIS REPORT FOR DLBIST WITH DML LOGIC

=======================================================================
HDL Synthesis Report

Macro Statistics
# Registers                        : 24
  4-bit register                   : 3
  1-bit register                   : 21
# Counters                         : 1
  32-bit up counter                : 1
# Multiplexers                     : 8
  2-to-1 multiplexer               : 5
  1-bit 16-to-1 multiplexer        : 3
# Comparators                      : 2
  32-bit comparator greatequal     : 1
  4-bit comparator not equal       : 1
# Xors                             : 7
  1-bit xor2                       : 7

=======================================================================
Starting low level synthesis...
Optimizing unit ...
Building and optimizing final netlist ...
Register free_fault_1 equivalent to fault_1 has been removed
Register free_fault_2 equivalent to fault_2 has been removed
Register free_fault_3 equivalent to fault_3 has been removed
Register r13 equivalent to q13 has been removed
Register xml1 equivalent to xml0 has been removed
Register fault_3 equivalent to fault_2 has been removed
Register s13 equivalent to q13 has been removed
Register xml2 equivalent to xml0 has been removed
Register fault_2 equivalent to fault_1 has been removed

=======================================================================
Final Results
Top Level Output File Name         : test2
Output Format                      : NGC
Optimization Criterion             : Speed
Target Technology                  : virtex2
Keep Hierarchy                     : No
Macro Generator                    : macro+

Macro Statistics
# Registers                        : 25
  32-bit register                  : 1
  4-bit register                   : 3
  1-bit register                   : 21
# Multiplexers                     : 3
  1-bit 16-to-1 multiplexer        : 3
# Adders/Subtractors               : 1
  32-bit adder                     : 1
# Comparators                      : 1
  32-bit comparator greatequal     : 1

Design Statistics
# IOs                              : 12

Cell Usage :
# BELS                             : 146
#      GND                         : 1
#      LUT1                        : 15
#      LUT1_L                      : 28
#      LUT2                        : 3
#      LUT3_L                      : 1
#      LUT4_L                      : 9
#      MUXCY                       : 43
#      MUXF5                       : 8
#      MUXF6                       : 4
#      MUXF7                       : 2
#      VCC                         : 1
#      XORCY                       : 31
# FlipFlops/Latches                : 49
#      FD                          : 12
#      FDE                         : 4
#      FDR                         : 33
# Clock Buffers                    : 1
#      BUFGP                       : 1
# IO Buffers                       : 11
#      IBUF                        : 1
#      OBUF                        : 10
=======================================================================

=======================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 49    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -5

   Minimum period: 5.129ns (Maximum Frequency: 194.969MHz)
   Minimum input arrival time before clock: 2.612ns
   Maximum output required time after clock: 6.094ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-----------------------------------------------------------------------
Timing constraint: Default period analysis for Clock 'clk'
Delay:               5.129ns (Levels of Logic = 13)
  Source:            temp_0
  Destination:       temp_0
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: temp_0 to temp_0
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDR:C->Q              14   0.494   1.388  temp_0 (temp_0)
    LUT4_L:I0->LO          1   0.382   0.000  Mcompar__n0021_inst_lut4_0 (Mcompar__n0021_inst_lut4_0)
    MUXCY:S->O             1   0.137   0.000  Mcompar__n0021_inst_cy_0 (Mcompar__n0021_inst_cy_0)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_1 (Mcompar__n0021_inst_cy_1)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_2 (Mcompar__n0021_inst_cy_2)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_3 (Mcompar__n0021_inst_cy_3)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_4 (Mcompar__n0021_inst_cy_4)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_5 (Mcompar__n0021_inst_cy_5)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_6 (Mcompar__n0021_inst_cy_6)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_7 (Mcompar__n0021_inst_cy_7)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_8 (Mcompar__n0021_inst_cy_8)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_9 (Mcompar__n0021_inst_cy_9)
    MUXCY:CI->O            1   0.046   0.000  Mcompar__n0021_inst_cy_10 (Mcompar__n0021_inst_cy_10)
    MUXCY:CI->O           32   0.046   1.978  Mcompar__n0021_inst_cy_11 (N162)
    FDR:R                      0.244          temp_0
    ----------------------------------------
    Total                      5.129ns (1.763ns logic, 3.366ns route)
                                       (34.4% logic, 65.6% route)

-----------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'clk'
Offset:              2.612ns (Levels of Logic = 2)
  Source:            rst
  Destination:       lfsr_reg_0
  Destination Clock: clk rising

  Data Path: rst to lfsr_reg_0
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O              1   0.718   0.360  rst_IBUF (rst_IBUF)
    LUT1:I0->O             4   0.382   0.944  I_INV_rst (N117)
    FDE:CE                     0.208          lfsr_reg_0
    ----------------------------------------
    Total                      2.612ns (1.308ns logic, 1.304ns route)
                                       (50.1% logic, 49.9% route)

-----------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'clk'
Offset:              6.094ns (Levels of Logic = 1)
  Source:            q13
  Destination:       q13
  Source Clock:      clk rising

  Data Path: q13 to q13
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FD:C->Q                3   0.494   0.865  q13 (s13_OBUF)
    OBUF:I->O                  4.735          q13_OBUF (q13)
    ----------------------------------------
    Total                      6.094ns (5.229ns logic, 0.865ns route)
                                       (85.8% logic, 14.2% route)

=======================================================================
CPU : 38.51 / 39.02 s | Elapsed : 39.00 / 39.00 s


8.5 POWER ANALYSIS REPORT FOR DLBIST WITHOUT DML

Figure 8.3 Power analysis report for DLBIST without DML


8.6 POWER ANALYSIS REPORT FOR DLBIST WITH DML

Figure 8.4 Power analysis report for DLBIST with DML


8.7 RTL DIAGRAM FOR DLBIST WITHOUT DML

Figure 8.5 RTL Diagram for DLBIST without DML


8.8 RTL DIAGRAM FOR DLBIST WITH DML

Figure 8.6 RTL Diagram for DLBIST with DML


CHAPTER 9 CONCLUSIONS

This work presents and details the development of the first scalable deterministic logic built-in self-test (DLBIST) approach with DML. The implemented scheme is based on the STUMPS architecture and relies on a pattern generator that can achieve very high fault coverage. Logic blocks that produce unknown (X) values at their outputs are hard to deal with in a BIST environment, because the signature may be corrupted by the unknown values. Masking the X values at the outputs of such modules allows the use of arbitrary TREs, including those vulnerable to X values. Since most faults are detected by many patterns, some known bits can also be masked without loss of stuck-at fault coverage. For the first time, uncorrupted test sequences are obtained by masking unknown values with DML.

The resistive bridging fault model has been used to model non-target defects. The experimental results reveal that both deterministic test cubes and pseudorandom test sequences are useful for detecting non-target defects. Furthermore, it has been shown that increasing the length of the test sequences enhances their non-target defect coverage and significantly reduces the testing time. This increases the appeal of the proposed DLBIST with DML scheme and reduces the need for expensive automated test equipment (ATE).

In Table 9.1, the DLBIST approaches without and with DML are compared with respect to the testing time of the test sequences and the power consumption. The DLBIST with DML approach reduces the total testing time, and its power requirement also scales well with circuit size.
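The X-masking argument made in this chapter can be illustrated with a small software model. The sketch below is not the hardware implemented in this work: the 4-bit register width, the feedback taps, the response values, and the name `misr_signature` are all assumptions chosen for illustration. It models a multiple-input signature register over the values 0, 1, and X, and shows that a single unmasked X makes signature bits unknown, while forcing the X positions to 0 yields a fully known signature.

```python
X = None  # unknown logic value in this three-valued model


def xor3(a, b):
    """XOR over {0, 1, X}: any unknown input makes the output unknown."""
    return X if a is X or b is X else a ^ b


def misr_signature(responses, taps, mask=None):
    """Compact scan-out responses into a signature.

    responses: one row per clock cycle, one entry (0, 1 or X) per scan chain.
    taps:      set of stage indices that receive the feedback bit
               (hypothetical feedback polynomial).
    mask:      optional rows of booleans; True forces that input to 0.
    """
    state = [0] * len(responses[0])
    for cycle, row in enumerate(responses):
        if mask is not None:
            row = [0 if mask[cycle][i] else bit for i, bit in enumerate(row)]
        feedback = state[-1]
        nxt = []
        for i, bit in enumerate(row):
            prev = state[i - 1] if i > 0 else 0
            value = xor3(prev, bit)
            if i in taps:
                value = xor3(value, feedback)
            nxt.append(value)
        state = nxt
    return state


TAPS = {0, 1}  # assumed tap positions, for illustration only
responses = [
    [1, 0, X, 1],
    [0, 1, 1, 0],
    [1, X, 0, 0],
]
# Mask exactly the positions that carry an unknown value.
mask = [[bit is X for bit in row] for row in responses]

unmasked = misr_signature(responses, TAPS)
masked = misr_signature(responses, TAPS, mask)
print("unmasked signature:", unmasked)
print("masked signature:  ", masked)
```

Because the masked positions are forced to a constant, the masked signature does not depend on the value the unknown bits actually take in silicon, which is what allows an X-sensitive compactor to be used.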

ISCAS Circuit   Without DML Logic                             With DML Logic
Design          Testing Time (ns)   Power Consumption (mW)    Testing Time (ns)   Power Consumption (mW)
S27             6.173               121.23                    5.094               118.77


Table 9.1 Comparison of the DLBIST approaches with and without DML on the S27 design.
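To put the figures in Table 9.1 in relative terms, the short Python sketch below computes the percentage reduction in testing time and power consumption for S27. The dictionary layout and the helper name `relative_reduction` are illustrative assumptions; the four numbers are taken directly from the table.

```python
# Values copied from Table 9.1 (ISCAS S27).
results = {
    "without_dml": {"time_ns": 6.173, "power_mw": 121.23},
    "with_dml": {"time_ns": 5.094, "power_mw": 118.77},
}


def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


time_reduction = relative_reduction(
    results["without_dml"]["time_ns"], results["with_dml"]["time_ns"]
)
power_reduction = relative_reduction(
    results["without_dml"]["power_mw"], results["with_dml"]["power_mw"]
)

print(f"Testing time reduction: {time_reduction:.1f}%")
print(f"Power reduction:        {power_reduction:.1f}%")
```

For S27 this works out to roughly a 17.5% shorter testing time and about 2% lower power consumption with DML.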


CHAPTER 10 REFERENCES

1. B. Benware, C. Schuermyer, S. Ranganathan, R. Madge, P. Krishnamurty, N. Tamarapalli, K. H. Tsai, and J. Rajski, “Impact of multiple-detect test patterns on product quality,” in Proc. Int. Test Conf., 2003, pp. 1031–1040.
2. R. K. Brayton, R. Rudell, A. L. Sangiovanni-Vincentelli, and A. R. Wang, “MIS: A multiple-level logic optimization system,” IEEE Trans. Comput., vol. C-6, no. 6, Jun. 1987, pp. 1062–1081.
3. F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential benchmark circuits,” in Proc. Int. Symp. Circuits Syst., 1989, pp. 1929–1934.
4. T. Clouqueur, K. Zarrineh, K. K. Saluja, and H. Fujiwara, “Design and analysis of multiple weight linear compactors of responses containing unknown values,” presented at the Int. Test Conf., Austin, TX, 2005.
5. P. Engelke, I. Polian, M. Renovell, and B. Becker, “Simulating resistive bridging and stuck-at faults,” in Proc. Int. Test Conf., 2003, pp. 1051–1059.
6. Flmto, and K. Iwasaki, “Analysis of error-masking and X-masking probabilities for convolutional compactors,” presented at the Int. Test Conf., Austin, TX, 2005.
7. Gherman, H. J. Wunderlich, H. Vranken, F. Hapke, M. Wittke, and M. Garbers, “Efficient pattern mapping for deterministic logic BIST,” in Proc. Int. Test Conf., 2004, pp. 48–56.
8. G. Kiefer, H. Vranken, E. J. Marinissen, and H.-J. Wunderlich, “Application of deterministic logic BIST on industrial circuits,” in Proc. Int. Test Conf., 2000, pp. 105–114.
9. C. Lee and D. M. H. Walker, “PROBE: A PPSFP simulator for resistive bridging faults,” in Proc. VLSI Test Symp., 2000, pp. 105–110.


10. Mitra and K. S. Kim, “X-Compact: An efficient response compaction technique for test cost reduction,” in Proc. Int. Test Conf., 2002, pp. 311–320.


11. M. Naruse, I. Pomeranz, S. M. Reddy, and S. Kundu, “On-chip compression of output responses with unknown values using LFSR reseeding,” in Proc. Int. Test Conf., 2003, pp. 1060–1068.
12. I. Pomeranz, S. Kundu, and S. M. Reddy, “On output response compression in the presence of unknown output values,” in Proc. Des. Autom. Conf., 2002, pp. 255–258.
13. J. Rajski, C. Wang, J. Tyszer, and S. M. Reddy, “Convolutional compaction of test responses,” in Proc. Int. Test Conf., 2003, pp. 745–754.
14. M. Reddy, I. Pomeranz, and S. Kajihara, “Compact test sets for high defect coverage,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 16, no. 8, Aug. 1997, pp. 923–930.
15. M. Renovell, P. Huc, and Y. Bertrand, “CMOS bridge fault modeling,” in Proc. VLSI Test Symp., 1994, pp. 392–397.
16. R. Rodríguez-Montañés, E. M. J. G. Bruls, and J. Figueras, “Bridging defects resistance measurements in a CMOS process,” in Proc. Int. Test Conf., 1992, pp. 892–899.
17. N. A. Touba and E. J. McCluskey, “Altering a pseudo-random bit sequence for scan-based BIST,” in Proc. Int. Test Conf., 1996, pp. 649–658.
18. E. H. Volkerink and S. Mitra, “Response compaction with any number of unknowns using a new LFSR architecture,” in Proc. Des. Autom. Conf., 2005, pp. 7–122.
19. H.-J. Wunderlich and G. Kiefer, “Bit-flipping BIST,” in Proc. Int. Conf. Comput.-Aided Des., 1996, pp. 337–343.
20. H. J. Wunderlich, “BIST for systems-on-a-chip,” Integr. VLSI J., vol. 26, no. 12, Dec. 1998, pp. 55–78.
21. B. Syed Ibrahim and C. Rajasekaran, “An Advanced Logic BIST System for Don’t care Masking,” in Proc.