A review paper on memory fault models and test algorithms

Received Apr 20, 2021. Revised Aug 19, 2021. Accepted Oct 9, 2021.
Testing embedded memories in a chip can be very challenging due to their high density and the very deep submicron (VDSM) technologies used to manufacture them. In this review paper, functional fault models that may exist in a memory are described in terms of their definitions and detection requirements. Several memory testing algorithms used in memory built-in self-test (BIST) are discussed in terms of their test operation sequences, fault detection ability, and test complexity. The studies show that tests with a complexity of 22N, such as March SS and March AB, are needed to detect all static unlinked (simple) faults within the memory cells. The N in the algorithm complexity refers to Nx × Ny × Nz, where Nx is the number of rows, Ny the number of columns, and Nz the number of banks. This paper also looks into optimizations and further improvements that can be made to existing March test algorithms to increase fault coverage or to reduce test complexity.


INTRODUCTION
Design for testability (DFT) is a design technique used by IC designers and manufacturers to enhance the controllability and observability of the device under test (DUT), thereby improving the fault coverage to above 90% for large designs [1], [2]. In the IC design flow for an ASIC or SOC, auxiliary circuitry is added to the design during the DFT stage to allow the device to be tested after fabrication. DFT concerns both the logic circuits and the memories inside a chip. In recent years, the trend shows that SOCs are memory-dominant chips, with embedded memories occupying up to 90% of the total chip area. Therefore, the quality of the memories is the main factor in achieving a good manufacturing yield [3]-[9]. Testing an embedded memory such as a DRAM or SRAM can be very challenging due to its extreme density [10]. Furthermore, most IC chips are manufactured using very deep submicron (VDSM) technologies, so more defects occur during the chip fabrication process, resulting in more complex chip testing [9], [11]-[13]. Memories in a chip can be tested using the memory BIST technique, whose goal is to ensure the memories are free from any defect, resulting in higher yields. The memory BIST technique also reduces the overall testing cost, since no external tester is needed and testing can be performed in parallel, thus shortening the test time [14]-[21].
Memory BIST tests the memory under test (MUT) by applying a sequence of test operations, or a test algorithm. To test a memory efficiently, there are two main criteria: fault coverage and test complexity. The former determines the fault detection ability: the higher the fault coverage, the better the fault detection. It can be calculated as: fault coverage = (number of detected faults / number of total faults) × 100%. The latter determines the length, or duration, of the memory testing. The test complexity is usually written as O(xN^m), where x is the number of test operations required and N is the size of the memory. The test complexity is said to be linear if m equals 1 and non-linear otherwise. Table 1 shows the time it takes to complete memory testing as a function of memory size and test complexity [22], [23]. For tests with linear test complexity, there are x read or write operations to be performed on each of the N memory cells; hence, the test takes a minimum of xN clock cycles to complete.
Table 1. Memory testing as a function of test complexity and memory size [23]
In this paper, the functional fault models are discussed. As the number of possible faults can be unlimited [24], this paper focuses on the fault models commonly used in the literature. It also discusses different memory test algorithms, with a focus on the March series, as these are commonly used in industry due to their simplicity, linear-time test complexity, and low area overhead [25]-[27]. The notations used for describing the fault models and the test algorithms are also presented.

MEMORY FUNCTIONAL FAULT MODELS
There are three commonly used terms in DFT: defect, fault, and error. According to [28], a defect is an unintended difference between the circuit design and the implemented hardware, which occurs during manufacturing or during use of the device. A fault is the representation of a defect at the abstracted functional level. For example, a short circuit between a net and the power supply VDD in the device is represented as a stuck-at-1 fault at the abstracted functional level. As a result, a defective device produces incorrect outputs, called errors. In a digital system, testing often compares the logical behavior of the tested system with the behavior of a known-good system. Therefore, all physical failures arising during manufacturing need to be modeled as logical faults [5].
Figure 1 shows the general architecture of a random access memory (RAM) [29], [30]. It normally consists of memory cells, an address decoder, a write driver, and a sense amplifier. A fault may occur in any one of these blocks. As multiple faults may exist in a memory simultaneously, they can influence or interact with each other; this scenario is referred to as the linked fault model. Linked faults can be of the same or different types, and one fault may mask the behavior of another [31], [32]. This paper focuses on reviewing unlinked faults, in which multiple faults do not interact with each other.
The functional fault models of a memory can be static or dynamic. A static fault model can be sensitized by at most one operation (read or write), while a dynamic fault model requires more than one operation to be sensitized; the number of dynamic fault sets is therefore theoretically unlimited [33]. In this review paper, only the static faults are discussed. Examples of static faults:
a. Stuck-at fault (SAF): the value of a cell is forced to logic 0 (SAF0) or logic 1 (SAF1), regardless of the value written to it.
b. Transition fault (TF): the memory cell fails to transition from logic 0 to logic 1, or from logic 1 to logic 0.
c. Read destructive fault (RDF): the content of a memory cell is changed by a read operation.
d. Deceptive read destructive fault (DRDF): the content of a memory cell is changed by a read operation, but the read output has the correct value.
e. Write disturb fault (WDF): the content of a memory cell is changed by a non-transition write operation.
f. Incorrect read fault (IRF): a read of a memory cell returns an incorrect value, while the cell's state remains unchanged.
g. Coupling fault (CF): a write operation on one cell (the aggressor cell) influences the value of another cell (the victim cell). There are seven types of CFs:
− State coupling fault (CFst): the victim cell is forced to logic 0 or logic 1 when the aggressor cell is in a given state
− Idempotent coupling fault (CFid): a change of value in the aggressor cell unexpectedly changes the value of the victim cell
− Inverse coupling fault (CFin): the value of the victim cell is complemented when the value of the aggressor cell changes
− Transition coupling fault (CFtr): the victim cell fails to transition from low to high (or high to low) when the aggressor cell is in a given state
− Read destructive coupling fault (CFrd): the content of the victim cell is changed by a read operation when the aggressor is in a given state
− Deceptive read destructive coupling fault (CFdrd): the content of the victim cell is changed by a read operation when the aggressor is in a given state, but the read output has the correct value
− Write destructive coupling fault (CFwd): the content of the victim cell is changed by a non-transition write operation when the aggressor cell is in a given state
As device technologies move towards very deep submicron, faults like DRDF, WDF, CFdrd, and CFwd are becoming more relevant in today's embedded memories compared to conventional faults like CFid, CFin, and CFst [24]. Table 2 summarizes all of the mentioned fault models in terms of their fault primitives (FPs) and detection requirements. A fault primitive is a combination of the sensitizing operation sequence (S), the faulty behavior (F), and the output of the read operation (R). It is denoted as <S / F / R> for single-cell faults (SCFs) or <Sa; Sv / F / R> for double-cell faults (DCFs), where a and v stand for the aggressor and victim cells, respectively. The set of sensitizing operations is S ∈ {0, 1, X, w0, w1, r0, r1}, where X indicates that the value is of no importance. The faulty behavior is F ∈ {0, 1, ?}, where '?' denotes an undefined logic value. The read return value is R ∈ {0, 1, ?, −}, where '?' denotes an undefined logic value and '−' is used when no read output is applicable [31], [34]-[36].
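The single-cell fault definitions above can be made concrete with a minimal behavioral model of one memory cell with an injectable fault. This is an illustrative sketch; the class, method, and fault-tag names are ours, not notation from the paper.

```python
class Cell:
    """One memory cell with an optional injected static single-cell fault."""

    def __init__(self, fault=None):
        # Start a SAF1 cell at 1 so it is stuck from the outset.
        self.v = 1 if fault == "SAF1" else 0
        self.fault = fault  # one of: "SAF0", "SAF1", "TF_UP", "WDF", "RDF", None

    def write(self, d):
        if self.fault == "SAF0":
            self.v = 0                 # stuck at 0 whatever is written
        elif self.fault == "SAF1":
            self.v = 1                 # stuck at 1 whatever is written
        elif self.fault == "TF_UP" and self.v == 0 and d == 1:
            pass                       # TF: the 0 -> 1 transition fails
        elif self.fault == "WDF" and d == self.v:
            self.v = 1 - self.v        # WDF: non-transition write flips the cell
        else:
            self.v = d

    def read(self):
        if self.fault == "RDF":
            self.v = 1 - self.v        # RDF: the read destroys the content
        return self.v

c = Cell(fault="TF_UP")
c.write(0)
c.write(1)
print(c.read())  # 0: the up-transition failed
```

A detection sequence such as "write 0, write 1, read (expect 1)" then flags the TF_UP cell, mirroring the TF detection requirement in Table 2.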
It can also be seen from Table 2 that the detection of double-cell faults can be very complex, since it involves more than one operation for sensitization and detection, and it requires the test to be performed in both address orders, to cover both the case where the address of the aggressor cell is lower than that of the victim cell (denoted a<v) and vice versa (denoted a>v) [23], [37]. Moreover, in many cases, the sensitization of the victim cell must consider all possible values of the aggressor cell (logic 0 and 1). Therefore, there are eight possible detection cases for each of CFtr, CFrd, CFdrd, and CFwd [25], [38], [39].
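The eight cases per double-cell fault type follow directly from the combinations of victim value, aggressor value, and address order, which a one-line enumeration makes explicit:

```python
from itertools import product

# Victim value x, aggressor value y, and relative address order:
# 2 x 2 x 2 = 8 detection cases per double-cell fault type.
cases = [(x, y, order)
         for x, y, order in product((0, 1), (0, 1), ("a<v", "a>v"))]
print(len(cases))  # 8
```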
Apart from the fault models previously mentioned, which may occur within the memory cells, other types of faults can appear in other parts of the memory block. Faults that occur in the address decoder are referred to as address decoder faults (ADFs). There are four possible scenarios for an ADF:
− No memory cell can be accessed by a certain address
− Multiple cells are simultaneously accessed by a certain address
− A certain memory cell can be accessed by multiple addresses
− A certain memory cell is not accessible by any address
Table 2. Common memory fault models with their fault primitives and detection requirements [32], [40]. The detection requirements are as follows:
− SAF: to detect SAFx, the opposite value x' is written to the cell. The read will return the x value in the faulty case. The test must be performed for both x=0 and x=1.
− TF: the contents of the cells must be set to x first. Then, the x' value is written to the cells, followed by a read operation. The read will return the x value in the faulty case. The test must be performed for both x=0 and x=1.
− CFst/CFid/CFin: for a complete detection, the test algorithm must contain the sequence given in [32].
− RDF: the contents of the cells must be set to x first, followed immediately by a read operation. The read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1.
− DRDF: the contents of the cells must be set to x first, followed immediately by two consecutive read operations. The second read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1.
− WDF: the contents of the cells must be set to x first. Then, another write of the x value is performed (a non-transition write), followed by a read operation. The read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1.
− IRF: the contents of the cells must be set to x first, followed by a read operation. The read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1.
− CFtr: the victim and aggressor cells must be set to x and y first, where y=0 or 1. Then, the x' value is written to the victim cell, followed by a read operation. The read will return the x value in the faulty case. The test must be performed for both x=0 and x=1, and for both a<v and a>v.
− CFrd: the victim and aggressor cells must be set to x and y first, where y=0 or 1, followed immediately by a read operation on the victim cell. The read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1, and for both a<v and a>v.
− CFdrd: the victim and aggressor cells must be set to x and y first, where y=0 or 1, followed immediately by two consecutive read operations on the victim cell. The second read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1, and for both a<v and a>v.
− CFwd (FPs: <0;0w0 / 1 / ->, <0;1w1 / 0 / ->, <1;0w0 / 1 / ->, <1;1w1 / 0 / ->): the victim and aggressor cells must be set to x and y first, where y=0 or 1. Then, another write of the x value is performed on the victim cell (a non-transition write), followed by a read operation. The read will return the x' value in the faulty case. The test must be performed for both x=0 and x=1, and for both a<v and a>v.
According to Siemens [40], ADFs can be detected by any March-style test, and thus no additional detection requirement is needed. Another type of fault model is the stuck-open fault (SOF), which can occur in the sense amplifier block. If the sense amplifier does not contain a data latch, an SOF can be treated as a SAF. Otherwise, the following sequence must be applied for detection [40]:
− The cell under test must be storing a logic x (where x can be either 0 or 1)
− The inverse logic x' is written to the cell
− The value stored in the cell is read and compared with the expected value
This detection requirement is already covered by any test that detects transition faults (TF). There are other types of fault models that can occur in the memory but are rarely discussed in previous research works. For example, neighborhood pattern-sensitive faults (NPSF) differ from other coupling faults in that they involve several aggressor cells and one victim cell [41]. Several fault types fall under the NPSF model, such as single-port bitline coupling faults, the access transistor leakage current fault, the static NPSF (SNPSF), the passive NPSF (PNPSF), and the active NPSF (ANPSF) [40], [42]. Other fault models include the write recovery fault (WRF), the read enable fault (REF), and the memory select fault (MSF) [32], [40], [41].

MEMORY TESTING ALGORITHMS
Table 3 describes the notation used to define the test operation sequences of a test algorithm [43], [44]. The notation indicates the number of elements in a test, the test operations, and the address order. An element of a test algorithm describes a series of operations to be performed on each memory cell, according to the set address order (ascending or descending), before proceeding to the next test element. For example, in the case of the MATS++ algorithm [42], [45]:
⇕(w0); ⇑(r0, w1); ⇓(r1, w0, r0);
There are three elements, each separated by a semicolon. From this notation, it can be derived that:
− In the first element, all memory cells are set to 0, regardless of the address order
− In the second element, a read operation (expecting logic 0) followed by a write 1 operation is performed on cell 0; the same procedure is then repeated for cells 1, 2, …, N-1
− The number of test operations required on each cell is 6; hence, the test complexity is 6N
Various algorithms can be used to test embedded memories. Galloping and walking pattern tests such as GALPAT and WALPAT have non-linear test complexities (O(4N^2) and O(4N^1.5), respectively) and thus take a very long time to complete [23]. Based on Table 1, a GALPAT test on a 256 Mbit memory would require at least 229 years, and a WALPAT test 5.1 days. As such, tests with linear complexity are preferred. Classical test algorithms like the Zero-One and Checkerboard algorithms have a very low test complexity (4N). The Zero-One test algorithm:
⇓(w0); ⇑(r0); ⇑(w1); ⇓(r1);
requires only 4 test operations on each memory cell. However, it has very low fault coverage, since it can only detect SAFs and half of the TFs [13]. The Checkerboard test offers a minor improvement in the detection of TFs and some CFs: instead of writing all 0s or all 1s, the test pattern alternates between 0 and 1 for consecutive cells.
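As a concrete illustration of the notation, the MATS++ element sequence can be simulated on a small behavioral memory model with an injected stuck-at-0 cell. The class and function names here are an illustrative sketch, not from the paper.

```python
def run_mats_pp(mem):
    """Run ⇕(w0); ⇑(r0, w1); ⇓(r1, w0, r0) and return True iff all reads match."""
    n = len(mem.bits)
    ok = True
    for i in range(n):            # ⇕(w0): any address order
        mem.write(i, 0)
    for i in range(n):            # ⇑(r0, w1): ascending
        ok &= (mem.read(i) == 0)
        mem.write(i, 1)
    for i in reversed(range(n)):  # ⇓(r1, w0, r0): descending
        ok &= (mem.read(i) == 1)
        mem.write(i, 0)
        ok &= (mem.read(i) == 0)
    return ok

class Mem:
    """Bit-per-cell memory; cell index sa0 (if set) is stuck at 0."""
    def __init__(self, n, sa0=None):
        self.bits, self.sa0 = [0] * n, sa0
    def write(self, i, d):
        self.bits[i] = 0 if i == self.sa0 else d
    def read(self, i):
        return self.bits[i]

print(run_mats_pp(Mem(8)))          # True: a fault-free memory passes
print(run_mats_pp(Mem(8, sa0=3)))   # False: the stuck-at-0 cell is detected
```

Each cell sees exactly six operations, matching the 6N complexity derived above.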
March-series test algorithms are widely used in industry due to their simplicity and good fault coverage. The MATS++ algorithm mentioned above is an example of a March test algorithm. It overcomes the weakness of the Zero-One algorithm by detecting all TFs, but its detection of coupling faults is very poor. There are also test algorithms like March X (6N) [49] and its extension March Y (8N), which were developed to enhance the detection of CFin and ADF [32]. The research in [7] proposed an optimization of the March Y test algorithm in which the second and third elements are executed in parallel, reducing the test complexity by 3N. However, no improvement in fault coverage was made, and the SRAM architecture must be modified for the memory to be testable with the proposed algorithm.
The weakness of poor coupling-fault coverage was overcome by the development of the March C algorithm, with 11N test complexity [17], [50]. Following the detection requirements given in Table 2 in Section 2, it can detect all the conventional CFs such as CFid, CFin, and CFst [32], [51], [52]. However, research in [44] showed that one of its read operations is unnecessary and can be removed; hence, the March C- algorithm was proposed, reducing the test complexity by 1N while maintaining the same fault coverage. The March LR algorithm, with 14N test complexity, was also proposed to detect several linked faults [53]. The authors in [14] claim the algorithm can detect all simple faults and coupling faults; however, comparing its test operation sequences with the requirements described in Table 2 shows that it is incapable of detecting several faults such as DRDF. The March CL algorithm [54], [55] can detect half of the DRDFs (only when both coupling cells hold logic 1), achieved through a consecutive double read operation of value 1 in the algorithm. Meanwhile, March SR (14N) was proposed [56], [57] to provide complete detection of DRDF and some CFdrds, by adding consecutive double read operations for both logic 0 and logic 1. Both the March CL and March SR algorithms are unable to detect CFwd.
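The kN complexity of a March test can be read directly off its notation by counting the operations applied to each cell. A small parser sketch (the regular expression and function name are illustrative, not from the paper):

```python
import re

def march_complexity(notation):
    """Number of read/write operations applied to each cell (the k in kN)."""
    return len(re.findall(r"[rw][01]", notation))

# MATS++ from the text: 6 operations per cell -> 6N.
print(march_complexity("⇕(w0); ⇑(r0, w1); ⇓(r1, w0, r0)"))  # 6
```

Applied to the standard March C- sequence ⇕(w0); ⇑(r0, w1); ⇑(r1, w0); ⇓(r0, w1); ⇓(r1, w0); ⇕(r0), the same count gives 10, consistent with the 10N complexity discussed above.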

PREVIOUS WORKS ON IMPROVING THE MEMORY TESTING ALGORITHMS
By observing the fault coverage of these test algorithms as provided in Table 5 and Table 6, it can be seen that tests with high test complexities (22N, such as March SS and March AB) are needed to detect all the static unlinked faults, while other works focus instead on reducing the test complexity. For example, a reduction of the March C- test complexity was proposed by [11], which attained an 8N test complexity. This was achieved by dividing the test operations into two subgroups that are executed in parallel:
M1: ⇑(w0); ⇑(r0, w1); ⇑(r1); ⇓(w0); ⇓(r0, w1); ⇓(r1)
M2: ⇑(w1); ⇑(r1, w0); ⇑(r0); ⇓(w1); ⇓(r1, w0); ⇓(r0)
Observing the proposed algorithm, M2 is exactly the complement of M1; thus, a single test-bit generator is adequate for both subgroups, with an inverter added to invert the test bit for M2. Meanwhile, research in [10] proposed modifying the memory BIST design to fuse three different algorithms (MATS, March X, and March C) into one design. The proposed technique proved inefficient at improving the fault detection of the memory BIST, as it only allows the system to select, via a multiplexer, one of the three available algorithms to run while the circuit is operating.
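Since M2 is the data complement of M1, the second pattern stream can be derived from the first with a simple inversion, which is why one test-bit generator plus an inverter suffices. A string-based sketch of that complementing step (for illustration only):

```python
def complement_element(elem):
    """Invert every data bit in a March element, e.g. r0,w1 -> r1,w0."""
    return elem.translate(str.maketrans("01", "10"))

m1 = ["⇑(w0)", "⇑(r0, w1)", "⇑(r1)", "⇓(w0)", "⇓(r0, w1)", "⇓(r1)"]
m2 = [complement_element(e) for e in m1]
print(m2[1])  # ⇑(r1, w0)
```

In hardware this inversion is the single inverter on the M2 test bit; here it is modeled as a character swap on the notation.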
An improvement was proposed in [25]: an automation program that optimizes the test operations of existing March test algorithms using a set of test-sequence generation rule schemes. This research proposed the March CL-1 algorithm (⇕(w0); ⇑(r0, w0); ⇑(r0); ⇑(r0, w1); ⇓(r1, w1); ⇕(r1); ⇓(r1, w0); ⇕(r0)) and the March SR-1 algorithm (⇕(w0); ⇑(r0, w0, r0, w1); ⇑(r1, r1); ⇑(w1); ⇓(r1, w0, r0, w0); ⇓(r0, w0)) as the results of optimizing the March CL and March SR algorithms. The optimization improves both algorithms' detection of CFtr, CFdrd, and CFwd and increases their fault coverage while maintaining the same test complexity. The author also claims that 100% fault coverage can be attained by adding another 4N of test complexity, but this claim has not yet been proven by any work. Table 7 summarizes the fault coverage achieved by the improved March algorithms. In most cases, additional test operations must be added to an algorithm's test sequence to improve its fault coverage, as in the case of the March C+ and March Y2 algorithms. This is the trade-off to be considered, since enhancing the fault coverage will certainly increase the test duration and thus the overall test cost. In contrast, some research focuses only on reducing the test complexity of an existing algorithm and offers no improvement in fault coverage [7], [11], while work such as that proposed in [10] offers neither improved fault coverage nor reduced test complexity, proposing only a customizable memory BIST implementation.
The research in [25] optimized the March SR algorithm to improve its fault coverage while maintaining the same test complexity. However, some weaknesses were identified in this research. Firstly, it focused only on improving the detection of SCFs. Secondly, the proposed optimization technique works only on certain March algorithms, e.g., March SR and March CL. Thirdly, the proposed technique does not remove redundancies in the test operation sequences. Therefore, a research gap is identified: the existing March algorithms can be further optimized to increase their fault coverage by overcoming these three weaknesses. This will involve changing the address order of certain test elements, rearranging several test operations within the sequences, and restructuring several test elements, while still maintaining the same test complexity.
This review also identified a lack of research on optimizing the complexity of test algorithms for detecting faults like NPSFs, the write recovery fault, and the read enable fault, as the detection of these fault types normally requires test algorithms with high test complexities. Such optimization would reduce the test time, and thus the test cost, while maintaining the test quality. In addition, more research could be done in the future to cover the testing of different memory types such as DRAM and Flash, as well as the emerging memory technologies for in-memory computing, which are very popular for Internet-of-Things applications [60], [61].
Table 7 (excerpt). Modified March C- [11]: 8N; Improved March CL [25]: 12N, with 50%, 50%, and 50% coverage of three partially detected fault types; Improved March SR [25]: 14N, with 50%, 62.5%, and 50% coverage of the same fault types; March C+ [58]: 14N.

CONCLUSION
This paper presented a review of the functional fault models that commonly occur in memories, especially RAMs. Each fault can be denoted by its fault primitive (FP), which indicates its sensitizing sequence and the value of the faulty output, and from which the detection requirement for each fault was derived, as shown in Table 2. The memory BIST technique is widely used to test embedded memories on a chip, and an efficient test algorithm is necessary to ensure that the test can be performed with high quality and within a reasonable test length, so as to ensure a good chip manufacturing yield and maintain a reasonable overall testing cost. This paper discussed several March test algorithms and some of the modifications proposed in previous research, each offering different fault coverage and test complexity. The analysis shows that, in most cases, test algorithms with high fault coverage require more test operations, increasing the test complexity and the test cost. Based on the review, a minimum test complexity of 22N is needed to detect all unlinked faults in a RAM. However, existing algorithms with lower test complexities can be optimized to improve their fault coverage, especially on the DCFs, while maintaining the same test complexity. This can be achieved by removing redundant test operations and rearranging test operations within the sequences.