Design and analysis of fault-tolerant sequential logic circuits for safety-critical applications

ABSTRACT


INTRODUCTION
Fault-tolerance and reliability analysis plays an essential role in the design and implementation of highly reliable and robust digital control systems [1]-[3]. Safety-critical control applications that use these electronic digital circuits, such as avionics, space, and industrial control, have become more vulnerable to faults stemming from different natural sources. Examples of these faults are intermittent faults, permanent single faults, transient single faults, multiple bit upsets (MBUs), and common cause faults (CCFs). These faults may result from factors such as ionizing radiation, harsh environments, and electromagnetic interference, which can undermine and defeat traditional fault-tolerant techniques even at ground level [4]. Faults may affect digital control systems differently depending on the severity of the environment in which the control system operates. Different fault-tolerant digital control systems have been developed in the literature to quickly identify the presence of a digital subsystem failure in the control system and diagnose its cause in terms of type. However, most of the developed digital systems achieved low levels of dependability and reliability because of the limited capability of the fault tolerance mechanism and the inclusion of additional hardware components that are not necessary for the control system operation. Although some traditional fault-tolerant techniques based on hardware redundancy or reconfiguration strategies are used to mask or correct faults, their fault coverage (C) is too low to meet the high degrees of dependability and reliability required in these critical control systems. As an example of a computer architecture, the field programmable gate array (FPGA) consists of a two-dimensional array of logic blocks and flip-flops connected by interconnection routing blocks. The logic blocks can perform combinational and sequential logic functions using look-up tables (LUTs) and the memory elements used to realize state machine control units. Combinational components such as LUTs and routing resources are vulnerable to permanent faults. These faults can be corrected either by reloading the bitstream file or by resetting the FPGA chip. However, sequential components such as memory flip-flops are vulnerable to transient faults that can be corrected by the next load of the configuration bitstream [5].
Several challenges stem from applying traditional fault-tolerant techniques to build reliable digital control systems. Firstly, the number of tolerated faults is limited by the number of redundant components available in the digital control system before the whole system fails. Secondly, a failure of the redundancy management unit, which monitors the operation of the digital system, coordinates the redundancy of the components, and detects defects in the working element, may cause a whole-system failure even if there are no actual defects in the working system [6]. The major contribution of this research work is to overcome these architectural challenges by designing a novel fault-tolerant methodology that includes both static and dynamic redundant fault-tolerant systems. The approach consists of a sequential logic circuit, D flip-flop storage elements linked to a fault injection unit, a duplicate modular redundancy, and data monitoring units. Experimental simulation work is presented, and the results show that the approach achieves a robust fault-tolerant digital control system that can be used as a hardware platform for ultra-dependable and safety-critical control applications.

PREVIOUS WORKS
This section briefly reviews research works on fault-tolerant digital systems and error detection methods. Different methods have been used to create different types of fault-tolerant digital embedded systems, as shown in Figure 1. These methods are discussed in this section.
Figure 1. The different methods that were used for creating fault-tolerant digital systems, from the literature
Almukhaizim and Makris [7] described a methodology for creating fault-tolerant digital circuits built on an extension of the concurrent error detection (CED) method. They used the CED method to accomplish error detection as well as to provide error diagnosis and correction capabilities. A fault tolerance method for sequential logic circuits based on the concept of sequential finite state machines (FSMs) was proposed in [8], [9]. The suggested method relied on adding redundant equivalent states to protect a small number of states with a high probability of occurrence. All single errors occurring in the state variables of frequently occurring states or in their combinational logic were guaranteed to be tolerated by the redundant states. The method required little area, because only a few states need protection, and improved the fault tolerance of synthesized sequential circuits. Ostanin et al. [10] presented a fault-tolerant, low-overhead, synchronous sequential circuit design. Their approach was based on a fault-secure system and consisted of only one fault-secure sequential circuit, one regular (unprotected) sequential circuit, one checker, and one rather simple exclusive OR (XOR) circuit. The reliability of the proposed scheme was demonstrated for both single stuck-at faults at gate poles and transient, intermittent path delay faults. Each subsequent fault was assumed to manifest itself after the preceding one had vanished. Ban and Junior [11] established a trade-off between reliability and hardware area overhead by applying hardening methods to arithmetic circuits. Their work also suggested several fault-tolerant strategies in which critical gates in arithmetic circuits were identified and ranked based on the consequences of an error at the circuit output. Subject to the area constraint of the design requirements, these critical gates were hardened first. Output bits deemed essential to a system were given higher protection priority, which lowered the likelihood of catastrophic errors. The researchers in [12] selected the Boolean difference error calculus (BDEC) method previously suggested in the literature and extended it in two ways: first, to account for the impact of reliability-enhancement strategies such as redundancy, and second, to cover sequential circuit parts. Dug et al. [13] constructed and examined two techniques for creating fault-tolerant pipelined sequential and combinational circuits on an FPGA board: error-detection and partial-error correction (EDPEC), and full-error detection and correction (FEDC). Shalini et al. [14] presented a selective triple modular redundancy (STMR) technique, in which hardware redundancy was a suitable approach for fault tolerance in digital circuits. To enhance the timing behavior of synchronous sequential circuits, the output was determined precisely by disregarding the delay.
The selection criteria for STMR included latency and failure probability. Simulation demonstrated that the suggested approach decreased hardware failure by using the TMR technique only when necessary. The researchers in [15] developed a new feedback control loop connected to a digital pipeline hardware system with an appropriate dynamic model to lessen the impact of errors and faults on the output. The digital blocks whose executed operation was rewound were selected as data-path registers for the correction loops of a robotic industrial arm with applied correction factors. They evaluated the cost and reliability of the suggested technique and compared them to the standard TMR approach. In comparison with the TMR approach, their method used 30% fewer slices on FPGA technology. The architectural design of a hybrid, fault-tolerant processing core that uses error detection and correction concepts against radiation faults was presented, analyzed, and simulated in [16]. The error correction codes were embedded among five stages of pipeline processing to identify run-time faults and operational errors. The experimental timing simulation results indicated that the proposed fault-tolerant method is efficient in consuming digital hardware resources and that its software operation is continuously monitored by intelligent fault-tolerant techniques.

THE PROPOSED RESEARCH METHOD
The proposed fault-tolerant sequential logic system is designed to achieve high standards of dependability with respect to several fault models, including transient, intermittent, and permanent faults. In the proposed system, shown in Figure 2, three types of fault tolerance techniques are designed against different types of faults. The basic sequential circuit component investigated in this paper is the D flip-flop (D F-F) memory element, a bistable memory component that has two stable states and stores a single bit, either "1" or "0", at a time. Once the storage element reads the D input signal, the circuit checks whether the synchronous clock signal is high or low, and the input signal propagates to the output signal Q on the rising edge of each clock pulse. The complement of the output signal Q is called Q bar (!Q), as shown in Table 1.
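The rising-edge behavior of the D flip-flop described above can be sketched in Python. This is a minimal behavioral model for illustration only, not the Simulink block used in the paper; the class name `DFlipFlop` and the method `tick` are hypothetical names introduced here.

```python
class DFlipFlop:
    """Behavioral model of a positive-edge-triggered D flip-flop (illustrative)."""

    def __init__(self, initial=0):
        self.q = initial       # stored bit, output Q
        self._prev_clk = 0     # previous clock level, used to detect a rising edge

    @property
    def q_bar(self):
        # Complement of Q (Q bar, written !Q in the text)
        return 1 - self.q

    def tick(self, d, clk):
        """Apply data input d and clock level clk; Q changes only on a 0 -> 1 edge."""
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q, self.q_bar


ff = DFlipFlop()
ff.tick(1, 0)   # clock low: Q keeps its initial value
ff.tick(1, 1)   # rising edge: Q captures D
```

Holding the clock high while D changes leaves Q untouched in this model, which is the property the monitoring units rely on when they compare D with the next state.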
To design a highly robust sequential fault-tolerant system that is resilient to the effects of various natural faults and single upsets, two types of fault tolerance techniques and data monitoring units for the two output signals Q and !Q were designed and embedded in the proposed system. In the first logic circuit, an XNOR gate, called the first data monitoring unit for Q, compares the input D of the D F-F with the next state Q. If the XNOR output is 1, the D F-F is working normally and no fault has appeared; if it is 0, an error has appeared. A controlled switch driven by the XNOR output was therefore embedded: when the switch input equals 1, the value of Q flows to the output, and when it equals 0, the inverted value of Q flows. Similarly, an XOR gate, called the first data monitoring unit for !Q, compares the input D of the D F-F with the next state !Q. When its output equals 1, the D F-F is working normally, and when its output equals 0, a fault has appeared. A controlled switch driven by the XOR output was embedded: when its input equals 1, the value of !Q flows to the output, and when it equals 0, the inverted value of !Q flows. Consequently, these two fault tolerance techniques can be used to tolerate an unlimited number of transient and intermittent faults efficiently. Furthermore, two additional data monitoring units for the output signals Q and !Q of another memory device were proposed. These two units use the concept of double modular redundancy (DMR) [17]-[19], with two XNOR gates and two further controlled switches that are responsible for detecting and correcting the effects of artificial and natural permanent faults. The idea is to use an additional spare D flip-flop: an XNOR gate compares the output of the switch that follows the first XNOR with the output of the spare D flip-flop. If the XNOR output equals 1, no error is observed, and the switch passes the output of the switch that follows the first XNOR. If the output equals 0, an error is observed, and the switch passes the output of the spare D flip-flop. In addition, to make the execution of the proposed design deterministic and synchronous, all the digital switches are controlled by a trigger signal, so that all the outputs are compared at the same time. The excitation equation of the proposed digital circuit is given in (1):

$$F = \left[X \land Y \land \lnot Z \land \lnot Q(t+1)\right] \lor \left[\lnot Z \land Q(t+1)\right] \tag{1}$$
Figure 3(a) presents the timing diagram of the first monitoring unit (MU1), simulated in MATLAB Simulink [20], in its normal state of operation when no fault appears. The input signals 'X', 'Y', and 'Z' are equal to 1, 1, and 0, respectively, and the data input of the D F-F is equal to 1. In this state, MU1 compares the input signal with the resulting output signal using the XNOR1 gate. Additionally, the D flip-flop input is checked against the complemented output using the XOR gate. If the outputs of both the XNOR and the XOR gates are equal to '1', no fault has appeared. Figure 3(b) presents the MU1 timing diagram when the 'Q' output signal of the D F-F is corrupted by a simulated fault. In this scenario, the output of the XNOR gate is equal to '0', and MU1 corrects this false value and replaces it with the correct value using a programmable digital switch.
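The monitoring logic described in this section can be illustrated with a short Python sketch. It follows the textual description literally (XNOR/XOR comparison, controlled switch, spare flip-flop comparison) and is not the authors' Simulink model; all function names (`excitation`, `mu1_q`, `mu1_q_bar`, `mu2`, `protected_q`) are hypothetical, and signals are modeled as the integers 0 and 1.

```python
def xnor(a, b):
    return 1 if a == b else 0

def xor(a, b):
    return 1 if a != b else 0

def excitation(x, y, z, q_next):
    """Excitation equation (1): F = [X AND Y AND NOT Z AND NOT Q(t+1)] OR [NOT Z AND Q(t+1)]."""
    return int((x and y and not z and not q_next) or (not z and q_next))

def mu1_q(d, q):
    """First monitoring unit for Q: XNOR compares D with Q, switch passes Q or its inverse."""
    no_fault = xnor(d, q)
    return q if no_fault == 1 else 1 - q

def mu1_q_bar(d, q_bar):
    """First monitoring unit for !Q: XOR compares D with !Q, switch passes !Q or its inverse."""
    no_fault = xor(d, q_bar)
    return q_bar if no_fault == 1 else 1 - q_bar

def mu2(switch_out, spare_out):
    """DMR stage: XNOR compares the first switch output with the spare flip-flop output."""
    agree = xnor(switch_out, spare_out)
    return switch_out if agree == 1 else spare_out

def protected_q(d, q_main, q_spare):
    """Chain both stages for the Q output."""
    return mu2(mu1_q(d, q_main), q_spare)


d = excitation(1, 1, 0, 0)                          # inputs X, Y, Z = 1, 1, 0 as in Figure 3
normal = protected_q(d, q_main=1, q_spare=1)        # no fault injected
faulty = protected_q(d, q_main=0, q_spare=1)        # Q of the main flip-flop flipped by a fault
```

In both calls the protected output equals the data input, which mirrors the normal and fault-injection cases of the MU1 timing diagrams.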

RESULTS AND DISCUSSION
To evaluate the dependable and resilient behavior of the proposed fault-tolerant sequential logic circuit and quantify its reliability and safety, a Markov chain comprising five descriptive states was modeled, as shown in Figure 4 and Table 2 [21]. The reliability model includes three operating states, one state for failing in a safe mode, and one state for failing in an unsafe mode. At any time, the system is in one of the five states: totally operational, first failing-operational, second failing-operational, failing in a safe mode, or failing in an unsafe mode. To analyze the reliability of the designed sequential fault-tolerant system using Markov chain models, it is assumed that each sequential memory element obeys the exponential failure law and has a constant failure rate λ [22]. The probability P(t + Δt) that a fault-tolerant digital sequential circuit will fail at some future time (t + Δt) is expressed by (2), where λ is the failure rate.
The reliability can be computed from (3). The two-dimensional state transition matrix of the Markov model is then formed. Using algebraic manipulation and letting the time interval Δt decrease to zero, a set of differential equations is produced, and the corresponding equations are then constructed using the Laplace transform. System reliability R(t) is typically defined as the probability that a logic circuit operates without failure during the period [0, t]. In addition, reliability is considered an evaluation metric for measuring whether the expected service is delivered to the customer [23], [24]. Equation (4) expresses the reliability and how it is calculated. Safety, on the other hand, is an extended concept of reliability. The safety of a logic circuit S(t) is defined as the probability that the circuit either performs its expected function completely or transitions to a fail-safe mode during the period [0, t]. Hence, (5) expresses the safety and how it is calculated.
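To make the definitions of reliability and safety concrete, the following Python sketch propagates a five-state discrete-time Markov chain and sums the state probabilities. The transition structure is an assumption introduced here for illustration and is not taken from Figure 4: a fault occurs in a step with probability λΔt, a covered fault (probability C) moves the system to the next operational state or, from the last operational state, to the fail-safe state, and an uncovered fault leads to the fail-unsafe state. Repair is not modeled, and all function names are hypothetical.

```python
def transition_matrix(lam, dt, c):
    """Assumed 5x5 one-step transition matrix.

    States: 0 totally operational, 1 first failing-operational,
    2 second failing-operational, 3 failed safe, 4 failed unsafe.
    """
    p = lam * dt  # probability of a fault during one step (assumes lam * dt << 1)
    return [
        [1 - p, c * p, 0.0,   0.0,   (1 - c) * p],
        [0.0,   1 - p, c * p, 0.0,   (1 - c) * p],
        [0.0,   0.0,   1 - p, c * p, (1 - c) * p],
        [0.0,   0.0,   0.0,   1.0,   0.0],          # failed safe is absorbing
        [0.0,   0.0,   0.0,   0.0,   1.0],          # failed unsafe is absorbing
    ]

def step(state, m):
    """One step of the chain: new_state[j] = sum_i state[i] * m[i][j]."""
    n = len(state)
    return [sum(state[i] * m[i][j] for i in range(n)) for j in range(n)]

def reliability_and_safety(lam, dt, c, steps):
    """Return (R, S) after the given number of steps, starting fully operational."""
    state = [1.0, 0.0, 0.0, 0.0, 0.0]
    m = transition_matrix(lam, dt, c)
    for _ in range(steps):
        state = step(state, m)
    reliability = sum(state[:3])       # probability of being in an operational state
    safety = reliability + state[3]    # operational or failed safe
    return reliability, safety


r, s = reliability_and_safety(lam=38.1e-9, dt=1.0, c=0.8, steps=1000)
```

Under these assumptions safety can never be lower than reliability, and with perfect coverage (C = 1) the fail-unsafe state is unreachable. The numerical values produced by this sketch are not expected to reproduce Tables 3 and 4, which come from the authors' model.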
The Stratix IV FPGA fabric, which is assumed to be the target realization platform, has a failure rate of 38.1 FIT, where FIT (failures in time) is a unit that expresses the number of failures that can occur in every $10^{9}$ hours of operation. One FIT therefore corresponds to $10^{-9}$ failures/hour, so $\lambda_h = 38.1 \times 10^{-9}$ failures/hour. The Altera Stratix IV FPGA chip operates at a clock frequency of 50 MHz; thus, the mean time to repair (MTTR), taken as one clock period, is 20 ns. Table 3 compares the state probabilities, that is, the probabilities of being in the different states of Figure 4, for different fault detection coverage values. Additionally, the WINSTEM SURE analysis program [25] was used to model the reconfigurable behaviour of the proposed system. SURE is a reliability analysis tool developed by the National Aeronautics and Space Administration (NASA) to calculate failure probabilities. Table 4 reports reliability and safety at different fault detection coverage values, and Figure 5 plots fault detection coverage versus reliability.
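The unit conversions used above can be checked with a few lines of Python. The FIT value and clock frequency come from the text; the function names are hypothetical, and expressing the repair rate as the reciprocal of the MTTR is an assumption added here for illustration.

```python
import math

HOURS_PER_FIT_WINDOW = 1e9   # one FIT is one failure per 10^9 hours

def fit_to_failures_per_hour(fit):
    """Convert a failure rate in FIT to failures per hour."""
    return fit / HOURS_PER_FIT_WINDOW

def clock_period_seconds(freq_hz):
    """Period of one clock cycle in seconds."""
    return 1.0 / freq_hz

lam_h = fit_to_failures_per_hour(38.1)      # 38.1e-9 failures/hour
mttr_s = clock_period_seconds(50e6)         # 20e-9 s, i.e. 20 ns
repair_rate_per_hour = 3600.0 / mttr_s      # assumed: repair rate = 1 / MTTR, in repairs/hour

# Sanity checks on the values quoted in the text
assert math.isclose(lam_h, 38.1e-9)
assert math.isclose(mttr_s, 20e-9)
```

Keeping both rates in the same time unit (per hour here) matters when they are entered together into a Markov model or a tool such as SURE.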

CONCLUSION AND FUTURE WORK
In this paper, we presented the architectural design and reliability analysis of a novel fault-tolerant sequential logic circuit for safety-critical digital applications. The primary objective is to overcome the deficiencies and faults that can disrupt the operation of latches and D flip-flops embedded in safety-critical sequential circuits. The advantage of the approach is that it tolerates an unlimited number of intermittent and transient faults. We demonstrated experimentally that high levels of reliability are achieved by simulating fault injection campaigns on the output signals of memory storage elements. The results show that the proposed system can achieve a reliability and safety of 0.9998 at a fault detection coverage of 0.8, and a reliability of 0.99999998 at a coverage of 0.99999. For future work, we plan to use mathematical verification concepts to validate the operational execution of the data monitoring models. Furthermore, the proposed circuit could operate in critical environments that generate potential CCFs by adding a hybrid fault-tolerant mechanism with spare sequential components. Finally, generating hardware description language (HDL) code using the MathWorks Simulink-based HDL Coder and synthesizing the proposed circuit in real time is also left for future work.
Figure 1 labels: Boolean difference error calculus (BDEC); Error-detection and partial-error correction (EDPEC); Full-error detection and correction (FEDC); Feedback control loop based on a dynamic model
Bulletin of Electr Eng & Inf, ISSN: 2302-9285. Design and analysis of fault-tolerant sequential logic circuits for safety-critical.. (Shawkat Sabah Khairullah)

Figure 2. The proposed fault-tolerant sequential logic system

Figure 3. Timing diagram of MU1: (a) normal operation with no fault injection and (b) at the first fault injection

Figure 4. Discrete-time Markov chain for the proposed fault-tolerant sequential logic system

Figure 5. Fault detection coverage versus reliability

Table 3. Probabilities for different states with different fault detection coverage values

Table 4. Reliability and safety with different fault detection coverage values