# A RADIATION-TOLERANT D FLIP-FLOP DESIGNED FOR LOW-VOLTAGE APPLICATIONS 

By<br>Grant Douglas Poe<br>Thesis<br>Submitted to the Faculty of the<br>Graduate School of Some University<br>in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE<br>in<br>ELECTRICAL ENGINEERING

August 9, 2019

Nashville, Tennessee

Approved:
Lloyd W. Massengill, Ph. D.
Jeffrey S. Kauppila, Ph. D.

## ACKNOWLEDGMENTS

This research and work would not have been possible without the financial support of the Defense Threat Reduction Agency, as well as the tools provided by Cadence Design Systems and Mentor Graphics, which gave access to the design environment, verification, and simulation tools used throughout this research. I would also like to acknowledge all of the technical and professional advice, inspiration, and guidance I have received from the faculty and my peers at Vanderbilt University. In particular, I would like to thank Dr. Jeff Kauppila for his help in guiding much of the technical aspect of this research. Not only did his previous work enable much of the analysis seen in this thesis, but he also played a critical and ongoing role in my education for this particular field. Further, I would like to thank Dr. Lloyd Massengill for his guidance as my academic advisor, and in many cases instructor, where his example as a professional and engineer has helped raise the standard that I now attempt to hold myself to.

## TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
Chapter
1 Introduction
2 Background
2.1 Single-Event Effects
2.2 Bias-Dependent Models
2.3 Low Voltage Operation
3 A RHBD Low Power D Flip-Flop
3.1 Static Single Phase Contention Free Flip-Flop
3.2 DICE $\mathrm{S}^{2}\mathrm{CFF}$
4 Electrical Performance Comparison
4.1 Clock-to-Q Delay
4.2 Setup Time
4.3 Hold Time
4.4 Power Consumption
4.5 Layout Area
5 $\mathrm{DS}^{2}\mathrm{CFF}$ Radiation Performance
5.1 $\mathrm{DS}^{2}\mathrm{CFF}$ Single-Node Vulnerabilities
5.2 Sensitive Node Pair Separation
6 $\mathrm{DS}^{2}\mathrm{CFF}$ Cross-section and Error Rate Estimation
7 Conclusions and Future Work
REFERENCES
APPENDIX

## LIST OF TABLES

Table
4.1 Nominal electrical performance results
4.2 Worst-case electrical timing
5.1 Single-Node Vulnerabilities
5.2 Sensitive Node Pair Separation in RHBD Layouts
7.1 Slow NMOS/Slow PMOS corner timings
7.2 Fast NMOS/Fast PMOS corner timings
7.3 Fast NMOS/Slow PMOS corner timings
7.4 Slow NMOS/Fast PMOS corner timings

## LIST OF FIGURES

Figure
2.1 LET Spectra results with 100 mils of aluminum shielding from [1] and [2] using CREME96[3]
2.2 Single-node SET robust DICE Cell[4]
3.1 Static Single Phase Contention Free Flip-flop schematic and operation from [5]
3.2 Transmission Gate Flip-flop (TGFF)
3.3 DICE $\mathrm{S}^{2}\mathrm{CFF}$ schematic
3.4 Clocked Inverter DICE Flip-flop from [4]
4.1 Setup simulation example waveform
4.2 Layouts of the DICE FF (top), $\mathrm{DS}^{2}\mathrm{CFFdual}$ (middle), and $\mathrm{S}^{2}\mathrm{CFF}$ (bottom)
5.1 DICE $\mathrm{S}^{2}\mathrm{CFF}$ Single-node upset timing window
5.2 Schematics of the $\mathrm{DS}^{2}\mathrm{CFFpass}$ (a) and $\mathrm{DS}^{2}\mathrm{CFFdual}$ (b) Layouts
5.3 Sections of the $\mathrm{DS}^{2}\mathrm{CFFpass}$ (a) and $\mathrm{DS}^{2}\mathrm{CFFdual}$ (b) Layouts
6.1 Cross-section vs LET of $\mathrm{DS}^{2}\mathrm{CFF}$ and $\mathrm{S}^{2}\mathrm{CFF}$ with roll $=0^{\circ}$
6.2 Cross-section vs LET of $\mathrm{DS}^{2}\mathrm{CFF}$ with roll $=90^{\circ}$
7.1 Sensitive node pairs in the $\mathrm{DS}^{2}\mathrm{CFF}$. Note that the transistor labels are not the same as seen elsewhere in this thesis.
7.2 Example waveform from setup simulations from Cadence

## CHAPTER 1

## Introduction

Electrical errors caused by ionizing radiation have been observed since as early as 1975[6]. Significant effort has been dedicated to creating designs that are less susceptible to these errors in order to assure reliable operation in the presence of radiation[7]. In particular, designs that do not require alterations in the fabrication process are desirable because they allow for the utilization of cutting-edge commercial technology processes. To accomplish this, many designs are hardened via topology and layout rather than the response of individual devices[4, 8]. This is known as Radiation Hardened By Design (RHBD). In this thesis, one such technology-agnostic RHBD circuit is proposed.

An unfortunate side effect of the constant device scaling seen in electronics over time is that geometrically scaled nodes require less charge to perturb, and thus generally become more vulnerable to radiation[9, 10]. Alongside this, increasing device density has led to significant thermal stress on modern chips[11], which has become a primary performance constraint for modern designs. This work seeks to combine an existing radiation hardened flip-flop design[4] with an existing low power flip-flop design[5] to create a flip-flop that is both hardened against radiation strikes and capable of operating with reduced power and performance overhead when compared to similar radiation hardened designs.

Chapter 2 of this thesis discusses the necessary background information for the subsequent chapters, particularly with respect to Single-Event Effects (SEEs) in digital electronics and low power operation. Chapter 3 details a low power conventional flip-flop design called the Static Single-Phased Contention-Free Flip-Flop ($\mathrm{S}^{2}\mathrm{CFF}$), and presents the radiation hardened variant called the DICE (Dual Interlocked Storage Cell) $\mathrm{S}^{2}\mathrm{CFF}$ ($\mathrm{DS}^{2}\mathrm{CFF}$) that is the focus of this thesis. Chapter 4 then discusses the simulated electrical performance of the $\mathrm{DS}^{2}\mathrm{CFF}$ and compares the results with the $\mathrm{S}^{2}\mathrm{CFF}$ and the RHBD Clocked Inverter DICE Flip-Flop[4]. Chapter 5 analyzes and discusses the simulated radiation performance of the $\mathrm{DS}^{2}\mathrm{CFF}$ and compares it against the $\mathrm{S}^{2}\mathrm{CFF}$. Chapter 6 gives a brief discussion of a preliminary simulation that estimates the radiation performance of the $\mathrm{DS}^{2}\mathrm{CFF}$ as it compares to the $\mathrm{S}^{2}\mathrm{CFF}$. This is followed by a conclusion that addresses key points of the thesis, and briefly notes future work related to this project.

## CHAPTER 2

## Background

### 2.1 Single-Event Effects

When an ion passes through a semiconductor material it loses energy along its path; this energy creates electron-hole pairs (free charge). When these charge carriers are created within an electric field, they separate and drift accordingly, generating a transient current. In particular for digital CMOS circuits, an ion strike can generate charge in a reverse biased junction in a MOSFET[10]. The resulting transient current must be sourced by the PMOS or NMOS transistor(s) driving the same node. Because of the finite channel conductance and current drive of these restoring transistor(s), a voltage perturbation can occur at the drain of the restoring device(s). This transient voltage can corrupt stored values in memory circuits such as SRAM[12] and D flip-flops. With D flip-flops the transient itself can also be latched at the input if it occurs close enough to the clock edge[13]. A voltage transient resulting from an ion strike is referred to as a Single-Event Transient (SET). If this transient causes an error in a digital value, it is referred to as a Single-Event Upset (SEU).

The fluence of ionizing particles impinging on a circuit is overwhelmingly dependent on the environment in which the circuit operates. As a general trend, the fluence of these disruptive particles increases with increasing altitude[14], so mitigating the effects of SEUs and SETs is of particular importance for systems bound for space and aviation applications. Each charged particle passing through a material has an associated Linear Energy Transfer (LET), which describes the energy the particle loses per unit path length and has units of $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. LET is normalized to the density of the material so that it can be roughly quoted regardless of the target. Particles with higher LET generally deposit more charge than those with lower LET. A given node and logical state will have an associated critical charge, $\mathrm{Q}_{\text{crit}}$, which designates the amount of collected charge required to cause an upset, and thus a higher LET particle is more likely to meet or exceed this $\mathrm{Q}_{\text{crit}}$ value and cause an upset. Figure 2.1 shows the fluence as a function of particle LET for a Solar Particle Event (SPE), Geosynchronous Equatorial Orbit (GEO), and Low Earth Orbit (LEO) from [1]. Notice that there are several sharp drop-offs in fluence across certain LET ranges. LEO in particular has a drop in fluence of close to three orders of magnitude between LETs of roughly 1 and 2 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. Therefore, a circuit that is resistant to upsets from any particles with an LET of under 2 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ would be particularly suited for any environment with a similar drop in relative flux.

Figure 2.1: LET Spectra results with 100 mils of aluminum shielding from [1] and [2] using CREME96[3]

One design that hardens a circuit against SEUs is the Dual Interlocked Storage Cell[4], also known as DICE. DICE memory uses four redundant nodes instead of the usual two, and uses interleaving feedback such that when a voltage perturbation occurs on a single node, at most one other node will also be perturbed. This can be observed in the DICE memory cell shown in Figure 2.2: a strike that pulls node A high would turn off P3 and turn on N1. This would leave node D capacitively held at logic 1, with P1 and N1 both driving node B simultaneously, leaving it somewhere around mid-rail depending on the relative drive strengths of P1 and N1. Node C will, at worst, capacitively hold the correct value if the mid-rail pull of B is low enough to turn off N2. N0 will stay on because of the logic 1 still stored at node D, and after N0 pulls the collected charge off of node A, the four storage nodes will return to their correct values.

Figure 2.2: Single-node SET robust DICE Cell[4]

However, if multiple nodes are perturbed simultaneously, the value stored in DICE can still be corrupted. This is of particular importance because of the concept known as charge sharing[15]. Charge sharing is a mechanism by which multiple sensitive volumes can collect charge from the same event. In fact, it has been observed[16] that, if not accounted for, charge sharing can greatly reduce the effectiveness of DICE designs, especially with increasingly scaled devices. In [16] it is shown that a DICE FF at the 40 nm technology node has only a $30-50 \%$ decrease in error rate for protons and neutrons over a standard flip-flop, which generally is not enough of a hardness improvement to justify the penalties in electrical performance and area that come with DICE designs. Therefore, extra care must be taken to reduce the effect of charge sharing in DICE in order to exploit its single node robustness.
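The single-node recovery and the multi-node failure described above can be illustrated with a very coarse switch-level sketch. It is not the model used in this work: the node ordering (A, B, C, D as indices 0 to 3) and the gate connections are inferred from the description of Figure 2.2 (each node's NMOS is gated by the previous node and its PMOS by the next node), the `MID` marker and all function names are hypothetical, and a mid-rail gate is assumed to turn neither transistor on, which is the favorable case mentioned in the text.

```python
MID = "X"  # hypothetical marker for a contended node sitting near mid-rail


def step(state, forced=None):
    """One synchronous update of the four DICE storage nodes."""
    forced = forced or {}
    n = len(state)
    nxt = []
    for i, held in enumerate(state):
        if i in forced:                       # node clamped by the ion strike
            nxt.append(forced[i])
            continue
        nmos_on = state[(i - 1) % n] == 1     # pull-down gated by previous node
        pmos_on = state[(i + 1) % n] == 0     # pull-up gated by next node
        if nmos_on and pmos_on:
            nxt.append(MID)                   # contention: roughly mid-rail
        elif nmos_on:
            nxt.append(0)
        elif pmos_on:
            nxt.append(1)
        else:
            nxt.append(held)                  # floating: capacitively held
    return nxt


def settle(state, forced=None, max_steps=20):
    """Iterate until the node values stop changing."""
    for _ in range(max_steps):
        nxt = step(state, forced)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("node values did not settle")


def strike(state, struck_nodes):
    """Flip the struck nodes, let the cell react, then release them."""
    forced = {i: 1 - state[i] for i in struck_nodes}
    during = settle(list(state), forced)
    return settle(during)


stored = [0, 1, 0, 1]                          # nodes A, B, C, D
single = strike(stored, [0])                   # strike on node A only
double = strike(stored, [0, 2])                # simultaneous strike on A and C
print("single-node strike recovers:", single == stored)
print("dual-node strike result:", double)
```

Under these assumptions the cell returns to its stored value after any single-node strike, while a simultaneous strike on nodes A and C leaves the complementary value stored. Real behavior depends on drive strengths, collected charge, and timing, which this abstraction ignores.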

Often the angle of incidence of the charged particle relative to the circuit is what causes it to deposit charge in multiple sensitive volumes. A particle's angle of incidence is often described in terms of its tilt and roll. Tilt refers to the angle between the path of the particle and the line normal to the plane of the layout. Roll refers to the angle between the incident particle path and the plane that runs perpendicular to the plane of the layout along the direction of the gates from NMOS to PMOS. For example, $0^{\circ}$ tilt and $0^{\circ}$ roll describes a path directly normal to the plane of the layout. $90^{\circ}$ tilt and $0^{\circ}$ roll refers to a path that would directly cross through a pair of vertical NMOS and PMOS devices, and $90^{\circ}$ tilt and $90^{\circ}$ roll describes a path that could cross through a long horizontal line of NMOS or PMOS devices. To reduce the effect of charge sharing, sensitive node pairs must be separated so that a given particle is less likely to deposit charge in the sensitive volumes of both nodes simultaneously. The farther apart two nodes are, the narrower the range of angles that would allow this to occur. The marginal benefit of additional node spacing asymptotically approaches zero, so finding a distance that adequately hardens a circuit against dual node strikes while incurring an acceptable area penalty is important. The effectiveness of a given amount of node separation is dictated by the dimensions of the charge collecting volumes at each node, and is therefore dependent on the specific technology generation of the circuit design.
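The diminishing return of node separation can be seen with a simple geometric sketch. It assumes a deliberately crude two-dimensional picture that is not taken from this work: each sensitive volume is treated as a thin slab of width `volume_width`, the two slabs face each other at a center-to-center distance `separation`, and a straight track hits both only if its angle lies within a window of $2\arctan(\text{width}/\text{separation})$. The function name and the example dimensions are hypothetical.

```python
import math


def angular_window_deg(volume_width, separation):
    """Full range of in-plane track angles (degrees) that cross both slabs."""
    return math.degrees(2.0 * math.atan(volume_width / separation))


# Hypothetical dimensions in arbitrary but consistent units.
width = 1.0
previous = None
for separation in (1.0, 2.0, 4.0, 8.0, 16.0):
    window = angular_window_deg(width, separation)
    gain = "" if previous is None else f" (narrowed by {previous - window:.1f} deg)"
    print(f"separation {separation:5.1f}: window {window:5.1f} deg{gain}")
    previous = window
```

Each doubling of the separation narrows the window by less than the previous doubling did, which is the qualitative trade-off against area described above. Actual sensitive volumes are three-dimensional and technology dependent, so this only conveys the trend.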

Every flip-flop discussed in this thesis is a Master-Slave D flip-flop. This means that there are two latches connected in series, referred to as the Master latch and the Slave latch, that each alternate between opaque and transparent operation. During transparent operation, the latch simply passes its input to its output, and during opaque operation it holds the value it last saw during transparent operation and passes it to its output. Whichever mode the Master is in, the Slave is in the opposite mode. Generally speaking, because a latch is storing a value during opaque operation, this is when it is more vulnerable to radiation.

### 2.2 Bias-Dependent Models

In order to simulate radiation strikes in this work, validated bias-dependent models from [17] were utilized. These models allow a transistor to be designated directly in the simulation netlist as being struck by an ion with a specified LET or deposited charge, tilt, roll, and time within the simulation. This allows for the easy identification of transistors that are logically capable of causing an upset in a circuit when struck, and furthermore allows one to estimate the LET or the amount of deposited charge that is required for such an upset to occur. These models were calibrated[17] to computationally expensive and thorough TCAD simulations as well as physical test data from the same technology node. Every radiation simulation in this thesis was conducted at room temperature at the nominal performance corner. The performance corner describes how device characteristics are modelled in simulation. In particular, nominal means that each PMOS and NMOS operates at a typical speed as opposed to being especially fast or slow because of random variations during fabrication.

### 2.3 Low Voltage Operation

For modern technology nodes, power consumption has become a primary design constraint. Due to mostly stagnant supply voltage scaling between technology generations[11], device density has significantly outpaced energy efficiency. This has led to power limits dictating the fraction of devices that can be operated on a chip which greatly impacts system performance. One way to solve this power issue is to scale the supply voltage lower than the nominal operating voltage of the technology node, but this voltage scaling comes with drawbacks, particularly with respect to speed and sensitivity to Process/Voltage/Temperature (PVT) variations.

Supply voltage is a critical factor in CMOS power consumption. Generally, CMOS power consumption is estimated by the following equation[18]:

$$
P_{\text {total }}=\alpha\left(C_{L} \cdot V_{s w} \cdot V_{D D} \cdot f_{C L K}\right)+I_{S C} \cdot V_{D D}+I_{\text {Leakage }} \cdot V_{D D}
$$

where $\alpha$ represents the activity ratio of a node, $C_{L}$ is the load capacitance, $V_{sw}$ is the switching voltage and is generally equal to $V_{DD}$, which is the supply voltage, and $f_{CLK}$ is the clock frequency. These make up the first term, which is the switching component of power consumption. $I_{SC}$ is the short circuit current that occurs when both the NMOS and PMOS devices are simultaneously on while the input voltage changes, and $I_{\text{Leakage}}$ refers to the leakage current. For most circuits, the switching component of the power consumption will dominate, meaning the total power a CMOS circuit consumes can be estimated to be proportional to $V_{DD}^{2}$. Therefore, large power savings can be achieved by lowering the supply voltage.
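As a quick numerical illustration of this equation, the sketch below evaluates it at two supply voltages. The function name and every component value (load capacitance, clock frequency, activity ratio) are hypothetical placeholders rather than data from this work, and the short circuit and leakage terms are set to zero so that only the switching term is compared.

```python
def cmos_power(alpha, c_load, v_dd, f_clk, i_sc=0.0, i_leak=0.0, v_sw=None):
    """Total power from the three-term estimate; v_sw defaults to v_dd."""
    if v_sw is None:
        v_sw = v_dd
    switching = alpha * c_load * v_sw * v_dd * f_clk
    short_circuit = i_sc * v_dd
    leakage = i_leak * v_dd
    return switching + short_circuit + leakage


# Hypothetical operating point: 1 fF load, 1 GHz clock, 10% activity.
p_high = cmos_power(alpha=0.1, c_load=1e-15, v_dd=0.8, f_clk=1e9)
p_low = cmos_power(alpha=0.1, c_load=1e-15, v_dd=0.4, f_clk=1e9)
print(f"switching power ratio (0.8 V / 0.4 V): {p_high / p_low:.2f}")
```

Halving the supply at a fixed clock frequency cuts the switching term by a factor of four. In practice the achievable clock frequency also falls at lower voltage, so the power actually consumed would drop further while throughput suffers.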

However, the nominal voltage of a technology node cannot be scaled arbitrarily low in order to reduce power consumption. Due to leakage concerns, the threshold voltage must be high enough such that devices can effectively "turn off." With this lower limit for threshold voltage, the nominal voltage must then be high enough above $V_{\text{Threshold}}$ in order to guarantee fast and reliable operation with current modern CMOS circuit designs. This is because operating closer to $V_{\text{Threshold}}$ increases sensitivity to PVT variations and decreases circuit speed.

Due to thermal stress becoming such an important constraint in modern processors, there is interest in creating designs that can effectively operate at lower voltages. Operating just above the threshold voltage is referred to as Near Threshold Computing (NTC) and is described in [11] to be a viable trade-off between power consumption and performance. Generally speaking, NTC operation offers large power savings, but prohibits many modern circuit designs from being utilized, as they have a significant probability of failing under increased PVT variations. By designing each cell in a system with PVT variation in mind, voltage scaling becomes an attractive option for reducing a chip's power consumption.

## CHAPTER 3

## A RHBD Low Power D Flip-Flop

### 3.1 Static Single Phase Contention Free Flip-Flop

The unhardened commercial D flip-flop design which serves as the baseline design for this work is the Static Single-Phased Contention-Free Flip-Flop ($\mathrm{S}^{2}\mathrm{CFF}$)[5]. The $\mathrm{S}^{2}\mathrm{CFF}$ was designed specifically to be PVT robust for low voltage operation, but with minimal overhead in terms of speed and efficiency at nominal voltage. This is particularly attractive for applications that use Dynamic Voltage Scaling (DVS)[19], where high voltages are used for maximum throughput where necessary, and low voltages are used where power is more of a concern than speed. To achieve this robustness to PVT variations, the $\mathrm{S}^{2}\mathrm{CFF}$ has static operation, uses a single-phased clock, and has contention free transitions. Static operation is required because dynamic nodes are particularly susceptible to variations and leakage at low voltage. A single-phased clock is desired because it allows the flip-flop to consume less idle power while the input does not change, it allows for the removal of the internal clock buffers, and in the case of the $\mathrm{S}^{2}\mathrm{CFF}$, it simplifies the worst-case hold time. Contention free transitions are also required because the increase in PVT variations at low voltage causes the relative drive strength of devices to be unpredictable, so relying on relative device drive strengths for proper operation would lead to a dramatically increased risk of functional failure. The $\mathrm{S}^{2}\mathrm{CFF}$ achieves this while using only 24 transistors, which is the same number of transistors as the widely used Transmission Gate Flip-Flop (TGFF).

Figure 3.1 shows the schematic and basic operation of the $\mathrm{S}^{2}\mathrm{CFF}$ from [5]. It can be seen from this figure that every node is statically held, and there is no contention when any node transitions. Net2 in the schematic operates as a pseudo inverted clock, as it is always pulled high when the clock is low. Because of this role, net2 is critical for both the worst-case Clock-to-Q (CQ) delay and hold time. For the CQ delay, net2 will be high just before the positive clock edge and it must be pulled low to turn on M13 to pass logic 1 to QN for logic 0 at the output Q. Similarly, for the hold time, net2 must be pulled low after the positive clock edge in order to turn off transistor M3 and prevent net1 from switching to logic 0 when the input switches to 1, which would cause a hold time violation. This worst-case dependence on net2 being pulled low makes transistors M9 and M10 critical to the timing characteristics of the $\mathrm{S}^{2}\mathrm{CFF}$. This hold time path is simplified when compared to the hold time path of the TGFF shown in Figure 3.2, where the hold time is dictated by how quickly the clock signal propagates through the local clock buffers to turn off transistor P1. This means the TGFF's worst-case hold time is heavily dependent on N10, and to a lesser extent P11. Also, because this scenario in the TGFF relies primarily on the speed of an NMOS to turn off a PMOS, the TGFF hold time is particularly high for the slow N/fast P performance corner.

Figure 3.1: Static Single Phase Contention Free Flip-flop schematic and operation from [5]

Figure 3.2: Transmission Gate Flip-flop (TGFF)

The $\mathrm{S}^{2}\mathrm{CFF}$ is, however, quite vulnerable to radiation. This work identified that 14 of the 24 transistors in the $\mathrm{S}^{2}\mathrm{CFF}$ could cause an upset when hit by a single-event strike. Although most of these vulnerabilities exist only during a specific logic state of the flip-flop, with so many nodes soft to radiation, a significant portion of the $\mathrm{S}^{2}\mathrm{CFF}$ will be vulnerable to single-node strikes at all times. This is an unacceptable level of vulnerability for many applications in radiation environments. This work applied a targeted hardening approach to mitigate SEUs while incurring as little performance penalty as possible. To maintain PVT robustness at low voltage, the new design presented in this work is also static, single-phased, and contention free.

### 3.2 DICE $\mathrm{S}^{2}\mathrm{CFF}$

To harden the $\mathrm{S}^{2}\mathrm{CFF}$ against radiation, the storage nodes were converted into a DICE configuration. As described in Chapter 2, this configuration allows for robustness against single-node perturbations. This circuit is called the DICE $\mathrm{S}^{2}\mathrm{CFF}$ ($\mathrm{DS}^{2}\mathrm{CFF}$). Figure 3.3 shows the original schematic of the $\mathrm{DS}^{2}\mathrm{CFF}$ that uses a pass transistor to load data from node A to C. Another variation of the schematic simply duplicates the input to both A and C. This second schematic and layout were created after layout of the original revealed significant problems with its setup time. However, the original pass transistor schematic is adequate for understanding how both $\mathrm{DS}^{2}\mathrm{CFF}$ variants work.


Figure 3.3: DICE $\mathrm{S}^{2} \mathrm{CFF}$ schematic

The $\mathrm{DS}^{2}\mathrm{CFF}$ has a very similar functional operation to the $\mathrm{S}^{2}\mathrm{CFF}$. Net1 in the $\mathrm{S}^{2}\mathrm{CFF}$ is effectively split into nodes A and C in the $\mathrm{DS}^{2}\mathrm{CFF}$. Similarly, node B serves the same purpose as net2 in the $\mathrm{S}^{2}\mathrm{CFF}$, and node D is analogous to net1b. The slave latch in the $\mathrm{S}^{2}\mathrm{CFF}$ is simply two inverters to statically hold the output, with net2 and CLK turning the feedback inverter on and off. The slave latch in the $\mathrm{DS}^{2}\mathrm{CFF}$ is the same as that of the $\mathrm{S}^{2}\mathrm{CFF}$, except that it uses eight transistors in a DICE configuration instead of two inverters. In a normal DICE latch, transistor P0 would need another series transistor to turn off the conduction path so that the voltage on node A can be changed without contention. This is, however, not necessary because node B is always pulled high when the clock is low, so P0 is always off when data is being loaded into the master latch. But, because node B is necessarily always pulled high, the pull-down path on node B must be turned off while the clock is low. This means the source of N1 must connect to the drain of N6, because node A does not guarantee that N1 will turn off when the clock is low.

Furthermore, the $\mathrm{DS}^{2}\mathrm{CFF}$ mirrors the $\mathrm{S}^{2}\mathrm{CFF}$ in that node B is critical for both the worst-case hold time and CQ delay. Again, this is because node B is always pulled high when the clock is low, but must be pulled low quickly for proper operation. This makes the speed of transistors N1 and N6 especially critical. With the switching speed of node B being so critical, any additional gates, and thus capacitance, connected to node B will have a noticeable impact on performance. Because of this, the additional gate connection on node B required for the dual input scheme is one of the larger drawbacks of the design change. In the schematics shown thus far, the $\mathrm{S}^{2}\mathrm{CFF}$ has five gates connected to net2 compared to seven gates connected to node B in the $\mathrm{DS}^{2}\mathrm{CFF}$, meaning that the $\mathrm{DS}^{2}\mathrm{CFF}$ should likely be slower than the $\mathrm{S}^{2}\mathrm{CFF}$, but not dramatically so.

With the pass transistor configuration, the worst-case setup time occurs when A and C must be charged high before the positive clock edge. This charging is particularly slow because C has to be pulled high through three separate devices instead of two, with the capacitance of node A adding to the time it takes for P8 to effectively turn on. The disproportionately high setup time seen in simulation is what motivated the creation of the dual input scheme. Discharging node C through P8 would take longer than charging it; however, the worst-case setup time is still dictated by the time it takes to charge node C high. This is because in the $\mathrm{DS}^{2}\mathrm{CFF}$, when a logic 0 is being loaded into node A, it is not necessary for this value to pass at all to node C in order to ensure that nodes ABCD resolve to 0101 after the positive clock edge. Consider the worst possible case just before the clock edge where node C is not pulled down through P8 at all, and holds a logic 1. Node A will hold a logic 0 from the input inverter, and node B will hold a logic 1 because the clock is low, leaving node D to hold somewhere around mid-rail because both N4 and P4 are turned on. When the clock edge occurs, ABCD will then resolve to 0101 in a fashion that mirrors how the DICE latch would recover from an SEE that pulls node C high, as described in Chapter 2.

The schematic shown in Figure 3.3 and the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ variant contain 32 and 35 transistors, respectively, compared to the $S^{2}$ CFF's 24 transistors. This is proportionally a smaller increase in transistor count than what is seen when switching from the TGFF, with 24 transistors, to its rough DICE equivalent flip-flop, the Clocked-Inverter DICE FF (DICE FF)[4] with 40 transistors. This transistor count savings of the $\mathrm{DS}^{2}$ CFF over the DICE FF is primarily because of the lack of clock buffers in the $\mathrm{DS}^{2} \mathrm{CFF}$.
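Worked out from the transistor counts quoted above, the relative size of each hardened design with respect to its 24-transistor unhardened counterpart is:

$$
\frac{32}{24} \approx 1.33, \qquad \frac{35}{24} \approx 1.46, \qquad \frac{40}{24} \approx 1.67
$$

That is, roughly a $33 \%$ and $46 \%$ increase for the pass transistor and dual input $\mathrm{DS}^{2}\mathrm{CFF}$ variants over the $\mathrm{S}^{2}\mathrm{CFF}$, against roughly a $67 \%$ increase for the DICE FF over the TGFF.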


Figure 3.4: Clocked Inverter DICE Flip-flop from [4]

## CHAPTER 4

## Electrical Performance Comparison

The results discussed in this chapter are from electrical simulations using the Spectre Circuit Simulator in Cadence with transistor models provided in a $14 / 16 \mathrm{~nm}$ commercial PDK. These simulations include fully extracted parasitics from the layout, and were simulated across corners at both nominal $(0.8 \mathrm{~V})$ and NTC $(0.4 \mathrm{~V})$ voltages. The nominal and worst-case timing performance results at room temperature are shown in Table 4.1 and Table 4.2, respectively. Each subsection of this chapter will provide further discussion on the CQ delay, setup time, hold time, power consumption, and area. Tables listing the timing characteristics at each performance corner can be found in the appendix of this thesis.

Most transistors in the layouts constructed for this research were minimum sized devices, but there are some notable exceptions. In the DICE FF, all four transistors in the clock buffers were increased to 1.5x minimum size. This was done because of a disproportionate slowdown seen in the DICE FF when simulating with layout parasitics. In the $\mathrm{DS}^{2}\mathrm{CFF}$, both pulldown transistors connected to node B were increased to 1.5x minimum size. This was also done for reasons relating to speed, as this lowered both the hold time and the CQ delay in the $\mathrm{DS}^{2}\mathrm{CFF}$. In order to strengthen the output of the DICE FF and the $\mathrm{DS}^{2}\mathrm{CFF}$, the output buffers in both designs use two fingered devices, effectively doubling their drive strength. Further increasing the transistor sizes yielded only modest increases in performance. Such sizing was not pursued for these layouts because the already limited layout space in $14 / 16 \mathrm{~nm}$ was further restricted by the need in DICE for many of the gates to have separate node connections for the PMOS and NMOS side.

During the design process for the $\mathrm{DS}^{2}\mathrm{CFF}$, an improvement to the original $\mathrm{S}^{2}\mathrm{CFF}$ was discovered. The $\mathrm{S}^{2}\mathrm{CFF}$ simulated for the electrical performance results presented in this chapter features this improvement, but the schematics shown in this thesis do not reflect this. The specifics of this improvement were omitted from this thesis as it is being prepared for publication by the original author of the $\mathrm{S}^{2}\mathrm{CFF}$ paper[5].

### 4.1 Clock-to-Q Delay

The Clock-to-Q or CQ delay is the amount of delay between the clock edge and the corresponding change seen at the output of the flip-flop. The CQ delay is an important factor in determining a circuit's speed. The CQ delay was measured as the delay between the clock and the output each reaching half of $V_{DD}$: 0.4 V for the $V_{\text{nom}}$ test and 0.2 V for the NTC test. For both tests, a clock frequency of 10 MHz was used with the input switching 50 ns before the positive clock edge. This was done to ensure that the switching of the input does not encroach on the setup window of the flip-flop, which could cause an increase in the CQ delay.
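The half-$V_{DD}$ measurement described above can be sketched on sampled waveforms as follows. This is only an illustration of the measurement definition, not the procedure used inside the simulator for this work; the function names and the synthetic piecewise-linear waveforms are hypothetical, and crossings are located by linear interpolation between samples.

```python
def crossing_time(times, volts, level, after=0.0):
    """First time at or after `after` where the waveform crosses `level`."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 < after or v0 == v1:
            continue                              # too early, or flat segment
        if (v0 - level) * (v1 - level) <= 0:      # level lies within segment
            t_cross = t0 + (level - v0) * (t1 - t0) / (v1 - v0)
            if t_cross >= after:
                return t_cross
    raise ValueError("waveform never crosses the requested level")


def clock_to_q(times, clk, q, v_dd):
    """Delay from the clock reaching v_dd/2 to the output reaching v_dd/2."""
    half = v_dd / 2.0
    t_clk = crossing_time(times, clk, half)
    t_q = crossing_time(times, q, half, after=t_clk)
    return t_q - t_clk


# Synthetic example in picoseconds and volts (not simulation data).
t = [0.0, 10.0, 40.0, 50.0, 100.0]
clk = [0.0, 0.8, 0.8, 0.8, 0.8]     # rising clock edge between 0 and 10 ps
q = [0.8, 0.8, 0.8, 0.0, 0.0]       # output falls between 40 and 50 ps
print(f"CQ delay: {clock_to_q(t, clk, q, v_dd=0.8):.1f} ps")
```

Searching for the output crossing only after the clock crossing avoids picking up an earlier, unrelated output transition.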

From Tables 4.1 and 4.2, it can be seen that the $\mathrm{DS}^{2}\mathrm{CFF}$ has a comparable CQ delay to the $\mathrm{S}^{2}\mathrm{CFF}$, but a lower CQ delay than the DICE FF. In both the $\mathrm{DS}^{2}\mathrm{CFF}$ and $\mathrm{S}^{2}\mathrm{CFF}$, the worst-case CQ delay occurs when the output is switching from high to low, and thus the CQ delay is heavily dependent on the time it takes for these circuits to pull down node B and net2, respectively. This can be seen by noticing that the pass transistor $\mathrm{DS}^{2}\mathrm{CFF}$ has a lower CQ delay than the dual input variant, because the dual input variant requires additional gate capacitance to be added to node B. The CQ delay for the pass transistor $\mathrm{DS}^{2}\mathrm{CFF}$ is actually smaller than for the $\mathrm{S}^{2}\mathrm{CFF}$, despite node B having a higher capacitance than net2. This is likely because transistors M9 and M10 in the $\mathrm{S}^{2}\mathrm{CFF}$ are not sized up to three fins each, as the analogous transistors in the $\mathrm{DS}^{2}\mathrm{CFF}$ are. The DICE FF has a larger CQ delay than the $\mathrm{DS}^{2}\mathrm{CFF}$ because of the added inverter delays of the local clock buffers. This can especially be seen when using minimum sized devices in the clock buffers of the DICE FF, which increases the CQ delay of the DICE FF by $25 \%$ at 0.8 V and $27 \%$ at 0.4 V for the nominal operating corner.
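The comparison above can be put in relative terms using the nominal CQ delays listed in Table 4.1. The short sketch below does only that arithmetic; the dictionary layout and the helper name are hypothetical, while the delay values are copied from the table.

```python
# Nominal CQ delays in picoseconds, copied from Table 4.1.
cq_delay_ps = {
    0.8: {"S2CFF": 38, "DS2CFFpass": 36, "DS2CFFdual": 39, "DICE FF": 57},
    0.4: {"S2CFF": 650, "DS2CFFpass": 610, "DS2CFFdual": 720, "DICE FF": 940},
}


def percent_reduction(reference, candidate):
    """How much smaller `candidate` is than `reference`, in percent."""
    return 100.0 * (reference - candidate) / reference


for v_dd, delays in cq_delay_ps.items():
    for name in ("DS2CFFpass", "DS2CFFdual"):
        saving = percent_reduction(delays["DICE FF"], delays[name])
        print(f"{v_dd} V: {name} CQ delay is {saving:.1f}% below the DICE FF")
```

For the dual input variant this works out to a nominal CQ delay roughly 32% below the DICE FF at 0.8 V and roughly 23% below it at 0.4 V.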

### 4.2 Setup Time

The setup time describes the amount of time before the clock edge during which the input must be held constant to guarantee that the correct value is stored by the flip-flop after the clock edge. This, along with the CQ delay, is a critical timing parameter for determining the minimum cycle time of pipelined logic. When the clock edge occurs too soon after the input of the flip-flop changes, this can result in either an increase in the CQ delay, or a failure to latch the data entirely. The setup time can therefore be defined as the input-to-clock delay that causes a certain percent (e.g. $10 \%$) increase in the CQ delay, but for the purpose of this project, it was defined as the minimum delay for which the flip-flop would eventually output the correct data.

| DFF ($\mathrm{V}_{\mathrm{DD}}$) | CQ Delay | Setup Time | Hold Time | Power ($\alpha=0.1$) | Power ($\alpha=1$) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ (0.8 V) | 38 ps | 20 ps | 3 ps | 1.9 µW | 2.9 µW |
| $\mathrm{DS}^{2} \mathrm{CFFpass}$ (0.8 V) | 36 ps | 62 ps | -1 ps | 3.4 µW | 7.5 µW |
| $\mathrm{DS}^{2} \mathrm{CFFdual}$ (0.8 V) | 39 ps | 29 ps | 4 ps | 3.5 µW | 7.4 µW |
| DICE FF (0.8 V) | 57 ps | 15 ps | 2 ps | 5.7 µW | 9.9 µW |
| $\mathrm{S}^{2} \mathrm{CFF}$ (0.4 V) | 650 ps | 230 ps | 36 ps | 0.22 µW | 0.34 µW |
| $\mathrm{DS}^{2} \mathrm{CFFpass}$ (0.4 V) | 610 ps | 720 ps | -10 ps | 0.38 µW | 0.86 µW |
| $\mathrm{DS}^{2} \mathrm{CFFdual}$ (0.4 V) | 720 ps | 270 ps | 20 ps | 0.40 µW | 0.85 µW |
| DICE FF (0.4 V) | 940 ps | 310 ps | 180 ps | 0.69 µW | 1.2 µW |

Table 4.1: Nominal electrical performance results

| DFF ($\mathrm{V}_{\mathrm{DD}}$) | CQ Delay | Setup Time | Hold Time |
| :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ (0.8 V) | 52 ps | 27 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFFpass}$ (0.8 V) | 48 ps | 81 ps | 0 ps |
| $\mathrm{DS}^{2} \mathrm{CFFdual}$ (0.8 V) | 52 ps | 37 ps | 4 ps |
| DICE FF (0.8 V) | 76 ps | 21 ps | 8 ps |
| $\mathrm{S}^{2} \mathrm{CFF}$ (0.4 V) | 2870 ps | 1260 ps | 94 ps |
| $\mathrm{DS}^{2} \mathrm{CFFpass}$ (0.4 V) | 2370 ps | 2890 ps | 6 ps |
| $\mathrm{DS}^{2} \mathrm{CFFdual}$ (0.4 V) | 2940 ps | 1080 ps | 20 ps |
| DICE FF (0.4 V) | 3710 ps | 1540 ps | 665 ps |

Table 4.2: Worst-case electrical timing results

To find the setup time for these circuits, simulations were performed in which the clock operated at close to 100 MHz and the input changed between every other positive clock edge. For the nominal voltage case, the input and clock had no initial delay. The clock was set to have a cycle time half a picosecond longer than the 10 ns period of a 100 MHz clock. This causes the positive clock edge to drift further away from the input edge as simulation time advances. An example simulation waveform from Cadence can be found in the appendix, and Figure 4.1 shows an abstraction of this waveform. A side effect of using this method is that the values given in Tables 4.1 and 4.2 are upper estimates for the setup time, such that the "true" simulated setup time is below the listed value by as much as 2 ps. Considering this is a simulated value, however, further simulations to find a more precise number for each circuit were not necessary. For 0.4 V, a similar method was used with a larger cycle time for the clock signal; subsequent simulations then included a delay on the clock signal and used a smaller cycle time for increased precision.
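The drifting-clock sweep can be illustrated with a short numerical sketch. It assumes an ideal 10 ns nominal period, a clock period lengthened by 0.5 ps, and an input that toggles every 20 ns starting aligned with the clock; the function name and the exact input timing are illustrative assumptions, not details of the thesis testbench. Under these assumptions, input edges of the same polarity sample the input-to-clock delay in 2 ps steps, which is consistent with the stated overestimate of up to 2 ps.

```python
from fractions import Fraction

# Assumed testbench values (illustrative; not taken from the thesis netlists).
T_NOMINAL_PS = Fraction(10_000)              # period of an ideal 100 MHz clock, in ps
T_CLOCK_PS = T_NOMINAL_PS + Fraction(1, 2)   # clock period lengthened by 0.5 ps
T_TOGGLE_PS = 2 * T_NOMINAL_PS               # input toggles once per two nominal cycles


def input_to_clock_delays(num_toggles):
    """Return the delay (ps) from each input toggle to the next positive clock edge.

    Clock edges are assumed at n * T_CLOCK_PS and input toggles at
    k * T_TOGGLE_PS, so both start aligned with no initial delay.
    """
    delays = []
    for k in range(num_toggles):
        t_input = k * T_TOGGLE_PS
        # Ceiling division: index of the first clock edge at or after the toggle.
        n = -((-t_input) // T_CLOCK_PS)
        delays.append(n * T_CLOCK_PS - t_input)
    return delays


delays = input_to_clock_delays(8)
same_polarity = delays[::2]  # every other toggle has the same direction (e.g. rising)
steps = [b - a for a, b in zip(same_polarity, same_polarity[1:])]
print([float(d) for d in delays])
print([float(s) for s in steps])
```

Exact rational arithmetic is used so that the half-picosecond offset does not accumulate floating-point error over many cycles.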

From Tables 4.1 and 4.2, it can be seen why the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ layout was created. The dual input $\mathrm{DS}^{2} \mathrm{CFF}$ has a worst-case and nominal setup time that is less than half that of the pass transistor $\mathrm{DS}^{2} \mathrm{CFF}$. Additionally, using the worst-case CQ delay + setup time as the metric for speed, the pass transistor $\mathrm{DS}^{2} \mathrm{CFF}$ is $32 \%$ slower than the DICE FF at 0.8 V and $3 \%$ slower at 0.4 V. However, the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ is $9 \%$ faster than the DICE FF at 0.8 V and $31 \%$ faster at 0.4 V, showing a clear improvement in performance. Using this same metric, the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ is $12 \%$ slower at 0.8 V and close to $3 \%$ faster at 0.4 V when compared to the $\mathrm{S}^{2} \mathrm{CFF}$.
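As a rough illustration of the speed metric, the following sketch sums the worst-case CQ delay and setup time from Table 4.2 and compares two flip-flops. The function names and the convention of expressing the difference relative to the reference flip-flop are assumptions made here for illustration. Because the table entries are rounded, values computed this way may differ somewhat from the percentages quoted in the text, which were presumably derived from unrounded simulation data or a different ratio convention.

```python
# Worst-case (CQ delay, setup time) in ps, copied from Table 4.2.
WORST_CASE_PS = {
    ("S2CFF", "0.8 V"): (52, 27),
    ("DS2CFFpass", "0.8 V"): (48, 81),
    ("DS2CFFdual", "0.8 V"): (52, 37),
    ("DICE FF", "0.8 V"): (76, 21),
    ("S2CFF", "0.4 V"): (2870, 1260),
    ("DS2CFFpass", "0.4 V"): (2370, 2890),
    ("DS2CFFdual", "0.4 V"): (2940, 1080),
    ("DICE FF", "0.4 V"): (3710, 1540),
}


def speed_metric_ps(name, vdd):
    """Worst-case CQ delay plus setup time, in ps."""
    cq_delay, setup_time = WORST_CASE_PS[(name, vdd)]
    return cq_delay + setup_time


def percent_change(name, reference, vdd):
    """Percent change of the metric relative to the reference (positive means slower)."""
    metric = speed_metric_ps(name, vdd)
    ref_metric = speed_metric_ps(reference, vdd)
    return 100.0 * (metric - ref_metric) / ref_metric


for vdd in ("0.8 V", "0.4 V"):
    for name in ("DS2CFFpass", "DS2CFFdual"):
        change = percent_change(name, "DICE FF", vdd)
        print(f"{name} vs DICE FF at {vdd}: {change:+.1f}%")
```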


Figure 4.1: Setup simulation example waveform

### 4.3 Hold Time

The hold time describes the amount of time after the clock edge that the input to a flip-flop must be held constant in order to guarantee that the output remains at the correct value. The hold time is of particular importance because, unlike the setup time and CQ delay, the clock frequency cannot be adjusted to accommodate a poor hold time. This is because hold time violations are not caused by the clock period being too short, but instead are caused when the preceding logic passes data to the input of the flip-flop too quickly. This means a circuit vulnerable to hold time violations will still be vulnerable regardless of the clock frequency, and the clock cannot be adjusted post-fabrication to avoid these hold time violations at a given operating voltage.

The hold time was found in the same way as the setup time, except with the clock signal having a cycle time slightly shorter than the 10 ns period of a 100 MHz clock instead of slightly longer, so that the positive clock edges come before the change in input. The results show that the $\mathrm{DS}^{2} \mathrm{CFF}$ has hold time advantages over the DICE FF and even the $\mathrm{S}^{2} \mathrm{CFF}$. In particular, the $\mathrm{DS}^{2} \mathrm{CFF}$ has a worst-case hold time improvement of more than 33x over the DICE FF, and almost 5x over the $\mathrm{S}^{2} \mathrm{CFF}$ at low voltage.

The DICE FF worst-case hold time is seen in the slow NMOS/fast PMOS corner, much in the same way that it is for the TGFF as described in Chapter 3. This particularly long hold time is again mostly due to the local clock buffers in the DICE FF. This can be demonstrated by noting that when using two-fin instead of three-fin clock buffers in the DICE FF, the worst-case hold time increases by $75 \%$.

### 4.4 Power Consumption

With the primary goal of low voltage operation being power savings, it is important that the $\mathrm{DS}^{2} \mathrm{CFF}$ does not incur any significant power penalty when compared to the DICE FF. Clock nodes in a circuit switch from low to high and high to low every single clock cycle, and thus consume more power than the storage nodes in a D flip-flop. This is particularly noticeable with lower activity ratios. With an activity ratio of 0.1 , the storage nodes will only switch once in 10 clock cycles, while the voltage of a clock node will change 20 times during those same 10 clock cycles. Because of the reduced number of clock nodes in the $\mathrm{DS}^{2} \mathrm{CFF}$, it sees significant power savings over the DICE FF.

Additional layouts had to be constructed to more accurately simulate power consumption. The power was measured using layouts that contained two identical flip-flops with added clock buffers to drive the external clock nodes. The external clock buffer layout was independently simulated to find its own power consumption at both voltage and frequency combinations so that the power of the buffer could be subtracted from each layout's power simulation. The power numbers listed in Table 4.1 correspond to the power consumption of one flip-flop. At 0.8 V and 0.4 V, the clock frequency was set to 1 GHz and 500 MHz respectively, and all power simulations were done at the nominal performance corner at room temperature.

In the $\mathrm{S}^{2} \mathrm{CFF}$ layout there are 6 transistors connected to a clock node, 10 in the $\mathrm{DS}^{2} \mathrm{CFF}$ layout, 12 in a TGFF, and 20 in the DICE FF layout. This leads to the $\mathrm{S}^{2} \mathrm{CFF}$ and $\mathrm{DS}^{2} \mathrm{CFF}$ consuming considerably less power than the TGFF and DICE FF, respectively. As stated earlier, this is particularly pronounced at lower activity ratios, but even at an activity ratio of 1 there are still significant power savings. The dual input $\mathrm{DS}^{2} \mathrm{CFF}$ at nominal and low voltage consumes $75 \%$ and $71 \%$ of the power that the DICE FF consumes at an activity ratio of 1.0. This further reduces to $61 \%$ and $58 \%$ at an activity ratio of 0.1. Ironically, these significant power savings could potentially lead to reduced utilization of low voltage operation, which would also improve circuit throughput. There is a significant power penalty, however, when comparing to the original $\mathrm{S}^{2} \mathrm{CFF}$ circuit. For an activity ratio of 0.1, the $\mathrm{DS}^{2} \mathrm{CFF}$ consumes $84 \%$ more power at 0.8 V and $82 \%$ more power at 0.4 V.
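The power percentages quoted in this section can be reproduced from Table 4.1 with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout, constant names, and function name are assumptions introduced here, and only the dual input $\mathrm{DS}^{2} \mathrm{CFF}$, $\mathrm{S}^{2} \mathrm{CFF}$, and DICE FF rows are included.

```python
# Per-flip-flop power from Table 4.1 in microwatts: (alpha = 0.1, alpha = 1).
POWER_UW = {
    ("S2CFF", "0.8 V"): (1.9, 2.9),
    ("DS2CFFdual", "0.8 V"): (3.5, 7.4),
    ("DICE FF", "0.8 V"): (5.7, 9.9),
    ("S2CFF", "0.4 V"): (0.22, 0.34),
    ("DS2CFFdual", "0.4 V"): (0.40, 0.85),
    ("DICE FF", "0.4 V"): (0.69, 1.2),
}
ALPHA_0P1 = 0  # tuple index of the alpha = 0.1 column
ALPHA_1 = 1    # tuple index of the alpha = 1 column


def power_percent_of(name, reference, vdd, alpha_index):
    """Power of `name` expressed as a percentage of the power of `reference`."""
    power = POWER_UW[(name, vdd)][alpha_index]
    ref_power = POWER_UW[(reference, vdd)][alpha_index]
    return 100.0 * power / ref_power


for vdd in ("0.8 V", "0.4 V"):
    vs_dice_full = power_percent_of("DS2CFFdual", "DICE FF", vdd, ALPHA_1)
    vs_dice_low = power_percent_of("DS2CFFdual", "DICE FF", vdd, ALPHA_0P1)
    vs_s2cff_low = power_percent_of("DS2CFFdual", "S2CFF", vdd, ALPHA_0P1) - 100.0
    print(f"{vdd}: {vs_dice_full:.0f}% of DICE FF (alpha=1), "
          f"{vs_dice_low:.0f}% of DICE FF (alpha=0.1), "
          f"{vs_s2cff_low:.0f}% more than S2CFF (alpha=0.1)")
```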

### 4.5 Layout Area

The layouts constructed for this work were all 9-track designs and thus were the same height. Therefore, the difference in layout area strictly corresponds to the design's horizontal length. The $\mathrm{DS}^{2} \mathrm{CFF}$ and DICE FF are 1.81x and 2.38x larger than the $\mathrm{S}^{2} \mathrm{CFF}$, meaning the $\mathrm{DS}^{2} \mathrm{CFF}$ offers a $24 \%$ area savings over the DICE FF. With the $\mathrm{S}^{2} \mathrm{CFF}$, $\mathrm{DS}^{2} \mathrm{CFF}$, and DICE FF having 24, 41, and 46 transistors respectively, the DICE FF and $\mathrm{DS}^{2} \mathrm{CFF}$ transistors are noticeably less compact. This is due to the node separation required in these designs for radiation hardness, which made it much more difficult to achieve the same transistor density. Figure 4.2 shows the layouts for the DICE FF, $\mathrm{DS}^{2} \mathrm{CFF}$, and $\mathrm{S}^{2} \mathrm{CFF}$. Both the pass transistor $\mathrm{DS}^{2} \mathrm{CFF}$ and dual input $\mathrm{DS}^{2} \mathrm{CFF}$ are the same size, so only the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ is pictured.
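The area figures can be checked with a small calculation. The sketch below uses the relative widths and transistor counts quoted in this section; the notion of "relative density" (transistors per unit of $\mathrm{S}^{2} \mathrm{CFF}$-normalized area) and the function names are illustrative assumptions rather than metrics defined in the thesis.

```python
# (Relative layout width with S2CFF = 1.0, transistor count), as quoted in the text.
LAYOUTS = {
    "S2CFF": (1.00, 24),
    "DS2CFF": (1.81, 41),
    "DICE FF": (2.38, 46),
}


def area_savings_percent(name, reference):
    """Area saved by `name` relative to `reference`, in percent."""
    return 100.0 * (1.0 - LAYOUTS[name][0] / LAYOUTS[reference][0])


def relative_density(name):
    """Transistors per unit of S2CFF-normalized area."""
    area, transistor_count = LAYOUTS[name]
    return transistor_count / area


print(f"DS2CFF area savings over DICE FF: {area_savings_percent('DS2CFF', 'DICE FF'):.0f}%")
for name in LAYOUTS:
    print(f"{name}: {relative_density(name):.1f} transistors per unit area")
```

The density ordering that results ($\mathrm{S}^{2} \mathrm{CFF}$ highest, DICE FF lowest) matches the observation that the hardened layouts are less compact.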


Figure 4.2: Layouts of the DICE FF (top), $\mathrm{DS}^{2} \mathrm{CFFdual}$ (middle), and $\mathrm{S}^{2} \mathrm{CFF}$ (bottom)

## CHAPTER 5

## $\mathrm{DS}^{2} \mathrm{CFF}$ Radiation Performance

The DICE FF is known to be radiation hardened compared to an unhardened commercial D flip-flop [20], but the differences in the radiation response of the $\mathrm{DS}^{2} \mathrm{CFF}$ make its hardness less certain. Most notably, the $\mathrm{DS}^{2} \mathrm{CFF}$ contains single-node vulnerabilities in its storage nodes while the DICE FF does not. Additionally, the sensitive node pairs of the $\mathrm{DS}^{2} \mathrm{CFF}$ are closer together than they are in the DICE FF, largely because the DICE FF is itself a larger circuit. It is therefore important to verify that the $\mathrm{DS}^{2} \mathrm{CFF}$ indeed has a large enough improvement in radiation performance over the $\mathrm{S}^{2} \mathrm{CFF}$ to justify the electrical performance and area overhead incurred.

### 5.1 $\mathrm{DS}^{2} \mathrm{CFF}$ Single-Node Vulnerabilities

Despite the DICE-like configuration of the storage nodes in the $\mathrm{DS}^{2} \mathrm{CFF}$, the circuit still contains single-node vulnerabilities. All of these single-node vulnerabilities exist because of node B, which acts as a pseudo clock node. However, the small timing window of these single-node vulnerabilities mitigates their effect on the overall error rate. Additionally, node B is the largest node in the entire $\mathrm{DS}^{2} \mathrm{CFF}$ circuit, particularly with respect to capacitance. This means that, in general, these single-node vulnerabilities will have a higher value of $\mathrm{Q}_{\text {crit }}$ compared to the vulnerable nodes in the $\mathrm{S}^{2} \mathrm{CFF}$.

The pull-up devices connected to node B can cause an upset when struck, but only within a narrow timing window. In order for P6 and P1 to be vulnerable, the clock signal has to be high, and the input into the $\mathrm{DS}^{2} \mathrm{CFF}$ has to have changed from low to high while the clock is high. Node B will only remain vulnerable while the clock is still high after the input switches. Therefore, the portion of time that this vulnerability is present is entirely dependent on the switching characteristics of the input. If the input only switches while the clock is low, then this vulnerability never occurs. This is of particular importance because for many logic pipeline designs, the input of a D flip-flop only changes when the clock is low. Figure 5.1 shows a possible timing where P1 and P6 would be vulnerable.

| Flip-Flop | Independent of Switching Input |  |  | Due to Switching Input |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | \# Vuln. Dev. | \% Time Vuln. | Min. LET ($\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$) | \# Vuln. Dev. | \% Time Vuln. ${ }^{1}$ | Min. LET ($\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$) |
| $\mathrm{S}^{2} \mathrm{CFF}$ | 14 | $100 \%$ | 1.4 | N/A | N/A | N/A |
| $\mathrm{DS}^{2} \mathrm{CFF}$ | 1 | $25 \%$ | 21 | 2 | $0-25 \%$ | 2.9 |
| DICE FF | 0 | $0 \%$ | N/A | 2 | $100 \%$ | 14 |

Table 5.1: Single-Node Vulnerabilities
While the internal storage nodes of the DICE FF are robust to single-node strikes, there are vulnerabilities in the local clock buffers that can be compared to this vulnerability in the $\mathrm{DS}^{2} \mathrm{CFF}$. When the input switches in either direction and when the clock is either high or low, a strike in the first local clock buffer in the DICE FF can mimic a clock edge and prematurely load the input data to the output, causing an upset. This upset mechanism will always be present in normal operation, as it is still a vulnerability when the clock is low or high, making the DICE FF vulnerability potentially more disruptive. The $\mathrm{DS}^{2} \mathrm{CFF}$ is similarly vulnerable to false clock edges caused by SETs, but in order for one to occur, a strike would need to affect a clock driver external to the D flip-flop itself. This external clock driver will have a much larger capacitance and device drive strength, significantly raising $\mathrm{Q}_{\text {crit }}$. A higher $\mathrm{Q}_{\text {crit }}$ makes this upset less likely to occur. Furthermore, the DICE FF will have the exact same vulnerability to these external clock driver strikes on top of its internal clock buffer vulnerability.

The P1/P6 vulnerability described above was found to have a LET upset threshold of 2.9 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. This can be compared to the LET upset threshold seen in the $\mathrm{S}^{2} \mathrm{CFF}$ of 1.4 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. Additionally, it was found that a strike occurring on the NMOS of the first clock buffer in the DICE FF had an upset threshold LET of 14 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$, whereas a strike to a finger of the external buffer driving only two $\mathrm{DS}^{2} \mathrm{CFFs}$ had an upset LET threshold of 57 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. This higher LET threshold decreases the chance of such an upset occurring.


Figure 5.1: DICE $\mathrm{S}^{2}$ CFF Single-node upset timing window

Another upset condition occurs when the clock is low and the nodes EFGH are storing 0101. Transistors N1, and to a lesser extent N6, can cause an upset when struck under this circuit condition. This is because in normal operation, node B should always be high when the clock is low. A strike that pulls node B low will turn off transistor N9, and will turn on both P7 and P10, causing nodes EFGH to store 1010 regardless of the previously held value. This vulnerability will likely have a larger effect on the radiation performance of the $\mathrm{DS}^{2} \mathrm{CFF}$ because the timing window for this upset mechanism is not as specific. Assuming the input has an equal probability of being either a 1 or a 0, this upset can occur during roughly one fourth of the circuit's operating time. However, the overall effect of this upset case is mitigated because of the large LET threshold required for it to occur. The cause of this higher LET threshold is a combination of the large size of node B in terms of capacitance and the drive strength of the pull-up transistors acting on node B. Considering that the input to a flip-flop is generally more likely to stay the same between clock cycles, nodes A and C will usually be low when EFGH is 0101. This means N1 is turned off, and P1 is turned on. P6 will be on because the clock is low. Because this upset mechanism changes the value stored in the slave instead of the master, no feedback will propagate to turn off P1, leaving node B driven by P1 and P6 for the entirety of the transient.
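The "roughly one fourth" estimate follows from counting equally likely operating states. The sketch below enumerates them under assumptions that go slightly beyond the text: a 50% clock duty cycle, the two stored values being equally likely, and independence between clock phase and stored value. The names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

CLOCK_STATES = ("low", "high")     # assumed 50% duty cycle
STORED_STATES = ("0101", "1010")   # value held on nodes EFGH, assumed equally likely


def n1_vulnerable(clock, stored):
    """The N1 strike can only upset the cell when the clock is low and EFGH holds 0101."""
    return clock == "low" and stored == "0101"


def vulnerable_fraction():
    """Fraction of equally likely (clock, stored) states in which the upset can occur."""
    states = list(product(CLOCK_STATES, STORED_STATES))
    hits = sum(1 for clock, stored in states if n1_vulnerable(clock, stored))
    return Fraction(hits, len(states))


print(vulnerable_fraction())
```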

In simulation using the bias-dependent models discussed in Chapter 2, it was found that an ion with an LET of 21 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ is required to cause an upset when striking N1. This is in comparison to a simulated LET of 2.9 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ required for the P1/P6 vulnerability, which is more similar to the LET threshold of 1.4 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ seen for the single-node vulnerabilities in the $\mathrm{S}^{2} \mathrm{CFF}$. However, if the input changes, nodes A and C will be high, turning on N1 and turning off P1. In simulation this was found to have a much lower LET threshold of 0.8 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. This could be due to the reduced restoring drive strength, as well as the fact that, when N1 is on, both its source and drain can collect charge.

### 5.2 Sensitive Node Pair Separation

The $14 / 16 \mathrm{~nm}$ layouts constructed for the $\mathrm{DS}^{2} \mathrm{CFF}$ were organized specifically for maximum spacing of sensitive node pairs in the DICE cells. In particular, nodes A, B, E, and F were separated as much as possible from nodes C, D, G, and H, respectively. Additionally, the pass transistor was separated from both nodes A and C. This was accomplished by organizing the layouts as follows: first half of master, first half of slave, second half of master, second half of slave, followed by the pass transistor from left to right. This organization is further broken down for both the pass transistor and dual input schemes in Figure 5.3. To implement this effectively, additional transistors had to be added to the original schematics. Transistors N6, N9, and P9 had to be split into multiple transistors in order to prevent the drain nodes of these transistors from requiring long metal lines. These long metal lines would have likely made the layout difficult to complete, as the layout already contains a high level of metallization and space is very limited. Counting these split devices as multiple devices along with the addition of the output buffers, the transistor count of the $\mathrm{DS}^{2} \mathrm{CFF}$ increases to 38 for the original and 41 for the dual input scheme. However, a similar transistor splitting was done in the creation of the DICE FF layout, adding 6 transistors for a total of 46. No node separation was required for the $\mathrm{S}^{2} \mathrm{CFF}$ because it is largely single-node vulnerable, rendering node separation practically worthless for radiation performance.

Table 5.2 details the node separation achieved for each sensitive node pair in both versions of the $\mathrm{DS}^{2} \mathrm{CFF}$ and in the DICE FF layouts constructed for this project. The node separation of the $\mathrm{DS}^{2} \mathrm{CFF}$ is notably higher for the dual input scheme because the pass transistor that is placed in the far right of the layout was replaced by the second input inverter, which is located in the middle of the layout, pushing the first half of the master and slave further from their respective second halves. The DICE FF node separation is larger than the $\mathrm{DS}^{2} \mathrm{CFF}$ node separation because the layout of this circuit is itself larger, naturally leaving more space between sensitive node pairs.


Figure 5.2: Schematics of the $\mathrm{DS}^{2} \mathrm{CFFpass}$ (a) and $\mathrm{DS}^{2} \mathrm{CFFdual}$ (b) Layouts


Figure 5.3: Sections of the $\mathrm{DS}^{2} \mathrm{CFFpass}$ (a) and $\mathrm{DS}^{2} \mathrm{CFFdual}$ (b) Layouts

| Node Pair | $\mathrm{DS}^{2} \mathrm{CFFpass}$ |  | $\mathrm{DS}^{2} \mathrm{CFFdual}$ |  | DICE FF |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | PMOS | NMOS | PMOS | NMOS | PMOS | NMOS |
| AC | 924 nm | 924 nm | 1022 nm | 1022 nm | 1274 nm | 1274 nm |
| BD | 924 nm | 1008 nm | 1092 nm | 1176 nm | 1344 nm | 1344 nm |
| EG | 938 nm | 938 nm | 1106 nm | 1106 nm | 1274 nm | 1274 nm |
| HF | 1344 nm | 1344 nm | 1344 nm | 1344 nm | 1330 nm | 1330 nm |

Table 5.2: Sensitive Node Pair Separation in RHBD Layouts

## CHAPTER 6

## $\mathrm{DS}^{2} \mathrm{CFF}$ Cross-section and Error Rate Estimation

To verify that these single-node vulnerabilities do not ruin the radiation performance of the $\mathrm{DS}^{2} \mathrm{CFF}$, simulations were conducted on both the $\mathrm{DS}^{2} \mathrm{CFF}$ and the $\mathrm{S}^{2} \mathrm{CFF}$ to estimate their relative SEE upset cross-sections. This was done using a simulation flow [21] that constructs a solid physical model of the circuit and identifies the location of transistors in this model. This model inherently accounts for the node spacing seen in the layout of the circuit, and each sensitive volume is estimated based on the properties of the specific technology node. The Monte Carlo Radiative Energy Deposition (MRED) [22] tool was then used to approximate a radiation environment and estimate deposited charge for individual events in the solid physical model. Spectre simulations were then completed for each deposition event using the validated bias-dependent models described briefly in Chapter 2 to determine whether or not the event induced an upset. These models were calibrated [21] to existing SEU test data at the same technology node, and this simulation flow allows for a good first-look comparison of the relative radiation hardness of circuits.

These simulations were conducted for both the $\mathrm{DS}^{2} \mathrm{CFF}$ and the $\mathrm{S}^{2} \mathrm{CFF}$. The $\mathrm{DS}^{2} \mathrm{CFF}$ layout used was, however, an earlier prototype of the pass transistor $\mathrm{DS}^{2} \mathrm{CFF}$ than is pictured elsewhere in this thesis. The pass transistor $\mathrm{DS}^{2} \mathrm{CFF}$ simulated does not include an output buffer, and transistor P6, which pulls node B high when the clock is low, is a 1.5x minimum size device instead of being minimum sized. The output buffer was added because a resistive load connected to either rail will pull node H low or high when it is supposed to be capacitively holding its value during a single-event strike, as described for DICE latches in Chapter 2. In this way, a resistive load opens up significant single-node vulnerabilities in the slave latch if there is no output buffer. P6 was changed from 1.5x minimum size to minimum size for the more recent prototype to make room for both N1 and N6 to be 1.5x minimum size, which improves the electrical performance of the $\mathrm{DS}^{2} \mathrm{CFF}$ as described in Chapter 3. Additionally, the Spectre electrical simulations were done without the parasitics extracted from the layout. This is important because when conducting simulations with the bias-dependent models directly, the parasitics increased the threshold LET required to cause an upset in the $\mathrm{DS}^{2} \mathrm{CFF}$ by a factor of around 10. For example, a strike on P6 in the $\mathrm{DS}^{2} \mathrm{CFF}$ with parasitics required an LET of 2.9 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ to upset, whereas without parasitics only an LET of 0.3 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ was required. This factor was notably smaller for the $\mathrm{S}^{2} \mathrm{CFF}$, where from the schematic a strike on M5 required an LET of 0.3 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$ to cause an upset, but with parasitics this only increased to an LET of 1.4 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. For these simulations, the data input stays at its specified value for the entire simulation, meaning the P1/P6 vulnerability described above is not accounted for. Because of these limitations, the results in this section are preliminary; however, they provide a first-look estimate of how the presence of a single-node vulnerability in the $\mathrm{DS}^{2} \mathrm{CFF}$ affects its radiation performance relative to the completely unhardened $\mathrm{S}^{2} \mathrm{CFF}$.

Figures 6.1 and 6.2 show the cross-section plot results from the simulations described above. With data=0, it is apparent that there are no single-node vulnerabilities in the $\mathrm{DS}^{2} \mathrm{CFF}$, and therefore the circuit is only vulnerable at higher tilt and roll angles, where multiple nodes are more likely to be perturbed at once. With data=1, it is apparent that there is a single-node vulnerability in the $\mathrm{DS}^{2} \mathrm{CFF}$, which is consistent with the N1 vulnerability described previously. This is apparent because with data=1, the cross-section of the $\mathrm{DS}^{2} \mathrm{CFF}$ has a much looser dependence on tilt and roll angle, and generally increases along with LET. Because it is single-node vulnerable, it is less important that any given ion can affect multiple nodes at once. This is demonstrated particularly well by the cross-section graph for the $\mathrm{S}^{2} \mathrm{CFF}$, where single-node upsets dominate the cross-section, and as such there appears to be almost no dependence on tilt angle. However, despite the single-node vulnerability in the $\mathrm{DS}^{2} \mathrm{CFF}$, its cross-section is significantly lower relative to effective LET when compared to the $\mathrm{S}^{2} \mathrm{CFF}$, especially at LET values less than 10 $\mathrm{MeV} \cdot \mathrm{cm}^{2} / \mathrm{mg}$. This is likely due to the marginally higher critical charge required to upset the $\mathrm{DS}^{2} \mathrm{CFF}$ without parasitics and the presence of multiple vulnerable transistors in the $\mathrm{S}^{2} \mathrm{CFF}$ while the clock is both high and low. The $\mathrm{DS}^{2} \mathrm{CFF}$, on the other hand, has only one vulnerable node, and for this test it is only vulnerable when the clock is low and data=1.


Figure 6.1: Cross-section vs LET of $\mathrm{DS}^{2} \mathrm{CFF}$ and $\mathrm{S}^{2} \mathrm{CFF}$ with roll $=0^{\circ}$


Figure 6.2: Cross-section vs LET of $\mathrm{DS}^{2} \mathrm{CFF}$ with roll $=90^{\circ}$

The same simulation tool that generated the cross-section versus effective LET plots was also used to generate an error rate prediction. Using Adams' $90 \%$ worst-case environment, the $\mathrm{DS}^{2} \mathrm{CFF}$ was predicted to have an error rate approximately 33x lower than that of the $\mathrm{S}^{2} \mathrm{CFF}$. These simulation results are preliminary, particularly considering they were done without parasitics and the $\mathrm{DS}^{2} \mathrm{CFF}$ layout simulated was an earlier prototype. One particularly interesting result from these simulations, however, was that the data=1 error rate in the $\mathrm{DS}^{2} \mathrm{CFF}$ was predicted to be lower than the data=0 error rate. This is despite the fact that data=1 contains a single-node vulnerability. From the preliminary sensitivity mappings, with data=0 the sensitive node pairs are significantly closer together than they are for data=1, which could explain this error rate discrepancy. These sensitivity maps can be found in the appendix of this thesis.

Due to the use of an older prototype and the lack of parasitics for these simulations, these results are preliminary. However, the more recent prototype has greater sensitive node pair spacing, and the parasitics appear to increase the LET threshold of the $\mathrm{DS}^{2} \mathrm{CFF}$ more so than they do for the $\mathrm{S}^{2} \mathrm{CFF}$. With extracted parasitics, the nodes of the $\mathrm{DS}^{2} \mathrm{CFF}$ generally had a roughly 10x increase in upset LET threshold, versus an approximately 5x increase for the $\mathrm{S}^{2} \mathrm{CFF}$. This means that the radiation hardness of both circuits would improve with parasitics, but the $\mathrm{DS}^{2} \mathrm{CFF}$ would likely improve by a larger margin with respect to LET. This and the additional node spacing seen in the dual input $\mathrm{DS}^{2} \mathrm{CFF}$ could mean that the 33x improvement in radiation hardness stated above is lower than the improvement that would be seen if these simulations were more complete.
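The approximate 10x and 5x factors can be reproduced from the two example strikes quoted earlier in this chapter (P6 in the $\mathrm{DS}^{2} \mathrm{CFF}$ and M5 in the $\mathrm{S}^{2} \mathrm{CFF}$). The sketch below is only an arithmetic check on those two data points; the dictionary and function names are hypothetical, and two devices are not a substitute for the full set of simulated nodes.

```python
# Simulated upset LET thresholds quoted in the text, in MeV*cm^2/mg:
# (without parasitics, with extracted parasitics).
LET_THRESHOLDS = {
    "DS2CFF P6": (0.3, 2.9),
    "S2CFF M5": (0.3, 1.4),
}


def parasitic_factor(device):
    """Ratio of the upset LET threshold with parasitics to that without."""
    without_parasitics, with_parasitics = LET_THRESHOLDS[device]
    return with_parasitics / without_parasitics


for device in LET_THRESHOLDS:
    print(f"{device}: threshold increases by about {parasitic_factor(device):.1f}x")
```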

## CHAPTER 7

## Conclusions and Future Work

This work developed a D flip-flop design that is more hardened to radiation when compared to commercial designs, but has electrical performance advantages over existing fully RHBD designs. This specific tradeoff would provide designers with a middle ground that allows them to improve radiation hardness without needing to accept as large a penalty in terms of electrical performance. The $\mathrm{DS}^{2} \mathrm{CFF}$ hits this target with significant electrical performance improvements over a comparable DICE FF, and an estimated 33x improvement in radiation hardness over the unhardened $\mathrm{S}^{2} \mathrm{CFF}$. In particular, the $\mathrm{DS}^{2} \mathrm{CFF}$ has worst-case improvements in setup time of $30 \%$, in CQ delay of $21 \%$, and in hold time by a factor of 33, along with a power reduction of $42 \%$, over a DICE FF at low voltage. The $\mathrm{DS}^{2} \mathrm{CFF}$ is particularly suited for low voltage operation, where circuits are more vulnerable to Process/Voltage/Temperature variations. The $\mathrm{DS}^{2} \mathrm{CFF}$ achieves this by being static, single-phase, and contention free.

Future work for this project would include both verifying the $\mathrm{DS}^{2} \mathrm{CFF}$ electrical and radiation performance on a physical test circuit, and looking into other D flip-flop designs that target this same intermediate tradeoff in electrical and radiation performance. Further simulations would be conducted to give a more thorough and up-to-date look at the radiation performance of the most current $\mathrm{DS}^{2} \mathrm{CFF}$ layout before moving to physical verification. Once verified, circuits from this targeted middle-ground design space would be available for use in RHBD libraries to give designers a range of performance options when considering radiation robustness.

## REFERENCES

[1] K. F. Galloway, A. F. Witulski, R. D. Schrimpf, A. L. Sternberg, D. R. Ball, A. Javanainen, R. A. Reed, B. D. Sierawski, and J.-M. Lauenstein, "Failure estimates for SiC power MOSFETs in space electronics," Aerospace, vol. 5, no. 3, 2018.
[2] M. A. Xapsos, C. Stauffer, T. Jordan, J. L. Barth, and R. A. Mewaldt, "Model for cumulative solar heavy ion energy and linear energy transfer spectra," IEEE Trans. Nucl. Sci., vol. 54, pp. 1985-1989, 2007.
[3] "Creme96 site. available online." https://creme.isde.vanderbilt.edu/.
[4] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," IEEE Trans. Nucl. Sci., vol. 43, pp. 2874-2878, 1996.
[5] Y. Kim, W. Jung, I. Lee, Q. Dong, M. Henry, D. Sylvester, and D. Blaauw, "A static contention-free single-phase-clocked 24T flip-flop in 45 nm for low-power applications," IEEE International Solid-State Circuits Conference, pp. 466-467, 2014.
[6] D. Binder, E. C. Smith, and A. B. Holman, "Satellite anomalies from galactic cosmic rays," IEEE Trans. Nucl. Sci., vol. 22, pp. 2675-2680, 1975.
[7] P. E. Dodd and L. W. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," IEEE Trans. Nucl. Sci., vol. 50, pp. 583-602, 2003.
[8] A. Balasubramanian, B. L. Bhuva, J. D. Black, and L. W. Massengill, "RHBD techniques for mitigating effects of single-event hits using guard-gates," IEEE Trans. Nucl. Sci., vol. 52, pp. 2531-2535, 2005.
[9] R. C. Baumann, "Radiation-induced soft errors in advanced semiconductor technologies," IEEE Trans. Device Mater. Rel., vol. 5, pp. 305-316, 2005.
[10] P. E. Dodd, F. W. Sexton, G. L. Hash, M. R. Shaneyfelt, B. L. Draper, A. J. Farino, and R. S. Flores, "Impact of technology trends on SEU in CMOS SRAMs," IEEE Trans. Nucl. Sci., vol. 43, pp. 2797-2804, 1996.
[11] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming Moore's law through energy efficient integrated circuits," Proceedings of the IEEE, vol. 98, pp. 253-266, 2010.
[12] P. E. Dodd and F. W. Sexton, "Critical charge concepts for CMOS SRAMs," IEEE Trans. Nucl. Sci., vol. 42, pp. 1764-1771, 1995.
[13] V. Joshi, R. R. Rao, D. Blaauw, and D. Sylvester, "Logic SER reduction through flip flop redesign," International Symposium on Quality Electronic Design, pp. 610-616, 2006.
[14] E. Normand, "Single-event effects in avionics," IEEE Trans. Nucl. Sci., vol. 43, pp. 461-474, 1996.
[15] O. A. Amusan, A. F. Witulski, L. W. Massengill, B. L. Bhuva, P. R. Fleming, M. L. Alles, A. L. Sternberg, J. D. Black, and R. D. Schrimpf, "Charge collection and charge sharing in a 130 nm CMOS technology," IEEE Trans. Nucl. Sci., vol. 53, pp. 3253-3258, 2006.
[16] T. D. Loveless, S. Jagannathan, T. Reece, J. Chetia, B. L. Bhuva, M. W. McCurdy, L. W. Massengill, S. J. Wen, R. Wong, and D. Rennie, "Neutron- and proton-induced single event upsets for D- and DICE-flip/flop designs at a 40 nm technology node," IEEE Trans. Nucl. Sci., vol. 58, pp. 1008-1014, 2011.
[17] J. S. Kauppila, D. R. Ball, J. A. Maharrey, R. C. Harrington, T. D. Haeffner, A. L. Sternberg, M. L. Alles, and L. W. Massengill, "A bias-dependent single-event-enabled compact model for bulk FinFET technologies," IEEE Trans. Nucl. Sci., vol. 66, pp. 635-642, 2019.
[18] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE Journal of Solid-State Circuits, vol. 27, pp. 473-484, 1992.
[19] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A dynamic voltage scaled microprocessor system," IEEE Journal of Solid-State Circuits, vol. 35, pp. 1571-1580, 2000.
[20] K. M. Warren, A. L. Sternberg, J. D. Black, R. A. Weller, R. A. Reed, M. H. Mendenhall, R. D. Schrimpf, and L. W. Massengill, "Heavy ion testing and single event upset rate prediction considerations for a DICE flip-flop," IEEE Trans. Nucl. Sci., vol. 56, pp. 3130-3137, 2009.
[21] K. M. Warren et al., "Analyzing single event effects with Monte Carlo radiation transport and electronic design automation tools," Journal of Radiation Effects Research and Engineering, 2018.
[22] R. A. Reed, R. A. Weller, M. H. Mendenhall, D. M. Fleetwood, K. M. Warren, B. D. Sierawski, M. P. King, R. D. Schrimpf, and E. C. Auden, "Physical processes and applications of the Monte Carlo radiative energy deposition (MRED) code," IEEE Trans. Nucl. Sci., vol. 62, pp. 1441-1461, 2015.

## APPENDIX

| DFF $\left(\mathrm{V}_{\mathrm{DD}}\right)$ | CQ Delay | Setup Time | Hold Time |
| :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.8 \mathrm{~V})$ | 52 ps | 27 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.8 \mathrm{~V})$ | 48 ps | 81 ps | 0 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.8 \mathrm{~V})$ | 52 ps | 37 ps | 4 ps |
| DICE FF $(0.8 \mathrm{~V})$ | 76 ps | 21 ps | 8 ps |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.4 \mathrm{~V})$ | 2870 ps | 1260 ps | 68 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.4 \mathrm{~V})$ | 2370 ps | 2890 ps | -378 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.4 \mathrm{~V})$ | 2940 ps | 1080 ps | -90 ps |
| DICE FF $(0.4 \mathrm{~V})$ | 3710 ps | 1540 ps | 609 ps |

Table 7.1: Slow NMOS/Slow PMOS corner timings
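As a reading aid for the table above, the following minimal Python sketch sums the clock-to-Q delay and setup time for each flip-flop and compares the result at the two supply voltages. This sum is an illustrative metric chosen here, not one reported in this thesis; it is the portion of a clock period consumed by the flip-flop itself on a register-to-register path, ignoring logic delay and clock skew. The dictionary name, function names, and plain-text flip-flop labels are hypothetical.

```python
# Timing values in picoseconds, transcribed from Table 7.1
# (slow NMOS / slow PMOS corner). Keys are (flip-flop, V_DD in volts);
# values are (clock-to-Q delay, setup time, hold time).
TABLE_7_1_PS = {
    ("S2CFF", 0.8): (52, 27, 4),
    ("DS2CFF pass", 0.8): (48, 81, 0),
    ("DS2CFF dual", 0.8): (52, 37, 4),
    ("DICE FF", 0.8): (76, 21, 8),
    ("S2CFF", 0.4): (2870, 1260, 68),
    ("DS2CFF pass", 0.4): (2370, 2890, -378),
    ("DS2CFF dual", 0.4): (2940, 1080, -90),
    ("DICE FF", 0.4): (3710, 1540, 609),
}


def register_overhead_ps(table, name, vdd):
    """Clock-to-Q delay plus setup time (illustrative metric, an assumption)."""
    cq_ps, setup_ps, _hold_ps = table[(name, vdd)]
    return cq_ps + setup_ps


def low_voltage_slowdown(table, name, nominal=0.8, low=0.4):
    """Ratio of the overhead at the low supply to the overhead at the nominal supply."""
    return register_overhead_ps(table, name, low) / register_overhead_ps(
        table, name, nominal
    )


if __name__ == "__main__":
    for name in ("S2CFF", "DS2CFF pass", "DS2CFF dual", "DICE FF"):
        nominal_ps = register_overhead_ps(TABLE_7_1_PS, name, 0.8)
        low_ps = register_overhead_ps(TABLE_7_1_PS, name, 0.4)
        ratio = low_voltage_slowdown(TABLE_7_1_PS, name)
        print(f"{name:12s} {nominal_ps:5d} ps @0.8 V {low_ps:5d} ps @0.4 V x{ratio:.1f}")
```

The same functions could be applied to the other process corners by transcribing their tables into dictionaries of the same shape.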

| DFF $\left(\mathrm{V}_{\mathrm{DD}}\right)$ | CQ Delay | Setup Time | Hold Time |
| :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.8 \mathrm{~V})$ | 30 ps | 17 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.8 \mathrm{~V})$ | 28 ps | 49 ps | 0 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.8 \mathrm{~V})$ | 31 ps | 23 ps | 4 ps |
| DICE FF $(0.8 \mathrm{~V})$ | 45 ps | 12 ps | 3 ps |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.4 \mathrm{~V})$ | 220 ps | 87 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.4 \mathrm{~V})$ | 220 ps | 290 ps | -20 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.4 \mathrm{~V})$ | 250 ps | 120 ps | 14 ps |
| DICE FF $(0.4 \mathrm{~V})$ | 340 ps | 86 ps | 57 ps |

Table 7.2: Fast NMOS/Fast PMOS corner timings

| DFF $\left(\mathrm{V}_{\mathrm{DD}}\right)$ | CQ Delay | Setup Time | Hold Time |
| :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.8 \mathrm{~V})$ | 38 ps | 24 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.8 \mathrm{~V})$ | 36 ps | 74 ps | 0 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.8 \mathrm{~V})$ | 38 ps | 33 ps | 4 ps |
| DICE FF $(0.8 \mathrm{~V})$ | 57 ps | 14 ps | 0 ps |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.4 \mathrm{~V})$ | 700 ps | 670 ps | 18 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.4 \mathrm{~V})$ | 630 ps | 2180 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.4 \mathrm{~V})$ | 700 ps | 800 ps | 14 ps |
| DICE FF $(0.4 \mathrm{~V})$ | 920 ps | 360 ps | -150 ps |

Table 7.3: Fast NMOS/Slow PMOS corner timings

| DFF $\left(\mathrm{V}_{\mathrm{DD}}\right)$ | CQ Delay | Setup Time | Hold Time |
| :---: | :---: | :---: | :---: |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.8 \mathrm{~V})$ | 41 ps | 18 ps | 4 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.8 \mathrm{~V})$ | 38 ps | 54 ps | 0 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.8 \mathrm{~V})$ | 42 ps | 27 ps | 4 ps |
| DICE FF $(0.8 \mathrm{~V})$ | 60 ps | 18 ps | 8 ps |
| $\mathrm{S}^{2} \mathrm{CFF}$ $(0.4 \mathrm{~V})$ | 1440 ps | 140 ps | 94 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ pass $(0.4 \mathrm{~V})$ | 1170 ps | 340 ps | -207 ps |
| $\mathrm{DS}^{2} \mathrm{CFF}$ dual $(0.4 \mathrm{~V})$ | 1380 ps | 140 ps | -44 ps |
| DICE FF $(0.4 \mathrm{~V})$ | 2070 ps | 990 ps | 665 ps |

Table 7.4: Slow NMOS/Fast PMOS corner timings

(a) Data=0

(b) Data=1

Figure 7.1: Sensitive node pairs in the $\mathrm{DS}^{2} \mathrm{CFF}$. Note that the transistor labels are not the same as seen elsewhere in this thesis.


Figure 7.2: Example waveform from setup simulations from Cadence


[^0]: Percent of time after the switch occurs but before the next positive clock edge.

