On the Use of Undefined Logic Values in Digital VLSI

Abstract

This dissertation addresses both the consequences and advantages of the fact that all digital logic implementations are analog in reality. Although, in the ideal sense, all digital signals exist at either a logic 0 or a logic 1, in practice signals generally lie between these two extreme values. There is a poorly defined zone (which we denote as ¢) near the midpoint of the logic range where a logic level is not recognizable as a 0 or a 1 beyond a reasonable doubt. Variations in design and fabrication exacerbate this uncertainty. We introduce the concept of zoned binary, which has three states {0, ¢, 1}, and arbitrarily define ¢ as the logic voltage range between 1/3 Vdd and 2/3 Vdd, although the designer is free to set the boundaries at any other levels appropriate to the specific implementation. There are many physical causes why a logic value might fall in the ¢ zone, including insufficient time to settle to a static value, wire and device defects, and noise. Current techniques focus on avoidance, or on detecting and dealing with the effects. We introduce the idea of an unknown value as information, and suggest that it can be used to enhance performance. We design and test a detector for ¢, and proceed to apply it to rudimentary practical problems such as interconnect difficulties, and to more demanding applications such as asynchronous systems and communications error correction. A new logic family, Binary Plus logic, is proposed, designed and validated, in both static and dynamic versions. Its applicability to the completion-detection requirements of asynchronous circuitry is shown, and an asynchronous stage is designed, fabricated and tested. The detection of ¢ in a received communications bit is interpreted as an error location method. It is shown that this information can be used with techniques well documented in the literature to enhance the error correction capability of existing error-control coding schemes. A 9-bit simple parity-based circuit capable of correcting received bits in the ¢ state is designed, fabricated and shown to perform properly.



Preface
Throughout my long and checkered career in the technology field, I have always been fascinated with unknowns. Whether they were statistical "missing values", "missing inputs" in neural networks, or other instances of "knowing that something was not known", I was interested in how the knowledge of their existence affected how the problem was approached, and possibly affected the validity of the results.
When taking ELE447 and ELE537 with Professor James Daly, I obtained practical, and occasionally frustrating, experience in dealing with a new kind of "unknown": logic values that were not recognizable as either a zero or a one. Trying to adjust the design of a circuit so as to minimize the time it spent in this unknown area, and thus deliver results faster, occupied serious time in design lab.
When the topic of this dissertation (among other possible topics) was suggested to me, I found that it captured my interest immediately. Although I could find no previous work directly addressing the topic, there was a reasonable body of literature in areas that would be affected by this work. It quickly became clear that unknown values in CMOS VLSI circuitry were something to be viewed in a positive way, rather than something to be avoided. Attempting by design to avoid an uncertain logic level (as I had spent so much time in the lab doing) was not at all the same thing as detecting it and using the information.
The idea of maintaining the integrity of the "unknown" state through the function of the gate led to the development of a new logic family, Binary Plus logic, and to its dynamic version, Centered Binary Plus logic. This family is equivalent to classic binary logic in terms of the functions realized, but has the added advantage (hence the "Plus") of being able to recognize and deal with inputs in an uncertain logic range in a way appropriate to the binary function implemented by the gate. While the family should certainly be useful in dealing with inputs that are genuinely unknown, it was also shown to have great potential as a completion-indicating construct, and hence had obvious use in the area of asynchronous systems. Using the Centered Binary Plus logic family, a rudimentary 4-bit ripple carry adder was designed and fabricated. Testing has shown that the adder takes advantage of many input data patterns to produce significant completion time savings.
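The gate-level behavior described above, handling an uncertain input in a way appropriate to the gate's binary function, can be sketched in a few lines. This is my own illustrative three-valued model of a NAND, not the Binary Plus circuit itself: a controlling input (a 0, for NAND) determines the output even when the other input is uncertain.

```python
UNK = '¢'  # the undefined / uncertain logic value

def bp_nand(a, b):
    """Three-valued NAND sketch: the unknown propagates only when it
    could actually change the result."""
    if a == 0 or b == 0:        # a controlling 0 fixes the output at 1
        return 1
    if a == UNK or b == UNK:    # no controlling input: stay uncertain
        return UNK
    return 0                    # both inputs are a definite 1
```

A gate with this behavior resolves `bp_nand(0, '¢')` to a definite 1, so the uncertainty never propagates further; a ¢ on the output, conversely, signals that the result is not yet determined, which is the property exploited for completion detection.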
Unknown inputs are often the result of a defect or noise in transmission of the data from another place (on the chip, within the computer, or in the world) to the circuit. Current techniques for combating communications errors focus on error-control coding. It is well established in the literature that if the location of an error can be determined independently of the error-control coding scheme, correction capabilities are greatly enhanced; for example, a distance-4 code can correct three errors whose locations are known, as opposed to only one when it has to determine for itself the location of the error. Another example is the simple 1-bit parity code, which is, by itself, capable of detecting one error but correcting none. Using uncertain logic values as error location identifiers, a simple 1-bit parity scheme can correct one-bit errors. As part of this work, a 9-bit parity-based communications input register was developed, fabricated and tested. This circuit can identify an uncertain bit, and use the parity relationship in the transmitted word to correct it.

When you know a thing, to hold that you know it; and when you do not know a thing, to allow that you do not know it: this is knowledge.

Confucius, Lun Yu, Chapter 2, Verse 17
Digital logic constitutes the heart of so many of the technological improvements that have been introduced to society during the last thirty years. Personal computer systems, hardware that employs embedded processors, controllers for all sorts of previously "manual" devices: these and many more depend on digital logic for their operation.
Digital communications have likewise increased greatly, especially during the growth explosion of the Internet during the last five years.
In today's comparatively technology-savvy world, it is likely that more people than not know words like "binary", and can identify the concept as having to do with two states, perhaps even specify it as the "zero or one idea." Binary circuitry as an electronic dichotomy, however, is an abstraction. Digital logic, as implemented in a practical sense, is not, strictly speaking, digital in nature.
Although future concepts such as quantum computers and networks [1] may be based on phenomena that can be interpreted as true dichotomies, CMOS digital fabrications today are inherently analog in implementation.

Motivation
Design rules, including Boolean algebra, assume a set of two possible values, {0, 1}, but, in reality, these values do not have an equivalent voltage level in a circuit, except in the ideal sense.
Values inside a CMOS "digital" circuit are, in actuality, a continuum. Ranging from the primary supply voltage, Vdd, down to "ground", Vss, it is easy to classify voltage levels near Vdd or Vss, but neither easy nor reliable to interpret a voltage near the midpoint of that range as "belonging" to a binary 0 or a binary 1, for, as shall be explained, the boundary between the upper and lower halves of this range is not a reliable one, varying between fabrication runs or even within a single circuit. The area near the midpoint of the range is therefore a region of uncertainty, in which a value cannot be reliably assigned to a member of the binary dichotomy. Common practice, we shall see, is to design so as to maximize the occurrence of the "easy to assign" values and minimize, insofar as possible, those which cannot be clearly assigned to one binary value or the other. In all of the classic approaches, undefined values are treated as a problem that might occur, and should be designed, tested or coded around in such a way that they will tend to be taken care of if they do occur. An undefined value, when it resolves itself into the incorrect binary value, is thus treated as merely a case of the "wrong" valid binary value. For example, an undefined value in data transmission may resolve itself to its proper, "as transmitted", value, producing no error, or to the opposite, "incorrect" value, in which case the error detection/correction capabilities of the code checker are responsible for finding the problem and dealing with it.
The classic approaches make no effort to specifically detect the presence of undefined values. In so doing, they discard information which could potentially be useful in correcting the problem.
This work will address this region of uncertainty, showing that its existence, once detected and systematically treated, can be exploited in a number of useful applications, of which two, asynchronous system design and communications error correction, will be examined more closely.

Asynchronous system design
As processors scale down in feature size, but up in speed, absolute size and complexity, new problems develop. "Global clock propagation" (getting the synchronizing, lock-step control signal everywhere on the processor at roughly the same time) is becoming a greater and greater concern. One author, in discussing the future of processor design, made the observation that "the percentage of the die that can be reached in a few clock cycles is decreasing at an alarming rate." [2] Others agree, observing that while "local" interconnect time (the time for signals to propagate within an individual logic block) is actually decreasing due to decreased feature sizes, global interconnects require new approaches to avoid being a barrier to processor speed. [3]

As more and more processors "go mobile", power consumption also becomes a critical problem. Even in non-mobile applications, power consumed must be dissipated in the form of heat, a pressing design problem in itself. In CMOS circuits, power use tends to be proportionally related to clock speed. A CMOS circuit uses power only when the charge state of a circuit is changing, and states change only as a result of the clock changing. A slower clock means less power use. This approach is already used in portable systems today, with the aim of prolonging battery charge life. But as applications require more and more speed, this method will be squeezed between the demands of the application and the need to conserve power and reduce the need for circuit cooling. Other methods need to be implemented to allow greater effective processing speed while keeping power use under control. [4]

The types of asynchronous systems which we will explore in this work eliminate the need for a global clock signal. Asynchronous concepts such as GALS (Globally Asynchronous Locally Synchronous) [5] limit synchronizing clock signals to the local logic block level. Additionally, when a local logic block "has no work to do," it stops and consumes no power. We will show (and demonstrate in practice) the applicability of detecting our uncertain logic level to GALS-based asynchronous systems.

Communications error correction
While it is easy to think of communications in the "macro" sense (between computers on a network, for example), we must also remember that much more communication occurs on a "micro" level: among circuits on a printed circuit board, or even among different processing elements within a single-chip microprocessor.
Data bits are continuously flowing inside a microprocessor, and elements such as noise and even radiation can create occasional errors. It is important that these errors be able to be (1) detected and, (2) if possible, corrected before serious system degradation occurs. [6]

Error-control coding, a method of encoding information bits in a group of bits also containing checking information that can be used to detect and sometimes correct errors, is the predominant method of protecting systems from data corruption errors. Merely detecting an error in a single bit using these techniques is a very simple task, utilizing what is known as a simple parity code. Designing and implementing a coding scheme that can correct errors is far more complex and costly, as it requires that the bit location of the error be identified. Much of the "overhead" of an error correction code goes into locating the error. It is well established in the literature that, if a method can be separately implemented (over and above the error-control coding scheme in use) to identify by other means the location of errors, the correction capability of a standard error-control code can be greatly enhanced. [6,7,8,9,10]

Detecting that a given bit is "uncertain" can be used as an error location technique. This information can then be utilized as described in the literature to provide superior correction capabilities.
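The simplest instance of this idea, an even-parity word in which one uncertain bit has been flagged, can be sketched as follows. The function name and word contents are illustrative; the fabricated 9-bit register realizes this relationship in hardware.

```python
UNK = '¢'  # a received bit the detector flagged as uncertain

def correct_erasure(bits):
    """Correct a single located-uncertain bit in an even-parity word.

    `bits` holds 0s and 1s plus at most one UNK entry. Because the
    transmitted word had even parity, the flagged bit must be whatever
    value restores the XOR of the whole word to 0.
    """
    if bits.count(UNK) != 1:
        return bits  # nothing flagged, or more erasures than parity can fix
    i = bits.index(UNK)
    parity_of_known = 0
    for j, b in enumerate(bits):
        if j != i:
            parity_of_known ^= b
    fixed = list(bits)
    fixed[i] = parity_of_known  # makes the overall parity even again
    return fixed
```

Without the location information, the same parity bit can only report that some error exists; with it, the code corrects the flagged bit outright.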

Organization of the dissertation
Chapter 2 examines the region of uncertainty, its causes and effects, and discusses the means typically used to "avoid or evade" the consequences. We also introduce the concept of the unknown as knowledge.
Chapter 3 introduces the central concept of this work, Binary Plus logic, examining it from a theoretical standpoint and proving its validity.
Chapter 4 addresses the design and implementation of the "Binary Plus" logic family. Design equations for a simple detector for undefined logic values are derived, and rudimentary applications are discussed. The overall organization of a proof-of-concept integrated circuit fabricated as part of this work is described, and specific testing data for the detector and Binary Plus logic elementary gates are presented.
Chapter 5 extends the Binary Plus logic family to its dynamic version, Centered Binary Plus logic, and shows its applicability to the design of asynchronous systems.
A simple asynchronous logic stage on the fabricated circuit is described and test data presented.
Chapter 6 considers the use of uncertain logic levels in data communications, both within a circuit and between circuits and devices. It is shown that the approach, by providing error location information, can enable limited error correction capabilities where only error detection is possible using error-control coding alone.
A parity-based uncertainty error detector/corrector implemented on the fabricated circuit is described and test data presented.
Chapter 7 summarizes the work, and suggests further research areas.

Chapter 2
Undefined logic values in digital VLSI

2.1 Defining Terms
In digital logic a binary 1 is represented by a logic level 1, which is chosen by convention to be a value nominally equal to the power supply voltage Vdd. This voltage, typically five volts in the early days of VLSI development, may still be five volts in some circuitry, but can be less than one volt in advanced circuits today. A binary 0 is represented by a logic level 0, which is chosen by convention to be a value nominally equal to power supply ground, or Vss, which we will define equal to zero volts.
In practice, values merely near Vdd are also considered to represent a binary 1, and those near Vss a binary 0. The question therefore arises: how near Vdd and Vss need signals be in order to be a binary 1 and 0, respectively? Although a simple question, it has no simple answer.
To aid in our understanding, let us define a term Vh as the nominal point of division between the two logic levels. It would be easy (and tempting) to refer to all values < Vh as binary 0 and all values > Vh as binary 1. Theoretically, as Vss to Vdd is a continuum in the physical sense, values exactly equal to Vh are of such low likelihood that they can be said to not exist, and therefore there is no ambiguity. However, logic design is an eminently practical process, and matters discussed later explain why such an ideal "point of division" is impractical and unreliable.
For practical reasons, we shall see, a "buffer" must be defined around Vh, such that all values outside the range of that buffer can be reliably counted on to default to binary 1 or binary 0. As a study of the precise size and statistical reliability of such a buffer is beyond the scope of this work, we shall err on the conservative side and divide the Vss to Vdd interval into three equal intervals, resulting in a definition of 1/3 Vdd to 2/3 Vdd for our undefined area. In short, we shall specify that, for the purpose of this work: The voltage level interval 1/3 Vdd to 2/3 Vdd shall be defined as the "uncertain", "undefined" or "invalid" logic level interval. That is, values in this voltage range shall be deemed to be neither logic level 1 nor logic level 0, but instead a level that cannot be reliably distinguished as to its proper binary value.
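The three-zone classification just defined can be written down directly. The thresholds below are the conservative 1/3 Vdd and 2/3 Vdd boundaries adopted above; a hardware detector realizes this function with comparators, so the Python here is purely an illustrative model (the 5 V supply is likewise illustrative).

```python
def zoned_binary(v, v_dd=5.0, low=1/3, high=2/3):
    """Classify an analog voltage into the zoned-binary set {0, '¢', 1}.

    Values below low*v_dd read as a 0, values above high*v_dd as a 1,
    and anything in between is the undefined value '¢'.
    """
    if v < low * v_dd:
        return 0
    if v > high * v_dd:
        return 1
    return '¢'
```

A designer preferring different boundaries, as the text allows, simply supplies different `low` and `high` fractions.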
As implied above, no claim is made that this represents an ideal or even a reasonable division of the Vss to Vdd voltage range into truly valid and invalid sub-ranges.
But it does provide a standard and a target for design and simulation of circuits illustrating the principles in this work.

What can cause undefined values?
It should be clear that one cause of an undefined value is a normal transition from one logic level to the other. These changes are clearly not discontinuous, but transition through the undefined region near Vh on their way from one valid value to the other.

Although good design practices emphasize as quick a transition as possible, it is inevitable that every circuit segment in transition will spend at least some time in this undefined region. We recognize, of course, that this "uncertainty" is a momentary, transient, phenomenon: waiting "a little longer" will result in a valid logic level (0 or 1). There can be other causes of a transient visit to Vh. But the key term here is "transient": the undefined status is dynamic. Given time, the circuit will resolve itself into a steady state valid level.
There are, however, causes that can result in a steady state undefined value. No matter how long we wait, the observed circuit value will never become a valid 0 or 1.
We'll now look at both of these circumstances.

Circuit Delays: insufficient time to "settle out"
In CMOS circuits, no power is used in steady state conditions. Despite this principle, power consumption is one of the most urgent and continuing problems in CMOS design. Power consumed must be dissipated in the form of heat, necessitating special cooling arrangements. Laptop and handheld system battery life is inversely proportional to power consumption.
Power is used only during transitions from one logic state to another, and consists primarily of the charging and discharging of parasitic capacitance, the inevitable result of placing independent conductors (and parallel elements of active devices like transistors) within very small distances of each other and of other layers of the integrated circuit. As it is a normal design goal to run the circuit as fast as possible, this translates into as many logic transitions as possible per second, and, as an undesirable side effect, into increased power consumption. In fact, to achieve theoretically maximum speed, a circuit would potentially be in transition virtually all of the time.
During this charging and discharging of parasitic capacitance, logic levels transition from one state to another. During some of this time, inescapably, circuit output levels (and, consequently, input levels to following stages) are in this undefined area near Vh. In fact, the maximum clock speed at which a circuit stage may be run is determined by how long it takes the slowest signal in the worst case to leave this area and become recognizably a logic 0 or logic 1. We see, therefore, that the need to allow sufficient time for each value to reach defined levels, to leave the undefined region near Vh and become distinguishably steady state, is the de facto determinant of the practical maximum clock speed of a circuit.
This cause of transient undefined logic levels is certainly the most common.
Races: may transition through Vh more than once

Due to differing delay times in paths within a circuit segment, the output value of that segment may transition through Vh multiple times. This condition is known variously as a "race" or as a "hazard". [11,12,13] A simple example of a circuit with an evident race is shown in the accompanying figure. In the static sense, the Output from this circuit will always be a logic level of 0.

In the dynamic sense, however, it is clear that when A changes from 0 to 1 or from 1 to 0, the change takes longer to arrive at the Exclusive OR gate through the chain of two inverters than via the direct line. Thus, there is a small period of time during which one input to the gate differs from the other, yielding a logic level of 1 at the output.

The danger posed by races has little to do with the undefined region near Vh, however. The very fact of a "spurious" transition to a valid logic level may, when the signal is used as input to a sequential circuit, result in improper operation. We will return to the matter of races later in this work.
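The glitch in the race example above can be reproduced with a unit-delay simulation. The waveform and delay values are illustrative; the circuit modeled is Output = A XOR inv(inv(A)), with the doubly inverted copy lagging the direct path by two inverter delays.

```python
def simulate_race(a_waveform, inv_delay=1):
    """Unit-delay model of Output = A XOR inv(inv(A)).

    The doubly inverted copy of A lags the direct path by
    2*inv_delay time steps; during that window the XOR sees
    unequal inputs and emits a spurious 1 (the hazard glitch).
    Assumes the waveform starts in steady state.
    """
    lag = 2 * inv_delay
    delayed = a_waveform[:1] * lag + a_waveform[:-lag]
    return [a ^ d for a, d in zip(a_waveform, delayed)]
```

For an input that steps from 0 to 1, `simulate_race([0, 0, 1, 1, 1, 1])` shows the output pulsing to 1 for two time steps, even though the static output is always 0.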

Noise
Another cause of dynamic values in the invalid range is noise [14]. Signal degradation or noise injected into a circuit may have the effect of causing logic levels to enter the undefined area near Vh. Of course, it may also cause a momentary transition to an incorrect logic value in a valid range (0 to 1 or 1 to 0). In this sense, it is similar in effect to a race. Noise may appear on the inputs to the circuit, or even on power supply lines, including Vss and Vdd. Noise is by definition a transient phenomenon.

Defects
Under normal circumstances, a properly designed integrated circuit should never exhibit static logic values near Vh. However, fabrication problems or, less frequently, failures during service may result in defects affecting signal integrity, resulting in logic levels near Vh. [15] Such faults may be hard (caused by a permanent defect) or soft (caused by a sporadic event such as a radiation particle strike). [16] One type of defect, a bridge, is most likely to occur in data transmission busses. Another type, an open, may occur anywhere, but is most likely where minimum-width features are being used. Additionally, opens or shorts may also occur in active devices (transistors) on an integrated circuit [17, 18]; we'll refer to these problems collectively as "device faults".

Bridges
In an integrated circuit, a single transmission line typically transmits a single binary value, logic 0 or 1, from one part of the circuit to another. As digital data is usually made up of several bits (a data word in modern microprocessors, for example, may be 16, 32 or 64 bits in width), several transmission lines must run in parallel to carry the full word of data. Thus is created a situation in which several transmission lines run for (comparatively) long distances in parallel paths. To minimize parasitic capacitance, these lines typically are composed of minimum-width metal features. To minimize consumption of valuable silicon "real estate", they are usually spaced apart the minimum allowed by the fabricating technology being used.
The significant proportion of space on many integrated circuits taken up by these data routing busses, combined with their minimum feature separation, results in a high feature "density" that increases the probability that a conducting defect will result in a resistive "short" between two (or more) adjacent lines, as illustrated in the accompanying figure. The effect of this resistive short between data lines 1 and 2 on logic levels V1 and V2 depends on the "intended values" of V1 and V2 (VI1 and VI2, respectively), as well as the resistance of Rs. Clearly, when VI1 = VI2, there is likely to be no ill effect. When, however, VI1 differs from VI2, the actual voltage values appearing as V1 and V2 will usually differ from their intended values, depending on the parameters of all circuitry attached to those two lines and, not insignificantly, the value of the shorting resistance, Rs. As Rs decreases, |V1 - V2| approaches 0 volts, until, if Rs becomes a "dead short" (Rs near 0), V1 and V2 will exhibit close to the same value. If circuit parameters are reasonably similar for drivers D1 and D2, as is particularly likely for a bus, the resulting values of both V1 and V2 are likely to be close to the midpoint of the logic range for low values of Rs.
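Treating each driver as a lumped resistance gives a quick estimate of the bridged-line voltages. The driver resistances and supply below are illustrative numbers, not process data; the point is only that opposite intended values plus a low Rs pull both lines toward the undefined midpoint.

```python
def bridge_voltages(v_dd, r_drv1, r_drv2, r_s):
    """Voltages on two bridged lines driven to opposite values.

    Line 1 is driven high (Vdd through r_drv1), line 2 low (ground
    through r_drv2), and r_s bridges them. The series path
    Vdd - r_drv1 - r_s - r_drv2 - ground sets the node voltages.
    """
    i = v_dd / (r_drv1 + r_s + r_drv2)  # current through the short
    v1 = v_dd - i * r_drv1              # line 1 pulled down from Vdd
    v2 = i * r_drv2                     # line 2 pulled up from ground
    return v1, v2
```

With matched 1 kΩ drivers on a 5 V supply, a dead short (r_s = 0) puts both lines at 2.5 V, squarely in the undefined zone, while a weak 10 kΩ bridge leaves each line near its intended value.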

Opens
Where a bridge defect joins lines that should be separate, an open breaks a line that should be continuous. If an open defect is large enough to cross a bus structure, it has the potential for "opening" two or more adjacent lines. Since no interconnection is made with any other bus line, all such defects can be viewed as independent. Also, not all opens are total: a small amount of conductive material may still connect the two segments, which results in the open appearing as a resistor.
In the most general case, then, an open can be diagramed as shown in Figure 2.5. Also unlike a bridge, voltage levels on the driving side of the defect are not much affected. In Figure 2.5, VI1 will not be significantly affected by the break at Ro, except, perhaps, that those areas of the circuit will operate more quickly, as a result of being disconnected from the load and parasitic capacitance associated with the circuitry "downstream" from the defect. Clearly, if Ro is low, there will be little or no impact on VO1, while as Ro grows toward infinity, VO1 approaches independence of VI1. In this case, VO1 can take on any value at all, even one outside the normal logic range; VO1 is said to be "floating".

Note that in Figure 2.7 the value for the output of the circuit in the case that A=1 and B=1 is not obvious; in the normal NAND gate in Figure 2.6, this input combination simply produces a 0. It is most likely that the "floating" value shown for A=0, B=1 will actually simply maintain the last value displayed by the output, at least until the charge dissipates, although a "race" condition could alter this. As an example of this hazard, consider a previous input/output set of A=1, B=0 / Output=1. If the transition to A=0, B=1 was not instantaneous, but instead went through the state A=1, B=1 / Output=0, then the output would likely continue to be 0 even after the input state changed to A=0, B=1.

Imperfect inputs to circuit
We have examined causes of the output of a combinational circuit falling in the undefined area around Vh. It is important to note that, when this happens, it can become a cause of the same phenomenon in later circuitry, as the output from a circuit is usually used as an input to another. Therefore, it is conceivable that an external input level presented to a circuit may fall in the area not clearly defined as a logic 0 or 1. The transfer characteristics of standard gates are, however, intended to minimize their occurrence and persistence. As the input voltage to an inverter, for example, increases from 0 to Vdd, the output remains high until the input nears (ideally) Vh, and then makes as rapid a transition as possible to a low output state. The graph in Figure 2.10 illustrates this behavior. An input signal in the area very close to Vh, then, is placed in an effective position of unstable equilibrium as it and its "descendants" pass through successive stages of circuitry. If the first "stage" it encounters doesn't convert it to a logic 0 or 1, one of the following stages almost certainly will. It is therefore virtually guaranteed that an input to a set of successive inverters and gates will eventually be effectively converted to a logic level of 0 or 1.

But what determines which value that input (or its descendants) eventually takes on, and is it reliable?

Appears random overall, but really determined by fabrication conditions
In an ideal world, any value below Vh would tend toward a logic 0, and any value greater than Vh would tend toward a logic 1. Only a signal falling exactly at the infinitely small point Vh on the continuum from Vss to Vdd would have an indeterminate fate. As the world of microelectronic fabrication is indisputably practical, rather than ideal, such is not the case. Minor differences in the process used to form elements across a wafer's surface make it inevitable that no two inverters, for example, will be truly identical. On a more general scale, differences in measured electronic parameters between different fabrication runs can provide clear proof of the inaccessibility of consistent device behavior near Vh. The graph in Figure 2.11 illustrates these differences. Furthermore, consider the simple circuit segment in Figure 2.12.
If the input A is very close in value to Vh, we cannot even be certain that the values at the outputs of the two inverters will be, or tend to, the same logic level (0 or 1). Discrepancies of this type can clearly lead to unplanned behavior by the overall circuit. Consider the more specific example in Figure 2.13.

Logic would dictate that the output from the circuit in Figure 2.13 would always be zero, as the inputs to the Exclusive OR gate would always be identical. But consider the case of an input value close to Vh (Vdd = 5 volts in this example). Due to fabrication differences, the chains of A1a through A1d and A2a through A2d may not come down on the same side of our unstable equilibrium. Although the example in Figure 2.14 is clearly contrived, it illustrates the potential dangers inherent in logic levels close to Vh.
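The divergence of the two inverter chains can be reproduced with a crude saturating-gain inverter model. The gain, supply, and per-inverter threshold offsets below are invented for illustration; real transfer curves are smoother, but the instability near Vh is the same.

```python
def inverter(v_in, v_dd=5.0, v_th=2.5, gain=10.0):
    """Crude piecewise-linear inverter: high gain around its
    threshold v_th, clamped to the supply rails."""
    v_out = v_dd / 2 - gain * (v_in - v_th)
    return min(max(v_out, 0.0), v_dd)

def chain(v_in, thresholds):
    """Pass a voltage through a chain of inverters whose thresholds
    differ slightly, as fabrication variation would make them."""
    v = v_in
    for v_th in thresholds:
        v = inverter(v, v_th=v_th)
    return v
```

Feeding exactly 2.5 V into two four-inverter chains whose first stages differ by only plus or minus 50 mV in threshold, `chain(2.5, [2.45, 2.5, 2.5, 2.5])` settles at 5.0 V while `chain(2.5, [2.55, 2.5, 2.5, 2.5])` settles at 0.0 V, so an XOR of the two outputs would read 1 instead of the expected 0.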

How are they combated?
The problems described above make it desirable, even imperative, to avoid these effects.
The specific method(s) used to minimize the effects of uncertain logic levels, of course, depend on which of the causes applies. We shall see, however, that all have one characteristic in common: the aim of minimizing (ideally, eliminating) the occurrence of these conditions.

When circuit delay is the cause
The approaches here can be summed up by the phrase, "give it more time." But in today's optimized and pipelined circuitry, there are a variety of techniques available to do this. The reader is referred to design texts [19, 20, 21, 14] for a full understanding of these methods, a few of which we will briefly summarize here.

Decrease clock rate to allow sufficient time
The simplest and most obvious approach is to slow the clock rate governing the circuit. With more time, the signals in the "problem segment" have an opportunity to "settle", resolving themselves into a set of valid logic levels. As a practical matter, however, as high a circuit speed as possible is highly desirable for competitive reasons, so other remedies are pursued when possible.

Optimize circuit elements for speed
Significant attention is paid in VLSI texts to circuit delays - their causes and the design techniques that minimize them. The primary cause of delay is the charging and discharging of the parasitic capacitance which is a natural and inevitable consequence of placing conducting and semiconducting elements in close proximity to one another. Beyond the parasitic capacitance, the effective resistance of both active (such as transistors) and passive circuit elements through which the capacitance must be charged or discharged is critical in determining the delay.
One obvious approach is to increase the size of the "driving" transistors, thereby decreasing their effective resistance and enabling more rapid charging or discharging of the capacitance of the circuit. This may be more complex than it appears, however, since increasing the size of the driving transistor(s) also increases the parasitic capacitance in the circuit "feeding" the gates of the driving transistors, resulting in a slowdown in that segment of the circuit. There is therefore a "balancing act" inherent in the optimization of circuit elements.
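The balancing act can be illustrated with a crude first-order RC model. All parameter values below (source resistance, gate capacitance per micron, driver resistance-width product, load capacitance) are assumed for illustration only; widening the driver shrinks its output delay but grows the delay of the stage that drives it, so total delay has a minimum at an intermediate width.

```python
# Hypothetical two-stage model: a fixed source drives the gate of a
# driver transistor of width w; the driver then charges a fixed load.
R_SRC = 1000.0          # ohms: resistance of the stage feeding the driver (assumed)
C_GATE_PER_UM = 2e-15   # farads of driver gate capacitance per micron of width (assumed)
R_DRV_UM = 20000.0      # ohm-microns: driver resistance = R_DRV_UM / w (assumed)
C_LOAD = 100e-15        # farads: load capacitance (assumed)

def total_delay(w):
    delay_in = R_SRC * (C_GATE_PER_UM * w)   # charging the driver's own gate
    delay_out = (R_DRV_UM / w) * C_LOAD      # driver charging the load
    return delay_in + delay_out

# Sweep widths: delay falls, bottoms out, then rises again.
widths = [1, 2, 5, 10, 20, 31.6, 50, 100, 200]
delays = {w: total_delay(w) for w in widths}
best = min(delays, key=delays.get)
```

For this model the analytic optimum is w = sqrt(R_DRV_UM * C_LOAD / (R_SRC * C_GATE_PER_UM)), about 31.6 microns here; widths well above or below it are slower.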

Redesign pipelined circuits to redistribute delays
Modern circuits are frequently pipelined to increase speed. Briefly, pipelining takes a large, long-delay circuit (with a necessarily low clock speed), such as that illustrated in block form in Figure 2, and divides it into several smaller circuits of correspondingly smaller delay. While a given piece of data takes as long (usually longer) to get through the circuit (latency), several other pieces of data are being processed through the pipeline simultaneously, resulting in a much higher throughput. In an ideal partitioning of the work of Circuit 1 above into Circuits 1a, 1b and 1c, the delay of each of the three pipeline "stages" would be one third the delay of the original, non-pipelined circuit, yielding a throughput of three times the original circuit. The attainment of such an ideal is unlikely in practice; it is nonetheless crucial to balance the pipeline stages as evenly as possible, as the maximum clock speed of the entire pipeline is determined by the worst-case delay of the slowest pipeline stage.
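The throughput arithmetic above can be made concrete with a small sketch (the 30 ns total delay and the particular stage splits are assumed example numbers, not from the text): the clock period is set by the slowest stage, so an imbalanced split forfeits part of the ideal speedup.

```python
def pipeline_metrics(stage_delays_ns, register_overhead_ns=0.0):
    """Clock period is set by the slowest stage; latency is the number
    of stages times that period (each item waits out the full clock)."""
    period = max(stage_delays_ns) + register_overhead_ns
    latency = period * len(stage_delays_ns)
    throughput = 1.0 / period   # results per ns
    return period, latency, throughput

# Unpipelined "Circuit 1" with an assumed 30 ns delay:
p1, l1, t1 = pipeline_metrics([30.0])

# Ideal split into three 10 ns stages (Circuits 1a, 1b, 1c):
p3, l3, t3 = pipeline_metrics([10.0, 10.0, 10.0])

# Imbalanced split: the 12 ns stage now limits the whole pipeline.
p_bad, l_bad, t_bad = pipeline_metrics([12.0, 10.0, 8.0])
```

The ideal split triples throughput (t3/t1 = 3) while latency stays at 30 ns; the imbalanced split is clocked at 12 ns per stage and achieves only a 2.5x speedup.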
When race (hazard) is the cause

As mentioned earlier, races are already considered a potentially serious problem, not because the circuit spends more time in an undefined state, but because the transition through it to a valid logic state (although not necessarily the desired one) can occur more than once while a final value is being arrived at.
For our purposes, primarily concerned with problems resulting from the existence of undefined logic levels, this cause is not much different from the situation discussed above where simple circuit delay is the cause. Given time, a combinational circuit subject to race conditions will eventually settle into a final, valid state. Nonetheless, we wish to make note of the fact that races in sequential or dynamic circuits can be a serious problem producing spurious results; we shall return to the subject later in this work.
When noise is the cause

Efforts in this area center on making the noise margin as great as possible. Weste and Eshraghian [14] describe noise margin as a parameter that "permits one to determine the allowable noise voltage on the input of a gate so that the output will not be affected", and go on to recommend design goals in which "the transfer characteristic should switch abruptly." A transition voltage near the midpoint of the logic range (near Vh) is also desirable: increasing the voltage at which the transition takes place may raise the "low" noise margin, for example, but it will simultaneously lower the "high" noise margin, rendering the gate asymmetrically sensitive to noise.
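The asymmetry argument follows directly from the standard noise-margin definitions. A short sketch (the voltage values are assumed examples for Vdd = 5 V, not taken from the text):

```python
def noise_margins(vol, voh, vil, vih):
    """Static noise margins per the usual definitions:
    NM_low = VIL - VOL, NM_high = VOH - VIH."""
    return vil - vol, voh - vih

# Symmetric transfer characteristic (assumed example values):
nml, nmh = noise_margins(vol=0.5, voh=4.5, vil=2.0, vih=3.0)

# Raising the transition point raises VIL and VIH together: the low
# margin grows, but the high margin shrinks by the same amount.
nml2, nmh2 = noise_margins(vol=0.5, voh=4.5, vil=2.8, vih=3.8)
```

The first case gives equal margins of 1.5 V each; shifting the transition upward yields 2.3 V on the low side but only 0.7 V on the high side, the asymmetric sensitivity described above.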
When defect is the cause

A defect differs significantly from delay-based causes in that additional time will likely do little to change the result - the final resting state of the circuit may lie in the undefined area near Vh. Approaches toward mitigating this problem vary according to whether the defect is "hard" - caused by a manufacturing defect or later permanent damage - or "soft" - a temporary result of an event such as the strike of an alpha particle. To this dichotomy we must add, for completeness, aging-based defects, such as the development of an open in a transmission line due to conductor migration, and use-caused device shorts and opens [22]. This last class of defects resembles hard errors in its permanence, but differs in that the defects were not present at time of manufacture.

Hard manufacture-time error: testing procedures must detect
It is the aim of modern testing procedures to detect hard errors as part of the manufacturing/testing process. There are many testing methods which may be used to confirm proper operation of a circuit, including boundary scan (a form of edge-pin testing), current sensing (a higher than designed supply current may indicate a short in the circuit), and methods for getting to the "innards" of a fabricated circuit prior to final processing and packaging, such as "guided probe", "electron-beam" and "bed-of-nails" testing [23, 24, 25, 26, 27, 28, 29, 17, 18, 30]. It is pointed out, however, that defects that are not strong enough to produce a logic error during testing (such as one that produces an intermediate logic level that barely resolves itself to the correct value) cannot be detected with many standard tests [31]. It has long been known to test designers working with analog circuits that digital testing techniques account only for "catastrophic faults", and not for the "out-of-specification" faults that occur as often [32]. Later work [33] pays some attention to the analog effects of such faults in digital circuit testing.

Post-manufacture error: error-checking circuitry must detect and correct
Errors that are transient, or permanent errors that develop after the circuit is put into service, must be detected while normal operations are in progress. Simple techniques, such as including a parity bit in RAM arrays, may be used, or complex fault-tolerant methods applied [34, 35, 36]. All such approaches have costs associated with them, and what may be appropriate for a restricted subset of uses (long-mission, high-reliability applications such as a space probe) may not be cost-effective for most uses.
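The parity-bit technique mentioned above can be sketched in a few lines (the 8-bit word is an arbitrary example): a single flipped bit makes the recomputed parity disagree with the stored bit, so the error is detected, though not located or corrected.

```python
def parity_bit(bits):
    """Even-parity bit for a RAM word: chosen so the total number of 1s,
    including the parity bit itself, comes out even."""
    return sum(bits) % 2

def parity_ok(bits, stored_parity):
    """True when the recomputed parity matches the stored parity bit."""
    return parity_bit(bits) == stored_parity

word = [1, 0, 1, 1, 0, 0, 1, 0]
p = parity_bit(word)        # stored alongside the word

corrupted = word.copy()
corrupted[3] ^= 1           # single-bit upset (e.g. an alpha-particle strike)
```

Note that a double-bit error would restore the parity and go undetected, which is one reason the more complex fault-tolerant methods cited above exist.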
When imperfect input is the cause

Additional circuitry must be added to detect and sometimes correct this condition.
The same fault-tolerant on-chip methods to detect error (dual-rail encoding and similar fault-tolerant methods) can be used between chips or assemblies.

An undefined value as information
We have seen that there are clear causes for undefined logic levels. Knowing that a logic level is undefined could be an indicator of one of these specific causes, dependent on the environment and circumstances. Indeed, we must consider knowledge of an undefined logic level as information; in brief, the information is that we do not know the proper value that this circuit is indicating. Yet current practice is effectively to throw this information away - to never detect it and, instead, to avoid its occurrence (and/or its effects) to the extent possible. We design as if it's not there, and do what we need to do to increase the probability that the circuit comes down on the correct side of Vh. In circuits for high-reliability applications, as we have seen earlier, the possibility of incorrect results is accepted, and complex methods for detecting and correcting them (double-rail encoding and the like) are employed where the cost can be justified. How could such knowledge (that a value is in the undefined range) be of use in CMOS circuits? Earlier in this chapter, we looked at some of the causes that would result in a value being in this range. By specifying appropriate constraints, we should be able, in a practical sense, to use the existence of the condition of uncertain value to infer the active presence of the corresponding cause. For example:
• In a tested and "known good" circuit, information that a result is undefined could be used as an indication that more time is needed to allow the result to settle, or that a circuit failure has occurred.
• During operation of a tested and "known good" circuit designed to receive data, an undefined value could be used as an indication that the received bit is in error.
The question occurs: what is required to fulfill the promise inherent in these uses of undefined logic levels as information? We can say immediately that the following clear requirements exist:
• A theoretical foundation must be established for the reliable and robust use of this information, unless it already exists in the literature.
• The condition must be detectable. There must be circuitry implemented at appropriate locations (dependent on the desired detection capabilities) to detect when a logic level is valid or invalid.
• Once a detection scheme has been implemented, appropriate circuitry must be present to make use of this new information in a meaningful and practical way.
We will consider these requirements in later chapters.

Summary
We have defined what we mean when we say a logic level is uncertain, undefined or invalid, and have adopted for the purposes of this work the range 1/3Vdd to 2/3Vdd. We have further surveyed several causes of logic levels in this uncertain range, and briefly discussed measures typically taken in response to their potential existence.
It should be clear, notably, that the design methods used to address this problem are of an "evade and avoid" character. There is no effort in the design to detect the condition; on the contrary, undefined values seem to be considered a nuisance - a form of "non-information" - and therefore something to be minimized or corrected.
We briefly discussed the potential that the detection of undefined values has for use in VLSI circuitry, based only on the inference that if the condition exists, a cause (or causes) is indicated.
In the following chapters, we shall consider not only the inference of the cause from the condition, but also other uses for this information.

Binary Plus logic
In this chapter we shall define theoretically a new logic family, which we shall call "Binary Plus" logic. This family is similar to existing binary logic in that it is based on two valid values. It enhances the binary concept by adding the detection of undefined logic levels - states in which the true binary value cannot be reliably determined - and using that information to add capabilities unavailable to pure binary logic circuitry.

The detector
We begin by specifying the requirements for a functional unit to detect the presence of an undefined value.
Specific circuitry is needed to measure the logic level on the input and determine within which range it falls, in accordance with the zones defined above. We can say that the boundaries between the zones are robust. They might vary significantly while still maintaining confidence that, for example, an input on or near the 1/3Vdd boundary will never be interpreted as a valid 1. We must, of course, remind ourselves of an earlier stated point - that there is no reason why these boundaries could not be set closer to (or farther from) Vh. Provided that they are not set excessively close to Vh, robustness should still be present. [What constitutes "excessively close", in the presence of noise and other factors, must be left to the designer of the specific implementation.]
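The required classification can be stated as a tiny reference model (behavioral only; the real detector is a circuit, and the string labels here are merely notational):

```python
VDD = 5.0

def zone(v, lo=VDD / 3, hi=2 * VDD / 3):
    """Classify a voltage into zoned binary: '0', 'phi' (undefined), or '1'.
    The boundaries default to Vdd/3 and 2Vdd/3, but, as noted in the text,
    the designer is free to set them elsewhere."""
    if v < lo:
        return "0"
    if v > hi:
        return "1"
    return "phi"
```

Note that a voltage sitting exactly on a boundary is classified as phi, never as a valid level, reflecting the requirement that a near-boundary input must never be mistaken for a valid 1 (or 0).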

Not a new "value"
It should be noted here that dividing the Vss to Vdd range into three, rather than two, zones might be seen as creating a third "value" in a heretofore two-value, or binary, scheme. Although that theme has been applied - to create ternary logic - this is not what we seek to do here. Ternary logic, in carrying three rather than two values in each signal, actually suffers from a worse form of the same uncertainty problem as CMOS binary logic circuitry. There are two zones of uncertainty in ternary logic - between the first and second values, and between the second and third.
The third zone we seek to create in the Vss to Vdd voltage continuum does not represent a new value. Instead, it establishes a signal of the existence of a condition.
This signal can be conceptualized as an interdependent yet separate signal, as shown in Table 3.2 (Implied Value and Signal). One advantage of this approach is that the two pieces of information (binary value and uncertainty signal) are encoded within one physical line. We will refer to a line carrying such a logic level to a detector as carrying Zoned Binary data. It is, in reality, no different from any line carrying binary data; it differs only in that it is used as input to a detector designed to "decode" it.

Required products of the detection process
For the purposes of this work, we will now define a signal RDY such that:

RDY = NOT Uncertain
Conceptually, RDY, when true, indicates that the input value is in one of the two valid binary zones. We also wish to define signals which indicate the presence of a valid "0" and a valid "1", effectively splitting the RDY signal into two: RDY0 and RDY1. We shall see in Chapter 4 that it is most efficient to implement and use these signals in inverted form. We therefore define signals XH and XL as follows:
• XH takes on a value of 0 only when the input to the detection circuitry is a valid 1. XH has a value of 1 under all other conditions.
• XL takes on a value of 1 only when the input to the detection circuitry is a valid 0. XL has a value of 0 under all other conditions.
We summarize in Table 3.3 the interrelationship of the signals we wish to be able to obtain from an input. We have defined signals that may be used to provide various sorts of detection of undefined values. We now proceed to develop the use of this detection information in Binary Plus logic.

Development of Binary Plus concepts
We require a more precise operational description of Binary Plus logic, which we give here:
• the logic is still two-valued, or binary, and
• logic gates are implemented so as to maintain the integrity of the additional zoned binary signal through the function of the logic gate to the output; that is, outputs become valid only when valid inputs constitute a sufficient Boolean condition for a known output, and are invalid at all other times.

A small step
We take a small step in the direction of Binary Plus logic by considering a rudimentary use of our detection capabilities as applied to binary logic. In Figure 3.1, we have placed tri-state buffers on the output(s) of the combinational circuitry that uses the inputs. Controlling the buffers with the ANDed RDY signals of our detectors, we prevent erroneous signals from being passed on to later circuits. We have satisfied, in a basic way, our requirement that the outputs be valid only when needed inputs are valid. In fact, all inputs must be valid in this case in order that outputs become valid. Clearly this is a contrived example, and an imperfect one, too, for:
• the circuit has a clear hazard, in that the output tri-state buffers will likely be enabled before the newly valid inputs have had time to flow through the combinational logic block and reach their static values,
• efficiencies are disregarded, as in many implementations not all inputs are critical to the output, depending on the values of those inputs at any given time, and
• we do not know what the outputs of the circuit will be when the tri-state buffers are not enabled, as they will be left floating.
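The ANDed-RDY gating scheme can be sketched behaviorally as follows (a floating, buffer-disabled output is modeled as None; the function names are illustrative, not from the text):

```python
def gated_block(logic_fn, zoned_inputs):
    """Rudimentary scheme of Figure 3.1: compute the combinational
    function only when every input is valid (all RDY signals true);
    otherwise the tri-state buffers are disabled and the output floats."""
    if any(z == "phi" for z in zoned_inputs):
        return None                      # buffers disabled: output floats
    bits = [int(z) for z in zoned_inputs]
    return logic_fn(*bits)

and2 = lambda a, b: a & b

out_ok = gated_block(and2, ["1", "1"])          # all inputs valid -> 1
out_floating = gated_block(and2, ["1", "phi"])  # any invalid -> floating
```

The model makes the second criticism above visible: `gated_block(and2, ["0", "phi"])` floats even though a valid 0 on an AND input already determines the output - the inefficiency that full Binary Plus logic removes.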

Complete "Binary Plus" concept
The simplistic approach to ensuring that results have been generated using valid data that we discussed in section 3.2.1 can be extended to a far more powerful implementation.
We shall first develop an understanding of what it means when we say that "outputs become valid only when needed inputs are valid." As an example, consider the truth table of a basic 2-input OR gate, as shown in Table 3.4. We note that a 1 on either input (by extension, on any input of an OR gate of more than two inputs) is a fully sufficient condition for a 1 appearing at the output.
Conversely, a logic level of 0 must be applied to all inputs of the OR gate in order for a 0 to appear at the output.

To understand how these characteristics will point toward a better understanding of the Binary Plus concept, let us first, for clarity, extend Table 3.2 by defining our notation for zoned binary, as shown in Table 3.5 .
Value   | Binary (Value) | Uncertain (Signal) | Zoned Representation
0       | 0              | 0                  | 0
1       | 1              | 0                  | 1
unknown | -              | 1                  | φ

Table 3.5: Implied Value and Signal

We will be using the notational symbol φ to represent our uncertain zone in a zoned binary representation. It is important to remember, however, that this is not a true third value, but is instead shorthand for the combination of an unknown value and a known signal.
Now we expand the truth table of Table 3.4 to include new possibilities on the input, as shown in Table 3.6. Note the behavior we have specified for the gate when one or more of the inputs is φ. When one input is 1, it matters not whether the other input is 0, 1 or φ; the other input is no longer critical. As a 1 on any input of an OR gate is a sufficient condition for a 1 on the output, we do not have to be concerned whether the other input is even known.
There are two factors that separate this example from the rudimentary application illustrated in Figure 3.1, and which therefore define the concept of Binary Plus:
• The concept of critical inputs for logic functions is taken into account in determining whether the output of the function can be considered valid. To rephrase, we take advantage of logic functions that do not require complete data for a valid output.
• The output of the function is also zoned binary.
Similarly, the Binary Plus AND gate also takes advantage of this conditional criticality of data inputs, as shown in Table 3.7.
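The criticality behavior specified for the Binary Plus OR and AND gates can be captured in a few lines (the function names and the string encoding of zoned values are this sketch's own; the behavior follows the rules stated in the text):

```python
def bp_or(a, b):
    """Binary Plus OR over zoned values {'0', 'phi', '1'}: a 1 on either
    input is sufficient regardless of the other input; both inputs must
    be valid 0s for a 0 output; anything else remains phi."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "phi"

def bp_and(a, b):
    """The dual behavior: a 0 on either input is sufficient for a 0
    output; both inputs must be valid 1s for a 1 output."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "phi"
```

Restricted to valid inputs these reduce exactly to Boolean OR and AND; the φ rows add no new value, only the propagation (or early resolution) of the uncertainty signal.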

Binary Plus logic specifications
Before formulating the method we will use to create Binary Plus gates, it will be useful to review some basic topics in VLSI CMOS design. We can then proceed to develop the basic implementation theory of the Binary Plus logic family.
In doing so, we must remember that, for inputs in the valid ranges, the operation of such gates must be exactly equivalent to that of the implemented Boolean function. For inputs not in one of the two valid ranges, the gate must behave differently: taking the logic function being implemented into account, the gate must return a valid output or an output reliably within the invalid range, preferably as close to Vh as possible.
We shall first develop the specification intuitively for understanding. We shall then more formally extend the design technique to the general or complex gate.

Complementary logic
In standard CMOS complementary circuit design, the pfet network for a logic function is the complement, or dual, of the nfet network. The arrangement of these networks is shown in Figure 3.2. The pfet network connects the output to Vdd when the inputs warrant a logic 1 output; its complement, the nfet network, connects the output to Ground (Vss) when the inputs warrant a logic 0 output. Since Binary Plus gates must exhibit a three-state output, it follows that the pfet network and nfet network in such a gate cannot be true complements of each other.
Yet the same Boolean logic function must be realized. How are we to implement a gate in the face of this seeming contradiction?

Intuitive development
In our rudimentary example in Figure 3.1 , we used the RDY signals from the detectors that receive the input for "pre-processing". We shall now rely on the other signals -XH and XL -we specified for our detector outputs. Table 3.3 is reproduced here as Table 3.11 for reference.
Input | RDY | XH | XL
0     | 1   | 1  | 1
φ     | 0   | 1  | 0
1     | 1   | 0  | 0

Table 3.11: Relationship of Output Signals

If we now consider the pfet and nfet networks separate entities whose function is to pull the output line up or down, respectively, a solution is possible. Table 3.12 specifies the conditions in the pfet and nfet networks which must be met in order that specified outputs will appear.
Table 3.12: fet Network States vs. Zoned Output

Remembering that a logic 0 input to the gate of a pfet will cause it to conduct, we wish to apply inputs of logic level 0 to the pfet network only when that level results from a valid input to the circuit - that is, when the input driving the detector is in the valid logic 1 range. Examining Table 3.11, we see that output "XH" meets this requirement. Output "XL" does not, as it will display a logic 0 when the input is either 1 or φ. Therefore, we must connect "XH" outputs to the pfet network.
Similarly, noting that a logic 1 input to the gate of an nfet will cause it to conduct, we wish to apply inputs of logic level 1 to the nfet network only when that level results from a valid input to the circuit - that is, when the input driving the detector is in the valid logic 0 range. Examining Table 3.11, we see that output "XL" meets this requirement. Output "XH" does not, as it will display a logic 1 when the input is either 0 or φ. Therefore, we must connect "XL" outputs to the nfet network. The result of this would be that the gate would tend to display the last valid 0 or 1 output level. To ensure this does not occur when an output state of φ is appropriate, we can "center" the output when it would otherwise be floating, creating the circuit shown in Figure 3. The effect of the resistors that "center" the output value in the event of a floating condition can be simulated in CMOS circuitry using weak, always-conducting transistors. A disadvantage of this approach is that these weak devices are always conducting, resulting in continuous power dissipation - not a desirable condition. We shall see in Chapter 5 how a "dynamic" approach alleviates this problem.
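The resulting gate structure can be summarized behaviorally - not at the transistor level - as a pull-up condition, a pull-down condition, and a centering default (the `bp_gate`/`bp_nand` names and the predicate-based encoding are this sketch's own abstractions of the XH-driven pfet network and XL-driven nfet network):

```python
def bp_gate(zoned_inputs, pull_up, pull_down):
    """Behavioral model of a Binary Plus gate: the pfet network pulls the
    output to 1 only when valid inputs satisfy the pull-up condition, the
    nfet network pulls to 0 only when valid inputs satisfy the pull-down
    condition, and the weak 'centering' devices hold the output at phi
    whenever both networks are off."""
    up = pull_up(zoned_inputs)       # pfet network conducting?
    down = pull_down(zoned_inputs)   # nfet network conducting?
    assert not (up and down), "the two networks must never fight"
    if up:
        return "1"
    if down:
        return "0"
    return "phi"   # floating output, centered by the weak devices

def bp_nand(zoned_inputs):
    """Binary Plus NAND: a valid 0 on any input suffices to pull up;
    every input must be a valid 1 to pull down."""
    return bp_gate(
        zoned_inputs,
        pull_up=lambda zs: any(z == "0" for z in zs),
        pull_down=lambda zs: all(z == "1" for z in zs),
    )
```

Because each network fires only on valid detector outputs, a φ on a non-critical input (here, any input once another input is a valid 0) does not delay the output, while a φ on a critical input leaves the output centered at φ.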

A note on complemented inputs
In the preceding development, we have said that the "XH" outputs of the detector should be used as inputs to the pfet network, as that output, in contrast to the "XL" output, displays a logic 0 (needed to make a pfet conduct) only when the input to the detector is a valid 1. If, however, it is desired to create a complex gate in which some of the inputs must be inverted within the gate and used in that form, the approach must be adjusted, as shown in Figure 3.5.
To make it clear why we now route the complemented "XH" outputs to the nfet network and the complemented "XL" outputs to the pfet network, we now extend Table 3.11 to show the internally complemented values of Figure 3.5. It is now obvious that the criterion for selecting the output to be used as input to the pfet network is reversed by internal complementing. That is, it is the complemented XL that takes on a value of logic 0 unambiguously, and it should therefore be used as input to the pfet network. By the same reasoning, it is the complemented XH which should be used as input to the nfet network.

Elimination of races
As no Binary Plus logic gate can display any valid logic level on its output until the inputs have reached a necessary and sufficient condition for that output (which implies that the later arrival of a previously unknown input cannot change the output), and provided that all such inputs are, in turn, zoned binary inputs conditioned by previous Binary Plus or equivalent "protected" sources, it follows - and will be proven later in this chapter - that races cannot occur in properly functioning Binary Plus logic stages.

Formal development
We begin by defining "zoned binary" more formally:

Definition 3.1 Zoned binary is the combination of a binary value and a signal, carried on the same line. The binary values are 0 and 1, and the signal, which is asserted when the value reaches an indeterminate state between 0 and 1 (the width of which is determined by the implementer), is termed φ, and represents that the value is unknown.
We now proceed to define Binary Plus logic. Before we can proceed to Binary Plus gate construction, we must define the term "similarly constructed conventional binary gate":

Definition 3.3 A "similarly constructed conventional binary gate" is a conventional binary gate whose pfet and nfet networks have been designed under the assumption that the inputs will be inverted.

Definition 3.2 A Binary Plus logic gate is one that accepts zoned binary inputs and produces a zoned binary output, that output becoming valid only when valid inputs constitute a sufficient Boolean condition for a known output.
We are now ready to define gate construction in the form of a theorem.

Theorem: A Binary Plus logic gate displays the proper zoned binary output when (a) the "high" detector outputs ("XH" for normal and "XL" for internally complemented inputs) are connected to a pfet network equivalent to the pfet network of a similarly constructed conventional binary gate generating the same Boolean function, (b) the "low" detector outputs ("XL" for normal and "XH" for internally complemented inputs) are connected to an nfet network equivalent to the nfet network of a similarly constructed conventional binary gate generating the same Boolean function, and (c) a centering method is used to ensure that floating outputs are brought to φ.

Proof: Suppose that there is a Binary Plus logic gate, constructed as described above, that does not display the proper zoned binary output. Then either (1) the pfet network is pulling the output high when the Boolean function does not specify it, (2) the pfet network is not pulling the output high when the Boolean function does specify it, (3) the nfet network is pulling the output low when the Boolean function does not specify it, (4) the nfet network is not pulling the output low when the Boolean function does specify it, (5) the output is not being set to φ when the conditions for neither a logic 1 output nor a logic 0 output are met, or (6) the output is being set to φ when sufficient conditions for a logic 1 output or a logic 0 output are met.

If (1), and since the "XH" inputs ("XL" inputs for complemented inputs) are identical to those in a similarly constructed conventional binary gate generating the same Boolean function, then the pfet network is conducting when the pfet network of a similarly constructed conventional binary gate would not. Therefore the pfet network is not equivalent to the pfet network in a similarly constructed conventional binary gate generating the same Boolean function, which contradicts the initial assumption.
If (2), and since the "XH" inputs ("XL" inputs for complemented inputs) are identical to those in a similarly constructed conventional binary gate generating the same Boolean function, then the pfet network is failing to conduct when the pfet network of a similarly constructed conventional binary gate would. Therefore the pfet network is not equivalent to the pfet network in a similarly constructed conventional binary gate generating the same Boolean function, which contradicts the initial assumption.
If (3), and since the "XL" inputs ("XH" inputs for complemented inputs) are identical to those in a similarly constructed conventional binary gate generating the same Boolean function, then the nfet network is conducting when the nfet network of a similarly constructed conventional binary gate would not. Therefore the nfet network is not equivalent to the nfet network in a similarly constructed conventional binary gate generating the same Boolean function, which contradicts the initial assumption.
If (4), and since the "XL" inputs ("XH" inputs for complemented inputs) are identical to those in a similarly constructed conventional binary gate generating the same Boolean function, then the nfet network is failing to conduct when the nfet network of a similarly constructed conventional binary gate would. Therefore the nfet network is not equivalent to the nfet network in a similarly constructed conventional binary gate generating the same Boolean function, which contradicts the initial assumption.
If (5), since a centering method is being used to set all floating outputs to φ, the output line must not be floating. If this is true, then either or both of the pfet network and the nfet network are conducting when input conditions do not warrant it. See (1) and (3) above for refutation.
If (6), since a centering method is being used that can set only floating outputs to φ, the output line must be floating. If this is true, then either the pfet network or the nfet network is not conducting when input conditions warrant it. See (2) and (4) above for refutation. Q.E.D.

Binary Plus and races
We wish to prove that combinational blocks of Binary Plus logic, as defined, are free from races (hazards). We begin by defining the input conditions that must exist. Intuitively, the requirement for a Binary Plus compatible source would be satisfied by a tri-stated binary source, in which the tri-state buffer is not enabled until the value it will release to the Binary Plus logic block is static, and which employs a circuit mechanism to ensure that floating outputs to the logic stage are "centered" to φ. The term Binary Plus evaluation phase will be defined shortly.
We proceed to define a Binary Plus logic stage and a Binary Plus evaluation phase: (4) there is a sequential dependency in the Binary Plus logic stage.
If (1), then the source of the signal is either not a Binary Plus compatible source as defined, or it is not properly functioning. Either or both of these contradict the assumptions of the theorem.
If (2), since outputs from either properly functioning Binary Plus gates or properly functioning Binary Plus compatible sources cannot exhibit the observed behavior, one of these sources is malfunctioning, which contradicts the assumptions of the theorem.
If (4), as a Binary Plus logic stage is defined to be a combinational construct, sequential operation contradicts the assumptions of the theorem. Q.E.D.

Summary
This chapter has defined, intuitively and formally, Binary Plus logic. We have seen that Binary Plus logic is a binary logic, for, by definition, when critical input values are valid, the product is identical to what it would be if processed by Boolean binary logic.
The characteristic that distinguishes Binary Plus logic from classic binary logic is its use of zoned binary, wherein there is a third state between a binary 0 and a binary 1. This state is not a new value, but instead represents a signal that the value is unknown. Binary Plus logic maintains the integrity of zoned binary through its gates, implying that an output remains in the unknown range, represented by the zoned binary notation φ, until the inputs defining a critical set for a valid output have themselves become valid binary zeros or ones.
The design characteristics of Binary Plus logic gates have been defined (and formally proven) to include connection of detector outputs to the nfet and pfet networks of the gate, while the details of detector and gate design have been left for Chapter 4. The Binary Plus logic stage has been defined, and formally shown to be immune from races.

Chapter 4 Design and Implementation
In this chapter we shall examine the design considerations and methods employed in the creation of Binary Plus gates. The design of a detector for zoned binary is discussed in detail.
We shall then proceed to briefly discuss some rudimentary applications for the concepts embodied in zoned binary and Binary Plus logic, discussions that will motivate our detailed look at the two application areas of Chapters 5 and 6.
Finally, introductory information on a fabricated proof-of-concept integrated circuit will be given, to include testing of elementary detection concepts and Binary Plus gates.

Detector design
The requirements for our detector as described in Chapter 3 allow us to draw an initial block diagram for the required detector (see Figure 4.1).
Clearly, there must be a form of voltage comparison taking place in order to determine in which zone the input exists at any moment.
While we could use a scheme that compares a logic level to two reference voltages, either supplied externally or generated internally to the integrated circuit, it was desired to use a simpler method that avoids approaches usually thought of as "analog".
Consequently, a novel method of voltage comparison was devised. Weste and Eshraghian [14] derive an expression for the transition point of the inverter (Vinv) by noting that, at the transition point, both transistors in the inverter are saturated, so that:

(βn/2)(Vinv - Vtn)^2 = (βp/2)(Vdd - Vinv + Vtp)^2

which yields:

Vinv = (Vtn + sqrt(βp/βn)(Vdd + Vtp)) / (1 + sqrt(βp/βn))    (4.1)

The design equations
Assuming for approximation purposes that Vtn = -Vtp, and setting βn = βp, they obtain Vinv = Vdd/2, establishing that, in the ideal case and with the lengths and widths of the pfet and nfet transistors in an appropriate ratio, the transition point of the inverter will be Vdd/2.
As we wish to derive an expression for the design-modifiable characteristics of the pfet and nfet transistors as a function of the desired transition voltage Vinv, we rearrange 4.1 appropriately and obtain:

βn/βp = ((Vdd + Vtp - Vinv) / (Vinv - Vtn))^2    (4.2)

as our expression for the nfet:pfet ratio of the betas of the transistors.
Our aim now becomes expressions for the size of the nfet or pfet transistors as functions of the other device's size and the nfet:pfet ratio of the betas of the transistors in 4.2 above. For clarity, we define:

R = βn/βp    (4.3)

as our term for the nfet:pfet ratio of the betas of the transistors. We recall from [14] that:

β = (με/t_ox)(W/L) = k'(W/L)

so we will also define ratio terms Gn and Gp such that:

βn = k'n · Gn, with Gn = Wn/Ln    (4.4)

and

βp = k'p · Gp, with Gp = Wp/Lp    (4.5)

Restating 4.3 above:

R = (k'n · Gn) / (k'p · Gp)    (4.6)

Obtaining expressions for Gn and Gp:

Gn = R · (k'p/k'n) · Gp    (4.7)

and

Gp = (1/R) · (k'n/k'p) · Gn    (4.8)

We have in 4.7 and 4.8 expressions for the required geometry of the nfet and pfet transistors, in terms of the required beta ratio, the geometry of the other transistor, and two fabrication parameters. If we further wish to assume equal channel lengths Ln and Lp, then referring to 4.4 and 4.5 we have:

Wn = R · (k'p/k'n) · Wp    (4.9)

and

Wp = (1/R) · (k'n/k'p) · Wn    (4.10)

Finally, eliminating our convenience terms R and G completely by remembering from 4.2 and 4.3 that:

R = ((Vdd + Vtp - Vinv) / (Vinv - Vtn))^2

we can now state complete expressions for the width of the nfet and pfet transistors:

Wn = (k'p/k'n) · ((Vdd + Vtp - Vinv) / (Vinv - Vtn))^2 · Wp    (4.11)

and

Wp = (k'n/k'p) · ((Vinv - Vtn) / (Vdd + Vtp - Vinv))^2 · Wn    (4.12)

The expressions in 4.11 and 4.12 become the design equations for sizing the active elements of an inverter to achieve a specified transition point. This makes it possible to create the detector circuit shown in Figure 4.2.
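To make the design equations concrete, they can be evaluated numerically. The sketch below uses illustrative, assumed process values (Vdd = 5 V, |Vt| = 0.8 V, k'n twice k'p to reflect the higher mobility of electrons, and an arbitrary pfet width); these are not parameters of the fabricated circuit.

```python
def beta_ratio(vdd, vtn, vtp, vinv):
    """R = betan/betap from eq. 4.2 for a desired transition point Vinv."""
    return ((vdd + vtp - vinv) / (vinv - vtn)) ** 2

def nfet_width(wp, kn, kp, vdd, vtn, vtp, vinv):
    """Eq. 4.11: nfet width from the pfet width, assuming equal channel lengths."""
    return (kp / kn) * beta_ratio(vdd, vtn, vtp, vinv) * wp

# Illustrative (assumed) process values.
vdd, vtn, vtp = 5.0, 0.8, -0.8
kn, kp = 40e-6, 20e-6
wp = 4.0  # pfet width, arbitrary units

for vinv in (vdd / 3, vdd / 2, 2 * vdd / 3):
    wn = nfet_width(wp, kn, kp, vdd, vtn, vtp, vinv)
    print("Vinv = %.2f V -> Wn = %.2f (Wp = %.1f)" % (vinv, wn, wp))
```

For the balanced case Vinv = Vdd/2 the relation collapses to Wn = (k'p/k'n)·Wp, the familiar half-width nfet; pushing the transition point down toward Vdd/3 demands a much stronger nfet, and pushing it up toward 2Vdd/3 a much weaker one.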

Binary Plus gate design
Binary Plus gate design was described and proven, in the general case, in Chapter 3. Now we will look at design as applied to a specific gate.
The detector design shown in Figure 4.2 provides the needed XH and XL signals for gate design. Consider, however, that we do not need a RDY signal, and can therefore dispense with that circuitry from our original detector design. The inverter pair alone provides us with the needed XH and XL signals.
We now can see why XH and XL were defined in Chapter 3 as inverted versions of the input: they can be easily generated through the use of inverter pairs.
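A behavioral sketch of this decoding follows. The output polarities (each inverter outputs 1 while its input is below its transition point) and the Vdd = 5 V thresholds are assumptions made for illustration, not a transcription of the fabricated detector.

```python
# Behavioral model of the detector's inverter pair. The "1.7" inverter
# switches at Vdd/3 and the "3.3" inverter at 2*Vdd/3, for Vdd = 5 V.
VDD = 5.0

def inverter(v, v_trans):
    """Idealized inverter: output 1 while the input is below its transition point."""
    return 1 if v < v_trans else 0

def detect(v):
    """Decode a voltage into zoned binary via the XH/XL inverter pair."""
    xh = inverter(v, 2 * VDD / 3)  # high-threshold ("3.3") inverter
    xl = inverter(v, VDD / 3)      # low-threshold ("1.7") inverter
    if xh == 1 and xl == 1:
        return 0       # input is a valid logic 0
    if xh == 0 and xl == 0:
        return 1       # input is a valid logic 1
    return None        # phi: input lies in the uncertain zone

print(detect(0.5))   # 0
print(detect(2.5))   # None (phi)
print(detect(4.5))   # 1
```

Under these assumed polarities the (XH, XL) pairs (1,1), (1,0) and (0,0) encode 0, ¢ and 1 respectively.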
As a first step in making use of this to design a Binary Plus OR gate, we need an expression for OR that will include inverters on the inputs. Beginning with:

f = A + B    (4.13)

we apply DeMorgan's theorem to yield:

f = ¬(¬A · ¬B)    (4.14)

The ¢ output condition can be simulated in CMOS circuitry using weak, always-conducting transistors. As we mentioned in Chapter 3, a disadvantage of this approach is that these weak devices are always conducting, resulting in continuous power dissipation, not a desirable condition. We shall see in Chapter 5 how a "dynamic" approach alleviates this problem.
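Independent of the CMOS realization, the semantics a Binary Plus OR must obey follow from the critical-set definition of Chapter 3: any input at 1 is by itself a critical set for an output of 1, while an output of 0 requires every input to be a valid 0. A minimal sketch, with Python's None standing in for ¢:

```python
PHI = None  # the unknown value, written as the cent sign in the text

def bp_or(*inputs):
    """Binary Plus OR: the output becomes valid as soon as a critical
    input set is valid; otherwise it remains phi (unknown)."""
    if any(x == 1 for x in inputs):
        return 1      # one valid 1 decides the output
    if all(x == 0 for x in inputs):
        return 0      # all inputs must be valid 0s
    return PHI        # no critical set is valid yet

print(bp_or(PHI, 1))   # 1
print(bp_or(0, 0))     # 0
print(bp_or(0, PHI))   # None (phi): output stays unknown
```

Note that the gate can produce a valid 1 even while another input is still unknown, which is exactly the data-dependent early completion exploited in Chapter 5.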

Internal versus external complemented inputs
In Chapter 3 we discussed the internal gate wiring procedure to be used if internally complemented inputs were to be used in a complex gate. The reader will recall that the conclusion was that the complemented XH should be used as input to the nfet network and the complemented XL should be used as input to the pfet network, the opposite of their uncomplemented signals.
It should be clear that if we choose to complement outputs externally to the Binary Plus gate, then as far as the gate internals are concerned, all inputs are non-complemented; that is, there is no need to connect signals from a complemented "XL" inverter pair output to the pfet network nor those from a complemented "XH" inverter pair output to the nfet network.
The decision to do this, rather than to complement internally, involves trade-offs that must be considered by the implementer. For example, how many other Binary Plus gates require the same complemented inputs? Such external complementing also increases the number of inverter pairs at the input to the complex gate, as much as doubling them. Additionally, one must bear in mind that any external inverters in such a scheme must be Binary Plus inverters, which maintain the integrity of the zoned binary value and signal through the inversion, as shown in Table 4.2, whereas complementing inside a Binary Plus gate ("downstream" of the inverter pairs) requires only a pair of standard inverters for each input to be complemented.

Rudimentary applications
Earlier in this chapter we provided a design approach for detecting unknown values and, in combination with material presented in Chapter 3, showed how such detection could be used to implement the Binary Plus logic family.
We shall now consider some additional and rudimentary applications of this knowledge our detection capability enables. It is not suggested that these are demanding or sophisticated uses for this technology, nor that they in any way constitute an exhaustive list of such uses. They are meant to be illustrative of what can be done with almost trivial applications of the information developed by "decoding" a binary line as a zoned binary source.
Information need not be used to its complete advantage. Sometimes a minor implementation of a concept can lead to "enough" improvement with minimal expenditure in design and space. So it is with the concept of using the fact of uncertain logic levels to solve problems or improve performance. Engineering is, above all, a practical process. It is not desirable to implement more of a costly enhancement than is needed to achieve the required level of performance.
In Chapters 5 and 6 we shall study more demanding applications.

Warnings of potential problems
Sometimes it may be adequate to provide warning of circuit inputs that lie in the uncertain zone. Simple indicator lights, readable outputs, or generation of an interrupt to a processor: all are possibly useful features in given circumstances, and could be implemented as desired by the designer. One could even envision a case in which more than one zoning could be performed on the same input, as in Figure 4.8.

When we introduced the detector described in Section 3.1, our motivation was the detection of naturally (or unnaturally) occurring undefined logic levels. Chapter 2 was partially devoted to describing the possible sources of undefined logic levels; our aim in designing a detector was to infer the activity of one or more of these causes.

Passive encoding
It may be desirable, for example, to determine that a connector has become detached, or that a cable has been cut. Functionally equivalent to an "open", as discussed in Section 2.2.1, these occurrences would typically result in "floating" inputs, which, we mentioned, might take on a value in the undefined zone, but which might also take on any other value, conceivably even one outside the Vss to Vdd range. Therefore this situation, like any open, cannot be reliably detected. However, if we take design action to prevent a floating value, and indeed to force a value in the undefined zone in this circumstance, we then have a reliably detectable condition, as in Figure 4.9.
What we have done here is explicitly encode the ¢ state onto the line, ensuring that, in the event of an open on that line, the condition will be reliably detected. It should be noted that the resistors shown in Figure 4.9 need not even be particularly accurate, depending on the size of the uncertain zone.
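A quick numerical check of this idea confirms how tolerant the scheme is: even badly mismatched pull resistors leave an open line well inside the Vdd/3 to 2Vdd/3 zone. The resistor values below are illustrative, not those of Figure 4.9.

```python
# Passive encoding of phi: equal(ish) pull-up/pull-down resistors force a
# disconnected (open) line to a mid-range, reliably detectable voltage.

def open_line_voltage(vdd, r_up, r_down):
    """Voltage a floating line settles to through the resistor divider."""
    return vdd * r_down / (r_up + r_down)

def in_phi_zone(v, vdd):
    """Zone boundaries used throughout this work: Vdd/3 and 2Vdd/3."""
    return vdd / 3 < v < 2 * vdd / 3

vdd = 5.0
for r_up, r_down in [(100e3, 100e3), (100e3, 150e3), (150e3, 100e3)]:
    v = open_line_voltage(vdd, r_up, r_down)
    print("R_up=%3.0fk R_down=%3.0fk -> %.2f V, phi: %s"
          % (r_up / 1e3, r_down / 1e3, v, in_phi_zone(v, vdd)))
```

Even a 50% mismatch between the two resistors (the 100k/150k cases) still lands the open line in the detectable zone, which is why the text notes the resistors need not be particularly accurate.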

Active encoding
Consider another example illustrative of how the encoded nature of zoned binary can be put to work, this one active, in contrast to the passive encoding described above. This is combined with the passive resistor pair from the previous example to yield a "fail safe" sensor. The design illustrated protects against:
• an out-of-range condition in the sensing element,
• a broken cable,
• a disconnected cable at either end, and
• possibly, a power failure at the sensor.

Other encoding
The two examples given are rudimentary. The concept of using the detector as a zoned binary decoder can be useful in any application in which it is desirable to

Introduction to the proof-of-concept circuit
It was desired to test the concepts developed in Chapters 3 and 4, as well as the applications that will be discussed in Chapters 5 and 6, by designing and fabricating a proof of concept circuit addressing some of these areas.
In this chapter we will consider an overall view of this circuit and testing setup, and examine and test in detail elementary zoned binary detection and Binary Plus gates implemented as part of the circuit.

Overall view
It was desired to test as many concepts as possible within the constraints of the space afforded by a 4 mm^2 chip. As there are many different applications of the concepts that are the subjects of this work, it was decided to implement different concepts as independent subsets of circuitry. It was also decided to bypass the testing of trivial applications (such as those discussed in Section 4.3.1) in favor of the more complex areas of asynchronous systems (Chapter 5) and communications applications (Chapter 6).

Experiments implemented
It was decided to implement the following circuits:
• the dual inverters (1/3 Vdd and 2/3 Vdd) used to detect the presence of levels in the uncertain zone
• a small collection of Centered Binary Plus logic elementary gates
• an asynchronous "stage" whose input set sensitivity could be measured
• a circuit illustrating the concept's use in communications

Dual inverters: This component was included in order to test the proper operation of the inverters at inputs of Vss, Vdd/2 and Vdd. One input pin and two output pins ("3.3" inverter output and "1.7" inverter output) were required to interface this component to external test circuitry.

Elementary gates: If independently implemented, these gates would have required 10 input pins and four output pins. In the interest of conserving pin availability for other circuitry, it was decided that these gates would partially share inputs. There are three input pins used for the two 3-input gates, and two input pins used for the two 2-input gates, for a total of five input pins.
Asynchronous stage: To demonstrate the varying speed of a circuit whose completion time is sensitive to the input pattern, a 4-bit ripple-carry adder, implemented in Centered Binary Plus logic, was chosen. [The concept of Centered Binary Plus logic will be covered in Chapter 5.] No effort was made to make this design space-efficient; instead, standard Centered Binary Plus logic AND and OR gates were used to construct the full adders that make up this design.
The implemented asynchronous stage requires eleven inputs and nine outputs; these will be described in detail in Chapter 5.
Communications application: It was decided to implement a 9-bit simple parity-based checker/corrector, using the concepts developed in Chapter 6. The primary circuitry was developed as a bit-sliced construct containing, in each bit, all circuitry necessary for detection, dual parity checking and output multiplexing.
This circuit requires ten inputs and eleven outputs; these will be described in detail in Chapter 6.

Layout
The circuit was implemented on a 2. For reliability, and to ensure an adequate supply of power, at least two pads are customarily allocated for each supply voltage; this would lead to a requirement for 6 power supply pins, for an overall count of 59 pins.

Pin conservation
Two methods were used to reduce the number of required physical pins.
Input sharing: As the 9-bit Parity Checker/Corrector was an entirely separate experiment, there was no need to be able to control its inputs separately from the nine data inputs of the Ripple-Carry Adder. Nine input pins were therefore shared between these two experiments. Additionally, the input to the Binary Plus inverter pair was shared with one of the inputs to the 2-input Centered Binary Plus logic gates. These economies saved 10 pins.
Output pin sharing: Again, as for input pins, the fact that the experiments on this circuit were functionally separate and independent enabled the sharing of output pins. This, of course, required that multiplexers be used to select which of the two possible outputs a pin would relay to the external world. This requirement meant that we would have to allocate a new pin for multiplexer control. But by doing so, it was possible to multiplex eleven outputs from the 9-bit parity checker/corrector with outputs from the adder and the Binary Plus dual inverters. 21 pins were thus made "doubly useful", providing a surplus of two pins in the 40-pin package. One of these was allocated to output multiplexer control, and the other was used as a diagnostic check on the output multiplexing circuit.

Test board
A test board was constructed to allow efficient input of allowable values and measurement of outputs. Figure 4.17, also included at the end of this chapter, depicts the schematic of this board.

Binary Plus component experiments
The purpose of these circuits was to verify the proper operation of the inverter pair that decodes the three-state zoned binary into 0, ¢ and 1, and to check the operation of two- and three-input Centered Binary Plus logic AND and OR gates.

Circuit descriptions

Binary Plus inverter pair
This inverter pair is implemented as shown in Figure 4.12. Outputs XH and XL are routed directly to the appropriate output multiplexers.
2- and 3-input Centered Binary Plus logic OR gates
The output from these gates was routed to multiplexers for output.

2- and 3-input Centered Binary Plus logic AND gates
The AND gates are implemented in a similar manner to the OR gates discussed in the previous section.

Binary Plus inverter pair
Testing of the inverter pair was straightforward. Logic level inputs of 0, ¢ and 1 were applied to the input, and the output observed as shown in Table 4.

2-input gates
All possible input combinations were tested for the 2-input AND and OR gates.
Results were as shown in Table 4.7.
The measurements were not as predicted. Those entries in Table 4.7 marked with an "*" should have been an output of ¢. It is likely that this is due to an experimental design oversight on the part of the author.
As designed, the output from each circuit is routed to a multiplexer, the reason for which was discussed earlier in Section 4.4.3, and from there to strong output pad buffers. The multiplexers are constructed from pass switches, and are less likely than other components to alter the transmitted voltage level. The buffers are another matter. In the manner discussed in Section 2.2.2, values in the range of ¢ are highly likely to be transformed to a logic level 0 or logic level 1 by the two powerful, cascaded inverters that make up the buffer.
We can note in advance, however, that the test results for the adder discussed in the next chapter provide evidence that these 2- and 3-input AND and OR gates function correctly.

3-input gates
All possible input combinations were tested for the 3-input AND and OR gates. The same difficulty with the output buffers converting ¢ outputs to valid 0's and 1's was again noted.

Summary
In this chapter we developed the design, including design equations, for the zoned binary detector, and illustrated specific designs for Binary Plus gates, the theory for which had already been covered in Chapter 3.
We examined a few rudimentary applications for the concepts involved, and addressed an important point: once a method of detecting ¢ has been created, originally motivated by the desire to detect a condition created by problems in the circuit or timing inadequacies, it can also be used in conjunction with methods that purposely set the logic level on a line to ¢. Binary Plus concepts can be used in either mode, although our definition of a Binary Plus logic stage in Chapter 3 was based around the latter mode. Finally, we provided an overview of a circuit fabricated to test the concepts in this work, and provided specific details and testing data appropriate to the material covered in this chapter. Circuit details and testing data appropriate to concepts discussed in Chapters 5 and 6 will be covered in those chapters.

Chapter 5 Centered Binary Plus logic
In this chapter we shall further develop the Binary Plus concept to include its dynamic version, Centered Binary Plus logic, and that version's potential for use in asynchronous systems. We will look at gate design for Centered Binary Plus logic, and how gates can be combined into combinational blocks of differing granularity.
We shall also examine asynchronous circuitry implemented on the proof-of-concept circuit, and describe the testing procedure and its results.
We begin by very briefly reviewing the operation of "dynamic logic" in VLSI CMOS circuits, and reviewing in more detail the principles behind asynchronous systems.

Static versus dynamic logic in VLSI design
Static logic designs in CMOS typically use complementary logic, as described in Chapter 3. Complementary pfet and nfet networks "pull up" or "pull down" the output line. In dynamic logic design, the pfet network is replaced by a precharge phase, during which a pfet device precharges the output to a logic 1 (Vdd). Then the nfet network is given an opportunity to pull down the output line during an evaluate phase. If the nfet network does not conduct, the output line remains charged to a logic 1. A moment's thought will reveal the sensitivity of dynamic logic to timing, specifically to races. If the proper final value of an output is 1, but a race exists in the circuit such that the nfet network momentarily conducts, then the output precharge will be dissipated, and the output will take on a value of 0. Even should the race condition then be resolved, and the nfet network cease conducting, the damage has been done: there is no mechanism that will "pull up" the output, as there is in a static gate (the pfet network). So the consequence of a race to a dynamic circuit can be very serious, and must be guarded against carefully.
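The irreversibility just described can be captured in a toy model: once the nfet network conducts during the evaluate phase, even momentarily, the precharged output is lost for the remainder of the cycle. This is a behavioral sketch, not a circuit simulation.

```python
# Toy model of a dynamic CMOS gate's vulnerability to races. The output
# starts precharged to 1; any instant at which the nfet network conducts
# during evaluate discharges it, and nothing can pull it back up.

def dynamic_evaluate(nfet_conducts_trace):
    """nfet_conducts_trace: per-timestep booleans during the evaluate phase."""
    out = 1  # precharged to logic 1
    for conducts in nfet_conducts_trace:
        if conducts:
            out = 0  # charge dissipated; there is no pull-up network to restore it
    return out

# A glitch: the nfet network conducts briefly, then the race resolves.
print(dynamic_evaluate([False, True, False, False]))  # 0 -- damage is permanent
print(dynamic_evaluate([False] * 4))                  # 1 -- no conduction, output holds
```

The model makes the asymmetry plain: a static gate's final output depends only on the final input state, while a dynamic gate's output depends on the entire history of the evaluate phase.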
Weste and Eshraghian [14] cover dynamic logic design and considerations in some detail, and can be referred to for a fuller understanding, if the reader so desires. Such an understanding is not required for comprehension of this work, as what has been mentioned above should be adequate to our development of Centered Binary Plus logic later in this chapter.

Overview
Most circuit design today is synchronous: data is clocked through sequential circuits (which contain combinational blocks of logic) by a master clock signal. In Section 2.2.1, we discussed the fact that the delay in the slowest block of circuitry was the determining factor in how fast the system, governed by the system clock, could be run. We also made reference in Section 2.2.3 to the criticality of balancing pipeline stage delays so as to allow the master clock governing the pipeline to run at the maximum rate.
A different design philosophy aims to eliminate the need for an all-governing system clock, which in turn can reduce the impact of delays in individual stages on the overall system speed. This approach, called "asynchronous systems", studies many different forms of systems that do without a global clock signal.
One form, referred to as "wave pipelining" [20], relies on carefully balanced signal transmission paths to enable the sending through of data in waves; careful attention to design is needed to ensure that the results from one wave are distinguishable from those in preceding or following waves.
Another approach to asynchronous systems seeks to capture many of the advantages of avoiding a global system clock, while reducing the sensitivity to delay tuning characteristic of wave-pipelining circuits. This is referred to as Globally Asynchronous Locally Synchronous design, or GALS [5]. In a GALS system, each local block runs independently. One set of data is handled by a block at one time, and no further data is admitted to the stage until completion has been detected and the output data latched. A given logic block may complete with one time delay for one set of data, and complete with a different delay for a different set of data. Statistically, the delay attributable to the block is therefore the mean of the delays over a potentially wide range of data input sets, instead of the maximum of those delays over all possible input sets, as would be the case for a globally clocked design.
In Section 2.2.1 we mentioned that increased power consumption is the cost of running a circuit as fast as possible, and explained that power is consumed by transitions from one logic state to another. Self-clocked schemes such as GALS provide one way to reduce power consumption. An independent stage, not governed by a global clock, will consume power only when being used. A segment of circuitry not needed will never operate, and will therefore not contribute to power consumption. [5] Binary Plus logic clearly has the potential to contribute to a completion-signaling scheme for such systems; Binary Plus logic, we shall see, has the necessary characteristics as a byproduct of its design.

Implications for input set sensitivity
In an asynchronous system, a logic block no longer must be given adequate time, every time, to complete its worst case function. The performance can vary with input data; as soon as a function is complete, the output data can be latched and the functional logic block can be given its next set of inputs.
This latter characteristic has more significant implications for design than might first be thought. For example, the synchronous nature of most systems has resulted in much effort being expended in creation of designs that have good worst case performance, versus good or at least adequate mean performance.
Consider the "lowly" ripple-carry adder shown in Figure 5.2. This adder is rarely used in synchronous designs because of its very poor worst-case performance.
The worst case gate delay for such an adder, using a typical full adder design, is given by:

delay = 2n + 3

where n is the operand size in bits. For a 16-bit adder, the worst-case gate delay is 35. This occurs when a carry generated in the low-order bit full adder is propagated through every subsequent bit position. In an asynchronous system, in contrast, the mean gate delay is a better measure of an adder design's efficiency. Using a 16-bit adder as an illustrative example, there are 2^16 possible configurations of input bits for each operand, leading to a total of 2^32 possible "problems", or input sets, that can be presented to such an adder. For each of these input sets, one can readily see that the total gate delay (the time before all outputs will have "settled" to their final, valid values) can be computed from the above formula, substituting for n the maximum number of consecutive carries (the largest "carry chain") encountered in performing that addition.
Simulating the ripple-carry adder over the 2^32 possible input sets yields the results shown in Table 5.1.
The mean gate delay can be computed to be approximately 13.27, or roughly 38% of the worst-case delay. There may be situations in which the space advantage of a simple adder design like the ripple-carry, combined with a mean gate delay of 13.27 (and a median gate delay of just over 11), is enough to make its inclusion in a design warranted. If there are additional constraints known to the designer that might further reduce mean delay (for example, knowledge that the Carry-in input is always zero), the simple design may be even more attractive. In any event, this example points to the need to emphasize designs of all kinds with good mean performance for use in asynchronous systems, a significant shift in philosophy.

The ripple-carry adder was used as an example for two reasons. Firstly, the significant difference between its mean and worst-case performances highlights the paradigm shift in design for asynchronous versus synchronous systems. Secondly, a small (4-bit) ripple-carry adder has been implemented on the fabricated proof-of-concept circuit.
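This enumeration is easy to reproduce in miniature. The sketch below exhaustively enumerates an 8-bit adder (2^16 input sets) under the per-chain delay model 2c + 3, where c is the longest run of consecutive carries; the 16-bit figures quoted above follow from the same procedure over 2^32 sets. The model (not the dissertation's simulator) is the assumption here.

```python
def max_carry_chain(a, b, n):
    """Longest run of consecutive carries when adding two n-bit operands."""
    carry, run, best = 0, 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)  # generate or propagate
        if carry:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def gate_delay(a, b, n):
    """Delay model: 2c + 3 for the longest carry chain c (c = n gives the
    worst case, e.g. 35 for n = 16; c = 0 gives the best case of 3)."""
    return 2 * max_carry_chain(a, b, n) + 3

n = 8
delays = [gate_delay(a, b, n) for a in range(2 ** n) for b in range(2 ** n)]
print("worst: %d  best: %d  mean: %.2f"
      % (max(delays), min(delays), sum(delays) / len(delays)))
```

Because only the longest carry chain matters, the vast majority of input sets settle far sooner than the worst case, which is what pulls the mean so far below the maximum.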

Globally asynchronous locally synchronous systems
The term asynchronous systems covers many concepts, grouped together under the common characteristic of not requiring a global clock signal. One such concept, wavepipelining, can be described as locally asynchronous. Lam and Brayton, in their 1994 book Timed Boolean Functions [20], succinctly describe both the advantage and the complications of wavepipelining: "in wavepipelining mode, the circuit ... will be clocked at a period less than the maximum topological delay (or true delay) of a stage; thus a data wave is pumped into a stage before the previous wave reaches the registers at the end of the stage. So wavepipelining circuits operate at higher speeds than conventional circuits, sometimes orders of magnitude higher. Since the clock period is shorter than the delay of a circuit, data from neighboring clock cycles co-exist in the circuit simultaneously, and they can interact to cause the circuit to compute incorrectly. For instance, if a long path and a short path converge at a gate and the clock frequency is fast enough, then the present data on the short path can arrive at the gate earlier than the previous data on the long path, resulting in an invalid computation. Hence wavepipelining circuits involve complex signal interactions in the temporal domain and their proper operations require precise timing analysis."

A type of asynchronous system that removes the need for careful timing control in the combinational logic block, while maintaining the advantages of asynchronous systems on a global scale, comes under the general classification of Globally Asynchronous Locally Synchronous (GALS) systems [5]. To develop this type of system from more familiar constructs, let us modify the pipeline shown in Figure 2.16 to explicitly show the interstage "hold and forward" latches that must be a part of any pipeline.
You can see in Figure 5.3 that the global clock signal actually controls these latches, each of which receives data from a previous pipeline stage and releases it into the next.

Each stage now takes only the amount of time required to accomplish its task with the specific input set presented to it; it need not wait for a global clock signal to cycle.
While one might at first conclude that the overall pipeline speed is still limited by the delay of the slowest stage, we must bear in mind that that delay may be long for some input sets, and short for others. We saw in Table 5.1 that a stage composed of a 16-bit ripple-carry adder could vary in delay from three gate delays to thirty-five, depending on the input set. If we wished to make the overall pipeline less sensitive to potentially long data-dependent delays in a pipeline stage, we could provide for storage of multiple results in each latch, which would tend to "average out" the delay of a stage. While this would increase the pipeline latency, it would also tend to increase its throughput in the presence of varying stage delays.
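The "averaging out" effect can be illustrated with a toy tandem-queue comparison: a clocked two-stage pipeline must cycle at the worst-case stage delay, while a self-timed pipeline with result buffering between stages lets fast items overlap slow ones. The per-item delay distribution below is invented purely for illustration.

```python
import random

random.seed(0)
N = 10000
choices = [3, 5, 7, 9, 11, 35]  # gate-delay-like values; 35 = worst case
da = [random.choice(choices) for _ in range(N)]  # stage A per-item delays
db = [random.choice(choices) for _ in range(N)]  # stage B per-item delays

# Synchronous: the global clock period must cover the worst-case stage delay,
# and each item spends one period in each of the two stages.
period = max(max(da), max(db))
sync_makespan = (N + 1) * period

# Asynchronous with buffering: each stage starts an item as soon as both the
# item and the stage are ready (standard tandem-queue recurrence).
a_done, b_done = 0, 0
for i in range(N):
    a_done = a_done + da[i]
    b_done = max(a_done, b_done) + db[i]

print("synchronous makespan: ", sync_makespan)
print("asynchronous makespan:", b_done)
```

The asynchronous makespan tracks the mean of the delay distribution rather than its maximum, which is precisely the statistical argument made above for GALS stages.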
We could further enhance the pipeline by expanding its width, as in Figure 5.5.

This arrangement, it is seen, would double the capacity of the pipeline. Additionally, since the multiple latches would have the ability to release an available result from, for example, pipeline stage 1a into Circuit 1b(1) or 1b(2), depending on which was available first, it would further "smooth" the operation of the pipeline, making it less sensitive to timing "spikes" caused by occasional input sets generating large delays.

Currently used methods for completion detection
Self-timed combinational logic blocks must be able to determine when completion has been achieved and results are valid. There are several methods in use for doing this, of which we shall briefly mention a few.

Bounded-delay: not detecting completion
The bounded-delay technique, such as described in [37], does not concern itself with detecting completion. Instead, it estimates the maximum (worst-case) delay for a stage, and creates a delay element to provide that much delay before the output data is latched and new data is admitted into the stage. While it might at first seem that this approach gives up the benefit of GALS entirely, such is not really the case.
The global clock signal is still eliminated, the prime purpose of GALS constructs.
Additionally, although each pipeline stage now has a fixed delay, it need not be the same delay as every other stage. Pipeline latency is reduced (in comparison to an equivalent synchronous pipeline) but throughput will not necessarily be improved unless slow stages are duplicated in a manner similar to that shown in Figure 5.5.
The chief disadvantage of this technique is that it does not take advantage of data dependent delay to improve throughput. [38]

Dual-rail: doing it twice
So-called "dual-rail" techniques, such as proposed in [5], are based on using two independent nfet networks; inputs to these networks are both the normal inputs and inverted inputs, so that one or the other nfet network conducts. The RDY signal (completion) for a stage goes to logic 1 when either of the two outputs goes to logic 0 (both were precharged to a logic 1 at the start of the cycle). While these methods take advantage of data dependent delays, they "carry the disadvantages of a very high hardware overhead and slow operation" [38].
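The dual-rail completion rule just described can be sketched behaviorally. The rail encoding below (which rail falls for which value) is one common convention assumed here for illustration, not necessarily that of [5].

```python
# Dual-rail completion: each bit rides on two rails, both precharged to 1;
# evaluation pulls exactly one rail to 0, and RDY asserts when either falls.

def dual_rail_eval(value):
    """Rails (true_rail, false_rail) after evaluating a bit (assumed encoding)."""
    return (0, 1) if value == 1 else (1, 0)

def rdy(true_rail, false_rail):
    """Completion: either rail has discharged from its precharged 1."""
    return 1 if (true_rail == 0 or false_rail == 0) else 0

print(rdy(1, 1))                 # 0 -- still precharged, not complete
print(rdy(*dual_rail_eval(1)))   # 1 -- evaluation finished
print(rdy(*dual_rail_eval(0)))   # 1 -- completion is value-independent
```

Note the hardware cost visible even in this sketch: every signal is doubled, which is the "very high hardware overhead" cited from [38].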

Activity-sensing: waiting for steady-state
During the operation of a combinational logic block, the application of new data to the inputs will typically result in various transitions of internal (intermediate result) signals and the output(s). Grass and Jones [38] proposed a method of detecting such transitions; after no transitions had occurred for a specified period of time, completion could be assumed.
Aside from the obvious disadvantage of completion not being signalled until a preset delay period had passed since the last signal transition, the case in which no signal transition takes place also must be addressed; such a circumstance could occur in many ways, but would at least occur when two consecutive input sets were identical. Grass and Jones propose a "minimum delay generator (MDG)" which would signal completion when no transitions at all occurred. [38]
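Behaviorally, activity-sensing completion amounts to a timer that is restarted by every transition, with the MDG supplying a floor when nothing toggles at all. A minimal sketch with illustrative parameters (not the circuit of [38]):

```python
# Activity-sensing completion: declare completion once a settle window has
# passed with no further transitions on the block's signals. If there are
# no transitions at all, the minimum delay generator (MDG) provides a floor.

def completion_time(transition_times, settle_window):
    """Time at which completion would be signalled."""
    last = max(transition_times) if transition_times else 0  # MDG case
    return last + settle_window

print(completion_time([2, 5, 9], 4))  # 13: last transition at t=9 plus the window
print(completion_time([], 4))         # 4: minimum delay when nothing toggles
```

The sketch also makes the cited disadvantage concrete: completion always lags true settling by the full settle window, regardless of how quickly the block actually finished.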

Interstage requirements
In Section 5.2.3 we mentioned the need for "store and forward" latches to receive the results from one stage and, when the following stage becomes available, to apply those results as input to the next stage.
These latches, as has been suggested, can be simple or complex. But at the least, they must be able to:
• Latch the results, possibly on the leading edge of the RDY (completion) signal.
• Initiate any required precharge phase for the combinational logic block from which the results have just been latched.
• Signal the preceding latch when a new input data set may be released into the stage.
• Release the latched data to the next combinational logic block when the following latch signals that it is permissible to do so.
The design of these interstage latches is not a focus of this work. However, it is required that completion-detecting components of the designs to be covered in the next section be able to fulfill the interfacing needs of such latches. These requirements are:
• A completion signal must be supplied to the receiving (sink) latch. All outputs from the combinational logic block must be valid and remain valid while this signal is transitioning from logic 0 to logic 1.
• Any precharge required for completion detection or result determination must be able to be controlled by a signal from the sink latch or as a natural consequence of the results being latched. This process should also reset the completion signal to logic 0.
• Once the precharge has been accomplished, the completion signal must not transition to logic 1 until a new set of data inputs has been presented to the circuit by the input (source) latch, and valid results obtained.
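The interface requirements above amount to a small handshake protocol, which can be sketched as a toy state model. Everything here (the class name, the method names, the stand-in logic function) is a hypothetical illustration of the sequencing rules, not a circuit-level design.

```python
# Toy model of one self-timed stage between a source and a sink latch,
# illustrating the three interface requirements listed above.

class Stage:
    """Hypothetical stage: precharge -> evaluate -> completion -> precharge."""
    def __init__(self):
        self.completion = 0        # completion (RDY) signal to the sink latch
        self.outputs_valid = False
        self.precharged = True     # assume the stage starts precharged

    def apply_inputs(self, inputs):
        # Requirement 3: completion may rise only after a new input set has
        # been presented and valid results obtained.
        assert self.precharged, "inputs released before precharge finished"
        self.precharged = False
        self.outputs_valid = True          # evaluation completes
        self.completion = 1                # Requirement 1: RDY rises, outputs valid
        return [1 - b for b in inputs]     # stand-in for the logic function

    def precharge(self):
        # Requirement 2: the sink latch (or the act of latching) triggers
        # precharge, which also resets the completion signal to logic 0.
        self.outputs_valid = False
        self.completion = 0
        self.precharged = True

stage = Stage()
stage.apply_inputs([0, 1, 0])
assert stage.completion == 1 and stage.outputs_valid
stage.precharge()
assert stage.completion == 0 and stage.precharged
```

The ordering enforced by the assertion mirrors requirement three: a stage that has not been precharged cannot accept a new input set.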

Centered Binary Plus logic
We shall now proceed to adapt the Binary Plus concept to self-timed circuitry. In doing so, we shall combine many concepts covered previously.
In Section 3.3.1, we saw that the output from a Binary Plus gate will take on a valid logic level only when critical inputs have become valid. As, depending on the logic function of the gate, not all inputs are, or remain, critical, Binary Plus logic can be said to take advantage of data dependencies to improve performance. To do this, we must ensure that all inputs and outputs, as well as internal signals (intermediate results), are given an initial value of Vh. This approach has some undesirable characteristics, however:

Precharge is to Vh
• During the precharge part of the cycle, there is a current path from Vdd to Vss, and therefore power will be used.
• To minimize the power use during precharge, the precharge transistors will have to be made very weak. This will slow the precharge process, impacting the speed of the circuit.
• Due to the variance between transistors and fabrication parameters we have discussed in Chapter 2, the strengths of the pfet and nfet precharge transistors may not be adequately close to equal to assure a precharge value very close to Vh.
In the interest of eliminating the above problems, we introduce a single, additional supply to the circuit, carrying Vh. This modifies the circuit of Figure 5.6 to that shown in Figure 5.7.
Note that a pass switch is necessary, as the output line may have to be either "pulled up" from logic 0 to Vh or "pulled down" from logic 1 to Vh.
The advantages of this circuit over the use of weak precharge transistors are:
• No path is created from Vdd to Vss; those supplies are no longer involved in the precharge process.
• Any variance between transistors in the pass switch will not affect the final voltage level held by the output line at the end of the precharge process.

Must have both pfet and nfet complementary logic
In the dynamic logic discussed in Section 5.1, the pfet (or nfet) network was eliminated, and a precharge device used in its stead. Because Centered Binary Plus logic precharges to Vh, we will still need both a pfet network (to pull the output up to logic 1) and an nfet network (to pull the output down to logic 0). This additional space requirement will certainly be a consideration in deciding whether to use Centered Binary Plus logic in an asynchronous design, but there are compensations, as we shall now discuss.

Inherent speed enhancement
In dynamic logic like that illustrated in Figure 5, an inherent speed enhancement results if that specific implementation is taken.

Detection of invalid inputs and defects
This chapter has emphasized the application of the characteristics of zoned binary to asynchronous systems, pointing out how those characteristics can provide a powerful completion-detection capability. But the designer is free to implement additional enhancements taking advantage of the other uses of our detection capability.
For example, self-timed systems could be equipped with an auxiliary timer to detect when an excessive amount of time has elapsed with no completion being detected. Such an "alarm" could signal a hard or soft defect in the circuitry or, if it were "designed in", indicate that a signal in the unknown zone has become critical to the computation being performed by the circuit.
Note, however, that Centered Binary Plus logic is a dynamic logic, despite the presence of both pfet and nfet networks . The precharge (to Vh) can dissipate over time, so the detection of non-completing input or circuit conditions must be sensitive to these timing considerations. As the time necessary for inputs to be processed through a Centered Binary Plus logic stage should be, under normal conditions, far less than the dissipation time, timing determination for this purpose should not be difficult to achieve.
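The timer-based alarm suggested above might be sketched as follows. The function name, the event-list representation, and the 500 ns timeout are illustrative assumptions: the timeout would in practice be chosen well above the normal evaluation time but well below the precharge-dissipation time, as noted above.

```python
# Hypothetical watchdog for a self-timed granule: if no completion (RDY) event
# is seen within a timeout, flag a possible defect or a critical phi input.

def watch_completion(completion_events_ns, timeout_ns):
    """completion_events_ns: times at which RDY rose; returns alarm times."""
    alarms = []
    last = 0.0
    for t in completion_events_ns:
        if t - last > timeout_ns:
            alarms.append(last + timeout_ns)   # alarm fired before this event
        last = t
    return alarms

# Normal cycles complete in ~100 ns; a long gap trips the alarm at last+timeout.
events = [100.0, 205.0, 1500.0]
assert watch_completion(events, timeout_ns=500.0) == [705.0]
```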

Granularity
Just as a large combinational block in a synchronous system can be broken up into balanced pipeline stages, Centered Binary Plus logic provides a paradigm by which a designer may choose to break a circuit into self-timed blocks. A system in which the blocks of combinational logic between latches are small could be referred to as having fine granularity, whereas an ALU implemented in one logic block would certainly be said to display coarse granularity.
Much the same tradeoffs exist in the coarse to fine granularity decision as in the breakup of circuits into pipeline stages in synchronous systems, with some additional considerations.
• As in synchronous pipelines, making the granularity finer will tend to increase throughput.
• Space overhead, especially in the form of latches, increases as granularity becomes finer , just as in synchronous pipelines.
• For Centered Binary Plus logic (and other GALS constructs), finer granularity allows for easier "widening" of the pipeline for "bottleneck" stages.
• Granularity in Centered Binary Plus logic pipelines can be taken to the single gate extreme, if advisable from a design standpoint. Each gate contains the essential capabilities to be a pipeline stage.
We shall henceforth refer to a self-synchronized Centered Binary Plus logic block as a granule.

Control and handshaking
While, as stated earlier, it is not a purpose of this work to look closely at latch and control design, it is desirable to specify methods by which Centered Binary Plus logic granules interface with their source and sink latches.

Completion signaling
We have made clear that Centered Binary Plus logic is inherently capable of detecting a valid output logic signal. It is left for us to briefly define how such detection, applied to several outputs, might be aggregated into a granule completion signal (CLS).
Let us expand upon the simple ripple-carry adder shown in Figure 5. [39,40]

Precharge initiation and completion
As the ALLRDY signal will latch the data as it rises, it is also a signal to the sink latch that the precharge can begin. This would be accomplished through the use of the Precharge SET input, as shown in Figure 5. Note that power-saving is automatic with this scheme: the circuit is held in the precharge phase, using no power, until there is work for it to do.

Satisfaction of requirements
In Section 5.2.5 were listed three requirements for a stage to fulfill the interfacing needs of interstage latches in a GALS pipeline. Let us now review them in light of our preceding development:
• A completion signal must be supplied to the receiving latch. All outputs from the combinational logic block must be valid and remain valid while this signal is transitioning from logic 0 to logic 1.
Since the ALLRDY signal latches the data as it rises, all latching will be complete before precharge begins.
• Any precharge required for completion detection or result determination must be able to be controlled by a signal from the receiving latch or as a natural consequence of the results being latched. This process should also reset the completion signal to logic 0.
If the Precharge flip-flop SET input is generated by the sink (receiving) latch, this requirement is clearly satisfied. The ALLRDY signal will go to logic 0 as soon as the first of the results moves out of its valid range due to the precharge operation.
• Once the precharge has been accomplished, the completion signal must not transition to logic 1 until a new set of data inputs has been presented to the circuit, and valid results obtained.
The ALLRDY signal cannot again transition to logic 1 until (a) the precharge phase is released by both the source and sink latches (this implies that both a new input set is ready for release into the stage and that there is "room" in the sink latch for the next result set) and (b) the input set propagates through the stage and makes all results valid.
It would seem that the requirements have been satisfied. Design of the latch is left to the implementer.

In Section 5.2.3 were listed other, currently used methods for detecting stage completion in a GALS pipeline stage. We now compare these techniques with the Centered Binary Plus pipeline stage approach just developed:

Bounded-delay:
The Centered Binary Plus pipeline approach takes advantage of input pattern dependencies in completion time, whereas the bounded-delay technique [37] is similar to synchronous approaches in that it requires that a worst-case delay be built into the pipeline stage timing. The bounded-delay method, of course, requires significantly less hardware overhead than the Centered Binary Plus method or other methods do.

Dual-rail:
The dual-rail technique [5], as has been mentioned before, is characterized by high hardware overhead and slow operation. While a speed comparison is inappropriate at this time (as no effort has been made to design a detector optimized for speed), we may fairly say that the Centered Binary Plus technique will have a significant hardware overhead. However, it has been proven not to suffer from the sensitivity to races that dynamic techniques like dual-rail have, so Centered Binary Plus pipeline stages should be more robust.
Activity-sensing:
The chief advantage of Centered Binary Plus logic over activity sensing [38] is that activity-sensing stages must include a built-in delay over and above the actual completion time. Minimizing such delays makes it necessary to perform detailed timing analyses of such stages to ensure that the delay is not excessive.
No claim is made that Centered Binary Plus logic is the best approach to use in all GALS pipelines. However, it does possess significant advantages relative to currently used techniques, factors a designer will take into account in determining the best technique for a specific implementation.

4-bit ripple-carry adder experiment
There are typically two primary approaches in designing a complex combinational circuit to perform a given function. One is to use complex gates to implement the function; this method reduces the gate count, but increases design complexity and time and tends to decrease modularity.
The other approach is to use standard circuits for logic functions, even at the expense of additional space. This maximizes regularity, and not only can lead to a reduction in the time to create and simulate a design, but can also lead to being able to judge the design correct by construction. [14]
Although the use of complex gates can lead to significant space savings in Binary Plus and Centered Binary Plus logic (due to the reduction in the number of dual-inverter-based "zone decoders"), it was decided to implement the proof-of-concept asynchronous circuit by use of standard Centered Binary Plus logic AND gates, OR gates and inverters.

5.5.1 Ripple-carry adder
The circuit selected to demonstrate the application to asynchronous design of the concepts of Centered Binary Plus logic is the ripple-carry adder. This adder should vary in completion time with differing input data patterns. It was not an aim of this work to produce a fast or space-efficient implementation. The gate-level diagram of the full adder circuit used in this design is shown in Figure 5.11. We introduce two conventions at this point.
Centered Binary Plus logic gates are denoted in the above diagram by the use of standard binary logic gate symbols, superimposed by a "+". This implies:
• the existence of zone-decoding dual inverters on all inputs,
• standard Binary Plus gate design, that is, the routing of the "high transition voltage inverter" output to the pfet network and the routing of the "low transition voltage inverter" output to the nfet network, and
• inclusion of components necessary to precharge the output of the gate to Vh.
A standard symbol is shown to represent a full "Ready detector", with two outputs: RDY, indicating that the logic level being measured is in one of the valid binary ranges, and its inverse, RDY, indicating that the logic level being measured is in the intermediate ¢ range. The presence of both outputs is necessary for proper functioning of the precharge/evaluate cycle, as discussed in Section 5.3.6 and as we shall see shortly.
The organization of the adder itself is very similar to that shown in Figure 5.10.
As modified to use the Centered Binary Plus adder shown above, its final form appears in Figure 5.12.

Precharge control
The prime method for control of the precharge/evaluate cycle in this proof-of-concept circuit is via the PSET and PRESET inputs. Were this circuit to be used as part of a Centered Binary Plus asynchronous pipeline, the ALL output would be used to latch the data from the adder into the sink latch. The sink latch would then initiate the precharge phase by sending a pulse to the PSET input. Once NONE had gone high, indicating that the precharge was complete, the evaluate phase could be released. Note, however, that completion detection is based on the output lines only, so intermediate result lines may not yet have reached ¢ when the outputs do. This problem could be largely eliminated by the use of complex gates. In reality, however, the designer is likely to find that going to the extreme shown in Figure 5.13 is not necessary in the practical sense, for the following reasons:
• The load and other capacitance on the output lines (SUMs and Carry-Out) will in most cases be greater than that on the intermediate result lines, making it highly likely that intermediate result lines will have reached ¢ during the precharge phase before the outputs do.
• Considering the extra time that will be used by the AND tree in Figure 5.13, the designer could just as easily build a short delay into the initiation of the evaluate cycle without adversely affecting comparative timing, allowing even more time for intermediate values to reach ¢ while greatly reducing the space requirements of the full adder circuits.

Testing strategy
Following the difficulty encountered and discussed in Section 4.5.2 regarding getting predicted results from elementary Centered Binary Plus gates in cases when one or more inputs were ¢, it was decided to run static tests on the adder, in addition to those planned for dynamic operation.
The prime purpose of this experiment, however, was to demonstrate the varying completion times for the adder over a range of input sets. A short pulse was generated using a function generator; this was used to set the precharge flip-flop, and was also used as a trigger to a pulse generator, which generated another short pulse delayed from the first. This second pulse was used to reset the precharge flip-flop, and also to trigger a dual-trace oscilloscope, on which the ALL output signal was also displayed. In this manner, the delay between the beginning of the evaluate phase (the start of the flip-flop reset signal) and the completion signal (the ALL output) could be measured, and the duration of the cycle recorded. The input set could be modified at any time, and a new duration measured and recorded.
As it was desired to obtain some a priori prediction of adder performance relative to input set, in order to compare actual performance with predicted and so confirm intended operation, a gate-level simulator was constructed. As only a rough prediction of performance was required, this software assumed that the delay for each gate-type construct in the circuit was equal. When run on all 512 possible input problems, the simulator produced the gate-delay predictions shown in Table 5.
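The gate-level simulator described above is not reproduced in the source; the following is a hedged sketch of how such a unit-delay, data-dependent simulator might look. The full-adder gate network (an AND/OR/NOT realization of the sum and a majority carry) and the early-output timing rule are assumptions for illustration, not the dissertation's actual Figure 5.11 circuit.

```python
# Sketch of a unit-delay gate-level simulator for a 4-bit ripple-carry adder.
# Signals are (value, ready_time) pairs; all primary inputs are valid at t = 0.
# Assumed Binary Plus timing rule: a gate output becomes valid one unit after
# its earliest controlling input (a 0 for AND, a 1 for OR), otherwise one unit
# after its latest input.

def NOT(x):
    v, t = x
    return (1 - v, t + 1)

def AND(*ins):
    zeros = [t for v, t in ins if v == 0]
    if zeros:
        return (0, 1 + min(zeros))          # earliest 0 controls the output
    return (1, 1 + max(t for _, t in ins))

def OR(*ins):
    ones = [t for v, t in ins if v == 1]
    if ones:
        return (1, 1 + min(ones))           # earliest 1 controls the output
    return (0, 1 + max(t for _, t in ins))

def XOR(x, y):
    return OR(AND(x, NOT(y)), AND(NOT(x), y))

def full_adder(a, b, c):
    s = XOR(XOR(a, b), c)
    cout = OR(AND(a, b), AND(a, c), AND(b, c))   # majority carry
    return s, cout

def ripple_add(a_bits, b_bits, cin):
    c, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        s, c = full_adder(a, b, c)
        sums.append(s)
    return sums, c

# Sweep all 512 input problems; completion time is when the last output settles.
delays = set()
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            a_bits = [((a >> i) & 1, 0) for i in range(4)]
            b_bits = [((b >> i) & 1, 0) for i in range(4)]
            sums, cout = ripple_add(a_bits, b_bits, (cin, 0))
            total = sum(v << i for i, (v, _) in enumerate(sums)) + (cout[0] << 4)
            assert total == a + b + cin          # arithmetic sanity check
            delays.add(max(t for _, t in sums + [cout]))
assert len(delays) > 1                           # completion is input dependent
```

The absolute delay counts will not match the dissertation's table unless the same gate network is used; only the qualitative input dependence carries over.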

Static testing
Several input patterns were applied to the adder in a static mode. As was the case with the elementary circuit testing discussed in Section 4.5.2, results were correct when all inputs (or a critical subset of inputs) were valid; when these conditions were not met, the result came down on "one side or the other". Again, this is likely due to output buffer conversion of ¢ to 0 or 1, although the time required for static measurements would allow for dissipation of the Vh precharge anyway.

Dynamic testing
Randomly selected input bit patterns were applied to the adder and the completion delay measured as described above. Table 5.3 lists the results, trial by trial.
From Table 5.3 it is difficult to see by inspection any more than a rough relationship between the input set and the completion time. It is clear, however, that the input set does affect the completion time. To determine if the completion times measured were, in fact, related to the input-set related performance of the adder as predicted by the gate-level simulator, a correlation was run between the number of gate delays as determined by the gate-level simulator and the actual measured completion time.
A correlation coefficient of 0.5832 was obtained, a moderately positive correlation, reported as statistically significant at the p = .000 level. It is therefore highly likely that the variation in completion time is due to the predicted operation of the adder circuit and that, therefore, the adder is operating as intended.
While the variation in completion time (from a tested minimum of 76.1 ns to a maximum of 106.0 ns, only 39% greater) is not great, it is likely that there are constant-time factors having the effect of minimizing the variation. If we assume that the variation in actual completion time (excluding constant factors such as precharge time and output buffer delay) is roughly proportional to the variation in gate delays as predicted by the gate-level simulator, then we can estimate the constant time C as follows. The total time t measured for completion can be roughly given as

t = C + dp * dg,

where C is the constant time due to factors not related to the input set pattern, dp is the number of gate delays as predicted by the gate-level simulator, and dg is the delay in nanoseconds per gate delay. Using our extreme measurements (76.1 ns at 5 predicted gate delays and 106.0 ns at 12) to set up a simple pair of simultaneous equations in two variables:

76.1 = C + 5 dg
106.0 = C + 12 dg

Solving gives us dg = 4.27 ns and C = 54.7 ns.
Based on a predicted range of from 5 to 13 gate delays, we can estimate that our input-set-dependent delay, ignoring constant-time causes, will range from approximately 21 to 56 ns, a variation of 166%.
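The arithmetic above can be checked mechanically. The gate-delay counts at the two extreme measurements are not stated explicitly in the source; the values 5 and 12 below are an assumption chosen because they reproduce the reported C = 54.7 ns and the 21-56 ns range.

```python
# Solve t = C + dp * dg from the two extreme measurements.
t1, d1 = 76.1, 5      # minimum measured completion (ns), predicted gate delays
t2, d2 = 106.0, 12    # maximum measured completion (ns), predicted gate delays

dg = (t2 - t1) / (d2 - d1)   # ns per gate delay
C = t1 - d1 * dg             # constant time (precharge, output buffers, etc.)
assert round(C, 1) == 54.7

# Input-dependent delay over the predicted 5..13 gate-delay range:
lo, hi = 5 * dg, 13 * dg
assert round(lo) == 21 and round(hi) == 56
```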

It is likely that running the stage isolated from output buffer influences will significantly lessen the constant-time factor.

Summary
In this chapter we developed the design of Centered Binary Plus logic gates and stages. We saw that Centered Binary Plus logic has several advantages, and is fully capable of interfacing with latches as part of a Globally Asynchronous Locally Synchronous (GALS) pipeline. The technique proposed has significant advantages over each of the examined alternative methods of self-clocking.
We examined a 4-bit ripple-carry adder implemented as part of the proof-of-concept effort.

Communications applications
Data communications is an increasingly important part of technology. Rarely is it understood, however, how pervasive the concept really is, for communication takes place over not only large but also very small distances. Data must be communicated from one part of an integrated circuit to another, or between integrated circuits in a Multi-Chip Module (MCM) or on a circuit board (for example, from main memory to and from the CPU). One of the two primary purposes of the backplane in systems and other digital devices is to communicate data among the circuit boards in the system.
For our purposes we will consider communication as the moving of digital data (whether by digital or analog communications media) from one location to another, placing no upper or lower bounds on the distance over which it is moved. We shall see that the information that can be derived by use of the detector of Figure 4.2 can be used to good advantage in enhancing the reliability of communications.
Reliability in communications on all scales is generally addressed under the general heading of "error-control coding". We will not propose an alternative to error-control coding, but will instead show how the information provided by the detection techniques covered in Chapter 4 can be used in conjunction with error-control coding strategies covered in the literature. [7,6,8,9,10]

Hardware and error detection/correction
Much attention was paid in Chapter 2 of this work to transient and static problems that can result in undefined logic levels occurring during the transmission of data from one place in a system to another. While static errors would presumably be detected by an adequate post-manufacturing testing process, transient errors can occur at any time. There are also cases in which new static errors can appear; for example, a cable can be broken, a connector detached or aging of a circuit can cause bus line or device failure.
Many schemes address the detection and correction of such errors. [6] The simplest of these schemes remains the single parity bit found in some semiconductor memories and common in communication designs. A single parity bit can detect only errors involving an odd number of bits (in practice, usually one); errors involving an even number of bits cannot, by definition, be detected by such a scheme. Additionally, the scheme is limited to detection only: an error indication implies that an odd number of bits (usually one) are in error, but cannot identify those bits. Schemes involving a larger number of check bits are generally able to detect a larger number of errors than a 1-bit scheme, and may also be able to point to the bits in error. In a binary system, correction requires merely being able to identify the offending bit; with only two possible values, correction is comparatively trivial.

Now consider what effect an undefined logic value might have on a typical circuit based on a 1-bit parity design. As we have discussed in Section 2.2.2, circuitry is going to resolve an undefined logic level into a valid 0 or 1. If the value happens to be the correct one, then no parity error will be detected and the user of the results, human or system, will never be made aware of the possible problem. If, on the other hand, the resolved value is the incorrect one, a parity error will be signaled and the received word will be considered incorrect.
In the above example, we have an excellent illustration of the consequences of discarding information. In one result, the value passed on is presumably correct, but lost was a possible indication that a problem exists with the transmission link. The alternate result indicates the existence of a problem, but the bit-wise location of the problem is lost.

Error-control coding
In their book, Error Control Coding for Computer Systems, T. R. N. Rao and E.
Fujiwara begin Chapter 1 thusly: "In computer systems, large amounts of data move between various subsystems. For instance, the data traffic between the CPU and main memory may be of the order of 100 million bits every second. Even though the systems are designed for very high reliability, there are bound to be a few errors in these communications caused by such things as atmospherics, electrical noise, component or device malfunctions, or sometimes design or program faults. It is important that the system detect these errors as and when they occur. Some remedial action such as error correction or error recovery must take place before a more serious situation like a system crash arises." [6] Rao and Fujiwara's text provides excellent coverage of the topic of error-control coding, and the reader is referred to that work for an in-depth understanding, including analyses of the probability of various errors in different channel models. We will cover the topic of error-control coding in only enough detail to provide an adequate background for the adaptations proposed in this chapter.

Channel models and errors
When data is transmitted from one site to another , bits may arrive as transmitted or may be received as some other value. Depending on the characteristics of the communications channel, different types of data modification may be possible, with varying probabilities. An examination of some typical models will lead the way to a model most appropriate for the contribution described in this chapter.

Classical (symmetric) error model
A binary symmetric channel is one in which errors may be of the 0 ⇒ 1 or 1 ⇒ 0 variety, with equal probability. Additionally, the errors are bitwise independent: an error in one bit neither increases nor decreases the probability that any other bit will be in error. [6]

Asymmetric error model
In an ideal asymmetric channel, as shown in Figure 6.2, the probability of one of the error transitions is virtually zero.

Unidirectional error model
The unidirectional model is a "word-by-word" special case of the asymmetric error model. Rao and Fujiwara define it as follows: "Both 1-errors and 0-errors can occur in the received words, but in any particular received word, all errors shall be of one type; these errors are characterized as unidirectional errors." [6]

Binary erasure error model
Rao and Fujiwara define a binary erasure model. In such a channel, 0 ⇒ 1 and 1 ⇒ 0 errors do not occur, but there may be erasures: a change of a 0 or 1 to a non-existent value.
This channel is depicted in Figure 6.3.

Figure 6.3: Binary Erasure Error Model
This diagram should be of particular interest to us, as it implies the existence of a third state, neither 0 nor 1. In actuality, such a non-value state need not be signaled by a value close to Vh; any other method of determining that a bit is not known (such as a plane-wise parity error in a memory) may be used. [6,7]

General analog model
If, however, we also wish to detect "erasures", which we will now define as bits that fall within our undefined zone, we have the diagram shown in Figure 6.5. Adopting our three-state notation of Chapter 3, we can say that an information bit that is transmitted as a 0 may be received correctly as a 0, or incorrectly as a 1 or a ¢. Symmetrically, an information bit that is transmitted as a 1 may be received correctly as a 1, or incorrectly as a 0 or a ¢. The probabilities of these outcomes are dependent on the specific characteristics of the communications channel; their determination is outside the scope of this work.
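As a rough illustration of a symmetric channel with erasures, the sketch below draws a received symbol for each transmitted bit. The function name and the probabilities p_flip and p_phi are arbitrary placeholders, since, as noted above, real values depend on the specific channel.

```python
import random

# Sketch: a transmitted bit arrives correctly, flipped, or in the phi zone.
def channel(bit, p_flip=0.01, p_phi=0.05, rng=random):
    r = rng.random()
    if r < p_flip:
        return 1 - bit            # symmetric bit flip
    if r < p_flip + p_phi:
        return 'phi'              # erasure: received in the undefined zone
    return bit                    # received correctly

rng = random.Random(42)           # fixed seed for repeatability
received = [channel(1, rng=rng) for _ in range(10000)]
# Most bits arrive intact; a small fraction are flipped or erased.
assert received.count(1) > received.count(0) + received.count('phi')
```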

A caution about transmitting zoned binary
Heretofore we have used a working assumption, first made in Section 2.1, that the boundary between logic 0 and ¢ and that between ¢ and logic 1 are placed at 1/3 Vdd and 2/3 Vdd respectively. The implementer must be cautioned against assuming that this is always an appropriate choice. Let us consider the transmission of a zoned binary bit from one location to another. We see that we must admit, for consistency, the possibility of ¢ ⇒ 1 and ¢ ⇒ 0 errors.
Returning to our analog equivalence, we realize that for a 0 ⇒ ¢ or a 1 ⇒ ¢ transition, there must be an absolute change in analog value of 1/3 Vdd, using our boundary divisions as defined in Section 2.1 and shown as dotted lines in Figure 6.6.
But for a ¢ (Vh) to 1 or 0 transition, there need be an absolute change in analog value of only 1/6 Vdd. Such errors may be even more dangerous, as they will be, by definition, undetectable except by error-coding techniques.
The designer must consider this problem, especially when contemplating the transmission of encoded zoned binary data over long or noisy communications channels, and consider moving the boundaries for such exceptions to, perhaps, 1/4 Vdd and 3/4 Vdd, thereby making the analog "distance" between any valid state and the adjoining state(s) equal to 1/4 Vdd.
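The boundary tradeoff above can be sketched numerically. The classifier name and the normalized Vdd = 1.0 are illustrative assumptions.

```python
import math

# Sketch: classifying an analog level into { 0, phi, 1 } with configurable
# zone boundaries (defaults are the 1/3 Vdd and 2/3 Vdd divisions).
def classify(v, vdd=1.0, lo=1/3, hi=2/3):
    if v <= lo * vdd:
        return 0
    if v >= hi * vdd:
        return 1
    return 'phi'

vh = 0.5
assert classify(vh) == 'phi'
# With 1/3 - 2/3 boundaries, Vh is only Vdd/6 from either valid zone...
assert math.isclose(2/3 - vh, 1/6) and math.isclose(vh - 1/3, 1/6)
# ...while an ideal 0 or 1 must move Vdd/3 to reach phi. Boundaries at 1/4
# and 3/4 equalize every separation at Vdd/4:
assert classify(0.3) == 0                       # logic 0 under 1/3 boundary
assert classify(0.3, lo=1/4, hi=3/4) == 'phi'   # but phi under 1/4 boundary
```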

Distance
All error-control codes are characterized by the fact that not all of the words that can be formed by different combinations of bits are valid. Those that are, are termed codewords, while those that are not are indications of error.
The Hamming distance between two equal-sized strings of binary bits can be computed by counting the number of bit positions in which the values of those two strings differ. The distance (dmin) of a code is the minimum Hamming distance between all pairs of codewords. [6] The distance of a code serves as an indicator of the theoretical ability of the code to detect and/or correct errors. Three theorems from Rao and Fujiwara's text are quoted:

"It is necessary and sufficient that the distance (dmin) of a code is at least d in order to detect any error pattern of weight d - 1 or less."

"A code C can detect and correct all patterns of t or fewer errors if and only if the code has minimum distance ≥ 2t + 1."

"A code can correct any combination of t errors and detect up to d errors (d ≥ t) if and only if the dmin of the code ≥ t + d + 1." [6]

A distance-2 code, therefore, can detect one-bit errors and correct none. A distance-3 code can detect up to two-bit errors or, if error correction is required, can detect and correct one-bit errors. To detect up to two-bit errors while correcting one-bit errors would require a distance-4 code.

Simple parity code
A simple parity code is probably the cheapest and easiest error-control coding scheme in use. It uses one parity bit (or "check" bit) to "protect" any number of data bits.
Intuitively, to generate a parity check bit, we count the number of data bits with a value of 1, and then set the check bit to ensure that the number of ones (including the check bit) is always odd (for "odd parity") or even (for "even parity").
It is easy to see why the simple parity code is a distance-2 code. If you take a valid codeword (some number of data bits plus an appropriately computed parity check bit) and change one data bit position (from 0 to 1 or from 1 to 0), you must also change the parity bit. Therefore each codeword differs from any other codeword by a minimum Hamming distance of 2 bits.
With a dmin of 2, the simple parity code is capable of detecting a single-bit error.
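The distance properties just described can be exercised directly. The sketch below builds the even-parity code over three data bits and computes its dmin; the function names are illustrative.

```python
from itertools import combinations, product

# Sketch: Hamming distance between words, and the dmin of a code.
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def dmin(codewords):
    return min(hamming(x, y) for x, y in combinations(codewords, 2))

# Each codeword is three data bits plus a check bit making the 1-count even.
code = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=3)]
assert dmin(code) == 2    # distance-2: detects d - 1 = 1 error, corrects none
```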

SEC and SEC/DED codes
There are a number of linear codes that provide minimum distances of 3 and 4.
The distance-3 Hamming code can be used as either a DED (double error detecting) or a SEC (single error correcting) code. By adding an overall parity bit to the distance-3 Hamming code, we obtain a distance-4 code, which can be used for DED and SEC purposes simultaneously. Such a code is referred to as a SEC/DED code. [6] In our discussion later in this chapter, we will not be concerned with the construction of these codes or their implementation with encoders and decoders, for which the reader is referred to Rao and Fujiwara. We will, however, treat them as functional units that can be used to detect and/or correct errors on the basis of the received code alone.

Error location with zoned binary detector
It is clear that a bit received and identified as being in the uncertain zone by our detector of Figure 4.1 has at least a strong potential for being in error. So an array of these detectors -one for each bit of a received word -can provide additional information regarding the location of a possible error that would otherwise be lost.
It is, of course, possible that an error occurs that causes the bit in error to take on a valid value opposite to what was intended. In this event , our detector would not be able to identify it. In this case, we would be no better off than without the detectors, but no worse off either. The error detection circuitry based on error-control coding would at least detect the error, if not correct it.
But if an error-correcting code scheme is in use , why implement the detector scheme in addition? Does the additional location information it might provide gain us anything?
It would seem so, according to Rao and Fujiwara: "Because the positions of the erasures are known, the correction of erasures in a received word will be simpler than the correction of errors. Thus, a given code that is used for error correction can be employed more efficiently to correct erasures." [6] It should be clear from earlier in this chapter that a received value of ¢ functionally indicates an "erasure": the bit has changed from a 0 or 1 to neither.
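The notion of a received ¢ acting as an erasure can be sketched in software. The following is a hypothetical model (the representation of a bit as 0, 1, or `None` is an illustrative choice); the 1/3 Vdd and 2/3 Vdd zone boundaries follow the definition given in the introduction.

```python
# Hypothetical software model of zoned binary: each received bit is
# 0, 1, or None, where None stands for the uncertain value phi (an erasure).
def classify(voltage, vdd=3.3, lo=1/3, hi=2/3):
    """Map an analog level to {0, None, 1} using the 1/3 and 2/3 Vdd boundaries."""
    if voltage < lo * vdd:
        return 0
    if voltage > hi * vdd:
        return 1
    return None  # inside the phi zone: treated as an erasure

# A received word; the erasure positions are exactly the indices holding None.
word = [classify(v) for v in (0.2, 3.1, 1.6, 0.1)]
erasures = [i for i, b in enumerate(word) if b is None]
```

With a 3.3 V supply, 1.6 V falls inside the ¢ zone (1.1 V to 2.2 V), so the third bit is flagged as an erasure.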

An easy case: the unidirectional channel
Using the known location of erasures in the unidirectional channel described in Section 6.2.1 provides a clear and easy path toward enhancing communications reliability. We know by definition that errors in a unidirectional channel word are all of the same direction: 0 ⇒ 1 or 1 ⇒ 0. Therefore, the proper binary value of any error is known, provided only that we can identify its location. As our detector points to the location(s) of erasures, those locations can simply be set to their proper value. The enhancement in reliability comes from the fact that this strategy effectively moves the boundary between logic 0 and logic 1 two-thirds of the way toward the only error transition that can be made. More precisely:
• When the only possible error direction is 0 ⇒ 1, any ¢ should be set to 0, effectively moving the boundary between logic 0 and logic 1 to 2/3 Vdd.
• When the only possible error direction is 1 ⇒ 0, any ¢ should be set to 1, effectively moving the boundary between logic 0 and logic 1 to 1/3 Vdd.
The same strategy could be applied to an ideal asymmetric channel, as described in Section 6.2.1.
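The set-to-the-safe-value rule above can be sketched as follows (the `error_direction` labels are illustrative, not from the text):

```python
def correct_unidirectional(word, error_direction):
    """Restore erasures (None) in a word from a unidirectional channel.

    Because the only possible corruption is in one known direction,
    any erasure must have started as the corruptible value: a channel
    that can only make 0 -> 1 errors can only corrupt 0s, so every
    phi must have been transmitted as a 0 (and vice versa)."""
    fill = 0 if error_direction == '0to1' else 1
    return [fill if b is None else b for b in word]
```

For example, a word received as 0, ¢, 1, ¢ over a 0 ⇒ 1 channel is restored to 0, 0, 1, 0.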

Error correction strategies for ¢ errors
In this section, we shall see how the uncertainty detector can be used to indicate erasures to schemes suggested by Rao and Fujiwara.[6] We shall also extend these approaches into a channel model not considered in that text: the "symmetric with erasures model" in Figure 6.5 that we developed from the general model shown in Figure 6.4. This model requires less a priori knowledge about channel characteristics than other discussed models, and so should be more widely usable.
Consider that a simple parity scheme with a single check bit can detect one error in a received word and correct none, as it is a distance-2 code. As this is a theoretical limit of the coding structure itself, we must step "outside" the code decoding circuitry if we wish to enhance the performance of a receiving device using such a simple code.
Likewise, coding schemes developed to have more capability, such as DED and SEC/DED codes, have their theoretical limits. An external approach must be used; that is, the input must be conditioned in some way by taking advantage of the additional knowledge of error location.
Provided that we can identify the location of a bit in error by virtue of its being an erasure, we know one critical fact about that bit: it was originally transmitted as either a 0 or a 1. This may seem trivial, but it points us toward a correction strategy.
The strategy involves the generation of alternative received words, varying only in the values of the bits that were identified as unknown.

Strategy for simple parity codes
Consider a receiver utilizing codewords based on a simple parity check bit. This is a distance-2 code, and so should be capable of detecting a one-bit error and correcting none. Consider, however, the following example: if a received word is "0 1 0 0 ¢ 1 1 0 1", then it is likely that the transmitted word was either "0 1 0 0 0 1 1 0 1" or "0 1 0 0 1 1 1 0 1". We can now use the error detection capability of the simple 1-bit parity method to determine which alternative is not in error.
Our strategy for correcting single-bit unknowns ("erasures") is therefore to generate two words from our received word, differing only in the value assigned to the unknown bit. Both are then processed by a parity checker (either in parallel by two identical checking circuits, or sequentially by one) to choose which of the generated words is the valid codeword.
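A minimal sketch of this two-candidate strategy follows; the even-parity convention is an assumption made for illustration (an odd-parity code works identically with the check inverted).

```python
def even_parity_ok(word):
    """Distance-2 check: a word (data bits plus check bit) is a valid
    codeword iff the XOR of all its bits is 0 (even parity)."""
    acc = 0
    for b in word:
        acc ^= b
    return acc == 0

def correct_single_erasure(word):
    """Generate both candidates for the single erasure (None) and keep
    the one that passes the parity check."""
    i = word.index(None)
    for guess in (0, 1):
        candidate = word[:i] + [guess] + word[i + 1:]
        if even_parity_ok(candidate):
            return candidate
```

For the received word of the example, "0 1 0 0 ¢ 1 1 0 1", the even-parity assumption selects the candidate with ¢ set to 0.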

Extension of strategy to DED codes
A DED code is a distance-3 code, which implies that it should be capable of either correcting a one-bit error or detecting two-bit errors and correcting none. The difference between detecting and correcting is really one of determining the location of the error.
Our strategy is similar to that used for a simple parity code, but since we have two unknown bits (erasures), there are four possibilities for the settings of those two bits.
The four words generated by these four possibilities are independently processed by DED checkers; the one that is error-free is selected.
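The candidate-generation step generalizes to any number of erasures: k erasures yield 2^k alternatives. A sketch (the `None`-for-¢ representation is an illustrative assumption):

```python
from itertools import product

def candidates(word):
    """Enumerate the 2**k alternative words formed by filling the k
    erasure positions (None) with every combination of 0s and 1s."""
    positions = [i for i, b in enumerate(word) if b is None]
    for fills in product((0, 1), repeat=len(positions)):
        alt = list(word)
        for i, v in zip(positions, fills):
            alt[i] = v
        yield alt
```

Each alternative would then be passed to a code checker, in parallel or sequentially.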

Extension to SEC/DED codes
SEC/DED codes are distance-4 codes, which implies that they can detect 3-bit errors or, alternatively, detect 2-bit errors while correcting one error.
For erasure errors, however, "error location capability allows a distance-4 code (SEC-DED code) to correct up to three errors." [6] Suppose we have a received word with a 3-bit erasure. We can generate 2³ = 8 alternative words and check them all for errors. This is certainly the point at which a sequential approach becomes more practical, as providing eight independent, parallel code-checking circuits can be space-consuming. If, of course, time constraints were extreme enough, the expenditure of space might be warranted.

Extension to the general model
Consider now the general channel model depicted in Figure 6.4. It should be intuitively clear that we can no longer correct three errors: since we can no longer "point" to all three error locations, it will "cost" us to determine the location of the non-erasure error.
We can still, however, do better than correcting a single one-bit error, the theoretical maximum that we could accomplish with the symmetric error model of Figure 6.1.
The strategy described earlier in Section 6.4.1 can be adapted to fit this new model, as follows:
• Generate two alternatives of the received word, based on the two possible values of the erasure error (whose location is known).
• Route these two alternatives to independent SEC/DED checkers.
• Select the output from the checker that reports a single, corrected error.
If the only error is an erasure error, both checkers will output the correct codeword; one checker will indicate a single, corrected error, while the other will indicate no error.
If there is a single, non-erasure error, both checkers will output the correct codeword and report a single, corrected error.
If there are both a single erasure and a single non-erasure error, one checker will output the correct codeword and indicate a single, corrected error, while the other will output an incorrect codeword and indicate a double error.
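The selection rule implied by these cases can be sketched as follows; the report codes 'none', 'single', and 'double' are illustrative placeholders for the checkers' status outputs, not signal names from the text.

```python
def select_output(report_a, out_a, report_b, out_b):
    """Pick the output of whichever SEC/DED checker reports a clean or
    single-corrected result; a checker reporting a double error has
    produced an incorrect codeword and is rejected."""
    usable = ('none', 'single')  # no error, or single error corrected
    if report_a in usable:
        return out_a
    if report_b in usable:
        return out_b
    return None  # both report double errors: uncorrectable
```

In the erasure-plus-full-error case, the checker reporting a double error is rejected and the other checker's corrected codeword is selected.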
Note that this method can be adapted to a circumstance in which two erasure errors are detected. In this case, a value could be arbitrarily assigned to the second erasure (making it either the correct value or a non-erasure error) and sent to the same circuitry.
It should be pointed out that it is not even necessary for this second erasure to be assigned the same arbitrary value in the two generated alternatives. This may simplify the design of the circuitry generating the alternatives.
We have seen how the information from our uncertainty detector can be used to extend the correction capabilities of standard error-control coding schemes to handle a model in which both erasures (transitions to ¢) and classic 1 ⇒ 0 and 0 ⇒ 1 errors can be received.

Implementation example: simple parity code
We can now proceed to illustrate the design of a correction system appropriate to the error-control coding strategies of both Sections 6.4.1 and 6.4.4. A very simple 4-bit codeword scheme will be shown. Figure 6.7 is not a complete circuit diagram; depending on the specific error-control coding scheme being used, there would be additional desirable outputs.
Specifically, one might find various error indicators useful, such as:
• An indicator that at least one of the inputs was an erasure (¢).
• An indicator that at least two of the inputs were erasures.
• For a SEC/DED code, an indicator that more than two of the inputs were erasures.
• An indicator that errors are present that could not be corrected.
• An indicator that no errors of any kind were present.
The multiplexers at the inputs to the two checkers are used either to (1) pass the original value of the input bit to the checker, or (2) pass a 0 or 1 (for the left or right checker, respectively) to the checker in place of the original input bit (for erasures).
The two checkers each return a "parity correct/error" signal to the "Selection Circuitry", which chooses which checker's output is to be used.

The detector once again revisited as a decoder
In Section 4.3.2, we mentioned that the undefined range that can be discerned by the detector need not be a natural outcome of circuit conditions we wish to detect; it can be explicitly coded, should there be a valid need.
Early in this chapter, we defined "communications" as "the moving of digital data (whether by digital or analog communications media) from one location to another, placing no upper or lower bounds on the distance over which it is moved." There are many forms of transport media; certainly not all depend on varying voltage levels to represent a 0 or 1. There may be many transmission modes, and various modulation/demodulation methods appropriate to them.
It is possible that a demodulation subsystem may detect an indeterminate state for one or more bits in a received word of digital data. In such a circumstance, that subsystem could emit as output a zoned binary value, encoding the uncertain bit(s) as ¢. The methods of this chapter could then treat those bits as erasures.

Partial utilization: some gain at lower cost
Sometimes the tradeoff of space (or time) in order to achieve a given performance gain is not practical. This must be judged on an implementation-by-implementation basis by the designer. The methods already discussed in this chapter do provide significant performance gain, but at the undeniable cost of either:
• at least two code-checker circuits, implemented in parallel, with associated multiplexers and selection circuitry, or
• a single code checker, with the required circuitry to sequentially present the alternatives to it until a successful decoding into a codeword occurs, the impossibility of doing so is recognized, or the list of alternatives is exhausted.
Space is impacted to some degree, and, in the second approach, time is also lengthened, which may not be practical in a time-constrained system.
Is there any other way in which the information provided by our detector can be used to good advantage, while not requiring such a significant expenditure of resources?

Code-independent advantage
Simply detecting that one or more bits are in the uncertain range provides the receiver with more information than it had. As this condition would indicate some measure of difficulty with the communications medium or transmitting device, it could signal an actual or developing problem before it was detected by the code checker, if any.
In fact, it is simple to link detectors together in such a way as to provide an indication when more than one "erasure" is detected in the same received word, indicating the possibility of a two-bit error, one that would not be detected by, for example, a simple one-bit parity code checker. While this is obviously not the only kind of two-bit error that can occur, it will certainly detect some of them.
Additionally, we might refer to the simple application illustrated in Figure 4.9.
For an external parallel input, for example, ANDing the RDY signals obtainable from the detectors for all lines would provide a single signal indicating the probability of a broken or disconnected cable, or a totally malfunctioning communications link.

Simple set to zero with uniform distribution of erasure errors
Consider the simple expedient of setting all ¢ inputs to 0. [One could just as easily set them all to one, or set them to one or zero depending on the bit position; the choice is truly arbitrary, unless there is a priori knowledge about the error distribution (or data distribution) that would bias the decision one way or the other.] We will assume for the moment that the distribution of correct values when ¢ is detected is a dichotomy with a probability of .5 for each.

Simple one-bit parity checker
Use of this approach would gain no operational advantage with a simple one-bit parity code (distance-2) checker, other than those mentioned above in Section 6.7.1.
It would have an equal probability of causing a bit that would have been correctly interpreted as a 1 (greater than Vh but less than 2/3 Vdd) to be forced to a zero, causing an error. While this is counter-balanced by the possibility that its proper value was a zero, it is at best a draw.

SEC/DED codes
Consider the possible consequences of setting erasure bits to zero, or some other arbitrary assignment:
• When there is one error, and that error is an erasure: setting the erasure bit to zero and passing the resulting word to the SEC/DED code checker will result in either:
  - if 0 was the correct value, no error will be indicated, and the output will be correct, or
  - if 0 was the incorrect value, the SEC/DED checker will correct the error, a single, corrected error will be indicated, and the output will be correct.
• When there is one error and that error is not an erasure: there is no impact. The error is corrected by the SEC/DED code checker.
• When there are two errors, and both are erasure errors: setting both erasure bits to zero and passing the resulting word to the SEC/DED code checker will result in one of the following:
  - if 0 was the correct value for both bits, no error will be indicated, and the output will be correct, or
  - if 0 was the correct value for one of the bits and the incorrect value for the other, the SEC/DED checker will correct the remaining error, a single, corrected error will be indicated, and the output will be correct, or
  - if 0 was the incorrect value for both bits, the SEC/DED checker will detect and indicate a double-bit error, and the output will be incorrect (but this will be known because of the double-bit error indication).
• When there are two errors, and one is an erasure and one is not: setting the erasure bit to zero and passing the resulting word to the SEC/DED code checker will result in either:
  - if 0 was the correct value for the erasure, the SEC/DED checker will correct the remaining, non-erasure error, a single, corrected error will be indicated, and the output will be correct, or
  - if 0 was the incorrect value for the erasure, the SEC/DED checker will detect and indicate a double-bit error, and the output will be incorrect (but this will be known because of the double-bit error indication).
• When there are two errors, and both are non-erasures: there is no impact. The SEC/DED checker will detect and indicate a double-bit error, and the output will be incorrect (but this will be known because of the double-bit error indication).
We can conclude that there will be no gain over a system in which the received values of all bits in the region of Vh are allowed to resolve themselves into a 0 or a 1 by chance.
Consider that being consistent in the assignment of 0 or 1 will have no effect on the outcomes listed above. Assignment as a 1 or a 0 is as likely to be correct as incorrect.
Since the assignment of the value in the above scheme is arbitrary, and consistency confers no advantage, a random assignment (such as might occur by allowing the values around Vh to resolve themselves) works just as well.
But this conclusion does not eliminate the possible use of this simplified approach in situations in which the distribution of values within ¢ is not uniform, as we shall see in the next section.

Simple set to most probable value with asymmetric distribution of erasure errors
In Section 6.3.1 we discussed the simple expedient of setting an erasure bit to the "error-susceptible" value for a unidirectional channel or ideal asymmetric channel.
For both of these types of channels, we possessed a priori knowledge that, for any given word, the probability of one of the two possible error transitions is very close to zero. Therefore, knowing that only one of the two transmitted values could be "corrupted" during transmission implied that any "corrupted" value received had to have been transmitted as the "corruptible" value, and so it could be set to that value.
If we have an asymmetric channel, even if not an ideal asymmetric channel (characterized by the fact that the probability of one of the two possible error transitions is very close to zero), the negative conclusions of Section 6.7.2 may be mitigated.
If the probabilities of the 1 ⇒ 0 and 0 ⇒ 1 error transitions differ significantly from .5, the assignment of erasures to 0 or 1 is no longer arbitrary, and modifying the strategy to set erasures to the most "corruptible" value may yield gains. The designer will have to consider the relative probabilities involved, together with any other characteristics of the communications channel, in deciding whether to implement any partial approach.
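This modified fill rule can be sketched as follows (a heuristic sketch; the probability parameters are illustrative names, not from the text):

```python
def fill_erasures_asymmetric(word, p_0to1, p_1to0):
    """Heuristic for an asymmetric channel: when the 0 -> 1 transition
    is the more probable corruption, an erasure most likely began as a
    0, so set it to 0 (and vice versa). Unlike the unidirectional case
    this is no guarantee, since the less probable transition can still
    occur; it merely biases toward the likelier original value."""
    fill = 0 if p_0to1 > p_1to0 else 1
    return [fill if b is None else b for b in word]
```

For a channel in which 0 ⇒ 1 errors dominate, all erasures are set to 0; the expected gain grows as the two transition probabilities diverge.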

There are three possible modifications to the approaches discussed in this section, which may be used to some advantage.

Simplified detector
The techniques described in this section do not require a full detection capability.
If, for example, it were desired to set all ¢ inputs to 0, which might be desirable in processing received words from an ideal asymmetric channel, one could simply pre-process each input as shown in Figure 6.8 (Input Bit Pre-Processing, ¢ ⇒ 0). As the "3.3 inverter" will not transition to an output of zero until the input rises out of the ¢ range into the range of logic level 1, all inputs are conditioned by the pre-processing circuit such that any input in the ¢ range is received as logic level 0.
It should also be pointed out that the designer has the option of varying the transition point of the inverter, using the design equations in Chapter 4, so that it occurs at some point other than 2/3 Vdd, in order to best fit the error distribution of the channel.
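The pre-processor's behaviour, with a designer-adjustable transition point, can be modelled in software as a sketch (the default 2/3 Vdd threshold and 3.3 V supply are the values used in this chapter; the function itself is only illustrative):

```python
def preprocess(voltage, vdd=3.3, threshold=2/3):
    """Model of the Figure 6.8 pre-processor: the output is logic 1
    only once the input rises above the inverter's transition point,
    so the entire phi zone (and the logic-0 range) is received as 0."""
    return 1 if voltage > threshold * vdd else 0
```

Lowering `threshold` moves the transition point below 2/3 Vdd, shifting more of the ¢ zone into the logic-1 range when the channel's error distribution warrants it.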

Post-toggling two incorrect erasures
In one of the cases described under SEC/DED codes in Section 6.7.2, we described the consequences when there were two erasures. In 25% of the cases (with a uniform distribution), both erasures will be set incorrectly by the simplified scheme discussed in that section, and a double-bit error will be detected and reported; the output will be unusable.
By detecting:
• the double erasure (as opposed to any other double-bit error), and
• the double-bit error returned by the code checker,
we can post-process those two bits using a circuit such as that depicted in Figure 6.9.
In this manner, we can correct those two bits with as much confidence as we could in the double-checker scheme discussed earlier in this chapter. While there is additional space expended on this circuitry, it is not as much as a full dual-checker implementation, and it corrects more than the other single-checker approaches discussed. As always, the designer must consider the tradeoffs involved.

Special case: Bridge detection and correction for bus communications
It should again be emphasized that the techniques suggested in this chapter are meant to be, above all, practical techniques. This implies that, in cases in which special circumstances exist, the designer must as always be alert to the possibility of cost-effective modifications to the underlying concepts. As an example of such an implementation, we consider here the special case of an internal data bus in which temporary bridges are of specific concern.
In Section 2.2.1, we discussed various physical defects that could cause undefined logic levels. Figure 2.2, reproduced here as Figure 6.10, illustrated one of these defects: a bridge between adjacent bus lines.
It is clear that a bridge between two adjacent bus lines can produce a two-bit error. We know from our earlier discussion that we require a distance-3 code to be able to correct two erasures. We also found that it was necessary to generate four alternatives, passing them through four parallel distance-3 code checkers (or sequentially through one).
Consideration of the special case of bridges, however, allows us to eliminate two of the alternatives. For if, in Figure 6.10, the driven values of D1 and D2 are both 0 or both 1, then there is no error; in fact, the effects of the bridge will be undetectable.
Only when one of the driven values is 0 and the other 1 will there be a potential problem. Additionally, only when the resistance of the bridge is low enough will the values be pulled "toward" each other enough to become undefined; if not, they retain their proper, driven values. In the former case, under the reasonable assumption (for parallel bus lines) that both lines are driven and loaded equally, the effect of our low-resistance bridge will be to create two adjacent bit values in the undefined zone.
Since we need to check only two alternatives, we need only two parallel distance-3 code checkers, in an arrangement very similar to that shown in Figure 6.7. That figure need be only slightly modified, as shown in Figure 6.11, by alternating the Vss and Vdd multiplexer inputs so that both "01" and "10" patterns will be generated for any pair of adjacent erasures.
Again, the simplicity of this arrangement for correcting a two-bit error depends on an a priori understanding of the defects that are likely to occur. While this circuit would also properly correct a single-bit erasure, a two-bit erasure in which the proper values were "00" or "11" would not be corrected; instead, the circuit would indicate an uncorrectable error.
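The reduced candidate set for the bridge case can be sketched as follows (the list representation, with `None` for ¢, is an illustrative assumption):

```python
def bridge_candidates(word):
    """For a low-resistance bridge between adjacent bus lines, a pair
    of adjacent erasures can only have been driven '01' or '10' (equal
    driven values would leave both lines valid), so only two
    alternatives need be generated, not four."""
    i = word.index(None)  # first line of the bridged pair
    return (word[:i] + [0, 1] + word[i + 2:],
            word[:i] + [1, 0] + word[i + 2:])
```

Each of the two alternatives would then be passed to a distance-3 checker; the one decoding to a valid codeword is selected.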

Comparison with classic method
Figure 6.11: Distance-3 Correction System for Adjacent Bus Line Bridges
It might be asked how these methods compare with the use of code-checking circuits alone. To illustrate, we use the example of a 9-bit parity checker/corrector circuit fabricated on our proof-of-concept circuit, as discussed and tested in the following section, compared with a simple 1-bit parity checker. Table 6.1, limited to those cases in which a maximum of three errors of both types appear in a 9-bit received word, details the differences in capability based on different input conditions, including patterns that can be successfully handled by neither checker.
The percentage shown for each condition that can be handled by each checking scheme assumes a uniform distribution across ¢: that is, an equal number of ¢ inputs would be interpreted as zeros and ones by the classic parity checker.
The experimental circuit displays results superior to the classic simple parity checker when there is a single erasure. The simple parity checker is superior in detecting errors when there are both a single erasure and one or two full errors in the same word. The results are identical or mixed in other cases.

Two enhancements are shown:
• The Pin signal is used to set "odd" or "even" parity.
• A signal D¢ is generated such that one or more ¢ inputs will set it to 1.

Actual design topology
For reasons of extensibility to any number of bits, the actual design implemented a "bit-slice" approach. A circuit was designed that contained all one-bit components required for the detector, input multiplexers, two parity-based checkers, and the output multiplexer, as shown in Figure 6.13.
Using this approach led to space efficiency as well as to extensibility to greater than 9-bit inputs.

Functional unit topology
For clarity, we present the design of the implemented circuit organized by function.

Detectors and input multiplexers
The design of the detector is straightforward, along the lines described fully in Chapter 3. One output, RDY, is used as a selection signal for the two input multiplexers for each input bit.
When RDY is high, both multiplexers pass the original (valid) input bit through to the pair of checkers. When RDY is low, indicating a ¢ input level, one multiplexer sends a 0 to its checker in place of the original input bit value, and the other sends a 1 to its checker.
Note that the Vss and Vdd inputs alternate multiplexers for successive bits, as in Figure 6.11. This is simply because this part of the circuit was designed to be adaptable to the technique covered in Section 6.7.5 with the substitution of distance-3 checkers for the distance-2 checker implemented. As the assignment of bits in the implementation's scheme is arbitrary, it has no effect on the ability of this circuit to correct 1-bit erasures.

Parity checkers
The checkers implemented in this circuit are straightforward, implementing a bit-by-bit exclusive-OR. The output at the "bottom" of each checker is 0 if a parity error is detected, and 1 if the parity check passes.
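The checker's XOR chain, with the Pin odd/even selection mentioned earlier, can be sketched in software (the signal names follow the text; the functional form is only illustrative):

```python
from functools import reduce

def parity_check(bits, p_in=0):
    """Bit-by-bit XOR chain as in the bit-sliced checker. Folding P_in
    into the chain selects even (P_in = 0) or odd (P_in = 1) parity.
    Returns 1 if the parity check passes, 0 if an error is detected."""
    return 1 if reduce(lambda a, b: a ^ b, bits, p_in) == 0 else 0
```

A word with an even number of 1s passes the even-parity check; setting `p_in=1` inverts the expectation for odd parity.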

Selection circuitry and multiplexer
The selection circuitry is shown in Figure 6.14 (Selection Circuit for 9-bit Parity-Based Corrector). Inputs consist of a "parity error" indicator (0 = no error, 1 = error) from each of the two checkers, and the D¢ line indicating that at least one of the inputs was in the ¢ zone (1 = one or more inputs are ¢, 0 = no inputs are ¢). The circuitry generates the select signal for the bit-sliced output multiplexer, as well as a Parity Error (PEout) output.
The truth table for PEout is shown in Table 6.2.
Notes that apply to the entries in Table 6.2 are as follows:
1. This is the normal state when there are no erasures or other one-bit errors. It can also occur when there is an even number of non-erasure errors.
3. This state occurs when there is no erasure, but there is a one-bit error (or any odd number of one-bit errors) on the input.
4. These states occur when an erasure is indicated, but the two checkers return identical results. This can happen only in the presence of more than one erasure (technically, an even number of erasures).
5. These states occur when there is an erasure that has been corrected. They can also occur when there is an odd number of errors, at least one of which is an erasure.

Testing results
Testing results for this circuit are shown in Tables 6.3 through 6.6. Table 6.3 shows results when all inputs are in the valid binary ranges and parity is set to "Even"; Table 6.4, when all inputs are in the valid binary ranges and parity is set to "Odd"; Table 6.5, when one or more inputs is in the ¢ zone and parity is set to "Even"; and Table 6.6, when one or more inputs is in the ¢ zone and parity is set to "Odd".
The results show that the circuit performs as intended.

Summary
We have briefly reviewed channel models and their associated errors, as well as some basic theoretical concepts in error-control coding, such as distance. We then proceeded to adapt our uncertainty detector to serve the purpose of error location.
This allowed us to use strategies described in the literature to boost the correction capabilities of error-control coding schemes.
We also considered the possibilities for partial implementation of these principles, and found them dependent for their efficacy on asymmetry in the error distribution, or on a restricted set of possible error patterns, both of which are realistic possibilities in specific implementations.
We compared the performance of a parity-based correction circuit to classic parity-based error detection. The proposed circuit allowed error location (and therefore correction) in cases where there was one erasure (0 ⇒ ¢ or 1 ⇒ ¢) and no full errors (0 ⇒ 1 or 1 ⇒ 0) in the received codeword. Use of the circuit was not without its disadvantages, however; when there were both erasures and full errors in the same codeword, error detection was reduced in some cases. As always, the designer of the specific implementation must take channel error characteristics into account, including the probabilities of various types of single and compound errors, in deciding which scheme to use.
Finally, we depicted the design of a 9-bit, parity-based error correction circuit fabricated on the proof-of-concept circuit. We described the bit-sliced design of this experimental circuit, and presented the testing results showing that the circuit performs as intended.

Chapter 7

Summary and conclusions
The major contribution of this research is the consideration of unknown logic level values as information. Much of digital logic design views logic as an abstraction, a dichotomy of zero and one. Although it is well acknowledged in VLSI texts that digital logic circuitry is analog in its ultimate nature, efforts are made to make the reality fit, insofar as possible, the abstraction.
In this work, we developed a design for a detector for unknown logic values that does not depend on the existence of reference voltages. While no implication is made that this is the most efficient detector in any regard, it does provide the information necessary to demonstrate the validity of the concepts covered in this thesis.
Several uses were described for this information, some of them rudimentary but potentially of practical application. We focussed, however, on two specific application areas to illustrate and demonstrate the contribution of this research.
Clock skew, as a result of increasing circuit speeds and concurrently increasing die size, is a serious problem for the future of processor design. Power consumption by advanced processors is also of increasing concern, especially with the proliferation of laptop systems and other portable computing devices. Asynchronous system concepts, especially GALS (Globally Asynchronous Locally Synchronous) constructs, are well suited to address both of these problems. As logic stages are independently, locally clocked, the need for a global clock is reduced or eliminated. Power usage can be greatly reduced without impacting performance, since a local stage without work to do undergoes no state transitions, and so uses no power.
A logic family, Binary Plus logic, and its dynamic version, Centered Binary Plus logic, were developed to fulfill the completion-recognition and self-clocking requirements of GALS systems. The design technique for a Binary Plus gate was developed and proven valid, and Binary Plus gates and combinational multiple-gate logic blocks were shown to be free from race conditions. Binary Plus gates recognize an undefined value on the input, and do not display a valid output until there is a necessary and sufficient condition on the inputs to justify it. This provides clear completion recognition, and also allows the logic stage to take advantage of low-delay input sets. The method has significant advantages over other currently used completion-detection techniques in asynchronous design.
To demonstrate the use of these concepts in asynchronous system design, we designed and fabricated a proof-of-concept circuit containing a 4-bit ripple-carry adder, implemented as a Centered Binary Plus logic stage. Tests on this circuit showed the anticipated effects of input-dependent variations in completion time; the correlation between measured completion time and the performance predicted by a gate-level simulator constructed for the circuit was positive and showed very high statistical significance.
In communications applications, error-control coding techniques have long been used to guard against transmission errors, some of which may be transitions to undefined values. These transitions are termed erasures in the literature. Knowing the location of an error greatly simplifies correcting it, and an error-detecting/correcting code can be used to correct more errors than would be possible without that knowledge. Detecting an undefined logic level on an input can be used as an erasure-location technique, enabling us to use erasure-correction methods well documented in the literature. Such methods were previously limited to environments in which the error could be localized in other ways, such as by a current spike (due to an α-particle strike) or from multi-dimensional parity checks in memory arrays.
A 9-bit, simple parity-based erasure detector/corrector was implemented on the proof-of-concept circuit. This system proved capable of correcting a one-bit erasure, demonstrating that the knowledge that an input is undefined can be used to boost the detection/correction capability of error-control coding.
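The mechanism behind single-erasure correction with a parity bit is simple to sketch. Assuming a 9-bit word of 8 data bits plus one even-parity bit (the word layout here is illustrative, not necessarily that of the fabricated circuit), a single erased position flagged by the ¢ detector can be recovered as the XOR of all the remaining bits, since even parity forces the XOR of the full word to zero.

```python
PHI = "phi"  # marker for a bit received in the undefined zone (cent)

def correct_erasure(word):
    """9-bit word: 8 data bits plus 1 even-parity bit. A single erased
    position (marked PHI) is restored as the XOR of the known bits."""
    erased = [i for i, b in enumerate(word) if b == PHI]
    if len(erased) > 1:
        raise ValueError("simple parity corrects only one erasure")
    if erased:
        known_xor = 0
        for b in word:
            if b != PHI:
                known_xor ^= b
        word = word.copy()
        word[erased[0]] = known_xor  # restores even parity
    return word

data = [1, 0, 1, 1, 0, 0, 1, 0]
parity = [0]                                     # XOR of data is 0
received = data[:3] + [PHI] + data[4:] + parity  # bit 3 erased in transit
assert correct_erasure(received) == data + parity
```

Without the ¢ detector flagging the position, the same parity bit could only report that an error exists somewhere; with the location known, it corrects the bit outright.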

Future work
There are many directions in which further research could extend the concepts introduced in this work.
More space-efficient or faster versions of the detector should be developed. The detector as designed in this work is large; this both takes up space and increases capacitance in the driving circuit, limiting its speed. It is possible that techniques using a Vh supply and non-ratioed inverters might yield a faster, more space-efficient detector. As a Vh supply is of use in precharging Centered Binary Plus logic stages, this would simply be an additional use for it.
Issues of noise margins for this logic family should be examined. It is clear that in some ways the noise margin is decreased relative to standard binary logic, while in other ways it is increased. For example, it would take less noise to push a valid binary value into the undefined state (¢), as the boundary between either valid value and that state is closer than the boundary between 0 and 1 in pure binary logic. On the other hand, it would take more noise to flip a valid binary value to the opposite binary value, as that boundary has been pushed farther away. In short, the chance of transitioning to a detectable error is greater, while the chance of transitioning to a non-detectable error is smaller. In the event that it were desired to transmit a ¢ from one location to another, and have it arrive as a ¢, noise could be a serious consideration, for reasons covered in Chapter 6.
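The asymmetry in noise margins follows directly from the zone boundaries defined for zoned binary, with ¢ spanning the range between 1/3 Vdd and 2/3 Vdd. A small sketch of that classification (the supply value and threshold parameters here are illustrative) makes the two distances explicit.

```python
# Zoned-binary classification: 0 below Vdd/3, 1 above 2*Vdd/3, and the
# undefined zone (PHI) in between, per the boundaries chosen in this work.
VDD = 1.2  # illustrative supply voltage, in volts

def classify(v, lo=VDD / 3, hi=2 * VDD / 3):
    if v <= lo:
        return 0
    if v >= hi:
        return 1
    return "phi"

# A solid 0 (0 V) needs only > Vdd/3 of noise to become a detectable PHI,
# but > 2*Vdd/3 to be misread as a 1 -- the non-detectable error.
assert classify(0.0) == 0
assert classify(0.6) == "phi"   # midpoint lands in the undefined zone
assert classify(1.2) == 1
```

The detectable-error boundary (Vdd/3 away) is thus half the distance of the non-detectable one (2 Vdd/3 away), which is the trade-off described above.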
For asynchronous designs, Binary Plus-compatible input sources, such as pipeline-stage source latches, should be developed and tested. Techniques for widening the pipeline should be explored, including expansion of the source-latch concept into a stage router.
A major application area not addressed in this work is the use of the concepts we have developed in the area of circuit testing. Adaptation of Boundary Scan techniques to the detection of unknown values would be a significant topic in itself.