Implementations of Delay-Insensitive Circuits∗
Joel Reyes Noche
Department of Mathematics and Natural Sciences
Ateneo de Naga University
Naga City, Camarines Sur, Philippines
February 20, 2009
Abstract
A delay-insensitive (DI) circuit is a digital logic circuit that operates correctly regardless of any delays in the logic gates (modules) or in the wires (interconnection lines). In practice, DI circuits are constructed by putting together DI primitive modules. These primitives have implementations in CMOS (complementary metal oxide semiconductor) technology, in RSFQ (rapid single flux quantum) technology, in SET (single electron tunneling) technology, and on asynchronous cellular automata (ACA).
1 Introduction
An asynchronous circuit is a digital logic circuit that does not use a global clock signal. Delay-insensitive (DI) circuits are asynchronous circuits that make the least restrictive timing assumptions. DI circuits are made by connecting DI building blocks (called primitive modules or primitives). As long as the internal timing assumptions of these blocks are satisfied, the behavior of a circuit composed of these primitives is not affected by the operating speed of the modules or by the delays in the wires connecting them.

Although many delay-insensitive building blocks have been proposed in the past (for example, [3, 6]), we consider only the blocks created by Patra and Fussell [9, 10]. They present sets of blocks that are universal (they can be used to make any DI circuit) and minimal (no proper subset of them is sufficient for making all such circuits). Software models of these blocks are described in [13]; the models aid in visualizing the primitives’ operation.

A description of a DI primitive consists of two parts: the accepted behavior of the module in response to its environment, and the accepted behavior of the environment in response to the module. We simplify the discussion here and describe a primitive by giving examples of accepted and unaccepted behaviors.

We focus on one DI primitive, the Merge.1 A Merge has two input ports a? and b? and one output port c!. A signal transition on one input port, say a?, once assimilated by the module, leads to a signal transition on the output port c!. The Merge is serial. That is, every signal on one of its input ports must be followed by exactly one signal on an output port of the module before the module can assimilate the next input signal [6, p. 3]. For example, b?c!a?c! and a?c!a?c! are accepted behaviors for the Merge, while a?b?, a?c!c!, and c! are unaccepted behaviors. In a?b?, a second input arrives before the output. In a?c!c!, one input produces two outputs. In c!, an output occurs with no preceding input. In the remainder of the paper, we show how a Merge is implemented in different technologies.
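The accepted/unaccepted distinction for the Merge can be made concrete with a small trace checker. The Python sketch below assumes that a behavior is written as a string of the events a?, b?, and c! with no separators. It treats an incomplete trace such as a? (an input still awaiting its output) as accepted, on the assumption that every prefix of an accepted behavior is itself accepted. The names parse and merge_accepts are illustrative only.

```python
import re

# Events of the Merge: inputs a?, b? and output c!, written back to back.
EVENT = re.compile(r"a\?|b\?|c!")


def parse(trace: str) -> list[str]:
    """Split a trace such as "b?c!a?c!" into ["b?", "c!", "a?", "c!"]."""
    events = EVENT.findall(trace)
    if "".join(events) != trace:
        raise ValueError(f"malformed trace: {trace!r}")
    return events


def merge_accepts(trace: str) -> bool:
    """Return True if the trace respects the Merge's serial behavior.

    Assumption: a trace that stops after an input (e.g. "a?") is accepted,
    since the output may simply not have occurred yet.
    """
    awaiting_output = False  # True between an input and its output
    for event in parse(trace):
        if event in ("a?", "b?"):
            if awaiting_output:  # a second input before the output
                return False
            awaiting_output = True
        else:  # event == "c!"
            if not awaiting_output:  # an output with no input to answer
                return False
            awaiting_output = False
    return True


for t in ["b?c!a?c!", "a?c!a?c!", "a?b?", "a?c!c!", "c!"]:
    print(t, merge_accepts(t))
```

The checker needs only one bit of state, whether an output is pending. That bit is exactly what the serial property of the Merge requires.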
2 Complementary Metal Oxide Semiconductor Technology
In a metal-oxide-semiconductor field-effect transistor (MOSFET), a conducting metallic (e.g., aluminum) gate is electrically isolated from a semiconductor (e.g., silicon) channel by an insulating oxide layer (e.g., silicon dioxide). Complementary MOS (CMOS) digital logic uses p-channel and n-channel (enhancement-type) MOSFETs (PMOS and NMOS) in complementary networks (when one conducts, the other does not). When used as a switch, a MOSFET has three terminals: a drain, a gate, and a source.2 When the gate voltage is high, a PMOS does not conduct current from its source to its drain (it is off), and an NMOS conducts current from its drain to its source (it is on). When the gate voltage is low, a PMOS is on and an NMOS is off. (For details, see, for example, [1, §5.5, 5.6, 5.7].)

∗ Presented at the 2009 Engineering Bicol Research Conference held at the Ateneo de Naga University.
1 For detailed descriptions of the other primitives, see [9, 10] and also [6, §2].
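The on/off rules for PMOS and NMOS switches can be captured in a minimal switch-level sketch in Python. It models each transistor only as a gate-controlled switch, with no voltages, currents, or delays. An inverter and a two-input NAND gate are not discussed in this paper; they appear here purely as standard illustrations of a PMOS pull-up network and an NMOS pull-down network that never conduct at the same time. The assertions check that exactly one network conducts for every input.

```python
# Switch-level sketch: logic levels are 0 (low) and 1 (high).

def nmos_on(gate: int) -> bool:
    """An NMOS conducts when its gate is high."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """A PMOS conducts when its gate is low."""
    return gate == 0


def cmos_inverter(a: int) -> int:
    pull_up = pmos_on(a)    # PMOS between the output and the supply (high)
    pull_down = nmos_on(a)  # NMOS between the output and ground (low)
    assert pull_up != pull_down, "exactly one network must conduct"
    return 1 if pull_up else 0


def cmos_nand(a: int, b: int) -> int:
    pull_up = pmos_on(a) or pmos_on(b)     # two PMOS in parallel
    pull_down = nmos_on(a) and nmos_on(b)  # two NMOS in series
    assert pull_up != pull_down, "exactly one network must conduct"
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b))
```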
Figure 1: Two-input CMOS xor gate (figure taken from [8, p. 162] and modified)

A Merge can be implemented in CMOS as an xor gate [16, p. 77] (see Figure 1). Note that here a signal transition is taken to be a change in voltage (either from high to low or from low to high). Because the output of an xor gate changes level whenever exactly one of its inputs changes level, each assimilated input transition produces one output transition. CMOS implementations of some of Patra and Fussell’s other building blocks are in [9, 10, 16]. CMOS implementations of DI circuits are complex [11, p. 42], which makes them inefficient and impractical: “[I]n CMOS technology asynchronous designs actually perform worse [than synchronous designs] with respect to power consumption, wiring requirements, and speed” [6, p. 1034].
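A small level-based Python sketch shows why the xor gate behaves as a Merge, and why the environment must respect the Merge's serial protocol. Signal levels are 0 and 1, and a transition is a level change. As an assumption standing in for gate delay, input changes that happen before the gate re-evaluates are grouped into one batch. This is a deliberately crude delay model, not a circuit simulation, and xor_merge_outputs is a hypothetical helper name.

```python
def xor_merge_outputs(batches: list[str]) -> int:
    """Count output transitions of an xor-gate Merge.

    Each string in `batches` lists the input wires ("a" or "b") that change
    level before the gate next settles; this batching is a simplifying
    assumption standing in for the gate's delay.
    """
    a = b = 0
    c = a ^ b  # output level starts at 0
    output_transitions = 0
    for batch in batches:
        for wire in batch:
            if wire == "a":
                a ^= 1  # a transition: low->high or high->low
            elif wire == "b":
                b ^= 1
            else:
                raise ValueError(f"unknown input wire: {wire!r}")
        settled = a ^ b
        if settled != c:
            output_transitions += 1
            c = settled
    return output_transitions


# Serial use (a?c!b?c!a?c!): one output transition per input transition.
print(xor_merge_outputs(["a", "b", "a"]))  # 3
# Unaccepted use (a?b? before the gate responds): the two changes cancel.
print(xor_merge_outputs(["ab"]))           # 0
```

When the inputs change one at a time, every input transition yields one output transition. When the environment produces a?b? too quickly, the two input transitions cancel and no output transition appears. This illustrates why the implementation relies on the environment never producing a?b?.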
3 Rapid Single Flux Quantum Technology
Rapid single flux quantum (RSFQ) logic is based on low temperature superconductors using Josephson junctions (JJ) as the basic switching elements. In current technology, a JJ consists of a pair of niobium superconductor electrodes separated by aluminum oxide as a thin tunnel barrier. Below a certain critical current Ic, such junctions carry superconducting current with no need for a bias voltage across the junction, thus not dissipating any energy. When the induced current exceeds Ic, the junction becomes resistive and undergoes a Josephson 2π phase leap which produces a non-zero voltage drop across the junction. This effect in two-terminal Josephson junctions is the basis for the switching action needed to build logic devices. [...] In RSFQ logic circuits, [...] a bit of information is carried by the propagation of a single magnetic flux quantum Φ0 [...]. These quanta can equivalently be thought of as very short voltage pulses $V(t)$ of quantized area, where $\Phi_0 = \int V(t)\,dt \approx 2.07\,\mathrm{mV\cdot ps}$. Pulse propagation takes place by biasing the JJs in the circuit so that an arriving flux quantum will cause a JJ to exceed its critical current Ic, thereby switching and emitting a new flux quantum. A full flux quantum is generated whenever a JJ’s Ic is exceeded, regardless of any degradation that may have occurred to the triggering quantum, thus providing for power gain in RSFQ circuits. [11, pp. 44–45]

An RSFQ implementation3 of a Merge is shown in Figure 2.

2 A fourth terminal, the substrate, is connected to the highest voltage of the circuit (for PMOS) or to the lowest voltage of the circuit (for NMOS).
3 Patra, Polonsky, and Fussell [11] do not specify the values of the circuit elements for this implementation of a Merge, although for the other DI RSFQ primitives they presented, biasing currents were a little less than a tenth of a milliampere, JJ critical currents were a few tenths of a milliampere, and inductances were a few picohenrys.
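The quantized pulse area in the quotation above can be checked numerically. The Python sketch below computes Φ0 = h/(2e) from the SI values of the Planck constant and the elementary charge, then converts it to mV·ps. It then integrates an assumed Gaussian pulse shape with a standard deviation of 1 ps. Both the Gaussian shape and its width are hypothetical, since real SFQ pulses are not exactly Gaussian. The result shows that a pulse carrying one flux quantum over about a picosecond peaks below one millivolt.

```python
import math

H = 6.62607015e-34   # Planck constant in J*s (exact in the SI)
E = 1.602176634e-19  # elementary charge in C (exact in the SI)

PHI0 = H / (2 * E)         # flux quantum in Wb, i.e. V*s
PHI0_MV_PS = PHI0 / 1e-15  # 1 mV*ps = 1e-3 V * 1e-12 s = 1e-15 V*s


def gaussian_pulse_mv(t_ps: float, sigma_ps: float) -> float:
    """Hypothetical Gaussian SFQ pulse, scaled so its area is one flux quantum."""
    peak = PHI0_MV_PS / (sigma_ps * math.sqrt(2 * math.pi))
    return peak * math.exp(-0.5 * (t_ps / sigma_ps) ** 2)


def pulse_area_mv_ps(sigma_ps: float, dt_ps: float = 0.001) -> float:
    """Numerically integrate the pulse over +/- 8 standard deviations."""
    start = -8 * sigma_ps
    steps = int(round(16 * sigma_ps / dt_ps))
    return dt_ps * sum(gaussian_pulse_mv(start + k * dt_ps, sigma_ps)
                       for k in range(steps))


print(f"Phi0 = {PHI0_MV_PS:.3f} mV*ps")               # Phi0 = 2.068 mV*ps
print(f"area = {pulse_area_mv_ps(1.0):.3f} mV*ps")     # area = 2.068 mV*ps
print(f"peak = {gaussian_pulse_mv(0.0, 1.0):.2f} mV")  # peak = 0.82 mV
```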
Figure 2: RSFQ Merge (figure taken from [11, p. 46])

[T]he two inputs are a and b, and the output is c. Ib1 is the bias current, which is split between two arms feeding junctions J3, J1 and J4, J2. The critical current thresholds are arranged so that Ic3 < Ic1 and likewise Ic4 < Ic2. When a pulse on a arrives, additional current flows through J1, exceeding its critical current, whereupon J1 goes resistive. J3 is not triggered since the current induced by the input pulse is in the opposite direction from the bias current. The SFQ pulse developed consequently across J1 is transferred through J3, across which the potential is still zero, to J5. This causes J5 to trip and emit an output pulse at c. The pulse generated by J1 also trips J4, whose critical current is less than that of J2. When J4 becomes resistive, it prevents J2 from tripping. Since there is no voltage drop across J2, no pulse is emitted back through input b. Inputs on b operate symmetrically. Thus the junctions J3 and J4 serve to isolate the inputs from each other and provide signal directionality (from inputs to output). [11, p. 46]

RSFQ implementations of some of Patra and Fussell’s other building blocks are given in [11], and some of these have been fabricated and tested at low frequencies. These DI RSFQ primitives have been used in designs of self-timed pipelined parallel adders [2]. RSFQ technology has sub-picosecond junction switching speed (allowing operation at several hundred gigahertz) and very low power dissipation (below one microwatt per JJ even in its resistive state). However, because RSFQ circuits use low temperature superconductors, they must be cooled using liquid helium. There are also “limitations on RSFQ memory density due to the large physical size of a flux quantum and the difficulty of amplifying output signals to off-chip power levels at speeds comparable to those attainable on the chip” [11, p. 45].
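The isolation argument in the quotation above can be restated as a toy rule. When a pulse from one input reaches the other arm, the junction with the lower critical current in that arm switches first and absorbs the pulse. The Python sketch below encodes only that rule and does not simulate circuit dynamics. Its critical currents of 0.2 mA and 0.3 mA are hypothetical, chosen within the "few tenths of a milliampere" range mentioned in footnote 3. The Arm class and pulse_on function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One input arm of the RSFQ Merge (toy model, currents in mA)."""
    input_jj: str        # junction fed by the input: J1 for a, J2 for b
    isolation_jj: str    # junction toward the output: J3 for a, J4 for b
    ic_input: float      # hypothetical critical current of input_jj
    ic_isolation: float  # hypothetical critical current of isolation_jj


def pulse_on(arms: dict[str, Arm], port: str) -> list[str]:
    """Junctions that switch for one pulse on `port` ("a" or "b")."""
    src = arms[port]
    other = arms["b" if port == "a" else "a"]
    switched = [src.input_jj, "J5"]  # J5 emits the output pulse on c
    # Toy rule: in the other arm, the weaker junction switches and absorbs
    # the pulse; if that is the input junction, a pulse would leak backward.
    if other.ic_isolation < other.ic_input:
        switched.append(other.isolation_jj)
    else:
        switched.append(other.input_jj)
    return switched


good = {"a": Arm("J1", "J3", 0.3, 0.2), "b": Arm("J2", "J4", 0.3, 0.2)}
print(pulse_on(good, "a"))  # ['J1', 'J5', 'J4']  (nothing sent back on b)
print(pulse_on(good, "b"))  # ['J2', 'J5', 'J3']  (nothing sent back on a)

bad = {"a": Arm("J1", "J3", 0.2, 0.3), "b": Arm("J2", "J4", 0.2, 0.3)}
print(pulse_on(bad, "a"))   # ['J1', 'J5', 'J2']  (J2 trips: pulse leaks to b)
```

Reversing the ordering of critical currents makes the input junction of the other arm switch instead. In the actual circuit, that would send a pulse back out through the other input, which is what the arrangement Ic3 < Ic1 and Ic4 < Ic2 prevents.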
4 Single Electron Tunneling Technology
Single electron tunneling (SET) technology is based on the quantum tunneling effect, wherein an electron has a non-zero probability of passing through a potential barrier. In logic circuits using this technology, the switching elements are quantum tunnel junctions. (See [14, §2.1] for more details.) Tunneling through a junction becomes possible when the voltage applied to the junction exceeds the junction’s critical voltage. Since electron tunneling is stochastic in nature, the switching delay is a random variable [15, p. 704]. At temperatures above zero kelvin, an electron may tunnel through a junction even though the critical voltage condition is not met. To ensure that thermal effects do not dominate, the temperature must be kept low enough that the charging energy is much greater than the thermal energy [14, p. 8].

A SET implementation4 of a Merge is shown in Figure 3.

Figure 3: SET Merge (figure taken from [15, p. 705] and modified)

The inverted signals $V_{\hat{a}}$ and $V_{\hat{b}}$ are produced using SET static inverting buffers (Figure 4). When both inputs are low the inverted input signals are high causing an electron to tunnel through J2 leaving a positive charge on n2. This in turn causes an electron to tunnel through J3 leaving a positive charge on n3. This is inverted and so the output becomes low, as it should be. If one of the input signals undergoes a transition then the voltage of J2 (and J1) becomes lower than the critical voltage causing the electron to tunnel back leaving no charge on n2 (and n1). This reduces the voltage over J3 to under the critical voltage and the electron tunnels back leaving no charge on n3. This value is complemented afterwards by the output inverter thus the output becomes high, as it should. If the second signal also undergoes a transition then the voltage over J1 becomes higher than the critical voltage causing an electron to tunnel. This cause[s] an electron to tunnel through J3 leaving a positive charge on n3 corresponding to a low output, as it should. If subsequently one of the signals transitions again, going low this time, the circuit goes into the previous state with no charges on n1, n2, and n3. If after that the other signal also transitions the circuit returns to the original charge neutral state. [15, p. 705]

4 Safiruddin and Cotofana [15] use the values Ca = Cb = Ct = 0.5 aF, Cs1 = 9.5 aF, Cs2 = 10.5 aF, Cg = 10 aF, C1 = C2 = C3 = 0.1 aF, Vs = 16 mV, and the resistance of each junction is 25.8 kΩ.
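Two quantitative points from the SET discussion, that the charging energy must dominate the thermal energy and that the switching delay is random, can be illustrated with a short Python sketch. It rests on three assumptions:

- the charging energy of an island is e²/(2C) for total island capacitance C;
- "much greater" means a factor of 10, an arbitrary illustrative margin;
- at zero temperature, once the junction voltage Vj exceeds the critical voltage Vc, tunneling occurs at a constant rate (|Vj| − Vc)/(eR), as in the orthodox theory of single-electron tunneling.

Under the last assumption the waiting time is exponentially distributed, so a delay budget can only be chosen for a target error probability. The 1 aF island capacitance and 1 mV overdrive are hypothetical. Only the 25.8 kΩ junction resistance comes from footnote 4.

```python
import math

E = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23   # Boltzmann constant, J/K


def charging_energy_j(c_total_f: float) -> float:
    """Charging energy e^2 / (2C) of an island with total capacitance C."""
    return E**2 / (2 * c_total_f)


def max_temperature_k(c_total_f: float, margin: float = 10.0) -> float:
    """Highest T with E_C >= margin * k_B * T (the margin is an assumption)."""
    return charging_energy_j(c_total_f) / (margin * K_B)


def switching_delay_s(p_error: float, r_ohm: float, overdrive_v: float) -> float:
    """Time after which the chance that no electron has tunneled is p_error.

    Assumes a zero-temperature tunnel rate of overdrive / (e * R), where the
    overdrive is how far the junction voltage exceeds its critical voltage,
    so the waiting time is exponentially distributed.
    """
    rate = overdrive_v / (E * r_ohm)  # tunneling events per second
    return -math.log(p_error) / rate


# Hypothetical island capacitance of 1 aF.
print(f"{max_temperature_k(1e-18):.0f} K")  # 93 K
# 25.8 kOhm junction (footnote 4), hypothetical 1 mV overdrive, 1e-8 error.
print(f"{switching_delay_s(1e-8, 25.8e3, 1e-3) * 1e12:.0f} ps")  # 76 ps
```

Halving the target error probability adds only a constant ln 2 / rate to the delay. Raising the overdrive shortens the delay in inverse proportion.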
Figure 4: SET static inverting buffer (figure taken from [15, p. 705] and modified)

Note that these designs were simulated under ideal conditions (zero kelvin temperature, no cotunneling or background charge effects) [15] and have not yet been fabricated.
5 Asynchronous Cellular Automata
A cellular automaton is an array of cells arranged and connected uniformly. “Each cell is connected to a neighborhood of a finite number of cells, and having a state from a finite state set. Each cell undergoes state transitions according to a transition function, which determines the cell’s state based on the states of cells in its neighborhood.” [5, §2] Asynchronous cellular automata (ACA) “allow any cell to undergo state transitions at arbitrary times independent of the timings of the other cells’ transitions. Due to the asynchronicity, however, computation in ACA may be nondeterministic, i.e., more than one global configuration may evolve from a certain configuration.” [5, §2]

Lee et al. [4, 5] consider two-dimensional ACA in which each cell has a neighborhood composed of its four orthogonally adjacent cells along with itself (a von Neumann neighborhood). They were able to embed a universal set of DI building blocks5 [6] in a 5-state ACA, achieving computational universality [5]. Later, they embedded some of Patra and Fussell’s primitives in a 4-state ACA (see Figure 5). From these they constructed a universal logic element (a Rotary Element [7]), showing that this model also has computational universality.
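The asynchronous update scheme quoted above can be sketched in Python. This is not the 4-state transition rule set of Lee et al., which is given only in Figure 5. Instead it uses a hypothetical one-line rule (copy_north) on a grid with wrap-around edges. The aim is only to show the von Neumann neighborhood, the one-cell-at-a-time updating, and how two update orders from the same configuration can produce different global configurations.

```python
import random
from typing import Callable

Grid = list[list[int]]
# A rule maps (center, north, east, south, west) states to the new center state.
Rule = Callable[[int, int, int, int, int], int]


def von_neumann(grid: Grid, r: int, c: int) -> tuple[int, int, int, int, int]:
    """States of cell (r, c) and its four orthogonal neighbors (wrapping edges)."""
    rows, cols = len(grid), len(grid[0])
    return (grid[r][c],
            grid[(r - 1) % rows][c],
            grid[r][(c + 1) % cols],
            grid[(r + 1) % rows][c],
            grid[r][(c - 1) % cols])


def update_cell(grid: Grid, r: int, c: int, rule: Rule) -> None:
    """Asynchronous update: only this cell changes, in place."""
    grid[r][c] = rule(*von_neumann(grid, r, c))


def run_async(grid: Grid, rule: Rule, steps: int, rng: random.Random) -> None:
    """Update one randomly chosen cell per step, in no fixed order."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(steps):
        update_cell(grid, rng.randrange(rows), rng.randrange(cols), rule)


def copy_north(center: int, north: int, east: int, south: int, west: int) -> int:
    """Hypothetical toy rule: a cell takes its northern neighbor's state."""
    return north


# The same start configuration, updated in two different orders.
g1 = [[1], [0], [0]]
update_cell(g1, 1, 0, copy_north)
update_cell(g1, 2, 0, copy_north)
g2 = [[1], [0], [0]]
update_cell(g2, 2, 0, copy_north)
update_cell(g2, 1, 0, copy_north)
print(g1, g2)  # [[1], [1], [1]] [[1], [1], [0]]
```

The two runs differ only in the order in which the same two cells are updated, yet they end in different configurations. This is the nondeterminism the quotation describes. DI designs such as those of Lee et al. are built so that the intended computation does not depend on such orderings.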
Figure 5: Transition rules in A4 (figure taken from [4, p. 207] and modified)

A Merge Core is shown in Figure 6. If we connect two Entrances (Figure 7), a Turn Core (Figure 8), and an Exit (Figure 9), we get the Merge module shown in Figure 10. Although no physical implementations of ACA models like this have yet been created, some possible candidates are discussed in [12].
5 Their primitives differ from Patra and Fussell’s [9, 10] by allowing input and output lines of modules to be bi-directional and able to buffer signals.

Figure 6: (a) ACA Merge Core; (b) Merge Core operating on a signal on its right internal path (figure taken from [4, p. 212] and modified)
Figure 7: (a) ACA Entrance; (b) Entrance operating on an input signal (figure taken from [4, p. 208] and modified)
Figure 8: (a) ACA Turn Core; (b) Turn Core operating on a signal arriving on its lower internal path (figure taken from [4, p. 210] and modified)
Figure 9: (a) ACA Exit; (b) Exit operating on a signal (figure taken from [4, p. 209])
Figure 10: ACA Merge module (figure taken from [4, p. 213] and modified)

References
[1] Jose Araneta. A First Course in Semiconductor Devices and Circuits. National Book Store, Mandaluyong City, Philippines, 2007.
[2] Y. Kameda, S. Polonsky, M. Maezawa, and T. Nanya. Self-timed parallel adders based on DI RSFQ primitives. IEEE Transactions on Applied Superconductivity, 9(2):4040–4045, June 1999.
[3] Robert Keller. Towards a theory of universal speed-independent modules. IEEE Transactions on Computers, 23(1):21–33, 1974.
[4] Jia Lee, Susumu Adachi, Ferdinand Peper, and Shinro Mashiko. Delay-insensitive computation in asynchronous cellular automata. Journal of Computer and System Sciences, 70:201–220, 2005.
[5] Jia Lee, Susumu Adachi, Ferdinand Peper, and Kenichi Morita. Embedding universal delay-insensitive circuits in asynchronous cellular spaces. Fundamenta Informaticae, XX:1–24, 2003.
[6] Jia Lee, Ferdinand Peper, Susumu Adachi, and Kenichi Morita. Universal delay-insensitive circuits with bi-directional and buffering lines. IEEE Transactions on Computers, 53(8):1034–1046, August 2004.
[7] Kenichi Morita. A simple universal logic element and cellular automata for reversible computing. In Maurice Margenstern and Yurii Rogozhin, editors, MCU, volume 2055 of Lecture Notes in Computer Science, pages 102–113. Springer, 2001.
[8] Joel Noche. An asynchronous single-precision floating-point arithmetic unit. Master’s thesis, University of the Philippines, Diliman, College of Engineering, 2003.
[9] Priyadarsan Patra and Donald Fussell. Building-blocks for designing DI circuits. Technical Report TR93-23, Department of Computer Sciences, University of Texas at Austin, 1993.
[10] Priyadarsan Patra and Donald Fussell. Efficient building blocks for delay insensitive circuits. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 196–205. IEEE Computer Society, November 1994.
[11] Priyadarsan Patra, Stanislav Polonsky, and Donald Fussell. Delay insensitive logic for RSFQ superconductor technology. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 42–53. IEEE Computer Society, April 1997.
[12] Ferdinand Peper, Jia Lee, Susumu Adachi, and Shinro Mashiko. Laying out circuits on asynchronous cellular arrays: A step towards feasible nanocomputers? Nanotechnology, 14(4):469–485, 2003.
[13] Jesse Sacayanan and Joel Noche. Modeling of delay-insensitive circuit building-blocks using the Hamburg design system. Philippine Engineering Journal, XXIII(2):11–18, December 2002.
[14] Saleh Safiruddin. Single electron tunneling based building blocks for delay insensitive circuits. Master’s thesis, Delft University of Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, 2008.
[15] Saleh Safiruddin and Sorin Cotofana. Building blocks for delay-insensitive circuits using single electron tunneling devices. In Proceedings of the IEEE International Conference on Nanotechnology, pages 704–708. IEEE, August 2007.
[16] Philip Shirvani, Subhasish Mitra, Jo Ebergen, and Marly Roncken. DUDES: A fault abstraction and collapsing framework for asynchronous circuits. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 73–82. IEEE Computer Society, April 2000.