IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 4, APRIL 2003
A Low-Power ROM Using Charge Recycling and Charge Sharing Techniques
Byung-Do Yang and Lee-Sup Kim
Abstract—In a memory, most power is dissipated in high-capacitive lines such as predecoder lines, wordlines, and bitlines. To reduce the power dissipation in these high-capacitive lines, this paper proposes three techniques using charge recycling and charge sharing. The first is the charge recycling predecoder (CRPD), the second is the charge recycling wordline decoder (CRWD), and the last one is the charge sharing bitline (CSBL) for a ROM. The CRPD and the CRWD recycle the previously used charge in predecoder lines and wordlines. Theoretically, the power consumption in predecoder lines and wordlines is reduced by half. The CSBL reduces the swing voltage in the ROM bitlines to a very small voltage using a charge sharing technique with three small capacitors. The CSBL can significantly reduce the power dissipation in ROM bitlines. The CRPD, the CRWD, and the CSBL consume 82%, 72%, and 64%, respectively, of the power of previous ROM designs. A charge recycling and charge sharing ROM (CRCS-ROM) with the CRPD, the CRWD, and the CSBL is implemented. A CRCS-ROM with 8K × 16 bits was fabricated in a 0.35-µm CMOS process. The CRCS-ROM consumes 8.63 mW at 100 MHz with 3.3 V. The chip core area is 0.51 mm².
Index Terms—Bitline, charge recycling, charge sharing, low-power design, predecoder line, ROM, VLSI design, wordline.
I. INTRODUCTION
Due to the high demands on portable products, power consumption has become a major concern in VLSI chip designs, especially for embedded memories such as SRAM and ROM. In a memory, most power is dissipated in high-capacitive lines such as predecoder lines, wordlines, and bitlines. Fig. 1 shows a conventional ROM architecture. The ROM core dissipates most of the power because wordlines and bitlines have large capacitance due to the high number of cell transistors. The bitlines dissipate a lot of power because they are highly capacitive and many bitlines are selected for each access. The wordlines also consume a large amount of power. Although only one wordline is enabled during a clock cycle, the wordlines have large capacitance because they are connected to the gates of cell transistors. As the size of embedded memory continues to increase, a large memory is hierarchically divided into small memory blocks and only one memory block is selectively accessed. Thus, the power consumption in bitlines and wordlines of the large memory is reduced. However, the power consumption in the predecoder is still large due to the high-capacitive predecoder lines. The predecoder lines are connected to many AND gates in the wordline decoder, as shown in Fig. 1. Moreover, the predecoder lines are shared among the divided memory blocks in the large memory.
Many techniques have been proposed to reduce the power dissipation in these high-capacitive lines of ROMs. Some of the low-power techniques focus on reducing the power consumption by decreasing bitline capacitance and wordline capacitance [1]. This can be achieved by nonzero term minimization, selective precharge, the hierarchical wordline, etc. Nonzero term minimization in the ROM table reduces the number of nMOS transistors in the ROM core. The selective precharge technique precharges only the bitlines to be accessed. The hierarchical wordline technique divides the memory into several blocks and enables only one block. These techniques save power without losing performance.
Other techniques focus on reducing the swing voltage of bitlines [1]–[4]. These techniques are independent of the ROM data and the power is saved in proportion to the swing voltage. Reducing the swing voltage in bitlines is a powerful method. However, these low-swing techniques need special ROM cores and they increase the size of the ROM.
In this paper, three techniques using charge recycling and charge sharing are proposed to reduce the power dissipation in these high-capacitive lines. The first is the charge recycling predecoder (CRPD), the second is the charge recycling wordline decoder (CRWD), and the last is the charge sharing bitline (CSBL) for a ROM. The CRPD and the CRWD recycle the previously used charge in predecoder lines and wordlines in the next clock cycle. The power consumption in predecoder lines and wordlines using the CRPD and the CRWD is theoretically reduced to one half. However, the amount saved is smaller than one half due to the control overheads. The CRPD and the CRWD are adaptable for any low-power memory.
The CSBL reduces the swing voltage in the ROM bitlines to a very small voltage. The CSBL uses the charge sharing technique with three small capacitors. Also, the CSBL has small bitline capacitance because it uses the diffusion programming ROM core. Therefore, the CSBL can significantly reduce the power dissipation in the ROM bitlines. A charge recycling and charge sharing ROM (CRCS-ROM) with the CRPD, the CRWD, and the CSBL is implemented. The CRCS-ROM is suitable for low-power applications.
This paper is organized as follows. In Section II, we propose the three techniques for the CRCS-ROM: the CRPD, the CRWD, and the CSBL. In Section III, we present performance comparisons and show test results of the fabricated chip. The conclusion is given in Section IV.

Manuscript received April 23, 2002; revised December 5, 2002. This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the MICROS, Korea Advanced Institute of Science and Technology.
The authors are with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/JSSC.2003.809516
0018-9200/03$17.00 © 2003 IEEE
Authorized licensed use limited to: University of Punjab. Downloaded on November 25, 2009 at 04:11 from IEEE Xplore. Restrictions apply.
Fig. 1. Conventional ROM architecture.
Fig. 2. Concept of charge recycling predecoder.
II. LOW-POWER TECHNIQUES USING CRCS
A. CRPD
Fig. 2 shows the concept of the proposed CRPD. Conventional predecoders have several predecoder lines. Only one predecoder line is charged to $V_{DD}$ and the other predecoder lines are discharged to ground. We assume that predecoder line 1 changes from $V_{DD}$ to ground and predecoder line 2 changes from ground to $V_{DD}$. Fig. 2 shows only two predecoder lines to explain the operation of the predecoder.
In conventional predecoders, the voltage swing in predecoder lines is from ground to $V_{DD}$. The voltage of the previously selected predecoder line becomes ground from $V_{DD}$ and the voltage of the newly selected predecoder line becomes $V_{DD}$ from ground. Therefore, the power dissipation in the conventional predecoder is
$$P_{\text{conventional}} = f \, C_{PD} \, V_{DD}^{2}$$
where $f$ and $C_{PD}$ are the switching frequency and the predecoder line capacitance, respectively.
In the CRPD, the voltage swing in predecoder lines is also from ground to $V_{DD}$. However, the newly selected predecoder line is charged to $V_{DD}/2$ by the charge sharing between the predecoder lines before the predecoder charges it to $V_{DD}$. Therefore, the CRPD consumes half of the power dissipated in the conventional predecoder. The power dissipation in the CRPD is
$$P_{\text{CRPD}} = \frac{1}{2} \, f \, C_{PD} \, V_{DD}^{2}$$
Fig. 3. (a) Conventional predecoder. (b) Operation of charge recycling predecoder. (c) Charge recycling predecoder.
The operation is as follows. We assume that predecoder line 1 changes from $V_{DD}$ to ground and predecoder line 2 changes from ground to $V_{DD}$.
1) Two predecoder lines are connected to $V_{DD}$ or ground.
2) Two predecoder lines are isolated from $V_{DD}$ and ground and connected to share their charge. Their voltages become $V_{DD}/2$ because the two lines have almost the same capacitance.
3) The predecoder lines are connected again to $V_{DD}$ or ground. During this time, the CRPD consumes power in the predecoder lines.
The predecoder lines are high capacitive because they are connected to many AND gates in the wordline decoder. Therefore, a lot of power is consumed to drive the high-capacitive predecoder lines in the large memory. The power is dissipated in the predecoder lines and the control logic. The CRPD saves the power dissipated in predecoder lines. If the control logic consumes relatively small power compared with the predecoder lines, the theoretical power saving in the CRPD can be up to 50%.
Fig. 3(a) shows a conventional 2-to-4 predecoder. Only one predecoder line among four predecoder lines is charged to $V_{DD}$ and the other predecoder lines are discharged to ground. When the address changes, the previously selected predecoder line is discharged and the newly selected predecoder line is charged.
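The three-step charge recycling operation can be checked with a small energy model. The following is a minimal sketch, assuming ideal switches, two lines of equal capacitance, and no control-logic overhead; the 1-pF line capacitance and the function names are hypothetical and chosen only for illustration.

```python
def conventional_energy(c_line, vdd):
    """Supply energy (J) to charge one line from ground to vdd."""
    return c_line * vdd * vdd


def crpd_energy(c_old, c_new, vdd):
    """Supply energy (J) for one charge recycling transition, plus the shared voltage."""
    # Step 1: the old line rests at vdd and the new line at ground.
    v_old, v_new = vdd, 0.0
    # Step 2: both lines are isolated and shorted together; charge is conserved.
    v_shared = (c_old * v_old + c_new * v_new) / (c_old + c_new)
    # Step 3: the supply lifts the new line from v_shared to vdd.
    # The old line is dumped from v_shared to ground, which draws no supply energy.
    energy = c_new * vdd * (vdd - v_shared)
    return energy, v_shared


C_LINE = 1e-12  # hypothetical predecoder line capacitance (F)
VDD = 3.3       # supply voltage (V)

e_conv = conventional_energy(C_LINE, VDD)
e_crpd, v_mid = crpd_energy(C_LINE, C_LINE, VDD)
ratio = e_crpd / e_conv
print(f"shared voltage = {v_mid:.3f} V, energy ratio = {ratio:.2f}")
```

Under these idealized assumptions the supply only has to provide the second half of the swing, which is where the theoretical 50% bound comes from. Unequal line capacitances or driver overhead would move the ratio away from one half.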
No charge is recycled between the predecoder lines in the conventional predecoder. However, the 2-to-4 CRPD in Fig. 3(b) recycles the charge used in the previously selected predecoder line. The CRPD address changes after the predecoder charge sharing signal (PDCS) becomes “1.” When the address changes, the previously selected predecoder line and the newly selected predecoder line are connected to share their charges. After the charge sharing, their voltages become $V_{DD}/2$. When the PDCS becomes “0,” the predecoder lines are disconnected to drive the predecoder lines to $V_{DD}$ or ground. The previously selected predecoder line is discharged to ground from $V_{DD}/2$ and the newly selected predecoder line is charged to $V_{DD}$ from $V_{DD}/2$. As a result, the newly selected predecoder line consumes half of the power dissipated in the conventional predecoder. The simulated waveforms of the control signals and the predecoder lines in the CRPD are shown in Fig. 4(a).
Fig. 3(c) shows an implementation of the 2-to-4 CRPD. Although the conventional predecoder has only an AND gate and a buffer per predecoder line, the CRPD needs a charge sharing driver for each predecoder line. The charge sharing driver is required to select which predecoder line shares its charge. The charge sharing driver is implemented using simple logic composed of a D-flip-flop, six gates, and a buffer. The D-flip-flop stores the previous status of the predecoder line and the XOR
Fig. 4. Simulated waveforms in: (a) CRPD; (b) CRWD; and (c) CSBL.
gate detects whether the status of the predecoder line changes or not. If the status changes, the transmission gate connected to the predecoder line opens and the predecoder line shares its charge with another predecoder line where the status changes.
The voltage of $V_{DD}/2$ in the CRPD does not cause any leakage current in the wordline decoder in Fig. 5. When the voltage of the predecoder line is $V_{DD}/2$, the wordline enable signal WLE is “0.” After the voltages of the predecoder lines become $V_{DD}$ or ground, the WLE becomes “1.” Memories need some time to precharge bitlines. The CRPD finishes the charge recycling operation during that time. Therefore, the CRPD does not decrease the performance of the memory.

B. CRWD
Fig. 6 shows the concept of the proposed CRWD. A conventional wordline decoder charges a wordline from ground to $V_{DD}$ by row address. After all operations finish in a memory, the selected wordline is discharged to ground. Therefore, the power dissipation in the conventional wordline decoder is
$$P_{\text{conventional}} = f \, C_{WL} \, V_{DD}^{2}$$
where $f$ and $C_{WL}$ are the switching frequency and the wordline capacitance, respectively.
In the CRWD, the voltage swing in the wordlines is from ground to $V_{DD}$. However, the CRWD recycles the charge used in the previously selected wordline using a large capacitor. The voltage of the large capacitor is about $V_{DD}/2$. The newly selected wordline, which was at ground, is charged to $V_{DD}/2$ by the large capacitor during the first charge sharing. After that, the wordline is charged to $V_{DD}$ from $V_{DD}/2$. Therefore, the CRWD consumes half of the power dissipated in the conventional wordline decoder. The power dissipation in the CRWD is
$$P_{\text{CRWD}} = \frac{1}{2} \, f \, C_{WL} \, V_{DD}^{2}$$
The CRWD in Fig. 6 shows a charge sharing driver and a wordline decoder selected by row address. The operation of the CRWD is as follows.
1) The voltage of the large capacitor is $V_{DD}/2$ and the voltage of the wordline is at ground.
2) The large capacitor and the wordline are connected to share their charge. The charge in the large capacitor is transferred to the wordline. The swing voltage of the large capacitor is very small whereas the swing voltage of the wordline is almost equal to the voltage of the large capacitor because the large capacitor has a much larger capacitance than the wordline.
3) The wordline is disconnected from the large capacitor and it is connected to $V_{DD}$. The voltage of the wordline becomes $V_{DD}$.
4) The wordline is connected again to the large capacitor for the second charge sharing. As a result, the charge stored in the wordline is transferred to the large capacitor. The same amount of charge, which the large capacitor lost to the wordline at the first charge sharing, is supplied from the wordline.
5) The wordline is discharged to ground and the large capacitor is floated.
Fig. 5 shows an implementation of the CRWD. The CRWD uses a large capacitor for the charge recycling operation. Initially, the voltages in all wordlines are at ground. When the
Fig. 5. CRWD.
wordline enable signal WLE becomes “1” and the wordline charge sharing signal WLCS becomes the first “1,” a charge sharing driver and a wordline decoder are selected by row address and they connect a wordline to the large capacitor. The charge in the large capacitor is transferred to the wordline. The voltage of the wordline becomes almost the same as the voltage of the large capacitor. After the first charge sharing, the wordline is charged to $V_{DD}$ by the wordline high signal (HIGH). The voltage of the large capacitor becomes near $V_{DD}/2$. Therefore, the required power for charging the wordline to $V_{DD}$ is reduced to one half. The second “1” of the WLCS reconnects the wordline to the large capacitor. At this time, the charge stored in the wordline is transferred to the large capacitor until the wordline and the large capacitor have the same voltage. The same amount of charge, which the large capacitor lost at the first charge sharing to the wordline, is supplied from the wordline by the second charge sharing. Hence, the voltage of the large capacitor remains constant because the amount of the lost charge and the obtained charge are the same. The simulated waveforms of the control signals and the wordline in the CRWD are shown in Fig. 4(b).
The CRWD does not need a voltage generator to make $V_{DD}/2$. The voltage of the large capacitor becomes near $V_{DD}/2$ automatically after several charge recycling operations. When the large capacitor is connected to a wordline whose voltage is at ground and at $V_{DD}$, the lost charge and the obtained charge of the large capacitor are
$$Q_{\text{lost}} = \frac{C_{L} C_{WL}}{C_{L} + C_{WL}} \, V_{L} \quad \text{and} \quad Q_{\text{obtained}} = \frac{C_{L} C_{WL}}{C_{L} + C_{WL}} \, (V_{DD} - V_{L})$$
where $C_{WL}$, $C_{L}$, and $V_{L}$ are the wordline capacitance, the large capacitor, and the voltage of the large capacitor, respectively. If the large capacitor has a much larger capacitance than the wordline, the swing voltage of the large capacitor becomes approximately zero. At this time, the lost charge and the obtained charge are
$$Q_{\text{lost}} \approx C_{WL} V_{L} \quad \text{and} \quad Q_{\text{obtained}} \approx C_{WL} (V_{DD} - V_{L})$$
The wordline is charged to $V_{L}$ from ground at the first charge sharing and it is discharged to $V_{L}$ from $V_{DD}$ at the second charge sharing. At the steady state when the lost charge and the obtained charge are exactly the same, the voltage of the large capacitor becomes
$$V_{L} = \frac{V_{DD}}{2}$$
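The self-biasing behavior of the large capacitor can be explored numerically. The sketch below is an idealized model, not the authors' simulation: it assumes ideal switches, exact charge conservation at each sharing step, and an initially empty capacitor. The 8-pF capacitor at ten times the wordline capacitance matches the values quoted in the performance comparison; the function names are hypothetical.

```python
def crwd_cycle(v_cap, c_cap, c_wl, vdd):
    """One access: returns (new capacitor voltage, supply energy in J)."""
    # First charge sharing: a wordline at ground is tied to the large capacitor.
    v_first = c_cap * v_cap / (c_cap + c_wl)
    # The driver then lifts the wordline from v_first to vdd using the supply.
    e_supply = c_wl * vdd * (vdd - v_first)
    # Second charge sharing: the wordline at vdd is tied back to the capacitor.
    v_second = (c_cap * v_first + c_wl * vdd) / (c_cap + c_wl)
    return v_second, e_supply


def simulate(n_cycles, c_cap, c_wl, vdd, v_start=0.0):
    """Repeat accesses starting from an arbitrary capacitor voltage."""
    v_cap, history = v_start, []
    for _ in range(n_cycles):
        v_cap, e_supply = crwd_cycle(v_cap, c_cap, c_wl, vdd)
        history.append((v_cap, e_supply))
    return history


C_WL = 0.8e-12   # wordline capacitance (F), one tenth of the large capacitor
C_CAP = 8e-12    # large capacitor (F)
VDD = 3.3        # supply voltage (V)

history = simulate(200, C_CAP, C_WL, VDD)
v_final, e_final = history[-1]
saving = 1.0 - e_final / (C_WL * VDD * VDD)
print(f"capacitor settles near {v_final / VDD:.3f} * VDD, saving = {saving:.1%}")
```

With a finite capacitance ratio, this model settles close to, rather than exactly at, half the supply voltage, and the ideal saving sits somewhat below 50%. Driver and control overheads are ignored, so the saving printed here is an upper bound for this ratio.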
Initially, the voltage $V_{L}$ of the large capacitor is unspecified because the large capacitor is floated. If $V_{L}$ is not $V_{DD}/2$, the lost charge and the obtained charge are different so that $V_{L}$ changes. At this time, with wordline capacitance $C_{WL}$ and large capacitor $C_{L}$, the voltage difference of the large capacitor becomes
$$\Delta V_{L} = \frac{C_{WL}}{C_{L} + C_{WL}} \, (V_{DD} - 2 V_{L})$$
When $V_{L}$ is $V_{DD}/2$, it keeps a constant voltage. However, if $V_{L}$ is larger than $V_{DD}/2$, $\Delta V_{L}$ becomes smaller than zero. Then $V_{L}$ decreases toward $V_{DD}/2$. On the contrary, if $V_{L}$ is smaller than $V_{DD}/2$, $\Delta V_{L}$ becomes larger than zero and $V_{L}$ increases. Therefore, no voltage generator is required to generate $V_{DD}/2$. The voltage is generated automatically during several charge recycling operations.
Fig. 7 shows the voltage variation of the large capacitor after the CRWD is initialized. Several clock cycles are needed for the voltage to be $V_{DD}/2$. If a larger capacitor is used, more clock cycles are required to charge the capacitor. When the large capacitor has ten times larger capacitance than the wordline, about ten clock cycles are enough for the voltage to be $V_{DD}/2$. The CRWD does not consume additional power to generate $V_{DD}/2$. However, its energy efficiency is lower before the voltage of the large capacitor becomes $V_{DD}/2$.
Although the large capacitor must have much larger capacitance than the wordline for more energy efficiency, the chip area limits the physical size of the large capacitor. We can choose the size based on the designer’s requirement. However, ten times larger capacitance than the wordline capacitance is enough for conventional low-power applications. In that case, the power-saving ratio in the wordline becomes 45%.

Fig. 6. Concept of CRWD.
Fig. 7. Voltage variation of the large capacitor after the CRWD is initialized.

C. CSBL
Fig. 8 shows the concept of the proposed CSBL. The CSBL reduces the swing voltage in the bitlines using three capacitors for each group. The first capacitor is the sum of the drain capacitances of the column select transistors and the wiring capacitance. If more swing voltage is needed, an additional capacitor can be used. The other two small capacitors are used to make a reference voltage of a sense amplifier. As shown in Fig. 8, the swing voltage in the bitlines becomes small by the charge sharing between a selected bitline and three small capacitors. For a given swing voltage in the selected bitline, the power consumption of the CSBL is
where and are the switching frequency and the bitline capacitance, respectively. The swing voltage is
where is the threshold voltage of the pMOS transistor. The is controllable by sizing charge sharing voltage and .
Fig. 8. Concept of CSBL.
Fig. 8 shows a simplified CSBL structure composed of a selected bitline and the three capacitors. All switches represent transistors and the numbers on the switches show the timing sequence in which the transistors are turned on. One transistor is a pMOS transistor and the other transistors are nMOS transistors. The operation of the CSBL is as follows.
1) Two of the three capacitors are charged and the third is discharged to ground. The voltage in the selected bitline is at ground if the bitline is previously discharged. If not, it has an arbitrary voltage. This just increases the charge sharing voltage without affecting the power. The higher charge sharing voltage improves the noise margin in the bitline. Therefore, we assume that in the worst case the bitline voltage is initially at ground.
2) One of the column select transistors is turned on. The bitline and the two charged capacitors are connected and share their charges. The resulting charge sharing voltage is the total charge of the connected capacitors divided by their total capacitance.
3) A wordline becomes $V_{DD}$. If the ROM data is “1,” the voltage of the bitline remains at the charge sharing voltage. If not, the voltage becomes ground. As a result, the sensed voltage becomes the charge sharing voltage or ground according to the ROM data. Therefore, a sense amplifier is needed to detect the ROM data from the small swing voltage.
The sense amplifier needs a reference voltage to compare with the input voltage. The reference voltage is made by the two small reference capacitors. Their voltages become the charge sharing voltage and ground, respectively. When a wordline becomes $V_{DD}$, the two reference capacitors are connected and their voltages become half of the charge sharing voltage. Therefore, the reference voltage becomes half of the charge sharing voltage.
Fig. 9 shows a ROM using the CSBL. The ROM has several groups. Each group obtains data from multiple bitlines. Only one bitline is selected among the bitlines of a group. The CSBL reduces the swing voltage in the selected bitline. Therefore, the power dissipation in the ROM using the CSBL is significantly reduced.
The CSBL uses three extra small capacitors per group. The first capacitor has much smaller capacitance than the bitline capacitance. It is the sum of the drain capacitances of the column select transistors and the wiring capacitance. If more swing voltage is needed in the bitline, an additional capacitor can be used to increase this capacitance. We can control the swing voltage by sizing it. The other two small capacitors are used to make the reference voltage of the sense amplifier. The sense amplifier obtains data by comparing the voltage of the selected bitline and the reference voltage. If the ROM data is “1,” the voltage of the bitline remains at the charge sharing voltage. If not, the voltage becomes ground. The reference voltage is half of the charge sharing voltage. Control signals for the charge sharing operation in the CSBL are two precharge signals, an S0 signal, an S1 signal, and two sense amplifier enable signals. The simulated waveforms of the control signals,
Fig. 9. ROM using CSBL.
Fig. 10. Sense amplifier used in the CSBL.
the selected bitline, and the three capacitors in the CSBL are shown in Fig. 4(c).
The two reference capacitors must have exactly the same capacitance to increase the noise margin. Their minimum size is chosen in the range where the reference voltage can overcome voltage variations from external and internal noises and layout mismatches. Fig. 10 shows the sense amplifier used in the CSBL. It must have a symmetric layout to obtain correct data because the input voltage difference is very small. To make the input voltages of the sense amplifier robust against noises and layout mismatches, we increase the three capacitors. Although the larger capacitors increase the noise margin, they cause more power dissipation by increasing the swing voltage in the bitline.
A conventional diffusion programming ROM cell, shown in Fig. 11, is appropriate for the CSBL because it has small and regular bitline capacitance. It guarantees that the swing voltage is almost the same for all bitlines. Therefore, we can choose the swing voltage by the sensitivity of the sense amplifier.
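The charge sharing arithmetic behind the bitline swing and the reference voltage can be sketched with a charge conservation helper. This is an illustrative model only: all capacitance values and names are hypothetical, the bitline is assumed to start at ground (the worst case described above), and both charged capacitors are assumed to start at the full supply voltage, which ignores the transistor threshold voltage that appears in the paper's own expression.

```python
def share(nodes):
    """Common voltage after shorting capacitors; nodes is [(capacitance, voltage), ...]."""
    total_charge = sum(c * v for c, v in nodes)
    total_cap = sum(c for c, _ in nodes)
    return total_charge / total_cap


VDD = 3.3            # supply voltage (V)
C_BITLINE = 1.0e-12  # hypothetical bitline capacitance (F)
C_SHARE = 0.15e-12   # hypothetical column-select and wiring capacitance (F)
C_REF = 0.05e-12     # hypothetical reference capacitor (F); both are equal

# Bitline at ground shares charge with the two charged capacitors.
v_cs = share([(C_BITLINE, 0.0), (C_SHARE, VDD), (C_REF, VDD)])
# One reference capacitor at v_cs then shares with its grounded twin.
v_ref = share([(C_REF, v_cs), (C_REF, 0.0)])
# The sense amplifier sees v_cs ("1") or ground ("0") against v_ref.
margin = min(v_cs - v_ref, v_ref - 0.0)
print(f"swing = {v_cs:.3f} V, reference = {v_ref:.3f} V, margin = {margin:.3f} V")
```

Because the two reference capacitors are equal, the reference lands midway between the two possible bitline levels, so the sense amplifier input difference is half of the bitline swing. Enlarging the charged capacitors raises both the swing and the margin, at the cost of more charge drawn per access.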
Fig. 11. Diffusion programming ROM cell. (a) Schematic. (b) Layout.
The CSBL reduces power by lowering the swing voltage in bitlines. The CSBL lowers the swing voltage using capacitors instead of using internal voltage down converters because the
TABLE I PERFORMANCE COMPARISON
charge sharing using capacitors is more efficient than the voltage converters. If the voltage converter is used, two voltage converters are required to generate a swing voltage in bitlines and half of the swing voltage. The voltage converter increases area and power consumption. Moreover, the voltage converter works and consumes power even if the ROM does not work, whereas the CSBL works and consumes power only when the ROM is used. Also, the voltage converter requires a reference voltage generator, but the CSBL does not need it, because the CSBL generates the swing voltage using capacitors. The swing voltage is easily adjusted by changing the size of the capacitors. Therefore, the CSBL is more efficient than the conventional voltage converter.

III. PERFORMANCE COMPARISON AND TEST RESULTS

A. Performance Comparisons
Table I and Fig. 12 show performance comparisons. Conventional low-power ROMs and the proposed ROM are implemented for the comparisons. All circuit simulations are based on a 0.35-µm CMOS process and HSPICE model. Parasitic capacitances are included in the simulations. Powers are measured at 100-MHz clock frequency with $V_{DD} = 3.3$ V.
As shown in Fig. 13, the CRCS-ROM with the CRPD, the CRWD, and the CSBL is proposed. The CRPD and the CRWD recycle the previously used charge in the predecoder lines and wordlines, respectively. The CRPD recycles the charge by connecting the previously selected predecoder line and the newly selected predecoder line. A large capacitor is used for the charge recycling operation in the CRWD. The large capacitor has ten times larger capacitance than the wordline. The capacitor supplies the charge to make the voltage of the selected wordline $V_{DD}/2$. After the wordline is used, the capacitor obtains the charge which the capacitor supplied to the wordline. The CRCS-ROM has three capacitors per group to reduce the swing voltage in the selected bitlines. The CRCS-ROM uses the diffusion programming ROM core to reduce both area and power consumption.
The compared conventional ROMs are a conventional low-power ROM (CV-ROM) [1] and a ROM implemented with high-capacitive CMOS circuits using a charge sharing
Fig. 12. Power consumptions in ROMs (8K × 16 bits).
scheme (HCCS-ROM) [3]. Fig. 14 shows the CV-ROM. It uses the conventional low-power techniques such as selective precharge, nMOS precharge, and the diffusion programming ROM core. Only one bitline per group is selectively precharged to $V_{DD} - V_{tn}$ by using nMOS transistors for the column select transistors. The diffusion programming ROM core reduces both area and power compared with the contact programming ROM core.
Fig. 15 shows the HCCS-ROM. It uses a charge sharing scheme similar to the CSBL. At first, a small dummy capacitor is charged to $V_{DD}$. Then, the dummy capacitor is connected to a selected bitline to share its charge. Because the dummy capacitor has smaller capacitance than the bitline, the voltage of the bitline becomes very small after the charge sharing. When the ROM data is “1,” the bitline remains at the swing voltage. When the ROM data is “0,” it is discharged to ground.
The HCCS-ROM reduces the swing voltage in the bitline. However, the HCCS-ROM uses three bitlines for one data bit per group. Two additional bitlines and two additional dummy capacitors are required for two reference voltages of the sense amplifier in Fig. 16. The voltage of the low reference column (LRC) is the same as the voltage of the bitline when the ROM data is “0.” The voltage of the high reference column (HRC) is the same as the voltage of the bitline when the ROM data is “1.” The sense amplifier obtains the stored data by comparing the voltages of the bitline, the LRC, and the HRC. Two reference voltage lines and bitlines must have exactly the same capacitance to generate
Fig. 13. CRCS-ROM.
Fig. 14. CV-ROM.
the reference voltages which are the same as the voltage of the bitlines when the ROM data is “0” or “1.” Therefore, the HCCS-ROM programs data by connecting the gate of the cell transistor to wordline or ground, as shown in Fig. 17. The layout area is much larger than the diffusion programming
ROM core. It increases the capacitances of wordlines and bitlines. All bitlines including the reference voltage lines must be discharged in every cycle to make the same swing voltage. These result in higher power consumption and greater area overhead. The bitline swing voltage in the HCCS-ROM is
Fig. 15. HCCS-ROM.
Fig. 16. Sense amplifier used in the HCCS-ROM.
significantly reduced. However, the HCCS-ROM uses three bitlines to obtain one data bit and its ROM core is larger than the diffusion programming ROM core.
We assume that the minimum required voltage difference of sense amplifiers is 300 mV. Therefore, we controlled the bitline swing voltages in the simulations. The bitline swing voltages of the CV-ROM, the HCCS-ROM, and the CRCS-ROM are $V_{DD} - V_{tn}$, 300 mV, and 600 mV, respectively. In the CV-ROM, the bitline swing voltage is $V_{DD} - V_{tn}$ because nMOS transistors are used to precharge the bitlines. In the HCCS-ROM, the bitline swing voltage is 300 mV because the sense amplifiers directly sense the bitline swing voltages. In the CRCS-ROM, the bitline swing voltage is 600 mV because the reference voltage becomes half of the bitline swing voltage and the sense amplifier senses half of the bitline swing voltage. The bitline swing
Fig. 17. ROM cell used in the HCCS-ROM. (a) Schematic. (b) Layout.
voltage in the CRCS-ROM is two times the minimum required voltage difference of the sense amplifier.
Table I shows performance comparisons of low-power ROMs. Various types of 128-kb ROMs (8K × 16 bits, 512
wordlines × 256 bitlines) with minimum cell transistors are simulated for the comparisons. The CRPD with 512 wordlines consumes 82% power compared with the conventional predecoder. The CRWD with 256 bitlines consumes 72% power compared with the conventional wordline decoder, when the programmed data are all “0” for the worst case and the large capacitor is 8 pF, which is ten times the wordline capacitance. The CSBL with 256 bitlines and 16 groups consumes only 25% and 64% power compared with the CV-ROM and the HCCS-ROM, respectively. The swing voltage of the CSBL is two times that of the HCCS-ROM, but the CSBL uses only one bitline to obtain one data bit, whereas the HCCS-ROM uses three bitlines. The total power dissipation of the CRCS-ROM is 63% and 80% compared with the CV-ROM and the HCCS-ROM, respectively. Power comparisons in each part of the ROMs are shown in Fig. 12.
Theoretically, the CRPD and the CRWD can save up to 50% of the power, but they save 18% and 28% power in the implemented chip due to the control overheads. These techniques are adaptable for any low-power memory. The CSBL achieves 36% power saving in the bitlines compared with the HCCS-ROM. The CSBL reduces the swing voltage in the bitlines to a very small voltage. The CSBL needs a careful design when the additional capacitors are chosen. However, the CSBL can significantly reduce the power dissipation in the bitlines of ROMs.
The CRCS-ROM has control overheads for the charge recycling and charge sharing operations. The CRPD is more complex than the conventional predecoder. Although the number of gates in the CRPD is much larger than in the conventional predecoder, all gates are minimized. Therefore, the size of the CRPD with three 2-to-4 predecoders is 8500 µm² (170 µm × 50 µm), whereas the size of the conventional predecoder is 2800 µm² (140 µm × 20 µm). The CRPD is three times larger than the conventional predecoder. The CRWD needs four charge sharing drivers and a large capacitor.
The charge sharing driver is simple and small because it is composed of only three gates and three MOS transistors. The large capacitor is implemented by ten dummy wordlines to make it ten times the wordline capacitance. Therefore, the number of wordlines becomes 522 from 512. The ROM core becomes 2% larger. The CSBL needs three small capacitors per group. The capacitors are implemented by the same transistors used in the diffusion ROM core to adjust ratios of capacitances of bitlines and the small capacitors. To make the swing voltage 0.6 V when the supply voltage is 3.3 V, the number of transistors for the small capacitors becomes 18% of the number of transistors connected to a bitline. However, the small capacitors are shared in a group. Therefore, when a group has 16 bitlines, the small capacitors increase the size of the ROM by only 1.1%.
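The overhead figures quoted in this comparison follow from simple arithmetic, which the short script below reproduces. It uses only numbers stated in the text; the variable names are hypothetical and the script is a consistency check, not part of the original evaluation.

```python
# Predecoder area from the stated layout dimensions (um x um).
crpd_area = 170 * 50          # CRPD with three 2-to-4 predecoders
conventional_area = 140 * 20  # conventional predecoder
area_ratio = crpd_area / conventional_area

# Measured savings implied by the relative power figures (percent).
crpd_saving = 100 - 82
crwd_saving = 100 - 72
csbl_saving_vs_hccs = 100 - 64

# Large capacitor built from ten dummy wordlines added to 512 wordlines.
total_wordlines = 512 + 10
core_growth = 10 / 512

# Small capacitors cost 18% of one bitline's transistors, shared by 16 bitlines.
csbl_growth = 0.18 / 16

print(f"CRPD area ratio  = {area_ratio:.2f}")
print(f"savings (%)      = {crpd_saving}, {crwd_saving}, {csbl_saving_vs_hccs}")
print(f"wordlines        = {total_wordlines}, core growth = {core_growth:.1%}")
print(f"CSBL size growth = {csbl_growth:.2%}")
```

The results line up with the rounded values in the text: roughly a threefold predecoder area, about 2% ROM core growth from the dummy wordlines, and about 1.1% growth from the shared small capacitors.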
B. Test Results
Fig. 4 showed the simulated waveforms in the CRPD, the CRWD, and the CSBL. The waveforms show that the charge recycling operations and the charge sharing operation in the CRCS-ROM work correctly. Table II tabulates the features of
TABLE II FEATURES OF THE TEST CHIP
Fig. 18. Chip micrograph.
Fig. 19. Measured waveforms of test chip at 100 MHz with 3.3 V.
the test chip. A 128-kb ROM with 16-bit output is fabricated using a 0.35-µm CMOS process with a 3.3-V supply. The maximum clock speed is 120 MHz and the power consumption is 8.63 mW at a 100-MHz clock. The core area of the test chip is 0.51 mm². Fig. 18 shows the test chip micrograph. Fig. 19 shows measured waveforms of the test chip at 100 MHz with 3.3 V.
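The measured power and clock frequency also give the energy per read access, a convenient figure when comparing memories that run at different clock rates. The sketch below assumes one 16-bit read per clock cycle and an even split across the output bits. Both assumptions and all names are illustrative and are not taken from the paper.

```python
# Energy per read access derived from the measured test-chip figures.
# ASSUMPTION: one 16-bit read is performed in every clock cycle.

def energy_per_access_pj(power_mw: float, freq_mhz: float) -> float:
    """Energy per clock cycle in picojoules (mW / MHz = nJ, so x1000)."""
    return power_mw / freq_mhz * 1000.0

access_pj = energy_per_access_pj(8.63, 100.0)  # whole 16-bit word
per_bit_pj = access_pj / 16                    # per output bit

print(f"{access_pj:.1f} pJ per access, {per_bit_pj:.2f} pJ per bit")
```

With the reported 8.63 mW at 100 MHz, this works out to roughly 86.3 pJ per access, or about 5.4 pJ per output bit.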
IV. CONCLUSION
This paper proposes three techniques, namely, the CRPD, the CRWD, and the CSBL, which use charge recycling and charge sharing to reduce the power dissipation in high-capacitive lines of ROMs such as predecoder lines, wordlines, and bitlines. The CRPD and the CRWD recycle the previously used charge in predecoder lines and wordlines, respectively. Theoretically, they can save 50% of the power, but the savings are reduced by control overheads. These techniques are applicable to any low-power memory. The CSBL reduces the swing voltage in the ROM bitlines to a very small voltage using the charge sharing technique with three additional capacitors, and it can considerably reduce the power dissipated in the ROM bitlines. The simulation results show that the CRPD, the CRWD, and the CSBL consume 82%, 72%, and 64%, respectively, of the power of previous designs in a ROM with 8K × 16 bits. A CRCS-ROM with the CRPD, the CRWD, and the CSBL is implemented. A CRCS-ROM with 8K × 16 bits was fabricated in a 0.35-µm CMOS process. The CRCS-ROM consumes 8.63 mW at 100 MHz with 3.3 V. The chip core area is 0.51 mm². The maximum operating clock frequency is 120 MHz.
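The theoretical 50% bound can be illustrated with an idealized two-line model. The sketch below is a simplification under stated assumptions and is not the authors' circuit: it supposes that the falling line and the rising line have equal capacitance, that they are shorted together until they settle at half the supply voltage, and that the supply then only tops up the rising line. The capacitance value and the function names are hypothetical.

```python
# Idealized charge-recycling energy model (illustrative only).
# ASSUMPTIONS: equal line capacitances, complete charge equalization,
# and no control or switch overhead.

def conventional_energy(c: float, vdd: float) -> float:
    """Energy drawn from the supply to charge a line from 0 V to VDD."""
    return c * vdd * vdd

def recycled_energy(c: float, vdd: float) -> float:
    """Energy drawn from the supply when the rising line starts at the
    equalized voltage instead of 0 V."""
    v_shared = (c * vdd + c * 0.0) / (2 * c)  # charge conservation
    return c * vdd * (vdd - v_shared)

C_LINE = 0.8e-12  # hypothetical line capacitance in farads
VDD = 3.3         # supply voltage in volts, as in the test chip

ratio = recycled_energy(C_LINE, VDD) / conventional_energy(C_LINE, VDD)
print(f"supply energy with recycling / without: {ratio:.2f}")
```

In this model the ratio is one half regardless of the capacitance chosen. The simulated 82% and 72% figures are higher than this bound because of the control overheads noted in the text.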
ACKNOWLEDGMENT
The authors would like to thank the Electronics and Telecommunications Research Institute (ETRI) for the chip fabrication.
REFERENCES
[1] E. de Angel and E. E. Swartzlander, Jr., “Survey of low-power techniques for ROMs,” in Proc. Int. Symp. Low Power Electronics and Design, 1997, pp. 7–11.
[2] R. Sasagawa, I. Fukushi, M. Hamaminato, and S. Kawashima, “High-speed cascode sensing scheme for 1.0-V contact-programming mask ROM,” in Symp. VLSI Circuits Dig. Tech. Papers, 1999, pp. 95–96.
[3] M. M. Khellah and M. I. Elmasry, “Low-power design of high-capacitive CMOS circuits using a new charge sharing scheme,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1999, pp. 286–287.
[4] B.-D. Yang and L.-S. Kim, “A low-power charge-recycling ROM architecture,” in Proc. IEEE Int. Symp. Circuits and Systems, 2001, pp. 510–513.
[5] B.-D. Yang and L.-S. Kim, “A low-power ROM using charge recycling and charge sharing,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2002, pp. 108–109.
[6] H. Yamauchi, H. Akamatsu, and T. Fujita, “An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSIs,” IEEE J. Solid-State Circuits, vol. 30, pp. 423–431, Apr. 1995.
[7] M. Hiraki et al., “Data-dependent logic swing internal bus architecture for ultralow-power LSIs,” IEEE J. Solid-State Circuits, vol. 30, pp. 397–402, Apr. 1995.
[8] K. W. Mai, T. Mori, B. S. Amrutur, R. Ho, B. Wilburn, M. A. Horowitz, I. Fukushi, T. Izawa, and S. Mitarai, “Low-power SRAM design using half-swing pulse-mode techniques,” IEEE J. Solid-State Circuits, vol. 33, pp. 1659–1671, Nov. 1998.
Byung-Do Yang received the B.S. and M.S. degrees in electrical engineering and computer science from the Korea Advanced Institute of Science and Technology, Taejon, in 1999 and 2001, respectively, where he is currently working toward the Ph.D. degree in electrical engineering and computer science. His research interests include low-power and high-speed digital circuit design, low-power memory design, and multimedia VLSI design.
Lee-Sup Kim received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1982 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1986 and 1990, respectively. He was a Postdoctoral Fellow with Toshiba Corporation, Kawasaki, Japan, from 1990 to 1993, where he was involved in the design of the high-performance digital signal processor and single-chip MPEG2 decoder. Since March 1993, he has been with the Korea Advanced Institute of Science and Technology, Taejon. In March 1997, he became an Associate Professor. During 1998, he was on sabbatical with Chromatic Research and SandCraft, Inc., Sunnyvale, CA. His research interests are in three-dimensional graphics hardware design, LCD display controller design, multimedia programmable processor design, and high-speed and low-power digital integrated circuit design.