Plls In High Performance Systems_final

Contents

Preface

About the Author

Part I

Original Contributions

Devices and Circuits for Phase-Locked Systems
  B. Razavi

Delay-Locked Loops—An Overview
  C.-K. Ken Yang

Delta-Sigma Fractional-N Phase-Locked Loops
  I. Galton

Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems
  R. C. Walker

Predicting the Phase Noise and Jitter of PLL-Based Frequency Synthesizers
  K. S. Kundert

Part II

Devices

Physics-Based Closed-Form Inductance Expression for Compact Modeling of Integrated Spiral Inductors
  S. Jenei, B. K. J. C. Nauwelaers, and S. Decoutere (IEEE Journal of Solid-State Circuits, January 2002)

The Modeling, Characterization, and Design of Monolithic Inductors for Silicon RF IC's
  J. R. Long and M. A. Copeland (IEEE Journal of Solid-State Circuits, March 1997)

Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC's
  A. M. Niknejad and R. G. Meyer (IEEE Journal of Solid-State Circuits, October 1998)

Stacked Inductors and Transformers in CMOS Technology
  A. Zolfaghari, A. Chan, and B. Razavi (IEEE Journal of Solid-State Circuits, April 2001)

Estimation Methods for Quality Factors of Inductors Fabricated in Silicon Integrated Circuit Process Technologies
  K. O (IEEE Journal of Solid-State Circuits, August 1998)

A Q-Factor Enhancement Technique for MMIC Inductors
  M. Danesh, J. R. Long, R. A. Hadaway, and D. L. Harame (Dig. IEEE Radio Frequency Integrated Circuits Symposium, April 1998)

On-Chip Spiral Inductors with Patterned Ground Shields for Si-Based RF IC's
  C. Patrick Yue and S. S. Wong (IEEE Journal of Solid-State Circuits, May 1998)

The Effects of a Ground Shield on the Characteristics and Performance of Spiral Inductors
  S.-M. Yim, T. Chen, and K. O (IEEE Journal of Solid-State Circuits, February 2002)

Temperature Dependence of Q and Inductance in Spiral Inductors Fabricated in a Silicon-Germanium/BiCMOS Technology
  R. Groves, D. L. Harame, and D. Jadus (IEEE Journal of Solid-State Circuits, September 1997)

Substrate Noise Coupling Through Planar Spiral Inductor
  A. L. Pun, T. Yeung, J. Lau, F. J. R. Clement, and D. K. Su (IEEE Journal of Solid-State Circuits, June 1998)

Design of High-Q Varactors for Low-Power Wireless Applications Using a Standard CMOS Process
  A.-S. Porret, T. Melly, C. C. Enz, and E. A. Vittoz (IEEE Journal of Solid-State Circuits, March 2000)

On the Use of MOS Varactors in RF VCO's
  P. Andreani and S. Mattisson (IEEE Journal of Solid-State Circuits, June 2000)

Part III

Phase Noise and Jitter

Low-Noise Voltage-Controlled Oscillators Using Enhanced LC-Tanks
  J. Craninckx and M. Steyaert (IEEE Transactions on Circuits and Systems-II, December 1995)

A Study of Phase Noise in CMOS Oscillators
  B. Razavi (IEEE Journal of Solid-State Circuits, March 1996)

A General Theory of Phase Noise in Electrical Oscillators
  A. Hajimiri and T. H. Lee (IEEE Journal of Solid-State Circuits, February 1998)

Physical Processes of Phase Noise in Differential LC Oscillators
  J. J. Rael and A. A. Abidi (IEEE Custom Integrated Circuits Conference, May 2000)

Phase Noise in LC Oscillators
  K. A. Kouznetsov and R. G. Meyer (IEEE Journal of Solid-State Circuits, August 2000)

The Effect of Varactor Nonlinearity on the Phase Noise of Completely Integrated VCOs
  J. W. M. Rogers, J. A. Macedo, and C. Plett (IEEE Journal of Solid-State Circuits, September 2000)

Jitter in Ring Oscillators
  J. A. McNeill (IEEE Journal of Solid-State Circuits, June 1997)

Jitter and Phase Noise in Ring Oscillators
  A. Hajimiri, S. Limotyrakis, and T. H. Lee (IEEE Journal of Solid-State Circuits, June 1999)

A Study of Oscillator Jitter Due to Supply and Substrate Noise
  F. Herzel and B. Razavi (IEEE Transactions on Circuits and Systems-II, January 1999)

Measurements and Analysis of PLL Jitter Caused by Digital Switching Noise
  P. Larsson (IEEE Journal of Solid-State Circuits, July 2001)

On-Chip Measurement of the Jitter Transfer Function of Charge-Pump Phase-Locked Loops
  B. R. Veillette and G. W. Roberts (IEEE Journal of Solid-State Circuits, March 1998)

Part IV

Building Blocks

A Low-Noise, Low-Power VCO with Automatic Amplitude Control for Wireless Applications
  M. A. Margarit, J. L. Tham, R. G. Meyer, and M. J. Deen (IEEE Journal of Solid-State Circuits, June 1999)

A Fully Integrated VCO at 2 GHz
  M. Zannoth, B. Kolb, J. Fenk, and R. Weigel (IEEE Journal of Solid-State Circuits, December 1998)

Tail Current Noise Suppression in RF CMOS VCOs
  P. Andreani and H. Sjöland (IEEE Journal of Solid-State Circuits, March 2002)

Low-Power Low-Phase-Noise Differentially Tuned Quadrature VCO Design in Standard CMOS
  M. Tiebout (IEEE Journal of Solid-State Circuits, July 2001)

Analysis and Design of an Optimally Coupled 5-GHz Quadrature LC Oscillator
  J. van der Tang, P. van de Ven, D. Kasperkovitz, and A. van Roermund (IEEE Journal of Solid-State Circuits, May 2002)

A 1.57-GHz Fully Integrated Very Low-Phase-Noise Quadrature VCO
  P. Vancorenland and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, May 2002)

A Low-Phase-Noise 5GHz Quadrature CMOS VCO Using Common-Mode Inductive Coupling
  S. L. J. Gierkink, S. Levantino, R. C. Frye, and V. Boccuzzi (European Solid-State Circuits Conference, September 2002)

An Integrated 10/5GHz Injection-Locked Quadrature LC VCO in a 0.18-μm Digital CMOS Process
  A. Ravi, K. Soumyanath, L. R. Carley, and R. Bishop (European Solid-State Circuits Conference, September 2002)

Rotary Traveling-Wave Oscillator Arrays: A New Clock Technology
  J. Wood and S. Lipa (IEEE Journal of Solid-State Circuits, November 2001)

35-GHz Static and 48-GHz Dynamic Frequency Divider IC's Using 0.2-μm AlGaAs/GaAs-HEMT's
  Z. Lao, W. Bronner, A. Thiede, M. Schlechtweg, A. Hulsmann, M. Rieger-Motzer, G. Kaufel, B. Raynor, and M. Sedler (IEEE Journal of Solid-State Circuits, October 1997)

Superharmonic Injection-Locked Frequency Dividers
  H. R. Rategh and T. H. Lee (IEEE Journal of Solid-State Circuits, June 1999)

A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-μm CMOS Technology
  C. S. Vaucher, I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang (IEEE Journal of Solid-State Circuits, July 2000)

A 1.75-GHz/3-V Dual-Modulus Divide-by-128/129 Prescaler in 0.7-μm CMOS
  J. Craninckx and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, July 1996)

A 1.2 GHz CMOS Dual-Modulus Prescaler Using New Dynamic D-Type Flip-Flops
  B. Chang, J. Park, and W. Kim (IEEE Journal of Solid-State Circuits, May 1996)

High-Speed Architecture for a Programmable Frequency Divider and a Dual-Modulus Prescaler
  P. Larsson (IEEE Journal of Solid-State Circuits, May 1996)

A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit Technique (E-TSPC)
  J. N. Soares, Jr. and W. A. M. Van Noije (IEEE Journal of Solid-State Circuits, January 1999)

A Simple Precharged CMOS Phase Frequency Detector
  H. O. Johansson (IEEE Journal of Solid-State Circuits, February 1998)

Part V

Clock Generation by PLLs and DLLs

A 320 MHz, 1.5 mW @ 1.35 V CMOS PLL for Microprocessor Clock Generation
  V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra (IEEE Journal of Solid-State Circuits, Nov. 1996)

A Low Jitter 0.3-165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation
  H. C. Yang, L. K. Lee, and R. S. Co (IEEE Journal of Solid-State Circuits, April 1997)

Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques
  J. G. Maneatis (IEEE Journal of Solid-State Circuits, Nov. 1996)

A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340-612 MHz
  D. W. Boerstler (IEEE Journal of Solid-State Circuits, April 1999)

A 960-Mb/s/pin Interface for Skew-Tolerant Bus Using Low Jitter PLL
  S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim (IEEE Journal of Solid-State Circuits, May 1997)

Active GHz Clock Network Using Distributed PLLs
  V. Gutnik and A. P. Chandrakasan (IEEE Journal of Solid-State Circuits, Nov. 2000)

A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control
  J. Lee and B. Kim (IEEE Journal of Solid-State Circuits, August 2000)

A Low-Jitter 125-1250-MHz Process-Independent and Ripple-Poleless 0.18-μm CMOS PLL Based on a Sample-Reset Loop Filter
  A. Maxim, B. Scott, E. M. Schneider, M. L. Hagge, S. Chacko, and D. Stiurca (IEEE Journal of Solid-State Circuits, Nov. 2001)

A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines
  Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, and C. Kim (IEEE Journal of Solid-State Circuits, May 2001)

An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation and Low-Jitter Performance
  Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim (IEEE Journal of Solid-State Circuits, March 2000)

A Semidigital Dual Delay-Locked Loop
  S. Sidiropoulos and M. A. Horowitz (IEEE Journal of Solid-State Circuits, Nov. 1997)

A Wide-Range Delay-Locked Loop with a Fixed Latency of One Clock Cycle
  H.-H. Chang, J.-W. Lin, C.-Y. Yang, and S.-I. Liu (IEEE Journal of Solid-State Circuits, August 2002)

A Portable Digital DLL for High-Speed CMOS Interface Circuits
  B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chan, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz (IEEE Journal of Solid-State Circuits, May 1999)

CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator
  C. J. Foley and M. P. Flynn (IEEE Journal of Solid-State Circuits, March 2001)

A 1.5 V 86 mW/ch 8-Channel 622-3125-Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio
  F. Yang, J. O'Neill, P. Larsson, D. Inglis, and J. Othmer (Dig. International Solid-State Circuits Conference, Feb. 2002)

A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM
  F. Lin, J. Miller, A. Schoenfeld, M. Ma, and R. J. Baker (IEEE Journal of Solid-State Circuits, April 1999)

A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM
  S. J. Kim, S. H. Hong, J.-K. Wee, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung (IEEE Journal of Solid-State Circuits, June 2002)

Part VI

RF Synthesis

An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time
  C. S. Vaucher (IEEE Journal of Solid-State Circuits, April 2000)

A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers
  W. S. T. Yan and H. C. Luong (IEEE Journal of Solid-State Circuits, Feb. 2001)

A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver
  H. R. Rategh, H. Samavati, and T. H. Lee (IEEE Journal of Solid-State Circuits, May 2000)

A 2.6-GHz/5.2-GHz Frequency Synthesizer in 0.4-μm CMOS Technology
  C. Lam and B. Razavi (IEEE Journal of Solid-State Circuits, May 2000)

Fast Switching Frequency Synthesizer with a Discriminator-Aided Phase Detector
  C.-Y. Yang and S.-I. Liu (IEEE Journal of Solid-State Circuits, Oct. 2000)

Low-Power Dividerless Frequency Synthesis Using Aperture Phase Detection
  A. R. Shahani, D. K. Shaeffer, S. S. Mohan, H. Samavati, H. R. Rategh, M. del M. Hershenson, M. Xu, C. P. Yue, D. J. Eddleman, M. A. Horowitz, and T. H. Lee (IEEE Journal of Solid-State Circuits, Dec. 1998)

A Stabilization Technique for Phase-Locked Frequency Synthesizers
  T.-C. Lee and B. Razavi (Dig. Symposium on VLSI Circuits, June 2001)

A Modeling Approach for Σ-Δ Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis
  M. H. Perrott, M. D. Trott, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Aug. 2002)

A Fully Integrated CMOS Frequency Synthesizer with Charge-Averaging Charge Pump and Dual-Path Loop Filter for PCS- and Cellular-CDMA Wireless Systems
  Y. Koo, H. Huh, Y. Cho, J. Lee, J. Park, K. Lee, D.-K. Jeong, and W. Kim (IEEE Journal of Solid-State Circuits, May 2002)

A 1.1-GHz CMOS Fractional-N Frequency Synthesizer With a 3-b Third-Order Σ-Δ Modulator
  W. Rhee, B.-S. Song, and A. Ali (IEEE Journal of Solid-State Circuits, Oct. 2000)

A 1.8-GHz Self-Calibrated Phase-Locked Loop with Precise I/Q Matching
  C.-H. Park, O. Kim, and B. Kim (IEEE Journal of Solid-State Circuits, May 2001)

A 27-mW CMOS Fractional-N Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation
  M. H. Perrott, T. L. Tewksbury III, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Dec. 1997)

A CMOS Monolithic ΣΔ-Controlled Fractional-N Frequency Synthesizer for DCS-1800
  B. De Muer and M. S. J. Steyaert (IEEE Journal of Solid-State Circuits, July 2002)

Part VII

Clock and Data Recovery

A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN's and WAN's
  K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino (IEEE Journal of Solid-State Circuits, June 1999)

Clock/Data Recovery PLL Using Half-Frequency Clock
  M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux (IEEE Journal of Solid-State Circuits, July 1997)

A 0.5-μm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling
  C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz (IEEE Journal of Solid-State Circuits, May 1998)

A 2-1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability
  P. Larsson (IEEE Journal of Solid-State Circuits, Dec. 1999)

SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application
  Y. M. Greshishchev and P. Schvan (IEEE Journal of Solid-State Circuits, Sept. 2000)

A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate
  Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and J. E. Rogers (IEEE Journal of Solid-State Circuits, Dec. 2000)

A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector
  J. Savoj and B. Razavi (IEEE Journal of Solid-State Circuits, May 2001)

A 10-Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection
  J. Savoj and B. Razavi (Dig. International Solid-State Circuits Conference, Feb. 2001)

A 10-Gb/s CDR/DEMUX with LC Delay Line VCO in 0.18-μm CMOS
  J. E. Rogers and J. R. Long (Dig. International Solid-State Circuits Conference, Feb. 2002)

A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz f_T Silicon Bipolar Technology
  M. Wurzer, J. Bock, H. Knapp, W. Zirwas, E. Schumann, and A. Felder (IEEE Journal of Solid-State Circuits, Sept. 1999)

A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology
  M. Reinhold, C. Dorschky, E. Rose, R. Pullela, P. Mayer, E. Kunz, Y. Baeyens, T. Link, and J.-P. Mattia (IEEE Journal of Solid-State Circuits, Dec. 2001)

Clock and Data Recovery IC for 40-Gb/s Fiber-Optic Receiver
  G. Georgiou, Y. Baeyens, Y.-K. Chen, A. H. Gnauck, C. Gropper, P. Paschke, R. Pullela, M. Reinhold, C. Dorschky, J.-P. Mattia, T. Winkler von Mohrenfels, and C. Schulien (IEEE Journal of Solid-State Circuits, Sept. 2002)

Index

Devices and Circuits for Phase-Locked Systems
Behzad Razavi

Abstract—This tutorial deals with the design of devices such as varactors and inductors and circuits such as ring and LC oscillators. First, MOS varactors are introduced as a means of frequency control for low-voltage circuits, and their modeling issues are discussed. Next, spiral inductors are studied, and various geometries targeting improved Q or higher self-resonance frequencies are presented. Noise-tolerant ring oscillator topologies are then described. Finally, a procedure for the design of LC oscillators is outlined.

The design of phase-locked systems requires a thorough understanding of devices, circuits, and architectures. Intended as a continuation of [1], this tutorial provides an overview of concepts in device and circuit design for phase-locking in digital, broadband, and RF systems.

I. PASSIVE DEVICES

The demand for low-noise PLLs has encouraged extensive research on active and passive devices. In this section, we study varactors and inductors as essential components of LC oscillators.

A. Varactors

As supply voltages scale down, pn junctions become a less attractive choice for varactors. Specifically, two factors limit the dynamic range of pn-junction capacitances: (1) the weak dependence of the capacitance upon the reverse bias voltage, e.g., $C_j = C_{j0}/(1 + V_R/\phi_B)^m$, where $m \approx 0.3$; and (2) the narrow control voltage range if forward-biasing the varactor must be avoided.

As an example, consider the LC oscillator shown in Fig. 1. It is desirable to maximize the voltage swings at nodes X and Y so as to both minimize the relative phase noise and ease the design of the stage(s) driven by the VCO. On the other hand, to avoid forward-biasing the varactors significantly, $V_X$ and $V_Y$ must remain above approximately $V_{cont} - 0.4$ V. Thus, the peak-to-peak swing at each node is limited to about 0.8 V. Note that the cathode terminals of the varactors also introduce substantial n-well capacitance at X and Y, further constraining the tuning range.

Fig. 1. LC oscillator using pn-junction varactors.

In contrast to pn junctions, MOS varactors are immune to forward biasing while exhibiting a sharper C-V characteristic and a wider dynamic range. If configured as a capacitor [Fig. 2(a)], a MOSFET suffers from both a nonmonotonic C-V behavior and a high channel resistance in the region between accumulation and strong inversion. To avoid these issues, an "accumulation-mode" MOS varactor is formed by placing an NMOS device inside an n-well [Fig. 2(b)]. Providing an ohmic connection between the source and drain for all gate voltages, the n-well experiences depletion of mobile charges under the oxide as the gate voltage becomes more negative. Thus, the varactor capacitance, $C_{var}$ (equal to the series combination of the oxide capacitance and the depletion region capacitance), varies as shown in Fig. 2(b). Note that for a sufficiently positive gate voltage, $C_{var}$ approaches the oxide capacitance.

Fig. 2. (a) Simple MOSFET operating as capacitor, (b) MOS varactor.

The design of MOS varactors must deal with two important issues: (1) the trade-off between the dynamic range and the channel resistance, and (2) proper modeling for circuit simulations. We now study each issue.

Dynamic Range

Deep-submicron MOSFETs exhibit substantial overlap capacitance between the gate and source/drain terminals. For example, in a typical 0.13-μm technology, a transistor having minimum channel length, $L_{min}$, displays an overlap capacitance of 0.4 fF/μm and a gate-channel capacitance of 12 fF/μm². In other words, for an effective channel length of 0.12 μm and a given width, the overlap capacitance between the gate and source/drain terminals of a varactor constitutes 2 × 0.4 fF/(0.12 × 12 fF + 2 × 0.4 fF) ≈ 36% of the total capacitance. Thus, even if the gate-channel component varies by a factor of two across the allowable voltage range, the overall dynamic range of the capacitance is given by (0.12 × 12 fF + 2 × 0.4 fF)/(0.12 × 6 fF + 2 × 0.4 fF) = 1.47.

In order to widen the varactor dynamic range, the transistor length can be increased, thereby raising the voltage-dependent component while maintaining the overlap capacitance relatively constant. This remedy, however, leads to a greater resistance between the source and drain, lowering the Q. The resistance reaches a maximum for the most negative gate-source voltage, at which the depletion region's width is maximum and the path through the n-well the longest (Fig. 3).¹
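The overlap-capacitance arithmetic in the Dynamic Range discussion can be reproduced with a few lines of Python. This is only a sketch: the per-unit capacitances and the factor-of-two swing of the gate-channel component are the example values quoted in the text, while the function name `varactor_caps`, its parameters, and the 1-μm default width are illustrative assumptions.

```python
# Example figures quoted in the text for a typical 0.13-um technology.
C_OV_PER_UM = 0.4       # overlap capacitance per edge, fF per um of width
C_GATE_PER_UM2 = 12.0   # gate-channel capacitance, fF per um^2
L_EFF_UM = 0.12         # effective channel length, um


def varactor_caps(length_um, width_um=1.0, swing_ratio=2.0):
    """Return (C_max, C_min, overlap_fraction) for a hypothetical varactor.

    swing_ratio is the assumed max/min ratio of the gate-channel
    component alone; the overlap term is treated as voltage independent.
    """
    c_overlap = 2.0 * C_OV_PER_UM * width_um             # source and drain edges
    c_channel_max = C_GATE_PER_UM2 * length_um * width_um
    c_max = c_channel_max + c_overlap
    c_min = c_channel_max / swing_ratio + c_overlap
    return c_max, c_min, c_overlap / c_max


c_max, c_min, overlap_fraction = varactor_caps(L_EFF_UM)
print(f"overlap fraction = {overlap_fraction:.1%}")   # 35.7%, i.e. about 36%
print(f"C_max / C_min    = {c_max / c_min:.2f}")      # 1.47

# A longer channel dilutes the fixed overlap term and widens the range.
for multiple in (1, 2, 3):
    c_hi, c_lo, _ = varactor_caps(multiple * L_EFF_UM)
    print(f"L = {multiple} x L_eff: C_max/C_min = {c_hi / c_lo:.2f}")
```

The loop shows the ratio moving toward the assumed factor of two as the length grows, which is the benefit that must be weighed against the added channel resistance.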

Note that the total equivalent resistance that appears in series with the varactor is equal to 1/12 of the drain-source resistance. This is because shorting the drain and source lowers the resistance by a factor of 4 and the distributed nature of the capacitance and resistance reduces it by another factor of 3 [2]. Depending on both the phase noise requirements and the Q limitations imposed by inductors, the varactor length is typically chosen between $L_{min}$ and $3L_{min}$.

¹ Fortunately, the capacitance reaches a minimum at this point, and the Q degrades only gradually.

Fig. 3. Effect of n-well resistance in MOS varactor.

Modeling

The C-V characteristics of MOS varactors can be approximated by a hyperbolic tangent function with reasonable accuracy. Using the characteristic shown in Fig. 4 and noting that $\tanh(\pm\infty) = \pm 1$, we can write

$$C_{var}(V_{GS}) = \frac{C_{max} - C_{min}}{2}\tanh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2}. \tag{1}$$

Here, $a$ and $V_0$ allow fitting for the intercept and the slope, respectively, and $C_{min}$ includes the overlap capacitance.

Fig. 4. Typical MOS varactor characteristic.

The above model yields different characteristics in different circuit simulation programs! Simulation tools that analyze circuits in terms of voltages and currents (e.g., SPICE) interpret the nonlinear capacitance equation correctly. On the other hand, programs that represent the behavior of capacitors by charge equations (e.g., Cadence's Spectre) require that the model be transformed to a Q-V relationship [3]:

$$Q_{var} = \int C_{var}\, dV_{GS} \tag{2}$$

$$= \frac{C_{max} - C_{min}}{2}\, V_0 \ln\cosh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2}\, V_{GS}, \tag{3}$$

which is then used to compute

$$I_{var} = \frac{dQ_{var}}{dt}. \tag{4}$$

If used in charge-based analyses, Eq. (1) typically overestimates the tuning range of oscillators.

B. Inductors

The design of monolithic inductors has been studied extensively. The parameters of interest include the inductance, the Q, the parasitic capacitance (i.e., the self-resonance frequency, $f_{SR}$), and the area, all of which trade with each other to some extent. For a spiral structure such as that in Fig. 5, the line width, the line spacing, the number of turns, and the outer dimension are under the designer's control, chosen so as to obtain the required performance.

Fig. 5. Spiral inductor.

Quality Factor

The quality factor of monolithic inductors has been the subject of many studies. Before considering the phenomena that limit the Q, it is important to select a useful and clear definition for this quantity. For a simple inductor operating at low frequencies, the Q is defined as

$$Q = \frac{L\omega}{R_S}, \tag{5}$$

where $R_S$ denotes the metal series resistance. In analogy with this expression, a more general definition is sometimes given as

$$Q = \frac{\mathrm{Im}(Z_L)}{\mathrm{Re}(Z_L)}, \tag{6}$$

where $Z_L$ represents the overall impedance of the inductor at the frequency of interest. While reducing to Eq. (5) at low frequencies, this definition yields Q = 0 if the inductor resonates with its own capacitance and/or any other capacitance. This is because at resonance, the impedance is purely resistive. Since nearly all circuits employ inductors in a resonance mode,² this expression fails to provide a meaningful measure of inductor performance in circuit design.

Fig. 7. Inductor loss mechanisms: (a) metal resistance, (b) substrate loss due to electric coupling, (c) substrate loss due to magnetic coupling.

A more versatile definition assumes that a resonant tank can be represented by a parallel combination [Fig. 6(a)], yielding

$$Q = \frac{R_p}{L_p\omega_R}, \tag{7}$$

where $\omega_R$ is the resonance frequency. Note that the tank reduces to $R_p$ at $\omega = \omega_R$, exhibiting a finite (rather than zero) Q. Hereafter, we consider the behavior of inductors at or near resonance.
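The difference between the two definitions can be checked numerically. The sketch below evaluates Eq. (6) and Eq. (7) for the parallel tank of Fig. 6(a). The component values (2 nH, 0.5 pF, 500 Ω) are assumed purely for illustration and do not come from the text, and the function names are hypothetical.

```python
import math


def tank_impedance(omega, l_p, c_p, r_p):
    """Impedance of a parallel L_p, C_p, R_p tank at angular frequency omega."""
    admittance = 1.0 / r_p + 1.0 / (1j * omega * l_p) + 1j * omega * c_p
    return 1.0 / admittance


def q_im_over_re(z):
    """Eq. (6): Q = Im(Z) / Re(Z)."""
    return z.imag / z.real


def q_parallel(r_p, l_p, omega_r):
    """Eq. (7): Q = R_p / (L_p * omega_R)."""
    return r_p / (l_p * omega_r)


# Assumed example values, chosen so the tank resonates near 5 GHz.
L_P = 2e-9      # henries
C_P = 0.5e-12   # farads
R_P = 500.0     # ohms

omega_r = 1.0 / math.sqrt(L_P * C_P)
z_res = tank_impedance(omega_r, L_P, C_P, R_P)

print(f"resonance frequency  = {omega_r / (2 * math.pi) / 1e9:.2f} GHz")
print(f"|Z| at resonance     = {abs(z_res):.1f} ohm")
print(f"Eq. (6) at resonance = {q_im_over_re(z_res):.2e}")
print(f"Eq. (7) at resonance = {q_parallel(R_P, L_P, omega_r):.2f}")

# Well below resonance the tank looks inductive and Eq. (6) is large.
z_low = tank_impedance(omega_r / 10.0, L_P, C_P, R_P)
print(f"Eq. (6) at omega_R/10 = {q_im_over_re(z_low):.1f}")
```

At resonance the susceptances of L_p and C_p cancel, so Eq. (6) collapses to numerical zero while Eq. (7) stays at a finite value of about 7.9 for these assumed components.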

Lp LP

Rp

CP Vout

CP V,n'

(a)

RP

(2) the flow of displacement current through the series combination of the inductor's parasitic capacitance and the substrate resistance; (3) the flow of magnetically-induced ("eddy") currents in the substrate resistance. At low frequencies, the dc resistance is dominant, and as the frequency rises, the other components begin to manifest themselves. With the above observations in mind, let us construct a circuit model for inductors. Depicted in Fig. 8(a) is a simple model where Rs denotes the series resistance at the frequency L

«1

C2

<M

(b)

R

Fig. 6. (a) Parallel tank for definition of Q, (b) common-source stage using a tank.

The utility of Eq. (7) can be seen in the example illustrated in Fig. 6(b). Here, the knowledge of Lp and Q = Rp/(LpL>ji) directly provides the voltage gain and the output swing, whereas the Q given by Eq. (6) serves no purpose. The Q of inductors is limited by resistive losses: parasitic resistances dissipate a fraction of the energy that is reciprocated between the inductor and the capacitor in a tank. Note that the finite Q is also accompanied by generation of noise. For example, in the circuit of Fig. 6(b), Rp produces an output noise voltage of V£ = AkTRp = AkTQLpu>n per unit bandwidth if Lp resonates with Cp. The losses in inductors arise from three mechanisms (Fig. 7): (1) the series resistance of the spiral, including both lowfrequency resistance and current crowding due to skin effect; 2

One exception is inductive degeneration in low-noise amplifiers.

Rs

R

c2

Ci

P

S1

«S2

(a)

c3:

"si

*S2

C4

(b)

Fig. 8. (a) Inductor model including magnetic coupling to substrate, (b) simplified model.

of interest, Rsi and Rs2 represent the substrate resistance through which the diplacement current flows, the transformer models magnetic coupling to the substrate, and Rp is the substrate resistance through which the eddy currents flow. This model reveals how the Q drops at high frequencies. As the impedance of C\ and Ci falls, Rs\ and Rs2 appear as a constant resistance in parallel with the inductor, lowering the Q as u rises. Similarly, at high frequencies, the effect of Rp becomes relatively constant, shunting Lp and further reducing theQ. In practice, the model of Fig. 8(a) is modified as shown in Fig. 8(b) to both allow an easier fit to measured data and

account for the substrate capacitance. The model is usually assumed to be symmetric, i.e., C\ = C2,C3 = C4, and Rsi = Rs2, implying that the equivalent parasitic capacitance, Ceq, is one-half of the total capacitance, Ctot, if one end of the inductor is grounded. This result, however, is not correct because the distributed nature of the structure yields Ceq = Ctot/3 in this case [5]. To avoid this inaccuracy, the inductor must be modeled as a distributed network [5]. Characterization Most inductor modeling programs provide limited capabilities in terms of the type of structure that they can analyze or the maximum frequency at which their results are valid. For this reason, it is often necessary to fabricate and characterize monolithic inductors and use the results to revise the simulated models, thereby obtaining a better fit. Owing to the need for precise measurements at high frequencies, inductors are typically characterized by direct onwafer probing. High-speed coaxial probes having a tightlycontrolled 50-£2 characteristic impedance and a low loss are positioned on pads connected to the inductor. Figure 9(a) shows an example where one end of the spiral is tied to the

VDD LP

Lp "out

M2

»1

'ss

Fig. 10. Setup for "in-situ" measurement of Q.

until the circuit fails to oscillate. For such value of Rp, we have Q = Rp/(Lu). Of course, this technique assumes that the value of the inductor and the oscillation frequency are known. The above method proves useful if (a) thefrequencyof interest is so high and/or the inductance so low that direct measurements are difficult, or (b) an oscillator has been fabricated but the inductors are not available individually, requiring "in-situ" measurement of the Q. Note that other oscillator parameters such as phase noise and ouput swing are also functions of Q, but it is much more straightforward to place the circuit at the edge of oscillation than to calculate the Q from phase noise or output swing measurements. Choice of Geometry The design of inductors begins with the choice of the geometry. Shown in Fig. 11 are two commonlyused structures. The asymmetric spiral of Fig. 11 (a) exhibits

(a)

(b)

Fig. 9. (a) On-wafer measurement of inductor using coaxial probe, (b) calibration structure.

"signal" (S) pad and the other to the "ground" (G) pads. The signal pad is sensed by the center conductor and the ground pads by the outer shield of the coaxial probe. Since the capacitance of the pads and the wires connecting to the spiral is typically significant, the test device is accompanied by a calibration structure [Fig. 9(b)], where the spiral itself is omitted. The scattering (S) parameters of both structures are measured by means of a network analyzer across the band of interest and subsequently converted to Y parameters. Subtraction of the Y parameters of the calibration geometry from those of the device under test yields the actual characteristics of the spiral. An alternative method of measuring the Q of inductors is illustrated in Fig. 10. Here, inductors are incorporated in an oscillator and the tail current can be controlled externally. In the laboratory measurement, the output is monitored on a spectrum analyzer while Iss is reduced so as to place the circuit at the edge of oscillation. Next, the value of Iss thus obtained is used in the simulation of the oscillator and the equivalent parallel resistance of each tank, Rp, is lowered

(a)

(b)

Fig. 11. (a) Asymmetric and (b) symmetric inductors.

a moderate Q, about 5 to 6 at 5 GHz, and its interwinding capacitance does not limit the self-resonance frequency because adjacent turns sustain a small potential difference. The line spacing is therefore set to the minimum allowed by the technology. The symmetric geometry of Fig. 11 (b) provides a greater Q if stimulated differentially [4], about 7 to 10 at 5 GHz, but its interwinding capacitance is typically quite significant because of the large voltage difference between adjacent turns. For this reason, the line spacing is chosen to be twice or three times the minimum allowable value, lowering thefringecapacitance considerably but degrading the Q slightly. In differential circuits, the use of symmetric inductors appears to save area as well. For example, two asymmetric 1-nH inductors can be replaced by a symmetric 2-nH structure, which occupies less area. However, a cascade of differential stages employing multiple symmetric inductors [Fig. 12(a)] faces routing difficulties. As illustrated in Fig. 12(b), the signal lines must travel across the spirals, impacting the

performance of the inductors. Furthermore, the power and ground lines must either cross the spirals or go around them with adequate spacing. With asymmetric inductors, on the other hand, the lines can be routed as shown in Fig. 12(c), leaving the inductors undisturbed. Note that B1 is considerably larger than B2 because the symmetric structure must provide an inductance twice that of each asymmetric spiral. Thus, the signal lines in Fig. 12(b) are longer.

Fig. 12. (a) Cascade of inductively-loaded differential pairs, (b) layout of first stage using a symmetric inductor, (c) layout of first stage using asymmetric inductors.

The two geometries of Fig. 11 can also be converted to stacked structures, wherein spirals in different metal layers are placed in series so as to achieve a greater inductance per unit area. Figure 13(a) depicts an example using metal 8 and metal 7 spirals. The total inductance is equal to L1 + L2 + 2M, where M denotes the mutual coupling between L1 and L2. Owing to the strong magnetic coupling, the value of M is close to L1 and L2, suggesting a fourfold increase in the overall inductance as a result of stacking. In the general case, n stacked identical spirals raise the inductance by a factor of approximately n^2.

Fig. 13. Stack of (a) metal 8 and metal 7 spirals, (b) metal 8 and metal 3 spirals.

Stacking reduces the area occupied by inductors significantly. However, the capacitance between the spirals may limit the self-resonance frequency. For the two-layer structure of Fig. 13(a), the overall equivalent capacitance is given by [5]

$$C_{eq} = \frac{4C_1 + C_2}{12}, \tag{8}$$

where, as in [5], C1 denotes the capacitance between the two spirals and C2 the capacitance between the bottom spiral and the substrate. Thus, if the bottom layer is moved down [Fig. 13(b)], then Ceq falls considerably. For example, in a typical 0.13-µm CMOS technology having eight metal layers, the geometry of Fig. 13(b) exhibits one-fifth as much capacitance as the structure in Fig. 13(a) does. Stacked structures use lower metal layers, which typically suffer from a greater sheet resistance than the topmost layer. As explained below, the resistance can be reduced by placing spirals in parallel.

Fig. 14. (a) Parallel combination of spirals to reduce metal resistance, (b) tapered metal width, (c) patterned shield.

Figure 14 illustrates three other configurations aiming to improve the quality factor. In Fig. 14(a), multiple spirals are

placed in parallel so as to reduce the series resistance, but at the cost of larger capacitance to the substrate. Nonetheless, in a typical process having eight metal layers, metal 6 capacitance is about 30% greater than that of metal 8. Since metal 8 is typically twice as thick as metal 6 or metal 7, this topology lowers the series resistance by a factor of two while raising the parasitic capacitance by 30%. By the same token, in the stacked structure of Fig. 13(b), addition of a metal 2 spiral in parallel with the metal 3 layer decreases the overall resistance by 30% while increasing the equivalent capacitance by about 15%. At frequencies above 5 GHz, the skin depth of aluminum falls below 2 µm, making the parallel combination of spirals less effective. Electromagnetic field simulations may therefore be necessary to determine the optimum configuration.

The structure in Fig. 14(b) employs tapering of the line width to reduce the resistance of the outer turns. The idea is to maintain a relatively constant inductance-resistance product per turn, achieving a slightly higher Q for a given inductance and capacitance. Unfortunately, most inductor simulation programs cannot analyze such a geometry.

Shown in Fig. 14(c) is a method of lowering the loss due to the electric coupling to the substrate. A highly conductive shield is placed under the spiral and connected to ground so that the displacement current flowing through the inductor's bottom-plate capacitance does not experience resistive loss. To stop the flow of magnetically-induced currents, the shield is broken regularly. Note that eddy currents still flow through the substrate, dissipating energy. The conductive shield in Fig. 14(c) may be realized in n-well, n+, or p+ diffusion, polysilicon, or metal, thus bearing a trade-off between the parasitic capacitance and the Q enhancement. The resulting increase in the Q depends on the frequency of operation and the type of shield material, falling in the range of 5 to 10%.

Thus far, we have studied square spirals. However, for a given inductance value, a circular structure exhibits less series resistance. Since mask generation for circles is more difficult, some inductors are designed as octagonal geometries to benefit from a slightly higher Q.
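The remark that the skin depth of aluminum drops below 2 µm above 5 GHz can be checked with the standard skin-depth formula, delta = sqrt(rho / (pi f mu)). The following is a minimal sketch, not part of the original text: the resistivity value is an assumed bulk figure for aluminum (on-chip alloys are typically somewhat more resistive), and the function name is hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def skin_depth(resistivity_ohm_m, freq_hz, mu_r=1.0):
    """Skin depth in meters: delta = sqrt(rho / (pi * f * mu_r * mu_0))."""
    return math.sqrt(resistivity_ohm_m / (math.pi * freq_hz * mu_r * MU_0))


# Assumed bulk resistivity of aluminum (illustrative value, not from the text).
RHO_AL = 2.65e-8  # ohm*m

for f_ghz in (1, 2, 5, 10):
    delta_um = skin_depth(RHO_AL, f_ghz * 1e9) * 1e6
    print(f"{f_ghz:>2} GHz: skin depth ~ {delta_um:.2f} um")
```

Under this assumption the skin depth at 5 GHz is on the order of 1 µm, so a metal stack thicker than a few micrometers carries little extra current in its interior, which is why paralleling spirals yields diminishing returns.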

rates or retain a low-frequency clock in the "sleep" mode; (2) ring oscillators occupy substantially less area than LC topologies do, an important issue if many oscillators are used; (3) the behavior of ring oscillators across process, supply, and temperature corners is predicted with reasonable accuracy by standard MOS models, whereas the design of LC oscillators heavily relies on inductor and varactor models. In mostly-digital systems such as microprocessors, ring oscillators experience considerable supply and substrate noise, making differential topologies desirable. Figure 15(a) shows an example of a differential gain stage that allows several

C. MOS Transistors


The modeling of MOSFETs for analog and high-frequency design continues to pose challenging problems as sub-0.1-µm generations emerge. BSIM models provide reasonable accuracy for phase-locked system design, with the exception that their representation of thermal and flicker noise may err considerably. This issue becomes critical in the prediction of oscillator phase noise.

The thermal noise arising from the channel resistance is usually represented by a current source tied between the source and drain and having a spectral density $\overline{I_n^2} = 4kT\gamma g_m$, where γ is the excess noise coefficient. For long-channel devices, γ = 2/3, but for submicron transistors, γ may reach 2.5 to 3. Since some MOS models lack an explicit γ parameter that the user can set, it is often necessary to artificially raise the effective value of γ in circuit simulations. For linear, time-variant circuits, this can be accomplished using a noise copying technique [6]. However, the time variance of currents and voltages in oscillators makes it difficult to apply this method. As a first-order approximation, the contribution of the transistors to the overall phase noise can be increased by a factor equal to 2.5/(2/3) before all of the noise components are summed.³

The flicker noise parameters are usually obtained by measurements. It is therefore important to check the validity of the device models by comparing measured and simulated results. Owing to their buried channel, PMOS transistors exhibit substantially less flicker noise than NMOS devices even in deep submicron technologies.
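To make the first-order correction concrete, the sketch below evaluates the channel thermal-noise density 4kTγg_m for the long-channel and short-channel values of γ quoted above and prints the resulting scaling factor 2.5/(2/3). The transconductance and temperature are assumed illustrative values, and the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def channel_noise_psd(gm, gamma, temp_k=300.0):
    """Drain thermal-noise current density 4*k*T*gamma*gm, in A^2/Hz."""
    return 4.0 * K_BOLTZMANN * temp_k * gamma * gm


GAMMA_LONG = 2.0 / 3.0   # long-channel value from the text
GAMMA_SHORT = 2.5        # lower end of the submicron range from the text
GM_ASSUMED = 5e-3        # assumed transconductance, S (illustrative only)

psd_long = channel_noise_psd(GM_ASSUMED, GAMMA_LONG)
psd_short = channel_noise_psd(GM_ASSUMED, GAMMA_SHORT)
scale = psd_short / psd_long  # equals GAMMA_SHORT / GAMMA_LONG

print(f"PSD (gamma = 2/3): {psd_long:.3e} A^2/Hz")
print(f"PSD (gamma = 2.5): {psd_short:.3e} A^2/Hz")
print(f"scaling factor: {scale:.2f} ({10 * math.log10(scale):.2f} dB)")
```

The scaling applies only to the transistor contribution; noise from the tank loss or other resistors would be left unchanged before the components are summed.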

Fig. 15. (a) Differential stage for use in a ring oscillator, (b) effect of supply noise.

decades of frequency tuning with relatively constant voltage swings. Here, M5 and M6 define the output common-mode (CM) level while M3 and M4 pull nodes X and Y to VDD, maintaining a constant voltage swing even at low current levels. Unlike a simple differential pair, the stage of Fig. 15(a) does respond to input CM noise even with an ideal ISS. This is because the gate voltages of M3 and M4 are referenced to VDD, introducing a change in the drain currents if the input CM level varies. In the presence of asymmetries, such a change results in a differential component at the output. Nevertheless, since the input CM level of each stage in the ring is referenced to VDD by the diode-connected PMOS devices in the preceding stage [Fig. 15(b)], the oscillator exhibits low sensitivity to supply voltage. Figure 16(a) depicts another ring oscillator topology that has become popular in low-voltage digital systems. Here, the


II. RING OSCILLATORS

Despite their relatively high noise and poor drive capability, ring oscillators are used in many high-speed applications. Several reasons justify this popularity: (1) in some cases, the oscillator must be tuned over a wide frequency range (e.g., one decade) because the system must support different data

³ In reality, the effective value of γ also depends on the drain-source voltage to some extent, further complicating the matter.

Fig. 16. (a) Constant-current ring oscillator, (b) transistor-level implementation of (a).

inverters in the ring are supplied by a current source, IDD, rather than a voltage source, and frequency tuning is also accomplished through IDD. If IDD is designed for low sensitivity to VDD, then the oscillator remains relatively immune to supply noise, the principal advantage of this configuration over standard inverter-based rings that are directly connected to the supply voltage. In practice, the nonidealities associated with IDD limit the supply rejection. Shown in Fig. 16(b) is a transistor implementation where M1 operates as a controlled current source. If I1 is constant, VX tracks VDD variations whereas VY does not, yielding a change in IDD through channel-length modulation in M1. Choosing long channels for M1 and M2 alleviates this issue while necessitating wide channels as well to allow a relatively small drain-source voltage for M1. However, the resulting high drain junction capacitance of M1 at Y creates a low-impedance path from VDD to this node at high frequencies. To suppress both resistive and capacitive feedthrough of VDD noise, a bypass capacitor, CB, is tied from Y to ground. However, the pole associated with this node now enters the VCO transfer function, complicating the design of the PLL.

Let us now study the response of the circuits of Figs. 15(a) and 16 to substrate noise, Vsub. In the former, Vsub manifests itself through two mechanisms (Fig. 17): (1) by modulating the drain junction capacitance of M1 and M2 and hence

Fig. 17. Effect of substrate noise on a differential stage.

the delay of the stage (a static effect); and (2) by injecting a common-mode displacement current through CP (a dynamic effect). If injected slightly before or after the zero crossings of the oscillation waveform, such a current gives rise to a differential component at the drains of M1 and M2 because these transistors display unequal transconductances as they depart from equilibrium. In the circuit of Fig. 16(b), Vsub modulates both the drain junction capacitance of the NMOS devices and their threshold voltage (and hence the transition points of the waveform). Both effects are static, making the circuit susceptible even to low-frequency noise.

It is instructive to determine the minimum supply voltage for the above two circuits. At the midpoint of switching, where the input and output differential voltages are around zero, the stage of Fig. 15(a) requires that VDD > |VGSP| + VGSN + VISS, where VGSP and VGSN denote the gate-source voltages of M3-M4 and M1-M2, respectively, and VISS is the minimum voltage necessary for ISS. Interestingly, the circuit of Fig. 16(b) imposes the same minimum supply voltage.

Another critical issue in the circuits of Figs. 15(a) and 16 relates to frequency tuning by means of current sources. The voltage-to-current (V/I) conversion required here presents difficulties at low supply voltages. In the example of Fig. 16(b), as Vcont rises and VX falls, transistor M3 eventually enters the triode region, thus making I1 supply-dependent. The useful range of Vcont is therefore given by VTHN < Vcont < VDD - |VGSP| - VTHN, suggesting the use of a wide device for M2 to minimize |VGSP|.
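The headroom condition VDD > |VGSP| + VGSN + VISS lends itself to a quick numerical check. The sketch below is only an illustration: the function names are hypothetical and the bias voltages are assumed values, not figures from the text.

```python
def min_supply_voltage(v_gsp_abs, v_gsn, v_iss):
    """Minimum supply for the differential ring stage: |VGSP| + VGSN + VISS."""
    return v_gsp_abs + v_gsn + v_iss


def has_headroom(v_dd, v_gsp_abs, v_gsn, v_iss):
    """True if the supply exceeds the stacked gate-source and tail voltages."""
    return v_dd > min_supply_voltage(v_gsp_abs, v_gsn, v_iss)


# Assumed bias points in volts (illustrative only).
V_GSP_ABS, V_GSN, V_ISS = 0.45, 0.45, 0.20

v_min = min_supply_voltage(V_GSP_ABS, V_GSN, V_ISS)
print(f"minimum VDD ~ {v_min:.2f} V")
for v_dd in (1.0, 1.2, 1.8):
    print(f"VDD = {v_dd:.1f} V -> headroom ok: {has_headroom(v_dd, V_GSP_ABS, V_GSN, V_ISS)}")
```

With these assumed numbers, a 1.0-V supply falls short, which illustrates why wide devices (smaller gate-source voltages) become attractive at low supplies.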

III. LC OSCILLATORS

LC oscillators have found wide usage in high-speed and/or low-noise systems. Extensive research on inductors, varactors, and oscillator topologies has provided the grounds for systematic design, helping to demystify the "black magic." LC oscillators offer a number of advantages over ring structures: (a) lower phase noise for a given frequency and power dissipation; (b) greater output voltage swings, with peak levels that can exceed the supply voltage; and (c) ability to operate at higher frequencies. However, LC VCO design requires precise device and circuit modeling because (a) the narrow tuning range calls for accurate prediction of the center frequency; and (b) the phase noise is greatly affected by the quality of inductors and varactors and the noise of transistors. Also, occupying a large area, spiral inductors pick up noise from the substrate and make it difficult to incorporate many such oscillators on one chip.

The design of LC VCOs targets the following parameters: center frequency, phase noise, tuning range, power dissipation, voltage headroom, startup condition, output voltage swing, and drive capability. The last two have often received less attention, but they directly determine the design difficulty and power consumption of the stages following the oscillator. That is, a buffer placed after the VCO may consume more power than the VCO itself!

A. Design Example

As an example of VCO design, let us consider the topology shown in Fig. 18. Here, M1 and M2 present a small-signal negative resistance of $-2/g_{m1,2}$ between nodes X and Y,

Fig. 18. LC oscillator.

compensating for the resistive loss in the tanks and sustaining oscillation. Each tank is modeled by a parallel RLC network, with all loss mechanisms lumped in RP.⁴

⁴ For a narrow frequency range, series resistances in the tank elements can be transformed to parallel components.

The design process begins with a power budget and hence a maximum value for ISS. This is justified by the following observation. Once completed and optimized for a given power budget, the design can readily be scaled for different power levels, bearing a linear trade-off with phase noise while maintaining all other parameters constant. For example, if ISS, the width of M1 and M2, and the total tank capacitance are doubled and the inductance value is halved, the phase noise power falls by a factor of two but the frequency of oscillation and the output voltage swings remain unchanged.⁵

Since subsequent stages typically require the VCO core to provide a minimum voltage swing, Vmin, we assume M1 and M2 steer nearly all of ISS to their corresponding tanks and write ISS RP = Vmin. Thus, the minimum inductance value is given by

$$L_P = \frac{R_P}{Q\omega} \tag{9}$$

$$\phantom{L_P} = \frac{V_{min}}{I_{SS}\,Q\,\omega}, \tag{10}$$

where it is assumed the tank Q is limited by that of the inductor. Note that this calculation demands knowledge of the Q before the inductance is computed, a minor issue because for a given geometry and frequency of operation, the Q is relatively independent of the inductance.

We now determine the dimensions of M1 and M2. Increasing the channel length beyond the minimum value allowed by the technology does not significantly lower γ unless the length exceeds approximately 0.5 µm. For this reason, the minimum length is usually chosen to minimize the capacitance contributed by the transistors. The transistors must be wide enough to steer most of ISS while experiencing a voltage swing of Vmin at nodes X and Y. Viewing M1 and M2 as a differential pair, we note that M1 must turn off as VX - VY reaches Vmin. For square-law devices,

$$V_{min} = \sqrt{\frac{2I_{SS}}{\mu_n C_{ox} W/L}}, \tag{11}$$

and hence

$$W = \frac{2I_{SS}L}{\mu_n C_{ox} V_{min}^2}, \tag{12}$$

but for short-channel devices, W must be obtained by simulations using proper device models. This choice of W typically guarantees a small-signal loop gain greater than unity, enabling the circuit to start at power-up.

With LP computed from Eq. (10), the total capacitance at nodes X and Y is calculated as Ctot = (LP ω^2)^-1. This capacitance includes the following fixed components: (1) the parasitic capacitance of LP, CLP; (2) the drain junction, gate-source, and gate-drain capacitances of M1 and M2, CDB + CGS + 4CGD;⁶ and (3) the input capacitance of the next stage (typically a buffer), CL. Thus, the allowable varactor capacitance is given by the difference between Ctot and the sum of these components:

$$C_{var} = (L_P\omega^2)^{-1} - C_{LP} - C_{DB} - C_{GS} - 4C_{GD} - C_L. \tag{13}$$

This expression gives the center value of the tolerable varactor capacitance. Of course, a negative Cvar means the inductance is excessively large, calling for a lower LP, a smaller RP, and hence a larger ISS. However, to steer a greater tail current, the circuit must employ wider MOS transistors, thus incurring a larger capacitance at nodes X and Y and approaching diminishing returns. This ultimately limits the frequency of oscillation in a given technology.

For a given supply voltage and oscillator topology, the varactor capacitance exhibits a known dynamic range Cvar,min < Cvar < Cvar,max, yielding a tuning range of ωmin < ωosc < ωmax, where

$$\omega_{min} = \frac{1}{\sqrt{L_P\,(C_{var,max} + C_{fixed})}}, \tag{14}$$

$$\omega_{max} = \frac{1}{\sqrt{L_P\,(C_{var,min} + C_{fixed})}}, \tag{15}$$

and Cfixed = CLP + CDB + CGS + 4CGD + CL.

⁵ We assume that, at a given frequency, the Q is relatively independent of the inductance value.

⁶ Since CGD experiences a total voltage swing of 2Vmin, its Miller effect translates to a factor of two for each transistor.

Fig. 19. LC oscillator with (a) direct coupling and (b) capacitive coupling of varactors to tanks.

Figure 19(a) depicts the oscillator with MOS varactors directly tied to X and Y. Since the output common-mode level


is near VDD, varactors Mv1 and Mv2 sustain only a positive gate-source voltage (if 0 < Vcont < VDD). As seen from the C-V characteristic of Fig. 2(b), this limitation reduces the dynamic range of the capacitance by about a factor of two. As a remedy, the varactors can be capacitively coupled to X and Y, allowing independent choice of dc levels. Illustrated in Fig. 19(b), such an arrangement defines the gate voltage of Mv1 and Mv2 by Vb ≈ VDD/2 through large resistors R1 and R2. The coupling capacitors, CC1 and CC2, must be chosen much greater than the maximum value of Cvar so as not to limit the tuning range. For example, if CC1 = CC2 = 5Cvar,max, then the equivalent series capacitance reaches only 5Cvar,max^2/(6Cvar,max) = 0.83Cvar,max, suffering from a 17% reduction in dynamic range. On the other hand, large coupling capacitors display significant bottom-plate capacitance, thereby loading the oscillator and limiting the tuning range.⁷ It is possible to realize CC1 and CC2 as "fringe" capacitors (Fig. 20) [7] to exploit the lateral field between adjacent metal

Fig. 20. Fringe capacitor.

lines. This structure exhibits a bottom-plate parasitic of a few percent, but its value must usually be calculated by means of field simulators.

The tuning range of LC VCOs must be wide enough to encompass (a) process and temperature variations, (b) uncertainties due to model inaccuracies, and (c) the frequency band of interest. In wireless communications, the last component makes the design particularly difficult, especially if a single VCO must cover more than one band. For example, in the Global System for Mobile Communication (GSM) standard, the transmit and receive bands span 890-915 MHz and 935-960 MHz, respectively. For one VCO to operate from 890 MHz to 960 MHz, the tuning range must exceed 7.8%. With another 7 to 10% required for variations and model inaccuracies, the overall tuning range reaches 15 to 18%, a value difficult to achieve. In such cases, two or more oscillators may prove necessary, but at the cost of area and signal routing issues.

The phase noise of each oscillator topology must be quantified carefully. The reader is referred to the extensive literature on the subject.
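The design flow of the example (minimum inductance from the swing requirement, varactor budget from the total tank capacitance, and tuning limits from the varactor range) can be sketched numerically, together with the GSM tuning-range arithmetic. This is a minimal illustration under stated assumptions: all component values are hypothetical, the function names are not from the text, and the tank Q is taken as known in advance, as the text assumes.

```python
import math


def min_inductance(v_min, i_ss, q, omega):
    """L_P = V_min / (I_SS * Q * omega), from I_SS*R_P = V_min and R_P = Q*L_P*omega."""
    return v_min / (i_ss * q * omega)


def varactor_budget(l_p, omega, c_fixed):
    """Center varactor capacitance: C_tot - C_fixed, with C_tot = 1/(L_P*omega^2)."""
    return 1.0 / (l_p * omega ** 2) - c_fixed


def tuning_limits(l_p, c_var_min, c_var_max, c_fixed):
    """Return (omega_min, omega_max) for the given varactor range."""
    w_min = 1.0 / math.sqrt(l_p * (c_var_max + c_fixed))
    w_max = 1.0 / math.sqrt(l_p * (c_var_min + c_fixed))
    return w_min, w_max


def fractional_range(f_lo, f_hi):
    """Tuning range expressed relative to the lower frequency edge."""
    return (f_hi - f_lo) / f_lo


# Assumed design targets (illustrative only).
F_OSC = 5e9                    # Hz
OMEGA = 2 * math.pi * F_OSC    # rad/s
V_MIN, I_SS, Q = 0.5, 2e-3, 6  # V, A, dimensionless
C_FIXED = 400e-15              # C_LP + C_DB + C_GS + 4*C_GD + C_L, F

l_p = min_inductance(V_MIN, I_SS, Q, OMEGA)
c_var = varactor_budget(l_p, OMEGA, C_FIXED)
w_min, w_max = tuning_limits(l_p, 250e-15, 450e-15, C_FIXED)

print(f"L_P   ~ {l_p * 1e9:.2f} nH")
print(f"C_var ~ {c_var * 1e15:.0f} fF (negative would mean L_P is too large)")
print(f"tuning: {w_min / (2 * math.pi) / 1e9:.2f} to {w_max / (2 * math.pi) / 1e9:.2f} GHz")
print(f"GSM 890-960 MHz span: {100 * fractional_range(890e6, 960e6):.1f}% of the lower edge")
```

The normalization of the tuning range (lower edge versus center frequency) is a convention chosen here for illustration; either choice gives a figure close to the one quoted in the text.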

Fig. 21. (a) VCO with fine and coarse digital control, (b) resulting characteristics.

B. Digital Tuning

Our study thus far implies that it is desirable to maximize the tuning range. However, for a given supply voltage, a wider tuning range inevitably translates to a greater VCO gain, KVCO, thereby making the circuit more sensitive to disturbance ("ripple") on the control line. This effect leads to larger reference sidebands in RF synthesizers and higher jitter in timing applications. With the scaling of supply voltages, the problem of high KVCO has become more serious, calling for alternative solutions.

A number of circuit and architecture techniques have been devised to lower the sensitivity of the VCO to ripple on the control line. For example, a digital tuning mechanism can be added to perform coarse adjustment of the frequency, allowing the analog (fine) control to cover a much narrower range. Illustrated in Fig. 21(a), the idea is to switch constant capacitors into or out of the tanks, thereby introducing discrete frequency steps. The varactors then tune the frequency within each step, leading to the characteristic shown in Fig. 21(b). Note that the switches are placed between the capacitors and ground rather than between the tank and the capacitors. This permits the use of NMOS devices with a gate-source voltage equal to VDD, minimizing their on-resistance.

The above technique entails three critical issues. First, the trade-off between the on-resistance and junction capacitance of the MOS switches translates to another between the Q and the tuning range. When on, each switch limits the Q of its corresponding capacitor to (Ron Cu ω)^-1. When off, each switch presents its drain junction and gate-drain capacitances, CDB + CGD, in series with Cu, constraining the lower bound of the capacitance to Cu(CDB + CGD)/(Cu + CDB + CGD) rather than zero. In other words, wider switches degrade the overall Q to a lesser extent but at the cost of narrowing the discrete frequency steps.

The second issue relates to potential "blind" zones in the characteristic of Fig. 21(b). As exemplified by Fig. 22, if the discrete step resulting from switching out one unit capacitor is greater than the range spanned continuously by the varactors, then the oscillator fails to assume the frequency values between f1 and f2 for any combination of the digital and analog controls. For this reason, the discrete steps must be sufficiently small to ensure overlap between consecutive bands.⁸

Fig. 22. Blind zone resulting from insufficient fine tuning range.

The third issue stems from the loop settling speed. As described below, the PLL takes a long time to determine how many capacitors must be switched into the tanks. Thus, if a change in temperature or channel frequency requires a discrete frequency step, then the system using the PLL must remain idle while the loop settles.

When employed in a phase-locked loop, the oscillator of Fig. 21(a) requires additional mechanisms for setting the digital control. Figure 23 depicts an example for frequency synthesis.

⁷ This is relatively independent of whether the bottom plates are connected to nodes X and Y or to R1 and R2.

⁸ With a finite overlap, however, more than one combination of digital and analog controls may yield a given frequency. To avoid this ambiguity, the loop must begin with a minimum (or maximum) value of the digital control and adjust it monotonically.

REFERENCES

[1] B. Razavi, "Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits - A Tutorial," in Monolithic Phase-Locked Loops and Clock Recovery Circuits, B. Razavi, Ed., Piscataway, NJ: IEEE Press, 1996.
[2] P. Larsson, "Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitor," IEEE Journal of Solid-State Circuits, vol. 32, pp. 574-576, April 1997.
[3] K. Kundert, Private Communication.
[4] M. Danesh et al., "A Q-Factor Enhancement Technique for MMIC Inductors," Proc. IEEE Radio Frequency Integrated Circuits Symp., pp. 217-220, April 1998.
[5] A. Zolfaghari, A. Y. Chan, and B. Razavi, "Stacked Inductors and Transformers in CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 36, pp. 620-628, April 2001.
[6] F. Behbahani et al., "A 2.4-GHz Low-IF Receiver for Wideband WLAN in 0.6-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 35, pp. 1908-1916, December 2000.
[7] O. E. Akcasu, "High-Capacity Structures in a Semiconductor Device," US Patent 5,208,725, May 1993.

Fig. 23. Synthesizer using fine and coarse frequency control.

Here, the oscillator control voltage is monitored and compared with a low and a high threshold voltage, VL and VH, respectively. If Vcont falls below VL, the oscillation frequency is excessively low,⁹ and one unit capacitor is switched out. Conversely, if Vcont exceeds VH, one unit capacitor is switched in. After each switching, the loop settles and, if still unlocked, continues to undergo discrete frequency steps.

⁹ We assume the frequency increases with Vcont.
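The blind-zone condition discussed under digital tuning (a discrete step larger than the range spanned by the varactors) can be checked with a simple tank model. The sketch below is an idealization and not the author's method: switched-out unit capacitors are assumed to contribute zero capacitance (the series parasitic of the off switch is ignored), and all names and component values are hypothetical.

```python
import math


def osc_freq(l_p, c_total):
    """Resonance frequency of an LC tank in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_p * c_total))


def band_edges(l_p, c_fixed, c_unit, units_in, c_var_min, c_var_max):
    """Return (f_low, f_high) of the band with `units_in` unit capacitors switched in."""
    c_bank = units_in * c_unit
    f_low = osc_freq(l_p, c_fixed + c_bank + c_var_max)
    f_high = osc_freq(l_p, c_fixed + c_bank + c_var_min)
    return f_low, f_high


def find_blind_zones(l_p, c_fixed, c_unit, n_units, c_var_min, c_var_max):
    """List (k, gap_low, gap_high) where band k does not reach up to band k-1."""
    zones = []
    for k in range(1, n_units + 1):
        _, top_of_lower_band = band_edges(l_p, c_fixed, c_unit, k, c_var_min, c_var_max)
        bottom_of_upper_band, _ = band_edges(l_p, c_fixed, c_unit, k - 1, c_var_min, c_var_max)
        if top_of_lower_band < bottom_of_upper_band:
            zones.append((k, top_of_lower_band, bottom_of_upper_band))
    return zones


# Assumed tank values (illustrative only).
L_P, C_FIXED = 1.3e-9, 400e-15
C_VAR_MIN, C_VAR_MAX = 250e-15, 450e-15  # 200 fF of continuous varactor range

small_step = find_blind_zones(L_P, C_FIXED, 100e-15, 4, C_VAR_MIN, C_VAR_MAX)
large_step = find_blind_zones(L_P, C_FIXED, 300e-15, 4, C_VAR_MIN, C_VAR_MAX)
print("blind zones with 100-fF units:", len(small_step))
print("blind zones with 300-fF units:", len(large_step))
```

In this idealized model, consecutive bands overlap exactly when the unit capacitor is smaller than Cvar,max - Cvar,min, which mirrors the requirement stated in the text.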


Delay-Locked Loops - An Overview

Chih-Kong Ken Yang

Abstract - Phase-locked loops have been used for a wide range of applications from synthesizing a desired phase or frequency to recovering the phase and frequency of an input signal. Delay-locked loops (DLLs) have emerged as a viable alternative to the traditional oscillator-based phase-locked loops. With its first-order loop characteristic, a DLL is both easier to stabilize and free of jitter accumulation. The paper describes design considerations and techniques to achieve high performance in a wide range of applications. Issues such as avoiding false lock, maintaining 50% clock duty cycle, building unlimited phase range for frequency synthesis, and multiplying the reference frequency are discussed.

the data bus, the actual sampling clock is no longer properly aligned with the data. A DLL is commonly used to lock the phase of the buffered clock to that of the input data. The phase locking significantly reduces timing uncertainty in sampling the data, which then enables higher data rates as in [3]. Although aperiodic signals can also be delayed by the delay line in a DLL, the inputs to delay lines are typically clock signals. By using a periodic signal, the delay lines do not need arbitrarily long delays and typically only need to span the period of the clock to generate all possible phases. A data signal can be delayed by sampling the data with the appropriately delayed clock. The motivation for using DLLs is that the design of the control loop is simplified by having only phase as the state variable. Section II reviews how such a loop is unconditionally stable and has better jitter characteristics. However, a DLL is not without its own limitations. The variable delay line has a finite delay range and finite bandwidth. Section II also discusses these design considerations. Section III describes different implementations of the variable delay line. Within the past ten years, modifications to the basic DLL architecture have enabled clock and data recovery applications in "plesiochronous" systems [4] where the sampling rates for clock and data differ by a few hundred parts-per-million in frequency. Delay lines with effectively infinite delay are also addressed in Section III. More recently, several researchers such as [5] and [6] have introduced architectures that permit frequency multiplication based on delay lines which further extends their use in clock generation and frequency synthesis. Section IV describes these architectures.

I. INTRODUCTION

Many applications require accurate placement of the phase of a clock or data signal. Although simply delaying the signal could shift the phase, the phase shift is not robust to variations in processing, voltage, or temperature. For more precise control, designers incorporate the phase shift into a feedback loop that locks the output phase with an input reference signal that indicates the desired phase shift. In essence, the loop is identical to a phase-locked loop (PLL) except that phase is the only state variable and that a variable-delay line replaces the oscillator. Such a loop is commonly referred to as a delay-line phase-locked loop or delay-locked loop (DLL). As with a PLL, the goals are (1) accurate phase position or low static-phase offset, and (2) low phase noise or jitter. Because a DLL does not contain an element of variable frequency, it historically has fewer applications than PLLs. Bazes in [1] demonstrated an example of precisely delaying a signal in generating the timing of the row and column access strobe signals for a DRAM. Another common application uses a DLL to generate a buffered clock that has the same phase as a weakly-driven input clock. Johnson in [2] synchronizes the timing of the buffered clock of a floating-point unit with the clock of a microprocessor. A similar application recovers the data of a parallel bus by generating a properly positioned sampling clock. Typically, these systems provide a sampling clock with the same sampling rate but with an arbitrary phase as compared to the data (i.e. a "mesochronous" system [4]). A clocked DRAM data bus is an example of such a system. A clock propagates with the data as one of the signals in the bus and therefore has a nominally known phase relationship with the data. However, in order to receive and buffer the clock to sample

II. DLL CHARACTERISTICS

The basic loop building blocks are similar to those of a PLL: a phase detector, a filter, and a variable-delay line. Figure 1 illustrates the three main functional blocks. Since phase is the only state variable, a control loop higher than first-order is not needed to compensate a fixed phase error. The resulting transient impulse response is a simple exponential. Although the simple loop characteristics are an advantage that DLLs have over PLLs, the design is complicated by the additional circuitry that is needed to overcome having a limited delay range and not producing its own frequency.

A. First-order Loop

A phase detector compares the phase of the reference input and the delay-line output. The comparison yields a signal proportional to the phase error. The error is low-pass

C.K. Ken Yang is with University of California at Los Angeles, [email protected].

Figure 1: DLL architecture.

Figure 3: Step response of PLL and DLL (with same loop characteristics).

the tracking of the phase of the input clock changes at different frequencies. Based on the transfer function, the loop bandwidth is $\omega_{bw} = K_{PD}K_{DL}G_F$. For frequencies within the loop bandwidth the phase of the output clock will track that of the reference input and reject noise within the loop. The phase characteristics of the output clock above the bandwidth of the loop depend on the phase behavior of the delay-line input and the noise from the delay line. The noise transfer function from a noise source lumped at the delay-line output is a high-pass response.

Figure 2: Open- and closed-loop transfer characteristics.

filtered to produce a control voltage or current that adjusts the delay of the delay line. The delay-line input can be either the reference input or a clean clock signal. The s-domain representation of each loop element is depicted within each block in Fig. 1. The open-loop transfer function can be written as $T(s) = K_{PD}K_{DL}G_F(s)$, where $K_{PD}$ is the phase-detector gain, $G_F(s)$ is the filter transfer function, and $K_{DL}$ is the delay-line gain. If the loop has finite gain at dc, the resulting output signal will exhibit a static phase error as shown in the following equation.

$$\left.\frac{\phi_{out}}{\phi_{in}}\right|_{s=0} = \left.\frac{1}{1 + 1/\bigl(K_{PD}K_{DL}G_F(s)\bigr)}\right|_{s=0} \tag{1}$$

In some degenerate cases, the delay-line input is also the reference input. The feedback loop would guarantee a fixed phase relationship between the delay-line output and the reference so any phase variations in the reference would directly appear at the delay-line output in an all-pass response. However, noise due to the delay line is still high-pass filtered.
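The low-pass tracking of the reference and the high-pass shaping of delay-line noise can be illustrated numerically. The sketch below assumes an integrating loop filter, so that the closed-loop response is the first-order form H(s) = 1/(1 + s/ω_bw) with ω_bw = K_PD K_DL G_F, and a noise transfer of 1 - H(s) for a source lumped at the delay-line output. The gain values and function names are hypothetical.

```python
import math


def loop_bandwidth(k_pd, k_dl, g_f):
    """Loop bandwidth in rad/s for an integrating filter: K_PD * K_DL * G_F."""
    return k_pd * k_dl * g_f


def tracking_mag(omega, omega_bw):
    """|H(jw)| for H(s) = 1 / (1 + s/omega_bw): low-pass reference tracking."""
    return 1.0 / math.sqrt(1.0 + (omega / omega_bw) ** 2)


def delay_noise_mag(omega, omega_bw):
    """|1 - H(jw)| = |s/omega_bw| / |1 + s/omega_bw|: high-pass noise shaping."""
    x = omega / omega_bw
    return x / math.sqrt(1.0 + x * x)


# Assumed gains (illustrative only): V/rad, rad/V, and 1/s respectively.
w_bw = loop_bandwidth(k_pd=0.2, k_dl=5.0, g_f=2 * math.pi * 1e6)

for ratio in (0.1, 1.0, 10.0):
    w = ratio * w_bw
    print(f"w/w_bw = {ratio:>4}: tracking {tracking_mag(w, w_bw):.3f}, "
          f"delay-line noise {delay_noise_mag(w, w_bw):.3f}")
```

At the loop bandwidth both magnitudes equal 1/sqrt(2); well below it the reference is tracked and delay-line noise is suppressed, and well above it the opposite holds.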


B. Advantages over a PLL

The loop characteristics are considerably simpler than those of a PLL. A PLL would contain at least two states to store both the frequency and phase information. In order to maintain loop stability, an additional zero is needed. A DLL is less constrained with only a single pole. The loop gain directly determines the desired bandwidth. The only stability consideration is when the loop bandwidth is very near the reference frequency. The periodic sampling nature of the phase detection and the delay in the feedback loop degrade the phase margin. For instance, if the feedback delay is one reference cycle, the loop bandwidth should not exceed 1/4 of the reference frequency.

Figure 3 illustrates the response to a noise step applied to the control voltage for both a PLL and a DLL. A PLL accumulates phase error due to its higher-order loop characteristic. In response to a phase error, the control


To eliminate the static phase error, the filter is often an integrator to store the phase variable. This results in a first-order closed-loop transfer function.

$$H(s) = \frac{1}{1 + s/(K_{PD}K_{DL}G_F)} \tag{2}$$

The equation assumes that the delay-line input is a clean reference as opposed to the reference input. Higher-order loop filters have not commonly been used but can enable better tracking of a phase ramp (i.e. a frequency difference). Figure 2 shows the open-loop and closed-loop transfer functions. With only a single integrator, the open-loop phase margin is 90°. The loop is unconditionally stable as long as the delay in the loop does not degrade the phase margin excessively. The closed-loop transfer function illustrates that

Figure 5: Delay line phase/delay characteristic. (Axes: phase from -π to +π versus Delay(Vc); marked points: lock point and 2nd lock point.)

However, the data receiver is ultimately a binary comparator and the phase detector does not indicate an error that is proportional to the phase difference. Hence, the timing-recovery loop is nonlinear. Although a higher-order PLL using early-late control can be made conditionally stable [8], the resulting phase dithers with a limit cycle translating into jitter. The oscillation depends on the loop parameters and can be considerable for high bandwidth loops. With an early-late DLL, the phase of the clock output also dithers. But because the stability only depends on the delay within the loop, the dithering would only be a few cycles and can be significantly less than the dithering of a PLL.
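The dependence of early-late dithering on the delay within the loop can be illustrated with a small behavioral model. The sketch below is a toy under stated assumptions, not the analysis of [8]: the phase error moves by a fixed step on every update, in the direction given by the sign of the error observed `latency` updates earlier. The names `bang_bang_dither`, `step`, and `latency` are hypothetical.

```python
def bang_bang_dither(step, latency, n_steps=200, e0=0.5):
    """Peak-to-peak phase dither of a first-order early-late loop.

    step    : phase correction applied per update (assumed fixed)
    latency : number of updates between sampling the error and applying
              the correction (assumed model of the delay within the loop)
    e0      : initial phase error, in the same units as step
    """
    # history[0] is the oldest error still "in flight" inside the loop.
    history = [e0] * (latency + 1)
    errors = []
    e = e0
    for _ in range(n_steps):
        delayed = history[0]
        decision = 1.0 if delayed > 0 else -1.0  # binary early/late output
        e -= step * decision                     # integrate the decision
        history = history[1:] + [e]
        errors.append(e)
    tail = errors[n_steps // 2:]                 # discard the start-up transient
    return max(tail) - min(tail)


if __name__ == "__main__":
    for latency in (0, 1, 2, 4):
        print(latency, bang_bang_dither(step=1.0, latency=latency))
```

In this model the peak-to-peak dither grows with the loop latency, which is consistent with the statement that the dithering of an early-late DLL is governed by the delay within the loop.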

Figure 4: Early-late receiver architecture using the receiver as the phase detector. Timing diagram showing early and late data.
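The bandwidth limit quoted in Section B can be checked with a short calculation. The sketch assumes an ideal single-integrator loop, which has 90° of phase margin without delay, and models the feedback latency as a pure delay that subtracts 360° × (bandwidth / reference frequency) × (delay in reference cycles) at the unity-gain frequency. Sampling effects are ignored, and the function names are illustrative.

```python
def phase_margin_deg(bw_over_fref, delay_cycles=1.0):
    """Phase margin of an integrator loop with a pure feedback delay.

    bw_over_fref : unity-gain bandwidth divided by the reference frequency
    delay_cycles : feedback delay expressed in reference cycles
    """
    return 90.0 - 360.0 * bw_over_fref * delay_cycles


def max_bandwidth_ratio(delay_cycles=1.0, min_margin_deg=0.0):
    """Largest bandwidth/f_ref ratio that keeps at least min_margin_deg."""
    return (90.0 - min_margin_deg) / (360.0 * delay_cycles)


if __name__ == "__main__":
    for ratio in (0.05, 0.125, 0.25):
        print(ratio, phase_margin_deg(ratio))
```

With one reference cycle of delay, the margin in this model reaches zero at a bandwidth of one quarter of the reference frequency, so a practical design would stay well below that ratio.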

C. Design Considerations in a DLL

A typical DLL involves several design considerations. First, the delay line usually has a finite delay range. If the desired phase of the output signal is beyond the delay range, the loop will not lock properly. Second, the output of the DLL also depends greatly on the input to the delay line. Since the delay-line input propagates to the DLL output, tracking jitter and the output's duty cycle depend not only on the delay-line design but also on the delay-line input. Third, the basic DLL cannot generate new frequencies different from that of the delay-line input.

A variable-delay line adjusts the delay by varying the RC time constant of a buffer and often has limited adjustment range. Section III will describe several techniques in greater detail. Even though the delay range is limited, DLLs for a periodic clock signal only need the range to exceed 2π in phase across process and systematic variations to cover all possible phases. For systems with a range of operating frequencies, the delay line must span 2π for the lowest input frequency.

An issue known as false-locking occurs when the delay range exceeds 2π. There can be several secondary lock points repeating every 2π. Figure 5 depicts an example of the characteristic of a delay line with two lock points. Since phase detectors must be periodic, if the delay line initializes within π of the second lock point, the phase detector will push the delay line toward lock with a longer than necessary delay. Long delays require large RC time constants for a given variable-delay buffer element. The bandlimiting by the

voltage alters the frequency of an oscillator. The output phase is an integration of the frequency change. In response to a noise perturbation, the loop accumulates a phase error before correcting. In contrast, a DLL attenuates the phase error by the time constant of the loop. In the figure, both loops are designed with the same 3-dB bandwidth, the same delay elements, and the PLL is a 2nd-order loop with a damping factor of unity. Clearly, the PLL suffers from larger phase errors due to the phase accumulation. A second advantage relates to clock and data recovery applications. An effective way to recover the timing for sampling a data input is to use the data receiver as a phase detector. The architecture, depicted in Fig. 4, uses the 180°-shifted clock to sample the data transitions in addition to sampling the data values [7]. Whenever data changes values, the sampled transition and the data values can be combined to indicate whether the sampling clock edge is earlier or later than the data transition. Phase information is only present with data transitions. The feedback loop locks when the transition sampling clock samples a metastable value. This commonly used design is known as an early-late or bang-bang architecture. The timing diagram in Fig. 4 illustrates examples of the data being early and late. Due to the inherent setup time of the data receiver, the transition sampling clock may not occur at the same time as the data transition. The phase shift compensates for the receiver setup time and maximizes the margin of error for the data sampling.
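The early-late decision described above can be written as a small truth table. The sketch below assumes one transition sample taken between two consecutive data samples; the function names and the sign convention (+1 meaning the clock is early) are illustrative choices rather than part of the cited designs.

```python
def early_late(prev_bit, edge_sample, curr_bit):
    """Classify one data/transition/data sample triple.

    Returns +1 if the sampling clock is early, -1 if it is late,
    and 0 when there is no data transition (no phase information).
    """
    if prev_bit == curr_bit:
        return 0  # no transition, so the triple says nothing about phase
    # The transition sample still shows the old bit: the clock edge came
    # before the data changed, i.e. the clock is early.
    return 1 if edge_sample == prev_bit else -1


def accumulate(triples):
    """Sum the early/late votes over a sequence of sample triples."""
    return sum(early_late(p, e, c) for p, e, c in triples)


if __name__ == "__main__":
    votes = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)]
    print(accumulate(votes))
```

A loop filter would integrate these votes to move the delay. Because each vote carries only a sign, the resulting loop is nonlinear, as the discussion of early-late control points out.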


filter would significantly attenuate a high-frequency input clock. The attenuation increases the jitter and may even prevent the input from reaching the output. Even if the delay line is constrained to span only one lock point while still covering more than 2π, a second, similar issue exists. It is difficult to design a delay line whose adjustable range is exactly +π to -π across operating and process conditions. If initialized at the minimum or maximum delay, the phase detector may push the loop toward either the maximum or minimum delay limit and "false-lock" to an incorrect phase. To address false-locking, designers employ several techniques depending on the application. For systems that require a delay line with a known fixed delay, operating condition variations may be small enough that the delay line only needs a small variable range within ±π. For systems that lock to a fixed phase over a wide range of frequencies, one design [9] uses an auxiliary frequency-sensing loop that generates a voltage to coarsely set the delay for the given input frequency. The DLL then only fine-tunes the delay for the desired phase. For data recovery applications where the clock phase can be arbitrary with respect to the data, a common design uses a startup circuit for the DLL that initializes the delay line at its minimum delay to avoid any secondary lock points. However, as mentioned earlier, the phase detector may keep the delay line at the minimum delay. A sensing circuit or a state machine detects when the delay line is at its limit and optionally inverts the feedback clock. The phase then flips by 180° and the loop locks properly. As will be discussed in Section III, a more robust alternative reconfigures the delay line such that the delay only spans 2π and wraps back to 0° when the delay exceeds 360°.
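The false-locking condition can be made concrete by listing every delay inside the adjustable range that satisfies the lock condition. The sketch below is a minimal illustration, assuming the phase detector locks whenever the delay equals a target delay modulo one clock period; `lock_points` and its arguments are hypothetical names.

```python
import math


def lock_points(d_min, d_max, period, target_delay):
    """Delays in [d_min, d_max] that equal target_delay modulo one period."""
    k_lo = math.ceil((d_min - target_delay) / period)
    k_hi = math.floor((d_max - target_delay) / period)
    return [target_delay + k * period for k in range(k_lo, k_hi + 1)]


def has_secondary_lock(d_min, d_max, period, target_delay):
    """True when more than one lock point lies within the delay range."""
    return len(lock_points(d_min, d_max, period, target_delay)) > 1


if __name__ == "__main__":
    # Delay range of 0.3 to 2.6 periods with the desired lock at 0.5 period.
    print(lock_points(0.3, 2.6, 1.0, 0.5))
    # Narrower range: only the intended lock point remains.
    print(lock_points(0.3, 1.2, 1.0, 0.5))
```

A range wider than one period always admits the possibility of a secondary lock point, which is why startup circuits that initialize the line at its minimum delay are common.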
The jitter and duty cycle of the delay-line output clock depend on the input, the coupling of the input to the delay line, and the delay line itself. Often the input is from off-chip and, therefore, it must be carefully received to prevent supply and substrate noise from coupling onto the signal as jitter. In contrast, the high-frequency phase noise of the clock output of an oscillator-based PLL depends primarily on the oscillator design. An improperly received input clock can often result in worse jitter performance in a DLL as compared to a PLL. Similarly, while the duty cycle from an oscillator is only modestly distorted (by the difference between the rising edge and falling edge delays), the duty cycle of the DLL's input clock can be significantly distorted as it propagates to the output. Since duty cycle is a systematic error, a good design corrects duty cycle using an explicit block instead of compounding the difficulty of the delay-line design. A duty-cycle corrector (DCC) is commonly added to either the DLL input or output. Figure 6 illustrates the basic components of the feedback loop: an input with finite slew rate, a buffer element with adjustable threshold, a comparator, and an integrator. The comparator determines the threshold crossing of the clock waveform. The result is integrated and used to skew the threshold of the buffer stage.

Figure 6: Duty-cycle corrector block diagram. Timing diagram shows change in duty cycle with changing offset.
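The duty-cycle correction loop can be sketched behaviorally. The model below assumes a clock with linear rising and falling edges and a swing normalized to 1, so that moving the buffer threshold away from mid-swing changes the output duty cycle by (1 - 2 × threshold) × (edge time / period). The integrator gain and all names are illustrative assumptions, not values from a published design.

```python
def dcc_settle(duty_in, edge_fraction, gain=0.5, n_iter=200):
    """Iterate a behavioral duty-cycle corrector loop.

    duty_in       : input duty cycle measured at mid-swing (0..1)
    edge_fraction : rise/fall time as a fraction of the clock period
    Returns (threshold, duty_out), with the threshold normalized to the swing.
    """
    threshold = 0.5  # start at mid-swing
    duty_out = duty_in
    for _ in range(n_iter):
        # Linear edges: raising the threshold delays the rising crossing and
        # advances the falling crossing, shrinking the high half-period.
        duty_out = duty_in + (1.0 - 2.0 * threshold) * edge_fraction
        # Integrator: accumulate the duty-cycle error into the threshold.
        threshold += gain * (duty_out - 0.5)
        threshold = min(max(threshold, 0.0), 1.0)  # stay within the swing
    return threshold, duty_out


if __name__ == "__main__":
    print(dcc_settle(duty_in=0.6, edge_fraction=0.25))
```

In this model the loop can only correct errors up to the edge time as a fraction of the period, which reflects why the finite slew rate of the input is essential to the scheme.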

Because the buffer input has finite slew rate, changing the threshold effectively adjusts the output high and low half-periods. The loop settles when the high and low half-periods are equal. Figure 6 illustrates the reduction in duty cycle as the threshold shifts from Vref1 to Vref2. Since random variation of the duty cycle effectively appears as jitter, single-ended implementations such as that shown in the figure can be very sensitive to common-mode noise. For this reason, differential architectures are preferred [3]. For low jitter on the output clock, the loop components must be carefully designed. Many of the loop components are very similar to those of a PLL and are well described in [10]. For a charge-pump based loop filter, since the filter is only first-order, a simple capacitor replaces the RC filter. As in a PLL, noise on the control voltage directly translates into jitter. Designers may use additional filtering to suppress the noise. The loop element that has deviated the most from PLL design, and that is critical for functionality and performance, is the delay line.

III. DELAY-LINE ARCHITECTURES

The primary characteristics of a delay line are (1) gain (i.e. change in delay for a given change in voltage), and (2) delay range. For most applications using periodic inputs, the absolute delay is not critical as long as the range spans 2π. Because delay lines are relatively short, they do not contribute significant thermal or 1/f phase noise. However, for large digital systems, low supply/substrate sensitivity is needed to reject the on-chip switching noise.

Figure 8: Delay versus voltage for two different delay buffer elements: types (d) and (f) of Fig. 7.

For push-pull type elements such as inverters, the delay can be changed by changing the rate at which the output capacitance is charged [Fig. 7-(d)]. An adjustable current source limits the peak current of an inverter and varies the delay. An alternative method regulates the supply voltage of the inverters and uses the control voltage to set the supply voltage [Fig. 7-(e)]. The effective switching resistance varies with the supply voltage. Instead of changing the resistance, the effective capacitance can also be made adjustable [Fig. 7-(f)]. A transistor that behaves as an adjustable resistance can be used to decouple an explicit output capacitance. The larger the resistance, the less capacitance is seen at the output. Figure 8 illustrates the delay versus control voltage for a resistively-controlled delay element. For the element of Fig. 7-(d), either V_GS - V_th or the bias current can be zero and, therefore, a single element's delay can span from the minimum buffer delay to infinity. However, since the time constant is proportional to the delay, a long delay setting would significantly attenuate a high-frequency clock. Delay lines with a wide range for high clock frequencies require a large number of broadband delay elements. Unlike resistive control, the maximum delay in a capacitively-controlled element [Fig. 7-(f)] is proportional to R(C_int + C_exp) and the minimum delay is proportional to R C_int, where C_int is the intrinsic capacitance of the buffer and the load of the subsequent stage, and C_exp is the explicit capacitance added to the circuit. Because of the limited range per buffer, obtaining a wide delay range involves a large number of buffers. The maximum delay of each buffer is chosen to avoid attenuating the signal. In designs where the clock has a large voltage swing, the transistor in series with the explicit capacitance no longer appears as a variable resistor because the device enters saturation and cut-off.
For these buffers, the control voltage determines the fraction of current and period of time in which the buffer's current charges the explicit capacitance. An example of the delay versus control voltage for a capacitively-controlled element is overlaid in Fig. 8. Most

Figure 7: Six different delay elements.

A. Basic Delay Line

A delay line comprises a chain of variable-delay elements. Each element is controllable by either a voltage or a current. The delay of each element is proportional to its RC time constant, and changing the effective resistance or capacitance adjusts the delay. Figure 7 depicts several examples of buffer elements. For a differential buffer, the load resistance can be an MOS transistor in the triode region [Fig. 7-(a)], where the resistance is inversely proportional to V_GS - V_th. Varying the gate voltage adjusts the delay of the element. A non-linear device such as a diode can also serve as a load resistance [Fig. 7-(b)]. Since the resistance varies with the current, varying the bias current of the buffer adjusts the delay. Similarly, a negative transconductance that changes with the bias current can be placed in parallel with a fixed load resistance [Fig. 7-(c)]. The varying negative transconductance changes the effective load resistance and hence varies the delay. Because nonlinear elements have resistances that depend on both voltage and current, they can be more sensitive to supply noise.
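A rough numerical picture of a resistively controlled element follows from the RC relationship above. The sketch assumes the usual first-order triode approximation R = 1 / (k (V_GS - V_th)) for the load and a 50% delay of ln 2 × R × C for a single-pole stage. The device constants `v_th`, `k`, and `c_load` are made-up illustrative values.

```python
import math


def stage_delay(v_ctrl, v_th=0.4, k=1e-3, c_load=50e-15):
    """Delay (s) of one buffer with a triode-region load.

    v_ctrl : gate-source control voltage of the load device (V)
    v_th   : threshold voltage (V), assumed value
    k      : transconductance parameter (A/V^2), assumed value
    c_load : load capacitance (F), assumed value
    """
    overdrive = v_ctrl - v_th
    if overdrive <= 0:
        return math.inf  # load device is off: the delay is unbounded
    resistance = 1.0 / (k * overdrive)
    return math.log(2) * resistance * c_load


def delay_gain(v_ctrl, dv=1e-3):
    """Numerical delay-line gain, d(delay)/d(v_ctrl), in s/V."""
    return (stage_delay(v_ctrl + dv) - stage_delay(v_ctrl - dv)) / (2 * dv)


if __name__ == "__main__":
    for v in (0.6, 0.9, 1.4):
        print(v, stage_delay(v), delay_gain(v))
```

The gain in this model is largest at small overdrive and falls off quickly, which illustrates why the delay-line gain of a resistively controlled element is not constant across its range.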

Figure 9: 180°-locked DLL to generate intermediate phases that are a fraction of a cycle.

delay elements exhibit some nonlinearity. As a result, the delay-line gain, K_DL, is a function of the delay. Because a DLL is unconditionally stable, the loop still functions with the varying loop parameter. However, more linear elements are better for designs that require a constant loop bandwidth. To compensate for the variable K_DL, designers add programmability to the loop-filter capacitor. The control signal for either type of delay element can be digital. In a digital implementation [11], the current source is binary weighted and switched by a digital word. For capacitively-controlled elements, the capacitance can be binary weighted and switched. A nearly all-digital DLL is then possible by using a simple counter to replace the analog integrating filter.
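A digitally controlled, capacitively tuned element can be sketched in a few lines. The model assumes a fixed switching resistance, a binary-weighted explicit capacitance that adds one unit capacitor per code step, and a delay of ln 2 × R × C. The component values and function names are illustrative assumptions.

```python
import math


def element_delay(code, r_on=2e3, c_int=20e-15, c_unit=5e-15, n_bits=3):
    """Delay (s) of one element for a given digital control word.

    code   : control word, 0 .. 2**n_bits - 1
    r_on   : effective switching resistance (ohm), assumed value
    c_int  : intrinsic capacitance (F), assumed value
    c_unit : capacitance added per code step (F), assumed value
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    return math.log(2) * r_on * (c_int + code * c_unit)


def stages_for_full_cycle(period, n_bits=3):
    """Number of elements whose combined tuning range covers one period."""
    max_code = 2 ** n_bits - 1
    range_per_stage = element_delay(max_code, n_bits=n_bits) - element_delay(0, n_bits=n_bits)
    return math.ceil(period / range_per_stage)


if __name__ == "__main__":
    print(element_delay(0), element_delay(7))
    print(stages_for_full_cycle(1e-9))
```

In this idealized model the delay changes by the same amount for every code step, so the delay-line gain is constant, and the limited range per element translates directly into the number of elements needed to span a full cycle.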

Figure 10: Phase interpolator design by shorting the outputs of two integrators/buffers.

B. Phase Interpolation

Instead of only using the clock phase at the end of a delay line, an earlier clock phase can be tapped from the middle of a delay line. Some applications require the delay line to produce a delay that is a fixed fraction of the input-clock period. Figure 9 shows one implementation that uses a DLL to lock the input clock to the output. A 180° phase detector guarantees that the absolute delay of the delay line is a half-cycle. Tapping from different points on the delay line provides different phases. As shown in Fig. 9, for a 45° phase shift, the clock can be tapped from the first delay stage of a 4-stage differential delay line. If an arbitrary phase is needed, each delay stage can be tapped and multiplexers can select the nearest desired phase. The number of delay elements quantizes the phase step and limits the resolution [12]. Fine phase resolution requires longer delay lines. Yet, the resolution is limited at high clock frequencies because only a limited number of delay elements can fit within 180°.

An arbitrary intermediate phase can be obtained by "interpolating" between two clock phases that are tapped from a delay line. Depending on the weighting, an interpolator produces a clock that has a programmable output phase in between the input clock phases. As long as discrete clock phases that span the entire cycle are available as inputs, any phase for the interpolator's output is possible.

Multiplexers are needed to select the phases to interpolate between. For example, with phases tapped from a 4-stage delay line, if the desired output clock phase is 120°, the interpolator inputs would be from the second and third delay elements. Interpolators essentially perform a weighted average of the input phases. As shown in Fig. 10, ideally, the two input phases drive two integrators which charge a single output. The weighting of the average is set by the relative currents of the two integrators. When α=1, the output clock phase depends only on ck_in0. When α=0.5, i.e. the current is split equally between the two integrators, the output phase is additionally delayed by half the phase difference. As illustrated in Fig. 10, the phase of the interpolated output (ck_out01) falls between the phases of the non-interpolated outputs (ck_out0 and ck_out1).
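The tap selection and weighting described above can be sketched as follows. The example assumes an N-stage differential delay line locked to 180°, so that tap k (the output of the k-th element, with tap 0 being the line input) sits at k × 180°/N and the complementary outputs supply the second half of the cycle. It also assumes an ideal, linear interpolator in which α is the weight on the earlier tap. The function names are hypothetical.

```python
def select_taps(target_deg, n_stages=4):
    """Pick the two adjacent taps around target_deg and the weight alpha.

    Returns (early_tap, late_tap, alpha), where alpha is the fraction of
    the interpolator current steered to the earlier tap.
    """
    step = 180.0 / n_stages                 # phase step between adjacent taps
    target = target_deg % 360.0
    k = int(target // step)                 # index of the earlier tap
    alpha = 1.0 - (target - k * step) / step
    return k, (k + 1) % (2 * n_stages), alpha


def interpolated_phase(target_deg, n_stages=4):
    """Phase produced by an ideal (linear) interpolator for the chosen taps."""
    step = 180.0 / n_stages
    k, _, alpha = select_taps(target_deg, n_stages)
    # Weighted average of the two tap phases; the late tap is taken as
    # (k + 1) * step so that the wrap at 360 degrees needs no special case.
    return alpha * (k * step) + (1.0 - alpha) * ((k + 1) * step)


if __name__ == "__main__":
    print(select_taps(120.0))
    print(interpolated_phase(120.0))
```

For a 120° target and four stages, the sketch selects the second and third taps, matching the example in the text. Without interpolation, the resolution would be limited to the 45° tap spacing.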

With ideal integrators, the interpolation is linear, resulting in a constant K_DL. Alternatively, an interpolator can effectively be formed with buffer elements instead of integrators. By weighting the drive strength or current of two buffer elements whose outputs are shorted together, one can adjust the output phase. Because the output is not integrated, the resulting interpolation is slightly nonlinear and depends on (1) the phase difference between the inputs and (2) the slew rate (or time constant) of the input and output signals [13]. Figure 11 depicts the linearity of the interpolation for two different input phase separations, s=τ and s=2τ, where τ is the buffer's time constant. The larger phase spacing results in greater nonlinearity. Similar to RC delay elements, the interpolation can be digitally controlled. Since the weighting of the interpolation depends on the proportional current, the current sources of the integrators or buffers can be digitally weighted and programmed. In the clock and data recovery design of [3], quadrature clocks are interpolated to generate an intermediate clock phase within a quadrant. Figure 12 illustrates the mostly analog architecture. An analog control

Figure 11: Buffer based phase interpolator linearity. (Horizontal axis: current partition (α), 0.0 to 1.0; vertical axis: 1.6τ to 3.6τ.)

Figure 12: Infinite-range delay line based on phase rotation.

voltage produced by the phase detector and filter determines the interpolator currents. Comparators indicate when the current is fully steered to one integrator. A finite state machine driven by the comparators selects the appropriate quadrant by switching the interpolator inputs such that all 360° of phase are possible. The quadrature input clocks are generated from an external reference clock through the use of a divide-by-two circuit. Interestingly, because the phase rotates from one quadrant to the next, the architecture effectively has an unlimited delay range. If the input data rate and the reference clock frequency are slightly different, a DLL would continually increase or decrease the delay in order to track the accumulating input phase. A typical DLL with a finite delay range would run out of delay range and lose lock. On the other hand, plesiochronous operation is possible with an interpolator-based delay line since the phase smoothly rotates between quadrants. Interpolating between clocks with large phase spacings such as quadrature clocks results in an output clock with a slow slew rate. Such waveforms are more susceptible to noise and result in higher jitter. An enhancement uses more closely spaced phases that span the cycle. The finer phase spacing is possible using a multi-stage ring oscillator. As shown in Fig. 13, a 4-stage differential oscillator would generate 8 phases 45° apart. To guarantee a correct period for each clock phase, the ring oscillator is locked to the external reference clock using a PLL. The role of the PLL is solely to generate the phases. A purely DLL-based architecture is also possible by replacing the PLL with the DLL of Fig. 9, which locks the delay-line output with a 180° phase shift [13]. The architecture is commonly known as a dual-loop design because the first loop, a PLL or DLL, generates the phases and the second loop, the interpolation-based DLL, recovers the data and phase.
Since the first loop is not in the feedback of the second loop (or vice versa), the overall system is stable as long as each loop is individually stable. A dual-loop design is possible with the second loop within the

Figure 13: Oscillator with tapped outputs for multiple phases.

feedback of the first loop [15] as long as the stability of the loop is carefully considered. The data recovery portion of a dual-loop design is conducive to a digital implementation. The binary output of the receiver-replica phase detector can be accumulated using a digital counter. The counter output selects the appropriate phase from the oscillator and controls the digitally programmable interpolators [13],[14]. As long as the quantized phase step is small, the small error only minimally impacts the data recovery.

C. Oversampled Implementation

An alternative purely digital approach to clock and data recovery can be implemented by oversampling the data. Figure 14 illustrates an example of a digital architecture. Multiple finely-spaced clock phases oversample the data input. The sampled results are digitally processed to determine both the correct data value and the optimal phase of the data sample. The digital processing can vary in complexity. Simple implementations use the optimal data sample as the received data [18] or take a majority vote from the samples of a single bit [17]. The bit boundaries determine the samples associated with a bit. Transitions that are detected in the samples from the prior or current bits indicate the bit boundaries. The sampling rate limits the timing error margin. A greater amount of oversampling reduces the data-recovery timing error, but increases the number of clock phases. Low data rate UARTs [16] typically use 8 to 16 times oversampling. For high data rates, generating accurate clock phases separated by less than 100 ps is very challenging. More

Figure 15: Clock/data recovery using startable oscillator.

Figure 14: Oversampled data recovery architecture.
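The oversampled approach can be illustrated with a minimal decoder. The sketch below assumes 3x oversampling, locates the bit boundary as the sample position (modulo the oversampling ratio) where transitions occur most often, and then takes a majority vote over the samples of each bit. It is a simplified illustration rather than the decision logic of [17] or [18], and the function name is hypothetical.

```python
def recover_bits(samples, osr=3):
    """Recover data from an oversampled 0/1 sample stream.

    samples : list of 0/1 values taken at osr samples per bit
    osr     : oversampling ratio (odd, so that a majority always exists)
    Returns (boundary, bits), where boundary is the sample index modulo
    osr at which each bit starts.
    """
    # Histogram of transitions by sample position within the bit.
    counts = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            counts[i % osr] += 1
    boundary = counts.index(max(counts))

    # Majority vote over the samples that belong to each bit.
    bits = []
    for start in range(boundary, len(samples) - osr + 1, osr):
        group = samples[start:start + osr]
        bits.append(1 if 2 * sum(group) > osr else 0)
    return boundary, bits


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    stream = [b for b in data for _ in range(3)][2:]  # arbitrary phase offset
    stream[8] ^= 1                                    # one corrupted sample
    print(recover_bits(stream))
```

The majority vote tolerates a single corrupted sample per bit in this example, while the transition histogram plays the role of the phase-selection step.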

uses logical AND-ORs to combine the multiple phased clocks into a single high-frequency clock [23]. Alternatively, the method in [24] converts each phase into a small pulse and ORs the pulses together to form the output clock. In cases where the output capacitance of the logic gates limits the output frequency, one design [6] uses the phases to excite a tuned LC tank that combines them. Instead of edge combining, the multiplied clock can be the direct output of a delay line. The architecture is similar to a technique for clock and data recovery that uses a startable oscillator [22]. As shown in Fig. 15, the architecture uses data transitions to trigger startable oscillators: high-value data triggers one oscillator and low-value data triggers another. Each startable oscillator comprises a delay line and an AND gate. The data value enables the AND gate and the triggered oscillator propagates an edge through the delay elements and produces a clock edge delayed by a half-cycle. The edge is used to sample the data. In the absence of input transitions, the delay line is configured as an oscillator and generates a sampling edge every cycle. Whenever a new data transition occurs, the oscillator resynchronizes its phase to that of the input. In the implementation of [22], the natural oscillation frequency of the oscillator is determined by an external plesiochronous clock reference. The architecture has not been widely applied to higher data rate designs because the sampling phase is directly derived from the input data without any filtering. The deterministic and random jitter inherent in the data are effectively doubled and can be considerable. If the input is a low-jitter reference clock, a similar architecture can be used for clock multiplication [5]. As illustrated in Fig. 16, a lower frequency but clean reference clock is one input to a multiplexer that feeds into a delay line. The output of the delay line is fed back to the multiplexer as the second input.
When a reference clock edge is available, the multiplexer selects the reference input. Otherwise, the multiplexer configures the delay line as an oscillator with the output frequency controlled by the delay. The multiplexer inputs are selected by a counter circuit that determines the number of cycles to oscillate before accepting the next reference clock edge. A phase detector compares the

aggressive designs with the least amount of clocking overhead and high data rates use a minimum of 3x oversampling [17], [18]. Even though phase spacing scales with the gate delay of a technology, so does the bit time in each generation of applications.

For oversampling of the data bits, finely-spaced clock phases are needed. Tapping from a delay line produces phases separated by a buffer delay. For even finer phases, several techniques are commonly used. For example, several interpolators can be used where each interpolator has slightly different weighting to generate intermediate phases with spacing less than a buffer delay [19]. An alternative method uses a chain or array of coupled oscillators [20]. By taking a chain of oscillators and coupling them such that the output and input of the chain are separated by only one gate delay, sub-gate-delay phase spacings result from the outputs of each oscillator. Lastly, if the data can be delayed with a chain of delay buffers along with the clock, the clock at each delay stage can be used to sample the data of the corresponding stage. As long as the data and the clock delay lines have slightly different delays, the sub-sampled outputs are effectively an oversampling of the data. The effective phase spacing depends only on the difference between the data delay and clock delay [21]. The architecture has a drawback in that it requires delaying the data and clock by long delays of several cycles, which can significantly increase jitter.

IV. CLOCK MULTIPLICATION

With a dual-loop architecture, a DLL can produce a frequency plesiochronous to the delay-line input. However, the rate at which the interpolator weight changes limits the frequency difference. Generating a significantly different or multiplied frequency from a low-frequency input reference is not possible with the architecture. Recently designers have explored several methods of using DLLs for frequency multiplication. One method uses a delay line that is locked to 180°. With the phases that span an entire cycle, the tapped clock edges are combined to form a clock with multiplied frequency. The most direct method

Figure 16: DLL-based clock multiplication.

reference input with the oscillator output and tunes the delay of the delay elements. Once locked, the resulting output clock frequency is a multiple of the input reference frequency. Recent designs [26] extend the frequency range and use an interpolator instead of a multiplexer to blend the delay-line feedback and the low-frequency reference clocks. Both edge-combining multiplication and delay-line multiplication reduce the phase noise of the output clock because the core DLL does not have an oscillator that accumulates phase error. After N cycles, where N is the divide ratio, a new clean reference clock edge arrives and resets any accumulated phase error to zero. The architecture potentially lowers jitter by eliminating the peaking in the transfer function and allows a high tracking bandwidth. However, matching is critical in these designs. Mismatches in the phase detector or charge pump result in a static phase error that modulates the output frequency at the input reference frequency. Similarly, in the edge-combining implementations, if the delay line is mismatched, the output clock would contain significant reference tones. Designers either choose the reference frequency carefully so that the tones do not impact the system performance or employ additional circuitry to compensate for the mismatches.

V. CONCLUSION

DLLs have been commonly used for generating precise phase delays of a signal and have become increasingly popular in clock generation and data recovery applications. Most importantly, because the first-order loop controls the phase directly, DLLs can be designed with high tracking bandwidths and do not exhibit the phase accumulation of an oscillator-based PLL. The simpler loop characteristics belie many subtleties in DLL design. The delay-line input clock must have low jitter and a good duty cycle. Furthermore, it must be carefully received and coupled to the input of the delay line to maintain good jitter performance. This source of jitter counterbalances the jitter accumulation of PLLs, so the net jitter improvement is smaller. Additional circuitry is often needed to prevent false-locking. Since a delay line does not restore a clock's duty cycle, the output clock requires correction circuitry. To use DLLs in plesiochronous systems, the delay line must have even more circuitry to achieve an unlimited delay range. In clock multiplication applications, very careful matching in the DLL components is critical to eliminate reference tones. In the many designs that have addressed these subtleties, DLLs have demonstrated low-jitter clock outputs for a variety of clock generation and data recovery applications.

REFERENCES

[1] Bazes, M., "A Novel Precision MOS Synchronous Delay Line," IEEE Journal of Solid-State Circuits, vol. SC-20, no. 6, Dec. 1985, pp. 1265-71.

[2] Johnson, M.G., E.L. Hudson, "A Variable Delay Line PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, vol. 23, no. 5, Oct. 1988, pp. 1218-23.

[3] Lee, T.H., et al., "A 2.5V CMOS Delay-Locked Loop for an 18 Mbit, 500 Megabytes/s DRAM," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, Dec. 1994, pp. 1491-6.

[4] Messerschmitt, D.G., "Synchronization in Digital System Design," IEEE Journal on Selected Areas in Communications, Oct. 1990, pp. 1404-1420.

[5] Waizman, A., "A Delay Line Loop for Frequency Synthesis of De-Skewed Clock," IEEE ISSCC Dig. of Tech. Papers, Feb. 1994, San Francisco, Session 18.5.

[6] Chien, G., P.R. Gray, "A 900-MHz Local Oscillator Using a DLL-Based Frequency Multiplier Technique for PCS Applications," IEEE Journal of Solid-State Circuits, vol. 35, no. 12, Dec. 2000, pp. 1996-9.

[7] Alexander, J.D., "Clock Recovery from Random Binary Data," Electronics Letters, vol. 11, Oct. 1975, pp. 541-2.

[8] D'Andrea, N.A., F. Russo, "A binary quantized digital phase locked loop: a graphical analysis," IEEE Transactions on Communications, vol. COM-26, no. 9, Sept. 1978, pp. 1355-64.

[9] Moon, Y., "An All-Analog Multiphase Delay-Locked Loop Using A Replica Delay Line for Wide-Range Operation and Low-Jitter Performance," IEEE Journal of Solid-State Circuits, vol. 35, no. 3, Mar. 2000, pp. 377-84.

[10] Razavi, B., "Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits - A Tutorial," Monolithic Phase-Locked Loops and Clock Recovery Circuits, IEEE Press, 1996, New Jersey, pp. 1-28.

[11] Dunning, J., et al., "An All-Digital Phase-Locked Loop with 50-Cycle Lock Time Suitable for High-Performance Microprocessors," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, Apr. 1995, pp. 412-22.

[12] Efendovich, A., et al., "Multifrequency Zero-Jitter Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol. 29, no. 1, Jan. 1994, pp. 67-70.

[13] Sidiropoulos, S., M.A. Horowitz, "A Semidigital Dual Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol. 32, no. 11, Nov. 1997, pp. 1683-92.

[14] Garlepp, B., et al., "A Portable Digital DLL for High-Speed CMOS Interface Circuits," IEEE Journal of Solid-State Circuits, vol. 34, no. 5, May 1999, pp. 632-44.

[15] Larsson, P., "A 2-1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability," IEEE Journal of Solid-State Circuits, vol. 34, no. 12, Dec. 1999, pp. 1951-60.

[16] Cordell, R., "A 45-Mbit/s CMOS VLSI Digital Phase Aligner," IEEE Journal of Solid-State Circuits, vol. 23, no. 2, Apr. 1988, pp. 323-28.

[17] Lee, K., et al., "A CMOS Serial Link For Fully Duplexed Data Communication," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, Apr. 1995, pp. 353-64.

[18] Yang, C.K., et al., "A 0.5-μm CMOS 4.0-Gb/s Serial Link Transceiver with Data Recovery Using Oversampling," IEEE Journal of Solid-State Circuits, vol. 33, no. 5, May 1998, pp. 713-22.

[19] Weinlader, D., et al., "An Eight Channel 36-GS/s CMOS Timing Analyzer," IEEE ISSCC Dig. of Tech. Papers, Feb. 2000, San Francisco, pp. 170-1.

[20] Maneatis, J., M. Horowitz, "Precise Delay Generation Using Coupled Oscillators," IEEE Journal of Solid-State Circuits, vol. 28, no. 12, Dec. 1993, pp. 1273-82.

[21] Gray, C., et al., "A Sampling Technique and Its CMOS Implementation with 1Gb/s Bandwidth and 25ps Resolution," IEEE Journal of Solid-State Circuits, vol. 29, no. 3, Mar. 1994, pp. 340

[22] Ota, Y., et al., "High-Speed, Burst-Mode, Packet Capable Optical Receiver and Instantaneous Clock Recovery for Optical Bus Operation," IEEE Journal of Lightwave Technology, vol. 12, no. 2, Feb. 1994, pp. 325-330.

[23] Foley, D., M.P. Flynn, "CMOS DLL-Based 2-V 3.2ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator," IEEE Journal of Solid-State Circuits, vol. 36, no. 3, Mar. 2001, pp. 417-23.

[24] Kim, C., I. Hwang, S.M. Kang, "Low-Power Small-Area +/-7.28ps Jitter 1GHz DLL-Based Clock Generator," IEEE ISSCC Dig. of Tech. Papers, Feb. 2002, San Francisco, Session 8.3.

[25] Farjad-rad, R., et al., "A 0.2-2GHz 12mW Multiplying DLL for Low-Jitter Clock Synthesis in Highly-Integrated Data Communication Chips," IEEE ISSCC Dig. of Tech. Papers, Feb. 2002, San Francisco, Session 4.5.

[26] Ye, S., L. Jansson, I. Galton, "A Multiple-Crystal Interface PLL with VCO Realignment to Reduce Phase Noise," IEEE ISSCC Dig. of Tech. Papers, Feb. 2002, San Francisco, Session 4.6.

[27] Kim, J., et al., "A Low-Jitter Mixed-Mode DLL for High-Speed DRAM Applications," IEEE Journal of Solid-State Circuits, vol. 35, no. 10, Oct. 2000, pp. 1430-3.

Delta-Sigma Fractional-N Phase-Locked Loops

Ian Galton

Abstract: This paper presents a tutorial on delta-sigma fractional-N PLLs for frequency synthesis. The presentation assumes the reader has a working knowledge of integer-N PLLs. It builds on this knowledge by introducing the additional concepts required to understand ΔΣ fractional-N PLLs. After explaining the limitations of integer-N PLLs with respect to tuning resolution, the paper introduces the delta-sigma fractional-N PLL as a means of avoiding these limitations. It then presents a self-contained explanation of the relevant aspects of delta-sigma modulation, an extension of the well known integer-N PLL linearized model to delta-sigma fractional-N PLLs, a design example, and techniques for wideband digital modulation of the VCO within a delta-sigma fractional-N PLL.

The author is with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA.

I. INTRODUCTION

Over the last decade, delta-sigma (ΔΣ) fractional-N phase locked loops (PLLs) have become widely used for frequency synthesis in consumer-oriented electronic communications products such as cellular phones and wireless LANs. Unlike an integer-N PLL, the output frequency of a ΔΣ fractional-N PLL is not limited to integer multiples of a reference frequency. The core of a ΔΣ fractional-N PLL is similar to an integer-N PLL, but it incorporates additional digital circuitry that allows it to accurately interpolate between integer multiples of the reference frequency. The tuning resolution depends only on the complexity of the digital circuitry, so considerable flexibility and programmability is achieved. A single ΔΣ fractional-N PLL often can be used for local oscillator generation in applications that would otherwise require a cascade of two or more integer-N PLLs. Moreover, the fine tuning resolution makes it possible to perform digitally-controlled frequency modulation for generation of continuous-phase (e.g., FSK and MSK) transmit signals, thereby simplifying wireless transmitters.

These benefits come at the expense of increased digital complexity and somewhat increased phase noise relative to integer-N PLLs. However, with the relentless progress in silicon VLSI technology optimized for digital circuitry, this tradeoff is increasingly attractive, especially in consumer products which tend to favor cost reduction over performance.

This paper presents a tutorial on ΔΣ fractional-N PLLs. It is assumed that the reader has a working knowledge of integer-N PLLs. The paper builds on this knowledge by presenting the additional concepts required to understand ΔΣ fractional-N PLLs. The limitations of integer-N PLLs with respect to tuning resolution are described in Section II. The key ideas underlying fractional-N PLLs in general and ΔΣ fractional-N PLLs in particular are presented in Section III. The primary innovation in ΔΣ fractional-N PLLs relative to other types of fractional-N PLLs is the use of ΔΣ modulation. Therefore, a self-contained introduction to ΔΣ modulation as it relates to ΔΣ fractional-N PLLs is presented in Section IV. A ΔΣ fractional-N PLL linearized model is derived in Section V and compared to the corresponding model for integer-N PLLs. A design example is presented to demonstrate how the model is used in practice. Design issues that arise in ΔΣ fractional-N PLLs but not integer-N PLLs are presented in Section VI, and recently developed enhancements to ΔΣ fractional-N PLLs that allow wideband digital modulation of the VCO are presented in Section VII.

II. INTEGER-N PLL LIMITATIONS

An example of a typical integer-N PLL for frequency synthesis is shown in Figure 1 [1], [2]. Its purpose is to generate a spectrally pure periodic output signal with a frequency of $N f_{ref}$, where $N$ is an integer, and $f_{ref}$ is the frequency of the reference signal. The example PLL consists of a phase-frequency detector (PFD), a charge pump, a lowpass loop filter, a voltage controlled oscillator (VCO), and an N-fold digital divider. The PFD compares the positive-going edges of the reference signal to those from the divider and causes the charge pump to drive the loop filter with current pulses whose widths are proportional to the phase difference between the two signals. The pulses are lowpass filtered by the loop filter and the resulting waveform drives the VCO. Within the loop bandwidth phase noise from the VCO is suppressed and outside the loop bandwidth most of the other noise sources are suppressed, so the PLL can be designed to generate a spectrally pure output signal at any integer multiple of the reference frequency, $f_{ref}$.

Figure 1: A typical integer-N PLL.

As indicated by the timing diagram in Figure 1, the loop filter is updated by the charge pump once every reference period. This discrete-time behavior places an upper limit on the loop bandwidth of approximately $f_{ref}/10$ above which the PLL tends to be unstable [1]. In integrated circuit PLLs, it is common to further limit the bandwidth to approximately $f_{ref}/20$ to allow for process and temperature variations.

The output frequency can be changed by changing $N$, but $N$ must be an integer, so the output frequency can be changed only by integer multiples of the reference frequency. If finer tuning resolution is required the only option is to reduce the reference frequency. Unfortunately, this tends to reduce the maximum practical loop bandwidth, thereby increasing the settling time of the PLL, the noise contributed by the VCO, and the in-band portions of the noise contributed by the reference source, the PFD, the charge pump, and the divider.

This fundamental tradeoff between bandwidth and tuning resolution in integer-N PLLs creates problems in many applications. For example, a PLL that can be tuned from 2.402 GHz to 2.480 GHz in steps of 1 MHz is required to generate the local oscillator signal in a direct conversion Bluetooth transceiver [3]. An integer-N PLL capable of generating the local oscillator signal from a commonly used crystal oscillator frequency, 19.68 MHz, is shown in Figure 2. A reference frequency of $f_{ref}$ = 40 kHz (the greatest common divisor of the crystal frequency and the set of desired output frequencies) is obtained by dividing the crystal oscillator signal by 492. The resulting PLL output frequency is $60050 + 25k$ times the reference frequency, where $k$ is an integer used to select the desired frequency step. The PLL achieves the desired output frequencies, but its bandwidth is limited to approximately 2 kHz, i.e., $f_{ref}/20$.

Figure 2: An example integer-N PLL for generation of the Bluetooth wireless LAN RF channel frequencies.

Unfortunately, with such a low bandwidth the settling time exceeds the 200 μs limit specified in the Bluetooth standard, and the phase noise contributed by the VCO would be unacceptably high if it were implemented in present-day CMOS technology. One solution is to use a 1 MHz reference signal, but this requires the crystal frequency to be an integer multiple of 1 MHz, or another PLL to generate a 1 MHz reference frequency. Unfortunately, in low cost consumer electronics applications such as Bluetooth, it is often desirable to be compatible with all of the popular crystal frequencies, so restricting the crystal frequencies to multiples of 1 MHz is not always an option. In such cases, an additional PLL capable of generating the 1 MHz reference signal with very little phase noise from any of the crystal frequencies is required, or, as described in the next section, a single fractional-N PLL can be used.

Figure 3: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has phase noise consisting of large spurious tones.
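The integer-N frequency plan described in Section II can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original paper; the function name `integer_n_plan` and the variable names are illustrative assumptions. It computes the largest usable reference frequency as the greatest common divisor of the crystal frequency and the 79 Bluetooth channel frequencies, then applies the $f_{ref}/20$ bandwidth rule of thumb quoted above.

```python
from functools import reduce
from math import gcd

F_XTAL_HZ = 19_680_000                      # crystal frequency from the text
# 2.402 GHz to 2.480 GHz in 1 MHz steps (79 channels)
CHANNELS_HZ = [2_402_000_000 + k * 1_000_000 for k in range(79)]


def integer_n_plan(f_xtal_hz, channels_hz):
    """Return (f_ref, reference divider, divider moduli) for an integer-N PLL.

    f_ref is the greatest common divisor of the crystal frequency and all
    desired output frequencies, so every channel is an integer multiple of it.
    """
    f_ref = reduce(gcd, channels_hz, f_xtal_hz)
    ref_divider = f_xtal_hz // f_ref
    moduli = [f // f_ref for f in channels_hz]
    return f_ref, ref_divider, moduli


f_ref, ref_divider, moduli = integer_n_plan(F_XTAL_HZ, CHANNELS_HZ)
bandwidth_limit_hz = f_ref / 20             # rule of thumb quoted in the text

print(f_ref, ref_divider, moduli[0], moduli[1] - moduli[0], bandwidth_limit_hz)
# 40000 492 60050 25 2000.0
```

The result matches the numbers in the text: a 40 kHz reference obtained by dividing the crystal by 492, moduli of $60050 + 25k$, and a loop bandwidth of roughly 2 kHz.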

III. THE IDEA BEHIND ΔΣ FRACTIONAL-N PLLs

In this section, the example problem of generating the second Bluetooth channel frequency, 2.403 GHz, with a reference frequency of 19.68 MHz is used as a vehicle with which to explain the idea behind ΔΣ fractional-N PLLs. First, a pair of "bad" fractional-N PLLs are presented that achieve the desired frequency but have poor phase noise performance. Then the ΔΣ fractional-N PLL technique is presented as a means of improving the phase noise performance.

The output frequency of an integer-N PLL with a reference frequency of 19.68 MHz is 2.40096 GHz when the divider modulus, $N$, is set to 122 and 2.42064 GHz when $N$ is set to 123. The problem is that to achieve the desired frequency of 2.403 GHz, $N$ would have to be set to the non-integer value of 122 + 51/492. This cannot be implemented directly because the divider modulus must be an integer value. However, the divider modulus can be updated each reference period, so one option is to switch between $N$ = 122 and $N$ = 123 such that the average modulus over many reference periods converges to 122 + 51/492. In this case, the resulting average PLL output frequency is 2.403 GHz as desired. This is the fundamental idea behind most fractional-N PLLs [4].

While dynamically switching the divider modulus solves the problem of achieving non-integer multiples of the reference frequency, a price is paid in the form of increased phase noise. During each reference period the difference between the actual divider modulus and the average, i.e., ideal, divider modulus represents error that gets injected into the PLL and results in increased phase noise. As described below, the amount by which the phase noise is increased depends upon the characteristics of the sequence of divider moduli.

For example, in the fractional-N PLL shown in Figure 3, the divider modulus is set each reference period to 122 or 123 such that over each set of 492 consecutive reference periods it is set to 122 a total of 441 times and 123 a total of 51 times. Thus, the average modulus is 122 + 51/492 as required. The sequence of moduli is periodic with a period of 492, so it repeats at a rate of 40 kHz. Consequently, the difference between the actual divider moduli and their average is a periodic sequence with a repeat rate of 40 kHz, so the resulting phase noise is periodic and consists of spurious tones at integer multiples of 40 kHz. Many of the spurious tones occur at low frequencies, and they can be very large. Unfortunately, the only way to suppress the tones is to have a very small PLL bandwidth, which negates the potential benefit of the fractional-N technique.

One way to eliminate spurious tones is to introduce randomness to break up the periodicity in the sequence of moduli while still achieving the desired average modulus. For example, as shown in Figure 4, a digital block can be used to generate a sequence, $y[n]$, that approximates a sampled sequence of independent random variables that take on values of 0 and 1 with probabilities 441/492 and 51/492, respectively. During the $n$th reference period the divider modulus is set to $122 + y[n]$, so the sequence of moduli has the desired average yet its power spectral density (PSD) is that of white noise. Thus, instead of contributing spurious tones, the modified technique introduces white noise. Unfortunately, the portion of the white noise within the PLL's bandwidth is integrated by the PLL transfer function, so the overall phase noise contribution again can be significant unless the PLL bandwidth is small.

Figure 4: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has a large amount of in-band phase noise.

In each fractional-N PLL example presented above, the sequence, $y[n]$, can be written as $y[n] = x + e_m[n]$, where $x$ is the desired fractional part of the modulus, i.e., $x$ = 51/492, and $e_m[n]$ is undesired zero-mean quantization noise caused by using integer moduli in place of the ideal fractional value. In the first example, $e_m[n]$ is periodic and therefore consists of spurious tones at multiples of 40 kHz. In the second example, $e_m[n]$ is white noise. Each PLL attenuates the portion of $e_m[n]$ outside its bandwidth, but the portion within its bandwidth is not significantly attenuated. Unfortunately, in each example $e_m[n]$ contains significant power at low frequencies, so it contributes substantial phase noise unless the PLL bandwidth is very low.

A ΔΣ fractional-N PLL avoids this problem by generating the sequence of moduli such that the quantization noise has most of its power in a frequency band well above the desired bandwidth of the PLL [5], [6], [7]. An example ΔΣ fractional-N PLL is shown in Figure 5. The PLL core is similar to those of the previous fractional-N PLL examples, but in this case $y[n]$ is generated by a digital ΔΣ modulator. The details of how the ΔΣ modulator works are presented in the next section, but its purpose is to coarsely quantize its input sequence, $x[n]$, such that $y[n]$ is integer-valued and has the form $y[n] = x[n-2] + e_m[n]$, where $e_m[n]$ is dc-free quantization noise with most of its power outside the PLL bandwidth. In this example, $x[n]$ consists of the desired fractional modulus value, 51/492, plus a small, pseudo-random, 1-bit sequence. As described in the next section, the pseudo-random sequence is necessary to avoid spurious tones in the ΔΣ modulator's quantization noise, but its amplitude is very small so it does not appreciably increase the phase noise of the PLL.

Also shown in Figure 5 are PSD plots of the output phase noise arising from ΔΣ modulator quantization noise, $e_m[n]$, in two computer simulated versions of the example ΔΣ fractional-N PLL, one with a 50 kHz loop bandwidth and the other with a 500 kHz loop bandwidth. As shown in the next section, the PSD of $e_m[n]$ increases with frequency, so the phase noise PSD corresponding to the 50 kHz bandwidth PLL is significantly smaller than that corresponding to the 500 kHz bandwidth PLL. For example, the former easily meets the requirements for a local oscillator in a direct conversion Bluetooth transceiver, but the latter falls short of the requirements by at least 23 dB.

Figure 5: A ΔΣ fractional-N PLL example, with simulated PLL phase noise for 50 kHz and 500 kHz loop bandwidths.
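The averaging argument behind the two "bad" fractional-N PLLs of Section III can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the order of the ones and zeros in the periodic sequence is an assumption (the text specifies only their counts), and a seeded pseudo-random generator stands in for the randomized pulse density modulator. It confirms that a modulus of $122 + y[n]$ gives exactly 2.403 GHz on average for the periodic sequence, and approximately that value for the random one.

```python
from fractions import Fraction
import random

F_REF_HZ = 19_680_000            # reference (crystal) frequency from the text
BASE_MODULUS = 122
ALPHA = Fraction(51, 492)        # desired fractional part of the modulus


def periodic_bits():
    """One 492-sample period with 51 ones and 441 zeros (bit order assumed)."""
    return [1] * 51 + [0] * 441


def random_bits(count, seed=1):
    """Independent bits equal to 1 with probability 51/492 (seeded, repeatable)."""
    rng = random.Random(seed)
    p_one = float(ALPHA)
    return [1 if rng.random() < p_one else 0 for _ in range(count)]


def average_output_hz(bits):
    """Average PLL output frequency when the divider modulus is 122 + y[n]."""
    average_modulus = BASE_MODULUS + Fraction(sum(bits), len(bits))
    return average_modulus * F_REF_HZ


periodic_avg = average_output_hz(periodic_bits())
random_avg = average_output_hz(random_bits(100_000))
# Sum of e_m[n] = y[n] - alpha over one period of the periodic sequence
error_sum = sum(Fraction(b) - ALPHA for b in periodic_bits())

print(periodic_avg)                         # 2403000000
print(error_sum)                            # 0
print(float(random_avg - periodic_avg))     # small residual, shrinks as count grows
```

Both sequences reach the desired average; they differ only in how the zero-mean error $e_m[n]$ is distributed in frequency, which is the property the ΔΣ modulator is designed to control.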

IV. DELTA-SIGMA MODULATION OVERVIEW

As mentioned above, a digital ΔΣ modulator performs coarse quantization in such a way that the inevitable error introduced by the quantization process, i.e., the quantization noise, is attenuated in a specific frequency band of interest. There are many different ΔΣ modulator architectures. Most use coarse uniform quantizers to perform the quantization with feedback around the quantizers to suppress the quantization noise in particular frequency bands. Therefore, to illustrate the ΔΣ modulator concept, first a specific uniform quantizer example is considered in isolation, and then a specific ΔΣ modulator architecture that incorporates the uniform quantizer is presented.

A. An Example Uniform Quantizer

The input-output characteristic of the example uniform quantizer is shown in Figure 6. It is a 9-level quantizer with integer valued output levels. For each input value with a magnitude less than 4.5, the quantizer generates the corresponding output sample by rounding the input value to the nearest integer. For each input value greater than 4.5 or less than -4.5, the quantizer sets its output to 4 or -4, respectively; such values are said to overload the quantizer. By defining the quantization noise as $e_q[n] = y[n] - r[n]$, where $r[n]$ is the quantizer input and $y[n]$ is its output, the quantizer can be viewed without approximation as an additive noise source as illustrated in the figure.
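The quantizer characteristic just described is simple enough to state as code. The following is a minimal sketch with hypothetical function names; the handling of inputs that fall exactly halfway between two levels is an assumption, since the text does not specify it.

```python
import math


def quantize_9_level(r):
    """Round r to the nearest integer, then clip to the nine levels -4..4."""
    nearest = math.floor(r + 0.5)      # ties round upward (an assumption)
    return max(-4, min(4, nearest))


def quantization_noise(r):
    """Additive-noise view of the quantizer: e_q = y - r."""
    return quantize_9_level(r) - r


def overloads(r):
    """True when the input lies outside the no-overload range of the quantizer."""
    return abs(r) > 4.5


for sample in (1.7, -2.3, 4.4, 7.2):
    print(sample, quantize_9_level(sample), overloads(sample))
```

Within the no-overload range the quantization noise never exceeds half a step in magnitude; once the input overloads the quantizer the error grows without bound, which is why overload must be avoided in the ΔΣ modulator that follows.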

To illustrate some properties of the example quantizer, consider a 48 Msample/s input sequence, $x[n]$, consisting of a 48 kHz sinusoid with an amplitude of 1.7 plus a small amount of white noise such that the input signal-to-noise ratio (SNR) is 100 dB. Figure 7(a) shows the PSD plot of the resulting quantizer output sequence, and Figure 7(b) shows a time domain plot of the quantizer output sequence over two periods of the sinusoid. Given the coarseness of the quantization, it is not surprising that the quantizer output sequence is not a precise representation of the quantizer input sequence. As evident in Figure 7(a), the quantization noise for this input sequence consists primarily of harmonic distortion as represented by the numerous spurious tones distributed over the entire discrete-time frequency band. Even in the relatively narrow frequency band below 500 kHz, significant harmonic distortion corrupts the desired signal. To illustrate this in the time domain, Figure 7(c) shows the sequence obtained by passing the quantizer output sequence through a sharp lowpass discrete-time filter with a cutoff frequency of 500 kHz. The significant quantization noise power in the zero to 500 kHz frequency band causes the sequence shown in Figure 7(c) to deviate significantly from the sinusoidal quantizer input sequence.

Figure 6: A 9-level quantizer example.

Figure 7: (a) A power spectral density plot of the quantizer output in dB, relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot of the quantizer output, and (c) a time domain plot of the quantizer output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz. The input is a 48 kHz sinusoid plus white noise (SNR = 100 dB), sampled at 48 MHz; time axes are in units of 1/(48 MHz).

B. An Example ΔΣ Modulator

The example ΔΣ modulator architecture shown in Figure 8 can be used to circumvent this problem. The structure incorporates the same 9-level quantizer presented above, but in this case the quantizer is preceded by two delaying discrete-time integrators (i.e., accumulators), and surrounded by two feedback loops [8], [9]. Each discrete-time integrator has a transfer function of $z^{-1}/(1 - z^{-1})$, which implies that its $n$th output sample is the sum of all its input samples for times $k < n$. With the quantizer represented as an additive noise source as depicted in Figure 6, the ΔΣ modulator can be viewed as a two-input, single-output, linear time-invariant, discrete-time system. It is straightforward to verify that

$$y[n] = x[n-2] + e_m[n], \tag{1}$$

where $e_m[n]$ is the overall quantization noise of the ΔΣ modulator and is given by

$$e_m[n] = e_q[n] - 2e_q[n-1] + e_q[n-2]. \tag{2}$$

Figure 8: A ΔΣ modulator example.

To illustrate the behavior of the ΔΣ modulator, suppose that the same 48 Msample/s input sequence considered above is applied to the input of the ΔΣ modulator, and that the discrete-time integrators in the ΔΣ modulator are clocked at 48 MHz. Figure 9(a) shows the PSD plot of the resulting ΔΣ modulator output sequence, $y[n]$, and Figure 9(b) shows a time domain plot of $y[n]$ over two periods of the sinusoid. Two important differences with respect to the uniform quantization example shown in Figure 7 are apparent: the quantization noise PSD is significantly attenuated at low frequencies, and no spurious tones are visible anywhere in the discrete-time spectrum. For instance, the SNR in the zero to 500 kHz frequency band is approximately 84 dB for this example as opposed to 14 dB for the uniform quantization example of Figure 7. Consequently, subjecting the ΔΣ modulator output sequence to a lowpass filter with a cutoff frequency of 500 kHz results in a sequence that is very nearly equal to the ΔΣ modulator input sequence as demonstrated in Figure 9(c).

Figure 9: (a) A power spectral density plot of the ΔΣ modulator output in dB, relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot of the ΔΣ modulator output, and (c) a time domain plot of the ΔΣ modulator output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz. The input is a 48 kHz sinusoid plus white noise (SNR = 100 dB), sampled at 48 MHz; time axes are in units of 1/(48 MHz).

Below about 120 kHz, the PSD shown in Figure 9(a) is dominated by the two components of the ΔΣ modulator input sequence: the 48 kHz sinusoid component, and the input noise component. Above 120 kHz, the PSD is dominated by the ΔΣ modulator quantization noise, $e_m[n]$, and rises with a slope of 40 dB per decade. It follows from (2) that $e_m[n]$ can be viewed as the result of passing the additive noise from the quantizer, $e_q[n]$, through a discrete-time filter with transfer function $(1 - z^{-1})^2$. Since this filter has two zeros at dc, the smooth 40 dB per decade increase of the PSD of $e_m[n]$ indicates that $e_q[n]$ is very nearly white noise, at least for the example shown in Figure 9. It can be proven that $e_q[n]$ is indeed white noise; it has a variance of 1/12 and is uncorrelated with the ΔΣ modulator input sequence [10]. Moreover, this situation holds in general for the example ΔΣ modulator architecture provided that the input sequence satisfies two conditions: 1) its magnitude is sufficiently small that the quantizer within the ΔΣ modulator never overloads, and 2) it consists of a signal component plus a small amount of independent white noise.

It can be shown that the first condition is satisfied if the input signal is bounded in magnitude by 3Δ, where Δ is the step-size of the quantizer (for this example, Δ = 1) [11]. Input sequences with values even slightly exceeding 3Δ in magnitude generally cause the quantizer to overload with the result that $e_q[n]$ contains spurious tones and the SNR in the frequency band of interest is degraded. For this reason, the range between -3Δ and 3Δ is said to be the input no-overload range of the ΔΣ modulator.

For the second condition to be satisfied, the power of the ΔΣ modulator input sequence's white noise component may be arbitrarily small, but if it is absent altogether, $e_q[n]$ is not guaranteed to be white. For instance, in the example shown in Figure 9 the input sequence contains a white noise component with 100 dB less power than the signal component. If this tiny noise component were not present, the resulting ΔΣ modulator output PSD would contain numerous spurious tones. Since the ΔΣ modulators used in ΔΣ fractional-N PLLs are all-digital devices, the noise must be added digitally. As shown in [12], it is sufficient to add a 1-bit, sub-LSB, independent, white noise dither sequence with zero mean at the input node. In practice, a 1-bit pseudo-random dither sequence is typically used in place of a truly random dither sequence. Such a sequence can be generated easily using a linear feedback shift register, and has the desired result with respect to the quantization noise despite not being truly random [13], [14].

C. Other ΔΣ Modulator Options

To this point, the ΔΣ modulation concept has been illustrated via the particular example ΔΣ modulator architecture shown in Figure 8, namely a second-order multi-bit ΔΣ modulator. While this type of ΔΣ modulator is widely used in ΔΣ fractional-N PLLs, there exist other types of ΔΣ modulators that can be applied to ΔΣ fractional-N PLLs. Most of the other architectures are higher-order ΔΣ modulators that perform higher than second-order quantization noise shaping, thereby more aggressively suppressing quantization noise in particular frequency bands relative to the example second-order ΔΣ modulator. Some of these higher-order ΔΣ modulators incorporate a higher than second-order loop filter (e.g., more than two discrete-time integrators) and a single quantizer surrounded by one or more feedback loops [15], [16]. In many cases, these ΔΣ modulators are designed specifically to allow one-bit quantization [7], [17], [18]. This simplifies the design of the divider in that only two moduli are required, but such ΔΣ modulators tend to have spurious tones in their quantization noise that cannot be completely suppressed even with elaborate dithering techniques. Others of these higher-order ΔΣ modulators, often referred to as MASH, cascaded, or multistage ΔΣ modulators, consist of multiple lower-order ΔΣ modulators, such as the second-order ΔΣ modulator presented above, cascaded to obtain the equivalent of a single higher-order ΔΣ modulator [5], [19], [20].

V. ΔΣ FRACTIONAL-N PLL DYNAMICS

A ΔΣ fractional-N PLL linearized model is derived in this section in the form of a block diagram that describes the output phase noise in terms of the component parameters and noise sources in the PLL. As in the case of an integer-N PLL, the model provides an accurate tool with which to predict the total phase noise, bandwidth, and stability of the PLL.

A. Derivation of a ΔΣ Fractional-N PLL Linearized Model

In PLL analyses it is common to assume that each periodic signal within the PLL has the form $v(t) = A(t)\sin(\omega t + \theta(t))$, where $A(t)$ is a positive amplitude function, $\omega$ is a constant center frequency in radians/sec, and $\theta(t)$ is zero-mean phase noise in radians. In most cases of interest for PLL analysis, the amplitude is well modeled as a constant value, and the phase noise is very small relative to $\pi$ with a bandwidth that is much lower than the center frequency. Solving for the time of the $n$th positive-going zero crossing, $\gamma_n$, of $v(t)$ gives $\gamma_n = [n - \theta(\gamma_n)/(2\pi)] \cdot T$, where $T = 2\pi/\omega$ is the period of the signal. Therefore, the sequence, $\gamma_n$, is a sampled version of the phase noise with very little aliasing, so knowing the sequence and $T$ is approximately equivalent to knowing the phase noise. This approximation is made throughout the following analysis.

The relationship between the charge pump output current and the PFD input signals is shown in Figure 10. Ideally, during the $n$th reference period the charge pump output is a current pulse of amplitude $I$ or $-I$ and duration $|t_n - \tau_n|$, where $t_n$ and $\tau_n$ are the times of the charge pump output transitions triggered by the positive-going edges of the divider output and reference signal, respectively. Therefore, the average current sourced or sunk by the charge pump during the $n$th reference period is $I \cdot (t_n - \tau_n)/T_{ref}$. In practice, the PFD is usually designed such that, except for a possible constant offset, this result holds even though the current sources have finite rise and fall times [2].

Figure 10: The ΔΣ fractional-N PLL with the details of a commonly used loop filter and a timing diagram relating to the charge pump output.

The first step in deriving the model is to develop an expression for $t_n - \tau_n$. Ideally, $\tau_n = nT_{ref}$, but phase noise introduced by the reference source and PFD cause it to have the form

$$\tau_n = nT_{ref} - \frac{T_{ref}}{2\pi}\left[\theta_{ref}(\tau_n) + \theta_{PFD}(\tau_n)\right], \tag{3}$$

where $\theta_{ref}(t)$ and $\theta_{PFD}(t)$ are the reference source and PFD phase noise functions, respectively. If the VCO output were ideal its positive-going edges would be spaced at uniform intervals of $T_{ref}/(N + \alpha)$, where $\alpha$ is the fractional part of the modulus (e.g., $\alpha$ = 51/492 in Figure 5). Therefore, ideally,

$$t_n = \frac{T_{ref}}{N+\alpha}\sum_{k=0}^{n-1}\left(N + y[k]\right),$$

but in practice it deviates because of VCO phase noise, $\theta_{VCO}(t)$, divider phase noise, $\theta_{div}(t)$, and deviations of the VCO control voltage, $v_{ctl}(t)$, from its ideal value, $\bar{v}_{ctl}$. With $K_{VCO}$ denoting the VCO gain in Hz/V, this gives

$$t_n = \frac{T_{ref}}{N+\alpha}\left[\sum_{k=0}^{n-1}\left(N + y[k]\right) - K_{VCO}\int_0^{t_n}\left(v_{ctl}(\lambda) - \bar{v}_{ctl}\right)d\lambda - \frac{\theta_{VCO}(t_n)}{2\pi}\right] - \frac{T_{ref}}{2\pi}\theta_{div}(t_n),$$

which reduces to

$$t_n = nT_{ref} + \frac{T_{ref}}{N+\alpha}\left[\sum_{k=0}^{n-1}\left(y[k] - \alpha\right) - K_{VCO}\int_0^{t_n}\left(v_{ctl}(\lambda) - \bar{v}_{ctl}\right)d\lambda - \frac{\theta_{VCO}(t_n)}{2\pi}\right] - \frac{T_{ref}}{2\pi}\theta_{div}(t_n). \tag{4}$$

Subtracting (3) from (4) yields an expression for the average current sourced or sunk by the charge pump during the $n$th reference period:

$$\frac{I\,(t_n - \tau_n)}{T_{ref}} = I\left\{\frac{1}{N+\alpha}\left[\sum_{k=0}^{n-1}\left(y[k] - \alpha\right) - K_{VCO}\int_0^{t_n}\left(v_{ctl}(\lambda) - \bar{v}_{ctl}\right)d\lambda - \frac{\theta_{VCO}(t_n)}{2\pi}\right] - \frac{\theta_{div}(t_n)}{2\pi} + \frac{\theta_{ref}(\tau_n)}{2\pi} + \frac{\theta_{PFD}(\tau_n)}{2\pi}\right\}. \tag{5}$$

As mentioned above, the phase noise terms are assumed to have bandwidths that are much smaller than the reference frequency. Consequently, the sampling of the phase noise functions in (5) can be neglected, and the charge pump output can be modeled as a smoothly varying function of time with an average value over each reference period equal to that of (5). With these approximations, (5) implies that

$$i_{cp}(t) = I\left\{\frac{1}{N+\alpha}\left[u_m(t) - K_{VCO}\int_0^{t}\left(v_{ctl}(\lambda) - \bar{v}_{ctl}\right)d\lambda - \frac{\theta_{VCO}(t)}{2\pi}\right] - \frac{\theta_{div}(t)}{2\pi} + \frac{\theta_{ref}(t)}{2\pi} + \frac{\theta_{PFD}(t)}{2\pi}\right\}, \tag{6}$$

where $u_m(t)$ is the result of discrete-time integrating and converting to continuous-time the quantity, $y[n] - \alpha$.

The ΔΣ fractional-N PLL linearized model follows directly from (6) and Figure 10. It is shown in Figure 11, where $i_n(t)$ represents the noise contributed by the charge pump current sources and the loop filter, and $Z_{lf}(s)$ is the transfer function of the loop filter. The model specifies the phase noise transfer functions and loop dynamics of the PLL. For example, the model implies that

$$\frac{\theta_{out}(s)}{\theta_{ref}(s)} = (N+\alpha)\frac{T(s)}{1+T(s)} \quad\text{and}\quad \frac{\theta_{out}(s)}{\theta_{VCO}(s)} = \frac{1}{1+T(s)}, \tag{7}$$

where $\theta_{out}$ is the PLL output phase noise and

$$T(s) = \frac{K_{VCO}\, I\, Z_{lf}(s)}{(N+\alpha)\, s} \tag{8}$$

is the loop gain of the PLL. For the loop filter shown in Figure 10, the transfer function is

$$Z_{lf}(s) = \frac{1}{C_1 + C_2}\cdot\frac{1 + sRC_1}{s\left[1 + sRC_1C_2/(C_1 + C_2)\right]}. \tag{9}$$

Figure 11: The ΔΣ fractional-N PLL linearized model. Except for the shaded region the model is identical to the corresponding integer-N PLL model.

B. Differences Between the ΔΣ Fractional-N and Integer-N PLL Models

The shaded region in Figure 11 indicates the part of the model that is specific to ΔΣ fractional-N PLLs; except for the shaded region the model is identical to the corresponding model for integer-N PLLs. Therefore, each phase noise transfer function in an integer-N PLL is identical to the corresponding phase noise transfer function in a ΔΣ fractional-N PLL, except every occurrence of $N$ in the former is replaced by $N + \alpha$ in the latter. In most cases, $N \gg 1$ and $\alpha < 1$, so $N + \alpha \approx N$ and the corresponding transfer functions in integer-N and ΔΣ fractional-N PLLs are nearly identical in practice. Similarly, the loop dynamics and stability issues are nearly the same in ΔΣ fractional-N PLLs and integer-N PLLs.


margin, and ΔΣ modulator quantization noise suppression. The process is demonstrated below for the ΔΣ fractional-N PLL presented in Section III to generate the local oscillator frequencies in a direct conversion Bluetooth wireless LAN transceiver. The PLL is shown in Figure 12 with additional detail regarding the frequency plan. As described previously, the desired output frequencies are $f_{VCO}$ = 2.402 GHz + k MHz for k = 0, ..., 78, and the crystal reference frequency is 19.68 MHz. Each of the 79 possible output frequencies is chosen by selecting m and N as indicated in the figure. In each case, the divider modulus is restricted to the set of four integers {N - 1, N, N + 1, N + 2}. The combinations of m and N were chosen to achieve the desired output frequencies yet keep the signals at the input of the ΔΣ modulator sufficiently small so as not to overload the ΔΣ modulator [11].

Typical requirements for such a PLL are that the loop bandwidth must be greater than 40 kHz, the phase margin must be greater than 60°, and the PLL phase noise must be less than -120 dBc/Hz at offsets from the carrier of 3 MHz and above. Assume that the VCO, divider, PFD, and charge pump circuits have been designed such that the overall PLL phase noise specification can be met provided the phase noise contributed by the ΔΣ modulator and loop filter are each less than -130 dBc/Hz at offsets from the carrier of 3 MHz and above. Furthermore, assume that the VCO and charge pump circuits are such that $K_{VCO}$ and $I$ are 200 MHz/V and 200 µA, respectively, and that the loop filter has the form shown in Figure 10. Thus, the remaining design task is to choose the loop filter components such that the bandwidth, phase margin, and phase noise specifications are met.

The PLL phase margin, bandwidth, and phase noise arising from ΔΣ modulator quantization noise can be derived from the linearized model equations, (7) through (10). While this can be done directly, it involves the solution of third-order equations, which can be messy. Alternatively, approximate solutions of the equations can be derived that provide better intuition [21]. A particularly convenient set of approximate solutions is

[Block diagram: the 19.68 MHz reference drives a PLL whose divider modulus is N + y[n]. The sequence y[n], with values in {-1, 0, 1, 2}, is the output of a second-order digital ΔΣ modulator whose input is m/492 plus a pseudo-random bit sequence with values in {0, 2^-17}. The output frequency is 2.402 GHz + k MHz.]

Frequency Plan:
• To get k = 0, 1, ..., or 18: set N = 122, m = k·25 + 26
• To get k = 19, 20, ..., or 38: set N = 123, m = (k - 19)·25 + 9
• To get k = 39, 40, ..., or 57: set N = 124, m = (k - 39)·25 + 17
• To get k = 58, 59, ..., or 78: set N = 125, m = (k - 58)·25

Figure 12: The example ΔΣ fractional-N PLL and frequency plan for generation of the Bluetooth wireless LAN RF channel frequencies.
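The frequency plan of Figure 12 can be verified with a few lines of integer arithmetic. The following is a minimal sketch, assuming the output frequency is f_ref·(N + m/492), which is inferred from the figure rather than stated as an equation in the text; the function names are hypothetical.

```python
F_REF_KHZ = 19_680    # crystal reference frequency in kHz
ALPHA_DENOM = 492     # fractional step: 19.68 MHz / 492 = 40 kHz


def divider_settings(k):
    """Return (N, m) for Bluetooth channel index k per the Figure 12 plan."""
    if not 0 <= k <= 78:
        raise ValueError("channel index k must be in 0..78")
    if k <= 18:
        return 122, 25 * k + 26
    if k <= 38:
        return 123, 25 * (k - 19) + 9
    if k <= 57:
        return 124, 25 * (k - 39) + 17
    return 125, 25 * (k - 58)


def vco_frequency_khz(n, m):
    """Assumed relation f_VCO = f_ref * (N + m/492), evaluated exactly in kHz."""
    return F_REF_KHZ * n + (F_REF_KHZ // ALPHA_DENOM) * m


for k in (0, 19, 39, 58, 78):
    n, m = divider_settings(k)
    print(f"k={k:2d}: N={n}, m={m}, f_VCO={vco_frequency_khz(n, m) / 1000:.3f} MHz")
```

Each increment of m moves the output by 40 kHz, so a step of 25 in m corresponds to the 1 MHz channel spacing.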

The primary difference between the ΔΣ fractional-N and integer-N PLL models is the signal path corresponding to the ΔΣ modulator shown in the shaded region of Figure 11. The sequence, y[n] - α, consists of ΔΣ modulator quantization noise, $e_m[n]$, which, as described previously, gives rise to phase error in the PLL output. For the example second-order ΔΣ modulator it follows from the results presented in Section IV and the ΔΣ fractional-N PLL model equations presented above that the PLL phase noise component resulting from $e_m[n]$ has a PSD given by

$$S_{\theta}(f)\big| = 10\log\left(\cdots\right)\ \text{dBc/Hz}. \qquad (10)$$

The argument of the log function has the form of a highpass function times a lowpass function, which is consistent with the claim in Section III that the PLL lowpass filters the primarily high frequency quantization noise from the ΔΣ modulator. It follows from (10) that the phase noise resulting from $e_m[n]$ can be decreased by reducing the PLL bandwidth or increasing the reference frequency.

If a higher-order ΔΣ modulator is used, an equation similar to (10) results except that the exponent of the sinusoid is greater than two. This reduces the in-band portion of the quantization noise, but increases the out-of-band portion, which, depending upon the loop parameters of the PLL, can result in a somewhat lower overall phase noise. However, the PLL loop filter is highly constrained to maintain PLL stability, so the phase noise reduction that can be achieved by increasing the order of the ΔΣ modulator is limited in most applications [16].
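The statement that y[n] - α consists only of highpass-shaped quantization noise can be illustrated numerically. The following is a minimal sketch, assuming a generic error-feedback second-order modulator with noise transfer function (1 - z^-1)^2 and no dither; it is not necessarily the architecture used in this paper, and the names `second_order_dsm`, `m`, and `denom` are hypothetical. All quantities are scaled by `denom` so the arithmetic is exact.

```python
def second_order_dsm(m, denom, n_samples):
    """Error-feedback second-order modulator driven by the constant alpha = m/denom.

    Returns the integer output sequence y[n]. Internally, y[n] - alpha equals
    (e[n] - 2 e[n-1] + e[n-2]) / denom, a second difference of the
    quantization error e[n], which has a double zero at dc.
    """
    e1 = e2 = 0  # previous two quantization errors, scaled by denom
    out = []
    for _ in range(n_samples):
        v = m - 2 * e1 + e2                   # quantizer input (scaled)
        y = (2 * v + denom) // (2 * denom)    # round v/denom to the nearest integer
        e2, e1 = e1, y * denom - v            # e[n] = y[n] - v[n] (scaled)
        out.append(y)
    return out


# Example: alpha = 26/492, the fractional value used for channel k = 0 above.
y = second_order_dsm(26, 492, 10_000)
print("mean of y[n]:", sum(y) / len(y), " alpha:", 26 / 492)
```

Because the shaped noise sums to a bounded quantity, the running average of y[n] converges to α, which is why the average divider modulus is N + α even though each individual modulus is an integer.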

C. A System Design Example

The PLL bandwidth and the phase margin both depend upon the loop gain, $T(s)$, which, for the loop filter shown in Figure 10, depends upon the parameters $f_{ref}$, $N$, $I$, $K_{VCO}$, $R$, $C_1$, and $C_2$. Usually, $f_{ref}$ and $N$ are dictated by the application, and $I$ and $K_{VCO}$ are, at least partially, dictated by circuit design choices. This leaves the loop filter components as the main variables with which to set the desired PLL bandwidth, phase

$$PM \approx \tan^{-1}\!\left(\frac{b-1}{2\sqrt{b}}\right), \qquad (11)$$

$$f_{BW} \approx \frac{I\,K_{VCO}\,R}{2\pi N}\cdot\frac{b-1}{b}, \qquad (12)$$

$$C_2 \approx \frac{\sqrt{b}}{2\pi R f_{BW}}, \qquad (13)$$

and

$$S_{\theta}(f)\big| \approx 10\log\left[\cdots\,\sin^2(\cdots)\right]\ \text{dBc/Hz}, \qquad (14)$$

where PM is the phase margin of the PLL, $f_{BW}$ is the 3 dB bandwidth of the PLL, and $b = 1 + C_2/C_1$ is a measure of the separation between the two loop filter capacitors [22]. The derivations assume that b is greater than about 10, and (14) is valid for frequencies greater than $(C_2 + C_1)/(2\pi R C_2 C_1)$. These equations are sufficient to determine appropriate loop filter component values. For example, suppose b is set to

49, so, as indicated by (11), the phase margin is approximately 70°. Solving (14) with the phase noise set to -130 dBc/Hz at f = 3 MHz indicates that $f_{BW} \approx 50$ kHz. Therefore, the phase noise resulting from ΔΣ modulator quantization noise is sufficiently suppressed with a 50 kHz bandwidth and a phase margin of 70°. With this information (12) can be solved to find R = 960 Ω, with which (13) and the definition of b can be used to calculate $C_2$ = 23 nF and $C_1$ = 480 pF. It is straightforward to verify that the phase noise introduced by the loop filter resistor (the only noise source in the loop filter) is well below -130 dBc/Hz at offsets from the carrier of 3 MHz and above as required.

Figure 13 shows PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example PLL with the loop filter component values derived above. The heavy curve was calculated directly from the linearized model equations (7) through (10). The light curve was obtained through a behavioral computer simulation of the PLL. As is evident from the figure, the two curves agree very well, which suggests that the approximations made in obtaining the linearized model are reasonable.

Figure 13: Simulated and calculated PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example ΔΣ fractional-N PLL. (Vertical axis: -220 to -60; horizontal axis: 10^5 to 10^8 Hz; curves: "Exact" simulation and linearized model.)

An effect that does not have a counterpart in integer-N PLLs is the presence of zeros in the PSD of the phase noise arising from ΔΣ modulator quantization noise at multiples of the reference frequency. These zeros are a result of the discrete-to-continuous-time conversion of the ΔΣ modulator quantization noise; each zero is a sampling image of the dc zero imposed on the quantization noise by the ΔΣ modulator.

VI. ΔΣ FRACTIONAL-N PLL SPECIFIC PROBLEMS

One of the most significant problems specific to ΔΣ fractional-N PLLs is that they can be sensitive to modulus-dependent divider delays. In practice, each positive-going divider edge is separated from the VCO edge that triggered it by a propagation delay. Ideally, this propagation delay is independent of the corresponding divider modulus, in which case it introduces a constant phase offset but does not otherwise contribute to the phase noise. However, if the propagation delay depends upon the divider modulus and the number of ΔΣ modulator output levels is greater than two, the effect is that of a hard non-linearity applied to the ΔΣ modulator quantization noise. This tends to fold out-of-band ΔΣ modulator quantization noise to low frequencies and introduce spurious tones, which can significantly increase the PLL phase noise. The problem is analogous to that of multi-bit digital-to-analog converter step-size mismatches in analog ΔΣ data converters [23]. Unfortunately, circuit simulations are required to evaluate the severity of the problem on a case by case basis as both the extent of any modulus-dependent delays and their effect on the PLL phase noise are difficult to predict using hand analysis.

There are two well-known solutions to this problem. One solution is to resynchronize the divider output to the nearest VCO edge or at least a higher-frequency edge obtained from within the divider circuitry [22], [24]. The resynchronization erases memory of modulus-dependent delays and noise introduced within the divider circuitry, but care must be taken to ensure that the signal used for resynchronization is itself free of modulus-dependent delays. The primary drawback of the approach is that it increases power consumption.

The other solution is to use a ΔΣ modulator with single-bit (i.e., two-level) quantization. In this case, modulus-dependent delays give rise to phase error at the output of the divider that consists of a constant offset plus a scaled version of the ΔΣ modulator quantization noise. Since, by design, the ΔΣ modulator quantization noise has most of its power outside the PLL bandwidth, the modulus-dependent delays increase the phase noise only slightly. Unfortunately, ΔΣ modulators with single-bit quantization tend not to perform as well as ΔΣ modulators with multi-bit (i.e., more than two-level) quantization. For example, if the 9-level quantizer in the 48 Msample/s ΔΣ modulator example presented in Section IV were replaced by a one-bit quantizer, the dynamic range of the ΔΣ modulator in the zero to 500 kHz band would be reduced from 88.5 dB to approximately 65 dB. Moreover, unlike the 9-level quantizer case, the additive noise from the single-bit quantizer would not be white and would be correlated with the input sequence. Its variance would be input dependent and it would contain spurious tones.

These problems can be mitigated by using a higher-order ΔΣ modulator architecture to more aggressively suppress the in-band portion of the additive noise from the two-level quantizer. However, to maintain stability in a higher-order ΔΣ modulator with single-bit quantization, the useful input range of the ΔΣ modulator input signal must be reduced and more poles and zeros must be introduced within the feedback loop as compared to a multi-bit design with a comparable dynamic range. Even then, the problem of spurious tones persists, and it is difficult to predict where they will appear except through extensive simulation. Furthermore, to compensate for the restricted input range of the ΔΣ modulator the reference frequency must be large enough that all of the desired PLL output frequencies can be achieved. This can severely limit design flexibility. For example, if the magnitude of the ΔΣ modulator input signal were limited to less than 0.5 in the case

of the Bluetooth local oscillator application considered above, the reference frequency would have to be greater than 79 MHz. Otherwise, it would not be possible to generate all the Bluetooth channel frequencies.

Another issue specific to ΔΣ fractional-N PLLs is that modulus switching increases the average duration over which the charge pump current sources are turned on each period relative to integer-N PLLs. For comparison, consider a ΔΣ fractional-N PLL and an integer-N PLL with the same N (where $N \gg \alpha$), the same $f_{ref}$, and identical loop components. It follows from (5) that

$$\cdots \qquad (15)$$

The last term in (15), which is caused by having the ΔΣ modulator switch the divider modulus, represents a significant increase in the time during which the charge pump current sources are turned on each reference period. Consequently, the phase noise arising just from charge pump current source noise is larger in the ΔΣ fractional-N PLL by

$$A\log\left[\frac{\text{Average fractional-}N\text{ PLL charge pump ``on time''}}{\text{Average integer-}N\text{ PLL charge pump ``on time''}}\right]$$

where A is a constant between 10 and 20. The value of A depends upon the autocorrelation of the charge pump current source noise. For example, if the current source noise in successive charge pump pulses is completely uncorrelated, then A is 10. Near the other extreme, A is close to 20.

VII. TECHNIQUES TO WIDEN ΔΣ FRACTIONAL-N PLL LOOP BANDWIDTHS

A transmitter with virtually any modulation format can be implemented using D/A conversion to generate analog baseband or IF signals and upconversion to generate the final RF signal. However, many of the commonly used modulation formats in wireless communication systems such as MSK and FSK involve only frequency or phase modulation of a single carrier [25]. In such cases, the transmitted signal can be generated by modulating a radio frequency (RF) VCO, thereby eliminating the need for conventional upconversion stages and much of the attendant analog filtering. At least two approaches have been successfully implemented in commercial wireless transmitters to date. One is based on open-loop VCO modulation, and the other is based on ΔΣ fractional-N synthesis.

An example of a commercial transmitter that uses the open-loop VCO modulation technique is presented in [26] and [27], in this case for a DECT cordless telephone. Between transmit bursts, the desired center frequency is set relative to a reference frequency by enclosing the VCO within a conventional PLL. During each transmit burst the VCO is switched out of the PLL and the desired frequency modulation is applied directly to its input. The primary limitation of the approach is that it tends to be highly sensitive to noise and interference from other circuits. For example, in [27], the required level of isolation precluded the implementation of a single-chip transmitter. Furthermore, the modulation index of the transmitted signal depends upon the absolute tolerances of the VCO components, which are often difficult to control in low-cost VLSI technologies and can also drift rapidly over time.

In principle, ΔΣ fractional-N PLLs can avoid these problems by modulating the VCO within the PLL. This can be done by driving the input of the digital ΔΣ modulator with the desired frequency modulation of the transmitted signal. The primary limitation is that the bandwidth of the PLL must be narrow enough that the quantization noise from the ΔΣ modulator is sufficiently attenuated, but sufficiently high to allow for the modulation. For instance, the phase noise PSD of the example ΔΣ fractional-N PLL shown in Figure 5 with a 50 kHz loop bandwidth meets the necessary phase noise specifications when used as a local oscillator in a conventional upconversion stage within a Bluetooth wireless LAN transmitter. However, if the Bluetooth transmitter is to be implemented by modulating the VCO through the digital ΔΣ modulator, then the loop bandwidth of the PLL must be approximately 500 kHz. Unfortunately, when the loop bandwidth of the fractional-N PLL shown in Figure 5 is widened to 500 kHz, the resulting phase noise becomes too large to meet the Bluetooth transmit requirements.

Nevertheless, commercial transmitters with VCO modulation through ΔΣ fractional-N synthesizers are beginning to be deployed, especially in low-performance, low-cost wireless systems such as Bluetooth wireless LANs [28]. Facilitating this trend are various solutions that have been devised in recent years to allow for wideband VCO modulation in ΔΣ fractional-N PLLs without incurring the phase noise penalty mentioned above.

One of the solutions is to keep the loop bandwidth relatively low, but pre-emphasize (i.e., highpass filter) the digital phase modulation signal prior to the digital ΔΣ modulator [29]. Unfortunately, this approach requires the highpass response of the digital pre-emphasis filter to be a reasonably close match to the inverse of the closed-loop filtering imposed by the largely analog PLL.

Another of the solutions is to use a high-order loop filter in the PLL with a sharp lowpass response [30]. Increasing the order of the loop filter increases the attenuation of out-of-band quantization noise, which allows for higher-order ΔΣ modulation to reduce in-band quantization noise, thereby allowing the loop bandwidth to be increased without increasing the total phase noise. However, as described in [30], this necessitates the use of a Type 1 PLL, which significantly complicates the design of the phase detector.

Yet another solution is to use a narrow loop bandwidth but modulate the VCO both through the digital ΔΣ modulator and through an auxiliary modulation port at the VCO input [28]. The idea is to apply the low-frequency modulation components at the ΔΣ modulator input and the high-frequency modulation components directly to the VCO. Again, matching is an issue, but it has proven to be manageable at least for low-end applications such as Bluetooth transceivers.

VIII. CONCLUSION

The additional concepts and issues associated with ΔΣ fractional-N PLLs for frequency synthesis relative to integer-N PLLs have been presented. It has been shown that ΔΣ fractional-N PLLs provide tuning resolution limited only by digital logic complexity, and, in contrast to integer-N PLLs, increased tuning resolution does not come at the expense of reduced bandwidth. Since one of the main innovations in a ΔΣ fractional-N PLL is the use of a ΔΣ modulator to control the divider modulus, the relevant concepts underlying ΔΣ modulation have been described in detail. A linearized model has been derived from first principles and a design example has been presented to illustrate how the model is used in practice. Techniques for wideband digital modulation of the VCO within a delta-sigma fractional-N PLL have also been presented.

ACKNOWLEDGEMENTS

The author is grateful to Sudhakar Pamarti, Eric Siragusa, and Ashok Swaminathan for their helpful discussions and advice regarding this paper.

REFERENCES

1. F. M. Gardner, "Charge-pump phase-lock loops," IEEE Transactions on Communications, vol. COM-28, pp. 1849-1858, November 1980.
2. B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw Hill, 2001.
3. Bluetooth Wireless LAN Specification, Version 1.0, 2000.
4. U. L. Rohde, Microwave and Wireless Synthesizers Theory and Design, John Wiley & Sons, 1997.
5. B. Miller, B. Conley, "A multiple modulator fractional divider," Annual IEEE Symposium on Frequency Control, vol. 44, pp. 559-568, March 1990.
6. B. Miller, B. Conley, "A multiple modulator fractional divider," IEEE Transactions on Instrumentation and Measurement, vol. 40, no. 3, pp. 578-583, June 1991.
7. T. A. Riley, M. A. Copeland, T. A. Kwasniewski, "Delta-sigma modulation in fractional-N frequency synthesis," IEEE Journal of Solid-State Circuits, vol. 28, no. 5, pp. 553-559, May 1993.
8. S. K. Tewksbury, R. W. Hallock, "Oversampled, linear predictive and noise-shaping coders of order N > 1," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 436-447, July 1978.
9. G. Lainey, R. Saintlaurens, P. Serin, "Switched-capacitor second-order noise-shaping coder," IEE Electronics Letters, vol. 19, pp. 149-150, February 1983.
10. I. Galton, "Granular quantization noise in a class of delta-sigma modulators," IEEE Transactions on Information Theory, vol. 40, no. 3, pp. 848-859, May 1994.
11. N. He, F. Kuhlmann, A. Buzo, "Multiloop sigma-delta quantization," IEEE Transactions on Information Theory, vol. 38, no. 3, pp. 1015-1028, May 1992.
12. I. Galton, "One-bit dithering in delta-sigma modulator-based D/A conversion," Proc. of the IEEE International Symposium on Circuits and Systems, 1993.
13. S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park Press, 1982.
14. E. J. McCluskey, Logic Design Principles. Englewood Cliffs, NJ: Prentice-Hall, 1986.
15. S. K. Tewksbury, R. W. Hallock, "Oversampled, linear predictive and noise-shaping coders of order N > 1," IEEE Transactions on Circuits and Systems, vol. CAS-25, pp. 436-447, July 1978.
16. W. Rhee, B. S. Song, A. Ali, "A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator," IEEE Journal of Solid-State Circuits, vol. 35, no. 10, pp. 1453-1460, October 2000.
17. W. L. Lee, C. G. Sodini, "A topology for higher order interpolative coders," Proceedings of the 1987 IEEE International Symposium on Circuits and Systems, vol. 2, pp. 459-462, May 1987.
18. K. C.-H. Chao, S. Nadeem, W. L. Lee, C. G. Sodini, "A higher order topology for interpolative modulators for oversampling A/D converters," IEEE Transactions on Circuits and Systems, vol. 37, no. 3, pp. 309-318, March 1990.
19. Y. Matsuya, K. Uchimura, A. Iwata, T. Kobayashi, M. Ishikawa, T. Yoshitome, "A 16-bit oversampling A-to-D conversion technology using triple integration noise shaping," IEEE Journal of Solid-State Circuits, vol. SC-22, pp. 921-929, December 1987.
20. K. Uchimura, T. Hayashi, T. Kimura, A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage noise shaping modulators," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-36, pp. 1899-1905, December 1988.
21. J. Craninckx, M. S. J. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," IEEE Journal of Solid-State Circuits, vol. 33, pp. 2054-2065, December 1998.
22. S. Pamarti, "Techniques for Wideband Fractional-N Phase-Locked Loops," PhD Dissertation, University of California, San Diego, 2003.
23. S. R. Norsworthy, R. Schreier, G. C. Temes, Eds., Delta-Sigma Data Converters, Theory, Design, and Simulation, New York: IEEE Press, 1997.
24. L. Lin, L. Tee, P. R. Gray, "A 1.4 GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture," IEEE ISSCC Digest of Technical Papers, pp. 204-205, Feb. 2000.
25. J. G. Proakis, Digital Communications, fourth ed., McGraw Hill, 2000.
26. S. Heinen, S. Beyer, J. Fenk, "A 3.0 V 2 GHz transmitter IC for digital radio communication with integrated VCO's," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 38, pp. 150-151, Feb. 1995.
27. S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber, S. Beyer, J. Fenk, E. Matshke, "A 2.7 V 2.5 GHz bipolar chipset for digital wireless communication," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 40, pp. 306-307, Feb. 1997.
28. N. Filiol, et al., "A 22 mW Bluetooth RF transceiver with direct RF modulation and on-chip IF filtering," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 202-203, Feb. 2001.
29. M. H. Perrott, T. L. Tewksbury III, C. G. Sodini, "A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation," IEEE Journal of Solid-State Circuits, vol. 32, no. 12, pp. 2048-2059, Dec. 1997.
30. S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, B. McFarland, "An integrated 2.5GHz ΣΔ frequency synthesizer with 5 µs settling and 2Mb/s closed loop modulation," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 200-201, Feb. 2000.

Ian Galton received the Sc.B. degree from Brown University in 1984, and the M.S. and Ph.D. degrees from the California Institute of Technology in 1989 and 1992, respectively, all in electrical engineering. Since 1996 he has been a professor of electrical engineering at the University of California, San Diego where he teaches and conducts research in the field of mixed-signal integrated circuits and systems for communications. Prior to 1996 he was with UC Irvine, the NASA Jet Propulsion Laboratory, Acuson, and Mead Data Central. His research involves the invention, analysis, and integrated circuit implementation of key communication system blocks such as data converters, frequency synthesizers, and clock recovery systems. The emphasis of his research is on the development of digital signal processing techniques to mitigate the effects of non-ideal analog circuit behavior with the objective of generating enabling technology for highly integrated, low-cost communication systems. In addition to his academic research, he regularly consults at several communications and semiconductor companies and teaches portions of various industry-oriented short courses on the design of data converters, PLLs, and wireless transceivers. He has served on a corporate Board of Directors and several corporate Technical Advisory Boards, and he is the Editor-in-Chief of the IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing.

Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems

Richard C. Walker

Abstract - Clock recovery using phase-locked loops (PLL) with binary (bang-bang) or ternary-quantized phase detectors has become increasingly common starting with the advent of fully monolithic clock and data recovery (CDR) circuits in the late 1980's. Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sampling structures, and operation at the highest speed at which a process can make a working flip-flop. This paper gives insight into the behavior of the nonlinear bang-bang PLL loop dynamics, giving approximate equations for loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters. A novel analysis shows that the bang-bang loop output jitter grows as the square-root of the input jitter as contrasted with the linear dependence of the linear PLL.

I. INTRODUCTION

Prior to the advent of fully monolithic designs, clock recovery was traditionally performed with some variant of the circuit in Fig. 1. The clock frequency component was typically extracted from the data stream using some combination of differentiation, rectification and filtering. The bandpass frequency filtering was provided by LC tank, surface acoustic wave (SAW) filter, dielectric resonator or PLL. Because the clock recovery path was separate from the data retiming path, it was difficult to maintain optimum sampling phase alignment over process, temperature, data-rate, and voltage variations. Even the PLL techniques had the drawback of using phase detectors with different set-up times than the retiming flip-flop so that the recovered clock was not intrinsically aligned to the optimum sampling point in the data eye. Circuits utilizing SAW resonator filtering typically required hand matching of SAW and circuit temperature coefficients along with custom cut coaxial delay lines for setting the timing of the recovered sampling clock with respect to the data eye [1].

Fig. 1. Traditional non-monolithic clock and data recovery architecture. (Block labels: input NRZ data, variable delay block, retiming latch, retimed data, d/dt and X² pulse conditioning, BPF or PLL frequency extraction, recovered clock.)

Early monolithic CDR designs imitated these discrete block diagrams. The propagation delay differences between data and clock paths could be ignored as long as the gate delay skew was a negligible fraction of the total bit time, or unit interval. The need for higher link speeds grew faster than Moore's law, and as clock frequencies approached the effective $f_T$ of the active devices, it became increasingly difficult to maintain an optimum sampling phase alignment between the recovered clock and the data over process, temperature, data-rate, and voltage variations. A second problem was that most linear phase detectors produced narrow pulses with widths proportional to the phase error between the timing of the data and the clock [2], [3]. These narrow pulses required a process speed in excess of that required to simply sample data at a given rate. The timing skew and speed of linear phase detector circuits then became the limiting factor for aggressive designs.

Fig. 2. A simple bang-bang loop using a flip-flop for a phase detector to lock onto a data stream with a guaranteed "0" to "1" transition every 20 bits. (Block labels: input data, flip-flops X and Y, samples of all transitions, samples of master transitions, divide by 20, loop filter, VCO, retimed data.)

Both these difficulties are eliminated by a family of circuits which simultaneously retime data and measure phase error by using matched flip-flops to sample both the middle of each data bit and the transitions between the data bits. Fig. 2 shows an early gigabit-rate monolithic example of such a circuit [4], which samples data with two matched flip-flops. Flip-flop "Y" samples the middle of each data bit on the rising edge of the VCO clock to produce retimed data, while flip-flop "X" samples the transition of each bit using the falling edge of the VCO clock. The loop is designed to use the 16B/20B line code of Fig. 3, which guarantees a "01" "master transition" every 20 bits. The divide by 20 circuit and associated flip-flop in Fig. 2 discard every

R. Walker is with Agilent Laboratories, 3500 Deer Creek Road, MS 26-U4, Palo Alto CA 94304.

Fig. 3. Format of 16B/20B line code used with bang-bang CDR of Fig. 2. (Frame labels: 16 data bits, training sequence, control word, master transition.)

Fig. 4. CDR PLL designs over time. The ratio of link speed to effective process transit frequency is plotted vs year of publication. Multi-phase BB PLLs predominate as data rate approaches the process transit frequency limit. (The number of retiming phases used in each design is given in parentheses.) (Horizontal axis: year of publication, 1988 to 2002; legend: Linear PLL, BBPLL.)

transition sample except for this master transition sample. During link start-up a training sequence is sent that has only one rising transition at the location of the master transition. Once the loop is locked, arbitrary data is allowed to be sent at the other 18 bits of the frame, while the transition sampler pays attention only to the data stream in the vicinity of the master transition. If the VCO frequency is too high, the transition flip-flop starts sampling prior to the master transition and outputs a "0" to the loop filter. A slightly lower VCO frequency, on the other hand, will cause the loop to be driven by 1's.

The loop drives the falling edge of the VCO into alignment with the data transitions based on the binary-quantized phase error. Because the clock-to-Q delay of the retiming flip-flop is monolithically matched with the phase detector flip-flop, the PLL aligns the recovered clock precisely in the middle of the data eye with no first-order timing skew over process and temperature variations. Because the narrowest pulse is the output of a flip-flop, such detectors operate at the full speed at which a process is capable of building a functioning flip-flop. This ensures that the phase detector will not be the limiting factor in building the fastest possible retiming circuit.

An additional advantage of flip-flop-based phase detectors is that since they only require simple processing of digital values, they easily generalize to multi-phase sampling structures allowing CDR operation at frequencies in which it would be impossible to build a working full-speed flip-flop. In contrast, most linear phase detectors require at least some analog processing at the full bit rate, limiting process speed and poorly generalizing to multi-phase sampling architectures.

Because of these compelling advantages, the bang-bang loop has become a common design choice for state-of-the-art CDR designs which are pushing the capability of available IC processes. Fig. 4 surveys CDR designs presented from 1988 to 2001 at the International Solid State Circuits Conference. Designs are plotted by year of presentation against each design's ratio of link speed to effective $f_T$. The majority of current designs utilize a combination of multi-phase sampling structures and bang-bang PLLs. In addition, all CDRs operating at data rates greater than 0.4 $f_T$ are bang-bang designs.

II. FIRST-ORDER LOOP DYNAMICS

Unfortunately, transition-sampling flip-flop-based phase detectors can provide only binary (early/late) or ternary (early/late + hold) phase information. This amounts to a hard non-linearity in the loop structure, leading to an oscillatory steady-state and rendering the circuit unanalyzable with standard linear PLL theory. Precise loop behavior can be simulated efficiently with time-step simulators, but this is cumbersome to use for routine design. Fortunately, simple approximate closed-form expressions can be derived for performance parameters of interest, such as loop jitter generation, recovered clock spectrum, and jitter tracking performance as a function of various design parameters.

Fig. 5. A simple bang-bang loop using a flip-flop for a phase detector to lock onto square-wave input.

A simple BB PLL is shown in Fig. 5. A flip-flop is used as a phase detector to lock onto a square wave input signal. Depending on whether the VCO phase samples slightly before or after the rising edge of the input square wave, the flip-flop output is either low or high, adjusting the VCO period in such a way as to move the sampling phase error back towards zero. The dynamics of such a binary-quantized loop are equivalent to a data-driven phase detector operating on alternating 0,1 data with 100% transition density, or a master-transition based loop similar to that shown in Fig. 2. For simplicity, we assume that a valid binary phase determination can be made at every timestep. The consequence of random data and the introduction of a ternary hold mode are considered in a later section.

The first-order BB PLL of Fig. 5 can be rendered into a block diagram for analysis as shown in Fig. 6. The loop phase error $\theta_e(t_n)$ is defined as the difference between the data phase $\theta_d(t_n)$ and the VCO phase $\theta_v(t_n)$ at the nth sampling time $t_n$. For convenience, phase is measured with respect to an ideal clock source running at $f_{nom}$.

Fig. 6. Block diagram of first-order loop showing definition of signal names.

The frequency of the incoming data signal differs from the VCO center frequency by $\delta f$, and has a zero-mean phase jitter of $\phi(t)$, generated by a pattern generator clocked on the rising edges of the jittered clock signal $\sin[2\pi(f_{nom} + \delta f)t + \phi(t)]$. The data phase $\theta_d(t_n)$ is then $2\pi\,\delta f\,t_n + \phi(t_n)$.

The phase detector binary-quantizes the loop phase error at each sampling time to give $\varepsilon_n = \mathrm{sign}[\theta_e(t_n)]$. (Note: In the case of a ternary data-driven phase detector, $\varepsilon_n$ may be set to 0 when it is not possible to make a determination of phase error due to consecutive identical bits in the data stream. The consequence of this "hold" state is treated in a later section.) The error signal drives the VCO through an attenuator $\beta$, to produce a change in frequency of $f_{bb} = \beta V_\phi K_{vco}$. From time $t_n$ until time $t_{n+1}$, the VCO operates at one of the two frequencies given by $f_{nom} \pm f_{bb}$.

Because the VCO frequency changes on each cycle, the system has non-uniform sampling times. The time of phase sample $t_{n+1}$ is

$$t_{n+1} = t_n + \frac{1}{f_{nom} + \varepsilon_n f_{bb}}.$$

In a typical CDR, $f_{bb}$ is on the order of 0.1% of $f_{nom}$, so that an analysis assuming uniform time steps of $t_{update} = 1/f_{nom}$ is sufficiently accurate for most purposes. However, for loop analyses requiring exact charge pump balance, such as wide-range loop pull-in without a frequency detector, these non-uniform sampling times must be accounted for.

With the uniform time step approximation, the VCO phase changes up or down (or "walks off") by $\theta_{bb} = 2\pi(f_{bb}/f_{nom})$ radians during each update period.

In summary, the first order loop obeys a simple set of discrete time difference equations:

$$\theta_d(t_n) = 2\pi\,\delta f\,t_n + \phi(t_n)$$

$$\theta_v(t_{n+1}) = \theta_v(t_n) + \varepsilon_n\,\theta_{bb}$$

$$\varepsilon_n = \mathrm{sign}[\theta_d(t_n) - \theta_v(t_n)]$$

As long as the VCO frequency step brackets the input signal frequency error, the loop will remain phase locked. Assuming $\phi(t)$ small, the lock range is $-f_{bb} < \delta f < f_{bb}$. The loop generates an excess hunting jitter with a peak-to-peak value of two bang-bang phase steps, $J_{pp} = 4\pi(f_{bb}/f_{nom})$.

For the loop to be locked, the average VCO frequency must equal the average data frequency. The phase detector duty cycle $C$ must satisfy the relation

$$\delta f = C(f_{bb}) + (1 - C)(-f_{bb}).$$

The value of $C$ is then given by

$$C = \frac{1}{2}\left(1 + \frac{\delta f}{f_{bb}}\right).$$

The phase detector duty cycle, and therefore its average output voltage, are proportional to the loop frequency error. Fig. 7 shows a simulated loop with a range of input frequencies. The loop is "locked" whenever the input frequency is bracketed by the two VCO frequencies. The rapid alternation between frequencies slightly too high and slightly too low creates a bounded hunting jitter ($J_{pp}$).

Fig. 7. Simulated response of first-order PLL to a range of input frequencies.
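The first-order difference equations can be iterated directly. The following is a minimal sketch under the uniform time-step approximation, with no input jitter ($\phi(t) = 0$); the function name `simulate_first_order`, the tie-breaking choice sign(0) = +1, and the numerical values are assumptions made for illustration only.

```python
import math

def simulate_first_order(f_nom, f_bb, delta_f, n_steps=4000):
    """Iterate the first-order bang-bang equations with uniform time steps."""
    theta_bb = 2 * math.pi * f_bb / f_nom        # VCO phase step per update (rad)
    data_step = 2 * math.pi * delta_f / f_nom    # data phase advance per update (rad)
    theta_d = theta_v = 0.0
    errors, decisions = [], []
    for _ in range(n_steps):
        err = theta_d - theta_v                  # loop phase error theta_e
        eps = 1 if err >= 0 else -1              # sign(0) taken as +1 (assumption)
        errors.append(err)
        decisions.append(eps)
        theta_v += eps * theta_bb                # VCO phase walks up or down
        theta_d += data_step
    duty = decisions.count(1) / n_steps          # phase detector duty cycle C
    return theta_bb, errors, duty

# Hypothetical example: f_bb is 0.1% of f_nom, input offset is half of f_bb.
theta_bb, errors, duty = simulate_first_order(2.488e9, 2.488e6, 1.244e6)
print("duty cycle C:", duty)
print("peak-to-peak error in units of theta_bb:",
      (max(errors) - min(errors)) / theta_bb)
```

With the offset inside the lock range, the measured duty cycle can be compared against $C = (1 + \delta f/f_{bb})/2$; an offset larger than $f_{bb}$ drives the decisions to a constant value, which is the out-of-lock condition.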

The derivative of the input data phase deviation, $d[\phi(t)]/dt$, adds to the frequency error that must be tolerated by the loop. Assuming $\delta f = 0$, then for $\phi(t) = A\sin(2\pi f_{mod}t)$, the maximum amplitude $A$ of phase modulation at frequency $f_{mod}$ before onset of slew-rate limiting is $f_{bb}/f_{mod}$. Fig. 8 demonstrates the loop at the onset of jitter-induced slew-rate limiting. Although the average input frequency lies within the lock range of the loop, the added sinusoidal jitter causes the instantaneous input frequency deviation to exceed $\pm f_{bb}$. The loop stops toggling and goes into slew rate limiting, leading to a transient phase error.

Fig. 8. Simulated response of first-order PLL to sinusoidal input jitter just slightly beyond the tracking capability of the loop.

A. Summary of First-Order Loop

The first-order bang-bang loop has only one degree of freedom. Jitter generation, lock range, and jitter tolerance are all inconveniently controlled by one parameter, $f_{bb}$. This situation can be improved by using a second control loop to dynamically adjust the nominal VCO frequency $f_{nom}$ to be equal to the incoming data frequency. Because the phase detector duty cycle is proportional to the loop frequency error, this dynamic centering of VCO frequency can be accomplished by adjusting the VCO center frequency in a feedback loop to drive the phase detector duty cycle $C$ to 50%. This decouples the lock range from jitter tolerance and jitter generation, giving more design freedom.

III. SECOND-ORDER LOOP DYNAMICS

To extend the loop tracking range independent of the jitter generation, an extra integrator is added between the phase detector and the VCO as in Fig. 9. Since the first-order loop dynamic produces a phase detector duty cycle proportional to the loop frequency error, this added integrator can be viewed as an automatic means for keeping the first-order portion of the loop properly centered on the average incoming data frequency. If certain assumptions are met, as described later, we can consider the system to be composed of two non-interacting loops. These are the loops labeled "bang-bang branch" and "integral branch". If the center frequency control loop is slow enough, the resulting loop behavior will be very similar to a simple first-order loop, but with an extended frequency lock range.

Fig. 9. Second-order bang-bang loop schematic.

A. Stability Factor

To preserve the desirable qualities of the first order loop, it is critical that the phase change due to the proportional branch dominate over the phase change from the integral branch.

The loop phase change in one update time due to the proportional connection is $\Delta\theta_{bb} = \beta V_\phi K_v t_{update}$. The phase change due to the integral branch is $\Delta\theta_{int} = V_\phi K_v t_{update}^2/(2\tau)$. The ratio of these two is the stability factor of the loop

$$\xi = \frac{\Delta\theta_{bb}}{\Delta\theta_{int}} = \frac{2\beta\tau}{t_{update}}.$$

The reader should be careful not to confuse the bang-bang loop stability factor $\xi$ with the linear loop damping factor $\zeta$ [5].

The discrete time difference equations for the second-order loop can be written as

$$\theta_d(t_n) = \theta_d(0) + 2\pi\,\delta f\,t_n + \phi(t_n)$$

$$\theta_v(t_{n+1}) = \theta_v(t_n) + \theta_{bb}\left[\varepsilon_n\left(1 + \frac{1}{\xi}\right) + \frac{2}{\xi}\sum_{i=0}^{n-1}\varepsilon_i\right]$$

$$\varepsilon_n = \mathrm{sign}[\theta_d(t_n) - \theta_v(t_n)]$$

From this, it can be seen that the second-order loop has two degrees of freedom, the loop phase step $\theta_{bb}$ (or equivalently, the loop frequency step $f_{bb}$) and the stability factor $\xi$. The added loop integrator extends the frequency tracking range, leaving $\theta_{bb}$ free to control jitter tolerance and jitter generation.
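The role of the stability factor can be explored numerically. The sketch below assumes a normalized form of the second-order loop in which phase is counted in units of $\theta_{bb}$, time in update periods, and frequency offset in units of $f_{bb}$; the per-update contributions of the two branches follow the definitions of $\Delta\theta_{bb}$ and $\Delta\theta_{int}$ given in the Stability Factor subsection. The function names and parameter values are hypothetical.

```python
import math

def stability_factor(beta, tau, t_update):
    """xi = 2 * beta * tau / t_update, the ratio of proportional to integral phase change."""
    return 2.0 * beta * tau / t_update

def simulate_second_order(xi, freq_offset, n_steps=2000):
    """Normalized second-order bang-bang loop (assumed form, no input jitter).

    Phase error is in units of theta_bb; freq_offset is delta_f / f_bb.
    """
    err = 0.0   # theta_d - theta_v
    acc = 0     # running sum of past decisions (integrator state)
    history = []
    for _ in range(n_steps):
        eps = 1 if err >= 0 else -1                      # sign(0) taken as +1 (assumption)
        # Proportional step, plus the integral ramp during this update,
        # plus the frequency already stored on the integrator.
        vco_step = eps * (1 + 1 / xi) + (2 / xi) * acc
        err += freq_offset - vco_step
        acc += eps
        history.append(err)
    return history

# An offset of 2.5 * f_bb lies outside the first-order lock range.
locked = simulate_second_order(xi=50.0, freq_offset=2.5)
first_order = simulate_second_order(xi=math.inf, freq_offset=2.5)
print("second-order residual error:", max(abs(e) for e in locked[-200:]))
print("first-order final error:", first_order[-1])
```

Setting `xi` to infinity removes the integral branch and recovers the first-order loop, which makes the extended lock range of the second-order loop easy to see in a side-by-side run.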

B. Simulations of Second-Order Loop

Fig. 10 shows two block diagrams for the second-order loop. The upper diagram is a straightforward translation of the schematic in Fig. 9. The lower diagram is a topological re-arrangement which places an inner first-order phase tracking loop inside an outer frequency tracking loop. If one writes the transfer function from the output of the non-linear quantizer block back to the input of the quantizer, it can be shown that both diagrams are exactly equivalent. Some of the signals in the second diagram do not correspond to actual physical variables in the circuit, but they are helpful in understanding the operation of the loop.

Fig. 10. Two equivalent second-order bang-bang loop block diagrams. The proportional phase-control signal flow is highlighted with a dashed line, and the integral frequency-control loop with a solid line.

Fig. 11 shows the second-order loop responding to a step change in input frequency $f_{in}$, producing a slow response in the outer integral loop $f_{int}$. The resulting phase error $\Delta\theta_1$ is tracked by the inner bang-bang loop $\theta_v$ to produce the final sampler phase error $\theta_e$. Notice that, unlike linear PLLs, if the power-supply noise-induced VCO frequency modulation is limited to $\pm f_{bb}$, then there is no jitter accumulation or phase transient at the sampling flip-flop.

Fig. 11. Second-order loop response to instantaneous frequency step smaller than $f_{bb}$.

Fig. 12 is a simulation in which the input frequency step is bigger than $f_{bb}$, so the loop goes into slew rate limiting, leading to a transient phase error $\theta_e$ at the sampler.

Fig. 12. Second-order loop response to instantaneous frequency step larger than $f_{bb}$.

C. Response to Phase Step

For a normalized transient phase step of $\Delta = \theta_{step}/\theta_{bb}$, a first-order loop relocks in $\Delta$ update times. The total time for relocking is then $\theta_{step}/(2\pi f_{bb})$.

During the relocking transient of the second-order loop, the loop integrator overshoots the correct steady-state VCO tune voltage. This causes a quadratic overshoot in the phase trajectory.

Fig. 13 shows the second-order phase step response with $\xi$ as a parameter. Up to the first zero crossing, the phase trajectory is given by

$$\theta_e(n) = \theta_{bb}\left(\Delta - n - \frac{n^2}{\xi}\right),$$

with $n = t/t_{update}$. The time of the first zero crossing,

$$n_0 = \frac{\xi}{2}\left(\sqrt{1 + \frac{4\Delta}{\xi}} - 1\right),$$

approaches $\Delta$ as $\xi \to \infty$, consistent with a first-order loop. In general, the second-order loop is quicker to reach zero phase error than the first-order loop, but pays for this with an oscillatory overshoot. As a conservative rule of thumb, the magnitude of the oscillatory transient of a second-order step response can be considered bounded by the simple linear transient of the first-order loop. The time required to reach steady state, given a step of $\Delta$, is always less than or equal to $\Delta$ timesteps, independent of $\xi$.
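A quick numerical check of the phase-step behaviour is sketched below. It assumes that, until the first zero crossing, the phase detector output is constant, so the normalized error falls by $n$ from the proportional branch and by $n^2/\xi$ from the integral branch (from the definitions of $\Delta\theta_{bb}$ and $\Delta\theta_{int}$). The helper name `first_zero_crossing` is hypothetical.

```python
import math

def first_zero_crossing(delta, xi):
    """Update count at which delta - n - n**2 / xi first reaches zero.

    delta: phase step in units of theta_bb; xi: stability factor.
    Assumes a constant phase-detector output up to the crossing.
    """
    return 0.5 * xi * (math.sqrt(1.0 + 4.0 * delta / xi) - 1.0)

# Hypothetical example: a step of 200 phase steps for several stability factors.
for xi in (10.0, 100.0, 1000.0, 1e6):
    print(xi, first_zero_crossing(200.0, xi))
```

For large $\xi$ the result tends to the first-order value of $\Delta$ update times, and for small $\xi$ the crossing arrives sooner, at the cost of the overshoot described above.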

Fig. 13. Noise-free loop response to a phase step with stability factor $\xi$ as a parameter.

IV. SLOPE OVERLOAD

Many systems, such as SONET, specify jitter tolerance in the form of a sinusoidal jitter at various frequencies.

Fig. 14 shows the loop response with a sinusoidal input phase jitter $\phi(t)$. The outer integral loop tracks the input jitter at $\Delta\theta_1$ with a slight phase lag. The resulting phase error $\Delta\theta_2$ is tracked by the inner bang-bang loop $\theta_v$ to produce the final sampler phase error $\theta_e$. The duty-cycle of the PD output $V_\phi$ varies with the slope of $\Delta\theta_2$, which is proportional to the instantaneous frequency error of the outer loop. In Fig. 15, the phase modulation is increased until the instantaneous frequency error exceeds the inner loop's ability to track. Slew-rate limiting produces a tracking error at the sampler $\theta_e$. A CDR would normally be designed such that slewing would never occur for any valid signal allowed by a particular standard. The next two sections develop an analytic expression for slope overload so that a loop can be easily designed to never slew for signals meeting a typical frequency-domain jitter tolerance specification.

Fig. 14. Second-order loop response to sinusoidal input jitter.

Fig. 15. Second-order loop response to large sinusoidal input jitter.

A. Delta-Sigma Analogy

Before developing an analytic equation for slope overload, it is helpful to introduce a further rearrangement of block diagram II from Fig. 10. Fig. 16 transforms the loop by pulling two integrators through the last summing node prior to the quantizer. The update time interval is set to 1. The definition for bang-bang frequency step $f_{bb} = \beta K_v V_\phi$, and stability factor $\xi = 2\beta\tau/t_{update}$ are also substituted in.

Fig. 16. Redrawing of the loop to show inner $\Delta\Sigma$ modulator operating on the loop frequency error.

The shaded area in Fig. 16 shows how the proportional feedback loop can be thought of as an inner $\Delta\Sigma$ modulator producing a phase detector duty cycle proportional to the VCO frequency error [6],[7].

Fig. 17 summarizes an analysis of the first order delta-sigma (after [8]). When the loop is not in slew rate limiting, or in a periodic limit-cycle, the quantizer (e.g., PD) can be replaced with a unity gain element and a noise source $Q(z)$ of the form $A\sin(2\pi f t_{update})/(2\pi f t_{update})$, with the same noise characteristics as a random binary bitstream. Both these constraints are met in practice as the VCO phase noise is sufficient to eliminate any deterministic limit cycles, and the loop is designed to never slew rate limit on any conforming input signal. This insight is critical as it allows linear analysis to be applied whenever the bang-bang loop is not in slew rate limiting.

Fig. 17. Simplified analysis of delta-sigma circuit.

B. Expression for Slope Overload

A closed-form analysis of slope overload can now be derived. Referring to Fig. 16, the system slews when $|\Delta F| > f_{bb}$.

Assuming no slew rate limiting, we can use the results from the $\Delta\Sigma$ analysis to justify replacing the loop quantizer with a unity gain element. The maximum input phase jitter in UI as a function of frequency, $\Phi(s)$, normalized to $\theta_{bb}$, can then be calculated using Laplace transforms. We want to find an input excitation $F(s)$ for which $|\Delta F| = f_{bb}$ at all frequencies. The inner $\Delta\Sigma$ of Fig. 16 has a linearized transfer function of $1/(s + f_{bb})$. Using standard feedback loop theory, the expression for $\Delta F$ can then be written as

$$\Delta F = F\left[\frac{1}{1 + \dfrac{2 f_{bb}}{\xi s}\cdot\dfrac{1}{s + f_{bb}}}\right].$$

Setting $\Delta F = f_{bb}$, and normalizing the equation by letting $f_{bb} = t_{update} = 1$, we can solve for $F(s)/s$ to get the maximum normalized input phase as a function of normalized frequency

$$\Phi_{max}(s) = \frac{s^2 + s + 2/\xi}{s^3 + s^2}.$$

This is a curious bootstrapped analysis, in that it assumes a lack of slewing to justify the linearization which permits the computation of the onset of slew rate limiting.

Fig. 18 shows a good agreement between this expression and simulated loop performance in which slewing is defined as a contiguous sequence of ten or more identical phase-error indications. This expression can be used to design a loop for a given jitter tolerance. The tolerance plots are single-pole slope for high $\xi$ and high jitter frequency, becoming double-pole at lower frequencies and small $\xi$. At high frequencies, all of the curves become asymptotic to the single-pole tolerance of a first-order bang-bang PLL. The operating region below each of these curves is where the $\Delta\Sigma$ approximation is valid, and where a linear loop analysis is justified.

Fig. 18. Normalized amplitude of sinusoidal jitter just sufficient to cause slope overload as a function of normalized jitter frequency (jitter frequency × $t_{update}$) and with $\xi$ as a parameter. Points shown are from numerical simulation.

With the $\Delta\Sigma$ substitution, the inner loop becomes a wide-band unity-gain block as seen from the viewpoint of the outer integral frequency control loop. The noise in the delta-sigma core is first-order frequency shaped towards high frequencies. However, when the frequency noise is converted to phase noise, the shaping is lost and the noise becomes flat.

Fig. 19. Loop redrawn replacing phase detector with unity gain element and additive quantization noise. (Labeled inputs: source phase noise, BB phase noise of form $A\sin(x)/x$, VCO open loop phase noise.)
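Returning briefly to the slope-overload analysis of Section IV-B, the tolerance curve is easy to tabulate. The following sketch assumes the normalized tolerance expression has the form $(s^2 + s + 2/\xi)/(s^3 + s^2)$ with $s = j2\pi f t_{update}$; the constant term $2/\xi$ and the function name `max_jitter_amplitude` are assumptions for illustration, not values confirmed by the figures.

```python
import math

def max_jitter_amplitude(f_norm, xi):
    """Magnitude of the assumed slope-overload limit, in units of theta_bb.

    f_norm: jitter frequency multiplied by t_update.
    xi: loop stability factor.
    """
    s = 1j * 2 * math.pi * f_norm
    return abs((s * s + s + 2 / xi) / (s ** 3 + s * s))

# Tabulate a few points; low frequencies show the steeper two-pole region.
for f_norm in (1e-4, 1e-3, 1e-2, 1e-1, 1.0):
    print(f_norm, [round(max_jitter_amplitude(f_norm, xi), 3) for xi in (1.0, 10.0, 100.0)])
```

At high normalized frequency every curve approaches $1/(2\pi f t_{update})$, which is the first-order limit $f_{bb}/f_{mod}$ expressed in units of $\theta_{bb}$.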

V. JITTER GENERATION

VI. GAUSSIAN INPUT NOISE

With these insights, it is possible to accurately predict the loop jitter generation in the frequency domain. Fig. 19 is a redrawing of the loop replacing the phase detector by a unity gain element, and an additive noise source. The forward loop gain is

Fig. 21 is a plot of output jitter vs input jitter with £ as a 1OM j

100k

From this can be calculated two transfer functions: the lowpass seen by both the source phase noise and the PD noise to the output, $A(s) = 1/[1 + H(s)]$, and the high-pass transfer function from VCO phase noise to the output, $B(s) = H(s)/[1 + H(s)]$. As shown in Fig. 20, with a source phase noise $P(s)$, a PD phase noise $Q(s)$, and a VCO phase noise $R(s)$, the total loop jitter generation spectrum becomes the RMS combination of each of the three weighted terms

$$J(s) = \sqrt{(PA)^2 + (QA)^2 + (RB)^2}.$$

N

I

-80 • 1

" 90

1777(7) ...» JL —IV^^

TTJHT)

.

.

.--{ ! .

0.1

approximated J

I 10k

_J 100k

.

s***^..:

I 1M

! 10M

• "—— 100M 1G

Fig. 20. Example computation of loop jitter generation spectrum with parameters from [11].

generally taken to be the spectrum of the clock driving the data source or BERT, or in the case of a clock multiplying circuit, the spectrum of the reference clock corrected by 20 times the log of the loop frequency multiplication ratio.

™max

W -

J

RMS

RMS =

atan

jitter

in

(^M^)

/TC

unit

intervals

is

' 1k

• 10k

» 100k

1M

by + J

three

regions

of

+ J

InRe

the out

ls a

PP r o x i m a t e l Y

ion J

operation:

-In Region III, the output RMS jitter

ec ual t 0

l

0-7 * J<5j . This surprising

result says that loops with large ^ have output jitter which grows as the square root of the input jitter. Contrast this with a linear PLL which simply low-pass filters the input jitter and thus has an output jitter which grows linearly with the input jitter. An approximate analysis of loop jitter can shed light on this curious square-root dependence of output jitter on input jitter. Assume

0 The

J

^walk

RMS = J

' 100

JUn * 2 a . / ( 1 + 7 | )

The phase noise power is given by S

* 10

i

total " idle linear walk • g > P u t J itter is independent of input jitter G .. This occurs when the self-generated hunting jitter exceeds the input jitter. The RMS jitter in this region is empirically determined to be well approximated by ^idle ~ ^ + (1.65/2;) . In Region II, the output jitter is proportional to the input jitter. This occurs when the input jitter is so high that, for a given £ , the bang-bang dynamic is unable to control the second-order portion of the loop. This leads to large quadratic trajectories in the phase domain, causing the loop phase to "hunt" towards the limits of the input jitter distribution. As the loop phase nears the limits of the input jitter distribution, the bangbang hunting has more effect on stabilizing the second-order loop. In this region, the output jitter is proportional to the input jitter:

^^-r-™ measured phase noise ..J-.-.r?*^' } * "J^toaaii,

-140 I 1k

* 1

^^r

0££ , the loop phase step size. The total loop output jitter can be

1

-130 source phase noise TTfoiiaiMV-.. I..>K^...I -140 I ! ^»»»»~in | Tiihi ii MrtllH

"O -120 -130

—.

:

parameter. For convenience, all jitter sigmas are normalized to

§ I120 11'" 1""_" 1 ? ii^jii^? 1 *^:!: 111111 • 11 iTTr^r^ •=*> r>n&se noiso_35^

iS-110 ....A^.:...^w?!...i.i^[.;;

:

Fig. 21. Normalized output jitter vs input jitter sigma with ^ as a parameter. Simulation is for a non-tristated loop, with square wave data input, 10 timesteps per point, and ignoring phase wrapping.

."--. 1

. , . } computed phase noise .1 1...

:

'\^"y5\'^' j ^ ^ ^ " ^ ^ ^ ^ ^ ^ ^ $-1*3

0-1

TilEnmttiLy.J.y — l-^^^^v.^ii^*' vco phase noise

-80 | ^ . M ~E -90 - ^ y g ^ ^ i ^ - - - -{ 75 - 1 0 0 ...^'yiWHUmiWiHL^.,. 1

:

:

1 •£+**^£%&rmtrMiu i

+ (QA)2 + (RB)2 . The source phase noise is

-100 ^fff^Z.A .12o L 1

;

then



It should be noted that the linearized loop model is only suitable for computation of the jitter spectrum but not for computing the actual sampling point phase error or other time-domain transient response. The linearized response only covers the dynamics of the outer frequency tracking loop, but does not capture the extra

a zero-mean input jitter distribution with a sigma $\sigma_j$. Using a linearized approximation to the standard probability distribution function, the probability of getting an "early" phase error indication for small loop phase deviation $\Delta\theta$,

tracking of the internal nonlinear A S core.

41

is approximately

VII. DATA-DRIVEN PHASE DETECTORS

1 A9 The expected phase change in the loop after one update time is

e

«((i-"«)~"')-^e»

Unless the data contains a guaranteed periodic transition, the CDR will be required to lock onto random transitions embedded in the data stream. The effects of runlength and transition density on loop performance must then be considered. The effect of these two data attributes is dependent on the type of phase detector used. Most modern codes use some variation of Alexander's phase detector [9] shown in Fig. 22. Two matched flip-flops form the

The discrete time equation for the average evolution of loop phase under the condition of a small input phase error can then be expressed as

(

Retimed Data

26 ^

B

a time constant of $\tau = \sigma_j\sqrt{2\pi}/(2\theta_{bb})$.

50% duty-cycle clock from VCO

T

tune

given by $\sigma_{BLW} = V_{pp}\sqrt{t_{bit}/(8\tau)}$. Extending this analogy to the loop, we can consider the output of the phase detector as a 50% duty-cycle random NRZ data stream. Given that the output from each "bit" must cause a loop phase change of $\theta_{bb}$, we can to satisfy our loop difference

We can then compute the loop jitter by

•pump

Transition Samples

Fig. 22. Modified form of Alexander's ternary-quantized phase detector for NRZ data along with a typical charge pump for driving the VCO tuning input.

.

This "lowpass" loop characteristic is being driven by random energy from the early/late phase detector output. A related problem is the computation of the baseline wander voltage generated by passing a random NRZ data stream through a coupling capacitor. It can be shown that the sigma on the capacitor voltage is

equation must be $\sqrt{2\pi}\,\sigma_j$.

v

DOWN

Input Data

This equation has the same form as a discrete time approximation to the capacitor voltage in an RC lowpass filter. By analogy, when time is expressed in units of loop update times, any transient phase error in the bang-bang loop can then be said to decay to zero with

compute that the effective V

UP

A

'pump

front-end of Alexander's phase detector, with the first flip-flop driven on the rising edge of the 50% duty-cycle clock, and the second flip-flop driven on the falling edge of the same clock. (Using a fully-differential monolithic ring-oscillator, it is possible to achieve a very precise 50% duty-cycle clock source.) When the loop is locked, the rising-edge retiming flip-flop samples the center of each data bit and produces a retimed data bit at (A) and the following retimed bit at (B). The falling-edge flip-flop functions as a phase detector by sampling the transition (T) between the data bits (A,B). To improve the circuit's operating speed, the (T) sample is delayed an extra half bit time by a latch so that the logic on (A,T,B) has a full bit time for resolution.

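using the analogous baseline wander expression with the effective loop $V_{pp}$ and $\tau$. The result is

$$J \approx \frac{(2\pi)^{1/4}}{2}\sqrt{\sigma_j\,\theta_{bb}} \approx 0.79\sqrt{\sigma_j\,\theta_{bb}},$$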
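The square-root dependence can be retraced step by step from the quantities named in this section. The derivation below is a sketch under the stated assumptions (Gaussian input jitter, small loop phase deviation, time counted in update periods so that $t_{bit} = 1$); the symbols $p_{early}$ and $J_{walk}$ are labels introduced here for illustration, and the sign convention for "early" is an assumption.

```latex
% Linearized Gaussian distribution around zero phase deviation
p_{early} \approx \frac{1}{2} + \frac{\Delta\theta}{\sigma_j\sqrt{2\pi}}

% Expected evolution of the loop phase deviation per update
E[\Delta\theta_{n+1}]
  = \Delta\theta_n - \theta_{bb}\,(2p_{early} - 1)
  = \Delta\theta_n\left(1 - \frac{2\theta_{bb}}{\sigma_j\sqrt{2\pi}}\right)

% Equivalent RC time constant, in update periods
\tau = \frac{\sigma_j\sqrt{2\pi}}{2\theta_{bb}}

% Effective drive amplitude: one bit must move the phase by \theta_{bb}
\frac{V_{pp}}{2}\cdot\frac{1}{\tau} = \theta_{bb}
  \quad\Rightarrow\quad V_{pp} = 2\tau\theta_{bb} = \sqrt{2\pi}\,\sigma_j

% Baseline-wander analogy with t_{bit} = 1
J_{walk} = V_{pp}\sqrt{\frac{1}{8\tau}}
  = \frac{(2\pi)^{1/4}}{2}\sqrt{\sigma_j\,\theta_{bb}}
  \approx 0.79\sqrt{\sigma_j\,\theta_{bb}}
```

The coefficient obtained this way is of the same order as the empirical Region III fit quoted earlier in the section, which is the sense in which the analysis and the simulations agree.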

The transition sample is then compared to the surrounding data bits to determine whether the clock sampling phase is early or late to derive a binary-quantized (bang-bang) or ternary phase error indication. A truth-table for the logic in Fig. 22 is given in Table 1.

which is consistent with empirical analysis of simulation results. One further insight into this behavior is offered. The second-order loop drives the phase detector output to a steady-state 50% duty-cycle. In this condition, the loop phase splits the input jitter distribution into equal early and late halves. This means that the bang-bang loop phase is servoed to the median of the input jitter distribution rather than to the mean as would be the case with a linear loop. Because of this, the bang-bang loop makes a constant modest correction in response to large jitter outliers, rather than the proportionally large overcompensation of a linear loop. This insight supports the idea that the bang-bang loop jitter should only be sub-linearly affected by the magnitude of the input jitter.

TABLE 1. Truth table for logic in Fig. 22.

State   A   T   B   UP   DOWN   Meaning
0       0   0   0   0    0      hold
1       0   0   1   0    1      early
2       0   1   0   1    1      hold
3       0   1   1   1    0      late
4       1   0   0   1    0      late
5       1   0   1   1    1      hold
6       1   1   0   0    1      early
7       1   1   1   0    0      hold
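The rows of Table 1 are consistent with UP = A XOR T and DOWN = T XOR B. That reading is an inference from the table entries, not a statement about the gate-level circuit of Fig. 22; the function name `alexander_pd` is hypothetical. A short sketch that regenerates the table under this assumption:

```python
def alexander_pd(a, t, b):
    """Ternary phase-detector decision from samples A, T, B (each 0 or 1).

    Assumes UP = A xor T and DOWN = T xor B, inferred from Table 1.
    """
    up = a ^ t
    down = t ^ b
    if up == down:
        meaning = "hold"     # no transition, or the normally impossible states 2 and 5
    elif up:
        meaning = "late"
    else:
        meaning = "early"
    return up, down, meaning

# Regenerate the eight states in table order (state number = A T B read as binary).
for state in range(8):
    a, t, b = (state >> 2) & 1, (state >> 1) & 1, state & 1
    print(state, a, t, b, *alexander_pd(a, t, b))
```

A custom truth table, as discussed in the text, would replace the handling of the `up == down == 1` case with whatever diagnostic meaning the designer assigns to states 2 and 5.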

The states 2 and 5 in Table 1 correspond to the normally impossible condition of sampling a "1" midway between two "0" bits. A custom truth table can use these states to detect a high bit-error-rate condition [10] or a VCO running grossly too slowly (e.g., by lumping these states into the "late" condition), or can take them as an indication that a link has locked onto its own VCO crosstalk, perhaps by amplification of power supply noise picked up by a high-gain optical transimpedance amplifier [11].

Since the mid-bit samples (A,B) straddle the (T) transition sample, it is also possible to detect the lack of a transition. This condition corresponds to states 0 and 7 in Table 1. This information can be used to create an extra ternary hold-state in the PD output, causing the charge pump to hold its value during long runlengths. Both binary and ternary PDs will be discussed in turn, along with their implications on loop performance.

A. Run-length and Latency

Binary phase detectors have no hold state, so the PD continues to put out the last valid phase error indication during long data runlengths. In this situation, the loop idling jitter will be multiplied from the expected value by the maximum runlength of the data. For example, an 8B/10B code has a maximum code runlength of 5 and will have a peak jitter walk-off five times the value of that computed for a "10" repetitive data pattern. The average RMS jitter will be a function of the runlength distributions of each particular code. There is also a trade-off in effective stability factor, as a repetitive pattern such as "11110000" will be equivalent to a loop with an effective update time 4 times larger than the expected $t_{update} = 1/f_{nom}$. Since the stability factor is inversely dependent on update time, it is possible for binary PDs to become unstable with data patterns containing very long runs due to the delay in timely phase-error feedback.

Fig. 23. Setup for computing onset of loop instability with latency $\lambda$.

Fig. 23 shows the loop phase trajectory during an acquisition transient. At $t = 0$, the loop crosses zero phase error with $d\theta/dt = S$. From this we can compute an overshoot $\Delta\theta$. When the loop phase again crosses zero phase error, the phase detector is late in responding by a time $\lambda$. This time is a combination of runlength, latency in the phase detector logic, and high-order poles in the VCO tuning characteristic.

Due to the loop latency $\lambda$, the loop overshoots zero phase by $\Delta_1 = S\lambda$ before the "braking" effect of the proportional branch starts to act. The onset of catastrophic instability occurs when $\Delta_1 > \Delta\theta$, for this implies exponential growth of the acquisition transient. The convergence is guaranteed whenever $\xi > 2\lambda$.

Although usable for tightly constrained block codes such as 8B/10B, binary phase detectors are essentially unusable for codes such as 10Gb Ethernet 64b/66b or SONET, which can have very long runlengths of up to 66 or 80 bits, respectively.

B. Ternary Phase-Detector

The 3-state, or ternary, phase detector provides superior jitter performance for data with long runs [12]. Ternary PDs neither charge nor discharge the loop filter during long runs, causing the loop to hold the current estimate of the data frequency. Such loops effectively "stop time" during long runs.

If the charge pump does not have a hold-mode, it is possible to emulate a ternary loop, with some loss of performance, by continuously toggling the phase-detector output to approximately maintain the current charge pump voltage during long runs.

The peak idling jitter for ternary loops is unchanged from the simple 100% transition density analysis. The RMS jitter will be reduced by the average transition density. Because the loop phase cannot change during hold mode, the jitter tolerance will be derated by the average transition density. This can easily be taken into account by increasing $\theta_{bb}$ appropriately for the characteristics of the code to be used.

C. VCO Tuning Bandwidth

The previous analyses all assumed an infinite VCO tuning bandwidth for the proportional tuning input. A VCO time-constant $\tau_{vco}$ can slightly reduce hunting jitter if it is small compared to the loop update time.

Time constants larger than the loop update time prevent the loop from reversing phase slope within an update period and lengthen the loop limit cycle. If the extra pole is thought of as an extra latency $2\tau_{vco}$, then the result of the previous section can be used to give an approximate bound on loop stability. To avoid divergence, $\xi > 4\tau_{vco}$, with $\tau_{vco}$ expressed in update times. Comparison with simulation verifies this equation as a conservative limit on $\tau_{vco}$.

However, it cannot be recommended to flirt with this boundary. Unless one meticulously checks performance by numerical simulation, it is safest to design the VCO to essentially respond fully in one update time. This is usually very easy to achieve in ring-oscillators and possible with some care using low-Q LC VCOs.

VIII. CONCLUSION

Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sampling structures, and operation at the highest speed at which a process can make a working flip-flop. Approximate equations for loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters have been derived. The median-tracking property of the bang-bang loop, resulting in an output jitter that grows as the square root of the input jitter, has been presented.

ACKNOWLEDGMENT

The author is grateful for the contributions of Birdy Amrutur, Bill Brown, John Corcoran, Craig Corsetto, Dave DiPietro, Brian Donoghue, Jeff Galloway, Andrew Grzegorek, Tom Hornak, Jim Homer, Tom Knotts, Benny Lai, Adolf Leiter, Bill McFarland, Charles Moore, Rasmus Nordby, Cheryl Owen, Pat Petruno, Kent Springer, Guenter Steinbach, Hugh Wallace, Bin Wu, J.T. Wu, and Chu Yen for technical discussions and helpful insights into bang-bang loop behavior.

REFERENCES

[1] C. B. Armitage, "SAW Filter Retiming in the AT&T 432 Mb/s Lightwave Regenerator," in Conference Proceedings: AT&T Bell Labs, pp. 102-103, Sept. 3-6, 1984.

[2] C. R. Hogge, Jr., "A Self Correcting Clock Recovery Circuit," IEEE Transactions on Electron Devices, vol. ED-32, no. 12, pp. 2704-2706, Dec. 1985.

[3] J. Tani, D. Crandall, J. Corcoran and T. Hornak, "Parallel Interface ICs for 120Mb/s Fiber Optic Links," in ISSCC Digest of Technical Papers, pp. 190-191, 390, Feb. 1987.

[4] R. C. Walker, T. Hornak, C. Yen and K. H. Springer, "A Chipset for Gigabit Rate Data Communication," in Proceedings of the 1989 Bipolar Circuits and Technology Meeting, pp. 288-290, Sept. 18-19, 1989.

[5] F. Gardner, Phaselock Techniques, New York: John Wiley & Sons, 1979, pp. 8-14.

[6] I. Galton, "Higher-order Delta-Sigma Frequency-to-Digital Conversion," in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 441-444, May 30-June 2, 1994.

[7] I. Galton, "Analog-Input Digital Phase-Locked Loops for Precise Frequency and Phase Demodulation," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 42, no. 10, pp. 621-630, Oct. 1995.

[8] M. W. Hauser, "Principles of Oversampling A/D Conversion," J. Audio Eng. Soc., vol. 39, no. 1/2, pp. 3-26, Jan./Feb. 1991.

[9] J. D. H. Alexander, "Clock Recovery from Random Binary Signals," Electronics Letters, vol. 11, no. 22, pp. 541-542, Oct. 1975.

[10] J. Hauenschild, D. Friedrich, J. Herrle and J. Krug, "A Two-Chip Receiver for Short Haul Links up to 3.5Gb/s with PIN-Preamp Module and CDR-DMUX," in ISSCC Digest of Technical Papers, pp. 308-309, 452, Feb. 1996.

[11] R. C. Walker, C. Stout and C. Yen, "A 2.488Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection," in ISSCC Digest of Technical Papers, pp. 246-247, 466, Feb. 1997.

[12] N. Ishihara and Y. Akazawa, "A Monolithic 156 Mb/s Clock and Data Recovery PLL Circuit Using the Sample-and-Hold Technique," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1566-1571, Dec. 1994.

[13] D. Chen and M. O. Baker, "A 1.25 Gb/s, 460mW CMOS Transceiver for Serial Data Communication," in ISSCC Digest of Technical Papers, pp. 242-243, 465, Feb. 1997.

[14] L. DeVito, J. Newton, R. Goughwell, J. Bulzacchelli and F. Benkley, "A 52MHz and 155 MHz Clock-Recovery PLL," in ISSCC Digest of Technical Papers, pp. 142-143, 306, Feb. 1991.

[15] J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker and H. A. Ainspan, "Single-Chip 1062Mbaud CMOS Transceiver for Serial Data Communication," in ISSCC Digest of Technical Papers, pp. 32-33, 336, Feb. 1995.

[16] A. Fiedler, R. Mactaggart, J. Welch and S. Krishnan, "A 1.0625Gbps Transceiver with 2x-Oversampling and Transmit Signal Pre-Emphasis," in ISSCC Digest of Technical Papers, pp. 238-239, 464, Feb. 1997.

[17] B. Guo, A. Hsu, Y. Wang and J. Kubinec, "125Mb/s CMOS All-Digital Data Transceiver Using Synchronous Uniform Sampling," in ISSCC Digest of Technical Papers, pp. 112-113, Feb. 1994.

[18] Y. M. Greshishchev, P. Schvan, J. L. Showell, M. Xu, J. J. Ojha and J. E. Rogers, "A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate," IEEE Journal of Solid-State Circuits, vol. 35, no. 12, pp. 1949-1957, Dec. 2000.

[19] R. Gu, J. M. Tran, H. Lin, A. Yee and M. Izzard, "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," in ISSCC Digest of Technical Papers, pp. 352-353, 478, Feb. 1999.

[20] J. Hauenschild, C. Dorschky, T. W. Mohrenfels and R. Seitz, "A 10Gb/s BiCMOS Clock and Data Recovery 1:4 Demultiplexer in a Standard Plastic Package with External VCO," in ISSCC Digest of Technical Papers, pp. 202-203, 445, Feb. 1996.

[21] T. He and P. Gray, "A Monolithic 480 Mb/s AGC/Decision/Clock Recovery Circuit in 1.2 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 28, no. 12, pp. 1314-1320, Dec. 1993.

[22] P. Larsson, "A 2-1600MHz 1.2-2.5V CMOS Clock-Recovery PLL with Feedback Phase-Selection and Averaging Phase-Interpolation for Jitter Reduction," in ISSCC Digest of Technical Papers, pp. 356-357, Feb. 1999.

[23] B. Lai and R. C. Walker, "A Monolithic 622Mb/s Clock Extraction Data Retiming Circuit," in ISSCC Digest of Technical Papers, pp. 144-145, Feb. 1991.

[24] T. H. Lee and J. F. Bulzacchelli, "A 155MHz Clock Recovery Delay- and Phase-Locked Loop," IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1736-1746, Dec. 1992.

[25] R. H. Leonowich and J. M. Steininger, "A 45-MHz CMOS phase/frequency-locked loop timing recovery circuit," in ISSCC Digest of Technical Papers, pp. 14-15, 278-279, Feb. 1988.

[26] I. Lee, C. Yoo, W. Kim, S. Chai and W. Song, "A 622Mb/s CMOS Clock Recovery PLL with Time-Interleaved Phase Detector Array," in ISSCC Digest of Technical Papers, pp. 198-199, 444, Feb. 1996.

[27] M. Meghelli, B. Parker, H. Ainspan and M. Soyuer, "A SiGe BiCMOS 3.3V Clock and Data Recovery Circuit for 10Gb/s Serial Transmission Systems," in ISSCC Digest of Technical Papers, pp. 56-57, Feb. 2000.

[28] T. Morikawa, M. Soda, S. Shiori, T. Hashimoto, F. Sato and K. Emura, "A SiGe Single-Chip 3.3V Receiver IC for 10Gb/s Optical Communication System," in ISSCC Digest of Technical Papers, pp. 380-381, 481, Feb. 1999.

[29] A. Pottbacker and U. Langmann, "An 8GHz Silicon Bipolar Clock-Recovery and Data-Regenerator IC," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1572-1576, Dec. 1994.

[30] M. Reinhold, C. Dorschky, F. Pullela, E. Rose, P. Mayer, P. Paschke, Y. Baeyens, J. Mattia and F. Kunz, "A Fully-Integrated 40Gb/s Clock and Data Recovery / 1:4 DEMUX IC in SiGe Technology," in ISSCC Digest of Technical Papers, pp. 84-85, 435, Feb. 2001.

[31] M. Soyuer and H. A. Ainspan, "A Monolithic 2.3 Gb/s 100mW Clock and Data Recovery Circuit," in ISSCC Digest of Technical Papers, pp. 158-159, 282, Feb. 1993.

[32] S. Ueno, K. Watanabe, T. Kato, T. Shinohara, K. Mikami, T. Hashimoto, A. Takai, K. Washio, R. Takeyar and T. Harada, "A Single-Chip 10Gb/s Transceiver LSI using SiGe SOI/BiCMOS," in ISSCC Digest of Technical Papers, pp. 82-83, 435, Feb. 2001.

[33] H. Wang and R. Nottenburg, "A 1Gb/s CMOS Clock and Data Recovery Circuit," in ISSCC Digest of Technical Papers, pp. 354-355, 477, Feb. 1999.

[34] P. Wallace, R. Bayruns, J. Smith, T. Laverick and R. Shuster, "A GaAs 1.5Gb/s Clock Recovery and Data Retiming Circuit," in ISSCC Digest of Technical Papers, pp. 192-193, Feb. 1990.

[35] Z. Wang, M. Berroth, J. Seibel, P. Hofmann, A. Hulsmann, Kohler, B. Raynor and J. Schneider, "19GHz Monolithic Integrated Clock Recovery Using PLL and 0.3 µm Gate-Length Quantum-Well HEMTs," in ISSCC Digest of Technical Papers, pp. 118-119, Feb. 1994.

[36] R. C. Walker, K. Hsieh, T. A. Knotts and C. Yen, "A 10Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission," in ISSCC Digest of Technical Papers, pp. 302-303, 450, Feb. 1998.

[37] R. C. Walker, J. Wu, C. Stout, B. Lai, C. Yen, T. Hornak and P. Petruno, "A 2-Chip 1.5Gb/s Bus-Oriented Serial Link Interface," in ISSCC Digest of Technical Papers, pp. 226-227, 291, Feb. 1992.

[38] C. K. Yang and M. A. Horowitz, "0.8 µm CMOS 2.5Gb/s Oversampled Receiver for Serial Links," IEEE Journal of Solid-State Circuits, vol. 31, no. 12, pp. 2015-2023, Dec. 1996.

Richard Walker was born in San Rafael, CA, in 1960. He received the B.S. degree in Engineering and Applied Science from the California Institute of Technology in 1982, and an M.S. degree in Computer Science from California State University, Chico, CA in 1992. Rick joined Agilent Laboratories (formerly Hewlett-Packard Laboratories) in 1981, where he is currently a Principal Project Engineer. Since that time, he has worked in the areas of broadband-cable modem design, solid-state laser characterization, phase-locked-loop theory, line-code design, and gigabit-rate serial data transmission. He holds 15 U.S. patents.

Predicting the Phase Noise and Jitter of PLL-Based Frequency Synthesizers

Kenneth S. Kundert

Abstract: Two simulation-based methodologies, both accurate and efficient, are presented for predicting the phase noise and jitter of a PLL-based frequency synthesizer. The methodologies begin by characterizing the noise behavior of the blocks that make up the PLL using transistor-level RF simulation. For each block, the phase noise or jitter is extracted and applied to a model for the entire PLL.

I. INTRODUCTION

Phase-locked loops (PLLs) are used to implement a variety of timing-related functions, such as frequency synthesis, clock and data recovery, and clock de-skewing. Any jitter or phase noise in the output of the PLL used in these applications generally degrades the performance margins of the system in which it resides and so is of great concern to the designers of such systems.

Jitter and phase noise are different ways of referring to an undesired variation in the timing of events at the output of the PLL. They are difficult to predict with traditional circuit simulators because the PLL generates repetitive switching events as an essential part of its operation, and the noise performance must be evaluated in the presence of this large-signal behavior. SPICE is useless in this situation as it can only predict the noise in circuits that have a quiescent (time-invariant) operating point. In PLLs the operating point is at best periodic, and is sometimes chaotic.

Recently a new class of circuit simulators has been introduced that are capable of predicting the noise behavior about a periodic operating point [1]. SpectreRF is the most popular of this class of simulators and, because of the algorithms used in its implementation, is likely to be the best suited for this application [2]. These simulators can be used to predict the noise performance of PLLs. The ideas presented in this paper allow those simulators to be applied even to those PLLs that have chaotic operating points.

Ken Kundert is with Cadence Design Systems, San Jose, California.

A. Frequency Synthesis

The focus of this paper is frequency synthesis. The block diagram of a PLL operating as a frequency synthesizer is shown in Figure 1 [3]. It consists of a reference oscillator (OSC), a phase/frequency detector (PFD), a charge pump (CP), a loop filter (LF), a voltage-controlled oscillator (VCO), and two frequency dividers (FDs). The PLL is a feedback loop that, when in lock, forces $f_{fb}$ to be equal to $f_{ref}$. Given an input frequency $f_{in}$, the frequency at the output of the PLL is

$$f_{out} = \frac{N}{M} f_{in} \qquad (1)$$

where M is the divide ratio of the input frequency divider, and N is the divide ratio of the feedback divider. By choosing the frequency divide ratios and the input frequency appropriately, the synthesizer generates an output signal at the desired frequency that inherits much of the stability of the input oscillator. In RF transceivers, this architecture is commonly used to generate the local oscillator (LO) at a programmable frequency that tunes the transceiver to the desired channel by adjusting the value of N.

[Figure 1: OSC ($f_{in}$), input divider, PFD (inputs $f_{ref}$ and $f_{fb}$), CP, LF and VCO ($f_{out}$), with a divide-by-N block in the feedback path.]

Fig. 1. The block diagram of a frequency synthesizer.
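As a small illustration of Eq. (1), the sketch below computes the output frequency from the divide ratios and, in the other direction, finds the feedback ratio N that tunes the synthesizer to a requested channel. The function names and the numbers (a 10 MHz oscillator, M = 50, a 900 MHz LO) are assumptions chosen for illustration and do not come from the paper.

```python
from fractions import Fraction


def output_frequency(f_in, m, n):
    """Eq. (1): f_out = (N / M) * f_in, computed exactly with rationals."""
    if m < 1 or n < 1:
        raise ValueError("divide ratios must be positive integers")
    return Fraction(f_in) * n / m


def feedback_ratio(f_out, f_in, m):
    """Integer N that places the output on f_out, or ValueError if none exists."""
    n = Fraction(f_out) * m / Fraction(f_in)
    if n.denominator != 1 or n < 1:
        raise ValueError("f_out is not an integer multiple of f_in / M")
    return int(n)


# Hypothetical numbers: 10 MHz reference oscillator and M = 50.
F_IN = 10_000_000
M = 50
channel_spacing = Fraction(F_IN, M)            # f_ref = f_in / M
n_900 = feedback_ratio(900_000_000, F_IN, M)   # N for a 900 MHz LO
next_channel = output_frequency(F_IN, M, n_900 + 1)
```

With these assumed values the channel raster equals $f_{ref} = f_{in}/M$, so stepping N by one moves the output by one channel.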

B. Direct Simulation

In many circumstances, SpectreRF† can be directly applied to predict the noise performance of a PLL. To make this possible, the PLL must at a minimum have a periodic steady-state solution. This rules out systems such as bang-bang clock and data recovery circuits and fractional-N synthesizers because they behave in a chaotic way by design. It also rules out any PLL that is implemented with a phase detector that has a dead zone. A dead zone has the effect of opening the loop and letting the phase drift seemingly at random when the phase of the reference and the output of the voltage-controlled oscillator (VCO) are close. This gives these PLLs a chaotic nature.

† Spectre is a registered trademark of Cadence Design Systems.

To perform a noise analysis, SpectreRF must first compute the steady-state solution of the circuit with its periodic steady state (PSS) analysis. If the PLL does not have a periodic solution, as the cases described above do not, then the PSS analysis will not converge. There is an easy test that can be run to determine if a circuit has a periodic steady-state solution. Simply perform a transient analysis until the PLL approaches steady state and then observe the VCO control voltage. If this signal consists of frequency components at integer multiples of the reference frequency, then the PLL has a periodic solution. If there are other components, it does not. Sometimes it can be difficult to identify the undesirable components if the components associated with the reference frequency are large. In this case, use the strobing feature of Spectre's transient analysis to eliminate all components at frequencies that are multiples of the reference frequency. Do so by strobing at the reference frequency. In this case, if the VCO control voltage varies in any significant way the PLL does not have a periodic solution.
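The strobing test can be mimicked outside the simulator on an exported control-voltage waveform. The sketch below is only an illustration of the idea, not Spectre's strobing feature: it samples a waveform once per reference period and checks whether the samples stay constant. The two waveforms, the 1 MHz reference and the tolerance are hypothetical.

```python
import math


def strobe(v, t_start, t_ref, count):
    """Sample v(t) once per reference period, starting at t_start."""
    return [v(t_start + k * t_ref) for k in range(count)]


def has_periodic_solution(v, t_start, t_ref, count=200, tol=1e-6):
    """True if the strobed control voltage stays within tol peak-to-peak."""
    samples = strobe(v, t_start, t_ref, count)
    return max(samples) - min(samples) <= tol


F_REF = 1.0e6          # hypothetical reference frequency, Hz
T_REF = 1.0 / F_REF


def v_locked(t):
    """Settled control voltage: DC plus ripple at f_ref and 3*f_ref only."""
    w = 2.0 * math.pi * F_REF
    return 1.2 + 0.010 * math.sin(w * t) + 0.002 * math.sin(3.0 * w * t)


def v_wandering(t):
    """Same ripple plus a component at 0.37*f_ref, not a multiple of f_ref."""
    return v_locked(t) + 0.002 * math.sin(2.0 * math.pi * 0.37 * F_REF * t)


locked_ok = has_periodic_solution(v_locked, 5e-6, T_REF)
wandering_ok = has_periodic_solution(v_wandering, 5e-6, T_REF)
```

Components at multiples of the reference frequency alias to a constant when strobed, so only the unwanted component survives in the samples. In practice the tolerance would have to be set relative to the simulator's own numerical noise.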

If the PLL has a periodic solution, then in concept it is always possible to apply SpectreRF directly to perform a noise analysis. However, in some cases it may not be practical to do so. The time required for SpectreRF to compute the noise of a PLL is proportional to the number of circuit equations needed to represent the PLL in the simulator times the number of time points needed to accurately render a single period of the solution times the number of frequencies at which the noise is desired. When applying SpectreRF to frequency synthesizers with large divide ratios, the number of time points needed to render a period can become problematic. Experience shows that divide ratios greater than ten are often not practical to simulate. Of course, this varies with the size of the PLL.

For PLLs that are candidates for direct simulation using SpectreRF, simply configure the simulator to perform a PSS analysis followed by a periodic noise (PNoise) analysis. The period of the PSS analysis should be set to the period of the reference signal, $1/f_{ref}$, with $f_{ref}$ as defined in Figure 1. The PSS stabilization time (tstab) should be set long enough to allow the PLL to reach lock. This process was successfully followed on a frequency synthesizer with a divide ratio of 40 that contained 2500 transistors, though it required several hours for the complete simulation [4].

C. When Direct Simulation Fails

The challenge still remains: how does one predict the phase noise and jitter of PLLs that do not fit the constraints that enable direct simulation? The remainder of this paper attempts to answer that question for frequency synthesizers, though the techniques presented are general and can be applied to other types of PLLs by anyone who is sufficiently determined.

D. Monte Carlo-Based Methods

Demir proposed an approach for simulating PLLs whereby a PLL is described using behavioral models simulated at a high level [5, 6]. The models are written such that they include jitter in an efficient way. He also devised a simulation algorithm based on solving a set of nonlinear stochastic differential equations that is capable of characterizing the circuit-level noise behavior of blocks that make up a PLL [6, 7]. Finally, he gave formulas that can be used to convert the results of the noise simulations on the individual blocks into values for the jitter parameters for the corresponding behavioral models [8]. Once everything is ready, simulation of the PLL occurs with the blocks of the PLL being described with behavioral models that exhibit jitter. The actual jitter or phase noise statistics are observed during this simulation. Generally tens to hundreds of thousands of cycles are simulated, but the models are efficient so the time required for the simulation is reasonable. This approach allows prediction of PLL jitter behavior once the noise behavior of the blocks has been characterized. However, it requires the use of an experimental simulator that is not readily available to characterize the jitter of the blocks.

In an earlier series of papers [9, 10], the relevant ideas of Demir were adapted to allow use of a commercial simulator, Spectre [11], and an industry standard modeling language, Verilog-A† [12]. These ideas are further refined in the latter half of this paper.

† Verilog is a registered trademark of Cadence Design Systems licensed to Accellera.

E. Predicting Noise in PLLs

There are two different approaches to modeling noise in PLLs. One approach is to formulate the models in terms of the phase of the signals, producing what are referred to as phase-domain models. In the simplest case, these models are linear and analyzed easily in the frequency domain, making it simple to use the model to predict phase noise, even in the presence of flicker noise or other noise sources that are difficult to model in the time domain. Phase-domain models are described in the first half of this paper. The process of predicting the phase noise of a PLL using phase-domain models involves:

1. Using SpectreRF to predict the noise of the individual blocks that make up the PLL.
2. Building high-level behavioral models of each of the blocks that exhibit phase noise.
3. Assembling the blocks into a model of the PLL.
4. Simulating the PLL to find the phase noise of the overall system.

The other approach formulates the models in terms of voltage, which are referred to as voltage-domain models. The advantage of voltage-domain models is that they can be refined to implementation. In other words, as the design process transitions to being more of a verification process, the abstract behavioral models initially used can be replaced with detailed gate- or transistor-level models in order to verify the PLL as implemented. A voltage-domain model is strongly nonlinear and never has a quiescent operating point, making it incompatible with a SPICE-like noise analysis. Often such models have a periodic operating point and so can be analyzed with small-signal RF noise analysis (SpectreRF), but it is also common for that not to be the case. For example, a fractional-N synthesizer does not have a periodic operating point. Occasionally, the circuit is sensitive enough that the noise affects the large-signal behavior of the PLL, such as with bang-bang clock-and-data recovery PLLs, which invalidates any use of small-signal noise analysis.

Modeling large-signal noise in a voltage-domain model as a voltage or a current is problematic. Such signals are very small and continuously and very rapidly varying. Extremely tight tolerances and small time steps are required to accurately resolve such signals with simulation. To overcome these problems, the noise is instead represented using the effect it has on the timing of the transitions within the PLL. In other words, the noise is added to the circuit in the form of jitter. In this case there is no need for either small time steps or tight tolerances.

The process of predicting the jitter of a PLL with voltage-domain models involves:

1. Using SpectreRF to predict the noise of the individual blocks that make up the PLL.
2. Converting the noise of the block to jitter.
3. Building high-level behavioral models of each of the blocks that exhibit jitter.
4. Assembling the blocks into a model of the PLL.
5. Simulating the PLL to find the jitter of the overall system.

The simple linear phase-domain model described in the first part of this paper, and the nonlinear voltage-domain model described in the second part, represent the two ends of a continuum of models. Generally, the phase-domain models are considerably more efficient, but the voltage-domain models do a better job of capturing the details of the behavior of the loop, details such as the signal capture and escape processes. The phase-domain models can be made more general by making them nonlinear and by analyzing them in the time domain. It is common to use such models with fractional-N synthesizers. Conversely, simplifications can be made to the voltage-domain models to make them more efficient. It is even possible to use both voltage- and phase-domain models for different parts of the same loop. One might do so to retain as much efficiency as possible while allowing part of the design to be refined to implementation level. In general it is best to understand both approaches well, and use ideas from both to construct the most appropriate approach for your particular situation.

II. PHASE-DOMAIN MODEL

It is widely understood that simulating PLLs is expensive because the period of the VCO is almost always very short relative to the time required to reach lock. This is particularly true with frequency synthesizers, especially those with large multiplication factors. The problem is that a circuit simulator must use at least 10-20 time points for every period of the VCO for accurate rendering, and the lock process often involves hundreds or thousands of cycles at the input to the phase detector. With large divide ratios, this can translate to hundreds of thousands of cycles of the VCO. Thus, the number of time points needed for a single simulation could range into the millions.

This is all true when simulating the PLL in terms of voltages and currents. When doing so, one is said to be using voltage-domain models. However, that is not the only option available. It is also possible to formulate models based on the phase of the signals. In this case, one would be using phase-domain models. The high frequency variations associated with the voltage-domain models are not present in phase-domain models, and so simulations are considerably faster. In addition, when in lock the phase-domain-based models generally have constant-valued operating points, which simplifies small-signal analysis, making it easier to study the closed-loop dynamics and noise performance of the PLL using either AC or noise analysis.

A linear phase-domain model of a frequency synthesizer is shown in Figure 2. Such a model is suitable for modeling the behavior of the PLL to small perturbations when the PLL is in lock as long as you do not need to know the exact waveforms and instead are interested in how small perturbations affect the phase of the output. This is exactly what is needed to predict the phase noise performance of the PLL.

[Figure 2: signal-flow diagram with blocks OSC, FD_M (gain 1/M), PFD/CP (gain $K_{det}/2\pi$), LF ($H(\omega)$), VCO (gain $2\pi K_{vco}/j\omega$) and FD_N (gain 1/N) in the feedback path.]

Fig. 2. Linear time-invariant phase-domain model of the synthesizer shown in Figure 1.

The derivation of the model begins with the identification of those signals that are best represented by their phase. Many blocks have large repetitive input signals with their outputs being primarily sensitive to the phase of their inputs. It is the signals that drive these blocks that are represented as phase. They are identified using a $\phi$ variable in Figure 2. Notice that this includes all signals except those at the inputs of the LF and VCO.

The models of the individual blocks will be derived by assuming that the signal associated with each of the phase variables is a pulse train. Though generally the case, it is not a requirement. It simply serves to make it easier to extract the models. Define $\Pi(t_0, \tau, T)$ to be a periodic pulse train where one of the pulses starts at $t_0$ and the pulses have duration $\tau$ and period $T$ as shown in Figure 3. This signal transitions between 0 and 1 if $\tau$ is positive, and between 0 and -1 if $\tau$ is negative. The phase of this signal is defined to be $\phi = 2\pi t_0/T$. In many cases, the duration of the pulses is of no interest, in which case $\Pi(t_0, T)$ is used as a shorthand. This occurs because the input that the signal is driving is edge triggered. For simplicity, we assume that such inputs are sensitive to the rising edges of the signal, that $t_0$ specifies the time of a rising edge, and that the signal is transitioning between 0 and 1.
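The pulse-train notation can be made concrete with a few lines of code. The sketch below evaluates $\Pi(t_0, \tau, T)$ at a given time and computes its phase $\phi = 2\pi t_0/T$. The placement of the pulse for negative $\tau$ (occupying the interval of length $|\tau|$ that ends at the start time) is an assumption made here for illustration; the text only states that the signal then switches between 0 and -1.

```python
import math


def pulse_train(t, t0, tau, period):
    """Value at time t of the pulse train Pi(t0, tau, T).

    Assumption: for tau < 0 the pulse occupies the interval of length |tau|
    that ends at each start time, and has value -1.
    """
    if tau >= 0:
        offset = (t - t0) % period      # time since the latest pulse start
        return 1 if offset < tau else 0
    offset = (t0 - t) % period          # time until the next pulse start
    return -1 if 0 < offset <= -tau else 0


def phase(t0, period):
    """Phase of the pulse train, phi = 2*pi*t0/T, in radians."""
    return 2.0 * math.pi * t0 / period
```

For an edge-triggered input only `t0` and `period` matter, which is why the shorthand $\Pi(t_0, T)$ drops the duration.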

[Figure 3: pulse-train waveforms for $\tau > 0$ and $\tau < 0$.]

Fig. 3. The pulse train waveform represented by $\Pi(t_0, \tau, T)$.

The input source produces a signal $v_{in} = \Pi(t_0, T)$. Since this is the input, $t_0$ is arbitrary. As such, we are free to set its phase $\phi_{in}$ to any value we like.

Given a signal $v_i = \Pi(t_0, T)$, a frequency divider will produce an output signal $v_o = \Pi(t_0, NT)$ where N is the divide ratio. The phase of the input is $\phi_i = 2\pi t_0/T$ and the phase of the output is $\phi_o = 2\pi t_0/(NT)$ and so the phase transfer characteristic of a divider is

$$\phi_o = \phi_i / N. \qquad (2)$$

There are many different types of phase detectors that can be used, each requiring a somewhat different model. Consider a simple phase-frequency detector combined with a charge pump [13]. In this case, the detector takes two inputs, $v_i = \Pi(t_i, T)$ and $v_o = \Pi(t_o, T)$, and produces an output $i_{cp} = I_{max}\Pi(t_o, t_i - t_o, T)$ where $I_{max}$ is the maximum output current of the charge pump. The output of the charge pump immediately passes through a low-pass filter that is designed to suppress signals at frequencies of $1/T$ and above, so in most cases the pulse nature of this signal can be ignored in favor of its average value,

$$\langle i_{cp} \rangle = I_{max}\frac{t_i - t_o}{T} = I_{max}\frac{\phi_i - \phi_o}{2\pi} = \frac{K_{det}}{2\pi}(\phi_i - \phi_o) \qquad (3)$$

where $K_{det} = I_{max}$. Of course, this is only valid for $|\phi_i - \phi_o| < 2\pi$ at the most. The behavior outside this range depends strongly on the type of phase detector used [3]. Even within this range, the phase detector may be better modeled with a nonlinear transfer characteristic. For example, there can be a flat spot in the transfer characteristics near 0 if the detector has a dead zone. However it is generally not productive to model the dead zone in a phase-domain model.†

† This phase-domain model is a continuous-time model that ignores the sampling nature of the PFD. A dead zone interacts with the sampling nature of the PFD to create a chaotic limit cycle behavior that is not modeled with the phase-domain model. This chaotic behavior creates a substantial amount of jitter, and for this reason, most modern phase detectors are designed such that they do not exhibit dead zones.

The model of (3) is a continuous-time approximation to what is inherently a discrete-time process. The phase detector does not continuously monitor the phase difference between its two input signals, rather it outputs one pulse per cycle whose width is proportional to the phase difference. Using a continuous-time approximation is generally acceptable if the bandwidth of the loop filter is much less than $f_{ref}$ (generally less than $f_{ref}/10$ is sufficient). In practical PLLs this is almost always the case. It is possible to develop a detailed phase-domain PFD model that includes the discrete-time effects, but it would run more slowly and the resulting phase-domain model of the PLL would not have a quiescent operating point, which makes it more difficult to analyze.

The voltage-controlled oscillator, or VCO, converts its input voltage to an output frequency, and the relationship between input voltage and output frequency can be represented as

$$f_{out} = F(v_c). \qquad (4)$$

The mapping from voltage to frequency is designed to be linear, so a first-order model is often sufficient,

$$f_{out} = K_{vco} v_c. \qquad (5)$$

It is the output phase that is needed in a phase-domain model,

$$\phi_{out}(t) = 2\pi \int K_{vco} v_c(t)\,dt \qquad (6)$$

or in the frequency domain,

$$\phi_{out}(\omega) = \frac{2\pi K_{vco}}{j\omega} v_c(\omega). \qquad (7)$$

A. Small-Signal Stability

This completes the derivation of the phase-domain models for each of the blocks. Now the full model is used to help predict the small-signal behavior of the PLL. Use Figure 2 to write a relationship for its loop gain. Start by defining

$$G_{fwd} = \frac{\phi_{out}}{\phi_{diff}} = \frac{K_{det}}{2\pi} H(\omega) \frac{2\pi K_{vco}}{j\omega} = \frac{K_{det} K_{vco} H(\omega)}{j\omega} \qquad (8)$$

to be the forward gain, where $\phi_{diff} = \phi_{ref} - \phi_{fb}$ is the phase difference at the input of the PFD,

$$G_{rev} = \frac{\phi_{fb}}{\phi_{out}} = \frac{1}{N} \qquad (9)$$

to be the feedback factor, and

$$T = G_{fwd} G_{rev} \qquad (10)$$

to be the loop gain. The loop gain is used to explore the small-signal stability of the loop. In particular, the phase margin is an important stability metric. It is the negative of the difference between the phase shift of the loop at unity gain and 180°, the phase shift that makes the loop unstable. It should be no less than 45° [14]. When concerned about phase noise or jitter, the phase margin is typically 60° or more to reduce peaking in the closed-loop gain, which results in excess phase noise.
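As a hedged illustration of how the loop gain of Eq. (10) yields a phase margin, the sketch below evaluates $T = K_{det} K_{vco} H(\omega)/(j\omega N)$ numerically and bisects for the unity-gain frequency. The loop filter topology (R in series with C1, that branch in parallel with C2, driven by the charge-pump current) and every numeric value are assumptions made here for illustration; the paper does not fix them at this point.

```python
import cmath
import math

# Hypothetical loop parameters (not taken from the paper).
K_DET = 100e-6      # charge-pump current I_max, A
K_VCO = 30e6        # VCO gain, Hz/V
N = 100             # feedback divide ratio
R, C1, C2 = 10e3, 1e-9, 200e-12   # assumed filter: (R in series with C1) || C2


def loop_filter(w):
    """Transimpedance H(jw) of the assumed second-order passive filter."""
    s = 1j * w
    return (1 + s * R * C1) / (s * (C1 + C2) * (1 + s * R * C1 * C2 / (C1 + C2)))


def loop_gain(w):
    """T = G_fwd * G_rev with G_fwd = K_det*K_vco*H/(jw) and G_rev = 1/N."""
    g_fwd = K_DET * K_VCO * loop_filter(w) / (1j * w)
    return g_fwd / N


def unity_gain_frequency(lo=1e2, hi=1e9, iterations=200):
    """Bisect on log(w) for |T(jw)| = 1; |T| falls monotonically for this filter."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if abs(loop_gain(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_degrees():
    """180 degrees plus the (negative) loop phase at the unity-gain frequency."""
    w_u = unity_gain_frequency()
    return 180.0 + math.degrees(cmath.phase(loop_gain(w_u)))


w_u = unity_gain_frequency()   # rad/s
pm = phase_margin_degrees()    # degrees
```

With these assumed component values the filter zero and pole are only a factor of six apart, which limits the achievable margin to about 45.6°. Reaching the 60° suggested for low-noise designs would require a larger pole-to-zero spacing, for example a smaller C2 relative to C1.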

B. Noise Transfer Functions In Figure 4 various sources of noise have been added. These noise sources can represent either the noise created by the blocks due to intrinsic noise sources (thermal, shot, and flicker noise sources), or the noise coupled into the blocks from external sources, such as from the power supplies, the substrate, etc. Most are sources of phase noise, and denoted FDw

•in"

PFD/CP

•»f

1

I

(g

M

LF

*det

//(co)

^

271

'det

^fdm

VCO 2TC

•oui

^vco

j(O TVCO

FD*

1

r

N

^fdn

Fig. 4. Linear time-invariant phase-domain model of the synthesizer shown in Figure 2 with representative noise sources added. The <|)'s represent various sources of noise. •in* •fdm* ^fdn' anc* •vcc because the circuit is only sensitive

to phase at the point where the noise is injected. The one exception is the noise produced by the PFD/CP, which in this case is considered to be a current, and denoted /det. Then the transfer functions from the various noise sources to the output are r

G

- *out _

ref =

^"f •«*

_

°vco - i

=

G

NG

fwd _

i ^ _

l

=

_

tfa

1

Ml-T

'

N

MN-G^'

and by inspection, Gfdn = j = * - "Graf G fdm - j 2 * - ~GKf, Tfdm - *out _ ^det - "j

n n

(11)

N

•; f - TTTa

C = isH! = ±S*a* = *

fwd

^-Gfwd'

K

m

Consider further the asymptotic behavior of the loop and the VCO noise at low offset frequencies (co —> 0). Oscillator phase noise in the VCO results in the power spectral density 5(})vco being proportional to 1/co2, or fyvco~ 1/co2 (neglecting flicker noise). If the LF is chosen such that //(co) ~ 1, then Gfwd - 1 /co, and contribution from the VCO to the output noise power, GvccAvco»*s f*mte anc* nonzero. If the LF is chosen such that //(co) - 1/co, as it typically is when a true charge pump is employed, then Gfwd ~ 1/co2 and the noise contribution to the output from the VCO goes to zero at low frequencies. C. Noise Model One predicts the phase noise exhibited by a PLL by building and applying the model shown in Figure 4. The first step in doing so is to find the various model parameters, including the level of the noise sources, which generally involves either direct measurement or simulating the various blocks with an RF simulator, such as SpectreRF. Use periodic noise (or PNoise) analysis to predict the output noise that results from stochastic noise sources contained within the blocks using simulation. Use a periodic AC or periodic transfer function (PAC or PXF) to compute the perturbation at the output of a block due to noise sources outside the block, such as on supplies.

'

Once the model parameters are known, it is simply a matter of computing the output phase noise of the PLL by applying the (\3) equations in Section II-B to compute the contributions to (J)out from every source and summing the results. Be careful to account for correlations in the noise sources. If the noise sources are perfectly correlated, as they might be if the ultid4) mate source of noise is in the supplies or substrate, then use a direct sum. If the sources produce completely uncorrelated noise, as they would when the ultimate source of noise is ran(15) dom processes within the devices, use a root-mean-square sum.

$G_{det} = \frac{2\pi G_{ref}}{K_{det}}$

As $\omega \to 0$, $G_{fwd} \to \infty$ because of the $1/(j\omega)$ term from the VCO. So at DC, $G_{ref}, G_{fdm}, G_{fdn} \to N$, $G_{in} \to N/M$ and $G_{vco} \to 0$. At low frequencies, the noise of the PLL is contributed by the OSC, PFD/CP, FD$_M$ and FD$_N$, and the noise from the VCO is diminished by the gain of the loop.
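The limiting behavior of the transfer functions can be checked numerically. The sketch below is not taken from the paper's equations, which are not reproduced in this section; it assumes the common forms $G_{fwd} = K_{det} H(\omega) K_{vco}/(j\omega)$, $G_{ref} = N G_{fwd}/(N + G_{fwd})$ and $G_{vco} = N/(N + G_{fwd})$, and a hypothetical single-pole loop filter.

```python
def g_fwd(w, kdet, kvco, h):
    """Assumed forward gain: detector gain, loop filter, VCO integrator."""
    return kdet * h(w) * kvco / (1j * w)

def g_ref(gf, n):
    """Assumed transfer function from the reference input to the output."""
    return n * gf / (n + gf)

def g_vco(gf, n):
    """Assumed transfer function from VCO phase noise to the output."""
    return n / (n + gf)

# Hypothetical loop: single-pole low-pass filter, divide ratio N = 64.
N = 64
lowpass = lambda w: 1.0 / (1.0 + 1j * w / 1e4)

low = g_fwd(1e-3, 1.0, 1e6, lowpass)    # far below the loop bandwidth
high = g_fwd(1e12, 1.0, 1e6, lowpass)   # far above the loop bandwidth

print(abs(g_ref(low, N)), abs(g_vco(low, N)))    # approaches N and 0
print(abs(g_ref(high, N)), abs(g_vco(high, N)))  # approaches 0 and 1
```

Under these assumptions the loop tracks the reference and suppresses the VCO at low offsets, and passes the VCO noise unchanged at high offsets.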

On this last transfer function, we have simply referred $i_{det}$ to the input by dividing through by the gain of the phase detector.

These transfer functions allow certain overall characteristics of phase noise in PLLs to be identified. As $\omega \to \infty$, $G_{fwd} \to 0$ because of the VCO and the low-pass filter, and so $G_{ref}, G_{det}, G_{fdm}, G_{fdn}, G_{in} \to 0$ and $G_{vco} \to 1$. At high frequencies, the noise of the PLL is that of the VCO. Clearly this must be so because the low-pass LF blocks any feedback at high frequencies.

Alternatively, one could build a Verilog-A model and use simulation to determine the result. The top-level of such a model is shown in Listing 1. It employs noisy phase-domain models for each of the blocks. These models are given in Listings 3-7 and are described in detail in the next few sections (III-VI). In this example, the noise sources are coded into the models, but the noise parameters are not set at the top level to simplify the model. To predict the phase noise performance of the loop in lock, simply specify these parameters in Listing 1 and perform a noise analysis. To determine the effect of injected noise, first refer the noise to the output of one of the blocks, and then add a source into the netlist of Listing 1 at the appropriate place and perform an AC analysis.


Listing 1 — Phase-domain model for a PLL configured as a frequency synthesizer.

`include "discipline.h"

module pll(out);
output out;
phase out;
parameter integer m = 1 from [1:inf);    // input divide ratio
parameter real Kdet = 1 from (0:inf);    // detector gain
parameter real Kvco = 1 from (0:inf);    // VCO gain
parameter real c1 = 1n from (0:inf);     // Loop filter C1
parameter real c2 = 200p from (0:inf);   // Loop filter C2
parameter real r = 10K from (0:inf);     // Loop filter R
parameter integer n = 1 from [1:inf);    // fb divide ratio
phase in, ref, fb;
electrical c;

oscillator OSC(in);
divider #(.ratio(m)) FDm(in, ref);
phaseDetector #(.gain(Kdet)) PD(ref, fb, c);
loopFilter #(.c1(c1), .c2(c2), .r(r)) LF(c);
vco #(.gain(Kvco)) VCO(c, out);
divider #(.ratio(n)) FDn(out, fb);
endmodule

Listings 1 and 3-7 have phase signals, and there is no phase discipline in the standard set of disciplines provided by Verilog-A or Verilog-AMS in discipline.h. There are several different resolutions for this problem. Probably the best solution is to simply add such a discipline, given in Listing 2, either to discipline.h as assumed here or to a separate file that is included as needed. Alternatively, one could use the rotational discipline. It is a conservative discipline that includes torque as a flow nature, and so is overkill in this situation. Finally, one could simply use either the electrical or the voltage discipline. Scaling for voltage in volts and phase in radians is similar, and so it will work fine except that the units will be reported incorrectly. Using the rotational discipline would require that all references to the phase discipline be changed to rotational in the appropriate listings. Using either the electrical or voltage discipline would require that both the name of the disciplines be changed from phase to either electrical or voltage, and the name of the access functions be changed from Theta to V.

Listing 2 — Signal flow discipline definition for phase signals (the nature Angle is defined in discipline.h).

`include "discipline.h"

discipline phase
    potential Angle;
enddiscipline

III. OSCILLATORS

Oscillators are responsible for most of the noise at the output of the majority of well-designed frequency synthesizers. This is because oscillators inherently tend to amplify noise found near their oscillation frequency and any of its harmonics. The reason for this behavior is covered next, followed by a description of how to characterize and model the noise in an oscillator. The origins of oscillator phase noise are described in a conceptual way here. For a detailed description, see the papers by Kaertner or Demir et al. [15, 16, 17].

A. Oscillator Phase Noise

Nonlinear oscillators naturally produce high levels of phase noise. To see why, consider the trajectory of a fully autonomous oscillator's stable periodic orbit in state space. In steady state, the trajectory is a stable limit cycle, $v$. Now consider perturbing the oscillator with an impulse and assume that the deviation in the response due to the perturbation is $\Delta v$, as shown in Figure 5. Separate $\Delta v$ into amplitude and phase variations,

$\Delta v(t) = [1 + \alpha(t)]\, v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right) - v(t)$, (17)

where $v$ represents the unperturbed $T$-periodic output voltage of the oscillator, $\alpha$ represents the variation in amplitude, $\phi$ is the variation in phase, and $f_o = 1/T$ is the oscillation frequency.

Fig. 5. The trajectory of an oscillator shown in state space with and without a perturbation $\Delta v$. By observing the time stamps ($t_0$, ..., $t_6$) one can see that the deviation in amplitude dissipates while the deviation in phase does not.

Since the oscillation is stable and the duration of the disturbance is finite, the deviation in amplitude eventually decays away and the oscillator returns to its stable orbit ($\alpha(t) \to 0$ as $t \to \infty$). In effect, there is a restoring force that tends to act against amplitude noise. This restoring force is a natural consequence of the nonlinear nature of the oscillator that acts to suppress amplitude variations.

The oscillator is autonomous, and so any time-shifted version of the solution is also a solution. Once the phase has shifted due to a perturbation, the oscillator continues on as if never disturbed except for the shift in the phase of the oscillation. There is no restoring force on the phase and so phase deviations accumulate. A single perturbation causes the phase to permanently shift ($\phi(t) \to \Delta\phi$ as $t \to \infty$). If we neglect any short term time constants, it can be inferred that the impulse response of the phase deviation $\phi(t)$ can be approximated with a unit step $s(t)$. The phase shift over time for an arbitrary input disturbance $u$ is

$\phi(t) \sim \int_{-\infty}^{\infty} s(t - \tau)\, u(\tau)\, d\tau = \int_{-\infty}^{t} u(\tau)\, d\tau$, (18)
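The accumulation of phase can be illustrated with a small numerical sketch. It assumes a unit constant of proportionality between the disturbance and the phase, which the text leaves unspecified, and uses a hypothetical helper `integrate_phase` that forms the running integral of a sampled disturbance.

```python
def integrate_phase(u, dt):
    """Running integral of the disturbance u sampled every dt seconds.

    Models a phase response whose impulse response is a unit step,
    with a unit constant of proportionality (an assumption).
    """
    phi, total = [], 0.0
    for sample in u:
        total += sample * dt
        phi.append(total)
    return phi

dt = 1e-9
u = [0.0] * 10
u[3] = 0.5 / dt              # approximate impulse of area 0.5 at sample 3
phi = integrate_phase(u, dt)
print(phi)                   # zero before the impulse, shifted by 0.5 afterwards
```

The phase never returns to zero after the impulse, which is the absence of a restoring force described above.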

at $\Delta f = 1$ Hz and $f_c$ is the flicker noise corner frequency. As shown in Figure 6, $n$ is extracted by simply extrapolating to 1 Hz from a frequency where the noise from the white sources dominates.

or the power spectral density (PSD) of the phase is

$S_\phi(\Delta f) \sim \frac{S_u(\Delta f)}{(2\pi \Delta f)^2}$. (19)

Fig. 6. Extracting the noise parameters, $n$, $a$, and $f_c$, for an oscillator. The parameter $a$ is an alternative to $n$ where $n = a f_o^2$. It is used later. The graph is plotted on a log-log scale.

$S_\phi$ is not directly observable and often difficult to find, so now $S_\phi$ is related to $\mathcal{L}$, the power spectral density of the output voltage noise $S_v$ normalized by the power in the fundamental tone. $S_v$ is directly available from either measurement with a spectrum analyzer or from RF simulators, and $\mathcal{L}$ is defined as

SJ

««> - ^f.

This shows that in all oscillators the response to any form of perturbation, including noise, is amplified and appears mainly in the phase. The amplification increases as the frequency of the perturbation approaches the frequency of oscillation in proportion to $1/\Delta f$ (or $1/\Delta f^2$ in power). Notice that there is only one degree of freedom: the phase of the oscillator as a whole. There is no restoring force when the phase of all signals associated with the oscillator shift together, however there would be a restoring force if the phase of signals shifted relative to each other. This observation is significant in oscillators with multiple outputs, such as quadrature or ring oscillators. The dominant phase variations appear identically in all outputs, whereas relative phase variations between the outputs are naturally suppressed by the oscillator or added by subsequent circuitry and so tend to be much smaller [8].

B. Characterizing Oscillator Phase Noise

Above it was shown that oscillators tend to convert perturbations from any source into a phase variation at their output whose magnitude varies with $1/\Delta f$ (or $1/\Delta f^2$ in power). Now assume that the perturbation is from device noise in the form of white and flicker stochastic processes. The oscillator's response will be characterized first in terms of the phase noise $S_\phi$, and then, because phase noise is not easily measured, in terms of the normalized voltage noise $\mathcal{L}$. The result will be a small set of easily extracted parameters that completely describe the response of the oscillator to white and flicker noise sources. These parameters are used when modeling the oscillator. Assume that the perturbation consists of white and flicker noise and so has the form

$S_u(\Delta f) \sim 1 + \frac{f_c}{\Delta f}$. (20)

where $V_1$ is the fundamental Fourier coefficient of $v$, the output signal. It satisfies

$v(t) = \sum_{k=-\infty}^{\infty} V_k e^{j 2\pi k f_o t}$. (23)

In (41) of [15], Demir et al. show that for a free-running oscillator perturbed only by white noise sources†

440 =I

n 2 2 2

A,2,

2

2w 7C + A /

(24)

2

which is a Lorentzian process with corner frequency of /comer = " * « / o At frequencies above the corner,

<25>

which agrees with Vendelin [18],

Then from (19) the response will take the form

$S_\phi(\Delta f) = n\left(\frac{1}{\Delta f^2} + \frac{f_c}{\Delta f^3}\right)$, (21)

where the factor of $(2\pi)^2$ in the denominator of (19) has been absorbed into $n$, the constant of proportionality. Thus, the response of the oscillator to white and flicker noise sources is characterized using just two parameters, $n$ and $f_c$, where $n$ is the portion of $S_\phi$ attributable to the white noise sources alone

Use (21) to extract $f_c$. Then use both (21) and (26) to determine $n$ by choosing $\Delta f$ well above the flicker noise corner frequency, $f_c$, and the corner frequency of (25), $f_{corner}$, to avoid ambiguity and well below $f_o$ to avoid the noise from other sources that occur at these frequencies.

† Demir uses $c$ rather than $n$, where $n = c f_o^2$.
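The extraction of the two oscillator parameters can be sketched numerically. The sketch assumes the phase noise follows the two-term form $S_\phi(\Delta f) = n(1/\Delta f^2 + f_c/\Delta f^3)$ implied by the white and flicker terms of the oscillator listings, and that $S_\phi$ itself is available. In practice one would first convert from the measured $\mathcal{L}$ as the text describes. All function names and numbers are hypothetical.

```python
def s_phi(df, n, fc):
    """Assumed oscillator phase noise model: white plus flicker terms."""
    return n * (1.0 / df**2 + fc / df**3)

def extract_n(df_white, s_measured):
    """Extrapolate to 1 Hz along the 1/df^2 slope from a white-dominated point."""
    return s_measured * df_white**2

def extract_fc(df_flicker, s_measured, n):
    """Solve the assumed model for fc using a flicker-dominated point."""
    return (s_measured * df_flicker**2 / n - 1.0) * df_flicker

# Hypothetical oscillator: n = 1e3 rad^2/Hz at 1 Hz, fc = 10 kHz.
true_n, true_fc = 1e3, 1e4
n_est = extract_n(1e7, s_phi(1e7, true_n, true_fc))            # well above fc
fc_est = extract_fc(1e2, s_phi(1e2, true_n, true_fc), n_est)   # well below fc
print(n_est, fc_est)
```

The estimate of $n$ carries a small error because the flicker term is not exactly zero at the chosen offset, which is why the offset should be well above $f_c$.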

C. Phase-Domain Models for the Oscillators

The phase-domain models for the reference and voltage-controlled oscillators are given in Listings 3 and 4. The VCO model is based on (6). Perhaps the only thing that needs to be explained is the way that phase noise is modeled in the oscillators. Verilog-AMS provides the flicker_noise function for modeling flicker noise, which has a power spectral density proportional to $1/f^{\alpha}$ with $\alpha$ typically being close to 1. However, Verilog-AMS does not limit $\alpha$ to being close to one, making this function well suited to modeling oscillator phase noise, for which $\alpha$ is 2 in the white-phase noise region and close to 3 in the flicker-phase noise region (at frequencies below the flicker noise corner frequency). Alternatively, one could dispense with the noise parameters and use the noise_table function in lieu of the flicker_noise functions to use the measured noise results directly. The "wpn" and "fpn" strings passed to the noise functions are labels for the noise sources. They are optional and can be chosen arbitrarily, though they should not contain any white space. wpn was chosen to represent white phase noise and fpn stands for flicker phase noise.

When interested in the effect of signals coupled into the oscillator through the supplies or the substrate, one would compute the transfer function from the interfering source to the phase output of the oscillator using either a PAC or PXF analysis. Again, one would simply assume that the perturbation in the output of the oscillator is completely in the phase, which is true except at very high offset frequencies. One then employs (12) and (13) to predict the response at the output of the PLL.

Listing 3 — Phase-domain oscillator noise model.

`include "discipline.h"

module oscillator(out);
output out;
phase out;
parameter real n = 0 from [0:inf);    // white output phase noise at 1 Hz (rad^2/Hz)
parameter real fc = 0 from [0:inf);   // flicker noise corner frequency (Hz)

analog begin
    Theta(out) <+ flicker_noise(n, 2, "wpn") + flicker_noise(n*fc, 3, "fpn");
end
endmodule

Listing 4 — Phase-domain VCO noise model.

`include "discipline.h"
`include "constants.h"

module vco(in, out);
input in; output out;
voltage in;
phase out;
parameter real gain = 1 from (0:inf);  // transfer gain, Kvco (Hz/V)
parameter real n = 0 from [0:inf);     // white output phase noise at 1 Hz (rad^2/Hz)
parameter real fc = 0 from [0:inf);    // flicker noise corner frequency (Hz)

analog begin
    Theta(out) <+ 2*`M_PI*gain*idt(V(in));
    Theta(out) <+ flicker_noise(n, 2, "wpn") + flicker_noise(n*fc, 3, "fpn");
end
endmodule

IV. LOOP FILTER

Even in the phase-domain model for the PLL, the loop filter remains in the voltage domain and is represented with a full circuit-level model, as shown in Listing 5. As such, the noise behavior of the filter is naturally included in the phase-domain model without any special effort, assuming that the noise is properly included in the resistor model.

Listing 5 — Loop filter model.

`include "discipline.h"

module loopFilter(n);
electrical n;
ground gnd;    // see footnote †
parameter real c1 = 1n from (0:inf);
parameter real c2 = 200p from (0:inf);
parameter real r = 10K from (0:inf);
electrical int;

capacitor #(.c(c1)) C1(n, gnd);
capacitor #(.c(c2)) C2(n, int);
resistor #(.r(r)) R(int, gnd);
endmodule

† The ground statement is not currently supported in Cadence's Verilog-A implementation, so instead ground is explicitly passed into the module.

V. PHASE DETECTOR AND CHARGE PUMP

As with the VCO, the noise of the PFD/CP as needed by the phase-domain model is found directly with simulation. Simply drive the block with a representative periodic signal, perform a PNoise analysis, and measure the output noise current. In this case, a representative signal would be one that produced periodic switching at the output. This is necessary to capture the noise present during the switching process. Generally the noise appears as in Figure 7, in which case the noise is parameterized with $n$ and $f_c$. $n$ is the noise power density at frequencies above the flicker noise corner frequency, $f_c$, and below the noise bandwidth of the circuit. The phase-domain model for the PFD/CP is given in Listing 6. It is based on (3). Alternatively, as before one could

use the noise_table function in lieu of the white_noise and flicker_noise functions to use the measured noise results directly.

Fig. 7. Extracting the noise parameters, $n$ and $f_c$, for the PFD/CP. The graph is plotted on a log-log scale.

Listing 6 — Phase-domain phase detector noise model.

`include "discipline.h"
`include "constants.h"

module phaseDetector(pin, nin, out);
input pin, nin; output out;
phase pin, nin;
electrical out;
parameter real gain = 1 from (0:inf);  // transfer gain (A/cycle)
parameter real n = 0 from [0:inf);     // white output current noise (A^2/Hz)
parameter real fc = 0 from [0:inf);    // flicker noise corner frequency (Hz)

analog begin
    I(out) <+ gain * Theta(pin,nin) / (2*`M_PI);
    I(out) <+ white_noise(n, "wpn") + flicker_noise(n*fc, 1, "fpn");
end
endmodule

VI. FREQUENCY DIVIDERS

There are several reasons why the process of extracting the noise produced by the frequency dividers is more complicated than that needed for other blocks. First, the phase noise is needed and, as of the time when this document was written, SpectreRF reports on the total noise and does not yet make the phase noise available separately. Secondly, the frequency dividers are always followed by some form of edge-sensitive thresholding circuit, in this case the PFD, which implies that the overall noise behavior of the PLL is only influenced by the noise produced by the divider at the time when the threshold is being crossed in the proper direction. The noise produced by the frequency divider is cyclostationary, meaning that the noise power varies over time. Thus, it is important to analyze the noise behavior of the divider carefully. The second issue is discussed first.

A. Cyclostationary Noise

Formally, the term cyclostationary implies that the autocorrelation function of a stochastic process varies with $t$ in a periodic fashion [19, 20], which in practice is associated with a periodic variation in the noise power of a signal. In general, the noise produced by all of the nonlinear blocks in a PLL is strongly cyclostationary. To understand why, consider the noise produced by a logic circuit, such as the inverter shown in Figure 8. The noise at the output of the inverter, $n_{out}$, comes from different sources depending on the phase of the output signal, $v_{out}$. When the output is high, the output is insensitive to small changes on the input. The transistor $M_P$ is on and the noise at the output is predominantly due to the thermal noise from its channel. This is region A in the figure. When the output is low, the situation is reversed and most of the output noise is due to the thermal noise from the channel of $M_N$. This is region B. When the output is transitioning, thermal noise from both $M_P$ and $M_N$ contribute to the output. In addition, the output is sensitive to small changes in the input. In fact, any noise at the input is amplified before reaching the output. Thus, noise from the input tends to dominate over the thermal noise from the channels of $M_P$ and $M_N$ in this region. Noise at the input includes noise from the previous stage and noise from both devices in the form of flicker noise and thermal noise from gate resistance. This is region C in the figure.

Fig. 8. Noise produced by an inverter ($n_{out}$) as a function of the output signal ($v_{out}$). In region A the noise is dominated by the thermal noise of $M_P$, in region B it is dominated by the thermal noise of $M_N$, and in region C the output noise includes the thermal noise from both devices as well as the amplified noise from the input.

The challenge in estimating the effect of noise passing through a threshold is the difficulty in estimating the noise at the point where the threshold is crossed. There are several different ways of estimating the effect of this noise, but the simplest is to use the strobed noise feature of SpectreRF.† When the strobed noise feature is active, the noise produced by the

circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power-spectral density of the sequence. The sample time would be adjusted to coincide with the desired threshold crossings. Since the T-periodic cyclostationary noise process is sampled every T seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at the sample points is completely ignored.

Fig. 9. Strobed noise. The lower waveform is a highly magnified view of the noise present at the strobe points in $v_n$, which are chosen to coincide with the threshold crossings in $v$.

B. Converting to Phase Noise

The act of converting the noise from a continuous-time process to a discrete-time process by sampling at the threshold crossings makes the conversion into phase noise easier. If $v_n$ is the continuous-time noisy response, and $v$ is the noise-free response (response with the noise sources turned off), then‡

$n_i = v_n(iT) - v(iT)$. (27)

Then if $v_n$ is noisy because it is corrupted with a phase noise process $\phi$, then

$v_n(t) = v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right)$ (28)

and

$n_i = v\!\left(iT + \frac{\phi_i}{2\pi f_o}\right) - v(iT)$. (29)

Assume the phase noise $\phi$ is small and linearize $v$ using a Taylor series expansion

$n_i \approx v(iT) + \frac{dv(iT)}{dt}\frac{\phi_i}{2\pi f_o} - v(iT) = \frac{dv(iT)}{dt}\frac{\phi_i}{2\pi f_o}$. (30)

Finally, $\phi_i$ can be found from $n_i$ using

$\phi_i = \frac{2\pi f_o\, n_i}{dv(iT)/dt}$. (31)

$v$ is $T$ periodic, which makes $dv(iT)/dt$ a constant, and so

$S_\phi(f) = \left[\frac{2\pi f_o}{dv(iT)/dt}\right]^2 S_n(f)$, (32)

where $S_n(f)$ and $S_\phi(f)$ are the power spectral densities of the $n_i$ and $\phi_i$ sequences.

C. Phase-Domain Model for Dividers

To extract the phase noise of a divider, drive the divider with a representative periodic input signal and perform a PSS analysis to determine the threshold crossing times and the slew rate ($dv/dt$) at these times. Then use SpectreRF's strobed PNoise analysis to compute $S_n(f)$. When running PNoise analysis, assure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use $T$ as small as possible. $S_\phi(f)$ is then computed from (32). Generally the noise appears as in Figure 10. Notice that the noise is periodic in $f$ with period $1/T$ because $n$ is a discrete-time sequence with period $T$. The parameters $n$ and $f_c$ for the divider are extracted as illustrated. The high frequency roll-off is generally ignored because it occurs above the frequency range of interest.

Fig. 10. Extracting the noise parameters, $n$ and $f_c$, for the divider.

With ripple counters, one usually only characterizes one stage at a time and combines the phase noise from each stage by assuming that the noise in each stage is independent (true for device noise, would not be true for noise coupling into the divider from external sources). The variation due to phase noise accumulates, however it is necessary to account for the increasing period of the signals at each stage along the ripple counter. Consider an intermediate stage of a $K$-stage ripple counter. The total phase noise at the output of the ripple counter that results due to the phase noise $S_{\phi k}$ at the output of stage $k$ is $(T_k/T_K)^2 S_{\phi k}$. So the total phase noise at the output of the ripple counter is

$S_{\phi out} = \frac{1}{T_K^2}\sum_{k=0}^{K} T_k^2 S_{\phi k}$, (33)

where $S_{\phi 0}$ and $T_0$ are the phase noise and signal period at the input to the first stage of the ripple counter.

With undesired variations in the supplies or in the substrate the resulting phase noise in each stage would be correlated, so one would need to compute the transfer function from the signal source to the phase noise of each stage and combine in a vector sum.

† The strobed-noise feature of SpectreRF is also referred to as its time-domain noise feature.

‡ It is assumed that the sequence $n_i$ is formed by sampling the noise at $iT$, which implies that the threshold crossings also occur at $iT$. In practice, the crossings will occur at some time offset from $iT$. That offset is ignored. It is done without loss of generality with the understanding that the functions $v$ and $v_n$ can always be reformulated to account for the offset.
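The two divider calculations can be sketched in Python as follows: scaling a strobed noise density by the slew rate at the threshold crossing, and combining the independent phase noise of ripple counter stages weighted by the squared ratio of signal periods. The function names and all numeric values are hypothetical, and the stage noise is assumed to be uncorrelated.

```python
import math

def strobed_to_phase_psd(s_n, f_o, slew_rate):
    """Convert strobed noise PSD (V^2/Hz) to phase noise PSD (rad^2/Hz)
    using the squared factor 2*pi*f_o / (dv/dt)."""
    return (2.0 * math.pi * f_o / slew_rate) ** 2 * s_n

def ripple_counter_phase_psd(periods, stage_psds):
    """Sum independent stage contributions at the ripple counter output.

    periods[k], stage_psds[k] for k = 0..K, where index 0 is the input
    of the first stage and index K is the output of the last stage.
    """
    t_out = periods[-1]
    return sum((t_k / t_out) ** 2 * s for t_k, s in zip(periods, stage_psds))

# Hypothetical strobed noise at a crossing with slew rate 2*pi*1e9 V/s.
s_phi_stage = strobed_to_phase_psd(1e-16, 1e9, 2.0 * math.pi * 1e9)

# Hypothetical two-stage divide-by-4: noiseless input, two equal stages.
total = ripple_counter_phase_psd([1e-9, 2e-9, 4e-9], [0.0, 1e-12, 1e-12])
print(s_phi_stage, total)
```

The earlier stage contributes a quarter of its own phase noise power because its period is half that of the output, which reflects the weighting by signal period described above.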

Unlike in ripple counters, phase noise does not accumulate with each stage in synchronous counters. Phase noise at the output of a synchronous counter is independent of the number of stages and consists only of the noise of its clock along with the noise of the last stage.

The phase-domain model for the divider, based on (2), is given in Listing 7. As before, one could use the noise_table function in lieu of the white_noise and flicker_noise functions to use the measured noise results directly.

Listing 7 — Phase-domain divider noise model.

`include "discipline.h"

module divider(in, out);
input in; output out;
phase in, out;
parameter real ratio = 1 from (0:inf);  // divide ratio
parameter real n = 0 from [0:inf);      // white output phase noise (rad^2/Hz)
parameter real fc = 0 from [0:inf);     // flicker noise corner frequency (Hz)

analog begin
    Theta(out) <+ Theta(in) / ratio;
    Theta(out) <+ white_noise(n, "wpn") + flicker_noise(n*fc, 1, "fpn");
end
endmodule

Fig. 11. The block diagram of a fractional-N frequency synthesizer.

VII. FRACTIONAL-N SYNTHESIS

One of the drawbacks of a traditional frequency synthesizer, also known as an integer-N frequency synthesizer, is that the output frequency is constrained to be N times the reference frequency. If the output frequency is to be adjusted by changing N, which is constrained by the divider to be an integer, then the output frequency resolution is equal to the reference frequency. If fine frequency resolution is desired, then the reference frequency must be small. This in turn limits the loop bandwidth as set by the loop filter, which must be at least 10 times smaller than the reference frequency to prevent signal components at the reference frequency from reaching the input of the VCO and modulating the output frequency, creating spurs or sidebands at an offset equal to the reference frequency and its harmonics. A low loop bandwidth is undesirable because it limits the response time of the synthesizer to changes in N. In addition, the loop acts to suppress the phase noise in the VCO at offset frequencies within its bandwidth, so reducing the loop bandwidth acts to increase the total phase noise at the output of the VCO.

The constraint on the loop bandwidth imposed by the required frequency resolution is eliminated if the divide ratio N is not limited to be an integer. This is the idea behind fractional-N synthesis. In practice, one cannot directly implement a frequency divider that implements a non-integer divide ratio except in a few very restrictive cases, so instead a divider that is capable of switching between two integer divide ratios is used, and one rapidly alternates between the two values in such a way that the time-average is equal to the desired non-integer divide ratio [13]. A block diagram for a fractional-N synthesizer is shown in Figure 11. Divide ratios of N and N + 1 are used, where N is the first integer below the desired divide ratio, and N + 1 is the first integer above. For example, if the desired divide ratio is 16.25, then one would alternate between the ratios of 16 and 17, with the ratio of 16 being used 75% of the time. Early attempts at fractional-N synthesis alternated between integer divide ratios in a repetitive manner, which resulted in noticeable spurs in the VCO output spectrum. More recently, ΔΣ modulators have been used to generate a random sequence with the desired duty cycle to control the multi-modulus dividers [21]. This has the effect of trading off the spurs for an increased noise floor, however the ΔΣ modulator can be designed so that most of the power in its output sequence is at frequencies that are above the loop bandwidth, and so are largely rejected by the loop.
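The 16.25 example can be reproduced with a short sketch of modulus control. It uses a simple first-order accumulator, which corresponds to the early repetitive scheme mentioned in the text rather than a ΔΣ modulator, so the resulting sequence is periodic and would produce spurs. The function name `modulus_sequence` is hypothetical.

```python
from fractions import Fraction

def modulus_sequence(ratio, cycles):
    """Choose between N and N+1 so the average divide ratio equals `ratio`.

    Uses a first-order accumulator (the repetitive scheme, not a
    delta-sigma modulator); `ratio` is an exact Fraction.
    """
    base = ratio.numerator // ratio.denominator   # N, the integer below
    frac = ratio - base                           # fractional part
    acc = Fraction(0)
    seq = []
    for _ in range(cycles):
        acc += frac
        if acc >= 1:            # accumulator overflow: divide by N+1 once
            acc -= 1
            seq.append(base + 1)
        else:
            seq.append(base)
    return seq

seq = modulus_sequence(Fraction(65, 4), 8)   # desired ratio 16.25
print(seq, sum(seq) / len(seq))
```

Three of every four cycles use the ratio 16, matching the 75% figure in the text. Replacing the accumulator with a higher-order modulator would randomize the pattern while keeping the same average.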

The phase-domain small-signal model for the combination of a fractional-N divider and a ΔΣ modulator is given in Listing 8. It uses the noise_table function to construct a simple piece-wise linear approximation of the noise produced in an nth-order ΔΣ modulator that is parameterized with the low frequency noise generated by the modulator, along with the corner frequency and the order.

Listing 8 — Phase-domain fractional-N divider model.

`include "discipline.h"

module divider(in, out);
input in; output out;
phase in, out;
parameter real ratio = 1 from (0:inf);      // divide ratio
parameter real n = 0 from [0:inf);          // white output phase noise (rad^2/Hz)
parameter real fc = 0 from [0:inf);         // flicker noise corner frequency (Hz)
parameter real bw = 1 from (0:inf);         // ΔΣ mod bandwidth
parameter integer order = 1 from (0:9);     // ΔΣ mod order
parameter real fmax = 10*bw from (bw:inf);  // maximum frequency of concern

analog begin
    Theta(out) <+ Theta(in) / (ratio + noise_table([
        0, n, bw, n, fmax, n*pow((fmax/bw),order)
    ], "dsn"));
end
endmodule

VIII. JITTER

The signals at the input and output of a PLL are often binary signals, as are many of the signals within the PLL. The noise on binary signals is commonly characterized in terms of jitter. Jitter is an undesired perturbation or uncertainty in the timing of events. Generally, the events of interest are the transitions in a signal. One models jitter in a signal by starting with a noise-free signal $v$ and displacing time with a stochastic process $j$. The noisy signal becomes

$v_n(t) = v(t + j(t))$ (34)

with $j$ assumed to be a zero-mean process and $v$ assumed to be a $T$-periodic function. $j$ has units of seconds and can be interpreted as a noise in time. Alternatively, it can be reformulated as a noise in phase, or phase noise, using

$\phi(t) = 2\pi f_o\, j(t)$, (35)

where $f_o = 1/T$ and

$v_n(t) = v\!\left(t + \frac{\phi(t)}{2\pi f_o}\right)$. (36)

A. Jitter Metrics

Define $\{t_i\}$ as the sequence of times for positive-going zero crossings, henceforth referred to as transitions, that occur in $v_n$. The various jitter metrics characterize the statistics of this sequence. The simplest metric is the edge-to-edge jitter, $J_{ee}$, which is the variation in the delay between a triggering event and a response event. When measuring edge-to-edge jitter, a clean jitter-free input is assumed, and so the edge-to-edge jitter $J_{ee}$ is

$J_{ee}(i) = \sqrt{\mathrm{var}(t_i)}$. (37)

Edge-to-edge jitter assumes an input signal, and so is only defined for driven systems. It is an input-referred jitter metric, meaning that the jitter measurement is referenced to a point on a noise-free input signal, so the reference point is fixed. No such signal exists in autonomous systems. The remaining jitter metrics are suitable for both driven and autonomous systems. They gain this generality by being self-referred, meaning that the reference point is on the noisy signal for which the jitter is being measured. These metrics tend to be a bit more complicated because the reference point is noisy, which acts to increase the measured jitter.

Edge-to-edge jitter is also a scalar jitter metric, and it does not convey any information about the correlation of the jitter between transitions. The next metric characterizes the correlations between transitions as a function of how far the transitions are separated in time.

Define $J_k(i)$ to be the standard deviation of $t_{i+k} - t_i$,

$J_k(i) = \sqrt{\mathrm{var}(t_{i+k} - t_i)}$. (38)

$J_k(i)$ is referred to as k-cycle jitter or long-term jitter†. It is a measure of the uncertainty in the length of $k$ cycles and has units of time. $J_1$, the standard deviation of the length of a single period, is often referred to as the period jitter, and it is denoted $J$, where $J = J_1$.

Another important jitter metric is cycle-to-cycle jitter. Define $T_i = t_{i+1} - t_i$ to be the period of cycle $i$. Then the cycle-to-cycle jitter $J_{cc}$ is

$J_{cc}(i) = \sqrt{\mathrm{var}(T_{i+1} - T_i)}$. (39)

Cycle-to-cycle jitter is like edge-to-edge jitter in that it is a scalar jitter metric that does not contain information about the correlation in the jitter between distant transitions. However, it differs in that it is a measure of short-term jitter that is relatively insensitive to long-term jitter [22]. As such, cycle-to-cycle jitter is the only jitter metric that is suitable for use when flicker noise is present. All other metrics are unbounded in the presence of flicker noise.

If $j(t)$ is either stationary or $T$-cyclostationary, then $\{t_i\}$ is stationary, meaning that these metrics do not vary with $i$, and so $J_{ee}(i)$, $J_k(i)$, and $J_{cc}(i)$ can be shortened to $J_{ee}$, $J_k$, and $J_{cc}$. These jitter metrics are illustrated in Figure 12.

Fig. 12. The various jitter metrics.

† Some people distinguish between k-cycle jitter and long-term jitter by defining the long-term jitter $J_\infty$ as being the k-cycle jitter $J_k$ as $k \to \infty$.

B. Types of Jitter

The type of jitter produced in PLLs can be classified as being from one of two canonical forms. Blocks such as the PFD, CP, and FD are driven, meaning that a transition at their output is a direct result of a transition at their input. The jitter exhibited by these blocks is referred to as synchronous jitter; it is a variation in the delay between when the input is received and the output is produced. Blocks such as the OSC and VCO are autonomous. They generate output transitions not as a result of transitions at their inputs, but rather as a result of the previous output transition. The jitter produced by these blocks is referred to as accumulating jitter; it is a variation in the delay between an output transition and the subsequent output transition. Table I previews the basic characteristics of these two types of jitter. The formulas for jitter given in this table are derived in the next two sections.

TABLE I: THE TWO CANONICAL FORMS OF JITTER.

Jitter Type

Circuit Type

synchronous

driven (pFD/cp ^

. . accumulating |

autonomous ( Q S Q yep)

Period Jitter , /var(« ( r ) ) J = ^ / < f t |
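The difference between the two canonical forms shows up in how the $k$-cycle jitter depends on $k$. The sketch below is an illustrative Monte Carlo experiment in Python, not taken from the paper; the sample count, period, noise level, and seed are arbitrary assumptions. It displaces each transition independently for the synchronous case and perturbs each period independently for the accumulating case.

```python
import random
import statistics

def k_cycle_jitter(times, k):
    """Standard deviation of t[i+k] - t[i]."""
    spans = [times[i + k] - times[i] for i in range(len(times) - k)]
    return statistics.pstdev(spans)

def synchronous_times(n, period, sigma, rng):
    # Each transition is displaced independently about its ideal time.
    return [i * period + rng.gauss(0.0, sigma) for i in range(n)]

def accumulating_times(n, period, sigma, rng):
    # Each period is perturbed, so the timing error accumulates.
    times, t = [], 0.0
    for _ in range(n):
        times.append(t)
        t += period + rng.gauss(0.0, sigma)
    return times

rng = random.Random(1)  # fixed seed, chosen arbitrarily
sync = synchronous_times(20000, 1.0, 0.01, rng)
acc = accumulating_times(20000, 1.0, 0.01, rng)

# Ratio of 16-cycle jitter to period jitter for each form.
sync_ratio = k_cycle_jitter(sync, 16) / k_cycle_jitter(sync, 1)
acc_ratio = k_cycle_jitter(acc, 16) / k_cycle_jitter(acc, 1)
```

Under these assumptions the synchronous ratio stays close to 1, while the accumulating ratio comes out close to $\sqrt{16} = 4$.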

IX. SYNCHRONOUS JITTER

Synchronous jitter is exhibited by driven systems. In the PLL, the PFD/CP and FDs exhibit synchronous jitter. In these components, an output event occurs as a direct result of, and some time after, an input event. Synchronous jitter is an undesired fluctuation in the delay between the input and the output events. If the input is a periodic sequence of transitions, then the frequency of the output signal is exactly that of the input, but the phase of the output signal fluctuates with respect to that of the input. The jitter appears as a modulation of the phase of the output, which is why it is sometimes referred to as phase modulated or PM jitter.

Let $\eta$ be a stationary or $T$-cyclostationary process, then with

$$j_{\text{sync}}(t) = \eta(t), \qquad (40)$$

$$v_n(t) = v(t + j_{\text{sync}}(t)) \qquad (41)$$

exhibits synchronous jitter. If $\eta$ is further restricted to be a white Gaussian stationary or $T$-cyclostationary process, then $v_n(t)$ exhibits simple synchronous jitter. The essential characteristic of simple synchronous jitter is that the jitter in each event is independent or uncorrelated from the others, and (35) shows that it corresponds to white phase noise. Driven circuits exhibit simple synchronous jitter if they are broadband and if the noise sources are white, Gaussian and small. The sources are considered small if the circuit responds linearly to the noise, even though at the same time the circuit may be responding nonlinearly to the periodic drive signal.

For systems that exhibit simple synchronous jitter, from (37),

$$J_{ee}(i) = \sqrt{\operatorname{var}(j_{\text{sync}}(t_i))}. \qquad (42)$$

Similarly, from (38),

$$J_k(i) = \sqrt{\operatorname{var}\big([(i+k)T + j_{\text{sync}}(t_{i+k})] - [iT + j_{\text{sync}}(t_i)]\big)}, \qquad (43)$$

$$J_k(i) = \sqrt{\operatorname{var}(j_{\text{sync}}(t_{i+k})) + \operatorname{var}(j_{\text{sync}}(t_i))}, \qquad (44)$$

$$J_k(i) = \sqrt{2\operatorname{var}(j_{\text{sync}}(t_i))}, \qquad (45)$$

$$J_k(i) = \sqrt{2}\, J_{ee}(i). \qquad (46)$$

Since $j_{\text{sync}}(t)$ is $T$-cyclostationary, $j_{\text{sync}}(t_i)$ is independent of $i$, and so are $J_{ee}$ and $J_k$. The factor of $\sqrt{2}$ in (46) stems from the length of an interval including the independent variation from two transitions. From (46), $J_k$ is independent of $k$, and so

$$J_k = J \quad \text{for } k = 1, 2, \ldots \qquad (47)$$

Using similar arguments, one can show that with simple synchronous jitter the difference between adjacent periods, $T_{i+1} - T_i = t_{i+2} - 2t_{i+1} + t_i$, has a variance of $6\operatorname{var}(j_{\text{sync}}) = 3J^2$, and so

$$J_{cc} = \sqrt{3}\, J. \qquad (48)$$

Generally, the jitter produced by the PFD/CP and FDs is well approximated by simple synchronous jitter if one can neglect flicker noise.

A. Extracting Synchronous Jitter

The jitter in driven blocks, such as the PFD/CP or FDs, occurs because of an interaction between noise present in the blocks and the thresholds that are inherent to logic circuits. In systems where signals are continuous valued, an event is usually defined as a signal crossing a threshold in a particular direction. The threshold crossings of a noiseless periodic signal, $v(t)$, are precisely evenly spaced. However, when noise is added to the signal, $v_n(t) = v(t) + n_v(t)$, each threshold crossing is displaced slightly. Thus, a threshold converts additive noise to synchronous jitter.

The amount of displacement in time is determined by the amplitude of the noise signal, $n_v(t)$, and the slew rate of the periodic signal, $dv(t_c)/dt$, as the threshold is crossed, as shown in Figure 13 [23]. If the noise $n_v$ is stationary, then

$$\operatorname{var}(j_{\text{sync}}(t_c)) \approx \frac{\operatorname{var}(n_v)}{(dv(t_c)/dt)^2} \qquad (49)$$

where $t_c$ is the time of a threshold crossing in $v$ (assuming the noise is small).

Fig. 13. How a threshold converts noise into jitter. (Figure labels: threshold, noise histogram of width $\Delta v$, jitter histogram of width $\Delta t$, crossing time $t_c$.)
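As a rough numerical illustration of (49), the sketch below divides an rms noise level by the slew rate at the threshold crossing. It is a Python sketch under assumed values (a 1 V, 100 MHz sine wave with 1 mV rms of stationary noise); neither the numbers nor the function name come from the paper.

```python
import math

def threshold_jitter(noise_rms, slew_rate):
    """Approximate rms jitter at a threshold crossing, per (49):
    sqrt(var(n_v)) divided by the magnitude of dv/dt at the crossing."""
    return noise_rms / abs(slew_rate)

# Assumed example values (hypothetical, for illustration only).
amplitude = 1.0      # volts
freq = 100e6         # hertz
noise_rms = 1e-3     # volts rms

# Slope of amplitude*sin(2*pi*freq*t) at its zero crossing.
slew = 2 * math.pi * freq * amplitude

jitter = threshold_jitter(noise_rms, slew)
```

With these assumed values the jitter is about 1.6 ps. Doubling the slew rate halves the jitter, which is why fast edges at the threshold are preferred.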

Generally $n_v$ is not stationary, but cyclostationary (refer back to Section VI-A). It is only important to know when the noisy periodic signal $v_n(t)$ crosses the threshold, so the statistics of $n_v$ are only significant at the time when $v_n(t)$ crosses the threshold,

$$\operatorname{var}(j_{\text{sync}}(t_c)) \approx \frac{\operatorname{var}(n_v(t_c))}{(dv(t_c)/dt)^2}. \qquad (50)$$

The jitter is computed from (42) using (49) or (50),

$$J_{ee} = \frac{\sqrt{\operatorname{var}(n_v(t_c))}}{dv(t_c)/dt}. \qquad (51)$$

To compute $\operatorname{var}(n_v(t_c))$, one starts by driving the circuit with a representative periodic signal, and then sampling $v(t)$ at intervals of $T$ to form the ergodic sequence $\{v(t_i)\}$ where $t_i = t_c$ for some $i$. Then the variance is computed by computing the power spectral density for the sequence and integrating from $f = -f_o/2$ to $f_o/2$. Recall that the noise is periodic in $f$ with period $f_o = 1/T$ because $n$ is a discrete-time sequence with rate $T$.

In practice, this is done by using the strobed noise capability of SpectreRF† to compute the power spectral density of the sequence. When the strobed noise feature is active, the noise produced by the circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power spectral density of the sequence. The sample time should be adjusted to coincide with the desired threshold crossings. Since the $T$-periodic cyclostationary noise process is sampled every $T$ seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at the sample points is completely ignored.

1) Extracting the Jitter of Dividers: To extract the jitter of a divider, drive the divider with a representative periodic input signal and perform a PSS analysis to determine the threshold crossing times and the slew rate ($dv/dt$) at these times. Then use SpectreRF's strobed PNoise analysis to compute $S_n(f)$. The sample point should be set to coincide with the point where the output signal crosses the threshold of the subsequent stage (the phase detector) in the appropriate direction. When running PNoise analysis, assure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use T as small as possible. SpectreRF computes the power spectral density, which is integrated to compute the total noise at the sample points,

$$\operatorname{var}(n_v(t_c)) = \int_{-f_o/2}^{f_o/2} S_n(f, t_c)\, df. \qquad (52)$$

Then $J_{ee}$ is computed from (51).

With ripple counters, one usually only characterizes one stage at a time. The total jitter due to noise in the ripple counter is then computed by assuming that the jitter in each stage is independent (again, this is true for device noise, but not for noise coupling into the divider from external sources) and taking the square root of the sum of the squares of the jitter on each stage.

Unlike in ripple counters, jitter does not accumulate with synchronous counters. Jitter in a synchronous counter is independent of the number of stages and consists only of the jitter of its clock along with the jitter of the last stage.

2) Extracting the Jitter of the Phase Detector: The PFD/CP is not followed by a threshold. Rather, it feeds into the LF, which is sensitive to the noise emitted by the CP at all times, not just during transitions. This argues that the noise of the PFD/CP be modeled as a continuous noise current. However, as mentioned earlier, doing so is problematic for simulators and would require very tight tolerances and small time steps. So instead, the noise of the PFD/CP is referred back to its inputs. The inputs of the PFD/CP are edge triggered, so the noise can be referred back as jitter.

To extract the input-referred jitter of a PFD/CP, drive both inputs with periodic signals with offset phase so that the PFD/CP produces a representative output. Use SpectreRF's PNoise analysis to compute the output noise over the total bandwidth of the PFD/CP (in this case, use the conventional noise analysis rather than the strobed noise analysis). Choose the frequency range of the analysis so that the total noise at frequencies outside the range is negligible. Thus, the noise should be at least 40 dB down and dropping at the highest frequency simulated. Integrate the noise over frequency and apply the Wiener-Khinchin Theorem [24] to determine

$$\operatorname{var}(n) = \int_{-\infty}^{\infty} S_n(f)\, df, \qquad (53)$$

the total output noise current squared [19]. Then either calculate or measure the effective gain of the PFD/CP, $K_{\text{det}}$. Scale the gain so that it has the units of amperes per second. Then divide the total output noise current by the gain and account for there being two transitions per cycle to distribute the noise over to determine the input-referred jitter for the PFD/CP,

$$J_{ee,\text{PFD/CP}} = \frac{T}{2\pi K_{\text{det}}} \sqrt{\frac{\operatorname{var}(n)}{2}}. \qquad (54)$$

As before, when running PNoise analysis, assure that the maxsidebands parameter is set sufficiently large to capture all significant noise folding. A large value will slow the simulation. To reduce the number of sidebands needed, use T as small as possible.

† The strobed-noise feature of SpectreRF is also referred to as its time-domain noise feature.

X. ACCUMULATING JITTER
Accumulating jitter is exhibited by autonomous systems, such as oscillators, that generate a stream of spontaneous output

transitions. In the PLL, the OSC and VCO exhibit accumulating jitter. Accumulating jitter is characterized by an undesired variation in the time since the previous output transition, thus the uncertainty of when a transition occurs accumulates with every transition. Compared with a jitter-free signal, the frequency of a signal exhibiting accumulating jitter fluctuates randomly, and the phase drifts without bound. Thus, the jitter appears as a modulation of the frequency of the output, which is why it is sometimes referred to as frequency modulated or FM jitter.

Again assume that $\eta$ is a stationary or $T$-cyclostationary process, then with

$$j_{\text{acc}}(t) = \int_0^t \eta(\tau)\, d\tau, \qquad (55)$$

$$v_n(t) = v(t + j_{\text{acc}}(t)) \qquad (56)$$

exhibits accumulating jitter. While $\eta$ is cyclostationary and so has bounded variance, (55) shows that the variance of $j_{\text{acc}}$, and hence the phase difference between $v(t)$ and $v_n(t)$, is unbounded.

If $\eta$ is further restricted to be a white Gaussian stationary or $T$-cyclostationary random process, then $v_n(t)$ exhibits simple accumulating jitter. In this case, the process $\{j_{\text{acc}}(iT)\}$ that results from sampling $j_{\text{acc}}$ every $T$ seconds is a discrete Wiener process and the phase difference between $v(iT)$ and $v_n(iT)$ is a random walk [19]. As shown next, simple accumulating jitter corresponds to oscillator phase noise that results from white noise sources.

The essential characteristic of simple accumulating jitter is that the incremental jitter that accumulates over each cycle is independent or uncorrelated. Autonomous circuits exhibit simple accumulating jitter if they are broadband and if the noise sources are white, Gaussian and small. The sources are considered small if the circuit responds linearly to the noise, though at the same time the circuit may be responding nonlinearly to the oscillation signal. An autonomous circuit is considered broadband if there are no secondary resonant responses close in frequency to the primary resonance.†

† Oscillators are strongly nonlinear circuits undergoing large periodic variations, and so signals within the oscillator freely mix up and down in frequency by integer multiples of the oscillation frequency. For this reason, any low frequency time constants or resonances in supply or bias lines would effectively act like close-in secondary resonances. In fact, this is the most likely cause of such phenomena.

For systems that exhibit simple accumulating jitter, each transition is relative to the previous transition, and the variation in the length of each period is independent, so the variance in the time of each transition accumulates,

$$J_k = \sqrt{k}\, J \quad \text{for } k = 0, 1, 2, \ldots, \qquad (57)$$

where

$$J = \sqrt{\operatorname{var}(j_{\text{acc}}(t_i + T) - j_{\text{acc}}(t_i))}. \qquad (58)$$

Similarly,

$$J_{cc} = \sqrt{2}\, J. \qquad (59)$$

Generally, the jitter produced by the OSC and VCO is well approximated by simple accumulating jitter if one can neglect flicker noise.

A. Extracting Accumulating Jitter

The jitter in autonomous blocks, such as the OSC or VCO, is almost completely due to oscillator phase noise. Oscillator phase noise is a variation in the phase of the oscillator as it proceeds along its limit cycle. In order to determine the period jitter $J$ of $v_n(t)$ for a noisy oscillator, assume that it exhibits simple accumulating jitter so that $\eta$ in (55) is a white Gaussian $T$-cyclostationary noise process (this excludes flicker noise) with a power spectral density of

$$S_\eta(f) = a, \qquad (60)$$

and an autocorrelation function of

$$R_\eta(t_1, t_2) = a\,\delta(t_1 - t_2), \qquad (61)$$

where $\delta$ is a Dirac delta function. Then

$$j_{\text{acc}}(t) = \int_0^t \eta(\tau)\, d\tau \qquad (62)$$

is a Wiener process [19], which has an autocorrelation function of

$$R_{j_{\text{acc}}}(t_1, t_2) = a \min(t_1, t_2). \qquad (63)$$

The period jitter is the standard deviation of the variation in one period, and so

$$J^2 = \operatorname{var}(j_{\text{acc}}(t+T) - j_{\text{acc}}(t)), \qquad (64)$$

$$J^2 = \mathrm{E}[(j_{\text{acc}}(t+T) - j_{\text{acc}}(t))^2], \qquad (65)$$

$$J^2 = \mathrm{E}[j_{\text{acc}}(t+T)^2 - 2 j_{\text{acc}}(t+T)\, j_{\text{acc}}(t) + j_{\text{acc}}(t)^2], \qquad (66)$$

$$J^2 = \mathrm{E}[j_{\text{acc}}(t+T)^2] - 2\mathrm{E}[j_{\text{acc}}(t+T)\, j_{\text{acc}}(t)] + \mathrm{E}[j_{\text{acc}}(t)^2], \qquad (67)$$

$$J^2 = R_{j_{\text{acc}}}(t+T, t+T) - 2R_{j_{\text{acc}}}(t+T, t) + R_{j_{\text{acc}}}(t, t), \qquad (68)$$

$$J^2 = a(t+T) - 2at + at, \qquad (69)$$

$$J = \sqrt{aT}. \qquad (70)$$

We now have a way of relating the jitter of the oscillator to the PSD of $\eta$. However, $\eta$ is not measurable, so instead the jitter is related to the phase noise $S_\phi$. To do so, consider simple accumulating jitter written in terms of phase,

$$\phi_{\text{acc}}(t) = 2\pi f_o\, j_{\text{acc}}(t) = 2\pi f_o \int_0^t \eta(\tau)\, d\tau, \qquad (71)$$

where $f_o = 1/T$. From (60) and (71) the PSD of $\phi_{\text{acc}}$ is

$$S_{\phi_{\text{acc}}}(\Delta f) = a\,\frac{(2\pi f_o)^2}{(2\pi \Delta f)^2} = a\,\frac{f_o^2}{\Delta f^2}. \qquad (72)$$

From (26),

$$\mathcal{L}(\Delta f) = \frac{1}{2} S_{\phi_{\text{acc}}}(\Delta f) = \frac{a f_o^2}{2\Delta f^2}, \qquad (73)$$

$$a = \frac{2\mathcal{L}(\Delta f)\,\Delta f^2}{f_o^2}. \qquad (74)$$

Determine $a$ by choosing $\Delta f$ well above the corner frequency, $f_{\text{corner}}$, to avoid ambiguity and well below $f_o$ to avoid the noise from other sources that occur at these frequencies.

1) Example: To compute the jitter of an oscillator, an RF simulator such as SpectreRF is used to find $\mathcal{L}$ and $f_o$ of the oscillator. Given these, $a$ is found with (74), $J$ is found with (70) and $J_k$ is found with (57). This procedure is demonstrated for the oscillator shown in Figure 14. This is a very low noise oscillator designed in 0.35 µm CMOS by Rael and Abidi [25]. The frequency of oscillation is 1.1 GHz and the resonator has a loaded Q of 6.

Fig. 14. Differential LC oscillator.

The procedure starts by using an RF simulator such as SpectreRF to compute the normalized phase noise $\mathcal{L}$. Its PNoise analysis is used, with the maxsidebands parameter set to at least 10 to adequately account for noise folding within the oscillator.† In this case, $\mathcal{L} = -110$ dBc/Hz at 100 kHz offset from the carrier. Apply (74) to compute $a$ from $\mathcal{L}$, where $\mathcal{L}(\Delta f) = 10^{-11}$, $\Delta f = 100$ kHz, and $f_o = 1.1$ GHz,

$$a = 2 \cdot 10^{-11} \left(\frac{100\ \text{kHz}}{1.1\ \text{GHz}}\right)^2 = 165.3 \times 10^{-21}. \qquad (75)$$

The period jitter $J$ is then computed from (70),

$$J = \sqrt{aT} = \sqrt{\frac{a}{f_o}} = \sqrt{\frac{165.3 \times 10^{-21}}{1.1\ \text{GHz}}} = 12.3\ \text{fs}. \qquad (76)$$

In this example, the noise was extracted for the VCO alone. In practice, the LF is generally combined with the VCO before extracting the noise so that the noise of the LF is accounted for.

XI. JITTER OF A PLL

If a PLL synthesizer is constructed from blocks that exhibit simple synchronous and accumulating jitter, then the jitter behavior of the PLL is relatively easy to estimate [26]. Assume that the PLL has a closed-loop bandwidth of $f_L$, and that $\tau_L = 1/(2\pi f_L)$, then for $k$ such that $kT \ll \tau_L$, jitter from the VCO dominates and the PLL exhibits simple accumulating jitter equal to that produced by the VCO. Similarly, at large $k$ (low frequencies), the PLL exhibits simple accumulating jitter equal to that produced by the OSC. Between these two extremes, the PLL exhibits simple synchronous jitter, the amount of which depends on the characteristics of the loop and the level of synchronous jitter exhibited by the FDs and the PFD/CP. The behavior of such a PLL is shown in Figure 15.

Fig. 15. Long-term jitter ($J_k$) for an idealized PLL as a function of the number of cycles. (Axes: $\log(J_k)$ versus $\log(k)$; regions labeled accumulating jitter from VCO, synchronous jitter from PFD/CP and FDs, accumulating jitter from OSC.)

XII. MODELING A PLL WITH JITTER

The basic behavioral models for the blocks that make up a PLL are well known and so will not be discussed here in any depth [27, 28]. Instead, only the techniques for adding jitter to the models are discussed. Jitter is modeled in an AHDL by dithering the time at which events occur. This is efficient because it does not create any additional activity; rather, it simply changes the time when existing activity occurs. Thus, models with jitter can run as efficiently as those without.

A. Modeling Driven Blocks

A feature of Verilog-A allows especially simple modeling of synchronous jitter. The transition() function, which is used to model signal transitions between discrete levels, provides a delay argument that can be dithered on every transition. The delay argument must not be negative, so a fixed delay that is greater than the maximum expected deviation of the jitter must be included. This approach is suitable for any model that exhibits synchronous jitter and generates discrete-valued outputs. It is used in the Verilog-A divider module shown in Listing 9, which models synchronous jitter with (41) where $j_{\text{sync}}$ is a stationary white discrete-time Gaussian random process. It is also used in Listing 10, which models a simple PFD/CP.
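The requirement that the fixed delay exceed the largest expected dither can be quantified. Listings 9 and 10 restrict the jitter parameter to less than td/5; the Python sketch below estimates how often a Gaussian dither would still drive the total delay negative. It is an illustrative calculation only: the function name and the 1 ns delay are assumptions, and the paper does not state this probability.

```python
import math

def negative_delay_probability(td, jitter):
    """Probability that td + dt < 0 when dt is zero-mean Gaussian
    with standard deviation `jitter` (td and jitter in the same units)."""
    if jitter == 0:
        return 0.0
    # P(dt < -td) = Phi(-td/jitter) = 0.5*erfc(td / (jitter*sqrt(2)))
    return 0.5 * math.erfc(td / (jitter * math.sqrt(2.0)))

# Hypothetical values: td = 1 ns with jitter at the td/5 limit (5 sigma),
# and with a much larger jitter of td/2 (2 sigma) for comparison.
p_five_sigma = negative_delay_probability(1e-9, 0.2e-9)
p_two_sigma = negative_delay_probability(1e-9, 0.5e-9)
```

At the td/5 limit a negative delay is expected on fewer than one transition in a million, whereas at td/2 roughly 2% of transitions would be affected.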

† At one point it was mistakenly suggested in the documentation for SpectreRF that maxsidebands should be set to 0 for oscillators. This causes SpectreRF to ignore all noise folding and results in a significant underestimation of the total noise.

1) Frequency Divider Model: The model, given in Listing 9, operates by counting input transitions. This is done in the @ cross block. The cross function triggers the @ block at the precise moment when its first argument crosses zero in the direction specified by the second argument. Thus, the @ block is triggered when the input crosses the threshold in the user-specified direction. The body of the @ block increments the count, resets it to zero when it reaches ratio, then determines if count is above or below its midpoint (n is zero if the count is below the midpoint). It also generates a new random dither dt that is used later. Outside the @ block is code that executes continuously. It processes n to create the output. The value of the ?: operator is Vhi if n is 1 and Vlo if n is 0. Finally, the transition function adds a finite transition time of tt and a delay of td + dt. The finite transition time removes the discontinuities from the signal that could cause problems for the simulator. The jitter is embodied in dt, which varies randomly from transition to transition. To avoid negative delays, td must always be larger than dt. This model expects jitter to be specified as $J_{ee}$, as computed with (51).

Listing 9. Frequency divider that models synchronous jitter.

    `include "discipline.h"

    module divider (out, in);

    input in; output out; electrical in, out;

    parameter real Vlo=-1, Vhi=1;
    parameter integer ratio=2 from [2:inf);
    parameter integer dir=1 from [-1:1] exclude 0;
        // dir=1 for positive edge trigger
        // dir=-1 for negative edge trigger
    parameter real tt=1n from (0:inf);
    parameter real td=0 from (0:inf);
    parameter real jitter=0 from [0:td/5);  // edge-to-edge jitter
    parameter real ttol=1p from (0:td/5);   // ttol << jitter

    integer count, n, seed;
    real dt;

    analog begin
        @(initial_step) seed = -311;

        @(cross(V(in) - (Vhi + Vlo)/2, dir, ttol)) begin
            // count input transitions
            count = count + 1;
            if (count >= ratio)
                count = 0;
            n = (2*count >= ratio);
            // add jitter
            dt = jitter*$dist_normal(seed,0,1);
        end

        V(out) <+ transition(n ? Vhi : Vlo, td+dt, tt);
    end
    endmodule

2) PFD/CP Model: The model for a phase/frequency detector combined with a charge pump is given in Listing 10. It implements a finite-state machine with a three-level output, −1, 0 and +1. On every transition of the VCO input in direction dir, the output is incremented. On every transition of the reference input in the direction dir, the output is decremented. If both the VCO and reference inputs are at the same frequency, then the average value of the output is proportional to the phase difference between the two, with the average being negative if the reference transition leads the VCO transition and positive otherwise [3]. As before, the time of the output transitions is randomly dithered by dt to model jitter. The output is modeled as an ideal current source and a finite transition time provides a simple model of the dead band in the CP.

Listing 10. PFD/CP model with synchronous jitter.

    `include "discipline.h"

    module pfd_cp (out, ref, vco);

    input ref, vco; output out; electrical ref, vco, out;

    parameter real Iout=100u;
    parameter integer dir=1 from [-1:1] exclude 0;
        // dir=1 for positive edge trigger
        // dir=-1 for negative edge trigger
    parameter real tt=1n from (0:inf);
    parameter real td=0 from (0:inf);
    parameter real jitter=0 from [0:td/5);  // edge-to-edge jitter
    parameter real ttol=1p from (0:td/5);   // ttol << jitter

    integer state, seed;
    real dt;

    analog begin
        @(initial_step) seed = 716;

        @(cross(V(ref), dir, ttol)) begin
            if (state > -1) state = state - 1;
            dt = jitter*$dist_normal(seed,0,1);
        end

        @(cross(V(vco), dir, ttol)) begin
            if (state < 1) state = state + 1;
            dt = jitter*$dist_normal(seed,0,1);
        end

        I(out) <+ transition(Iout*state, td + dt, tt);
    end
    endmodule

B. Modeling Accumulating Jitter

1) OSC Model: The delay argument of the transition() function cannot be used to model accumulating jitter because of the accumulating nature of this type of jitter. When modeling a fixed frequency oscillator, the timer() function is used as shown in Listing 11. At every output transition, the next transition is scheduled using the timer() function to be $T/K + J\delta/\sqrt{K}$ in the future, where $\delta$ is a unit-variance zero-mean random process and $K$ is the number of output transitions per period. Typically, $K = 2$.

C. VCO Model

A VCO generates a sine or square wave whose frequency is proportional to the input signal level. VCO models, given in

Listings 12 and 13, are constructed using three serial operations, as shown in Figure 16. First, the input signal is scaled to compute the desired output frequency. Then, the frequency is integrated to compute the output phase. Finally, the phase is used to generate the desired output signal. The phase is computed with idtmod, a function that provides integration followed by a modulus operation. This serves to keep the phase bounded, which prevents a loss of numerical precision that would otherwise occur when the phase became large after a long period of time. Output transitions are generated when the phase passes $-\pi/2$ and $\pi/2$.

Fig. 16. Block diagram of VCO behavioral model that includes jitter.

Listing 11. Fixed frequency oscillator with accumulating jitter.

    `include "discipline.h"

    module osc (out);

    output out; electrical out;

    parameter real freq=1 from (0:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01/freq from (0:inf);
    parameter real jitter=0 from [0:0.1/freq);  // period jitter

    integer n, seed;
    real next, dT;

    analog begin
        @(initial_step) begin
            seed = 286;
            next = 0.5/freq + $abstime;
        end

        @(timer(next)) begin
            n = !n;
            dT = jitter*$dist_normal(seed,0,1);
            next = next + 0.5/freq + 0.707*dT;
        end

        V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
    end
    endmodule

The jitter is modeled as a random variation in the frequency of the VCO. However, the jitter is specified as a variation in the period, thus it is necessary to relate the variation in the period to the variation in the frequency. Assume that without jitter, the period is divided into $K$ equal intervals of duration $\tau = T/K = 1/(K f_c)$. The frequency deviation will be updated every interval and held constant during the intervals. With jitter, the duration of an interval is

$$\tau_i = \tau + \Delta\tau_i. \qquad (77)$$

$\Delta\tau$ is a random variable with variance

$$\operatorname{var}(\Delta\tau) = \frac{J^2}{K}. \qquad (78)$$

Therefore,

$$\Delta\tau_i = \frac{J\,\delta_i}{\sqrt{K}} \qquad (79)$$

where $\delta$ is a zero-mean unit-variance Gaussian random process. The dithered frequency is

$$f_i = \frac{1}{K\tau_i} = \frac{1}{K(\tau + \Delta\tau_i)} = \frac{f_c}{1 + K\,\Delta\tau_i f_c}. \qquad (80)$$

Let $\Delta T_i = K\,\Delta\tau_i$, then

$$f_i = \frac{f_c}{1 + \Delta T_i f_c}. \qquad (81)$$

Finally, $\operatorname{var}(\tau_i) = J^2/K$, so $\Delta\tau_i = J\delta_i/\sqrt{K}$ and $\Delta T_i = \sqrt{K}\, J\, \delta_i$.

The @ cross statement is used to determine the exact time when the phase crosses the thresholds, indicating the beginning of a new interval. At this point, a new random trial $\delta_i$ is generated.

The final model is given in Listing 12. This model can be easily modified to fit other needs. Converting it to a model that generates sine waves rather than square waves simply requires replacing the last two lines with one that computes and outputs the sine of the phase. When doing so, consider reducing the number of jitter updates to one per period, in which case the factor of 1.414 should be changed to 1. Listing 13 is a Verilog-A model for a quadrature VCO that exhibits accumulating jitter. It is an example of how to model an oscillator with multiple outputs so that the jitter on the outputs is properly correlated.

D. Efficiency of the Models

Conceptually, a model that includes jitter should be just as efficient as one that does not because jitter does not increase the activity of the models; it only affects the timing of particular events. However, if jitter causes two events that would normally occur at the same time to be displaced so that they are no longer coincident, then a circuit simulator will have to use more time points to resolve the distinct events and so will run more slowly. For this reason, it is desirable to combine jitter sources to the degree possible.

To make the HDL models even faster, rewrite them in either Verilog-HDL or Verilog-AMS. Be sure to set the time resolution to be sufficiently small to prevent the discrete nature of time in these simulators from adding an appreciable amount of jitter.

1) Including Synchronous Jitter into OSC: One can combine the output-referred noise of FD_M and FD_N and the input-

referred noise of the PFD/CP with the output noise of OSC. A modified fixed-frequency oscillator model that supports two jitter parameters and the divide ratio M is given in Listing 14 (more on the effect of the divide ratio on jitter in the next section). The accJitter parameter is used to model the accumulating jitter of the reference oscillator, and the syncJitter parameter is used to model the synchronous jitter of FD_M, FD_N and PFD/CP. Synchronous jitter is modeled in the oscillator without using a nonzero delay in the transition function. This is a more efficient approach because it avoids generating two unnecessary events per period. To get full benefit from this optimization, a modified PFD/CP given in Listing 15 is used. This model runs more efficiently by removing support for jitter and the td parameter.

Listing 12. VCO model that includes accumulating jitter.

    `include "discipline.h"
    `include "constants.h"

    module vco (out, in);

    input in; output out; electrical out, in;

    parameter real Vmin=0;
    parameter real Vmax=Vmin+1 from (Vmin:inf);
    parameter real Fmin=1 from (0:inf);
    parameter real Fmax=2*Fmin from (Fmin:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real tt=0.01/Fmax from (0:inf);
    parameter real jitter=0 from [0:0.25/Fmax);  // period jitter
    parameter real ttol=1u/Fmax from (0:1/Fmax);

    real freq, phase, dT;
    integer n, seed;

    analog begin
        @(initial_step) seed = -561;

        // compute the freq from the input voltage
        freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin) + Fmin;

        // bound the frequency (this is optional)
        if (freq > Fmax) freq = Fmax;
        if (freq < Fmin) freq = Fmin;

        // add the phase noise
        freq = freq/(1 + dT*freq);

        // phase is the integral of the freq modulo 2π
        phase = 2*`M_PI*idtmod(freq, 0.0, 1.0, -0.5);

        // update jitter twice per period
        // 1.414=sqrt(K), K=2 jitter updates/period
        @(cross(phase + `M_PI/2, +1, ttol) or cross(phase - `M_PI/2, +1, ttol)) begin
            dT = 1.414*jitter*$dist_normal(seed,0,1);
            n = (phase >= -`M_PI/2) && (phase < `M_PI/2);
        end

        // generate the output
        V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
    end
    endmodule

Listing 13. Quadrature differential VCO model that includes accumulating jitter.

    `include "discipline.h"
    `include "constants.h"

    module quadVco (PIout, NIout, PQout, NQout, Pin, Nin);

    electrical PIout, NIout, PQout, NQout, Pin, Nin;
    output PIout, NIout, PQout, NQout;
    input Pin, Nin;

    parameter real Vmin=0;
    parameter real Vmax=Vmin+1 from (Vmin:inf);
    parameter real Fmin=1 from (0:inf);
    parameter real Fmax=2*Fmin from (Fmin:inf);
    parameter real Vlo=-1, Vhi=1;
    parameter real jitter=0 from [0:0.25/Fmax);  // period jitter
    parameter real ttol=1u/Fmax from (0:1/Fmax);
    parameter real tt=0.01/Fmax;

    real freq, phase, dT;
    integer i, q, seed;

    analog begin
        @(initial_step) seed = 133;

        // compute the freq from the input voltage
        freq = (V(Pin,Nin) - Vmin) * (Fmax - Fmin) / (Vmax - Vmin) + Fmin;

        // bound the frequency (this is optional)
        if (freq > Fmax) freq = Fmax;
        if (freq < Fmin) freq = Fmin;

        // add the phase noise
        freq = freq/(1 + dT*freq);

        // phase is the integral of the freq modulo 2π
        phase = 2*`M_PI*idtmod(freq, 0.0, 1.0, -0.5);

        // update jitter where phase crosses π/2
        // 2=sqrt(K), K=4 jitter updates per period
        @(cross(phase - 3*`M_PI/4, +1, ttol) or cross(phase - `M_PI/4, +1, ttol)
                or cross(phase + `M_PI/4, +1, ttol) or cross(phase + 3*`M_PI/4, +1, ttol)) begin
            dT = 2*jitter*$dist_normal(seed,0,1);
            i = (phase >= -3*`M_PI/4) && (phase < `M_PI/4);
            q = (phase >= -`M_PI/4) && (phase < 3*`M_PI/4);
        end

        // generate the I and Q outputs
        V(PIout) <+ transition(i ? Vhi : Vlo, 0, tt);
        V(NIout) <+ transition(i ? Vlo : Vhi, 0, tt);
        V(PQout) <+ transition(q ? Vhi : Vlo, 0, tt);
        V(NQout) <+ transition(q ? Vlo : Vhi, 0, tt);
    end
    endmodule

2) Merging the VCO and FD_N: If the output of the VCO is not used to drive circuitry external to the synthesizer, if the divider exhibits simple synchronous jitter, and if the VCO exhibits simple accumulating jitter, then it is possible to include the frequency division aspect of the FD_N as part of the

64

Listing 14 — Fixed-frequency oscillator with accumulating and synchronous jitter.

Listing 15 — PFD/CP without jitter. include "discipline.h"

include "discipline.h"

module pfd_cp (out, ref, vco);

module osc (out);

input ref, vco; output out; electrical ref, vco, out;

output out;
electrical out;
parameter real freq=1 from (0:inf);
parameter real ratio=1 from (0:inf);
parameter real Vlo=-1, Vhi=1;
parameter real tt=0.01*ratio/freq from (0:inf);
parameter real accJitter=0 from [0:0.1/freq); // period jitter
parameter real syncJitter=0 from [0:0.1*ratio/freq); // edge-to-edge jitter
integer n, accSeed, syncSeed;
real next, dT, dt, accSD, syncSD;

analog begin
    @(initial_step) begin
        accSeed = 286;
        syncSeed = -459;
        accSD = accJitter*sqrt(ratio/2);
        syncSD = syncJitter;
        next = 0.5/freq + $abstime;
    end

    @(timer(next + dt)) begin
        n = !n;
        dT = accSD*$dist_normal(accSeed,0,1);
        dt = syncSD*$dist_normal(syncSeed,0,1);
        next = next + 0.5*ratio/freq + dT;
    end
    V(out) <+ transition(n ? Vhi : Vlo, 0, tt);
end
endmodule

parameter real Iout=100u;
parameter integer dir=1 from [-1:1] exclude 0;
    // dir = 1 for positive edge trigger
    // dir = -1 for negative edge trigger
parameter real tt=1n from (0:inf);
parameter real ttol=1p from (0:inf);
integer state;

analog begin
    @(cross(V(ref), dir, ttol)) begin
        if (state > -1) state = state - 1;
    end
    @(cross(V(vco), dir, ttol)) begin
        if (state < 1) state = state + 1;
    end
    I(out) <+ transition(Iout * state, 0, tt);
end
endmodule

VCO by simply adjusting the VCO gain and jitter. If the divide ratio of FD_N is large, the simulation runs much faster because the high VCO output frequency is never generated. The Verilog-A model for the merged VCO and FD_N is given in Listing 16. It also includes code for generating a logfile containing the length of each period. The logfile is used in Section XIII when determining $S_{\phi,VCO}$, the power spectral density of the phase of the VCO output.

Recall that the synchronous jitter of FD_M and FD_N has already been included as part of OSC, so the divider model incorporated into the VCO is noiseless and the jitter at the output of the noiseless divider results only from the VCO jitter. Since the divider outputs one pulse for every N pulses at its input, the variance in the output period is the sum of the variance in N input periods. Thus, the period jitter at the output, $J_{FD}$, is $\sqrt{N}$ times larger than the period jitter at the input, or

$J_{FD} = \sqrt{N}\, J_{VCO}$. (82)

Thus, to merge the divider into the VCO, the VCO gain must be reduced by a factor of N, the period jitter increased by a factor of $\sqrt{N}$, and the divider model removed. After simulation, it is necessary to refer the computed results, which are from the output of the divider, to the output of the VCO, which is the true output of the PLL. The period jitter at the output of the VCO, $J_{VCO}$, can be computed with (82). To determine the effect of the divider on $S_\phi(\omega)$, square both sides of (82) and apply (70):

$a_{FD}\, T_{FD} = N\, a_{VCO}\, T_{VCO}$. (83)

$T_{VCO} = T_{FD}/N$, and so

$a_{FD} = a_{VCO}$. (84)

From (72),

$\dfrac{S_{\phi,VCO}\,\Delta f^2}{f_{VCO}^2} = \dfrac{S_{\phi,FD}\,\Delta f^2}{f_{FD}^2}$. (85)

Finally, $f_{VCO} = N f_{FD}$, and so

$S_{\phi,VCO} = N^2 S_{\phi,FD}$. (86)
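The $\sqrt{N}$ scaling in (82) can be checked with a small Monte Carlo experiment. The following is a minimal sketch, not part of the original models: it assumes independent Gaussian period jitter (simple accumulating jitter) and an ideal, noiseless divider, and the function name and the numerical values are purely illustrative.

```python
import random
import statistics


def divided_period_jitter(n_div, j_vco, t_vco, n_out=5000, seed=1):
    """Standard deviation of the output period of a noiseless divider.

    Each VCO period is t_vco plus independent Gaussian jitter with
    standard deviation j_vco; the divider emits one output period for
    every n_div input periods, so each output period is a sum of n_div
    input periods.
    """
    rng = random.Random(seed)
    out_periods = [
        sum(rng.gauss(t_vco, j_vco) for _ in range(n_div))
        for _ in range(n_out)
    ]
    return statistics.pstdev(out_periods)


if __name__ == "__main__":
    N = 16            # illustrative divide ratio
    J_VCO = 6e-12     # illustrative VCO period jitter, seconds
    T_VCO = 0.5e-9    # illustrative VCO period, seconds
    j_fd = divided_period_jitter(N, J_VCO, T_VCO)
    print(f"simulated J_FD   = {j_fd:.3e} s")
    print(f"sqrt(N) * J_VCO  = {N ** 0.5 * J_VCO:.3e} s")
```

With a few thousand output periods the simulated value should agree with $\sqrt{N}\,J_{VCO}$ to within a few percent; the residual difference is the statistical uncertainty of the estimate.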

Once FD_N is incorporated into the VCO, the VCO output signal is no longer observable; however, the characteristics of the VCO output are easily derived from (82) and (86), which are summarized in Table II. It is interesting to note that while the frequency at the output of FD_N is N times smaller than at the output of the VCO, except for scaling in the amplitude, the spectrum of the noise close to the fundamental is to a first degree unaffected by the presence of FD_N. In particular, the width of the noise spectrum is unaffected by FD_N. This is extremely fortuitous, because it means that the number of cycles we need to simulate is independent of the divide ratio N. Thus, large divide ratios do not affect the total simulation time.

TABLE II: CHARACTERISTICS OF VCO OUTPUT RELATIVE TO THE OUTPUT OF FD_N ASSUMING THE VCO EXHIBITS SIMPLE ACCUMULATING JITTER AND THE FD_N IS NOISE FREE.

Frequency:    $f_{VCO} = N f_{FD}$
Jitter:       $J_{VCO} = J_{FD}/\sqrt{N}$
Phase Noise:  $S_{\phi,VCO} = N^2 S_{\phi,FD}$

To understand why FD_N does not affect the width of the noise spectrum, recall that while we started with a jitter that varied continuously with time, j(t) in (34), for either efficiency or modeling reasons we eventually sampled it to end up with a discrete-time version. The act of sampling the jitter causes the spectrum of the jitter to be replicated at the multiples of the sampling frequency, which adds aliasing. This aliasing is visible, but not obvious, at high frequencies in Figure 18. However, especially with accumulating jitter, the phase noise amplitude at low frequencies is much larger than the aliased noise, and so the close-in noise spectrum is largely unaffected by the sampling. The effect of FD_N is to decimate the sampled jitter by a factor of N, which is equivalent to sampling the jitter signal, j(t), at the original sample frequency divided by N. Thus, the replication is at a lower frequency, the amplitude is lower, and the aliasing is greater, but the spectrum is otherwise unaffected.

Listing 16: VCO with FD_N.

`include "discipline.h"

module vco (out, in);
input in; output out;
electrical out, in;
parameter real Vmin=0;
parameter real Vmax=Vmin+1 from (Vmin:inf);
parameter real Fmin=1 from (0:inf);
parameter real Fmax=2*Fmin from (Fmin:inf);
parameter real ratio=1 from (0:inf);
parameter real Vlo=-1, Vhi=1;
parameter real tt=0.01*ratio/Fmax from (0:inf);
parameter real jitter=0 from [0:0.25*ratio/Fmax); // VCO period jitter
parameter real ttol=1u*ratio/Fmax from (0:ratio/Fmax);
parameter real outStart=inf from (1/Fmin:inf);
real freq, phase, dT, delta, prev, Vout;
integer n, seed, fp;

analog begin
    @(initial_step) begin
        seed = -561;
        delta = jitter * sqrt(2*ratio);
        fp = $fopen("periods.m");
        Vout = Vlo;
    end

    // compute the freq from the input voltage
    freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin) + Fmin;

    // bound the frequency (this is optional)
    if (freq > Fmax) freq = Fmax;
    if (freq < Fmin) freq = Fmin;

    // apply the frequency divider, add the phase noise
    freq = (freq / ratio)/(1 + dT * freq / ratio);

    // phase is the integral of the freq modulo 1
    phase = idtmod(freq, 0.0, 1.0, -0.5);

    // update jitter twice per period
    @(cross(phase - 0.25, +1, ttol)) begin
        dT = delta * $dist_normal(seed, 0, 1);
        Vout = Vhi;
    end
    @(cross(phase + 0.25, +1, ttol)) begin
        dT = delta * $dist_normal(seed, 0, 1);
        Vout = Vlo;
        if ($abstime >= outStart) $fstrobe(fp, "%0.10e", $abstime - prev);
        prev = $abstime;
    end
    V(out) <+ transition(Vout, 0, tt);
end
endmodule

XIII. SIMULATION AND ANALYSIS

The synthesizer is simulated using the netlist from Listing 18 and the Verilog-A descriptions in Listings 14-16, modifying them as necessary to fit the actual circuit. The simulation should cover an interval long enough to allow accurate Fourier analysis at the lowest frequency of interest ($F_{min}$). With deterministic signals, it is sufficient to simulate for K cycles after the PLL settles if $F_{min} = 1/(TK)$. However, for these signals, which are stochastic, it is best to simulate for 10K to 100K cycles to allow for enough averaging to reduce the uncertainty in the result.

One should not simply apply an FFT to the output signal of the VCO/FD_N to determine $\mathcal{L}(\Delta f)$ for the PLL. The result would be quite inaccurate because the FFT samples the waveform at evenly spaced points, and so misses the jitter of the transitions. Instead, $\mathcal{L}(\Delta f)$ can be measured with Spectre's Fourier Analyzer, which uses a unique algorithm that does accurately resolve the jitter [11]. However, it is slow if many frequencies are needed and so is not well suited to this application.

Unlike $\mathcal{L}(\Delta f)$, $S_\phi(\Delta f)$ can be computed efficiently. The Verilog-A code for the VCO/FD_N given in Listing 16 writes the length of each period to an output file named periods.m. Writing the periods to the file begins after an initial delay, specified using outStart, to allow the PLL to reach steady state. This file is then processed by Matlab from MathWorks using the script shown in Listing 17. This script computes $S_\phi(\Delta f)$, the power spectral density of $\phi$, using Welch's method [28]. The frequency range is from $f_{out}/2$ to $f_{out}/\mathrm{nfft}$. The script computes $S_\phi(\Delta f)$ with a resolution bandwidth of rbw.† Normally, $S_\phi(\Delta f)$ is given with a unity resolution bandwidth. To compensate for a non-unity resolution bandwidth, broadband signals such as the noise should be divided by rbw. Signals with bandwidth less than rbw, such as the spurs generated by leakage in the CP, should not be scaled. The script processes the output of VCO/FD_N. The results of the script must be further processed using the equations in Table II to remove the effect of FD_N.

† The Hanning window used in the psd() function has a resolution bandwidth of 1.5 bins [29]. Assuming broadband signals, Matlab divides by 1.5 inside psd() to compensate. In order to resolve narrowband signals, the factor of 1.5 is removed by the script, and instead included in the reported resolution bandwidth.

Listing 17: Matlab script used for computing $S_\phi(\Delta f)$. These results must be further processed using Table II to map them to the output of the VCO.

% Process period data to compute S_phi(delta f)
echo off;
nfft=512;          % should be power of two
winLength=nfft;
overlap=nfft/2;
winNBW=1.5;        % Noise bandwidth given in bins

% Load the data from the file generated by the VCO
load periods.m;

% output estimates of period and jitter
T=mean(periods);
J=std(periods);
maxdT = max(abs(periods-T))/T;
fprintf('T = %.3gs, F = %.3gHz\n', T, 1/T);
fprintf('Jabs = %.3gs, Jrel = %.2g%%\n', J, 100*J/T);
fprintf('max dT = %.2g%%\n', 100*maxdT);
fprintf('periods = %d, nfft = %d\n', length(periods), nfft);

% compute the cumulative phase of each transition
phases=2*pi*cumsum(periods)/T;

% compute power spectral density of phase
[Sphi,f]=psd(phases,nfft,1/T,winLength,overlap,'linear');

% correct for scaling in PSD due to FFT and window
Sphi=winNBW*Sphi/nfft;

% plot the results (except at DC)
K = length(f);
semilogx(f(2:K),10*log10(Sphi(2:K)));
title('Power Spectral Density of VCO Phase');
xlabel('Frequency (Hz)');
ylabel('S phi (dB/Hz)');
rbw = winNBW/(T*nfft);
RBW=sprintf('Resolution Bandwidth = %.0f Hz (%.0f dB)', rbw, 10*log10(rbw));
imtext(0.5,0.07, RBW);

XIV. EXAMPLE

These ideas were applied to model and simulate a PLL acting as a frequency synthesizer. A synthesizer was chosen with $f_{ref}$ = 25 MHz, $f_{out}$ = 2 GHz, and a channel spacing of 200 kHz. As such, M = 125 and N = 10,000. The noise of OSC is -95 dBc/Hz at 100 kHz. Applying (74) to compute a, where $\mathcal{L}(\Delta f) = 316 \times 10^{-12}$, $\Delta f$ = 100 kHz, and $f_o$ = 25 MHz, gives $a = 10^{-14}$. The period jitter J is then computed from (70), giving J = 20 ps. The noise of VCO is -48 dBc/Hz at 100 kHz. Applying (74) and (70) with $\mathcal{L}(\Delta f) = 1.59 \times 10^{-5}$, $\Delta f$ = 100 kHz, and $f_o$ = 2 GHz gives $a = 7.9 \times 10^{-14}$ and a period jitter of J = 6.3 ps. The period jitter of the PFD/CP and FDs was found to be 2 ns. The FDs were included into the oscillators, which suppresses the high frequency signals at the input and output of the synthesizer.

The netlist is shown in Listing 18. The results (compensated for non-unity resolution bandwidth (-28 dB) and for the suppression of the dividers (80 dB)) are shown in Figures 17-20. The simulation took 7.5 minutes for 450k time-points on an HP 9000/735. The use of a large number of time points was motivated by the desire to reduce the level of uncertainty in the results. The period jitter in the PLL was found to be 9.8 ps at the output of the VCO.

Listing 18: Spectre netlist for PLL synthesizer.

// PLL-based frequency synthesizer that models jitter
simulator lang=spectre

ahdl_include "osc.va"     // Listing 14
ahdl_include "pfd_cp.va"  // Listing 15
ahdl_include "vco.va"     // Listing 16

Osc (in) osc freq=25MHz ratio=125 \
    accJitter=20ps syncJitter=2ns
PFD (err in fb) pfd_cp Iout=500ua
C1 (err c) capacitor c=3.125nF
R (c 0) resistor r=10k
C2 (c 0) capacitor c=625pF
VCO (fb err) vco Fmin=1GHz Fmax=3GHz \
    Vmin=-4 Vmax=4 ratio=10000 \
    jitter=6ps outStart=10ms

JitterSim tran stop=60ms
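The numbers quoted in the example can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check under stated assumptions: equations (70) and (74) are not reproduced in this section, so the forms $a = 2\mathcal{L}(\Delta f)\,\Delta f^2/f_o^2$ and $J = \sqrt{a/f_o}$ are inferred from the quoted values rather than copied from the text, and the function and variable names are hypothetical.

```python
import math


def jitter_from_phase_noise(l_dbc_hz, delta_f, f0):
    """Return (a, J) for phase noise l_dbc_hz (dBc/Hz) at offset delta_f.

    Assumed forms, inferred from the quoted numbers:
        a = 2 * L * delta_f**2 / f0**2   and   J = sqrt(a * T) = sqrt(a / f0)
    """
    l_lin = 10 ** (l_dbc_hz / 10)
    a = 2 * l_lin * delta_f ** 2 / f0 ** 2
    return a, math.sqrt(a / f0)


f_ref, f_out, spacing = 25e6, 2e9, 200e3
M = round(f_ref / spacing)   # reference divider ratio
N = round(f_out / spacing)   # feedback divider ratio

a_osc, j_osc = jitter_from_phase_noise(-95, 100e3, f_ref)
a_vco, j_vco = jitter_from_phase_noise(-48, 100e3, f_out)

# Resolution bandwidth of the PSD estimate taken at the divider output:
# T = 1/spacing in lock, nfft = 512, winNBW = 1.5 bins (values from Listing 17).
rbw = 1.5 / ((1 / spacing) * 512)
rbw_db = 10 * math.log10(rbw)

# Divider suppression: S_phi scales by N**2 (Table II), i.e. 20*log10(N) dB.
divider_db = 20 * math.log10(N)

print(f"M = {M}, N = {N}")
print(f"OSC: a = {a_osc:.2e}, J = {j_osc * 1e12:.1f} ps")
print(f"VCO: a = {a_vco:.2e}, J = {j_vco * 1e12:.1f} ps")
print(f"rbw = {rbw:.0f} Hz ({rbw_db:.1f} dB), divider = {divider_db:.0f} dB")
```

Under these assumptions the computed jitter values land on the 20 ps and 6.3 ps quoted above, and the two compensation terms come out close to the 28 dB and 80 dB used for Figures 17-20.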

The low-pass filter LF blocks all high frequency signals from reaching the VCO, so the noise of the phase-locked loop at high frequencies is the same as the noise generated by the open-loop VCO alone. At low frequencies, the loop gain acts to stabilize the phase of the VCO, and the noise of the PLL is dominated by the phase noise of the OSC. There is some contribution from the VCO, but it is diminished by the gain of the loop. In this example, noise at the middle frequencies is dominated by the synchronous jitter generated by the PFD/CP and FDs. The measured results agree qualitatively with the expected results. The predicted noise is higher than one would expect solely from the open-loop behavior of each block because of peaking in the response of the PLL from 5 kHz to 50 kHz. For this reason, PLLs used in synthesizers where jitter is important are usually overdamped.

Fig. 17. Noise of the closed-loop PLL at the output of the VCO when only the reference oscillator exhibits jitter (CL) versus the noise of the reference oscillator mapped up to the VCO frequency when operated open loop (OL). (Plot: noise in dB versus frequency, 300 Hz to 100 kHz.)

Fig. 18. Noise of the closed-loop PLL at the output of the VCO when only the VCO exhibits jitter (CL) versus the noise of the VCO when operated open loop (OL). (Plot: noise in dB versus frequency, 300 Hz to 100 kHz.)

Fig. 19. Noise of the closed-loop PLL at the output of the VCO when only the PFD/CP, FD_M, and FD_N exhibit jitter (CL) versus the noise of these components mapped up to the VCO frequency when operated open loop (OL). (Plot: noise in dB versus frequency, 1 kHz to 100 kHz.)

Fig. 20. Closed-loop PLL noise performance (PLL-CL) compared to the open-loop noise performance of the individual components that make up the PLL (VCO-OL, OSC-OL, and PFD/CP,FD-OL). The achieved noise is slightly larger than what is expected from the components due to peaking in the response of the PLL. (Plot: noise in dB versus frequency, 300 Hz to 100 kHz.)

XV. CONCLUSION

A methodology for modeling and simulating the phase noise and jitter performance of phase-locked loops was presented. The simulation is done at the behavioral level, and so is efficient enough to be applied in a wide variety of applications. The behavioral models are calibrated from circuit-level noise simulations, and so the high-level simulations are accurate. Behavioral models were presented in the Verilog-A language; however, these same ideas can be used to develop behavioral models in purely event-driven languages such as Verilog-HDL and Verilog-AMS. This methodology is flexible enough to be used in a broad range of applications where phase noise and jitter are important.

REFERENCES

[1] Ken Kundert. "Introduction to RF simulation and its application." Journal of Solid-State Circuits, vol. 34, no. 9, September 1999.
[2] Cadence Design Systems. "SpectreRF simulation option." www.cadence.com/datasheets/spectrerf.html.
[3] F. Gardner. Phaselock Techniques. John Wiley & Sons, 1979.
[4] D. Yee, C. Doan, D. Sobel, B. Limketkai, S. Alalusi, and R. Brodersen. "A 2-GHz low-power single-chip CMOS receiver for WCDMA applications." Proceedings of the European Solid-State Circuits Conference, Sept. 2000.

[5] A. Demir, E. Liu, A. Sangiovanni-Vincentelli, and I. Vassiliou. "Behavioral simulation techniques for phase/delay-locked systems." Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 453-456, May 1994.
[6] A. Demir, E. Liu, and A. Sangiovanni-Vincentelli. "Time-domain non-Monte-Carlo noise simulation for nonlinear dynamic circuits with arbitrary excitations." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 5, pp. 493-505, May 1996.
[7] A. Demir, A. Sangiovanni-Vincentelli. "Simulation and modeling of phase noise in open-loop oscillators." Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 445-456, May 1996.
[8] A. Demir, A. Sangiovanni-Vincentelli. Analysis and Simulation of Noise in Nonlinear Electronic Circuits and Systems. Kluwer Academic Publishers, 1997.
[9] Ken Kundert. "Modeling and simulation of jitter in phase-locked loops." In Analog Circuit Design: RF Analog-to-Digital Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers, Rudy J. van de Plassche, Johan H. Huijsing, Willy M.C. Sansen, Kluwer Academic Publishers, November 1997.
[10] Ken Kundert. "Modeling and simulation of jitter in PLL frequency synthesizers." Available from www.designers-guide.com.
[11] Kenneth S. Kundert. The Designer's Guide to SPICE and Spectre. Kluwer Academic Publishers, 1995.
[12] Verilog-A Language Reference Manual: Analog Extensions to Verilog-HDL, version 1.0. Open Verilog International, 1996. Available from www.eda.org/verilogams.
[13] Ulrich L. Rohde. Digital PLL Frequency Synthesizers. Prentice-Hall, Inc., 1983.
[14] Paul R. Gray and Robert G. Meyer. Analysis and Design of Analog Integrated Circuits. John Wiley & Sons, 1992.
[15] A. Demir, A. Mehrotra, and J. Roychowdhury. "Phase noise in oscillators: a unifying theory and numerical methods for characterization." IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 47, no. 5, May 2000, pp. 655-674.
[16] F. Kaertner. "Determination of the correlation spectrum of oscillators with low noise." IEEE Transactions on Microwave Theory and Techniques, vol. 37, no. 1, pp. 90-101, Jan. 1989.
[17] F. X. Kaertner. "Analysis of white and $f^{-\alpha}$ noise in oscillators." International Journal of Circuit Theory and Applications, vol. 18, pp. 485-519, 1990.
[18] G. Vendelin, A. Pavio, U. Rohde. Microwave Circuit Design. J. Wiley & Sons, 1990.
[19] W. Gardner. Introduction to Random Processes: With Applications to Signals and Systems. McGraw-Hill, 1989.
[20] Joel Phillips and Ken Kundert. "Noise in mixers, oscillators, samplers, and logic: an introduction to cyclostationary noise." Proceedings of the IEEE Custom Integrated Circuits Conference, CICC 2000. The paper and presentation are both available from www.designers-guide.com.
[21] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski. "Delta-sigma modulation in fractional-N frequency synthesis." IEEE Journal of Solid-State Circuits, vol. 28, no. 5, May 1993, pp. 553-559.
[22] Frank Herzel and Behzad Razavi. "A study of oscillator jitter due to supply and substrate noise." IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 1, Jan. 1999, pp. 56-62.
[23] T. C. Weigandt, B. Kim, and P. R. Gray. "Jitter in ring oscillators." 1994 IEEE International Symposium on Circuits and Systems (ISCAS-94), vol. 4, 1994, pp. 27-30.
[24] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[25] J. J. Rael and A. A. Abidi. "Physical processes of phase noise in differential LC oscillators." Proceedings of the IEEE Custom Integrated Circuits Conference, CICC 2000.
[26] J. McNeill. "Jitter in Ring Oscillators." IEEE Journal of Solid-State Circuits, vol. 32, no. 6, June 1997.
[27] H. Chang, E. Charbon, U. Choudhury, A. Demir, E. Felt, E. Liu, E. Malavasi, A. Sangiovanni-Vincentelli, and I. Vassiliou. A Top-Down Constraint-Driven Methodology for Analog Integrated Circuits. Kluwer Academic Publishers, 1997.
[28] A. Oppenheim, R. Schafer. Digital Signal Processing. Prentice-Hall, 1975.
[29] F. Harris. "On the use of windows for harmonic analysis with the discrete Fourier transform." Proceedings of the IEEE, vol. 66, no. 1, January 1978.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

A Study of Phase Noise in CMOS Oscillators Behzad Razavi, Member, IEEE

Abstract-This paper presents a study of phase noise in two inductorless CMOS oscillators. First-order analysis of a linear oscillatory system leads to a noise shaping function and a new definition of Q. A linear model of CMOS ring oscillators is used to calculate their phase noise, and three phase noise phenomena, namely, additive noise, high-frequency multiplicative noise, and low-frequency multiplicative noise, are identified and formulated. Based on the same concepts, a CMOS relaxation oscillator is also analyzed. Issues and techniques related to simulation of noise in the time domain are described, and two prototypes fabricated in a 0.5-µm CMOS technology are used to investigate the accuracy of the theoretical predictions. Compared with the measured results, the calculated phase noise values of a 2-GHz ring oscillator and a 900-MHz relaxation oscillator at 5 MHz offset have an error of approximately 4 dB.

I. INTRODUCTION

VOLTAGE-CONTROLLED oscillators (VCO's) are an integral part of phase-locked loops, clock recovery circuits, and frequency synthesizers. Random fluctuations in the output frequency of VCO's, expressed in terms of jitter and phase noise, have a direct impact on the timing accuracy where phase alignment is required and on the signal-to-noise ratio where frequency translation is performed. In particular, RF oscillators employed in wireless transceivers must meet stringent phase noise requirements, typically mandating the use of passive LC tanks with a high quality factor (Q). However, the trend toward large-scale integration and low cost makes it desirable to implement oscillators monolithically. The paucity of literature on noise in such oscillators together with a lack of experimental verification of underlying theories has motivated this work.

This paper provides a study of phase noise in two inductorless CMOS VCO's. Following a first-order analysis of a linear oscillatory system and introducing a new definition of Q, we employ a linearized model of ring oscillators to obtain an estimate of their noise behavior. We also describe the limitations of the model, identify three mechanisms leading to phase noise, and use the same concepts to analyze a CMOS relaxation oscillator. In contrast to previous studies where time-domain jitter has been investigated [1], [2], our analysis is performed in the frequency domain to directly determine the phase noise. Experimental results obtained from a 2-GHz ring oscillator and a 900-MHz relaxation oscillator indicate that, despite many simplifying approximations, lack of accurate MOS models for RF operation, and the use of simple noise models, the analytical approach can predict the phase noise with approximately 4 to 6 dB of error.

The next section of this paper describes the effect of phase noise in wireless communications. In Section III, the concept of Q is investigated and in Section IV it is generalized through the analysis of a feedback oscillatory system. The resulting equations are then used in Section V to formulate the phase noise of ring oscillators with the aid of a linearized model. In Section VI, nonlinear effects are considered and three mechanisms of noise generation are described, and in Section VII, a CMOS relaxation oscillator is analyzed. In Section VIII, simulation issues and techniques are presented, and in Section IX the experimental results measured on the two prototypes are summarized.

Manuscript received October 30, 1995; revised December 17, 1995. The author was with AT&T Bell Laboratories, Holmdel, NJ 07733 USA. He is now with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA. Publisher Item Identifier S 0018-9200(96)02456-0.

0018-9200/96$05.00 © 1996 IEEE

II. PHASE NOISE IN WIRELESS COMMUNICATIONS

Phase noise is usually characterized in the frequency domain. For an ideal oscillator operating at $\omega_0$, the spectrum assumes the shape of an impulse, whereas for an actual oscillator, the spectrum exhibits "skirts" around the center or "carrier" frequency (Fig. 1). To quantify phase noise, we consider a unit bandwidth at an offset $\Delta\omega$ with respect to $\omega_0$, calculate the noise power in this bandwidth, and divide the result by the carrier power.

Fig. 1. Phase noise in an oscillator.

To understand the importance of phase noise in wireless communications, consider a generic transceiver as depicted in Fig. 2, where the receiver consists of a low-noise amplifier, a band-pass filter, and a downconversion mixer, and the transmitter comprises an upconversion mixer, a band-pass filter, and a power amplifier. The local oscillator (LO) providing the carrier signal for both mixers is embedded in a frequency synthesizer. If the LO output contains phase noise, both the downconverted and upconverted signals are corrupted. This is illustrated in Fig. 3(a) and (b) for the receive and transmit paths, respectively.

Fig. 2. Generic wireless transceiver.

Referring to Fig. 3(a), we note that in the ideal case, the signal band of interest is convolved with an impulse and thus translated to a lower (and a higher) frequency with no change in its shape. In reality, however, the wanted signal may be accompanied by a large interferer in an adjacent channel, and the local oscillator exhibits finite phase noise. When the two signals are mixed with the LO output, the downconverted band consists of two overlapping spectra, with the wanted signal suffering from significant noise due to the tail of the interferer. This effect is called "reciprocal mixing."

Shown in Fig. 3(b), the effect of phase noise on the transmit path is slightly different. Suppose a noiseless receiver is to detect a weak signal at $\omega_2$ while a powerful, nearby transmitter generates a signal at $\omega_1$ with substantial phase noise. Then, the wanted signal is corrupted by the phase noise tail of the transmitter.

Fig. 3. Effect of phase noise on (a) receive and (b) transmit paths.

The important point here is that the difference between $\omega_1$ and $\omega_2$ can be as small as a few tens of kilohertz while each of these frequencies is around 900 MHz or 1.9 GHz. Therefore, the output spectrum of the LO must be extremely sharp. In the North American Digital Cellular (NADC) IS54 system, the phase noise power per unit bandwidth must be about 115 dB below the carrier power (i.e., -115 dBc/Hz) at an offset of 60 kHz.

Such stringent requirements can be met through the use of LC oscillators. Fig. 4 shows an example where a transconductance amplifier ($G_m$) with positive feedback establishes a negative resistance to cancel the loss in the tank and a varactor diode provides frequency tuning capability. This circuit has a number of drawbacks for monolithic implementation. First, both the control and the output signals are single-ended, making the circuit sensitive to supply and substrate noise. Second, the required inductor (and varactor) Q is typically greater than 20, prohibiting the use of low-Q integrated inductors. Third, monolithic varactors also suffer from large series resistance and hence a low Q. Fourth, since the LO signal inevitably appears on bond wires connecting to (or operating as) the inductor, there may be significant coupling of this signal to the front end ("LO leakage"), an undesirable effect especially in homodyne architectures [3].

Fig. 4. LC oscillator.

Ring oscillators, on the other hand, require no external components and can be realized in fully differential form, but their phase noise tends to be high because they lack passive resonant elements.

III. DEFINITIONS OF Q

The quality factor, Q, is usually defined within the context of second-order systems with (damped) oscillatory behavior. Illustrated in Fig. 5 are three common definitions of Q. For an RLC circuit, Q is defined as the ratio of the center frequency and the two-sided -3-dB bandwidth. However, if the inductor is removed, this definition cannot be applied. A more general definition is: $2\pi$ times the ratio of the stored energy and the dissipated energy per cycle, and can be measured by applying a step input and observing the decay of oscillations at the output. Again, if the circuit has no oscillatory behavior (e.g., contains no inductors), it is difficult to define "the energy dissipated per cycle."

Fig. 5. Common definitions of Q: (1) $Q = \omega_0/\Delta\omega$; (2) $Q = 2\pi \cdot$ (energy stored)/(energy dissipated per cycle); (3) $Q = \dfrac{\omega_0}{2}\dfrac{d\Phi}{d\omega}$.

In a third definition, an LC oscillator is considered as a feedback system and the phase of the open-loop transfer function is examined at resonance. For a simple LC circuit such as that in Fig. 4, it can be easily shown that the Q of the tank is equal to $0.5\,\omega_0\, d\Phi/d\omega$, where $\omega_0$ is the resonance frequency and $d\Phi/d\omega$ denotes the slope of the phase of the transfer function with respect to frequency. Called the "open-loop Q" herein, this definition has an interesting interpretation if we recall that for steady oscillations, the total phase shift around the loop must be precisely 360°. Now, suppose the oscillation frequency slightly deviates from $\omega_0$. Then, if the phase slope is large, a significant change in the phase shift arises, violating the condition of oscillation and forcing the frequency to return to $\omega_0$. In other words, the open-loop Q is a measure of how much the closed-loop system opposes variations in the frequency of oscillation. This concept proves useful in our subsequent analyses.

While the third definition of Q seems particularly well-suited to oscillators, it does fail in certain cases. As an example, consider the two-integrator oscillator of Fig. 6, where the open-loop transfer function is simply

$H(s) = -\left(\dfrac{\omega_0}{s}\right)^2$ (1)

yielding $\Phi = \angle H(s = j\omega) = 0$, and Q = 0. Since this circuit does indeed oscillate, this definition of Q is not useful here.

Fig. 6. Two-integrator oscillator.

Fig. 7. Linear oscillatory system.

IV. LINEAR OSCILLATORY SYSTEM

Oscillator circuits in general entail "compressive" nonlinearity, fundamentally because the oscillation amplitude is not defined in a linear system. When a circuit begins to oscillate, the amplitude continues to grow until it is limited by some other mechanism. In typical configurations, the open-loop gain of the circuit drops at sufficiently large signal swings, thereby preventing further growth of the amplitude.

In this paper, we begin the analysis with a linear model. This approach is justified as follows. Suppose an oscillator employs strong automatic level control (ALC) such that its oscillation amplitude remains small, making the linear approximation valid. Since the ALC can be relatively slow, the circuit parameters can be considered time-invariant for a large number of cycles. Now, let us gradually weaken the effect of ALC so that the oscillator experiences increasingly more "self-limiting." Intuitively, we expect that the linear model yields reasonable accuracy for soft amplitude limiting and becomes gradually less accurate as the ALC is removed. Thus, the choice of this model depends on the error that it entails in predicting the response of the actual oscillator to various sources of noise, an issue that can be checked by simulation (Section VIII). While adequate for the cases considered here, this approximation must be carefully examined for other types of oscillators.

To analyze phase noise, we treat an oscillator as a feedback system and consider each noise source as an input (Fig. 7). The phase noise observed at the output is a function of: 1) sources of noise in the circuit and 2) how much the feedback system rejects (or amplifies) various noise components. The system oscillates at $\omega = \omega_0$ if the transfer function

$\dfrac{Y}{X}(j\omega) = \dfrac{H(j\omega)}{1 + H(j\omega)}$ (2)

goes to infinity at this frequency, i.e., if $H(j\omega_0) = -1$. For frequencies close to the carrier, $\omega = \omega_0 + \Delta\omega$, the open-loop transfer function can be approximated as

$H(j\omega) \approx H(j\omega_0) + \Delta\omega\,\dfrac{dH}{d\omega}$ (3)

and the noise transfer function is

$\dfrac{Y}{X}[j(\omega_0 + \Delta\omega)] = \dfrac{H(j\omega_0) + \Delta\omega\,\dfrac{dH}{d\omega}}{1 + H(j\omega_0) + \Delta\omega\,\dfrac{dH}{d\omega}}$. (4)

Since $H(j\omega_0) = -1$ and for most practical cases $|\Delta\omega\, dH/d\omega| \ll 1$, (4) reduces to

$\dfrac{Y}{X}[j(\omega_0 + \Delta\omega)] \approx \dfrac{-1}{\Delta\omega\,\dfrac{dH}{d\omega}}$. (5)

This equation indicates that a noise component at $\omega = \omega_0 + \Delta\omega$ is multiplied by $-(\Delta\omega\, dH/d\omega)^{-1}$ when it appears at the output of the oscillator. In other words, the noise power spectral density is shaped by

$\left|\dfrac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \dfrac{1}{(\Delta\omega)^2 \left|\dfrac{dH}{d\omega}\right|^2}$. (6)

This is illustrated in Fig. 8. As we will see later, (6) assumes a simple form for ring oscillators.

Fig. 8. Noise shaping in oscillators.

To gain more insight, let $H(j\omega) = A(\omega)\exp[j\Phi(\omega)]$, and hence

$\dfrac{dH}{d\omega} = \left(\dfrac{dA}{d\omega} + jA\,\dfrac{d\Phi}{d\omega}\right)\exp(j\Phi)$. (7)

Since for $\omega \approx \omega_0$, $A = 1$, (6) can be written as

$\left|\dfrac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \dfrac{1}{(\Delta\omega)^2\left[\left(\dfrac{dA}{d\omega}\right)^2 + \left(\dfrac{d\Phi}{d\omega}\right)^2\right]}$. (8)

We define the open-loop Q as

$Q = \dfrac{\omega_0}{2}\sqrt{\left(\dfrac{dA}{d\omega}\right)^2 + \left(\dfrac{d\Phi}{d\omega}\right)^2}$. (9)

Combining (8) and (9) yields

$\left|\dfrac{Y}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \dfrac{1}{4Q^2}\left(\dfrac{\omega_0}{\Delta\omega}\right)^2$ (10)

a familiar form previously derived for simple LC oscillators [4]. It is interesting to note that in an LC tank at resonance, $dA/d\omega = 0$ and (9) reduces to the third definition of Q given in Section III. In the two-integrator oscillator, on the other hand, $|dA/d\omega| = 2/\omega_0$, $d\Phi/d\omega = 0$, and Q = 1. Thus, the proposed definition of Q applies to most cases of interest.

To complete the discussion, we also consider the case shown in Fig. 9, where $H_1(j\omega) H_2(j\omega) = H(j\omega)$. Therefore, $Y(j\omega)/X(j\omega)$ is given by (5). For $Y_1(j\omega)/X(j\omega)$, we have

$\dfrac{Y_1}{X}[j(\omega_0 + \Delta\omega)] \approx \dfrac{-1}{H_2(j\omega)\,\Delta\omega\,\dfrac{dH}{d\omega}}$ (11)

giving the following noise shaping function:

$\left|\dfrac{Y_1}{X}[j(\omega_0 + \Delta\omega)]\right|^2 = \dfrac{1}{|H_2|^2 (\Delta\omega)^2 \left|\dfrac{dH}{d\omega}\right|^2}$. (12)

Fig. 9. Oscillatory system with nonunity-gain feedback.
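The open-loop Q of Section IV, $Q = (\omega_0/2)\sqrt{(dA/d\omega)^2 + (d\Phi/d\omega)^2}$, can be evaluated numerically for the two cases the text mentions. The following is a minimal sketch under stated assumptions: derivatives are taken by central differences, the parallel RLC tank is normalized to unity gain at resonance, and the component values and function names are illustrative rather than taken from the paper.

```python
import cmath
import math


def open_loop_q(h, w0, rel_step=1e-6):
    """Estimate Q = (w0/2) * sqrt((dA/dw)**2 + (dPhi/dw)**2) at w0.

    h is a callable returning the complex open-loop response H(jw).
    """
    dw = w0 * rel_step
    h_plus, h_minus = h(w0 + dw), h(w0 - dw)
    d_amp = (abs(h_plus) - abs(h_minus)) / (2 * dw)
    # Phase difference from the ratio, which avoids wrap-around at +/-pi.
    d_phase = cmath.phase(h_plus / h_minus) / (2 * dw)
    return 0.5 * w0 * math.hypot(d_amp, d_phase)


W0 = 2 * math.pi * 1e9            # illustrative oscillation frequency, rad/s
L_TANK = 5e-9                     # illustrative inductance, H
C_TANK = 1 / (W0 ** 2 * L_TANK)   # chosen so the tank resonates at W0
R_TANK = 500.0                    # illustrative parallel loss resistance, ohm


def two_integrator(w):
    """Open-loop response of the two-integrator oscillator, H(s) = -(w0/s)^2."""
    return -(W0 / (1j * w)) ** 2


def lc_tank(w):
    """Parallel RLC impedance divided by R (unity gain at resonance)."""
    y = 1 / R_TANK + 1j * w * C_TANK + 1 / (1j * w * L_TANK)
    return 1 / (R_TANK * y)


q_int = open_loop_q(two_integrator, W0)
q_tank = open_loop_q(lc_tank, W0)
print(f"two-integrator oscillator: Q = {q_int:.4f}")
print(f"LC tank: Q = {q_tank:.3f}, R/(w0*L) = {R_TANK / (W0 * L_TANK):.3f}")
```

For the two-integrator loop only the magnitude slope contributes, while for the tank only the phase slope does, which is why the single definition covers both cases. The result depends only on the magnitudes of the slopes, so the sign convention chosen for H does not affect it.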

Fig. 11. Linearized model of CMOS VCO.

open-loop transfer function is thus given by -8

H(jw)=

(a)

vDD

(13)

(1 + j f i-5~)~’

Freq. .........:.I ...................... i..; ,. ....................... I I 0...........i.......................... i .......................... i



Therefore, JdA/dwl = 9/(4wO) and Id@/dwl = 3&/(4w0). It follows from (6) or (10) that if a noise current Inl is injected onto node 1 in the oscillator of Fig. 11, then its power spectrum is shaped by

This equation is the key to predicting various phase noise components in the ring oscillator. AND MULTIPLICATIVE NOISE VI. ADDITIVE

Control

(b) Fig. 10. CMOS VCO: (a) block diagram and (b) implementation of one stage.

V. CMOS RING OSCILLATOR Submicron CMOS technologies have demonstrated potential for high-speed phase-locked systems [ 5 ] ,raising the possibility of designing fully integrated RF CMOS frequency synthesizers. Fig. 10 shows a three-stage ring oscillator wherein both the signal path and the control path are differential to achieve high common-mode rejection. To calculate the phase noise, we model the signal path in the VCO with a linearized (single-ended) circuit (Fig. 11). As mentioned in Section IV, the linear approximation allows a first-order analysis of the topologies considered in this paper, but its accuracy must be checked if other oscillators are of interest. In Fig. 11, R and C represent the output resistance and the load capacitance of each stage, respectively, ( R M l / g m 3 = l / g m 4 ) , and G,R is the gain required for steady oscillations. The noise of each differential pair and its load devices are modeled as current sources Inl-In3, injected onto nodes 1-3, respectively. Before calculating the noise transfer function, we note that the circuit of Fig. 11 oscillates if, at W O , each stage has unity voltage gain and 120’ of phase shift. Writing the open-loop transfer function and imposing these two conditions, we have W O = & / ( R C ) and G,R = 2. The

Modeling the ring oscillator of Fig. 10 with the linearized circuit of Fig. 11 entails a number of issues. First, while the stages in Fig. 10 turn off for part of the period, the linearized model exhibits no such behavior, presenting constant values for the components in Fig. 11. Second, the model does not predict mixing or modulation effects that result from nonlinearities. Third, the noise of the devices in the signal path has a "cyclostationary" behavior, i.e., periodically varying statistics, because the bias conditions are periodic functions of time. In this section, we address these issues, first identifying three types of noise: additive, high-frequency multiplicative, and low-frequency multiplicative.

A. Additive Noise

Additive noise consists of components that are directly added to the output as shown in Fig. 7 and formulated by (6) and (14). To calculate the additive phase noise in Fig. 10 with the aid of (14), we note that for $\omega \approx \omega_0$ the voltage gain in each stage is close to unity. (Simulations of the actual CMOS oscillator indicate that for $\omega_0 = 2\pi \times 970$ MHz and noise injected at $\omega - \omega_0 = 2\pi \times 10$ MHz onto one node, the components observed at the three nodes differ in magnitude by less than 0.1 dB.) Therefore, the total output phase noise power density due to $I_{n1}$–$I_{n3}$ is

$$\overline{|V_{1tot}|^2} = \frac{R^2}{9}\left(\frac{\omega_0}{\Delta\omega}\right)^2 \overline{I_n^2} \tag{15}$$

where it is assumed $\overline{I_{n1}^2} = \overline{I_{n2}^2} = \overline{I_{n3}^2} = \overline{I_n^2}$. For the differential stage of Fig. 10, the thermal noise current per unit bandwidth

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL 31, NO 3, MARCH 1996


Fig. 12. High-frequency multiplicative noise.

is equal to $\overline{I_n^2} = 8kT(g_{m1} + g_{m3})/3 \approx 8kT/R$. Thus,

$$\overline{|V_{1tot}|^2} = \frac{8kTR}{9}\left(\frac{\omega_0}{\Delta\omega}\right)^2. \tag{16}$$

In this derivation, the thermal drain noise current of MOS devices is assumed equal to $\overline{I_n^2} = 4kT(2g_m/3)$. For short-channel devices, however, the noise may be higher [6]. Using a charge-based model in our simulation tool, we estimate the factor to be 0.873 rather than 2/3. In reality, hot-electron effects further raise this value. Additive phase noise is predicted by the linearized model with high accuracy if the stages in the ring operate linearly for most of the period. In a three-stage CMOS oscillator designed for the RF range, the differential stages are in the linear region for about 90% of the period. Therefore, the linearized model emulates the CMOS oscillator with reasonable accuracy. However, as the number of stages increases or if each stage entails more nonlinearity, the error in the linear approximation may increase. Since additive noise is shaped according to (16), its effect is significant only for components close to the carrier frequency.
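As a rough numerical sketch of the additive-noise estimate, the snippet below evaluates a shaped thermal noise density of the form $(8kTR/9)(\omega_0/\Delta\omega)^2$, which is what the linearized three-stage model gives when $\overline{I_n^2} \approx 8kT/R$, and normalizes it to the carrier power. The function name, the 1-kΩ load, and the 0.353-V rms carrier are illustrative assumptions rather than values taken from the derivation above.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def additive_phase_noise_dbc(r_load, f0, f_offset, v_carrier_rms, temp=300.0):
    """Additive phase noise (dBc/Hz) of the linearized three-stage ring.

    Assumes the total shaped noise voltage density is
    (8kTR/9) * (f0/f_offset)**2 and normalizes it to the carrier power.
    """
    noise_v2 = (8.0 * K_BOLTZMANN * temp * r_load / 9.0) * (f0 / f_offset) ** 2
    return 10.0 * math.log10(noise_v2 / v_carrier_rms ** 2)

# Hypothetical operating point: R = 1 kOhm, 970-MHz carrier, 0.353 V rms.
pn_1mhz = additive_phase_noise_dbc(1e3, 970e6, 1e6, 0.353)
pn_10mhz = additive_phase_noise_dbc(1e3, 970e6, 10e6, 0.353)
```

The two calls differ only in offset frequency, so their difference shows the 20 dB/decade roll-off implied by the $(\omega_0/\Delta\omega)^2$ shaping.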

Fig. 13. Frequency modulation due to tail current noise.

B. High-Frequency Multiplicative Noise

The nonlinearity in the differential stages of Fig. 10, especially as they turn off, causes noise components to be multiplied by the carrier (and by each other). If the input/output characteristic of each stage is expressed as $V_{out} = \alpha_1 V_{in} + \alpha_2 V_{in}^2 + \alpha_3 V_{in}^3$, then for an input consisting of the carrier and a noise component, e.g., $V_{in}(t) = A_0\cos\omega_0 t + A_n\cos\omega_n t$, the output exhibits the following important terms:

$$V_{out1}(t) \propto \alpha_2 A_0 A_n \cos(\omega_0 \pm \omega_n)t$$
$$V_{out2}(t) \propto \alpha_3 A_0 A_n^2 \cos(\omega_0 - 2\omega_n)t$$
$$V_{out3}(t) \propto \alpha_3 A_0^2 A_n \cos(2\omega_0 - \omega_n)t.$$

Note that $V_{out1}(t)$ appears in band if $\omega_n$ is small, i.e., if it is a low-frequency component, but in a fully differential configuration, $V_{out1}(t) = 0$ because $\alpha_2 = 0$. Also, $V_{out2}(t)$ is negligible because $A_n \ll A_0$, leaving $V_{out3}(t)$ as the only significant cross-product.

This simplified one-stage analysis predicts the frequency of the components in response to injected noise, but not their magnitude. When noise is injected into the oscillator, the magnitude of the observed response at $\omega_n$ and $2\omega_0 - \omega_n$ depends on the noise shaping properties of the feedback oscillatory system. Simulations indicate that for the oscillator topologies considered here, these two components have approximately equal magnitudes. Thus, the nonlinearity folds all the noise components below $\omega_0$ to the region above and vice versa, effectively doubling the noise power predicted by (6). Such components are significant if they are close to $\omega_0$ and are herein called high-frequency multiplicative noise. This phenomenon is illustrated in Fig. 12. (Note that a component at $3\omega_0 + \Delta\omega$ is also translated to $\omega_0 + \Delta\omega$, but its magnitude is negligible.) This effect can also be viewed as sampling of the noise by the differential pairs, especially if each stage experiences hard switching. As each differential pair switches twice in every period, a noise component at $\omega_n$ is translated to $2\omega_0 \pm \omega_n$. Note that for highly nonlinear stages, the Taylor expansion considered above may need to include higher order terms.

C. Low-Frequency Multiplicative Noise

Since the frequency of oscillation in Fig. 10 is a function of the tail current in each differential pair, noise components in this current modulate the frequency, thereby contributing phase noise [classical frequency modulation (FM)]. Depicted in Fig. 13, this effect can be significant because, in CMOS oscillators, $\omega_0$ must be adjustable by more than ±20% to compensate for process variations, thus making the frequency quite sensitive to noise in the tail current. This mechanism is illustrated in Fig. 14.

To quantify this phenomenon, we find the sensitivity or "gain" of the VCO, defined as $K_{VCO} = d\omega_{out}/dI_{SS}$ in Fig. 13, and use a simple approximation. If the noise per unit bandwidth in $I_{SS}$ is represented as a sinusoid with the same


RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

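Fig. 14. Low-frequency multiplicative noise.

Fig. 15. Gain stage with (a) stationary and (b) cyclostationary noise.

power, $I_m\cos\omega_m t$, then the output signal of the oscillator can be written as

$$V_{out}(t) = A_0\cos\left[\omega_0 t + K_{VCO}\int I_m\cos\omega_m t\,dt\right] \tag{17}$$

$$\phantom{V_{out}(t)} = A_0\cos\left[\omega_0 t + \frac{K_{VCO} I_m}{\omega_m}\sin\omega_m t\right]. \tag{18}$$

For $K_{VCO} I_m/\omega_m \ll 1$ radian ("narrowband FM"),

$$V_{out}(t) \approx A_0\cos\omega_0 t + \frac{A_0 K_{VCO} I_m}{2\omega_m}\left[\cos(\omega_0 + \omega_m)t - \cos(\omega_0 - \omega_m)t\right]. \tag{19}$$

Thus, the ratio of each sideband amplitude to the carrier amplitude is equal to $I_m K_{VCO}/(2\omega_m)$, i.e.,

$$|V_n|^2 \text{ (with respect to carrier)} = \frac{1}{4}\left(\frac{K_{VCO}}{\omega_m}\right)^2 I_m^2. \tag{20}$$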
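The narrowband-FM approximation behind (19) and (20) can be checked numerically. The sketch below synthesizes $\cos(\omega_0 t + \beta\sin\omega_m t)$ with modulation index $\beta = K_{VCO}I_m/\omega_m$, measures the carrier and upper-sideband amplitudes with a single-bin DFT, and compares their ratio with $\beta/2$. The helper names, the record length, and the choice of 100 carrier cycles per record are assumptions made only for this illustration.

```python
import cmath
import math

def tone_amplitude(samples, cycles):
    """Amplitude of the component completing `cycles` periods in the record."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * cycles * k / n)
              for k, x in enumerate(samples))
    return 2.0 * abs(acc) / n

def fm_sideband_ratio(beta, carrier_cycles=100, mod_cycles=1, n=4096):
    """Upper-sideband to carrier amplitude ratio of cos(w0*t + beta*sin(wm*t))."""
    samples = [
        math.cos(2 * math.pi * carrier_cycles * k / n
                 + beta * math.sin(2 * math.pi * mod_cycles * k / n))
        for k in range(n)
    ]
    carrier = tone_amplitude(samples, carrier_cycles)
    upper = tone_amplitude(samples, carrier_cycles + mod_cycles)
    return upper / carrier

beta = 0.01  # assumed small modulation index, K_VCO * I_m / w_m
ratio = fm_sideband_ratio(beta)  # close to beta / 2 for small beta
```

For a large index such as `fm_sideband_ratio(1.0)` the ratio departs noticeably from $\beta/2$, which is why the text restricts (19) to $K_{VCO}I_m/\omega_m \ll 1$ radian.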

Since $K_{VCO}$ can be easily evaluated in simulation or measurement, (20) is readily calculated. It is seen that modulation of the carrier brings the low-frequency noise components of the tail current to the band around $\omega_0$. Thus, flicker noise in the tail current becomes particularly important. In the differential stage of Fig. 3(b), two sources of low-frequency multiplicative noise can be identified: noise in $I_{SS}$ and noise in $M_5$ and $M_6$. For comparable device sizes, these two sources are of the same order and must both be taken into account.

D. Cyclostationary Noise Sources

As mentioned previously, the devices in the signal path exhibit cyclostationary noise behavior, requiring the use of periodically varying noise statistics in analysis and simulations. To check the accuracy of the stationary noise approximation, we perform a simple, first-order simulation on the two cases depicted in Fig. 15. In Fig. 15(a), a sinusoidal current source with an amplitude of 2 nA is connected between the drain and source of $M_1$ to represent its noise with the assumption that $M_1$ carries half of $I_{SS}$. In Fig. 15(b), the current source is also a sinusoid, but its amplitude is a function of the drain current of $M_1$. Since MOS thermal noise current (in the saturation region) is proportional to $\sqrt{g_m}$, we use a nonlinear dependent source in SPICE [7] as $I_n(t) = \alpha\sqrt{g_m}\sin\omega_n t$, where $\omega_n = 2\pi \times 980$ MHz. The factor $\alpha$ is chosen such that $I_n(t) = 2\ \text{nA} \times \sin\omega_n t$ when $I_{D1}(t) = I_{SS}/2$ (balanced

Fig. 16. Addition of output voltages of N oscillators.

condition). Simulations indicate that the sideband magnitudes in the two cases differ by less than 0.5 dB. It is important to note that this result may not be accurate for other types of oscillators.

E. Power-Noise Trade-off

As with other analog circuits, oscillators exhibit a trade-off between power dissipation and noise. Intuitively, we note that if the output voltages of $N$ identical oscillators are added in phase (Fig. 16), then the total carrier power is multiplied by $N^2$, whereas the noise power increases by $N$ (assuming noise sources of different oscillators are uncorrelated). Thus, the phase noise (relative to the carrier) decreases by a factor $N$ at the cost of a proportional increase in power dissipation. Using the equations developed above, we can also formulate this trade-off. For example, from (16), since $G_m R \approx 2$, we have

$$\overline{|V_{1tot}|^2} = \frac{16kT}{9G_m}\left(\frac{\omega_0}{\Delta\omega}\right)^2. \tag{21}$$

To reduce the total noise power by $N$, $G_m$ must increase by the same factor. For any active device, this can be accomplished by increasing the width and the bias current by $N$. (To maintain the same frequency of oscillation, the load resistor is reduced by $N$.) Therefore, for a constant supply voltage, the power dissipation scales up by $N$.
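The intuitive argument about adding $N$ oscillators can be written as a one-line calculation. The sketch below is only a restatement of that argument under its stated assumption (identical oscillators, in-phase carriers, uncorrelated noise); the function name is hypothetical.

```python
import math

def combined_phase_noise_change_db(n_oscillators):
    """Change in relative phase noise when N identical oscillators are summed.

    Carrier voltages add coherently (carrier power x N**2), while
    uncorrelated noise powers add (noise power x N). The result is the
    change in noise-to-carrier ratio in dB; negative means lower phase noise.
    """
    carrier_gain = n_oscillators ** 2
    noise_gain = n_oscillators
    return 10.0 * math.log10(noise_gain / carrier_gain)

improvement_2 = combined_phase_noise_change_db(2)
improvement_10 = combined_phase_noise_change_db(10)
```

Each doubling of $N$, and hence of power dissipation, buys about 3 dB of phase noise in this idealized picture.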


TABLE I
COMPARISON OF THREE-STAGE AND FOUR-STAGE RING OSCILLATORS

                              3-Stage VCO                 4-Stage VCO
Minimum Required DC Gain      2                           $\sqrt{2}$
Noise Shaping Function        [expression not legible in source]
Open-Loop Q                   $3\sqrt{3}/4$ (≈ 1.3)       $\sqrt{2}$ (≈ 1.4)
Total Additive Noise          [expression not legible in source]
Power Dissipation             1.8 mW                      3.6 mW
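The Q entries of Table I can be reproduced from the linearized ring model. The sketch below assumes identical single-pole stages, an oscillation condition of unity loop gain at $\omega_0$, and the open-loop Q definition $Q = (\omega_0/2)\sqrt{(dA/d\omega)^2 + (d\Phi/d\omega)^2}$ that (10) is taken to imply; the function names and the finite-difference step are choices made for this illustration only.

```python
import cmath
import math

def ring_open_loop(n_stages, x):
    """Open-loop gain of an n-stage ring of identical RC stages.

    x is the frequency normalized to the oscillation frequency. Each stage is
    -g / (1 + j*x*tan(pi/n)) with g = 1/cos(pi/n), so the loop gain equals 1
    at x = 1. For even n, one extra inversion is assumed (swapped
    differential outputs).
    """
    t = math.tan(math.pi / n_stages)
    g = 1.0 / math.cos(math.pi / n_stages)
    stage = -g / (1.0 + 1j * x * t)
    loop = stage ** n_stages
    return loop if n_stages % 2 else -loop

def open_loop_q(n_stages, dx=1e-6):
    """Q = (w0/2) * sqrt((dA/dw)^2 + (dPhi/dw)^2) via central differences."""
    hi = ring_open_loop(n_stages, 1.0 + dx)
    lo = ring_open_loop(n_stages, 1.0 - dx)
    d_amp = (abs(hi) - abs(lo)) / (2.0 * dx)                     # w0 * dA/dw
    d_phase = (cmath.phase(hi) - cmath.phase(lo)) / (2.0 * dx)   # w0 * dPhi/dw
    return 0.5 * math.hypot(d_amp, d_phase)

q3 = open_loop_q(3)  # three-stage ring
q4 = open_loop_q(4)  # four-stage ring
```

For three stages the stage gain is 2 and the loop reduces to $-8/(1 + j\sqrt{3}\,\omega/\omega_0)^3$; for four stages the stage gain is $\sqrt{2}$, matching the minimum dc gain row of the table.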

Fig. 17. Substrate and supply noise in gain stage.

F. Three-Stage Versus Four-Stage Oscillators

The choice of number of stages in a ring oscillator to minimize the phase noise has often been disputed. With the above formulations, it is possible to compare rings with different numbers of stages (so long as the approximations remain valid). For the cases of interest in RF applications, we consider three-stage and four-stage oscillators designed to operate at the same frequency. Thus, the four-stage oscillator incorporates smaller impedance levels and dissipates more power. Table I compares various aspects of the two circuits. We make three important observations. 1) Simulations show that if the four-stage oscillator is to operate at the same speed as the three-stage VCO, the value of $R$ in the former must be approximately 60% of that in the latter. 2) The Q's of the two VCO's (10) are roughly equal. 3) The total additive thermal noise of the two VCO's is about the same, because the four-stage topology has more sources of noise, but with lower magnitudes.

From these rough calculations, we draw two conclusions. First, the phase noise depends on not only the Q, but the number and magnitude of sources of noise in the circuit. Second, four-stage VCO's have no significant advantage over three-stage VCO's, except for providing quadrature outputs.

G. Supply and Substrate Noise

Even though the gain stage of Fig. 10 is designed as a differential circuit, it nonetheless suffers from some sensitivity to supply and substrate noise (Fig. 17). Two phenomena account for this. First, device mismatches degrade the symmetry of the circuit. Second, the total capacitance at the common source of the differential pair (i.e., the source junction capacitance of $M_1$ and $M_2$ and the capacitance associated with the tail current source) converts the supply and substrate noise to current, thereby modulating the delay of the gain stage. Simulations indicate that even if the tail current source has a high dc output impedance, a 1-mV$_{pp}$ supply noise component at 10 MHz generates sidebands 60 dB below the carrier at $\omega_0 \pm (2\pi \times 10\ \text{MHz})$.

VII. CMOS RELAXATION OSCILLATOR

In this section, we apply the analysis methodology described thus far to a CMOS relaxation oscillator [Fig. 18(a)]. When designed to operate at 900 MHz, this circuit hardly "relaxes" and the signals at the drain and source of $M_1$ and $M_2$ are close to sinusoids. Thus, the linear model of Fig. 7 is a plausible choice. To utilize our previous results, we assume the signals at the sources of $M_1$ and $M_2$ are fully differential¹ and redraw the circuit as in Fig. 18(b), identifying it as a two-stage ring with capacitive degeneration ($C_A = 2C$). The total capacitance seen at the drain of $M_1$ and $M_2$ is modeled with $C_1$ and $C_2$, respectively. (This is also an approximation because the input impedance of each stage is not purely capacitive.) It can be easily shown that the open-loop transfer function is

$$H(s) = \left[\frac{g_m R C_A s}{(g_m + C_A s)(R C_D s + 1)}\right]^2 \tag{22}$$

where $C_1 = C_2 = C_D$ and $g_m$ denotes the transconductance of each transistor. For the circuit to oscillate at $\omega_0$, $H(j\omega_0) = 1$, and each stage must have a phase shift of 180°, with 90° contributed by each zero and the remaining 90° by the two poles at $-g_m/C_A$ and $-1/(RC_D)$. It follows from the second condition that

$$\omega_0 = \sqrt{\frac{g_m}{R C_A C_D}} \tag{23}$$

i.e., $\omega_0$ is the geometric mean of the poles at the drain and source of each transistor. Combining this result with the first condition, we obtain

$$g_m R = \frac{C_A}{C_A - C_D}. \tag{24}$$

After lengthy calculations, we have

$$\left|\frac{dA}{d\omega}\right| = 0 \tag{25}$$

and

$$\left|\frac{d\Phi}{d\omega}\right| = \frac{4 R C_D (C_A - C_D)}{C_A}. \tag{26}$$

¹This assumption is justified by decomposing $C$ into two series capacitors, each one of value $2C$, and monitoring the midpoint voltage. The common-mode swing at this node is approximately 18 dB below the differential swings at the source of $M_1$ and $M_2$.
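The oscillation conditions of the two-stage model can be checked numerically. The sketch below assumes each stage has a zero at the origin and poles at $-g_m/C_A$ and $-1/(RC_D)$, as described in the text, picks $g_m$ from $g_m R = C_A/(C_A - C_D)$, and evaluates the open-loop Q by finite differences using the same Q definition assumed for the ring oscillator. The component values (275 Ω, $C_A$ = 1.2 pF) and all function names are illustrative assumptions.

```python
import cmath
import math

def relaxation_loop_gain(w, gm, r, ca, cd):
    """H(jw) of the two-stage model: two identical stages, each with a zero
    at the origin and poles at -gm/ca and -1/(r*cd)."""
    s = 1j * w
    stage = -gm * r * ca * s / ((gm + ca * s) * (1.0 + r * cd * s))
    return stage ** 2

def design_point(ca, cd, r):
    """gm and w0 meeting the phase and unity-gain conditions (needs ca > cd)."""
    gm = ca / ((ca - cd) * r)
    w0 = math.sqrt(gm / (r * ca * cd))
    return gm, w0

def relaxation_open_loop_q(ca, cd, r=275.0, rel_step=1e-6):
    """Q = (w0/2) * sqrt((dA/dw)^2 + (dPhi/dw)^2) via central differences."""
    gm, w0 = design_point(ca, cd, r)
    hi = relaxation_loop_gain(w0 * (1.0 + rel_step), gm, r, ca, cd)
    lo = relaxation_loop_gain(w0 * (1.0 - rel_step), gm, r, ca, cd)
    d_amp = (abs(hi) - abs(lo)) / (2.0 * rel_step)                    # w0 * dA/dw
    d_phase = (cmath.phase(hi) - cmath.phase(lo)) / (2.0 * rel_step)  # w0 * dPhi/dw
    return 0.5 * math.hypot(d_amp, d_phase)

CA = 1.2e-12                                 # hypothetical C_A
q_equal = relaxation_open_loop_q(CA, 0.5 * CA)   # C_D = 0.5 * C_A
q_small = relaxation_open_loop_q(CA, 0.3 * CA)   # C_D = 0.3 * C_A
gm0, w00 = design_point(CA, 0.5 * CA, 275.0)
loop_at_w0 = relaxation_loop_gain(w00, gm0, 275.0, CA, 0.5 * CA)
```

Sweeping `cd` between 0 and `ca` shows the Q peaking when the timing and load capacitances are balanced, which is the condition the text gives for maximum Q.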

Fig. 18. (a) CMOS relaxation oscillator, (b) circuit of (a) redrawn, (c) noise current of one transistor, and (d) transformed noise current.

Fig. 19. Simulated oscillator spectrum with injected white noise. (Horizontal axis: frequency in GHz.)

For $C_D = 0.5C_A$, Q reaches its maximum value: unity. In other words, the maximum Q occurs if the (floating) timing capacitor is equal to the load capacitance. The noise shaping function is therefore equal to $(\omega_0/\Delta\omega)^2/4$. Since the drain-source noise current of $M_1$ and $M_2$ appears between two internal nodes of the circuit [Fig. 18(c)], the transformation shown in Fig. 18(d) can be applied to allow the use of our previous derivations. It can be shown that

[equation not legible in source]

and the total additive thermal noise observed at each drain is

[equation not legible in source]

This power must be doubled to account for high-frequency multiplicative noise.

VIII. SIMULATION RESULTS

A. Simulation Issues

The time-varying nature of oscillators prohibits the use of the standard small-signal ac analysis available in SPICE and other similar programs. Therefore, simulations must be performed in the time domain. As a first attempt, one may generate a pseudo-random noise with known distribution, introduce it into the circuit as a SPICE piecewise linear waveform, run a transient analysis for a relatively large number of oscillation periods, write the output as a series of points equally spaced in time, and compute the fast Fourier transform (FFT) of the output. The result of one such attempt is shown in Fig. 19. It is important to note that 1) many coherent sidebands

appear in the spectrum even though the injected noise is white, and 2) the magnitude of the sidebands does not directly scale with the magnitude of the injected noise! To understand the cause of this behavior, consider a much simpler case, illustrated in Fig. 20. In Fig. 20(a), a sinusoid at 1 GHz is applied across a 1-kΩ resistor, and a long transient simulation followed by interpolation and FFT is used to obtain the depicted spectrum. (The finite width results from the finite length of the data record and the "arches" are attributed to windowing effects.) Now, as shown in Fig. 20(b), we add a 30-MHz square wave with 2-ns transition time and proceed as before. Note that the two circuits share only the ground node. In this case, however, the spectrum of the 1-GHz sinusoid exhibits coherent sidebands with 15-MHz spacing! Observed in AT&T's internal simulator (ADVICE), HSPICE, and Cadence SPICE, this effect is attributed to the additional points that the program must calculate at each edge of the square wave, leading to errors in subsequent interpolation. Fortunately, this phenomenon does not occur if only sinusoids are used in simulations.

B. Oscillator Simulations

In order to compute the response of oscillators to each noise source, we approximate the noise per unit bandwidth at frequency $\omega_n$ with an impulse (a sinusoid) of the same power at that frequency. As shown in Fig. 21, the "sinusoidal noise" is injected at various points in the circuit and the output spectrum is observed. This approach is justified by the fact that random Gaussian noise can be expressed as a Fourier series of sinusoids with random phase [8], [9]. Since only one sinusoid is injected in each simulation, the interaction among noise components themselves is assumed negligible, a reasonable approximation because if two noise components at, say, -60 dB are multiplied, the product is at -120 dB. In the simulations, the oscillators were designed for a center frequency of approximately 970 MHz. Each circuit and its linearized models were simulated in the time domain in steps


Fig. 21. Simulated configuration.

The vertical axis represents $10\log V_{rms}^2$. Note that the observed magnitude of the 980-MHz component differs by less than 0.2 dB in the two cases, indicating that the linearized model is indeed an accurate representation. As explained in Section VI-B, the 960-MHz component originates from third-order mixing of the carrier and the 980-MHz component and essentially doubles the phase noise. In order to investigate the limitation of the linear model, the oscillator was made progressively more nonlinear. Shown in Fig. 23 are the output spectra of a four-stage CMOS oscillator, revealing approximately 1 dB of error in the prediction by the linear model. The error gradually increases with the number of stages in the ring and reaches nearly 6 dB for an eight-stage oscillator. For bipolar ring oscillators (differential pairs with no emitter followers), simulations reveal an error of approximately 2 dB for three stages and 7 dB for four stages in the ring.
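The origin of the 960-MHz image can be illustrated with a memoryless cubic nonlinearity. The sketch below passes a carrier plus a small tone through $y = \alpha_1 x + \alpha_3 x^3$ and measures the component at $2f_0 - f_n$, which a Taylor expansion predicts to have amplitude $\tfrac{3}{4}|\alpha_3| A_0^2 A_n$. It is a single-stage, open-loop illustration only (the feedback loop shapes the actual magnitudes); the coefficients, amplitudes, and helper names are hypothetical, and frequencies are expressed in cycles per record so that 97 and 98 stand in for 970 and 980 MHz.

```python
import cmath
import math

def tone_amplitude(samples, cycles):
    """Amplitude of the component completing `cycles` periods in the record."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * cycles * k / n)
              for k, x in enumerate(samples))
    return 2.0 * abs(acc) / n

def cubic_stage_output(a0, an, alpha1, alpha3, f0=97, fn=98, n=1024):
    """Carrier (f0) plus noise tone (fn) through y = alpha1*x + alpha3*x**3."""
    out = []
    for k in range(n):
        x = (a0 * math.cos(2 * math.pi * f0 * k / n)
             + an * math.cos(2 * math.pi * fn * k / n))
        out.append(alpha1 * x + alpha3 * x ** 3)
    return out

A0, AN, ALPHA1, ALPHA3 = 1.0, 1e-3, 1.0, -0.1  # hypothetical values
y = cubic_stage_output(A0, AN, ALPHA1, ALPHA3)
image = tone_amplitude(y, 2 * 97 - 98)          # component at 2*f0 - fn
predicted = 0.75 * abs(ALPHA3) * A0 ** 2 * AN   # (3/4)*|alpha3|*A0^2*An
```

Setting `ALPHA3` to zero removes the image entirely, consistent with the linearized model showing only the injected 980-MHz component.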

IX. EXPERIMENTAL RESULTS

A. Measurements

Fig. 20. Simple simulation revealing effect of pulse waveforms: (a) single sinusoidal source and (b) sinusoidal source along with a square wave generator. (Horizontal axis: frequency, ×100 MHz.)

of 30 ps for 8 µs, and the output was processed in MATLAB to obtain the spectrum. Since simulations of the linear model yield identical results to the equations derived above, we will not distinguish between the two hereafter. Shown in Fig. 22 are the output spectra of the linear model and actual circuit of a three-stage oscillator in 0.5-µm CMOS technology with a 2-nA, 980-MHz sinusoidal current injected into the signal path (the drain of one of the differential pairs).

Two different oscillator configurations have been fabricated in a 0.5-µm CMOS technology to compare the predictions in this paper with measured results. Note that there are three sets of results: theoretical calculations based on linear models but including multiplicative noise, simulated predictions based on the actual CMOS oscillators, and measured values. The first circuit is a 2.2-GHz three-stage ring oscillator. Fig. 24 shows one stage of the circuit along with the measured device parameters. The sensitivity of the output frequency to the tail current of each stage is about 0.43 MHz/µA. The measured spectrum is depicted in Fig. 25(a) and (b) with two different horizontal scales. Due to lack of data on the flicker noise of the process, we consider only thermal noise at relatively large frequency offsets, namely, 1 MHz and 5 MHz. It is important to note that low-frequency flicker noise causes the center of the spectrum to fluctuate constantly. Thus, as the resolution bandwidth (RBW) of the spectrum analyzer is reduced [from 1 MHz in Fig. 25(a) to 100 kHz in Fig. 25(b)], the carrier power is subject to more averaging and appears to decrease. To maintain consistency with calculations, in which

Fig. 22. Simulated output spectra of (a) linear model and (b) actual circuit of a three-stage CMOS oscillator. (Horizontal axis: frequency, ×100 MHz; vertical axis: magnitude in dB.)

the phase noise is normalized to a constant carrier power, this power (i.e., the output amplitude) is measured using an oscilloscope. The noise calculation proceeds as follows. First, find the additive noise power in (16), and double the result to account for third-order mixing (high-frequency multiplicative noise). Next, calculate the low-frequency multiplicative noise from (20) for one stage and multiply the result by three. We assume (from simulations) that the internal differential voltage swing is equal to 1 V$_{pp}$ (0.353 V$_{rms}$) and the drain noise current of MOSFET's is given by $\overline{I_n^2} = 4kT(0.863g_m)$. For $\Delta\omega = 2\pi \times 1$ MHz, calculations yield

high-frequency multiplicative noise = -100.1 dBc/Hz (29)
low-frequency multiplicative noise = -106.3 dBc/Hz (30)
total normalized phase noise = -99.2 dBc/Hz. (31)
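The totals quoted in this section are power sums of the two contributions. A minimal sketch of that arithmetic is shown below, assuming the contributions are uncorrelated so that their powers add; the function name is hypothetical. It uses the 1-MHz figures of (29) and (30) and the 5-MHz figures reported for the same oscillator, -114.0 dBc/Hz and -120.2 dBc/Hz.

```python
import math

def combine_dbc(*levels_dbc):
    """Power sum of uncorrelated noise contributions given in dBc/Hz."""
    total = sum(10.0 ** (level / 10.0) for level in levels_dbc)
    return 10.0 * math.log10(total)

total_1mhz = combine_dbc(-100.1, -106.3)   # compare with (31)
total_5mhz = combine_dbc(-114.0, -120.2)   # compare with the 5-MHz total
```

Because the low-frequency multiplicative term is about 6 dB below the high-frequency term, it raises the total by slightly less than 1 dB.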

Fig. 23. Simulated output spectra of (a) linear model and (b) actual circuit of a four-stage CMOS oscillator.

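Fig. 24. Gain stage used in 2-GHz CMOS oscillator. Device parameters legible in the source: input pair $g_m = 1/(214\ \Omega)$; $M_5$–$M_6$: W/L = 13.4 µm/0.5 µm, $g_m = 1/(630\ \Omega)$; tail current source: W/L = 13.4 µm/0.5 µm, $I$ = 790 µA, $g_m = 1/(530\ \Omega)$.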

Simulations of the actual CMOS oscillator predict the total noise to be -98.1 dBc/Hz. From Fig. 25(b), with the carrier power of Fig. 25(a), the phase noise is approximately equal to -94 dBc/Hz.


Fig. 26. Relaxation oscillator parameters. $M_1$–$M_2$: W/L = 100 µm/0.5 µm, $g_m = 1/(84\ \Omega)$; $R$ = 275 Ω; $C$ = 0.6 pF; $I_{SS}$ = 3 mA.

Fig. 25. Measured output spectrum of ring oscillator (10 dB/div. vertical scale). (a) 5 MHz/div. horizontal scale and 1 MHz resolution bandwidth, (b) 1 MHz/div. horizontal scale and 100 kHz resolution bandwidth.

Similarly, for $\Delta\omega = 2\pi \times 5$ MHz, calculations yield

high-frequency multiplicative noise = -114.0 dBc/Hz (32)
low-frequency multiplicative noise = -120.2 dBc/Hz (33)
total normalized phase noise = -113.1 dBc/Hz (34)

and simulations predict -112.4 dBc/Hz, while Fig. 25(a) indicates a phase noise of -109 dBc/Hz. Note that these values correspond to a center frequency of 2.2 GHz and should be lowered by approximately 8 dB for 900 MHz operation, as shown in (9).

The second circuit is a 920-MHz relaxation oscillator, depicted in Fig. 26. The measured spectra are shown in Fig. 27. Since simulations indicate that the low-frequency multiplicative noise is negligible in this implementation, we consider only the thermal noise in the signal path. For $\Delta\omega = 2\pi \times 1$ MHz, calculations yield a relative phase noise of -105 dBc/Hz, simulations predict -98 dB, and the spectrum in Fig. 27 gives -102 dBc/Hz. For $\Delta\omega = 2\pi \times 5$ MHz, the calculated and simulated results are -119 dBc/Hz and -120 dBc/Hz, respectively, while the measured value is -115 dBc/Hz.

Fig. 27. Measured output spectrum of relaxation oscillator (10 dB/div. vertical scale). (a) 2 MHz/div. horizontal scale and 100 kHz resolution bandwidth and (b) 1 MHz horizontal scale and 10 1

B. Discussion

Using the above measured data points and assuming a noise shaping function as in (10) with a linear noise-power trade-off (Fig. 16), we can make a number of observations.

How much can the phase noise be lowered by scaling device dimensions? If the gate oxide of MOSFET's is reduced indefinitely, their transconductance becomes relatively independent of their dimensions, approaching roughly that of bipolar transistors. Thus, in the gain stage of Fig. 24 the transconductance of $M_1$ and $M_2$ (for $I_{D1} = I_{D2} = 395$ µA) would go from $(214\ \Omega)^{-1}$ to $(66\ \Omega)^{-1}$. Scaling down the load resistance proportionally and assuming a constant oscillation frequency, we can therefore lower the phase noise by $10\log(214/66) \approx 5$ dB. For the relaxation oscillator, on the other hand, the improvement is about 10 dB. These are, of course, greatly simplified calculations, but they provide an estimate of the maximum improvement expected from technology scaling. In reality, short-channel effects, finite thickness of the inversion layer, and velocity saturation further limit the transconductance that can be achieved for a given bias current.

It is also instructive to compare the measured phase noise of the above ring oscillator with that of a 900-MHz three-stage CMOS ring oscillator reported in [10]. The latter employs single-ended CMOS inverters with rail-to-rail swings in a 1.2-µm technology and achieves a phase noise of -83 dBc/Hz at 100 kHz offset while dissipating 7.4 mW from a 5-V supply. Assuming that

$$\text{Relative Phase Noise} \propto \left(\frac{\omega_0}{\Delta\omega}\right)^2 \frac{1}{V_{swing}^2}\,\frac{1}{I_{DD}} \tag{35}$$

where $V_{swing}$ denotes the internal voltage swing and $I_{DD}$ is the total supply current, we can utilize the measured phase noise of one oscillator to roughly estimate that of the other. With the parameters of the 2.2-GHz oscillator and accounting for different voltage swings and supply currents, we obtain a phase noise of approximately -93 dBc/Hz at 100 kHz offset for the 900-MHz oscillator in [10]. The 10 dB discrepancy is attributed to the difference in the minimum channel length, $1/f$ noise at 100 kHz, and the fact that the two circuits incorporate different gain stages.
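Two of the scaling figures quoted in this discussion follow from simple decibel arithmetic, sketched below. The first function assumes noise power proportional to $(\omega_0/\Delta\omega)^2$ at a fixed offset; the second assumes phase noise inversely proportional to $g_m$ with the load resistance scaled accordingly. Both function names are hypothetical.

```python
import math

def carrier_scaling_db(f_from, f_to):
    """Phase-noise change (dB) at a fixed offset when the carrier moves from
    f_from to f_to, assuming noise power proportional to (f0/offset)**2."""
    return 20.0 * math.log10(f_to / f_from)

def gm_scaling_db(inv_gm_old_ohm, inv_gm_new_ohm):
    """Phase-noise change (dB) when 1/gm changes from inv_gm_old_ohm to
    inv_gm_new_ohm, assuming phase noise inversely proportional to gm."""
    return 10.0 * math.log10(inv_gm_new_ohm / inv_gm_old_ohm)

shift_to_900mhz = carrier_scaling_db(2.2e9, 0.9e9)  # about -7.8 dB
shift_bipolar_gm = gm_scaling_db(214.0, 66.0)       # about -5.1 dB
```

These match the "approximately 8 dB" correction for 900-MHz operation and the roughly 5 dB improvement estimated for the ring oscillator.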

ACKNOWLEDGMENT

The author wishes to thank V. Gopinathan for many illuminating discussions and T. Aytur for providing the oscillator simulation and measurement results.

REFERENCES

[1] A. A. Abidi and R. G. Meyer, "Noise in relaxation oscillators," IEEE J. Solid-State Circuits, vol. SC-18, pp. 794-802, Dec. 1983.
[2] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. ISCAS, June 1994.
[3] A. A. Abidi, "Direct conversion radio transceivers for digital communications," in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 186-187.
[4] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," Proc. IEEE, pp. 329-330, Feb. 1966.
[5] B. Razavi, K. F. Lee, and R.-H. Yan, "Design of high-speed low-power frequency dividers and phase-locked loops in deep submicron CMOS," IEEE J. Solid-State Circuits, vol. 30, pp. 101-109, Feb. 1995.
[6] Y. P. Tsividis, Operation and Modeling of the MOS Transistor. New York: McGraw-Hill, 1987.
[7] J. A. Connelly and P. Choi, Macromodeling with SPICE. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[8] S. O. Rice, "Mathematical analysis of random noise," Bell System Tech. J., pp. 282-332, July 1944, and pp. 46-156, Jan. 1945.
[9] P. Bolcato et al., "A new and efficient transient noise analysis technique for simulation of CCD image sensors or particle detectors," in Proc. CICC, 1993, pp. 14.8.1-14.8.4.
[10] T. Kwasniewski et al., "Inductorless oscillator design for personal communications devices - A 1.2 µm CMOS process case study," in Proc. CICC, May 1995, pp. 327-330.

Behzad Razavi (S'87-M'91) received the B.Sc. degree in electrical engineering from Tehran (Sharif) University of Technology, Tehran, Iran, in 1985, and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1991, respectively. From 1992 to 1996, he was a Member of Technical Staff at AT&T Bell Laboratories, Holmdel, NJ, where his research involved integrated circuit design for communication systems. He is now with Hewlett-Packard Laboratories, Palo Alto, CA. His current interests include wireless transceivers, data conversion, clock recovery, frequency synthesis, and low-voltage low-power circuits. He has been a Visiting Lecturer at Princeton University, Princeton, NJ, and Stanford University. He is also a member of the Technical Program Committee of the International Solid-State Circuits Conference. He has served as Guest Editor to the IEEE JOURNAL OF SOLID-STATE CIRCUITS and International Journal of High Speed Electronics and is currently an Associate Editor of JSSC. He is the author of the book Principles of Data Conversion System Design (IEEE Press, 1995), and editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (IEEE Press, 1996). Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the best paper award at the 1994 European Solid-State Circuits Conference, and the best panel award at the 1995 ISSCC.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 6, JUNE 1998

Correspondence Corrections to “A General Theory of Phase Noise in Electrical Oscillators”

Comments on “A 64-Point Fourier Transform Chip for Video Motion Compensation Using Phase Correlation”1

Ali Hajimiri and Thomas H. Lee

Kevin J. McGee

The authors of the above paper1 have found an error in (19) on p. 185. The factor of 8 in the denominator should be 4; therefore (19) should read

Abstract— The fast Fourier transform (FFT) processor of the above paper,1 contains many interesting and novel features. However, bit reversed input/output FFT algorithms, matrix transposers, and bit reversers have been noted in the literature. In addition, lower radix algorithms can be modified to be made computationally equivalent to higher radix algorithms. Many FFT ideas, including those of the above paper,1 can also be applied to other important algorithms and architectures.

i2n 1 c2 n 1f Lf1!g = 10 1 log 4q2 n=0 2 : max 1!

I. INTRODUCTION 1

Noise power around the frequency n!0 + 1! causes two equal sidebands at !0 6 1!: However, the noise power at n!0 0 1! has a similar effect as mentioned in the paper. Therefore, twice the power of noise at n!0 + 1! should be taken into account. This will also change the 4 in the denominator of (21) to 2 to read 2 2 Lf1!g = 10 1 log 0q2rms 1 2in1 =1f : 1!2 max

Similarly, (24) must change, and its correct form is

1!1=f = !1=f 1

c0 20rms

2

2  !1=f 1 12 cc01 :

This will result in the factor of 1/2 becoming redundant in (29), i.e.,

!0 1 Lf1!g = 10 1 log VkT2 1 Rp 1 (C! 1 1! 0 )2 max

2

:

However, note that the discussion following (29) is still valid. 2 The factor c02 =20rms should be changed to (c0 =20rms )2 in the following instances: 1) p. 185, second column, last paragraph; 2) p. 190, second column, first paragraph; 3) p. 190, second column, second paragraph. Nevertheless, the expression used to calculate the 0rms to predict phase noise of ring oscillators is based on a simulation that takes this effect into account automatically, and therefore the predictions are still valid. The authors regret any confusion this error may have caused. Manuscript received February 27, 1998. The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA. Publisher Item Identifier S 0018-9200(98)03730-5. 1 A. Hajimiri and T. H. Lee, IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.

In the above paper,¹ the authors present a fast Fourier transform (FFT) processor that contains many interesting and novel features. The mathematics in the above paper¹ describe a matrix computation where both time inputs and frequency outputs are in bit-reversed order. Bit-reversed input/output FFT algorithms, while not widely known, are not new, having been previously described in [3]. Fig. 1, for example, is a 16-point, radix-4, undecimated, bit-reversed input/output, constant output geometry graph based on [3]. The algorithm¹ is also described as a decimation-in-time-and-frequency (DITF) type, but the architecture appears to be based on decimation-in-time (DIT). In the above paper,¹ Figs. 4 and 10 show a first calculation stage with unity twiddles before the butterfly and a second and third calculation stage with prebutterfly twiddles. Although the butterfly implementation of Fig. 5¹ may be unique, the use of prebutterfly twiddles in all three stages, along with unity twiddles in the first, would seem to indicate DIT. The architecture¹ is also a pipeline and contains many elements common to this type of processor, such as matrix transposers and bit reversers, as will be described below.

II. MATRIX TRANSPOSERS AND BIT REVERSERS

Block serial/parallel or parallel/serial converters, sometimes called matrix transpose or corner turn buffers, are used in many systems. They perform a matrix transpose on data blocks by exchanging rows and columns. Fig. 2 (from [7]) shows, from upper left to lower right, the flow of data through a 4 × 4 shift-based transposer. The rotator lines show where data will be routed on the next clock cycle and the output is the transpose of the input. The switching action was noted in [7] and [8] and rotator designs can be found in [4], [7], and [8]. Although Fig. 6(b)¹ is also an 8 × 8 transposer, it is being used in a somewhat unusual way. By providing a complex (real and

Manuscript received January 31, 1997; revised March 5, 1998. The author was with the Naval Undersea Warfare Center, Newport, RI 02841 USA. He is now at 33 Everett Street, Newport, RI 02840 USA. Publisher Item Identifier S 0018-9200(98)03731-7. 1 C. C. W. Hui, T. J. Ding, J. V. McCanny, and R. F. Woods, IEEE J. Solid-State Circuits, vol. 31, pp. 1751–1761, Nov. 1996.



IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Jitter and Phase Noise in Ring Oscillators

Ali Hajimiri, Sotirios Limotyrakis, and Thomas H. Lee, Member, IEEE

Abstract—A companion analysis of clock jitter and phase noise of single-ended and differential ring oscillators is presented. The impulse sensitivity functions are used to derive expressions for the jitter and phase noise of ring oscillators. The effect of the number of stages, power dissipation, frequency of oscillation, and short-channel effects on the jitter and phase noise of ring oscillators is analyzed. Jitter and phase noise due to substrate and supply noise is discussed, and the effect of symmetry on the upconversion of 1/f noise is demonstrated. Several new design insights are given for low jitter/phase-noise design. Good agreement between theory and measurements is observed.

Index Terms—Design methodology, jitter, noise measurement, oscillator noise, oscillator stability, phase jitter, phase-locked loops, phase noise, ring oscillators, voltage-controlled oscillators.

I. INTRODUCTION

Due to their integrated nature, ring oscillators have become an essential building block in many digital and communication systems. They are used as voltage-controlled oscillators (VCO's) in applications such as clock recovery circuits for serial data communications [1]–[4], disk-drive read channels [5], [6], on-chip clock distribution [7]–[10], and integrated frequency synthesizers [10], [11]. Although they have not found many applications in radio frequency (RF), they can be used for some low-tier RF systems.

Recently, there has been some work on modeling jitter and phase noise in ring oscillators. References [12] and [13] develop models for the clock jitter based on time-domain treatments for MOS and bipolar differential ring oscillators, respectively. Reference [14] proposes a frequency-domain approach to find the phase noise based on a linear time-invariant model for differential ring oscillators with a small number of stages.

In this paper, we develop a parallel treatment of frequency-domain phase noise [15] and time-domain clock jitter for ring oscillators. We apply the phase-noise model presented in [16] to obtain general expressions for jitter and phase noise of the ring oscillators. The next section briefly reviews the phase-noise model presented in [16]. In Section III, we apply the model to timing jitter and develop an expression for the timing jitter of oscillators, while Section IV provides the derivation of a closed-form expression to calculate the rms value of the impulse sensitivity function (ISF). Section V introduces expressions for jitter and phase noise in single-ended and differential ring oscillators in long- and short-channel regimes of operation. Section VI describes the effect of substrate and supply noise as well as the noise due to the tail-current source in differential structures. Section VII explains the design insights obtained from this treatment for low jitter/phase-noise design. Section VIII summarizes the measurement results.

Manuscript received April 8, 1998; revised November 2, 1998. A. Hajimiri is with the California Institute of Technology, Pasadena, CA 91125 USA. S. Limotyrakis and T. H. Lee are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA. Publisher Item Identifier S 0018-9200(99)04200-6.

II. PHASE NOISE

The output of a practical oscillator can be written as

$$V_{out}(t) = A(t)\, f[\omega_0 t + \phi(t)] \tag{1}$$

where the function $f$ is periodic in $2\pi$ and $A(t)$ and $\phi(t)$ model fluctuations in amplitude and phase due to internal and external noise sources. The amplitude fluctuations are significantly attenuated by the amplitude limiting mechanism, which is present in any practical stable oscillator and is particularly strong in ring oscillators. Therefore, we will focus on phase variations, which are not quenched by such a restoring mechanism.

As an example, consider the single-ended ring oscillator with a single current source on one of the nodes shown in Fig. 1. Suppose that the current source consists of an impulse of current with area $\Delta q$ (in coulombs) occurring at time $t = \tau$. This will cause an instantaneous change in the voltage of that node, given by

$$\Delta V = \frac{\Delta q}{C_{node}} \tag{2}$$

where $C_{node}$ is the effective capacitance on that node at the time of charge injection. This produces a shift in the transition time. For small $\Delta V$, the change in the phase is proportional to the injected charge:

$$\Delta\phi = \Gamma(\omega_0 \tau)\,\frac{\Delta V}{V_{swing}} = \Gamma(\omega_0 \tau)\,\frac{\Delta q}{q_{max}}, \qquad \Delta q \ll q_{max} \tag{3}$$

where $V_{swing}$ is the voltage swing across the capacitor and $q_{max} = C_{node} V_{swing}$. The dimensionless function $\Gamma(x)$ is the time-varying proportionality constant and is periodic in $2\pi$. It is large when a given perturbation causes a large phase shift and small where it has a small effect [16]. Since $\Gamma(x)$ thus represents the sensitivity of every point of the waveform to a perturbation, $\Gamma(x)$ is called the impulse sensitivity function.

The time dependence of the ISF can be demonstrated by considering two extreme cases. The first is when the impulse is injected during a transition; this will result in a large phase shift. As the other case, consider injecting an impulse while the output is saturated to either the supply or the ground. This impulse will have a minimal effect on the phase of the oscillator, as shown in Fig. 2.
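The two extreme cases above can be illustrated numerically. The following is a minimal sketch, assuming the proportionality $\Delta\phi = \Gamma(\omega_0\tau)\,\Delta q/q_{max}$ from [16] and a made-up triangular ISF; the function names, the ISF shape, and the charge values are all hypothetical and are not taken from the paper.

```python
import math

def triangular_isf(x, peak=0.5):
    """Hypothetical ISF (an assumption for illustration): a positive lobe
    centred on the rising edge (x = 0), a negative lobe centred on the
    falling edge (x = pi), and zero while the output sits at a rail."""
    # Wrap the phase into [-pi/2, 3*pi/2) so both lobes are easy to test.
    x = (x + math.pi / 2) % (2 * math.pi) - math.pi / 2
    rising = max(0.0, peak - abs(x))
    falling = max(0.0, peak - abs(x - math.pi))
    return rising - falling

def phase_shift(delta_q, q_max, x, isf=triangular_isf):
    """Phase shift (rad) caused by injecting charge delta_q at phase x,
    valid only for delta_q much smaller than q_max."""
    return isf(x) * delta_q / q_max

def total_phase(impulses, q_max, isf=triangular_isf):
    """Phase shifts persist, so several small impulses simply add.
    `impulses` is a list of (phase_of_injection, delta_q) pairs."""
    return sum(phase_shift(dq, q_max, x, isf) for x, dq in impulses)

if __name__ == "__main__":
    q_max = 100e-15   # hypothetical maximum charge swing, 100 fC
    dq = 1e-15        # hypothetical injected charge, 1 fC
    print("during transition:", phase_shift(dq, q_max, 0.0))
    print("while saturated:  ", phase_shift(dq, q_max, math.pi / 2))
```

With these assumed values, the same 1 fC of charge shifts the phase when it arrives mid-transition and has no effect when it arrives while the node is held at a rail.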


Fig. 1. Five-stage inverter-chain ring oscillator.

Fig. 2. Effect of impulses injected during transition and peak.

Being interested in its phase, we can treat an oscillator as a system that converts voltages and currents to phase. As is evident from the discussion leading to (3), this system is linear for small perturbations. It is also time variant, no matter how small the perturbations are. Unlike amplitude changes, phase shifts persist indefinitely, since subsequent transitions are shifted by the same amount. Thus, the phase impulse response of an oscillator is a time-varying step. Also note that as long as the introduced change in the voltage due to the current impulse is small, the resultant phase shift is linearly proportional to the injected charge, and hence the transfer function from current to phase is linear.

The unit impulse response of the system is defined as the amount of phase shift per unit current impulse [16]. Based on the foregoing argument, we obtain the following time-dependent impulse response:

$$h_\phi(t, \tau) = \frac{\Gamma(\omega_0 \tau)}{q_{max}}\, u(t - \tau) \tag{4}$$

where $u(t)$ is a unit step. Knowing the response to an impulse, we can calculate $\phi(t)$ in response to any injected current using the superposition integral

$$\phi(t) = \int_{-\infty}^{\infty} h_\phi(t, \tau)\, i(\tau)\, d\tau = \frac{1}{q_{max}} \int_{-\infty}^{t} \Gamma(\omega_0 \tau)\, i(\tau)\, d\tau \tag{5}$$

where $i(t)$ represents the noise current injected into the node of interest. Note that the integration arises from the closed-loop nature of the oscillator. The single-sideband phase-noise spectrum due to a white-noise current source is given by [16]¹

$$\mathcal{L}\{f_{off}\} = \frac{\Gamma_{rms}^2}{8\pi^2 f_{off}^2} \cdot \frac{\overline{i_n^2}/\Delta f}{q_{max}^2} \tag{6}$$

where $\Gamma_{rms}$ is the rms value of the ISF, $\overline{i_n^2}/\Delta f$ is the single-sideband power spectral density of the noise current source, and $f_{off}$ is the frequency offset from the carrier. In the case of multiple noise sources injecting into the same node, $\overline{i_n^2}/\Delta f$ represents the total current noise due to all the sources and is given by the sum of individual noise power spectral densities [17]. If the noise sources on different nodes are uncorrelated, the waveform (and hence the ISF) of all the nodes are the same except for a phase shift, assuming identical stages. Therefore, the total phase noise due to all $N$ noise sources is $N$ times the value given by (6) (or $2N$ times for a differential ring oscillator).

From (5), it follows that the upconversion of low-frequency noise, such as $1/f$ noise, is governed by the dc value of the ISF. The corner frequency between the $1/f^3$ and $1/f^2$ regions in the spectrum of the phase noise is called $f_{1/f^3}$ and is related to the $1/f$ noise corner $f_{1/f}$ through the following equation [16]:

$$f_{1/f^3} = f_{1/f} \cdot \frac{\Gamma_{dc}^2}{\Gamma_{rms}^2} \tag{7}$$

where $\Gamma_{dc}$ is the dc value of the ISF. Since the height of the positive and negative lobes of the ISF is determined by the slope of the rising and falling edges of the output waveform, respectively, symmetry of the rising and falling edges can reduce $\Gamma_{dc}$ and hence the upconversion of $1/f$ noise.

¹ A more accurate treatment [17] shows that the phase noise does not grow without bound as $f_{off}$ approaches zero (it becomes flat for small values of $f_{off}$). However, this makes no practical difference in this discussion.

III. JITTER

In an ideal oscillator, the spacing between transitions is constant. In practice, however, the transition spacing will be variable. This uncertainty is known as clock jitter and increases with measurement interval $\Delta\tau$ (i.e., the time delay between the reference and the observed transitions), as illustrated in Fig. 3. This variability accumulation (i.e., "jitter accumulation") occurs because any uncertainty in an earlier transition affects all the following transitions, and its effect persists indefinitely. Therefore, the timing uncertainty when $\Delta\tau$ seconds have elapsed is the sum of the uncertainties associated with each transition.

Fig. 3. Clock jitter increasing with time.

The statistics of the timing jitter depend on the correlations among the noise sources involved. The case of each transition's being affected by independent noise sources has been considered in [12] and [13]. The jitter introduced by each stage is assumed to be totally independent of the jitter introduced by other stages, and therefore the total variance of the jitter is given by the sum of the variances introduced at each stage. For ring oscillators with identical stages, the variance will be given by $m\sigma_s^2$, where $m$ is the number of transitions during $\Delta\tau$ and $\sigma_s^2$ is the variance of the uncertainty introduced by one stage during one transition. Noting that $m$ is proportional to $\Delta\tau$, the standard deviation of the jitter after $\Delta\tau$ seconds is [13]

$$\sigma_{\Delta\tau} = \kappa \sqrt{\Delta\tau} \tag{8}$$

where $\kappa$ is a proportionality constant determined by circuit parameters.

Another instructive special case that is not usually considered is when the noise sources are totally correlated with one another. Substrate and supply noise are examples of such noise sources. Low-frequency noise sources, such as $1/f$ noise, can also result in a correlation between induced jitter on transitions over multiple cycles. In this case, the standard deviations rather than the variances add. Therefore, the standard deviation of the jitter after $\Delta\tau$ seconds is proportional to $\Delta\tau$:

$$\sigma_{\Delta\tau} = \zeta\, \Delta\tau \tag{9}$$

where $\zeta$ is another proportionality constant. Noise sources such as thermal noise of devices are usually modeled as uncorrelated, while substrate and supply-noise sources, as well as low-frequency noise, are approximated as partially or fully correlated sources. In practice, both correlated and uncorrelated sources exist in a circuit, and hence a log–log plot of the timing jitter $\sigma_{\Delta\tau}$ versus the measurement delay $\Delta\tau$ for an open-loop oscillator will demonstrate regions with slopes of 1/2 and 1, as shown in Fig. 4.

Fig. 4. RMS jitter versus measurement time on a log–log plot.
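The two slopes on the log–log jitter plot follow directly from how the per-transition errors add. The following is a minimal Monte Carlo sketch, not the authors' method: it assumes Gaussian per-transition timing errors, and the function names, the per-stage sigma, and the trial count are illustrative choices.

```python
import math
import random
import statistics

def accumulated_jitter_std(n_transitions, sigma_stage, correlated,
                           trials=2000, seed=1):
    """Standard deviation of the timing error after n_transitions.

    Uncorrelated case: every transition draws its own independent error,
    so variances add. Correlated case: one draw is shared by all
    transitions (a stand-in for supply or substrate noise), so the
    errors themselves add."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        if correlated:
            total = n_transitions * rng.gauss(0.0, sigma_stage)
        else:
            total = sum(rng.gauss(0.0, sigma_stage)
                        for _ in range(n_transitions))
        totals.append(total)
    return statistics.pstdev(totals)

def loglog_slope(correlated, n_small=16, n_large=256, sigma_stage=1e-12):
    """Slope of log(jitter) versus log(number of transitions)."""
    s_small = accumulated_jitter_std(n_small, sigma_stage, correlated)
    s_large = accumulated_jitter_std(n_large, sigma_stage, correlated)
    return math.log(s_large / s_small) / math.log(n_large / n_small)

if __name__ == "__main__":
    print("uncorrelated slope:", loglog_slope(correlated=False))
    print("correlated slope:  ", loglog_slope(correlated=True))
```

Because the number of transitions is proportional to the measurement delay, the slope against transition count is also the slope against delay: close to 1/2 for independent sources and 1 for fully correlated ones.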


Fig. 5. ISF for ring oscillators of the same frequency with different number of stages.

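In most digital applications, it is desirable for $\sigma_{\Delta\tau}$ to decrease at the same rate as the period $T$. In practice, we wish to keep constant the ratio of the timing jitter to the period. Therefore, in many applications, phase jitter, defined as

$$\sigma_{\Delta\phi} = 2\pi \frac{\sigma_{\Delta\tau}}{T} = \omega_0\, \sigma_{\Delta\tau} \tag{10}$$

is a more useful measure. An expression for $\sigma_{\Delta\phi}$ can be obtained using (5). As shown in Appendix A, for $\Delta\tau \gg T$ or $\Delta\tau = nT$, where $n$ is an integer, the phase jitter due to a single white noise source is given by

$$\sigma_{\Delta\phi}^2 = \frac{\Gamma_{rms}^2}{q_{max}^2} \cdot \frac{1}{2}\frac{\overline{i_n^2}}{\Delta f} \cdot \Delta\tau \tag{11}$$

Using (10) and (11), the proportionality constant $\kappa$ in (8) is calculated to be

$$\kappa = \frac{\Gamma_{rms}}{q_{max}\, \omega_0} \sqrt{\frac{1}{2}\frac{\overline{i_n^2}}{\Delta f}} \tag{12}$$

Fig. 6. Approximate waveform and ISF for ring oscillator.

IV. CALCULATION OF THE ISF FOR RING OSCILLATORS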

To calculate phase noise and jitter using (6) and (12), one needs to know the rms value of the ISF. Although one can always find the ISF through simulation, we obtain a closed-form approximate equation for the rms value of the ISF of ring oscillators, which usually makes such simulations unnecessary. It is instructive to look at the actual ISF of ring oscillators to gain insight into what constitutes a good approximation. Fig. 5 shows the shape of the ISF for a group of single-ended CMOS ring oscillators. The frequency of oscillation is kept constant (through adjustment of channel length), while the number of stages is varied from 3 to 15 (in odd numbers). To calculate the ISF, a narrow current pulse is injected into one of the nodes of the oscillator, and the resulting phase shift is measured a few cycles later in simulation.

As can be seen, increasing the number of stages reduces the peak value of the ISF. The reason is that the transitions of the normalized waveform become faster for larger $N$. Since the sensitivity during the transition is inversely proportional to the slope, the peak of the ISF drops. It should be noted that only the peak of the ISF is inversely proportional to the slope, and this relation should not be generalized to other points in time. Also, the widths of the lobes of the ISF decrease as $N$ becomes larger, since each transition occupies a smaller fraction of the period. Based on these observations, we approximate the ISF as triangular in shape and with symmetric rising and falling edges, as shown in Fig. 6. The case of nonsymmetric rising and falling edges is considered in Appendix B.

Fig. 7. Relationship between rise time and delay.

The ISF has a maximum of $1/f'_{max}$, where $f'_{max}$ is the maximum slope of the normalized waveform $f$ in (1). Also, the width of the triangles is approximately $2/f'_{max}$, and hence the slopes of the sides of the triangles are $\pm 1$. Therefore, assuming equality of rise and fall times, $\Gamma_{rms}$ can be estimated as

$$\Gamma_{rms}^2 = \frac{1}{2\pi}\int_0^{2\pi} \Gamma^2(x)\, dx = \frac{4}{2\pi}\int_0^{1/f'_{max}} x^2\, dx = \frac{2}{3\pi}\left(\frac{1}{f'_{max}}\right)^3 \tag{13}$$


Fig. 8. RMS values of the ISF’s for various single-ended ring oscillators versus number of stages.

On the other hand, stage delay is proportional to the rise time:

$$\hat{t}_D = \frac{\eta}{f'_{max}} \tag{14}$$

where $\hat{t}_D$ is the normalized stage delay and $\eta$ is a proportionality constant, which is typically close to one, as can be seen in Fig. 7. The period is $2N$ times longer than a single stage delay:

$$2\pi = 2N\hat{t}_D = \frac{2N\eta}{f'_{max}} \tag{15}$$

Using (13) and (15), the following approximate expression for $\Gamma_{rms}$ is obtained:

$$\Gamma_{rms} = \sqrt{\frac{2\pi^2}{3\eta^3}} \cdot \frac{1}{N^{1.5}} \tag{16}$$

Note that the $1/N^{1.5}$ dependence of $\Gamma_{rms}$ is independent of the value of $\eta$. Fig. 8 illustrates $\Gamma_{rms}$ for the ISF shown in Fig. 5 with plus signs on log–log axes. The solid line shows the line of $\Gamma_{rms} = 4/N^{1.5}$, which is obtained from (16) for $\eta = 0.75$. To verify the generality of (16), we maintain a fixed channel length for all the devices in the inverters while varying the number of stages to allow different frequencies of oscillation. Again, $\Gamma_{rms}$ is calculated, and is shown in Fig. 8 with circles. We also repeat the first experiment with a different supply voltage (3 V as opposed to 5 V), and the result is shown with crosses. As can be seen, the values of $\Gamma_{rms}$ are almost identical for these three cases.

It should not be surprising that $\Gamma_{rms}$ is primarily a function of $N$, because the effect of variations in other parameters, such as $q_{max}$ and device noise, has already been decoupled from $\Gamma(x)$, and thus the ISF is a unitless, frequency- and amplitude-independent function.
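The closed-form rms value can be checked against a direct numerical integration of the triangular approximation. The sketch below assumes the triangular ISF described in the text (peak $1/f'_{max}$, side slopes of $\pm 1$, one positive and one negative lobe per period) together with $2\pi = 2N\eta/f'_{max}$; the function names and the sample count are illustrative.

```python
import math

def gamma_rms_closed_form(n_stages, eta):
    """Closed-form rms of the triangular ISF: sqrt(2*pi^2/(3*eta^3)) / N^1.5."""
    return math.sqrt(2 * math.pi ** 2 / (3 * eta ** 3)) / n_stages ** 1.5

def gamma_rms_numeric(n_stages, eta, samples=100_000):
    """Midpoint-rule rms of a triangular ISF with two lobes per period."""
    peak = math.pi / (n_stages * eta)   # 1/f'_max from 2*pi = 2*N*eta/f'_max
    if 2 * peak > math.pi:
        raise ValueError("lobes would overlap; need N * eta >= 2")

    def isf(x):
        # Positive lobe centred at pi/2, negative lobe centred at 3*pi/2,
        # each with unit-magnitude side slopes.
        rising = max(0.0, peak - abs(x - math.pi / 2))
        falling = max(0.0, peak - abs(x - 3 * math.pi / 2))
        return rising - falling

    dx = 2 * math.pi / samples
    mean_square = sum(isf((k + 0.5) * dx) ** 2
                      for k in range(samples)) * dx / (2 * math.pi)
    return math.sqrt(mean_square)

if __name__ == "__main__":
    for n in range(3, 16, 2):
        print(n, gamma_rms_closed_form(n, 0.75), gamma_rms_numeric(n, 0.75))
```

Setting `n_stages = 1` in the closed form returns just the leading coefficient, which is close to 4 for $\eta = 0.75$ and close to 3 for $\eta = 0.9$.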

Equation (16) is valid for differential ring oscillators as well, since in its derivation no assumption specific to single-ended oscillators was made. Fig. 9 shows the $\Gamma_{rms}$ for three sets of differential ring oscillators, with a varying number of stages (4–16). The data shown with plus signs correspond to oscillators in which the total power dissipation and the drain voltage swing are kept constant by scaling the tail-current sources and load resistors as $N$ changes. Members of the second set of oscillators have a fixed total power dissipation and fixed load resistors, which result in variable swings; their data are shown with circles. The third case is that of a fixed tail current for each stage and constant load resistors, whose data are illustrated using crosses. Again, in spite of the diverse variations of the frequency and other circuit parameters, the $1/N^{1.5}$ dependency of $\Gamma_{rms}$ and its independence from other circuit parameters still hold. In the case of a differential ring oscillator, $\Gamma_{rms} = 3/N^{1.5}$, which corresponds to $\eta = 0.9$, is the best fit approximation for $\Gamma_{rms}$. This is shown with the solid line in Fig. 9. A similar result can be obtained for bipolar differential ring oscillators.

Although $\Gamma_{rms}$ decreases as the number of stages increases, one should not prematurely conclude that the phase noise can be reduced using a larger number of stages, because the number of noise sources, as well as their magnitudes, also increases for a given total power dissipation and frequency of oscillation.

In the case of asymmetric rising and falling edges, both $\Gamma_{rms}$ and $\Gamma_{dc}$ will change. As shown in Appendix B, the $1/f^3$ corner of the phase-noise spectrum is inversely proportional to the number of stages. Therefore, the $1/f^3$ corner can be reduced either by making the transitions more symmetric in terms of rise and fall times or by increasing the number of stages. Although the former always helps, the latter has other implications on the phase noise in the $1/f^2$ region, as will be shown in the following section.


Fig. 9. RMS values of the ISF’s for various differential ring oscillators versus number of stages.

V. EXPRESSIONS FOR JITTER AND PHASE NOISE IN RING OSCILLATORS

In this section, we derive expressions for the phase noise and jitter of different types of ring oscillators. Throughout this section, we assume that the symmetry criteria required to minimize $\Gamma_{dc}$ (and hence the upconversion of $1/f$ noise) are already met and that the jitter and phase noise of the oscillator are dominated by white noise. For CMOS transistors, the drain current noise spectral density is given by

$$\frac{\overline{i_n^2}}{\Delta f} = 4kT\gamma\, g_{d0} = 4kT\gamma\, \mu C_{ox} \frac{W}{L} \Delta V \tag{17}$$

where $g_{d0}$ is the zero-bias drain source conductance, $\mu$ is the mobility, $C_{ox}$ is the gate-oxide capacitance per unit area, $W$ and $L$ are the channel width and length of the device, respectively, and $\Delta V$ is the gate voltage overdrive. The coefficient $\gamma$ is 2/3 for long-channel devices in the saturation region and typically two to three times greater for short-channel devices [18]. Equation (17) is valid in both short- and long-channel regimes as long as an appropriate value for $\gamma$ is used.

A. Single-Ended CMOS Ring Oscillators

We start with a single-ended CMOS ring oscillator with equal-length NMOS and PMOS transistors. Assuming that $V_{TN} = |V_{TP}|$, the maximum total channel noise from NMOS and PMOS devices, when both the input and output are at $V_{DD}/2$, is given by

$$\frac{\overline{i_n^2}}{\Delta f} = \left(\frac{\overline{i_n^2}}{\Delta f}\right)_N + \left(\frac{\overline{i_n^2}}{\Delta f}\right)_P = 4kT\gamma\, \mu_{eff} C_{ox} \frac{W_{eff}}{L} \Delta V \tag{18}$$

where

$$W_{eff} = W_n + W_p \tag{19}$$

and

$$\mu_{eff} = \frac{\mu_n W_n + \mu_p W_p}{W_n + W_p} \tag{20}$$

and $\Delta V$ is the gate overdrive in the middle of transition, i.e., $\Delta V = V_{DD}/2 - V_T$.

During one period, each node is charged to $q_{max}$ and then discharged to zero. In an $N$-stage single-ended ring oscillator, the power dissipation associated with this process is $N q_{max} V_{DD} f_0$. However, during the transitions, some extra current, known as crowbar current, is drawn from the supply, which does not contribute to charging and discharging the capacitors and goes directly from supply to ground through both transistors. In a symmetric ring oscillator, these two components are approximately equal, and their difference will depend on the ratio of the rise time and stage delay. Therefore, the total power dissipation is approximately given by

$$P = 2\eta N V_{DD}\, q_{max} f_0 \tag{21}$$

Assuming $V_{TN} = |V_{TP}|$ to make the waveforms symmetric to the first order, we have

$$f_0 = \frac{1}{2N t_D} = \frac{1}{\eta N (t_r + t_f)} \approx \frac{\mu_{eff} W_{eff} C_{ox} \Delta V^2}{8\eta N L\, q_{max}} \tag{22}$$

where $t_D$ is the delay of each stage and $t_r$ and $t_f$ are the rise and fall time, respectively, associated with the maximum slope during a transition.

Assuming that the thermal noise sources of the different devices are uncorrelated, and assuming that the waveforms (and hence the ISF) of all the nodes are the same except for a phase shift, the total phase noise due to all $N$ noise sources is


$N$ times the value given by (6). Taking only these inevitable noise sources into account, (6), (16), (18), (21), and (22) result in the following expressions for phase noise and jitter:

$$\mathcal{L}\{f_{off}\} \approx \frac{8}{3\eta} \cdot \frac{kT}{P} \cdot \frac{V_{DD}}{V_{char}} \cdot \frac{f_0^2}{f_{off}^2} \tag{23}$$

$$\kappa \approx \sqrt{\frac{8}{3\eta} \cdot \frac{kT}{P} \cdot \frac{V_{DD}}{V_{char}}} \tag{24}$$

where $V_{char}$ is the characteristic voltage of the device. For long-channel mode of operation, it is defined as $V_{char} = \Delta V/\gamma$. Any extra disturbance, such as substrate and supply noise, or noise contributed by extra circuitry or asymmetry in the waveform, will result in a larger number than (23) and (24). Note that lowering threshold voltages reduces the phase noise, in agreement with [12]. Therefore, the minimum achievable phase noise and jitter for a single-ended CMOS ring oscillator, assuming that all symmetry criteria are met, occurs for zero threshold voltage:

$$\mathcal{L}_{min}\{f_{off}\} \approx \frac{16\gamma}{3\eta} \cdot \frac{kT}{P} \cdot \frac{f_0^2}{f_{off}^2} \tag{25}$$

$$\kappa_{min} \approx \sqrt{\frac{16\gamma}{3\eta} \cdot \frac{kT}{P}} \tag{26}$$

As can be seen, the minimum phase noise is inversely proportional to the power dissipation and grows quadratically with the oscillation frequency. Further, note the lack of dependence on the number of stages (for a given power dissipation and oscillation frequency). Evidently, the increase in the number of noise sources (and in the maximum power due to the higher transition currents required to run at the same frequency) essentially cancels the effect of decreasing $\Gamma_{rms}$ as $N$ increases, leading to no net dependence of phase noise on $N$. This somewhat surprising result may explain the confusion that exists regarding the optimum $N$, since there is not a strong dependence on the number of stages for single-ended CMOS ring oscillators. Note that (25) and (26) establish the lower bound and therefore should not be used to calculate the phase noise and jitter of an arbitrary oscillator, for which (6) and (12) should be used, respectively.

We may carry out a similar calculation for the short-channel case. For such devices, the drain current may be expressed as

$$I_D = \frac{\mu C_{ox}}{2} W E_c\, \Delta V \tag{27}$$

where $E_c$ is the critical electric field and is defined as the value of electric field resulting in half the carrier velocity expected from low field mobility. Combining (17) with (27), we obtain the following expression for the drain current noise of a MOS device in short channel:

$$\frac{\overline{i_n^2}}{\Delta f} = 4kT\gamma\, \mu C_{ox} \frac{W}{L} \Delta V = \frac{8kT\gamma\, I_D}{E_c L} \tag{28}$$

The frequency of oscillation can be approximated by

$$f_0 = \frac{1}{\eta N (t_r + t_f)} \approx \frac{\mu_{eff} W_{eff} C_{ox} E_c\, \Delta V}{8\eta N\, q_{max}} \tag{29}$$

Using (28) and (29), we obtain the same expressions for phase noise and jitter as given by (23) and (24), except for a new $V_{char}$:

$$V_{char} = \frac{E_c L}{\gamma} \tag{30}$$

which results in a larger phase noise and jitter than in the long-channel case. Again, note the absence of any dependency on the number of stages.

B. Differential CMOS Ring Oscillators

Now consider a differential MOS ring oscillator with resistive load. The total power dissipation is

$$P = N I_{tail} V_{DD} \tag{31}$$

where $N$ is the number of stages, $I_{tail}$ is the tail bias current of the differential pair, and $V_{DD}$ is the supply voltage. The frequency of oscillation can be approximated by

$$f_0 = \frac{1}{2N t_D} \approx \frac{1}{2\eta N t_r} \approx \frac{I_{tail}}{2\eta N\, q_{max}} \tag{32}$$

Surprisingly, tail-current source noise in the vicinity of $f_0$ does not affect the phase noise. Rather, its low-frequency noise as well as its noise in the vicinity of even multiples of the oscillation frequency affect the phase noise. Tail noise in the vicinity of even harmonics can be significantly reduced by a variety of means, such as with a series inductor or a parallel capacitor. As before, the effect of low-frequency noise can be minimized by exploiting symmetry. Therefore, only the noise of the differential transistors and the load are taken into account. The total current noise on each single-ended node is given by

$$\frac{\overline{i_n^2}}{\Delta f} = \left(\frac{\overline{i_n^2}}{\Delta f}\right)_N + \left(\frac{\overline{i_n^2}}{\Delta f}\right)_{Load} = 4kT\, I_{tail} \left(\frac{1}{V_{char}} + \frac{1}{R_L I_{tail}}\right) \tag{33}$$

where $R_L$ is the load resistor, $V_{char} = (V_{GS} - V_T)/\gamma$ for a balanced stage in the long-channel limit, and $V_{char} = E_c L/\gamma$ in the short-channel regime. The phase noise and jitter due to all $2N$ noise sources is $2N$ times the value given by (6) and (12). Using (16), the expression for the phase noise of the differential MOS ring oscillator is

$$\mathcal{L}\{f_{off}\} \approx \frac{8}{3\eta} \cdot N \cdot \frac{kT}{P} \cdot \left(\frac{V_{DD}}{V_{char}} + \frac{V_{DD}}{R_L I_{tail}}\right) \cdot \frac{f_0^2}{f_{off}^2} \tag{34}$$

and $\kappa$ is given by

$$\kappa \approx \sqrt{\frac{8}{3\eta} \cdot N \cdot \frac{kT}{P} \cdot \left(\frac{V_{DD}}{V_{char}} + \frac{V_{DD}}{R_L I_{tail}}\right)} \tag{35}$$

Equations (34) and (35) are valid in both long- and short-channel regimes of operation with the right choice of $V_{char}$. Note that, in contrast with the single-ended ring oscillator, a differential oscillator does exhibit a phase noise and jitter dependency on the number of stages, with the phase noise


degrading as the number of stages increases for a given frequency and power dissipation. This result may be understood as a consequence of the necessary reduction in the charge swing that is required to accommodate a constant frequency of oscillation at a fixed power level as $N$ increases. At the same time, increasing the number of stages at a fixed total power dissipation demands a proportional reduction of tail-current sources, which will reduce the swing, and hence $q_{max}$, by a factor of $1/N$.

C. Bipolar Differential Ring Oscillator

A similar approach allows us to derive the corresponding results for a bipolar differential ring oscillator. In this case, the power dissipation is given by (31) and the oscillation frequency by (32). The total noise current is given by the sum of collector shot noise and load resistor noise:

$$\frac{\overline{i_n^2}}{\Delta f} = 2qI_c + \frac{4kT}{R_L} \tag{36}$$

where $q$ is the electron charge and $I_c$ is the collector current during the transition. Using these relations, the phase noise and jitter of a bipolar ring oscillator are again given by (34) and (35) with the appropriate choice of $V_{char}$.

VI. OTHER NOISE SOURCES

Other noise sources, such as tail-current source noise in a differential structure, or substrate and supply noise sources, may play an important role in the jitter and phase noise of ring oscillators. The low-frequency noise of the tail-current source affects phase noise if the symmetry criteria mentioned in Section II are not met by each half circuit. In such cases, the ISF for the tail-current source has a large dc value, which increases the upconversion of low-frequency noise to phase noise. This upconversion is particularly prominent if the tail device has a large $1/f$ noise corner.

Substrate and supply noise are among other important sources of noise. There are two major differences between these noise sources and internal device noise. First, the power spectral density of these sources is usually nonwhite and often demonstrates strong peaks at various frequencies. Even more important is that the substrate and supply noise on different nodes of the ring oscillator have a very strong correlation. This property changes the response of the oscillator to these sources.

To understand the effect of this correlation, let us consider the special case of having equal noise sources on all the nodes of the oscillator. If all the inverters in the oscillator are the same, the ISF for different nodes will only differ in phase by multiples of $2\pi/N$, as shown in Fig. 10. Therefore, the total phase due to all the sources is given by superposition of (5):

$$\phi(t) = \frac{1}{q_{max}} \int_{-\infty}^{t} i(\tau) \left[\sum_{n=0}^{N-1} \Gamma\!\left(\omega_0 \tau + \frac{2\pi n}{N}\right)\right] d\tau \tag{37}$$

Fig. 10. Phasors for noise contributions from each source.

Fig. 11. Sideband power below carrier for equal sources on all five nodes at $n f_0 + f_m$.

Expanding the term in brackets in a Fourier series, we can show that it is zero except at dc and multiples of $N\omega_0$, i.e.,

$$\phi(t) = \frac{N}{q_{max}} \sum_{n=0}^{\infty} c_{(nN)} \int_{-\infty}^{t} i(\tau) \cos(nN\omega_0 \tau)\, d\tau \tag{38}$$

where $c_n$ is the $n$th Fourier coefficient of the ISF. Equation (38) means that for identical sources, only noise in the vicinity of integer multiples of $N f_0$ affects the phase.

To verify this effect, sinusoidal currents with an amplitude of 10 µA were injected into all five nodes of the five-stage ring oscillator of Fig. 1 at different offsets from integer multiples of the frequency of oscillation, and the induced sidebands were measured. The measured sideband power with respect to the carrier is plotted in Fig. 11. As can be seen in Fig. 11, only injection at low frequency and in the vicinity of the fifth harmonic is integrated and shows a 20 dB/dec slope. The effect of injection in the vicinity of harmonics that are not integer multiples of $N$ is much smaller than at the integer ones. Ideally, there should be no sideband induced by the injection in the vicinity of harmonics that are not integer multiples of $N$; however, as can be seen in Fig. 11, there is some sideband power due to the amplitude response.

Low-frequency noise can also result in correlation between uncertainties introduced during different cycles, as its value does not change significantly over a small number of periods.


Therefore, the uncertainties add up in amplitude rather than power, resulting in a region with a slope of one in the log–log plot of jitter even in the absence of external noise sources such as substrate and supply noise.
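The cancellation that occurs for identical sources on every node can be checked numerically: summing $N$ copies of any ISF, each shifted by $2\pi/N$, leaves only the dc term and the harmonics that are multiples of $N$. The sketch below uses a made-up set of ISF Fourier coefficients (amplitude and phase per harmonic); the coefficient values and function names are assumptions for illustration only.

```python
import math

def isf(x, coeffs):
    """ISF built from (amplitude, phase) pairs; index 0 is the dc term."""
    return sum(a * math.cos(n * x + th) for n, (a, th) in enumerate(coeffs))

def summed_isf(x, coeffs, n_stages):
    """Sum of the ISFs seen from all nodes, each shifted by 2*pi/N."""
    return sum(isf(x + 2 * math.pi * k / n_stages, coeffs)
               for k in range(n_stages))

def harmonic_amplitude(func, n, samples=256):
    """Amplitude of the n-th harmonic of a 2*pi-periodic function,
    computed with a rectangle rule (exact for low-order harmonics)."""
    dx = 2 * math.pi / samples
    re = sum(func(k * dx) * math.cos(n * k * dx) for k in range(samples))
    im = sum(func(k * dx) * math.sin(n * k * dx) for k in range(samples))
    amp = math.hypot(re, im) * dx / math.pi
    return amp / 2 if n == 0 else amp

# Hypothetical ISF harmonics 0..10 as (amplitude, phase in rad).
COEFFS = [(0.1, 0.0), (1.0, 0.3), (0.5, -0.2), (0.3, 1.0), (0.2, 0.5),
          (0.15, 0.7), (0.1, 0.1), (0.05, 0.0), (0.04, 0.2), (0.03, 0.4),
          (0.02, 0.6)]

def surviving_harmonics(n_stages, coeffs=COEFFS):
    """Harmonic amplitudes of the summed ISF for an N-stage ring."""
    total = lambda x: summed_isf(x, coeffs, n_stages)
    return [harmonic_amplitude(total, n) for n in range(len(coeffs))]

if __name__ == "__main__":
    for n, amp in enumerate(surviving_harmonics(5)):
        print(n, round(amp, 6))
```

For a five-stage ring, only harmonics 0, 5, and 10 survive, each scaled by $N$, which matches the observation that only injection near dc and near the fifth harmonic produces integrated sidebands.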

VII. DESIGN IMPLICATIONS

One can use (23) and (34) to compare the phase-noise performance of single-ended and differential MOS ring oscillators. As can be seen, for $N$ stages, the phase noise of the differential ring oscillator is approximately $N\left(1 + V_{char}/(R_L I_{tail})\right)$ times larger than the phase noise of a single-ended oscillator of equal $N$, $P$, and $f_0$. Since the minimum $N$ for a regular ring oscillator is three, even a properly designed differential CMOS ring oscillator underperforms its single-ended counterpart, especially for a larger number of stages. This difference is even more pronounced if proper precautions to reduce the noise of the tail current are not taken. However, the differential ring oscillator may still be preferred in IC's because of the lower sensitivity to substrate and supply noise, as well as lower noise injection into other circuits on the same chip. The decision to use differential versus single-ended ring oscillators should be based on both of these considerations. The common-mode sensitivity problem in a single-ended ring oscillator can be mitigated to some extent by using two identical ring oscillators laid out close to each other that oscillate out of phase because of small coupling inverters [19]. Single-ended configurations may be used in a less noisy environment to achieve better phase-noise performance for a given power dissipation.

As shown in Appendix B, asymmetry of the rising and falling edges degrades phase noise and jitter by increasing the $1/f^3$ corner frequency. Thus, every effort should be taken to make the rising and falling edges symmetric. By properly adjusting the symmetry properties, one can suppress or even eliminate low-frequency-noise upconversion [16]. As shown in [16], differential symmetry is insufficient, and the symmetry of each half circuit is important. One practical method to achieve this symmetry is to use more linear loads, such as resistors or linearized MOS devices. This method reduces the $1/f$ noise upconversion and substrate and supply coupling [20]. Another revealing implication, shown in Appendix A, is the reduction of the $1/f^3$ corner frequency as $N$ increases. Hence for a process with large $1/f$ noise, a larger number of stages may be helpful.

One question that frequently arises in the design of ring oscillators is the optimum number of stages for minimum jitter and phase noise. As seen in (23), the phase noise and jitter in the $1/f^2$ region are not a strong function of the number of stages for single-ended CMOS ring oscillators. However, if the symmetry criteria are not well satisfied and/or the process has a large $1/f$ noise, a larger $N$ will reduce the jitter. In general, the choice of the number of stages must be made on the basis of several design criteria, such as $1/f$ noise effect, the desired maximum frequency of oscillation, and the influence of external noise sources, such as supply and substrate noise, that may not scale with $N$.

The jitter and phase noise behavior is different for differential ring oscillators. As (34) suggests, jitter and phase noise increase with an increasing number of stages. Hence if the $1/f$ noise corner is not large, and/or proper symmetry measures have been taken, the minimum number of stages (three or four) should be used to give the best performance. This recommendation holds even if the power dissipation is not a primary issue. It is not fair to argue that burning more power in a larger number of stages allows the achievement of better phase noise, since dissipating the same total power in a smaller number of stages results in better jitter/phase noise as long as it is possible to maximize the total charge swing.

Another insight one can obtain from (34) and (35) is that the jitter of a MOS differential ring oscillator at a given $N$, $P$, and $f_0$ is smaller than that of a differential bipolar ring oscillator, at least for today's range of circuit and process parameters. As we go to shorter channel lengths, the characteristic voltage for the MOS devices given by (30) becomes smaller, and thus phase noise degrades with scaling. Bipolar ring oscillators do not suffer from this problem.

LC oscillators generally have better phase noise and jitter compared to ring oscillators for two reasons. First, a ring oscillator stores a certain amount of energy in the capacitors during every cycle and then dissipates all the stored energy during the same cycle, while an LC resonator dissipates only $2\pi/Q$ of the total energy stored during one cycle. Thus, for a given power dissipation in steady state, a ring oscillator suffers from a smaller maximum charge swing $q_{max}$. Second, in a ring oscillator, the device noise is maximum during the transitions, which is the time where the sensitivity, and hence the ISF, is the largest [16].

VIII. EXPERIMENTAL RESULTS

The phase-noise measurements in this section were performed using three different systems: an HP 8563E spectrum analyzer with phase-noise measurement capability, an RDL NTS-1000A phase-noise measurement system, and an HP E5500 phase-noise measurement system. The jitter measurements were performed using a Tektronix CSA 803A communication signal analyzer.

Tables I–III summarize the phase-noise measurements. All the reported phase-noise values are at a 1-MHz offset from the carrier, chosen to achieve the largest dynamic range in the measurement. Table I shows the measurement results for three different inverter-chain ring oscillators. These oscillators are made of the CMOS inverters shown in Fig. 12(a), with no frequency tuning mechanism. The output is taken from one node of the ring through a few stages of tapered inverters. Oscillators number 1 and 2 are fabricated in a 2-μm 5-V CMOS process, and oscillator number 3 is fabricated in a 0.25-μm 2.5-V process. The second column shows the number of stages in each of the oscillators. The $W/L$ ratios of the NMOS and PMOS devices, as well as the supply voltages, the total measured supply currents, and the frequencies of oscillation are shown next. The phase-noise prediction using (23) and (6), together with the measured phase noise, are shown in the last three columns.
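As a rough numerical companion to the predictions quoted in the tables, the sketch below evaluates a white-noise phase-noise expression of the form used in the general theory of [16], $\mathcal{L}\{\Delta f\} = n\,\Gamma_{rms}^2\,(\overline{i_n^2}/\Delta f)/(8\pi^2 \Delta f^2 q_{max}^2)$, where $n$ counts identical, uncorrelated noise sources. It assumes that (6) has this form. The function name, the argument names, and the numbers in the usage line are hypothetical and are not taken from Tables I–III.

```python
import math


def phase_noise_dbc_per_hz(gamma_rms, q_max, in2_per_hz, offset_hz, n_sources=1):
    """Phase noise in dBc/Hz at offset_hz in the 1/f^2 region.

    Assumed form (an assumption about (6), following [16]):
        L = n_sources * gamma_rms^2 * in2_per_hz / (8 pi^2 offset^2 q_max^2)
    gamma_rms  : rms value of the ISF (dimensionless)
    q_max      : maximum charge swing on a node, in coulombs
    in2_per_hz : current-noise power spectral density, in A^2/Hz
    """
    ratio = (n_sources * gamma_rms ** 2 * in2_per_hz
             / (8.0 * math.pi ** 2 * offset_hz ** 2 * q_max ** 2))
    return 10.0 * math.log10(ratio)


# Hypothetical values only: Gamma_rms = 0.5, q_max = 100 fC,
# 4e-22 A^2/Hz per node, five identical noise sources, 1-MHz offset.
example_dbc = phase_noise_dbc_per_hz(0.5, 100e-15, 4e-22, 1e6, n_sources=5)
```

Because the expression scales as $1/\Delta f^2$, moving the offset by a decade changes the result by 20 dB, and doubling the number of identical sources raises it by about 3 dB.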

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS


TABLE I INVERTER-CHAIN RING OSCILLATORS

TABLE II CURRENT-STARVED INVERTER-CHAIN RING OSCILLATORS

As an illustrative example, we will show the details of phase-noise calculations for oscillator number 3. Using (16) to calculate $\Gamma_{rms}$, the phase noise can be obtained from (6). We calculate the noise power when the stage is halfway through a transition. At this point, the drain current is simulated to be 3.47 mA. An $E_c$ of $4 \times 10^{6}$ V/m and a $\gamma$ of 2.5 are used in (28) to obtain the noise power density in A$^2$/Hz. The total capacitance on each node (in fF) then gives $q_{max}$ (in fC). There is one such noise source on each node; therefore, the phase noise is $N$ times the value given by (6), which yields the predicted phase noise at a 1-MHz offset in dBc/Hz.

Table II summarizes the data obtained for current-starved ring oscillators with the cell structure shown in Fig. 12(b), all implemented in the same 0.25-μm 2.5-V process. Ring oscillators with a different number of stages were designed with roughly constant oscillation frequency and total power dissipation. Frequency adjustment is achieved by changing the channel length, while total power dissipation control is performed by changing device width. The $W/L$ ratios of the inverter and the tail NMOS and PMOS devices are shown in Table II. One node is kept at a fixed potential while the other node is at 0 V. The measured total current dissipation and the frequency of oscillation can be found in columns 7 and 8. Phase-noise calculations based on (23) and (6) are in good agreement with the measured results. The die photo of the chip containing these oscillators is shown in Fig. 13. The slightly superior phase noise of the three-stage ring oscillator (number 4) can be attributed to lower oscillation frequency and longer channel length (and hence smaller $\gamma$).

Fig. 13. Die photograph of the current-starved single-ended oscillators.

Table III summarizes the results obtained for differential ring oscillators of various sizes and lengths with the inverter topology shown in Fig. 12(c), covering a large span of frequencies up to 5.5 GHz. All these ring oscillators are implemented in the same 0.25-μm 2.5-V process, and all the oscillators, except the one marked with N/A, have the tuning circuit shown. The resistors are implemented using an unsilicided polysilicon layer. The main reason to use poly resistors is to reduce $1/f$ noise upconversion by making the waveform on each node closer to the step response of an RC network, which is more symmetrical. The value of these load resistors and the $W/L$ ratios of the differential pair are shown in Table III. A fixed 2.5-V power supply is used, resulting in different total power dissipations. As before, the measured phase noise is in good agreement with the predicted phase noise using (34) and (6). The die photo of oscillator number 26 can be found in Fig. 14.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

TABLE III DIFFERENTIAL RING OSCILLATORS

Fig. 12. Inverter stages for (a) inverter-chain ring oscillators, (b) current-starved inverter-chain ring oscillators, and (c) differential ring oscillators.

Fig. 14. Die photograph of the 12-stage differential ring oscillator.

To illustrate further how one obtains the phase-noise predictions shown in Table III, we elaborate on the phase-noise calculations for oscillator number 12. The noise current due to one of the differential-pair NMOS devices is given by (28). The total capacitance on each node in the balanced case (in fF) and the simulated voltage swing of 1.208 V together give $q_{max}$ (in fC). In the balanced case, the device current is half of the tail current, and this value sets the single-sideband spectral density of the noise current of the NMOS device (in A$^2$/Hz). The thermal noise of the load resistor adds a further current-noise density, and the total current-noise density is the sum of the two. For a differential ring oscillator with $N$ stages, there is one such noise source on each node; therefore, the phase noise is $2N$ times the value given by (6), which yields the predicted phase noise at a 1-MHz offset. Using the total power dissipation (in mW) and an $\eta$ of 0.9, (34) likewise predicts the phase noise at a 1-MHz offset.

Timing jitter for oscillator number 12 can be measured using the setup shown in Fig. 15. The oscillator output is divided into two equal-power outputs using a power splitter. The CSA 803A is not capable of showing the edge it uses to trigger, as there is a 21-ns minimum delay between the triggering transition and the first acquired sample. To be able to look at the triggering edge and perhaps the edges before that, a delay line of approximately 25 ns is inserted in the signal path in front of the sampling head. This way, one may look at the exact edge used to trigger the signal. If the sampling head and the power splitter were noiseless, this edge would show no jitter. However, the power splitter and the sampling head introduce noise onto the signal, which cannot be easily distinguished from the jitter of the device under test (DUT). This extra jitter can be directly measured by looking at the jitter on the triggering edge. This edge can be readily identified since it has lower rms jitter than the transitions before and after it.

Fig. 15. Timing jitter measurement setup using CSA803A.

The effect of this excess jitter should be subtracted from the jitter due to the DUT. Assuming no correlation between the jitter of the DUT and the sampling head, the equivalent jitter due to the DUT can be estimated by

$$\sigma_{eff} = \sqrt{\sigma_{\Delta\tau}^{2} - \sigma_{0}^{2}} \tag{39}$$

where $\sigma_{eff}$ is the effective rms timing jitter, $\sigma_{\Delta\tau}$ is the measured rms jitter at a delay $\Delta\tau$ after the triggering edge, and $\sigma_{0}$ is the jitter on the triggering edge.

Fig. 16 shows the rms jitter versus the measurement delay for oscillator number 12 on a log–log plot. The best fit to the data shown in Fig. 16 can be compared with the values that result from (12) and (35). The region of the jitter plot with a slope of one can be attributed to the $1/f$ noise of the devices, as discussed at the end of Section VI.

In a separate experiment, the phase noise of oscillator number 7 is measured for different values of two bias voltages.
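The correction for instrument jitter described above amounts to a subtraction in quadrature, and the regions of a jitter-versus-delay plot can be told apart by their log–log slope (about 1/2 where white noise dominates, about 1 where $1/f$ noise dominates). The following is a minimal sketch with hypothetical function names and synthetic data, not the measured values of Fig. 16.

```python
import math


def dut_jitter(measured_rms, trigger_rms):
    """Remove uncorrelated instrument jitter from a measurement in quadrature."""
    if measured_rms < trigger_rms:
        raise ValueError("measured jitter is below the instrument floor")
    return math.sqrt(measured_rms ** 2 - trigger_rms ** 2)


def loglog_slope(delays, jitters):
    """Least-squares slope of log10(jitter) versus log10(delay)."""
    xs = [math.log10(d) for d in delays]
    ys = [math.log10(j) for j in jitters]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data: jitter = kappa * sqrt(delay) with a made-up kappa.
kappa = 1e-8
delays = [1e-9, 1e-8, 1e-7, 1e-6]
jitters = [kappa * math.sqrt(d) for d in delays]
slope = loglog_slope(delays, jitters)  # close to 0.5 for this synthetic data
```

In practice the slope would be fitted separately over short and long delays, since a single fit across both regions averages the two behaviors.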

These bias voltages are chosen in such a way as to keep a constant oscillation frequency while changing only the ratio of rise time to fall time. The $1/f^3$ corner of the phase noise is measured for different ratios of the pullup and pulldown currents while keeping the frequency constant. One can observe a sharp reduction in the corner frequency at the point of symmetry in Fig. 17.

Fig. 16. RMS jitter versus measurement interval for the four-stage, 2.8-GHz differential ring oscillator (oscillator number 12).

Fig. 17. Phase noise versus symmetry voltage for oscillator number 7.

IX. CONCLUSION

An analysis of the jitter and phase noise of single-ended and differential ring oscillators was presented. The general noise model, based on the ISF, was applied to the case of ring oscillators, resulting in a closed-form expression for the phase noise and jitter of ring oscillators [(6), (23), (34)]. The model was used to perform a parallel analysis of jitter and phase noise for ring oscillators. The effect of the number of stages on the phase noise and jitter at a given total power dissipation and frequency of oscillation was shown for single-ended and differential ring oscillators using the general expression for the rms value of the ISF. The upconversion of low-frequency $1/f$ noise was analyzed, showing the effect of waveform asymmetry and the number of stages. New design insights arising from this approach were introduced, and good agreement between theory and measurements was obtained.

APPENDIX A
RELATIONSHIP BETWEEN JITTER AND PHASE NOISE

The phase jitter is

(40)

where

(41)

Therefore

(42)

For a white-noise current source, the autocorrelation function is a delta function; therefore

(43)

which gives

(44)

Analog and digital designers prefer using phase noise and timing jitter, respectively. The relationship between these two parameters can be obtained by noting that timing jitter is the standard deviation of the timing uncertainty

$$\sigma_{\Delta\tau}^{2} = \frac{1}{\omega_0^{2}}\, E\{[\phi(t+\Delta\tau) - \phi(t)]^{2}\} \tag{45}$$

where $E[\cdot]$ represents the expected value. Since the autocorrelation function of $\phi(t)$ is defined as

$$R_{\phi}(\Delta\tau) = E[\phi(t+\Delta\tau)\,\phi(t)] \tag{46}$$

the timing jitter in (45) can be written as

$$\sigma_{\Delta\tau}^{2} = \frac{2}{\omega_0^{2}}\,[R_{\phi}(0) - R_{\phi}(\Delta\tau)] \tag{47}$$

The relation between the autocorrelation and the power spectrum is given by the Khinchin theorem [21], i.e.,

$$R_{\phi}(\Delta\tau) = \int_{-\infty}^{\infty} S_{\phi}(f)\, e^{j 2\pi f \Delta\tau}\, df \tag{48}$$

where $S_{\phi}(f)$ represents the power spectrum of $\phi(t)$. Therefore, (47) results in the following relationship between clock jitter and phase noise:

$$\sigma_{\Delta\tau}^{2} = \frac{8}{\omega_0^{2}} \int_{0}^{\infty} S_{\phi}(f)\, \sin^{2}(\pi f \Delta\tau)\, df \tag{49}$$

It may be useful to know that $S_{\phi}(f)$ can be approximated by $\mathcal{L}\{f\}$ for large offsets [22]. As can be seen from the foregoing, the rms timing jitter has less information than the phase-noise spectrum and can be calculated from phase noise using (49). However, unless extra information about the shape of the phase-noise spectrum is known, the inverse is not possible in general. In the special case where the phase noise is dominated by white noise, $\mathcal{L}\{\Delta f\}$ and $\kappa$ are given by (6) and (12). Therefore, $\kappa$ can be expressed in terms of phase noise in the $1/f^2$ region as

$$\kappa = \frac{\Delta f}{f_0}\, 10^{\mathcal{L}\{\Delta f\}/20} \tag{50}$$

where $\mathcal{L}\{\Delta f\}$ is the phase noise measured in the $1/f^2$ region at an offset frequency of $\Delta f$ and $f_0$ is the oscillation frequency. Therefore, based on (8), the rms cycle-to-cycle jitter will be given by

$$\sigma_{c} = \frac{\Delta f}{f_0^{3/2}}\, 10^{\mathcal{L}\{\Delta f\}/20} \tag{51}$$

Note that for (50) and (51) to be valid, the phase noise at $\Delta f$ should be in the $1/f^2$ region.

APPENDIX B
NONSYMMETRIC RISING AND FALLING EDGES

We approximate the ISF in this Appendix by the function depicted in Fig. 18. The rms value of the ISF is

(52)

where the two slope parameters are the maximum slopes during the rising and falling edges, respectively, and the asymmetry parameter represents the asymmetry of the waveform and is defined as

(53)

noting that

(54)

Combining (52) and (54) results in the following:

(55)

which reduces to (16) in the special case of symmetric rising and falling edges. The dc value of the ISF, $\Gamma_{dc}$, can be calculated from Fig. 18 in a similar manner and is given by

(56)

Using (7), the $1/f^3$ corner is given by

(57)

As can be seen, for a constant rise-to-fall ratio, the $1/f^3$ corner decreases inversely with the number of stages; therefore, ring oscillators with a smaller number of stages will have a larger $1/f^3$ noise corner. As a special case, if the rise and fall times are symmetric, the $1/f^3$ corner approaches zero.

Fig. 18. Approximate waveform and the ISF for asymmetric rising and falling edges.
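The conversion from a phase-noise reading to jitter discussed in Appendix A can be sketched numerically. The code assumes the phase noise is white-noise dominated (the $1/f^2$ region) at the chosen offset, so that $\kappa = (\Delta f/f_0)\,10^{\mathcal{L}\{\Delta f\}/20}$ and the cycle-to-cycle jitter is $\kappa\sqrt{1/f_0}$. The function names and the example numbers are hypothetical.

```python
import math


def kappa_from_phase_noise(l_dbc_hz, offset_hz, f0_hz):
    """Jitter constant kappa (in sqrt-seconds) from phase noise in dBc/Hz.

    Valid only if the phase noise at offset_hz lies in the 1/f^2 region.
    """
    return (offset_hz / f0_hz) * 10.0 ** (l_dbc_hz / 20.0)


def cycle_to_cycle_jitter(l_dbc_hz, offset_hz, f0_hz):
    """RMS jitter accumulated over one period, kappa * sqrt(1 / f0)."""
    return kappa_from_phase_noise(l_dbc_hz, offset_hz, f0_hz) * math.sqrt(1.0 / f0_hz)


# Hypothetical reading: -100 dBc/Hz at a 1-MHz offset from a 1-GHz carrier.
kappa_example = kappa_from_phase_noise(-100.0, 1e6, 1e9)
sigma_example = cycle_to_cycle_jitter(-100.0, 1e6, 1e9)
```

The inverse conversion is not attempted, since a single jitter number does not determine the shape of the phase-noise spectrum.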


ACKNOWLEDGMENT

The authors would like to thank M. A. Horowitz, G. Nasserbakht, A. Ong, C. K. Yang, B. A. Wooley, and M. Zargari for helpful discussions and support. They would further like to thank Texas Instruments, Inc., and Stanford Nano-Fabrication facilities for fabrication of the oscillators.

REFERENCES

[1] L. DeVito, J. Newton, R. Croughwell, J. Bulzacchelli, and F. Benkley, “A 52 and 155 MHz clock-recovery PLL,” in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 142–143.
[2] A. W. Buchwald, K. W. Martin, A. K. Oki, and K. W. Kobayashi, “A 6-GHz integrated phase-locked loop using AlGaAs/GaAs heterojunction bipolar transistors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1752–1762, Dec. 1992.
[3] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data retiming circuit,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 144–144.
[4] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. H. Lee, “A 0.4-μm CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 198–199.
[5] W. D. Llewellyn, M. M. H. Wong, G. W. Tietz, and P. A. Tucci, “A 33 Mb/s data synchronizing phase-locked loop circuit,” in ISSCC Dig. Tech. Papers, Feb. 1988, pp. 12–13.
[6] M. Negahban, R. Behrasi, G. Tsang, H. Abouhossein, and G. Bouchaya, “A two-chip CMOS read channel for hard-disk drives,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 216–217.
[7] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[8] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5–110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[9] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC™ microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[10] I. A. Young, J. K. Greason, J. E. Smith, and K. L. Wong, “A PLL clock generator with 5–110 MHz lock range for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1992, pp. 50–51.
[11] M. Horowitz, A. Chen, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, “PLL design for a 500 Mb/s interface,” in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 160–161.
[12] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. ISCAS, June 1994.
[13] J. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[14] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[15] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Phase noise in multigigahertz CMOS ring oscillators,” in Proc. Custom Integrated Circuits Conf., May 1998, pp. 49–52.
[16] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[17] A. Hajimiri and T. H. Lee, The Design of Low Noise Oscillators. Boston, MA: Kluwer Academic, 1999.
[18] A. A. Abidi, “High-frequency noise measurements of FET’s with small dimensions,” IEEE Trans. Electron Devices, vol. ED-33, pp. 1801–1805, Nov. 1986.
[19] T. Kwasniewski, M. Abou-Seido, A. Bouchet, F. Gaussorgues, and J. Zimmerman, “Inductorless oscillator design for personal communications devices: A 1.2-μm CMOS process case study,” in Proc. CICC, May 1995, pp. 327–330.
[20] J. G. Maneatis and M. A. Horowitz, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[21] W. A. Gardner, Introduction to Random Processes. New York: McGraw-Hill, 1990.
[22] W. F. Egan, Frequency Synthesis by Phase Lock. New York: Wiley, 1981.

Ali Hajimiri received the B.S. degree in electronics engineering from Sharif University of Technology, Tehran, Iran, in 1994 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1996 and 1998, respectively. He was a Design Engineer with Philips, where he worked on a BiCMOS chipset for GSM cellular units from 1993 to 1994. During the summer of 1995, he was with Sun Microsystems, where he worked on the UltraSparc microprocessor’s cache RAM design methodology. During the summer of 1997, he was with Lucent Technologies (Bell Labs), where he investigated low-phase-noise integrated oscillators. In 1998, he joined the Faculty of the California Institute of Technology, Pasadena, as an Assistant Professor. His research interests are high-speed and RF integrated circuits. He is coauthor of The Design of Low Noise Oscillators (Boston, MA: Kluwer Academic, 1999). Dr. Hajimiri was the Bronze Medal Winner of the 21st International Physics Olympiad, Groningen, the Netherlands. He was a corecipient of the International Solid-State Circuits Conference 1998 Jack Kilby Outstanding Paper Award.

Sotirios Limotyrakis was born in Athens, Greece, in 1971. He received the B.S. degree in electrical engineering from the National Technical University of Athens in 1995 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1997, where he currently is pursuing the Ph.D. degree. In the summer of 1993, he was with K.D.D. Corp., Saitama R&D Labs, Japan, where he worked on the design of communication protocols. During the summers of 1996 and 1997, he was with the Texas Instruments Inc. R&D Center, Dallas, TX, where he focused on LNA, low-phase-noise oscillator design, and GSM mobile unit transmit path architectures. His current research interests include the design of mixed-signal circuits for high-speed data conversion and broad-band communications. Mr. Limotyrakis received the W. Burgess Dempster Memorial Fellowship from the School of Engineering, Stanford University, in 1995.

Thomas H. Lee (S’87–M’87), for a photograph and biography, see p. 585 of the May 1999 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998


On-Chip Measurement of the Jitter Transfer Function of Charge-Pump Phase-Locked Loops

Benoît R. Veillette, Student Member, IEEE, and Gordon W. Roberts, Member, IEEE

Abstract: An all-digital technique for the measurement of the jitter transfer function of charge-pump phase-locked loops (PLL’s) is introduced. Input jitter may be generated using one of two methods. Both rely on delta–sigma modulation to shape the unavoidable quantization noise to high frequencies. This noise is filtered by the low-pass characteristic of the device and has little impact on the test results. For an input–output response measurement, the output jitter is compared against a threshold. As the stimulus generation and output analysis circuits are digital, do not require calibration, and demand a small area overhead, this jitter transfer function measurement scheme may be placed on the die to adaptively tune a PLL after fabrication. The technique can also implement built-in self-test (BIST) for the characterization or manufacture test of PLL’s. The validity of the scheme was verified experimentally with off-the-shelf components.

Index Terms: Mixed analog-digital integrated circuits, phase-locked loops, self-testing, semiconductor device testing, sigma–delta modulation.

I. INTRODUCTION

PHASE-LOCKED loops (PLL’s) operating on digital signals are fundamental in microelectronic systems. Indeed, they realize essential functions such as clock distribution and clock recovery. They can thus be found in large digital circuits such as microprocessors and on mixed-signal IC’s for digital communications. These devices process digital signals where the phase information is contained in the transition times. They usually make use of sequential phase detectors which require charge pumps and are thus called charge-pump PLL’s [1]. The specifications for these PLL’s are extremely aggressive, especially when embedded in high-speed digital communication systems. Process variations can adversely affect circuit performance and result in low yield. To increase the number of good parts, some of the components can be trimmed. This is, however, a very expensive process. Furthermore, aging or different operating conditions can later affect circuits such that they no longer meet specifications. The solution is to allow the PLL to self-calibrate. With this property, the expensive trimming stage can be avoided and changes in the operating conditions can be tracked. However, the challenge now is to integrate a characterization scheme on-chip. This involves two functional blocks: a stimulus source and a response analyzer.

Manuscript received July 15, 1997; revised October 20, 1997. This work was supported by NSERC and the Micronet, a Canadian federal network of centers of excellence dealing with microelectronic devices, circuits, and systems for ultra large scale integration. The authors are with Microelectronics and Computer Systems Laboratory, McGill University, Montréal, PQ H3A 2A7, Canada. Publisher Item Identifier S 0018-9200(98)01021-X.

Fig. 1. Jitter transfer function test setup.

These circuits should not necessitate calibration themselves. Furthermore, the silicon area overhead should be small.

Fig. 1 shows a typical test setup for the measurement of the jitter transfer function in a laboratory [2]. A signal source generates a high-frequency carrier which is phase modulated with the source output of a spectrum analyzer using an Armstrong phase modulator [3]. This jittery signal is fed to the clock input of a data generator. The recovered signal from the PLL is down-modulated and observed on the spectrum analyzer. It can be seen that this procedure requires precision analog signal sources and instruments. Looking for shortcuts, engineers are tempted to break the task and measure components independently [4] to infer the device characteristics. However, the nature of the phase-locked loop renders this solution unattractive, as the tight feedback and the sensitivity of some nodes to parasitics make it difficult to relate the values of components to the PLL behavior. A significant level of accuracy may only be achieved by measuring the system as a whole.

In this paper we propose to verify the characteristics of charge-pump PLL’s using mostly synchronous digital circuits. This implies that the stimulus signals can only change at the clock edges and that the output signals may only be sampled at the same clock edges. This would seem like an unbearable constraint, as jitter, both created and measured, is quantized to the test clock period. The results of any test would thus be severely limited in precision. However, this hurdle can be overcome using low-pass delta–sigma ($\Delta\Sigma$) modulation [5]. This technique can encode high-quality signals, in this case a sinewave, on one or a few bits. The quantization noise introduced in the operation is shaped to high frequencies. Since PLL’s are low-pass, high-frequency jitter is filtered out. The output will thus exhibit a sufficiently high SNR to be considered a pure sinewave. Therefore, the input high-frequency jitter noise does not affect test results. This filtering principle was demonstrated for a voice codec, another low-pass circuit [6].

Two methods will be presented to create jitter. The first one is the digital modulation of the edges of a reference signal using a higher frequency clock. The alternative is the injection of a sinusoidal signal in the loop through a second charge pump. The second method requires a lower clock frequency but the first one is nonintrusive. Contrary to the usual measurement procedure, a fixed output jitter is selected and the amplitude of the input jitter at a given frequency is varied. Ultimately, the test signal amplitude that results in the selected output jitter is obtained and used to compute the jitter transfer function. It is important to note that while both jitter creation schemes are digital, the jitter loop injection method does not require a test clock frequency larger than the PLL operating frequency. Nevertheless, as will be shown, the test accuracy can be increased with the digital phase modulation and a higher clock frequency.

The outline of the paper is the following. First, an overview of the PLL will be provided in Section II to establish the notation. In Sections III and IV, the two methods for stimulating the device will be explained. Section V will study signal generation using delta–sigma modulation. The analysis of the output signal as well as the test methodology will be the topics of Section VI. Section VII will briefly look at the issue of accuracy. Experimental setup and results will be discussed in Sections VIII and IX, respectively. Overhead will be addressed in Section X. Finally, conclusions will be drawn.

II. PHASE-LOCKED LOOP OVERVIEW

A block diagram of a simple charge-pump PLL is shown in Fig. 2. To the left, a sequential phase detector compares the transitions of the reference input and voltage-controlled oscillator (VCO) output signals. A two-output phase-frequency detector (PFD) is illustrated, but other types of sequential phase detectors may also be employed. The output of the PFD can be any of three logic states, and thus a charge pump is required for digital-to-analog conversion. The charge pump can be of the current or voltage type. The jitter signal injection technique of Section IV relies on current charge pumps, but it could also be adapted for a voltage charge pump as long as the filter allows summation of voltages. Referring back to Fig. 2, the low-pass filter removes short-term variations and shapes the PLL characteristics. The VCO in turn generates a square wave whose frequency depends on the level of its analog input. A counter may be inserted in the feedback loop to lock the VCO clock to a lower frequency reference signal.

Fig. 2. Digital phase locked loop functional block diagram.

Fig. 3. Models of charge-pump phase-locked loop: (a) continuous time and (b) discrete time.
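A continuous-time linear model of this kind can be explored numerically. The sketch below is only an illustration under stated assumptions: it assumes a current charge pump driving a series R-C loop filter, $F(s) = R + 1/(sC)$, takes the VCO gain in rad/s/V, and uses made-up component values. None of the names or numbers come from the paper, and the discrete-time model of Fig. 3(b) is not reproduced.

```python
import math


def jitter_transfer(freq_hz, kp, kv, r_ohms, c_farads):
    """|H(j 2 pi f)| of a continuous-time charge-pump PLL model.

    kp : phase detector and charge-pump gain, in A/rad
    kv : VCO gain, in rad/s/V (already divided by any feedback counter length)
    Assumed loop filter: F(s) = R + 1/(sC).
    """
    s = 1j * 2.0 * math.pi * freq_hz
    loop_gain = kp * (r_ohms + 1.0 / (s * c_farads)) * kv / s
    return abs(loop_gain / (1.0 + loop_gain))


def peaking_and_bandwidth(kp, kv, r_ohms, c_farads, f_lo=1e2, f_hi=1e8, points=2000):
    """Return (jitter peaking, -3 dB jitter bandwidth in Hz) on a log grid."""
    freqs = [f_lo * (f_hi / f_lo) ** (k / (points - 1)) for k in range(points)]
    mags = [jitter_transfer(f, kp, kv, r_ohms, c_farads) for f in freqs]
    peak = max(mags)
    start = mags.index(peak)
    # First grid frequency beyond the peak where |H| drops below 1/sqrt(2).
    bandwidth = next(
        (f for f, m in zip(freqs[start:], mags[start:]) if m < 1.0 / math.sqrt(2.0)),
        None,
    )
    return peak, bandwidth


# Hypothetical loop: 100-uA pump, 100-MHz/V VCO, R = 1 kOhm, C = 1 nF.
KP = 100e-6 / (2.0 * math.pi)
KV = 2.0 * math.pi * 100e6
R_OHMS = 1e3
C_FARADS = 1e-9
peak, bandwidth = peaking_and_bandwidth(KP, KV, R_OHMS, C_FARADS)
```

With this filter the magnitude always exceeds unity slightly below the natural frequency, which is the jitter peaking that a trimming or screening procedure would compare against a mask.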

The continuous-time linear model of this PLL in steady-state operation is illustrated in Fig. 3(a). The input and output variables are the phase of the reference signal and of the VCO output signal, respectively. It should be understood that while these variables represent jitter in a signal, the other variables in the circuit stand for either voltage or current. The phase detector converts phase to one of the two analog quantities while the VCO performs the complementary operation. In Fig. 3(a), the first gain parameter is the composite gain of the phase detector and charge-pump circuits, expressed in A/rad or V/rad depending on the type of charge pump. The loop filter is described by its transfer function. The gain of the VCO is stated in rad/V. If a counter is present, then its effect is lumped into this constant by dividing the VCO intrinsic gain by the counter length (shown in Fig. 2). With this operation, a counter in the PLL becomes transparent for the proposed measurement method.

The continuous-time model allows satisfactory predictions of the PLL behavior. On the other hand, the phase of digital signals is contained in the signal transitions and is thus better represented as a discrete-time sequence. Therefore, the analysis of PLL’s operating on digital signals should be performed using difference equations or $z$-transform tools. Indeed, it has been shown that the discrete-time model is more accurate, especially at high jitter frequency [7]. This model is shown in Fig. 3(b). The closed-loop equation governing the operation of the PLL is

(1)

In this equation, the discrete-time transfer function is the impulse-invariant transform of the series combination of the loop filter and VCO transfer functions. The closed-loop function is labeled the jitter transfer function. Many PLL specifications can be extracted from this transfer function, such as the jitter bandwidth and the jitter peaking. The latter measure corresponds to the maximum value of the magnitude of the jitter transfer function. Process variations can significantly alter the charge pump and VCO gains as well as the values of the passive filter components. The method we propose can be used to automatically trim components for compliance with a desired jitter transfer function. Alternatively, it can be used to screen devices by comparing the measured magnitude of the jitter transfer function against a mask.

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

III. INPUT DIGITAL PHASE MODULATION

The first method that we will consider to generate the test stimulus is the digital phase modulation of the PLL input. Sinusoidal jitter of arbitrary frequency and amplitude can be generated by modifying the edges of the reference signal digitally. A test clock frequency which is a multiple of the reference signal frequency is a prerequisite. This method therefore may not be suitable for high-frequency PLL’s. The principle is to control the instantaneous phase of a 1-b digital signal by delaying it by an amount set by a multibit digital signal. Fig. 4 illustrates a circuit that can realize this operation. The input signal to the delay cells string is a pulse whose period is an integer multiple of the test clock period. The digital signal generator output is updated at every PLL cycle and delays the input signal by a variable amount. Fig. 5 shows a typical waveform at the output of this circuit for a test clock frequency eight times the reference signal frequency. The dotted lines represent the test clock edges, with the thick ones used as zero-jitter marks. The PLL input signal jitter, in radians, can thus be expressed as

(2)
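To make the quantization concrete: if the reference edge can only be placed on test-clock edges, delaying it by an integer number $k$ of test-clock periods shifts its phase by $2\pi k\, f_{ref}/f_{test}$ rad. The sketch below assumes this relation, which is inferred from the description and not copied from (2), and uses hypothetical names. With a test clock eight times the reference frequency, the phase step is $\pi/4$ rad.

```python
import math


def phase_step_rad(f_test_hz, f_ref_hz):
    """Smallest jitter step when edges are locked to the test clock."""
    ratio = f_test_hz / f_ref_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("test clock must be an integer multiple of the reference")
    return 2.0 * math.pi / round(ratio)


def edge_phases(delay_codes, f_test_hz, f_ref_hz):
    """Phase offset (rad) of each reference edge for integer delay codes."""
    step = phase_step_rad(f_test_hz, f_ref_hz)
    return [code * step for code in delay_codes]


# Hypothetical example: 8-MHz test clock, 1-MHz reference,
# and delay codes measured relative to the zero-jitter mark.
phases = edge_phases([0, 1, -1, 2], 8e6, 1e6)
```

The coarse step is the reason the delay codes are taken from a noise-shaping modulator and not from a direct rounding of the desired sinusoid.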

Obviously, the number of inputs to the multiplexer is limited by the silicon area available but also by the ratio of the frequency of the clock operating the delay cells to the frequency of the reference signal. The limited number of delay cells is likely to make the jitter of the input signal coarsely quantized such that its SNR is unacceptably low. However, PLL’s are frequency selective with respect to jitter. Indeed, the jitter transfer function is low-pass and the device filters high-frequency jitter. When incorporating a modulator in the signal generator, quantization noise can be shaped to high frequencies [5]. Therefore, the encoded signal from a low-pass modulator, such as in Fig. 5, contains a high-quality low-frequency sinewave and high-frequency is quantization noise as illustrated in Fig. 6(a). While a multibit signal as it controls an multiplexer, modulation can ultimately encode signals on a single bit. Because the PLL is low-pass and is designed to reject highfrequency components in the loop, the quantization noise will be suppressed, leaving only the sinusoid in the output jitter as shown in Fig. 6(b). Section V will examine how such a signal can be generated. IV. SIGNAL INJECTION

IN THE

LOOP

The second option for the generation of jitter is the injection of a test signal at the input of the loop filter as shown in Fig. 7. This signal source, represented here by the variable , is injected through a second charge pump with gain . It should be understood that this signal source is not a jittery digital signal but an analog signal embedded in a 1-b digital signal encoded using a modulator. However, this signal, when referred back to the input, is equivalent to input

Fig. 4. Digital phase modulation circuit.

Fig. 5. Typical digital phase modulated waveform.

(a)

(b)

Fig. 6. Phase spectrum: (a) input signal jitter and (b) output signal jitter.

Fig. 7. Discrete-time model of phase-locked loop modified for signal injection.

jitter. The PLL reference signal meanwhile is a square wave and therefore the input jitter will be zero. This setup will be used to evaluate the characteristics of the PLL through the measure of its transfer function. Examining the model of Fig. 7, it can be seen that this transfer function will be equal to (3) It is thus equivalent within a multiplicative constant to the jitter . For a spectral test such as the transfer function jitter transfer function, the test signal is a sinewave encoded into a single bit. The quantization noise is concentrated at high frequencies and is filtered out as explained in the previous section. It is important to note that the clock period for signal injection must be an integer multiple of the reference signal period to prevent aliasing of the quantization noise back in the PLL passband. This condition implies that the signal


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998


Fig. 8. Injecting a signal into the PLL.

injection frequency cannot be higher than the reference signal frequency. Converting the 1-b digital stream to an analog signal and summing it with the output of the phase detector is quite simple. A second current charge pump is placed in parallel with the phase detector charge pump and both outputs are connected together, forming a current summing node as shown in Fig. 8. The accuracy of this digital-to-analog conversion is a function of the matching of the two current sources and , typically between 0.1% and 1% in monolithic form. In this schematic, the impedance implements the loop filter and is the controlling voltage of the VCO.
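The integer-multiple condition on the injection clock can be checked mechanically. The following is a minimal sketch and not part of the original design; the function name `injection_clock_ok` and the example frequencies are illustrative assumptions.

```python
from fractions import Fraction


def injection_clock_ok(f_ref_hz, f_inj_hz):
    """Return True when the injection clock period is an integer multiple
    of the reference period, i.e. f_ref / f_inj is a whole number >= 1."""
    ratio = Fraction(f_ref_hz) / Fraction(f_inj_hz)
    return ratio.denominator == 1 and ratio >= 1


if __name__ == "__main__":
    # Hypothetical 100-kHz reference with a few candidate injection clocks.
    for f_inj in (100_000, 50_000, 30_000, 150_000):
        print(f_inj, injection_clock_ok(100_000, f_inj))
```

Exact rational arithmetic is used so that a ratio such as 10/3 is not mistaken for an integer through floating-point rounding.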

Fig. 9. Spectrum of test jitter signals: (a) signal band and (b) Nyquist interval.
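The noise shaping behind spectra of this kind can be illustrated numerically. The sketch below encodes a sinusoid onto a single bit with a second-order error-feedback modulator whose noise transfer function is (1 - z^-1)^2. This particular structure, the record length, the signal bin, and the amplitude are assumptions chosen for illustration; the modulator used by the authors may differ.

```python
import cmath
import math


def delta_sigma_1bit(x):
    """Second-order error-feedback modulator, NTF = (1 - z^-1)^2 (assumed)."""
    e1 = e2 = 0.0                      # quantization errors e[n-1], e[n-2]
    bits = []
    for sample in x:
        u = sample - 2.0 * e1 + e2     # input plus filtered past errors
        y = 1.0 if u >= 0.0 else -1.0  # 1-b quantizer
        e2, e1 = e1, y - u             # shift the error history
        bits.append(y)
    return bits


def tone_amplitude(seq, k):
    """Single-sided amplitude of DFT bin k of seq."""
    n = len(seq)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
              for i, v in enumerate(seq))
    return 2.0 * abs(acc) / n


N, K, AMP = 4096, 8, 0.5   # record length, signal bin, amplitude (illustrative)
sine = [AMP * math.sin(2 * math.pi * K * i / N) for i in range(N)]
bits = delta_sigma_1bit(sine)

if __name__ == "__main__":
    print("signal bin amplitude:", tone_amplitude(bits, K))
    print("bin 20 (low band):   ", tone_amplitude(bits, 20))
    print("bin 2000 (high band):", tone_amplitude(bits, 2000))
```

The low-frequency bins other than the signal bin should carry very little energy, while the bins near half the clock rate carry most of the quantization error, which is what a low-pass PLL then rejects.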

V. NOISE-SHAPED SINUSOID GENERATION

The basis of the jitter generation methods described in the previous two sections is low-pass delta–sigma modulation. This technique allows the encoding of a bandlimited signal represented by a large number of bits onto a very small number of bits. An error is created by this quantization operation, but it is filtered such that it appears mostly at high frequencies. This feat is realized by feeding back the quantization error to a filter and summing it with the input before the next quantization. Using delta–sigma modulation, arbitrary signals, such as a sinusoid, can be represented by a square wave (1-b signal) with the difference being composed mostly of high-frequency noise.

Fig. 9 shows the spectrum of a sinusoid for jitter generation coded using different quantizers. Two curves show the power density spectrum of multibit signals for the digital phase modulation method. They were generated for different test clock ratios ( ) and thus different quantization steps ( ). The last curve shows a 1-b signal for the signal injection method. The same second-order low-pass delta–sigma modulator was used to produce each curve; only the quantizer was changed. The input signal amplitude was kept the same in each case. As explained previously, the signal is located at low frequencies where the noise power is small. It can also be seen that the quality of the signal improves with the number of quantization steps (smaller step size ). This is evident from the downward drift of the curve in Fig. 9 as the step size decreases.

Three possible methods, illustrated in Fig. 10, may be employed on-chip to generate any of the signals of Fig. 9. The selection of one of these for implementation is based on the available resources in the system, silicon area, PLL speed, and required accuracy. Each method is described below.

A. Direct Frequency Synthesis

The most straightforward and versatile signal generation scheme is a ROM-based digital frequency synthesizer [8] followed by a low-pass delta–sigma modulator. The output of the modulator may be a 1-b signal or a multibit signal [9] as required by the jitter creation technique used. However, the direct frequency synthesis solution requires a large silicon area and is thus usually not a good choice for built-in self-test (BIST). Nevertheless, if a large block of RAM is present in the system, it could be used to store data for signal generation during the PLL test phase.

B. Delta–Sigma Oscillator

An alternative is to use a circuit called a low-pass delta–sigma oscillator [10]. The circuit, illustrated in Fig. 10(b), is a digital resonator where the frequency setting multiplier has been replaced by the combination of a delta–sigma modulator and

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

a multiplexer. To avoid large multiplexers, the output of the oscillator is usually a single bit. Therefore, a second delta–sigma modulator may be required for multibit signal generation. However, as the structure is similar to the first one except for the quantizer, hardware can be time-shared. Because a ROM or a multiplier is not required, the silicon implementation of delta–sigma oscillators is very efficient. However, the presence of a nonlinear block in the feedback loop makes the behavior of this device difficult to predict using a linear model. Off-line simulations are required to achieve maximum accuracy. Nevertheless, delta–sigma oscillators can be implemented quickly and may possibly use available on-chip computing resources such as a digital signal processor (DSP).

C. Fixed-Length Bit Streams

The last method consists of generating a data stream from a software sinewave generator and low-pass delta–sigma modulator and then selecting a subset of this stream. This subset is stored in memory and the resulting data stream is then repeated [11] as illustrated in Fig. 10(c). One can view this approach as a special case of the ROM-based digital frequency synthesis scheme presented above. This method is particularly useful for single-bit signals as they can be represented using a very small number of bits, on the order of a hundred. The downside is that, if care is not taken, signal quality will vary widely with different subsets of the same length of a given modulator output. Some form of optimization is necessary to obtain good results [12]. Also, a different data stream is required for each frequency and amplitude desired. Nonetheless, considerable speed can be achieved with this technique, and the overhead is the smallest of the three methods.

Fig. 10. Generating delta–sigma encoded sinusoids: (a) direct digital frequency synthesis, (b) delta–sigma oscillator, and (c) fixed-length periodic byte stream.

Fig. 11. Comparing waveform against jitter threshold.

VI. EVALUATION OF THE OUTPUT JITTER

The exact gauging of the jitter response is rather difficult. However, a measure that can be made with good accuracy is the point at which the output jitter reaches a predetermined value or threshold. The edges of the test clock, assumed to be jitter free, will be used to establish this threshold. Fig. 11 illustrates how this can be accomplished. The dashed lines represent the rising edges of the test clock, with the bold ones indicating the rising transitions of the reference clock. The output signal is sampled at positive test clock edges until two adjacent samples yield a zero followed by a one, indicating a rising edge of the signal. If this rising edge occurred in the intervals immediately before or after the reference test clock edge, then jitter is below threshold; otherwise, an error is generated. Over a time interval, the number of errors is counted and a bit error rate (BER) measure can be obtained. This averaging is done to prevent glitches and noise signals from significantly affecting the final result. In Fig. 11, a ratio of the test clock frequency over the reference signal frequency of eight allows a minimum value for the threshold of . Larger values could also be used for the threshold by allowing more test clock cycles in the valid interval.

The previous method requires a test clock frequency at least three times the reference frequency. Indeed, three periods of the test clock for a PLL period is the limiting case, as two of the clock periods must compose the valid interval, leaving one to catch jitter above threshold. A different scheme, however, can be used that compares jitter against a rad threshold using a 50% duty cycle test clock of the same frequency as the reference clock. Fig. 12(a) illustrates how it can be implemented with a few gates and registers. The threshold is verified by sampling a data signal with both the reference clock, used at the input of the PLL, and the recovered clock at the VCO output. This data signal will toggle between one and zero at the reference signal falling edges, resulting in a frequency half that of the reference clock. When the output jitter exceeds rad, sampling errors will occur with the VCO output clock because it will sample different data than the reference clock. This is shown in Fig. 12(b) where an error occurs when the jitter goes from 0.9 to 1.1 . This circuit is somewhat similar to the circuit typically used for a jitter tolerance test [2].

A single frequency point test is thus performed as follows: an input jitter ( or ) is applied to the PLL and its amplitude is modified until the maximum value resulting in output jitter below threshold is found. The fastest procedure for obtaining or is to use a binary search algorithm. The initial amplitude ( ) is zero and the initial increment ( ) is 0.5. A test is performed with amplitude . If the VCO output jitter is lower than the threshold, then the next amplitude will be . Otherwise, the

Fig. 12. (a) Circuit to evaluate  rad jitter threshold. (b) Typical waveforms.

amplitude remains the same ( ). The increment is then divided by two and the procedure is repeated until the desired accuracy is achieved. The uncertainty associated with the signal amplitude following a step measure will be which equals 2 . However, arbitrary accuracy cannot be achieved as noise sources are always present.

VII. ACCURACY

The accuracy of the measured results will depend on many factors. Foremost is the residual input signal jitter quantization noise which passes through the PLL. The jitter creation clock frequency, along with the delta–sigma modulator noise transfer function, the number of quantization bits, and the PLL bandwidth, will influence the effective SNR of the measured jitter at the output. It is interesting to note that for each frequency point on the jitter transfer function, the SNR of the measured output jitter will remain constant. This is because, under a white noise assumption for the quantizer error, the noise present in the PLL input signal jitter does not vary for a selected modulator implementation. It is only a function of the quantizer step and does not change with the signal amplitude and frequency. Neglecting internal noise sources, the PLL output jitter noise power density is a result of the input jitter noise power density shaped by the jitter transfer function. It can thus be seen that the PLL output jitter noise power is independent of the parameters of the test sinusoid. Furthermore, as the threshold is fixed, the PLL output sinewave jitter amplitude must also be constant. In the measurement procedure, the input jitter sinewave amplitude is varied to account for the response of the PLL at a given frequency. Consequently, for a given PLL, the SNR of the measured output jitter is fully determined from the quantizer step size for the input signal jitter and the jitter threshold at the output.

Fig. 13 shows the theoretical SNR of the output jitter with respect to the ratio of the cutoff frequency of a second-order PLL over the PLL lock frequency, referred to here as the PLL relative bandwidth. A curve is displayed for each of the three input jitter quantization granularities. A second-order delta–sigma modulator is used to generate the input signal, and the output jitter amplitude is held constant at . For

Fig. 13. Signal-to-noise ratio of the output jitter versus the PLL relative bandwidth.

reference, digital data communication systems such as SONET mandate a relative loop bandwidth of about 0.1%.

Also of concern is the amount of jitter present in the test clock. However, the clock signal should be generated by a tester and, provided the floorplan is sound, this source of error should be negligible. A more significant problem is the jitter, both static and random, generated internally. It will add to the jitter from the input signal and thus modify the effective output jitter threshold. Its effect can only be reduced by using a jitter threshold much larger than the jitter noise. Alternatively, it could be accounted for and subtracted from the final results. Finally, for the signal injection method, the mismatch of the two charge pumps is obviously a cause of errors.

VIII. EXPERIMENTAL SETUP

A test setup, whose schematic is shown in Fig. 14, was implemented on a breadboard using off-the-shelf components. The device under test (DUT) is centered around the VCO from a 74HC4046 monolithic phase-locked loop. However, because this IC uses a voltage charge pump followed by a passive filter and since its phase detector could not be separated from this block, an XC4010 field-programmable gate array (FPGA) was programmed to implement the classical phase-frequency detector [13]. The charge pump is built out of discrete NPN and PNP transistors, a resistor, and analog switches from a 74HC4066. A circuit is also required to maintain the transistors inside their linear region when the switches are open, but it is not shown here. The PLL was operated at 100 kHz, as parasitics of the board and the time constants of some components do not allow for a higher frequency. However, the measurement scheme should be extendable to much higher frequencies as the test circuits are similar in nature to the PLL components. Both digital signals for jitter creation ( and ) are generated by a low-pass delta–sigma oscillator programmed on the same FPGA. It uses 24-b buses to achieve a tunability of 55 parts per million of its programmable clock. A 3-b quantizer in addition to the standard 1-b quantizer makes the signal generator capable of

Fig. 14. Experimental setup.

multibit output [refer to Fig. 10(b)] for the purpose of digital phase modulation. It should be noted that, apart from the quantizer, the delta–sigma modulator circuitry is not duplicated, as it will be operated in time-shared mode at double speed for multibit operation.

The input to the PLL can be set to accommodate both jitter generation methods. For the loop jitter injection scheme, a 100-kHz square wave is presented to the input of the phase detector. The input can also be the same signal phase modulated with the help of an 800-kHz test clock. Eight jitter steps are therefore possible, resulting in a quantization. Various jitter threshold circuits are also implemented on the FPGA. The jitter threshold circuit of Fig. 12(a) will be employed in conjunction with the loop jitter injection. On the other hand, thresholds of and are implemented for the digital phase modulation method, making use of the higher frequency test clock.

For each test, a warm-up stage of 2^14 data cycles is executed to remove transients before a 2^16 data cycle test stage is performed. The error threshold is set to 64, corresponding to a BER of 10^-3. A control module built around a finite state machine selects the amplitude of the input jitter for the ensuing test according to the output of the jitter threshold circuit, using the binary search algorithm. At each frequency point, the amplitude is resolved to an accuracy of 15 b within 13 s. The entire digital circuitry for all the experiments requires 81% of the resources of an XC4010 FPGA. This experimental setup is connected to a workstation through I/O modules to allow driving software to set the low-pass delta–sigma oscillator frequency as well as read the amplitude.

IX. EXPERIMENTAL RESULTS

The jitter transfer function measurement was carried out for both the jitter injection and the digital phase modulation techniques on two different PLL configurations with different bandwidths and damping values. Table I summarizes the main parameters of these experiments. The same current amplitude was used for both charge pumps ( ). The transfer functions are presented in the continuous-time domain as this is more typical of what can be found in industry.
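The binary search of Section VI and the test-time figures of Section VIII can be tied together in a short sketch. The callable `passes` is a hypothetical stand-in for one complete threshold test (warm-up plus counting stage), and the time estimate assumes one data cycle per 100-kHz reference period; neither detail is stated explicitly in the text.

```python
def binary_search_amplitude(passes, steps=15):
    """Largest amplitude in [0, 1) for which passes(amplitude) is True,
    resolved to `steps` bits by successive approximation."""
    amplitude, increment = 0.0, 0.5     # initial values given in Section VI
    for _ in range(steps):
        trial = amplitude + increment
        if passes(trial):               # output jitter stayed below threshold
            amplitude = trial
        increment /= 2.0                # halve the increment after each test
    return amplitude


# Figures quoted in Section VIII.
F_REF_HZ = 100e3          # PLL operating frequency
WARMUP_CYCLES = 2 ** 14   # warm-up stage
TEST_CYCLES = 2 ** 16     # counting stage
ERROR_LIMIT = 64          # error threshold
STEPS = 15                # amplitude resolution in bits

ber_limit = ERROR_LIMIT / TEST_CYCLES
seconds_per_point = STEPS * (WARMUP_CYCLES + TEST_CYCLES) / F_REF_HZ

if __name__ == "__main__":
    # Hypothetical PLL whose output stays below threshold up to 0.3.
    print(binary_search_amplitude(lambda a: a <= 0.3))
    print(ber_limit, seconds_per_point)
```

Under these assumptions the error limit corresponds to a BER just under 10^-3, and fifteen tests take a little over 12 s, which agrees with the quoted 13 s per frequency point.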

TABLE I EXPERIMENT PARAMETERS

The results of the experiments on the first configuration are shown in Fig. 15. A measured jitter transfer function for each jitter generation method is displayed. The dotted line represents the theoretical jitter transfer function as predicted from the direct measurement of the components. The phase modulation scheme used a threshold for this experiment. The curve shows a 0.4-dB offset which can be attributed mostly to the static jitter of the PLL. For the other jitter creation scheme, the signal injection clock was chosen to be 50 kHz, that is, half the PLL rate, in order to demonstrate the flexibility in selecting this parameter. The offset is larger, possibly because of mismatch between the two charge pumps built from discrete transistors. The first two columns of Table II summarize the features of the curves after removal of the offsets. Both methods yield similar results for the PLL bandwidth and jitter peaking. The theoretical predictions are slightly off, most probably because of the parasitics of the setup, which were not accounted for in the calculations.

The jitter transfer functions measured in the second experiment are shown in Fig. 16. This PLL exhibits a larger bandwidth and is more damped. It can be seen that the curve obtained here with the jitter injection technique is of lesser quality. This came about because the larger bandwidth yields a lower output jitter SNR. From the graph of Fig. 13, it can be seen that this SNR is barely over 20 dB. On the other hand, the digital phase modulation still shows a smooth curve because of the 3-b quantization, which results in lower jitter noise levels at the output. Again, the meaningful parameters are summarized in the two right-most columns of Table II.

Fig. 15. Jitter transfer function for experiment 1.

TABLE II RESULTS SUMMARY

Fig. 16. Jitter transfer function for experiment 2.

X. IMPLEMENTATION

For any integrated measurement scheme, the area overhead is obviously a major concern. While the digital portion of the experimental setup uses a large portion of the FPGA, a much more compact implementation is possible. Indeed, a delta–sigma oscillator was selected as the signal generator because of its versatility, as a complete jitter transfer function was sought. However, in many applications, a smaller number of signal frequencies and amplitudes are necessary and a fixed-length periodic bit stream could generate the signals for the cost of a few kilobits of RAM. For example, to verify that the bandwidth of the PLL is smaller than some value, only one test is required. Specific values for overhead or measurement time depend heavily on the PLL application. Moreover, one should not have a dogmatic stance about overhead, as the addition of the on-chip measurement circuits adds value to a system. The economic gains from a BIST may far outweigh the cost of extra silicon.

XI. CONCLUSIONS

We have presented a PLL jitter transfer function measurement technique which is entirely digital except for the possible addition of a charge pump. The technique is suitable for on-chip measurement since it does not require trimming and the silicon overhead is small. Two methods were introduced for the creation of jitter, allowing tradeoffs between test clock frequency on one side and loading, complexity, and accuracy on the other side. Experimental results were presented which suggest this scheme could be successfully implemented on silicon.

ACKNOWLEDGMENT

The authors acknowledge the suggestions of B. Gerson from PMC Sierra.

REFERENCES

[1] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. COMM-28, pp. 1849–1857, Nov. 1980.
[2] L. DeVito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York: IEEE Press, 1996.
[3] E. H. Armstrong, “A method of reducing disturbances in radio signaling by a system of frequency modulation,” in Proc. IRE, May 1936, vol. 24, no. 5, pp. 689–740.
[4] P. Goteti, G. Devarayanadurg, and M. Soma, “DFT for embedded charge-pump PLL systems incorporating IEEE 1149.1,” in Proc. IEEE 1997 CICC, Santa Clara, CA, May 1997, pp. 210–213.
[5] M. W. Hauser, “Principles of oversampling A/D conversion,” J. Audio Eng. Soc., vol. 39, nos. 1/2, pp. 3–26, Jan./Feb. 1991.
[6] M. F. Toner and G. W. Roberts, “A BIST scheme for an SNR, gain tracking, and frequency response test of sigma–delta ADC,” IEEE Trans. Circuits Syst.–II, vol. 41, pp. 1–15, Jan. 1995.
[7] J. P. Hein and J. W. Scott, “z-domain model for discrete-time PLL’s,” IEEE Trans. Circuits Syst., vol. 35, pp. 1393–1400, Nov. 1988.
[8] J. Tierney, C. M. Rader, and B. Gold, “A digital frequency synthesizer,” IEEE Trans. Audio Electroacoustic, vol. 19, pp. 48–57, 1971.
[9] J. G. Kenney and L. R. Carley, “Design of multi-bit noise shaping data converter,” Analog Integrated Circuits and Signal Processing J., May 1993, vol. 3, no. 3, pp. 259–272.
[10] A. K. Lu, G. W. Roberts, and D. A. Johns, “A high-quality analog oscillator using oversampling D/A conversion techniques,” IEEE Trans. Circuits Syst.–II, vol. 41, pp. 437–444, July 1994.
[11] E. M. Hawrysh and G. W. Roberts, “An integration of memory-based analog signal generation into current DFT architectures,” in Proc. 1996 ITC, Washington, DC, Oct. 1996, pp. 528–537.
[12] B. Dufort and G. W. Roberts, “Signal generation using periodic single and multi bit sigma–delta modulated streams,” in Proc. IEEE 1997 ITC, Washington, DC, Nov. 1997, pp. 396–405.
[13] C. A. Sharpe, “A 3-state phase detector can improve your next PLL design,” EDN Mag., pp. 55–59, Sept. 1976.

Benoît R. Veillette (S’97) was born in Trois-Rivières, Québec, Canada, on January 1, 1971. He received the B.Eng. (Honors) degree and the M.Eng. degree from McGill University, Montréal, PQ, Canada, in 1993 and 1995, respectively. He is now completing the Ph.D. degree in electrical engineering from the same institution. His current research interests are in delta–sigma modulation, analog integrated circuits for communications, and mixed-signal testing.

Gordon W. Roberts (S’85–M’85) was born in Toronto, Canada, in 1959. He received the B.A.Sc. degree in electrical engineering from the University of Waterloo in 1983 and the M.Eng. and Ph.D. degrees also in electrical engineering from the University of Toronto in 1986 and 1989. In 1989 he joined the faculty of McGill University where he is presently an Associate Professor. He co-authored several text books and has contributed seven chapters to various edited volumes related to analog IC design and test. He is presently an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS and Editor of the IEEE Design and Test Magazine. He has received numerous department and faculty awards for teaching, as well as several IEEE awards for his work related to mixed-signal testing.

Physical Processes of Phase Noise in Differential LC Oscillators

Systems Laboratory, University of California, Los Angeles, CA 90095-1594

Introduction

There is an unprecedented interest among circuit designers today to obtain insight into mechanisms of phase noise in LC oscillators. For only with this insight is it possible to optimize oscillator circuits using low-quality integrated resonators to comply with the exacting phase noise specifications of modern wireless systems. Various numerical simulators are now available to assist the circuit designer [1], [2], [3], in some cases accompanied by qualitative interpretations [4]. At present, therefore, the situation of the oscillator designer is similar to that of the designer of amplifiers who is equipped only with SPICE, but who lacks physical insight and methods for simple yet accurate analysis with which to optimize a circuit.

Over the years, various attempts at phase noise analysis have produced results that are variations on Leeson’s classic “heuristic derivation without formal proof” [5], [6]. These analyses are based on a linear model of an LC resonator in steady-state oscillation through application of either feedback or negative conductance. The results confirm Leeson by showing that phase noise is proportional to noise-to-carrier ratio and inversely proportional to the square of resonator quality factor. However, without knowledge of the constant of proportionality, which Leeson leaves as an unspecified noise factor, the actual phase noise cannot be predicted.

The results are validated against SpectreRF simulations and measurements on two differential CMOS oscillators tuned by resonators with very different Q's.

Recognizing Phase Noise

For the purposes of analysis, a noise spectrum is considered as consisting of uncorrelated sinewaves in a 1 Hz bandwidth at any given frequency. Voltage or current noise produces amplitude and phase fluctuations when superimposed on a periodic signal (from now on, a large sinewave $V_0 \sin(2\pi f_0 t)$). This is clearly seen [10] by isolating one sinewave $v_n$ in the noise spectrum, say at a frequency offset $+f_m$ from the sinewave frequency $f_0$. Figure 1 shows this as a phasor $v_n$ rotating relative to the sinewave phasor $V_0$, which is then decomposed into two equal collinear phasors at $+f_m$, and two antiphase conjugate phasors which are assigned a negative relative frequency $-f_m$. Grouping the phasors pairwise as $\pm f_m$, it is seen that one pair modulates the amplitude of the sinewave with time (AM), while the other sweeps its phase (PM). Thus, half of any additive noise on a sinewave produces phase noise, the other half amplitude noise. When $\sin(\omega_0 t)$ is accompanied either by noise sinewave phasors $+\sin(\omega_0+\omega_m)t$, $-\sin(\omega_0-\omega_m)t$ or by $+\cos(\omega_0+\omega_m)t$, $+\cos(\omega_0-\omega_m)t$, then phase noise alone is present.
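The equal split of one additive noise sinewave into AM and PM can be checked numerically. The sketch below adds a single small phasor to a carrier phasor and measures the resulting envelope swing and phase swing over one rotation of the noise phasor; the amplitudes `V0` and `VN` are illustrative assumptions.

```python
import cmath
import math

V0 = 1.0     # carrier amplitude (illustrative)
VN = 1e-3    # amplitude of one noise sinewave, VN << V0 (illustrative)


def envelope_and_phase(theta):
    """Envelope and excess phase of the carrier plus one noise phasor that
    has rotated by theta = 2*pi*f_m*t relative to the carrier phasor."""
    total = V0 + VN * cmath.exp(1j * theta)
    return abs(total), cmath.phase(total)


thetas = [2 * math.pi * i / 360 for i in range(360)]
envelope = [envelope_and_phase(t)[0] for t in thetas]
phase = [envelope_and_phase(t)[1] for t in thetas]

am_index = (max(envelope) - min(envelope)) / (2 * V0)   # fractional AM swing
pm_index = (max(phase) - min(phase)) / 2                # peak phase swing, rad

if __name__ == "__main__":
    print(am_index, pm_index)
```

Both indices come out close to `VN / V0`, so the single sideband behaves as one AM sideband pair and one PM sideband pair of equal strength.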

Simple Model of the Differential Oscillator

It is now well understood that the large-signal periodic switching of a self-limited oscillator [7] underpins this noise factor [8]. At first sight, an accurate noise analysis of an oscillator subject to periodic bias currents appears intractable; however, by using sensible approximations Huang has solved this problem for a Colpitts oscillator [9] and obtained good agreement between analysis and measurements of thermally induced phase noise. The mechanisms of flicker noise upconversion, which are important in CMOS oscillators, remain obscure.

This paper treats the well-known tail-current biased differential LC oscillator (Figure 2). In steady state, the differential pair acts as a negative conductance that switches the tail current $I_T$ into the LC resonator. Owing to filtering in the LC circuit, the square wave of current creates a sinusoidal voltage across the resonator of amplitude $(4/\pi) I_T R$. This voltage drives the differential pair into switching, thus sustaining oscillation. In a CMOS oscillator the amplitude may build up to several volts, eventually limited by the supply voltage.
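The factor 4/π in the resonator amplitude is the fundamental Fourier coefficient of a unit square wave, which is what survives the filtering in the LC circuit. A minimal numerical check follows; the function name and the number of integration points are illustrative assumptions.

```python
import math


def fundamental_amplitude(samples_per_period=10_000):
    """Fundamental Fourier amplitude of a unit (+1/-1) square wave,
    found by midpoint-rule integration over one period."""
    n = samples_per_period
    acc = 0.0
    for i in range(n):
        angle = 2 * math.pi * (i + 0.5) / n
        square = 1.0 if math.sin(angle) >= 0 else -1.0
        acc += square * math.sin(angle)   # project onto the fundamental
    return 2.0 * acc / n


if __name__ == "__main__":
    print(fundamental_amplitude(), 4 / math.pi)
```

Scaling this coefficient by the tail current and the resonator resistance gives the amplitude expression quoted above, under the text's own definition of R.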

In this paper we concentrate on an understanding of the popular differential LC oscillator. We introduce simple models to capture the nonlinear processes that convert voltage or current thermal noise in resistors or transistors into phase noise in the oscillator. The analysis does not require hypothetical elements, such as limiters or amplitude control loops, to fully explain phase noise. A simple expression at the end accurately specifies thermally induced phase noise, and lends substance to Leeson’s original hypothesis. Next, the upconversion of flicker noise into phase noise is traced to mechanisms first identified in the 1930’s, but apparently since forgotten. Unlike thermally induced phase noise, which appears as phase modulation sidebands, flicker noise is shown to upconvert by bias-dependent frequency modulation.

In previous work on noise in mixers [11], we have shown how a simple model of the switching differential pair is sufficient to explain all frequency translations of noise. This model is used here. Suppose that some noise ($v_n$) accompanies the resonator sinewave. Assuming that a small fraction of the resonator voltage around the zero crossing is enough to fully switch the differential pair, then the noise simply advances or retards the instant of zero crossing (Figure 3(a)). The randomly pulse-width modulated current at the switch output may be decomposed into the original periodic square wave in the absence of noise, superimposed with pulses of constant height but random width (Figure 3(b)). In turn, these pulses may be approximated by a train of impulses at twice the oscillation frequency multiplying the original noise waveform $v_n(t)$ (Figure 3(c)).
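The frequency translation produced by this impulse-train model can be illustrated with a short discrete-time sketch. A noise sinewave at an offset above the oscillation frequency is multiplied by a unit impulse train at twice the oscillation frequency, and the spectrum is inspected at a few bins. All numerical values are illustrative assumptions, with frequencies expressed in cycles per record.

```python
import cmath
import math

N = 4000                 # samples in the record (illustrative)
F0 = 100                 # oscillation frequency, cycles per record
FM = 7                   # offset frequency, cycles per record
STEP = N // (2 * F0)     # samples between impulses at twice F0

noise = [math.sin(2 * math.pi * (F0 + FM) * i / N) for i in range(N)]
sampled = [v if i % STEP == 0 else 0.0 for i, v in enumerate(noise)]


def magnitude(seq, k):
    """Magnitude of DFT bin k of seq."""
    n = len(seq)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(seq)))


if __name__ == "__main__":
    for k in (F0 - FM, F0, F0 + FM, 3 * F0 - FM, 3 * F0 + FM):
        print(k, round(magnitude(sampled, k), 6))
```

Before sampling, the noise occupies only the bin at `F0 + FM`. After sampling, equal components appear on both sides of the oscillation frequency and of its third harmonic, which is the translation the model relies on.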

0-7803-5809-0/00/$10.00 © 2000 IEEE. IEEE 2000 CUSTOM INTEGRATED CIRCUITS CONFERENCE

Thermally Induced Phase Noise

Resonator Noise

Now consider a current source $i_n \sin((\omega_0+\omega_m)t+\phi)$ representing noise in the loss conductance of the resonator, where $\overline{i_n^2} = 4kT/R$. According to the model above, this modulates the zero crossing instants of the differential pair, producing a current which, in addition to the usual square wave, also consists of current pulses sampling this noise at $2\omega_0$. After sampling, frequency components appear at $\omega_0 \pm \omega_m$, $3\omega_0 \pm \omega_m$, ... However, usually the resonator will filter the third and higher harmonics, leaving $\omega_0 \pm \omega_m$ as the only important terms. These will induce a symmetric voltage response in the resonator, and through feedback arrive at steady state. The steady-state oscillation, in general, is of the form:

$$v_{out} = V_0 \sin\omega_0 t + A\sin(\omega_0-\omega_m)t + B\cos(\omega_0-\omega_m)t + C\sin(\omega_0+\omega_m)t + D\cos(\omega_0+\omega_m)t$$

and here $A = -C = i_n \times (L\omega_0^2/4\omega_m)$, while $B = D = 0$. The relative signs of A and C prove that the steady-state response to current noise in the resonator's resistor is phase noise in the oscillator. The single-sideband phase noise density is found by the ratio of the sideband power at a given frequency to the power in the fundamental oscillation frequency. Thus, the thermally induced phase noise density due to resonator loss is:

where $N_1 = 2$, the number of loss sources (in the left and right resonators) and $N_2 = 4$ because uncorrelated quadrature noise originating at $\omega_0 \pm \omega_m$ contributes to SSB phase noise at offset $\omega_m$.

Tail Current Noise

The switching action of the differential pair commutates noise in the tail currents like a single-balanced mixer. The noise is translated up and down in frequency, and enters the resonator. The resulting voltage drives the differential pair, the noise components modulating the zero crossing instants. The resulting impulses of current feed back into the resonator. The steady-state solution is found by solving simultaneous equations of a form that anticipates the end result, much like in any feedback circuit. The single-balanced mixer shows the largest conversion gain around the fundamental switching frequency, one third of that current conversion gain around the third harmonic, and so on. Therefore, only mixing by the fundamental is important. Noise originating in the tail current at $\omega_m$ upconverts to $\omega_0 \pm \omega_m$. Similarly, noise at $2\omega_0 \pm \omega_m$ downconverts to $\omega_0 \pm \omega_m$. Analysis shows that the upconversion produces coefficients $A = C$, $B = -D$, both of which indicate AM only. It should be noted that AM noise superimposed on the resonator fundamental frequency does not modulate the zero crossings of the switching differential pair, and therefore does not propagate in the feedback loop back into the resonator. However, the downconversion results in phase noise only, with $A = -C$ and $B = D = 0$. The phase noise caused by thermal noise originally at $2\omega_0$ is:

where $\gamma$ is the noise factor of a single FET, classically 2/3. It is important to note that the AM noise resulting from upconversion, if impressed across a varactor at the resonator, will modulate the varactor, and thus the oscillation frequency, by AM-to-FM conversion [12]. Although the process is different, the resulting sidebands are indistinguishable from PM noise sidebands. Unlike the other mechanisms of phase noise, this effect depends on the varactor characteristics and VCO tuning range, and it may be significant only in certain situations.

DifferentialPair Noise Noise originating in the differential pair is unlike the previous two cases. There, only certain parts of the noise spectrum contributed significantly to the total phase noise. White noise in the resonator is filtered at harmonics of the resonant frequency. White noise in the tail current only experiences a significant conversion gain around the second harmonic of the oscillation frequency. However, the simple model says that an impulse train samples white noise in the differential pair, which if true, will cause it to accumulate without bound at any specified offset frequency om. In reality, any practical differential pair requires a non-zero input voltage excursion to switch, and this is provided by the oscillation waveform across the resonator. Therefore, noise in the differential pair is actually not sampled by impulses, but by time windows of finite width. The window height is proportional to transconductance, and width is set by tail current, and slope of the oscillation waveform at zero crossing. The input-referred noise spectral density of the differential pair is inversely proportional to transconductance. Thus, the narrower the sampling window, that is, the larger the sampling bandwidth, the lower the noise spectral density [ 111. Analysis shows that the noise bandwidth product is constant, and produces pure phase noise. After taking into account the accumulation of frequency translations throughout the sampling bandwidth, the following compact yet exact expression is reached:

We note that [8] has arrived at a similar analysis for the first two sources of noise, but was unable to obtain a closed-form expression for this last term.

Proving Leeson's Hypothesis

Leeson originally postulated that thermally induced phase noise in any oscillator takes the form:

where F is an unspecified noise factor. By summing the expressions obtained above for thermally induced phase noise arising from the resonator, differential pair and tail bias current, respectively, for the differential oscillator Leeson's noise factor is:

We emphasize that this simple expression captures all nonlinear effects and frequency translations. At low bias currents, while the amplitude of oscillation is smaller than the power supply, the differential pair acts as a pure current switch driving the resonator and $V_0 = (4/\pi) R I_T$ [13]. Then the second term comprising $F$ simplifies to $2\gamma$. This means that as tail current increases, and assuming $g_{m,\mathrm{bias}} R$ is held constant, the noise factor remains constant and phase noise improves as $V_0^2$, that is, as $I_T^2$. This has been observed by others [13]. However, beyond a critical tail current the amplitude $V_0$ is pegged constant, limited by the supply voltage. Further increases in $I_T$ cause the differential pair's contribution to the noise factor to rise, degrading phase noise in proportion to $I_T$ (Figure 4). Therefore, for least phase noise the tail current should be just enough to drive the amplitude to its maximum possible value.
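The two amplitude regimes described above can be sketched numerically. The following is a minimal illustration, assuming the current-limited relation $V_0 = (4/\pi) R I_T$ from the text and a hard ceiling on the amplitude; the names `oscillation_amplitude`, `critical_tail_current`, and `v_limit`, and the component values, are hypothetical and not taken from the paper.

```python
import math


def oscillation_amplitude(i_tail, r_tank, v_limit):
    """Differential amplitude for a given tail current.

    Current-limited regime: V0 = (4/pi) * R * I_T (from the text).
    Voltage-limited regime: V0 is clipped at v_limit, a hypothetical
    ceiling standing in for the supply-imposed maximum.
    """
    return min((4.0 / math.pi) * r_tank * i_tail, v_limit)


def critical_tail_current(r_tank, v_limit):
    """Tail current at which the current-limited amplitude reaches v_limit."""
    return math.pi * v_limit / (4.0 * r_tank)


if __name__ == "__main__":
    r_tank, v_limit = 500.0, 2.0  # illustrative values only
    i_crit = critical_tail_current(r_tank, v_limit)
    print(f"critical tail current: {i_crit * 1e3:.3f} mA")
    for i_tail in (1e-3, 2e-3, i_crit, 5e-3, 10e-3):
        v0 = oscillation_amplitude(i_tail, r_tank, v_limit)
        print(f"I_T = {i_tail * 1e3:6.3f} mA -> V0 = {v0:.3f} V")
```

Under these assumptions, the design guideline in the text corresponds to biasing near `critical_tail_current`: below it the amplitude still grows with current, above it extra current buys no additional swing.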

Flicker Noise Upconversion

Close-in to the oscillation frequency, the slope of the phase noise spectrum in all CMOS VCOs turns from -20 to -30 dB/decade. This is ascribed to the upconversion of flicker noise in FETs. To understand this, let us first see if the analysis above explains this upconversion. Flicker noise in the tail current source at frequency $\omega_m$ indeed upconverts to $\omega_0 \pm \omega_m$ and enters the resonator, but as AM, not PM noise. Therefore, in the absence of a high-gain varactor to convert AM to FM, flicker noise in the tail current will not appear as phase noise. Next consider flicker noise in the differential pair. The preceding analysis says that this modulates zero crossings, and injects a noise current into the resonator consisting of flicker noise sampled by an impulse train with frequency $2\omega_0$. Thus noise originating at frequency $\omega_m$ produces currents at $\omega_m$ and at $2\omega_0 \pm \omega_m$. Both frequencies are strongly attenuated in the resonator, and neither explains flicker-induced phase noise at $\omega_0 \pm \omega_m$. One can only conclude that the mechanisms of flicker noise upconversion are quite different from those of thermally induced phase noise.

Fundamental Sources of FM in Oscillators

In 1934, Groszkowski [15], while studying electronic oscillators, realized that the steady-state oscillation frequency seldom coincides with the natural frequency of the resonator which tunes the oscillator. He found that the discrepancy arises because the active device in the oscillator, such as the differential pair current switch in the circuit considered here, drives the resonator with a harmonic-rich waveform. The harmonics flow into the lower-impedance capacitor (Figure 5) and upset the exact reactive power balance between the L and the C required for steady state. Now the frequency of oscillation must shift down until the reactive power in the inductor increases to equal the reactive power in the capacitor due to the fundamental and all harmonics. The shift, $\Delta\omega$, is:

$$\frac{\Delta\omega}{\omega_0} = \frac{1}{2Q^2} \sum_{n=2}^{\infty} \frac{n^2 (1 - n^2)}{(1 - n^2)^2 + n^2/Q^2}\, m_n^2$$

where $m_n$ is the normalized level of the $n$th harmonic. $\Delta\omega$ is the sum of all negative terms, which means that the oscillation frequency slows down with more harmonic content. Now the harmonic content at the output of a periodically switching differential pair is a function of the tail current. In the autonomous oscillator, the drive to the differential pair is also a function of tail current. The sensitivity $\partial\omega / \partial I_T$ is responsible for an "indirect" FM [7] due to flicker noise in $I_T$.

However, this is not the only mechanism of indirect FM. At RF, active device capacitance is also significant, and the active circuit no longer appears as a pure negative resistance to the resonator. For example, the differential pair commutates current flowing in the capacitor $C_T$ at the tail, which presents a negative capacitor (or, equivalently, an inductor in a narrowband sense) at the differential output (Figure 6). This speeds up the oscillation frequency. Flicker noise in the differential pair FETs modulates the duty cycle of commutation, and therefore the effective negative capacitance. Here, too, Groszkowski gives a method of systematic analysis [16], which captures the reactive components in the active devices by measuring the area $\Pi$ enclosed by hysteresis in the dynamic negative resistance curve.

[Equation garbled in the source: $\Delta\omega/\omega_0$ expressed as a term in $\Pi$, $Q$, and $\omega_0 L$ added to the harmonic sum given above.]

Thus the sensitivity of the reactance to bias current or offset voltage in the differential pair is estimated, which is another means whereby flicker noise modulates the frequency of oscillation.
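To get a feel for the size of the harmonic-induced frequency shift, the sketch below evaluates one reading of the Groszkowski-type expression, $\Delta\omega/\omega_0 = \frac{1}{2Q^2}\sum_{n\ge 2} \frac{n^2(1-n^2)}{(1-n^2)^2 + n^2/Q^2} m_n^2$. This form is an assumption reconstructed from the damaged equation in the source, and the function name and harmonic levels are hypothetical. For high $Q$ it reduces to $-\frac{1}{2Q^2}\sum n^2 m_n^2/(n^2-1)$, which is used as a cross-check.

```python
def groszkowski_shift(harmonic_levels, q):
    """Fractional frequency shift d_omega/omega_0 due to harmonic content.

    harmonic_levels: dict mapping harmonic number n (>= 2) to its normalized
    level m_n. The formula is an assumed reconstruction, not a verified
    transcription of the paper's equation.
    """
    total = 0.0
    for n, m_n in harmonic_levels.items():
        if n < 2:
            raise ValueError("only harmonics n >= 2 contribute")
        n2 = n * n
        total += n2 * (1 - n2) / ((1 - n2) ** 2 + n2 / q ** 2) * m_n ** 2
    return total / (2.0 * q ** 2)


def high_q_approximation(harmonic_levels, q):
    """High-Q limit of the same sum: -(1/2Q^2) * sum n^2 m_n^2 / (n^2 - 1)."""
    return -sum(n * n * m * m / (n * n - 1) for n, m in harmonic_levels.items()) / (2.0 * q ** 2)


if __name__ == "__main__":
    levels = {3: 0.1}  # illustrative: 10% third harmonic
    for q in (6, 10, 25):
        print(q, groszkowski_shift(levels, q), high_q_approximation(levels, q))
```

Every term is negative for $n \ge 2$, so the computed shift is always downward, in line with the statement that more harmonic content slows the oscillation.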

Validation of Analysis

The phase noise model was validated on two CMOS differential LC oscillators. One oscillator uses a low-Q, on-chip inductor, while the other uses off-chip inductors with large Q. Flicker noise is modelled as a bias-independent, gate-referred voltage source [14]. The measured data and SpectreRF simulations are plotted with predictions based on this paper. Excellent agreement (Figure 7) is found across the entire spectrum, which encompasses thermally induced phase noise and upconverted flicker noise.

References

[1] K. S. Kundert, "Introduction to RF simulation and its application," IEEE Journal of Solid-State Circuits, pp. 1298-319, 1999.
[2] A. Demir, A. Mehrotra, and J. Roychowdhury, "Phase noise in oscillators: a unifying theory and numerical methods for characterisation," in Design and Automation Conference, San Francisco, pp. 26-31, 1998.
[3] B. De Smedt and G. Gielen, "Accurate simulation of phase noise in oscillators," in European Solid-State Circuits Conference, pp. 208-11, 1997.
[4] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-94, 1998.
[5] D. B. Leeson, "A Simple Model of Feedback Oscillator Noise Spectrum," Proceedings of the IEEE, vol. 54, pp. 329-330, 1966.
[6] J. Craninckx and M. Steyaert, "Low-noise voltage-controlled oscillators using enhanced LC-tanks," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, no. 12, pp. 794-804, 1995.
[7] K. K. Clarke and D. T. Hess, Communication Circuits: Analysis and Design. Malabar, FL: Krieger, 1971.
[8] C. Samori, A. L. Lacaita, F. Villa, and F. Zappa, "Spectrum folding and phase noise in LC tuned oscillators," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 7, pp. 781-90, 1998.
[9] Q. Huang, "On the exact design of RF oscillators," CICC Proceedings, pp. 41-4, 1998.
[10] W. P. Robins, Phase Noise in Signal Sources. London: Peter Peregrinus, 1982.
[11] H. Darabi and A. Abidi, "Noise in CMOS Mixers: A Simple Physical Model," IEEE Journal of Solid-State Circuits, vol. 35, no. 1, in press, 2000.
[12] C. Samori, A. L. Lacaita, A. Zanchi, S. Levantino, and F. Torrisi, "Impact of Indirect Stability on Phase Noise Performance of Fully-Integrated LC Tuned VCOs," in European Solid-State Circuits Conference, Duisburg, Germany, pp. 202-205, 1999.
[13] A. Hajimiri and T. H. Lee, "Phase Noise in CMOS Differential LC oscillators," in Symposium on VLSI Circuits, Honolulu, HI, pp. 48-51, 1998.
[14] J. Chang, A. A. Abidi, and C. R. Viswanathan, "Flicker Noise in CMOS Transistors from Subthreshold to Strong Inversion at Various Temperatures," IEEE Transactions on Electron Devices, vol. 41, pp. 1965-1971, 1994.
[15] J. Groszkowski, "The Interdependence of Frequency Variation and Harmonic Content, and the problem of Constant-Frequency Oscillators," Proc. of the IRE, vol. 21, no. 7, pp. 958-981, 1934.
[16] J. Groszkowski, Frequency of Self-Oscillations. Oxford: Pergamon Press, 1964.


Figure 1. Noise phasor added to a sinewave decomposes into PM and AM sidebands.

Figure 2. Differential LC oscillator biased by tail current.

Figure 3. (a) Noise at input of differential pair modulates instants of zero crossing. (b) Output current consists of square wave, plus random noise pulses. (c) Noise pulses modelled as a train of impulses sampling noise waveform. (Panel labels: LO voltage; approximate model of noise pulses; sampling impulse train; noise.)

Figure 4. Increasing tail current first causes amplitude to rise, until limited by supply. Phase noise diminishes with rising amplitude, then worsens due to higher noise factor. (Plot: phase noise and oscillation amplitude versus bias current, $I_T$.)

Figure 5. Harmonics of oscillating current flow into capacitor, increasing its reactive energy. Steady state frequency shifts down until inductor energy balances.

Figure 6. Capacitors associated with active device appear as reactances across the resonator, shifting frequency.

Figure 7. Validation of the analysis presented in this paper. Measured phase noise is compared with predictions from analysis, and with SpectreRF simulations. (a) 0.35-µm CMOS 1.1 GHz oscillator using resonator with loaded Q of 6. (b) 0.25-µm CMOS 830 MHz oscillator using discrete inductor with loaded Q of 25. (Plots: phase noise versus offset frequency, kHz, from 1 to 1000 kHz.)


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

179

A General Theory of Phase Noise in Electrical Oscillators

Ali Hajimiri, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract: A general model is introduced which is capable of making accurate, quantitative predictions about the phase noise of different types of electrical oscillators by acknowledging the true periodically time-varying nature of all oscillators. This new approach also elucidates several previously unknown design criteria for reducing close-in phase noise by identifying the mechanisms by which intrinsic device noise and external noise sources contribute to the total phase noise. In particular, it explains the details of how $1/f$ noise in a device upconverts into close-in phase noise and identifies methods to suppress this upconversion. The theory also naturally accommodates cyclostationary noise sources, leading to additional important design insights. The model reduces to previously available phase noise models as special cases. Excellent agreement among theory, simulations, and measurements is observed.

Index Terms: Jitter, oscillator noise, oscillators, oscillator stability, phase jitter, phase locked loops, phase noise, voltage controlled oscillators.

I. INTRODUCTION

THE recent exponential growth in wireless communication has increased the demand for more available channels in mobile communication applications. In turn, this demand has imposed more stringent requirements on the phase noise of local oscillators. Even in the digital world, phase noise in the guise of jitter is important. Clock jitter directly affects timing margins and hence limits system performance. Phase and frequency fluctuations have therefore been the subject of numerous studies [1]–[9]. Although many models have been developed for different types of oscillators, each of these models makes restrictive assumptions applicable only to a limited class of oscillators. Most of these models are based on a linear time invariant (LTI) system assumption and suffer from not considering the complete mechanism by which electrical noise sources, such as device noise, become phase noise. In particular, they take an empirical approach in describing the upconversion of low frequency noise sources, such as $1/f$ noise, into close-in phase noise. These models are also reduced-order models and are therefore incapable of making accurate predictions about phase noise in long ring oscillators, or in oscillators that contain essential singularities, such as delay elements.

Manuscript received December 17, 1996; revised July 9, 1997. The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA. Publisher Item Identifier S 0018-9200(98)00716-1.

Since any oscillator is a periodically time-varying system, its time-varying nature must be taken into account to permit accurate modeling of phase noise. Unlike models that assume linearity and time-invariance, the time-variant model presented here is capable of proper assessment of the effects on phase noise of both stationary and even of cyclostationary noise sources. Noise sources in the circuit can be divided into two groups, namely, device noise and interference. Thermal, shot, and flicker noise are examples of the former, while substrate and supply noise are in the latter group. This model explains the exact mechanism by which spurious sources, random or deterministic, are converted into phase and amplitude variations, and includes previous models as special limiting cases.

This time-variant model makes explicit predictions of the relationship between waveform shape and $1/f$ noise upconversion. Contrary to widely held beliefs, it will be shown that the $1/f^3$ corner in the phase noise spectrum is smaller than the $1/f$ noise corner of the oscillator's components by a factor determined by the symmetry properties of the waveform. This result is particularly important in CMOS RF applications because it shows that the effect of inferior $1/f$ device noise can be reduced by proper design.

Section II is a brief introduction to some of the existing phase noise models. Section III introduces the time-variant model through an impulse response approach for the excess phase of an oscillator. It also shows the mechanism by which noise at different frequencies can become phase noise and expresses with a simple relation the sideband power due to an arbitrary source (random or deterministic). It continues with explaining how this approach naturally lends itself to the analysis of cyclostationary noise sources. It also introduces a general method to calculate the total phase noise of an oscillator with multiple nodes and multiple noise sources, and how this method can help designers to spot the dominant source of phase noise degradation in the circuit. It concludes with a demonstration of how the presented model reduces to existing models as special cases. Section IV gives new design implications arising from this theory in the form of guidelines for low phase noise design. Section V concludes with experimental results supporting the theory.

II. BRIEF REVIEW OF EXISTING MODELS AND DEFINITIONS

The output of an ideal sinusoidal oscillator may be expressed as $V_{out}(t) = A\cos[\omega_0 t + \phi]$, where $A$ is the amplitude,

$\omega_0$ is the frequency, and $\phi$ is an arbitrary, fixed phase reference. Therefore, the spectrum of an ideal oscillator with no random fluctuations is a pair of impulses at $\pm\omega_0$. In a practical oscillator, however, the output is more generally given by

$$V_{out}(t) = A(t) \cdot f[\omega_0 t + \phi(t)] \tag{1}$$

where $A(t)$ and $\phi(t)$ are now functions of time and $f$ is a periodic function with period $2\pi$. As a consequence of the fluctuations represented by $A(t)$ and $\phi(t)$, the spectrum of a practical oscillator has sidebands close to the frequency of oscillation, $\omega_0$. There are many ways of quantifying these fluctuations (a comprehensive review of different standards and measurement methods is given in [4]). A signal's short-term instabilities are usually characterized in terms of the single sideband noise spectral density. It has units of decibels below the carrier per hertz (dBc/Hz) and is defined as

$$\mathcal{L}_{total}\{\Delta\omega\} = 10 \log\left[\frac{P_{sideband}(\omega_0 + \Delta\omega,\ 1\ \mathrm{Hz})}{P_{carrier}}\right] \tag{2}$$

where $P_{sideband}(\omega_0 + \Delta\omega,\ 1\ \mathrm{Hz})$ represents the single sideband power at a frequency offset of $\Delta\omega$ from the carrier with a measurement bandwidth of 1 Hz. Note that the above definition includes the effect of both amplitude and phase fluctuations, $A(t)$ and $\phi(t)$. The advantage of this parameter is its ease of measurement. Its disadvantage is that it shows the sum of both amplitude and phase variations; it does not show them separately. However, it is important to know the amplitude and phase noise separately because they behave differently in the circuit. For instance, the effect of amplitude noise is reduced by the amplitude limiting mechanism and can be practically eliminated by the application of a limiter to the output signal, while the phase noise cannot be reduced in the same manner. Therefore, in most applications, $\mathcal{L}_{total}\{\Delta\omega\}$ is dominated by its phase portion, $\mathcal{L}_{phase}\{\Delta\omega\}$, known as phase noise, which we will simply denote as $\mathcal{L}\{\Delta\omega\}$.

Fig. 1. Typical plot of the phase noise of an oscillator versus offset from carrier.

Fig. 2. A typical RLC oscillator.

0018–9200/98$10.00 © 1998 IEEE

The semi-empirical model proposed in [1]–[3], known also as the Leeson–Cutler phase noise model, is based on an LTI assumption for tuned tank oscillators. It predicts the following behavior for $\mathcal{L}\{\Delta\omega\}$:

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left\{\frac{2FkT}{P_s}\left[1 + \left(\frac{\omega_0}{2Q_L\Delta\omega}\right)^2\right]\left(1 + \frac{\Delta\omega_{1/f^3}}{|\Delta\omega|}\right)\right\} \tag{3}$$

where $F$ is an empirical parameter (often called the "device excess noise number"), $k$ is Boltzmann's constant, $T$ is the absolute temperature, $P_s$ is the average power dissipated in the resistive part of the tank, $\omega_0$ is the oscillation frequency, $Q_L$ is the effective quality factor of the tank with all the loadings in place (also known as loaded $Q$), $\Delta\omega$ is the offset from the carrier, and $\Delta\omega_{1/f^3}$ is the frequency of the corner between the $1/f^3$ and $1/f^2$ regions, as shown in the sideband spectrum of Fig. 1. The behavior in the $1/f^2$ region can be obtained by applying a transfer function approach as follows. The impedance of a parallel RLC, for $\Delta\omega \ll \omega_0$, is easily calculated to be

$$Z(\omega_0 + \Delta\omega) \approx \frac{1}{G_L} \cdot \frac{1}{1 + j2Q_L\dfrac{\Delta\omega}{\omega_0}} \tag{4}$$
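As a numerical illustration of the Leeson–Cutler form, the sketch below evaluates $\mathcal{L}\{\Delta\omega\} = 10\log\{(2FkT/P_s)[1 + (\omega_0/2Q_L\Delta\omega)^2](1 + \Delta\omega_{1/f^3}/|\Delta\omega|)\}$. It assumes this standard form of the model; the function name and all parameter values are hypothetical, and frequencies are passed in hertz since only ratios of frequencies appear.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def leeson_phase_noise_dbc_hz(f_offset, f0, q_loaded, p_sig, noise_factor,
                              f_corner, temp=290.0):
    """Single-sideband phase noise (dBc/Hz) from the Leeson-Cutler model.

    f_offset     : offset from the carrier (Hz)
    f0           : oscillation frequency (Hz)
    q_loaded     : loaded tank quality factor Q_L
    p_sig        : average power dissipated in the tank resistance P_s (W)
    noise_factor : empirical excess noise number F (a fitting parameter)
    f_corner     : 1/f^3 corner frequency (Hz), also a fitting parameter
    """
    floor = 2.0 * noise_factor * K_BOLTZMANN * temp / p_sig
    tank_shaping = 1.0 + (f0 / (2.0 * q_loaded * f_offset)) ** 2
    flicker_shaping = 1.0 + f_corner / abs(f_offset)
    return 10.0 * math.log10(floor * tank_shaping * flicker_shaping)


if __name__ == "__main__":
    # Illustrative values only: 1 GHz carrier, Q_L = 10, 1 mW, F = 2.
    for f_off in (1e3, 1e4, 1e5, 1e6, 1e7):
        l_dbc = leeson_phase_noise_dbc_hz(f_off, 1e9, 10.0, 1e-3, 2.0, 50e3)
        print(f"{f_off:10.0f} Hz offset: {l_dbc:8.2f} dBc/Hz")
```

Well inside the tank bandwidth and above the corner, the output falls by 20 dB per decade of offset; below the corner the slope steepens to 30 dB per decade, which is the behavior sketched in Fig. 1.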

Here $G_L$ is the parallel parasitic conductance of the tank. For steady-state oscillation, the equation $G_m R_L = 1$ should be satisfied. Therefore, for a parallel current source, the closed-loop transfer function of the oscillator shown in Fig. 2 is given by the imaginary part of the impedance

$$H(\Delta\omega) = -j\,\frac{1}{G_L}\,\frac{\omega_0}{2Q_L\Delta\omega}. \tag{5}$$

The total equivalent parallel resistance of the tank has an equivalent mean square noise current density of $\overline{i_n^2}/\Delta f = 4kTG_L$. In addition, active device noise usually contributes a significant portion of the total noise in the oscillator. It is traditional to combine all the noise sources into one effective noise source, expressed in terms of the resistor noise with

a multiplicative factor, $F$, known as the device excess noise number. The equivalent mean square noise current density can therefore be expressed as $\overline{i_n^2}/\Delta f = 4FkTG_L$. Unfortunately, it is generally difficult to calculate $F$ a priori. One important reason is that much of the noise in a practical oscillator arises from periodically varying processes and is therefore cyclostationary. Hence, as mentioned in [3], $F$ and $\Delta\omega_{1/f^3}$ are usually used as a posteriori fitting parameters on measured data. Using the above effective noise current power, the phase noise in the $1/f^2$ region of the spectrum can be calculated as

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left[\frac{\tfrac{1}{2}\,|H(\Delta\omega)|^2\,\overline{i_n^2}/\Delta f}{\tfrac{1}{2}V_{max}^2}\right] = 10 \log\left[\frac{2FkT}{P_s}\left(\frac{\omega_0}{2Q_L\Delta\omega}\right)^2\right] \tag{6}$$

Note that the factor of 1/2 arises from neglecting the contribution of amplitude noise. Although the expression for the noise in the $1/f^2$ region is thus easily obtained, the expression for the $1/f^3$ portion of the phase noise is completely empirical. As such, the common assumption that the $1/f^3$ corner of the phase noise is the same as the $1/f$ corner of device flicker noise has no theoretical basis.

The above approach may be extended by identifying the individual noise sources in the tuned tank oscillator of Fig. 2 [8]. An LTI approach is used and there is an embedded assumption of no amplitude limiting, contrary to most practical cases. For the RLC circuit of Fig. 2, [8] predicts the following:

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left[\frac{kT \cdot R_{eff}\,[1 + A]\left(\dfrac{\omega_0}{\Delta\omega}\right)^2}{V_{max}^2/2}\right] \tag{7}$$

where $A$ is yet another empirical fitting parameter, and $R_{eff}$ is the effective series resistance, given by

$$R_{eff} = R_L + R_C + \frac{1}{R_p (C\omega_0)^2} \tag{8}$$

where $R_L$, $R_C$, $R_p$, and $C$ are shown in Fig. 2. Note that it is still not clear how to calculate $A$ from circuit parameters. Hence, this approach represents no fundamental improvement over the method outlined in [3].

Fig. 3. Phase and amplitude impulse response model.

Fig. 4. (a) Impulse injected at the peak, (b) impulse injected at the zero crossing, and (c) effect of nonlinearity on amplitude and phase of the oscillator in state-space.

III. MODELING OF PHASE NOISE

A. Impulse Response Model for Excess Phase

An oscillator can be modeled as a system with $n$ inputs (each associated with one noise source) and two outputs that are the instantaneous amplitude and excess phase of the oscillator, $A(t)$ and $\phi(t)$, as defined by (1). Noise inputs to this system are in the form of current sources injecting into circuit nodes and voltage sources in series with circuit branches. For each input source, both systems can be viewed as single-input, single-output systems. The time and frequency-domain fluctuations of $A(t)$ and $\phi(t)$ can be studied by characterizing the behavior of two equivalent systems shown in Fig. 3.

Note that both systems shown in Fig. 3 are time variant. Consider the specific example of an ideal parallel LC oscillator shown in Fig. 4. If we inject a current impulse $i(t)$ as shown, the amplitude and phase of the oscillator will have responses similar to that shown in Fig. 4(a) and (b). The instantaneous voltage change $\Delta V$ is given by

$$\Delta V = \frac{\Delta q}{C_{tot}} \tag{9}$$

where $\Delta q$ is the total injected charge due to the current impulse and $C_{tot}$ is the total capacitance at that node. Note that the current impulse will change only the voltage across the

capacitor and will not affect the current through the inductor. It can be seen from Fig. 4 that the resultant change in $A(t)$ and $\phi(t)$ is time dependent. In particular, if the impulse is applied at the peak of the voltage across the capacitor, there will be no phase shift and only an amplitude change will result, as shown in Fig. 4(a). On the other hand, if this impulse is applied at the zero crossing, it has the maximum effect on the excess phase $\phi(t)$ and the minimum effect on the amplitude, as depicted in Fig. 4(b). This time dependence can also be observed in the state-space trajectory shown in Fig. 4(c). Applying an impulse at the peak is equivalent to a sudden jump in voltage at point $P$, which results in no phase change and changes only the amplitude, while applying an impulse at point $N$ results only in a phase change without affecting the amplitude. An impulse applied sometime between these two extremes will result in both amplitude and phase changes.

There is an important difference between the phase and amplitude responses of any real oscillator, because some form of amplitude limiting mechanism is essential for stable oscillatory action. The effect of this limiting mechanism is pictured as a closed trajectory in the state-space portrait of the oscillator shown in Fig. 4(c). The system state will finally approach this trajectory, called a limit cycle, irrespective of its starting point [10]–[12]. Both an explicit automatic gain control (AGC) and the intrinsic nonlinearity of the devices act similarly to produce a stable limit cycle. However, any fluctuation in the phase of the oscillation persists indefinitely, with a current noise impulse resulting in a step change in phase, as shown in Fig. 3. It is important to note that regardless of how small the injected charge, the oscillator remains time variant.

Fig. 5. (a) A typical Colpitts oscillator and (b) a five-stage minimum size ring oscillator.

Having established the essential time-variant nature of the systems of Fig. 3, we now show that they may be treated as linear for all practical purposes, so that their impulse responses $h_\phi(t, \tau)$ and $h_A(t, \tau)$ will characterize them completely. The linearity assumption can be verified by injecting impulses with different areas (charges) and measuring the resultant phase change. This is done in the SPICE simulations of the 62-MHz Colpitts oscillator shown in Fig. 5(a) and the five-stage 1.01-GHz, 0.8-µm CMOS inverter chain ring oscillator shown in Fig. 5(b). The results are shown in Fig. 6(a) and (b), respectively. The impulse is applied close to a zero crossing,

where it has the maximum effect on phase. As can be seen, the current-phase relation is linear for values of charge up to 10% of the total charge on the effective capacitance of the node of interest. Also note that the effective injected charges due to actual noise and interference sources in practical circuits are several orders of magnitude smaller than the amounts of charge injected in Fig. 6. Thus, the assumption of linearity is well satisfied in all practical oscillators.

Fig. 6. Phase shift versus injected charge for oscillators of Fig. 5(a) and (b).

It is critical to note that the current-to-phase transfer function is practically linear even though the active elements may have strongly nonlinear voltage-current behavior. However, the nonlinearity of the circuit elements defines the shape of the limit cycle and has an important influence on phase noise that will be accounted for shortly.

We have thus far demonstrated linearity, with the amount of excess phase proportional to the ratio of the injected charge to the maximum charge swing across the capacitor on the node, i.e., $\Delta q / q_{max}$. Furthermore, as discussed earlier, the impulse response for the first system of Fig. 3 is a step whose amplitude depends periodically on the time $\tau$ when the impulse is injected. Therefore, the unit impulse response for excess phase can be expressed as

$$h_\phi(t, \tau) = \frac{\Gamma(\omega_0 \tau)}{q_{max}}\, u(t - \tau) \tag{10}$$

where $q_{max}$ is the maximum charge displacement across the capacitor on the node and $u(t)$ is the unit step. We call $\Gamma(x)$ the impulse sensitivity function (ISF). It is a dimensionless, frequency- and amplitude-independent periodic function with period $2\pi$ which describes how much phase shift results from applying a unit impulse at time $t = \tau$. To illustrate its significance, the ISFs together with the oscillation waveforms for a typical LC and ring oscillator are shown in Fig. 7. As is shown in the Appendix, $\Gamma(x)$ is a function of the waveform or, equivalently, the shape of the limit cycle which, in turn, is governed by the nonlinearity and the topology of the oscillator.

Given the ISF, the output excess phase $\phi(t)$ can be calculated using the superposition integral

$$\phi(t) = \int_{-\infty}^{\infty} h_\phi(t, \tau)\, i(\tau)\, d\tau = \frac{1}{q_{max}} \int_{-\infty}^{t} \Gamma(\omega_0 \tau)\, i(\tau)\, d\tau \tag{11}$$
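The time dependence of the phase response can be checked on the simplest possible case. The sketch below is a minimal illustration, assuming an ideal lossless LC tank whose normalized state traces the unit circle (capacitor voltage proportional to $\cos\theta$, inductor current proportional to $\sin\theta$) and the sign convention $\theta = \mathrm{atan2}(\sin\theta, \cos\theta)$. Under those assumptions a small normalized charge $\delta = \Delta q/q_{max}$ shifts the phase by approximately $\Gamma(\theta)\,\delta$ with $\Gamma(\theta) = -\sin\theta$. The function names are hypothetical.

```python
import math


def phase_shift_after_impulse(theta, delta):
    """Exact phase change of an ideal LC tank after a charge impulse.

    theta : oscillation phase at the injection instant (v ~ cos(theta))
    delta : injected charge normalized to q_max (shifts only the voltage)
    """
    v_new = math.cos(theta) + delta    # capacitor voltage jumps
    i_new = math.sin(theta)            # inductor current is unchanged
    d_theta = math.atan2(i_new, v_new) - theta
    return (d_theta + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)


def isf_ideal_lc(theta):
    """Assumed ISF of the ideal LC tank under the convention above."""
    return -math.sin(theta)


if __name__ == "__main__":
    delta = 1e-3
    for label, theta in (("peak", 0.0), ("zero crossing", math.pi / 2), ("between", 1.0)):
        exact = phase_shift_after_impulse(theta, delta)
        linear = isf_ideal_lc(theta) * delta
        print(f"{label:14s} exact = {exact:+.3e}  ISF * delta = {linear:+.3e}")
```

An impulse at the voltage peak leaves the phase untouched, while the same impulse at the zero crossing produces the largest shift, matching the behavior described for Fig. 4(a) and (b).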

Here $i(t)$ represents the input noise current injected into the node of interest. Since the ISF is periodic, it can be expanded in a Fourier series

$$\Gamma(\omega_0 \tau) = \frac{c_0}{2} + \sum_{n=1}^{\infty} c_n \cos(n\omega_0\tau + \theta_n) \tag{12}$$

where the coefficients $c_n$ are real-valued, and $\theta_n$ is the phase of the $n$th harmonic. As will be seen later, $\theta_n$ is not important for random input noise and is thus neglected here. Using the above expansion for $\Gamma(\omega_0\tau)$ in the superposition integral, and exchanging the order of summation and integration, we obtain

$$\phi(t) = \frac{1}{q_{max}}\left[\frac{c_0}{2}\int_{-\infty}^{t} i(\tau)\, d\tau + \sum_{n=1}^{\infty} c_n \int_{-\infty}^{t} i(\tau)\cos(n\omega_0\tau)\, d\tau\right] \tag{13}$$

Equation (13) allows computation of $\phi(t)$ for an arbitrary input current $i(t)$ injected into any circuit node, once the various Fourier coefficients of the ISF have been found. As an illustrative special case, suppose that we inject a low frequency sinusoidal perturbation current $i(t)$ into the node of interest at a frequency of $\Delta\omega \ll \omega_0$:

$$i(t) = I_0 \cos(\Delta\omega\, t) \tag{14}$$

where $I_0$ is the maximum amplitude of $i(t)$. The arguments of all the integrals in (13) are at frequencies higher than $\Delta\omega$ and are significantly attenuated by the averaging nature of the integration, except the term arising from the first integral, which involves $c_0$. Therefore, the only significant term in $\phi(t)$ will be

$$\phi(t) \approx \frac{I_0 c_0 \sin(\Delta\omega\, t)}{2 q_{max} \Delta\omega} \tag{15}$$

As a result, there will be two impulses at $\pm\Delta\omega$ in the power spectral density of $\phi(t)$, denoted as $S_\phi(\omega)$.

As an important second special case, consider a current at a frequency close to the carrier injected into the node of interest, given by $i(t) = I_1 \cos[(\omega_0 + \Delta\omega)t]$. A process similar to that of the previous case occurs except that the spectrum of $i(t)$ consists of two impulses at $\pm(\omega_0 + \Delta\omega)$ as shown in Fig. 8. This time the only integral in (13) which will have a low frequency argument is for $n = 1$. Therefore $\phi(t)$ is given by

$$\phi(t) \approx \frac{I_1 c_1 \sin(\Delta\omega\, t)}{2 q_{max} \Delta\omega} \tag{16}$$

which again results in two equal sidebands at $\pm\Delta\omega$ in $S_\phi(\omega)$. More generally, (13) suggests that applying a current $i(t) = I_n \cos[(n\omega_0 + \Delta\omega)t]$ close to any integer multiple of the oscillation frequency will result in two equal sidebands at $\pm\Delta\omega$ in $S_\phi(\omega)$. Hence, in the general case $\phi(t)$ is given by

$$\phi(t) \approx \frac{I_n c_n \sin(\Delta\omega\, t)}{2 q_{max} \Delta\omega} \tag{17}$$

Fig. 7. Waveforms and ISFs for (a) a typical LC oscillator and (b) a typical ring oscillator.

Fig. 8. Conversion of the noise around integer multiples of the oscillation frequency into phase noise.

B. Phase-to-Voltage Transformation

So far, we have presented a method for determining how much phase error results from a given current $i(t)$ using (13). Computing the power spectral density (PSD) of the oscillator output voltage $S_v(\omega)$ requires knowledge of how the output voltage relates to the excess phase variations. As shown in Fig. 8, the conversion of device noise current to output voltage may be treated as the result of a cascade of two processes. The first corresponds to a linear time variant (LTV) current-to-phase converter discussed above, while the second is a nonlinear system that represents a phase modulation (PM), which transforms phase to voltage. To obtain the sideband power around the fundamental frequency, the fundamental harmonic of the oscillator output $\cos[\omega_0 t + \phi(t)]$ can be used as the transfer function for the second system in Fig. 8. Note this is a nonlinear transfer function with $\phi(t)$ as the input. Substituting $\phi(t)$ from (17) into (1) results in a single-tone phase modulation for output voltage, with $\phi(t)$ given by (17). Therefore, an injected current at $n\omega_0 + \Delta\omega$ results in a pair of equal sidebands at $\omega_0 \pm \Delta\omega$ with a sideband power relative to the carrier given by

$$P_{SBC}(\Delta\omega) = 10 \log\left(\frac{I_n c_n}{4 q_{max} \Delta\omega}\right)^2 \tag{18}$$
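The selection of a single ISF harmonic by an injected tone can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the ISF is taken as $\Gamma(x) = c_0/2 + \sum_n c_n \cos(nx)$ with all phases set to zero, a tone $I_n\cos[(n\omega_0 + \Delta\omega)t]$ is integrated through the superposition integral starting from $t = 0$, and the slow component of $\phi(t)$ is compared with $I_n c_n / (2 q_{max}\Delta\omega)$. The coefficient values, function names, and normalized units are hypothetical.

```python
import numpy as np


def isf(x, c):
    """Gamma(x) = c[0]/2 + sum_n c[n] cos(n x), all phases theta_n set to zero."""
    gamma = np.full_like(x, c[0] / 2.0)
    for n in range(1, len(c)):
        gamma += c[n] * np.cos(n * x)
    return gamma


def trapezoid(y, t):
    """Trapezoidal rule written out, to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def excess_phase(t, current, c, w0, q_max):
    """phi(t) = (1/q_max) * integral from 0 to t of Gamma(w0 tau) i(tau) dtau."""
    integrand = isf(w0 * t, c) * current
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps))) / q_max


def measured_amplitude(n, c, i_amp, w0, dw, q_max, pts_per_cycle=200):
    """Inject I_n cos((n w0 + dw) t) and project phi(t) onto sin(dw t)."""
    cycles = int(round(w0 / dw))  # carrier cycles in one period of dw
    t = np.linspace(0.0, 2.0 * np.pi / dw, cycles * pts_per_cycle + 1)
    phi = excess_phase(t, i_amp * np.cos((n * w0 + dw) * t), c, w0, q_max)
    return 2.0 * trapezoid(phi * np.sin(dw * t), t) / (t[-1] - t[0])


def predicted_amplitude(n, c, i_amp, dw, q_max):
    """Peak slow phase deviation, I_n c_n / (2 q_max dw)."""
    return i_amp * c[n] / (2.0 * q_max * dw)


def sideband_dbc(n, c, i_amp, dw, q_max):
    """Sideband power relative to the carrier, 10 log (I_n c_n / (4 q_max dw))^2."""
    return 20.0 * np.log10(i_amp * c[n] / (4.0 * q_max * dw))


if __name__ == "__main__":
    c = [0.4, 1.0, 0.3]                 # hypothetical ISF coefficients c0, c1, c2
    w0, q_max, i_amp = 2.0 * np.pi, 1.0, 1e-3
    dw = w0 / 100.0
    for n in range(len(c)):
        print(n, measured_amplitude(n, c, i_amp, w0, dw, q_max),
              predicted_amplitude(n, c, i_amp, dw, q_max),
              sideband_dbc(n, c, i_amp, dw, q_max))
```

Choosing $\omega_0/\Delta\omega$ as an integer makes every unwanted mixing product orthogonal to $\sin(\Delta\omega t)$ over the integration window, so the projection isolates the slow term cleanly.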

This process is shown in Fig. 8. Appearance of the frequency deviation $\Delta\omega$ in the denominator of (18) underscores that the impulse response $h_\phi(t, \tau)$ is a step function and therefore behaves as a time-varying integrator. We will frequently refer to (18) in subsequent sections.

Applying this method of analysis to an arbitrary oscillator, a sinusoidal current injected into one of the oscillator nodes at a frequency $n\omega_0 + \Delta\omega$ results in two equal sidebands at $\omega_0 \pm \Delta\omega$, as observed in [9]. Note that it is necessary to use an LTV model, because an LTI model cannot explain the presence of a pair of equal sidebands close to the carrier arising from sources at frequencies $n\omega_0 + \Delta\omega$: an LTI system cannot produce any frequencies except those of the input and those associated with the system's poles. Furthermore, the amplitude of the resulting sidebands, as well as their equality, cannot be predicted by conventional intermodulation effects. This failure is to be expected since the intermodulation terms arise from nonlinearity in the voltage (or current) input/output characteristic of active devices of the form $\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots$. This type of nonlinearity does not directly appear in the phase transfer characteristic and shows itself only indirectly in the ISF.

It is instructive to compare the predictions of (18) with simulation results. A sinusoidal current of 10 µA amplitude at different frequencies was injected into node 1 of the 1.01-GHz ring oscillator of Fig. 5(b). Fig. 9(a) shows the simulated power spectrum of the signal on node 4 for a low frequency input at $f_m = 50$ MHz. This power spectrum is obtained using the fast Fourier transform (FFT) analysis in HSPICE 96.1. It is noteworthy that in this version of HSPICE the simulation artifacts observed in [9] have been properly eliminated by calculation of the values used in the analysis at the exact points of interest. Note that the injected noise is upconverted into two equal sidebands at $f_0 + f_m$ and $f_0 - f_m$, as predicted by (18). Fig. 9(b) shows the effect of injection of a current at $f_0 + f_m = 1.06$ GHz. Again, two equal sidebands are observed at $f_0 + f_m$ and $f_0 - f_m$, also as predicted by (18).

Fig. 9. Simulated power spectrum of the output with current injection at (a) $f_m = 50$ MHz and (b) $f_0 + f_m = 1.06$ GHz.

Simulated sideband power for the general case of current injection at $nf_0 + f_m$ can be compared to the predictions of (18). The ISF for this oscillator is obtained by the simulation method of the Appendix. Here, $q_{max}$ is equal to $C_{node}V_{max}$, where $C_{node}$ is the average capacitance on each node of the circuit and $V_{max}$ is the maximum swing across it. For this oscillator, $C_{node} = \ldots$ fF and $V_{max} = \ldots$ V, which results in $q_{max} = \ldots$ fC. For a sinusoidal injected current of amplitude $I_n = 10$ µA, and an $f_m$ of 50 MHz, Fig. 10 depicts the simulated and predicted sideband powers. As can be seen from the figure, these agree to within 1 dB for the higher power sidebands. The discrepancy in the case of the low power sidebands arises from numerical noise in the simulations, which represents a greater fractional error at lower sideband power. Overall, there is satisfactory agreement between simulation and the theory of conversion of noise from various frequencies into phase fluctuations.

Fig. 10. Simulated and calculated sideband powers for the first ten coefficients.

C. Prediction of Phase Noise Sideband Power

Now we consider the case of a random noise current $i_n(t)$ whose power spectral density has both a flat region and a $1/f$ region, as shown in Fig. 11. As can be seen from (18) and the foregoing discussion, noise components located near integer multiples of the oscillation frequency are transformed to low frequency noise sidebands for $S_\phi(\omega)$, which in turn become close-in phase noise in the spectrum of $S_v(\omega)$, as illustrated in Fig. 11. It can be seen that the total $S_\phi(\omega)$ is given by the sum of phase noise contributions from device noise in the vicinity of the integer multiples of $\omega_0$, weighted by the coefficients $c_n$. This is shown in Fig. 12(a) (logarithmic frequency scale). The resulting single sideband spectral noise density $\mathcal{L}\{\Delta\omega\}$ is plotted on a logarithmic scale in Fig. 12(b). The sidebands in the spectrum of $S_\phi(\omega)$, in turn, result in phase noise sidebands in the spectrum of $S_v(\omega)$ through the PM mechanism discussed in the previous subsection. This process is shown in Figs. 11 and 12. The theory predicts the existence of $1/f^3$, $1/f^2$, and flat regions for the phase noise spectrum. The low-frequency noise sources, such as flicker noise, are weighted by the coefficient $c_0$ and show a $1/f^3$ dependence on the offset frequency, while

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS


the white noise terms are weighted by the other $c_n$ coefficients and give rise to the $1/f^2$ region of the phase noise spectrum. It is apparent that if the original noise current $i(t)$ contains $1/f^n$ low frequency noise terms, such as popcorn noise, they can appear in the phase noise spectrum as $1/f^{n+2}$ regions. Finally, the flat noise floor in Fig. 12(b) arises from the white noise floor of the noise sources in the oscillator. The total sideband noise power is the sum of these two as shown by the bold line in the same figure.

Fig. 11. Conversion of noise to phase fluctuations and phase-noise sidebands.

Fig. 12. (a) PSD of $\phi(t)$ and (b) single sideband phase noise power spectrum, $\mathcal{L}\{\Delta\omega\}$.

To carry out a quantitative analysis of the phase noise sideband power, now consider an input noise current with a white power spectral density $\overline{i_n^2}/\Delta f$. Note that $I_n$ in (18) represents the peak amplitude, hence, $I_n^2/2 = \overline{i_n^2}/\Delta f$ for $\Delta f = 1$ Hz. Based on the foregoing development and (18), the total single sideband phase noise spectral density in dB below the carrier per unit bandwidth due to the source on one node at an offset frequency of $\Delta\omega$ is given by

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left( \frac{\dfrac{\overline{i_n^2}}{\Delta f} \sum_{n=0}^{\infty} c_n^2}{4 q_{max}^2 \Delta\omega^2} \right). \quad (19)$$

Now, according to Parseval's relation we have

$$\sum_{n=0}^{\infty} c_n^2 = \frac{1}{\pi} \int_0^{2\pi} |\Gamma(x)|^2 \, dx = 2\Gamma_{rms}^2 \quad (20)$$

where $\Gamma_{rms}$ is the rms value of $\Gamma(x)$. As a result

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left( \frac{\Gamma_{rms}^2}{q_{max}^2} \cdot \frac{\overline{i_n^2}/\Delta f}{2 \Delta\omega^2} \right). \quad (21)$$

This equation represents the phase noise spectrum of an arbitrary oscillator in the $1/f^2$ region of the phase noise spectrum. For a voltage noise source in series with an inductor, $q_{max}$ should be replaced with $\Phi_{max}$, where $\Phi_{max}$ represents the maximum magnetic flux swing in the inductor.

We may now investigate quantitatively the relationship between the device $1/f$ corner and the $1/f^3$ corner of the phase noise. It is important to note that it is by no means obvious from the foregoing development that the $1/f^3$ corner of the phase noise and the $1/f$ corner of the device noise should be coincident, as is commonly assumed. In fact, from Fig. 12, it should be apparent that the relationship between these two frequencies depends on the specific values of the various coefficients $c_n$. The device noise in the flicker noise dominated portion of the noise spectrum can be described by

$$\overline{i_{n,1/f}^2} = \overline{i_n^2} \cdot \frac{\omega_{1/f}}{\Delta\omega} \quad (22)$$

where $\omega_{1/f}$ is the corner frequency of device $1/f$ noise. Equation (22) together with (18) result in the following expression for phase noise in the $1/f^3$ portion of the phase noise spectrum:

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left( \frac{c_0^2}{q_{max}^2} \cdot \frac{\overline{i_n^2}/\Delta f}{8 \Delta\omega^2} \cdot \frac{\omega_{1/f}}{\Delta\omega} \right). \quad (23)$$

The phase noise $1/f^3$ corner, $\Delta\omega_{1/f^3}$, is the frequency where the sideband power due to the white noise given by (21) is equal to the sideband power arising from the $1/f$ noise given by (23), as shown in Fig. 12. Solving for $\Delta\omega_{1/f^3}$ results in the following expression for the $1/f^3$ corner in the phase noise spectrum:

$$\Delta\omega_{1/f^3} = \omega_{1/f} \cdot \frac{c_0^2}{4\Gamma_{rms}^2} = \omega_{1/f} \cdot \left( \frac{\Gamma_{dc}}{\Gamma_{rms}} \right)^2. \quad (24)$$

This equation together with (21) describe the phase noise spectrum and are the major results of this section. As can be seen, the phase noise $1/f^3$ corner due to internal noise sources is not equal to the device $1/f$ noise corner, but is smaller by a factor equal to $(\Gamma_{dc}/\Gamma_{rms})^2$. As will be discussed later, $\Gamma_{dc}$ depends on the waveform and can be significantly reduced if certain symmetry properties exist in the waveform of the oscillation. Thus, poor $1/f$ device noise need not imply poor close-in phase noise performance.
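As a rough numerical illustration of the two results above, the following sketch evaluates the white-noise phase noise and the $1/f^3$ corner from a sampled ISF. It assumes that (21) has the form $10\log[(\Gamma_{rms}/q_{max})^2 (\overline{i_n^2}/\Delta f)/(2\Delta\omega^2)]$ and that (24) scales the device $1/f$ corner by $(\Gamma_{dc}/\Gamma_{rms})^2$. The function names, the sinusoidal ISF, and the values of `q_max` and `in2_per_hz` are hypothetical and chosen only for illustration.

```python
import math

def isf_dc_and_rms(isf_samples):
    """Return (dc, rms) of an ISF sampled uniformly over one period."""
    n = len(isf_samples)
    dc = sum(isf_samples) / n
    rms = math.sqrt(sum(g * g for g in isf_samples) / n)
    return dc, rms

def white_noise_phase_noise_dbc(gamma_rms, q_max, in2_per_hz, offset_hz):
    """Assumed form of (21): phase noise in dBc/Hz in the 1/f^2 region."""
    d_omega = 2.0 * math.pi * offset_hz
    ratio = (gamma_rms / q_max) ** 2 * in2_per_hz / (2.0 * d_omega ** 2)
    return 10.0 * math.log10(ratio)

def corner_1f3_hz(device_corner_hz, gamma_dc, gamma_rms):
    """Assumed form of (24): device 1/f corner scaled by (dc/rms)^2."""
    return device_corner_hz * (gamma_dc / gamma_rms) ** 2

# Hypothetical ISF of an ideal sinusoidal oscillator, Gamma(x) = -sin(x).
samples = [-math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
gamma_dc, gamma_rms = isf_dc_and_rms(samples)

# Hypothetical circuit values (not taken from the paper).
q_max = 100e-15       # maximum charge swing, in coulombs
in2_per_hz = 1e-24    # white noise current density, in A^2/Hz

l_1mhz = white_noise_phase_noise_dbc(gamma_rms, q_max, in2_per_hz, 1e6)
l_2mhz = white_noise_phase_noise_dbc(gamma_rms, q_max, in2_per_hz, 2e6)
print(f"Gamma_dc = {gamma_dc:.3g}, Gamma_rms = {gamma_rms:.4f}")
print(f"L(1 MHz) = {l_1mhz:.2f} dBc/Hz, L(2 MHz) = {l_2mhz:.2f} dBc/Hz")
```

Under these assumptions the symmetric sinusoidal ISF has essentially zero dc value, so its predicted $1/f^3$ corner collapses toward zero, and doubling the offset lowers the white-noise phase noise by about 6 dB.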


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 13. Collector voltage and collector current of the Colpitts oscillator of Fig. 5(a).

Fig. 14. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the Colpitts oscillator of Fig. 5(a).

D. Cyclostationary Noise Sources

In addition to the periodically time-varying nature of the system itself, another complication is that the statistical properties of some of the random noise sources in the oscillator may change with time in a periodic manner. These sources are referred to as cyclostationary. For instance, the channel noise of a MOS device in an oscillator is cyclostationary because the noise power is modulated by the gate-source overdrive, which varies with time periodically. There are other noise sources in the circuit whose statistical properties do not depend on time and the operating point of the circuit, and are therefore called stationary. Thermal noise of a resistor is an example of a stationary noise source. A white cyclostationary noise current $i_n(t)$ can be decomposed as [13]:

$$i_n(t) = i_{n0}(t) \cdot \alpha(\omega_0 t) \quad (25)$$

where $i_n(t)$ is a white cyclostationary process, $i_{n0}(t)$ is a white stationary process, and $\alpha(\omega_0 t)$ is a deterministic periodic function describing the noise amplitude modulation. We define $\alpha(\omega_0 t)$ to be a normalized function with a maximum value of 1. This way, $\overline{i_{n0}^2}$ is equal to the maximum of the mean square noise power, $\overline{i_n^2(t)}$, which changes periodically with time. Applying the above expression for $i_n(t)$ to (11), $\phi(t)$ is given by

$$\phi(t) = \int_{-\infty}^{t} i_n(\tau) \frac{\Gamma(\omega_0 \tau)}{q_{max}} \, d\tau = \int_{-\infty}^{t} i_{n0}(\tau) \frac{\alpha(\omega_0 \tau) \Gamma(\omega_0 \tau)}{q_{max}} \, d\tau. \quad (26)$$

As can be seen, the cyclostationary noise can be treated as a stationary noise applied to a system with an effective ISF given by

$$\Gamma_{eff}(x) = \Gamma(x) \cdot \alpha(x) \quad (27)$$

where $\alpha(x)$ can be derived easily from device noise characteristics and operating point. Hence, this effective ISF should be used in all subsequent calculations, in particular, calculation of the coefficients $c_n$.

Note that there is a strong correlation between the cyclostationary noise source and the waveform of the oscillator. The maximum of the noise power always appears at a certain point of the oscillatory waveform, thus the average of the noise may not be a good representation of the noise power. Consider as one example the Colpitts oscillator of Fig. 5(a). The collector voltage and the collector current of the transistor are shown in Fig. 13. Note that the collector current consists of a short period of large current followed by a quiet interval. The surge of current occurs at the minimum of the voltage across the tank, where the ISF is small. Functions $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for this oscillator are shown in Fig. 14. Note that, in this case, $\Gamma_{eff}(x)$ is quite different from $\Gamma(x)$, and hence the effect of cyclostationarity is very significant for the LC oscillator and cannot be neglected.

The situation is different in the case of the ring oscillator of Fig. 5(b), because the devices have maximum current during the transition (when $\Gamma(x)$ is at a maximum, i.e., the sensitivity is large) at the same time the noise power is large. Functions $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the ring oscillator of Fig. 5(b) are shown in Fig. 15. Note that in the case of the ring oscillator $\Gamma(x)$ and $\Gamma_{eff}(x)$ are almost identical. This indicates that the cyclostationary properties of the noise are less important in the treatment of the phase noise of ring oscillators. This unfortunate coincidence is one of the reasons why ring oscillators in general have inferior phase noise performance compared to a Colpitts LC oscillator. The other important reason is that ring oscillators dissipate all the stored energy during one cycle.

E. Predicting Output Phase Noise with Multiple Noise Sources

The method of analysis outlined so far has been used to predict how much phase noise is contributed by a single noise source. However, this method may be extended to multiple noise sources and multiple nodes, as individual contributions by the various noise sources may be combined by exploiting superposition. Superposition holds because the first system of Fig. 8 is linear.
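A minimal sketch of how the effective ISF of Section D and superposition might fit together, for one stationary and one cyclostationary source that are assumed uncorrelated. The sinusoidal ISF, the shape of the modulation function `alpha` (noise concentrated near the voltage peak), the circuit values, and the assumed form of (21) used in `contribution` are all illustrative assumptions rather than data from the paper.

```python
import math

N = 2000
xs = [2.0 * math.pi * k / N for k in range(N)]

def rms(values):
    """Root-mean-square of uniformly spaced samples over one period."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical ISF of an ideal sinusoidal oscillator.
gamma = [-math.sin(x) for x in xs]
# Hypothetical noise modulation, normalized to a maximum value of 1.
alpha = [max(0.0, math.cos(x)) for x in xs]
# Effective ISF for the cyclostationary source: Gamma_eff = Gamma * alpha.
gamma_eff = [g * a for g, a in zip(gamma, alpha)]

gamma_rms = rms(gamma)
gamma_eff_rms = rms(gamma_eff)

def contribution(g_rms, q_max, in2_per_hz, offset_hz):
    """Linear noise-to-carrier ratio per Hz, using the assumed form of (21)."""
    d_omega = 2.0 * math.pi * offset_hz
    return (g_rms / q_max) ** 2 * in2_per_hz / (2.0 * d_omega ** 2)

q_max = 100e-15   # hypothetical charge swing, in coulombs
offset = 1e6      # offset frequency, in Hz

# Stationary source uses the plain ISF; the cyclostationary source uses
# its peak noise density together with the effective ISF.
stationary = contribution(gamma_rms, q_max, 1e-24, offset)
cyclo = contribution(gamma_eff_rms, q_max, 4e-24, offset)

# Uncorrelated contributions add in power.
total_dbc = 10.0 * math.log10(stationary + cyclo)
print(f"Gamma_rms = {gamma_rms:.4f}, Gamma_eff_rms = {gamma_eff_rms:.4f}")
print(f"total = {total_dbc:.2f} dBc/Hz")
```

In this made-up case the cyclostationary source has four times the peak noise density of the stationary one, yet contributes only half as much phase noise, because its noise peaks where the ISF is small.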


The actual method of combining the individual contributions requires attention to any possible correlations that may exist among the noise sources. The complete method for doing so may be appreciated by noting that an oscillator has a current noise source in parallel with each capacitor and a voltage noise source in series with each inductor. The phase noise in the output of such an oscillator is calculated using the following method.

1) Find the equivalent current noise source in parallel with each capacitor and an equivalent voltage source in series with each inductor, keeping track of correlated and noncorrelated portions of the noise sources for use in later steps.
2) Find the transfer characteristic from each source to the output excess phase. This can be done as follows.
   a) Find the ISF for each source, using any of the methods proposed in the Appendix, depending on the required accuracy and simplicity.
   b) Find $\Gamma_{rms}$ and $\Gamma_{dc}$ (rms and dc values) of the ISF.
3) Use $\Gamma_{rms}$ and $\Gamma_{dc}$ coefficients and the power spectrum of the input noise sources in (21) and (23) to find the phase noise power resulting from each source.
4) Sum the individual output phase noise powers for uncorrelated sources and square the sum of phase noise rms values for correlated sources to obtain the total noise power below the carrier.

Note that the amount of phase noise contributed by each noise source depends only on the value of the noise power density $\overline{i_n^2}/\Delta f$, the amount of charge swing $q_{max}$ across the effective capacitor it is injecting into, and the steady-state oscillation waveform across the noise source of interest. This observation is important since it allows us to attribute a definite contribution from every noise source to the overall phase noise. Hence, our treatment is both an analysis and design tool, enabling designers to identify the significant contributors to phase noise.

Fig. 15. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the ring oscillator of Fig. 5(b).

F. Existing Models as Simplified Cases

As asserted earlier, the model proposed here reduces to earlier models if the same simplifying assumptions are made. In particular, consider the model for LC oscillators in [3], as well as the more comprehensive presentation of [8]. Those models assume linear time-invariance, that all noise sources are stationary, that only the noise in the vicinity of $\omega_0$ is important, and that the noise-free waveform is a perfect sinusoid. These assumptions are equivalent to discarding all but the $c_1$ term in the ISF and setting $c_1 = 1$. As a specific example, consider the oscillator of Fig. 2. The phase noise due solely to the tank parallel resistor $R_p$ can be found by applying the following to (19):

$$\frac{\overline{i_n^2}}{\Delta f} = \frac{4kT}{R_p}, \qquad q_{max} = C V_{max} \quad (28)$$

where $R_p$ is the parallel resistor, $C$ is the tank capacitor, and $V_{max}$ is the maximum voltage swing across the tank. Equation (19) reduces to

$$\mathcal{L}\{\Delta\omega\} = 10 \log\left( \frac{kT}{V_{max}^2} \cdot \frac{1}{R_p (C\omega_0)^2} \cdot \left( \frac{\omega_0}{\Delta\omega} \right)^2 \right). \quad (29)$$

Since [8] assumes equal contributions from amplitude and phase portions to $\mathcal{L}\{\Delta\omega\}$, the result obtained in [8] is two times larger than the result of (29). Assuming that the total noise contribution in a parallel tank oscillator can be modeled using an excess noise factor $F$ as in [3], (29) together with (24) result in (6). Note that the generalized approach presented here is capable of calculating the fitting parameters used in (3) ($F$ and $\Delta\omega_{1/f^3}$) in terms of coefficients of the ISF and the device $1/f$ noise corner, $\omega_{1/f}$.

IV. DESIGN IMPLICATIONS

Several design implications emerge from (18), (21), and (24) that offer important insight for reduction of phase noise in oscillators. First, they show that increasing the signal charge displacement $q_{max}$ across the capacitor will reduce the phase noise degradation by a given noise source, as has been noted in previous works [5], [6]. In addition, the noise power around integer multiples of the oscillation frequency has a more significant effect on the close-in phase noise than at other frequencies, because these noise components appear as phase noise sidebands in the vicinity of the oscillation frequency, as described by (18). Since the contributions of these noise components are scaled by the Fourier series coefficients $c_n$ of the ISF, the designer should seek to minimize spurious interference in the vicinity of $n\omega_0$ for values of $n$ such that $c_n$ is large. Criteria for the reduction of phase noise in the $1/f^3$ region are suggested by (24), which shows that the $1/f^3$ corner of the phase noise is proportional to the square of the coefficient $c_0$. Recalling that $c_0$ is twice the dc value of the (effective) ISF function, namely

$$c_0 = \frac{1}{\pi} \int_0^{2\pi} \Gamma(x) \, dx \quad (30)$$

it is clear that it is desirable to minimize the dc value of the ISF. As shown in the Appendix, the value of $c_0$ is closely related to certain symmetry properties of the oscillation

Fig. 16. (a) Waveform and (b) ISF for the asymmetrical node. (c) Waveform and (d) ISF for one of the symmetrical nodes.

Fig. 17. Simulated power spectrum with current injection at $f_m = 50$ MHz for (a) asymmetrical node and (b) symmetrical node.

waveform. One such property concerns the rise and fall times; the ISF will have a large dc value if the rise and fall times of the waveform are significantly different. A limited case of this for odd-symmetric waveforms has been observed [14]. Although odd-symmetric waveforms have small $c_0$ coefficients, the class of waveforms with small $c_0$ is not limited to odd-symmetric waveforms.

To illustrate the effect of a rise and fall time asymmetry, consider a purposeful imbalance of pull-up and pull-down rates in one of the inverters in the ring oscillator of Fig. 5(b). This is obtained by halving the channel width $W_n$ of the NMOS device and doubling the width $W_p$ of the PMOS device of one inverter in the ring. The output waveform and corresponding ISF are shown in Fig. 16(a) and (b). As can be seen, the ISF has a large dc value. For comparison, the waveform and ISF at the output of a symmetrical inverter elsewhere in the ring are shown in Fig. 16(c) and (d). From these results, it can be inferred that the close-in phase noise due to low-frequency noise sources should be smaller for the symmetrical output than for the asymmetrical one.

To investigate this assertion, the results of two SPICE simulations are shown in Fig. 17. In the first simulation, a sinusoidal current source of amplitude 10 μA at $f_m = 50$ MHz is applied to one of the symmetric nodes of the oscillator. In the second simulation, the same source is applied to the asymmetric node. As can be seen from the power spectra of the figure, noise injected into the asymmetric node results in sidebands that are 12 dB larger than at the symmetric node.

Note that (30) suggests that upconversion of low frequency noise can be significantly reduced, perhaps even eliminated, by minimizing $c_0$, at least in principle. Since $c_0$ depends on the waveform, this observation implies that a proper choice of waveform may yield significant improvements in close-in phase noise. The following experiment explores this concept by changing the ratio of $W_p$ to $W_n$ over some range, while injecting 10 μA of sinusoidal current at 100 MHz into one node. The sideband power below carrier as a function of the $W_p$ to $W_n$ ratio is shown in Fig. 18. The SPICE-simulated sideband power is shown with plus symbols and the sideband power as predicted by (18) is shown by the solid line. As can be seen, close-in phase noise due to upconversion of low-frequency noise can be suppressed by an arbitrary factor, at least in principle. It is important to note, however, that the minimum does not necessarily correspond to equal transconductance ratios, since other waveform properties influence the value of $c_0$. In fact, the optimum $W_p$ to $W_n$ ratio in this particular example is seen to differ considerably from that used in conventional ring oscillator designs.

The importance of symmetry might lead one to conclude that differential signaling would minimize $c_0$. Unfortunately, while differential circuits are certainly symmetrical with respect to the desired signals, the differential symmetry disappears for the individual noise sources because they are independent of each other. Hence, it is the symmetry of each half-circuit that is important, as is demonstrated in the differential ring oscillator of Fig. 19. A sinusoidal current of 100 μA at 50 MHz injected at the drain node of one of the buffer stages results in two equal sidebands, 46 dB below carrier, in the power spectrum of the differential output. Because of the voltage dependent conductance of the load devices, the individual waveform on each output node is not fully symmetrical and consequently, there will be a large

Fig. 18. Simulated and predicted sideband power for low frequency injection versus PMOS to NMOS W/L ratio.

Fig. 19. Four-stage differential ring oscillator.

upconversion of noise to close-in phase noise, even though differential signaling is used. Since the asymmetry is due to the voltage dependent conductance of the load, reduction of the upconversion might be achieved through the use of a perfectly linear resistive load, because the rising and falling behavior is then governed by an RC time constant, which makes the individual waveforms more symmetrical. It was first observed in the context of supply noise rejection [15], [16] that using more linear loads can reduce the effect of supply noise on timing jitter. Our treatment shows that it also reduces the upconversion of low-frequency noise into phase noise.

Another symmetry-related property is duty cycle. Since the ISF is waveform-dependent, the duty cycle of a waveform is linked to the duty cycle of the ISF. Non-50% duty cycles generally result in larger $c_n$ for even $n$. The high-$Q$ tank of an LC oscillator is helpful in this context, since a high $Q$ will produce a more symmetric waveform and hence reduce the upconversion of low-frequency noise.

V. EXPERIMENTAL RESULTS

This section presents experimental verifications of the model to supplement simulation results. The first experiment examines the linearity of current-to-phase conversion using a five-stage, 5.4-MHz ring oscillator constructed with ordinary CMOS inverters. A sinusoidal current is injected at frequencies $f_m = 100$ kHz, $f_0 + f_m = 5.5$ MHz, $2f_0 + f_m = 10.9$ MHz, and $3f_0 + f_m = 16.3$ MHz, and the sideband powers at $f_0 \pm f_m$ are measured as the magnitude of the injected current is varied. At any amplitude of injected current, the sidebands are equal in amplitude to within the accuracy of the measurement setup (0.2 dB), in complete accordance with the theory. These sideband powers are plotted versus the input injected current in Fig. 20. As can be seen, the transfer function for the input current power to the output sideband power is linear as suggested by (18). The slope of the best fit line is 19.8 dB/decade, which is very close to the predicted slope of 20 dB/decade, since excess phase $\phi$ is proportional to $i_n$, and hence the sideband power is proportional to $i_n^2$, leading to a 20-dB/decade slope. The behavior shown in Fig. 20 verifies that the linearity of (18) holds for injected input currents orders of magnitude larger than typical noise currents.

Fig. 20. Measured sideband power versus injected current at $f_m = 100$ kHz, $f_0 + f_m = 5.5$ MHz, $2f_0 + f_m = 10.9$ MHz, $3f_0 + f_m = 16.3$ MHz.

The second experiment varies the frequency offset from an integer multiple of the oscillation frequency. An input sinusoidal current source of 20 μA (rms) at $f_m$, $f_0 + f_m$, $2f_0 + f_m$, and $3f_0 + f_m$ is applied to one node and the output is measured at another node. The sideband power is plotted versus $f_m$ in Fig. 21. Note that the slope in all four cases is −20 dB/decade, again in complete accordance with (18).

The third experiment aims at verifying the effect of the coefficients $c_n$ on the sideband power. One of the predictions of the theory is that $c_0$ is responsible for the upconversion of low frequency noise. As mentioned before, $c_0$ is a strong function of waveform symmetry at the node into which the current is injected. Noise injected into a node with an asymmetric waveform (created by making one inverter asymmetric in a ring oscillator) would result in a greater increase in sideband power than injection into nodes with more symmetric waveforms. Fig. 22 shows the results of an experiment performed on a five-stage ring oscillator in which one of the stages is modified to have an extra pulldown

NMOS device. A current of 20 μA (rms) is injected into this asymmetric node with and without the extra pulldown device. For comparison, this experiment is repeated for a symmetric node of the oscillator, before and after this modification. Note that the sideband power is 7 dB larger when noise is injected into the node with the asymmetrical waveform, while the sidebands due to signal injection at the symmetric nodes are essentially unchanged with the modification.

Fig. 21. Measured sideband power versus $f_m$, for injections in vicinity of multiples of $f_0$.

Fig. 22. Power of the sidebands caused by low frequency injection into symmetric and asymmetric nodes of the ring oscillator.

The fourth experiment compares the prediction and measurement of the phase noise for a five-stage single-ended ring oscillator implemented in a 2-μm, 5-V CMOS process running at $f_0 = 232$ MHz. This measurement was performed using a delay-based measurement method and the result is shown in Fig. 23. Distinct $1/f^2$ and $1/f^3$ regions are observed. We first start with a calculation for the $1/f^2$ region. For this process we have a gate oxide thickness of … nm and threshold voltages of … V and … V. All five inverters are similar, with device sizes of … μm/… μm and … μm/… μm, and a lateral diffusion of … μm. Using the process and geometry information, the total capacitance on each node, including parasitics, is calculated to be … fF. Therefore, $q_{max} = $ … fC. As discussed in the previous section, noise current injected during a transition has the largest effect. The current noise power at this point is the sum of the current noise powers due to NMOS and PMOS devices. At this bias point, the two contributions are … A²/Hz and … A²/Hz. Using the methods outlined in the Appendix, it may be shown that … for ring oscillators. Equation (21) for identical noise sources then predicts …. At an offset of … kHz, this equation predicts … dBc/Hz, in good agreement with a measurement of −114.5 dBc/Hz. To predict the phase noise in the $1/f^3$ region, it is enough to calculate the $1/f^3$ corner. Measurements on an isolated inverter on the same die show a $1/f$ noise corner frequency of 250 kHz, when its input and output are shorted. The $(\Gamma_{dc}/\Gamma_{rms})^2$ ratio is calculated to be 0.3, which predicts a $1/f^3$ corner of 75 kHz, compared to the measured corner of 80 kHz.

Fig. 23. Phase noise measurements for a five-stage single-ended CMOS ring oscillator. $f_0 = 232$ MHz, 2-μm process technology.

The fifth experiment measures the phase noise of an 11-stage ring, running at $f_0 = 115$ MHz, implemented on the same die as the previous experiment. The phase noise measurements are shown in Fig. 24. For the inverters in this oscillator, the device sizes are … μm/… μm and … μm/… μm, which results in a total capacitance of 43.5 fF and $q_{max} = $ … fC. The phase noise is calculated in exactly the same manner as the previous experiment and is calculated to be …, or −122.1 dBc/Hz at a 500-kHz offset. The measured phase noise is −122.5 dBc/Hz, again in good agreement with predictions. The $(\Gamma_{dc}/\Gamma_{rms})^2$ ratio is calculated to be 0.17, which predicts a $1/f^3$ corner of 43 kHz, while the measured corner is 45 kHz.

The sixth experiment investigates the effect of symmetry on $1/f^3$ region behavior. It involves a seven-stage current-starved, single-ended ring oscillator in which each inverter stage consists of an additional NMOS and PMOS device in series. The gate drives of the added transistors allow independent control of the rise and fall times. Fig. 25 shows the phase noise when the control voltages are adjusted to achieve symmetry versus when they are not. In both cases the control voltages are adjusted to keep the oscillation frequency

constant at 60 MHz. As can be seen, making the waveform more symmetric has a large effect on the phase noise in the $1/f^3$ region without significantly affecting the $1/f^2$ region. Another experiment on the same circuit is shown in Fig. 26, which shows the phase noise power spectrum at a 10 kHz offset versus the symmetry-controlling voltage. For all the data points, the control voltages are adjusted to keep the oscillation frequency at 50 MHz. As can be seen, the phase noise reaches a minimum by adjusting the symmetry properties of the waveform. This reduction is limited by the phase noise in the $1/f^2$ region and the mismatch in transistors in different stages, which are controlled by the same control voltages.

Fig. 24. Phase noise measurements for an 11-stage single-ended CMOS ring oscillator. $f_0 = 115$ MHz, 2-μm process technology.

Fig. 25. Effect of symmetry in a seven-stage current-starved single-ended CMOS VCO. $f_0 = 60$ MHz, 2-μm process technology.

Fig. 26. Sideband power versus the voltage controlling the symmetry of the waveform. Seven-stage current-starved single-ended CMOS VCO. $f_0 = 50$ MHz, 2-μm process technology.

The seventh experiment is performed on a four-stage differential ring oscillator, with PMOS loads and NMOS differential stages, implemented in a 0.5-μm CMOS process. Each stage is tapped with an equal-sized buffer. The tail current source has a quiescent current of 108 μA. The total capacitance on each of the differential nodes is calculated to be … fF and the voltage swing is … V, which results in $q_{max} = $ … fC. The total channel noise current on each node is … A²/Hz. Using these numbers, the phase noise in the $1/f^2$ region is predicted to be …, or −103.2 dBc/Hz at an offset of 1 MHz, while the measurement in Fig. 27 shows a phase noise of −103.9 dBc/Hz, again in agreement with prediction. Also note that despite differential symmetry, there is a distinct $1/f^3$ region in the phase noise spectrum, because each half circuit is not symmetrical.

Fig. 27. Phase noise measurements for a four-stage differential CMOS ring oscillator. $f_0 = 200$ MHz, 0.5-μm process technology.

The eighth experiment investigates cyclostationary effects in the bipolar Colpitts oscillator of Fig. 5(a), where the conduction angle is varied by changing the capacitive divider ratio $n = C_1/(C_1 + C_2)$ while keeping the effective parallel capacitance constant to maintain an $f_0$ of 100 MHz. As can be seen in Fig. 28, increasing $n$ decreases the conduction angle, and thereby reduces the effective $\Gamma_{rms}$, leading to an initial decrease in phase noise. However, the oscillation amplitude is approximately given by …, and therefore decreases for large values of $n$. The phase noise ultimately increases for large $n$ as a consequence. There is thus a definite value of $n$ (here, about 0.2) that minimizes the phase noise. This result provides a theoretical basis for the common rule-of-thumb that one should use $C_2/C_1$ ratios of about four (corresponding to $n = 0.2$) in Colpitts oscillators [17].

Fig. 28. Sideband power versus capacitive division ratio. Bipolar LC Colpitts oscillator, $f_0 = 100$ MHz.

Fig. 29. State-space trajectory of an $n$th-order oscillator.

VI. CONCLUSION

This paper has presented a model for phase noise which explains quantitatively the mechanism by which noise sources of all types convert to phase noise. The power of the model derives from its explicit recognition of practical oscillators as time-varying systems. Characterizing an oscillator with the ISF allows a complete description of the noise sensitivity of an oscillator and also allows a natural accommodation of cyclostationary noise sources. This approach shows that noise located near integer multiples of the oscillation frequency contributes to the total phase noise. The model specifies the contribution of those noise components in terms of waveform properties and circuit parameters, and therefore provides important design insight by identifying and quantifying the major sources of phase noise degradation. In particular, it shows that symmetry properties of the oscillator waveform have a significant effect on the upconversion of low frequency noise and, hence, the $1/f^3$ corner of the phase noise can be significantly lower than the $1/f$ device noise corner. This observation is particularly important for MOS devices, whose inferior $1/f$ noise has been thought to preclude their use in high-performance oscillators.

APPENDIX
CALCULATION OF THE IMPULSE SENSITIVITY FUNCTION

In this Appendix we present three different methods to calculate the ISF. The first method is based on direct measurement of the impulse response and calculating $\Gamma(x)$ from it. The second method is based on an analytical state-space approach to find the excess phase change caused by an impulse of current from the oscillation waveforms. The third method is an easy-to-use approximate method.
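The idea behind the first (direct) method can be sketched on a toy system. The example below is not the SPICE procedure used in the paper: it integrates a hypothetical two-state oscillator with unit amplitude and $\omega_0 = 1$ (a normal-form limit-cycle model chosen here only because its behavior is easy to check), applies a small step `dq` to one state at a chosen phase, lets the amplitude settle for a few cycles, and reads off the residual phase shift. All names, the oscillator model, and the step sizes are assumptions made for illustration.

```python
import math

def step_rk4(x, y, dt):
    """One RK4 step of the toy oscillator
    dx/dt = x(1 - r^2) - y,  dy/dt = y(1 - r^2) + x,  r^2 = x^2 + y^2."""
    def f(px, py):
        g = 1.0 - (px * px + py * py)
        return px * g - py, py * g + px
    k1x, k1y = f(x, y)
    k2x, k2y = f(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = f(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = f(x + dt * k3x, y + dt * k3y)
    x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x, y

def run(x, y, cycles=5, dt=0.01):
    """Integrate for a few periods so amplitude perturbations die away."""
    steps = int(round(cycles * 2.0 * math.pi / dt))
    for _ in range(steps):
        x, y = step_rk4(x, y, dt)
    return x, y

def isf_direct(phase, dq=1e-3, cycles=5):
    """Estimate the ISF at one injection phase from the settled phase shift."""
    x0, y0 = math.cos(phase), math.sin(phase)   # point on the limit cycle
    xr, yr = run(x0, y0, cycles)                # unperturbed reference
    xp, yp = run(x0 + dq, y0, cycles)           # impulse shifts state x by dq
    dphi = math.atan2(yp, xp) - math.atan2(yr, xr)
    dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return dphi / dq                            # normalized by dq / amplitude

# Sweep the injection phase across one cycle of the waveform.
phases = [2.0 * math.pi * k / 8 for k in range(8)]
for p in phases:
    print(f"phase = {p:5.3f} rad, ISF = {isf_direct(p):+.4f}")
```

For this toy model, whose waveform on the perturbed state is $\cos(x)$, the swept values should trace out approximately $-\sin(x)$: close to zero at the waveform peaks and largest in magnitude at the zero crossings.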

A. Direct Measurement of Impulse Response

In this method, an impulse is injected at different relative phases of the oscillation waveform and the oscillator is simulated for a few cycles afterwards. By sweeping the impulse injection time across one cycle of the waveform and measuring the resulting time shift $\Delta t$, $h_\phi(t, \tau)$ can be calculated noting that $\Delta\phi = 2\pi \Delta t / T$, where $T$ is the period of oscillation. Fortunately, many implementations of SPICE have an internal feature to perform the sweep automatically. Since for each impulse one needs to simulate the oscillator for only a few cycles, the simulation executes rapidly. Once $h_\phi(t, \tau)$ is found, the ISF is calculated by multiplication with $q_{max}$. This method is the most accurate of the three methods presented.

B. Closed-Form Formula for the ISF

An $n$th-order system can be represented by its trajectory in an $n$-dimensional state-space. In the case of a stable oscillator, the state of the system, represented by the state vector, $\vec{X}$, periodically traverses a closed trajectory, as shown in Fig. 29. Note that the oscillator does not necessarily traverse the limit cycle with a constant velocity.

In the most general case, the effect of a group of external impulses can be viewed as a perturbation vector $\Delta\vec{X}$ which suddenly changes the state of the system to $\vec{X} + \Delta\vec{X}$. As discussed earlier, amplitude variations eventually die away, but phase variations do not. Application of the perturbation impulse causes a certain change in phase in either a negative or positive direction, depending on the state-vector and the direction of the perturbation. To calculate the equivalent time shift, we first find the projection of the perturbation vector on a unity vector in the direction of motion, i.e., the normalized velocity vector

$$l = \Delta\vec{X} \cdot \frac{\dot{\vec{X}}}{|\dot{\vec{X}}|} \quad (31)$$

where $l$ is the equivalent displacement along the trajectory, and $\dot{\vec{X}}$ is the first derivative of the state vector. Note the scalar nature of $l$, which arises from the projection operation. The equivalent time shift is given by the displacement divided by the "speed"

$$\Delta t = \frac{l}{|\dot{\vec{X}}|} = \frac{\Delta\vec{X} \cdot \dot{\vec{X}}}{|\dot{\vec{X}}|^2} \quad (32)$$

which results in the following equation for excess phase caused by the perturbation:

$$\Delta\phi = \omega_0 \frac{\Delta\vec{X} \cdot \dot{\vec{X}}}{|\dot{\vec{X}}|^2}. \quad (33)$$

In the specific case where the state variables are node voltages, and an impulse is applied to the $i$th node, there will be a change in $v_i$ given by (10). Equation (33) then reduces to

$$\Delta\phi_i = \omega_0 \frac{\Delta q}{C_i} \cdot \frac{\dot{v}_i}{|\dot{\vec{X}}|^2} \quad (34)$$

where $|\dot{\vec{X}}|$ is the norm of the first derivative of the waveform vector and $\dot{v}_i$ is the derivative of the $i$th node voltage. Equation (34), together with the normalized waveform function $f$ defined in (1), result in the following:

$$\Delta\phi_i = \frac{\Delta q}{q_{max}} \cdot \frac{f_i'}{\sum_{j=1}^{n} f_j'^2} \quad (35)$$

where $f_i'$ represents the derivative of the normalized waveform on node $i$, hence

$$\Gamma_i(x) = \frac{f_i'(x)}{\sum_{j=1}^{n} f_j'^2(x)}. \quad (36)$$

It can be seen that this expression for the ISF is maximum during transitions (i.e., when the derivative of the waveform function is maximum), and this maximum value is inversely proportional to the maximum derivative. Hence, waveforms with larger slope show a smaller peak in the ISF function.

In the special case of a second-order system, one can use the normalized waveform $f$ and its derivative as the state variables, resulting in the following expression for the ISF:

$$\Gamma(x) = \frac{f'}{f'^2 + f''^2} \quad (37)$$

where $f''$ represents the second derivative of the function $f$. In the case of an ideal sinusoidal oscillator $f = \cos(x)$, so that $\Gamma(x) = -\sin(x)$, which is consistent with the argument of Section III. This method has the attribute that it computes the ISF from the waveform directly, so that simulation over only one cycle of $f$ is required to obtain all of the necessary information.

C. Calculation of ISF Based on the First Derivative

This method is actually a simplified version of the second approach. In certain cases, the denominator of (36) shows little variation, and can be approximated by a constant. In such a case, the ISF is simply proportional to the derivative of the waveform. A specific example is a ring oscillator with $N$ identical stages. The denominator may then be approximated by

$$\sum_{j=1}^{N} f_j'^2 \approx f_{max}'^2. \quad (38)$$

Fig. 30 shows the results obtained from this method compared with the more accurate results obtained from methods A and B. Although this method is approximate, it is the easiest to use and allows a designer to rapidly develop important insights into the behavior of an oscillator.

Fig. 30. ISF's obtained from different methods.

ACKNOWLEDGMENT

The authors would like to thank T. Ahrens, R. Betancourt, R. Farjad-Rad, M. Heshami, S. Mohan, H. Rategh, H. Samavati, D. Shaeffer, A. Shahani, K. Yu, and M. Zargari of Stanford University and Prof. B. Razavi of UCLA for helpful discussions. The authors would also like to thank M. Zargari, R. Betancourt, B. Amruturand, J. Leung, J. Shott, and Stanford Nanofabrication Facility for providing several test chips. They are also grateful to Rockwell Semiconductor for providing access to their phase noise measurement system.

REFERENCES

[1] E. J. Baghdady, R. N. Lincoln, and B. D. Nelin, “Short-term frequency stability: Characterization, theory, and measurement,” Proc. IEEE, vol. 53, pp. 704–722, July 1965.
[2] L. S. Cutler and C. L. Searle, “Some aspects of the theory and measurement of frequency fluctuations in frequency standards,” Proc. IEEE, vol. 54, pp. 136–154, Feb. 1966.
[3] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966.
[4] J. Rutman, “Characterization of phase and frequency instabilities in precision frequency sources; Fifteen years of progress,” Proc. IEEE, vol. 66, pp. 1048–1174, Sept. 1978.
[5] A. A. Abidi and R. G. Meyer, “Noise in relaxation oscillators,” IEEE J. Solid-State Circuits, vol. SC-18, pp. 794–802, Dec. 1983.
[6] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. ISCAS, June 1994, vol. 4, pp. 27–30.
[7] J. McNeil, “Jitter in ring oscillators,” in Proc. ISCAS, June 1994, vol. 6, pp. 201–204.
[8] J. Craninckx and M. Steyaert, “Low-noise voltage controlled oscillators using enhanced LC-tanks,” IEEE Trans. Circuits Syst.–II, vol. 42, pp. 794–904, Dec. 1995.
[9] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[10] B. van der Pol, “The nonlinear theory of electric oscillations,” Proc. IRE, vol. 22, pp. 1051–1086, Sept. 1934.
[11] N. Minorsky, Nonlinear Oscillations. Princeton, NJ: Van Nostrand, 1962.
[12] P. A. Cook, Nonlinear Dynamical Systems. New York: Prentice Hall, 1994.
[13] W. A. Gardner, Cyclostationarity in Communications and Signal Processing. New York: IEEE Press, 1993.
[14] H. B. Chen, A. van der Ziel, and K. Amberiadis, “Oscillator with odd-symmetrical characteristics eliminates low-frequency noise sidebands,” IEEE Trans. Circuits Syst., vol. CAS-31, Sept. 1984.
[15] J. G. Maneatis, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[16] C. K. Yang, R. Farjad-Rad, and M. Horowitz, “A 0.6-μm CMOS 4Gb/s transceiver with data recovery using oversampling,” in Symp. VLSI Circuits, Dig. Tech. Papers, June 1997.
[17] D. DeMaw, Practical RF Design Manual. Englewood Cliffs, NJ: Prentice-Hall, 1982, p. 46.

Ali Hajimiri (S’95) was born in Mashad, Iran, in 1972. He received the B.S. degree in electronics engineering from Sharif University of Technology in 1994 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1996, where he is currently engaged in research toward the Ph.D. degree in electrical engineering. He worked as a Design Engineer for Philips on a BiCMOS chipset for the GSM cellular units from 1993 to 1994. During the summer of 1995, he worked for Sun Microsystems, Sunnyvale, CA, on the UltraSparc microprocessor’s cache RAM design methodology. Over the summer of 1997, he worked at Lucent Technologies (Bell-Labs), where he investigated low phase noise integrated oscillators. He holds one European and two U.S. patents. Mr. Hajimiri is the Bronze medal winner of the 21st International Physics Olympiad, Groningen, Netherlands.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Thomas H. Lee (M’83) received the S.B., S.M., and Sc.D. degrees from the Massachusetts Institute of Technology (MIT), Cambridge, in 1983, 1985, and 1990, respectively. He worked for Analog Devices Semiconductor, Wilmington, MA, until 1992, where he designed high-speed clock-recovery PLL’s that exhibit zero jitter peaking. He then worked for Rambus Inc., Mountain View, CA, where he designed the phase- and delay-locked loops for 500 MB/s DRAM’s. In 1994, he joined the faculty of Stanford University, Stanford, CA, as an Assistant Professor, where he is primarily engaged in research into microwave applications for silicon IC technology, with a focus on CMOS IC’s for wireless communications. Dr. Lee was recently named a recipient of a Packard Foundation Fellowship award and is the author of The Design of CMOS Radio-Frequency Integrated Circuits (Cambridge University Press). He has twice received the “Best Paper” award at ISSCC.


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

Transactions Briefs

A Study of Oscillator Jitter Due to Supply and Substrate Noise

Frank Herzel and Behzad Razavi


Abstract—This paper investigates the timing jitter of single-ended and differential CMOS ring oscillators due to supply and substrate noise. We calculate the jitter resulting from supply and substrate noise, show that the concept of frequency modulation can be applied, and derive relationships that express different types of jitter in terms of the sensitivity of the oscillation frequency to the supply or substrate voltage. Using examples based on measured results, we show that thermal jitter is typically negligible compared to supply- and substrate-induced jitter in high-speed digital systems. We also discuss the dependence of the jitter of differential CMOS ring oscillators on transistor gate width, power consumption, and the number of stages. Index Terms—Jitter, oscillator, phase-locked loops, supply noise.

Fig. 1. Single-ended ring oscillator: (a) block diagram and (b) implementation of one stage.

I. INTRODUCTION

High-speed digital circuits such as microprocessors and memories employ phase locking at the board-chip interface to suppress timing skews between the on-chip clock and the system clock [1]–[3]. Fabricated on the same substrate as the rest of the circuit, the phase-locked loop (PLL) must typically operate from the global supply and ground busses, thus experiencing both substrate and supply noise. The noise manifests itself as jitter at the output of the PLL, primarily through various mechanisms in the voltage-controlled oscillator (VCO). As exemplified by measured results reported in the literature, we show that the contribution of device electronic noise to jitter is typically much less than that due to supply and substrate noise.

This paper describes the effect of supply and substrate noise on the performance of single-ended and differential ring oscillators, providing insights that prove useful in the design of other types of oscillators as well. Section II summarizes the oscillators studied in this work and Section III defines various types of jitter. Sections IV and V quantify the jitter due to thermal noise in the oscillation loop and frequency-modulating noise, respectively. Sections VI and VII apply the developed results to the analysis of supply and substrate noise, and Section VIII presents the dependence of jitter upon parameters such as device size, the number of stages, and power dissipation.

II. RING OSCILLATORS UNDER INVESTIGATION

In this paper, we investigate both single-ended ring oscillators (SERO’s) and differential ring oscillators (DRO’s). The latter are much more important in digital circuit applications, since DRO’s are less affected by supply and substrate noise. The circuit topologies are shown in Fig. 1 for the SERO and in Fig. 2 for the DRO.

Manuscript received October 1, 1997; revised August 2, 1998. This paper was recommended by Associate Editor B. H. Leung. F. Herzel was with the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA 90095, USA, on leave from the Institute for Semiconductor Physics, Frankfurt, Oder, Germany. B. Razavi is with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA. Publisher Item Identifier S 1057-7130(99)01471-8.

Fig. 2. Differential ring oscillator: (a) block diagram and (b) implementation of one stage.

The simulations were performed with the SPICE parameters of a 0.6-μm CMOS technology. We employed the minimum gate length throughout the paper. Furthermore, unless indicated otherwise, we use the following parameters for the differential stage: $W = 80$ μm, $R_L = 1$ kΩ, $I_{SS} = 1$ mA, $C_L = 0$, and $V_{DD} = 3$ V. The rms value of $\Delta V_{DD}$ was chosen to be 71 mV, corresponding to a peak amplitude of 100 mV for a sinusoidal perturbation.

III. DEFINITIONS OF JITTER

We consider the output voltage $V_{\mathrm{out}}(t)$ of an oscillator in the steady state. The time point of the $n$th minus-to-plus zero crossing of $V_{\mathrm{out}}(t)$ is referred to as $t_n$. The $n$th period is then defined as $T_n = t_{n+1} - t_n$. For an ideal oscillator, this time difference is independent of $n$, but in reality it varies with $n$ as a result of noise in the circuit. This results in a deviation $\Delta T_n = T_n - T$ from the mean period $T$. The quantity $\Delta T_n$ is an indication of jitter.

More specifically, absolute jitter or long-term jitter

$$\Delta T_{\mathrm{abs}}(N) = \sum_{n=1}^{N} \Delta T_n \tag{1}$$

is often used to quantify the jitter of phase-locked loops. Modeling the total phase error with respect to an ideal oscillator [Fig. 3(a)], absolute jitter is nonetheless ill-suited to describing the performance of oscillators because, as shown later, the variance of $\Delta T_{\mathrm{abs}}$ diverges with time. A better figure of merit for oscillators is cycle jitter, defined as the rms value of the timing error $\Delta T_n$¹

$$\Delta T_c = \lim_{N\to\infty} \sqrt{\frac{1}{N}\sum_{n=1}^{N} \Delta T_n^2}. \tag{2}$$

Cycle jitter describes the magnitude of the period fluctuations, but it contains no information about the dynamics. The third type of jitter considered here is cycle-to-cycle jitter [Fig. 3(b)] given by

$$\Delta T_{cc} = \lim_{N\to\infty} \sqrt{\frac{1}{N}\sum_{n=1}^{N} (T_{n+1} - T_n)^2} \tag{3}$$

representing the rms difference between two consecutive periods. Note the difference between the cycle jitter and the cycle-to-cycle jitter: the former compares the oscillation period with the mean period and the latter compares the period with the preceding period. Hence, in contrast to cycle jitter, cycle-to-cycle jitter describes the short-term dynamics of the period. The long-term dynamics, on the other hand, are not characterized by cycle-to-cycle jitter. For example, if $1/f$ noise modulates the frequency slowly, $\Delta T_{cc}$ does not reflect the result accurately. With respect to the zero crossings, the cycle-to-cycle jitter is a double-differential quantity in that three zero crossings of the output voltage are related to each other. As discussed in Section V, this results in a completely different dependence on the modulation frequency than for the cycle jitter.

Fig. 3. Illustration of (a) long-term jitter and (b) cycle-to-cycle jitter.

We should note that an oscillator embedded in a phase-locked loop periodically receives correction pulses from the phase detector and charge pump, and hence its long-term jitter strongly depends on the PLL dynamics. Thus, for the analysis of a free-running oscillator, cycle jitter and cycle-to-cycle jitter are more meaningful, particularly because the latter type hardly changes when the oscillator is placed in the loop.

A more general quantification of the jitter is possible by means of the steady-state autocorrelation function (ACF) defined as

$$C_{\Delta T}(m) = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \Delta T_{n+m}\,\Delta T_n. \tag{4}$$

To obtain an intuitive understanding of this quantity, we insert (4) with $m = 0$ in (2), obtaining

$$\Delta T_c^2 = C_{\Delta T}(0). \tag{5}$$

Equation (5) states that the ACF with zero argument is the squared cycle jitter. For a nonzero argument, the ACF decreases with increasing $m$, finally approaching zero for $m \to \infty$. This indicates that the timing error $\Delta T_n$ has a finite memory. In order to express the cycle-to-cycle jitter by the ACF, we rewrite (3) as

$$\Delta T_{cc}^2 = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} (\Delta T_{n+1} - \Delta T_n)^2 = 2C_{\Delta T}(0) - 2C_{\Delta T}(1). \tag{6}$$

This expression will be used for an analytical calculation of the cycle-to-cycle jitter in Section V.

¹ In this paper, we use a time average definition of jitter which is equivalent to the stochastic average if and only if the process $\Delta T_n^2$ is ergodic.

IV. JITTER DUE TO DEVICE ELECTRONIC NOISE

The electronic noise of the devices in an oscillator loop leads to phase noise and jitter [5], [7], [8]. Our objective is to express jitter in terms of phase noise and vice versa. These relationships are useful as they relate two measurable quantities. In this paper, we neglect the effect of $1/f$ noise because it introduces only slow phase variations in the oscillator. Such variations are suppressed by the large loop bandwidth of PLL’s used in today’s digital systems.

As derived in the Appendix, for white noise sources in the oscillator, the single-sideband phase noise $S_\phi$ (phase noise with respect to the carrier) can be expressed in terms of the cycle-to-cycle jitter according to

$$S_\phi(\omega) = \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega-\omega_0)^2 + (\omega_0^3/8\pi)^2\,\Delta T_{cc}^4} \approx \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega-\omega_0)^2} \tag{7}$$

where $\omega_0$ is the oscillation frequency and $\omega - \omega_0$ is the offset frequency. The Appendix also shows that the cycle-to-cycle jitter can be deduced from the phase noise according to

$$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\, S_\phi(\omega)\,(\omega-\omega_0)^2. \tag{8}$$

To obtain an estimate of the thermal jitter, we consider the differential CMOS ring oscillator in [5]. For the 2.2-GHz oscillator with a phase noise of −94 dBc/Hz at 1 MHz offset, we obtain from (8) a thermal cycle-to-cycle jitter of 0.3 ps, i.e., less than 0.3°. Similar values are obtained for the 900-MHz CMOS ring oscillators reported in [6]. In most timing applications, such small values are negligible with respect to other sources of random jitter.

The thermal absolute jitter is proportional to the square root of the measurement interval $\Delta t$. As derived in the Appendix, the absolute jitter is given by

$$\Delta T_{\mathrm{abs}} = \sqrt{\frac{f_0}{2}}\;\Delta T_{cc}\,\sqrt{\Delta t}. \tag{9}$$

In [9], the rms value of absolute jitter has been divided by the square root of the measurement time to obtain a time-independent figure of merit. This is not possible for supply and substrate noise, since (7)–(9) are derived for white noise in the feedback loop. Supply and substrate noise, however, are generally not white.
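The thermal-jitter estimate quoted above can be checked numerically. The following is a minimal Python sketch, assuming the large-offset relation between cycle-to-cycle jitter and single-sideband phase noise, $\Delta T_{cc}^2 \approx (4\pi/\omega_0^3)\,S_\phi(\omega)(\omega-\omega_0)^2$, and the square-root growth of absolute jitter, $\Delta T_{\mathrm{abs}} = \sqrt{f_0/2}\,\Delta T_{cc}\sqrt{\Delta t}$. The function names and the 1-μs measurement interval are illustrative choices, not part of the paper.

```python
import math


def cycle_to_cycle_jitter(f0_hz, phase_noise_dbc_hz, offset_hz):
    """RMS cycle-to-cycle jitter (s) from SSB phase noise, white-noise case."""
    omega0 = 2.0 * math.pi * f0_hz
    delta_omega = 2.0 * math.pi * offset_hz
    s_phi = 10.0 ** (phase_noise_dbc_hz / 10.0)  # dBc/Hz -> linear ratio per Hz
    return math.sqrt(4.0 * math.pi / omega0 ** 3 * s_phi * delta_omega ** 2)


def absolute_jitter(f0_hz, t_cc_s, interval_s):
    """RMS absolute jitter (s) accumulated over interval_s, white-noise case."""
    return math.sqrt(f0_hz / 2.0) * t_cc_s * math.sqrt(interval_s)


# Values quoted in the text: 2.2-GHz oscillator, -94 dBc/Hz at 1-MHz offset.
t_cc = cycle_to_cycle_jitter(2.2e9, -94.0, 1.0e6)
print(f"cycle-to-cycle jitter: {t_cc * 1e12:.2f} ps")

# Hypothetical 1-us measurement interval, chosen only for illustration.
t_abs = absolute_jitter(2.2e9, t_cc, 1.0e-6)
print(f"absolute jitter after 1 us: {t_abs * 1e12:.2f} ps")
```

With the quoted numbers the first result comes out slightly below 0.3 ps, in line with the rounded figure given in the text.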


V. JITTER OF A FREQUENCY-MODULATED OSCILLATOR

The frequency of an oscillator generally depends on the supply and substrate voltage. The variation of the oscillation frequency with a voltage may be described by a sensitivity function, also called the gain of the VCO and denoted by $K_{\mathrm{VCO}}$. For example, as shown in Fig. 4, the drain junction capacitance of $M_1$ and $M_2$ varies with $V_{DD}$ and $V_{\mathrm{Sub}}$, thus modulating the frequency of the ring oscillator. In some cases, $K_{\mathrm{VCO}}$ itself may be a function of the modulating frequency. In Fig. 4, for example, high-frequency supply noise results in fast changes in $V_P$ and hence substantial displacement current through $C_S$ (the capacitance contributed by $M_1$, $M_2$, and the current source). Note that, in general, the frequency of an oscillator depends on various bias and supply voltages, as conceptually illustrated in Fig. 5. An oscillator subject to supply and substrate noise may be considered as a VCO with different “control” voltages each having a different sensitivity.

Fig. 4. Illustration of frequency modulation through changes of drain junction capacitances.

Fig. 5. Illustration of the VCO model of an oscillator.

In this section, we attribute the cycle jitter and the cycle-to-cycle jitter of a frequency-modulated oscillator to the static sensitivity $K_0$. As shown in Sections VI and VII, these expressions describe supply and substrate noise quite accurately. Let the modulating control voltage be a small sinusoidal perturbation

$$\Delta V_m(t) = V_m \cos\omega_m t. \tag{10}$$

We assume, as an approximation, that the frequency change follows the control voltage according to

$$\Delta f_0(t) = V_m K_0 \cos\omega_m t \tag{11}$$

where $K_0$ is the static sensitivity, also called low-frequency VCO gain. This approximation is referred to as quasi-static approximation in the remainder. The deviation of the period from the mean is

$$\Delta T(t) = \frac{1}{f_0 + \Delta f_0(t)} - \frac{1}{f_0} \tag{12}$$

$$\approx -\frac{V_m K_0}{f_0^2}\cos\omega_m t. \tag{13}$$

Multiplying this expression by $\Delta T(t+\tau)$ and averaging the result with respect to $t$, we obtain the steady-state ACF

$$\overline{\Delta T(t+\tau)\,\Delta T(t)} = \frac{V_m^2 K_0^2}{2 f_0^4}\cos\omega_m\tau. \tag{14}$$

This quantity represents the ACF of the process $\Delta T(t)$ in a continuous-time description. For the evaluation of the jitter according to (5) and (6), we need the values $C_{\Delta T}(0)$ and $C_{\Delta T}(1)$ of the discrete-time ACF. If the ACF does not change significantly during one oscillation period, these values can be determined from the continuous-time ACF at time points $\tau = 0$ and $\tau = T$. A numerical verification of this approach is given below. Inserting (14) with $\tau = 0$ in (5), we find the cycle jitter

$$\Delta T_c = \frac{V_m K_0}{\sqrt{2}\, f_0^2}. \tag{15}$$

For $\tau = 0$ and $\tau = T = 1/f_0$, we obtain from (6) and (14) the cycle-to-cycle jitter

$$\Delta T_{cc} = \frac{V_m K_0}{f_0^2}\sqrt{1 - \cos(\omega_m/f_0)}. \tag{16}$$

Equations (15) and (16) express the jitter in terms of the low-frequency sensitivity and the modulation frequency. They will be verified numerically in Sections VI and VII. The main benefit of these equations is that the calculation of the jitter is reduced to the calculation or measurement of the oscillation frequency as a function of the supply or substrate voltage. From (15), we note that cycle jitter is independent of frequency so long as the quasi-static approximation (11) holds. By contrast, cycle-to-cycle jitter increases with frequency. For $f_m \ll f_0$ we find from (16)

$$\Delta T_{cc} \approx \frac{V_m K_0\,\omega_m}{\sqrt{2}\, f_0^3}. \tag{17}$$

Note that the cycle-to-cycle jitter $\Delta T_{cc}$ is approximately proportional to the modulation frequency $f_m$. This can be interpreted by noting that $\Delta T_{cc}$ is a double-differential quantity, as is evident from (5) and (6).

Having reduced the jitter calculation to the static sensitivity $K_0$, we need to extract this quantity from simulations. For this purpose, we apply a dc voltage perturbation to the supply. Fig. 6 shows the oscillation frequency of the SERO and the DRO as a function of the supply voltage. The frequency varies linearly with the supply voltage over a relatively wide range of $V_{DD}$. The slope of the curves in Fig. 6 represents the low-frequency sensitivity $K_0$, indicating that the SERO is much more sensitive than the DRO. Using these values in (15) and (16), we can predict the jitter from supply and substrate noise easily.

VI. JITTER DUE TO SUPPLY NOISE

The supply and substrate noise created in a digital system is quite complex. In addition to components at the clock frequency and harmonics and subharmonics thereof, the noise spectrum generally exhibits random signals resulting from the activities of each building block as well. A rigorous treatment requires that the noise spectrum be measured in a realistic environment and subsequently incorporated in the analysis as explained in Section V.


Fig. 6. Oscillation frequency of (a) the single-ended ring oscillator and (b) the differential ring oscillator as a function of static supply voltage.

Fig. 7. Cycle jitter and cycle-to-cycle jitter of (a) the SERO and (b) the DRO as a function of supply voltage noise frequency. The solid lines represent the quasi-static FM expressions.

In the following, we investigate the jitter due to sinusoidal supply voltage perturbations. The calculation of the jitter consists of the following steps: interpolation of the voltage waveform to find the zero crossings; calculation of the periods $T_n$ and subtraction of the mean period $T$ to obtain $\Delta T_n$; and calculation of the cycle jitter (2) and the cycle-to-cycle jitter (3) by performing time averaging. We should also mention that simulations indicate that jitter has a relatively linear dependence on the noise amplitude for supply variations as large as a few hundred millivolts. Fig. 7 plots the analytical and simulated cycle and cycle-to-cycle jitter of single-ended and differential ring oscillators. As can be seen, the analytical results of Section V predict the jitter with reasonable accuracy.
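The procedure described above (zero-crossing interpolation, period extraction, time averaging) can be sketched in Python. This is only an illustration under stated assumptions: instead of a SPICE waveform it uses an ideal sinusoid whose frequency is modulated as $f_0 + V_m K_0 \cos\omega_m t$, and the values of `f0`, `k0`, `vm`, and `fm` are hypothetical. The result is compared with the quasi-static FM expressions of Section V, $\Delta T_c = V_m K_0/(\sqrt{2} f_0^2)$ and $\Delta T_{cc} = (V_m K_0/f_0^2)\sqrt{1-\cos(\omega_m/f_0)}$.

```python
import math


def zero_crossings(f0, k0, vm, fm, n_cycles, samples_per_cycle=200):
    """Sample an FM sinusoid and linearly interpolate its rising zero crossings."""
    dt = 1.0 / (f0 * samples_per_cycle)
    beta = vm * k0 / fm  # peak phase deviation in radians
    times = []
    t_prev, v_prev = 0.0, 1.0  # v(0) = cos(0) = 1
    k = 1
    while len(times) < n_cycles + 1:
        t = k * dt
        v = math.cos(2 * math.pi * f0 * t + beta * math.sin(2 * math.pi * fm * t))
        if v_prev < 0.0 <= v:  # minus-to-plus crossing between two samples
            times.append(t_prev + dt * (-v_prev) / (v - v_prev))
        t_prev, v_prev = t, v
        k += 1
    return times


def jitter_metrics(times):
    """Return (cycle jitter, cycle-to-cycle jitter) by time averaging."""
    periods = [b - a for a, b in zip(times, times[1:])]
    mean_period = sum(periods) / len(periods)
    dev = [p - mean_period for p in periods]
    cycle = math.sqrt(sum(d * d for d in dev) / len(dev))
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    cc = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return cycle, cc


def fm_cycle_jitter(vm, k0, f0):
    """Quasi-static FM prediction of the cycle jitter."""
    return vm * k0 / (math.sqrt(2.0) * f0 ** 2)


def fm_cc_jitter(vm, k0, f0, fm):
    """Quasi-static FM prediction of the cycle-to-cycle jitter."""
    return vm * k0 / f0 ** 2 * math.sqrt(1.0 - math.cos(2 * math.pi * fm / f0))


# Hypothetical example values: 1-GHz oscillator, 100 MHz/V sensitivity,
# 100-mV peak perturbation at 50 MHz.
f0, k0, vm, fm = 1.0e9, 1.0e8, 0.1, 5.0e7
sim_c, sim_cc = jitter_metrics(zero_crossings(f0, k0, vm, fm, n_cycles=400))
print("cycle jitter   sim/analytic:", sim_c, fm_cycle_jitter(vm, k0, f0))
print("cycle-to-cycle sim/analytic:", sim_cc, fm_cc_jitter(vm, k0, f0, fm))
```

For modulation frequencies well below the oscillation frequency the two pairs of numbers should agree closely; the remaining difference reflects the quasi-static approximation, which treats the frequency as constant within one period.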

Fig. 8. Substrate noise modeling.



VII. JITTER DUE TO SUBSTRATE NOISE

Substrate noise can be treated in the same fashion as supply noise. For the numerical simulation of substrate noise, the bulk terminal of the transistors is driven by a noise source (Fig. 8). Fig. 9 shows the calculated jitter of the DRO as a function of the noise frequency. Comparison with Fig. 7 indicates that for the DRO, a supply voltage perturbation is almost equivalent to a substrate voltage perturbation of opposite sign. To understand this, note from Fig. 10 that, with an ideal tail current source, a change of $\Delta V$ in $V_{DD}$ is equivalent to a change of $-\Delta V$ in $V_{\mathrm{sub}}$. Simulations confirm that the static sensitivity $K_0$ is indeed equal for supply and substrate noise, apart from the sign. Fig. 9 also demonstrates that the quasi-static FM approach is suited to describing the jitter introduced by substrate noise. Furthermore, it suggests that a substantial fraction of the jitter results from the voltage dependence of $C_{db}$ and $C_{sb}$.

VIII. OSCILLATOR DESIGN FOR LOW JITTER

The simulation results presented thus far indicate the superior performance of differential oscillators with respect to single-ended topologies. Nonetheless, even differential configurations have a wide design space; device size, voltage swings, power dissipation, and the number of stages in a ring oscillator influence the overall sensitivity to supply and substrate noise.

Fig. 9. Cycle jitter and cycle-to-cycle jitter of the DRO versus substrate voltage noise frequency. Solid lines represent the quasi-static FM expressions. The empty symbols show the jitter with the drain-bulk and source-bulk capacitances set to zero.

Fig. 10. Illustration of the equivalence of supply and substrate noise for the DRO.

Fig. 11. Jitter of the DRO versus gate width.

In this section, we study jitter as a function of three parameters: transistor gate width, power dissipation, and the number of stages. To make meaningful comparisons, the circuit is modified in each case such that the frequency of oscillation remains constant. These parameters also affect the thermal jitter to some extent, but, considering the vastly different designs reported in [5] and [6], we note that this type of jitter still remains negligible.

A. Effect of Transistor Gate Width

The differential three-stage ring oscillator of Fig. 2 begins to oscillate at $W \approx 30$ μm. Fig. 11 shows the effect of the gate width on the jitter, where the oscillation frequency is kept constant by adjusting $C_L$ in Fig. 2. The jitter reaches a minimum for $W \approx 80$ μm. For large $W$, the value of $C_L$ must be reduced so as to maintain the same oscillation frequency, yielding a larger voltage-dependent fraction due to drain and source junctions of each device and hence a higher sensitivity to noise.

B. Effect of Power Consumption

The jitter resulting from device electronic noise generally exhibits an inverse dependence upon the oscillator power dissipation [5], [10]. By contrast, the effect of supply and substrate noise on the jitter of a given oscillator topology is relatively independent of the power drain. This can be understood with the aid of the conceptual illustrations in Fig. 12, where the output voltages of $N$ identical oscillators are added in phase. In Fig. 12(a), only the device electronic noise is considered [5]. Since the noise in each oscillator is uncorrelated, the output noise voltage is $\sqrt{N}$ times that of each oscillator, whereas the output signal voltage is $N \times V_j$. In Fig. 12(b), on the other hand, all oscillators are disturbed by the same noise source, thus exhibiting completely correlated noise. That is, both the noise voltage and the signal voltage are increased by a factor of $N$. To confirm the above observation, the gate width and tail current were decreased while the load resistance was increased proportionally. Table I shows that the jitter is quite constant.

Fig. 12. Illustration of the relationship between power consumption and noise for (a) device electronic noise and (b) supply noise.

TABLE I. IMPACT OF POWER CONSUMPTION

C. Effect of Number of Stages

In applications where the required oscillation frequency is considerably lower than the maximum speed of the technology, a ring oscillator may incorporate more than three stages. Thus, the optimum number of stages with respect to the jitter is of interest. Shown in Table II and plotted in Fig. 13 is the jitter of three-stage and six-stage oscillators designed for a frequency of 500 MHz with constant tail current and voltage swings. We note that the minimum values of cycle jitter and cycle-to-cycle jitter are smaller in a three-stage topology. This is because for the three-stage oscillator, the reduction of the oscillation frequency to the desired value is obtained by means of the fixed capacitances $C_L$ rather than by the voltage-dependent capacitances of the transistors. Hence, a smaller fraction of the total load capacitance is subject to variations with supply and substrate noise.

The addition of a fixed capacitor to each stage nonetheless entails the issue of substrate noise coupling to the bottom plate of the capacitor. In order to minimize this effect, a grounded shield must isolate the capacitor from the substrate, as illustrated in Fig. 14. Here, an n-well, grounded by a low-resistance n+ ring, is placed under the capacitor so as to block the noise produced in the substrate.

TABLE II. THREE-STAGE VERSUS SIX-STAGE OSCILLATOR

Fig. 13. Jitter of the three-stage and the six-stage DRO versus gate width for an oscillation frequency of 500 MHz.

Fig. 14. Grounded shield used under the capacitor to block substrate noise.

IX. CONCLUSION

We have investigated the timing jitter in oscillators subject to supply and substrate noise. For digital timing applications, the effect of supply and substrate noise on the jitter is typically much more pronounced than that of thermal noise. For supply and substrate noise, we have derived analytical relationships between the cycle-to-cycle jitter and the low-frequency sensitivity of the oscillation frequency to supply or substrate noise. These relationships have been verified by means of numerical calculations for single-ended and differential CMOS ring oscillators. For differential ring oscillators, we have investigated the dependence of the jitter on the transistor gate width, power consumption, and the number of stages. As a special result, we have found that in applications where the required oscillation frequency is lower than the maximum speed of the technology, a three-stage ring oscillator with additional load capacitances gives the lowest jitter.

APPENDIX
JITTER AND PHASE NOISE DUE TO THERMAL AND SHOT NOISE

The output voltage of an oscillator can be written as

$$V(t) = V_0 \cos[\omega_0 t + \phi(t)] \tag{18}$$

where $V_0$ is the amplitude, $\omega_0$ is the oscillation frequency, and $\phi(t)$ is the slowly varying excess phase. The excess frequency is

$$\Delta\omega(t) = \frac{d}{dt}\phi(t) \tag{19}$$

and hence

$$\phi(t) = \int_0^t \Delta\omega(u)\,du + \phi(0). \tag{20}$$

Thermal and shot noise may be considered as white noise since their cutoff frequencies are typically much higher than the oscillation frequency. White noise in the feedback loop of the oscillator results in phase diffusion, a phenomenon described by a Wiener process [11]. Extensive investigations of phase noise indicate that white noise sources in all types of oscillators give rise to a phase noise power spectrum proportional to $1/(\Delta\omega)^2$, where $\Delta\omega$ is the offset frequency with respect to the carrier frequency [4], [5]. This trend is valid for offset frequencies as high as several percent of the carrier frequency. Thus, frequency noise, $\Delta\omega(t)$, can be assumed white in such a band. The autocorrelation of $\Delta\omega(t)$ is given by

$$\langle \Delta\omega(t+\tau)\,\Delta\omega(t) \rangle = 2D\,\delta(\tau) \tag{21}$$

where $D$ is the diffusivity and $\delta(\tau)$ the Delta function. The probability density of $\phi(t)$ represents a Gaussian distribution centered at $\phi(0)$ with the variance

$$\sigma_\phi^2 = 2Dt. \tag{22}$$

As evident from (22), the variance diverges with time. The autocorrelation of $V(t)$ is known [12] and reads

$$\langle V(t+\tau)\,V(t) \rangle = \frac{V_0^2}{2}\exp(-D|\tau|)\cos(\omega_0\tau). \tag{23}$$

Performing the Fourier transformation, we obtain the one-sided power spectral density

$$S_V(\omega) = V_0^2\,\frac{D}{(\omega-\omega_0)^2 + D^2}. \tag{24}$$

This quantity is often normalized to $V_0^2/2$ and referred to as relative phase noise with respect to the carrier [5] or as single-sideband phase noise [8], given by

$$S_\phi(\omega) = \frac{2D}{(\omega-\omega_0)^2 + D^2}. \tag{25}$$

For $|\omega-\omega_0| \gg D$, we obtain from (25)

$$S_\phi(\omega) \approx \frac{2D}{(\omega-\omega_0)^2}. \tag{26}$$

Next, we will relate the cycle-to-cycle jitter and the single-sideband phase noise to each other. Note that the stationary Wiener process has no memory and the increments in different time intervals are statistically independent [11]. Therefore, the rms mean increment of the excess phase $\phi(t)$ within one cycle, i.e., the cycle jitter of the phase, equals the increment of $\phi(t)$ between $t = 0$ and $t = T$. Thus, from (22), we obtain the phase cycle jitter as

$$\Delta\phi_c = \sqrt{2DT}. \tag{27}$$

The excess phase change during the $n$th cycle is referred to as $\Delta\phi_n$. The $n$th oscillation period is defined by the relation

$$2\pi f_0 T_n = 2\pi + \Delta\phi_n. \tag{28}$$

For the deviation of the $n$th period $T_n$ from the mean period $T = 1/f_0$, we then find

$$\Delta T_n = \frac{\Delta\phi_n}{2\pi f_0} = \Delta\phi_n\,\frac{T}{2\pi}. \tag{29}$$

Hence, the cycle jitter $\Delta T_c$ of the period during one cycle is related to $\Delta\phi_c$ according to

$$\Delta T_c = \Delta\phi_c\,\frac{T}{2\pi}. \tag{30}$$

For white noise sources, two successive periods are uncorrelated. Since cycle-to-cycle jitter represents the difference between two periods, the variance of cycle-to-cycle jitter is twice as large as the variance of one period, yielding

$$\Delta T_{cc} = \sqrt{2}\,\Delta T_c. \tag{31}$$

Combining (27), (30), and (31), we obtain

$$\Delta T_{cc}^2 = \frac{8\pi}{\omega_0^3}\,D \tag{32}$$

with

$$\omega_0 = \frac{2\pi}{T}. \tag{33}$$

The cycle-to-cycle jitter can now be expressed in terms of the single-sideband phase noise by inserting (32) in (26) to give

$$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\,S_\phi(\omega)\,(\omega-\omega_0)^2. \tag{34}$$

On the other hand, the phase noise can be expressed by the cycle-to-cycle jitter by inserting (32) in (25), yielding

$$S_\phi(\omega) = \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega-\omega_0)^2 + (\omega_0^3/8\pi)^2\,\Delta T_{cc}^4}. \tag{35}$$

A similar expression has been derived for ring oscillators in [10] and reads in our notation

$$S_\phi = \frac{f_0^3\,\Delta T_c^2}{(f-f_0)^2} \tag{36}$$

where $f_0 = \omega_0/2\pi$. Equation (36) turns out to be a special case of (35) for $|\omega-\omega_0| \gg D$.

The absolute jitter increases proportionally to the square root of the measurement interval $\Delta t$ as evident from (22). Hence, the absolute phase jitter is

$$\Delta\phi_{\mathrm{abs}} = \sqrt{2D\,\Delta t} = \kappa\sqrt{\Delta t}. \tag{37}$$

Using (32), the proportionality constant $\kappa$ can be related to the cycle-to-cycle jitter according to

$$\kappa = \sqrt{2D} = \sqrt{2}\,\pi f_0^{3/2}\,\Delta T_{cc}. \tag{38}$$
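The white-noise relations of this Appendix can be illustrated with a small Monte Carlo experiment in Python. This is a sketch under a simplifying assumption: the period deviations are drawn directly as independent Gaussian samples, which is the discrete-time counterpart of the memoryless Wiener phase process described above. The values of `f0`, `sigma_c`, the number of cycles, and the number of trials are hypothetical. The experiment checks that the cycle-to-cycle jitter is $\sqrt{2}$ times the cycle jitter and that the absolute jitter after an interval $\Delta t$ follows $\sqrt{f_0/2}\,\Delta T_{cc}\sqrt{\Delta t}$.

```python
import math
import random


def simulate_white_jitter(sigma_c, n_cycles, n_trials, seed=1):
    """Return (cycle-to-cycle jitter, absolute jitter after n_cycles) estimates."""
    rng = random.Random(seed)
    sum_sq_cc, count_cc, sum_sq_abs = 0.0, 0, 0.0
    for _ in range(n_trials):
        # Independent period deviations: two successive periods are uncorrelated.
        dev = [rng.gauss(0.0, sigma_c) for _ in range(n_cycles)]
        sum_sq_abs += sum(dev) ** 2  # accumulated timing error of this trial
        for a, b in zip(dev, dev[1:]):
            sum_sq_cc += (b - a) ** 2
            count_cc += 1
    return math.sqrt(sum_sq_cc / count_cc), math.sqrt(sum_sq_abs / n_trials)


# Hypothetical example values: 1-GHz oscillator with 1-ps rms cycle jitter.
f0, sigma_c, n_cycles = 1.0e9, 1.0e-12, 100
t_cc, t_abs = simulate_white_jitter(sigma_c, n_cycles, n_trials=2000)

interval = n_cycles / f0  # measurement interval spanned by n_cycles periods
predicted_abs = math.sqrt(f0 / 2.0) * t_cc * math.sqrt(interval)

print("cycle-to-cycle / cycle jitter:", t_cc / sigma_c, "(expected about sqrt(2))")
print("absolute jitter sim/predicted:", t_abs, predicted_abs)
```

Because the estimates are statistical, the simulated and predicted values agree only to within the sampling error set by the number of trials.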

REFERENCES

[1] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC™ microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] R. Bhagwan and A. Rogers, “A 1 GHz dual-loop microprocessor PLL with instant frequency shifting,” in IEEE Proc. ISSCC, San Francisco, CA, Feb. 1997, pp. 336–337.
[4] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” in Proc. IEEE, pp. 329–330, Feb. 1966.
[5] B. Razavi, “A study of phase noise in CMOS oscillators,” IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[6] T. Kwasniewski et al., “Inductorless oscillator design for personal communications devices—A 1.2 μm CMOS process case study,” in Proc. CICC, May 1995, pp. 327–330.
[7] F. X. Kärtner, “Analysis of white and $f^{-\alpha}$ noise in oscillators,” Int. J. Circuits Theory, Appl., vol. 18, pp. 485–519, Sept. 1990.
[8] W. Anzill and P. Russer, “A general method to simulate noise in oscillators based on frequency domain techniques,” IEEE Trans. Microwave Theory Tech., vol. 41, pp. 2256–2263, Dec. 1993.
[9] J. A. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[10] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS’94), London, U.K., June 1994, vol. 4, pp. 27–30.
[11] C. W. Gardiner, Handbook of Stochastic Methods. Berlin: Springer-Verlag, 1983.
[12] R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, 1967.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers

William S. T. Yan and Howard C. Luong, Member, IEEE

Abstract—A 900-MHz monolithic CMOS dual-loop frequency synthesizer suitable for GSM receivers is presented. Implemented in a 0.5-μm CMOS technology and at a 2-V supply voltage, the dual-loop frequency synthesizer occupies a chip area of 2.64 mm² and consumes a low power of 34 mW. The measured phase noise of the synthesizer is −121.8 dBc/Hz at 600-kHz offset, and the measured spurious levels are −79.5 and −82.0 dBc at 1.6- and 11.3-MHz offset, respectively.

Index Terms—Frequency synthesis, frequency synthesizer, phase-locked loop, radio frequency, voltage-controlled oscillator.

Fig. 1. Block diagram of the GSM-receiver front-end.

I. INTRODUCTION

MODERN transceivers for wireless communication consist of low-noise amplifiers, power amplifiers, mixers, DSP chips, filters, and frequency synthesizers. These building blocks have been realized using hybrid technologies and require interfacing circuits, which increases the power consumption and limits the maximum operating speed of the transceivers. For this reason, it has become increasingly attractive to design and monolithically integrate all these building blocks on a single chip.

Designing fully integrated frequency synthesizers for this integration is always desirable but most challenging. The first requirement is to achieve high-frequency operation with reasonable power consumption. However, the most critical challenges for the frequency synthesizer are the phase-noise and spurious-level performance. Finally, small chip area is essential to monolithic system integration.

In recent years, monolithic frequency synthesizers with good phase-noise performance have been reported [1]–[3]. However, those designs operate at supply voltages of at least 2.7 V and power consumption of more than 50 mW. Moreover, fractional-N frequency synthesizers suffer from fractional spurs, which degrade their spurious-tone performance.

This paper presents a monolithic dual-loop frequency synthesizer for the GSM 900 system, implemented in a 0.5-μm CMOS process, that achieves high operating frequency (935.2–959.8 MHz), low power consumption (34 mW), low phase noise (−121.8 dBc/Hz at 600 kHz), low spurious level (−82.0 dBc at 11.3 MHz), and fast switching time (830 μs).

Section II derives the design specification of the frequency synthesizer for GSM 900. Section III describes the architecture for the proposed dual-loop design. In Section IV, circuit implementation of critical building blocks is discussed. Section V presents the measurement results of the synthesizer, including its phase noise, spurious level, and switching time, together with a comprehensive performance evaluation.

Manuscript received December 29, 1999; revised October 5, 2000. The authors are with the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: [email protected]; [email protected]). Publisher Item Identifier S 0018-9200(01)00927-1.

II. DESIGN SPECIFICATION

The performance of frequency synthesizers is mainly specified by their output frequency, phase noise, spurious level, and switching time. This section derives the specifications of a frequency synthesizer for GSM receivers.

A. Output Frequency

In GSM-900 systems, the receiver-channel frequencies are expressed as follows:

$$f_{RX} = 935.2 + 0.2\,(N - 1)\ \text{MHz} \qquad (1)$$

where $N = 1, \ldots, 124$ is the channel number. To receive signals in different channels, a GSM-receiver front-end, shown in Fig. 1, is adopted. The receiver front-end consists of a low-noise amplifier (LNA) and an RF filter for filtering out-of-band noise and blocking signals. The received signal is then mixed down to an IF frequency ($f_{IF}$) of 70 MHz for base-band signal processing. To extract information from the desired channel, the local oscillator (LO) output frequency ($f_{LO}$) of the frequency synthesizer is changed accordingly, as follows:

$$f_{LO} = f_{RX} - f_{IF} = 865.2\text{–}889.8\ \text{MHz} \qquad (2)$$

which is the output-frequency range of the frequency synthesizer to be achieved.

B. Phase Noise

The blocking-signal specification for GSM 900 receivers is shown in Fig. 2, where the desired signal power can be as low

0018–9200/01$10.00 © 2001 IEEE

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER


Fig. 2. SNR degradation due to the phase noise and spurious level.

as −102 dBm. At 600-kHz offset frequency, the power of the blocking signal can be as high as −43 dBm [4]. With a correct LO frequency, the desired channel signal is downconverted to the IF frequency. However, blocking signals are also downconverted by the LO signal and its phase noise. Since the power of the blocking signal is much larger than that of the desired signal, the phase-noise power falls into the IF band and degrades the signal-to-noise ratio (SNR). The phase-noise specification can be expressed as follows:

$$L\{600\ \text{kHz}\} < P_{sig} - P_{blk} - \mathrm{SNR} - 10\log(\mathrm{BW}) = -121\ \text{dBc/Hz at 600 kHz} \qquad (3)$$

where SNR of 9 dB is the SNR specification for the whole receiver, $P_{sig} = -102$ dBm and $P_{blk} = -43$ dBm are the power levels of the minimum desired signal and maximum blocking signal, respectively, and BW is the 200-kHz channel bandwidth.

Fig. 3. GSM 900 receive and transmit time.

C. Spurious Tones

Because of the feedthrough and modulation of the reference signal, two spurious tones appear at $\pm f_{ref}$ away from the desired output frequency, as shown in Fig. 2. The derivation of the spurious-tone specification is similar to that of the phase noise except that the channel bandwidth is not considered in this case. The spurious-tone specification $S$ can be expressed as follows:

$$S < P_{sig} - P_{blk} - \mathrm{SNR} = \begin{cases} -78\ \text{dBc} & \text{at 1.6 MHz} \\ -88\ \text{dBc} & \text{for offset} > 3\ \text{MHz.} \end{cases} \qquad (4)$$
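The link-budget arithmetic behind the two specifications can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the 200-kHz channel bandwidth is assumed from the GSM channel spacing, and the −23-dBm far-offset blocker level is an assumed value chosen for the example rather than one stated in this section.

```python
import math


def phase_noise_spec_dbc_hz(p_sig_dbm, p_blk_dbm, snr_db, bw_hz):
    """Largest tolerable LO phase noise (dBc/Hz) at the blocker offset.

    The blocker is reciprocally mixed by the LO phase noise, so the noise
    integrated over the channel bandwidth must stay SNR below the signal.
    """
    return p_sig_dbm - p_blk_dbm - snr_db - 10.0 * math.log10(bw_hz)


def spur_spec_dbc(p_sig_dbm, p_blk_dbm, snr_db):
    """Largest tolerable LO spur (dBc); a tone needs no bandwidth term."""
    return p_sig_dbm - p_blk_dbm - snr_db


# Values quoted in the text: -102 dBm signal, -43 dBm blocker, 9 dB SNR.
pn_limit = phase_noise_spec_dbc_hz(-102.0, -43.0, 9.0, 200e3)  # assumed 200 kHz BW
print(round(pn_limit, 1))  # -121.0

# Assumed -23 dBm blocker at a far offset (hypothetical example value).
print(spur_spec_dbc(-102.0, -23.0, 9.0))  # -88.0
```

The only difference between the two functions is the bandwidth term, which is why the spurious-tone limit is expressed in dBc and the phase-noise limit in dBc/Hz.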

D. Switching Time

In GSM 900 systems, time-division multiple-access (TDMA) is adopted within each frequency channel. As shown in Fig. 3, each frequency channel is divided into eight time slots. Signals are received and transmitted in time slot #1 and slot #4, respectively. For system monitoring purposes, a time slot between slot #6 and slot #7 is adopted. Therefore, the most critical switching time is from the transmission period (slot #4) to the system monitoring period (slot #6.5), which is equal to 865 μs. However, to take care of the settling time of the other components, the switching time of the frequency synthesizer is recommended to be kept within one time slot (577 μs) [5].

III. DUAL-LOOP DESIGN

To reduce the switching time and the chip area of a synthesizer, a high loop bandwidth and a high reference frequency are desired. Moreover, to suppress the phase-noise contribution of the reference signal and reduce frequency-divider complexity, a lower frequency-division ratio is desirable. Therefore, a dual-loop frequency synthesizer is proposed [6]. As shown in Fig. 4, the dual-loop design consists of two reference signals and two phase-locked loops (PLLs) in cascade configuration. In the feedback path of the high-frequency loop, a mixer is adopted to provide the frequency shift. The output frequency of the synthesizer is expressed as follows:

$$f_{out} = N_3\left(\frac{N_1}{N_2}\,f_{ref1} + f_{ref2}\right) \qquad (5)$$

where $f_{ref1}$ and $f_{ref2}$ are frequencies of the two reference signals, and $N_1$, $N_2$, and $N_3$ are frequency division ratios.
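As a sanity check of the frequency plan, the sketch below evaluates the output frequency over the programmable division range and compares it with the LO frequencies required by the GSM 900 channel plan. It assumes the form $f_{out} = N_3 (N_1 f_{ref1}/N_2 + f_{ref2})$ with the 1.6-MHz and 205-MHz references, the divide-by-32 and divide-by-4 fixed dividers mentioned elsewhere in the paper, and the 70-MHz IF; the function names and the channel-to-ratio mapping `225 + channel` are illustrative assumptions.

```python
F_REF1_MHZ = 1.6    # reference of the low-frequency loop
F_REF2_MHZ = 205.0  # reference fed to the mixer of the high-frequency loop
N2 = 32             # fixed divider between the two loops
N3 = 4              # fixed divider following VCO2


def synth_output_mhz(n1):
    """Output frequency for programmable division ratio n1 (assumed form)."""
    return N3 * (n1 * F_REF1_MHZ / N2 + F_REF2_MHZ)


def lo_for_channel_mhz(channel, f_if_mhz=70.0):
    """LO frequency needed to bring GSM 900 receive channel 1..124 to the IF."""
    return 935.2 + 0.2 * (channel - 1) - f_if_mhz


# Each step of n1 moves the output by N3 * F_REF1 / N2 = 0.2 MHz.
for n1 in (226, 349):
    print(n1, round(synth_output_mhz(n1), 1))
# 226 865.2
# 349 889.8
```

Under these assumptions the 124 division ratios from 226 to 349 map one-to-one onto the 124 receive channels, with a 200-kHz step even though the comparison frequency is 1.6 MHz.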

Fig. 4. Proposed dual-loop frequency synthesizer.

Due to the dual-loop architecture, the comparison frequencies of the low-frequency and high-frequency loops are scaled up from 200 kHz to 1.6 and 11.3 MHz, respectively. Therefore, the loop bandwidths of both PLLs can be increased so that the switching time and the chip area can be reduced. Compared to single-loop integer-N designs, the frequency-division ratio of the programmable divider is reduced from 4326–4449 to 226–349. Such a reduction in the division ratio significantly simplifies the frequency-divider design and reduces the phase-noise contribution of the input reference signal.

In the proposed dual-loop synthesizer, the divide-by-32 divider and the high-frequency loop together greatly attenuate the phase noise and the spurious tones of the low-frequency loop. As such, the low-frequency loop can be designed to have a larger loop bandwidth and a loop filter as small as one-fifth of the loop filter in the high-frequency loop. The low-frequency loop requires additional components, including the phase-frequency detector (PFD1), the charge pump (CP1), and the frequency divider, but they are all quite small and have very little impact on the chip area. In addition, VCO1 is implemented by a ring oscillator, which occupies a much smaller chip area compared to VCO2. Altogether, the dual-loop design requires no more than 25% overhead in the chip area compared to a fractional-N design with the same loop bandwidth.

Because the input-reference frequency of the low-frequency loop is scaled up by 8 times, the required frequency range of the oscillator VCO1 in the low-frequency loop is also scaled up from 25 to 200 MHz. On the other hand, the phase noise of the ring oscillator is attenuated by the frequency divider and is then amplified by the high-frequency loop; the total phase-noise attenuation from the VCO1 output to the synthesizer output is 18 dB. Consequently, this voltage-controlled oscillator (VCO) requires a high operating frequency (600 MHz), a wide frequency range (200 MHz), and a low phase noise (−103 dBc/Hz at 600 kHz). A novel ring VCO design that meets all of these tough specifications will be presented in the next section.
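The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes that the fixed divider following VCO2 is the divide-by-4 stage described in Section IV-C and that close-in phase noise scales as $20\log$ of a frequency-division ratio; the symbols $\Delta f_{out}$, $\Delta f_{VCO1}$, $A_{VCO1}$, and $A_{ref}$ are introduced here only for illustration.

```latex
\begin{align*}
\Delta f_{out}  &= 889.8 - 865.2 = 24.6\ \text{MHz} \approx 25\ \text{MHz} \\
\Delta f_{VCO1} &= (349 - 226) \times 1.6\ \text{MHz} = 196.8\ \text{MHz} \approx 200\ \text{MHz}
                 \quad (= 8 \times \Delta f_{out}) \\
A_{VCO1}        &= 20\log_{10}\!\left(\tfrac{32}{4}\right) \approx 18.1\ \text{dB} \\
A_{ref}         &= 20\log_{10}\!\left(\tfrac{4449}{349}\right) \approx 22.1\ \text{dB}
\end{align*}
```

The first two lines reproduce the 25-to-200-MHz scaling of the VCO1 range, the third the 18-dB attenuation from the VCO1 output to the synthesizer output, and the last the reduction in reference phase-noise gain obtained by lowering the maximum division ratio from 4449 to 349.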

IV. CIRCUIT IMPLEMENTATION

This section discusses the design considerations and circuit implementation of the major building blocks that are unique and critical to the proposed dual-loop synthesizer, namely the two VCOs, the frequency dividers, the charge pump, and the loop filters. Detailed analysis and design of the other building blocks are not presented, either because they can be found elsewhere or because they are straightforward.

A. Ring Oscillator VCO1

The schematics of the proposed two-stage ring oscillator and its delay cell, designed to meet the specification described in Section III, are shown in Fig. 5(a) and (b), respectively. The delay cell consists of nMOS transistors as input transconductors, cross-coupled pMOS transistors for maintaining oscillation, diode-connected pMOS transistors, and a bias transistor for frequency tuning. The source nodes of the transistors are connected to the supply to maximize the output amplitude, which also helps suppress noise sources by turning them off more often [7] and thus further enhances the phase-noise performance. The half circuit of the delay cell is shown in Fig. 5(c). By equating the delay-cell voltage gain to unity, the oscillating frequency of the ring oscillator can be expressed as follows:

(6)

where the terms are the transconductance, the channel conductance, and the total capacitance at the output node. Oscillation starts when the transconductance is large enough to overcome the output load. Depending on the control voltage, the tuning transistors are either turned on, in which case the oscillator operates at maximum frequency, or turned off, in which case it operates at minimum frequency. By (6), the maximum frequency, the minimum frequency, and the frequency range are expressed as follows:

(7)

Fig. 5. Circuit implementation of the ring oscillator VCO1. (a) Ring oscillator. (b) Delay cell. (c) Half circuit of delay cell.

Since the oscillation frequency is proportional to the input transconductance, nMOS transistors are adopted as the input devices to minimize power consumption. From (7), a 50% tuning range can be achieved. Based on the approximate impulse-sensitivity function (ISF) and the analysis presented in [8], the phase noise of the oscillator is estimated to be approximately −107 dBc/Hz at 600-kHz offset. On the other hand, using SpectreRF [9], the phase noise is simulated to be −111.7 dBc/Hz at 600 kHz.

B. LC Oscillator VCO2

As the far-offset phase noise is dominated by VCO2, an LC oscillator is adopted to meet the stringent phase-noise specification. Fig. 6 shows the schematic of the LC oscillator. Cross-coupled transistors are used to start and to maintain oscillation with lower parasitics. PN-junction varactors implemented by p+ diffusion on the n-well are used for frequency tuning. The common-mode output voltage is designed at 1.1 V to enhance the driving of the frequency divider. To reduce the phase-noise contribution due to flicker noise, pMOS transistors are used as the current source.

To design an LC oscillator that satisfies the phase-noise requirement with minimum power consumption, inductors with large inductance and small series resistance are desired. Therefore, two-layer inductors are adopted [10], for which the inductance and the quality factor can be scaled up by 4 and 2 times, respectively. For the same reason, the pn-junction varactors are interdigitized with p+ islands surrounded by n-well contacts to enhance the quality factor. Finally, the transconductance of the cross-coupled transistors is designed so that it does not overcompensate the LC tank too much (only twice), to reduce the phase-noise contribution of these transistors.

Fig. 6. Circuit implementation of the LC oscillator VCO2.

Based on the method described in [11], the phase noise of the LC VCO is estimated to be −124.0 dBc/Hz at 600-kHz frequency offset, which agrees well with the simulation using SpectreRF.

C. Frequency Dividers

As the divide-by-4 frequency divider needs to convert sinusoidal signals from the VCO2 output into square-wave signals, its first divide-by-2 stage is implemented in pseudo-nMOS logic, while its second divide-by-2 stage is implemented as a TSPC-logic divider [12]. The first divider is shown in Fig. 7 and consists of a pseudo-nMOS amplifier and a divide-by-2 divider. Since pseudo-nMOS logic is a ratioed logic, the ratio between pMOS and nMOS transistors is designed to be less than 1.6 to make sure that an output logic “0” turns off the next stage.

D. Programmable-Frequency Divider

Fig. 8 shows the block diagram of the programmable-frequency divider [13]. At reset state, the prescaler divides

the input signal by its higher modulus, and its output is counted by both the program (P) counter and the swallow (S) counter. After the S counter has counted its programmed number of pulses, it changes the state of the modulus control line, and the prescaler divides the input by its lower modulus. Then the P counter counts the remaining cycles to reach overflow. As a whole, the programmable divider generates one complete output cycle for every programmed number of input cycles. The operation repeats after the S counter is reset.

As the frequency-division ratio (226–349) can be achieved with different combinations of the prescaler modulus and the P and S counter values, the best combination in terms of performance needs to be identified and chosen as the design. As the S counter must finish before the P counter resets it, the division ratio of the P counter should be larger than that of the S counter. To optimize the power consumption, the operating frequencies and number of bits of both the P and S counters should be minimized. Table I shows the different combinations that can implement the desired division ratio. Case 1 requires the highest operating frequencies and number of bits for the P and S counters, so it is not adopted. Case 4 has the problem that the S value is larger than the P value. It seems that Case 3 is the best one, but Case 2 is chosen as the final design because it is much easier to implement an asynchronous divide-by-12 frequency divider than a divide-by-14 divider.

The dual-modulus prescaler is implemented by the back-carrier-propagation approach as shown in Fig. 9 [14]. When the control signal is “1,” the gated inverter is bypassed, and the prescaler is a divide-by-12 divider. When the control signal is “0,” the final state “010” will be detected, at which point the gating signal is “0” and the input signal is delayed by one clock cycle. Thus the function of divide-by-13 is achieved. The back-carrier-propagation approach allows low-frequency signals (more significant bits) to switch to the final state much earlier than high-frequency signals (less significant bits) and thus reduces power consumption for a given speed.

TABLE I
SYSTEM DESIGN OF PROGRAMMABLE-FREQUENCY DIVIDER

Fig. 7. Circuit implementation of the pseudo-nMOS divide-by-2 frequency divider.

Fig. 8. Block diagram of the programmable-frequency divider.

E. Charge Pumps and Loop Filters

Fig. 10 shows the circuit implementation of the charge pumps used in the two loops. Each charge pump consists of two cascode current sources for the pull-up and pull-down currents, four complementary switches, and a unity-gain amplifier. By using high-swing cascode current sources, the output impedance is increased for effective current injection. Minimum-size complementary switches are adopted to minimize clock feedthrough and charge injection of the switches. The unity-gain amplifier keeps the voltage of node VCO equal to that of the internal switching nodes so that charge sharing among them can be minimized [15].

The loop filter in each of the two PLLs is a second-order low-pass filter implemented using linear capacitors and silicide-blocked polysilicon resistors. The values of capacitance, resistance, and charge-pump current are optimally designed to satisfy simultaneously the phase-noise, spurious-tone, and switching-time requirements with minimum chip area [16]. The loop bandwidths of the low-frequency and high-frequency loops are 40 and 27 kHz, respectively.

Fig. 9. Circuit implementation of the dual-modulus prescaler.

Fig. 10. Circuit implementation of the charge pump and the loop filter.

F. Phase Noise of the Dual-Loop Frequency Synthesizer

Based on the linearized model shown in Fig. 11, the transfer function from the input phase to the output phase of both the low-frequency and high-frequency loops can be expressed as follows:

(8)

where the parameters are the phase-detector gain, the VCO gain, and the total capacitance, zero time constant, and pole time constant of the corresponding loop filters.

Since the transfer function is a low-pass function, the reference phase noise is highly attenuated at high offset frequency. It also shows that the close-in phase noise of the reference signals is amplified by the frequency-division ratio. In this work, the division ratio is reduced from 4449 to 349, and the phase-noise contribution from the reference signals is suppressed by $20\log(4449/349) \approx 22$ dB.

Another important source of the close-in phase noise is the charge-pump noise. The transfer functions from the charge-pump noise current to the output phase noise can be derived to be

(9)

It shows that a small frequency-division ratio and a large phase-detector gain or a large charge-pump current are preferred for phase-noise consideration. Another factor not included in (9) that also affects the charge-pump noise is its turn-on time. In the proposed synthesizer, the charge-pump turn-on time is designed to be 1/10 of the input period, so that it is long enough to eliminate the phase-frequency detector (PFD) dead-zone problem but short enough to minimize the charge-pump phase-noise contribution.

Fig. 11. Linearized model of the dual-loop frequency synthesizer.

The phase-noise contribution of the loop-filter resistors in both PLLs can be estimated using their equivalent noise currents as follows:

(10)

which are bandpass functions with peaks appearing between the zero and the pole of the loop filter. To suppress the phase-noise peaking, large loop-filter capacitors are desired at the cost of large chip area.

For the phase-noise contribution of the VCOs, the transfer function between the VCO phase noise and output phase noise can be found to be

(11)

which are high-pass functions. Therefore, the far-offset phase noise of the synthesizer is dominated by the VCO phase noise. Since the loop bandwidths of both PLLs are designed in a range of tens of kilohertz to achieve the spurious-level specification, the far-offset phase noise of the PLLs depends only on the VCO phase noise itself.

To evaluate the overall phase-noise performance, the relationship between the phase noise of the low-frequency loop and that of the synthesizer output can be written as

(12)

which shows that there exists an 18-dB close-in phase-noise suppression for the low-frequency loop. The estimated phase noise of the whole synthesizer is −81.4 dBc/Hz at 20.9 kHz and −123.8 dBc/Hz at 600 kHz. The contribution of each component is shown in Fig. 12, which shows that the close-in phase noise (<100 kHz) is dominated by the charge pump CP1 and loop filter LF1, while the far-offset phase noise (>100 kHz) is dominated by the LC oscillator.

V. EXPERIMENTAL RESULTS

The dual-loop frequency synthesizer is implemented in a standard 0.5-μm CMOS technology. Linear capacitors are put under all the bias pins to serve as on-chip bypass capacitors.

Fig. 13 shows the die photo of the dual-loop frequency synthesizer, and the active area of the synthesizer is 2.64 mm². For characterization and measurement of passive devices, test structures for spiral inductors and varactors are included on the same die with the synthesizer and are measured by a network analyzer. To de-embed the probing-pad parasitics, an open-pad structure is also measured.

Fig. 12. Estimated phase noise of the whole dual-loop frequency synthesizer and contribution of each component at the synthesizer output.

Fig. 13. Die photo of the dual-loop frequency synthesizer.

A. Measurement of Inductors

Fig. 14 shows the inductance, series resistance, and quality factor of the on-chip spiral inductor. The measured inductance is close to simulation results and drops at frequencies close to the self-resonant frequency. However, the series resistance (30.2 Ω) is almost three times larger than the expected value (11.6 Ω). The increase in series resistance is mainly caused by eddy current induced within the substrate and n-well fingers [16]. As the series resistance increases significantly, the port-1 quality factor is limited to 1.6 at 900 MHz.

B. Measurement of Varactors

The measurement results of the pn-junction varactor at 900 MHz are shown in Fig. 15. As the p+ diffusions of the varactors used in the LC oscillator are connected to the output of the LC oscillator core, they are biased at 1.16 V, which is the dc bias of the oscillator core during the measurement. The measured capacitance is close to the estimated results in the reverse-biased region. The series resistance is around 2 Ω due to the minimum junction spacing and the nonminimum junction width. The quality factor is around 30 in the operating region of the oscillator.

C. Measurement of Ring Oscillator VCO1

The phase noise of the oscillators is measured by a direct phase-noise measurement [17]. First, the carrier power is determined at large video (VBW) and resolution (RBW) bandwidths. Then, the resolution bandwidth is reduced until the noise edges, and not the envelope of the resolution filter, are displayed. Finally, the phase noise is measured at the corresponding frequency offset from the carrier. To make sure that the measured phase noise is valid, the displayed values must be at least 10 dB above the intrinsic noise of the analyzer.

Fig. 16 shows the measurement results of the ring oscillator VCO1. The operating frequency is measured to be between 324.0 and 642.2 MHz, over which the measured phase noise is between −111 and −108 dBc/Hz at 600 kHz. The power consumption is around 10 mW.

D. Measurement of LC Oscillator VCO2

Fig. 17 shows the measurement results of the LC oscillator VCO2. Due to the quality-factor degradation of the spiral inductor, the bias current of the oscillator is increased by 15% above its designed value to achieve the phase-noise specification (−121 dBc/Hz at 600 kHz). The measured operating frequency range is between 725.0 and 940.5 MHz. The oscillation stops when the VCO control voltage is below 0.6 V because the varactors become forward-biased. Over the desired frequency range between 865.2 and 889.8 MHz, the achieved phase noise is below −121 dBc/Hz at 600 kHz.

E. Measured Phase Noise of the Frequency Synthesizer

Fig. 18 shows the phase-noise measurement results of the dual-loop frequency synthesizer at 889.8 MHz. The measured phase noise is −121.8 dBc/Hz at 600 kHz, which satisfies the GSM requirement. At offset frequencies between 10 and 100 Hz, the phase noise is mainly contributed by the flicker noise of the charge pump. However, a peak phase noise of −65.67 dBc/Hz is measured at 15 kHz, which is 15 dB higher than the estimation presented in Fig. 12. At offset frequencies above 100 kHz, where the phase noise should be dominated by VCO2 and should go down by 20 dB/dec, the measured phase noise goes down at a rate of 40 dB/dec.

It is believed that the increase in the close-in phase noise is mainly due to the charge pump CP2 and the loop filter LF2, for the following reasons. First, according to Fig. 12, only CP2 and LF2 have a phase-noise slope of −40 dB/dec above 100-kHz frequency offset. Second, the measured peak-to-flat close-in phase noise in Fig. 18 is around 15 dB, which is quite close to the estimated value in Fig. 12. Lastly, it is observed experimentally that the close-in phase noise changes as the charge-pump current of the high-frequency loop is adjusted. Unfortunately, the phase-noise contributions of CP2 and LF2 cannot be measured individually.

Fig. 14. Measurement results and equivalent circuit model of the spiral inductors at 900 MHz.

Fig. 15. Measurement results and bias condition of the pn-junction varactors at 900 MHz.

F. Measured Spurious Tones of the Frequency Synthesizer

Fig. 19 shows the measured spurious levels of the dual-loop frequency synthesizer at 865.2 MHz, which are −79.5 dBc at 1.6 MHz, −82.0 dBc at 11.3 MHz, and −82.83 dBc at 16 MHz. At 11.3 MHz, the spurious level is only 6 dB above the requirement. However, the predicted spurious level at 1.6 MHz should be below −90 dBc, and the one at 16 MHz should not exist [16].

During the measurement, the 1.6-MHz reference signal is generated by a 16-MHz crystal oscillator and a decade counter. Therefore, the 16-MHz spur is caused by the substrate coupling between the crystal oscillator and the synthesizer. To verify the cause of the increased spurious level at 1.6 MHz, the low-frequency loop is disabled, and the spurious level is still −75.1 dBc at 1.6 MHz, which implies that the increase in spurious level at 1.6 MHz is mainly caused by the substrate coupling.

Fig. 16. Measurement results of the ring oscillator VCO1.

Fig. 17. Measurement results of the LC oscillator VCO2.

Fig. 18. Phase-noise measurement results of the proposed synthesizer.

Fig. 19. Measured spurious level of the proposed synthesizer.

G. Switching Time of the Frequency Synthesizer

To determine the worst-case switching time of the frequency synthesizer, the frequency division ratio is switched from 226 to 349, and the control voltages of both VCO1 and VCO2 are measured. As shown in Fig. 20, the measurable switching time of the proposed synthesizer is 830 μs for a frequency error of approximately 10 kHz due to the limited resolution of our oscilloscope. Since the VCO gain is 160 MHz/V, in order to achieve a measurement accuracy of a 100-Hz frequency error, the resolution of the oscilloscope would need to be better than 100 nV, which unfortunately is not obtainable with our equipment. On the other hand, the synthesizer suffers from a slew-rate problem during the channel switching due to a small charge-pump current (1.6 μA) and a large loop-filter capacitor (1.1 nF) in the high-frequency loop.

H. Performance Evaluation

Table II summarizes the measured performance of the proposed frequency synthesizer, and Table III lists the performance of other fully integrated synthesizers for comparison. The proposed synthesizer operates at a single 2-V supply while all other designs require supply voltages of at least 2.7 V. Note that since the designs [1] and [3] operate at higher frequencies, their power consumption should be scaled down accordingly for a fair comparison. With this frequency normalization, the power consumption of the proposed dual-loop synthesizer is still comparable to that of the other designs. The synthesizer presented in [1] is a fractional-N design with a 26.6-MHz comparison frequency and a loop bandwidth of 45 kHz. In comparison, the low-frequency loop of this work has a comparison frequency of only 1.6 MHz but a loop bandwidth of up to 40 kHz because of the relaxed requirement on the spurious tones. On the other hand, the high-frequency loop uses an 11.3-MHz comparison frequency but a loop bandwidth limited to 27 kHz, which is the real limiting factor of the switching-time performance. Therefore, it is believed that a switching time close to that in [1] can be achieved by

eliminating the slew-limiting problem in the high-frequency loop.

The work described in [2] is an integer-N bipolar junction transistor (BJT) design with a channel spacing of 600 kHz, and its comparison frequency and loop bandwidth are limited to 600 and 4 kHz, respectively. With such a low bandwidth, the settling time is still less than 600 μs. It implies that if a larger charge-pump current or an active loop filter is adopted in the high-frequency loop of the dual-loop design, slew limiting can be suppressed and the switching-time performance can be enhanced. Since the density of BJT transistors is not as good as that of CMOS transistors, the chip area is relatively large even though it does not include an on-chip loop filter.

The synthesizer in [3] is also an integer-N design but without channel programmability, and as such, its comparison frequency and loop bandwidth can be as high as 61.5 MHz and 200 kHz, respectively. For the same reason, the total loop-filter capacitance can be smaller than 60 pF, which greatly reduces chip area. However, the situation would be much different if channel programmability were included. Although the proposed synthesizer consists of two loop filters, the chip area is just a little larger than that of the design in [3] due to the use of linear capacitors and silicide-blocked resistors.

Compared to the designs in [1] and [3], the spurious levels are between −75 and −85 dBc, which indicates that CMOS designs suffer the same problem from the substrate coupling between the reference signal and the VCO. However, for the close-in phase noise, the proposed dual-loop synthesizer suffers from the 15-dB increase at the peak due to the charge pumps and loop filters, as discussed in Section V-E.

Fig. 20. Switching-time measurement results of the proposed synthesizer.

TABLE II
PERFORMANCE SUMMARY OF THE PROPOSED SYNTHESIZER

I. Generation of the Second Reference Sources

The main drawback of our dual-loop synthesizer is that it requires two reference sources. In practice, if a single reference signal is preferred for the whole frequency synthesizer, the second reference signal (204.8 MHz instead of 205 MHz) can be generated from the 1.6-MHz reference signal by a third PLL with a frequency division ratio of 128. Since this third PLL has a fixed division ratio, its frequency divider can be implemented simply by cascading seven divide-by-2 dividers. Without the divide-by-32 divider between the third PLL output and the high-frequency loop, the close-in phase-noise requirement of this PLL would in fact be 30 dB more stringent (−92 dBc/Hz) than that of the low-frequency loop. However, since the division ratio is only 128, it would offer an 8.7-dB relaxation in the requirement. The remaining 21.3-dB phase-noise suppression could be achieved by increasing the charge-pump current of the third PLL.
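The numbers in this paragraph follow from simple ratio arithmetic, sketched below in Python. The helper names are hypothetical, and the $20\log_{10}$ scaling of close-in phase noise with division ratio is the usual PLL rule of thumb, assumed here rather than derived.

```python
import math


def cascaded_div2_ratio(stages):
    """Overall division ratio of a chain of divide-by-2 stages."""
    return 2 ** stages


def ratio_relaxation_db(n_old, n_new):
    """Close-in phase-noise relaxation from lowering the division ratio."""
    return 20.0 * math.log10(n_old / n_new)


ratio = cascaded_div2_ratio(7)                 # seven divide-by-2 stages
f_ref2_mhz = 1.6 * ratio                       # second reference from 1.6 MHz
relaxation_db = ratio_relaxation_db(349, ratio)
remaining_db = 30.0 - relaxation_db            # left for the charge pump

print(ratio)                    # 128
print(round(f_ref2_mhz, 1))     # 204.8
print(round(relaxation_db, 1))  # 8.7
print(round(remaining_db, 1))   # 21.3
```

The comparison uses 349, the largest division ratio of the low-frequency loop, as the baseline, which is the case that reproduces the 21.3-dB figure quoted in the text.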


TABLE III PERFORMANCE COMPARISON OF RECENT WORK ON FULLY INTEGRATED FREQUENCY SYNTHESIZERS

In addition to the close-in phase noise, the far-offset phase-noise requirement would also be more stringent by the same amount. Assuming that VCO1 is adopted in the third PLL, its phase noise could be improved to −116 dBc/Hz at 600 kHz when operating at 204 MHz. With the 30.6-dB filtering effect of the high-frequency loop, the phase-noise contribution of the VCO of the third PLL would become −146.6 dBc/Hz at 600 kHz, which would have a negligible effect on the overall phase noise. Basically, the implementation of the third PLL would be similar to that of the low-frequency loop, and its chip area would be less than 10% of the total area of the dual-loop synthesizer. Since the third VCO operates at half the frequency of VCO1, its power consumption would be 25% of that of VCO1 (2.5 mW). Similarly, since the divide-by-128 divider also operates at half the frequency, it would consume only half the power of the corresponding divider (0.3 mW). Although the charge-pump current should be increased by 100 times to 640 µA, the average current would be only 64 µA since the turn-on time is only around 1/10 of the input period. In conclusion, by introducing the third PLL to generate the second reference signal, the additional power required would be less than 3 mW, and the increase in the total chip area would still be less than 10%.

VI. CONCLUSION

A 900-MHz monolithic CMOS dual-loop frequency synthesizer with good phase-noise performance for GSM receivers is presented. Compared to other fully integrated synthesizer designs, the proposed synthesizer operates at a much lower supply voltage and consumes approximately the same power after frequency normalization. Implemented in a standard 0.5-µm CMOS technology and at a 2-V supply voltage, the synthesizer has a power consumption of 34 mW. At 900 MHz, the measured phase noise is −121.8 dBc/Hz at 600-kHz frequency offset, and the spurious level is −82 dBc at 11.3 MHz. Due to the substrate coupling and testing setup, additional spurious levels are measured to be −79.5 dBc at 1.6 MHz and −82.8 dBc at 16 MHz. The chip area is less than 2.64 mm². Even if a third PLL is implemented to generate the second reference frequency, the increase in the total power consumption and the total chip area would be negligibly small.

REFERENCES

[1] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[2] A. Ali and J. L. Tham, “A 900-MHz frequency synthesizer with integrated LC voltage-controlled oscillator,” in Proc. IEEE Int. Solid-State Circuits Conf., vol. 1, pp. 390–391, 1996.
[3] J. F. Parker and D. Ray, “A 1.6-GHz CMOS PLL with on-chip loop filter,” IEEE J. Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[4] “Digital cellular telecommunications system (Phase 2+); Radio transmission and reception (GSM 5.05),” European Telecommunications Standards Institute, 1996.
[5] J. Craninckx and M. Steyaert, Wireless CMOS Frequency Synthesizer Design. Norwell, MA: Kluwer, 1998, pp. 201–202.
[6] T. Aytur and J. Khoury, “Advantages of dual-loop frequency synthesizers for GSM applications,” in Proc. IEEE Int. Symp. Circuits and Systems, 1997.
[7] C. H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-µm CMOS,” in Proc. Symp. VLSI Circuits, 1998.
[8] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Phase noise in multi-gigahertz CMOS ring oscillators,” in Proc. IEEE 1998 Custom Integrated Circuit Conf., 1998, pp. 49–52.
[9] “Oscillator noise analysis in SpectreRF, application note to SpectreRF,” CADENCE, 1998.
[10] R. B. Merrill, T. W. Lee, H. You, R. Rasmussen, and L. A. Moberly, “Optimization of high-Q integrated inductors for multilevel metal CMOS,” in Proc. Int. Electronic Device Meeting, 1995, pp. 983–986.
[11] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[12] J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, pp. 62–70, Feb. 1989.
[13] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall, 1997.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

[14] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[15] I. A. Young, J. K. Greason, and K. L. Wong, “PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[16] W. S. T. Yan, “A 2-V 900-MHz monolithic CMOS dual-loop frequency synthesizer for GSM receivers,” M.Phil. thesis, Hong Kong University of Science and Technology, 1999. [Online]. Available: http://www.ee.ust.hk/~eetak
[17] T. Fredrich, “Direct phase noise measurements using a modern spectrum analyzer,” Microwave J., vol. 35, pp. 94–114, Aug. 1992.

William S. T. Yan received the Bachelor and Master’s degrees in electrical and electronics engineering from the Hong Kong University of Science and Technology, Hong Kong, in 1996 and 1999, respectively. He is currently with Maxim Integrated Products, Sunnyvale, CA. His current interests are in the areas of high-frequency integrated-circuit design.

Howard C. Luong (M’91) received the B.S. (high honors), M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1988, 1990, and 1994, respectively. For his Master’s thesis, he worked on MOS analog multipliers with scaling technologies. For his Ph.D. dissertation, he designed and fabricated a superconductive flash-type analog-to-digital converter that operated at multi-gigahertz clock and input frequencies. Since September 1994, he has been with the electrical and electronics engineering faculty at the Hong Kong University of Science and Technology, where he has been the Faculty-In-Charge of the Analog Research Lab and the Associate Director of the EEE Undergraduate Program Committee. His research interests are in high-performance analog and RF integrated circuits for wireless and portable communications. Dr. Luong has served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II. He received the Faculty Teaching Excellence Appreciation Award from the Hong Kong University of Science and Technology School of Engineering in 1995, 1996, and 2000.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

A 27-mW CMOS Fractional-N Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation

Michael H. Perrott, Student Member, IEEE, Theodore L. Tewksbury III, Member, IEEE, and Charles G. Sodini, Fellow, IEEE

Abstract— A digital compensation method and key circuits are presented that allow fractional-N synthesizers to be modulated at data rates greatly exceeding their bandwidth. Using this technique, a 1.8-GHz transmitter capable of digital frequency modulation at 2.5 Mb/s can be achieved with only two components: a frequency synthesizer and a digital transmit filter. A prototype transmitter was constructed to provide proof of concept of the method; its primary component is a custom fractional-N synthesizer fabricated in a 0.6-µm CMOS process that consumes 27 mW. Key circuits on the custom IC are an on-chip loop filter that requires no tuning or external components, a digital MASH Σ–Δ modulator that achieves low power operation through pipelining, and an asynchronous, 64-modulus divider (prescaler). Measurements from the prototype indicate that it meets performance requirements of the digital enhanced cordless telecommunications (DECT) standard.

Index Terms— Compensation, continuous phase modulation, digital radio, frequency modulation, frequency shift keying, frequency synthesizers, phase locked loops, sigma–delta modulation, transmitters.

I. INTRODUCTION

THE use of wireless products has been rapidly increasing over the last few years, and there has been worldwide development of new systems to meet the needs of this growing market. As a result, new radio architectures and circuit techniques are being actively sought that achieve high levels of integration and low power operation while still meeting the stringent performance requirements of today’s radio systems. Our focus is on the transmitter portion of this effort, with the objective of achieving over 1-Mb/s data rate using frequency modulation. To achieve the goals of low power and high integration, it seems appropriate to develop a transmitter architecture that consists of the minimal topology that accomplishes the required functionality. All digital, narrowband radio transmitters that are spectrally efficient require two operations to be performed. The baseband modulation data must be filtered to limit the extent of its spectrum, and the resulting signal must be translated to a desired RF band.

Manuscript received July 7, 1997; revised August 4, 1997. This work was supported by DARPA Contract DAAL-01-95-K-3526. M. H. Perrott was with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with Hewlett-Packard Laboratories, Palo Alto, CA 94304-1392 USA. T. L. Tewksbury III was with Analog Devices, Wilmington, MA 01887 USA. He is now with IBM Microelectronics, Waltham, MA 02254 USA. C. G. Sodini is with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. Publisher Item Identifier S 0018-9200(97)08270-X.

Fig. 1. Methods of frequency modulation upconversion: (a) mixer based, (b) direct modulation of VCO, and (c) indirect modulation of VCO.

This paper will focus on the issue of frequency translation, which can be accomplished in at least three different ways for frequency modulation. As illustrated in Fig. 1, the modulation signal can be (a) multiplied by a local oscillator (LO) frequency using a mixer, (b) fed into the input of a voltage controlled oscillator (VCO), or (c) fed into the input of a frequency synthesizer. Approach (a) can theoretically be accomplished with either a heterodyne or homodyne approach. The heterodyne approach offers excellent radio performance but carries a high cost in implementation due to the current inability to integrate the high-Q, low-noise, low-distortion bandpass filters required at intermediate frequencies (IF) [1]. As a result, the direct conversion approach has recently grown in popularity [2]–[4]. In this case, two mixers and baseband A/D converters are required to form in-phase/quadrature (I/Q) channels and a frequency synthesizer to obtain an accurate carrier frequency. Approach (b) is referred to as direct modulation of a VCO and has appeared in designs for the digital enhanced cordless telecommunications (DECT) standard [5], [6]. A frequency synthesizer is used to achieve an accurate frequency setting and then disconnected so that modulation can be fed into the

0018–9200/97$10.00 © 1997 IEEE

PERROTT et al.: CMOS FRACTIONAL-N SYNTHESIZER USING DIGITAL COMPENSATION

VCO unperturbed by its dynamics. This technique allows a significant reduction in components; no mixers are required since the VCO performs the frequency translation, and only one D/A converter is required to produce the modulation signal. Power savings are thus achieved, as demonstrated by the fact that the design in [5] appears to consume nearly half the power of the mixer-based designs in [2]–[4]. Unfortunately, since the synthesizer is inactive during modulation, the nominal frequency setting of the VCO tends to drift as a result of leakage currents. In addition, undesired perturbations, such as the turn-on transient of the power amp, can dramatically shift the output frequency. As stated in [6], the isolation requirements for this method exclude the possibility of a one-chip solution. Therefore, while the approach offers a significant advantage in terms of power dissipation, the goal of high integration is lost.

Finally, approach (c) can be viewed as indirect modulation of the VCO through appropriate control of a frequency synthesizer that sets the VCO frequency, and it yields the simplest transmitter solution of those presented. The synthesizer has a digital input, which allows elimination of the D/A converter that is required when directly modulating the VCO. Since the synthesizer controls the VCO during modulation, the problem of frequency drift during modulation is eliminated. Also, isolation requirements at the VCO input are greatly reduced at frequencies within the PLL bandwidth. The primary obstacle faced with this architecture is that a severe constraint is placed on the maximum achievable data rate due to the reliance on feedback dynamics to perform modulation.

This paper presents a compensation method and key circuits that allow modulation of a frequency synthesizer at rates that are over an order of magnitude faster than its bandwidth. Application of the technique allows a high data rate (>1 Mb/s) transmitter with good spectral efficiency to be realized with only two components: a frequency synthesizer and a digital transmit filter. By avoiding additional components such as mixers and D/A converters in the modulation path, a low power transmitter architecture is achieved. Since off-chip filters are not required, high integration is accomplished as well. The technique can be used in transmitter applications where frequency modulation is desired, and a moderate tolerance is allowed on the modulation index. (When using compensation, the accuracy of the modulation index, which is defined as the ratio of the peak-to-peak frequency deviation of the transmitter output to its data rate, is limited by variations in the open-loop gain of the PLL [7].) To provide proof of concept of the technique, we present results from a 1.8-GHz prototype that supports Gaussian frequency shift keying (GFSK) modulation, the same modulation method used in DECT, at data rates in excess of 2.5 Mb/s.

We begin by reviewing a fractional-N synthesizer method presented in [8]–[11] that provides a convenient structure with which to apply the technique. It is shown that high data rates and good noise performance are difficult to achieve with this topology. A method is proposed to overcome these problems, followed by discussion of issues that ensue from its use. A description of key circuits in the prototype is then given, which include an on-chip loop filter, a 64-modulus divider,

Fig. 2. A spectrally efficient, fractional-N modulator.

and a pipelined, digital Σ–Δ modulator. Finally, experimental results are presented and conclusions made.

II. BACKGROUND

The fractional-N approach to frequency synthesis enables fast dynamics to be achieved within the phase-locked loop (PLL) by allowing a high reference frequency [8], a very useful benefit when attempting to modulate the synthesizer. High resolution is achieved with this approach by allowing noninteger divide values to be realized through dithering; it has been shown that low spurious noise can be obtained by using a high-order Σ–Δ modulator to perform this operation [8], [10], [11]. This approach leads to a simple synthesizer structure that is primarily digital in nature, and is referred to as a fractional-N synthesizer with noise shaping.

Using this fractional-N approach, it is straightforward to realize a transmitter that performs phase/frequency modulation in a continuous manner by direct modulation of the synthesizer. Fig. 2 illustrates a simple transmitter capable of Gaussian minimum shift keying (GMSK) modulation [9]. The binary data stream is first convolved with a digital finite impulse response (FIR) filter that has a Gaussian shape. (Physical implementation of this filter can be accomplished with a ROM whose address lines are controlled by consecutive samples of the data and time information generated by a counter.) The digital output of this filter is then summed with a nominal divide value and fed into the input of a digital Σ–Δ converter, the output of which controls the instantaneous divide value of the PLL. The nominal divide value sets the carrier frequency, and variation of the divide value causes the output frequency to be modulated according to the input data. Assuming that the PLL dynamics have sufficiently high bandwidth, the characteristics of the modulation waveform are determined primarily by the digital FIR filter and thus accurately set.

Fig. 3 depicts a linearized model of the synthesizer dynamics in the frequency domain. The digital transmit filter confines the modulation data to low frequencies, the Σ–Δ modulator adds quantization noise that is shaped to high frequencies, and the PLL acts as a low-pass filter that passes the input but attenuates the Σ–Δ quantization noise. In the figure, the PLL transfer function, G(f), is

Fig. 3. Linearized model of fractional-N modulator.

calculated as

(1)

whose terms are the loop filter transfer function, the VCO gain (in Hz/V), and the nominal divide value. (See [7] for modeling details.) An analogy between the fractional-N modulator and a Σ–Δ D/A converter can be made by treating the output frequency of the PLL as an analog voltage.

A key issue in the system is that the Σ–Δ modulator adds quantization noise at high frequency offsets from the carrier. In general, the noise requirements for a transmitter are very strict in this range to avoid interfering with users in adjacent channels. In the case of the DECT standard, the phase noise density can be no higher than −131 dBc/Hz at a 5-MHz offset [6]. Noise at low frequency offsets is less critical for a transmitter and need only be below the modulation signal by enough margin to ensure an adequate signal-to-noise ratio.

Sufficient reduction of the Σ–Δ quantization noise can be accomplished through proper choice of the Σ–Δ sample rate, which is assumed to be equal to the reference frequency, and the PLL transfer function, G(f). (Note that this problem is analogous to that encountered in the design of Σ–Δ D/A converters, except that the noise spectral density at high frequencies, rather than the overall signal-to-noise ratio, is the key parameter.) One way of achieving a low spectral density for the noise is to use a high sample rate for the Σ–Δ so that the quantization noise is distributed over a wide frequency range and its spectral density reduced. Alternatively, the attenuation offered by G(f) can be increased; this is accomplished by decreasing its cutoff frequency, f_o, or increasing its order. Unfortunately, a low value of f_o carries a penalty of lowering the achievable data rate of the transmitter. This fact can be observed from Fig. 3; the modulation data must pass through the dynamics of the PLL, so its bandwidth is restricted by that of G(f). Given this constraint, low noise must be achieved through proper setting of the Σ–Δ sample rate and PLL order.

It is worthwhile to quantify required values of these parameters for a given data rate and noise specification. To do so, we first choose G(f) to be a Butterworth response of a given order. This choice is made for the sake of simplicity in calculations; other filter responses could certainly be implemented. The spectral density of the noise at the transmitter output due to quantization noise is expressed as

(2)

where T is the Σ–Δ sample period, and a multistage (MASH) structure [12] is assumed for the modulator. By choosing the order of the MASH Σ–Δ to be the same as the order of G(f), the rolloff of (2) and the VCO noise are matched at high frequencies (−20 dB/dec).

Fig. 4 displays the resulting parameters at different data rates; these values were calculated by setting (2) to −136 dBc/Hz at 5 MHz and the ratio of the PLL cutoff frequency to the data rate to 0.7. (The noise specification was chosen to achieve less than −131 dBc/Hz at 5 MHz offset after adding in VCO phase noise.) The figure reveals that the achievement of high data rates and low noise must come at the cost of power dissipation and complexity when attempting direct modulation of the synthesizer. In particular, the power consumed by the digital circuitry is increased at a high Σ–Δ sample rate by virtue of the increased clock rate of the Σ–Δ modulator and the digital FIR filter. The power consumed by the analog section is increased for high values of PLL order since additional poles and zeros must be implemented. This issue is aggravated by the need to set these additional time constants with high accuracy in order to avoid stability problems in the PLL. If tuning circuits are used to achieve such accuracy [13], spurious noise problems can also be an issue.

Fig. 4. Achievable data rates versus PLL order and Σ–Δ sample rate when noise from Σ–Δ is −136 dBc/Hz at 5 MHz offset.

III. PROPOSED METHOD

The obstacles of high data rate modulation discussed above are greatly mitigated if the modulation bandwidth is allowed to exceed that of the PLL. In this case, the bandwidth of G(f) can be set sufficiently low that an excessively high PLL order or reference frequency is not necessary to achieve the required noise performance. Fig. 5 illustrates the proposed method that achieves this goal. By cascading a compensation filter with the digital FIR filter, the transfer function seen by the modulation data can be made flat by setting the compensation filter equal to the inverse of G(f).


TABLE I THEORETICALLY ACHIEVABLE DATA RATES USING COMPENSATION FOR SECOND-ORDER PLL

Fig. 5. Proposed compensation method.
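The flattening idea can be illustrated numerically. The sketch below is not the authors' implementation: it assumes a generic second-order low-pass form for G(f), an illustrative cutoff of 84 kHz, and Q = 1/√2, and the function names are hypothetical. It shows that a compensation filter equal to 1/G(f) gives a flat cascade when matched, and that a 10% error in the actual PLL bandwidth leaves a residual gain error at high frequencies.

```python
import math

def pll_response(f_hz, f_o_hz, q):
    """Assumed second-order low-pass: G(f) = 1 / (1 + (jf/f_o)/Q + (jf/f_o)^2)."""
    s = 1j * f_hz / f_o_hz
    return 1.0 / (1.0 + s / q + s * s)

def compensation(f_hz, f_o_hz, q):
    """Compensation filter taken as the inverse of the nominal PLL response."""
    return 1.0 / pll_response(f_hz, f_o_hz, q)

F_O_NOMINAL = 84e3          # illustrative cutoff frequency (assumption)
Q = 1.0 / math.sqrt(2.0)    # illustrative quality factor (assumption)
freqs = [10e3, 100e3, 1e6]  # frequencies at which the cascade is evaluated

# Matched case: the PLL behaves exactly as the compensation filter assumes.
matched = [
    abs(compensation(f, F_O_NOMINAL, Q) * pll_response(f, F_O_NOMINAL, Q))
    for f in freqs
]

# Mismatched case: the actual PLL bandwidth is 10% higher than assumed.
mismatched = [
    abs(compensation(f, F_O_NOMINAL, Q) * pll_response(f, 1.1 * F_O_NOMINAL, Q))
    for f in freqs
]

for f, m, mm in zip(freqs, matched, mismatched):
    print(f"{f / 1e3:8.0f} kHz  matched |C*G| = {m:.4f}  mismatched |C*G| = {mm:.4f}")
```

In the matched case the cascade magnitude stays at unity across the band; in the mismatched case it departs from unity well above the cutoff, which is one way to picture why the PLL dynamics must track the fixed digital filter.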

This new filter is simple to implement in a digital manner: by combining it with the FIR filter, we need only alter the ROM storage values. In fact, savings in area and power of the ROM can be achieved over the uncompensated method since the number of time samples that need to be stored is dramatically reduced [7]. To illustrate the technique, we consider the case where the transmit filter is chosen to implement GFSK modulation with BT_d = 0.5, as used in the DECT standard, and G(f) is second order. Under these assumptions, the time-domain version of the transmit filter is described as samples of

(3)

where “*” is the convolution operator, T is the Σ–Δ sample period, T_d is the period of the data stream, and the rectangular pulse in (3) is nonzero only over a finite interval and zero elsewhere. Since the compensation filter is the inverse of G(f), we write

(4)

In the time domain, the digital compensated FIR filter is then calculated by taking samples of the expression

(5)

For the transmit filter described in (3), these derivatives are well defined and can be calculated analytically. A final form is derived by substituting (3) into (5) to yield

(6)

A. Achievable Data Rates

Equation (6) reveals that the signal swing of the compensated modulation signal increases with the ratio of the modulation data rate to the bandwidth of the PLL when that ratio is large. We therefore see that high data rates lead to large signal swings of the modulation signal when using compensation. Intuitively, this behavior makes sense since the attenuation of G(f) must be overcome by the compensated signal. If the order of G(f) is increased, the resulting signal swings will be amplified further. The achievable data rates using compensation are limited by the ability of the PLL to accommodate this increased signal swing. PLL components that are particularly affected are the Σ–Δ modulator, the divider, and the charge pump. Assuming an appropriate multibit Σ–Δ structure and multimodulus divider topology are used, the bottleneck in dynamic range will be set by the limited duty cycle range of the charge pump.

Table I displays the achievable data rates at different Σ–Δ sample rates using compensation; the noise specification was identical to that used to generate Fig. 4. In light of the signal swing limitation and our goal of simplicity, we have restricted our attention to second-order PLL dynamics. Calculations were based on the assumption that the duty cycle of the charge pump is limited only by its transient response, which was assumed to be 5 ns. Comparison of this information with Fig. 4 reveals that compensation allows high data rates to be achieved with relatively low power and complexity. In the actual prototype, data rates as high as 2.85 Mb/s are achieved with a second-order PLL with f_o = 84 kHz and a Σ–Δ sample rate of 20 MHz.

Fig. 6. The effect of mismatch.

B. Matching Issues

In practice, mismatch will occur between the compensation filter and PLL dynamics. While the compensation filter is digital and therefore fixed, the PLL dynamics are analog in nature and sensitive to process and temperature variations. Fig. 6 illustrates that a parasitic pole/zero pair occurs when the bandwidth of the PLL is too high; a similar situation occurs when its bandwidth is too low. As will be seen in the results sections, the parasitic pole/zero pair causes intersymbol interference (ISI) and modulation deviation error. To mitigate this problem in the prototype, an on-chip loop filter with accurate time constants was implemented, and open-loop gain


Fig. 7. Prototype system.

control was used to accurately place the overall pole and zero positions of the PLL transfer function.

An additional issue related to mismatch arises from practical concerns in the PLL implementation. The achievement of a large dynamic range in the charge pump is aided by including an integrator in the loop filter (see Section IV-B2), which yields an overall PLL transfer function as

(7)

A parasitic pole/zero pair is now added that occurs well below f_o in frequency. Unfortunately, taking the inverse of (7) leads to a compensation filter that is IIR in nature and cannot be implemented with a ROM. To avoid such difficulties, we can ignore the parasitic pole/zero pair and use the compensation filter as described in (4). The resulting ISI is negligible since the parasitic pole and zero are close to each other and low in value. However, the digital compensation filter must be scaled to accommodate the increased gain of the PLL transfer function at frequencies above the parasitic pole/zero pair.

Fig. 8. Die photo.

TABLE II POWER DISSIPATION OF IC CIRCUITS

IV. IMPLEMENTATION

To show proof of concept of the proposed compensation method, the system depicted in Fig. 7 was built using a custom CMOS fractional-N synthesizer that contains several key circuits. Included are an on-chip, continuous-time filter that requires no tuning or external components, a digital MASH Σ–Δ modulator with six output bits that achieves low power operation through pipelining, and a 64-modulus divider that supports any divide value between 32 and 63.5 in half-cycle increments. An external divide-by-two prescaler is used so that the CMOS divider input operates at half the VCO frequency, which modifies the range of divide values to include all integers between 64 and 127.
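The divide-value ranges quoted above can be enumerated directly. The short sketch below uses illustrative variable names and only restates the arithmetic in the text: 64 values from 32 to 63.5 in steps of one half, which become the integers 64 through 127 when referred to the VCO through the external divide-by-two prescaler.

```python
# Divide values stated for the on-chip divider: 32 to 63.5 in half-cycle steps.
on_chip_values = [32 + 0.5 * k for k in range(64)]

# The external divide-by-two prescaler doubles each value when referred to the VCO.
vco_referred_values = [int(2 * v) for v in on_chip_values]

# Six modulator output bits can select 2**6 = 64 distinct divide values.
n_selectable = 2 ** 6

print(on_chip_values[0], on_chip_values[-1], len(on_chip_values))
print(vco_referred_values[0], vco_referred_values[-1])
```

The match between the 64 moduli and the six modulator output bits is an observation from the numbers given, not a statement about how the control word is wired.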

Fig. 8 displays a die photograph of the custom IC, which was fabricated in a 0.6-µm, double-poly, double-metal CMOS process. The entire die is 3 mm by 3 mm, and its power dissipation is 27 mW. Table II lists the power consumed by individual circuits. The power supply values given in Table II were chosen to be as low as possible to minimize power dissipation; at the cost of higher power dissipation, all circuits could be powered by a single 3.3-V supply.


Fig. 10. An asynchronous, 64-modulus divider implementation.

Fig. 9. An asynchronous, eight-modulus divider topology.
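In a cascade of divide-by-2/3 stages such as the eight-modulus topology of Fig. 9, each stage divides by two unless its control bit is set, in which case it swallows one extra input cycle per output period. Because stage i sees an input already divided by 2^i, the swallowed cycles are binary weighted when counted at IN. The sketch below is a behavioral formula only, with hypothetical names, and does not model the asynchronous circuit timing.

```python
from itertools import product

def divide_value(control_bits):
    """Divide ratio of a cascade of divide-by-2/3 stages.

    control_bits[i] is the control bit of stage i (stage 0 is driven by IN).
    With all bits at zero every stage divides by two, giving 2**n overall.
    Setting bit i swallows 2**i extra IN cycles per OUT period.
    """
    n_stages = len(control_bits)
    swallowed = sum(bit << i for i, bit in enumerate(control_bits))
    return 2 ** n_stages + swallowed

# Enumerate every control setting of a three-stage (eight-modulus) divider.
values = sorted(divide_value(bits) for bits in product((0, 1), repeat=3))
print(values)
```

Three stages therefore cover a contiguous block of eight integer divide values, starting at 2^3.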

The 64-modulus divider and six-output-bit Σ–Δ modulator provide a dynamic range for the compensated modulation data that is wide enough to support data rates in excess of 2.5 Mb/s. The on-chip loop filter allows an accurate PLL transfer function to be achieved by tuning just one PLL parameter: the open-loop gain. A brief overview of each of these components is now presented.

A. Divider

To achieve a low-power design, it is desirable to use an asynchronous divider structure to minimize the amount of circuitry operating at high frequencies. As such, a multimodulus divider structure was designed that consists of cascaded divide-by-2/3 sections [14]; this architecture is an extension of the common dual-modulus topology [15]. The eight-modulus example in Fig. 9 shows the proposed structure, which allows a wide range of divide values to be achieved by allowing a variable number of input cycles to be “swallowed” per output cycle. Each divide-by-2/3 stage normally divides its input by two in frequency, but will swallow an extra cycle per OUT period when its control input is set to one. As shown for the case where all control bits are set to one, the number of IN cycles swallowed per OUT period is binary weighted according to the stage position. For instance, setting the control bit of the first stage causes one cycle of IN to be swallowed, while setting the control bit of the third stage causes four cycles of IN to be swallowed. Proper selection of the control bits allows any integer divide value between 8 and 15 to be achieved.

The 64-modulus divider that was developed for the prototype system uses a similar principle to that discussed above, but has a modified first stage to achieve high-speed operation. Specifically, the implemented architecture consists of a high-speed divide-by-4/5/6/7 state machine followed by a cascaded chain of divide-by-2/3 state machines, as illustrated in Fig. 10. The divide-by-4/5/6/7 stage accomplishes cycle swallowing by shifting between four phases of a divide-by-two circuit. Each of the four phases is staggered by one IN cycle, which allows single-cycle pulse swallowing resolution despite the fact that two cascaded divide-by-two structures are used; details of this approach are discussed at length in [7]. The important point to make about the phase shifting approach, which is also advocated in [15], is that it allows a minimal number of components to operate at high frequencies; the first two stages are simply divide-by-two circuits, not state machines. Also, the fact that control signals are not fed into the first divide-by-two circuit allows it to be placed off-chip in the prototype.

B. Analog Section

The achievement of accurate PLL dynamics is accomplished in the prototype system with the variable gain loop filter topology depicted in Fig. 11. The input to the filter is the instantaneous phase error between the reference frequency and divider output and is manifested as the deviation of the phase frequency detector (PFD) output duty cycle from its nominal value of 50%. As modulation data is applied, the duty cycle is swept across a range of values; the shaded region in the figure corresponds to the deviation that occurs when GFSK modulation at 2.5 Mb/s is applied. A 50% nominal duty cycle is desired to avoid the dead-zone of the PFD and thus reduce distortion of the modulation signal. The prototype used a PFD design from [16] to achieve this characteristic.

To produce a signal that is a filtered version of the phase error, the output of the PFD is converted to complementary current waveforms by a charge pump before being sent into the inputs of an on-chip loop filter. The conversion to current allows the filtering operation to be performed without resistors and also provides a convenient means of performing gain control of the resulting transfer function. An integrator is included in the loop filter which forces the average current from the charge pump to be zero and the nominal duty cycle to be, ideally, 50% when the PLL is locked.

Fig. 11. PFD, charge pump, and loop filter.
A PFD design with 50% nominal duty cycle is seldom used in PLL circuits due to power consumption and spurious noise issues—the charge pump is always driving current into the loop filter under such conditions, and spurs at multiples of the reference frequency are produced due to the square wave output of the PFD. Fortunately, these problems are greatly mitigated in the prototype transmitter since the charge pump

Fig. 12. Loop filter implementation.

Fig. 13. Effect of transient time and mismatch on duty cycle range.

output current is very small (at its largest setting, it toggles between −3.5 and 3.5 μA), and the loop filter bandwidth is very low (84 kHz) in comparison to the reference frequency (20 MHz). The resulting spur at the transmitter output is less than −60 dBc at 20 MHz when the transmitter is measured in an unmodulated state without an RF bandpass filter at its output. When modulated, this spur is convolved with the modulation signal and thus turned into phase noise [7]; it is reasonable to assume that this noise is reduced to a negligible level when the RF bandpass filter is included, because it lies at a high frequency offset.

1) Loop Filter: The on-chip loop filter uses an opamp to integrate one of the currents and add it to a first-order filtered version of the other current. This topology, shown in Fig. 12, realizes the transfer function

$$H(s) = K\,\frac{1 + s/(2\pi f_z)}{s\,\bigl(1 + s/(2\pi f_p)\bigr)} \tag{8}$$

The open loop gain is adjusted by varying the charge pump output current. The first-order pole is created using a switched capacitor technique, which reduces its sensitivity to thermal and process variations and removes any need for tuning. Note that, although this time constant is formed through a sampling operation, the output of the switched capacitor filter is a continuous-time signal. Finally, the value of the zero is determined primarily by a ratio of capacitors, under the assumption that the complementary charge pump currents are matched. A particular advantage of the filter topology is that the sampling rate of the switched capacitors can be set high since it is independent of the settling dynamics of the opamp. As such, the sampling clocks are set to the PFD output frequency, 20 MHz, to avoid aliasing problems. The opamp is realized with a single-ended, two-stage topology chosen for its simplicity and wide output swing. Its unity gain frequency was designed to be 6 MHz; this value is sufficiently higher than the bandwidth of the GFSK modulation signal at 2.5 Mb/s to avoid significantly affecting it. It is recognized that the single-ended structure has higher sensitivity to substrate noise than a differential counterpart. However, little would be gained in this case by making it fully

differential since the output of the opamp is connected directly to the varactor of an LC-based VCO, which is inherently single ended. Fortunately, measured eye diagrams and spectral plots presented at the end of this paper conform to calculations that exclude substrate noise, thereby showing that it has negligible impact on the modulation and noise performance of the prototype system. However, as even higher levels of integration are sought in future radio systems, the impact of substrate noise will need to be carefully considered. The limited dynamics of the opamp prevent it from following the fast transitions of its input current waveforms. To prevent these waveforms from adversely affecting the performance of the opamp, the voltage swing that appears at its input terminals is reduced to a low amplitude (less than 40 mV peak-to-peak) by capacitors connected at those terminals; one of these capacitors also serves as part of the switched capacitor filter.

2) Charge Pump: Proper design of the charge pump is critical for the achievement of high data rates since it forms the bottleneck in dynamic range that is available to the modulation signal. Fig. 13 illustrates the fundamental issues that need to be considered in its design. To avoid distortion of the modulation signal, the variation in duty cycle should be limited to a range that allows the output of the charge pump to settle close to its final value following all positive and negative transitions. Fig. 13(a) shows the dynamic range available for a well-designed charge pump; the nominal duty cycle is 50% and the transition times are fast. Fig. 13(b) demonstrates the reduction in dynamic range that occurs when the nominal duty cycle is offset from 50%. This offset is caused by a mismatch between positive and negative currents produced by the charge pump. (The type II PLL dynamics force an average current of zero.) Finally, Fig.
13(c) illustrates a case in which the charge pump has slow transition times, the result again being a reduction in dynamic range. The charge pump topology was designed with the above issues in mind and is illustrated in Fig. 14. The core component of the architecture is a differential pair that is fed from the top by two current sources and from the bottom by a tail current. Ideally, the two top currents are equal to each other and the tail current is set in proportion to them, where the common current value is adjusted by a 5-b D/A that

PERROTT et al.: CMOS FRACTIONAL-N SYNTHESIZER USING DIGITAL COMPENSATION

Fig. 14. Charge pump implementation.

Fig. 15. A second-order, digital MASH structure.

controls the corresponding bias node. The transistors of the differential pair are switched on and off according to the PFD output, which ideally causes the two charge pump output currents to switch between equal positive and negative values. To achieve a close match between the positive and negative currents of each charge pump output, the design strives to match the two top currents to each other and to match the tail current to them. In the first case, the top current sources are implemented as cascoded PMOS devices whose layout is optimized to achieve high levels of device matching. Unfortunately, device matching cannot be used to achieve a close match between the top and tail currents since they are generated by different types of devices. To circumvent this obstacle, a feedback stage is used to adjust one of them by comparing currents produced by a replica stage. This technique allows the two to be matched to the extent that the replica stage is matched to the core circuit. A low transient time in the charge pump response is obtained by careful design of signal and device characteristics at the common source node of the differential pair. First, the parasitic capacitance at this node is minimized by using appropriate layout techniques to reduce the source capacitance of the differential pair transistors, the drain capacitance of the tail current device, and the interconnect capacitance between the devices. Second, the voltage deviation that occurs at this node when the pair switches is minimized. The level converter depicted in Fig. 14 accomplishes this task by reducing the voltage variation at the inputs of the differential pair to less than 350 mV and setting an appropriate dc bias.
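The duty-cycle argument behind Fig. 13 can be made concrete with a small numerical sketch. It assumes an idealized two-level charge pump output and a simple guard band equal to the transition time; the function names, the guard-band model, and the current and timing values are illustrative assumptions, not taken from the prototype.

```python
def nominal_duty_cycle(i_pos, i_neg):
    """Duty cycle at which the average charge pump current is zero.

    Assumed model: the output is +i_pos for a fraction d of the reference
    period and -i_neg for the rest, and the type II loop forces
    d * i_pos - (1 - d) * i_neg = 0.
    """
    return i_neg / (i_pos + i_neg)


def usable_duty_swing(i_pos, i_neg, t_transition, t_ref):
    """Largest symmetric duty-cycle deviation that still lets each pulse settle.

    Assumed model: every pulse must stay at least t_transition wide.
    """
    d0 = nominal_duty_cycle(i_pos, i_neg)
    guard = t_transition / t_ref
    return max(0.0, min(d0, 1.0 - d0) - guard)


T_REF = 1 / 20e6  # period of the 20-MHz reference quoted in the text

# Illustrative cases: matched and fast, mismatched currents, slow transitions.
matched = usable_duty_swing(3.5e-6, 3.5e-6, 2e-9, T_REF)
mismatched = usable_duty_swing(3.5e-6, 3.0e-6, 2e-9, T_REF)
slow = usable_duty_swing(3.5e-6, 3.5e-6, 10e-9, T_REF)
print(matched, mismatched, slow)
```

Under these assumptions, both a current mismatch and a slower transition shrink the usable swing relative to the matched, fast case, which is the qualitative behavior described for Fig. 13(b) and (c).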


Fig. 16. A pipelined adder topology.
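The carry-chain pipelining of Fig. 16 can be sketched as a cycle-by-cycle software model. This is a behavioral illustration only: the function name, the one-register-per-bit arrangement, and the way output bits are gathered are assumptions made for the sketch, not a description of the prototype circuit (which is pipelined every two bits).

```python
def pipelined_add(a_words, b_words, n_bits=3):
    """Add two streams of n_bits-wide words with a registered carry chain.

    Bit i of word k is processed in clock cycle k + i (the "pipe shift"),
    using the carry that stage i - 1 registered in the previous cycle.
    Gathering all bits of word k once cycle k + n_bits - 1 has finished
    plays the role of the "align shift". The final carry-out is discarded.
    """
    n_words = len(a_words)
    carry_reg = [0] * n_bits  # carry_reg[i]: carry out of stage i from the last cycle
    sums = [0] * n_words
    for cycle in range(n_words + n_bits - 1):
        next_carry = [0] * n_bits
        for i in range(n_bits):
            k = cycle - i  # index of the word whose bit i reaches stage i now
            if 0 <= k < n_words:
                a = (a_words[k] >> i) & 1
                b = (b_words[k] >> i) & 1
                c_in = carry_reg[i - 1] if i > 0 else 0
                sums[k] |= (a ^ b ^ c_in) << i
                next_carry[i] = (a & b) | (c_in & (a ^ b))
        carry_reg = next_carry  # clock edge: all carry registers update together
    return sums


print(pipelined_add([3, 5, 7], [1, 2, 1]))  # [4, 7, 0], i.e. sums modulo 8
```

Each stage only ever sees a one-bit full adder between registers, so the critical path no longer grows with word length; the cost is a latency of `n_bits - 1` extra cycles.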

C. Σ–Δ Modulator

Fig. 15 shows the second-order MASH Σ–Δ topology used in the prototype. This structure is well known [12] and has properties that are well suited to our transmitter application. The MASH topology is unconditionally stable over its entire input range and is readily pipelined by using a technique described in this section. The spectral density at the output of a second-order MASH Σ–Δ modulator is described by the equation

$$S_q(e^{j2\pi fT}) = \left(2\sin(\pi f T)\right)^4 S_r(e^{j2\pi fT}) \tag{9}$$

where $T$ is the sample period and $S_r$ is the spectral density of the quantization noise $r[k]$ of the second stage. In the presence of a sufficiently active input, $r[k]$ can be considered a white noise source with spectral density $S_r(e^{j2\pi fT}) = 1/12$.

Fig. 17. A pipelined, second-order, digital MASH structure.

Fig. 18. Pipelined digital data path to divider input.
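The second-order MASH structure of Fig. 15 can be sketched behaviorally as two cascaded accumulators whose carry outputs are combined. The sketch below is an unpipelined model under stated assumptions: the function name, the accumulator word length, and the exact delay alignment are chosen for illustration and are not taken from the prototype.

```python
def mash_1_1(x_seq, n_bits):
    """Second-order MASH built from two first-order accumulator stages.

    Each stage is an n_bits-wide accumulator; its carry-out is the stage
    output and its residue is the (negated) quantization error. The second
    stage accumulates the residue of the first, and its carry is passed
    through a first difference before being added to the first carry.
    Inputs are assumed to satisfy 0 <= x < 2**n_bits.
    """
    modulus = 1 << n_bits
    acc1 = acc2 = 0
    c2_prev = 0
    out = []
    for x in x_seq:
        c1, acc1 = divmod(acc1 + x, modulus)     # first stage: carry, residue
        c2, acc2 = divmod(acc2 + acc1, modulus)  # second stage fed by the residue
        out.append(c1 + c2 - c2_prev)            # y = c1 + (1 - z^-1) c2
        c2_prev = c2
    return out


n_samples = 4096
y = mash_1_1([5] * n_samples, n_bits=4)
print(sum(y) / n_samples)  # close to 5/16
```

In this model the output takes values between -1 and 2, and its running average tracks the input fraction while the second-stage error appears in the output only through a second difference, which is the shaping expressed by the sine-to-the-fourth-power factor for a second-order MASH.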

This assumption is reasonable while the modulation signal is applied; we have found that setting the least significant bit (LSB) of the modulator high also helps to achieve this condition by forcing the internal states of the MASH structure to change constantly. A fact that does not appear to have been appreciated in the literature is that the digital MASH Σ–Δ structure is highly amenable to pipelining. This is a useful technique when seeking a low power implementation, since the required throughput can then be achieved with lower circuit speed, which allows the supply voltage to be reduced. To pipeline the MASH structure, we apply a well-known technique that has been used for adders and accumulators [17], [18]. Fig. 16 illustrates a 3-b example. Since the critical path in these structures is their carry chain, registers are inserted in this path. To achieve time alignment between the input and the delayed carry information, registers are also used to skew the input bits. As indicated in the figure, we refer to this operation as “pipe shifting” the input. The adder output is realigned in time by performing an “align shift” of its bits as shown. (Note that shading is applied to the adder block in Fig. 16 as a reminder that its bits are skewed in time.) The same pipelining approach can be applied to digital accumulators since there is no feedback from higher to lower bits. Since its basic building blocks are adders and accumulators, a MASH Σ–Δ modulator of any order can be pipelined using this technique. Using the symbols introduced in the previous two figures, Fig. 17 depicts a pipelined, second-order MASH topology. Each first-order Σ–Δ stage is realized as a pipelined accumulator with feedback removed from the most significant bits in its output. The output of the second stage is fed into the

filter, which is implemented with two pipelined adders and a delay element. A delay is inserted between these two adders in order to pipeline their sum path, which requires a matching delay in the path above for time alignment. A delay must also be included in the output path of the first Σ–Δ stage to compensate for the time delay incurred through the second stage. Since a signal once placed in the “pipe shifted domain” can be sent through any number of cascaded, pipelined adders and/or integrators, only one pipe shift and align shift are needed in the entire structure. Fig. 18 illustrates the implementation of the overall digital path using pipelining. To save area, the circuits were pipelined every two bits as opposed to one, and pipe shifting was not applied to the carrier frequency signal since it is constant during modulation. To achieve flexibility, the compensated digital transmit filter was implemented in software, as opposed to a ROM, and the resulting digital data stream was fed into the custom CMOS IC.

V. MEASURED PERFORMANCE

The primary performance criteria by which a transmitter is judged are its accuracy in modulation and its noise performance. We now describe the characterization of the prototype in relation to these issues. Fig. 19 shows measured eye diagrams from the prototype using an HP 89441A modulation analyzer. To illustrate the impact of mismatch between the compensation filter and PLL dynamics, measurements were taken under three different values of open-loop gain. These results indicate that the modulation performance of the transmitter is quite good even when the open-loop gain is in error by 25%; the effects of

this gain error are to produce a moderate amount of ISI and an error in the modulation deviation. An explanation of the observed ISI and deviation error is given in [7]. In brief, the resulting mismatch creates a parasitic pole/zero pair that occurs near the cutoff frequency of the PLL (84 kHz in this case); the resulting transfer function seen by the data can be viewed as the sum of a low-pass and an all-pass filter. ISI is introduced as data excites the impulse response of the low-pass filter, and modulation deviation error occurs since the magnitude of the all-pass is changed according to the amount of mismatch present.

Fig. 19. Measured eye diagrams at 2.5 Mb/s for three different open-loop gain settings: (a) −25% gain error, (b) 0% gain error, and (c) 25% gain error.

Fig. 20 displays the dominant sources of noise in the prototype; their values are displayed in Table III. Many of these values were obtained through ac simulation of the relevant circuits in HSPICE. Note that all noise sources other than the Σ–Δ quantization noise are assumed to be white, so that the values of their variance suffice for their description. This assumption is only approximate for the VCO noise in the prototype, as will be seen in the measured data. Based on measurements, the input referred noise of the VCO was calculated in the table from the expression (10), which refers the measured phase noise (in dBc/Hz at a specified offset in MHz) to the VCO input, where the VCO gain is 30 MHz/V. The value of the kT/C noise current produced by the switched capacitor operation is calculated as in (11), where $k$ is Boltzmann’s constant and $T$ is temperature in kelvins.

Fig. 20. Expanded view of PLL system.

TABLE III. VALUES OF NOISE SOURCES WITHIN PLL

Assuming that each of the noise sources in Fig. 20 is independent of the others, we can express the overall phase noise spectral density at the transmitter output as the sum (12) of the noise contributions from the dominant voltage, current, and quantization noise sources. Based on the values in Table III and the model in Fig. 20, we obtain

(13) where

In the case of the charge pump noise contribution, the division by two is an approximation based on the fact that the dominant charge pump noise source is switched in and out at each opamp input with a nominal duty cycle of 50%. Note that one of the quantities in (13) is given by (2). A plot of the spectra in (13) is shown in Fig. 21(a). Computation of these spectra assumed the parameter values listed in Fig. 20 and Table III, and the loop dynamics described by (7). As seen in this diagram, the noise from the charge pump dominates at low frequencies, and the influence of the Σ–Δ quantization noise dominates at high frequencies.

Fig. 21. Noise spectra of synthesizer: (a) calculated: (1) charge pump induced, (2) VCO and opamp induced, (3) Σ–Δ induced, (4) overall; and (b) measured synthesizer and open-loop VCO noise.

Fig. 21(b) shows measured noise results from the transmitter prototype taken with an HP 3048A phase noise measurement system. (The spurious content of the Σ–Δ modulator was reduced to negligible levels by feeding a binary data stream into the LSB of the modulation path so that the internal states of the Σ–Δ modulator were randomized; the binary data stream was designed to have relatively flat spectral characteristics and negligible levels of spurious energy at frequencies greater than 10 kHz.) The resulting spectrum compares quite well with the calculated curve in Fig. 21(a), especially at high frequency offsets close to 5 MHz. At lower frequencies in the range of 100 kHz, the measured noise is within about 3 dB


of the predicted value; the higher discrepancy in this region might be attributed to the fact that the charge pump noise was calculated without considering the offset or transient response of the charge pump and/or the possible inaccuracy of the HSPICE device models at low currents. Note that the spur at 20-MHz offset (the reference frequency), which is due to the 50% nominal duty cycle of the PFD, is less than −60 dBc. Fig. 21(b) demonstrates that the unmodulated transmitter has an output spectrum of −132 dBc/Hz at 5-MHz offset from the carrier. At this frequency offset, simulations reveal that the output spectrum of the modulated transmitter is equal to that of the unmodulated transmitter when its data rate is close to the DECT rate of 1 Mb/s [7]. This being the case, the transmitter satisfies the DECT noise specification of −131 dBc/Hz at 5-MHz offset; eye diagrams for data rates close to 1 Mb/s are found in [7].

VI. CONCLUSION

A digital compensation method and key circuits were presented that allow modulation of a frequency synthesizer at rates over an order of magnitude faster than its bandwidth. Using this technique, a transmitter prototype was built that achieves 2.5-Mb/s data rate modulation using GFSK modulation at a carrier frequency of 1.8 GHz. Measured results indicate that the architecture can achieve the modulation and noise performance required by the DECT standard with a structure that is highly integrated and has low power dissipation. In particular, the mostly digital design requires no off-chip filters, no mixers, and no D/A converters in the modulation path. Further, the structure contains only the core components required of a narrowband, spectrally efficient transmitter: a frequency synthesizer and a digital transmit filter.

ACKNOWLEDGMENT

The authors thank G. Dawe and J. Mourant for guidance in RF issues, A. Chandrakasan for discussion on low power methods, R. Weiner for bonding the die, B. Broughton for aid in phase noise measurements, and M. Trott, P. Ferguson, P. Katzin, Z. Zvonar, and D.
Fague for advice.

REFERENCES

[1] P. Gray and R. Meyer, “Future directions in silicon IC’s for RF personal communications,” in IEEE Custom IC Conf., 1995, pp. 83–90.
[2] T. Stetzler, I. Post, J. Havens, and M. Koyama, “A 2.7–4.5 V single-chip GSM transceiver RF integrated circuit,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1995, pp. 150–151.
[3] J. Min, A. Rofougaran, H. Samueli, and A. A. Abidi, “An all-CMOS architecture for a low-power frequency-hopped 900 MHz spread spectrum transceiver,” in IEEE Custom IC Conf., 1994, pp. 16.1/1-4.
[4] S. Sheng, L. Lynn, J. Peroulas, K. Stone, I. O’Donnell, and R. Brodersen, “A low-power CMOS chipset for spread-spectrum communications,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1996, pp. 346–347.
[5] S. Heinen, S. Beyer, and J. Fenk, “A 3.0 V 2 GHz transmitter IC for digital radio communication with integrated VCO’s,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1995, pp. 150–151.
[6] S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber, S. Beyer, J. Fenk, and E. Matschke, “A 2.7 V 2.5 GHz bipolar chipset for digital wireless communication,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1997, pp. 306–307.
[7] M. H. Perrott, “Techniques for high data rate modulation and low power operation of fractional-N frequency synthesizers,” Ph.D. dissertation, MIT, 1997.
[8] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta-sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[9] T. A. Riley and M. A. Copeland, “A simplified continuous phase modulator technique,” IEEE Trans. Circuits Syst. II, vol. 41, pp. 321–328, May 1994.
[10] B. Miller and B. Conley, “A multiple modulator fractional divider,” in Proc. 44th Annual Symp. on Frequency Control, May 1990, pp. 559–567.
[11] B. Miller and B. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[12] J. Candy and G. Temes, Oversampling Delta-Sigma Data Converters. New York: IEEE Press, 1992.
[13] Y. Tsividis and J. Voorman, Integrated Continuous-Time Filters. New York: IEEE Press, 1993.
[14] T. Kamoto, N. Adachi, and K. Yamashita, “High-speed multi-modulus prescaler IC,” in 1995 Fourth IEEE Int. Conf. Universal Personal Communications. Record. Gateway to the 21st Century, 1995, pp. 325–328.
[15] J. Craninckx and M. S. Steyaert, “A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-μm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
[16] M. Thamsirianunt and T. A. Kwasniewski, “A 1.2 μm CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer,” in IEEE Custom IC Conf., 1994, pp. 16.2/1-4.
[17] S.-J. Jou, C.-Y. Chen, E.-C. Yang, and C.-C. Su, “A pipelined multiplier-accumulator using a high-speed, low-power static and dynamic full adder design,” IEEE J. Solid-State Circuits, vol. 32, pp. 114–118, Jan. 1997.
[18] F. Lu and H. Samueli, “A 200-MHz CMOS pipelined multiplier-accumulator using a quasidomino dynamic full-adder cell design,” IEEE J. Solid-State Circuits, vol. 28, pp. 123–132, Feb. 1993.

Michael H. Perrott (S’97) was born in Austin, TX, in 1967. He received the B.S.E.E. degree from New Mexico State University, Las Cruces, in 1988, and the M.S. and Ph.D. degrees in electrical engineering and computer science from Massachusetts Institute of Technology, Cambridge, in 1992 and 1997, respectively. He currently works at Hewlett-Packard Laboratories, Palo Alto, CA. His interests include signal processing and circuit design applied to communication systems.

Theodore L. Tewksbury III (S’86–M’87) received the S.B. degree in architecture in 1983 and the M.S. and Ph.D. degrees in electrical engineering and computer science in 1987 and 1992, respectively, all from the Massachusetts Institute of Technology, Cambridge. His doctoral dissertation consisted of an experimental and theoretical investigation of the effects of oxide traps on the large-signal transient performance of analog MOS circuits. He joined Analog Devices, Inc., in 1987 as Design Engineer for the Converter Group, where he worked on high-speed, high-resolution data acquisition circuits for video, instrumentation, and medical applications. From 1992 to 1994, as Senior Characterization Engineer, he was involved in the development of high-accuracy analog models for advanced bipolar, BiCMOS, and CMOS processes, with emphasis on the statistical modeling of manufacturing variations. In December 1994, he joined the newly formed Communications Division at Analog Devices as RF Design Engineer. He is presently involved in the design of RF integrated circuits for wireless communications, including GSM, DECT, and DBS. He is also actively involved in the development and modeling of advanced semiconductor technologies for RF applications, including ADRF (Analog Devices bipolar RF process) and silicon germanium.


Charles G. Sodini (S’80–M’82–SM’90–F’95) was born in Pittsburgh, PA, in 1952. He received the B.S.E.E. degree from Purdue University, Lafayette, IN, in 1974 and the M.S.E.E. and the Ph.D. degrees from the University of California, Berkeley, in 1981 and 1982, respectively. He was a Member of the Technical Staff at Hewlett-Packard Laboratories from 1974 to 1982, where he worked on the design of MOS memory and later, on the development of MOS devices with very thin gate dielectrics. He joined the faculty of the Massachusetts Institute of Technology, Cambridge, in 1983, where he is currently a Professor in the Department of Electrical Engineering and Computer Science. His research interests are focused on IC fabrication, device modeling, and device level circuit design, with emphasis on analog and memory circuits and systems. Dr. Sodini held the Analog Devices Career Development Professorship of Massachusetts Institute of Technology’s Department of Electrical Engineering and Computer Science and was awarded the IBM Faculty Development Award from 1985–1987. He has served on a variety of IEEE Conference Committees, including the International Electron Device Meeting where he was the 1989 General Chairman. He was the Technical Program Co-Chairman for the 1992 Symposium on VLSI Circuits and the 1993-1994 Co-Chairman of the Symposium. He has served on the Electron Device Society Administrative Committee from 1988–94 and is currently a member of the Solid-State Circuits Council.


IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver Hamid R. Rategh, Student Member, IEEE, Hirad Samavati, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract—A fully integrated 5-GHz phase-locked loop (PLL) based frequency synthesizer is designed in a 0.24-μm CMOS technology. The power consumption of the synthesizer is significantly reduced by using a tracking injection-locked frequency divider (ILFD) as the first frequency divider in the PLL feedback loop. On-chip spiral inductors with patterned ground shields are also optimized to reduce the VCO and ILFD power consumption and to maximize the locking range of the ILFD. The synthesizer consumes 25 mW of power, of which only 3.8 mW is consumed by the VCO and the ILFD combined. The PLL has a bandwidth of 280 kHz and a phase noise of −101 dBc/Hz at 1 MHz offset frequency. The spurious sidebands at the center of adjacent channels are less than −54 dBc.

Index Terms—CMOS RF circuits, frequency synthesizers, injection-locked frequency dividers, wireless LAN.

I. INTRODUCTION

THE DEMAND for wireless local area network (WLAN) systems which can support data rates in excess of 20 Mb/s with very low cost and low power consumption is rapidly increasing. The newly released unlicensed national information infrastructure (U-NII) frequency band in the United States is primarily intended for wideband WLAN and provides 300 MHz of spectrum at 5 GHz [Fig. 1(a)]. The lower 200 MHz of this band (5.15–5.35 GHz) overlaps the European high-performance radio LAN (HIPERLAN) frequency band. The upper 100 MHz of the spectrum, which overlaps the industrial, scientific, and medical (ISM) band, is not used in our system. To stay compatible with HIPERLAN, the lower 200 MHz of the spectrum is divided into eight channels which are 23.5 MHz wide [Fig. 1(b)]. The minimum signal level at the receiver is −70 dBm while the maximum strength of the received signal is −20 dBm. The large dynamic range and wide channel bandwidths set very stringent requirements for the synthesizer phase noise and spurious sideband levels.

In this paper we describe the design of a fully integrated integer-N frequency synthesizer as a local oscillator (LO) for a U-NII band WLAN receiver. The front end of the receiver is described in [9]. Section II describes some of the synthesizer design challenges and reviews previously existing solutions. In Section III we present our proposed architecture of the frequency synthesizer, which takes advantage of an injection-locked frequency divider (ILFD) to reduce the overall power consumption. Section IV-A is dedicated to the design of the VCO and demonstrates how on-chip spiral inductors can be optimized to reduce the VCO power consumption and to improve the phase noise performance at the same time. Section IV-B describes the design issues of ILFD’s as well as the optimization of on-chip spiral inductors for wide-locking-range and low-power ILFD’s. The pulse swallow frequency divider, charge pump, and loop filter are the subjects of Sections IV-C, IV-D, and IV-E, respectively. The measurement results are presented in Section V and conclusions are made in Section VI.

Manuscript received August 2, 1999; revised November 29, 1999. This work was supported by the Stanford Graduate Fellowship program and IBM Corporation. The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(00)02988-7.

Fig. 1. (a) U-NII and HIPERLAN frequency bands and (b) channel allocation in our U-NII band WLAN system.

II. FREQUENCY SYNTHESIZERS Frequency synthesizers are an essential part of wireless receivers and often consume a large percentage (20–30%) of the total power (Table I). A typical PLL-based frequency synthesizer comprises both high and low frequency blocks. The high frequency blocks, mainly the VCO and first stage of the frequency dividers, are the main power consuming blocks, especially in a CMOS implementation. Therefore, BiCMOS technology has often been chosen over CMOS, where the VCO and the prescaler are designed with bipolar transistors and the low frequency blocks are CMOS [1]. Off-chip VCO’s and dividers have also been used as an alternative [4]. However, because of the increased cost neither of these two solutions is suitable for many applications, and a fully integrated CMOS solution is favorable. A dividerless frequency synthesizer [11] which eliminates power–hungry frequency dividers is one solution for such low-power and fully integrated systems. In this technique an

0018–9200/00$10.00 © 2000 IEEE

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER


TABLE I POWER CONSUMPTION OF FULLY INTEGRATED WIRELESS RECEIVERS

aperture phase detector is used to compare the phase of the reference signal and the VCO output at every rising edge of the reference signal for only a time window which is a small fraction of the reference period. Thus no frequency divider is required in this PLL. The idea of a dividerless frequency synthesizer, although suitable for systems such as a GPS receiver where only one LO signal is required, is not readily applied to wireless systems which require multiple LO frequencies with a small frequency separation. Fig. 2. Frequency synthesizer block diagram.

III. PROPOSED SYNTHESIZER ARCHITECTURE

Our proposed architecture (Fig. 2) is an integer-N frequency synthesizer with an initial low power divide-by-two in the PLL feedback loop. The prescaler follows the fixed frequency divider and operates at half the output frequency, so its power consumption is reduced significantly. Furthermore, the first divider is an injection-locked frequency divider [6], [7] which takes advantage of the narrowband nature of the system and trades off bandwidth for power via the use of resonators. To further reduce the power consumption, optimization techniques are used to design the on-chip spiral inductors of the VCO and ILFD. Because of the fixed initial divide-by-two in the loop, the reference frequency in our system is half of the LO spacing and is 11 MHz. Consequently, the loop bandwidth is reduced to maintain the loop stability. This bandwidth reduction helps to filter harmonics of the reference signal, mainly the second harmonic, which generate spurs in the middle of the adjacent channels. The drawbacks of a reduced loop bandwidth are an increased settling time and a higher in-band VCO phase noise. The higher in-band VCO phase noise is not a limiting factor as the in-band noise is dominated by the upconverted noise of the reference signal. The slower settling time is only a problem in very fast frequency-hopped systems. The synthesized LO frequency in our system is 16/17 of the received carrier frequency. This choice of LO frequency not only eases the issue of image rejection in the receiver [9], but also facilitates the generation of the second LO, which is 1/16 of the first LO, with the same synthesizer.

IV. SYNTHESIZER BUILDING BLOCKS

A. Voltage-Controlled Oscillator

Fig. 3 shows the schematic of the VCO. Two cross-coupled transistors M1 and M2 generate the negative impedance required to cancel the losses of the RLC tank. On-chip spiral inductors with patterned ground shields [15] are used in this design.
The two main requirements for the VCO are low phase noise and low power consumption. If the inductors were the main source of noise, maximizing their quality factor would improve the phase noise significantly. However, in multi-GHz VCO’s with short channel transistors, inductors are not the

main source of noise and a better design strategy is to maximize the effective parallel impedance of the RLC tank at resonance. This choice increases the oscillation amplitude for a given power consumption and hence reduces the phase noise caused by the noise injection from the active devices. Since inductors are the main source of loss in the tank, the $LQ$ product should be maximized to maximize the effective parallel impedance of the tank at resonance, where $L$ is the inductance and $Q$ is the quality factor of the spiral inductors. It is important to realize that maximizing $Q$ alone does not necessarily maximize the $LQ$ product, and it is the latter that matters here. To design the spiral inductors, we use the same inductor model reported in [14]. The inductance is first approximated with a monomial expression as in [3]. Optimization is next used to find the inductor with the maximum $LQ$ product. The inductors in this design are 2.26 nH each with an estimated quality factor of 5.8 at 5 GHz. It is worth mentioning that at 5 GHz, the magnetic loss in the highly doped substrate of the epi process reduces the inductor quality factor significantly. Approximate calculations show that substrate inductive loss is proportional to the cube of the inductor’s outer diameter. Therefore, a multilayer stacked inductor, which has a smaller area compared to a single-layer inductor with the same inductance, may achieve a larger quality factor. We should mention that in our design, inductors are laid out using only the top-most metal layer. The varactors in Fig. 3 are accumulation-mode MOS capacitors [5], [12]. The quality factor of these varactors can be substantially degraded by gate resistance if they are not laid out properly. In our design each varactor is laid out with 14 fingers which are 3 μm wide and 0.5 μm long. The quality factor of this varactor at 5 GHz is estimated to exceed 60. The losses of the RLC tank are thus dominated by the inductors, as expected.

B. Injection-Locked Frequency Divider

Fig. 4 shows the schematic of the voltage-controlled ILFD used in the frequency synthesizer. The incident signal (the VCO output) is injected into the gate of M3 and is delivered with a subunity voltage gain to Vx, the common source connection of M1 and M2. Transistor M4 is used to provide a symmetric


IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 3. Schematic of the VCO. Fig. 4. Schematic of the differential ILFD.

load for the VCO. The output signal is fed back to the gates of M1 and M2 and is summed with the incident signal across the gates and sources of M1 and M2. The nonlinearity of M1 and M2 generates intermodulation products which allow sustained oscillation at a fraction of the input frequency [6]. As shown in [6], in the special case of a divide-by-two ILFD and a third-order nonlinearity, the phase-limited locking range can be expressed as

$$\left|\frac{\Delta\omega}{\omega_r}\right| < \left|\frac{H_0 a_2 V_i}{2Q}\right| \qquad (1)$$

where
$\omega_r$ is the free-running oscillation frequency;
$\Delta\omega$ is the frequency offset from $\omega_r$;
$V_i$ is the incident amplitude;
$H_0$ is the impedance of the RLC tank at resonance;
$Q$ is the quality factor of the RLC tank;
$a_2$ is the second-order coefficient of the nonlinearity.

As (1) suggests, a larger incident amplitude as well as a larger $H_0/Q$ result in a larger achievable locking range. In an LC oscillator $H_0/Q = \omega_r L$, so the largest practical inductance should be used to maximize the locking range. A larger quadratic nonlinearity ($a_2$) also increases the locking range. So a circuit architecture with a large second-order nonlinearity is favorable for a divide-by-two ILFD, and in fact the circuit in Fig. 4 has such a characteristic. The common source connection node of the differential pair moves at twice the frequency of the output signal even in the absence of the incident signal. So this circuit has a natural tendency for divide-by-two operation when the incident signal is effectively injected into node Vx.

To further extend the locking range, the ILFD is designed such that the resonant frequency of its output tank tracks the input frequency. Accumulation mode MOS varactors are used to tune the ILFD and its control voltage is tied to the VCO control voltage (Fig. 2). The locking range of the ILFD therefore does not limit the tuning range of the PLL beyond what is determined by the VCO.
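The scaling argument above can be checked numerically. The following is a minimal sketch, assuming the locking-range bound has the form |Δω/ω_r| < |H0·a2·Vi/(2Q)| and that the tank loss is dominated by the inductor, so that H0 = ω·L·Q. The function names, the value of `A2_HYPOTHETICAL`, and the two ILFD inductance values are illustrative assumptions, not figures from the paper; only the VCO tank values (2.26 nH, Q = 5.8, 5 GHz) are taken from the text.

```python
import math

def tank_impedance(freq_hz, inductance_h, q):
    """Parallel impedance at resonance of an inductor-limited tank: H0 = w * L * Q."""
    return 2.0 * math.pi * freq_hz * inductance_h * q

def locking_range_bound(h0_ohm, q, a2, v_inc):
    """Assumed fractional locking-range bound |H0 * a2 * Vi / (2 * Q)|."""
    return abs(h0_ohm * a2 * v_inc / (2.0 * q))

# VCO tank values quoted in the text: 2.26 nH, Q = 5.8 at 5 GHz.
h0_vco = tank_impedance(5e9, 2.26e-9, 5.8)

# Hypothetical ILFD comparison: same Q and drive, inductance doubled.
A2_HYPOTHETICAL = 1e-3   # A/V^2, illustration only
F_OUT_HZ = 2.5e9         # divider output frequency
bound_small_l = locking_range_bound(
    tank_impedance(F_OUT_HZ, 5e-9, 4.2), 4.2, A2_HYPOTHETICAL, 1.0)
bound_large_l = locking_range_bound(
    tank_impedance(F_OUT_HZ, 10e-9, 4.2), 4.2, A2_HYPOTHETICAL, 1.0)

print(f"VCO tank impedance per inductor: {h0_vco:.1f} ohm")
print(f"Locking-range bound ratio (10 nH vs 5 nH): {bound_large_l / bound_small_l:.2f}")
```

Because Q cancels in H0/Q, the bound in this sketch depends on the inductance and not on the quality factor, which is why the ILFD is optimized for large L while the VCO is optimized for large LQ.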

As in the VCO design, on-chip spiral inductors with patterned ground shields are used in the ILFD, but with a different optimization objective. As mentioned earlier, the largest practical inductance maximizes the locking range. However, reduction of power consumption demands maximization of the LQ product. The inductor has its largest value when the total capacitance that resonates with it is minimized. To reduce its parasitic bottom plate capacitance, the inductor should be laid out with narrow topmost metal lines. However, the large series resistance of narrow metal strips degrades the inductor quality factor and reduces the LQ product significantly. Therefore, both L and the LQ product may not be maximized simultaneously for an on-chip spiral inductor resonating with a fixed capacitance. Optimization is thus used to design for the maximum inductance such that the LQ product is large enough to satisfy the specified power budget. The inductors resulting from this trade-off are 9.5 nH each with an estimated quality factor of 4.2 at the divider output frequency (2.5 GHz).

C. Pulse Swallow Frequency Divider

The pulse swallow frequency divider consists of a prescaler followed by a program and pulse swallow counter. Only one CMOS logic ripple counter is used for both program and pulse swallow counters. The program counter generates one output pulse for every ten input pulses. The output of the pulse swallow counter is controlled by three channel select bits. The overall division ratio is 220–227. At the beginning of the cycle the prescaler divides by 23. As soon as the first three bits of the ripple counter match the channel select bits, the prescaler begins to divide by 22. The next cycle starts after the ripple counter counts to ten.

The prescaler consists of three dual-modulus divide-by-2/3 dividers and one divide-by-2 frequency divider made of source-coupled logic (SCL) flip-flops and gates (Fig. 5). The modulus control (MC) input selects between divide-by-22 and divide-by-23. Except for the second dual-modulus divider, all other dividers including the CMOS counters are triggered by the falling edges of their input clocks, allowing a delay of as much as half the period of the input of each divider. With this arrangement we guarantee overlap between the three signals labeled in Fig. 5 and prevent a race condition.
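The counting scheme described above can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and the 11-MHz reference and the factor of two for the ILFD ahead of the prescaler are assumptions inferred from the spur offsets and LO spacing reported in the measurement section.

```python
def prescaler_moduli(channel_select, program_count=10, low=22, high=23):
    """Return the prescaler modulus used in each program-counter cycle."""
    if not 0 <= channel_select <= 7:
        raise ValueError("channel_select must fit in three bits (0..7)")
    # The prescaler divides by `high` until the low bits of the ripple counter
    # match the channel-select word, then by `low` until the counter reaches
    # `program_count`.
    return [high if count < channel_select else low
            for count in range(program_count)]

def division_ratio(channel_select):
    """Total number of prescaler input cycles per output pulse."""
    return sum(prescaler_moduli(channel_select))

ratios = [division_ratio(s) for s in range(8)]

# Assumption: 11-MHz reference, ILFD divide-by-two in front of the prescaler.
F_REF_MHZ = 11
lo_mhz = [2 * n * F_REF_MHZ for n in ratios]

for s, (n, f) in enumerate(zip(ratios, lo_mhz)):
    print(f"select={s}: ratio={n}, LO={f} MHz")
```

With a channel-select value S, the prescaler spends S cycles at divide-by-23 and 10 − S cycles at divide-by-22, so the ratio is 220 + S, which reproduces the 220–227 range stated in the text.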

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER


Fig. 6. Simplified schematic of the charge pump and loop filter. Fig. 5. Block diagram of the prescaler.

D. Charge Pump

Fig. 6 shows the circuit diagram of the charge pump and loop filter. The charge pump has a differential architecture. However, only a single output node drives the loop filter. To prevent the other output node from drifting to the rails when neither of the up and down signals (U and D) is active, the unity gain buffer shown in Fig. 6 is placed between the two output nodes. This buffer keeps the two output nodes at the same potential and thus reduces the charge pump offset. The power of the spurious sidebands in the synthesized output signal is thereby reduced. In this charge pump the current sources are always on and the PMOS and NMOS switches are used to steer the current from one branch of the charge pump to the other.

E. Loop Filter

Resistor $R_1$ and capacitor $C_1$ in the loop filter (Fig. 6) generate a pole at the origin and a zero at $1/(R_1 C_1)$. Capacitor $C_2$ and the combination of $R_3$ and $C_3$ are used to add extra poles at frequencies higher than the PLL bandwidth to reduce reference feedthrough and decrease the spurious sidebands at harmonics of the reference frequency. The thermal noise of $R_1$ and $R_3$, although filtered by the loop, directly modulates the VCO control voltage and can cause substantial phase noise in the VCO if the resistors are not sized properly. The capacitors and resistors of the loop filter should be properly chosen to perform the required filtering function and maintain the stability of the loop without introducing too much noise.

Fig. 7 shows a linearized phase-locked loop model. In a third-order loop, the loop filter contains only $R_1$, $C_1$, and $C_2$ and its impedance can be written as

$$Z(s) = \frac{b}{b+1}\,\frac{\tau_z s + 1}{s C_1\left(\dfrac{\tau_z s}{b+1} + 1\right)} \qquad (2)$$

where $\tau_z = R_1 C_1$ and $b = C_1/C_2$. The open loop transfer function of the third-order PLL is

$$G(s) = \frac{I_p K_{vco}}{2\pi N}\,\frac{Z(s)}{s} \qquad (3)$$

where $K_{vco}$ is the VCO gain constant (in rad/s/V), $I_p$ is the charge pump current, and $N$ is the division ratio of the feedback path. The phase margin of the loop is

$$PM = \tan^{-1}(\tau_z\omega_c) - \tan^{-1}\!\left(\frac{\tau_z\omega_c}{b+1}\right) \qquad (4)$$

where $\omega_c$ is the crossover frequency. By differentiating (4) with respect to $\omega_c$ it can be shown that the maximum phase margin is achieved at

$$\omega_c = \frac{\sqrt{b+1}}{\tau_z} \qquad (5)$$

and the maximum phase margin is

$$PM_{max} = \tan^{-1}\!\sqrt{b+1} - \tan^{-1}\!\frac{1}{\sqrt{b+1}} \qquad (6)$$

Notice that the maximum phase margin is only a function of $b$ (the ratio of $C_1$ and $C_2$), and for $b$ less than 1 the phase margin is less than 20°, which makes the loop practically unstable. To complete our loop analysis we force $\omega_c$ to be the crossover frequency of the loop and get

$$\frac{I_p K_{vco}}{2\pi N}\,\frac{b}{b+1} = \frac{C_1}{\tau_z^2}\sqrt{b+1} \qquad (7)$$

Now we can define a loop filter design recipe as follows.

1) Find $K_{vco}$ from the VCO simulation.
2) Choose a desired phase margin and find $b$ from (6).
3) Choose the loop bandwidth and find $\tau_z$ from (5).
4) Select $C_1$ and $I_p$ such that they satisfy (7).
5) Calculate the noise contribution of $R_1$. If the calculated noise is negligible the design is complete, otherwise go back to step four and increase $C_1$.

The same loop analysis can be repeated for a fourth-order loop. In this case the phase margin is

(8)

Fig. 7. Linearized PLL model.


Fig. 8. Die micrograph.

where the parameters are ratios of the loop filter element values. The crossover frequency for the maximum phase margin is shown in (9). Finally, for this frequency to be the crossover frequency of the loop it should satisfy (10). As in the third-order loop, the maximum phase margin is not a function of the absolute values of the R's and C's and is only a function of their ratios. The loop filter design recipe for the fourth-order loop is modified as follows.

1) Find the VCO gain constant from the VCO simulation.
2) Choose a desired phase margin and find the element ratios from (8) and (9).
3) Choose the loop bandwidth and find the zero time constant from (9).
4) Select the capacitance and the charge pump current such that they satisfy (10).
5) Calculate the noise contribution of the loop filter resistors. If their noise contribution is negligible the design is complete, otherwise go back to step four and increase the capacitance.

Notice that in a fourth-order loop there are two degrees of freedom in choosing the element ratios to achieve a desired phase margin. Therefore, the suppression of the spurious sidebands can be improved without reducing the phase margin or the loop bandwidth.

In our system the maximum VCO gain constant is 500 MHz/V. With this VCO gain and the chosen loop filter element values and charge pump current, the crossover frequency is about 280 kHz with a 46° phase margin. The calculated contribution of the loop filter resistors to the VCO phase noise at 10 MHz offset frequency is −137 dBc/Hz, which is negligible compared to the intrinsic noise of the VCO.

V. MEASUREMENT RESULTS

The frequency synthesizer is designed in a 0.24-µm CMOS technology. Fig. 8 shows the die micrograph of the synthesizer with an area of 1 mm × 1.6 mm, including pads.
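The third-order loop filter recipe from the loop filter section can be sketched numerically. This is a sketch under stated assumptions, not the authors' design flow: it assumes the standard passive filter (a series R1–C1 branch in parallel with C2), defines b = C1/C2, takes the VCO gain in Hz/V so that the 2π of the phase detector cancels, and uses a hypothetical C1 = 100 pF and division ratio N = 446. The 46° and 280 kHz targets are borrowed from the text purely as inputs; the fabricated loop is fourth-order, so the resulting element values are not the paper's.

```python
import cmath
import math

def max_phase_margin_deg(b):
    """Maximum phase margin of the third-order loop for b = C1/C2."""
    root = math.sqrt(b + 1.0)
    return math.degrees(math.atan(root) - math.atan(1.0 / root))

def cap_ratio_for_phase_margin(pm_deg):
    """Invert the relation above: tan(PM) = b / (2*sqrt(b+1))."""
    t = math.tan(math.radians(pm_deg))
    return 2.0 * t * t + 2.0 * t * math.sqrt(t * t + 1.0)

def design_third_order(kvco_hz_per_v, pm_deg, fc_hz, c1, n_div):
    """Choose b, R1, C2 and the pump current for a target PM and crossover."""
    b = cap_ratio_for_phase_margin(pm_deg)
    wc = 2.0 * math.pi * fc_hz
    tau_z = math.sqrt(b + 1.0) / wc            # crossover at the PM peak
    # Unity loop gain at wc: Ip*Kvco*b / (N*sqrt(b+1)*wc^2*C1) = 1
    ip = n_div * math.sqrt(b + 1.0) * wc ** 2 * c1 / (b * kvco_hz_per_v)
    return {"b": b, "R1": tau_z / c1, "C1": c1, "C2": c1 / b, "Ip": ip}

def loop_gain(f_hz, kvco_hz_per_v, n_div, d):
    """Open-loop gain evaluated directly from the element values."""
    s = 2j * math.pi * f_hz
    z_series = d["R1"] + 1.0 / (s * d["C1"])
    z_shunt = 1.0 / (s * d["C2"])
    z = z_series * z_shunt / (z_series + z_shunt)
    return d["Ip"] * kvco_hz_per_v * z / (n_div * s)

design = design_third_order(500e6, 46.0, 280e3, 100e-12, 446)
g = loop_gain(280e3, 500e6, 446, design)
phase_margin_deg = 180.0 + math.degrees(cmath.phase(g))

print({k: f"{v:.4g}" for k, v in design.items()})
print(f"|G| at crossover = {abs(g):.4f}, phase margin = {phase_margin_deg:.2f} deg")
```

The last two lines act as a cross-check: the loop gain is recomputed from the raw R and C values rather than from the closed-form expressions, so a unity magnitude and the requested phase margin at the chosen crossover indicate that the closed-form steps are mutually consistent.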

(9)

(10)

Fig. 9. VCO tuning range.

Fig. 10. ILFD locking range and power consumption as a function of incident amplitude.

Fig. 11. ILFD phase noise measurements.

Fig. 12. Phase noise of the synthesizer output signal.

TABLE II ILFD PERFORMANCE SUMMARY

The analog blocks (VCO, ILFD, and prescaler) are supplied by 1.5 V while the digital portions of the synthesizer are supplied by 2 V. The reason for this choice of supplies is to achieve a larger tuning range for the VCO. The accumulation mode MOS capacitors in this technology have a flatband voltage ($V_{FB}$) around zero volts. Thus to get the full range of capacitor variation the control voltage should exceed the VCO supply to produce a net negative voltage across the varactors in Fig. 3. To eliminate the need for multiple supplies, the VCO can be biased with a PMOS current source, with the sources of M1 and M2 connected to ground.

More than 500 MHz (10% of the center frequency) of VCO tuning range is achieved for a 1.5-V control voltage variation (Fig. 9). The free-running oscillation frequency of the ILFD changes by more than 110 MHz for a 1.5-V control voltage variation. Fig. 10 shows the locking range of the ILFD as a function of the incident amplitude for two different control voltages. As expected, changing the control voltage only changes the operation frequencies and not the locking range. The ILFD's average power consumption is also shown on the same figure. Increasing the incident amplitude increases the locking range and the average power consumption. The average power at 1-V incident amplitude is less than 0.8 mW while the locking range exceeds 1000 MHz.

The ILFD phase noise measurement results are shown in Fig. 11. The solid line shows the phase noise of the HP83732B signal generator used as the incident signal. The dashed line is the phase noise of the free-running ILFD. The two other curves are the phase noise of the ILFD when locked to two different incident frequencies. The curve marked as middle frequency is measured when the incident frequency is in the middle of the locking range and the edge frequency curve is measured at the lower edge of the locking range. At low offset frequencies the output of the frequency divider follows the phase noise of the incident signal and is 6 dB lower due to the divide-by-two operation. However, at larger offset frequencies the added noise from the divider itself, the external amplifier, and measurement tools reduces the 6 dB difference between the incident and


TABLE III MEASURED SYNTHESIZER PERFORMANCE

output phase noise. The ILFD phase noise measurements for offset frequencies higher than 200 kHz are not accurate due to the dominance of noise from the external amplifier.

The spurious tones at 11-MHz offset frequency from the center frequency are more than 45 dB below the carrier. The spurs at the 22-MHz offset frequency are at −54 dBc. Since the LO spacing is twice the reference frequency, the spurs at 11-MHz offset frequency fall at the edge of each channel and are less critical than the 22-MHz spurs, which are located at the center of adjacent channels. With the −54 dBc spurs at 22 MHz offset frequency, an undesired adjacent channel may be 44 dB stronger than the desired channel for a minimum 10 dB signal-to-interference ratio.

Phase noise measurements of the complete synthesizer output signal are shown in Fig. 12. The phase noise at small offset frequencies is mainly determined by the phase noise of the reference signal. The phase noise measured at offset frequencies beyond the PLL bandwidth is the inherent VCO phase noise. The phase noise at 1-MHz offset frequency is measured to be −101 dBc/Hz. The phase noise at 22 MHz offset frequency is extrapolated to be −127.5 dBc/Hz. Therefore the signal in the adjacent channel can be 43 dB stronger than that of the desired channel for a 10 dB signal-to-interference ratio.

VI. CONCLUSION

In this work we demonstrate the design of a fully integrated, 5-GHz CMOS frequency synthesizer for a U-NII band WLAN system. The tracking injection-locked frequency divider used as the first divider in the PLL feedback loop reduces the power consumption considerably without limiting the performance of the PLL. Table II summarizes the performance of the ILFD. The power consumption of two flip-flop based frequency dividers at 5 GHz is also listed for comparison purposes. In a 0.24-µm CMOS technology a simulated SCL flip-flop based frequency divider loaded with the same capacitance as in the ILFD consumes almost an order of magnitude more power than the ILFD with a 600-MHz locking range. The measurement results of a fast flip-flop based divider in an advanced 0.1-µm CMOS technology show a power consumption of 2.6 mW at 5 GHz [8], which is more than four times the power of the ILFD with a 600 MHz locking range.

Table III summarizes the performance of the synthesizer. The spurious sidebands at offset frequencies of twice the reference signal are more than 54 dB below the carrier. The spurs are mainly due to charge injection from the U and D signals to the loop, and can be reduced significantly by using a cascode structure for transistors M1–M4 (Fig. 6). Better matching between the up and down current sources also improves the sideband spurs. Of the 25-mW total power consumption, less than 3.8 mW is consumed by the VCO and ILFD combined. This low power consumption is achieved by the optimized design of the spiral inductors in the VCO and ILFD. The prescaler operates at 2.5 GHz and consumes 19 mW, of which about 40% is consumed in the first 2/3 dual modulus divider. Therefore the ILFD, which takes advantage of narrowband resonators, consumes an order of magnitude less power than the first 2/3 dual modulus divider, while operating at twice the frequency.

ACKNOWLEDGMENT

The authors would like to thank Dr. M. Hershenson, Dr. S. Mohan, and T. Soorapanth for their valuable technical discussions and help. They also thank National Semiconductor for fabricating the chip.

REFERENCES

[1] T. S. Aytur and B. Razavi, “A 2-GHz, 6 mW BiCMOS frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 30, pp. 1457–1462, Dec. 1995.
[2] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” in ISSCC Dig., 1998, pp. 372–373.
[3] M. Hershenson, S. S. Mohan, S. P. Boyd, and T. H. Lee, “Optimization of inductor circuits via geometric programming,” in Design Automation Conf. Dig., June 1999, pp. 994–998.
[4] M. H. Perrott, T. L. Tewksbury, and C. G. Sodini, “A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation,” IEEE J. Solid-State Circuits, vol. 32, pp. 2048–2059, Dec. 1997.
[5] A. S. Porret, T. Melly, and C. C. Enz, “Design of high-Q varactors for low-power wireless applications using a standard CMOS process,” in Custom Integrated Circuits Conf. Dig., May 1999, pp. 641–644.
[6] H. R. Rategh and T. H. Lee, “Superharmonic injection-locked frequency dividers,” IEEE J. Solid-State Circuits, vol. 34, pp. 813–821, June 1999.
[7] H. R. Rategh, H. Samavati, and T. H. Lee, “A 5GHz, 1mW CMOS voltage controlled differential injection-locked frequency divider,” in Custom Integrated Circuits Conf. Dig., May 1999, pp. 517–520.
[8] B. Razavi, K. F. Lee, and R. H. Yan, “Design of high-speed, low-power frequency dividers and phase-locked loops in deep submicron CMOS,” IEEE J. Solid-State Circuits, vol. 30, pp. 101–109, Feb. 1995.
[9] H. Samavati, H. R. Rategh, and T. H. Lee, “A 5GHz CMOS wireless-LAN receiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. xxx–xxx, May 2000.
[10] D. Shaeffer, A. Shahani, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, “A 115-mW, 0.5-µm CMOS GPS receiver with wide dynamic-range active filters,” IEEE J. Solid-State Circuits, vol. 33, pp. 2219–2231, Dec. 1998.
[11] A. Shahani, D. Shaeffer, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, “Low-power dividerless frequency synthesis using aperture phase detector,” IEEE J. Solid-State Circuits, vol. 33, pp. 2232–2239, Dec. 1998.


[12] T. Soorapanth, C. P. Yue, D. K. Shaeffer, T. H. Lee, and S. S. Wong, “Analysis and optimization of accumulation-mode varactor for RF ICs,” in Symp. VLSI Circuits Dig., 1998, pp. 32–33.
[13] M. Steyaert, M. Borremans, J. Janssens, B. D. Muer, N. Itoh, J. Craninckx, J. Crols, E. Morifuji, H. S. Momose, and W. Sansen, “A single-chip CMOS transceiver for DCS-1800 wireless communications,” in ISSCC Dig., 1998, pp. 48–49.
[14] C. P. Yue, C. Ryu, J. Lau, T. H. Lee, and S. S. Wong, “A physical model for planar spiral inductors on silicon,” in IEDM Tech. Dig., 1996, pp. 6.5.1–6.5.4.
[15] C. P. Yue and S. S. Wong, “On-chip spiral inductors with patterned ground shields for Si-Based RF IC’s,” in Symp. VLSI Circuits Dig., 1997, pp. 85–86.

Hamid R. Rategh (S’99) was born in Shiraz, Iran in 1972. He received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1994 and the M.S. degree in biomedical engineering from Case Western Reserve University, Cleveland, OH, in 1996. He is currently pursuing the Ph.D. degree in the Department of Electrical Engineering, Stanford University, Stanford, CA. During the summer of 1997, he was with Rockwell Semiconductor System, Newport Beach, CA, where he was involved in the design of a CMOS dual-band, GSM/DCS1800, direct conversion receiver. His current research interests are in low-power radio frequency (RF) integrated circuits design for high-data-rate wireless local area network systems. Mr. Rategh received the Stanford Graduate Fellowship in 1997. He was a member of the Iranian team in the 21st International Physics Olympiad, Groningen, the Netherlands.


Hirad Samavati (S’99) received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1994, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1996. He currently is pursuing the Ph.D. degree at Stanford University. During the summer of 1996, he was with Maxim Integrated Products, where he designed building blocks for a low-power infrared transceiver IC. His research interests include RF circuits and analog and mixed-signal VLSI, particularly integrated transceivers for wireless communications. Mr. Samavati received a departmental fellowship from Stanford University in 1995 and a fellowship from the IBM Corporation in 1998. He is the winner of the ISSCC Jack Kilby outstanding student paper award for the paper “Fractal Capacitors” in 1998.

Thomas H. Lee (M’96) received the S.B., S.M. and Sc.D. degrees in electrical engineering, all from the Massachusetts Institute of Technology (MIT), Cambridge, in 1983, 1985, and 1990, respectively. He joined Analog Devices in 1990, where he was primarily engaged in the design of high-speed clock recovery devices. In 1992, he joined Rambus, Inc., Mountain View, CA, where he developed high-speed analog circuitry for 500 megabyte/s CMOS DRAM’s. He has also contributed to the development of PLL’s in the StrongARM, Alpha, and K6/K7 microprocessors. Since 1994, he has been an Assistant Professor of electrical engineering at Stanford University, where his research focus has been on gigahertz-speed wireline and wireless integrated circuits built in conventional silicon technologies, particularly CMOS. He holds 12 U.S. patents and is the author of a textbook, The Design of CMOS Radio-Frequency Integrated Circuits (Cambridge, MA: Cambridge Press, 1998), and is a coauthor of two additional books on RF circuit design. He is also a cofounder of Matrix Semiconductor. Dr. Lee has twice received the “Best Paper” award at the International Solid-State Circuits Conference, was coauthor of a “Best Student Paper” at ISSCC, and recently won a Packard Foundation Fellowship. He is a Distinguished Lecturer of the IEEE Solid-State Circuits Society, and was recently named a Distinguished Microwave Lecturer.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

A Fully Integrated CMOS Frequency Synthesizer With Charge-Averaging Charge Pump and Dual-Path Loop Filter for PCS- and Cellular-CDMA Wireless Systems

Yido Koo, Hyungki Huh, Yongsik Cho, Jeongwoo Lee, Joonbae Park, Kyeongho Lee, Deog-Kyoon Jeong, and Wonchan Kim

Abstract—A fully integrated CMOS frequency synthesizer for PCS- and cellular-CDMA systems is implemented in a 0.35-µm CMOS technology. The proposed charge-averaging charge pump scheme suppresses fractional spurs to the level of noise, and the improved architecture of the dual-path loop filter makes it possible to implement a large time constant on a chip. With current-feedback bias and coarse tuning, the voltage-controlled oscillator (VCO) achieves constant power and low gain. Power dissipation is 60 mW with a 3.0-V supply. The proposed frequency synthesizer provides 10-kHz channel spacing with phase noise of −121 dBc/Hz in the PCS band and −127 dBc/Hz in the cellular band, both at 1-MHz offset frequency.

Index Terms—Bonding-wire inductor, CMOS RF, coarse tuning, dual-path loop filter, fractional-N-type prescaler, frequency synthesizer, phase noise, phase-locked loop.

Fig. 1. Dual-band CDMA RF transceiver.

I. INTRODUCTION

WIRELESS systems, such as PCS-CDMA, cellular CDMA, and JSTD-018PCS, require the frequency synthesizer to have precise channel spacing and low phase noise to meet the overall noise specification and to prevent unwanted signal mixing of the interferer. Most existing frequency synthesizers are implemented in silicon germanium (SiGe) or bipolar technologies, and use several external devices such as temperature-compensated crystal oscillator (TCXO) and loop filter. Because of cost and power consumption requirements, fully integrated CMOS RF building blocks are crucial and have been widely explored [1], [2]. Fig. 1 shows an example of a dual-band RF transceiver architecture for PCS- and cellular CDMA. A local oscillator (LO) signal from a dual-band frequency synthesizer is fed to the first mixer of the receiver for downconversion and it is also used in the transmitter for upconversion. The noise requirement of the frequency synthesizer is determined by the blocking profile of the system, which is calculated from the power of signal and interferer, minimum signal-to-noise ratio (SNR), and bandwidth

Manuscript received July 28, 2001; revised November 9, 2001. Y. Koo, H. Huh, Y. Cho, D.-K. Jeong, and W. Kim are with the School of Electrical Engineering and Computer Science, Seoul National University, Seoul 151-742, Korea (e-mail: [email protected]). J. Lee, J. Park, and K. Lee are with GCT Semiconductor, Inc., San Jose, CA 95131 USA. Publisher Item Identifier S 0018-9200(02)03676-4.

specification [3]. The lower the phase noise of the LO signal is, the less unwanted signal around the carrier is modulated within the in-band channel. Table I shows worldwide mobile frequency standards and RF phase-locked loop (PLL) requirements. CDMA systems require fast switching time with precise accuracy of the channel frequency. The channel raster is 30 kHz in cellular and 50 kHz in PCS systems and, to support the dual-band solution of the CDMA system, the frequency resolution of the synthesizer must be 10 kHz. This is a major limiting factor in the reduction of the locking time and root mean square (rms) phase error. It also makes it difficult to achieve single-chip integration due to the loop filter that has a large time constant.

In Section II, the special features of the proposed frequency synthesizer are discussed. Section III describes several building blocks of the synthesizer. The measurement results are given in Section IV, and conclusions are presented in Section V.

II. SYSTEM ARCHITECTURE

The proposed PLL is a monolithic integrated circuit that performs dual-band RF synthesis for CDMA wireless communication applications without any external device. Fig. 2 shows the block diagram of the fractional-N-type frequency synthesizer architecture. The external reference frequency is mainly 19.68 MHz, and 19.8 and 19.2 MHz are also supported. The voltage-controlled oscillator (VCO) oscillates at 1.7 GHz for

0018-9200/02$17.00 © 2002 IEEE

KOO et al.: FULLY INTEGRATED CMOS FREQUENCY SYNTHESIZER


TABLE I WORLDWIDE MOBILE FREQUENCY STANDARDS AND RF-PLL REQUIREMENTS

Fig. 3. Timing diagram of reference and VCO inputs of PFD in locked state when the fraction is 1/3.

Fig. 2. Fractional-N frequency synthesizer architecture.

the PCS band, and at 900 MHz for the cellular band, and employs bonding wire as the inductor of the inductance–capacitance (LC) tank.

To meet the frequency resolution of 10 kHz with an integer-N-type prescaler, the phase-comparison frequency is 10 kHz and the loop bandwidth is only 1 kHz. To reduce rms phase error, it is very important for the PLL to have a wide bandwidth. The fractional-N architecture, compared with its integer-N counterpart, has a wider loop bandwidth with the same frequency resolution. However, it suffers from a major drawback called fractional spur. The fractional-N structure is employed for the prescaler in the proposed architecture. Channel selection and other control signals are provided through a serial interface. To suppress the fractional spur, a special type of charge pump is designed. The new scheme of the dual-path loop filter enables flexible filter design for on-chip integration. The VCO combined with current-feedback bias and coarse tuning enables constant power and low gain of the VCO.

III. SYNTHESIZER BUILDING BLOCKS

A. Charge Pump With Charge-Averaging Scheme

While the reference spur occurs in integer-N synthesis due to charge-pump mismatch, fractional-N synthesis suffers from the fractional spur caused by the phase difference in the locked state as well as charge-pump mismatch. Fig. 3 shows the timing diagram of the reference and the VCO inputs of the phase/frequency detector (PFD) in the locked state. The fractional spur mainly stems from the variation of the prescaler division factor, in other words, the phase difference between reference and VCO inputs at every cycle of operation of the PFD. For example, to achieve a locking at 1/3 fractional frequency, as shown in Fig. 3, the prescaler division factor is $N+1$ in one cycle and $N$ in the other two cycles in every successive three cycles. It produces voltage ripples in the control signal of the VCO and therefore a fractional spur occurs. However, the average phase error during one circulation is zero in the locked state. As seen in Fig. 3, the sum of successive phase errors during three clock cycles is zero. This is the motivation of the proposed scheme.

Fig. 4 shows the proposed charge-averaging scheme and its operation. The charge pump is composed of four current sources and four sampling capacitors. The same up/dn signals are fed to each current source, which has 1/3 of the total current. Since the fractional coefficient of the fractional-N division is three in this design, we use four pairs of switches and four capacitors in the charge pump. Each pair of switches and its capacitor stores the charge that is injected from the pump during three periods and then dumps it in the next period, in turn. In the locked state, the sum of the phase errors in the three cycles is zero; therefore, the capacitor voltage after charge summing is the same as the control voltage. This results in no voltage ripple in the dump mode. The switching noise due to charge sharing and clock feedthrough may affect the amount of charge in the capacitors. These could be static errors. However, they influence each capacitor equally. This type of static dc error does not cause voltage ripples, i.e., a fractional spur.

Fig. 5 shows the behavioral simulation results of the conventional fractional-N architecture and the proposed charge-averaging scheme. In simulation, the VCO gain is set as perfectly linear and neither current mismatch nor clock feedthrough in the switch is assumed. Other characteristics are the same in both cases, except for the charge pump. In the conventional scheme

Fig. 4. Charge-averaging scheme. (a) Block diagram. (b) Its operation.

Fig. 5. Behavioral simulation of spur in fractional-N architecture. (a) With conventional scheme. (b) With proposed charge-averaging scheme.

Fig. 6. Mode change of charge-pump operation.

shown in Fig. 5(a), a fractional spur of approximately −50 dBc is found, but in the proposed scheme shown in Fig. 5(b), the fractional spur is suppressed to the noise level. With respect to the locking time, the charge-averaging scheme can exhibit an undesirable effect. If the size of the sampling capacitor is very small compared to that of the loop filter capacitor, the locking time is increased. To solve this problem, in the acquisition mode, the charge pump operates in the same way as the normal charge pump, which means that the charge pump is directly connected to the loop filter. After locking to the desired frequency, the charge-averaging mode is employed (Fig. 6). Therefore, there is no additional burden in locking time. In the ac analysis of the loop characteristic, time delay degrades the phase margin. As the time delay grows, the system becomes more unstable. The averaging method in this scheme results in an added time delay in the loop characteristic.
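The charge-averaging idea can be illustrated with an idealized behavioral sketch. It is not the authors' simulation: it assumes a 1/3 fraction realized by one divide-by-(N+1) cycle in every three, expresses timing errors in units of one VCO period, assumes the locked loop has removed the mean error, and ignores mismatch, feedthrough, and the loop dynamics. All function names are hypothetical.

```python
from fractions import Fraction

def locked_phase_errors(num_cycles, modulus=3):
    """Zero-mean per-cycle PFD timing error (in VCO periods) for a 1/modulus fraction."""
    frac = Fraction(1, modulus)
    errors, acc, drift = [], 0, Fraction(0)
    for _ in range(num_cycles):
        acc += 1
        extra = 1 if acc == modulus else 0   # divide by N+1 once per `modulus` cycles
        acc %= modulus
        drift += extra - frac                # divider edge relative to reference edge
        errors.append(drift)
    mean = sum(errors[:modulus]) / modulus   # a locked loop settles to zero mean error
    return [e - mean for e in errors]

def conventional_charge(errors, i_pump=1):
    """Charge delivered to the loop filter every cycle by a normal charge pump."""
    return [Fraction(i_pump) * e for e in errors]

def averaged_charge(errors, i_pump=1, window=3):
    """Charge dumped each cycle after a capacitor has integrated `window` cycles
    at 1/window of the pump current."""
    share = Fraction(i_pump) / window
    return [share * sum(errors[k - window:k])
            for k in range(window, len(errors) + 1)]

errors = locked_phase_errors(12)
raw = conventional_charge(errors)
avg = averaged_charge(errors)

print("per-cycle error:", [str(e) for e in errors[:6]])
print("conventional   :", [str(q) for q in raw[:6]])
print("charge-averaged:", [str(q) for q in avg[:6]])
```

In this idealized model the conventional pump delivers a periodic, nonzero charge pattern to the loop filter, which is the source of the fractional spur, while every three-cycle window sums to zero so the averaged dumps carry no ripple. The one-window latency before the first dump corresponds to the added loop delay mentioned in the text.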

Fig. 7. Loop filter characteristics. (a) Conventional second-order loop filter. (b) Dual-path loop filter.

Therefore, there exists a tradeoff between loop bandwidth and loop stability.

B. Dual-Path Loop Filter

Most PLLs use a second-order loop filter to suppress the control voltage ripple and to guarantee an appropriate phase margin. Fig. 7(a) shows a conventional second-order loop filter and the expression for control voltage in ac analysis. As described above, the frequency resolution is 10 kHz and the bandwidth of the PLL is a few kilohertz. This means that more than 1 nF of capacitance should be integrated on a chip, which is

Fig. 8. Proposed architecture of dual-path loop filter.

the major limiting factor for on-chip integration. In addition, the thermal noise generated in a large resistor is modulated to phase noise via the control signal. The dual-path loop filter in Fig. 7(b) can be a solution for this problem. It separates the loop into two paths, so it is possible to design the loop filter more freely in conjunction with the pump current while keeping the loop transfer function nearly the same as that of a normal second-order loop filter.

An example of the dual-path loop filter was proposed by Craninckx and Steyaert [4]. In spite of many advantages inherent in this architecture, it has two active devices, an amplifier and a current adder. These inject additional noise into the control voltage, and after modulation in the VCO, the phase noise may increase. In addition, the floating capacitor across the amplifier is implemented with a metal-to-metal capacitor, so it requires a large area.

Fig. 8 shows the new dual-path loop filter implementation that is proposed in this paper. A unity-gain buffer is inserted between the two filter nodes to separate the two paths. If the buffer output continuously follows its input, the operation is the same as that of the normal second-order loop filter. The proposed filter is less noisy because there is only one active device, and it requires a smaller area since there is no floating capacitor. The two capacitors are implemented using two pMOS transistors, whose source, drain, and bulk are tied to a separate, quiet supply.

Fig. 9. Bonding wire for inductor of VCO. (a) Pad and lead frame. (b) Modeling.

C. Voltage-Controlled Oscillator

Two major issues in the design of the VCO are low phase noise to meet the overall noise figure criteria and high gain linearity for robust stability. Phase noise is mainly dependent on the quality factor Q of the LC tank [5]. Although an on-chip spiral inductor has recently been widely explored [6], a bonding-wire inductor is superior to a spiral inductor in terms of resistance, i.e., quality factor. In addition, the bonding-wire inductor has constant inductance over a wide frequency range. Fig. 9 shows bonding-wire inductor modeling. Two pads are connected to the differential output of the VCO and the ends of two lead frames are connected as a short or by external inductance, according to the operating band, PCS or cellular. The Q factor of the inductance is expressed as with parasitic components ignored. If the parameters of the bonding wire are nH, , pH, m , which are typical values in the QFN20 package, the Q factor of the inductor at 1.7 GHz is 43.

Fig. 10 shows the circuit diagram of the VCO. The oscillation frequency is controlled by the combination of fixed and variable capacitor. There are two methods that have been previously reported for implementation of the fixed capacitor. One is a metal-to-metal capacitor [1] and the other is a MOS transistor [7]. The former is used since it is superior in terms of VCO pushing characteristics. Metal-to-metal capacitors and switches are used for coarse tuning, which are controlled by the coarse tuning controller. The size of the switch should be sufficiently large in order to avoid degradation of the Q factor of the LC tank. As a variable capacitor, an accumulation-mode MOS transistor is used for fine tuning. The MOS capacitor has an inherent nonlinear capacitance. But, with the coarse tuning scheme, the control voltage moves within 0.2 V around half of the supply voltage, thereby obtaining almost linear gain.

Fig. 10. Circuit diagram of (a) VCO and (b) bias circuit.
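As a rough illustration of the quality-factor figure quoted in this subsection, the sketch below assumes the textbook series R-L expression Q = ωL/R, which may differ from the paper's own expression. It pairs the quoted Q of 43 at 1.7 GHz with the 1.05-nH inductance per VCO output reported in the experimental section; that pairing, and the function names, are assumptions made only for illustration.

```python
import math

def series_rl_q(freq_hz, inductance_h, resistance_ohm):
    """Q of an ideal series R-L branch, parasitics ignored: omega * L / R."""
    return 2 * math.pi * freq_hz * inductance_h / resistance_ohm

def implied_series_resistance(freq_hz, inductance_h, q):
    """Series resistance that would yield the given Q under the same model."""
    return 2 * math.pi * freq_hz * inductance_h / q

# Assumed pairing: Q = 43 at 1.7 GHz with 1.05 nH (illustrative only).
r_series = implied_series_resistance(1.7e9, 1.05e-9, 43)
print(f"implied series resistance: {r_series * 1e3:.0f} mOhm")
```

Under these assumptions the implied series resistance is a fraction of an ohm, which fits the text's point that a bonding wire outperforms a spiral inductor mainly through lower resistance.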

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

The output swing level of the VCO is another key issue, since the receiver and transmitter chips expect an LO signal of constant power. Generally bonding-wire inductance varies by 10 and to compensate, the total capacitance varies accordingly. This produces the variation of the VCO swing level. The bias circuit in Fig. 10(b) is designed to have a constant output swing regardless of capacitance. It monitors the operating status of the fixed and variable capacitors and provides current in the direction of compensating for the output swing. Fig. 11 shows the VCO swing variation for the conventional and proposed scheme when the inductance varies. Fig. 12 shows the measured frequency tuning range in the cellular and PCS band. More than 25% of the tuning range is obtained in each case. It is sufficient to compensate for the variation of bonding-wire inductance. The VCO gain is a maximum of 40 MHz/V, and 30 MHz/V is typical. The total range is divided into 64 levels by coarse tuning, and the frequency spacing of two adjacent curves is approximately 7 MHz in the PCS band.

Fig. 11. Output swing of VCO versus variations of inductance.

Fig. 12. VCO tuning range of (a) cellular band, and (b) PCS band.

Fig. 13. Coarse tuning controller. (a) Block diagram. (b) Timing diagram of operation.

Fig. 14. Die microphotograph.

D. Coarse Tuning Architecture

Small VCO gain is important for reduction of both spur and phase noise. The frequency spur is directly proportional to the VCO gain. Also, as the VCO gain is reduced, fluctuations of the control and supply voltages are less modulated to the phase noise of the VCO. To meet the requirement of the wide frequency range and the small VCO gain, the coarse tuning controller, shown in Fig. 13, is designed to control the fixed capacitor in the VCO. The coarse tuning controller is composed of an edge detector, counter/comparator, lock filter, and shift register. During coarse tuning, the rising edges of are counted in one period of and the result is compared to a predetermined value of desired frequency. Starting from the center frequency, a proper level is found by a successive approach. The lock-detection filter determines when to start the fine tuning process by monitoring the up/dn signal. The total elapsed time in coarse tuning is less than 100 µs.
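The successive approach described above can be sketched as a six-bit successive-approximation search over the 64 coarse levels. This is a minimal sketch under stated assumptions, not the authors' controller: the callback `count_edges`, the 1.5-GHz base frequency, the 1-µs counting window, and the convention that a higher code gives a higher frequency are all hypothetical. Only the 64 levels, the roughly 7-MHz spacing, and the 1.76-GHz carrier come from the text.

```python
def coarse_tune(count_edges, target_count, bits=6):
    """Successive-approximation search over a 2**bits-level capacitor bank.

    count_edges(code) is a hypothetical callback returning how many
    oscillator edges fall inside one counting window at that code.
    It is assumed to increase monotonically with code.
    Returns the largest code whose count does not exceed target_count.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)      # first trial is the mid-scale level
        if count_edges(trial) <= target_count:
            code = trial               # still at or below target: keep the bit
    return code


# Hypothetical oscillator model: 64 evenly spaced curves, 7 MHz apart.
F_MIN_HZ = 1_500_000_000     # assumed frequency at code 0
STEP_HZ = 7_000_000          # spacing quoted for the PCS band
WINDOW_NS = 1_000            # assumed counting window of 1 us
TARGET_HZ = 1_760_000_000    # PCS-band carrier quoted in the conclusion

def count_edges(code):
    """Edges counted in one window for the assumed linear model."""
    freq_hz = F_MIN_HZ + code * STEP_HZ
    return freq_hz * WINDOW_NS // 1_000_000_000

target = TARGET_HZ * WINDOW_NS // 1_000_000_000
code = coarse_tune(count_edges, target)
residual_hz = TARGET_HZ - (F_MIN_HZ + code * STEP_HZ)
print(code, residual_hz)
```

In this toy model six counting windows suffice, and the residual error left for fine tuning is smaller than the spacing between adjacent curves. With the typical 30-MHz/V gain, a control-voltage excursion of a few tenths of a volt would cover such a residual, which is consistent with the small-gain argument of this subsection.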


TABLE II SUMMARY OF SYNTHESIZER PERFORMANCE

Fig. 15. Measured carrier spectrum.

IV. EXPERIMENTAL RESULTS AND SUMMARY

The proposed frequency synthesizer has been fabricated in a 0.35-µm CMOS technology. Fig. 14 shows the die photograph of the synthesizer with an area of 2.5 mm × 2.0 mm including pads. The circuit has been measured with a nominal 3.0-V supply and a 2.7-V worst case. The bonding wire of the QFN20 package used in the VCO has 1.36 nH of self-inductance and 0.31 nH of mutual inductance, so the total inductance at one output of the VCO is 1.05 nH. All control signals are fed through a serial interface. The VCO at the bottom and the loop filter on the left have a common analog supply and ground, and the others are all connected to the digital supply and ground. The power consumption of the total chip is 60 mW and the VCO alone dissipates 12.3 mW. Fig. 15 shows the measured carrier spectrum with the center frequency of 980 MHz. The output power is 1.2 dBm with an inductive load, which is sufficient for the output power requirements. Fig. 16 shows the measured phase noise in the cellular and PCS band. Phase noise is −106 dBc/Hz at 100-kHz offset and −127 dBc/Hz at 1-MHz offset in the cellular band, and −104 dBc/Hz at 100-kHz offset and −121 dBc/Hz at 1-MHz offset in the PCS band. Fractional spurs are suppressed to the phase noise level. Table II shows the performance summary.

Fig. 16. Measured PLL output phase noise. (a) Cellular band. (b) PCS band.

V. CONCLUSION

In this paper, we demonstrate a fully integrated CMOS frequency synthesizer designed for PCS- and cellular-CDMA wireless systems. A charge-averaging scheme for reducing fractional spurs and a dual-path loop filter architecture are proposed. The new bias circuit of the VCO compensates for the variation of output swing of the VCO caused by the variation of bonding-wire inductance, and the proposed coarse tuning technique achieves a small VCO gain and a wide operating frequency range of the VCO simultaneously. The frequency synthesizer fabricated in a 0.35-µm CMOS technology offers −127-dBc/Hz and −121-dBc/Hz phase noise at 1-MHz offset with 980 MHz and 1.76 GHz of carrier frequency, respectively.

REFERENCES

[1] A. Kral, F. Behbahani, and A. A. Abidi, “RF-CMOS oscillators with switched tuning,” in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 555–558. [2] C. Lam and B. Razavi, “A 2.6-GHz/5.2-GHz frequency synthesizer in 0.4-µm CMOS technology,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1999, pp. 117–120. [3] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: Prentice Hall, 1998. [4] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.


[5] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966. [6] S. Mohan, M. Hershenson, S. Boyd, and T. H. Lee, “Simple accurate expressions for planar spiral inductances,” IEEE J. Solid-State Circuits, vol. 34, pp. 1419–1424, Oct. 1999. [7] J.-M. Mourant, J. Imbornone, and T. Tewksbury, “A low phase noise monolithic VCO in SiGe BiCMOS,” in IEEE Radio Frequency Integrated Circuits (RFIC) Symp. Dig., June 2000, pp. 65–68.

Yido Koo was born in Seoul, Korea, in 1973. He received the B.S. and M.S. degrees from the School of Electrical Engineering, Seoul National University, Seoul, Korea, in 1996 and 1998, respectively, where he is currently working toward the Ph.D. degree. His research interests include RF building blocks and systems for wireless communication and high-speed interfaces for data communications. Currently, he is developing a low-noise frequency synthesizer for CDMA and GSM applications.

Hyungki Huh was born in Seoul, Korea. He received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1998 and 2001, respectively, where he is currently working toward the Ph.D. degree in electrical engineering. His research interests are in the area of RF circuits and systems with emphasis on the fractional frequency synthesizer.

Yongsik Cho was born in Daegu, Korea. He received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2000, where he is currently working toward the M.S. degree in electrical engineering. His research interests are in the area of RF circuits and systems.

Jeongwoo Lee received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1994, 1996, and 2000, respectively. He is currently a Manager with the W-CDMA team of GCT Semiconductor Inc., San Jose, CA. His current research interests include CMOS transceiver circuitry for highly integrated radio applications.

Joonbae Park received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively. In 1998, he joined GCT Semiconductor Inc., San Jose, CA, as Director of the Analog Division. He is currently involved in the development of CMOS RF chip sets for WLL, W-CDMA, and wireless LAN. His other research interests include data converters and high-speed communication interfaces. Dr. Park received the Best Paper Award of VLSI Design’99, Goa, India.

Kyeongho Lee was born in Seoul, Korea, in 1969. He received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively. He was with Silicon Image, Inc., Sunnyvale, CA, as a Member of Technical Staff, where he worked on CMOS high-bandwidth low-EMI transceivers. He is currently with GCT Semiconductor Inc., San Jose, CA, as a Co-Chief Executive Officer. His research interests include various CMOS high-speed circuits for wire/wireless communication systems and integrated CMOS RF systems.

Deog-Kyoon Jeong received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1989. From 1989 to 1991, he was with Texas Instruments Inc., Dallas, TX, where he was a Member of Technical Staff and worked on the modeling and design of BiCMOS gates and the single-chip implementation of the SPARC architecture. He joined the faculty of the Department of Electronics Engineering and Inter-University Semiconductor Research Center, Seoul National University, as an Assistant Professor in 1991. He is currently an Associate Professor of the School of Electrical Engineering, Seoul National University. His main research interests include high-speed I/O circuits, VLSI systems design, microprocessor architectures, and memory systems.

Wonchan Kim was born in Seoul, Korea, on December 11, 1945. He received the B.S. degree in electronics engineering from Seoul National University, Korea, in 1972. He received the Dip.-Ing. and Dr.-Ing. degrees in electrical engineering from the Technische Hochschule Aachen, Aachen, Germany, in 1976 and 1981, respectively. In 1972, he was with Fairchild Semiconductor Korea as a Process Engineer. From 1976 to 1982, he was with the Institut für Theoretische Electrotecnik RWTH Aachen. Since 1982, he has been with the School of Electrical Engineering, Seoul National University, where he is currently a Professor. His research interests include development of semiconductor devices and design of analog/digital circuits.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002


A Modeling Approach for Σ–Δ Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis

Michael H. Perrott, Mitchell D. Trott, Member, IEEE, and Charles G. Sodini, Fellow, IEEE

Abstract—A general model of phase-locked loops (PLLs) is derived which incorporates the influence of divide value variations. The proposed model allows straightforward noise and dynamic analyses of Σ–Δ fractional-N frequency synthesizers and other PLL applications in which the divide value is varied in time. Based on the derived model, a general parameterization is presented that further simplifies noise calculations. The framework is used to analyze the noise performance of a custom Σ–Δ synthesizer implemented in a 0.6-µm CMOS process, and accurately predicts the measured phase noise to within 3 dB over the entire frequency offset range spanning 25 kHz to 10 MHz.


Index Terms—Delta, dithering, divider, fractional-N, frequency, modeling, noise, phase-locked loop, PLL, quantization noise, sigma, synthesizer.

Manuscript received November 14, 2000; revised March 14, 2002. This work was supported in part by the Defense Advanced Research Projects Agency under Contract DAAL-01-95-K-3526. M. H. Perrott and C. G. Sodini are with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]). M. D. Trott is with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA. Publisher Item Identifier 10.1109/JSSC.2002.800925.

I. INTRODUCTION

THE USE OF wireless products has been rapidly increasing in the last decade, and there has been worldwide development of new systems to meet the needs of this growing market. As a result, new radio architectures and circuit techniques are being actively sought that achieve high levels of integration and low-power operation while still meeting the stringent performance requirements of today’s radio systems. One such technique is the use of Σ–Δ modulation to achieve high-resolution frequency synthesizers that have relatively fast settling times, as described by Riley et al. in [1], Copeland in [2], and Miller and Conley in [3], [4]. This method has now been used in a variety of applications ranging from accurate frequency generation [1], [5]–[7] to direct frequency modulation for transmitter applications [8]–[12]. However, despite its increasing use, a general model of Σ–Δ fractional-N synthesizers to encompass dynamic and noise performance has not previously been presented. The primary obstacle to deriving such a model is that, in contrast to classical phase-locked loop (PLL) systems, a Σ–Δ synthesizer dynamically varies the divide value in the PLL according to the output of a Σ–Δ modulator. Traditional methods of PLL analysis assume a static divide value, and the step toward allowing for dynamic variations is not straightforward. As a result, the impact of the divide value variations is often treated in isolation of other influences on the PLL [1], such as noise in the phase detector and voltage-controlled oscillator (VCO), and overall analysis of the synthesizer becomes cumbersome.

In this paper, we develop a simple model for the Σ–Δ synthesizer that allows straightforward analysis of its dynamic and noise performance. The predictions of the model compare extremely well to simulated and experimental results of implemented Σ–Δ synthesizers [9], [10], [13]. In addition, we present a PLL parameterization that simplifies calculation of the PLL dynamics and assessment of the synthesizer noise performance.

To develop the Σ–Δ synthesizer model, we first derive a general model of the PLL that incorporates the influence of divide value variations. The derivation is done in the time domain and then converted to a frequency-domain block diagram. We parameterize the resulting PLL model in terms of a single function G(f) and illustrate its usefulness in determining the noise performance of the PLL. The Σ–Δ modulator is then included in the generalized PLL model and its impact on the PLL is analyzed. Finally, the modeling approach is used to calculate the noise performance of a custom Σ–Δ synthesizer integrated in a 0.6-µm CMOS process and then compared to measured results.

Fig. 1. Block diagram of a Σ–Δ frequency synthesizer.

II. BACKGROUND

Fig. 1 displays a block diagram of a Σ–Δ frequency synthesizer, along with a snapshot of the signals associated with various nodes in this system. A PLL in essence, the synthesizer achieves accurate setting of its output frequency by locking to a reference frequency. This locking action is accomplished through feedback by dividing down the VCO output frequency and comparing its phase to the phase of the reference source

0018-9200/02$17.00 © 2002 IEEE

PERROTT et al.: MODELING APPROACH FOR Σ–Δ FRACTIONAL-N FREQUENCY SYNTHESIZERS

to produce an error signal. The phase comparison operation is done through the use of a phase/frequency detector (PFD) which also acts as a frequency discriminator when the PLL is out of lock. The loop filter attenuates high-frequency components in the PFD output so that a smoothed error signal is sent to the VCO input. It consists of an active or passive network, and is typically fed by a charge pump which converts the error signal to a current waveform. The charge pump is not necessary, but provides a convenient means of setting the gain of the loop filter and simplifies implementation of an integrator when required. As illustrated in the figure, a key characteristic of Σ–Δ synthesizers is that the divide value is dynamically changed in time according to the output of a Σ–Δ modulator. By doing so, much higher frequency resolution can be achieved for a given PLL bandwidth setting than possible with classical integer-N frequency synthesizers [1].

III. TIME-DOMAIN PLL MODEL

We now derive time-domain models for each individual PLL block shown in Fig. 1. The primary focus of our effort is on obtaining a divider model incorporating dynamic changes to its value. However, the derivation of this model requires careful attention to the way we model the PFD. In particular, we will parameterize signals associated with a tristate PFD with sequences that can be directly related to the divider operation. This approach is extended to an XOR-based PFD by relating its output to that of a tristate PFD. Following a brief derivation of the VCO model, we then obtain the divider model by relating its operation to the VCO model and the PFD sequences discussed above. Finally, the charge pump and loop filter models are described, and the overall PLL model constructed.

A. Tristate PFD

The tristate PFD and its associated signals are shown in Fig. 2. The output of the detector, , is characterized as a series of pulses whose widths are a function of the relative phase difference between rising edges of and . We parameterize the phase difference between and with the discrete-time sequences and , respectively. is nominally zero, and is defined in (1). The series of pulses that form are parameterized by the following discrete-time sequences.
• : time instants at which the rising edges of the reference clock occur.
• : time instants at which the rising edges of the divider output occur.
• : time difference between rising edges of and .
Assuming a constant reference frequency, consecutive values for are related for all as

where is the reference period. We will make use of the parameterization in deriving the PFD model; the other sequences will be used when deriving the divider model.

Fig. 2. Tristate phase-frequency detector and associated signals.

Since phase detection is a memoryless operation, its influence on the PLL dynamics is sufficiently modeled by its gain. However, the pulsed behavior of the PFD output adds some complexity in deriving the value of that gain, so our derivation will consist of two steps. The first step relates the input phase difference to the sequence. The second step relates the sequence to an impulse approximation of the waveform. The relationship of to the phase difference, , is defined as

(1)

To verify the above definition, one observes from Fig. 2 that a phase error of causes to be . The impact of the sequence on the PLL dynamics is cumbersome to model analytically since the pulse-width modulated PFD output has a nonlinear influence on the PLL dynamics. However, a simple approximation greatly eases our efforts: we simply represent the PFD output as an impulse sequence rather than a modulated pulse sequence. Fig. 3 illustrates this approximation; pulses in are represented as impulses with area equal to their corresponding pulse, as described by

(2)

We discuss the significance of the above expression when we derive the frequency-domain model of the PLL in Section IV. Our justification for the impulse approximation is heuristic: each PFD output pulse has much smaller width than the loop filter impulse response, and therefore acts like an impulse when the two are convolved together. Obviously, the accuracy of this approximation depends on how much smaller the PFD output pulse widths are compared to the dominant time constant of the loop filter. Since the PFD pulses must be smaller than a reference period, high accuracy is achieved

when the reference frequency is much higher than the loop filter (PLL) bandwidth. Fortunately, this condition is satisfied when dealing with Σ–Δ synthesizers since a high reference frequency to PLL bandwidth ratio is required to adequately suppress the Σ–Δ quantization noise. For additional discussion on this issue, see [13].

Fig. 3. Impulse sequence approximation of PFD output.

B. XOR-Based PFD

An XOR-based PFD is shown in Fig. 4 [13]–[15], along with associated signals that will be discussed later. Assuming the PFD is not performing frequency acquisition, the signal is simply passed to the output, E(t), so that the detector operates as an XOR phase detector. As such, the detector outputs an average error of zero when and are in quadrature, and is nominally a two-level square wave rather than the trilevel short-pulse waveform obtained with the tristate design. The combination of having wide pulses and only two output levels allows the XOR-based PFD to achieve high linearity, which is desirable for Σ–Δ synthesizer applications to avoid folding down Σ–Δ quantization noise [13].

To model the XOR-based PFD, we simply relate its associated signals to the tristate detector so that the previous results can be readily applied. Fig. 4 displays the signals associated with this PFD, and reveals that the output E(t) can be decomposed into the sum of a square wave, , and a trilevel pulse waveform, . The first component is independent of the input phase difference to the detector and presents a spurious noise signal to the PLL; its influence can be made negligible with proper design. The second component, , captures the impact of the input phase difference, , on the PFD output, and can be parameterized according to the width of its pulses, , where

As with the tristate detector, the impulse approximation can be applied to obtain

which, if we ignore , is the tristate expression multiplied by a factor of 2. Thus, if we ignore the phase offset of and the square wave , the XOR-based PFD has an identical model to that of the tristate topology except that its gain is increased by a factor of 2.

Fig. 4. XOR-based PFD, associated signals, and E(t) decomposition.

C. Voltage-Controlled Oscillator

For our purposes, only two equations are needed to model the VCO. The first relates deviations in the VCO phase, defined as , to changes in the VCO input voltage, . Since VCO phase is the integral of VCO frequency, and deviations in VCO frequency are calculated as , where is in units of hertz per volt, we have

(3)

The second equation relates the absolute VCO phase, defined as , to deviations in the VCO phase and the nominal VCO frequency :

(4)

Our modeling efforts will be primarily focused on deviations in the VCO phase, so that (3) is of the most interest. However, (4) is required in the divider derivation that follows.

D. Divider

Modeling of the divider will be accomplished by first relating the PFD pulse widths, , to the VCO phase deviations, , and the divide value sequence, . Given this relationship, the divider model is “backed out” using the PFD gain expression in (1). We begin by noting that the divider output edges occur whenever the absolute VCO phase, , completes radian increments of phase. As stated in (4), is composed of a ramp in time, , and phase variations, . These statements are collectively illustrated in Fig. 5. Note that changes in occur at the rising edges of the divider.

Fig. 5. Relationship of divider edges to instantaneous VCO phase, Φ(t).

Now, we can relate to the VCO phase signal and divider sequence using (4) and Fig. 5. The first of two key equations is derived from Fig. 5 as

(5)

The second key equation is obtained by evaluating (4) at time instants and and subtracting the resulting expressions:

which, since and , is equivalently written as

(6)

We combine the two key equations into one formulation by substitution of (6) into (5):

Rearrangement of this last expression then produces

(7)

Equation (7) is a difference equation relating all variables of interest; to remove the differences we sum the formulation over all positive time samples up to sample :

Carrying out the summation operation, we obtain

Assuming initial conditions are zero, this last expression becomes

(8)

The final form of the desired equation is obtained by modifying (8) according to the following statements:
• Define , , .
• Approximate .
As such, we obtain

(9)

We obtain the desired divider model by replacing with the PFD gain expression in (1) and assuming is zero.

(10)

It is important to note that the only approximation made in deriving (10) is that . Essentially, we are ignoring the nonuniform time sampling of the VCO phase deviations. As discussed in [13] and verified by actual implementations [9], [10], this approximation is quite accurate in practice even when the PLL is modulated.

E. Charge Pump and Loop Filter

The charge pump and loop filter relate the PFD output to the VCO input . We model the charge pump as a simple scaling operation on of value . The time domain model of the loop filter is characterized by its impulse response, .

Fig. 6. Time-domain model of PLL.

F. Overall Model

We now combine the results of Section III-A–E to obtain the overall time-domain PLL model shown in Fig. 6. The PFD model is obtained from (1) and (2), the divider model from (10), and the VCO model from (3). As discussed earlier, the


XOR-based PFD has a factor of two larger gain than the tristate design, which is captured by the factor in the PFD model. For convenience in analysis to follow, we also define an abstract signal, , as the output of the divider accumulation action. Some observations are in order. First, the divider effectively samples the continuous-time output phase deviation of the VCO, , and then divides its value by . The output phase of the divider, , is influenced by the integration of deviations in the divider value, . The integration of is a consequence of the fact that the divider output is a phase signal, whereas causes an incremental change in the frequency of the divider output. Second, the PFD, charge pump, and loop filter translate the discrete-time error signal formed by and to the continuous-time input of the VCO, . These elements, along with the divider, also act as a D/A converter for mapping changes in to .
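The first observation, that the divider integrates deviations in the divide value, can be checked with a small numeric sketch. It assumes an idealized case that is narrower than the paper's model: a noiseless VCO running at exactly the nominal divide value times the reference frequency, with the divide value written as a nominal value plus a deviation. The names `n_nom`, `ref_period`, and `n` are chosen here for illustration and are not taken from the source.

```python
from fractions import Fraction
from itertools import accumulate

def divider_edge_times(divide_values, vco_freq):
    """Edge k occurs once the VCO has completed sum(divide_values[:k]) cycles."""
    cycles = [0, *accumulate(divide_values)]
    return [Fraction(c) / vco_freq for c in cycles]

def edge_time_deviation(divide_values, n_nom, ref_period):
    """Offset of each divider edge from the ideal reference edge k * T."""
    vco_freq = Fraction(n_nom) / ref_period   # assumed locked, noiseless VCO
    edges = divider_edge_times(divide_values, vco_freq)
    return [t - k * ref_period for k, t in enumerate(edges)]

n_nom = 100                                   # assumed nominal divide value
ref_period = Fraction(1, 20_000_000)          # assumed 20 MHz reference
n = [1, -1, 0, 1, 1, -2]                      # assumed divide-value deviations

deviation = edge_time_deviation([n_nom + x for x in n], n_nom, ref_period)
running_sum = [0, *accumulate(n)]
predicted = [ref_period * s / n_nom for s in running_sum]
print(deviation == predicted)
```

Each edge offset equals the reference period divided by the nominal divide value, multiplied by the running sum of the deviations. In this idealized setting the divider output therefore carries the accumulated divide-value deviations as a timing (phase) error, which is the integration the text describes.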

IV. FREQUENCY-DOMAIN PLL MODEL Derivation of a frequency-domain model of the PLL is complicated by the sampling operation and impulse train modulator shown in Fig. 6. We discuss a simple approximation for the sampling operation and impulse train modulator that results in a linear time-invariant PLL model. This method, known as pseudocontinuous analysis [16], takes advantage of the fact that the impulsive output of the PFD is low-pass filtered in continuous time by the loop filter.
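The premise used here, that a narrow PFD pulse behaves like an impulse of equal area once it is low-pass filtered, can be illustrated with a closed-form comparison. The sketch assumes a first-order low-pass filter with time constant `tau` as a stand-in for the loop filter, and places the equivalent impulse at the start of the pulse. Both choices are simplifications made for illustration and are not taken from the source.

```python
import math

def pulse_response(t, width, tau):
    """Filter output at time t >= width for a unit-height pulse spanning
    0..width, with filter impulse response exp(-t / tau) / tau."""
    return math.exp(-t / tau) * math.expm1(width / tau)

def impulse_response(t, width, tau):
    """Filter output for an impulse of area `width` applied at t = 0."""
    return (width / tau) * math.exp(-t / tau)

def relative_error(width, tau):
    """Relative error of the impulse approximation after the pulse ends."""
    t = 2 * width
    exact = pulse_response(t, width, tau)
    approx = impulse_response(t, width, tau)
    return (exact - approx) / approx

tau = 1.0
for ratio in (0.5, 0.1, 0.01):
    print(f"width/tau = {ratio}: error = {relative_error(ratio * tau, tau):.4f}")
```

In this model the error shrinks roughly in proportion to the ratio of pulse width to filter time constant, which is the same trend the text relies on when it requires the reference frequency to be much higher than the PLL bandwidth.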

A. Pseudocontinuous Approximation

Consider a signal that is sampled with period and then converted to an impulse sequence , as described by

where . The frequency-domain relationship between and is found by taking the Fourier transform of the above expression, which leads to

This expression reveals that the Fourier transform of , , is composed of multiple copies of the Fourier transform of , , that are scaled in magnitude by and shifted in frequency from one another with spacing . We assume that the frequency content of is confined to frequencies between and , so that negligible aliasing occurs between the copies of within . Developing a frequency-domain model relating to is complicated by the many copies of in that occur due to the sampling operation. However, if we assume that is fed into a continuous-time low-pass filter with sufficiently low bandwidth, we can obtain a simple approximation of the relationship between and . Fig. 7 graphically illustrates a frequency-domain view of the sampling operation and the impact of following it with a continuous-time low-pass filter of bandwidth less than . The low-pass filter significantly attenuates all of the replicated copies of within except for the baseband copy, which allows us to approximate the relationship between and in the frequency domain as a simple scaling operation of . In so doing, we ignore aliasing effects that will occur at frequencies beyond if there is frequency content in the range of to . However, our analysis will be reasonably accurate when performing closed-loop analysis for most frequencies of interest in our application. The double outline of the box in the figure is meant to serve as a reminder that a sampling operation is taking place.

Fig. 7. Pseudocontinuous method of modeling a sampling operation in the frequency domain.

B. Resulting Model

The time-domain block diagram in Fig. 6 is now readily converted to the frequency domain by taking the z-transform of the discrete-time blocks, the Fourier transform of the continuous-time blocks, and by applying the approximation of the sampling operation discussed above. Fig. 8 displays the resulting model. Note that all blocks are parameterized by the common variable f, which denotes frequency in hertz, under the assumption that all discrete-time sequences interact with the continuous-time blocks as modulated impulse trains of period . Also note that all the signals in the PLL are still denoted in the time domain even though they interact through frequency-domain blocks. The reason for this notation convention is that, in practice, these signals are stochastic and do not have defined Fourier transforms, but rather are described by their power spectral densities.

Fig. 8. Frequency-domain model of PLL.

Fig. 9. Detailed view of PLL noise sources and examples of their respective spectral densities.

Fig. 10. Parameterized model of PLL for dynamic response and noise calculations.
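The pseudocontinuous approximation of Section IV-A rests on a standard sampling identity, written out below for reference. The symbol names x(t), x_s(t), X(f), and X_s(f) are assumptions chosen for this note; the source's own symbols did not survive extraction. The last relation states the approximation itself: after a low-pass filter of bandwidth below 1/(2T), only the baseband copy remains and sampling reduces to a gain of 1/T.

```latex
\begin{align}
  x_s(t) &= \sum_{k=-\infty}^{\infty} x(kT)\,\delta(t - kT), \\
  X_s(f) &= \frac{1}{T} \sum_{n=-\infty}^{\infty} X\!\left(f - \frac{n}{T}\right), \\
  X_s(f) &\approx \frac{1}{T}\, X(f)
  \quad \text{for } |f| < \frac{1}{2T} \text{ when } X(f) \text{ is confined to that band.}
\end{align}
```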

V. PARAMETERIZATION OF PLL

We now parameterize the PLL dynamics depicted in Fig. 8 in terms of a single function, which we will call $G(f)$. Using this parameterization, we then develop a general noise model for frequency synthesizers in which all the relevant transfer functions are described in terms of $G(f)$.

A. Derivation

To parameterize the PLL dynamics, it is convenient to define a base function that provides a simple description of all the PLL transfer functions of interest. It turns out that the following definition works well for this purpose:

$$G(f) = \frac{A(f)}{1 + A(f)} \qquad (11)$$

where $A(f)$ is the open-loop transfer function of the PLL, defined in (12). Since $A(f)$ is low pass in nature with infinite gain at dc, $G(f)$ has the following properties:

$$G(f) \to 1 \ \text{as} \ f \to 0, \qquad G(f) \to 0 \ \text{as} \ f \to \infty \qquad (13)$$

implying that $G(f)$ is a low-pass filter with a low-frequency gain of one. One may try to tie an intrinsic meaning to $G(f)$ in terms of PLL behavior. However, it is meant only as a convenient vehicle for compactly describing the PLL transfer functions of interest, as will be shown later in this section.

B. Application to Noise Analysis

The derived parameterization allows straightforward calculation of the noise performance of a synthesizer as a function of various noise sources in the PLL, which are shown in Fig. 9. Divider/reference jitter corresponds to noise-induced variations in the transition times of the Reference or Divider output waveforms. A periodic reference spur is caused by use of the XOR-based PFD, or by the tristate PFD when its output duty cycle is nonzero. Charge-pump noise is caused by noise produced in the transistors that compose the charge-pump circuit. Finally, VCO noise includes the intrinsic noise of the VCO and voltage noise at the output of the loop filter. For convenience in later discussion, we have lumped these noise sources into two categories, VCO noise and detector noise, as shown in Fig. 9.

Fig. 10 displays the transfer function relationships from each of the above noise sources to the synthesizer output. The derivation of these transfer functions is straightforward based on Fig. 9 and the $G(f)$ parameterization derived earlier. Note that two different parameterizations are shown to describe the impact of divide value variations on the PLL output phase. The alternate model relates changes in the divide value more directly to the PLL output frequency. Its derivation follows by noting that the order of linear time-invariant blocks can be switched, and that the accumulation performed by the divider is well approximated by continuous-time integration for $f \ll 1/T$.

Note that the validity of the dynamic model, and its alternate, presented in Fig. 10, has been verified in previous work discussed in [9], [13]. The validity of the noise model will be verified in Section VII.

Calculation of spectral noise densities using Fig. 10 is complicated by the fact that both discrete-time (DT) and continuous-time (CT) signals are present. Three cases are of significance, and their respective spectral noise calculations are as follows [17]:

Case 1) CT input $x(t)$ fed into CT filter $H(f)$ to produce a CT output $y(t)$:

$$S_y(f) = |H(f)|^2 S_x(f) \qquad (14)$$

Case 2) DT input $x[k]$ fed into DT filter $H(e^{j2\pi fT})$ to produce a DT output $y[k]$:

$$S_y(e^{j2\pi fT}) = |H(e^{j2\pi fT})|^2 S_x(e^{j2\pi fT}) \qquad (15)$$

Case 3) DT input $x[k]$ fed into CT filter $H(f)$ to produce a CT output $y(t)$:

$$S_y(f) = \frac{1}{T}\,|H(f)|^2 S_x(e^{j2\pi fT}) \qquad (16)$$

In Case 3), we assume that the DT input interacts with the CT filter as a modulated impulse train of period $T$. The above spectral density calculations and Fig. 10 allow us to accurately calculate the influence of the various noise sources on the PLL output.

A few qualitative observations are also in order. Detector noise is low-pass filtered by the PLL dynamics, while VCO noise is high-pass filtered by the PLL dynamics. The overall noise power in the PLL output, whose integral over frequency corresponds to the time-domain jitter of the PLL output, is a function of the PLL bandwidth. If the PLL bandwidth is very low, VCO noise will dominate over a wide frequency range due to the strong suppression of detector noise. Likewise, a high PLL bandwidth will suppress VCO noise over a wide frequency range at the expense of allowing more detector noise through.
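The bandwidth tradeoff described above can be illustrated numerically. The following is a minimal sketch, not the paper's model: it assumes a second-order Butterworth shape for the closed-loop function, treats detector noise as a flat in-band floor already referred to the output, and uses made-up noise levels. All function names and numbers are hypothetical.

```python
import math


def g_butter2(f, f0):
    """Second-order Butterworth low-pass with unity dc gain.

    This shape is an assumed stand-in for the closed-loop function G(f);
    f0 is its -3 dB bandwidth in hertz.
    """
    s = 1j * f / f0
    return 1.0 / (1.0 + math.sqrt(2.0) * s + s * s)


def output_noise(f, f0, s_det, s_vco):
    """Return (detector, VCO) contributions to the output density at offset f.

    s_det: detector noise density referred to the output with unity in-band
           gain (hypothetical scaling); it is low-pass filtered by G(f).
    s_vco: free-running VCO phase noise density at offset f; it is
           high-pass filtered by 1 - G(f).
    """
    g = g_butter2(f, f0)
    return abs(g) ** 2 * s_det, abs(1.0 - g) ** 2 * s_vco


def vco_density(f, level_db=-140.0, f_ref=5e6):
    """Free-running VCO noise falling at 20 dB/decade (illustrative numbers)."""
    return 10.0 ** (level_db / 10.0) * (f_ref / f) ** 2


if __name__ == "__main__":
    s_det = 10.0 ** (-100.0 / 10.0)  # illustrative in-band floor
    for f0 in (20e3, 200e3):         # a low and a high loop bandwidth
        for f in (1e3, 100e3, 10e6):
            det, vco = output_noise(f, f0, s_det, vco_density(f))
            print(f"f0={f0:.0e} Hz  f={f:.0e} Hz  "
                  f"detector={10 * math.log10(det):7.1f} dB  "
                  f"vco={10 * math.log10(vco):7.1f} dB")
```

Comparing the two bandwidths at the 100-kHz offset shows the tradeoff: the narrow loop lets the VCO term through almost unfiltered, while the wide loop passes the detector floor instead.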



VI. Σ–Δ SYNTHESIZER MODEL

We are now ready to incorporate the Σ–Δ modulator into the general PLL model. We do so by first providing a brief description of Σ–Δ modulator fundamentals, and then providing intuition for how such modulators increase the frequency resolution of a synthesizer compared to a classical implementation in which the divide value is held constant. Finally, we present a frequency-domain model of the Σ–Δ synthesizer and use it to calculate the impact of the Σ–Δ quantization noise on the PLL output phase.

A. Σ–Δ Modulator

A Σ–Δ modulator achieves a high-resolution signal using only a few output levels. To do this, the modulator dithers its output at a high rate such that the “average” value of the dithered sequence corresponds to a high-resolution input signal whose energy is confined to low frequencies. Appropriate filtering of the output sequence removes quantization noise produced by the dithering, which yields a high-resolution signal closely matching that of the input.

In Σ–Δ synthesizer applications, it is important to note that the Σ–Δ modulator is purely digital in its implementation. Thus, Σ–Δ structures that are difficult to implement in the analog world due to high matching requirements, such as the MASH (or cascaded) architecture [18], [19], are trivial to implement in this application due to the precise matching offered by digital circuits.

Fig. 11. Illustration of dithering action of Σ–Δ modulator.

In general, modeling of a Σ–Δ modulator is accomplished by assuming its quantization noise is independent of its input [19]. This leads to a linear time-invariant model that is parameterized by transfer functions from the input and quantization noise to the output. For instance, a MASH Σ–Δ modulator structure [19] of order $m$, input $x[k]$, and output $y[k]$ is described by

$$Y(z) = X(z) + (1 - z^{-1})^{m} R(z) \qquad (17)$$

Thus, the modulator passes its input to the output along with quantization noise, $r[k]$, that is shaped by the filter $(1 - z^{-1})^{m}$. Ideally, $r[k]$ is white and uniformly distributed between 0 and 1, so that its spectrum is flat and of magnitude $1/12$ [20], [21].

It is convenient to parameterize the Σ–Δ modulator in terms of two transfer functions. The signal transfer function (STF) of the Σ–Δ modulator is defined from the input $x[k]$ to the output $y[k]$, while the noise transfer function (NTF) is defined from the base quantization noise $r[k]$ to the output. Inspection of (17) reveals that a MASH structure of order $m$ is parameterized as

STF: $1$, NTF: $(1 - z^{-1})^{m}$.

B. Application to PLL

To understand the impact of using a Σ–Δ modulator to control the divide value in a frequency synthesizer, Fig. 11 contrasts the way the divide value is varied in classical versus Σ–Δ fractional-N frequency synthesizers based on the alternate model in Fig. 10. Note that the divide value variations are cast as continuous-time signals to get the proper scale factor such that a unit change in divide value yields an output frequency change of $1/T$ Hz. In the classical case, the divide value is static except when the output frequency is changed, and the PLL output frequency responds to the change according to the low-pass nature of the PLL dynamics, $G(f)$. In contrast, a Σ–Δ fractional-N

Fig. 12. Parameterized model of a Σ–Δ synthesizer.

synthesizer constantly dithers the divide value at a high rate compared to the bandwidth of $G(f)$ such that $G(f)$ extracts its low-frequency content. The low-frequency content of the Σ–Δ output is, in turn, set by the Σ–Δ input, which can have arbitrarily high resolution. Thus, the Σ–Δ modulator allows the PLL output frequency to be controlled to a very high resolution independent of the reference frequency; a high reference frequency can be used while simultaneously achieving high frequency resolution.

C. Frequency-Domain Model

To obtain the frequency-domain model of a Σ–Δ synthesizer, we simply extend the PLL model in Fig. 10 to include the Σ–Δ modulator, as shown in Fig. 12. This figure depicts a general model of a Σ–Δ modulator which is characterized by its STF and NTF. The base quantization noise is assumed ideal (i.e., white) in the illustration.

Fig. 12 offers several insights into the fundamentals of Σ–Δ frequency synthesis. First, we see that the shaped Σ–Δ quantization noise passes through a digital accumulator and then the PLL dynamics, $G(f)$, before impacting the output phase of the PLL. The digital accumulator, a consequence of the integrating nature of the divider, effectively reduces the noise-shaping order of the Σ–Δ by one. The PLL dynamics, $G(f)$, act to remove the high-frequency quantization noise produced by the Σ–Δ modulator. The Σ–Δ quantization noise adds an additional noise source to those already present in the PLL, but the relationship from each noise source to the output phase remains purely a function of $G(f)$ and the nominal divide value.

D. Quantization Noise Impact on PLL

As Fig. 12 reveals, a Σ–Δ synthesizer’s noise performance is impacted by the Σ–Δ quantization noise in addition to the intrinsic detector and VCO noise sources found in the classical PLL. Calculation of this impact is straightforward using the presented modeling approach. For example, given that the NTF of an $m$th-order MASH structure is $(1 - z^{-1})^{m}$, we calculate the impact of its quantization noise on the PLL output using Fig. 12 and (16) as

$$S_{\Phi_{out}}(f)\Big|_{\Sigma\Delta} = \frac{1}{T}\left|T\,G(f)\,\frac{2\pi z^{-1}}{1 - z^{-1}}\,(1 - z^{-1})^{m}\right|^{2} S_r(e^{j2\pi fT}), \qquad z = e^{j2\pi fT}$$

which is also expressed as

$$S_{\Phi_{out}}(f)\Big|_{\Sigma\Delta} = \frac{1}{T}\left|2\pi T\,G(f)\right|^{2}\left(2\sin(\pi fT)\right)^{2(m-1)} S_r(e^{j2\pi fT}) \qquad (18)$$

If the quantization noise spectrum of $r[k]$ is white, then $S_r(e^{j2\pi fT}) = 1/12$, as previously discussed. In many cases, $r[k]$ is not white and its spectrum must be computed numerically by simulating the Σ–Δ modulator at a given value of its input.

Equation (18) shows that the Σ–Δ quantization noise is reduced in order by one due to the integrating action of the divider. Assuming $r[k]$ is white, the shaped noise rises at $20(m-1)$ dB/decade for frequencies $f \ll 1/T$. Therefore, if the order of $G(f)$ is chosen to be the same as the order of the Σ–Δ, $m$, the quantization noise seen at the PLL output will roll off at 20 dB/decade outside the PLL bandwidth. This rolloff characteristic matches that of the VCO noise.

VII. RESULTS

The above methodology is now used to analyze the noise performance of a prototype system described in [9], [13]. Fig. 13 displays a block diagram of the prototype, which consists of a custom CMOS fractional-N synthesizer IC that includes an XOR-based PFD, an on-chip loop filter that uses switched capacitors to set its time constant, a second-order digital MASH Σ–Δ modulator, and an asynchronous 64-modulus divider that supports any divide value between 32 and 63.5 in half-cycle increments. An external divide-by-2 prescaler is used so that the CMOS divider input operates at half the VCO frequency, which modifies the range of divide values to include all integers between 64 and 127. A computer interface is used to set the digital frequency value that is fed into the input of the Σ–Δ modulator.

Fig. 13. Block diagram of prototype system.

A. Modeling

A linearized frequency-domain model of the prototype system is shown in Fig. 14. The open-loop transfer function of the system consists of two integrators, a pole, and a zero. Additional poles and zeros occur in the system due to the effects of finite opamp bandwidth and other nonidealities,

Fig. 14. Linearized frequency-domain model of prototype system.

but are not significant for the analysis to follow. The $G(f)$ parameterization is calculated from Fig. 14 and (11) as given in (19). The parameters of the system were set such that the PLL had a bandwidth of 84 kHz; the corresponding parameter values, each in kilohertz, are specified in (20).

Fig. 15. Expanded view of PLL system.

TABLE I. VALUES OF NOISE SOURCES WITHIN PLL

Fig. 15 expands the block diagram of the prototype to indicate the circuits of relevance and their respective noise contributions. A few comments are in order. First, a reference frequency of 20 MHz was chosen to achieve an acceptably low impact of Σ–Δ quantization noise while still allowing low-power implementation of the digital logic. This choice of reference frequency, in turn, required that the nominal divide value be set to 92 to achieve an output carrier frequency of 1.84 GHz. The VCO gain was set to 30 MHz/V by the external VCO. The value of the on-chip capacitance was chosen as large as practical in order to obtain good noise performance; it was constrained to 30 pF due to area constraints on the die of the custom IC.

B. Noise Analysis

Table I displays the value of each noise source shown in Fig. 15. Many of these values were obtained through ac simulation of the relevant circuits in HSPICE. Note that all noise sources other than the Σ–Δ quantization noise are assumed to be white, so that the values of their variance suffice for their description. This assumption holds for the input-referred VCO noise, provided that the output phase noise of the VCO rolls off at 20 dB/dec [22], [23]; the 20 dB/dec rolloff is achieved in the model since the input-referred VCO noise, which has a flat spectral density, passes through the integrating action of the VCO. The actual VCO deviates from the 20 dB/dec rolloff at low frequencies due to $1/f$ noise, and at high frequencies due to a finite noise floor. However, the assumption of 20 dB/dec rolloff suffices for the frequency offsets of interest.
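The prototype uses a second-order digital MASH modulator, and the quantization noise sometimes has to be obtained by simulating the modulator rather than assuming it is white. The following is a minimal sketch of such a simulation, assuming a 1-1 MASH built from two overflowing accumulators. The 16-bit word length, the input value, and all names are illustrative assumptions and not details of the prototype IC.

```python
def mash_1_1(x, n_samples, bits=16):
    """Simulate a second-order (1-1) MASH built from two accumulators.

    x is the integer input word with 0 <= x < 2**bits. The returned
    sequence takes values in {-1, 0, 1, 2} and its average approaches
    x / 2**bits, which is the fractional part of the divide value.
    """
    modulus = 2 ** bits
    acc1 = 0
    acc2 = 0
    carry2_prev = 0
    out = []
    for _ in range(n_samples):
        # First stage: accumulate the input; the carry is the coarse output.
        acc1 += x
        carry1, acc1 = divmod(acc1, modulus)
        # Second stage: accumulate the first-stage residue (its error).
        acc2 += acc1
        carry2, acc2 = divmod(acc2, modulus)
        # Noise cancellation: add the first difference of the second carry.
        out.append(carry1 + carry2 - carry2_prev)
        carry2_prev = carry2
    return out


if __name__ == "__main__":
    frac_word = 12345                      # hypothetical fractional setting
    seq = mash_1_1(frac_word, 65536)
    divide_values = [92 + y for y in seq]  # 92 is the nominal divide value
    print("mean output:", sum(seq) / len(seq))
    print("target     :", frac_word / 2 ** 16)
    print("divide values used:", sorted(set(divide_values)))
```

A spectrum of `seq` (with its mean removed) could then be estimated with any standard periodogram routine to check how closely the shaped noise follows the white-noise prediction.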

The input-referred noise of the VCO was calculated from an open-loop VCO phase noise measurement (shown in Fig. 17) at 5-MHz frequency offset, as given in (21), where the VCO gain is 30 MHz/V. The value of the noise current produced by the switched-capacitor operation was calculated as given in (22), which depends on Boltzmann’s constant and on the temperature in kelvins. Finally, the spectral density of the Σ–Δ quantization noise was calculated as given in (23), where $m = 2$ is the order of the Σ–Δ modulator.

The noise sources in Table I can be classified as either charge-pump noise, VCO noise, or Σ–Δ quantization noise. For convenience, we will assume that the VCO noise is referred to the input of the VCO, so that it passes through the VCO transfer function before influencing the VCO output phase. Given the values of these sources, the overall noise spectral density at the synthesizer output is described by (24) in terms of three contributions, which arise from the charge-pump noise, the VCO noise, and the Σ–Δ quantization noise, respectively. The Σ–Δ contribution is given by (18) with $m = 2$. The charge-pump and VCO contributions are calculated from Fig. 10 and (14) as given in (25). Note that we have assumed that the charge-pump and VCO noise sources are white, and that the PFD scale factor takes the value appropriate to an XOR-based PFD, since an XOR-based PFD is used.

The task that remains is to determine the values of the charge-pump noise and the VCO noise. Examination of Fig. 15 reveals that the charge-pump noise is a function of the noise sources listed in (26), while the VCO noise is a function of the noise sources listed in (27). We will quickly infer the value of these functions in this paper; the reader is referred to [13] for more detail.

Let us first determine the charge-pump noise. Examination of Table I reveals that one of its noise sources is an order of magnitude larger than the other three. Since this noise source is switched alternately between the positive and negative terminals of OP1, its contribution will be pulsed in nature. At a nominal duty cycle of 50%, we would expect its energy to be split equally between the positive and negative terminals of OP1, which sets the resulting charge-pump noise value. This intuitive argument was verified using a detailed C simulation of the PLL [24]. Note that a more accurate estimate will take into account any offset in the nominal duty cycle of the phase detector output, and the transient response of the charge pump.

Now let us determine the VCO noise. Since Table I reveals that its constituent noise sources are of the same order of magnitude, we simply add these components. The resulting expression is accurate at frequencies less than the unity-gain bandwidth of OP1; the corresponding noise source is passed to the output of OP1 with a gain of approximately one in this region. At frequencies beyond OP1’s bandwidth, the expression is conservatively high since that noise source is attenuated in this frequency range.

Fig. 16. Calculated noise spectra of synthesizer compared to measured results.

Based on the above information, plots of the spectra in (24) are shown in Fig. 16. For convenience, we have also overlaid measured results from Fig. 17 for easy comparison, which will be discussed shortly. As shown in Fig. 16, the influence of detector noise dominates at low frequencies, and the influence of VCO and Σ–Δ quantization noise dominates at high frequencies. Note that the calculations use $G(f)$ described by (19) with the parameter values specified in (20).
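To make the quantization-noise term of these spectra concrete, the sketch below evaluates the well-known shaped-noise expression $\frac{1}{T}|2\pi T\,G(f)|^{2}(2\sin(\pi fT))^{2(m-1)}\cdot\frac{1}{12}$ for a white base quantization noise. It is an illustration under stated assumptions rather than a reproduction of Fig. 16: $|G(f)|^{2}$ is replaced by a Butterworth magnitude response instead of the prototype's actual parameterization, and only the 20-MHz reference, the 84-kHz bandwidth, and the second-order modulator are taken from the text. All function names are hypothetical.

```python
import math


def g_mag_sq(f, f0, order):
    """|G(f)|^2 for a Butterworth low-pass (an assumed stand-in for G(f))."""
    return 1.0 / (1.0 + (f / f0) ** (2 * order))


def sd_noise_at_output(f, t_ref, m, f0, g_order, s_r=1.0 / 12.0):
    """Quantization-noise phase density (rad^2/Hz) at the PLL output.

    t_ref   : reference period T in seconds
    m       : order of the MASH modulator
    f0      : closed-loop bandwidth in hertz
    g_order : order of the assumed Butterworth G(f)
    s_r     : density of the base quantization noise (1/12 if white)
    """
    shaping = (2.0 * math.sin(math.pi * f * t_ref)) ** (2 * (m - 1))
    scale = (1.0 / t_ref) * (2.0 * math.pi * t_ref) ** 2
    return scale * g_mag_sq(f, f0, g_order) * shaping * s_r


if __name__ == "__main__":
    t_ref = 1.0 / 20e6   # 20-MHz reference
    f0 = 84e3            # 84-kHz loop bandwidth
    for f in (25e3, 100e3, 1e6, 5e6):
        s = sd_noise_at_output(f, t_ref, m=2, f0=f0, g_order=2)
        print(f"{f:9.0f} Hz: {10 * math.log10(s):7.1f} dB (rad^2/Hz)")
```

With the order of the assumed $G(f)$ equal to the modulator order, the printed values fall by roughly 20 dB per decade well above the loop bandwidth, consistent with the rolloff argument made for the quantization noise.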

Fig. 17. Measured closed-loop synthesizer noise and open-loop VCO noise.

Fig. 17 shows measured plots of the closed-loop synthesizer noise and the open-loop phase noise of the VCO from the synthesizer prototype; the plots were obtained from an HP 3048A phase-noise measurement system. It should be noted that the LSB of the Σ–Δ modulator was dithered to reduce spurious content, which was necessary due to the low order of the Σ–Δ modulator. The resulting spectra compare quite well with the calculated curve in Fig. 16 over the frequency offset range of 25 kHz to 10 MHz. Above 10 MHz, the phase-noise measurement was limited by the sensitivity of the measurement equipment. Note that the −60 dBc spur at 20-MHz offset is due to the 50% nominal duty cycle of the PFD; no effort was made to reduce it below this level during the design process since it was acceptable for the intended application of the prototype.

VIII. CONCLUSION

In this paper, we developed a general model of a PLL that incorporates the influence of divide value variations. A model for Σ–Δ fractional-N synthesizers was obtained by simply incorporating a Σ–Δ modulator model into this framework. The PLL model was parameterized by a single transfer function, $G(f)$, which further simplifies noise calculations. The framework was used to calculate the noise performance of a custom Σ–Δ synthesizer, and was shown to accurately predict measured results within 3 dB over a frequency offset range from 25 kHz to 10 MHz.

ACKNOWLEDGMENT

The authors would like to thank the Hong Kong University of Science and Technology, and in particular, J. Lau, P. Chan, and P. Ko, for their support in the writing of this paper.

REFERENCES

[1] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[2] M. A. Copeland, “VLSI for analog/digital communications,” IEEE Commun. Mag., vol. 29, pp. 25–30, May 1991.
[3] B. Miller and B. Conley, “A multiple modulator fractional divider,” in Proc. 44th Annu. Symp. Frequency Control, May 1990, pp. 559–567.
[4] B. Miller and B. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[5] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with 3-b third-order sigma–delta modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[6] B. Miller, “Technique enhances the performance of PLL synthesizers,” Microw. RF, pp. 59–65, Jan. 1993.
[7] T. Kenny, T. Riley, N. Filiol, and M. Copeland, “Design and realization of a digital delta–sigma modulator for fractional-N frequency synthesis,” IEEE Trans. Veh. Technol., vol. 48, pp. 510–521, Mar. 1999.
[8] T. A. Riley and M. A. Copeland, “A simplified continuous phase modulator technique,” IEEE Trans. Circuits Syst. II, vol. 41, pp. 321–328, May 1994.
[9] M. Perrott, T. Tewksbury, and C. Sodini, “A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation,” IEEE J. Solid-State Circuits, vol. 32, pp. 2048–2060, Dec. 1997.
[10] S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, and W. McFarland, “An integrated 2.5-GHz sigma–delta frequency synthesizer with 5 microseconds settling and 2-Mb/s closed-loop modulation,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2000, pp. 200–201.
[11] N. Filiol, T. Riley, C. Plett, and M. Copeland, “An agile ISM band frequency synthesizer with built-in GMSK data modulation,” IEEE J. Solid-State Circuits, vol. 33, pp. 998–1008, July 1998.
[12] N. Filiol, C. Plett, T. Riley, and M. Copeland, “An interpolated frequency-hopping spread-spectrum transceiver,” IEEE Trans. Circuits Syst. II, vol. 45, pp. 3–12, Jan. 1998.
[13] M. H. Perrott, “Techniques for high data rate modulation and low power operation of fractional-N frequency synthesizers with noise shaping,” Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1997.
[14] A. Hill and A. Surber, “The PLL dead zone and how to avoid it,” RF Design, pp. 131–134, Mar. 1992.
[15] M. Thamsirianunt and T. A. Kwasniewski, “A 1.2-μm CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 1994, p. 16.2.
[16] J. A. Crawford, Frequency Synthesizer Handbook. Norwood, MA: Artech, 1994.
[17] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Norwell, MA: Kluwer, 1994.
[18] J. Candy and G. Temes, Oversampling Delta–Sigma Data Converters. New York: IEEE Press, 1992.
[19] S. Norsworthy, R. Schreier, and G. Temes, Delta–Sigma Data Converters: Theory, Design, and Simulation. New York: IEEE Press, 1997.
[20] A. Sripad and D. Snyder, “A necessary and sufficient condition for quantization errors to be uniform and white,” IEEE Trans. Acoust. Speech Signal Proc., vol. ASSP-25, pp. 442–448, Oct. 1977.
[21] W. Bennett, “Spectra of quantized signals,” Bell Syst. Tech. J., vol. 27, pp. 446–472, July 1948.
[22] D. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, pp. 329–330, Feb. 1966.
[23] A. Hajimiri and T. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[24] M. H. Perrott, “Fast and accurate behavioral simulation of fractional-N frequency synthesizers and other PLL/DLL circuits,” in Proc. Design Automation Conf. (DAC), June 2002, pp. 498–503.

Michael H. Perrott received the B.S. degree in electrical engineering from New Mexico State University, Las Cruces, in 1988, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1992 and 1997, respectively. From 1997 to 1998, he was with Hewlett-Packard Laboratories, Palo Alto, CA, working on high-speed circuit techniques for Σ–Δ synthesizers. In 1999, he was a visiting Assistant Professor at the Hong Kong University of Science and Technology, where he taught a course on the theory and implementation of frequency synthesizers. From 1999 to 2001, he was with Silicon Laboratories, Austin, TX, where he developed circuit and signal-processing techniques to achieve high-performance clock and data recovery circuits. He is currently an Assistant Professor in the Department of Electrical Engineering and Computer Science at M.I.T., where his research focuses on high-speed circuit and signal processing techniques for data links and wireless applications.


Mitchell D. Trott (S’90–M’92) received the B.S. and M.S. degrees in systems engineering from Case Western Reserve University, Cleveland, OH, in 1987 and 1988, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1992. He was an Assistant and Associate Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, Cambridge, from 1992 until 1998. He was Director of Research with ArrayComm, Inc., San Jose, CA, from 1998 to 2002. He is currently with Hewlett-Packard Laboratories, Palo Alto, CA. His research interests include multiuser communication, information theory, and coding theory.

Charles G. Sodini (S’80–M’82–SM’90–F’94) was born in Pittsburgh, PA, in 1952. He received the B.S.E.E. degree from Purdue University, Lafayette, IN, in 1974, and the M.S.E.E. and Ph.D. degrees from the University of California, Berkeley, in 1981 and 1982, respectively. He was a Member of the Technical Staff with Hewlett-Packard Laboratories from 1974 to 1982, where he worked on the design of MOS memory and, later, on the development of MOS devices with very thin gate dielectrics. He joined the faculty of the Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, in 1983, where he is currently a Professor in the Department of Electrical Engineering and Computer Science. His research interests are focused on integrated circuit and system design with emphasis on analog, RF, and memory circuits and systems. Along with Prof. R. T. Howe, he is a coauthor of an undergraduate text on integrated circuits and devices entitled Microelectronics: An Integrated Approach (Englewood Cliffs, NJ: Prentice-Hall, 1996). Dr. Sodini held the Analog Devices Career Development Professorship at M.I.T.’s Department of Electrical Engineering and Computer Science and was awarded the IBM Faculty Development Award from 1985 to 1987. He has served on a variety of IEEE Conference Committees, including the International Electron Device Meeting, of which he was the 1989 General Chairman. He was the Technical Program Co-Chairman in 1992 and the Co-Chairman for 1993–1994 of the Symposium on VLSI Circuits. He served on the Electron Device Society Administrative Committee from 1988 to 1994. He has been a member of the Solid-State Circuits Society (SSCS) Administrative Committee since 1993 and is currently President of the SSCS.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

A Stabilization Technique for Phase-Locked Frequency Synthesizers

Tai-Cheng Lee and Behzad Razavi, Fellow, IEEE

Abstract—A stabilization technique is presented that relaxes the tradeoff between the settling speed and the magnitude of output sidebands in phase-locked frequency synthesizers. The method introduces a zero in the open-loop transfer function through the use of a discrete-time delay cell, obviating the need for resistors in the loop filter. A 2.4-GHz CMOS frequency synthesizer employing the technique settles in approximately 60 μs with 1-MHz channel spacing while exhibiting a sideband magnitude of −58.7 dBc. Designed for Bluetooth applications and fabricated in a 0.25-μm digital CMOS technology, the synthesizer achieves a phase noise of −112 dBc/Hz at 1-MHz offset and consumes 20 mW from a 2.5-V supply.

Index Terms—Charge pumps, feedforward, loop stability, oscillators, phase-locked loops (PLLs), prescalers, synthesizers.

I. INTRODUCTION

The design of phase-locked loops (PLLs) must generally deal with a tight tradeoff between the settling time and the amplitude of the ripple on the oscillator control line. For phase-locked RF synthesizers, this tradeoff limits the performance in terms of the channel switching speed and the magnitude of the reference sidebands that appear at the output.

This paper describes a loop stabilization technique that yields a small ripple while achieving fast settling [1]. Using a discrete-time delay cell, the PLL architecture creates a zero in the open-loop transfer function. Another important advantage of the technique is that it uses no resistors in the loop filter, lending itself to digital CMOS technologies. Also, it “amplifies” the value of the loop filter capacitor, thus saving a great deal of silicon area. Realized in a 2.4-GHz CMOS synthesizer, the proposed method provides a settling time of approximately 60 reference cycles with an output sideband level of −59 dBc.

Section II of the paper develops the foundation for the proposed technique. Section III describes the 2.4-GHz synthesizer architecture and the design of its building blocks, and Section IV proposes fast simulation techniques for RF synthesizers. Section V summarizes the experimental results.

Manuscript received April 23, 2002; revised February 18, 2003. T.-C. Lee was with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA. He is now with the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan, R.O.C. B. Razavi is with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA. Digital Object Identifier 10.1109/JSSC.2003.811879

II. STABILIZATION TECHNIQUE

Consider the PLL shown in Fig. 1(a), where a voltage-controlled oscillator (VCO) is driven by a charge pump (CP) and a phase/frequency detector (PFD). Resistor $R_1$ provides the stabilizing zero and capacitor $C_2$ suppresses the glitch generated by the charge pump at every phase comparison instant. The glitch arises from: 1) the mismatch between the arrival times of the Up and Down pulses; 2) the mismatch between the widths of the Up and Down pulses; 3) the mismatch between the charge pump current sources (both random and due to channel-length modulation); and 4) the mismatch between the charge injection and clock feedthrough of the pMOS and nMOS switches in the charge pumps. Charge sharing also exacerbates the ripple [2].

Fig. 1. (a) Conventional PLL architecture. (b) Proposed PLL architecture with delayed charge pump circuit.

The principal limitation of this architecture is that $C_1$ determines the settling whereas $C_2$ lowers the ripple on the control voltage. Since $C_2$ must remain below $C_1$ by roughly a factor of 10 so as to avoid underdamped settling, the loop must inevitably be slowed down by a large $C_1$ if $C_2$ is to sufficiently suppress the ripple. It is, therefore, desirable to seek methods of creating the stabilizing zero without the resistor so that the capacitor that defines the switching speed also directly suppresses the ripple.

A number of approaches to realizing a zero in PLLs have been reported [3]–[5]. The circuits in [3] and [4] require a transconductance amplifier, whose design for large output swings (necessary for maximizing the tuning range of LC VCOs) and low flicker noise becomes difficult. The synthesizer in [5] employs a voltage-controlled delay line but it mandates a large delay and a nearly rail-to-rail control voltage.
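For reference, the stabilizing zero of the conventional loop filter can be worked out from its impedance. The following is a sketch that assumes the usual second-order passive filter, with a resistor $R_1$ in series with $C_1$ and the pair shunted by $C_2$; these component names are assumed here for illustration.

```latex
% Impedance seen by the charge pump: (R_1 + 1/(sC_1)) in parallel with 1/(sC_2)
Z(s) = \frac{1 + sR_1C_1}
            {s\,(C_1 + C_2)\left(1 + sR_1\,\dfrac{C_1C_2}{C_1 + C_2}\right)}
% Stabilizing zero and ripple-suppressing pole:
\omega_z = \frac{1}{R_1C_1},
\qquad
\omega_p = \frac{C_1 + C_2}{R_1C_1C_2}
% With C_2 = C_1/10 the pole sits a factor of 11 above the zero:
\frac{\omega_p}{\omega_z} = \frac{C_1 + C_2}{C_2} = 11
```

The ratio $\omega_p/\omega_z$ depends only on the capacitor ratio, which is why a larger ripple-suppressing capacitor forces a larger main capacitor and hence a slower loop.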

0018-9200/03$17.00 © 2003 IEEE

LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS

It is important to note that the problem of ripple becomes increasingly more serious as the supply voltage is scaled down and/or the operating frequency goes up. The relative magnitude of the primary sidebands at the output of the VCO is determined by the peak amplitude of the first harmonic of the ripple, the gain of the VCO, and the synthesizer reference frequency. For a given relative tuning range (e.g., 10%), the gain of LC VCOs must increase if the supply voltage goes down. For the VCO gain and reference frequency considered here, the fundamental ripple amplitude must be less than 63 μV to guarantee sidebands 60 dB below the carrier.

In order to arrive at the stabilization technique, consider the PLL architecture shown in Fig. 1(b). Here, the primary charge pump drives a single capacitor while a secondary charge pump injects charge after some delay. The total current flowing through the capacitor is thus given by (1) and (2), where the delay is assumed to be much smaller than the loop time constant. Consequently, the transfer function of the PFD/CP/LPF combination can be expressed as (3). Under the assumption stated with (3), we have (4), obtaining a zero at the frequency given by (5). Proper choice of the delay and of the ratio of the two charge-pump currents can, therefore, stabilize the loop. The damping factor and the settling time of the loop can be written, respectively, as (6) and (7).

In order to achieve a sufficiently low zero frequency, the delay must be large or the ratio of the two charge-pump currents close to unity. Since the accuracy in the definition of this ratio is limited by mismatches between the two charge pumps, the delay must still be a large value. For example, for the design values considered here, a delay of approximately 500 ns is required to ensure a well-behaved loop response.

The architecture of Fig. 1(b) suffers from two critical drawbacks. First, it requires that the delay stage provide a very large delay and accommodate a wide range of Up and Down pulsewidths. Specifically, when the loop is locked, the Up and Down pulses are less than 500 ps wide. The tradeoff between delay and bandwidth, therefore, makes the design of the delay line difficult. As depicted in Fig. 1(b), if the bandwidth of each stage in the delay line is reduced so as to yield a large delay, then the narrow Up and Down pulses are heavily attenuated, giving rise to a dead zone. Conversely, if the bandwidth of each stage is wide enough to support such pulses, then a very large number of stages is required to obtain the necessary delay, demanding a high power dissipation.

The second issue relates to the variation of the delay with process and temperature. Since the loop damping depends directly on this delay, such variations can greatly affect the loop stability.

To resolve the above difficulties, the architecture is modified as shown in Fig. 2(a), where a discrete-time analog delay line is placed after the secondary charge pump. The delay network is realized as depicted in Fig. 2(b), consisting of two interleaved master-slave sample-and-hold branches operating at half of the reference frequency. The circuit emulates the delay as follows. During one reference period, one sampling capacitor shares a charge packet corresponding to the previous phase comparison with the loop capacitor while the other samples a level proportional to the present phase difference. In the next period, the two sampling capacitors exchange roles. The interleaved sampling network, therefore, provides a delay equal to the reference period.

The discrete-time delay technique of Fig. 2 allows a precise definition of the zero frequency without the use of resistors. To quantify the behavior of a PLL incorporating this method, we assume the loop settling time is much greater than the reference period so that the delay network can be represented by the continuous-time model shown in Fig. 2(b), which approximates the interleaved branches. Equation (4) can then be rewritten as (8), under the assumptions stated with that equation.

This equation exhibits two interesting properties. First, under the stated condition, the effective value of the loop capacitor is “amplified”; for example, it can be multiplied by a factor of 10, saving substantial area. Second, the zero frequency is equal to (9), a value independent of process and temperature. We then obtain the damping factor and the settling time constant of the loop as (10) and (11). Note that the damping factor exhibits much less process and temperature dependence than in the conventional loop of Fig. 1(a). Interestingly, in one limiting case, the proposed circuit resembles the topology of Fig. 1(a) but with the resistor replaced by a switched-capacitor network.

While providing insight and serving as design guidelines, the above results are obtained by a continuous-time approximation of the loop and their validity must be verified. Simulations with representative design values yield a damping factor and a settling time constant close to the values of 0.31 and 13 μs, respectively, predicted by (10) and (11). Thus, the continuous-time approximation provides a reasonably accurate estimate of the loop behavior even for


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

Fig. 2. Actual implementation of PLL with delay sampling circuit and continuous-time approximation of delay network.

underdamped settling (where the loop time constant is relatively short).

For RF synthesis, the delay network of Fig. 2(b) must be designed carefully so as to minimize ripple on the control voltage. Since in the locked condition the voltages at the nodes being connected are nearly equal, charge sharing between either sampling capacitor and the capacitor on the control line creates only a small ripple. Furthermore, the switches in the delay stage are realized as small, complementary devices to introduce negligible charge injection and clock feedthrough.

Comparison With Conventional Architecture

In order to quantify the advantage of the proposed architecture over the conventional PLL topology, we note that either sampling capacitor in Fig. 2 appears in parallel with the capacitor on the control line. Since the sampling capacitors are typically two to three times larger than that capacitor, they suppress the charge pump nonidealities by about 9 to 12 dB. The behavioral model shown in Fig. 3(a) is simulated in MATLAB for the two cases. As explained in Section IV, the reference frequency and the divide ratio are scaled by a factor of 100 to speed up the simulation. The nonideality of the charge pump is modeled by a constant current mismatch that is injected into the loop filter at each phase comparison instant. Fig. 3(b) depicts the settling behavior and the output spectrum for the two cases. (The plots are deliberately offset for clarity.) For approximately equal settling times, the proposed topology (Type A) achieves 10 dB lower sidebands than the conventional loop does.
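The 9-to-12-dB suppression quoted above is consistent with simple charge-sharing arithmetic. The following is a minimal sketch, assuming the disturbance is a fixed charge packet and that a sampling capacitor k times larger than the existing capacitor simply raises the total capacitance by a factor of 1 + k. This is one reading that reproduces the quoted numbers, not a derivation given in the text, and the function name is illustrative.

```python
import math


def ripple_suppression_db(cap_ratio):
    """Attenuation (dB) of a voltage step caused by a fixed charge packet
    when a capacitor cap_ratio times larger is added in parallel.

    Assumption: the same charge spreads over (1 + cap_ratio) times the
    original capacitance, so the voltage step shrinks by that factor.
    """
    return 20.0 * math.log10(1.0 + cap_ratio)


# Sampling capacitor two and three times larger than the existing one.
for ratio in (2.0, 3.0):
    print(f"ratio {ratio:.0f}: {ripple_suppression_db(ratio):.1f} dB")
```

Under this assumption a ratio of two gives roughly 9.5 dB and a ratio of three gives roughly 12 dB, in line with the range stated in the text.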

III. SYNTHESIZER DESIGN

A 2.4-GHz CMOS synthesizer targeting Bluetooth applications has been designed using the stabilization technique described above. This section presents the architecture and building blocks of the synthesizer.

Shown in Fig. 4, the synthesizer uses an integer-N architecture with a feedback divider whose modulus is set by the prescaler, the program counter, and the swallow counter. With a 1-MHz reference, the output frequency covers the 2.4-GHz ISM band. The output of the swallow counter is pipelined by the flip-flop to allow a relaxed design for the level converter and the swallow counter. The buffer following the VCO suppresses the kickback noise of the prescaler when the modulus changes. It also prevents the input capacitance of the prescaler from limiting the tuning range of the VCO.

A. VCO Design

The VCO topology is shown in Fig. 5(a). To provide both negative and positive voltages across the MOS varactors, the sources of the oscillator transistors are grounded and the circuit is biased from the top. The inductors are realized as shown in Fig. 5(b), with the bottom spiral moved down to metal 2 so as to reduce the parasitic capacitance [7]. Each inductor is about 14 nH, occupies an area of 180 µm × 180 µm, and exhibits a Q of 4 and a parasitic capacitance of 100 fF.
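The inductor figures quoted above can be put in perspective with a short calculation. This is a rough sketch under two assumptions that the text does not state: that the quality factor of 4 applies at the 2.4-GHz carrier, and that the 100-fF parasitic can be treated as a single lumped capacitor across the inductor. The constant and function names are illustrative.

```python
import math

INDUCTANCE_H = 14e-9      # from the text: about 14 nH
PARASITIC_C_F = 100e-15   # from the text: 100 fF
QUALITY_FACTOR = 4.0      # from the text
CARRIER_HZ = 2.4e9        # assumption: Q is evaluated at the carrier


def reactance_ohm(inductance_h, freq_hz):
    """Inductive reactance 2*pi*f*L."""
    return 2.0 * math.pi * freq_hz * inductance_h


def self_resonance_hz(inductance_h, capacitance_f):
    """Resonance of a lumped L in parallel with a lumped C."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


x_l = reactance_ohm(INDUCTANCE_H, CARRIER_HZ)
r_series = x_l / QUALITY_FACTOR  # equivalent series loss for Q = X/R
f_sr = self_resonance_hz(INDUCTANCE_H, PARASITIC_C_F)

print(f"reactance at carrier: {x_l:.0f} ohm")
print(f"equivalent series resistance: {r_series:.0f} ohm")
print(f"lumped self-resonance: {f_sr / 1e9:.2f} GHz")
```

With these assumptions the reactance is a little over 200 Ω and the lumped self-resonance falls above 4 GHz, comfortably beyond the 2.4-GHz band.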

LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS


Fig. 3. (a) MATLAB behavioral simulations for the ripples on the control lines. (b) Time-domain settling and VCO output spectrum during lock for Type A (delay-sampling loop filter) and Type B (conventional loop filter).

The prescaler must divide the 2.4-GHz signal while dissipating little power. Depicted in Fig. 6(b), the circuit employs three current-steering flip-flops with diode-connected loads. The use of NOR gates obviates the need for the power- and headroom-hungry level-shift circuits (or large input swings) required by NAND gates. The program counter and the swallow counter incorporate static flip-flops to ensure reliable operation at low frequencies.

IV. SIMULATION TECHNIQUES

Fig. 4. Synthesizer architecture.

The varactors are implemented as accumulation-mode nMOS devices (placed inside an n-well). In this design, a 160-fF varactor is employed to allow a tuning range of about 12%. The measured phase noise of the VCO is −120 dBc/Hz at 1-MHz offset.

A 2.4-GHz synthesizer with a reference frequency of 1 MHz requires a transient simulation step of approximately 20 ps for a total settling time on the order of 100 µs, i.e., five million points. The simulation, therefore, requires an extremely long time (about 5 days on an Ultra 10 Sun Workstation) owing to both the vastly different time scales and the large number of devices (especially in the divider). This section describes a number of techniques that reduce the simulation time by several orders of magnitude while revealing the loop dynamics with reasonable accuracy.

A. Linear Discrete-Time Model

B. Pulse-Swallow Counter

Shown in Fig. 6(a), the pulse-swallow counter consists of a prescaler, a program counter, and a swallow counter. The pipelining in the swallow counter allows the use of a small divide ratio in the prescaler. Simulations suggest that a 4/5 topology minimizes the overall divider power dissipation.
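The way a pulse-swallow counter sets the overall divide ratio can be illustrated by counting VCO cycles directly. The sketch below assumes the textbook arrangement, in which the dual-modulus prescaler divides by N + 1 until the swallow counter has counted S prescaler pulses and by N for the remaining P − S pulses of the program counter. The 4/5 prescaler comes from the text; the program count of 600 and the swallow range are hypothetical values chosen only so that a 1-MHz reference lands in the 2.4-GHz band.

```python
def pulse_swallow_ratio(n, p, s):
    """Count VCO cycles per output cycle of an N/(N+1) pulse-swallow divider.

    n: lower prescaler modulus (prescaler divides by n or n + 1)
    p: program counter modulus
    s: swallow counter setting, 0 <= s <= p
    """
    if not 0 <= s <= p:
        raise ValueError("swallow count must lie between 0 and the program count")
    vco_cycles = 0
    for pulse in range(p):
        # Divide by n + 1 while the swallow counter is still counting.
        vco_cycles += (n + 1) if pulse < s else n
    return vco_cycles


REF_MHZ = 1      # reference frequency from the text
PRESCALER_N = 4  # 4/5 prescaler from the text
PROGRAM_P = 600  # hypothetical program count

for swallow in (0, 40, 80):  # hypothetical channel settings
    ratio = pulse_swallow_ratio(PRESCALER_N, PROGRAM_P, swallow)
    print(f"S = {swallow:2d}: divide by {ratio}, output {ratio * REF_MHZ} MHz")
```

The count always equals N·P + S, so each increment of the swallow setting moves the output by one reference step.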

The voltages at the two nodes of the delay network in Fig. 2(b) can be expressed, respectively, by the following discrete-time equations:

(12)

(13)

where the input variable denotes the input phase error. Even though the z-transforms of the two node voltages can be derived, the high order of the resulting polynomials makes it difficult to derive the closed-loop response in analytical form. Thus, the two equations, along with the rest of the PLL, are realized in MATLAB. The VCO is modeled as a phase accumulator, with each new value of phase obtained as the previous phase plus the product of the time interval and the new frequency. The phase detector is simply a subtractor, generating the phase error for the above equations. The linear discrete-time model facilitates the choice of the charge pump current and the remaining component values for fast settling and minimum ripple on the oscillator control line. Fig. 7 shows the settling behavior of the synthesizer as predicted by MATLAB and by the transistor-level implementation. The simple discrete-time model yields moderate accuracy while requiring orders of magnitude less simulation time.

Fig. 5. (a) LC oscillator. (b) Two-layer stacked inductor.

Fig. 6. (a) Divider. (b) Prescaler.

Fig. 7. Settling behavior of MATLAB and transistor-level simulations.

B. Transistor-Level Model

The impact of various PFD, CP, and VCO nonidealities upon the loop dynamics must ultimately be studied in a realistic transistor-level implementation. We present two techniques that reduce the simulation time from days to minutes. The first method is based on “time contraction,” whereby the reference frequency is scaled up by a factor of 100 and the main loop filter capacitor (shown in Fig. 2) and the divide ratio are scaled

down by the same factor. All other loop parameters remain unchanged. From (10) and (11), we note that scaling the capacitor and the divide ratio by 100 maintains a constant damping factor while scaling the settling time by 100. Since the PFD operates reliably at 100 MHz with no dead zone, this method directly reduces the simulation time by a factor of 100. Fig. 8 depicts an example of time contraction by a factor of 10. Note that the time axis has a logarithmic scale. It can be observed that the loop settling behavior scales accurately by the same factor.

In the second method, the divider is realized as a simple behavioral model in HSPICE that uses a handful of ideal devices and whose complexity is independent of the divide ratio. Illustrated in Fig. 9, the principle of the behavioral divider is to pump a

Fig. 8. Time contraction.

Fig. 9. Divider behavioral model.

Fig. 10. Die photo.

TABLE I FAST SIMULATION SUMMARY

well-defined charge packet into an integrator in every period and reset the integrator when its output exceeds a certain level. Using an ideal op amp, comparator, and switches, with proper choice of the component values, the circuit can achieve arbitrarily large divide ratios. (The duty cycle of the output can also be controlled.) This technique yields another factor of 20 reduction in the simulation time, allowing the synthesizer to be simulated in less than 3 min on an Ultra 10 Sun Workstation. Table I summarizes the results of the two simulation techniques.

V. EXPERIMENTAL RESULTS

The frequency synthesizer has been fabricated in a digital 0.25-µm CMOS technology. Shown in Fig. 10 is a photograph

Fig. 11. Measured output spectrum of the synthesizer.

of the die, whose active area measures 0.65 mm × 0.45 mm. The circuit has been tested in a chip-on-board assembly while running from a 2.5-V power supply. The power dissipation is 20 mW. Fig. 11 shows the output spectrum in the locked condition. The phase noise is equal to −112 dBc/Hz at 1-MHz offset, well exceeding the Bluetooth requirement. The primary reference sidebands are at approximately −58.7 dBc. This level is lower than that achieved in [8] with differential VCO control and an 86.4-MHz reference frequency. Similarly, the designs in [9] and [10] exhibit an inferior tradeoff between the settling time and the sideband magnitudes.

Fig. 12 plots the measured settling behavior of the synthesizer when its channel number is switched by 64. Here, the channel select input is periodically switched between the two end channels and the oscillator control voltage is monitored. The settling time is about 60 µs, i.e., 60 input cycles. Table II summarizes the measured performance of the synthesizer.

VI. CONCLUSION

A PLL stabilization technique is introduced that relaxes the tradeoff between the settling time and the ripple on the control voltage, while obviating the need for resistors in the loop filter. The proposed approach creates a zero in the open loop

transfer function by adding two consecutive phase comparison results and can be extended to more consecutive samples as in a transversal filter. The method also “amplifies” the loop filter capacitor by a large number (e.g., 10), saving substantial chip area. The proposed concepts are demonstrated in a 2.4-GHz RF CMOS synthesizer. The stabilization technique finds applications in other phase-locked systems as well. Examples include clock generators and clock and data recovery circuits.

Fig. 12. Control voltage during loop settling.

TABLE II SYNTHESIZER PERFORMANCE SUMMARY

REFERENCES

[1] T. C. Lee and B. Razavi, “A stabilization technique for phase-locked frequency synthesizers,” in VLSI Symp. Dig. Tech. Papers, June 2001, pp. 39–42.
[2] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[3] I. I. Novof, J. Austin, R. Kelkar, D. Strayer, and S. Wyatt, “Fully integrated CMOS phase-locked loop with 15 to 240 MHz locking range and 50 ps jitter,” IEEE J. Solid-State Circuits, vol. 11, pp. 1259–1265, Nov. 1995.
[4] S. Sidiropoulos, D. Liu, J. Kim, G. Wei, and M. Horowitz, “Adaptive bandwidth DLL’s and PLL’s using regulated supply CMOS buffers,” in VLSI Symp. Dig. Tech. Papers, June 2000, pp. 124–127.
[5] A. Zolfaghari, A. Chan, and B. Razavi, “A 2.4-GHz 34-mW CMOS transceiver for frequency-hopping and direct-sequence applications,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 1259–1265.
[6] C. Lam and B. Razavi, “A 2.6-GHz/5.2-GHz frequency synthesizer in 0.4-µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 35, pp. 788–794, May 2000.
[7] A. Zolfaghari, A. Chan, and B. Razavi, “Stacked inductors and transformers in CMOS technology,” IEEE J. Solid-State Circuits, pp. 620–628, Apr. 2001.
[8] L. Lin, L. Tee, and P. R. Gray, “A 1.4-GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2000, pp. 204–205.
[9] T. K. K. Kan, G. C. T. Leung, and H. C. Luong, “A 2-V 1.8-GHz fully integrated CMOS dual-loop frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 37, pp. 1012–1020, Aug. 2002.
[10] C.-W. Lo and H. C. Luong, “A 1.5-V 900-MHz monolithic CMOS fast-switching frequency synthesizer for wireless applications,” IEEE J. Solid-State Circuits, vol. 37, pp. 459–470, Apr. 2002.


Tai-Cheng Lee was born in Taiwan, R.O.C., in 1970. He received the B.S. degree from National Taiwan University, Taipei, Taiwan, R.O.C., in 1992, the M.S. degree from Stanford University, Stanford, CA, in 1994, and the Ph.D. degree from the University of California, Los Angeles, in 2001, all in electrical engineering. He was with LSI Logic from 1994 to 1997 as a Circuit Design Engineer. He served as an Adjunct Assistant Professor with the Graduate Institute of Electronics Engineering (GIEE), National Taiwan University, from 2001 to 2002. Since 2002, he has been with the Department of Electrical Engineering and GIEE, National Taiwan University, where he is an Assistant Professor. His main research interests are in high-speed mixed-signal and analog circuit design, data converters, PLL systems, and RF circuits.

Behzad Razavi (S’87–M’90–SM’00–F’03) received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1985 and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1992, respectively. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He was with AT&T Bell Laboratories and Hewlett-Packard Laboratories until 1996. Since September 1996, he has been an Associate Professor and subsequently Professor of electrical engineering at the University of California, Los Angeles. He is the author of Principles of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), Design of Analog Integrated Circuits (New York: McGraw-Hill, 2001), Design of Integrated Circuits for Optical Communications (New York: McGraw-Hill, 2002), and the editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE Press, 1996). His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He was the corecipient of the Jack Kilby Outstanding Student Paper Award at the 2002 ISSCC. He served on the Technical Program Committee of the International Solid-State Circuits Conference (ISSCC) from 1993 to 2002. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics. 
He is recognized as one of the top ten authors in the 50-year history of ISSCC. He is also an IEEE Distinguished Lecturer.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time Cicero S. Vaucher, Member, IEEE

Abstract: An adaptive phase-locked loop (PLL) architecture for high-performance tuning systems is described. The architecture combines contradictory requirements posed by different performance aspects. Adaptation of loop parameters occurs continuously, without switching of loop filter components, and without interaction from outside of the tuning system. The relationships of performance aspects (settling time, phase noise, and spurious signals) to design variables (loop bandwidth, phase margin, and loop filter attenuation at the reference frequency) are presented, and the basic tradeoffs of the new concept are discussed. A circuit implementation of the adaptive PLL, optimized for use in a multiband (global) car-radio tuner IC, is described in detail. The realized tuning system achieved state-of-the-art settling time and spectral purity performance in its class (integer-N PLLs): a signal-to-noise ratio of 65 dB, a 100-kHz spurious reference breakthrough signal under −81 dBc, and a residual settling error of 3 kHz after 1 ms, for a 20-MHz frequency step. It simultaneously fulfills the speed requirements for inaudible frequency hopping and the demanding signal-to-noise ratio specification of 64 dB.

Index Terms: Adaptive systems, FM noise, frequency synthesizers, phase-locked loops.

I. INTRODUCTION

FAST settling-time frequency synthesizers are essential building blocks of modern communication systems. Typical examples are digital cellular mobile systems, which employ a combination of time-division duplex (TDD) and frequency-division duplex (FDD) techniques. In these systems, the downlink frequencies (base station to handsets) are placed in different bands with respect to uplink frequencies. In order to save cost and decrease the size of the handset, it is desirable to use the same frequency synthesizer to generate uplink and downlink frequencies. Requirements are that the synthesizer has to switch between bands and settle to another frequency within a predetermined time (1.7 ms for GSM and DCS-1800 systems [1]).

Car-radio receivers with optimal radio data system (RDS) performance call for fast-settling-time tuning systems as well [2]. The RDS network transmits a list of (nationwide) alternative frequencies carrying the same program. The tuner performs a background scanning of these frequencies, so that optimum

Manuscript received July 23, 1999; revised November 29, 1999. The author is with Philips Research Laboratories, Eindhoven 5656 AA The Netherlands (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(00)02861-4.

reception condition is provided as the receiver moves between different coverage regions. For the system to be effective, the background scanning has to be performed in a way that is transparent (inaudible) to the listener. A possible but expensive way to do that is to use two tuners in the receiver, with one of them being used for checking on alternative frequencies only. Single-tuner solutions, which have a much better price/performance ratio, require a tuning system architecture able to do frequency hopping in an inaudible way [2]. In other words, a fast-settling-time architecture is required for these applications.

Communication systems often pose severe requirements on the spectral purity of the tuning system local oscillator (LO) signal. There are two main reasons for this. The first is to avoid problems with reciprocal mixing of adjacent channels; reciprocal mixing decreases the receiver's selectivity and disturbs the reception of weak signals. The second is that the mixing process, which is used for down-conversion of the radio-frequency (RF) signals, superposes the phase noise of the LO on the modulation of the RF signal. Hence, the signal-to-noise ratio (SNR) at the output of the demodulator is a function of the LO's phase noise level [3].

This paper describes an adaptive tuning system architecture that combines fast settling time with excellent spectral purity performance. The architecture was optimized to be used in a global car-radio tuner IC with inaudible RDS background scanning. The integer-N frequency synthesizer has an SNR of 65 dB and a 100-kHz spurious reference breakthrough under −81 dBc at the voltage-controlled oscillator (VCO) (−87 dBc at the mixer). Residual settling error for a 20-MHz frequency step is 3 kHz after 1 ms. These results are similar to those of a fractional-N implementation [4]. The complexity of our tuning system, however, is much smaller. The adaptive phase-locked loop (PLL) was integrated in a 5-GHz, 2-µm bipolar technology.
The tuning system works with 8.5-V supply voltage for the charge pumps and with 5 V for the logic functions. Total current consumption is 21 mA from the 5-V supply and 12 mA from the 8.5-V supply. The architecture of the multiband tuner IC is described in Section II. Section III presents relationships of settling time, phase noise, and spurious signals to the design variables, namely loop bandwidth, phase margin, and loop filter attenuation at the reference frequencies. Section IV introduces the adaptive PLL architecture and discusses the advantages and tradeoffs of the concept. Section V describes the circuit implementation, and Section VI presents a summary of measured results.

0018–9200/00$10.00 © 2000 IEEE

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE


Fig. 1. Simplified block diagram of the global car-radio tuner IC.

TABLE I RECEPTION BANDS WITH CORRESPONDING TUNING SYSTEM PARAMETERS

II. MULTIBAND TUNER ARCHITECTURE

The block diagram of the global tuner IC with inaudible background scanning is shown in Fig. 1. The receiver and tuning system architectures have been defined such that all reception bands can be accessed with a single VCO and a single loop filter, without changes to the application. Mapping the frequency of the VCO to the different input bands is achieved by dividing its output frequency by different ratios, depending on the band to be received. The division is accomplished in the FM DIV and

AM DIV dividers, which are set in between the VCO output and the RF mixers. Table I presents the VCO frequency and tuning system parameter settings for various reception bands, including the American Weather Band. By dividing the VCO output, the tuning resolution is 1 kHz in AM mode and 50 kHz in FM mode, despite the fact that reference frequencies are 20 kHz and 100 kHz, respectively. Combining the different reception bands in one single application—the same VCO and same loop filter—complicates the design of the tuning system. A reception band with worst case

Fig. 2. Open-loop frequency response (Bode plot) of a type-2, third-order charge-pump PLL for different values of phase margin $\phi_m$.

spectral purity requirements determines the loop filter design. Nonetheless, robustness to variations in tuning system parameters, for all reception bands, has to be ensured. The relationships between different performance aspects at the system level are discussed in the following section.

III. SETTLING TIME AND SPECTRAL PURITY PERFORMANCE

The properties of a PLL are strongly related to its phase detector implementation [5]. Present-day PLL frequency synthesizers usually employ the tristate, sequential phase frequency detector (PFD), combined with a charge pump (CP) [6]. The analysis of the PLL properties presented in this paper assumes the use of a PFD/CP in the loop.

A. Settling Time, Loop Bandwidth, and Loop Phase Margin

Bode diagrams are a powerful tool for designing PLL tuning systems [7], [8] because they enable direct assessment of the loop's phase margin $\phi_m$ and open-loop bandwidth (0-dB frequency $f_c$). Accurate and reliable results for $\phi_m$ and $f_c$ are obtained with easy-to-implement behavioral models [9] and with fast ac simulation runs. In spite of the advantages of the “ac method,” design equations relating the settling performance of a type-2, third-order charge-pump PLL [6] (the most widely used configuration in synthesizer applications) to its open-loop bandwidth and phase margin have, to the best of our knowledge, not yet been published in the open literature.

Fig. 2 presents Bode plots of a type-2, third-order loop for different values of phase margin $\phi_m$. Fig. 3(a) displays the transient response of such a loop for three different values of phase margin. The responses are plotted as $f_\varepsilon(t)/f_{step}$, with time normalized as $f_c t$. Here $f_\varepsilon(t)$ is the remaining frequency error with respect to the final value and $f_{step}$ is the amplitude of the frequency jump. Fig. 3(b) presents the responses as $\ln(|f_\varepsilon(t)|/f_{step})$, so that the impact of $\phi_m$ on the “long-term” transient response is easily observed.

Fig. 3. Settling transient for different values of $\phi_m$, normalized for $f_c t$. (a) Settling error (represented as $f_\varepsilon(t)/f_{step}$) versus $f_c t$. (b) Settling error (represented as $\ln(|f_\varepsilon(t)|/f_{step})$) versus $f_c t$.

The influence of the phase margin on the settling time, obtained with transient simulations similar to those of Fig. 3, is presented in Fig. 4. The figure shows the time necessary for the value of $\ln(f_{step}/|f_\varepsilon(t)|)$ to reach a numerical value of 10. The settling time decreases with increasing phase margin,

reaching a minimum for values of $\phi_m$ around 50°. Increasing the phase margin further leads to a sharp increase in the settling time.

The relationship of settling time and phase margin, displayed in Fig. 4, can be understood with the help of Fig. 5. It presents the pole and zero locations of the closed-loop transfer function of a third-order loop with different values of phase margin (Bode plots presented in Fig. 2). The real part of the dominant (complex) poles approaches $-\omega_c$ for values of $\phi_m$ of about 50°. When $\phi_m$ equals 53°, all three poles lie at $-\omega_c$. That is the location with the fastest damping of the transient error. The fastest response, however, is obtained with a phase margin of 51°. The complex parts of the poles “speed up” the settling transient a bit further (25%). For higher values of phase margin, the dominant real pole moves to the right on the real axis. This pole is responsible for the slowing down of the PLL response for values of $\phi_m$ above 53°. Fig. 5 shows that the dominant pole, for 60° phase margin, lies at about $-0.4\,\omega_c$. Hence, it may be concluded that the usual practice of designing critically damped loops, which have a phase margin of about 70° [5], is not appropriate for fast-settling-time applications.

Fig. 4. Settling time as a function of the phase margin for $|f_\varepsilon|/f_{step} = e^{-10}$.

Fig. 5. Position of the closed-loop poles and zeros of a third-order PLL corresponding to different values of $\phi_m$, as displayed in Fig. 2.

Let us consider Fig. 3(b) again. One sees that the (envelope of the) curves can be approximated by straight lines. The approach proposed here takes $\phi_m$ into account with the help of an effective damping coefficient $\zeta_e(\phi_m)$. By so doing, we arrive at the following approximation for the envelope of the curves of Fig. 3(b):

$$|f_\varepsilon(t)| \approx f_{step}\, e^{-\zeta_e(\phi_m)\, f_c\, t} \qquad (1)$$

Numerical estimations for $\zeta_e(\phi_m)$ can be obtained from transient simulations with the help of the following expression:

$$\zeta_e(\phi_m) = \frac{1}{f_c\, t_{lock}} \ln\!\left(\frac{f_{step}}{|f_\varepsilon(t_{lock})|}\right) \qquad (2)$$

The settling time results presented in Fig. 4 lead to the numerical values for $\zeta_e(\phi_m)$ displayed in Fig. 6. These values represent an average value for $\zeta_e(\phi_m)$, as they are obtained from a $\ln(f_{step}/|f_\varepsilon|)$ of ten. Manipulation of (1) results in an equation describing the minimum loop bandwidth required to achieve given settling specifications $t_{lock}$ and $f_a$:

$$f_c = \frac{1}{t_{lock}\, \zeta_e(\phi_m)} \ln\!\left(\frac{f_{step}}{f_a}\right) \qquad (3)$$

Fig. 6. Average values of $\zeta_e(\phi_m)$ for a $\ln(f_{step}/|f_\varepsilon|)$ of ten.

In (3):

$t_{lock}$ is the locking time (s); $f_{step}$ is the amplitude of the frequency jump (Hz); $f_a$ is the maximum frequency error (Hz) at $t_{lock}$; and $\zeta_e(\phi_m)$ can be read from Fig. 6.

Two points about the present treatment of the transient response need further explanation. First, the presented results are based on a linear continuous-time model for the discrete-time charge-pump PLL. It is known in the literature [6] that the continuous-time approach is a good approximation for the discrete-time PLL if the reference (sampling) frequency of the loop is at least a factor of ten higher than its open-loop bandwidth $f_c$. Therefore, the value of $f_c$, calculated with (3), has to be checked against the loop's reference frequency $f_{ref}$. If the target ratio $f_{ref}/f_c$ is smaller than ten, then actual settling behavior will deviate from the calculations.

The second point is that usual implementations of the phase frequency detector have a limited linear phase error detection range, namely, from $-2\pi$ to $2\pi$ [9]. When the instantaneous phase error $\phi_e$ becomes larger than $2\pi$, the PFD interprets the error information as $\phi_e - 2\pi$. This effect leads to a longer settling time than predicted with (3). The maximum value of $\phi_e$ was found to obey a relationship involving the main divider ratio and a fitting factor for the influence of the phase margin. Numerical values for the fitting factor, obtained from transient simulations, lie in the range [0.7, 0.8]. Hence, the maximum phase error stays within the interval from $-2\pi$ to $2\pi$ as long as the bound given by this relationship is respected. If this condition is satisfied, then the (discrete-time) transient response is accurately predicted by the continuous-time linear model.

Inaudible RDS background scanning requires settling times of 1 ms, defined as a residual settling error of 6 kHz for a 20-MHz frequency jump. The nominal loop phase margin is set to 50°, which corresponds to a $\zeta_e(\phi_m)$ of five. On the other hand, it is appropriate to use a lower value for $\zeta_e(\phi_m)$ in the calculations (e.g., 2.5), to provide enough margin for variations in the nominal values of loop bandwidth and phase margin.
Solving (3) for these settling specifications leads to a nominal value of 3.2 kHz for the loop bandwidth $f_c$.
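The 3.2-kHz result can be reproduced numerically. The sketch below assumes the bandwidth relation has the form f_c = ln(f_step / f_a) / (t_lock · ζ_e), a form adopted here because it returns the quoted value for the stated specifications. The function name is illustrative, and the 100-kHz reference used for the sampling check is the FM-mode reference frequency mentioned in Section II.

```python
import math


def min_loop_bandwidth_hz(f_step_hz, f_err_hz, t_lock_s, zeta_e):
    """Minimum open-loop bandwidth for an exponential settling envelope.

    Assumed envelope: f_err(t) = f_step * exp(-zeta_e * f_c * t),
    solved for f_c at t = t_lock.
    """
    return math.log(f_step_hz / f_err_hz) / (t_lock_s * zeta_e)


F_STEP_HZ = 20e6   # frequency jump from the text
F_ERR_HZ = 6e3     # residual settling error from the text
T_LOCK_S = 1e-3    # settling time from the text
ZETA_DESIGN = 2.5  # conservative effective damping from the text
F_REF_HZ = 100e3   # FM-mode reference frequency

f_c = min_loop_bandwidth_hz(F_STEP_HZ, F_ERR_HZ, T_LOCK_S, ZETA_DESIGN)
ratio = F_REF_HZ / f_c

print(f"required loop bandwidth: {f_c:.0f} Hz")
print(f"reference-to-bandwidth ratio: {ratio:.1f}")
print("continuous-time model applicable:", ratio >= 10.0)
```

Using the nominal damping of five instead of 2.5 would halve the required bandwidth, which is the margin the text builds in for parameter variations.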


Fig. 7. FM noise density and residual FM for loop bandwidths of 800 Hz and 3 kHz.

The loop bandwidth that satisfies different settling requirements can be calculated with the help of (3). Settling specifications, however, often require loop bandwidths that are not optimal with respect to spectral purity performance, as will become clear in the next subsection.

B. Phase Noise Performance and Loop Bandwidth

The dependency of the total phase noise of a PLL tuning system on the phase noise of the loop components is well known in the literature [3], [5], [10]. The phase noise of the VCO is suppressed inside the loop bandwidth, whereas the (phase) noise from the other building blocks is transferred to the VCO output, multiplied by the closed-loop transfer function of the PLL: a low-pass function that suppresses their noise contribution outside the loop bandwidth. There is a “crossover point” for the loop bandwidth, where the noise contribution from the dividers and charge pump becomes dominant with respect to the noise from the VCO.

For terrestrial FM reception, the LO signal residual frequency noise (residual FM) determines the receiver's ultimate SNR performance. The SNR specification for the application is 64 dB, defined for a reference level of 22.5-kHz peak deviation with 50-µs deemphasis. Complying with the specification requires the residual FM in the LO signal to be less than 10 Hz rms. The frequency (FM) noise density $S_f(f)$ of the LO signal is linked to its phase noise power density $S_\phi(f)$ by $S_f(f) = f^2 S_\phi(f)$ [5]. $S_\phi(f)/2$ equals $\mathcal{L}(f)$, the single-sideband noise-to-carrier ratio, so that $S_f(f) = 2 f^2 \mathcal{L}(f)$. Finally, the residual FM can be calculated as

$$\Delta f_{res} = \sqrt{\int_{f_l}^{f_h} 2 f^2 \mathcal{L}(f)\, df} \qquad (4)$$

The integration limits $f_l$ and $f_h$ in (4) depend on the signal bandwidth of the application [3]. For terrestrial FM reception, the lower limit is 20 Hz and the higher is 20 kHz. Fig. 7 presents

the simulated frequency noise (FM noise) power density and the residual FM, which is plotted as a function of $f_h$, with $f_l$ fixed at 20 Hz. The FM noise density and the residual FM are plotted for values of loop bandwidth of 800 Hz and of 3 kHz. For 3 kHz, the residual FM amounts to 40 Hz rms, which is 12 dB higher than the specification. A loop bandwidth of 800 Hz, on the other hand, leads to a residual FM of 8 Hz rms, which satisfies the SNR requirement.

The contributions of different noise sources to the total frequency noise density, in the case of an 800-Hz loop bandwidth, are displayed in Fig. 8. The contribution of the VCO to the residual FM equals that of the other synthesizer building blocks. This is a good compromise, and 800 Hz was chosen as the nominal loop bandwidth for in-lock situations.

The settling specification requires a bandwidth of 3.2 kHz. The SNR constraint, on the other hand, asks for 800 Hz. These conflicting requirements can be combined when the loop bandwidth is made adaptive as a function of the operating mode: frequency jump or in-lock.

Adapting the value of the loop bandwidth during frequency jumps is easily accomplished by switching the nominal value of the charge-pump current [6], [13]. This method, however, often causes disturbances in the VCO tuning voltage (the so-called secondary glitch effect) at the moment the current is switched from high to low values. These disturbances are highly undesirable, as they have to be corrected by the loop in small bandwidth mode. What is more, the “secondary glitches” may cause audible disturbances in analog systems and increase the bit error rate in digital systems.

Providing stability for a small bandwidth loop requires a transfer function zero located at low frequencies (large time constant). A low-frequency zero, however, is undesirable for operation in high bandwidth mode. It causes the phase margin to be “too” high, which increases the settling time. Note that the effective damping coefficient decreases for high values of phase margin (see Fig. 6).

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE


Fig. 8. Contributions from different noise sources to the total FM noise density and residual FM (20 Hz–20 kHz) with 800-Hz loop bandwidth.
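The residual-FM integral over the 20 Hz to 20 kHz band can be evaluated numerically from any single-sideband phase-noise profile. The sketch below is only an illustration: it assumes the standard relation S_f(f) = 2 f² L(f), and the function names and the example oscillator profile (−100 dBc/Hz at 10 kHz with a −20 dB/decade slope) are hypothetical, not the simulated loop noise of Fig. 7.

```python
import math


def residual_fm_rms(ssb_dbc_hz, f_low=20.0, f_high=20e3, steps=2000):
    """Return the residual FM in Hz rms between f_low and f_high.

    ssb_dbc_hz: function mapping offset frequency (Hz) to L(f) in dBc/Hz.
    """
    # Log-spaced offset frequencies covering the integration band.
    ratio = f_high / f_low
    freqs = [f_low * ratio ** (i / steps) for i in range(steps + 1)]
    # FM noise density S_f(f) = 2 * f^2 * L(f), with L(f) as a linear ratio.
    s_f = [2.0 * f * f * 10.0 ** (ssb_dbc_hz(f) / 10.0) for f in freqs]
    # Trapezoidal integration of S_f over the band.
    area = sum(
        0.5 * (s_f[i] + s_f[i + 1]) * (freqs[i + 1] - freqs[i])
        for i in range(steps)
    )
    return math.sqrt(area)


def example_vco_noise(f):
    """Hypothetical profile: -100 dBc/Hz at 10 kHz, falling 20 dB/decade."""
    return -100.0 - 20.0 * math.log10(f / 10e3)


print(round(residual_fm_rms(example_vco_noise), 1))
```

For a 1/f² profile the FM noise density is flat, so the integral reduces to the density times the bandwidth. That makes this case a convenient sanity check before feeding in a measured or simulated profile.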

Therefore, for optimal settling time and phase noise, one has not only to switch the value of the loop bandwidth but also to change the location of the zero in the transfer function.

C. Reference Spurious Signals and Loop Filter Attenuation

The use of phase frequency detectors yields the minimum levels of spurious breakthrough at the reference frequency [11]. The spurious signals are due to compensation of leakage currents or to imperfections in the charge pump’s implementation. Standard FM modulation theory and the small-angle approximation lead to the following equation for the amplitude of the spurious signal (in dBc), which is at an offset frequency $f_m$ from the carrier:

Fig. 9. Adaptive PLL tuning system architecture.

$$A_{sp}(f_m) = 20 \log\left( \frac{I_{ac}(f_m) \, |Z_f(f_m)| \, K_{vco}}{2 f_m} \right) \ \text{dBc} \qquad (5)$$

where
$f_m$: offset frequency from the carrier (Hz);
$I_{ac}(f_m)$: amplitude of ac current component with frequency $f_m$ (A);
$|Z_f(f_m)|$: impedance of the loop filter at $f_m$ (V/A);
$K_{vco}$: VCO gain (Hz/V).

The value of $I_{ac}$ is twice the value of the loop-filter dc leakage current [12] in loops operating with well-designed charge pumps. In cases where the charge pump has charge-sharing problems and/or charge injection into the loop filter, $I_{ac}$ may become dominated by these second-order effects. The imperfections can lead to spurious components with (much) higher amplitudes than would be expected based on the leakage current alone. Rearranging the above equation leads to a formula that relates the required filter attenuation at $f_m$ to the specified maximum level of spurious signals $A_{sp}$, to the dc leakage current $I_{leak}$, and to the VCO gain $K_{vco}$

$$|Z_f(f_m)| = \frac{f_m}{I_{leak} \, K_{vco}} \, 10^{A_{sp}/20} \qquad (6)$$

Fig. 10. Loop-filter configuration, charge-pump currents, and component values used in the global car-radio tuner IC.

The relevant offset frequencies equal the reference frequency and its harmonics in a standard PLL. Therefore, the required loop-filter (trans)impedance for these frequencies can be readily calculated. The VCO gain, the spurious specification, and the expected (maximum) leakage current are known.
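The spur relation and its rearrangement can be checked with a few lines of arithmetic. The sketch below assumes the narrow-band FM form described above (peak deviation = ac current × filter impedance × VCO gain, sideband level = half the modulation index) and takes the ac current as twice the dc leakage. The function names and the numerical values in the usage lines are hypothetical and are not taken from the tuner IC.

```python
import math


def spur_level_dbc(i_leak, z_filter, k_vco, f_offset):
    """Spur amplitude in dBc.

    i_leak: dc leakage current (A); z_filter: |Z| of the loop filter at
    f_offset (V/A); k_vco: VCO gain (Hz/V); f_offset: spur offset (Hz).
    """
    i_ac = 2.0 * i_leak                       # assumed: ac amplitude = 2 x dc leakage
    peak_deviation = i_ac * z_filter * k_vco  # peak frequency deviation (Hz)
    modulation_index = peak_deviation / f_offset
    return 20.0 * math.log10(modulation_index / 2.0)


def max_filter_impedance(spur_spec_dbc, i_leak, k_vco, f_offset):
    """Largest |Z| at f_offset that still meets the spur specification."""
    return f_offset * 10.0 ** (spur_spec_dbc / 20.0) / (i_leak * k_vco)


# Hypothetical numbers: 1-nA leakage, 10-MHz/V VCO gain, 100-kHz offset.
z_max = max_filter_impedance(-81.0, 1e-9, 10e6, 100e3)
print(z_max, spur_level_dbc(1e-9, z_max, 10e6, 100e3))
```

Feeding the computed impedance back into the first function returns the specified spur level, which confirms that the two expressions are consistent with each other.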


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 11. Bode plots of the adaptive loop during frequency jumps and in-lock.

Fig. 12. Implementation of the DZ building block.

An important conclusion to be taken from the above equations is that the amplitude of the spurious signals is not dependent on the absolute value of loop bandwidth. Instead, it is determined by the (trans)impedance of the loop filter. This means that, at least in principle, “any” spurious specification can be achieved simply by decreasing the impedance level of the loop filter. In practice, this is not a viable option because the PLL loop bandwidth is proportional to the value of the loop-filter resistor and to the charge-pump current [6]. For a constant value of the loop bandwidth, a decrease of the loop-filter impedance level requires a proportional increase of the nominal charge-pump current. This leads to difficulties in the charge-pump design and to higher power dissipation. To avoid these difficulties, more RC sections are added to the basic loop-filter configuration, so that the filter attenuation at higher frequencies is increased. Additional RC sections, however, inevitably cause phase lag at lower frequencies. The phase lag decreases the loop phase margin and increases the settling time in high-bandwidth mode. Therefore, to provide optimal settling, low power dissipation, and good spurious performance, one has not only to switch the value of the loop bandwidth but also to bypass (some) RC sections of the loop filter. The PLL architecture presented here complies with these requirements.

IV. ADAPTIVE PLL ARCHITECTURE

A. Basic Architecture

The basic idea is to have two loops working in parallel, as depicted in Fig. 9. Loop 1, built around PFD1 and CP1, is dimensioned for in-lock operation. Loop 2, built around PFD2, DZ, and CP2, is dimensioned for fast settling time. Loop 1 operates all the time, whereas Loop 2 is only active during tuning actions. Loop 1 and Loop 2 share the crystal oscillator, the reference divider, and the main divider. A smooth takeover from Loop 1, after a frequency jump, avoids “secondary glitch” effects. The high-current charge pump CP2 is only active during tuning. CP2 is controlled by the dead-zone (DZ) block. DZ generates a smooth transition into a well-defined dead zone for CP2 when lock is achieved, so that sudden disturbances of the VCO tuning voltage are avoided.

Additional freedom for optimization of the loop parameters is obtained by using two separate charge-pump outputs and by applying the charge-pump currents to different nodes of the loop filter. In this way, the location of the zeros for frequency jumps and in-lock can be set in a continuous way, without switching of loop components, which is a source of “secondary glitch” problems. Furthermore, the path from Icpl to Vtune may contain additional filtering sections for, e.g., attenuation of spurious signals and/or fractional-N quantization noise [14]. These filter sections may be bypassed by Icph to increase the phase margin in high-bandwidth mode.

B. Loop-Filter Implementation

The ideas described above are demonstrated with the help of Figs. 10 and 11. Fig. 10 presents the loop-filter configuration and component values used in the global tuner IC (Fig. 1). Fig. 11 shows the optimized Bode diagrams of the adaptive PLL (in FM mode) with the loop filter of Fig. 10. During frequency jumps both CP1 and CP2 are active; the loop-filter zero frequency is 1/(2π Rb Ca) and lies at a high frequency, matching the 0-dB open-loop frequency. This enables stability and fast tuning to be achieved. The nominal loop bandwidth in this mode is 3.2 kHz, and the phase margin is 50°. After the frequency jump only CP1 is active. The zero of the loop filter moves to a lower frequency, 1/[2π (Ra + Rb) Ca], without the switching of loop-filter components. The low-frequency zero increases the phase margin in-lock.
When the loop is in-lock, an extra pole is introduced at 1/(2π Rc Cc), which increases the 100-kHz reference suppression by about 20 dB. During frequency jumps, these elements are bypassed by CP2, increasing the phase margin in high-bandwidth mode. If the loop bandwidth were increased by simply switching the amplitude of CP1, one would end up with an unstable loop, because of a phase margin of less than 10° in high-bandwidth mode.

C. Dead-Zone Implementation

The new element in the adaptive PLL architecture is the combination of the DZ block with the high-current charge pump CP2. The function of DZ is to provide CP2 with a well-defined dead zone. The dead zone is centered symmetrically around the locking position of charge pump CP1 [see Fig. 13(a)]. The logic diagram of the DZ/CP2 combination is depicted in Fig. 12. The figure shows how the different logic functions influence the duty cycle of the up and dn signals from the phase frequency detector (PFD2). At the input of DZ, the up and dn signals have a finite duty cycle, even for an in-lock situation. The finite duty cycle eliminates dead-zone problems in CP1. The XOR and AND gates are used to cancel the finite

Fig. 13. Shift in locking position as a function of VCO tuning voltage, panels (a) to (c).

in-lock duty cycle. The processed up and dn signals are then applied to low-pass filters and slicers, whose function is to prevent pulses that have too small a duty cycle from reaching CP2. The cutoff frequency of the low-pass filters, the discrimination level of the slicers, and the turn-on time of CP2 determine the size of the dead zone around the lock position.

A tradeoff among settling performance, circuit implementation, and robustness arises when the magnitude of the dead zone has to be determined. Let us start by discussing circuit aspects. The dead zone of charge pump CP2 should be centered around the locking position of the loop for optimum settling and spectral purity performance. The locking position, however, is a function of the output voltage of charge pump CP1. The effect is depicted in Fig. 13. One sees that, as the tuning voltage Vtune increases, there is a shift of the locking position to positive values of the time difference Δt between the phase detector inputs. The reason lies in the finite output resistance of the active element used in CP1. Different current gains in CP1's UP and DOWN branches need to be compensated by up and dn signals with different duty cycles at the locking point. Different duty cycles are accomplished by a shift in the loop's locking position. Fig. 13 shows situations where the gain in the UP branch of the pump decreases as Vtune increases. The ideal operating situation is depicted in Fig. 13(a). Situation (b) is still allowed from the point of view of spectral purity but has asymmetrical settling performance. Finally, (c) depicts a situation that should never happen: the locking position shifts so much that the high-current charge pump CP2 becomes active and degrades the in-lock spectral purity. Therefore, increasing the size of CP2's dead zone eases the design of charge pump CP1 and increases the robustness of the system. On the other hand, the size of CP2's dead zone influences the settling performance of the adaptive loop.
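The interaction of the two charge pumps around the lock point can be pictured with a small behavioral model. The sketch below is a simplification under stated assumptions: CP1 is treated as ideally linear in Δt, CP2 is assumed to contribute only the part of the pulse that exceeds the dead zone (so the characteristic stays continuous), and the function name and current values are hypothetical. Only the 15-ns dead zone is a value quoted in the text.

```python
def pump_charge_per_cycle(dt, i_cp1, i_cp2, dead_zone):
    """Net charge (C) delivered to the loop filter in one comparison cycle.

    dt: time difference between the phase detector inputs (s), signed.
    i_cp1, i_cp2: low- and high-current pump amplitudes (A), hypothetical.
    dead_zone: half-width of the CP2 dead zone (s).
    """
    charge = i_cp1 * dt                 # CP1 is always active around lock
    excess = abs(dt) - dead_zone        # pulse width that survives the DZ block
    if excess > 0.0:
        sign = 1.0 if dt > 0.0 else -1.0
        charge += sign * i_cp2 * excess  # CP2 only acts outside the dead zone
    return charge


# Hypothetical currents; 15-ns dead zone as discussed for the application.
for dt in (-40e-9, -10e-9, 0.0, 10e-9, 40e-9):
    print(dt, pump_charge_per_cycle(dt, 10e-6, 1e-3, 15e-9))
```

Inside the dead zone only CP1 contributes, so the in-lock noise and spurious behavior are set by the low-current pump. A shift of the CP1 locking position larger than the dead zone would activate CP2, which is the situation of Fig. 13(c).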
The influence of the dead-zone size on the transient response was simulated with behavioral models. The results are displayed in Fig. 14, together with the settling requirements that ensure inaudible background scanning functionality. Table II presents the settling time for different settling accuracies and different values of the dead zone. A dead-zone value of infinity corresponds to the situation where only CP1 is active

Fig. 14. Detail of settling transient for different values of the dead zone.

TABLE II SIMULATED IN-LOCK SNR AND SETTLING TIME (ms) FOR A 20-MHz FREQUENCY JUMP FOR DIFFERENT VALUES OF THE DEAD ZONE AND DIFFERENT SETTLING ACCURACIES

(nonadaptive loop). Table II shows that by using the adaptive loop architecture, it is possible to combine fast settling time with good SNR in-lock. Increasing the dead zone leaves more “residual” phase (and frequency) error to be corrected by the small-bandwidth loop. The closer one comes to the locking point in high-bandwidth mode, the shorter the total settling transient will be. A dead-zone value of 15 ns is a good compromise for the intended application.

Fig. 15. Micrograph of the tuner IC.

V. CIRCUIT IMPLEMENTATION

A die micrograph of the total tuner IC is displayed in Fig. 15. The adaptive PLL has been integrated with the other functional blocks of Fig. 1 in a 5-GHz, 2-µm bipolar technology [15].

Fig. 16. Architecture of the main programmable divider.

A. Programmable Dividers

The architecture of the main divider is depicted in Fig. 16. The high-frequency part of the programmable divider is based on the programmable prescaler concept described in [12] and consists of a chain of 2/3 divider cells. The modular architecture enables easy optimization of power dissipation and robustness against process variations. The division range of the basic prescaler configuration is extended by the low-frequency programmable counter. The logic functions of the PLL were implemented with current routing logic (CRL) techniques [12], [16]. The low-frequency parts of the main and reference dividers operate with low current levels to limit total power dissipation. To decrease the phase noise of the reference signal going to the phase detectors, this signal is reclocked in a high-current D-flip-flop (D-FF). The clean crystal signal is used to clock the D-FF. The total main divider current consumption is 5 mA. The first 2/3 cell consumes 2.1 mA.
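The division range of a chain of 2/3 divider cells follows from the usual property of this prescaler family: each cell divides by 2 and swallows one extra input period per output cycle when its programming bit is set. The sketch below assumes that standard behavior, and the function names and bit ordering are illustrative. The number of cells and the low-frequency counter of the actual main divider in Fig. 16 are not modeled.

```python
def division_ratio(program_bits):
    """Division ratio of a chain of 2/3 cells.

    program_bits[i] is the programming bit (0 or 1) of cell i, where cell 0
    is the first (highest-frequency) cell. Assumes the usual relation
    N = 2^n + sum(p_i * 2^i) for an n-cell chain.
    """
    n_cells = len(program_bits)
    return 2 ** n_cells + sum(bit << i for i, bit in enumerate(program_bits))


def division_range(n_cells):
    """Smallest and largest ratio reachable with n_cells 2/3 cells."""
    return 2 ** n_cells, 2 ** (n_cells + 1) - 1


print(division_ratio([1, 0, 1]), division_range(3))
```

Because the basic chain only covers ratios from 2^n to 2^(n+1) − 1, a wider tuning range needs an extension, which is the role the text assigns to the low-frequency programmable counter.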

Fig. 17. Simplified circuit diagram of charge pump CP1.

Fig. 18. CP1 and CP2 charge-pump currents as a function of Δt.

B. Oscillators

The LC VCO uses an external tank circuit. It can be tuned from 150 to 250 MHz, with a voltage tuning range from 0.5 to 8 V. The VCO phase noise is −100 dBc/Hz at 10 kHz, for a carrier frequency of 237 MHz. The VCO core consumes 1.5 mA. The 20.5-MHz reference crystal oscillator operates in linear mode, to avoid harmonics interfering in the FM reception bands. Quadrature generation for the image-rejection FM mixers (see Fig. 1) is accomplished in a divider-by-two (FM DIV), with the exception of reception in the American Weather Band (WX). In that case, I/Q signals are generated with an RC-CR network directly from the VCO. This avoids the need to have the VCO operating at 346 MHz, and a change in the LC VCO tuned circuit during WX reception.

C. Charge Pumps

Fig. 17 shows the simplified circuit diagram of the low-current charge pump CP1. The up and dn signals from the phase

Fig. 19. Settling transient for a 20-MHz tuning step.

Fig. 20. Spectral purity measurements in FM mode: (a) reference spurious breakthrough and (b) close to the carrier.

detector drive the input differential pairs, which switch the currents in the PNP current switches Q1 and Q2 on and off. The collector outputs of Q1 and Q2 are kept at equal dc levels by the dc feedback arrangement provided by Q3 and Q4. This prevents asymmetry in the source and sink currents, ensuring good centering of the charge-pump characteristics for all tuning voltages. Q5 and

Fig. 21. Evaluation of the FM channel: VCO purity determines SNR for RF input levels > 300 µV. Fin = 97.1 MHz, AF freq = 1 kHz. SNR meas.: FMdev = 22.5 kHz; 26 dB at 2.0 µV. THD meas.: FMdev = 75 kHz.

Q6 provide means for stabilization of currents and for speeding up the switching of Q1 and Q2. The reset circuits monitor the currents in Q1 and Q2 and generate the reset signals RST Up and RST Dn. These signals are fed back to reset the phase detectors. The high-current charge pump CP2 is a scaled-up version of the CP1 circuit, without the reset circuits.

VI. MEASUREMENTS

The measured charge-pump currents as a function of the time difference between the phase detector inputs are shown in Fig. 18. Good centering of the two charge-pump outputs is observed, and there is enough margin for variations in the in-lock position of CP1. The measured settling transient response is displayed in Fig. 19. The settling performance complies with the settling requirements and enables inaudible background scanning in single-tuner RDS applications. The frequency spectrum of the VCO in FM mode is presented in Fig. 20(a) and (b). Fig. 20(a) shows the spurious reference breakthrough at 100 kHz to be under −81 dBc. There is a further 6-dB improvement in noise and spurious breakthrough before the VCO signal reaches the FM mixers, due to the division by two in the FM DIV divider (see Fig. 1). Fig. 20(b) displays the phase noise spectrum close to the carrier. Spectrum measurements done in AM mode showed a reference spurious breakthrough of −57 dBc, at an offset of 20 kHz from the carrier. For AM, the improvement in phase noise and spurious performance amounts to 26 dB, due to the division by 20 in between the VCO and the AM mixers.

Finally, the SNR and THD of the total FM receiver chain are displayed in Fig. 21 as a function of the antenna input signal RF level. For low RF levels, the noise is dominated by input noise and by the quality of the building blocks in the signal processing chain: low-noise amplifier, mixers, and demodulator. For high RF levels (> 300 µV), the dominant noise source becomes the LO signal. The excellent measured FM sensitivity, 2.0 µV for 26-dB SNR, and the ultimate SNR of 65 dB verify the spectral purity of the tuning system and of the RF channel.

VII. CONCLUSION

This paper described an adaptive PLL architecture for high-performance tuning systems. The relationships of performance aspects to design variables were presented. It was demonstrated that design for spectral purity performance often leads to suboptimal settling performance, because of different requirements on the loop bandwidth and on the location of the zeros and poles of the closed-loop transfer function. The adaptive architecture described here resolves these contradictory requirements, without the necessity of switching circuit elements in the loop filter. The adaptation of loop bandwidth occurs continuously, as a function of the phase error in the loop, and without interaction from outside of the tuning system. During frequency jumps, high bandwidth and high phase margin are obtained by bypassing filter sections. When the loop is locked, the architecture allows heavy filtering of spurious signals. The implementation of the dead-zone block was presented, and the basic tradeoffs of the concept were discussed. The adaptive PLL was optimized for use in a multiband (global) car-radio tuner IC, which features inaudible background scanning. Design and architecture of the PLL building blocks were discussed, and measurement results were presented. The integrated adaptive PLL tuning system achieved state-of-the-art settling and spectral purity performance in its class (integer-N PLLs). It simultaneously fulfills the speed requirements for inaudible frequency hopping and the stringent SNR specification of 64 dB.

ACKNOWLEDGMENT

The author wishes to thank D. Kasperkovitz for technical support during the project, K. Kianush for his tireless disposition in bringing the car-radio project to a successful end, H. Vereijken for the optimization and layout of the synthesizer building blocks, B. Egelmeers for the implementation and evaluation of the concept in a bread-board functional model, and G. van Werven for the measurements.


REFERENCES

[1] B. Razavi, “A 900 MHz/1.8 GHz CMOS transmitter for dual-band applications,” IEEE J. Solid-State Circuits, vol. 34, pp. 573–579, May 1999.
[2] K. Kianush and C. S. Vaucher, “A global car radio IC with inaudible signal quality checks,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1998, pp. 130–131.
[3] W. P. Robins, Phase Noise in Signal Sources, 2nd ed, ser. 9. London, U.K.: Inst. Elect. Eng., 1996.
[4] H. Adachi, H. Kosugi, T. Awano, and K. Nakabe, “High-speed frequency-switching synthesizer using fractional-N phase-locked loop,” IEICE Trans. Electron., pt. 2, vol. 77, no. 4, pp. 20–28, 1994.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New York: Wiley, 1997.
[6] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. 28, no. 11, pp. 1849–1858, Nov. 1980.
[7] H. Meyr and G. Ascheid, Synchronization in Digital Communications. New York: Wiley, 1990.
[8] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[9] B. Razavi, Ed., Monolithic Phase-Locked Loops and Clock Recovery Circuits. New York: IEEE Press, 1996.
[10] V. F. Kroupa, “Noise properties of PLL systems,” IEEE Trans. Commun., vol. C-30, pp. 2244–2552, Oct. 1982.
[11] C. S. Vaucher, “Synthesizer architectures,” in Analog Circuit Design, R. J. van de Plassche, Ed. Norwell, MA: Kluwer, 1997.
[12] C. Vaucher and D. Kasperkovitz, “A wide-band tuning system for fully integrated satellite receivers,” IEEE J. Solid-State Circuits, vol. 33, no. 7, pp. 987–998, July 1998.
[13] K. Nagaraj, “Adaptive charge pump for phase-locked loops,” U.S. Patent 5 208 546, 1993.
[14] B. Miller and B. Conley, “A multi-modulator fractional divider,” in Proc. IEEE 44th Annu. Symp. Frequency Control, 1990, pp. 559–567.
[15] Philips Semiconductors, TEA6840H global car-radio tuner datasheet, 1999.
[16] W. G. Kasperkovitz, “Digital shift register,” U.S. Patent 5 113 419, 1992.

Cicero S. Vaucher (M’98) was born in São Francisco de Assis, Brazil, in 1968. He graduated in electrical engineering from the Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, in 1989. He joined the Integrated Transceivers group of Philips Research Laboratories, Eindhoven, The Netherlands, in 1990, where he works on implementations of low-power building blocks for frequency synthesizers, on synthesizer architectures for low-noise/high-tuning-speed applications, and on CAD modeling of PLL synthesizers.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000


Fast-Switching Frequency Synthesizer with a Discriminator-Aided Phase Detector Ching-Yuan Yang, Student Member, IEEE, and Shen-Iuan Liu, Member, IEEE

Abstract: A phase-locked loop (PLL) with a fast-locked discriminator-aided phase detector (DAPD) is presented. Compared with the conventional phase detector (PD), the proposed fast-locked PD reduces the PLL pull-in time and enhances the switching speed, while maintaining better noise bandwidth. The synthesizer has been implemented in a 0.35-µm CMOS process, and the output phase noise is −99 dBc/Hz at 100-kHz offset. Under the supply voltage of 3.3 V, its power consumption is 120 mW.

Index Terms: Bandwidth adjusting, fast acquisition, fast locking, frequency synthesizers, phase detectors, phase-locked loops.

I. INTRODUCTION

PHASE-LOCKED loop (PLL) circuits have been found to be useful wherever there is a need to synchronize a local oscillator with an independent incoming signal, such as in serial data links and RF wireless communications. In order to optimize the loop performance, some features should be taken care of [1], [2]. First, to minimize output phase jitter due to external noise, the loop bandwidth should be made as narrow as possible. Second, to minimize output jitter due to internal oscillator noise, or to obtain the best tracking and acquisition properties, the loop bandwidth should be made as wide as possible. These principles obviously oppose each other, and therefore some compromise between them is always inevitable.

The block diagram of a PLL with a discriminator-aided phase detector (DAPD) is shown in Fig. 1. One could leave the discriminator connected permanently and/or merely weight the relative contributions of the system so as to obtain the desired damping. The discriminator-aided path helps to lock the PLL quickly. Once the PLL is in lock, a better bandwidth can be maintained while the discriminator is disconnected. In this paper, a novel DAPD is presented to reduce pull-in time and to enhance the switching speed of the PLL, while maintaining the same noise bandwidth and avoiding modulation damping. Section II describes the basic concept of the proposed structure. Sections III and IV present the realization and the measurement of the system, respectively, and Section V concludes the paper.

Manuscript received November 30, 1999; revised April 28, 2000. This work was sponsored by the National Science Council under Contract 88-2219-E-002-024. C.-Y. Yang was with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C. He is now with the Department of Electronic Engineering, HuaFan University, Taipei, Taiwan 223, R.O.C. S.-I. Liu is with the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C. Publisher Item Identifier S 0018-9200(00)08697-2.

II. BASIC IDEA AND MODEL

A simple charge-pump PLL consists of four major blocks: the phase detector (PD), the charge-pump circuit, the loop filter, and the voltage-controlled oscillator (VCO) [3]–[6]. Fig. 2 shows the linear model of a charge-pump PLL-based frequency synthesizer. The closed-loop transfer function can be represented as

(1)

The conventional PD is implemented in conjunction with a charge-pump loop filter in the PLL, as illustrated in Fig. 3. To determine the transfer function of the PD, assume there is a time interval between the two input signals of the PD; the output current of the charge-pump circuit is then a pulse whose duration equals this interval and whose amplitude is the charge-pump current. In the continuous-time approximation, the average value per input signal period can be given as

(2)

The transfer function curve of a linear PD is shown in Fig. 4(a), where the vertical axis represents the charge injected into the loop filter during one period of the input signal. The characteristic of a nonlinear PD, as shown in Fig. 4(b), can be divided into two regions [7]. It has the same characteristic within the lock-in region as that of the linear PD, but the acquisition time will be reduced by the steeper characteristic outside the lock-in region. When designing a PLL with the nonlinear PD, first the central slope is determined to fulfill the noise and modulation requirements for the PLL with a standard PD. Then, the slope near the edges of the lock-in region is gradually increased to improve acquisition speed.

The proposed nonlinear PD can be built with delay cells and standard PD circuits, as shown in Fig. 4(c). The standard PD is a digital circuit, triggered by the positive edge of the input reference signal and the output feedback signal. Taking the delay of the delay cells into account, the PDs decide the position of the phase difference among these regions. According to the value of the time difference, the charge pump will output the corresponding current controlled by the up signals or the down signals. The behavior model of the nonlinear PD can be explained by the waveforms of Fig. 4(d). According to the time difference between both input signals, the up signals are used to increase and the down signals are used to decrease the frequency of the feedback signal. The nonlinear PD always generates the right signal to equalize the frequency of both input signals, as the conventional PD does. The time interval is positive

0018–9200/00$10.00 © 2000 IEEE


Fig. 1. Block diagram of PLL with discriminator-aided phase acquisition.

Fig. 2. Linear model of PLL frequency synthesizer.

(negative) where the reference signal leads (lags) the feedback signal. When the time interval is larger than the cell delay, the additional up signal may go to the “high” level; when it is smaller than the negative cell delay, the additional down signal may go to the “high” level. As the nonlinear PD is applied, two cases can occur for different time intervals. Case 1: the time interval lies within the delay window, and the injected charge equals that of the linear PD. Case 2: the time interval lies outside the delay window, and the injected charge follows the steeper characteristic, which can be approximated by a simpler expression because the cell delay is very small.

Generally, the total transfer function of the PD with the charge-pump circuit and the loop filter can be expressed as [8]

(3)

where the current term is the pump current of the charge-pump circuit. The filter impedance is the series connection of a capacitor and a resistor with a capacitor added in parallel, as shown in Fig. 2. The impedance of this filter can be written as

(4)

The open-loop gain of the PLL equals

(5)

which has a crossover frequency of

(6)

The open-loop gain of this third-order PLL can be calculated in terms of the frequency, as follows:

(7)

and its phase margin can be determined in terms of the crossover frequency

(8)

Fig. 3. Phase detector with charge-pump filter.

In order to maintain the same loop gain and phase margin for

YANG AND LIU: FAST-SWITCHING FREQUENCY SYNTHESIZER


Fig. 4. (a) Characteristic of the conventional linear phase detector. (b) Characteristic of the nonlinear phase detector. (c) Block diagram of the nonlinear phase detector. (d) Operation of the nonlinear phase detector.

Fig. 5. Block diagram of the frequency synthesizer.

the sake of stability, the charge-pump current is scaled up by a factor n² and the loop-filter resistor is scaled down by the factor n outside the lock-in region, so that the crossover frequency, i.e., the loop bandwidth, increases n times. This speeds up the switching of the PLL. Once it is locked on the correct frequency, the PLL returns to low-noise operation.
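The claim that the phase margin is preserved when the bandwidth is widened can be checked numerically with the textbook third-order charge-pump loop. The sketch below rests on several assumptions that are not stated explicitly in the text: the filter is a series R-C1 branch shunted by C2; the in-lock resistor is 470 Ω with C1 = 33 nF and C2 = 2.2 nF; the widening factor is 3 (40 kHz to 120 kHz); the divide ratio is 32 (448 MHz from a 14-MHz reference); and the VCO gain magnitude is 32.4 MHz/V. All function names are illustrative.

```python
import math


def corner_frequencies(r, c1, c2):
    """Zero and pole (Hz) of a series R-C1 branch shunted by C2."""
    f_zero = 1.0 / (2.0 * math.pi * r * c1)
    f_pole = (c1 + c2) / (2.0 * math.pi * r * c1 * c2)
    return f_zero, f_pole


def crossover_hz(i_pump, r, c1, c2, k_vco_hz_per_v, n_div):
    """Approximate crossover, valid when it lies between zero and pole."""
    omega_c = i_pump * r * k_vco_hz_per_v * c1 / (n_div * (c1 + c2))
    return omega_c / (2.0 * math.pi)


def phase_margin_deg(f_c, f_zero, f_pole):
    """Phase margin of the third-order loop at crossover f_c."""
    return math.degrees(math.atan(f_c / f_zero) - math.atan(f_c / f_pole))


def loop_summary(i_pump, r, scale=1.0, c1=33e-9, c2=2.2e-9,
                 k_vco=32.4e6, n_div=32):
    """Crossover (Hz) and phase margin (deg) with current * scale^2, R / scale."""
    i_scaled, r_scaled = i_pump * scale ** 2, r / scale
    f_zero, f_pole = corner_frequencies(r_scaled, c1, c2)
    f_c = crossover_hz(i_scaled, r_scaled, c1, c2, k_vco, n_div)
    return f_c, phase_margin_deg(f_c, f_zero, f_pole)


narrow = loop_summary(560e-6, 470.0)             # assumed in-lock settings
wide = loop_summary(560e-6, 470.0, scale=3.0)    # assumed acquisition settings
print(narrow, wide)
```

Under these assumptions the zero, the pole, and the crossover all move by the same factor, so the ratio of crossover to zero and of pole to crossover is unchanged and the phase margin stays the same while the bandwidth triples.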

III. CIRCUIT REALIZATION

A. Architecture

The designed frequency synthesizer integrates the proposed DAPD, the charge-pump circuit, a prescaler, and a VCO in a single CMOS chip. It is similar to the structure of a conventional integer-N frequency synthesizer, as shown in Fig. 5. By adding the frequency-doubling block, the output frequency can be up to 900 MHz from a 450-MHz VCO.

Fig. 6. Schematic of phase detector with DAPD and charge-pump filter.

B. Phase Detector with DAPD and Charge-Pump Filter

A schematic diagram of the DAPD is shown in Fig. 6. The phase frequency detectors are used to compare the phase difference of both input signals. The output signal of the DAPD depends on whether the time difference between the input signals is larger than the cell delay or not. Taking into account the delay of the delay cells, which is very small but never negligible, the DAPD decides the operating bandwidth of the loop filter. When one input leads the other by more than the cell delay, one of the two detector flags is “low” and the other is “high”; when it lags by more than the cell delay, the flag levels are reversed. In a word, if the absolute value of the time difference between the input signals is larger than the cell delay, the DAPD output may go to the “high” level. The charge-pump current is then increased and the resistor of the loop filter is reduced, i.e., the loop bandwidth is widened. Once the absolute value of the time difference is within the cell delay, both flags are “high,” so the DAPD output is brought to the “low” level, and the charge-pump current and the resistor return to their nominal values, with a narrower bandwidth for better noise rejection. The delay of the delay cell, however, has to be chosen according to the VCO’s noise: it should be larger than the timing variation of the signal for the DAPD to work.

In our design, the loop bandwidth of the PLL equals about krad/s, and the loop gain zero and the loop pole are placed a factor of four below and above the crossover frequency, respectively. In addition, a pump current of 560 µA is applied and the parameter is chosen. The values of the two resistors and the two capacitors are 470 Ω, 235 Ω, 33 nF, and 2.2 nF, respectively. The open-loop gain response is depicted in Fig. 7. Curve (a) is the characteristic of the PLL with the DAPD active, where the bandwidth is 120 kHz. The PLL returns to curve (b), with a bandwidth of 40 kHz, when it is nearly in lock. Both curves give the same phase margin of approximately 60°. Thus, the PLL is expected to be stable in both modes.

Currently, most frequency synthesizers use phase-frequency detectors (PFDs) as their PDs. A PFD is a sequential circuit that not only detects the phase error but also provides a frequency-sensitive signal to aid acquisition when the loop is out of lock. The drawback of some conventional PFDs is a dead zone in the phase characteristic, which generates phase error in the output signals. To solve this problem, a dynamic CMOS PFD is adopted, as shown in Fig. 8(b), which is similar to the one proposed in [10]. The PFD consists of two half-transparent registers [9], shown in Fig. 8(a), and a NAND gate. It is triggered by the negative edge of the input signals. The timing diagram of the PFD is shown in Fig. 8(c). Even when the input signals are in phase, the glitches caused by the reset path always exist. So, extra filters are added in the DAPD to remove the effect of the glitches. So far, a positive VCO gain has been assumed in the above discussion. However, since the gain of the VCO is negative as

Fig. 7. Simulated open-loop gain Bode plot.

Fig. 8. Implementation of phase-frequency detector. (a) Half-transparent cell. (b) Phase-frequency detector (PFD). (c) Timing diagram of PFD.

described later, the up and down outputs of the PD connected to the charge pump should be interchanged. The charge pump is based on the one described in [11]. It suppresses the charge sharing from the parasitic capacitance by a pair of switched-current sources.

C. Dual-Modulus Prescaler

The dual-modulus prescaler is the high-frequency building block in the frequency synthesizer. This circuit, shown in Fig. 9, divides the frequency of the VCO output signal by a factor of 32 or 33 depending on the logic value of the control signal mode [12]–[15]. It consists of a synchronous divide-by-4/5 counter as the first stage and an asynchronous divide-by-8 counter as the second stage. The circuits in the first stage are fully differential, while single-ended logic circuits are used in the second stage. To reduce the supply noise, an emitter-coupled logic (ECL)-like differential logic is used in the high-speed stage [16]. In the divide-by-4/5 circuit, the DFF is a differential flip-flop. Fig. 10 shows the schematic diagram of a NAND-gate logic flip-flop. Merging the logic gates into a flip-flop saves power and increases the operating speed. The toggle flip-flops are made of true single-phase clocking (TSPC) DFFs of [12] behind a differential-to-single buffer.
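The 32/33 ratios follow from simple counting if one assumes the usual arrangement for this kind of prescaler: the first stage divides by 4 in every state of the divide-by-8 stage, except that in divide-by-33 mode it divides by 5 in exactly one of the eight states. The sketch below illustrates that counting only. The function name and the choice of which state triggers the extra count are assumptions, and the gate-level circuit of Fig. 9 is not modeled.

```python
def prescaler_period(mode_33):
    """Input cycles per output cycle of a 4/5 counter followed by divide-by-8.

    mode_33: True selects divide-by-33, False selects divide-by-32.
    """
    input_cycles = 0
    for stage2_state in range(8):            # one pass through the divide-by-8 stage
        # Assumed: the extra input cycle is swallowed in the last state only.
        divide_by_five = mode_33 and stage2_state == 7
        input_cycles += 5 if divide_by_five else 4
    return input_cycles


print(prescaler_period(False), prescaler_period(True))
```

Seven divide-by-4 cycles plus one divide-by-5 cycle give 33 input periods, while eight divide-by-4 cycles give 32. Only the first stage therefore has to run at the full VCO frequency.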

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

Fig. 9. Functional block diagram of the dual-modulus prescaler.

Fig. 10. Schematic of the differential NAND-gate flip-flop.

Fig. 11. VCO schematic.

Fig. 12. IC microphotograph.

Fig. 13. Experimentally measured VCO transfer curve.

This buffer is used to achieve a rail-to-rail output signal in the low-speed stage.

D. VCO

The VCO is another high-frequency building block in a frequency synthesizer. Again, an ECL-like current-mode differential pair, as shown in Fig. 11, is used as a delay cell [17], [18] to achieve high common-mode rejection in a four-stage ring oscillator. The coarse tuning of the ring oscillator's center frequency is achieved by the bias Vbpo1 (or through the use of a digital-to-analog converter), and a fine-tuning technique is needed for the PLL voltage-control path. The gain required for the oscillator is easily determined by the ratio of M1 and M2 as the current gain. The proposed delay cell has better noise performance because the circuit operates on differential signals, which are immune to power-supply-injected and substrate-injected noise. The replica bias circuit adjusts the load over a wide range in response to a swept supply current. It ensures that the output swing of the delay cells remains fixed and accepts a variable bias current to cover a suitable range of output frequencies. Bypass capacitors are also an important consideration for the replica bias and voltage reference circuits. On-chip bypass capacitors can be used to help reduce their noise contribution to the ring-oscillator delay cells.
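To give a feel for the speed required of each delay cell, the sketch below applies the standard ring-oscillator relation f = 1/(2·N·t_d), where N is the number of stages and t_d the delay per stage. This relation is a textbook approximation assumed here, not a formula stated in the paper, and the 460-MHz figure is the center frequency reported in the measurement section.

```python
def ring_oscillator_frequency(n_stages: int, stage_delay_s: float) -> float:
    """Oscillation frequency of an N-stage ring, assuming the period equals
    two trips around the ring: f = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


def required_stage_delay(n_stages: int, frequency_hz: float) -> float:
    """Per-stage delay needed to oscillate at the given frequency."""
    return 1.0 / (2.0 * n_stages * frequency_hz)


if __name__ == "__main__":
    t_d = required_stage_delay(4, 460e6)
    print(f"per-stage delay at 460 MHz with 4 stages: {t_d * 1e12:.0f} ps")
```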

IV. MEASUREMENT RESULTS

The synthesizer is implemented in a 0.35-µm CMOS process. The microphotograph of the fabricated frequency synthesizer is shown in Fig. 12. The loop filter is off-chip, and the output signal of the VCO is connected to a source follower. The frequency synthesizer is measured at a supply voltage of 3.3 V. The frequency of the reference signal is 14 MHz. Fig. 13 shows the measured VCO transfer function obtained by varying the control voltage. The measured VCO has a monotonic frequency range of 435–485 MHz. The gain of the VCO is 32.4 MHz/V at the center frequency of 460 MHz. Fig. 14 shows the output signal spectrum at 448 MHz (measured with an HP8560A Spectrum Analyzer after lock), with a phase noise of −99 dBc/Hz at 100-kHz offset. With an external frequency doubler added, the phase noise is −91 dBc/Hz at 100-kHz offset from the 896-MHz carrier, as shown in Fig. 15. The measured time-domain waveform is shown in Fig. 16, and its rms and peak-to-peak jitter measured by the CSA803 (Communication Signal Analyzer)


Fig. 14. Measured output spectrum of the frequency synthesizer. (a) With span 50 MHz. (b) With span 1 MHz.

Fig. 15. Measured output spectrum of the frequency synthesizer with added frequency doubler.

Fig. 16. Measured waveform. (a) In time domain. (b) Jitter performance.

Fig. 17. Measured frequency jump waveform of the frequency synthesizer.

are 18.9 and 110 ps, respectively. Another important parameter is the time that the PLL takes to lock to a new frequency when the channel switches. Fig. 17 shows the switching waveform for a frequency jump from 448 to 462 MHz, captured with the HP53310A Modulation Domain Analyzer. Obviously, the DAPD can improve the switching speed of the PLL. The power consumption is 120 mW and the chip area is 40 2.0 mm including the pad areas.
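Two of the reported numbers can be sanity-checked with simple arithmetic. An ideal frequency multiplier by a factor k raises phase noise by 20·log10(k) dB, so a doubler costs about 6 dB; and with a 14-MHz reference, the 448-MHz and 462-MHz channels correspond to the two prescaler moduli. The sketch below treats the quoted phase-noise levels as negative dBc/Hz values and assumes the loop division ratio is simply the output frequency over the reference frequency; the function names are illustrative.

```python
import math


def multiplied_phase_noise_dbc(phase_noise_dbc: float, factor: float) -> float:
    """Phase noise after an ideal, noiseless frequency multiplier."""
    return phase_noise_dbc + 20.0 * math.log10(factor)


def division_ratio(f_out_hz: float, f_ref_hz: float) -> int:
    """Integer division ratio that locks f_out to f_ref."""
    return round(f_out_hz / f_ref_hz)


if __name__ == "__main__":
    ideal = multiplied_phase_noise_dbc(-99.0, 2.0)
    print(f"ideal doubled phase noise: {ideal:.1f} dBc/Hz (measured: -91)")
    print("ratios:", division_ratio(448e6, 14e6), division_ratio(462e6, 14e6))
```

Under these assumptions the measured level after doubling sits about 2 dB above the ideal 6-dB penalty, which would be attributable to the external doubler and measurement setup rather than to the synthesizer itself.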


V. CONCLUSION

In this paper, a PLL with the DAPD is implemented in a 0.35-µm CMOS process. The proposed DAPD can be applied to enhance the switching speed of the PLL while maintaining a better noise bandwidth. When the DAPD is added to the PLL, it controls the charge pump and loop filter and still maintains the loop stability with the same phase margin as in the steady state. The prototype frequency synthesizer using this structure is implemented at 448 MHz, and its output phase noise is −99 dBc/Hz at 100-kHz offset. By adding a frequency doubler, the synthesizer can operate at 896 MHz, with an output phase noise of −91 dBc/Hz at 100-kHz offset from the carrier.

ACKNOWLEDGMENT

The authors would like to thank the SHARP Technology Company, Japan, for the fabrication of the chip.

REFERENCES

[1] F. M. Gardner, Phaselock Techniques, 2nd ed. New York, NY: Wiley, 1979.
[2] P. Larsson, “Reduced pull-in time of phase-locked loops using a simple nonlinear phase detector,” IEE Proc. Commun., vol. 142, no. 4, pp. 221–226, Aug. 1995.
[3] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[4] F. M. Gardner, “Charge-pump phase-locked loops,” IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[5] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York, NY: McGraw-Hill, 1984.
[6] M. V. Paemel, “Analysis of a charge-pump PLL: A new model,” IEEE Trans. Commun., vol. 42, pp. 2490–2498, July 1994.
[7] C. Y. Yang, W. C. Chung, and S. I. Liu, “Effectively reduced pull-in time of PLL with nonlinear phase comparator,” in 8th VLSI/CAD Symp., Taiwan, R.O.C., Aug. 1997, pp. 205–208.
[8] D. Byrd, C. Davis, and W. O. Keese, “A fast locking scheme for PLL frequency synthesizer,” National Semiconductor, Santa Clara, CA, Application Note, July 1995.
[9] J. Yuan and C. Svensson, “Fast CMOS nonbinary divider and counter,” Electron. Lett., vol. 29, pp. 1222–1223, June 1993.
[10] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi, and H. K. Kim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[11] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5- to 110-MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[12] Q. Huang and R. Rogenmoser, “Speed optimization of edge-triggered CMOS circuits for gigahertz single-phase clocks,” IEEE J. Solid-State Circuits, vol. 31, pp. 456–465, Mar. 1996.
[13] B. Chang, J. Park, and W. Kim, “A 1.2-GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flop,” IEEE J. Solid-State Circuits, vol. 31, pp. 749–752, May 1996.
[14] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[15] C. Y. Yang, G. K. Dehng, J. M. Hsu, and S. I. Liu, “New dynamic flip-flops for high-speed dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 33, pp. 1568–1571, Oct. 1998.
[16] F. Piazza and Q. Huang, “A low-power CMOS dual-modulus prescaler for frequency synthesizer,” IEICE Trans. Electron., vol. E80-C, pp. 314–319, Feb. 1997.
[17] S. J. Lee, B. Kim, and K. Lee, “A fully integrated low-noise 1-GHz frequency synthesizer design for mobile communication application,” IEEE J. Solid-State Circuits, vol. 32, pp. 760–765, May 1997.
[18] D. Y. Jeong, S. H. Chae, W. C. Song, and G. H. Cho, “High-speed differential-voltage clamped current-mode ring oscillator,” Electron. Lett., vol. 33, pp. 1102–1103, June 1997.

Ching-Yuan Yang (S’97) was born in Miaoli, Taiwan, R.O.C., in 1967. He received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in 1996 and 2000, respectively. He is currently an Assistant Professor with the Department of Electronic Engineering, Huafan University, Taiwan. His research interests are in the area of integrated circuits and systems for high-speed interfaces and wireless communications.

Shen-Iuan Liu (S’88–M’93) was born in Keelung, Taiwan, R.O.C., on April 4, 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1987 and 1991, respectively. During 1991 to 1993, he served as a Second Lieutenant in the Chinese Air Force. During 1991 to 1994, he was an Associate Professor in the Department of Electronic Engineering, National Taiwan Institute of Technology. He joined the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan in 1994 and has been a Professor since 1998. He holds nine U.S. patents and fourteen R.O.C. patents, with some pending. His research interests are in analog and digital integrated circuits and systems.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Low-Power Dividerless Frequency Synthesis Using Aperture Phase Detection

Arvin R. Shahani, Derek K. Shaeffer, Student Member, IEEE, S. S. Mohan, Student Member, IEEE, Hirad Samavati, Student Member, IEEE, Hamid R. Rategh, Student Member, IEEE, Maria del Mar Hershenson, Student Member, IEEE, Min Xu, Student Member, IEEE, C. Patrick Yue, Student Member, IEEE, Daniel J. Eddleman, Student Member, IEEE, Mark A. Horowitz, Senior Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract—A phase-locked-loop (PLL)-based frequency synthesizer incorporating a phase detector that operates on a windowing technique eliminates the need for a frequency divider. This new loop architecture is applied to generate the 1.573-GHz local oscillator (LO) for a Global Positioning System receiver. The LO circuits in the locked mode consume only 36 mW of the total 115-mW receiver power, as a result of the power saved by eliminating the divider. The PLL’s loop bandwidth is measured to be 6 MHz, with a reference spurious level of −47 dBc. The front-end receiver, including the synthesizer, is fabricated in a 0.5-µm, triple-metal, single-poly CMOS process and operates on a 2.5-V supply.

Index Terms—Frequency synthesizers, Global Positioning System, phase detection, phase-locked loops, radio-frequency integrated circuits, radio receivers.

Fig. 1. GPS receiver architecture.

I. INTRODUCTION

THE growing demand for portable, low-cost wireless-communication devices has spurred interest in radio-frequency integrated circuits. Part of offering a completely integrated solution involves identifying a low-power, monolithic gigahertz local oscillator (LO) implementation. A quartz-crystal-based oscillator cannot be used directly for the LO, since the fundamental modes of inexpensive quartz crystals are limited to approximately 30 MHz [1], and overtone orders of 50 are impractical. However, a crystal oscillator can be used as the reference in a static-modulus phase-locked-loop (PLL) frequency synthesizer. As is well known, the stability of the frequency-multiplied reference is retained by a wideband loop. This ability to synthesize a stable high-frequency source is beneficial, but it comes at the expense of significant power consumption. This paper addresses the power issue by introducing a new type of phase detector capable of phaselocking the synthesizer’s frequency-multiplied output to its reference input, without the use of a divider. Eliminating the need for the divider allows the synthesis of a 1.573-GHz output on only 36 mW of power in this technology.

Section II examines the PLL-based LO used for the Global Positioning System (GPS) receiver architecture shown in Fig. 1 [2] and introduces the element that eliminates the need for a divider: the aperture phase detector (APD). Treatment begins at the architectural level and descends into the APD’s detailed nature. Both the theory and the implementation of an APD are covered. Section III presents experimental results on the APD PLL.

Manuscript received May 7, 1998; revised August 4, 1998. The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA. Publisher Item Identifier S 0018-9200(98)09432-3.

0018–9200/98$10.00 © 1998 IEEE

SHAHANI et al.: FREQUENCY SYNTHESIS USING APD

Fig. 2. Integer-N synthesizer block diagram.

II. PLL

A. Architecture

The conventional and widely used implementation of the PLL frequency synthesizer with static modulus is the integer-N synthesizer [3]. The traditional divide-by-N block shown in Fig. 2 can be realized with a single counter. However, there are two drawbacks associated with the divider: power consumption and switching noise. Power consumption is large, particularly at high frequencies, because of the well-known CV²f relationship. For example, a recently published 1.6-GHz integer-N synthesizer built in a 0.6-µm CMOS technology reported a total power consumption of 90 mW, of which 22.5 mW were used by the divider [4]. A further disadvantage of the divider is the on-chip interference generated by its high-speed digital transitions. This is particularly worrisome if the synthesizer is to be integrated with the front end’s sensitive low-noise amplifier.

Fig. 3. Phaselock techniques. (a) Phaselocked signals. (b) Phaselock with a divider and PFD. (c) PFD alone; negative charge pump current commands the VCO to decrease its frequency, breaking phaselock. (d) Phaselock with an APD.

Fig. 4. APD synthesizer block diagram.

Fig. 5. Idealized APD state diagram.

To reduce power consumption and high-frequency noise, a windowing technique that eliminates the divide-by-N block for phase comparisons is investigated here. To appreciate how windowing may be of benefit, it is worthwhile to revisit the phenomenon of locking in a conventional PLL. To retain phaselock, it is necessary to align every Nth rising or falling edge of the voltage-controlled oscillator (VCO) with a corresponding reference edge. Phaselock is demonstrated in Fig. 3(a) for N = 4, where every fourth rising VCO edge lines up with a rising reference edge. A divider with a phase-frequency detector (PFD) accomplishes edge alignment by first dividing down the VCO by the right multiple so that edge

alignment is unambiguous, as pictured in Fig. 3(b). Because the PFD compares phase over the entire reference cycle, a PFD cannot phaselock two inputs at different frequencies. In fact, it is precisely this property that makes the PFD popular. Now consider using a PFD without a divider. Clearly, there would be an edge ambiguity problem, rendering the PFD quite ineffective, as seen in Fig. 3(c). The reason is that the PFD responds to every edge of the VCO, evidenced by the charge pump current’s net negative value. This erroneously commands the VCO to decrease its frequency. However, by restricting the time interval during which phase is examined, one may eliminate the edge ambiguity, and hence the frequency divider. The dashed boxes in Fig. 3(d) define the window during which phase may be compared, even if the two inputs are of different frequency. The window can be controlled by the reference time base, since it periodically opens at that rate. Furthermore, the window need only be wide enough so that a VCO edge falls within it, which is equivalent to requiring that the window be active for a time longer than the instantaneous VCO period. No dividers are thus necessary to maintain phaselock, and this phase detector, called an APD, can operate with two inputs that are at different frequencies, as shown in Fig. 4.

A more substantive description of the APD’s operation is provided in Fig. 5, which illustrates the state diagram for an idealized APD. When the window opens, the phase detector becomes active. The reference-input rising edge sets the L (denoting “late”) terminal true, and the VCO-input rising edge sets the E (denoting “early”) terminal true. Subsequent edges of the VCO input are ignored until the next window opens. The time difference between the rising edges of the L and E signals is proportional to the phase error between the reference phase and the VCO phase. If L is set first, the VCO phase is late; conversely, if E is set first, the VCO phase is early. When the window closes, the L and E terminals are reset (to false).

Fig. 1 shows that some type of frequency acquisition aid (FAA) is required to bring an APD-based loop initially into lock. This necessity is a consequence of restricting phase comparisons to a window, which eliminates the phase detector’s ability to perform frequency detection. This issue is discussed in further detail in Section II-D. For this work, an external acquisition aid was used for experimental purposes. An integrated implementation of the acquisition aid, Fig. 6, uses the traditional divider with PFD to lock the loop and then powers down the acquisition aid, transferring control to the low-power APD. An APD can be used once in lock because the reference is derived from a stable crystal oscillator.

Fig. 6. APD synthesizer block diagram with integer-N FAA.

B. Loop Theory

Having provided an overview of APD operation, we now develop a linearized APD PLL model relating input and output phase. This model is important for quantitative loop design and ensures that the synthesized output has the desired stability and noise performance. From the description of the late and early APD signals given in the previous subsection, the average charge pump current over one reference cycle is given by

$$\bar{i}_d = I_P \, \frac{\omega_r}{2\pi} \, (t_v - t_r) \qquad (1)$$

where $I_P$ is the magnitude of the charge pump current, $t_v$ is the time of the first VCO rising edge in the window, $t_r$ is the time of the reference rising edge in the window, and $\omega_r$ is the angular reference frequency. The current can be expressed as a function of the reference and VCO phases by relating these phases to $t_r$ and $t_v$, assuming small phase errors. Expressions relating edge time to signal phase are

$$t_r = -\frac{\theta_r}{\omega_r} \qquad (2)$$

where $\theta_r$ is the reference phase and

$$t_v = -\frac{\theta_v}{\omega_v} \qquad (3)$$

where $\theta_v$ is the VCO phase and $\omega_v$ is the angular VCO frequency. The average charge pump current over one reference cycle can thus be written as

$$\bar{i}_d = \frac{I_P}{2\pi}\left(\theta_r - \frac{\omega_r}{\omega_v}\,\theta_v\right). \qquad (4)$$

When the loop is in lock, $\omega_v = N\omega_r$, giving

$$\bar{i}_d = K_d\left(\theta_r - \frac{\theta_v}{N}\right) \qquad (5)$$

where $K_d = I_P/(2\pi)$ is the phase-detector gain constant. Note that even though there is no explicit divider in the loop, the VCO phase is divided by $N$ in (5), just as in a conventional loop. This model can be used in place of the APD in Fig. 4, and the other blocks in the same figure can be replaced by their corresponding linear time-invariant (LTI) models, yielding the overall system model shown in Fig. 7.

Fig. 7. APD PLL block diagram in lock.

Fig. 7 is an LTI representation of the APD PLL in lock, from which the phase transfer function is readily found to be

$$H(s) = \frac{\theta_v(s)}{\theta_r(s)} = \frac{N K_d K_v Z(s)}{N s + K_d K_v Z(s)} \qquad (6)$$

where $K_v$ is the VCO gain constant and $Z(s)$ is the loop filter’s impedance, expressed in the $s$-domain.
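The small-signal model of this subsection can be compared against a direct timing calculation. The sketch below is a behavioral illustration under stated assumptions, not the authors' simulation: edge times and phases are related by t = −θ/ω, the detector gain is taken as the pump current divided by 2π, the reference edge is placed at time zero, the window opens a delay `d` before it, and VCO edges occur every VCO period with an offset `tau`. All names and numerical values are hypothetical.

```python
import math


def first_vco_edge_in_window(t_open: float, t_vco: float, tau: float) -> float:
    """Time of the first VCO rising edge at or after the window opens.
    VCO edges are assumed to occur at tau + k * t_vco for integer k."""
    k = math.ceil((t_open - tau) / t_vco)
    return tau + k * t_vco


def apd_average_current(i_p: float, t_ref: float, n: int, d: float, tau: float) -> float:
    """Average pump current over one reference cycle from edge timing alone."""
    t_vco = t_ref / n            # loop assumed locked: f_vco = n * f_ref
    t_r = 0.0                    # reference edge placed at the time origin
    t_v = first_vco_edge_in_window(t_r - d, t_vco, tau)
    return i_p * (t_v - t_r) / t_ref


def linear_model_current(i_p: float, t_ref: float, n: int, tau: float) -> float:
    """Same quantity from the linearized model K_d * (theta_r - theta_v / n)."""
    k_d = i_p / (2.0 * math.pi)
    omega_v = n * 2.0 * math.pi / t_ref
    theta_r = 0.0
    theta_v = -omega_v * tau     # assumed sign convention: t = -theta / omega
    return k_d * (theta_r - theta_v / n)


if __name__ == "__main__":
    i_p, t_ref, n = 50e-6, 1.0 / 143e6, 11   # hypothetical values
    t_vco = t_ref / n
    for frac in (-0.2, 0.1):
        tau = frac * t_vco
        direct = apd_average_current(i_p, t_ref, n, t_vco / 2, tau)
        linear = linear_model_current(i_p, t_ref, n, tau)
        print(f"tau = {frac:+.1f} T_v: direct {direct:.3e} A, linear {linear:.3e} A")
```

For offsets smaller than the window delay the two calculations agree, and shifting the VCO waveform by a whole VCO period leaves the direct result unchanged, which is the periodicity the detector exhibits for larger phase errors.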

C. APD Characteristic (Average Charge Pump Current Versus Phase Error)

The derivation in the previous subsection treats the APD for small phase errors. For completeness, it is instructive to examine the response of the APD to arbitrary phase errors. Now, the delay between the time the window opens and the time at which the reference edge occurs becomes important. This delay is designated by $d$, which is a positive quantity whose least restrictive range is limited to $0 < d < T_r$, where $T_r$ is the reference period. However, the loop can lock if and only if $d$ is in the interval $0 < d < T_v$, where $T_v$ is the VCO period. Otherwise, the first VCO edge within the window will always precede the reference edge. From Fig. 8, it is apparent that the characteristic will be periodic in VCO phase, because when the VCO waveform has moved one VCO period to the right, the situation is identical to the start. As the VCO waveform moves to the right, the time difference between the VCO and reference edges varies proportionally with the phase error. Therefore, to find the APD’s characteristic, the average charge pump current and the phase error need to be calculated at only two points, and the remainder of the characteristic is generated by connecting these endpoints. The two quantities are first calculated at one extreme, giving (7) and (8). Next, they are calculated at the other extreme, giving (9) and (10). From this information, the portion of the APD characteristic shown in Fig. 9 can be constructed.

Fig. 8. Position of VCO and reference edge in window.

Fig. 9. APD characteristic over a $2\pi/N$ interval.

Fig. 10. APD characteristic for $d = T_v/2$ and $N = 4$.

Fig. 11. $M = 2$, $N = 7$ subharmonic-lock mode.

The influence of two parameters warrants special attention: $d$, the delay between the time the window opens and when the reference edge occurs, and $N$, the ratio between the VCO and reference frequencies. Decreasing $d$ shifts the characteristic diagonally up (along the line of the characteristic), and increasing $d$ shifts the characteristic diagonally down. It is desirable to have $d$ equal to half the synthesized frequency’s period. By designing for this condition, the APD characteristic will be centered about zero phase error to provide a symmetrical correction range. The parameter $N$ affects the phase error’s periodicity, with larger values increasing the periodicity. Fig. 10 shows the complete APD characteristic (a $2\pi$ variation in phase error) for the specific case where $d = T_v/2$ and $N = 4$.
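The two-endpoint construction described above can be sketched numerically. The block below is an illustration under assumptions rather than a reproduction of the paper's equations: the two extremes are taken as a VCO edge arriving exactly when the window opens (time difference of minus the window delay) and a VCO edge arriving one VCO period later, the phase error is referred to the reference as the reference angular frequency times the time difference, and the average current is the pump current times the time difference over the reference period. Function and parameter names are hypothetical.

```python
import math


def apd_characteristic_endpoints(i_p: float, t_ref: float, n: int, d: float):
    """Return ((phase_lo, current_lo), (phase_hi, current_hi)) for one
    period of an idealized APD characteristic.

    d is the delay from window opening to the reference edge; it must lie
    strictly between 0 and the VCO period for the loop to be able to lock.
    """
    t_vco = t_ref / n
    if not 0.0 < d < t_vco:
        raise ValueError("d must satisfy 0 < d < T_v")
    omega_r = 2.0 * math.pi / t_ref
    dt_lo = -d              # VCO edge lands right as the window opens
    dt_hi = t_vco - d       # VCO edge just missed the opening; next one counts
    lo = (omega_r * dt_lo, i_p * dt_lo / t_ref)
    hi = (omega_r * dt_hi, i_p * dt_hi / t_ref)
    return lo, hi


if __name__ == "__main__":
    # Normalized example matching the N = 4, d = T_v / 2 case of Fig. 10.
    (p_lo, c_lo), (p_hi, c_hi) = apd_characteristic_endpoints(1.0, 1.0, 4, 0.125)
    print(f"phase error from {p_lo:.4f} to {p_hi:.4f} rad")
    print(f"average current from {c_lo:.4f} to {c_hi:.4f} (units of I_P)")
```

With the window delay set to half a VCO period the endpoints are symmetric about zero, and their separation in phase is 2π/N, consistent with the interval shown in Fig. 9.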

D. Subharmonic-Lock Modes

The existence of subharmonic-lock modes explains the need for an acquisition aid. During each window, which opens periodically at the reference rate, the APD makes a single phase comparison. It is this property that allows an APD to phaselock the VCO’s output to an integer multiple of the reference input. But the ability to examine the phase of two signals at different frequencies introduces more modes than just the desired integer-lock modes. Additional subharmonic-lock modes occur if the net current delivered over multiple cycles of the reference is zero, allowing the loop to stay locked at an undesired frequency [5].

If we designate by $M$ the number of reference cycles over which the net charge delivered to the loop filter is zero, then an expression relating the reference frequency $f_r$ to the VCO frequency $f_v$ when phaselock occurs is $f_v = (N/M) f_r$. Fig. 11 displays the points on the APD characteristic between which the loop ping-pongs for the specific case where $M = 2$ and $N = 7$. Because $M = 2$, the charge pump alternates between pumping up on one cycle and pumping down on the next cycle, balancing the charge to the loop filter over two cycles.
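The spacing of the lock modes can be tabulated directly. The sketch below assumes that a mode with N VCO cycles per M reference cycles locks at (N/M) times the reference frequency, and uses exact rational arithmetic so that integer and subharmonic modes can be told apart; the 143-MHz reference is the value used in the measurements, and the function names are illustrative.

```python
from fractions import Fraction


def lock_frequency(f_ref_hz: int, n: int, m: int) -> Fraction:
    """VCO frequency for a mode with n VCO cycles per m reference cycles."""
    return f_ref_hz * Fraction(n, m)


def is_integer_mode(n: int, m: int) -> bool:
    """True when n/m reduces to an integer, i.e. a desired integer-lock mode."""
    return Fraction(n, m).denominator == 1


if __name__ == "__main__":
    f_ref = 143_000_000
    for n in (21, 22, 23):
        kind = "integer" if is_integer_mode(n, 2) else "subharmonic"
        f_mhz = float(lock_frequency(f_ref, n, 2)) / 1e6
        print(f"M=2, N={n}: {f_mhz:.1f} MHz ({kind})")
```

For M = 2 the candidate modes sit half a reference frequency apart, which illustrates why the subharmonic modes are more closely spaced than the integer modes.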

Fig. 12. APD circuit diagram.

These subharmonic-lock modes are problematic because they are spaced, in frequency, closer than the neighboring integer-lock modes. However, the APD favors integer over subharmonic modes for two reasons. First, the loop’s bandwidth imposes a limit on $M$. If the number of reference cycles over which the charge pump current averages to zero grows too large, the loop will act on partial information because the loop responds to signals averaged over a loop period. The loop period is the reciprocal of the closed-loop bandwidth. Second, the subharmonic modes have a lower detector gain, because the VCO edge arrives at a different time in each of the $M$ cycles. If the APD characteristic is nonlinear, then the overall linearized detector gain is the average of the $M$ individual detector gains.

Using an FAA to ensure frequency lock eliminates the concern of locking in a subharmonic mode. Once lock has been achieved, and control transferred to the APD, the APD is capable of maintaining lock at the desired frequency.

E. APD Circuit Implementation

Fig. 12 shows a circuit implementation of an APD. The reference clock (which has about a 50% duty cycle) is shaped by the structure preceding the delay to have fast falling edges, since these are the edges that enable the precharged gates. When the reference input is low, the pull-down device is off and the pull-up device is on, causing the output to be high. After the reference rises, the pull-up device shuts off before the pull-down device turns on due to the two inverter delays. Therefore, the pull-down device does not fight the pull-up device to pull the output low, and creates a fast falling edge. The window opens on this fast falling edge. The delay between the opening of the window and the reference edge is determined by two inverters with a capacitor in the middle. The APD uses two precharged gates to evaluate the reference and VCO phases. An advantage of precharged gates is that they only respond once while active.
In this case, the precharged gates are precharged low, and rise on detection of low levels. Also, the precharge action does not affect the loop to first order, because the state has the same action as the state .

The behavior of this APD circuit differs somewhat from the ideal APD discussed in Section II-A. In particular, the circuit implementation responds to falling edges instead of rising edges, and more precisely, the precharged gates act as level detectors of a low voltage level instead of as edge detectors. A simulation of the APD characteristic over a $2\pi/N$ interval is shown in Fig. 13. The phase error is plotted against the average charge pump current over one reference cycle, and the simulation includes the nonidealities of the charge pump as well. The flat section near 0.2 rad is where the signal driving the charge pump is compressing due to the level-detection nature of the precharged gates. Another imperfection in this circuit’s APD characteristic is the section with finite negative slope instead of a discontinuity. From the characteristic, the phase detector’s gain constant in a state of zero static phase error is evaluated to be 7.4 µA/rad.

F. PLL Circuit Implementation

In Section II-B, a general model for a locked APD PLL was developed, expressing the closed-loop phase transfer function in terms of the loop filter’s $s$-domain impedance and an idealized VCO. We now provide specific expressions for the loop as actually implemented. The loop filter used is the conventional network shown in Fig. 14, whose $s$-domain impedance is given by (11). A single-pole amplifier was used to interface with the VCO’s varactor; thus the ideal VCO transfer function must be modified to (12), which includes the 3-dB bandwidth of the VCO’s preamplifier. Using (11) and (12) in (6) enables us to write the complete phase transfer function of the implemented APD PLL as shown in (13) at the bottom of the page. In the next section, we compare measured data to (13).

III. EXPERIMENTAL RESULTS

A test chip (see Fig. 15) containing a copy of the APD PLL used in the complete GPS receiver is used to evaluate the APD PLL.
Two separate tests are performed; one to verify the derived closed-loop transfer function of the APD PLL and the other to observe the synthesized LO spectrum for the GPS receiver. In the second test, the synthesized LO is also checked

(13)

Fig. 13. Simulated APD circuit characteristic.

Fig. 14. Loop filter used in APD PLL.

Fig. 15. Die photograph of PLL on GPS receiver test chip.

Fig. 16. PLL test setup number 1.

with a microwave frequency counter to verify its long-term stability.

Fig. 16 shows the experimental setup for the first test. Phase noise is measured for offsets from 1 kHz to 10 MHz with the HP8563E spectrum analyzer, which has special phase-noise-measurement software. Ten MHz is used as the upper limit since the loop is designed to have a bandwidth less than 10 MHz. Beyond the loop’s bandwidth, the PLL’s phase noise is determined by the VCO’s phase noise, making measurement of the PLL’s transfer function difficult. One of the largest factors affecting measurement accuracy is the noise floor of the instrument. To minimize this error source, measurements of the floor with a clean source are performed first. These results are later used to calibrate the data. Reference phase noise and PLL phase noise are also both measured. After some data processing, the PLL’s closed-loop phase transfer function is determined.

Fig. 17 shows the measured |H(f)| and the predicted |H(f)|, from (13), for the case where the reference frequency is 143 MHz and the VCO frequency is 1.573 GHz (N = 11). The seven loop parameters in (13) are set as follows: N is known; the VCO gain constant is taken from measured VCO data; the loop filter component values are taken to be their designed values; the preamplifier bandwidth is calculated from the technology data; and the phase-detector gain constant is fit. The fit value of the gain constant, 6.6 µA/rad, is a little less than the simulated value noted in Section II-E, 7.4 µA/rad. Since the gain constant of an ideal APD is proportional to the pump current, one could argue that the discrepancy is due to an actual pump current that is lower than the pump current used in simulations. But when the pump current is measured, it is found to be correct. Still, the discrepancy in the gain constant is readily explained. The simulation in Fig. 13 establishes an upper bound on the gain constant because it is evaluated in a state of zero static phase error. The simulated APD circuit characteristic illustrates that the detector gain (i.e., the slope) decreases the farther one departs from zero radians. The charge pump is known to have some offset; thus, the loop has some static phase error in lock to overcome the offset, resulting in a slightly lower gain constant.

Fig. 17. Measured and predicted |H(f)|.

Fig. 18. PLL test setup number 2.

Fig. 19. LO spectrum.

TABLE I
MEASURED APD PLL PERFORMANCE

Fig. 18 shows the experimental setup for the second test. The LO spectrum is measured with the HP8563E spectrum analyzer, and the frequency is checked with an HP5350B microwave frequency counter. Fig. 19 displays the synthesized output spectrum, in which the PLL’s ability to track the low close-in phase noise of the reference can be seen. The visible skirts are due to the VCO’s phase noise outside the 6-MHz bandwidth of the PLL. Spurious tones at −47 dBc are primarily due to control-line ripple resulting from charge pump leakage. In GPS applications, the measured spurious level is acceptable because of the absence of blockers at the corresponding offset frequencies. In more demanding applications, one may reduce ripple through improved charge pump design and the use of analog phase interpolation [3]. Table I provides a summary of the APD PLL’s performance.
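The numbers in this comparison can be cross-checked with a few lines of arithmetic. The sketch below assumes the ideal-APD relation in which the detector gain equals the pump current divided by 2π, and treats the quoted gains as microamperes per radian; under that assumption it reports the pump current implied by the simulated gain, the relative shortfall of the fitted gain, and the multiplication factor between reference and output. The function names are illustrative.

```python
import math


def implied_pump_current(k_d_amps_per_rad: float) -> float:
    """Pump current implied by a detector gain, assuming K_d = I_P / (2*pi)."""
    return 2.0 * math.pi * k_d_amps_per_rad


def relative_shortfall(fitted: float, simulated: float) -> float:
    """Fraction by which the fitted gain falls below the simulated gain."""
    return 1.0 - fitted / simulated


if __name__ == "__main__":
    k_sim, k_fit = 7.4e-6, 6.6e-6
    print(f"implied pump current: {implied_pump_current(k_sim) * 1e6:.1f} uA")
    print(f"fitted gain is {relative_shortfall(k_fit, k_sim) * 100:.1f}% below simulated")
    print(f"multiplication factor: {1573 / 143:.0f}")
```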

The PLL has a wide bandwidth of 6 MHz, and the APD circuit consumes only one-quarter of the total synthesizer power. With the elimination of the divider, the main power consumer in the synthesizer is now the VCO.

IV. CONCLUSION

A new method for performing phase detection that eliminates the divide-by-N function within a PLL has been presented. A frequency acquisition aid circuit, which can be powered down once lock is established, is required. By using an aperture phase detector, a 1.573-GHz local oscillator can be synthesized on roughly half the power of a loop containing a conventional divider. Elimination of the divider also reduces the frequency of transitions that might cause substrate and supply bounce. The power savings and noise reduction make the APD PLL an attractive design for low-power, integrated frequency synthesizers.

ACKNOWLEDGMENT

The authors gratefully acknowledge Rockwell International for fabricating the receiver and Dr. C. Hull and Dr. P. Singh for their valuable assistance. In addition, the authors acknowledge Tektronix, Inc., for supplying simulation tools and E. McReynolds for his invaluable support of, and assistance with, CMOS modeling issues. Last, the authors thank IBM for generous student support through IBM fellowships.

REFERENCES

[1] M-tron Engineering Notes, Dec. 1997.
[2] D. K. Shaeffer, A. R. Shahani, S. S. Mohan, H. Samavati, H. R. Rategh, M. Hershenson, M. Xu, C. P. Yue, D. J. Eddleman, and T. H. Lee, “A 115-mW, 0.5-µm CMOS GPS receiver with wide dynamic-range active filters,” IEEE J. Solid-State Circuits, vol. 33, pp. 2219–2231, Dec. 1998.
[3] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge: Cambridge Univ. Press, 1998.
[4] J. F. Parker and D. Ray, “A 1.6 GHz CMOS PLL with on-chip loop filter,” IEEE J. Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[5] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.

Min Xu (S’97), for a photograph and biography, see this issue, p. 2231.

C. Patrick Yue (S’93), for a photograph and biography, see this issue, p. 2231.

Daniel J. Eddleman (S’98), for a photograph and biography, see this issue, p. 2231. Arvin R. Shahani, for a photograph and biography, see this issue, p. 2041.

Derek K. Shaeffer (S’98), for a photograph and biography, see this issue, p. 2230.

S. S. Mohan (S’98), for a photograph and biography, see this issue, p. 2231.

Hirad Samavati (S’98), for a photograph and biography, see this issue, p. 2041.

Hamid R. Rategh (S’98), for a photograph and biography, see this issue, p. 2231.

Maria del Mar Hershenson (S’98), for a photograph and biography, see this issue, p. 2231.

Mark A. Horowitz (S’77–M’78–SM’95) received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978 and the Ph.D. degree from Stanford University, Stanford, CA, in 1984. He is the Yahoo Founders Professor of Electrical Engineering and Computer Science at Stanford. His research area is digital system design. He has led a number of processor designs including MIPS-X, one of the first processors to include an on-chip instruction cache; TORCH, a statically scheduled, superscalar processor; and FLASH, a flexible DSM machine. He has also worked in a number of other chip design areas, including high-speed memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took a leave from Stanford to help start Rambus, Inc., a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links. Dr. Horowitz received a 1985 Presidential Young Investigator Award and an IBM Faculty Development Award, as well as the 1993 Best Paper Award from the International Solid-State Circuits Conference.

Thomas H. Lee (S’87–M’87), for a photograph and biography, see this issue, p. 2041.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001


A 1.8-GHz Self-Calibrated Phase-Locked Loop with Precise I/Q Matching Chan-Hong Park, Student Member, IEEE, Ook Kim, Member, IEEE, and Beomsup Kim, Senior Member, IEEE

Abstract—This paper describes a 1.8-GHz self-calibrated phase-locked loop (PLL) implemented in 0.35-µm CMOS technology. The PLL operates as an edge-combining type fractional-N frequency synthesizer using multiphase clock signals from a ring-type voltage-controlled oscillator (VCO). A self-calibration circuit in the PLL continuously adjusts delay mismatches among delay cells in the ring oscillator, eliminating the fractional spur commonly found in an edge-combining fractional divider due to the delay mismatches. With the calibration loop, the fractional spurs caused by the delay mismatches are reduced to −55 dBc, and the corresponding maximum phase offset between the multiphase signals is less than 0.2°. The frequency synthesizer PLL operates from 1.7 to 1.9 GHz and the closed-loop phase noise is −105 dBc/Hz at 100-kHz offset from the carrier. The overall circuit consumes 20 mA from a 3.0-V power supply.

Index Terms—Delay mismatch, fractional-N frequency synthesizer, I/Q signal generation, PLL, ring oscillator, self-calibration.

I. INTRODUCTION

IN A MODERN wireless digital data transmission system, both in-phase (I) and quadrature-phase (Q) channels are used for channel efficiency. Therefore, precise I and Q clock signals for the modulation/demodulation of both I and Q channels are needed for high-performance digital transceivers. Since any gain or phase imbalance between I and Q signals reduces the dynamic range and degrades the bit-error rate (BER) of the receivers, the accuracy of the 90° phase difference between I and Q clock signals generated from a local oscillator (LO) must be maintained as closely as possible. For homodyne or image-rejection receivers in particular, the effects of I/Q mismatch become more critical [1].

A lossy phase shifter utilizing resistors and capacitors [2], [3] or a quadrature generator using a frequency divider [4], [5] is widely used to derive quadrature-phase clock signals from a single-phase oscillator. However, the RC phase shifter circuit often produces phase and amplitude errors due to component mismatches, and the quadrature generator may suffer from phase imbalances due to an inexact input clock duty cycle. To correct these I/Q phase errors, some analog or digital calibration techniques have been adopted [6], [7]; however, increased phase noise or complexity caused by the added calibration system limits the performance of the system.

Manuscript received June 15, 2000; revised October 15, 2000. C.-H. Park and B. Kim are with the Department of Electrical Engineering and Computer Sciences, Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea (e-mail: [email protected]). O. Kim was with SK Telecom, Sungnam Kunggi-do 463-020, Korea. He is now with Silicon Image, Inc., Sunnyvale, CA 94085 USA. Publisher Item Identifier S 0018-9200(01)03031-1.

Fig. 1. I/Q signal generation from a ring oscillator.

On the other hand, a ring oscillator may be used to produce the quadrature clock signals without such a phase shifter [8], [9]. The multiphase clock signals from a multistage ring oscillator are easily converted into the I/Q clock signals, as shown in Fig. 1. However, in practice, the mismatches between the delay cells cause significant I/Q phase errors, and an additional phase shifter is therefore required.

A self-calibration technique to eliminate the delay mismatches between the delay cells in a ring oscillator is proposed in this paper. A delay calibration loop in the phase-locked loop (PLL) measures the delay mismatch of one delay cell at a time and eliminates it through an extra control line attached to the delay cells. The calibration loop automatically operates in the background and barely interferes with the main PLL loop behavior.

A prototype 1.8-GHz edge-combining fractional-N frequency synthesizer equipped with the calibration circuit is implemented. Fig. 2 shows the block diagram. Thanks to the calibration technique, the fractional spur caused by the mismatches is attenuated by 25 dB and the I/Q phase offset is maintained within 0.2°.

The structure of the proposed fractional-N frequency synthesizer is presented in Section II. Section III describes the delay mismatch problem of the edge-combining-type synthesizer PLL. Section IV shows the underlying algorithm for the proposed mismatch calibration scheme. Section V presents the circuit implementation of the self-calibrated frequency synthesizer PLL. Finally, experimental results are shown in Section VI, and conclusions are given in Section VII.

II. EDGE-COMBINING FRACTIONAL-N FREQUENCY SYNTHESIZER

The frequency resolution of a conventional integer-N frequency synthesizer is the same as the reference frequency of

0018–9200/01$10.00 © 2001 IEEE

Fig. 2. Block diagram of the proposed fractional-N frequency synthesizer PLL with a self-calibration loop.

the synthesizer PLL. Therefore, narrow channel spacing is accompanied by a small loop bandwidth, which leads to slow dynamics [10]. In the case of a fractional-N frequency synthesizer, the output frequency is a fractional multiple of the reference frequency, so that narrow channel spacing is achieved along with a higher phase detector frequency. Consequently, the loop bandwidth can be widened, and faster settling time and lower close-in phase noise of the frequency synthesizer are achieved [10].

The noninteger dividing values in a fractional-N synthesizer can be realized by the periodic dithering of the dividing ratio between integer values [11]. However, the dithering leads to a periodic phase error and introduces spurious tones in the output spectrum. To resolve the spurious noise problem, several methods, such as phase interpolation using a digital-to-analog (D/A) converter [12] or altering the dithering pattern using a Δ–Σ modulator [10], [11], have been proposed and used, but they still have limitations such as increased power dissipation and spurious noise.

On the other hand, if multiphase clock signals are available, the noninteger divider is directly implemented without dithering or interpolation. Fig. 3 shows the operation of the fractional-N divider using the proposed edge-combining technique. The output of an integer frequency divider is sequentially latched by the eight-phase voltage-controlled oscillator (VCO) output signals, as shown in Fig. 3(a). As a result, a set of phase-shifted waveforms is obtained, and the amount of the phase shift is 1/8 of the VCO period. By manipulating the waveform set, a waveform whose period is a fractional multiple of the VCO period is generated. For example, if the desired fractional value is 1/8, the pulse-switching block multiplexes the delayed waveforms with the following periodic sequence:

out1 → out2 → ⋯ → out8 → out1 → ⋯

and the output period of the fractional divider becomes $(N + 1/8)$ times the VCO period, as shown in Fig. 3(b). Here, the division ratio may periodically change in order to create the noninteger division ratio. For example, if the fractional ratio of 1/8 is required as shown in Fig. 3(b), the division ratio should periodically switch

Fig. 3. (a) Block diagram of the fractional divider. (b) Operation of the fractional divider when k = 1.

Fig. 4. Edges of the multiphase signals. (a) With delay mismatches. (b) Without delay mismatches.

from $N$ to $N + 1$ when the output pulse is multiplexed from out8 to out1. When the PLL is locked, the output frequency of the PLL is $(N + k/8)$ times the reference frequency, and the synthesizer operates as a modulo-8 fractional-N frequency synthesizer. The value of $k$ determines the switching sequence of the pulse-switching block. The switching sequence and division ratio are controlled by separate control logic.

III. DELAY MISMATCH PROBLEM AND FRACTIONAL SPURS

Although multiphase clocks from a ring oscillator can be used to implement a fractional-N frequency synthesizer, an important problem still exists: how to deal with the delay mismatches between the delay cells in the VCO. Ideally, the phase differences among the multiphase signals from a ring oscillator are precisely equal, as shown in Fig. 4(a). However, in practice, the edges of the multiphase clocks are not uniformly spaced, as shown in Fig. 4(b). The delay mismatches arise from several causes such as threshold-voltage mismatch, device size mismatch, and so on.
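The edge-combining division described in Section II can be checked with a small idealized model. This is only a sketch under stated assumptions, not the authors' circuit: the function name `edge_combining_periods`, the 0-based tap index, and the rule "count one extra VCO cycle whenever the tap index wraps" are illustrative choices, and delay mismatch is ignored.

```python
from fractions import Fraction


def edge_combining_periods(n, k, phases=8, cycles=16):
    """Output periods (in units of the VCO period) of an ideal edge-combining divider.

    Assumptions (hypothetical model): tap i delays the integer-divider output
    by i/phases of a VCO period, the selected tap advances by k per output
    cycle, and the integer divider counts n + 1 instead of n whenever the
    tap index wraps around (for example, out8 -> out1).
    """
    tap = 0
    periods = []
    for _ in range(cycles):
        next_tap = (tap + k) % phases
        wrapped = next_tap < tap              # selection moved back to an earlier tap
        count = n + 1 if wrapped else n       # integer division ratio for this cycle
        # Net period = counted VCO cycles plus the change in tap delay.
        periods.append(count + Fraction(next_tap - tap, phases))
        tap = next_tap
    return periods


if __name__ == "__main__":
    # Every output period equals n + k/8 VCO periods in this ideal model.
    print(set(edge_combining_periods(72, 1)))
```

With a 25-MHz reference, as used in the measurements, `n = 72` and `k = 1` would correspond to an output of 25 × (72 + 1/8) MHz in this model, that is, one 3.125-MHz step above 1.8 GHz.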

PARK et al.: SELF-CALIBRATED PHASE-LOCKED LOOP WITH PRECISE I/Q MATCHING

Fig. 5. s-domain model of a PLL.

Since each edge of the fractional-N divider output is periodically synchronized with one of the multiphase signals from the VCO, the timing information on the delay mismatches is contained in the divider output. Therefore, when the PLL is locked, the delay mismatches introduce periodic phase errors at the input of the phase-frequency detector (PFD). Due to the periodic phase errors, fractional spurs appear in the output spectrum of the synthesizer. Also, if I and Q signals are tapped from the VCO, an I/Q phase offset will appear. Therefore, the delay mismatches must be eliminated to realize a low-phase-noise frequency synthesizer.

The relationship between the phase errors and the fractional spurs is derived from the noise transfer function of the PLL. Fig. 5 shows the s-domain model of the PLL. If only the effect of the divider noise is considered, the output phase noise becomes

(1)

where the quantities are the PFD gain, the loop filter transfer function, the VCO gain, the dividing ratio, the output phase noise, and the phase noise generated from the fractional divider. For the edge-combining frequency synthesizer, the divider phase noise is mainly caused by the delay mismatches between the delay cells, and is periodic because the fractional-N dividing is performed by periodic combining of the phase-shifted waveforms. Assuming that this phase noise is sinusoidal, the output voltage of the PLL is presented as

(2)

and the output spectrum of the PLL contains spurious tones on both sides of the carrier. The relative power of the spurious tones, in dBc, is given by

(3)

Fig. 6 shows the simulated output spectra of the frequency synthesizer. When the maximum phase offset is set to 2.5°, −30-dBc spurious tones are shown near the carrier. During the simulation, the design parameters of the PLL are selected as described in Table I.

Fig. 6. Simulated output spectrum of the PLL with the maximum phase offset of 2.5°.

TABLE I PARAMETERS FOR SPURIOUS NOISE SIMULATION OF THE PLL

IV. MISMATCH CALIBRATION ALGORITHM

Fig. 7 shows the input waveforms of the PFD when the PLL is locked and the fractional dividing ratio is set to 1/8. $\theta_i$ is the relative phase error corresponding to the $i$th output signal of


Fig. 7. Periodic phase error at the PFD input when PLL is locked without calibration.

the VCO. By adjusting the rising phases of the corresponding outputs, the phase errors due to the delay mismatches can be eliminated. Since a PLL makes the average phase error zero in the locking mode, the sum of the individual phase errors becomes zero when the PLL is locked. In other words, when the number of the multiphase clocks is eight

$$\theta_1 + \theta_2 + \cdots + \theta_8 = 0 \qquad (4)$$

If the calibration circuit shifts the rising phase of the first output, the phase errors are temporarily changed to

(5)

When the PLL is locked again

(6)

based on (4), and by assuming that the phase disturbance is equally distributed to all delay cells to satisfy (6), the phase errors become

(7)

If this operation is performed on each output of the VCO one by one, the phase errors are changed step by step (initially, 1st step, 2nd step, ..., 8th step), and this is the completion of one iteration of the calibration procedure. The resulting phase errors after the first iteration are given by

(8)

where the symbols denote the phase error due to the $i$th output signal after $n$ cycles of calibration and the amount of the calibration for the $i$th VCO output in the $n$th iteration. If the above iteration is performed continuously until the required condition is satisfied for all delay cells, the final value of the phase error due to the 1st VCO output becomes

(9)

Similarly

(10)

Consequently, all the phase errors become zero after the completion of the calibration. Note that the calibration algorithm performs correctly even if the amount of the individual phase correction is different and even if the order of the calibration is changed. Fig. 8 shows the trend of the phase error during the calibration, simulated in MATLAB. Here, mismatches of at most 10 mV are assumed.

Fig. 8. Simulated behavior of phase error during the compensation.

V. CIRCUIT IMPLEMENTATION

A. Overall Structure

As shown in Fig. 2, a loop for the calibration is combined with the main fractional-N synthesizer. The calibration loop periodically measures the phase error due to delay mismatch at the PFD, and compensates for the mismatches by updating the offset control voltage of the delay cells one after another. This update operation must be performed only when the PLL is locked, because the phase error due to the mismatches can be accurately measured only in the locking mode. If the calibration interval is shorter than the lock-in time of the main loop, the locking behavior of the main loop becomes disturbed and even unstable. Therefore, it is important to make not only the calibration interval long enough but also the amount of phase change generated by the individual calibrating operation small enough to make sure that the main loop quickly responds. In this work, the loop gain of the calibration loop is chosen to be 1/10 of the main-loop gain. The updated offset control signals are maintained until the next update by a capacitor array connected to each delay cell.

The delay cell used in this work is of the same type as the low-noise delay cell used in [13]. However, to control the rising phase of each output, four transistors are added, as shown in Fig. 9. For example, if the offset control signal of a cell is low, the rising edge of its output is pulled earlier.

Fig. 9. Schematic of the delay cell having offset control capability.

B. Calibration Loop

Fig. 10 shows the circuit for the mismatch calibration. The calibration circuit consists of a PFD shared with the main loop, an additional charge pump, and a capacitor array. When the calibration control signal is high, the PFD output signal is driven to the charge pump, and one of the offset control signals is updated. This signal is periodically asserted, and the period is much longer than the locking time of the PLL. The output of the charge pump is sequentially connected to one of the offset-control nodes in the VCO. Since the sequence of the calibration is identical to the pulse-combining sequence in the fractional-N divider, the measured phase error affects only the corresponding VCO output. When the fractional dividing ratio is zero, the frequency synthesizer operates as an integer-N type, and only one output of the VCO is used for phase comparison. The digital logic controls the sequence of the signal updating through the switches in the capacitor array.

Fig. 10. Detailed structure of the self-calibration loop.

VI. MEASUREMENTS

A self-calibrated fractional-N frequency synthesizer PLL has been fabricated in 0.35-µm CMOS technology. The microphotograph of the fabricated chip is shown in Fig. 11, and its active area is about mm². Both frequency synthesizers with and without a calibration loop have been integrated in the same chip to demonstrate the proper operation of the mismatch calibration scheme. In both cases, an external 25-MHz crystal oscillator is used as a reference clock. The bandwidth of the PLL is set to 1 MHz.

Fig. 12 shows the measured output spectrum of the fractional-N RF synthesizer. Fig. 12(a) is the output spectrum of the frequency synthesizer without a calibration loop. In this figure, the fractional dividing ratio is set to 1/8. Without the calibration circuit, −30-dBc spurious noise appears at 3.125-MHz (= 25/8 MHz) offset from the carrier frequency. In this case, the maximum phase offset is estimated as about 2.5° by the equations in Section III. On the other hand, in the output spectrum of the self-calibrated frequency synthesizer, the power of the spurious tones is attenuated to −55 dBc, as shown in Fig. 12(b), and the calculated maximum phase offset is less than 0.2°. Initial settling of the calibration loop takes about 5.0 ms. Fig. 13 shows the measured phase noise of the frequency synthesizer. The closed-loop phase noise at 100-kHz offset from the 1.8-GHz carrier is −105 dBc/Hz. Table II summarizes the measured characteristics of the PLL.
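The calibration procedure of Section IV can be illustrated numerically. The sketch below is a behavioral model under assumptions that are not taken from the paper: the function name `calibrate`, the proportional correction `gain`, and the example error values are all hypothetical. It keeps only the two properties stated in the text: the locked PLL forces the phase errors to sum to zero, and each correction applied to one output is redistributed equally over all outputs when the loop relocks.

```python
def calibrate(errors, gain=0.5, iterations=60):
    """Iteratively null per-output phase errors under a zero-sum lock constraint.

    Hypothetical model: each step corrects one output by gain * (its measured
    error); relocking then spreads that disturbance equally over all outputs
    so that the sum of the errors returns to zero.
    """
    m = len(errors)
    mean = sum(errors) / m
    theta = [e - mean for e in errors]        # locked PLL: average phase error is zero
    for _ in range(iterations):               # one iteration = one pass over all outputs
        for i in range(m):
            delta = gain * theta[i]           # assumed correction for output i
            theta[i] -= delta                 # shift the rising phase of output i
            theta = [t + delta / m for t in theta]  # relock: zero-sum restored
    return theta


if __name__ == "__main__":
    start = [1.2, -0.7, 2.5, -1.9, 0.4, -0.3, 0.9, -2.1]   # degrees, illustrative only
    print(max(abs(t) for t in calibrate(start)))
```

In this model the errors shrink toward zero for any correction gain between 0 and 1, which is in line with the remark that the algorithm does not depend on the exact amount of each individual correction.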


Fig. 11. Microphotograph of the self-calibrated PLL.

Fig. 12. Output spectrum of the synthesizer PLL. (a) Without calibration. (b) With calibration.
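As a rough cross-check of the spur levels and phase offsets quoted in the measurements, the textbook narrowband phase-modulation relation can be used: a sinusoidal phase deviation of peak value β (in radians, β much less than 1) produces sidebands at about 20 log10(β/2) dBc. This is a generic approximation and not the paper's own expression; it ignores the loop transfer function, so it should only be expected to agree with the reported numbers to within a few decibels. The function names are illustrative.

```python
import math


def spur_dbc(peak_phase_deg):
    """Sideband level (dBc) for a small sinusoidal phase deviation, narrowband-PM approximation."""
    beta = math.radians(peak_phase_deg)       # peak phase deviation in radians
    return 20.0 * math.log10(beta / 2.0)


def phase_deg_from_spur(level_dbc):
    """Inverse relation: peak phase deviation (degrees) that gives the stated spur level."""
    return math.degrees(2.0 * 10.0 ** (level_dbc / 20.0))


if __name__ == "__main__":
    for offset_deg in (2.5, 0.2):
        print(offset_deg, round(spur_dbc(offset_deg), 1))
```

Under this approximation a 0.2° deviation lands close to the −55-dBc level reported with calibration, while 2.5° gives a level a few decibels below the −30 dBc reported without it.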

VII. CONCLUSION

A self-calibrated 1.8-GHz PLL for fractional-N frequency synthesis is fabricated in a 0.35-µm CMOS process. A ring-type oscillator is used to generate the multiphase signals, and a self-calibration loop reduces the output fractional spurs caused by delay mismatches between the delay cells. The phase offset of the I/Q signals from the ring oscillator is also reduced. With this calibration scheme, the fractional spur of the PLL is attenuated by 25 dB and the maximum phase offset is thereby reduced to less than 0.2°.

Fig. 13. Measured phase noise of the PLL.

TABLE II PERFORMANCE SUMMARY OF THE SELF-CALIBRATED PLL

REFERENCES

[1] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall, 1998.
[2] C. D. Hull, J. L. Tham, and R. R. Chu, “A direct-conversion receiver for 900-MHz (ISM band) spread-spectrum digital cordless telephone,” IEEE J. Solid-State Circuits, vol. 31, pp. 1955–1963, Dec. 1996.
[3] M. Steyaert, M. Borremans, J. Janssens, B. De Muer, N. Itoh, J. Craninckx, J. Crols, E. Morifuji, H. S. Momose, W. Sansen, T. Yamaji, H. Tanimoto, and H. Kokatsu, “A single-chip CMOS transceiver for DCS-1800 wireless communications,” in ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 1998, pp. 48–49.
[4] A. Montalvo, A. Holden, W. Suter, C. Angell, S. White, N. Klemmer, and D. Homol, “A 22-mW NADC receiver IF chip with integrated second IF channel filtering,” in ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 48–49.
[5] J. L. Tham, M. A. Margarit, B. Pregardier, C. D. Hull, R. Magoon, and F. Carr, “A 2.7-V 900-MHz/1.9-GHz dual-band transceiver IC for digital wireless communication,” IEEE J. Solid-State Circuits, vol. 34, pp. 286–291, Mar. 1999.
[6] B. Razavi, “Design considerations for direct-conversion receivers,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 428–435, June 1997.
[7] L. Yu and W. M. Snelgrove, “A novel adaptive mismatch cancellation system for quadrature IF radio receivers,” IEEE Trans. Circuits Syst. II, vol. 46, pp. 789–801, June 1999.
[8] Y. Sugimoto and T. Ueno, “The design of a 1-V 1-GHz CMOS VCO circuit with in-phase and quadrature-phase outputs,” in Proc. Int. Symp. Circuits and Systems, Hong Kong, June 1997, pp. 269–272.
[9] A. A. Abidi, “Direct-conversion radio transceivers for digital communications,” IEEE J. Solid-State Circuits, vol. 30, pp. 1399–1410, Dec. 1995.
[10] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[11] M. H. Perrott, “Techniques for high data rate modulation and low power operations of fractional-N frequency synthesizers,” Ph.D. dissertation, Mass. Inst. of Technol., Cambridge, MA, 1997.
[12] U. L. Rohde, Digital PLL Frequency Synthesizers. Englewood Cliffs, NJ: Prentice Hall, 1983.
[13] C.-H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-µm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.

Chan-Hong Park (S’92) received B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1994 and 1996, respectively. He is currently working toward the Ph.D. degree in electrical engineering at KAIST. From 1994, he has been with the Department of Electrical Engineering, KAIST, as a Graduate Researcher, where he has been involved in designing 100Base-T transceiver ICs, low-noise phase-locked loops, and RF front-ends for wireless communications. His research interests include CMOS RF circuits for wireless communication, high-frequency analog IC design, and mixed-mode signal-processing IC design.


Ook Kim (M’86) received the M.S. and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1988 and 1994, respectively. He was with the Electronics and Telecommunications Research Institute, Taejon, Korea, from 1994 to 1998, and with SK Telecom, Seoul, Korea, from 1998 to 1999. Since 1999, he has been with Silicon Image Inc., Sunnyvale, CA. He was a Visiting Researcher at the Department of Electrical and Electronic Engineering, Adelaide University, Adelaide, Australia, during 1992, and a Visiting Scholar at the Department of Electrical Engineering, Stanford University, Stanford, CA, during 1999. His research interests are in CMOS mixed mode circuit design, high-speed data conversion, wireless circuit technology, and high-speed data communication.

Beomsup Kim (S’87–M’90–SM’95) received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1983 and 1985, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1990. He worked as a Graduate Researcher and Graduate Instructor at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, from 1986 to 1990. From 1990 to 1991, he was with Chips and Technologies, Inc., San Jose, CA, where he was involved in designing high-speed signal processing ICs for disk drive read/write channel. From 1991 to 1993, he was with Philips Research, Palo Alto, CA, conducting research on digital signal processing for video, wireless communication, and disk drive applications. During 1994, he was a Consultant, developing the partial response maximum likelihood detection scheme of the disk drive read/write channel. In 1994, he became an Assistant Professor with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, and is currently an Associate Professor. During 1999, he took sabbatical leave at Stanford University, Stanford, CA, and at the same time, consulted for Marvell Semiconductor Inc., San Jose, on the Gigabit Ethernet and wireless LAN DSP architecture. His research interests include mixed-mode signal processing IC design for telecommunication, disk drive, and LAN, high-speed analog IC design, and VLSI system design. Dr. Kim is a corecipient of the Best Paper Award for 1990–1991 from the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received the Philips Employee Reward in 1992. Between June 1993 and June 1995, he served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A 2.6-GHz/5.2-GHz Frequency Synthesizer in 0.4-µm CMOS Technology

Christopher Lam and Behzad Razavi, Member, IEEE

Abstract—This paper describes the design of a CMOS frequency synthesizer targeting wireless local-area network applications in the 5-GHz range. Based on an integer-N architecture, the synthesizer produces a 5.2-GHz output as well as the quadrature phases of a 2.6-GHz carrier. Fabricated in a 0.4-µm digital CMOS technology, the circuit provides a channel spacing of 23.5 MHz at 5.2 GHz while exhibiting a phase noise of −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz (both at 10-MHz offset). The reference sidebands are at −53 dBc at 2.6 GHz, and the power dissipation from a 2.6-V supply is 47 mW.

Index Terms—Frequency dividers, oscillators, phase-locked loops, RF circuits, synthesizers, wireless transceivers.

Fig. 1. Transceiver architecture.

I. INTRODUCTION

WIRELESS local area networks (WLANs) provide great flexibility in the communication infrastructure of environments such as hospitals, factories, and large office buildings. While WLAN standards in the 2.4-GHz range have recently emerged in the market, the data rates supported by such systems are limited to a few megabits per second. By contrast, a number of standards have been defined in the 5-GHz range that allow data rates greater than 20 Mb/s, offering attractive solutions for real-time imaging, multimedia, and high-speed video applications. One of these standards is high-performance radio LAN (HIPERLAN) [1]. HIPERLAN operates across 5.15–5.30 GHz and provides a channel bandwidth of 23.5 MHz with Gaussian minimum shift keying (GMSK) modulation. The receiver sensitivity must exceed −70 dBm.

This paper presents the design of a frequency synthesizer for 5-GHz WLAN applications. To target realistic specifications, HIPERLAN is chosen as the framework. Employing an integer-N architecture, the circuit generates a 5.2-GHz output for the transmit path and the quadrature phases of a 2.6-GHz carrier for the receive path. Realized in a 0.4-µm CMOS technology, the synthesizer provides a channel spacing of 23.5 MHz while dissipating 47 mW from a 2.6-V supply. The phase noise at 10-MHz offset is equal to −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz.

Section II of this paper describes the synthesizer environment and general issues, and Section III introduces the synthesizer architecture. Section IV presents the design of each building block, and Section V summarizes the experimental results.

Manuscript received July 30, 1999; revised December 1, 1999. The authors are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(00)02987-5.

II. SYNTHESIZER ENVIRONMENT

The design of a 5-GHz synthesizer in a 0.4-µm CMOS technology presents many difficulties at both the architecture and the circuit levels. The high center frequency of the voltage-controlled oscillator (VCO), the poor quality of inductors due to skin effect and substrate loss, the limited tuning range, the nonlinearity of the VCO input/output characteristic, the high speed required of the feedback divider, the mismatches in the charge pump, and the implementation of the loop filter are among the issues encountered in this design.

A 0.4-µm-long NMOS transistor in this technology achieves an $f_T$ of less than 15 GHz with a gate–source overdrive voltage of about 400 mV, a typical value in this design. Also, a 5-nH inductor exhibits a self-resonance frequency of 6.5 GHz and a $Q$ of 5 at this frequency, indicating that skin effect and substrate loss are much more significant at 5.2 GHz than at 2.6 GHz. The technology offers no high-density linear capacitors, creating difficulty in the design of the loop filter.

The foregoing limitations make it necessary that the transceiver and the synthesizer be designed concurrently so as to relax some of the synthesizer requirements. Fig. 1 shows the transceiver architecture and its interface with the synthesizer. The receive path consists of two downconversion stages, each using a local oscillator (LO) frequency of 2.6 GHz, and the transmit path modulates the VCO by the Gaussian-filtered baseband data, producing a GMSK output.

An important feature of this architecture is that the synthesizer is shared between the transmitter and the receiver, reducing the system complexity substantially. This is possible because HIPERLAN incorporates time-division duplexing (TDD). Also, the transceiver requires the generation of the quadrature phases of the 2.6-GHz carrier rather than the 5.2-GHz output, a task readily accomplished by the synthesizer itself.

0018–9200/00$10.00 © 2000 IEEE

LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER

Fig. 2. Synthesizer architecture.

Fig. 3. Position of reference sidebands.
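The frequency plan behind Figs. 2 and 3 can be tabulated with a few lines of arithmetic. The sketch assumes, as the architecture description states, that the reference frequency is half the 23.5-MHz channel spacing, that the feedback divider (ratios 220 to 225) senses the 2.6-GHz output, and that the 5.2-GHz output is exactly twice that frequency. The constant and function names are illustrative.

```python
F_REF_MHZ = 23.5 / 2          # reference = half the 5.2-GHz channel spacing (11.75 MHz)
RATIOS = range(220, 226)      # feedback divide ratios 220 through 225


def frequency_plan(f_ref_mhz=F_REF_MHZ, ratios=RATIOS):
    """Return (ratio, 2.6-GHz-band output, 5.2-GHz-band output) in MHz for each channel."""
    plan = []
    for n in ratios:
        f_low = n * f_ref_mhz             # output sensed by the feedback divider
        plan.append((n, f_low, 2 * f_low))  # second-harmonic output for the transmit path
    return plan


if __name__ == "__main__":
    for ratio, f_low, f_high in frequency_plan():
        print(ratio, f_low, f_high)
```

Under these assumptions the channels step by 11.75 MHz near 2.6 GHz and by 23.5 MHz near 5.2 GHz, and the resulting 5.2-GHz-band outputs fall inside the 5.15–5.30-GHz HIPERLAN band.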

III. SYNTHESIZER ARCHITECTURE The synthesizer is based on an integer- phase-locked loop architecture (Fig. 2). The feedback divider senses the 2.6-GHz output because it is not possible to design a dual-modulus divider in 0.4- m CMOS technology that operates at 5.2 GHz reliably. Controlled by the digital channel-select input, the 220–225 circuit generates frequency steps of MHz in the 2.6-GHz band and 23.5 MHz in the 5.2-GHz band. A critical issue in the architecture of Fig. 2 is the nonlinearity of the VCO characteristic, i.e., the variation of the VCO gain, , with the control voltage . This effect manifests itself in the loop settling behavior as well as the magnitude of the phase noise and reference sidebands at the output. The problem is partially resolved through the use of a correction circuit that adjusts the charge-pump current according to the value of [2]. An interesting property of the architecture of Fig. 2 is the position of the reference spurs with respect to the main carrier. Since the reference frequency is half the channel spacing, such spurs fall at the edge of the channel rather than at the center of the adjacent channel for both 2.6- and 5.2-GHz outputs (Fig. 3). Since the interference energy received by the antenna is small at the edge, the maximum allowable magnitude of the spurs can be quite higher than if the reference frequency were equal to 23.5 MHz. IV. BUILDING BLOCKS A. VCO The VCO core is based on two 2.6-GHz coupled oscillators operating in quadrature, as shown in Fig. 4(a) [3], [4]. The fully differential topology of each oscillator raises the possibility of sensing the common-source nodes A, B, C, or D as the 5.2-GHz output. In fact, since the 2.6-GHz oscillators operate in quadrature, the waveforms at A and B (or C and D) are 180 out of phase, thereby serving as a differential output at 5.2 GHz. With

789

proper choice of device dimensions and bias current, a differential swing of 0.5 V can be achieved at this port. Note that if a frequency doubler were used, the output would be single-ended and difficult to convert to differential form at such a high frequency. The tuning of the oscillator poses several difficulties: the varactor diode must exhibit a small series resistance and remain reverse-biased even with large swings in the oscillator, and the varactor capacitance must be large enough to yield the required tuning range, but at the cost of increasing the power dissipation or the phase noise. This design incorporates a p -n diode inside an n-well and strapped with metal to reduce the n-well series resistance [4]. Such a structure suffers from a large parasitic n-well/substrate capacitance, making it desirable to connect the anode of the diode to the oscillator. This is accomplished as illustrated in Fig. 4(b), where only one of the two oscillators is shown for clarity. Here, the control voltage varies the dc poten. tial at nodes and by varying the on-resistance of leads However, the sharp variation of the on-resistance of to significant change in the gain of the VCO. To make the tran, in series with a resistor sition smoother, another transistor, serves as a clamp, is added as shown in Fig. 4(c). Transistor keeping the tail current source in saturation. Otherwise, the oscillator may turn off during synthesizer loop transients. Since the minimum voltage at node is only a few hundred millivolts above ground, an NMOS differential pair cannot directly sense the 5.2-GHz signal at this node. Instead, a is constant, common-gate stage is used [Fig. 4(d)]. But if turns off for low values of . Modifying the circuit then as shown in Fig. 4(e) ensures that the common-gate stage carries a constant bias current across the full tuning range. The choice of the inductors and capacitance of the varactors entails a compromise between the phase noise and the tuning range. 
In this design, 7-nH inductors are used, each contributing a parasitic capacitance of 120 fF. The cross-coupled transistors are relatively wide to ensure startup, yielding approximately 175 fF of gate-source capacitance. The differential pairs coupling the oscillators also load the tank. As a result, the varactor capacitance for 2.6-GHz operation must not exceed 160 fF. The inductors are realized as stacked spirals [5] made of metal 4 and metal 3 with a width of 6 µm. Since the tuning range is inevitably narrow, it is critical to predict the oscillation frequency accurately. A distributed model is used for each inductor, predicting the measured frequency of oscillation to within a few percent.

B. Frequency Divider

The design of a 2.6-GHz programmable divider with a reasonable power dissipation in 0.4-µm CMOS technology is quite difficult. A number of circuit techniques are introduced in this work to ameliorate the power–speed tradeoff.

The divider is based on a pulse-swallow topology. Shown in Fig. 5(a) is a conventional implementation, consisting of a dual-modulus prescaler, a fixed-ratio program counter, and a programmable swallow counter. The RS latch is typically included in the swallow counter and is drawn explicitly here for clarity. The prescaler begins the operation by dividing by $N + 1$ until the swallow counter is full. The RS latch is then set, changing


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 4. Evolution of the VCO topology.

the prescaler modulus to $N$ and disabling the swallow counter. The division continues until the program counter is full and the RS latch is reset. The overall divide ratio is therefore equal to $NP + S$, where $P$ and $S$ denote the program-counter and swallow-counter values, respectively.

The pulse-swallow divider used in this work is shown in Fig. 5(b). Here, the RS latch is followed by a D flip-flop to allow pipelining of the prescaler modulus control signal. This modification is justified below. The overall divide ratio is now equal to $NP + S + 1$.

A critical decision in the design of the divider is the choice between low-swing current-steering logic and rail-to-rail CMOS logic. Simulations of the circuit with various values of $N$, $P$, and $S$ indicate that the minimum power dissipation occurs if the prescaler incorporates current steering, its output is converted to rail-to-rail swings, and the remainder of the circuit incorporates standard dynamic and static CMOS logic. The use of current steering in the prescaler also obviates the need for large oscillator swings, saving power in the VCO buffer.

The design of the ÷8/9 prescaler for 2.6-GHz operation presents a great challenge. Shown in Fig. 6, the prescaler consists of a synchronous ÷2/3 circuit and two asynchronous

÷2 circuits. In a conventional ÷2/3 realization [Fig. 7(a)], flip-flop FF1 is loaded by an OR gate, whereas FF2 is loaded by FF1, an AND gate, and an output buffer. Since FF2 limits the speed, the fanout of three inherent to this topology translates to substantial power dissipation. Furthermore, if the divider is implemented by current-steering circuits, the AND gate requires stacked logic and hence level-shift source followers. Both of these issues intensify the power–speed tradeoff.

The ÷2/3 circuit used in this work is shown in Fig. 7(b). Here, FF1 is loaded by a NOR gate and FF2 by a NOR gate and a buffer. Simulations indicate that the reduction of the load capacitance of FF2 increases the maximum operating speed by approximately 40%. The NOR/flip-flop combination is realized as depicted in Fig. 8. The resistors are made of n-well, and the bias voltage is generated to fall midway between the high and low levels of the inputs. The output of the prescaler drives a differential to single-ended converter, producing rail-to-rail swings for the remainder of the divider.

The divider of Fig. 5 incorporates pipelining for the prescaler modulus control, thereby relaxing the minimum delay requirement in this path. Fig. 9 illustrates the issue. When the ÷9 operation of the prescaler is finished, the circuit would have at most seven cycles of the input to change the modulus to eight. In this particular prescaler, the timing budget is actually about five input cycles, approximately 1.9 ns. Thus, with no pipelining, the last pulse generated by the prescaler in the ÷9 mode must propagate through the level converter, the first ÷2 stage in the swallow counter, the subsequent logic, the RS latch, and the three-input NOR gate in less than 1.9 ns. Such a delay constraint necessitates the use of current steering in this path, raising the power dissipation and complicating the design. With pipelining, on the other hand, the maximum tolerable delay increases to about eight input cycles, approximately 3.1 ns.

LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER

Fig. 5. Pulse swallow divider. (a) Conventional topology. (b) Addition of pipelining in the prescaler modulus control path.

Fig. 6. Prescaler.

Fig. 7. Divide-by-2/3 circuit: (a) conventional topology and (b) circuit used in this work.

Fig. 8. Implementation of NOR/flip-flop combination.

Fig. 9. Pipelining in the prescaler modulus control path.
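The divide-ratio and timing arguments above can be checked numerically. The following sketch counts input cycles over one period of a pulse-swallow divider and converts the quoted cycle budgets into time at 2.6 GHz. The function names, the program-counter value P = 27, and the swallow range S = 3 to 8 are hypothetical choices made here only because they reproduce the ÷220–225 range with an 8/9 prescaler; the paper does not state them. The extra ÷(N + 1) cycle is one simple way to model the pipelining delay, not necessarily how the actual circuit behaves.

```python
def simulate_pulse_swallow(n, p, s, extra_n_plus_1_cycles=0):
    """Count input cycles in one divider period by stepping the prescaler.

    The prescaler divides by n + 1 for the first s (plus any extra) of its
    p output cycles and by n for the rest.
    """
    total = 0
    for k in range(p):
        swallow = k < s + extra_n_plus_1_cycles
        total += n + 1 if swallow else n
    return total

def pulse_swallow_ratio(n, p, s, pipelined=False):
    """Closed-form ratio: N*P + S, plus one if the modulus control is pipelined."""
    return n * p + s + (1 if pipelined else 0)

def cycles_to_ns(cycles, f_in_hz):
    """Duration of a number of input cycles, in nanoseconds."""
    return cycles / f_in_hz * 1e9

N, P = 8, 27  # P is an assumed value, not taken from the paper
ratios = [pulse_swallow_ratio(N, P, s, pipelined=True) for s in range(3, 9)]
print("divide ratios:", ratios)
print(f"5 input cycles at 2.6 GHz: {cycles_to_ns(5, 2.6e9):.2f} ns")
print(f"8 input cycles at 2.6 GHz: {cycles_to_ns(8, 2.6e9):.2f} ns")
```

The two cycle budgets come out close to the 1.9 ns and 3.1 ns quoted in the text, which suggests that the stated delays are simply five and eight periods of the 2.6-GHz input.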

C. Charge Pump and Loop Filter

Fig. 10 shows the charge pump [6] and the loop filter. Here, the transistors that operate as switches are not the ones connected to the output. Thus, the problem of transistor charge injection and clock feedthrough to the output is somewhat alleviated.

In addition to these errors, up and down currents produced by the charge pump may also create ripple on the control voltage. Since in the locked condition the up and down current sources turn on at every phase comparison instant, any mismatch between their magnitudes, durations, or absolute timing results in a net current that is drawn from the loop filter.

Fig. 10. Charge pump and loop filter.

To appreciate the significance of these effects, let us consider some typical values in this design. If the reference sidebands are to be 50 dB below the carrier, then with the $K_{VCO}$ of this design and a reference frequency of 11.75 MHz, the ripple amplitude must not exceed 75 µV.¹ This indicates that great attention must be paid to the design of the phase/frequency detector, the charge pump, and the loop filter so as to minimize the above errors.

Another source of ripple in the control voltage is the low output impedance of the output transistors in Fig. 10, especially as $V_{cont}$ reaches within a few hundred millivolts of the rails. This effect creates additional mismatch between the up and down currents as a function of $V_{cont}$, potentially leading to larger reference sidebands near the ends of the tuning range. The switching transistors degenerate the output transistors, alleviating this issue (another advantage of this topology over the standard charge-pump configuration).

The addition of a second capacitor in the circuit of Fig. 10 to suppress the ripple potentially degrades the stability of the loop. Simulations suggest that, if this capacitor is kept sufficiently small relative to the main one, the settling time increases negligibly. The loop filter of this design uses capacitors in the picofarad range and a resistor in the kilohm range.

The two capacitors can be realized by either MOSFETs or poly-metal sandwiches, a choice determined by the control voltage range. To achieve the maximum tuning range, $V_{cont}$ must approach the supply and ground rails, demanding a reasonable capacitor linearity across this range. MOS capacitors, however, exhibit substantial change as their gate-source voltage falls below the threshold. Even a parallel combination of an NMOS capacitor (connected to ground) and a PMOS capacitor (connected to $V_{DD}$) suffers from a two-fold variation as $V_{cont}$ goes from zero to $V_{DD}$. For this reason, both capacitors are formed as poly-metal sandwiches (albeit with much less density than MOS capacitors).

Another issue in the design of the loop filter of Fig. 10 relates to the thermal noise produced by the resistor. Low-pass filtered by the loop-filter elements, this noise modulates the VCO, raising the output phase noise. The thermal noise on the control voltage per unit bandwidth is given by

(1)

in terms of the noise density of the resistor. From the narrow-band frequency modulation theory [8], we know that if a sinusoid with a peak amplitude $V_m$ and frequency $\omega_m$ modulates a VCO, the output sidebands fall at $\omega_m$ rad/s below and above the carrier frequency and exhibit a peak amplitude of $K_{VCO}V_m/(2\omega_m)$ relative to the carrier. Approximating the noise per unit bandwidth in (1) by a sinusoid, we obtain the output relative phase noise per unit bandwidth at an offset frequency $\omega_m$ as

(2)

¹The ripple is approximated by a sinusoid here. In a more rigorous method, the ripple can be expressed as a Fourier series [7].

Fig. 11. (a) Addition of correction circuit to charge pump. (b) Simple folding circuit. (c) Folding circuit with one reference voltage.
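The ripple bound quoted for this design can be reproduced from the narrow-band FM relation, in which a sinusoidal ripple of peak amplitude $V_m$ at the reference frequency produces sidebands of relative amplitude $K_{VCO}V_m/(2\omega_m)$. The sketch below is illustrative only: the VCO gain of 1 GHz/V is an assumption made here (the extracted text preserves only the unit), chosen because it yields a bound close to the quoted 75 µV, and the function names are hypothetical.

```python
import math

def max_ripple_for_sideband(sideband_dbc, kvco_hz_per_v, f_mod_hz):
    """Peak ripple (V) that keeps FM sidebands at the given level in dBc.

    Narrow-band FM: sideband/carrier = K_VCO * V_m / (2 * w_m).
    With K_VCO in Hz/V and f_mod in Hz the factors of 2*pi cancel.
    """
    ratio = 10.0 ** (sideband_dbc / 20.0)
    return ratio * 2.0 * f_mod_hz / kvco_hz_per_v

def sideband_level_dbc(v_ripple, kvco_hz_per_v, f_mod_hz):
    """Inverse relation: sideband level in dBc for a given peak ripple."""
    ratio = kvco_hz_per_v * v_ripple / (2.0 * f_mod_hz)
    return 20.0 * math.log10(ratio)

K_VCO = 1e9      # Hz/V, assumed value (not recoverable from the text)
F_REF = 11.75e6  # Hz, half of the 23.5-MHz channel spacing

v_max = max_ripple_for_sideband(-50.0, K_VCO, F_REF)
print(f"maximum ripple for -50 dBc sidebands: {v_max * 1e6:.1f} uV")
```

With a different VCO gain the bound scales inversely, so a VCO with half the gain would tolerate twice the ripple for the same sideband level.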

Fig. 12. Die photograph.

With the values chosen in this design, the output phase noise reaches −138 dBc/Hz at 10-MHz offset for the $K_{VCO}$ of this design (on the order of gigahertz per volt). While it is desirable to reduce the value of the loop-filter resistor, the required increase in the capacitance leads to a severe area penalty because of the low density of the poly-metal capacitors. Note that since the stability factor is proportional to the resistance and to the square root of the capacitance, if the resistance is, say, halved, then the capacitance must be quadrupled to maintain the stability factor constant (for a given charge-pump current).

D. Correction Circuit

The gain of the VCO varies substantially across the tuning range, resulting in considerable change in the settling behavior. As depicted in Fig. 11(a), it is desirable to vary the charge-pump current, $I_P$, such that the product of $K_{VCO}$ and $I_P$, and hence the stability factor, remain relatively constant. Rather than use piecewise linearization [2], this work incorporates an analog folding technique. Fig. 11(b) shows a possible solution. Here, two of the transistors are off if $V_{cont}$ is well below 1.1 V, and the output current is at its maximum. As $V_{cont}$ approaches 1.1 V, the first of these transistors turns on while the second is still off. Thus, the output current drops, reaching a low value as the first transistor carries most of the tail current. As $V_{cont}$ approaches and exceeds 1.3 V, the second transistor turns on and the output current eventually returns to its original value.

This design actually utilizes the topology shown in Fig. 11(c), where only one reference voltage is required and each differential pair provides a built-in offset by virtue of skewed device dimensions. The characteristic is similar to that shown for Fig. 11(b), with the output current driving the current mirrors in the charge pump. The reference voltage of 1.2 V in Fig. 11(c) assumes that the gain of the VCO reaches its maximum at a control voltage near this value. This value is somewhat process- and temperature-dependent, limiting (according to simulations) the suppression of the VCO nonlinearity to about one order of magnitude.
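Both scaling arguments above (halving the resistor requires quadrupling the capacitor, and the charge-pump current should track the inverse of the VCO gain) follow from the textbook damping-factor expression for a charge-pump PLL, $\zeta = (R/2)\sqrt{I_P C K_{VCO}/(2\pi M)}$. The sketch below assumes that expression applies to this loop; all component values are hypothetical placeholders, since the actual values are not recoverable from the text.

```python
import math

def damping_factor(r_ohm, c_farad, i_p_amp, kvco_rad_per_s_per_v, divide_ratio):
    """Textbook damping factor of a second-order charge-pump PLL."""
    return 0.5 * r_ohm * math.sqrt(
        i_p_amp * c_farad * kvco_rad_per_s_per_v / (2.0 * math.pi * divide_ratio)
    )

# Hypothetical values for illustration only.
R = 50e3                     # ohm
C = 10e-12                   # F
I_P = 100e-6                 # A
K_VCO = 2 * math.pi * 500e6  # rad/s/V
M = 221                      # divide ratio within the 220-225 range

zeta_ref = damping_factor(R, C, I_P, K_VCO, M)

# Halving R while quadrupling C leaves the damping factor unchanged.
zeta_half_r = damping_factor(R / 2, 4 * C, I_P, K_VCO, M)

# If K_VCO doubles, halving I_P keeps the K_VCO * I_P product, and hence
# the damping factor, constant. This is the goal of the correction circuit.
zeta_corrected = damping_factor(R, C, I_P / 2, 2 * K_VCO, M)

print(f"reference: {zeta_ref:.3f}, R/2 with 4C: {zeta_half_r:.3f}, "
      f"2*K_VCO with I_P/2: {zeta_corrected:.3f}")
```

The same function also shows the cost of leaving the current fixed: doubling $K_{VCO}$ alone raises the damping factor by a factor of $\sqrt{2}$, which is the kind of variation in settling behavior the folding circuit is meant to suppress.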

Fig. 13. Measured spectra at 2.6 and 5.2 GHz in locked condition.

Fig. 14. Measured spectrum at 2.6 GHz.

Fig. 15. Setup for settling time measurement.

V. EXPERIMENTAL RESULTS

The frequency synthesizer has been fabricated in a 0.4-µm digital CMOS technology. All of the inductors and capacitors are included on the chip. Fig. 12 is a photograph of the die, which measures 1.75 × 1.15 mm². The circuit has been tested with a 2.6-V supply.

Figs. 13(a) and (b) depict the output spectra in the locked condition. The phase noise at 10-MHz offset is equal to −115 dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz. A significant part of the phase noise at 5.2 GHz is attributed to the considerable loss of the output 50-Ω buffer. Fig. 14 shows the 2.6-GHz output along with the reference sidebands. The sidebands are approximately 53 dB below the carrier. For the 5.2-GHz output, the sidebands are buried under the noise floor.

The settling behavior of the synthesizer has also been studied. Fig. 15 illustrates the setup, where the modulus of the feedback divider is switched periodically and the control voltage is monitored. The 0.8-pF capacitor results from the trace on the printed circuit board, and the active probe presents an input capacitance of 2 pF. Since the loop-filter capacitors are themselves in the picofarad range, the addition of these parasitics markedly degrades the stability. Therefore, a 100-kΩ resistor is placed in series with the active probe to mimic the role of the loop-filter elements. The low-pass filter thus formed has a corner frequency comparable to the loop bandwidth, and the 0.8-pF capacitor still produces ringing in the time response. Fig. 16 shows the measured control voltage, indicating a settling time on the order of 40 µs. Table I summarizes the measured performance of the synthesizer.

Fig. 16. Control voltage during loop settling.

TABLE I SYNTHESIZER PERFORMANCE

VI. CONCLUSION

The speed and quality of the devices available in an IC technology directly affect the choice of transceiver architectures, synthesizer topologies, and circuit configurations. In order to optimize the overall system performance, the transceiver and the synthesizer must be designed concurrently, with particular attention to the frequency planning.

Designing a multigigahertz synthesizer in 0.4-µm CMOS technology necessitates circuit techniques such as: 1) a quadrature VCO with inherent frequency doubling, 2) a dual-modulus divider with equalized fanout, 3) pipelining in pulse-swallow counters, and 4) use of folding stages to compensate for nonlinearity in the VCO characteristic.

REFERENCES

[1] “Radio equipment and systems (RES); High performance radio local area network (HIPERLAN); Functional specification,” ETSI, Sophia Antipolis, France, ETSI TC-RES, July 1995.
[2] J. Craninckx and M. S. J. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[3] A. Rofougaran et al., “A 900-MHz CMOS LC oscillator with quadrature outputs,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 392–393.
[4] B. Razavi, “A 1.8 GHz CMOS voltage-controlled oscillator,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 388–389.
[5] R. B. Merril et al., “Optimization of high-Q inductors for multi-level metal CMOS,” in Proc. IEDM, Dec. 1995, pp. 38.7.1–38.7.4.
[6] J. Alvarez, H. Sanchez, and G. Gerosa, “A wide-band low-voltage PLL for PowerPC microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[7] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: Prentice-Hall, 1998.
[8] L. W. Couch, Digital and Analog Communication Systems, 4th ed. New York: Macmillan, 1993.

Christopher Lam received the B.Sc. and M.Sc. degrees in electrical engineering from the University of California, Los Angeles, in 1997 and 1999, respectively. He is currently with the Wireless Communication Group, National Semiconductor, Santa Clara, CA. His interests include phase-locked loops and communication circuits.

Behzad Razavi (S’87–M’90) received the B.Sc. degree from Sharif University of Technology, Tehran, Iran, in 1985 and the M.Sc. and Ph.D. degrees from Stanford University, Stanford, CA, in 1988 and 1992, respectively, all in electrical engineering. He was with AT&T Bell Laboratories, Holmdel, NJ, and subsequently Hewlett-Packard Laboratories, Palo Alto, CA. Since September 1996, he has been an Associate Professor of electrical engineering at the University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995. He is a member of the Technical Program Committees of the Symposium on VLSI Circuits and the International Solid-State Circuits Conference (ISSCC), in which he is Chair of the Analog Subcommittee. He is the author of Principles of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), and Design of Analog CMOS Integrated Circuits (New York: McGraw-Hill, 2000), and the editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE Press, 1996). Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the 1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom Integrated Circuits Conference in 1998. He has also served as Guest Editor and Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

A CMOS Monolithic ΔΣ-Controlled Fractional-N Frequency Synthesizer for DCS-1800

Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

Abstract—A monolithic 1.8-GHz ΔΣ-controlled fractional-N phase-locked loop (PLL) frequency synthesizer is implemented in a standard 0.25-µm CMOS technology. The monolithic fourth-order type-II PLL integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. The design of the zero-dead-zone PFD and the dual charge pump is optimized toward linearity and spurious suppression. The frequency synthesizer consumes 35 mA from a single 2-V power supply. The measured phase noise is as low as −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is less than −100 dBc, even for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity constraints.

Index Terms—CMOS RF integrated circuits, ΔΣ modulator, fractional-N frequency synthesis, phase-locked loop, phase noise.

I. INTRODUCTION

THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic CMOS integration of RF circuits both by academics and industry [1]–[3].

The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the full integration of a transceiver front end in CMOS, including a low-IF receiver and a direct upconversion transmitter [1]. To achieve a high degree of integratability and fast settling under low-noise constraints, a ΔΣ fractional-N synthesizer topology has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened and noise shaped by the ΔΣ action and ultimately filtered by the loop filter. To prevent degradation of the spectral purity by digital noise coupling, the ΔΣ modulator is scheduled for integration on the digital baseband signal processing IC of the full transceiver system.

The paper describes the design of a monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer. In Section II, the influence of ΔΣ noise on PLL bandwidth requirements is theoretically analyzed for multistage noise shaping (MASH) and multibit single-loop ΔΣ modulators. Next, a fast nonlinear analysis method is presented, which predicts possible degradation of the PLL spectral purity by in-band ΔΣ noise leakage and re-emerging of spurious tones. The nonlinearities in the phase-frequency detector (PFD) and charge pumps are identified as the main trouble spots. The fourth-order type-II PLL building-block design is discussed in Section IV, focusing on integrated filter and voltage-controlled oscillator (VCO) design and on the realization of a linear phase error-to-charge-pump current conversion. In Section V, the experimental results of the ΔΣ fractional-N synthesizer prototype are presented and compared to the simulations, showing good correspondence.

Fig. 1. Principle of ΔΣ fractional-N synthesis.

Manuscript received November 5, 2001; revised January 31, 2002. The authors are with the Katholieke Universiteit Leuven, Department Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(02)05856-0.

II. THE ΔΣ FRACTIONAL-N SYNTHESIZER

A. Introduction

A block diagram of a ΔΣ fractional-N synthesizer is shown in Fig. 1. The ΔΣ modulator output controls the instantaneous division modulus of the prescaler, such that the mean division modulus is $N + n/2^k$, with $k$ the number of bits of the ΔΣ modulator and $n$ the input word. The corresponding phase changes at the prescaler output are quantized, leading to possible spurious tones and quantization noise. By selecting higher order ΔΣ modulators, the spurious energy is whitened and shaped to high-frequency noise, which can be removed by the low-pass loop filter. As a result, for a given frequency resolution, an arbitrarily high $f_{ref}$ can be chosen, by assigning the proper number

0018-9200/02$17.00 © 2002 IEEE


Fig. 2. Third-order multibit single-loop ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability reasons.

of bits to the modulator. The loop bandwidth is not restricted by the reference spur suppression, resulting in faster settling and higher integratability. Additionally, the division modulus is decreased by a factor $2^k$ (with $k$ the minimum number of bits for the frequency resolution, i.e., 7.02 in this case), so that noise of the PLL blocks, except for the VCO, is less amplified.

B. The ΔΣ Modulators

The influence of both third-order MASH and multibit single-loop ΔΣ modulators on the spectral purity of the fractional-N synthesizer is investigated. Since the order of the integrated PLL loop filter is three, the order of the ΔΣ modulators must also be three or higher to ensure that the ΔΣ noise has at least a 20-dB/dec rolloff at intermediate offset frequencies, causing no degradation of the output phase noise. Both modulators have an internal accuracy of 16 bit, and 1-LSB dithering is applied to further randomize any spurious energy. The dithering sequence is third-order noise shaped to avoid an increased noise floor.

The MASH or cascade 1-1-1 ΔΣ modulator is chosen because it is easy to integrate in CMOS and is unconditionally stable. The noise transfer function (NTF) of the MASH modulator is $(1 - z^{-1})^3$ and contains three poles at the origin of the $z$ plane. The result is harsh LF noise shaping and substantial HF noise. In the time domain, this is reflected in the intensive prescaler modulus switching. To synthesize a given fractional frequency, all moduli between 64 and 71 are employed.

The multibit single-loop ΔΣ modulator is shown in Fig. 2. For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid overlap of the intended input operating range and the unstable input regions. The NTF of the presented modulator is given in (1) and contains only one pole at the origin of the $z$ plane and two low-Q Butterworth poles, with a passband gain of 3.2.

(1)

Although the single-loop ΔΣ modulator is more complex than the MASH modulator, it offers a higher flexibility in terms of noise shaping. The HF quantization noise of the modulator can be spread out by proper pole positioning. As a result, the prescaler modulus switching is less intense. Only the moduli between 66 and 69 are needed to synthesize the same frequency. The reduced HF switching has advantageous effects on noise

Fig. 3. Maximum PLL bandwidth $f_c$ versus the reference frequency and different modulator orders, for the type-II fourth-order PLL. The dashed curve is for the third-order single-loop ΔΣ modulator. The targeted phase-noise specification is −136 dBc/Hz at 3 MHz for DCS-1800.

coupling and sensitivity to PLL nonlinearities, as will be discussed in Section III.

C. Theoretical Analysis

To theoretically model the impact of ΔΣ control on the spectral purity of the synthesizer, a linear-time-invariant (LTI) PLL model is employed, with the ΔΣ quantization noise as an additive noise source at the prescaler output. The prescaler with ΔΣ control can be looked upon as a digital-to-phase (D/P) converter. Every reference cycle, the prescaler subtracts an integer multiple of $2\pi$ rad from its input signal, with the multiple determined by the ΔΣ modulator output. The resulting quantization noise on the division modulus, and thus output phase, is approximated by uniformly distributed white noise [5]. The quantization noise power is $\Delta^2/12$ for both modulators, with the step $\Delta$ set by the modulus range and the number of significant output bits. The phase noise contribution of the ΔΣ modulator at the output of the synthesizer is found in (2) [6], which involves the closed-loop transfer function of the fourth-order type-II PLL.

(2)

Since the main advantage of ΔΣ fractional-N synthesizers is the decoupling of the reference frequency and the PLL bandwidth, the influence of the ΔΣ noise on the bandwidth

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

requirement is examined. To comply with the most stringent DCS-1800 phase noise specification, i.e., −133 dBc/Hz at 3 MHz offset [7], the target ΔΣ phase noise is $\mathcal{L}$(3 MHz) = −136 dBc/Hz. In Fig. 3, the maximum PLL bandwidth $f_c$ is plotted versus the reference frequency for different MASH modulator orders. The dashed line is the maximum bandwidth for the single-loop multibit ΔΣ modulator of Section II-B. For a reference frequency of 26 MHz, not much is gained from increasing the modulator order. For a high bandwidth and thus a fast PLL, the reference frequency and/or the modulator order should be increased, leading to an increased power consumption and circuit complexity. The maximum bandwidth is 87 kHz for the third-order MASH modulator and 62 kHz for the single-loop multibit modulator.

Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, which determines the rms phase error of the PLL, is of importance. To be sure that the ΔΣ modulator does not corrupt the rms phase error, the dynamic range of the modulator must be higher than the dynamic range of the PLL [8]. The integrated in-band noise follows from the noise bandwidth of the PLL and the in-band phase noise in dBc/Hz. The maximum bandwidth of the PLL is calculated in (3) [8].

(3)

The maximum PLL bandwidth $f_c$ is plotted versus the reference frequency of the PLL for different MASH modulator orders in Fig. 4. For the single-loop multibit ΔΣ modulators (dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles. In the case of a third-order modulator, a 1.5° rms phase error (to ensure at least an overall rms phase error of 2°), and a reference frequency of 26 MHz, the maximum bandwidth is 810 and 614 kHz for the MASH and the single-loop modulator, respectively. Obviously, the constraint posed on the ΔΣ modulator noise due to in-band noise contributions is much less severe than the constraint due to the out-of-band phase noise at 3 MHz.

Fig. 4. Maximum PLL bandwidth $f_c$ versus the reference frequency and different modulator orders for an rms phase error below 1.5°. The dashed curve is for the third-order single-loop ΔΣ modulator.

III. FAST NONLINEAR ANALYSIS METHOD

The theoretical analysis suggested that applying ΔΣ control to the prescaler would not cause any problems for the spectral purity of the PLL. Practice, however, proves this wrong. A fast nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis method is at the same time sufficiently fast to sweep simulations over different degrees of nonlinearities and operating points, and is capable of performing sufficiently long transient simulations to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete time and in open loop under locked conditions to avoid drift of the phase error. To further speed up the simulation, the building blocks are represented by high-level models with parameters to model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].

To find the phase error generated by the ΔΣ modulation of the division modulus, the variation of the number of RF pulses at the output of the divider is monitored. Every reference cycle, the number of RF pulses at the divider output is determined by the number of pulses swallowed by the ΔΣ control:

(4)

The resulting quantized phase changes are compared with the phase that would be expected if the loop were in lock, i.e., the phase corresponding to the fractional part of the division modulus. The result is the instantaneous accumulated phase error $\Delta\phi$:

(5)

The phase error is converted to current pulses, $I_{CP}$, in the charge pump. The $\Delta\phi$–$I_{CP}$ (phase-error to charge-pump current) conversion is modeled to contain any PFD nonlinearity. Mismatch in the up and down current sources, resulting in gain mismatch for positive and negative phase errors, is modeled by a gain-mismatch parameter. The occurrence of a dead zone is modeled by

(6)

By taking an FFT of the current pulses, the current noise spectrum is obtained.
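The open-loop, discrete-time procedure described above can be sketched in a few lines. The code below is not the authors' Matlab model: it is a minimal standard-library illustration that uses a textbook accumulator-based MASH 1-1-1 modulator without dithering, an integer modulus of 67 with a 16-bit input word approximating the fractional part 0.92, and a simple PFD model in which the 2% gain mismatch is split symmetrically and the 0.1% dead zone is taken as a fraction of $2\pi$. All of these modeling choices and function names are assumptions. The FFT step is omitted to keep the example dependency-free.

```python
import math

ACC_BITS = 16
MODULUS = 1 << ACC_BITS

def mash111(word, n_samples):
    """Accumulator-based MASH 1-1-1; returns integer outputs in -3..+4."""
    a1 = a2 = a3 = 0
    c2_prev = c3_prev = c3_prev2 = 0
    out = []
    for _ in range(n_samples):
        a1 += word
        c1, a1 = a1 >> ACC_BITS, a1 & (MODULUS - 1)
        a2 += a1  # each stage integrates the residue of the previous one
        c2, a2 = a2 >> ACC_BITS, a2 & (MODULUS - 1)
        a3 += a2
        c3, a3 = a3 >> ACC_BITS, a3 & (MODULUS - 1)
        # y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2))
        c2_prev, c3_prev2, c3_prev = c2, c3_prev, c3
    return out

def accumulated_phase_error(moduli, mean_modulus):
    """Accumulated phase error (rad) of the divided output, loop assumed locked."""
    acc, err = 0.0, []
    for m in moduli:
        acc += m - mean_modulus  # excess VCO cycles swallowed so far
        err.append(2.0 * math.pi * acc / mean_modulus)
    return err

def pfd_charge_pump(phase_err, gain_mismatch=0.02, dead_zone=0.001):
    """Normalized charge-pump output with gain mismatch and a dead zone."""
    if abs(phase_err) < dead_zone * 2.0 * math.pi:
        return 0.0
    gain = 1.0 + gain_mismatch / 2 if phase_err > 0 else 1.0 - gain_mismatch / 2
    return gain * phase_err / (2.0 * math.pi)

N_INT = 67
WORD = round(0.92 * MODULUS)  # 16-bit input word for a fraction near 0.92
moduli = [N_INT + y for y in mash111(WORD, 20000)]
mean_modulus = N_INT + WORD / MODULUS
phase = accumulated_phase_error(moduli, mean_modulus)
i_cp = [pfd_charge_pump(p) for p in phase]

print("moduli used:", min(moduli), "to", max(moduli))
print(f"mean modulus: {sum(moduli) / len(moduli):.4f}")
print(f"peak phase error: {max(abs(p) for p in phase):.3f} rad")
```

In this sketch the instantaneous moduli stay within 64 to 71, in line with the range quoted for the MASH modulator in Section II-B. Feeding `i_cp` to an FFT routine would be the next step toward the current noise spectrum.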
The current noise spectrum is modeled as a phase-noise source which is subjected to its corresponding closed-loop transfer function, obtained from the LTI PLL model. This means that the filter is modeled by its linear transfer function, which includes parasitic gain and pole position changes. The nonlinear conversion from voltage to frequency/phase in the VCO is modeled by the variation of the VCO gain when changing the operating point of the PLL.

The analysis tool enables the evaluation and comparison of the effect of MASH and single-loop ΔΣ noise on the PLL. This analysis is performed with the following nonlinearities: a 0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency


Fig. 5. Simulation results. The phase error $\Delta\phi$ for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses $I_{CP}[i]$ for (c) the MASH modulator and (d) the single-loop multibit modulator.

is 26 MHz and the fractional division number is 67.92. The output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from an integer multiple of the reference frequency. In Fig. 5(a) and (b), the time-domain phase error $\Delta\phi$ is plotted for both modulators. Note that the ΔΣ fractional-N PLL frequency synthesizer can hardly be called a phase-locked loop, since the loop is never in lock! Due to the shaping of the HF noise in the single-loop ΔΣ modulator, the instantaneous phase error is smaller than for a MASH modulator. This has two important consequences. First, the on-time of the charge pumps is smaller for the single-loop modulator, making it less sensitive to noise coupling from the substrate and the power supply. Second, the sensitivity to the nonlinear $\Delta\phi$–$I_{CP}$ conversion in terms of noise leakage is reduced.

To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses are plotted in Fig. 5(c) and (d). A noise floor appears in the output spectrum as well as spurious tones, although the ΔΣ output is perfectly randomized and dithered. Due to the nonlinear mixing in the PFD charge pump, high-frequency ΔΣ noise folds back to lower offset frequencies, similar to the effect of a nonlinear DAC in a multibit ΔΣ ADC. Since the high-frequency noise is much lower for the single-loop ΔΣ modulator, its noise leakage due to the nonlinear mixing in the PFD is also lower. In the time domain, this effect corresponds to the smaller phase excursions. The difference in phase error between MASH and single-loop modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear in the output spectrum.

Fig. 6 shows the ΔΣ noise of both modulators as it appears at the PLL output for an ideal (dotted) and a nonlinear $\Delta\phi$–$I_{CP}$ conversion (solid). The results of the ideal case closely match the theoretical results of Section II-C (solid light gray). Due to nonlinearity, the simulated output spectrum of the integer-N PLL (the dash-dotted line) is seriously deteriorated by ΔΣ noise, which increases the noise in the PLL noise bandwidth. The MASH converter is especially critical in terms of in-band noise due to the higher phase error [see Fig. 5(a)], despite the inherently lower LF ΔΣ noise of the MASH modulator. Note that the simulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consequence, the actual spurious performance of the ΔΣ fractional-N PLL could be worse than simulated. The presented simulation results are for a division modulus of 67.92, close to an integer multiple of the reference frequency. When analyzing division moduli in between integer multiples of the reference frequency, noise leakage is still observed, but the spurious tones are well below the phase noise.

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800


Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a) the MASH modulator and (b) the single-loop multibit modulator.
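As an illustration of how such an autocorrelation estimate can be obtained, the sketch below generates the output of a third-order MASH built from three cascaded 16-bit accumulators and computes a normalized, biased autocorrelation estimate. The MASH 1-1-1 structure, the absence of dithering, the sequence length, and all function names are assumptions made for this example; they are not the modulators or the procedure used by the authors.

```python
def mash111(k, bits=16, n=4096):
    """Output of a third-order MASH (three cascaded accumulators).

    k is the constant input word, 0 <= k < 2**bits. No dithering is applied.
    """
    mod = 1 << bits
    a1 = a2 = a3 = 0               # accumulator states
    c2_1 = c3_1 = c3_2 = 0         # delayed carries of stages 2 and 3
    out = []
    for _ in range(n):
        c1, a1 = divmod(a1 + k, mod)    # stage 1: carry and residue
        c2, a2 = divmod(a2 + a1, mod)   # stage 2 accumulates residue of stage 1
        c3, a3 = divmod(a3 + a2, mod)   # stage 3 accumulates residue of stage 2
        # Noise cancellation: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_1) + (c3 - 2 * c3_1 + c3_2))
        c2_1, c3_2, c3_1 = c2, c3_1, c3
    return out


def autocorr(x, max_lag=32):
    """Biased autocorrelation estimate of x, normalized so that r[0] = 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    r0 = sum(v * v for v in d) / n
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / n / r0
            for lag in range(max_lag + 1)]


K = round(0.92 * 2**16)            # fractional part of 67.92 as a 16-bit word
y = mash111(K)
moduli = [67 + v for v in y]       # instantaneous division moduli
r = autocorr(y)
```

In this sketch the average of `moduli` approaches 67.92 and the instantaneous moduli stay within the 64 to 79 range of the prescaler. Large values of `r[lag]` at nonzero lags indicate that successive output samples are correlated, which is the quantity plotted in Fig. 7.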


Fig. 6. Simulation results. The noise at the output of the PLL for (a) the MASH modulator and (b) the single-loop multibit modulator. The results are plotted for an ideal PFD (dotted), which closely corresponds to the theoretical results (solid light gray) and for a nonlinear PFD (solid). They are compared to the simulated integer PLL phase noise (the dash-dotted line).

The explanation for the re-emergence of spurious tones is that the ΔΣ modulator is unable to sufficiently decorrelate the successive modulator output samples. To quantify the correlation in the output, the discrete-time autocorrelation estimate is calculated and plotted for both modulators for inputs close to an integer value (see Fig. 7). The autocorrelation calculations show correlation, although 1-LSB noise-shaped dithering is applied. The autocorrelation of the single-loop modulator shows large correlation peaks, explaining the higher spurious tones in the output phase-noise spectrum of the PLL. With the autocorrelation estimate, the necessary internal accuracy of the modulators is found to be at least 13 bits for MASH and 16 bits for single-loop modulators to sufficiently decorrelate the modulator output for inputs close to integers. A second possible source of tones is the downconversion of tones which are inherently present in the modulator output [5], by the nonlinear mixing in the PFD. This effect can be worsened by substrate and power-supply coupling.

IV. PLL BUILDING-BLOCK CIRCUIT DESIGN

A. The Fourth-Order Type-II PLL

A fourth-order type-II PLL is integrated, including a 4-bit prescaler, a zero-dead-zone PFD, a dual charge pump, and a 3-step equalizer, together with an on-chip LC-tank VCO and a third-order dual-path 35-kHz low-pass loop filter (see Fig. 8). The equalizer performs a 3-step piecewise equalization of the loop gain, by keeping the product of the VCO gain and the charge-pump current constant. To prevent switching back and forth between equalization states, the state transitions exhibit hysteresis.

B. The 4-Bit Prescaler

The first high-speed division of the prescaler is done with two differential single-transistor-clocked (DSTC) logic n-latches [10], forming a differential dynamic D-flip-flop. The flip-flop operates with rail-to-rail internal signals to minimize the residual prescaler phase noise [11] to levels insignificant to


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

the overall phase-noise performance. The 16-modulus division (64–79) is implemented with the phase-switching topology [12]. The division moduli are generated by switching between the 90°-spaced output phases of the second D-flip-flop. When the 90° spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of the PLL reference frequency. It takes careful layout and circuit design to equalize the delays of the different quadrature paths, such that these spurious tones are suppressed to negligible levels.
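The relation between the 4-bit control word, the 16 division moduli, and the location of the phase-switching spurs can be made concrete with a short sketch. The mapping `64 + word` is an assumption chosen to reproduce the 64 to 79 range; the function name is hypothetical, and the 26-MHz reference is the value used elsewhere in the paper.

```python
F_REF_MHZ = 26.0   # reference frequency used in the simulations and measurements


def division_modulus(word):
    """Map a 4-bit control word (0..15) to a division modulus (64..79).

    The offset-binary mapping is an assumption made for this sketch.
    """
    if not 0 <= word <= 15:
        raise ValueError("control word must fit in 4 bits")
    return 64 + word


# Offsets at which spurs appear when the 90-degree phase spacing is not ideal:
# 1/4, 2/4, and 3/4 of the reference frequency.
spur_offsets_mhz = [F_REF_MHZ * k / 4 for k in (1, 2, 3)]

print(division_modulus(0), division_modulus(15), spur_offsets_mhz)
```

For the 26-MHz reference the three spur offsets are 6.5, 13, and 19.5 MHz.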

TABLE I SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE OF THE FOURTH-ORDER TYPE-II PLL

C. The Voltage-Controlled Oscillator

The LC VCO with on-chip inductor combines a 30% tuning range at only 2 V and an excellent phase-noise performance over a large frequency range. To minimize the VCO phase noise, a simulator-optimizer program has been developed which searches for the optimal inductor geometry for a given technology. The resulting hollow octagonal balanced inductor has a quality factor as high as 9 with an inductance of 2.86 nH, for a standard 0.25-μm CMOS technology with only two metal layers (0.6 and 1.0 μm) [13]. The VCO is implemented as a single differential pMOS-only topology, leading to an enhanced tuning range, without increasing the power consumption and the VCO gain [13]. For the frequency range of interest, the VCO gain is between 100 and 200 MHz/V, explaining the need for equalization of the loop gain. The VCO output is buffered from the prescaler input to prevent kickback noise from entering the tank. The measured phase noise is as low as −127.5 dBc/Hz at 600 kHz and −142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.

D. The 35-kHz Dual-Path Loop Filter

To achieve full integration, a dual-path filter topology has been implemented (Fig. 8). Two filter paths, one active integration and one passive low-pass filter, are added with a multiplication factor in the dual charge pumps. The addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12]. The total number of capacitors is the same as in a classical fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the rather high VCO gain, the integrated capacitance is still 1.4 nF to be able to comply with the DCS-1800 phase-noise requirements. An extra pole is added at 210 kHz to ensure enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero positions and the capacitance–resistance tradeoff to obtain low noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise is without the ΔΣ noise. The MASH and single-loop (SL) noise contributions result from the nonlinear analysis. As


Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left) the dummy current branch, denoted by the suffix d, and the output branch.

seen in Section II-C, the loop bandwidth needs to be smaller than 62 kHz for ΔΣ noise suppression. However, to ensure sufficient suppression of the low-frequency fractional spurious tones for inputs close to the integers, the bandwidth is designed to be 35 kHz. Despite the rather low loop bandwidth for a ΔΣ fractional-N synthesizer, a settling time of less than 293 μs for a 104-MHz step is simulated.

E. The Phase-Error-to-Current Conversion

The nonlinear analysis of Section III identified nonlinearity of the phase-error-to-current conversion as the main cause of ΔΣ noise leakage and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as such and toward a highly linear phase-error detection.

First, the reference spur generation by the PFD charge-pump circuit is carefully minimized. The integration in the first path of the loop filter is done actively to keep the charge-pump output at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed parasitic charge injection of the switch transistors. The current switches are implemented with pMOS and nMOS transistors to compensate charge injection. Finally, a timing control scheme [Fig. 9(a)] is developed to control the charge-pump switches. The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch and the dummy current branch of the charge pump [Fig. 9(b)]. Fig. 9(a) shows the dummy and output control signals. The dummy control is delayed versus the output control by modifying the thresholds of the second inverter string (indicated by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize rise and fall times and force a perfect π rad relation between nMOS and pMOS control signals, latches at the outputs of both inverter strings are implemented. Capacitors at the control outputs slow down the rising and falling edges to prevent large charge injections by fast switching.

Fig. 10. IC microphotograph and the measurement setup in which it is embedded.

To linearize the conversion, the phase detection is performed by a zero-dead-zone PFD [15], to prevent a hard nonlinearity around zero phase error. Due to the delay added in the PFD, both the up and down current sources are on for small or zero phase errors, enabling the PFD to react to very small phase errors. The on-time fraction of the charge pump due to the delay is less than 10%. This value is a tradeoff between dead-zone prevention and sensitivity to noise coupling when the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively quiet environment. To make sure that the gains for positive and negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current-source noise, which can seriously affect the in-band noise, is decreased. Additionally, the timing control of Fig. 9(a) provides synchronization between the two filter paths and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain. HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal transistor matching.

V. EXPERIMENTAL RESULTS

Fig. 10 shows the IC microphotograph and the measurement setup in which it is embedded. The ΔΣ fractional-N measurements are performed by controlling the PLL divider moduli with an HP80000 data generator, which generates the 4-bit control word. The 4-bit ΔΣ output bit stream is generated using Matlab. This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at

Fig. 11. Measured output spectrum of the ΔΣ fractional-N PLL at 1.76592 GHz. All spurious tones are well below −75 dBc/Hz.

1.76592 GHz, i.e., for a fractional division by 67.92 for comparison with the simulated results. The input to the ΔΣ modulators is a 16-bit word, resulting in a frequency resolution of around 400 Hz. The power-supply voltage is only 2 V. Fig. 11 shows the output spectrum of the ΔΣ fractional-N PLL over a span of 55 MHz. The reference spurs are well below −75 dBc, due to the careful charge-pump timing control. To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory. Unfortunately, the maximum memory capacity is only 128 kbit, leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is performed by the phase-noise measurement system every offset frequency decade, such that accurate measurements of the phase noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the

TABLE II SUMMARY OF MEASURED SPECIFICATIONS COMPARED TO THE DCS-1800 SPECIFICATIONS
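The frequency resolution quoted for the 16-bit modulator input, and the effect of the limited pattern memory, can be estimated with the following sketch. The resolution follows directly from the 26-MHz reference and the 16-bit word. The repetition-rate estimate rests on assumptions that are ours and not stated in the paper: that 128 kbit means 128 × 1024 bits, that one 4-bit control word is read per reference cycle, and that the stored pattern is replayed in a loop.

```python
F_REF_HZ = 26_000_000      # 26-MHz reference frequency
WORD_BITS = 16             # width of the modulator input word

# Frequency resolution: one LSB of the fractional division number.
resolution_hz = F_REF_HZ / 2**WORD_BITS

# Assumed memory organization (see the assumptions stated above).
MEMORY_BITS = 128 * 1024
CONTROL_WORD_BITS = 4
pattern_length = MEMORY_BITS // CONTROL_WORD_BITS

# A looped pattern is periodic, so its spectrum contains tones at multiples
# of the repetition rate.
repetition_rate_hz = F_REF_HZ / pattern_length

print(round(resolution_hz, 1), pattern_length, round(repetition_rate_hz, 1))
```

The resolution evaluates to about 397 Hz, in line with the "around 400 Hz" stated in the text. Under the stated assumptions the pattern repeats roughly every 1.26 ms, which would place tones at multiples of about 793 Hz, i.e., at low offset frequencies.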

Fig. 12. Phase-noise measurement with the single-loop multibit converter at 1.76592 GHz compared to the phase noise at integer division (light).

Fig. 13. Phase-noise measurement with the MASH converter at 1.76592 GHz compared to the simulated ΔΣ noise at the output of the PLL (dashed), and with the simulated PLL output without ΔΣ control (dash-dotted).


single-loop multibit modulator is presented in Figs. 12 and 13. Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below −100 dBc, due to careful PFD charge-pump design. The phase noise at 600 kHz is lower than −120 dBc/Hz.

In Fig. 12, the measured phase noise of the PLL with a multibit single-loop ΔΣ modulator (dark) is compared to the phase noise at integer division (light). Noise at lower offsets originates from the ΔΣ modulator due to noise folding in the PFD, as predicted by the simulations. As a result, the rms phase error is increased from 1.7° to 3°. Note that the phase noise of the PLL at integer divisions is as low as −124 dBc/Hz at 600 kHz, which is only 0.3 dB higher than predicted by the PLL simulations (see Table I). The measured results for fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz is increased due to the limited memory of the data generator. The noise at higher offset frequencies is corrupted by noise coupling from the data generator. As can be seen in Fig. 10, the ΔΣ-control bonding wires, which conduct rail-to-rail, very noisy control pulses, are close to the LC tank and the bonding wires of the VCO power supply. Without proper shielding, the VCO phase noise is seriously degraded by this noise coupling.

In Fig. 13, the measured noise and the ΔΣ noise as simulated in Section III (dashed) are compared. The dash-dotted line is the simulated phase noise of the PLL without ΔΣ control. The simulated ΔΣ noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The phase noise at high offsets is increased versus the simulated PLL results due to noise coupling. Second-order tones are larger in measurements, since the models in the simulator do not include second-order effects and noise coupling. Tones at 520 kHz are believed to come from subharmonic tones present in the ΔΣ modulator output [5], which are amplified by mixing through noise coupling. When comparing the results for the MASH and the single-loop modulator, the measured results are less pronounced than the simulated results (see Fig. 6). The measured phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples of the reference frequency.

The measured settling time of the PLL is 226 μs for a 104-MHz frequency step. The power consumption of the PLL is 70 mW from a 2-V power supply. The fully integrated low-phase-noise VCO is responsible for almost 66% of the total power consumption. The IC area is 2 × 2 mm², including bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications [1]. The specifications of the IC prototype comply with DCS-1800; only the resolution is degraded, due to the limitations of the measurement setup.

VI. CONCLUSION

A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer is implemented in a standard 0.25-μm CMOS technology. The monolithic fourth-order type-II PLL


integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer's spectral purity, a fast nonlinear analysis method is developed, showing good correspondence with measurements, in contrast to the results of the theoretical analysis. Nonlinear mixing in the phase-frequency detector and the VCO is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. MASH and single-loop multibit ΔΣ modulators are compared for use in fractional-N synthesis. Although the MASH is stable and easy to integrate, the single-loop modulator presents a better solution, showing less sensitivity to noise leakage and noise coupling and providing more flexibility. The measured phase noise is lower than −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is lower than −100 dBc, satisfying the DCS-1800 spectral purity requirements. All measurements are performed for frequencies close to integer multiples of the reference frequency, where the synthesizer is most sensitive to spurious tones.

REFERENCES

[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, “A 2-V CMOS cellular transceiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. 1895–1907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C. Nilson, L. Plouvier, and S. Rabii, “A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 228–229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P. J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli, “A single-chip 900-MHz spread-spectrum wireless transceiver in 1-μm CMOS—Part II: Receiver design,” IEEE J. Solid-State Circuits, vol. 33, pp. 547–555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta–Sigma Data Converters: Theory, Design and Simulation. New York: IEEE Press, 1997.
[6] B. Miller and R. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[7] “Digital cellular communication system (Phase 2+); Radio transmission and reception,” Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM 05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[9] The Mathworks Inc., Matlab User’s Guide, Version 5. Englewood Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, “New single-clock CMOS latches and flipflops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.
[11] B. De Muer and M. S. J. Steyaert, “A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high input sensitivity,” in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, “Low-phase-noise fully integrated CMOS frequency synthesizers,” Ph.D. dissertation, Katholieke Univ. Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, “A 1.8-GHz highly tunable low-phase-noise CMOS VCO,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, “Fully integrated CMOS frequency synthesizers for wireless communications,” in Analog Circuit Design, W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell, MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S’00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc. degree in electrical engineering in 1996 from the Katholieke Universiteit Leuven, Belgium, where he is currently working toward the Ph.D. degree on high frequency low-noise integrated frequency synthesizers at the ESAT-MICAS laboratories. He has been a Research Assistant with ESAT-MICAS laboratories since 1996. His research is focused on integrated low-phase-noise VCOs with on-chip planar inductors and high-speed prescaler design, leading to fully integrated fractional-N synthesizers in CMOS technology.


Michel S. J. Steyaert (S’85–A’89–SM’92) was born in Aalst, Belgium, in 1959. He received the M.S. degree in electrical-mechanical engineering and the Ph.D. degree in electronics from the Katholieke Universiteit Leuven (K.U. Leuven), Heverlee, Belgium, in 1983 and 1987, respectively. From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial Research) which allowed him to work as a Research Assistant at the Laboratory ESAT at K.U. Leuven. In 1987, he was responsible for several industrial projects in the field of analog micropower circuits at the Laboratory ESAT as an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor at the University of California, Los Angeles. In 1989, he was appointed by the National Fund of Scientific Research (Belgium) as a Research Associate, in 1992 as a Senior Research Associate, and in 1996 as a Research Director at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also a part-time Associate Professor and since 1997 an Associate Professor at the K.U. Leuven. His current research interests are in high-performance and high-frequency analog integrated circuits for telecommunication systems and analog signal processing. Dr. Steyaert received the 1990 European Solid-State Circuits Conference Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the 1999 IEEE Circuit and Systems Society Guillemin–Cauer Award, and the 1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated circuits for telecommunications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997


A 960-Mb/s/pin Interface for Skew-Tolerant Bus Using Low Jitter PLL

Sungjoon Kim, Student Member, IEEE, Kyeongho Lee, Student Member, IEEE, Yongsam Moon, Student Member, IEEE, Deog-Kyoon Jeong, Member, IEEE, Yunho Choi, and Hyung Kyu Lim, Member, IEEE

Abstract—This paper describes an I/O scheme for use in a high-speed bus which eliminates setup and hold time requirements between clock and data by using an oversampling method. The I/O circuit uses a low jitter phase-locked loop (PLL) which suppresses the effect of supply noise. Measured results show peak-to-peak jitter of 150 ps and rms jitter of 15.7 ps on the clock line. Two experimental chips with 4-pin interface have been fabricated with a 0.6-μm CMOS technology, which exhibits the bandwidth of 960 Mb/s per pin.

Index Terms—Skew-tolerant, high speed bus, oversampling, phase locked loop, jitter, CMOS, phase frequency detector, voltage controlled oscillator.

I. INTRODUCTION

As the speed of high-speed digital systems tends to be limited by the bandwidth of pins, new I/O architectures are gaining momentum over conventional ones. The advent of 64 Mb and 256 Mb DRAM’s and faster logic chips also propels the need for high-speed I/O interface while reducing the number of pins and hence the system cost. Synchronous DRAM’s increased chip bandwidth up to 220 Mb/s/pin [1]. A revolutionary architecture using delay-locked loops (DLL’s) or phase-locked loops (PLL’s) was also successful in providing over 500 Mb/s/pin bandwidth [2], [3]. Such a narrow, high-speed bus provides large bandwidth in a small, low pin-count package, but such high-speed bus architectures inevitably require strict phase relationships between clock and data. A phase-tolerant I/O scheme was also developed previously for a point-to-point link [4]. This paper describes an I/O scheme for use in a high-speed bus which eliminates setup and hold time margins by using blind 3× oversampling and data recovery. In the new scheme, the clock line delivers only frequency information. The data receiving circuits extract phase information from the data itself. An 8-b data bus employing this skew-insensitive scheme can deliver over 960 MB/s. Two experimental chips with 4-pin interface were fabricated. In Section II, the chip architecture and the skew-tolerant I/O scheme will be presented. The circuit design techniques for low jitter PLL and other circuits are discussed in Section III.

Manuscript received August 20, 1996; revised December 3, 1996. S. Kim, K. Lee, Y. Moon, and D.-K. Jeong are with the Inter-University Semiconductor Research Center, Seoul National University, Seoul 151-742, Korea. Y. Choi and H. Lim are with Samsung Electronics Co., Yongin-City, Kyungki-Do, Korea. Publisher Item Identifier S 0018-9200(97)02850-3.

The chip layout and experimental results are presented in Section IV followed by a conclusion in the final section.

II. SYSTEM ARCHITECTURE

Two chips, bus master and bus slave, were designed. Bus masters in a system bus initiate bus transactions, and slaves respond to the tenured master. For example, a memory controller works as the master chip and a memory with a high-speed interface works as the slave chip. A simplified block diagram of the two chips is shown in Fig. 1. The bus signals are composed of 4-b wide data lines, a clock line, and a reference line. A charge pump PLL multiplies the external clock by two and generates two sets of multiphase clocks for both bit serialization and data oversampling. The relationship between the internal 12-phase clocks and the external clock is shown in Fig. 2. The first set consists of 12 multiphase clocks with 30° of phase separation. These 12 clocks are shown in Fig. 2(a) as PCK[0] to PCK[11]. Fig. 2(b) shows the multiphase clock distribution. Ground lines were inserted between adjacent multiphase clocks, and the clocks are ordered such that when one clock is switching, the adjacent clocks are guaranteed to be in a stable state. This configuration minimizes coupling between clocks. The second set consists of four multiphase clocks with 90° of phase separation. These clocks, TCK[0] to TCK[3], are in phase with PCK[0], PCK[3], PCK[6], and PCK[9], respectively. We generate these two separate sets of clocks to equalize loading conditions. An 8-b parallel data stream is first converted to a 4-b data stream by an internal clock and then serialized with a serialization circuit. The serializer circuit used is the same type of circuit reported in [4]. The only difference is that four phase clocks, instead of the ten phase clocks of the previous design, are used in this design, thereby reducing area and parasitic capacitance at high-speed nodes. The serial stream is driven by a current-controlled open-drain output driver.
The second set of multiphase clocks, TCK[0] to TCK[3], is used by the transmitter to serialize 4 b of data. Each pin connected to a high-speed bus has 12 oversamplers and an output driver. In [6], 32 clock phases are generated to oversample the incoming data. The decision on the degree of oversampling is a tradeoff between input data phase jitter tolerance, power, and area. If too many clock phases are used per bit period, power consumption and chip area will increase. But a low oversampling ratio may affect the tolerance of phase

0018–9200/97$10.00 © 1997 IEEE


Fig. 1. Simplified block diagram of master and slave chip.

Fig. 2. (a) External clock and 12 multiphase clocks relationship. (b) Multiphase clock layout.
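The clocking figures given in this section can be tied together with a few lines of arithmetic. The sketch below assumes that each of the 12 phases takes exactly one sample per VCO period and that the data are oversampled three times per bit; the resulting VCO and external clock frequencies are inferred from those assumptions and are not stated explicitly in the text. Variable names are ours.

```python
DATA_RATE_MBPS = 960     # per-pin data rate
N_PHASES = 12            # PCK[0] to PCK[11]
OVERSAMPLING = 3         # samples per bit
PLL_MULTIPLIER = 2       # the PLL multiplies the external clock by two

bits_per_vco_cycle = N_PHASES // OVERSAMPLING        # bits received per VCO period
vco_mhz = DATA_RATE_MBPS // bits_per_vco_cycle       # inferred VCO frequency
ext_clock_mhz = vco_mhz // PLL_MULTIPLIER            # inferred external clock
phase_step_deg = 360 // N_PHASES                     # spacing between PCK phases
sample_spacing_ps = 1e6 / (vco_mhz * N_PHASES)       # time between adjacent samples

# TCK[0..3] are in phase with every third PCK phase.
tck_taps = list(range(0, N_PHASES, N_PHASES // 4))

print(bits_per_vco_cycle, vco_mhz, ext_clock_mhz, phase_step_deg, tck_taps)
```

Four bits per VCO period agrees with the four TCK phases used to serialize 4 b of data, and the 30° phase step agrees with the separation quoted for PCK[0] to PCK[11]. Under these assumptions adjacent samples are roughly 347 ps apart.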

jitter on the incoming data. If the phase jitter on the incoming data is low and the PLL has low jitter characteristics, the oversampling ratio can be as low as three [7]. The oversampler samples the bus data three times per bit using 12 phase clocks provided by a PLL. To extract correct phase information from the data stream, a high-to-low transition is inserted at the head of each packet on each pin. The slaves of the bus keep oversampling the bus signals to catch the start of a bus transfer. This process is illustrated in Fig. 3. The serial input data is sampled at the rising edges of each multiphase clock. The receiver samples the serial data blindly without any constraint on setup and hold time margins. The sampled data is amplified again regeneratively to reduce possible metastability. Fig. 3 shows two high-speed bus signals, bus signal 0 and bus signal 1, with skew between them. When the signal receiver detects the first 1-to-0 transition, it selects the next bit as the first valid data. The third bit after the first valid bit is also selected as valid. It is assumed

KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS


Fig. 3. Skew-insensitive bus operation.

Fig. 4. Byte skew handling operation.

Fig. 5. Functional block diagram of charge pump PLL.

Fig. 7. Implemented phase frequency detector.

Fig. 6. Conventional phase frequency detector.

that the next oversampled bit after the first 1-to-0 transition was sampled near the center of data eye pattern. Each pin of the data bus tracks the start phase of a data transfer separately. After each pin catches the start of a data transfer, the demultiplexed data of each pin is retimed into a single internal clock domain. Since this process can be done in one clock cycle, the masters can respond quickly as distance from the signal source changes. Since this scheme allows skew not only in clock line but also among data lines, there is a possibility that some of the demultiplexed parallel data are one internal clock cycle earlier or later than the other demultiplexed data after retiming.
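The selection rule described above (detect the first 1-to-0 transition, take the next sample as the first valid one, then take every third sample) can be sketched in software as follows. This is a behavioural illustration only, not the circuit implementation; the function name is hypothetical, and the sketch assumes an idle-high line, so the first recovered bit is the low bit inserted at the head of the packet.

```python
def recover_bits(samples, oversampling=3):
    """Pick one sample per bit from a blindly oversampled stream.

    samples: list of 0/1 values taken `oversampling` times per bit.
    Returns the selected samples, or an empty list if no 1-to-0
    transition is found.
    """
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            first_valid = i + 1      # sample after the transition, near the eye center
            return samples[first_valid::oversampling]
    return []


def oversample(bits, idle_samples, oversampling=3):
    """Build a test stream: idle-high samples followed by each bit repeated."""
    stream = [1] * idle_samples
    for b in bits:
        stream.extend([b] * oversampling)
    return stream


packet = [0, 1, 1, 0, 1]                 # starts with the inserted low bit
pin0 = oversample(packet, idle_samples=5)
pin1 = oversample(packet, idle_samples=7)   # same packet, skewed by two samples

print(recover_bits(pin0), recover_bits(pin1))
```

Both pins recover the same bit sequence even though the two streams are skewed by two sample periods, because each pin locates the start of the transfer independently.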


Fig. 8. (a) PFD dead zone and (b) PLL jitter.

Fig. 9. Voltage controlled oscillator circuit diagram.

The skew handler examines the parallel output of each pin and checks whether every pin is aligned properly. If some of the parallel outputs are not aligned, the skew handler delays the parallel outputs which arrived earlier. Fig. 4 explains the operation of the interpin skew handler.

III. CIRCUIT DESCRIPTION

A. Low Jitter PLL

The performance of the PLL or DLL is one of the limiting factors of the high-speed interface or serial communications. The jitter characteristics become more important especially for applications that require integration of the PLL or DLL with noisy digital circuits. Integration with digital circuits induces noise on the supply rails or on the substrate. Since the charge pump PLL used in this design generates multiple phase clocks to divide one external clock period into many equally spaced intervals, the accuracy and the jitter characteristics become more important. Fig. 5 shows the functional block diagram of a charge pump PLL clock generator. It consists of a phase frequency

Fig. 10. Simulated UP/DOWN pulse width difference as a function of input phase difference.

detector, charge pump, loop filter, clock divider, and a voltage-controlled oscillator (VCO). With a six-stage differential VCO, 12 clock phases are available to oversample the incoming data and to serialize parallel data into a serial bit stream. One of the critical building blocks of the PLL is the phase frequency detector (PFD). A low precision PFD has a wide dead zone (undetectable phase difference range), which results in increased jitter. The jitter caused by the large dead zone can be reduced by increasing the precision of the phase frequency detector. Fig. 6 shows a conventional implementation of a static PFD [8]. This conventional PFD is an asynchronous state machine. The delay time to reset all internal nodes determines the circuit speed. The critical path of the conventional PFD is shown in bold lines in Fig. 6. The critical path forms a feedback path with six gate delays. The dead zone occurs when the loop is in a lock mode and the output of the charge pump

Fig. 11. Voltage controlled oscillator circuit diagram.

does not change for small changes in the input signals at the PFD. Any width of the dead zone directly translates to jitter in the PLL and must be avoided. To overcome the speed limitation and to reduce the dead zone, a new dynamic logic style PFD was designed. A similar dynamic comparator was reported before [9], but our implementation requires fewer transistors. Fig. 7 shows the circuit diagram of the PFD. Conventional static logic circuitry was replaced by dynamic logic gates. As a result, the number of transistors in the PFD core is reduced from 44 to 16. The critical path of this PFD is also shown in Fig. 7. It is composed of a three-gate feedback path. The shortened feedback path delay and dynamic operation allow high precision in high-frequency operation. Fig. 8 shows the relation between the dead zone of the PFD and the phase error of the PLL. If the phase difference between the EXT clock and the VCO clock is smaller than the dead zone, the PFD cannot detect the phase difference. So the phase error signal of the PFD will remain zero, resulting in unavoidable phase error between the EXT clock and the VCO clock. The minimum peak-to-peak phase error caused by this dead zone is Minimum Peak-to-Peak Phase Error

(1)
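The dead-zone behaviour described above can be illustrated with a small behavioural sketch. It is not the expression in (1): it assumes an idealised PFD that reports nothing while the phase error is smaller than the dead zone, and a hypothetical first-order loop with an arbitrary gain. The names `pfd_output`, `settle`, and `loop_gain` are introduced here for illustration only.

```python
def pfd_output(phase_error_ps: float, dead_zone_ps: float) -> float:
    """Idealised PFD (assumption): errors smaller than the dead zone go undetected."""
    if abs(phase_error_ps) < dead_zone_ps:
        return 0.0
    return phase_error_ps


def settle(initial_error_ps: float, dead_zone_ps: float,
           loop_gain: float = 0.5, steps: int = 200) -> float:
    """Iterate a hypothetical first-order loop and return the residual phase error."""
    error = initial_error_ps
    for _ in range(steps):
        # The loop can only correct what the PFD actually reports.
        error -= loop_gain * pfd_output(error, dead_zone_ps)
    return error


if __name__ == "__main__":
    for dead_zone in (0.0, 20.0, 100.0):
        residual = settle(300.0, dead_zone)
        print(f"dead zone {dead_zone:5.1f} ps -> residual error {residual:.3g} ps")
```

In this sketch the loop stops correcting as soon as the error falls inside the dead zone, so the residual error is bounded by the dead-zone width rather than driven to zero.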

In order to avoid the dead zone, the PFD asserts both UP and DOWN outputs as shown in Fig. 9. For in-phase inputs of EXT_CLK and VCO_CK, the charge pump will see both UP and DOWN pulses for the same short period of time. If there is a phase difference between EXT_CLK and VCO_CK, the difference between the widths of the UP and DOWN pulses will be proportional to the phase difference of the inputs. Fig. 10 shows the SPICE simulation result of the UP/DOWN pulse width difference as a function of the input phase difference. The dead zone of the PFD is significantly smaller than the measured maximum PLL jitter. Several critical parameters of the PLL, such as speed, timing jitter, spectral purity, and power dissipation, strongly depend

Fig. 12. VCO operation for step supply noise.

on the performance of the VCO, so the noise insensitivity of the VCO is very important. The VCO implemented in this design has a simple bias circuit to reject supply step noise. The processor or bus can have intervals of heavy circuit activity, in which large amounts of capacitance are switched, and intervals of very little circuit activity. This shows up as steps or impulses on the power supply of the PLL [8]. The actual peak-to-peak jitter in this case becomes dominated by the peaks in the impulse transient noise response. The VCO used in the design is a six-stage differential ring oscillator with limited voltage swing and is shown in Fig. 11. Each stage is made up of a differential NMOS pair with variable-resistance loads made of PMOS devices operating in the triode region. The bias voltage for the PMOS devices is generated by a replica bias circuit. The operation of this bias circuit is shown in Fig. 12. The […] voltage dynamically tracks the supply variations. The replica bias circuit, which consists of a replica delay cell and an op-amp, sets the minimum voltage level of the internal VCO swing to […]. The […] signal is generated by two resistors and one capacitor. When the supply rail is quiet, the voltage swing of the internal VCO is […]. Let us assume that there is a step variation of […] at some point. After the supply voltage


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Fig. 13. Phase and byte sync block diagram.

Fig. 14. Sampler circuit diagram.

step change at the supply, the […] level settles to

(2)

with a time constant of

(3)

At the instant of the supply step change, the voltage difference between […] and […] remains the same due to the capacitor at the […] generator. If […] is fixed, the delay cells run a little bit faster due to the supply voltage increase instead of keeping an exactly constant delay. Since […] remains the same temporarily, the delay cells run a little bit faster due to the increased supply voltage for a short period of time. The voltage swing at the VCO then increases with a time constant determined by […] and the op-amp bandwidth and approaches

(4)

which results in an increase of the one-stage delay. This gives an averaging effect on the VCO delay after the supply step change, minimizing the delay change caused by the supply step. If we select […] and […] values for a minimum average delay change, the effect of the supply step change can be nullified. The values we chose for this particular process are […] kΩ, […] kΩ, and […] pF.


PLL circuits can be sensitive to noise pickup from the supplies and substrate, so the PLL circuit has dedicated power and ground pads. Bypass capacitors are included in the layout to stabilize VDD and GND of the PLL. Guard rings are used to isolate the PLL from other digital parts. The placement of the multiphase clocks was carefully chosen to remove possible coupling between clocks.

B. Phase and Byte Sync

The phase and byte sync block of Fig. 1 is shown in Fig. 13. It consists of a 3-to-1 mux array, metastability resolver, start bit finder, phase memory, word memory, shifter, and D-flipflops (DFF’s). This circuit finds the start bit, decimates the oversampled 12 b, and aligns the byte boundary. The oversampled 12 b are sent from the sampler to the metastability resolver. Since the oversampled 12 b are not all sampled at the center of the eye, some of the bits may still be in a metastable state. The metastability is practically removed by one more stage of synchronizers in the metastability resolver. The start bit finder receives information from the metastability resolver, selects one of the three phases as the correct phase, and also extracts byte-align information. The phase and byte-align information are stored in the phase and word memory. The 3-to-1 mux array decimates 12 b into 4 b. The shifter at the final stage aligns the byte boundary according to the value of the word memory.
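The 3-to-1 decimation step can be sketched in software. The following is a simplified illustration, not the paper's start bit finder: it assumes ideal samples, models skew as a shift of whole sample slots, and chooses the phase with a hypothetical heuristic (the phase whose samples most often agree with both neighbours). All function names are introduced here for illustration.

```python
from typing import List

OVERSAMPLING = 3  # three samples per data bit, as in the text


def oversample(bits: List[int], skew: int) -> List[int]:
    """Ideal 3x-oversampled stream; `skew` idle (0) samples shift the bit boundaries."""
    ideal = [b for b in bits for _ in range(OVERSAMPLING)]
    return [0] * skew + ideal


def pick_phase(samples: List[int]) -> int:
    """Heuristic stand-in for phase selection: prefer the phase whose samples
    most often equal both neighbours, i.e. the one nearest the bit centre."""
    scores = [0] * OVERSAMPLING
    for i in range(1, len(samples) - 1):
        if samples[i - 1] == samples[i] == samples[i + 1]:
            scores[i % OVERSAMPLING] += 1
    return scores.index(max(scores))


def decimate(samples: List[int], phase: int) -> List[int]:
    """3-to-1 selection: keep one sample out of every three."""
    return samples[phase::OVERSAMPLING]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]  # leading 1 plays the role of a start bit
    for skew in range(OVERSAMPLING):
        stream = oversample(data, skew)
        phase = pick_phase(stream)
        print(skew, phase, decimate(stream, phase))
```

With a skew of two slots the decimated stream carries one leading idle sample; removing such an offset corresponds to the alignment job that the text assigns to the start bit finder and shifter, which this sketch does not model.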

Fig. 15. Microphotograph of master chip.

C. Oversampler

The oversampler used in the data receiver is shown in Fig. 14. Each oversampler is a cascaded sense amplifier and uses four clocks for correct, timely sampling. It is very important to reduce the probability of metastability by careful design and layout. The same size is used for both PMOS and NMOS in the core synchronizing amplifier to maximize the loop bandwidth.

IV. EXPERIMENTAL RESULTS

Two prototype chips, master and slave, have been fabricated in a 0.6-μm double-metal CMOS process. Fig. 15 shows the microphotograph of the fabricated master chip. This chip occupies 4100 μm × 4300 μm including pad area. The master chip incorporates a common skew-insensitive I/O macro block, a bus protocol handler, and a self-test circuit for chip and system diagnostics. The common skew-insensitive I/O macro block includes a charge-pump PLL for multiphase generation, oversamplers, I/O buffers, parallel-to-serial converters, and a bias generator for internal use. The core area for the skew-insensitive I/O macro block is 3600 μm × 700 μm for a 4-pin interface. The microphotograph of the fabricated slave chip is shown in Fig. 16. It has the same die size as the master chip. Many blocks are shared with the master chip. The skew-insensitive I/O macro block and the charge-pump PLL are the same as those of the master. The slave chip includes a small internal fast SRAM to verify correct read/write operations. The measured charge-pump PLL jitter histogram of the master and the slave chips is shown in Fig. 17. Since the two chips use the same PLL, they showed similar jitter performance.

Fig. 16. Microphotograph of slave chip.

The rms jitter is 15.7 ps when the tested chip is active. The peak-to-peak jitter was measured to be less than 150 ps. This PLL jitter characteristic is especially important for multiphase operation. Fig. 18 shows an output data waveform at 960 Mb/s. The master chip is sending data to the bus according to the predetermined bus protocol. The jitter at the output data is larger than the jitter at the charge pump PLL clock due to the extra modulation effect of supply voltage fluctuation to data

Fig. 17. PLL jitter histogram.

Fig. 18. Output data waveform.

output. The speed limit had several causes; CMOS driving capability and signal degradation through the chip package and printed circuit board (PCB) were among the main factors. The skew-insensitive receiving operation was also observed. There are four high-speed pins in the prototype chip. We made a PCB with four high-speed impedance-controlled bus lines. The length of the normal lines is 12 cm. One of the high-speed signal paths was made intentionally longer than the others by 10 cm. The 960-Mb/s high-speed serial data was sent into the receiver. The receiver recovers the serial data into 8-b 120-MHz parallel data. Fig. 19 shows the 120-MHz recovered parallel data. The upper waveform is from the

TABLE I
MAIN FEATURES OF THE CHIP

Core Area        3.6 mm × 0.7 mm
Technology       0.6-μm double-metal CMOS
Supply Voltage   3.3 V
Data Rate        960 Mb/s
PLL Jitter       15.8 ps rms @ 960 Mb/s
Power            0.7 W fully active

pin with a longer trace. The lower waveform is from the normal-length pin. Although the two pins have different trace lengths, the chips could receive data without errors. The power dissipation at 960 Mb/s was 0.7 W for the master chip. The chip characteristics are summarized in Table I.
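The timing figures quoted in this section can be related with a short calculation. The data rate, oversampling ratio, number of clock phases, parallel word width, and the 150-ps peak-to-peak jitter bound are taken from the text; the VCO frequency is an inference that assumes each of the 12 phases takes one sample per VCO cycle, and it is not a figure stated in the paper.

```python
DATA_RATE_BPS = 960e6        # per-pin data rate from the text
OVERSAMPLING = 3             # three samples per bit
CLOCK_PHASES = 12            # phases from the six-stage differential VCO
PARALLEL_WIDTH = 8           # recovered parallel word width in bits
PEAK_TO_PEAK_JITTER_PS = 150.0  # measured upper bound quoted in the text

bit_period_ps = 1e12 / DATA_RATE_BPS
sample_spacing_ps = bit_period_ps / OVERSAMPLING

# Assumption: one sample per phase per VCO cycle, so one cycle spans 12 / 3 = 4 bits.
bits_per_vco_cycle = CLOCK_PHASES // OVERSAMPLING
vco_frequency_mhz = DATA_RATE_BPS / bits_per_vco_cycle / 1e6

parallel_rate_mhz = DATA_RATE_BPS / PARALLEL_WIDTH / 1e6
jitter_fraction = PEAK_TO_PEAK_JITTER_PS / sample_spacing_ps

if __name__ == "__main__":
    print(f"bit period          : {bit_period_ps:.1f} ps")
    print(f"sample spacing      : {sample_spacing_ps:.1f} ps")
    print(f"VCO frequency (est.): {vco_frequency_mhz:.0f} MHz")
    print(f"parallel word rate  : {parallel_rate_mhz:.0f} MHz")
    print(f"p-p jitter / spacing: {jitter_fraction:.2f}")
```

Under these assumptions the parallel word rate comes out at the 120 MHz reported above, and the measured peak-to-peak jitter bound is a little under half of the spacing between adjacent sampling phases.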

Fig. 19. Skew-insensitive I/O operation.

V. CONCLUSION

A new high-speed skew-insensitive I/O scheme has been described in this paper. Two chips that incorporate the new I/O scheme using the low-jitter PLL technique have been fabricated in a 0.6-μm double-metal CMOS process. A three-times oversampling technique relaxed the strict setup and hold margin requirements of high-speed chip-to-chip interfaces. A newly designed fast phase frequency detector and a VCO circuit with high noise immunity improved the jitter performance of the PLL. The measured PLL rms jitter was 15.7 ps. Accurate multiphase clock generation for oversampling the bus signal was made possible by the low-jitter PLL. Using these techniques, skew-insensitive data transfer was tested. This skew-insensitive I/O scheme is useful for high-speed ASIC-to-memory and ASIC-to-ASIC interfaces, and it will become more important as chip-to-chip data transfer speeds go up.

REFERENCES

[1] M. Horiguchi et al., “An experimental 220 MHz 1 Gb DRAM,” in ISSCC 1995 Dig. Tech. Papers, pp. 252–253.
[2] M. Horowitz et al., “PLL design for a 500 MB/s interface,” in ISSCC 1993 Dig. Tech. Papers, pp. 160–161.
[3] T. H. Lee et al., “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabytes/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] E. Reese et al., “A phase tolerant 3.8 GB/s data-communication router for a multiprocessor supercomputer backplane,” in ISSCC 1994 Dig. Tech. Papers, Feb. 1994, pp. 296–297.
[5] S. Kim et al., “A pseudo-synchronous skew-insensitive I/O scheme for high bandwidth memories,” in Proc. Symp. VLSI Circuits, June 1994, pp. 41–42.
[6] M. Bazes and R. Ashuri, “A novel CMOS digital clock and data decoder,” IEEE J. Solid-State Circuits, vol. 27, pp. 1934–1940, Dec. 1992.
[7] S. Kim et al., “An 800 Mbps multi-channel CMOS serial link with 3× oversampling,” in Proc. IEEE Custom Integrated Circuit Conf., 1995, pp. 451–454.
[8] I. Young et al., “A PLL clock generator with 5 to 110 MHz lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[9] H. Notani et al., “A 622-MHz CMOS phase-locked loop with precharge-type phase frequency detector,” in Proc. Symp. VLSI Circuits, June 1994, pp. 129–130.

Sungjoon Kim (S’91) was born in Pusan, Korea, on June 2, 1970. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1992 and 1994, respectively. Since 1994 he has been working toward the Ph.D. degree in the same university. He spent the summer of 1995 working on the limiting factors of CMOS Gb/s transmission at SUN Microsystems, CA. His research interests include clock and data recovery for high-speed communication and high-speed I/O interface circuits.

Kyeongho Lee (S’92) was born in Seoul, Korea, on August 5, 1969. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1993 and 1995, respectively. He is currently working toward the Ph.D. degree in electronics engineering at the same university. He is working on various CMOS high-speed circuits for data communication. His research interests include high-speed CMOS interface circuits, high-speed video display systems, and PLL systems for Gigabit communication.

Yongsam Moon (S’97) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University in 1994 and 1996, respectively, where he is currently working toward the Ph.D. degree. He has been working on architectures and CMOS circuits for microprocessors. His current research interests are in clock and data recovery circuits for high-speed data communication.


Deog-Kyoon Jeong (S’87–M’89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1989. From 1989 to 1991, he was with Texas Instruments, Dallas, TX, where he was a member of the technical staff working on the single chip implementation of the SPARC architecture. Since 1991, he has been on the faculty of the School of Electrical Engineering and the Inter-University Semiconductor Research Center, Seoul National University. His main research interests include high-speed circuits, VLSI systems design, microprocessor architectures, and memory systems.

Yunho Choi was born in Incheon, Korea, on March 29, 1960. He received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1983. He joined Samsung Semiconductor Inc., Santa Clara, CA, in 1983, where he was engaged in the design of the 256K DRAM. Since 1986, he has been working on the design of high-density dynamic memory including synchronous DRAM at the Semiconductor Research Center, Samsung Electronic Company, Ltd., Kiheung, Korea. Currently he is in charge of specialty memory design such as graphics memory and merged DRAM and logic product development.


Hyung Kyu Lim (S’82–M’84) was born February 4, 1953, in Kyung-Nam, Korea. He received the B.S. degree from Seoul National University, Seoul, Korea, the M.S. degree from the Korea Advanced Institute of Science and Technology, and the Ph.D. degree from the University of Florida, Gainesville, all in electrical engineering, in 1976, 1978, and 1984, respectively. Since 1976, he has been with the Semiconductor Research and Development Center, Samsung Electronics Co., Kiheung, Korea. From 1978 to 1981, he was engaged in the development of bipolar linear integrated circuits and CMOS watch chips. After finishing his Ph.D. study, he worked mainly in the area of high-density MOS memory development. Starting from a 64 Kb EEPROM design in 1984, he led various memory device research and development projects that include 256 Kb EEPROM, 16 Mb mask ROM, 1 Mb high-speed static RAM, and 1/3 inch CCD image sensor. He is currently responsible for design engineering of all MOS memory research and development projects in which dynamic RAM and specialty memories are added. He has authored or coauthored over 20 technical journal and conference papers and holds 23 patents. Dr. Lim is a member of the IEEE Electron Device Society.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines

Yeon-Jae Jung, Seung-Wook Lee, Daeyun Shim, Wonchan Kim, Changhyun Kim, Member, IEEE, and Soo-In Cho

Abstract—This paper describes a dual-loop delay-locked loop (DLL) which overcomes the problem of a limited delay range by using multiple voltage-controlled delay lines (VCDLs). A reference loop generates quadrature clocks, which are then delayed with controllable amounts by four VCDLs and multiplexed to generate the output clock in a main loop. This architecture enables the DLL to emulate the infinite-length VCDL with multiple finite-length VCDLs. The DLL incorporates a replica biasing circuit for low-jitter characteristics and a duty cycle corrector immune to prevalent process mismatches. A test chip has been fabricated using a 0.25-μm CMOS process. At 400 MHz, the peak-to-peak jitter with a quiet 2.5-V supply is 54 ps, and the supply-noise sensitivity is 0.32 ps/mV.

Index Terms—Clock synchronization, delay-locked loop, duty cycle corrector, replica biasing, voltage-controlled delay lines.

Manuscript received June 27, 2000; revised October 15, 2000. Y.-J. Jung, S.-W. Lee, D. Shim, and W. Kim are with the School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea. C. Kim and S.-I. Cho are with Samsung Electronics Company, Kyungki-do, Korea. Publisher Item Identifier S 0018-9200(01)03028-1.

I. INTRODUCTION

Fig. 1. (a) Block diagram of a conventional DLL. (b) Lock-failure cases.

For high-performance microprocessors and memory ICs, the use of phase-locked loops (PLLs) or delay-locked loops (DLLs) is essential to minimize the negative effects caused by skews and jitters of clock signals. In applications where the frequency multiplication is not required, a DLL is a natural choice since it is free from the jitter accumulation problem of an oscillator-based PLL. Conventional DLLs, however, suffer from the problem of their limited delay range since DLLs adjust only the phase, not the frequency. We propose a new dual-loop DLL architecture that allows unlimited delay range by using multiple voltage-controlled delay lines (VCDLs). In our architecture, the reference loop generates four evenly spaced clocks, which are then delayed with controllable amounts by four VCDLs and multiplexed to generate the output clock in the main loop. The selection and delay control in the main loop permit the DLL to emulate the infinite delay range with a multiple of finite-length VCDLs. Moreover, a fully analog control technique can be applied to exploit the established benefits of conventional DLLs such as low skew and low jitter. To reduce supply-noise sensitivity further, a new low-jitter scheme is employed in a replica biasing circuit, which compensates the delay variation of a delay line against the injected supply noise. Finally, a duty cycle corrector immune to process mismatches is also used. This paper is arranged as follows. In Section II, following a brief overview of conventional DLLs, the proposed architecture

is described with design concepts and various building blocks. Section III describes circuits for the low-jitter scheme and duty cycle correction. Section IV discusses the prototype chip implementation and shows experimental results. Section V concludes this paper with a summary.

II. ARCHITECTURE

A. Limited Range Problem of Conventional DLLs

A simplified block diagram of a conventional DLL [1] is outlined with its lock-failure cases in Fig. 1. In the normal condition, the DLL forces the output clock to be aligned with the input reference clock through the negative feedback loop, which comprises a voltage-controlled delay line, a phase detector, a charge pump, and a loop filter. The clock buffer (CLK-BUF) is inserted to provide the chip-wide clock. Although this simple architecture offers many design flexibilities, the main problem in the conventional DLL of Fig. 1(a) is that the delay time of the VCDL has a minimum and a maximum boundary. Therefore, the DLL has states in which it does not work, as shown in Fig. 1(b). When the VCDL is at its maximum delay and the output clock leads the reference clock, DN pulses are generated but the VCDL cannot produce any more delay. On the other hand, when the VCDL is at its minimum delay and the output clock lags the reference clock, UP pulses are generated but the VCDL cannot reduce its delay any further. These lock-failure cases arise from the facts that the range of the VCDL delay is limited and that its initial value is not known at loop startup. Additional loop startup control circuitry may solve this problem and let the DLL acquire lock. Unfortunately, the delay time of the clock buffer and the following clock distribution tree

0018–9200/01$10.00 © 2001 IEEE

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP

Fig. 2. Block diagram of the proposed dual-loop DLL.

deviates from the value at the simulation stage according to temperature and voltage variations [2]. When the variation of this delay is excessive, the DLL loses lock and falls into the lock-failure cases in Fig. 1(b). A DLL relying on quadrature phase mixing [3] has been proposed to overcome the limited range problem of the conventional DLL. The phase mixing technique using quadrature clocks provides unlimited phase shift capability. However, phase mixing uses two small-slew-rate clocks to obtain linear results. Therefore, this approach has the disadvantage of increased dynamic noise sensitivity and jitter. In the semidigital DLL [4], a digitally controlled phase interpolator uses internally generated 30°-spaced clocks through the dual DLL architecture. Although noise sensitivity issues in the phase interpolation could be alleviated by smaller interpolation intervals, the inherent digital nature causes dithering around zero phase error due to continuous control-bit updates. A digital DLL architecture with infinite phase capture range [5] is also not free from the same dithering problem and requires a large chip area for fine delay control.

B. Proposed Dual-Loop DLL

Fig. 2 shows a block diagram of the proposed dual-loop DLL architecture [6]. This architecture is based on two loops: the reference loop and the main loop. The reference loop is locked at a 180° phase shift through the conventional DLL architecture. Since the reference loop VCDL is composed of four main delay cells, each delay cell generates a 45° phase shift in the locked condition. All delay cells, including delay buffers, are differential elements commonly controlled by the output of the charge pump. The delay cell named “3” means three parallel-connected delay cells, so that the load balance between the 0° and 180° clocks is preserved. The reference loop provides two differential clocks spaced by 90° to the main loop.
To cover the entire 360° phase range, clocks from the reference loop are partially inverted and fed to four sets of VCDLs in the main loop. Each main loop VCDL is composed of three delay cells and generates the low-swing internal clocks […], […], […], and […]. These clocks experience the analog delay time control by two kinds of four control voltages generated from two main loop charge pumps. The multiplexer selects one of the four clocks as the selected clock, and this clock feeds the clock buffer, whose function is to convert the low swing to full CMOS level as well as to provide the chip-wide output clock. The output clock drives the phase detector, which compares it to the reference clock. The output of the phase detector is used by two charge pumps and four loop filters to control the delay time of each main loop VCDL. Four-to-one clock switching is implemented by the window finder and the state decoder block. The window finder monitors the boundary where the selected clock is switched and forces the state decoder to update the two-bit selection code at the switching event. The selection code not only controls the clock selection at the multiplexer but also changes the configuration of the two charge pumps and four loop filters to accommodate the clock switching. Duty cycle correction (DCC) is employed to remove the duty cycle imperfections of the input clock and the output clock. Finally, although the two input clocks, […] and […], can be merged into one clock input, the lower-jitter clock source is preferred as the […], if possible, since it determines the jitter characteristics of the whole DLL.

In this architecture, the clock selection scheme enables the output clock to cover the entire phase range (modulo 2π). Furthermore, seamless clock switching is possible by optimizing the main loop VCDL delay control scheme. Moreover, the phase locking is achieved by fully analog control in all loops, so that we can apply the low-skew and low-jitter techniques established in conventional DLLs.

C. Reference Loop Design

The objective of the reference loop is to provide quadrature clocks to the main loop. Since the main loop uses these multiphase clocks as references, the phase distribution in the output clocks should be preserved against a possible harmonic lock. The reference loop phase detector depicted in Fig. 3(a) has the capability to detect and escape up to the second harmonic lock. This design is made of two level-sensitive AND/NAND logic gates, which require the 45° and 90° clocks as well as the 0° and 180° clocks. At one-period lock, the clocks and UP/DN output waveforms are as shown in Fig. 3(b). The phase detector asserts its UP and DN outputs for equal durations, owing to the 45° clock, in order to avoid a dead-zone problem, although the phase offset of the reference loop has a negligible effect on the offset of the main loop output clock. At the second harmonic lock, shown in Fig. 3(c), the phase detector detects from the 90° clock that the loop is in harmonic lock and asserts only the UP output to escape it. By limiting the delay range of a delay line, there is no possibility of a third-order or higher harmonic lock, since the reference loop is composed only of delay cells with no additional delay elements such as the clock buffer.

D. Main Loop Design

The main loop design is focused on the selection control and delay control of the main loop VCDL to achieve the infinite delay range by using four finite-length VCDLs. Fig. 4(a) shows the conceptual timing diagram of the main loop VCDL selection control. Assuming the […] clock is chosen as the selected clock, the selected clock


Fig. 4. Selection control of the main loop VCDL. (a) Conceptual timing diagram. (b) Block diagram of the control logic.

Fig. 3. Reference loop phase detector. (a) Block diagram. (b) Operation at 1 period lock. (c) Operation at 2 period lock.

moves in the movable range according to the output of the main loop phase detector. The other clocks remain fixed at the initial phase relationship, spaced by 90°. When the rising edge of the selected clock coincides with that of the […] (or […]) clock, “select up” (or “select down”) is generated and the selected clock is changed to the […] (or […]) clock. Now the […] (or […]) clock acts as the new selected clock in a right-shifted (or left-shifted) movable range. Clock switching at the quadrant boundaries can be repeated in this manner to cover the entire phase range.

Fig. 4(b) shows a block diagram of the selection control logic. Since the selected clock passes through the MUX stage, a MUX replica is required for delay matching between the selected clock and all internal clocks. With this matching, the clock waveforms in Fig. 4(a) are valid. In the window finder, one inverter–one NAND pair makes the window which is bounded by the rising edges of two input clocks. Thus, four windows are generated. Values of these windows sampled by the […] enable the window finder to find which window the […] belongs to. If the found window is the “select up” or “select down” region, the UP or DN signal is generated, respectively. Then, the state decoder updates the two-bit selection code to change the selected clock in one clock cycle. Although clock switching occurs immediately after the switching event, there is the possibility of a small delay difference in the selected clock, since the rising edge of the old selected clock may have a different time position from that of the new selected clock after clock switching. This delay difference appears as switching jitter in the lock state.

The delay control of the main loop VCDL should be optimized between two conflicting conditions, delay range and power consumption. More delay cells mean a larger delay range, but their power consumption is proportional to the number of

required delay cells. Furthermore, a larger delay causes a larger jitter. Intuitively, we apply a single control scheme as shown in Fig. 5(a), where only the selected clock rotates and the other clocks remain fixed in phase space. Thus, clock switching occurs at the quadrant boundaries. Unfortunately, since the required delay range is from −90° to 90°, this control scheme consumes the same number of delay cells per VCDL as the reference loop. In order to reduce the number of required delay cells, a differential delay control scheme is employed. Differential control means that when the selected clock rotates counterclockwise, all other clocks rotate clockwise with their phase relationship fixed. If all clocks move with the same speed, the required delay range is from −45° to 45°, as shown in Fig. 5(b). However, if the selected clock must rotate in the opposite direction after switching due to the delay fluctuation of the reference clock or the clock buffer, lock may be lost since the delay range of a VCDL has already been exhausted. In Fig. 5(c), we adopt a differential delay control with a 3× speed difference, where the selected clock moves three times faster than the other clocks, so that 3/4 of the delay cells in the single delay control case satisfy the required delay range, −67.5° to 67.5°. Since the 3× speed difference provides a shared region in the available delay range of two neighboring clocks, seamless clock switching is possible in any direction without losing lock with three delay cells per VCDL. Fig. 6 shows the configuration of the main loop phase detector, charge pumps, and loop filters. Outputs of the phase detector are connected to charge pump 1 (CP1) directly and to charge pump 2 (CP2) with inversion. Thus, if CP1 generates an increasing control voltage for the VCDL which generates the selected clock, CP2 generates a decreasing control voltage for all other VCDLs. As a result, two substantially identical charge pumps are used for the differential delay control scheme. 
Three times speed difference is implemented by the fact that the CP1 has one loop filter and the CP2 has three loop filters. In


Fig. 5. Delay control of the main loop VCDL. (a) Single control with other clocks fixed. (b) Differential control with same speed. (c) Differential control with 3× speed difference.

Fig. 6. Configuration of the main loop phase detector, charge pumps, and loop filters.
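The delay ranges quoted for the three control schemes follow from simple geometry, which the sketch below reproduces. It assumes that neighbouring clocks start 90° apart, that phase shift is linear in control voltage, and that the loop filters are identical, so that a charge pump driving three filters slews its control voltage at one third of the rate. The function names are introduced here for illustration only.

```python
QUADRANT_DEG = 90.0  # initial spacing between neighbouring main-loop clocks


def speed_ratio_from_filters(cp1_filters: int, cp2_filters: int) -> float:
    """Identical charge pumps (assumption): slew rate is inversely proportional
    to the number of identical loop filters each pump drives."""
    return cp2_filters / cp1_filters


def selected_clock_range_deg(speed_ratio: float, others_move: bool = True) -> float:
    """Phase the selected clock travels before meeting a neighbouring clock.

    speed_ratio is the selected-clock speed divided by the speed of the other
    clocks, which rotate in the opposite direction when others_move is True.
    """
    if not others_move:
        return QUADRANT_DEG  # single control: neighbours stay put
    return QUADRANT_DEG * speed_ratio / (speed_ratio + 1.0)


if __name__ == "__main__":
    print("single control      :", selected_clock_range_deg(1.0, others_move=False))
    print("differential, 1x    :", selected_clock_range_deg(1.0))
    ratio = speed_ratio_from_filters(cp1_filters=1, cp2_filters=3)
    print("differential, 3x    :", selected_clock_range_deg(ratio))
    print("fraction of 90 deg  :", selected_clock_range_deg(ratio) / QUADRANT_DEG)
```

The 3× case gives 67.5°, which is three quarters of the 90° needed under single control and matches the three delay cells per main-loop VCDL against four in the reference loop.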

case of clock switching, the selection code alters the connection between charge pumps and loop filters. Consequently, charge redistribution occurs between the three loop filters other than the loop filter for the new selected clock. This charge redistribution proceeds rapidly since two different voltages converge into one value. The fast VCDL control voltage change prevents possible dithering around the clock switching phase.

Fig. 7 shows one example of the main loop VCDL control procedure starting from the unlocked state. Let us assume the […] should be near 180° in phase space to acquire lock. Initially, assuming the selection code is “00,” the […] clock is selected as the selected clock. The selected clock rotates counterclockwise in phase space according to the outputs of the phase detector. All clocks excluding the selected clock rotate clockwise at one-third of the speed of the selected clock. Before the delay range of the VCDL generating the selected clock reaches its limit, the selected clock is changed to the […] clock. Thus, the selection code is “01.” All clocks except the new selected clock settle near their original phase positions with […]-phase space by the charge redistribution of loop filters. After clock switching, the selected clock still moves counterclockwise to be switched to the […] clock. Since this “10” state is near the lock state, the DLL can acquire lock by a minor delay control. However, let us assume the delay time of the […] must decrease due to the delay fluctuation of the reference clock or the clock buffer. As in the delay increase case, before a VCDL delay range is exhausted, the selected clock is returned

Fig. 7. Example of the main loop VCDL control procedure.

to the […] clock, the “01” state. As a result, the proposed DLL covers the entire phase range and remains in the lock state for switching in either direction by optimizing the control schemes of multiple VCDLs. Therefore, since this architecture makes it possible to emulate the infinite-length VCDL by using multiple finite-length VCDLs, the DLL overcomes the problems of conventional DLLs: the limited delay range and the initial phase relationship constraint.

III. LOW JITTER SCHEME AND DCC

A. Low-Jitter Scheme

The jitter performance of the DLL is degraded by various noise sources, typically in the form of supply and substrate noise in high-speed and highly integrated circuits. To reduce the jitter, the loop bandwidth should be set as high as possible, but it has an upper limit imposed by stability. Thus, low-jitter DLL designs strongly depend on the delay characteristics of a delay line under supply-noise injection. In order to design a delay line with low supply-noise sensitivity, the replica biasing for the delay control must be considered in a noisy environment. The replica biasing circuit, which consists of a half-replica of a differential delay cell and an operational amplifier (op-amp), sets the low swing level of the delay cell to the reference voltage. In the conventional replica biasing, the reference voltage tracks the supply variation by the same amount. Unfortunately, this is not the optimal solution. The variation of the op-amp gain and


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 8. (a) Circuit diagram of a replica biasing. (b) Operation of a reference voltage generator under supply-noise injection.

the tail-current source distorts the delay characteristics of the delay cell. The delay equation of this case is described by

(1)

where the symbols denote, in order, the delay time of a delay cell, the load capacitance, the swing voltage of the delay cell, and the current of the tail-current source. For a positive supply variation of , since is positive and negative, greatly decreases.

In the design depicted in Fig. 8(a), an additional reference voltage generator is attached to the replica biasing circuit. The reference voltage generator is composed of one transistor and two resistors and generates the reference voltage, , in the nominal supply condition. When there is a supply variation of , the reference voltage generator produces a predetermined variation of , which is a reduced swing compared to , as shown in Fig. 8(b). The reduced swing compensates the delay variation due to the aforementioned variations induced by supply noise. Thus, supply-noise sensitivity can be minimized. For a given supply noise, the desired is a function of across transistor as follows:

(2)

Fig. 9. (a) Block diagram of the duty cycle corrector [3]. (b) Circuit diagram of the proposed duty detection stage.

The sensitivity of over to process variations should be analyzed to guarantee a reliable operation. For example, the sensitivity to the threshold voltage variation of the transistor can be obtained by (3), shown at the bottom of the page. In (3), means of transistor . The sensitivity value is in the order of with a of 100 mV. Similar analyses with other process parameters also show that the predetermined is kept nearly constant under moderate process variations. This replica biasing circuit is commonly applied to all VCDLs of the reference loop and the main loop to achieve the low-jitter characteristics through the whole DLL.

B. Duty Cycle Corrector

The duty cycle of clock signals within the DLL deviates from its ideal value of 50% due to various asymmetries in signal paths and voltage offsets in an off-chip generated reference clock. For applications in which the timing of both edges of the clock is critical, a duty cycle corrector (DCC) is required to maximize timing margins. The DCC [3] in Fig. 9(a) is configured as an error-voltage feedback loop with a corrector stage and a duty detection stage. The duty detection stage outputs the differential control voltage ( ), which is proportional to the duty cycle error of the input clocks . This differential control voltage then effectively introduces an offset voltage to the clock inputs at the corrector stage to correct the duty cycle of the output clocks.

(3)

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP


Fig. 12. Selection code waveforms with the refCLK input grounded.

Fig. 10. Simulated mismatch sensitivity characteristics of the DCC with the proposed duty detection stage.

Fig. 11. Prototype chip layout.

As the clock frequency is increased, a tighter bound is placed on the performance of the DCC. Even worse, process mismatches between transistors become a serious source of error in the DCC, especially in deep-submicron technology. Although process mismatches plague all devices, special care must be taken with the duty detection stage, since near-ideal performance of this stage can remove the duty cycle distortion caused by the mismatches of all other nonideal blocks. The proposed duty detection block is based on a configuration of two stacked source-coupled pairs, as shown in Fig. 9(b). The source-coupled pair is immune to device mismatches because of its current-steering capability: for fairly large input signals, the source-coupled pair conducts the current set by the tail-current source through only one branch, so various mismatch effects in the transistors are hidden. The common-mode problem of this approach is solved by the transistors in the boxed area, which implement the self-biasing technique [7] and enable the output common mode to be dynamically adjusted by the input clocks. Two transistors with source and drain tied are added to eliminate the load imbalance caused by the self-biasing circuit. Fig. 10 shows the simulated mismatch sensitivity characteristics of the DCC with the proposed duty detection stage over typical process mV m mismatch parameters, m [8]. Under 50% duty cycle and of the input clock, the duty cycle error is less than 2 ps, which guarantees a robust operation against process mismatches.

Fig. 13. Jitter histograms at 400 MHz. (a) Quiet supply. (b) Added 2.5-MHz 300-mV square-wave supply noise.
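The supply-noise sensitivity quoted for this DLL is consistent with a simple calculation from the two jitter histograms: the increase in peak-to-peak jitter divided by the amplitude of the injected supply noise. The short Python check below uses only figures reported in the experimental results (54 ps and 150 ps peak-to-peak, 6.7 ps RMS, 300 mV noise, 400 MHz clock). The function name is illustrative, and the assumption that the sensitivity is defined as this ratio is ours rather than a statement from the paper.

```python
def supply_noise_sensitivity(pp_noisy_ps: float, pp_quiet_ps: float,
                             noise_mv: float) -> float:
    """Added peak-to-peak jitter per millivolt of injected supply noise."""
    return (pp_noisy_ps - pp_quiet_ps) / noise_mv


# Reported values: quiet supply vs. 300-mV, 2.5-MHz square-wave supply noise.
sensitivity = supply_noise_sensitivity(150.0, 54.0, 300.0)

# Quiet-supply jitter relative to the 400-MHz clock period (2500 ps).
period_ps = 1e12 / 400e6
pp_fraction_of_period = 54.0 / period_ps

# Ratio of peak-to-peak to RMS jitter with a quiet supply.
pp_to_rms_quiet = 54.0 / 6.7

print(f"sensitivity = {sensitivity:.2f} ps/mV")
print(f"quiet p-p jitter = {100 * pp_fraction_of_period:.2f}% of the period")
print(f"quiet p-p/RMS ratio = {pp_to_rms_quiet:.1f}")
```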

IV. EXPERIMENTAL RESULTS

The test chip has been fabricated using a 0.25-µm five-metal CMOS process. The threshold voltages in this process are


TABLE I PERFORMANCE CHARACTERISTICS OF THE PROTOTYPE DLL

0.57 V (nMOS) and 0.55 V (pMOS). The gate-oxide thickness is 5.8 nm. Fig. 11 shows the layout of the prototype chip. The active area of the DLL occupies 0.13 mm². The waveforms in Fig. 12 show the two-bit selection code with the reference clock input grounded and the input clock running at its nominal frequency of 400 MHz. In this configuration, the main loop phase detector always asserts DN signals. Therefore, the selection code is continuously updated through the sequence “00,” “01,” “10,” and “11.” This corresponds to an unlimited number of rotations of the output clock through the full 0°–360° range. Fig. 13(a) and (b) shows the jitter histograms of the DLL clock output at 400 MHz. Fig. 13(a) shows 6.7 ps RMS and 54 ps peak-to-peak jitter with a quiet power supply. With a 300-mV 2.5-MHz square-wave supply noise, the peak-to-peak jitter increases to 150 ps, as shown in Fig. 13(b). The ratio of the peak-to-peak jitter to the RMS jitter is well maintained in spite of supply-noise injection. Supply-noise sensitivity is measured to be 0.32 ps/mV. Table I summarizes the DLL performance characteristics. The DLL operates over a frequency range of 150 to 600 MHz with a 2.5-V supply. Static phase error between the reference clock and the output clock of the DLL is less than 20 ps. Operating at 400 MHz, the DLL dissipates 60 mW.

V. CONCLUSION

We have described a dual-loop DLL architecture that achieves an unlimited delay range by using multiple VCDLs. The reference loop generates four evenly spaced clocks without the possibility of harmonic lock. Clock selection in the main loop enables the DLL to cover the entire phase range, and seamless clock switching is achieved by optimizing the main loop VCDL delay range control. Thus, this architecture can emulate an infinite-length VCDL with multiple finite-length VCDLs. To obtain low supply-noise sensitivity, the low-jitter scheme generates a reference swing that is reduced relative to the supply noise, which compensates the delay variation of the delay line. Finally, a duty cycle corrector achieves high immunity to process mismatches by using a configuration of two stacked source-coupled pairs. A prototype fabricated using 0.25-µm CMOS technology achieves 54-ps peak-to-peak jitter and 0.32-ps/mV supply-noise sensitivity.

REFERENCES

[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M. Kumanoya, and H. Hamano, “A delay-locked loop and 90° phase shifter for 800-Mb/s double data rate memories,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 66–67.
[3] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5-V CMOS delay-locked loop for an 18-Mb 500-Mbyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[5] K. Minami et al., “A 1-GHz portable digital delay-locked loop with infinite phase capture ranges,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 350–351.
[6] Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, C.-H. Kim, and S.-I. Cho, “A low-jitter dual-loop DLL using multiple VCDLs with a duty cycle corrector,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 50–51.
[7] M. Bazes, “Two novel fully complementary self-biased CMOS differential amplifiers,” IEEE J. Solid-State Circuits, vol. 26, pp. 165–168, Feb. 1991.
[8] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, “Transistor matching in analog CMOS applications,” in IEDM Dig. Tech. Papers, Dec. 1998, pp. 915–918.

Yeon-Jae Jung was born in Korea in 1974. He received the B.S. and M.S. degrees from the School of Electrical Engineering, Seoul National University, Seoul, Korea, in 1997 and 1999, respectively, where he is currently working toward the Ph.D. degree. He has worked on architectures and CMOS circuits for high-speed I/O interfaces. His current research interests include high-speed CMOS circuits and communication ICs.

Seung-Wook Lee was born in Seoul, Korea, in 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1995 and 1997, respectively, where he is currently working toward the Ph.D. degree in the School of Electrical Engineering. His research interests include CMOS RF circuit design and high-speed communication interfaces. Mr. Lee is the winner of the Bronze Prize of the IC design contest held by the Federation of Korean Industries in 1995.

Daeyun Shim was born in Seoul, Korea, in 1962. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1985, 1987, and 2000, respectively. His Ph.D. dissertation was related to the design of high-speed locking clock generators. Since 1987, he has been working on digital video signal processing and ASIC design at Samsung Electronics Corporation. His research interests are video signal processing and compression, high-speed digital circuit design, and high-speed locking systems. He is currently working on DVD-PRML system design.


Wonchan Kim was born in Seoul, Korea, in 1945. He received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1972. He received the Dip.-Ing. and Dr.-Ing. degrees in electrical engineering from the Technische Hochschule Aachen, Aachen, Germany, in 1976 and 1981, respectively. In 1972, he was with Fairchild Semiconductor Korea as a Process Engineer. From 1976 to 1982, he was with the Institut für Theoretische Electrotecnik RWTH, Aachen. Since 1982, he has been with the School of Electrical Engineering, Seoul National University, where he is currently a Professor. His research interests include development of semiconductor devices and design of analog/digital circuits.

Changhyun Kim (M’85–S’90–M’95) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1982 and 1984, respectively and the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1994. In 1984 he joined Samsung Electronics Company, Ltd. (SEC), Kyungki-do, Korea, where he was involved with the circuit design for high speed dynamic RAM, ranging from 64 kb to 16 Mb densities. From 1989 until 1994, he was a Research Assistant in the Center for Integrated Sensors and Circuits, University of Michigan. His present research interest is in the area of circuit design for low-voltage high-performance gigascale DRAMs and future DRAM architecture. He has served as the Committee Member of the Symposium on VLSI circuits since 1995. Dr. Kim received the Grand Prize of the Samsung group for the successful development of 1-Mb and 1-Gb DRAMs in 1986 and 1996, respectively. His work on the characterization of submicron devices and reliability issues in highdensity DRAM, including reducing soft-error rate and reducing sensitivity to electrostatic discharge problems, earned a technical achievement award from the Samsung R&D Center in 1988. At the Center for Integrated Sensors and Circuits in 1991 and 1993, he won first prizes for design excellence in student VLSI design contests sponsored by several U.S. companies.


Soo-In Cho was born in Seoul, Korea, in 1957. He received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1979. He joined the Semiconductor Research and Development Center, Samsung Electronics Company, Ltd., Kyungki-Do, Korea, in 1979, where he was engaged in the design of CMOS logic LSI. Since 1983, he has been working on MOS dynamic memory design.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

A Low Jitter 0.3–165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation

Howard C. Yang, Lance K. Lee, and Ramon S. Co

Abstract—This paper describes a phase-locked loop (PLL)-based frequency synthesizer. The voltage-controlled oscillator (VCO), which uses a ring of single-ended current-steering amplifiers (CSA), provides low noise, a wide range of operating frequencies, and operation over a wide range of power supply voltages. A programmable charge pump circuit automatically configures the loop gain and optimizes it over the whole frequency range. The measured PLL frequency ranges are 0.3–165 MHz and 0.3–100 MHz at 5 V and 3 V supplies, respectively (the VCO frequency is twice the PLL output frequency). The peak-to-peak jitter is 81 ps (13 ps rms) at 100 MHz. The chip is fabricated in a standard 0.8-µm n-well CMOS process.

Index Terms—CMOS phase-locked loop, current-steering amplifier, current-steering logic, frequency synthesizer, low noise, low voltage VCO.

I. INTRODUCTION

With the ever-increasing performance and decreasing price of microprocessors and PC/workstation systems, much more stringent requirements have been placed on the design of system clock synthesizers. Today’s high-performance clock synthesizers are often required: 1) to operate over a wide frequency range (high frequency for increased performance and low frequency for power saving) using a crystal oscillator input with constant frequency; 2) to have small phase jitter and frequency variation; 3) to operate from 3 V/5 V supplies for both portable and desktop systems; 4) to have smooth transition between high and low frequencies; and 5) to have an integrated loop filter on the chip. To satisfy all these requirements simultaneously, a stable loop over the entire operating frequency range is needed, and a low-jitter, wide-frequency-range, variable-supply voltage-controlled oscillator (VCO) circuit is essential.

Several design techniques that improve the performance of a phase-locked loop (PLL) are presented in this paper. A current D/A converter is implemented to control the PLL bandwidth so that the loop performance is optimized over the operating frequency range of 0.3–165 MHz. The VCO is formed by a ring of single-ended current-steering amplifier (CSA) cells, which were first introduced as a current-steering logic (CSL) family for low noise and low power supply applications [1]. This VCO circuit can operate over a wide frequency range with low phase jitter at variable power supply. To achieve smooth frequency transitions, a pulse width limiting circuit is used to control the pulse width of the phase/frequency detector output.

Manuscript received June 25, 1996; revised October 4, 1996. H. C. Yang and L. K. Lee are with the Shanghai Belling Microelectronics Manufacturing Co., Ltd., Shanghai 200233, P.R. China. R. S. Co is with the Kingston Technology Corp., Fountain Valley, CA 92708 USA. Publisher Item Identifier S 0018-9200(97)02477-3.

In Section II of this paper, the PLL architecture is described with an analysis of the loop stability and loop optimization. The circuit design techniques for the PLL are considered in Section III. The measured results are discussed in Section IV. Finally, conclusions are made in Section V regarding this work.

II. PLL ARCHITECTURE

It is often difficult to design a PLL that can operate over a wide frequency range due to the practical limit of the capacitor size that can be integrated for the loop filter. One method to widen the frequency range is to vary the PLL bandwidth as a function of the desired output frequency. This principle is applied in our design by utilizing a current D/A converter which controls the charge pump current. With this technique, we can also optimize the loop performance, including damping factor and loop gain, over the entire operating frequency range. A block diagram of a conventional frequency synthesizer is enclosed in the dashed-line box in Fig. 1. The output frequency is synthesized as

(1)

Using linear approximations, the loop equations of the PLL [2] for stability analysis are

(2)

(3)

where is the loop gain, is the damping factor, is the VCO gain, is the charge pump current, is the loop filter resistor, and is the integration capacitor of the loop filter. In order to have an adequate margin of stability, (2) and (3) must satisfy the constraints and , where is the operating frequency of the phase/frequency detector. The former constraint is required to prevent aliasing effects as a result of the Nyquist Criterion, while the latter constraint is required to have a satisfactory transient response. In practice, a loop gain which is one-tenth of the phase/frequency detector frequency [2] is more than adequate. For a clock synthesizer with a frequency range of 0.3–165 MHz, the feedback divider in the loop equations can vary by more than 20 times. The tolerances of the loop parameters ( , and ) would also have to be accounted for.
These conditions make the constraints for stability margin extremely difficult to satisfy if is kept within a reasonable size (a few hundred picofarads) for onchip integration. As shown in Figs. 1 and 2, the proposed



Fig. 1. Block diagram of frequency synthesizer.

Fig. 2. Programmable charge pump with D/A converter control.
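To make the role of the programmable charge pump concrete, the sketch below uses the widely quoted second-order approximations for a charge-pump PLL with a series RC loop filter and a feedback divider N: loop gain K = Ko·Ip·R/(2πN) and damping factor ζ = (R/2)·sqrt(Ko·Ip·C/(2πN)). These expressions are assumed here as the conventional textbook forms and are not copied from (2) and (3); the component values, the VCO gain, and the function names are hypothetical. The point of the example is that scaling the pump current in proportion to N holds both K and ζ constant as the divider changes.

```python
import math


def loop_gain(k_vco: float, i_p: float, r: float, n: int) -> float:
    """Loop gain K in rad/s, assuming K = Ko * Ip * R / (2*pi*N)."""
    return k_vco * i_p * r / (2.0 * math.pi * n)


def damping_factor(k_vco: float, i_p: float, r: float, c: float, n: int) -> float:
    """Damping factor, assuming zeta = (R/2) * sqrt(Ko * Ip * C / (2*pi*N))."""
    return 0.5 * r * math.sqrt(k_vco * i_p * c / (2.0 * math.pi * n))


def pump_current_for(zeta: float, k_vco: float, r: float, c: float, n: int) -> float:
    """Charge pump current that gives the requested damping factor."""
    return 8.0 * math.pi * n * zeta ** 2 / (k_vco * c * r ** 2)


# Hypothetical values chosen only for illustration.
K_VCO = 2.0 * math.pi * 100e6   # VCO gain in rad/s per volt
R = 5e3                         # loop filter resistor in ohms
C = 400e-12                     # integration capacitor in farads

for n in (4, 10, 40):
    i_p = pump_current_for(1.0, K_VCO, R, C, n)
    print(f"N={n:2d}  Ip={i_p * 1e6:6.1f} uA  "
          f"zeta={damping_factor(K_VCO, i_p, R, C, n):.3f}  "
          f"K={loop_gain(K_VCO, i_p, R, n):.3e} rad/s")
```

A lookup table indexed by the divider setting, as described for the logic block in Fig. 1, could store precomputed currents of this kind, quantized to the resolution of the current D/A converter.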

PLL architecture uses a current D/A converter to control the charge pump current, . The product is optimized for given values of and such that the stability margin constraints are satisfied by (2) and (3). Using a decoding table, the optimization is performed by the logic block in Fig. 1, which sets the charge pump current upon examination of and . Typically, a loop bandwidth which is ten times less than for the entire frequency range can be easily achieved using this architecture.

III. CIRCUIT DESIGN TECHNIQUES

A. VCO Circuit

Two types of VCO based on the ring oscillator topology are commonly used in CMOS PLL design: the current-starved inverter based VCO [3]–[5] and the differential-pair based VCO [6]–[8]. In spite of a wide frequency range, the first type of VCO is sensitive to power supply noise. Although an on-chip voltage regulator can be used to reduce the effect of power supply noise [5], it is not effective for operation at high frequency since a voltage regulator inherently has poor ac rejection. Another drawback of using an on-chip regulator is that it reduces the useful power supply range, making it undesirable for low-supply applications. A local feedback

loop on the VCO can also be used for reducing the PLL jitter; however, it is likely to cause glitches or overshoots whenever the frequency transition mode is activated due to complicated feedback loops [8]. Hence, it is not a suitable approach for microprocessor applications, wherein a smooth transition between frequencies is usually required. While the second type of VCO rejects power supply noise well, the frequency range of operation may not be sufficient for some applications. To widen the frequency range of differential-pair-based VCO’s, complex MOS resistors [7] can be used at the cost of higher supply and more complex design.

The VCO design in this work utilizes a simple CSA circuit [1]. Fig. 3(a) shows a CSA cell which consists of a current source, , and a pair of NMOS devices. is the input device and is the load. When is high, turns on, sinking the bias current , while shuts off. Under this condition, the on resistance of defines the output low voltage, . When is low, turns off and is steered to . Under this condition, the resistance of the diode-connected defines the output high voltage, . By varying the bias current, , a current-controlled CSA-based ring oscillator is formed with an output voltage swing of

(4)

(typically between 1 and 2 V). Equation (4) indicates that varies with . Thus, the voltage swing of the CSA cell increases correspondingly with frequency. This is a desirable feature because the signal level improves at high frequency when the power supply switching noise becomes worse. Since the voltage swing is limited by the diode-connected , the current source always operates in the saturation region; consequently, very small switching noise is generated. For an n-well process, the PMOS current source can be guarded by its own well and isolated from the noisy p-substrate. The current source also buffers the output from , thereby reducing the noise injected from to the output.
Any ground noise (coupled from other circuitry within the chip) is rejected by the CSA as a common mode noise because both its output and input are referred to the same ground. By referring the charge pump, loop filter, V/I converter, VCO, and other analog

circuits in the PLL to the same ground, i.e., p-substrate, the ground noise can be substantially rejected [9]. Fig. 3(b) shows the VCO circuit using a three-stage CSA ring oscillator. The current sources in the CSA ring oscillator are cascoded with high-swing bias. In the V/I converter, a single degenerated stage provides a first-order linear relationship between the oscillation frequency and the control voltage. is forced in the linear region to provide the high-swing cascoded bias for the VCO. This VCO is suitable for low supply voltage operation since it only needs about 2.5 V supply for proper operation. At V, the frequency of the CSA ring oscillator is practically independent of power supply. The maximum useful frequency of the VCO is limited by the saturation voltage ( 0.5 V) of the cascoded PMOS current source in the charge-pump circuit. The measured VCO performance is illustrated in Fig. 4. The frequency range of the VCO was observed from 174 kHz to 378.8 MHz at 5 V supply (the VCO frequency is twice the PLL output frequency). With a 3 V supply, the VCO reached 200 MHz, indicating that the PLL can operate at 100 MHz at this supply voltage.

Fig. 3. (a) Current steering amplifier (CSA) cell. (b) VCO using a three-stage CSA ring oscillator.

Fig. 4. Measured VCO performance.

Fig. 5. Histogram of PLL period jitter at 100 MHz.

B. Phase/Frequency Detector

A modified dead-zone free, phase/frequency detector (PFD) is used in this design [2]. In the frequency transition mode, a pulse width limiting circuit in the PFD limits the pulse width of the UP and DN signals. Such limited UP/DN pulses provide a finite amount of charge to the integration capacitor of the loop filter in order to slow down the frequency ramp of the VCO. The UP/DN pulses, however, cannot be made arbitrarily narrow. The noise level in the chip may dominate over the minute UP/DN correction pulses, causing the PLL not to properly acquire in frequency. The maximum frequency transition rate of less than 0.1%, i.e., the difference in period between two consecutive clock cycles, is achieved by using this technique.

C. Charge Pump and Loop Filter

The charge pump shown in Fig. 2 is designed using cascoded current sources with CMOS switches. The amount


Fig. 6. Measured PLL output at 155 MHz after 5 ms delay, i.e., after 775 000 clock cycles. The rms jitter (standard deviation) is 28 ps as shown.

Fig. 7. Measured frequency transition from 33.3 to 100 MHz.

of the current, typically 10–150 µA, is controlled by the current digital-to-analog converter (DAC). A bandgap current reference circuit is used to compensate for variation over temperature. The op-amp in the charge pump circuit reduces the transients caused by the charge transfer [4] as is switched. The capacitors in the loop filter are formed by NMOS devices with the sources and drains connected to ground and the gate connected to the filter output node. The capacitor C2 is about 400 pF.

IV. MEASURED RESULTS

The measured results show that the PLL operates from 0.3 to 165 MHz and 0.3 to 100 MHz at 5 V and 3 V power supplies, respectively. Since the output frequency is the VCO

frequency divided by two, a PLL output with 50% duty cycle is guaranteed. Fig. 5 shows the histogram of PLL period jitter (also referred to as short-term or clock-to-clock jitter) at 100 MHz after 42 321 hits using a standard 14.318 MHz crystal as the input reference frequency. The peak-to-peak jitter is 81 ps with an rms jitter of 13 ps. The long-term jitter of the PLL was also measured. An rms jitter of 28 ps is observed at 155 MHz with 5 ms delay, i.e., measured after a delay of 775 000 clock cycles from the triggered clock cycle, as shown in Fig. 6. In order to appreciate the noise rejection capability of this PLL, the chip was used to generate the clock for an HP laser printer. During the printing process, which is very noisy electrically and thermally, both the period jitter and the long-term jitter of the clock must be low for good printing quality. A


TABLE I
SUMMARY OF MEASURED PLL PERFORMANCE

Frequency Range: 0.3–165 MHz
Period (Short-Term) Jitter at 100 MHz: 13 ps rms, 81 ps peak-to-peak
Long-Term Jitter at 155 MHz: 28 ps rms after 5 ms delay
Supply Voltage Range: 2.5 V to 7 V
Output Duty Cycle: 50%, VCO frequency is twice PLL output
VCO Linearity: 2% from 10–200 MHz
VCO Current Consumption: 500 µA at 200 MHz
Crosstalk between 2 PLL’s: 50 dB down
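The jitter figures in Table I can be put in perspective with a little arithmetic. The Python snippet below, a sanity check rather than anything from the paper, converts the 5 ms long-term measurement delay into clock cycles at 155 MHz and expresses the 100 MHz period jitter as a fraction of the clock period. The helper name is illustrative.

```python
def cycles_in(delay_s: float, freq_hz: float) -> int:
    """Number of clock cycles that elapse during the given delay."""
    return round(delay_s * freq_hz)


# Long-term jitter: measured 5 ms after the trigger at 155 MHz.
long_term_cycles = cycles_in(5e-3, 155e6)

# Period jitter at 100 MHz: 81 ps peak-to-peak, 13 ps rms.
period_ps = 1e12 / 100e6
pp_fraction = 81.0 / period_ps
rms_fraction = 13.0 / period_ps

print(f"{long_term_cycles} cycles in 5 ms at 155 MHz")
print(f"p-p period jitter = {100 * pp_fraction:.2f}% of the 100-MHz period")
print(f"rms period jitter = {100 * rms_fraction:.2f}% of the 100-MHz period")
```

The cycle count agrees with the 775 000 clock cycles quoted for the long-term jitter measurement.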

comparison of test results between the use of this PLL and the original crystal clock showed no perceptible difference even under high magnification. Fig. 7 shows a smooth frequency transition of the PLL from 33 to 100 MHz. No frequency glitches or overshoots were observed during the transition. Since there are two independent PLL’s on the same chip, crosstalk between the two PLL’s was also measured. The signal coupling between the two PLL’s is at least 50 dB down. The measured performance of the PLL is summarized in Table I.

V. CONCLUSION

In this paper, we demonstrated the design of a fully integrated CMOS PLL circuit that achieves a wide operating frequency range and low jitter (both short-term and long-term) over a wide range of power supply voltage. The key

element in the design that provides all these features is the CSA-based VCO circuit. A programmable current DAC is also used to optimize the loop gain of the PLL. Smooth frequency transition is realized by using a modified PFD with a pulse width limiting circuit. The chip is implemented in a standard 0.8-µm CMOS process.

REFERENCES

[1] D. J. Allstot, G. Liang, and H. C. Yang, “Current-mode logic techniques for CMOS mixed-mode ASIC’s,” in Proc. IEEE Custom Integrated Circuits Conf., 1991, pp. 25.2.1–25.2.4.
[2] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. 28, pp. 1849–1858, Nov. 1980.
[3] R. Shariatdoust, K. Nagaraj, M. Saniski, and J. Plany, “A low jitter 5 MHz to 180 MHz clock synthesizer for video graphics,” in Proc. IEEE Custom Integrated Circuits Conf., 1992, pp. 24.2.1–25.2.5.
[4] M. G. Johnson and E. L. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[5] K. M. Ware, H.-S. Lee, and C. G. Sodini, “A 200-MHz CMOS phase-locked loop with dual phase detectors,” IEEE J. Solid-State Circuits, vol. 24, pp. 1560–1568, Dec. 1989.
[6] B. Kim, D. N. Helman, and P. R. Gray, “A 30-MHz hybrid analog/digital clock recovery circuit in 2-µm CMOS,” IEEE J. Solid-State Circuits, vol. 25, pp. 1385–1394, Oct. 1990.
[7] I. A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1606, Nov. 1992.
[8] D. Mijuskovic et al., “Cell-based fully integrated CMOS frequency synthesizers,” IEEE J. Solid-State Circuits, vol. 29, pp. 271–279, Mar. 1994.
[9] D. J. Allstot and W. C. Black Jr., “A substrate-referenced data-conversion architecture,” IEEE Trans. Circuits Syst., vol. 38, pp. 1212–1217, Oct. 1991.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999


A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340–612 MHz

David W. Boerstler

Abstract—A fully integrated, phase-locked loop (PLL) clock generator/phase aligner for the POWER3 microprocessor has been designed using a 2.5-V, 0.40-µm digital CMOS6S process. The PLL design supports multiple integer and noninteger frequency multiplication factors for both the processor clock and an L2 cache clock. The fully differential delay-interpolating voltage-controlled oscillator (VCO) is tunable over a frequency range determined by programmable frequency limit settings, enhancing yield and application flexibility. PLL lock range for the maximum VCO frequency range settings is 340–612 MHz. The charge-pump current is programmable for additional control of the PLL loop dynamics. A differential on-chip loop filter with common-mode correction improves noise rejection. Cycle–cycle jitter measurements with the microprocessor actively executing instructions were 10.0 ps rms, 80 ps peak to peak (P-P) measured from the clock tree. Cycle–cycle jitter measured for the processor in a reset state with the clock tree active was 8.4 ps rms, 62 ps P-P. PLL area is 1040 × 640 µm². Power dissipation is <100 mW.

Index Terms—Clock generator, clocking, microprocessors, phase-locked loop (PLL).

I. BACKGROUND

The use of phase-locked loops (PLL) for generating phase-synchronous, frequency-multiplied clocks in microprocessors has been prevalent in industry [1]–[4]. In recent years, the trend toward ever increasing clock frequency has made PLL’s even more attractive due to the difficulties in distributing high-frequency clocks through several levels of packaging [5], [6], but the jitter penalty for using a PLL has not kept pace with the rate of reduction in processor cycle time. Until this year,1 the best reported microprocessor PLL jitter penalties ranged from 82 to 83 ps peak to peak (P-P) for inactive processors [1], [5], and a PLL on a small (600-K transistor) graphics display chip has been reported with 80 ps P-P jitter for a quiet supply at 320 MHz [7]. Many examples of higher jitter PLL designs exist in the literature. Power-supply noise created from the digital switching activity on a microprocessor is recognized as a major source of PLL jitter, and the primary focus of designers has been directed toward reducing this sensitivity.

Manuscript received December 10, 1997; revised August 10, 1998. The author is with the IBM Research Division, IBM Austin Research Laboratory, Austin, TX 78758 USA. Publisher Item Identifier S 0018-9200(99)02429-4.

1 Recent announcements of a 1-GHz microprocessor PLL [12], [13] and a PLL with an on-chip regulator [14] reported jitter of 9 ps (quiet conditions)/36 ps (processor active) and 10 ps (sinusoidal external noise)/20 ps (square wave external noise), respectively.

II. INTRODUCTION This paper describes a fully integrated PLL-based clock generator/phase aligner used for the POWER3 microprocessor. The microprocessor is fabricated in IBM CMOS6S technology and contains approximately 12 million transistors. With the microprocessor actively executing instructions, this PLL achieved cycle–cycle jitter of 10.0 ps rms, 80 ps P-P in its application environment and 8.4 ps rms, 62 ps P-P with the microprocessor in a reset state with a portion of the clock tree active. A simplified block diagram of the PLL clock generator is shown in Fig. 1. The external reference or BUSCLK enters before a receiver and is divided by two by divider stage The inentering the phase/frequency detector (PFD) as from divider is compared to ternal feedback signal by the PFD, which generates an error signal , which is used by the charge-pump and filter network to control the voltage-controlled oscillator (VCO). The output frequency and is used as the main of the VCO is divided by processor clock (PCLK) after passing through four levels of clock buffering in an H-tree clock distribution network. The processor clock is passed through a delay-matching receiver completing the feedback path. before entering divider Since at equilibrium the inputs of the PFD will be matched in frequency (and phase), the processor-to-bus frequency ratio which is equal to the ratio is equal to the ratio allowing integer or noninteger frequency synthesis by changing divider ratios. Since the technique does not require clock choppers [2], the duty cycle and phase alignment are relatively insensitive to environment and process tolerances. The output of the VCO is also connected to frequency divider which is used for the L2 cache clock (L2CLK). Since the processor-to-L2 clock-frequency ratio is also adjustable to integer or noninteger ratios. 
Other phase-synchronous clocks may be designed in similar fashion, and quadrature or interstitial clocks may be created by a polarity change at the divider input. Using the structure of Fig. 1, the VCO frequency is equal to the processor-clock divider ratio times the processor clock frequency. For cases when this ratio is even, the processor clock edges are generated from only one VCO clock edge; hence a nearly ideal 50% processor clock duty cycle may be achieved through its independence from the VCO duty cycle. III. PROCESS TECHNOLOGY The microprocessor and integral clock generator PLL are fabricated in a five-layer CMOS process with 0.4-µm feature

0018–9200/99$10.00 © 1999 IEEE

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 1. PLL block diagram.

TABLE I CMOS6S PROCESS SUMMARY

sizes. Table I lists some of the relevant attributes of this process technology. The PLL clock generator is shown in the microprocessor die photograph of Fig. 2(a). The dimensions of the entire PLL are 1040 × 640 µm. It is shown with the major features identified in Fig. 2(b). IV. PLL CLOCK GENERATOR COMPONENTS A. Phase/Frequency Detector The digital PFD generates a signal that conveys relative phase and frequency error information about its inputs to the charge pump and filter. The PFD design is based on a three-state machine structure [8], as depicted in Fig. 3(a). From the initial reset state, a rising edge on one input will assert the UP output until the rising edge of the other input appears, which deasserts UP and forces a reset of both flip-flops [Fig. 3(b)].
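As a rough behavioral illustration of a three-state PFD, the sketch below replays a list of rising edges and reports the resulting state. It is an idealized model, not the circuit of Fig. 3: the labels "ref" and "fb" and the choice that a "ref" edge asserts UP are assumptions, and the reset is treated as instantaneous, so no dead zone appears.

```python
def pfd_outputs(edges):
    """Replay time-ordered rising edges through an ideal three-state PFD.

    `edges` holds (time, source) pairs with source "ref" or "fb"
    (hypothetical input names). Returns (time, state) pairs, where
    state is "UP", "IDLE", or "DOWN".
    """
    state = "IDLE"
    trace = []
    for time, source in edges:
        if state == "IDLE":
            # The first edge to arrive decides which output is asserted.
            state = "UP" if source == "ref" else "DOWN"
        elif (state == "UP" and source == "fb") or (
            state == "DOWN" and source == "ref"
        ):
            # The opposite edge deasserts the output and resets both flip-flops.
            state = "IDLE"
        # A repeated edge from the same source leaves the output asserted.
        trace.append((time, state))
    return trace


# The reference leads by 3 time units, so UP pulses of width 3 result.
leading = pfd_outputs([(0, "ref"), (3, "fb"), (10, "ref"), (13, "fb")])
```

In this idealization the width of each UP or DOWN pulse equals the time between the two edges, and UP and DOWN are never asserted together.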

A rising edge first appearing on the other input similarly asserts DOWN until a rising edge arrives at the first input, followed by a subsequent reset. Complementary outputs are generated by the PFD for use in the differential charge-pump stage that follows the PFD. The pulse width of the output varies proportionally with the phase error between the two inputs, except for the dead-zone region as the difference approaches zero. This dead zone exists when the phase error becomes small relative to the combined response time of the PFD, charge pump, and filter circuits. Circuit simulation results show a nominal dead zone of 25 ps. Concerns of current mismatch in the charge-pump and filter networks are reduced at the expense of increased dead zone by preventing simultaneous assertions of UP and DOWN. B. Power-Supply Isolation A separate analog power connection (AVDD) is used for the analog circuits [current reference, charge pump, common-mode rejection (CMR), filter initialization, and VCO circuits] to increase the isolation of the sensitive circuits from the logic-induced switching noise present on the main power supply. To allow the detection of potential defects using conventional testing, the AVDD pin is held low, disabling the analog devices that normally draw dc current. Both on-chip and on-module decoupling are used on AVDD. C. Reference Circuit A thermal voltage-referenced current source is used to provide temperature- and supply-independent biasing for the analog circuits in the PLL. The circuit contains an array of P diffusions in the N-well connected to form two forward-biased diodes with areas that differ by a factor of ten. When connected

BOERSTLER: LOW-JITTER PLL CLOCK GENERATOR

Fig. 3. (a) PFD state diagram. (b) Phase detector implementation.

Fig. 4. Reference circuit.

Fig. 2. (a) Die photograph of POWER3 microprocessor. (b) PLL layout.

as shown in Fig. 4, the current through each leg has two stable operating points, one of which is a zero-current state. The startup circuit prevents the zero-current state from occurring by injecting current into one leg during initial power-on. The resistor is implemented using the precision resistor available in the process, which has a temperature coefficient (TC) of 2000 ppm/°C. The positive TC’s of the thermal voltage term and the resistor tend to cancel, providing a reference current TC of 785 ppm/°C at 85 °C. The reference is used for subsequent generation of reference currents and the PMOS bias voltage through mirroring. Sensitivity to power-supply change is 1.7%/V for 20% change on VDD.

D. Process and Temperature Compensation Variations in NMOS device length due to process are monitored using the circuit shown in Fig. 5(a). All of the current sources are generated directly from the reference circuit current. A constant current is passed through a branch containing short-channel NMOS devices, creating a monitoring voltage, which is sensitive to NMOS device length variations. This voltage is compared to a reference voltage generated by a constant current through a long-channel NMOS device that is relatively insensitive to length variations. The devices and bias currents used for length sensing are sized so that the monitoring and reference voltages are equivalent for a nominal process. To minimize temperature sensitivity, the bias currents correspond to the zero-temperature coefficient (0-TC) region of the devices.

Fig. 5. (a) Process compensation circuit. (b) Temperature compensation circuit.

Fig. 6. Charge-pump circuit.

The two voltages are compared using a differential amplifier, which generates a current proportional to the NMOS offset from nominal. This current is mirrored to produce a current that is injected into a precision resistor used for combining various process monitors to generate a compensating reference voltage. The compensating reference voltage is connected to the active load elements of the VCO, which control the VCO’s voltage swing. A current generated from a similar PMOS circuit also is injected into the resistor. Weighted combinations of standard bias circuits with differing voltage and temperature coefficients have been used previously to compensate reference circuits for VCO’s [9]. In this case, however, temperature was monitored directly by comparing the voltage of two series-connected devices, biased by a current below their 0-TC operating point, to the voltage of two parallel devices, biased by a current significantly above their effective 0-TC point [Fig. 5(b)]. The devices and bias currents are sized so that both branches of the differential amplifier are balanced for nominal temperature conditions. The inset shows the I–V characteristics as a function of temperature for the series (subscript 2) and parallel (subscript 1) connected devices; the 0-TC points correspond to the crossing point where the current is

invariant with temperature. The current in one leg of the differential amplifier varies proportionally with temperature and is mirrored and added to the summing junction of the resistor. A constant bias current is also added to the summing junction to establish the correct weighting of the various compensating currents and to correct for the TC of the summing junction resistor. Using a statistical process model, the process compensation was designed to favor the stabilization of the “best case” side of the distribution over the “worst case” side in anticipation of future process trends. Given the limited range over which a circuit may be practically compensated, the performance for the “best case” devices was not sacrificed in favor of extensive compensation of the poorest performing devices. For the unsorted population, this approach allowed a reduction in the sensitivity of the VCO to process variability by a factor of 3.6 (from 55.4% to 15.2%) over the uncompensated VCO; temperature sensitivity was reduced by a factor of 4.7 (from 38.6% to 8.2%). E. Charge Pump The reference circuit is used to generate the mirror currents for use within the charge pump. The peak charge-pump current may be adjusted in 30-µA increments from 30 to 240 µA by scaling the mirror currents as shown in Fig. 6. The error signals generated by the PFD are used to switch the peak current selected. Adjusting the charge pump allows for optimization of the loop characteristics for different divider and VCO settings. Differential outputs (P and its complement) are included for high CMR in the subsequent analog circuits. F. Loop Filter The differential loop filter and initialization circuits are shown in Fig. 7. Currents to and from the charge-pump circuit enter the filter at the differential nodes P. The input to the filter contains NMOS transmission-gate clamping devices to limit the maximum filter voltage to a level set by the NMOS threshold voltage for a large source-bulk voltage.
For the CMOS6S process, the clamps prevent the filter voltage from exceeding approximately 1.8 V, eliminating concern for the VCO input stage’s shutting off. The filter capacitors are accumulation-mode gate-oxide devices, and are interleaved to improve the matching. Both loop-filter capacitors together occupy an area of approximately 865 µm × 280 µm and are approximately 450 pF each. Precision resistors (1.2 kΩ each) are used to produce a zero in the filter transfer function.
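The location of the filter zero can be estimated from the quoted component values. This is a sketch under a stated assumption: each side of the differential filter is modeled as a resistor in series with its capacitor, which places the zero at 1/(2πRC). The actual topology of Fig. 7 may differ, so the result is only indicative.

```python
import math


def filter_zero_hz(resistance_ohm, capacitance_farad):
    """Zero frequency of an assumed series R-C loop-filter branch, in hertz."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)


# Quoted values: 1.2 kOhm precision resistor, roughly 450 pF capacitor.
zero_hz = filter_zero_hz(1.2e3, 450e-12)
```

Under this assumption the zero falls at roughly 295 kHz, well below the 2-MHz loop bandwidth quoted in the measurements.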

Fig. 7. Loop filter and filter initialization.

Fig. 8. Common-mode control.

Fig. 9. Voltage-controlled oscillator.

The filter output is connected to the VCO control input. An initialization circuit activated during the initial system power-on-reset is used to precharge the filter capacitors to the nominal common-mode voltages.

G. Common-Mode Control It is possible for common-mode voltages to develop in the filter from leakage, drift, or device mismatch. Since the common-mode voltage can introduce frequency offsets in the VCO or even inhibit operation for extreme cases, the circuit shown in Fig. 8 was used in conjunction with the filter clamps described earlier. The common-mode voltage of the filter is sensed by generating currents proportional to the two filter voltages and summing them across a load device to produce a sense voltage. A differential amplifier compares the sense voltage to a reference voltage and generates a current which is proportional to the common-mode voltage. The current is mirrored by two identical current sources, which bleed current from both filter capacitors simultaneously without affecting the differential voltage between them. The maximum drain currents for this structure, which correspond to the case when both clamps have activated, are approximately 16 µA. For typical cases where the common-mode voltage is below 600 mV, the bleed currents are 1 µA. Stability of the network is assured by heavy dominant-pole compensation.

Fig. 10. (a) Delay element. (b) Mixer circuit.

H. Voltage-Controlled Oscillator The VCO design is based upon a delay-interpolating ring oscillator structure [9]–[11], as shown in Fig. 9. In contrast to the current-starved and current-modulated VCO’s, which are very commonly used for microprocessor clock generators, delay-interpolating VCO’s have relatively low-to-moderate VCO gains and are well suited to fully differential control and signal path circuit implementations. The lower VCO gain of the delay-interpolating VCO’s produces significantly less jitter due to coupled noise than higher gain structures. The limited operating frequency range for delay-interpolating VCO’s, which must be less than 2 : 1 to ensure monotonicity, may be effectively augmented by selecting suitable divider ratios or by adding programmability to the VCO signal paths. The frequency limits of the VCO are determined by the longest and shortest path delays through the structure. Fig. 9 shows an example high-frequency limit of period composed of three delay units and one mixer unit, and a low-frequency

Fig. 11. Cycle–cycle processor clock jitter: (a) quiet processor and (b) active processor.

limit of period composed of six delay units and one mixer unit. These frequency limits also affect the VCO gain (for a given mixer design) as well as the center frequency. The frequency limits may be independently controlled using the multiplexers shown in Fig. 9, allowing flexible control of the VCO operating range and greater than ten-to-one adjustment range for VCO gain. The delay elements and mixer designs are based upon PMOS source-coupled pair differential amplifiers with NMOS load networks [Fig. 10(a) and (b)], which allow voltage-controlled swing adjustment through effective load-line translation by adjusting the load control voltage. The high impedance provided by the current source improves the supply noise rejection for the source-coupled pair, and the N-well improves the isolation from p-bulk substrate noise. The variation of the threshold voltage due to bulk effect is eliminated using bulk-to-source biasing throughout the structure. Sensitivity of the VCO to low repetition rate, 100-mV steps on VDD and AVDD is 0.418 ps/mV. Center-frequency common-mode voltage sensitivity is 3.5% over the full input range dictated by

the common-mode control circuit. Nominal VCO gain for the settings that produce the maximum VCO range is 185 MHz/V. The worst case VCO power dissipation is 30 mW. I. Dividers and Receivers The programmable dividers (Fig. 1) may be individually programmed and support division by 2, 3, 4, 5, 6, 8, or 10. The dividers are placed in pairs within the layout to improve device matching within each pair. The receivers shown in Fig. 1 are also placed together and are located near the I/O pad for BUSCLK. V. PLL MEASUREMENTS The damping factor, loop gain, and natural frequency of the PLL may be adjusted over a wide range to match the application by changing the charge-pump and VCO gain as described above. System testing was conducted with 90-µA peak charge-pump current using the maximum frequency and range on the VCO with a variety of divider settings and BUSCLK


frequencies. The processor clock was accessed from the clock tree through a series of inverters. A time-interval measurement (TIM) system was used to measure cycle–cycle period jitter statistics for a number of packaged die representing various process skews. The processor was operated using an array initialization program loop with the fixed-point and floating-point processors active for the “active” processor tests, and was also operated in a “quiet” mode reset state. All tests were performed at room temperature with ambient forced-air cooling. Conventional first-cycle oscilloscope-based jitter measurements were performed periodically and provided P-P jitter results that were consistent with those measured on the TIM system. The external clock was provided by a high-frequency pulse generator, with 7.3 ps rms, 36 ps P-P jitter. Fig. 11(a) shows a histogram of cycle–cycle period measurements taken with the processor in an inactive reset state but with the clock tree active. The frequencies of the reference clock, processor clock, and VCO are 85, 170, and 340 MHz, respectively, which corresponds to a 3-dB loop bandwidth of 2 MHz. The distribution of samples in the histogram follows a Gaussian distribution with period jitter of 8.4 ps rms, 62 ps P-P. The minimum period measured for this sample size was 26.2 ps less than the mean (3.1 sigma away). Assuming that cycle-time failures only occur on the minimum period side, the worst case clock jitter penalty for this system (i.e., a “quiet” processor) is 26.2 ps at 3.1 sigma confidence (or 25.2 ps penalty at 3.0 sigma). Since a peak-to-peak jitter approximately equal to the PFD dead zone can exist for the PLL, the 25 ps simulated value for the dead zone may be a significant component of the measured jitter. Fig. 11(b) shows a clock-jitter histogram for the processor executing the array initialization routine for a large population. A Gaussian curve has been superimposed on the histogram for comparison purposes.
The frequencies of the reference clock, processor clock, and VCO are 90, 180, and 360 MHz, respectively. For this system (i.e., an “active” processor), the period jitter has increased to 10.0 ps rms, 80 ps P-P, and the worst case clock-jitter penalty is 37.1 ps at 3.7 sigma confidence (or 30.1 ps at 3.0 sigma). The effective noise penalty for running the array initialization routine is 4.9 ps at 3.0 sigma.
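The sigma-based penalties quoted above follow from simple arithmetic on the measured rms values. The sketch below reproduces that arithmetic; the function names are illustrative only, and the 30.1-ps active-processor figure is taken directly from the text rather than recomputed, since 3.0 times the rounded 10.0-ps rms value gives 30.0 ps and the small difference presumably reflects rounding of the rms value.

```python
def sigma_distance(deviation_ps, rms_ps):
    """Express a deviation from the mean period in units of rms jitter."""
    return deviation_ps / rms_ps


def penalty_at_sigma(rms_ps, n_sigma=3.0):
    """Jitter penalty in picoseconds at a chosen sigma confidence level."""
    return n_sigma * rms_ps


# Quiet processor: 8.4 ps rms, minimum period 26.2 ps below the mean.
quiet_sigma = sigma_distance(26.2, 8.4)
quiet_penalty = penalty_at_sigma(8.4)

# Active processor: 10.0 ps rms, minimum period 37.1 ps below the mean.
active_sigma = sigma_distance(37.1, 10.0)

# Effective noise penalty of the active workload at 3.0 sigma,
# using the 30.1-ps figure quoted in the text.
noise_penalty = 30.1 - quiet_penalty
```

Rounded to one decimal place these reproduce the 3.1-sigma, 25.2-ps, 3.7-sigma, and 4.9-ps figures given in the measurements.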

VI. CONCLUSION

This work demonstrates the viability of a low-jitter PLL design approach amenable to high-speed microprocessors. Measured jitter for the design was 8.4 ps rms, 62 ps P-P for quiet conditions and 10.0 ps rms, 80 ps P-P for the processor active. A tunable, moderate-gain VCO with active process and temperature compensation provides high power-supply rejection and low sensitivity to temperature and process variability. A differential design approach maintains noise immunity in both control and signal paths within the analog portions of the PLL.

ACKNOWLEDGMENT The author wishes to thank J. Peter for layout of the PLL, N. James and H. Casal for the hardware characterization and divider implementation, R. Kodali for circuit simulation and specification, D. Woeste and J. Strom for the divider and lock detector circuits, and S. Dhong and M. Papermaster for their continuous support of this work.

David W. Boerstler received the B.S. degree in electrical engineering from the University of Cincinnati, Cincinnati, OH, in 1978 and the M.S. degree in computer engineering and in electrical engineering from Syracuse University, Syracuse, NY, in 1981 and 1985, respectively. Since joining IBM in 1978, he has held a variety of assignments, including the design of high-frequency PLL’s for clock generation and recovery, fiber-optic transceiver and system design, and other analog, digital, and mixed-signal bipolar and CMOS circuit development projects. He currently is a Research Staff Member with the High-Performance VLSI group at the IBM Austin Research Laboratory, Austin, TX. His current research interests include high-frequency synchronization techniques and signaling approaches for high-speed interconnect. Mr. Boerstler has received IBM Outstanding Technical Achievement Awards for his work on the design of the serializer/deserializer for the ESCON fiber-optic channel products and for the clock-generator design of IBM’s 1-GHz PowerPC microprocessor prototype. He has received seven IBM Invention Achievement Awards.

REFERENCES
[1] I. Young, M. Mar, and B. Bhushan, “A 0.35 µm CMOS 3–880 MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, “A wide-bandwidth low-voltage PLL for PowerPC microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] J. Cho, “Digitally-controlled PLL with pulse width detection mechanism for error correction,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 334–335.
[4] I. Young, J. Greason, and K. Wong, “A PLL clock generator with 5–110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[5] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[6] P. E. Gronowski, P. Bannon, M. Bertone, R. Blake-Campos, G. Bouchard, W. Bowhill, D. Carlson, R. Castelino, D. Donchin, R. Fromm, M. Gowan, A. Jain, B. Loughlin, S. Mehta, J. Meyer, R. Mueller, A. Olesin, T. Pham, R. Preston, and P. Rubinfeld, “A 433 MHz 64b quad-issue RISC microprocessor,” in ISSCC Dig. Tech. Papers and Slide Supplement, Feb. 1996, pp. 222–223.
[7] Z. Zhang, H. Du, and M. Lee, “A 360 MHz 3V CMOS PLL with 1 V peak-to-peak power supply noise tolerance,” in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 134–135.
[8] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991, pp. 59–61.
[9] J. F. Ewen, A. Widmer, M. Soyuer, K. Wrenner, B. Parker, and H. Ainspan, “Single-chip 1062 Mbaud CMOS transceiver for serial data communication,” in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 32–33.
[10] B. Lai and R. Walker, “A monolithic 622 Mb/s clock extraction and data retiming circuit,” in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 144–145.
[11] S. K. Enam and A. Abidi, “NMOS IC’s for clock and data regeneration in gigabit-per-second optical fiber receivers,” IEEE J. Solid-State Circuits, vol. 27, pp. 1763–1774, Dec. 1992.
[12] D. W. Boerstler and K. Jenkins, “A phase-locked loop clock generator for a 1 GHz microprocessor,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 212–213.
[13] J. Silberman, N. Aoki, D. Boerstler, J. Burns, S. Dhong, A. Essbaum, U. Ghoshal, D. Heidel, P. Hofstee, K. Lee, D. Meltzer, H. Ngo, K. Nowka, S. Posluszny, O. Takahashi, I. Vo, and B. Zoric, “A 1.0 GHz single-issue 64b PowerPC integer processor,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 230–231.
[14] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, “A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 396–397.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM Se Jun Kim, Sang Hoon Hong, Jae-Kyung Wee, Joo Hwan Cho, Pil Soo Lee, Jin Hong Ahn, and Jin Yong Chung

Abstract—This paper describes a delay-locked loop (DLL) circuit having two advancements, a dual-loop operation for a wide lock range and programmable replica delays using antifuse circuitry and internal voltage generator for a post-package skew calibration. The dual-loop operation uses information from the initial time difference between reference clock and internal clock to select one of the differential internal loops. This increases the lock range of the DLL to the lower frequency. In addition, incorporation of the programmable replica delay using antifuse circuitry and the internal voltage generator allows for the elimination of skews between external clock and internal clock that occur from on-chip and off-chip variations after the package process. The proposed DLL, fabricated on 0.16-µm DRAM process, operates over the wide range of 42–400 MHz with 2.3-V power supply. The measured results show 43-ps peak-to-peak jitter and 4.71-ps rms jitter consuming 52 mW at 400 MHz.

Index Terms—Delay-locked loop, dual-loop operation, high-speed DRAM, programmable replica delay, skew calibration.

I. INTRODUCTION

THE DELAY-LOCKED loop (DLL) has become an indispensable component in high-speed synchronous DRAMs such as DDR SDRAM. Since the DLL determines the operation range of the DRAM and has a large effect on the data valid window, a high-performance DLL that has a wider range and lower jitter is essential for increasing the speed of DRAM. A DLL can be categorized into either of two types, the digital and the analog type. Although the digital DLL has robustness, process portability, and design simplicity, it is difficult to use on a very high-bandwidth DRAM (over 600 Mb/s) due to poor jitter performance [1], [2]. Therefore, in spite of its sensitivity to process variation, the analog DLL, which ensures lower jitter by the continuous characteristics of analog operation, is more suitable in the higher speed DRAM. In addition to the jitter performance, another important issue of the DLL is the lock range. Process variation makes the lock range of the analog DLL more limited and results in a narrower operation range of the DRAM. The limited range of the DLL limits the flexibility of implementation on memory applications and increases test costs in mass production. For solving the limited lock-range problems, various types of DLLs have been developed [3]–[6]. However, such DLLs resulted in complex architectures that faced such problems as increased area, added power consumption, and degradation of jitter performance. To address these issues, a novel dual-loop architecture, which increases the lock range with no degradation of jitter performance and a relatively small overhead in area and power, is proposed in this paper. Another enhancement in the proposed DLL is the post-package skew calibration. On-chip process variations and trivial mismatches in off-chip parameters can result in a large static skew in addition to the phase offset of the phase detector. In the proposed DLL, an improved scheme using antifuse circuitry is applied for reducing the skew. It enables a practical calibration of inevitable skews after the package process. This paper is arranged as follows. The limited range problem of the conventional DLL is described in Section II. In Section III, the concept of the proposed dual loop for wide locking range is briefly explained, followed by presentation of the architecture and physical implementation based on the concept. The skew calibration method using antifuse circuitry is described in Section IV. Section V discusses the fabricated chip and shows the experimental results. Finally, the paper is concluded in Section VI.

Manuscript received October 2, 2001; revised January 29, 2002. S. J. Kim, S. H. Hong, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung are with the Advanced Design Team, Memory Research and Development, Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do 467-701, Korea. J.-K. Wee is with the Department of Electronics Engineering, Hallym University, Chunchun-si, Kangwon-Do 200-702, Korea. Publisher Item Identifier S 0018-9200(02)04934-X.

II. LIMITED RANGE PROBLEM OF CONVENTIONAL DLL Fig. 1(a) shows the architecture of the conventional analog DLL and the delay characteristic of the voltage-controlled delay line (VCDL). For a VCDL with given minimum and maximum delays, the range of the operation frequency of the DLL is determined by the control voltage of the loop filter at the initial state. When the control voltage is at the minimum control voltage of the loop filter at the initial state and the initial delay does not suit the cycle time of the reference clock, lock failure occurs because the phase detector produces a DN pulse which discharges the capacitor in the loop filter, as shown in Fig. 1(b).
Therefore, in this case, the initial delay must be constrained relative to the cycle time at the initial state to satisfy the locking condition without lock failure, which restricts the range of the operation frequency. In the other case, when the control voltage is at the maximum control voltage of the loop filter at the initial state, lock failure occurs because of the UP pulse of the phase detector, as shown in Fig. 1(c), and the range of the operation frequency is restricted in a corresponding way. For utilizing the full range of the operation frequency without the lock failures of Fig. 1(b) and (c), the initial control voltage must be set at a level such that the initial delay takes a suitable intermediate value. But this method can cause stuck/harmonic lock and makes the jitter performance worse. Therefore, if the higher frequency range is desired, the initial control voltage should be set so that the VCDL starts at its minimum delay, since this is stuck/harmonic lock free and the delay cell has a fast slew rate that produces less phase noise [7]. But in reality, the VCDL delay is very sensitive to process, voltage, and temperature (PVT) variation. As a result, designing the delay to be in the target range becomes more careful and difficult work as the operation frequency becomes higher. Therefore, considering the PVT variation, the range of the operation frequency becomes more limited with the higher operating range.

0018-9200/02$17.00 © 2002 IEEE

KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL

Fig. 1. (a) Block diagram and delay characteristic of conventional DLL. Cases of lock failure at initial control voltage: (b) initial control voltage at its minimum and (c) initial control voltage at its maximum.

Fig. 2. (a) Concept of proposed DLL. Two cases of loop selection according to the initial time difference: (b) time difference in the upper half of the clock cycle and (c) time difference in the lower half of the clock cycle.

Fig. 3. Architecture of proposed DLL.

III. PROPOSED DLL IN HIGH-SPEED DRAM A. Range of the Proposed DLL The concept of the proposed dual-loop DLL is shown in Fig. 2(a). After the DLL starts up at the initial control voltage, the initial time difference between REFCLK and ICLK is monitored at the first REFCLK cycle. The first REFCLK cycle refers to the cycle of the REFCLK when ICLK is produced first in the loop after the DLL starts up. In the case shown in Fig. 2(b), the time difference is adjusted by the phase detector and charge pump like the conventional DLL. As a result, ICLK becomes FCLK, the synchronized output clock of the DLL. In the other case, shown in Fig. 2(c), the input clock for phase comparison in the phase detector is switched from ICLK to ICLKB (the differential clock of ICLK), and FCLK is also switched from ICLK to ICLKB. This means that ICLKB is synchronized to REFCLK. As a consequence, although the same delay source is used, the operation range of the proposed DLL can be extended to a lower frequency than that of conventional DLLs. As an analogous concept, the phase inversion technique was developed for the wide range [8]. It applies an instantaneous phase inversion at the VCDL input at the final moment, when monitoring of the control voltage shows that the current lock-in process cannot meet the range. The proposed concept achieves faster lock-in time since it utilizes the dual-loop operation at the beginning by selecting an optimized path.

B. Architecture and Implementation of the Proposed DLL
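The loop-selection idea of Section III-A can be sketched as a small decision function. This is an illustration under loudly stated assumptions, not the authors' circuit: the half-cycle threshold is inferred from the two cases of Fig. 2, ICLKB is treated as an ideal half-cycle-shifted copy of ICLK, lock is assumed to occur at exactly one reference cycle of delay, and the function and variable names are hypothetical.

```python
def select_loop(initial_diff, period):
    """Choose the clock to lock from the initial REFCLK-to-ICLK time difference.

    Returns (clock_name, delay_still_to_add). Assumes lock at one full
    reference cycle and a half-cycle decision threshold (both assumptions).
    """
    if not 0 <= initial_diff <= period:
        raise ValueError("initial_diff must lie within one reference cycle")
    half = period / 2
    if initial_diff > half:
        # Fig. 2(b) case: keep ICLK and lock as a conventional DLL would.
        return "ICLK", period - initial_diff
    # Fig. 2(c) case: ICLKB is assumed to be ICLK shifted by half a cycle.
    return "ICLKB", period - (initial_diff + half)


late_case = select_loop(4.0, 5.0)
early_case = select_loop(1.0, 5.0)
```

Under these assumptions the delay that the VCDL still has to add never exceeds half a cycle, which is one way to see why the same delay line can serve a lower reference frequency.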

Fig. 3 shows the proposed dual-loop architecture of the DLL for the wide lock range. Unlike conventional DLLs, the proposed DLL is composed of dual negative feedback loops (Loop1, Loop2). Loop1 and Loop2 are the feedback loops of the differential internal clocks. For the correct operation of dual loops, a loop selector, an initial circuit, a reset controller, and a 2 : 1 MUX are implemented. In the proposed dual-loop operation, the DLL determines one of the two differential internal clocks in Loop1 and Loop2 according to the initial time difference between the internal clock and the reference clock (shown in Fig. 3). Before the DLL starts up, the initial circuit sets VBP (control voltage of loop filter in Fig. 3) to the minimum value, which minimizes the delay of the VCDL to ensure harmonic lock-free operation. After RESET (DLL enable signal) transits from low state to high state, CLK and CLKB (the external differential clocks) are provided to the reset controller. The reset controller outputs IRESET to the clock buffer in the time between the falling edge of the next CLK and the next rising edge, as shown in Fig. 4(c). Since RESET can be asserted at any time, the direct application of this signal to the clock buffer is not feasible because it can make the clock buffer produce internal clocks with a distorted cycle and cause incorrect initial time difference in the loop selection cycle, as

Fig. 4. (a) Case where the clock buffer produces an incorrect initial time difference without reset controller. (b) Schematic and (c) timing diagram of the reset controller.

Fig. 5. (a) Delay cell of the replica bias circuit. (b) Cross-sectional view of bias line in the proposed DLL.

shown in Fig. 4(a). Fig. 4(b) shows the schematic of the reset controller. When IRESET is asserted, the clock buffer produces the three clocks (ICLK, ICLKB, and REFCLK). ICLK and ICLKB, which are the differential clocks, are input to the VCDL, and REFCLK is input to the phase detectors as the reference clock. Fig. 5(a) shows the delay cell and the biasing scheme used in the proposed DLL. To reduce the supply voltage sensitivity, the VCDL is implemented as a series of differential delay cells with symmetric loads, as shown in Fig. 5(a) [9]. VBP is a control voltage from the loop filter and VBN is generated by the replica bias circuit [boxed area in Fig. 5(a)]. The replica bias circuit keeps the swing constant and independent of VBP, which provides better jitter performance and a wider operation range [10]. For shielding the analog biases from the external noise, VBP and VBN are physically enclosed with inter- and intra-layers as shown in Fig. 5(b). This shielding technique improves the jitter characteristic. DCLK and DCLKB in Fig. 3, which are converted from a small swing output of the VCDL to a full swing output by amplifiers, each form a negative feedback loop and are also changed to LCLK and LCLKB by replica delays. LCLK and LCLKB are input to the phase detectors

and LCLK is also input to the loop selector, as shown in Fig. 6(a). If the time difference between the first LCLK (the first LCLK produced after the DLL is enabled) and REFCLK is in the range shown in Fig. 6(b), Lsel, the output of the loop selector, preserves the low state initialized by IRESET. Lsel in the low state enables PD1 (the phase detector of Loop1) and disables PD2 (the phase detector of Loop2). Furthermore, this state makes the MUX select DCLK as FCLK (the output clock of the DLL). Since only PD1 is enabled, the phase of LCLK is compared with that of REFCLK. The selected PD1, shown in Fig. 7(a), produces UP/DN pulses whose pulsewidth matches the phase difference between REFCLK and LCLK, as shown in Fig. 7(b). This PD has a small phase offset owing to the fast operation and precision of dynamic logic. It also avoids phase dithering because no pulses are produced in the locked state. The simulation results show a phase offset of about 40 ps in the worst case. The UP/DN pulses of PD1 are transferred to the charge pump, which generates VBP on the loop filter. The linear capacitor of the loop filter is designed to achieve a large capacitance in a small area while minimizing substrate noise, as shown in Fig. 8 [11]. If a MOS capacitor is used instead, a larger capacitance can be achieved in a smaller area, but at the cost of linearity. The replica bias generator produces VBN to control the current-source transistor of the delay cell according to VBP. Finally, the phase of LCLK is synchronized with that of REFCLK. In the other case, where the time difference between the first LCLK and REFCLK is in the range shown in Fig. 6(c), Lsel changes from the low state initialized by IRESET to the high state. This enables PD2, disables PD1, and makes the MUX select DCLKB as FCLK. In contrast to the prior case, the phase of LCLKB is compared with that of REFCLK. Through the same locking process, LCLKB is


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

Fig. 6. (a) Schematic of the loop selector and timing diagrams when the initial time difference is (b) between (1/2)T and T and (c) between 0 and (1/2)T.
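The loop-selection rule described around Fig. 6 can be illustrated with a small behavioral sketch. It is not the authors' circuit: it assumes that LCLKB is LCLK shifted by half a period, that the VCDL starts at its minimum delay and can only add delay, and that the decision threshold is half a period. The names `select_loop`, `LoopChoice`, `dt`, and `period` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopChoice:
    lsel: int             # 0 keeps Loop1 (PD1 enabled), 1 switches to Loop2 (PD2 enabled)
    feedback_clock: str   # clock compared with REFCLK
    output_clock: str     # clock the MUX forwards as FCLK
    extra_delay: float    # delay the VCDL still has to add to reach lock


def select_loop(dt: float, period: float) -> LoopChoice:
    """Pick a loop from the initial lag dt of the first LCLK edge behind REFCLK.

    Assumption: 0 <= dt < period, and the threshold sits at period / 2.
    """
    if not 0 <= dt < period:
        raise ValueError("dt must lie in [0, period)")
    half = period / 2
    if dt >= half:
        # LCLK is already close to the next REFCLK edge: keep Loop1.
        return LoopChoice(0, "LCLK", "DCLK", period - dt)
    # Otherwise the inverted clock is closer to the next REFCLK edge: use Loop2.
    return LoopChoice(1, "LCLKB", "DCLKB", half - dt)


if __name__ == "__main__":
    for dt in (2.0, 7.0):
        print(dt, select_loop(dt, period=10.0))
```

Under these assumptions the delay that remains to be added never exceeds half a period, which is one way to read the claim that the dual-loop operation lets the VCDL delay be fully utilized.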

Fig. 7. (a) Schematic and (b) timing diagram of the phase detector.

synchronized with REFCLK. Once one of the two loops is selected, the selection is never changed by the time difference between LCLK and REFCLK after the loop selection cycle, unless the DLL is disabled or the power is down. Therefore, malfunction is avoided during the lock-in process. Consequently, the proposed dual-loop operation removes the restriction on the initial condition between the reference clock and the internal clock and enables the delay of the VCDL to be fully utilized. The lock range is also extended to lower frequencies without compromising the jitter characteristic.

IV. SKEW CALIBRATION BY PROGRAMMABLE REPLICA DELAY AND ANTIFUSE CIRCUITRY

Although the replica delay is well matched with the sum of the on-chip and off-chip delays at design time, on-chip process variation and unexpected changes in the circumstances of

off-chip operation, such as output load and clock slew rate, may result in unavoidable skew. There are two methods for skew elimination: wafer-level trimming by laser [12] and post-package tuning by antifuse [13]. Wafer-level tuning is not effective because the wafer tester is not precise and the off-chip condition cannot be taken into account. Although post-package tuning by antifuse is more practical, the previous post-package method has some problems. It applies an external high voltage through pins to rupture the antifuse. Providing a voltage high enough to rupture the antifuse can physically damage other circuits connected to the pin and can degrade the reliability of the device. To remove the high-voltage problem, an antifuse programming scheme using an internal negative voltage is used [14]. Fig. 9 shows the programmable replica delay circuit. The circuitry has three functional parts: the replica delay of the clock, the replica delay of the output buffer, and the tunable delay circuit. The tunable delay circuit is connected to the antifuse circuitry, and the antifuse is made of ONO (oxide–nitride–oxide) dielectrics, as shown in Fig. 10.

Fig. 8. Linear capacitor in the loop filter.

Fig. 9. Replica delay including the programmable delay.

Fig. 10. Antifuse circuit for skew calibration and SEM photograph of the antifuse.

The sequence of skew calibration is as follows. When the DLL is enabled by RESET for the test, nodes fd[1]–[8] and bd[1]–[8] in Fig. 9 are all fixed at the high state, because the initial program voltage is at ground level and RESET initializes nodes A and B to a fixed level. In this state, no address code can affect the fixed levels of fd[1]–[8] and bd[1]–[8]. First, the skew between the external clock and the data strobe signal is measured. From the measured skew, the optimal number of delay loads is estimated. After the program mode (PGM) is activated, the program code signifying the estimated number of delay loads is applied to the address pins and the skew is remeasured. This process is iterated, increasing or decreasing the replica delay by left-shift–right-shift (LSRS), to minimize the skew. When the skew is almost eliminated, the applied program address code is fixed and the on-chip negative voltage generator is enabled to produce the program voltage for rupturing the antifuses. The replica delay is tuned through the flow shown in Fig. 11. According to the simulation results, the programmable tuning range using the eight antifuses is from −350 to 350 ps and the minimum tuning resolution is approximately 10 ps.

V. EXPERIMENTAL RESULTS

The proposed DLL has been fabricated using a 0.16-μm DRAM process. Fig. 12 shows a microphotograph of the

Fig. 11. Flow of skew calibration after package process.
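The measure, shift, and remeasure iteration of the calibration flow can be sketched as a simple search loop. This is an illustration only: the signed integer code, the sign convention (a positive code adds replica delay and lowers the measured skew), the stopping tolerance of half a step, and the names `calibrate_skew` and `measure_skew_ps` are all assumptions, and the final antifuse rupture is only indicated by a comment.

```python
def calibrate_skew(measure_skew_ps, step_ps=10.0, max_code=35, max_iter=100):
    """Iterate a signed delay code until the measured skew is within half a step.

    measure_skew_ps(code) is a hypothetical callback that applies the code to
    the address pins in program mode and returns the remeasured skew in ps.
    """
    code = 0
    for _ in range(max_iter):
        skew = measure_skew_ps(code)
        if abs(skew) <= step_ps / 2:
            break  # skew is almost eliminated; the code would now be fixed
        # Left-shift or right-shift by one delay load, depending on the sign.
        direction = 1 if skew > 0 else -1
        new_code = max(-max_code, min(max_code, code + direction))
        if new_code == code:
            break  # tuning range exhausted
        code = new_code
    # At this point the real flow enables the on-chip negative voltage
    # generator and ruptures the antifuses selected by the final code.
    return code, measure_skew_ps(code)


if __name__ == "__main__":
    # Toy model: 55 ps of raw skew, 10 ps removed per code step.
    print(calibrate_skew(lambda c: 55.0 - 10.0 * c))
```

With a step of 10 ps and a limit of 35 steps in each direction, the toy search spans the −350 to 350 ps range quoted from simulation; the mapping from codes to individual antifuses is not modeled.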

Fig. 12. Microphotograph of the proposed DLL.

fabricated chip. The active area of the DLL occupies 0.27 mm². The loop filter consumes 50% of the total area. For high-frequency measurements of the proposed DLL, a chip-on-board (COB) has been fabricated both to reduce parasitics and to match the 50-Ω impedance of the measurement instrument. The proposed DLL operates from 42 to 400 MHz with a 2.3-V power supply. Fig. 13 shows the synchronized waveforms at


Fig. 13. Synchronized waveforms at (a) 42 MHz and (b) 400 MHz.

Fig. 14. Measured jitter characteristics at 400 MHz (a) with a quiet supply and (b) with injected 1-MHz ±300-mV square-wave noise.

42 and 400 MHz. At 400 MHz, the peak-to-peak jitter is 43 ps and the rms jitter is 4.71 ps, as shown in Fig. 14(a). When a 300-mV 1-MHz square wave is injected externally on the power supply, the peak-to-peak jitter and the rms jitter are measured to be 80 and 7.46 ps, respectively, at 400 MHz, as shown in Fig. 14(b). Fig. 15(a) shows the skew before calibration, which is composed of the phase offset of the phase detector and the replica mismatch caused by process variation, and the tuned skew after calibration. Before the calibration, the measured skew is 55 ps. After the calibration, the remeasured skew is

reduced to 9 ps with measured peak-to-peak jitter of 46 ps. In theory, error reduction resulting in negative phase shift will have increased jitter due to increased load in replica delay, but error reduction resulting in positive phase shift will have decreased jitter by decreased load in replica delay. However, from analyzing the measured results, the amount of increased jitter by negative phase reduction is insignificant compared to the reduced phase error. Fig. 15(b) shows the resolution and partial range of the skew calibration through antifuse programming. Minimum resolution is about 10 ps and total

calibration range is from −350 to 350 ps, as expected from simulation. These results show that the skew caused by on-chip or off-chip variation can be eliminated through programmable replica delays using the antifuse circuitry, and they verify that the improved skew calibration technique can effectively eliminate skew after packaging without degrading the jitter characteristic. The power dissipation of the proposed DLL is 52 mW at 400 MHz. Table I summarizes the measured characteristics of the proposed DLL.

Fig. 15. (a) Measured skew at 400 MHz before skew calibration and after skew calibration. (b) Range (full range not displayed because of tester limitations) and resolution of skew calibration.

TABLE I PERFORMANCE CHARACTERISTICS OF THE PROPOSED DLL

VI. CONCLUSION

In this paper, the dual-loop architecture with the improved skew calibration method was presented. The dual-loop architecture enabled the wide range of the DLL by using loop selection decided by the initial time difference between the reference clock and the internal clock of the DLL. Also, the improved skew calibration method demonstrated practical post-package skew calibration using the antifuse circuitry and the internal negative voltage generator. The proposed DLL, fabricated in a 0.16-μm DRAM process, achieves a wide range from 42 to 400 MHz, with 43-ps peak-to-peak jitter and 4.71-ps rms jitter at 400 MHz, and is applicable to high-speed DRAMs.

ACKNOWLEDGMENT

The authors are grateful to H. Ryu and Dr. Y. Kim for helpful discussion about COB-type PCB design.

REFERENCES

[1] A. Hatakeyama et al., “A 256-Mb SDRAM using a register-controlled digital DLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 1728–1734, Nov. 1997.
[2] Y. Okajima et al., “Digital delay-locked loop and design technique for high-speed synchronous interface,” IEICE Trans. Electron., vol. E79-C, pp. 798–807, June 1996.
[3] T. H. Lee et al., “A 2.5-V CMOS delay-locked loop for an 18-Mbit 500-Mbyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] S. Tanoi et al., “A 250–622-MHz deskew and jitter-suppressed clock buffer using two-loop architecture,” IEEE J. Solid-State Circuits, vol. 31, pp. 487–493, Apr. 1996.
[5] S. Sidiropoulos et al., “A semi-digital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[6] Y. Okuda et al., “A 66–400-MHz adaptive-lock-mode DLL circuit with duty-cycle error correction,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 37–38.
[7] C. H. Park et al., “A low-noise 900-MHz VCO in 0.6-μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.
[8] T. Yoshimura et al., “A delay-locked loop and 90-degree phase shifter for 800-Mb/s double data rate memories,” in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 66–67.
[9] J. G. Maneatis, “Low-jitter and process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1728–1732, Nov. 1998.
[10] I. A. Young et al., “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[11] F. Herzel et al., “A study of oscillator jitter due to supply and substrate noise,” IEEE Trans. Circuits Syst. II, vol. 46, pp. 56–62, Jan. 1999.
[12] T. Hamamoto et al., “A skew and jitter suppress DLL architecture for high-frequency DDR SDRAMs,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 76–77.
[13] S. Kuge et al., “A 0.18-μm 256-Mb DDR-SDRAM with low-cost postmold-tuning method for DLL replica,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2000, pp. 402–403.
[14] K. S. Min et al., “A post-package bit-repair scheme using static latches with bipolar-voltage programmable antifuse circuit for high-density DRAMs,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 67–68.

Se Jun Kim was born in Seoul, Korea, in 1974. He received the B.S. and M.S. degrees in electronics engineering from Hanyang University, Seoul, in 1998 and 2000, respectively. In 2000, he joined the Memory Research and Development Division, Hynix Semiconductor Inc., Kyungki-Do, Korea, as a Research Engineer, where he has been working on CMOS circuit and architecture for high-speed digital/analog interface. His current interests include clock recovery circuits, data converters, clock distribution, and I/O circuits for high-speed digital/analog interface.

Joo Hwan Cho was born in Seoul, Korea, in 1968. He received the B.S. degree in electronic materials engineering from Kwang-Woon University, Seoul, in 1992. He joined the Semiconductor Research and Development Center, Hynix Semiconductor Inc, Ichon-si, Kyungki-Do, Korea, in 1992. Since then, he has been working on DRAM design and failure analysis.

Pil Soo Lee was born in Seoul, Korea, in 1963. He received the B.S. and M.S. degrees from Inchon University, Korea, in 1990 and 1992, respectively. In 1993, he joined KEC, Kumi, Korea, where he worked on power device design and analysis. In 1997, he joined Hynix Semiconductor Inc., Ichon-si, Kyungki-Do, Korea, where he has been working on signal integrity analysis of high-frequency devices, circuits, and boards.

Jin Hong Ahn was born in Busan, Korea, in 1958. He received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1982 and 1984, respectively. He joined Gold-Star Semiconductor Company, Gumi, Korea, in 1984. From 1986 to 1990, he was involved in designing SRAMs and mask ROMs. In 1991, he moved to the DRAM design group, Gold-Star Electron Company, Seoul. From 1991 to 1998, he managed several generations of advanced DRAM design projects, including 64-Mb, 256-Mb, MML, and intelligent RAM. His interests in DRAM design include new DRAM architectures, next-generation DRAM circuit technologies, and low-cost DRAM design techniques. In 1999, he joined the Memory Research and Development Group, Hynix Semiconductor Inc., Ichon-si, Korea, where he was engaged in the development of 0.15-μm 256-M DRAM. He is currently a Technical Director in DRAM Design technology.

Jae-Kyung Wee was born in Seoul, Korea, in 1966. He received the B.S. degree in physics from Yonsei University, Seoul, in 1988 and the M.S. degree from Seoul National University in 1990. In August 1998, he received the Ph.D. degree in electronics engineering on modeling and characterization of interconnects for high-speed and high-density circuits from Seoul National University. In 1990, he joined Hyundai Electronic Company working on the process integration of 16-M DRAM and LOGIC devices. In 1996, he was engaged in the development of the manufacturable 0.35-μm CMOS logic technology for high-performance logic products at Hyundai Electronics. In August 1998, he became a Project Leader of the Antifuse Repair Circuit Development Team. From August 1999 to June 2000, he was a Project Leader of 1-G DDR SDRAM using 0.13-μm technology. Beginning in July 2000, he also worked on next-generation DRAM and its related systems. He is currently with the faculty of Hallym University, Chunchun-si, Kangwon-Do, Korea. His research interest is in the area of future DRAM architecture including high-speed DRAM with 200–400 MHz clock, interconnect modeling, charge pump, DLL, I/O, and module designs for high-speed chips. He holds several patents and is an author or co-author of several papers.

Jin Yong Chung received the B.S.E.E. degree from Seoul National University, Seoul, Korea, in 1974 and the M.S.E.E. degree from Korea Advanced Institute of Science and Technology, Taejon, Korea, in 1976. From 1976 to 1978, he worked for Korea Semiconductor Inc., which later became Semiconductor Business Unit of Samsung Electronics, where he was involved in the design of timepieces and custom CMOS chip designs. Since 1979, he was involved in memory design area and worked for various companies including National Semiconductor, Synertek, Vitelic, developing CMOS SRAMs, 4 K to 64 K and mask ROMs and CMOS DRAMs. In 1987, he joined LG Semiconductor, Korea, where he developed 256 K to 16 M DRAMs and other standard logic products. In 1992, he joined Mosel-Vitelic, where he developed high-speed DRAMs and the 256 K × 8 high-speed DRAM became the first semi-standard DRAM, which helped the company to go public. Since 1996, he has worked for Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do, Korea, as a Senior Vice President and Chief Architect in the Memory Research and Development Division. His current research interest is in development of ultrahigh-speed, super low-voltage and low-power memory products, novel device research in ferroelectric and magnetic memories, and new-generation 3-D devices.

Sang Hoon Hong received the B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1993. He received the M.S. and Ph.D. degrees in engineering sciences from Harvard University, Cambridge, MA, in 1998 and 2001, respectively. He is currently with the Memory Research and Development Division of Hynix Semiconductor Inc., Ichon-si, Kyongki-Do, Korea, working on high-speed dynamic memories with a particular interest in low-voltage/power circuits and architectures.



IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control

Joonsuk Lee, Student Member, IEEE, and Beomsup Kim, Senior Member, IEEE

Abstract—This paper presents a salient analog phase-locked loop (PLL) that adaptively controls the loop bandwidth according to the locking status and the amount of phase error. When the phase error is large, such as in the locking mode, the PLL increases the loop bandwidth and achieves fast locking. On the other hand, when the phase error is small, the PLL decreases the loop bandwidth and minimizes output jitter. Based on an analog recursive bandwidth control algorithm, the PLL achieves phase and frequency lock in less than 30 clock cycles without pretraining, and maintains the cycle-to-cycle jitter within 20 ps (peak-to-peak) in the tracking mode. A feed forward-type duty-cycle corrector is designed to keep a 50% duty-cycle ratio over the entire operating frequency range.

Index Terms—Adaptive bandwidth PLL, analog implementation, clock recovery, fast locking time, frequency hopping, gear-shifting algorithm, low jitter, phase-locked loops, time-varying channel.

I. INTRODUCTION

PHASE-LOCKED loops (PLL’s) have been widely used in high-speed data communication systems such as Ethernet receivers, disk drive read/write channels, digital mobile receivers, high-speed memory interfaces, and so forth, because PLL’s efficiently perform clock recovery or clock generation at relatively low cost. The PLL’s used in such systems are required to generate low-noise or low-jitter clock signals and at the same time need to achieve fast locking. Conventional analog PLL’s in clock recovery applications use a narrow-band loop filter to reduce output jitter at the expense of a longer locking time. In order to improve the locking-time characteristics, digital or hybrid analog/digital PLL’s with a loop bandwidth stepping capability have been studied [1], [2]. Since the stepping hardware is implemented with complex digital building blocks, these PLL’s usually suffer from high power dissipation, low operating speed, and large die size. In order to reduce power consumption and die size, simpler algorithms such as a gear-shifting or a lock-detection algorithm were attempted [3], [4]. The PLL’s with such algorithms control the loop bandwidth according to a charge-pump current control sequence prestored in memory during the start-up mode. However, in clock recovery applications such as HDD and DVD, where the channel characteristics vary in time, the prestored control sequence cannot make the PLL’s respond properly to unpredictable phase fluctuation, instant frequency shift, and time-varying jitter because the sequence was calculated with preknown fixed noise statistics. Discrete-time PLL’s, which are programmed on DSP processors, based on a recursive least squares (RLS) algorithm [5] or the Kalman filter algorithm [6] can respond to such unpredictable jitter variations, but require an enormous amount of hardware. The outputs generated from the discrete-time PLL’s are in the digital domain, and therefore the discrete-time PLL’s require digital-to-analog converters (DAC) and an analog-to-digital converter (ADC) to sample input signals for detection. The slow signal-processing speed of the digital-to-analog conversion in the discrete-time PLL’s limits the operating frequency and confines the use of these PLL’s to applications dealing with low-frequency signals, such as digital wireless base stations.

This paper presents a new analog adaptive PLL (AAPLL) architecture capable of varying the loop bandwidth according to an adaptively updated control sequence under a time-varying noise environment. Since the control sequence is generated by analog signal processing, the PLL operates at several hundred megahertz and can be easily modified to run at gigahertz frequency ranges. This paper consists of five sections including the present section. Section II describes the AAPLL architecture and the analog adaptive bandwidth-control algorithm. Stability and jitter analysis for the AAPLL are given in Section III. AAPLL locking behaviors are also discussed in this section. Section IV shows the AAPLL IC implementation and measurement results. Finally, a brief summary of this paper is given in Section V.

Manuscript received October 29, 1999; revised February 23, 2000. J. Lee was with the Boston Design Center, IBM Microelectronics, Lowell, MA 01851 USA. He is now with the Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea. B. Kim is with the Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(00)06435-0.

II. RECURSIVE EQUATION AND ANALOG LOOP BANDWIDTH CONTROLLER

In this section, a recursive bandwidth update algorithm for the analog adaptive controller and its implementation are described.

A. Adaptive Bandwidth Control

As mentioned in the introduction, a common approach to improve the locking speed of a PLL is to use a gear-shifting method for loop bandwidth control. In such a PLL, when fast locking is required, as in the initial frequency/phase acquisition mode, the loop bandwidth of the PLL is expanded by increasing the charge-pump current or the phase detector gain [2]–[4]. Zero phase start (ZPS) is also helpful to reduce the phase acquisition time [4], but it is limited to the case where initial-frequency locking has already been established. For the case where rapid initial-frequency locking is required, various techniques with a prestored gear-shifting sequence have been studied [4],

0018–9200/00$10.00 © 2000 IEEE


[5]. However, in the case where the channel characteristics vary in time, such as in a disk drive, the prestored gear-shifting sequence is not helpful. Unpredictable phase fluctuation, instant frequency shift, and varying input jitter force such a PLL to use an indefinitely wide loop bandwidth in order not to lose lock. Although a discrete-time adaptive PLL can adjust the bandwidth according to the input noise statistics, it still requires complex hardware and its applications are limited to low-frequency systems [5].

Fig. 1. Linearized model of a CP-PLL.

Fig. 2. Conceptual diagram of an analog adaptive controller.

A linearized model of a charge-pump PLL (CP-PLL) is shown in Fig. 1. The transfer function in the z domain is represented by (1). Here, the phase detector gain and the voltage-controlled oscillator (VCO) gain enter together with the loop filter, for which a simple passive low-pass filter is assumed. The resulting gain quantity is called the PLL loop gain $K$. Discrete-time PLL’s that have an adaptive stepping capability can control the loop bandwidth by a loop-gain update equation minimizing the RLS error [7]. However, it is difficult to fully implement the update equation used in the discrete-time PLL because it requires a significant amount of die area and power. A simpler but still effective loop-gain update equation is required and is used in the proposed AAPLL. The update equation is given by (2), in terms of the loop gain, which is proportional to the loop bandwidth.1

$$K(n+1) = \lambda K(n) + \mu\,|\theta_e(n)| \qquad (2)$$

Here, $\lambda$ is a forgetting factor that has a positive value close to but less than unity, $\mu$ is a coefficient that normalizes and converts the absolute value of the input–output phase error from radians to a dimensionless number, and $\theta_e(n)$ is the input–output phase error at time $n$. The loop gain is calculated recursively according to (2). When the input–output phase error becomes zero, the forgetting factor makes the loop gain converge to zero as the discrete time $n$ increases. Equation (2) weights the most recent input–output phase error most heavily. This recursive relation is similar to the RLS algorithms commonly used for estimators [8]. The loop gain $K(n+1)$ at time $n+1$ is calculated as the weighted sum of the present loop gain $K(n)$ and the absolute value of the present input–output phase error $|\theta_e(n)|$ at time $n$. The equation indicates that the loop gain, and thus the loop bandwidth, grows rapidly when the recent absolute phase errors become large, which greatly improves the PLL loop tracking capability. When the recent absolute phase errors become small, the loop gain shrinks, as does the loop bandwidth, because the first part of (2) dominates. The reduced loop bandwidth improves the PLL’s input jitter rejection capability. Therefore, (2) provides the necessary loop bandwidth control in the presence of unpredictable jitter variation.

B. Analog Adaptive Bandwidth Control

Equation (2) is realized by a CP-PLL with a small amount of extra hardware. The second term of (2), an absolute phase error, is obtained from the outputs of a phase frequency detector (PFD). Since the PFD and the following charge-pump circuit generate up/down current signals proportional to the input–output phase difference, simply combining these up/down signals through an OR gate gives the absolute phase error signal at time $n$. Fig. 2 shows a conceptual diagram of how the recursive loop gain of the AAPLL is calculated in the controller. The bandwidth voltage becomes the bias voltage of the following current source in the charge-pump circuit and controls the amount of charge-pump current. The current switch steers a current proportional to the phase error and increases the voltage across the capacitor by the corresponding amount at a constant rate, while the resistor exponentially discharges the capacitor. The resistor and the capacitor realize the first part of (2) with the forgetting factor $\lambda$. As derived in the Appendix, in the steady state the voltage across the capacitor at time $n$ is given by (3). Here, the forgetting factor $\lambda$ is set by the resistor and the capacitor, and the current appearing in (3) is the amount of the charging current in the controller. The loop gain is asymptotically proportional to the bandwidth voltage governed by (3) because the charge-pump current is directly controlled by this voltage. This means that the bandwidth of the AAPLL follows (2).

1Here, the loop bandwidth $W$ and the loop gain $K$ have the following relationship: $K = W/f$.

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP

Fig. 3. AAPLL total block diagram.

III. CIRCUIT IMPLEMENTATION

This section describes the circuit implementation of the adaptive bandwidth controller, the charge-pump circuit, the VCO, and the duty-cycle correction circuit. Fig. 3 shows the overall block diagram of the AAPLL, which modifies a conventional PLL by attaching an analog adaptive bandwidth-controlling block. Because the change is minor, the AAPLL is easily applicable to various PLL applications and still takes advantage of the full adaptability.
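The recursive loop-gain update of Section II can be explored numerically with a short sketch. It assumes the update has the form of a forgetting term plus a scaled absolute phase error, as the prose describes; the function names and the values `forgetting=0.9` and `scale=0.05` are hypothetical and are not taken from the paper.

```python
def update_loop_gain(gain, phase_error, forgetting=0.9, scale=0.05):
    """One step of the assumed update: forgetting * gain + scale * |phase error|."""
    return forgetting * gain + scale * abs(phase_error)


def simulate(phase_errors, initial_gain=0.0, forgetting=0.9, scale=0.05):
    """Return the loop gain after each sample of the phase-error sequence."""
    gains = []
    gain = initial_gain
    for error in phase_errors:
        gain = update_loop_gain(gain, error, forgetting, scale)
        gains.append(gain)
    return gains


if __name__ == "__main__":
    # Acquisition (large error for 20 samples) followed by tracking (zero error).
    errors = [3.0] * 20 + [0.0] * 40
    gains = simulate(errors)
    print("gain at end of acquisition:", round(gains[19], 4))
    print("gain at end of tracking:   ", round(gains[-1], 4))
```

The gain rises while the error is large and decays geometrically once the error vanishes, which matches the behavior the text attributes to the forgetting factor. For a constant error magnitude the recursion settles at scale × |error| / (1 − forgetting).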

Fig. 4. PFD schematic with simplified TSPC D-flip–flops.

Fig. 5. Up/Down and phase-error signal diagram.
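The relation between the up/down pulses and the OR-gated phase-error signal can be sketched with an idealized set/reset PFD model. The model assumes that both pulses end together when the PFD resets, so the OR output is as wide as the longer pulse; the sign convention, the 4-ns period (about 250 MHz), and the 0.2-ns reset delay are illustrative assumptions, and the function names are hypothetical.

```python
import math


def pfd_pulses(phase_error_rad, period, reset_delay):
    """Return (up_width, down_width) for an idealized set/reset PFD.

    Assumption: a positive phase error lengthens the up pulse, a negative
    one lengthens the down pulse, and both pulses include the reset delay.
    """
    lead_time = abs(phase_error_rad) / (2 * math.pi) * period
    up = reset_delay + (lead_time if phase_error_rad > 0 else 0.0)
    down = reset_delay + (lead_time if phase_error_rad < 0 else 0.0)
    return up, down


def or_pulse_width(up, down):
    """Both pulses end at the PFD reset, so their OR is as wide as the longer one."""
    return max(up, down)


if __name__ == "__main__":
    period = 4e-9        # about 250 MHz, as in the DVD clock example
    reset_delay = 0.2e-9  # hypothetical PFD reset delay
    for error in (-math.pi / 2, 0.0, math.pi / 2):
        width = or_pulse_width(*pfd_pulses(error, period, reset_delay))
        print(f"phase error {error:+.3f} rad -> OR pulse {width * 1e9:.2f} ns")
```

In this model the OR pulse depends only on the magnitude of the phase error, and it does not vanish at zero error because of the reset delay, which is one way to picture a residual minimum bandwidth in lock.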

A. Adaptive Bandwidth Controller and Charge Pump

A well-designed PFD is used as the phase-detecting block instead of a mixer, even though the input signal frequency is high, in order to achieve a wideband capturing capability. The PFD shown in Fig. 4 consists of two simplified true single-phase clock (TSPC) D-flip–flops and one NOR gate. Since the input frequency of the AAPLL is selected to recover the clock signal in DVD systems, whose clock frequency is about 250 MHz, the PFD should generate up and down signals at such a high speed. In order to minimize abnormal operation of the PFD, TSPC D-flip–flops are used as leaf cells since their intrinsic delays are smaller than those of conventional ones. The adaptive bandwidth controller shown in Figs. 3 and 6 consists of an OR gate and a differential switch, which takes the differential signals from the OR gate. The OR gate sums the up and down signals generated from the PFD and gives the absolute phase error. The differential switch controls the current paths to the bandwidth capacitor according to the phase difference for one clock period. When the phase difference signal is mostly on over a period, such as in the initial-phase acquisition state, the charging rate of the capacitor exceeds the discharging rate. Hence the capacitor voltage and the pumping current in the charge pump increase. As a result, the phase detector gain increases and so does the loop bandwidth. On the other hand, when the phase error signal is off for the most part


Fig. 6. Adaptive bandwidth controller and CP schematics.

of one period, such as in the tracking state, the discharging rate exceeds the charging rate and the capacitor voltage decreases. Therefore both the phase-detector gain and the loop bandwidth decrease. In the steady-state tracking mode, the AAPLL loop bandwidth can be very narrow because the phase error becomes zero. However, the AAPLL still maintains the minimum loop bandwidth even in such a case because of the up/down signals generated from the set/reset type PFD, as shown in Fig. 4. In the zero-phase-error and perfect locking case in Fig. 5, the OR-gated effective phase-error signal can still supply currents to the bandwidth capacitor. The statistical variation of the input signal also contributes to maintain this minimum bandwidth. Fig. 5 shows the relation between the phase error and the bandwidth control of the AAPLL. Fig. 6 shows the circuit diagram of the analog adaptive bandwidth controller with a charge-pump circuit. As mentioned beacross the capacitor in parallel fore, the voltage controls the phase-detection gain . In with a resistor ,a order to control the discharging rate of the capacitor , as shown in Fig. 7, is voltage-controlled resistor (VCR) used. The VCR is designed to have fully linear I–V characteristics for a given power supply range. By adjusting the magnitude of the bias current in the VCR branches, the resistance of is changed and so does the discharging rate of the capac. itor is connected to the gates of nMOS The output node of transistors, and controls the charge-pump current by adjusting and in Fig. 6. The charge pump consists the bias point of of two differential input stages, a mirror stage, an output stage, and two small extra current sources. These two small current and sources help the rapid turn-on/off operation for MOS , . The differential PFD signals drive the charge-pump inputs. When the down signal goes high, the current controlled by the is drawn from the loop voltage of the bandwidth capacitor filter. 
When the up signal goes high, the same amount of current is supplied to the loop filter. B. Voltage-Controlled Oscillator (VCO) A four-stage VCO as shown in Fig. 8 is used for the AAPLL. The basic delay cell consists of six transistors. The cross-coupled and , guarantee the differential operapMOS transistors, tion of the delay cell without a tail-current bias. Auxiliary pMOS and , control the oscillation frequency. Unlike transistors,

Fig. 7. VCR schematic.

conventional differential VCO’s with a current bias, this VCO allows the AAPLL to operate under a single 1.5-V power supply, consuming 1.5 mA. Because the output signal of the VCO swings rail-to-rail, no additional level shifter with a carefully designed replica bias circuit is required to generate CMOS-level outputs. The latch, configured with the pMOS transistors, sharpens the edges of the output signal so that the added noise has little chance of being converted into jitter. This latch therefore helps reduce the VCO jitter [9], [10].

C. Duty-Cycle Corrector

Maintaining a 50% duty cycle for a clock signal is extremely important in most high-speed clock recovery and clock generation applications because several systems, such as double-data-rate (DDR) SDRAM’s and pipelined microprocessors, use the negative transition edges of a clock signal in order to increase total system throughput. This is often achieved by running a VCO at twice the desired clock frequency and then dividing the VCO frequency by 2. Other approaches use a feedback-type duty-cycle corrector. Since precise placement of the falling edge between two successive rising edges of the VCO output signal is generally controlled by an additional feedback loop, such duty-cycle correctors require an extra training period to stabilize the feedback loop. The AAPLL uses a feed-forward-type duty-cycle corrector instead of the feedback type in order to eliminate the extra feedback hardware and the training period, as shown in Fig. 9(a). The duty-cycle corrector utilizes multiphase signals generated from a multistage differential VCO. One signal in Fig. 9(b), selected from the multiphase signals, turns on the MOS transistors

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP


Fig. 8. Four-stage VCO.

, and charges the output node of the duty-cycle corrector almost instantaneously, because the discharge path of the node is already off. The second signal, which is also selected from the multiphase signals, is the one whose rising edge is shifted by 180° in phase from that of the first. Similarly, the second signal rapidly discharges the node and delivers the desired 50% duty-cycle signal. Since this duty-cycle correction circuit consists of only two transmission gates and two inverters, the silicon area is minimal and the power consumption is negligible. In HSPICE simulation, the proposed duty-cycle corrector keeps the output duty cycle almost perfectly at 50% with the input duty cycle varying from 10 to 90%.

Fig. 9. Feed-forward-type duty-cycle corrector. (a) Duty-cycle corrector schematic. (b) Conceptual diagram of the correcting operation.

Fig. 10. Stability diagram for loop gain K.

IV. ANALYSIS AND SIMULATION

In this section, the stability of the AAPLL is analyzed for the adaptively generated loop-gain sequence, and behavioral simulation results for fast lock and large jitter reduction are described.

A. Stability

Since the AAPLL automatically changes the loop bandwidth, a careful loop stability analysis is required. As mentioned in the previous section, an analog adaptive controller adjusts the phase-detector gain of the CP-PLL. Therefore, the stability of the PLL must first be checked for each phase-detector gain. A complete stability analysis of the CP-PLL is cumbersome because a PLL operates in both a linear and a nonlinear region. A simplified stability analysis for a second-order CP-PLL [8] is used in this section. When the criterion is extended to include the logic delay effect, it can be expressed as

(4)

Here, the three parameters are the clock period, the logic delay, and the RC time constant of the loop filter, respectively. The stability limit of the AAPLL is derived and simulated for the loop gain using this criterion, as shown in Fig. 10. The loop-gain sequence adaptively generated by the recursive equation is also shown in the same figure to verify the AAPLL stability. The sequence converges to the minimum bandwidth, and the magnitude of this bandwidth is close to that derived from the MMSE criterion [4]. Equation (4) can be rewritten to obtain the stability criterion for the bandwidth voltage by solving a MOS I–V


characteristic equation for the corresponding transistor in Fig. 6, assuming a saturation condition.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

Fig. 11. Stability limit graph for bandwidth voltage V.

Fig. 12. Simulation setup for the locking behavior measurement.

(5)

Here, the parameters include the VCO gain, the nMOS threshold voltage, and the size of the resistor of the loop filter in Fig. 3, together with the size of the transistor. The stability limit for the bandwidth voltage, which is one of the observable values in the measurement setup, is visualized in Fig. 11 for various resistor and capacitor values. The figure shows that all the sequences are within the stable region for the various resistor and capacitor values.


B. Output Jitter

Recently, it was reported that a CP-PLL has an optimum loop bandwidth that generates minimum jitter in the steady state [11]. A clean tone, which is assumed to have only a noise floor and no random-walk phase noise, is used as the reference signal for the jitter derivation. Because the AAPLL eventually achieves steady-state locking with a clean reference signal like other conventional PLL’s, the output cycle-to-cycle jitter of the AAPLL can be calculated by

Fig. 13. Simulation results for the locking behavior of the AAPLL and conventional ones. (a) Fixed narrow-bandwidth PLL. (b) Fixed wide-bandwidth PLL.

(6)

Here, the four terms are the internal jitter from the VCO, the jitter of the input signal, the rms value of the charge-pump current variation, and the rms value of the VCO control voltage noise in the steady state, respectively.

C. Behavioral Simulation of a Locking Feature

Closed-form analysis of the locking behavior of the AAPLL is difficult because of its nonlinear operation. In this paper, a simulation-based approach like the Monte Carlo method is used instead. The AAPLL is modeled in a SPICE circuit simulator and

extensively tested by the circuit simulator. Fig. 12 shows the simulation setup for the AAPLL. Fig. 13 compares the simulated locking behavior of the AAPLL with that of a conventional PLL. The bandwidths of the conventional PLL are selected to have two typical values. One is optimized for the initial locking, and the other for the steady-state tracking. The gray line in Fig. 13(a) indicates an incoming signal in the phase domain. The solid line in the figures shows the phase of the AAPLL output signal from initial locking to steady-state tracking. The phase variation of the conventional PLL optimized for steady-state tracking with a narrow bandwidth is shown in the same figure as a dashed line.


Fig. 14. Micrograph of the fabricated AAPLL chip.

Fig. 15. Control voltage change for a 0–250-MHz frequency input.

Fig. 16. Loop bandwidth voltage change for a 0–250-MHz frequency input.

Fig. 17. Experimental results of the locking for a 150–200-MHz input signal.

Fig. 18. Experimental results of the locking for a 180–220-MHz input signal by four steps.

In Fig. 13(b), the phase change of the conventional PLL optimized for initial locking is also shown as a dashed line. This simulation result reveals several characteristics of the AAPLL. The AAPLL controlled by the recursive algorithm achieves fast lock in the initial locking period, comparable to the speed obtained from a wide-bandwidth PLL, because the consecutive error signals rapidly increase the loop bandwidth of the AAPLL. In the steady-state tracking mode, the AAPLL substantially rejects the input jitter due to the narrower loop bandwidth.

V. EXPERIMENTAL RESULTS

The AAPLL is fabricated in a 0.6-µm single-poly triple-metal n-well CMOS process [12]. The die size of the AAPLL is 0.11 mm². The total power consumption is less than 15 mW with a single 3-V supply. A microphotograph of the AAPLL

Fig. 19. 2.544-ps (rms) and 20-ps (peak-to-peak) cycle-to-cycle jitter at 250-MHz input.

is shown in Fig. 14. To obtain the desired forgetting factor, a kΩ-range resistor and a 20-pF capacitor are used, and a 100-µA current is selected to set the proportional coefficient. The locking-speed measurement is carried out using an abrupt change of the input signal frequency from 0 to 250 MHz. Fig. 15 shows the corresponding


Fig. 20. 50% duty-cycle correction operation over the entire frequency range.

Fig. 22. Comparison between recently reported PLL’s and DLL’s and this work.

TABLE I AAPLL CHARACTERISTICS SUMMARY

Fig. 21. VCO linearity.

VCO control-voltage variations. In order to measure the locking speed precisely, the running cycles of the output waveform are counted from the frequency triggering point in the initial locking state. The AAPLL requires less than 30 clock cycles for both frequency and phase lock in this case. The voltage of the bandwidth capacitor, representing the AAPLL bandwidth, is also measured. Fig. 16 shows the measured voltage and illustrates the adaptation of the loop bandwidth in the AAPLL. Figs. 17 and 18 show the measured control voltages and the corresponding output signals when the input frequency varies from 150 to 200 MHz and from 180 to 220 MHz in four steps, respectively. In this case, frequency and phase locking require less than 10 symbol periods because the frequency steps are much smaller than in the previous case. The measured cycle-to-cycle jitter of the AAPLL output signal at a 250-MHz input signal is 2.54 ps (rms) and 20 ps (peak-to-peak), as depicted in Fig. 19. This jitter value includes the inherent jitter of the measurement setup [13]. In order to test the performance of the duty-cycle correction circuit, the duty-cycle ratio of the output signal is measured with input signals from 90 to 260 MHz. Fig. 20 shows the measured duty-cycle ratio. This result indicates that the feed-forward-type duty-cycle corrector maintains

50% duty-cycle ratio within 2% error over this range. Fig. 21 shows a VCO linearity diagram. The VCO gain is about 100 MHz/V at a 250-MHz input frequency. The AAPLL operates from 80 to 290 MHz with a 3-V supply voltage. Fig. 22 compares the normalized peak-to-peak jitter and the lock time of the AAPLL with those of recently reported PLL’s and DLL’s. Measured characteristics are summarized in Table I.

VI. CONCLUSION

This paper presents the design of a 250-MHz low-jitter fast-lock analog adaptive bandwidth-controlled PLL on a single chip. The chip is implemented in a 0.6-µm standard CMOS process. Simple recursive control logic is proposed to control the bandwidth effectively. The measured locking time is less than 10 cycles for a 10-MHz frequency step and less than 30 cycles from an unknown frequency signal to the 250-MHz signal. The measured output jitter is 2.6 ps (rms) and 20 ps (peak-to-peak). All the components are designed


using analog techniques, and hence the required die size and the power consumption are minimal.

APPENDIX

As shown in Fig. 2, the OR gate gives the control signal for the switch according to the phase-error signal in the bandwidth controller. When the phase error of the signal is high, the controller signal from the OR gate feeds current to the bandwidth capacitor, and the voltage across the capacitor increases at a constant rate. As a result, the bandwidth voltage increases in proportion to the normalized phase error. After the charging process, the controller signal from the OR gate disconnects the path from the current source and connects the capacitor to the resistor, so the capacitor discharges through the resistor. The switching action occurs every clock cycle period. The voltage of the bandwidth capacitor at a given time can be written as

(7)

where the remaining term is the capacitor voltage at the previous time step. The voltage equation can be simplified to (8).

[8] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall, 1995.
[9] T. C. Weigandt, B. Kim, and P. R. Gray, “Analysis of timing jitter in CMOS ring oscillators,” in Proc. Int. Symp. Circuit and Systems, vol. 4, London, U.K., June 1994, pp. 27–30.
[10] C. H. Park and B. Kim, “A low-noise 900-MHz VCO in 0.6-µm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 586–591, May 1999.
[11] K. Lim, C. H. Park, and B. Kim, “Low noise clock synthesizer design using optimal bandwidth,” in Proc. Int. Symp. Circuit and Systems, Monterey, CA, June 1998, pp. 163–166.
[12] J. Lee and B. Kim, “A 250 MHz low jitter adaptive bandwidth PLL,” ISSCC Dig. Tech. Papers, pp. 346–347, Feb. 1999.
[13] J. McNeil, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.

Joonsuk Lee (S’99) received the B.S. and M.S. degrees in electrical engineering and computer sciences from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1995 and 1997, respectively. Since 1997 he has been working toward the Ph.D. degree at the same university. From 1999 to 2000, he was with IBM Microelectronics, Boston, MA, as an Analog and Mixed Signal Designer involved in a high performance sigma–delta ADC/DAC project with Motorola, Lowell, MA. His research interests include PLL/DLL, timing recovery algorithms, high-speed SDRAM interface, and LAN and mixed-mode signal processing technique for telecommunication IC’s. Mr. Lee is the Gold Medal winner of the Human-Tech Thesis Prize from Samsung Electronics Co. Ltd. in 1997, the Gold Medal winner of the Chip Design Contest from LG Semicon Co. Ltd. in 1998, and the Gold Medal winner of the Integrated Design Center (IDEC) Award in 1998.

(8)

In the initial locking mode, the AAPLL performs the locking operation based on (8). Once the AAPLL has finished the phase and frequency locking, the phase error is very small. In this case, the forgetting factor and the proportional coefficient can be replaced by their small-error values.
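The charge/discharge behavior described in the Appendix can be illustrated with a small behavioral sketch. It is not the authors' equations (7) and (8), which are not legible here; it only assumes that, within each clock period, the capacitor charges at the constant rate I/C while the OR-gated error signal is high and then discharges through the resistor for the rest of the period. The 20-pF capacitor, 100-µA current, and 250-MHz clock come from the text; the 10-kΩ resistor value and all function names are hypothetical.

```python
import math

# Values taken from the text: 20-pF bandwidth capacitor, 100-uA charging
# current, 250-MHz clock. R_BW is a hypothetical value chosen for illustration.
C_BW = 20e-12
I_CHG = 100e-6
R_BW = 10e3
T_CLK = 1 / 250e6


def bandwidth_voltage_step(v_prev, err_frac):
    """Advance the capacitor voltage by one clock period.

    err_frac is the fraction of the period (0 <= err_frac < 1) during which
    the OR-gated phase-error signal is high (assumed normalization).
    """
    t_on = err_frac * T_CLK
    # Constant-current charging while the error signal is high.
    v_charged = v_prev + I_CHG * t_on / C_BW
    # RC discharge for the remainder of the period.
    return v_charged * math.exp(-(T_CLK - t_on) / (R_BW * C_BW))


def simulate(err_fracs, v0=0.0):
    """Return the capacitor voltage after each clock period."""
    v, trace = v0, []
    for e in err_fracs:
        v = bandwidth_voltage_step(v, e)
        trace.append(v)
    return trace


def steady_state_voltage(err_frac):
    """Fixed point of the recursion for a constant error fraction."""
    t_on = err_frac * T_CLK
    gain = I_CHG * t_on / C_BW
    decay = math.exp(-(T_CLK - t_on) / (R_BW * C_BW))
    return gain * decay / (1.0 - decay)


if __name__ == "__main__":
    # Large errors during acquisition, small residual errors once locked.
    trace = simulate([0.5] * 30 + [0.05] * 2000)
    print(f"after acquisition: {trace[29]:.3f} V, in lock: {trace[-1]:.3f} V")
```

Under these assumptions the decay factor plays the role of the forgetting factor and the charging term plays the role of the proportional coefficient: consecutive large errors raise the bandwidth voltage quickly, and small residual errors let it settle to a low, nonzero value.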

REFERENCES

[1] J. Dunning et al., “An all-digital phase-locked loop with 50-cycle lock time suitable for high-performance microprocessors,” IEEE J. Solid-State Circuits, vol. 30, pp. 412–422, Apr. 1995.
[2] B. Kim, D. N. Helman, and P. R. Gray, “A 30-MHz hybrid analog/digital clock recovery circuit in 2-µm CMOS,” IEEE J. Solid-State Circuits, vol. 25, pp. 1385–1394, Dec. 1990.
[3] M. Mizuno et al., “A 0.18-µm CMOS hot-standby phase-locked loop using a noise immune adaptive-gain voltage-controlled oscillator,” ISSCC Dig. Tech. Papers, pp. 268–269, Feb. 1995.
[4] G. Roh, Y. Lee, and B. Kim, “An optimum phase-acquisition technique for charge-pump phase-locked loops,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 729–740, Sept. 1997.
[5] B. Chun, Y. Lee, and B. Kim, “Design of variable loop gain of dual-loop DPLL,” IEEE Trans. Commun., vol. 45, pp. 1520–1522, Dec. 1997.
[6] P. F. Driessen, “DPLL bit synchronizer with rapid acquisition using adaptive Kalman filtering techniques,” IEEE Trans. Commun., vol. 42, pp. 2673–2675, Sept. 1994.
[7] B. Kim, “Dual-loop DPLL gear-shifting algorithm for fast synchronization,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 577–586, July 1997.

Beomsup Kim (S’87–M’90–SM’95) received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1983 and 1985, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1990. From 1986 to 1990, he worked as a Graduate Researcher and Graduate Instructor at Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. From 1990 to 1991, he was with Chips and Technologies, Inc., San Jose, CA, where he was involved in designing high speed-signal processing IC’s for disk drive read/write channels. From 1991 to 1993, he was with Philips Research, Palo Alto, CA, where he was conducting research on digital signal processing for video, wireless communication, and disk drive applications. During 1994, he was a Consultant, developing the partial-response maximum likelihood detection scheme of the disk drive read/write channel. In 1994, he became an Assistant Professor with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, and is currently an Associate Professor. During 1999, he took a sabbatical leave and stayed at Stanford University, Stanford, CA, and also consulted for Marvell Semiconductor Inc., San Jose, CA, on the Gigabit Ethernet and wireless LAN DSP architecture. His research interests include mixed-mode signal processing IC design for telecommunications, disk drive, local area network, high-speed analog IC design, and VLSI system design. Dr. Kim is a corecipient of the Best Paper Award (1990–1991) for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, and received the Philips Employee Reward in 1992. Between June 1993 and June 1995, he served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

A Portable Digital DLL for High-Speed CMOS Interface Circuits

Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau, Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE, Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz

Abstract: A digital delay-locked loop (DLL) that achieves infinite phase range and 40-ps worst case phase resolution at 400 MHz was developed in a 3.3-V, 0.4-µm standard CMOS process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This more easily process-portable DLL achieves jitter performance comparable to a more complex analog DLL when placed into identical high-speed interface circuits fabricated on the same test-chip die. At 400 MHz, the digital DLL provides <250 ps peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V, where it dissipates 60 mW. The DLL occupies 0.96 mm².

Index Terms: Delay circuits, delay-locked loops (DLL’s), digital control, digital DLL, phase blending, phase control, phase synchronization.

Manuscript received September 15, 1998; revised December 23, 1998. B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc., Mountain View, CA 94040 USA. T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305 USA. Publisher Item Identifier S 0018-9200(99)03668-9.

1 Documentation is available at http://www.rambus.com/html/direct_documentation.html.

I. INTRODUCTION

IN RECENT years, there has been a great deal of interest in delay-locked loops (DLL’s) for clock alignment. Both analog and digital DLL’s have been developed [1]–[6], with analog loops generally providing better jitter performance at the expense of greater complexity. This paper describes a digital DLL that achieves jitter performance comparable to an analog DLL. Although the digital DLL uses more area and power than the analog DLL, its greater simplicity, easier portability, and lower minimum required supply voltage make it very attractive in many clock alignment applications. Additionally, the digital DLL not only operates at lower supply voltages than the analog DLL but also demonstrates that digital DLL’s have the potential for good power-consumption scaling as supply voltage is decreased.

The motivation for the development of this digital DLL was the need for a clock alignment circuit for use in the CMOS interface cells [6] of a high-speed memory system as in [7].¹ The memory system operates at 400 MHz, with data transferred on both edges of the clock, producing an effective 800-Mb/s/pin transfer rate. This corresponds to a 1.25-ns bit time. With such tight timing requirements, it becomes imperative to include clock alignment circuits in

the interface cells to provide internal on-chip clocks that are aligned in phase with an external system clock. The clock alignment circuits must provide a phase resolution better than 50 ps and produce a worst case long-term jitter of less than 250 ps peak-to-peak (p–p). To facilitate the use of many different application-specific integrated-circuit controllers with the memory system, the clock alignment circuit should be easily portable across multiple processes without compromising performance.

The clock alignment function can be provided using either phase-locked loops (PLL’s) or DLL’s. Because frequency synthesis is not needed in this application, DLL’s are preferred for their unconditional stability, lower phase-error accumulation, and faster locking time. In previous designs of the interface cells for this memory system, we have used an analog DLL with a two-step coarse/fine architecture. A high-level drawing of this approach is shown in Fig. 1. This analog DLL includes a quadrature generator, which produces four reference signals spaced 90° apart in phase to evenly cover the full 360° of phase space. A phase interpolator circuit in the analog DLL receives these reference signals and selects a phase-adjacent pair that defines a phase quadrant for interpolation to produce an output signal phase-aligned to a reference signal, RefClk.

Analog DLL’s constructed with this approach provide several significant benefits. Because most of the elements in the signal path can be made from differential analog blocks with good power-supply rejection ratio (PSRR), the analog DLL architecture of Fig. 1 can provide very good jitter performance. Additionally, it can be carefully designed to occupy relatively little area and consume relatively little current. Furthermore, the analog DLL can provide very small phase steps when locked ( 50 ps). Finally, the architecture of Fig. 1 provides infinite phase range, and one set of quadrature reference signals can be fed to multiple phase interpolators, allowing phase alignment to multiple reference signals simultaneously. However, because of the relatively high analog complexity of this DLL and its individual elements, the analog DLL of Fig. 1 requires a detailed, process-specific implementation, making it relatively labor intensive to port across multiple processes.

Although we have traditionally used analog DLL’s to provide the clock alignment function in the CMOS interface cells of the memory system described above, we decided to consider using a digital DLL. Digital DLL’s are characterized by their use of a digital delay line and are typically made from

0018–9200/99$10.00 © 1999 IEEE

GARLEPP et al.: PORTABLE DIGITAL DLL


Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

simple, digital circuit elements. This facilitates their design and portability across multiple processes. Additionally, because phase information in a digital DLL is stored as a digital state, digital DLL’s can provide very fast timing recovery after being placed into a low power mode. However, conventional digital DLL’s provide only moderate phase resolution and jitter performance [8], [9].

Another benefit of digital DLL’s is their ability to readily operate at lower voltages than analog DLL’s. Because analog DLL’s require the use of saturated current sources, they experience voltage headroom problems as supply voltages decrease. Digital DLL’s, on the other hand, need only enough voltage to ensure the proper operation of their digital gate elements. For the same reason, digital DLL’s better utilize the power-saving benefits of digital CMOS voltage scaling than analog DLL’s. The power of an analog DLL is typically distributed between IV power (where I is current and V is voltage) from the constant current (differential) stages and CV²f power (where C is capacitance and f is frequency) from the CMOS (single-ended) stages (if any). The power of digital DLL’s, on the other hand, is determined primarily by CV²f power, which decreases quadratically with supply voltage.

This paper describes a digital DLL [10] used as the clock alignment circuit in the CMOS interface cells of a high-speed memory system. This work improves upon the performance of previous digital DLL’s by paralleling the two-step coarse/fine analog DLL architectures presented in [4], [5], [7], and [11], allowing the digital DLL to achieve jitter performance comparable to the analog DLL’s.

This paper is arranged as follows. Section II describes delay-generation techniques used in conventional digital DLL’s and describes the improved techniques implemented in the new DLL. This section also describes infinite phase generation with the new delay-line scheme. Section III describes several new circuit techniques used for enhancing the phase resolution and signal quality in the new digital DLL. Section IV describes the overall DLL architecture. Section V discusses our test chip and measured results, with special attention given to making a direct, side-by-side comparison of the new digital DLL with an analog DLL placed into identical

CMOS interface cells on the same test-chip die. Section VI concludes this paper.

The terms phase and delay are used throughout this paper to describe the DLL’s operation. It is helpful to recall that at a given system frequency, the two quantities are related by the simple equation

$$\phi = 360^\circ \cdot t_D \cdot f \qquad (1)$$

where $\phi$ is phase in degrees, $t_D$ is delay in seconds, and $f$ is frequency in hertz.
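As a quick numerical check of the phase/delay relation, the following sketch converts between the two quantities at the 400-MHz system frequency quoted in the Introduction. The function names are illustrative only.

```python
def delay_to_phase_deg(delay_s, freq_hz):
    """Phase in degrees corresponding to a delay at a given frequency."""
    return 360.0 * delay_s * freq_hz


def phase_to_delay_s(phase_deg, freq_hz):
    """Delay in seconds corresponding to a phase at a given frequency."""
    return phase_deg / (360.0 * freq_hz)


if __name__ == "__main__":
    f_sys = 400e6  # system clock frequency from the text
    # The 50-ps resolution requirement expressed as a phase step.
    print(delay_to_phase_deg(50e-12, f_sys), "degrees")
    # Half a clock cycle expressed as a delay (the bit time).
    print(phase_to_delay_s(180.0, f_sys), "seconds")
```

At 400 MHz, half a cycle (180°) corresponds to the 1.25-ns bit time mentioned earlier, and the 50-ps resolution target is a small fraction of the 360° phase plane.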

II. DIGITAL DELAY CIRCUIT TECHNIQUES

A. Conventional Digital Delay Lines

As mentioned above, the purpose of a DLL in a clock alignment application is to provide an output clock signal that is aligned in phase with a reference clock signal of the same frequency. To do this, the DLL must include a mechanism for providing a variable delay to an input signal. The DLL then adjusts this variable delay such that the input signal passes through the delay mechanism and emerges at the output of the DLL aligned in phase with the reference signal.

Digital DLL’s generally incorporate a tapped digital delay line as the variable-delay mechanism. The delay line receives an input clock signal (e.g., a buffered version of the reference signal) and passes it through a series of delay elements. The outputs of the delay elements are tapped and buffered to provide a series of phase-adjacent signals. The DLL then selects the delay-line tap whose signal produces an output with a phase that most closely matches the desired phase.

A conventional delay line suitable for a CMOS digital DLL is shown in Fig. 2. The delay elements could be implemented with almost any circuit block, but because the phase resolution of the delay line is determined by the delay through the delay elements, delay elements that provide minimal delay are generally preferred. Thus, the delay line of Fig. 2 uses inverters, since they provide the shortest delay of any CMOS digital gate. Because of the inverting characteristic of all standard CMOS gates, the delay line is tapped only at every other inverter


Fig. 2. Conventional digital delay line with inverter delay elements.

Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

output to ensure that each successive tap provides a signal that is adjacent in phase to the signals at its adjacent taps.

Although conventional delay lines are attractive for their simplicity, DLL’s designed around such conventional delay lines suffer from several significant limitations. First, the delay line provides fairly coarse phase resolution. For example, the delay line in Fig. 2 provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution is not fine enough for our clock alignment application. Second, conventional delay lines deliver only a finite phase range. Typically, in order to cover at least one full cycle of phase, the delay-line length and element delays are adjusted to provide at least 360° of phase under the fastest process, voltage, and temperature (PVT) conditions and minimum operating frequency. More often, however, the delay line is designed with as much as 720° (i.e., two cycles) of phase under these conditions. This requires the use of a long delay line, occupying a large silicon area and dissipating additional power as the input signal propagates through the many delay elements. Additionally, because inverters offer poor PSRR, voltage supply noise-induced jitter can accumulate as the signal propagates down the delay line. This causes the signals available from the later taps in the delay line to be more jitter prone than the signals from the earlier taps. Last, even with an extended delay line, the DLL can nonetheless run out of phase range and lose lock in a system with slowly drifting phase (e.g., spread-spectrum clocking). These limitations prohibited the use of a conventional delay line in our DLL design.
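The resolution and range limits of a conventional tapped delay line can be made concrete with a small sketch. It assumes an idealized line in which every inverter has the same delay and taps are taken every two inverters; the 100-ps and 300-ps inverter delays are the process extremes quoted later in the paper, and the function names are illustrative only.

```python
import math


def tap_step_deg(inverter_delay_s, freq_hz, inverters_per_tap=2):
    """Phase step between adjacent taps, in degrees."""
    return 360.0 * inverters_per_tap * inverter_delay_s * freq_hz


def taps_for_full_cycle(inverter_delay_s, freq_hz, inverters_per_tap=2):
    """Number of taps needed to span at least 360 degrees."""
    step = tap_step_deg(inverter_delay_s, freq_hz, inverters_per_tap)
    return math.ceil(360.0 / step)


def select_tap(target_deg, n_taps, step_deg):
    """Index of the tap whose phase is closest to the target phase."""
    def circular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(range(n_taps),
               key=lambda k: circular_distance(k * step_deg, target_deg))


if __name__ == "__main__":
    f_sys = 400e6
    for t_inv in (100e-12, 300e-12):  # fast and slow inverter delays
        step = tap_step_deg(t_inv, f_sys)
        n = taps_for_full_cycle(t_inv, f_sys)
        print(f"t_inv={t_inv:.0e} s: step={step:.1f} deg, taps={n}")
```

Even at the fast corner the tap spacing is many times the 50-ps requirement, and the line must be sized for the fast corner, so at slower corners most of its length goes unused.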

B. Delay-Line Improvements

To overcome some of these limitations, we developed a complementary delay line as shown in Fig. 3 for our DLL. In this architecture, two parallel delay lines with weak cross coupling are driven by complementary input signals ClkIn and ClkInb. Because of the use of complementary inputs, the two delay lines are tapped after every inverter to provide phase-adjacent signals separated by only one inverter delay, thereby improving the phase resolution by a factor of two. An example of how this delay-line scheme provides single inverter delay resolution is shown by the shaded paths in Fig. 3. The signal that emerges from Tap 2 has passed through three inverter delays, while the signal that emerges from Tap 3 has passed through four inverter delays. However, ClkInb is exactly 180° out of phase with ClkIn, providing the additional inversion required to ensure that the signals emerging from Taps 2 and 3 are indeed separated in phase by exactly one inverter delay.

This complementary delay-line architecture also allows the delay lines to be made shorter. The true taps from the delay line can provide the first 180° of phase, while the complement taps can provide the second 180° of phase. Thus, each of the two delay lines can be tuned for only 180° of phase under the fastest PVT conditions and minimum operating frequency. Shorter delay lines provide the additional benefits of reduced maximum jitter accumulation, smaller silicon area, and lower power consumption. The problem that this design creates is a need to determine when to switch from the true taps to the complement taps and vice versa to ensure full and even coverage of the entire 360° phase plane. This is particularly important because the number of delay elements (and output taps) needed to cover 180° changes with PVT conditions and operating frequency.

C. Infinite Phase Generation

To solve the problem of determining when to switch between the true and complement taps of the complementary delay line, we developed an end-of-cycle (EOC) detector, as shown in Fig. 4, for use with the complementary delay line. An EOC detector is essentially a bank of data flip-flops arranged as a time-to-digital converter for measuring the delay through the delay line. The EOC detector produces a thermometer code


Fig. 4. EOC detector circuit (180°).

In other words, to travel counterclockwise around the phase plane, the DLL would successively select Taps 1–4, then Taps 1b–4b, then Taps 1–4, etc., to provide infinite phase range. In this manner, all phase steps are equivalent to at most one inverter delay (i.e., 50°), except for the Tap 4 to Tap 1b and the Tap 4b to Tap 1 transitions, which are less (30°).

III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES

A. Phase Blending

Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay = 50°.

indicating the first 180° of delay in the delay lines. The first state transition in the EOC code indicates the first true tap from the delay line that provides a signal with phase that lags the phase of the signal from Tap 1 by more than 180°. With this information, the DLL logic knows when to switch between the true and complement taps of the delay line to ensure full coverage of all 360° of phase space, with phase steps of at most one inverter delay. Use of the EOC code also prevents negative phase steps in the phase-transfer function as taps are successively selected from the delay line. This allows the complementary delay lines to provide infinite, monotonic phase range for the DLL. The clocking signal for the EOC detector, SampClk, is synchronized to the signal from Tap 1 by a replica timing network (not shown).

To illustrate the principle of infinite phase generation using the EOC code with this delay-line scheme, refer to Fig. 5, which shows a phasor diagram of the signals from the first five true and complement taps of a complementary delay line like the one shown in Fig. 3. The figure assumes that the PVT conditions and operating frequency are such that the propagation delay of each inverter stage is equal to 50° of phase. In the figure, the solid lines correspond to signals from the true taps, while dashed lines correspond to signals from the complement taps. Because Tap 5 delivers a signal that is delayed by 200° from the signal at Tap 1, the EOC detector’s thermometer code would indicate that Tap 5 is the first true tap to provide a signal with phase beyond 180° relative to the signal from Tap 1. With this information, the DLL knows to switch between the true and complement taps after four stages.
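The Fig. 5 example can be reproduced with a short behavioral sketch. It models only the idealized phases, not the flip-flop circuit: the bit polarity of the thermometer code and all function names are assumptions made for illustration, and every inverter is taken to contribute exactly 50° as in the text.

```python
def eoc_thermometer(n_taps, inverter_deg):
    """Idealized EOC code: bit k is 1 once true tap k+1 lags Tap 1 by more
    than 180 degrees (the bit polarity is an assumption of this sketch)."""
    return [1 if k * inverter_deg > 180.0 else 0 for k in range(n_taps)]


def taps_before_switch(code):
    """Number of taps usable before switching to the other delay line."""
    return code.index(1) if 1 in code else len(code)


def phase_sequence(n_use, inverter_deg, n_steps):
    """Tap labels and unwrapped phases when stepping around the phase plane.

    True taps cover the first half cycle, complement taps (suffix 'b') the
    second half cycle, then the pattern repeats.
    """
    seq = []
    for i in range(n_steps):
        half_cycle, k = divmod(i, n_use)
        label = f"Tap {k + 1}" + ("b" if half_cycle % 2 else "")
        seq.append((label, half_cycle * 180.0 + k * inverter_deg))
    return seq


if __name__ == "__main__":
    code = eoc_thermometer(5, 50.0)
    n_use = taps_before_switch(code)
    for label, phase in phase_sequence(n_use, 50.0, 9):
        print(f"{label:7s} {phase:6.1f} deg")
```

With a 50° inverter delay the sketch switches lines after four taps, and the unwrapped phase increases monotonically in steps of 50° except at the two line-to-line transitions, where the step is 30°, matching the behavior described for Fig. 5.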

Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved its jitter accumulation performance, enabled infinite phase range, and improved the available phase resolution by a factor of two, this phase resolution was still not good enough to meet the requirements of our memory system. In the 0.4-µm process we used, the propagation delay of one inverter over all anticipated PVT conditions varied from 100 to 300 ps. This is much larger than the worst case phase step specification of 50 ps. Therefore, to ensure compliance with this specification, the DLL’s phase resolution needed to be improved by at least a factor of six over what the delay line provided. To solve this problem, we used inverter phase blending.

A simple, single-stage phase-blender circuit is shown in Fig. 6(a). This circuit receives two phase-adjacent input signals, which are separated in phase by one inverter delay. The phase blender directly passes these two signals with a simple delay to produce two output signals. However, it also uses a pair of phase-blending inverters to interpolate between the two input signals to produce a third output signal, having a phase between that of the other two outputs. This effectively doubles the available phase resolution. However, it is not sufficient to use equal-sized inverters for the phase blending. Fig. 6(b) illustrates a simple model [12] used for determining the ideal relative sizes of the two phase-blending inverters to ensure that the phase of the blended output lies directly between that of the two passed outputs. The model approximates the two inverters with two simple switched current sources sharing a common resistance–capacitance (RC) load. For two rising-edge input signals separated in time by Δt, the model yields the equation

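$$V_O(t) = V_{DD} + I R \left[ w\,u(t)\left(e^{-t/RC} - 1\right) + (1 - w)\,u(t - \Delta t)\left(e^{-(t - \Delta t)/RC} - 1\right) \right] \qquad (2)$$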


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model for w = W_A/(W_A + W_B) = 0.50, (d) phase-blender output signal edges for w = 0.50, and (e) phase-blender output signal edges for w = 0.60.

where R is the total resistive load, C is the output capacitance, I is the total pulldown current of the two phase-blending inverters, u(t) is the unit step function, and w is the phase-blending inverter relative size ratio [refer to Fig. 6(a), where w is the ratio of the device widths in inverter A to the total device widths in both inverters A and B]. Equation (2) is the sum of two decaying exponential terms, and Fig. 6(c) shows a plot of the resulting waveform according to this equation for the case where w = 0.50. Because the second exponential term is delayed in time by Δt relative to the first, it only begins to affect the slope of the decay after this delay has elapsed. Therefore, without explicitly solving the equation for each case of w and Δt, it is not obvious when the output will cross the switching threshold.

For input signals separated in phase by one inverter delay, the model specifies that in order to ensure that the phase of the blended output lies directly in between that of the two passed outputs, the phase-blending inverters must be sized in an unequal ratio, such that the leading phase is coupled to an inverter that is bigger than the one that receives the lagging phase. This ratio was also confirmed empirically with simulations. The effect of the relative sizing of the phase-blending inverters is illustrated in Fig. 6(d) and (e), which show the resulting output signal edges for w = 0.50 and w = 0.60, respectively. Clearly, the phase of the blended output signal is closer to that of the lagging output than to that of the leading output when the phase-blending inverter size ratio is w = 0.50. Although asymmetrical inverter sizing ensures good, evenly spaced edge placement of the three output signals, it requires that the input coupled to the larger inverter lead the other input. Reversing the phase of these two input signals would result in a severely misplaced blended output, since the effective sizing ratio would then be 1 − w.
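The sizing argument can be checked numerically with a small sketch of the two-current-source RC model. The form used here is an assumption consistent with the definitions in the text (two step-triggered current sources of strength w·I and (1 − w)·I discharging a shared RC load); the output is normalized so that I·R equals the full swing and the switching threshold is half the swing. The function names and the example values of RC and Δt are hypothetical.

```python
import math


def v_out(t, w, dt, tau):
    """Normalized output of the assumed model: starts at 1 and decays toward 0.
    The source of weight w switches at t = 0, the other at t = dt."""
    def rise(x):
        return 1.0 - math.exp(-x / tau) if x > 0 else 0.0
    return 1.0 - (w * rise(t) + (1.0 - w) * rise(t - dt))


def crossing_time(w, dt, tau, level=0.5):
    """Bisection for the time at which the output falls through `level`."""
    lo, hi = 0.0, dt + 20.0 * tau
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v_out(mid, w, dt, tau) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def centering_ratio(dt, tau):
    """Closed-form w that centers the blended edge in this model.
    Derived for the case dt <= 2*tau*ln(2), where the crossing occurs
    after the second source has switched."""
    return 1.0 / (1.0 + math.exp(-dt / (2.0 * tau)))


if __name__ == "__main__":
    tau, dt = 1.0, 0.8                      # illustrative values only
    t_lead = crossing_time(1.0, dt, tau)    # edge driven by the leading input alone
    t_lag = crossing_time(0.0, dt, tau)     # edge driven by the lagging input alone
    t_mid = 0.5 * (t_lead + t_lag)
    w_star = centering_ratio(dt, tau)
    print("equal sizing misses the midpoint by", crossing_time(0.5, dt, tau) - t_mid)
    print("centering ratio w =", w_star)
    print("residual with that ratio:", crossing_time(w_star, dt, tau) - t_mid)
```

In this simplified model, equal sizing always places the blended edge later than the midpoint, and the ratio that centers it is larger than 0.5 and grows with Δt/RC. The exact value needed in a real circuit depends on details that the model leaves out.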

Another design constraint of the phase-blender circuit is that all paths through the circuit must provide precisely the same loading and delay to ensure that the phase relationship between the two input signals is maintained by the two passed output signals.

The phase-blender idea can be extended to multiple cascaded stages for further phase-resolution improvement, with each additional stage improving the resolution by a factor of two. Fig. 7 shows a two-stage cascaded phase-blender circuit that provides a 4x improvement in phase resolution from input to output. Although it is theoretically possible to increase phase resolution indefinitely by adding more and more phase-blender stages, there is a practical limit. The number of inverters in each signal path increases by two with each additional phase-blending stage, making the circuit increasingly susceptible to voltage supply noise-induced jitter due to the additional delay in the signal path. Therefore, it is prudent to increase the number of blending stages to improve phase resolution only until the output phase step size from the phase blender is approximately equivalent to the anticipated voltage supply noise-induced jitter.

There are several design limitations that must be considered when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending inverters grows with the number of cascaded blending stages because edge misplacement has a compounding effect as the signals travel through the multiple stages. Additionally, close attention must be paid to ensuring equal loading for equal delay through all paths, requiring the use of dummy devices on otherwise unbalanced paths. Finally, like a single-stage phase blender, a cascaded phase blender also requires the phase of one designated input to lead that of the other to ensure even output phase spacing.
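The trade-off between blending stages and phase step can be made concrete with a short calculation. The sketch below uses the inverter delay range (100 to 300 ps) and the 50-ps step specification quoted earlier; the helper name is hypothetical, and the calculation assumes ideal, evenly spaced blending with each stage exactly halving the step.

```python
import math


def blender_stages_needed(max_inv_delay_ps, max_step_ps):
    """Smallest number of 2x blending stages that brings the worst-case
    coarse step (one inverter delay) down to the allowed phase step."""
    ratio = max_inv_delay_ps / max_step_ps
    return max(0, math.ceil(math.log2(ratio)))


if __name__ == "__main__":
    stages = blender_stages_needed(300.0, 50.0)
    improvement = 2 ** stages
    print("stages:", stages, "resolution improvement:", improvement)
    print("worst-case step (ps):", 300.0 / improvement)
    print("best-case step (ps):", 100.0 / improvement)
    print("extra inverters per signal path:", 2 * stages)
```

Under these assumptions a factor of six cannot be reached with two stages (4x), so three stages (8x) are the minimum, at the cost of six additional inverters in each signal path.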

GARLEPP et al.: PORTABLE DIGITAL DLL


Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.

Fig. 8. Three-stage, symmetrical phase-blender circuit.

To overcome these design limitations of the cascaded phase blender, we developed a symmetrical phase blender. A block diagram of a three-stage symmetrical phase blender is shown in Fig. 8. This circuit is essentially two parallel cascaded phase-blender circuits, sharing some common paths. When the first input leads the second, one set of outputs provides equal output phase spacing. When the second input leads the first, the other set of outputs provides equal output phase spacing. Therefore, the circuit provides phase blending with an 8x improvement in phase resolution and equally spaced output signals regardless of which input signal leads in phase.

Additionally, the symmetrical blender allows for seamless input switching for continuous phase blending over multiple input delays. For example, assume that the first input leads in phase. Beginning with the output aligned to the first input, successive outputs can be selected to evenly span the phase range between the two inputs. Once the output aligned to the second input is selected, the first input can be changed to another signal that lags the second. This switching is possible without affecting the selected output signal because that output has no dependence on or coupling from the first input. Then outputs can again be successively selected to evenly span the phase range between the second input and the new first input. Once the output aligned to the first input is selected, the second input can be changed to yet another signal that lags the first. Again, this is possible without any change in the selected output signal because it has no dependence on or coupling from the second input. This process can continue indefinitely. Also, because all paths through the symmetrical phase blender are inherently balanced, no dummy devices are needed.
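The seamless input-switching procedure can be sketched as a small state machine. This is an idealized illustration only: the input names A and B, the tap indexing, and the assumption of perfectly linear blending are hypothetical, and the wraparound between true and complement taps is ignored.

```python
def walk_phases(tap_delay_deg, n_fine, n_steps):
    """Advance the output phase one fine step at a time.

    Inputs A and B of the blender each hold one coarse tap. Fine position j
    runs from 0 (output aligned with A) to n_fine (output aligned with B).
    When j reaches one end, the input at the other end is moved to the next
    later tap, which cannot disturb the currently selected output.
    """
    a_tap, b_tap = 0, 1
    j, direction = 0, +1
    phases = []
    for _ in range(n_steps):
        pa, pb = a_tap * tap_delay_deg, b_tap * tap_delay_deg
        phases.append(pa + (pb - pa) * j / n_fine)
        if j + direction > n_fine:      # output sits on B: re-point A past B
            a_tap = b_tap + 1
            direction = -1
        elif j + direction < 0:         # output sits on A: re-point B past A
            b_tap = a_tap + 1
            direction = +1
        j += direction
    return phases


if __name__ == "__main__":
    out = walk_phases(tap_delay_deg=50.0, n_fine=8, n_steps=20)
    print([round(p, 2) for p in out])
```

In this model every step equals one-eighth of the coarse tap spacing, including the steps taken immediately after an input has been switched, which is the property that allows the blending to continue indefinitely.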


Fig. 9. (a) A 16:1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.

B. Signal Selection and Duty-Cycle Correction

Since the digital DLL was to be placed into a memory system that exchanges data on both edges of the clock, good duty cycle (i.e., close to 50%) is required to ensure that the data exchanged on either edge of the clock have equal bit times. Duty-cycle distortion is usually addressed in PLL’s by simply running the PLL’s voltage-controlled oscillator (VCO) at twice the system frequency and using a postdivider triggered on one edge of the VCO output to produce the output clock from the PLL [13]–[15]. This ensures good, 50% duty cycle. In a DLL, however, no frequency multiplication is possible. The duty cycle of the output signal must be directly corrected to 50%, for example, by using a duty-cycle correcting amplifier in the signal path as in Fig. 1 and in [4].

Although duty-cycle correction can be addressed by placing a duty-cycle corrector at the output of the DLL, this approach has several limitations. First, since duty cycle is corrected only at the output of the DLL, internal DLL signals may have poor duty cycle. It is good practice, however, to maintain 50% duty cycle throughout the signal path to maximize signal propagation as frequency is increased. Second, performing all the duty-cycle correction in one stage at the output of the DLL places a great deal of strain on the duty-cycle correcting circuit; it must have a large duty-cycle correction range to compensate for all the duty-cycle distortion that can accumulate in the signal path. Finally, adding a duty-cycle corrector directly into the signal path increases signal path delay, and thus susceptibility to voltage supply noise-induced jitter.

To address the issue of duty cycle, we developed the idea of duty-cycle correcting multiplexers. Since multiplexers would be needed in our DLL regardless, by adding duty-cycle correcting functionality to the multiplexing circuitry, we implemented duty-cycle correction while requiring minimal additional power, area, and delay. A 16:1 duty-cycle correcting multiplexer is shown in Fig. 9(a) with a corresponding control circuit in Fig. 9(b). To facilitate understanding of this circuit’s operation, consider an example. Assume that one input signal is selected and has duty-cycle distortion such that the multiplexer output signal has a high duty cycle. Assume also that the output is sensed by a duty-cycle error detector, which produces a differential output error signal proportional to the difference in duty cycle between the output and the ideal 50%. Thus, in our example, one side of the differential error signal will be greater than the other, causing more current to be steered through the right branch of the control circuit in Fig. 9(b) than through the left branch. This in turn increases the strength of one pair of correction transistors compared to the complementary pair in the duty-cycle correcting multiplexer of Fig. 9(a). These transistors alter the duty cycle of the signal as it passes from input to output, driving it toward the ideal 50% duty cycle. The use of both PMOS and NMOS devices to perform the duty-cycle correction ensures a symmetrical duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through two stages, the requirements on each individual duty-cycle correcting stage are reduced. By combining both necessary functions of signal selection and duty-cycle correction, this circuit minimizes signal path delay, jitter accumulation, circuit area, and power compared to performing both functions separately.

IV. DLL ARCHITECTURE

Fig. 10 is a block diagram of the entire digital DLL, with shading indicating the circuit blocks that were described in

Fig. 10. Complete block diagram of the new digital DLL.

greater detail above. The DLL receives an input clock ExtClk and passes it through a clock amplifier and splitter to provide the two complementary input signals (ClkIn and ClkInb) to a 16-stage, 32-tap complementary delay line with EOC detector. The delay line provides 32 signals at its output taps, which then feed into two 32:1 duty-cycle correcting multiplexers. Each multiplexer selects one of a pair of phase-adjacent signals from the delay line. The two selected signals then pass to a three-stage, 2:16 symmetrical phase-blender circuit, which improves the phase resolution by a factor of eight. A final 16:1 duty-cycle correcting multiplexer selects one of the phase-blender output signals and passes it through a clock tree to provide the DLL’s output signal ClkOut. The digital DLL also includes two independent duty-cycle correction loops as shown in the figure. By using two separated duty-cycle correcting loops, duty-cycle correction is distributed throughout the signal path. This ensures a good duty cycle throughout the signal path and reduces the duty-cycle correcting requirements of any one stage.

The DLL uses bang-bang-type, all-digital feedback to lock the phase of its output signal ClkOut to that of a reference signal RefClk. A phase detector compares the phase of ClkOut to RefClk and produces a binary error signal, which passes through an optional digital filter to a control logic circuit. The digital filter is a simple majority detector, which has no effect when the loop is acquiring lock but reduces dithering once lock is acquired. The control logic is composed of simple combinational logic and counters that drive the multiplexers to select the two phase-adjacent coarse phase signals from the delay line and the fine phase signal from the phase blender that minimize the phase error between ClkOut and RefClk.
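The behavior of a bang-bang digital loop of this kind can be illustrated with a minimal behavioral sketch. It assumes a noiseless binary phase detector and a uniform phase step, and it omits the optional majority filter; the function name and the numeric values are hypothetical and are not taken from the measured design.

```python
def bang_bang_lock(target_ps, step_ps, n_cycles, start_ps=0.0):
    """Step the output delay toward the reference using only early/late decisions.

    Each cycle the phase detector reports a single bit (ClkOut late or not),
    and the control counter moves the selected phase by exactly one step.
    """
    phase = start_ps
    history = []
    for _ in range(n_cycles):
        late = phase > target_ps        # binary phase-detector output
        phase += -step_ps if late else step_ps
        history.append(phase)
    return history


if __name__ == "__main__":
    trace = bang_bang_lock(target_ps=1234.0, step_ps=20.0, n_cycles=100)
    print("first updates:", trace[:4])
    print("locked, dithering between:", sorted(set(trace[-10:])))
```

Once the loop reaches the target it toggles between the two phase settings that bracket the reference, so the residual error is bounded by one phase step. This dithering is the behavior a majority filter is intended to reduce.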
Because the phase information is stored in this DLL as a digital state, the DLL can quickly recover from low-power modes, requiring only enough time for the signals to propagate

Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL of [6] and on the right side (b) the new digital DLL integrated into identical interface cells.

through the signal path of the circuit from ExtClk to ClkOut to provide a phase-locked output signal.

It is important to recognize the role of the EOC detector and code in this architecture. Because the delay line and blender are uncontrolled, open-loop circuits, the architecture relies on the control circuit’s use of the EOC code to ensure proper coarse phase selection, small maximum phase step size, and phase transfer function monotonicity. The EOC code enables the control logic to determine when to switch between the true and complement taps of the delay line to ensure that phase-adjacent taps are always selected by the coarse multiplexers for the phase blender. The EOC code also enables the control logic to determine which set of blender taps provides evenly spaced output signals.


Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

V. MEASURED PERFORMANCE

A. Test Chip

Both the digital DLL presented here and an implementation of the analog DLL of Donnelly et al. [6] were integrated into identical high-speed CMOS interface cells on opposite sides of a single test chip. A micrograph of this test chip is shown in Fig. 11. The test chip I/O was laid out symmetrically so that either interface cell could be tested on the same hardware by simply removing the test chip from the test socket, rotating it 180°, and reinserting it into the socket. This allowed a true side-by-side comparison of the two DLL’s operating in a system. The test-chip circuits were fabricated using a standard 0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.

B. Test Results

Unless indicated otherwise, all test results described in this section were measured with the analog and digital DLL’s operating in their respective high-speed interface cells at 3.3 V and 400 MHz (800 Mb/s/pin) using the same test vectors. Additionally, the test chip included noise-generator circuits, which produced digital switching noise during the testing of both interfaces.

Fig. 12(a) and (b) shows eye diagrams of the two interfaces with the analog and digital DLL’s, respectively. The diagrams indicate the output timing performance of the interface cells in the test system. Although the interface with the analog DLL provided slightly better timing performance, 320 ps p–p versus 380 ps p–p for the interface with the digital DLL, the performances of both interfaces (and therefore, both DLL’s) were comparable. This is surprisingly good considering the extensive use of poor PSRR elements, such as inverters, in

the signal path of the digital DLL. (Note: I/O circuit duty-cycle distortion produced the unequal eyes in both diagrams. This is unrelated to the DLL’s.)

Fig. 13(a) and (b) shows receive shmoo diagrams for the two interfaces with the analog and digital DLL’s, respectively. The diagrams indicate the CMOS interfaces’ valid timing windows for receiving data. On the diagrams, one axis is supply voltage (2.5 V to 4.0 V) while the other axis indicates input data positioning along a bit period (1.25 ns at 800 Mb/s). The normal data position is in the center of the bit period. A black dot in the diagram indicates incorrectly received data for that combination of bit position and supply voltage. Ideally, the window should be entirely white, but realistically, it is limited by jitter from the DLL and other sources. Therefore, this test measures the amount of tolerable skew on the input timing over a range of supply voltages. Although the interface with the analog DLL delivers better timing performance than the interface with the digital DLL (1.02 versus 0.92 ns), both meet the component specification of 0.85 ns.

Fig. 14 is a circle plot of the measured phase of the DLL’s output signal ClkOut, illustrating the DLL’s ability to provide infinite phase range. One axis indicates delay [or phase, as in (1)] of the ClkOut signal relative to a fixed 400-MHz signal. The other axis indicates cycle count. These data were measured by probing the on-chip DLL output signal (ClkOut) and forcing the DLL’s phase-detector output low. This caused the DLL’s output phase to continually advance over time. The term circle plot is used because this diagram is equivalent to sweeping a phasor that represents the phase of ClkOut around the phase plane, thereby drawing a circle in the phase plane. Because the phase of ClkOut is measured relative to a fixed 400-MHz signal, the plotted delay appears modulo 2.5 ns, the clock period

Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.

at 400 MHz. The absolute value of delay (i.e., from 3.4 to 5.9 ns) is irrelevant since it includes some test-system setup time. The data were measured and plotted using a time-interval analyzer. The circle plot illustrates the DLL’s phase transfer function, showing its reasonably good linearity, monotonicity, and lack of discontinuities. The small bumps in the transfer function indicate a change in coarse reference phase selected from

the delay line. The slope of the transfer function depends on PVT conditions and system frequency, since these conditions determine how many delay-line taps are required to provide 180° of phase. In this case, nine taps were required, resulting in an average phase step size of 20 ps or 2.9°.

Table I presents a summary of many of the measured and simulated results of the analog and digital DLL’s operating in their respective CMOS interfaces. Although the analog DLL

uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller maximum phase step), both DLL’s enable the interface cells to meet the component requirements when operating in the test system. Additionally, the digital DLL has a higher maximum operating frequency, works at lower supply voltages, and requires much less effort to port to other processes (one versus four man-months).

TABLE I. ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz

Fig. 15. Measured DLL power consumption (a) as a function of frequency for V_DD = 3.3 V and (b) as a function of supply voltage for f = 400 MHz.

Fig. 15(a) and (b) shows plots of measured DLL power versus frequency at V_DD = 3.3 V and measured DLL power versus supply voltage at f = 400 MHz, respectively. Although both plots show that the digital DLL dissipated more power than the analog DLL for all measured conditions, the plots illustrate the different characteristics of the power consumed by the two DLL’s. As mentioned earlier, the power of both DLL’s is distributed between IV power in the constant-current stages and CV²f power in the CMOS stages. The curves in Fig. 15(a) show that the digital DLL’s power dissipation has a greater dependence on frequency than does the analog DLL’s power. The curves in Fig. 15(b) show that the digital DLL’s power dissipation has a predominantly square-law dependence on supply voltage, whereas the analog DLL’s power dissipation has a mixed square-law and linear dependence. These trends confirm that the power of the analog DLL has a relatively higher IV term, whereas the power of the digital DLL has a relatively higher CV²f term. This indicates that digital DLL’s have the potential for providing better power scaling than analog DLL’s as supply voltages decrease in the future.

Finally, we have shown in Table I and in Fig. 15(b) that the digital DLL operates at lower supply voltages than the analog DLL. Although the operation of the digital DLL was limited to 1.7 V, this limitation was due to our use of several analog elements in the digital DLL (i.e., it was a mostly digital DLL). The digital DLL used an analog clock amplifier, two analog duty-cycle error detectors (see Fig. 10), and an analog quadrature phase detector (in a second loop, not shown). Using an analog design for these circuit blocks in the digital DLL was faster to implement without preventing evaluation of the key digital blocks in the DLL, but their use determined the minimum supply voltage of the digital DLL.

VI. CONCLUSION

We have described the architecture of a portable digital DLL and demonstrated that it provides jitter performance comparable to an analog DLL when fabricated in the same 3.3-V, 0.4-µm standard CMOS process. Several circuits were developed to enable the DLL to provide very fine phase resolution, infinite phase range, and good duty-cycle performance throughout the signal path. Despite its relatively simple architecture, the digital DLL meets all system specifications, and it operates down to lower supply voltages than its analog counterpart. Utilizing essentially only simple digital CMOS gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an alternative to analog DLL’s for clock alignment applications.

ACKNOWLEDGMENT

The authors thank J. McBride and P. Gordon for layout support and S. Sidiropoulos for helpful insights.

REFERENCES

[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, “Multifrequency zero-jitter delay-locked loop,” IEEE J.
Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.

[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, “Skew minimization techniques for 256 Mb synchronous DRAM and beyond,” in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.

[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, “A 256 Mb SDRAM using register-controlled digital DLL,” in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.

[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.

[5] S. Sidiropoulos and M. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.

[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, “A 660MB/s interface megacell portable circuit in 0.3 µm–0.7 µm CMOS ASIC,” IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.

[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, “A 500-Megabyte/s data-rate 4.5M DRAM,” IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.

[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, “A 256 Mb SDRAM with subthreshold leakage current suppression,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.

[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay,” in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.

[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, “A portable digital DLL architecture for CMOS interface circuits,” in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.

[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, “A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.

[12] S. Sidiropoulos, “High-performance interchip signalling,” Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.

[13] I. Young, M. Mar, and B. Bhushan, “A 0.35 µm CMOS 3-880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors,” in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.

[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.

[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, “A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO,” in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.


Kevin S. Donnelly (A’93) was born in Los Angeles, CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992. He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog circuits for disk-drive read/write and servo channels. In 1992, he joined Rambus, Inc., Mountain View, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization, and high-speed I/O circuits. He currently manages a group developing I/O circuits and PLL’s. His interests include PLL’s and DLL’s, I/O circuits, and data converters. He is a Member of the ISSCC Digital Subcommittee. He has received several circuit design patents. Mr. Donnelly is a coauthor of the paper that won the Best Paper Award at the 1994 ISSCC.

Jun Kim was born in Tokyo, Japan, on November 14, 1966. He received the B.S.E.E. degree from the University of California, Berkeley, in 1989. From 1989 to 1991, he was with Vitelic, Inc., where he worked on SRAM and DRAM development. Between 1991 and 1994, he was with Sun Microsystems, where he was involved in microprocessor and digital circuit design. Since 1994, he has been with Rambus, Inc., Mountain View, CA, as a Designer of high-speed CMOS I/O and DLL circuits.

Pak S. Chau was born in Hong Kong in 1966. He received the B.S. degree in computer system engineering from the University of Massachusetts, Amherst, in 1989 and the M.S. degree in electrical engineering from the University of California, Davis, in 1991. He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing high-speed I/O and DLL circuits.

Jared L. Zerbe was born in New York, NY, in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987. He joined VLSI Technology, Inc., in 1987, where he worked on semicustom ASIC design. In 1989, he joined MIPS Computer Systems, where he designed high-performance floating-point blocks. Since 1992, he has been with Rambus Inc., Mountain View, CA, where he has specialized in the design of high-speed I/O and PLL/DLL clock recovery and data synchronization circuits.

Bruno W. Garlepp was born in Bahia, Brazil, on October 29, 1970. He received the B.S.E.E. degree from the University of California, Los Angeles, in 1993 and the M.S.E.E. degree from Stanford University, Stanford, CA, in 1995. In 1993, he joined the Hughes Aircraft Advanced Circuits Technology Center, Torrance, CA. There, he designed high-precision analog integrated circuits for A/D applications, as well as CMOS, bipolar, and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc., Mountain View, CA, where he designs and develops high-speed CMOS clocking and I/O circuits for synchronous chip-to-chip communication.

Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou, China, in 1982 and the M.S. degree in electrical engineering from the University of Arkansas, Fayetteville, in 1990. He was with ULSI and SGI, working in the area of PLL and cache circuit design. He joined Rambus, Inc., Mountain View, CA, in 1994, where he has been engaged in high-speed CMOS DLL and I/O circuit design.


Chanh V. Tran was born in Vietnam in 1964. He received the B.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1989. From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked on CMOS mixed-signal IC design in the Data Acquisition Group. In 1992, he joined Rambus Inc., Mountain View, CA, where he has been involved in DLL and high-speed I/O design.

Clemenz L. Portmann (S’92–M’95) received the B.S.E.E. degree from the University of Washington, Seattle, in 1986, the M.S.E.E. degree from the University of Hawaii at Manoa, Honolulu, in 1988, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1995. From 1988 to 1989, he was a Visiting Researcher at Nagoya University, Nagoya, Japan, and the Toyohashi University of Technology, Toyohashi, Japan, under the Monbusho (Ministry of Education) scholarship program. From 1989 to 1990, he was a Design Engineer for VLSI Technology, Inc., San Jose, CA, where he designed standard cell libraries and SRAM’s for ASIC designs. In 1995, he joined Rambus, Inc., Mountain View, CA, where he is engaged in the design of high-speed I/O circuits and DLL’s for DRAM interfaces.

Yiu-Fai Chan (S’76–M’78) received the B.S. and M.S. degrees in electrical engineering and computer science (with highest honors) from the University of California (UC), Berkeley, in 1972 and 1973, respectively. He joined Rambus, Inc., Mountain View, CA, in 1992, where he is Director of Engineering, responsible for the development, application engineering, and customer support of high-speed mixed-signal circuits, device packaging, signal integrity, and system engineering. Prior to that, he was with Tera Microsystems in charge of developing chips for workstations based on the Sparc architecture. He was with Altera Corp. from 1983 to 1990, where he led a team of engineers to develop the industry’s first CMOS programmable logic devices. From 1976 to 1983, he held various technical and management positions at Intersil, Inc. (later a division of General Electric), where he was engaged in the development of various CMOS memories, microprocessors, and peripheral devices. It was there that he developed the first EPROM devices in CMOS technology. From 1974 to 1976, he designed calculator and TV game integrated circuits at National Semiconductor. He has received several patents in circuits and systems technologies. Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa Nu. He received the University Science Fellowship from UC Berkeley and conducted research on solid-state devices and microwave acoustics. He has published in various IEEE technical publications and presented papers at IEEE technical conferences.

Thomas H. Lee (S’87–M’87), for a photograph and biography, see this issue, p. 585.

Donald Stark received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1985 and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 1987 and 1991, respectively, all in electrical engineering. His research interests at Stanford included circuit design and CAD tools for analysis of voltage and current distributions in VLSI circuits. From 1987 to 1991, he was also a Member of the Western Research Laboratory, Digital Equipment Corp., Palo Alto, CA, working on CAD development and ECL circuit design. From 1991 to 1993, he was with the Semiconductor Device Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he currently works on DRAM, high-speed I/O design, and CAD.

Mark A. Horowitz, for a photograph and biography, see p. 528 of the April 1999 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999


A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM

Feng Lin, Jason Miller, Aaron Schoenfeld, Manny Ma, and R. Jacob Baker

Abstract— This paper describes a register-controlled symmetrical delay-locked loop (RSDLL) for use in a high-frequency double-data-rate DRAM. The RSDLL inserts an optimum delay between the clock input buffer and the clock output buffer, making the DRAM output data change simultaneously with the rising or falling edges of the input clock. This RSDLL is shown to be insensitive to variations in temperature, power-supply voltage, and process after being fabricated in 0.21-μm CMOS technology. The measured rms jitter is below 50 ps when the operating frequency is in the range of 125–250 MHz.

Index Terms—Delay-locked loops, double-data rate, DRAM.

Fig. 1. Data timing chart for DDR DRAM.

I. INTRODUCTION

IN synchronous DRAM, the output data strobe (DQS) should be locked to the data outputs (DQ outputs) for high-speed performance. The clock-access and output-hold times of conventional DRAM designs are determined by the delay of internal circuits such as the clock input and output buffers. Variations in temperature and process shift the access time and shrink the valid data window. To optimize and stabilize the clock-access and output-hold times, an internal register-controlled delay-locked loop (RDLL) [1], [2] has been used to adjust the time difference between the output and input clock signals in SDRAM. Since the RDLL is an all-digital design, it provides robust operation over all process corners. Another solution to the timing constraints found in SDRAM was given in [3] with the synchronous mirror delay (SMD). Compared to the RDLL, the SMD does not lock as tightly but has the advantage that the time to acquire lock between the input and output clocks is only two clock cycles. As the clock speeds used in DRAM continue to increase, skew becomes the dominating concern, outweighing the disadvantage of the added time to acquire lock needed in an RDLL.

This paper describes a modified register-controlled symmetrical delay-locked loop (RSDLL) used to meet the requirements of double-data-rate (DDR) SDRAM (read/write accesses occur on both rising and falling edges of the clock). Here, “symmetrical” means that the delay line used in the DLL has the same delay whether a high-to-low or a low-to-high logic signal is propagating along the line. The data output timing diagram of a DDR SDRAM is shown in Fig. 1. The RSDLL is used to increase the valid output data window and diminish the undefined interval by synchronizing both rising and falling edges of the DQS signal with the output data DQ.

The target specifications for the DLL described in this paper are:
1) robust operation eliminating the need for postproduction tuning (something required in an analog implementation);
2) operating frequency ranging from 143 MHz (286 Mb/s/pin) to 250 MHz (500 Mb/s/pin);
3) tight synchronization (skew less than 5% of the cycle time) between the output clock and data on both rising and falling edges of the output clock;
4) low skew between the input and output clocks (with low, 5% duty cycle distortion);
5) power-supply-voltage operating range from 2.5 to 3.5 V;
6) portability for ease of use in other processes.

Manuscript received September 3, 1998; revised November 2, 1998. This work was supported by Micron Technology, Inc. F. Lin and R. J. Baker are with the Microelectronics Research Center, University of Idaho, Boise, ID 83712 USA (e-mail: [email protected]). J. Miller, A. Schoenfeld, and M. Ma are with Micron Technology, Inc., Boise, ID 83707-0006 USA. Publisher Item Identifier S 0018-9200(99)02438-5.

0018–9200/99$10.00 © 1999 IEEE

II. RSDLL ARCHITECTURE

Fig. 2 shows the block diagram of the RSDLL. The replica input buffer dummy delay in the feedback path is used to match the delay of the input clock buffer. The phase detector (PD) compares the relative timing of the edges of the input clock signal and the feedback clock signal, which comes through the delay line controlled by the shift register. The outputs of the PD, shift-right and shift-left, control the shift register. In the simplest case, one bit of the shift register is high. This single bit selects a point of entry for CLKIn in the symmetrical delay line (more on this later). When the rising edge of the input clock falls between the rising edge of the output clock and the rising edge of the output clock delayed by one unit delay, both outputs of the PD, shift-right and shift-left, go to logic LOW and the loop is locked. The basic operation of the PD is shown in Fig. 3.

The resolution of this RSDLL is determined by the size of a unit delay used in the delay line. The locking range is determined by the number of delay stages used in the symmetrical delay line. Since the DLL circuit inserts an optimum delay time between CLKIn and CLKOut, making the output clock change simultaneously with the next rising edge of the input clock, the minimum operating frequency to which the RSDLL can lock is the reciprocal of the product of the number of stages in the symmetrical delay line and the delay per stage. Adding more delay stages will increase the locking range of the RSDLL at the cost of increased layout area.

Fig. 2. Block diagram of RSDLL.

Fig. 3. Phase detector used in RSDLL.

III. CIRCUIT IMPLEMENTATION

A. Basic Delay Element

Instead of using an AND gate (a NAND gate followed by an inverter) as the unit-delay stage, as was done in [1], we used a NAND-gate-based delay element. The implementation of a three-stage delay line is shown in Fig. 4. The problem when using a NAND gate followed by an inverter as the basic delay element is that the propagation delay through the unit delay resulting from a HIGH-to-LOW transition is not equal to the delay of a LOW-to-HIGH transition (tPHL ≠ tPLH). Further, this delay varies from one run to another. If the skew between tPHL and tPLH is 50 ps, for example, the total skew of the falling edges through ten stages will be 0.5 ns. Because of this skew, the NAND-plus-inverter delay element cannot be used in a DDR DRAM. In our modified symmetrical delay element, another NAND gate is used instead of an inverter (two NAND gates per delay stage). This scheme guarantees equal delays for both transitions independent of process variations, since while one NAND switches from HIGH to LOW, the other switches from LOW to HIGH. An added benefit of the two-NAND delay element is that two point-of-entry control signals are now available. Both are used by the shift register to solve the possible problem caused by the power-up ambiguity in the shift register.

Fig. 4. Symmetrical delay element used in RSDLL.

Fig. 5. Delay line and shift register for RSDLL.

B. Control Mechanism of the Shift Register

As shown in Figs. 4 and 5, the input clock is a common input to every delay stage. The shift register is used to select a different tap of the delay line (the point of entry for the input clock signal into the symmetrical delay line). The complementary outputs of each register cell are used to select the different tap: the true output is connected directly to an input of a delay element, and the complementary output is connected to an input of the previous stage. From right to left, the first LOW-to-HIGH transition in the shift register sets the point of entry into the delay line. The input clock will pass through the tap with a high logic state in the corresponding position of the shift register. Since the complementary output of this tap is LOW, it will disable the previous stages; therefore, it does not matter what the previous states of the shift register are (shown as “don’t cares” in Fig. 5). This control mechanism guarantees that only one path is selected. This scheme also eliminates power-up concerns since the selected tap is simply the first, from the right, LOW-to-HIGH transition in the shift register.

C. Phase Detector

To stabilize the movement in the shift register, after making a decision, the phase detector will wait at least two clock cycles before making another decision (Fig. 3). A divide-by-two was included in the phase detector so that only every other decision, resulting from comparing the rising edges of the external clock and the feedback clock, is used. This provides enough time for the shift register to operate and the output waveform to stabilize before another decision by the PD is implemented. The unwanted side effect of this delay is an increase in the lock time. The shift register is clocked by combining the shift-left and shift-right signals. The power consumption will decrease when there are no shift-left or shift-right signals and the loop is locked.

Another concern with the phase-detector design is the design of the flip-flops (FF’s). To minimize the static phase error, very fast FF’s should be used, ideally with zero setup time. Also, the metastability of the flip-flops becomes a concern as the loop becomes locked. This, together with possible noise contributions and the need to wait, as discussed above, before implementing a shift-right or shift-left, may make additional filtering in the phase detector desirable. Some possibilities include increasing the divider ratio used in the phase detector or using a shift register in the phase detector to determine when a number (say, four) of shift-rights or shift-lefts have occurred. For the present design, we were forced to use a divide-by-two in the phase detector because of lock time requirements.

Fig. 6. Measured rms jitter versus input frequency.

Fig. 7. Measured delay per stage versus VCC and temperature.
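The tap selection of Section III-B and the lock search of Sections II and III-C can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit described above: the function names, the list ordering of the register bits, the direction in which stages are added, and the `fixed_delay_ps` term (standing in for buffer delays inside the loop) are all hypothetical.

```python
# Behavioral sketch only: names, bit ordering, and the fixed buffer
# delay are assumptions made for illustration.

def selected_tap(register):
    """Return the index of the entry tap.

    Scanning from the right, the first HIGH bit marks the point of entry;
    any bits to its left are don't-cares.
    """
    for i in range(len(register) - 1, -1, -1):
        if register[i]:
            return i
    raise ValueError("no tap selected")


def acquire_lock(period_ps, unit_delay_ps, fixed_delay_ps, n_stages, divide=2):
    """Step the number of active stages until the loop locks.

    Returns (stages_in_path, clock_cycles_used). A phase-detector decision
    is acted on only every `divide`-th cycle (divide-by-two in the text).
    """
    stages = 0
    for cycle in range(divide * (n_stages + 1)):
        if cycle % divide:
            continue  # decision skipped so the shift register can settle
        # Time from the delayed output edge to the next input rising edge.
        error = period_ps - (fixed_delay_ps + stages * unit_delay_ps)
        if 0 <= error < unit_delay_ps:
            return stages, cycle      # input edge within one unit delay: locked
        if error >= unit_delay_ps and stages < n_stages:
            stages += 1               # output edge too early: add a stage
        elif error < 0 and stages > 0:
            stages -= 1               # output edge too late: remove a stage
        else:
            break
    raise RuntimeError("target delay is outside the range of the delay line")
```

For example, with a 5-ns period, a 150-ps unit delay, an assumed 1-ns fixed delay, and 48 stages, `acquire_lock(5000, 150, 1000, 48)` settles on 26 stages after 52 clock cycles, which shows how the divide-by-two doubles the number of cycles needed to reach lock.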

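Two numerical statements in the text lend themselves to a quick check: the minimum lock frequency (the reciprocal of the number of stages times the delay per stage) and the accumulated edge skew of an asymmetric delay element. The helper names below are hypothetical, and the estimate deliberately ignores buffer delays inside the loop and the variation of stage delay with VCC and temperature, so it should not be expected to reproduce a measured lower frequency limit.

```python
def min_lock_frequency_hz(n_stages, stage_delay_s):
    """Reciprocal of (number of stages x delay per stage)."""
    return 1.0 / (n_stages * stage_delay_s)


def accumulated_skew_s(per_stage_skew_s, n_stages):
    """Rise/fall skew adds linearly along an asymmetric delay line."""
    return per_stage_skew_s * n_stages


# 48 stages at a typical 150 ps per stage, printed in MHz
print(min_lock_frequency_hz(48, 150e-12) / 1e6)
# 50 ps of skew per stage over ten stages, printed in ns
print(accumulated_skew_s(50e-12, 10) * 1e9)
```

With the 48-stage line and 150-ps typical stage delay quoted in the text, the first expression evaluates to roughly 139 MHz; the second reproduces the 0.5-ns figure from the delay-element discussion.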
IV. EXPERIMENTAL RESULTS

The RSDLL was fabricated in a 0.21-μm, four-poly, double-metal CMOS technology (a DRAM process). We used a 48-stage delay line with an operating frequency of 125–250 MHz. The maximum operating frequency was limited by delays external to the DLL such as the input buffer and interconnect. There was no noticeable static phase error on either rising or falling edges. Fig. 6 shows the resulting rms jitter versus input frequency. One sigma of jitter over the 125–250-MHz frequency range was below 50 ps. The peak-to-peak jitter over this frequency range was below 100 ps. The measured delay per stage versus VCC and temperature is shown in Fig. 7. Note that the 150-ps typical delay of a unit-delay element was very close to the on-chip rise and fall times of the clock signals and represents a practical minimum resolution of a DLL for use in a DDR DRAM fabricated in a 0.21-μm process.

The power consumption (the current draw of the DLL at a fixed VCC) of the prototype RSDLL is illustrated in Fig. 8. We found that the power consumption was mainly determined by the dynamic power dissipation of the symmetrical delay line. Our NAND delays in this test chip were implemented with 10/0.21-μm NMOS and 20/0.21-μm PMOS. By reducing the widths of both the NMOS and PMOS transistors, the power dissipation can be greatly reduced without a speed or resolution penalty (with the added benefit of reduced layout size).

Fig. 8. Measured ICC (DLL current consumption) versus input frequency.

V. CONCLUSIONS

The concept of a register-controlled symmetrical delay-locked loop has been presented. The modified symmetrical delay element makes the RSDLL useful in DDR DRAM’s. Experimental results verify that this RSDLL is stable against temperature, process, and power-supply variations. Further development of the RSDLL will include investigations into reducing power consumption, implementing phase-locked loops where the symmetrical delay is used as part of a purely digital register-controlled oscillator, and developing two-loop architectures where coarse loops (resolutions on the order of 100 ps) are used with fine loops (resolutions on the order of 10 ps [2]) for wide tuning range and small static phase errors.

REFERENCES

[1] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S.-Y. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, “A 256-Mb SDRAM using a register-controlled digital DLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 1728–1732, Nov. 1997.

[2] S. Eto, M. Matsumiya, M. Takita, Y. Ishii, T. Nakamura, K. Kawabata, H. Kano, A. Kitamoto, T. Ikeda, T. Koga, M. Higashiro, Y. Serizawa, K. Itabashi, O. Tsuboi, Y. Yokoyama, and M. Taguchi, “A 1Gb SDRAM with ground level precharged bitline and non-boosted 2.1V word line,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 82–83.

[3] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, S. Nakazawa, E. Kakehashi, J. M. Drynan, M. Komuro, T. Fukase, H. Iwasaki, M. Takenaka, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, S. Kikuchi, K. Koyama, Y. Fukuzo, and T. Okuda, “A 2.5-ns clock access 250-MHz, 256-Mb SDRAM with synchronous mirror delay,” IEEE J. Solid-State Circuits, vol. 31, pp. 1656–1665, Nov. 1996.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997


A Semidigital Dual Delay-Locked Loop

Stefanos Sidiropoulos, Student Member, IEEE, and Mark A. Horowitz, Senior Member, IEEE

Abstract—This paper describes a dual delay-locked loop architecture which achieves low jitter, unlimited (modulo 2π) phase shift, and large operating range. The architecture employs a core loop to generate coarsely spaced clocks, which are then used by a peripheral loop to generate the main system clock through phase interpolation. The design of an experimental prototype in a 0.8-μm CMOS technology is described. The prototype achieves an operating range of 80 kHz–400 MHz. At 250 MHz, its peak-to-peak jitter with quiescent supply is 68 ps, and its jitter supply sensitivity is 0.4 ps/mV.

Index Terms—Clock synchronization, delay-locked loops, phase interpolation, phase-locked loops.

I. INTRODUCTION

PHASE-LOCKED loops (PLL’s) and delay-locked loops (DLL’s) are routinely employed in microprocessor and memory IC’s in order to cancel the on-chip clock amplification and buffering delays and improve the I/O timing margins. However, the increasing clock speeds and integration levels of digital circuits create a hostile operating environment for these phase alignment circuits. The supply and substrate noise resulting from the switching of digital circuits affects the PLL or DLL operation and results in output clock jitter which subtracts from the I/O timing margins. In applications where no clock synthesis is required, DLL’s offer an attractive alternative to PLL’s due to their better jitter performance, inherent stability, and simpler design. The main disadvantage of conventional DLL’s, however, is their limited phase capture range.

This paper presents a dual DLL architecture which combines several techniques to achieve unlimited phase capture range, low jitter and static-phase error, and four orders of magnitude operating frequency range. This architecture is based on a cascade of two loops. The core loop generates six clocks evenly spaced by 30°, which are then used by the peripheral loop to generate the output clock under the control of a digital finite state machine (FSM). By using phase interpolation, the dual loop can provide unlimited phase shift without the use of a voltage controlled oscillator (VCO). Using an FSM for phase control offers the advantage of enabling the flexible implementation of complicated phase capture algorithms in the digital domain. Finally, by utilizing self-biased techniques, the loop achieves large operating range and low jitter.

This paper begins with a brief overview of conventional DLL design. After outlining some of the disadvantages of conventional approaches, Section II presents the dual interpolating DLL architecture. Section III discusses circuit design issues that arose in the prototype implementation of the architecture in a 0.8-μm CMOS technology. Section IV discusses the experimental results, and concluding remarks follow in Section V.

Manuscript received April 10, 1997; revised June 5, 1997. This work was supported by ARPA under contract DABT63-94-C-0054. The authors are with the Computer Systems Laboratory, Stanford University, Stanford, CA 94305 USA and with Rambus Inc., Mountain View, CA 94040 USA. Publisher Item Identifier S 0018-9200(97)08033-5.

II. ARCHITECTURE

A. Conventional DLL’s

A simplified block diagram of a conventional DLL [1] is outlined in Fig. 1. The components are a voltage controlled delay line (VCDL), a phase detector, a charge pump, and a first-order loop filter. The input reference clock drives the delay line, which comprises a number of cascaded variable delay buffers. The output clock clk drives the loop phase detector (depicted in this example as a conventional flip-flop). The output of the phase detector is integrated by the charge pump and the loop filter capacitor to generate the loop control voltage. The loop negative feedback drives the control voltage to a value that forces a zero phase error between the output clock and the reference clock.

Fig. 1. Block diagram of a conventional DLL.

This simple design offers many advantages compared to VCO-based PLL’s. Due to frequency acquisition constraints, PLL’s usually resort to a specific type of phase detector, the state-machine-based phase frequency detector (PFD). In contrast, DLL’s can be easily implemented by using “bang–bang” control; i.e., the control signal of the loop, rather than being proportional to the phase error magnitude, can simply be a binary “up” or “down” indication. Thus, in a “bang–bang” DLL the phase detector can be a replica of the input data receiver, resulting in an optimal placement of the sampling clock in the center of the input receiver’s sampling uncertainty window. Additionally, since DLL’s do not use a VCO, phase errors induced by supply or substrate noise do not accumulate over many clock cycles. This improved noise immunity is

0018–9200/97$10.00 © 1997 IEEE


Fig. 2. Dual interpolating DLL architecture.

the main reason for the increased adoption of DLL’s in applications that do not require clock synthesis.

The conventional DLL architecture of Fig. 1 suffers from two important disadvantages: clock jitter propagation and limited phase capture range. Since the VCDL simply delays the reference clock by a single clock cycle, the reference clock jitter directly propagates to the output clock. This allpass filter behavior with respect to the frequency of the jitter of the reference clock results in reduced I/O timing margins, especially in “source-synchronous” interfaces where the reference clock emanates from another noisy digital chip. To overcome this problem, a separate low-jitter differential clock can be used as the input to the delay line. This way the on-chip common-mode noise and the reference clock jitter do not affect the I/O timing margins.

A more important problem is that a VCDL does not have the cycle slipping capability of a VCO. Therefore, at a given operating clock frequency, the DLL can delay its input clock by an amount bounded by a minimum and a maximum delay. As a consequence, the designer must take extra care that the loop does not enter a state in which it tries to lock toward a delay outside these two limits. A compromise solution is to extend the VCDL range and use an FSM that controls the loop start-up. However, DLL’s relying on quadrature phase mixing [2], [3] completely eliminate this problem. This approach is based on the fact that quadrature clocks can be easily generated, given a clock of the correct frequency. The quadrature clocks are then fed to a phase mixer which can produce a clock whose phase can span the complete 0–360° phase interval. This approach eliminates the limited phase range problem of conventional DLL’s since it can rotate the output clock phase an unlimited number of times, providing seamless switching at the quadrant boundaries. The main disadvantage of quadrature mixing is that the output of the phase mixer is a clock with a slew rate inherently limited by the output swing of the phase mixer and the period of the clock. This slow clock exhibits increased dynamic noise sensitivity, thus degrading the jitter performance of quadrature mixing DLL’s. The approach presented here overcomes this limitation of quadrature mixing DLL’s since it generates the output clock by interpolating between smaller 30° phase intervals [5]. Simultaneously, by avoiding the use of a VCO, it eliminates the phase error accumulation problem of similar approaches [4].

B. Dual Interpolating DLL

Fig. 2 shows a high-level block diagram of the proposed architecture. This architecture is based on cascading two loops. A conventional first-order core DLL is locked at 180° phase shift. Assuming that the delay line of the core DLL comprises six buffers, their outputs are six clocks which are evenly spaced by 30°. The peripheral digital loop selects a pair of adjacent clocks to interpolate between. The selected clocks can be inverted in order to cover the full 0–360° phase range. The resulting pair of clocks drives a digitally controlled interpolator which generates the main clock. The phase of this clock can be any of the quantized phase steps between the phases of the two selected clocks, with the number of steps set by the range of the interpolation controlling word. The output clock of the interpolator drives the phase detector, which compares it to the reference clock. The output of the phase detector is used by the FSM to control the phase selection, the selective phase inversion, and the interpolator phase mixing weight.

The FSM moves the phase of the clock according to the phase detector output. In the more common case this means just changing the interpolation mixing weight by one. If, however, the interpolator controlling word has reached its minimum or maximum limit, the FSM must change

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP

the phase of one of the selected clocks to the next appropriate selection. This phase selection change might also involve an inversion of the corresponding clock if the current interpolation interval is adjacent to the 0° or 180° boundary. Since these phase selection changes happen only when the corresponding phase mixing weight is zero, no glitches occur on the output clock. The digital “bang–bang” nature of the control results in dithering around the zero phase error point in the lock condition. The dither amplitude is determined by the interpolator phase step and the delay through the peripheral loop.

In this architecture the output clock phase can be rotated, so no hard limits exist in the loop phase capture range: the loop provides unlimited (modulo 2π) phase shift capability. This property eliminates boundary conditions and phase relationship constraints, common in conventional DLL’s. The only requirement is that the DLL input clock and the reference clock are plesiochronous (i.e., their frequency difference is bounded), making this architecture suitable for clock recovery applications. Since the system does not use a VCO, it does not suffer from the phase error accumulation problem of conventional PLL’s. Moreover, the input clocks of the phase interpolator are spaced by just 30°, so the output of the phase interpolator does not exhibit the noise sensitivity of the quadrature mixing approach. Finally, the fact that the capture algorithm can be completely implemented in the digital domain gives great flexibility in its implementation, as will be discussed in Section III.

Although the prototype described in this paper is implemented with an analog core loop, possible implementations of the architecture can use digital control in both loops, further enhancing the system versatility. Moreover, the architecture can be easily extended to use a clock recirculating scheme in the core loop, so that the output clock frequency is a multiple of the input clock [7].

C. Dual-Loop Dynamics

Cascading two loops can compromise the overall system stability and lead to undesired jitter peaking effects. However, as the analysis in this section will show, this dual-loop architecture does not exhibit any jitter peaking irrespective of the dynamics of the two loops. The behavior of the DLL can be analyzed with respect to two types of perturbations: i) input or reference clock delay variations and ii) delay variations resulting from supply and substrate noise.

The frequency response of the dual loop can be analyzed by making a continuous time approximation, in which the sampling operation of the phase detectors and the digital nature of the peripheral loop are ignored. This approximation is valid for core and peripheral loop bandwidths at least a decade below the operating frequency. This constraint needs to be satisfied anyway in a DLL in order to eliminate the effects of higher order poles resulting from the delays around the loop.

Fig. 3 shows the dual loop linearized model, including both loop clocks and the delay errors introduced by supply or substrate noise. Each of the two loops is modeled as a single pole system, in which the input, output, and error variables are delays, similar to the single-loop analysis published in [7]. For example, the output delay of the core loop (in seconds) is the delay established by the core loop delay line, while the input delay is the delay for which the core loop phase detector and charge pump do not generate an error signal. Since the core loop VCDL spans half a clock cycle, the input delay is equal to half an input clock period. By using these loop variables, the input-to-output transfer function of the core loop can be easily derived as (1), where pc (in rad/s) is the pole of the core loop as determined by the charge pump current, the phase detector and delay line gain, and the loop filter capacitor. Similarly, the noise-to-delay error transfer function of the core loop can be shown to be (2), which relates the additional delay introduced in the core loop by supply or substrate noise to the delay error seen by the core loop phase detector. This transfer function indicates that noise induced delay errors can be tracked up to the loop bandwidth and that the response of the loop to a supply step consists of an initial step followed by a decaying exponential with a time constant equal to 1/pc.

Fig. 3. Linearized dual DLL model.

Before proceeding to analyze the response of the dual loop, it should be noted that the linearized model of Fig. 3 uses a simplifying assumption. The assumption is that the delay error introduced by supply or substrate variations is identical in both loops and does not depend on the state of the phase selection multiplexers. Since the supply and substrate sensitivity of the peripheral loop depends on the phase selection and will typically be higher due to the presence of the final CMOS system clock buffer, this assumption is not necessarily accurate. However, it does not affect the conclusions drawn below about the stability of the loop, since it only removes a modifying constant, which is equal to the ratio of the delay sensitivities of the two loops.
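For reference, a single-pole loop consistent with the description of the core loop above can be written as follows. This is a sketch under assumed notation and is not taken from the source: $D_I$, $D_O$, $D_N$, and $D_E$ are labels chosen here for the input delay, output delay, noise-induced delay, and delay error of the core loop, and $p_c$ is the core loop pole.

```latex
\begin{align}
\frac{D_O(s)}{D_I(s)} &= \frac{1}{1 + s/p_c}, \\
\frac{D_E(s)}{D_N(s)} &= \frac{s/p_c}{1 + s/p_c}.
\end{align}
```

Under this assumed form, a unit step in $D_N$ gives $D_E(s) = 1/(s + p_c)$, that is, $D_E(t) = e^{-p_c t}$: an initial unit step that decays with time constant $1/p_c$, in line with the behavior described in the text.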
This modifying constant only affects the relative location of the poles and zeros of the resulting transfer function, and, as will be shown below, the loop is unconditionally stable irrespective of the relation between the individual poles and zeros.

Using the model of Fig. 3, it is straightforward to show that the transfer function of the peripheral loop is identical in form to that of the core loop. This result agrees with intuition since in the dual loop system reference clock perturbations do not affect the core loop. More interesting is the transfer function from the input clock to the dual-loop error, since changes in the period of the input clock will cause both the core and peripheral loops to react. Based on (1) and (2), this transfer function can be shown to be (3). This bandpass transfer function exhibits no peaking at any frequency regardless of the relative magnitudes of the two poles pc and pp. The step response of the system, shown in Fig. 4(a), reveals that unit-step changes in the input delay (i.e., step changes in the input clock period) will initially peak at a less than unity value determined by the ratio of the two poles.1 Moreover, as the magnitude of the peripheral loop pole pp increases, the disturbance on the output is reduced since the peripheral loop compensates quickly for disturbances at the output caused by changes of the input clock.

Finally, the transfer function from supply- or substrate-noise-induced delay errors to the delay error of the dual loop can be derived as (4). Equation (4) also exhibits no peaking at any frequency since the location of the last zero can never be above that of the poles. The step response of the system is plotted in Fig. 4(b) for various ratios of the core to peripheral pole frequencies. Under all conditions, the initial delay error is equal to twice the injected unity error since this error is added on both loops. When the peripheral loop bandwidth is less than half that of the core loop, there is no overshoot in the dual-loop step response. This result occurs because the core loop compensates for its delay error quickly, while the slower peripheral loop compensates for the output delay error later. When the pole frequencies of the two loops are very close, the system overshoots since the peripheral loop compensates for the output delay error at approximately the same rate as the core loop. The worst case overshoot of approximately 4.5% of the initial disturbance occurs when the peripheral loop bandwidth is twice that of the core loop. As the peripheral loop bandwidth increases, the overshoot becomes progressively smaller since the peripheral loop corrects for both the peripheral and core delay errors. Subsequently, the influence of the slower core loop correction on the output delay error is compensated by the peripheral loop. Therefore, even in the worst case, the dual loop cascade exhibits only minor overshoot.

1 It should be noted that in case in-CLK and ref-CLK are identical or correlated, the resulting transfer function exhibits a low-pass peaked behavior. Nevertheless the resulting peaking is small, exhibiting a maximum of 15% when pp = pc, while it is less than 5% as long as pp and pc are an order of magnitude apart in frequency.

Fig. 4. Dual loop response to: (a) step change in input clock and (b) supply noise step.

Fig. 5. Dual DLL detailed block diagram.

III. CIRCUIT DESIGN

A. Overview

A more detailed block diagram of the dual loop is shown in Fig. 5. This design uses a separate local differential clock as the input to the delay line. Although the use of this clock is not inherent in the loop architecture, it minimizes the supply sensitivity in applications such as “source synchronous” interfaces. To minimize the effects of input clock duty cycle imperfections and common-mode mismatches, a duty cycle adjuster (DCA) [2] is employed after the first clock receiving buffer. The 50% duty cycle clock drives the core DLL. The core delay line consists of six differential buffers. An extra pair of buffers generates two clocks which drive the core loop 180° phase detector. The output of the phase detector controls the charge pump, which forces these two clocks to be 180° out of phase. Since all the buffers in the core delay line (including the extra pair) have the same size, all the core VCDL stages have the same fan-out and delay. Therefore, forcing the two clocks to be 180° out of phase will generate six clocks evenly spaced by 30° at the outputs of the core delay line. The phase selection and phase inversion multiplexers are differential elements controlled by the core loop control voltage.
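The phase arithmetic behind the core delay line can be checked with a few lines of code. This is a minimal sketch with hypothetical function names: it assumes six equal stages spanning 180° and models the phase inversion multiplexer as adding 180° to a selected clock.

```python
def core_phases(n_buffers=6, span_deg=180.0):
    """Phases at the outputs of equal delay stages that together span span_deg."""
    step = span_deg / n_buffers
    return [i * step for i in range(n_buffers)]


def selectable_phases(n_buffers=6, span_deg=180.0):
    """Core phases plus their inversions, i.e. what the multiplexers can select."""
    base = core_phases(n_buffers, span_deg)
    inverted = [(p + 180.0) % 360.0 for p in base]
    return sorted(set(base) | set(inverted))


print(core_phases())        # six clocks, 30 degrees apart
print(selectable_phases())  # twelve interval boundaries around the full circle
```

With the defaults, the six core clocks sit at 0° through 150°, and inversion extends the selectable set to twelve boundaries from 0° to 330°, which is why the interpolator only ever has to bridge a 30° interval.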
In order to eliminate jitter-sensitive slow clocks, all buffers in the clock path need to have approximately the same

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP


Fig. 6. (a) Core loop delay buffer and (b) charge pump.

Fig. 7. Core loop phase detector.

bandwidth. For this reason, the phase selection in this design is implemented as a combination of a 3-to-1 and a 2-to-1 multiplexer, even though a single 6-to-1 differential multiplexer would consume less total power. Since the phase selection multiplexer can affect the phase shift of the core delay line through data-dependent loading, the six output clocks are buffered before driving the phase selection multiplexers. This way, changing the multiplexer select does not affect the core delay line phase shift. The outputs of the phase inversion multiplexer drive the phase interpolator, which generates a low-swing differential clock. This clock is then amplified and buffered through a conventional CMOS inverter chain, generating the main clock (CLK). The peripheral loop phase detector [1] compares that clock to the reference clock, generating a binary phase error indication that is then fed to the FSM. Based on the phase detector (PD) output, the FSM selects phases and controls the phase interpolation.

B. Core Loop

To minimize the jitter supply sensitivity, all the delay buffers in the design, from the input clock (in-CLK) to the output of the phase interpolator, use differential elements with replica feedback biasing [6]. In order to linearize the loop gain and obtain a large operating range, the core loop charge pump current is scaled along with the VCDL buffer current, as illustrated in Fig. 6 [7]. One bias voltage is generated through the replica-feedback biasing circuit, while the other is a buffered version of the charge pump control voltage. In addition to the core VCDL buffers, these two bias voltages control the differential buffer elements of the peripheral loop. This ensures that all the buffers in the design have approximately equal delays and that the edge rates of the interpolator input clocks scale with the operating frequency of the loop.

The sensitivity of the dual-loop architecture to the core loop phase offset depends on the particular application. For the case that the dual DLL is used to just generate a clock whose phase is directly controlled by the phase detector output, the phase offset of the core loop does not affect the system phase offset. In this case, the loop operation will not be affected as long as the core loop phase offset is bounded. An absolute core loop offset less than 30° ensures monotonic switching at the 0° and 180° interpolation boundaries, so the interpolating loop functions correctly, albeit with a larger than nominal interpolation phase step. Core loop phase offsets larger than this amount will result in a hysteretic locking behavior at the quadrant boundaries, which will increase the dither jitter if the reference clock phase forces the dual loop to lock at this point.

The dual-loop operation becomes more sensitive to core loop phase offsets in case the designer wants to use this architecture to generate an additional clock that is offset by 90° relative to the reference clock. In such an application, the quadrature clock would be generated by using an extra pair of phase selection and inversion multiplexers whose selects would be offset by three relative to those generating the main clock. This would create a 90° interpolation interval offset, resulting in the required quadrature phase shift. In this case, the core loop phase offset would impact the quadrature phase if the selects of the extra multiplexers happen to wrap around the 0° or 180° interpolation interval boundaries. Even though the prototype does not implement quadrature phase generation, a low offset phase detector and careful matching of the layout were used to ensure uniform spacing of the six clocks.

A self-biased DLL requires a linear phase detector. To avoid start-up problems that would result from the use of a conventional state machine PFD [7], the core loop uses the phase detector depicted in Fig. 7.
This design comprises an S–R latch augmented with two input pulse generators. The absence of extra state storage in this design eliminates any start-up false locking conditions. Additionally, its symmetric structure and the use of pulse triggering minimize the core loop phase offset. The core of the phase detector is an S–R latch-based phase detector. The S–R latch ensures a 180° phase shift between the falling edges of its inputs only when the duty cycle of the two input clocks is identical. When the duty cycles of the two input clocks differ, this mismatch will propagate as a core loop phase locking offset. This happens because an unbalanced overlap of the two input clocks causes the output of the S–R latch to have a duty cycle deviating from 50%. To compensate for this effect, the S–R latch is


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Fig. 8. Phase detector and charge pump simulated transfer function.

Fig. 9. Phase interpolator (type-I) schematic.

augmented with two pulse generators which propagate a low pulse on the positive edges of the input clocks. Since potential overlaps are minimized, the design can tolerate large duty cycle imperfections and still provide an accurate 180° lock in the core loop. Fig. 8 shows the simulated transfer characteristics of the phase detector and charge pump over three extreme process and environment conditions. The cycle time of the two input clocks is set at 4 ns, while their duty cycles are mismatched such that the duty cycle of one clock is 37.5% while that of the other is 62.5% (each 0.5 ns away from the 50% point). It can be seen that the transfer function is linear and has no offset or dead-band around the 2-ns point where the loop actually locks. However, the combination of input pulsing and duty cycle imperfections results in nonlinear transfer function characteristics in the vicinity of the boundaries of the locking range (i.e., 0 and 4 ns). The only effect of this nonlinearity is that the core loop can exhibit an initial slew-rate limited reduction of its phase error, since the output of the phase detector and charge pump is constant. After the phase error has been reduced such that the phase detector operates within its linear region, the core loop will exhibit a conventional single-pole response. Harmonic locking problems, common in PLL’s using S–R phase detectors, are eliminated in this design since the core loop is reset to its minimum delay at system start-up.

C. Phase Interpolator Design

The most critical circuit in the design of the peripheral digital loop is the phase interpolator. The phase interpolator receives two clocks and generates the main clock, whose phase is the weighted sum of the two input phases. Essentially, the phase interpolator converts a digital weight code generated by the FSM to the phase of the output clock. Linearity is not important in the design of this digital-to-phase converter since it is enclosed in the peripheral loop feedback.
The important requirement is that the interpolation process is monotonic, to ensure that no hysteresis exists in the loop locking characteristics. Additionally, the phase step must be minimized since it determines the loop dither amplitude. In this case, the interpolation step is 1/16 of the 30° interval, resulting in approximately 2° of nominal peripheral loop dither. Another important requirement is that the design should provide for seamless interpolation-boundary switching. This means that when the input code is such that the weight on one of the input clocks is zero, this clock should have no influence on the output.

Fig. 9 shows a schematic diagram of the interpolator used in the prototype chip. This design is a dual-input differential buffer which uses the same symmetric loads as all the core VCDL buffers and peripheral loop multiplexers. The bias voltages are identical with those biasing the rest of the loop, ensuring that its total delay is approximately 30° of the clock period, the same as the rest of the loop buffers. Therefore, the transition time of the interpolator input clocks is larger than the minimum delay through the interpolator, and the two input transitions overlap. This condition ensures that the interpolator outputs never settle at half of the swing range. The current sources of the two differential pairs are thermometer-controlled elements. The thermometer codes are generated by a 16-b long up/down shift register which is controlled by the peripheral loop FSM. By changing the thermometer code, the FSM adjusts the currents of the two input differential pairs in a complementary fashion, resulting in a mixing of the two input clock phases.

This design (type-I) does not completely satisfy the seamless boundary-switching requirement. Even when the current through one of the differential pairs is zero, the corresponding input still influences the output of the interpolator. This influence is due to capacitive coupling through the gate-to-drain capacitance of the differential pair input transistors. Fig. 10 shows an alternative design which does not suffer from this problem. In this design (type-II), the interpolator differential pairs consist of unit cell differential pairs.
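The complementary thermometer-coded weighting described above can be illustrated with a small idealized model. This is only a sketch under stated assumptions: the function names, the list representation of the shift register, and the perfectly linear weighted sum are illustrative choices, not the authors' circuit, and the capacitive coupling and nonlinearity discussed in the text are ignored.

```python
"""Idealized model of a thermometer-coded phase interpolator (illustrative only)."""

N_BITS = 16          # length of the up/down shift register described in the text
INTERVAL_DEG = 30.0  # spacing between adjacent core-loop clocks


def shift_up(code):
    """Shift one more 1 into the thermometer code (saturates at all ones)."""
    return [1] + code[:-1]


def shift_down(code):
    """Shift one 1 out of the thermometer code (saturates at all zeros)."""
    return code[1:] + [0]


def interpolated_phase(phase_a_deg, code):
    """Ideal weighted sum of phase A and phase B = A + 30 degrees.

    Assumption: each 1 in `code` steers one unit of current to the B input,
    and each 0 leaves one unit on the A input (complementary weights).
    """
    weight_b = sum(code)
    weight_a = len(code) - weight_b
    phase_b_deg = phase_a_deg + INTERVAL_DEG
    return (weight_a * phase_a_deg + weight_b * phase_b_deg) / len(code)
```

In this model each shift moves the output by 30°/16 = 1.875°, which matches the "approximately 2°" nominal step quoted above, and a code of all zeros or all ones places the output exactly on one of the two input phases.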
Therefore, when one of the interpolation weight thermometer codes is zero, the corresponding input is completely cut off from the output, eliminating the gate-to-drain coupling capacitance. Fig. 11 shows the simulated transfer function of the interpolator alternative designs. This simulation includes random ( 20 mV) threshold voltage offsets in the thermometer code


Fig. 10. Phase interpolator (type-II) schematic.

Fig. 11. Simulated phase interpolator transfer function.

Fig. 12. (a) Simplified FSM algorithm and (b) resulting loop behavior.

current sources. The type-I design exhibits a nominal step of approximately 2°. However, due to the gate-to-drain capacitive coupling effect, the maximum step of 3.8° occurs at the interpolation boundary when the input clock is switched to the next selection. In the lower power implementation where no buffering is used at the core delay line outputs (type-I-unbuf), the data-dependent loading on the previous stage results in a double phase step at the interpolation interval boundaries. Although the alternative design (type-II) does not exhibit a boundary phase step, it was not used since it occupies more layout area and exhibits more nonlinear characteristics due to data-dependent loading of the previous stage. So in the present implementation, worst-case dithering occurs at the interpolation interval boundaries and has an approximate magnitude of 3.8°.

D. Finite State Machine

A simplified version of the peripheral loop FSM algorithm is outlined in Fig. 12(a). The single state Early of the FSM indicates the relationship of the two interpolator input clocks.

Fig. 13. Prototype chip microphotograph.

On every cycle of its operation, the FSM might undertake one of two actions.

• In the more frequent case of in-range interpolation (i.e., a nonzero interpolation weight), the FSM simply increments or decrements the interpolation weight by shifting the interpolator-controlling shift register up or down. The direction of the shift is decided based on the phase detector output and the current value of the state Early.

• If the peripheral loop has run out of range in the current interpolation interval, the FSM seamlessly slides the current interpolation interval by switching one of the two interpolator input phases to the next selection. The fact that the interpolation has run out of range in the current interval is simply indicated by a combination of the current value of the state Early, the most or least significant bit of the thermometer register, and the output of the phase detector. In case the current selection of either phase is adjacent to the 0° or 180° interpolation interval boundary, switching to the next selection involves toggling the select of the second-stage phase inversion multiplexer.
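The two FSM actions above can be sketched as a small behavioral model. This is a simplification under stated assumptions: the names `fsm_step`, `output_phase`, and `run_loop` are hypothetical, the interpolator is treated as ideal, and the interval slide is modeled by resetting the weight rather than by tracking the Early state and reversing the weight direction as the actual design does. The loop delay and front-end filter are also omitted.

```python
"""Behavioral sketch of a bang-bang phase-selecting loop (illustrative only)."""

STEPS_PER_INTERVAL = 16   # interpolation steps inside one 30-degree interval
N_INTERVALS = 12          # 6 core clocks x 2 polarities cover 360 degrees
STEP_DEG = 30.0 / STEPS_PER_INTERVAL


def fsm_step(interval, weight, up):
    """Apply one FSM decision and return the new (interval, weight)."""
    if up:
        if weight < STEPS_PER_INTERVAL:       # in-range: shift the register up
            return interval, weight + 1
        return (interval + 1) % N_INTERVALS, 0  # out of range: slide interval
    if weight > 0:                            # in-range: shift the register down
        return interval, weight - 1
    return (interval - 1) % N_INTERVALS, STEPS_PER_INTERVAL


def output_phase(interval, weight):
    """Ideal output phase in degrees for a given interval and weight."""
    return (interval * 30.0 + weight * STEP_DEG) % 360.0


def phase_error(target_deg, phase_deg):
    """Signed phase error wrapped into [-180, 180) degrees."""
    return (target_deg - phase_deg + 180.0) % 360.0 - 180.0


def run_loop(target_deg, cycles, interval=0, weight=0):
    """Run the loop with a binary phase detector and return the error history."""
    errors = []
    for _ in range(cycles):
        err = phase_error(target_deg, output_phase(interval, weight))
        errors.append(err)
        interval, weight = fsm_step(interval, weight, up=(err > 0))
    return errors
```

Calling `run_loop(100.0, 200)` shows the error falling at a constant rate and then alternating in sign by one interpolation step, and the slide from `(3, 16)` to `(4, 0)` leaves the modeled output phase unchanged, which is the seamless-switching property the text requires.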


Fig. 14. Noise generation and monitoring circuits.

The loop phase capture behavior resulting from this simple algorithm is illustrated in Fig. 12(b). The phase error decreases at a linear rate until the system achieves lock. Subsequently, the loop dithers around the zero phase error point with a dither magnitude of one phase interpolation interval. This occurs because in this type of “bang–bang” system, the output of the phase detector is just a binary phase error without any indication of the magnitude of the phase error. The complementary interpolation weights slew linearly, changing direction at the interpolation interval boundaries. Once the system finds lock, they either dither by one or they stay constant if the dither point happens to lie on an interval boundary. The magnitude of the peripheral loop phase dither is determined by the minimum interpolation step and the delay through the feedback loop. In conventional analog “bang–bang” DLL’s, the loop delay is largely determined by the delay through the delay line and the clock distribution network. However, this digital implementation has a larger minimum loop delay. The underlying reason is that driving the FSM directly from the phase detector output might lead into metastability problems, especially since the whole loop operation is driving the phase detector to its metastable point of operation. For this reason, in this implementation the output of the phase detector is delayed by three metastability hardened flip-flops. This increases the mean time between failures (MTBF) of the system to a calculated worst case of approximately 100 years, but at the same time increases the peripheral loop delay by three cycles. To compensate for that delay and decrease the loop dither, the FSM logic implements a front-end filter which counts eight continuous phase detector “up” or “down” results before propagating this signal to the core FSM. 
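The front-end filter described above, which waits for eight consecutive identical phase detector results before passing a decision on, can be sketched as follows. The class name, the return convention (+1, -1, 0), and the choice to restart counting after each emitted decision are assumptions made for illustration; the text does not specify these details.

```python
"""Sketch of a consecutive-result filter for a binary phase detector."""


class ConsecutiveFilter:
    def __init__(self, threshold=8):
        self.threshold = threshold  # identical results required in a row
        self.last = None            # most recent phase detector result
        self.count = 0              # length of the current run

    def update(self, pd_up):
        """Feed one phase detector result (True = up, False = down).

        Returns +1 or -1 when `threshold` identical results have been seen
        in a row, and 0 otherwise.
        """
        if pd_up == self.last:
            self.count += 1
        else:
            self.last = pd_up
            self.count = 1
        if self.count == self.threshold:
            # Assumption: counting restarts after a decision is emitted.
            self.count = 0
            self.last = None
            return 1 if pd_up else -1
        return 0
```

With this arrangement the core FSM acts at most once every eight cycles, which gives the three-cycle synchronizer delay time to elapse before the next decision is taken.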
This causes the FSM to delay its next decision until the results of its previous action have been propagated to the phase detector output, and reduces the inherent peripheral loop dither to one phase interpolation interval.

The digital nature of the peripheral loop control enabled the implementation of the FSM to be done through synthesis of a behavioral Verilog model followed by a simple standard cell place and route. The FSM behavioral model was verified by simulation in conjunction with a behavioral core loop model. The significance of this automated methodology is that other, more complicated algorithms can be implemented with minimal effort from the designer. Faster phase acquisition can be obtained by disabling the front-end counter/filter and changing the interpolation step by a larger amount while the loop is not in lock. The loop can also implement a periodic phase calibration algorithm. In this case, the FSM is activated initially to drive the loop to zero phase error. Then it is shut down to save power, and it is periodically turned on to compensate for slow phase drifts. Since the FSM can run at a frequency slower than that of the system clock, the implementation of different algorithms is not in the system critical path.

IV. EXPERIMENTAL RESULTS

To verify the dual DLL architecture, a chip has been fabricated through MOSIS in the HP CMOS26B process. This is a 1.0-µm drawn process with the channel lengths scaled to 0.8 µm. Although the gate oxide in this process is 170 Å, allowing 5-V operation, the loop design and testing were done with a 3.3-V power supply voltage. Fig. 13 is a micrograph of the chip. The chip integrates the dual DLL, along with noise injection and monitoring circuits and current-mode differential output buffers. The dual DLL occupies 0.8 mm² of silicon area, the majority (about 60%) of which is devoted to the peripheral loop logic. This is mainly due to the relatively large standard cell size of the library used in this implementation.

The block labeled NOISE-GEN in Fig. 13 is used to inject and measure on-chip supply noise. Fig. 14 shows a schematic diagram of these circuits. The 1000-µm wide transistor shorts the on-chip supply rails, creating a voltage drop across the off-chip 4-Ω resistor. In order to monitor the droop on the on-chip supply, an on-chip device and the external 5-Ω load resistor form a broadband attenuating buffer which drives the 50-Ω scope. The gain of the buffer is computed during an initial calibration step.
Fig. 15. Jitter histogram with quiet supply.

Fig. 16. Jitter histogram with 1-MHz 750-mV square wave supply noise.

The use of these circuits enables the injection and monitoring of fast steps, with rise times on the order of 1 ns, on the on-chip supply. The dither jitter of the loop with quiescent on-chip supply varies with the input phase. This occurs because the offsets of the interpolator and the phase selection multiplexers change according to the point of lock. Fig. 15 shows the worst-case jitter (68 ps) with quiescent supply. The jitter histogram consists of the superposition of two Gaussian distributions resulting from the switching of the peripheral loop between two adjacent interpolation intervals. The distance between the peaks of the two superimposed distributions is about 40 ps, which is in fair agreement with the simulation results. With the noise generation circuits injecting a 750-mV 1-MHz square wave on the chip supply, the peak-to-peak jitter increases to 400 ps (Fig. 16). It should be noted that simulation results indicate that approximately 50% of this jitter is not inherent to the loop, but is due to the supply sensitivity of the succeeding static CMOS clock buffer and off-chip driver. Fig. 17 illustrates the linearity of the interpolation process in the peripheral loop. The figure shows the histogram of the output clock with the peripheral loop FSM continuously rotating that clock. The histogram was generated by keeping


the reference clock at a constant voltage while the input clock ran at its nominal frequency of 250 MHz. The histogram valleys correspond to the interpolation interval boundaries. The spacing of the valleys is within 10% of their nominal 333-ps distance, indicating good matching of the delays of the core loop buffers. The absence of one valley at the 180° interpolation boundary indicates a slight offset in the core loop. The fact that the magnitude of the highest peak of the histogram is smaller than the magnitude of the deepest valley indicates that the interpolator achieves the 4-b target linearity (the 4-b linearity of the interpolator was also confirmed by a similar histogram of a single interpolation interval). Thus the overall linearity of the DLL is limited by the steps at the interpolation interval boundaries. Table I summarizes the loop performance characteristics. With a 3.3-V supply, the loop operates from 80 kHz to 400 MHz. The phase offset between the reference clock and the output clock of the loop is less than 40 ps. Operating at 250 MHz, the dual DLL draws 31 mA dc from a 3.3-V power supply.

Fig. 17. Interpolation process linearity.

TABLE I PROTOTYPE PERFORMANCE SUMMARY

V. SUMMARY

Although DLL’s are easier to design than PLL’s and offer better jitter performance, their main disadvantage is their limited phase capture range. This disadvantage limits their application to completely synchronous environments and complicates start-up circuitry. This paper presented a dual DLL architecture which removes this limitation by using a core DLL to generate coarsely spaced clocks, which are then used by a peripheral DLL to generate the output clock by phase interpolation. This architecture has unlimited (modulo 2π) phase shift capability, therefore removing boundary conditions and phase relationship constraints between the system clocks. The only requirement is that the DLL input and reference clocks are plesiochronous, making the dual DLL suitable for clock recovery applications. In addition, the digital nature of the peripheral loop control enables implementation of complicated phase alignment algorithms in a straightforward manner. A prototype using a linear self-biased core loop has been implemented in a 0.8-µm technology. The prototype achieves 68-ps peak-to-peak jitter, 0.4-ps/mV supply sensitivity, and 0.08–400 MHz operating range.

ACKNOWLEDGMENT

The authors are grateful to M. Johnson, T. Lee, J. Maneatis, and K. Yang for helpful discussions.

REFERENCES

[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, Oct. 1988.
[2] T. Lee et al., “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 MB/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] M. Izzard et al., “Analog versus digital control of a clock synchronizer for a 3 Gb/s data with 3.0 V differential ECL,” in Dig. Tech. Papers 1994 Symp. VLSI Circuits, June 1994, pp. 39–40.
[4] M. Horowitz et al., “PLL design for a 500 MB/s interface,” in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1993, pp. 160–161.
[5] S. Sidiropoulos and M. Horowitz, “A semi-digital delay locked loop with unlimited phase shift capability and 0.08–400 MHz operating range,” in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1997, pp. 332–333.
[6] J. Maneatis and M. Horowitz, “Precise delay generation using coupled oscillators,” IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[7] J. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.

Stefanos Sidiropoulos (S’93), for a photograph and biography, see p. 690 of the May 1997 issue of this JOURNAL.

Mark A. Horowitz (S’77–M’78–SM’95), for a photograph and biography, see p. 690 of the May 1997 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002


A Wide-Range Delay-Locked Loop With a Fixed Latency of One Clock Cycle

Hsiang-Hui Chang, Student Member, IEEE, Jyh-Woei Lin, Ching-Yuan Yang, Member, IEEE, and Shen-Iuan Liu, Member, IEEE

Abstract—A delay-locked loop (DLL) with wide-range operation and fixed latency of one clock cycle is proposed. This DLL uses a phase selection circuit and a start-controlled circuit to enlarge the operating frequency range and eliminate harmonic locking problems. Theoretically, the operating frequency range of the DLL can be from $1/(N \times T_{D\max})$ to $1/(3\,T_{D\min})$, where $T_{D\min}$ and $T_{D\max}$ are the minimum and maximum delay of a delay cell, respectively, and $N$ is the number of delay cells used in the delay line. Fabricated in a 0.35-µm single-poly triple-metal CMOS process, the measurement results show that the proposed DLL can operate from 6 to 130 MHz, and the total delay time between input and output of this DLL is just one clock cycle. Over the entire operating frequency range, the maximum rms jitter does not exceed 25 ps. The DLL occupies an active area of 880 µm × 515 µm and consumes a maximum power of 132 mW at 130 MHz.

Index Terms—Delay-locked loops, latency, phase-locked loops, wide range.

I. INTRODUCTION

WITH THE evolution and continuing scaling of CMOS technologies, the demand for high-speed and high integration density VLSI systems has recently grown exponentially. However, the important synchronization problem among IC modules is becoming one of the bottlenecks for high-performance systems. Phase-locked loops (PLLs) [1]–[3] and delay-locked loops (DLLs) [4]–[7] have been typically employed for the purpose of synchronization. Due to the difference in their configuration, DLLs are preferred over PLLs for their unconditional stability and faster locking time. Additionally, a DLL offers better jitter performance than a PLL because noise in the voltage-controlled delay line (VCDL) does not accumulate over many clock cycles. Conventional DLLs may suffer from harmonic locking over a wide operating range. If the DLLs are to operate at lower frequency without harmonic locking, the number of delay stages must be increased to let the maximum delay of the delay line be equal to the period of the lowest frequency. However, the

Manuscript received November 5, 2001; revised March 27, 2002. H.-H. Chang and S.-I. Liu are with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C. (e-mail: [email protected]). J.-W. Lin was with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan 10617, R. O. C. He is now with Sunplus Corporation, Hsinchu 300, Taiwan. C.-Y. Yang is with the Department of Electrical Engineering, Huafan University, Taipei, Taiwan 223, R. O. C. Publisher Item Identifier 10.1109/JSSC.2002.800922.

Fig. 1. Block diagram of the conventional analog DLL.

maximum operating frequency of a DLL will be limited by the minimum delay of the delay line. In this paper, a DLL with wide-range operation and fixed latency of one clock cycle is proposed by using the phase selection circuit and the start-controlled circuit. The proposed DLL not only locks the delay equal to one clock cycle but also operates without the restrictions stated above. The operating frequency range of the proposed DLL can also be increased. The range problem of conventional DLLs will be discussed in Section II. The architecture of the proposed DLL will be introduced in Section III and the building blocks in this DLL will be described in Section IV. Measurement results are given in Section V. Conclusions are given in Section VI.

II. RANGE PROBLEM OF CONVENTIONAL DLLS

A conventional DLL, as shown in Fig. 1, consists of four major blocks: the phase detector (PD), the charge-pump circuit, the loop filter, and the VCDL. In the DLL, the reference clock, ref_clk, is propagated through the VCDL. The output signal, vcdl_clk, at the end of the delay line is compared with the reference input. If a delay different from an integer multiple of the clock period is detected, the closed loop will automatically correct it by changing the delay time of the VCDL. However, the conventional DLL will fail to lock or falsely lock to two or more periods, $T_{CLK}$, of the input signal if the initial delay of the VCDL is shorter than $0.5\,T_{CLK}$ or longer than $1.5\,T_{CLK}$, as shown in Fig. 2. Therefore, if the DLL is required to lock the delay to one clock cycle of the input reference signal, the initial delay of the VCDL needs to be located between $0.5\,T_{CLK}$ and $1.5\,T_{CLK}$ [7], regardless of the initial voltage of the loop filter. Assume that the maximum and the minimum delay of the VCDL are

0018-9200/02$17.00 © 2002 IEEE


Fig. 2. DLL in normal lock and false lock conditions.

Fig. 3. System architecture of the proposed DLL.

Fig. 4. Small-signal model of the conventional analog DLL.

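$T_{VCDL\_max}$ and $T_{VCDL\_min}$, respectively. As a result, the period of the input signal, $T_{CLK}$, should satisfy the following inequality [7]:

$$\mathrm{Max}\left(T_{VCDL\_min},\ \tfrac{2}{3}\,T_{VCDL\_max}\right) < T_{CLK} < \mathrm{Min}\left(2\,T_{VCDL\_min},\ T_{VCDL\_max}\right) \qquad (1)$$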
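The range restriction can be checked numerically with a short sketch. It assumes only what the text states: the initial VCDL delay, which may lie anywhere between its minimum and maximum, must fall between 0.5 and 1.5 clock periods, and the clock period itself must be reachable by the delay line. The function name and argument names are hypothetical.

```python
"""Sketch: clock periods a conventional DLL can lock to without false lock."""


def lock_period_range(t_vcdl_min, t_vcdl_max):
    """Return (lower, upper) bounds on the clock period, or None if empty.

    Conditions combined (all assumed strict):
      t_vcdl_min > 0.5 * T   ->  T < 2 * t_vcdl_min
      t_vcdl_max < 1.5 * T   ->  T > (2/3) * t_vcdl_max
      t_vcdl_min < T < t_vcdl_max  (the delay line can reach one period)
    """
    lower = max(t_vcdl_min, 2.0 * t_vcdl_max / 3.0)
    upper = min(t_vcdl_max, 2.0 * t_vcdl_min)
    if lower >= upper:
        return None  # no clock period satisfies all conditions
    return lower, upper
```

For example, a delay line tunable from 1.8 ns to 3.0 ns gives a usable period window of 2.0 ns to 3.0 ns, whereas a line tunable from 1 ns to 4 ns gives no window at all, which illustrates why a wider VCDL tuning range does not by itself widen the operating range.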

Equation (1) shows that the DLL is prone to the false locking problem when process variations are taken into account [7]. Therefore, some solutions [6]–[10] have been proposed to overcome this problem. They are described as follows.

First, the basic idea is to use a phase-frequency detector (PFD) [5], because it has a capture range of $(-2\pi, 2\pi)$, wider than that of other phase detectors. So, the PFD is a better choice for wide-range operation. However, the PFD cannot be used in the DLL alone without any control circuit, because the DLL will try to lock to a zero delay. A PFD combined with a control circuit is presented in [6]. Nevertheless, in some cases, especially for high-frequency operations, the initial delay between ref_clk and vcdl_clk, as shown in Fig. 1, may be larger than two clock cycles and harmonic locking will occur.

Second, a solution called an all-analog DLL using a replica delay line [7] has been developed to solve the narrow frequency range problem of a conventional DLL. If the delay range of the VCDL satisfies the required relation, the DLL will have a maximum operation range of 7:1.

Third, a digital-controlled DLL called the self-correcting DLL is proposed in [8]. The problem of false locking is solved by the addition of a lock-detect circuit and a modified phase detector. Although this self-correcting DLL avoids false locking, the outputs of the VCDL are required to have an exact 50% duty cycle. The DLL developed in [9] uses a stage selector for fast-locked and wide-range operations, but the DLL requires an additional VCDL, which increases the area. A similar DLL can automatically change its lock mode to extend the operation range, but the latency of the DLL will be larger than one clock cycle [10].

The approach presented in this work uses a phase selection circuit to automatically decide how many delay cells should be used. This enables the DLL to operate over a wide frequency range. A new start-controlled circuit is also

Fig. 5. Block diagram of the phase selection circuit.

presented for the DLL to solve false locking problems and keep the latency of one clock cycle. The exact 50% duty cycle is not necessary.

CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE

Fig. 6. Schematic of edge detection circuit. (a) Edge detection circuits. (b) Clock edge generation. (c) Latch N.

Fig. 7. Timing diagram of edge detection circuit.

III. ARCHITECTURE OF THE PROPOSED DLL

The architecture of the proposed DLL is shown in Fig. 3. It is composed of a conventional analog DLL, a phase selection circuit, and a start-controlled circuit. Before the DLL begins to lock, the phase selection circuit will choose the output of an appropriate delay cell to be the feedback signal (vcdl_clk) according to the frequency of the input signal. In other words, the number of delay cells in use may change at different input frequencies. The minimum delay of the delay line is determined by one unit-delay cell. The maximum delay can be decided as $N \times T_{D\max}$, where $N$ is the number of unit-delay cells. Thus, the operating frequency range of the DLL can be from $1/(N \times T_{D\max})$ to $1/(3\,T_{D\min})$. The linear model of the DLL is shown in Fig. 4, where the summer stands for a phase detector and the remaining parameters are the charge-pump current, the period of the input reference clock, the capacitor value in the loop filter, and the gain of the VCDL, which is proportional to the number of delay cells. In the steady-state locked condition, the transfer function can be expressed as [11]

(2)

where the input and output variables are the input delay time and the output delay time, respectively. The loop bandwidth can be expressed as [11]

(3)

Since the transfer function is inherently stable, a wider loop bandwidth can be used to achieve fast acquisition time, but the jitter performance will be degraded. Hence, the following tradeoff design guideline was suggested in [12]:

(4)

When the input frequency is higher, the phase selection circuit will select a smaller number of delay cells, and the VCDL gain will become smaller. In order to have an adequate loop bandwidth for the DLL, the capacitances used in the loop filter must become smaller. In this work, the 3-bit control signals generated from the phase selection circuit switch the number of capacitors in the loop filter depending on the selected phase. After the vcdl_clk is decided, the DLL will start the locking process, which is controlled by the start-controlled circuit. First, the delay between input and output of the VCDL is initially set to the minimum value, and then the down signal of the PFD output is allowed to activate, supposing that the VCDL’s delay increases as the control voltage decreases. Therefore, the delay between


Fig. 8. Schematic of start-controlled circuit associated with PFD.

Fig. 9. Timing diagram of start-controlled circuit.

input and output of the VCDL will increase until it reaches one clock period of the input signal. Thus, the DLL will not fall into false locking and the latency is fixed to one clock cycle no matter how long a delay the VCDL provides.

IV. CIRCUIT DESCRIPTION

A. Phase Selection Circuit

The phase selection circuit consists of two blocks: an edge detector and a multiplexer with a decoder, as shown in Fig. 5. The schematic and timing diagram of the edge detector are shown in Figs. 6 and 7, respectively. To guarantee that the latency of the DLL is just one clock cycle, the first two clock phases in Fig. 6 are reserved for measurement. In practice, the first two clock phases could be included in the phase selection circuit to improve the operating frequency range of the DLL. At the initial state, the signal startb is set low to reset the edge detector outputs (i.e., d3–d10) and the delay of the VCDL is set to its minimum value. When startb goes high, the edge detector detects the rising edges of the input signals in sequence during the next two rising edges of ref_clk. Referring to Fig. 7(a), suppose that the signals all have rising edges in sequence during one clock cycle. Then the outputs (d3–d10) are all high and the multiplexer selects phase 10 as the output signal, vcdl_clk. However, if the input frequency is higher, suppose that the timing diagram is similar to Fig. 7(b). All the inputs have rising edges during one clock cycle, but only the rising edges of phases 1–4 occur in sequence, so the selected phase is 4. The vcdl_clk stays low until the selected phase is chosen. After

Fig. 10. Schematic of the PFD circuit [12].

the vcdl_clk is decided, the DLL will start the locking process, which will be explained later. The decoder converts the signals (d3–d10) into 3-bit control signals, which switch the number of capacitors used in the loop filter to tune the loop bandwidth.

B. Start-Controlled Circuit

The schematic of the start-controlled circuit and the associated PFD are shown in Fig. 8. It is composed of only two rising-edge-triggered D flip-flops (DFFs), two NAND gates, and

Fig. 11. Schematic of the charge-pump circuit [11].

Fig. 12. Schematic of the delay cell with replica bias [12].

Fig. 13. Simulated transfer curve of the VCDL.

Fig. 14. Microphotograph of the chip.

two inverters. The timing diagram of this start-controlled circuit is shown in Fig. 9. Initially, startb is set low in order to clear the outputs of the two DFFs. Therefore, setupb is low and pulls the control voltage to , as shown in Fig. 3 (i.e., sets the VCDL delay to its minimum value). In this way, the two inputs of the PFD are at a low level. When startb goes high, setupb also goes high. After two consecutive falling edges of vcdl_clk trigger the DFFs, the down signal of the PFD is activated and lets the delay of the VCDL increase. The delay of the VCDL increases until it is equal to one clock period of the input signal, due to the nature of the negative feedback architecture. Since the start-controlled circuit forces the delay of the VCDL to its minimum value and controls the delay of the VCDL to increase until it is equal to one clock period, the DLL will not fall into false locking even when . In order to get equal delays for path1 and path2, dummy loads should be added at point A. In comparison with [6], this start-controlled circuit

Fig. 15. DLL at initial state when operating frequency is 6 MHz.

Fig. 16. DLL at initial state when operating frequency is 130 MHz.

has two advantages: the proposed circuit is simple, and the duty cycle of ref_clk and vcdl_clk is not required to be exactly 50%.
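The phase selection and start-up sequence described above can be illustrated with a small behavioral sketch. It is not the authors' circuit: the function names, the unit-delay values, and the first-order loop update with a fixed `gain` are all assumptions made for illustration. The point it shows is that the selected tap always has a minimum delay shorter than one reference period, so a delay that is only allowed to increase from its minimum reaches one period before any longer lock point.

```python
def select_phase(t_ref, t_unit_min, n_phases=10, first_usable=3):
    """Pick the highest-numbered VCDL tap whose rising edge, with the VCDL held
    at minimum delay, still falls inside one reference period.
    first_usable=3 mirrors the edge detector outputs d3-d10 (assumed reading)."""
    candidates = [k for k in range(first_usable, n_phases + 1)
                  if k * t_unit_min < t_ref]
    if not candidates:
        raise ValueError("reference period too short for this delay line")
    return candidates[-1]


def lock_delay(t_ref, t_unit_min, t_unit_max, taps, gain=0.1, steps=400):
    """Hypothetical first-order loop model: start at minimum delay, then let the
    loop close the remaining gap to one reference period."""
    lo, hi = taps * t_unit_min, taps * t_unit_max
    delay = lo                              # start-up state forced by startb
    for _ in range(steps):
        error = t_ref - delay               # timing error seen by the PFD
        delay += gain * error               # charge pump moves the control voltage
        delay = min(max(delay, lo), hi)     # the VCDL cannot leave its tuning range
    return delay


# Illustrative numbers only (not taken from the paper).
T_UNIT_MIN, T_UNIT_MAX = 0.7e-9, 20e-9
fast_taps = select_phase(3e-9, T_UNIT_MIN)      # short period: few taps fit
slow_taps = select_phase(100e-9, T_UNIT_MIN)    # long period: all taps fit
locked = lock_delay(3e-9, T_UNIT_MIN, T_UNIT_MAX, fast_taps)
```

With these assumed values the short reference period selects tap 4 and the long one selects tap 10, mirroring the two cases of Fig. 7, and the modeled delay settles at one reference period.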

Fig. 17. Jitter histogram when DLL operates at 130 MHz.

Fig. 18. Measurement results of rms jitter over different frequencies.

TABLE I PERFORMANCE SUMMARY

C. Other Circuits

In this work, the dynamic logic style PFD [13] is adopted to avoid the dead-zone problem and improve the operating speed. To mitigate charge injection errors induced by the parasitic capacitors of the switches and current source transistors, the charge-pump circuit developed in [11] is used here. The delay cell circuit is similar to [11]. The schematics of these circuits are shown in Figs. 10–12. The control voltage of the loop filter is directly connected to nMOS rather than pMOS. Therefore, the transfer curve of delay versus control voltage is monotonically decreasing, as shown in Fig. 13.

V. EXPERIMENTAL RESULTS

The prototype chip is fabricated in a 0.35-µm single-poly triple-metal standard CMOS process. The microphotograph of the chip is shown in Fig. 14. The capacitors used in the loop filter are integrated in the chip and formed by metal-to-metal capacitors. The experimental results show that the DLL can operate in the frequency range of 6–130 MHz. Figs. 15 and 16

show the first four cycles of the DLL in the locking process when the operating frequency is 6 and 130 MHz, respectively. After the signal startb is high, the phase selection circuit will select one of the outputs of the VCDL as close as possible to the next rising edge of the input clock, ref_clk. Figs. 15 and 16 also show that after the signal startb is high, the first rising edge of


the output clock of the VCDL, vcdl_clk, leads that of the input clock, ref_clk. Since the signal startb will set the control voltage in Fig. 3 to , the proposed phase detector and the charge-pump circuit will discharge the loop filter to increase the delay of the VCDL. It will align the phases between the input clock and output clock of the VCDL. Fig. 17 shows the jitter histogram when the DLL operates at 130 MHz. Fig. 18 shows the measurement results of rms jitter over different frequencies. Table I gives the performance summary. The proposed DLL can be seen to have a wide operating range and a fixed latency of one clock cycle.

VI. CONCLUSION

A DLL with wide-range operation and fixed latency of one clock cycle is proposed. First, the multiphase outputs of the VCDL are all sent to the phase selection circuit. Then the phase selection circuit automatically selects one of the delayed outputs to feed back. As a result, this DLL can operate over a wide range without suffering from harmonic locking problems. Ideally, this DLL can operate from to . The experimental results also demonstrate the functionality of the proposed DLL. Moreover, at different operating frequencies, the jitter performances are all in an acceptable range and the latency is just one clock cycle. Since the speed of the proposed circuits can be increased if a more advanced process is used, the performance of the DLL, such as the operating frequency range, can be improved with little hardware and design effort. The power consumption of the digital part in the DLL and the total die area will also be reduced.

REFERENCES

[1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. Piscataway, NJ: IEEE Press, 1996.
[2] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[3] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York: McGraw-Hill, 1998.
[4] R. L. Aguitar and D. M. Santos, “Multiple target clock distribution with arbitrary delay interconnects,” Electron. Lett., vol. 34, no. 22, pp. 2119–2120, Oct. 1998.
[5] R. B. Watson Jr. and R. B. Iknaian, “Clock buffer chip with multiple target automatic skew compensation,” IEEE J. Solid-State Circuits, vol. 30, pp. 1267–1276, Nov. 1995.
[6] C. H. Kim et al., “A 64-Mbit 640-Mbyte/s bidirectional data strobed, double-data-rate SDRAM with a 40-mW DLL for a 256-Mbyte memory system,” IEEE J. Solid-State Circuits, vol. 33, pp. 1703–1710, Nov. 1998.
[7] Y. Moon, J. Choi, K. Lee, D. K. Jeong, and M. K. Kim, “An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance,” IEEE J. Solid-State Circuits, vol. 35, pp. 377–384, Mar. 2000.
[8] D. J. Foley and M. P. Flynn, “CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature-compensated tunable oscillator,” IEEE J. Solid-State Circuits, vol. 36, pp. 417–423, Mar. 2001.
[9] H. Yahata, T. Okuda, H. Miyashita, H. Chigasaki, B. Taruishi, T. Akiba, Y. Kawase, T. Tachibana, S. Ueda, S. Aoyama, A. Tsukinori, K. Shibata, M. Horiguchi, Y. Saiki, and Y. Nakagome, “A 256-Mb double-data-rate SDRAM with a 10-mW analog DLL circuit,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 74–75.
[10] Y. Okuda, M. Horiguchi, and Y. Nakagome, “A 66–400 MHz adaptive-lock-mode DLL circuit duty-cycle error correction,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 37–38.
[11] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[12] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuit. New York: IEEE Press, 2001, p. 240.
[13] S. Kim et al., “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.

Hsiang-Hui Chang (S’01) was born in Taipei, Taiwan, R.O.C., on February 4, 1975. He received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, in 1999 and 2001, respectively. He is currently working toward the Ph.D. degree in electrical engineering at National Taiwan University. His research interests are PLL, DLL, and high-speed interfaces for gigabit transceivers.

Jyh-Woei Lin was born in Kaoshiung, Taiwan, R.O.C., in 1974. He received the B.S. degree in electrical engineering from National Taipei University of Technology in 1996, and the M.S. degree in electrical engineering from National Taiwan University in 2001. He joined Sunplus Corporation, Hsinchu, Taiwan, in 2001 as an Analog Circuit Designer. His research interests include PLL, DLL, and interface circuits for high-speed data links.

Ching-Yuan Yang (S’97–M’01) was born in Miaoli, Taiwan, R.O.C., in 1967. He received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, R.O.C., in 1990, and the M.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in 1996 and 2000, respectively. He has been on the faculty of Huafan University, Taiwan, since 2000, where he is currently an Assistant Professor with the Department of Electronics Engineering. His research interests are in the area of mixed-signal integrated circuits and systems for high-speed interfaces and wireless communication.

Shen-Iuan Liu (S’88–M’93) was born in Keelung, Taiwan, R.O.C., on April 4, 1965. He received both the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in 1987 and 1991, respectively. During 1991–1993, he served as a Second Lieutenant in the Chinese Air Force. During 1991–1994, he was an Associate Professor in the Department of Electronic Engineering of National Taiwan Institute of Technology. He joined the Department of Electrical Engineering, National Taiwan University, Taipei, in 1994, where he has been a Professor since 1998. His research interests are in analog and digital integrated circuits and systems.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000


Active GHz Clock Network Using Distributed PLLs

Vadim Gutnik, Member, IEEE, and Anantha P. Chandrakasan, Member, IEEE

Abstract—A novel clock network composed of multiple synchronized phase-locked loops is analyzed, implemented, and tested. Undesirable large-signal stable (mode-locked) states dictate the transfer characteristic of the phase detectors; a matrix formulation of the linearized system allows direct calculation of system poles for any desired oscillator configuration. A 16-oscillator 1.3-GHz distributed clock network in 0.35-µm CMOS is presented here.

Index Terms—Clock network, multiple oscillator system, phase-locked loop.

I. INTRODUCTION

THE CLOCK distribution network of a modern microprocessor uses a significant fraction of the total chip power and has substantial impact on the overall performance of the system. For example, the 72-W 600-MHz Alpha processor [1] dissipates 16 W in the global clock distribution, and another 23 W in the local clocks: more than half the power goes to driving the clock net. The clock uncertainty budget for a global clock is 10% of a clock period, which translates to a 10% reduction in maximum operating speed; as argued below, this penalty is likely to increase for currently popular clock architectures.

Most conventional microprocessors use a balanced tree to distribute the clock [1]–[3]. Because the delays to all nodes are nominally equal, trees may be expected to have low skew. However, at gigahertz clock speeds a large fraction of skew and jitter comes from random variations in gate and interconnect delay. The majority of jitter in a clock tree is introduced by buffers and inter-line coupling to the clock wires; a relatively small amount comes from noise in the source oscillator [4]. Therefore, a primary consideration in clock design is matching delay along the clock path.

As clock speed increases, signal delay across a chip becomes comparable to a clock cycle. For example, a 2-cm-long wire in a 0.25-µm process has a delay of 0.86 ns, while the clock might be as high as 1 GHz; scaling to 4 GHz, the same wire (with optimal buffering) will have a delay of approximately 0.43 ns, compared to a clock period of 0.25 ns. In all practical cases a signal that takes longer than a clock cycle to propagate would be pipelined, and hence re-clocked. The fundamental weakness of tree distribution (and networks that depend on tree matching) is that skew is only relevant between communicating latches, but the clock path is always the length of the chip. Clock speeds rise as gate delay falls, and processor architectures can exploit both locality of blocks and pipelining to avoid penalty due to long signal paths, but the error in a global clock scales with the total path delay, and is thus a growing fraction of a clock cycle.

In this paper, we consider the effects of static and dynamic mismatch on a few representative clock networks in Section II and propose a distributed generation scheme that needs only local synchronization to generate a global clock. Large and small-signal stability of the proposed network is analyzed in Section III. This clock was implemented on a test chip; circuit details and results are presented in Sections IV and V.

Manuscript received March 24, 2000; revised June 24, 2000. This paper was supported by the MARCO Focused Research Center on Interconnects, which was funded at the Massachusetts Institute of Technology through a subcontract from the Georgia Institute of Technology, and supported in part by a Graduate Fellowship from the Intel Corporation. V. Gutnik was with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA. He is now with Silicon Laboratories, Austin, TX 78749 USA (e-mail: [email protected]). A. P. Chandrakasan is with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(00)09441-5.

II. MODELING RANDOM SKEW

A. Assumptions

Given sufficiently accurate models, systematic skew can be corrected at design time. Therefore, the primary interest is random zero-mean variations. For the sake of comparing architectures, we make several simplifying assumptions.

1) Delay mismatch, both static and dynamic, is proportional to total delay.
2) Wire RC delay is independent of gate delay ( ).
3) The clock period is proportional to gate delay.
4) Chip size is independent of gate delay.
5) In 0.25-µm technology, signal delay across a die equals one clock period.

Assumption 1 is inaccurate, but convenient. Mismatch due to gradients scales as delay squared; purely random short-distance mismatch scales as the square root of delay. For the sake of analysis, however, we will assume that uncertainty scales linearly. Assumptions 2, 3, and 4 are approximately true, given historical data: as the geometries scale the resistance increase in clock wires is offset by lower capacitance; processor cycle time is generally on the order of 8–16 gate delays; and chip sizes hover around mm . Assumption 5 serves to normalize signal delay, chip size, and clock speed.
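As a rough numerical check of these assumptions, the sketch below expresses the cross-chip wire delays quoted in the Introduction in units of the clock period and applies assumption 1 to them. The function names and the 5% mismatch figure are illustrative assumptions, not values fixed by the analysis.

```python
def delay_in_periods(wire_delay_ns, clock_ghz):
    """Cross-chip wire delay expressed in clock periods."""
    period_ns = 1.0 / clock_ghz
    return wire_delay_ns / period_ns


def skew_fraction(wire_delay_ns, clock_ghz, mismatch=0.05):
    """Random skew as a fraction of the clock period under assumption 1:
    mismatch is taken as a fixed fraction of total delay (0.05 is illustrative)."""
    return mismatch * delay_in_periods(wire_delay_ns, clock_ghz)


# Figures quoted in the Introduction for a 2-cm wire.
at_1ghz = delay_in_periods(0.86, 1.0)   # 0.25-um process, 1-GHz clock
at_4ghz = delay_in_periods(0.43, 4.0)   # same wire, optimally buffered, 4-GHz clock

# How much the skew share of the period grows between the two design points.
growth = skew_fraction(0.43, 4.0) / skew_fraction(0.86, 1.0)
```

The wire delay halves between the two design points while the period shrinks by a factor of four, so the same fractional mismatch consumes twice as much of the clock period at 4 GHz.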
It is not coincidental that random variation has become a noticeable issue at about the time when cross-die signal delay is comparable to one clock cycle: as a heuristic, 10% of a clock cycle is allocated for unmodeled skew and jitter margin, and delay uncertainty is about 5%–10% of delay. Hence, when delay across a chip is comparable to clock cycle time, random delay is a considerable fraction of the total clock error budget.

B. Tree

To keep internal clock skew low, a tree is generally made deep enough that a tile driven by a single leaf is small compared to the size of the chip [5], [6]. In turn, this means that the path from the

0018–9200/00$10.00 © 2000 IEEE

clock source to the load is comparable to the size of the entire die. Because the worst-case skew occurs between two adjacent leaves for which the clock path was completely different, worst-case mismatch depends on the entire source-to-leaf delay. Worse, the problem grows with process scaling. Because RC delay does not scale, delay along an optimally buffered line scales only as ; hence the skew as a fraction of the clock period grows as with falling .

Fig. 1. Simulated edge in a grid with skew to the drivers.

Fig. 2. Short circuit power in a grid vs. input tree skew.

C. Grid

Modern grids are H-tree-grid hybrids: a short H-tree distributes clock to a few (4 or 16, for example) buffers around a chip, and those buffers drive a clock grid in parallel. Shorting the buffers together helps drive down some of the uncertainty at the cost of increased short-circuit power during switching and somewhat slower edge rates. However, rise time scales linearly with , so by the same reasoning as applied to the tree scaling arguments, skew as a fraction of rise time will increase as gate delay falls. When the tree skew exceeds rise time, short-circuit power dissipation increases rapidly, and the clock edges begin to show an unacceptable kink. Fig. 1 shows simulated edge shapes with increasing input skew for a grid driven from a 4-level tree with skews from 0 to 200 ps, and Fig. 2 shows the corresponding short-circuit power dissipation, plotted as a fraction of -power for the clock grid.

D. Active Feedback

As is evident from the given examples, most of the skew comes from the initial long-distance distribution of a clock to relatively small loads. A delay-locked loop (DLL) could be adapted to measure and cancel out wire variations, as shown in Fig. 3. If the round-trip delay is tuned to an even number of clock cycles, the wire has nominally 0 delay. Unfortunately, despite the apparent symmetry, the forward and reverse paths do not match well for two reasons. First, “matched” buffers are physically separated. In Fig. 3, should match , although it would be physically near . is not as far away from its matched pair as it might be in a tree, but it will still typically be millimeters away. Second, there is no temporal correlation. The clock signal passes at a different time than it passes , so any time-dependent variations, including those due to power supply and signal coupling, do not match.

Another approach, proposed by Intel, is shown in Fig. 4 [7]. Here, a DLL matches delays to two half-trees; an obvious generalization, with four DLLs matching quarter-trees, is shown in Fig. 5. Static delay variations of some nearest neighbors are canceled out by the DLL to within the precision of the matching of the comparators. The drawback is that some neighboring nodes, such as and in Fig. 5, are only related through multiple DLLs. A much better result can be obtained by using DLLs that take multiple reference inputs, and adjust output phase to be aligned exactly between the two inputs. The network can then be redrawn somewhat more symmetrically, as in Fig. 6. (For clarity, the local tree was not drawn, and the connections to the comparators are abstracted.)

Optimization of the number of tiles is straightforward. Internal skew scales with tile area, so as the number of tiles increases, internal skew falls. However, every boundary between tiles introduces some skew because of mismatch in the phase detector (PD). Hence, as the number of tiles increases, the number of boundaries increases.
Fig. 7 shows the optimization curves calculated for this clock metric. As in other clock networks, faster clocks require a more finely grained architecture. Jitter in a DLL network will rise in exactly the same way as it increases in clock trees, and for the same reasons. Skew scales linearly with because it is composed of comparator mismatches and delays across each leaf-patch. Note, however, that in a phase-locked loop (PLL) the noise can be expected to scale with ; a PLL network like the one in Fig. 6 would have total clock uncertainty that is a constant fraction of the clock period.

GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs

Fig. 3. Low-skew wire with DLL.

Fig. 4. Matching tree leaves with a DLL.

Fig. 5. DLL architecture.

Fig. 6. Multi-input delay cell DLL architecture.

Fig. 7. Tile number optimization.

Fig. 8. Distributed clocking network.

III. STABILITY

We propose a distributed clock network composed of an array of synchronized PLLs. Independent oscillators generate the clock signal at multiple points (“nodes”) across a chip; each oscillator distributes the clock only to a small section of the chip (“tile”) (Fig. 8). PDs at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator. In general, the network need not be square or regular. With locally generated clocks, there are no chip-length clock lines to couple in jitter; skew is introduced only by asymmetries in PDs instead of mismatches in physically separated buffers, and the clock is regenerated at each node, so high-frequency jitter does not accumulate with distance from the clock source. Unlike earlier work on multiple clock domains which suggested the use of multiple independent clocks [8], this approach produces a single fully synchronized clock. The rest of this section examines small and large signal stability of a distributed PLL.

A. Small Signal

In a multiple-oscillator PLL large-signal and small-signal behavior are interrelated. In normal operation, the oscillators are phase-locked, and jitter depends on the network response to noise. Because startup is expected to take a negligibly small fraction of time, the connection of the oscillators is optimized for small-signal behavior rather than to make initial acquisition more efficient. The linearized small-signal behavior, valid when the oscillators are nearly in phase, is analyzed first.

B. General Derivation

The block diagram (Fig. 9) of a multiple-oscillator PLL is essentially identical to the one for a conventional PLL, except that the connections between blocks are vectors instead of individual signals, and the gains and transfer functions are matrices instead of scalars. This means that the PD becomes matrix ,

of size , and the loop filter becomes , a matrix. is a corresponding matrix. is intuitively meaningful: the network of oscillators is similar to a lumped circuit with a node for each oscillator and a branch for each connection between pairs of oscillators. Node voltages in represent oscillator phase, and branch currents represent the error signals on the output of the PD. is the conductance matrix for with unity conductance branches. for a four-oscillator network is shown in (1). Each off-diagonal entry is 1 if there is a PD between node and node ; each diagonal entry is the number of detectors attached to node .

(1)

DC gain in the loop can be lumped into . Writing the transfer function in matrix form gives (2) where is the phase error input to each phase comparator, is the reference phase, and are the noise contributions from interconnect and PD mismatch.

Fig. 9. Linear system model of a multi-oscillator PLL.

Fig. 10. One-dimensional PLL array; symmetrical with the dotted-line connections.

C. Examples

Matrix is determined by the geometry of the tiles, and hence will be constrained by the placement of clock loads, which for this problem is fixed. Assuming the simplest possible PLL, . This leaves , , and as design variables. There are still far too many choices to find the general optimum, but a few examples may help guide the search.

1) One-Dimensional Array: A one-dimensional array of oscillators with PDs between neighbors is the simplest generalization of a single PLL. In a perfectly asymmetrical array (call this system ), the output of PLL is the input to PLL , as shown in Fig. 10. is described by

(3)

This system has multiple poles at the same place where a single-oscillator PLL has single poles. On the other hand, in a perfectly symmetrical array (call it ), the input to each oscillator is the phase of oscillators and (Fig. 10, with the dotted-line connections). The matrix is the same because the physical arrangement of nodes is identical, but changes:

(4)

To achieve the same phase margin in as in , it is necessary to lower the gain . This can be shown with a geometrical argument: in , when the phase of oscillator changes by , the change is measured at two PDs, so oscillator feels twice the feedback that it would have felt in , and at the same time, oscillators and both adjust in the opposite direction, giving four times the effective gain. Hence, the gain must be decreased by a factor of approximately four. Mathematically, the largest eigenvalue of is 1, but the largest eigenvalue of is 3.5. Poles of the symmetrical system, solved via (2), are plotted in Fig. 12(a). The key difference between and is the systems’ response to noise. In both cases, noise at frequencies higher than the unity gain frequency is attenuated. For frequencies much lower than , the response can be calculated via (2). Fig. 11 shows a Bode plot of noise at node in response to a noise source at node . Noise performance of is much worse for intermediate frequencies because there is no feedback, so errors propagate forever. In , the feedback limits the influence of preceding stages, and this in turn attenuates noise. For this reason, networks with feedback are preferred, despite the more complicated stability calculation.

2) Two-Dimensional Array: A two-dimensional array is analyzed exactly the same as is a one-dimensional array, except that the gain has to decrease by another factor of two because the center oscillators see four neighbors rather than two. A 16-element array in a grid is implemented in this thesis. Its poles are shown in Fig. 12(b).
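The eigenvalue comparison above can be reproduced with a short numerical sketch. The matrices here are a reconstruction under stated assumptions, not the paper's own equations: branches have unit gain, off-diagonal entries are taken as −1 (the usual nodal conductance sign convention), and node 0 is assumed to carry one extra PD to the reference. The helper names `conductance_matrix`, `asymmetric_chain`, and `largest_eigenvalue` are hypothetical.

```python
import math
import random


def conductance_matrix(n, edges, ref_nodes=(0,)):
    """Nodal conductance-style matrix for a PLL network with unity-gain PDs."""
    m = [[0.0] * n for _ in range(n)]
    for i, j in edges:                 # one PD between oscillators i and j
        m[i][i] += 1.0
        m[j][j] += 1.0
        m[i][j] -= 1.0
        m[j][i] -= 1.0
    for r in ref_nodes:                # assumed PD between node r and the reference
        m[r][r] += 1.0
    return m


def asymmetric_chain(n):
    """Each oscillator has a single PD that compares it with its predecessor."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
        if i > 0:
            m[i][i - 1] = -1.0
    return m


def largest_eigenvalue(m, iters=1000, seed=0):
    """Power iteration; adequate for the small symmetric matrices used here."""
    rng = random.Random(seed)
    n = len(m)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam


chain_edges = [(i, i + 1) for i in range(3)]      # four oscillators in a row
grid_edges = ([(4 * r + c, 4 * r + c + 1) for r in range(4) for c in range(3)]
              + [(4 * r + c, 4 * (r + 1) + c) for r in range(3) for c in range(4)])

# A triangular matrix has its eigenvalues on the diagonal.
lam_asym = max(row[i] for i, row in enumerate(asymmetric_chain(4)))
lam_chain = largest_eigenvalue(conductance_matrix(4, chain_edges))
lam_grid = largest_eigenvalue(conductance_matrix(16, grid_edges))
```

Under these assumptions the asymmetric chain has largest eigenvalue 1 and the symmetric four-oscillator chain gives 2 + 2cos(2π/9) ≈ 3.53, in line with the values of 1 and 3.5 quoted above. The 4 × 4 grid value comes out roughly twice the chain value, consistent with the additional factor-of-two gain reduction for the two-dimensional array.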

Fig. 11. Comparison of noise responses for symmetrical and asymmetrical networks.

Fig. 12. Root locus for 1-D and 2-D PLL arrays. (a) 1-D array. (b) 2-D array.

Fig. 13. Mode-locking example.

D. Large Signal: Mode Locking

The analysis of the previous section indicates that fully connected networks should have a better noise response than asymmetrical networks. However, the feedback allows the possibility of undesirable large-signal modes. Consider the matrices for a PLL network:

(5)

Because phase is periodic with period , the phase measured at the PDs . For small , , so the nonlinearity is irrelevant. However, with

(6)

so is a stationary point. This is intuitively easy to see in reference to Fig. 13: each oscillator leads one neighbor, and lags behind another neighbor by exactly the same amount. The net phase error is zero, so clearly there is no restoring force to drive the phases to 0. Because the nonlinearity does not change for small deviations from , dynamics about are the same as those about 0 and hence this state is stable. The locking of a distributed oscillator to nonzero relative phases has been called mode-locking [9].

At startup, each oscillator in a distributed PLL starts at a random phase, so there is a nonzero chance of converging to a mode-locked state. Simulations show that for a network like the one shown here, the system ends mode-locked from of random initial states. The probability goes up rapidly with the size of the system; a array ends up mode-locked well over 99% of the time.

Pratt and Nguyen proved several useful properties about systems in mode-lock [9]. The key result, generalized for non-Cartesian networks, is that for a system in mode-lock, there must be a phase difference between two oscillators such that where is the number of nodes in the largest minimal loop in the network and a minimal loop is a loop in the graph that cannot be decomposed into multiple loops. This result suggests a way to distinguish between mode-locked states and the desired 0-phase state: in mode-lock, there must be at least one branch with a large phase error. If the gain of the PD is designed to be negative for a phase difference larger than , then all mode-locked states are made unstable without affecting the in-phase equilibrium. Pratt and Nguyen suggest that XOR PDs preclude mode-lock in a rectangular network of oscillators because the response decreases for phase errors larger than [9]. This result follows directly from the result derived above: in a rectangular array, the largest minimal loop has four nodes, so . A PD with , described in the next section, would be useful in nonrectangular networks, and where more gain near 0 phase is desirable.

IV. IMPLEMENTATION

The distributed clock network generates the clock signal with PLLs at multiple points (“nodes”) across a chip, and distributes each only to a small section of the chip (“tile”) (Fig. 8). PDs at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator. Because the proposed network has many nodes, the power and size constraints on each node are even more stringent than the constraints on a single global PLL. The oscillator, PD, and loop filter of a working demonstration chip, fabricated in a standard 0.35-µm single-poly triple-metal process, are considered in turn below.

Fig. 14. Ring oscillator schematic.

Fig. 15. Phase detector (PD).

Fig. 16. Simulated phase transfer curve.

Fig. 17. Locking behavior of the PLL array.

A. Oscillator

The demonstration chip used an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO) (Fig. 14). Transistors comprise the differential inverter. The differential pair is , the tail current is driven by , and act as the nMOS load. The nMOS loads allow fast oscillation and shield the output signal from noise. is a low-pass version of generated by subthreshold leakage through PFET ; supply noise coupling in through of is bypassed by . The oscillation frequency is only dependent on the supply voltage through capacitor nonlinearity and the output conductance of and , and feedback of the PLL compensates drift of .

B. Phase Detector (PD)

The PD, shown in Fig. 15, has a sufficient nonlinearity, higher gain at small input phase difference, and less high-frequency content than an XOR PD. The core ( ) is an nMOS-loaded arbiter which acts as a nonlinear PD. For no input phase difference, the output is balanced. As the phase difference increases from zero, one output will be asserted for the full duration of an input pulse, while the other output will be asserted for only the remainder of the input pulse duration after the first input pulse ends, which is equal to the input phase difference. Thus the detector has very high gain near zero phase error that drops off to zero as the input phase difference approaches the input pulse width (Fig. 16).

The pulse generators and enable this arbiter to give frequency-error feedback. If one input is at a higher frequency than the other, its output will be asserted for more input pulses than the other. Because the width of the pulses is independent of input frequency, the average output voltage corresponds to frequency. Unlike a typical phase-frequency detector, however, the strength of the error signal falls to zero as frequency difference goes to 0, so there can be no mode-lock problems, yet large signal frequency (and hence, phase) locking is enhanced. Fig. 17 shows the large signal correction and small signal behavior of the entire array of PLLs as the already internally locked array

GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs

1559

Fig. 18. Loop filter schematic.

Fig. 19.

Frequency-locked divider outputs. Fig. 20.

approaches and locks to the reference clock. The detector fits in m m. C. Loop Filter make up ampliThe loop filter is shown in Fig. 18. make up . The differential output fier , while currents from the PDs at the edges of each tile are summed at and , and drive both amplifiers. is a single nodes stage differential pair so it has relatively low gain but a band. has a high-gain cascoded stage width limited by . is a large gate cadriving a common source PFET such that pacitor which serves to set the dominant pole of is biased at very low current to the PLL network is stable. boost gain and enable a low time constant (as low as 12 kHz) m m gate capacitor. The simple design and with a feed-forward compensation allow the loop filter to fit in only m m. Each clock node, consisting of an oscillator m m. and a loop filter, takes just V. RESULTS A chip was fabricated with a array of nodes and PD between nearest neighbors. Counting one node and two PDs the area overhead is approximately 0.0038 mm per tile. Another

Micrograph of the 16-oscillator 1.3-GHz chip.

PD was placed between one of the nodes and the chip clock input to lock the network to an external reference. The output of the 16 oscillators was divided by 64 and driven off chip. At V, the divided outputs were seen to be frequency locked at 17 to 21 MHz, corresponding to oscillator phase lock at 1.1 to 1.3 GHz. An oscilloscope plot of four locked output signals is shown in Fig. 19. Long-term jitter between neighbors is less than 30 ps. Cycle-to-cycle jitter is less than 10ps. The oscillators, amplifiers and all the biasing draws 130 mA at 3 V. A chip plot is shown in Fig. 20. (The rest of the area on the mm mm chip is taken up by test circuits.) VI. CONCLUSION Design and measurements on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network can take advantage of improved devices by shrinking the size of the cells, lowering the overall skew and jitter, so performance will scale with the speed of devices, rather than with the much slower improvement of on-chip interconnect speed.
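The figures reported in Section V can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check and not part of the original measurement flow: the function and constant names are invented for illustration, and the 45 µm × 45 µm node and 30 µm × 30 µm PD footprints are assumed from the ISSCC digest version of this work.

```python
# Sanity check of the reported numbers (all names are illustrative).
DIVIDE_RATIO = 64  # on-chip divider ahead of the output drivers


def oscillator_ghz(divided_mhz):
    """Oscillator frequency implied by a divided-down output frequency."""
    return divided_mhz * DIVIDE_RATIO / 1000.0


def tile_overhead_mm2(node_um=45.0, pd_um=30.0, pds_per_tile=2):
    """Area of one square clock node plus its PDs (assumed footprints), in mm^2."""
    area_um2 = node_um ** 2 + pds_per_tile * pd_um ** 2
    return area_um2 / 1e6


if __name__ == "__main__":
    print(oscillator_ghz(17), oscillator_ghz(21))
    print(tile_overhead_mm2())
```

Under these assumptions, divided outputs of 17 and 21 MHz correspond to roughly 1.09 and 1.34 GHz, in line with the quoted 1.1 to 1.3 GHz, and one node plus two PDs comes to about 0.0038 mm².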

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

REFERENCES

[1] D. W. Bailey and B. J. Benschneider, “Clocking design and analysis for a 600-MHz Alpha microprocessor,” J. Solid State Circuits, vol. 33, no. 11, pp. 1627–1633, Nov. 1998.
[2] C. F. Webb, “A 400-MHz S/390 microprocessor,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 168–169.
[3] T. Yoshida, “A 2-V 250-MHz multimedia processor,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 266–267.
[4] I. A. Young, M. F. Mar, and B. Bhushan, “A 0.35-µm CMOS 3-880-MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[5] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, “A symmetric clock-distribution tree and optimized high-speed interconnections for reduced clock skew in ULSI and WSI circuits,” in IEEE Int. Conf. Computer Design, NY, Oct. 1986, pp. 118–122.
[6] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, “Characterization and modeling of clock skew with process variations,” in Proc. IEEE 1999 Custom Integrated Circuits Conf., pp. 441–444.
[7] G. Geannopoulos and X. Dai, “An adaptive digital deskewing circuit for clock distribution networks,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 400–401.
[8] F. Ançeau, “A synchronous approach for clocking VLSI systems,” J. Solid State Circuits, vol. SC-17, no. 1, pp. 51–56, Feb. 1982.
[9] G. A. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Trans. Parallel and Distributed Systems, Mar. 1995.

Vadim Gutnik (M’00) received the B.S. degree in electrical engineering and materials science from the University of California, Berkeley, in 1994, and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1996 and 2000, respectively. Previous research interests have included micromechanical resonators, and variable-voltage power supplies. He is currently working as a Design Engineer at Silicon Laboratories, Austin, TX. Dr. Gutnik received an NDSEG fellowship in 1994, and the Intel Foundation Fellowship in 1997.

Anantha P. Chandrakasan (M’95) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1989, 1990, and 1994, respectively. Since September, 1994, he has been the Analog Devices Career Development Assistant Professor of electrical engineering at the Massachusetts Institute of Technology, Cambridge. His research interests include the ultra-low-power implementation of custom and programmable digital signal processors, wireless sensors and multimedia devices, emerging technologies, and CAD tools for VLSI. He is a co-author of the book titled Low Power Digital CMOS Design (Norwood, MA: Kluwer, 1995). He has served on the technical program committee of various conferences including ISSCC, VLSI Circuits Symposium, DAC, ISLPED, and ICCD. He is the Technical Program Co-Chair for the 1997 International Symposium on Low-Power Electronics and Design and for VLSI Design’98. He received the National Science Foundation Career Development Award in 1995, the IBM Faculty Development Award in 1995, and the National Semiconductor Faculty Development Award in 1996. He received the IEEE Communications Society 1993 Best Tutorial Paper Award for the IEEE Communications Magazine paper titled, “A Portable Multimedia Terminal.”

ISSCC 2000 / SESSION 10 / CLOCK GENERATION AND DISTRIBUTION / PAPER TA 10.5

TA 10.5 Active GHz Clock Network using Distributed PLLs

Vadim Gutnik, Anantha Chandrakasan

MIT Microsystems Technology Lab, Cambridge, MA

Most modern microprocessors use a balanced tree to distribute the clock [1]. However, at gigahertz clock speeds an increasing fraction of skew and jitter comes from random variations in gate and interconnect delay. The majority of jitter in a clock tree is introduced by buffers and inter-line coupling to the clock wires. A relatively small amount comes from noise in the source oscillator [2].

This distributed clock network generates the clock signal with phase locked loops (PLLs) at multiple points (nodes) across a chip, and distributes each only to a small section of the chip (tile) (Figure 10.5.1). Phase detectors (PD) at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator.

With locally-generated clocks, there are no chip-length clock lines to couple in jitter; skew is introduced only by asymmetries in phase detectors instead of mismatches in physically separated buffers; and the clock is regenerated at each node, so high frequency jitter does not accumulate with distance from the clock source. Unlike earlier work on multiple clock domains, which suggests use of multiple independent clocks, this approach produces a single fully-synchronized clock. This arbitrary network of tiles, each with its own PLL, is more general than previous active skew management approaches [3].

However, because there are many nodes, the power and size constraints on each element of a distributed clock generation architecture are even more stringent than the constraints on a single, global PLL. Furthermore, there must be a way to ensure that the multiple nodes get and stay synchronized. The oscillator, phase detector, and loop filter of a working demonstration chip, fabricated in a standard 0.35µm, single-poly triple-metal process, are considered in turn below.

This chip uses an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO) to minimize power supply noise (Figure 10.5.2). Each differential inverter consists of a differential pair, a tail-current transistor, and nMOS loads (M4:7). The nMOS loads allow fast oscillation and shield the output signal from $V_{DD}$ noise. $V_{bias}$ is a low-pass version of $V_{DD}$ generated by subthreshold leakage through a pFET; supply noise coupling in through the capacitance of M4:7 is bypassed. The oscillation frequency is dependent on the supply voltage only through capacitor nonlinearity and the output conductance of M4:7, and feedback of the PLL compensates drift of $V_{DD}$ and $V_{bias}$.

Because phase is periodic, a set of oscillators need not all be in phase to each have zero net phase error. This phenomenon, modelock, is described in Reference [4], which notes that modelock can be avoided by using PDs whose response decreases monotonically beyond a phase difference of π/2, as from an XOR phase detector. Note that this solution precludes the use of a phase-frequency detector (PFD). Lack of a PFD is problematic because the capture bandwidth of a memoryless PD is limited to a few percent of the center frequency, while the center frequencies of widely-spaced oscillators on a chip can easily vary by 10-20%.

The PD proposed here, shown in Figure 10.5.3, has sufficient nonlinearity, higher gain at small input phase difference, and less high-frequency content than an XOR PD. The core is an nMOS-loaded arbiter which acts as a nonlinear phase detector. For no input phase difference, the output is balanced. As the phase difference increases from zero, one output is asserted for the full duration of an input pulse, while the other output is asserted for only the remainder of the input pulse duration after the first input pulse ends, which is equal to the input phase difference. Thus the detector has high gain near zero phase error that drops off to zero as the input phase difference approaches the input pulse width (Figure 10.5.4).

The pulse generators P1 and P2 enable this arbiter to give frequency error feedback. If one input is at a higher frequency than the other, its output will be asserted for more input pulses than the other. Because the width of the pulses is independent of input frequency, the average output voltage corresponds to frequency. Unlike a typical phase-frequency detector, however, the strength of the error signal falls to zero as the frequency difference goes to 0, so there can be no modelock problems, yet large-signal frequency- (and hence, phase-) locking is enhanced. Figure 10.5.5 shows the large-signal correction and small-signal behavior of the entire array of PLLs as the already internally-locked array approaches and locks to the reference clock. The PD fits in 30x30µm².

One loop filter is associated with each VCO. To avoid the series resistor of a charge pump with passive RC compensation, a feed-forward compensation method is used. The loop filter of Figure 10.5.6 consists of two differential amplifiers, A1 and A2. The differential output currents from the PDs at the edges of each tile are summed at nodes In+ and In-, and drive both amplifiers. A1 is a single-stage differential pair, so it has relatively low gain but a bandwidth limited by its $g_m/C$ ratio. A2 has a high gain cascoded stage driving a common source pFET. A large gate capacitor serves to set the dominant pole of A2 such that the PLL network is stable. The stage is biased at low current to boost gain and enable a pole frequency as low as 12kHz with a 15x15µm² gate capacitor. The simple design and feed-forward compensation allow the loop filter to fit in only 15x45µm². Each clock node, consisting of an oscillator and a loop filter, takes just 45x45µm².

A chip was fabricated with a 4x4 array of nodes and a PD between nearest neighbors. Counting one node and two PDs, the area overhead is approximately 0.0038mm² per tile. Another PD is between one of the nodes and the chip clock input to lock the network to an external reference. The output of the 16 oscillators is divided by 64 and driven off chip. At $V_{DD} = 3$V, the divided outputs are frequency-locked at 17 to 21MHz, corresponding to oscillator phase lock at 1.1 to 1.3GHz. An oscilloscope plot of four locked output signals is shown in Figure 10.5.7.

Long-term jitter between neighbors is less than 30ps rms. Cycle-to-cycle jitter is less than 10ps. The oscillators, amplifiers, and all the biasing draw 130mA at 3V. A chip plot is shown in Figure 10.5.8. The rest of the area on the 3x3mm² chip is taken up by test circuits.

Design and measurement on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network can take advantage of improved devices by shrinking the size of the cells, lowering the overall skew and jitter, so performance will scale with device speed, rather than with the much slower improvement of on-chip interconnect speed.
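The modelock condition mentioned above (a PD response that falls off beyond a phase difference of π/2) can be related to loop size with a short numerical sketch. It assumes the loop argument attributed to Reference [4]: if the phases around a closed loop of n nodes wind through a nonzero multiple of 2π, at least one branch must carry a phase difference of 2π/n or more. The helper names below are hypothetical and chosen only for this illustration.

```python
import math


def wrap(x):
    """Wrap a phase difference into the interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def branch_errors(phases):
    """Wrapped phase differences between consecutive nodes of a closed loop."""
    n = len(phases)
    return [wrap(phases[(i + 1) % n] - phases[i]) for i in range(n)]


def winding_number(phases):
    """Net number of 2*pi turns around the loop; nonzero indicates a modelocked twist."""
    return round(sum(branch_errors(phases)) / (2 * math.pi))


def mode_lock_bound(n):
    """Smallest possible value of the largest branch error in a twisted loop of n nodes."""
    return 2 * math.pi / n


if __name__ == "__main__":
    # Four-node loop of a rectangular array with each node leading the next by pi/2.
    twisted = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
    print(winding_number(twisted), max(abs(e) for e in branch_errors(twisted)))
    print(mode_lock_bound(4))
```

For the four-node loop the uniformly twisted state sits exactly at the bound, which is why a detector whose response decreases beyond π/2 is the relevant threshold for a rectangular array.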

Acknowledgments: The authors acknowledge support from the MARCO Focused Research Center on Interconnects, funded at MIT through a subcontract from Georgia Tech. Vadim Gutnik was partly supported by a graduate fellowship from Intel Corp.

2000 IEEE International Solid-State Circuits Conference

0-7803-5853-8/00/$10.00 ©2000 IEEE

ISSCC 2000 / February 8, 2000 / Salon 9 / 10:45 AM

References:

[1] Bailey, D. W. and B. J. Benschneider, “Clocking design and analysis for a 600MHz Alpha microprocessor,” Journal of Solid State Circuits, vol. 33, no. 11, pp. 1627-1633, November 1998.
[2] Young, I. A., M. F. Mar, and B. Bhushan, “A 0.35µm CMOS 3-880MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in ISSCC Digest of Technical Papers, February 1997, pp. 330-331.
[3] Geannopoulos, G. and X. Dai, “An adaptive digital deskewing circuit for clock distribution networks,” in ISSCC Digest of Technical Papers, February 1998, pp. 400-401.
[4] Pratt, G. A. and J. Nguyen, “Distributed synchronous clocking,” IEEE Transactions on Parallel and Distributed Systems, February 1995.

Figure 10.5.1: Distributed clocking network.

Figure 10.5.2: Ring oscillator schematic.

Figure 10.5.3: Phase detector.

Figure 10.5.4: Simulated PD transfer curve (horizontal axis: time difference in nanoseconds).

Figure 10.5.5: Locking behavior of the PLL array (horizontal axis: simulation time in microseconds).

Figure 10.5.6: Loop filter schematic.

Figure 10.5.7: See page 454.

Figure 10.5.8: See page 455.

DIGEST OF TECHNICAL PAPERS

ISSCC 2000 PAPER CONTINUATIONS

Figure 10.1.5: Clock and data recovery (CDR) with MPM.

Figure 10.4.6: Wire inductance formulae.

Figure 10.4.7: Wire inductance with substrate effect.

Figure 10.4.8: Micrograph.

Figure 10.5.8: Distributed clock chip.

Figure 11.1.7: Range finder ASIC in 0.8µm CMOS and packaged transmitter-receiver. Core area of chip is 1x1.67mm².

Figure 11.2.7: Die micrograph.

Figure 11.3.4: 130 channel prototype micrograph. Chip is 2x8mm².

Figure 11.3.8: Measured prototype characteristics.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

377

An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation and Low-Jitter Performance

Yongsam Moon, Student Member, IEEE, Jongsang Choi, Kyeongho Lee, Member, IEEE, Deog-Kyoon Jeong, Member, IEEE, and Min-Kyu Kim

Abstract—This paper describes an all-analog multiphase delay-locked loop (DLL) architecture that achieves both wide-range operation and low-jitter performance. A replica delay line is attached to a conventional DLL to fully utilize the frequency range of the voltage-controlled delay line. The proposed DLL keeps the same benefits of conventional DLL's such as good jitter performance and multiphase clock generation. The DLL incorporates dynamic phase detectors and triply controlled delay cells with cell-level duty-cycle correction capability to generate equally spaced eight-phase clocks. The chip has been fabricated using a 0.35-µm CMOS process. The peak-to-peak jitter is less than 30 ps over the operating frequency range of 62.5–250 MHz. At 250 MHz, its jitter supply sensitivity is 0.11 ps/mV. It occupies smaller area (0.2 mm2) and dissipates less power (42 mW) than other wide-range DLL's [2]–[7]. Index Terms—Delay-locked loop, duty-cycle correction, dynamic phase detector, multiphase clock generation, replica delay line, triply controlled delay cell.

Manuscript received July 20, 1999; revised October 6, 1999. Y. Moon, J. Choi, and D.-K. Jeong are with the School of Electrical Engineering, Seoul National University, Seoul 151-742 Korea (e-mail: [email protected]). K. Lee was with the School of Electrical Engineering, Seoul National University, Seoul 151-742 Korea. He is now with Global Communication Technology Inc., Los Altos, CA 94024 USA. M.-K. Kim is with Silicon Image, Inc., Sunnyvale, CA 94086 USA. Publisher Item Identifier S 0018-9200(00)00538-2.

I. INTRODUCTION

AS THE SPEED performance of VLSI systems increases rapidly, more emphasis is placed on suppressing skew and jitter in the clocks. Phase-locked loops (PLL's) and delay-locked loops (DLL's) have been typically employed in microprocessors, memory interfaces, and communication IC's for the generation of on-chip clocks. However, it becomes increasingly difficult to reduce the clock skew and jitter, whether they are inherent or result from substrate and supply noise, as the clock speed and circuit integration levels are increased. While the phase error of PLL's is accumulated and persists for a long time in a noisy environment, that of DLL's is not accumulated, and thus, the clock generated from DLL's has lower jitter. Therefore, DLL's offer a good alternative to PLL's in cases where the reference clock comes from a low-jitter source, although their usage is excluded in applications where frequency tracking is required, such as frequency synthesis and clock recovery from an input signal. However, the main problem of conventional DLL's [1] is that they are very difficult to design to work over process, voltage, and temperature (PVT) variations. Since DLL's adjust only phase, not frequency, the operating frequency range is severely limited. We propose a new DLL architecture that operates in a wide frequency range while keeping the low-jitter performance.

Various wide-range DLL architectures [2]–[7], with similar motivations, have been developed, which can be classified into three categories: analog type [2], digital type [3], [4], and dual-loop type [5]–[7]. While a conventional analog DLL [1] uses a voltage-controlled delay line (VCDL), the wide-range analog DLL [2] uses phase mixers for wide-range operation. However, because of its relatively high analog complexity, the analog DLL requires a process-specific implementation, making it relatively difficult to port across multiple processes [4]. Thus, digital DLL's [3], [4] have been proposed for better process portability. However, skew error and jitter are increased due to continuous change of phase selections among quantized delay times with supply and temperature variations. To overcome these problems, dual-loop architectures have been proposed [5]–[7]. In [5], a PLL is added to make the core DLL lock to a reference frequency, and a phase mixer interpolates two intermediate clocks in the core DLL and produces the final output clock. Alternatively, an almost continuous phase is obtained with the addition of a fine delay line [6] or a phase interpolator [7] to a digital DLL. However, the additional chip area and power consumption of these wide-range DLL's are excessive, and furthermore, their jitter performance gets worse compared with conventional DLL's since the number of delay cells or gates in the clock propagation paths becomes larger.

We propose a new DLL architecture that achieves a large operating range by attaching a replica delay line in parallel with a conventional analog DLL.
Since the replica delay line occupies one-fourth the area of the core DLL, it incurs only a small increase in chip area and power consumption. Since the replica delay line is out of the clock propagation path, it does not degrade the low-jitter performance. While other wide-range DLL's [2]–[7] use phase mixers or phase selections to generate a single output, the proposed DLL uses a multistage analog VCDL similar to what conventional analog DLL's use. Therefore, the proposed DLL can generate multiphase clocks without using an excessive amount of hardware. Furthermore, by incorporating a dynamic phase detection circuit and a cell-level duty-cycle correction method, the multiphase clocks are equally spaced even in high-frequency operations. A prototype DLL designed for eight-phase clocks can be used in applications such as gigabit serial interfaces [8], [9].

This paper is arranged as follows. Section II describes a conventional analog DLL and includes an analysis of its operational frequency range. This section also overviews other wide-range DLL architectures. In Section III, the proposed architecture is presented with design ideas, issues, and various analyses. Section IV describes various circuits used in the design. Section V discusses the prototype chip implementation and shows experimental results. Section VI concludes this paper with a summary.

0018–9200/00$10.00 © 2000 IEEE

Fig. 1. Block diagram of (a) a conventional DLL and (b) a DLL locking operation and operating frequency range limitation.

Fig. 2. Block diagrams of (a) a digital DLL and (b) a dual-loop DLL.

Fig. 3. Block diagram of the proposed analog DLL.

II. CONVENTIONAL ARCHITECTURES

A. Range Problem of Conventional DLL's

A simplified block diagram of a conventional DLL [1] is outlined with its operation mechanism in Fig. 1. When the delay time $T_{VCDL}$ of the VCDL is initially smaller (or larger) than the period $T_{CLK}$ of the reference clock (Ref-CLK), the DLL adjusts $T_{VCDL}$ until the phase difference disappears in a negative feedback loop, as shown in Fig. 1(b). The phase difference is detected by sampling the reference clock with the rising edge of the output clock (DLL-CLK). Depending on the sampled value, a DOWN or UP pulse is generated. These pulses discharge (or charge) a capacitor in the loop filter, thereby decreasing (or increasing) the control voltage and reducing the phase difference gradually. However, if the sampling edge of DLL-CLK deviates from the lock range indicated in Fig. 1(b), the DLL falls prey to a stuck or a harmonic lock problem. In order to avoid this problem, the minimum of $T_{VCDL}$ should be located between $0.5\,T_{CLK}$ and $T_{CLK}$, and the maximum between $T_{CLK}$ and $1.5\,T_{CLK}$. These stuck-free conditions can be expressed as the following inequality:

$$0.5\,T_{CLK} < T_{VCDL,min} < T_{CLK}, \qquad T_{CLK} < T_{VCDL,max} < 1.5\,T_{CLK} \tag{1}$$

or, equivalently, in terms of $T_{CLK}$

$$\mathrm{Max}\left(T_{VCDL,min},\ \tfrac{2}{3}\,T_{VCDL,max}\right) < T_{CLK} < \mathrm{Min}\left(2\,T_{VCDL,min},\ T_{VCDL,max}\right) \tag{2}$$

The range of stuck-free clock period is determined by inequality (2). If the target clock period satisfies inequality (2), the DLL works without the stuck problem. However, it should be noted that the range of $T_{CLK}$ allowed by inequality (2) is at most 1.5:1, which is reached when $T_{VCDL,max}$ is between 1.5 and 2 times $T_{VCDL,min}$. In addition, if $T_{VCDL,max} \ge 3\,T_{VCDL,min}$, there is no range of $T_{CLK}$ that satisfies inequality (2), and the DLL is prone to the stuck problem. Since the PVT variations of $T_{VCDL}$ can be as much as 2:1 in a typical CMOS process, the stuck-free condition can be satisfied over only a very narrow range of $T_{CLK}$, and thus a time-consuming and tedious circuit trimming job is required when a design is migrated across different processes.

B. Digital DLL's and Dual-Loop DLL's

Digital DLL's [3], [4] have been developed to overcome the narrow frequency range problem of conventional analog DLL's. A simplified block diagram of a typical digital DLL [3] is outlined in Fig. 2(a). Multistage delay cells in the VCDL provide fixed and quantized delay times. The finite-state machine selects the clock output whose phase is closest to that of the reference clock by using digital control bits instead of an analog control voltage. Therefore, major drawbacks of the digital DLL's are large skew due to quantized delay time and large jitter due to control-bit updates during operation. To increase the resolution and cover a wide delay range, a large delay cell array must be used, and that inevitably increases chip area and power consumption. In order to cope with these problems, Garlepp et al. [4] proposed a phase blending technique in a hierarchical structure for improved phase resolution. However, the inherent problems of digital DLL's are not solved entirely.

Dual-loop DLL's [6], [7] have been proposed to minimize the problems of digital DLL's. A simplified block diagram of the architecture proposed in [6] is shown in Fig. 2(b). A fine delay line, which is analog controlled, is attached to a digital DLL in the subsequent stage. In [7], a phase interpolator is cascaded to a digital DLL for unlimited phase capture range. These dual-loop architectures achieve both a wide frequency range and relatively low jitter. However, due to the inherent nature of digital DLL's, the jitter histogram of the generated clock shows the superposition of two Gaussian distributions [7] resulting from the control-bit updates. In addition, the overhead of chip area and power consumption is significant.
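The range problem analyzed in Section II-A can be explored numerically. The sketch below assumes the stuck-free conditions take the form $0.5\,T_{CLK} < T_{VCDL,min} < T_{CLK}$ and $T_{CLK} < T_{VCDL,max} < 1.5\,T_{CLK}$, and solves them for the clock period; the function name and the normalized delay values are illustrative and not taken from the paper.

```python
def stuck_free_period_range(t_min, t_max):
    """Return (lower, upper) bounds on the clock period for stuck-free locking,
    or None when no clock period satisfies both conditions.

    t_min, t_max: minimum and maximum VCDL delay, in the same time unit.
    """
    # 0.5*T < t_min < T   gives  t_min < T < 2*t_min
    # T < t_max < 1.5*T   gives  (2/3)*t_max < T < t_max
    lower = max(t_min, 2.0 * t_max / 3.0)
    upper = min(2.0 * t_min, t_max)
    return (lower, upper) if lower < upper else None


if __name__ == "__main__":
    for t_max in (1.2, 1.5, 2.0, 2.5, 3.0):
        print(t_max, stuck_free_period_range(1.0, t_max))
```

Under this assumed form, the usable period range never exceeds 1.5:1 and vanishes once the maximum delay reaches three times the minimum, which is the narrow-range behavior the section describes.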

III. PROPOSED ARCHITECTURE

A. All-Analog DLL Using a Replica Delay Line

Fig. 3 shows a high-level block diagram of the proposed architecture. The delay time $T_{VCDL}$ of the main analog DLL (core DLL) is primarily controlled by a control voltage Vcr, which is generated from a replica delay line. Another control voltage Vcp fine-tunes $T_{VCDL}$. The replica delay line consists of only one replica delay cell, a current steering phase detector (CSPD), and a low-pass filter (LPF). The replica delay cell is identical to the delay cells in the core DLL. Due to sharing of Vcr, the delay time $T_{DR}$ of the replica delay cell is almost equal to the delay time $T_D$ of each delay cell in the core DLL. They are not exactly the same unless Vcp equals the bias voltage. Due to the characteristics of the proposed CSPD [10], $T_{DR}$ is forced to be one-eighth of the clock period $T_{CLK}$. Therefore, $T_{VCDL}$ of the core DLL becomes equal to $T_{CLK}$ when the number of delay cells in the core DLL is eight. With the replica delay line with a wide frequency range, the core DLL's operating frequency bounds will be established, and thus the core DLL will not fall into such a harmonic lock problem as conventional analog DLL's do.

With only a negligible increase in chip area and power consumption, the proposed architecture offers many advantages compared with other wide-range DLL's. Since the DLL is analog controlled and the clock path is not extended, the DLL can keep the low-jitter performance of the conventional DLL. In addition, because it uses a multistage analog VCDL, the proposed DLL can generate multiphase clocks.

B. Delay Capture Range of the Replica Delay Line

Fig. 4 shows the circuit diagram and operation waveforms of the replica delay line. The replica delay line generates a control voltage Vcr to pass to the core DLL. Vcr is used as a reference voltage in the core DLL to lock to the input frequency. The CSPD takes two inputs, ICLK and QCLK. ICLK is directly connected to Ref-CLK, and QCLK is delayed from Ref-CLK by one delay cell. The replica delay $T_{DR}$ is equal to the delay difference between the rising edges of ICLK and QCLK. In the charge pump, the pullup current $I_{UP}$ is tuned to three times the pulldown current $I_{DN}$. When the XNOR output is high, the filter capacitors are discharged by $I_{DN}$, and Vcr goes down. On the other hand, when the XNOR output is low, the capacitors are charged by $I_{UP}$, and Vcr goes up three times faster. When the feedback loop is locked, a stable value of Vcr is obtained with the relation $I_{UP}\,t_{low} = I_{DN}\,t_{high}$, where $t_{low}$ and $t_{high}$ are the low and high durations of the XNOR output. Therefore, in the locked state, the XNOR output has a low-to-high duration ratio of 1:3, and the rising edge of ICLK leads that of QCLK by one-eighth of $T_{CLK}$.

Fig. 5(a) shows the capture range of the replica delay line when $T_{DR}$ is smaller than $T_{CLK}$. This can be derived from the gain curve of the CSPD shown in Fig. 5(b). If $T_{DR}$ is smaller than $\tfrac{1}{8}T_{CLK}$, the change of Vcr, denoted as $\Delta V_{cr}$, will be negative and $T_{DR}$ will increase. That action is indicated in the gain curve as the corresponding arrow pointing to the right. If $T_{DR}$ is between $\tfrac{1}{8}T_{CLK}$ and $\tfrac{7}{8}T_{CLK}$, $\Delta V_{cr}$ will be positive and the corresponding arrow points to the left. Eventually, the phase difference settles at $\tfrac{1}{4}\pi$, which represents $T_{DR} = \tfrac{1}{8}T_{CLK}$. However, if $T_{DR}$ is larger than $\tfrac{7}{8}T_{CLK}$, the settling point will run away and a harmonic lock problem will occur.

Fig. 4. Configuration and operation of a replica delay line.

Fig. 5. (a) Delay capture range of the replica delay line and (b) gain curve of CSPD.
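The lock point and capture range of the replica delay line follow from a simple charge balance, sketched below. This is an idealized model and not the circuit itself: it assumes 50% duty-cycle clocks, an ideal XNOR gate, and a pullup current three times the pulldown current; the function names are invented for the illustration.

```python
def xnor_low_fraction(x):
    """Fraction of a clock period during which the XNOR of two 50%-duty clocks is low,
    where x is the delay between them normalized to the clock period (0 <= x <= 1)."""
    return 2.0 * min(x, 1.0 - x)


def net_charge(x, pump_ratio=3.0):
    """Net charge delivered to the filter per period, in units of I_down * T_CLK.
    Positive values raise Vcr (delay shrinks); negative values lower Vcr (delay grows)."""
    low = xnor_low_fraction(x)
    return pump_ratio * low - (1.0 - low)


if __name__ == "__main__":
    for x in (0.05, 0.125, 0.5, 0.875, 0.95):
        print(x, net_charge(x))
```

In this model the net charge is negative below one-eighth of the period, zero at one-eighth, positive up to seven-eighths, and negative again beyond that, which reproduces the settling and runaway regions described for Fig. 5.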

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

Fig. 6. Replica delay line for high-frequency operation.

The operating conditions explained above, in which the delay line locks correctly, can be summarized in the inequalities as follows:

(3)

Fig. 7. Block diagram of the core DLL.

or equivalently in terms of Max (4)

If the delay range of the controlled delay cell satisfies the required relation, the DLL will have a frequency range determined by the entire delay range of the delay cell. However, even if the delay range is made wider than this relation allows in an effort to increase the frequency range, the lock range is limited to only 7:1. In applications where the frequency range must be larger than 7:1, changing the pump-current ratio of the CSPD can make the frequency range wider. For example, one choice of ratio gives a frequency range of 9:1, and another gives 11:1.

In high-frequency operation, the pulses involved may be too short to drive the XNOR gate. So, a divide-by-two circuit and a pair of delay cells are used to slow down the frequency of Ref-CLK [11]. The new configuration shown in Fig. 6 is effectively the same as the one in Fig. 4 but offers more robust operation at high frequencies.

C. Core DLL

Fig. 7 shows a simplified block diagram of the core DLL. It consists of a VCDL, a dynamic phase detector, a charge pump, and a loop filter. The core DLL generates eight-phase clock outputs through eight delay cells (DC's) in the VCDL. The core DLL is the same as a conventional analog DLL except that it has another control voltage, Vcr. Vcr from the replica delay line coarsely determines the delay time of the VCDL so that it is equal to one clock period in the locked state. In the locked state, the eighth clock output, CLK7 in Fig. 7, is aligned with Ref-CLK. In high-frequency operation, there may be some static phase mismatch between CLK7 and Ref-CLK because the rise/fall times of the signal transition edges are long compared with the period of the clock, so fine-tuning is required. The dynamic phase detector (PD) in the core DLL generates the control signal Vcp, fine-tunes the VCDL delay, and removes the residual phase mismatch so that the rising edge of Ref-CLK is exactly aligned with that of CLK7.
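The remark on the pump-current ratio in Section III-B can be illustrated with the same idealized charge-balance model (an assumption made here, not a formula taken from the paper). For a pull-up to pull-down ratio of k:1, balancing k times the XNOR low time against the high time gives a lock point at 1/(2(k+1)) of the period, and the unstable point sits symmetrically at one period minus that value. The function names are hypothetical.

```python
from fractions import Fraction


def lock_fraction(k):
    """Stable lock delay, as a fraction of the clock period, for a k:1 pump ratio."""
    # Balance k * t_low = t_high with t_low = 2*d and t_high = 1 - 2*d.
    return Fraction(1, 2 * (k + 1))


def capture_ratio(k):
    """Ratio of the unstable point (1 - d_lock) to the stable lock point d_lock."""
    d = lock_fraction(k)
    return (1 - d) / d


for k in (3, 4, 5):
    print(f"ratio {k}:1 -> lock at {lock_fraction(k)} T, range {capture_ratio(k)}:1")
```

Under this model a 3:1 ratio reproduces the 7:1 limit, and ratios of 4:1 and 5:1 would give the 9:1 and 11:1 ranges quoted above. The ratios actually used by the authors are not recoverable from this text, so these values should be read as a consistency check only.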

Fig. 8. (a) Core DLL with cell-level duty-cycle correction and (b) rising and falling edge alignment.

D. Cell-Level Duty-Cycle Correction

Fig. 8 shows the core DLL with a cell-level duty-cycle correction mechanism. In high-frequency operation, clock outputs with a short cycle time can be severely distorted as the clock passes through many delay cells. Even if the duty cycle of Ref-CLK is 50% at the entrance, that of CLK7 may deviate significantly from 50%. This causes the multiphase clock outputs to have phase error, which could be fatal, especially in high-speed communication applications. A conventional solution is to attach duty-cycle correction circuits to all clock output drivers, at the price of added area, increased jitter, and further phase mismatch due to the elongated path. So a cell-level duty-cycle correction is proposed. The second phase detector shown in Fig. 8 takes inverted Ref-CLK and inverted CLK7 as its inputs and generates a control signal Vduty as its output. Vduty fine-tunes the cell current ratio and thus aligns the falling edges of Ref-CLK and CLK7. In the steady state, therefore, both rising and falling edges of CLK7 and Ref-CLK are synchronized in phase, and both clocks have the same duty cycle. It should be noted that the duty-cycle correction circuit (DCC) used right at the input of Ref-CLK corrects

MOON et al.: ALL-ANALOG MULTIPHASE DLL

Fig. 9. Triply controlled DC. (a) Circuit diagram of a DCE and (b) configuration of a triply controlled DC.

Fig. 10. (a) Dynamic phase detector and (b) its operations.

Fig. 11. Prototype chip microphotograph.

the duty cycle of Ref-CLK only. With cell-level duty-cycle correction, not only CLK7 but also the other intermediate clock outputs maintain a 50% duty cycle without any additional circuits. Although the two control voltages Vcp and Vduty are simultaneously adjusted in coupled negative feedback loops, stability is guaranteed by giving one of the loops a sufficiently low bandwidth.

IV. CIRCUIT DESIGN

A. Triply Controlled Delay Cell

According to the noise analyses of [12] and [13], a fast-slewing (short rise/fall time) delay cell with a fully switching capability offers less phase noise. Although it offers a full-swing output, a shunt-capacitor delay cell [14], with its capacitor, would increase the chip area and power. Therefore, we decided to use the current-starved inverter [15] as the basic controlled delay cell. Since the current-starved inverter does not require the level conversion circuit that a differential delay cell needs, it has less chip area and power, although substrate and supply noise might have a detrimental influence.

A triply controlled delay cell is used as the basic delay cell element (DCE). The circuit diagram of the DCE and the configuration of one unit of DC are shown in Fig. 9. Four DCE's and two inverters compose a DC and make its rising/falling times symmetric. The delay time of the triply controlled delay cell is determined by six control signals: Vcr, Vcr_b, Vcp, Vcp_b, Vduty, and Vduty_b. Of those signals, Vcr and Vcr_b come from the replica delay line. In the DCE, MP1 and MN1 are made larger than the other transistors so that Vcr and Vcr_b primarily control the cell currents. The other control signals, which are generated by the core DLL, make only small adjustments to these currents for fine-tuning of the delay. Vcp and Vcp_b are used to align the rising edges of Ref-CLK and CLK7. Vduty and Vduty_b are responsible for maintaining the correct duty cycle and, thus, aligning the falling edges.

Since the high and low levels alternate in an inverter chain, the duty-cycle control signals must alternate between Vduty and Vduty_b as well. Therefore, Vduty_b controls DCE0 and DCE3 and Vduty controls DCE1 and DCE2, as shown in Fig. 9(b). In the delay circuit, either Vduty or Vduty_b changes the duty cycle of the clock outputs by adjusting the ratio between the cell currents. With this mechanism, the multiphase clock outputs, CLK0 to CLK7, will be duty-cycle corrected and equally spaced. There is no need to attach a DCC circuit to each clock output.

B. Dynamic Phase Detector

Since the tuning precision of the core DLL depends on the characteristics of the phase detector, we propose a new high-precision dynamic phase detector. Fig. 10(a) shows the circuit diagram of the proposed dynamic phase detector, which is derived from the published phase-frequency detector [8] by removing a feedback path and replacing the feedback input with an REF and DCLK signal. The phase detector can operate with less phase offset at high frequencies because of the symmetry of the circuit, a shallow logic depth of only two gates, and fast operation with a dynamic logic circuit. While the widths of the UP and DOWN pulses are proportional to the phase difference of the inputs, as shown in Fig. 10(b), there remains a chain of short pulses in the locked state. These pulses in the locked state serve to reduce the dead zone of the phase detector [8]. However, the accuracy of the phase detector is improved when the pulse duration is shorter. Furthermore, a smaller capacitor can be used in the loop filter since the amount of pumped charge is smaller compared

Fig. 12. Clock waveforms at 62.5 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.

with a conventional “bang-bang” type of phase detector or a proportional phase detector with wider pulse width.

V. EXPERIMENTAL RESULTS

The test chip has been fabricated using a 0.35-µm, N-well, triple-metal CMOS process. The threshold voltages in this process are 0.42 V (NMOS) and −0.22 V (PMOS). The gate-oxide thickness is 75 nm. Fig. 11 shows a microphotograph of the fabricated chip. The chip integrates the DLL with an on-chip decoupling capacitance of 270 pF. The active area of the DLL occupies 0.08 mm² and the decoupling capacitor 0.12 mm². Since the pulse currents of the multiphase clock outputs are interspersed, the ac component of the supply current is present at the eighth harmonic of the clock frequency. Therefore, the 270-pF on-chip capacitor is adequate to reduce the on-chip supply noise induced by switching of digital circuits.

The prototype chip operates from 62.5 to 250 MHz with a 3.3-V power supply. Fig. 12(a) shows the waveforms of CLK0 and CLK2 at 62.5 MHz. These clock outputs are the first and the third clocks, respectively, and have a 90° phase difference. Fig. 12(b) shows the waveforms of CLK0 and CLK4, which are inversions of each other with a 180° phase difference. Fig. 13(a) and (b) shows the same waveforms at 250 MHz. In spite of some ringing due to the capacitance and inductance of the board and measurement instrument, the measurement results show that the clock outputs are aligned with precise phase relationships of less than 1% error over an operating frequency range from 62.5 to 250 MHz. The delay range of the VCDL is estimated to be between 4 and 16 ns. With a minor change of device sizes in the VCDL, the operating frequency range could be extended toward higher frequencies.

Fig. 14(a) and (b) shows the jitter histograms of the clock output CLK7. The frequency of Ref-CLK is 250 MHz. Fig. 14(a) shows 4-ps rms and 29-ps peak-to-peak jitter with a quiet power supply, where only the DLL is activated in the chip. When other digital circuits are turned on, the rms and peak-to-peak jitter increase to 6.4 and 44 ps, respectively, and internal supply noise of about 200 mV is measured. If a 500-mV, 1.1-MHz square wave is injected externally on the power supply, the peak-to-peak jitter increases to 83 ps, as shown in Fig. 14(b). At 250 MHz, the jitter supply sensitivity is measured to be only 0.11 ps/mV. Furthermore, from 62.5 to 250 MHz, the clock outputs show almost flat jitter performance. Since the delay range of the VCDL in the core DLL is primarily set by Vcr and Vcr_b, the gain of the VCDL is nearly flat over a wide range of operating frequency. The jitter performance of the proposed DLL is better than or at least comparable to other wide-range DLL's [2]–[7].

Table I summarizes the DLL performance characteristics. The power dissipation is proportional to the operating frequency. Operating at 250 MHz, the DLL draws 12.6 mA dc from a 3.3-V power supply.
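Several of the figures quoted above can be cross-checked with simple arithmetic. The sketch below is an illustration only; in particular, computing the supply sensitivity as the increase in peak-to-peak jitter divided by the injected noise amplitude is an assumed reading of how the 0.11 ps/mV value was obtained, and the function names are hypothetical.

```python
N_PHASES = 8  # eight delay cells in the core DLL


def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / f_mhz


def phase_deg(i, j):
    """Phase difference in degrees between clock outputs CLKi and CLKj."""
    return (j - i) * 360.0 / N_PHASES


# VCDL total delay equals one clock period in lock.
print(period_ns(250.0), period_ns(62.5))      # delay-range end points in ns

# Phase relationships shown in Fig. 12.
print(phase_deg(0, 2), phase_deg(0, 4))

# Assumed definition: extra peak-to-peak jitter per mV of injected supply noise.
sensitivity_ps_per_mv = (83.0 - 29.0) / 500.0
print(round(sensitivity_ps_per_mv, 2))

# DC power at 250 MHz from the quoted supply current and voltage.
power_mw = 12.6 * 3.3
print(round(power_mw, 1))
```

The periods at the two ends of the frequency range agree with the estimated 4 to 16 ns VCDL delay range, and the assumed sensitivity definition rounds to the quoted value.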

Fig. 13. Clock waveforms at 250 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.

Fig. 14. Jitter histograms at 250 MHz in (a) a quiet supply and (b) with added 1.1-MHz, 500-mV square wave noise.

TABLE I
PERFORMANCE CHARACTERISTICS OF PROTOTYPE CHIP

VI. CONCLUSION

By including a replica delay line with a CSPD, the core DLL operates in a wide frequency range from 62.5 to 250 MHz. Since the replica delay line occupies a quarter of the area of the core DLL, the area cost and power consumption of the prototype chip are much smaller than those of other wide-range DLL's [2]–[7]. Both the analog-control scheme and the flat gain of the VCDL offer a low-jitter performance of 4-ps rms and 29-ps peak-to-peak, and a low supply sensitivity of 0.11 ps/mV. The DLL incorporates dynamic phase detectors and triply controlled delay cells with cell-level duty-cycle correction capability in order to generate equally spaced eight-phase clocks. The DLL can be used not only as an internal clock buffer of microprocessors and memory IC's but also as a multiphase clock generator for gigabit serial interfaces. With a faster VCDL obtained by a minor change of device sizes, the DLL will operate at a higher and wider frequency range.

REFERENCES

[1] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabyte/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, “Multifrequency zero-jitter delay-locked loop,” IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.
[4] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, “A portable digital DLL for high-speed CMOS interface circuits,” IEEE J. Solid-State Circuits, vol. 34, pp. 632–644, May 1999.
[5] S. Tanoi, T. Tanabe, K. Takahashi, S. Miyamoto, and M. Uesugi, “A 250–622 MHz deskew and jitter-suppressed clock buffer using two-loop architecture,” IEEE J. Solid-State Circuits, vol. 31, pp. 487–493, Apr. 1996.
[6] K. Lee, Y. Moon, and D.-K. Jeong, “Dual loop delay-locked loop,” U.S. patent pending.
[7] S. Sidiropoulos and M. A. Horowitz, “A semi-digital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[8] S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[9] D.-L. Chen and M. O. Baker, “A 1.25 Gb/s, 460 mW CMOS transceiver for serial data communication,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 242–243.
[10] Y. Moon, D.-K. Jeong, and G. Kim, “Clock dithering for electromagnetic compliance using spread spectrum phase modulation,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1999, pp. 186–187.
[11] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, “A 62.5–250 MHz multi-phase delay-locked loop using a replica delay line with triply controlled delay cells,” in Proc. IEEE Custom Integrated Circuits Conf., May 1999, pp. 299–302.
[12] B. Kim, “High speed clock recovery in VLSI using hybrid analog/digital techniques,” Ph.D. dissertation, Univ. of California, Berkeley, Memo. UCB/ERL M90/50, June 1990.
[13] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and phase noise in ring oscillators,” IEEE J. Solid-State Circuits, vol. 34, pp. 790–804, June 1999.
[14] M. Bazes, “A novel precision MOS synchronous delay line,” IEEE J. Solid-State Circuits, vol. SC-20, pp. 1265–1271, Dec. 1985.
[15] D.-K. Jeong, G. Borriello, D. A. Hodges, and R. H. Katz, “Design of PLL-based clock generation circuits,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 255–261, Apr. 1987.

Yongsam Moon (S'96) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1994 and 1996, respectively, where he is currently pursuing the Ph.D. degree. He has been working on architectures and CMOS circuits for microprocessors. His current research interests include clock and data recovery for high-speed communication and high-speed I/O interface circuits.

Jongsang Choi was born in Korea on September 11, 1974. He received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1997 and 1999, respectively, where he is currently pursuing the Ph.D. degree. He has been working on architectures and CMOS circuits for high-speed communication. His current research interests include high-speed CMOS circuits and gigabit network systems.

Kyeongho Lee (S'92–M’00) was born in Seoul, Korea, on August 5, 1969. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively. Since 2000 he has been with Global Communication Technology, Inc., Los Altos, CA. He is working on various CMOS high-speed circuits for RF communication. His research interests include high-speed CMOS circuits and PLL systems.

Deog-Kyoon Jeong (S'87–M'89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, in 1989. From 1989 to 1991, he was with Texas Instruments Incorporated, Dallas, TX, where he was a Member of the Technical Staff. He worked on modeling and design of BiCMOS circuits and single-chip implementation of the SPARC architecture. Since 1991, he has been on the Faculty of the School of Electrical Engineering, Seoul National University, Seoul, Korea, as an Associate Professor. His research interests include high-speed circuits, microprocessor architectures, and memory systems.

Min-Kyu Kim was born in Seoul, Korea, in 1965. He received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1988, 1990, and 1998, respectively. From 1995 to 1996, he was with the Electronics and Telecommunications Research Institute, Taejon, Korea, working on the development of high-speed communication IC's for ATM switches. Since 1998, he has been working on high-speed serial link technologies at Silicon Image, Inc., Cupertino, CA. His current interests include circuit design for high-speed communication systems and digital-interface display systems.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator

David J. Foley, Student Member, IEEE, and Michael P. Flynn, Senior Member, IEEE

Abstract—This paper describes a low-voltage low-jitter clock synthesizer and a temperature-compensated tunable oscillator. Both of these circuits employ a self-correcting delay-locked loop (DLL) which solves the problem of false locking associated with conventional DLLs. This DLL does not require the delay control voltage to be set on power-up; it can recover from missing reference clock pulses and, because the delay range is not restricted, it can accommodate a variable reference clock frequency. The DLL provides multiple clock phases that are combined to produce the desired output frequency for the synthesizer, and provides temperature-compensated biasing for the tunable oscillator. With a 2-V supply the measured rms jitter for the 1-GHz synthesizer output was 3.2 ps. With a 3.3-V supply, rms jitter of 3.1 ps was measured for a 1.6-GHz output. The tunable oscillator has a 1.8% frequency variation over an ambient temperature range from 0 °C to 85 °C. The circuits were fabricated on a generic 0.5-µm digital CMOS process.

Index Terms—CMOS analog integrated circuits, delay-locked loops, frequency synthesizers, tunable oscillators, voltage controlled oscillators.

I. INTRODUCTION

Traditionally, phase-locked loops (PLLs) have been used for clock synthesis. The synthesizer and tunable oscillator outlined in this paper employ a delay-locked loop (DLL). A DLL is more stable than higher order PLLs and requires only one capacitor in its first-order loop filter. On the other hand, a PLL generally requires a more complex second-order filter. This filter usually employs larger components which may need to be off chip. Additionally, a DLL offers better jitter performance than a PLL because phase errors induced by supply or substrate noise do not accumulate over many clock cycles [1].

The self-correcting DLL overcomes problems of false locking associated with conventional DLLs. A self-correcting circuit detects when the DLL is locked, or is attempting to lock, to an incorrect delay and then brings the DLL into a correct locked state. This DLL does not require the delay control voltage to be set on power-up; it can recover from missing reference clock pulses and, because the delay range is not restricted, it can accommodate a variable reference clock frequency. This paper describes how a small number of additional digital logic gates are required to convert a conventional DLL into a wider range self-correcting DLL. For comparison, in [2] a second DLL is added to achieve wider range operation.

The synthesizer outlined in this paper operates over a wide range of input reference clock frequencies and generates a low-jitter output clock running at nine times the reference frequency. Jitter measurements of 3.2 ps rms and 20 ps peak-to-peak, for a 2-V supply and 1-GHz output frequency, show that the core DLL compares well with recently reported DLLs [2], [3]. Multiple clock phases from the DLL are combined using digital logic to produce the synthesizer output [4]. An alternative approach requiring a pair of on-chip tuned LC-tanks is described in [5].

The tunable voltage-controlled oscillator (VCO) is intended for use in a transceiver where the receive and transmit clocks are plesiochronous. It is possible to tune the VCO around a center frequency while still maintaining good temperature independence. In some applications it may also act as a replacement for a fractional-N-type synthesizer. This circuit is similar to the oscillator described in [6] but it uses a lower jitter DLL in place of the PLL and can operate over a wider frequency range.

In Section II the DLL architecture is discussed, starting with a review of a conventional DLL and progressing to the new self-correcting architecture. Section III outlines the clock synthesizer architecture. This is followed in Section IV by an outline of the temperature-compensated tunable oscillator architecture. Section V discusses the circuit layout and Section VI introduces measured performance results for the two circuits. This paper then concludes in Section VII with a summary of the achievements of this work.

Manuscript received July 19, 2000; revised October 24, 2000. This work was supported by Parthus Technologies. D. J. Foley is with the Department of Microelectronics, National University of Ireland, Cork, Ireland. M. P. Flynn is with Parthus Technologies, Cork, Ireland. Publisher Item Identifier S 0018-9200(01)01483-4.

II. DLL ARCHITECTURE

A. Conventional DLL

A simplified block diagram of a conventional DLL is illustrated in Fig. 1. This circuit contains a voltage-controlled delay line (VCDL), a phase detector, a charge pump, and a first-order loop filter. The delay line, consisting of cascaded variable delay stages, is driven by the input reference clock, ckref. The output of the delay line's final stage and the ckref falling edges are compared by the phase detector to determine the phase alignment error. The phase detector output is integrated by the charge pump and loop filter capacitor to generate the control voltage, vcntl, of the delay stages. When correctly locked, the total delay of the delay line should equal one period of the reference clock. A conventional

0018–9200/01$10.00 © 2001 IEEE

Fig. 1. Conventional DLL architecture.

Fig. 2. (a) Three-stage VCDL. (b) Waveforms with correct lock. (c) Waveforms with false lock.

Fig. 3. Self-correcting DLL architecture.

DLL may lock or attempt to lock to an incorrect delay. In Fig. 2 we show correct and false locking for a three-stage delay line [Fig. 2(a)]. Fig. 2(b) shows the output phases at each stage with the delay line in correct lock. The DLL control loop has aligned the final-stage output and ckref. The total delay is one period of the reference clock. In Fig. 2(c) the final-stage output and ckref are again aligned but the total delay is two clock periods. The DLL can also falsely lock to three or more periods of delay or can attempt to lock to zero delay.

B. Self-Correcting DLL Architecture

Fig. 3 shows a block diagram of the new self-correcting DLL. The problem of false locking is solved by the addition of a lock-detect circuit and by some slight modifications to the conventional phase detector. The DLL incorporated in the two designs reported in this paper employs a nine-stage VCDL as shown in Fig. 3. In a conventional DLL, only the state of the output of the last delay element is used. From the example in Fig. 2, we can see that the state at the outputs of the other delay elements can

provide additional information about the nature of the locked delay. In the prototype the delayed phases (1:9) are decoded to indicate the VCDL delay. If the delay is outside an acceptable delay range, then the lock-detect circuit takes control of the loop from the phase detector. The lock-detect circuit signals the charge pump to charge or discharge the filter capacitor until it is safe for the phase detector to regain control of the loop. Three control signals are produced by the lock-detect circuit: over indicates that the VCDL delay is greater than 1.5 reference clock periods, under indicates that the delay is less than 0.75 clock periods, and release is activated when the delay reaches 1.25 clock periods. The release signal clears the over and under control signals and removes the phase detector from reset. The phase detector then regains control of the loop. If neither under nor over is active, then the phase detector has control of the loop and the DLL is either in correct lock or approaching correct lock. If the DLL is in lock and it is brought out of lock because of missing reference clock pulses or a step in the input reference frequency, then the DLL may inadvertently try to lock to an incorrect delay. The DLL is allowed to attempt to reach the undesired lock delay until it triggers either an over or an under signal, at which time the lock-detect circuit takes control of the DLL loop.

C. Lock-Detect Circuit

The VCDL output phases are first level shifted to CMOS levels. The level shift circuitry is designed to have high gain and fast rise and fall times. This helps to minimize any jitter contribution from this circuitry. The level-shifted output phases (1:9) are latched on the rising edge of the reference clock. The outputs from these latches are processed by the decode circuitry as shown in the schematic of Fig. 4. The decode inputs (1:8) correspond to output phases (1:8) of the VCDL. Fig. 5 shows example output waveforms for a nine-stage VCDL.
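The over, under, and release thresholds described in Section II-B behave like a small state machine with hysteresis. The sketch below is a behavioral illustration under stated assumptions, not the decode logic of Fig. 4: it takes the VCDL delay directly as a number of reference clock periods (the real circuit infers it from latched phases), and it assumes release fires when the delay crosses 1.25 periods from whichever side the loop was being driven. The class and return-value names are hypothetical.

```python
class LockDetect:
    """Behavioral model of the over/under/release control signals."""

    OVER_LIMIT = 1.5      # delay above this asserts 'over'
    UNDER_LIMIT = 0.75    # delay below this asserts 'under'
    RELEASE_POINT = 1.25  # delay at which 'release' clears over/under

    def __init__(self):
        self.over = False
        self.under = False

    def update(self, delay):
        """delay: VCDL delay in reference clock periods.

        Returns which block controls the charge pump after this sample.
        """
        # 'release' clears whichever flag is set once the delay reaches 1.25.
        if self.over and delay <= self.RELEASE_POINT:
            self.over = False
        elif self.under and delay >= self.RELEASE_POINT:
            self.under = False

        # With neither flag set, the phase detector runs until a limit is hit.
        if not (self.over or self.under):
            if delay > self.OVER_LIMIT:
                self.over = True
            elif delay < self.UNDER_LIMIT:
                self.under = True

        if self.over:
            return "lock_detect_over"
        if self.under:
            return "lock_detect_under"
        return "phase_detector"


ld = LockDetect()
trace = [ld.update(d) for d in (1.0, 2.0, 1.4, 1.25, 1.0, 0.7, 1.0, 1.3)]
print(trace)
```

The gap between the trip limits and the release point is what keeps the lock-detect circuit from handing the loop back to the phase detector while the delay is still close to a false-lock region.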
In Fig. 5(a), when the state of the VCDL output phases is decoded, none of the control signals are activated because the VCDL is correctly locked to one period of the reference clock. In Fig. 5(b) the VCDL is incorrectly locked to two periods of the reference clock, and the state of the output phases is decoded to activate the over control signal. The phase detector outputs, up and dn, signal the charge pump to charge or discharge the filter capacitor. An active over output

FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR

Fig. 4. Lock-detect decode circuitry.

Fig. 5. Nine-stage VCDL waveforms with (a) correct lock and (b) false lock.

Fig. 6. VCDL delay stage schematic.

Fig. 7. Phase detector schematic.

from the lock-detect circuit disables the phase detector and activates the up control signal. Similarly, the lock-detect under output activates dn. Following power-on reset the lock-detect circuit is initialized by setting over active. This ensures a faster acquisition time for the DLL because the filter capacitor is continuously charged to a voltage level corresponding to 1.25 reference clock periods. At this VCDL delay, the release signal is activated and the phase detector gains control of the loop and brings the DLL to lock. The state of the output phases corresponding to a delay of nine reference clock periods is the same as that corresponding to a single reference clock period delay. This circuitry is therefore only capable of detecting incorrect delays up to eight periods of the reference clock. This is not a limitation of the design, as any delays above this would be outside the delay range of the VCDL. In general, the number of reference clock periods of incorrect lock delay that the error detection logic can detect is set by the number of VCDL output phases.

D. Voltage-Controlled Delay Line (VCDL)

Fig. 6 shows one of the VCDL delay stages. The stage is designed to operate from a supply as low as 1.8 V and is similar to that used in [7]. The stage propagation delay is proportional to the tail current for the output charging and to the voltage-controlled resistor (VCR) resistance for the output discharging. A three-transistor VCR structure is adopted for better control linearity. The DLL negative feedback control loop compensates for variations in the stage delay due to process and temperature. The differential delay stage structure and coupling capacitors between bias lines and supply help to minimize supply-induced jitter noise.

E. Charge Pump

The charge pump charges or discharges the filter capacitor. The voltage on this capacitor, vcntl, sets the VCDL stage propagation delay. To minimize the temperature variation of the VCDL delay, the charging and discharging currents are proportional to absolute temperature. This helps to maintain a constant loop gain and phase margin over temperature.

F. Phase Detector

The phase detector, shown in Fig. 7, employs the conventional sequential-phase-frequency detection scheme [7] but extra gates have been included. This extra logic enables the lock-detect circuit to override the phase detector control of the loop. The lock-detect output signals, over and under, now have direct control of the charge pump. The lock-detect circuit can therefore charge or discharge the VCDL control voltage, vcntl, to a voltage from which it is safe for the phase detector to regain control of the loop.

III. CLOCK SYNTHESIZER ARCHITECTURE

The clock synthesizer generates a differential output clock running at nine times the input reference frequency. The clock

Fig. 8. Clock synthesis waveforms.

Fig. 9. Optimized AND-OR block diagram.

Fig. 10. 1.62-GHz clock generation schematic.

synthesizer employs the DLL structure shown in Fig. 3 to generate the multiple clock phases that are then combined to produce the output clock. There are two steps in the generation of the output clock. The first step combines the nine DLL output phases (1:9) to generate three clocks, ck1, ck2, and ck3. Fig. 8 shows the clock waveforms. These three clocks are phase separated by one-ninth of a reference clock period and have a frequency three times that of the reference clock. Fig. 9 shows how output phases 1, 4, and 7 are combined in an optimized AND-OR structure with symmetrical delays to generate the ck1 clock. Using identical logic, phases 2, 5, and 8 produce the ck2 clock and phases 3, 6, and 9 produce the ck3 clock. The second step in generating the synthesizer output clock is to combine these three clocks in another AND-OR structure to produce a differential output clock running at nine times the reference clock frequency; see Fig. 8. This design produces a 1.62-GHz output clock frequency for a 180-MHz reference clock frequency.

For a 0.5-µm 3.3-V CMOS process there is a bandwidth limitation of approximately 500 MHz for reliable on-chip clock transmission [8]. The high bandwidth available at the chip outputs (determined by the external pull-up resistor and load capacitance) [8] is utilized to produce the 1.62-GHz clock as shown in Fig. 10. The AND function of the clock generation is performed in the chip core and the analog OR function is performed in the I/O ring. External pull-up resistors set the output swing and match the output impedance to that of the test equipment. Damping resistors are included to avoid any oscillations resulting from the combination of the lead and pin inductance and load capacitance. This removes the necessity to double bond these high-frequency outputs.
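The two-step phase combination can be illustrated with ideal square waves. The sketch below is an assumption-laden illustration: the text says only that an optimized AND-OR structure is used, so the specific function chosen here (ab + bc + ca, the majority of three phases) is a guess that happens to yield a 50% duty clock at three times the input frequency. Ideal, equally spaced, 50% duty phases are assumed, and all names are hypothetical.

```python
T = 180  # samples per reference period; a multiple of 18 keeps T/2 and T/9 integral


def phase(k):
    """One period of a 50% duty reference clock delayed by k*T/9."""
    delay = k * T // 9
    return [1 if (t - delay) % T < T // 2 else 0 for t in range(T)]


def and_or(a, b, c):
    """Two-level AND-OR of three waveforms: ab + bc + ca (assumed logic)."""
    return [(x & y) | (y & z) | (x & z) for x, y, z in zip(a, b, c)]


def rising_edges(w):
    """Count 0-to-1 transitions over one period, treating the waveform as cyclic."""
    return sum(1 for t in range(len(w)) if w[t] and not w[t - 1])


# Step 1: three clocks at three times the reference frequency.
ck1 = and_or(phase(1), phase(4), phase(7))
ck2 = and_or(phase(2), phase(5), phase(8))
ck3 = and_or(phase(3), phase(6), phase(9))

# Step 2: combine them again to reach nine times the reference frequency.
out = and_or(ck1, ck2, ck3)

print(rising_edges(phase(1)), rising_edges(ck1), rising_edges(out))
```

Per reference period the model shows one rising edge on each phase, three on ck1, and nine on the output, consistent with 9 × 180 MHz = 1.62 GHz. In the real circuit any mismatch in phase spacing or gate delay would appear as deterministic jitter on the output edges, which is why the text stresses symmetrical delays.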

Fig. 11. Tunable VCO architecture.

Fig. 12. Tunable VCO stage block diagram.

IV. TEMPERATURE-COMPENSATED TUNABLE VCO ARCHITECTURE

The temperature-compensated oscillator utilizes the control loop voltage, vcntl, of the DLL (Fig. 3) to compensate for any frequency fluctuation in a VCO induced by temperature or supply voltage. Fig. 11 shows how the VCO and VCDL stages are both connected to vcntl. (For ease of illustration a conventional DLL is shown in Fig. 11, but in practice the new DLL architecture of Fig. 3 is employed.) The VCDL in the DLL tracks temperature and process variations in the VCO circuit. The VCO is

FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR

Fig. 13. Die photo of the synthesizer and tunable oscillator.

Fig. 14. 1.62-GHz synthesizer edge jitter histogram.

Fig. 15. Variation of measured jitter over output frequency.

Fig. 16. 720-MHz synthesizer output for a supply voltage of 1.8 V.

Fig. 17. VCO frequency variation with temperature.

Fig. 18. VCO frequency variation with tune voltage.

composed of the same delay stages as the VCDL, and its temperature (and process) variations will therefore be the same (apart from some minor random mismatch effects and thermal gradients across the die). vcntl thus compensates for the VCDL and VCO temperature fluctuations. The last VCO stage has an additional tuning voltage, tune, which fine-tunes the VCO frequency. By varying the tune voltage it is possible to tune the VCO center frequency to within 3%. A wider tuning range can be achieved by varying the frequency of the DLL reference clock, ckref. The schematic of the last VCO stage is shown in Fig. 12. This stage is identical to the other VCO and VCDL stages except that the VCR contains a transistor which is connected to the external tune voltage. In all other stages this transistor is connected to ground. The extra charging current required in this VCO stage is provided by the controlled current source bias.

V. CIRCUIT LAYOUT

The synthesizer and temperature-compensated tunable oscillator were fabricated in a standard 0.5-µm triple-metal single-poly digital CMOS process. The die photomicrograph of the device, containing both the synthesizer and temperature-compensated tunable oscillator, is shown in Fig. 13. The synthesizer has an active area of 0.6 mm² and the temperature-compensated tunable oscillator has an active area of 0.7 mm².

VI. TEST RESULTS

Fig. 14 shows a histogram of the edge jitter on the 1.62-GHz synthesizer output clock for a supply of 3.3 V. Edge jitter of 3.1 ps rms and 20 ps peak-to-peak was measured. The jitter measurements of 3.2 ps rms and 20 ps peak-to-peak, for a 2-V supply and 1-GHz output frequency, show that the DLL core exhibits better jitter performance than that reported for the higher voltage DLLs (3.3-V supply, 0.35-µm CMOS, 4-ps rms jitter) in [2] and (5-V supply, 0.7-µm CMOS, 10-ps rms jitter) in [3]. The measured rms jitter versus synthesizer output frequency for a 3.3-V supply is shown in Fig. 15. With the supply reduced to 1.8 V, the rms jitter was measured at 4.9 ps for an output frequency of 720 MHz. Fig. 16 shows this 720-MHz synthesizer output. Mismatched propagation delays and interblock routing in the frequency multiplication block (Fig. 9) resulted in 100-ps interperiod jitter.

Fig. 17 shows the temperature-compensated tunable oscillator frequency variation with temperature. Varying the ambient temperature from 0 °C to 85 °C resulted in a total frequency variation of 1.8%. Fig. 18 shows the variation of frequency with the tune voltage. As can be seen from the plot, the relationship is close to linear. It is possible to tune the frequency around a center frequency in the range from 200 to 500 MHz by selecting an appropriate input reference frequency, so the scheme can be used for a wide variety of applications. The measured jitter on the 400-MHz output was 29 ps rms and 180 ps peak-to-peak. Table I shows the measured synthesizer characteristics. Table II summarizes the measured characteristics of the temperature-compensated tunable oscillator.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

TABLE I MEASURED SYNTHESIZER CHARACTERISTICS

TABLE II MEASURED TUNABLE OSCILLATOR CHARACTERISTICS

VII. CONCLUSION

In this paper, a robust self-correcting low-jitter DLL was used as the basis for a low-voltage high-frequency synthesizer and a temperature-compensated tunable oscillator. The DLL does not require the VCDL control voltage to be set on power-up. The DLL can recover from missing reference clock pulses and it can track step changes in a variable reference clock frequency. The synthesizer has significantly lower edge jitter than the traditional PLL-type synthesizer [9] and other reported DLL circuits [10], [11]. The temperature-compensated tunable oscillator provides a temperature-stable tunable frequency that varies by just 1.8% over the 0 °C to 85 °C temperature range.

ACKNOWLEDGMENT

The authors wish to acknowledge contributions from the following Parthus Technologies employees: J. Ryan, J. Horan, C. Cahill, F. Fuster, J. Collins, B. Kinsella, M. Erett, and S. Murphy. The authors also wish to thank R. Fitzgerald from the NMRC for the die photomicrographs. The device was fabricated on the ESM (Newport) Wafer Fab through Europractice.

REFERENCES

[1] B. Kim, T. C. Weingandt, and P. R. Gray, “PLL/DLL system noise analysis for low-jitter clock synthesizer design,” in Proc. ISCAS, June 1994, pp. 151–154.
[2] Y. Moon, J. Choi, K. Lee, D. Jeong, and M. Kim, “An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low jitter,” IEEE J. Solid-State Circuits, vol. 35, pp. 377–384, Mar. 2000.
[3] M. Mota and J. Christiansen, “A high-resolution time interpolator based on a delay-locked loop and an RC delay line,” IEEE J. Solid-State Circuits, vol. 34, pp. 1360–1366, Oct. 1999.
[4] D. Foley and M. Flynn, “CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature compensated tunable oscillator,” in Proc. IEEE Custom Integrated Circuits Conf., May 2000, pp. 371–374.
[5] G. Chien and P. R. Gray, “A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 202–203.
[6] H. Chen, E. Lee, and R. Geiger, “A 2-GHz VCO with process and temperature compensation,” in Proc. ISCAS, June 1999, pp. II-569–II-572.
[7] A. Young, J. K. Greason, and K. L. Wong, “A PLL clock generator with 5 to 110 MHz of lock range for microprocessors,” IEEE J. Solid-State Circuits, vol. SC-27, pp. 1599–1607, Nov. 1992.
[8] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, “High-speed electrical signaling: Overview and limitations,” IEEE Micro, vol. 18, pp. 12–24, Jan./Feb. 1998.
[9] H. C. Yang, L. K. Lee, and R. S. Co, “A low-jitter 0.3-165 MHz CMOS PLL synthesizer for 3-V/5-V operation,” IEEE J. Solid-State Circuits, vol. 32, pp. 582–586, Apr. 1997.
[10] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[11] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.


David J. Foley (S’00) received the B.Eng. degree from the National University of Ireland, Limerick, in June 1988. In 1994 he received the M.Eng.Sc. degree from the National University of Ireland, Cork, where he is currently working toward the Ph.D. degree. He has worked in IC design with NEC Corporation, Tamagawa, Japan, from 1988 to 1990, AT&T Bell Labs, Tokyo, Japan, from 1990 to 1992, and Parthus Technologies, Dublin, Ireland, from 1994 to 1998.


Michael P. Flynn (S’92–M’95–SM’98) was born in Cork, Ireland. He received the B.E. and M.Eng.Sc. degrees from the National University of Ireland, Cork, in 1988 and 1990, respectively. He received the Ph.D. degree in electrical engineering from Carnegie Mellon University, Pittsburgh, PA, in 1995. From 1988 to 1991, he was with the National Microelectronics Research Center, Cork. He was a Co-op Engineer with National Semiconductor in Santa Clara, CA, from 1993 to 1995. From 1995 to 1997, he was a Member of Technical Staff with Texas Instruments DSPS R&D Lab, Dallas, TX. He is now a Technical Director with Parthus Technologies, Cork. He is also a part-time Lecturer in the Department of Microelectronics at the National University of Ireland, Cork. Dr. Flynn received the 1992–1993 IEEE Solid-State Circuit Predoctoral Fellowship. He is a member of Sigma Xi.

ISSCC 2002 / SESSION 4 / BACKPLANE INTERCONNECTED ICs / 4.1

4.1

A 1.5V 86mW/ch 8-Channel 622–3125Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio

Fuji Yang, Jay O’Neill, Patrik Larsson, Dave Inglis, Joe Othmer

Agere Systems, Holmdel, NJ

2.5-3.125Gb/s serial links are commonly used for chip-to-chip interconnects in high-speed network systems. In SONET OC-768 applications, at least 16 on-chip SerDes transceivers are required to guarantee a total full-duplex I/O throughput of 40Gb/s. Published 2.5Gb/s SerDes transceivers consume between 150 and 200mW, which is not suitable for applications requiring hundreds of on-chip SerDes transceivers [1]. Developing a low-power SerDes transceiver is important for high-throughput network ICs [2]. Another challenge is the reduction of inter-channel noise coupling when integrating many transceivers on the same chip.

This low-power 8-channel SerDes macrocell employs a shared-PLL architecture. As shown in Figure 4.1.1, on the transmitter side, the on-chip TxPLL provides a half-rate clock to all transmitters. On the receiver side, the RxPLL distributes I- and Q-phase clocks to 8 receivers. Each receiver has a phase interpolator to generate an output phase-aligned with the incoming data for clock and data recovery. Sharing a single PLL between a group of transmitters or receivers reduces the power and avoids the potential multi-VCO coupling problem found in a conventional one-PLL-per-channel configuration. The macrocell, realized in a 0.16µm CMOS process, consumes an average power of 86mW per channel at a 1.5V power supply.

The transmitter 16:1 or 20:1 serialization starts with 4 shift-register-based selectable 4:1 or 5:1 multiplexers. Their 4 outputs are sent to a tree-based 4:1 multiplexer (Figure 4.1.1). A pMOS CML output driver with on-chip 50Ω terminations is employed. The output signal is referenced to ground, which makes the interface independent of the power supply. The output amplitude is set to 1Vpp,diff.

The receiver employs an interleaved integrate-and-dump front-end (Figure 4.1.1) [3, 4]. The integrate-and-dump operation improves the SNR and eliminates the quadrature clock required in a conventional half-rate front-end [5]. The integrator outputs are de-multiplexed by the decision latches controlled respectively by ck2i and ck2q, which are divide-by-2 clocks of the recovered clock. The decision-latch outputs d1-d4 are fed into 4 shift registers to realize the 4:16 or 4:20 de-serialization. The integrator is implemented in a way similar to that proposed in Reference [4], but with a pMOS input stage. It has a gain of 2, which relaxes the offset and noise requirements of the latches. The receiver achieves 30mVpp,diff sensitivity with BER < 10⁻¹² at 2.5Gb/s.

The clock recovery is performed by a DLL based on an analog phase interpolator [6]. In contrast to the implementation in Reference [6], a four-quadrant phase mixer is used here. Referring to Figure 4.1.2, the DLL consists of a bang-bang phase detector (PD), a PD polarity control circuit, an amplitude control circuit, I- and Q-charge-pumps, and the four-quadrant mixer-based phase interpolator. The analog phase interpolation is performed by mixing the I- and Q-phase clocks from the RxPLL with respective weights α (=Va-Vref) and β (=Vb-Vref): CLK=α*(I-IB)+β*(Q-QB). Va and Vb are independently generated by the I- and Q-charge-pumps. The weights α and β, ranging from negative to positive, directly control the quadrant changes. This eliminates the potential phase discontinuity at quadrant crossings found in the circuit of Reference [6]. Figure 4.1.3 shows the schematic of one 4-quadrant mixer, where

• 2002 IEEE International Solid-State Circuits Conference

the nMOS differential pair converts Va to differential currents Ip and In, which are mirrored into the pMOS current sources to be steered by the high-speed differential clock (I-IB). A self-biased nMOS load is used with MP1 and MP2 to control the output common-mode voltage. The phase interpolator exhibits an infinite phase shift range, which allows the DLL to easily track the frequency offset between the local clock and the incoming data and enables a shared-PLL architecture for multi-channel serial links with plesiochronous clocking.

Figure 4.1.4 illustrates the non-monotonic relation between the phase shift introduced by the interpolator and the two weights α and β. To have a 2π interpolation range, the bang-bang phase detector polarity must be updated to provide the correct up/down signals for different quadrants. This is done by a PD-polarity-control circuit in association with a Q-detect circuit. The Q-detect circuit detects the output vector quadrant by determining the sign of α and β. The Q-detect circuit uses a replica of the V-I converter in the phase mixer.

Although the phase mixer has control weights α and β, the phase interpolation is only a function of α/β, and is independent of the amplitude of α and β. The loop, sensitive only to the phase variation, thus controls α/β. As a consequence, α and β can grow or shrink arbitrarily. To prevent α and β from becoming too small, an offset current is intentionally introduced in the charge-pumps. It is controlled as follows: if α>0, Iup = I0 + Ioffset, and if α<0, Idown = I0 + Ioffset (the same algorithm is applied to the Q-charge-pump). As a result, α and β are always pulled away from zero, which eliminates any possibility of shrinking. To prevent overflow of α and β, the amplitude control circuit clips α and β by blocking the UP or DOWN signal. As shown in Figure 4.1.4, Va or Vb will be kept within [Vmin, Vmax].

The test chip in a 0.16µm 5-level metal CMOS technology uses a 217-pin PBGA package.
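The interpolation rule CLK=α*(I-IB)+β*(Q-QB) and the charge-pump offset rule can be sketched numerically. The sketch assumes ideal sinusoidal quadrature clocks (I-IB proportional to cos ωt and Q-QB proportional to sin ωt), in which case the output phase is atan2(β, α). The function names and the behaviour at exactly zero weight are hypothetical and are not taken from the paper.

```python
import math


def interpolated_phase_deg(alpha, beta):
    """Output phase of alpha*(I-IB) + beta*(Q-QB), assuming ideal sinusoidal I/Q clocks."""
    return math.degrees(math.atan2(beta, alpha)) % 360.0


def quadrant(alpha, beta):
    """Quadrant of the output vector from the signs of the weights (Q-detect idea)."""
    if alpha >= 0 and beta >= 0:
        return "I"
    if alpha < 0 and beta >= 0:
        return "II"
    if alpha < 0 and beta < 0:
        return "III"
    return "IV"


def charge_pump_currents(weight, i0, i_offset):
    """Return (i_up, i_down) with the offset that pushes the weight away from zero.

    weight > 0 strengthens the UP current, weight < 0 the DOWN current.
    Returning equal currents at weight == 0 is an assumption of this sketch.
    """
    if weight > 0:
        return i0 + i_offset, i0
    if weight < 0:
        return i0, i0 + i_offset
    return i0, i0


# The phase depends only on the ratio of the weights, not on their amplitude.
print(round(interpolated_phase_deg(1.0, 1.0), 6))   # 45.0
print(round(interpolated_phase_deg(0.2, 0.2), 6))   # 45.0
print(round(interpolated_phase_deg(-1.0, 1.0), 6))  # 135.0
print(quadrant(-1.0, -1.0))                         # III
print(charge_pump_currents(-0.3, 10.0, 1.0))        # (10.0, 11.0)
```

Because the sign of each weight selects the quadrant, the model covers the full 0 to 360 degree range without a discontinuity, which is the property the text attributes to the four-quadrant mixer.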
The chip micrograph is shown in Figure 4.1.5. Active area is about 2mm². Figure 4.1.6a shows the measured jitter tolerance of the receiver. The CDR works with VDD as low as 1V at a maximum input data rate of 1Gb/s. With a 1.5V power supply, the receiver covers an input data rate range of 622 to 3125Mb/s. Measured recovered clock jitter is 87.1ps pp at 2.5Gb/s. Figure 4.1.6b shows the Tx output eye diagram measured at 3.2Gb/s with a 2³¹-1 PRWS. The measured jitter is 57.8ps pp and the static VDD sensitivity is 0.06ps/mV. Measured results are summarized in Figure 4.1.7.

References:
[1] R. Gu et al., “A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver,” ISSCC Digest of Technical Papers, pp. 352-353, Feb. 1999.
[2] M-J. Lee et al., “An 84mW 4Gb/s clock and data recovery circuit for serial link applications,” VLSI Symposium 2001, pp. 149-152, 2001.
[3] S. Sidiropoulos et al., “A 700Mb/s/pin CMOS signaling interface using current integrating receivers,” IEEE JSSC, vol. 32, no. 5, pp. 681-690, May 1997.
[4] J. Savoj et al., “A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.
[5] P. Larsson, “An Offset-Cancelled CMOS Clock Recovery/Demux with Half-Rate Linear Phase Detector for 2.5Gb/s Optical Communication,” ISSCC Digest of Technical Papers, pp. 74-75, Feb. 2001.
[6] T. Lee et al., “A 2.5V CMOS delay-locked loop for an 18Mb, 500Mb/s DRAM,” IEEE JSSC, vol. 29, no. 12, Dec. 1994.

0-7803-7335-9

©2002 IEEE

ISSCC 2002 / February 4, 2002 / Salon 8 / 1:30 PM

Figure 4.1.1: Overall architecture.

Figure 4.1.2: DLL block diagram.

Figure 4.1.3: Four-quadrant mixer schematic.

Figure 4.1.4: Relation between the phase shift and the weights Va and Vb.

Figure 4.1.5: Die micrograph.

Figure 4.1.6: Measured Rx jitter tolerance and Tx eye diagram at 3.2Gb/s.

Technology                   0.16µm CMOS with 5 metal levels
Supply voltage               1.5V
Power dissipation            75mW per transceiver with Tx output set to 1Vpp,diff; 85mW for Tx and Rx PLLs + clock buffers
Active area                  2mm² (PLL: 0.1mm², single transceiver: 0.25mm²)
BER                          < 10⁻¹² (all measurements were done with BER < 10⁻¹²)
Receiver sensitivity         30mVpp,diff
Recovered clock jitter       87.1ps pk-pk
Max. offset frequency        400ppm at 2.5Gb/s
Input data rate range        622-3125Mb/s
Transmitter output jitter    57.8ps pk-pk at 3.2Gb/s
Tx output VDD sensitivity    0.06ps/mV
Output amplitude             1Vpp,diff

Figure 4.1.7: Summary of measured results.
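The 86mW/ch figure in the title can be cross-checked against the summary of measured results. The sketch below assumes that the shared PLL and clock-buffer power is divided evenly among the eight channels. It also converts the 400ppm maximum frequency offset into a phase slip rate that the interpolator has to follow. Both function names are hypothetical.

```python
def power_per_channel_mw(p_transceiver_mw, p_shared_mw, n_channels):
    """Average power per channel when the shared PLL power is split evenly (assumption)."""
    return p_transceiver_mw + p_shared_mw / n_channels


def phase_slip_rate_ui_per_s(bit_rate_bps, offset_ppm):
    """Unit intervals of phase the clock recovery loop must track per second."""
    return bit_rate_bps * offset_ppm * 1e-6


print(power_per_channel_mw(75.0, 85.0, 8))             # 85.625, i.e. about 86 mW per channel
print(f"{phase_slip_rate_ui_per_s(2.5e9, 400):.3g}")   # about 1e+06 UI per second
```

At 2.5Gb/s and 400ppm the incoming data slips by roughly one unit interval every microsecond relative to the local clock, which is why an interpolator with unlimited phase range is needed for plesiochronous operation.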

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998


A Fully Integrated VCO at 2 GHz

Markus Zannoth, Bernd Kolb, Joseph Fenk, and Robert Weigel, Senior Member, IEEE

Abstract—A fully integrated voltage-controlled oscillator at a frequency of 2 GHz with low phase noise has been implemented in a standard bipolar process with an f_T of 25 GHz. The design is based on an LC resonator with vertically coupled inductors. Only two metal layers have been used. The supply voltage of the oscillator is 2.7 V. The phase noise is only −136 dB/Hz at 4.7-MHz frequency offset. A tuning range of 150 MHz is achieved with integrated tuning diodes.

Index Terms—Bipolar, fully integrated VCO, noise requirement for cordless phones.

Manuscript received April 10, 1998; revised July 17, 1998. M. Zannoth, B. Kolb, and J. Fenk are with Siemens AG, Muenchen D-81541, Germany. R. Weigel is with the University of Linz, Institute for Communication and Information Engineering, Linz A-4040, Austria. Publisher Item Identifier S 0018-9200(98)08854-4.

I. INTRODUCTION

WITH the fast growth of the wireless application market, there is a growing need for smaller designs and higher levels of integration to reduce cost and size. Because of the very poor performance of integrated resonators on silicon IC’s, local RF oscillators are difficult to integrate with regard to the phase noise requirements. At the moment, external resonators are used with external hyperabrupt tuning diodes. The integrated tuning diodes have low performance because it is not possible to produce a hyperabrupt pn-junction in our standard bipolar process. The limiting factor is the inductor of the resonator, which can only achieve quality factors of about four. Mobile telecommunication standards like the global system for mobile communication (GSM) or the digital communication system (DCS-1800) require a noise performance that cannot be reached in our technology with a fully integrated local oscillator, but the requirements for cordless phones, such as those of the Digital European Cordless Telecommunications (DECT) standard, seem to be achievable.

Fig. 1. DECT specification.

In a DECT system, the critical point concerning oscillator phase noise is the emissions due to modulation. There the emitted power at the output in the third adjacent channel is specified to be smaller than 20 nW, which equals −47 dBm [1]. With a maximum output power of 25 dBm, there is a difference of 72 dB. The specified measurement filter to get this power has a bandwidth of 1 MHz. So the noise requirement at this point becomes −132 dB/Hz, which results from

$$-47\ \mathrm{dBm} - 25\ \mathrm{dBm} - 10\log_{10}\!\left(\frac{1\ \mathrm{MHz}}{1\ \mathrm{Hz}}\right)\mathrm{dB} = -72\ \mathrm{dB} - 60\ \mathrm{dB} = -132\ \mathrm{dB/Hz}.$$

This is the difference between the noise level and the output power, normalized to 1 Hz. The frequency offset for this specification is the start frequency of the measurement filter. As this is in the third adjacent channel, three times the channel spacing of 1.728 MHz has to be taken (see Fig. 1). Because the filter is centered in the channel, half of the filter bandwidth of 1 MHz has to be subtracted to get the offset frequency

$$3 \times 1.728\ \mathrm{MHz} - 0.5\ \mathrm{MHz} = 4.684\ \mathrm{MHz} \approx 4.7\ \mathrm{MHz}.$$
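The arithmetic behind the DECT phase-noise requirement can be checked with a few lines of code. This is only a numerical restatement of the figures in the text (20 nW emission limit, 25 dBm output power, 1-MHz measurement filter, 1.728-MHz channel spacing); the variable names are hypothetical, and the scaling to 100 kHz assumes the constant 20 dB/decade slope used in the paper.

```python
import math


def dbm(power_watt):
    """Convert a power in watts to dBm."""
    return 10 * math.log10(power_watt / 1e-3)


emission_dbm = dbm(20e-9)        # emission limit in the third adjacent channel
carrier_dbm = 25.0               # maximum output power
filter_bw_hz = 1e6               # bandwidth of the measurement filter

# Noise relative to the carrier, normalized to a 1-Hz bandwidth.
requirement_db_hz = emission_dbm - carrier_dbm - 10 * math.log10(filter_bw_hz)

# Start frequency of the measurement filter in the third adjacent channel.
offset_hz = 3 * 1.728e6 - filter_bw_hz / 2

# Equivalent requirement at 100 kHz for a 20 dB/decade phase-noise slope.
requirement_100k = requirement_db_hz + 20 * math.log10(offset_hz / 100e3)

print(round(emission_dbm, 1))       # -47.0
print(round(requirement_db_hz, 1))  # -132.0
print(round(offset_hz / 1e6, 3))    # 4.684
print(round(requirement_100k, 1))   # -98.6
```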

Normalized to a 100-kHz frequency offset, the requirement is −98.6 dB/Hz, if the phase noise has a constant slope of 20 dB/decade, as assumed by [3]. These are the requirements for the main transmitter voltage-controlled oscillator (TX-VCO), provided the following blocks do not dominate the noise, which is indeed the case. With fully integrated oscillators these requirements seem achievable. This paper presents a fully integrated oscillator that achieves the required specification. It uses an LC oscillator consisting of integrated tuning diodes and integrated vertically coupled inductors. In this design only two metal layers in a standard bipolar process are used.

II. OSCILLATOR WITH COUPLED INDUCTORS

The phase noise in oscillators depends on the quality factor of the resonator, the noise figure of the amplifier creating a negative resistance, and the energy in the resonator [3], [4]. For low phase noise, passive resonators are chosen. With active inductors, noise is added by the active devices that cannot be compensated by increasing the quality factor. For best integration and reproducibility spiral inductors are used, as described in [5]. The quality factor of the resonator is mainly limited by the inductors. Here it is limited to four, as only two metal layers are available. One metal layer has a thickness of 0.8 µm. With limited chip area for the inductor and the use of

0018–9200/98$10.00 © 1998 IEEE


Fig. 2. Possibilities of feedback.

vertically coupled inductors, a quality factor of only four was achievable without changing technology parameters. The simplest way to reduce phase noise is to increase the resonator energy by applying higher voltages to the resonator. In this design an emitter-coupled pair with cross feedback is used as a negative resistance, which is responsible for undamping the resonator. The limit of the maximum oscillation amplitude depends on the feedback. There are three ways of feeding the output signal back to the input (see Fig. 2).

The easiest way is direct coupling, where no biasing network is needed and very low power consumption can be achieved. With direct coupling the voltage across the resonator is limited by the base-collector diode of the transistors. When forward biased, this diode adds damping and current noise to the resonator, causing increased phase noise.

With capacitive coupling [6] this can be avoided. Here no resistive element is inserted into the feedback. With capacitive feedback a phase noise of −100 dB/Hz at 100 kHz was achieved in [6] with quality factors of eight. The disadvantage is the need for a high-impedance biasing network at the transistor bases. This biasing network can be realized by noisy resistors or by large inductors that cost a lot of chip space. If resistors are used, uncorrelated noise is introduced to both half-waves of the signal when the oscillator acts in its linear region. This noise is nearly negligible when a low-Q resonator is used. In our case the impedance at the input of the feedback amplifier is about 500 Ω. This impedance consists of the feedback capacitor and the tank impedance at resonance. The bias resistor in parallel is about 4 kΩ, and so the effect of the added noise is not very dominant.

With inductive coupling the bias current can be fed through the inductor. This allows connecting a low-impedance biasing network, which can be made of a voltage source. The advantage of connecting a voltage source directly to the circuit is the absence of resistive elements that cause white noise, which would be converted to phase noise by the nonlinear elements. Every dc path can be blocked carefully against emissions from the supplies without any resistive element. The maximum voltage at the resonator can be adjusted by the biasing voltage so that the base-collector diode is not the limiting element. Now the amplitude of the swing is limited by the base-emitter diode of the transistors and the limitation of the current source. The energy in the resonator can be increased and so the phase noise is reduced. Now the maximum voltage is not one diode-voltage, as in the

Fig. 3. Equivalent circuit of the tuning diode.

case of direct coupling. In the presented design the resonator voltage reaches a value of 3 V peak-to-peak. The limiting elements for the maximum voltage of the resonator are now the two series-connected tuning diodes. To decrease their voltage without reducing the resonator energy, a capacitor is added in series at the cost of tuning range. This capacitor is also responsible for the linear tuning characteristic. To provide the dc path for the tuning diodes, resistors are connected in parallel to the coupling capacitances. These resistors have a negligible effect on the quality factor because they have a large value of 1 kΩ, which is much larger than the impedance of the capacitances of 40 Ω (see Fig. 6). The quality factor of the capacitance is about 24, which is in the same range as that of the varactor. These quality factors are negligibly high relative to that of the inductor.

The inductors are produced as symmetrical square spirals. In our standard bipolar process only two metal layers could be used to create vertically coupled inductors. The crossings are made in the gap between two metal lines (see Fig. 7). The cost of this technique is the wide gap between the lines, which increases the size and parasitic effects like series resistance and substrate capacitance. The quality factor is as low as four. This is caused by the technology, where the metal layers have a poor conductivity and high capacitances to the medium-doped substrate. In this design an inductor of 2.7 nH was used. Its series resistance is 4.2 Ω. The coupling factor was estimated to be 0.85. The values of the equivalent circuit (see Fig. 8) were first calculated by algorithms from [7] and then fitted to measurement. The coupling capacitor was estimated from the plate capacitance of the two metal layers.

For tuning, the base-emitter diode of a transistor is used (see Fig. 3). This has the disadvantage of a high series resistance (base resistance) of 2.6 Ω and a relatively low capacitance variation by a factor of 1.75 when applying a voltage difference of 2.7 V (see Fig. 4). However, this represents the only way to create a tuning diode without changing the standard bipolar process, where no hyperabrupt pn-junctions are available. The quality factor of this varactor was simulated to be about 25 (see Fig. 5) when it is calculated from 1/(ωRC). The base-collector diode could not be used for tuning, because it does not have such a large capacitance variation.
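The quality-factor figures quoted above can be related to each other with the series-RC and series-RL expressions Q = 1/(ωRC) and Q = ωL/R. The sketch below is a back-calculation under the assumption of an operating frequency of exactly 2 GHz; the capacitance and resistance values it prints are implied by the quoted numbers and are not stated in the paper, and all names are hypothetical.

```python
import math

F0 = 2e9                      # assumed operating frequency in Hz
OMEGA = 2 * math.pi * F0


def series_rc_q(r_ohm, c_farad):
    """Quality factor of a capacitor with series resistance: 1/(omega*R*C)."""
    return 1.0 / (OMEGA * r_ohm * c_farad)


def series_rl_q(l_henry, r_ohm):
    """Quality factor of an inductor considering series resistance only."""
    return OMEGA * l_henry / r_ohm


# Coupling capacitor: 40-ohm reactance and Q of about 24 (values from the text).
c_couple = 1.0 / (OMEGA * 40.0)          # implied capacitance, about 2 pF
esr_couple = 40.0 / 24.0                 # implied series resistance, about 1.7 ohm

# Varactor: 2.6-ohm series resistance and Q of about 25 (values from the text).
c_varactor = 1.0 / (OMEGA * 2.6 * 25.0)  # implied capacitance, about 1.2 pF

# Inductor: 2.7 nH with 4.2-ohm series resistance (values from the text).
q_inductor_series = series_rl_q(2.7e-9, 4.2)

print(f"{c_couple * 1e12:.2f} pF, {esr_couple:.2f} ohm")
print(f"{c_varactor * 1e12:.2f} pF")
print(f"{q_inductor_series:.1f}")
```

The series-resistance-only estimate for the inductor comes out near eight, above the reported quality factor of about four. This is consistent with the text's remark that substrate capacitance also contributes to the loss, which the simple series model ignores.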

ZANNOTH et al.: FULLY INTEGRATED VCO

Fig. 4. Capacitance of the tuning diode.

Fig. 5. Quality factor of the tuning diode.

Fig. 6. VCO with coupled inductors.

Fig. 7. Simplified layout of the inductor.

Fig. 8. Equivalent circuit of the inductor.

Fig. 9. Output buffer.

III. RESULTS

To get the signal into a 50-Ω measurement system a buffer is added (see Fig. 9). As the signal is taken directly from the resonator, a high impedance is required to minimize the effects of the load. The signal is fed through small coupling capacitances (400 fF) to emitter followers that provide this high input impedance. This first stage drives a differential amplifier with open-collector outputs. A balun can be connected that transforms the differential signal to a single-ended one that can be connected to 50 Ω. The current consumption of the amplifier is about 6 mA. Its output power is about −8 dBm at 50 Ω, which is enough for noise measurements.

The measured phase noise of the oscillator can be calculated from expressions by Leeson [3] and has a slope of 20 dB/Dec

Fig. 10. Measured tuning characteristic.

Fig. 11. Measured and simulated phase noise.

TABLE I SUMMARY OF THE VCO CHARACTERISTICS

Supply voltage                 2.7 V
Current of oscillator core     12 mA
Current of amplifier           6 mA
Output power                   −8 dBm
Center frequency               1.96 GHz
Tuning constant (KVCO)         −55 MHz/V
Tuning range (0 . . . 2.7 V)   150 MHz
Phase noise @ 100 kHz          −102 dBc/Hz
Phase noise @ 4.7 MHz          −136 dBc/Hz
Suppression of harmonics       23 dB
Size of one inductor           215 µm × 215 µm
Chip size without pads         600 µm × 600 µm

measured at offset frequencies between 100 kHz and 50 MHz, which represent the measurement limits. If the phase noise is calculated from the expression given in [5], we get a value of −135 dB/Hz at an offset frequency of 4.7 MHz. The effective series resistance is about 16 Ω. It is taken from the series resistances of the inductor, the tuning diodes, and the coupling capacitances. The amplitude of the oscillation was simulated to be 1.5 V at a resonance frequency of 2 GHz. The noise factor of the amplifier was assumed to be about two. The expressions from [3] give an approximation for the expected noise. For a better calculation, nonlinear effects have to be considered. In [12], such calculations, for example the correlation between waveform and phase noise, are shown.

The simulation with SpectreRF shows nearly the same values as the measurement. In Fig. 11 the simulated and measured noise are shown up to an offset frequency of 6 MHz. In the simulation the equivalent circuits described above are used. The simulator uses nonlinear methods [13], [14] to calculate noise. It gives nearly the same results as the measurement and the small-signal approximation above.

The measurement shows that the free-running oscillator has a resonant frequency of 1.96 GHz, a phase noise of −136 dB/Hz at 4.7 MHz offset frequency, and a VCO constant of 55 MHz/V (see Figs. 10 and 11). The phase noise is measured in the middle of the tuning range. With a thicker metal layer (1.2-µm aluminum instead of 0.8 µm) a phase noise of even −137 dB/Hz can be achieved. This improvement is achieved by reducing the series resistance of the inductor and increasing its quality factor. The tuning characteristic is nearly linear (see Fig. 10) and the noise performance varies by less than 1 dB over the whole tuning range from 0 V to 2.7 V. The linearity of this characteristic is improved by the series capacitance (Fig. 6) added to the varactor diode. The current through the oscillator core is about 12 mA; the supply voltage is 2.7 V. At tuning voltages above the supply voltage the varactor diodes become forward biased, so the frequency stays nearly constant and the noise rises. This occurs because of the reduction of the quality factor and the introduction of additional current noise due to the forward-biased diodes.

Fig. 12. Chip photograph.
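The calculated phase-noise estimate quoted in the results can be reproduced approximately from the stated values (16-Ω effective series resistance, 1.5-V amplitude, 2-GHz carrier). The expression itself did not survive in this copy, so the sketch assumes the form L(Δf) = kT·R_eff·(1 + A)·(f0/Δf)² / (V_A²/2), which is the form commonly associated with [5]. It further assumes that the "noise factor of about two" corresponds to A = 2 and that the temperature is 300 K. These are assumptions, not statements from the paper.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def phase_noise_db_hz(r_eff, amplitude_v, f0, f_offset, excess_a, temp_k=300.0):
    """Phase noise estimate kT*R_eff*(1+A)*(f0/df)^2 / (V^2/2), in dB/Hz (assumed form)."""
    linear = (K_BOLTZMANN * temp_k * r_eff * (1.0 + excess_a)
              * (f0 / f_offset) ** 2 / (amplitude_v ** 2 / 2.0))
    return 10.0 * math.log10(linear)


noise_4m7 = phase_noise_db_hz(16.0, 1.5, 2e9, 4.7e6, 2.0)
noise_100k = phase_noise_db_hz(16.0, 1.5, 2e9, 100e3, 2.0)

print(f"{noise_4m7:.1f} dB/Hz at 4.7 MHz")    # close to -135
print(f"{noise_100k:.1f} dB/Hz at 100 kHz")   # close to -101.5
```

Under these assumptions the estimate lands within about 1 dB of the measured values at both offsets, and the difference between the two offsets follows the 20 dB/decade slope mentioned in the text.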

IV. CONCLUSION

A fully integrated bipolar VCO is realized (see Fig. 12) that achieves a measured phase noise of −136 dB/Hz at 4.7 MHz. The oscillator has a linear tuning characteristic with a tuning range of 150 MHz at a center frequency of 1.96 GHz. Further characteristics are given in Table I. In this design two metal layers are used to build vertically coupled integrated inductors. These have quality factors of about four. Integrated varactor diodes are implemented by using base-emitter diodes of transistors. With this design the noise requirement of the DECT specification of −132 dB/Hz at 4.7 MHz frequency offset is achieved with a margin of 4 dB. The output power is −8 dBm at 50 Ω, with a center frequency of 1.95 GHz. For the use of this oscillator in a DECT product, the varactor capacitance will be increased until the required center frequency of 1.88 GHz is reached. The design has been realized in a standard high-volume bipolar process with an f_T of 25 GHz.

REFERENCES

[1] ETSI, Digital European Cordless Telecommunications (DECT) Common Interface, Part 2: Physical Layer, Oct. 1992.
[2] L. L. Larson, RF and Microwave Circuit Design for Wireless Communications. Boston: Artech House, 1996.
[3] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. Lett. IEEE, pp. 329–330, Feb. 1966.
[4] G. Sauvage, “Phase noise in oscillators: A mathematical analysis of Leeson’s model,” IEEE Trans. Instrum. Meas., vol. IM-26, pp. 408–410, Dec. 1977.
[5] J. Craninckx and M. S. J. Steyaert, “A 1.8-GHz low-phase-noise CMOS VCO using optimized hollow spiral inductors,” IEEE J. Solid-State Circuits, vol. 32, pp. 736–744, May 1997.
[6] G. Palmisano, M. Paparo, F. Torrisi, and P. Vita, “Noise in fully integrated PLL’s,” in Proc. 6th Workshop Advances in Analog Circuit Design AACD’97, Como, Italy, pp. 1–19.
[7] J. Crols, P. Kinget, J. Craninckx, and M. Steyaert, “An analytical model of planar inductors on lowly doped silicon substrates for high frequency analog design up to 3 GHz,” in IEEE Symp. VLSI Circuit Dig. Tech. Papers, 1996, pp. 28–29.
[8] J. N. Burghartz, M. Soyuer, and K. A. Jenkins, “Microwave inductors and capacitors in standard multilevel interconnect silicon technology,” IEEE Trans. Microwave Theory Tech., vol. 44, pp. 100–104, Jan. 1996.
[9] L. Dauphinee, M. Copeland, and P. Schvan, “A balanced 1.5 GHz voltage controlled oscillator with an integrated LC resonator,” in Proc. ISSCC’97, Session 23, Analog Techniques, pp. 390–391.
[10] I. B. Jansen, K. Negus, and D. Lee, “Silicon bipolar VCO family for 1.1 to 2.2 GHz with fully-integrated tank and tuning circuits,” in Proc. ISSCC’97, Session 23, Analog Techniques, p. 392.
[11] B. Razavi, “A 1.8 GHz CMOS voltage-controlled oscillator,” in Proc. ISSCC’97, Session 23, Analog Techniques, pp. 388–389.
[12] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[13] CADENCE, Oscillator Noise Analysis in SpectreRF, application note to SpectreRF, 1998.
[14] F. X. Kärtner, “Untersuchung des Rauschverhaltens von Oszillatoren,” Ph.D. dissertation, Tech. Univ. Munich, Munich, Germany, 1988.

Markus Zannoth was born in Munich, Germany, in 1971. He received the Dipl.Ing. degree in electrical engineering in 1996 from the Technical University of Munich, Munich, Germany. Since 1996, he has been working towards the Dr.Ing. degree at Siemens AG and the Technical University of Munich. His doctoral research is on integrated oscillators.

Bernd Kolb was born in 1972. He studied electrical engineering with an emphasis on telecommunication techniques at the Georg-Simon-Ohm Polytechnic Nuremberg. There, he received the Dipl.Ing. (FH) degree in 1995. He joined the Siemens High Frequency IC Department in 1995. Since then, he has worked in the field of oscillators, frequency dividers, and vector modulators. He has focused on designing highly integrated transmitter IC’s for mobile communication. He is now with Lucent Network Systems GmbH, Nuremberg, Germany, where he designs high-frequency parts of base stations for mobile communication.

Joseph Fenk received the diploma in electronics from the Technical University of Munich, Munich, Germany, in 1968. He is responsible for product definition and project management of communications RF integrated circuits at Siemens Components, Inc., Integrated Circuit Division. After joining Siemens in 1968, he worked as a Development Engineer on high-frequency components in the Discrete Components Group, developing transmitters, aerial and tuner transistors, FET’s, and varactor and PIN diodes. In 1976, he joined the Integrated Circuits Group as a Design Engineer for consumer products. He has been engaged in the development of integrated circuits for infrared preamplifiers, prescalers, IF-amplifiers/demodulators for FM radio and satellite TV, mixer/oscillators for FM radio, TV, and SAT-TV, and TV UHF/VHF modulator IC’s, as well as circuits for narrowband FM mobile radio. He holds more than 50 patents relating to IC and system design and has presented technical papers at numerous industry conferences and forums.

Robert Weigel (S’88–M’89–SM’95) was born in Ebermannstadt, Germany, in 1956. In 1989, he received the Dr.Ing. degree, and in 1992 the Dr.Ing.habil degree, both in electrical engineering from the Technical University of Munich, Munich, Germany. From 1982 to 1988, he was a Research Assistant, from 1988 to 1994, he was a Senior Research Engineer, and from 1988 to 1996, he was a Professor at the Technical University of Munich. In the winter of 1994–1995, he was a Guest Professor at the Technical University of Vienna, Vienna, Austria. Since 1996, he has been Head of the Institute for Communication and Information Engineering at the University of Linz, Austria. He has been engaged in research and development on microwave theory and techniques, integrated optics, high-temperature superconductivity, surface acoustic wave (SAW) technology, and digital and microwave communication systems. In these fields, he has published more than 120 papers and has given more than 90 international presentations. His work includes European research projects and international journals. Dr. Weigel is a senior member of the IEEE Microwave Theory and Techniques and the Ultrasonics, Ferroelectrics, and Frequency Control Societies. He is also a member of the Institute for Systems and Components of the Electromagnetics Academy, the Informationstechnische Gesellschaft (ITG) in the Verband Deutscher Elektrotechniker (VDE), and the Society of Photo-Optical Instrumentation Engineers (SPIE). In 1993 he was a co-recipient of the MIOP award.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998


A Simple Precharged CMOS Phase Frequency Detector

Henrik O. Johansson

Abstract— We propose a simple precharged CMOS phase frequency detector (PFD). The circuit uses 18 transistors and has a simple topology. Therefore, the detector, in a 0.8-μm CMOS process, works up to clock frequencies of 800 MHz according to SPICE simulations on extracted layout. Further, the detector has no dead zone in the phase characteristic, which is important in low-jitter applications. The phase and frequency characteristics are presented and comparisons are made to other PFD’s. The phase offset of the detector is sensitive to differences in duty cycle between the inputs. Mixed-mode simulations are presented of the lock-in procedure for a phase-locked loop (PLL) where the detector is used. Measurements on the detector are presented for a test chip with a delay-locked loop (DLL) where the phase detection ability of the detector has been verified.

Fig. 1. Conventional phase frequency detector (conPFD) from [2].

Index Terms— CMOS integrated circuits, delay lock loops, phase detectors, phase lock loops.

I. INTRODUCTION

A part of a phase-locked loop (PLL) is the phase detector (PD) [1]. The PD detects the phase difference between the reference frequency and the controlled slave frequency. Some PD’s also detect frequency errors; they are then called phase frequency detectors (PFD’s). A PFD is usually built with a state machine with memory elements such as flip-flops [2], [3], Figs. 1 and 2, respectively. We propose a new simple PFD, ncPFD, which uses two nc-stages [4] and six inverters, Fig. 3(a). A drawback with some phase detectors is a dead zone in the phase characteristic at the equilibrium point. The dead zone generates phase jitter since the control system does not change the control voltage when the phase error is within the dead zone. In Section II the ncPFD circuit is described. The phase and frequency characteristics are discussed in Sections III and IV, respectively, and comparisons are made to other PFD’s. Behavioral mixed-mode simulations are made to check the lock-in properties of the ncPFD detector and these simulations are shown in Section V. Experiments on the phase detection abilities of the ncPFD are presented in Section VI.

Fig. 2. Precharge type phase frequency detector (ptPFD) from [3].

II. CIRCUIT

The transistor schematic of the ncPFD is shown in Fig. 3(a). The detector has a 0-rad phase offset. The main part of the circuit is the nc stage [4]. Delays (two inverters) are inserted at the reference and slave inputs in order to remove the dead zone in the phase characteristics around π rad phase error. In Fig. 4, waveforms for the circuit in Fig. 3(a) are shown when the slave input lags the reference input.

Manuscript received March 11, 1997; revised August 21, 1997. The author is with Electronic Devices, Department of Physics and Measurement Technology, Linköping University, S-58183 Linköping, Sweden. Publisher Item Identifier S 0018-9200(98)00732-X.

Fig. 3. (a) The ncPFD in zero degree phase offset version. (b) Modified version with π rad phase offset.

The detector can easily be modified to one with π-rad phase offset, as shown in Fig. 3(b), where one inverter, or in general an odd number of inverters, is used for each delay. If the phase detector is used only as a phase detector, i.e., not as a frequency detector, the circuit in Fig. 3(a) can be used as

0018–9200/98$10.00 © 1998 IEEE


Fig. 4. Waveforms for the case when slave lags after the reference signal. The pulse width of the up signal is larger than for the down signal.

Fig. 6. Magnified phase characteristics at zero phase error of the ncPFD (solid line), conPFD (dashed line), and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted layout, $V_{DD} = 3.0$ V and $f = 50$ MHz.

Fig. 5. Phase characteristics of the ncPFD (solid line), conPFD (dashed line), and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted layout, $V_{DD} = 3.0$ V and $f = 50$ MHz.

a π-rad phase detector by switching the up and down signals. The equilibrium point will then be on the negative slope of the phase characteristics at π rad instead of at the positive slope at zero, Fig. 5. Similarly, the π-rad phase detector, Fig. 3(b), can be modified to a 0-rad phase detector.

III. PHASE CHARACTERISTIC

The phase characteristic of the proposed ncPFD is shown in Fig. 5 together with the characteristics of the conventional PFD (conPFD) of Fig. 1 [2] and the precharge type PFD (ptPFD) shown in Fig. 2 [3]. Unlike the conPFD and ptPFD, there is no dead zone in the characteristics of the ncPFD. A magnification of the characteristics at zero phase is shown in Fig. 6. The dead zone of the conPFD can be reduced by inserting delay at the output of the four-input NAND gate. But if delays are inserted in the feedback signals from the up and down outputs of the ptPFD, the dead zone unfortunately increases. In an ncPFD, when the PLL is locked, both up and down signals are active. Therefore the phase offset of the PLL depends on the matching between the up and down currents of the charge pump. All data in this section are based on simulations of extracted layout with SPICE (level-2) when $V_{DD} = 3.0$ V and $f = 50$ MHz unless otherwise stated. The layout was made in a 0.8-μm standard CMOS process and the N- and P-transistors are 2.0 and 4.0 μm wide, respectively. The outputs were connected to 4.0 fF capacitors, and the inputs were driven with inverters with a tapering factor of one.

A. Duty-Cycle and Transition-Time Dependence

The output of the ncPFD depends on the pulse width of the input signals. Hence, the duty cycle will affect the phase

Fig. 7. Phase characteristics for three cases with different duty cycles. The reference input duty cycle is 50% for all cases and the slave input has the duty-cycles 45%, 50%, and 55% for the dashed, solid, and dashed–dotted lines, respectively.

characteristics. The phase characteristics are checked for three different duty cycles: 45, 50, and 55%. When both the reference and slave have the same duty cycle, the phase offset is not affected. There is a dead zone at π rad when the duty cycle is less than 50%. A duty cycle of 45% gives a dead-zone width of 0.50 rad, 1.6 ns, at π rad. This dead zone may result in a metastable state of the control loop. When the duty cycle is different for the two inputs, the phase offset will be nonzero, Fig. 7. A duty-cycle difference of 5% at 50 MHz, i.e., 1 ns, gives a phase offset of about 0.2 rad, i.e., 630 ps. The phase characteristic of the ncPFD is not affected by variations of the rise and fall times when they are in the range of 300 ps up to 600 ps.

B. Maximum Operation Frequency

A maximum operation frequency definition can be found in [3]. The definition is that the maximum operation frequency is one over the shortest period with correct up and down signals when the inputs have the same frequency and 90° phase difference. This definition is easily applied to flip-flop-based PFD’s, where this frequency is easily identified. Unfortunately, the degradation of the performance of the ncPFD is gradual for increasing frequency, and this makes it hard to find a specific frequency where the circuit starts to malfunction. Therefore, we define the maximum operation frequency to be the frequency where the size of the dead zone starts to deviate significantly from the low-frequency value. This definition gives similar results for the flip-flop-based phase detectors as the definition in [3], and it is applicable to the ncPFD. An example of how the dead-zone width varies with the frequency is shown in Fig. 8. The maximum speeds for different supply voltages are plotted in Fig. 9 for the three PFD’s of Figs. 1, 2, and 3(a). As seen, the maximum speeds of the ncPFD and the ptPFD are similar and the conPFD is approximately three times slower.

Fig. 8. The width of the dead zones of the ncPFD (solid), ptPFD (dashed), and conventional PFD (dash-dot) as function of frequency. The frequency resolution is 100 MHz and the supply voltage is 5.0 V. The plot is based on SPICE simulations of extracted layout.

Fig. 9. Maximum frequency as function of supply voltage for the ncPFD (solid line), the ptPFD (dash-dot line), and the conPFD (dashed line). The frequency resolution is 25 MHz. The plot is based on simulations of extracted layout. The layouts are made in a standard 0.8-μm CMOS process.

IV. FREQUENCY CHARACTERISTICS

A frequency-dependent phase detector always has some kind of memory. For the ncPFD, the memory consists of the two dynamic nodes at the output of the nc-stages. In Fig. 10, the frequency of the slave input is approximately three times higher than the reference input frequency; as a result, the down signal has a higher duty cycle than the up signal. Thus the slave frequency should decrease.

Fig. 10. Waveforms for the case when the slave has a higher frequency than the reference signal. The down signal has higher duty cycle than the up signal.

Fig. 11. Frequency sensitivity for the ncPFD (solid), ptPFD (dash-dot), and conPFD (dashed). The plot is based on behavioral simulations with 20 different initial phases for each frequency and the mean value for each frequency is plotted. The reference frequency is 50 MHz.
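The dead-zone-based definition of maximum operation frequency given in Section III-B can be read as a simple threshold search over a frequency sweep. The following is only a minimal sketch of that reading: the function name, the absolute tolerance used to decide what counts as a "significant" deviation, and the sweep values are all hypothetical and are not taken from Fig. 8. The sweep is assumed to be sorted by increasing frequency.

```python
def max_operating_frequency(freqs_mhz, dead_zone_ps, abs_tol_ps):
    """Return the highest swept frequency reached before the dead-zone width
    deviates from its low-frequency value by more than abs_tol_ps.

    Assumes freqs_mhz is sorted in increasing order and that the first
    sample represents the low-frequency behavior."""
    if not freqs_mhz or len(freqs_mhz) != len(dead_zone_ps):
        raise ValueError("need two non-empty sweeps of equal length")
    baseline = dead_zone_ps[0]          # low-frequency dead-zone width
    last_ok = None
    for freq, width in zip(freqs_mhz, dead_zone_ps):
        if abs(width - baseline) > abs_tol_ps:
            break                       # first significant deviation
        last_ok = freq
    return last_ok


# Hypothetical sweep, invented for illustration only (not data from the paper).
HYPOTHETICAL_FREQS_MHZ = [100, 200, 300, 400, 500, 600, 700]
HYPOTHETICAL_DEAD_ZONE_PS = [5, 5, 6, 5, 7, 30, 120]

f_max = max_operating_frequency(
    HYPOTHETICAL_FREQS_MHZ, HYPOTHETICAL_DEAD_ZONE_PS, abs_tol_ps=10
)
print(f"maximum operation frequency in this made-up sweep: {f_max} MHz")
```

An absolute tolerance is used rather than a relative one because a detector without a dead zone has a low-frequency width near zero, which would make a relative threshold ill-defined.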

The average frequency sensitivities of the ncPFD, ptPFD, and conPFD are shown in Fig. 11. The frequency sensitivity is represented by the rate of change in the control voltage of the loop filter of a PLL when the slave input is driven by a pulse generator with a fixed frequency instead of the voltage-controlled oscillator (VCO) output. Each frequency is simulated 20 times with different initial phases, i.e., skew between the inputs. The ptPFD has the largest sensitivity, followed by the conPFD, and the ncPFD has the lowest. The sensitivity goes to zero as the slave frequency approaches the reference frequency for both the ncPFD and ptPFD. But for the conPFD, the sensitivity is relatively high even for frequencies close to the reference. In Fig. 12 the sensitivity for the ncPFD is shown with the mean, minimum, and maximum values from the 20 simulations for each frequency. Note that the behavior of the minimum and maximum values is almost random. For the ncPFD, the minimum absolute value of the sensitivity is close to zero for certain frequencies, Fig. 12. Actually, the sensitivity is zero for some frequency ratios and phase combinations. This is the case also for the ptPFD but not for the conPFD. The condition for this seems to be that when the frequency ratio of the reference and slave inputs is a rational number and the ratio is in the interval 1/2 to 2, including the limits, the sensitivity is zero for certain initial phases. We have no general proof of the previous statement but, for example, the sensitivity of the ncPFD for a slave frequency of 4/5 of the reference frequency, as a function of initial phase, is shown in Fig. 13. The sensitivity is zero for the phases 0.0, 2.5, and 5.0 ns. This lack of sensitivity may lead to false locking for a PLL in operation. However, this false locking will not be stable, since a small phase change results in a nonzero sensitivity and drives the loop back to lock. One way to add small phase changes to the simulation is to include phase noise, which is always present in an oscillator. When we add phase noise of approximately 300 ps peak-to-peak to the simulations, the normalized minimum sensitivity, which was zero, will increase to approximately 0.01. The improvement is not significant but the sensitivity will be nonzero and positive for all phases. Hence, false locking is avoided. To further enhance the phase noise during the lock-in process, one could use dithering techniques, i.e., add the signal from a noise/signal source to the control voltage of the VCO.

Fig. 12. Frequency sensitivity for the ncPFD for a number of frequencies. The plot is based on behavioral simulations with 20 different initial phases for each frequency. The solid line is the mean value and the “+” symbols are the minimum and maximum values. The reference frequency is 50 MHz.

Fig. 13. Frequency sensitivity for the ncPFD when the slave frequency is 4/5 of the reference frequency. For the initial phases of 0.0, 2.5, and 5.0 ns the sensitivity is zero.

V. BEHAVIORAL MIXED-MODE SIMULATIONS

In order to understand the sensitivity to frequency errors and lock-in properties of the proposed detector, a complete third-order charge pump PLL system was simulated using a multilevel mixed-mode simulator, Lsim [5]. The PFD was represented by a schematic simulated in switch mode. The VCO, phase-noise generator, and charge pump are represented by behavioral models written in the hardware description language M [6]. The loop filter used ideal R and C models in circuit mode with analog voltages. The loop filter and PLL data are shown as an inset in Fig. 14. A lock-in simulation is shown in Fig. 14. The simulation is done with the presence of 300 ps peak-to-peak phase noise. Because of the sawtooth-shaped frequency sensitivity of the ncPFD (for a fixed frequency offset and varied initial phase), Fig. 13, and the presence of noise, the lock-in time is not deterministic but random. The lock-in times for 60 simulations have been analyzed. Most simulations show a lock-in time of 7 μs and the largest time is 16 μs. There is no upper limit on the lock-in time. One simulation took approximately 3 cpu-min on a SPARC 10 workstation.

Fig. 14. Lock-in process of a third-order PLL with the ncPFD as phase frequency detector. The loop filter and PLL data are shown in the upper right corner.
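The paper quotes several quantities both as phase and as time at the 50 MHz reference (a 0.50 rad dead zone given as 1.6 ns, a 5% duty-cycle difference given as 1 ns, a 630 ps offset, and 300 ps of peak-to-peak phase noise). As a rough cross-check, these can be related through the 20 ns period. The helper names below are illustrative only and do not come from the paper.

```python
import math

F_REF_HZ = 50e6  # reference frequency used in the simulations


def rad_to_seconds(phase_rad, f_hz):
    """Time shift corresponding to a phase shift at frequency f_hz."""
    return phase_rad / (2.0 * math.pi * f_hz)


def seconds_to_rad(t_s, f_hz):
    """Phase shift corresponding to a time shift at frequency f_hz."""
    return 2.0 * math.pi * f_hz * t_s


dead_zone_s = rad_to_seconds(0.50, F_REF_HZ)     # quoted as 1.6 ns
duty_diff_s = 0.05 / F_REF_HZ                    # 5% of one period
offset_rad = seconds_to_rad(630e-12, F_REF_HZ)   # 630 ps phase offset
noise_rad = seconds_to_rad(300e-12, F_REF_HZ)    # 300 ps peak-to-peak noise

print(f"0.50 rad dead zone  = {dead_zone_s * 1e9:.2f} ns")
print(f"5% duty difference  = {duty_diff_s * 1e9:.2f} ns")
print(f"630 ps offset       = {offset_rad:.3f} rad")
print(f"300 ps p-p noise    = {noise_rad:.3f} rad")
```

Under these conversions the 630 ps offset corresponds to roughly 0.2 rad, and the 300 ps of injected noise is a little under 0.1 rad peak-to-peak.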

VI. EXPERIMENTS

The phase detection properties of the ncPFD have been verified experimentally with a test chip. The test chip is a line receiver for serial data that utilizes several parallel samplers to receive bit rates of 2.0 Gb/s [7]. The phase detector was used in a delay-locked loop (DLL) which generates control signals for the sampling switches used in the line receiver. The ncPFD, Fig. 3(a), was used as a π-rad phase detector and the delay line was half a wavelength long. The skew between the reference and slave signals is not possible to measure directly. This quantity has been measured indirectly through measurement error compensation circuits to be about 125 ps at MHz. Unfortunately, there is no control of how large the measurement error is.

Fig. 15. DLL, phase offset measurement circuitry, and NMOS transistor to access the control voltage.

Fig. 16. Oscilloscope screen dump of the drain voltage of an NMOS transistor with external pull-up resistor where the gate is connected to the control voltage. Four different lock-in procedures are shown. The initial control voltages are 0.0, 1.0, 2.0, and 3.0 V for the curves from top to bottom, respectively.

The circuit blocks used to measure the offset are shown in Fig. 15. The two clocks that we want to compare come from the beginning and the end of the delay line. They are fed into two matched inverter chains where the propagation delays for rising and falling edges are matched against process variations [8]. The delays from the multiplexer inputs to the oscilloscope screen for the two signal paths are not matched. Two measurements are done to compensate for this: one where the delay line input signal goes uninverted through Output buffer 1 and one where the same signal goes inverted through Output buffer 2. The measured skew including the measurement error for the measurements will be as follows:

$$\mathrm{skew}_1 = t_{\mathrm{skew}} + t_{\mathrm{inv4},f} - t_{\mathrm{inv4},r} + t_{\mathrm{mux1}} - t_{\mathrm{mux2}} + t_{\mathrm{Buf1}} - t_{\mathrm{Buf2}} \qquad (1)$$

$$\mathrm{skew}_2 = t_{\mathrm{skew}} + t_{\mathrm{inv5},f} - t_{\mathrm{inv5},r} + t_{\mathrm{mux2}} - t_{\mathrm{mux1}} + t_{\mathrm{Buf2}} - t_{\mathrm{Buf1}} \qquad (2)$$

where $t_{\mathrm{skew}}$ is the real skew and $t_{\mathrm{inv4},f}$ and $t_{\mathrm{inv4},r}$ are the delays through the four-inverter-long chains for falling and rising edges through the left and right chain, respectively. Similarly, $t_{\mathrm{inv5},f}$ and $t_{\mathrm{inv5},r}$ are for the five-inverter-long chains, and $t_{\mathrm{mux1}}$ and $t_{\mathrm{mux2}}$ are the delays through the multiplexers. $t_{\mathrm{Buf1}}$ and $t_{\mathrm{Buf2}}$ are the delays through the output buffers and through the oscilloscope input channels. The sum of the skews (1) and (2) is

$$\mathrm{skew}_1 + \mathrm{skew}_2 = 2\,t_{\mathrm{skew}} + t_{\mathrm{inv4},f} - t_{\mathrm{inv4},r} + t_{\mathrm{inv5},f} - t_{\mathrm{inv5},r} \qquad (3)$$

Note that the expression is independent of the mux and Buf delays. Hence, theoretically, if the rise and fall delays of the inverter chains are matched properly, there will not be any measurement error. In Fig. 16 an oscilloscope screen dump with four lock-in procedures is shown. The signal is the drain voltage of an NMOS transistor with an external pull-up resistor and with the gate connected to the control voltage as shown in Fig. 15. The lock-in time is less than 200 μs. Ideally, the control voltage should go monotonically to the equilibrium voltage. Therefore, the beating in the lock-in procedure when the initial control voltage is 3.0 V is unexpected. The reason for this is unknown.
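The two-measurement compensation described by (1)–(3) can be illustrated numerically. The sketch below assumes one plausible sign convention, in which the multiplexer and buffer delay differences enter the two measurements with opposite signs so that they cancel in the sum; the function names and all delay values except the quoted 125 ps skew are hypothetical.

```python
def measured_skews(true_skew, inv4_f, inv4_r, inv5_f, inv5_r,
                   mux1, mux2, buf1, buf2):
    """Two skew readings (ps) under the assumed sign convention:
    the mux/buffer path difference is added in the first measurement
    and subtracted in the second."""
    skew1 = true_skew + (inv4_f - inv4_r) + (mux1 - mux2) + (buf1 - buf2)
    skew2 = true_skew + (inv5_f - inv5_r) + (mux2 - mux1) + (buf2 - buf1)
    return skew1, skew2


def estimate_skew(skew1, skew2):
    """Average of the two readings; mux and buffer terms drop out."""
    return 0.5 * (skew1 + skew2)


TRUE_SKEW_PS = 125.0  # value quoted in the text; everything else is made up

# Case 1: rise/fall delays of the inverter chains perfectly matched.
s1, s2 = measured_skews(TRUE_SKEW_PS, 200.0, 200.0, 250.0, 250.0,
                        mux1=80.0, mux2=95.0, buf1=300.0, buf2=340.0)
matched_estimate = estimate_skew(s1, s2)

# Case 2: 4 ps rise/fall mismatch in the four-inverter chain.
s1m, s2m = measured_skews(TRUE_SKEW_PS, 204.0, 200.0, 250.0, 250.0,
                          mux1=80.0, mux2=95.0, buf1=300.0, buf2=340.0)
mismatched_estimate = estimate_skew(s1m, s2m)

print(s1, s2, matched_estimate)        # individual readings differ widely
print(s1m, s2m, mismatched_estimate)   # residual error is half the mismatch
```

In this toy example the individual readings are far from the true skew because of the unmatched output paths, yet their average recovers it exactly when the inverter chains are matched; any rise/fall mismatch in the chains appears in the estimate at half its value.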

VII. CONCLUSIONS

A new PFD without a dead zone has been proposed. The circuit topology is simple and has no feedback loops. Simulation results indicate that the circuit can operate up to 800 MHz in 0.8-μm CMOS with a 5-V supply. The detector’s phase offset depends on the duty cycle of the inputs. Measurements have been performed on the detector when it was used in a DLL as a phase detector and the functionality was verified.

REFERENCES

[1] R. E. Best, Phase-Locked Loops, 2nd ed. New York, NY: McGraw-Hill, 1993.
[2] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 2nd ed. Reading, MA: Addison Wesley, 1993.
[3] H. Kondoh, H. Notani, T. Yoshimura, H. Shibata, and Y. Matsuda, “A 1.5-V 250-MHz to 3.0-V 622-MHz operation CMOS phase-locked loop with precharge type phase-detector,” IEICE Trans. Electron., vol. E78-C, no. 4, pp. 381–388, Apr. 1995.
[4] P. Larsson and C. Svensson, “Skew safety and logic flexibility in a true single phase clocked system,” in Proc. IEEE Int. Symp. Circuits Syst., 1995, pp. II:941–944.
[5] Mentor Graphics, Explorer Lsim User’s Manual. Mentor Graphics Corp., 1992.
[6] Mentor Graphics, M Language User’s Guide. Mentor Graphics Corp., 1991.
[7] H. O. Johansson, J. Yuan, and C. Svensson, “A 4 Gsamples/s Line-Receiver in 0.8 μm CMOS,” in Proc. Symp. VLSI Circuits, 1996, pp. 116–117.
[8] M. Shoji, “Elimination of process-dependent clock skew in CMOS VLSI,” IEEE J. Solid-State Circuits, vol. SC-21, pp. 875–880, Oct. 1986.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Rotary Traveling-Wave Oscillator Arrays: A New Clock Technology

John Wood, Terence C. Edwards, Member, IEEE, and Steve Lipa, Student Member, IEEE

Abstract—Rotary traveling-wave oscillators (RTWOs) represent a new transmission-line approach to gigahertz-rate clock generation. Using the inherently stable LC characteristics of on-chip VLSI interconnect, the clock distribution network becomes a low-impedance distributed oscillator. The RTWO operates by creating a rotating traveling wave within a closed-loop differential transmission line. Distributed CMOS inverters serve as both transmission-line amplifiers and latches to power the oscillation and ensure rotational lock. Load capacitance is absorbed into the transmission-line constants whereby energy is recirculated, giving an adiabatic quality. Unusually for an LC oscillator, multiphase (360°) square waves are produced directly. RTWO structures are compact and can be wired together to form rotary oscillator arrays (ROAs) to distribute a phase-locked clock over a large chip. The principle is scalable to very high clock frequencies. Issues related to interconnect and field coupling dominate the design process for RTWOs. Taking precautions to avoid unwanted signal couplings, the rise and fall times of 20 ps, suggested by simulation, may be realized at low power consumption. Experimental results of the 0.25-μm CMOS test chip with 950-MHz and 3.4-GHz rings are presented, indicating 5.5-ps jitter and 34-dB power supply rejection ratio (PSRR). Design errors in the test chip precluded meaningful rise and fall time measurements.

Index Terms—Clocks, MOSFET oscillators, phase-locked oscillators, phased arrays, synchronization, timing circuits, transmission line resonators, traveling-wave amplifiers.

I. INTRODUCTION

Clocking at gigahertz rates requires generators with low skew and low jitter to avoid synchronous timing failures. The notion of a “clocking surface” becomes untenable at gigahertz rates [1], frequently mandating that large VLSI chips are subdivided into multiple clock domains and/or utilize skew-tolerant multiphase circuit design techniques [2]. Techniques such as distributed phase-locked loops (PLLs) [3] and delay-locked loops (DLLs) [4] can control systematic skew to within 20 ps, but are complex, introduce random skew (i.e., jitter), and have area penalties. H-tree distribution systems, while simple, are difficult to balance and can use upwards of 30% of a chip’s total power budget [5]. All these systems are inherently single-phase, induce large amounts of simultaneous switching noise, and can be highly susceptible to this noise.

Researchers have therefore looked to alternative oscillator mechanisms for better phase stability and lower power consumption. Previous transmission-line systems such as salphasic distribution [6], distributed amplifiers [7], and adiabatic LC resonant clocks [8] provide only a sinusoidal or semisinusoidal clock, making fast edge rates difficult to achieve. This paper introduces the rotary traveling-wave oscillator (RTWO), a differential LC transmission-line oscillator which produces gigahertz-rate multiphase (360°) square waves with low jitter. Extension of the RTWO to rotary oscillator arrays (ROAs) offers a scalable architecture with the potential for low-power low-skew clock generation over an arbitrary chip area without resorting to clock domains. Simulations predict rise and fall times of 20 ps on a 0.25-μm process and a maximum frequency limited only by the $f_T$ of the integrated circuit technology used. Experiments show that although the RTWO operates differentially, careful attention is required to guard against magnetic field couplings between the clock conductors and other structures if the potential performance of these oscillators is to be realized.

Manuscript received March 20, 2001; revised June 28, 2001. This work was supported by Multigig Ltd., and also supported in part by the National Science Foundation under Award EIA-31332. J. Wood is with MultiGig, Ltd., Northampton NN8 1RF, U.K. T. C. Edwards is with Engalco, Huntington, YO32 9NY, U.K. S. Lipa is with the Microelectronics Systems Laboratory, North Carolina State University, Raleigh, NC 27695 USA. Publisher Item Identifier S 0018-9200(01)08220-8.

II. CONCEPT OF THE ROTARY CLOCK OSCILLATOR

A. Fundamentals and Structures

The basic ROA architecture is shown in Fig. 1. A representative multigigahertz rotary clock layout has 25 interconnected RTWO rings placed onto a 7 × 7 array grid. Each ring consists of a differential line driven by shunt-connected antiparallel inverters distributed around the ring. This arrangement produces a single clock edge in each ring which sweeps around the ring at a frequency dependent on the electrical length of the ring. Pulses are synchronized between rings by hard wiring which forces phase lock. Fig. 2 illustrates the theory behind the individual RTWO. Fig. 2(a) depicts an open loop of differential transmission line (exhibiting LC characteristics) connected to a battery through an ideal switch. When the switch is closed, a voltage wave begins to travel counterclockwise around the loop. Fig. 2(b) shows a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors to cause a signal inversion. If there were no losses, a wave could travel on this ring indefinitely, providing a full clock cycle every other rotation of the ring (the Möbius effect). In real applications, multiple antiparallel inverter pairs are added to the line to overcome losses and give rotation lock. Rings are simple closed loops and oscillation occurs spontaneously upon any noise event. Unbiased, startup can occur in either rotational sense, usually in the direction of lowest loss. Deterministic rotation biasing mechanisms are possible, e.g., directional coupler technology or gate displacement [9]. Once a wave becomes established, it takes little power to sustain it, because unlike a ring oscillator, the energy that goes into charging and discharging MOS gate capacitance becomes transmission-line energy, which is recirculated in the closed electromagnetic path. This offers potential power savings as losses are not related to $CV^2f$ but rather to $I^2R$ dissipation in the conductors, where $R$ can be reduced, e.g., by adoption of copper metallization.

Fig. 1. Basic rotary clock architecture. The = signs denote points with same phase.

Fig. 2. Idealized theory underlying the RTWO. (a) Open loop of differential conductors to a battery via a switch. (b) Similar loop but with the voltage source replaced by the inner and outer conductors cross-connected.

0018–9200/01$10.00 © 2001 IEEE

B. Waveforms

Fig. 3 shows simulated waveforms of a 3.4-GHz RTWO taken at an arbitrary position on the ring. The design has the following characteristics for reference:
• Conductors: width (μm)
• Pitch (μm)
• Ring length (μm)
• Metallization: 1.75 μm copper
• Loop inductance, total (nH)
• Process: 0.25-μm CMOS
• Nch total width: 2000 μm
• Pch total width: 5000 μm
• Number of inverters: 24 pairs.

Fig. 3. Waveforms of line voltage and line current for the 3.4-GHz clock simulation example.

Very large distributed transistor widths give substantial capacitive loading to the lines, thus lowering velocity to give a reasonably low clock rate from a compact oscillator structure. In application, up to 75% of this capacitance can come from load capacitance, reducing the size of the drive transistors accordingly. The upper traces of Fig. 3 show the simulated voltage waveforms on the differential line at points labeled A0, B0. The lower traces show the current in the conductors to be 200 mA, while the supply current is simulated at 84 mA with 4.5 mA of ripple. This clearly illustrates that energy is recycled by the basic operation of the RTWO. Just driving the 34 pF of capacitance present would require 275 mA at this frequency (from $I = CVf$).

C. Phase Locking

Interconnected rings, as in Fig. 1(a), will run in lockstep, ensuring that the relative phase at all points of an ROA is known. It is possible to use a large array of interconnected rings to distribute a clock signal over a large die area with low clock skew. For example, referring to Fig. 1(a), all the points marked with the equals sign (=) have the same relative phase as that arbitrarily marked as 0°. At any point along the loop, the two signal conductors have waveforms 180° out of phase (two-phase nonoverlapping clock). A full 360° is measured along the complete closed path of the loop. In principle, an arbitrary number of clock phases can be extracted. Phase advances or retards depend on the direction of rotation, and Fig. 4 shows the current–voltage relationships for clockwise and counterclockwise rotation.

Fig. 4. Voltage, current, and phase relationships versus rotation direction (Poynting’s vector).
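The energy-recycling argument in the Waveforms subsection compares the simulated 84 mA supply current with the 275 mA that would be needed to drive 34 pF directly at 3.4 GHz. As a rough illustration, and assuming the usual dynamic-current estimate I = C·V·f (this is our reading of the formula the text cites, not something the surviving text spells out), the quoted figures imply a voltage swing of roughly 2.4 V and a supply current of about 30% of the conventional value. The names below are illustrative only.

```python
C_TOTAL_F = 34e-12       # capacitance quoted for the 3.4-GHz example
F_CLK_HZ = 3.4e9         # clock frequency of the example
I_CVF_QUOTED_A = 0.275   # current quoted for driving the capacitance directly
I_SUPPLY_SIM_A = 0.084   # simulated RTWO supply current


def cvf_current(c_farad, v_volt, f_hertz):
    """Average current to charge c_farad to v_volt once per cycle at f_hertz."""
    return c_farad * v_volt * f_hertz


def implied_swing(i_amp, c_farad, f_hertz):
    """Voltage swing implied by a quoted C*V*f current (assumed relation)."""
    return i_amp / (c_farad * f_hertz)


v_swing = implied_swing(I_CVF_QUOTED_A, C_TOTAL_F, F_CLK_HZ)
supply_fraction = I_SUPPLY_SIM_A / I_CVF_QUOTED_A

print(f"implied swing   : {v_swing:.2f} V")
print(f"supply / C*V*f  : {supply_fraction:.1%}")
```

The implied swing is an inference from the quoted numbers, not a value stated in the paper.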

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Fig. 5. Three-dimensional view of the structure. The two differential lines are shown, with current flow arrows (main and charge/boost) and encircling H-fields. CMOS transistors are also shown complete with supply voltages (V and V ) and both p- and n-channels.

D. Network Rules

Although the square-ring shape is convenient to show diagrammatically, it is only one example of a more general network solution, which requires ROAs to conform closely to the following rules.

1) Signal inversion must occur on all (or most) closed paths.
2) Impedance should match at all junctions.
3) Signals should arrive simultaneously at junctions.

From rule 1), any odd number of crossovers is allowed on the differential path, and regular crossovers forming a braided or “twisted pair” effect can dramatically reduce the unwanted coupling to wires running alongside the differential line. The differential lines would typically be fabricated on the top metal layer of a CMOS chip, where the reverse-scaling trend of VLSI interconnect offers increasingly high performance [10].

E. Fields and Currents

Fig. 5 illustrates a three-dimensional section of the ring structure connected to a pair of CMOS inverters, expanded to show the four individual transistors. The main current flow in the differential conductors is shown by solid arrows, the magnetic field surrounding these conductors by dashed loops, and the capacitance charge/signal-boost current flowing through the transistors by dashed lines. An important feature of differential lines is the existence of a well-defined “go” and “return” path, which gives predictable inductance characteristics, in contrast to the uncertain return-current path for single-ended clock distribution [11]. Capacitance arises mainly from the transistor gate and depletion capacitance; interconnect capacitance does not dominate. The gate resistor shown indicates intrinsic gate resistance, i.e., the ohmic path through which the gate charge flows. The term implies a parasitic gate term, but in reality, most of this resistance is in the series circuit of the channel under the gate electrode. This is shared by the D-S channel, as illustrated by the triangular region (shown with transistors operating in the pinchoff region).

Fig. 6. Expanded view of short sections of the transmission line, including three sets of back-to-back inverters as a wavefront passes.

F. Coherent Amplification, Rotation Locking

Fig. 6 is an expanded view of a short section of transmission line with three sets of back-to-back inverters shown. It is assumed that startup is complete and the rotating wave is sweeping left to right. For this analysis, we view the inverter pairs as discrete latch elements. Each latch switches in turn as the incident signal, traveling on the low-impedance transmission line, overrides the ON resistance of the latch and its previous state. This “clash” of states occurs only at the rotating wavefront, and therefore only one region is in this cross-conduction condition at any one time. The transmission-line impedance is of the order of 10 Ω, and the differential on-resistance of the inverters is in the 100-Ω to 1-kΩ range, depending on how finely they are distributed throughout the structure. Once switched, each latch contributes for the remainder of the half cycle, adding to the forward-going signal. Coherent buildup of switching events occurs in this forward direction only. An equal amount of energy is launched in the reverse direction, but the latches in that direction cannot be switched further into the state to which they have already switched. The reverse-traveling components simply reduce the amount of drive required from those latches. Importantly, it is the nonlinear latching action which is responsible for the self-locking of direction (a highly linear amplifier has no such directionality). To clarify the above statements, Fig. 7 demonstrates how a large CMOS latch responds to an imposed differential signal. The curve trace shows a central differential-amplification region bounded by two absorptive ohmic regions (shaded) corresponding to the two latched states. Except at the wavefront location where amplification takes place, the ring structures will be terminated ohmically to the supplies. The four-transistor “full-bridge” circuit confines supply-current ripple to the cross-conduction period.

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

Fig. 7. DC transfer characteristic of two back-to-back inverters to an imposed differential signal.

G. Frequency and Impedance Relations

In simulation models (and indeed as fabricated), the RTWO transmission line is built up from multiple RLC segments, and therefore, these primary line constants must be identified. Fig. 8(a) is the basic RF macromodel of a short length (SegLen) of RTWO line with all significant RF components and parasitics annotated (as per Fig. 5). Suffixes identify per-unit-length (perlen), lumped (lump), and total (or loop) values. There are $N$ segments connected together, plus a crossover, to produce a closed ring of length RingLen. Fig. 8(b) is a capacitive equivalent circuit for the transistor and load capacitances. AC0 indicates an ac ground point (the two supply rails). The differential lumped capacitance of one such segment is given approximately by

Fig. 8. Development of the rotary clock model. (a) Complete RF circuit. (b) Capacitance circuit.

(1)

where the terms are, in order: interconnect capacitance for the line AB; gate overlap and Miller-effect feedback capacitance; total channel capacitance; drain depletion capacitance to bulk (substrate); load capacitance added to a line. (Note that a divisor is used to convert the in-parallel “to ground” values into in-series differential values of capacitance.) The interconnect capacitance is usually a small part of total capacitance, and accurate formulas are available [12] if needed. To calculate the per-unit-length differential inductance, i.e., accounting for mutual coupling, we use [13], expressed below.

(2)

where the geometric terms are: conductor separation; conductor width; conductor thickness. The phase velocity is given by

$$v_p = \frac{1}{\sqrt{L_{perlen}\,C_{perlen}}} \tag{3}$$

where $C_{perlen} = C_{lump}/\mathrm{SegLen}$. For heavily loaded RTWO structures, $v_p$ can be as low as 0.03 of $c$ (where $c$ is the free-space velocity, i.e., $3 \times 10^8$ m/s). The clock frequency is given approximately by

$$f_{osc} \approx \frac{v_p}{2 \cdot \mathrm{RingLen}} \tag{4}$$

(The factor of 2 arises from the pulse requiring two complete laps for a single cycle.) Differential characteristic impedance is given by

$$Z_0 = \sqrt{\frac{L_{perlen}}{C_{perlen}}} \tag{5}$$

Transmission-line characteristics dominate over RC characteristics when [14]

(6)

H. Bandwidth and Power Consumption

Seen from an RF perspective, Fig. 8(a) shows the RTWO to be two push–pull distributed amplifiers folded on top of each other. Distributed amplifiers exhibit very wide bandwidth because parasitic capacitances are “neutralized” by becoming part


TABLE I CHANGES OF CHARACTERISTICS WITH N

Fig. 9. A four-port junction of two RTWO rings carrying anticlockwise signals, with a noncoincident signal arrival time.

of the transmission-line impedance [15]. Performance is limited by the carrier transit time of the MOSFETs [16], not by the traditional digital inverter propagation time, which is not applicable where gates and drains are driven cooperatively by an imposed low-impedance signal and where the load capacitance is hidden in the transmission line. Operation of the RTWO is largely adiabatic when the voltage drop required to charge the capacitances is developed mainly across the inductance:

(7)

and when the intrinsic gate resistance is low relative to the reactance of the gate capacitance.

(8)

RTWO rise and fall times are controllable by setting the cutoff frequency of the transmission lines.

(9)

Edges become faster and cross-conduction losses are reduced when the structure is more distributed. Table I lists characteristic changes with $N$, with the remaining parameters held constant. The most significant power loss mechanism for the RTWO is $I^2R$ power dissipated in the interconnect, given by

(10)

Most of the remaining losses in Table I are attributed to cross-conduction and parasitic losses. Intrinsic gate resistance is a real loss mechanism for gigahertz signals, and RTWO rise/fall times can be doubled by this phenomenon. In newer CMOS processes, it improves with shorter channel length.

III. MORE DETAILED CONSIDERATIONS

A. Skew Control

Interconnected RTWO loops offer the potential to control skew in spite of relatively large open-loop time-of-flight mismatches. Functionally, phase averaging occurs by pulse combination at the junction of multiple transmission lines. For a four-port junction, the normal operating mode will see two pulses arriving at the junction simultaneously. These two sources will feed two output ports, and signal flow will be unimpeded by reflections if impedance is matched. This amounts to a situation similar to that described in [17], [18], although for ROAs, the mechanism is LC transmission-line energy combination, not ohmic combination of CMOS inverter outputs. Where there exists a time-of-flight mismatch, one pulse arrives at the junction before the other. Fig. 9(a) depicts the operation of a four-port junction between two interwired but velocity-mismatched RTWO loops. Each of these rings has been divided into numbered segments (each as in Fig. 8). Four rings are wired together (similar to Fig. 16, shown later). Only the junction of two of these rings is considered here, one of which has a higher open-loop operating frequency than the other.


Fig. 10. Waveforms corresponding with Fig. 9.

Fig. 11. Segment of chip layout showing 90° routing beneath clock lines and a tap to clock (CLK: CLK) loads.

From simulation, two pulse-combination effects appear to be present, the simpler of which is the impedance-match effect, where the first signal to arrive at a junction must try to drive three transmission lines. If all ports have equal impedance, the junction can only reach a quarter of the full signal value, and a reflection occurs, driving an inverted signal back down the incident port [Fig. 9(b)]. The initially detrimental effects on signal fidelity arising from this reflection are overcome when the other pulse arrives, whereupon the pulses combine and branch into the output ports, as shown in Fig. 9(c). The second pulse-combination effect is believed to be due to nonlinear MOSFET drain capacitance, which can modulate the velocity of the line. Reflections can drive the MOSFETs from the ohmic state into the low-capacitance pinchoff region, locally increasing velocity.

Quantitative Results From Simulation: Fig. 10 presents the results of a SPICE simulation of the above situation with an extreme condition of velocity mismatch. A 50% variation of oxide thickness is modeled across a small 2.4 × 2.4 mm chip having four interconnected rings. Thick-oxide (lower capacitance) devices are on the right side of the chip, giving a 22.5% phase velocity increase relative to the left side. Looking at these results with reference to Fig. 9 reveals that the first pulse arrives from the faster right-hand ring, passes point A at time … ps, and begins its rise time. Within this rise time, the leading edge reaches the nearby junction, where negative reflections bounce back to momentarily prevent A from passing through the 1.5-V level. The second pulse arrives from the slower left-hand ring, reaching point B at approximately … ps. It then combines with the first pulse at the junction to branch into the two output ports without further reflections. By … ps, the signals have reached points A and B and are essentially coincident; forward progress of the waves in the two rings is now synchronized.
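The 22.5% velocity figure quoted above can be checked with simple scaling. The sketch below assumes that phase velocity varies as the inverse square root of line capacitance, that gate-oxide capacitance is inversely proportional to oxide thickness, and that the “50% variation” means a 1.5× thicker oxide; these readings are assumptions rather than statements from the paper. The `gate_fraction` parameter is a hypothetical knob for the share of line capacitance that comes from gate oxide.

```python
import math


def velocity_ratio(tox_ratio, gate_fraction=1.0):
    """Phase-velocity ratio when oxide thickness is scaled by tox_ratio.

    gate_fraction is the (assumed) fraction of line capacitance that is
    gate-oxide capacitance; the remainder is taken as unaffected.
    """
    if tox_ratio <= 0 or not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("invalid scaling inputs")
    # Gate-oxide capacitance scales as 1/tox; everything else is fixed.
    cap_ratio = gate_fraction / tox_ratio + (1.0 - gate_fraction)
    # Velocity scales as 1/sqrt(C) for fixed inductance.
    return 1.0 / math.sqrt(cap_ratio)


if __name__ == "__main__":
    increase = velocity_ratio(1.5) - 1.0
    print(f"velocity increase: {100.0 * increase:.1f}%")
```

With the line capacitance taken as entirely gate oxide, a 1.5× oxide thickness gives a velocity ratio of √1.5, i.e., roughly the 22.5% increase reported; a smaller `gate_fraction` would give a smaller mismatch.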

The phase-locking phenomenon occurs at every junction of the array (not just the junction considered here) and twice per oscillation cycle, which accounts for the smaller-than-expected initial skew seen between the rings. Simulations of typical arrays show that lockup is achieved within a few nanoseconds of powerup, after signals settle into the lowest-energy state of a coherent mesh.

B. Coupling Issues Related to Layout

The induced magnetic fields from the rotary clock structures can be strong. This is because the rate of change of current is relatively high (square waves). The magnetic coupling coefficient, however, depends on the angle between source and victim and falls to zero when the angle becomes 90°. Fig. 11 illustrates a 90° layout technique to minimize inductive coupling problems. The top metal M5 (running left to right) is used to create the differential RTWO, while orthogonal M4 is used as a routing resource for busses into and out of areas bounded by the clock transmission line. For capacitive coupling, fast rise and fall times imply high displacement currents and a potentially aggressive noise source. Differential transmission lines tend to mitigate such effects [19], and in Fig. 11, the total capacitive coupling area between each of the transmission-line conductors and any M4 conductor is balanced. If the clock source were ideally differential, no net charge would be coupled to the M4 wires. For the RTWO, distributed inverters force the waveforms to be substantially differential and nonoverlapping, keeping glitches below the sensitivity of a typical gate. For the five-metal test chip (Section V), a 45% utilization of M4 was used for the 90° routing pattern immediately underneath the RTWO rings. This coverage allows the M4 to act both as a routing resource and as an electrostatic shield similar to [20], preventing electrostatic coupling to signal lines further below. Magnetic fields are not attenuated much by this configuration, because the spaces between the thin perpendicular M4 lines break up the circulating currents which could repel a magnetic field. Substrate magnetic fields [21] are, therefore, to be expected.


Coupling to co-parallel (0°) victim conductors is potentially much more problematic (discussed later in Section IV-C).

C. Tapoff Issues and Stub Loadings

It is possible to “tap into” the ROA structure (Fig. 11) anywhere along its length and extract a local two-phase signal with a known phase relationship to the rest of the network. This signal can then be routed via a fast differential transmission line to other circuits and will generally represent a capacitive stub on the RTWO ring. For minimum signal distortion, the round-trip time-of-flight (forward and backward along the stub) must be much less than the rise time and fall time of the clock waveform:

Fig. 12. Signal at either end of a 2-pF total tap loading line.

$$2\,t_{stub} \ll t_{rise},\ t_{fall} \tag{11}$$

where $t_{stub}$ is the one-way time of flight along the stub. When the above condition is met, the capacitance can be taken as being effectively lumped on the main RTWO ring at the tap point for the purposes of predicting oscillator frequency and ring impedance. Although not immediately apparent, this condition is achievable in practice due to three factors. The first factor is that the tap-line velocity is relatively fast for SiO2 dielectric. It is approximately $0.5c$, while the main RTWO oscillator ring might be operating at a small fraction of $c$. The second factor is that the tap length only has to be long enough to reach within a single RTWO ring. The third factor is that it requires two signal rotations on the RTWO to complete a clock cycle. These three factors work together to make the RTWO rings physically small compared to the expected speed-of-light dimensions. The distances to be spanned by the fast tap wires are therefore short enough that transmission-line effects on these lines are unimportant, certainly at the clock fundamental frequency and even at higher harmonics. This can be illustrated by reference to a specific 3.4-GHz RTWO, 3200 µm long with 20-ps rise/fall times. Within one of these rise or fall periods, a stub transmission line with velocity $0.5c$ is able to communicate a signal over a distance of 3 mm. For a stub length of 400 µm (to reach the center of the ring), this equates to 3.75 round-trip times along the stub. Fig. 12 shows simulated waveforms with 2 pF of total to-ground capacitance at the end of one such stub. Reflected energy gives rise to the ringing which is evident with this level of capacitance. The line resistance of the stubs must be low to maintain reflective energy conservation. The ratiometric factors outlined above between ring length, frequency, rise/fall time, and stub lengths are expected to hold as ROAs are scaled to higher frequencies and smaller ring lengths without requiring special stub tuning measures.

Capacitive Loading Limits: Substantial total-chip capacitive loading can be tolerated by the RTWO relative to conventionally resonant systems [8], [22], [23]. However, the loading effects of interconnect, active, and stub capacitances cannot be increased without limit. The consequential lowering of line impedance increases circulating currents until $I^2R$ losses become a concern. Eventually, the impedance becomes so low relative to the loop resistance that the relation (6) cannot be maintained, whereupon oscillation ceases altogether.

D. Frequency/Impedance Adjustment

Rewriting (4) in the form below shows that frequency is set only by the total inductance and capacitance of the RTWO loop.

$$f_{osc} \approx \frac{1}{2\sqrt{L_{total}\,C_{total}}} \tag{12}$$

Total loop inductance $L_{total}$ is proportional to RingLen and varies strongly as a function of the width and pitch of the top-metal differential conductors. This allows a coarse frequency selection through the top-metal mask definition. Unit-to-unit inductance variation is expected to be small because of the good lithographic reproduction of the relatively large clock conductors and the weak sensitivity of inductance to metal thickness variations. Total capacitance $C_{total}$ for the RTWO is the sum of all lumped capacitances connected to the loop (1). $C_{total}$ tends to be dominated by gate-oxide capacitance from the drive FETs and the clock load FETs. Gate-oxide capacitance is inversely proportional to gate-oxide thickness, which on a modern CMOS SiO2 process is controlled to approximately 5% variation over extended wafer lots [24]. Drain depletion capacitances exist on bulk CMOS where the active transistors connect to the ring.

During the VLSI layout phase, a CAD tool (expected release: Q1 2002) can target a fixed operating frequency. The tool will be able to correct impedance discontinuities caused by lumped load capacitance by the addition of dummy “padding” capacitance elsewhere around the loop, and to postcompensate an overly capacitive-loaded clock network by reducing the differential inductances through pitch reduction, hence restoring velocity and thus frequency. Alternatively, at the expense of using more metallization, a new layout with more numerous, shorter rings could be used. The tool will need to simultaneously solve impedance matching issues [refer to Section II-G, (5)]. By manipulation of both inductance and capacitance simultaneously, it is possible to control velocity and impedance independently, as shown diagrammatically in Fig. 13. For example, velocity can be reduced by increasing both inductance and capacitance by the same factor to cancel the effect on impedance. These adjustments can support arbitrary branch-and-combine networks (at least in theory).

Post fabrication, adding together the sources of variation and given that frequency is related to $\sqrt{L_{total}}$ and $\sqrt{C_{total}}$, a 5% initial tolerance of operating frequency between parts is expected.
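The relationships used in this section can be sketched numerically. The snippet below evaluates the loop frequency from the total inductance and capacitance, takes the line impedance as the square root of their ratio (the usual lossless-line expression, assumed here to match the paper's impedance relation), and shows that scaling both by the same factor lowers frequency while leaving impedance unchanged. The function names and the 1 nH / 25 pF values are hypothetical, chosen only for illustration.

```python
import math


def loop_frequency_hz(l_total, c_total):
    """Clock frequency for a loop needing two laps per cycle."""
    return 1.0 / (2.0 * math.sqrt(l_total * c_total))


def line_impedance_ohm(l_total, c_total):
    """Lossless-line impedance; the total and per-length ratios are equal."""
    return math.sqrt(l_total / c_total)


if __name__ == "__main__":
    l0, c0 = 1e-9, 25e-12  # hypothetical loop totals: 1 nH, 25 pF
    print(f"f  = {loop_frequency_hz(l0, c0) / 1e9:.2f} GHz")
    print(f"Z0 = {line_impedance_ohm(l0, c0):.2f} ohm")

    # Scale L and C together: velocity (and frequency) drop, Z0 is unchanged.
    k = 2.0
    print(f"f  = {loop_frequency_hz(k * l0, k * c0) / 1e9:.2f} GHz")
    print(f"Z0 = {line_impedance_ohm(k * l0, k * c0):.2f} ohm")

    # A 5% capacitance shift moves frequency by roughly half that amount.
    shift = loop_frequency_hz(l0, 1.05 * c0) / loop_frequency_hz(l0, c0) - 1.0
    print(f"frequency shift for +5% C: {100.0 * shift:.2f}%")
```

The square-root dependence is why a 5% spread in capacitance alone would account for only about half of a 5% frequency tolerance.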


Fig. 13. Differential line with varying trace separations and capacitive inverter loadings, indicating the effects of altering several parameters.

Matching within a die should be better, but temperature gradients and transistor size variations, as they affect capacitance, will lead to phase velocity changes requiring correction by the skew control mechanism (described in Section III-A). Temperature can alter frequency through variation of inductance and capacitance. Inductance variation is assumed to be negligible compared to capacitance variation and is not considered. Gate-oxide thickness variation could potentially affect gate capacitance, but for SiO2 dielectric, with properties similar to quartz, this can be ignored. More significant are temperature variations of drain depletion capacitance and of the transistor parameters. To tune an ROA clock to an exact reference frequency, allowing limited “speed-binning” and reduced internal phase mismatches, closed-loop control of distributed switched capacitors [9] or varactors [25] is envisaged.

E. Active Compensation for Interconnect Losses

Resistive interconnect losses make it difficult to communicate high-frequency clock signals over a large chip without waveshape distortion and attenuation, which affects the practicality of reflective energy conservation schemes [6], [22], [23]. The skin-effect loss mechanism has been evident in clock tree conductors for some time [26] and is frequency dependent. High-speed H-trees tend to use hierarchical buffers within the trees to maintain amplitude and edge rates. Active compensation of VLSI differential transmission lines to overcome clock attenuation was shown by Bußmann and Langmann [27] to be applicable to sine-wave signals. Shunt-connected negative impedance convertors (NICs) were used with linear compensation to prevent oscillations. The distributed inverters used within RTWOs afford active compensation for transmission-line losses, raising the apparent Q of the resonant rings and helping to maintain a uniformly high clock amplitude around the structure.

F. Logic Styles

Two-phase latched logic [28] is the style most compatible with RTWO. It is highly skew tolerant and, through dataflow-aware placement [27], offers the potential to exploit the full 360° of clock phase to reduce clock-related surging [29], which in future systems could exceed 500 A [30]. Conventional single-phase D-latch designs can also be driven, where timing improvements through skew scheduling [31] might be possible. A locally four-phase system to support domino logic [2] could be implemented by wrapping two loops of RTWO line around the region to clock. Unfortunately, all of these techniques are beyond the capability of current logic synthesis tools.

IV. SIMULATED PERFORMANCE

A. Approach

To enable rapid “what-if” evaluation of potential RTWO structures, a simulation/visualization program known as Rotary Explorer [32] has been developed. Rotary Explorer is GUI driven and parametrically creates a SPICE deck of macromodels linking to FASTHENRY subcircuits [33] for multipole magnetic analysis of skin, proximity, and LR coupling effects in the time domain. MOSFETs are modeled using the BSIM3v3 nonquasi-static model with an external resistor added to model the intrinsic gate resistance (Fig. 8). The BSIM4 model [34], which properly accounts for this resistance as a D-S channel component, was not available. With the Rotary Explorer program, it is possible to simulate RTWO rings independently or as interlocked arrays. The effects of tap loads, oxide thickness variations, and magnetically induced “victim” noise can be evaluated. As a visualization aid, Rotary Explorer gives a “live” display of color-coded SPICE voltages projected onto a scaled image of the ROA structure being simulated. This aids in the intuitive understanding of reflections and of how the structure achieves steady-state phase-locked operation.

B. Results

Two very important performance metrics for any oscillator are its sensitivity to changes in temperature and supply voltage. Simulations of these effects on a nominally 3.34-GHz rotary clock resulted in the data given in Tables II and III.

Supply Induced Jitter: Following on from the above and in light of the RTWO’s time-of-flight oscillation mechanism, it is inferred that such voltage sensitivity will also apply to phase modulation versus voltage, i.e., jitter, at least at low supply-noise frequencies. For a single RTWO ring, the power-supply induced jitter will be related to the power-supply rejection ratio (PSRR) by

(13)

where, because of the distributed nature of the oscillator, the supply term is the mean supply voltage deviation as experienced along the path of an edge as it travels two complete rotations. To improve PSRR, plans are in place to add voltage-dependent capacitance to the structure to give first-order compensation. From simulations, we see that jitter reduces for multiple-ring structures due to averaging effects.

C. Coupling II: Simulated Coupling

The Rotary Explorer program makes it easy to simulate coupled noise between an RTWO ring and a user-defined victim trace (drawn with the aid of a mouse). Simulated results are shown in Table IV for a 3.4-GHz RTWO configured to have 20-ps rise and fall times, and with geometry as shown in Fig. 14. Peak coupling magnitude occurs at 60-µm victim length. A trace longer than this will see a coupling cancellation effect that approaches zero for each pitch of the braiding it traverses. Fig. 15 illustrates a notably strong coupled signal waveform at victim distance … µm, with no loading on the victim


TABLE II VARIATIONS WITH TEMPERATURE

TABLE III VARIATIONS WITH DC SUPPLY VOLTAGE V

TABLE IV INDUCED NOISE AS A FUNCTION OF VICTIM DISTANCE AND LENGTH

Fig. 14. Crossover traces, a visualization output from the Rotary Explorer tool.

trace and one end connected to ground. Note the more sensitive noise scale. The absolute maximum coupling occurs if victim distance is allowed to go to zero. In this case, mutual coupling between aggressor and victim is 100%, with no cancellation effects from the other differential trace. As a numerical example, it follows that a 2.5-V signal with a rise time of 20 ps on a transmission line with a velocity of approximately $0.07c$ has the 2.5-V gradient over 430 µm of length (Fig. 4 illustrates the concept). Over the 60-µm length discussed above, this equates to 348 mV. Slower edge rates, faster transmission lines, and lower supply voltages reduce this figure proportionally. Long-range inductive noise coupling from the differential transmission line is expected to be small, since (from a distance) the “go” and “return” currents are equal and opposite. Potential problems exist in short-range magnetic coupling to wiring in the vicinity of the clock lines. Inductance is lowered

Fig. 15. Example of notably strong coupled signal waveform.

by coupling to any highly conductive structure in which eddy currents can flow to decrease and distort the inducing field. Couplings to less conductive circuits such as the substrate give a loss mechanism which can be modeled as a shunt term in the transmission-line equations. LC resonance in the small-scale coupled structures is unlikely because of the high resonant frequencies. All of the coupling mechanisms mentioned are edge-rate dependent, and this can limit the achievable rise and fall times of the RTWO by attenuating the high-frequency signal components. Full RLC layout extraction is essential in the neighborhood of the clock lines if routing is allowed in these areas. An alternative proposal under investigation is to predefine a VLSI structure combining clock and power distribution into the same grid to give consistent characteristics and shielding.
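The edge-length arithmetic used in Sections III-C and IV-C for the 3.4-GHz, 20-ps example can be reproduced with a few lines. The sketch below treats an edge as occupying a length equal to velocity times rise time. The stub velocity of 0.5c is an assumption chosen because it reproduces the 3-mm figure, the ring velocity is back-calculated from the 430-µm gradient length, and all function names are hypothetical.

```python
C0 = 299_792_458.0  # free-space velocity, m/s


def edge_length_m(velocity, rise_time):
    """Physical length occupied by an edge on a line."""
    return velocity * rise_time


def stub_round_trips(stub_len, stub_velocity, rise_time):
    """Number of stub round trips that fit within one rise time."""
    return edge_length_m(stub_velocity, rise_time) / (2.0 * stub_len)


def worst_case_coupled_v(v_swing, victim_len, edge_len):
    """Voltage gradient spanned by a victim at zero separation."""
    return v_swing * min(victim_len / edge_len, 1.0)


if __name__ == "__main__":
    t_rise = 20e-12

    # Section III-C: 400-um stub on a fast tap line (0.5c assumed).
    print(f"edge length on stub: {edge_length_m(0.5 * C0, t_rise) * 1e3:.2f} mm")
    print(f"round trips: {stub_round_trips(400e-6, 0.5 * C0, t_rise):.2f}")

    # Section IV-C: 2.5-V edge spread over 430 um of ring line.
    ring_velocity = 430e-6 / t_rise
    print(f"implied ring velocity: {ring_velocity / C0:.3f} c")
    print(f"coupled voltage: {worst_case_coupled_v(2.5, 60e-6, 430e-6):.3f} V")
```

The last figure comes out at about 0.35 V, in line with the 348 mV quoted in the text, and the proportional form makes the stated scaling with edge rate, line velocity, and supply voltage explicit.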


Fig. 16. Die photograph of a prototype chip.

Fig. 17. Measurement versus simulation waveforms for the large 965-MHz ring.

Fig. 18. Clock frequency versus V for the large ring and I versus V for the entire chip with all five rings.

Fig. 19. Measured output on one of the 3.42-GHz rings.

V. SOME EXPERIMENTAL RESULTS

Fig. 16 shows a die photograph of a prototype built using a 0.25-µm 2.5-V CMOS process with 1-µm Al/Cu top metal M5. The conductors are relatively wide in order to minimize resistive losses of the rather thin M5. The available top-metal area consumed by the transmission lines was 15%. A general feature of the RTWO and ROA is that power can be reduced by increasing the metal area devoted to clock generation. The simple substitution of copper metallization could halve the width of the lines for the same power consumption. The prototype features a large ring independent of four interconnected smaller rings. The 12 000-µm outer ring uses 60-µm conductors on a 120-µm pitch, with 128 62.5-µm/25-µm inverter pairs distributed along its length. For the large ring, simulations predicted a clock frequency of approximately 925 MHz. Measurements of the actual performance versus simulation (with V = … V) are shown in Fig. 17. The oscillation frequency was 965 MHz. Jitter was measured at 5.5 ps rms using a Tektronix 11801A oscilloscope with an SD-26 sampling head. The slower-than-simulated rise time is believed to be due to the large extrinsic gate electrode resistance on the Pch FETs. At design time, the importance of this parameter was overlooked. Transistors are now laid out according to RF design rules, with the gate driven from both sides of the device. Fig. 18 shows that the oscillation frequency versus V is quite flat over a large range. We calculate from the measured slope that PSRR is approximately 34 dB for oscillators fabricated on this process. The oscillator was seen to be functional down to 0.8-V supply voltage, although 1.1 V was required to initiate startup. The test chip incorporates 15 pF of on-chip decoupling capacitance per ring. No off-chip decoupling was required. Effectively, the equivalent of ten single-ended lines each having 10-Ω impedance were active, but simultaneous switching surges are low because of the distributed switching times of the inverters.

The quad of inner rings each have the following characteristics:

• Conductors: Width … µm
• Pitch … µm
• Ring Length … µm.

Total channel widths are 2000 µm for the Nch FET and 5000 µm for the Pch FET, spread over 40 pairs of inverters. Fig. 19 shows the measured waveform from one of the 3.4-GHz rings. The oscillation frequency is 3.38 GHz versus a simulated frequency of 3.42 GHz. However, the waveshape is disappointingly distorted, the amplitude is low, and even-mode artifacts are visible. Investigation of the fault identified a “co-parallel” (0°) inductive coupling problem between the clock signal lines and the supply traces running directly beneath on M3 for the complete loop length. Only when a complete FASTHENRY


analysis was performed including these power traces was it apparent that induced current loops (circulating through the decoupling capacitors) were strongly attenuating the rotary signal. In this condition, the latching action (Fig. 7) does not fully develop and the rings support linear amplification of noise signals, hence the problematic multimode action. (This effect was much less severe on the large 965-MHz ring because the lines were much closer to the magnetically neutral center line of the transmission line.) The problem can be mitigated by use of braided transmission lines (as detailed in Section IV-C). Analysis of the test chip showed that 90° coupling between M5 and the orthogonal thin M4 lines is not a significant problem, making it possible to route power and signals between regions bounded by the rotary clock structures.

VI. CONCLUSION AND FURTHER WORK PLANNED

This paper has described the rotary traveling-wave oscillator (RTWO) and its potential application to gigahertz-rate VLSI clocking. The oscillator is unique for a resonant-style LC-based oscillator in that it produces square waves directly and can be hardwired to form rotary oscillator arrays (ROAs). Being LC-based, the oscillator is stable and jitter is low. The formulas presented here give practical adiabatic oscillator designs suitable for VLSI fabrication. The structure and operation of the RTWO are fundamentally simple and amenable to analysis. We find that agreement between simulation and measurement is good. We need to demonstrate skew control (believed to be inherent) to fully establish that the simulated performance of multiring ROAs is realizable, and to measure susceptibility to induced high-frequency noise. Further work is planned to establish firm mathematical/analytical foundations for the prediction of both jitter and skew and to determine exact stability criteria for arrayed oscillators. Currently, a test chip using a braided transmission-line design to minimize coupling and incorporating varactors to control frequency is awaiting packaging and test. Looking to the future, our simulations predict that the oscillator scales well. On a more modern 0.18-µm copper process, 10.5-GHz square-wave oscillator/distributors should be realizable, consuming less than 32 mA per ring using slimmer 10-µm conductors. From simulation, the RTWO also appears to be viable on SOI processes.

ACKNOWLEDGMENT

The authors would like to thank P. Franzon and M. Steer, both of North Carolina State University, for their assistance, and the Raunds and British public library service.

REFERENCES

[1] E. G. Friedman, High Performance Clock Distribution Networks. Boston, MA: Kluwer, 1997.
[2] D. Harris, Skew Tolerant Circuit Design. San Mateo, CA: Morgan Kaufmann, 2000.
[3] G. A. Pratt and J. Nguyen, “Distributed synchronous clocking,” IEEE Trans. Parallel Distributed Syst., vol. 6, pp. 314–328, Mar. 1995.

[4] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, “Clock generation and distribution for the first IA-64 microprocessor,” IEEE J. Solid-State Circuits, vol. 35, pp. 1545–1552, Nov. 2000.
[5] C. J. Anderson et al., “Physical design of a fourth-generation POWER GHz microprocessor,” in ISSCC 2001 Dig. Tech. Papers, Feb. 2001, pp. 232–233.
[6] V. L. Chi, “Salphasic distribution of clock signals for synchronous systems,” IEEE Trans. Comput., vol. 43, pp. 597–602, May 1994.
[7] B. Kleveland et al., “Monolithic CMOS distributed amplifier and oscillator,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 70–71.
[8] W. Athas, N. Tzartzanis, L. J. Svensson, L. Peterson, H. Li, X. Jiang, P. Wang, and W.-C. Liu, “AC-1: A clock-powered microprocessor,” in Proc. Int. Symp. Low-Power Electronics and Design, Aug. 1997. [Online]. Available: http://www.isi.edu/acmos/people/nestoras/papers/97-08.MontereyAC1.ps
[9] J. Wood. PCT/GB00/00175. MultiGig Ltd. [Online]. Available: http://www.delphion.com/cgi-bin/viewpat.cmd/WO00044093A1
[10] B. Kleveland, T. H. Lee, and S. S. Wong, “50-GHz interconnect design in standard silicon technology,” presented at the IEEE MTT-S Int. Microwave Symp., Baltimore, MD, June 1998. [Online]. Available: http://smirc.stanford.edu/papers/mtts98p-bendik.pdf
[11] B. Kleveland, X. Qi, L. Madden, R. W. Dutton, and S. S. Wong, “Line inductance extraction and modeling in a real chip with power grid,” presented at the IEEE IEDM Conf., Washington, DC, Dec. 1999. [Online]. Available: http://gloworm.stanford.edu/tcad/pubs/device/iedm.pdf
[12] N. Delorme et al., “Inductance and capacitance analytic formulas for VLSI interconnect,” Electron. Lett., vol. 32, no. 11, May 23, 1996.
[13] C. S. Walker, Capacitance, Inductance and Crosstalk Analysis. Norwood, MA: Artech, 1990, p. 95.
[14] A. Deutsch et al., “Modeling and characterization of long on-chip interconnections for high-performance microprocessors,” IBM J. Res. Develop., vol. 39, no. 5, pp. 547–567, Sept. 1995, p. 549.
[15] J. B. Beyer et al., “MESFET distributed amplifier design guidelines,” IEEE Trans. Microwave Theory Tech., vol. MTT-32, pp. 268–275, Mar. 1984.
[16] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed. New York: McGraw-Hill, 1999, pp. 339–340.
[17] H. Larsson, “Distributed synchronous clocking using connected ring oscillators,” Master’s thesis, Computer Systems Engineering Centre for Computer System Architecture, Halmstad Univ., Halmstad, Sweden, Jan. 1997. [Online]. Available: http://www.hh.se/ide/ccaweb/publications/97/distclock/9705.ps
[18] L. Hall, M. Clements, W. Liu, and G. Bilbro, “Clock distribution using cooperative ring oscillators,” in Proc. IEEE 17th Conf. Advanced Research in VLSI (ARVLSI’97), 1997. [Online]. Available: http://www.computer.org/proceedings/arvlsi/7913/79130062abs.htm
[19] T. C. Edwards and M. B. Steer, Foundations of Interconnect and Microstrip Design. Chichester, U.K.: Wiley, 2000, ch. 6, sec. 6.11.
[20] C. P. Yue and S. S. Wong, “On-chip spiral inductors with patterned ground shields for Si-based RF ICs,” IEEE J. Solid-State Circuits, vol. 33, pp. 743–752, May 1998.
[21] C. P. Yue and S. S. Wong, “A study on substrate effects of silicon-based RF passive components,” in MTT-S Int. Microwave Symp. Dig., June 1999, pp. 1625–1628.
[22] M. E. Becker and T. F. Knight Jr., “Transmission line clock driver,” presented at the 1999 IEEE Int. Conf. Computer Design. [Online]. Available: http://www.computer.org/proceedings/iccd/0406/04060489abs.htm
[23] P. Zarkesh-Ha and J. D. Meindl, “Asymptotically zero power dissipation Gigahertz clock distribution networks,” IEEE Electrical Performance and Electronic Packaging, pp. 57–60, Oct. 1999.
[24] K. Bernstein, K. Carrig, C. M. Durham, and P. A. Hansen, High Speed CMOS Design Styles. Norwood, MA: Kluwer, 1998, p. 22.
[25] T. Soorapanth, C. P. Yue, D. Shaeffer, T. H. Lee, and S. S. Wong, “Analysis and optimization of accumulation-mode varactor for RF ICs,” presented at the Symp. VLSI Circuits, Honolulu, HI, June 11–13, 1998. [Online]. Available: http://smirc.stanford.edu/papers/VLSI98p-chet.pdf
[26] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, “A symmetric clock-distribution tree and optimized high speed interconnections for reduced clock skew in ULSI and WSI circuits,” in IEEE Int. Conf. Computer Design, Oct. 1986, pp. 118–122.
[27] M. Bußmann and U. Langmann, “Active compensation of interconnect losses for multi-GHz clock distribution networks,” IEEE Trans. Circuits and Syst. II, vol. 39, pp. 790–798, Nov. 1992.
[28] M. C. Papaefthymiou and K. H. Randall, “Edge-triggering vs. two-phase level-clocking,” presented at the 1993 Symp. Research on Integrated Systems, Mar. 1993. [Online]. Available: http://www.eecs.umich.edu/~marios/papers/sis93.ps

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

[29] L. Benini et al., “Clock skew optimization for peak current reduction,” J. VLSI Signal Processing, vol. 16, pp. 117–130, 1997.
[30] International Semiconductor Roadmap for Semiconductors (1999). [Online]. Available: http://public.itrs.net/files/1999_SIA_Roadmap/Design.pdf
[31] I. S. Kourtev and E. G. Friedman, Timing Optimization Through Clock Skew Scheduling. Boston, MA: Kluwer, 2000.
[32] MultiGig, Ltd. Rotary Explorer. [Online]. Available: http://www.multigig.com/software.htm
[33] M. Kamon, M. J. Tsuk, and J. K. White, “FASTHENRY: A multipole-accelerated 3-D inductance extraction program,” IEEE Trans. Microwave Theory Tech., vol. 42, no. 9, pp. 1750–1758, Sept. 1994.
[34] BSIM Research Group. (2000–2001) The BSIM4 Short-Channel Transistor Model. Univ. of California at Berkeley. [Online]. Available: http://www-device.eecs.berkeley.edu/~bsim3/bsim4.html

John Wood is the Engineering Director of MultiGig, Ltd., a U.K. technology startup specializing in multigigahertz circuit design I.P. Previously, he has worked as a consultant design engineer on multidomain design projects in mechanical, power electronics, infrared optics, and software development roles. He holds a number of patents which have been licensed for manufacture in the fields of infrared plastic welding and high-speed digital signaling. His technical interests include all areas of engineering design, but particularly electromagnetics, VLSI circuit design, and high-speed analog techniques.


Terence C. Edwards (M’89) received the M.Phil. degree in microwaves. He is the Executive Director of Engalco, a consultancy firm based in the U.K., mainly specializing in signal transmission technologies and the global RF and microwave industry. He researches and takes responsibility for regular releases of Microwaves North America, published 1995, 1998, and 2001. He has authored several publications (including papers published in the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES), has led management seminars on fiber optics, presented a paper on mobile technologies at the IMAPS Microelectronics Symposium, Philadelphia, PA, October 1997, and has written several articles and books. These include (jointly with Prof. Michael Steer) one recently on MICs (New York: Wiley) and on gigahertz and terahertz technologies (Norwood, MA: Artech, 2000). He is on the editorial advisory board for the International Journal of Communication Systems. He regularly consults for both national and overseas companies and is on the prestigious IEE (London) President’s List of Consultants. Mr. Edwards is a Fellow of the Institution of Electrical Engineers (IEE), U.K.

Steve Lipa (S’00) received the B.S. degree in electrical engineering from the University of Virginia, Charlottesville, in 1980, and the M.S. degree in electrical engineering from North Carolina State University, Raleigh, in 1993. He is currently working toward the Ph.D. degree in electrical engineering at North Carolina State University. He is currently a Research Assistant and Laboratory Manager with the Microelectronics Systems Laboratory at North Carolina State University. He has ten years of experience as an Integrated Circuit Design Engineer, primarily in the design of high-speed digital logic circuits. His current research is in the area of high-speed clock distribution.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999


A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit Technique (E-TSPC)

J. Navarro Soares, Jr., and W. A. M. Van Noije

Abstract—The implementation of a dual-modulus prescaler (divide by 128/129) using an extension of the true-single-phase-clock (TSPC) technique, the extended TSPC (E-TSPC), is presented. The E-TSPC [1], [2] consists of a set of composition rules for single-phase-clock circuits employing static, dynamic, latch, data-precharged, and NMOS-like CMOS blocks. The composition rules, as well as the CMOS blocks, are described and discussed. The experimental results of the complete dual-modulus prescaler, implemented in a 0.8-μm CMOS process, show a maximum operation rate of 1.59 GHz at 5 V with 12.8 mW power consumption. They are compared with the results from other recent implementations, showing that the proposed E-TSPC circuit can reach high speed with both smaller area and lower power consumption.

Index Terms—CMOS digital, high-speed circuits, prescalers, single-phase-clock design.

I. INTRODUCTION

FOR MORE than 15 years, CMOS has been the main technology for very-large-scale integration (VLSI) system design. Over that period, several CMOS clock policies have been proposed. The pseudo-two-phase logic was one of the earliest techniques [3]. Later on, two-phase logic structures were proposed. The domino technique [4] successfully combined two-phase and dynamic CMOS circuits. With the NORA technique [5], [6], an extensive no-race approach for two-phase and dynamic circuits was developed. A single-phase-clock policy, the true single-phase clock (TSPC), was introduced in [7]. This technique was subsequently advanced by [8]–[10]. Single-phase-clock policies are superior to the others because they simplify the clock distribution. They reduce the wiring costs and the number of clock-signal requirements (no problems with phase overlapping, for instance). Consequently, higher frequencies and simpler designs can be achieved.

Introduced by [1] and [2], the extended true-single-phase-clock CMOS circuit technique (E-TSPC), an extension of the TSPC, consists of composition rules for single-phase circuits using static, dynamic, latch, data-precharged, and NMOS-like blocks. The composition rules enlarge the block-connection possibilities and avoid races; additionally, NMOS-like blocks enhance the technique for high-speed operations.

The design of a dual-modulus prescaler (divide by 128/129) with the E-TSPC in a standard 0.8-μm CMOS process (0.7-μm effective channel length) is presented. The purpose of the prescaler implementation is to evaluate the potential of the E-TSPC technique. This paper is organized as follows. In Section II, the principal features of the E-TSPC technique, blocks and design rules, are presented. In Section III, some different dual-modulus implementations are analyzed. Experimental results and comparisons are reported in Section IV, and the principal conclusions are drawn in Section V.

Manuscript received February 16, 1998; revised May 25, 1998. This work was supported in part by FAPESP and CNPq, Brazil. The authors are with the LSI/PEE, Escola Politécnica, University of São Paulo, São Paulo, S.P. 05508-900 Brazil (e-mail: [email protected]; [email protected]). Publisher Item Identifier S 0018-9200(99)00410-2.

II. E-TSPC CIRCUIT BLOCKS AND COMPOSITION RULES

A. Basic CMOS Blocks

An E-TSPC circuit may use any of the following blocks: CMOS static block, n-dynamic block [Fig. 1(a)], p-dynamic block [Fig. 1(c)], n-latch block [Fig. 1(e)], p-latch block [Fig. 1(g)], and high (PH) and low (PL) data-precharged blocks (Fig. 2). In Fig. 1, the clocked transistors of the n- and p-latches are placed close to the power rail, following the suggestion of [11]. This configuration can attain a higher speed but suffers charge-sharing problems. Clocked transistors close to either the power rail or the block output are admissible latch configurations.

In data-precharged blocks [10], some input signals, called precharging inputs or pc-inputs, control the output precharge (see Fig. 2). If all PH block pc-inputs are high, or if all PL block pc-inputs are low, then the PH or PL block is precharged. In this case, the PH block output goes to low, and the PL block output to high. In Fig. 2, a CMOS static block is drawn along with all equivalent PH and PL blocks [Fig. 2(b) and (c)]. The pc-inputs of each block are also indicated. The PH and PL blocks that have the output precharged when the clock is low will be called n-Dp blocks; similarly, the PH and PL blocks that have the output precharged when the clock is high will be called p-Dp blocks.

B. Composition Rules

First, the definition of data chains, fundamental to the design rules, is given.
Definition: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, one n-dynamic, or one n-Dp block;
2) starting in a circuit external input, or in the output of a p-latch, p-dynamic, or p-Dp block; when this output is followed by static blocks in the normal data flow, the data chain starts in the output of the last static block;

0018–9200/99$10.00  1999 IEEE


Fig. 1. Construction blocks of the E-TSPC circuit technique: (a) n-dynamic and (b) NMOS-like n-dynamic blocks; (c) p-dynamic and (d) NMOS-like p-dynamic blocks; (e) n-latch and (f) NMOS-like n-latch blocks; and (g) p-latch and (h) NMOS-like p-latch blocks.

Fig. 2. Transformation from (a) a static block into data-precharged blocks: (b) PH blocks and (c) PL blocks.

3) going through static, n-dynamic, n-Dp, or n-latch blocks;
4) regardless of the number and ordering of the blocks defined above;
5) finishing in a circuit external output, or in the input of the first p-latch, p-dynamic, or p-Dp block.

For the p-data chains, an equivalent definition applies, replacing n with p and vice versa. When clock is high, n-data chains are in evaluation phase; otherwise, they are in holding phase. P-data chains evaluate when clock is low.


Fig. 3. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.

In Fig. 3, part of a circuit schematic is depicted with seven complete n-data chains. Some examples are the data chain and going through blocks , , , starting at input ; the data chain starting at and going through , and , , , and ; and the data chain starting at and , , , and . going through Five of the six E-TSPC composition rules are now listed. Their purpose is to ensure the observance of some constraints during the evaluation and holding phases. To simplify the rule will be used to denote n or p in statements, the symbol nouns like -data chain, -dynamic block, etc. Composition Rule ( ): The -data chain input should be an input of a dynamic block, an input of a latch, or a nonpcinput of a Dp block. Composition Rule ( ): A -latch must not drive, directly or through static blocks, a -dynamic or a -Dp block. Composition Rule ( ): The number of inversions between: ) any two adjacent dynamic blocks must be odd1; ) any two adjacent Dp-blocks of the same type (PH and PH or PL and PL) must be odd; ) any two adjacent Dp-blocks of complementary types must be even; ) a PH (PL) block and an adjacent n- (p)-dynamic (or vice versa) in an n- (p)-data chain must be even; ) a PL (PH) block and an adjacent n- (p)-dynamic (or vice versa) in an n- (p)-data chain must be odd. (Two blocks are called adjacent if there are only static blocks between them.) Composition Rule ( ): Consider the last dynamic block in the -data chain (when it exists). The number of inversions (due to any block) from this dynamic block up to at least one -latch must be even. 1 Through

all the rules, zero inversion will be considered even.

Fig. 4. Two TSPC D-flip-flops connected in series.

Composition Rule ( ): The -data chain must have one of the following two configurations: ) at least one dynamic block and one latch; ) at least two latches and an even number of inversions (latches or static blocks) between them. It is worth noting that these five composition rules are very similar to the five rules proposed in the NORA technique [6]. In a circuit where all data chains obey the five rules, it can be proved that (six theorems presented in [1] and [2]): a) all data-precharged blocks are precharged during the holding phase of the data chains to which they belong; b) the dynamic and the data-precharged blocks are not incorrectly discharged during the evaluation phase; c) the output of the data-chain last latch is steady during the holding phase of the data chain. C. Exception Rule Although the above-described rules are necessary to avoid race problems, typical TSPC systems do not follow some of them. The most common exception is found in connecting two D-flip-flops (D-FF’s), as shown in Fig. 4. In such a

configuration, each p-data chain consists of only one p-latch block, which violates the last of the five composition rules. In consequence, the p-latch output may change during its holding time. A faulty sequence is described below. Consider an initial state in which the signals clock, input, and output a are low, and both p-latch blocks are evaluating. At the end of the evaluation period, the outputs of both p-latches are high. Subsequently, when the clock goes high, the other blocks will evaluate. Suppose that the first p-latch works properly, holding its former value (high). In this case, the node that follows it goes low. As a result, the output a goes high, a transistor of the second p-latch is cut off, and the final value of the second p-latch output will depend on the circuit delays. Commonly, the delay between the nodes involved is long enough to ensure that the affected node is fully discharged through its two series transistors; in this case, the second D-FF works properly.

A simple exception rule is added to cover the utilization of the well-established TSPC D-FF's (Fig. 4).

Exception Rule: Configurations similar to that of Fig. 4, where two of the composition rules are not obeyed, are accepted if enough delay exists.

The data chains where the exception rule is applied, to the detriment of those two composition rules, do not have a latch with steady output at the holding phase. Since the correct operation of the circuit will depend on the block delays, the exception rule should be used with caution.

Considering the connection rules presented in former works [7]–[10], our six proposed rules differ in the following aspects.
a) The “nonlatched domino logic,” a timing strategy considered in [10], is not accepted in our proposal.
b) The proposed rules permit a more flexible usage of both data-precharged blocks, due to the distinction between pc- and nonpc-inputs, and static logic blocks (static logic is allowed between dynamic and latch blocks). In Fig. 2, where no rule violations occur, several connections not allowed by former work rules are provided, for instance, between several pairs of the named blocks.

D. NMOS-Like Logic Extension

When high speed is also a requirement, restrictions on the use of p-dynamic and p-latch blocks should be imposed. These blocks have at least two p-transistors in series, which may reduce considerably the maximum speed. In such applications, the p-data chains are limited to one block, and most logic operations are handled with n-data chains with limited logic depth. Thus, deep pipelines will be necessary to implement complex and fast logic designs.

NMOS-like dynamic and latch blocks can be used to minimize this difficulty and also to increase the n-data chain speed. They are ratioed logic blocks, where the n-transistor section and the p-transistor section may conduct simultaneously. A similar technique was used in [12], but restricted to D-FF's. In Fig. 1, the NMOS-like versions of the dynamic and latch blocks are drawn. To assure a correct operation, these blocks should satisfy the constraints summarized in Table I. The transistor section that must impose the output value, when both sections are conducting, is drawn with bold lines in Fig. 1.

The NMOS-like blocks are faster due to the reduced number of transistors in series, but, unfortunately, they consume more power. In consequence, they should be used only in critical data chains, where the desirable speed has not been reached. Since the connection characteristics do not depend on whether a block is conventional or NMOS-like, the composition rules and the exception rule are valid and necessary for both; as a result, NMOS-like blocks and conventional blocks can replace one another, and the judicious selection of NMOS-like blocks is made easy.

Summarizing, the static blocks, the n/p-dynamic, the n/p-latch, the PH/PL data-precharged, and the NMOS-like blocks, together with the composition rules and the exception rule, compose the E-TSPC technique.

TABLE I
CONDITIONS FOR CORRECT OPERATION OF THE NMOS-LIKE BLOCKS

Fig. 5. Schematic of the dual-modulus prescaler (divide by 128/129).
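The inversion-parity composition rule of Section II-B (odd or even number of inversions between adjacent dynamic and data-precharged blocks) lends itself to a mechanical check. The following is only a minimal sketch of such a check, not the authors' tooling: the names `Kind`, `required_parity`, and `parity_ok` are hypothetical, and the block types and the "n"/"p" chain labels are an assumed encoding of the rule as it is worded in the text.

```python
from enum import Enum


class Kind(Enum):
    """Block types that the inversion-parity rule talks about (assumed encoding)."""
    DYNAMIC = "dynamic"
    PH = "PH"   # high data-precharged block
    PL = "PL"   # low data-precharged block


def required_parity(a: Kind, b: Kind, chain: str) -> str:
    """Return 'odd' or 'even' for two adjacent blocks in an 'n' or 'p' data chain.

    Two blocks are adjacent when only static blocks lie between them.
    """
    if chain not in ("n", "p"):
        raise ValueError("chain must be 'n' or 'p'")
    if a is Kind.DYNAMIC and b is Kind.DYNAMIC:
        return "odd"                       # two adjacent dynamic blocks
    if a is not Kind.DYNAMIC and b is not Kind.DYNAMIC:
        return "odd" if a is b else "even"  # same-type Dp: odd, complementary: even
    # One Dp block and one dynamic block of the chain's own type.
    dp = a if a is not Kind.DYNAMIC else b
    even_partner = Kind.PH if chain == "n" else Kind.PL
    return "even" if dp is even_partner else "odd"


def parity_ok(a: Kind, b: Kind, chain: str, inversions: int) -> bool:
    """Check a counted number of inversions; zero inversions count as even."""
    actual = "even" if inversions % 2 == 0 else "odd"
    return actual == required_parity(a, b, chain)


if __name__ == "__main__":
    print(parity_ok(Kind.DYNAMIC, Kind.DYNAMIC, "n", 1))
    print(parity_ok(Kind.PH, Kind.PL, "n", 0))
```

A netlist checker would still need to extract the adjacent block pairs and count the inversions between them; the sketch covers only the parity lookup itself.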


Fig. 6. Transistor schematic of the divide-by-4/5 counter DG4. The transistor width or, when the length is different from 0.8 μm, the transistor width/length, in μm, is also indicated in the figure.

TABLE II
MAXIMUM SPEED AND POWER-CONSUMPTION RESULTS FOR THE FOUR DESIGNED DIVIDE-BY-4/5 COUNTERS (SPICE SIMULATIONS, SLOW PARAMETERS, AND VDD = 5 V)

Design    Speed (GHz)    Power (μW/MHz)
DG1       0.98           3.27
DG2       1.28           4.45
DG3       1.39           4.85
DG4       1.67           5.62
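The relative improvements quoted in the discussion of Table II can be reproduced directly from the tabulated values. The short sketch below is an illustration only; the helper names are hypothetical, and the design pairs are the ones whose ratios match the percentages given in the text (70% and 20% for speed, 72% for power).

```python
# Table II values as printed: design -> (maximum speed in GHz, power per MHz).
TABLE_II = {
    "DG1": (0.98, 3.27),
    "DG2": (1.28, 4.45),
    "DG3": (1.39, 4.85),
    "DG4": (1.67, 5.62),
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def speed_gain(design_a: str, design_b: str) -> float:
    """Speed improvement of design_b over design_a, in percent."""
    return percent_change(TABLE_II[design_a][0], TABLE_II[design_b][0])


def power_increase(design_a: str, design_b: str) -> float:
    """Increase in power per MHz of design_b over design_a, in percent."""
    return percent_change(TABLE_II[design_a][1], TABLE_II[design_b][1])


if __name__ == "__main__":
    print(f"speed DG1 -> DG4: {speed_gain('DG1', 'DG4'):.1f} %")
    print(f"speed DG3 -> DG4: {speed_gain('DG3', 'DG4'):.1f} %")
    print(f"power DG1 -> DG4: {power_increase('DG1', 'DG4'):.1f} %")
```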

III. DUAL-MODULUS DESIGN

Dual-modulus prescalers, circuits with applications in frequency synthesis systems, have been frequently used to compare different high-speed implementations [12], [13], which is our current goal. A high-speed dual-modulus prescaler (divide by 128/129) was designed using a standard 0.8-μm CMOS bulk process. The schematic of the dual-modulus prescaler is depicted in Fig. 5. The circuit inside the cross-hatched box, composed of three D-FF's and two logic gates, forms a divide-by-4/5 counter. The div32 signal selects if it counts up to four (div32 high) or up to five (div32 low). The five D-FF's at the bottom of the figure form a divide-by-32 counter. The division ratio of the prescaler, 128 or 129, is selected by a control signal.

Four different approaches were applied to draw a layout of the divide-by-4/5 counter, which is the critical high-speed part of the prescaler. The approaches are:
DG1) design with conventional rise edge-triggered TSPC D-FF (Fig. 4);
DG2) design with rise edge-triggered D-FF, and further optimization applying the E-TSPC technique;
DG3) design with a modified fall edge-triggered D-FF [12];
DG4) design with fall edge-triggered D-FF, and further optimization applying the E-TSPC technique.

In Fig. 6, the transistor schematic of the DG4 approach, with transistor dimensions, is depicted. The three cross-hatched boxes mark the D-FF's; the first D-FF (left) has a buffered output.
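The way a divide-by-4/5 counter followed by a divide-by-32 counter yields the two moduli can be checked with simple counting. The sketch below is a behavioral illustration, not a model of the actual circuit timing: it assumes that, in divide-by-129 mode, the 4/5 counter counts to five exactly once per output period and to four in the remaining 31 slots. The function name and the `divide_by_129` parameter are hypothetical.

```python
def input_cycles_per_output_period(divide_by_129: bool, second_stage: int = 32) -> int:
    """Count input clock cycles consumed during one full prescaler output period.

    The first stage divides by 4 or 5; the second stage divides its output by
    `second_stage`. In divide-by-129 mode one extra input cycle is swallowed
    once per output period (assumed here to happen in slot 0).
    """
    total = 0
    for slot in range(second_stage):
        counts_to_five = divide_by_129 and slot == 0
        total += 5 if counts_to_five else 4
    return total


if __name__ == "__main__":
    print(input_cycles_per_output_period(False))  # 32 * 4
    print(input_cycles_per_output_period(True))   # 31 * 4 + 5
```

The arithmetic is 32 × 4 = 128 in one mode and 31 × 4 + 5 = 129 in the other, which is why only the first stage has to run at the full input frequency.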

Fig. 7. Photograph of the prescaler test chip.

The maximum speed and the power consumption for each design are shown in Table II. These results were obtained with SPICE simulations from the extracted netlists of the layouts for slow parameters, room temperature, and power supply at 5 V. The comparison of the results exhibits some advantages of the E-TSPC technique. From the DG1 to the DG4 approach, the speed improvement is higher than 70%, and from DG3 to DG4 it is 20%. On the other hand, the power consumption increases 72% from DG1 to DG4. As DG4 uses only NMOS-like blocks, the latter result is not surprising, and confirms that these blocks should be restricted to critical circuit parts. Since the composition rules favor the replacement of conventional blocks with NMOS-like ones and vice versa, E-TSPC circuits can reach high speed and keep the power consumption low.

To better evaluate the above results, the following notes should be taken into account:
• all approaches use small transistor sizes, usually minimum sizes (as indicated in Fig. 6);
• the divide-by-4/5 counter schematic of Fig. 5 was slightly modified for each design to conform with its structure characteristics;
• the NOR configuration of Fig. 6 is similar to an NMOS logic, but the load is now a PMOS transistor. It is faster than the CMOS static NOR and is used in three of the approaches; and
• two blocks of Fig. 6 drive the clock signal to the divide-by-32 counter. All four designs have similar configuration.

TABLE III
AREA, SPEED, AND POWER-CONSUMPTION RESULTS FOR FOUR DIFFERENT PRESCALERS

Fig. 8. Measured results for the prescaler maximum frequency (fmax), left axis, and current consumption at fmax, right axis (o), as a function of the power supply.

IV. EXPERIMENTAL RESULTS

The full prescaler circuit, occupying a 0.0126-mm² area, was formed with the DG4 counter. The D-FF's of the divide-by-32 asynchronous counter were built with conventional rise edge-triggered TSPC D-FF's (Fig. 4). The clock signal from the divide-by-4/5 counter, Fig. 6, is inverted before being sent to the divide-by-32 counter. This expedient allows a longer time interval for preparation of the signal div32.

The prescaler test chip, whose photograph is shown in Fig. 7, was mounted on an alumina substrate with the chip-on-board technique. A coplanar radio-frequency probe was used to feed the only high-speed signal of the prescaler, the clock input. In Fig. 8, the measured maximum frequency and current consumption as a function of the power supply are shown. Since the pulse generator used has a maximum excursion of 3 V, the real maximum frequencies of the circuit are expected to be slightly higher than the measured results for power supplies above 3 V.

Performance results of this work, of two recently published prescalers using TSPC D-FF's, and of a new prescaler architecture are summarized in Table III. In [13], the prescaler is implemented with rise edge-triggered TSPC D-FF's, which were size optimized to reach maximum speed; in consequence, not only the circuit speed but also the area and power consumption are high. Fall edge-triggered TSPC D-FF's with small-sized transistors and with some NMOS-like blocks are used in [12]. The resulting circuit has a small area and a low power consumption but a reduced maximum operation rate. Our implementation, with the E-TSPC technique and small-sized transistors, provides the smallest area and the lowest power consumption; the speed, in addition, is comparable to [13] and [14].

V. CONCLUSIONS

A complete high-speed dual-modulus prescaler (divide by 128/129) was developed in a 0.8-μm CMOS process. The measured circuit attained 1.59 GHz and 8.0 μW/MHz power consumption with 5 V power supply. It compares favorably with other implementations in terms of area and power consumption; in terms of speed, it matches the fastest TSPC prescaler. The studies done during the design reveal that, to take full advantage of the TSPC technique, every possible configuration should be considered. The E-TSPC, being an extension of TSPC, permits exploring a larger number of solutions and, in consequence, finding the best configuration. The dual-modulus prescaler results exhibit some significant improvements produced by the E-TSPC.

REFERENCES

[1] J. Navarro and W. Van Noije, “E-TSPC: Extended true single-phase clock CMOS circuit technique,” in VLSI: Integrated Systems on Silicon, IFIP International Conference on VLSI, R. Reis and L. Claesen, Eds. London, U.K.: Chapman & Hall, 1997, pp. 165–176.
[2] J. Navarro and W. Van Noije, “E-TSPC: Extended true single-phase-clock CMOS circuit technique for high speed applications,” SBMICRO, J. Solid-State Devices Circuits, vol. 5, pp. 21–26, July 1997.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, 1st ed. Reading, MA: Addison-Wesley, 1985.
[4] R. H. Krambeck, C. M. Lee, and H.-F. S. Law, “High-speed compact circuits with CMOS,” IEEE J. Solid-State Circuits, vol. SC-17, pp. 614–619, June 1982.
[5] N. F. Gonçalves and H. J. De Man, “NORA: A racefree dynamic CMOS technique for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. SC-18, pp. 261–266, June 1983.
[6] N. F. Gonçalves, “NORA: A racefree CMOS technique for register transfer systems,” Ph.D. dissertation, Katholieke Universiteit Leuven, Leuven, Belgium, 1984.
[7] Y. Ji-ren, I. Karlsson, and C. Svensson, “A true single-phase-clock dynamic CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. SC-22, pp. 899–901, Oct. 1987.
[8] J. Yuan and C. Svensson, “High speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, pp. 62–70, Feb. 1989.
[9] M. Afghahi and C. Svensson, “A unified single-phase clocking schema for VLSI systems,” IEEE J. Solid-State Circuits, vol. 25, pp. 225–235, Feb. 1990.
[10] P. Larsson, “Skew safety and logic flexibility in a true single phase clocked system,” in Proc. IEEE ISCAS, Seattle, WA, May 1995, pp. 941–944.
[11] Q. Huang, “Speed optimization of edge-triggered nine-transistor D-flip-flop for gigahertz single-phase clocks,” in Proc. IEEE ISCAS, Chicago, IL, May 1993, pp. 2118–2121.
[12] B. Chang, J. Park, and W. Kim, “A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flops,” IEEE J. Solid-State Circuits, vol. 31, pp. 749–752, May 1996.
[13] Q. Huang and R. Rogenmoser, “Speed optimization of edge-triggered CMOS circuits for gigahertz single-phase clocks,” IEEE J. Solid-State Circuits, vol. 31, pp. 456–465, Mar. 1996.
[14] J. Craninckx and M. S. J. Steyaert, “A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-μm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

A CMOS Monolithic ΔΣ-Controlled Fractional-N Frequency Synthesizer for DCS-1800

Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

16

Abstract—A monolithic 1.8-GHz -controlled fractionalphase-locked loop (PLL) frequency synthesizer is implemented in a standard 0.25- m CMOS technology. The monolithic fourth-order type-II PLL integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz 2 mm2 . To investigate dual-path loop filter on a die of only 2 the influence of the modulator on the synthesizer’s spectral purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in fractional- synthesizers. The design of the zero-dead zone PFD and the dual charge pump is optimized toward linearity and spurious suppression. The frequency synthesizer consumes 35 mA from a single 2-V power supply. The measured phase noise is as low as 120 dBc/Hz at 600 kHz and 139 dBc/Hz at 3 MHz. The measured fractional spur level is less than 100 dBc, even for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity constraints.


Index Terms—CMOS RF integrated circuits, ΔΣ modulator, fractional-N frequency synthesis, phase-locked loop, phase noise.

I. INTRODUCTION

THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic CMOS integration of RF circuits by both academia and industry [1]–[3]. The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the full integration of a transceiver front end in CMOS, including a low-IF receiver and a direct upconversion transmitter [1].

To achieve a high degree of integratability and fast settling under low-noise constraints, a ΔΣ fractional-N synthesizer topology has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the classic phase-locked loop (PLL) synthesizer by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened and noise shaped by the ΔΣ action and ultimately filtered by the loop filter. To prevent degradation of the spectral purity by digital noise coupling, the ΔΣ modulator is scheduled for integration on the digital baseband signal processing IC of the full transceiver system.

The paper describes the design of a monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer. In Section II, the influence of ΔΣ noise on PLL bandwidth requirements is theoretically analyzed for multistage noise shaping (MASH) and multibit single-loop ΔΣ modulators. Next, a fast nonlinear analysis method is presented, which predicts possible degradation of the PLL spectral purity by in-band noise leakage and re-emerging of spurious tones. The nonlinearities in the phase-frequency detector (PFD) charge pumps are identified as the main trouble spots. The fourth-order type-II PLL building-block design is discussed in Section IV, focusing on integrated filter and voltage-controlled oscillator (VCO) design and on the realization of a linear phase error-to-charge-pump current conversion. In Section V, the experimental results of the ΔΣ fractional-N synthesizer prototype are presented and compared to the simulations, showing good correspondence.

Manuscript received November 5, 2001; revised January 31, 2002. The authors are with the Katholieke Universiteit Leuven, Department Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(02)05856-0.

Fig. 1. Principle of ΔΣ fractional-N synthesis.

II. THE ΔΣ FRACTIONAL-N SYNTHESIZER

A. Introduction

A block diagram of a ΔΣ fractional-N synthesizer is shown in Fig. 1. The ΔΣ modulator output controls the instantaneous division modulus of the prescaler, such that the mean division modulus is $N + K/2^k$, with $k$ the number of bits of the ΔΣ modulator and $K$ the input word. The corresponding phase changes at the prescaler output are quantized, leading to possible spurious tones and quantization noise. By selecting higher order ΔΣ modulators, the spurious energy is whitened and shaped to high-frequency noise, which can be removed by the low-pass loop filter. As a result, for a given frequency resolution, an arbitrarily high $f_{ref}$ can be chosen, by assigning the proper number

0018-9200/02$17.00 © 2002 IEEE


Fig. 2. Third-order multibit single-loop ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability reasons.

of bits to the ΔΣ modulator. The loop bandwidth is not restricted by the reference spur suppression, resulting in faster settling and higher integratability. Additionally, the division modulus is decreased by a factor $2^k$ (with $k$ the minimum number of bits for the frequency resolution, i.e., 7.02 in this case), so that noise of the PLL blocks, except for the VCO, is less amplified.

B. The ΔΣ Modulators

The influence of both third-order MASH and multibit single-loop ΔΣ modulators on the spectral purity of the fractional-N synthesizer is investigated. Since the order of the integrated PLL loop filter is three, the order of the ΔΣ modulators must also be three or higher to ensure that ΔΣ noise has at least a 20-dB/dec rolloff at intermediate offset frequencies, causing no degradation of the output phase noise. Both modulators have an internal accuracy of 16 bit, and 1-LSB dithering is applied to further randomize any spurious energy. The dithering sequence is third-order noise shaped to avoid an increased noise floor.

The MASH or cascade 1-1-1 ΔΣ modulator is chosen because it is easy to integrate in CMOS and is unconditionally stable. The noise transfer function (NTF) of the MASH modulator is $(1 - z^{-1})^3$ and contains three poles at the origin of the $z$ plane. The result is harsh LF noise shaping and substantial HF noise. In the time domain, this is reflected in the intensive prescaler modulus switching. To synthesize a frequency of $67.92 f_{ref}$, all moduli between 64 and 71 are employed.

The multibit single-loop ΔΣ modulator is shown in Fig. 2. For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid overlap of the intended input operating range and the unstable input regions. The NTF of the presented modulator is given in (1) and contains only one pole at the origin of the $z$ plane and two low-Q Butterworth poles at …, with a passband gain of 3.2.

… (1)

Although the single-loop ΔΣ modulator is more complex than the MASH modulator, it offers a higher flexibility in terms of noise shaping. The HF quantization noise of the modulator can be spread out by proper pole positioning. As a result, the prescaler modulus switching is less intense. Only the moduli between 66 and 69 are needed to synthesize $67.92 f_{ref}$. The reduced HF switching has advantageous effects on noise

Fig. 3. Maximum PLL bandwidth $f_c$ versus the reference frequency and different modulator orders, for the type-II fourth-order PLL. The dashed curve is for the third-order single-loop ΔΣ modulator. The targeted phase-noise specification is −136 dBc/Hz at 3 MHz for DCS-1800.

coupling and sensitivity to PLL nonlinearities, as will be discussed in Section III.

C. Theoretical Analysis

To theoretically model the impact of ΔΣ control on the spectral purity of the synthesizer, a linear time-invariant (LTI) PLL model is employed, with the ΔΣ quantization noise as an additive noise source at the prescaler output. The prescaler with ΔΣ control can be looked upon as a digital-to-phase (D/P) converter. Every reference cycle, the prescaler subtracts an integer multiple of $2\pi$ rad from its input signal, with the multiple determined by the ΔΣ modulator output. The resulting quantization noise on the division modulus, and thus on the output phase, is approximated by uniformly distributed white noise [5]. The quantization noise power is $\Delta^2/12$ for both modulators, with the step $\Delta$ set by the modulus range and the number of significant output bits. The phase-noise contribution of the ΔΣ modulator at the output of the synthesizer is found in (2) [6], which includes the closed-loop transfer function of the fourth-order type-II PLL.

… (2)

Since the main advantage of ΔΣ fractional-N synthesizers is the decoupling of the reference frequency $f_{ref}$ and the PLL bandwidth $f_c$, the influence of the ΔΣ noise on the bandwidth

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800


requirement is examined. To comply with the most stringent DCS-1800 phase-noise specification, i.e., −133 dBc/Hz at 3 MHz offset [7], the target ΔΣ phase noise at 3 MHz is −136 dBc/Hz. In Fig. 3, the maximum PLL bandwidth $f_c$ is plotted versus the reference frequency for different MASH modulator orders. The dashed line is the maximum bandwidth for the single-loop multibit ΔΣ modulator of Section II-B. For a reference frequency of 26 MHz, not much is gained from increasing the modulator order. For a high bandwidth and thus a fast PLL, the reference frequency and/or the modulator order should be increased, leading to an increased power consumption and circuit complexity. The maximum bandwidth is 87 kHz for the third-order MASH modulator and 62 kHz for the single-loop multibit modulator.

Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, which determines the rms phase error of the PLL, is of importance. To be sure that the ΔΣ noise does not corrupt the rms phase error, the dynamic range of the ΔΣ modulator must be higher than the dynamic range of the PLL [8]. The integrated in-band frequency noise is determined by the noise bandwidth of the PLL and the in-band phase noise in dBc/Hz. The noise bandwidth of the presented PLL is …. The maximum bandwidth of the PLL is calculated in (3) [8].

… (3)

The maximum PLL bandwidth $f_c$ is plotted versus the reference frequency of the PLL for different MASH modulator orders in Fig. 4. For the single-loop multibit ΔΣ modulators (dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles. In the case of a third-order modulator, a 1.5° rms phase error (to ensure at least an overall rms phase error of 2°), and an $f_{ref}$ of 26 MHz, the maximum bandwidth is 810 and 614 kHz, respectively. Obviously, the constraint posed on the ΔΣ modulator noise by the in-band noise contributions is much less severe than the constraint due to the out-of-band phase noise at 3 MHz.

Fig. 4. Maximum PLL bandwidth $f_c$ versus the reference frequency and different modulator orders for an rms phase error below 1.5°. The dashed curve is for the third-order single-loop ΔΣ modulator.

III. FAST NONLINEAR ANALYSIS METHOD

The theoretical analysis suggested that applying ΔΣ control to the prescaler would not cause any problems for the spectral purity of the PLL. Practice, however, proves this wrong. A fast nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis method is sufficiently fast to sweep simulations over different degrees of nonlinearity and operating points, and is capable of performing sufficiently long transient simulations to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete time and in open loop under locked conditions to avoid drift of the phase error. To further speed up the simulation, the building blocks are represented by high-level models with parameters to model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].

To find the phase error generated by the ΔΣ modulation of the division modulus, the variation of the number of RF pulses at the output of the divider is monitored. Every reference cycle, the number of RF pulses at the divider output is determined by the number of pulses swallowed by the ΔΣ control:

… (4)

The resulting quantized phase changes are compared with the phase that would be expected if the loop were in lock, i.e., the phase corresponding to the fractional part of the division modulus. The result is the instantaneous accumulated phase error $\Delta\phi$:

… (5)

The phase error is converted to current pulses, $I_{CP}$, in the charge pump. The $\Delta\phi$–$I_{CP}$ (phase-error charge-pump current) conversion is modeled to contain any PFD nonlinearity. Mismatch in the up and down current sources, resulting in gain mismatch for positive and negative phase errors, is modeled by a gain-mismatch parameter. The occurrence of a dead zone is modeled by a dead-zone parameter.

… (6)

By taking an FFT of the current pulses, the current noise spectrum is obtained.
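The simulation flow described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' Matlab code: equations (4) to (6) are not reproduced here, so the accumulated phase error and the charge-pump model below are one plausible discrete-time form. The function names, the interpretation of the dead zone as a fraction of $2\pi$, the absence of dithering, and the choice of a plain accumulator-based MASH 1-1-1 are all assumptions made for illustration.

```python
import numpy as np

def mash111(frac_word, n_bits, n_samples):
    """Third-order MASH 1-1-1 built from three overflowing accumulators (no dither)."""
    mask = (1 << n_bits) - 1
    a1 = a2 = a3 = 0
    c2_d = c3_d = c3_dd = 0  # delayed carry bits of stages 2 and 3
    y = np.empty(n_samples, dtype=np.int64)
    for i in range(n_samples):
        a1 += frac_word
        c1, a1 = a1 >> n_bits, a1 & mask
        a2 += a1
        c2, a2 = a2 >> n_bits, a2 & mask
        a3 += a2
        c3, a3 = a3 >> n_bits, a3 & mask
        # carry recombination: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        y[i] = c1 + (c2 - c2_d) + (c3 - 2 * c3_d + c3_dd)
        c2_d, c3_dd, c3_d = c2, c3_d, c3
    return y

def accumulated_phase_error(y, frac, n_mean):
    """Phase error in rad at the divider output, accumulated per reference cycle."""
    return 2 * np.pi * np.cumsum(y - frac) / n_mean

def charge_pump(dphi, gain_mismatch=0.02, dead_zone=0.001):
    """Assumed nonlinear phase-to-current conversion, normalized to the nominal current."""
    i_cp = np.where(dphi > 0, (1 + gain_mismatch) * dphi, dphi)
    i_cp = np.where(np.abs(dphi) < dead_zone * 2 * np.pi, 0.0, i_cp)
    return i_cp / (2 * np.pi)

def spectrum_db(x):
    """Hann-windowed FFT magnitude in dB (relative units)."""
    w = np.hanning(len(x))
    mag = np.abs(np.fft.rfft(x * w)) / w.sum()
    return 20 * np.log10(mag + 1e-20)

N_BITS, N_INT = 16, 67
K = round(0.92 * 2**N_BITS)          # input word for a fractional part of 0.92
y = mash111(K, N_BITS, 1 << 14)      # modulus offsets around the integer part
frac = K / 2**N_BITS
dphi = accumulated_phase_error(y, frac, N_INT + frac)
ideal_db = spectrum_db(charge_pump(dphi, 0.0, 0.0))
nonlinear_db = spectrum_db(charge_pump(dphi))
```

Plotting `ideal_db` against `nonlinear_db` on a logarithmic frequency axis is the natural next step; with the assumed 2% mismatch and 0.1% dead zone, any rise of the low-frequency floor in the nonlinear case would illustrate the leakage mechanism discussed in this section.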
The current noise spectrum is modeled as a phase-noise source which is subjected to its corresponding closed-loop transfer function, obtained from the LTI PLL model. This means that the filter is modeled by its linear transfer function, which includes parasitic gain and pole position changes. The nonlinear conversion from voltage to frequency/phase in the VCO is modeled by the variation of the VCO gain when changing the operating point of the PLL.

The analysis tool enables the evaluation and comparison of the effect of MASH and single-loop ΔΣ noise on the PLL. This analysis is performed with the following nonlinearities: a 0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency is 26 MHz and the fractional division number is 67.92. The output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from an integer multiple of $f_{ref}$. In Fig. 5(a) and (b), the time-domain phase error $\Delta\phi$ is plotted for both modulators. Note that the ΔΣ fractional-N PLL frequency synthesizer can hardly be called a phase-locked loop, since the loop is never in lock! Due to the shaping of the HF noise in the single-loop ΔΣ modulator, the instantaneous phase error is smaller than for a MASH modulator. This has two important consequences. First, the on-time of the charge pumps is smaller for the single-loop modulator, making it less sensitive to noise coupling from the substrate and the power supply. Second, the sensitivity to the nonlinear $\Delta\phi$–$I_{CP}$ conversion in terms of noise leakage is reduced.

Fig. 5. Simulation results. The phase error $\Delta\phi[i]$ for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses $I_{CP}[i]$ for (c) the MASH modulator and (d) the single-loop multibit modulator.

To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses are plotted in Fig. 5(c) and (d). A noise floor appears in the output spectrum as well as spurious tones, although the ΔΣ output is perfectly randomized and dithered. Due to the nonlinear mixing in the PFD charge pump, ΔΣ noise at $f_{ref}/2$ folds back to lower offset frequencies, similar to the effect of a nonlinear DAC in a multibit ΔΣ ADC. Since the noise at $f_{ref}/2$ is much lower for the single-loop ΔΣ modulator, its noise leakage due to the nonlinear mixing in the PFD is also lower. In the time domain, this effect corresponds to the smaller phase excursions. The difference in phase error between MASH and single-loop modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear in the output spectrum at … with ….

Fig. 6 shows the ΔΣ noise of both modulators as it appears at the PLL output for an ideal (dotted) and a nonlinear $\Delta\phi$–$I_{CP}$ conversion (solid). The results of the ideal case closely match the theoretical results of Section II-C (solid light gray). Due to nonlinearity, the simulated output spectrum of the integer-N PLL (the dash-dotted line) is seriously deteriorated by increasing the ΔΣ noise. Especially in the PLL noise bandwidth, the MASH converter is critical in terms of in-band noise due to the higher phase error [see Fig. 5(a)], despite the inherently lower LF ΔΣ noise of the MASH modulator. Note that the simulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consequence, the actual spurious performance of the ΔΣ fractional-N PLL could be worse than simulated. The presented simulation results are for a division modulus of 67.92, close to an integer multiple of $f_{ref}$. When analyzing division moduli in between integer multiples of $f_{ref}$, noise leakage is still observed, but the spurious tones are well below the phase noise.
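The operating point used in these simulations can be checked with exact arithmetic. The sketch below reproduces the 1.76592-GHz output frequency and the 2.08-MHz offset quoted in the text. The frequency-resolution line assumes the usual relation $f_{ref}/2^k$ for a $k$-bit input word, and the MASH modulus list assumes an integer part of 67 combined with the −3…+4 output range of a MASH 1-1-1, which is consistent with the moduli 64 to 71 quoted in Section II-B. Variable names are illustrative.

```python
from fractions import Fraction

F_REF_HZ = 26_000_000
N_FRAC = Fraction(6792, 100)   # fractional division number 67.92
K_BITS = 16                    # width of the modulator input word

f_out_hz = N_FRAC * F_REF_HZ                       # synthesized frequency
nearest_integer = round(N_FRAC)                    # closest integer modulus
offset_hz = nearest_integer * F_REF_HZ - f_out_hz  # distance to the integer channel
resolution_hz = F_REF_HZ / 2**K_BITS               # assumed resolution f_ref / 2^k
input_word = round((N_FRAC - 67) * 2**K_BITS)      # 16-bit word closest to 0.92
mash_moduli = [67 + d for d in range(-3, 5)]       # assumed MASH 1-1-1 output range

print(f"f_out = {float(f_out_hz) / 1e9:.5f} GHz, offset = {float(offset_hz) / 1e6:.2f} MHz")
print(f"resolution = {resolution_hz:.1f} Hz, input word = {input_word}")
```

The resolution of roughly 397 Hz agrees with the "around 400 Hz" reported for the 16-bit input word in the measurements.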


Fig. 7. Discrete-time autocorrelation estimate of the ΔΣ modulator outputs for (a) the MASH modulator and (b) the single-loop multibit modulator.
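An autocorrelation estimate of the kind plotted in Fig. 7 can be sketched as follows. The paper does not state which estimator was used, so the biased, mean-removed, normalized estimator below is an assumption, as are the function name and the toy input. The toy sequence is a strictly periodic pulse train that stands in for a poorly decorrelated modulator output; it is not data from the paper.

```python
import numpy as np

def autocorr_estimate(x, max_lag):
    """Biased autocorrelation estimate of a mean-removed sequence, normalized to lag 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])
    return r / r[0]

# Toy stand-in: one pulse every 8 samples, i.e., a fully correlated pattern
toy_output = np.tile([1, 0, 0, 0, 0, 0, 0, 0], 64)
r = autocorr_estimate(toy_output, max_lag=16)
print(np.round(r[[0, 4, 8]], 3))
```

A well-randomized modulator output would show values near zero at every nonzero lag, whereas the periodic toy sequence shows a peak close to one at lag 8. Peaks of this kind are what the text associates with re-emerging spurious tones.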

Fig. 6. Simulation results. The ΔΣ noise at the output of the PLL for (a) the MASH modulator and (b) the single-loop multibit modulator. The results are plotted for an ideal PFD (dotted), which closely corresponds to the theoretical results (solid light gray), and for a nonlinear PFD (solid). They are compared to the simulated integer-N PLL phase noise (the dash-dotted line).

The explanation for the re-emerging of spurious tones is that the ΔΣ modulator is unable to sufficiently decorrelate the successive modulator output samples. To quantify the correlation in the output, the discrete-time autocorrelation estimate is calculated and plotted for both modulators for inputs close to an integer value (see Fig. 7). The autocorrelation calculations show correlation, although 1-LSB noise-shaped dithering is applied. The autocorrelation of the single-loop ΔΣ modulator shows large correlation peaks, explaining the higher spurious tones in the output phase-noise spectrum of the PLL. With the autocorrelation estimate, the necessary internal accuracy of the ΔΣ modulators is found to be at least 13 bit for MASH and 16 bit for single-loop modulators to sufficiently decorrelate the modulator output for inputs close to integers. A second possible source of tones is the downconversion of tones which are inherently present around $f_{ref}/2$ [5], by the nonlinear mixing in the PFD. This effect can be worsened by substrate and power-supply coupling with signals at ….

IV. PLL BUILDING-BLOCK CIRCUIT DESIGN

A. The Fourth-Order Type-II PLL

A fourth-order type-II PLL is integrated, including a 4-bit prescaler, a zero-dead-zone PFD, a dual charge pump, and a 3-step equalizer, together with an on-chip LC-tank VCO and a third-order dual-path 35-kHz low-pass loop filter (see Fig. 8). The equalizer performs a 3-step piecewise equalization of the loop gain by keeping the product of the VCO gain and the charge-pump current constant. To prevent switching back and forth between different equalization states, the state transitions exhibit hysteresis.

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

B. The 4-Bit Prescaler

The first high-speed division of the prescaler is done with two differential single-transistor-clocked (DSTC) logic n-latches [10], forming a differential dynamic D-flip-flop. The flip-flop operates with rail-to-rail internal signals to minimize the residual prescaler phase noise [11] to levels insignificant to the overall phase-noise performance. The 16-modulus division (64–79) is implemented with the phase-switching topology [12]. The division moduli are generated by switching between the 90°-spaced output phases of the second D-flip-flop. When the 90° spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of the PLL reference frequency. It takes careful layout and circuit design to equalize the delays of the different quadrature paths, such that these spurious tones are suppressed to negligible levels.
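The spur locations quoted above follow from a simple periodicity argument, which the sketch below illustrates. It assumes, for illustration only, that the quadrature spacing errors produce a timing-error pattern that repeats every four reference cycles; a sequence with period four sampled at $f_{ref}$ can only contain spectral lines at multiples of $f_{ref}/4$. The linear mapping from the 4-bit control word to the moduli 64 to 79, the function names, and the error values are hypothetical.

```python
import numpy as np

def modulus_from_control(word):
    """Map a 4-bit control word to one of the 16 moduli 64..79 (assumed linear map)."""
    if not 0 <= word <= 15:
        raise ValueError("control word must fit in 4 bits")
    return 64 + word

def spur_bins(phase_errors, n_cycles=1024):
    """FFT bins carrying energy when a 4-entry error pattern repeats every 4 cycles."""
    pattern = np.resize(np.asarray(phase_errors, dtype=float), n_cycles)
    mag = np.abs(np.fft.fft(pattern))
    return np.nonzero(mag > 1e-9 * mag.max())[0]

# Hypothetical relative spacing errors of the four quadrature phases
bins = spur_bins([0.0, 0.01, -0.02, 0.005])
print(bins)  # bin index k corresponds to a frequency of k * f_ref / 1024
```

Apart from the DC bin, energy appears only at one quarter, one half, and three quarters of the record length, i.e., at 1/4, 2/4, and 3/4 of the sampling (reference) frequency. Setting all four errors equal removes these lines, which is the goal of the delay equalization described in the text.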

TABLE I. SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE OF THE FOURTH-ORDER TYPE-II PLL

C. The Voltage-Controlled Oscillator

The LC VCO with on-chip inductor combines a 30% tuning range at only 2 V and an excellent phase-noise performance over a large frequency range. To minimize the VCO phase noise, a simulator-optimizer program has been developed which searches the optimal inductor geometry for a given technology. The resulting hollow octagonal balanced inductor has a Q as high as 9 with an inductance of 2.86 nH, for a standard 0.25-µm CMOS technology with only two metal layers (0.6 and 1.0 µm) [13]. The VCO is implemented as a single differential pMOS-only topology, leading to an enhanced tuning range, without increasing the power consumption and the VCO gain, $K_{VCO}$ [13]. For the frequency range of interest, $K_{VCO}$ is between 100 and 200 MHz/V, explaining the need for equalization of the loop gain. The VCO output is buffered from the prescaler input to prevent kickback noise from entering the tank. The measured phase noise is as low as −127.5 dBc/Hz at 600 kHz and −142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.

D. The 35-kHz Dual-Path Loop Filter

To achieve full integration, a dual-path filter topology has been implemented (Fig. 8). Two filter paths, one active integration and one passive low-pass filter, are added

with a multiplication factor in the dual charge pumps. The addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12]. The total number of capacitors is the same as in a classical fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the rather high VCO gain, the integrated capacitance is still 1.4 nF to be able to comply with the DCS-1800 phase-noise requirements. An extra pole is added at 210 kHz to ensure enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero positions and the capacitance–resistance tradeoff to obtain low noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise listed excludes the ΔΣ noise. The MASH and single-loop (SL) ΔΣ noise contributions result from the nonlinear analysis. As


Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left) the dummy current branch, denoted by the suffix d, and the output branch.

seen in Section II-C, the loop bandwidth needs to be smaller than 62 kHz for ΔΣ noise suppression. However, to ensure sufficient suppression of the low-frequency fractional spurious tones for inputs close to the integers, the bandwidth is designed to be 35 kHz. Despite the rather low loop bandwidth for a ΔΣ fractional-N synthesizer, a settling time of less than 293 µs for a 104-MHz step is simulated.

E. The $\Delta\phi$–$I_{CP}$ Conversion

The nonlinear analysis of Section III identified nonlinearity of the $\Delta\phi$–$I_{CP}$ conversion as the main cause of ΔΣ noise leakage and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as such and toward a highly linear phase-error detection for ΔΣ spurious suppression.

First, the reference spur generation by the PFD charge-pump circuit is carefully minimized. The integration in the first path of the loop filter is done actively to keep the charge-pump output at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed parasitic charge injection of the switch transistors. The current switches are implemented with pMOS and nMOS transistors to compensate charge injection. Finally, a timing control scheme [Fig. 9(a)] is developed to control the charge-pump switches. The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch and the dummy current branch of the charge pump [Fig. 9(b)]. Fig. 9(a) shows the dummy and output control signals. The dummy control is delayed versus the output control by modifying the thresholds of the second inverter string (indicated by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize rise and fall times and force a perfect $\pi$ rad relation between nMOS and pMOS control signals, latches at the outputs of both inverter strings are implemented. Capacitors at the control outputs slow the rising and falling edges to prevent large charge injections by fast switching.

Fig. 10. IC microphotograph and the measurement setup in which it is embedded.

To linearize the $\Delta\phi$–$I_{CP}$ conversion, the phase detection is performed by a zero-dead-zone PFD [15], to prevent a hard nonlinearity around zero phase error. Due to the delay added in the PFD, both the up and down current sources are on for small or zero phase errors, enabling the PFD to react to very small phase errors. The on-time fraction of the charge pump due to the delay is less than 10%. This value is a tradeoff between dead-zone prevention and sensitivity to noise coupling when the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively quiet environment. To make sure that the gains for positive and negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current source 1/f noise, which can seriously affect the in-band noise, is decreased. Additionally, the timing control of Fig. 9(a) provides synchronization between the two filter paths and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain. HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal transistor matching.
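The effect of the added PFD delay and of current-source mismatch can be illustrated with a behavioral sketch. This is a textbook-style idealization, not the authors' circuit: it assumes that both charge-pump sources conduct for a fixed reset delay every cycle and that the leading source additionally conducts for the duration of the timing error. The delay of 3 ns (below 10% of the 26-MHz reference period) and the 100-µA current are hypothetical values chosen only for illustration.

```python
import math

T_REF = 1 / 26e6   # reference period in seconds
T_DELAY = 3e-9     # hypothetical reset delay, below 10% of T_REF
I_CP = 100e-6      # hypothetical nominal charge-pump current in amperes

def pfd_pulse_widths(dt, t_delay):
    """Up/down pulse widths for a timing error dt (positive: reference leads)."""
    up = max(dt, 0.0) + t_delay
    down = max(-dt, 0.0) + t_delay
    return up, down

def net_charge(dt, t_delay, i_up, i_down):
    """Net charge delivered to the loop filter in one reference cycle."""
    up, down = pfd_pulse_widths(dt, t_delay)
    return i_up * up - i_down * down

# Matched sources: the transfer is linear through zero, with no dead zone
q_matched = [net_charge(dt, T_DELAY, I_CP, I_CP) for dt in (-1e-9, 0.0, 1e-9)]

# A 2% stronger up source gives different gains for positive and negative errors
i_up = 1.02 * I_CP
gain_pos = net_charge(2e-9, T_DELAY, i_up, I_CP) - net_charge(1e-9, T_DELAY, i_up, I_CP)
gain_neg = net_charge(-1e-9, T_DELAY, i_up, I_CP) - net_charge(-2e-9, T_DELAY, i_up, I_CP)
print(q_matched, gain_pos / gain_neg)
```

In this idealized model the delay removes the dead zone entirely, while any mismatch between the sources reappears as a gain difference between positive and negative phase errors, which is the nonlinearity that the transistor oversizing described above is meant to reduce.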

V. EXPERIMENTAL RESULTS

Fig. 10 shows the IC microphotograph and the measurement setup in which it is embedded. The ΔΣ fractional-N measurements are performed by controlling the PLL divider moduli with an HP80000 data generator, which generates the 4-bit control word. The 4-bit ΔΣ output bit stream is generated using Matlab. This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at

Fig. 11. Measured output spectrum of the ΔΣ fractional-N PLL at 1.76592 GHz. All spurious tones are well below −75 dBc/Hz.

1.76592 GHz, i.e., for a fractional division by 67.92, for comparison with the simulated results. The input to the ΔΣ modulators is a 16-bit word ($k = 16$), resulting in a frequency resolution of around 400 Hz. The power-supply voltage is only 2 V. Fig. 11 shows the output spectrum of the ΔΣ fractional-N PLL over a span of 55 MHz. The reference spurs are well below −75 dBc, due to the careful charge-pump timing control. To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory. Unfortunately, the maximum memory capacity is only 128 kbit, leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is performed by the phase-noise measurement system every offset frequency decade, such that accurate measurements of the phase noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the

TABLE II. SUMMARY OF MEASURED SPECIFICATIONS COMPARED TO THE DCS-1800 SPECIFICATIONS

Fig. 12. Phase-noise measurement with the single-loop multibit ΔΣ converter at 1.76592 GHz compared to the phase noise at integer division (light).

Fig. 13. Phase-noise measurement with the MASH ΔΣ converter at 1.76592 GHz compared to the simulated ΔΣ noise at the output of the PLL (dashed), and with the simulated PLL output without ΔΣ control (dash-dotted).

single-loop multibit modulator is presented in Figs. 12 and 13. Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below −100 dBc, due to careful PFD charge-pump design. The phase noise at 600 kHz is lower than −120 dBc/Hz.

In Fig. 12, the measured phase noise of the PLL with a multibit single-loop ΔΣ modulator (dark) is compared to the phase noise at integer division (light). Noise at lower offsets originates from the ΔΣ modulator due to noise folding in the PFD, as predicted by the simulations. As a result, the rms phase error is increased from 1.7° to 3°. Note that the phase noise of the PLL at integer divisions is as low as −124 dBc/Hz at 600 kHz, which is only 0.3 dB higher than predicted by the PLL simulations (see Table I). The measured results for fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz is increased due to the limited memory of the data generator. The noise at higher offset frequencies is corrupted by noise coupling from the data generator. As can be seen in Fig. 10, the ΔΣ-control bonding wires, which conduct rail-to-rail, very noisy control pulses, are close to the LC tank and the bonding wires of the VCO power supply. Without proper shielding, the VCO phase noise is seriously degraded by this noise coupling.

In Fig. 13, the measured ΔΣ noise and the ΔΣ noise as simulated in Section III (dashed) are compared. The dash-dotted line is the simulated phase noise of the PLL without ΔΣ control. The simulated ΔΣ noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The phase noise at high offsets is increased versus the simulated PLL results due to noise coupling. Second-order tones are larger in measurements, since the models in the simulator do not include second-order effects and noise coupling. Tones at 520 kHz are believed to come from subharmonic tones present in the ΔΣ modulator output [5], which are amplified by mixing through noise coupling. When comparing the results for the MASH and the single-loop modulator, the measured differences are less pronounced than the simulated ones (see Fig. 6). The measured phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples of $f_{ref}$.

The measured settling time of the PLL is 226 µs for a 104-MHz frequency step. The power consumption of the PLL is 70 mW from a 2-V power supply. The fully integrated low-phase-noise VCO is responsible for almost 66% of the total power consumption. The IC area is 2 × 2 mm², including bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications [1]. The specifications of the IC prototype comply with DCS-1800; only the resolution is degraded due to the limitations of the measurement setup.

VI. CONCLUSION

A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer is implemented in a standard 0.25-µm CMOS technology. The monolithic fourth-order type-II PLL


integrates the digital synthesizer part together with a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz dual-path loop filter on a die of only 2 × 2 mm². To investigate the influence of the ΔΣ modulator on the synthesizer's spectral purity, a fast nonlinear analysis method is developed, showing good correspondence with measurements, in contrast to the results of the theoretical analysis. Nonlinear mixing in the phase-frequency detector and the VCO is identified as the main source of spectral pollution in ΔΣ fractional-N synthesizers. MASH and single-loop multibit ΔΣ modulators are compared for use in fractional-N synthesis. Although the MASH is stable and easy to integrate, the single-loop modulator presents a better solution, showing less sensitivity to noise leakage and noise coupling and providing more flexibility. The measured phase noise is lower than −120 dBc/Hz at 600 kHz and −139 dBc/Hz at 3 MHz. The measured fractional spur level is lower than −100 dBc, satisfying the DCS-1800 spectral purity requirements. All measurements are performed for frequencies close to integer multiples of the reference frequency, where the synthesizer is most sensitive to spurious tones.

REFERENCES
[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, “A 2-V CMOS cellular transceiver front-end,” IEEE J. Solid-State Circuits, vol. 35, pp. 1895–1907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C. Nilson, L. Plouvier, and S. Rabii, “A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb. 1999, pp. 228–229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P. J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli, “A single-chip 900-MHz spread-spectrum wireless transceiver in 1-μm CMOS–Part II: Receiver design,” IEEE J. Solid-State Circuits, vol. 33, pp. 547–555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, “Delta–sigma modulation in fractional-N frequency synthesis,” IEEE J. Solid-State Circuits, vol. 28, pp. 553–559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta–Sigma Data Converters: Theory, Design and Simulation. New York: IEEE Press, 1997.
[6] B. Miller and R. Conley, “A multiple modulator fractional divider,” IEEE Trans. Instrum. Meas., vol. 40, pp. 578–583, June 1991.
[7] “Digital cellular communication system (Phase 2+); Radio transmission and reception,” Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM 05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, “A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator,” IEEE J. Solid-State Circuits, vol. 35, pp. 1453–1460, Oct. 2000.
[9] The Mathworks Inc., Matlab User’s Guide, Version 5. Englewood Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, “New single-clock CMOS latches and flip-flops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.


[11] B. De Muer and M. S. J. Steyaert, “A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high input sensitivity,” in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, “Low-phase-noise fully integrated CMOS frequency synthesizers,” Ph.D. dissertation, Katholieke Univ. Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, “A 1.8-GHz highly tunable low-phase-noise CMOS VCO,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, “Fully integrated CMOS frequency synthesizers for wireless communications,” in Analog Circuit Design, W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell, MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S’00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc. degree in electrical engineering in 1996 from the Katholieke Universiteit Leuven, Belgium, where he is currently working toward the Ph.D. degree on high frequency low-noise integrated frequency synthesizers at the ESAT-MICAS laboratories. He has been a Research Assistant with ESAT-MICAS laboratories since 1996. His research is focused on integrated low-phase-noise VCOs with on-chip planar inductors and high-speed prescaler design, leading to fully integrated fractional-N synthesizers in CMOS technology.


Michel S. J. Steyaert (S’85–A’89–SM’92) was born in Aalst, Belgium, in 1959. He received the M.S. degree in electrical-mechanical engineering and the Ph.D. degree in electronics from the Katholieke Universiteit Leuven (K.U. Leuven), Heverlee, Belgium, in 1983 and 1987, respectively. From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial Research) which allowed him to work as a Research Assistant at the Laboratory ESAT at K.U. Leuven. In 1987, he was responsible for several industrial projects in the field of analog micropower circuits at the Laboratory ESAT as an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor at the University of California, Los Angeles. In 1989, he was appointed by the National Fund of Scientific Research (Belgium) as a Research Associate, in 1992 as a Senior Research Associate, and in 1996 as a Research Director at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also a part-time Associate Professor and since 1997 an Associate Professor at the K.U. Leuven. His current research interests are in high-performance and high-frequency analog integrated circuits for telecommunication systems and analog signal processing. Dr. Steyaert received the 1990 European Solid-State Circuits Conference Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the 1999 IEEE Circuit and Systems Society Guillemin–Cauer Award, and the 1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated circuits for telecommunications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000


A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-μm CMOS Technology

Cicero S. Vaucher, Igor Ferencic, Matthias Locher, Sebastian Sedvallson, Urs Voegeli, and Zhenhua Wang

Abstract—A truly modular and power-scalable architecture for low-power programmable frequency dividers is presented. The architecture was used in the realization of a family of low-power fully programmable divider circuits, which consists of a 17-bit UHF divider, an 18-bit L-band divider, and a 12-bit reference divider. Key circuits of the architecture are 2/3 divider cells, which share the same logic and the same circuit implementation. The current consumption of each cell can be determined with a simple power optimization procedure. The implementation of the 2/3 divider cells is presented, the power optimization procedure is described, and the input amplifiers are briefly discussed. The circuits were processed in a standard 0.35-μm bulk CMOS technology, and work with a nominal supply voltage of 2.2 V. The power efficiency of the UHF divider is 0.77 GHz/mW, and of the L-band divider, 0.57 GHz/mW. The measured input sensitivity is 10 mVrms for the UHF divider, and 20 mVrms for the L-band divider.

Index Terms—CMOS integrated circuits, current-mode logic, frequency synthesizers, phase-locked loop, programmable frequency counter, programmable frequency divider.

I. INTRODUCTION

THE feasibility of RF functions implemented in CMOS technology has been demonstrated by, among others, the work presented in [1]–[3]. These works show that the scaling of CMOS technologies to deep submicron has made CMOS a technological option for the low-gigahertz frequency range. However, for CMOS to become a commercial option for RF building blocks, it must comply with all trends of the consumer market: miniaturization, low cost, high reliability, and long battery lifetime. Bulk CMOS technologies presently available satisfy the low cost and reliability trends by standard design practice. Complying with miniaturization and long battery lifetime, on the other hand, demands CMOS building blocks with low power dissipation and good electromagnetic compatibility (EMC) characteristics.

A critical RF function in this context is the frequency synthesizer, more particularly the programmable frequency divider. The divider consists of logic gates which operate at (or close to) the highest RF frequency. Due to the divider’s complexity, high operation frequency normally leads to high power dissipation.

Other crucial aspects of the present-day consumer electronics industry are the short time available for the introduction of new products in the market, and the short product lifetime. On top of that, the lifespan of a given CMOS technology is also short, due to the aggressive scaling of minimum feature sizes. Short time-to-market demands architectures providing easy optimization of power dissipation, fast design time, and simple layout work. High reusability, in turn, requires an architecture which provides easy adaptation of the input frequency range and of the maximum and minimum division ratios of existing designs.

The choice of the divider architecture is therefore essential for achieving low power dissipation, high design flexibility, and high reusability of existing building blocks. A modular architecture complies with these requirements, as shall be demonstrated in this paper. The focus of the paper is first on the truly modular architecture and on the implementation of the circuits. Then the power optimization procedure and the design of the input amplifier are briefly discussed. Finally, a collection of measured data and the conclusions are presented.

Manuscript received November 16, 1999; revised January 24, 2000. C. S. Vaucher is with the Philips Research Laboratories, 5656AA Eindhoven, The Netherlands. I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang are with Philips Semiconductors Zurich, 8045 Zurich, Switzerland. Publisher Item Identifier S 0018-9200(00)03878-6.

Fig. 1. Fully programmable divider based on a dual-modulus prescaler.

II. PROGRAMMABLE DIVIDER ARCHITECTURES

A. Architecture Based on a Dual-Modulus Prescaler

Fig. 1 depicts the divider architecture based on a dual-modulus prescaler [4], [5]. The design of the dual-modulus prescaler itself has been extensively treated in the literature [4], [6]–[10]. On the other hand, the architecture of Fig. 1 has some undesirable characteristics. One readily notices the lack of modularity of the concept: besides the dual-modulus prescaler, the architecture requires two additional counters for the generation of a given division ratio. The programmable counters (which are, in fact, fully programmable dividers, albeit not operating at the full RF frequency) represent a substantial load at the output of the dual-modulus prescaler, so that power dissipation is increased. Besides, the additional design and layout effort required for the programmable counters increases the time-to-market of new products. These properties led us to conclude that the dual-modulus-based architecture is not an interesting option for the realization of building blocks with high reusability, high flexibility, and short design time.
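To make the role of the two additional counters concrete, the following minimal Python sketch models the conventional pulse-swallow arrangement: a P/(P+1) dual-modulus prescaler followed by a program counter and a swallow counter. The names `P`, `M`, and `A` and the relation N = P·M + A are the textbook convention and are assumptions here; the labels used in Fig. 1 are not recoverable from the text.

```python
def pulse_swallow_ratio(P, M, A):
    """Closed-form division ratio N = P*M + A (textbook convention, assumed).

    P: lower modulus of the P/(P+1) prescaler
    M: program counter length (prescaler pulses per output period)
    A: swallow counter length (pulses during which the prescaler divides by P+1)
    """
    if not 0 <= A <= M:
        raise ValueError("swallow count A must satisfy 0 <= A <= M")
    return P * M + A


def count_input_cycles(P, M, A):
    """Count input cycles in one output period, one prescaler pulse at a time."""
    if not 0 <= A <= M:
        raise ValueError("swallow count A must satisfy 0 <= A <= M")
    total = 0
    for pulse in range(M):
        # The prescaler divides by P+1 while the swallow counter is running,
        # and by P for the remaining M - A pulses.
        total += P + 1 if pulse < A else P
    return total


if __name__ == "__main__":
    print(pulse_swallow_ratio(8, 10, 3), count_input_cycles(8, 10, 3))
```

Both counters run on the prescaler output rather than at the full RF frequency, which is the load and design-effort penalty the text attributes to this architecture.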

0018–9200/00$10.00 © 2000 IEEE


Fig. 2. Programmable prescaler. (a) Basic architecture. (b) With extended division range.

B. Programmable Prescaler Architectures

The “basic” programmable prescaler architecture is depicted in Fig. 2(a). The modular structure consists of a chain of 2/3 divider cells connected like a ripple counter [11]. The structure of Fig. 2(a) is characterized by the absence of long delay loops, as feedback lines are only present between adjacent cells. This “local feedback” enables simple optimization of power dissipation. Another advantage is that the topology of the different cells in the prescaler is the same, which facilitates layout work.

The architecture of Fig. 2(a) resembles the one presented in [12], which is also based on 2/3 divider cells. Yet there are two fundamental differences. First, in [12] all cells operate at the same (high) current level. Second, the architecture of [12] relies on a common strobe signal shared by all cells. This leads to high power dissipation, because of high requirements on the slope of the strobe signal, in combination with the high load presented by all cells in parallel.

The programmable prescaler operates as follows. Once in a division period, the last cell on the chain generates the mod signal. This signal then propagates “up” the chain, being reclocked by each cell along the way. An active mod signal enables a cell to divide by 3 (once in a division cycle), provided that its programming input $p$ is set to 1. Division by 3 adds one extra period of each cell’s input signal to the period of the output signal. Hence, a chain of $n$ 2/3 cells provides an output signal with a period of

$$T_{out} = \left(2^{n} + 2^{n-1} p_{n-1} + 2^{n-2} p_{n-2} + \cdots + 2 p_{1} + p_{0}\right) T_{in} \qquad (1)$$

In (1), $T_{in}$ is the period of the input signal, and $p_0, \ldots, p_{n-1}$ are the binary programming values of the cells 1 to $n$, respectively. The equation shows that all integer division ratios ranging from $2^{n}$ (if all $p_i = 0$) to $2^{n+1} - 1$ (if all $p_i = 1$) can be realized. The division range is thus rather limited, amounting to roughly a factor two between maximum and minimum division ratios.¹ The division range can be extended by combining the prescaler with a set-reset counter [13]. In that case, however, the resulting architecture is no longer modular.

¹ In principle, it is also possible to divide by $3^{n}$, but the gap between this value and the continuous division range makes it useless in standard synthesizer applications.

The divider implementation presented in Fig. 2(b) extends the division range of the basic prescaler, whilst maintaining the modularity of the basic architecture [14]. The operation of the new architecture is based on the direct relation between the performed division ratio and the bus-programmed division word $p_n, \ldots, p_0$. Let us introduce the concept of effective length $n'$ of the chain. It is the number of divider cells that are effectively influencing the division cycle. Deliberately setting the mod input of a certain 2/3 cell to the active level overrules the influence of all cells to the right of that cell. The divider chain behaves as if it has been shortened. The required effective length $n'$ corresponds to the index of the most significant (and active) bit of the programmed division word. Only a few extra OR gates are required to adapt $n'$ to the programmed division word, as depicted on the right side of Fig. 2. With the additional logic the division range becomes:
• minimum division ratio: $2^{n'_{min}}$;
• maximum division ratio: $2^{n+1} - 1$.
We see that the minimum and maximum division ratios can be set independently, by choice of $n'_{min}$ and $n$, respectively. Subsequent changes in an optimized design can be realized with low risk. A somewhat similar technique, applied to an asynchronous programmable counter, is described in [9].

III. TRULY MODULAR PROGRAMMABLE DIVIDERS FAMILY

A. Realized Circuits

The modular architecture of Fig. 2(b) was applied in the realization of a family of fully programmable frequency dividers.
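The division-ratio relation of the 2/3 cell chain, on which this family relies, can be checked numerically. The sketch below is a behavioral model only, not the authors' circuit: it assumes each cell swallows exactly one extra input period per division cycle when its programming bit is 1, and that the effective chain length equals the index of the most significant active bit of the programmed word. The function names are hypothetical.

```python
def chain_ratio(p):
    """Division ratio of a chain of 2/3 cells.

    p[0] programs the first (fastest) cell, p[-1] the last cell.
    Working back from the last cell: one division cycle contains `ratio`
    periods at a cell's output, each costing 2 input periods, plus one
    extra input period if the cell divides by 3 once in the cycle.
    """
    ratio = 1
    for bit in reversed(p):
        ratio = 2 * ratio + bit
    return ratio


def closed_form(p):
    """Bracketed factor of (1): 2^n + sum of 2^i * p_i."""
    return 2 ** len(p) + sum(bit << i for i, bit in enumerate(p))


def extended_ratio(word, n, n_min):
    """Extended-range chain (assumed behavior): the most significant active
    bit of `word` sets the effective length n', the lower bits program the
    cells that remain active."""
    n_eff = word.bit_length() - 1
    if not n_min <= n_eff <= n:
        raise ValueError("programmed word outside the supported division range")
    bits = [(word >> i) & 1 for i in range(n_eff)]
    return chain_ratio(bits)


if __name__ == "__main__":
    print(chain_ratio([1, 0, 1]), closed_form([1, 0, 1]))
    print(extended_ratio(37, n=5, n_min=2))
```

Under these assumptions the realized ratio of the extended chain equals the programmed word itself, for every word between $2^{n'_{min}}$ and $2^{n+1} - 1$.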


Fig. 3. Family of truly modular programmable dividers, and corresponding division range of the different implementations.

Fig. 4. Functional blocks and logical implementation of a 2/3 divider cell.

Three circuits were implemented: an 18-bit L-band divider, a 17-bit UHF divider, and a 12-bit reference divider. The architecture and the division range of the dividers are presented in Fig. 3. The L-band divider was used as the basis for the UHF and for the reference divider. The UHF divider consists of the same circuitry as the L-band divider, except for the first 2/3 cell, which was removed. The reference divider is simply the L-band divider stripped of its six high-frequency cells.

B. Logic Implementation of the 2/3 Divider Cells

A 2/3 divider cell comprises two functional blocks, as depicted in Fig. 4. The prescaler logic block divides, upon control by the end-of-cycle logic, the frequency of the input signal either by 2 or by 3, and outputs the divided clock signal to the next cell in the chain. The end-of-cycle logic determines the momentary division ratio of the cell, based on the state of the incoming mod signal and of the programming input p. The mod signal becomes active once in a division cycle. At that moment, the state of the p input is checked, and if p = 1, the end-of-cycle logic forces the prescaler to swallow one extra period of the input signal. In other words, the cell divides by 3. If p = 0, the cell stays in division-by-2 mode. Regardless of the state of the p input, the end-of-cycle logic reclocks the incoming mod signal and outputs it to the preceding cell in the chain (the outgoing mod signal).

C. Circuit Implementation of the 2/3 Divider Cells

The use of standard rail-to-rail CMOS logic techniques makes the integration of digital functions with sensitive RF signal processing blocks difficult, due to the generation of large supply and substrate disturbances during logic transitions. Source coupled logic (SCL), often referred to as MOS current mode logic (MCML), has better EMC properties, because of the constant supply current and differential voltage switching operation [8]. Besides, SCL has lower power dissipation than rail-to-rail logic for (very) high input frequencies [15].

The logic functions of the 2/3 cells are implemented with the SCL structure presented in Fig. 5. The logic tree combines an AND gate with a latch function. Three AND_latch circuits are used to implement Dlatch1, Dlatch3, Dlatch4 and the AND gates of the 2/3 cells (see Fig. 4). Therefore, six logic functions are achieved, at the expense of three tail currents only. Dlatch2 is implemented as a “normal” D latch (without the differential pair connected to the b–bn inputs). The nominal voltage swing is set to 500 mV in the high-frequency (and high-current) cells, and to 300 mV in the low-current cells. The voltage swing is generated by the tail current, set by the current source, and by the load resistances.

Fig. 5. SCL implementation of an AND gate combined with a latch function.

Fig. 6. Transient simulation of optimized L-band divider.

TABLE I SCALING OF CURRENTS IN THE 2/3 DIVIDER CELLS

D. Power Dissipation Optimization

The absence of long delay loops in the architecture of Fig. 2 enables fast and reliable optimization of power dissipation, since simulation runs may be done for clusters of two cells at a time. The critical point in the operation of the programmable prescaler is the divide-by-3 action [11]. There is a maximum delay between the mod and the clock signals in a given cell that still allows properly timed division by 3. The maximum delay is proportional to $T_{in}$, where $T_{in}$ is the period of the cell’s input signal. The input frequency of each cell is scaled down by the previous one. As a consequence, the maximum allowed delay increases as one moves “down” the chain. As the delay in a cell is inversely proportional to the cell’s current consumption (which is a property of current mode logic circuits), the currents in the cells may be scaled down as well.

The results of a transient simulation with the optimized high-frequency cells of the L-band divider are presented in Fig. 6. The influence of current consumption on the slope of the digital signals (and hence on the time delay) is clearly observed. Table I presents the tail current and the resistance values of the optimized divider cells. Layout optimization took about three iteration cycles. Transient simulations, including extracted parasitics, showed that layout parasitics caused a decrease of about 30% in the highest operation frequency, when compared to the original simulations.
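The scaling argument can be illustrated with a small first-order sketch. It is not the authors' optimization procedure and does not reproduce Table I: it simply assumes that each cell's input frequency is at most half that of the previous cell, that the tail current can therefore be halved from cell to cell down to a floor, and that the load resistance follows from swing = tail current × load resistance. The function name, the floor, and all numeric values are hypothetical.

```python
def scale_cell_currents(i_first_uA, n_cells, v_swing_mV, i_min_uA):
    """Return a list of (tail current in uA, load resistance in kOhm) per cell.

    Assumptions (illustrative only):
      - delay is inversely proportional to tail current,
      - the allowed delay doubles from one cell to the next,
      - currents are not reduced below i_min_uA,
      - every cell uses the same voltage swing.
    """
    cells = []
    current = i_first_uA
    for _ in range(n_cells):
        tail_uA = max(current, i_min_uA)
        load_kohm = v_swing_mV / tail_uA  # mV / uA = kOhm
        cells.append((tail_uA, load_kohm))
        current /= 2  # next cell sees at most half the input frequency
    return cells


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for tail, load in scale_cell_currents(400, 4, 500, 100):
        print(f"I_tail = {tail:6.1f} uA, R_load = {load:5.2f} kOhm")
```

In practice the floor and the swing differ between the high-current and low-current cells, as described in Section III-C, so the actual values have to come from simulation.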

E. Input Amplifiers

The input amplifier provides the required amplification of the voltage-controlled oscillator (VCO) signal to “digital” levels, determined by the sensitivity specifications and by the divider circuitry. High input sensitivity enables the divider to be directly coupled to a wide range of VCOs, without the need for external (discrete) buffers. In addition, the input amplifier performs other important functions, which are listed below.
• It provides reverse isolation, to prevent the divider activity from “kicking back” and disturbing the VCO.
• It provides single-ended to differential conversion of the (very often) single-ended VCO signal.
• It enables the VCO to be ac coupled to the divider function, and provides a signal to the first divider cell with the proper dc level.

The required amplification of the UHF amplifier, set by sensitivity requirements (−20 dBm), has been split into two differential stages. Each differential pair operates with 50-μA nominal current, and has load resistances of 14 kΩ. The L-band input amplifier is a scaled version of the UHF input amplifier. The tail currents were doubled, and the drain resistances were halved. The nominal low-frequency small-signal gain of the UHF amplifier is 26 dB; the gain of the L-band amplifier is 23 dB. Negative feedback from the output node to the input was implemented with 50-kΩ resistances. The feedback provides dc biasing to the first stage, and allows ac coupling of the VCO signal to the first differential pair.
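The sensitivity and gain figures quoted for the input amplifiers mix dBm, mVrms, and dB. The short sketch below converts between them; it assumes a 50-Ω reference impedance, matching the strip-line termination used in the measurements, and the helper names are hypothetical.

```python
import math


def vrms_to_dbm(v_rms, r_ohm=50.0):
    """Power in dBm of a sine wave with the given rms voltage across r_ohm."""
    power_w = v_rms ** 2 / r_ohm
    return 10 * math.log10(power_w / 1e-3)


def db_to_voltage_gain(gain_db):
    """Linear voltage gain corresponding to a gain expressed in dB."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    for mv in (10, 20):
        print(f"{mv} mVrms in 50 ohm = {vrms_to_dbm(mv * 1e-3):.1f} dBm")
    for db in (26, 23):
        print(f"{db} dB = x{db_to_voltage_gain(db):.1f}")
```

With these conversions, 20 mVrms corresponds to roughly −21 dBm and 10 mVrms to roughly −27 dBm, while gains of 26 dB and 23 dB correspond to voltage gains of about 20 and 14.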

IV. MEASUREMENTS

The control currents for the UHF and L-band dividers can be set externally, through input pins. The input amplifier current and the 2/3 divider cell currents are each controlled by a dedicated input current. The curves presented in this section were obtained with the nominal supply voltage of 2.2 V, except where otherwise noted.

A. Input Sensitivity and Maximum Operation Frequency

Fig. 7 presents sensitivity curves of the UHF divider, for different settings of the divider control current. The nominal value of this control current is 10 μA. Fig. 8 shows sensitivity curves of the L-band divider, for different current settings in the divider and input amplifier. Like the UHF divider, the L-band divider is highly sensitive over a large frequency range. The circuits were driven by a differential input signal, carried over printed circuit board (PCB) strip-lines with a characteristic impedance of 50 Ω. The strip-lines were terminated with discrete resistances of 50 Ω to ground, which were placed close to the input leads of the input amplifiers.

Fig. 7. Sensitivity of the UHF divider, for different divider current settings. Division ratio = 511, nominal current is I = 10 μA.

Fig. 8. Sensitivity curves of the L-band divider, for a few divider and amplifier current settings. Division ratio = 1023.

Fig. 9. Maximum operation frequency of the UHF and L-band dividers, as function of divider current consumption (excluding input amplifiers).

Fig. 10. Minimum input level for correct division of the frequency dividers, as function of the input amplifiers' consumption.

The maximum operation frequencies of the UHF and L-band dividers, as a function of the current consumption, are plotted in Fig. 9. The effect of the input amplifiers' current consumption on the input sensitivities is displayed in Fig. 10. Setting the UHF amplifier current to 230 μA yields a sensitivity value in excess of 10 mVrms. The influence of the supply voltage on the maximum operating frequency was found to be small (5% for a supply voltage decreased from 2.2 V down to 1.8 V). It is interesting to mention that MCML circuits have been demonstrated to operate with supply voltages as low as 1.2 V [15], without significant loss of speed.

B. Phase Noise

The phase noise of the UHF and reference dividers was measured with a dedicated phase noise measurement system. We used coherent demodulation techniques (phase-locked loop configuration), and employed a low-noise 10-MHz signal source during the evaluation of the circuits. To facilitate the measurements, we implemented signal taps on the output of certain cells on the divider chain. The UHF divider was provided with a tap on the output of its sixth cell; the reference divider had the output of the first cell tapped.

Fig. 11 presents the phase noise of the UHF divider, with nominal settings for the supply current. The straight lines represent measured phase noise of the reference divider. We see a dependency of the noise floor of the reference divider on the level of the 20-MHz input signal. For the UHF divider, however, no dependency of the noise floor on the level of the input signal at 640 MHz was observed. An increase of 25% in current led to a change in noise floor from −122 dBc/Hz (nominal bias, control current = 10 μA) to −124 dBc/Hz (control current = 12.5 μA). The noise floor of the reference divider went from −127.5 dBc/Hz down to −130 dBc/Hz with increased bias. Fig. 11 shows that the high-frequency cells of the UHF divider (see Fig. 3) contribute significantly to the phase noise, especially in the “1/f region.” An increase of 1/f noise of about 15 dB is observed, compared to the noise of the single reference divider’s cell.

Fig. 11. Phase noise of the reference and UHF divider, measured at 10 MHz. Dashed line: reference divider (I = 10 μA, F = 20 MHz, −10 dBm). Solid line: reference divider (I = 10 μA, F = 20 MHz, −20 dBm).

C. Power Efficiency

Fig. 12 presents the power efficiency of the UHF and L-band dividers, in comparison to recently published data on low-power dividers and tuning systems. Power efficiency is defined here as the ratio of the divider’s maximum operation frequency to its power dissipation, with dimensions of GHz/mW. The authors have found that most of the dividers presented in the literature do not include an input amplifier. Therefore, only the current consumption of the “core” divider circuits is taken into account in the calculations. This leads to a fair comparison of the available data.

Fig. 12. Comparison of power efficiency (GHz/mW).

References [2] and [8] describe prescalers implemented in bulk CMOS technology. Reference [16] proposes a new synthesizer architecture, where the divider is “powered down” after lock has been achieved. Reference [4] describes a fully programmable divider implemented in an ultrathin-film 0.25-μm CMOS/SIMOX process. The CMOS/SIMOX divider power efficiency is about 30% higher than the L-band divider’s. Our divider, however, is implemented in a standard 0.35-μm bulk technology. The power efficiency of a bipolar dual-modulus prescaler [6] is included as well, for technology benchmarking. Its power efficiency is similar to the power efficiency of the CMOS/SIMOX divider. The fully programmable dividers described here demonstrate that architectural choices and optimization procedures can take standard 0.35-μm CMOS to performance levels comparable to more expensive technologies, such as bipolar and CMOS/SIMOX processes.

V. CONCLUSION

This paper presented a truly modular and power-scalable architecture for low-power fully programmable frequency dividers. The flexibility and reusability properties of the architecture were demonstrated with the realization of a family of programmable divider circuits, consisting of the UHF divider (17 bits), the L-band divider (18 bits), and the reference divider (12 bits). The UHF and reference dividers were implemented by simple removal of divider cells from the L-band circuitry. The implementation of the 2/3 divider cells was presented, and the power dissipation optimization procedure was described. To cope with EMC considerations, the dividers were implemented in CMOS SCL (current mode logic).

The circuits were processed in a standard 0.35-μm bulk CMOS technology, and operate with a nominal supply voltage of 2.2 V. The power efficiency of the UHF divider is 0.77 GHz/mW, and of the L-band divider, 0.57 GHz/mW. The measured input sensitivity, including the input amplifiers, is 10 mVrms for the UHF divider, and 20 mVrms for the L-band divider.
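Two of the reported figures lend themselves to a quick arithmetic cross-check. The sketch below uses the definition of power efficiency given in Section IV-C and the quoted tap positions of the phase-noise measurement. It assumes the cells ahead of a tap run in divide-by-2 mode, and the 1-GHz operating point is a hypothetical value chosen only for illustration; the helper names are likewise hypothetical.

```python
def power_efficiency(f_max_ghz, power_mw):
    """Power efficiency as defined in the text: f_max / P, in GHz/mW."""
    return f_max_ghz / power_mw


def core_power_mw(f_max_ghz, efficiency_ghz_per_mw):
    """Core divider power implied by a given efficiency figure."""
    return f_max_ghz / efficiency_ghz_per_mw


def tap_frequency_mhz(f_in_mhz, cells_before_tap):
    """Tap frequency when every cell ahead of the tap divides by 2 (assumed)."""
    return f_in_mhz / 2 ** cells_before_tap


if __name__ == "__main__":
    # Hypothetical operating point: a core running at 1 GHz with 0.77 GHz/mW.
    p_mw = core_power_mw(1.0, 0.77)
    print(f"core power = {p_mw:.2f} mW, supply current at 2.2 V = {p_mw / 2.2:.2f} mA")
    # "About 30% higher" than the L-band divider's 0.57 GHz/mW.
    print(f"CMOS/SIMOX efficiency = {1.3 * 0.57:.2f} GHz/mW")
    # Phase-noise taps: both dividers are observed at 10 MHz.
    print(tap_frequency_mhz(640, 6), tap_frequency_mhz(20, 1))
```

The tap calculation reproduces the 10-MHz measurement frequency for both the UHF divider (640-MHz input, sixth cell) and the reference divider (20-MHz input, first cell).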


ACKNOWLEDGMENT

The authors wish to thank G. van Veenendaal, of PS-Systems Laboratory, Eindhoven, The Netherlands, for evaluation work done on the programmable dividers. Many thanks go to D. Kasperkovitz and J. de Haas for the support provided during the project.

REFERENCES
[1] S. Wu and B. Razavi, “A 900 MHz/1.8 GHz CMOS receiver for dual-band applications,” IEEE J. Solid-State Circuits, vol. 33, pp. 2178–2185, Dec. 1998.
[2] J. Craninckx and M. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[3] Q. Huang et al., “GSM transceiver front-end circuits in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 292–303, Mar. 1999.
[4] Y. Kado et al., “An ultralow power CMOS/SIMOX programmable counter LSI,” IEEE J. Solid-State Circuits, vol. 32, pp. 1582–1587, Oct. 1997.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New York, NY: Wiley, 1997.
[6] T. Seneff et al., “A sub-1 mA 1.6 GHz silicon bipolar dual modulus prescaler,” IEEE J. Solid-State Circuits, vol. 29, pp. 1206–1211, Oct. 1994.
[7] J. Craninckx and M. Steyaert, “A 1.75 GHz/3 V dual-modulus divide-by-128/129 prescaler in 0.7-μm CMOS,” IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
[8] F. Piazza and Q. Huang, “A low power CMOS dual modulus prescaler for frequency synthesizers,” IEICE Trans. Electron., vol. E80-C, pp. 314–319, Feb. 1997.
[9] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[10] J. Navarro Soares, Jr. and W. A. M. Van Noije, “A 1.6 GHz dual-modulus prescaler using the extended true-single-phase-clock CMOS circuit technique (E-TSPC),” IEEE J. Solid-State Circuits, vol. 34, pp. 97–102, Jan. 1999.
[11] C. S. Vaucher and D. Kasperkovitz, “A wide-band tuning system for fully integrated satellite receivers,” IEEE J. Solid-State Circuits, vol. 33, pp. 987–998, July 1998.
[12] N. H. Sheng et al., “A high-speed multimodulus HBT prescaler for frequency synthesizer applications,” IEEE J. Solid-State Circuits, vol. 26, pp. 1362–1367, Oct. 1991.
[13] C. S. Vaucher, “An adaptive PLL tuning system architecture combining high spectral purity and fast settling time,” IEEE J. Solid-State Circuits, vol. 35, pp. 490–502, Apr. 2000.
[14] C. S. Vaucher and Z. Wang, “A low-power truly modular 1.8 GHz programmable divider in standard CMOS technology,” in Proc. 25th Eur. Solid-State Circuits Conf., Sept. 1999, pp. 406–409.
[15] M. Mizuno et al., “A GHz MOS adaptive pipeline technique using MOS current-mode logic,” IEEE J. Solid-State Circuits, pp. 784–791, June 1996.
[16] A. R. Shahani et al., “Low-power dividerless frequency synthesis using aperture phase detection,” IEEE J. Solid-State Circuits, vol. 33, pp. 2232–2239, Dec. 1998.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001


A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector

Jafar Savoj, Student Member, IEEE, and Behzad Razavi, Member, IEEE

Abstract—A 10-Gb/s phase-locked clock and data recovery circuit incorporates an interpolating voltage-controlled oscillator and a half-rate phase detector. The phase detector provides a linear characteristic while retiming and demultiplexing the data with no systematic phase offset. Fabricated in a 0.18-μm CMOS technology in an area of 1.1 × 0.9 mm², the circuit exhibits an RMS jitter of 1 ps, a peak-to-peak jitter of 14.5 ps in the recovered clock, and a bit-error rate of 1.28 × 10⁻⁶, with random data input of length 2²³ − 1. The power dissipation is 72 mW from a 2.5-V supply.

Index Terms—Clock recovery, half-rate CDR, optical communication, oscillators, phase detectors, PLLs.

I. INTRODUCTION

W

ITH THE exponential growth of the number of Internet nodes, the volume of the data transported by its backbone continues to rise rapidly. The load of the global Internet backbone is expected to be as high as 11 Tb/s by the year 2005, indicating that the required bandwidth must increase by a factor of 50 to 100 every seven years. Among the available transmission media, optical fibers have the highest bandwidth with the lowest loss, serving as an attractive solution for the Internet backbone. However, the electronic interface proves to be the bottleneck in designing high-speed optical systems. In order to push the speed of operation beyond the capabilities of the fabrication processes, a number of transceivers can be fabricated on the same chip. The input and output signals can be carried either over a bundle of fibers, or on a single fiber that uses wave-division multiplexing. In this scenario, both the power dissipation and the complexity of each transceiver become critical. While stand-alone building blocks of optical transceivers have been built in GaAs and silicon bipolar technologies [1], [2], full integration of many transceivers makes it desirable to use CMOS technology. This paper describes the design of the first 10-Gb/s CMOS clock and data recovery (CDR) circuit. A linear phase detector (PD) is introduced that compares the phase of the incoming data with that of a half-rate clock. The CDR circuit also incorporates a three-stage interpolating ring oscillator to achieve a wide tuning range. Fabricated in a 0.18- m CMOS technology, the circuit achieves an RMS jitter of 1 ps with a pseudorandom sewhile dissipating 72 mW from a 2.5-V supply. quence of Manuscript received August 21, 2000; revised December 1, 2000. This work was supported in part by the Semiconductor Research Corporation and in part by Cypress Semiconductor. 
The authors are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095-1594 USA (e-mail: [email protected]). Publisher Item Identifier S 0018-9200(01)03020-7.

The next section of the paper presents the CDR architecture and design issues. Section III deals with the design of the building blocks. Section IV describes the experimental results.
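As a quick check on the figures quoted above, the following Python sketch works out the timing and power numbers implied by a 10-Gb/s input, a half-rate clock, 1 ps of RMS jitter, and 72 mW drawn from 2.5 V. The variable names are illustrative and do not come from the paper; only the numeric inputs do.

```python
# Timing and power arithmetic from the figures quoted in the introduction.
# All names are illustrative; only the numeric inputs come from the text.
BIT_RATE = 10e9        # b/s, input data rate
RMS_JITTER = 1e-12     # s, RMS jitter of the recovered clock
POWER = 72e-3          # W, quoted power dissipation
SUPPLY = 2.5           # V, supply voltage

unit_interval = 1.0 / BIT_RATE          # duration of one bit
clock_freq = BIT_RATE / 2.0             # half-rate clock frequency
clock_period = 1.0 / clock_freq
edge_spacing = clock_period / 2.0       # time between consecutive clock edges

jitter_ui = RMS_JITTER / unit_interval  # RMS jitter as a fraction of one bit
supply_current = POWER / SUPPLY
energy_per_bit = POWER / BIT_RATE

print(f"bit period      : {unit_interval * 1e12:.1f} ps")
print(f"clock frequency : {clock_freq / 1e9:.1f} GHz")
print(f"edge spacing    : {edge_spacing * 1e12:.1f} ps")
print(f"RMS jitter      : {jitter_ui:.3f} UI")
print(f"supply current  : {supply_current * 1e3:.1f} mA")
print(f"energy per bit  : {energy_per_bit * 1e12:.1f} pJ")
```

With these inputs the bit period and the spacing between clock edges are both 100 ps, which is why sampling on both edges of the 5-GHz clock recovers every bit. The jitter corresponds to 0.01 of a bit period and the energy to 7.2 pJ per bit.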

0018–9200/01$10.00 © 2001 IEEE
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001
SAVOJ AND RAZAVI: CLOCK AND DATA RECOVERY CIRCUIT

II. ARCHITECTURE

The choice of the CDR architecture is primarily determined by the speed and supply voltage limitations of the technology as well as the power dissipation and jitter requirements of the system. In a generic CDR circuit, shown in Fig. 1, the phase detector compares the phase of the incoming data to the phase of the clock generated by the voltage-controlled oscillator (VCO), producing an error that is proportional to the phase difference between its two inputs. The error is then applied to a charge pump and a low-pass filter so as to generate the oscillator control voltage. The clock signal also drives a decision circuit, thereby retiming the data and reducing its jitter.

Fig. 1. Generic CDR architecture.

If attempted in a 0.18-µm CMOS technology, the architecture of Fig. 1 poses severe difficulties for 10-Gb/s operation. Despite aggressive device scaling, the CMOS process used in this work provides only marginal performance at such speeds. For example, even simple digital latches or three-stage ring oscillators fail to operate reliably at these rates. These issues make it desirable to employ a “half-rate” CDR architecture, where the VCO runs at a frequency equal to half of the input data rate.

The concept of the half-rate clock has been used in [2]–[5]. However, [2] and [3] incorporate a bang–bang phase detector, possibly creating a large ripple on the control line of the oscillator and hence high jitter. The circuit reported in [4] inherently has a smaller output jitter as a result of using a linear phase detector, but it fails to operate at speeds above 6 Gb/s in 0.18-µm CMOS technology. The circuit of [5] benefits from a new linear phase detection scheme, but it may not operate properly with certain data patterns.

Another critical issue in the architecture of Fig. 1 relates to the inherently unequal propagation delays for the two inputs of the phase detector: most phase detectors that operate properly with random data (e.g., a D flip-flop) are asymmetric with respect to the data and clock inputs, thereby introducing a systematic skew between the two in phase-lock condition. Since it is difficult to replicate this skew in the decision circuit, the generic CDR architecture suffers from a limited phase margin unless the raw speed of the technology is much higher than the data rate. The problem of the skew demands that phase detection and data regeneration occur in the same circuit such that the clock still samples the data at the midpoint of each bit even in the presence of a finite skew. For example, the Hogge PD [6] automatically sets the clock phase to the optimum point in the data eye (but it fails to operate properly with a half-rate clock).

The above considerations lead to the CDR architecture shown in Fig. 2. Here, a half-rate PD produces an error proportional to the phase difference between the 10-Gb/s data stream and the 5-GHz output of the VCO. Furthermore, the PD automatically retimes and demultiplexes the data, generating two 5-Gb/s sequences. Although the focus of this work is point-to-point communications, a full-rate retimed output is also generated to provide flexibility in testing and to exercise the ultimate speed of the technology. The VCO has both fine and coarse control lines, the latter allowing inclusion of a frequency-locked loop in future implementations.

Fig. 2. Half-rate CDR architecture.

In this work, a new approach to performing linear phase detection using a half-rate clock is described. Owing to its simplicity, this technique achieves both high speed and low power dissipation while minimizing the ripple on the oscillator control voltage.

It is interesting to note that half-rate architectures do suffer from one drawback: the deviation of the clock duty cycle from 50% translates to bimodal jitter. As depicted in Fig. 3, since both clock edges sample the data waveform, the clock duty cycle distortion pushes both edges away from the midpoint of the bits. Typical duty cycle correction techniques used at lower speeds are difficult to apply here as they suffer from significant dynamic mismatches themselves. Thus, special attention is paid to symmetry in the layout to minimize bimodal jitter.

Fig. 3. Effect of clock duty cycle distortion.

Another important aspect of CDR design is the leakage of data transitions to the oscillator. In Fig. 2, such leakage arises from: 1) capacitive feedthrough from the data to the clock in the PD; 2) capacitive feedthrough from the retimed data to the clock through the multiplexer; and 3) coupling of the data to the oscillator through the substrate. To minimize these effects, the VCO is followed by an isolation buffer and all of the building blocks incorporate fully differential topologies.

III. BUILDING BLOCKS

A. VCO

The design of the VCO directly impacts the jitter performance and the reproducibility of the CDR circuit. While LC topologies achieve a potentially lower jitter, their limited tuning range makes it difficult to obtain a target frequency without design and fabrication iterations. Since the circuit reported here was our first design in 0.18-µm technology, a ring oscillator was chosen so as to provide a tuning range wide enough to encompass process and temperature variations.

A three-stage differential ring oscillator [Fig. 4(a)] driving a buffer operates no faster than 7 GHz in 0.18-µm CMOS technology. The half-rate CDR architecture overcomes this limitation, requiring a frequency of only 5 GHz. As shown in Fig. 4(b), each stage consists of a fast and a slow path whose outputs are summed together. By steering the current between the fast and the slow paths, the amount of delay achieved through each stage and hence the VCO frequency can be adjusted. All three stages in the ring are loaded by identical buffers to achieve equal rise and fall times and thus improve the jitter performance.

Fig. 4. (a) Three-stage ring oscillator. (b) Implementation of each stage. (c) Transistor-level schematic.

Fig. 4(c) shows the transistor implementation of each delay stage. The fast and slow paths are formed as differential circuits sharing their output nodes. The tuning is achieved by reducing the tail current of one and increasing that of the other differentially. Since the low supply voltage makes it difficult to stack differential pairs under the two input pairs, the current variation is performed through mirror arrangements driven by pMOS differential pairs.

Fig. 5. Small-signal gain and phase response of each delay stage.

Fig. 5 depicts the small-signal gain and phase response of each delay stage. While providing a phase shift of 60°, each stage achieves a gain
of 5.5 dB at 5 GHz, yielding robust oscillation at the target frequency.

A critical drawback of supply scaling in deep-submicron technologies is the inevitable increase in the VCO gain for a given tuning range. To alleviate this difficulty, the control of the VCO is split between a coarse input and a fine input. The partitioning of the control allows more than one order of magnitude reduction in the VCO sensitivity. The idea is that the fine control is established by the phase detector and the coarse control is a provision for adding a frequency detection loop. The coarse control is provided externally in this prototype. The fine control exhibits a gain of 150 MHz/V and the coarse control, 2.5 GHz/V (Fig. 6). The tuning range is 2.7 GHz (54%).

Fig. 6. VCO gain partitioning. (a) Fine control. (b) Coarse control.

B. Phase Detector

Phase detectors generally appear in two different forms. Nonlinear PDs coarsely quantize the phase error, producing only a positive or negative value at their output. Linear PDs, on the other hand, generate a linearly proportional output that drops to zero when the loop is locked. Compared to nonlinear PDs, linear PDs result in less charge pump activity, smaller ripple on the oscillator control line, and hence lower jitter.

In a linear PD, such as that described in [6], the phase error is obtained by taking the difference between the widths of two pulses, both of which are generated whenever a data transition occurs. The width of one of the pulses is linearly proportional to the phase difference between the clock and the data, whereas the width of the other is constant. By using a differential error signal, pattern dependency of the phase error is cancelled because both pulses are present only when a data transition occurs.

For linear phase comparison between data and a half-rate clock, each transition of the data must produce an “error” pulse whose width is equal to the phase difference. Furthermore, to avoid a dead zone in the characteristics, a “reference” pulse must be generated whose area is subtracted from that of the error pulse, thus creating a net value that falls to zero in lock.

The above observations lead to the PD topology shown in Fig. 7(a). The circuit consists of four latches and two XOR gates. The data is applied to the inputs of two sets of cascaded latches, each cascade constituting a flipflop that retimes the data. Since the flipflops are driven by a half-rate clock, the two output sequences are the demultiplexed waveforms of the original input sequence if the clock samples the data in the middle of the bit period.

Fig. 7. (a) Phase detector. (b) Operation of the circuit.

The operation of the PD can be described using the waveforms depicted in Fig. 7(b). The basic unit employed in the circuit is a latch whose output carries information about the zero crossings of both the data and the clock. The output of each latch tracks its input for half a clock period and holds the value for the other half, yielding the waveforms shown in Fig. 7(b) for the outputs of the two input latches. The two waveforms differ because their corresponding latches operate on opposite clock edges. Produced as the XOR of these two waveforms, the Error signal is equal to ZERO for the portion of the time that identical bits of the two waveforms overlap, and equal to the XOR of two consecutive bits for the rest. In other words, Error is equal to ONE only if a data transition has occurred.

It may seem that the Error signal uniquely represents the phase difference, but that would be true only if the data were periodic. The random nature of the data and the periodic behavior of the clock in fact make the average value of Error pattern dependent. For this reason, a reference signal must also be generated whose average conveys this dependence. The two flipflop output waveforms contain the samples of the data at the rising and falling edges of the clock. Thus, their XOR contains pulses as wide as half the clock period for every data transition, serving as the reference signal.

While the two XOR operations provide both the Error and the Reference pulses for every data transition, the pulses in Error are only half as wide as those in Reference. This means that the amplitude of Error must be scaled up by a factor of two with respect to Reference so that the difference between their averages drops to zero when clock transitions are in the middle of the data eye. The phase error with respect to this point is then linearly proportional to the difference between the two averages.

In order to generate a full-rate output, the demultiplexed sequences are combined by a multiplexer that operates on the half-rate clock as well. This output can also be used for testing purposes in order to obtain the overall bit-error rate (BER) of the receiver.

It is important to note that the XOR gates in Fig. 7 must be symmetric with respect to their two differential inputs. Otherwise, differences in propagation delays result in systematic phase offsets. Each of the XOR gates is implemented as shown in Fig. 8 [7]. The circuit avoids stacking stages while providing perfect symmetry between the two inputs. The output is single-ended, but the single-ended Error and Reference signals produced by the two XOR gates in the phase detector are sensed with respect to each other, thus acting as a differential drive for the charge pump.

Fig. 8. Symmetric XOR gate.

The operation of the XOR circuit is as follows. If the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right turn on, thus turning the remaining transistor off. If the two inputs are identical, one of the tail currents flows through this transistor. Since the average current produced by the Error XOR gate is half of that generated by the Reference XOR gate, this transistor is scaled differently, making the average output voltages equal for zero phase difference. Channel length modulation of this transistor reduces the precision of current scaling between the two XOR gates. This effect can be avoided by increasing the length of the device.

The gain of the PD is determined by the value of the load resistor and the tail current sources. The bias voltage is generated on chip in order to track the variations over temperature and
process. This voltage equals the output common-mode level of the latches preceding the XOR gate. It is generated using a differential pair that is a replica of the preamplifier section of the latch. A current source raises the common-mode level of the differential signal formed by the Error and Reference signals, making it compatible with the input of the charge pump.

It is instructive to plot the input/output characteristic of the PD to ensure linearity and absence of dead zone. This is accomplished by obtaining the average values of Error and Reference while the circuit operates at maximum speed. Fig. 9 shows the simulated behavior as the phase difference varies from zero to one bit period. The Reference average exhibits a notch where the clock samples the metastable points of the data waveform. The Error and Reference signals cross at a phase difference approximately 55 ps from the metastable point, indicating that the systematic offset between the data and the clock is very small. The linear characteristic of the phase detector results in minimal charge pump activity and small ripple on the control line in the locked condition.

Fig. 9. Determination of PD gain.

The choice of the logic family used for the XOR gates and the latches is determined by the speed and switching noise considerations. While rail-to-rail CMOS logic achieves relatively high speeds, it requires amplifying the data swings generated by the stage preceding the CDR circuit (typically a limiting amplifier). Furthermore, CMOS logic produces enormous switching noise in the substrate and on the supplies, disturbing the oscillator considerably. For these reasons, the building blocks employ current-steering logic. The phase detector incorporates an input buffer with on-chip resistive matching.

C. Charge Pump and Loop Filter

Fig. 10 shows the implementation of the differential charge pump. The common-mode feedback (CMFB) circuit senses the output CM level and provides the corresponding correction. Both the matching and the channel-length modulation of the charge pump devices in Fig. 10 impact the residual phase error in locked condition. Thus, their lengths and widths are relatively large to minimize these effects.

Fig. 10. Charge pump and loop filter.

The design of the loop filter is based on a linear time-invariant model of the loop and is performed in the continuous time domain. The loop is in general a nonlinear time-variant system and can only be assumed linear if the phase error is small. The time-invariant analysis is valid if the averaging behavior of the loop rather than its single-cycle performance is of interest, i.e., the loop can be analyzed by continuous-time approximation if the loop bandwidth is small. Under this condition, the state of the CDR changes by only a small amount on each cycle of the input signal.

A low-pass jitter transfer function with a given bandwidth and a maximum gain in the passband is specified for a SONET system. The closed-loop transfer function of the CDR has a zero at a frequency lower than the first closed-loop pole. This results in jitter peaking that can never be eliminated. But the peaking can be reduced to negligible levels by overdamping the loop. As derived in [8], the closed-loop unity-gain bandwidth is approximated as

$$\omega_{-3\,\mathrm{dB}} \approx K_{VCO} K_{PD} G_m R_P \qquad (1)$$

where $K_{VCO}$ and $K_{PD}$ are the gains of the VCO and PD, respectively, and $G_m$ denotes the conversion gain of the charge pump. Equation (1) can be used to determine the value of $R_P$. The amount of the jitter peaking in the closed-loop transfer function can be approximated as

$$J_P \approx 1 + \frac{1}{\omega_{-3\,\mathrm{dB}} R_P C_P} \qquad (2)$$

Equation (2) yields the required value of $C_P$. In order to obtain greater suppression of high-frequency jitter, a second capacitor is added in parallel with the series combination of $R_P$ and $C_P$. These components are added externally to achieve flexibility in defining the closed-loop characteristics of the circuit.

Another advantage of linear PDs over their bang–bang counterparts is that their jitter transfer characteristic is independent of the jitter amplitude. It should also be mentioned that if the CDR is followed by a demultiplexer, the tight specifications for jitter peaking need not be satisfied because such specifications are defined for cascaded regenerators handling full-rate data.

Fig. 11 depicts the simulated behavior of the CDR circuit at the transistor level. The voltage across the filter is initialized to a value relatively close to its value in phase lock. The loop goes through a transition of 350 ns before it locks. The ripple on the control line in phase lock is approximately 1 mV.

Fig. 11. Lock acquisition.

IV. EXPERIMENTAL RESULTS

The CDR circuit has been fabricated in a 0.18-µm CMOS process. Fig. 12 shows a photograph of the chip. Electrostatic discharge (ESD) protection diodes are included for all pads except the high-speed lines. Nonetheless, since all of these lines have a 50-Ω termination, they exhibit some tolerance to ESD. The circuit is tested in a chip-on-board assembly.

Fig. 12. Chip photograph.

In this prototype, the width of the poly resistors was not sufficient to guarantee the nominal sheet resistance. As a result, the fabricated resistor values deviated from their nominal value by 30%, and the VCO center frequency was proportionally lower than the simulated value at the nominal supply voltage (1.8 V). The supply was increased to 2.5 V to achieve reliable operation at 10 Gb/s. While such a high supply voltage creates hot-carrier effects in rail-to-rail CMOS circuits, it is less detrimental in this design because no transistor in the circuit experiences a gate–source or drain–source voltage of more than 1 V. This issue is nonetheless resolved in a second design [9] by proper choice of resistor dimensions.

Fig. 13. (a) Spectrum of the recovered clock. (b) Recovered clock in the time domain.

The circuit is brought close to lock with the aid of the VCO coarse control before phase locking takes over. Fig. 13(a) shows the spectrum of the clock in response to a 10-Gb/s data sequence. The effect of the noise shaping of the loop can be observed in this spectrum. The phase
noise at 1-MHz offset is approximately equal to −106 dBc/Hz. Fig. 13(b) depicts the recovered clock in the time domain. The time-domain measurements using an oscilloscope overestimate the jitter, requiring specialized equipment, e.g., the Anritsu MP1777 jitter analyzer. The jitter performance of the CDR circuit is characterized by this analyzer. A random sequence produces 14.5 ps of peak-to-peak and 1 ps of RMS jitter on the clock signal. These values are reduced to 4.4 and 0.6 ps, respectively, for a random sequence of a second length.

The measured jitter transfer characteristic of the CDR is shown in Fig. 14. The jitter peaking is 1.48 dB and the 3-dB bandwidth is 15 MHz. The loop bandwidth can be reduced to the SONET specifications, but the jitter analyzer must then generate large jitter, which drives the loop out of lock. The loop bandwidth can be reduced to the SONET specifications if a means of frequency detection is added to the loop [9]. The circuit is then much less susceptible to loss of lock due to the jitter generated by the analyzer.

Fig. 14. Measured jitter transfer characteristic.

Fig. 15 depicts the retimed data. The demultiplexed data outputs are shown in Fig. 15(a). The difference between the waveforms results from systematic differences between the bond wires and traces on the test board. Fig. 15(b) depicts the full-rate output. Using this output, the BER of the system can be measured. With one random sequence length, the BER stays below a small bound. However, a second random sequence length results in a larger BER. This BER can be reduced if the bandwidth of the output buffer driving the 10-Gb/s data is increased. Furthermore, if the value of the linear resistors is adjusted to their nominal value, the increased operating speed of the back-end multiplexer results in an improved BER [9].

Fig. 15. (a) Recovered demultiplexed data. (b) Recovered full-rate data.

The CDR circuit exhibits a capture range of 6 MHz and a tracking range of 177 MHz. The total power consumed by the circuit excluding the output buffers is 72 mW from a 2.5-V supply. The VCO, the PD, and the clock and data buffers consume 20.7, 33.2, and 18.1 mW, respectively.

V. CONCLUSION

CMOS technology holds great promise for optical communication circuits. The raw speed resulting from aggressive scaling along with high levels of integration provides high performance at low cost. A 10-Gb/s clock and data recovery circuit designed in 0.18-µm CMOS technology performs phase locking, data regeneration, and demultiplexing with 1 ps of RMS jitter.

REFERENCES

[1] Y. M. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear type PLL for 10-Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[2] M. Wurzer et al., “40-Gb/s integrated clock and data recovery circuit in a silicon bipolar technology,” in Proc. 1998 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1998, pp. 136–139.
[3] M. Rau et al., “Clock/data recovery PLL using half-frequency clock,” IEEE J. Solid-State Circuits, vol. 32, pp. 1156–1159, July 1997.
[4] K. Nakamura et al., “A 6 Gb/s CMOS phase detecting DEMUX module using half-frequency clock,” in Dig. Symp. VLSI Circuits, June 1998, pp. 196–197.
[5] E. Mullner, “A 20-Gb/s parallel phase detector and demultiplexer circuit in a production silicon bipolar technology with f_T = 25 GHz,” in Proc. 1996 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1996, pp. 43–45.
[6] C. Hogge, “A self-correcting clock recovery circuit,” J. Lightwave Technol., vol. LT-3, pp. 1312–1314, Dec. 1985.
[7] B. Razavi, Y. Ota, and R. G. Swarz, “Design techniques for low-voltage high-speed digital bipolar circuits,” IEEE J. Solid-State Circuits, vol. 29, pp. 332–339, Mar. 1994.
[8] L. M. De Vito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, B. Razavi, Ed. New York: IEEE Press, 1996.
[9] J. Savoj and B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with frequency detection,” in Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp. 78–79.

Jafar Savoj (S’98) was born in Tehran, Iran, in 1974. He received the B.S.E.E. degree from Sharif University of Technology, Tehran, in 1996 and the M.S.E.E. degree from the University of California, Los Angeles (UCLA), in 1998. He is currently working toward the Ph.D. degree at UCLA. He spent the summer of 1998 with Integrated Sensor Solutions, San Jose, CA, working on the design of high-precision interfaces for sensor applications. During the summer of 1999, he was with NewPort Communications, Irvine, CA, developing CMOS clock and data recovery circuits for the SONET OC-192 standard. Mr. Savoj received the IEEE Solid-State Circuits Society Predoctoral Fellowship for 2000–2001, and the Beatrice Winner Award for Editorial Excellence at the 2001 ISSCC. He is also a recipient of the Design Contest Award of the 2001 Design Automation Conference.

ISSCC 2001 / SESSION 5 / GIGABIT OPTICAL COMMUNICATIONS I / 5.3

5.3 A 10Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection

Jafar Savoj, Behzad Razavi
Electrical Engineering Department, University of California, Los Angeles, CA

Clock and data recovery (CDR) circuits operating in the 10Gb/s range have become attractive for the optical fiber backbone of the Internet. While CDR circuits operating at 10Gb/s have been designed in bipolar technologies, cost and integration issues make it desirable to implement these circuits in standard CMOS processes. This 10Gb/s CDR circuit is realized in 0.18µm CMOS technology. Architecture and circuit techniques circumvent the speed limitations of the devices. In contrast to previous work [1], this design incorporates an LC oscillator to reduce the jitter as well as a phase/frequency detector to achieve a wide capture range.

Shown in Figure 5.3.1, the CDR consists of a phase/frequency detector (PFD), a voltage-controlled oscillator (VCO), a charge pump, and a low-pass filter (LPF). The PFD compares the phase and frequency of the input data to that of a half-rate clock, providing two binary error signals for phase and frequency. The PFD is designed so that, in addition to providing information about the phase error, it retimes the data as well. Consequently, the CDR exhibits no systematic offset, i.e., inherent skews between clock and data edges due to their nonidentical paths through the loop do not degrade the quality of detection. The VCO provides four differential half-quadrature phases over the full tuning range. All building blocks are fully differential.

Since the half-rate frequency detector requires clock phases that are integer multiples of 45°, the 5GHz VCO is designed as a ring structure consisting of four LC-tuned stages [Figure 5.3.2a]. If the dc feedback around the ring is positive, all stages operate in-phase at the resonance frequency defined by the LC tanks.
On the other hand, if the dc feedback is negative, the frequency shifts by a small amount so as to allow each stage to contribute 45° of phase. The oscillator topology has two advantages over resistive-load ring oscillators. First, owing to the phase slope (Q) provided by the resonant loads, it exhibits less phase noise. Second, its frequency of oscillation is only a weak function of the number of stages, generating multiple phases with no speed penalty. By comparison, a four-stage resistive-load ring operates at a lower frequency. Figure 5.3.2b shows the implementation of each stage. The loads are formed using on-chip spiral inductors and MOS varactors. Resistor R1 provides a shift in the output common-mode level, allowing both positive and negative voltages across the varactors and thus maximizing the tuning range. Modeling each tank by a parallel network, the required 45° phase shift slightly detunes the circuit. The oscillation frequency is given by ω0=(LC)-0.5(11/Q0)0.5, where Q0 denotes the Q of each stage at resonance. The phase detector (PD) is derived from the data transition tracking loop described in Reference 2. In this PD, in-phase and quadrature phases of a half-rate clock signal sample the data in two double-edge-triggered flipflops (DETFFs). Figure 5.3.3 shows the implementation of the PD. Two latches operating on opposite clock phases and a multiplexer form a DETFF that samples the data using both the positive and negative transitions of a half-rate clock. The two signals V1 and V2 are therefore the inphase and quadrature samples of data, respectively, and one is used to route the other or its complement.
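The detuning caused by the 45° requirement can be checked numerically. The Python sketch below assumes an ideal parallel RLC tank, whose impedance phase is -atan(Q(x - 1/x)) with x the ratio of the operating frequency to the resonance frequency, and solves for the frequency at which that phase reaches 45°. The operator inside the bracketed 1/Q0 term of the expression above does not survive in this copy, so both detuning directions are evaluated; the function name and the example Q values are illustrative assumptions, not values from the paper.

```python
import math

def detuned_ratio(q, phase_deg=45.0, lag=True):
    """Return omega/omega_res where a parallel RLC tank contributes phase_deg.

    Assumes an ideal parallel RLC tank: phase(Z) = -atan(q * (x - 1/x)),
    with x = omega / omega_res. lag=True selects the solution above
    resonance (phase lag), lag=False the one below resonance (phase lead).
    """
    k = math.tan(math.radians(phase_deg))   # required value of |q * (x - 1/x)|
    b = (k if lag else -k) / q
    # q * (x - 1/x) = +/-k  ->  x^2 - b*x - 1 = 0, keep the positive root
    return 0.5 * (b + math.sqrt(b * b + 4.0))

def tank_phase_deg(q, x):
    """Impedance phase in degrees of the ideal tank at x = omega / omega_res."""
    return -math.degrees(math.atan(q * (x - 1.0 / x)))

for q in (3.0, 5.0, 10.0):                  # illustrative Q values
    above = detuned_ratio(q, lag=True)
    below = detuned_ratio(q, lag=False)
    print(f"Q={q:4.1f}  exact: {below:.4f} / {above:.4f}   "
          f"sqrt(1 -/+ 1/Q): {math.sqrt(1 - 1 / q):.4f} / {math.sqrt(1 + 1 / q):.4f}")
```

Under this model the exact ratios and the square-root form differ by about one percent for Q = 5 and converge as Q grows, which is consistent with the statement that the phase requirement only slightly detunes the circuit.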

The phase detector operates at high speeds because it uses a half-rate clock. Since in the locked condition, the rising and falling edges of the quadrature clock coincide with data transitions, the in-phase clock transitions sample the data at its optimum point with no systematic offset, generating a full-rate output stream. Also, since the phase-error signal is reevaluated only at data transitions, it incurs little ripple. Note that the output is independent of the data transition density, resulting in reduction of pattern-dependent jitter. With the small CDR loop bandwidths specified by optical standards, circuits employing only phase detection suffer from an extremely narrow capture range, e.g., about 1% of the center frequency. For this reason, a means of frequency detection is necessary to guarantee lock to random data. As with other phase detectors, the half-rate PD of Figure 5.3.3 generates a beat frequency equal to the difference between the data rate and twice the VCO frequency. However, it does not provide knowledge of the polarity of this difference. Figure 5.3.4 depicts the half-rate phase and frequency detector introduced in this work. A second PD is added and driven by phases that are 45° away from those in the first PD. The circuit operates as follows. (1) If the clock is slow, VPD1 leads VPD2; therefore, if VPD2 is sampled by the rising and falling edges of VPD1, the results are negative and positive, respectively. (2) If the clock is fast, VPD1 lags VPD2. Therefore, if VPD2 is sampled by the rising and falling edges of VPD1, the results are the reverse of the previous case. The output buffer delivering the 10Gb/s retimed data with high current levels requires a bandwidth of more than 7 GHz. As shown in Figure 5.3.5, the buffer stage employs inductive peaking [3]. The value of the spiral inductors is chosen so as to avoid ripple in the passband. 
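The frequency-detection rule described above, sampling VPD2 on the rising and falling edges of VPD1, can be illustrated with a small behavioral model in Python. The sketch makes strong simplifying assumptions that are not taken from the paper: the two PD outputs are modeled as ideal square-wave beat notes at the difference between the data rate and twice the VCO frequency, and the second beat note is offset from the first by a quarter of a beat cycle. The function names, the step count, and the two example VCO frequencies are likewise illustrative.

```python
import math

def beat_outputs(delta_f_hz, t):
    """Idealized beat notes of the two PDs at time t (values are +1 or -1).

    delta_f_hz = data rate - 2 * VCO frequency. VPD2 is modeled as VPD1
    shifted by a quarter of a beat cycle (a modeling assumption).
    """
    theta = 2.0 * math.pi * delta_f_hz * t
    vpd1 = 1 if math.sin(theta) >= 0 else -1
    vpd2 = 1 if math.sin(theta - math.pi / 2.0) >= 0 else -1
    return vpd1, vpd2

def frequency_error_sign(delta_f_hz, n_cycles=4, steps_per_cycle=1000):
    """Return +1 if the clock is slow and -1 if it is fast.

    VPD2 is sampled on every edge of VPD1. For a slow clock the sample is
    negative on rising edges and positive on falling edges; for a fast
    clock the pattern is reversed.
    """
    dt = 1.0 / (abs(delta_f_hz) * steps_per_cycle)
    votes = 0
    prev1, _ = beat_outputs(delta_f_hz, 0.5 * dt)
    for k in range(1, n_cycles * steps_per_cycle):
        v1, v2 = beat_outputs(delta_f_hz, (k + 0.5) * dt)
        if v1 != prev1:                      # an edge of VPD1
            rising = v1 > prev1
            votes += -v2 if rising else v2   # positive vote means "clock slow"
        prev1 = v1
    return 1 if votes > 0 else -1

DATA_RATE = 9.95328e9                        # b/s, rate used in the measurements
for f_vco in (4.90e9, 5.05e9):               # illustrative VCO frequencies
    sign = frequency_error_sign(DATA_RATE - 2.0 * f_vco)
    print(f"VCO at {f_vco / 1e9:.2f} GHz -> clock is {'slow' if sign > 0 else 'fast'}")
```

In this model the sign of the sampled value is the same at every edge for a given polarity of the frequency error, so a single edge already indicates the direction in which the VCO must be steered; the vote over several beat cycles is only there to make the sketch robust.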
Since the quality factor of the inductors is not critical here, the spiral structures have a linewidth of only 4µm to achieve a high self-resonance frequency.

The CDR circuit is fabricated in a 0.18µm CMOS technology. The circuit is tested in a chip-on-board assembly while operating with a 1.8V supply. The phase noise of the clock in response to a 9.95328Gb/s data sequence of length 2^23-1 at 1MHz offset is approximately equal to -107dBc/Hz. Figure 5.3.6a depicts the recovered clock and data. A pseudo-random sequence of length 2^23-1 produces 9.9ps of peak-to-peak and 0.8ps rms jitter on the clock signal. The jitter characteristics are measured by the Anritsu MP1777 jitter analyzer. The measured jitter transfer characteristic of the CDR is shown in Figure 5.3.6b. The jitter peaking is 0.04dB and the 3dB bandwidth is 5.2MHz. Despite the small loop bandwidth, the frequency detector provides a capture range of 1.43GHz, obviating the need for external references. The total power consumed by the circuit excluding the output buffers is 91mW from a 1.8V supply. Figure 5.3.7 shows a micrograph of the chip, which occupies 1.75 × 1.55 mm².

Acknowledgments: The authors thank NewPort Communications for fabrication and test support. This work was supported by SRC and Cypress Semiconductor.

References:
[1] J. Savoj and B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit,” Dig. of Symposium on VLSI Circuits, pp. 136-139, June 2000.
[2] A. W. Buchwald, Design of Integrated Fiber-Optic Receivers Using Heterojunction Bipolar Transistors, Ph.D. Thesis, University of California, Los Angeles, Jan. 1993.
[3] J. Savoj and B. Razavi, “A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.

• 2001 IEEE International Solid-State Circuits Conference

0-7803-6608-5

©2001 IEEE

ISSCC 2001 / February 5, 2001 / Salon 9 / 2:30 PM

Figure 5.3.1: CDR architecture.

Figure 5.3.2: (a) Four-stage LC-tuned ring oscillator, (b) implementation of one stage.

Figure 5.3.3: Phase detector.

Figure 5.3.4: Phase and frequency detector.

Figure 5.3.5: Output buffer.

Figure 5.3.7: Die micrograph.


Figure 5.3.6: (a) Recovered data and clock, (b) measured jitter transfer characteristics.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz f_T Silicon Bipolar Technology

Martin Wurzer, Josef Böck, Herbert Knapp, Wolfgang Zirwas, Fritz Schumann, and Alfred Felder

Abstract: Clock and data recovery (CDR) circuits are key electronic components in future optical broadband communication systems. In this paper, we present a 40-Gb/s integrated CDR circuit applying a phase-locked loop technique. The IC has been fabricated in a 50-GHz f_T self-aligned double-polysilicon bipolar technology using only production-like process steps. The achieved data rate is a record value for silicon and comparable with the best results for this type of circuit realized in SiGe and III–V technologies.

Index Terms: Bipolar digital integrated circuits, clocks, data communication, high-speed integrated circuits, phase-locked loops, synchronization.

Fig. 1. Block diagram of the fiber-optic link.

I. INTRODUCTION

The demands for new services and increased flexibility have accelerated the development of telecommunication transport networks, which has resulted in the synchronous optical network (SONET)/synchronous digital hierarchy (SDH) standards. Key elements of such high-capacity networks are fiber-optic communication links. Time-division multiplexing (TDM) systems operating at 10 Gb/s are now under development using advanced silicon bipolar production technologies to fabricate all high-speed IC’s. Next generations with SONET/SDH are expected to operate at data rates of 40 Gb/s [1]. To enable such large-capacity optical transmission systems to be put into practical use, very high-speed monolithic IC’s are required as key components.

It has been shown that basic digital functions like MUX and DMUX for 40-Gb/s optical-fiber TDM systems can be realized in silicon bipolar technology [2]. But clock and data recovery circuits in a silicon technology have so far only been demonstrated for 20 Gb/s [3]. With more sophisticated SiGe or III–V technologies, 40-Gb/s operation has been achieved [4]–[7]. Some of these solutions are hybrid. All these realizations are based on either high-Q filters or phase-locked loops (PLL’s). The advantage of the first concept is its easy implementation. The disadvantages are that temperature and frequency variation of the filter group delay makes the sampling time difficult to control, the high-Q filter is difficult to integrate, and narrow pulses require a high f_T. The major advantages of the second approach are that the phase between the extracted clock and the received data is locked and that it can be implemented as a monolithic integrated circuit.

The goal of our work was to implement a cost-effective and reliable clock and data recovery circuit for 40 Gb/s in a near-production silicon bipolar technology. Therefore, an approach based on a PLL technique has been selected and will be described in this paper in more detail.

Manuscript received January 11, 1999; revised March 23, 1999. M. Wurzer is with Corporate Technology, Microelectronics, Siemens AG, Munich 81730 Germany and the Institut für Nachrichtentechnik und Hochfrequenztechnik, Technische Universität Wien, Austria. J. Böck, H. Knapp, F. Schumann, and A. Felder are with Corporate Technology, Microelectronics, Siemens AG, Munich 81730 Germany. W. Zirwas is with Information and Communication Networks, Siemens AG, Munich 81379 Germany. Publisher Item Identifier S 0018-9200(99)06493-8.

II. FIBER-OPTIC LINK

The described circuit has been developed for use in 40-Gb/s TDM fiber-optic links. A block diagram of such a link is shown in Fig. 1. The time-division multiplexer collects several data channels into a single high-speed data stream. An external modulator converts the data from electrical to optical signals by modulating the light of a semiconductor laser diode (E/O block). The O/E conversion on the receiving side is performed by a photodiode followed by a transimpedance amplifier. This bitstream is fed into the clock and data recovery unit. Its task is to synchronize the local oscillator to the phase of the incoming data and to retime the data. In contrast to 10-Gb/s systems, the decision function is now performed by a demultiplexer. This requires a DMUX with excellent retiming capability combined with a high input sensitivity [8].

III. ARCHITECTURE OF THE CLOCK AND DATA RECOVERY CIRCUIT

Fig. 2(a) shows the concept used for the CDR of the fiber-optic link (Fig. 1) in more detail. The main processing blocks are the demultiplexer, consisting of two master–slave D-flip-flops (DFF1, DFF2) in parallel, and an additional master–slave D-flip-flop (DFF3), which forms the phase detector together with DFF2 and the XOR gate. All these functions are integrated in a single chip. The fixed 90° phase shifter, voltage-controlled oscillator (VCO), and loop filter have been realized externally with commercially available components.

Fig. 2. CDR circuit: (a) block diagram and (b) timing diagram.

Fig. 3. Circuit diagram of the master–slave D-flip-flop.

0018–9200/99$10.00 © 1999 IEEE

WURZER et al.: 40-Gb/s INTEGRATED CLOCK AND DATA RECOVERY CIRCUIT

Fig. 2(b) shows the timing diagram. The incoming 40-Gb/s data signal is applied to flip-flops DFF1, DFF2, and DFF3. DFF1 is toggled by CLK, DFF2 by the inverted clock, and DFF3 by the 90°-delayed clock signal. This results in the sampling of the input in the vicinity of midbit and at each following potential transition. If a transition is present, the phase relationship of the data and the clock can be deduced to be early or late. If the midbit clock CLK is too early, DFF3 samples the same bit; if it is too late, DFF3 samples the following bit. Under locked conditions, DFF3 samples at the edge of the data eye. The XOR compares the output samples of DFF2 and DFF3. The result is fed to the loop filter. The output signal of the loop filter serves as the control signal of the VCO. The advantages of this concept are that all components operate at half the data rate and that the input is demultiplexed at the same time. The disadvantage is that the input signal has to drive three DFF’s in parallel.
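The early/late decision described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit of the paper: the data is an ideal NRZ waveform with zero rise time, time is counted in unit intervals (UI), and the names `nrz_value`, `half_rate_pd`, and the sign convention of `phase_error_ui` are hypothetical. With a half-rate clock, one clock period spans 2 UI, so the 90° delay of DFF3 corresponds to 0.5 UI.

```python
def nrz_value(bits, t):
    """Value of an ideal NRZ waveform at time t, measured in unit intervals (UI)."""
    return bits[int(t)]


def half_rate_pd(bits, phase_error_ui):
    """Return the XOR(DFF2, DFF3) output for each half-rate clock cycle.

    phase_error_ui < 0 models an early midbit clock, > 0 a late one
    (assumed sign convention; keep |phase_error_ui| below 0.5 UI).
    """
    xor_out = []
    for k in range(0, len(bits) - 1, 2):
        t_dff1 = k + 0.5 + phase_error_ui  # CLK edge: nominally the middle of bit k
        t_dff3 = t_dff1 + 0.5              # 90 degrees of the half-rate clock = 0.5 UI
        t_dff2 = t_dff1 + 1.0              # inverted CLK edge: middle of bit k+1
        dff2 = nrz_value(bits, t_dff2)
        dff3 = nrz_value(bits, t_dff3)
        xor_out.append(dff2 ^ dff3)        # 1 only if DFF3 still saw the previous bit
    return xor_out


pattern = [0, 1, 0, 1, 1, 0, 0, 1]
early = half_rate_pd(pattern, -0.25)  # DFF3 samples the same bit as DFF1
late = half_rate_pd(pattern, +0.25)   # DFF3 samples the following bit
```

In this model an early clock produces a 1 at every data transition and a late clock produces only 0s, so the average XOR output carries the sign of the phase error that the loop filter integrates. Which polarity speeds up the VCO depends on the loop wiring, which the text does not specify.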

IV. CIRCUIT AND DESIGN PRINCIPLES

The circuit is designed for a single supply voltage of 5 V. The circuit principles used can be seen in the circuit blocks of a master–slave D-flip-flop (MS-DFF), shown in Fig. 3. For details, see [9]. The well-proven E²CL (emitter–emitter coupled logic) is used with emitter followers at the inputs and current switches at the outputs. The series gating between clock and data signals enables differential operation with low voltage swings (... mVp-p), resulting in an increase in speed and a reduction of power consumption. Furthermore, differential operation reduces time jitter and crosstalk and offers good common-mode suppression compared to single-ended operation [10]. Cascaded emitter followers are used for level shifting and impedance transformation between the various current switches. Multiple emitter followers improve the decoupling capability and increase the collector-base voltage of the current-switch transistors, allowing for smaller transistors and thus lower collector-base capacitances [10]. On-chip matching resistors (50 Ω) at all data inputs are used in order to reduce jitter introduced by reflections [2], [11].


Fig. 5. Schematic cross section of a transistor.

Fig. 4. Chip micrograph (chip size: 0.9 × 0.9 mm²).

TABLE I DEVICE PARAMETERS (JUNCTION CAPACITANCES ARE ZERO-BIAS VALUES)

Meeting the required speed rather than low power consumption was the main aim of this design. All transistor sizes are individually optimized with respect to the function of the transistor in the circuit. Extra attention was given to the on-chip wiring. The lines on the chips were classified as “critical” or “uncritical.” For example, the lines driven by emitter followers are critical because they support ringing, while the lines driven by current switches are uncritical [10]. The critical lines are then shortened at the cost of the uncritical ones. The longer signal lines are realized as microstrip lines (with the lowest metallization layer as a ground plane), mainly to improve simulation accuracy. This leads to the layout shown in Fig. 4.

V. CHIP TECHNOLOGY

The circuit has been fabricated in a self-aligned double-polysilicon bipolar technology [12]. The fabrication starts with buried layer formation. A 1-µm epitaxial layer is grown as a compromise between high transit frequency f_T and low external collector-base capacitance. The isolation consists of a channel stopper implantation combined with LOCOS field oxide. The active base is formed by 5-keV BF2 implantation. This low implantation energy in combination with optimized annealing conditions allows for very steep base profiles. This results in a narrow base width of about 50 nm and thus enables a high transit frequency. A selectively implanted collector improves the current-carrying capability of the transistors to the high optimum collector current density of about 2 mA/µm². To minimize narrow emitter effects, an in situ doped emitter-polysilicon layer is used [13]. This prevents a reduction of cutoff frequency even for 0.5-µm design rules. A three-level metallization completes the process. Fig. 5 shows a schematic cross section of a transistor. Except for epitaxy, only process steps of a 0.5-µm CMOS production environment are necessary. The maximum transit frequency of the transistors is f_T = 50 GHz at ... V and ... mA/µm².

Table I summarizes typical device parameters for transistors with an effective emitter size of ... µm². The minimum gate delay for an ECL differentially operating ring oscillator with an output voltage swing of 800 mVp-p is measured to be 15.4 ps. This value is achieved for a current per gate of 1.6 mA (see Fig. 6).

Fig. 6. Measured ECL gate delay versus current per gate with an 800-mV differential voltage swing.

VI. MOUNTING AND MEASUREMENT SETUP

For measurements, the clock and data recovery IC has been mounted on a 15-mil ceramic substrate using conventional bonding techniques. Special care has been taken to minimize the length of the bond wires by positioning the surface of the chip on the same level as the signal, ground, and supply lines of the mounting substrate. Due to differential operation, a pair of lines for each clock and data signal is needed to connect the chip with the environment. Therefore, a corresponding number of connectors are necessary. The minimum distance between them determines the minimum size of the test fixture. To avoid additional delay lines, the lines for the clock signals and for the data signals, respectively, have to be of the same length. To achieve a compact layout of these lines, coupled microstrip lines are used. At the input Din, grounded coplanar lines are applied, which show lower dispersion than microstrip lines. The realized test fixture is shown in Fig. 7. It measures 70 × 70 mm².

Fig. 7. Photograph of the package (package size: 70 × 70 mm²).

Fig. 8. Eye diagram of the 40-Gb/s input data signal Din.

Fig. 9. Eye diagram of the 20-Gb/s data signal at the output D2 of the 1:2 demultiplexer.

Fig. 10. (a) Transmitter clock and (b) recovered clock.

Random pulse pattern generators for driving the circuit at the required data rate of 40 Gb/s are not commercially available. A pulse generator has therefore been built from basic high-speed IC’s [2], [14]. Four 10-Gb/s pseudorandom bit sequences (sequence length 2^… − 1) have been multiplexed to a 40-Gb/s nonreturn-to-zero signal.
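The idea of building a 40-Gb/s test pattern from four 10-Gb/s pseudorandom streams can be sketched in software. The following is an illustration only: the sequence length used in the measurement is not legible here, so a short PRBS7 generator (x^7 + x^6 + 1) is assumed, and the lane offsets and the function names `prbs7` and `mux4` are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F  # any nonzero 7-bit seed works
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two tap bits
        state = ((state << 1) | new_bit) & 0x7F      # shift the new bit in
        out.append(new_bit)
    return out


def mux4(lanes):
    """Bit-interleave four equal-length lanes into one serial stream (4:1 MUX)."""
    if len(lanes) != 4:
        raise ValueError("expected exactly four lanes")
    return [bit for group in zip(*lanes) for bit in group]


# Four lanes taken from the same PRBS at different (hypothetical) offsets,
# so that neighbouring bits of the serial stream are decorrelated.
base = prbs7(127 + 96)
lanes = [base[offset:offset + 127] for offset in (0, 32, 64, 96)]
serial = mux4(lanes)  # 508 bits at four times the lane rate
```

Each lane runs at one quarter of the serial rate, which is why four 10-Gb/s generators suffice to exercise a 40-Gb/s input.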

Fig. 11. Jitter histogram of the recovered clock.

VII. EXPERIMENTAL RESULTS

The clock and data recovery IC operates at a single supply voltage of 5 V and consumes 1.6 W. It should be mentioned that no additional cooling was applied. Fig. 8 shows the 40-Gb/s input signal to the CDR circuit. To demonstrate the input sensitivity of the circuit, the eye opening is artificially reduced. In Fig. 9, an eye diagram of the well regenerated and demultiplexed data signal is shown. Fig. 10 shows in (a) the 20-GHz transmitter clock and in (b) the recovered clock. The jitter histogram of the extracted clock in the time domain is displayed in Fig. 11. The measured rms time jitter as observed on the sampling oscilloscope is about 0.8 ps. The measured signal spectra of the VCO are plotted in Fig. 12. The dashed line represents the free-running VCO and the solid line the VCO phase-locked to the 40-Gb/s data signal shown in Fig. 8. The peak is about 35 dB above the floor caused by the statistics of the data.
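To put the reported figures in proportion, the short calculation below expresses the rms jitter as a fraction of the 40-Gb/s unit interval and derives the supply current from the stated power and voltage. It is a back-of-the-envelope sketch; the helper names are hypothetical and only the numbers 0.8 ps, 40 Gb/s, 1.6 W, and 5 V come from the text.

```python
def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter value in picoseconds as a fraction of one unit interval."""
    ui_ps = 1000.0 / bit_rate_gbps  # one bit period in ps
    return jitter_ps / ui_ps


def supply_current_a(power_w, supply_v):
    """Average supply current implied by a power and a supply voltage."""
    return power_w / supply_v


rms_ui = jitter_in_ui(0.8, 40.0)      # 0.8 ps against a 25-ps bit period
current = supply_current_a(1.6, 5.0)  # 1.6 W drawn from 5 V
print(f"rms jitter = {rms_ui:.3f} UI, supply current = {current * 1000:.0f} mA")
```

The rms jitter thus amounts to roughly 3% of the bit period, and the chip draws about 320 mA from the 5-V supply.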

Fig. 12. VCO spectra.

VIII. CONCLUSION

An integrated clock and data recovery circuit operating up to 40 Gb/s has been realized in a 0.5-µm/50-GHz f_T silicon bipolar technology using only production-like process steps. This data rate is the highest reported value for this type of circuit in a silicon technology. This demonstrates that all digital functions necessary for a 40-Gb/s transmission system are feasible with silicon bipolar production technologies.

REFERENCES

[1] K. Hagimoto, Y. Miyamoto, T. Kataoka, H. Ichino, and O. Nakajima, “Twenty-Gbit/s signal transmission using simple high-sensitivity optical receiver,” in OFC’92 Tech. Dig., Feb. 1992, p. 48.
[2] A. Felder, M. Möller, J. Popp, J. Böck, and H.-M. Rein, “46 Gb/s DEMUX, 50 Gb/s MUX, and 30 GHz static frequency divider in silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 31, pp. 481–486, Apr. 1996.
[3] W. Bogner, U. Fischer, E. Gottwald, and E. Müllner, “20 Gbit/s TDM nonrepeatered transmission over 198 km DSF using Si-bipolar IC for demultiplexing and clock recovery,” in Proc. ECOC, Sept. 1996, paper TuD.3.4.
[4] W. Bogner, E. Gottwald, A. Schöpflin, and C.-J. Weiske, “40 Gbit/s unrepeatered optical transmission over 148 km by electrical time division multiplexing and demultiplexing,” Electron. Lett., vol. 33, no. 25, pp. 2136–2137, Dec. 1997.
[5] R. Yu, R. Pierson, P. Zampardi, K. Runge, A. Campana, D. Meeker, K. C. Wang, A. Petersen, and J. Bowers, “Packaged clock recovery integrated circuits for 40 GBit/s optical communication links,” in GaAs IC Symp. Tech. Dig., Nov. 1996, pp. 129–132.
[6] M. Mokhtari, T. Swahn, R. H. Walden, W. E. Stanchina, M. Kardos, T. Juhola, G. Schuppener, H. Tenhunen, and T. Lewin, “InP-HBT chip-set for 40-Gb/s fiber optical communication systems operational at 3 V,” IEEE J. Solid-State Circuits, vol. 32, pp. 1371–1383, Sept. 1997.
[7] M. Lang, Z.-G. Wang, Z. Lao, M. Schlechtweg, A. Thiede, M. Rieger-Motzer, M. Sedler, W. Bronner, G. Kaufel, K. Köhler, A. Hülsmann, and B. Raynor, “20–40 Gb/s 0.2-µm GaAs HEMT chip set for optical data receiver,” IEEE J. Solid-State Circuits, vol. 32, pp. 1384–1393, Sept. 1997.
[8] A. Felder, M. Möller, M. Wurzer, M. Rest, T. F. Meister, and H.-M. Rein, “60 Gbit/s regenerating demultiplexer in SiGe bipolar technology,” Electron. Lett., vol. 33, no. 23, pp. 1984–1986, Nov. 1997.
[9] J. Hauenschild, A. Felder, M. Kerber, H.-M. Rein, and L. Schmidt, “A 22 Gb/s decision circuit and a 32 Gb/s regenerating demultiplexer IC fabricated in silicon bipolar technology,” in Proc. IEEE BCTM’92, Sept. 1992, pp. 151–154.
[10] H.-M. Rein and M. Möller, “Design considerations for very-high-speed Si-bipolar IC’s operating up to 50 Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[11] J. Hauenschild and H.-M. Rein, “Influence of transmission-line interconnections between Gbit/s IC’s on time jitter and instabilities,” IEEE J. Solid-State Circuits, vol. 25, pp. 763–766, June 1990.
[12] J. Böck, A. Felder, T. F. Meister, M. Franosch, K. Aufinger, M. Wurzer, R. Schreiter, S. Boguth, and L. Treitinger, “A 50 GHz implanted base silicon bipolar technology with 35 GHz static frequency divider,” in Symp. VLSI Technology Tech. Dig., June 1996, pp. 108–109.
[13] J. Böck, M. Franosch, H. Schäfer, H. v. Philipsborn, and J. Popp, “In-situ doped emitter-polysilicon for 0.5 µm silicon bipolar technology,” in Proc. ESSDERC’95, The Hague, the Netherlands, Sept. 1995, pp. 421–424.
[14] M. Möller, H.-M. Rein, A. Felder, and T. F. Meister, “60 Gbit/s time-division multiplexer in SiGe-bipolar technology with special regard to mounting and measuring technique,” Electron. Lett., vol. 33, no. 8, pp. 679–680, Apr. 1997.

Martin Wurzer was born in Innsbruck, Austria, in 1966. He received the Diplomingenieur degree in electrical engineering from the Technical University Vienna, Austria, in 1994, where he is currently pursuing the Ph.D. degree. He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1994, where he has been engaged in the development of digital high-speed silicon bipolar IC’s for future optical communication systems in the gigabit-per-second range.

Josef Böck was born in Straubing, Germany, in 1968. He received the diploma degree in physics and the Ph.D. degree from University of Regensburg, Germany, in 1994 and 1997, respectively. He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1993, where he first investigated narrow emitter effects in deep submicrometer silicon bipolar devices. His work on technology development and process integration for high-speed silicon bipolar transistors resulted in the SIEGET 45 microwave-transistor family. Currently, he is working on process development for Si and SiGe bipolar technologies.

Herbert Knapp was born in Salzburg, Austria, in 1964. He received the Diplomingenieur degree in electrical engineering from Technical University Vienna, Austria, in 1997. He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1993, where he has been involved in the design of integrated circuits for wireless communications. His current research interests include the design of high-speed and low-power microwave circuits.

Wolfgang Zirwas received the Diplomingenieur degree in electrical engineering from Technical University Munich, Germany. He joined Siemens AG, Munich, in 1987. First, he worked in the field of high-bit-rate fiber-optic communication systems. Later, he focused his work on broad-band access technologies (xDSL, HFC) for both residential and business users. He is now working in the field of broad-band wireless systems.

Fritz Schumann received the Diplomingenieur degree in electrical engineering from Technical University Berlin, Germany, in 1981. Subsequently, he worked in the field of RF and microwave hybrid circuit and system design for telecommunication and radar applications. In 1992, he joined the silicon bipolar IC design group, Corporate Research and Development, Siemens AG, Munich, Germany. Since then, he has realized IC’s for wireless and fiber-optic communication systems up to 60 Gb/s.

Alfred Felder was born in Bruneck, South Tyrol, Italy, in 1963. He received the Diplomingenieur and Ph.D. degrees in electrical engineering from the Technical University Vienna, Austria, in 1989 and 1993, respectively. He joined Corporate Research and Development, Siemens AG, Munich, Germany, in 1989, where he has been engaged in the development of analog and digital high-speed silicon bipolar IC’s for future optical communication systems in the gigabit-per-second range. From 1996 to 1998, he was Manager of the Technology Department of Siemens K.K. The department is the liaison office of the Corporate Technology of Siemens AG in Japan, responsible for the cooperation with Japanese companies in research. Since 1998, he has been heading the business operation Signal Processing & Control within the Siemens Semiconductor Group in Japan and has been responsible for marketing of microcontrollers and digital signal processors.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001


A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology

Mario Reinhold, Claus Dorschky, Eduard Rose, Rajasekhar Pullela, Peter Mayer, Frank Kunz, Yves Baeyens, Member, IEEE, Thomas Link, and John-Paul Mattia

Abstract: In this paper, a fully integrated 40-Gb/s clock and data recovery (CDR) IC with additional 1:4 demultiplexer (DEMUX) functionality is presented. The IC is implemented in a state-of-the-art production SiGe process. Its phase-locked-loop-based architecture with bang-bang-type phase detector (PD) provides maximum robustness. To the authors’ best knowledge, it is the first 40-Gb/s CDR IC fabricated in a SiGe heterojunction bipolar technology (HBT). The measurement results demonstrate an input sensitivity of 42-mV single-ended data input swing at a bit-error rate (BER) of 10^-10. As demonstrated in optical transmission experiments with the IC embedded in a 40-Gb/s link, the CDR/DEMUX shows complete functionality as a single-chip-receiver IC. A BER of 10^-10 requires an optical signal-to-noise ratio of 23.3 dB.

Index Terms: Bang-bang, BER, CDR, clock and data recovery, demultiplexer, DEMUX, dynamic frequency divider, jitter generation, jitter tolerance, limiting amplifier, OSNR, phase detector, phase-locked loop, PLL, SiGe, VCO.

I. INTRODUCTION

Today’s commercially available highest capacity optical transmission systems are based on multiple 10-Gb/s time-division multiplexing (TDM) channels. These systems are expected to be insufficient to meet the rapidly increasing demands for higher bandwidth in the foreseeable future. The economically achievable transmission capacity of these wavelength-division multiplexing (WDM) systems is currently limited to 1.6 Tb/s, assuming 160 parallel 10-Gb/s TDM channels in the C- and L-band at a channel spacing of 50 GHz. This corresponds to a spectral efficiency of 0.2 (b/s)/Hz. By increasing the channel bit rate to 40 Gb/s per TDM channel, the fiber capacity can be better utilized. With the spectral efficiency increased to 0.4 (b/s)/Hz, the total transmission capability is 3.2 Tb/s, assuming 80 parallel 40-Gb/s channels and 100-GHz channel spacing.
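The capacity and spectral-efficiency figures quoted above follow from simple arithmetic, which the sketch below reproduces. The function name `wdm_capacity` is hypothetical; the channel counts, bit rates, and spacings are the ones given in the text.

```python
def wdm_capacity(channels, rate_gbps, spacing_ghz):
    """Return (total capacity in Tb/s, spectral efficiency in (b/s)/Hz)."""
    total_tbps = channels * rate_gbps / 1000.0
    efficiency = rate_gbps / spacing_ghz  # Gb/s per GHz equals (b/s)/Hz
    return total_tbps, efficiency


# 160 channels of 10 Gb/s on a 50-GHz grid versus 80 channels of 40 Gb/s on 100 GHz
for channels, rate, spacing in [(160, 10, 50), (80, 40, 100)]:
    total, eff = wdm_capacity(channels, rate, spacing)
    print(f"{channels} x {rate} Gb/s @ {spacing} GHz: {total:.1f} Tb/s, {eff:.1f} (b/s)/Hz")
```

Both configurations occupy the same 8 THz of optical bandwidth (160 × 50 GHz = 80 × 100 GHz), so the doubling of capacity comes entirely from the higher spectral efficiency.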

Manuscript received March 26, 2001; revised July 15, 2001. M. Reinhold, C. Dorschky, E. Rose, and F. Kunz were with Lucent Technologies, Optical Networking Group, D-90411 Nürnberg, Germany. They are now with CoreOptics GmbH, D-90411 Nürnberg, Germany. R. Pullela was with Lucent Technologies, Bell Labs, Murray Hill, NJ. He is now with Gtran Inc., Westlake Village, CA 91362 USA. P. Mayer and T. Link are with Lucent Technologies, Optical Networking Group, D-90411 Nürnberg, Germany. Y. Baeyens is with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974 USA. J.-P. Mattia was with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974 USA. He is now with Big Bear Networks, Sunnyvale, CA 94086 USA. Publisher Item Identifier S 0018-9200(01)09325-8.

Additionally, 40-Gb/s TDM will become more cost effective, as the number of optical ports is reduced by a factor of 4 compared to 10-Gb/s TDM, resulting in fewer price-determining optical components, smaller system footprint, and reduced maintenance costs. Regarding next-generation 40-Gb/s TDM links, the clock and data recovery (CDR) IC is a key electronic component, which strongly determines the overall transmission performance. 40-Gb/s TDM designs must be architecturally robust and manufacturable to compete with 10-Gb/s TDM systems. Accordingly, a fully integrated phase-locked loop (PLL)-based approach with self-aligning bang-bang phase detector (PD) is employed in this work. The IC is fabricated in a production state-of-the-art SiGe heterojunction bipolar technology (HBT) which provides advantages with respect to the achievable level of integration, yield, cost-effectiveness, and process stability compared to III-V process technologies.

II. CLOCK AND DATA RECOVERY EMBEDDED IN THE OPTICAL LINK

The 40-Gb/s TDM optical link employs a 4:1 multiplexing scheme, as shown in the block diagram in Fig. 1. At the receiver, the incoming optical signal is first amplified by an optical preamplifier (OA), converted into electrical pulses by the photodiode, and then directly feeds the CDR/DEMUX. Data recovery is accomplished by the first 1:2 DEMUX. In the PLL-based clock recovery approach presented here, the PD output forces the receive-side voltage-controlled oscillator (VCO) to track the phase of the incoming data signal. The combination of the PD and 1:2 DEMUX function allows the use of a 20-GHz half-bit-rate clock. This half-bit-rate architecture is explained in more detail in Section IV. This paper focuses on the CDR/DEMUX IC. However, the remaining basic functions such as the 4:1 multiplexer (MUX) and the driver IC have also been realized in this work program in SiGe HBT and GaAs high-electron-mobility transistor (HEMT) technology, respectively.

III. PROCESS TECHNOLOGY

The CDR/DEMUX IC presented in this work was designed in a state-of-the-art SiGe HBT process with 72-GHz f_T and 74-GHz f_max [1]. SiGe HBT technology is better suited to a high level of integration than III-V technologies. The process features four metal layers in total, including a thick metal layer on top.

0018–9200/01$10.00 © 2001 IEEE


Fig. 1. Clock and data recovery IC embedded in the optical link.

Fig. 2. CDR/DEMUX architecture.

Small-scale integrated analog and digital building blocks implemented in this process have been demonstrated for 40-Gb/s operation [2], [3].

IV. CDR/DEMUX ARCHITECTURE

The half-bit-rate architecture of the CDR/DEMUX IC (Fig. 2) is based on the concept reported in [4] and has already demonstrated its functionality for 10-Gb/s applications. The nonlinear bang-bang PD described in [5] was modified for interlaced operation. The combination of the PD and the 1:2 DEMUX functions allows the use of a half-bit-rate clock running at 20 GHz.

The three upper eye diagrams in Fig. 3 illustrate the basic principle of a common bang-bang PD. For the basic realization, three samples of the incoming data signal are necessary. In locked condition, two of them sample two consecutive bits while the third samples the data transition between them, as indicated in the first eye diagram.

In this implementation, two modifications to the common bang-bang PD are made. First, the 1:2 DEMUX and phase-detection function are combined so that a half-bit-rate clock is used. However, for sampling in the bit transition, a four-phase clock is essential. Second, unlike in previous approaches [6], six samples are generated in order to process every single data transition, as indicated in the fourth eye diagram in Fig. 3. This increases the maximum PD gain, reducing the jitter generation compared to a single-edge PD. The PD output signals are derived from logical operations on the six samples after their synchronization with the clock phase C0.

In this design, an accurate four-phase clock at 20 GHz is generated from the 40-GHz VCO output using a symmetrically loaded 2:1 frequency divider. Additionally, a limiting amplifier is implemented to improve the input sensitivity. It feeds the 40-Gb/s input data to the four parallel latch chains generating the six samples. To process the two 1:2 demultiplexed 20-Gb/s signals with commercially available 10-Gb/s DEMUX ICs, an additional 2:4 demultiplexer is included, which produces the 10-Gb/s output signals D00, D01, D10, D11, and the 5-GHz output clock C5G.

The PLL filter has a parallel proportional (P) integrating (I) structure (PI filter). The high-speed P filter aligns the phase relation between clock and data. Thus, the digital PD output pulses each generate a dynamic frequency step in the VCO frequency. Integration by the I filter with low bandwidth controls the static VCO frequency. Since the P filter and the I filter work at different speeds, both paths are decoupled from each other, resulting in a minimum lead time of the high-speed phase-control loop.

Two variants of the CDR/DEMUX exist. The higher integration variant of the CDR/DEMUX (CDR/DEMUX with VCO) also contains an on-chip 40-GHz VCO, the proportional filter of the PLL, and a frequency detector (FD) working as a low-speed frequency acquisition aid, whereas in the lower integration variant (CDR/DEMUX without VCO), these parts are external to the IC.

Fig. 3. PD principle.

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC

V. BUILDING BLOCKS

The most challenging building blocks of the CDR/DEMUX are the 40-GHz 2:1 frequency divider, the 40-GHz VCO, the limiting data input amplifier, and the transition sampling latches, including the four-phase 20-GHz clock tree.

A. 2:1 Dynamic Frequency Divider

Since the PD requires a 20-GHz four-phase clock, those clock signals can be most accurately generated by dividing a 40-GHz signal with a 2:1 frequency divider, resulting in differential 0° and 90° phase-shifted clock signals. Studies published earlier [7] using a similar technology show a maximum operating frequency of 42 GHz with a standard static frequency divider. To achieve a higher performance margin, a dynamic frequency divider similar to [8] was employed. As indicated in the circuit diagram (Fig. 4), the divided clock signal is stored by parasitic capacitances, which results in a higher operating frequency compared to a static frequency divider. As a drawback of the dynamic approach, the operating frequency exhibits a lower limit.

B. On-Chip 40-GHz VCO

The internal 40-GHz VCO is based on a differential Colpitts topology using microstrip transmission lines instead of a spiral inductor (Fig. 5). At an oscillation frequency of 40 GHz, a grounded microstrip line can be modeled quite accurately compared to other forms of inductors. The microstrip transmission lines exhibit an inductive input impedance, since for the odd mode they see a virtual ground termination. It is physically implemented with the signal line on the upper thick metallization layer and a shielding ground plane on the lowest metal layer. This yields maximum inductive input impedance per line length combined with minimum resistive losses, which optimizes the quality factor . As the PLL employs a high-speed and a low-speed filter in parallel, the VCO has two separate frequency tuning inputs. and feed two ac-couThese tuning inputs pled reverse-biased varactor diodes controlling the VCO frequency. The varactors have minimum size, resulting in a maximum VCO frequency modulation bandwidth as required by the high-bandwidth bang-bang PD architecture. It should be noted that the optimization of the free-running VCO phase noise is not the major design goal. Since the bang-bang PLL architecture provides very high bandwidth with

1940

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Fig. 4. Circuit diagram of the 2:1 dynamic frequency divider.

D. Clock Distribution and Latch

Fig. 5.

Circuit diagram of the 40-GHz on-chip VCO.

respect to loop gain, the VCO phase noise is suppressed by its open loop gain when the PLL is in lock. C. Limiting Data Amplifier The limiting data input amplifier (Fig. 6) employs cascaded chains of emitter followers (EF), transimpedance stages (TIS), and transadmittance stages (TAS) in accordance with the concept of impedance mismatch [9]. Layout aspects strongly influence the circuit performance; especially, it should not be degraded by signal interconnects. Since signal lines can be distinguished into critical and uncritical lines [9], long transmission lines are arranged between current interfaces consisting of a TAS and its load, which is either represented by an active TIS or by passive resistors. For this reason, the limiting data amplifier is implemented—both in schematic and layout—in the form of three separate amplifier blocks with a TIS–TAS interface. As the data signal has to be split into four latch chains (Fig. 2) and longer lines cannot be avoided, four TIS2 stages are driven in parallel by one TAS2.

Fig. 7 shows the circuit diagram of the latch. The latch structure can be subdivided into the latch core and the local clock input stage. The steepness of the PD curve is strongly determined by the metastability and the clock phase margin (CPM) of the latches that sample the bit transition when the PLL is in lock. Two high-current-biased emitter followers in the data path provide a high CPM and a small metastability region.

For optimal clock distribution, a local clock input stage consisting of termination resistors and two emitter followers is included in each latch cell, since a total clock line length of several millimeters cannot be avoided. For this reason, current interfaces between each clock buffer and the latches are employed. The concept of the clock distribution is illustrated in Fig. 8. The two clock buffers are based on open-collector transadmittance stages. As stated before, a local clock input stage consisting of termination resistors is included in every latch cell. Each clock buffer is loaded with ten latches. Impedance matching can be easily achieved by designing the termination resistors according to the number of loading latches. The value of the line impedance can be locally adapted with respect to signal splits, as shown in Fig. 8. If the line impedances are chosen accordingly (depending on the number of loading latches), the clock amplitude can be increased. This is due to the inductive peaking effect of the transmission line, as indicated in Fig. 9.

The layout of the latch, which is shown as part of Fig. 10, employs orthogonal data and clock inputs. This implementation minimizes line length on the high-speed data path by running a data channel directly through the cascaded latch cells (Fig. 10). In addition, clock channels run beside the cells to simplify the clock-tree routing.

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC


Fig. 6. Circuit diagram of the 40-Gb/s limiting data amplifier.

Fig. 7. Circuit diagram of the latch.

VI. PHYSICAL REALIZATION AND MEASUREMENT RESULTS

The CDR/DEMUX with VCO dissipates 5.4 W in total. All high-speed blocks operate at 5.5-V supply voltage and the 2:4 DEMUX at 4.2-V supply voltage. The whole die (Fig. 11) occupies an area of 3005 m . Closed-loop PLL measurements are performed with mounted ICs using single-ended 4-b interleaved OC192 SONET signals with pseudorandom bit sequence (PRBS) payload for the electrical back-to-back and optical transmission experiments. For these measurements, the CDR/DEMUX and the external components are mounted on a duroid substrate using standard wire bonding, as shown in Fig. 12. The IC is placed into a cavity, reducing the bondwire length. The substrate is attached onto a grounded brass box. The 40-Gb/s input data is fed single-ended into the high-speed box via a V-connector.

A. 2:1 Dynamic Frequency Divider

Fig. 13 illustrates the input sensitivity of the dynamic frequency divider in single-ended operation. A maximum operating frequency of more than 44.5 GHz is observed, giving sufficient margin with respect to temperature and process variations. Since the self-oscillating frequency of the divider is roughly 41 GHz divided by 2, the circuit is very sensitive in the frequency range centered around the nominal operating frequency of 40 GHz. The dynamic principle of the frequency divider results in a minimum operating frequency of roughly 33 GHz at an input power of 1 dBm.

B. On-Chip 40-GHz VCO

A similarly centered behavior can be measured for the on-chip VCO, represented by the tuning characteristic shown in Fig. 14. The VCO tuning range extends from 37.7 to 41.2 GHz.

C. Bang-Bang PD

The measured PD transfer function is illustrated in Fig. 15. The curve shows a fairly steep slope at the lock-in point, which implies good sampling capability at the data transition and a small metastability region, corresponding to the observed high CPM of 240°.

D. Jitter Generation and Jitter Tolerance

The rms jitter of the recovered clock is measured using a sampling oscilloscope. By excluding the trigger jitter of the test equipment from the original measurement result of less than 1.2 ps, the overall CDR rms jitter generation can be calculated to be approximately 0.7 ps.

For SONET and SDH systems, jitter tolerance masks are defined. As standardization is not finalized for 40-Gb/s systems yet, the tolerance mask is extrapolated from the 10-Gb/s specification. The measured jitter tolerance curve, given in Fig. 16, exhibits sufficient margin relative to the extrapolated BELLCORE mask [10]. For jitter frequencies higher than the corner frequency of the PLL, the jitter tolerance is determined by the CPM of the first latches and the PLL has no influence. For jitter frequencies lower than the corner frequency, the jitter tolerance decreases with 20 dB per decade. Large amounts of jitter can be tolerated at low frequencies, since the filter provides high gain in this frequency range.
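The jitter-generation figure above is obtained by removing the oscilloscope trigger jitter from the raw reading. A minimal sketch of that step follows, assuming the two contributions are independent and therefore add in quadrature (root-sum-square); this is a common convention, not a procedure spelled out in the text. The function names are hypothetical, and the trigger-jitter value is not legible in this copy, so the example only computes the value implied by the two figures that are quoted (1.2 ps raw, about 0.7 ps after correction).

```python
import math


def remove_instrument_jitter(measured_rms_ps, instrument_rms_ps):
    """Subtract instrument jitter in quadrature (assumes independent sources)."""
    if instrument_rms_ps > measured_rms_ps:
        raise ValueError("instrument jitter cannot exceed the measured jitter")
    return math.sqrt(measured_rms_ps ** 2 - instrument_rms_ps ** 2)


def implied_instrument_jitter(measured_rms_ps, device_rms_ps):
    """Instrument jitter implied by a raw reading and the corrected result."""
    return math.sqrt(measured_rms_ps ** 2 - device_rms_ps ** 2)


# Figures quoted in the text: raw reading below 1.2 ps, corrected value ~0.7 ps.
implied_trigger_ps = implied_instrument_jitter(1.2, 0.7)
recovered_ps = remove_instrument_jitter(1.2, implied_trigger_ps)
print(f"implied trigger jitter: {implied_trigger_ps:.2f} ps")
print(f"corrected CDR jitter:   {recovered_ps:.2f} ps")
```

Under this assumption the quoted numbers are consistent with a trigger jitter of roughly 1 ps. Since the raw reading is only bounded ("less than 1.2 ps"), that value should be read as an upper estimate.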


Fig. 8. Block diagram of the 20-GHz quadrature clock distribution.

Fig. 9. Rough odd-mode simplification of the clock distribution.

E. Electrical Sensitivity (BER)

The bit-error rate (BER) curve, as a measure of the overall electrical CDR performance, is given in Fig. 17. For the CDR/DEMUX with external VCO, a very high sensitivity of 28-mV single-ended voltage swing is measured at the target BER. Due to minor modifications of the limiting amplifier resulting in lower bandwidth, the CDR/DEMUX with internal VCO provides slightly less sensitivity. For the same BER, a 42-mV single-ended voltage swing is necessary. A contribution of the internal VCO to the performance degradation can be ruled out, since a variant with external VCO and modified limiting amplifier showed similar performance degradation. Such high input sensitivity allows a direct connection of the

Fig. 10. Layout of the entire clock tree and enlargement of the latch.

photodiode to the CDR/DEMUX, as the optical measurement results demonstrate.

F. Performance in System Application

The performance of the CDR/DEMUX embedded in an optical fiber link (refer to Fig. 1) can be characterized by the optical signal-to-noise ratio (OSNR) measurement, as shown in Fig. 18. This is due to the fact that in the given configuration the sensitivity is limited by the noise of the OA. A 50-Ω terminated photodiode is directly connected to the CDR/DEMUX without any electrical amplifier in between, so


Fig. 11. CDR/DEMUX micrograph (CDR/DEMUX with VCO).

Fig. 12. CDR/DEMUX test fixture with mounted CDR/DEMUX.

Fig. 13. Input sensitivity of the 2:1 dynamic frequency divider.

Fig. 14. Tuning curve of the 40-GHz on-chip VCO.

Fig. 15. Measured PD transfer function.

Fig. 16. Jitter tolerance measurement result.

Fig. 17. BER measurement.

that the CDR/DEMUX works as a single-chip-receiver IC. An externally modulated signal is transmitted over an 80-km optical-fiber link. To achieve a BER of 10 , a minimum OSNR of 23.3 dB is required.


Fig. 18. OSNR measurement.

Claus Dorschky received the Dipl.-Ing. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1986. He has been working in the development department for high-speed optical transmission systems at Philips Kommunikation Industries (later Lucent Technologies), Nürnberg, Germany, for 14 years. His research interests include design and integration of analog and mixed-signal full custom ICs for 10- and 40-Gb/s as well as integration of optical receivers and transmitters into single wavelength and DWDM transmission systems at those bitrates. In early 2001, he cofounded CoreOptics Inc., Nürnberg, Germany.

VII. CONCLUSION

In this paper, the implementation of a fully integrated CDR/DEMUX for 40-Gb/s TDM application in a state-of-the-art SiGe HBT process has been demonstrated. Key success factors in the design are, first, the robust half-bit-rate architecture with bang-bang PD, and second, its implementation by partitioning the whole chip into key building blocks interconnected via robust current interfaces. Therefore, schematic design and layout have to be done concurrently. This concept has been applied throughout the IC, but mainly in the 40-Gb/s data and the 20-GHz clock distribution. Optical system measurements show the feasibility of the CDR/DEMUX as a single-chip-receiver IC.

REFERENCES

[1] T. F. Meister et al., “SiGe base bipolar technology with 74-GHz f and 11-ps gate delay,” Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 739–742, Dec. 1995.
[2] J. Müllrich, T. F. Meister, M. Rest, W. Bogner, A. Schöpflin, and H.-M. Rein, “40-Gbit/s transimpedance amplifier in SiGe bipolar technology for the receiver in optical-fiber links,” Electron. Lett., vol. 34, pp. 452–453, 1998.
[3] A. Felder, M. Möller, M. Wurzer, M. Rest, T. F. Meister, and H.-M. Rein, “60-Gbit/s regenerating demultiplexer in SiGe bipolar technology,” Electron. Lett., vol. 33, pp. 1984–1985, 1997.
[4] J. Hauenschild et al., “A plastic packaged 10-Gb/s BiCMOS clock and data recovery 1:4-demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[5] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[6] M. Wurzer et al., “A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz f silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 34, pp. 1320–1324, Sept. 1999.
[7] M. Wurzer et al., “42-GHz static frequency divider in a Si/SiGe bipolar technology,” in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 123–123.
[8] Z. Lao et al., “55-GHz dynamic frequency divider IC,” Electron. Lett., vol. 34, no. 20, pp. 1973–1974, 1998.
[9] H.-M. Rein et al., “Design considerations for very-high-speed Si-Bipolar ICs operating up to 50-Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[10] “SONET OC-192 Transport System Generic Criteria,” Bellcore, GR-1377-CORE, Dec. 1998.

Mario Reinhold was born in Mülheim/Ruhr, Germany, in 1972. He received the Diplom-Ingenieur degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998. He joined Lucent Technologies, Optical Networking Group, Nürnberg, Germany, in 1998, where his activities focused on the development of various analog and digital high-speed bipolar ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic communication systems. Since 2001, he has been with CoreOptics Inc., Nürnberg, Germany, working on a next-generation 40-Gb/s chipset.

Eduard Rose was born in Kischinjow, Moldova, in 1973. He received the Dipl.-Ing. degree in electrical engineering from Ruhr-University Bochum, Germany, in 1998. He joined Lucent Technologies, Optical Networking Group, Nürnberg, Germany, in 1999, where he started developing different analog and digital high-speed bipolar ICs for SDH/Sonet systems. He is currently with CoreOptics Inc., Nürnberg, working on a second-generation chipset for a 40-Gb/s optical link system.

Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in 1993. From 1993 to 1998, he worked as a graduate student researcher at the University of California, Santa Barbara. During this period, he received M.S. and Ph.D. degrees in electrical engineering, studying device physics and high-speed circuit design. During 1998–2000, he worked as a Member of Technical Staff at Bell Laboratories, Lucent Technologies, Murray Hill, NJ, designing high-speed ICs for fiber-optic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.

Peter Mayer was born in Germany on July 11, 1964. He received the Dipl.-Ing. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1989. In 1989, he joined Philips Kommunikation Industries, Nürnberg, Germany, and has been working on 622-Mb/s optical interface circuits. In 1998, he started developing clock and data recovery ICs for 10- and 40-Gb/s applications. He is currently a Technical Manager at Lucent Technologies GmbH, Nürnberg, where he is responsible for high-speed optical/electrical module design and integration for optical transmission systems.

Frank Kunz was born in Bad Sobernheim, Germany, in 1970. He received the Dipl.-Ing. degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998. Until February 2001, he was with Lucent Technologies GmbH, Nürnberg, Germany, developing high-speed bipolar ICs for advanced 10-Gb/s optical communication links. He is currently with CoreOptics Inc., Nürnberg, and is working on a second-generation chipset for a 40-Gb/s optical link system.


Yves Baeyens (S’89–M’96) received the M.S. and Ph.D. degrees in electrical engineering from the Catholic University, Leuven, Belgium, in 1991 and 1997, respectively. His Ph.D. research was performed in cooperation with IMEC, Leuven, and treated the design and optimization of coplanar InP-based dual-gate HEMT amplifiers, operating up to W-band. After a year and a half stay as a Visiting Scientist at the Fraunhofer Institute for Applied Physics, Freiburg, Germany, he is currently a Technical Manager in the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His research interests include the design of mixed analog–digital circuits for ultrahigh-speed lightwave and millimeter-wave applications.

Thomas Link was born in 1968 in Nürnberg, Germany. He received the Dipl.-Ing. (FH) degree in electrical engineering from Georg Simon Ohm Fachhochschule, Nürnberg, Germany, in 1991. In 1991, he joined Philips Kommunikation Industrie AG (now Lucent Technologies), Nürnberg. He is Member of Technical Staff in the high-speed ASIC group and designed various high-speed ASICs, circuit packs, and firmware for SDH/Sonet systems.


John-Paul Mattia received the B.S., M.S., E.E., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge. He began working in high-speed electronics at MIT Lincoln Laboratory in 1989. In 1996, he joined Texas Instruments Inc. in the DSP R&D organization. From 1997 to 2000, he worked in the High-Speed Electronics Group of Lucent Bell Labs, designing and testing circuits for lightwave communication systems. Since July 2000, he has been at Big Bear Networks, Sunnyvale, CA, where he is Chief Technical Officer of Electronics.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000


A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate

Yuriy M. Greshishchev, Member, IEEE, Peter Schvan, Member, IEEE, Jonathan L. Showell, Member, IEEE, Mu-Liang Xu, Member, IEEE, Jugnu J. Ojha, Member, IEEE, and Jonathan E. Rogers, Student Member, IEEE

Abstract—A silicon germanium (SiGe) receiver IC is presented here which integrates most of the 10-Gb/s SONET receiver functions. The receiver combines an automatic gain control and clock and data recovery circuit (CDR) with a binary-type phase-locked loop, 1 : 8 demultiplexer, and a 2^7 − 1 pseudorandom bit sequence generator for self-testing. This work demonstrates a higher level of integration compared to other silicon designs as well as a CDR with SONET-compliant jitter characteristics. The receiver has a die size of 4.5 × 4.5 mm² and consumes 4.5 W from 5 V.

Index Terms—Clock and data recovery (CDR), jitter generation, jitter tolerance, jitter transfer, phase detector, phase-locked loop (PLL), SONET, VCO.

Fig. 1. 10-Gb/s receiver architecture. Dotted box shows components integrated in the SiGe receiver IC presented in the paper.

I. INTRODUCTION

A TYPICAL fiber-optic SONET receiver contains a pin-diode with transimpedance (TZ) amplifier, a wide dynamic range automatic gain control amplifier (AGC), and a clock and data recovery circuit (CDR) with a demultiplexer. The introduction of dense wave-division-multiplexed (DWDM) systems has put a high demand on receiver production. A high level of 10-Gb/s component integration, as opposed to using a filter-based CDR architecture [1], is required along with self-testing capabilities to reduce receiver cost, module size, and power dissipation. One of the major difficulties in the integration of a 10-Gb/s receiver is to achieve jitter characteristics compliant with the SONET requirements, such as the Bellcore recommendations for the OC192 system [2]. To the authors’ knowledge, none of the previously reported [3]–[5] 10-Gb/s receiver ICs with an integrated clock and data recovery circuit (CDR) demonstrated all of the SONET-compliant jitter characteristics. While sub-picosecond jitter generation was previously confirmed in the SONET CDR [5], another important question is whether all of the receiver components can be integrated on a die without sensitivity and jitter performance degradation. A fully integrated SiGe receiver IC, presented in the paper, combines CDR, AGC, 1 : 8 demultiplexer and

pseudorandom bit sequence (PRBS) generator for self-testing, as shown in the dotted-line box in Fig. 1 [6]. Receiver performance, with the IC mounted into a test fixture, was verified in a data-recovery mode up to 12.5 Gb/s and in a CDR mode at 9.1 Gb/s (only limited by the VCO maximum oscillation frequency after packaging). The OC192 10-Gb/s SONET-compliant jitter characteristics of the CDR were verified on-wafer with a membrane probe card and with a jitter analyzer from Anritsu. Phase-noise characteristics have also been measured to confirm the CDR’s sub-picosecond rms jitter performance. The measured 10-Gb/s maximum receiver sensitivity, de-embedded for losses in the test fixture, is 4.5 mV at the target bit-error rate (BER), measured at the demultiplexer (DEMUX) output.

In Section II, the binary CDR architecture used in the receiver is briefly analyzed as compared to a linear-type CDR, and a design method to meet SONET jitter requirements is presented. Then in Section III, the full receiver architecture and the building-block implementation details are discussed. In Section IV, the IC die fabrication features are described. Finally, in Section V, measured results are presented.

II. BINARY CDR IN SONET RECEIVER

A. Binary CDR Versus Linear CDR

Manuscript received April 17, 2000; revised June 29, 2000. Y. M. Greshishchev and P. Schvan are with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. J. L. Showell was with Nortel Networks and is currently with Quake Technologies Inc., Ottawa, ON, Canada. M.-L. Xu was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He is now with Conexant Systems, San Diego, CA. J. J. Ojha was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He is now with Caspian Networks, Palo Alto, CA. J. E. Rogers is with The University of Toronto, Toronto, ON M5S 3G4, Canada. Publisher Item Identifier S 0018-9200(00)09475-0.

The CDR published in [5] uses a linear-type PLL approach [Fig. 2(a)], while the CDR presented here is based on a binary PLL [Fig. 2(b)]. In the binary PLL, a binary Alexander-type [7] phase detector (PD) is used as compared to the Hogge-type PD [8] in a linear-type PLL. Examples of using binary architecture in optical receiver ICs can be found in [9], [10]. Binary PD produces two digital outputs, UP and DOWN, to signal if the data is early or late with respect to the VCO clock. To control the VCO, the binary information is split into two loops as suggested in [11]. The phase-control loop is formed with the UP and DOWN

0018-9200/00$10.00 © 2000 IEEE


Fig. 2. Two basic self-aligned CDR architectures. (a) With a linear PLL. (b) With a binary PLL.

outputs directly modulating the VCO frequency with a frequency step f_bb ("bb" denotes “bang-bang,” another name for the binary PLL architecture) via the bang-bang frequency tune input. The frequency loop of the binary PLL uses the binary outputs, integrated with a charge pump and capacitor, to control the second tune input of the VCO. To reduce jitter generation in the absence of data transitions (during long runs of zeros or ones), a tri-state charge pump was employed. For the same reason, a tri-state is introduced to the VCO bang-bang control input, which is no longer binary but ternary: in the tri-state, no frequency step is applied.

Table I shows the comparison of the two PLL receiver architectures. The binary CDR is less demanding on the “analog” features of an IC technology and, in principle, has only one critical component: the binary phase detector (BPD), where sub-picosecond time resolution is required. The ring-oscillator-type VCO is recommended to reduce delay in the PLL loop and, therefore, jitter generation in the CDR, as explained later in the paper. The VCO phase noise is less critical in a binary CDR because of the relatively wide PLL bandwidth. In a linear-type PLL, jitter transfer characteristics can be analyzed using linear PLL theory (see, for example, [5]). A binary-type PLL has nonlinear jitter transfer characteristics, and its analysis has not been presented in the technical literature. The following subsection describes the binary PLL design method used in the receiver design.

B. Binary CDR Jitter Characteristics

In SONET applications, three main jitter characteristics are important: 1) Jitter Tolerance; 2) Jitter Transfer; 3) Jitter Generation. The Jitter Generation and loop stability were first analyzed by R. Walker et al. [11]. This work also suggested loop decomposition into a frequency-control (low-frequency) part and a phase-control (high-frequency) part. In general, a binary PLL is similar in system behavior to a double-integration delta modulator with prediction [12] acting in the data-phase domain. Based on the analogy with the signal frequency response in a delta modulator, the phase-jitter transfer function has a nonlinear (slew-rate limited) mechanism for the phase-jitter frequency response. It is a single-pole-like response with the bandwidth inversely proportional to the input jitter amplitude, as expressed by (1). The quantities entering (1) are the 3-dB bandwidth of the binary PLL jitter transfer function, the jitter amplitude, the bang-bang frequency step, and the average data transition density factor (which is maximum for the pattern with the highest transition density).

Fig. 3. CDR analytical jitter tolerance as compared to SONET mask.

Fig. 4. Trade-off for the frequency step in binary CDR.

The shape of the Jitter Tolerance function can be characterized by means of the jitter-tolerance scale function [5]. The modeled binary PLL jitter tolerance response is shown in Fig. 3 for two frequency steps. For comparison, the Bellcore SONET mask is also shown on the same plot, with the corresponding unit interval (UI) values on the right side of the graph.¹ To satisfy the mask, the jitter transfer bandwidth defined by (1) should be set above 4 MHz at the mask jitter amplitude and the minimum average data transition density.

¹In a 10-Gb/s system, UI = 100 ps.

The Jitter Generation in a binary CDR is proportional to the frequency step and to the delay in the PLL loop, measured as the number of 100-ps clock periods required to propagate a signal from the phase detector output to its input, as expressed by (2). Equations (1) and (2) were used to find a frequency step and a delay acceptable for the SONET applications as

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC


TABLE I COMPARISON OF LINEAR AND BINARY TYPE CDR ARCHITECTURES

shown in Fig. 4. The minimum value for the frequency step is determined by the Jitter Tolerance minimum bandwidth requirements (4 MHz); the maximum value is limited by jitter generation (10 ps is recommended [2]). In the design presented here, a fixed jitter-generation budget (in ps) was assumed. To reduce jitter generation, the delay should be minimized. This makes the ring-type VCO preferable in the binary CDR as compared to an LC-tank-based VCO, where the tuning delay is larger due to the usually higher Q-factor of the LC-tank.

III. RECEIVER ARCHITECTURE

A. Architecture

The receiver architecture is shown in Fig. 5. It combines an AGC and a binary CDR with a 1 : 8 demultiplexer and a PRBS generator for self-testing. The receiver recovers 10-Gb/s data and a 10-GHz clock, and produces eight demultiplexed 1.25-Gb/s CML data outputs with a 1.25-GHz clock. The PRBS generator allows functional testing of the CDR and subsequent circuits. A PRBS clock (CLK) is required for testing. In the test mode, the PRBS output is enabled to drive the CDR data bus. In the receiver mode, the AGC output is enabled. The recovered 10-Gb/s data and 10-GHz clock appear at the recovered data (DATA REC BUS) and clock (CLK REC BUS) buses, driving a 1 : 8 DEMUX circuit. A clock signal can also be supplied externally (CLKx) for data recovery operation only.

B. AGC

The block diagram of the AGC is shown in Fig. 6. The AGC has total linear gain range from 3 to 20 dB with a maximum input of 1.7 V . The AGC has two variable gain

stages implemented with output current steering in a differential pair [13]. A further stage has a fixed gain and also provides an open-collector transmission line interface to drive the CDR data bus. To alleviate conflicting requirements for large data swing and low noise figure at low input amplitudes, the AGC has two gain ranges: −7 to 7 dB (low gain range) and 7–20 dB (high gain range). Two differential pairs with a “large” and a “small” emitter degeneration resistor are used to switch the gain ranges. The measured AGC S11 is better than −15 dB in a frequency range up to 10 GHz, and the noise figure is 13.5 dB. The ac bandwidth is adjustable in a range of 8–10 GHz.

C. CDR

As compared to the original version of the binary CDR [7], [11], in the CDR presented here the data decision and clock recovery processes are split. This allows for independent optimization of the data decision threshold (slicing) without affecting the clock recovery process. There are four decision channels in the CDR, all driven by the CDR data bus. Channels 1 and 2 are identical decision circuits, as shown in the block diagram of Fig. 7(a). The additional decision channel allows operation with two different input slicing levels. The data decision threshold is set by a differential slicer circuit based on an emitter follower [Fig. 7(b)]. In a long-haul receiver application, a high-performance limiting amplifier [2 in Fig. 7(a)] is required. Note that the AGC stabilizes only a long-time averaged amplitude measured at the AGC output with a peak detector (not shown in Figs. 5 and 6). The limiting amplifier stage was designed for 40-dB gain with a bandwidth of more than 16 GHz and input AM to output PM conversion of less than 1 ps in a 20-dB input dynamic range. Two 20-dB gain-limiting amplifier stages similar to [14] were employed. The digital sampler is based on a master–slave–master (MSM) flip-flop

Fig. 5. SiGe receiver architecture.

Fig. 6. AGC block diagram.

configuration [Fig. 7(c)]. It helps to reduce the latching metastability region and to increase the clock phase margin in the decision circuit, as opposed to a master–slave D-type flip-flop (DFF). The schematic and layout of the latch were optimized for a minimum latching time constant.

The BPD is formed with decision channels 3 and 4, and two DFF circuits. The BPD takes three data samples according to the timing diagram in Fig. 5. It generates a binary output with respect to the lead/lag phase of the VCO clock. The BPD uses only one edge of the data transition, and is set to a tri-state condition at the other edge or in the absence of data transitions according to a truth table (Table II). As a result, recovered clock jitter is not affected by asymmetry between the rising and the falling edges or by the incoming data pattern. The BPD phase resolution is critical to the CDR jitter performance. Latch metastability at time sample T0 (see Fig. 5) limits the resolution. This problem was circumvented by using an MSM-based decision circuit preceded by a limiting amplifier, as described above. Phase resolution better than 1 ps was measured in the BPD circuit, as shown in Fig. 8. A phase-delay sweep was arranged with two clock generators: 5.0125 GHz for the data and 10.0000 GHz

Fig. 7. Decision channel. (a) Block diagram: 1–slicer; 2–limiting amplifier; 3–digital sampler. (b) Slicer schematic. (c) Digital sampler schematic.

for the clock. Frequency shift of 12.5 MHz for the data provided binary pulses at the output of the phase detector with 40-ns period corresponding to a 100-ps delay sweep. To analyze the output pulses, a digital scope was synchronized from the phase-detector output using an HP 54118A trigger amplifier. This method allowed measurement of the output transition region with the accuracy of 0.5 ps per one data cycle (200 ps). Measured transition region at the phase detector output contains two data cycles, confirming phase detector time resolution to be less than 1 ps. The CDR frequency-control loop and the phase-control loop are separated as described in Section II. The bang-bang part of


TABLE II TRUTH TABLE OF BINARY PHASE DETECTOR

Fig. 9. Ring-type VCO block diagram.

Fig. 8. BPD measured time resolution.

the PLL controls the recovered clock phase via the bang-bang tune input of the VCO. The frequency loop includes the charge pump and an external integration capacitor (pins C1 and C2). The VCO is a ring-oscillator type with the architecture shown in Fig. 9. A mixer-type delay cell is used to control the oscillation frequency. The mixer cell is split into a fine-tune part (for the internal frequency loop) and a coarse-tune part (to compensate for process variation). Care was taken to provide symmetrical bang-bang frequency steps with respect to the tri-state. All of the VCO control inputs were implemented with high-impedance pMOS buffers. A pMOS-based charge pump was employed [5].
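The numbers quoted for the BPD phase-delay sweep (12.5-MHz shift, 40-ns output period, 100-ps sweep, 0.5 ps per data cycle) can be cross-checked with a few lines of arithmetic. The sketch below is only such a check, assuming the 5.0125-GHz source acts as a square-wave "data" signal whose transitions occur at twice its frequency; the function and variable names are hypothetical.

```python
F_DATA_HZ = 5.0125e9   # generator used as the data source
F_CLK_HZ = 10.0e9      # generator used as the sampling clock


def sweep_parameters(f_data_hz, f_clk_hz):
    """Return (data offset, PD output period, delay step per data cycle)."""
    # Offset of the data generator from exactly half the clock frequency.
    data_offset_hz = f_data_hz - f_clk_hz / 2.0
    # Data transitions occur at 2 * f_data, so they slip against the clock
    # at twice the offset; one full slip is one PD output period.
    output_period_s = 1.0 / (2.0 * data_offset_hz)
    # Each data cycle spans almost two clock periods; the difference is
    # the delay increment seen by the phase detector per data cycle.
    step_per_cycle_s = 2.0 / f_clk_hz - 1.0 / f_data_hz
    return data_offset_hz, output_period_s, step_per_cycle_s


offset_hz, period_s, step_s = sweep_parameters(F_DATA_HZ, F_CLK_HZ)
# Total delay swept during one output period.
swept_delay_s = step_s * period_s * F_DATA_HZ

print(f"data offset:        {offset_hz / 1e6:.1f} MHz")
print(f"PD output period:   {period_s * 1e9:.1f} ns")
print(f"step per data cycle {step_s * 1e12:.3f} ps")
print(f"delay swept:        {swept_delay_s * 1e12:.1f} ps")
```

Under these assumptions the step comes out just under 0.5 ps, so an output transition region spanning two data cycles corresponds to a resolution of about 1 ps, in line with the figure quoted in the text.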

Fig. 10. 1 : 8 demultiplexer block diagram.

Fig. 11. 2^7 − 1 PRBS generator block diagram.

D. 1 : 8 Demultiplexer

The 1 : 8 DEMUX (Fig. 10) is similar in architecture to the design presented in [15]. Seven 1 : 2 demultiplexer circuits are cascaded, with each stage optimized for the clock frequency required. Each 1 : 2 demultiplexer consists of a master–slave–master flip-flop to capture the lead bit on the positive edge of the clock and a master–slave flip-flop to capture the second bit using the negative edge of the clock. Utilizing an extra latch in the MSM flip-flop ensures that the 1 : 2 data outputs are aligned for further processing. The frequency of the incoming CDR clock is divided by two at each demultiplexer stage, with a delay equal to the data delay in the 1 : 2 block.

E. Built-in 2^7 − 1 PRBS Generator

The 2^7 − 1 PRBS generator (Fig. 11) was implemented using the standard polynomial equation in a parallel form. The parallel form avoids the necessity of distributing a 10-GHz clock, as would be required if using a shift-register-type PRBS.
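To make the serial-versus-parallel distinction concrete, the following is a minimal behavioral sketch of a 2^7 − 1 PRBS generator. The polynomial itself is not legible in this copy, so the sketch assumes the commonly used x^7 + x^6 + 1; all function names are hypothetical, and the inner loop of `prbs7_step8` merely stands in for the unrolled XOR logic that a hardware parallel form would use to produce eight bits per low-rate clock tick.

```python
from itertools import islice


def prbs7_bits(state=0x7F):
    """Serial form: one bit per (full-rate) clock, polynomial x^7 + x^6 + 1."""
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def prbs7_step8(state):
    """Parallel form: advance the 7-bit state by eight bits in one call.

    The loop models combinational logic that hardware would unroll,
    so that only a 1/8-rate clock is needed.
    """
    word = []
    for _ in range(8):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        word.append(new_bit)
    return state, word


def prbs7_words(n_words, state=0x7F):
    """Return n_words 8-bit words, one word per low-rate clock tick."""
    words = []
    for _ in range(n_words):
        state, word = prbs7_step8(state)
        words.append(word)
    return words


serial = list(islice(prbs7_bits(), 254))
parallel = [bit for word in prbs7_words(16) for bit in word]
print("sequence repeats after 127 bits:", serial[:127] == serial[127:])
print("parallel output matches serial: ", parallel == serial[:128])
```

With a 10-Gb/s output, the serial form would need a 10-GHz clock, whereas the 8-bit parallel form is clocked at one eighth of that rate and its words are then multiplexed onto the data bus.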


Fig. 13. SiGe receiver mounted into microwave-style test fixture.

Fig. 12. SiGe receiver die micrograph.

In an 8-bit parallel form, the clock frequency is reduced to 1.25 GHz. Similar to the AGC, the output of the multiplexer provides open collector transmission line interface to drive the CDR data bus. The disadvantage of a parallel architecture is penalty in the die area and the power consumption. In normal operation (the AGC transmits data to the CDR) the PRBS power supply is turned off, reducing the overall power consumption of the receiver. F. CDR Simulation Because of the nonlinear jitter response of a binary CDR, hierarchical numerical analysis was an important part of the receiver IC design. Four levels of PLL analysis were carried out: analytical, behavioral, schematic level, and post-layout extracted circuit with distributed parasitics. The last three levels are HSPICE-based. A behavioral library of linear and digital components was developed. Analytical models of jitter transfer and jitter tolerance are based on simplified binary-PLL theory, as described above. IV. FABRICATION The receiver IC was implemented in IBM”s SiGe technology GHz, GHz). The microphotograph of the ( mm . The major die is shown in Fig. 12. The die size is circuit building blocks were not only integrated into the receiver, but were implemented as individual IC components and tested. In the receiver IC, the building blocks were physically partitioned with a transmission line circuit and layout isolation interface similar to that presented in [5], [14]. Separate power supply systems with digital and analog grounds were routed. V. EXPERIMENTAL RESULTS The IC worked at first implementation with the VCO oscillation frequency 10% lower than simulated. The receiver IC was mounted into a microwave test fixture (Fig. 13) and was tested at 9.1 Gb/s (VCO oscillation frequency limit2 ) in a CDR mode and up to 12.5 Gb/s in a data-recovery mode or in internal PRBS test mode with an external clock. 
² Maximum oscillation frequency can be easily corrected by removing one delay stage in the VCO design of Fig. 9.

Fig. 14. SiGe receiver eye diagrams measured in CDR mode at 9.1 Gb/s. Input data: 80 mV p PRBS 2 1.

The carrier substrate size of cm in the test fixture was defined by the perimeter required for mounting I/O connectors in the housing metal box. A large number of the required I/O were used for testing purposes. The receiver IC does not require external components, except decoupling and integration capacitors mounted beside the die. The recovered clock and data eye diagrams at 9.1 Gb/s are shown in Fig. 14. The IC input sensitivity is less than 4.5 mV at measured at the 1 : 8 demultiplexer output with AGC gain set to 20 dB (data eye closure in the test fixture was de-embedded). The DATA : 8 transition distortion apparent in Fig. 14 is due to the long ribbon cable attached to the demultiplexed outputs of the test fixture. In mission mode, the receiver consumes 4.5 W from 5 V. The CDR IC performance was fully characterized at 10 Gb/s on-wafer with a probe card. The die micrograph of the CDR is shown in Fig. 15. It consists of an exact copy of the receiver CDR layout plus the output buffers located in the DEMUX partition. In all of the measurements, input data were supplied single-ended while the unused differential input was terminated with 50 Ω. Typical CDR eye diagrams measured with 20 mV PRBS data are shown in Fig. 16. The input sensitivity was measured to be 14 mV at as compared to 13.4 mV simulated considering thermal and shot noise in the decision channel. Phase noise of the recovered clock was measured with an HP 4352B as a power spectrum density (Fig. 17). 10-Gb/s input data were supplied with amplitude of 100 mV and

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC

Fig. 15. SiGe CDR IC die micrograph.

Fig. 16. CDR IC eye diagrams measured in CDR mode at 10 Gb/s.


Fig. 18. CDR jitter transfer.

Fig. 19. CDR jitter tolerance. Performance measured with the clock reference level modulation test is marked with symbol .

Fig. 17. Phase-noise comparison of the CDR recovered clock, free running VCO and data pattern generator (BERT) clock.

PRBS pattern. For comparison, the phase noise of the free-running VCO and of the data-pattern generator was also measured and is shown on the same plot. As expected in a high-performance CDR, the recovered-clock phase noise follows, with no error, the data-reference clock noise down to the CDR jitter noise floor at −110 dBc/Hz. Similar recovered phase noise was achieved in the CDR design with a linear PLL and LC-type VCO [5]. Numerically integrated phase noise of the recovered clock gives a jitter RMS value of 0.78 ps. Phase noise was found to be PRBS pattern independent up to a pattern of . The OC192 jitter compliant performance (at 9.95328 Gb/s) was verified with a jitter analyzer MX177 701 from Anritsu. Jitter generation (in 80-MHz bandwidth) was measured to be 5.4 ps p-p and 0.8 ps RMS as compared to 10 ps p-p or 1 ps RMS recommended by Bellcore [2]. The RMS jitter is very close to the 0.78-ps value obtained in the phase-noise measurement. Jitter transfer measurement (Fig. 18) showed, as predicted by modeling, single-pole-like characteristics with no jitter peaking. Jitter tolerance (Fig. 19) has a very wide safety margin for the SONET mask, with a minimum of 40 ps (15 ps is recommended). The shape of the measured CDR jitter tolerance response differs from the modeled one in Fig. 3 because of test setup limitations. This is seen from the measured jitter tolerance of the test setup (BERT) with no CDR in the data path (shown in the same plot of Fig. 19). Only in the frequency range of 40 kHz–2 MHz is the measured jitter tolerance determined by CDR performance. In this frequency range, measured and modeled jitter tolerance coincide. The upper frequency range of the jitter tolerance response was also remeasured with a different method (see Fig. 5), based on the reference voltage


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

modulation with a sine-wave signal. A minimum jitter tolerance of 40 ps was measured. Both jitter transfer and jitter tolerance response were found to be PRBS pattern independent. The IC demonstrated a 60-MHz frequency range of robust PLL locking and operation even with input signals well below the sensitivity level.

VI. CONCLUSION

A fully integrated SiGe receiver IC is presented, which combines a self-aligned CDR with integrated binary PLL, AGC, 1 : 8 demultiplexer, and PRBS generator for self-testing. The receiver, mounted into a test fixture, operates up to 9.1 Gb/s (VCO limit) in CDR mode and up to 12.5 Gb/s in data-recovery mode. Maximum die sensitivity is 4.5 mV at measured at the 1 : 8 DEMUX output. Receiver die size is mm , and in mission mode it consumes 4.5 W from a 5-V power supply. CDR SONET-compliant jitter characteristics were verified on-wafer. Jitter tolerance well exceeds the OC192 Bellcore mask, with a minimum of 40 ps p-p. Jitter transfer has a single-pole-like response with no peaking detected. Jitter generation is less than 1 ps RMS and less than 5.5 ps p-p.
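The jitter figures quoted above can be put on a common scale with a little arithmetic. The following is a minimal sketch, not part of the original measurements: `ps_to_ui` converts picoseconds into unit intervals at the 9.95328-Gb/s rate given in the text, and `rms_jitter_s` applies the usual relation between integrated single-sideband phase noise and RMS jitter. Both function names are hypothetical, and the flat −110 dBc/Hz profile over 80 MHz used in the demonstration is an illustrative assumption only; it is not the measured curve of Fig. 17.

```python
import math

OC192_BIT_RATE_HZ = 9.95328e9  # line rate quoted in the text


def ps_to_ui(jitter_ps, bit_rate_hz=OC192_BIT_RATE_HZ):
    """Express a jitter value in picoseconds as a fraction of one bit period (UI)."""
    unit_interval_ps = 1e12 / bit_rate_hz
    return jitter_ps / unit_interval_ps


def rms_jitter_s(freqs_hz, noise_dbc_hz, carrier_hz):
    """RMS jitter from single-sideband phase noise L(f):
    sigma_t = sqrt(2 * integral of L(f) df) / (2 * pi * f_carrier).
    The integral is evaluated with the trapezoid rule on the supplied points."""
    linear = [10.0 ** (level / 10.0) for level in noise_dbc_hz]
    area = 0.0
    for i in range(1, len(freqs_hz)):
        area += 0.5 * (linear[i] + linear[i - 1]) * (freqs_hz[i] - freqs_hz[i - 1])
    return math.sqrt(2.0 * area) / (2.0 * math.pi * carrier_hz)


if __name__ == "__main__":
    # Jitter-generation values reported in the text.
    for label, value_ps in [("generation p-p", 5.4), ("generation RMS", 0.8)]:
        print(f"{label}: {value_ps} ps = {ps_to_ui(value_ps):.4f} UI")
    # Hypothetical flat profile (assumption), integrated over 80 MHz at 10 GHz.
    flat = rms_jitter_s([0.0, 80e6], [-110.0, -110.0], 10e9)
    print(f"flat profile: {flat * 1e12:.2f} ps RMS")
```

With real data, the two lists passed to `rms_jitter_s` would hold the offset frequencies and the measured phase-noise levels read from the analyzer.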

ACKNOWLEDGMENT

The authors thank their colleagues S. Szilagyi for the microwave test fixture design, and D. Marchesan and Dr. S. Voinigescu for useful discussions and distributed components modeling. Special thanks to R. Hadaway for his directions and to IBM Corporation for fabrication.

REFERENCES

[1] B. Beggs, “GaAs HBT 10-Gb/s Product,” in 1999 IEEE MTT-S Int. Microwave Symp. Workshop, Anaheim, CA, June 13–19, 1999.
[2] SONET OC-192, “Transport system generic criteria,” Bellcore, GR-1377-CORE, no. 4, Mar. 1998.
[3] R. C. Walker et al., “A 10-Gb/s Si-bipolar Tx/Rx chipset for computer data transmission,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302–303.
[4] T. Morikawa et al., “A SiGe single-chip 3.3-V receiver IC for 10-Gb/s optical communication systems,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 380–381.
[5] Y. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear-type PLL for 10-Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[6] Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and J. E. Rogers, “A fully integrated SiGe receiver IC for 10-Gb/s data rate,” in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 52–53.
[7] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[8] C. R. Hogge, “A self-correcting clock recovery circuit,” J. Lightwave Technology, vol. 3, pp. 1312–1314, Dec. 1985.
[9] J. Hauenschild et al., “A two-chip receiver for short-haul links up to 3.5-Gb/s with PIN-preamp module and CDR-MUX,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 308–309.
[10] J. Hauenschild et al., “A plastic packaged 10-Gb/s BiCMOS clock and data recovering 1 : 4-demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[11] R. C. Walker et al., “A two-chip 1.5-GBd serial link interface,” IEEE J. Solid-State Circuits, vol. 27, pp. 1805–1811, Dec. 1992.
[12] R. Steele, Delta Modulation Systems. New York/Toronto: Wiley, 1975.
[13] M. Soda, T. Suzaki, and T. Morikawa et al., “A Si bipolar chip set for 10-Gb/s optical receiver,” in ISSCC Dig. Tech. Papers, Feb. 1992, pp. 100–101.
[14] Y. Greshishchev and P. Schvan, “60-dB gain 55-dB dynamic range 10-Gb/s SiGe HBT limiting amplifier,” IEEE J. Solid-State Circuits, vol. 34, pp. 1914–1920, Dec. 1999.
[15] L. I. Anderson et al., “Silicon bipolar chipset for SONET/SDH 10-Gb/s fiber-optic communication links,” IEEE J. Solid-State Circuits, vol. 30, pp. 210–218, Mar. 1995.

Yuriy M. Greshishchev (M’95) received the M.S.E.E. degree from Odessa Electrotechnical Institute of Communications, Odessa, Ukraine, in 1974 and the Ph.D. degree in electrical and computer engineering from V.M. Glushkov Institute of Cybernetics, Microelectronics Division, Kyiv, Ukraine, in 1984. From 1976 to 1994, he worked with research and development organizations and academia on high-speed silicon bipolar and GaAs MESFET ADC and DAC integrated circuits. His Ph.D. research was dedicated to the development of folding-type ADCs embedded into TV systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center of University of Toronto, Ontario, Canada. In 1994, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he conducted research on GaAs MESFET linear transmitter design for digital wireless communication. Since 1996, he has been with Nortel Networks, Ottawa, Ontario, where he is responsible for development of highly integrated circuit solutions in emerging technologies for optical communications. He has coauthored two books and numerous technical papers in the area of high-speed communication circuit design, data converters, and statistical modeling.

Peter Schvan (M’89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics from Eötvös Loránd University, Budapest, in 1975 and the Ph.D. degree in electrical engineering from Carleton University, Ottawa, Ontario, Canada, in 1985. In 1985, he joined Nortel Networks, Ottawa, where he started working in the area of BiCMOS and bipolar technology development, yield prediction, device characterization, and modeling. Recently, his work has been extended to the design of multi-gigabit circuits and systems. He is currently Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating advanced circuit concepts required for fiber optic communication systems. He has authored and co-authored numerous publications.

Jonathan L. Showell (S’90–M’95) received the B.Eng and M.Eng degrees in engineering physics from McMaster University, Hamilton, ON, Canada, in 1990 and 1994, respectively. He joined Nortel Networks, Ottawa, Canada, in 1994, working on hot carrier injection reliability of CMOS devices. Later he became a member of the Technology Access and Applications Group where his responsibilities included accurate high-frequency analog (up to 110 GHz) and digital (40 Gb/s) measurements and the design of high-speed 10- to 40-Gb/s, multiplexer/demultiplexer circuits in SiGe HBT and InP HBT technologies, respectively. Recently, he joined Quake Technologies, Ottawa, Canada, working on the design of chip sets for high-speed datacom applications. His interests include high-speed technologies, circuit design for high-speed communications, and accurate high-frequency measurements.

Mu-Liang Xu (M’00), biography not available at time of publication.


Jugnu J. Ojha (M’00) received the B.Eng. degree from Dalhousie University and the Technical University of Nova Scotia in 1987. He received the M.Sc. and Ph.D. degrees from McMaster University, Hamilton, Ontario, Canada, in 1990 and 1994, respectively. His graduate work involved research in electronic and optoelectronic devices, as well as optoelectronic properties of semiconductors. He was with Nortel Networks, Ottawa, Ontario, Canada, from 1994 to 2000, where he worked on a wide range of technologies, including design of circuits for 10 and 40 Gb/s optical transmission systems using SiGe and InP HBTs. He also led a program in MEMS technology, with a focus on optical applications, including optical crossconnects. His other activities included next-generation optical network development, as well as research on optical properties of SiGe materials and devices. He recently joined Caspian Networks in Palo Alto, CA, as Senior Advisor in Optical Networking.


Jonathan E. Rogers (S’00) received the B.A.Sc degree in engineering sciences (electrical option) from the University of Toronto, Ontario, Canada, in 1999. He is currently working toward the M.A.Sc in electronics at the University of Toronto. His area of research is clock and data recovery systems in deep sub-micron CMOS. In May, 1997, he joined Nortel, Ottawa, Ontario, for a 16-month internship, where he performed clock and data recovery system characterization, VCO design, and high-speed measurements on SiGe MMICs under the guidance of Dr. Y. Greshishchev.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 9, SEPTEMBER 2002

Clock and Data Recovery IC for 40-Gb/s Fiber-Optic Receiver

George Georgiou, Member, IEEE, Yves Baeyens, Member, IEEE, Young-Kai Chen, Fellow, IEEE, Alan H. Gnauck, Senior Member, IEEE, Carsten Gröpper, Peter Paschke, Rajasekhar Pullela, Mario Reinhold, Claus Dorschky, John-Paul Mattia, Timo Winkler von Mohrenfels, and Christoph Schulien

Abstract—The integrated clock and data recovery (CDR) circuit is a key element for broad-band optical communication systems at 40 Gb/s. We report a 40-Gb/s CDR fabricated in indium–phosphide heterojunction bipolar transistor (InP HBT) technology using a robust architecture of a phase-locked loop (PLL) with a digital early–late phase detector. The faster InP HBT technology allows the digital phase detector to operate at the full data rate of 40 Gb/s. This, in turn, reduces the circuit complexity (transistor count) and the voltage-controlled oscillator (VCO) requirements. The IC includes an on-chip LC VCO, on-chip clock dividers to drive an external demultiplexer, and low-frequency PLL control loop and on-chip limiting amplifier buffers for the data and clock I/O. To our knowledge, this is the first demonstration of a mixed-signal IC operating at the clock rate of 40 GHz. We also describe the chip architecture and measurement results.

Index Terms—Clock and data recovery, CDR, fiber-optic communication receiver, InP HBT, limiting amplifier, phase detector, VCO.

I. INTRODUCTION

CLOCK and data recovery (CDR) is an important function of the transceiver of a high bit-rate lightwave communication system. Since 40-Gb/s systems are nearing commercial deployment, the chosen CDR architecture must have few external components and be insensitive to temperature and component variations. CDRs reported in the literature use either a high quality-factor (high-Q) external filter-based architecture or a phase-locked loop (PLL) architecture. While the high-Q filter architecture is easier to implement, it is susceptible to temperature and group delay variations in the filter [1]. Specifically, the temperature drift of the filter bandpass and the temperature drift of the timing within the IC are not correlated. Also, once the clock signal is recovered, additional precise clock phase adjustment is needed to set the decision sampling time to obtain proper phase margin. This phase

Manuscript received January 30, 2002; revised April 30, 2002. G. Georgiou, Y. Baeyens, and Y.-K. Chen are with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA (e-mail: [email protected]). A. H. Gnauck is with Lucent Technologies, Bell Laboratories, Holmdel, NJ 07733 USA. C. Gröpper and P. Paschke are with Lucent Technologies, Optical Networking Group, 90411 Nürnberg, Germany. R. Pullela was with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA. He is now with Gtran Inc., Thousand Oaks, CA 91362 USA. M. Reinhold, C. Dorschky, T. W. von Mohrenfels, and C. Schulien were with Lucent Technologies, Optical Networking Group, 90411 Nürnberg, Germany. They are now with Core Optics, 90411 Nürnberg, Germany. J. P. Mattia was with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA. He is now with BigBear Networks, Milpitas, CA 95035 USA. Publisher Item Identifier 10.1109/JSSC.2002.801186.

Fig. 1. Schematic diagram of a lightwave transceiver.

adjustment is further complicated by packaging issues, specifically that of aligning the recovered clock after the off-chip filter and the decision IC. In PLL-based CDRs, the clock phase in the decision circuit is automatically synchronized to sample the center of the time slot of each bit. Also, the PLL can be integrated onto a single IC, greatly reducing temperature drift and phase relationship problems. Previous implementations of digital PLL-based CDRs at 40 Gb/s have employed half bit-rate clocking of CDR demultiplexer (DEMUX) combinations, to reduce the bandwidth required in the buffers and digital gates [2]–[4]. By clocking at 20 GHz to tolerate lower transistor bandwidth, the 2 parallel phase detector requires higher circuit complexity with about twice the transistor count and a precise four-phase voltage-controlled oscillator (VCO). In this paper, we leverage the InP heterojunction bipolar transistor (HBT) technology operating at the full 40-Gb/s data rate to simplify the CDR architecture.

II. CDR ARCHITECTURE

The typical transceiver architecture for a 40-Gb/s lightwave system is shown in Fig. 1. The CDR IC of this work is highlighted in the receiver path. The core of the PLL-based CDR is the phase detector. The phase detector used here simultaneously recovers both clock and data. The digital early–late phase detector [5], consisting of data flip-flops and combinatory logic gates, is shown in Fig. 2. In the locked state, the – chain samples the center of the incoming

0018-9200/02$17.00 © 2002 IEEE

GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER

Fig. 2. Digital phase detector architecture.


Fig. 3. CDR IC block diagram.

data time slot, while the chain samples the data zero crossings. The combinatory logic block, with inputs from the , , and latch chains, determines if the clock is early or late with respect to the incoming data transition. This logic generates the UP–DOWN control signal for the VCO. and are generated by decision circuits (respectively, after two and four latches). The phase difference between and is 180° (one bit). is generated after three latches with an inverted clock. If the clock phase is correct, is always in the middle of and . EXOR ( ) and NAND ( ) gate combinatory logic converts , , and to the UP–DOWN pulses for controlling the VCO. The logic equations are as follows. UP DOWN The clock is early (slows down the VCO) if and , . The clock is late (speeds up the VCO) if UP DOWN and , UP DOWN . The clock is correct if . Obviously, data transitions are required for clock recovery. The chip also incorporates 15-dB-gain limiting-amplifier buffers at the data I/O, at the divide-by-2 clock output (for the external DEMUX), and at the divide-by-32 clock output (for the external coarse-adjustment low-frequency lock loop). The static frequency dividers use latches similar to those in the digital phase detector. Figs. 3 and 4, respectively, show the final chip block diagram and photograph. The block diagram of Fig. 3 is laid over Fig. 4. To maintain symmetry in the 40-Gb/s data and 40-GHz clock signals, the phase detector layout contains four symmetric rows (as in the block diagram of Fig. 2).

III. CDR DESIGN AND FABRICATION

Differential emitter-coupled logic (ECL) with 400-mV differential voltage swings is used. To simplify the layout process at a high bit rate, standardized digital and analog blocks are designed and used. All high-frequency inputs are terminated with on-chip 50-Ω resistors. Propagation delay of the 40-GHz clock to the , , and latches and to the divider chain is a critical issue. Matching propagation delay is achieved by symmetric layout

Fig. 4. CDR IC photograph.

and symmetric loading (for example, the dummy latch in the chain of Fig. 2). Coplanar transmission lines with controlled impedance are used for longer lines ( ) to reduce reflections and to improve timing accuracy. Lines driven by emitter followers are kept short to avoid reflections due to impedance mismatch. A series gate approach is used for the clock and data signals. To improve high-frequency performance, a transadmittance (TAS) and transimpedance (TIS) combination [6], connected by coupled coplanar transmission lines, is used for clock and data amplification and distribution. TAS–TIS buffer amplifiers have a higher gain-bandwidth product (because of the active TIS load) than does a simple ECL buffer. The buffer amplifier signal splitting and transistor level design are shown in Fig. 5. The VCO transistor level schematic is shown in Fig. 6. The VCO is based on a differential Colpitts topology using coplanar transmission lines, which can be modeled very accurately at 40 GHz, instead of inductors. The tuning input feeds two reverse-biased varactors realized from the base–collector junctions of open-emitter HBTs. Note that minimum phase noise is not a design parameter since the early–late (or bang–bang) PLL architecture has very high bandwidth with respect to loop gain.
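The early–late decision described in Section II can be illustrated with a short behavioral sketch. It uses the generic textbook form of the Alexander detector [5] rather than the gate-level EXOR/NAND equations of this chip, and the sample names `a`, `t`, and `b` (previous bit center, expected zero crossing, current bit center) as well as the helper names are assumptions introduced for illustration. The clock phase is expressed in unit intervals, with negative values meaning the clock samples too early.

```python
import math


def early_late(a, t, b):
    """Textbook Alexander decision for three consecutive samples.

    a: data sampled at the center of the previous bit
    t: data sampled at the expected zero crossing between the two bits
    b: data sampled at the center of the current bit
    """
    if a == b:
        return "hold"  # no data transition, so no phase information
    # With a transition present, the crossing sample still equals the old
    # bit when the clock is early, and already equals the new bit when late.
    return "early" if t == a else "late"


def sample(bits, time_ui):
    """Ideal NRZ waveform: bit n occupies the interval [n, n + 1) in UI."""
    return bits[math.floor(time_ui)]


def vote(bits, phase_ui):
    """Count early and late decisions for a clock offset of phase_ui
    (valid for -0.5 < phase_ui < 0.5)."""
    early = late = 0
    for n in range(1, len(bits) - 1):
        decision = early_late(
            sample(bits, n - 0.5 + phase_ui),
            sample(bits, n + phase_ui),
            sample(bits, n + 0.5 + phase_ui),
        )
        early += decision == "early"
        late += decision == "late"
    return early, late


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 1, 0, 0, 1]
    print(vote(pattern, -0.2))  # clock early: the loop should slow the VCO
    print(vote(pattern, +0.2))  # clock late: the loop should speed up the VCO
```

A run of identical bits yields only "hold" decisions, which is the behavioral counterpart of the remark that data transitions are required for clock recovery.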


Fig. 5. Buffer amplifier cascaded TAS–TIS architecture and transistor level schematic.

Fig. 6. VCO transistor level schematic.

Fig. 7. Retimed 40-Gb/s data (40-GHz clock, 30 mV/div, 10 ps/div) and phase-detector transfer functions (UP DOWN with 40-GHz clock offset 10 MHz).

Thus, the VCO phase noise is reduced by the open-loop gain of the locked PLL. The chip is fabricated in an InP HBT technology [7]. Peak transistor $f_T$ and $f_{max}$ are 160 and 135 GHz, respectively. The transistors are conservatively biased at half the collector current for peak $f_T$. The nominal transistor has a 1 µm × 3 µm emitter biased with a collector current of 4 mA. The interconnects use one metal layer for longer wires and one local metal layer for shorter wires. Passive elements are fabricated using tantalum–nitride thin-film resistors and silicon–nitride dielectric capacitors. A power-supply series decoupling resistor ( ) is used to prevent any spurious low-frequency oscillations that could arise in the packaged IC. The internal circuit is designed to operate between 4 and 5 V. The circuit draws nominally 1 A. The nominal total power dissipation is 5.6 W, 1.6 W across the decoupling series resistor and 4 W internally. The chip area is 1.75 mm × 1.75 mm.

IV. MEASUREMENT RESULTS

Two versions of the CDR chip (with and without VCO) were fabricated. The results of on-wafer measurements on the CDR with external VCO are discussed here. The input signal is a single-ended 0.5 generated by a commercial 40-Gb/s 4 : 1 multiplexer, driven with four independent pseudorandom bit streams (PRBS) of a 10-Gb/s 2 pattern generator. A synthesizer generates the 40-GHz clock signal for this measurement. Fig. 7 shows the retimed 40-Gb/s eye from the CDR IC. (It should be noted that the on-wafer


Fig. 8. Divide-by-2 frequency spectrum of on-chip VCO. VCO designed for 40 GHz but actual at 42 GHz divided to 21 GHz with an output power of 5 dBm.


measurement is limited by the equipment. The retimed data eye is sharper than that of the commercial multiplexer. Also, the jitter is characteristic of the digital oscilloscope used for the measurement.) A decision-circuit phase margin greater than 180° is measured by changing the clock phase with respect to the incoming data and observing the output data eye. The phase transfer function (UP and DOWN) at the bottom of Fig. 7 is measured by introducing a 10-MHz offset between the clock frequency and the data bit rate. Measurements of the CDR with on-chip LC VCO indicate that the VCO is capable of driving 40-Gb/s digital gates. However, the VCO center frequency is higher than simulated, probably because of capacitance or inductance (transmission-line) drift during this process run. The VCO operates in the band between 40.5 and 42.5 GHz. The divider chains operate up to a frequency of 44 GHz. The output of the divide-by-2 is shown in Fig. 8 for the VCO tuned to 42 GHz. To evaluate the packaged performance of the CDR chip in an optical transmission system, we used the CDR chip as a single-channel 1 : 4 DEMUX by applying 40-Gb/s data and a 10-GHz clock. (The packaging used here for 40 Gb/s is relatively simple. The chip is mounted into a cutout in a composite Rogers 4003 on FR-4 board. This chip recess, corresponding approximately to the chip thickness of 8 mil, reduces the length of the bond wires to the 50-Ω coplanar GSSG transmission lines designed on the R4003 substrate; see [8, Fig. 6].) The experimental optical time-division multiplexing (OTDM) link is shown in Fig. 9. As before, a 4 : 1 MUX is used to multiplex independent 10-Gb/s nonreturn-to-zero (NRZ) PRBS data streams from a commercial pulse pattern generator. The resulting electrical 40-Gb/s 2 PRBS NRZ signal is converted to an optical 40-Gb/s 2 PRBS RZ signal by a pulse-carving technique with cascaded modulators. Optical power is converted back to the electrical signal with a p-i-n photodetector.
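Two of the numbers in the measurements above follow from simple frequency arithmetic, sketched here under stated assumptions. A fixed offset between the clock frequency and the bit rate makes the sampling phase sweep through a full bit period once per beat period, which is why the offset method traces out the phase-detector transfer function; the divider outputs are plain integer divisions of the VCO frequency. The function names are hypothetical and the sketch is not taken from the paper.

```python
def beat_period_s(offset_hz):
    """Time for the clock phase to slip by one full bit period."""
    return 1.0 / offset_hz


def phase_slip_deg_per_bit(offset_hz, bit_rate_hz):
    """Phase advance of the clock relative to the data, per bit, in degrees."""
    return 360.0 * offset_hz / bit_rate_hz


def divided_clock_hz(vco_hz, ratio):
    """Output frequency of a static divider with the given division ratio."""
    return vco_hz / ratio


if __name__ == "__main__":
    # 10-MHz offset at 40 Gb/s, as in the Fig. 7 transfer-function measurement.
    print(beat_period_s(10e6))                 # seconds per full phase sweep
    print(phase_slip_deg_per_bit(10e6, 40e9))  # degrees of slip per bit
    # VCO tuned to 42 GHz, as in Fig. 8.
    print(divided_clock_hz(42e9, 2))           # divide-by-2 output for the DEMUX
    print(divided_clock_hz(42e9, 32))          # divide-by-32 output for the lock loop
```

The divide-by-2 result agrees with the 21-GHz spectrum stated in the caption of Fig. 8.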
The 40-Gb/s RZ eye after the p-i-n is demultiplexed to 10 Gb/s using the CDR chip as a single channel 1 : 4 DEMUX. Fig. 10 shows the CDR IC performance as a one-channel DEMUX. The demultiplexed 10-Gb/s eye is very open and has very low jitter ( 4 ps limited by the oscilloscope bandwidth). The error probability is also measured as a function of received

Fig. 9. Experimental optical link to measure the bit-error rate of 40-Gb/s RZ transmission.

Fig. 10. CDR IC used as single-channel 1 : 4 DEMUX (40-Gb/s data, 10-GHz clock). Electrical output at 10 Gb/s and BER versus optical power measurement of the optical link of Fig. 9.

optical power. The received optical power is −29.5 dBm for the typical system required bit-error rate (BER) of 10 .

V. CONCLUSION

A complex ( 1350 transistors and 1.75-mm square) mixed-signal CDR chip with on-chip VCO, amplifiers, decision circuit, and clock dividers was successfully fabricated with a state-of-the-art InP HBT technology. Fully functional chips at speed were obtained from the first iteration. Data is retimed at 40 Gb/s and a good control signal is made available to the on-chip VCO. The CDR IC is used as a DEMUX to convert an optical 40-Gb/s 2 PRBS RZ signal to an


electrical 10-Gb/s 2 PRBS NRZ signal in an optical link experiment. An optical sensitivity of −29.5 dBm is measured at 10 BER.

REFERENCES

[1] R. Yu, R. Pierson, P. Zampardi, K. Runga, A. Campana, D. Meeker, K. C. Wang, A. Peterson, and J. Bowers, “Packaged clock recovery integrated circuits for 40-Gb/s optical communication links,” in GaAs IC Symp. Tech. Dig., 1996, pp. 129–132.
[2] M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder, “A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz $f_T$ silicon bipolar technology,” IEEE J. Solid-State Circuits, vol. 34, pp. 1320–1324, Sept. 1999.
[3] J. Hauenschild, C. Dorschky, T. W. von Mohrenfels, and R. Seitz, “A plastic packaged 10-Gb/s BiCMOS clock and data recovering 1 : 4 demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[4] M. Reinhold, C. Dorschky, R. Pullela, E. Rose, P. Mayer, P. Paschke, Y. Baeyens, J. P. Mattia, and F. Kunz, “A fully integrated 40-Gb/s clock and data recovery/1 : 4 DEMUX IC in SiGe technology,” IEEE J. Solid-State Circuits, vol. 36, pp. 1937–1945, Dec. 2001.
[5] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, 1975.
[6] H.-M. Rein, “Design considerations for very high speed Si-bipolar ICs operating up to 50 Gb/s,” IEEE J. Solid-State Circuits, vol. 31, pp. 1076–1090, Aug. 1996.
[7] M. Sokolich, D. Doctor, Y. Brown, A. Kramer, J. Jensen, W. Stanchina, S. Thomas, C. Fields, D. Ahmari, M. Liu, R. Martinez, and J. Duvall, “A low power 52.9-GHz static divider implemented in a manufacturable 180-GHz InAlAs/InGaAs HBT IC technology,” in GaAs IC Symp. Tech. Dig., 1998, pp. 117–120.
[8] G. Georgiou, P. Paschke, R. Kopf, R. Hamm, R. Ryan, A. Tate, J. Burm, C. Schullien, and Y.-K. Chen, “High gain limiting amplifier for 10-Gb/s lightwave receivers,” in Proc. 11th Int. Conf. InP and Related Materials, 1999, pp. 71–74.

George Georgiou (M’92) was born in Greece in 1954. He received the Ph.D. degree in applied physics from Columbia University, New York, NY, in 1980. He joined AT&T (now Lucent Technologies) Bell Laboratories in 1980 to develop sub-micron X-ray lithography systems. He proceeded into process integration of novel gate and metal structures for sub-micron silicon CMOS. He is currently a Member of Technical Staff with the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His current interest is mixed-signal IC design for high-speed lightwave communications systems using InP and SiGe HBT technologies.

Yves Baeyens (S’87–M’96) received the M.S. and Ph.D. degrees in electrical engineering from the Catholic University, Leuven, Belgium, in 1991 and 1997, respectively. His Ph.D. research was performed in cooperation with IMEC, Leuven, and treated the design and optimization of coplanar InP-based dual-gate HEMT amplifiers operating up to the W-band. He was a Visiting Scientist with the Fraunhofer Institute for Applied Physics, Freiburg, Germany, for a year and a half, and is currently a Technical Manager in the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His research interests include the design of mixed analog-digital circuits for ultrahigh-speed lightwave and millimeter-wave applications.

Young-Kai Chen (S’78–M’86–SM’94–F’98) received the B.S.E.E. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., the M.S.E.E. degree from Syracuse University, Syracuse, NY, and the Ph.D. degree from Cornell University, Ithaca, NY, in 1988. From 1980 to 1985, he was a Member of Technical Staff in the Electronics Laboratory of the General Electric Company, Syracuse, responsible for the design of silicon and GaAs MMICs for phased-array applications. Since 1988, he has been with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, as a Member of Technical Staff. Since 1994, he has been the Director of the High Speed Electronics Research Department. He is also an Adjunct Associate Professor at Columbia University, New York, NY. His research interest is in high-speed semiconductor devices and circuits for wireless and fiber-optic communications. He has authored more than 90 technical papers and holds nine patents in the field of high-frequency electronics and semiconductor lasers. Dr. Chen is a member of the American Physical Society and the Optical Society of America.

Alan H. Gnauck (M’98–SM’00) received the B.S. degree in physics and the M.S. degree in electrical engineering from Rutgers University, New Brunswick, NJ, in 1975 and 1986, respectively. In 1982, he joined AT&T (now Lucent Technologies) Bell Laboratories. He has designed and built multigigabit amplifiers, multiplexers, demultiplexers, and optical receivers, and performed record-breaking optical transmission experiments at single-channel rates of from 2 to 40 Gb/s. He has investigated coherent detection, chromatic-dispersion compensation techniques, CATV hybrid fiber-coax architectures, wavelength-division-multiplexed (WDM) systems, and system impacts of fiber nonlinearities. His WDM transmission experiments include the first demonstration of terabit transmission. He is a Technical Committee Member of the Optical Fiber Communications Conference (OFC) 2003. He holds twelve patents in optical fiber communications. His current research interests include the study of WDM systems with single-channel rates of 40 Gb/s. Dr. Gnauck is an Associate Editor for IEEE PHOTONICS TECHNOLOGY LETTERS.

Carsten Gröpper was born in Münster, Germany, in 1971. He received the Dipl.-Ing. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1998. He joined Lucent Technologies, Nürnberg, Germany, in 1998. He is currently with the Optical Networking Group, Lucent, developing high-speed bipolar ICs for 10- and 40-Gb/s optical communication systems.

Peter Paschke was born in Dusseldorf, Germany, on May 21, 1959. He received the M.S. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1988. In 1988, he joined Philips Kommunikation Industries, Nürnberg, Germany, as a Full Custom ASIC Designer. He is currently a Technical Manager with Lucent Technologies GmbH, Nürnberg, where he is responsible for the high-speed ASICs. His main focus is analog circuits such as laser drivers and limiting amplifiers for high bit rates up to 40 Gb/s. In the lightwave system area, he has been involved in 2.5-Gb/s receiver design and several research projects for 40 Gb/s.

Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in 1993. From 1993 to 1998, he was a graduate student researcher at the University of California, Santa Barbara. During this period, he received the M.S. and Ph.D. degrees in electrical engineering, studying device physics and high-speed circuit design. During 1998–2000, he was a Member of Technical Staff with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, designing high-speed ICs for fiber-optic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.

GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER

Mario Reinhold was born in Mülheim/Ruhr, Germany, in 1972. He received the Dipl.-Ing. degree in electrical engineering from the Ruhr University, Bochum, Germany, in 1998. He joined the Optical Networking Group, Lucent Technologies, Nürnberg, Germany, in 1998, where his activities focused on the development of various analog and digital high-speed bipolar ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic communication systems. Since 2001, he has been with CoreOptics Inc., Nürnberg, working on a next-generation 40-Gb/s chipset.

Claus Dorschky received the Dipl.-Ing. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 1986. He was with Philips Kommunikation Industries (later Lucent Technologies), Nürnberg, Germany, for 14 years, working in the development department for high-speed optical transmission systems. His research interests include design and integration of analog and mixed-signal full custom ICs for 10- and 40-Gb/s as well as integration of optical receivers and transmitters into single-wavelength and DWDM transmission systems at those bit rates. In early 2001, he cofounded CoreOptics Inc., Nürnberg.


John-Paul Mattia received the B.S., M.S.E.E., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge. He began working in high-speed electronics at MIT Lincoln Laboratory in 1989. In 1996, he joined Texas Instruments Incorporated in the DSP R&D organization. From 1997 to 2000, he worked in the High-Speed Electronics Group, Lucent Technologies, Bell Labs, designing and testing circuits for lightwave communication systems. Since July 2000, he has been with Big Bear Networks, Sunnyvale, CA, where he is Chief Technical Officer of Electronics.

Timo Winkler von Mohrenfels, photograph and biography not available at time of publication.

Christoph Schulien, photograph and biography not available at time of publication.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

Clock/Data Recovery PLL Using Half-Frequency Clock

M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux

Abstract— A clock and data recovery PLL is described for serial nonreturn-to-zero (NRZ) data transmission. The voltage controlled oscillator (VCO) works at half the data rate, which means for a 1-Gb/s data rate, the VCO runs at 500 MHz. A specially designed phase comparator uses a delay-locked loop (DLL) to generate the required sampling clocks to compare clock and data. The VCO can typically be tuned from 350 MHz to 890 MHz and the phase-locked loop (PLL) locks between 720 Mb/s and 1.3 Gb/s. Data recovery is error free up to 1.2 Gb/s with a 9-b pseudorandom data sequence. The core consumes 85 mW (3.3 V) at 1 Gb/s.

Index Terms—Bang-bang control, CMOS digital integrated circuits, data communication, high-speed integrated circuits, phase-locked loops, synchronization.

I. INTRODUCTION

DIGITAL signal processing becomes economical in consumer applications. The main requirement there is low cost in mass production. Digital processing and transmission has to be carried out with low power and in cheap IC packages. Data transmission between different digital signal processing IC’s influences significantly the power consumption and the system cost. For video signal transmission in 100 Hz TV sets, typically 16 data lines in parallel are driven with 27 MHz rail-to-rail nonreturn-to-zero (NRZ) data signals. Sharp data transitions are in use to ensure reliable synchronous operation. A power saving alternative could be found in low-swing high-speed serial data transmission in the range of 500 Mb/s or more. However, this kind of high-speed data transmission has to be asynchronous. The most economic solution avoids separate transmission of the clock. In that case, clock recovery from the NRZ data stream is required. In this paper we describe a phase-locked loop (PLL) which is designed to process more than 1 Gb/s data in a 0.5-µm CMOS technology.

II. ARCHITECTURE

The PLL generally consists of three building blocks (Fig. 1):
1) phase comparator, detecting the phase difference between the data and the recovered clock;
2) loop filter, filtering the phase detector output and forming the control signal for the oscillator;
3) voltage controlled oscillator (VCO).

The unusual feature in our design is the phase detector, which uses a delay-locked loop (DLL) to generate multiple sampling clocks. Thus, the VCO can run at only half the data rate, which means that we can detect a 1-Gb/s serial data stream with a 500-MHz VCO. This relieves the timing constraints in the phase detector logic and results in well correlated and data independent control signals. Also, at the lower frequency the VCO tuning range is large enough to compensate all technology parameter variations. With this architecture we could achieve higher data rates.

The block diagram of the circuit is shown in Fig. 2. No external components are required for the PLL. The loop filter capacitor is integrated on chip together with the VCO, the phase comparator, and a charge pump. The data stream is retimed in two flip-flops with the inverted and noninverted clock. Two flip-flops are required because the clock has only half the data rate. These two half-speed data streams are combined in a multiplexer, forming an output stream at the original data rate. A lock-in circuit is realized on chip, because the phase comparator is not frequency sensitive.

Fig. 1. Classic PLL.

Manuscript received December 15, 1996; revised February 6, 1997. This work was supported in part by the German Ministry for Education and Research under Contract 01M2880A. M. Rau was with the University of Ulm, Germany. He is now with Siemens AG, 81359 Munich, Germany. T. Oberst was with the University of Ulm, Germany. He is now with DASA, D-89077 Ulm, Germany. R. Lares and A. Rothermel are with the Microelectronics Department, University of Ulm, D-89081 Ulm, Germany. R. Schweer is with Thomson Multimedia, D-78048 Villingen-Schwenningen, Germany. N. Menoux is with Thomson, 38240 Meylan, France. Publisher Item Identifier S 0018-9200(97)04386-2.

III. PHASE COMPARATOR

The PLL adjusts the clock to an incoming data stream. Because of the random nature of data there is not necessarily a data transition at every clock cycle. The loop has to handle sequences of consecutive zeroes or ones in the data stream. The following phase comparator output signal properties are essential. First, the phase comparator must not give any output signal if there is no data edge. Second, the duration of the control signal pulses at the data transitions is important, especially if there are few of them. In general, for a good loop performance the control signal should be proportional to the phase error.
However, for very high operating frequencies, analog signals depend on the data pattern and become highly nonlinear, because they do not settle during the bit duration. It was found by simulation that different phase detectors with analog outputs [1], [2] limit the PLL operating frequency. On the other hand, clock recovery schemes based on sampling techniques [3], [4] result in uniform digital control pulses. They are

0018–9200/97$10.00 © 1997 IEEE


Fig. 2. Clock recovery block diagram.

Fig. 3. Phase comparator.

best suited to support highest possible data rates at a given technology. The phase comparator used here is an extension of the circuit from [3], modified to work with half the “normal” clock frequency (Fig. 3). The data stream is sampled at four equally spaced timepoints. The logic circuitry driven by the flip-flops generates the up and down control pulses for the VCO according to Fig. 4. Because these control pulses are generated by clocked flip-flops, they are of well defined width. The advantage is that they do not depend on the data pattern. On the other hand, they do not reflect the amount of the phase error, either. The pulse width is constant, even for very small phase errors. This so-called bang-bang operation generates an increased jitter in the locked state. However, the magnitude is much smaller compared to the one introduced by data-dependent and nonlinear analog pulses at high frequencies. The phase logic evaluates only rising signal edges, in order not to depend on duty cycle variations of the input signal.

There is an issue to be taken care of when dimensioning the flip-flops and the phase logic. The stable operating point of the loop is reached when the signal is sampled exactly at its transition (see Fig. 4). Thus the loop forces the flip-flop to sample the metastable state, which is not allowed in normal flip-flop operation. In this application, however, it is not critical for the operation. If the metastable state is sampled, it does not matter whether it will be interpreted as up or down, because any decision is equally wrong, as we are at the stable operating point, i.e., zero phase error. Only the jitter of the

Fig. 4. Operation of the phase detector: (a) data at sampling time B equals the data at the preceding sampling time A ⇒ data transition is late ⇒ frequency up, (b) data at sampling time B equals the data at the following sampling time A ⇒ data transition is early ⇒ frequency down, (c) data at sampling time A equals the data at the preceding sampling time A ⇒ no data edge, no control signal output.

bang-bang operation results. Also, there is an increased short current inside the flip-flops that has to be limited.

For uniform pulses and small jitter, absolutely identical sampling intervals are required. Therefore, a DLL has been implemented to generate four 90°-shifted clock phases clk1–clk4 from the VCO output signal (Fig. 5). The loop compares the phase of the original clock to a clock fed through four adjustable delay elements. The clock signal repeats with a period T. A delay element in Fig. 5 can therefore delay by T/4, or as well by 5T/4. By rearranging the output signals, delay times of 3T/4 are also possible. With a delay element for T/4, it is not possible to compensate for all technology and environment variations. Therefore, it is necessary to select a larger value for the delay, to just be able to deal with all technology parameter variations.
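The decision rule summarized in Fig. 4 can be sketched in a few lines of Python. This is a behavioral illustration only, not the authors' gate-level logic: the function name `half_rate_pd`, the string return values, and the `rising_only` flag are assumptions introduced here. The three arguments are consecutive binary samples: the A sample before the expected transition, the B sample at the transition, and the following A sample.

```python
def half_rate_pd(a_prev: int, b: int, a_next: int, rising_only: bool = True) -> str:
    """Bang-bang decision from three consecutive samples (A, B, A).

    Returns "up", "down", or "none". Names and encoding are illustrative.
    """
    # Case (c) of Fig. 4: both A samples agree, so there was no data edge.
    if a_prev == a_next:
        return "none"
    # The text states that only rising edges are evaluated.
    if rising_only and not (a_prev == 0 and a_next == 1):
        return "none"
    # Case (a): B still equals the preceding A, transition is late.
    if b == a_prev:
        return "up"
    # Case (b): B already equals the following A, transition is early.
    return "down"


if __name__ == "__main__":
    for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]:
        print(samples, half_rate_pd(*samples))
```

Because the output depends only on which side of the edge sample B falls, the result carries the sign of the phase error but not its magnitude, which is the bang-bang behavior described in the text.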


Fig. 5. DLL to generate all 90° phase-shifted sampling clocks with high accuracy.
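Why more than one per-element delay can serve the DLL is easiest to see numerically. The sketch below assumes four identical delay elements whose total delay is locked to a whole number k of clock periods, so each element delays by k/4 of a period; the function name `tap_phases` and this parametrization are illustrative assumptions, not taken from the paper.

```python
def tap_phases(k: int, taps: int = 4) -> list[int]:
    """Phase in degrees (mod 360) at each tap when every element delays by k/4 period."""
    return [(n * k * 90) % 360 for n in range(1, taps + 1)]


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        phases = tap_phases(k)
        usable = len(set(phases)) == 4  # four distinct 90-degree-spaced phases
        print(f"element delay = {k}/4 period -> taps {phases}, usable: {usable}")
```

Under this assumption, k = 1 and k = 5 give the same tap order, k = 3 gives the same four phases in reversed order (so the outputs only need to be rearranged), and k = 2 collapses to two phases and is unusable.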

Fig. 6. Current mirror charge pump.

Fig. 7. VCO schematic.

Fig. 8. VCO frequency versus control voltage.

The stability of the system containing two coupled loops can be guaranteed for two reasons. First, the DLL is a first-order loop and inherently stable. Second, the time constants of the two loops are two orders of magnitude different.

IV. CHARGE PUMP AND LOOP FILTER

The control pulses drive a current mirror charge pump [6] (Fig. 6) which assures that the charge delivered to the loop filter does not vary with the VCO control voltage. The charge pump allows the realization of an ideal integrator transfer function (pole at s = 0) with no additional active amplifier, resulting in a zero-phase error in steady state. A simple RC network shown in Figs. 2 and 6 is used for the low-pass loop filter. The current level of the charge pump and hence the charge delivered at every rising data transition can be set to a small value. This allows the implementation of the loop capacitor on chip.

V. VCO

Both high oscillation frequency and a wide tuning range are required. We choose a ring oscillator design with variable load capacitors (Fig. 7) based on [5]. Duty cycle is not an issue here, because the flip-flops all are triggered with the same edge; the DLL generates the required phase shifts. This circuit can safely cope with all parameter variations. Fig. 8 shows the VCO tuning characteristic.

VI. LOCK-IN CIRCUIT

The bang-bang operation and the data-dependent phase detector output signal require a narrow loop bandwidth for a low jitter. This results in a reduced pull-in range of the PLL. Instead of adapting the loop bandwidth during operation we created a lock-in circuit which is active only after power up. For lock-in, a 1010-sequence has to be fed to the circuit. The VCO is swept, starting with the highest frequency. When clock and input frequencies are the same, the sampled data (before the Mux) do not change. An edge-triggered monoflop then stops the frequency sweep and closes the PLL.

VII. LAYOUT

Fig. 9 shows the test chip. A large area is used for the on-chip loop filter capacitor (upper left). A comparable area is required for the ring oscillator, including its load capacitors (lower left). Because the series resistance of those load capacitors is more critical compared to the one in the loop filter, a finer finger structure was chosen. All capacitors have been realized as MOS-transistor gates. No special mask is required. In the top right area are located the lock-in circuit and the DLL with its loop filter, whereas in the lower middle and to the right, buffers and control logic can be seen.

VIII. MEASUREMENT RESULTS

We verified locking of the PLL at data rates from 720 to 1300 Mb/s with pseudorandom sequences up to bit at the data input. However, data recovery is not guaranteed under these conditions because of the clock jitter. Fig. 10 shows the maximum available data rates for different lengths


of the pseudorandom sequences for correct data recovery. Measurement period was 10¹¹ clocks (corresponding to a bit error rate of 1 × 10⁻¹¹). At very high data rates, clock and data phase precision has to be better at the input of the retiming flip-flops, because the “eyes” become smaller. The lower required phase jitter corresponds to shorter pseudorandom sequences. Fig. 11 shows the locked PLL at 1 Gb/s with a (2¹⁵ − 1)-bit length pseudorandom sequence. The clock jitter is about 350 ps, which is caused mainly by the bang-bang operation of the phase comparator. We believe that this behavior can be improved by reducing the uncertain time interval of the sampling flip-flop, i.e., reducing their setup-and-hold times and increasing the clock slope. All measurements have been done with the IC housed in a standard 16-pin dual in-line ceramic package which shows rather poor high-frequency performance. It was our goal to demonstrate the circuit in a critical environment. Better results could be expected when using packages with shorter leads.

Chip core area is 0.38 mm², power consumption without pad drivers is 85 mW at 1 Gb/s, 0.5-µm CMOS, 3.3 V supply. Only 1/4 of the power consumption is proportional to the clock frequency, 3/4 are constant. The circuit consumes 91 mW at 1.3 Gb/s. The power saved by using only half the conventional clock frequency is partly used to supply the DLL, which needs 21 mW (about 1/4 of total power at 1 Gb/s). No external components are required, except one reference current, which is not very critical (a 20% variation is allowed).

Fig. 9. Chip micrograph.

Fig. 10. Maximum data rate versus pseudorandom sequence length for error-free receiving during time of measurement (complies with error rate smaller than 1 × 10⁻¹¹).

Fig. 11. VCO clock output and data output eye pattern at 1 Gb/s with a (2¹⁵ − 1)-bit length pseudorandom input sequence.

IX. CONCLUSION

Complete on-chip clock and data recovery at 1 Gb/s is feasible with a standard 0.5-µm CMOS technology. On-chip clock is only 500 MHz in this case. Data are directly demultiplexed one to two in the retiming flip-flops. A multiplexer to regenerate the original data stream was included for measurement purposes only. In applications, serial-to-parallel conversion will normally follow the PLL. In that case, the halved clock frequency is an advantage, because the following blocks can be designed more easily.

ACKNOWLEDGMENT

The authors gratefully acknowledge perfect layout support by Y. A. Savalle and G. Kimmich from TCEC. They thank J. Borel from SGS-Thomson for providing the design kit and acknowledge the fast sample production in the factory.

REFERENCES

[1] T. H. Lee, “A 155-MHz clock recovery delay- and phase-locked loop,” IEEE J. Solid-State Circuits, vol. 27, pp. 1736–1746, Dec. 1992.
[2] B. Thompson, “A 300-MHz BiCMOS serial data transceiver,” IEEE J. Solid-State Circuits, vol. 29, pp. 185–192, Mar. 1994.
[3] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data retiming circuit,” in Int. Solid-State Circuits Conf., San Francisco, CA, 1991, vol. 306, pp. 144–145.
[4] A. Pottbaecker, U. Langmann, and H.-U. Schreiber, “A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s,” IEEE J. Solid-State Circuits, vol. 27, pp. 1747–1751, Dec. 1992.
[5] M. Bazes, “A novel precision MOS synchronous delay line,” IEEE J. Solid-State Circuits, vol. 20, pp. 1265–1271, Dec. 1985.
[6] A. Waizman, “A delay line loop for frequency synthesis of de-skewed clock,” in Int. Solid-State Circuits Conf., San Francisco, CA, 1994, pp. 298–299.
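The power figures reported in the measurement results of the preceding paper can be cross-checked with a short calculation. The sketch below assumes a simple model in which 3/4 of the 85-mW core power at 1 Gb/s is constant and 1/4 scales linearly with data rate, as the text states; the function name `core_power_mw` and its parameter names are illustrative, not from the paper.

```python
def core_power_mw(rate_gbps: float,
                  p_ref_mw: float = 85.0,
                  ref_rate_gbps: float = 1.0,
                  dynamic_fraction: float = 0.25) -> float:
    """Core power under a constant-plus-rate-proportional model (assumed)."""
    static_mw = p_ref_mw * (1.0 - dynamic_fraction)                      # constant 3/4 share
    dynamic_mw = p_ref_mw * dynamic_fraction * rate_gbps / ref_rate_gbps  # scales with rate
    return static_mw + dynamic_mw


if __name__ == "__main__":
    print(f"1.3 Gb/s: {core_power_mw(1.3):.1f} mW")   # reported value: 91 mW
    print(f"DLL share at 1 Gb/s: {21.0 / 85.0:.2f}")  # reported as roughly 1/4
```

Under this model the predicted power at 1.3 Gb/s is about 91.4 mW, in line with the reported 91 mW, and the 21-mW DLL accounts for roughly a quarter of the 85-mW total.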

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000


SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application

Yuriy M. Greshishchev, Member, IEEE, and Peter Schvan, Member, IEEE

Abstract—An integrated 10 Gb/s clock and data recovery (CDR) circuit is fabricated using SiGe technology. It consists of a linear-type phase-locked loop (PLL) based on a single-edge version of the Hogge phase detector, an LC-tank voltage-controlled oscillator (VCO) and a tri-state charge pump. A PLL equivalent model and design method to meet SONET jitter requirements are presented. The CDR was tested at 9.529 Gb/s in full operation and up to 13.25 Gb/s in data recovery mode. Sensitivity is 14 mVpp at a bit error rate (BER) = 10⁻⁹. The measured recovered clock jitter is less than 1 ps rms. The IC dissipates 1.5 W with a 5-V power supply.

Index Terms—Charge pump, clock and data recovery (CDR), jitter generation, jitter tolerance, jitter transfer, phase detector, phase-locked loop (PLL), SONET, VCO.

I. INTRODUCTION

IN A CLOCK and data recovery (CDR) circuit with integrated phase-locked loop (PLL), the reference clock is extracted from the incoming data stream and is automatically aligned to the center of the data pulse independent of its pattern. Two CDR ICs have been reported operating at 10 Gb/s: one with a linear PLL (LPLL) using a modified Hogge-type phase detector (PD) [1] and the other with a binary PLL using a bang-bang (Alexander)-type PD [2].¹ CDR jitter characteristics critical to SONET optical receiver design are Jitter Transfer (Bandwidth and Jitter Peaking), Jitter Tolerance, and Jitter Generation. In a linear CDR (LCDR), jitter transfer characteristics are independent of the jitter amplitude and can be analytically predicted according to a LPLL theory. This feature can be important for SONET applications, particularly if data is to be retransmitted and jitter transfer must be well controlled. This paper describes a 10 Gb/s LCDR with less than 1 ps rms pattern-independent jitter generation required for SONET application [3]. This jitter generation represents the best published CDR result. A number of techniques have been used to achieve low jitter. First, a charge pump with tri-state is employed. This is a well known technique frequently used in conjunction with frequency-PD PLLs [4]. This technique is also called switched-filter PLL [5]. Generally, this approach provides a hold mode in the PLL filter in the absence of data transitions and prevents variation of the voltage-controlled oscillator

Manuscript received December 8, 1999; revised February 2, 2000. The authors are with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. Publisher Item Identifier S 0018-9200(00)05928-X.

¹The linear property of a LPLL is due to a linear phase response of the Hogge-type PD. In publication [1], [4] the Hogge-type PD is called a phase comparator, which is a misleading terminology since the word comparator implies a binary output. The bang-bang (Alexander)-type PD is a true phase comparator.

(VCO) frequency (that causes jitter) during a long run of data 0 s or 1 s. Second, charge-pump and VCO control circuits were designed to provide a high degree of PLL filter isolation, or low charge-pump offset current, in a tri-state. In addition, the charge pump has a high output impedance necessary for high loop gain in a PLL with passive filter. Third, the original Hogge PD was modified to provide a single-edge operation and to extend linear phase range. Fourth, circuit and layout cross-talk isolation techniques similar to those presented in [6] are employed to prevent jitter generation and sensitivity degradation due to a cross-talk. The CDR IC was implemented in IBM’s SiGe bipolar process which includes pMOS devices. Jitter characteristics of a LPLL depend to a large degree on the PLL filter parameters. An LPLL equivalent model and design method to satisfy SONET requirements are presented in this paper. A theoretical jitter tolerance function is introduced based on considerations similar to those presented in [7]. It is shown that in a LPLL, all of the jitter characteristics specified by SONET requirements can be analytically expressed via and PLL damping factor . The jitter transfer bandwidth should be above 4 MHz to satisfy OC PLL bandwidth 192 SONET jitter tolerance mask requirement, and the damping – to satisfy 0.1-dB jitter peaking. factor should be above In Section II the CDR architecture is described with an attention to low jitter operation and the LPLL equivalent model and its design method are considered. Then, in Section III the building-blocks circuit diagrams are discussed. The CDR hierarchical simulation flow is given in Section IV. Finally, the IC implementation details and measured results are presented in Sections V and VI. II. CDR ARCHITECTURE A. The Architecture The CDR architecture includes a single-edge Hogge-type PD with a decision circuit, a charge pump, an integrated LC-tank VCO and a passive second-order PLL filter, as shown in Fig. 1. 
These four components constitute the LPLL. To minimize jitter, a tri-state PD and charge pump are used. The charge pump produces close to zero differential output current, when no data transitions occurs. The CDR is fully differential. Maximum differential input voltage of the CDR is 1 V . Two differential and are used to threshold slicing inputs optimize the threshold of the decision and the clock recovery circuits within 80% of the data swing. The recovered clock and data are transmitted to the corresponding outputs and via buffers and cross-talk isolation interface ) [6]. The amplitudes (transmission lines and transmitters

0018–9200/00$10.00 © 2000 IEEE


functions of : , . The CDR circuit was designed for . Fig. 3 shows analytically calculated with 4 MHz compared to OC192 SONET mask. The bandwidth does not satisfy SONET requirement (to be below 120 kHz), but the jitter peaking is within the recommended 0.1-dB jitter gain. The CDR jitter tolerance is a measure of how much peak-to-peak sinusoidal jitter can be added to the incoming data before causing data errors² due to misalignment of the data and the recovered clock. In a CDR with LPLL, jitter tolerance is defined by a jitter transfer function and the PLL slew-rate capabilities [7]. Considering no slew rate limitations in the loop (this is usually the case in a well designed CDR), the frequency response of the jitter tolerance can be described by the following function:

Fig. 1. CDR IC architecture.

Fig. 2. Equivalent model of the linear PLL.

Fig. 3. CDR analytical jitter transfer compared to a SONET mask.

of and are 1 V differential. The IC can also be used in a data recovery mode only, when the VCO clock . is overdriven with an external signal

(4)

(2)

determines the shape of the jitter tolerance response. It can be used to compare the performance of the design with is multiplied by the SONET jitter tolerance mask if at high frequencies . The jitter tolerance is defined by the CDR circuit design and value of decision-circuit clock-phase margin. For the SONET OC192 4 MHz mask it is specified to be more than 15 ps at for [8]. Fig. 4 shows analytically calculated 4 MHz compared to the OC192 SONET mask. The dotted line is the response with asymptotic single-pole jitter transfer function [(A.4) in Appendix A]. The 20-dB/decade slope of the mask in the frequency range of 0.4–4 MHz coincides with the response for 4 MHz. SONET compliant LPLL design 4 MHz for minimum data tranmust have a bandwidth . sition density Several factors affect jitter generation of the CDR shown in Fig. 1. The recovered clock central frequency value is held as a . An offset current at the charge voltage on the capacitor pump output in tri-state causes a frequency step

(3)

(5)

is 3-dB bandwidth of the PLL jitter transfer funcwhere is VCO sensitivity in Hz/V, is the loop tion in Hz, is the average data transition density natural frequency, and for 0101 pattern, for factor (maximum is doubled, PRBS pattern). In (3) the charge pump current compared to the Gardner’s formula [9], since in a Hogge PD a corresponds to -radians of the data current variation of and the damping factor are phase. Both the bandwidth

For practical filter parameters and expected maximum time interval with no data transitions, the voltage variation across is negligible compared to the voltage step capacitor . The phase jitter is proportional to the number of consecutive 0’s or 1’s in the data, as follows:

B. PLL Equivalent Model for CDR Jitter Characteristics Analyses

Three types of jitter characteristics are important in SONET receiver design: jitter transfer function (bandwidth and jitter peaking), jitter tolerance and jitter generation [8]. The PLL bandwidth was specified to be between 4–10 MHz with less than 0.1-dB jitter peaking and jitter generation below 1 ps rms. The PLL was analytically designed using a continuous time approximation for the equivalent model of Fig. 2 [9]. A damping factor above – is required to provide low jitter peaking. Due to overdamping, the PLL jitter transfer approaches a single-pole low-pass-type response with the following parameters (see Appendix A):

(1)

[ps], [MHz] (6)

²The amount of errors is defined with 1-dB receiver input power penalty.

GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION

Fig. 5. PD and decision circuit.

Fig. 6. PD simulated response.

Fig. 4. CDR analytical jitter tolerance compared to a SONET mask.
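The single-pole behavior described in the text can be illustrated numerically. The paper's own expressions are not reproduced here; the sketch below assumes the textbook single-pole jitter transfer H(f) = 1/(1 + jf/f3dB) and takes the jitter tolerance shape as 1/|1 − H(f)|, a common linear-PLL result. The function names are illustrative, and the 15-ps high-frequency tolerance is the OC-192 mask value quoted in the text.

```python
import math


def jitter_transfer_db(f_hz: float, f3db_hz: float) -> float:
    """Magnitude of the assumed single-pole jitter transfer, in dB."""
    return -10.0 * math.log10(1.0 + (f_hz / f3db_hz) ** 2)


def jitter_tolerance_shape(f_hz: float, f3db_hz: float) -> float:
    """1/|1 - H(f)| for the single-pole H(f); tends to 1 well above f3dB."""
    return math.sqrt(1.0 + (f3db_hz / f_hz) ** 2)


def jitter_tolerance_pp_ps(f_hz: float, f3db_hz: float,
                           high_freq_tolerance_ps: float = 15.0) -> float:
    """Peak-to-peak tolerance: shape scaled by the high-frequency tolerance."""
    return high_freq_tolerance_ps * jitter_tolerance_shape(f_hz, f3db_hz)


if __name__ == "__main__":
    f3db = 4e6  # 4-MHz bandwidth, as discussed in the text
    for f in (4e4, 4e5, 4e6, 4e7):
        print(f"{f / 1e6:7.2f} MHz: transfer {jitter_transfer_db(f, f3db):7.2f} dB, "
              f"tolerance {jitter_tolerance_pp_ps(f, f3db):8.1f} ps")
```

Below the loop bandwidth the tolerance rises by about 20 dB per decade as frequency falls, which is the slope the text compares with the 0.4–4 MHz segment of the mask.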

Combining (1)–(6) the jitter can be expressed as follows: [ps]

[MHz]

(7)

where is the relative current offset at the charge pump output [MHz] is the PLL bandwidth [MHz] at minimum exand . pected data transition density factor across the damping resistor Instantaneous voltage drop creates the well known frequency ripple. Peak-to-peak jitter associated with this ripple can be expressed as [ps]

[MHz]

(8)

where is the attenuation due to high order poles in the PLL filter. Lower targeted values of increase jitter generation. The single-edge PD, used in this CDR, reduces DF by a factor of 2 and doubles the jitter amplitude. Because of the required high bandwidth , attention must be paid to the charge-pump offset and to the attenuation to keep jitter generation in the sub-picosecond range. Jitter can also be generated by the loop static phase error and its pattern dependence. To minimize static error, a charge pump with high output impedance and a VCO with high control input impedance were employed. The VCO phase noise had little impact on the recovered clock jitter because of the high loop gain and wide loop bandwidth achieved.

C. PLL Design Method

The LPLL with previously described model is fully defined by five independent parameters: , , , , and . Filter components and are functions of these parameters and can be found from (1)–(3) (see Appendix B). Parameters and are mostly constrained by circuit implementation. Their initial values do not impact, in the first order, LPLL jitter generation as is seen in (7) and (8). Parameters and are specified at (or accounting for single-edge phase detection): to satisfy jitter peaking and 4 MHz to satisfy jitter tolerance mask.

III. CIRCUIT DESIGN

A. Phase Detector and Decision Circuit

The block diagram in Fig. 5 shows the PD and decision circuit. The data decision circuit is split with the clock recovery

circuit to provide independent data threshold optimization. A divide-by-two circuit results in a single-edge operation of the original Hogge-type PD [10]. Therefore the recovered clock jitter is not affected by possible asymmetry between the rising and falling edges of the incoming data. A dummy latch circuit is introduced to compensate for the delay. An additional advantage of the single-edge operation is an extended linear phase range, which is explained in Fig. 5. The output provides a phase-independent reference signal with a constant pulse width of about 70 ps at each positive data transioutput is the phase difference signal in the form tion. The of a variable pulse width of 70 50 ps. Fig. 6 shows SPICE simulation results of the PD circuit phase response. The linear phase range is about 80 ps. In the absence of data transitions and are at a low level. This is detected both outputs as a tri-state by the charge pump. The front-ends of the decision circuit and the PD contain limwith differential slicing level control at the iting amplifiers input. To increase the time resolution of the decision circuit, a master–slave–master structure is employed in the retiming . The sensitivity of the decision circuit, defined priblock marily by thermal and shot noise in the input slicing circuitry, is simulated to be 13.4 mV at BER 10 . The simulated latching metastability region is less than 1 ps . This region is determined as a time zone in the clock-delay sweep where the output of the decision circuit is not defined. B. LC-Tank VCO The block diagram of the LC-tank-based VCO is shown in Fig. 7(a). The 10-GHz oscillator core is a cross-coupled differential circuit [Fig. 7(b)] [11]. The VCO includes a differential


Fig. 7. VCO. (a) Block diagram. (b) LC-VCO core.

Fig. 8. PLL charge pump with a tri-state.

Fig. 9. Jitter generation due to offset current in the charge-pump tri-state.

control buffer and a pulse-edge sharpening limiting amplifier to reduce jitter sensitivity to the cross-talk noise. The has an open collector output to drive the transmission line interface with a 50- termination at the far end. The control buffer is designed with pMOS source followers at the input. VCO frequency is tuned with a varactor which is split into two parts: a coarse tune (to compensate for process variation) and a fine tune for frequency control in the loop. VCO phase noise is measured to be less than 80 dBc/Hz at a 100-kHz offset frequency. C. Charge Pump The charge pump (Fig. 8) employs a well known currentswitching technique with the addition of a common-mode feedback amplifier . Care was taken to achieve unconditional stability in the feedback with sufficient gain and with a small value in Fig. 8. Small is necessary for low jitter of capacitance peaking in the PLL. The charge-pump output differential cur, as accounted for by the model in Fig. 2. rent The charge pump is in a tri-state when both differential inputs and are switched into a low (or high) state. A mismatch between charge-pump current sources, , their finite output resistances, and the VCO control input current cause an offset current in the tri-state. Fig. 9 shows a plot of the PLL jitter due in the tri-state as calculated from to relative offset current . Single-edge phase detection, employed in the (7) for value compared to the double-edge CDR, requires half the Hogge-type PD. The top current sources and the feedback amplifier were designed with pMOS transistors. Appropriate matching was achieved by sizing the critical components and using symmetrical layout. To increase the charge-pump output impedance, cascode current sources were employed. The measured offset was less than 0.2%.

D. PLL Filter

The PLL filter is split into internal and , and external and components (see Fig. 1). Resistors R′ make jitter performance less sensitive to the external parasitics and coupled noise. Capacitor (along with in Fig. 8) performs smoothing of the PD output pulses and introduces the required attenuation used in (8). In the PLL, resistor is limited by maximum voltage drop required for normal circuit operation. Pulse smoothing also relaxes this constraint.

E. Cross-Talk Isolation

Two differential output buffers ( in Fig. 1) provide adjustable differential voltage swing up to 1 V. The buffers are physically separated from the VCO and PD with transmission line interfaces to prevent jitter generation due to cross-talk via substrate and common grounds. The VCO is also separated from the PD with a similar transmission line interface. All of the blocks have separate power-supply systems routed according to the isolation and analog–digital ground splitting techniques described in [6]. All of the CDR circuits are fully differential. The 10-Gb/s inputs and outputs are terminated on-chip with 50-Ω resistors.

IV. SIMULATION

Five levels of hierarchical PLL analysis were carried out: analytical, behavioral linear, behavioral mixed-mode, circuit schematic level, and post layout with distributed parasitics. The last four levels are HSPICE-based. A mixed-mode behavioral library of linear and digital components was developed. All levels of simulation give consistent results, with increasing

GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION

Fig. 11. Microphotograph of the SiGe CDR with linear PLL.

insight into jitter behavior at more detailed levels. Analytical models of jitter transfer and jitter tolerance are based on the second-order linear PLL theory as described in Section II. With the addition of and , the PLL becomes a third-order loop. This was simulated along with on-chip, in-package, and external filter parasitics using HSPICE-based models. AC simulation results for ranging from 1/6 to 1 are shown in Fig. 10(a) and (b). Jitter tolerance response was designed to fit the mask at . For , the is compliant with SONET requirements. Jitter peaking is within the required 0.1-dB value for the simulated range. Sub-picosecond jitter generation was predicted in circuit transient simulation.

Fig. 10. HSPICE simulated jitter characteristics. (a) Jitter tolerance. (b) Jitter transfer.

V. FABRICATION

The CDR circuit was implemented in IBM's SiGe HBT bipolar process, which includes pMOS devices. Detailed device characteristics are given in [12]. Die size is 3 × 3 mm (Fig. 11). Three external RC-filter components are required to complete the CDR design.

Fig. 12. CDR 9.529-Gb/s eye diagrams and the recovered clock. Input data: 30 mV, PRBS pattern.

VI. EXPERIMENTAL RESULTS

The IC performed as simulated, except for the VCO oscillation frequency, which was 5% lower than simulated. Measurements were done on-wafer using membrane probes from Cascade Microtech. The PLL was locked by an external sweeping of the VCO frequency using the 10-GHz adjust input (see Fig. 1). The locking range is 25 MHz and the PLL stays locked within a 200-MHz frequency range. Fig. 12 shows recovered clock and CDR eye diagrams at 9.529-Gb/s data rate and 30 mV , PRBS input signal. Measured sensitivity was 14 mV at BER 10 . This value is close to the simulated 13.4 mV (Section III), which indicates that a sufficient level of cross-talk isolation is achieved.

The jitter tolerance was measured for jitter amplitudes below 40 ps . This jitter was generated by modulating the clock slicing level (see Fig. 1) with an external signal from dc to 100-MHz frequency range. No data errors were detected associated with this jitter. This demonstrates jitter tolerance of more than 40 ps compared to the 15 ps SONET mask above 4 MHz.

To verify the maximum bit rate of the IC, it was tested at 13.25 Gb/s in a data recovery mode with an external clock (Fig. 13). Sensitivity at 12.5 Gb/s was measured to be 15.5 mV . Data-recovery clock-delay margin of 77 ps at 10 Gb/s was the same as the BER tester delay margin, confirming picosecond timing resolution of the decision circuit.

The recovered clock jitter was measured, with a digital oscilloscope, to be 1.85 ps rms versus 1.68 ps rms jitter of the reference clock. Therefore jitter generated by the CDR is estimated to be 0.78 ps rms. Phase noise was measured more accurately with an HP4352B phase-noise meter (Fig. 14). Recovered clock phase noise follows, with no error, the data reference clock noise down to the CDR jitter noise floor at −110 dBc/Hz. The noise floor is reached within the bandwidth of the loop (designed to be above 4 MHz). Numerically integrated phase noise of the recovered clock in 80-MHz bandwidth gives a jitter value of 0.77 ps rms. Jitter was found to be independent of the PRBS word length up to . The IC dissipates 1.5 W with a 5-V power supply.

Fig. 13. CDR data recovery eye diagrams at 13.25 Gb/s. Input data: 14 mV, PRBS pattern. The RECCLK waveform is the PRBS generator reference clock translated by the CDR.

Fig. 14. Phase-noise comparison of the CDR recovered clock, free-running VCO, and data pattern generator reference clock.

VII. CONCLUSION

In this paper, a low-jitter integrated CDR with a linear-type PLL has been demonstrated. The PLL equivalent model and design method to meet SONET jitter requirements were presented. The IC was implemented in SiGe technology. Sub-picosecond rms jitter with no jitter dependence on data PRBS pattern is achieved. Jitter generation factors in CDR were considered. A single-edge version of the Hogge-type PD and a tri-state charge pump were designed to satisfy jitter requirements. PMOS transistor circuits and a cross-talk isolation technique were used to improve CDR jitter performance. In a second-order LPLL, a bandwidth of more than 4 MHz and a damping factor of 4–6 at minimum expected data transition density are recommended to satisfy OC192 jitter tolerance and jitter transfer peaking requirements. To satisfy jitter transfer bandwidth ( 120 kHz), additional low-pass filtering of the recovered clock must be performed, for instance, in the PLL of a transmitter circuit.

APPENDIX A

The CDR jitter transfer function is similar by definition to the PLL phase transfer function . For the second-order charge-pump PLL of Fig. 2, the phase transfer function is [9]

(A.1)

This formula can be rewritten as

(A.2)

where is a bandwidth of the asymptotic single-pole low-pass transfer function

(A.3)

The jitter response approaches the low-pass response at or at . The asymptotic jitter tolerance shape function can be defined from (4) and (A.3) as

(A.4)
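As a rough check on the recommended damping factor of 4–6 against the 0.1-dB jitter peaking limit, the following sketch evaluates the peaking of a second-order PLL numerically. It assumes the standard textbook closed-loop form with a zero, H(s) = (2ζω_n s + ω_n²)/(s² + 2ζω_n s + ω_n²), which is taken here as a stand-in for the phase transfer function cited from [9]; the function names and the frequency grid are illustrative choices, not part of the paper.

```python
import math


def jitter_transfer_mag(w_ratio, zeta):
    """|H(jw)| of the assumed second-order PLL, with w_ratio = w / w_n."""
    num = complex(1.0, 2.0 * zeta * w_ratio)
    den = complex(1.0 - w_ratio ** 2, 2.0 * zeta * w_ratio)
    return abs(num / den)


def peaking_db(zeta, points=4000):
    """Largest gain in dB over a log-spaced sweep of w / w_n from 1e-3 to 1e2."""
    ratios = (10.0 ** (-3.0 + 5.0 * i / (points - 1)) for i in range(points))
    peak = max(jitter_transfer_mag(r, zeta) for r in ratios)
    return 20.0 * math.log10(peak)


if __name__ == "__main__":
    for zeta in (1.0, 2.0, 4.0, 5.0, 6.0):
        print(f"zeta = {zeta:3.1f}  peaking = {peaking_db(zeta):.3f} dB")
```

Under this assumed model the peaking falls as the damping factor rises, dropping below 0.1 dB between damping factors of 4 and 5, which is in line with the recommended range.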

APPENDIX B

The following PLL filter components are found by solving (1)–(3):

(B.1)

(B.2)

ACKNOWLEDGMENT

The authors thank C. Kelly and P. Popescu for discussions, J. E. Rogers for his contributions to layout design and simulations, M.-L. Xu for help with the output buffer layout, J. Showell for assistance with the measurements, Dr. S. Voinigescu and D. Marchesan for their expertise in SiGe components modeling, and Dr. M. Copeland for advice on VCO phase noise analyses. Special thanks to R. Hadaway for his support and to IBM Corporation for fabrication.

REFERENCES

[1] T. Morikawa et al., “A SiGe single-chip 3.3 V receiver IC for 10Gb/s optical communication systems,” in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 380–381.
[2] R. C. Walker et al., “A 10Gb/s Si-bipolar Tx/Rx chipset for computer data transmission,” in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302–303.
[3] Y. Greshishchev and P. Schvan, “SiGe clock and data recovery IC with linear-type PLL for 10 Gb/s SONET application,” in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[4] B. Razavi, “Design of monolithic phase-locked loops and clock recovery circuits—A tutorial,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE Press, 1996, pp. 405–420.
[5] K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino, “A 2.5-Gb/s clock and data recovery IC with tunable jitter characteristics for use in LANs and WANs,” IEEE J. Solid-State Circuits, vol. 34, pp. 805–812, June 1999.
[6] Y. Greshishchev and P. Schvan, “60 dB gain 55 dB dynamic range 10Gb/s SiGe HBT limiting amplifier,” IEEE J. Solid-State Circuits, vol. 34, pp. 1914–1920, Dec. 1999.
[7] L. De Vito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE Press, 1996, pp. 405–42.
[8] “SONET OC-192 Transport System Generic Criteria,” Bellcore, GR-1377-CORE, Mar. 1998.
[9] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[10] C. R. Hogge, “A self-correcting clock recovery circuit,” IEEE J. Lightwave Technol., vol. 3, pp. 1312–1314, Dec. 1985.
[11] B. Jansen, K. Negus, and D. Lee, “Silicon bipolar VCO family for 1.1 to 2.2 GHz with fully-integrated tank and tuning circuits,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 392–393.
[12] J. D. Cressler, “SiGe HBT technology: A new contender for Si-based RF and microwave circuit applications,” IEEE Trans. Microwave Theory Tech., vol. 46, pp. 572–589, May 1998.

Yuriy M. Greshishchev (M’95) received the M.S.E.E. degree from Odessa Electrotechnical Institute of Communications, Odessa, Ukraine, in 1974 and the Ph.D. degree in electrical and computer engineering from V. M. Glushkov Institute of Cybernetics, Kyiv, Ukraine, in 1984. From 1976 to 1994, he worked with research and development organizations and academia on high-speed ADC and DAC circuit theory and design, primarily in the area of silicon bipolar and GaAs MESFET integrated circuits. His Ph.D. research was dedicated to the development of folding-type video ADCs embedded into TV systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center of University of Toronto, Toronto, ON, Canada. In 1994, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he conducted research on low-voltage GaAs MESFET circuits for digital wireless communication. Since 1996, he has been with Nortel Networks, Ottawa, ON, Canada, where he is responsible for the development of highly integrated circuit solutions in emerging technologies for optical communications. He is the coauthor of two books and more than 40 technical papers on the area of data converters, high-speed circuit design, and statistical modeling.


Peter Schvan (M’89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics from Eotvos Lorand University, Budapest, in 1975 and the Ph.D. degree in electrical engineering from Carleton University, Ottawa, ON, Canada, in 1985. In 1985, he joined Nortel Networks, Ottawa, ON, Canada, where he worked in the area of BiCMOS and bipolar technology development, yield prediction, device characterization, and modeling. Recently, his work has been extended to the design of multigigabit circuits and systems. He is currently Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating advanced circuit concepts required for fiber-optic communication systems. He is the author or coauthor of numerous publications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999


A 2–1600-MHz CMOS Clock Recovery PLL with Low-Vdd Capability

Patrik Larsson

Abstract—A general-purpose phase-locked loop (PLL) with programmable bit rates is presented demonstrating that large frequency tuning range, large power supply range, and low jitter can be achieved simultaneously. The clock recovery architecture uses phase selection for automatic initial frequency capture. The large period jitter of conventional phase selection is eliminated through feedback phase selection. Digital control sequencing of the feedback enables accurate phase interpolation without the traditional need of analog circuitry. Circuit techniques enabling low-Vdd operation of a PLL with differential delay stages are presented. Measurements show a PLL frequency range of 1–200 MHz at Vdd = 1.2 V linearly increasing to 2–1600 MHz at Vdd = 2.5 V, achieved in a standard process technology without low threshold voltage devices. Correct operation has been verified down to Vdd = 0.9 V, but the lower limit of differential operation with improved supply-noise rejection is estimated to be 1.1 V.

Index Terms—Frequency locked loops, frequency synthesizers, phase comparators, phase jitter, phase locked loops, phase noise, synchronization.

I. INTRODUCTION

THE continuing scaling of CMOS process technologies enables a higher degree of integration, reducing cost. This fact, combined with the ever shrinking time to market, indicates that designs based on flexible modules and macrocells have great advantages. In clock recovery applications, flexibility means, for example, programmable bit rates requiring a phase-locked loop (PLL) with robust operation over a wide frequency range. Increased integration also implies that the analog portions of the PLL (mainly the voltage-controlled oscillator [VCO]) should have good power-supply rejection to achieve low jitter in the presence of large supply noise caused by digital circuitry.

Another trend is low-power design using reduced Vdd. This reduces the headroom available for analog design, causing integration problems for mixed-mode circuits [1]. Furthermore, in applications where power consumption is a more critical design goal than compute power, VT is not scaled as aggressively as Vdd to avoid leakage currents in OFF devices, which aggravates the headroom problem. For mixed-mode circuits with significant analog circuitry, dual- and/or dual- processing combined with a dc/dc converter [2] is a viable solution. However, for circuits dominated by digital logic, it is difficult to justify the additional fabrication steps required for these solutions. A common case of the latter mixed-mode design is a large digital circuit incorporating a PLL-based clock generator with low-jitter requirements, which is the most common mixed-mode design today. Digital style PLL's have been suggested, e.g., [3], but these cannot compete with the supply-noise rejection of differential analog circuitry.

A clock recovery PLL architecture suitable for programmable bit rates is developed in Sections II and III with emphasis on jitter reduction. Sections IV–VI present PLL circuit techniques that use the noise resistant differential pair but avoid other “expensive” (in terms of headroom) analog circuitry, such that low-Vdd operation is enabled in a standard digital CMOS process without the need of low-threshold devices.

Manuscript received April 13, 1999; revised June 19, 1999. The author is with Bell Laboratories, Lucent Technologies, Holmdel, NJ 07733 USA. Publisher Item Identifier S 0018-9200(99)08963-5.

II. LOW-JITTER PHASE-SELECTING CLOCK RECOVERY

A basic PLL for clock recovery is shown in Fig. 1(a). In most CMOS implementations, the VCO must have a tuning range covering more than 50% of the target frequency to guarantee high yield over large process variations. This large frequency range requires special techniques for initial frequency locking since there exists no phase detector for nonreturn-to-zero (NRZ) data that operates reliably with large initial frequency offset. Available techniques include frequency sweeping [4], using a replica VCO matched to the clock generating VCO [5], or initially locking the PLL to a reference frequency with a frequency detector before switching to the input data and locking with a phase detector [6].

One common technique requiring no special initialization is shown in Fig. 1(b). This dual-loop PLL can be traced back to [7], which was based on a delay-locked loop (DLL). The multiple-output VCO in Loop A in Fig. 1(b) generates a number of equally spaced clock phases at a frequency of . This loop can have a large frequency tuning range since it is locked with a phase frequency detector (PFD). Clock recovery is performed by Loop B, which generates the recovered clock by selecting the clock phase from Loop A that is best aligned with the incoming data. If there is a frequency offset between and the incoming data, an appropriate clock can still be generated by changing the Ctrl signal to select a different phase over time.

Frequency initialization is automatically achieved by selecting appropriate and for the expected data rate. Most communication systems have a frequency tolerance of a few hundred parts per million (ppm), eliminating any need for a frequency detector in Loop B. The decoupling of the VCO loop from the data recovery loop enables independent selection of bandwidth in those two loops. This allows a large bandwidth in

Loop A to suppress VCO jitter [4], for example, jitter induced by power-supply noise. At the same time, a low bandwidth can be used in Loop B to reduce jitter transfer. This cannot be achieved by the PLL in Fig. 1(a), which has a single loop with conflicting design goals regarding loop bandwidth.

A disadvantage of a phase-selecting PLL is the phase step that is generated when the Ctrl signal in Fig. 1(b) switches to a new clock phase. This phase switching leads to large cycle-to-cycle jitter (greater than or equal to the phase spacing) that can actually dominate the peak-to-peak jitter. By increasing the number of phases, the phase spacing will be smaller with less jitter. More phases can be generated by having more delay stages in the VCO, but this limits the speed. An alternative is phase interpolation, which enables a large number of phases without degrading the VCO speed [8], [9]. However, interpolators add analog circuitry to the design and are prone to mismatch, which in the worst case can lead to nonmonotonic phase spacing.

A proposed remedy for the jitter due to phase steps is shown in Fig. 1(c). Instead of selecting a clock phase feeding the sampling flip-flop and the phase detector, the feedback clock in Loop A is selected from the multiple VCO phases. When Loop B detects a misalignment of the incoming data and the VCO output clock, the Ctrl signal is changed to select a different phase for the feedback clock. This will cause a phase change in the divided clock feeding the PFD such that the charge pump will alter the VCO control voltage stored in the loop filter. Therefore, sudden phase steps generated by the clock recovery logic will be smoothed by the filter of Loop A, causing the VCO clock to slowly drift toward the correct phase with a rate of change determined by the bandwidth of Loop A. In a 622-MHz application with a division ratio of and a bandwidth of Loop A equal to one-tenth of , it will take approximately clock cycles to complete a phase switch. The jitter caused by a phase step in the structure in Fig. 1(c) is therefore spread out over 80 clock cycles, significantly reducing the jitter compared to Fig. 1(b). Feedback phase selection has previously been applied to fractional- frequency synthesizers for other purposes [10], [11].

Fig. 1. Clock recovery PLL's. (a) Standard, (b) phase selection, and (c) feedback phase selection. ChP + F denotes charge pump + loop filter.

III. AVERAGING PHASE INTERPOLATION

The smoothing effect of the loop filter can also be used for phase interpolation. If the Ctrl signal in Fig. 1(c) alternates between two different clock phases every second cycle of the reference clock, the result will be a VCO clock phase corresponding to the average of the two selected phases. In the test chip, four levels of averaging phase interpolation were implemented by circulating through four clock cycles and in each clock cycle selecting phase or as the feedback clock. A quarter phase interpolation is then achieved by selecting for three consecutive clock cycles, then selecting for the fourth cycle and repeating this sequence. The architecture in Fig. 1(c) lends itself naturally to combining both averaging phase interpolation and standard current-mode interpolation.

A test chip was built in a 0.25-µm, 2.5-V digital CMOS process to evaluate the jitter performance of the phase selection architecture. A block diagram of the implemented VCO and phase control circuitry is shown in Fig. 2. The phase select control code at the input consists of seven bits, of which two are directly fed to a finite state machine (FSM) that generates control signals for realizing the averaging interpolation. The remaining five bits of the control code represent , from which the code for is generated by adding one. The FSM controls Mux1 to select one of the codes representing and in a four clock period repetitive cycle, as described above. The five bits at the output of Mux1 are split into three bits coarse select and two bits fine select. The three coarse bits select two neighboring phases from a four-stage differential VCO having eight evenly spaced output phases and send these two phases to a current-mode interpolator. Mux2/Mux3 in Fig. 2 receive one coarse bit each, and the third coarse bit is used to conditionally invert the output signals.
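The averaging scheme described above can be sketched numerically. The snippet below is a minimal illustration that assumes the loop filter averages perfectly over the four-cycle pattern; the helper names and the integer phase index `k` are hypothetical, and the order of selections within the pattern is an arbitrary choice.

```python
from fractions import Fraction


def quarter_step_pattern(k, quarters):
    """Four-cycle selection pattern: phase k for (4 - quarters) cycles, then k + 1."""
    if not 0 <= quarters <= 3:
        raise ValueError("quarters must be 0, 1, 2, or 3")
    return [k] * (4 - quarters) + [k + 1] * quarters


def averaged_phase(pattern):
    """Mean phase index over one repetition of the pattern (ideal averaging assumed)."""
    return Fraction(sum(pattern), len(pattern))


# Phase count stated in the text: 8 VCO phases, 4x current-mode, 4x averaging.
VCO_PHASES = 8
CURRENT_MODE_STEPS = 4
AVERAGING_STEPS = 4
TOTAL_PHASES = VCO_PHASES * CURRENT_MODE_STEPS * AVERAGING_STEPS

if __name__ == "__main__":
    pattern = quarter_step_pattern(5, 1)
    print(pattern, float(averaged_phase(pattern)), TOTAL_PHASES)
```

Selecting phase 5 three times and phase 6 once averages to 5.25, a quarter of the way between the two neighboring phases.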
The interpolator is similar to the Type-I circuit in [9] and is controlled by a four-bit temperature code derived from the two fine select bits. Both the current-mode interpolation and the averaging phase interpolation are programmable in the test chip and can be disabled. The two complementary multiplexers at the output

LARSSON: 2–1600-MHz CMOS CLOCK RECOVERY PLL

Fig. 2. VCO, phase selector, interpolator, and feedback multiplier.

Fig. 3. 500-MHz period distribution histograms. (a) Clock recovery inactive, (b) standard phase selection, (c) feedback phase selection, and (d) averaging phase interpolation. Measurement conditions were Vdd = 2.5 V, N = 25, fref = 20 MHz, and fdata = 499.4 Mb/s.

Fig. 4. Phase shift versus phase code measured at 1 GHz.

of the VCO/interpolator (Mux4/Mux5) allow the chip to be configured for the scheme in either Fig. 1(b) or (c), enabling a performance comparison.

Freezing the 7-bit phase select control code to a fixed phase gives a measured output period jitter¹ of 7.6 ps rms when running the VCO at 500 MHz, as shown in Fig. 3(a). This is the jitter inherent in the VCO and the output buffers. Configuring the chip for the standard phase selecting scheme in Fig. 1(b) with 32 clock phases (4× interpolation) gives the jitter in Fig. 3(b), revealing a long tail in the histogram caused by a frequency offset of 1200 ppm between the incoming data and . The phase spacing is 2 ns/32 = 62.5 ps such that we can expect a second peak in the histogram 62.5 ps away from the main peak, which is confirmed by the shape of the histogram. As shown by the inset, this peak is slightly off its ideal position and is smeared out due to the nonideal ac behavior of the current-mode interpolator.

¹ Timing uncertainty between two consecutive edges of the generated clock.

Feedback phase selection with 4× current-mode interpolation eliminates the long tail in the histogram, bringing the period jitter down to 7.9 ps rms, as shown in Fig. 3(c). This indicates that the jitter is completely dominated by the VCO jitter, and nearly all of the phase-switching jitter caused by digital clock recovery can be eliminated. Fig. 3(d) shows the period jitter histogram obtained when the current-mode interpolators are disabled and 4× averaging phase interpolation is used instead. Its similarity to the result in Fig. 3(c) shows that the same low jitter and the same number of discrete clock phases (32) can be achieved without the analog interpolation circuitry.

Enabling both the current-mode interpolator and the averaging phase interpolation gives a total of 128 selectable clock phases. The graph in Fig. 4 shows the phase shift as a function of the phase select control code when the period of the VCO is 1 ns. The expected phase step is 1 ns/128 ≈ 7.8 ps, whereas the largest measured step is 21 ps, resulting in a differential nonlinearity (DNL) of 1.7 bits. The differential VCO makes the phase curve near-symmetric around the midpoint, suggesting that the integral nonlinearity (INL) of 94 ps is mainly due to delay mismatch in the VCO. The main contributing factor to this mismatch is unbalanced parasitic wiring capacitors that are difficult to match without incurring speed penalty. If Loop B is a first-order loop or a well-damped second-order loop, the feedback in Loop B will automatically select the best fit phase select code, reducing the impact of INL. The maximum phase deviation in a clock recovery application is then the DNL added to the VCO jitter. In addition to jitter reduction and phase interpolation, feedback phase selection also has other advantages when combined with other architectures.
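The phase-spacing and DNL figures quoted above follow from simple arithmetic, reproduced in the sketch below. The function names are illustrative; the inputs are the values stated in the text and in the Fig. 3 measurement conditions (500-MHz and 1-GHz VCO, 32 and 128 phases, a 21-ps largest step, 499.4-Mb/s data), and DNL is taken here as the largest step in units of the ideal step minus one.

```python
def phase_step_ps(period_ps, n_phases):
    """Ideal spacing between adjacent clock phases, in picoseconds."""
    return period_ps / n_phases


def dnl_lsb(largest_step_ps, ideal_step_ps):
    """Differential nonlinearity of the worst step, in units of the ideal step."""
    return largest_step_ps / ideal_step_ps - 1.0


def offset_ppm(f_clock_hz, f_data_hz):
    """Frequency offset of the data rate relative to the clock, in ppm."""
    return (f_clock_hz - f_data_hz) / f_clock_hz * 1e6


if __name__ == "__main__":
    print(phase_step_ps(2000.0, 32))     # 500-MHz VCO, 32 phases
    ideal = phase_step_ps(1000.0, 128)   # 1-GHz VCO, 128 phases
    print(ideal, dnl_lsb(21.0, ideal))
    print(offset_ppm(500e6, 499.4e6))
```

The 62.5-ps spacing matches the position of the second histogram peak, and the worst-case step of 21 ps against a 7.8-ps ideal step gives the DNL of about 1.7.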
Using feedback instead of feedforward phase selection reduces circuit complexity, thereby eliminating the need for good matching in an analog-style interpolator [12] and a high-speed parallel sampling structure [13].

IV. VCO

Recently, low-noise VCO's utilizing high-swing complementary signals have been presented (e.g., [14] and [15]).

Good 1/f noise performance has been shown, but their power-supply noise rejection is inferior to that of the standard analog differential pair since they lack a high-impedance source, making the delay depend on Vdd. Therefore, the analog style differential pair is preferable in applications where power-supply noise is the main source of oscillator jitter.

When a differential pair with resistor loads is used as a delay cell in a VCO, the frequency is regulated by changing the tail current as implemented by the control voltage in Fig. 5(a). To achieve a large frequency tuning range, it is desirable that the output swing and common mode do not change significantly with frequency. Often the replica-bias scheme in Fig. 5(a) is employed, which relies on good matching between a replica of the delay stage (devices Mr1–Mr3) and the VCO delay stages to set the VCO output swing from to , giving a known common mode and swing independent of the speed-regulating current. A disadvantage of this technique is that the PMOS load (Mr3) will operate as a current source at low frequencies, introducing high gain in the replica feedback loop. To prevent instability, a large compensation capacitor is required, which introduces another pole in the PLL, leading to more intricate design. Furthermore, the amplifier in the replica bias loop requires additional headroom, thereby prohibiting low-Vdd operation.

Fig. 5(b) shows a structure that achieves the good power-supply noise rejection of the analog differential pair, at the same time enabling low-Vdd operation. The PMOS diodes are used for clamping the output voltage to a minimum level of , giving a fixed common mode and swing without the need for a replica bias circuit. This makes the VCO suitable for a wide range of operating frequencies and supply voltages. To guarantee clamping action, the NMOS tail current must be larger than the current through the controlled PMOS load . Furthermore, a proposed design goal for low oscillator noise suggests that the rise and fall times of the output nodes should be made equal [16]. This is achieved by reflecting half of to each of the controlled PMOS loads by the current mirror formed by devices Md1–Md3. Assuming that Md4 recently turned on, “node a” will be discharged by a current of . At the same time, the complementary output node is pulled by a current equal to , indicating equal rise and fall times. A disadvantage of this oscillator is the additional parasitic capacitance of the diodes, which makes the maximum operating frequency lower than that of the replica bias structure. The additional gate capacitance of the diode loads can be eliminated by using NMOS diodes [17].

The minimum supply voltage for the VCO is , which has been verified by measurements down to Vdd = 0.9 V. However, at this value of Vdd, the VCO is no longer differential. An estimate of the minimum Vdd for differential operation can be derived from the simulated VCO waveforms in Fig. 6. The VCO output swings from down to approximately . For differential operation, it is required that both NMOS devices in the differential pair (Md4, Md5) are turned ON at the crossover point of the waveforms. Assuming a drop over the current source device Md6 leads to a minimum input voltage of . At the lowest limit of Vdd, this input voltage is generated by the previous stage in the oscillator, indicating a minimum Vdd of . Measurements determined and to be 0.53 and 0.85 V, respectively, indicating a minimum Vdd of about 1.1 V assuming a of 0.1 V. Note that this is a theoretical number, since the differential operation of the VCO has zero tuning range at this value of Vdd. Good power-supply rejection can also be achieved by the regulated-supply structure in [18]. However, the requirement of a large decoupling capacitor generates contradicting design goals on PLL bandwidth.

Fig. 5. Bias generation and one VCO delay stage of (a) replica bias scheme and (b) diode clamping.

Fig. 6. Example VCO waveforms to estimate lower limit on Vdd.

V. CHARGE PUMP

A. Bandwidth and Peaking Compensation

To reduce peak-to-peak jitter due to VCO noise, it is advantageous to keep as high a PLL bandwidth as possible. Traditional worst case design would keep the PLL bandwidth and damping factor sufficiently far away from stability limits under all variations of the input reference frequency, the

manufacturing process, and the division ratio N in the feedback path. The concept of self-biasing introduced in [19] simplifies the design by eliminating process variations and the input reference frequency from the stability constraints. However, the PLL bandwidth is still a function of N, so that maximum noise suppression can only be achieved for a fixed N. In programmable applications, N can vary by more than an order of magnitude, indicating that the stability constraint can be dominated by the variation in N instead of process variations, as shown by the stability limit of a charge-pump PLL [20]

(1)

The quantities in (1) are the input reference frequency (or effectively the sampling rate of the phase detector), the VCO gain, the charge-pump current, and the loop filter resistance. Other PLL design parameters, such as bandwidth and damping factor, also change with N. Compensating loop parameters for changes in N guarantees that the PLL is always operating with maximum bandwidth and fixed damping factor without endangering stability. This can be done by setting the charge pump current to

(2)

which is defined in terms of a fixed reference current. This is realized by the current multiplier in Fig. 7, which generates the charge-pump current by letting the individual bits of N control binary weighted current sources.

The simulated jitter transfer function of a standard PLL in Fig. 8(a) demonstrates the change of loop parameters as N is altered. The damping factor is intentionally set low to show its dependence on N. The measured jitter transfer function of Loop A in Fig. 8(b) shows the desired independence of N. The slight deviation of the curves is caused by transistor mismatch in the current multiplier.

Fig. 7. Current multiplier generating charge-pump biasing voltages Vqbn and Vqpb.

Fig. 8. Jitter transfer functions for different division ratios. (a) Simulated standard PLL. (b) Measured characteristics of Loop A with intentionally low damping.

B. Charge Sharing

A common problem of many charge pumps is charge sharing. For the charge pump in Fig. 9(a) (Type A), charge sharing is caused by the parasitic capacitance in nodes pcs and ncs [21]. When is active, node pcs is charged to . When deactivating , some of the charge stored in node pcs will leak through the current source device. Since the parasitics

Fig. 9. (a) Charge pump suffering from charge sharing (Type A). (b) Charge removal transistors eliminate charge sharing (Type B).

of nodes ncs and pcs can never be matched, this will lead to a static phase offset, as shown in Fig. 10(a). This is the transfer function of a phase-frequency detector followed by a Type A charge pump. The two transistors Mp and Mn in the Type B charge pump in Fig. 9(b) will remove the charge from the nodes pcs and ncs when Up and Down are deactivated [22]. This leads to a large reduction in the phase offset, as shown in Fig. 10(a). For this application, static phase offset in Loop A is not critical. However, when analyzing the cause of phase offset, a source of increased jitter is revealed. Fig. 10(b) indicates that the leakage from node pcs is larger than that from ncs. When the PLL is locked, the leakage mismatch is compensated for by activating one charge-pump control signal earlier than the other, giving a phase offset. Since the compensation charge is applied in the early portion of the charge-pump activation time, it will cause voltage ripple on


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

Fig. 10. Characteristics of the Type A and B charge pumps. (a) Transfer function of PFD followed by charge pump. (b) Simulated $I_{Up}$ and $I_{Down}$ when net output charge is zero.

the loop filter, leading to phase jitter at the VCO output. The charge removal transistors Mn and Mp in Fig. 9(b) eliminate the current tails, resulting in a well-balanced Up and Down activation time such that the Up and Down currents cancel each other, reducing the loop filter ripple. A further advantage of the Type B charge pump is reduced 1/f noise in the current source transistors, achieved by periodically resetting their gate-source voltage to below 0 V [23], [24].

A limitation of the Type B charge pump is a reduced dynamic range of the VCO control voltage. If the control voltage is too low, there will be a current flowing through the Mp device to the output when the Dwn control is inactive. When NMOS devices are used for speed-regulating the VCO, the control voltage will never drop below a minimum level, which makes this constraint easy to fulfill. The charge pump also works only up to a limited output voltage, which limits the upper tuning range of the VCO. However, the charge pump in Fig. 9(a) has the same upper voltage limit.

Mismatch between the Up and Down currents is a similar source of jitter as the charge sharing described above. For low jitter, it is essential to have good matching, implying that the current source devices should be saturated. Again, this restricts the usable output voltage range. Charge removal can also be done by ac coupling [18], but this requires careful timing of the control signals in the charge pump. The solution to charge sharing in [21] is less suitable for low-$V_{dd}$ applications due to the common-mode restrictions on the differential amplifier.
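The division-ratio compensation of the charge-pump current described earlier in this section can be illustrated with a short numerical sketch. It assumes the textbook linearised loop gain of a charge-pump PLL, $K_{VCO} I_{CP} R / (2\pi N)$, and a hypothetical 8-bit current multiplier; the function names, the bit width, and the component values are illustrative assumptions and are not taken from the paper.

```python
import math


def charge_pump_current(n: int, i_ref: float, bits: int = 8) -> float:
    """Sum the binary-weighted current sources enabled by the bits of n.

    Hypothetical model of a current multiplier: bit b of the division
    ratio switches in a source of weight 2**b times the reference current.
    """
    if not 0 <= n < (1 << bits):
        raise ValueError("division ratio does not fit in the multiplier width")
    return sum(i_ref * (1 << b) for b in range(bits) if (n >> b) & 1)


def loop_gain(k_vco: float, i_cp: float, r: float, n: int) -> float:
    """Proportional loop gain of a linearised charge-pump PLL (textbook model)."""
    return k_vco * i_cp * r / (2 * math.pi * n)


# Assumed example values: 1 uA reference current, 10 kOhm filter resistor,
# VCO gain of 1e9 rad/s/V.
I_REF, R_FILTER, K_VCO = 1e-6, 10e3, 1e9

for n in (2, 16, 100):
    fixed = loop_gain(K_VCO, 16 * I_REF, R_FILTER, n)                       # standard PLL
    scaled = loop_gain(K_VCO, charge_pump_current(n, I_REF), R_FILTER, n)   # compensated
    print(f"N={n:3d}  fixed-current gain={fixed:.3e}  scaled-current gain={scaled:.3e}")
```

With the scaled current the printed loop gain is the same for every division ratio, while the fixed-current gain falls as the division ratio grows, which is the qualitative behavior contrasted in Fig. 8(a) and (b).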

Fig. 11. Evolution of loop filter. (a) Ideal model, (b) MOS-only implementation, (c) improved resistor linearity for low-V dd operation, (d) improved capacitor linearity, and (e) final model where C3 models the well-to-substrate capacitance.

VI. LOOP FILTER

The most common PLL loop filter is the simple RC circuit in Fig. 11(a). Common design options for the resistor are poly or the channel resistance of an MOS transistor. For high resistance values, an MOS device is most attractive. However, it has a disadvantage at low $V_{dd}$ if implemented with the straightforward configuration of Fig. 11(b). For a nominal $V_{dd}$ of 2.5 V, the effective resistance of the transmission gate is nearly independent of the VCO control voltage. However, the resistance becomes strongly dependent on the control voltage for low $V_{dd}$ [25], [26]. For sufficiently low $V_{dd}$, the resistance goes to infinity for some values of the control voltage. Exchanging the position of the transmission gate resistor and the MOS capacitor as in Fig. 11(c) will make the gate-source voltage and the resistance of the NMOS device independent of the control voltage. The resistance still varies with $V_{dd}$, but the variation is much less than for the previous configuration.

Since the capacitor in Fig. 11(c) is a “floating” capacitor, it must be implemented with a PMOS device. For some values of the VCO control voltage, the MOS device is between inversion and depletion, where its capacitance value is voltage dependent, as shown in Fig. 12. By altering the gate and source/drain connections of the PMOS as shown in Fig. 11(d), it will operate in accumulation, where the capacitance value is less voltage dependent, as shown in Fig. 12. To avoid strong power-supply noise injection, the well must be connected to the same node as source and drain, as shown in Fig. 11(d). The corresponding filter model is

LARSSON: 2–1600-MHz CMOS CLOCK RECOVERY PLL

Fig. 12. Voltage-dependent capacitance of filter in Fig. 11(c) and (d).

shown in Fig. 11(e), where C3 is the parasitic well-to-substrate capacitance of the MOS capacitor. This filter has an impedance of (3), which is a close approximation to the impedance of the original filter in Fig. 11(a), given as (4), when the component values follow common design practice [20].

VII. PHASE-FREQUENCY DETECTOR

Phase detectors may exhibit a dead zone, resulting in enlarged jitter. A common design technique to avoid a dead zone is to make sure that both Up and Down output signals are fully activated before shutting them both off. This is implemented by generating a reset signal with an AND operation of the Up and Down outputs and introducing a delay before feeding back this signal to reset the phase detector. It is this reset delay that causes the simultaneous Up and Down currents in Fig. 10(b). If the charge sharing in the charge pump is not perfectly cancelled, or if there is a mismatch between the Up and Down currents, there will always be some current compensation, leading to phase offset and loop filter ripple, as discussed in Section V. A longer reset delay results in a longer period during which the VCO is running at a different frequency due to the compensation current. Therefore, the reset delay should be minimized under the constraint that it has to be longer than the response time of the PFD, with some additional design margin, to avoid a dead zone.

A PFD with low logic depth is shown in Fig. 13, including details of the Up section. Its operation is easiest to analyze by assuming an initial state in which both outputs are inactive and the internal node of the precharged gate is precharged high. A rising edge on the reference input discharges this node and sets Up without changing the state of the RS flip-flop. The internal weak feedback in the precharged gate will assure that Up is kept active even if the reference input falls. At the next rising edge on V, Down is activated, which sets the reset signal high, which shuts off Up and, at the same time, triggers the RS flip-flop to precharge the gate again. Down is deactivated in a similar way. In summary, a positive edge on the reference input sets Up, which is reset by the next positive edge on V. This behavior is identical

Fig. 13. Phase-frequency detector used in Loop A with details of Up section.

to the two classical PFD’s implemented by either four RS flip-flops or two resettable D flip-flops. The precharged gate and the shorter logic depth of this implementation make the delay shorter than for the standard PFD’s. This allows a smaller delay in the reset path for eliminating the dead zone, such that loop filter ripple will be reduced and generate less noise. An additional benefit of low logic depth is a reduction in phase detector jitter caused by power-supply-dependent delays and device noise. The reset delay of this PFD can be further reduced by letting the reset signal directly reset the precharged gate at the same time as the RS flip-flop is reset. This technique was not adopted in order to keep a conservative design, guaranteeing operation with no dead zone. Similar precharged gates have previously been used in PFD designs [27]–[29].

VIII. FREQUENCY DIVIDER

To enable high flexibility, the frequency divider in Fig. 2 is a fully programmable divider. The structure in [30], based on a clock-gated dual-modulus prescaler followed by a counter, was chosen to achieve high speed at low supply voltages. The divider was realized in standard static CMOS logic, reaching a maximum operating frequency of 800 MHz in simulations of worst case slow process variation at a specified supply voltage and temperature. This exceeded the simulated speed limit of the VCO. The potential startup deadlock in [30] was eliminated by logic that prohibits two consecutive clock pulse removals.

IX. PLL OPERATING RANGE AND JITTER

The maximum operating frequency of the PLL measured at room temperature is plotted in Fig. 14 as a function of $V_{dd}$. Simulations indicate that the speed is limited by the VCO. A minimum $V_{dd}$ of 0.9 V from simulation agrees well with the measured minimum. At low power-supply voltages, the speed cannot compare with high-end circuits using standard $V_{dd}$. However, the operating frequency range exceeds that of low-voltage circuit implementations [2], [3], [25], [26]. The maximum speed also compares favorably with another low-voltage PLL based on a low-threshold process [18]. With a PLL bandwidth of 2 MHz, the tracking jitter is 5.2 ps rms at 1200 MHz, as shown in Fig. 15(a). This measurement represents the standard deviation of the delay between a

Fig. 14. Measured maximum PLL operating frequency.

triggering clock edge and a clock edge occurring 320 ns later according to the setup in Fig. 2(e) in [31]. The delay was chosen four times larger than the delay at which the “jitter knee” occurs in Fig. 2(f) in [31] (that is, 80 ns) to get reliable data. All signals on the chip are periodic with the reference frequency, so when using the frequency divider output as a triggering signal, most of the jitter due to power supply and board noise will cancel in the measurement. Such a setup gave an rms jitter of 2.5 ps at 1200 MHz, as shown in Fig. 15(b). The jitter with respect to an ideal reference would in this case be a factor of $\sqrt{2}$ lower, about 1.8 ps [31]. This indicates that device noise in a standard ring oscillator is tolerable for communication standards with very tight jitter tolerances such as SONET OC-48 (2.5 Gb/s), which has a jitter specification of 4 ps rms at a 2-MHz PLL bandwidth. The VCO power consumption was 5 mW at 1200 MHz in simulation of extracted layout. The figure of merit defined in [31] is estimated from

$$\kappa = \sigma_x \sqrt{4\pi f_L}, \qquad \sigma_x = \sigma_{\Delta T}/\sqrt{2} \qquad (5)$$

where $f_L$ is the PLL bandwidth, $\sigma_{\Delta T}$ is the measured long-term self-referenced tracking jitter, and $\sigma_x$ is the tracking jitter with respect to an ideal reference clock. Table I lists measured $\sigma_{\Delta T}$ and derived $\kappa$ as a function of operating frequency. The jitter reported here is lower than that in [32] due to an improved measurement setup and more accurate measurement equipment. The $\kappa$ of this oscillator is better than that reported for bipolar implementations in [31] and for a CMOS VCO (as derived from the Slide Supplement of [33]). The $\kappa$ for the complete PLL is similar to that reported for a stand-alone VCO in [34], suggesting that noise contributions from the other PLL components (charge pump, PFD, frequency divider) are much smaller than the VCO noise. The tracking jitter compares favorably with several other CMOS oscillators (e.g., [8], [33], and [35]–[37]). The figure of merit degrades significantly at 1600-MHz operation, indicating the speed limit of the PLL. The measurements are also used for estimating the VCO period jitter due to device noise, based on the relation [31]

$$\sigma_T = \kappa \sqrt{T} \qquad (6)$$

where $T$ is the VCO period. The derived period jitter agrees to within 10% of the phase noise estimation technique in [38].

Fig. 15. A 1.2-GHz tracking jitter histogram. (a) Including power supply and board noise. (b) Supply and board noise cancelled.

The PLL tracking jitter is plotted as a function of frequency for various power-supply voltages in Fig. 16. These measurements were taken with a fixed PLL bandwidth. Since the loop filter resistance changes with $V_{dd}$, the charge-pump reference current in Fig. 7 was adjusted until a 3-dB PLL bandwidth of 2 MHz was measured. A bandwidth of 2


TABLE I. PLL JITTER FOR $V_{DD}$ = 2.5 V MEASURED WITH A PLL BANDWIDTH OF 2 MHz

Fig. 16. Tracking jitter as a function of $V_{dd}$ and frequency with a PLL bandwidth of 2 MHz. The right-hand scale represents the figure of merit $\kappa$ [31].

MHz represents a scaling factor of approximately 5000 in (5), as indicated by the right-hand scale in Fig. 16. The VCO frequency is set by the tail current in the differential delay stages and is practically independent of $V_{dd}$. Therefore, the power consumption at a fixed VCO frequency drops linearly with $V_{dd}$. The graph in Fig. 16 shows that the jitter does not change with $V_{dd}$, suggesting that power reduction (by lowering $V_{dd}$) can be achieved without jitter penalty. This seems to contradict the common belief that jitter should increase with lower power consumption. However, the critical parameter for low jitter is not power consumption but current consumption, as has previously been theoretically derived for LC oscillators [39], [40]. As shown by these measurements, low-jitter design with a fixed power budget should be based on a minimum $V_{dd}$ and as large a current as can be tolerated. The measured power consumption at 250 MHz and 2.5 V is 18 mW and is dominated by buffers and the current-mode interpolator. Simulations of extracted layout indicate that the VCO consumes 0.7 mW at 250 MHz and 5 mW at 1200 MHz. The PLL characteristics are summarized in Table II.

X. CONCLUSION

Clock recovery circuits in CMOS processes require special techniques for initial frequency locking. This need is due to the fact that CMOS process variations dictate a larger frequency tuning range than can be covered by existing frequency detectors for NRZ data. An attractive technique for initial


TABLE II PLL CHARACTERISTICS

frequency locking is phase selection clock recovery, where a multioutput VCO is locked onto a reference clock and a clock recovery loop selects one of the output phases of the VCO. The large period jitter in traditional phase selection clock recovery is eliminated by the feedback phase selection technique presented here. This scheme filters the phase jumps through the PLL loop filter and also enables accurate phase interpolation with digital circuitry only, as opposed to the conventional analog-style phase interpolation.

In applications where the PLL is programmable, important loop characteristics such as bandwidth and damping factor change with the frequency multiplication mode. By making the charge-pump current depend on the division ratio in the feedback divider, a fixed bandwidth and damping factor can be obtained.

Differential analog circuits have superior supply noise rejection compared to digital complementary logic styles and are therefore preferred in an environment with large power-supply noise. However, previous differential PLL implementations have used circuits requiring large headroom, thereby prohibiting low-$V_{dd}$ operation. Circuit techniques for PLL components are discussed that enable low-$V_{dd}$ operation in a process technology without low-threshold devices. Correct operation has been verified at low $V_{dd}$, and a lower limit for differential operation has been estimated for the threshold voltages of the process used. Measurements show that jitter is independent of $V_{dd}$, contradicting the common belief that jitter is strongly correlated to power consumption. At a fixed operating frequency, power reduction is achieved without any penalty in jitter performance by lowering $V_{dd}$. The tracking jitter at 500–1500 MHz was measured to be 2–5 ps rms, dominated by device noise. This indicates that a standard ring oscillator can fulfill the jitter specification for a SONET OC-48 receiver.
This paper demonstrates that a clock recovery circuit with programmable bit rates can be realized with a large frequency tuning range. Robust operation and low jitter are achieved over a large range of power-supply voltages, making it ideal for low-power applications and suitable as a reusable macrocell.
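The jitter figures quoted in Section IX can be cross-checked numerically. The sketch below assumes that the relations taken from [31] have the form $\kappa = \sigma_x \sqrt{4\pi f_L}$, $\sigma_x = \sigma_{\Delta T}/\sqrt{2}$, and $\sigma_T = \kappa\sqrt{T}$; these forms, the function names, and the variable names are assumptions made for illustration. The input numbers (2.5 ps self-referenced jitter, 2-MHz bandwidth, 1200 MHz) are the ones stated in the text.

```python
import math


def figure_of_merit(sigma_x: float, f_l: float) -> float:
    """kappa = sigma_x * sqrt(4*pi*f_L), in units of sqrt(seconds)."""
    return sigma_x * math.sqrt(4 * math.pi * f_l)


def period_jitter(kappa: float, f_vco: float) -> float:
    """sigma_T = kappa * sqrt(T), with the VCO period T = 1 / f_vco."""
    return kappa * math.sqrt(1.0 / f_vco)


F_L = 2e6                      # PLL bandwidth in Hz (from the text)
SIGMA_SELF = 2.5e-12           # self-referenced tracking jitter in s (from the text)

scale = math.sqrt(4 * math.pi * F_L)       # scaling factor applied to sigma_x
sigma_x = SIGMA_SELF / math.sqrt(2)        # jitter versus an ideal reference
kappa = figure_of_merit(sigma_x, F_L)
sigma_t = period_jitter(kappa, 1.2e9)

print(f"scaling factor : {scale:.0f} sqrt(Hz)")
print(f"sigma_x        : {sigma_x * 1e12:.2f} ps")
print(f"kappa          : {kappa:.3e} sqrt(s)")
print(f"period jitter  : {sigma_t * 1e12:.3f} ps")
```

Under these assumptions the scaling factor comes out close to the value of approximately 5000 quoted for a 2-MHz bandwidth, which is a useful sanity check on the assumed form of the relation.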


REFERENCES

[1] K. Bult, “Analog broadband communication circuits in pure digital deep sub-micron CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 76–77.
[2] H. Neuteboom, B. M. J. Kup, and M. Janssens, “A DSP-based hearing instrument IC,” IEEE J. Solid-State Circuits, vol. 32, pp. 1790–1806, Nov. 1997.
[3] W. Lee, P. E. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno, S. Muramatsu, K. Tashiro, M. Fusumada, L. Pham, F. Boutaud, E. Ego, G. Gallo, H. Tran, C. Lemonds, A. Shih, M. Nandakumar, R. H. Eklund, and I. C. Chen, “A 1-V programmable DSP for wireless communication,” IEEE J. Solid-State Circuits, vol. 32, pp. 1766–1776, Nov. 1997.
[4] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[5] R. J. Baumert, P. C. Metz, M. E. Pedersen, R. L. Pritchett, and J. A. Young, “A monolithic 50–200 MHz CMOS clock recovery and retiming circuit,” in Proc. IEEE Custom Integrated Circuits Conf., 1989, pp. 14.5.1–4.
[6] K. M. Ware and C. G. Sodini, “A 200-MHz CMOS phase-locked loop with dual phase detectors,” IEEE J. Solid-State Circuits, vol. 24, pp. 1560–1568, Dec. 1989.
[7] J. Sonntag and R. Leonowich, “A monolithic CMOS 10 MHz DPLL for burst-mode data retiming,” in Proc. IEEE Int. Solid-State Circuits Conf., 1990, pp. 194–195.
[8] M. Horowitz, A. Chan, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, “PLL design for a 500 MB/s interface,” in Proc. IEEE Int. Solid-State Circuits Conf., 1993, pp. 160–161.
[9] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[10] S. Kasturia, “A novel fractional divider for improving the switching speed of phase-locked frequency synthesizers,” Bell Labs Tech. Memo., May 1995.
[11] J. G. Maneatis, personal communication, Feb. 1999.
[12] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, “A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 megabytes/s DRAM,” IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[13] T. H. Hu and P. R. Gray, “A monolithic 480 Mb/s parallel AGC/decision/clock-recovery circuit in 1.2-μm CMOS,” IEEE J. Solid-State Circuits, vol. 28, pp. 1314–1320, 1993.
[14] C.-H. Park and B. Kim, “A low-noise 900 MHz VCO in 0.6-μm CMOS,” IEEE J. Solid-State Circuits, vol. 34, pp. 1586–1591, May 1999.
[15] J. Lee and B. Kim, “A 250 MHz low jitter adaptive bandwidth PLL,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 346–347.
[16] A. Hajimiri and T. H. Lee, “A general theory of phase noise in electrical oscillators,” IEEE J. Solid-State Circuits, vol. 33, pp. 179–195, Feb. 1998.
[17] K. Iravani, F. Saleh, D. Lee, P. Fung, P. Ta, and G. Miller, “Clock and data recovery for 1.25 Gb/s Ethernet transceiver in 0.35-μm CMOS,” in Proc. Custom Integrated Circuits Conf., 1999, pp. 261–264.
[18] V. von Kaenel, D. Aebisher, C. Piguet, and E. Dijkstra, “A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,” in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 132–133.
[19] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[20] F. M. Gardner, “Charge-pump phase-lock loops,” IEEE Trans. Communications, vol. COM-28, pp. 1849–1858, Nov. 1980.
[21] M. Johnson and E. Hudson, “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE J. Solid-State Circuits, vol. SC-23, pp. 1218–1223, Oct. 1988.
[22] P. Larsson and J.-Y. Lee, “A 400 mW 50–380 MHz CMOS programmable clock recovery circuit,” in Proc. IEEE ASIC Conf. Exhibit, 1995, pp. 271–274.
[23] I. Bloom and Y. Nemirovsky, “1/f noise reduction of metal-oxide-semiconductor transistors by cycling from inversion to accumulation,” Appl. Phys. Lett., vol. 58, no. 15, pp. 1664–1666, Apr. 1991.

[24] S. L. J. Gierkink, E. A. M. Klumperink, T. J. Ikkink, and A. J. M. van Tuijl, “Reduction of intrinsic 1/f device noise in a CMOS ring oscillator,” in Proc. IEEE European Solid-State Circuits Conf., 1998, pp. 272–275.
[25] J. Crols and M. Steyaert, “Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages,” IEEE J. Solid-State Circuits, vol. 29, pp. 936–942, Aug. 1994.
[26] A. M. Abo and P. R. Gray, “A 1.5 V, 10-bit, 14 MS/s CMOS pipeline analog-to-digital converter,” IEEE J. Solid-State Circuits, vol. 34, pp. 599–606, May 1999.
[27] S. Kim, K. Lee, Y. Moon, D-K. Jeong, Y. Choi, and H. K. Lim, “A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL,” IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[28] D. W. Boerstler and K. A. Jenkins, “A phase-locked loop clock generator for a 1 GHz microprocessor,” in Proc. IEEE Symp. VLSI Circuits, 1998, pp. 212–213.
[29] H. O. Johansson, “A simple precharged CMOS phase frequency detector,” IEEE J. Solid-State Circuits, vol. 33, pp. 295–298, Feb. 1998.
[30] P. Larsson, “High-speed architecture for a programmable frequency divider and a dual-modulus prescaler,” IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[31] J. A. McNeill, “Jitter in ring oscillators,” IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[32] P. Larsson, “A 2–1600 MHz 1.2–2.5 V CMOS clock-recovery PLL with feedback phase-selection and averaging phase-interpolation for jitter reduction,” in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 356–357.
[33] J. F. Ewen et al., “Single-chip 1062 Mbaud CMOS transceiver for serial data communication,” in Proc. IEEE Int. Solid-State Circuits Conf., 1995, pp. 32–33.
[34] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and phase noise in ring oscillators,” IEEE J. Solid-State Circuits, vol. 34, pp. 790–804, June 1999.
[35] I. Novof, J. Austin, R. Chmela, T. Frank, R. Kelkar, K. Short, D. Strayer, M. Styduhar, and S. Watt, “Fully-integrated CMOS phase-locked loop with 15–240 MHz locking range and 50 ps jitter,” in Proc. IEEE Int. Solid-State Circuits Conf., 1995, pp. 112–113.
[36] Z.-X. Zhang, H. Du, and M. S. Lee, “A 360 MHz 3 V CMOS PLL with 1 V peak-to-peak power supply noise tolerance,” in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 134–135.
[37] I. A. Young, M. F. Mar, and B. Bhushan, “A 0.35-μm CMOS 3–880 MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors,” in Proc. IEEE Int. Solid-State Circuits Conf., 1997, pp. 330–331.
[38] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: A unifying theory and numerical methods for characterization,” in Proc. ACM/IEEE Design Automation Conf., June 1998, pp. 26–31.
[39] Q. Huang, “On the exact design of RF oscillators,” in Proc. IEEE Custom Integrated Circuits Conf., 1998, pp. 41–44.
[40] P. Kinget, “Integrated GHz voltage controlled oscillators,” in Proc. Advances in Analog Circuit Design, Nice, France, Mar. 1999.

Patrik Larsson received the Ph.D. degree from Linköping University, Sweden, in 1995. During his Ph.D. research, he investigated the inherent analog properties of digital circuits, such as di/dt noise, clock skew, and clock slew rate. After graduation, he joined Bell Laboratories, where he is currently working on VCO’s and PLL’s for gigabit/second communication. He has also been working on low-power digital filtering for cable modems, equalization, and clock recovery structures while maintaining his interest in di/dt noise.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999


A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN’s and WAN’s

Keiji Kishine, Member, IEEE, Noboru Ishihara, Member, IEEE, Ken-ichi Takiguchi, and Haruhiko Ichino, Member, IEEE

Abstract— A 2.5-Gb/s monolithic clock and data recovery (CDR) IC using the phase-locked loop (PLL) technique is fabricated using Si bipolar technology. The output jitter characteristics of the CDR can be controlled through the loop-gain design and by using the switched-filter PLL technique. The CDR IC can be used in local-area networks (LAN’s) and in long-haul backbone networks or wide-area networks (WAN’s). Its power consumption is only 0.4 W. For LAN’s, the jitter generation of the CDR when the loop gain is optimized is 1.2 ps (0.003 UI). The jitter characteristics of the CDR optimized for WAN’s meet all three types of STM-16 jitter specifications given in ITU-T Recommendation G.958. This is the first report on a CDR that can be used for both LAN’s and WAN’s. This paper also describes the design method for the jitter characteristics of the CDR for LAN’s and WAN’s.

Index Terms—Clock and data recovery (CDR), IC, jitter suppression, local-area network (LAN), low jitter, phase-locked loop (PLL), transmission receiver, wide-area network (WAN), 2.5 Gb/s.

I. INTRODUCTION

OPTICAL communication systems, which are used in local-area networks (LAN’s) and wide-area networks (WAN’s), are expected to play an important role in realizing the future multimedia society. These systems must be compact, economical to produce, and efficient in terms of power consumption. Given these requirements, researchers have been developing low-power and small-size optical receiver/sender (OR/OS) modules. A clock and data recovery (CDR) circuit is one of the key components of the OR, which must provide retiming, reshaping, and regeneration (3R) operation. To ensure that the receivers have low power consumption and are cost-effective and compact, it is essential to employ a single-chip, adjustment-free CDR IC using the phase-locked loop (PLL) technique without any high-Q components. A number of approaches have been proposed for developing a CDR IC using the PLL technique [1], [2]. Generally, the jitter specifications for the CDR differ depending on what it is being used for, and jitter suppression is an especially serious problem for the CDR-IC design. There are different jitter specifications for the following two applications.

Manuscript received August 19, 1998; revised February 8, 1999. K. Kishine and H. Ichino are with NTT Network Innovation Laboratories, Yokosuka, Kanagawa 239-0847 Japan. N. Ishihara is with NTT Opto-electronics Laboratories, Atsugi-shi, Kanagawa 243-01 Japan. K. Takiguchi is with NTT Electronics Corp., Atsugi-shi, Kanagawa Pref. 243-0032 Japan. Publisher Item Identifier S 0018-9200(99)04198-0.

Case 1) LAN’s such as gigabit/second Ethernets, fiber channels, and other optical interconnections. They use a single span of a transmission medium.

Case 2) Backbone networks or WAN’s such as synchronous digital hierarchy (SDH) or synchronous optical network. They use line regenerators to transport information over long distances.

For case 1), the CDR must suppress mainly the jitter generated due to noise in the CDR, so-called jitter generation. In case 1), there is no jitter accumulation due to cascaded regenerators. For case 2), however, the ITU-T G.958 recommendation for SDH stipulates other specifications [3]: a) a jitter transfer specification, which is the criterion for the suppression of the noise in input signals to line regenerators, and b) a jitter tolerance specification.

This paper describes a 2.5-Gb/s CDR that can be used in both cases, which eliminates the need to fabricate two chips with different characteristics. The key design techniques are based on the switched-filter (SF) PLL technique and loop gain adjustment using a gain control amplifier (GCA) circuit. The CDR IC is fabricated using 0.5-μm Si bipolar technology. The loop gain and loop bandwidth can be adjusted using a control signal from outside the chip. For case 1), the rms jitter generation of the CDR can be reduced to only 1.2 ps, and the capture range is 150 MHz. For case 2), the jitter of the CDR meets the jitter specifications of the ITU-T G.958 recommendation. The rms jitter generation is 3.6 ps, and the capture range is 50 MHz. The power consumption of the CDR for both cases is only 0.4 W.

In Section II, the concept of the suppression of jitter in each case is discussed. It is explained that the SF PLL technique can be used in the CDR for both cases. Design details and the configurations of the circuits of the CDR are given in Section III. Section IV discusses the experimental results, which show that the CDR has very good jitter characteristics, and discusses the feasibility of using the CDR for various transmission systems.

II. CONCEPT FOR JITTER-SUPPRESSION DESIGN

A. Jitter Characteristics of CDR Using the PLL Technique

Generally, output jitter of a CDR based on the PLL technique can be caused by two kinds of sources: 1) additive noise that accompanies the input signal [Fig. 1(a)] and 2)


Fig. 1. Noise source (a) in input signal and (b) in PLL.

Fig. 2. Loop-gain dependence of jitter.

noise generated in the CDR [Fig. 1(b)]. In Fig. 1(b), two cases [noise ahead of and at the voltage-controlled oscillator (VCO)] have been shown to be equivalent in terms of VCO phase fluctuation [4]. In addition, because the phase drift of the VCO output due to the random input data is random, Fig. 1(b) gives a rough approximation of the noise due to the input data pattern, with no jitter applied.

To suppress the jitter caused by additive noise, the CDR should be designed so that the noise bandwidth of the PLL is minimized. The output jitter (the phase deviation, in rad) of the CDR using the PLL technique is expressed as in [4] (1), in terms of the power spectral density of the noise, the input signal amplitude, the natural angular frequency, the damping factor, and the loop gain. When the loop gain becomes larger, the jitter becomes larger, as shown in the Appendix. This means that smaller loop gain causes narrow noise bandwidth, thereby suppressing jitter. It should be noted that smaller loop gain leads to a smaller cutoff frequency of the jitter transfer function of a PLL.

On the other hand, in order to suppress the jitter caused by noise generated in the CDR circuit, the operation of the CDR circuit needs to be made stable. This stability can be obtained by reducing the signal fluctuation in the CDR circuit caused by the input of consecutive data bits, device noise, and so on. This is the so-called suppression of jitter generation, which is specified for SDH in ITU-T Recommendation G.958. The output jitter (in rad) in this case (jitter generation) is expressed as in [4] (2), in terms of the power spectral density of the noise. This equation is derived assuming that the instantaneous frequency deviation of the VCO output is caused by disturbance due to random phase noise. In this equation, when the loop gain becomes larger, the jitter becomes smaller, as shown in the Appendix. In other words, the jitter increases as the loop gain decreases.
Larger loop gain can reduce the jitter caused by the noise in

the CDR. However, larger loop gain results in a larger cutoff frequency of the jitter transfer function. Consequently, there is a tradeoff, as shown in Fig. 2, between reducing the jitter shown in Fig. 1(a) (equivalent to reducing the cutoff frequency of the jitter transfer function) and reducing the jitter shown in Fig. 1(b) (jitter generation). When the jitter of Fig. 1(a) is dominant, the loop gain must be controlled to be lower (area 1 in Fig. 2). When the jitter of Fig. 1(b) is dominant, it must be controlled to be higher (area 2 in Fig. 2). The optimum loop gain depends on which type of jitter has a greater effect on the output jitter of the CDR and causes degradation of transmission quality in the system.

B. Design of the CDR

Given the previous discussion, it is clear that there should be two types of CDR design, one for LAN’s and another for WAN’s.

1) CDR for LAN’s: In the case of LAN’s, the jitter from input signals is small because there is no jitter accumulation through the short and single laser-fiber-receiver span. We can therefore concentrate on reducing the jitter generation, which is caused by the input-signal-pattern dependence of the circuit, the fluctuation of the supply voltage, and device noise in the CDR. As described in Section II-A, this design should not utilize smaller loop gain to lower the cutoff frequency, but instead should utilize larger loop gain to achieve smaller output jitter.

2) CDR for WAN’s: In the case of backbone networks or WAN’s, regenerators may be cascaded in order to transport information over long distances, causing the jitter to accumulate. Therefore, not only the jitter generation of the CDR has to be taken into consideration but also the jitter transfer characteristic, which is the criterion for suppression of noise in input data signals as given in ITU-T Recommendation G.958. There is a tradeoff between reducing the jitter generation and reducing the cutoff frequency of the jitter transfer function.
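The loop-gain tradeoff of Fig. 2 can be made concrete with a toy calculation. The text only states that input-noise jitter grows with loop gain and that jitter generation shrinks with it; the sketch below goes further and assumes, purely for illustration, that the first variance is proportional to the loop gain and the second is inversely proportional to it. The coefficients `a` and `b` and the function names are hypothetical and are not taken from the paper.

```python
import math


def total_jitter(k: float, a: float, b: float) -> float:
    """RMS output jitter for loop gain k under the assumed toy model.

    a * k : variance from noise accompanying the input signal (grows with k)
    b / k : variance from noise generated inside the CDR (shrinks with k)
    """
    return math.sqrt(a * k + b / k)


def optimum_gain(a: float, b: float) -> float:
    """Loop gain that minimises a*k + b/k, obtained by setting the derivative to zero."""
    return math.sqrt(b / a)


# Hypothetical coefficients in arbitrary units.
a, b = 1e-12, 1e2
k_opt = optimum_gain(a, b)

for k in (k_opt / 10, k_opt, k_opt * 10):
    print(f"loop gain {k:.2e}: total jitter {total_jitter(k, a, b):.3e}")
```

In this toy model a LAN-type link with little input jitter corresponds to a small `a`, which pushes the optimum toward higher loop gain, while a WAN-type link with accumulated input jitter corresponds to a larger `a` and a lower optimum.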
KISHINE et al.: CDR IC FOR LAN’S AND WAN’S

a) Jitter transfer: The loop gain of a CDR IC that uses the PLL technique and whose jitter transfer characteristics must meet ITU-T Recommendation G.958 must be designed to be lower. The jitter transfer function of the 2.5-Gbit/s PLL using a lag-lead filter can be expressed by substituting the phase transfer function for the jitter transfer function as

(3)

This function is plotted in Fig. 3. It is a curve obtained when the time constants of the lag-lead filter described in the Appendix are set to values such that the natural angular frequency becomes nearly 2 MHz and the damping factor is larger than the value at which the jitter gain peaking is less than 0.1 dB. Fig. 3, in which it is stipulated that the loop gain is lower than 3.2 10 (1/s), indicates that the curve of the smaller loop gain meets the ITU-T (STM-16) specification.

Fig. 3. Jitter transfer function.

b) Jitter generation: As described in Section II-A, as the loop gain becomes small, it becomes more difficult to suppress the jitter generation because of the tradeoff between reducing jitter generation and reducing the cutoff frequency of the jitter transfer function. To solve this problem, we introduce the SF PLL technique. Fig. 4 shows the SF CDR configuration, which we originally proposed as a way to maintain a precise clock signal, thereby achieving tolerance to the input of consecutive data bits. (Our previous work, a 156-Mb/s SF CDR [5], did not include the GCA shown in Fig. 4.) The main features of SF circuit operation are that the PC output can be transferred to the low-pass filter (LPF) only when data transitions occur (sample mode) and the LPF output can be constant during consecutive data inputs (hold mode). These features prevent the phase drift of the VCO output during the input of consecutive data bits. We thought that the equivalent high-Q operation of the SF circuit could be utilized to reduce the jitter generation. Fig. 5 shows simulation results of the change in differential output voltage of the loop filter when the input signal changes from a 1/0-repeated bitstream to consecutive data bits (in this case, “0”) at 805 ns. Fig. 5 shows that the filter output of the SF CDR levels out, while that of the CDR without the SF circuit begins to degrade at 805 ns.
This means that the operation of the SF circuit is equivalent to that of a loop filter with a larger RC time constant, and the jitter generation due to the input signal pattern is suppressed more than in the CDR without an SF circuit. In other words, the SF circuit provides equivalent high-Q operation and achieves low-jitter operation. We also thought this advantage of the SF circuit could be used to solve the tradeoff problem.

Fig. 6 shows the SPICE simulation results of the cutoff-frequency (of the jitter transfer curve) dependence of the jitter generation of the 2.5-Gb/s CDR’s both with the SF circuit and without it. Both curves indicate that the jitter generation decreases as the cutoff frequency increases. Furthermore, the jitter generation of the SF CDR is 70% lower than that of the CDR without the SF circuit. It is noteworthy that the suppression of jitter generation by the SF circuit is marked. In addition, the jitter characteristics of the SF CDR meet the STM-16 specifications (rms jitter generation that is lower than 4 ps, and equivalent jitter transfer specification in which the cutoff frequency at 3-dB jitter gain is lower than about 2.8 MHz), while the CDR circuit without the SF circuit fails to meet the specifications.

In the SPICE simulation, the source of jitter is the instability of the circuit operation of the CDR, with no jitter applied to the random input data. In the experimental results, the device noise, the fluctuations of supply voltage, and so on are also noise sources. The jitter generation in experiments is therefore larger than that in the simulation results. The simulation results do, however, show the characteristics of jitter generation versus jitter transfer functions.

Given our findings, we conclude that in the design of a CDR IC that uses the PLL technique and is intended for backbone networks or WAN’s, the loop gain must be large enough so that the jitter generation meets the ITU-T specs, yet small enough so that suitable jitter transfer characteristics are obtained.

III. CIRCUIT DESIGN

A block diagram of the CDR, including a GCA circuit between the SF and VCO, is shown in Fig. 4 [6]. This CDR can be used in both short- and long-distance transmission systems by adjusting the loop gain through the GCA from outside the chip. The main features of the CDR are 1) an SF circuit for equivalent high-Q operation [5], 2) optimum timing adjustment between extracted clock and input data, and 3) loop gain control for optimization.

A. VCO

The circuit configuration of the VCO is shown in Fig. 7. The oscillation frequency is controlled by the voltage swing of “VC1, VC2,” which is determined by the feedback signal from the GCA. The free-running frequency is determined by the currents IF1 and IF2, which can be controlled from outside the chip. The free-running frequency can be adjusted from 2.2 to 2.8 GHz, which covers the free-running frequency deviation caused by fluctuations in device performance. Fig. 8 shows the tuning-voltage (feedback-voltage) dependence of the oscillation frequency. The simulation results are in good agreement with the experimental results. The VCO modulation frequency sensitivity is designed to be about 1 GHz/V. The oscillation frequency range covered by the feedback signal is 200 MHz. The tuning-range diagram is shown in Fig. 9. The total tunable range is sufficiently wide, from 2.0 to 3.0 GHz.
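The tuning figures quoted for the VCO can be cross-checked with a few lines of arithmetic. The sketch below assumes a linearized tuning curve and assumes that the 200-MHz feedback range applies on each side of the free-running frequency, which is the reading that reproduces the stated 2.0 to 3.0 GHz total range. The function names and the 2.4-GHz free-running setting in the usage lines are hypothetical.

```python
KVCO_HZ_PER_V = 1.0e9        # "about 1 GHz/V" from the text
FREE_RUN_MIN_HZ = 2.2e9      # lower end of free-running adjustment
FREE_RUN_MAX_HZ = 2.8e9      # upper end of free-running adjustment
FEEDBACK_RANGE_HZ = 200e6    # assumed to apply on each side of free-run

def total_tunable_range():
    """Total range covered by free-running adjustment plus feedback tuning."""
    return (FREE_RUN_MIN_HZ - FEEDBACK_RANGE_HZ,
            FREE_RUN_MAX_HZ + FEEDBACK_RANGE_HZ)

def tuning_voltage_for(target_hz, free_run_hz):
    """Feedback voltage needed under a linear tuning-curve assumption."""
    offset = target_hz - free_run_hz
    if abs(offset) > FEEDBACK_RANGE_HZ:
        raise ValueError("target is outside the assumed feedback range")
    return offset / KVCO_HZ_PER_V

if __name__ == "__main__":
    low, high = total_tunable_range()
    print(f"total range: {low / 1e9:.1f} to {high / 1e9:.1f} GHz")
    # Hypothetical case: free-running frequency trimmed to 2.4 GHz,
    # loop must pull the VCO to the 2.48832-GHz line rate.
    v = tuning_voltage_for(2.48832e9, 2.4e9)
    print(f"required feedback voltage: {v * 1e3:.2f} mV")
```

Under these assumptions, the feedback path only has to supply a small correction once the free-running frequency has been trimmed close to the line rate.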


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Fig. 4. SF CDR with GCA.

Fig. 5. Output voltage of low-pass filter (differential output).

Fig. 6. Cutoff-frequency dependence of jitter generation.

B. GCA

A current-bypass GCA circuit is used in the CDR (see Fig. 10). The gain of the GCA, which can be controlled from outside the chip, can be varied from 40 to 0 dB. To lower the jitter generation of the CDR, the gain should be higher. On the other hand, to achieve a lower cutoff frequency of the jitter transfer curve, it must be lower. Therefore, the gain should be adjusted according to the jitter specification in each case.

C. Delay Circuit

To reduce jitter generation, the edge-inclined circuit in the 90°-delay block shown in Fig. 4, which includes a capacitor for delay control [5], is replaced by a chain of emitter-coupled logic (ECL) buffer circuits without capacitors, the delay of which can be adjusted from about 100 to 300 ps from outside the chip. The edge-inclined circuit was employed to make the 156-Mbit/s CDR smaller. But the delay needed for the 2.5-Gbit/s CDR is only 200 ps, which is much smaller than that needed for the 156-Mbit/s CDR. Therefore, only a small number of ECL circuits are needed for the 200-ps delay, and the delay circuit itself is relatively small. In addition, the ECL circuit has none of the input-data-pattern dependence of the response that arises from the capacitor in the edge-inclined circuit. When the ECL delay circuit is used, the simulated jitter due to the input data pattern is about 80% of that when an edge-inclined delay circuit is used. As a result of using the ECL circuit, the jitter due to the input pattern effect is suppressed more than in the circuit reported in our previous work [5].

D. Loop Filter

A lag-lead filter is used as the loop filter, and the RC time constant is adjusted for each use. An additional capacitor outside the chip is not needed when the CDR is used for short-distance transmission systems. It is, however, needed for long-distance use.

E. Other Considerations

Furthermore, in order to guarantee jitter tolerance, it is important to maintain an optimum timing adjustment between the extracted clock and the input data. This timing adjustment is attained by allowing the clock to trigger the center of the data period by means of the phase-comparator (PC) output signal generated from the comparison between the phase of the delay flip-flop output and the 90°-delayed phase of the input data. In addition, in order to lower the power consumption, the new 2.5-Gb/s CDR IC uses stacked differential pairs on two levels, which enables its supply voltage to be decreased to 3.0 V (as opposed to 5.2 V in our previous work [5]). Furthermore, current dissipation is optimized in each block to reduce power consumption.

Fig. 7. VCO configuration.

Fig. 8. VCO tuning curve.

Fig. 9. Range of controllable oscillation frequency.

Fig. 10. GCA configuration.

Fig. 11. GCA gain dependence of jitter generation.

IV. EXPERIMENTAL RESULTS

A new chip was fabricated using the 0.5-μm super self-aligned process technology Si bipolar process [7]. It was


Fig. 12. Output waveforms of the CDR for LAN. (a) Data output. (b) Clock output.

Fig. 13. Output waveforms of the CDR for WAN. (a) Data output. (b) Clock output.

mounted in a 7 × 7 mm square ceramic package. The CDR IC was evaluated with both high and low loop gain, with the gain adjusted for short- and long-distance transmission systems, respectively. Jitter was measured with a commercial jitter analyzer. An rms jitter-generation value from which the jitter value of the input data is subtracted can be obtained with the analyzer.

A. SF CDR for LAN

The internal capacitor in the loop filter is 10 pF, and an external capacitor is not needed in this case. The GCA gain dependence of the jitter generation is shown in Fig. 11. The jitter generation decreases as the gain increases, and the lowest point is when the gain is larger than about 8 dB. The loop gain at this point is 1.2 10 (1/s). The output waveforms when the loop gain is set to that point and the input data rate is 2.48832 Gb/s are shown in Fig. 12. The eye opening of the output data was sufficiently wide, and clock extraction was very precise. The rms jitter generation is 1.2 ps and the capture range is over 150 MHz.

B. SF CDR for WAN

To lower the cutoff frequency of the jitter transfer curve, an external 0.1-μF capacitor is added to the loop filter. The loop gain is set to about 2 10 (1/s), which is the loop gain at which the jitter transfer curve meets the ITU-T jitter transfer specification in Fig. 3. The output waveforms when the loop gain is set to this point are shown in Fig. 13. Again, the eye opening of the output data was sufficiently wide, and clock extraction was very precise. The rms jitter generation is 3.6 ps, which is larger than that of the CDR when its loop gain is adjusted for

Fig. 14. Jitter transfer function.

Fig. 15. Jitter tolerance curve.

Fig. 16. Loop-gain dependence of jitter generation and cutoff frequency of jitter transfer function.

short-distance transmission systems, but is smaller than the jitter generation specification of 4.0 ps (for STM-16; ITU-T Recommendation G.958). The capture range is over 50 MHz. Fig. 14 shows the measured jitter transfer function of the CDR in this case. The curve meets the ITU-T G.958 specification. Fig. 15 shows the jitter tolerance curve when the input jitter amplitude is 120% of the ITU-T specification. The squares indicate error-free operation (where the error rate is lower than 10 ). The rms jitter generation, jitter tolerance, and jitter transfer function all meet the jitter specifications in ITU-T G.958.

The relationship between the measured jitter generation (or cutoff frequency) and the loop gain in this experiment is shown in Fig. 16. The darker shaded area is the region in which the jitter characteristics of the CDR meet the specifications of ITU-T Recommendation G.958. In the area of larger loop gain, the jitter generation becomes small. Fig. 16 shows clearly that, when its loop gain is optimized, the CDR IC is suitable for both LAN’s and WAN’s. The capture range of both types of CDR’s is wide enough to cover the deviation in the free-running frequency due to changes in temperature (ranging from 5 to 90 °C). In addition, the power consumption (including that of the I/O circuit) in both cases is less than 35% of that in the 2.5-Gb/s PLL’s reported previously [1], [2].

V. CONCLUSION

The design method of the CDR for both LAN’s and WAN’s is presented. A new 2.5-Gb/s SF monolithic CDR IC using the 0.5-μm Si bipolar process has been developed. The CDR IC can be used in the transmission receivers for both LAN’s and WAN’s. The rms jitter generation of the CDR adjusted for LAN’s is 1.2 ps. Furthermore, the jitter characteristics of the CDR for backbone networks or WAN’s meet the specifications for STM-16 given in ITU-T Recommendation G.958. In addition, the power consumption of the CDR is only 0.4 W.

APPENDIX

A. Loop-Gain Dependence of the Jitter Shown in Fig. 1

The jitter due to the noise in the input signal to the PLL is expressed as (1). When the loop filter is a lag-lead type with series and shunt resistors and a shunt capacitance, the natural angular frequency and the damping factor are expressed as functions of the loop gain and the filter time constants. One term in (1) therefore becomes

(A1.1)

This shows that the term increases as the loop gain increases. Furthermore, another term in (1) can be expressed as

(A1.2)

This also shows that the term increases as the loop gain increases. Therefore, the jitter in Fig. 1 increases as the loop gain increases.

B. Loop-Gain Dependence of the Jitter Shown in Fig. 2

The jitter due to the noise in the PLL can be expressed as (2). In (2), one term is

(A2.1)

This term decreases as the loop gain increases. In addition, another term is expressed as

(A2.2)

This term also decreases as the loop gain increases. Therefore, the jitter in Fig. 2, expressed in (2), decreases as the loop gain increases.

ACKNOWLEDGMENT

The authors wish to thank H. Yoshimura and K. Sato for their helpful discussions and suggestions.

REFERENCES

[1] H. Ransjin and P. O’Conner, “A PLL-based 2.5-Gb/s GaAs clock and data regenerator IC,” IEEE J. Solid-State Circuits, vol. 26, pp. 1345–1353, Oct. 1991.
[2] R. Walker, C. Stout, and C.-S. Yen, “A 2.488-Gb/s Si-bipolar clock and data recovery IC with robust loss of signal detection,” in ISSCC Dig. Tech. Papers, 1997, pp. 246–247.
[3] “Digital Line Systems Based on the Synchronous Digital Hierarchy for Use on Optical Fiber Cables,” CCITT Rec. G.958.
[4] A. Blanchard, Phase-Locked Loops. New York: Wiley, 1976, ch. 8.
[5] N. Ishihara and Y. Akazawa, “A monolithic 156-Mb/s clock and data recovery PLL circuit using the sample-and-hold technique,” IEEE J. Solid-State Circuits, vol. 29, pp. 1566–1571, Dec. 1994.
[6] K. Kishine, N. Ishihara, and H. Ichino, “Jitter-suppressed low-power 2.5-Gbit/s clock and data recovery IC without high-Q components,” Electron. Lett., vol. 33, no. 18, pp. 1545–1546, Aug. 1997.
[7] C. Yamaguchi, Y. Kobayashi, M. Miyake, K. Ishii, and H. Ichino, “A 0.5-μm bipolar technology using a new base formation method,” in Proc. BCTM, 1993, pp. 63–66.

Keiji Kishine (M’98) was born in Kyoto, Japan, on October 26, 1964. He received the B.S. and M.S. degrees in engineering science from Kyoto University, Kyoto, in 1990 and 1992, respectively. In 1992, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone Corp. (NTT), Tokyo, Japan. At the NTT System Electronics Laboratories, Kanagawa, Japan, he was engaged in research and design of high-speed, low-power circuits for Gbit/s LSI’s using Si-bipolar transistors with application to optical communication systems. Since 1997, he has worked on research and development of Gbit/s clock and data recovery IC’s at the Photonic Network Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan. Mr. Kishine is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan.

Noboru Ishihara (M’89) was born in Gunma, Japan, on April 27, 1958. He received the B.S. degree in electrical engineering from Gunma University, Gunma, in 1981 and the Dr.Eng. degree from the Tokyo Institute of Technology, Tokyo, Japan, in 1997. In 1981, he joined the Electrical Communication Laboratory, NTT, Tokyo, where he has been engaged in research and development of analog IC’s for communication use. His recent work is in the area of low-power and high-speed analog IC’s for optical communications. Mr. Ishihara is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan and the IEEE Microwave Theory and Techniques Society.

Ken-ichi Takiguchi was born in Kanagawa, Japan, on July 21, 1969. He graduated from Tokyo Computer School, Tokyo, Japan, in 1992. In 1992, he joined NTT Electronics Corp., Kanagawa. He has been engaged in development of high-speed IC’s, especially gigabit/second PLL IC’s.

Haruhiko Ichino (M’89) was born in Yamaguchi, Japan, on January 26, 1957. He received the B.S., M.S., and Ph.D. degrees in applied physics from Osaka University, Osaka, Japan, in 1979, 1981, and 1994, respectively. In 1981, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone Corp. (NTT), Tokyo, Japan. He has been engaged in research and development of Gbit/s SSI-MSI’s using bipolar transistors (Si bipolar transistor and AlGaAs/GaAs HBT), with application to Gbit/s optical communication systems and high-frequency satellite communication systems. His work also includes low-power Gbit/s LSI’s for SDH networks and future ATM switching systems, and O-E, E-O converter modules. During this research and development, he also worked on the modeling of a high-speed bipolar transistor, analyzing ECL gate delay and maximum operating speed of GHz flip-flops, and high-speed design methodology based on gate-array and standard-cell approaches. His interests include high-speed packaging and measurement systems. Since 1997, he has worked on research and development of Gbit/s-interface hardware design of photonic transport network systems. Currently, he is a Senior Research Engineer, Supervisor, and Photonic Network Systems Research Group Leader of the Photonic Network Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan. He was a Visiting Lecturer at Osaka University during 1995–1996. Dr. Ichino is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan. He was a Secretary of IEICE’s Technical Group on Integrated Circuits and Devices.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

A 10-Gb/s CDR/DEMUX With LC Delay Line VCO in 0.18-μm CMOS

Jonathan E. Rogers, Member, IEEE, and John R. Long, Member, IEEE

Manuscript received April 8, 2002; revised June 25, 2002. This work was supported by Micronet, NSERC, and the Nortel Institute at the University of Toronto. J. E. Rogers was with the RF/MMIC Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He is now with Inphi Corporation, Westlake Village, CA 91361 USA. J. R. Long was with the RF/MMIC Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada. He is now with the Electronics Research Laboratory, Delft University of Technology, 2628CD Delft, The Netherlands. Digital Object Identifier 10.1109/JSSC.2002.804337

Abstract: A monolithic 10-Gb/s clock/data recovery and 1 : 2 demultiplexer are implemented in 0.18-μm CMOS. The quadrature LC delay line oscillator has a tuning range of 125 MHz and a 60-MHz/V sensitivity to power supply pulling. The circuit meets SONET OC-192 jitter specifications with a measured jitter of 8 ps p-p when performing error-free recovery of PRBS 2³¹ − 1 data. Clock and data recovery (CDR) is achieved at 10 Gb/s, demonstrating the feasibility of a half-rate early/late PD (with tri-state) based CDR on 0.18-μm CMOS. The 1.9 × 1.5 mm² IC (not including output buffers) consumes 285 mW from a 1.8-V supply.

Index Terms: Bang–bang phase-locked loop, clock and data recovery, LC delay line, phase detector, SONET OC-192, voltage-controlled oscillator.

I. INTRODUCTION

THE VOLUME of data transported over the telecommunications network increased at a compounded annual rate of 100% from 1995 to 2001 in the U.S. (and since 1997 in Europe), mainly due to increased internet traffic [1]. Contrast this with the historic demand for bandwidth, which grew at an annual rate of between 6% and 10% before the mid-1990’s. The call for technologies, such as interface electronics, which expand the capacity of fiber-based transport links to 10 Gb/s (and beyond) has risen in response to this explosion in data traffic. Systems at 10 Gb/s per channel are currently implemented in either OC-192 or STM-64 formats using the synchronous optical network (SONET OC-192) or European synchronous digital hierarchy (SDH STM-64), respectively.

A typical Gb/s receiver is shown in Fig. 1. It uses a photodiode to convert the incoming nonreturn-to-zero (NRZ) optical pulses from a single 10-Gb/s fiber channel to a current. These current pulses are then converted by the transimpedance (TZ) amplifier into a voltage. The pulses are low-pass filtered to remove out-of-band noise, thereby improving the received signal-to-noise ratio (SNR). An automatic gain control amplifier (AGC) provides additional amplification while compensating for variations in the received signal power. The clock required to retime the incoming synchronous data stream is recovered from the received data by the clock/data recovery block (CDR), and the received data are regenerated. The regenerated data are then applied to a 1 : N time-division demultiplexer (DEMUX) to separate the 10-Gb/s stream into multiple, lower speed channels. Typical ratios for demultiplexing are 1 : 4, 1 : 8, and 1 : 16. These lower speed channels are then further processed by VLSI CMOS circuitry designed to comply with the SONET or SDH standards.

Fig. 1. Optical receiver architecture.

II. BANG–BANG CDR/DEMUX

This paper describes the implementation of an early/late (bang–bang) phase-locked loop (PLL) based CDR in 0.18-μm CMOS. For monolithic CDR implementations, a PLL implementation is preferred [2], [3] since it eliminates narrowband (i.e., high-Q) resonators, which are used for direct extraction of a timing tone by filtering and are difficult to integrate. Also, the use of CMOS technology is potentially advantageous from both manufacturability and cost perspectives [4], [5].

Linear and early/late (bang–bang) phase detectors (PD) have been used for Gb/s CDR implementations. A linear PD has an average output voltage that is directly proportional to the error between the data and clock phases. This linear relationship allows the loop to be designed using classical control theory. In contrast to a linear loop, the early/late phase detector outputs only signify whether the clock is early or late with respect to the ideal data sampling instant. A loop incorporating an early/late phase detector is nonlinear and has a jitter transfer bandwidth which varies with jitter amplitude. Therefore, characterizing a bang–bang loop requires time-domain numerical simulation, which appears to be a disadvantage. However, guaranteeing the stability of a linear phase-locked loop (PLL) often requires multiple design iterations, lengthening the time to manufacture. Early/late PD-based PLLs are quasi-digital systems and consequently they are more resistant to component and process

0018-9200/02$17.00 © 2002 IEEE

variations as well as noise. In addition, an early/late PD possesses intrinsic matching between the sampling and retiming phases, allowing operation at speeds where it is difficult to match the delay of a conventional analog phase detector to that of the retiming latch. It is this combination of robustness and speed that makes the bang–bang PD an excellent choice for an IC-based implementation.

Fig. 2. Early/late (bang–bang) phase detector-based CDR.

Fig. 2 shows a block diagram of the CDR-PLL implemented in this work. Data is compared to the voltage-controlled oscillator (VCO) clock at the loop input. If the falling clock edge occurs before the data transition (early), the early/late phase detector outputs a voltage of one polarity. If the clock edge falls after the data transition (late), then the phase detector outputs a voltage of the opposite polarity. In the case where there is no data transition during the clock period, the phase detector outputs a “0” voltage. The PD output pulses are then filtered by integral and proportional loop filter branches. An important parameter of the early/late phase detector based CDR is the bang–bang frequency step of the VCO, $f_{bb}$, or

$$f_{bb} = V_{\phi}\,\beta\,K_{VCO} \qquad (1)$$

where $V_{\phi}$ is the output voltage of the phase detector, $\beta$ is the gain of the proportional branch of the loop, and $K_{VCO}$ is the tuning gain of the VCO in Hz/V.
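The effect of the bang–bang frequency step on jitter tracking can be illustrated with a minimal time-domain sketch. It assumes a first-order loop (proportional path only), a data transition in every bit, and no loop latency, none of which is claimed to match the implemented circuit. The 10-Gb/s bit rate and the 3-MHz frequency step are the values quoted in the text; the function names and the jitter amplitudes and frequencies are hypothetical.

```python
import math

def slew_limit_frequency(jitter_amp_ui, f_bb_hz=3e6):
    """Jitter frequency above which a sinusoid of this amplitude outruns the loop.

    A frequency step f_bb moves the clock phase at most f_bb UI per second,
    while A*sin(2*pi*f*t) has a peak slope of 2*pi*f*A UI per second.
    """
    return f_bb_hz / (2 * math.pi * jitter_amp_ui)

def peak_tracking_error(jitter_amp_ui, jitter_freq_hz,
                        f_bb_hz=3e6, bit_rate=10e9, n_bits=100_000):
    """Peak phase error (UI) over the second half of the simulated bits."""
    t_bit = 1.0 / bit_rate
    step_ui = f_bb_hz * t_bit            # phase moved per bit by one bang
    clock_phase = 0.0
    peak = 0.0
    for n in range(n_bits):
        data_phase = jitter_amp_ui * math.sin(2 * math.pi * jitter_freq_hz * n * t_bit)
        error = data_phase - clock_phase
        if n >= n_bits // 2:             # skip the start-up transient
            peak = max(peak, abs(error))
        # Early/late decision: only the sign of the error is used.
        clock_phase += step_ui if error > 0 else -step_ui
    return peak

if __name__ == "__main__":
    amp = 0.1                            # hypothetical jitter amplitude in UI
    print(f"slew limit for {amp} UI: {slew_limit_frequency(amp) / 1e6:.2f} MHz")
    for f_j in (1e6, 20e6):
        print(f"f_jitter={f_j / 1e6:.0f} MHz  peak error={peak_tracking_error(amp, f_j):.4f} UI")
```

Below the slew limit the error stays within a few phase steps, while above it the loop falls well behind the input. Because the slew limit scales as the inverse of the jitter amplitude, the sketch also reflects the amplitude-dependent jitter transfer bandwidth of a bang–bang loop.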

A. System-Level Design

Characterization of early/late (bang–bang) PLL behavior is described by Walker [2], and analysis of its application to SONET-compatible CDR systems is found in the publication by Greshishchev [3]. At the system level, design of the bang–bang CDR is reduced to the selection of two main system parameters. The first is the bang–bang frequency step, which (somewhat inconveniently) simultaneously governs jitter tolerance performance, the related jitter transfer bandwidth, and jitter generation. Optimal jitter generation is realized by setting the bang–bang frequency step to the lowest value which still meets the jitter tolerance specification. A bang–bang frequency step of 3 MHz was selected, which includes some margin so that the CDR exceeds the jitter tolerance specification for OC-192. The second system parameter is the loop stability factor, defined by Walker [2] as

(2)

where the quantities in (2) are the stability factor, the gain of the integral branch of the loop, the number of bit periods of latency around the loop, and the unit interval or length of the bit period in seconds. This stability factor is the ratio of proportional and integral path gains with a correction for the amount of latency around the loop. A stability factor greater than unity ensures stability while the loop is not slew-rate limited in the phase domain. It should be noted that the bandwidth of the jitter transfer function (JTF) for a bang–bang loop is inversely proportional to the input jitter amplitude. A JTF for jitter amplitudes in excess of 1 UI cannot be defined, as the loop will slip cycles when such large jitter inputs are not tracked. To ensure that the loop JTF does not exhibit peaking, the stability factor is made large enough so that the proportional loop dominates during slew limiting (slope overload) at jitter inputs less than 1 UI p-p. Therefore, this design implements a large stability factor ( 1 10 ) using interchangeable (off-chip) loop filter capacitors.

B. CDR Architecture

Fig. 3 shows the CDR/Demux test chip at the block level. Differential 10-Gb/s data are brought on-chip through the DATA and inverted-DATA pins, which are broad-band matched to 50 Ω. The half-rate early/late phase detector compares the data falling edge with the phase of the quadrature clock. The phase detector produces no output if a falling edge of the data is not present; otherwise an early or late pulse is produced. The phase information generated by the phase detector travels down two separate paths. The direct path to the oscillator provides high-frequency proportional (bang–bang) control for the system. Through this tuning path, the early/late phase detector outputs translate into oscillator frequency steps above and below the oscillator center frequency. If early or late pulses are not present, the oscillator center frequency remains unchanged.

The second tuning path leads to the input of the charge pump. The charge pump adds or removes charge from a capacitor for early and late inputs, respectively. When the phase detector result is neither early nor late, the charge pump does not alter the charge on the capacitor (charge pump tri-state). The voltage across the capacitor is used for low-frequency tuning of the VCO, thus completing the integral control loop. An on-chip capacitor serves to reduce glitches caused by the probe or bond inductances between the chip and external circuitry.

When the PLL has settled into lock, the in-phase clock of the oscillator is aligned with the ideal data sampling instant. This allows the 1 : 2 DEMUX to divide the data into two streams, and these signals are buffered off-chip so that the data recovery properties of the chip can be analyzed. Although this is significantly less demultiplexing than in an actual 10-Gb/s application, it is sufficient to demonstrate feasibility. The clock signal is also brought off-chip for evaluation of the system performance.

Fig. 4 shows the half-rate early/late phase detector. The representation of this block with single-ended signal paths is purely to simplify the diagram; the actual phase detector is fully differential. The architecture is similar to Hauenschild’s phase detector/demux [6], which was implemented in a BiCMOS technology. The main difference here results from modifications necessary to accommodate the relatively low transconductance

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS

Fig. 3. CDR/DEMUX block diagram.

available from 0.18- m CMOS transistors. In the quadrature sampling paths, where metastability and hysteresis are a concern, extra buffering is used. Also, the input latch of the first flip-flop is increased in size in order to improve performance. All samples enter the logic on the same clock edge simplifying the early/late logic. The phase detector pulses are retimed at the output of the phase detector in order to remove asymmetries in both amplitude and duration from the output pulses. Note that a drawback to these modifications is unequal loading of the inphase and quadrature clock lines. Thus, care must be taken to avoid a static offset in the phase detector as this could cause a reduction in the residual jitter tolerance. Note that 1 : 2 demultiplexed data could be tapped off directly from early/late logic inputs A and B of Fig. 4. However, a separate 1 : 2 demux was used for the testchip to minimize loading of the phase detector latches, at the expense of possible phase alignment errors at the demux and a slight increase (10 mW) in power consumption. III. CIRCUIT DESCRIPTION The transistor and block level design of the 10 Gb/s CDR circuits are described in the following sections. The implementation of the phase detector is examined first, followed by an in-depth description of the LC delay line VCO. A. Phase Detector The phase detector logic is implemented in resistively-loaded MOS current mode logic (MCML). This offers several advan-

tages over conventional CMOS and other all-NMOS implementations. First, it has a relatively low output impedance making it suitable for high-speed operation. MCML logic also benefits from reduced logic voltage swing as well as from the elimination of lower mobility PMOS transistors compared to CMOS logic. The MOS equivalent of a bipolar ECL gate is not practical, especially from a 1.8-V supply due to attenuation of the signal by source followers and lack of headroom. Another benefit of using MCML is reduced switching-related supply noise, due to the relatively constant current drawn from the power supply. For improved supply rejection, the gain stages and output buffers of the ring oscillator are implemented as MCML inverter/buffers. An additional benefit of this simple design is that the clocked phase detector elements can interface with the ring oscillator without level shifting or swing adjustment. The first goal of this design is to create a buffer which has the widest possible bandwidth, while still having enough gain. A minimum value of approximately 2 for the small-signal gain was chosen, otherwise the gate noise margin becomes unacceptable. Biasing of the circuit so that the large-signal switching speed approaches maximum performance is now considered. ) is seFirst, an appropriate voltage swing ( lected. The voltage swing is made as large as possible, without forcing the switching transistors into the triode region at any time during the cycle. Larger drain-to-gate capacitance of the MOSFET in the triode region limits the switching speed. Note that gain is (first-order) dependent on the voltage swing, but


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

Fig. 4. Early/late (bang–bang) phase detector.

the propagation delay time is not. When the differential pair is switched so that transistor M1 [see Fig. 5(a)] is carrying all the bias current, M1 will be in the active region (saturation) as long as the voltage swing obeys condition (3). This condition involves the threshold voltage of the switching transistor and the minimum drain-source voltage for which M1 is still in the active region. Signal swings in excess of this bound force the transistors into the triode region, which puts an upper limit on the signal swing that is practical. Once an upper limit for the voltage swing is defined, it is simply a matter of deciding how the bias current and the load resistance are apportioned to make up the signal swing. The maximum bandwidth for the MCML circuit is achieved by finding (from circuit simulations) the apportionment that minimizes the load resistance while the small-signal gain condition is still met. Additionally, it must be ensured that this minimized swing is still sufficient to fully switch the differential pair. Swings of approximately 700 mV at current densities of 50–100 µA/µm of FET width (depending on the application) are typical of the CML cells used in this work.

The combinatorial logic required by the phase detector is performed using MCML logic gates consisting of cascoded differential pairs [Fig. 5(b)]. These are preferred over MCML gates consisting of parallel differential pairs since they maintain two distinct logic levels for all possible input combinations despite the low transconductance typical in MOS circuits.
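The biasing tradeoff described above can be illustrated with a rough first-order check. The sketch below assumes the usual resistively loaded differential-pair relations (swing equals tail current times load resistance, low-frequency gain equals transconductance times load resistance). The device width, transconductance, and function names are hypothetical values chosen only for illustration; they are not taken from the design.

```python
# First-order MCML buffer check. All numeric values below are illustrative
# assumptions, not parameters of the fabricated circuit.

def tail_current(width_um: float, density_ua_per_um: float) -> float:
    """Tail current in amperes from FET width (um) and current density (uA/um)."""
    return width_um * density_ua_per_um * 1e-6

def mcml_swing(i_ss: float, r_load: float) -> float:
    """Single-ended swing in volts when one branch carries the full tail current."""
    return i_ss * r_load

def small_signal_gain(gm: float, r_load: float) -> float:
    """Low-frequency gain of a resistively loaded pair (output resistance ignored)."""
    return gm * r_load

def meets_targets(i_ss, r_load, gm, min_gain=2.0, max_swing=0.7):
    """True if the gain target is met and the swing stays at or below the chosen limit."""
    swing_ok = mcml_swing(i_ss, r_load) <= max_swing + 1e-12  # tolerate rounding
    gain_ok = small_signal_gain(gm, r_load) >= min_gain
    return swing_ok and gain_ok

# Hypothetical 20-um device biased at 100 uA/um, sized for a 700-mV swing.
i_ss = tail_current(width_um=20.0, density_ua_per_um=100.0)
r_load = 0.7 / i_ss
gm = 8e-3  # assumed transconductance in siemens

print(f"I_SS = {i_ss * 1e3:.2f} mA, R_L = {r_load:.0f} ohm")
print(f"gain = {small_signal_gain(gm, r_load):.2f}, ok = {meets_targets(i_ss, r_load, gm)}")
```

Lowering the load resistance raises bandwidth but eats into the gain margin, which is why the gain floor of about 2 acts as the stopping condition in this kind of search.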

Fig. 5. MCML logic gates. (a) Inverter/buffer. (b) Cascode logic gate (AND).

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18-µm CMOS

The latch design (refer to Fig. 6) is critical to the jitter generation performance of the CDR system. Two important nonidealities, metastability and hysteresis, directly translate into degraded jitter generation performance. This degradation occurs above and beyond the jitter generation that is inherent in the system properties of the early/late phase detector-based PLL. Metastability occurs when positive feedback during the latch phase cannot ramp the output from the initial tracked voltage to the voltage required for reliable recognition of the digital state. An approximate value for the minimum time required to latch an initial voltage is given in [7] as

(4)

It should be noted that one term in (4) is not much greater than one for a typical MCML latch with resistive loading in a typical 0.18-µm CMOS technology. Maximizing the tracking bandwidth reduces the pattern-dependent jitter (hysteresis) caused by the phase detector latches. Increasing bandwidth by reducing the gain creates a tradeoff between resistance to metastability and the tracking bandwidth. Tracking bandwidth is improved without increasing the likelihood of metastability by making device sizes in the input latches (e.g., the master latch) significantly larger than the slave. This reduces the relative loading of the master by the slave latch. This technique was used in the latches which sample the incoming full-rate data in the 1 : 2 demux and on the quadrature clock edges. The physical layout of the MCML latch is also shown in Fig. 6.

Fig. 6. MCML latch. (a) Schematic. (b) Physical layout.

Fig. 7. LC delay line oscillator.

B. LC Delay Line Oscillator

Fig. 7 shows a block diagram of the two-stage LC delay line VCO. The symmetry of this architecture ensures that precise in-phase (CKI) and quadrature (CKQ) clocks are generated. The 5-GHz center frequency is tunable through external (EX), internal (IN), and high-frequency bang–bang (BB) inputs. The external tuning input is used to control the oscillator’s center frequency in laboratory testing. This input could also be incorporated into a frequency-locked calibration loop to boost the frequency acquisition range of the PLL. The internal tuning port is part of the integral tuning path, while the bang–bang tuning input completes the higher speed proportional tuning path. Circuitry in the oscillator core is fully differential (with the exception of the varactors) in order to reject supply noise. The LC delay line promotes frequency stability with supply, process, and temperature variations, without compromising tuning speed. The oscillation frequency is determined by the total propagation time through the gain blocks and differential transmission line stages. Load resistors of each MCML buffer match the output to the 75-Ω characteristic impedance of the delay line. A delay line impedance of 75 Ω was chosen as a reasonable compromise between power dissipation in the gain stages and attenuation of the delay lines. The delay through each gain stage is only 7 ps (simulated) due to the low impedance seen at the drains of the switching transistors. The delay line is a fully symmetric square microstrip spiral [see Fig. 8(a)]. Each delay line (two are required) has an outside dimension of 150 µm, a conductor width of 4 µm, conductor spacing of 2 µm, and consists of top metal (aluminum) directly over the substrate. A relatively wide gap between lines reduces the interwinding capacitance [Co in Fig. 8(b)], allowing a larger tuning capacitance. The effective inductance seen between each pair of input and output ports is approximately 2.5 nH with a parasitic capacitance of 120 fF.
The inductance per unit area increases due to mutual coupling between interwound delay lines. This also reduces chip area compared to the alternative implementation requiring two separate delay lines between gain stages. In addition to saving chip area, a differential delay line also improves common-mode rejection of the oscillator. This is

Fig. 8. Balanced LC delay line. (a) Physical layout. (b) GEMCAP2 model.

Fig. 9. NMOS varactor. (a) Varactor structure. (b) Varactor C–V curve.

TABLE I SIMULATED VCO SENSITIVITIES

because the inductance seen in common mode is less than when differentially driven, and this places common-mode oscillations outside the bandwidth of the gain stages. The delay lines account for 43 ps of delay time each at 5 GHz, or 86% of the total delay around the oscillator loop. Concentration of the loop delay in these lines makes the VCO resistant to variations in power supply, temperature, and process. Table I compares simulated sensitivities of the LC delay line VCO (including tuning circuitry) with a ring oscillator composed of identical MCML gain stages (without tuning capability) and comparable center frequency. Supply pulling is over an order of magnitude lower for the delay line oscillator, at 45 MHz/V. In addition, sensitivity of the delay line oscillator to process (based on transistor variation only) and temperature variations is substantially less than for a ring oscillator, mainly due to the dominance of inductance over capacitive parasitics in the loop delay.
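The delay figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard ring-oscillator relation in which the oscillation period is twice the total delay around the loop (one net inversion per trip); that relation and the function name are assumptions added for illustration, while the 7-ps and 43-ps values are the ones quoted in the text.

```python
# Cross-check of the two-stage delay-line oscillator timing.
# Assumption: period = 2 x (total delay around the loop), as in a ring
# oscillator with one net inversion in the loop.

GAIN_STAGE_DELAY_S = 7e-12    # simulated delay per MCML gain stage
DELAY_LINE_DELAY_S = 43e-12   # delay per LC delay line at 5 GHz
NUM_STAGES = 2                # two gain stages, two delay lines

def ring_frequency_hz(stage_delays_s):
    """Oscillation frequency for a loop whose period is twice the loop delay."""
    loop_delay = sum(stage_delays_s)
    return 1.0 / (2.0 * loop_delay)

stage_delay = GAIN_STAGE_DELAY_S + DELAY_LINE_DELAY_S
frequency_hz = ring_frequency_hz([stage_delay] * NUM_STAGES)
line_fraction = DELAY_LINE_DELAY_S / stage_delay

print(f"loop delay    = {NUM_STAGES * stage_delay * 1e12:.0f} ps")
print(f"frequency     = {frequency_hz / 1e9:.2f} GHz")
print(f"line fraction = {line_fraction * 100:.0f} %")
```

Under this assumption the numbers are mutually consistent: 50 ps per stage gives a 100-ps loop delay, a 5-GHz oscillation, and an 86% share of the delay in the lines.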

The simulated temperature sensitivity includes back-end metal and substrate resistivity effects, which dominate temperature dependence of the on-chip delay line. In addition, it is important to note that the tuning response time of the delay line oscillator is on the order of the clock period (comparable to a conventional ring oscillator), which is important in the bang–bang PLL application.

GEMCAP2 [8] is used to derive a SPICE-compatible lumped-element model for the delay line [see Fig. 8(b)] and refine the oscillator design. Each pi-section of this model corresponds to an individual conductor segment of the delay line, with circuit elements representing the self-inductance, frequency-dependent resistances (e.g., skin effect), capacitance to the substrate, as well as the capacitance and loss of the substrate itself. Also modeled are the capacitance and mutual magnetic coupling between windings. These elements are then combined to form the multi-segment delay line model which is employed in transient simulations of the oscillator.

The remaining capacitance required for oscillation at 5 GHz is added by inversion-mode NMOS varactors at each gain stage input (a high impedance node). Integral loop tuning range is designed at 32.5 MHz/V, and the (simulated) voltage swing at the clock buffer inputs is 2.5 V differential, which consumes extra power but improves switching speed of the MCML logic. The varactors were selected based on practical issues related to the fabrication technology [9]. Fig. 9 shows a cross section and the tuning characteristics of the inversion mode NMOS varactor used in the VCO. This varactor was chosen because the four-terminal NMOS transistor model in the IC design kit could be used for circuit simulations without modification.

IV. EXPERIMENTAL RESULTS

Table II summarizes the measurements for the LC delay line oscillator. Phase noise spectral density of the VCO running

TABLE II MEASURED VCO PERFORMANCE

Fig. 10. Oscillator and CDR phase noise performance.

Fig. 12. CDR clock and data waveforms (10-Gb/s operation).

Fig. 13. Measured CDR jitter tolerance.

TABLE III CDR SUMMARY

open-loop and phase-locked and the phase noise of the reference source are all plotted in Fig. 10. At a 1-MHz offset from the carrier, the free-running VCO phase noise is −103 dBc/Hz, which falls to −127 dBc/Hz when the CDR is locked to a sinusoidal data input at 2.5 GHz. The reference source phase noise is also shown (−135 dBc/Hz). The difference is primarily due to frequency multiplication between the reference source (2.5 GHz) and locked VCO (5 GHz), which adds a minimum of 6 dB to the phase noise. The fabricated oscillator has a measured center frequency of 4.45 GHz (10% slower than predicted by simulations). Subsequent measurements of individual component test structures revealed that the frequency shift is caused by unanticipated loss and delay between the oscillator gain stages. Excessive resistive losses in the top metal, inaccuracy in the modeling of parasitic capacitances, and stray inductance between the delay line and gain stages (which is not extracted from the physical layout for simulation) all contribute to this error. The measured capacitance per unit length and resistance per unit length of the delay line are 45% and 28% higher, respectively, than those derived from the same simulation. This result exposes a sensitivity of the circuit to the absolute loss and capacitance in the delay line and, more importantly, sensitivity to variations in these parameters over process. Work is ongoing to analyze the architecture for variations in the properties of the backend metal/dielectric

stack and the actual magnitude of these variations. This characterization work is aimed at improving the accuracy of the CAD models thereby allowing better correlation between the simulation and measurement. Nevertheless, it is important to note that oscillators from two separate fabrication runs showed only 0.2% variation in VCO center frequency. The measured external tuning range and bang–bang frequency step (varied using the BB input) are 125 and 2.5–5 MHz, respectively. Characterization of this varactor using a separate test structure showed 40% less capacitance variation than expected, thus explaining the smaller tuning ranges observed. A VCO center frequency close to 4.98 GHz is needed in order to conduct full-speed testing of the CDR including bit error rate testing (BERT) at the SONET OC-192 rate (9.953 Gb/s). A noninvasive technique for adjusting the oscillation frequency was

Fig. 14. A 10-Gb/s CDR testchip micrograph.

developed so that a design iteration is avoided, thereby allowing the prototype to be fully characterized. The method is shown schematically in Fig. 11, where a metal plate is placed in close proximity to the delay line using a micro-manipulator. Current induced in the metal plate reduces the self-inductance of the delay lines. Since delay between oscillator stages is proportional to line inductance, the frequency increases when the inductance is lowered. However, the plate must be placed within approximately 10 µm of the IC surface (see Fig. 11) to be effective. Conductivity of the metal plate is important (i.e., gold or a similar metal is used), as resistive losses actually increase the signal delay and slow down the oscillator. An unwanted secondary effect is additional interwinding capacitance that results from placing another conductor in close proximity to the delay line, which acts to reduce the oscillation frequency. The inductive effect dominates, however, with the net result that the center frequency is adjustable from 4.45 to 5.5 GHz with negligible effect on phase noise.

The oscilloscope eye pattern of Fig. 12 is measured in response to 10-Gb/s PRBS input data (2 –1). Error-free data recovery at 10 Gb/s was measured in BER tests with a recovered clock jitter of 1.2 ps rms, or 8 ps p-p. The 5-Gb/s output data eye has larger jitter than the clock due to pattern dependencies that are likely introduced by bandwidth limitations of the 50-Ω output buffers used for testing. Note that a 1 : 8 or 1 : 16 demultiplexer would be used in a typical application, which relaxes the bandwidth requirements for off-chip buffering of the recovered data. Measured jitter transfer, generation, and tolerance all meet the SONET OC-192 requirements (measured jitter of 8 ps p-p),

with the exception of the small residual jitter tolerance. The jitter tolerance exceeds specifications at low jitter frequencies but is very close to the SONET mask at higher frequencies (see Fig. 13). Poor electrical contact from probes to the chip, phase error between the quadrature clocks, and mismatch between the demux and PD latches are likely sources of degradation in the jitter performance at higher frequencies. Poor electrical contact is partly due to wear caused by mechanical scrubbing of the pad by the probe tip. Repeated contacts were needed to trim the oscillation frequency before measuring the jitter tolerance, which caused significant wear of the pad metal and inconsistent electrical contacts. CDR performance is summarized in Table III.

A photomicrograph of the 1.9 × 1.5 mm IC is shown in Fig. 14. The input and output data lines are implemented in 50-Ω microstrip. The pad configuration used was dictated by the RF on-wafer probes used for test. In order to increase the isolation between the oscillator and the data path circuits, power supplies are kept separate. The layout also includes an extensive bottom metal ground plane which provides the reference plane for the microstrips as well as increasing the capacitance from substrate to ground. The IC consumes 285 mW from a 1.8-V supply (not including 50-Ω test output drivers).
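Two of the measured figures in this section lend themselves to a quick numerical check: the phase-noise penalty of frequency multiplication and the recovered clock jitter expressed as a fraction of the bit period. The sketch below uses the standard 20 log10(N) rule for ideal frequency multiplication and treats one unit interval as the 10-Gb/s bit period; the function names are hypothetical, and phase-noise levels are entered as negative dBc/Hz values.

```python
import math

def multiplication_penalty_db(f_out_hz: float, f_ref_hz: float) -> float:
    """Minimum phase-noise increase when a reference is multiplied up in frequency."""
    return 20.0 * math.log10(f_out_hz / f_ref_hz)

def jitter_in_ui(jitter_s: float, bit_rate_bps: float) -> float:
    """Jitter expressed as a fraction of one bit period (unit interval)."""
    return jitter_s * bit_rate_bps

penalty_db = multiplication_penalty_db(5e9, 2.5e9)   # 2.5-GHz reference, 5-GHz VCO
reference_dbc_hz = -135.0                            # reference source at 1-MHz offset
locked_dbc_hz = -127.0                               # locked VCO at 1-MHz offset
floor_dbc_hz = reference_dbc_hz + penalty_db         # best case for the locked VCO
excess_db = locked_dbc_hz - floor_dbc_hz             # remainder not explained by multiplication

rms_ui = jitter_in_ui(1.2e-12, 10e9)                 # 1.2 ps rms at 10 Gb/s
pp_ui = jitter_in_ui(8e-12, 10e9)                    # 8 ps p-p at 10 Gb/s

print(f"multiplication penalty = {penalty_db:.2f} dB")
print(f"excess over the floor  = {excess_db:.2f} dB")
print(f"clock jitter           = {rms_ui:.3f} UI rms, {pp_ui:.2f} UI p-p")
```

The doubling from 2.5 to 5 GHz accounts for about 6 dB of the 8-dB gap between the reference and the locked VCO, which is consistent with the statement that multiplication is the primary contributor.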

ACKNOWLEDGMENT Circuit fabrication was facilitated by the Canadian Microelectronics Corporation. The authors thank Dr. Y. Greshishchev and Dr. P. Schvan for providing access to test facilities at Nortel Networks’ Ottawa Laboratories.


REFERENCES

[1] M. J. Reizenman, “Optical nets brace for even heavier traffic,” IEEE Spectr., pp. 44–45, Jan. 2001.
[2] R. C. Walker, C. L. Stout, J.-T. Wu, B. Lai, C.-S. Yen, T. Hornak, and P. T. Petruno, “A two-chip 1.5-GBd serial link interface,” IEEE J. Solid-State Circuits, vol. 27, pp. 1805–1811, Dec. 1992.
[3] Y. M. Greshishchev, P. Schvan, M. Xu, J. Showell, J. Ohja, and J. E. Rogers, “A fully integrated SiGe receiver IC for 10-Gb/s data rate,” IEEE J. Solid-State Circuits, vol. 35, pp. 1949–1957, Dec. 2000.
[4] J. Savoj and B. Razavi, “A 10Gb/s CMOS clock and data recovery circuit with frequency detection,” in Proc. ISSCC, San Francisco, CA, Feb. 2001, pp. 78–79.
[5] J. E. Rogers and J. R. Long, “A 10Gb/s CDR/demux with LC delay line VCO in 0.18-µm CMOS,” in Proc. ISSCC, San Francisco, CA, Feb. 2002, pp. 254–255.
[6] J. Hauenschild, C. Dorschky, T. Winkler von Mohrenfels, and R. Seitz, “A plastic packaged 10 Gb/s BiCMOS clock and data recovering 1 : 4 demultiplexer with external VCO,” IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[7] D. A. Johns and K. Martin, Analog Integrated Circuit Design, First ed. New York: Wiley, 1997.
[8] J. R. Long and M. A. Copeland, “The modeling, characterization, and design of monolithic inductors for silicon RFICs,” IEEE J. Solid-State Circuits, vol. 32, pp. 357–367, Mar. 1997.
[9] P. Andreani and S. Mattisson, “On the use of MOS varactors in RF VCOs,” IEEE J. Solid-State Circuits, vol. 35, pp. 905–910, June 2000.


Jonathan E. Rogers (S’00–M’01) received the B.A.Sc. degree in engineering science (electrical option) and the M.A.Sc. degree in electronics from the University of Toronto, Toronto, ON, Canada, in 1991 and 2001, respectively. During his undergraduate work, he spent 16 months working at Nortel, Ottawa, ON, where he participated in the design and characterization of SiGe MMIC for OC-192 applications. His graduate work focused on the implementation of 10-Gb/s clock and data recovery systems in deep submicrometer CMOS. In October of 2001, he joined Inphi Corporation, Westlake Village, CA, where he is working to develop physical layer solutions for 10- and 40-Gb/s communication systems.

John R. Long (S’77–A’78–M’83) received the B.Sc. degree in electrical engineering from the University of Calgary, Calgary, AB, Canada, in 1984, and the M.Eng. and Ph.D. degrees in electronics engineering from Carleton University, Ottawa, ON, Canada, in 1992 and 1996, respectively. His current research interests include low-power transceiver circuitry for highly integrated radio applications and electronics design for high-speed data communications systems. Prof. Long is a member of the ISSCC, IEEE BCTM and ESSCIRC conference technical program committees and is an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received the NSERC Doctoral Prize and Douglas R. Colton and Governor General’s Medals for research excellence and a Best Paper Award from ISSCC 2000.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998


A 0.5-µm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling

Chih-Kong Ken Yang, Student Member, IEEE, Ramin Farjad-Rad, Student Member, IEEE, and Mark A. Horowitz, Senior Member, IEEE

Abstract: A 4-Gbit/s serial link transceiver is fabricated in a MOSIS 0.5-µm HPCMOS process. To achieve the high data rate without speed-critical logic on chip, the data are multiplexed when transmitted and immediately demultiplexed when received. This parallelism is achieved by using multiple phases tapped from a PLL, using the phase spacing to determine the bit time. Using an 8 : 1 multiplexer yields 4 Gbits/s, with an on-chip VCO running at 500 MHz. The internal logic runs at 250 MHz. For robust data recovery, the input is sampled at 3× the bit rate and uses a digital phase-picking logic to recover the data. The digital phase picking can adjust the sample at the clock rate to allow high tracking bandwidth. With a 3.3-V supply, the chip has a measured bit error rate (BER) of <10⁻¹⁴.

I. INTRODUCTION

THE increasing demand for data bandwidth in networking has driven the development of high-speed and low-cost serial link technology. Applications such as computer-to-computer or computer-to-peripheral interconnection are requiring gigabit-per-second rates either over short distances in copper or longer distances in fiber. CMOS technology is used increasingly over GaAs or bipolar technologies because of the development toward faster and faster devices. In 0.18-µm CMOS technology, the -channel is expected to equal or exceed that of the standard 0.5-µm GaAs process. While other technologies are limited in the number of transistors due to yield or power, CMOS technology allows implementation of complex digital logic, enabling more integration of the back-end processing and lowering the cost. Recent development has shown CMOS capability to achieve Gbit/s data rates [1], [5], [6], [8], [11]. This work pushes NRZ signaling rates to the bandwidth limitations of the process technology and explores the issues involved.

The primary components of a link are the transmitter, the receiver, and the timing recovery circuits. Section II describes the overall architecture of the link. Because many of the circuits in the transmitter and receiver blocks have been previously discussed [1], this paper focuses on the timing recovery technique. Section III evaluates the impact of timing recovery on performance and compares two different timing recovery techniques: phase-locked loops versus oversampled phase picking. This chip implements a phase-picking algorithm that is discussed in Section IV. The measured performance of the entire transceiver chip is presented in Section V. Finally, some conclusions are drawn from these results in Section VI.

Manuscript received September 1997; revised December 3, 1997. The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA. Publisher Item Identifier S 0018-9200(98)02225-2.

Fig. 1. Transmit architecture.

II. ARCHITECTURE

A 0.5-µm CMOS technology is not fast enough to directly generate and receive a 4-Gbit/s stream (since the maximum ring oscillator frequency is <2 GHz). Instead, we use parallelism to reduce the performance requirements of each circuit. The transmitter generates the bit stream by an 8 : 1 multiplexer that multiplexes current pulses directly onto the output channel (Fig. 1). The receiver (Fig. 2) performs a 1 : 8 demultiplexing by sampling with a bank of input samplers. Similar to the transmitter, each sampler is triggered by individual clock phases. Furthermore, clock/data recovery is achieved by a 3× oversampling of each data bit. Thus, the receiver requires a total of 24 clock phases to support both the oversampling and the 1 : 8 demultiplexing. Various techniques exist for generating multiple clock phases [2], [3]. The receive side uses a PLL built around a six-stage ring oscillator, followed by phase interpolators that generate intermediate phases (ick[23 : 0]) between the ring oscillator edges (ck[11 : 0]) [1]. Similarly, eight different clock phases tapped from the four-stage ring oscillator of the transmit-side PLL control the transmitter multiplexing.

A timing recovery circuit extracts the clock from the multiple samples per bit by finding the positions of the data transitions. Once the transitions are determined, a decision logic selects the samples furthest from data transitions (phase picking) as the received data byte. This approach is similar to what is done in UART’s, and was first applied to a high-speed link by Lee et al. in [4].

Fig. 3 shows the full transceiver test-chip block diagram. Since the sampling clocks are different phases, the sampled results are resynchronized to a global clock. To facilitate the digital design, the on-chip data are further demultiplexed (2 : 1) to 250 MHz. Finally, in order to test the bit-error rate (BER), an on-chip parallel pseudorandom bit sequence (PRBS) encoder and decoder are used. Serial data are commonly encoded with 8B10B coding, which limits the run length to <5 consecutive zeros or ones. The PRBS sequence is a suitable substitute because it guarantees a maximum run length of 7. The transmitter can be optionally configured to transmit the PRBS sequence, a fixed sequence, or the received data for testing.

0018–9200/98$10.00 © 1998 IEEE

Fig. 2. Receive architecture.

Fig. 3. Transceiver test-chip block diagram.

III. TIMING RECOVERY

The goal of the timing recovery scheme is to maximize the timing margin, that is, the amount that a sample position can err with the data still properly received. Errors that impact the timing margin can be classified into two sources: static phase error and jitter (dynamic phase error). Fig. 4 illustrates the timing margin together with the static sampling error and the jitter on the data transition and on the sampling clock. Since the sampling position is defined with respect to the data transition, jitter on both the clock and the data additively reduces timing margin. With ideal square pulses, as long as the sum of the magnitudes of the static and dynamic phase error is less than a bit time, the phase error does not impact signal amplitude. However, in a bandwidth-limited system (for this work, due to the process technology), signal amplitude is lower with sampling phase error because the signals have finite slew rates. Correspondingly, this reduces the signal-to-noise ratio (SNR), hence impacting performance.

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

Fig. 4. Timing margin.

Fig. 5. SNR penalty for different phase offsets.

Fig. 6. BER versus SNR with various amounts of phase noise.

The amount of SNR degradation can be calculated based on the shape of the signal waveform. For static phase error, the SNR penalty is shown in Fig. 5 for a triangular signal waveform and a sinusoidal signal waveform. When the sample position phase offset is small, the sinusoidal waveform has a lower penalty than a triangular waveform due to the lower signal slew rate near the sample point (this penalty is only applicable to transitions). For jitter, the SNR penalty is more complex to evaluate since it additionally depends on the statistics of the noise. For example, we can assume an idealized jitterless system with a given signal amplitude and additive white Gaussian noise (AWGN) of a given standard deviation on the signal amplitude. In this system, we can determine the performance (BER) for various SNR [14]:

(1)

This equation is plotted as the lowest dotted line in Fig. 6. If we further assume the jitter to be AWGN as well, then for a triangular waveform the phase noise can be translated into amplitude noise. Since the noise sources are additive, the probability of error can be simply expressed as

(2)

Fig. 6 illustrates the BER versus amplitude SNR for various amounts of phase noise. The SNR penalty, as shown in the figure, increases at higher SNR because the phase noise eventually limits performance, producing a “BER floor.” For a sinusoidal signal waveform (with a lower slew rate near the sample point), the behavior is similar, except with lower SNR penalty.
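The qualitative behavior described above, including the BER floor, can be reproduced with a textbook Gaussian-noise model. The sketch below is not the exact form of (1) or (2), which did not survive in this copy; it assumes a decision threshold midway between two levels at plus and minus the amplitude, a triangular waveform that slews from one level to the other in one bit time, and independent Gaussian amplitude noise and jitter whose variances add. All function and parameter names are hypothetical.

```python
import math

def q_function(x: float) -> float:
    """Tail probability of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_amplitude_only(amplitude: float, sigma_a: float) -> float:
    """BER with Gaussian amplitude noise only (levels at +/- amplitude)."""
    return q_function(amplitude / sigma_a)

def jitter_to_amplitude_sigma(amplitude: float, sigma_t: float, bit_time: float) -> float:
    """Map timing jitter to amplitude noise through the waveform slope.

    Assumption: a triangular edge moves from -amplitude to +amplitude in one
    bit time, so the slope is 2 * amplitude / bit_time.
    """
    return (2.0 * amplitude / bit_time) * sigma_t

def ber_with_jitter(amplitude, sigma_a, sigma_t, bit_time):
    """BER when amplitude noise and jitter-induced noise add in variance."""
    sigma_j = jitter_to_amplitude_sigma(amplitude, sigma_t, bit_time)
    sigma_total = math.hypot(sigma_a, sigma_j)
    return q_function(amplitude / sigma_total)

# Sweep amplitude SNR with jitter fixed at 5% of the bit time (illustrative).
for snr_db in (14, 20, 26, 40):
    sigma_a = 1.0 / (10.0 ** (snr_db / 20.0))
    clean = ber_amplitude_only(1.0, sigma_a)
    jittery = ber_with_jitter(1.0, sigma_a, sigma_t=0.05, bit_time=1.0)
    print(f"SNR {snr_db:2d} dB: no jitter {clean:.2e}, with jitter {jittery:.2e}")
```

In this model the jitter term does not shrink as the amplitude SNR improves, so the error rate stops falling once the jitter-induced noise dominates, which is the floor seen in Fig. 6.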

Fig. 7. Clock recovery architectures: (a) phase picking block diagram and (b) data/clock recovery architectures.

The amount of phase error and the jitter depends on the implementation of the clock recovery circuit. Two techniques are commonly used, a phase-locked loop (PLL) and a phase picker. A PLL employs a feedback loop that actively servos the sampling phase of an internal clock source based on the phase of the input [7]. Fig. 7(a) illustrates a common VLSI implementation using an on-chip voltage-controlled oscillator (VCO) as the clock source, and a charge pump following the phase detector to integrate the phase error. A phase picker, as shown in Fig. 7(b), oversamples each bit, and uses the oversampled information to determine the transition position (phase) of the data. Based on the transition information, the best sample is then selected as the data value (UART [10]). Each of the two architectures has a different tradeoff in terms of static phase error and jitter. The static phase error of a PLL depends mainly on its phase detector design. Ideally, sampling at the middle of the bit window gives the maximum timing margin. However, if the sampler has a setup time, the middle of the effective bit


window is shifted by the setup time. Not compensating this shift causes significant static phase error. This error can be reduced by using the data samplers as the phase detector.² Additional phase error occurs due to inherent mismatches within the phase detectors and/or charge pump. Furthermore, any phase detector “dead band” (window in which the phase detector does not resolve phase information) limits the phase resolution, increasing the static phase error.

In a phase-picking architecture, the multiple samples per bit are used to find the transitions, effectively behaving as the phase detector. Sampler uncertainty limits the resolution of the transition detection. Sources of this uncertainty are sampler metastability window and data dependence of the sampler setup time. The uncertainty window for the sampler design used is <1/10 the bit time, which does not impact performance significantly. More importantly, in this architecture, the phase information is quantized by the oversampling, causing a finite quantization error of 1/2 the phase spacing between samples. For a higher oversampling ratio, this static phase error is less, but it has a significant cost of increasing the number of input samplers, increasing the input capacitance, and hence limiting the input bandwidth. For a 3× oversampling system, the maximum static phase error is 1/6 the bit time.

In terms of jitter, a PLL tracks the phase of the input data with a tracking bandwidth limited by the stability of the feedback loop. The loop tracking is effectively a high-pass filter that rejects the phase noise of the input at lower frequencies. The noise not tracked appears as data jitter. Furthermore, because the PLL frequency source is an on-chip VCO, supply and substrate noise from on-chip digital switching can introduce additional jitter. The impact of these two sources is formulated for a second-order PLL in the following equation as the first and second terms:

(3)

The constants that determine the loop bandwidth in (3) are depicted in Fig. 7(a): the gain of the filter (V/rad), the stabilizing zero in the filter, and the gain of the VCO (rad-hertz/V). The remaining two quantities are the noise induced onto the VCO and the sensitivity of the VCO to this noise. Thus, the total amount of “effective jitter” depends on the tracking bandwidth of the loop, the amount of supply and substrate noise, and the sensitivity of the loop elements to the noise. Because the feedback loop has a loop delay of at least one clock cycle, the bandwidth of the loop is often chosen to be <1/10 of the oscillation frequency for sufficient phase margin and stability. The delay makes tracking high-frequency phase noise ineffective because, if the phase error from one transition is independent of the phase of the next transition, correction based on the first transition’s phase information could increase the phase error for receiving the next bit.

The impact of different tracking bandwidth on jitter is illustrated in Fig. 8. The single sideband power spectral density (PSD) of an oscillator, such as the VCO of the transmitter, is shown to represent the phase noise in Fig. 8(a). Two hypothetical PLL’s with different bandwidths³ behave as high-pass filters that reject the lower frequency noise. Their transfer functions are overlaid in Fig. 8(a). The resulting phase error is shown in the PSD of Fig. 8(b). Note that this example excludes the additional noise from the phase-tracking circuit [second term of (3)]. The integral of the area beneath the curve is an indication of the amount of jitter [13] (the jitter quantity that enters (2)); thus, the phase noise of Circuit I is larger than that of Circuit II. Additionally, if a second-order PLL is not critically damped, the transfer function can exhibit peaking. This peaking accumulates phase noise at its loop bandwidth, increasing the noise.

Fig. 8. Effect of tracking bandwidth on jitter.

² This causes additional difficulties because such phase detectors can only determine if transitions are early or late. The control loop is “bang–bang” control instead of linear control, which is less stable, has inherent dithering, and requires additional frequency acquisition aid. Although a DLL (delay-line based PLL) [8] can be used to eliminate the stability and frequency acquisition problems, the phase spacing, when tapping phases from the buffer stages, is sensitive to the input clock’s duty cycle and amplitude.

³ The actual shape of the tracking transfer function H(s) varies with implementation.

For a phase picker, the sampling clocks experience similar jitter problems from supply and substrate noise since the phases for the oversampling are also generated from an on-chip VCO. The primary difference is the tracking bandwidth. A phase-picking system is a feedforward architecture (instead of feedback); thus, there are no intrinsic bandwidth limitations. The tracking rate depends on the rate at which new phase decisions are made, which in turn depends on the logic’s cycle time. The importance of this fast tracking is that it can potentially track the accumulation of phase noise by the on-chip multiphase generator (PLL). We delay the data by the time to arrive at a decision so the corrections are applied to the appropriate bit (although with a latency overhead). However, the maximum phase change between two transitions must be

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

less than , half the bit time, even if the peak-to-peak jitter can greater than are be much larger than a bit time. Changes indistinguishable from a phase shift in the opposite direction, . Choosing between the two clock recovery systems depends on the system requirements and noise behavior. We chose a phase-picking architecture to explore the usefulness of the higher phase-tracking capability. In such VLSI implementations, supply noise can be significant enough for the peak-topeak jitter to occupy a large fraction of the bit time, especially since a PLL accumulates jitter. For the 4-Gbit/s link, we chose a low oversampling ratio of 3 to maintain high input bandwidth and to keep the number of clock phases manageable (1 : 8 demultiplexing and 3 oversampling yields 24 phases). With a bit time of 250 ps, the phase-picking scheme4 can track the noise of the on-chip multiphase generator (PLL) from both the transmit and receive sides to keep the total “effective jitter” below the 83-ps quantization spacing. One limitation of the phase-picker tracking is that the maximum rate of the tracking depends on the data transition density. Since the PRBS signal guarantees one transition per byte, the maximum tracking rate of one sample spacing every transition is fast (83 ps/2 ns). Although the tracking rate is high, the maximum static phase error from the quantization is 41 ps (2% of the clock period, 8 bit time), causing an SNR penalty (Fig. 5). Whether or not a 3 oversampled phase-picking approach with higher tracking bandwidth than a PLL can achieve better performance with the larger static phase error depends on the amount of jitter induced by on-chip noise sources. If the lower SNR penalty from the lower jitter compensates the higher SNR penalty of larger static phase error, phase picking would be the better choice. IV. PHASE-PICKING ALGORITHM AND IMPLEMENTATION The details of the phase-picking algorithm are illustrated in Fig. 9. 
Picking the center sample requires finding and tracking the bit boundaries. The decision logic first detects transitions by an XOR of adjacent samples, indicating the bit boundary to be in one of three possible positions. Fig. 10 shows an example of the boundary detection with a portion of a sampled stream. To find which of the three transition positions is the most likely bit boundary, transitions corresponding to the same bit boundary position are tallied. The position with the largest total determines the bit boundaries. The decision logic makes a new decision per byte of data. In contrast to a higher order oversampling phase picker, the 3× oversampling limits the change of the selected sample position to one sample position per byte. To guarantee sufficient transitions for averaging any bit-to-bit variations of high-frequency noise (near the bit rate), the tally is across a sliding window of 3 bytes. The transitions are accumulated from the current byte, the previous byte, and the next byte (delaying the data allows the noncausal information) so that the decision is applied to the byte at the middle of the window. As a result of the 3-byte sliding accumulation, the rate of phase change that the algorithm can track is slower than the maximum of 83 ps/2 ns. The algorithm picks the correct sample if the majority of the transition information within the 3-byte window (6 ns) indicates the correct phase. For example, if the input phase has a constant rate of change of <1 sample spacing per 3 ns (corresponding to a frequency difference of 4%), the transition information from >1.5 bytes of the 3-byte window would fall in the same phase quantization. Then the tally and compare would select the correct sample to track the phase change. This indicates a maximum phase-tracking rate of 83 ps/3 ns.

Fig. 9. Phase-picking algorithm block diagram.

Fig. 10. Example of the phase-picking algorithm.

The criterion of tracking both the Tx-PLL's and Rx-PLL's accumulation is met because the VCO elements' supply noise sensitivity, expressed in %/% (percent of frequency change per percent of supply noise [1], [3]),⁵ corresponds to 30 ps/3 ns for a 10% supply step, which is less than the tracking rate. If the phase change is slower than 83 ps/3 ns, the 3-byte accumulation offers some robustness by averaging any uncertainty in the transition detection due to high-frequency bit-to-bit noise. A smaller window of one byte can track phase faster, but has poorer performance without sufficient transitions within that byte to average the bit-to-bit variation. A larger window of 5 bytes (<83 ps/6 ns) would be too slow to track the Tx- and Rx-PLLs' phase accumulation under reasonable supply noise. Once the transition position is determined, the middle sample within the bit boundaries is selected as the data.

⁴ In our system, the oscillator is at 250 MHz so the PLL bandwidth is restricted to <25 MHz. This yields a 10× tracking rate difference between the two systems.

⁵ Although the maximum phase error accumulation rate is based on the supply sensitivity of the VCO, the peak phase error depends on the loop bandwidth. The Tx-PLL and Rx-PLL generating the multiple clock phases have bandwidths of 15 and 5 MHz, respectively.
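
The tally-and-select procedure described above can be sketched in software as follows. This is only an illustrative model, not the chip's decision logic: the function names (`tally_transitions`, `pick_boundary`, `recover_bytes`), the convention of binning a transition at the index of the later sample modulo 3, and the tie-breaking rule (keep the previous choice when it is among the winners) are assumptions made for this example. Bit overflow and underflow between bytes are not handled here.

```python
from typing import List, Optional, Sequence

OVERSAMPLING = 3                                   # 3x oversampling
BITS_PER_BYTE = 8
SAMPLES_PER_BYTE = OVERSAMPLING * BITS_PER_BYTE    # 24 samples per byte


def tally_transitions(samples: Sequence[int],
                      prev_last: Optional[int] = None) -> List[int]:
    """Count transitions per candidate boundary position for one byte.

    A transition is an XOR of adjacent samples. Assumed convention: a
    transition between sample i-1 and sample i is binned at i % 3.
    """
    counts = [0] * OVERSAMPLING
    prev = prev_last
    for i, s in enumerate(samples):
        if prev is not None and (prev ^ s):
            counts[i % OVERSAMPLING] += 1
        prev = s
    return counts


def pick_boundary(window: Sequence[Sequence[int]], previous: int) -> int:
    """Choose the boundary position with the largest tally in the window."""
    totals = [sum(c[p] for c in window) for p in range(OVERSAMPLING)]
    best = max(totals)
    if best == 0:
        return previous            # no transitions: reuse the stored value
    winners = [p for p, t in enumerate(totals) if t == best]
    # Assumed arbitration: prefer the previous choice on a tie.
    return previous if previous in winners else winners[0]


def recover_bytes(samples: Sequence[int],
                  initial_boundary: int = 0) -> List[List[int]]:
    """Recover bytes from a 3x oversampled stream using a 3-byte window."""
    n = len(samples) // SAMPLES_PER_BYTE
    chunks = [samples[i * SAMPLES_PER_BYTE:(i + 1) * SAMPLES_PER_BYTE]
              for i in range(n)]
    counts = [tally_transitions(c, chunks[i - 1][-1] if i > 0 else None)
              for i, c in enumerate(chunks)]
    boundary = initial_boundary
    out = []
    for i, chunk in enumerate(chunks):
        # Previous, current, and next byte (fewer at the stream edges).
        window = counts[max(0, i - 1):i + 2]
        boundary = pick_boundary(window, boundary)
        middle = (boundary + 1) % OVERSAMPLING   # sample centered in the bit
        out.append([chunk[OVERSAMPLING * b + middle]
                    for b in range(BITS_PER_BYTE)])
    return out
```

Because the decision for a byte uses the tally of the following byte, a hardware version has to delay the data by the decision time, as noted in the text; in this sketch the whole stream is simply available in memory.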

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Fig. 11. Comparison of center picking versus majority voting.

The selection is implemented by multiplexers selecting the appropriate samples based on three select signals. In the case where no transitions are detected, the three select signals use previously stored values to maintain data through the multiplexers.

The algorithm for deciding the received data value from the oversampled information can be designed differently while still keeping the higher tracking bandwidth of a feedforward architecture. Instead of selecting the middle sample ("phase pick"), a simple alternative is to take a majority vote of the three sampled values. Fig. 11 shows the performance comparison. Majority voting works well with non-bandwidth-limited signals that have high-frequency noise because it averages the noise over many samples. In a bandwidth-limited system (low-pass filtered by the I/O time constant), it performs worse because at least one of the two nonmiddle samples is required to be valid, and the nonmiddle samples have a much higher probability of error.

Arbitration is required when two transition positions have equal counts. This occurs when two of the sample positions straddle the center of the bit and the third sampler samples at the transition. Picking either of the two straddling the center gives equivalent performance. More complex logic can be implemented by using the previous, current, and next cycles' comparison results to follow the direction of any phase transition. However, this improves the performance by less than 1 dB.

If the peak-to-peak phase jitter is larger than one bit time, or if the transmitter and receiver operate at different frequencies, the tracking must allow bit(s) to overflow/underflow. For example, if the SEL[2 : 0] signal changes from 0–0–1 to 1–0–0, the selected sample of the first cycle corresponds to the same bit as the selected sample of the following cycle. This "underflow" condition must be handled by dropping one of the two samples.
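
The wrap-around cases can be sketched as a small bookkeeping routine. The mapping of SEL patterns to 7, 8, or 9 received bits follows the text (the 9-bit "overflow" case is the opposite transition, from 1–0–0 to 0–0–1); the function names, the tuple representation of SEL[2 : 0], and the sign convention for the aggregate shift (negative for a missing bit) are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional, Tuple

Sel = Tuple[int, int, int]   # SEL[2:0], written in the order used in the text

UNDERFLOW = ((0, 0, 1), (1, 0, 0))   # two selected samples fall in the same bit
OVERFLOW = ((1, 0, 0), (0, 0, 1))    # one extra bit must be stored


def bits_received(prev_sel: Sel, sel: Sel) -> int:
    """Number of distinct bits delivered in the cycle that follows prev_sel."""
    if (prev_sel, sel) == UNDERFLOW:
        return 7
    if (prev_sel, sel) == OVERFLOW:
        return 9
    return 8


def aggregate_shift(sel_per_cycle: Iterable[Sel]) -> List[int]:
    """Running bit shift that a bitwise FIFO would have to absorb.

    Assumed convention: -1 per underflow, +1 per overflow. Once the
    magnitude reaches a full byte, a bytewise FIFO would take over.
    """
    shift = 0
    history: List[int] = []
    prev: Optional[Sel] = None
    for sel in sel_per_cycle:
        if prev is not None:
            shift += bits_received(prev, sel) - 8
        history.append(shift)
        prev = sel
    return history
```

A sequence that wraps one way and then back leaves the aggregate shift at zero, which is the situation where the choice of which duplicate sample to drop matters.
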
Typically, the two samples are of the same bit, and thus have the same value. However, in the case where they are different, if the phase movement changes direction (the SEL signal returns to 0–0–1) in the following cycle's decision, dropping the latter one gives a slight performance improvement. Similar to the "underflow" where only 7 bits are received, the opposite transition from 1–0–0 to 0–0–1 causes an "overflow," requiring an extra bit (9 bits total) to be stored. These conditions are handled by a bitwise FIFO built by shifting the input byte to accommodate the one extra or missing bit. If the aggregate shift increases beyond 1 byte, a bytewise FIFO handles the overflow/underflow byte. The limited depth of the FIFO can only handle a finite number of byte overflows. If the application requires handling long streams of data with a slight frequency difference from the local reference clock, the local frequency can be corrected based on the phase information from the decision logic.⁶

⁶ This feature is not implemented as part of this test chip.

Fig. 12. Chip micrograph.

V. TRANSCEIVER EXPERIMENTAL RESULTS

The transceiver chip was implemented in a 0.5-µm CMOS process offered through MOSIS. The 3 mm × 3 mm die photo is shown in Fig. 12. The chip is packaged in a 52-pin CQFP package supplied by Vitesse Semiconductor, which has internal power planes for controlled impedance. The size of the I/O bond pads is reduced to 70 µm × 70 µm to keep pad capacitance to a minimum because the capacitance would otherwise limit the I/O bandwidth. With an effective impedance at the I/O of 25 Ω (for a doubly terminated 50-Ω line), the total I/O capacitance cannot exceed 4.5 pF for 4-Gbit/s operation without losing 10% of the bit height to the filtering. The 1 : 8 demultiplexing receiver and 8 : 1 multiplexing transmitter designs have capacitances of 2.2 and 1.2 pF, respectively, with 600 fF due to the pad and metal interconnects. The input time constant is estimated from measurements sweeping the reference voltage for a single-ended input pulse. The width of the pulse with a different reference voltage determines the time constant.

The performance of the link depends significantly on the I/O circuits. The minimum receivable amplitude of 50 mV was measured by using a fixed data pattern while changing the amplitude. This indicates the worst case input offset in the bank of samplers. The transmitter data eye at 3.0 Gbits/s is shown in Fig. 13 with the output driving a PRBS sequence. The measured data rate is limited by the triggering bandwidth of the oscilloscope. The maximum speed of the transmitter was 4.8 Gbits/s, and was limited by the maximum frequency of the ring oscillator used in the clock generation.

Fig. 13. Transmitter data eye.

The multiple-phased clock generation (PLL) is crucial to the performance of the link because the phase spacing determines the bit time in the multiplexing/demultiplexing architecture, and the supply sensitivity and loop bandwidth determine the amount of jitter that needs to be tracked. Mismatches can cause one phase to be shifted with respect to the others. In the transmitter, the shift enlarges one bit, but reduces the next. By measuring the spacing between edges, we can evaluate the ability to match the phases tapped from the oscillators and interpolators [3]. The differential nonlinearity (DNL) of the phase spacing is plotted for the transmitter in Fig. 14 at various frequencies. The error is expressed as a percentage of the ideal bit time for all eight phase positions. While transmitting the PRBS pattern and using a trigger frequency of 1/8 the data rate (internal clock rate), these spacings are measured with a 20-GHz bandwidth digital oscilloscope by the width of each of the eight data-eye patterns.⁷ If we use the data-rate frequency as a trigger instead of a divided frequency, the data eye of Fig. 13 overlaps all eight of the bits. The overlaid histogram shows that the 333-ps bit time is degraded by 90 ps due to equal contributions from jitter and errors in the transmitter phase spacing. The peak-to-peak variation, <±7% of the bit time, indicates very little degradation in bit width due to mismatches. The dominant cause of these bit-width variations is mismatch of the transistors in the clock generation circuits [12]. The increase in error with decreasing oscillation frequency, shown in Fig. 14, is an indication of these mismatches. The gate overdrive is less at lower oscillation frequencies, making the phase spacing more sensitive to these mismatches.

⁷ The measurement uncertainty in the DNL is ±2 ps.

Fig. 14. Transmit-side DNL at various frequencies.

Fig. 15 shows the measurement of the DNL for four chips. The darker line indicates the average at each phase position. The variation of this average across phase positions potentially indicates some systematic error. However, because the average is over a sample size of only four chips, and the variation of the average is significantly smaller than the variation between chips, the random component is believed to be the dominant source of static phase spacing error. Although a systematic component of the offset can also be expected from noise at any integer multiple of the oscillator frequency, it is not apparent in Fig. 15. Normally, noise such as substrate or supply noise at the same frequency as the oscillator would modulate the oscillator, causing a duty-cycle error which spreads the phases in the first half cycle and compresses the phases in the second half cycle. Since most of the digital logic on this chip is clocked at 250 MHz, the effect of the clock buffer switching on the 500-MHz oscillator would cause different phase spacings for two consecutive oscillator cycles. However, Fig. 15 shows that the average phase spacing errors from the second cycle are nearly the same as those from the first cycle, indicating that this coupling is negligible. Also, any systematic components from path mismatches (e.g., capacitive loading errors) are insignificant compared to the random source.

Fig. 15. Transmit-side DNL for four chips.

On the receive side, the DNL of the sample spacing is also measured, as was shown for a 0.8-µm process technology [1], to be <8% of the bit time. Receive clock phase spacing errors reduce the effectiveness of the oversampling by increasing the sample spacing, causing both increased static phase error and larger jitter.

Jitter in the transmitter can be measured by outputting a fixed pattern and measuring the jitter on the data transition. We can also measure the sampling clock jitter by looking at the sampler output while sweeping a clean input transition. The window in which the sampler output is uncertain indicates the jitter with respect to the input. The supply sensitivity can also be measured by the increase in jitter due to induced supply noise with an internal switch that shorts between supply and ground. The sensitivities of the transmit and receive PLLs are 0.2 and 0.3 ps/mV, respectively, with a similar peak-to-peak quiescent jitter of 45 ps.

Fig. 16. BER testing configuration.

The BER testing is performed with two different configurations. The first measurement feeds the transmitted output directly back into the input. This yielded a BER of .
The second configuration places the chip in a mock optical network (Fig. 16). A bit error rate tester (BERT) is used to generate the data pattern. The pattern is modulated onto a fiber-optic network. The optical power is measured by siphoning 1/10 of the total optical power. The optical signal is received and amplified by an avalanche photodiode (APD) followed by an amplifier. The output of the amplifier is either returned to the BERT for the baseline measurement, or sent into the chip configured in its transceiver mode. Because the BERT and optical amplifiers have a bandwidth limitation at 3 Gbits/s, the experimental results of this configuration are limited in data rate. As shown in Fig. 14, the phase spacing at lower frequencies is worse, so the performance is slightly worse than at 4 Gbits/s. The BER versus SNR is plotted with SNR expressed in optical power, showing both the baseline and the DUT with a 1.5-dB penalty at BER (Fig. 17). The SNR penalty for not having the selected sample at the middle of the data eye is shown in Fig. 18. Because of the phase spacing errors on the receive side, the penalty shown here is worse than simulated.

Fig. 17. Measured BER versus SNR.

Fig. 18. Measured BER at various sampling phases.

Since the quiescent jitter of the clock generation is smaller than the sample spacing (<83 ps), the phase tracking is not active. In order to test the effectiveness of the phase picking, voltage steps are induced on the supply, causing 250-ps jitter on both the Tx-PLL and the Rx-PLL. While this causes the data eye to collapse, the receiver can still track this jitter and maintain BER . Also, the transceiver is operated with the transmitter and receiver at different frequencies. The chip was able to track a frequency difference of 1 MHz with BER .

Table I shows some additional performance measurements of the chip. The total power dissipated is 1.5 W, with 1/3 from the clock generation and 1/3 from the receive-side logic. The minimum amplitude that can still maintain BER is 90 mV with an internal eye height of 65 mV. The 24 mV of amplitude noise is primarily due to ringing from the package inductance and on-chip output capacitance at the transmitter.

TABLE I TEST-CHIP PERFORMANCE

VI. CONCLUSION

Very high data rates are achievable in CMOS technologies by making extensive use of parallelism. Using 1 : 8 demultiplexing at the input and an 8 : 1 multiplexing output transmitter, we achieved a 4-Gbit/s transceiver while keeping all internal signals <500 MHz in a 0.5-µm process technology. The fundamental limitations of this approach are the I/O capacitance (increased due to the parallelism), the sampler uncertainty, and the phase position accuracy of the multiple clock phases. Provisions were made in this design to handle very large jitter accumulation of 83 ps/3 ns by a fast phase-picking algorithm. The effectiveness of this architecture critically depends on the jitter characteristics. Although a CMOS PLL can potentially exhibit this large jitter due to supply noise, the measured jitter while operating this transceiver is only 50 ps. This jitter is measured in a realistic noise environment because of the presence of significant digital switching noise from the large digital phase picker that can couple onto the VCO elements. Since the jitter is less than the quantization error, the advantage of the phase picking is only apparent when additional noise is induced. This low accumulated jitter implies that the lower tracking bandwidth of a PLL-based clock recovery circuit can potentially perform equally well. The design of such a system is nontrivial, and still has challenges in maintaining small static phase offsets. However, since the phase picking has significant hardware overhead in the extra number of input samplers and large digital processing, a PLL would potentially offer similar performance with lower area and power.

ACKNOWLEDGMENT

The authors would like to thank S. Sidiropoulos, B. Amrutur, K. Falakshahi, Vitesse Semiconductor, Prof. T. Lee, Prof. L. Kazovsky, and their research groups for invaluable discussions and assistance.

REFERENCES

[1] C.-K. Yang and M. Horowitz, "A 0.8-µm CMOS 2.5 Gbps oversampling receiver and transmitter for serial links," IEEE J. Solid-State Circuits, vol. 31, Dec. 1996.
[2] C. Gray et al., "A sampling technique and its CMOS implementation with 1-Gb/s bandwidth and 25 ps resolution," IEEE J. Solid-State Circuits, vol. 29, Mar. 1994.
[3] J. Maneatis and M. Horowitz, "Precise delay generation using coupled oscillators," IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[4] K. Lee et al., "A CMOS serial link for fully duplex data communications," IEEE J. Solid-State Circuits, vol. 30, pp. 353–364, Apr. 1995.
[5] A. Fiedler et al., "A 1.0625Gb/s transceiver with 2×-oversampling and transmit signal pre-emphasis," in ISSCC'97 Dig. Tech. Papers, Feb. 1997, pp. 238–239.
[6] A. Widmer et al., "Single-chip 4 × 500 Mbaud CMOS transceiver," IEEE J. Solid-State Circuits, vol. 31, pp. 2004–2014, Dec. 1996.
[7] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.
[8] W. Dally and J. Poulton, "A tracking clock recovery receiver for 4-Gb/s signaling," in Hot Interconnect97 Proc., Aug. 1997, p. 157.
[9] S. Sidiropoulos and M. Horowitz, "A semi-digital DLL with unlimited phase shift capability and 0.08–400MHz operating range," in ISSCC'95 Dig. Tech. Papers, Feb. 1995, pp. 332–333.
[10] J. E. McNamara, Technical Aspects of Data Communication, 2nd ed. Bedford, MA: Digital, 1982.
[11] S. Kim et al., "An 800Mbps multi-channel CMOS serial link with 3× oversampling," in IEEE 1995 CICC Proc., Feb. 1995, p. 451.
[12] M. J. Pelgrom, "Matching properties of MOS transistors," IEEE J. Solid-State Circuits, vol. 24, p. 1433, Dec. 1989.
[13] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston, MA: Artech House, 1994.
[14] J. Proakis, Communication Systems Engineering. Englewood Cliffs, NJ: Prentice-Hall, 1994.


Chih-Kong Ken Yang (S'93) received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, in 1992. He is currently pursuing the Ph.D. degree at Stanford University in the area of circuit design for high-speed interfaces. Mr. Yang is a member of Tau Beta Pi and Phi Beta Kappa.


Ramin Farjad-Rad (S’95) was born in Tehran, Iran, in 1971. He received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, in 1993 and the M.Sc. degree in electrical engineering from Stanford University, Stanford, CA, in 1995, where he is currently a Ph.D. candidate in electrical engineering. He worked at SUN Microsystems Laboratories, Mountain View, CA, on a 1.25-Gbit/s serial transceiver for the fiber channel standard during the summer of 1995. Over the summer of 1996, he worked at LSI Logic, Milpitas, CA, where he examined different multi-Gbit/s serial transceiver architectures. Mr. Farjad-Rad holds one U.S. patent, and is also the Bronze Medal Winner of the 20th International Physics Olympiad, Warsaw, Poland.


Mark A. Horowitz (S'77–M'78–SM'95) received the B.S. and M.S. degrees in electrical engineering from MIT in 1978, and the Ph.D. degree from Stanford University, Stanford, CA, in 1984. He is the Yahoo Founders Professor of Electrical Engineering and Computer Science at Stanford. His research area is in digital system design, and he has led a number of processor designs including MIPSX, one of the first processors to include an on-chip instruction cache, TORCH, a statically scheduled, superscalar processor, and FLASH, a flexible DSM machine. He has also worked on a number of other chip design areas including high-speed memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took a leave from Stanford to help start Rambus Inc., a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links. Dr. Horowitz is the recipient of a 1985 Presidential Young Investigator Award and an IBM Faculty Development Award, as well as the 1993 Best Paper Award from the International Solid-State Circuits Conference.
