

Richard C. Walker (415) 857-3754 walker@opus.hpl.hp.com

Cheryl Stout (415) 857-8460

Chu-Sun Yen (415) 857-4689

Hewlett Packard Laboratories

3500 Deer Creek Road MS 26U4 Palo Alto, CA 94304-1392

HP Design Conference, Tokyo 1997



### Abstract

Adjustment-free clock and data recovery for 2.488Gb/s SONET applications is provided by a 1.77 Watt, 3.45x3.45 mm<sup>2</sup> chip implemented in a 25GHz f<sub>T</sub> silicon bipolar process. The chip was designed for low-cost packaging: it uses an on-chip VCO, requires only a single off-chip filter capacitor, and operates from 2-3 Gb/s over process, voltage and temperature variation. For network monitoring, a highly reliable Loss-of-Signal Detector is provided with a trigger threshold programmable between  $10^{-4}$  and  $10^{-6}$  BER.

### Authors

### Richard Walker Author Background

Richard Walker was born in San Rafael CA, in 1960. He received the B.S. degree in Engineering and Applied Science from the California Institute of Technology in 1982, and an M.S. degree in Computer Science from California State University, Chico, CA in 1992.

He joined Hewlett-Packard Laboratories as Member of Technical Staff in 1981. Since that time, he has worked in the area of phase-locked-loop theory, and high-speed circuit design for both Si and GaAs IC processes. He holds 9 U.S. patents.

### Cheryl Stout Author Background

Cheryl Stout received the B.S.E.E. from San Jose State University, San Jose, CA, in 1979 and the M.S.E.E. from the University of California at Berkeley in 1983. From 1979 to 1982 she was involved in the development of optical communication products at Plantronics, Inc.

Since 1983 she has been a Member of the Technical Staff at Hewlett-Packard Laboratories in Palo Alto, CA, where she has designed high-speed silicon and GaAs integrated circuits for optical communication systems and test instruments.

### Chu-Sun Yen Author Background

Chu-Sun Yen was born in China in 1933. He received the B.S. degree from the National Taiwan University in 1955, the M.S. degree from the University of Florida in 1958, and the Ph.D. degree from Stanford University, Stanford, CA, in 1961, all in electrical engineering.

Since 1961 he has been with Hewlett-Packard Laboratories, Palo Alto, CA. He is presently a Project Manager in the Communications and Optics Research Lab.

#### Slide #1



Monolithic Clock and Data Recovery (CDR) for the Synchronous Optical Network (SONET) presents several challenges to the designer.

The first section covered is an overview of the general design goals.

The most critical component in a monolithic design is the high speed VCO. The functional characteristics of the VCO influence the chosen chip architecture.

A little-discussed requirement for SONET CDR applications is the loss-of-signal (LOS) function. This circuit has a direct impact on the overall reliability of the entire SONET regenerator system. The requirements are discussed, and a robust algorithm is presented.

Finally, the components of the phase-locked-loop (PLL) are described and measured results are presented.

#### Slide #2



The CDR circuit is highlighted in the view-graph.

The front end of a SONET receiver typically uses an avalanche photodiode and a transimpedance amplifier to convert the optical signal into an electrical form. Because of variations in input optical power level, the signal is further processed by either a limiting amplifier or an Automatic Gain Control (AGC) amplifier.

The CDR is responsible for extracting Clock and Data from the input electrical signal. In addition, the CDR is responsible for the LOS output.

In many telecom regenerators, the LOS indicator is used to initiate an immediate rerouting of traffic to a redundant fiber. The LOS detector deserves careful attention due to its major impact on system reliability.

#### Slide #3



The goals for this design are motivated by the desire to eventually package the chip in low-cost plastic.

Although the initial product uses a stainless steel package with a ceramic hybrid, the design itself requires no external components other than a loop filter capacitor.

This desire for monolithic implementation requires us to use an internal ring-oscillator for our internal clock source.

To achieve low-cost we also require that the design be adjustment-free, both in manufacture and in operation.

Finally, we must meet all relevant SONET specifications for Jitter and LOS performance.

#### Slide #4



The on-chip ring oscillator uses interpolators to smoothly adjust the overall delay around the ring from 2 gate delays to 6 gate delays, giving approximately a 3:1 tuning range.

As shown in the inset diagram, each interpolator accepts two inputs, "x" and "y". The tuning input "c" is adjusted from 0 to 1, ideally producing an output "z" equal to (x\*c + y\*(1-c)). If the interpolator inputs are quasi-sinusoidal with phase shifts less than 120 degrees, then it can be shown that the output interpolates smoothly between the two input phases.
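
The ideal interpolation can be checked numerically. The following is a minimal sketch, assuming unit-amplitude sinusoidal inputs of equal frequency represented as phasors; the function name `interpolate` is illustrative and is not taken from the design.

```python
import cmath
import math


def interpolate(c, phase_x_deg, phase_y_deg):
    """Return (amplitude, phase in degrees) of z = c*x + (1 - c)*y.

    x and y are assumed to be unit-amplitude sinusoids of the same
    frequency, represented here as phasors.
    """
    x = cmath.exp(1j * math.radians(phase_x_deg))
    y = cmath.exp(1j * math.radians(phase_y_deg))
    z = c * x + (1.0 - c) * y
    return abs(z), math.degrees(cmath.phase(z))


if __name__ == "__main__":
    # Sweep the tuning input with the two inputs 90 degrees apart.
    for k in range(5):
        c = k / 4
        amp, phase = interpolate(c, 0.0, 90.0)
        print(f"c={c:.2f}  amplitude={amp:.3f}  phase={phase:.1f} deg")
```

In this idealized model the output phase moves monotonically from the phase of "y" to the phase of "x" as "c" goes from 0 to 1, while the output amplitude dips at mid-setting. With the inputs 120 degrees apart, the mid-setting amplitude falls to one half.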

This architecture consists of two cascaded interpolator sections to extend the tuning range while minimizing jitter.

The main tuning input is pre-filtered by two 100 MHz poles to reduce the VCO sensitivity to power supply noise. The wide-bandwidth bang-bang tuning input is 500 times less sensitive than the main tuning input, and is implemented by injecting small currents into the interpolation cell.

#### Slide #5



The characteristics of the on-chip ring oscillator have a large influence on the overall design. This graph shows that, in our process, we can expect +/- 30% variation in center frequency over power supply, temperature, and worst-case, nominal and best-case process variation.

The VCO is designed with a wide (3:1) tuning range so that, even under situations of wide process variation, the design frequency of 2.488 GHz will still be achievable.
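
As a rough worked check of this margin, the sketch below assumes that the nominal center frequency sits at 2.488 GHz and that the 3:1 tuning range is split symmetrically on a logarithmic scale around the center. Both are illustrative assumptions, not figures from the design.

```python
import math

TARGET_GHZ = 2.488     # SONET bit rate from the text
TUNING_RATIO = 3.0     # 3:1 tuning range from the text
VARIATION = 0.30       # +/- 30% center-frequency spread from the text


def tuning_limits(center_ghz, ratio=TUNING_RATIO):
    """Lowest and highest VCO frequency, assuming a log-symmetric range."""
    half = math.sqrt(ratio)
    return center_ghz / half, center_ghz * half


def target_reachable(center_ghz, target_ghz=TARGET_GHZ):
    """True if the target frequency lies inside the tuning range."""
    low, high = tuning_limits(center_ghz)
    return low <= target_ghz <= high


if __name__ == "__main__":
    corners = (("slow", 1 - VARIATION), ("nominal", 1.0), ("fast", 1 + VARIATION))
    for label, scale in corners:
        center = TARGET_GHZ * scale  # assumed: nominal center equals the target
        low, high = tuning_limits(center)
        print(f"{label:8s} {low:.2f}-{high:.2f} GHz  reachable={target_reachable(center)}")
```

Under these assumptions the target stays inside the tuning range at both the slow and fast corners, which is the purpose of the wide range.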

A fundamental pitfall of recovering clock from data streams is harmonic lock. For instance, it is impossible to tell the difference between the case where the VCO is running at twice the proper speed, and the case where all the bits happen to be sent in matched pairs.
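
The ambiguity is easy to demonstrate. In this small sketch (helper names are illustrative), a bit pattern sent at the proper rate and the same pattern with every bit doubled and sent at twice the rate produce identical sampled waveforms.

```python
def waveform(bits, samples_per_bit):
    """Expand a bit sequence into a sampled NRZ waveform."""
    return [b for b in bits for _ in range(samples_per_bit)]


def double_each_bit(bits):
    """Repeat every bit so that all bits arrive in matched pairs."""
    return [b for b in bits for _ in range(2)]


data = [1, 0, 0, 1, 1, 0, 1]
at_proper_rate = waveform(data, samples_per_bit=4)
at_double_rate = waveform(double_each_bit(data), samples_per_bit=2)

# The two waveforms are sample-for-sample identical, so no detector that
# looks only at the waveform can tell the two cases apart.
print(at_proper_rate == at_double_rate)
```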

To avoid this harmonic lock problem, the chip incorporates a "lock-to-reference" loop. This allows the VCO to be trained to a precise frequency reference under start-up and loss-of-signal conditions.

#### Slide #6



The PLL uses a bang-bang phase detector and positive-feedback charge pump similar to those described in Reference 3.

The phase-locked-loop portion of the circuit includes two different phase/frequency detector blocks. The first phase/frequency detector (FDET) initially trains the VCO to the desired bit frequency using an external low-frequency reference, and asserts the frequency-lock (FLOCK) signal when successful.

The second phase detector (PDET) produces three signals: data transition detect (DTRANS), data lock detect (DLOCK), and a tri-state bang-bang phase-error signal. Once the VCO is frequency locked to the reference, and data transitions are present, the PLL is switched to the bang-bang phase detector.

If the data lock detector stabilizes within a certain time, then the loop remains locked to the data, otherwise the VCO is retrained to the reference clock and the sequence repeats.

In this block diagram, cycle-slippage occurs whenever an LOS indication causes the VCO to be retrained to the reference clock. For overall system reliability it is important for the LOS detector to be *Reliable*.

#### Slide #7



Three of the common ways in which an LOS circuit can be fooled are shown in this slide.

Looking first at the bottom figure, we see a phasor diagram for a normal data eye with the proper transition phase denoted by a cross at the top of the phase circle. The corresponding eye diagram is shown in the second column.

The first problem is shown in the top example, which is a digital signal of random phase generated by thermal noise passed through a limiting amplifier.

The second signal has low jitter at the proper transition location, but also has transitions at the middle of the eye. This type of signal is common when the fiber is cut, but there is a feedback signal from the VCO clock back to the sensitive transimpedance amp.

The third case is that of a cut fiber with either low amplifier gain, or with an offset voltage at the input of the amplifier. There is a complete lack of transitions at the CDR.

The LOS circuit must never indicate the locked condition for any of these three cases, and must not indicate out-of-lock with valid data of acceptable BER.

#### Slide #8



The primary function of the LOS circuit is to signal an out-of-lock indication when the BER degrades to an unacceptable level. Unfortunately, there is no practical way for the CDR circuit to directly measure BER without parsing the SONET payload.

The alternative is to look at the received phase jitter on the incoming signal. For a given transmission system, there will be a monotonic relationship between phase jitter and BER.

To be *Reliable*, we define a BER trigger threshold (e.g.,  $10^{-3}$ ) and require the Mean Time to Restarting (MTTR) to be less than one second at this threshold. When the BER has improved by 2 orders of magnitude (e.g.,  $10^{-5}$ ), we require that the MTTR be greater than one year, that is, less than one trigger event per year.

Statistical processing of the raw phase jitter is required to achieve this "snap-action" performance.

#### Slide #9



The LOS algorithm uses information from a data lock detector which operates by monitoring the location of the data zero-crossings. Transitions occurring more than +/- 135 degrees away from the nominal location are flagged as raw Phase-Errors (PE).

The actual detector is composed of two flip-flops, each clocked by the incoming data. The D-inputs of the flip-flops are connected to timing signals. If a data transition causes a "high" to be sampled on both flip-flops, then the data transition is more than +/- 135 degrees away from the proper phase-aligned location.
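
A behavioral sketch of this detector is given below. It assumes that the two timing signals are 50% duty-cycle waveforms offset by 90 degrees from each other, placed so that both are high only in the window more than 135 degrees from nominal. The offsets and function names are assumptions for illustration; the text does not give the actual timing signals.

```python
def timing_signal(phase_deg, start_deg):
    """50% duty-cycle timing signal: high for 180 degrees starting at start_deg."""
    return (phase_deg - start_deg) % 360.0 < 180.0


def raw_phase_error(transition_phase_deg):
    """True when a data transition lies more than 135 degrees from nominal.

    transition_phase_deg is measured from the nominal transition location.
    The offsets of 45 and 135 degrees are assumed values, chosen so that
    both signals are high only between 135 and 225 degrees.
    """
    ff1 = timing_signal(transition_phase_deg, 45.0)    # sample held by flip-flop 1
    ff2 = timing_signal(transition_phase_deg, 135.0)   # sample held by flip-flop 2
    return ff1 and ff2


if __name__ == "__main__":
    for phase in (0, 90, 130, 140, 180, -140, -130):
        print(f"{phase:5d} deg -> phase error: {raw_phase_error(phase)}")
```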





#### Slide #10

The phase detector block is built around a bang-bang phase detector core. Two matched flip-flops sample the incoming data on both the rising and falling edges of the high-speed clock.

Under locked conditions, the latch triggered on the falling edge of the clock samples the *transitions* of the data (labelled "T" above). The latch triggered on the rising edge of the clock samples the *middle* of each bit cell. The mid-bit sample prior to the "T" sample is labelled "A", and the one after the transition is labelled "B".

If "A" is different than "B", then we have detected a transition. This information will be used later to detect if the data is stuck at all ones or all zeros.

The truth table shown above allows a determination to be made whether the VCO phase is early or late, and also allows detection of an ERROR state which can only occur either when the VCO is an octave slower than the data, or when VCO energy is coupled into the transimpedance amplifier.
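
The decoding of the "A", "T" and "B" samples can be sketched as follows. The slide's truth table is not reproduced in the text, so the early/late polarity used here follows the common convention for this kind of detector and should be read as an assumption, as should the function names.

```python
def transition_detected(a, b):
    """A data transition occurred between the two mid-bit samples."""
    return a != b


def decode_phase(a, t, b):
    """Classify one (A, T, B) sample triple.

    Assumed polarity: if the transition sample T still equals A, the clock
    edge came before the data transition (EARLY); if T already equals B,
    the clock edge came after it (LATE).
    """
    if transition_detected(a, b):
        return "EARLY" if t == a else "LATE"
    # No transition between A and B: T should agree with both.
    return "HOLD" if t == a else "ERROR"


if __name__ == "__main__":
    for a in (0, 1):
        for t in (0, 1):
            for b in (0, 1):
                print(a, t, b, decode_phase(a, t, b))
```

The "HOLD" result corresponds to the tri-state condition of the phase-error output, in which no correction is applied.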

#### Slide #11



This slide gives a simplified explanation of the debouncing strategy used for LOS detection. The full circuit is shown two slides later.

*Raw phase error pulses* from the data lock detector are combined with *VCO feedback error pulses* from the phase detector. The composite signal then needs to be statistically qualified for reliable detection of LOS.

The algorithm uses two pulse stretcher circuits. The first circuit stretches incoming pulses by N bit times. If the output of this pulse stretcher is high, it implies that there has been *one or more* errors within the last N bits.

The second pulse stretcher operates on the inverted output of the first, and produces an inverted output that goes high only when the output of the first pulse stretcher has been high *without gaps* for M\*N bit times.

As explained in a later slide, this algorithm produces a reliable indication of phase-errors that can be adjusted by M and N to correspond to any desired BER trigger point.
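
The two-stage qualification can be modelled bit by bit. The sketch below is a behavioral model only, with illustrative names; it is not the circuit implementation, which works at a much coarser time resolution.

```python
def qualify(errors, n, m):
    """Return the qualified error indication for each bit time.

    errors: sequence of 0/1 raw error pulses, one entry per bit time.
    Stage 1 is high if an error occurred within the last n bits.
    Stage 2 is high once stage 1 has been high without gaps for m*n bits.
    """
    out = []
    since_error = n   # bit times since the last error (>= n means stage 1 is low)
    high_run = 0      # consecutive bit times for which stage 1 has been high
    for e in errors:
        since_error = 0 if e else since_error + 1
        stage1 = since_error < n
        high_run = high_run + 1 if stage1 else 0
        out.append(high_run >= m * n)
    return out


if __name__ == "__main__":
    n, m = 4, 3
    dense = [1, 0, 0, 0] * 3          # one error in every group of n bits
    isolated = [1] + [0] * 30         # a single error
    print(any(qualify(dense, n, m)), any(qualify(isolated, n, m)))
```

An isolated error raises the first stage for only N bit times and never reaches the second-stage threshold, whereas a sustained error rate of at least one error per N bits does.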





#### Slide #12

A pair of set/reset (S/R) flip-flops efficiently implements a pulse-stretcher function without capacitors. The high-speed (400 ps) error pulses *set* both latches, and the latches are alternately *reset* by low-speed clocks. Because the reset pulses are non-overlapping, there is always at least one of the flip-flops available to be set.

The outputs of the two S/R flip-flops are combined with an OR gate to produce a pulse-stretched version of the input signal, slowed down to the timing of the low-speed clock.

This circuit efficiently slows down the high-speed 2.5 GHz error pulses to ~100 kHz rates, while performing the needed statistical processing.
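
The behavior of the latch pair can be sketched in discrete time. This model assumes that a latch ignores set pulses while its own reset is active and that the two resets are spaced half a slow-clock cycle apart; these details, and all names, are assumptions for illustration.

```python
def sr_pulse_stretcher(errors, period):
    """Model two S/R latches with alternating, non-overlapping resets.

    errors: sequence of 0/1 error pulses, one entry per time step.
    period: spacing in time steps between the reset of latch A and the
            following reset of latch B (and vice versa).
    Returns the OR of the two latch outputs at each time step.
    """
    q_a = q_b = False
    out = []
    for t, err in enumerate(errors):
        reset_a = t % (2 * period) == 0
        reset_b = t % (2 * period) == period
        # A latch under reset is cleared; the other one can still be set.
        q_a = False if reset_a else (q_a or bool(err))
        q_b = False if reset_b else (q_b or bool(err))
        out.append(q_a or q_b)
    return out


if __name__ == "__main__":
    pulses = [0] * 16
    pulses[1] = 1
    print(sr_pulse_stretcher(pulses, period=4))
```

In this model a single error pulse holds the output high until both latches have been reset, and an error that coincides with one latch's reset is still captured by the other latch.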

#### Slide #13



This slide describes the complete LOS state machine. The two pulse stretchers in the DLOCK chain perform the same function as in Slide #11.

The raw data lock (DLOCK) detector operates by observing phase errors. In the absence of transitions, the DLOCK indication is therefore unusable. An extra pulse stretcher has been added that monitors the transition detect information (DTRAN). The DTRAN information is processed with a pulse stretcher of 1/2 the time constant of the DLOCK pulse stretcher. This ensures that loss of signal is reliably indicated even if the input data is stuck at a one or zero.

A fourth pulse stretcher operates on the raw frequency lock errors to provide a debounced indication that the loop has properly locked to the frequency reference during startup.

The logic block implements a two-state state machine. The state machine powers up in state 0, causing the loop to lock to the reference input. Once the loop is locked, the state machine switches to state 1, where the data-driven phase detector remains in control until either transitions disappear, or the phase error rate exceeds the design threshold.
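
A minimal sketch of the two-state behavior follows. The inputs are taken to be the debounced outputs of the pulse stretchers; the flag names are illustrative, and the time allowance for the data lock detector to stabilize after switching to the data is omitted for brevity.

```python
REFERENCE, DATA = 0, 1   # state 0: lock to reference, state 1: lock to data


def next_state(state, freq_locked, transitions_present, data_locked):
    """Advance the LOS state machine by one evaluation step."""
    if state == REFERENCE:
        # Hand over to the data-driven phase detector once the VCO is trained
        # to the reference and data transitions are present.
        return DATA if (freq_locked and transitions_present) else REFERENCE
    # In the data state, fall back if transitions vanish or data lock is lost.
    return DATA if (transitions_present and data_locked) else REFERENCE


def run(events, state=REFERENCE):
    """Apply a sequence of (freq_locked, transitions_present, data_locked)."""
    history = []
    for freq_locked, transitions_present, data_locked in events:
        state = next_state(state, freq_locked, transitions_present, data_locked)
        history.append(state)
    return history


if __name__ == "__main__":
    events = [
        (False, True, False),  # still training to the reference
        (True, True, False),   # frequency locked: switch to data
        (True, True, True),    # data lock established
        (True, False, True),   # transitions disappear: retrain
        (True, True, True),    # switch back to data
    ]
    print(run(events))
```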





#### Slide #14

The LOS/BER relationship is steepened by processing PE events in multi-bit "bins", requiring multiple consecutive bins to each contain at least one error before asserting LOS. The current design groups PE events into bins of size N, requiring M consecutive errored bins before asserting LOS. Assuming that the phase error rate (PER) is approximately equal to half the BER, then

$$MTTR \approx \frac{N \cdot t_{bit}}{\left(1 - \left(1 - \frac{BER}{2}\right)^{N}\right)^{M}}.$$

The exponent M in the denominator sets the slope relation between PE and MTTR. With M=7, only about 1 decade change in BER is needed to change MTTR from 1 sec to 1 year. Four different LOS thresholds between approximately  $10^{-4}$  and  $10^{-6}$  BER are selectable by bond options. The exact BER varies according to the application due to the system-dependent relationship between PER and BER.

If the link were simply restarted on isolated PE events, there would be a 1:1 relationship between BER and MTTR. To achieve an MTTR greater than 1 year would then require a BER of  $10^{-16}$!
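
The MTTR expression can be evaluated directly. The sketch below uses M=7 and N= $2^{16}$ with a 2.488 Gb/s bit rate; the function name and the use of `log1p`/`expm1` to limit round-off at very small error rates are choices made for this example.

```python
import math

BIT_RATE = 2.488e9                       # bits per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mttr_seconds(ber, n=2**16, m=7, bit_rate=BIT_RATE):
    """Approximate MTTR = N*t_bit / (1 - (1 - BER/2)**N)**M."""
    per = ber / 2.0
    # Probability that an N-bit bin contains at least one phase error,
    # computed as 1 - (1 - per)**n without loss of precision for tiny per.
    p_bin_errored = -math.expm1(n * math.log1p(-per))
    return (n / bit_rate) / p_bin_errored ** m


if __name__ == "__main__":
    for ber in (1e-4, 1e-5, 1e-6, 1e-7):
        t = mttr_seconds(ber)
        print(f"BER={ber:.0e}  MTTR={t:.3g} s  ({t / SECONDS_PER_YEAR:.3g} years)")
```

Setting `n=1, m=1` reproduces the isolated-event case, in which MTTR is simply the bit time divided by the phase error rate.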

#### Slide #15



This slide shows the measured versus calculated MTTR for a system in which M=7, and N= $2^{16}$ . The mismatch between measurement and theory is probably due to the BER not being 1:1 correlated with the Phase Error Rate (PER). The exact relationship between BER and PER is system dependent.





#### Slide #16

The two-tuning-input VCO architecture, in conjunction with a binary-quantized phase detector, results in a VCO drive voltage equivalent to a first-order sigma-delta conversion of the loop frequency error. The ability of the loop to track incoming phase jitter is a slew-rate-limited process, with an effective jitter bandwidth inversely proportional to jitter amplitude. In practice, this is ideal behavior for the input of a SONET regenerator, where a wide jitter-tracking bandwidth minimizes sampling errors, and where the overall system jitter transfer function will be set by a separate narrow-band transmitter PLL. For this design, the bang-bang amplitude and charge pump time constant have been set to meet the SONET jitter tolerance specification. The resulting jitter generation, as calculated from the jitter spectrum, is 0.0049 UI RMS.
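
The slew-rate-limited tracking can be illustrated with a deliberately simplified model. The sketch assumes a first-order loop in which the recovered phase moves by a fixed step each bit in the direction of the phase error; the charge-pump path is ignored, and the step size and jitter values are arbitrary illustrative numbers, not design values.

```python
import math


def max_tracking_error(jitter_amp_ui, jitter_freq, step_ui, n_bits=20000):
    """Worst-case phase error (UI) over the second half of the run.

    jitter_amp_ui: amplitude of sinusoidal input jitter, in unit intervals.
    jitter_freq:   jitter frequency in cycles per bit.
    step_ui:       bang-bang phase step per bit, in unit intervals.
    """
    phase = 0.0
    worst = 0.0
    for k in range(n_bits):
        target = jitter_amp_ui * math.sin(2.0 * math.pi * jitter_freq * k)
        # Binary-quantized correction: only the sign of the error is used.
        phase += step_ui if target > phase else -step_ui
        if k > n_bits // 2:
            worst = max(worst, abs(target - phase))
    return worst


if __name__ == "__main__":
    step = 0.001
    for amp in (0.25, 0.5):
        err = max_tracking_error(amp, 5e-4, step)
        print(f"amplitude={amp} UI  worst error={err:.4f} UI")
```

In this model the loop follows the jitter as long as the peak jitter slope, 2π times frequency times amplitude, stays below the step per bit. At the same jitter frequency, the smaller amplitude is tracked to within about one step while the larger amplitude is slew-limited.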

#### Slide #17



This slide shows the time-domain PLL operation in the presence of noisy data. The top trace is an input  $2^{23}$ -1 PRBS data signal with broadband noise added to achieve  $10^{-4}$  BER. The middle trace is the recovered data eye, and the bottom trace is the recovered clock signal. All signals are triggered off of the transmitter BERT clock.

#### Slide #18



This is a micrograph of the IC mounted on a fine-line hybrid and packaged in a 68-pin stainless-steel package. The large surface-mount component in the lower left corner is a 9.71875 MHz crystal-controlled reference oscillator.

#### Slide #19

| Parameter                  | Value                            | Units           |
|----------------------------|----------------------------------|-----------------|
| Guaranteed Frequency Range | 2-3                              | Gb/s            |
| Supply voltage             | 4.5-5.5                          | V               |
| Supply current (nominal)   | 340                              | mA              |
| Power dissipation          | 1.77                             | W               |
| Case temperature range     | 0-60                             | °C              |
| Die size (gate array)      | 3.45x3.45                        | mm <sup>2</sup> |
| Number of active devices   | 3606                             |                 |
| IC Process                 | 25 GHz f <sub>T</sub> Si-Bipolar |                 |
| Jitter Generation          | 0.0049                           | UI (rms)        |
| Jitter Tolerance           | meets SONET Spec                 |                 |

A commercial IC performing clock and data recovery for SONET 2.488Gb/s Transmission and Switching systems has been described. Previous commercial solutions have required multiple chips and GaAs processes to perform this function [1]. The 25GHz f<sub>T</sub> Si-Bipolar [2] chip reported here operates from 2 to 3 Gb/s over worst-case process, temperature and voltage variations, dissipating 1.77 Watts from a 5 V +/-10% supply and requiring a single off-chip filter capacitor. For network monitoring, a *reliable* Loss-of-Signal Detector is provided which operates on Phase-Error-Events, with a trigger threshold programmable between  $10^{-4}$  and  $10^{-6}$  BER.

#### Slide #20



This is the die photo of the chip. The layout was done using a gate-array methodology with fully-differential ECL cells and 3606 active devices (less than 1/2 of the array capacity).

### Acknowledgments

The authors thank Soo-Young Chai, Jean Norman, Teik Goh and Lewis Dove for their valuable help in the development of this product.

### References

- [1] H. Ransijn and P. O'Connor, "A PLL-Based 2.5-Gb/s GaAs Clock and Data Regenerator IC," IEEE Journal of Solid-State Circuits, vol. 26, no. 10, pp. 1345-1353, October 1991.
- [2] W. M. Huang *et al.*, "A high-speed bipolar technology featuring self-aligned single-poly base and submicrometer emitter contacts," IEEE Electron Device Letters, vol. 11, no. 9, pp. 412-414, September 1990.
- [3] B. Lai and R. C. Walker, "A Monolithic 622Mb/s Clock Extraction Data Retiming Circuit," 1991 ISSCC Digest, pp. 144-145.