# **Two Methods for Linearity Improvement in Digitally Controlled Delay Elements: Current Starved Type**

Mahmoud Ghasemi<sup>1</sup>, Ali Nehbandan Dokht<sup>2</sup>, Abbas Golmakani<sup>3</sup>

1- Department of Electrical, Sadjad University of Technology, Mashhad, Iran Email: ghasemimd@yahoo.com

2- Department of Electrical, Sadjad University of Technology, Mashhad, Iran Email: kapula\_30@yahoo.com

3- Department of Electrical, Sadjad University of Technology, Mashhad, Iran Email: abbas.golmakani@yahoo.com

Received: May 2015

Revised: October 2016

Accepted: November 2016

# ABSTRACT

Current starved delay elements (CSDEs) are among the popular architectures for manipulating the rising or falling edges of signals in order to meet timing requirements. The digitally controllable generations of these topologies are now monotonic and reasonably power efficient, but they lack linearity over the full range. In itself this may not seem problematic, because the desired delay can be obtained by setting the dimensions of the design elements. However, when a chain of incremental delays is required, more linear designs are preferable. In this paper, two improvements in linearity are examined for two known CS designs. Both topologies are implemented in 0.18 µm technology and meet appropriate design parameters such as power, area and monotonic response.

KEYWORDS: Current Starved, Delay Element, Linearity, Monotonic, Power, Process, Temperature.

#### **1. INTRODUCTION**

Delay elements (DEs) are circuits that add an estimated amount of delay to the input signal. The input is mostly a clock signal or pulse, and the resulting delayed pulse is applied in ICs for timing purposes. The wide range of DE applications includes pulse generators, switched capacitor circuits, memory units and microprocessors [1], delay locked loops (DLLs) [1], [2] and their digital type (DDLLs) [3], phase locked loops (PLLs) [1-3] and their all-digital version (ADPLLs) [3], and digitally controlled oscillators (DCOs) [1], [3], [4]. Glitch blocking circuits, which trigger the gates of glitchy transistors at the proper time, benefit from using delay elements [5]. Other applications include duty cycle converters (DCCs) [6], pulse width control loops [6], [7], time-to-digital converters (TDCs) [8], DRAM interface units [9], deskew circuits [10], clock buffers [11], and spread spectrum clock generators [12].

A variety of architectures for DEs have been reported. Examples include the transmission gate, the transmission gate cascaded with a Schmitt trigger, cascaded inverters and thyristor-based delay elements. Elements based on the current-starved principle are more popular because of their comparatively low power consumption and simple design; however, they are generally sensitive to variations in supply voltage and temperature [13]. This paper is organized in five sections. Section 2 briefly introduces the basic operation of a current starved design and circuit 1. Section 3 introduces circuit 2 as reported in the literature, together with the improvement applied to it. Section 4 presents a simple high resolution method for creating a chain of easily estimable, linear incremental delays. Section 5 concludes the paper.

#### 2. A SHORT REVIEW OF CSDE FUNCTION

Two well-known topologies for current starved circuits are the resistor-based and capacitor-based architectures. In the first one, the (dis)charging current is controlled by changing the resistance of the current path. In the second one, the change in capacitance controls the rate of (dis)charging and hence the delay [3].

The basic concept for both architectures is presented by relation (1):

$$t_{d} = C_{L} \frac{V}{I} \tag{1}$$

where $t_d$ is the delay, $C_L$ is the sum of the parasitic and external capacitance at the output of the inverter, $V$ is the inverter output voltage, and $I$ is the charging current.
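As a minimal numerical sketch of relation (1), the snippet below evaluates the delay for a few charging currents. The load capacitance, voltage and current values are hypothetical and chosen only for illustration; they are not taken from any of the circuits discussed here.

```python
def delay_seconds(c_load_farads: float, v_volts: float, i_charge_amps: float) -> float:
    """Relation (1): t_d = C_L * V / I."""
    if i_charge_amps <= 0:
        raise ValueError("charging current must be positive")
    return c_load_farads * v_volts / i_charge_amps


# Hypothetical example values (assumptions, not from the paper).
C_L = 50e-15   # 50 fF total load capacitance
V = 0.9        # inverter output voltage in volts

for current in (100e-6, 50e-6, 25e-6):
    t_d = delay_seconds(C_L, V, current)
    print(f"I = {current * 1e6:.0f} uA -> t_d = {t_d * 1e12:.0f} ps")
```

Under this simple model, halving the (dis)charging current doubles the delay, which is the mechanism that a current starved element exploits.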

Fig. 1 shows the basic resistor-based schematic of a digitally controlled delay element (DCDE). A bank of stacked transistors is placed at the source of the nMOS or pMOS of an input inverter. The load capacitor at the output of the input inverter is alternately charged and discharged through the inverter. The schematic in Fig. 1 indicates that the discharging current is controllable by a set of n MOS transistors arrayed at the source of M2. If the output of this first inverter is fed to the input of another inverter, the final output will exhibit a controllable delay on the rising edge of the input signal. As reported by many, for example in [1], programming the bank of transistors for a monotonic response is an issue since extra coding would be needed.



Fig. 1. The resistor-based DCDE

In the capacitor-based design, depicted schematically in Fig. 2, a number of parallel transistors of different sizes arrayed at the source of the input inverter play the role of a variable capacitor. The change in capacitance is obtained through the possible on/off conditions produced by the input vector. However, as reported in [1], charge sharing between the load capacitor at the output of the first inverter and this array spoils the monotonic response or makes accurate programming difficult.

A current mirror monotonic CSDE is reported in [1]. It is digitally programmable by a set of transistors of different widths (M1=4/4, M2=2/4, M3=1/4, M4=0.5/4)  $\mu$ m, which change the current in the main branch of the current mirror. Fig. 3 depicts the reported architecture. The current in M6 is mirrored to M7 and causes a change in the discharging current running through M8, hence changing the delay.



Fig. 2. A capacitor-based DCDE [1]

The size of M7 must be smaller than that of M8 to give the design its starved nature [1]. The circuit, implemented in  $0.18\mu$ m technology, exhibits low sensitivity to temperature. In spite of the low power characteristic of CSDEs, the continuous current path through M5 and the static current allowed by the output inverter motivated the two innovations introduced in [3].



**Fig. 3**. (Circuit 1) The current mirror monotonic DCDE reported in [1]

Following equation (1), the drain current of M7 is:

$$i_{l} = -C_{L} \frac{\mathrm{d}v_{o1}}{\mathrm{d}t} = \frac{k_{n} W_{7}}{2 L_{7}} \left(V_{g} - V_{th7}\right)^{2} (1 + \lambda_{7} V_{DS7}) \tag{2}$$
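Reading relation (2) as equating the capacitor discharge current with the square-law saturation current of M7, the dependence on gate voltage can be sketched as follows. The function names and all parameter values are hypothetical illustrations; the simple square-law model with channel-length modulation is an assumption and not a substitute for the simulator models used in the paper.

```python
def saturation_current(k_n, w, l, v_g, v_th, lam, v_ds):
    """Square-law drain current with channel-length modulation.

    Corresponds to the transistor side of relation (2):
    (k_n * W / (2 * L)) * (V_g - V_th)^2 * (1 + lambda * V_DS).
    """
    v_ov = v_g - v_th
    if v_ov <= 0:
        return 0.0  # transistor treated as off in this simplified model
    return 0.5 * k_n * (w / l) * v_ov ** 2 * (1.0 + lam * v_ds)


def output_slew(i_drain, c_load):
    """Capacitor side of relation (2): dv_o1/dt = -i / C_L (volts per second)."""
    return -i_drain / c_load


# Hypothetical device and bias values (assumptions for illustration only).
i_d = saturation_current(k_n=300e-6, w=1e-6, l=4e-6,
                         v_g=1.0, v_th=0.5, lam=0.05, v_ds=0.9)
print(f"drain current: {i_d * 1e6:.2f} uA")
print(f"output slew:   {output_slew(i_d, 50e-15) / 1e9:.3f} V/ns")
```

A lower gate voltage on M7 gives a smaller current and therefore a slower discharge of the load, i.e. a longer delay.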

The interested reader may refer to references [1], [3] and [4] for detailed mathematical treatments. Simulation of the circuit in Fig. 3, referred to as circuit 1 in this paper, shows that the delay values increase as the temperature rises.



Fig. 4. Delay versus input vector at 27°C and 77°C for circuit 1

Fig. 4 shows the trend of delays versus input vector for two conditions; one at 27°C and another for 77°C. The minimum increase in delay is 30.5ps at the input vector abcd = 0000 and the maximum is 72.7ps at abcd = 1111.

In addition, the simulation of circuit 1 for process corners is illustrated in Fig. 5. Interestingly, the delay values for the FS and SS corners are nearly the same, with only about 7ps difference between the two characteristics. The increased delay values for the SF and FF corners are almost the same as well.



Fig. 5. Delay versus input vectors for process corners in circuit 1

#### 3. THE REPORTED POWER EFFICIENT TOPOLOGY

This design, as reported in [3], introduces two concomitant improvements in order to reduce static power dissipation to almost zero. First, the free path for current through the main branch of the mirror structure is limited to only one half of each period. This is achieved by applying direct feedback from the output to the gate of M5. Because the half-period current is fully controlled dynamically, a dynamic current mirror is created. As expressed in [3], this technique can provide "current-on-demand" operation. Fig. 6 depicts the design. Second, as seen in this figure, the input to the first inverter is also fed to another inverter inserted between the two main inverters. This additional inverter is labeled INV in Fig. 6. The signal through this inserted inverter receives a small adjustable delay, and when it is applied to the nMOS of the last inverter, M12, it almost completely removes the direct path between M11 and M12: the gate voltage of M12 lags that of M11, so during the transition period M11 and M12 are not on at the same time.

#### Vol. 11, No. 1, March 2017

#### 3.1. Simulation test

The circuit shown in Fig. 6 was simulated using Cadence software. Across a large number of simulation cases it exhibited essentially zero static power dissipation. However, the simulation results reveal one slight impairment. Fig. 7 shows the input-output waveforms and the voltage pattern at node  $V_m$  obtained through simulation.



Fig. 6. (Circuit 2) Power efficient DCDE as reported in [3].

We can see a noticeable discrepancy between the first half cycle of the output and the subsequent cycles. An extra voltage at node  $V_m$  at the beginning of the first cycle pushes M7 and M9 faster and deeper into the saturation region. Owing to channel-length modulation, the current through the two transistors rises rapidly and, because of the mirror action, M10 passes more current, so the discharging current flowing through M8 becomes larger. As a result, the delay values at the start of the on-state are much smaller than the desired values obtained in the subsequent cycles. From the beginning of the second cycle on, the discrepancy disappears because the dynamic current mirror now actively regulates both cycles.



Fig. 7. The pattern of  $V_m$  for the two ends of the delay range, and the delay waveform of circuit 2

#### 3.2. A proposed improvement method

One way to remove this discrepancy is to create a proper dynamic path between node  $V_m$  and ground that bypasses the extra current in the main branch of the mirror structure during the first half-cycle and provides a balancing factor for the following cycles. Such a path is not needed for the topology in [1], because transistor M5 is on for both half-cycles and the current mirror is regulated from the very beginning of the on-state; hence the first half-cycle is a true copy of the second half-cycle, and so on.



Fig. 8. The improved DCDE (circuit 3)

A dynamically appropriate path for this purpose can be created with pMOS transistors, as illustrated in Fig. 8. Here transistors M13 and M14 do the job. They have different sizes so that one of them keeps the circuit in the linear region for higher input codes. An interesting property of this method is that once the sizes of M13 and M14 are optimized toward zero error in the delay difference between the first two cycles, the cycle-to-cycle match remains robust over a wide range of changes in the sizes of the other transistors.

#### 3.2.1. Cycle discrepancy improvement test

Fig. 9 indicates that the difference between the first of the 16 graphs in the second cycle and the same graph in the first cycle is only about 2ps, an improvement of about 95%. However, toward the end of the delay spectrum, this difference rises to about 10ps. As seen, the pattern of  $V_m$  is consistent for both the first and second cycles.



Fig. 9. The pattern of  $V_m$  for the two ends of the delay range and the delay waveform in the improved circuit (circuit 3)

#### 3.2.2. Linearity improvement

The second advantage of this method is enhanced linearity. It is clear from the pattern of the delay spectrum in Fig. 10 that the spacing between the sixteen graphs follows a more linear trend than that observable in the second cycle in Fig. 7. The comparison is more easily made when the waveform in Fig. 10 and the plot in Fig. 11 are considered together. One obvious consequence of this method is the reduction in the length of the delay spectrum. This can be regarded as its third feature, since the resolution is enhanced.



Fig. 10. Output waveform of the improved design (circuit 3) at 450MHz


Fig. 11 gives the delay-versus-input characteristics for both circuit 2 and circuit 3. The output is linear to a large extent for circuit 3.



Fig. 11. Delay vs. input vector for both the improved design (circuit 3) and circuit 2.

#### 3.2.3. Temperature variation effect

Moreover, both designs were examined by simulation for sensitivity to temperature variation. Here, circuit 2 proved to be noticeably less sensitive to temperature.



Fig. 12. The change in delay vs. input vector for both the main and improved designs

Fig. 12 depicts the situation for both circuits over a temperature range of 50°C. While the maximum increase in the spacing between the graphs (delay steps) is almost 12ps for circuit 2, the general increase in delay steps for the improved design (circuit 3) is about 60ps. This is definitely a drawback of the introduced improvement method; however, the method preserves the linear trend, which can be regarded as a positive feature.

#### 3.2.4. Process corners effect

The results in this part are less promising because of the large changes observed across the different corners. This is more of an issue for the improved circuit. While Fig. 13 does not indicate a desirable state for circuit 2, the condition is even worse for the improved one (circuit 3) in Fig. 14. It should be noted that in all process-corner figures in this paper, "stat" denotes "TT". For circuit 3, the delays are disordered, from the SF corner, where the delay is so high that it exceeds the half on-cycle and spoils the response for the whole input range, to the SS corner, which unexpectedly does not follow the linear pattern of delay increments seen in the normal state. The situation is further complicated because waveform examination shows that the discrepancy between the first cycle and the subsequent ones reappears. Fig. 14 also shows that for SF and FF, circuit 3 shows better characteristics.



Fig. 13. The effect of process corners on the delay in circuit 2

Circuit 3 was examined once more through simulation at 400MHz, which is 50 MHz lower than the former case. The results, presented in Fig. 15, suggest that it performs better at lower frequencies, regarding process variations.



Fig. 14. The effect of process corners on the delay of the main circuit at 450MHz



**Fig. 15**. Delay vs. input vector due to the effect of process corners on circuit 3 at 400MHz

#### 3.2.5. Power

Circuit 3 consumes about  $31\mu$ W of static power, which is a drawback compared with circuit 2.

The improved design introduced in Section 3 significantly improves cycle resemblance and linearity. However, because of its unsatisfactory robustness against process variation, another topology is proposed in the next section. This topology, introduced in Section 4, could be a solution for creating an almost precisely linear response in any specified range and set of delays. In this paper it is designated as circuit 4.

#### 4. A CERTAIN STEP TOWARD FULL LINEARITY

The second method offered for discussion basically follows the topology reported in [1], with one essential difference. The idea is to employ a set of 15 transistors of different sizes, instead of 4, to produce a separate, properly chosen current value for every single state, in each of which only one transistor is on. Fig. 16 shows the proposed change to the original CSDE in [1].

The current flowing through M21 is fully determined so that the exact desired increment to the next delay is added. This is achieved by adjusting the dimension of each transistor individually for every separate single delay.

#### 4.1. Implementation Process

This part includes several practical steps leading to the ideal implementation of the design, as follows:



Fig. 16. The highly linear improved CSDE

#### 4.1.1. Transistors M1...M15 dimensions

Since the design is simulated for a 450MHz input (T=2.22ns), the maximum accessible delay range is somewhat less than half of the period, i.e. less than 1.11ns. In this example, the highest delay was selected to be 709.8ps. Transistor M16, with an aspect ratio of  $(0.41/11) \mu m$, produces this delay while the set of 15 transistors is entirely off (abcd = 0000). In this design, the input vectors (abcd) are binary values applied to the input of a decoder. To specify the low end of the range, the dimensions of M1 are then set. For the proposed circuit, for example, an aspect ratio of  $(16/4) \mu m$  produces a 151.8ps delay. A simple calculation then gives:

$$709.8 - 151.8 = 558 \text{ (ps)} \tag{3}$$

$$558 \div 15 = 37.2 \text{ (ps)} \tag{4}$$

Consequently, there are 15 increasing steps of 37.2ps from the left end to the right end of the present delay range. By choosing the frequency and proper aspect ratios for M16 and M1, the desired delay ranges and steps are acquired. The only issue is that all of this has been accomplished through simulation. A theoretically formulated guideline would require considerably more work and deeper insight.
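The arithmetic in relations (3) and (4) can be reproduced with a short sketch that also lists the 16 target delays implied by a uniform step. The variable names are illustrative, and which input code maps to which target delay (beyond abcd = 0000 giving the 709.8ps maximum) is not assumed here.

```python
F_IN_HZ = 450e6      # input frequency used in the simulation
T_MAX_PS = 709.8     # delay with M16 alone (all 15 transistors off)
T_MIN_PS = 151.8     # delay produced with M1 sized at 16/4
N_STEPS = 15         # number of equal increments between the two ends

period_ps = 1e12 / F_IN_HZ
half_period_ps = period_ps / 2

# The whole delay range must fit inside half of the input period.
if T_MAX_PS >= half_period_ps:
    raise ValueError("maximum delay exceeds half of the input period")

step_ps = (T_MAX_PS - T_MIN_PS) / N_STEPS
targets_ps = [round(T_MIN_PS + k * step_ps, 1) for k in range(N_STEPS + 1)]

print(f"period = {period_ps / 1000:.2f} ns, half period = {half_period_ps / 1000:.2f} ns")
print(f"step   = {step_ps:.1f} ps")
print("target delays (ps):", targets_ps)
```

Each of the transistors M1 to M15 would then be sized in simulation until its state hits the corresponding target delay.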

### 4.1.2. Input decoding

A 4-to-16 decoder should be used at the 4-bit binary input to drive each of the transistors M1 to M15 individually.
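The decoding described here can be sketched behaviorally as follows. This is only an illustration under two assumptions not stated in the text: bit `a` is taken as the most significant bit, and code k (1 to 15) is taken to switch on transistor Mk. Code 0000 leaves all 15 transistors off, as described above.

```python
from typing import List, Optional


def decode_4_to_16(a: int, b: int, c: int, d: int) -> List[int]:
    """Return 16 one-hot outputs; output[k] is 1 only when abcd encodes k."""
    for bit in (a, b, c, d):
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
    code = (a << 3) | (b << 2) | (c << 1) | d  # assumption: a is the MSB
    return [1 if k == code else 0 for k in range(16)]


def active_transistor(a: int, b: int, c: int, d: int) -> Optional[str]:
    """Name of the single transistor switched on, or None for abcd = 0000."""
    code = decode_4_to_16(a, b, c, d).index(1)
    return None if code == 0 else f"M{code}"  # assumption: code k selects Mk


for bits in ((0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)):
    print(bits, "->", active_transistor(*bits))
```

Because exactly one decoder output is high for any input, at most one of M1 to M15 conducts at a time, which is what allows each delay step to be tuned independently.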

#### 4.1.3. Inputs

Like the main source, the four input sources to the decoder are also 1.8V.

#### 4.1.4. Gates in the decoder

In order to decrease area and power consumption, the 16 binary codes were divided into two groups of lower and higher values, so that the number of gates was reduced by almost half. The NOT and NAND gates that implement the decoder should also be optimally sized for power efficiency.

#### 4.2. Simulation results

#### 4.2.1. Linearity

Fig. 17 depicts the waveform of the output delay. The evenly spaced successive graphs indicate high linearity. The maximum error in the spacing is about 1.2ps.



Fig. 17. The waveform of the fully linear improved design

This fact is observable as well in Fig. 18, which indicates the linear nature of this improvement method based on the delay versus the input vector.
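One possible way to quantify such linearity from a list of measured delays is to compare each step with the ideal step and report the worst deviation. The sketch below is an assumption about how such a metric could be computed, not the procedure used in the paper, and the sample delay list is made up purely to exercise the function; it is not simulation data.

```python
def step_errors_ps(delays_ps, ideal_step_ps):
    """Deviation of each consecutive delay step from the ideal step."""
    steps = [later - earlier for earlier, later in zip(delays_ps, delays_ps[1:])]
    return [step - ideal_step_ps for step in steps]


def max_abs_step_error_ps(delays_ps, ideal_step_ps):
    """Largest absolute step deviation, a simple worst-case linearity measure."""
    return max(abs(err) for err in step_errors_ps(delays_ps, ideal_step_ps))


# Made-up sample values (hypothetical, for illustration only).
sample_delays_ps = [151.8, 189.0, 226.2, 264.6]
worst = max_abs_step_error_ps(sample_delays_ps, ideal_step_ps=37.2)
print(f"worst step error: {worst:.1f} ps")
```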

#### 4.2.2. Effect of temperature variations

This proposed design (circuit 4) was also examined for temperature variations. A two-step sweep from 27°C to 77°C shows that this architecture is fairly robust against ambient temperature changes.



Fig. 18. Delay vs. input vector in the second improved design

Fig. 19 shows the simulation result. The minimum change is at abcd=0001, equal to 10ps, and the maximum change is at abcd=0000, with about 30ps.
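As a rough sense of scale, these changes can be expressed as average slopes over the 50°C span. This assumes, purely for illustration, that the delay varies linearly between the two simulated temperatures, which is not stated in the source. The circuit 1 figures are the 30.5ps and 72.7ps shifts reported in Section 2.

$$\text{Circuit 4:}\quad \frac{10\ \text{ps}}{50\ ^{\circ}\text{C}} = 0.2\ \text{ps}/^{\circ}\text{C} \quad\text{to}\quad \frac{30\ \text{ps}}{50\ ^{\circ}\text{C}} = 0.6\ \text{ps}/^{\circ}\text{C}$$

$$\text{Circuit 1:}\quad \frac{30.5\ \text{ps}}{50\ ^{\circ}\text{C}} = 0.61\ \text{ps}/^{\circ}\text{C} \quad\text{to}\quad \frac{72.7\ \text{ps}}{50\ ^{\circ}\text{C}} \approx 1.45\ \text{ps}/^{\circ}\text{C}$$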



**Fig. 19**. Delay vs. input vector in the second improved design for 27°C and 77°C

#### 4.2.3. Process

The simulation of this circuit for process corners is illustrated in Fig. 20. As seen, the graphs show a significantly more linear trend, with less scatter and fewer drastic changes. In fact, this circuit displays the best process behavior.



Fig. 20. The effect of process corners on the delay of the second improved design (circuit 4)

#### 4.2.4. Power

The static power dissipated by the 1.8V source is  $40.3\mu$ W and the total dynamic power is  $92.7\mu$ W. The total power consumption is  $133\mu$ W, a decrease of  $78\mu$ W compared with the circuit reported in [1].
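The quoted saving can be checked against the circuit 1 entries in Table 1 (136 µW static and 75 µW dynamic):

$$P_{\text{circuit 4}} = 40.3 + 92.7 = 133\ \mu\text{W}, \qquad P_{\text{circuit 1}} = 136 + 75 = 211\ \mu\text{W}, \qquad 211 - 133 = 78\ \mu\text{W}$$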

A comparison of some important parameters of the introduced and improved architectures is presented in Table 1. The figure of merit (F.M.), computed with relation (5), is also given in the table. By this measure, circuit 4 performs better than the other three.

$$F.M. = \frac{\text{Linearity} \times \text{Process corners}}{\text{Vdd} \times (\text{P}_{static} + \text{P}_{dyn})_{\mu W} \times \text{Temp.S.}} \times 100 \tag{5}$$

|                        | Vdd (V) | P<sub>static</sub> (µW) | P<sub>dyn</sub> (µW) | Linearity        | Temp. sensitivity | Process corners {2-5} | F.M. |
|------------------------|---------|-------------------------|----------------------|------------------|-------------------|-----------------------|------|
| Circuit 1              | 1.8     | 136                     | 75                   | Low              | Low               | Good                  | 1    |
| Circuit 2              | 1       | 35<sup>a</sup>          | 36                   | Low<sup>b</sup>  | Lower             | Poor                  | 5.6  |
| Improved 1 (circuit 3) | 1       | 30.7                    | 31.3                 | High             | Medium            | Poor                  | 4.3  |
| Improved 2 (circuit 4) | 1.8     | 40.3                    | 92.7                 | Very High        | Lower             | Very Good             | 10.4 |

**Table 1**. The comparison of the four designs

a: If we do not consider the very low linearity and discrepancy of the first cycle.

b: This value is not given in the original source. The approximate value is given out of comparison with circuit 3, which is similar to circuit 2.
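Relation (5) needs numeric values for the qualitative entries of Table 1, and the paper does not state them. The sketch below uses a hypothetical scoring (for example Low = 2, High = 4, Very High = 5 for linearity) chosen only because it happens to reproduce the F.M. column to within rounding; it should be read as one plausible reconstruction, not as the authors' scoring.

```python
# Hypothetical numeric scores (assumptions, not given in the paper).
LINEARITY = {"Low": 2, "High": 4, "Very High": 5}
PROCESS = {"Poor": 2, "Good": 4, "Very Good": 5}
TEMP_SENS = {"Lower": 1, "Low": 2, "Medium": 3}


def figure_of_merit(linearity, process, vdd, p_static_uw, p_dyn_uw, temp_sens):
    """Relation (5): F.M. = L * PC / (Vdd * (Ps + Pd) * TS) * 100."""
    return linearity * process / (vdd * (p_static_uw + p_dyn_uw) * temp_sens) * 100


# (Vdd, P_static, P_dyn, linearity, temp. sensitivity, process corners) from Table 1.
TABLE_1 = {
    "Circuit 1": (1.8, 136.0, 75.0, "Low", "Low", "Good"),
    "Circuit 2": (1.0, 35.0, 36.0, "Low", "Lower", "Poor"),
    "Circuit 3": (1.0, 30.7, 31.3, "High", "Medium", "Poor"),
    "Circuit 4": (1.8, 40.3, 92.7, "Very High", "Lower", "Very Good"),
}

fm = {}
for name, (vdd, p_s, p_d, lin, temp, proc) in TABLE_1.items():
    fm[name] = figure_of_merit(LINEARITY[lin], PROCESS[proc], vdd, p_s, p_d, TEMP_SENS[temp])
    print(f"{name}: F.M. = {fm[name]:.2f}")
```

With these assumed scores the computed values land close to the 1, 5.6, 4.3 and 10.4 listed in the table, which suggests the ranking is driven mainly by the linearity and process scores of circuit 4.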

#### 5. CONCLUSION

In this paper, four DCDEs were examined through a combination of comparative analysis and simulations. The interested reader may refer to at least the first four references in this paper for mathematical treatments. The goal was to evaluate the merits of each circuit more practically. Circuit 4 proved to be the best under process corner variation. The only merits of circuit 3 (the improved form of circuit 2) are its nearly complete cycle resemblance and its linearity, but it suffers severely from temperature and process corner variations. Circuit 4, irrespective of the probable lack of a straightforward way to calculate the aspect ratios of the 15 input-vector transistors, offers the best linearity and the best behavior under process corner variations. It also displays lower sensitivity to temperature variations, and its total power consumption is lower than that of circuit 1.

#### REFERENCES

- [1] M. Maymandi-Nejad, M. Sachdev, "A Monotonic Digitally Controlled Delay Element", *IEEE Journal of Solid-State Circuits*, Vol. 40, No. 11, November 2005.
- [2] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 megabyte/s DRAM," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1491–1496, Dec. 1994.
- [3] S. B. Kobenge, Huazhong Yang, "A power efficient digitally programmable delay element for low power VLSI applications", 1st Int'l Symposium on Quality Electronic Design-Asia, 2009.
- [4] M. Maymandi-Nejad, M. Sachdev, "A digitally programmable delay element, Design and analysis", *IEEE Trans. On Very Large Scale Integration (VLSI) Systems*, Vol. 11, No. 5, Oct. 2003.
- [5] N.R. Mahapatra, S.V. Garimella, and A. Tareen, "Efficient techniques based on gate triggering for designing static CMOS ICs with very low glitch power dissipation," to be published in *Proc.* 2000 IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, May 28-31, 2000.
- [6] M. Fenghao, C. Svensson, "Pulsewidth Control Loop in High-Speed CMOS Clock Buffers", *IEEE JSSC*, Vol. 35, No. 2, pp. 134-141, February 2000.
- [7] "Resolution Multi-Channel Time-to-Digital Converter (TDC) for High-Energy Physics and Biomedical Imaging Applications", 4<sup>th</sup> IEEE Conference on Industrial Electronics and Applications, pp. 1133-1138, 30 June 2009.
- [8] G. Wu, G. Deyuan, W. Tingcun, C. Hu-Guo and Y. Hu, "A High-Resolution Multi-Channel Time-to-Digital Converter (TDC) for High-Energy Physics and Biomedical Imaging Applications", 4<sup>th</sup> IEEE Conference on Industrial Electronics and Applications, pp. 1133-1138, 30 June 2009.
- [9] Matano, T. et al., "A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer," *IEEE Journal of Solid-State Circuits*, Vol. 38, No. 5, pp. 762-768, May 2003.

- [10] Dehng D-K. et al., "Clock-deskew buffer using a SAR-controlled delay-locked loop," *IEEE J. of Solid-State Circuits*, vol. 35, pp. 1128-1136, Aug 2000.
- [11] Watson, R.B., Jr.; Iknaian, R.B., "Clock buffer chip with multiple target automatic skew compensation," *IEEE Journal of Solid-State Circuits*, Vol. 30, No.11, pp. 1267-1276, Nov 1995.
- [12] Jonghoon, K., Kam D. G., Jun P. J., Joungho K., "Spread spectrum clock generator with delay cell array to reduce electromagnetic interference," *IEEE Trans. on Electromagnetic Compatibility*, No. 4, pp. 908-920, 2005.
- [13] Gyudong Kim, Min-Kyu Kim, Byoung-Soo Chang, and Wonchan Kim, "A Low-Voltage, Low-Power CMOS Delay Element", *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 7, July 1996.