Digital Design Laboratory

Circuit Delays and Timing Simulation

Purpose

1. To understand delays in digital circuits
2. To learn to use the Timing Simulator

Background: Circuit Delays

Up to now we have been interested in the functional operation of logic circuits. In the previous labs you used the logic simulator to verify that the circuits function properly. However, there is another important aspect of circuit design: the speed at which the circuit operates. The speed of a digital circuit is very important, as it determines the maximum frequency at which the circuit can work. Consider a PC that has a clock frequency of 400 MHz. That means that every 2.5 ns (i.e. one clock period T = 1/frequency) the PC performs a computation! As we will see, this requires clever circuit design.
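The relation between clock frequency and period can be checked with a few lines of Python. This is only an illustrative sketch; the function name clock_period_ns is made up for this example.

```python
def clock_period_ns(frequency_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    # T = 1/f. With f expressed in MHz, 1000/f gives T in ns.
    return 1000.0 / frequency_mhz


print(clock_period_ns(400.0))  # 2.5
```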

You may ask yourself: what determines the speed or the maximum frequency of a digital system? The answer is the delays of the circuits. Several factors contribute to the delay. One is the propagation delay due to the internal structure of the gates, another is the loading of the output buffers (due to fanout), and a third is the structure of the logic circuit itself.

1. Propagation delay

When the input signal of a gate changes, the output signal will not change instantaneously as is shown in Figure 1 below.

Figure 1: Propagation delay of gates
The propagation delay (or gate delay) of a gate is the time difference between the change of the input signal and the resulting change of the output signal. There are two types of gate delays, TPHL and TPLH, as indicated in Figure 1. The value of the propagation delay varies from gate to gate and from logic family to logic family. In general, the more you are willing to pay for a device (or chip), the faster it will be. The FPGAs we are using in the lab have gate delays which vary between 1.5 and 4.8 ns. The actual delay depends on the way the logic gates have been mapped into the LUTs (look-up tables) of a CLB (Configurable Logic Block). The I/O buffers have delays in the range of 2-5 ns.

2. Fanout and net delays

The propagation delay described above is caused by parasitic capacitors inside the gates and the physical limitations of the devices used to build these gates. Another cause of delay is the capacitance of the loads seen by a gate. This capacitance is the result of the wiring between gates (net delays, e.g. a long metal line connecting two gates on a chip) and the input capacitance of the gates being driven, as shown in Figure 2a.

Figure 2: (a)Parasitic interconnection capacitors and fanout of a gate; (b) hydraulic equivalent.
These capacitors need to be charged or discharged through the gate that drives them (e.g. gate 1 in Figure 2a). The more capacitors that need to be charged or discharged the longer it will take for the output to change. Also, the longer the interconnection, the more resistance the nets will have. The easiest way to visualize this is to use a hydraulic equivalent of a capacitor and a resistor: a bucket filled with water and a narrow pipe, respectively, as shown in Figure 2b. The more buckets connected to the drain (i.e. the input inverter), the longer it will take to empty them. This delay is the result of the fanout of the inverter.
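The effect of fanout can be sketched with a first-order RC model, in which the time for the output to reach 50% of its final value is ln(2) times R times the total load capacitance. The resistance and capacitance values below are assumptions chosen only for illustration; they are not data for the FPGAs used in the lab, and the function name rc_delay_ns is made up.

```python
import math


def rc_delay_ns(fanout, r_driver_ohm=1000.0, c_wire_f=20e-15, c_input_f=10e-15):
    """Estimate the 50% delay (in ns) of a gate driving `fanout` gate inputs.

    All component values are hypothetical defaults for illustration.
    """
    # Every driven input adds its capacitance to the wiring capacitance.
    c_total = c_wire_f + fanout * c_input_f
    # First-order RC response: t50 = ln(2) * R * C, converted from s to ns.
    return math.log(2) * r_driver_ohm * c_total * 1e9


for n in (1, 2, 4, 8):
    print(n, round(rc_delay_ns(n), 4))
```

In this model the delay grows linearly with the number of driven inputs, which is the "more buckets to empty" picture of Figure 2b.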

3. Delay as a result of circuit topology

Circuits that perform the same function can vary significantly in their speeds. A good example is an adder circuit. The one you designed in the previous lab is called a ripple-carry adder and is considerably slower than a carry-look-ahead adder or CLA [1,3].

Measuring Circuit Delays

The overall speed of a digital system can be measured on an oscilloscope by comparing the input and output signals. However, during the design phase the circuit has not yet been fabricated and therefore cannot be measured. In that case it is possible to determine the delay of circuits by doing a Timing Simulation. The advantage of a simulation is that one can also determine the delay of internal nodes of a circuit. This can be very helpful in understanding which nodes or paths are the slowest and thus limit the overall speed of the circuit. These paths are called "critical paths". It is important to know which paths in a circuit are critical so that one can reduce their delay.
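The idea of a critical path can be illustrated by treating a circuit as a graph of gates and searching for the slowest path from any input to an output. The netlist, the gate names and the delay values below are hypothetical and serve only as an example; a real timing simulator works from the placed and routed design.

```python
# Hypothetical netlist: node name -> (delay in ns, list of driving nodes).
# Primary inputs have zero delay and no drivers.
NETLIST = {
    "IN1": (0.0, []),
    "IN2": (0.0, []),
    "IN3": (0.0, []),
    "G1": (1.5, ["IN1", "IN2"]),
    "G2": (2.0, ["IN2", "IN3"]),
    "G3": (1.0, ["G1", "G2"]),
    "OUT": (2.5, ["G3", "IN1"]),
}


def critical_path(netlist, output):
    """Return (arrival time, path) of the slowest path ending at `output`."""
    memo = {}

    def arrival(node):
        if node not in memo:
            delay, drivers = netlist[node]
            best_time, best_path = 0.0, []
            for driver in drivers:
                time, path = arrival(driver)
                # Keep the latest-arriving driver (the first one on ties).
                if not best_path or time > best_time:
                    best_time, best_path = time, path
            memo[node] = (best_time + delay, best_path + [node])
        return memo[node]

    return arrival(output)


print(critical_path(NETLIST, "OUT"))  # (5.5, ['IN2', 'G2', 'G3', 'OUT'])
```

In this made-up circuit, speeding up G1 would not help at all, because G1 is not on the critical path.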

The actual delay will depend on how the gates have been implemented in the various LUTs and CLBs. Also, the routing of the signals between the different CLBs determines the overall speed. Thus, one needs first to implement the design so that one can provide the Timing Simulator with the block and routing delay information. The Timing Simulator and the Implementation Tools are described in the tutorials.

One type of circuit where the effect of gate delays is particularly clear is an adder. In this lab you will be measuring the delay of different types of adder circuits. The 4-bit adder you designed and implemented in the previous lab is called a ripple-carry adder because the result of an addition of two bits depends on the carry generated by the addition of the previous two bits. Thus, the Sum of the most significant bit is only available after the carry signal has rippled through the adder from the least significant stage to the most significant stage. This can be easily understood if one considers the addition of the two 4-bit binary words 1111 + 0001, as shown in Figure 3.

Figure 3: Addition of two 4-bit numbers illustrating the generation of the carry-out bit
In this case, the addition in the least significant stage (1 + 1 = 10 in binary) causes a carry bit to be generated. This carry bit will consequently generate another carry bit in the next stage, and so on, until the final carry-out bit appears at the output. This requires the signal to travel (ripple) through all the stages of the adder as illustrated in Figure 4 below. As a result, the final Sum and Carry bits will be valid after a considerable delay. For the schematic of Figure 4, the Sum of the most significant stage will be valid after 2(N-1) + 1 gate delays, in which N is the number of bits. The carry-out bit will be valid after 2N gate delays. This delay may be in addition to any delays associated with interconnections. It should be mentioned that when the circuit is implemented in an FPGA, the delays may differ from the above expressions depending on how the logic has been placed in the look-up tables and how it has been divided among different CLBs.
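The rippling of the carry and the delay expressions above can be reproduced with a short Python sketch. The function names are made up for this example, and the bit lists are written with the least significant bit first.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        # Each stage needs the carry of the previous stage: this is the ripple.
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry


def ripple_delays(n_bits):
    """Gate delays quoted in the text for an N-bit ripple-carry adder."""
    return {"sum_msb": 2 * (n_bits - 1) + 1, "carry_out": 2 * n_bits}


# 1111 + 0001, written LSB first.
print(ripple_carry_add([1, 1, 1, 1], [1, 0, 0, 0]))  # ([0, 0, 0, 0], 1)
print(ripple_delays(4))   # {'sum_msb': 7, 'carry_out': 8}
print(ripple_delays(32))  # {'sum_msb': 63, 'carry_out': 64}
```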

Figure 4: Ripple-carry adder, illustrating the delay of the carry bit.
The disadvantage of the ripple-carry adder is that it can get very slow when one needs to add many bits. For instance, for a 32-bit adder, the delay would be about 63 ns if one assumes a gate delay of 1 ns. That would imply that the maximum frequency at which one can operate this adder would be only about 16 MHz! For fast applications, a better design is required. The carry-look-ahead adder solves this problem by calculating the carry signals in advance, based on the input signals. It is based on the fact that a carry signal will be generated in two cases: (1) when both bits Ai and Bi are 1, or (2) when one of the two bits is 1 and the carry-in (carry of the previous stage) is 1. Thus, one can write,
COUT = Ci+1 = Ai.Bi + (Ai $ Bi).Ci          (1)
The "$" stands for exclusive OR. One can also write this expression as
Ci+1 = Gi + Pi.Ci          (2)
in which
Gi = Ai.Bi          (3)
Pi = (Ai $ Bi)          (4)

are called the Generate and Propagate terms, respectively.

Notice that both the Propagate and Generate terms only depend on the input bits and thus will be valid after one gate delay. If one uses the above expression to calculate the carry signals, one does not need to wait for the carry to ripple through all the previous stages to find its proper value. Let’s apply this to a 4-bit adder.

C1 = G0 + P0.C0          (5)
C2 = G1 + P1.C1 = G1 + P1.G0 + P1.P0.C0          (6)
C3 = G2 + P2.C2 = G2 + P2.G1 + P2.P1.G0 + P2.P1.P0.C0          (7)
C4 = G3 + P3.C3 = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 + P3.P2.P1.P0.C0          (8)
Notice that the carry-out bit, Ci+1, of the last stage will be available after  three delays (one delay to calculate the Propagate signal and two delays as a result of the AND and OR gate). The Sum signal can be calculated as follows,
Si = Ai $ Bi $ Ci = Pi $ Ci          (9)
The Sum bit will thus be available after one additional gate delay. The advantage is that these delays will be the same independent of the number of bits one needs to add, in contrast to the ripple-carry adder.
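Equations (3) to (9) can be checked in software before entering them in ABEL or VHDL. The following Python sketch models only the logic, not the timing; the function name cla4 is made up for this example.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead addition of a and b (0..15); returns (sum, C4)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # Generate terms, eq. (3)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # Propagate terms, eq. (4)

    # Every carry is computed directly from G, P and C0, eqs. (5)-(8).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sum_bits = [p[i] ^ carries[i] for i in range(4)]  # eq. (9)
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, c4


print(cla4(0b1111, 0b0001))  # (0, 1)
```

Comparing the result with ordinary integer addition for all 512 input combinations is a quick way to catch a mistyped product term.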

The carry-lookahead adder can be broken up in two modules: (1) the Partial Full Adder, PFA, which generates Si, Pi and Gi as defined by equations 3, 4 and 9 above; and (2) the Carry Look-ahead Logic, which generates the carry-out bits according to equations 5 to 8. The 4-bit adder can then be built by using 4 PFAs and the Carry Look-ahead logic block as shown in  Figure 5.

Figure 5: Block diagram of a 4-bit CLA.
The disadvantage of the carry-lookahead adder is that the carry logic gets quite complicated for more than 4 bits. For that reason, carry-look-ahead adders are usually implemented as 4-bit modules and are used in a hierarchical structure to realize adders that have multiples of 4 bits. Figure 6 shows the block diagram for a 16-bit CLA adder. The circuit makes use of the same CLA Logic block as the one used in the 4-bit adder. Notice that each 4-bit adder provides a group Propagate and a group Generate signal, which are used by the CLA Logic block. The group Propagate PG and group Generate GG of a 4-bit adder will have the following expressions,

PG = P3.P2.P1.P0          (10)

GG = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0          (11)

The group Propagate PG and group Generate GG will be available after 2 and 3 gate delays, respectively (that is, one and two delays later than the Pi and Gi signals, respectively).
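The hierarchical use of the look-ahead logic can be sketched in Python as follows. This is a logic-only model under the assumption that the second-level block receives the four GG and PG signals in place of the bit-level Gi and Pi; the function names lookahead and cla16 are made up for this example.

```python
def lookahead(g, p, c0):
    """Carry look-ahead logic: carries C1..C4 plus group PG and GG."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    pg = p[3] & p[2] & p[1] & p[0]                                # eq. (10)
    gg = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))                          # eq. (11)
    c4 = gg | (pg & c0)  # equivalent to eq. (8)
    return [c1, c2, c3, c4], pg, gg


def cla16(a, b, c0=0):
    """16-bit addition built from four 4-bit groups; returns (sum, C16)."""
    groups = []
    for k in range(4):
        a4 = (a >> (4 * k)) & 0xF
        b4 = (b >> (4 * k)) & 0xF
        g = [((a4 >> i) & 1) & ((b4 >> i) & 1) for i in range(4)]
        p = [((a4 >> i) & 1) ^ ((b4 >> i) & 1) for i in range(4)]
        _, pg, gg = lookahead(g, p, 0)  # PG and GG do not depend on the carry-in
        groups.append((g, p, pg, gg))

    # Second level: the same logic applied to the group signals gives
    # C4, C8, C12 and C16 without waiting for a ripple through the groups.
    group_carries, _, _ = lookahead([grp[3] for grp in groups],
                                    [grp[2] for grp in groups], c0)
    carry_ins = [c0] + group_carries[:3]

    total = 0
    for k, (g, p, _, _) in enumerate(groups):
        inner, _, _ = lookahead(g, p, carry_ins[k])
        carries = [carry_ins[k]] + inner[:3]
        for i in range(4):
            total |= (p[i] ^ carries[i]) << (4 * k + i)  # eq. (9)
    return total, group_carries[3]


print(cla16(0xFFFF, 0x0001))  # (0, 1)
```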

Figure 6: Block diagram of a 16-bit CLA Adder

Pre-lab assignment:
1. Describe briefly which factors determine the maximum frequency of a digital circuit.
2. What is a critical path in a digital circuit?
3. Read the tutorial on "Timing Simulation".
4. Assuming that one gate delay is 1ns (or multiples of 1ns), determine:

1. The delay of the Carry-out of a one-bit Full Adder circuit
2. Consider the 4-bit adder you designed in the previous lab. Under what conditions does the worst case delay occur for the carry-out signal? What is the worst case delay for the carry-out (of the last stage)?
3. What would the worst case delay of the carry-out signal be for a 16 bit ripple-carry adder?
4. What is the delay of the Sum bit S3 of a 4-bit carry-lookahead adder of Figure 5?
5. What is the delay of the Sum bits S0-3, S4-7, S8-11 and S12-15 of the 16-bit CLA of Figure 6?
6. Timing

In-lab assignment:

A. Parts and Equipment:

1. PC
2. Xilinx Foundation Tools F2.1i
B. Experiments:
• You will implement  a 16-bit carry-lookahead adder. You will do this by creating a Partial Full Adder (PFA) macro and a Carry-lookahead Logic  macro as shown in Figures  5 and 6.
• Open a new project in your folder (c:\users\your_name\) and call it  MY16CLA.
• Open the schematic editor.
• Create a macro of the PFA using ABEL or VHDL. Call the macro MYPFA. When the PFA has been synthesized, do a functional simulation to verify that the circuit works.

NOTE: You can do a functional simulation without having to connect input and output terminals to the macro input. Do the following:

• Go to the simulator by clicking on the SIM icon on the top toolbar.
• In the Waveform viewer, click on the Add Signals icon (or go to SIGNALS -> ADD SIGNALS). In the middle panel of the Component Selection window, called Chip Selection, you will see the name of the macro, MYPFA. Double click on it. In the right panel, Scan Hierarchy, all the input and output pins will be displayed. Select these pins by double clicking. When done, click the Close button.
• Open the stimulator window and define the A and B bits using the Binary counter.
• Do a functional simulation and verify the operation.
• Create a macro for the Carry Look-Ahead Logic block. Call this MYCLALOG. Use ABEL to create this macro. You should also generate the group Propagate and group Generate signals, PG and GG, respectively. Do a simulation, as explained above.
• Now you can create a 4-bit CLA adder macro. First create the schematic for the 4-bit adder using the MYPFA and MYCLALOG macros defined above. The adder should have as inputs A[3:0], B[3:0], and CIN. The outputs are S[3:0], PG, and GG (there is no need for Cout). Be careful when connecting the carry signals. Since this schematic will be used as a macro, you do not need to add input/output buffers and pads. For single signals use a terminal; for the buses (e.g. A, B and S), define the bus as an "input" or "output" bus when creating the bus. When done, you can create a macro from the schematic by going to the HIERARCHY -> CREATE A MACRO FROM CURRENT SHEET menu. Call the macro MY4CLA.
• When finished, synthesize the macro and simulate the 4-bit adder. Verify its operation and make a print out of the functional simulation waveforms (only one page).
• You are now ready to create the top level schematic of the 16-bit carry-lookahead adder. Use the block diagram of Figure 6.
• The schematic of the 16-bit adder is the top level schematic. All the input and output pins (A[15:0], B[15:0], CIN, SOUT[15:0], and COUT) need to be connected to pads and buffers. COUT is the last carry of the adder, corresponding to C16.
• In order to keep the schematic and its interconnections manageable, use buses as much as possible. It may be helpful to use a 16-bit wide bus for each of the input signals A and B. This bus can be split up into segments of 4 bits, each of which feeds the 4-bit input of one of the 4-bit CLA macros. This can be done by first drawing a 16-bit wide bus (e.g. AIN[15:0]). Then connect the 4-bit wide input bus (e.g. A[3:0]) of the first 4-bit CLA to the 16-bit wide A bus; next connect the input of the 2nd 4-bit CLA to the same A bus, etc. until all 16 inputs have been connected. Now you need to specify how the bus signals are connected. This can be done by giving each bus segment the proper name: click on the bus segment to open the Edit Bus window. Keep the name but change the bus width. For instance, to specify a 4-bit wide bus segment that connects to the input of the first 4-bit adder, give it the name AIN and as width [3:0]. It is important that the name of the bus segment is the same as the name of the 16-bit bus. Do the same for the other bus segments (e.g. AIN[7:4], etc.). For more information about buses in the Schematic Entry window, select HELP -> SCHEMATIC EDITOR HELP CONTENTS. In the help window select INDEX and type "buses" or "bus connections".

• When done, do a functional simulation and make a print out of the functional simulation diagram (again only one page). Instead of defining each input  signal A0, A1 ... A15, B0, B1, ... B15 separately, it is more convenient to use the Formulas to define the input  signals of the input buses. See section on Formulas in the Logic Simulator Tutorial.

• When the circuit works properly, you need to implement it so that you can do a timing simulation later on.
• Before doing a simulation, go to the Program Manager and select the IMPLEMENTATION -> OPTIONS menu. In the Options window, click on the Edit Options button for the Simulations Option (Foundation EDIF). In the Simulation Options window, click on the General tab and de-select the option: Correlate Simulation Data to Input Design. This prevents the timing simulation from producing undefined signals (see also undefined signals in the timing simulation tutorial).
• For the implementation, see previous lab or check the tutorial on Design Implementation. Use a device speed of 3. Read the Place&Route report to check the device utilization. Write these numbers down in your notebook (they will be needed for the report later on).
• Timing simulation: Do a worst-case simulation of the CLA and determine the delay of the Sum bits and the Carry out bit COUT.  In case you have problems with the timing simulation and some of the signals are undefined (gray waveform), see the notes below.  You do not need to go through all the possible input combinations because you have verified that the circuit works properly by doing a functional simulation above. Apply the signals that will give you a worst case scenario. Make a print out of the simulation data, including the measurement data. Print only the simulation data of interest to show the worst case delay; do not print more than 2 pages (1 page should do).
Notes:
•  In case some of the signals show as an undefined signal (gray in the waveform viewer window or blue X on the schematic) make sure that you De-select the option: Correlate Simulation Data to Input Design in the Implementation Options (Simulations) as explained above or in the timing simulation tutorial.
• Use the Measurement tool to determine the delay: go to WAVEFORM -> MEASUREMENT -> MEASUREMENT ON. This will allow you to display the delays. Make sure that the period of the stimuli (i.e. the signals applied to the input) is larger than the longest delay of the circuit. If your input signal changes before the output signals appear (due to the delay), you will get useless and unpredictable results. You can change the period of the binary counter output (Bc:) by going to OPTIONS -> PREFERENCES -> SIMULATIONS; adjust the Clock Period of B0 to a value larger than the maximum delay (e.g. 50 ns). The input signal should not change until all the output signals have settled to the proper value. Find the following delays:
• Delay associated with an input and output buffer.
• Delay between the start of the input signal  and the Cout signal. Measure the input signal after the input buffer and the output signal (Cout)  before the output buffer to exclude the delay of the buffers.
• Delay between the start of the signal (measure the signal after the buffer to exclude the buffer delay) and the Sum bits.
• What is the worst-case delay? What is the maximum frequency at which you will be able to run your CLA adder?
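Once the delays have been measured, the worst-case delay and the corresponding maximum frequency follow from a simple calculation, sketched below in Python. The delay values and signal names in the dictionary are hypothetical placeholders, not expected results; substitute your own measurements.

```python
def max_frequency_mhz(delays_ns):
    """Maximum operating frequency (MHz) set by the slowest measured path."""
    worst_ns = max(delays_ns.values())
    return 1000.0 / worst_ns  # f = 1/T, with T in ns and f in MHz


def stimulus_period_ok(period_ns, delays_ns):
    """True if the inputs stay stable longer than the slowest path needs."""
    return period_ns > max(delays_ns.values())


# Hypothetical measurements in ns; replace with values from the waveform viewer.
measured = {"COUT": 20.0, "SOUT15": 25.0}

print(max_frequency_mhz(measured))         # 40.0
print(stimulus_period_ok(50.0, measured))  # True
```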
Hand-in at the start of next lab:

Lab Report including

1. Course title, Lab no, Lab section, your name and date.
2. Section on the lab experiments:
1. Brief description of the lab including the goals
2. Copy of the ABEL or VHDL source code of the macros PFA, MYCLALOG.
3. Printout of the schematic of the 4 bit and 16-bit CLA adders
4. Printout of the functional simulation of the 4-bit CLA adder (label printouts)