# COMPUTER ORGANIZATION AND DESIGN

## CURRENT HOMEWORK ASSIGNMENT

Patterson and Hennessy, Chapter 7

Review Exercises (DO NOT hand in): Waldron, Exercises for Chapters 7 and 8

Exercises (due December 2, 1999, for both sections):
1. [30] How many clock periods does the following SPIM code segment take given a) no pipeline, b) a 5-stage pipeline, assuming one clock period per instruction but no data forwarding, and c) a 5-stage pipeline with data forwarding as described in P&H, Chapter 6, Section 4?
```
        or   $t1,$0,$a2
        or   $t3,$0,$a0
        or   $t4,$0,$a1
        lw   $t5,0($t7)
        lw   $t6,0($t8)
        mul  $t2,$t5,$t6
```
2. [20] Find the data hazards in the following SPIM code segment, and show the hazards on a pipeline diagram, as in P&H, Fig. 6.44:
```
        lw   $t1,i1
        or   $t2,$t3,$t1
        ori  $a0,$a0,42
```
3. [10] Reorder the following code segment to remove the data hazards. Assume that data forwarding takes place:
```
        lw   $t0,24($a0)
        sub  $t4,$t4,$t0
        sub  $t8,$t8,$t3
        mul  $t7,$t7,$t1
```
4. [10] What is the CPI for the reordered sequence of instructions in the preceding problem?
5. [20] Find the control hazards in the following SPIM code segment, and show the hazards on a pipeline diagram:
```
        la   $t0, ar2
        lw   $t1, size
        lw   $t2, nrows
        lw   $t3, ncols
        addi $t4, $t2, -1   # nrmax
        addi $t5, $t3, -1   # ncmax
        ori  $t6, $0, 0     # initialize row index to 0
        lwc1 $f0, val
        mfc1 $s4, $f0       # copy val from the FP register into $s4
rloop:  mul  $t9, $t6, $t3  # multiply rindex by ncols
        mul  $t9, $t9, $t1  # multiply by size of one array element to get roffset
        ori  $t7, $0, 0     # initialize column index to 0
cloop:  mul  $s0, $t7, $t1  # multiply cindex by size to get coffset
        add  $s1, $s0, $t9  # offset of ar2[rindex][cindex] = roffset + coffset
        add  $t8, $s1, $t0  # address of ar2[rindex][cindex] = offset + base
        sw   $s4, 0($t8)    # store val in ar2[rindex][cindex]
        addi $t7, $t7, 1    # increment the column index
        sub  $s2, $t5, $t7  # nc = ncmax - cindex
        bgez $s2, cloop     # branch back to cloop if nc >= 0
        addi $t6, $t6, 1    # increment the row index
        sub  $s3, $t4, $t6  # nr = nrmax - rindex
        bgez $s3, rloop     # branch back to rloop if nr >= 0
        ori  $v0, $0, 10    # reach here if row loop is done
        syscall             # end of program!
```
6. [10] Which instruction could do useful work in the branch delay slot, without changing the computational results in any way?
```
          or    $t0,$0,$a0          # Reg. t0 points to the array element
          or    $t1,$0,$a2          # Reg. t1 is a counter
loop:     sw    $a3,0($t0)          # Store the value into the array element
          add   $t0,$a1,$t0         # Increment the pointer by the value of size
          addi  $t1,$t1,-1          # Decrement the counter
          bgtz  $t1, loop           # Branch back to loop if counter > 0
                                    #  (since we store at the head of the
                                    #   loop, we compute one more address
                                    #   than necessary just to reduce
                                    #   the number of compares & branches)
beamup:   jr    $ra                 # Beam me up....
```
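
The clock-period and CPI questions above can be checked by hand against a small timing model. The following is a minimal Python sketch, not the method prescribed by P&H or by this assignment. It assumes a classic 5-stage pipeline (IF, ID, EX, MEM, WB), a register file that is written in the first half of a clock period and read in the second half, forwarding from the end of EX and the end of MEM into EX, and no control hazards. It also treats every source line as one machine instruction, which is a simplification because SPIM may expand pseudo-instructions such as `mul` or `la`. The names `Instr`, `pipeline_cycles`, `cpi`, and `EXAMPLE` are hypothetical, and the three-instruction sequence is made up for illustration rather than taken from the exercises.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

STAGES = 5  # IF, ID, EX, MEM, WB


class Instr(NamedTuple):
    """One instruction: opcode, destination register, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def pipeline_cycles(program: Sequence[Instr], forwarding: bool) -> int:
    """Count clock periods for an in-order 5-stage pipeline (assumed model).

    start = clock period in which an instruction is fetched; stalls are
    modeled as a delayed fetch, which gives the same total count.
    """
    last_writer = {}  # register -> (start of producer, producer is a load)
    start = -1
    for ins in program:
        earliest = start + 1  # at most one fetch per clock period
        for reg in ins.srcs:
            if reg in last_writer:
                p_start, is_load = last_writer[reg]
                if forwarding:
                    # ALU result reaches the next EX; a load needs one bubble.
                    gap = 2 if is_load else 1
                else:
                    # Consumer's ID must coincide with or follow producer's WB.
                    gap = 3
                earliest = max(earliest, p_start + gap)
        start = earliest
        if ins.dest is not None and ins.dest != "$0":
            last_writer[ins.dest] = (start, ins.op == "lw")
    return start + STAGES if program else 0


def cpi(program: Sequence[Instr], forwarding: bool) -> float:
    """Clock periods per instruction over the whole segment, fill included."""
    return pipeline_cycles(program, forwarding) / len(program)


# Hypothetical sequence (not one of the exercises):
#   lw  $s0, 0($a0)
#   add $s1, $s0, $s0
#   sub $s2, $s1, $s0
EXAMPLE = [
    Instr("lw", "$s0", ("$a0",)),
    Instr("add", "$s1", ("$s0", "$s0")),
    Instr("sub", "$s2", ("$s1", "$s0")),
]

if __name__ == "__main__":
    print(pipeline_cycles(EXAMPLE, forwarding=False))  # 11
    print(pipeline_cycles(EXAMPLE, forwarding=True))   # 8
```

Under these assumptions the hypothetical sequence takes 11 clock periods without forwarding and 8 with forwarding, against 7 for three hazard-free instructions. Note that `cpi` divides by the total count including the four pipeline-fill periods; a steady-state CPI that ignores the fill would be computed differently.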
