Assignment 1
(due September 5, 2012):
Assignment 2
(due September 10, 2012):
Read Dennard et al., IEEE Journal of Solid-State Circuits, Vol. SC-9, pp. 256-268 (1974)
Assignment 3
(due September 17, 2012):
BF 80 00 00 3F 80 00 00 40 00 00 00 40 40 00 00
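The question text for this byte string is not reproduced here, so the following is only a sketch under an assumption: that the sixteen bytes are four consecutive 32-bit IEEE 754 single-precision words stored in big-endian order. The helper name `decode_singles` and the `byte_order` parameter are illustrative, not part of the assignment.

```python
import struct

# Byte string copied from the assignment text.
RAW_HEX = "BF 80 00 00 3F 80 00 00 40 00 00 00 40 40 00 00"


def decode_singles(hex_text, byte_order=">"):
    """Interpret the bytes as consecutive 32-bit IEEE 754 floats.

    byte_order is ">" for big-endian or "<" for little-endian. Which one
    applies is an assumption; the assignment text does not say.
    """
    data = bytes.fromhex(hex_text)      # whitespace between bytes is ignored
    count = len(data) // 4              # four bytes per single-precision word
    return list(struct.unpack(f"{byte_order}{count}f", data))


if __name__ == "__main__":
    print(decode_singles(RAW_HEX))
```

Passing `byte_order="<"` shows how the same bytes would read on a little-endian machine, which is a quick way to check which interpretation gives sensible values.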
Assignment 4
(due September 24, 2012):
Assignment 5
(due October 1, 2012):
Assignment 6
(due October 8, 2012):
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label # Assume branch not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Assume that register t3 contains 0x10000000,
M[0x10000000] = 23 (decimal), and
M[0x10000004] = -21 (decimal).
Add the instruction addi (add immediate) to the multicycle implementation discussed in the notes. This instruction is documented in the SPIM instruction reference. Add any necessary datapaths and control signals to the datapath and control for the multicycle implementation, and add any necessary states to the multicycle finite state machine.
Assignment 7
(due October 15, 2012):
Assignment 8
(due October 22, 2012):
lw $10, 68($24)
lw $11, 324($24)
sll $0, $0, 0
sll $0, $0, 0
sll $0, $0, 0
sll $0, $0, 0
add $12, $11, $10
sll $0, $0, 0
sll $0, $0, 0
sll $0, $0, 0
sll $0, $0, 0
sw $12, 580($24)
Assume that ($24) = 0x10000000,
(M[0x1000 0044]) = 0x7fff fff9,
(M[0x1000 0144]) = 0x0000 000d.
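As a rough check of the values flowing through this sequence, the effective addresses and the sum can be worked out by hand or with a short script. The sketch below uses only the register and memory contents stated above; the helper names are illustrative. It also tests whether the signed 32-bit addition overflows, since the MIPS `add` instruction raises an overflow exception in that case. Whether the implementation in the notes models that exception is not stated here.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add_overflows(a, b):
    """True if the signed 32-bit sum a + b does not fit in 32 bits."""
    total = to_signed32(a) + to_signed32(b)
    return not (-(1 << 31) <= total < (1 << 31))


# Values given in the assignment text.
BASE = 0x10000000                      # contents of $24
memory = {0x10000044: 0x7FFFFFF9, 0x10000144: 0x0000000D}

addr_a = BASE + 68                     # lw $10, 68($24)
addr_b = BASE + 324                    # lw $11, 324($24)
addr_store = BASE + 580                # sw $12, 580($24)

reg10 = memory[addr_a]
reg11 = memory[addr_b]
raw_sum = (reg11 + reg10) & MASK32     # bit pattern produced by the adder
overflow = add_overflows(reg11, reg10)

if __name__ == "__main__":
    print(hex(addr_a), hex(addr_b), hex(addr_store))
    print(hex(raw_sum), "overflow" if overflow else "no overflow")
```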
Assignment 9
(due October 29, 2012):
or $t1,$0,$a2
or $t3,$0,$a0
or $t4,$0,$a1
lw $t5,0($t7)
lw $t6,0($t8)
mul $t2,$t5,$t6
addi $t1,$t1,-1
add $t3,$a3,$t3
add $t4,$a3,$t4
la $t1,i1
addi $t1,$t1,100
or $t2,$t3,$t1
add $a0,$a1,$t2
ori $a0,$a0,42
add $t5,$a0,$t2
lw $t0,24($a0)
sub $t4,$t4,$t0
sub $t8,$t8,$t3
add $t6,$t6,$t5
mul $t7,$t7,$t1
la $t0, ar2
lw $t1, size
lw $t2, nrows
lw $t3, ncols
addi $t4, $t2, -1 # nrmax
addi $t5, $t3, -1 # ncmax
ori $t6, $0, 0 # initialize row index to 0
lwc1 $f0, val
mfc1 $s4, $f0
rloop: mul $t9, $t6, $t3 # multiply rindex by ncols
mul $t9, $t9, $t1 # multiply by size of one array element to get roffset
ori $t7, $0, 0 # initialize column index to 0
cloop: mul $s0, $t7, $t1 # multiply cindex by size to get coffset
add $s1, $s0, $t9 # offset of ar2[rindex][cindex] = roffset + coffset
add $t8, $s1, $t0 # address of ar2[rindex][cindex] = offset + base
sw $s4, 0($t8) # store val in ar2[rindex][cindex]
addi $t7, $t7, 1 # increment the column index
sub $s2, $t5, $t7 # nc = ncmax - cindex
bgez $s2, cloop # branch back to cloop if nc >= 0
addi $t6, $t6, 1 # increment the row index
sub $s3, $t4, $t6 # nr = nrmax - rindex
bgez $s3, rloop # branch back to rloop if nr >= 0
ori $v0, $0, 10 # reach here if row loop is done
syscall # end of program!
or $t0,$0,$a0 # Reg. t0 points to the array element
or $t1,$0,$a2 # Reg. t1 is a counter
loop: sw $a3,0($t0) # Store the value into the array element
add $t0,$a1,$t0 # Increment the pointer by the value of size
addi $t1,$t1,-1 # Decrement the counter
bgtz $t1, loop # branch back to loop if counter > 0
# (since we store at the head of the
# loop, we compute one more address
# than necessary just to reduce
# the number of compares & branches)
beamup: jr $ra # Beam me up....
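The two listings above fill an array in different ways: the first recomputes a row-major address from the row and column indices on every iteration, while the second walks a pointer forward by the element size. A minimal sketch of why both can touch the same addresses follows. It assumes the counter passed to the pointer version equals nrows * ncols and is at least 1; the base address and dimensions in the usage lines are hypothetical.

```python
def nested_addresses(base, nrows, ncols, size):
    """Addresses visited by the rloop/cloop version.

    Each address is base + roffset + coffset, where
    roffset = rindex * ncols * size and coffset = cindex * size.
    """
    return [base + (r * ncols) * size + c * size
            for r in range(nrows)
            for c in range(ncols)]


def pointer_addresses(base, count, size):
    """Addresses visited by the pointer-increment version (count >= 1)."""
    addrs = []
    ptr = base
    for _ in range(count):
        addrs.append(ptr)   # corresponds to sw $a3, 0($t0)
        ptr += size         # corresponds to add $t0, $a1, $t0
    return addrs


if __name__ == "__main__":
    base = 0x10010000       # hypothetical base address of ar2
    print(nested_addresses(base, 3, 4, 4) == pointer_addresses(base, 12, 4))
```

The pointer version replaces the multiplications in the inner loop with a single addition, which is the kind of saving the second listing appears to be after.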
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label # Assume branch not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Comment: Pipelining this code segment is tricky because a branch follows a load. You will need to refer to the detailed diagrams of forwarding after a load in the lecture slides, as well as slide 102. You will also need to decide whether the branch should be detected in the RF stage or the EX stage.
Assignment 10
(due November 5, 2012):
| Memory address | Bank accessed | Clock cycle when accessed |
| --- | --- | --- |
| 8001 | 1 | 1 |
| 8011 | 3 | 2 |
| 8021 | 5 | 3 |
| 8031 | 7 | 4 |
| 8041 | 1 | 7 |
| 8051 | | |
| 8061 | | |
| 8071 | | |
| 8081 | | |
| 8091 | | |
| 8101 | | |
| 8111 | | |
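The filled-in rows of the table are consistent with a simple interleaved-memory model, sketched below. The parameters are assumptions inferred from those rows rather than taken from the problem statement: eight banks selected by the address modulo 8 (with addresses read as decimal), at most one access started per clock cycle in program order, and a bank that stays busy for six cycles after an access starts. The function name `schedule` is illustrative.

```python
NUM_BANKS = 8   # assumption: bank = address mod 8
BANK_BUSY = 6   # assumption: a bank is busy for 6 cycles per access


def schedule(addresses, num_banks=NUM_BANKS, bank_busy=BANK_BUSY):
    """Return (address, bank, start_cycle) for in-order accesses."""
    free_at = {}            # bank -> first cycle at which it is free again
    cycle = 0
    rows = []
    for addr in addresses:
        bank = addr % num_banks
        # Start no earlier than one cycle after the previous access,
        # and no earlier than the cycle at which the bank becomes free.
        cycle = max(cycle + 1, free_at.get(bank, 1))
        free_at[bank] = cycle + bank_busy
        rows.append((addr, bank, cycle))
    return rows


if __name__ == "__main__":
    for row in schedule(range(8001, 8112, 10)):
        print(row)
```

If the actual bank count or busy time in the problem differs, changing the two constants is enough to rerun the model.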
Assignment 11
(due November 12, 2012):
Assignment 12
(due November 26, 2012):
Let the bit error probability on a certain point-to-point link be b. We will assume that b is very small compared to 1. We will assume, also, that errors in different bits of a frame are uncorrelated, and that all frames have exactly N bits.
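Under these assumptions a frame is received correctly only if all N bits are, so the probability that a frame contains at least one error is 1 - (1 - b)^N, which is approximately N*b when N*b is much smaller than 1. The sketch below compares the exact expression with the approximation; the numeric values of b and N in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def frame_error_probability(b, n_bits):
    """Exact probability that an n_bits frame has at least one bit error.

    Computed as 1 - (1 - b)**n_bits using log1p/expm1 so that precision
    is retained when b is very small.
    """
    return -math.expm1(n_bits * math.log1p(-b))


def frame_error_approx(b, n_bits):
    """First-order approximation, valid when n_bits * b << 1."""
    return n_bits * b


if __name__ == "__main__":
    b, n_bits = 1e-9, 12000     # hypothetical link and frame size
    print(frame_error_probability(b, n_bits), frame_error_approx(b, n_bits))
```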
Assignment 13
(due December 3, 2012):
SS CPU: A 2-core superscalar microprocessor that provides out-of-order issue capabilities on two functional units (FUs) per core. Only a single thread can run on each core at a time.
MT CPU: A fine-grained multithreaded processor that allows instructions from two threads to run concurrently on two functional units. However, only instructions from a single thread can be issued on any cycle.
SMT CPU: A simultaneous multithreaded processor that allows instructions from two threads to run concurrently on two functional units. Instructions from either or both threads can be issued in any clock period.
Assume that we have two threads, X and Y, to run on these CPUs. The threads include the following instructions, each of which takes one clock period (CP) to execute unless otherwise noted, or unless there is a hazard.
| Thread X | Thread Y |
| --- | --- |
| A1 -- takes 2 CPs to execute | B1 -- no dependencies |
| A2 -- depends on the result of A1 | B2 -- conflicts for a FU with B1 |
| A3 -- conflicts for a FU with A2 | B3 -- no dependencies |
| A4 -- depends on the result of A2 | B4 -- depends on the result of B2 |