### CSCI 250 Introduction to Computer Organisation Lecture 4: Control Unit and Pipelines III



Jetic Gū 2024 Fall Semester (S3)



### Overview

- Architecture: von Neumann
- Textbook: CO: 4.5
- Core Ideas:
  - 1. Pipelined Computers II: Hazard Control
  - 2. Pipelined Computers III: Simple Pipelined CPU
  - 3. Lab 4 Part 2

## **Review** Properties of CPU Pipeline

- Does pipelining reduce latency of a single stage/task?
  - No, but it increases throughput of entire workload
- What could affect pipeline's efficiency?
  - The slowest stage
    - Unbalanced lengths of stages: some stages significantly slower than others
  - Total number of stages
- When to **fill** the pipeline, and when to **drain/flush** it
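The review points above can be checked with a little arithmetic. The sketch below assumes an ideal pipeline with no hazards, clocked at the speed of its slowest stage; the function names and the stage delays are made up for illustration.

```python
def unpipelined_time(stage_times, n_tasks):
    """Total time when each task runs through all stages before the next starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Total time for an ideal pipeline clocked at the slowest stage.

    The first task needs len(stage_times) cycles to fill the pipeline;
    every later task completes one cycle after the previous one.
    """
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


balanced = [2, 2, 2, 2, 2]    # hypothetical stage delays (ns)
unbalanced = [1, 1, 6, 1, 1]  # same total work, one slow stage

for name, stages in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name,
          "| single-task latency:", len(stages) * max(stages),
          "| 100 tasks pipelined:", pipelined_time(stages, 100),
          "| 100 tasks unpipelined:", unpipelined_time(stages, 100))
```

Under these assumptions the balanced pipeline finishes 100 tasks in 208 ns instead of 1000 ns, while the unbalanced one needs 624 ns and its single-task latency grows from 10 ns to 30 ns.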



### **CPU Pipelines II**







### Possible Issues in Implementation

- Structural hazards
  - Different instructions, at different stages, want to use the same hardware resource
- Control hazards
  - Succeeding instruction, to put into pipeline, depends on the outcome of a previous branch instruction already in pipeline
- **Data hazards**
  - An instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline



### Possible Issues in Implementation

- Structural hazards
  - Different instructions, at different stages, want to use the same hardware resource
  - Solution: stalling
    - e.g. when multiple stages of execution want to access the main memory, they are served on a first-come-first-served basis
    - the rest of the stages are "stalled", and have to wait for their turns






### Possible Issues in Implementation

### Control hazards

Succeeding instruction, to put into pipeline, depends on the outcome of a previous branch instruction already in pipeline

- Solution #1: stalling
  - when a branching instruction is fetched into the pipeline, subsequent instructions' fetches are stalled
  - this prevents new instructions from being fetched into the pipeline, and effectively **flushes** the entire pipeline



### Possible Issues in Implementation

### Control hazards

Succeeding instruction, to put into pipeline, depends on the outcome of a previous branch instruction already in pipeline

- Solution #2: static branch prediction
  - Proceed with the pipeline, keep **fetching**. If the conditional branch turns out to be taken (execution actually goes into the branch), then perform a **flush**
  - Think: how is this different from stalling? Is it better or worse?
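One way to approach the question above is to count cycles under a simple model. The sketch below assumes a 5-stage pipeline in which a branch outcome is known at the end of stage 3, so each stall or flush costs 2 cycles; these numbers, the function names, and the instruction counts are assumptions for illustration only.

```python
def cycles_with_stalling(n_instr, n_branches, n_stages=5, resolve_stage=3):
    """Every branch stalls fetch until its outcome is known."""
    penalty = resolve_stage - 1
    return (n_stages + n_instr - 1) + n_branches * penalty


def cycles_with_predict_not_taken(n_instr, n_taken, n_stages=5, resolve_stage=3):
    """Keep fetching; only taken branches flush the wrongly fetched instructions."""
    penalty = resolve_stage - 1
    return (n_stages + n_instr - 1) + n_taken * penalty


# Hypothetical workload: 1000 instructions, 200 branches, 120 of them taken.
print("stalling:         ", cycles_with_stalling(1000, 200))
print("predict not taken:", cycles_with_predict_not_taken(1000, 120))
```

In this model prediction is never worse than stalling: a wrong guess costs the same as a stall, and a right guess costs nothing.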



### Possible Issues in Implementation

### Control hazards

Succeeding instruction, to put into pipeline, depends on the outcome of a previous branch instruction already in pipeline

- Solution #3: delayed branch
  - Requires the compiler to find branching-independent instructions to put right next to the branching statement, so the pipeline can keep executing even when it encounters branching
  - This relies heavily on compilers, doesn't always work



### Possible Issues in Implementation

#### Data hazards

An instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

- Solution: forwarding
  - Create bridges between different stages, so some data can be fast-forwarded to the next stage, in parallel to e.g. register write operations
  - In the event this isn't enough, stall



#### Data hazards

An instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

- Solution: forwarding
  - sub here depends on add, so we create a bridge between the Execution stage and Decoding stage

[Figure: pipeline diagram; axis: program execution order (in instructions)]
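To see how much forwarding helps, here is a minimal sketch that counts stall cycles for a short instruction sequence. It assumes the textbook 5-stage model: without forwarding a dependent instruction must wait until the producer has written back (3 cycles after the producer executes), with forwarding an ALU result is usable 1 cycle later and a loaded value 2 cycles later. The tuple format and the function name are hypothetical.

```python
def count_stalls(program, forwarding):
    """Count stall cycles caused by read-after-write dependencies.

    program: list of (op, dest, sources) tuples, e.g. ("add", "r1", ["r2", "r3"]).
    """
    producer = {}   # register -> (execute cycle of last writer, its op)
    ex_cycle = 0    # cycle in which the previous instruction executed
    stalls = 0
    for op, dest, sources in program:
        earliest = ex_cycle + 1          # execute cycle if nothing stalls
        need = earliest
        for src in sources:
            if src in producer:
                p_cycle, p_op = producer[src]
                if forwarding:
                    delay = 2 if p_op == "lw" else 1
                else:
                    delay = 3            # assumed: wait for write back
                need = max(need, p_cycle + delay)
        stalls += need - earliest
        ex_cycle = need
        if dest is not None:
            producer[dest] = (ex_cycle, op)
    return stalls


add_then_sub = [("add", "r1", ["r2", "r3"]),
                ("sub", "r4", ["r1", "r5"])]   # sub depends on add
print(count_stalls(add_then_sub, forwarding=False))  # 2
print(count_stalls(add_then_sub, forwarding=True))   # 0
```

A load followed immediately by its consumer still costs one stall in this model even with forwarding, which is the "in the event this isn't enough, stall" case.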



### Software Solution

- Is there anything you could do on the software side?
  - Compilers (e.g. gcc, clang, etc.)
    - Code comes in, compiler depends on the CPU architecture, tries to reorder instructions, simplify your code, etc. to prevent issues
    - Compiler flags
      - gcc options: -O1, -O2, -O3
      - These speed things up for you by aggressively doing reordering among other things. Using -O3 could cause issues especially if you are managing memory manually, use with caution. -O2 and -O3 are also not gdb/lldb friendly.
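As a rough illustration of what reordering buys, the sketch below counts load-use stalls before and after moving an independent instruction between a load and its consumer. It assumes a forwarding pipeline where only a load followed immediately by its consumer costs one bubble; the tuple format and function name are hypothetical, and real compilers use far more elaborate scheduling.

```python
def load_use_stalls(program):
    """Count bubbles: one for each instruction that uses a register
    loaded by the instruction immediately before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("lw",  "r1", ["r0"]),
    ("add", "r3", ["r1", "r2"]),   # needs r1 right after the load
    ("lw",  "r4", ["r0"]),
    ("add", "r6", ["r4", "r5"]),   # needs r4 right after the load
]

# Same instructions, with the second load hoisted above the first add.
reordered = [original[0], original[2], original[1], original[3]]

print(load_use_stalls(original), load_use_stalls(reordered))  # 2 0
```

The reordering is only legal because the hoisted load does not touch any register the first add reads or writes.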





### Summary

- Hazards
  - Structural Hazard: Stall
  - Control Hazard: Stall / Branch prediction / Delayed branch
  - Data Hazard: Forwarding / Stall
  - Software: Compiler optimisation, reordering




### **CPU Pipelines III** Simple Pipeline Implementation






### Implementation Our MIPS Example Before

- MIPS CPUs commonly use a 5-stage design
  - Fetch, Decode, Execute
  - Additional stages after Execution: **Memory**, **Write Back**













• In the **Decode** stage, information from the Register Array is retrieved, and saved in <u>S2</u> registers (stage 2 registers) for the ALU to access
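The 5-stage flow can be visualised with a small text diagram. This is a sketch for an ideal pipeline with no hazards; IF, ID, EX, MEM and WB are the conventional abbreviations for Fetch, Decode, Execute, Memory and Write Back, and the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction; column c shows the stage
    that instruction occupies in cycle c + 1 ('.' means not in the pipeline)."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = ["."] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # instruction i enters IF in cycle i + 1
        rows.append(f"{name:<6}" + " ".join(f"{c:<3}" for c in cells))
    return rows


for row in pipeline_diagram(["add", "sub", "lw"]):
    print(row)
```

Reading a column top to bottom shows which instruction sits in which stage during that cycle, which is also where the per-stage registers (such as the S2 registers) come into play.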

























### Implementation How is ARMv7 different?

- Our ARM16 CPU is based on the specification for the ARMv7 Thumb instruction set
- In ARMv7, PC is part of the register array
  - We can accomplish that by:
    - having R7's value (PC) output directly from the register array, separate from Rm\_data, Rn\_data
    - implementing the increment mechanism directly into R7 (why can't we implement it in the ALU?)
- In ARMv7, there's only the main memory. However, since instructions are commonly cached in the instruction cache, we can still preserve the instruction memory module, as if it's actually the Instruction Cache.





### Lab 4 Part 2

Data Bus controller, Memory Array Modification

#### **16bit Memory Controller**

- We need a memory controller that is connected to the data bus and reacts according to the direction
  - Bidirectional: 16bit D16, db
  - Input: 1bit, db\_dir; when db\_dir is 0, data flows into memory, triggering store/mem\_write
  - Input: 16bit, DO from the memory module;
  - Output: 16bit, DI to the memory module;
  - Output: 1bit WE;
- Always, WE <= db\_dir; When db\_dir == 0, DI <= db; When db\_dir == 1, db <= DO;
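Before wiring the circuit, the controller's behaviour can be written down as a small software model. This is only a sketch of the rules listed above, not the lab solution: `None` stands in for Hi-Z, the function name is hypothetical, and leaving DI undriven during a load is one of the options the specification allows.

```python
Z = None  # stands for Hi-Z: the line is not driven


def mem_ctrl(db_dir, db, DO):
    """Return (WE, DI, db_out) for one evaluation of the controller.

    db_out is the value the controller drives onto the data bus,
    or Z when it leaves the bus alone.
    """
    WE = db_dir                 # Always: WE <= db_dir
    if db_dir == 0:             # store: data flows from the bus into memory
        DI, db_out = db, Z      # DI <= db
    else:                       # load: memory output goes onto the bus
        DI, db_out = Z, DO      # db <= DO
    return WE, DI, db_out


print(mem_ctrl(0, 0x1234, 0xBEEF))  # store 0x1234
print(mem_ctrl(1, Z, 0xBEEF))       # load 0xBEEF onto the bus
```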





### **3 State Buffers**

- Component name: Buffer-4 T.S.
- This is a 4bit 3 state buffer
- You can use this to control whether a line is connected or not
- Remember, Z here means Hi-Z (no connection, open circuit)
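A tri-state buffer is easy to model in software, which helps when reasoning about who drives a shared line. The sketch below uses `None` for Hi-Z and assumes an active-high enable; the actual enable polarity of the Buffer-4 T.S. component should be checked in the simulator, and the function names are hypothetical.

```python
Z = None  # Hi-Z: no connection, open circuit


def tri_state_buffer(data, enable):
    """Pass data through when enabled, otherwise leave the output at Hi-Z."""
    return data if enable else Z


def resolve_bus(*drivers):
    """Value seen on a shared line: Z if nobody drives it,
    the single driver's value otherwise."""
    active = [d for d in drivers if d is not Z]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0] if active else Z


a = tri_state_buffer(0b1010, enable=1)
b = tri_state_buffer(0b0110, enable=0)
print(resolve_bus(a, b))  # 10, only the first buffer drives the line
```

Enabling both buffers at once raises an error in this model, mirroring the short circuit that two active drivers would cause on a real bus.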





## 4bit Memory Controller

- db\_dir == 0
  - Memory is in write/store mode WE <= 0
  - Store value: DI <= db
  - DO receives Z
- db\_dir == 1
  - Memory is in read/load mode WE <= 1
  - DI receives db or Z
  - db <= DO





### Data Bus Controller

- We need a data bus controller just like the one in our MIPS example
  - Bidirectional: 16bit D16, db
  - Input: 16bit D16, ALU from ALU; 16bit D16, Rx\_data, from Register array
  - Input: 1bit db\_dir
  - Output: 16bit D16 Rd\_data
- When db\_dir == 0, db <= Rx\_data; Rd\_data <= ALU; When db\_dir == 1, Rd\_data <= db;
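The data bus controller rules can also be captured in a short behavioural model. As before this is a sketch rather than the lab solution: `None` stands in for Hi-Z and the function name is hypothetical.

```python
Z = None  # Hi-Z: the controller does not drive the bus


def db_ctrl(db_dir, db, ALU, Rx_data):
    """Return (db_out, Rd_data) for one evaluation of the controller.

    db_out is what the controller drives onto the data bus (Z if nothing).
    """
    if db_dir == 0:
        return Rx_data, ALU   # db <= Rx_data; Rd_data <= ALU
    return Z, db              # Rd_data <= db; bus is driven by memory instead


print(db_ctrl(0, Z, ALU=0x0007, Rx_data=0x1234))   # store path
print(db_ctrl(1, 0xBEEF, ALU=0x0007, Rx_data=0))   # load path
```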



### db\_ctrl and mem\_ctrl tested together






### Register Array Modification

- You need to modify your register array, such that
  - R7's value is updated every CLK, either loaded from Rt\_data, or incremented to R7 + 1
  - Add an Rx\_data output bus, selected by Rx
  - Add a 16bit PC output bus, always showing the value of R7
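The required behaviour can be prototyped in software before changing the circuit. The sketch below assumes eight 16bit registers (R0 to R7) and a write-select input called `Rt`; the class and method names are hypothetical and the real module's ports may differ.

```python
class RegisterArray:
    """Behavioural model of a register array where R7 doubles as the PC."""

    def __init__(self):
        self.regs = [0] * 8          # R0..R7, assumed 16 bits each

    def clock(self, Rt=None, Rt_data=0):
        """One CLK edge. Rt is the register to write, or None for no write."""
        next_pc = (self.regs[7] + 1) & 0xFFFF
        if Rt is not None:
            self.regs[Rt] = Rt_data & 0xFFFF
        if Rt != 7:                  # R7 not written this cycle: increment it
            self.regs[7] = next_pc

    def read(self, Rx):
        """Rx_data output bus, selected by Rx."""
        return self.regs[Rx]

    @property
    def PC(self):
        """PC output bus, always showing the value of R7."""
        return self.regs[7]


ra = RegisterArray()
ra.clock()                           # PC: 0 -> 1
ra.clock(Rt=2, Rt_data=0xABCD)       # write R2, PC: 1 -> 2
ra.clock(Rt=7, Rt_data=0x0100)       # branch: PC loaded from Rt_data
print(hex(ra.read(2)), hex(ra.PC))   # 0xabcd 0x100
```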

