DDCArv_Ch6.pdf

Summary

# Introduction to computer architecture and assembly language

This section introduces computer architecture as the programmer's perspective of a computer system, emphasizing its instruction set and operand locations, and contrasts it with microarchitecture, while also detailing assembly language and the RISC-V architecture's origins and design principles [3](#page=3).

### 1.1 Computer architecture: the programmer's view

Computer architecture is defined as the programmer's view of a computer. It is characterized by the set of instructions the computer understands and where it can find the data (operand locations) that these instructions operate on. This contrasts with microarchitecture, which focuses on the hardware implementation of that architecture [3](#page=3).

### 1.2 Assembly language and machine language

* **Instructions:** These are the fundamental commands that a computer executes [4](#page=4).
* **Assembly language:** This is a human-readable representation of machine instructions, making it easier for programmers to write and understand code [4](#page=4).
* **Machine language:** This is the computer-readable format of instructions, typically represented as sequences of ones and zeros [4](#page=4).

### 1.3 The RISC-V architecture

The RISC-V architecture was developed at the University of California, Berkeley, starting in 2010 by Krste Asanovic, David Patterson, and their colleagues. It is recognized as the first widely accepted open-source computer architecture [4](#page=4).

#### 1.3.1 Key figures in RISC-V development

* **Krste Asanovic:** A Professor of Computer Science at UC Berkeley, he developed RISC-V during a summer and is the Chairman of the Board of the RISC-V Foundation. He is also a co-founder of SiFive [5](#page=5).
* **Andrew Waterman:** A co-founder of SiFive, Waterman was instrumental in co-designing the RISC-V architecture and its first cores, driven by dissatisfaction with existing instruction set architectures (ISAs).
  He earned his PhD from UC Berkeley in 2016 [6](#page=6).
* **David Patterson:** A Professor of Computer Science at UC Berkeley since 1976, Patterson, along with John Hennessy, is credited with coinventing the Reduced Instruction Set Computer (RISC) architecture in the 1980s. He was a founding member of the RISC-V team and received the Turing Award with John Hennessy for their pioneering work in the quantitative design and evaluation of computer architectures [7](#page=7).
* **John Hennessy:** Served as President of Stanford University from 2000 to 2016 and has been a Professor of Electrical Engineering and Computer Science at Stanford since 1977. He also coinvented RISC with David Patterson and shared the Turing Award with him for their contributions to computer architecture [8](#page=8).

### 1.4 Architecture design principles

Hennessy and Patterson articulated several underlying design principles for computer architectures [9](#page=9):

1. **Simplicity favors regularity:** Consistent and predictable design choices simplify the architecture [9](#page=9).
2. **Make the common case fast:** Prioritize performance for the most frequently occurring operations [9](#page=9).
3. **Smaller is faster:** Generally, simpler and smaller designs lead to faster execution [9](#page=9).
4. **Good design demands good compromises:** Effective architectural design often involves balancing competing requirements and making trade-offs [9](#page=9).

> **Tip:** Understanding the distinction between architecture and microarchitecture is crucial. Architecture defines *what* the programmer sees and interacts with, while microarchitecture defines *how* that architecture is physically implemented in hardware. This study guide focuses on the former.
>
> **Tip:** Learning about RISC-V is valuable because its open-source nature makes it accessible and fosters a deeper understanding of computer instruction sets.
Once you master one architecture, learning others becomes significantly easier [4](#page=4).

---

# RISC-V instruction set architecture: operands and instructions

This topic delves into the fundamental components of the RISC-V instruction set, covering its arithmetic and logical instructions, operand types, and memory access mechanisms, all while emphasizing core RISC design principles [10](#page=10) [11](#page=11) [16](#page=16).

### 2.1 Core instruction set principles

RISC-V adheres to design principles that prioritize simplicity and efficiency for common operations [15](#page=15).

#### 2.1.1 Simplicity favors regularity

A key design principle is that simplicity favors regularity. This means RISC-V uses a consistent instruction format, with most arithmetic and logical instructions having two source operands and one destination operand. This regularity makes instructions easier to encode and handle in hardware [13](#page=13).

#### 2.1.2 Make the common case fast

Another crucial principle is to make the common case fast. RISC-V achieves this by including a small set of simple, frequently used instructions. This allows for simple, small, and fast hardware for decoding and executing these instructions. Less common, more complex instructions are implemented by combining multiple simple RISC-V instructions. This contrasts with Complex Instruction Set Computers (CISC) like Intel's x86, which feature a larger, more complex set of instructions [15](#page=15).

#### 2.1.3 Smaller is faster

The principle of "smaller is faster" is also evident in RISC-V's design, particularly in its limited number of registers [19](#page=19).

### 2.2 Instruction types

RISC-V provides fundamental instructions for arithmetic, logical operations, and memory access.

#### 2.2.1 Arithmetic and logical instructions

Basic arithmetic operations are supported with straightforward mnemonics.

* **Addition:** The `add` instruction performs addition.
  It takes two source operands and writes the result to a destination operand [11](#page=11).
    * C Code: `a = b + c;`
    * RISC-V Assembly: `add a, b, c` [11](#page=11).
* **Subtraction:** The `sub` instruction performs subtraction, following the same operand structure as addition [12](#page=12).
    * C Code: `a = b - c;`
    * RISC-V Assembly: `sub a, b, c` [12](#page=12).

#### 2.2.2 Handling complex operations

More complex operations, which might be single instructions in CISC architectures, are decomposed into multiple RISC-V instructions [14](#page=14).

* **Example:** `a = b + c - d;`
    * RISC-V Assembly:

      ```assembly
      add t, b, c   # t = b + c
      sub a, t, d   # a = t - d
      ```

### 2.3 Operands

Operands are the physical locations from which data is fetched or to which data is written. In RISC-V, these include registers, memory locations, and constants (immediates) [17](#page=17).

#### 2.3.1 Registers

RISC-V has 32 32-bit registers, which are much faster to access than memory. The architecture is referred to as "32-bit" because it primarily operates on 32-bit data [18](#page=18).

##### 2.3.1.1 RISC-V register set

The RISC-V register set has specific names and numbers assigned to each register, along with their designated usage [20](#page=20).

| Name | Register Number | Usage |
| :--- | :--- | :--- |
| `zero` | `x0` | Constant value 0 |
| `ra` | `x1` | Return address |
| `sp` | `x2` | Stack pointer |
| `gp` | `x3` | Global pointer |
| `tp` | `x4` | Thread pointer |
| `t0`-`t2` | `x5`-`x7` | Temporaries |
| `s0`/`fp` | `x8` | Saved register / Frame pointer |
| `s1` | `x9` | Saved register |
| `a0`-`a1` | `x10`-`x11` | Function arguments / return values |
| `a2`-`a7` | `x12`-`x17` | Function arguments |
| `s2`-`s11` | `x18`-`x27` | Saved registers |
| `t3`-`t6` | `x28`-`x31` | Temporaries |

##### 2.3.1.2 Register usage conventions

Registers can be referred to by their names (e.g., `ra`, `zero`) or their numbers (e.g., `x1`, `x0`). Using names is generally preferred for readability. Specific registers have conventional uses [21](#page=21):

* `zero` always holds the constant value 0 [21](#page=21).
* Saved registers (`s0`-`s11`) are used to preserve variable values across function calls [21](#page=21).
* Temporary registers (`t0`-`t6`) are used for intermediate values during calculations [21](#page=21).

##### 2.3.1.3 Instructions involving registers

Instructions that operate on registers use the defined naming conventions [22](#page=22).

* **Example:** `a = b + c;` (where `a` maps to `s0`, `b` to `s1`, and `c` to `s2`)
    * RISC-V Assembly: `add s0, s1, s2` [22](#page=22).

#### 2.3.2 Constants (Immediates)

Constants, also known as immediates, are literal values embedded directly within instructions.

* **Instructions with constants:** The `addi` instruction allows adding a constant to a register [23](#page=23).
    * C Code: `a = b + 6;` (where `a` maps to `s0` and `b` to `s1`)
    * RISC-V Assembly: `addi s0, s1, 6` [23](#page=23).
* **Generating 12-bit constants:** The `addi` instruction can handle 12-bit signed constants, that is, values from -2048 to 2047. Any constant outside this range cannot be generated with a single `addi` [37](#page=37).
    * **Example:** `int a = -372; int b = a + 6;` (where `a` maps to `s0` and `b` to `s1`)
    * RISC-V Assembly:

      ```assembly
      addi s0, zero, -372
      addi s1, s0, 6
      ```
* **Generating 32-bit constants:** Larger constants require a combination of instructions. The `lui` (load upper immediate) instruction and `addi` are used together [38](#page=38).
    * The `lui` instruction places an immediate value into the upper 20 bits of a destination register, zeroing out the lower 12 bits [38](#page=38).
    * The `addi` instruction is then used to add the lower 12 bits of the constant. It is important to note that `addi` sign-extends its 12-bit immediate [38](#page=38).
    * **Example:** `int a = 0xFEDC8765;` (where `a` maps to `s0`)
    * RISC-V Assembly:

      ```assembly
      lui  s0, 0xFEDC8
      addi s0, s0, 0x765
      ```
* **Handling constants with bit 11 set to 1:** If bit 11 of the 32-bit constant is 1, `addi` sign-extends the lower 12 bits to a negative number. To compensate, the upper 20 bits loaded by `lui` must be incremented by 1 [39](#page=39).
    * **Example:** `int a = 0xFEDC8EAB;` (where `a` maps to `s0`)
    * RISC-V Assembly:

      ```assembly
      lui  s0, 0xFEDC9     # s0 = 0xFEDC9000
      addi s0, s0, -341    # s0 = 0xFEDC9000 + 0xFFFFFEAB = 0xFEDC8EAB
      ```

      (Note: the lower 12 bits, `0xEAB`, are the 12-bit two's complement encoding of -341, which sign-extends to `0xFFFFFEAB`) [39](#page=39).

#### 2.3.3 Memory operands

Data that does not fit in the registers is stored in memory. Memory is large, but it is slower than registers [25](#page=25).

##### 2.3.3.1 Memory addressing schemes

Memory can be modeled as either word-addressable or byte-addressable. RISC-V itself is byte-addressable [26](#page=26).

* **Word-addressable memory:** In this scheme, each 32-bit word has a unique address. A word in this context is 4 bytes wide [27](#page=27).
    * **Load Word (`lw`):** To read a word from memory, the `lw` instruction is used.
      It specifies a destination register and an address calculated by adding an offset to a base register [28](#page=28).
        * Format: `lw destination, offset(base)` [28](#page=28).
        * Address Calculation: `address = base + offset` [28](#page=28).
        * **Example:** Reading memory word 1 into `s3`. The address is calculated as `(zero + 1) = 1` [29](#page=29).
        * RISC-V Assembly: `lw s3, 1(zero)` [29](#page=29).
    * **Store Word (`sw`):** To write a word to memory, the `sw` instruction is used. It takes a source register (whose value is to be stored) and an address calculated similarly to `lw` [30](#page=30) [31](#page=31).
        * Format: `sw source, offset(base)` [31](#page=31).
        * **Example:** Storing the value in `t4` into memory word 3. The address is `(zero + 0x3) = 3` [31](#page=31).
        * RISC-V Assembly: `sw t4, 0x3(zero)` [31](#page=31).
        * The offset can be specified in decimal or hexadecimal [31](#page=31).
* **Byte-addressable memory:** In this scheme, each individual byte has a unique address. Since a 32-bit word consists of 4 bytes, word addresses increment by 4. RISC-V operates on this model [32](#page=32) [33](#page=33).
    * **Address Calculation in Byte-Addressable Memory:** For `lw` and `sw` in a byte-addressable system, the word number must be multiplied by 4 to get the byte address. For instance, memory word 2 is at byte address `2 * 4 = 8` [33](#page=33).
    * **Example (Load Word):** Loading the word at byte address 8 (word 2) into `s3` [34](#page=34).
    * RISC-V Assembly: `lw s3, 8(zero)` [34](#page=34).
    * **Example (Store Word):** Storing the value from `t6` into byte address `0x10` (which is word 4) [35](#page=35).
    * RISC-V Assembly: `sw t6, 0x10(zero)` [35](#page=35).
    * **Load Byte (`lb`) and Store Byte (`sb`):** RISC-V also supports instructions for loading and storing individual bytes from or to memory, such as `lb` and `sb` [32](#page=32).
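The byte-addressing rule above can be sketched as a short sequence. This is an illustrative fragment, not a complete program: it assumes, as the examples in this section do, that low addresses reached through `zero` are valid data memory, and the register choices (`s3`, `t0`) are arbitrary.

```assembly
# Assumption: byte addresses 0 to 19 are readable and writable data memory.
lw   s3, 8(zero)      # s3 = memory word 2 (byte address 2 * 4 = 8)
addi s3, s3, 1        # change the value while it is in a register
sw   s3, 0x10(zero)   # memory word 4 (byte address 0x10 = 16) = s3
lb   t0, 9(zero)      # t0 = sign-extended byte at address 9 (inside word 2)
sb   t0, 0x13(zero)   # write the low byte of t0 to address 0x13 (inside word 4)
```

Note that the `lw` and `sw` addresses are multiples of 4, while `lb` and `sb` can name any individual byte.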

---

# Control flow in RISC-V: branches, jumps, loops, and function calls

This section details how RISC-V processors manage program execution flow through branches, jumps, and function call mechanisms, and how these instructions facilitate the implementation of high-level control structures [41](#page=41) [51](#page=51).

### 3.1 Branching instructions

Branching allows the processor to execute instructions out of their sequential order. RISC-V supports two main types of branches: conditional and unconditional [52](#page=52).

#### 3.1.1 Conditional branching

A conditional branch jumps to a different instruction address only if a specified condition is met. Common conditional branch instructions include:

* `beq rs1, rs2, label`: Branch if equal [52](#page=52).
* `bne rs1, rs2, label`: Branch if not equal [52](#page=52).
* `blt rs1, rs2, label`: Branch if less than [52](#page=52).
* `bge rs1, rs2, label`: Branch if greater than or equal [52](#page=52).

These instructions compare the values in two source registers (`rs1` and `rs2`) and, if the condition is true, the program counter (PC) is updated to the address specified by `label`. If the condition is false, execution continues to the next instruction in sequence. Labels are symbolic names that represent instruction addresses; where a label is defined, it is followed by a colon (`:`) [53](#page=53) [54](#page=54).

> **Tip:** Assembly code often tests the opposite condition of the high-level language construct to simplify branching logic (e.g., testing `i != j` to implement an `if (i == j)` statement) [58](#page=58) [60](#page=60).

#### 3.1.2 Unconditional branching

Unconditional branches always redirect program execution to a specified address. The primary unconditional branch instruction is:

* `j label`: Jump [52](#page=52).

This instruction directly changes the PC to the address indicated by `label`, causing any instructions between the `j` instruction and the `label` to be skipped [55](#page=55).
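As a small illustration of a taken conditional branch followed by an unconditional jump, consider the fragment below. The label names `target` and `done` and the register values are chosen only for this sketch.

```assembly
    addi s0, zero, 4      # s0 = 4
    addi s1, zero, 1      # s1 = 1
    slli s1, s1, 2        # s1 = 1 << 2 = 4
    beq  s0, s1, target   # s0 == s1, so the branch is taken
    addi s1, s1, 1        # skipped
    sub  s1, s1, s0       # skipped
target:                   # label: names the address of the next instruction
    add  s1, s1, s0       # s1 = 4 + 4 = 8
    j    done             # unconditional: always skips the next instruction
    addi s1, zero, 0      # skipped
done:
```

Had `s0` and `s1` differed, the `beq` would have fallen through and the `addi` and `sub` after it would have executed.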
### 3.2 Jumps and Jumps with Link

Jumps, particularly "jump and link" instructions, are crucial for implementing function calls and more complex control flow. RISC-V has two primary jump instructions:

* `jal rd, imm20:0`: Jump and link. This instruction saves the address of the next instruction (PC + 4) into a destination register (`rd`) and then jumps to the address given by the PC plus the sign-extended immediate offset. It is the instruction used to call a function; the saved address is what later allows the function to return.
* `jalr rd, rs, imm11:0`: Jump and link register. This instruction saves the address of the next instruction (PC + 4) into `rd`, and then updates the PC to the sum of the value in a source register (`rs`) and a sign-extended immediate offset.

### 3.3 Pseudoinstructions

Pseudoinstructions are convenient mnemonics that the assembler translates into actual RISC-V machine instructions. They simplify programming by providing higher-level abstractions.

#### 3.3.1 Jump pseudoinstructions

Several pseudoinstructions facilitate jumps:

* `j label`: Translates to `jal x0, label`. It performs an unconditional jump to `label` without saving the return address.
* `jal label`: Translates to `jal ra, label`. It jumps to `label` and saves the return address in the `ra` (return address) register.
* `jr rs`: Translates to `jalr x0, rs, 0`. It jumps to the address in register `rs` without saving the return address.
* `ret`: Translates to `jr ra` or `jalr x0, ra, 0`. This is the standard instruction for returning from a function.

Labels are used to specify jump targets, and the immediate offset is calculated as the number of bytes from the jump instruction to the label.

#### 3.3.2 Long jumps

The immediate offsets in `jal` (20 encoded bits, giving a 21-bit signed offset of about ±1 MiB) and `jalr` (12 bits) limit the range of jumps. For longer jumps, RISC-V uses the `auipc rd, imm` instruction, which adds an upper immediate to the PC (`rd = PC + {imm31:12, 12'b0}`). The pseudoinstruction `call imm31:0` combines `auipc` and `jalr` to achieve a 32-bit offset jump.
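The pseudoinstruction expansions can be seen side by side in a minimal call-and-return sketch. The labels `main`, `simple`, and `done` are hypothetical, and the three-operand `jalr` syntax follows the form used in this guide; some assemblers prefer writing it as `jalr x0, 0(ra)`.

```assembly
main:
    jal  ra, simple     # same as `jal simple`: ra = PC + 4, then jump to simple
    add  s0, s1, s2     # execution resumes here after simple returns
    jal  x0, done       # same as `j done`: writing to x0 discards the link address
simple:
    jalr x0, ra, 0      # same as `jr ra` and `ret`: jump to the address held in ra
done:
```

Because `x0` is hardwired to zero, using it as `rd` turns a jump-and-link into a plain jump, which is why no separate jump instruction is needed in the base instruction set.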
### 3.4 Implementing Control Structures

Branch and jump instructions are the building blocks for high-level control structures like conditional statements and loops [57](#page=57).

#### 3.4.1 If statements

An `if` statement, such as `if (i == j) f = g + h;`, can be implemented by branching over the code block if the condition is *not* met. For instance, to execute `add s0, s1, s2` when `s3` (i) equals `s4` (j), one would use `bne s3, s4, L1` to skip the `add` if `i` is not equal to `j`. The code then continues at `L1`, after the `if` block [58](#page=58).

#### 3.4.2 If/Else statements

`if/else` statements require an unconditional jump to skip the `else` block if the `if` condition is true. For `if (i == j) f = g + h; else f = f - i;` [59](#page=59):

1. Use `bne s3, s4, L1` to branch to `L1` if `i != j`.
2. If `i == j`, execute `add s0, s1, s2`.
3. Immediately after the `add`, use `j done` to skip the `else` block.
4. At `L1`, execute the `else` block: `sub s0, s0, s3`.
5. The code then proceeds from `done:`.

#### 3.4.3 While loops

A `while` loop typically checks a condition at the beginning of each iteration. For `while (pow != 128)`, the loop structure is:

1. Check the loop condition using a branch (e.g., `beq s0, t0, done` if `s0` is `pow` and `t0` is 128). If `pow` equals 128, exit the loop.
2. Execute the loop body (e.g., `slli s0, s0, 1` for `pow = pow * 2` and `addi s1, s1, 1` for `x = x + 1`).
3. Use an unconditional jump (`j while`) to return to the loop condition check.
4. The `done:` label marks the exit point of the loop [60](#page=60).

> **Tip:** Similar to `if` statements, `while` loops in assembly often test the inverse condition to jump out of the loop when the termination condition is met [60](#page=60).

#### 3.4.4 For loops

A `for` loop has three parts, initialization, condition, and loop operation, plus the statement that forms its body [61](#page=61).

* **Initialization:** Executes once before the loop begins.
* **Condition:** Tested at the start of each iteration.
* **Loop Operation:** Executes at the end of each iteration.
* **Statement:** The loop body.

A standard `for` loop like `for (i=0; i!=10; i = i+1) { sum = sum + i; }` can be implemented as:

1. **Initialization:** `addi s1, zero, 0` (sum=0), `addi s0, zero, 0` (i=0), with `t0` set to 10 for the comparison.
2. **Condition:** `beq s0, t0, done` (if i == 10, branch to `done`).
3. **Statement:** `add s1, s1, s0` (sum = sum + i).
4. **Loop Operation:** `addi s0, s0, 1` (i = i + 1).
5. **Jump back:** `j for` to re-evaluate the condition [62](#page=62).

For loops involving "less than" conditions, such as `for (i=1; i < 101; i = i*2)`, instructions like `bge` (branch if greater or equal) or `slt` (set if less than) can be used [63](#page=63) [64](#page=64).

* Using `bge`: `bge s0, t0, done` where `s0` is `i`, `t0` is 101. If `i` is greater than or equal to 101, exit the loop.
* Using `slt`: `slt t2, s0, t0` sets `t2` to 1 if `s0 < t0` and to 0 otherwise. Then, `beq t2, zero, done` checks if `t2` is zero (meaning `s0` was not less than `t0`), exiting the loop if it is [64](#page=64).

#### 3.4.5 Arrays

Arrays provide access to large amounts of similar data using an index. Accessing an array element involves [66](#page=66):

1. Loading the base address of the array into a register [67](#page=67).
2. Calculating the byte offset for the desired element: `byte_offset = index * element_size`. For 4-byte words, this is `index * 4`, which can be efficiently done with a left shift: `slli t0, s1, 2` where `s1` is the index [70](#page=70).
3. Calculating the memory address: `address = base_address + byte_offset`.
4. Using `lw` (load word) or `sw` (store word) to access the element at that address [68](#page=68) [70](#page=70).

When accessing arrays of characters (strings), the null terminator character (`\0`) is often used to determine the end of the string. A `while` loop can iterate until it encounters this null character, incrementing a length counter [73](#page=73).
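The array-access steps and the `for` loop pattern can be combined in one sketch that multiplies every element of a word array by 8. The base address `0x123B4780`, the length 1000, and the register assignments are assumptions made for illustration only.

```assembly
# Assumptions: s0 = array base address, s1 = i; the base address
# 0x123B4780 and the length 1000 are made up for this example.
    lui  s0, 0x123B4       # s0 = 0x123B4000
    addi s0, s0, 0x780     # s0 = 0x123B4780 (bit 11 of 0x780 is 0, no adjustment)
    addi s1, zero, 0       # i = 0
    addi t2, zero, 1000    # t2 = 1000 (loop bound)
for:
    bge  s1, t2, done      # inverse test: exit when i >= 1000
    slli t0, s1, 2         # t0 = i * 4 (byte offset)
    add  t0, t0, s0        # t0 = address of array[i]
    lw   t1, 0(t0)         # t1 = array[i]
    slli t1, t1, 3         # t1 = array[i] * 8
    sw   t1, 0(t0)         # array[i] = t1
    addi s1, s1, 1         # i = i + 1
    j    for               # back to the condition check
done:
```

The loop tests `i >= 1000` with `bge` rather than `i < 1000`, following the inverse-condition pattern described in the tips above.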
### 3.5 Function Calls

Function calls enable modularity and code reuse by allowing one part of the program (the caller) to execute another part (the callee) and then return [75](#page=75).

#### 3.5.1 Calling conventions

A function call follows a set of conventions:

* **Caller:** Passes arguments to the callee and then jumps to the callee's address. After the callee returns, the caller retrieves the return value and resumes execution.
* **Callee:** Performs the function's task, returns a result (if any), and returns control to the caller at the correct instruction. Crucially, the callee must not modify registers or memory that the caller relies on without saving and restoring them [77](#page=77).

#### 3.5.2 RISC-V function calling conventions

RISC-V specifies these conventions:

* **Function Call:** `jal func` (jump and link to the function named `func`) [78](#page=78).
* **Function Return:** `jr ra` (jump register, returning to the address stored in `ra`) [78](#page=78).
* **Arguments:** Passed in registers `a0` through `a7` [78](#page=78).
* **Return Value:** Placed in register `a0` [78](#page=78).

For functions with more than eight arguments, subsequent arguments are typically passed on the stack [79](#page=79) [80](#page=80).

> **Example:** To call a function `diffofsums` with arguments 2, 3, 4, and 5, the caller would load these values into `a0`, `a1`, `a2`, and `a3` respectively, then execute `jal diffofsums`. The result returned in `a0` would then be copied into a saved register such as `s7` [80](#page=80).

#### 3.5.3 Register usage and the stack

The callee must preserve any registers that the caller expects to be unchanged after the call. Registers are therefore divided into two categories:

* **Caller-saved (Non-preserved):** `t0`-`t6` and `a0`-`a7`. The caller is responsible for saving these if it needs to retain their values across a function call [88](#page=88).
* **Callee-saved (Preserved):** `s0`-`s11` and `sp`.
  The callee *must* save these registers if it intends to use them and restore them before returning [88](#page=88).

The stack is used to save registers that need to be preserved by the callee. The stack grows downwards from higher to lower memory addresses, and the stack pointer (`sp`) points to the top of the stack [81](#page=81) [83](#page=83) [84](#page=84) [85](#page=85).

To save and restore registers on the stack:

1. **Make space:** Decrement the stack pointer by the total size of the data to be stored (e.g., `addi sp, sp, -12` to save three 4-byte words) [86](#page=86).
2. **Store:** Use `sw` (store word) to save register values at offsets relative to `sp` (e.g., `sw s3, 8(sp)`) [86](#page=86).
3. **Perform computation.**
4. **Restore:** Use `lw` (load word) to retrieve saved values from the stack.
5. **Deallocate space:** Increment the stack pointer by the amount it was decremented (e.g., `addi sp, sp, 12`) [86](#page=86).

> **Tip:** When a function needs to use a callee-saved register (`s0`-`s11`), it must save its original value on the stack *before* using it and restore it *after* it has finished using it and before returning [89](#page=89).

#### 3.5.4 Non-leaf function calls

A non-leaf function is one that calls another function. Before calling another function, a non-leaf function must save its own return address (`ra`) on the stack, in addition to any other registers it needs to preserve. This is because the `jal` that calls the next function overwrites `ra` with the address at which that callee must return into the *current* function, destroying the current function's own return address. After the called function returns, the non-leaf function restores `ra` and any other saved registers before returning to its own caller [91](#page=91) [92](#page=92) [93](#page=93).

> **Example:** If `f1` calls `f2`, `f1` must save `ra`, `s4`, and `s5` on the stack before calling `f2`. `f2` might also use the stack to save its own registers, like `s4`.
When `f2` returns, `f1` restores its saved registers (including `ra`) and then returns to its caller [92](#page=92) [93](#page=93).

#### 3.5.5 Recursive functions

A recursive function is one that calls itself. Implementing a recursive function in assembly involves a two-pass approach:

1. **Pass 1:** Treat the recursive call as if it were a call to a different function, ignoring potential register overwrites and stack usage for now [96](#page=96) [99](#page=99).
2. **Pass 2:** Identify which registers are overwritten by the recursive call and are needed *after* the call returns. These registers, along with the return address (`ra`), must be saved on the stack before the recursive call and restored after [99](#page=99) [100](#page=100).

For a factorial function, `factorial(n) = n * factorial(n-1)`, and the base case is `factorial(1) = 1` [97](#page=97) [98](#page=98).

* The argument `n` is passed in `a0`.
* If `n <= 1`, return 1 by placing 1 in `a0` and executing `jr ra`.
* If `n > 1`, the function needs to:
    1. Save `a0` (the current `n`) and `ra` on the stack [100](#page=100).
    2. Decrement `a0` to `n-1` for the recursive call: `addi a0, a0, -1`.
    3. Make the recursive call: `jal factorial`.
    4. After the call returns, `a0` holds `factorial(n-1)`. Restore the original `n` from the stack into a temporary register (e.g., `t1`) [100](#page=100).
    5. Restore `ra` from the stack.
    6. Deallocate stack space.
    7. Calculate the final result: `mul a0, t1, a0` (n * factorial(n-1)) [100](#page=100).
    8. Return: `jr ra`.

The stack grows with each recursive call and shrinks as each call returns, creating nested stack frames for each invocation.

---

# RISC-V machine language and instruction formats

This topic details the binary representation of RISC-V instructions, including their various formats, fields, and how assembly instructions translate to machine code.

### 4.1 Introduction to machine language

Computers fundamentally understand only binary sequences (1s and 0s).
Machine language is the direct binary representation of instructions that a processor can execute. RISC-V instructions are typically 32 bits long, a design choice that supports regularity in both data and instruction sizes. To keep decoding simple, RISC-V defines four primary instruction formats: R-Type, I-Type, S/B-Type, and U/J-Type.

### 4.2 Instruction formats

The RISC-V instruction set architecture (ISA) employs several distinct instruction formats, each designed to efficiently encode different types of operations and operand addressing. These formats share common fields like the opcode, which dictates the basic operation, but differ in the arrangement and interpretation of other fields, particularly for immediate values and register operands.

#### 4.2.1 R-Type format

The R-Type (Register-Type) format is used for instructions that operate on three register operands: two source registers (`rs1`, `rs2`) and one destination register (`rd`). It also includes `funct7` and `funct3` fields, which, in conjunction with the `opcode`, specify the exact operation to be performed.

**Structure:** The R-Type format is structured as follows:

| `funct7` (7 bits) | `rs2` (5 bits) | `rs1` (5 bits) | `funct3` (3 bits) | `rd` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |

**Example:** The `add x18, x19, x20` assembly instruction translates to an R-Type word in which `rs1` is 19 (`x19`), `rs2` is 20 (`x20`), `rd` is 18 (`x18`), and the `opcode` (`0110011`), `funct3` (`000`), and `funct7` (`0000000`) fields select addition, giving the machine code `0x01498933`.

#### 4.2.2 I-Type format

The I-Type (Immediate-Type) format is used for instructions that involve two register operands (`rs1`, `rd`) and a 12-bit signed immediate value. This format is commonly used for operations like adding an immediate value to a register (`addi`) or for load instructions (`lw`).

**Structure:** The I-Type format is structured as follows:

| `imm[11:0]` (12 bits) | `rs1` (5 bits) | `funct3` (3 bits) | `rd` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- | :--- | :--- |
| 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |

**Example:** The `addi x8, x9, 12` instruction uses the I-Type format. The immediate value `12` is encoded in the `imm[11:0]` field, `rs1` points to `x9`, and `rd` points to `x8`. Load instructions like `lw t2, -6(s3)` also use this format, where the immediate is an offset from the base address in `rs1`.

#### 4.2.3 S/B-Type format

The S/B-Type formats are used for Store and Branch instructions, respectively. They share a common structure but differ in their immediate encoding and intended use.

##### 4.2.3.1 S-Type format (Store-Type)

The S-Type format is used for store instructions that write data from a register (`rs2`) to a memory location. The memory address is calculated using a base register (`rs1`) and a 12-bit signed immediate offset.

**Structure:** The S-Type format is structured as follows:

| `imm[11:5]` (7 bits) | `rs2` (5 bits) | `rs1` (5 bits) | `funct3` (3 bits) | `imm[4:0]` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |

**Example:** The `sw x7, -6(x19)` instruction uses the S-Type format. `rs1` (x19) holds the base address, `rs2` (x7) holds the value to be stored, and the immediate `-6` is split into two parts (`imm[11:5]` and `imm[4:0]`) to form the full 12-bit offset.

##### 4.2.3.2 B-Type format (Branch-Type)

The B-Type format is used for conditional branch instructions. It involves two source registers (`rs1`, `rs2`) and a 13-bit signed immediate value that represents an offset from the current program counter (PC) to the target branch address.
The immediate is split across two fields so that the two source registers stay in the same bit positions as in the S-Type format. Bit 0 of the offset is always 0 and is not stored.

**Structure:** The B-Type format is structured as follows:

| `imm[12]`, `imm[10:5]` (7 bits) | `rs2` (5 bits) | `rs1` (5 bits) | `funct3` (3 bits) | `imm[4:1]`, `imm[11]` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |

**Example:** For a `beq x8, x30, L1` instruction, the `imm` field encodes the offset to the label `L1`. The immediate value is calculated relative to the PC of the branch instruction itself.

#### 4.2.4 U/J-Type format

The U/J-Type formats cater to instructions that manipulate larger immediate values for setting register contents or for unconditional jumps.

##### 4.2.4.1 U-Type format (Upper-Immediate-Type)

The U-Type format is primarily used for the `lui` (load upper immediate) instruction. It loads the upper 20 bits of a 32-bit immediate into a destination register (`rd`), with the lower 12 bits implicitly set to zero.

**Structure:** The U-Type format is structured as follows:

| `imm[31:12]` (20 bits) | `rd` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- |
| 31:12 | 11:7 | 6:0 |

**Example:** The `lui x21, 0x8CDEF` instruction places `0x8CDEF` in the upper 20 bits of register `x21`, so `x21` becomes `0x8CDEF000`. The immediate `0x8CDEF` occupies the most significant 20 bits of the instruction.

##### 4.2.4.2 J-Type format (Jump-Type)

The J-Type format is used for unconditional jump instructions, specifically `jal` (jump and link). It encodes a 20-bit immediate value that forms a 21-bit offset relative to the PC (bit 0 is always 0 and is not stored). It also specifies a destination register (`rd`) for storing the return address.

**Structure:** The J-Type format is structured as follows:

| `imm[20]`, `imm[10:1]`, `imm[11]`, `imm[19:12]` (20 bits) | `rd` (5 bits) | `opcode` (7 bits) |
| :--- | :--- | :--- |
| 31:12 | 11:7 | 6:0 |

**Example:** The `jal ra, func1` instruction uses the J-Type format. The immediate value, representing the offset to `func1`, is encoded within the instruction, and the `ra` register (return address) stores the address of the instruction following `jal`.

### 4.3 Instruction fields summary

RISC-V instructions are composed of several key fields that are interpreted differently based on the instruction format. Understanding these fields is crucial for decoding machine code and understanding instruction behavior.

* **`opcode` (7 bits):** This field is present in all instruction formats and indicates the fundamental operation type.
* **`funct3` (3 bits):** Used in R, I, S, and B types to further differentiate operations within the same `opcode`.
* **`funct7` (7 bits):** Used in R-Type instructions to specify the exact operation, often distinguishing between variations like `add` and `sub`.
* **`rd` (5 bits):** The destination register for the result of the operation.
* **`rs1` (5 bits):** The first source register.
* **`rs2` (5 bits):** The second source register (used in R-Type, S-Type, and B-Type).
* **`imm` (immediate):** A constant value encoded within the instruction. The size and encoding of the immediate vary significantly by instruction type (12-bit for I/S, 13-bit for B, 20-bit for U, and 20 encoded bits forming a 21-bit offset for J).

The combination of `opcode`, `funct3`, and `funct7` uniquely identifies the instruction.

> **Tip:** The RISC-V design prioritizes simplicity and regularity, which is why the number of instruction formats is kept small, contributing to faster processor design and execution.

### 4.4 Immediate encodings

The encoding of immediate values is a critical aspect of RISC-V instruction formats, designed to efficiently embed constants within the 32-bit instruction word.
The immediate bits often occupy consistent positions across different instruction formats to simplify hardware implementation. The sign bit of a signed immediate is always bit 31, the most significant bit (MSB) of the instruction. Different instruction types use different subsets of the 32 bits for their immediate values:

* **I-Type:** Uses a 12-bit immediate.
* **S-Type:** Uses a 12-bit immediate, split into two parts.
* **B-Type:** Uses a 13-bit immediate for branch offsets (12 bits are stored; bit 0 is implicitly 0).
* **U-Type:** Uses the upper 20 bits of a 32-bit immediate.
* **J-Type:** Uses a 20-bit immediate, which forms a 21-bit offset.

#### 4.4.1 Composition of 32-bit immediates

The immediate values for different instruction types are assembled from various bit ranges within the 32-bit instruction word. This strategic placement simplifies the decoding and sign-extension logic within the processor.

### 4.5 Interpreting machine code

Interpreting machine code involves converting the binary instruction back into its assembly representation and understanding its operation. This process begins by examining the `opcode` field, which dictates how the remaining bits should be parsed according to the respective instruction format.

**Steps:**

1. **Convert Hex to Binary:** Represent the machine code (often given in hexadecimal) in its full 32-bit binary form.
2. **Identify Opcode:** Extract the `opcode` bits (bits 6:0) to determine the instruction type.
3. **Parse Fields:** Based on the `opcode`, identify and extract the other fields (`funct3`, `funct7`, `rs1`, `rs2`, `rd`, `imm`) according to the instruction format.
4. **Determine Operation:** Use the `opcode`, `funct3`, and `funct7` fields to identify the specific instruction and its operation.
5. **Assemble Operands:** Reconstruct the assembly instruction using the register names and the decoded immediate values.

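
The field extraction and immediate assembly described in sections 4.4 and 4.5 can be sketched in Python. This is a minimal illustration under stated assumptions, not a full disassembler: the helper names (`bits`, `sign_extend`, `decode_fields`, `imm_i`, `imm_s`, `imm_b`, `imm_u`, `imm_j`) are hypothetical and chosen for this sketch, and the bit positions follow the standard 32-bit RISC-V base formats.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_fields(word):
    """Extract the fixed-position fields; which ones are meaningful depends on the format."""
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "funct7": bits(word, 31, 25),
    }


def imm_i(word):
    # I-Type: imm[11:0] sits in bits 31:20.
    return sign_extend(bits(word, 31, 20), 12)


def imm_s(word):
    # S-Type: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7.
    return sign_extend(bits(word, 31, 25) << 5 | bits(word, 11, 7), 12)


def imm_b(word):
    # B-Type: imm[12] bit 31, imm[11] bit 7, imm[10:5] bits 30:25, imm[4:1] bits 11:8.
    raw = (bits(word, 31, 31) << 12 | bits(word, 7, 7) << 11
           | bits(word, 30, 25) << 5 | bits(word, 11, 8) << 1)
    return sign_extend(raw, 13)


def imm_u(word):
    # U-Type: imm[31:12] in bits 31:12; the low 12 bits are zero.
    return bits(word, 31, 12) << 12


def imm_j(word):
    # J-Type: imm[20] bit 31, imm[19:12] bits 19:12, imm[11] bit 20, imm[10:1] bits 30:21.
    raw = (bits(word, 31, 31) << 20 | bits(word, 19, 12) << 12
           | bits(word, 20, 20) << 11 | bits(word, 30, 21) << 1)
    return sign_extend(raw, 21)
```

Because every format shares the same positions for `opcode`, `rd`, `funct3`, `rs1`, and `rs2`, a single `decode_fields` helper covers steps 2 and 3; only the immediate needs a per-format function. For instance, `decode_fields(0x41FE83B3)` gives `opcode` 51, `funct3` 0, and `funct7` `0b0100000`, and `imm_i(0xFDA48393)` evaluates to -38.
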
**Example:** Given `0x41FE83B3`, converting to binary and examining the fields reveals it's an R-Type instruction with `opcode=51`, `funct3=0`, and `funct7=0100000`. This combination decodes to the `sub` operation. Similarly, `0xFDA48393` is an I-Type instruction (`opcode=19`, `funct3=0`) corresponding to `addi`.

### 4.6 Addressing modes

Addressing modes define how the operands of an instruction are accessed in memory or registers. RISC-V supports several fundamental addressing modes:

#### 4.6.1 Register only

In this mode, both source and destination operands are found directly in CPU registers.

* **Example:** `add s0, t2, t3`.

#### 4.6.2 Immediate

The operand is a constant value directly encoded within the instruction itself. This is common for arithmetic and logical operations that involve a literal value.

* **Example:** `addi s4, t5, -73`.

#### 4.6.3 Base addressing

This mode is primarily used by load and store instructions. The effective memory address is calculated by adding a base register's content to a signed immediate offset.

* **Example:** `lw s4, 72(zero)` computes the address as `0 + 72`. `sw t2, -25(t1)` computes the address as `content(t1) - 25`.

#### 4.6.4 PC-relative addressing

This mode is used for branches and jump-and-link instructions. The target address is calculated relative to the current Program Counter (PC). The immediate value in the instruction encodes an offset from the PC of the branch/jump instruction.

* **Example:** `bne s8, s9, L1`. The label `L1`'s address is determined by an offset encoded in the `imm` field of the `bne` instruction, relative to the PC of the `bne` instruction itself.

---

# Program execution, compilation, and memory organization

This section details how software is transformed into machine instructions, managed in memory, and executed by the processor.

## 5. Program execution, compilation, and memory organization

The stored program concept is fundamental to modern computing, allowing processors to execute a sequence of instructions stored in memory. This eliminates the need for rewiring when switching applications; simply loading a new program into memory is sufficient. Program execution involves the processor fetching instructions sequentially from memory and performing the specified operations, with the Program Counter (PC) tracking the current instruction's address.

### 5.1 Compiling, assembling, and loading programs

The process of transforming high-level code into an executable program involves several stages:

* **Compiler**: Translates high-level language code (e.g., C, Java) into assembly code. Grace Hopper is credited with developing the first compiler.
* **Assembler**: Converts assembly code into machine code (object files).
* **Linker**: Combines multiple object files and library files to create a single executable file.
* **Loader**: Loads the executable file into memory, preparing it for execution.

### 5.2 Memory organization

Memory stores both program instructions (also known as the "text" segment) and data. Data can be global/static, allocated before program execution, or dynamic, allocated during runtime. In 32-bit RISC-V, addresses range from `0x00000000` to `0xFFFFFFFF`, giving an address space of up to 4 gigabytes.

#### 5.2.1 RISC-V memory map

A typical RISC-V memory map divides the address space into several segments:

* **Text segment**: Contains program instructions.
* **Global Data**: Stores global and static variables.
* **Heap**: Used for dynamic memory allocation.
* **Stack**: Used for function calls, local variables, and return addresses. The stack typically grows downwards from a high address, with the stack pointer (`sp`) tracking its top.
* **Operating System & I/O**: Reserved for the operating system and input/output devices.
* **Exception Handlers**: Region for code that handles exceptions and interrupts.

#### 5.2.2 Endianness

Endianness refers to the order in which bytes are stored within a multi-byte word in memory.

* **Big-Endian**: The most significant byte (MSB) is stored at the lowest memory address.
* **Little-Endian**: The least significant byte (LSB) is stored at the lowest memory address.

> **Tip:** While the choice of endianness does not inherently matter for a single system, it becomes crucial when systems need to share data, as mismatches can lead to incorrect data interpretation.
>
> **Example:** If a register `t0` holds `0x23456789` and this value is stored to memory, a big-endian system places byte `0x23` at the lowest address, whereas a little-endian system places byte `0x89` at the lowest address.

### 5.3 Signed and unsigned instructions

RISC-V distinguishes between signed and unsigned operations for certain instructions, particularly those involving multiplication, division, branches, and comparisons. This distinction is important for correctly interpreting and manipulating data.

#### 5.3.1 Multiplication

* **Signed multiplication**: `mulh` returns the upper 32 bits of the 64-bit product of two signed operands.
* **Unsigned multiplication**: `mulhu` (both operands unsigned) and `mulhsu` (first operand signed, second unsigned) return the upper 32 bits for those cases. The lower 32 bits of the result are the same for signed and unsigned multiplication, so the `mul` instruction can be used for both if only the lower bits are needed.

#### 5.3.2 Division and remainder

* **Signed division and remainder**: `div`, `rem` instructions.
* **Unsigned division and remainder**: `divu`, `remu` instructions.

#### 5.3.3 Branches

* **Signed comparison branches**: `blt` (branch if less than), `bge` (branch if greater than or equal to).
* **Unsigned comparison branches**: `bltu` (branch if less than unsigned), `bgeu` (branch if greater than or equal to unsigned).

#### 5.3.4 Set less than

* **Signed set less than**: `slt`, `slti` (immediate) instructions.
These set a destination register to 1 if the first operand is less than the second, and 0 otherwise.
* **Unsigned set less than**: `sltu`, `sltiu` (immediate) instructions. RISC-V always sign-extends immediate values, even for unsigned operations, so `sltiu` sign-extends its immediate and then compares the two values as unsigned numbers.

#### 5.3.5 Loads

Instructions for loading data from memory into registers can also be signed or unsigned.

* **Signed loads**: `lh` (load halfword), `lb` (load byte). These instructions sign-extend the loaded value to create a 32-bit value for the register.
* **Unsigned loads**: `lhu` (load halfword unsigned), `lbu` (load byte unsigned). These instructions zero-extend the loaded value to create a 32-bit value.

#### 5.3.6 Detecting overflow

RISC-V has no dedicated instructions for detecting arithmetic overflow; however, overflow can be detected using existing instructions.

* **Unsigned overflow detection**: After an addition, compare the result with one of the operands using `bltu`. If the sum is less than the operand, the addition overflowed.
* **Signed overflow detection**: Requires a more complex sequence that compares the signs of the operands and the result using `slti`, `slt`, and `bne`.

### 5.4 Compressed instructions

Compressed instructions are 16-bit versions of common 32-bit RISC-V instructions, designed to reduce code size. Compilers and processors can mix 32-bit and 16-bit instructions, favoring compressed versions where possible. These instructions use a `c.` prefix, e.g., `c.add` for `add`, `c.lw` for `lw`, and `c.addi` for `addi`. Some compressed instructions use a 3-bit register code that can name only registers `x8` to `x15`, and immediates are typically smaller (6-11 bits).

> **Example:** A loop incrementing array elements might be implemented using compressed instructions like `c.lw`, `c.addi`, and `c.sw` for efficiency. However, if an immediate value is too large to fit in the compressed format, a non-compressed instruction like `addi` might be used instead.

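
The two constraints on compressed instructions mentioned in section 5.4 (3-bit register codes and small immediates) can be sketched in Python. This is only an illustration: the function names are hypothetical, the 6-bit width assumed for the `c.addi` immediate is taken from the low end of the 6-11 bit range quoted above, and other encoding restrictions on compressed instructions are ignored.

```python
def fits_signed(value, bits):
    """True if `value` is representable as a `bits`-wide two's-complement number."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def compressed_reg_number(code3):
    """Map a 3-bit compressed register code (0..7) to register x8..x15."""
    if not 0 <= code3 <= 7:
        raise ValueError("register code must fit in 3 bits")
    return 8 + code3


def pick_addi(imm, compressed_imm_bits=6):
    """Choose between c.addi and addi based only on the immediate's size.

    Assumption of this sketch: c.addi takes a 6-bit signed immediate and
    addi a 12-bit signed immediate; no other constraints are modeled.
    """
    if fits_signed(imm, compressed_imm_bits):
        return "c.addi"
    if fits_signed(imm, 12):
        return "addi"
    raise ValueError("immediate does not fit in a single addi")
```

Under these assumptions, an increment of 10 can use `c.addi`, while an increment of 100 falls back to the 32-bit `addi`, which mirrors the choice a compiler or assembler makes in the example above.
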
### 5.5 Floating-point instructions

RISC-V supports floating-point operations through extensions:

* **RVF**: Single-precision (32-bit) floating-point.
* **RVD**: Double-precision (64-bit) floating-point.
* **RVQ**: Quad-precision (128-bit) floating-point.

Floating-point registers (`f0` to `f31`) are available, and their width is determined by the highest precision extension implemented. Lower-precision values occupy the lower bits of these registers. Common floating-point instructions include arithmetic operations (`fadd`, `fsub`, `fmul`, `fdiv`, `fsqrt`, multiply-add variants like `fmadd`), data movement (`fmv`), conversions (`fcvt`), and comparisons (`feq`, `flt`, `fle`). Precision is denoted by suffixes `.s`, `.d`, or `.q` (e.g., `fadd.s` for single-precision add). Multiply-add instructions use an R4-type format due to their four register operands.

> **Example:** The C code `scores[i] = scores[i] + 10`, where `scores` is an array of `float`, would be translated into RISC-V assembly using floating-point instructions like `flw` (float load word), `fadd.s` (float add single-precision), and `fsw` (float store word).

### 5.6 Exceptions

An exception is an unscheduled function call to an exception handler, triggered by either hardware (interrupts) or software (traps). When an exception occurs, the processor records its cause, jumps to an exception handler, and can later return to the interrupted program.

#### 5.6.1 Exception causes

Common causes of exceptions include instruction address misalignment, illegal instructions, breakpoints, load/store address misalignments, and access faults. Environment calls (`ecall`) from different privilege modes are also exceptions.

#### 5.6.2 Privilege levels

RISC-V defines privilege levels to control access to memory and privileged instructions. These levels, from highest to lowest, are:

* Machine mode (M-mode): Highest privilege, for bare-metal operation.
* Hypervisor mode (H-mode): For supporting virtual machines.
* Supervisor mode (S-mode): For operating systems.
* User mode (U-mode): Lowest privilege, for user applications.

#### 5.6.3 Exception registers (CSRs)

Each privilege level has control and status registers (CSRs) to manage exceptions. For M-mode, these include:

* `mtvec`: Address of the exception handler.
* `mcause`: Records the cause of the exception.
* `mepc` (Exception PC): Stores the PC of the instruction that caused the exception.
* `mscratch`: A scratch register for temporary use by the exception handler.

#### 5.6.4 Exception-related instructions

Privileged instructions are used to access CSRs:

* `csrr` (CSR read): Reads a CSR into a general-purpose register.
* `csrw` (CSR write): Writes a general-purpose register's value to a CSR.
* `csrrw` (CSR read/write): Atomically reads and writes a CSR.
* `mret` (Machine mode return): Returns from an exception handler to the address stored in `mepc`.

#### 5.6.5 Exception handler summary

When an exception occurs, the processor transfers control to the handler at the `mtvec` address. The handler typically saves its context (registers), checks `mcause` to determine the cause, handles the exception, restores its context, and then uses `mret` to return to the program at the instruction pointed to by `mepc` (optionally incremented by 4 to skip the faulting instruction).

> **Example:** An exception handler might check if `mcause` indicates an illegal instruction (value 2) or a load address misaligned (value 4). If it's an illegal instruction, it can increment `mepc` to skip the faulty instruction before returning.
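
The handler flow summarized in section 5.6.5 can be modeled in a few lines of Python. This is a toy model rather than real handler code (which would be written in assembly using `csrr`, `csrw`, and `mret`): the CSR file is represented by a plain dictionary, the function name is hypothetical, and skipping a misaligned load is an assumption of this sketch, since the notes specify the skip only for the illegal-instruction case.

```python
ILLEGAL_INSTRUCTION = 2        # mcause value quoted in the notes
LOAD_ADDRESS_MISALIGNED = 4    # mcause value quoted in the notes
INSTRUCTION_BYTES = 4          # assumes an uncompressed 32-bit instruction


def handle_exception(csrs):
    """Toy model of an M-mode handler; returns the address mret would resume at.

    `csrs` is a dict with "mcause" and "mepc" keys standing in for the
    real CSRs (an assumption of this sketch).
    """
    cause = csrs["mcause"]
    if cause == ILLEGAL_INSTRUCTION:
        # Skip the faulting instruction so it is not executed again.
        csrs["mepc"] += INSTRUCTION_BYTES
    elif cause == LOAD_ADDRESS_MISALIGNED:
        # Assumption: also skip the load; a real handler might emulate it instead.
        csrs["mepc"] += INSTRUCTION_BYTES
    else:
        raise NotImplementedError(f"unhandled mcause {cause}")
    # mret jumps to the address held in mepc.
    return csrs["mepc"]
```

Calling `handle_exception({"mcause": 2, "mepc": 0x1000})` models an illegal instruction at address `0x1000`; the handler advances `mepc` by 4 so that execution resumes at the following instruction.
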

Glossary

| Term | Definition |
|------|------------|
| Architecture | The programmer's view of a computer, defined by its instructions, operand locations, and memory organization. |
| Microarchitecture | The specific hardware implementation of a computer architecture, detailing how the architecture is realized in circuits. |
| Assembly Language | A low-level programming language that uses mnemonics to represent machine language instructions, making it more human-readable than binary code. |
| Machine Language | The low-level binary code (1s and 0s) that a computer's processor can directly execute. |
| RISC-V | An open-source instruction set architecture (ISA) developed at UC Berkeley, known for its simplicity and modularity. |
| Instruction | A command that a computer processor can execute to perform a specific operation. |
| Operand | A value or location that an instruction operates on. |
| Register | A small, fast storage location within the CPU used to hold data and instructions that are actively being processed. |
| Memory | A hardware component that stores data and instructions for the computer; it is generally slower than registers. |
| Immediate | A constant value that is directly encoded within an instruction itself, rather than being fetched from a register or memory. |
| Mnemonic | A symbolic name or abbreviation used in assembly language to represent a machine code instruction (e.g., `add` for addition). |
| Load Word (lw) | A RISC-V instruction used to read a 32-bit word from memory into a destination register. |
| Store Word (sw) | A RISC-V instruction used to write the 32-bit value from a source register into a memory location. |
| Byte-Addressable Memory | A memory system where each individual byte has a unique address. |
| Word-Addressable Memory | A memory system where memory locations are accessed in fixed-size words (e.g., 32 bits), and each word has a unique address. |
| Base Addressing | An addressing mode where the effective memory address is calculated by adding an offset to the value in a base register. |
| PC-Relative Addressing | An addressing mode used for branches and jumps where the target address is calculated relative to the current Program Counter (PC). |
| Branch Instruction | An instruction that alters the flow of program execution based on a condition. |
| Jump Instruction | An instruction that unconditionally transfers program control to a different location in the code. |
| Stack | A region of memory used for temporary storage, particularly for function calls (storing return addresses, local variables, and parameters), operating in a Last-In, First-Out (LIFO) manner. |
| Stack Pointer (sp) | A special register that points to the current top of the stack in memory. |
| Function Call Convention | A set of rules that define how functions pass arguments, return values, and manage registers to ensure interoperability between different functions. |
| Caller | In a function call, the function that initiates the call to another function. |
| Callee | In a function call, the function that is being called. |
| Return Address (ra) | A register that stores the address to which program execution should return after a function call completes. |
| R-Type Instruction Format | A RISC-V instruction format used for operations involving three register operands, typically arithmetic and logical operations. It includes fields for opcode, function codes (funct7, funct3), and three registers (rs1, rs2, rd). |
| I-Type Instruction Format | A RISC-V instruction format used for instructions that involve a register and a 12-bit immediate value, such as `addi` and `lw`. It includes fields for opcode, function code (funct3), a source register (rs1), a destination register (rd), and the immediate value. |
| S-Type Instruction Format | A RISC-V instruction format used for store operations. It includes fields for opcode, function code (funct3), two source registers (rs1 for base address, rs2 for value), and an immediate offset. |
| B-Type Instruction Format | A RISC-V instruction format used for conditional branch instructions. It includes fields for opcode, function code (funct3), two source registers (rs1, rs2) for comparison, and a 12-bit immediate field that encodes a 13-bit offset to the branch target. |
| U-Type Instruction Format | A RISC-V instruction format used for instructions that load a 20-bit immediate value into the upper bits of a register, such as `lui`. It includes fields for opcode, destination register (rd), and the upper immediate value. |
| J-Type Instruction Format | A RISC-V instruction format used for unconditional jump-and-link instructions (`jal`). It includes fields for opcode, destination register (rd, often `ra`), and a 20-bit immediate field that encodes a 21-bit offset to the jump target. |
| Pseudoinstruction | An instruction that is not directly supported by the RISC-V hardware but is translated into one or more actual RISC-V instructions by the assembler for programmer convenience. |
| Endianness | The byte order in which multi-byte data is stored in memory. Little-endian stores the least significant byte at the lowest address, while big-endian stores the most significant byte at the lowest address. |
| Exception | An event that disrupts the normal sequential execution of instructions, such as an illegal instruction, a misaligned memory access, or an interrupt. It typically causes the processor to transfer control to a dedicated exception handler. |
| Privilege Level | A mode of operation in a processor that determines the set of instructions and memory regions a program can access. RISC-V has modes like Machine, Supervisor, and User. |
| Control and Status Register (CSR) | Special registers within the processor that control its operation or hold status information, often used in managing exceptions and privilege levels. |
| Compressed Instructions | Shorter (16-bit) versions of common RISC-V instructions, designed to reduce code size and improve instruction cache efficiency. |
| Floating-Point Extension (RVF, RVD, RVQ) | Optional sets of instructions and registers in RISC-V designed to perform arithmetic operations on floating-point numbers with single (32-bit), double (64-bit), or quad (128-bit) precision. |