RV32I

RISC-V Base Integer Instruction Set

Technical Reference for Emulator Developers
Version 1.0  ·  Covers RV32I (32-bit base integer ISA)

0.0 - Table of Contents

0.1 - Using This Document [TOC]

This document is a technical reference for the RISC-V RV32I base integer instruction set, written specifically for people building an emulator in C. It covers only the 32-bit base integer ISA (RV32I). Extensions such as M (multiply/divide), A (atomics), F (float), and D (double) are not covered here.

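Throughout this document, the following notation is used:

Notation    Meaning
x[rd]       The 32-bit value held in register rd (likewise x[rs1], x[rs2])
sext(imm)   The immediate imm, sign-extended to 32 bits
M[addr]     The contents of memory starting at byte address addr
PC          The program counter
[hi:lo]     Bits hi down to lo of a value, inclusive
0x...       A hexadecimal number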

1.0 - About RISC-V [TOC]

RISC-V (pronounced "risk five") is a free and open instruction set architecture (ISA) that originated at UC Berkeley, where the project began in 2010. Unlike x86 or ARM, the RISC-V ISA specification is publicly available and can be implemented without licensing fees, which makes it well suited to education, research, and custom hardware design.

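RISC-V follows the Reduced Instruction Set Computer (RISC) philosophy: a small number of simple instructions, fixed-width instruction encoding, a load/store memory model, and a large register file. The base integer ISA, called RV32I, contains only 40 instructions in the ratified specification (older versions of the manual counted 47, because they also included FENCE.I and the six CSR instructions, which have since moved to the Zifencei and Zicsr extensions). Its small size makes it well suited to a first CPU emulator.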

The RISC-V ISA is modular. The base integer ISA (I) can be extended with standard extensions:

Extension   Name         Adds
M           Multiply     MUL, DIV, REM and their unsigned variants
A           Atomic       Atomic memory operations for multi-core
F           Float        Single-precision floating-point (32-bit)
D           Double       Double-precision floating-point (64-bit)
C           Compressed   16-bit compressed instructions

This document covers RV32I only. Implement the base set before adding any extensions.

2.0 - RV32I Specifications [TOC]

This section describes the memory model, register file, program counter, and instruction encoding of RV32I.

2.1 - Memory [TOC]

RV32I uses a flat, byte-addressable memory space of 2^32 bytes (4 GB). Addresses are 32-bit unsigned integers. All memory accesses are performed through load and store instructions; there are no memory-to-memory operations.

For an emulator, memory is implemented as a simple byte array:

#include <stdint.h>

#define MEM_SIZE (1024 * 1024)   /* 1 MB is enough to run small programs */
uint8_t memory[MEM_SIZE];

A typical memory layout when loading a program looks like this:

+---------------------------+= 0xFFFFFFFF  top of address space
|                           |
|       (unmapped)          |
|                           |
+---------------------------+
|                           |
|        Stack              |  grows downward from high address
|           |               |
|           v               |
|                           |
|           ^               |
|           |               |
|        Heap               |  grows upward
|                           |
+---------------------------+= depends on program
|       .data / .bss        |  initialized and uninitialised globals
+---------------------------+
|       .text               |  program instructions (read-only)
+---------------------------+= entry point (from ELF header, e.g. 0x00010000)
|                           |
|      (reserved)           |
|                           |
+---------------------------+= 0x00000000  bottom of address space

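RISC-V is a little-endian architecture: the least-significant byte of a multi-byte value is stored at the lowest address. An x86 or x86-64 host uses the same byte order, so copying the bytes into a uint32_t with memcpy() yields the correct value there. Avoid casting the uint8_t pointer to uint32_t * and dereferencing it: that breaks C's strict-aliasing rules and can fault on unaligned addresses. Assembling the value one byte at a time works on any host.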
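
As a host-independent way to handle byte order, a 32-bit value can be assembled one byte at a time. The sketch below is only an illustration: the names mem_read32_le and mem_write32_le are hypothetical, and ignoring out-of-range accesses (reads return 0) is an assumed policy, not something this document prescribes.

```c
#include <stdint.h>

#define MEM_SIZE (1024 * 1024)
uint8_t memory[MEM_SIZE];

/* Read a 32-bit little-endian word. Out-of-range reads return 0 (assumed policy). */
uint32_t mem_read32_le(uint32_t addr) {
    if (addr > MEM_SIZE - 4) return 0;
    return (uint32_t)memory[addr]
         | ((uint32_t)memory[addr + 1] << 8)
         | ((uint32_t)memory[addr + 2] << 16)
         | ((uint32_t)memory[addr + 3] << 24);
}

/* Write a 32-bit word, least-significant byte first. Out-of-range writes are dropped. */
void mem_write32_le(uint32_t addr, uint32_t value) {
    if (addr > MEM_SIZE - 4) return;
    memory[addr]     = (uint8_t)(value);
    memory[addr + 1] = (uint8_t)(value >> 8);
    memory[addr + 2] = (uint8_t)(value >> 16);
    memory[addr + 3] = (uint8_t)(value >> 24);
}
```

Because the shifts operate on values rather than on memory, the same code gives the same result on a big-endian host.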

2.2 - Registers [TOC]

RV32I has 32 general-purpose integer registers, each 32 bits wide, named x0 through x31. Register x0 is special: it is hardwired to the value zero. Any write to x0 is silently discarded. Any read from x0 always returns 0x00000000.

Each register also has an ABI (Application Binary Interface) name used in assembly language. The ABI names reflect the register's conventional purpose when calling functions:

Register   ABI Name   Description                                                 Preserved across call?
x0         zero       Hardwired zero: reads always return 0, writes are ignored   n/a (constant)
x1         ra         Return address: where to jump back after a function call    No (caller saves)
x2         sp         Stack pointer: top of the current stack frame               Yes (callee saves)
x3         gp         Global pointer
x4         tp         Thread pointer
x5         t0         Temporary / alternate link register                         No
x6–x7      t1–t2      Temporaries                                                 No
x8         s0 / fp    Saved register / frame pointer                              Yes
x9         s1         Saved register                                              Yes
x10–x11    a0–a1      Function arguments / return values                          No
x12–x17    a2–a7      Function arguments                                          No
x18–x27    s2–s11     Saved registers                                             Yes
x28–x31    t3–t6      Temporaries                                                 No

In your emulator, implement two wrapper functions for register access that enforce the x0 rule:

uint32_t reg_read(CPU *cpu, uint32_t r)              { return (r == 0) ? 0 : cpu->regs[r]; }
void     reg_write(CPU *cpu, uint32_t r, uint32_t v) { if (r != 0) cpu->regs[r] = v; }

2.3 - Program Counter [TOC]

The Program Counter (PC) is a separate 32-bit register, not part of the general-purpose register file. It holds the address of the instruction currently being executed.

After most instructions the PC advances by 4 (one instruction = 4 bytes). Branch and jump instructions set the PC to a computed target address instead. The PC must always be aligned to a 4-byte boundary. Misaligned PC values cause an instruction-address-misaligned exception (which you can ignore in a basic emulator).

2.4 - Instruction Formats [TOC]

Every RV32I instruction is exactly 32 bits wide. There are six instruction formats. The opcode field is always in bits [6:0] regardless of format. The other fields change position depending on the format.

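Bit range and fields for each format:

 31          25 24   20 19   15 14    12 11          7 6      0
+--------------+-------+-------+--------+-------------+--------+
|    funct7    |  rs2  |  rs1  | funct3 |     rd      | opcode |  R-type
+--------------+-------+-------+--------+-------------+--------+
|      imm[11:0]       |  rs1  | funct3 |     rd      | opcode |  I-type
+--------------+-------+-------+--------+-------------+--------+
|  imm[11:5]   |  rs2  |  rs1  | funct3 |  imm[4:0]   | opcode |  S-type
+--------------+-------+-------+--------+-------------+--------+
| imm[12,10:5] |  rs2  |  rs1  | funct3 | imm[4:1,11] | opcode |  B-type
+--------------+-------+-------+--------+-------------+--------+
|              imm[31:12]               |     rd      | opcode |  U-type
+--------------+-------+-------+--------+-------------+--------+
|         imm[20,10:1,11,19:12]         |     rd      | opcode |  J-type
+--------------+-------+-------+--------+-------------+--------+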

Field sizes:

Field    Bits    Description
opcode   7       Identifies the instruction group
rd       5       Destination register (0–31)
funct3   3       Sub-opcode: distinguishes instructions within a group
rs1      5       First source register (0–31)
rs2      5       Second source register (0–31)
funct7   7       Second sub-opcode: used in R-type and shift I-type
imm      12–20   Immediate value, format-dependent

2.5 - Immediate Encoding [TOC]

Immediate values are embedded in the instruction word. The sign bit of every immediate is always placed at bit 31 of the instruction, which allows for fast sign extension in hardware. Some formats (B and J) have their immediate bits shuffled to simplify hardware layout — you must reassemble them in the correct order in your emulator.

Immediate extraction in C for each format:

/* I-type: 12-bit signed immediate, bits [31:20], sign-extended */
int32_t imm_i = (int32_t)instr >> 20;

/* S-type: 12-bit signed immediate, split across [31:25] and [11:7] */
int32_t imm_s = ((int32_t)instr >> 20 & ~0x1F) | ((instr >> 7) & 0x1F);

/* B-type: 13-bit signed immediate, bit 0 always 0 (halfword aligned) */
/*   bit[12] = instr[31]   bit[11] = instr[7]                          */
/*   bit[10:5] = instr[30:25]   bit[4:1] = instr[11:8]                 */
int32_t imm_b = ((int32_t)(instr & 0x80000000) >> 19)
              | ((instr & 0x00000080) << 4)
              | ((instr >> 20) & 0x7E0)
              | ((instr >> 7)  & 0x1E);

/* U-type: 20-bit immediate in upper bits, lower 12 bits are zero */
uint32_t imm_u = instr & 0xFFFFF000;

/* J-type: 21-bit signed immediate, bit 0 always 0                     */
/*   bit[20] = instr[31]   bit[19:12] = instr[19:12]                   */
/*   bit[11] = instr[20]   bit[10:1] = instr[30:21]                    */
int32_t imm_j = ((int32_t)(instr & 0x80000000) >> 11)
              | (instr & 0x000FF000)
              | ((instr >> 9)  & 0x800)
              | ((instr >> 20) & 0x7FE);

Note: B-type and J-type immediates always have their lowest bit equal to 0. The encoding was designed so that targets are multiples of 2 bytes, which leaves room for the 16-bit instructions of the C extension; in plain RV32I, targets must additionally be 4-byte aligned. Bit 0 is never stored in the instruction word and is implied to be 0, which is why branch and jump offsets are always even numbers.
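
One way to gain confidence in the bit shuffling is to wrap each expression in a function and check it against hand-assembled instruction words. The sketch below does that; the decode_imm_* names are hypothetical, and the code assumes, like the rest of this document, that >> on a negative int32_t is an arithmetic shift.

```c
#include <stdint.h>

/* I-type: instr[31:20], sign-extended. */
int32_t decode_imm_i(uint32_t instr) {
    return (int32_t)instr >> 20;
}

/* S-type: imm[11:5] from instr[31:25], imm[4:0] from instr[11:7]. */
int32_t decode_imm_s(uint32_t instr) {
    int32_t hi = ((int32_t)instr >> 20) & ~0x1F;
    int32_t lo = (int32_t)((instr >> 7) & 0x1F);
    return hi | lo;
}

/* B-type: imm[12|10:5] from instr[31:25], imm[4:1|11] from instr[11:7]. */
int32_t decode_imm_b(uint32_t instr) {
    return ((int32_t)(instr & 0x80000000u) >> 19)
         | (int32_t)((instr & 0x00000080u) << 4)
         | (int32_t)((instr >> 20) & 0x7E0)
         | (int32_t)((instr >> 7) & 0x1E);
}

/* J-type: imm[20|10:1|11|19:12] from instr[31:12]. */
int32_t decode_imm_j(uint32_t instr) {
    return ((int32_t)(instr & 0x80000000u) >> 11)
         | (int32_t)(instr & 0x000FF000u)
         | (int32_t)((instr >> 9) & 0x800)
         | (int32_t)((instr >> 20) & 0x7FE);
}
```

Useful check values: 0xFE512C23 is SW x5, -8(x2), 0xFE000EE3 is BEQ x0, x0, -4, and 0xFFDFF06F is JAL x0, -4, so the decoders should return -8, -4, and -4 for them.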

3.0 - RV32I Instructions [TOC]

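The RV32I instructions are described below, grouped by instruction type. The list contains 39 instructions, which is every base instruction except FENCE (a memory-ordering instruction that a simple single-core emulator can treat as a no-op). Each entry shows the assembly syntax, the opcode and function codes needed to identify it, and the operation it performs.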

3.1 - Integer Register-Register [TOC]

These instructions take two source registers and write a result to a destination register. All use opcode 0x33 (R-type). The funct3 field selects the operation; funct7 distinguishes ADD/SUB and SRL/SRA.

ADD rd, rs1, rs2 Add
opcode=0x33   funct3=0x0   funct7=0x00

x[rd] = x[rs1] + x[rs2]
Adds the values in rs1 and rs2 and stores the result in rd. Overflow is ignored: the result wraps around modulo 2^32.

SUB rd, rs1, rs2 Subtract
opcode=0x33   funct3=0x0   funct7=0x20

x[rd] = x[rs1] - x[rs2]
Subtracts rs2 from rs1. Same opcode as ADD — distinguished by funct7. Overflow wraps.

AND rd, rs1, rs2 Bitwise AND
opcode=0x33   funct3=0x7   funct7=0x00

x[rd] = x[rs1] & x[rs2]
Performs bitwise AND of rs1 and rs2.

OR rd, rs1, rs2 Bitwise OR
opcode=0x33   funct3=0x6   funct7=0x00

x[rd] = x[rs1] | x[rs2]
Performs bitwise OR of rs1 and rs2.

XOR rd, rs1, rs2 Bitwise XOR
opcode=0x33   funct3=0x4   funct7=0x00

x[rd] = x[rs1] ^ x[rs2]
Performs bitwise exclusive OR of rs1 and rs2.

SLL rd, rs1, rs2 Shift Left Logical
opcode=0x33   funct3=0x1   funct7=0x00

x[rd] = x[rs1] << (x[rs2] & 0x1F)
Shifts rs1 left by the shift amount held in the lower 5 bits of rs2. Zeros are shifted into the lower bits.

SRL rd, rs1, rs2 Shift Right Logical
opcode=0x33   funct3=0x5   funct7=0x00

x[rd] = x[rs1] >> (x[rs2] & 0x1F)  (unsigned)
Shifts rs1 right logically by the lower 5 bits of rs2. Zeros are shifted into the upper bits (unsigned shift).

SRA rd, rs1, rs2 Shift Right Arithmetic
opcode=0x33   funct3=0x5   funct7=0x20

x[rd] = x[rs1] >> (x[rs2] & 0x1F)  (signed)
Shifts rs1 right arithmetically. The sign bit is replicated into the upper bits. In C: (int32_t)rs1 >> shamt.

SLT rd, rs1, rs2 Set Less Than
opcode=0x33   funct3=0x2   funct7=0x00

x[rd] = ((int32_t)x[rs1] < (int32_t)x[rs2]) ? 1 : 0
Sets rd to 1 if rs1 is less than rs2 (signed comparison), 0 otherwise.

SLTU rd, rs1, rs2 Set Less Than Unsigned
opcode=0x33   funct3=0x3   funct7=0x00

x[rd] = (x[rs1] < x[rs2]) ? 1 : 0  (unsigned)
Same as SLT but treats both operands as unsigned. Note: SLTU rd, x0, rs2 sets rd to 1 if rs2 is nonzero.
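
The ten register-register operations can be collected into one pure function, which keeps the decode logic easy to test. This is a minimal sketch rather than a prescribed design: alu_rtype is a hypothetical name, returning 0 for unrecognised funct3/funct7 combinations is an assumed policy, and SRA relies on the compiler's arithmetic right shift.

```c
#include <stdint.h>

/* Result of an opcode-0x33 instruction; a = x[rs1], b = x[rs2]. */
uint32_t alu_rtype(uint32_t funct3, uint32_t funct7, uint32_t a, uint32_t b) {
    uint32_t shamt = b & 0x1F;   /* shifts use only the low 5 bits of rs2 */
    switch (funct3) {
        case 0x0: return (funct7 == 0x20) ? a - b : a + b;       /* SUB / ADD */
        case 0x1: return a << shamt;                             /* SLL  */
        case 0x2: return ((int32_t)a < (int32_t)b) ? 1u : 0u;    /* SLT  */
        case 0x3: return (a < b) ? 1u : 0u;                      /* SLTU */
        case 0x4: return a ^ b;                                  /* XOR  */
        case 0x5: return (funct7 == 0x20)
                       ? (uint32_t)((int32_t)a >> shamt)         /* SRA  */
                       : a >> shamt;                             /* SRL  */
        case 0x6: return a | b;                                  /* OR   */
        case 0x7: return a & b;                                  /* AND  */
        default:  return 0;                                      /* assumed fallback */
    }
}
```

Doing the arithmetic on uint32_t gives the modulo-2^32 wraparound for ADD and SUB without any signed-overflow concerns in C.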

3.2 - Integer Register-Immediate [TOC]

These instructions take one source register and a 12-bit signed immediate. All use opcode 0x13 (I-type). The immediate is sign-extended to 32 bits before the operation.

ADDI rd, rs1, imm Add Immediate
opcode=0x13   funct3=0x0

x[rd] = x[rs1] + sext(imm)
One of the most frequently used instructions. It sets a register to a small constant (ADDI rd, x0, imm), copies a register (ADDI rd, rs1, 0), and adjusts pointer offsets. There is no SUBI; use a negative immediate instead.

ANDI rd, rs1, imm AND Immediate
opcode=0x13   funct3=0x7

x[rd] = x[rs1] & sext(imm)
Bitwise AND with sign-extended immediate. Used for masking bits.

ORI rd, rs1, imm OR Immediate
opcode=0x13   funct3=0x6

x[rd] = x[rs1] | sext(imm)
Bitwise OR with sign-extended immediate. Used for setting bits.

XORI rd, rs1, imm XOR Immediate
opcode=0x13   funct3=0x4

x[rd] = x[rs1] ^ sext(imm)
Bitwise XOR with sign-extended immediate. Note: XORI rd, rs1, -1 (imm=0xFFF) inverts all bits of rs1 (bitwise NOT).

SLTI rd, rs1, imm Set Less Than Immediate
opcode=0x13   funct3=0x2

x[rd] = ((int32_t)x[rs1] < sext(imm)) ? 1 : 0
Sets rd to 1 if rs1 is less than the signed immediate, 0 otherwise.

SLTIU rd, rs1, imm Set Less Than Immediate Unsigned
opcode=0x13   funct3=0x3

x[rd] = (x[rs1] < (uint32_t)sext(imm)) ? 1 : 0
Unsigned version of SLTI. The immediate is sign-extended first, then treated as unsigned. Note: SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero.

SLLI rd, rs1, shamt Shift Left Logical Immediate
opcode=0x13   funct3=0x1   funct7=0x00

x[rd] = x[rs1] << shamt
Shifts rs1 left by the 5-bit immediate shift amount (shamt, bits [24:20]). Zeros fill the lower bits. The upper 7 bits of the immediate field (bits [31:25]) must be 0x00.

SRLI rd, rs1, shamt Shift Right Logical Immediate
opcode=0x13   funct3=0x5   funct7=0x00

x[rd] = x[rs1] >> shamt  (unsigned)
Shifts rs1 right logically. Zeros fill the upper bits.

SRAI rd, rs1, shamt Shift Right Arithmetic Immediate
opcode=0x13   funct3=0x5   funct7=0x20

x[rd] = (int32_t)x[rs1] >> shamt  (signed)
Shifts rs1 right arithmetically. The sign bit replicates into the upper bits. Distinguished from SRLI by funct7=0x20.
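
The register-immediate group can be handled the same way as the register-register group, with the immediate and shift amount taken from the instruction word. The sketch below uses a hypothetical alu_itype function; it does not reject shift encodings with an unexpected funct7, and it assumes arithmetic right shift on signed values.

```c
#include <stdint.h>

/* Result of an opcode-0x13 instruction; a = x[rs1]. */
uint32_t alu_itype(uint32_t instr, uint32_t a) {
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t funct7 = (instr >> 25) & 0x7F;   /* only meaningful for shifts   */
    uint32_t shamt  = (instr >> 20) & 0x1F;   /* low 5 bits of the imm field  */
    int32_t  imm    = (int32_t)instr >> 20;   /* sign-extended 12-bit value   */
    switch (funct3) {
        case 0x0: return a + (uint32_t)imm;                      /* ADDI  */
        case 0x1: return a << shamt;                             /* SLLI  */
        case 0x2: return ((int32_t)a < imm) ? 1u : 0u;           /* SLTI  */
        case 0x3: return (a < (uint32_t)imm) ? 1u : 0u;          /* SLTIU */
        case 0x4: return a ^ (uint32_t)imm;                      /* XORI  */
        case 0x5: return (funct7 == 0x20)
                       ? (uint32_t)((int32_t)a >> shamt)         /* SRAI  */
                       : a >> shamt;                             /* SRLI  */
        case 0x6: return a | (uint32_t)imm;                      /* ORI   */
        default:  return a & (uint32_t)imm;                      /* ANDI (funct3 = 0x7) */
    }
}
```

For example, 0xFFF04013 encodes XORI x0, x0, -1, so passing it with a = 0x0F0F0F0F yields the bitwise complement 0xF0F0F0F0.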

3.3 - Upper Immediate [TOC]

These two instructions load a 20-bit immediate into the upper 20 bits of a register. They are used together with I-type instructions to construct 32-bit constants and addresses.

LUI rd, imm Load Upper Immediate
opcode=0x37   U-type

x[rd] = imm << 12  (lower 12 bits zeroed)
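Places the 20-bit immediate into bits [31:12] of rd and zeros the lower 12 bits. Used to load large constants: follow with ADDI to set the lower 12 bits. Because ADDI sign-extends its immediate, the LUI value must be one higher than the constant's upper 20 bits whenever bit 11 of the constant is set.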

/* Load 0xDEADB000 into x5 */
LUI  x5, 0xDEADB     /* x5 = 0xDEADB000 */
ADDI x5, x5, 0x000   /* optionally set low 12 bits */
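
When the low 12 bits of a constant are 0x800 or more, ADDI treats them as a negative number, so the LUI part has to be rounded up to compensate. The helper below sketches that split; split_hi_lo is a hypothetical name used only for illustration, and it mirrors what an assembler does when expanding a 32-bit constant load.

```c
#include <stdint.h>

/* Split value so that (hi20 << 12) + lo12 == value (mod 2^32),
   with lo12 in the signed 12-bit range -2048..2047. */
void split_hi_lo(uint32_t value, uint32_t *hi20, int32_t *lo12) {
    /* Adding 0x800 rounds the upper part up when bit 11 of value is set. */
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
    *lo12 = (int32_t)(value & 0xFFFu);
    if (*lo12 >= 0x800) *lo12 -= 0x1000;   /* reinterpret as signed 12-bit */
}
```

For 0xDEADBEEF this gives hi20 = 0xDEADC and lo12 = -273, that is, LUI x5, 0xDEADC followed by ADDI x5, x5, -273.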

AUIPC rd, imm Add Upper Immediate to PC
opcode=0x17   U-type

x[rd] = PC + (imm << 12)
Adds the 20-bit immediate (shifted left 12) to the current PC and stores the result in rd. Used for position-independent code to compute PC-relative addresses.

3.4 - Loads [TOC]

Load instructions read from memory at address x[rs1] + sext(imm) and write the result to rd. All use opcode 0x03 (I-type). The funct3 field selects the width and sign treatment.

LW rd, imm(rs1) Load Word
opcode=0x03   funct3=0x2

x[rd] = M[x[rs1] + sext(imm)][31:0]
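Loads a 32-bit word from memory. Addresses that are multiples of 4 are always supported; whether a misaligned access works or traps depends on the execution environment, so a basic emulator can simply allow it. LW is the load you will see most often in compiled code.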

LH rd, imm(rs1) Load Halfword
opcode=0x03   funct3=0x1

x[rd] = sext(M[x[rs1] + sext(imm)][15:0])
Loads a 16-bit halfword and sign-extends it to 32 bits. In C: (int32_t)(int16_t)mem_read16(...).

LB rd, imm(rs1) Load Byte
opcode=0x03   funct3=0x0

x[rd] = sext(M[x[rs1] + sext(imm)][7:0])
Loads a byte and sign-extends it to 32 bits. In C: (int32_t)(int8_t)mem_read8(...).

LHU rd, imm(rs1) Load Halfword Unsigned
opcode=0x03   funct3=0x5

x[rd] = M[x[rs1] + sext(imm)][15:0]  (zero-extended)
Loads a 16-bit halfword and zero-extends it to 32 bits (no sign extension).

LBU rd, imm(rs1) Load Byte Unsigned
opcode=0x03   funct3=0x4

x[rd] = M[x[rs1] + sext(imm)][7:0]  (zero-extended)
Loads a byte and zero-extends it to 32 bits.
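
The five loads differ only in width and in how the value is widened to 32 bits. The sketch below shows that distinction on a plain byte array; load_value and its helper are hypothetical names, and bounds and alignment checks are deliberately left out.

```c
#include <stdint.h>

/* Little-endian 16-bit read from a byte array. */
static uint16_t rd16(const uint8_t *mem, uint32_t addr) {
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* Value written to rd by the load selected by funct3. */
uint32_t load_value(const uint8_t *mem, uint32_t addr, uint32_t funct3) {
    switch (funct3) {
        case 0x0: return (uint32_t)(int32_t)(int8_t)mem[addr];           /* LB:  sign-extend */
        case 0x1: return (uint32_t)(int32_t)(int16_t)rd16(mem, addr);    /* LH:  sign-extend */
        case 0x2: return (uint32_t)rd16(mem, addr)
                       | ((uint32_t)rd16(mem, addr + 2) << 16);          /* LW               */
        case 0x4: return mem[addr];                                      /* LBU: zero-extend */
        case 0x5: return rd16(mem, addr);                                /* LHU: zero-extend */
        default:  return 0;                                              /* assumed fallback */
    }
}
```

With the bytes 0x80 0xFF at address 0, LB returns 0xFFFFFF80 while LBU returns 0x00000080, which is the whole difference between the signed and unsigned variants.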

3.5 - Stores [TOC]

Store instructions write a register value to memory at address x[rs1] + sext(imm). All use opcode 0x23 (S-type). There is no destination register. The S-type immediate is split across two fields — see Section 2.5 for how to reassemble it.

SW rs2, imm(rs1) Store Word
opcode=0x23   funct3=0x2

M[x[rs1] + sext(imm)] = x[rs2][31:0]
Stores the 32-bit value of rs2 to memory.

SH rs2, imm(rs1) Store Halfword
opcode=0x23   funct3=0x1

M[x[rs1] + sext(imm)] = x[rs2][15:0]
Stores the lower 16 bits of rs2 to memory.

SB rs2, imm(rs1) Store Byte
opcode=0x23   funct3=0x0

M[x[rs1] + sext(imm)] = x[rs2][7:0]
Stores the lowest byte of rs2 to memory.
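
Stores mirror the loads: funct3 selects how many low-order bytes of rs2 are written, least-significant byte first. A minimal sketch, with store_value as a hypothetical name and no bounds or alignment checks:

```c
#include <stdint.h>

/* Write the low 1, 2, or 4 bytes of value to mem[addr...], little-endian. */
void store_value(uint8_t *mem, uint32_t addr, uint32_t funct3, uint32_t value) {
    uint32_t nbytes;
    switch (funct3) {
        case 0x0: nbytes = 1; break;   /* SB */
        case 0x1: nbytes = 2; break;   /* SH */
        case 0x2: nbytes = 4; break;   /* SW */
        default:  return;              /* assumed policy: ignore unknown widths */
    }
    for (uint32_t i = 0; i < nbytes; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}
```

Bytes above the selected width are left untouched, so SB of 0x1234 changes one byte to 0x34 and nothing else.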

3.6 - Branches [TOC]

Branch instructions compare two registers and conditionally add a signed offset to the PC. All use opcode 0x63 (B-type). The branch target is PC + sext(imm), where imm is a 13-bit signed offset. The offset is always even (bit 0 is implied zero). If the branch is not taken, execution continues at PC + 4 as normal.

Important for emulators: branch instructions set the PC themselves, whether or not the branch is taken. In your cpu_step() function, return immediately after the branch handler has run; do not add 4 to the PC afterwards.

BEQ rs1, rs2, imm Branch if Equal
opcode=0x63   funct3=0x0

if (x[rs1] == x[rs2]) PC += sext(imm) else PC += 4
Branches if rs1 equals rs2.

BNE rs1, rs2, imm Branch if Not Equal
opcode=0x63   funct3=0x1

if (x[rs1] != x[rs2]) PC += sext(imm) else PC += 4
Branches if rs1 does not equal rs2. Used to implement loops.

BLT rs1, rs2, imm Branch if Less Than
opcode=0x63   funct3=0x4

if ((int32_t)x[rs1] < (int32_t)x[rs2]) PC += sext(imm) else PC += 4
Signed comparison. Branches if rs1 < rs2.

BGE rs1, rs2, imm Branch if Greater or Equal
opcode=0x63   funct3=0x5

if ((int32_t)x[rs1] >= (int32_t)x[rs2]) PC += sext(imm) else PC += 4
Signed comparison. Branches if rs1 >= rs2.

BLTU rs1, rs2, imm Branch if Less Than Unsigned
opcode=0x63   funct3=0x6

if (x[rs1] < x[rs2]) PC += sext(imm) else PC += 4  (unsigned)
Unsigned version of BLT.

BGEU rs1, rs2, imm Branch if Greater or Equal Unsigned
opcode=0x63   funct3=0x7

if (x[rs1] >= x[rs2]) PC += sext(imm) else PC += 4  (unsigned)
Unsigned version of BGE.
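
The six comparisons and the resulting PC can be separated into two small functions, which makes the taken and not-taken paths explicit. The names branch_taken and branch_next_pc are hypothetical, and treating the unused funct3 values 0x2 and 0x3 as "not taken" is an assumption.

```c
#include <stdint.h>

/* Returns 1 if the branch selected by funct3 is taken; a = x[rs1], b = x[rs2]. */
int branch_taken(uint32_t funct3, uint32_t a, uint32_t b) {
    switch (funct3) {
        case 0x0: return a == b;                       /* BEQ  */
        case 0x1: return a != b;                       /* BNE  */
        case 0x4: return (int32_t)a <  (int32_t)b;     /* BLT  */
        case 0x5: return (int32_t)a >= (int32_t)b;     /* BGE  */
        case 0x6: return a <  b;                       /* BLTU */
        case 0x7: return a >= b;                       /* BGEU */
        default:  return 0;                            /* assumed fallback */
    }
}

/* PC after the branch: target if taken, next instruction otherwise. */
uint32_t branch_next_pc(uint32_t pc, uint32_t funct3,
                        uint32_t a, uint32_t b, int32_t imm) {
    return branch_taken(funct3, a, b) ? pc + (uint32_t)imm : pc + 4;
}
```

A backward branch is just a negative offset: BEQ at PC 0x100 with imm = -8 and equal operands continues at 0xF8.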

3.7 - Jumps [TOC]

Jump instructions unconditionally transfer control to a target address. They also write the return address (PC + 4) into a register, enabling function calls. Like branches, jump handlers must set the PC themselves and return without adding 4.

JAL rd, imm Jump and Link
opcode=0x6F   J-type

x[rd] = PC + 4;  PC += sext(imm)
Saves the address of the next instruction into rd (the return address), then jumps to PC + offset. The offset is a 21-bit signed value. Used for function calls: JAL ra, function_name. If rd is x0, this is a plain unconditional jump (no link).

JALR rd, rs1, imm Jump and Link Register
opcode=0x67   funct3=0x0   I-type

x[rd] = PC + 4;  PC = (x[rs1] + sext(imm)) & ~1
Jumps to an address stored in a register (plus an optional offset). The lowest bit of the target is forced to 0. Used to return from functions: JALR x0, ra, 0 (jump to return address, discard link).

Note: The target address clears bit 0 (& ~1) to ensure alignment. Always implement this masking.
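
One detail is easy to get wrong in JALR: rd and rs1 may name the same register, so the target has to be computed before the link value is written. The sketch below shows that ordering. The CPU struct layout and the name exec_jalr_sketch are assumptions made for this example; the handler used elsewhere in this document takes the raw instruction word instead of a decoded immediate.

```c
#include <stdint.h>

typedef struct {
    uint32_t regs[32];
    uint32_t pc;
} CPU;   /* assumed minimal layout */

void exec_jalr_sketch(CPU *cpu, uint32_t rd, uint32_t rs1, int32_t imm) {
    /* Read rs1 and compute the target first, in case rd == rs1. */
    uint32_t base   = (rs1 == 0) ? 0 : cpu->regs[rs1];
    uint32_t target = (base + (uint32_t)imm) & ~(uint32_t)1;   /* clear bit 0 */

    if (rd != 0) cpu->regs[rd] = cpu->pc + 4;   /* link; writes to x0 are dropped */
    cpu->pc = target;
}
```

With JALR x1, x1, 4 at PC 0x1000 and x1 = 0x2001, the PC becomes 0x2004 and x1 becomes 0x1004; writing the link first would have jumped to 0x1008 instead.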

3.8 - System [TOC]

ECALL Environment Call
opcode=0x73   funct3=0x0   imm=0x000

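Makes a request to the execution environment (operating system or emulator). In the convention used by this document, the syscall number is in register a0 (x10) and the arguments are in a1–a7 (x11–x17). Note that the Linux RISC-V ABI differs: it passes the syscall number in a7 and the arguments starting at a0, so programs compiled for Linux expect that layout.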

Common syscall numbers used in RISC-V programs:

a0 value   Syscall         Arguments
1          print integer   a1 = integer to print
4          print string    a1 = address of null-terminated string
10         exit
93         exit (Linux)    a1 = exit code

EBREAK Environment Break
opcode=0x73   funct3=0x0   imm=0x001

Transfers control to the debugger. In a basic emulator, treat this the same as ECALL with a0=10 (halt the emulator) or simply print a debug message and stop.
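
A handler for the syscall table above might look like the following sketch. It follows this document's convention (number in a0, argument in a1). The function name, the return-value protocol (1 means halt), printing the integer as signed, and stopping on unknown numbers are all assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if the emulator should halt, 0 if it should continue. */
int handle_ecall(uint32_t a0, uint32_t a1,
                 const uint8_t *mem, uint32_t mem_size, FILE *out) {
    switch (a0) {
        case 1:                                   /* print integer (assumed signed) */
            fprintf(out, "%d", (int)(int32_t)a1);
            return 0;
        case 4:                                   /* print null-terminated string */
            for (uint32_t p = a1; p < mem_size && mem[p] != 0; p++)
                fputc(mem[p], out);
            return 0;
        case 10:                                  /* exit */
        case 93:                                  /* exit (Linux number) */
            return 1;
        default:
            fprintf(stderr, "unknown syscall %u\n", (unsigned)a0);
            return 1;                             /* assumed policy: stop */
    }
}
```

The string loop stops at the end of memory as well as at the terminator, so a missing null byte cannot run past the array.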


4.0 - Opcode Map [TOC]

Quick reference: opcode → instruction group. Use this as your switch statement guide.

Opcode (hex)   Format   Instructions                                   Identified by
0x33           R        ADD SUB AND OR XOR SLL SRL SRA SLT SLTU        funct3, funct7
0x13           I        ADDI ANDI ORI XORI SLTI SLTIU SLLI SRLI SRAI   funct3, (funct7 for shifts)
0x03           I        LB LH LW LBU LHU                               funct3
0x23           S        SB SH SW                                       funct3
0x63           B        BEQ BNE BLT BGE BLTU BGEU                      funct3
0x6F           J        JAL                                            (only one)
0x67           I        JALR                                           funct3=0
0x37           U        LUI                                            (only one)
0x17           U        AUIPC                                          (only one)
0x73           I        ECALL EBREAK                                   imm (0=ECALL, 1=EBREAK)

5.0 - Emulator Implementation Notes [TOC]

This section contains practical notes for implementing the emulator in C.

5.1 - Fetch-Decode-Execute Loop [TOC]

The CPU executes one instruction per call to cpu_step(). The main loop calls this function repeatedly until a halt condition is reached.

void cpu_step(CPU *cpu) {

    /* 1. FETCH — read 4 bytes from memory at current PC */
    uint32_t instr = mem_read32(cpu, cpu->pc);

    /* 2. DECODE — extract common fields */
    uint32_t opcode = instr        & 0x7F;
    uint32_t rd     = (instr >> 7) & 0x1F;
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t rs1    = (instr >> 15) & 0x1F;
    uint32_t rs2    = (instr >> 20) & 0x1F;
    uint32_t funct7 = (instr >> 25) & 0x7F;

    /* 3. EXECUTE — dispatch to handler */
    switch (opcode) {
        case 0x33: exec_rtype (cpu, rd, rs1, rs2, funct3, funct7); break;
        case 0x13: exec_itype (cpu, rd, rs1, funct3, instr);       break;
        case 0x03: exec_load  (cpu, rd, rs1, funct3, instr);       break;
        case 0x23: exec_store (cpu, rs1, rs2, funct3, instr);      break;
        case 0x63: exec_branch(cpu, rs1, rs2, funct3, instr);      return;
        case 0x6F: exec_jal   (cpu, rd, instr);                    return;
        case 0x67: exec_jalr  (cpu, rd, rs1, instr);               return;
        case 0x37: exec_lui   (cpu, rd, instr);                    break;
        case 0x17: exec_auipc (cpu, rd, instr);                    break;
        /* ECALL/EBREAK do not set the PC, so break and let step 4 advance it */
        case 0x73: exec_ecall (cpu);                               break;
        default:
            printf("Unknown opcode 0x%02X at PC=0x%08X\n", opcode, cpu->pc);
            return;
    }

    /* 4. ADVANCE PC (branches/jumps return early and skip this) */
    cpu->pc += 4;
}
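
The main loop around cpu_step() can be as small as the sketch below. It assumes a halted flag in the CPU struct, which this document does not define, and adds a step limit so that a runaway guest program cannot hang the emulator. The step function is passed as a pointer only so that the example is self-contained; demo_step is a stand-in, not a real decoder.

```c
#include <stdint.h>

typedef struct {
    uint32_t regs[32];
    uint32_t pc;
    int      halted;   /* hypothetical flag, set by an exit syscall or a decode error */
} CPU;

typedef void (*step_fn)(CPU *);

/* Run until the CPU halts or max_steps instructions have executed.
   Returns the number of steps actually taken. */
uint64_t cpu_run(CPU *cpu, step_fn step, uint64_t max_steps) {
    uint64_t executed = 0;
    while (!cpu->halted && executed < max_steps) {
        step(cpu);
        executed++;
    }
    return executed;
}

/* Stand-in for cpu_step(): advances the PC and halts once it reaches 16. */
void demo_step(CPU *cpu) {
    cpu->pc += 4;
    if (cpu->pc >= 16) cpu->halted = 1;
}
```

In a real emulator the call would be cpu_run(&cpu, cpu_step, limit), with the unknown-opcode path and the exit syscalls setting the flag.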

5.2 - Field Extraction in C [TOC]

All instruction fields are extracted with two operations: right-shift to bring the field down to bit 0, then AND with a mask to clear all bits above the field.

/* Pattern: (instr >> START_BIT) & MASK */
/* Mask for N bits = (1 << N) - 1       */

/* 7-bit mask  = 0x7F  (bits 6:0)  — used for opcode, funct7  */
/* 5-bit mask  = 0x1F  (bits 4:0)  — used for rd, rs1, rs2    */
/* 3-bit mask  = 0x7   (bits 2:0)  — used for funct3           */

uint32_t opcode = instr        & 0x7F;   /* bits [6:0]   */
uint32_t rd     = (instr >> 7) & 0x1F;  /* bits [11:7]  */
uint32_t funct3 = (instr >> 12) & 0x7;  /* bits [14:12] */
uint32_t rs1    = (instr >> 15) & 0x1F; /* bits [19:15] */
uint32_t rs2    = (instr >> 20) & 0x1F; /* bits [24:20] */
uint32_t funct7 = (instr >> 25) & 0x7F; /* bits [31:25] */
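
The shift-and-mask pattern can also be wrapped in a helper that takes the bit range directly, which reads closer to the [hi:lo] notation used in the format table. The function name bits is hypothetical; the width of 32 is special-cased because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Extract instr[hi:lo], inclusive. Requires 0 <= lo <= hi <= 31. */
uint32_t bits(uint32_t instr, unsigned hi, unsigned lo) {
    unsigned width = hi - lo + 1;
    uint32_t mask  = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (instr >> lo) & mask;
}
```

For 0x00B50533, which encodes ADD x10, x10, x11, bits(instr, 6, 0) is 0x33, bits(instr, 11, 7) is 10, and bits(instr, 24, 20) is 11.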

5.3 - Sign Extension [TOC]

RISC-V immediates are signed. A 12-bit immediate can represent values from -2048 to +2047. When you use it in a 32-bit addition, you must first sign-extend it — that is, fill the upper 20 bits with copies of the immediate's sign bit (bit 11).

The simplest way to sign-extend in C is to cast the instruction to int32_t before shifting right. Arithmetic right shift on a signed type replicates the sign bit:

/* Sign-extend a 12-bit I-type immediate */
int32_t imm = (int32_t)instr >> 20;
/* The cast to int32_t makes >> an arithmetic shift.     */
/* Bit 31 of instr is the sign bit of the 12-bit         */
/* immediate, so the shift copies it into all upper bits */
/* of the result.                                        */

/* Example: instr = 0xFFF00013  (ADDI x0, x0, -1) */
/* (int32_t)0xFFF00013 >> 20  =  0xFFFFFFFF  =  -1 */

Note: Right-shifting a negative signed value (signed_val >> n) is implementation-defined in C. GCC, Clang, and MSVC all implement it as an arithmetic shift, so the cast-and-shift idiom is safe to rely on with those compilers.
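
Where relying on implementation-defined shift behaviour is a concern, sign extension can be done with an XOR and a subtraction instead. The sketch below is one such formulation; the function is named sext after the notation used in this document, but the signature is an assumption. The subtraction is done in 64 bits so that the final conversion to int32_t is always in range.

```c
#include <stdint.h>

/* Sign-extend the low `width` bits of value. Requires 1 <= width <= 32. */
int32_t sext(uint32_t value, unsigned width) {
    uint32_t sign = (uint32_t)1 << (width - 1);   /* position of the sign bit     */
    uint32_t mask = sign | (sign - 1);            /* low `width` bits set         */
    /* Flipping the sign bit and subtracting it maps 0..2^width-1
       onto -2^(width-1)..2^(width-1)-1. */
    int64_t v = (int64_t)((value & mask) ^ sign) - (int64_t)sign;
    return (int32_t)v;
}
```

For an already-extracted I-type field, sext(instr >> 20, 12) gives the same result as the cast-and-shift idiom.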