This document is a technical reference for the RISC-V RV32I base integer instruction set, written specifically for people building an emulator in C. It covers only the 32-bit base integer ISA (RV32I). Extensions such as M (multiply/divide), A (atomics), F (float), and D (double) are not covered here.
Throughout this document, the following notation is used:
| Notation | Meaning |
|---|---|
| x[rd] | the destination register |
| x[rs1] | the first source register |
| x[rs2] | the second source register |
| imm | a sign-extended immediate value |
| PC | the Program Counter |
| M[addr] | memory at address addr |
| sext(x) | sign-extend x to 32 bits |

RISC-V (pronounced "risk five") is a free and open instruction set architecture (ISA) developed at UC Berkeley, where the project began in 2010. Unlike x86 or ARM, the RISC-V ISA specification is publicly available and can be implemented without licensing fees, making it well suited to education, research, and custom hardware design.
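RISC-V follows the Reduced Instruction Set Computer (RISC) philosophy: a small number of simple instructions (typically designed to execute in one cycle), fixed-width instruction encoding, and a large register file. The base integer ISA, called RV32I, contains only 40 instructions in the current ratified specification (version 2.0 of the specification counted 47, because it still included the CSR instructions and FENCE.I, which have since moved to separate extensions). This small size makes it a good architecture for building a first CPU emulator.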
The RISC-V ISA is modular. The base integer ISA (I) can be extended with standard extensions:
| Extension | Name | Adds |
|---|---|---|
| M | Multiply | MUL, DIV, REM and their unsigned variants |
| A | Atomic | Atomic memory operations for multi-core |
| F | Float | Single-precision floating-point (32-bit) |
| D | Double | Double-precision floating-point (64-bit) |
| C | Compressed | 16-bit compressed instructions |
This document covers RV32I only. Implement this base set before adding any extensions.
This section describes the memory model, register file, program counter, and instruction encoding of RV32I.
RV32I uses a flat, byte-addressable memory space of 2^32 bytes (4 GiB). Addresses are 32-bit unsigned integers. All memory accesses are performed through load and store instructions; there are no memory-to-memory operations.
For an emulator, memory is implemented as a simple byte array:
#include <stdint.h>
#define MEM_SIZE (1024 * 1024) /* 1 MB is enough to run small programs */
uint8_t memory[MEM_SIZE];
A typical memory layout when loading a program looks like this:
+---------------------------+= 0xFFFFFFFF  top of address space
|                           |
|        (unmapped)         |
|                           |
+---------------------------+
|                           |
|          Stack            |  grows downward from high address
|            |              |
|            v              |
|                           |
|            ^              |
|            |              |
|           Heap            |  grows upward
|                           |
+---------------------------+= depends on program
|      .data / .bss         |  initialized and uninitialized globals
+---------------------------+
|          .text            |  program instructions (read-only)
+---------------------------+= entry point (from ELF header, e.g. 0x00010000)
|                           |
|        (reserved)         |
|                           |
+---------------------------+= 0x00000000  bottom of address space
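RISC-V is a little-endian architecture: the least-significant byte of a multi-byte value is stored at the lowest address. On an x86 or x86-64 host this matches the host byte order, so you can read a multi-byte value by copying its bytes with memcpy. A direct pointer cast also works in practice on these hosts, but it can violate C's alignment and strict-aliasing rules.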
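If the emulator should also run on hosts with a different byte order, one option is to assemble values byte by byte. The sketch below is a minimal illustration, not the definitive memory interface: the names mem_read32 and mem_write32 echo the helper name used later in this document, but the signatures (a raw byte pointer plus an explicit size) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from a byte array.
   size is the length of mem in bytes (assumed to be at least 4). */
static uint32_t mem_read32(const uint8_t *mem, uint32_t size, uint32_t addr)
{
    assert(size >= 4 && addr <= size - 4);   /* all four bytes in range */
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Hypothetical helper: write a 32-bit value in little-endian byte order. */
static void mem_write32(uint8_t *mem, uint32_t size, uint32_t addr, uint32_t value)
{
    assert(size >= 4 && addr <= size - 4);
    mem[addr]     = (uint8_t)(value);         /* least-significant byte first */
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}
```

Because the bytes are combined with shifts, the result does not depend on the host's endianness or on the alignment of addr.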
RV32I has 32 general-purpose integer registers, each 32 bits wide, named x0 through x31. Register x0 is special: it is hardwired to the value zero. Any write to x0 is silently discarded. Any read from x0 always returns 0x00000000.
Each register also has an ABI (Application Binary Interface) name used in assembly language. The ABI names reflect the register's conventional purpose when calling functions:
| Register | ABI Name | Description | Preserved across call? |
|---|---|---|---|
| x0 | zero | Hardwired zero — reads always return 0, writes ignored | — (constant) |
| x1 | ra | Return address — where to jump back after a function call | No (caller saves) |
| x2 | sp | Stack pointer — top of the current stack frame | Yes (callee saves) |
| x3 | gp | Global pointer | — |
| x4 | tp | Thread pointer | — |
| x5 | t0 | Temporary / alternate link register | No |
| x6–x7 | t1–t2 | Temporaries | No |
| x8 | s0 / fp | Saved register / frame pointer | Yes |
| x9 | s1 | Saved register | Yes |
| x10–x11 | a0–a1 | Function arguments / return values | No |
| x12–x17 | a2–a7 | Function arguments | No |
| x18–x27 | s2–s11 | Saved registers | Yes |
| x28–x31 | t3–t6 | Temporaries | No |
In your emulator, implement two wrapper functions for register access that enforce the x0 rule:
uint32_t reg_read(CPU *cpu, uint32_t r) { return (r == 0) ? 0 : cpu->regs[r]; }
void reg_write(CPU *cpu, uint32_t r, uint32_t v) { if (r != 0) cpu->regs[r] = v; }
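The wrappers above assume a CPU type with a regs array. A minimal sketch of such a type follows; only the regs and pc fields are implied by this document, while the cpu_init helper and the range assertions are additions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU state: 32 integer registers plus the program counter. */
typedef struct {
    uint32_t regs[32];   /* x0..x31; regs[0] is never written */
    uint32_t pc;         /* address of the current instruction */
} CPU;

/* Hypothetical initializer: clear all registers and set the entry point. */
static void cpu_init(CPU *cpu, uint32_t entry)
{
    memset(cpu, 0, sizeof *cpu);
    cpu->pc = entry;
}

static uint32_t reg_read(const CPU *cpu, uint32_t r)
{
    assert(r < 32);                       /* register numbers are 5 bits */
    return (r == 0) ? 0 : cpu->regs[r];
}

static void reg_write(CPU *cpu, uint32_t r, uint32_t v)
{
    assert(r < 32);
    if (r != 0)                           /* writes to x0 are discarded */
        cpu->regs[r] = v;
}
```

Routing every register access through these two functions keeps the x0 rule in one place, so individual instruction handlers do not need to special-case it.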
The Program Counter (PC) is a separate 32-bit register, not part of the general-purpose register file. It holds the address of the instruction currently being executed.
After most instructions the PC advances by 4 (one instruction = 4 bytes). Branch and jump instructions set the PC to a computed target address instead. The PC must always be aligned to a 4-byte boundary. Misaligned PC values cause an instruction-address-misaligned exception (which you can ignore in a basic emulator).
Every RV32I instruction is exactly 32 bits wide. There are six instruction formats. The opcode field is always in bits [6:0] regardless of format. The other fields change position depending on the format.
| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Format |
|---|---|---|---|---|---|---|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode | B-type |
| imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode | U-type |
| imm[20,10:5] | imm[4:1,11] | imm[19:15] | imm[14:12] | rd | opcode | J-type |
Field sizes:
| Field | Bits | Description |
|---|---|---|
| opcode | 7 | Identifies the instruction group |
| rd | 5 | Destination register (0–31) |
| funct3 | 3 | Sub-opcode: distinguishes instructions within a group |
| rs1 | 5 | First source register (0–31) |
| rs2 | 5 | Second source register (0–31) |
| funct7 | 7 | Second sub-opcode: used in R-type and shift I-type |
| imm | 12–20 | Immediate value, format-dependent |
Immediate values are embedded in the instruction word. The sign bit of every immediate is always placed at bit 31 of the instruction, which allows for fast sign extension in hardware. Some formats (B and J) have their immediate bits shuffled to simplify hardware layout — you must reassemble them in the correct order in your emulator.
Immediate extraction in C for each format:
/* I-type: 12-bit signed immediate, bits [31:20], sign-extended */
int32_t imm_i = (int32_t)instr >> 20;
/* S-type: 12-bit signed immediate, split across [31:25] and [11:7] */
int32_t imm_s = (((int32_t)instr >> 20) & ~0x1F) | ((instr >> 7) & 0x1F);
/* B-type: 13-bit signed immediate, bit 0 always 0 (halfword aligned) */
/* bit[12] = instr[31] bit[11] = instr[7] */
/* bit[10:5] = instr[30:25] bit[4:1] = instr[11:8] */
int32_t imm_b = ((int32_t)(instr & 0x80000000) >> 19)
| ((instr & 0x00000080) << 4)
| ((instr >> 20) & 0x7E0)
| ((instr >> 7) & 0x1E);
/* U-type: 20-bit immediate in upper bits, lower 12 bits are zero */
uint32_t imm_u = instr & 0xFFFFF000;
/* J-type: 21-bit signed immediate, bit 0 always 0 */
/* bit[20] = instr[31] bit[19:12] = instr[19:12] */
/* bit[11] = instr[20] bit[10:1] = instr[30:21] */
int32_t imm_j = ((int32_t)(instr & 0x80000000) >> 11)
| (instr & 0x000FF000)
| ((instr >> 9) & 0x800)
| ((instr >> 20) & 0x7FE);
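One way to gain confidence in the shuffled formats is to run the expressions on instruction words whose immediates are known. The sketch below wraps the S-, B-, and J-type expressions from above in functions; the function names are illustrative assumptions, and the code relies on arithmetic right shift of negative values, as the expressions above do.

```c
#include <stdint.h>

/* S-type: imm[11:5] from instr[31:25], imm[4:0] from instr[11:7]. */
static int32_t imm_s(uint32_t instr)
{
    return (int32_t)((uint32_t)(((int32_t)instr >> 20) & ~0x1F)
                   | ((instr >> 7) & 0x1Fu));
}

/* B-type: 13-bit signed offset, bit 0 is always zero. */
static int32_t imm_b(uint32_t instr)
{
    return (int32_t)((uint32_t)((int32_t)(instr & 0x80000000u) >> 19) /* bit 12 + sign */
                   | ((instr & 0x00000080u) << 4)                     /* bit 11        */
                   | ((instr >> 20) & 0x7E0u)                         /* bits 10:5     */
                   | ((instr >> 7) & 0x1Eu));                         /* bits 4:1      */
}

/* J-type: 21-bit signed offset, bit 0 is always zero. */
static int32_t imm_j(uint32_t instr)
{
    return (int32_t)((uint32_t)((int32_t)(instr & 0x80000000u) >> 11) /* bit 20 + sign */
                   | (instr & 0x000FF000u)                            /* bits 19:12    */
                   | ((instr >> 9) & 0x800u)                          /* bit 11        */
                   | ((instr >> 20) & 0x7FEu));                       /* bits 10:1     */
}
```

For example, 0xFE000EE3 encodes BEQ x0, x0, -4 and 0xFFDFF06F encodes JAL x0, -4, so imm_b and imm_j should both return -4 for those words; 0xFE512E23 encodes SW x5, -4(x2), so imm_s should return -4 as well.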
The RV32I instructions are described below, grouped by instruction type. Each entry shows the assembly syntax, the function codes needed to identify it, and the operation it performs. The one base instruction not covered here is FENCE (opcode 0x0F); a single-threaded emulator can treat it as a no-op.
These instructions take two source registers and write a result to a destination register. All use opcode 0x33 (R-type). The funct3 field selects the operation; funct7 distinguishes ADD/SUB and SRL/SRA.
ADD rd, rs1, rs2 (funct3=0x0, funct7=0x00): x[rd] = x[rs1] + x[rs2]
Adds the values in rs1 and rs2 and stores the result in rd. Overflow is ignored: the result wraps around modulo 2^32.
SUB rd, rs1, rs2 (funct3=0x0, funct7=0x20): x[rd] = x[rs1] - x[rs2]
Subtracts rs2 from rs1. Same opcode and funct3 as ADD; distinguished by funct7. Overflow wraps.
AND rd, rs1, rs2 (funct3=0x7, funct7=0x00): x[rd] = x[rs1] & x[rs2]
Performs bitwise AND of rs1 and rs2.
OR rd, rs1, rs2 (funct3=0x6, funct7=0x00): x[rd] = x[rs1] | x[rs2]
Performs bitwise OR of rs1 and rs2.
XOR rd, rs1, rs2 (funct3=0x4, funct7=0x00): x[rd] = x[rs1] ^ x[rs2]
Performs bitwise exclusive OR of rs1 and rs2.
SLL rd, rs1, rs2 (funct3=0x1, funct7=0x00): x[rd] = x[rs1] << (x[rs2] & 0x1F)
Shifts rs1 left by the shift amount held in the lower 5 bits of rs2. Zeros are shifted into the lower bits.
SRL rd, rs1, rs2 (funct3=0x5, funct7=0x00): x[rd] = x[rs1] >> (x[rs2] & 0x1F) (unsigned)
Shifts rs1 right logically by the lower 5 bits of rs2. Zeros are shifted into the upper bits (unsigned shift).
SRA rd, rs1, rs2 (funct3=0x5, funct7=0x20): x[rd] = x[rs1] >> (x[rs2] & 0x1F) (signed)
Shifts rs1 right arithmetically by the lower 5 bits of rs2. The sign bit is replicated into the upper bits. In C: (int32_t)rs1 >> shamt.
SLT rd, rs1, rs2 (funct3=0x2, funct7=0x00): x[rd] = ((int32_t)x[rs1] < (int32_t)x[rs2]) ? 1 : 0
Sets rd to 1 if rs1 is less than rs2 (signed comparison), 0 otherwise.
SLTU rd, rs1, rs2 (funct3=0x3, funct7=0x00): x[rd] = (x[rs1] < x[rs2]) ? 1 : 0 (unsigned)
Same as SLT but treats both operands as unsigned. Note: SLTU rd, x0, rs2 sets rd to 1 if rs2 is nonzero.
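The ten register-register operations can be sketched as one function that maps the decoded function codes to a result. This is an illustrative sketch rather than the document's exec_rtype handler: alu_rtype is a hypothetical pure helper that takes the two source values directly, and it relies on arithmetic right shift for SRA.

```c
#include <stdint.h>

/* Hypothetical helper: the value an R-type instruction writes to rd,
   given the source values a = x[rs1], b = x[rs2] and the function codes. */
static uint32_t alu_rtype(uint32_t a, uint32_t b, uint32_t funct3, uint32_t funct7)
{
    uint32_t shamt = b & 0x1F;                 /* shifts use only the low 5 bits */

    switch (funct3) {
    case 0x0: return (funct7 == 0x20) ? a - b : a + b;        /* SUB / ADD */
    case 0x1: return a << shamt;                              /* SLL       */
    case 0x2: return ((int32_t)a < (int32_t)b) ? 1u : 0u;     /* SLT       */
    case 0x3: return (a < b) ? 1u : 0u;                       /* SLTU      */
    case 0x4: return a ^ b;                                   /* XOR       */
    case 0x5: return (funct7 == 0x20)
                   ? (uint32_t)((int32_t)a >> shamt)          /* SRA       */
                   : a >> shamt;                              /* SRL       */
    case 0x6: return a | b;                                   /* OR        */
    default:  return a & b;                                   /* AND (0x7) */
    }
}
```

Keeping the arithmetic in unsigned 32-bit values gives the modulo-2^32 wraparound for free; only the signed comparison and the arithmetic shift need a cast to int32_t.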
These instructions take one source register and a 12-bit signed immediate. All use opcode 0x13 (I-type). The immediate is sign-extended to 32 bits before the operation.
ADDI rd, rs1, imm (funct3=0x0): x[rd] = x[rs1] + sext(imm)
One of the most frequently used instructions. Used to set registers (ADDI rd, x0, imm), copy registers (ADDI rd, rs1, 0), and adjust pointer offsets. There is no SUBI; use a negative immediate instead.
ANDI rd, rs1, imm (funct3=0x7): x[rd] = x[rs1] & sext(imm)
Bitwise AND with sign-extended immediate. Used for masking bits.
ORI rd, rs1, imm (funct3=0x6): x[rd] = x[rs1] | sext(imm)
Bitwise OR with sign-extended immediate. Used for setting bits.
XORI rd, rs1, imm (funct3=0x4): x[rd] = x[rs1] ^ sext(imm)
Bitwise XOR with sign-extended immediate. Note: XORI rd, rs1, -1 (imm=0xFFF) inverts all bits of rs1 (bitwise NOT).
SLTI rd, rs1, imm (funct3=0x2): x[rd] = ((int32_t)x[rs1] < sext(imm)) ? 1 : 0
Sets rd to 1 if rs1 is less than the signed immediate, 0 otherwise.
SLTIU rd, rs1, imm (funct3=0x3): x[rd] = (x[rs1] < (uint32_t)sext(imm)) ? 1 : 0
Unsigned version of SLTI. The immediate is sign-extended first, then treated as unsigned. Note: SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero.
SLLI rd, rs1, shamt (funct3=0x1, funct7=0x00): x[rd] = x[rs1] << shamt
Shifts rs1 left by the 5-bit immediate shift amount (shamt, bits [24:20]). Zeros fill the lower bits. The upper 7 bits of the immediate field (bits [31:25]) must be 0x00.
SRLI rd, rs1, shamt (funct3=0x5, funct7=0x00): x[rd] = x[rs1] >> shamt (unsigned)
Shifts rs1 right logically. Zeros fill the upper bits.
SRAI rd, rs1, shamt (funct3=0x5, funct7=0x20): x[rd] = (int32_t)x[rs1] >> shamt (signed)
Shifts rs1 right arithmetically. The sign bit replicates into the upper bits. Distinguished from SRLI by funct7=0x20.
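The immediate forms can be sketched the same way. The function below is a hypothetical helper (alu_itype is not a name from this document) that takes the value of rs1 and the raw instruction word, and returns the value to write to rd. It assumes arithmetic right shift for the immediate extraction and for SRAI.

```c
#include <stdint.h>

/* Hypothetical helper: result of an opcode-0x13 instruction.
   a is x[rs1]; instr is the full 32-bit instruction word. */
static uint32_t alu_itype(uint32_t a, uint32_t instr)
{
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t funct7 = (instr >> 25) & 0x7F;    /* only meaningful for shifts */
    uint32_t shamt  = (instr >> 20) & 0x1F;    /* shift amount, bits [24:20] */
    int32_t  imm    = (int32_t)instr >> 20;    /* sign-extended 12-bit immediate */

    switch (funct3) {
    case 0x0: return a + (uint32_t)imm;                       /* ADDI  */
    case 0x1: return a << shamt;                              /* SLLI  */
    case 0x2: return ((int32_t)a < imm) ? 1u : 0u;            /* SLTI  */
    case 0x3: return (a < (uint32_t)imm) ? 1u : 0u;           /* SLTIU */
    case 0x4: return a ^ (uint32_t)imm;                       /* XORI  */
    case 0x5: return (funct7 == 0x20)
                   ? (uint32_t)((int32_t)a >> shamt)          /* SRAI  */
                   : a >> shamt;                              /* SRLI  */
    case 0x6: return a | (uint32_t)imm;                       /* ORI   */
    default:  return a & (uint32_t)imm;                       /* ANDI (0x7) */
    }
}
```

Note that SLTIU compares against the sign-extended immediate reinterpreted as unsigned, which is why an immediate of -1 compares as 0xFFFFFFFF.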
These two instructions load a 20-bit immediate into the upper 20 bits of a register. They are used together with I-type instructions to construct 32-bit constants and addresses.
LUI rd, imm (opcode 0x37): x[rd] = imm << 12 (lower 12 bits zeroed)
Places the 20-bit immediate into bits [31:12] of rd and zeros the lower 12 bits. Used to load large constants: follow with ADDI to set the lower 12 bits. Because ADDI sign-extends its immediate, the LUI immediate must be one higher than the constant's upper 20 bits whenever bit 11 of the constant is set.
/* Load 0xDEADB000 into x5 */
LUI  x5, 0xDEADB      /* x5 = 0xDEADB000 */
ADDI x5, x5, 0x000    /* optionally set low 12 bits */
AUIPC rd, imm (opcode 0x17): x[rd] = PC + (imm << 12)
Adds the 20-bit immediate (shifted left 12) to the current PC and stores the result in rd. Used for position-independent code to compute PC-relative addresses.
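When LUI is paired with ADDI to build an arbitrary 32-bit constant, the ADDI immediate is sign-extended, so the upper part sometimes needs a correction of +1. The sketch below shows one common way to compute the two immediates; split_constant is a hypothetical name chosen for this example.

```c
#include <stdint.h>

/* Hypothetical helper: split value into a 20-bit LUI immediate and a signed
   12-bit ADDI immediate such that (hi20 << 12) + lo12 == value (mod 2^32). */
static void split_constant(uint32_t value, uint32_t *hi20, int32_t *lo12)
{
    uint32_t low = value & 0xFFFu;

    /* ADDI sees the low 12 bits as a signed number in -2048..2047. */
    *lo12 = (low & 0x800u) ? (int32_t)low - 0x1000 : (int32_t)low;

    /* Adding 0x800 rounds the upper part up when lo12 is negative. */
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
}
```

For 0xDEADBEEF this yields a LUI immediate of 0xDEADC and an ADDI immediate of -273, since 0xDEADC000 - 0x111 = 0xDEADBEEF.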
Load instructions read from memory at address x[rs1] + sext(imm) and write the result to rd. All use opcode 0x03 (I-type). The funct3 field selects the width and sign treatment.
LW rd, imm(rs1) (funct3=0x2): x[rd] = M[x[rs1] + sext(imm)][31:0]
Loads a 32-bit word from memory. The address must be 4-byte aligned (in a basic emulator you can ignore alignment). This is the most commonly used load instruction.
LH rd, imm(rs1) (funct3=0x1): x[rd] = sext(M[x[rs1] + sext(imm)][15:0])
Loads a 16-bit halfword and sign-extends it to 32 bits. In C: (int32_t)(int16_t)mem_read16(...).
LB rd, imm(rs1) (funct3=0x0): x[rd] = sext(M[x[rs1] + sext(imm)][7:0])
Loads a byte and sign-extends it to 32 bits. In C: (int32_t)(int8_t)mem_read8(...).
LHU rd, imm(rs1) (funct3=0x5): x[rd] = M[x[rs1] + sext(imm)][15:0] (zero-extended)
Loads a 16-bit halfword and zero-extends it to 32 bits (no sign extension).
LBU rd, imm(rs1) (funct3=0x4): x[rd] = M[x[rs1] + sext(imm)][7:0] (zero-extended)
Loads a byte and zero-extends it to 32 bits.
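The five loads differ only in width and in whether the result is sign-extended, which a short sketch can make concrete. The helper below is hypothetical (load_value is not a name from this document); it reads from a raw byte array without bounds checks and assumes the usual two's-complement behavior of the narrowing casts.

```c
#include <stdint.h>

/* Hypothetical helper: value a load instruction writes to rd.
   funct3: 0=LB, 1=LH, 2=LW, 4=LBU, 5=LHU. No bounds checking. */
static uint32_t load_value(const uint8_t *mem, uint32_t addr, uint32_t funct3)
{
    uint32_t width = 1u << (funct3 & 0x3);     /* 1, 2, or 4 bytes */
    uint32_t value = 0;

    for (uint32_t i = 0; i < width; i++)       /* little-endian assembly */
        value |= (uint32_t)mem[addr + i] << (8 * i);

    switch (funct3) {
    case 0x0: return (uint32_t)(int32_t)(int8_t)value;    /* LB: sign-extend byte     */
    case 0x1: return (uint32_t)(int32_t)(int16_t)value;   /* LH: sign-extend halfword */
    default:  return value;                               /* LW, LBU, LHU             */
    }
}
```

The low two bits of funct3 encode the width, and bit 2 selects the zero-extending variants, so LBU and LHU fall through to the plain return.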
Store instructions write a register value to memory at address x[rs1] + sext(imm). All use opcode 0x23 (S-type). There is no destination register. The S-type immediate is split across two fields — see Section 2.5 for how to reassemble it.
SW rs2, imm(rs1) (funct3=0x2): M[x[rs1] + sext(imm)] = x[rs2][31:0]
Stores the 32-bit value of rs2 to memory.
SH rs2, imm(rs1) (funct3=0x1): M[x[rs1] + sext(imm)] = x[rs2][15:0]
Stores the lower 16 bits of rs2 to memory.
SB rs2, imm(rs1) (funct3=0x0): M[x[rs1] + sext(imm)] = x[rs2][7:0]
Stores the lowest byte of rs2 to memory.
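A store writes only as many bytes as its width and leaves neighboring bytes untouched. The following sketch illustrates that with a hypothetical helper (store_value is an assumed name) that writes to a raw byte array without bounds checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: perform a store of the given width.
   funct3: 0=SB (1 byte), 1=SH (2 bytes), 2=SW (4 bytes). */
static void store_value(uint8_t *mem, uint32_t addr, uint32_t value, uint32_t funct3)
{
    assert(funct3 <= 2);                       /* other encodings are not RV32I stores */
    uint32_t width = 1u << funct3;

    for (uint32_t i = 0; i < width; i++)       /* least-significant byte first */
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}
```

For example, storing 0x12345678 with SH writes 0x78 and 0x56 to the first two bytes and does not modify the two bytes after them.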
Branch instructions compare two registers and conditionally add a signed offset to the PC. All use opcode 0x63 (B-type). The branch target is PC + sext(imm), where imm is a 13-bit signed offset. The offset is always even (bit 0 is implied zero). If the branch is not taken, execution continues at PC + 4 as normal.
BEQ rs1, rs2, label (funct3=0x0): if (x[rs1] == x[rs2]) PC += sext(imm) else PC += 4
Branches if rs1 equals rs2.
BNE rs1, rs2, label (funct3=0x1): if (x[rs1] != x[rs2]) PC += sext(imm) else PC += 4
Branches if rs1 does not equal rs2. Used to implement loops.
BLT rs1, rs2, label (funct3=0x4): if ((int32_t)x[rs1] < (int32_t)x[rs2]) PC += sext(imm) else PC += 4
Signed comparison. Branches if rs1 < rs2.
BGE rs1, rs2, label (funct3=0x5): if ((int32_t)x[rs1] >= (int32_t)x[rs2]) PC += sext(imm) else PC += 4
Signed comparison. Branches if rs1 >= rs2.
BLTU rs1, rs2, label (funct3=0x6): if (x[rs1] < x[rs2]) PC += sext(imm) else PC += 4 (unsigned)
Unsigned version of BLT.
BGEU rs1, rs2, label (funct3=0x7): if (x[rs1] >= x[rs2]) PC += sext(imm) else PC += 4 (unsigned)
Unsigned version of BGE.
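The branch logic splits naturally into a condition test and a next-PC computation. The sketch below shows one way to write both; branch_taken and branch_next_pc are hypothetical helper names, and the offset is assumed to be an already reassembled, sign-extended B-type immediate.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the branch condition holds.
   a = x[rs1], b = x[rs2]. */
static int branch_taken(uint32_t a, uint32_t b, uint32_t funct3)
{
    switch (funct3) {
    case 0x0: return a == b;                        /* BEQ  */
    case 0x1: return a != b;                        /* BNE  */
    case 0x4: return (int32_t)a <  (int32_t)b;      /* BLT  */
    case 0x5: return (int32_t)a >= (int32_t)b;      /* BGE  */
    case 0x6: return a <  b;                        /* BLTU */
    case 0x7: return a >= b;                        /* BGEU */
    default:  return 0;                             /* 0x2, 0x3: not defined */
    }
}

/* Hypothetical helper: address of the next instruction after a branch. */
static uint32_t branch_next_pc(uint32_t pc, int taken, int32_t imm)
{
    return taken ? pc + (uint32_t)imm : pc + 4;
}
```

A value such as 0xFFFFFFFF shows the difference between the signed and unsigned forms: it is less than 1 for BLT (it reads as -1) but greater than 1 for BLTU.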
Jump instructions unconditionally transfer control to a target address. They also write the return address (PC + 4) into a register, enabling function calls. Like branches, jump handlers must set the PC themselves and return without adding 4.
JAL rd, label (opcode 0x6F): x[rd] = PC + 4; PC += sext(imm)
Saves the address of the next instruction into rd (the return address), then jumps to PC + offset. The offset is a 21-bit signed value. Used for function calls: JAL ra, function_name. If rd is x0, this is a plain unconditional jump (no link).
JALR rd, rs1, imm (opcode 0x67, funct3=0x0): x[rd] = PC + 4; PC = (x[rs1] + sext(imm)) & ~1
Jumps to an address stored in a register (plus an optional offset). The lowest bit of the target is forced to 0. Used to return from functions: JALR x0, ra, 0 (jump to return address, discard link).
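One detail is easy to get wrong in JALR: when rd and rs1 are the same register, the target must be computed before the link value is written. The sketch below illustrates this ordering. The CPU type and the handler signatures, which take an already decoded immediate rather than the raw instruction word, are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical CPU state; regs[0] is assumed to always hold zero. */
typedef struct {
    uint32_t regs[32];
    uint32_t pc;
} CPU;

/* JAL: link, then jump relative to the current PC. */
static void exec_jal(CPU *cpu, uint32_t rd, int32_t imm)
{
    if (rd != 0)
        cpu->regs[rd] = cpu->pc + 4;           /* return address */
    cpu->pc += (uint32_t)imm;
}

/* JALR: compute the target first, because rd may be the same register as rs1. */
static void exec_jalr(CPU *cpu, uint32_t rd, uint32_t rs1, int32_t imm)
{
    uint32_t target = (cpu->regs[rs1] + (uint32_t)imm) & ~1u;   /* clear bit 0 */
    if (rd != 0)
        cpu->regs[rd] = cpu->pc + 4;
    cpu->pc = target;
}
```

Both handlers set the PC themselves, so the caller must not add 4 afterwards.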
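Note: JALR clears the lowest bit of the computed target address (& ~1) to ensure alignment. Always implement this masking.
ECALL (opcode 0x73, imm=0): Makes a request to the execution environment (operating system or emulator). In the convention used in this document, the syscall number is in register a0 (x10) and the arguments are in a1–a7 (x11–x17). Note that the Linux RISC-V ABI differs: it passes the syscall number in a7 and the arguments starting at a0.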
Common syscall numbers used in RISC-V programs:
| a0 value | Syscall | Arguments |
|---|---|---|
| 1 | print integer | a1 = integer to print |
| 4 | print string | a1 = address of null-terminated string |
| 10 | exit | — |
| 93 | exit (Linux) | a1 = exit code |
EBREAK (opcode 0x73, imm=1): Transfers control to the debugger. In a basic emulator, treat this the same as ECALL with a0=10 (halt the emulator) or simply print a debug message and stop.
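The syscall table above can be turned into a small handler. The sketch below is one possible shape, not a definitive implementation: the halted and exit_code fields, the extra mem parameter, and the choice to halt on unknown syscall numbers are all assumptions made for this example, and the string case does no bounds checking.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical CPU state with a halt flag and exit code. */
typedef struct {
    uint32_t regs[32];
    uint32_t pc;
    int      halted;
    int      exit_code;
} CPU;

/* Handle ECALL using this document's convention: number in a0, argument in a1. */
static void exec_ecall(CPU *cpu, const uint8_t *mem)
{
    uint32_t a0 = cpu->regs[10];
    uint32_t a1 = cpu->regs[11];

    switch (a0) {
    case 1:  printf("%d", (int)(int32_t)a1);            break;  /* print integer */
    case 4:  fputs((const char *)&mem[a1], stdout);     break;  /* print string  */
    case 10: cpu->halted = 1;                           break;  /* exit          */
    case 93: cpu->halted = 1; cpu->exit_code = (int)a1; break;  /* exit (Linux)  */
    default:
        fprintf(stderr, "unknown syscall %u\n", (unsigned)a0);
        cpu->halted = 1;                                        /* assumed policy */
        break;
    }
    cpu->pc += 4;    /* ECALL is not a jump, so step past it */
}
```

Advancing the PC inside the handler matters when the dispatcher returns early after ECALL; otherwise the same ECALL would execute again on the next step.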
Quick reference: opcode → instruction group. Use this as your switch statement guide.
| Opcode (hex) | Format | Instructions | Identified by |
|---|---|---|---|
| 0x33 | R | ADD SUB AND OR XOR SLL SRL SRA SLT SLTU | funct3, funct7 |
| 0x13 | I | ADDI ANDI ORI XORI SLTI SLTIU SLLI SRLI SRAI | funct3, (funct7 for shifts) |
| 0x03 | I | LB LH LW LBU LHU | funct3 |
| 0x23 | S | SB SH SW | funct3 |
| 0x63 | B | BEQ BNE BLT BGE BLTU BGEU | funct3 |
| 0x6F | J | JAL | n/a (only one) |
| 0x67 | I | JALR | funct3=0 |
| 0x37 | U | LUI | n/a (only one) |
| 0x17 | U | AUIPC | n/a (only one) |
| 0x73 | I | ECALL EBREAK | imm (0=ECALL, 1=EBREAK) |
This section contains practical notes for implementing the emulator in C.
The CPU executes one instruction per call to cpu_step(). The main loop calls this function repeatedly until a halt condition is reached.
void cpu_step(CPU *cpu) {
    /* 1. FETCH: read 4 bytes from memory at current PC */
    uint32_t instr = mem_read32(cpu, cpu->pc);

    /* 2. DECODE: extract common fields */
    uint32_t opcode = instr & 0x7F;
    uint32_t rd     = (instr >> 7)  & 0x1F;
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t rs1    = (instr >> 15) & 0x1F;
    uint32_t rs2    = (instr >> 20) & 0x1F;
    uint32_t funct7 = (instr >> 25) & 0x7F;

    /* 3. EXECUTE: dispatch to handler */
    switch (opcode) {
    case 0x33: exec_rtype (cpu, rd, rs1, rs2, funct3, funct7); break;
    case 0x13: exec_itype (cpu, rd, rs1, funct3, instr);       break;
    case 0x03: exec_load  (cpu, rd, rs1, funct3, instr);       break;
    case 0x23: exec_store (cpu, rs1, rs2, funct3, instr);      break;
    case 0x63: exec_branch(cpu, rs1, rs2, funct3, instr);      return;
    case 0x6F: exec_jal   (cpu, rd, instr);                    return;
    case 0x67: exec_jalr  (cpu, rd, rs1, instr);               return;
    case 0x37: exec_lui   (cpu, rd, instr);                    break;
    case 0x17: exec_auipc (cpu, rd, instr);                    break;
    case 0x73: exec_ecall (cpu);  /* must advance the PC itself, or halt */
               return;
    default:
        printf("Unknown opcode 0x%02X at PC=0x%08X\n", opcode, cpu->pc);
        return;  /* PC is unchanged here, so the main loop should stop */
    }

    /* 4. ADVANCE PC (branches/jumps return early and skip this) */
    cpu->pc += 4;
}
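To see the fetch, decode, execute, and advance steps working end to end, it can help to start with a deliberately tiny version. The sketch below is a self-contained toy, not the full cpu_step above: it understands only ADDI, treats every other instruction (including ECALL) as a halt, and uses hypothetical names (DemoCPU, demo_step, run_demo) and a hypothetical halted flag.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MEM_SIZE 64

typedef struct {
    uint32_t regs[32];
    uint32_t pc;
    int      halted;                 /* hypothetical halt flag */
    uint8_t  mem[DEMO_MEM_SIZE];
} DemoCPU;

/* FETCH: assemble a little-endian word at the current PC. */
static uint32_t demo_fetch(const DemoCPU *cpu)
{
    const uint8_t *p = &cpu->mem[cpu->pc];
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void demo_step(DemoCPU *cpu)
{
    uint32_t instr  = demo_fetch(cpu);
    uint32_t opcode = instr & 0x7F;
    uint32_t rd     = (instr >> 7)  & 0x1F;
    uint32_t funct3 = (instr >> 12) & 0x7;
    uint32_t rs1    = (instr >> 15) & 0x1F;

    if (opcode == 0x13 && funct3 == 0x0) {               /* ADDI */
        uint32_t imm = (uint32_t)((int32_t)instr >> 20);
        if (rd != 0)
            cpu->regs[rd] = cpu->regs[rs1] + imm;
    } else {
        cpu->halted = 1;                                 /* ECALL or unsupported */
    }
    cpu->pc += 4;
}

/* Load a three-instruction program at address 0 and run until it halts. */
static uint32_t run_demo(void)
{
    static const uint32_t program[] = {
        0x00500293u,   /* ADDI x5, x0, 5  */
        0xFFE28293u,   /* ADDI x5, x5, -2 */
        0x00000073u    /* ECALL           */
    };
    DemoCPU cpu;
    memset(&cpu, 0, sizeof cpu);
    for (uint32_t i = 0; i < 3; i++)
        for (uint32_t b = 0; b < 4; b++)
            cpu.mem[4 * i + b] = (uint8_t)(program[i] >> (8 * b));

    while (!cpu.halted && cpu.pc + 4 <= DEMO_MEM_SIZE)
        demo_step(&cpu);
    return cpu.regs[5];
}
```

Calling run_demo() executes the two ADDI instructions and stops at the ECALL, leaving 3 in x5. Growing this toy into the full dispatcher is then a matter of adding one opcode case at a time.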
All instruction fields are extracted with two operations: right-shift to bring the field down to bit 0, then AND with a mask to clear all bits above the field.
/* Pattern: (instr >> START_BIT) & MASK */
/* Mask for N bits = (1 << N) - 1 */
/* 7-bit mask = 0x7F (bits 6:0), used for opcode, funct7 */
/* 5-bit mask = 0x1F (bits 4:0), used for rd, rs1, rs2 */
/* 3-bit mask = 0x7 (bits 2:0), used for funct3 */
uint32_t opcode = instr & 0x7F;          /* bits [6:0]   */
uint32_t rd     = (instr >> 7)  & 0x1F;  /* bits [11:7]  */
uint32_t funct3 = (instr >> 12) & 0x7;   /* bits [14:12] */
uint32_t rs1    = (instr >> 15) & 0x1F;  /* bits [19:15] */
uint32_t rs2    = (instr >> 20) & 0x1F;  /* bits [24:20] */
uint32_t funct7 = (instr >> 25) & 0x7F;  /* bits [31:25] */
RISC-V immediates are signed. A 12-bit immediate can represent values from -2048 to +2047. When you use it in a 32-bit addition, you must first sign-extend it — that is, fill the upper 20 bits with copies of the immediate's sign bit (bit 11).
The simplest way to sign-extend in C is to cast the instruction to int32_t before shifting right. Arithmetic right shift on a signed type replicates the sign bit:
/* Sign-extend a 12-bit I-type immediate */
int32_t imm = (int32_t)instr >> 20;
/* The cast to int32_t makes >> an arithmetic shift. */
/* This replicates bit 31 (originally bit 31 of instr, */
/* which is the sign bit of the 12-bit immediate) into */
/* all upper bits automatically. */
/* Example: instr = 0xFFF00013 (ADDI x0, x0, -1) */
/* (int32_t)0xFFF00013 >> 20 = 0xFFFFFFFF = -1 */
Note: Right-shifting a negative signed value (signed_val >> n) is implementation-defined in C, but the mainstream compilers (GCC, Clang, MSVC) all implement it as an arithmetic shift. It is safe to rely on this for emulator development with those compilers.
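If you would rather not depend on the behavior of right-shifting negative values, sign extension can also be done with unsigned arithmetic only. The sketch below uses the common XOR-and-subtract technique; the function name sext mirrors the notation used in this document, but the two-argument signature is an assumption made for this example. The final conversion to int32_t still assumes the usual two's-complement wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of value to 32 bits (1 <= bits <= 32).
   Uses only unsigned shifts, XOR, and subtraction. */
static int32_t sext(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t sign = 1u << (bits - 1);          /* position of the sign bit */
    value &= (sign << 1) - 1u;                 /* keep only the low `bits` bits */
    return (int32_t)((value ^ sign) - sign);   /* flips and subtracts the sign bit */
}
```

With this helper, an I-type immediate can be written as sext(instr >> 20, 12), using an unsigned shift of the instruction word.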