R65 Compiler (r65c)
r65c compiles R65 source code to WLA-DX assembly for the 65816 processor. It can be used standalone or through the Makefile generated by r65x init.
r65c <file> [options]
Basic usage
# Compile to file
r65c game.r65 -o game.asm
# Compile to stdout
r65c game.r65
# Compile from stdin
cat game.r65 | r65c -
# With SNES hardware support (enables hw multiplier/divider)
r65c game.r65 -o game.asm --cfg snes
# With debug symbols for Mesen
r65c game.r65 -o game.asm --dbg
# Add include search paths
r65c game.r65 -o game.asm -I src -I lib
# Verbose progress
r65c game.r65 -o game.asm -v
Options reference
Output
| Flag | Description |
|---|---|
| -o FILE, --output FILE | Write assembly to file (default: stdout) |
| -v, --verbose | Show compilation progress |
| -q, --quiet | Suppress all output except errors |
Optimization
| Flag | Description |
|---|---|
| -O0 | Disable optimizations (fastest compilation) |
| -O1 | Standard optimizations (default) |
| -O2 | Aggressive optimizations with implicit inlining |
See Optimization levels in detail below for what each level enables.
Code generation
| Flag | Description |
|---|---|
| --cfg CONDITION | Set a configuration flag (can be repeated) |
| --abi {Default,FixedStack,Pascal} | Select calling convention |
| --dbg | Generate Mesen-compatible debug file (.dbg) |
| -I PATH, --include PATH | Add include search path (can be repeated) |
| --disable-scratch-parameters | Disable promotion of stack params to scratch registers |
| --disable-loop-promotion | Disable promotion of loop counters to X/Y registers |
The --cfg flag enables conditional compilation. Code guarded by if cfg(snes) { ... } or #[cfg(snes)] is only compiled when --cfg snes is passed. Multiple flags can be set: --cfg snes --cfg debug.
The --dbg flag generates a .dbg file alongside the assembly output containing source-level debug information in cc65/ld65 format, compatible with the Mesen emulator.
Diagnostic flags
These are useful for inspecting intermediate representations or debugging the compiler itself:
| Flag | Description |
|---|---|
| --dump-tokens | Print the lexer token stream |
| --dump-ast | Print the parsed abstract syntax tree |
| --dump-hir | Print the high-level intermediate representation |
| --dump-mir | Print the mid-level IR (control flow graph) |
| --stop-after PHASE | Stop compilation after a phase: parse, hir, typecheck, or mir |
r65c game.r65 --dump-ast # Print the parsed AST
r65c game.r65 --dump-hir # Print the high-level IR
r65c game.r65 --dump-mir # Print the mid-level IR (CFG)
r65c game.r65 --stop-after parse # Stop after parsing
Optimization levels in detail
The compiler applies optimizations in two phases: MIR-level passes that operate on the intermediate representation (control flow graph), and assembly-level peephole passes that clean up the final instruction stream. Both phases are skipped entirely at -O0.
-O0 — No optimization
All MIR and peephole optimization passes are disabled. The compiler emits straightforward code directly from the IR with no transformations. Analysis passes that affect correctness (parameter promotion, far pointer strategy, loop register promotion) still run.
Use -O0 during early development when you want the fastest compile times and the most predictable mapping between source and assembly. The output will be larger and slower but easier to read and debug.
-O1 — Standard optimization (default)
Enables all MIR-level and assembly-level passes. This is the recommended level for all normal development and release builds.
MIR-level passes
Dead function elimination removes functions that are never called. Entry points, interrupt handlers, and functions reachable through trait dispatch are preserved. After inlining (see below), this pass runs a second time to clean up functions that were fully inlined into their callers.
Dead code elimination uses liveness analysis to remove unreachable basic blocks and instructions that write to values never read.
Far-to-near call optimization converts far fn calls (JSL/RTL, 4 bytes, 8 cycles) to near calls (JSR/RTS, 3 bytes, 6 cycles) when the caller and callee are in the same bank, the function address is never taken, and the function is not an interrupt handler. Saves 1 byte and 2 cycles per call site.
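As a rough illustration of the rule above, the eligibility test and the resulting savings can be modelled in a few lines of Python. This is a sketch, not the compiler's code: the function names are hypothetical, and the byte and cycle figures are the ones quoted in the paragraph above.

```python
# Figures quoted in the text: JSL is 4 bytes / 8 cycles, JSR is 3 bytes / 6 cycles.
JSL_BYTES, JSL_CYCLES = 4, 8
JSR_BYTES, JSR_CYCLES = 3, 6


def can_convert_to_near(same_bank, address_taken, is_interrupt_handler):
    """Hypothetical predicate mirroring the three conditions in the text."""
    return same_bank and not address_taken and not is_interrupt_handler


def far_to_near_savings(call_sites):
    """Return (bytes saved, cycles saved if each call site executes once)."""
    return (call_sites * (JSL_BYTES - JSR_BYTES),
            call_sites * (JSL_CYCLES - JSR_CYCLES))
```

Under these figures, a function with 10 eligible call sites would shrink the ROM by 10 bytes; the cycle saving is paid back each time a call actually executes.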
Function inlining at -O1 only inlines functions explicitly marked with #[inline] or #[inline(always)], and only if they have fewer than 30 MIR instructions. Functions are never inlined if they are recursive, far, interrupt handlers, entry points, or contain inline assembly (asm!()). Functions marked #[inline(never)] are always respected.
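The -O1 inlining rules can be read as a single predicate. The Python sketch below is only a model of the prose above; the FnInfo record and its field names are assumptions made for illustration and do not correspond to any documented compiler structure.

```python
from dataclasses import dataclass

O1_MAX_MIR = 30  # "fewer than 30 MIR instructions", per the text


@dataclass
class FnInfo:
    """Hypothetical summary of a function, for illustration only."""
    mir_instructions: int
    inline_attr: str = ""  # "", "inline", "always", or "never"
    recursive: bool = False
    far: bool = False
    interrupt_handler: bool = False
    entry_point: bool = False
    has_inline_asm: bool = False


def inlinable_at_o1(f):
    # Only explicitly marked functions are candidates at -O1.
    if f.inline_attr not in ("inline", "always"):
        return False
    # Hard exclusions listed in the text.
    if (f.recursive or f.far or f.interrupt_handler
            or f.entry_point or f.has_inline_asm):
        return False
    return f.mir_instructions < O1_MAX_MIR
```

For example, `inlinable_at_o1(FnInfo(12, "inline"))` is true, while the same function with `far=True` or with 30 MIR instructions is rejected.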
Assembly-level peephole passes
The peephole optimizer runs iteratively until no further changes are made. Each iteration applies the following transformations:
Instruction folding — Replaces multi-instruction sequences with single instructions:
- LDA addr; CLC; ADC #1; STA addr → INC addr
- LDA addr; SEC; SBC #1; STA addr → DEC addr
- CLC; ADC #1 → INC A (when carry is dead after)
- SEC; SBC #1 → DEC A (when carry is dead after)
- LDA #$00; STA addr → STZ addr (removes the LDA when A is dead after)
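The first two folding rules can be sketched as a sliding-window pattern match over the instruction stream. The Python below is a simplified model, not r65c's implementation: it assumes one instruction per string, single-token operands, and it omits the register and flag liveness checks a real pass would need.

```python
def fold_inc_dec(instrs):
    """Fold LDA/CLC/ADC #1/STA and LDA/SEC/SBC #1/STA into INC/DEC."""
    out, i = [], 0
    while i < len(instrs):
        w = [s.split() for s in instrs[i:i + 4]]
        if (len(w) == 4
                and len(w[0]) == 2 and w[0][0] == "LDA"
                and not w[0][1].startswith("#")      # must be a memory operand
                and w[3] == ["STA", w[0][1]]):       # store back to the same address
            addr = w[0][1]
            if w[1] == ["CLC"] and w[2] == ["ADC", "#1"]:
                out.append("INC " + addr)
                i += 4
                continue
            if w[1] == ["SEC"] and w[2] == ["SBC", "#1"]:
                out.append("DEC " + addr)
                i += 4
                continue
        out.append(instrs[i])
        i += 1
    return out
```

Calling `fold_inc_dec(["LDA score", "CLC", "ADC #1", "STA score", "RTS"])` yields `["INC score", "RTS"]`; a sequence that stores to a different address than it loaded from is left untouched.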
Redundant load elimination — Tracks the known value in A across instructions. Eliminates loads when A already holds the required value. Immediate loads survive hardware stores; the tracker is cleared at labels, branches, and mode changes.
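A minimal Python model of this tracking is shown below. It is a deliberately conservative sketch under stated assumptions: only plain stores are treated as leaving A untouched, every other line (labels, branches, mode changes, any other opcode) clears the tracker, and flag effects of the removed load are ignored.

```python
# Assumption: these stores do not modify A, so a known immediate survives them.
STORES = {"STA", "STX", "STY", "STZ"}


def drop_redundant_immediate_loads(lines):
    known_a = None  # immediate operand currently known to be in A, e.g. "#$0F"
    out = []
    for line in lines:
        parts = line.split()
        op = parts[0] if parts else ""
        if op == "LDA" and len(parts) == 2 and parts[1].startswith("#"):
            if parts[1] == known_a:
                continue  # A already holds this value: drop the load
            known_a = parts[1]
        elif op not in STORES:
            known_a = None  # label, branch, mode change, or unknown opcode
        out.append(line)
    return out
```

With this model, writing the same constant to two hardware registers (`LDA #$0F`, `STA $2100`, `LDA #$0F`, `STA $2101`) keeps only the first load, while a label between the two loads preserves both.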
Dead store elimination — Removes stores to locations that are overwritten before being read.
Redundant transfer elimination — Removes unnecessary TAX, TAY, TXA, TYA when the destination already holds the value.
Redundant stack operation elimination — Removes PHA; PLA, PHX; PLX, PHY; PLY pairs that push and immediately pop.
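One way to picture this pass is a single scan that cancels a pop against the push directly before it. The Python sketch below is an assumption-laden model rather than the actual pass: it treats each string as one instruction and ignores the fact that a pull also updates the N and Z flags.

```python
# Push opcode -> the pull opcode that undoes it.
PAIRS = {"PHA": "PLA", "PHX": "PLX", "PHY": "PLY"}


def drop_push_pop_pairs(instrs):
    out = []
    for ins in instrs:
        if out and PAIRS.get(out[-1]) == ins:
            out.pop()        # cancel the push that was just emitted
        else:
            out.append(ins)
    return out
```

Because the output list is re-examined after each cancellation, nested pairs such as `PHA; PHX; PLX; PLA` collapse completely in one scan, which mirrors the effect of running the peephole optimizer to a fixed point.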
Mode switch elimination — Two sub-passes:
- Removes duplicate SEP/REP instructions when the mode is already set.
- Cross-block mode tracking: eliminates SEP/REP at label targets where all predecessor paths already agree on the mode. Preserves .ACCU directives for WLA-DX's linear mode tracking.
Branch optimization — Several sub-passes:
- Eliminates branches to the immediately following label.
- Threads branches through intermediate jumps to their final target.
- Simplifies nested branch-over-branch patterns.
- Replaces branches to a label whose only instruction is a return (RTS/RTL/RTI) with the return directly.
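The first sub-pass in the list above is the simplest to illustrate. The Python sketch below assumes a hypothetical line format in which a label is written as `name:` on its own line and a branch is `OPCODE target`; it is a model of the idea, not the compiler's pass.

```python
BRANCHES = {"BRA", "JMP", "BEQ", "BNE", "BCC", "BCS", "BMI", "BPL", "BVC", "BVS"}


def drop_branch_to_next_label(lines):
    out = []
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) == 2 and parts[0] in BRANCHES:
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            if nxt == parts[1] + ":":
                continue  # falls through to the same place: branch is a no-op
        out.append(line)
    return out
```

For instance, `["LDA #1", "BRA done", "done:", "RTS"]` becomes `["LDA #1", "done:", "RTS"]`, whereas a branch that skips over at least one instruction is kept.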
Loop optimization — Several sub-passes:
- Rotates top-tested loops to bottom-tested form, eliminating the unconditional back-edge branch.
- Hoists loop-invariant LDA #imm out of bottom-tested loops.
- Converts count-up loops (INX; CPX; BCC) to count-down loops (DEX; BNE) when the counter is unused in the loop body, saving the compare instruction.
- Hoists SEP/REP mode switches before loop headers when the back-edge arrives in the target mode.
Unreachable code elimination — Removes instructions (not labels or directives) between terminal opcodes (RTS, RTL, RTI, BRA, JMP) and the next label.
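This rule can be modelled as a scan with one "dead" flag. The Python below is a sketch under assumed conventions (labels end with `:`, directives start with `.`, one instruction per line); it illustrates the rule as described and is not taken from the compiler.

```python
TERMINAL = {"RTS", "RTL", "RTI", "BRA", "JMP"}


def drop_unreachable(lines):
    out, dead = [], False
    for line in lines:
        text = line.strip()
        is_label = text.endswith(":")
        is_directive = text.startswith(".")
        if is_label:
            dead = False  # a label may be a jump target, so code is live again
        if dead and not is_label and not is_directive:
            continue      # instruction after a terminal opcode: unreachable
        out.append(line)
        if text and not is_label and not is_directive:
            if text.split()[0] in TERMINAL:
                dead = True
    return out
```

Given `["RTS", "LDA #1", ".ACCU 8", "next:", "RTS"]`, only the `LDA #1` is removed; the directive and the label survive, as the rule requires.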
Redundant compare elimination — Removes CMP #$00 after an instruction that already sets the Z and N flags, when the next branch is BEQ, BNE, BMI, or BPL.
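The condition can be sketched as a three-instruction window: a flag-setting instruction, the compare, and a branch that reads only Z or N. In the Python model below, the set of flag-setting opcodes is an assumption limited to instructions whose result lands in A; the real pass may recognise more or fewer. The sketch also ignores that CMP sets the carry flag, which matters only if later code reads it.

```python
# Assumed subset: opcodes that leave their result in A and set N/Z from it.
A_RESULT_SETS_NZ = {"LDA", "AND", "ORA", "EOR", "ADC", "SBC", "PLA", "TXA", "TYA"}
NZ_BRANCHES = {"BEQ", "BNE", "BMI", "BPL"}


def drop_redundant_cmp_zero(instrs):
    out = []
    for i, ins in enumerate(instrs):
        if ins == "CMP #$00" and 0 < i < len(instrs) - 1:
            prev_op = instrs[i - 1].split()[0]
            next_op = instrs[i + 1].split()[0]
            if prev_op in A_RESULT_SETS_NZ and next_op in NZ_BRANCHES:
                continue  # Z and N already reflect A: compare is redundant
        out.append(ins)
    return out
```

So `LDA hp; CMP #$00; BEQ dead` loses its compare, while the same sequence followed by `BCS` keeps it because that branch depends on carry.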
-O2 — Aggressive optimization
Enables everything from -O1 plus implicit function inlining and static loop unrolling.
Implicit function inlining
At this level the compiler considers inlining functions that do not have an #[inline] attribute:
- Functions called exactly once are always inlined (no code size increase since the original is eliminated by dead function elimination).
- Functions called more than once are inlined only if they have fewer than 3 MIR instructions (very small helpers like getters or flag checks).
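The two rules above reduce to a small decision function. The Python sketch below uses a hypothetical function name and models only the call-count and size rules; the general exclusions (recursive, far, interrupt handlers, entry points, inline assembly, #[inline(never)]) are assumed to be checked separately.

```python
O2_SMALL_FN_LIMIT = 3  # "fewer than 3 MIR instructions", per the text


def implicitly_inlined_at_o2(call_count, mir_instructions):
    if call_count == 1:
        return True  # the original body is removed afterwards, so no size cost
    return call_count > 1 and mir_instructions < O2_SMALL_FN_LIMIT
```

A 40-instruction function called once is inlined, while the same function called twice is not; a 2-instruction getter is inlined regardless of how many callers it has.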
Static loop unrolling
Fully unrolls for loops with compile-time constant bounds, replacing the loop with sequential copies of the body. The loop index is substituted with a constant in each copy, enabling further constant folding by downstream passes.
A loop is eligible for unrolling when all of the following are true:
- The trip count is a compile-time constant (e.g., for i in 0..8)
- The body contains at least 4 and at most 20 MIR operations (excluding the counter increment and control flow)
- The total unrolled size (trip_count * body_ops) is less than 255 operations
- The body contains no break, continue, return, function calls, inline assembly, or nested loops
For example, for i in 0..4 { buffer[i] = 0; ... } with 5 operations per iteration (20 total < 255) would be fully unrolled into 4 sequential copies with i replaced by 0, 1, 2, 3.
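The eligibility conditions can be checked mechanically. The Python sketch below models them with a hypothetical function; a trip count of None stands for "not a compile-time constant", and the single boolean lumps together all disallowed constructs (break, continue, return, calls, inline assembly, nested loops).

```python
MIN_BODY_OPS, MAX_BODY_OPS = 4, 20
MAX_UNROLLED_OPS = 255  # total must be strictly less than this


def can_unroll(trip_count, body_ops, has_disallowed_construct=False):
    if trip_count is None:
        return False  # bounds are not known at compile time
    if not MIN_BODY_OPS <= body_ops <= MAX_BODY_OPS:
        return False
    if trip_count * body_ops >= MAX_UNROLLED_OPS:
        return False
    return not has_disallowed_construct
```

The example from the text, 4 iterations of 5 operations, passes (`can_unroll(4, 5)` is true). A loop of 16 iterations with a 20-operation body fails on total size, since 320 is not below 255.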
All other passes behave identically to -O1.
Passes that always run (all levels)
These analysis and transformation passes run regardless of optimization level because they affect correctness or are controlled by their own flags:
| Pass | Description | Controlled by |
|---|---|---|
| Parameter promotion | Promotes stack parameters to scratch DP registers | --disable-scratch-parameters |
| Loop register promotion | Promotes loop counters to X/Y hardware registers | --disable-loop-promotion |
| Far pointer strategy | Chooses D=S vs SET_DBR addressing per function | automatic |
| Long branch fixup | Converts short branches to JMP when target > 127 bytes | always |
| Bank size validation | Verifies ROM code fits within bank boundaries | always |
Summary
| Pass | -O0 | -O1 | -O2 |
|---|---|---|---|
| Dead function elimination | — | yes | yes |
| Dead code elimination | — | yes | yes |
| Far-to-near call conversion | — | yes | yes |
| Explicit inlining (#[inline]) | — | yes | yes |
| Implicit inlining (called-once, small) | — | — | yes |
| Static loop unrolling | — | — | yes |
| Peephole: instruction folding | — | yes | yes |
| Peephole: redundant load elimination | — | yes | yes |
| Peephole: dead store elimination | — | yes | yes |
| Peephole: transfer elimination | — | yes | yes |
| Peephole: stack op elimination | — | yes | yes |
| Peephole: mode switch elimination | — | yes | yes |
| Peephole: branch optimization | — | yes | yes |
| Peephole: loop optimization | — | yes | yes |
| Peephole: unreachable code elimination | — | yes | yes |
| Peephole: redundant CMP elimination | — | yes | yes |
| Peephole: STZ conversion | — | yes | yes |
Build pipeline
The compiler produces WLA-DX assembly. Two additional tools from the WLA-DX suite are needed to produce a ROM:
source.r65 ──r65c──▶ output.asm ──wla-65816──▶ output.o ──wlalink──▶ ROM.smc
Manual pipeline:
# 1. Compile
r65c src/main.r65 -o build/main.asm --cfg snes --dbg -I src
# 2. Assemble
wla-65816 -o build/main.o build/main.asm
# 3. Link
echo "[objects]" > linkfile.txt
echo "build/main.o" >> linkfile.txt
wlalink -v -S linkfile.txt build/ROM.smc
Projects created with r65x init include a Makefile that automates this pipeline. Run make to build, make clean to remove artifacts.
Debugging with Mesen
R65 generates Mesen-compatible debug symbols when compiled with the --dbg flag (enabled by default in generated Makefiles).
- Build your project with make (the --dbg flag generates a .dbg file alongside the assembly)
- Open the ROM in Mesen
- Mesen automatically loads the .dbg file if it's in the same directory
- Use Mesen's debugger for source-level breakpoints, register inspection, and memory viewing
To trigger a manual breakpoint in code:
asm!("BRK"); // Breaks into Mesen debugger