
R65 Compiler (r65c)

r65c compiles R65 source code to WLA-DX assembly for the 65816 processor. It can be used standalone or through the Makefile generated by r65x init.

r65c <file> [options]

Basic usage

# Compile to file
r65c game.r65 -o game.asm

# Compile to stdout
r65c game.r65

# Compile from stdin
cat game.r65 | r65c -

# With SNES hardware support (enables hw multiplier/divider)
r65c game.r65 -o game.asm --cfg snes

# With debug symbols for Mesen
r65c game.r65 -o game.asm --dbg

# Add include search paths
r65c game.r65 -o game.asm -I src -I lib

# Verbose progress
r65c game.r65 -o game.asm -v

Options reference

Output

Flag                    Description
-o FILE, --output FILE  Write assembly to file (default: stdout)
-v, --verbose           Show compilation progress
-q, --quiet             Suppress all output except errors

Optimization

Flag  Description
-O0   Disable optimizations (fastest compilation)
-O1   Standard optimizations (default)
-O2   Aggressive optimizations with implicit inlining

See Optimization levels in detail below for what each level enables.

Code generation

Flag                               Description
--cfg CONDITION                    Set a configuration flag (can be repeated)
--abi {Default,FixedStack,Pascal}  Select calling convention
--dbg                              Generate Mesen-compatible debug file (.dbg)
-I PATH, --include PATH            Add include search path (can be repeated)
--disable-scratch-parameters       Disable promotion of stack params to scratch registers
--disable-loop-promotion           Disable promotion of loop counters to X/Y registers

The --cfg flag enables conditional compilation. Code guarded by if cfg(snes) { ... } or #[cfg(snes)] is only compiled when --cfg snes is passed. Multiple flags can be set: --cfg snes --cfg debug.

The --dbg flag generates a .dbg file alongside the assembly output containing source-level debug information in cc65/ld65 format, compatible with the Mesen emulator.

Diagnostic flags

These are useful for inspecting intermediate representations or debugging the compiler itself:

Flag                Description
--dump-tokens       Print the lexer token stream
--dump-ast          Print the parsed abstract syntax tree
--dump-hir          Print the high-level intermediate representation
--dump-mir          Print the mid-level IR (control flow graph)
--stop-after PHASE  Stop compilation after a phase: parse, hir, typecheck, or mir

r65c game.r65 --dump-ast          # Print the parsed AST
r65c game.r65 --dump-hir          # Print the high-level IR
r65c game.r65 --dump-mir          # Print the mid-level IR (CFG)
r65c game.r65 --stop-after parse  # Stop after parsing

Optimization levels in detail

The compiler applies optimizations in two phases: MIR-level passes that operate on the intermediate representation (control flow graph), and assembly-level peephole passes that clean up the final instruction stream. Both phases are skipped entirely at -O0.

-O0 — No optimization

All MIR and peephole optimization passes are disabled. The compiler emits straightforward code directly from the IR with no transformations. Analysis passes that affect correctness (parameter promotion, far pointer strategy, loop register promotion) still run.

Use -O0 during early development when you want the fastest compile times and the most predictable mapping between source and assembly. The output will be larger and slower but easier to read and debug.

-O1 — Standard optimization (default)

Enables all MIR-level and assembly-level passes. This is the recommended level for all normal development and release builds.

MIR-level passes

Dead function elimination removes functions that are never called. Entry points, interrupt handlers, and functions reachable through trait dispatch are preserved. After inlining (see below), this pass runs a second time to clean up functions that were fully inlined into their callers.

Dead code elimination uses liveness analysis to remove unreachable basic blocks and instructions that write to values never read.

Far-to-near call optimization converts far fn calls (JSL/RTL, 4 bytes, 8 cycles) to near calls (JSR/RTS, 3 bytes, 6 cycles) when the caller and callee are in the same bank, the function address is never taken, and the function is not an interrupt handler. Saves 1 byte and 2 cycles per call site.
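
The eligibility rule and the per-call savings quoted above can be modeled in a few lines of Python. This is an illustrative sketch only, not r65c's implementation; the `Function` record and its field names are hypothetical.

```python
from dataclasses import dataclass

# Instruction sizes and cycle counts for the call instructions, as quoted above.
JSL_BYTES, JSL_CYCLES = 4, 8   # far call
JSR_BYTES, JSR_CYCLES = 3, 6   # near call


@dataclass
class Function:
    """Hypothetical summary of a callee; field names are illustrative."""
    bank: int
    address_taken: bool = False
    is_interrupt_handler: bool = False


def can_convert_to_near(caller_bank: int, callee: Function) -> bool:
    """True when all three stated conditions for a near call hold."""
    return (
        caller_bank == callee.bank
        and not callee.address_taken
        and not callee.is_interrupt_handler
    )


def savings(call_sites: int) -> tuple[int, int]:
    """Return (bytes saved, cycles saved) for the converted call sites."""
    return (
        call_sites * (JSL_BYTES - JSR_BYTES),
        call_sites * (JSL_CYCLES - JSR_CYCLES),
    )
```

With these numbers, `savings(10)` evaluates to `(10, 20)`: ten converted call sites save 10 bytes and 20 cycles.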

Function inlining at -O1 only inlines functions explicitly marked with #[inline] or #[inline(always)], and only if they have fewer than 30 MIR instructions. Functions are never inlined if they are recursive, far, interrupt handlers, entry points, or contain inline assembly (asm!()). Functions marked #[inline(never)] are always respected.
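
The -O1 inlining rule can be read as a single predicate. The sketch below restates the paragraph above in Python; `FnInfo` and its fields are hypothetical names chosen for illustration and do not reflect the compiler's internal data structures.

```python
from dataclasses import dataclass
from typing import Optional

MAX_INLINE_MIR = 30  # "fewer than 30 MIR instructions"


@dataclass
class FnInfo:
    """Hypothetical per-function facts; field names are illustrative."""
    mir_instructions: int
    inline_attr: Optional[str] = None  # None, "inline", "always", or "never"
    recursive: bool = False
    far: bool = False
    interrupt_handler: bool = False
    entry_point: bool = False
    has_inline_asm: bool = False


def inlinable_at_o1(f: FnInfo) -> bool:
    """Apply the -O1 rule: explicit attribute, no exclusions, small body."""
    if f.inline_attr not in ("inline", "always"):
        return False  # unmarked functions and #[inline(never)] are not inlined
    if (f.recursive or f.far or f.interrupt_handler
            or f.entry_point or f.has_inline_asm):
        return False  # exclusions that apply at every level
    return f.mir_instructions < MAX_INLINE_MIR
```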

Assembly-level peephole passes

The peephole optimizer runs iteratively until no further changes are made. Each iteration applies the following transformations:

Instruction folding — Replaces multi-instruction sequences with single instructions:

  • LDA addr; CLC; ADC #1; STA addr → INC addr
  • LDA addr; SEC; SBC #1; STA addr → DEC addr
  • CLC; ADC #1 → INC A (when carry is dead after)
  • SEC; SBC #1 → DEC A (when carry is dead after)
  • LDA #$00; STA addr → STZ addr (removes the LDA when A is dead after)
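
To make the folding and the "run until no further changes" loop concrete, here is a minimal Python sketch of the first pattern in the list. The instruction representation (a `(mnemonic, operand)` tuple) and the function names are assumptions made for this example. The sketch also assumes A and the carry flag are dead after the sequence, which a real pass would need to verify.

```python
Instr = tuple[str, str]  # (mnemonic, operand); operand is "" when implied


def fold_memory_increment(code: list[Instr]) -> list[Instr]:
    """Rewrite LDA addr; CLC; ADC #1; STA addr as INC addr.

    Simplification: assumes A and carry are dead after the sequence.
    """
    out: list[Instr] = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        if (
            len(window) == 4
            and window[0][0] == "LDA"
            and not window[0][1].startswith("#")    # must load from memory
            and window[1] == ("CLC", "")
            and window[2] == ("ADC", "#1")
            and window[3] == ("STA", window[0][1])  # store to the same address
        ):
            out.append(("INC", window[0][1]))
            i += 4
        else:
            out.append(code[i])
            i += 1
    return out


def run_to_fixpoint(code, passes):
    """Apply every pass repeatedly until a full round changes nothing."""
    while True:
        before = code
        for p in passes:
            code = p(code)
        if code == before:
            return code
```

Iterating to a fixpoint matters because one rewrite can expose another: a fold may, for example, leave a load that a later pass can then remove.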

Redundant load elimination — Tracks the known value in A across instructions. Eliminates loads when A already holds the required value. Immediate loads survive hardware stores; the tracker is cleared at labels, branches, and mode changes.
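
The tracking idea can be sketched as a single forward scan. The Python below is a simplified model, not the compiler's code: it tracks only immediate values, uses a hypothetical `("LABEL", name)` tuple for labels, and conservatively forgets A at anything it does not recognize. It also ignores the fact that removing an LDA removes its N/Z flag update, which a real pass would have to account for.

```python
from typing import Optional

Instr = tuple[str, str]  # (mnemonic, operand)

# Instructions that leave A untouched, so the tracked value stays valid.
PRESERVES_A = {"STA", "STX", "STY", "STZ", "LDX", "LDY",
               "INX", "INY", "DEX", "DEY", "CLC", "SEC", "NOP"}


def eliminate_redundant_loads(code: list[Instr]) -> list[Instr]:
    """Drop LDA #imm when A is already known to hold that immediate."""
    known_a: Optional[str] = None
    out: list[Instr] = []
    for op, arg in code:
        if op == "LDA" and arg.startswith("#"):
            if known_a == arg:
                continue          # A already holds this value; skip the load
            known_a = arg
        elif op in PRESERVES_A:
            pass                  # stores and X/Y operations do not change A
        else:
            known_a = None        # labels, branches, SEP/REP, anything else
        out.append((op, arg))
    return out
```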

Dead store elimination — Removes stores to locations that are overwritten before being read.

Redundant transfer elimination — Removes unnecessary TAX, TAY, TXA, TYA when the destination already holds the value.

Redundant stack operation elimination — Removes PHA; PLA, PHX; PLX, PHY; PLY pairs that push and immediately pop.

Mode switch elimination — Two sub-passes:

  • Removes duplicate SEP/REP instructions when the mode is already set.
  • Cross-block mode tracking: eliminates SEP/REP at label targets where all predecessor paths already agree on the mode. Preserves .ACCU directives for WLA-DX's linear mode tracking.

Branch optimization — Several sub-passes:

  • Eliminates branches to the immediately following label.
  • Threads branches through intermediate jumps to their final target.
  • Simplifies nested branch-over-branch patterns.
  • Replaces branches to a label whose only instruction is a return (RTS/RTL/RTI) with the return directly.
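
The first two sub-passes are simple enough to sketch. The Python below models them on a list of `(mnemonic, operand)` tuples, with labels written as a hypothetical `("LABEL", name)` entry. It is an illustration under those assumptions and ignores branch range limits, which the compiler handles separately.

```python
Instr = tuple[str, str]  # (mnemonic, operand); labels are ("LABEL", name)

BRANCH_OPS = {"BRA", "JMP", "BEQ", "BNE", "BCC", "BCS",
              "BMI", "BPL", "BVC", "BVS"}


def remove_branch_to_next(code: list[Instr]) -> list[Instr]:
    """Drop any branch whose target is the label that immediately follows it."""
    out: list[Instr] = []
    for i, (op, arg) in enumerate(code):
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op in BRANCH_OPS and nxt == ("LABEL", arg):
            continue
        out.append((op, arg))
    return out


def thread_branches(code: list[Instr]) -> list[Instr]:
    """Retarget branches that land on a label followed by an unconditional jump."""
    forward: dict[str, str] = {}
    for (op, arg), nxt in zip(code, code[1:]):
        if op == "LABEL" and nxt[0] in ("BRA", "JMP"):
            forward[arg] = nxt[1]

    def final_target(label: str) -> str:
        seen: set[str] = set()
        while label in forward and label not in seen:  # guard against cycles
            seen.add(label)
            label = forward[label]
        return label

    return [(op, final_target(arg)) if op in BRANCH_OPS else (op, arg)
            for op, arg in code]
```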

Loop optimization — Several sub-passes:

  • Rotates top-tested loops to bottom-tested form, eliminating the unconditional back-edge branch.
  • Hoists loop-invariant LDA #imm out of bottom-tested loops.
  • Converts count-up loops (INX; CPX; BCC) to count-down loops (DEX; BNE) when the counter is unused in the loop body, saving the compare instruction.
  • Hoists SEP/REP mode switches before loop headers when the back-edge arrives in the target mode.

Unreachable code elimination — Removes instructions (not labels or directives) between terminal opcodes (RTS, RTL, RTI, BRA, JMP) and the next label.

Redundant compare elimination — Removes CMP #$00 after an instruction that already sets the Z and N flags, when the next branch is BEQ, BNE, BMI, or BPL.
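
A minimal Python sketch of this rule follows. The set of flag-setting instructions is an assumption chosen for the example: it is restricted to instructions whose N and Z results reflect the accumulator, because CMP #$00 tests A. The actual list used by r65c is not documented here.

```python
Instr = tuple[str, str]  # (mnemonic, operand)

# Instructions that leave N and Z describing the value now in A (assumed list).
A_SETS_NZ = {"LDA", "AND", "ORA", "EOR", "ADC", "SBC", "TXA", "TYA", "PLA"}

# Branches that read only Z or N, so the carry produced by CMP is irrelevant to them.
FLAG_BRANCHES = {"BEQ", "BNE", "BMI", "BPL"}


def eliminate_redundant_cmp(code: list[Instr]) -> list[Instr]:
    """Drop CMP #$00 when the previous instruction already set Z/N from A."""
    out: list[Instr] = []
    for i, (op, arg) in enumerate(code):
        if (
            (op, arg) == ("CMP", "#$00")
            and i > 0
            and code[i - 1][0] in A_SETS_NZ
            and i + 1 < len(code)
            and code[i + 1][0] in FLAG_BRANCHES
        ):
            continue
        out.append((op, arg))
    return out
```

The compare is kept after `LDX` (flags describe X, not A) and before `BCS` (which reads the carry that CMP sets).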

-O2 — Aggressive optimization

Enables everything from -O1 plus implicit function inlining and static loop unrolling.

Implicit function inlining

At this level the compiler considers inlining functions that do not have an #[inline] attribute:

  • Functions called exactly once are always inlined (no code size increase since the original is eliminated by dead function elimination).
  • Functions called more than once are inlined only if they have fewer than 3 MIR instructions (very small helpers like getters or flag checks).
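
The two rules above reduce to a short predicate. The Python sketch below is illustrative only; the function and parameter names are hypothetical, and it assumes the general exclusions (recursive, far, interrupt handlers, entry points, inline assembly) have already been checked.

```python
SMALL_FN_MIR = 3  # "fewer than 3 MIR instructions"


def implicitly_inlinable(call_count: int, mir_instructions: int) -> bool:
    """-O2 rule for functions that carry no #[inline] attribute.

    Assumes the general inlining exclusions were checked beforehand.
    """
    if call_count == 1:
        return True  # the original body becomes dead and is removed afterwards
    return call_count > 1 and mir_instructions < SMALL_FN_MIR
```

For example, `implicitly_inlinable(1, 500)` is `True` (single caller, size does not matter), while `implicitly_inlinable(4, 3)` is `False` (several callers and not below the size limit).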

Static loop unrolling

Fully unrolls for loops with compile-time constant bounds, replacing the loop with sequential copies of the body. The loop index is substituted with a constant in each copy, enabling further constant folding by downstream passes.

A loop is eligible for unrolling when all of the following are true:

  • The trip count is a compile-time constant (e.g., for i in 0..8)
  • The body contains at least 4 and at most 20 MIR operations (excluding the counter increment and control flow)
  • The total unrolled size (trip_count * body_ops) is less than 255 operations
  • The body contains no break, continue, return, function calls, inline assembly, or nested loops

For example, for i in 0..4 { buffer[i] = 0; ... } with 5 operations per iteration (20 total < 255) would be fully unrolled into 4 sequential copies with i replaced by 0, 1, 2, 3.
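
The eligibility conditions can be checked mechanically. The following Python sketch restates them as a predicate; the parameter names are hypothetical, and the single boolean stands in for the whole list of disqualifying constructs (break, continue, return, calls, inline assembly, nested loops).

```python
from typing import Optional

MIN_BODY_OPS, MAX_BODY_OPS = 4, 20   # body size limits in MIR operations
MAX_UNROLLED_OPS = 255               # total size must be strictly below this


def can_unroll(trip_count: Optional[int], body_ops: int,
               has_disqualifying_construct: bool = False) -> bool:
    """Return True when a for loop meets every unrolling condition listed above."""
    if trip_count is None:               # bounds not known at compile time
        return False
    if has_disqualifying_construct:      # break, call, asm, nested loop, ...
        return False
    if not (MIN_BODY_OPS <= body_ops <= MAX_BODY_OPS):
        return False
    return trip_count * body_ops < MAX_UNROLLED_OPS
```

The example loop corresponds to `can_unroll(4, 5)`, which is `True`. A 13-iteration loop with a 20-operation body is rejected because 13 * 20 = 260 is not below 255.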

All other passes behave identically to -O1.

Passes that always run (all levels)

These analysis and transformation passes run regardless of optimization level because they affect correctness or are controlled by their own flags:

Pass                     Description                                             Controlled by
Parameter promotion      Promotes stack parameters to scratch DP registers       --disable-scratch-parameters
Loop register promotion  Promotes loop counters to X/Y hardware registers        --disable-loop-promotion
Far pointer strategy     Chooses D=S vs SET_DBR addressing per function          automatic
Long branch fixup        Converts short branches to JMP when target > 127 bytes  always
Bank size validation     Verifies ROM code fits within bank boundaries           always
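
As an illustration of long branch fixup, the Python sketch below rewrites a branch whose signed 8-bit displacement is out of range. The table does not say how conditional branches are rewritten, so the inverted-branch-over-JMP pattern used here is an assumption (it is the common technique on 65xx processors), and the function name and `skip_label` parameter are hypothetical.

```python
Instr = tuple[str, str]  # (mnemonic, operand); labels are ("LABEL", name)

# Each conditional branch and the branch that tests the opposite condition.
INVERSE = {"BEQ": "BNE", "BNE": "BEQ", "BCC": "BCS", "BCS": "BCC",
           "BMI": "BPL", "BPL": "BMI", "BVC": "BVS", "BVS": "BVC"}


def in_short_range(offset: int) -> bool:
    """A relative branch encodes a signed 8-bit displacement."""
    return -128 <= offset <= 127


def fix_branch(op: str, target: str, offset: int, skip_label: str) -> list[Instr]:
    """Return the instruction(s) to emit for one branch.

    Assumed strategy: an out-of-range BRA becomes JMP; an out-of-range
    conditional branch becomes the inverse branch over a JMP.
    """
    if in_short_range(offset):
        return [(op, target)]
    if op == "BRA":
        return [("JMP", target)]
    return [(INVERSE[op], skip_label), ("JMP", target), ("LABEL", skip_label)]
```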

Summary

Pass                                    -O0  -O1  -O2
Dead function elimination               no   yes  yes
Dead code elimination                   no   yes  yes
Far-to-near call conversion             no   yes  yes
Explicit inlining (#[inline])           no   yes  yes
Implicit inlining (called-once, small)  no   no   yes
Static loop unrolling                   no   no   yes
Peephole: instruction folding           no   yes  yes
Peephole: redundant load elimination    no   yes  yes
Peephole: dead store elimination        no   yes  yes
Peephole: transfer elimination          no   yes  yes
Peephole: stack op elimination          no   yes  yes
Peephole: mode switch elimination       no   yes  yes
Peephole: branch optimization           no   yes  yes
Peephole: loop optimization             no   yes  yes
Peephole: unreachable code elimination  no   yes  yes
Peephole: redundant CMP elimination     no   yes  yes
Peephole: STZ conversion                no   yes  yes

Build pipeline

The compiler produces WLA-DX assembly. Two additional tools from the WLA-DX suite are needed to produce a ROM:

 source.r65 ──r65c──▶ output.asm ──wla-65816──▶ output.o ──wlalink──▶ ROM.smc

Manual pipeline:

# 1. Compile
r65c src/main.r65 -o build/main.asm --cfg snes --dbg -I src

# 2. Assemble
wla-65816 -o build/main.o build/main.asm

# 3. Link
echo "[objects]" > linkfile.txt
echo "build/main.o" >> linkfile.txt
wlalink -v -S linkfile.txt build/ROM.smc
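
For projects that script the build outside of make, the same three steps can be driven from Python. This is a sketch that reuses the exact commands and flags shown above; the function names are hypothetical, and `run_pipeline` assumes r65c, wla-65816, and wlalink are on the PATH.

```python
import subprocess
from pathlib import Path


def linkfile_text(objects: list[str]) -> str:
    """Produce the same linkfile contents as the two echo commands above."""
    return "[objects]\n" + "".join(f"{obj}\n" for obj in objects)


def pipeline_commands(source: str) -> list[list[str]]:
    """Build the compile, assemble, and link command lines for one source file."""
    stem = Path(source).stem
    asm = f"build/{stem}.asm"
    obj = f"build/{stem}.o"
    return [
        ["r65c", source, "-o", asm, "--cfg", "snes", "--dbg", "-I", "src"],
        ["wla-65816", "-o", obj, asm],
        ["wlalink", "-v", "-S", "linkfile.txt", "build/ROM.smc"],
    ]


def run_pipeline(source: str) -> None:
    """Write the linkfile and run each step, stopping at the first failure."""
    commands = pipeline_commands(source)
    Path("build").mkdir(exist_ok=True)
    Path("linkfile.txt").write_text(linkfile_text([commands[1][2]]))
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("src/main.r65")` would run the same sequence as the manual steps; `check=True` makes a failing tool abort the build instead of continuing with stale output.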

Projects created with r65x init include a Makefile that automates this pipeline. Run make to build, make clean to remove artifacts.


Debugging with Mesen

r65c generates Mesen-compatible debug symbols when invoked with the --dbg flag (enabled by default in generated Makefiles).

  1. Build your project with make (the --dbg flag generates a .dbg file alongside the assembly)
  2. Open the ROM in Mesen
  3. Mesen automatically loads the .dbg file if it's in the same directory
  4. Use Mesen's debugger for source-level breakpoints, register inspection, and memory viewing

To trigger a manual breakpoint in code:

asm!("BRK");   // Breaks into Mesen debugger