Mini Java Compiler Tutorial: Writing a Simple Compiler in 10 Lessons
This 10-lesson tutorial walks you through building a minimal compiler for a Java-like language. The compiler parses a small subset of Java, builds an abstract syntax tree (AST), performs basic semantic checks, and emits simple bytecode-like instructions. Each lesson includes objectives, key concepts, and short code examples in Java. The tutorial assumes Java 11+ and basic familiarity with parsing and data structures.
Overview: what this compiler supports
- Source: a tiny Java-like language with:
  - a class with a single static method main
  - primitive ints and arithmetic (+, -, *, /)
  - variable declarations and assignments
  - if statements and while loops
  - a return statement
  - method calls (to the built-in print)
- No objects, inheritance, or types beyond int and void.
- Output: a simple stack-based bytecode (text form) executed by a small VM.
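To make the feature list concrete, here is a small program that stays inside that subset. The class name Sample and the print helper are assumptions for illustration: print stands in for the built-in so the file also compiles as ordinary Java, while a real program in the subset would contain only main. Declarations and assignments are kept separate because the feature list does not say whether a declaration may carry an initializer.

```java
class Sample {
    // Stand-in for the built-in print; not part of the subset itself.
    static void print(int value) {
        System.out.println(value);
    }

    // The single static method: sums n + (n - 1) + ... + 1 with a while loop.
    static void main() {
        int n;
        int sum;
        n = 5;
        sum = 0;
        while (n != 0) {
            sum = sum + n;
            n = n - 1;
        }
        if (sum == 15) {
            print(sum);
        }
        return;
    }
}
```

The program uses only == and != for comparisons, since those are the only comparison operators in the grammar given in Lesson 2.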
Lesson 1 — Project scaffolding and tokenization (lexer)
Objective
Set up the project and implement a lexer that converts source text into tokens: identifiers, numbers, symbols, and keywords.
Key points
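- Token categories: IDENT, NUMBER, KEYWORD, SYMBOL, EOF. The sketch in this lesson gives each keyword and symbol its own enum constant instead of a generic KEYWORD or SYMBOL type.
- Record each token's line and column so error messages can point at the source position.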
Example (sketch)
java
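// Token kinds: one constant per keyword and symbol.
enum TokenType {
    IDENT, NUMBER,
    IF, WHILE, RETURN, INT, CLASS, STATIC, VOID, PRINT,
    LPAREN, RPAREN, LBRACE, RBRACE, SEMI,
    PLUS, MINUS, STAR, SLASH, ASSIGN,
    EOF
}

// A token with its source position (line, col) for error messages.
class Token {
    TokenType type;
    String text;
    int line, col;
}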
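The token definitions above can be turned into a working scanner along the following lines. This is a minimal sketch, not the tutorial's definitive lexer: the Lexer class, its tokenize method, and the Token constructor are hypothetical names, and the EQ and NEQ constants are additions (not in the enum above) so that the == and != operators used by the Lesson 2 grammar can be tokenized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

enum TokenType {
    IDENT, NUMBER, IF, WHILE, RETURN, INT, CLASS, STATIC, VOID, PRINT,
    LPAREN, RPAREN, LBRACE, RBRACE, SEMI, PLUS, MINUS, STAR, SLASH, ASSIGN,
    EQ, NEQ, // assumption: added for '==' and '!='
    EOF
}

class Token {
    final TokenType type;
    final String text;
    final int line, col; // 1-based position of the token's first character

    Token(TokenType type, String text, int line, int col) {
        this.type = type; this.text = text; this.line = line; this.col = col;
    }
}

class Lexer {
    private static final Map<String, TokenType> KEYWORDS = Map.of(
        "if", TokenType.IF, "while", TokenType.WHILE, "return", TokenType.RETURN,
        "int", TokenType.INT, "class", TokenType.CLASS, "static", TokenType.STATIC,
        "void", TokenType.VOID, "print", TokenType.PRINT);

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0, line = 1, col = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') { i++; line++; col = 1; continue; }
            if (Character.isWhitespace(c)) { i++; col++; continue; }
            int start = i;
            TokenType type;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                type = TokenType.NUMBER;
            } else if (Character.isLetter(c) || c == '_') {
                while (i < src.length() && isIdentPart(src.charAt(i))) i++;
                // Keywords are identifiers that appear in the keyword table.
                type = KEYWORDS.getOrDefault(src.substring(start, i), TokenType.IDENT);
            } else {
                boolean eqNext = i + 1 < src.length() && src.charAt(i + 1) == '=';
                i++;
                switch (c) {
                    case '(': type = TokenType.LPAREN; break;
                    case ')': type = TokenType.RPAREN; break;
                    case '{': type = TokenType.LBRACE; break;
                    case '}': type = TokenType.RBRACE; break;
                    case ';': type = TokenType.SEMI; break;
                    case '+': type = TokenType.PLUS; break;
                    case '-': type = TokenType.MINUS; break;
                    case '*': type = TokenType.STAR; break;
                    case '/': type = TokenType.SLASH; break;
                    case '=':
                        type = eqNext ? TokenType.EQ : TokenType.ASSIGN;
                        if (eqNext) i++;
                        break;
                    case '!':
                        if (!eqNext) throw error(c, line, col);
                        i++;
                        type = TokenType.NEQ;
                        break;
                    default:
                        throw error(c, line, col);
                }
            }
            // col still holds the column where this token started.
            out.add(new Token(type, src.substring(start, i), line, col));
            col += i - start;
        }
        out.add(new Token(TokenType.EOF, "", line, col));
        return out;
    }

    private static boolean isIdentPart(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '_';
    }

    private static IllegalArgumentException error(char c, int line, int col) {
        return new IllegalArgumentException("Unexpected '" + c + "' at " + line + ":" + col);
    }
}
```

Calling Lexer.tokenize("x = 1;") yields IDENT, ASSIGN, NUMBER, SEMI, and a final EOF token. Comments and commas are not handled here; a fuller lexer would need both.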
Lesson 2 — Parser: building the AST
Objective
Write a recursive-descent parser that produces an AST representing program structure.
Key points
- Grammar (simplified):
  - program -> classDecl
  - classDecl -> 'class' IDENT '{' methodDecl '}'
  - methodDecl -> 'static' 'void' IDENT '(' ')' block
  - block -> '{' stmt* '}'
  - stmt -> varDecl | ifStmt | whileStmt | exprStmt | returnStmt
  - expr -> assignment
  - assignment -> IDENT '=' expr | equality
  - equality -> additive (('==' | '!=') additive)*
  - additive -> multiplicative (('+' | '-') multiplicative)*
  - multiplicative -> primary (('*' | '/') primary)*
  - primary -> NUMBER | IDENT | '(' expr ')' | call
- Create AST node classes: Program, ClassDecl, MethodDecl, Stmt (and subclasses), Expr (and subclasses).
Example (sketch)
java
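// Base type for every expression node.
abstract class Expr {}

// Binary operation; op holds the operator text, e.g. "+".
class Binary extends Expr { Expr left; String op; Expr right; }

// Integer literal.
class Literal extends Expr { int value; }

// Variable reference.
class Var extends Expr { String name; }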
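The expression rules of the grammar map almost one-to-one onto parser methods. The sketch below covers expr down to primary and leaves out call and all statements. Several pieces are assumptions made for illustration: the ExprParser class, the Assign node (the AST sketch above does not name one), the leftAssoc and show helpers, and the regex tokenizer, which stands in for the Lesson 1 lexer and silently skips characters it does not recognize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

abstract class Expr {}
class Binary extends Expr {
    final Expr left; final String op; final Expr right;
    Binary(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
}
class Literal extends Expr { final int value; Literal(int value) { this.value = value; } }
class Var extends Expr { final String name; Var(String name) { this.name = name; } }
// Hypothetical node for "IDENT = expr".
class Assign extends Expr {
    final String name; final Expr value;
    Assign(String name, Expr value) { this.name = name; this.value = value; }
}

class ExprParser {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    ExprParser(String src) {
        // Stand-in tokenizer: "==" is listed before "=" so it wins the alternation.
        Matcher m = Pattern.compile("\\d+|[A-Za-z_]\\w*|==|!=|[-+*/=()]").matcher(src);
        while (m.find()) toks.add(m.group());
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // Consumes the next token and returns true if it equals one of the candidates.
    private boolean match(String... candidates) {
        for (String c : candidates) {
            if (peek().equals(c)) { pos++; return true; }
        }
        return false;
    }

    // expr -> assignment (trailing tokens after the expression are not checked)
    Expr expr() { return assignment(); }

    // assignment -> IDENT '=' expr | equality
    private Expr assignment() {
        boolean assigns = isIdent(peek()) && pos + 1 < toks.size() && toks.get(pos + 1).equals("=");
        if (!assigns) return equality();
        String name = toks.get(pos);
        pos += 2; // skip IDENT and '='
        return new Assign(name, expr());
    }

    // equality -> additive (('==' | '!=') additive)*
    private Expr equality() { return leftAssoc(this::additive, "==", "!="); }

    // additive -> multiplicative (('+' | '-') multiplicative)*
    private Expr additive() { return leftAssoc(this::multiplicative, "+", "-"); }

    // multiplicative -> primary (('*' | '/') primary)*
    private Expr multiplicative() { return leftAssoc(this::primary, "*", "/"); }

    // Shared loop for "operand (op operand)*": builds a left-leaning Binary chain.
    private Expr leftAssoc(Supplier<Expr> operand, String... ops) {
        Expr left = operand.get();
        while (match(ops)) {
            String op = toks.get(pos - 1);
            left = new Binary(left, op, operand.get());
        }
        return left;
    }

    // primary -> NUMBER | IDENT | '(' expr ')'   (call is omitted in this sketch)
    private Expr primary() {
        String t = peek();
        if (t.isEmpty()) throw new IllegalStateException("Unexpected end of input");
        pos++;
        if (Character.isDigit(t.charAt(0))) return new Literal(Integer.parseInt(t));
        if (isIdent(t)) return new Var(t);
        if (t.equals("(")) {
            Expr inner = expr();
            if (!match(")")) throw new IllegalStateException("Expected ')'");
            return inner;
        }
        throw new IllegalStateException("Unexpected token: " + t);
    }

    private static boolean isIdent(String t) {
        return !t.isEmpty() && (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_');
    }

    // Renders the tree fully parenthesized so precedence is visible.
    static String show(Expr e) {
        if (e instanceof Binary) { Binary b = (Binary) e; return "(" + show(b.left) + " " + b.op + " " + show(b.right) + ")"; }
        if (e instanceof Assign) { Assign a = (Assign) e; return "(" + a.name + " = " + show(a.value) + ")"; }
        if (e instanceof Literal) return String.valueOf(((Literal) e).value);
        return ((Var) e).name;
    }
}
```

For example, ExprParser.show(new ExprParser("x = 1 + 2 * 3").expr()) returns "(x = (1 + (2 * 3)))", which shows that multiplication binds tighter than addition and that assignment sits at the lowest precedence. Each grammar level calls the next tighter level for its operands, which is what produces that precedence.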
Lesson 3 — AST printing and debugging
Objective
Implement a pretty-printer or tree walker to visualize ASTs for debugging.
Key points
- The visitor pattern helps separate operations (such as printing) from the AST structure.
- Indent each printed node according to its depth in the tree.
Example (sketch)
java
void printExpr(Expr e, int indent) {
    if (e instanceof Binary) {
        printBinary(...);
    }
    ...
}
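The visitor idea from the key points can be sketched as follows. This is one possible shape, assuming the three expression nodes from Lesson 2; the ExprVisitor interface, the accept methods, the AstPrinter class, and the two-spaces-per-level output format are all hypothetical choices for illustration.

```java
// One visit method per node type; R is the result type of the operation.
interface ExprVisitor<R> {
    R visitBinary(Binary b);
    R visitLiteral(Literal l);
    R visitVar(Var v);
}

abstract class Expr {
    // Each node dispatches to the visit method that matches its own type.
    abstract <R> R accept(ExprVisitor<R> v);
}

class Binary extends Expr {
    final Expr left; final String op; final Expr right;
    Binary(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitBinary(this); }
}

class Literal extends Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitLiteral(this); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitVar(this); }
}

// Printing lives entirely in the visitor; the node classes know nothing about it.
class AstPrinter implements ExprVisitor<Void> {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    // Appends one line, indented two spaces per nesting level.
    private void line(String text) {
        out.append("  ".repeat(depth)).append(text).append('\n');
    }

    @Override public Void visitBinary(Binary b) {
        line("Binary " + b.op);
        depth++;
        b.left.accept(this);
        b.right.accept(this);
        depth--;
        return null;
    }

    @Override public Void visitLiteral(Literal l) { line("Literal " + l.value); return null; }

    @Override public Void visitVar(Var v) { line("Var " + v.name); return null; }

    static String print(Expr e) {
        AstPrinter printer = new AstPrinter();
        e.accept(printer);
        return printer.out.toString();
    }
}
```

Compared with a chain of instanceof checks, adding a new operation (for example a type checker) means writing one new visitor class rather than editing every node, at the cost of touching the interface whenever a new node type is added.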
Lesson 4 — Semantic analysis: symbol table and scope
Objective
Add a symbol table to track variable declarations and run simple checks: undefined variables and duplicate declarations.
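A minimal sketch of such a table, assuming scopes are modeled as a stack of maps from variable name to type name. SymbolTable, SemanticError, and the method names are hypothetical. One simplification to be aware of: this version lets an inner block redeclare a name from an outer block, whereas real Java rejects a local that shadows another local in the same method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception type for errors found during semantic analysis.
class SemanticError extends RuntimeException {
    SemanticError(String message) { super(message); }
}

class SymbolTable {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    SymbolTable() { enterScope(); } // start with one scope for the method body

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Records a variable in the current scope; rejects a second declaration there.
    void declare(String name, String type) {
        if (scopes.peek().putIfAbsent(name, type) != null) {
            throw new SemanticError("Duplicate declaration: " + name);
        }
    }

    // Searches from the innermost scope outward and returns the variable's type.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        throw new SemanticError("Undefined variable: " + name);
    }
}
```

A semantic pass would call enterScope and exitScope around each block, declare on every variable declaration, and lookup on every variable use or assignment target.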