Building a Mini Java Compiler: Step-by-Step Guide for Beginners

Mini Java Compiler Tutorial: Writing a Simple Compiler in 10 Lessons

This 10-lesson tutorial walks you through building a minimal Java-like compiler that parses a small subset of Java, generates an abstract syntax tree (AST), performs basic semantic checks, and emits simple bytecode-like instructions. Each lesson includes objectives, key concepts, and short code examples in Java. The lessons assume Java 11+ and basic familiarity with parsing and data structures.

Overview: what this compiler supports

  • Source: a tiny Java-like language with:
    • class with a single static method main
    • primitive ints, arithmetic (+, -, *, /)
    • variable declarations and assignments
    • if statements and while loops
    • return statement
    • method calls (to built-in print)
  • No objects, inheritance, or types beyond int and void.
  • Output: a simple stack-based bytecode (text form) executed by a small VM.

Lesson 1 — Project scaffolding and tokenization (lexer)

Objective

Set up the project and implement a lexer that converts source text into tokens: identifiers, numbers, symbols, and keywords.

Key points

  • Token types: IDENT, NUMBER, KEYWORD, SYMBOL, EOF
  • Keep token positions for error messages.

Example (sketch)

java

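// One constant per keyword and symbol, so the parser can switch on the type directly.
enum TokenType {
    IDENT, NUMBER,
    IF, WHILE, RETURN, INT, CLASS, STATIC, VOID, PRINT,   // keywords and the built-in print
    LPAREN, RPAREN, LBRACE, RBRACE, SEMI,
    PLUS, MINUS, STAR, SLASH, ASSIGN,
    EQ, NEQ,   // '==' and '!=', needed by the equality rule in Lesson 2
    EOF
}

class Token {
    TokenType type;
    String text;     // the matched source text
    int line, col;   // position, kept for error messages
}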
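A minimal lexer along these lines might look like the sketch below. To stay short it uses the coarse categories from the key points (IDENT, NUMBER, KEYWORD, SYMBOL, EOF) instead of one enum constant per keyword and symbol. The names Tok, Lexeme, and MiniLexer are hypothetical, and treating print as a keyword is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical names; coarse token categories as listed in the key points.
enum Tok { IDENT, NUMBER, KEYWORD, SYMBOL, EOF }

final class Lexeme {
    final Tok type;
    final String text;
    final int line, col;   // 1-based position of the first character

    Lexeme(Tok type, String text, int line, int col) {
        this.type = type; this.text = text; this.line = line; this.col = col;
    }
}

final class MiniLexer {
    // Assumption: print is reserved like the other keywords.
    private static final Set<String> KEYWORDS =
        Set.of("class", "static", "void", "int", "if", "while", "return", "print");

    private final String src;
    private int pos = 0, line = 1, col = 1;

    MiniLexer(String src) { this.src = src; }

    List<Lexeme> tokenize() {
        List<Lexeme> out = new ArrayList<>();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '\n') { pos++; line++; col = 1; continue; }
            if (Character.isWhitespace(c)) { pos++; col++; continue; }

            int start = pos;
            Tok type;
            if (Character.isDigit(c)) {
                while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
                type = Tok.NUMBER;
            } else if (Character.isLetter(c) || c == '_') {
                while (pos < src.length()
                        && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) pos++;
                type = KEYWORDS.contains(src.substring(start, pos)) ? Tok.KEYWORD : Tok.IDENT;
            } else if ((c == '=' || c == '!') && pos + 1 < src.length() && src.charAt(pos + 1) == '=') {
                pos += 2;                       // two-character symbols: == and !=
                type = Tok.SYMBOL;
            } else if ("(){};+-*/=,".indexOf(c) >= 0) {
                pos++;                          // single-character symbols
                type = Tok.SYMBOL;
            } else {
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at " + line + ":" + col);
            }
            out.add(new Lexeme(type, src.substring(start, pos), line, col));
            col += pos - start;                 // advance the column past the token
        }
        out.add(new Lexeme(Tok.EOF, "", line, col));
        return out;
    }
}
```

Calling new MiniLexer(source).tokenize() returns the token list terminated by an EOF token. Because each token records its starting line and column, a later parser error can point at the exact position.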

Lesson 2 — Parser: building the AST

Objective

Write a recursive-descent parser that produces an AST representing program structure.

Key points

  • Grammar (simplified):
    • program -> classDecl
    • classDecl -> ‘class’ IDENT ‘{’ methodDecl ‘}’
    • methodDecl -> ‘static’ ‘void’ IDENT ‘(’ ‘)’ block
    • block -> ‘{’ stmt* ‘}’
    • stmt -> varDecl | ifStmt | whileStmt | exprStmt | returnStmt
    • expr -> assignment
    • assignment -> IDENT ‘=’ expr | equality
    • equality -> additive ((‘==’|‘!=’) additive)*
    • additive -> multiplicative ((‘+’|‘-’) multiplicative)*
    • multiplicative -> primary ((‘*’|‘/’) primary)*
    • primary -> NUMBER | IDENT | ‘(’ expr ‘)’ | call
  • Create AST node classes: Program, ClassDecl, MethodDecl, Stmt (and subclasses), Expr (and subclasses).

Example (sketch)

java

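// Base type for all expression nodes.
abstract class Expr {}

// A binary operation such as a + b.
class Binary extends Expr { Expr left; String op; Expr right; }

// An integer literal.
class Literal extends Expr { int value; }

// A reference to a variable by name.
class Var extends Expr { String name; }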
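To show how the grammar rules map onto code, here is a minimal recursive-descent sketch for the expression rules only (equality, additive, multiplicative, primary). It makes several assumptions: the input is a pre-split list of strings rather than Token objects, assignment and calls are left out, and the constructors, the toString methods, and the ExprParser class are additions for illustration.

```java
import java.util.List;

abstract class Expr {}

class Binary extends Expr {
    final Expr left; final String op; final Expr right;
    Binary(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

class Literal extends Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// Hypothetical parser: one method per grammar rule, lowest precedence first.
class ExprParser {
    private final List<String> tokens;   // assumption: lexemes already split
    private int pos = 0;

    ExprParser(List<String> tokens) { this.tokens = tokens; }

    Expr parseExpr() { return equality(); }   // assignment is omitted in this sketch

    // equality -> additive (('=='|'!=') additive)*
    private Expr equality() {
        Expr left = additive();
        while (peekIs("==") || peekIs("!=")) { String op = next(); left = new Binary(left, op, additive()); }
        return left;
    }

    // additive -> multiplicative (('+'|'-') multiplicative)*
    private Expr additive() {
        Expr left = multiplicative();
        while (peekIs("+") || peekIs("-")) { String op = next(); left = new Binary(left, op, multiplicative()); }
        return left;
    }

    // multiplicative -> primary (('*'|'/') primary)*
    private Expr multiplicative() {
        Expr left = primary();
        while (peekIs("*") || peekIs("/")) { String op = next(); left = new Binary(left, op, primary()); }
        return left;
    }

    // primary -> NUMBER | IDENT | '(' expr ')'
    private Expr primary() {
        String t = next();
        if (t.equals("(")) {
            Expr inner = parseExpr();
            String close = next();
            if (!close.equals(")")) throw new IllegalStateException("Expected ')' but found " + close);
            return inner;
        }
        if (Character.isDigit(t.charAt(0))) return new Literal(Integer.parseInt(t));
        if (Character.isLetter(t.charAt(0))) return new Var(t);
        throw new IllegalStateException("Unexpected token: " + t);
    }

    private boolean peekIs(String s) { return pos < tokens.size() && tokens.get(pos).equals(s); }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("Unexpected end of input");
        return tokens.get(pos++);
    }
}
```

For example, new ExprParser(List.of("1", "+", "2", "*", "x")).parseExpr().toString() yields (1 + (2 * x)). Each loop folds operands to the left, so operators at the same precedence level come out left-associative, and the nesting of the rule methods gives * and / higher precedence than + and -.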

Lesson 3 — AST printing and debugging

Objective

Implement a pretty-printer or tree walker to visualize ASTs for debugging.

Key points

  • The visitor pattern helps separate operations from the AST structure.
  • Indent each printed node according to its depth in the tree.

Example (sketch)

java

void printExpr(Expr e, int indent) {
    if (e instanceof Binary) {
        printBinary(...);
    }
    ...
}
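As an alternative to the instanceof chain, the visitor-based variant mentioned in the key points could be sketched as follows. The ExprVisitor interface, the accept methods, and the AstPrinter class are hypothetical additions, and the node classes are repeated with constructors so the example stands alone.

```java
// Hypothetical visitor interface: one method per node type.
interface ExprVisitor<R> {
    R visitBinary(Binary b);
    R visitLiteral(Literal l);
    R visitVar(Var v);
}

abstract class Expr {
    abstract <R> R accept(ExprVisitor<R> visitor);
}

class Binary extends Expr {
    final Expr left; final String op; final Expr right;
    Binary(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
    @Override <R> R accept(ExprVisitor<R> visitor) { return visitor.visitBinary(this); }
}

class Literal extends Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override <R> R accept(ExprVisitor<R> visitor) { return visitor.visitLiteral(this); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    @Override <R> R accept(ExprVisitor<R> visitor) { return visitor.visitVar(this); }
}

// Builds an indented, one-node-per-line dump of the tree.
class AstPrinter implements ExprVisitor<Void> {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    static String print(Expr e) {
        AstPrinter printer = new AstPrinter();
        e.accept(printer);
        return printer.out.toString();
    }

    // Two spaces of indentation per level of depth.
    private void line(String text) {
        out.append("  ".repeat(depth)).append(text).append('\n');
    }

    @Override public Void visitBinary(Binary b) {
        line("Binary " + b.op);
        depth++;                 // children are printed one level deeper
        b.left.accept(this);
        b.right.accept(this);
        depth--;
        return null;
    }

    @Override public Void visitLiteral(Literal l) { line("Literal " + l.value); return null; }

    @Override public Void visitVar(Var v) { line("Var " + v.name); return null; }
}
```

AstPrinter.print(expr) returns the dump as a string, which makes it easy to compare against expected output in a test. Adding another operation later, such as a type checker, then means writing a new visitor rather than editing every node class.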

Lesson 4 — Semantic analysis: symbol table and scope

Objective

Add a symbol table that tracks variable declarations and supports simple checks such as undefined variables and duplicate declarations.
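One possible shape for such a table is a stack of maps, one map per block scope. The sketch below is a minimal version under that assumption. The SymbolTable and SemanticError names are hypothetical, and types are stored as plain strings for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception type for errors found during semantic analysis.
class SemanticError extends RuntimeException {
    SemanticError(String message) { super(message); }
}

class SymbolTable {
    // Innermost scope sits at the head of the deque; each scope maps name -> type.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    SymbolTable() { enterScope(); }   // start with the method-level scope

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Records a declaration; rejects a second declaration in the same scope.
    void declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            throw new SemanticError("Duplicate declaration: " + name);
        }
        current.put(name, type);
    }

    // Searches from the innermost scope outward; fails if the name was never declared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {   // iteration starts at the head (innermost)
            String type = scope.get(name);
            if (type != null) return type;
        }
        throw new SemanticError("Undefined variable: " + name);
    }
}
```

The analyzer would call enterScope and exitScope around each block, declare for each variable declaration, and lookup for each variable use. This version checks duplicates only within the current scope, so an inner block can shadow an outer name. Java itself rejects that for local variables, so a stricter check would search the enclosing scopes as well.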
