MyCCompiler is a C compiler implementation based on the book "Writing a C Compiler" by Nora Sandler. This project is developed in C++ and utilizes LLVM libraries for various compiler infrastructure components, including abstract syntax tree (AST) manipulation, source location tracking, and utility data structures.
The compiler follows a traditional multi-phase compilation approach, transforming C source code through several intermediate representations before generating x86-64 assembly code.
The project is organized into several key components, each responsible for a specific phase of the compilation process:
- include/mycc/ - Header files containing interface declarations
- lib/ - Implementation files for all compiler components
- main.cpp - Entry point and command-line interface
- Location: include/mycc/Lexer/ and lib/Lexer/
- Files: Lexer.hpp, Token.hpp, Lexer.cpp
- Purpose: Tokenizes input C source code into a stream of tokens
- Location: include/mycc/Parser/ and lib/Parser/
- Files: Parser.hpp, Parser.cpp
- Purpose: Constructs an Abstract Syntax Tree (AST) from the token stream
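One common way to build an expression tree that respects operator precedence is precedence climbing. The sketch below shows the idea on a tiny grammar (integer constants, parentheses, and five binary operators). `Expr`, `MiniParser`, `precedence`, and `show` are hypothetical names, the precedence numbers are illustrative, and the sketch reads characters directly rather than consuming the project's token stream.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical AST node: an integer literal (op == 0) or a binary operation.
struct Expr {
  char op = 0;
  int value = 0;
  std::unique_ptr<Expr> lhs, rhs;
};

// Binding strength of the binary operators handled here; -1 means
// "not a binary operator".
int precedence(char op) {
  switch (op) {
  case '*': case '/': case '%': return 50;
  case '+': case '-': return 45;
  default: return -1;
  }
}

struct MiniParser {
  std::string src;
  std::size_t pos = 0;

  // Skips whitespace and returns the next character ('\0' at end of input).
  char peek() {
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    return pos < src.size() ? src[pos] : '\0';
  }

  std::unique_ptr<Expr> parseFactor() {
    if (peek() == '(') {
      ++pos;
      auto inner = parseExpr(0);
      if (peek() != ')') throw std::runtime_error("expected ')'");
      ++pos;
      return inner;
    }
    if (!std::isdigit(static_cast<unsigned char>(peek())))
      throw std::runtime_error("expected a constant");
    auto lit = std::make_unique<Expr>();
    while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
      lit->value = lit->value * 10 + (src[pos++] - '0');
    return lit;
  }

  // Keeps consuming operators whose precedence is at least minPrec. The
  // right operand is parsed with a higher minimum, so operators of equal
  // precedence associate to the left.
  std::unique_ptr<Expr> parseExpr(int minPrec) {
    auto left = parseFactor();
    for (char op = peek(); precedence(op) >= minPrec; op = peek()) {
      ++pos;
      auto node = std::make_unique<Expr>();
      node->op = op;
      node->rhs = parseExpr(precedence(op) + 1);
      node->lhs = std::move(left);
      left = std::move(node);
    }
    return left;
  }
};

// Renders the tree with explicit parentheses so the grouping is visible.
std::string show(const Expr &e) {
  if (e.op == 0) return std::to_string(e.value);
  return "(" + show(*e.lhs) + " " + e.op + " " + show(*e.rhs) + ")";
}
```

With `MiniParser p{"1 + 2 * 3 - 4"};`, `show(*p.parseExpr(0))` produces `((1 + (2 * 3)) - 4)`: multiplication binds tighter, and the additive operators group left to right.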
- Location: include/mycc/Sema/ and lib/Sema/
- Files: Sema.hpp, Scope.hpp, Sema.cpp, Scope.cpp
- Purpose: Performs semantic validation and scope resolution
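Scope resolution is often implemented as a stack of symbol tables, one per block. The following is a minimal sketch under that assumption; `ScopeStack` and its methods are hypothetical and do not mirror the interface in Scope.hpp. Each declaration receives a unique internal name, which lets an inner variable shadow an outer one without ambiguity in later phases.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope stack: every block pushes a map from source names to
// unique internal names.
class ScopeStack {
  std::vector<std::unordered_map<std::string, std::string>> scopes;
  int counter = 0;

public:
  void enter() { scopes.emplace_back(); }
  void leave() {
    assert(!scopes.empty());
    scopes.pop_back();
  }

  // Declares `name` in the innermost scope. Returns the unique name, or
  // std::nullopt when `name` is already declared in that same scope.
  std::optional<std::string> declare(const std::string &name) {
    assert(!scopes.empty());
    auto &inner = scopes.back();
    if (inner.count(name)) return std::nullopt;
    std::string unique = name + "." + std::to_string(counter++);
    inner[name] = unique;
    return unique;
  }

  // Resolves `name` by searching from the innermost scope outward.
  std::optional<std::string> lookup(const std::string &name) const {
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return found->second;
    }
    return std::nullopt;
  }
};
```

A second `declare("x")` in the same scope returns `std::nullopt`, which a caller could turn into a duplicate-declaration diagnostic, while declaring `x` in a nested scope succeeds and shadows the outer one until `leave()` is called.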
- Location: include/mycc/AST/ and lib/AST/
- Files: AST.hpp, ASTContext.hpp, ASTPrinter.hpp, ASTPrinter.cpp
- Purpose: Defines AST node structures and provides tree manipulation utilities
- Location: include/mycc/IR/ and lib/IR/
- Files: SimpleIR.hpp
- Purpose: Provides a low-level intermediate representation for optimization and code generation
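To give a feel for what lowering to a three-address-style IR looks like, here is a minimal sketch that translates a short-circuiting `&&` into conditional jumps. The instruction names (`JumpIfZero`, `Copy`, `Jump`, `Label`) are modeled on the TACKY IR from the book and are assumptions, not the contents of SimpleIR.hpp. `Node`, `Lowering`, and the helper functions are likewise hypothetical, and instructions are emitted as plain strings to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression tree: an integer constant, or `lhs && rhs`.
struct Node {
  bool isAnd = false;
  int value = 0;
  std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> constant(int v) {
  auto n = std::make_unique<Node>();
  n->value = v;
  return n;
}

std::unique_ptr<Node> logicalAnd(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
  auto n = std::make_unique<Node>();
  n->isAnd = true;
  n->lhs = std::move(l);
  n->rhs = std::move(r);
  return n;
}

struct Lowering {
  std::vector<std::string> code; // emitted instructions, one per entry
  int temps = 0;
  int labels = 0;

  std::string newTemp() { return "tmp." + std::to_string(temps++); }
  std::string newLabel(const std::string &stem) {
    return stem + "." + std::to_string(labels++);
  }

  // Returns the operand that holds the value of `n`, emitting code as needed.
  std::string lower(const Node &n) {
    if (!n.isAnd) return std::to_string(n.value);
    std::string falseLabel = newLabel("and_false");
    std::string endLabel = newLabel("and_end");
    std::string dst = newTemp();
    std::string a = lower(*n.lhs);
    // When the left operand is zero, the right operand is never evaluated.
    code.push_back("JumpIfZero " + a + ", " + falseLabel);
    std::string b = lower(*n.rhs);
    code.push_back("JumpIfZero " + b + ", " + falseLabel);
    code.push_back("Copy 1, " + dst);
    code.push_back("Jump " + endLabel);
    code.push_back("Label " + falseLabel);
    code.push_back("Copy 0, " + dst);
    code.push_back("Label " + endLabel);
    return dst;
  }
};
```

Lowering `0 && 5` this way emits seven instructions and leaves the result in `tmp.0`; the first instruction jumps past the code for the right operand, which is what makes the evaluation short-circuit.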
- Location: include/mycc/CodeGen/ and lib/CodeGen/
- Files: IRGen.hpp, IRGen.cpp
- Subdirectory: x64/ containing X64CodeGen.hpp, x64AST.hpp, X64CodeGen.cpp
- Purpose: Transforms IR to target-specific assembly code (x86-64)
- Location: include/mycc/Basic/ and lib/Basic/
- Files: LLVM.hpp, TokenKinds.def, TokenKinds.hpp, Diagnostic.def, Diagnostic.hpp
- Purpose: Provides fundamental data types, diagnostics, and LLVM integration
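The `.def` files suggest the X-macro pattern familiar from LLVM-style codebases: a single list of entries is expanded several times with different macro definitions. The sketch below illustrates the pattern only. The entries, the `TOKEN_LIST` macro, and the function names are assumptions rather than the contents of TokenKinds.def, and the list is inlined as a macro (instead of living in a separate included file) so that the example is self-contained.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a .def file: one X(...) entry per token kind.
#define TOKEN_LIST(X) \
  X(unknown)          \
  X(identifier)       \
  X(integer_literal)  \
  X(l_paren)          \
  X(r_paren)

namespace tok {

// First expansion: each entry becomes an enumerator.
enum TokenKind {
#define TOK(ID) ID,
  TOKEN_LIST(TOK)
#undef TOK
  NUM_TOKENS
};

// Second expansion: each entry becomes its own spelling as a string.
inline const char *getTokenName(TokenKind kind) {
  static const char *const names[] = {
#define TOK(ID) #ID,
      TOKEN_LIST(TOK)
#undef TOK
  };
  assert(kind < NUM_TOKENS);
  return names[kind];
}

// Reverse lookup built on the same table; returns `unknown` for names that
// are not in the list.
inline TokenKind findTokenKind(const char *name) {
  for (int k = 0; k < NUM_TOKENS; ++k)
    if (std::strcmp(getTokenName(static_cast<TokenKind>(k)), name) == 0)
      return static_cast<TokenKind>(k);
  return unknown;
}

} // namespace tok
```

The benefit of this layout is that adding a token kind means touching one list; the enum, the name table, and anything else generated from the list stay in sync automatically.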
- test/ - Test cases for various compiler components
- writing-a-c-compiler-tests/ - Official test suite from the book
The project uses CMake as its build system with modular library organization:
- Main executable: mycc
- Component libraries: libLexer.a, libParser.a, libAST.a, libCodeGen.a, libSema.a, libBasic.a
# Build the project
mkdir build && cd build
cmake ..
make
# Compile a C source file (generates input.s assembly file)
./mycc input.c
# Run specific compilation phases
./mycc --lex input.c # Lexer only
./mycc --parse input.c # Lexer and parser
./mycc --validate input.c # Semantic analysis
./mycc --tacky input.c # Generate IR
./mycc --codegen input.c # Full compilation without executable
# Print output to console
./mycc --print input.c # Show assembly output
./mycc --tacky --print input.c # Show IR output

- LLVM: Provides core data structures (APSInt, StringRef, StringMap) and utility classes
- CMake: Build system
- C++17 or later: Required for modern C++ features used throughout the codebase
The compiler currently supports:
- Basic C syntax parsing
- Variable declarations and assignments with scoped name resolution
- Arithmetic and logical expressions with proper operator precedence
- Unary operators (negation, bitwise complement, logical not)
- Binary operators (arithmetic, bitwise, comparison, logical)
- Ternary conditional operator
- Assignment and compound assignment operators
- Prefix and postfix increment/decrement operators
- Control flow statements:
  - Conditional statements (if/else)
  - Loop statements (while, do-while, for)
  - Jump statements (break, continue, goto)
  - Switch statements with case and default labels
- Labeled statements and goto
- Function definitions with block scope
- Short-circuit evaluation for logical operators
- Comprehensive semantic validation:
  - Variable scope and lifetime management
  - Duplicate declaration detection
  - Break/continue validation in loop contexts
  - Switch statement validation (constant case values, duplicate cases, multiple defaults)
  - Label uniqueness and goto target validation
  - Declaration placement validation (labels must precede statements, not declarations)
- x86-64 assembly generation with Intel syntax
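As an example of the kind of check listed under semantic validation, the following sketch detects duplicate case values and multiple default labels within one switch statement. `SwitchLabel` and `checkSwitch` are hypothetical names, and the sketch assumes the case expressions have already been reduced to integer constants; it does not reflect how Sema.cpp is actually organized.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical flattened view of the labels found inside one switch body.
struct SwitchLabel {
  bool isDefault;
  long value; // ignored when isDefault is true
};

// Returns one message per problem; an empty result means the labels are valid.
std::vector<std::string> checkSwitch(const std::vector<SwitchLabel> &labels) {
  std::vector<std::string> errors;
  std::set<long> seen;
  bool sawDefault = false;
  for (const SwitchLabel &label : labels) {
    if (label.isDefault) {
      if (sawDefault)
        errors.push_back("multiple default labels in one switch");
      sawDefault = true;
    } else if (!seen.insert(label.value).second) {
      // insert() reports false in .second when the value was already present.
      errors.push_back("duplicate case value " + std::to_string(label.value));
    }
  }
  return errors;
}
```

For the labels `case 1`, `default`, `case 1`, `default`, this returns two messages: one for the repeated case value and one for the second default.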
For detailed technical information about the compiler's architecture and implementation, see TECHNICAL.md.