Phases of a Compiler

A compiler is a program that translates high-level source code into a lower-level form, typically machine code. It works in a sequence of phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation. These phases are generally grouped into two major parts:

  1. Analysis Phase (Front-End)
  2. Synthesis Phase (Back-End)

1. Lexical Analysis (Scanner)

  • Converts source code into a stream of tokens.
  • Removes whitespace and comments.
  • Identifies keywords, identifiers, operators, literals.
  • Example:
    ```c
    int x = 10;
    ```
    Tokenized as:
    • int (keyword)
    • x (identifier)
    • = (assignment operator)
    • 10 (integer literal)
    • ; (delimiter)
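
The tokenization above can be sketched as a small hand-written scanner. This is a minimal illustration, not how any particular compiler does it: the `Token` layout, the `TokenKind` names, and `next_token` are all hypothetical, only `int` is treated as a keyword, and comment removal is left out.

```c
#include <ctype.h>
#include <string.h>

/* Token categories from the example above (names are illustrative). */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_OPERATOR, TOK_INT_LITERAL,
               TOK_DELIMITER, TOK_UNKNOWN, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* lexeme; longer names get split, fine for a sketch */
} Token;

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    Token tok = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;            /* discard whitespace */

    if (*p == '\0') {
        /* end of input: kind stays TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        /* Only "int" is recognized as a keyword in this sketch. */
        tok.kind = strcmp(tok.text, "int") == 0 ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
        tok.text[n] = '\0';
        tok.kind = TOK_INT_LITERAL;
    } else {
        /* Single-character tokens: '=' and ';' are known, the rest are not. */
        tok.text[0] = *p;
        tok.text[1] = '\0';
        tok.kind = (*p == '=') ? TOK_OPERATOR
                 : (*p == ';') ? TOK_DELIMITER
                 : TOK_UNKNOWN;
        p++;
    }
    *src = p;
    return tok;
}
```

Calling `next_token` repeatedly on `"int x = 10;"` yields the five tokens listed above and then a `TOK_END` marker. Returning one token per call lets a parser pull tokens on demand instead of storing the whole stream.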

2. Syntax Analysis (Parser)

  • Checks whether the token sequence follows the grammatical rules of the programming language.
  • Constructs a parse tree or, more commonly in practice, an abstract syntax tree (AST) that keeps only the structure later phases need.
  • Detects syntax errors.
  • Example:
    • int x = 10; ✅ (Valid)
    • int = x 10; ❌ (Syntax Error)
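
To make the valid and invalid cases concrete, here is a minimal sketch of a checker for one grammar rule, `decl -> "int" IDENT "=" NUMBER ";"`. The rule and the function names are assumptions made for illustration. It only accepts or rejects the input and does not build a tree; a real parser covers the full grammar and works on the scanner's tokens rather than raw characters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Grammar checked by this sketch:  decl -> "int" IDENT "=" NUMBER ";"  */

static const char *skip_spaces(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

/* Length of the identifier or keyword at p, or 0 if there is none. */
static size_t word_length(const char *p) {
    size_t n = 0;
    if (isalpha((unsigned char)p[0]) || p[0] == '_')
        while (isalnum((unsigned char)p[n]) || p[n] == '_') n++;
    return n;
}

/* Length of the run of decimal digits at p, or 0 if there is none. */
static size_t number_length(const char *p) {
    size_t n = 0;
    while (isdigit((unsigned char)p[n])) n++;
    return n;
}

/* Returns true when src is exactly one declaration of the form above. */
bool parse_declaration(const char *src) {
    const char *p = skip_spaces(src);

    size_t n = word_length(p);                       /* expect "int" */
    if (n != 3 || strncmp(p, "int", 3) != 0) return false;
    p = skip_spaces(p + n);

    n = word_length(p);                              /* expect IDENT */
    if (n == 0) return false;
    if (n == 3 && strncmp(p, "int", 3) == 0) return false;  /* keyword, not a name */
    p = skip_spaces(p + n);

    if (*p != '=') return false;                     /* expect "=" */
    p = skip_spaces(p + 1);

    n = number_length(p);                            /* expect NUMBER */
    if (n == 0) return false;
    p = skip_spaces(p + n);

    if (*p != ';') return false;                     /* expect ";" */
    p = skip_spaces(p + 1);
    return *p == '\0';                               /* nothing may follow */
}
```

With this sketch, `parse_declaration("int x = 10;")` returns true, while `"int = x 10;"` is rejected at the point where an identifier is expected.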

3. Semantic Analysis

  • Checks that a syntactically valid program is also meaningful under the language's typing and scoping rules.
  • Checks for:
    • Type mismatches (e.g., int x = "hello"; ❌)
    • Undeclared variables
    • Function signature mismatches
  • Example:
    ```c
    int x = "hello"; // Error: Type mismatch
    ```
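
A type check of this kind can be sketched as a comparison between the declared type and the type of the initializer. The `Declaration` struct, the two-member `Type` enum, and the message text are hypothetical. The sketch requires exact equality, which is stricter than real C, where implicit conversions between many types are allowed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The two types needed for the example; a real compiler has many more. */
typedef enum { TYPE_INT, TYPE_STRING } Type;

/* A declaration reduced to what the check needs (hypothetical layout). */
typedef struct {
    const char *name;   /* declared variable, e.g. "x" */
    Type declared;      /* type written in the declaration */
    Type initializer;   /* type of the initializing expression */
} Declaration;

static const char *type_name(Type t) {
    return t == TYPE_INT ? "int" : "string";
}

/* Returns true if the initializer's type equals the declared type.
   On a mismatch, writes a diagnostic into msg and returns false. */
bool check_declaration(const Declaration *d, char *msg, size_t msg_size) {
    if (d->declared == d->initializer)
        return true;
    snprintf(msg, msg_size,
             "type mismatch: cannot initialize %s '%s' with %s",
             type_name(d->declared), d->name, type_name(d->initializer));
    return false;
}
```

For the declaration above, the check fails and `msg` holds `type mismatch: cannot initialize int 'x' with string`.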

4. Intermediate Code Generation

  • Converts the syntax tree into an intermediate representation (IR).
  • The IR is largely independent of the target machine, so the same front-end can serve several back-ends.
  • Example (Three-Address Code):
    ```c
    a = b + c * d;
    ```
    Translates to:
    ```
    t1 = c * d
    t2 = b + t1
    a = t2
    ```
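
One way to produce three-address code like this is a post-order walk over the expression tree that introduces a fresh temporary for every operator node. The sketch below assumes a hypothetical `Expr` node type and writes the instructions into a fixed-size text buffer; it is an illustration of the idea rather than a production design.

```c
#include <stdio.h>
#include <string.h>

/* Expression tree node (hypothetical layout). */
typedef struct Expr {
    char op;                          /* '+', '*', ... or 0 for a variable leaf */
    const char *name;                 /* variable name when op == 0 */
    const struct Expr *left, *right;  /* operands when op != 0 */
} Expr;

typedef struct {
    char text[256];   /* generated code, one instruction per line */
    int next_temp;    /* how many temporaries (t1, t2, ...) exist so far */
} TacBuffer;

/* Emits code for e and stores the name that holds its value in result. */
void gen_expr(const Expr *e, TacBuffer *out, char *result, size_t result_size) {
    if (e->op == 0) {                 /* a leaf needs no code */
        snprintf(result, result_size, "%s", e->name);
        return;
    }
    char lhs[16], rhs[16];
    gen_expr(e->left, out, lhs, sizeof lhs);     /* operands first (post-order) */
    gen_expr(e->right, out, rhs, sizeof rhs);
    snprintf(result, result_size, "t%d", ++out->next_temp);

    size_t used = strlen(out->text);
    snprintf(out->text + used, sizeof out->text - used,
             "%s = %s %c %s\n", result, lhs, e->op, rhs);
}

/* Emits code for the assignment  target = value. */
void gen_assign(const char *target, const Expr *value, TacBuffer *out) {
    char src[16];
    gen_expr(value, out, src, sizeof src);

    size_t used = strlen(out->text);
    snprintf(out->text + used, sizeof out->text - used,
             "%s = %s\n", target, src);
}
```

Building the tree for `b + c * d` (with `*` as the right child of `+`) and calling `gen_assign("a", ...)` produces the three instructions shown above. Operator precedence is already encoded in the tree's shape, so the generator never has to consider it.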

5. Optimization

  • Transforms the IR so that the generated program runs faster or uses less memory, without changing its observable behavior.
  • Types:
    • Peephole Optimization (small local optimizations)
    • Loop Optimization (reducing redundant calculations in loops)
    • Constant Folding (evaluating 3 * 4 to 12 at compile time)
  • Example:
    ```c
    int a = 3 * 4; // Compiler replaces it with 'int a = 12;'
    ```
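
Constant folding can be sketched as a bottom-up pass over an expression tree: whenever both operands of an operator are literals, the node is replaced by the computed literal. The `Node` type here is a hypothetical layout for illustration. The sketch ignores integer overflow, which a real compiler has to handle according to the language's rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Expression tree node (hypothetical layout). */
typedef struct Node {
    char op;                    /* '+', '-', '*', or 0 for a leaf */
    bool is_const;              /* leaf only: true for a literal, false for a variable */
    int value;                  /* literal value when is_const is true */
    struct Node *left, *right;  /* operands when op != 0 */
} Node;

/* Folds constant subexpressions in place, working bottom-up. */
void fold_constants(Node *n) {
    if (n == NULL || n->op == 0)
        return;                           /* leaves are already as simple as possible */

    fold_constants(n->left);              /* fold operands first */
    fold_constants(n->right);

    const Node *l = n->left, *r = n->right;
    if (!(l->op == 0 && l->is_const && r->op == 0 && r->is_const))
        return;                           /* at least one operand is not a literal */

    int result;
    switch (n->op) {
    case '+': result = l->value + r->value; break;
    case '-': result = l->value - r->value; break;
    case '*': result = l->value * r->value; break;
    default:  return;                     /* unknown operator: leave unchanged */
    }

    /* Turn this operator node into a literal leaf. */
    n->op = 0;
    n->is_const = true;
    n->value = result;
    n->left = n->right = NULL;
}
```

Applied to the tree for `3 * 4`, the root becomes the literal `12`. Applied to `x + 3 * 4`, only the right operand is folded, leaving `x + 12`.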

6. Code Generation

  • Converts the IR into assembly or machine code for the target processor.
  • Selects target instructions and assigns values to registers and memory locations.
  • Example:
    ```assembly
    MOV R1, b
    MOV R2, c
    MUL R3, R2, d
    ADD R4, R1, R3
    MOV a, R4
    ```
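
A very naive code generator can translate each three-address instruction on its own: load both operands into registers, apply the operation, and store the result. The sketch below assumes a hypothetical `TacInstr` record and reuses the generic `MOV`/`ADD`/`MUL` mnemonics from the example; it does no register allocation, so its output is longer than the hand-written sequence above.

```c
#include <stdio.h>
#include <string.h>

/* One three-address instruction: dst = lhs op rhs, or dst = lhs when op == 0. */
typedef struct {
    const char *dst;
    const char *lhs;
    char op;            /* '+', '*', or 0 for a plain copy */
    const char *rhs;    /* unused when op == 0 */
} TacInstr;

/* Appends assembly for one instruction to the NUL-terminated buffer out.
   Returns 0 on success, or -1 if the operator is not supported. */
int emit_instr(const TacInstr *in, char *out, size_t out_size) {
    size_t used = strlen(out);

    if (in->op == 0) {                       /* copy: load, then store */
        snprintf(out + used, out_size - used,
                 "MOV R1, %s\nMOV %s, R1\n", in->lhs, in->dst);
        return 0;
    }

    const char *mnemonic = in->op == '+' ? "ADD"
                         : in->op == '*' ? "MUL"
                         : NULL;
    if (mnemonic == NULL)
        return -1;

    /* Always uses R1 and R2; every result goes straight back to memory. */
    snprintf(out + used, out_size - used,
             "MOV R1, %s\nMOV R2, %s\n%s R1, R1, R2\nMOV %s, R1\n",
             in->lhs, in->rhs, mnemonic, in->dst);
    return 0;
}
```

For `t1 = c * d` this emits four instructions: two loads, a `MUL`, and a store to `t1`. A real back-end would keep `t1` in a register and skip the store, which is the kind of saving that register allocation provides.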

7. Symbol Table Management

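  • Maintains a record of each name's type, scope, and memory location.
  • Built up during analysis and consulted during semantic analysis, optimization, and code generation, so it supports the other phases rather than running as a separate step.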
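
A symbol table with nested scopes can be sketched as a stack of entries tagged with their scope depth. All names below (`SymbolTable`, `declare`, `lookup`, and so on) are hypothetical, types are stored as plain strings, and memory locations are left out to keep the example short. Production compilers usually use hash tables rather than a linear scan.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64

typedef struct {
    const char *name;
    const char *type;    /* kept as text for simplicity, e.g. "int" */
    int scope_depth;     /* 0 = global scope, 1 = first nested block, ... */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];   /* innermost declarations are at the end */
    size_t count;
    int depth;                     /* depth of the scope currently open */
} SymbolTable;

void enter_scope(SymbolTable *t) { t->depth++; }

/* Leaves the current scope and discards everything declared in it. */
void exit_scope(SymbolTable *t) {
    while (t->count > 0 && t->entries[t->count - 1].scope_depth == t->depth)
        t->count--;
    t->depth--;
}

/* Returns the innermost visible symbol called name, or NULL if undeclared. */
const Symbol *lookup(const SymbolTable *t, const char *name) {
    for (size_t i = t->count; i > 0; i--)
        if (strcmp(t->entries[i - 1].name, name) == 0)
            return &t->entries[i - 1];
    return NULL;
}

/* Adds a symbol to the current scope.
   Returns 0 on success, -1 on redeclaration in the same scope or a full table. */
int declare(SymbolTable *t, const char *name, const char *type) {
    if (t->count == MAX_SYMBOLS)
        return -1;
    const Symbol *existing = lookup(t, name);
    if (existing != NULL && existing->scope_depth == t->depth)
        return -1;
    t->entries[t->count++] = (Symbol){ name, type, t->depth };
    return 0;
}
```

A `NULL` result from `lookup` corresponds to the "undeclared variable" error of semantic analysis, and a `-1` from `declare` to a redeclaration. An inner declaration shadows an outer one with the same name until `exit_scope` removes it.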

8. Error Handling

  • Detects and reports lexical, syntax, and semantic errors as the corresponding phases encounter them.
  • Example Errors:
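    • Lexical Error: int x = 5 @ 2; (the character @ does not begin any valid C token; a misspelling such as inta is still a valid identifier, so the scanner accepts it)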
    • Syntax Error: int x 5; (missing =)
    • Semantic Error: int x = "abc"; (type mismatch)

Summary of Compiler Phases

| Phase | Purpose |
|---|---|
| Lexical Analysis | Convert source code to tokens |
| Syntax Analysis | Check syntax correctness (parse tree) |
| Semantic Analysis | Ensure meaningful expressions (type checking) |
| Intermediate Code Generation | Convert to intermediate representation (IR) |
| Optimization | Improve performance |
| Code Generation | Produce machine/assembly code |
| Symbol Table Management | Store variable & function details |
| Error Handling | Detect and report errors |
