Augustana University College

Programming Languages

The Compilation Process

Stages from Source to Executable

  1. Compilation: source code ==> relocatable object code (binaries)

  2. Linking: many relocatable binaries (modules plus libraries) ==> one relocatable binary (with all external references satisfied)

  3. Loading: relocatable ==> absolute binary (with all code and data references bound to the addresses occupied in memory)

  4. Execution: control is transferred to the first instruction of the program
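The linking and loading stages above can be sketched with a toy model. The module format used here (a list of words plus "defs" and "refs" tables) and the names link and load are hypothetical, invented only for illustration; real object file formats are far richer.

```python
# Toy model of linking and loading. The module layout is hypothetical:
#   code: list of words
#   defs: symbol -> offset of its definition within this module
#   refs: (offset, symbol) pairs marking words that need an external address

def link(modules):
    """Combine modules into one relocatable image with all external
    references resolved to offsets from the start of the image."""
    image, defs, fixups = [], {}, []
    for mod in modules:
        base = len(image)                     # where this module lands
        for name, offset in mod["defs"].items():
            defs[name] = base + offset
        for offset, name in mod["refs"]:
            fixups.append((base + offset, name))
        image.extend(mod["code"])
    for where, name in fixups:
        if name not in defs:
            raise KeyError(f"unresolved external: {name}")
        image[where] = defs[name]             # still relative to image start
    relocs = [where for where, _ in fixups]   # words the loader must adjust
    return image, relocs

def load(image, relocs, load_address):
    """Bind every relocatable address to an absolute memory address."""
    absolute = list(image)
    for where in relocs:
        absolute[where] += load_address
    return absolute

main = {"code": ["CALL", 0, "HALT"], "defs": {"main": 0}, "refs": [(1, "sqrt")]}
lib = {"code": ["SQRT", "RET"], "defs": {"sqrt": 0}, "refs": []}

image, relocs = link([main, lib])
print(image)                       # ['CALL', 3, 'HALT', 'SQRT', 'RET']
print(load(image, relocs, 1000))   # ['CALL', 1003, 'HALT', 'SQRT', 'RET']
```

The operand of CALL is unknown when main is compiled, becomes the relative offset 3 after linking, and becomes the absolute address 1003 only when the image is loaded at address 1000.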

At compile time (CT), absolute addresses of variables and statement labels are not known.

In languages with purely static storage allocation (such as early versions of Fortran), absolute addresses are bound at load time (LT) and stay fixed for the whole run.

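In block-structured languages, the binding of a local variable to an address can change at run time (RT), since storage for locals is allocated each time a block or procedure is entered.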
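A minimal sketch of run-time binding, assuming a simulated stack: each activation of a recursive procedure gets its own frame, so the same local name is bound to a different address on each call. The constants STACK_BASE and FRAME_SIZE and the depth and trace parameters are hypothetical bookkeeping, not part of any real calling convention.

```python
# Simulated run-time stack: the address of the local n is computed
# on entry to each activation, not at compile or load time.
STACK_BASE = 5000   # hypothetical address of the first frame
FRAME_SIZE = 1      # one word per frame: just the local n

def factorial(n, depth=0, trace=None):
    address_of_n = STACK_BASE + depth * FRAME_SIZE   # bound at run time
    if trace is not None:
        trace.append((n, address_of_n))
    if n <= 1:
        return 1
    return n * factorial(n - 1, depth + 1, trace)

trace = []
result = factorial(3, trace=trace)
print(result)   # 6
print(trace)    # [(3, 5000), (2, 5001), (1, 5002)]
```

With static allocation there would be a single fixed address for n, which is one reason purely static languages cannot support recursion in this form.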

Phases of the Compilation Process

  1. Lexical analysis (scanning): the source text is broken into tokens.

  2. Syntactic analysis (parsing): tokens are combined to form syntactic structures, typically represented by a parse tree.

    The parser may be replaced by a syntax-directed editor, which directly generates a parse tree as a product of editing.

  3. Semantic analysis: intermediate code is generated for each syntactic structure.

    Type checking is performed in this phase. Complicated features such as generic declarations and operator overloading (as in Ada and C++) are also processed.

  4. Machine-independent optimization: intermediate code is optimized to improve efficiency.

  5. Code generation: intermediate code is translated to relocatable object code for the target machine.

  6. Machine-dependent optimization: the machine code is optimized.
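The first four phases listed above can be sketched for a tiny expression language with +, * and parentheses. This is an illustration under stated assumptions, not how any particular compiler is organized: the token kinds, the tuple-based parse tree, and the PUSH/LOAD/ADD/MUL stack code are all invented for the example, and type checking, code generation, and machine-dependent optimization are omitted.

```python
import re

# One token per match: an integer, an identifier, or a single symbol.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def scan(text):
    """Phase 1, lexical analysis: break the source text into tokens."""
    tokens = []
    for number, name, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("var", name))
        else:
            tokens.append(("op", op))
    return tokens

def parse(tokens):
    """Phase 2, syntactic analysis: build a parse tree by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, value = advance()
        if kind in ("num", "var"):
            return (kind, value)
        if (kind, value) == ("op", "("):
            tree = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        tree = factor()
        while peek() == ("op", "*"):
            advance()
            tree = ("*", tree, factor())
        return tree

    def expr():
        tree = term()
        while peek() == ("op", "+"):
            advance()
            tree = ("+", tree, term())
        return tree

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree

def generate(tree):
    """Phase 3 (greatly simplified): emit intermediate stack code."""
    kind = tree[0]
    if kind == "num":
        return [("PUSH", tree[1])]
    if kind == "var":
        return [("LOAD", tree[1])]
    opcode = "ADD" if kind == "+" else "MUL"
    return generate(tree[1]) + generate(tree[2]) + [(opcode,)]

def optimize(code):
    """Phase 4, machine-independent optimization: fold constant operands."""
    out = []
    for instr in code:
        foldable = (instr[0] in ("ADD", "MUL") and len(out) >= 2
                    and out[-1][0] == "PUSH" and out[-2][0] == "PUSH")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", a + b if instr[0] == "ADD" else a * b))
        else:
            out.append(instr)
    return out

tokens = scan("a + 2 * 3")
tree = parse(tokens)
code = generate(tree)
print(tree)            # ('+', ('var', 'a'), ('*', ('num', 2), ('num', 3)))
print(code)            # [('LOAD', 'a'), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(optimize(code))  # [('LOAD', 'a'), ('PUSH', 6), ('ADD',)]
```

The optimizer works only on the intermediate code and knows nothing about the target machine, which is what makes it machine-independent.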

On some systems (e.g., C under Unix), the compiler produces assembly code, which is then translated into object code by a separate assembler.
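As a rough illustration of the assembler's job, here is a two-pass assembler for an invented machine. The mnemonics, the numeric opcodes in OPCODES, and the one-word-per-field encoding are assumptions made up for this sketch and do not correspond to any real instruction set.

```python
# Hypothetical instruction set: mnemonic -> opcode number.
OPCODES = {"HALT": 0, "PUSH": 1, "ADD": 2, "JMP": 3}

def assemble(source):
    """Translate assembly text into a list of machine words."""
    # Pass 1: record the address of every label.
    labels, address, instructions = {}, 0, []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = address
            continue
        parts = line.split()
        instructions.append(parts)
        address += len(parts)      # one word per opcode and per operand
    # Pass 2: emit opcodes, replacing label operands by their addresses.
    words = []
    for mnemonic, *operands in instructions:
        words.append(OPCODES[mnemonic])
        for operand in operands:
            words.append(labels[operand] if operand in labels else int(operand))
    return words

program = """
start:
    PUSH 2
    PUSH 3
    ADD
    JMP end
end:
    HALT
"""
print(assemble(program))   # [1, 2, 1, 3, 2, 3, 7, 0]
```

The label addresses are offsets from the start of the module, so the output is still relocatable in the sense used earlier in these notes.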

Copyright © 2000 Jonathan Mohr