Six Phases of the Compilation Process
In this lesson, we will outline and then discuss the phases of the compilation process. This lesson is recommended for Computer Science and Engineering students taking a Compiler Construction or Compiler Design course. A compiler passes through a number of phases to produce the final target code. The phases are shown below.
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Intermediate Code Generation
- Code Optimization
- Target Code Generation
Keep in mind that the first three phases are collectively called the Analysis Phase, while the last three are called the Synthesis Phase. These phases are illustrated in the figure below. Take note of the output of each phase, and observe that the output of one phase serves as the input to the next. So let’s begin with the first one.
Lexical Analysis
This is the first phase of the compilation process and is handled by the lexical analyzer, which is also called the scanner. In this phase the input source code is scanned and separated into lexical units called tokens. The lexical analyzer reads the input code character by character.
Take, for example, the line of code below:
String name = "Saffron";
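A typical lexical analyzer treats the quoted string as a single token, so it would generate 5 tokens for this line: the identifiers String and name, the assignment operator =, the string literal "Saffron", and the separator ;. Identifiers such as name are entered as records in the Symbol Table.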
The Symbol Table is created in this phase and populated as identifiers are recognized. A symbol table is typically a data structure that holds a record for each identifier in the source code.
The output of this phase is a stream of tokens.
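The scanning step described above can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler does it: the token categories, the `TOKEN_SPEC` table, and the `tokenize` function are all names assumed for this example, and the sketch treats the quoted string as a single token.

```python
import re

# Hypothetical token categories for a tiny Java-like subset (assumed for illustration).
TOKEN_SPEC = [
    ("STRING_LITERAL", r'"[^"\n]*"'),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("OPERATOR", r"[=+\-*/]"),
    ("SEPARATOR", r"[;(){},]"),
    ("WHITESPACE", r"\s+"),
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Scan the source left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[position]!r} at offset {position}"
            )
        # Whitespace separates tokens but is not a token itself.
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for kind, lexeme in tokenize('String name = "Saffron";'):
    print(kind, lexeme)
```

A real scanner would usually also record line and column numbers for each token so that later phases can report errors precisely.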
Syntax Analysis
This phase is handled by the syntax analyzer, also called the parser. The stream of tokens generated in the lexical analysis phase is analyzed further to ensure that the input code follows the syntax (grammar) of the particular language.
Syntax errors are detected in this phase.
The output of this phase is a parse tree, often simplified into an abstract syntax tree.
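To make the idea of checking tokens against a grammar concrete, here is a minimal sketch of a parser for one kind of statement: a declaration of the form type name = value ;. The token format of (kind, lexeme) pairs, the `ParseError` class, and the `parse_declaration` function are assumptions made for this example only.

```python
class ParseError(Exception):
    """Raised when the tokens do not follow the expected grammar."""


def parse_declaration(tokens):
    """Parse: IDENTIFIER IDENTIFIER '=' (STRING_LITERAL | NUMBER) ';'.

    tokens is a list of (kind, lexeme) pairs; the result is a small syntax tree
    represented as nested dictionaries.
    """
    position = 0

    def expect(kinds, lexeme=None):
        nonlocal position
        if position >= len(tokens):
            raise ParseError("unexpected end of input")
        kind, text = tokens[position]
        if kind not in kinds or (lexeme is not None and text != lexeme):
            raise ParseError(f"unexpected token {text!r} at position {position}")
        position += 1
        return kind, text

    _, type_name = expect({"IDENTIFIER"})
    _, variable = expect({"IDENTIFIER"})
    expect({"OPERATOR"}, "=")
    value_kind, value = expect({"STRING_LITERAL", "NUMBER"})
    expect({"SEPARATOR"}, ";")
    if position != len(tokens):
        raise ParseError("unexpected tokens after ';'")
    return {
        "node": "Declaration",
        "type": type_name,
        "name": variable,
        "value": {"node": value_kind, "text": value},
    }


tokens = [
    ("IDENTIFIER", "String"),
    ("IDENTIFIER", "name"),
    ("OPERATOR", "="),
    ("STRING_LITERAL", '"Saffron"'),
    ("SEPARATOR", ";"),
]
print(parse_declaration(tokens))
```

If the = sign or the semicolon is missing, `parse_declaration` raises `ParseError`, which corresponds to the syntax errors detected in this phase.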
Semantic Analysis
Semantic analysis is handled by the semantic analyzer and ensures that the source code follows the semantic rules of the language.
Type checking is taken care of in this phase. This ensures that variables are assigned values that match their declared types.
So if a variable has been declared as an integer and is then assigned a float, the error is caught by the semantic analyzer.
This phase also examines the operands and operators of each statement in the input code to check that they are used together in a compatible way.
The output of this phase is an annotated parse tree, that is, a syntax tree decorated with type information.
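The type check described above can be sketched as follows. The type names, the `ASSIGNABLE` rule set (which assumes an int may be stored in a float variable but not the reverse), and the `check_declaration` function are all assumptions for illustration; real languages have much richer rules.

```python
class SemanticError(Exception):
    """Raised when a declaration breaks a semantic rule."""


# Hypothetical mapping from literal kinds to the type of value they produce.
LITERAL_TYPES = {
    "INT_LITERAL": "int",
    "FLOAT_LITERAL": "float",
    "STRING_LITERAL": "String",
}

# Assumed rule set: (declared type, value type) pairs that are allowed.
ASSIGNABLE = {
    ("int", "int"),
    ("float", "float"),
    ("float", "int"),  # widening an int into a float is allowed in this sketch
    ("String", "String"),
}


def check_declaration(declaration, symbol_table):
    """Type-check one declaration and record the variable in the symbol table."""
    declared = declaration["type"]
    name = declaration["name"]
    if name in symbol_table:
        raise SemanticError(f"variable {name!r} is already declared")
    value_type = LITERAL_TYPES[declaration["value_kind"]]
    if (declared, value_type) not in ASSIGNABLE:
        raise SemanticError(
            f"cannot assign {value_type} to {declared} variable {name!r}"
        )
    symbol_table[name] = declared
    return symbol_table


symbols = {}
check_declaration(
    {"type": "String", "name": "name", "value_kind": "STRING_LITERAL"}, symbols
)
print(symbols)
```

Passing a declaration with type int and a FLOAT_LITERAL value raises `SemanticError`, which mirrors the integer and float example above.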
Intermediate Code Generation
Intermediate code is code that lies somewhere between the source code and the target code: an intermediate representation of the input source program. One attribute of intermediate code is that it is easy to translate into the target program.
An example is a Java program compiled into Java bytecode (.class files) for the Java Virtual Machine. This intermediate code can run on any operating system that has a JVM.
One form of intermediate code is “Three-Address Code”, which resembles assembly language.
The final target code is generated from the intermediate code.
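As an illustration of three-address code, the sketch below turns a small expression tree into a list of instructions that each have at most one operator. The tuple format for expressions, the temporary names t1, t2 and so on, and the `generate_tac` function are assumptions made for this example.

```python
def generate_tac(target, expression):
    """Translate an expression tree into three-address code.

    expression is either a variable name (str) or a tuple (operator, left, right).
    Returns a list of instruction strings ending with the assignment to target.
    """
    instructions = []
    temp_count = 0

    def emit(node):
        nonlocal temp_count
        if isinstance(node, str):
            return node  # a plain variable needs no instruction
        operator, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp_count += 1
        temp = f"t{temp_count}"
        instructions.append(f"{temp} = {left_name} {operator} {right_name}")
        return temp

    result = emit(expression)
    instructions.append(f"{target} = {result}")
    return instructions


# x = a + b * c
for line in generate_tac("x", ("+", "a", ("*", "b", "c"))):
    print(line)
# prints:
# t1 = b * c
# t2 = a + t1
# x = t2
```

Each instruction names at most three addresses (two operands and one result), which is where the name comes from.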
Code Optimization
I have already discussed the various code optimization techniques in the video “Code Optimization Techniques in Compiler Construction”. You can also print out the code optimization lecture in Code Optimization by The Tech Pro. In this phase, the code is optimized to remove redundant code, manage memory efficiently, and improve the speed of execution. The intermediate code also ensures that target code can be generated for any machine, enabling portability across different platforms.
The output of this phase is the optimized code.
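One simple optimization is constant folding: computing at compile time any operation whose operands are already known. The sketch below is an assumption-laden illustration, not a full optimizer. It assumes straight-line code, instructions written as (dest, left, op, right) tuples, integer constants only, and the hypothetical function name `fold_constants`.

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(instructions):
    """Fold and propagate integer constants through straight-line code.

    instructions is a list of (dest, left, op, right) tuples whose operands are
    variable names (str) or integer constants. Returns the remaining
    instructions and a dict of names whose values were computed at compile time.
    """
    known = {}
    optimized = []
    for dest, left, op, right in instructions:
        # Replace operands whose value is already known with the constant.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dest] = OPERATIONS[op](left, right)  # computed now, not at run time
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            optimized.append((dest, left, op, right))
    return optimized, known


code = [
    ("t1", 4, "*", 60),
    ("t2", "t1", "*", 24),
    ("x", "t2", "+", "y"),
]
optimized, constants = fold_constants(code)
print(optimized)   # [('x', 5760, '+', 'y')]
print(constants)   # {'t1': 240, 't2': 5760}
```

Three instructions shrink to one because the two multiplications involve only constants. A real optimizer would also have to handle branches, loops, and reassigned variables.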
Target Code Generation
Here the target code is generated for the particular platform. Machine instructions are generated from the optimized intermediate code. The assignment of variables to registers is also handled here.
The output of this phase is the target code.
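The translation from intermediate code to machine instructions can be sketched as below. The instruction names (LOAD, ADD, SUB, MUL, DIV, STORE) and the registers R1 and R2 belong to a made-up machine assumed for this example, as does the `generate_target` function; a real code generator targets an actual instruction set.

```python
# Mnemonics of a hypothetical target machine (assumed for illustration).
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_target(instructions):
    """Translate (dest, left, op, right) tuples into assembly-like lines.

    Uses a deliberately naive register assignment: every instruction loads its
    operands into R1 and R2, operates on them, and stores the result back.
    """
    assembly = []
    for dest, left, op, right in instructions:
        assembly.append(f"LOAD R1, {left}")
        assembly.append(f"LOAD R2, {right}")
        assembly.append(f"{MNEMONICS[op]} R1, R2")
        assembly.append(f"STORE {dest}, R1")
    return assembly


for line in generate_target([("t1", "b", "*", "c"), ("x", "a", "+", "t1")]):
    print(line)
```

A practical code generator would try to keep values such as t1 in a register between instructions instead of storing and reloading them, which is the register assignment problem mentioned above.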
I hope this helps. Feel free to share the link with your friends, classmates, or colleagues. You can also leave a comment to let me know how useful this has been to you or which areas you need me to explain further.