Computer Organization and Assembly Language: Topic 11 of 73

Assembler

Assembler: Definition and Role in Computer Systems

An assembler is a type of software tool that translates assembly language programs (human-readable low-level code) into machine language (binary code) that the CPU can execute directly. Assembly language is a symbolic representation of the machine code, and the assembler's job is to convert these human-readable mnemonics into the exact binary instructions understood by the hardware.

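An assembler sits between assembly code, whether written by hand or generated by a compiler, and the machine language the CPU executes. Compilers for high-level languages such as C often emit assembly that an assembler then translates, and programmers can also write assembly directly when they need precise control over the instructions that are produced.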

Key Functions of an Assembler

An assembler performs several essential tasks in the translation process:

Translation of Assembly Code to Machine Code:
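- The primary role of an assembler is to convert assembly language instructions into machine language (binary code). Most instruction lines correspond to exactly one machine instruction, although labels and directives emit no instruction and some assemblers expand pseudo-instructions into several.
- Example: In 16-bit x86 code, MOV AX, 5 is encoded as the three bytes 10111000 00000101 00000000 (B8 05 00 in hexadecimal): the opcode for "move an immediate value into AX" followed by the 16-bit value 5, low byte first.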
Handling Labels and Symbols:
- Assembly language often includes labels (like START or LOOP) that represent memory addresses or positions in the program. These labels are replaced with actual addresses by the assembler.
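- The assembler also maintains a symbol table that maps symbolic names (such as labels, variable names, and named constants) to memory addresses or values. Register names are not stored there; they are fixed by the instruction set and encoded directly into the instruction.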
Address Resolution:
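- When an assembly program refers to data or code by name, the assembler resolves each reference into an absolute address or into an offset relative to a section or to the current instruction. References to symbols defined in other modules are recorded in the object file and left for the linker to resolve.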
- For example, if a label START points to a specific memory address in the code, the assembler replaces START with that address during the translation.
Macro Expansion:
- Some assemblers support macros, which are predefined sequences of assembly instructions. The assembler can replace macro calls with the actual sequence of instructions defined by the macro.
- Example: A macro like PRINT could expand into several instructions that output data to the screen.
Error Checking:
- The assembler checks the assembly code for errors such as unknown mnemonics, malformed operands, and references to undefined labels. It reports each problem, usually with a line number, so the programmer can correct it before any machine code is produced.
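The error-checking function can be sketched in a few lines of Python. This is only an illustration: the mnemonic set, the rule that JMP takes a single label operand, and the function name check are assumptions made for the example, not the behavior of any particular assembler.

```python
# Hypothetical toy instruction set; a real assembler's tables are far larger.
KNOWN_MNEMONICS = {"MOV", "ADD", "JMP", "HLT"}
BRANCHES = {"JMP"}  # assumed: these take exactly one label operand


def check(lines):
    """Return a list of (line_number, message) for each problem found."""
    # Collect every label defined anywhere, so forward references are legal.
    defined = set()
    for line in lines:
        text = line.split(";", 1)[0].strip()
        if ":" in text:
            defined.add(text.split(":", 1)[0].strip())

    errors = []
    for number, line in enumerate(lines, start=1):
        text = line.split(";", 1)[0].strip()      # drop the comment
        if ":" in text:
            text = text.split(":", 1)[1].strip()  # drop a leading label
        if not text:
            continue
        mnemonic, _, rest = text.partition(" ")
        mnemonic = mnemonic.upper()
        if mnemonic not in KNOWN_MNEMONICS:
            errors.append((number, f"unknown mnemonic '{mnemonic}'"))
        elif mnemonic in BRANCHES and rest.strip() not in defined:
            errors.append((number, f"undefined label '{rest.strip()}'"))
    return errors


program = [
    "START: MOV AX, 5",
    "       MVO BX, 10   ; typo in the mnemonic",
    "       JMP DONE     ; DONE is never defined",
    "       JMP START",
]
for number, message in check(program):
    print(f"line {number}: {message}")
```

Run on the sample program, the sketch reports the misspelled mnemonic on line 2 and the undefined label on line 3, and accepts the backward jump to START.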

Types of Assemblers

There are several types of assemblers based on how they work and how they handle the translation process.

1. Single-Pass Assembler

A single-pass assembler reads the source once. As it reads each line, it translates the line and generates the corresponding machine code.

Advantages:
- Faster assembly, since the source is read only once. The speed of the generated program is not affected.
- Simple to implement.
Disadvantages:
- Because the assembler sees each line only once, it does not yet know the address of a label that is used before it is defined (a forward reference). It must either forbid such references or emit a placeholder and patch it once the label's address becomes known.
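One common way for a single-pass assembler to cope with forward references is to emit a placeholder and patch it later (often called backpatching). The sketch below assumes a made-up instruction set in which every instruction is one opcode byte plus one operand byte, and labels sit on their own line; the opcode values and the name assemble_single_pass are hypothetical.

```python
OPCODES = {"HLT": 0x00, "JMP": 0x01, "LOAD": 0x02}  # hypothetical toy ISA


def assemble_single_pass(lines):
    code = bytearray()
    symbols = {}   # label -> address
    fixups = []    # (offset of operand byte, label, source line number)
    for number, line in enumerate(lines, start=1):
        text = line.split(";", 1)[0].strip()
        if not text:
            continue
        if text.endswith(":"):              # label definition
            symbols[text[:-1]] = len(code)
            continue
        parts = text.split()
        mnemonic = parts[0].upper()
        operand = parts[1] if len(parts) > 1 else "0"
        code.append(OPCODES[mnemonic])
        if operand.isdigit():
            code.append(int(operand))
        elif operand in symbols:
            code.append(symbols[operand])   # backward reference: already known
        else:
            fixups.append((len(code), operand, number))
            code.append(0)                  # placeholder, patched below
    # All labels have been seen by now, so fill in the placeholders.
    for offset, label, number in fixups:
        if label not in symbols:
            raise ValueError(f"line {number}: undefined label '{label}'")
        code[offset] = symbols[label]
    return bytes(code)


program = [
    "JMP END      ; forward reference",
    "LOAD 7",
    "END:",
    "HLT",
]
print(assemble_single_pass(program).hex(" "))
```

For this program the output is 01 04 02 07 00 00: the operand of JMP is first written as 0 and then patched to 4, the address of END.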

2. Two-Pass Assembler

A two-pass assembler reads the assembly code twice, so that the address of every label is known before any instruction that refers to it is encoded.

First pass: The assembler scans the entire program, tracks the size of each instruction and data item, and builds a symbol table that records the address of every label and variable. It does not generate machine code in this pass.
Second pass: The assembler uses the symbol table to replace labels with actual addresses and generates the machine code.
Advantages:
- More flexibility in resolving addresses.
- Easier to handle labels and forward references (where labels are used before they are defined).
Disadvantages:
- Requires two passes, which may make it slower than a single-pass assembler.
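The two passes can be sketched as two small functions. As an assumption for the example, every instruction in this made-up instruction set occupies two bytes (opcode plus one operand byte) and labels sit on their own line; the opcode values and function names are hypothetical.

```python
OPCODES = {"HLT": 0x00, "JMP": 0x01, "LOAD": 0x02}  # hypothetical toy ISA
INSTRUCTION_SIZE = 2  # assumed: opcode byte + one operand byte


def clean(lines):
    """Yield each line with its comment removed, skipping blank lines."""
    for line in lines:
        text = line.split(";", 1)[0].strip()
        if text:
            yield text


def first_pass(lines):
    """Build the symbol table; no machine code is produced here."""
    symbols, address = {}, 0
    for text in clean(lines):
        if text.endswith(":"):
            symbols[text[:-1]] = address
        else:
            address += INSTRUCTION_SIZE
    return symbols


def second_pass(lines, symbols):
    """Generate code, replacing each label operand with its address."""
    code = bytearray()
    for text in clean(lines):
        if text.endswith(":"):
            continue
        mnemonic, _, operand = text.partition(" ")
        operand = operand.strip() or "0"
        value = int(operand) if operand.isdigit() else symbols[operand]
        code += bytes([OPCODES[mnemonic.upper()], value])
    return bytes(code)


program = ["JMP END", "LOAD 7", "END:", "HLT"]
symbols = first_pass(program)
machine_code = second_pass(program, symbols)
print(symbols)
print(machine_code.hex(" "))
```

The forward reference to END needs no special handling: by the time the second pass encodes JMP END, the symbol table already maps END to address 4.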

Basic Steps in Assembling a Program

Here is a simplified view of the steps an assembler follows to turn assembly code into machine code:

1. Lexical Analysis:

The assembler first breaks each source line into individual tokens (e.g., mnemonics, registers, operands, symbols, and labels), discards comments and whitespace, and reports any character sequence that cannot form a valid token.
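A tokenizer for one source line might look like the following sketch. The token categories and the regular expression are assumptions chosen for illustration; real assemblers recognize many more forms (strings, expressions, memory operands).

```python
import re

# One named group per token kind; order matters, since the first match wins.
TOKEN_PATTERN = re.compile(r"""
    (?P<COMMENT>;.*)
  | (?P<LABEL>[A-Za-z_]\w*:)
  | (?P<NUMBER>0[xX][0-9A-Fa-f]+|\d+)
  | (?P<NAME>[A-Za-z_.]\w*)
  | (?P<COMMA>,)
  | (?P<SPACE>\s+)
  | (?P<BAD>.)
""", re.VERBOSE)


def tokenize(line):
    """Split one source line into (kind, text) pairs."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(line):
        kind = match.lastgroup
        if kind in ("SPACE", "COMMENT"):
            continue                      # not needed by later stages
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(tokenize("START: MOV AX, 5   ; load 5"))
```

For the sample line, the result is a LABEL token (START:), two NAME tokens (MOV and AX), a COMMA, and a NUMBER (5). Deciding whether a NAME is a mnemonic or a register is left to the parser.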

2. Parsing:

The assembler uses a grammar to understand the structure of the assembly instructions. It checks whether the instruction format follows the rules of the CPU's instruction set architecture (ISA). For example, it checks if the instruction is in the right format (e.g., MOV register, value).
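The format check can be sketched as a lookup in a table of accepted operand shapes. The table FORMATS, the register set, and the assumption that every line carries an instruction are all simplifications made for this example.

```python
REGISTERS = {"AX", "BX", "CX", "DX"}

# Hypothetical table of accepted operand shapes per mnemonic.
# "reg" = register, "imm" = immediate number.
FORMATS = {
    "MOV": [("reg", "imm"), ("reg", "reg")],
    "ADD": [("reg", "reg"), ("reg", "imm")],
    "HLT": [()],
}


def operand_kind(text):
    if text.upper() in REGISTERS:
        return "reg"
    if text.isdigit():
        return "imm"
    raise SyntaxError(f"unrecognized operand '{text}'")


def parse(line):
    """Return (label, mnemonic, operands) or raise SyntaxError."""
    text = line.split(";", 1)[0].strip()
    label = None
    if ":" in text:
        label, text = (part.strip() for part in text.split(":", 1))
    mnemonic, _, rest = text.partition(" ")
    mnemonic = mnemonic.upper()
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    if mnemonic not in FORMATS:
        raise SyntaxError(f"unknown mnemonic '{mnemonic}'")
    shape = tuple(operand_kind(op) for op in operands)
    if shape not in FORMATS[mnemonic]:
        raise SyntaxError(f"{mnemonic} does not accept operands {shape}")
    return label, mnemonic, operands


print(parse("START: MOV AX, 5"))
```

A line such as MOV 5, AX is rejected, because the shape (imm, reg) is not listed for MOV.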

3. Symbol Table Creation:

As the assembler encounters symbols (e.g., variables, labels), it creates a symbol table to store the names of the symbols and their corresponding memory locations or values.

4. Address Assignment:

In the first pass (for a two-pass assembler), the assembler assigns addresses to all labels and variables. This includes calculating the memory addresses where the code and data will be stored.
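Address assignment is usually driven by a location counter that advances by the size of each instruction. The sketch below assumes the instructions have already been parsed into (label, form) pairs and uses the sizes of three 16-bit x86 forms; the SIZES table and the function name are hypothetical.

```python
# Sizes in bytes for a few 16-bit x86 forms; a real assembler derives
# these from the full encoding rules rather than a small table.
SIZES = {
    ("MOV", "reg", "imm"): 3,   # opcode + 16-bit immediate
    ("ADD", "reg", "reg"): 2,   # opcode + ModR/M byte
    ("HLT",): 1,
}


def assign_addresses(instructions, origin=0):
    """instructions: list of (label or None, key into SIZES)."""
    symbols = {}
    location = origin                     # the location counter
    for label, form in instructions:
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label '{label}'")
            symbols[label] = location
        location += SIZES[form]
    return symbols, location


program = [
    ("START", ("MOV", "reg", "imm")),
    (None,    ("MOV", "reg", "imm")),
    ("SUM",   ("ADD", "reg", "reg")),
    ("DONE",  ("HLT",)),
]
symbols, end = assign_addresses(program, origin=0x100)
for name, address in symbols.items():
    print(f"{name:<6} {address:#06x}")
```

With an origin of 0x100, START is placed at 0x0100, SUM at 0x0106 (after two 3-byte MOV instructions), and DONE at 0x0108.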

5. Code Generation:

The assembler translates each assembly instruction into its corresponding machine code (binary format), replacing mnemonics and symbols with the correct binary opcodes and addresses.

6. Output:

After the machine code has been generated, the assembler writes it to an output file, usually an object file. An object file holds the machine code together with symbol and relocation information, and it normally has to be linked, possibly with other object files or libraries, to produce a complete executable. Some assemblers can also write a flat binary that is loaded as-is.

Example of Assembly to Machine Code Translation

Assembly Language Code (x86):

MOV AX, 5        ; Move the value 5 into the AX register
MOV BX, 10       ; Move the value 10 into the BX register
ADD AX, BX       ; Add the values in AX and BX, and store the result in AX

Steps in Assembly Translation:

The MOV instruction moves values into registers:
- MOV AX, 5 translates (in 16-bit code) to 10111000 00000101 00000000: the opcode B8 selects "move a 16-bit immediate into AX", and the two bytes that follow hold the value 5, low byte first.
- MOV BX, 10 translates to 10111011 00001010 00000000: the opcode BB targets BX, and the two bytes that follow hold the value 10.
The ADD instruction adds the values in registers AX and BX:
- ADD AX, BX can translate to 00000001 11011000 (01 D8), which tells the CPU to add BX to AX and store the result in AX. The x86 instruction set also allows the equivalent encoding 00000011 11000011 (03 C3), so different assemblers may emit different bytes for the same line.

After the assembler translates the assembly code into machine code, the final binary might look like this:

MOV AX, 5    → 10111000 00000101 00000000
MOV BX, 10   → 10111011 00001010 00000000
ADD AX, BX   → 00000001 11011000

These binary instructions can now be executed by the CPU.
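The encodings of these three instructions can be reproduced with a short sketch. It assumes 16-bit operand size, where MOV with an immediate uses opcode B8 plus the register number followed by a two-byte immediate, and ADD between registers uses opcode 01 followed by a ModR/M byte. The function names are made up for the example.

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3}  # x86 16-bit register numbers


def mov_reg_imm16(reg, value):
    """MOV r16, imm16: opcode B8+reg, then the immediate, low byte first."""
    return bytes([0xB8 + REG16[reg], value & 0xFF, (value >> 8) & 0xFF])


def add_reg_reg(dest, src):
    """ADD r/m16, r16: opcode 01, then ModR/M = 11 | src | dest."""
    modrm = 0b11_000_000 | (REG16[src] << 3) | REG16[dest]
    return bytes([0x01, modrm])


def as_binary(encoded):
    return " ".join(f"{byte:08b}" for byte in encoded)


print(as_binary(mov_reg_imm16("AX", 5)))
print(as_binary(mov_reg_imm16("BX", 10)))
print(as_binary(add_reg_reg("AX", "BX")))
```

The three lines printed are 10111000 00000101 00000000, 10111011 00001010 00000000, and 00000001 11011000. Swapping the arguments of add_reg_reg gives 00000001 11000011, which is the encoding of ADD BX, AX, so the order of the register fields in the ModR/M byte matters.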

Assembler Features and Tools

1. Assembler Directives:

In addition to translating mnemonics into machine code, assemblers often include directives that help with program structure, data allocation, and other tasks. For example:

.data and .text: Switch between the sections that hold the program's data and its code.
.org: Sets the address at which the code or data that follows is placed.
.byte and .word: Emit initialized data of the given size. Directive names differ between assemblers; some use DB and DW instead, for example.
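How an assembler might act on the data directives can be sketched as follows. The sketch handles only .org, .byte, and .word, assumes .word means a 16-bit little-endian value, and returns a simple address-to-byte mapping; the function name assemble_data is hypothetical.

```python
def assemble_data(lines):
    """Return {address: byte} for a program made only of directives."""
    memory = {}
    location = 0
    for line in lines:
        text = line.split(";", 1)[0].strip()
        if not text:
            continue
        directive, _, rest = text.partition(" ")
        # int(v, 0) accepts both decimal and 0x-prefixed hexadecimal.
        values = [int(v, 0) for v in rest.split(",") if v.strip()]
        if directive == ".org":
            location = values[0]          # move the location counter
        elif directive == ".byte":
            for value in values:
                memory[location] = value & 0xFF
                location += 1
        elif directive == ".word":        # assumed 16-bit, little-endian
            for value in values:
                memory[location] = value & 0xFF
                memory[location + 1] = (value >> 8) & 0xFF
                location += 2
        else:
            raise ValueError(f"unsupported directive '{directive}'")
    return memory


image = assemble_data([
    ".org 0x200",
    ".byte 1, 2",
    ".word 0x1234",
])
for address in sorted(image):
    print(f"{address:#06x}: {image[address]:02x}")
```

The word 0x1234 ends up as the byte 34 at 0x0202 followed by 12 at 0x0203, which is the little-endian layout assumed here.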

2. Macro Assemblers:

Some assemblers include macro capabilities, allowing programmers to define sequences of instructions (macros) that can be reused throughout the program. Macros allow for easier program maintenance and can improve code readability.
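Macro expansion is essentially text substitution that happens before translation. In the sketch below, the PRINT macro, its single parameter, and the OUT instruction it expands to are invented for illustration; nested macros and local labels are not handled.

```python
# Hypothetical macro table: name -> (parameter names, body lines).
# OUT is a made-up instruction standing in for whatever prints a value.
MACROS = {
    "PRINT": (["value"], [
        "MOV AX, {value}",
        "OUT AX",
    ]),
}


def expand(lines):
    """Replace each macro call with its body, substituting arguments."""
    output = []
    for line in lines:
        name, _, rest = line.strip().partition(" ")
        if name in MACROS:
            params, body = MACROS[name]
            args = [arg.strip() for arg in rest.split(",") if arg.strip()]
            if len(args) != len(params):
                raise ValueError(f"{name} expects {len(params)} argument(s)")
            bindings = dict(zip(params, args))
            output.extend(template.format(**bindings) for template in body)
        else:
            output.append(line.strip())
    return output


expanded = expand(["PRINT 5", "ADD AX, BX", "PRINT 10"])
print("\n".join(expanded))
```

Each PRINT call becomes two ordinary instructions, so the later stages of the assembler never see the macro name.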

3. Linkers and Loaders:

In larger programs, especially those with multiple modules, linkers and loaders are used in conjunction with assemblers. A linker combines object files into a single executable file, while a loader places the program into memory for execution.
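The linker's job of combining object files can be sketched with a toy object format. Everything about the format is an assumption for this example: each module is a dictionary with its code bytes, the symbols it exports (as offsets), and a list of relocations naming the one-byte operands that must be patched with a symbol's final address.

```python
def link(modules, base=0):
    """modules: list of dicts with 'code', 'exports', 'relocations'."""
    # Step 1: place modules one after another and record global addresses.
    starts, global_symbols, address = [], {}, base
    for module in modules:
        starts.append(address)
        for name, offset in module["exports"].items():
            if name in global_symbols:
                raise ValueError(f"duplicate symbol '{name}'")
            global_symbols[name] = address + offset
        address += len(module["code"])

    # Step 2: copy each module's code and patch its recorded references.
    image = bytearray()
    for module in modules:
        code = bytearray(module["code"])
        for offset, name in module["relocations"]:
            if name not in global_symbols:
                raise ValueError(f"unresolved symbol '{name}'")
            code[offset] = global_symbols[name]
        image += code
    return bytes(image)


main = {
    "code": bytes([0x01, 0x00, 0x00, 0x00]),  # JMP <helper>; HLT (toy ISA)
    "exports": {"main": 0},
    "relocations": [(1, "helper")],           # operand byte at offset 1
}
lib = {
    "code": bytes([0x02, 0x07, 0x00, 0x00]),  # LOAD 7; HLT (toy ISA)
    "exports": {"helper": 0},
    "relocations": [],
}
executable = link([main, lib])
print(executable.hex(" "))
```

The result is 01 04 00 00 02 07 00 00: the second module starts at address 4, so the placeholder operand in the first module is patched to 4. A loader would then copy such an image into memory and transfer control to its entry point.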

Why Use an Assembler?

Performance: Writing code in assembly allows programmers to optimize performance at a low level by directly controlling CPU registers, memory, and instruction flow.
Control: Assembly language provides direct access to the hardware, making it suitable for systems programming, embedded systems, and situations requiring fine-grained control over hardware resources.
System Software: Assemblers are used to develop system software, including operating systems, device drivers, and embedded software, where performance and control are critical.

Summary

An assembler is a software tool that translates assembly language into machine language so that a computer's CPU can execute the instructions. It plays a crucial role in low-level programming, allowing programmers to write efficient code that directly interacts with the hardware. While assemblers can be simple (single-pass) or more complex (two-pass), their primary goal is to take human-readable assembly code and generate machine code that the CPU can execute.
