In assembly language, character constants represent individual characters or sequences of characters that can be used directly in the program. These constants are typically used for tasks like manipulating text, storing strings, or performing operations that involve characters (such as comparisons or searching for specific characters).
Each character constant stands for a numeric value defined by a character encoding. The most common encoding is ASCII (American Standard Code for Information Interchange), which assigns each character a unique number from 0 to 127; that number is what is actually stored in memory. Some systems use other encodings, such as the Unicode encodings UTF-8 and UTF-16, in which a character may occupy more than one byte.
Character constants in assembly are generally represented using single quotes (') or double quotes ("), depending on the assembler syntax. The constant can be either a single character or a string of characters, and the assembler will often convert these characters into their corresponding ASCII (or other character set) values.
A single character constant is typically a one-byte (8-bit) value that corresponds to one character in the ASCII table. In assembly, these are usually written inside single quotes.
Example (MASM Syntax):
MOV AL, 'A' ; Load the ASCII value of 'A' (65) into AL register
Explanation:
'A' represents the character 'A', which corresponds to the ASCII value 65.
The MOV AL, 'A' instruction moves this ASCII value into the AL register.
A string constant consists of a sequence of characters. In assembly, a string is typically stored as an array of individual characters, each represented by a byte (or by a word for wider encodings such as UTF-16). In MASM (Microsoft Macro Assembler), strings can be enclosed in either single or double quotes.
Example (MASM Syntax):
.data
message DB 'Hello, World!', 0 ; Define a null-terminated string
Explanation:
DB stands for Define Byte; it stores the string 'Hello, World!' in memory, one byte per character.
The 0 at the end of the string is a null terminator, which marks the end of the string in many programming languages and library routines.
Each character is stored as its ASCII value ('H' = 72, 'e' = 101, etc.).
In most assembly languages, each character is internally represented by a numeric ASCII code. The ASCII table maps characters to numeric values. For example:
| Character | ASCII Value (Decimal) | ASCII Value (Hex) |
|---|---|---|
| 'A' | 65 | 0x41 |
| 'B' | 66 | 0x42 |
| 'a' | 97 | 0x61 |
| 'z' | 122 | 0x7A |
| '0' | 48 | 0x30 |
| ' ' (Space) | 32 | 0x20 |
| '\n' | 10 | 0x0A |
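The table values can be cross-checked outside the assembler. The following is a minimal sketch in Python, used here only as a convenient calculator and not as part of any assembler toolchain; the helper name `ascii_row` is hypothetical.

```python
def ascii_row(ch):
    """Return (character, decimal code, hex code) for one ASCII character."""
    code = ord(ch)  # numeric code point; equals the ASCII value for ASCII characters
    return (ch, code, f"0x{code:02X}")

# Same characters as in the table above
for ch in ["A", "B", "a", "z", "0", " ", "\n"]:
    print(ascii_row(ch))
```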
You can use the numeric ASCII values in place of the characters directly in your assembly code, especially when working with mathematical or logical operations on characters.
MOV AL, 65 ; Load the ASCII value 65 (which represents 'A') into AL
MOV BL, 41h ; Load the hexadecimal value 41h (which also represents 'A') into BL; NASM also accepts 0x41
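Some characters are awkward to write directly in a quoted constant: control characters such as newline and tab have no printable symbol, and quote or backslash characters can clash with the string syntax itself. Where the assembler supports them, these characters are written using escape sequences.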
Common Escape Sequences in strings:
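\n: Newline (ASCII 10)
\t: Horizontal tab (ASCII 9)
\\: Backslash (ASCII 92)
\': Single quote (ASCII 39)
\": Double quote (ASCII 34)
Support for escape sequences depends on the assembler. The GNU Assembler processes them inside double-quoted strings, and NASM processes them only inside backquoted strings. MASM does not support escape sequences, so special characters are given by their numeric values instead.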
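Whichever notation is used, each escape sequence stands for exactly one byte. As a rough check, the sketch below uses Python (an assumption of this example, not a tool the assemblers require) to confirm the codes listed above and to show that a string with an embedded tab is the same byte sequence as text, 9, text, 0. The names `ESCAPE_CODES` and `tabbed` are illustrative only.

```python
# Numeric value of each special character listed above
ESCAPE_CODES = {
    "newline": ord("\n"),
    "tab": ord("\t"),
    "backslash": ord("\\"),
    "single quote": ord("'"),
    "double quote": ord('"'),
}
print(ESCAPE_CODES)

# Building the bytes piece by piece, the way an assembler without
# escape sequences would be told to: text, 9, text, 0
tabbed = b"Item 1" + bytes([9]) + b"Item 2" + bytes([0])
print(tabbed == b"Item 1\tItem 2\x00")
```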
.data
newline DB 'Hello, World!', 10 ; String with newline at the end
tabbed DB 'Item 1', 9, 'Item 2', 0 ; String with tab between items
In this example:
newline ends with the ASCII value 10 (newline), so any output that follows it starts on a new line.
tabbed uses 9 (tab) between the two items, so the output has a tab between Item 1 and Item 2.
You can perform operations on character constants just as with integers, using arithmetic and logical instructions. For example, you can compare characters, convert case, or perform bitwise operations.
MOV AL, 'A' ; Load 'A' (ASCII 65) into AL
MOV BL, 'B' ; Load 'B' (ASCII 66) into BL
CMP AL, BL ; Compare AL with BL (AL == 'A', BL == 'B')
JE Same ; Jump to Same if AL == BL (not taken here, because 'A' != 'B')
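To see why the jump depends only on the numeric codes, the comparison can be modeled in a few lines. This is a simplified sketch in Python that covers just the zero and carry flags of an 8-bit CMP; the function name `cmp_flags` is hypothetical, and the other flags are ignored.

```python
def cmp_flags(al, bl):
    """Model the zero and carry flags after an 8-bit CMP AL, BL."""
    result = (al - bl) & 0xFF   # CMP subtracts, keeps only the flags
    zero_flag = (result == 0)   # set when the two operands are equal
    carry_flag = (al < bl)      # set on unsigned borrow, i.e. AL is below BL
    return zero_flag, carry_flag

print(cmp_flags(ord("A"), ord("B")))  # zero flag clear: JE would not jump
print(cmp_flags(ord("A"), ord("A")))  # zero flag set: JE would jump
```

Because 'A' (65) is below 'B' (66), the carry flag also gives alphabetical ordering for characters of the same case.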
Since the ASCII values of corresponding uppercase and lowercase letters differ by 32, you can convert a lowercase letter to uppercase by subtracting 32. This works only when the character is known to be in the range 'a' to 'z'.
MOV AL, 'b' ; Load 'b' (ASCII 98) into AL
SUB AL, 32 ; Convert to uppercase (ASCII 66, which is 'B')
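Subtracting 32 from a character that is not a lowercase letter produces an unrelated code, so real code usually checks the range first. The following Python sketch illustrates that guard; `to_upper_ascii` is a hypothetical helper, and in assembly the same check would typically be two compare-and-branch pairs before the SUB.

```python
def to_upper_ascii(code):
    """Return the uppercase code for 'a'..'z'; leave any other code unchanged."""
    if ord("a") <= code <= ord("z"):
        return code - 32
    return code

print(to_upper_ascii(ord("b")))  # lowercase letter: converted
print(to_upper_ascii(ord("B")))  # already uppercase: unchanged
print(to_upper_ascii(ord("1")))  # digit: unchanged
print(ord("1") - 32)             # unguarded subtraction gives a control code
```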
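You can also perform bitwise operations on characters. The difference of 32 between the two cases is a single bit (bit 5, value 20h), so setting that bit converts an uppercase letter to lowercase and clearing it converts a lowercase letter to uppercase.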
MOV AL, 'A' ; Load 'A' (ASCII 65) into AL
OR AL, 32 ; Set the lowercase bit (convert 'A' to 'a')
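The same bit can be set, cleared, or toggled, which corresponds to the OR, AND, and XOR instructions. The sketch below checks the three cases in Python; the constant name `CASE_BIT` is an assumption of this example, and the results are meaningful only for letters.

```python
CASE_BIT = 0x20  # bit 5: clear in 'A'..'Z', set in 'a'..'z'

lower = ord("A") | CASE_BIT             # like OR  AL, 20h
upper = ord("a") & (~CASE_BIT & 0xFF)   # like AND AL, 0DFh
toggled = ord("A") ^ CASE_BIT           # like XOR AL, 20h

print(chr(lower), chr(upper), chr(toggled))

# Non-letters are not protected: '@' (0x40) becomes '`' (0x60)
print(chr(ord("@") | CASE_BIT))
```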
Other assemblers may use slightly different syntax for character constants.
NASM (Netwide Assembler):
NASM accepts both single quotes and double quotes for character constants and strings and treats them the same way. A third form, backquotes, additionally enables C-style escape sequences.
Example:
section .data
message db "Hello, World!", 0
newline db 0x0A ; ASCII value for newline
GAS (GNU Assembler):
In GAS, character data is defined with the .byte directive, and strings with .ascii, or with .asciz or .string, which append a terminating zero byte.
Example:
.data
char_1: .byte 'A' # Single character constant ('#' starts a comment on x86 targets)
string: .string "Hello, World!" # .string appends the terminating zero byte
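Although the directives differ between assemblers, a null-terminated definition of this string should produce the same bytes in memory. The Python sketch below builds that byte sequence by hand for comparison, assuming an ASCII encoding; the variable name `message` is illustrative.

```python
# 13 characters of text followed by one terminating zero byte
message = "Hello, World!".encode("ascii") + b"\x00"

print(list(message))  # one numeric code per character, then 0
print(len(message))   # total bytes reserved, terminator included
```

The length includes the terminator, which matters when reserving buffers or computing string sizes by hand.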
Character constants in assembly language are fundamental for handling text and performing operations on individual characters. They are usually represented by their ASCII values and are used for things like string manipulation, comparisons, and text processing. Understanding how to work with characters and character constants is essential for low-level programming tasks, such as handling input and output, processing text files, and implementing algorithms that involve string matching or parsing.
By using character constants, escape sequences, and bitwise operations, you can manipulate text efficiently in assembly, even in environments with limited resources or hardware.