In assembly language, string constants represent sequences of characters or text. These constants are essential for tasks like text output, storing messages, or processing data in text form. A string constant in assembly is typically an array of characters in which each character occupies one byte holding its ASCII code (or a 16-bit word for wider encodings such as UTF-16).
In most assemblers, string constants are declared in the data segment of the program. The string can be either null-terminated (a special byte marks the end of the string) or length-prefixed (the length of the string is stored before the characters). The most common approach is using null-terminated strings.
A null-terminated string ends with a special character, usually the ASCII value 0 (null character). This null character indicates the end of the string, so the program knows when to stop reading or processing characters.
In assembly, strings are often declared using the DB (Define Byte) directive, and a 0 byte is used as the null terminator.
.data
message DB 'Hello, World!', 0 ; Declare a null-terminated string
In this example:
'Hello, World!' is the string.
0 at the end is the null terminator, marking the end of the string.
When the program processes the string (e.g., for output), it keeps reading until it encounters the null byte (0).
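To make the byte layout concrete, the short sketch below declares the same four bytes twice: once as a quoted string and once as explicit ASCII codes. It assumes the same MASM-style syntax as the example above; the labels greeting and greeting_bytes are made up for illustration.

```asm
.data
greeting       DB 'Hi!', 0            ; quoted form plus terminator
greeting_bytes DB 48h, 69h, 21h, 00h  ; same bytes: 'H', 'i', '!', null
```

Both labels occupy four bytes. The terminator takes up storage even though it is not part of the visible text.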
In length-prefixed strings, the length of the string (number of characters) is stored in the first byte, followed by the string itself. This approach is more efficient for some operations because you don't need to search for a null terminator. A one-byte length limits the string to 255 characters.
.data
message_length DB 13 ; Length of string (13 characters)
message DB 'Hello, World!'
In this example:
message_length stores the length of the string (13 characters).
The string ('Hello, World!') is stored in memory following the length byte.
In assembly, to access individual characters of a string, you typically use memory addressing. This involves loading the address of the string into a register and then accessing specific elements one by one.
.data
message DB 'Hello, World!', 0
.code
MOV SI, OFFSET message ; Load the address of 'message' into SI register
MOV AL, [SI] ; Load the first character ('H') into AL
Here:
SI holds the address of the string message.
[SI] accesses the first byte of the string ('H'), and it is loaded into the AL register.
To iterate through each character in a null-terminated string, you can use a loop that continues until the null terminator is found.
.data
message DB 'Hello, World!', 0 ; Null-terminated string
.code
start:
MOV SI, OFFSET message ; Load the address of 'message' into SI register
print_loop:
MOV AL, [SI] ; Load the current character into AL
CMP AL, 0 ; Compare the character with 0 (null terminator)
JE done ; If null terminator, jump to done
; Here you would put the code to print AL (e.g., to console)
INC SI ; Move to the next character
JMP print_loop ; Repeat the loop
done:
; End of string processing
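For comparison, a length-prefixed string can be walked with a counter instead of a terminator check. The following is a minimal sketch, assuming a 16-bit DOS environment and MASM-style syntax. The labels msg_len, msg, next_char, and finished are hypothetical, and DOS INT 21h function 02h (print the character in DL) is used here as one possible way to fill in the print step left open in the loop above.

```asm
.model small
.stack 100h
.data
msg_len DB 13                 ; length prefix (hypothetical label)
msg     DB 'Hello, World!'    ; no terminator needed
.code
start:
    MOV AX, @data             ; point DS at the data segment
    MOV DS, AX
    MOV SI, OFFSET msg        ; SI -> first character
    MOV CL, msg_len           ; CX = number of characters
    XOR CH, CH
    JCXZ finished             ; skip the loop for an empty string
next_char:
    MOV DL, [SI]              ; DOS function 02h prints the character in DL
    MOV AH, 02h
    INT 21h
    INC SI                    ; move to the next character
    LOOP next_char            ; decrement CX, repeat while CX is not zero
finished:
    MOV AX, 4C00h             ; DOS: terminate with return code 0
    INT 21h
END start
```

Because the count is known up front, the loop never inspects the character values, so the string may even contain zero bytes.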
Sometimes you need to represent special characters in strings, such as newlines, tabs, backslashes, or quotes. Some assemblers process C-style escape sequences (NASM inside backquoted strings, GAS in its string directives); common ones are:
\n: Newline (ASCII 10)
\t: Horizontal tab (ASCII 9)
\\: Backslash (ASCII 92)
\': Single quote (ASCII 39)
\": Double quote (ASCII 34)
These escape sequences are useful for characters that are difficult to type directly, like the newline or tab character. MASM-style DB strings do not interpret escape sequences, so there the character's numeric code is written as a separate byte value.
.data
message DB 'Hello, World!', 10, 0 ; 'Hello, World!' followed by newline (ASCII 10)
In this example:
message contains the text 'Hello, World!', followed by a newline character (0x0A or 10 in decimal).
The null byte (0) is used to mark the end of the string.
A backslash (\) or double quote (") needs no special treatment in this syntax: the backslash is an ordinary character in a DB string, and a double quote can appear as-is inside a single-quoted string.
.data
message DB 'This is a backslash: \', 0 ; String with a backslash
quote_message DB 'She said, "Hello!"', 0 ; String with double quotes
In these examples:
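The backslash (\) is written as-is, because a DB string does not treat it as an escape character.
The double quotes (") need no escaping inside a string delimited by single quotes.
In assemblers that do process escape sequences, these characters are written \\ and \".
Other assemblers may have different ways of defining and using string constants. Here's how string constants are typically handled in some popular assemblers.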
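NASM defines strings with db (define byte); equ defines assemble-time constants, such as a string's length. Strings may be enclosed in single or double quotes, which NASM treats identically; escape sequences are processed only inside backquoted strings.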
section .data
message db 'Hello, World!', 0 ; Null-terminated string
quote_message db 'She said, "Hello!"', 0 ; String with quotes
db defines a byte (or multiple bytes) for the string.
The null byte (0) is used to mark the end of the string.
In GAS, string constants are usually declared using .string or .asciz for null-terminated strings.
.data
message: .string "Hello, World!" # Null-terminated string
quote_message: .asciz "She said, \"Hello!\"" # String with quotes
.string is used to define a null-terminated string.
.asciz is another directive that explicitly denotes a null-terminated string.
In assembly, you can perform various operations on strings, such as copying, comparing, and concatenating them.
These operations usually require looping over the string and using instructions like MOV, CMP, LODS, STOS, etc. The example below concatenates two null-terminated strings into a result buffer.
.data
str1 DB 'Hello', 0
str2 DB 'World', 0
result DB 12 DUP (0) ; Allocate space for result string
.code
start:
; Copy str1 to result
MOV SI, OFFSET str1 ; Load address of str1
MOV DI, OFFSET result ; Load address of result
copy_str1:
MOV AL, [SI] ; Load character from str1
CMP AL, 0 ; Check for null terminator
JE copy_str2 ; Jump to copy_str2 if end of str1
MOV [DI], AL ; Store character in result
INC SI ; Move to next character in str1
INC DI ; Move to next position in result
JMP copy_str1
copy_str2:
; Copy str2 to result
MOV SI, OFFSET str2 ; Load address of str2
copy_str2_loop:
MOV AL, [SI] ; Load character from str2
CMP AL, 0 ; Check for null terminator
JE done ; End if null terminator is reached
MOV [DI], AL ; Store character in result
INC SI ; Move to next character in str2
INC DI ; Move to next position in result
JMP copy_str2_loop
done:
MOV BYTE PTR [DI], 0 ; Write the null terminator after the last copied character
; End of string concatenation
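The copy loops above can also be written with the string instructions mentioned earlier. The following is a minimal sketch of a single string copy, assuming a 16-bit DOS program in MASM-style syntax; the labels src, dst, and copy_next are hypothetical. LODSB reads through DS:SI and STOSB writes through ES:DI, so ES has to be set up and the direction flag cleared before the loop.

```asm
.model small
.stack 100h
.data
src DB 'Hello', 0
dst DB 6 DUP (?)           ; room for 5 characters plus the terminator
.code
start:
    MOV AX, @data
    MOV DS, AX             ; LODSB reads through DS:SI
    MOV ES, AX             ; STOSB writes through ES:DI
    CLD                    ; make SI and DI count upward
    MOV SI, OFFSET src
    MOV DI, OFFSET dst
copy_next:
    LODSB                  ; AL = [DS:SI], then SI = SI + 1
    STOSB                  ; [ES:DI] = AL, then DI = DI + 1
    CMP AL, 0              ; was that the terminator?
    JNE copy_next          ; no: keep copying
    MOV AX, 4C00h          ; DOS: terminate with return code 0
    INT 21h
END start
```

Testing for the terminator after STOSB means the null byte is copied as well, so dst ends up as a complete null-terminated string without a separate store.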
String constants in assembly language represent sequences of characters and are a fundamental part of string manipulation in low-level programming. These constants are typically defined in the data segment of the program and are often null-terminated. You can perform various operations on strings, such as iteration, comparison, and manipulation, but these operations require explicit handling of memory and register values.
String constants in assembly offer a flexible way to handle textual data in applications, especially in embedded systems or low-level software, where string manipulation is an essential feature. Understanding how to work with strings efficiently is crucial for tasks like displaying messages, parsing text, or performing low-level I/O operations.