DC-222›Language definitions preliminaries

Theory of AutomataTopic 2 of 25

Language definitions preliminaries

9 minread

1,449words

Intermediatelevel

Language Definitions Preliminaries

In the theory of automata, the concept of a language is fundamental. A language is a set of strings formed from a given alphabet. The study of languages is essential in automata theory, as automata are often designed to recognize or accept specific languages. The formal definition of a language involves several foundational concepts that build the theoretical framework for understanding how languages are defined, manipulated, and recognized by computational models like automata.

1. Alphabet (Σ)

An alphabet is a finite set of symbols or characters. It is denoted by the Greek letter Σ (sigma), and the elements of an alphabet are called symbols or characters. For example:

Σ = {a, b} is an alphabet containing two symbols.
Σ = {0, 1} is the binary alphabet.

2. String

A string is a finite sequence of symbols from an alphabet. A string can be empty or non-empty. Strings are often denoted using lowercase letters or by a combination of the alphabet's symbols.

For example:

From the alphabet Σ = {a, b}, the string "ab" is formed by concatenating 'a' and 'b'.
The empty string, denoted as ε (epsilon), is the string of length 0 and contains no symbols.

3. Length of a String

The length of a string is the number of symbols it contains. If a string w has $n$ symbols, we say that the length of the string is $|w| = n$ .

For example:

The length of the string "ab" is $|ab| = 2$ .
The length of the empty string ε is $|ε| = 0$ .

4. Concatenation

The concatenation of two strings is the operation of appending one string to the end of another. If w1 and w2 are strings, their concatenation is denoted by $w_1 w_2$ .

For example:

If w1 = "a" and w2 = "b", then $w_1 w_2 = "ab"$ .

5. Language

A language is defined as a set of strings, all of which are formed from a particular alphabet Σ. A language can be finite or infinite, depending on the set of strings it contains.

For example:

The language L = {ε, a, ab, ba} is a finite language over the alphabet Σ = {a, b}.
The language L = {a^n | n ≥ 0} is an infinite language over Σ = {a}, where n is any non-negative integer (the set of strings consisting of zero or more 'a's).

6. Operations on Languages

Several operations can be performed on languages. The most commonly used operations are:

Union ( ∪ ): The union of two languages L1 and L2 is the set of strings that are in either L1 or L2 (or both).
- $L_1 \cup L_2 = \{ w \mid w \in L_1 \text{ or } w \in L_2 \}$
Intersection ( ∩ ): The intersection of two languages L1 and L2 is the set of strings that are in both L1 and L2.
- $L_1 \cap L_2 = \{ w \mid w \in L_1 \text{ and } w \in L_2 \}$
Difference ( − ): The difference of two languages L1 and L2 is the set of strings that are in L1 but not in L2.
- $L_1 - L_2 = \{ w \mid w \in L_1 \text{ and } w \notin L_2 \}$
Kleene Star ( * ): The Kleene star of a language L is the set of all strings that can be formed by concatenating zero or more strings from L, including the empty string ε.
- $L^* = \{ w \mid w \text{ is a concatenation of zero or more strings from } L \}$
For example, if L = {a, b}, then:
- $L^* = \{\epsilon, a, b, aa, ab, ba, bb, aaa, \dots \}$
Kleene Plus ( + ): The Kleene plus of a language L is similar to the Kleene star but requires at least one occurrence of a string from L. It is the set of all strings that can be formed by concatenating one or more strings from L.
- $L^+ = L \cdot L^*$
For example, if L = {a, b}, then:
- $L^+ = \{a, b, aa, ab, ba, bb, aaa, aab, \dots \}$
Complement ( ¬ ): The complement of a language L, denoted $\overline{L}$ , is the set of all strings over the alphabet Σ that are not in L.
- If Σ = {a, b} and L = {a, ab}, then $\overline{L}$ consists of all strings over Σ that are not 'a' or 'ab'.

7. Finite vs. Infinite Languages

A finite language contains a finite number of strings.
An infinite language contains an infinite number of strings.

For example:

The language L = {a, b, ab} is finite because it contains only three strings.
The language L = {a^n | n ≥ 0} is infinite because it contains an infinite number of strings (e.g., "", "a", "aa", "aaa", ...).

8. Regular Languages

A regular language is a language that can be recognized by a finite automaton (DFA, NFA, or ε-NFA). Regular languages can be described by regular expressions, which provide a formal way of writing patterns that describe sets of strings.

For example:

The regular expression *a(b|c)d describes the set of strings that start with 'a', followed by zero or more occurrences of 'b' or 'c', and end with 'd'.

9. Context-Free Languages

A context-free language (CFL) is a language that can be generated by a context-free grammar (CFG). CFLs are more powerful than regular languages and can describe nested structures like balanced parentheses, which regular languages cannot.

For example:

The language of balanced parentheses can be described by the context-free grammar:
- $S \to \epsilon \mid (S)S$

10. Decidability of Languages

Decidability refers to whether there is an algorithm (or procedure) that can determine if a given string belongs to a particular language.

Decidable Languages: A language is decidable if there exists an algorithm (or Turing machine) that can decide whether any given string belongs to the language.
Undecidable Languages: Some languages, like the Halting Problem, are undecidable, meaning no algorithm can determine membership for all possible strings.

Conclusion

The preliminary concepts of language theory provide the foundation for understanding more complex topics in automata theory and formal languages. These concepts help formalize the idea of recognizing and generating strings, applying operations on languages, and analyzing the computational complexity of language-related problems. Understanding these basics is essential for diving deeper into topics like regular languages, context-free languages, and computational models like finite automata and Turing machines.

Previous topic 1

Finite State Models

Next topic 3

Regular expressions/Regular languages

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

DC-222›Language definitions preliminaries

Theory of AutomataTopic 2 of 25

Language definitions preliminaries

9 minread

1,449words

Intermediatelevel

Language Definitions Preliminaries

1. Alphabet (Σ)

An alphabet is a finite set of symbols or characters. It is denoted by the Greek letter Σ (sigma), and the elements of an alphabet are called symbols or characters. For example:

Σ = {a, b} is an alphabet containing two symbols.
Σ = {0, 1} is the binary alphabet.

2. String

A string is a finite sequence of symbols from an alphabet. A string can be empty or non-empty. Strings are often denoted using lowercase letters or by a combination of the alphabet's symbols.

For example:

From the alphabet Σ = {a, b}, the string "ab" is formed by concatenating 'a' and 'b'.
The empty string, denoted as ε (epsilon), is the string of length 0 and contains no symbols.

3. Length of a String

The length of a string is the number of symbols it contains. If a string w has $n$ symbols, we say that the length of the string is $|w| = n$ .

For example:

The length of the string "ab" is $|ab| = 2$ .
The length of the empty string ε is $|ε| = 0$ .

4. Concatenation

The concatenation of two strings is the operation of appending one string to the end of another. If w1 and w2 are strings, their concatenation is denoted by $w_1 w_2$ .

For example:

If w1 = "a" and w2 = "b", then $w_1 w_2 = "ab"$ .

5. Language

A language is defined as a set of strings, all of which are formed from a particular alphabet Σ. A language can be finite or infinite, depending on the set of strings it contains.

For example:

The language L = {ε, a, ab, ba} is a finite language over the alphabet Σ = {a, b}.
The language L = {a^n | n ≥ 0} is an infinite language over Σ = {a}, where n is any non-negative integer (the set of strings consisting of zero or more 'a's).

6. Operations on Languages

Several operations can be performed on languages. The most commonly used operations are:

Union ( ∪ ): The union of two languages L1 and L2 is the set of strings that are in either L1 or L2 (or both).
- $L_1 \cup L_2 = \{ w \mid w \in L_1 \text{ or } w \in L_2 \}$
Intersection ( ∩ ): The intersection of two languages L1 and L2 is the set of strings that are in both L1 and L2.
- $L_1 \cap L_2 = \{ w \mid w \in L_1 \text{ and } w \in L_2 \}$
Difference ( − ): The difference of two languages L1 and L2 is the set of strings that are in L1 but not in L2.
- $L_1 - L_2 = \{ w \mid w \in L_1 \text{ and } w \notin L_2 \}$
Kleene Star ( * ): The Kleene star of a language L is the set of all strings that can be formed by concatenating zero or more strings from L, including the empty string ε.
- $L^* = \{ w \mid w \text{ is a concatenation of zero or more strings from } L \}$
For example, if L = {a, b}, then:
- $L^* = \{\epsilon, a, b, aa, ab, ba, bb, aaa, \dots \}$
Kleene Plus ( + ): The Kleene plus of a language L is similar to the Kleene star but requires at least one occurrence of a string from L. It is the set of all strings that can be formed by concatenating one or more strings from L.
- $L^+ = L \cdot L^*$
For example, if L = {a, b}, then:
- $L^+ = \{a, b, aa, ab, ba, bb, aaa, aab, \dots \}$
Complement ( ¬ ): The complement of a language L, denoted $\overline{L}$ , is the set of all strings over the alphabet Σ that are not in L.
- If Σ = {a, b} and L = {a, ab}, then $\overline{L}$ consists of all strings over Σ that are not 'a' or 'ab'.

7. Finite vs. Infinite Languages

A finite language contains a finite number of strings.
An infinite language contains an infinite number of strings.

For example:

The language L = {a, b, ab} is finite because it contains only three strings.
The language L = {a^n | n ≥ 0} is infinite because it contains an infinite number of strings (e.g., "", "a", "aa", "aaa", ...).

8. Regular Languages

For example:

The regular expression *a(b|c)d describes the set of strings that start with 'a', followed by zero or more occurrences of 'b' or 'c', and end with 'd'.

9. Context-Free Languages

For example:

The language of balanced parentheses can be described by the context-free grammar:
- $S \to \epsilon \mid (S)S$

10. Decidability of Languages

Decidability refers to whether there is an algorithm (or procedure) that can determine if a given string belongs to a particular language.

Decidable Languages: A language is decidable if there exists an algorithm (or Turing machine) that can decide whether any given string belongs to the language.
Undecidable Languages: Some languages, like the Halting Problem, are undecidable, meaning no algorithm can determine membership for all possible strings.

Conclusion

Previous topic 1

Finite State Models

Next topic 3

Regular expressions/Regular languages

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.