Regular expressions and regular languages are central concepts in the theory of automata and formal languages. Regular expressions provide a concise and powerful way to describe patterns in strings, and regular languages are the set of languages that can be recognized by finite automata or described by regular expressions.
A regular expression (RE) is a symbolic notation used to describe patterns in strings over a given alphabet. Regular expressions are used to describe the syntax of regular languages, and they are used in many practical applications, such as text search, validation, and processing in programming languages.
Literal Characters: A single symbol from the alphabet.
Union ( | ): The "or" operator. It is used to denote the union of two regular expressions, meaning either one pattern or the other.
a|b matches either the string "a" or the string "b".Concatenation: Placing two regular expressions together represents the concatenation of the strings they describe.
ab matches the string "ab" (the concatenation of "a" and "b").Kleene Star ( * ): The Kleene star applied to a regular expression denotes zero or more repetitions of the pattern.
a* matches any string consisting of zero or more 'a's (including the empty string, ε). This includes "", "a", "aa", "aaa", and so on.Kleene Plus ( + ): The Kleene plus is similar to the Kleene star, but it requires at least one occurrence of the pattern.
a+ matches any string consisting of one or more 'a's, such as "a", "aa", "aaa", etc.Parentheses ( () ): Parentheses are used to group expressions and change the order of operations in regular expressions.
(a|b)c matches either "ac" or "bc".Character Classes ( [ ] ): A character class is used to match any one character from a specified set of characters.
[abc] matches any one of the characters 'a', 'b', or 'c'.Ranges in Character Classes ( [ ] ): You can specify ranges of characters within character classes.
[a-z] matches any lowercase letter from 'a' to 'z'.Negation in Character Classes ( ^ ): The caret (^) at the beginning of a character class negates the class, meaning it matches any character except the ones in the class.
[^a] matches any character except 'a'.A regular language is a language that can be described by a regular expression or recognized by a finite automaton (DFA, NFA, or ε-NFA). Regular languages form the simplest class of languages in the Chomsky hierarchy of formal languages.
Closed under Operations: Regular languages are closed under several operations, meaning that applying these operations to regular languages results in another regular language. These operations include:
Finite Automata Representation: A regular language can be recognized by a finite automaton, either a deterministic finite automaton (DFA) or a nondeterministic finite automaton (NFA). This is a crucial property because it ties regular languages to automata theory.
Equivalence with Regular Expressions: Every regular language can be described by a regular expression, and conversely, every regular expression describes a regular language.
Simple strings:
a|b.Languages with repetition:
a+.Alternating strings:
(ab)+.Binary strings with even length:
(aa|ab|ba|bb)*.To construct a regular expression for a language, you generally combine basic expressions using the operations (union, concatenation, and Kleene star). Here are some examples:
Even number of 'a's:
(bb|ab|ba)*.Strings with exactly one 'a':
b*a(b*).Strings that contain at least one 'a':
.*a.*.Regular expressions are widely used in computing for various tasks, such as:
grep or find in Unix systems).A regular expression can be converted into an equivalent finite automaton (DFA, NFA, or ε-NFA). This is a critical concept because finite automata provide a concrete model for recognizing regular languages, and regular expressions offer a compact way to describe those same languages.
These conversions illustrate that regular expressions, NFAs, and DFAs are equivalent in terms of the languages they can recognize.
Several important theorems apply to regular languages:
Regular expressions and regular languages are foundational to automata theory and formal language theory. Regular expressions provide a convenient syntax for describing patterns in strings, and regular languages are the class of languages that can be recognized by finite automata or described by regular expressions. The close relationship between finite automata and regular expressions is what makes the theory of regular languages so powerful, and it has practical applications in fields like text processing, syntax analysis, and pattern recognition.
Open this section to load past papers