Automata and Language Theory First Semester SY 2009-2010
To understand language To know the fundamental units of language
How can you define a language? English Filipino Japanese Chinese Etc…
English Different entities: letters, words, and sentences Groups of letters make up words Groups of words make up sentences Not all collection of letters form a valid word and as well as not all collection of words form a valid sentence.
Computer Languages Certain character strings are recognizable words (DO, IF, END, …) Strings of words that are recognizable commands
Sets of commands becomes a program ( with or without data) that can be compiled. It is necessary for us to adopt a definition of a “Language Structure” that is, a structure in which the decision of whether a given string of units constitutes a valid larger unit is not a matter of guesswork, but based on explicitly stated rules.
Theory of Formal Languages “formal” refers to the fact that all rules for the languages are explicitly stated in terms of what strings of symbols can occur. No liberties are tolerated, and no reference to any “deeper understanding” is required.
Languages - will be considered solely as symbols on paper and not as expressions of ideas in the minds of humans. Not a communication among intellects, but a game of symbols with formal rules. The term “formal”, emphasizes that it is the form of the string of symbols we are interested in, not the meaning.
Alphabet – finite set of fundamental units out of which we build structures. Language – a certain specified set of strings of characters from the alphabet. Words – those strings that are permissible in the language.
Λ (empty string or null string) – a string having no letter No matter what “alphabet” we are considering, the null string will always Λ and for all languages the null word, if it is a word in the language, is also Λ Two words are considered the same if all their letters are the same and in the same order.
φ - language that has no words A subtle but important difference between a word that has no letters, Λ, and the language that has no words, φ
It is not true that Λ is a word in the language φ since this language has no words at all. A certain language L does not contain the word Λ and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {Λ}.
A certain language L does not contain the word Λ and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {Λ}. This language is not equal or same as L. Nevertheless, L + φ is the same as L since no new word have been added.
The fact that φ is a language even though it has no words will turn out to be an important distinction.
English – familiar example of language The alphabet is the usual set of letters plus the apostrophe and hyphen. Let us denote it by a Greek letter capital sigma: Σ = { a b c d e . . . Z ’ -} Σ, Γ, φ, Λ - time-honored tradition If we wished to be supermeticulous, we would also include in Σ the uppercase letters and the seldom used diacritical marks.
Valid Words in the Language -
Strings, more or less, as is done in a dictionary say, ENGLISH-WORDS={all the words in a standard dictionary}
o o
{} – mathematical notational denoting a set Does not have any grammar Formal definition of language of the sentences in English , we must begin by saying that this time our basic alphabet is the entries in the dictionary.
Consider Γ, the capital gamma, as the alphabet Γ= {the entries in a standard dictionary, plus a blank space, plus the usual punctuation marks}
Language, ENGLISH-SENTENCES Rule: Grammatical Rules The trick of defining the language ENGLISHSENTENCES by listing all the rules of English grammar allows us to give a finite description of
If we go by the rules of grammar only, Examine: “I ate three Tuesdays” - this string is allowed in formal language - grammatically correct; only its meaning reveals that it is ridiculous
Meaning is something that we do not refer to in formal languages. Syntax, not semantics or diction “just like the bad teacher who is interested only in the correct spelling, not the ideas in a homework composition”
Two ways of defining Abstract Languages we treat Either they will be presented as an alphabet and the exhaustive list of all valid words, or They will be presented as an alphabet and a set of rules defining the acceptable words.
Defining a Language – presenting the alphabet and then specifying which strings are words. The word “specify” is trickier than we may at first suppose
Consider the Language MY-PET and the alphabet for this language is {a c d g o t} - there is only one word for this language, and for our own perverse reasons we wish to specify it by this sentence: If the Earth and the Moon ever collide, then MY-PET = {cat} but, if the Earth and the Moon never collide, then MY-PET = {dog}
MY-PET
issue
One or the other of these two events will occur, but at this point in history of the universe, it is impossible to be certain whether the word dog is or is not in the language MY-PET. This sentence is therefore not an adequate specification of the language MY-PET because it is not useful To be an acceptable specification of a language, a set of rules must enable us to decide, in a finite amount of time, whether a given string of alphabet letters is or is not a word in the language.
MY-PET issue continued… It is not a requirement that all the letters in the alphabet need to appear in the words selected for the language.
Language Different entities: letters/alphabets, words, and sentences
Λ empty string or null string a string having no letter
φ A language with no words
+ union of sets
Automata and Language Theory
Define words in a language
The set of language-defining rules can be of two kinds -
-
They can either tell us how to test a string of alphabet letters that we might be presented with, to see if it is a valid word, or They can tell us how to construct all the words in the language by some clear procedures
Consider an alphabet having only one letter, the letter x, Σ = {x} Language definition: any nonempty string of alphabet characters is a word: L1= {x xx xxx xxxx . . .} Alternate form: L1= {xn for n = 1 2 3 . . . }
Concatenation – two strings are written down side by side to form a new longer string Example: to concatenate the word xxx and the word xx, we obtain the word xxxxx. The words in this language are analogous to positive integers, and the operation of concatenation is analogous to addition. xn concatenated with xm is the word xn+m
Concatenation continued… It is often convenient to designate the words in a given language by new symbols, that is, other than the ones in the alphabet. Say, The word xxx is called a and the word xx is b. Concatenating a and b, ab = xxxxx
It is not always true that when two words are concatenated they produce another word in the language
Concatenation continued… For example, the language is L2 = {x xxx xxxxx xxxxxxx . . . } = {xodd} = {x2n+1 for n = 0 1 2 3 . . .} Then, a = xxx and b = xxxxx are both words in L2, but their concatenation ab = xxxxxxxx is not in L2. In these simple examples, when we concatenate a with b, we get the same word as when we concatenate b with a. ab = ba
Concatenation continued… ab = ba This relationship does not hold for all languages. In English when we concatenate “house” and “boat,” we get “houseboat,” which is indeed a word but distinct from “boathouse,” which is different thing – not because they have different meanings, but because they are different words. “Merry-go-round” and “carousel” mean the same thing, but they are different words.
Example Consider another language. Let us begin with the alphabet: ∑={0 1 2 3 4 5 6 7 8 9} and define the set of words: L3= {any finite string of alphabet letters that does not start with the letter zero} L3= { 1 2 3 4 5 6 7 8 9 10 11 12 . . .}
Defining Language rules … how to test … L1= {xn for n = 1 2 3 . . . }
… how to construct … L3= {any finite string of alphabet letters that does not start with the letter zero}
DEFINITION We define the function length of a string to be a number of letters in the string. Example: a = xxxx in the language L1 length(a) = 4 if c = 428 in the language L3 length© = 3 Or, we could write directly that in L1 length(xxxx) = 4 and in L3 length(428) = 3
Example: length(Λ) = 0
if length(w) = 0, then w = Λ. therefore, another definition of L3 L3={ any finite string of alphabet letters that, if it has a length more than 1, does not start with a zero}
There is some inherent ambiguity in the phrase “any finite string,” since it is not clear whether we intend to include the null string (Λ, the string of no letters). We’ll define a language like L1 but that does contain Λ. L4 = { Λ x xx xxx xxxx . . . } = { xn for n = 0 1 2 3 . . . } Here x0 = Λ, not x0 = 1 as in algebra.
DEFINITION
Let us introduce the function reverse. If a is a word in some language L, then reverse(a) is the same string of letters spelled backward, called the reverse of a, even if this backward string is not a word in L.
EXAMPLE reverse(xxx) = xxx reverse(xxxx) = xxxx reverse(145) = 541 But let us also note that in L3 reverse(140) = 041 which is not a word in L3.
DEFINITION Let us define a new language called PALINDROME over the alphabet ∑ = {a b} PALINDROME = {Λ, all strings x such that reverse(x)=x}
If we begin listing the elements in PALINDROME, we find PALINDROME = {Λ a b aa bb aaa aba bab bbb aaaa abba . . .}
DEFINITION Given an alphabet ∑, we wish to define a language in which any string of letters from ∑ is a word, even the null string. This language we shall call the closure of the alphabet. It is denoted by writing a star (an asterisk) after the name of the alphabet as a superscript: ∑* This notation is sometimes known as the Kleene star after the logician who was one of the founders of this subject.
EXAMPLE If ∑ = {x}, then ∑*=L4= {Λ x xx xxx . . .} EXAMPLE If ∑ = { 0 1 }, then ∑* = {Λ 0 1 00 01 10 11 000 001 . . .} EXAMPLE If ∑ = {a b c}, then ∑* ={Λ a b c aa ab ac ba bb bc ca cb cc aaa …}
We can think of the Kleene star as an operation that makes an infinite language of strings of letters out of an alphabet. Infinitely many words, each of finite length. Lexicographic order words arranged in size order (words of shortest length first) and then listed all the words of the same length alphabetically. We shall usually follow this method of sequencing a language.
DEFINITION If S is a set of words, then by S* we mean the set of all finite strings formed by concatenating words from
Let us not make the mistake of confusing the two language ENGLISH-WORDS* and ENGLISH-SENTENCES the first language contains the word butterbutterbutterhat, the second does not. Words in the ENGLISH-WORDS are the concatenate of arbitrarily many words from ENGLISH-WORDS While ENGLISH-SENTENCES are restricted to juxtaposing only words from ENGLISH-WORDS in an order that compiles with the rules of grammar.
EXAMPLE If S = {aa b}, then S*= {Λ plus any word composed of factors of aa and b} = {Λ plus all strings of a’s and b’s in which the a’s occur in even clumps} = {Λ b aa aab baa bbb aaaa aabb baab bbaa bbbb aaaab aabaa aabbb baaaa baabb bbaab … } The string aabaaab is not S* since it has a clump of a’s of length 3.
EXAMPLE If S = {a ab}, then S*= {Λ plus any word composed of factors of a and ab} = {Λ plus all strings of a’s and b’s except those that start with b and those that contain double b} = {Λ a aa ab aaa aab aba aaaa aaab aaba abaa abab aaaaa aaaab aaaba aabaa aabab abaaa abaab ababa ...}
To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenate of words from the base set S. In the last example, to show that abaab is in S*, we can factor as follows: (ab)(a)(ab) These three factors are all in the set S; therefore, their concatenation is in S*. This is the only way to factor this string into factors of (a) and (ab). When this happens, we say
Sometimes factoring is not unique. For example, consider S= {xx xxx}. Then S*= {Λ and all strings of more than one x} = {xn for n = 0 2 3 4 5 . . .} = {Λ xx xxx xxxx xxxxx xxxxxx . . .} Notice that the word x is not in the language S*. The string xxxxxxx is in the closure for any of these three reasons. It is (xx)(xx)(xxx) or (xx)(xxx)(xx) or
Proof by constructive algorithm - the method of proving that something exists by showing how to create it. If the alphabet has no letters, then its closure is the language with the null string as its only word, because Λ is always a word in a Kleene closure. Symbolically, we write If ∑ = φ (the empty set), ∑*= {Λ } This is not the same as if S = {Λ }, then S*= {Λ } which is also true for a different reason, that is Λ Λ=
The Kleene closure of two sets can end up being he same language even if the two sets that we started with were not. EXAMPLE Consider the following languages S = { a b ab} and T= {a b bb} Then both S* and T* are languages of a’s and b’s since any string of a’s and b’s can be factored into syllables or either (a) and (b), both of which are in S and T.
If for some reason we with to modify the concept of closure to refer only to the concatenation of some (not zero) string for a set S, we use the notation + instead of *. For Example if S={x}, then S+ = {x xx xxx xxxx ….} (S+ = S* except the word ^)
This is not to say that S+ cannot, in general, contain a word ^. It can, but only on the condition that S contains the word I initially. What happens if we apply the closure operator twice? (S*)* or S**
Theorem 1 For any set S of strings we have S*=S**
Proof It is analogous to say that if people are made up of molecules and molecules are made up of atoms, the people are made up of atoms
Every words in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is made up of factors from S. Therefore, every word in S** is also a word in S*. We can write S** ⊂ S* ⊂ means “is contained in or equal to.”
Now, in general, it is true that for any set A we know that A⊂A*, since A* we can chose as a word any one from A. So if we consider A to be our set S*, we have
S* ⊂ S** Together, these two inclusions prove that
S*=S**
Consider the language S*, where S = {ab ba}. a. Write at least 30 words in S*. b. Can any of this language contain the substring aaa or bbb? c. Give another description of this language.
Consider the language S*, where S={a b}. How many word does this language have: a. of length 2 b. of length 3 c. of length n
Consider the language S*, where S={aa aba baa}. a. Show that the words: aabaa, baaabaaa, & baaaaababaaaa are all in this language. b. Can any in this language have an odd total number
1. Consider the language S*, where S = {aa ab ba bb} Give another description of this language 2. Give an example of set S such that S* only contains all possible strings of a’s and b’s that have length divisible by 3.
3. Prove that for all sets S. a. (S+)* = (S*)* b. (S+)+ = S+ c. is (S*)+ = (S+)* for all sets S?