Linguistics L445/L515
Assignment 4
Machine Translation
DUE: see oncourse
Total: 60 points

1. (20 points) Go to http://babelfish.altavista.com. This site allows you to type in text and translate it into another language. Notice that you can also backtranslate – translate back into the original language – by copying and pasting the system’s output back into the input field. To answer the following questions, you will need to experiment with the MT system using a few interesting examples and use these examples to defend your answers.

(a) (10 points) Review the features of, and differences between, transformer systems and transfer systems. Based on your understanding of these and your experience with the system, do you think this site (Babelfish) uses a transformer system or a transfer system? What facts about the system make you think so?

(b) (10 points) How deep do you think the grammatical analysis is? Can you see evidence of morphological analysis, syntactic analysis, and/or semantic analysis? Explain why you think so with one or two concrete examples.

2. (20 points) Probabilistic Machine Translation

(a) (9 points) In (1), (2), and (3) below, align the words in the English (a) examples with the words in the Russian (b) examples. Note that several English words may correspond to one Russian word, and one English word may correspond to several Russian words. Some English words may correspond to no Russian word at all. (The glosses in the third line of each group are for your reference only – you are aligning the first and second lines.)

(1) a. This is a beautiful cat.
    b. Eto  krasivaya koshka.
       This beautiful cat

(2) a. I have no money.
    b. U  menya net deneg.
       By me    no  money

(3) a. I didn’t know that I needed to go shopping for Eva.
    b. Ya ne  znal, chto mne nuzhno bylo poiti v  magazin dlya Evy.
       I  not knew  that me  needed was  go    to shop    for  Eva

(b) (3 points) How would you use the alignments in (a) to calculate probabilities of translations? In other words, for each English word, how many different Russian words can it be translated into? If there is more than one, what is the probability of the English word being translated into each of the Russian words? Pick one English word that can be translated into at least two Russian words and use it as an example to explain how this works.
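As a hint for part (b): one common way to turn word alignments into translation probabilities is relative frequency, i.e. count how often a source word is linked to each target word and divide by the total number of links for that source word. The sketch below is only an illustration under stated assumptions. It uses a made-up English-Spanish toy corpus (not the Russian examples in this assignment), and the function name `translation_probabilities` and the variable `aligned_links` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (NOT the assignment's examples): each sentence pair is
# represented as a list of (english_word, spanish_word) alignment links.
aligned_links = [
    [("the", "la"), ("house", "casa")],
    [("the", "el"), ("dog", "perro")],
    [("the", "la"), ("table", "mesa")],
]

def translation_probabilities(corpus):
    """Estimate P(target | source) by relative frequency over alignment links."""
    counts = defaultdict(Counter)
    for links in corpus:
        for source, target in links:
            counts[source][target] += 1  # one observed link source -> target

    probs = {}
    for source, targets in counts.items():
        total = sum(targets.values())  # all links observed for this source word
        probs[source] = {t: c / total for t, c in targets.items()}
    return probs

probs = translation_probabilities(aligned_links)
print(probs["the"])  # "la" gets 2 of the 3 links for "the", "el" gets 1 of 3
```

In this toy corpus "the" is linked to "la" twice and to "el" once, so the estimate assigns two thirds of the probability to "la" and one third to "el". The same counting applies to whatever alignments you draw in part (a).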
(c) (2 points) If you didn’t have word alignments, you could use a bag of words model. For the same word you picked, how would the candidate Russian words and their associated probabilities differ from those in part (b)? What would be the new probability of that word being translated as another word?

(d) (3 points) The bag of words model, of course, gets better as it sees more data. Describe how the following extra sentences may help you translate certain words better. (That is, which words in the previous three sentences get easier to translate and why?)

(4) a. I bought a cat.
    b. Ya kupil  koshku.
       I  bought cat

(5) a. Ivan thought that I knew.
    b. Ivan dumal,  chto ya znal.
       Ivan thought that I  knew

(e) (3 points) Note how the Russian word for cat changes form depending on how it is used in the sentence. What would you have to do to translate cat into Russian appropriately with the bag of words method?

3. (20 points) System evaluation exercise: Go to http://www.tashian.com/multibabel/. For the first three parts of this question, leave the option to include Chinese, Japanese, and Korean unchecked.

(a) (3 points) Come up with an example sentence that you’re going to translate and backtranslate, and write it down. Be funny; be creative; pick a song lyric or movie quote; whatever. Just make sure that the sentence is sufficiently interesting that you are able to answer all of the following questions.

(b) (7 points) Enter your sentence and examine all the (English) backtranslations. Write down all the backtranslations (there are 5 languages, so make sure you give me all 5), and for each one give me its score (1-4) on the intelligibility scale (p. 64 of the slides).

(c) (5 points) In terms of quality, pick the best and worst backtranslations. Explain how you arrived at the best and worst, i.e. think about intelligibility, accuracy, and error analysis. (For error analysis, think of criteria you can use for determining quality: meaning change, tense change [present, past, future], word choice, missing/added words, word order, “word salad,” etc.)

(d) (5 points) Now turn on the option to include Chinese, Japanese, and Korean. Are the backtranslations you then get generally better or worse than the others? Why do you think that is?