Vo Tri Thong
Jan 23rd, 2019

NEURAL MACHINE TRANSLATION (SEQ2SEQ) REPORT

1. INTRODUCTION

A. Neural Machine Translation (NMT)

This report discusses the architecture and the implementation of the neural machine translation system devised by Luong, Brevdo, and Zhao (2017). The document also covers related concepts such as the thought vector, the attention mechanism and beam search.

i. Neural Machine Translation

Due to the existence of numerous human languages around the world, the need to develop machine translation systems has been around for decades. Until recent years, machine translation systems typically relied on phrase-by-phrase translation approaches, which led to disfluent and unsatisfying results. In contrast, the technique described in this report scans the entire source sentence to produce the translated sentence. This sequential reading method is similar to the way humans translate documents. The main components of the system are an encoder, a decoder and a thought vector.

ii. Encoder, Decoder and Thought Vector

The NMT system first reads the entire sentence in the source language through the encoder. Then, it generates a thought vector that carries the information of the source sentence in numeric format. Subsequently, the decoder emits the output sentence by generating one word after another. A minimal sketch of this encoder-decoder structure is given after Figure 1.

Figure 1: Encoder-decoder architecture
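The following is a minimal sketch of the encoder-decoder idea using tf.keras. It is not the tensorflow/nmt implementation itself; the vocabulary sizes and the choice of an LSTM cell are illustrative assumptions (only the 512 units match the configurations used later in this report).

import tensorflow as tf

src_vocab, tgt_vocab, num_units = 8000, 8000, 512  # vocabulary sizes are assumptions

# Encoder: read the whole source sentence; its final state acts as the thought vector.
enc_in = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(src_vocab, num_units)(enc_in)
_, state_h, state_c = tf.keras.layers.LSTM(num_units, return_state=True)(enc_emb)

# Decoder: start from the thought vector and emit target words one after another.
dec_in = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, num_units)(dec_in)
dec_out = tf.keras.layers.LSTM(num_units, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(tgt_vocab)(dec_out)

model = tf.keras.Model([enc_in, dec_in], logits)
model.compile(optimizer="sgd",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))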

On top of the fundamental encoder-decoder architecture, some advanced techniques such as the attention mechanism and beam search are employed to further enhance the results.

iii. Attention mechanism

The process of compressing the meaning of the source sentence into a fixed-length thought vector may result in information loss. Because the output depends on this single fixed-length thought vector (Bahdanau, 2014), some information may not be retained. Therefore, the principle of the attention mechanism is to maintain a connection between the output and the source sentence. A small numeric sketch of the idea follows Figure 2.

Figure 2: Effects of Attention Mechanism (Bahdanau, 2014)
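As a hedged illustration (not the report's own code), the NumPy sketch below shows dot-product attention of the kind used by Luong-style models: the current decoder state is scored against every encoder state, the scores are normalised with a softmax, and a weighted sum of the encoder states forms the context vector. The shapes and random values are assumptions.

import numpy as np

encoder_states = np.random.randn(6, 512)   # one 512-d vector per source word
decoder_state = np.random.randn(512)       # current decoder hidden state

scores = encoder_states @ decoder_state    # alignment score for each source word
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax over the source positions
context = weights @ encoder_states         # context vector fed to the decoder

# `weights` is what Figure 2 visualises: how strongly each source word
# contributes to the word currently being generated.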

This technique demonstrates its effectiveness especially on long sentences, as indicated in Figure 2.

B. BLEU Score

i. Introduction

The BLEU score is a method for the automatic evaluation of machine translation. As human evaluations are expensive and time-consuming, an automatic set of metrics is needed to evaluate translation results. BLEU is relatively quick to apply, and it correlates highly with human evaluation. BLEU has proven to be one of the most prominent methods for evaluating translation results because of its correlation with human judgments (Papineni et al., 2002).

ii. Algorithm

The BLEU score is a number between 0 and 1. In order to score a translation, one compares the n-grams of the candidate translation with the n-grams of the reference translation and counts the number of matches. The process evaluates several n-gram sizes and computes a weighted average. A small sketch of this computation is given below.
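The sketch below is a simplified, sentence-level version of this computation, assuming uniform 1-to-4-gram weights, clipped n-gram matches, and a brevity penalty; it is an illustration under those assumptions, not the exact scorer used by the nmt codebase.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # Modified (clipped) n-gram precision for n = 1..max_n
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        matches = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(matches, 1e-9) / total)   # smooth zero counts
    # Weighted (here: uniform) geometric mean of the n-gram precisions
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages overly short candidates
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_avg)

print(bleu("the cat sat on the mat".split(),
           "the cat is on the mat".split()))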

2. EXPERIMENTS

A. Dataset

Experiments in this report are trained and tested on the IWSLT English-Vietnamese dataset. The training set has 133K sentence pairs provided by the IWSLT Evaluation Campaign.

B. The 'Vanilla' NMT model

i. Configurations

The parameters of this model are taken from the standard HParams file iwslt15.json with a slight adjustment: attention is set to none. Key configurations (a short sketch of applying this override follows the list):

"attention": "",
"attention_architecture": "standard",
"learning_rate": 1.0,
"num_units": 512,
"optimizer": "sgd",
"beam_width": 10

ii. Results

The BLEU score is relatively low without the attention mechanism; the maximum test BLEU is 8.9.

# Best bleu, step 9000 lr 0.0625 step-time 0.00s wps 0.00K ppl 0.00 gN 0.00 dev ppl 21.77, dev bleu 10.0, test ppl 24.12, test bleu 8.9, Mon Jan 21 09:14:51 2019

Time to train: 30 minutes using a rig with an Nvidia 1080 Ti.

C. NMT with Attention model

i. Model with SGD optimizer

1. Configurations

The parameters of this model are taken from the standard HParams iwslt15 without any adjustment.

"attention": "scaled_luong",
"attention_architecture": "standard",
"learning_rate": 1.0,
"num_units": 512,
"optimizer": "sgd",
"beam_width": 10

2. Results

The BLEU score is higher with the attention mechanism; the maximum test BLEU is 23.1.

# Best bleu, step 12000 lr 0.125 step-time 0.14s wps 40.07K ppl 4.87 gN 5.96 dev ppl 9.88, dev bleu 20.3, test ppl 8.39, test bleu 23.1, Mon Jan 21 06:58:25 2019

Time to train: 30 minutes using a rig with an Nvidia 1080 Ti.

ii. Model with Adam optimizer and learning rate 0.001

1. Configurations

"attention": "scaled_luong",
"attention_architecture": "standard",
"learning_rate": 0.001,
"num_units": 512,
"optimizer": "adam",
"beam_width": 10

2. Results

The Adam optimizer quickly achieved a high BLEU score at step 5000, but it did not reach the same peak as SGD. One additional observation is that the Adam optimizer generated significantly more log files.

# Best bleu, step 5000 lr 0.000125 step-time 0.18s wps 30.47K ppl 2.79 gN 8.81 dev ppl 10.88, dev bleu 19.6, test ppl 9.69, test bleu 21.6, Mon Jan 21 10:09:06 2019

Time to train: 1 hour.
