Implementing Beam Search - Part 2

Advanced Features that Regularize the Translator

Nov 7, 2018 · 1302 words · 7 minute read machine_learning deep-learning nlp python


Overview

Part one gave an overview of how OpenNMT-py produces output sequences for a batch of input sequences (the Translator._translate_batch method), and how it conducts beam search (the Beam objects):

Implementing Beam Search (Part 1) - A Source Code Analysis of OpenNMT-py

Now we turn our attention to some of the details we skipped over in part one: the advanced features that influence how the translator produces output candidates/hypotheses. They can be put into two categories: rule-based and number-based.

More concretely, these features aim to:

  1. Stipulate a minimum length of output candidates.

  2. Prevent any n-gram from appearing more than once in the output (with the exception of certain tokens).

  3. Discourage or encourage longer output candidates.

  4. Penalize when an output candidate references only a part of the input sequence.

  5. Penalize when an output candidate repeats itself (focusing too much on the same part of the input sequence).

They can be useful when the test corpus differs significantly from the training corpus, or when the model was unable to learn the desired behaviors due to its limitations. They are essentially another set of hyper-parameters, but ones that are only relevant at the test/inference stage.
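All of these features are exposed as command-line arguments to translate.py. As a rough illustration, an invocation that enables every one of them might look like the snippet below. The flag names are taken from OpenNMT-py's translation options around the time of writing, and the model/file names are placeholders; verify the exact flags against opts.py in your checkout:

```bash
python translate.py -model model.pt -src test.txt -output pred.txt \
    -beam_size 5 \
    -min_length 5 \
    -block_ngram_repeat 2 \
    -ignore_when_blocking "." "," \
    -length_penalty wu -alpha 0.9 \
    -coverage_penalty wu -beta 5
```

Roughly, -min_length implements feature 1; -block_ngram_repeat and -ignore_when_blocking implement feature 2; -length_penalty with -alpha implements feature 3; and -coverage_penalty with -beta covers features 4 and 5.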

Rule-based Regularizers

Minimum length of output candidates

This is controlled by the -min_length command-line argument of translate.py.
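The idea behind the enforcement is simple: until the hypothesis has reached the minimum length, make the EOS token impossible to select, so every beam is forced to keep generating. Below is a minimal self-contained sketch of that idea; the function name, tensor shapes, and constants are my own assumptions for illustration, not OpenNMT-py's exact code:

```python
import torch

def enforce_min_length(log_probs: torch.Tensor,
                       eos_idx: int,
                       cur_len: int,
                       min_length: int) -> torch.Tensor:
    """Sketch of minimum-length enforcement (hypothetical helper, not
    OpenNMT-py's exact code).

    log_probs: (beam_size, vocab_size) log-probabilities for this step.
    Before the hypothesis reaches `min_length` tokens, the EOS entry is
    set to a huge negative number so top-k beam selection can never pick it.
    """
    if cur_len < min_length:
        log_probs[:, eos_idx] = -1e20
    return log_probs

# Toy usage: a 5-wide beam over a 100-word vocabulary, currently at step 2.
beam_size, vocab_size, eos_idx = 5, 100, 3
log_probs = torch.log_softmax(torch.randn(beam_size, vocab_size), dim=-1)
log_probs = enforce_min_length(log_probs, eos_idx, cur_len=2, min_length=5)
assert (log_probs[:, eos_idx] == -1e20).all()
```

Since beam search ranks candidates by accumulated log-probability, giving EOS a score of -1e20 guarantees it loses the top-k selection to every real token, which is equivalent to forbidding a hypothesis from ending early.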