Reducing the SentencePiece Vocabulary Size of Pretrained NLP Models
Motivation

Q: Why and when would we want to trim down the vocabulary size of a pretrained model?

A: When a large portion of the vocabulary isn’t used in your downstream task, it makes sense to get rid of the redundant part of the vocabulary to speed up the model. For example, Google’s multilingual version of T5, mT5, was pretrained on 101 languages. Suppose we only use English, Japanese, and Chinese in our downstream text generation task. We would waste a lot of time and space processing the rows of the embedding matrix and the LM head that correspond to tokens that never appear in the dataset. ...