Smaller Docker Image using Multi-Stage Build

Why Use Multi-Stage Build? Starting from Docker 17.05, users can take advantage of the new “multi-stage build” feature [1] to simplify their workflows and make the final Docker images smaller. It essentially streamlines the “Builder pattern”: use a “builder” image to build the binary files, then copy those binaries into another runtime/production image. Although Python is an interpreted language, many Python libraries, especially those for scientific computing and machine learning, are built on components written in compiled languages (mostly C/C++). Therefore, the “Builder pattern” can still be applied. ...

June 21, 2019 · Ceshine Lee

Mixed Precision Training on Tesla T4 and P100

tl;dr: the power of Tensor Cores is real. Also, make sure the CPU does not become the bottleneck. Motivation: I’ve written about Apex in a previous post: Use NVIDIA Apex for Easy Mixed Precision Training in PyTorch. At that time I only had my GTX 1070 to experiment on, and as we learned in that post, pre-Volta NVIDIA cards do not benefit from half-precision arithmetic in terms of speed; it only saves some GPU memory. Therefore, I wasn’t able to personally evaluate how much of a speed boost we can get from mixed precision with Tensor Cores. ...
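As a quick sanity check (not taken from the post itself), the CUDA compute capability reported by PyTorch tells you whether a GPU has Tensor Cores: Volta and newer parts report 7.0 or higher, so a P100 (6.0) will not see an FP16 speedup while a T4 (7.5) will.

```python
# Quick check (illustrative, not from the original post): does this GPU have Tensor Cores?
# Tensor Cores are available on compute capability 7.0+ (Volta, Turing, ...).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    if (major, minor) >= (7, 0):
        print(f"{name} (sm_{major}{minor}): Tensor Cores available, expect FP16 speedups")
    else:
        print(f"{name} (sm_{major}{minor}): pre-Volta, FP16 mainly saves GPU memory")
else:
    print("No CUDA device found")
```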

June 13, 2019 · Ceshine Lee

Use NVIDIA Apex for Easy Mixed Precision Training in PyTorch

Photo by Sam Power on Unsplash. The Apex project from NVIDIA is touted as a PyTorch extension that lets developers do mixed precision and distributed training “with 4 or fewer line changes to the existing code”. It has been out for a while (since roughly June 2018) and seems to be well received (huggingface/pytorch-pretrained-BERT uses Apex to do 16-bit training), so I decided to give it a try. This post documents what I’ve learned. ...
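For context, here is a minimal sketch of what those “4 or fewer line changes” look like with Apex’s amp API as it stood around 2019; the tiny model and random data are placeholders to keep the snippet self-contained.

```python
# Minimal sketch of mixed precision training with NVIDIA Apex (amp API circa 2019).
# The toy model and random data below are placeholders for a real training setup.
import torch
import torch.nn as nn
from apex import amp

device = torch.device("cuda")
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Change 1: let amp patch the model and optimizer ("O1" = conservative mixed precision)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for _ in range(10):
    inputs = torch.randn(32, 128, device=device)
    targets = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    # Change 2: scale the loss so FP16 gradients do not underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```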

March 26, 2019 · Ceshine Lee

Multilingual Similarity Search Using Pretrained Bidirectional LSTM Encoder

Photo by Steven Wei on Unsplash. Introduction: I’ve previously demonstrated how to use a pretrained BERT model to create a similarity measure between two documents in this post: News Topic Similarity Measure using Pretrained BERT Model. However, to find entries similar to N query documents in a corpus A of size M, we need to run N×M feed-forward passes. A more efficient and widely used method is to use neural networks to generate sentence/document embeddings and calculate cosine similarity scores between those embeddings. ...
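To illustrate the efficiency argument, here is a small sketch (not tied to any particular encoder) of similarity search over precomputed embeddings: once every document in corpus A has been encoded, finding the nearest entries for N queries is a single matrix multiplication over L2-normalized vectors rather than N×M encoder passes.

```python
# Sketch: similarity search over precomputed document embeddings.
# `corpus_emb` and `query_emb` stand in for vectors produced by any sentence encoder.
import torch
import torch.nn.functional as F

M, N, dim = 1000, 5, 512           # corpus size, number of queries, embedding size
corpus_emb = torch.randn(M, dim)   # placeholder: embeddings of corpus A
query_emb = torch.randn(N, dim)    # placeholder: embeddings of the N query documents

# L2-normalize so that a dot product equals cosine similarity
corpus_norm = F.normalize(corpus_emb, dim=1)
query_norm = F.normalize(query_emb, dim=1)

# (N, M) matrix of cosine similarities, computed in one shot
scores = query_norm @ corpus_norm.t()
top_scores, top_idx = scores.topk(k=3, dim=1)  # 3 most similar corpus entries per query
print(top_idx)
```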

February 15, 2019 · Ceshine Lee

News Topic Similarity Measure using Pretrained BERT Model

In this post we establish a topic similarity measure among news articles collected from the New York Times RSS feeds. The main purpose is to familiarize ourselves with the (PyTorch) BERT implementation and pretrained model(s). What is BERT? BERT stands for Bidirectional Encoder Representations from Transformers. It comes from a paper published by Google AI Language in 2018 [1]. It is based on the idea that fine-tuning a pretrained language model can help the model achieve better results on downstream tasks [2][3]. ...
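As a rough illustration (and not necessarily the exact measure used in the post), the pytorch-pretrained-BERT package mentioned above can turn each article into a fixed-size vector, which can then be compared with cosine similarity; the two example texts are placeholders.

```python
# Rough sketch: comparing two documents with a pretrained BERT model
# via the pytorch-pretrained-bert package (API circa early 2019).
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    # Add the special tokens BERT expects and convert to vocabulary ids
    tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    with torch.no_grad():
        encoded_layers, pooled_output = model(input_ids)
    # Use the mean of the last layer's token vectors as a simple document vector
    return encoded_layers[-1].mean(dim=1).squeeze(0)

a = embed("The senate passed a new climate bill on Tuesday.")            # placeholder text
b = embed("Lawmakers approved sweeping climate legislation this week.")  # placeholder text
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```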

February 10, 2019 · Ceshine Lee