[Notes] Neural Language Models with PyTorch
Motivation

I was reading the paper “Character-Level Language Modeling with Deeper Self-Attention” by Al-Rfou et al., which describes some ways to use Transformer self-attention models to solve the language modeling problem. One big problem of Transformer models in this setting is that they cannot pass information from one batch to the next, so they have to make predictions based on limited context. This becomes a problem when we have to compare results with “traditional” RNN-based models; what Al-Rfou et al. proposed is to evaluate the Transformer using only the output at the last position of each input sequence, so that every prediction is conditioned on a full context window. If we ignore the first batch, evaluating a sequence of length N therefore requires N batches for the Transformer, but only N / M batches for RNN models (M being the sequence length of a batch). ...
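To make the batch-count comparison concrete, here is a minimal sketch of the two evaluation loops. The toy sequence, the window size, and the commented-out `model` calls are illustrative assumptions, not code from the paper:

```python
import torch

N = 12  # total sequence length to evaluate (toy value)
M = 4   # context window / per-batch sequence length (toy value)

tokens = torch.arange(N)  # stand-in for a tokenized corpus

# Transformer-style evaluation (as described by Al-Rfou et al.):
# slide the window one token at a time and keep only the prediction
# at the last position, so each scored token sees a full M-token context.
transformer_batches = 0
for i in range(M, N):            # ignore the first window, as in the text
    context = tokens[i - M:i]    # the last M tokens before position i
    # logits = model(context)[-1]  # hypothetical model; score only the final position
    transformer_batches += 1

# RNN-style evaluation: process the sequence in non-overlapping chunks,
# carrying the hidden state across chunks, so all M positions in a chunk
# are scored in a single forward pass.
rnn_batches = 0
for i in range(0, N, M):
    chunk = tokens[i:i + M]
    # logits, hidden = model(chunk, hidden)  # hypothetical model; score all M positions
    rnn_batches += 1

print(transformer_batches)  # roughly N forward passes
print(rnn_batches)          # N / M forward passes
```

Running this prints 8 and 3 for the toy values above, which mirrors the N versus N / M batch counts described in the paragraph.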