Use Visual Studio Code To Develop Python Programs

Joel Grus offered his critique of Jupyter Notebook in a recent talk. I think most of his points are valid and recommend reading the slides or watching the talk. However, one thing that caught my attention was Mr. Grus’s Python IDE (Visual Studio Code). It looked so good that I decided to give it a try, which led to this blog post. (I Don’t Like Notebooks - Joel Grus - #JupyterCon 2018) ...

September 25, 2018 · Ceshine Lee

Prepare Deep-Learning-Ready VMs on Google Cloud Platform

The 2nd YouTube-8M Video Understanding Challenge has just finished. Google generously handed out $300 in Google Cloud Platform (GCP) credits to the first 200 eligible people, and I was lucky enough to be one of them. I would not have been able to participate in this challenge at a higher level otherwise: my local hardware can barely handle the size of the dataset and is not powerful enough for the models. The least I can do to return the favor is to write a short tutorial on how to set up deep-learning-ready VMs on GCP, along with some tips that I’ve learned. ...

August 10, 2018 · Ceshine Lee

Quantile Regression — Part 2

We discussed what quantile regression is and how it works in Part 1. In Part 2 we’re going to explore how to train quantile regression models with deep learning and gradient boosting trees. Source Code The source code for this post is provided in this repository: ceshine/quantile-regression-tensorflow. It is a fork of strongio/quantile-regression-tensorflow, with the following modifications: Use the example dataset from the scikit-learn example. The TensorFlow implementation is mostly the same as in strongio/quantile-regression-tensorflow. ...
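As a rough illustration of the idea (not the exact code from the repositories above), the quantile ("pinball") loss below is the objective such models minimize; the function name and signature are my own for this sketch:

```python
import tensorflow as tf

def quantile_loss(quantile):
    """Return a pinball-loss function for the given quantile in (0, 1)."""
    def loss(y_true, y_pred):
        error = y_true - y_pred
        # Under-prediction is penalized by `quantile`,
        # over-prediction by `1 - quantile`.
        return tf.reduce_mean(tf.maximum(quantile * error, (quantile - 1.0) * error))
    return loss

# Example: a model trained with quantile_loss(0.95) estimates the 95th percentile.
```

For the gradient boosting side, LightGBM exposes the same loss through `objective="quantile"` with an `alpha` parameter set to the desired quantile.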

July 16, 2018 · Ceshine Lee

Quantile Regression — Part 1

I’m starting to think a prediction interval[1] should be a required output of every real-world regression model. You need to know the uncertainty behind each point estimate; otherwise the predictions are often not actionable. For example, suppose the historical sales of an item under a certain circumstance are (10000, 10, 50, 100). The standard least squares method gives you an estimate of 2540. If you restock based on that prediction, you’re likely going to significantly overstock 75% of the time. The prediction is almost useless. But if you estimate the quantiles of the data distribution, the estimated 5th, 50th, and 95th percentiles are 16, 75, and 8515, which are much more informative than the single estimate of 2540. This is also the idea behind quantile regression. ...
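The figures above can be reproduced directly; here is a minimal check with NumPy (using its default linear interpolation between data points, which is an assumption on my part):

```python
import numpy as np

sales = np.array([10000, 10, 50, 100])

# The least squares point estimate for a constant model is just the mean.
print(sales.mean())                       # 2540.0

# Empirical 5th, 50th, and 95th percentiles.
print(np.percentile(sales, [5, 50, 95]))  # [  16.   75. 8515.]
```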

July 12, 2018 · Ceshine Lee

[Review] Kaggle Toxic Comment Classification Challenge

The Jigsaw Toxic Comment Classification Challenge features a multi-label text classification problem with a highly imbalanced dataset. The original test set was revealed to have already been public on the Internet, so a new dataset was released mid-competition, and the evaluation metric was changed from Log Loss to AUC. I tried a few ideas after building up my PyTorch pipeline but did not find any innovative approach that looked promising. Text normalization is the only strategy I found that gave solid improvements, but it is very time-consuming. The final result (105th place, about top 3%) was quite fitting IMO given the (not very long) time I spent on this competition. ...
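For reference, the AUC metric in this competition was the mean column-wise ROC AUC over the six toxicity labels; a minimal sketch of computing it with scikit-learn (the array names and shapes here are placeholders):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: (n_samples, 6) binary labels; y_prob: (n_samples, 6) predicted probabilities.
y_true = np.random.randint(0, 2, size=(1000, 6))
y_prob = np.random.rand(1000, 6)

# Mean column-wise ROC AUC: average the per-label AUC scores.
score = np.mean([roc_auc_score(y_true[:, i], y_prob[:, i]) for i in range(6)])
print(score)
```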

March 24, 2018 · Ceshine Lee