[Tensorflow] Training CV Models on TPU without Using Cloud Storage

Recently I was asked this question (paraphrasing): I have a small image dataset that I want to train on Google Colab and its free TPU. Is there a way to do that without having to upload the dataset as TFRecord files to Cloud Storage? First of all, if your dataset is small, training on a GPU likely wouldn't be much slower than on a TPU. But they were adamant that they wanted to see how fast training on a TPU can be. That's fine, and the answer is yes: there is a way to do that. ...
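One common way to achieve this (a sketch only; the excerpt above doesn't say which method the full post uses, and the toy data shapes here are assumptions) is to build a `tf.data` pipeline directly from in-memory NumPy arrays, which `TPUStrategy` accepts without TFRecords or Cloud Storage:

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory dataset: 256 RGB images (32x32), 10 classes.
images = np.random.rand(256, 32, 32, 3).astype("float32")
labels = np.random.randint(0, 10, size=(256,))

# On Colab, connect to the TPU; fall back to the default strategy elsewhere.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except ValueError:
    strategy = tf.distribute.get_strategy()

# A tf.data pipeline built from in-memory arrays -- no GCS bucket needed.
dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(256)
    .batch(64, drop_remainder=True)
)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

history = model.fit(dataset, epochs=1, verbose=0)
```

Since the dataset must fit in host memory, this approach only works for small datasets, which matches the premise of the question.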

October 11, 2020 · Ceshine Lee

Replicate Conda Environment in Docker

You just finished developing your prototype in a Conda environment, and you are eager to share it with stakeholders, who may not have the knowledge required to recreate the environment and run your model on their end. Docker is a great tool for this kind of scenario (P.S.: it can utilize GPUs via nvidia-docker). Just create a Docker image and share it with the stakeholders, and your model will run on their devices the same way it runs on yours. ...
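A minimal sketch of one way to do this (the file names `environment.yml` and `app.py`, and the environment name `myenv`, are assumptions here, not taken from the post):

```dockerfile
# Start from an image that ships with Conda preinstalled.
FROM continuumio/miniconda3

WORKDIR /app

# Recreate the environment from an exported spec
# (produced with: conda env export > environment.yml).
COPY environment.yml .
RUN conda env create -f environment.yml

# Copy the project and run the entry point inside the environment.
# "myenv" must match the name: field in environment.yml.
COPY . .
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "app.py"]
```

Note that `conda env export` pins platform-specific builds, so an environment exported on one OS may not resolve on another; exporting with `--from-history` is a common workaround.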

October 7, 2020 · Ceshine Lee

[Paper] Please Stop Permuting Features

This post summarizes the findings and suggestions from the paper “Please Stop Permuting Features ‒ An Explanation and Alternatives” by Giles Hooker and Lucas Mentch. (Note: permutation importance is covered in one of my previous posts: Feature Importance Measures for Tree Models — Part I.) TL;DR: Permutation importance (permuting features without retraining) is biased toward features that are correlated. Avoid using it, and use one of the following alternatives: ...
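For context, the technique being critiqued can be sketched in a few lines (a toy illustration with made-up data, not the paper's experiment): shuffle one feature column at a time, without retraining, and record how much the error grows. With correlated features, the shuffled rows fall in regions the model never saw, which is the extrapolation problem the paper highlights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x0 and x1 are strongly correlated; y depends on x0 and x2.
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)  # near-duplicate of x0
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = x0 + x2 + 0.1 * rng.normal(size=n)

# Fit ordinary least squares as a stand-in for "the trained model".
design = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def mse(X_):
    pred = np.column_stack([X_, np.ones(len(X_))]) @ coef
    return np.mean((pred - y) ** 2)

baseline = mse(X)

# Permutation importance: shuffle one column, measure the error increase.
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importances.append(mse(Xp) - baseline)
```

Permuting `x0` while leaving its near-copy `x1` intact produces rows where the two disagree wildly, something that never occurs in the training data, so the resulting importance score reflects model behavior on unrealistic inputs.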

September 8, 2020 · Ceshine Lee

[Paper] Language-agnostic BERT Sentence Embedding

This post on the Google AI Blog explains the premise, background, and related work of this paper pretty well, so I'm not going to repeat them in this post. Instead, I'll try to fill in some of the gaps I see as someone who is familiar with this topic but does not follow the latest developments very closely. Firstly, I want to point out something in the Google AI post that confuses me. In the first paragraph, the authors stated: ...

August 19, 2020 · Ceshine Lee

[Competition] Jigsaw Multilingual Toxic Comment Classification

Jigsaw Multilingual Toxic Comment Classification is the third Jigsaw toxic comment classification competition hosted on Kaggle. I've covered both the first one in 2018 and the second one in 2019 on this blog. This time, Kagglers were asked to use English training corpora to create multilingual toxic comment classifiers that are tested on six other languages. I've been taking a break from Kaggle during the COVID pandemic, so I did not participate in this year's competition. However, reading top solutions is always very helpful whether you participated or not, and that is exactly what I'm doing in this post. Due to time limitations, I will only cover a small part of the shared solutions. I'll update the post if I find other interesting things later. ...

August 5, 2020 · Ceshine Lee