[Competition] Jigsaw Multilingual Toxic Comment Classification

Photo Credit Introduction Jigsaw Multilingual Toxic Comment Classification is the third Jigsaw toxic comment classification hosted on Kaggle. I’ve covered both the first one in 2018 and the second one in 2019 on this blog. This time, Kagglers were asked to use English training corpora to create multilingual toxic comment classifiers that are tested in 6 other languages. I’ve been taking a break from Kaggle during COVID pandemic, so I did not participate in this year’s competition. However, reading top solutions is always very helpful whether you participated or not, and is exactly what I’m doing in this post. Due to time limitation, I will only cover a small part of the solutions shared. I’ll update the post if I find the other interesting things later. ...

August 5, 2020 · Ceshine Lee

[Tip] TorchScript Supports Half Precision

Photo Credit This is a short post describing how to use half precision in TorchScript. This can speed up models that were trained using mixed precision in PyTorch (using Apex Amps), and also some of the model trained using full precision (with some potential degradation of accuracy). TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. source ...

July 11, 2020 · Ceshine Lee

Self-Supervised Domain Adaptation

Photo Credit Introduction Self-supervised learning made transfer learning possible in NLP [1] (by using language modeling as the pre-training task) and has started to show some potential in CV as well [2, 3, 4]. They make the downstream tasks more label efficient, that is, requires fewer labeled examples to achieve good prediction accuracies. In CV, we are already quite familiar with transfer learning from models pre-trained on the labeled Imagenet dataset. However, if the dataset used in the downstream task is significantly different from the Imagenet, transfer learning/fine-tuning usually would not be very helpful. ...

July 6, 2020 · Ceshine Lee

Deploying EfficientNet Model using TorchServe

Photo Credit Introduction AWS recently released TorchServe, an open-source model serving library for PyTorch. The production-readiness of Tensorflow has long been one of its competitive advantages. TorchServe is PyTorch community’s response to that. It is supposed to be the PyTorch counterpart of Tensorflow Serving. So far, it seems to have a very strong start. This post from the AWS Machine Learning Blog and the documentation of TorchServe should be more than enough to get you started. But for advanced usage, the documentation is a bit chaotic and the example code suggests sometimes conflicting ways to do things. ...

May 4, 2020 · Ceshine Lee

[Notes] Jigsaw Unintended Bias in Toxicity Classification

Photo Credit Preamble Jigsaw hosted a toxic comment classification competition[2] in 2018, and has also created an API service for detecting toxic comments[3]. However, it has been shown that the model trained on this kind of datasets tend to have some biases against minority groups. For example, a simple sentence “I am a black woman” would be classified as toxic, and also more toxic than the sentence “I am a woman"[4]. This year’s Jigsaw Unintended Bias in Toxicity Classification competition[1] introduces an innovative metric that aims to reduce such biases and challenges Kagglers to find out the best score we can get under this year’s new dataset. ...

August 4, 2019 · Ceshine Lee