[Kaggle] Google Research Football 2020

Photo Credit (This post is an expansion of this Kaggle post.) My Solution Thanks to Kaggle, Manchester City F.C., and Google Research for this fantastic competition. Working on this competition was the most fun I’ve had in a while. The tl;dr version of my solution: I used an MLP model to stochastically imitate WeKick’s agents, with some rules to help it navigate unfamiliar waters. Why this Approach After I got the GCP coupon, I looked at the competition timeline and realized there was no way I could train a competitive RL agent from scratch in less than two weeks. I had to find some way to shorten the training time. ...
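The excerpt does not show any code, but a minimal sketch of what "stochastic imitation" with an MLP could look like is given below, assuming a PyTorch policy trained with cross-entropy on logged (observation, action) pairs from WeKick's replays. The layer sizes, the 115-float observation, and the 19-action space are illustrative assumptions, not the author's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a flat 115-float observation, 19 discrete actions.
OBS_DIM, N_ACTIONS = 115, 19

class ImitationMLP(nn.Module):
    """MLP policy that outputs a distribution over discrete actions."""
    def __init__(self, obs_dim=OBS_DIM, n_actions=N_ACTIONS, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # unnormalized action logits

policy = ImitationMLP()
loss_fn = nn.CrossEntropyLoss()  # plain behavioral cloning against the expert actions

def imitation_step(obs_batch, expert_actions, optimizer):
    logits = policy(obs_batch)
    loss = loss_fn(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def act(obs):
    """Sample from the softmax instead of taking argmax, which is what
    'stochastically imitate' suggests."""
    with torch.no_grad():
        probs = torch.softmax(policy(obs.unsqueeze(0)), dim=-1)
        return torch.multinomial(probs, num_samples=1).item()
```

In an actual agent, the sampled action would be overridden by the hand-written rules in situations the imitation data covers poorly.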

December 28, 2020 · Ceshine Lee

[Competition] Jigsaw Multilingual Toxic Comment Classification

Photo Credit Introduction Jigsaw Multilingual Toxic Comment Classification is the third Jigsaw toxic comment classification competition hosted on Kaggle. I’ve covered both the first one in 2018 and the second one in 2019 on this blog. This time, Kagglers were asked to use English training corpora to create multilingual toxic comment classifiers that are tested on 6 other languages. I’d been taking a break from Kaggle during the COVID pandemic, so I did not participate in this year’s competition. However, reading top solutions is always very helpful whether you participated or not, and that is exactly what I’m doing in this post. Due to time limitations, I will only cover a small part of the shared solutions. I’ll update the post if I find other interesting things later. ...
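As a quick illustration of the task setup (train on English, score comments in six other languages), here is a hedged sketch using a multilingual encoder from huggingface/transformers. The choice of xlm-roberta-base, the example sentences, and the single training step are my own assumptions for illustration, not taken from the shared solutions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"  # assumption: any multilingual encoder would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Fine-tune on English toxic / non-toxic comments ...
english_batch = tokenizer(
    ["you are an idiot", "have a nice day"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
labels = torch.tensor([1, 0])
loss = model(**english_batch, labels=labels).loss
loss.backward()  # one illustrative training step (optimizer loop omitted)

# ... then score non-English comments zero-shot.
test_batch = tokenizer(["eres un idiota"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    toxic_prob = torch.softmax(model(**test_batch).logits, dim=-1)[:, 1]
```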

August 5, 2020 · Ceshine Lee

TensorFlow 2.1 with TPU in Practice

Photo Credit Executive Summary TensorFlow has become much easier to use: as an experienced PyTorch developer who only knew a bit of TensorFlow 1.x, I was able to pick up TensorFlow 2.x in my spare time within 60 days and do competitive machine learning with it. TPUs have never been more accessible: the new TPU interface in TensorFlow 2.1 works right out of the box in most cases and greatly reduces the development time required to make a model TPU-compatible. Using TPUs drastically increases the iteration speed of experiments. We present a case study of solving a Q&A labeling problem by fine-tuning the RoBERTa-base model from the huggingface/transformers library: a codebase, a Colab TPU training notebook, a Kaggle inference kernel, and a high-level library, TF-HelperBot, that provides more flexibility than the Keras interface. (TensorFlow 2.1 and TPU are also a very good fit for CV applications. A case study of solving an image classification problem will be published in about a month.) Acknowledgment I was granted free access to Cloud TPUs for 60 days via the TensorFlow Research Cloud. It was for the TensorFlow 2.0 Question Answering competition; I chose to do the simpler Google QUEST Q&A Labeling competition first but unfortunately couldn’t find enough time to go back and do the original one (sorry!). ...
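For reference, the boilerplate that connects TensorFlow 2.1 to a TPU is quite short. The sketch below shows the distribution-strategy setup as I understand it; the empty TPU address and the toy Keras model built inside the strategy scope are placeholders, not the post's actual training code.

```python
import tensorflow as tf

# Connect to the TPU. On Colab the address is discovered from the environment;
# on GCP you would pass the TPU name or IP explicitly (placeholder here).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# In TF 2.1 the TPU strategy still lives under `experimental`.
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Any model built here is replicated across the TPU cores,
    # whether it is trained with Keras fit() or a custom loop.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(768,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
```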

February 13, 2020 · Ceshine Lee

[Notes] Jigsaw Unintended Bias in Toxicity Classification

Photo Credit Preamble Jigsaw hosted a toxic comment classification competition[2] in 2018, and has also created an API service for detecting toxic comments[3]. However, it has been shown that models trained on this kind of dataset tend to have some biases against minority groups. For example, the simple sentence “I am a black woman” would be classified as toxic, and as more toxic than the sentence “I am a woman”[4]. This year’s Jigsaw Unintended Bias in Toxicity Classification competition[1] introduces an innovative metric that aims to reduce such biases and challenges Kagglers to find the best score achievable on this year’s new dataset. ...
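The excerpt only alludes to the new metric. As a rough sketch of the subgroup-aware evaluation it is built on, the snippet below computes the per-identity AUCs (subgroup, BPSN, BNSP) that the competition metric aggregates with a power mean; the column names and function layout are my paraphrase, not the official implementation.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def bias_aucs(df, identity, label="target", pred="pred"):
    """Subgroup-aware AUCs for one identity column (boolean membership)."""
    in_group = df[identity].astype(bool)
    toxic = df[label] >= 0.5
    # Subgroup AUC: only comments that mention the identity.
    subgroup = roc_auc_score(toxic[in_group], df.loc[in_group, pred])
    # BPSN: non-toxic in-group comments vs. toxic background comments.
    bpsn_mask = (in_group & ~toxic) | (~in_group & toxic)
    bpsn = roc_auc_score(toxic[bpsn_mask], df.loc[bpsn_mask, pred])
    # BNSP: toxic in-group comments vs. non-toxic background comments.
    bnsp_mask = (in_group & toxic) | (~in_group & ~toxic)
    bnsp = roc_auc_score(toxic[bnsp_mask], df.loc[bnsp_mask, pred])
    return subgroup, bpsn, bnsp

def power_mean(values, p=-5.0):
    """Generalized mean used to aggregate the per-identity AUCs."""
    values = np.asarray(values, dtype=float)
    return np.power(values, p).mean() ** (1.0 / p)
```

The negative exponent in the power mean penalizes a model heavily for doing poorly on any single identity group, which is what pushes the metric toward less biased models.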

August 4, 2019 · Ceshine Lee

[Notes] iMet Collection 2019 - FGVC6 (Part 1)

Photo Credit Overview Preamble I started working on this competition (iMet Collection 2019 - FGVC6) seriously after hitting a wall in the Freesound competition. It was really late (only about one week until the competition ended), but by re-using a lot of code from the Freesound competition and using Kaggle Kernels to train models, I managed to get a decent submission with an F2 score of 0.622 on the private leaderboard (the top-1 solution got 0.672, but used a hell of a lot more resources to train). ...
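For context, the F2 score weights recall four times as heavily as precision. Below is a minimal sketch of one common way to compute it for multi-label predictions, averaged per sample; the binary indicator inputs and the 0.5 threshold implied by them are illustrative assumptions, not the competition's exact evaluation code.

```python
import numpy as np

def f2_score(y_true, y_pred, eps=1e-9):
    """Per-sample F-beta with beta=2, averaged over samples.
    Inputs are binary indicator matrices of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=1)
    precision = tp / (y_pred.sum(axis=1) + eps)
    recall = tp / (y_true.sum(axis=1) + eps)
    beta2 = 4.0  # beta ** 2 with beta = 2, i.e. recall counts 4x
    f2 = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return f2.mean()

# Example: two samples, five candidate attributes, predictions already thresholded.
y_true = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
y_pred = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 1]]
print(f2_score(y_true, y_pred))
```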

July 16, 2019 · Ceshine Lee