More Memory-Efficient Swish Activation Function

Update on 2020-08-22: using torch.cuda.max_memory_allocated() and torch.cuda.reset_peak_memory_stats() in the newer versions (1.6+) of PyTorch is probably more accurate. (reference) Motivation: Recently I’ve been trying out EfficientNet models implemented in PyTorch. I’ve managed to fine-tune pretrained EfficientNet models on my data set and reach accuracy on par with mainstream models like SE-ResNeXt-50. However, training the model from scratch has proven to be much harder. Fine-tuned EfficientNet models can reach the same accuracy with a much smaller number of parameters, but they seem to occupy a lot more GPU memory than they probably should (compared to the mainstream ones). There is an open issue on the GitHub repository about this problem: [lukemelas/EfficientNet-PyTorch] Memory Issues. ...

August 22, 2019 · Ceshine Lee

Customizing Spacy Sentence Segmentation

The Problem: Often in natural language processing (NLP), we want to split a large document into sentences, so we can analyze the individual sentences and the relationships between them. Spacy’s pretrained neural models provide such functionality via their syntactic dependency parsers. Spacy also provides a rule-based Sentencizer, which is very likely to fail on more complex sentences. While the statistical sentence segmentation of Spacy works quite well in most cases, there are still some weird cases on which it fails. One of them is the difficulty in handling the ’s tokens, which I noticed when using Spacy version 2.0.18 and model en_core_web_md version 2.0.0. ...

August 14, 2019 · Ceshine Lee

[Notes] Jigsaw Unintended Bias in Toxicity Classification

Preamble: Jigsaw hosted a toxic comment classification competition[2] in 2018, and has also created an API service for detecting toxic comments[3]. However, it has been shown that models trained on this kind of dataset tend to have some biases against minority groups. For example, a simple sentence “I am a black woman” would be classified as toxic, and also as more toxic than the sentence “I am a woman”[4]. This year’s Jigsaw Unintended Bias in Toxicity Classification competition[1] introduces an innovative metric that aims to reduce such biases, and challenges Kagglers to find the best score we can get on this year’s new dataset. ...

August 4, 2019 · Ceshine Lee

[Notes] iMet Collection 2019 - FGVC6 (Part 1)

Preamble: I started doing this competition (iMet Collection 2019 - FGVC6) seriously after hitting a wall in the Freesound competition. It was really late (only about one week before the competition ended), but by re-using a lot of code from the Freesound competition and using Kaggle Kernels to train models, I managed to get a decent submission with an F2 score of 0.622 on the private leaderboard (the top 1 solution got 0.672, but used a hell of a lot more resources to train). ...

July 16, 2019 · Ceshine Lee

Dealing with Synthetic Data

Overview: Kaggle recently hosted a competition (Instant Gratification) to test their new “synchronous Kernel-only competition” format. It features a synthetic dataset, and the best way to achieve a high score on this dataset is to reverse-engineer the dataset creation algorithm. I did not really spend time on this competition, but after it was over I went back and checked the discussion forum for the solutions and insights shared, and found it actually quite interesting. There are quite a few lessons to be learned about how to create or deal with synthetic data. ...

June 25, 2019 · Ceshine Lee