[Notes] “Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science”

This article by Christopher Tong got a lot of love from people I follow on Twitter, so I decided to read it. It was very enlightening. To be honest, though, I don’t fully understand quite a few of its arguments, probably because I lack experience with more rigorous scientific experiments and research. Nonetheless, I think writing down the parts I find interesting and putting them into a blog post will be beneficial for myself and other potential readers. Hopefully, it makes it easier to reflect on this material later. ...

November 9, 2019 · Ceshine Lee

Pro Tip: Use Shutdown Scripts to Detect Preemption on GCP

Motivation I was recently given $300 of Google Cloud Platform credit from Google for participating in a Kaggle competition. This gives me free access to powerful GPUs (T4, P100, even V100) to train models and thus opens the window to many new possibilities. However, the problem is that $300 can be used up rather quickly. For example, one Tesla P100 GPU costs $1.46 per hour, so $300 only buys about 200 hours, or 8.5 days. And don’t forget there are still other costs from CPU, memory, and disk storage. ...
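For context, here is a minimal sketch (mine, not from the post) of how a shutdown script on a preemptible GCE instance can ask the metadata server whether it is being preempted; the checkpoint-saving step is a hypothetical placeholder.

```python
# Hedged sketch: a GCE shutdown script that checks the metadata server for
# preemption. What to do on preemption (e.g. upload a checkpoint) is left
# as a placeholder.
import requests

PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)

def was_preempted() -> bool:
    # The metadata server requires this header and answers "TRUE" or "FALSE".
    resp = requests.get(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}, timeout=2
    )
    return resp.text.strip() == "TRUE"

if __name__ == "__main__":
    if was_preempted():
        print("Preempted: upload the latest checkpoint before the VM stops.")
    else:
        print("Regular shutdown or manual stop.")
```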

October 25, 2019 · Ceshine Lee

Zero Shot Cross-Lingual Transfer with Multilingual BERT

Synopsis Do you want multilingual sentence embeddings, but only have a training dataset in English? This post presents an experiment that fine-tuned a pre-trained multilingual BERT model (“BERT-Base, Multilingual Uncased” [1][2]) on the monolingual (English) AllNLI dataset [4] to create a sentence embedding model (one that maps a sentence to a fixed-size vector) [3]. The experiment shows that the fine-tuned multilingual BERT sentence embeddings generally perform better (i.e., have lower error rates) than the baselines in a multilingual similarity search task (the Tatoeba dataset [5]). However, the error rates are still significantly higher than those of specialized sentence embedding models trained on multilingual datasets [5]. ...
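As a rough illustration (my own sketch, not the post’s training code), this is how a multilingual BERT checkpoint can be turned into fixed-size sentence vectors via mean pooling and used for a toy similarity search; the AllNLI fine-tuning described in the post is omitted here.

```python
# Minimal sketch, not the post's code: mean-pooled sentence vectors from the
# pre-trained "bert-base-multilingual-uncased" checkpoint (no AllNLI fine-tuning).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = AutoModel.from_pretrained("bert-base-multilingual-uncased")
model.eval()

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling -> fixed-size vectors

# Toy similarity search in the spirit of the Tatoeba evaluation.
emb = embed(["How are you?", "Wie geht es dir?", "The cat sat on the mat."])
print(torch.nn.functional.cosine_similarity(emb[0:1], emb[1:]))
```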

September 24, 2019 · Ceshine Lee

More Memory-Efficient Swish Activation Function

Update on 2020-08-22: using torch.cuda.max_memory_allocated() and torch.cuda.reset_peak_memory_stats() in the newer versions (1.6+) of PyTorch is probably more accurate. (reference) Motivation Recently I’ve been trying out EfficientNet models implemented in PyTorch. I’ve managed to successfully fine-tune pretrained EfficientNet models on my dataset and reach accuracy on par with mainstream models like SE-ResNeXt-50. However, training the model from scratch has proven to be much harder. Fine-tuned EfficientNet models can reach the same accuracy with a much smaller number of parameters, but they seem to occupy a lot more GPU memory than they probably should (compared to the mainstream models). There is an open issue on the GitHub repository about this problem — [lukemelas/EfficientNet-PyTorch] Memory Issues. ...
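To illustrate the kind of trick involved (a sketch under my own assumptions, not necessarily the exact implementation discussed in the post): a custom autograd Function can recompute the sigmoid in the backward pass instead of storing the intermediate activations, trading a little compute for a noticeably smaller memory footprint.

```python
import torch

class MemoryEfficientSwish(torch.autograd.Function):
    """Swish(x) = x * sigmoid(x), saving only the input for the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)            # keep x only; recompute sigmoid later
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (s * (1 + x * (1 - s)))

swish = MemoryEfficientSwish.apply
```

Peak usage before and after such a change can be compared with torch.cuda.reset_peak_memory_stats() and torch.cuda.max_memory_allocated(), as noted in the update above.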

August 22, 2019 · Ceshine Lee

Customizing Spacy Sentence Segmentation

The Problem Often in natural language processing (NLP), we want to split a large document into sentences so we can analyze the individual sentences and the relationships between them. spaCy’s pretrained neural models provide this functionality via their syntactic dependency parsers. spaCy also provides a rule-based Sentencizer, which is very likely to fail on more complex sentences. While spaCy’s statistical sentence segmentation works quite well in most cases, there are still some odd cases where it fails. One of them is the difficulty in handling ’s tokens, which I noticed when using Spacy version 1.0.18 and model en_core_web_md version 2.0.0. ...
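As an illustration (my own sketch, assuming spaCy 2.x pipeline conventions and a hypothetical component name, not the post’s exact solution), one way to customize segmentation is to add a component before the parser that forbids sentence breaks at possessive ’s tokens:

```python
import spacy

def no_split_on_possessive(doc):
    # Hypothetical rule: never start a new sentence at a possessive "'s" token.
    for token in doc[1:]:
        if token.text in ("'s", "’s"):
            token.is_sent_start = False
    return doc

nlp = spacy.load("en_core_web_md")
# In spaCy 2.x, the component must run before the dependency parser so the
# parser respects the preset is_sent_start values.
nlp.add_pipe(no_split_on_possessive, before="parser")

doc = nlp("The company's CEO resigned. The board's response was swift.")
for sent in doc.sents:
    print(sent.text)
```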

August 14, 2019 · Ceshine Lee