Clutter-free Interactive Charts in R using Plotly

This is a short post describing how to use Plotly in R to make text-heavy charts cleaner. David Robinson presented a beautiful way to visualize the ratings of The Office's episodes in a screencast. The chart (shown below) is sufficiently readable when zoomed in on a full-HD monitor, but becomes quite messy when exported to a smaller frame. Moreover, some of the episode names are not displayed (to avoid overlapping). ...

March 31, 2020 · Ceshine Lee

TensorFlow 2.1 with TPU in Practice

Executive summary: TensorFlow has become much easier to use. As an experienced PyTorch developer who knew only a bit of TensorFlow 1.x, I was able to pick up TensorFlow 2.x in my spare time over 60 days and do competitive machine learning. TPUs have never been more accessible: the new TPU interface in TensorFlow 2.1 works out of the box in most cases and greatly reduces the development time required to make a model TPU-compatible, and using a TPU drastically increases the iteration speed of experiments. We present a case study of solving a Q&A labeling problem by fine-tuning the RoBERTa-base model from the huggingface/transformers library: a codebase, a Colab TPU training notebook, a Kaggle inference kernel, and a high-level library, TF-HelperBot, that provides more flexibility than the Keras interface. (TensorFlow 2.1 and TPUs are also a very good fit for CV applications; a case study of solving an image classification problem will be published in about a month.) Acknowledgment: I was granted free access to Cloud TPUs for 60 days via the TensorFlow Research Cloud, for the TensorFlow 2.0 Question Answering competition. I chose to do the simpler Google QUEST Q&A Labeling competition first but unfortunately couldn't find enough time to go back and do the original one (sorry!). ...
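The "works out of the box" claim refers to TensorFlow 2.1's distribution-strategy interface: detecting the TPU, connecting to it, and wrapping model construction in a strategy scope. A minimal, environment-dependent setup sketch (not code from the post; the fallback branch is an assumption for machines without a TPU):

```python
import tensorflow as tf

# Sketch of TPU initialization as of TensorFlow 2.1. On a machine without
# a TPU, TPUClusterResolver raises and we fall back to the default strategy.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
except ValueError:
    strategy = tf.distribute.get_strategy()

# Any model built inside the scope is replicated across TPU cores.
with strategy.scope():
    model = ...  # e.g. fine-tune RoBERTa-base here
```

The same training loop then runs unchanged on CPU, GPU, or TPU, which is what makes the iteration speed-up essentially free.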

February 13, 2020 · Ceshine Lee

Create a Customized Text Annotation Tool in Two Days - Part 2

In Part 1 of this series, we discussed why building your own annotation tool can be a good idea, and demonstrated a back-end API server based on FastAPI. Now in Part 2, we're going to build a front-end interface that interacts with the end user (the annotator). The front-end mainly needs to do three things: fetch a batch of sentence/paragraph pairs to be annotated from the back-end server; present the pairs to the annotator and provide a way for them to adjust the automatically generated labels; and send the annotated results back to the back-end server. Disclaimer: I'm relatively inexperienced in front-end development, and the code here may seem extremely amateur to professionals. However, I hope this post can serve as a reference or starting point for those with similar requirements. ...
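The three responsibilities above form a simple fetch–edit–submit loop. A minimal sketch of that data flow, with a hypothetical in-memory dict standing in for the FastAPI back-end (all names and records here are invented for illustration):

```python
# Stub back-end: pair id -> sentence pair plus an auto-generated label.
BACKEND = {
    1: {"pair": ("sentence A", "sentence B"), "label": 1},
    2: {"pair": ("sentence C", "sentence D"), "label": 0},
}

def fetch_batch(size=2):
    """1. Fetch a batch of pairs (with auto-generated labels) to annotate."""
    return {pid: dict(item) for pid, item in list(BACKEND.items())[:size]}

def annotate(batch, corrections):
    """2. Apply the annotator's adjustments to the generated labels."""
    for pid, label in corrections.items():
        batch[pid]["label"] = label
    return batch

def submit(batch):
    """3. Send the annotated results back to the server."""
    BACKEND.update(batch)

batch = fetch_batch()
submit(annotate(batch, {2: 1}))  # annotator flips the label of pair 2
```

In the real tool the two stub calls at the boundaries would be HTTP requests to the FastAPI endpoints from Part 1; the loop structure stays the same.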

December 17, 2019 · Ceshine Lee

Create a Customized Text Annotation Tool in Two Days - Part 1

In my previous post, Fine-tuning BERT for Similarity Search, I mentioned that I annotated 2,000 sentence pairs, but did not describe how I did it or what tool I used. Now, in this two-part series, we'll see how I created a customized text annotation tool that greatly speeds up the annotation process. The entire stack was developed in two days. You can probably do it a lot faster if you are familiar with the technology (the actual time I spent on it was about six hours, tops). ...

December 16, 2019 · Ceshine Lee

Fine-tuning BERT for Similarity Search

Synopsis: I have the task of finding similar entries among 8,000+ pieces of news, using their titles and edited short descriptions in Traditional Chinese. I tried LASER[1] first, but later found that Universal Sentence Encoder[2] seemed to work slightly better. Results from these unsupervised approaches are already acceptable, but still suffer occasional confusion and hiccups. Not entirely satisfied with the unsupervised approaches, I collected and annotated 2,000 pairs of news and fine-tuned the BERT model on this dataset. This supervised approach is visibly better than the unsupervised ones, and it's also quite sample-efficient: three hundred and fifty training examples are already enough to beat Universal Sentence Encoder by a large margin. ...
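Both unsupervised baselines work the same way: encode each title/description into a fixed-length vector and rank candidate pairs by cosine similarity. A minimal sketch of that scoring step, using toy 4-dimensional vectors in place of real encoder output (LASER and Universal Sentence Encoder emit much higher-dimensional embeddings):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of two news titles; near-duplicates score close to 1.
title_a = [0.2, 0.8, 0.1, 0.5]
title_b = [0.25, 0.75, 0.05, 0.55]
score = cosine_similarity(title_a, title_b)
```

Fine-tuning BERT on the annotated pairs replaces this fixed geometry with a similarity function learned from the 2,000 labeled examples, which is where the supervised gains come from.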

November 28, 2019 · Ceshine Lee