Mistake I Made that Crippled My Streamlit App

Photo Credit Streamlit an increasingly popular tool that allows Python developers to turn data scripts into interactive web applications in a few lines of code. I recently developed and deployed a semantic search app for news articles in Chinese, and I made a mistake not caching the model loading code. The performance was abysmal, and the memory footprint was huge for a TinyBERT-4L model (had to allocate 1GB of memory for the app). ...

March 14, 2021 · Ceshine Lee

A Case Study of fastcore @patch_to

Photo Credit Motivation I recently came across this new image data augmentation technique called SnapMix. It looks like a very sensible improvement over CutMix, so I was eager to give it a try. The SnapMix author provides a PyTorch implementation. I made some adjustments to improve the numeric stability and converted it to a callback in PyTorch Lightning. I encountered one major obstacle during the process — SnapMix uses Class Activation Mapping(CAM) to calculate an augmented example’s label weights. It requires access to the final linear classifier’s weight and the model activations before the pooling operation. Some PyTorch pre-trained CV models do implement methods to access these two things, but the namings are inconsistent. We need a unified API to do this. ...

February 19, 2021 · Ceshine Lee

Deploying EfficientNet Model using TorchServe

Photo Credit Introduction AWS recently released TorchServe, an open-source model serving library for PyTorch. The production-readiness of Tensorflow has long been one of its competitive advantages. TorchServe is PyTorch community’s response to that. It is supposed to be the PyTorch counterpart of Tensorflow Serving. So far, it seems to have a very strong start. This post from the AWS Machine Learning Blog and the documentation of TorchServe should be more than enough to get you started. But for advanced usage, the documentation is a bit chaotic and the example code suggests sometimes conflicting ways to do things. ...

May 4, 2020 · Ceshine Lee

Tensorflow Profiler with Custom Training Loop

Photo Credit Introduction The Tensorflow Profiler in the upcoming Tensorflow 2.2 release is a much-welcomed addition to the ecosystem. For image-related tasks, often the bottleneck is the input pipeline. But you also don’t want to spend time optimizing the input pipeline unless it is necessary. The Tensorflow Profiler makes pinpointing the bottleneck of the training process much easier, so you can decide where the optimization effort should be put into. ...

April 24, 2020 · Ceshine Lee

Monitor Python Script Cron Jobs using Telegram

Photo Credit Motivation Apache Airflow is great for managing scheduled workflows, but in a lot of cases, it is an overkill and brings unnecessary complexity to the overall solution. Cron jobs are much easier to set up, have built-in support in most systems, and have a very flat learning curve. However, the lack of monitoring features and the consequential silent failures can be the bane of system admins’ lives. We want a simple solution that can help admins monitor the health of cron jobs in simple scenarios that do not warrant Airflow. The simple scenarios have the following characteristics: ...

April 10, 2020 · Ceshine Lee