Python

Tensorflow Profiler with Custom Training Loop

Photo Credit Introduction The Tensorflow Profiler in the upcoming Tensorflow 2.2 release is a much-welcomed addition to the ecosystem. For image-related tasks, often the bottleneck is the input pipeline. But you also don’t want to spend time optimizing the input pipeline unless it is necessary. The Tensorflow Profiler makes pinpointing the bottleneck of the training process much easier, so you can decide where the optimization effort should be put into. ...

Monitor Python Script Cron Jobs using Telegram

Photo Credit Motivation Apache Airflow is great for managing scheduled workflows, but in a lot of cases, it is an overkill and brings unnecessary complexity to the overall solution. Cron jobs are much easier to set up, have built-in support in most systems, and have a very flat learning curve. However, the lack of monitoring features and the consequential silent failures can be the bane of system admins’ lives. We want a simple solution that can help admins monitor the health of cron jobs in simple scenarios that do not warrant Airflow. The simple scenarios have the following characteristics: ...

Customizing Spacy Sentence Segmentation

Photo Credit The Problem Often in natural language processing(NLP), we would want to split a large document into sentences, so we can analyze the individual sentences and the relationship between them. Spacy’s pretrained neural models provide such functionality via their syntactic dependency parsers. It also provides a rule-based Sentencizer, which will be very likely to fail with more complex sentences. While the statistical sentence segmentation of spacy works quite well in most cases, there are still some weird cases on which it fails. One of them is the difficulty in handling the ’s tokens, which I noticed when using Spacy version 1.0.18 and model en_core_web_md version 2.0.0. ...

A First Look at Plotly Express

Photo Credit Plotly has a new high-level wrapper libaray for Python called Plotly Express. Along with the new theming system introduced late last year, this post documents me trying out the new API and features. It also includes simple comparisons between the base Plotly.py API and the Plotly Express, and my initial thoughts on Plotly Express. This post does not intend to cover all kind of plots. Only plots relevant to the particular dataset used here (basically bar charts) are covered. ...

UMAP on RAPIDS (15x Speedup)

A_Different_Perspective from Pixabay RAPIDS RAPIDS is a collection of Python libraries from NVIDIA that enables the users to do their data science pipelines entirely on GPUs. The two main components are cuDF and cuML. The cuDF library provides Pandas-like data frames, and cuML mimics scikit-learn. There’s also a cuGRAPH graph analytics library that have been introduced in the latest release (0.6 on March 28). The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS is incubated by NVIDIA® based on years of accelerated data science experience. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. ...