UMAP on RAPIDS (15x Speedup)

RAPIDS is a collection of Python libraries from NVIDIA that enables users to run their data science pipelines entirely on GPUs. The two main components are cuDF and cuML. The cuDF library provides Pandas-like data frames, and cuML mimics scikit-learn. There’s also a cuGraph graph analytics library, which was introduced in the latest release (0.6 on March 28). The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS is incubated by NVIDIA® based on years of accelerated data science experience. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. ...
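To give a flavor of the scikit-learn-style interface, here is a minimal sketch of running UMAP through cuML (assuming a working RAPIDS installation and a CUDA-capable GPU; the input file is hypothetical and the import path may vary across releases):

```python
import cudf
from cuml import UMAP  # GPU-accelerated UMAP from cuML

# Read a CSV straight into GPU memory as a Pandas-like data frame
gdf = cudf.read_csv("data.csv")  # hypothetical input file

# The estimator mirrors scikit-learn's fit/transform interface,
# but the computation runs on the GPU
reducer = UMAP(n_neighbors=15, n_components=2)
embedding = reducer.fit_transform(gdf)
print(embedding.shape)  # (n_rows, 2)
```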

March 30, 2019 · Ceshine Lee

Use NVIDIA Apex for Easy Mixed Precision Training in PyTorch

The Apex project from NVIDIA is touted as a PyTorch extension that lets developers do mixed precision and distributed training “with 4 or fewer line changes to the existing code”. It’s been out for a while (circa June 2018) and seems to be well received (huggingface/pytorch-pretrained-BERT uses Apex to do 16-bit training). So I decided to give it a try. This post documents what I’ve learned. ...
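For context, the few-line pattern looks roughly like the following sketch using Apex’s amp module (the amp API has changed across releases, so treat this as illustrative; the dummy batch exists only to keep the snippet self-contained):

```python
import torch
import torch.nn.functional as F
from apex import amp

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Changed lines 1-2: wrap model and optimizer for mixed precision
# ("O1" patches common ops to run in FP16 where it is safe)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Dummy batch just to make the sketch runnable
inputs = torch.randn(32, 10).cuda()
targets = torch.randint(0, 2, (32,)).cuda()

optimizer.zero_grad()
loss = F.cross_entropy(model(inputs), targets)

# Changed lines 3-4: scale the loss so FP16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```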

March 26, 2019 · Ceshine Lee

Multilingual Similarity Search Using Pretrained Bidirectional LSTM Encoder

Previously I’ve demonstrated how to use a pretrained BERT model to create a similarity measure between two documents in this post: News Topic Similarity Measure using Pretrained BERT Model. However, to find entries similar to *N* documents in a corpus A of size *M*, we need to run *N·M* feed-forwards. A more efficient and widely used method is to use neural networks to generate sentence/document embeddings, and calculate cosine similarity scores between these embeddings. ...
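To make the embedding approach concrete, here is a minimal sketch (with random vectors standing in for real encoder outputs) that scores *N* query embeddings against *M* corpus embeddings in a single matrix multiplication instead of *N·M* forward passes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: N query and M corpus embeddings
queries = rng.normal(size=(5, 512))     # N = 5
corpus = rng.normal(size=(1000, 512))   # M = 1000

# L2-normalize so the dot product equals cosine similarity
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# One (N, M) matrix of cosine scores
scores = queries @ corpus.T

# Index of the most similar corpus entry for each query
best = scores.argmax(axis=1)
print(best)
```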

February 15, 2019 · Ceshine Lee

News Topic Similarity Measure using Pretrained BERT Model

In this post we establish a topic similarity measure among the news articles collected from the New York Times RSS feeds. The main purpose is to familiarize ourselves with the (PyTorch) BERT implementation and pretrained model(s). What is BERT? BERT stands for Bidirectional Encoder Representations from Transformers. It comes from a paper published by Google AI Language in 2018[1]. It is based on the idea that fine-tuning a pretrained language model can help the model achieve better results in downstream tasks[2][3]. ...
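For a sense of what the PyTorch implementation looks like, here is a minimal sketch of loading a pretrained model with huggingface’s pytorch-pretrained-BERT package (referenced above); the example sentence is made up, and details may vary by library version:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Download (or load from cache) the vocabulary and pretrained weights
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "[CLS] the market rallied after the announcement . [SEP]"
token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
tokens_tensor = torch.tensor([token_ids])

with torch.no_grad():
    # encoded_layers: hidden states from each Transformer layer
    encoded_layers, pooled = model(tokens_tensor)
print(len(encoded_layers), pooled.shape)
```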

February 10, 2019 · Ceshine Lee

Playing with rstudio/gt R Package

Tables can be an effective way of communicating data. Though not as powerful at telling stories as charts, tables pack a lot of numbers into a limited space, providing readers with accurate and potentially useful information that they can interpret in their own ways. I’ve come across this new R package gt (Easily generate information-rich, publication-quality tables from R) and decided to give it a try. ...

January 22, 2019 · Ceshine Lee