Analyzing Tweets with R

Introduction: NLP (natural language processing) is hard, partly because humans are hard to understand. We need good tools to help us analyze text. Even if the text is eventually fed into a black-box model, exploratory analysis is very likely to help you build a better model. I’ve heard great things about the R package tidytext and recently decided to give it a try. The package authors also wrote a book about it and kindly released it online: Text Mining with R: A guide to text analysis within the tidy data framework, using the tidytext package and other tidy tools. ...

February 27, 2018 · Ceshine Lee

Feature Importance Measures for Tree Models — Part I

2018-02-20 Update: Added two images (random forest and gradient boosting). 2019-05-25 Update: I’ve published a post covering another importance measure, SHAP values, on my personal blog and on Medium. This post is inspired by a Kaggle kernel and its discussions [1]. I’d like to briefly review common algorithms for measuring feature importance with tree-based models. We can interpret the results to check our intuition (no surprisingly important features), do feature selection, and guide the direction of feature engineering. ...

October 28, 2017 · Ceshine Lee

[Learning Note] Single Shot MultiBox Detector with PyTorch — Part 3

(Reminder: The SSD paper and the PyTorch implementation used in this post. Also, the first and second parts of the series.) Training Objective / Loss Function: Every deep learning model / neural network needs a differentiable objective function to learn from. After pairing ground truths with default boxes and marking the remaining default boxes as background, we’re ready to formulate the objective function of SSD: Overall Objective (Formula (1) from the original paper) ...
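For reference, the overall objective that the excerpt refers to, Formula (1) of the SSD paper, can be written as below. This is a transcription using the paper's notation, not something taken from the PyTorch implementation:

```latex
L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha \, L_{loc}(x, l, g) \right)
```

Here N is the number of matched default boxes (the loss is set to 0 when N = 0), x holds the indicators that match default boxes to ground truth boxes, L_conf is the softmax loss over the class confidences c, L_loc is the Smooth L1 loss between the predicted box parameters l and the ground truth box parameters g, and α is a weight term that the paper sets to 1 by cross-validation.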

July 27, 2017 · Ceshine Lee

[Learning Note] Single Shot MultiBox Detector with PyTorch — Part 2

In the previous post we discussed the network structure and the prediction scheme of SSD. Now we move on to combining default boxes with the ground truth, so that the quality of the predictions can be measured (and improved via training). (Reminder: The SSD paper and the PyTorch implementation used in this post.) Map Default Boxes to Coordinates on Input Images: Parameters of default boxes for each feature map are pre-calculated and hard-coded in data/config.py: ...

July 26, 2017 · Ceshine Lee

[Learning Note] Single Shot MultiBox Detector with PyTorch — Part 1

Recently I’ve been trying to pick up PyTorch as well as some deep learning algorithms for object detection. To kill two birds with one stone, I decided to read the Single Shot MultiBox Detector paper alongside one of its PyTorch implementations, written by Max deGroot. Admittedly, I had some trouble understanding some of the ideas in the paper. After reading the implementation and scratching my head for a while, I think I’ve figured out at least some of them. The following are my notes on some confusing concepts after my first and second passes of reading. ...

July 24, 2017 · Ceshine Lee