Playing with rstudio/gt R Package

Tables can be an effective way to communicate data. Though not as powerful as charts at telling stories, tables can cram a lot of numbers into a limited space and give readers accurate, potentially useful information that they can interpret in their own ways. I came across a new R package, gt (“Easily generate information-rich, publication-quality tables from R”), and decided to give it a try. ...

January 22, 2019 · Ceshine Lee

More Portable, Reproducible R Development Environment

R is awesome. In my opinion, it’s the best (free) tool for telling great stories with data. My first post on Medium was about R. Although most of what I write here involves Python, I still try to get back to R from time to time. I briefly mentioned my preferred R setup, which includes Microsoft R Open (MRO) and the checkpoint package, in a previous post, “Analyzing Tweets with R” (in the “R tips” section). Unfortunately, checkpoint doesn’t work well with RStudio, and some weird issues with MRO have become more and more annoying to me. So I decided to find a new setup that works more smoothly and reliably. After some trial and error, here is the configuration I ended up most satisfied with: ...

January 3, 2019 · Ceshine Lee

Use TextRank to Extract the Most Important Sentences in an Article

Motivation: I’m trying to build an NLP system that automatically highlights the important parts of an article to help people read long articles. The common practice is to start with a simple baseline model that is useful enough, and then incrementally improve its performance. The TextRank algorithm [1], which I also used as a baseline in a text summarization system, is a natural fit for this task. ...

December 7, 2018 · Ceshine Lee
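As a rough illustration of the TextRank baseline mentioned in the entry above, here is a minimal Python sketch, not the post's actual implementation. It assumes the sentence-extraction variant from the original TextRank paper: sentences are graph nodes, edges are weighted by word overlap normalized by log sentence lengths, and a weighted PageRank iteration (damping factor 0.85, here a fixed 50 iterations) scores each sentence. The naive regex tokenizer and the helper names `sentence_similarity`, `textrank`, and `top_sentences` are illustrative assumptions.

```python
import math
import re
from typing import List


def sentence_similarity(a: List[str], b: List[str]) -> float:
    """Word-overlap similarity used for sentence extraction in the TextRank
    paper: |shared words| / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: List[str], d: float = 0.85, iters: int = 50) -> List[float]:
    """Score sentences with weighted PageRank over the similarity graph."""
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights; no self-loops.
    w = [[sentence_similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # WS(i) = (1 - d) + d * sum_j w[j][i] / out_weight[j] * WS(j)
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def top_sentences(sentences: List[str], k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with any other sentence receives no incoming weight, so its score stays at the floor value 1 - d = 0.15; returning the selected sentences in their original order keeps a highlighted summary readable.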

Implementing Beam Search - Part 2

Overview: Part one showed how OpenNMT-py produces output sequences for a batch of input sequences (the Translator._translate_batch method) and how it conducts beam search (the Beam objects): “Implementing Beam Search (Part 1) - A Source Code Analysis of OpenNMT-py”. Now we turn our attention to some of the details we skipped over in part one: the advanced features that influence how the translator produces output candidates/hypotheses. They fall into two categories: rule-based and number-based. ...

November 7, 2018 · Ceshine Lee

Implementing Beam Search - Part 1

As hinted in the previous post, “Building a Summary System in Minutes”, in this post I’ll try to do some source code analysis of the OpenNMT-py project, starting with its beam search implementation. Beam search is widely used in seq2seq models, but I haven’t yet had a good grasp of its details. OpenNMT-py’s translator/predictor is also one of the most powerful I’ve seen, coming with a wide range of parameters and options. ...

November 5, 2018 · Ceshine Lee
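For readers new to the technique discussed in the beam search entries above, here is a minimal, framework-free Python sketch of beam search. It does not mirror OpenNMT-py's Beam class or its options; it simply keeps the `beam_size` highest-scoring hypotheses (by summed log-probability) at each step and sets aside hypotheses that emit the end token. The `TOY_MODEL` table, `toy_step`, and the `beam_search` signature are hypothetical, chosen so that greedy decoding and a wider beam disagree.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, summed log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    bos: str,
    eos: str,
    beam_size: int = 3,
    max_len: int = 10,
) -> List[Hypothesis]:
    """Return hypotheses sorted best-first by summed log-probability."""
    beams: List[Hypothesis] = [([bos], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [tok], score + math.log(prob)))
        # Keep only the beam_size best expansions (stable sort keeps ties in order).
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))  # completed: stop growing it
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    return sorted(finished, key=lambda h: h[1], reverse=True)


# Hypothetical toy "model": next-token probabilities depend only on the last token.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
    "w": {"</s>": 1.0},
}


def toy_step(tokens: List[str]) -> Dict[str, float]:
    return TOY_MODEL[tokens[-1]]
```

With `beam_size=1` this degenerates to greedy decoding: it commits to `A` and returns `<s> A x </s>` with probability 0.3. With `beam_size=2`, the `B` branch survives the first step and the search finds `<s> B z </s>` with probability 0.36. In this simplified version the beam shrinks as hypotheses finish; real implementations often handle finished hypotheses, length penalties, and minimum lengths differently.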