Implement FTRL-Proximal Algorithm in Go - Part 1

For practice, I’ve rewritten tinrtgu’s solution to the Avazu challenge on Kaggle in Go. I’ve made some changes to reduce memory usage, but the underlying algorithm is basically the same. (See this paper, where the algorithm originated, for more information.) The Go code is on GitHub Gist. Constructive comments are welcome on the gist page, as I haven’t added a comment section to this blog. (I haven’t even set up Google Analytics, so I have no idea how many people are reading this blog.) I’m also working on a concurrent version that uses Go’s built-in concurrency support, so in theory it should run faster in a multi-core environment. ...
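As a rough illustration of the algorithm the post refers to, here is a minimal Python sketch (not the Go code from the gist) of the per-coordinate FTRL-Proximal update for logistic regression as it is commonly published. It assumes binary features supplied as a list of hashed indices; the class name `FTRLProximal`, the method names, and the default hyperparameters are hypothetical choices made for this example.

```python
from math import exp, sqrt


class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal for logistic regression.

    Assumption: each sample is a list of active feature indices (value 1).
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, dim=2 ** 20):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.n = [0.0] * dim  # accumulated squared gradients
        self.z = [0.0] * dim  # accumulated adjusted gradients

    def _weight(self, i):
        # Weights are derived lazily from z and n; L1 yields exact zeros.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        # Sigmoid of the dot product, clipped to avoid overflow in exp().
        wtx = sum(self._weight(i) for i in x)
        return 1.0 / (1.0 + exp(-max(min(wtx, 35.0), -35.0)))

    def update(self, x, p, y):
        # Gradient of log loss w.r.t. each active weight is (p - y) * 1.
        g = p - y
        for i in x:
            w = self._weight(i)
            sigma = (sqrt(self.n[i] + g * g) - sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
```

A training loop would call `predict` on each sample and then pass the prediction and the label to `update`. Storing only `z` and `n` and computing weights on demand is one way to keep memory usage low, though whether the Go version does this is not stated here.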

December 9, 2014 · Ceshine Lee

The Power of PyPy

PyPy is an alternative Python implementation that emphasizes speed and memory usage. I didn’t take it seriously until I wrote a Python script for a Kaggle competition that required hours to run. Someone on the Kaggle forum suggested that everyone give PyPy a try. I did, and it worked like magic. A 2 to 5 times speed boost can be achieved just by substituting python with pypy when you run a Python script. I don’t have an accurate number for that, but it was significantly faster. This is critical because now you have more time to try different models and hence get a better score in the competition. ...
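One way to check the difference on your own machine is to time a small CPU-bound script under both interpreters. The sketch below is only an illustration: the file name bench.py, the function `count_primes`, and the workload size are hypothetical, and the actual speedup depends heavily on the workload.

```python
import time


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    total = count_primes(200000)
    elapsed = time.perf_counter() - start
    print("primes found: %d, elapsed: %.2f s" % (total, elapsed))
```

Saving this as bench.py and running it once with python and once with pypy gives two elapsed times to compare. Tight pure-Python loops like this tend to benefit most from a JIT, so the result is not representative of scripts that spend most of their time in C extensions.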

November 29, 2014 · Ceshine Lee

Tip for using iPython Notebooks in virtualenv

When trying to install ipython and the dependencies of its notebook feature via pip, I got stuck. Even though I’d already installed pyzmq, I still got this message: ImportError: IPython.zmq requires pyzmq. It was quite frustrating until I found this post on StackOverflow. It turns out this can be solved by installing pyzmq with an extra parameter: pip install pyzmq --install-option="--zmq=bundled"

April 29, 2014 · Ceshine Lee

A simple script to automate MySQLdump backups

I just moved my MySQL database to an OpenVZ VPS, which doesn’t support snapshot backups, so I had to set up a backup mechanism myself. The solution I came up with is to use BitTorrent Sync to sync my backups to another server. It turned out to be much faster than transferring backups with scp and much easier (and perhaps more secure) than using FTP. I highly recommend BitTorrent Sync. ...

March 5, 2014 · Ceshine Lee

Shortcuts for some common statistical functions

Here are some useful functions when performing statistical analysis:

Confidence Interval

    from scipy import stats
    from numpy import mean, var
    from math import sqrt

    sample = [1, 2, 3, 4, 5]
    # 95% confidence interval
    R = stats.t.interval(0.95, len(sample)-1, loc=mean(sample), scale=sqrt(var(sample)/len(sample)))

    >>> R
    (1.2440219338298311, 4.7559780661701687)

SciPy documentation

Correlation Coefficient

    from numpy import corrcoef

    x = [1, 2, 3, 4, 100]
    y = [6, 7, 8, 9, 10]
    r = corrcoef(x, y)

    >>> r
    array([[ 1.        ,  0.72499943],
           [ 0.72499943,  1.        ]])

SciPy documentation

Linear Regression

    from scipy import stats

    x = [1, 2, 3, 4, 5]
    y = [6, 7, 8, 9, 10]
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

    >>> slope, intercept
    (1.0, 5.0)

SciPy documentation
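To make it clearer what the correlation and regression shortcuts compute, here is a small standard-library sketch of the underlying formulas: the Pearson correlation coefficient and the ordinary least-squares slope and intercept. The function name `pearson_and_fit` is made up for this example, and it is a simplified stand-in, not how NumPy or SciPy implement these internally.

```python
from math import sqrt


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-deviations from the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)             # Pearson correlation coefficient
    slope = sxy / sxx                      # least-squares slope
    intercept = mean_y - slope * mean_x    # line passes through the means
    return r, slope, intercept
```

Calling it with the x and y lists from the examples above should agree with the values reported by corrcoef and linregress up to floating-point rounding.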

February 25, 2014 · Ceshine Lee