The Power of PyPy

PyPy is an alternative Python implementation that emphasizes speed and memory usage. I didn’t take it seriously until I wrote a Python script for a Kaggle competition that took hours to run. Someone on the Kaggle forum suggested giving PyPy a try, so I did, and it worked like magic. You can get a 2x to 5x speed boost just by substituting pypy for python when you run a script. I don’t have an accurate number, but it was significantly faster. This matters because it leaves you more time to try different models and hence get a better score in the competition. ...
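To get a rough number for the speedup on your own machine, a small CPU-bound script like the one below can be timed under both interpreters. This is only a sketch: the function `count_primes` and the workload size are made up for illustration, and the actual ratio depends heavily on what your code does.

```python
import time


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately loop-heavy)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    result = count_primes(20000)  # arbitrary workload size, increase for longer runs
    elapsed = time.perf_counter() - start
    print("primes found:", result)
    print("elapsed seconds:", round(elapsed, 3))
```

Saved as bench.py (a hypothetical file name), the same file can be run with `python bench.py` and then `pypy bench.py` to compare the two timings.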

November 29, 2014 · Ceshine Lee

Tip for using IPython Notebooks in virtualenv

When I tried to install IPython and the dependencies of its notebook feature via pip, I got stuck. Even though I had already installed pyzmq, I still got this message:

    ImportError: IPython.zmq requires pyzmq

It was quite frustrating until I found this post on StackOverflow. It turns out this can be solved by installing pyzmq with an extra parameter:

    pip install pyzmq --install-option="--zmq=bundled"
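After reinstalling, it can help to confirm that pyzmq is actually importable from the interpreter inside the virtualenv. The snippet below is a minimal sketch of such a check; the helper name `pyzmq_status` is made up for illustration.

```python
def pyzmq_status():
    """Report whether pyzmq can be imported in the current environment."""
    try:
        import zmq
    except ImportError as exc:
        # Same class of error that IPython runs into when pyzmq is unusable.
        return "pyzmq not importable: {}".format(exc)
    return "pyzmq {} (libzmq {})".format(zmq.pyzmq_version(), zmq.zmq_version())


if __name__ == "__main__":
    print(pyzmq_status())
```

Run it with the virtualenv's own python so that it checks the same site-packages directory that IPython will use.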

April 29, 2014 · Ceshine Lee

Shortcuts for some common statistical functions

Here are some useful functions for statistical analysis:

Confidence Interval

    from scipy import stats
    from numpy import mean, var
    from math import sqrt

    sample = [1, 2, 3, 4, 5]
    # 95% confidence interval
    # (numpy's var defaults to ddof=0, i.e. the population variance)
    R = stats.t.interval(0.95, len(sample)-1, loc=mean(sample), scale=sqrt(var(sample)/len(sample)))

    >>> R
    (1.2440219338298311, 4.7559780661701687)

SciPy documentation

Correlation Coefficient

    from numpy import corrcoef

    x = [1, 2, 3, 4, 100]
    y = [6, 7, 8, 9, 10]
    r = corrcoef(x, y)

    >>> r
    array([[ 1.        ,  0.72499943],
           [ 0.72499943,  1.        ]])

SciPy documentation

Linear Regression

    from scipy import stats

    x = [1, 2, 3, 4, 5]
    y = [6, 7, 8, 9, 10]
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

    >>> slope, intercept
    (1.0, 5.0)

SciPy documentation
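One detail worth knowing about the confidence interval above: numpy's var uses the population variance (ddof=0) by default, while a t-based interval is usually built from the sample variance (ddof=1). The sketch below shows that variant using scipy.stats.sem, which defaults to ddof=1. The helper name `t_confidence_interval` is made up for illustration, and the interval comes out slightly wider than the one shown above.

```python
from numpy import mean
from scipy import stats


def t_confidence_interval(sample, confidence=0.95):
    """t-based confidence interval for the mean, using the sample variance."""
    df = len(sample) - 1
    # stats.sem computes the standard error of the mean with ddof=1 by default
    return stats.t.interval(confidence, df, loc=mean(sample), scale=stats.sem(sample))


low, high = t_confidence_interval([1, 2, 3, 4, 5])
print(low, high)  # roughly 1.037 and 4.963
```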

February 25, 2014 · Ceshine Lee

Discussing the Zen of Python

Every Python newbie should have already heard of or read PEP-8, a.k.a. THE style guide for Python code, which I hope to cover in one of the next few posts. However, there’s one more important piece of ground to cover before you can become a professional: The Zen of Python. You can read it right inside the Python interpreter:

    import this

The Zen of Python

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren’t special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you’re Dutch.
    Now is better than never.
    Although never is often better than right now.
    If the implementation is hard to explain, it’s a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let’s do more of those!

Interpretations and Confusions

The Zen of Python describes the philosophy the creators of Python held when designing the language. However, apart from PEP-8, no official interpretation of these 19 aphorisms exists. This makes it somewhat confusing for Python learners, myself included, to understand and apply them. ...
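The aphorisms can also be read programmatically. In CPython, the this module stores the text ROT13-encoded in its attribute `s`, so a short sketch like the following can decode it and confirm the count of 19. The helper name `zen_aphorisms` is made up for illustration.

```python
import codecs
import contextlib
import io


def zen_aphorisms():
    """Return the Zen of Python as a list of lines, without the title."""
    # Importing `this` prints the Zen on first import; capture that output.
    with contextlib.redirect_stdout(io.StringIO()):
        import this
    text = codecs.decode(this.s, "rot_13")
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[1:]  # drop the title line "The Zen of Python, by Tim Peters"


if __name__ == "__main__":
    aphorisms = zen_aphorisms()
    print(len(aphorisms))
    print(aphorisms[0])
```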

October 7, 2013 · Ceshine Lee