First Step of Web Scraping in Go

Web-related data science projects often require a fair amount of web scraping. Python has a well-known scraping framework called Scrapy, which aims to accommodate all kinds of scenarios. For those who want more control over the process and don’t mind getting their hands dirty, GRequests (or the good old Requests) combined with BeautifulSoup can also be a solid solution. However, multi-threading in Python can be a real pain in the neck. In addition, Scrapy depends on Twisted, which is not yet Python 3-ready, and there is no clear roadmap for when the project will finish migrating to Python 3.x. These constraints made me start looking for faster and more robust alternatives. ...

August 29, 2015 · Ceshine Lee

Implement FTRL-Proximal Algorithm in Go - Part 2

I actually finished the concurrent version of the algorithm a while ago, right after the previous post. Unfortunately, my laptop broke and it took almost a month to repair, so I am only now getting to publish the result here. I know that the code is neither elegant nor properly documented, but it’s a start. You’ll need to set the core variable in the main function to the number of cores of your CPU. The program will train that many models simultaneously and output the average of the predictions from all of the models. ...
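
To make the train-in-parallel-then-average idea concrete, here is a minimal sketch. It is not the code from the post: the `model` type, `trainAll`, and `averagePrediction` are hypothetical names, the "training" step is a placeholder, and `runtime.NumCPU()` is used only as an assumed alternative to setting the core count by hand.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// model is a hypothetical stand-in for one trained learner.
type model struct{ bias float64 }

// predict returns a dummy score; a real model would use the features in x.
func (m model) predict(x []float64) float64 { return m.bias }

// trainAll trains n models concurrently, one goroutine per model.
// Each goroutine writes only to its own slice element, so no lock is needed.
func trainAll(n int) []model {
	models := make([]model, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			models[i] = model{bias: float64(i)} // placeholder for real training
		}(i)
	}
	wg.Wait() // block until every model has finished training
	return models
}

// averagePrediction returns the mean of the individual model predictions.
func averagePrediction(models []model, x []float64) float64 {
	sum := 0.0
	for _, m := range models {
		sum += m.predict(x)
	}
	return sum / float64(len(models))
}

func main() {
	core := runtime.NumCPU() // assumption: one model per logical CPU
	models := trainAll(core)
	fmt.Println(averagePrediction(models, []float64{1, 2}))
}
```

Waiting on the `sync.WaitGroup` before averaging ensures that no prediction is read from a model that is still being trained.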

January 2, 2015 · Ceshine Lee

Implement FTRL-Proximal Algorithm in Go - Part 1

For the sake of practice, I’ve rewritten tinrtgu’s solution to the Avazu challenge on Kaggle in Go. I’ve made some changes to save memory, but the underlying algorithm is basically the same. (See the paper the algorithm came from for more information.) The Go code has been put on GitHub Gist. Any constructive comments are welcome on that gist page, as I haven’t added a comment section to this blog. (I haven’t even set up Google Analytics, so I have no idea how many people are reading this blog.) I’m also working on a concurrent version that uses Go’s built-in support for concurrency, so in theory it should run faster in a multi-core environment. ...

December 9, 2014 · Ceshine Lee