Dealing with Synthetic Data
Photo Credit Overview Kaggle recently hosted a competition (Instant Gratification) to test their new “synchronous Kernel-only competition” format. It features a synthetic dataset, and the best way to achieve high score on this dataset is to reverse-engineer the dataset creation algorithm. I did not really spend time into this competition, but after the competition was over I went back checked the discussion forum for solutions and insights shared, and found it actually quite interesting. There are quite a few of lessons to be learned about how to create or deal with synthetic data. ...