Kaggle

Photo Credit Introduction Jigsaw toxic comment classification challenge features a multi-label text classification problem with a highly imbalanced dataset. The test set used originally was revealed to be already public on the Internet, so a new dataset was released mid-competition, and the evaluation metric was changed from Log Loss to AUC. I tried a few ideas after building up my PyTorch pipeline but did not find any innovative approach that looks promising. Text normalization is the only strategy I had found to give solid improvements, but it is very time consuming. The final result (105th place/about top 3%) was quite fitting IMO given the time I spent on this competition(not a lot). ...