[Survey] Kaggle Quora 3rd place solution summary

A survey of the 3rd place solution [^2] to the Kaggle Quora Question Pairs competition [^1].

Overview of 3rd Place Solution
Author: Jared Turkewitz
Discussion URL: https://www.kaggle.com/c/quora-question-pairs/discussion/34288

Architecture

- Uses neural networks, LightGBM, and XGBoost
- The first layer of the model stack uses roughly 1300 features
- Uses LightGBM (about 5x faster than XGBoost, with slightly lower accuracy)
- Stacks 15 models (a stacking sketch follows this list)
- XGBoost is the strongest single model (CV = 0.185)
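
As a rough illustration of this two-layer stacking, the sketch below trains first-layer models out-of-fold and feeds their predictions to a simple meta-model. The feature matrix `X`, labels `y`, fold count, and all model parameters are placeholders, not the author's actual configuration.

```python
import numpy as np
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def oof_predictions(make_model, X, y, n_splits=5):
    """Out-of-fold predictions, so the second layer never sees leaked labels."""
    oof = np.zeros(len(X))
    for tr, va in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = make_model()
        model.fit(X[tr], y[tr])
        oof[va] = model.predict_proba(X[va])[:, 1]
    return oof

# Toy stand-ins for the real ~1300-feature matrix and duplicate labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

# First layer: diverse base models (the solution used 15 of them).
base_models = [
    lambda: lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05),
    lambda: xgb.XGBClassifier(n_estimators=200, learning_rate=0.05),
]
stack = np.column_stack([oof_predictions(m, X, y) for m in base_models])

# Second layer: a simple meta-model over the base-model predictions.
meta = LogisticRegression().fit(stack, y)
print(meta.predict_proba(stack)[:2, 1])
```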

Natural language processing features

- Natural language features: word match, similar-word match, etc.
- TF-IDF and LDA distances
- Word co-occurrence (pointwise mutual information [^4]) [^7]
- Number of matching words
- Fuzzy word-matching measures (edit distance, character N-gram distance); a feature sketch follows this list
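
A minimal sketch of a few pairwise features of this kind, assuming plain scikit-learn and the standard library; the write-up does not specify the solution's exact feature implementations, so every function and threshold here is illustrative.

```python
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def char_ngrams(text, n=3):
    """Character n-grams, for a typo-tolerant fuzzy comparison."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def pair_features(q1, q2, tfidf):
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    g1, g2 = char_ngrams(q1.lower()), char_ngrams(q2.lower())
    v1, v2 = tfidf.transform([q1]), tfidf.transform([q2])
    return {
        # Number of shared words and its normalized form.
        "word_match": len(w1 & w2),
        "word_match_ratio": len(w1 & w2) / max(len(w1 | w2), 1),
        # TF-IDF cosine similarity between the two questions.
        "tfidf_cosine": cosine_similarity(v1, v2)[0, 0],
        # Edit-distance-like similarity and character n-gram overlap.
        "seq_ratio": difflib.SequenceMatcher(None, q1, q2).ratio(),
        "ngram_jaccard": len(g1 & g2) / max(len(g1 | g2), 1),
    }

corpus = ["How do I learn Python?", "What is the best way to learn Python?"]
tfidf = TfidfVectorizer().fit(corpus)
print(pair_features(corpus[0], corpus[1], tfidf))
```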

Graph structure features

- Number of common words, frequency, question frequency for question 1 alone, question frequency for question 2 alone, etc. (a frequency-feature sketch follows)
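
The write-up gives no details, but question-frequency features of this kind are typically computed by counting how often each question string appears across all pairs. A minimal sketch, with a toy `pairs` table standing in for the competition data:

```python
from collections import Counter
import pandas as pd

# Toy stand-in for the competition's pair table (question1 / question2 columns).
pairs = pd.DataFrame({
    "question1": ["How do I learn Python?", "What is AI?", "How do I learn Python?"],
    "question2": ["What is the best way to learn Python?", "What is AI?", "What is AI?"],
})

# How often each question appears anywhere in the dataset: frequently repeated
# questions behave differently from one-off questions.
freq = Counter(pairs["question1"]) + Counter(pairs["question2"])

pairs["q1_freq"] = pairs["question1"].map(freq)   # frequency for question 1
pairs["q2_freq"] = pairs["question2"].map(freq)   # frequency for question 2
pairs["min_freq"] = pairs[["q1_freq", "q2_freq"]].min(axis=1)
print(pairs[["q1_freq", "q2_freq", "min_freq"]])
```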

Neural net

- Bidirectional LSTM
- Distributed representations (word embeddings)
- Pretrained GloVe vectors
- Part-of-speech embedding
- Named-entity embedding
- Dependency-parse embedding [^6]
- Siamese network [^3] (a model sketch follows this list)
- Attention component
  - Softmax matching
  - Maxpool matching
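
A minimal PyTorch sketch of a siamese BiLSTM with max-pooling over time, in the spirit of the architecture listed above; the layer sizes, comparison features, and classifier head are assumptions, and the attention components and extra embeddings are omitted.

```python
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    """Both questions share one BiLSTM encoder; the max-pooled states
    are compared and fed to a small classifier head."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        # In the solution this would be initialized from pretrained GloVe vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2 * hidden_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def encode(self, tokens):
        out, _ = self.encoder(self.embedding(tokens))  # (batch, seq, 2*hidden)
        return out.max(dim=1).values                   # max-pool over time

    def forward(self, q1_tokens, q2_tokens):
        h1, h2 = self.encode(q1_tokens), self.encode(q2_tokens)
        # Standard siamese comparison: concat, abs-diff, element-wise product.
        feats = torch.cat([h1, h2, (h1 - h2).abs(), h1 * h2], dim=-1)
        return self.classifier(feats).squeeze(-1)      # duplicate logit

model = SiameseBiLSTM(vocab_size=50_000)
q1 = torch.randint(1, 50_000, (8, 20))  # toy token-id batches
q2 = torch.randint(1, 50_000, (8, 20))
print(model(q1, q2).shape)  # torch.Size([8])
```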

Other ideas

- Selectively adjust predictions according to question frequency (a hedged sketch follows)
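
The summary does not say how this adjustment works, so the sketch below shows one plausible form: shrinking predictions toward the global duplicate prior for pairs whose rarer question is seen very few times. Every constant here (prior, threshold, weight) is hypothetical.

```python
import numpy as np

def adjust_by_frequency(pred, min_freq, prior=0.37, threshold=2, weight=0.7):
    """Hypothetical adjustment: for pairs whose rarer question appears fewer
    than `threshold` times, blend the prediction with the global duplicate
    prior. None of these constants come from the solution write-up."""
    pred = np.asarray(pred, dtype=float)
    rare = np.asarray(min_freq) < threshold
    adjusted = pred.copy()
    adjusted[rare] = weight * pred[rare] + (1 - weight) * prior
    return adjusted

preds = [0.9, 0.8, 0.1]
min_freq = [1, 5, 1]   # frequency of the rarer question in each pair
print(adjust_by_frequency(preds, min_freq))
```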

References
