[PYTHON] Classifying the SNLI dataset with Word2Vec + LSTM

SNLI is a dataset for classifying the relationship between two given sentences, a premise and a hypothesis. To tackle this task, I implemented a Keras model that combines pretrained Word2Vec embeddings [^6] with an LSTM.

Details of the SNLI dataset [^1] can be found in the article "How to read the SNLI dataset" [^2]; I hope it serves as a useful reference.

Preparation

Download SNLI dataset

You can download it from The Stanford Natural Language Inference (SNLI) Corpus page:

wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
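Once unzipped, the dataset consists of JSON-lines files such as snli_1.0_train.jsonl, one example per line. A minimal sketch of reading them (the field names `gold_label`, `sentence1`, and `sentence2` are from the SNLI format; examples whose `gold_label` is "-" lacked annotator agreement and are conventionally discarded):

```python
import json

# SNLI's three relation classes mapped to integer labels.
LABELS = {"entailment": 0, "neutral": 1, "contradiction": 2}

def load_snli(path):
    """Return a list of (premise, hypothesis, label) tuples."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] not in LABELS:  # skip "-" (no gold label)
                continue
            pairs.append((ex["sentence1"], ex["sentence2"], LABELS[ex["gold_label"]]))
    return pairs
```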

Download trained Word2Vec weights

Word2Vec weights pretrained on Google News can be downloaded via the link "GoogleNews-vectors-negative300.bin.gz" on the following page [^6].

https://code.google.com/archive/p/word2vec/
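To feed these vectors into a Keras Embedding layer, one common approach (an assumption here, not necessarily what the gist does) is to load the binary file with gensim's `KeyedVectors.load_word2vec_format(..., binary=True)` and copy the vectors into a matrix indexed by your tokenizer's word indices. The sketch below works with any mapping that supports `word in vectors` and `vectors[word]`, which includes gensim's `KeyedVectors`:

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, dim=300):
    """Row i holds the pretrained vector of the word with index i.

    Row 0 is reserved for padding; words absent from the pretrained
    vocabulary stay all-zero.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix
```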

Implementation

The code below [^5] trains a model that combines pretrained Word2Vec and an LSTM on the SNLI dataset. The article [^3] may also be helpful.

https://gist.github.com/namakemono/4e2a37375edf7a5e849b0a499897dcbe
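The overall architecture can be sketched as follows. This is a hypothetical reconstruction, not the gist itself: a frozen embedding layer initialized from the pretrained Word2Vec matrix, a single LSTM encoder shared by premise and hypothesis, and a softmax over the three SNLI labels. The vocabulary size, sequence length, and LSTM width are illustrative assumptions:

```python
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import LSTM, Dense, Embedding, Input, concatenate
from tensorflow.keras.models import Model

def build_model(vocab_size=20000, embed_dim=300, max_len=42,
                embedding_matrix=None):
    # Shared embedding layer; frozen when initialized from pretrained vectors.
    if embedding_matrix is not None:
        embed = Embedding(vocab_size, embed_dim,
                          embeddings_initializer=Constant(embedding_matrix),
                          trainable=False)
    else:
        embed = Embedding(vocab_size, embed_dim)

    encoder = LSTM(300)  # one encoder shared by both sentences

    premise = Input(shape=(max_len,), dtype="int32")
    hypothesis = Input(shape=(max_len,), dtype="int32")

    # Encode both sentences, concatenate, and classify into 3 labels.
    merged = concatenate([encoder(embed(premise)), encoder(embed(hypothesis))])
    output = Dense(3, activation="softmax")(merged)

    model = Model(inputs=[premise, hypothesis], outputs=output)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then amounts to calling `model.fit` with the padded integer sequences for premises and hypotheses and one-hot encoded labels.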

Experimental result

The model achieves roughly the same performance as the existing 300D LSTM baseline [^11]. From here, Decomposable Attention [^7][^8] or ESIM [^9][^10] can be used to improve performance further.

References
