[PYTHON] We have released a trained model of fastText

We have released a pre-trained fastText model. You can download the trained model from:

Information about embedding vectors is also summarized in the following repository, so please take a look at it as well: awesome-embedding-models

Motivation

In the following article, I linked to the vectors that icoxfog417 published on GitHub.

However, downloading those vectors required Git LFS, and their location was hard to find. So this time I trained a model myself and published it in a form that is easy to download.

How it was made

I referred to the following article for how to use fastText. It is a good article that explains both the theory and the usage of fastText.

The training data is Wikipedia as of 2017/01/01.

The hyperparameters were set as follows. All other hyperparameters use their default values.

How to use

After downloading the data, you can load it as follows (when using gensim).

import gensim

# Load the downloaded vectors, which are stored in word2vec text format (binary=False).
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec', binary=False)

You can look up similar words as follows.

>>> model.most_similar(positive=['Japanese'])
[('Korean', 0.7338133454322815),
 ('Chinese', 0.717720627784729),
 ('American', 0.6725355982780457),
 ('Japanese woman', 0.6723321676254272),
 ('Foreigner', 0.6420464515686035),
 ('Filipino', 0.6264426708221436),
 ('Westerners', 0.621786892414093),
 ('Asian', 0.6192302703857422),
 ('Taiwanese', 0.6034690141677856),
 ('Nikkei', 0.5906497240066528)]
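
To make the output above more concrete, here is a minimal sketch of roughly what happens behind a single-word `most_similar` query, written without gensim. It assumes the `.vec` file follows the word2vec text format (a header line with the vocabulary size and dimension, then one word and its vector per line) and ranks the other words by cosine similarity. The toy vectors and the helper names `load_vec`, `cosine`, and `most_similar` are made up for illustration; they are not part of the released model or of gensim's implementation.

```python
import io
import math

# Toy data in word2vec text format: a header "<vocab_size> <dim>",
# then one "<word> <v1> ... <vdim>" entry per line.
# These vectors are invented for illustration, not taken from the released model.
TOY_VEC = """4 3
tokyo 0.9 0.1 0.0
osaka 0.8 0.2 0.1
apple 0.0 0.9 0.3
banana 0.1 0.8 0.4
"""


def load_vec(stream):
    """Parse word2vec text format into a {word: [floats]} dict."""
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        line = line.rstrip()
        if not line:
            continue
        # Split from the right so a token that contains spaces stays intact.
        parts = line.rsplit(" ", dim)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError("header vocabulary size does not match the body")
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(vectors, word, topn=10):
    """Return the topn (word, similarity) pairs closest to the query word."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


toy_vectors = load_vec(io.StringIO(TOY_VEC))
print(most_similar(toy_vectors, "tokyo", topn=2))
```

With the real model, gensim does the same kind of ranking over the whole vocabulary using vectorized operations, so in practice you would keep using `model.most_similar` and treat this only as an explanation of the scores.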

Have a good NLP life!
