Word cloud memo
# All words (example below)
$ vocab
array(['a', 'able', 'at', ..., 'zebra', 'zone', 'zoo'], dtype='<U79')
# TF-IDF vector for each document
$ TF_IDF
array([[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[61.9792226 , 0. , 3.38385083, ..., 0. ,
0. , 0. ],
[ 0. , 0. , 6.76770166, ..., 0. ,
0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[ 2.75463212, 0. , 0. , ..., 0. ,
0. , 0. ],
[ 1.37731606, 2.84060202, 0. , ..., 0. ,
0. , 0. ]])
words = vocab.tolist()
vecs = TF_IDF.tolist()
vecs_dic = []  # one {word: TF-IDF weight} dict per document
for vec in vecs:
    temp_dic = {}
    for i in range(len(vec)):
        temp_dic[words[i]] = vec[i]
    vecs_dic.append(temp_dic)
$ len(vecs_dic)
(number of documents)
$ len(vecs_dic[0])
(number of vector dimensions)
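The same conversion can also be written with dict(zip(...)). The following is a minimal sketch; the short words and vecs lists are hypothetical stand-ins for vocab.tolist() and TF_IDF.tolist().

```python
# Toy stand-ins for vocab.tolist() and TF_IDF.tolist() (hypothetical values)
words = ["a", "able", "at"]
vecs = [
    [0.0, 0.0, 0.0],
    [61.9792226, 0.0, 3.38385083],
]

# Build one {word: weight} dict per document
vecs_dic = [dict(zip(words, vec)) for vec in vecs]

print(len(vecs_dic))     # number of documents
print(len(vecs_dic[0]))  # number of vector dimensions
```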
# Visualize the 89th document in the document list (index 88)
from wordcloud import WordCloud
import matplotlib.pyplot as plt
wordcloud = WordCloud(background_color='white', width=1024, height=674)
wordcloud.generate_from_frequencies(vecs_dic[88])
plt.figure()  # create the figure before drawing, so no empty extra window appears
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
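Before plotting a document, it may help to look at its largest weights. The sketch below is only an illustration: top_words is a hypothetical helper, and the doc dictionary is a toy stand-in for vecs_dic[88]. If the largest weight printed is 0, every word in that document has weight 0.

```python
def top_words(weights, n=5):
    """Return the n (word, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Toy stand-in for vecs_dic[88] (hypothetical values)
doc = {"a": 2.75463212, "able": 0.0, "at": 6.76770166, "zoo": 0.0}

print(top_words(doc, n=2))
```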
Resolved by adding a small value to every element, following reference [2].
words = vocab.tolist()
vecs = TF_IDF.tolist()
vecs_dic = []
for vec in vecs:
    temp_dic = {}
    for i in range(len(vec)):
        temp_dic[words[i]] = vec[i] + 1e-5  # prevent the element from being 0
    vecs_dic.append(temp_dic)
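A different approach, not the one used in this memo, would be to drop the zero-weight words instead of shifting every value. The sketch below assumes the trouble comes from weights of 0, as the code comment suggests; nonzero_weights is a hypothetical helper and the data is a toy example. A document whose weights are all 0 ends up with an empty dictionary, so it would have to be skipped rather than drawn.

```python
def nonzero_weights(weights):
    """Keep only the words whose weight is strictly positive."""
    return {word: w for word, w in weights.items() if w > 0}

# Toy stand-in for vecs_dic (hypothetical values)
vecs_dic = [
    {"a": 0.0, "able": 0.0, "at": 0.0},
    {"a": 61.9792226, "able": 0.0, "at": 3.38385083},
]

filtered = [nonzero_weights(d) for d in vecs_dic]

# Indices of documents that still have at least one word to draw
drawable = [i for i, d in enumerate(filtered) if d]
print(drawable)
```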
To save the images, add wordcloud.to_file and modify the code as follows.
# Save one PNG per document, numbered from 1
for i, v in enumerate(vecs_dic, start=1):
    wordcloud = WordCloud(background_color='white', width=1024, height=674)
    wordcloud.generate_from_frequencies(v)
    wordcloud.to_file([PATH] + str(i) + ".png")  # no trailing space in the extension
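The output file names can also be built with pathlib. This is a minimal sketch under assumptions: the directory name wordclouds and the document count n_docs are hypothetical, and the zero-padded numbering is a choice made here so that the files sort in document order.

```python
from pathlib import Path

out_dir = Path("wordclouds")  # hypothetical output directory, stands in for [PATH]
n_docs = 3                    # hypothetical; in the memo this would be len(vecs_dic)

# One path per document, numbered from 1 and zero-padded to three digits
paths = [out_dir / f"{i:03d}.png" for i in range(1, n_docs + 1)]

for p in paths:
    print(p)
```

Each path can then be passed to wordcloud.to_file as str(p), after the directory has been created with out_dir.mkdir(parents=True, exist_ok=True).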
[1] https://qiita.com/pma1013/items/d183b4b2504173ba037e
[2] https://github.com/amueller/word_cloud/issues/456