[PYTHON] Learning record No. 28 (32nd day)

Learning record (32nd day)

Started studying: Saturday, December 7

Teaching materials, etc.:

- Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): read 12/7 (Sat) - 12/19 (Thu)
- Progate Python course (5 courses in total): 12/19 (Thu) - 12/21 (Sat), finished
- Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python" (O'Reilly Japan, 2017): 12/21 (Sat) - 12/23 (Mon)
- Kaggle: Real or Not? NLP with Disaster Tweets: submitted 12/28 (Sat) - 1/3 (Fri), then adjustments
- Wes McKinney, "Python for Data Analysis" (O'Reilly Japan, 2018): read 1/4 (Sat) - 1/13 (Mon)
- Yasuki Saito, "Deep Learning from Scratch" (O'Reilly Japan, 2016): 1/15 (Wed) - 1/20 (Mon)
- **François Chollet, "Deep Learning with Python and Keras" (Queep, 2018): 1/21 (Tue) -**

"Deep learning with Python and Keras"

Finished reading Chapter 6, "Deep Learning for Text and Sequences," up to p. 244.

Learned about word embeddings

- Pretrained network (word embeddings): a network that was trained on a large dataset and then saved. **If the original dataset is large and general enough, the spatial hierarchy of features it has learned effectively serves as a general-purpose model of the world.**

Just as pretrained CNNs work well for image classification (thanks to translation invariance of patterns and the learned spatial hierarchy), pretrained word embeddings are advantageous in natural language processing as long as the features required are fairly generic, that is, common visual or semantic features.

The pretrained model is applied as the weights of an Embedding layer. An Embedding layer can be thought of simply as a "dictionary that maps an integer index representing a particular word to a dense vector" (word index → **Embedding layer** → corresponding word vector).
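
As a minimal sketch of this idea in Keras (the sizes here are placeholder values for illustration, not the ones used later):

from keras.models import Sequential
from keras.layers import Embedding

# Placeholder sizes: 10,000-word vocabulary, 100-d vectors, sequences of length 20
model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=100, input_length=20))

# Input:  integer word indices, shape (batch_size, 20)
# Output: dense word vectors,   shape (batch_size, 20, 100)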

The Kaggle competition I tried before (Real or Not? NLP with Disaster Tweets) was a natural language processing problem, so I am now going through trial and error trying to apply a pretrained model (gensim: glove-twitter) to that dataset.

Building an index that maps words to embedding vectors


import os
import numpy as np

glove_dir = '/Users/***/gensim-data/glove-twitter-100'  # extract the ZIP file in advance

embedding_index = {}
with open(os.path.join(glove_dir, 'glove-twitter-100')) as f:
    for line in f:
        values = line.split()
        word = values[0]                                 # first token on each line is the word
        coefs = np.asarray(values[1:], dtype='float32')  # the rest is its 100-d vector
        embedding_index[word] = coefs

print('Found %s word vectors.' % len(embedding_index))

# Found 1193515 word vectors.
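
The next step in the book's procedure would be to pack these vectors into a weight matrix for the Embedding layer. A minimal sketch, assuming a word_index dictionary produced by a tokenizer (see the next section) and placeholder sizes:

max_words = 10000      # assumed vocabulary cap (placeholder)
embedding_dim = 100    # glove-twitter-100 vectors have 100 dimensions

# word_index: {word: integer index} from a tokenizer (assumed to exist)
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        vector = embedding_index.get(word)
        if vector is not None:           # words missing from GloVe stay all-zeros
            embedding_matrix[i] = vector

As in the book, the matrix would then be loaded with embedding_layer.set_weights([embedding_matrix]) and the layer frozen with embedding_layer.trainable = False.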

Where I'm struggling

- Tokenizing train.csv['text']: last time I could convert the whole column in one shot with tfidf_vectorizer, but this time the texts have to be tokenized in advance because they are fed through the Embedding layer... and for some reason it does not work. The book processes text with the Keras built-in Tokenizer, so I tried the same procedure (sketched below), but got the error shown after it.
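
For reference, the Tokenizer procedure from the book looks roughly like this (num_words and maxlen are placeholder values; train is assumed to be the DataFrame read from train.csv):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=10000)             # keep only the 10,000 most frequent words
tokenizer.fit_on_texts(train['text'])              # build the word index from the tweet texts
sequences = tokenizer.texts_to_sequences(train['text'])
word_index = tokenizer.word_index
data = pad_sequences(sequences, maxlen=20)         # pad/truncate each tweet to a fixed length

print('Found %s unique tokens.' % len(word_index))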

(Error screenshot: a.png)

Making full use of Google.
