Start studying: Saturday, December 7th
Teaching materials, etc.:
・Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): 12/7 (Sat) - 12/19 (Thu), finished reading
・Progate Python course (5 courses in total): 12/19 (Thu) - 12/21 (Sat), finished
・Andreas C. Müller, Sarah Guido, "Introduction to Machine Learning with Python" (Japanese edition, O'Reilly Japan, 2017): 12/21 (Sat) - 12/23 (Sat)
・Kaggle: Real or Not? NLP with Disaster Tweets: 12/28 (Sat) - 1/3 (Fri), submission and adjustment
・Wes McKinney, "Python for Data Analysis" (Japanese edition, O'Reilly Japan, 2018): 1/4 (Wed) - 1/13 (Mon), finished reading
・Yasuki Saito, "Deep Learning from Zero" (O'Reilly Japan, 2016): 1/15 (Wed) - 1/20 (Mon)
・**François Chollet, "Deep Learning with Python and Keras" (Queep, 2018): 1/21 (Tue) ~**
Read up to p. 261, partway through Chapter 6, "Deep Learning for Text and Sequences."
Finished the tokenization I was struggling with yesterday.
Data preprocessing (natural language processing)
```python
# type: pandas.core.series.Series
# Convert to lowercase
X_l = X.str.lower()
# Replace unnecessary characters with half-width spaces.
# Note: Series.replace() matches whole values, so substring replacement
# needs the .str accessor; a character class covers all target symbols.
X_r = X_l.str.replace(r'[,.#!]', ' ', regex=True)
# Split each element on half-width spaces
X_s = X_r.str.split(' ')
```
Defined together as a function:

```python
def make_vector(df):
    # Lowercase, replace punctuation with spaces, then split on spaces
    X_l = df.str.lower()
    X_r = X_l.str.replace(r'[,.#!]', ' ', regex=True)
    X_s = X_r.str.split(' ')
    return X_s
```
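As a quick sanity check, the helper can be run on a toy Series (the sample tweets below are hypothetical stand-ins for the Kaggle data, and this sketch assumes pandas is installed):

```python
import pandas as pd

def make_vector(df):
    # Lowercase, replace punctuation with spaces, then split on spaces
    X_l = df.str.lower()
    X_r = X_l.str.replace(r'[,.#!]', ' ', regex=True)
    return X_r.str.split(' ')

# Hypothetical sample tweets, just to exercise the function
tweets = pd.Series(["Forest fire near La Ronge!",
                    "All residents asked to #evacuate"])
tokens = make_vector(tweets)
```

Note that splitting on `' '` keeps empty strings where punctuation was replaced; `split()` with no argument would collapse runs of whitespace instead.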
Now that the text retrieved from the dataset has been tokenized, all that remains is to train the defined model. (Under implementation.)
Incidentally, at first I tried pulling the entries out one by one and looping over them with a for statement, but that didn't work. I wondered whether the Series could be preprocessed as-is, without extracting each element, and after looking into it I found that it could. I wrote the preprocessing while referring to the official pandas documentation (API reference, Series), and it succeeded.
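The difference described above can be sketched on toy data (the strings and variable names here are hypothetical): the `.str` accessor applies a string operation to every element of the Series at once, so no explicit loop is needed.

```python
import pandas as pd

s = pd.Series(["Hello, World!", "NLP #rocks"])

# Element-by-element loop: works, but verbose and slow on large Series
looped = pd.Series([t.lower().replace(',', ' ') for t in s])

# Vectorized: the .str accessor broadcasts the operation over all elements
vectorized = s.str.lower().str.replace(',', ' ', regex=False)
```

Both produce the same result, but the vectorized form is the idiomatic pandas way.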