[PYTHON] Learning record 11 (15th day) Kaggle participation

Learning record (15th day)

Start studying: Saturday, December 7th

Teaching materials, etc.:
・ Miyuki Oshige, "Details! Python3 Introductory Note" (Sotec, 2017): Completed on Thursday, December 19th
・ Progate Python course (5 courses in total): Completed on Saturday, December 21st
・ Andreas C. Müller, Sarah Guido, "(Japanese title) Machine learning starting with Python" (O'Reilly Japan, 2017): Completed on Saturday, December 23

Kaggle first participation

Competition entered: Real or Not? NLP with Disaster Tweets, from 12/24 (Tue)

The task is to classify tweets into those that report on a disaster and those that do not. As a field, it falls under natural language processing.

The competition runs until March next year, but I would like to make a first submission by Friday, January 10th at the latest, about two weeks from today.

I was fortunate to be able to form a team with people from a university laboratory that I am currently indebted to, which is very encouraging, but I will make sure to produce my own output so that I do not end up relying on them entirely.

Data preprocessing

・ Get an overview of the data with head(), shape, and describe()
・ Check the missing values and the number of training samples
・ Drop (possibly) unnecessary columns with drop('column name', axis=1)
・ Extract the relevant text column with df["column name"] and turn it into a list with tolist()
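The steps above can be sketched roughly as follows. This is only a minimal illustration: the small DataFrame and its column names (`keyword`, `location`, `text`, `target`) are hypothetical stand-ins, and the actual competition data may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for the training data; real columns and values may differ.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "keyword": [None, "fire", None, "flood"],
        "location": ["Tokyo", None, None, "Osaka"],
        "text": [
            "Forest fire near the town",
            "What a lovely day",
            "Flood warning issued for the river",
            "I love this song",
        ],
        "target": [1, 0, 1, 0],
    }
)

# Overview of the data
print(df.head())
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # summary statistics of the numeric columns

# Missing values per column and the number of training rows
print(df.isnull().sum())
print(len(df))

# Drop columns that are (possibly) unnecessary
trimmed = df.drop(["keyword", "location"], axis=1)

# Extract the text column and turn it into a plain Python list
texts = trimmed["text"].tolist()
print(texts)
```

Note that drop() returns a new DataFrame here, so the original `df` (including its `target` column) is still available afterwards.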

Data extraction / corpus creation

・ Define stop words (such as "and" and "or") as a set, built with split()
・ Convert each text to lowercase with lower() and split it into words with split() inside a for loop
・ Output with pprint() (pprint inserts a line break for each element, which makes the result easier to read)
・ Count how many times each word appears and exclude words that appear fewer than a specified number of times
・ Build a dictionary from the resulting word lists using gensim's corpora.Dictionary() (corpus completed)
・ Convert to an LDA model
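A minimal sketch of the tokenizing and filtering steps, using only the standard library. The sample sentences, the stop-word list, and the threshold `min_count` are assumptions chosen for illustration, not the values actually used.

```python
from collections import Counter
from pprint import pprint

# Hypothetical sample tweets; the real texts come from the competition data.
documents = [
    "Forest fire near the town and the river",
    "Flood warning issued for the town",
    "I love this song or that song",
]

# Assumed stop-word list and frequency threshold, for illustration only
stop_words = set("and or the for i this that".split())
min_count = 2

# Lowercase each document, split on whitespace, and drop stop words
texts = [
    [word for word in document.lower().split() if word not in stop_words]
    for document in documents
]

# Count how often each word appears across all documents
frequency = Counter(word for text in texts for word in text)

# Keep only the words that appear at least min_count times
filtered = [
    [word for word in text if frequency[word] >= min_count]
    for text in texts
]

pprint(filtered)
```

Token lists like `filtered` are the kind of input that gensim's `corpora.Dictionary` accepts, and the dictionary's `doc2bow` method can then turn each list into a bag-of-words representation.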

Vectorization has succeeded up to this point, but I noticed two problems: because the text was simply converted as-is, the number of dimensions has reached several thousand, and the target (whether or not a tweet is about a disaster) is not linked to the extracted information.
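A toy illustration of why the dimension count grows this way, assuming a bag-of-words style conversion (the actual conversion code is not shown in this record, and the word lists here are hypothetical): each distinct word becomes one dimension, so the vector length equals the vocabulary size.

```python
# Hypothetical tokenized tweets
texts = [["fire", "town"], ["flood", "town"], ["song", "song"]]

# Vocabulary: one index per distinct word
vocabulary = sorted({word for text in texts for word in text})
index = {word: i for i, word in enumerate(vocabulary)}

# One count vector per tweet, with one dimension per vocabulary word
vectors = []
for text in texts:
    vector = [0] * len(vocabulary)
    for word in text:
        vector[index[word]] += 1
    vectors.append(vector)

print(len(vocabulary))  # number of dimensions
print(vectors)
```

With only four distinct words the vectors have four dimensions; with the vocabulary of a full set of tweets, the same construction would produce thousands.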

For now, I have not worked out how to connect the target to the vectorized text, but I will keep at it tomorrow.
