Start studying: Saturday, December 7th
Teaching materials, etc.:
・ Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): completed on Thursday, December 19th
・ Progate Python course (5 courses in total): completed on Saturday, December 21st
・ Andreas C. Müller, Sarah Guido, "(Japanese title) Machine learning starting with Python" (O'Reilly Japan, 2017): completed on Monday, December 23rd
Competition entered: Real or Not? NLP with Disaster Tweets, from 12/24 (Tue)
The task is to classify tweets into those that report a disaster and those that do not. As a field, it falls under natural language processing.
The competition runs until March next year, but I would like to make my first submission by Friday, January 10th at the latest, about two weeks from today.
I was fortunate enough to form a team with members of a university laboratory who have been helping me out, which is very encouraging. Still, I will make sure to produce my own output so that I do not end up relying on them entirely.
・ Get an overview of the data with head(), shape, and describe()
・ Check the missing values and the number of training samples
・ Drop (possibly) unnecessary columns with drop('column name', axis=1)
・ Extract the text column with df["column name"] and convert it to a list with tolist()
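As a rough sketch of the inspection steps above, the snippet below runs them on a tiny hand-made DataFrame. The rows and the column names (`id`, `keyword`, `location`, `text`, `target`) are assumptions made up for illustration, not the actual competition file, and `location` is dropped only as an example of a possibly unnecessary column.

```python
import pandas as pd

# Tiny stand-in for the training data (hypothetical rows and column names).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "keyword": [None, "fire", None, "flood"],
    "location": [None, "Tokyo", None, None],
    "text": [
        "Forest fire near the town",
        "What a fire performance last night",
        "I love this sunny weather",
        "Flood warning issued for the river area",
    ],
    "target": [1, 0, 0, 1],
})

print(df.head())      # first few rows
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # summary statistics of the numeric columns

# Missing values per column and the number of training samples.
missing = df.isnull().sum()
n_train = len(df)
print(missing)
print(n_train)

# Drop a (possibly) unnecessary column; axis=1 means "drop a column".
trimmed = df.drop("location", axis=1)

# Extract the text column as a plain Python list.
texts = trimmed["text"].tolist()
print(texts)
```

Note that drop() returns a new DataFrame here, so the original `df` is left untouched.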
・ Define the stop words (such as "and" and "or") as a string and split it with split()
・ Convert each text to lowercase with lower() and split it into words with split() inside a for loop
・ Output the result with pprint() (pprint breaks long output across lines, element by element, which makes it easier to read)
・ Count how many times each word appears and exclude words that appear fewer than a specified number of times
・ Build a dictionary from the resulting word lists with gensim's corpora.Dictionary() (corpus completed)
・ Convert to an LDA model
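The preprocessing steps above can be sketched with the standard library alone. The sample texts, the stop word list, and the threshold `MIN_COUNT` are assumptions for illustration. The last part builds a word-to-id mapping and bag-of-words pairs by hand, as a stand-in for what gensim's corpora.Dictionary() and its doc2bow() method produce; the ids gensim assigns may differ, and the LDA step itself is not shown.

```python
from collections import Counter
from pprint import pprint

# Hypothetical sample texts standing in for the tweet list.
texts = [
    "Forest fire near the town and the school",
    "Fire crews respond to the forest fire",
    "Flood warning for the town or the coast",
    "I love the sunny weather",
]

# Stop words written as one string and split into a set.
stoplist = set("and or the to for i".split())

# Lowercase each text, split it into words, and remove the stop words.
tokenized = [
    [word for word in text.lower().split() if word not in stoplist]
    for text in texts
]

# Count how often each word appears across all texts.
frequency = Counter(word for tokens in tokenized for word in tokens)

# Keep only words that appear at least MIN_COUNT times (assumed threshold).
MIN_COUNT = 2
filtered = [
    [word for word in tokens if frequency[word] >= MIN_COUNT]
    for tokens in tokenized
]
pprint(filtered)

# Hand-rolled dictionary: assign an integer id to every remaining word.
token2id = {
    word: idx
    for idx, word in enumerate(sorted({w for tokens in filtered for w in tokens}))
}

# Bag-of-words corpus: one list of (word id, count) pairs per text.
corpus = [
    sorted(Counter(token2id[word] for word in tokens).items())
    for tokens in filtered
]
pprint(corpus)
```

A text whose words are all filtered out, like the last one here, ends up as an empty list, which is worth keeping in mind when the threshold is raised.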
I managed to get as far as vectorization, but I noticed two problems. Because the words were simply converted as they are, the number of dimensions has reached several thousand. In addition, the target, which indicates whether a tweet is about a disaster or not, is not linked to the extracted features.
I have not yet worked out how to connect them, but I will keep at it tomorrow.
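One possible direction for these two problems, sketched under assumptions and not something I have settled on: keep the tweets and their targets in the same order so that row i of the features always belongs to label i, and shrink the vocabulary by keeping only words that occur in at least a minimum number of tweets. The token lists, the labels, and the threshold `MIN_DOCS` below are made up for illustration.

```python
from collections import Counter

# Hypothetical tokenized tweets and their targets, kept in the same order.
tokenized = [
    ["forest", "fire", "town"],
    ["fire", "forest", "fire"],
    ["sunny", "weather"],
    ["town", "flood"],
]
targets = [1, 1, 0, 1]

# Document frequency: in how many tweets does each word occur?
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

# Keep only words seen in at least MIN_DOCS tweets to limit the dimensions.
MIN_DOCS = 2
vocab = sorted(word for word, n in doc_freq.items() if n >= MIN_DOCS)


def to_vector(tokens):
    """Return the word counts of one tweet in the fixed vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


# X[i] is the feature vector of tweet i and y[i] is its target.
X = [to_vector(tokens) for tokens in tokenized]
y = targets

for features, label in zip(X, y):
    print(features, label)
```

Because X and y share the same ordering, the pairs could then be handed to a classifier, and raising `MIN_DOCS` trades vocabulary size against lost information.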