[PYTHON] Text mining (for memos)

A note about text mining.

Text mining uses natural language processing to divide a large amount of text data into words and phrases. ⇒ Previously, natural language processing was not mature enough, so dividing text this way was difficult.

It has the following three functions.


Information extraction

Removes noise from the text data and extracts the information needed for mining.

① Morphological analysis

Extracts words morpheme by morpheme using a registered dictionary. The dictionary needs to be updated from time to time.
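As a rough illustration of why the dictionary matters, here is a minimal sketch of dictionary-based splitting using greedy longest match. This is a simplification: real morphological analyzers use richer dictionaries and scoring. The function name `tokenize_longest_match` and the tiny dictionaries are assumptions made up for this memo.

```python
def tokenize_longest_match(text, dictionary):
    """Split text into dictionary entries, always taking the longest match."""
    tokens = []
    max_len = max((len(word) for word in dictionary), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical dictionaries: the second one has been updated with "mining".
old_dictionary = {"text", "min", "ing"}
new_dictionary = old_dictionary | {"mining"}

tokens_old = tokenize_longest_match("textmining", old_dictionary)
tokens_new = tokenize_longest_match("textmining", new_dictionary)
print(tokens_old)  # ['text', 'min', 'ing']
print(tokens_new)  # ['text', 'mining']
```

With the outdated dictionary the word is split into fragments, which is why the dictionary has to be maintained.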

② Synonyms

Absorbs notation variations by creating and using a synonym dictionary. Whether two expressions are synonyms is decided by looking at the data, because it depends on the context.
ex) Evaluation is "high" = evaluation is "good"
ex) Price is "high" ≠ evaluation is "good"
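One possible way to keep synonyms context-dependent is to key the synonym dictionary on the pair (attribute, value) instead of on the value alone. The sketch below assumes the text has already been reduced to such pairs; `SYNONYMS` and `normalize` are hypothetical names.

```python
# Hypothetical synonym dictionary: (attribute, value) -> canonical value.
SYNONYMS = {
    ("evaluation", "high"): "good",
}


def normalize(attribute, value, synonyms=SYNONYMS):
    """Return the canonical value for this attribute, or the value unchanged."""
    return synonyms.get((attribute, value), value)


print(normalize("evaluation", "high"))  # good
print(normalize("price", "high"))       # high
```

Because the key includes the attribute, "high" is rewritten to "good" only for evaluation and stays "high" for price.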

③ Word pattern

By extracting morphemes that appear close to each other, interrogative forms, negative forms, and variations in expression can be detected.
ex) Two phrasings of the question "Are you there?" are treated as the same pattern ⇒ verb + auxiliary verb + symbol
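A word pattern like this can be checked by sliding a window over part-of-speech tags. The following is only a sketch: it assumes the morphemes are already tagged, and the tokens, tag names, and the function `find_pos_pattern` are placeholders, not output from a real analyzer.

```python
def find_pos_pattern(tagged_tokens, pattern):
    """Return the surface forms of every run of tokens whose tags equal pattern."""
    size = len(pattern)
    matches = []
    for start in range(len(tagged_tokens) - size + 1):
        window = tagged_tokens[start:start + size]
        if tuple(tag for _, tag in window) == tuple(pattern):
            matches.append([surface for surface, _ in window])
    return matches


QUESTION_PATTERN = ("verb", "auxiliary", "symbol")

# Two hand-tagged placeholder sentences that differ only in the auxiliary.
sentence_a = [("exist", "verb"), ("polite-marker", "auxiliary"), ("?", "symbol")]
sentence_b = [("exist", "verb"), ("casual-marker", "auxiliary"), ("?", "symbol")]

matches_a = find_pos_pattern(sentence_a, QUESTION_PATTERN)
matches_b = find_pos_pattern(sentence_b, QUESTION_PATTERN)
print(bool(matches_a), bool(matches_b))  # True True
```

Both sentences match the same tag pattern even though their surface forms differ, so they can be counted as the same kind of expression.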

④ Dependency analysis

Morphemes are grouped into clauses, and the subject-predicate relations and modifier relations between clauses are determined.
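The result of dependency analysis can be pictured as a list of clauses where each clause points to the clause it depends on. The sketch below does not perform the analysis itself. It uses a hand-annotated example sentence, and the field names `head` and `relation` are assumptions chosen for illustration.

```python
# Hand-annotated clauses for "The cat quickly ate the fish." (illustrative only).
# "head" is the index of the clause this clause depends on; None marks the root.
clauses = [
    {"text": "The cat", "head": 2, "relation": "subject"},
    {"text": "quickly", "head": 2, "relation": "modifier"},
    {"text": "ate", "head": None, "relation": "root"},
    {"text": "the fish", "head": 2, "relation": "object"},
]


def pairs_with_relation(clauses, relation):
    """Return (dependent, head) text pairs for the given relation type."""
    return [
        (clause["text"], clauses[clause["head"]]["text"])
        for clause in clauses
        if clause["relation"] == relation and clause["head"] is not None
    ]


subject_pairs = pairs_with_relation(clauses, "subject")
modifier_pairs = pairs_with_relation(clauses, "modifier")
print(subject_pairs)   # [('The cat', 'ate')]
print(modifier_pairs)  # [('quickly', 'ate')]
```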


Mining

From the set of extracted concepts, obtain new information and knowledge that matches what you want to find out.

・ Analysis between variables

Calculate the relevance between words from their co-occurrence.
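One common way to turn co-occurrence into a relevance score is the Jaccard coefficient: the number of documents containing both words divided by the number containing either. The memo does not say which measure is meant, so this choice is an assumption, and the sample documents are made up.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrence(docs):
    """Count, per word and per word pair, how many documents contain them."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        words = sorted(set(doc))  # each word counts once per document
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return word_counts, pair_counts


def jaccard(word_a, word_b, word_counts, pair_counts):
    """Documents containing both words divided by documents containing either."""
    both = pair_counts[tuple(sorted((word_a, word_b)))]
    either = word_counts[word_a] + word_counts[word_b] - both
    return both / either if either else 0.0


# Made-up, already tokenized documents.
docs = [
    ["price", "high", "quality", "good"],
    ["price", "high"],
    ["quality", "good", "service"],
]
word_counts, pair_counts = count_cooccurrence(docs)
print(jaccard("price", "high", word_counts, pair_counts))     # 1.0
print(jaccard("price", "quality", word_counts, pair_counts))  # 1 shared document out of 3
```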

・ Analysis between samples

Divide the text data into groups of similar samples.
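As a sketch of grouping similar texts, the code below compares bag-of-words vectors with cosine similarity and puts each document into the first group whose representative is similar enough. This greedy rule and the threshold of 0.5 are assumptions for illustration. In practice a proper clustering method would usually be used.

```python
import math
from collections import Counter


def cosine(vec_a, vec_b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * vec_b[word] for word, count in vec_a.items() if word in vec_b)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_documents(docs, threshold=0.5):
    """Greedily assign each document to the first sufficiently similar group."""
    vectors = [Counter(doc.lower().split()) for doc in docs]
    groups = []  # each group is a list of document indices
    for index, vector in enumerate(vectors):
        for group in groups:
            # Compare against the first member of the group as its representative.
            if cosine(vector, vectors[group[0]]) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Made-up sample texts.
docs = [
    "the price is high",
    "the price is low",
    "staff were friendly",
    "friendly staff",
]
groups = group_documents(docs)
print(groups)  # [[0, 1], [2, 3]]
```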

・ Keyword analysis

Analyze the context in which keywords are used ⇒ Is this similar to a topic model?

・ Multivariate analysis
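A simple way to look at the context of a keyword is a keyword-in-context listing, which shows a few words to the left and right of each occurrence. This is only one possible reading of keyword analysis. The function `keyword_in_context` and the sample sentence are made up for this memo.

```python
def keyword_in_context(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every occurrence."""
    results = []
    for index, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, index - window):index]
            right = tokens[index + 1:index + 1 + window]
            results.append((left, token, right))
    return results


tokens = "the price is high but the quality is good".split()
contexts = keyword_in_context(tokens, "is")
for left, keyword, right in contexts:
    print(" ".join(left), f"[{keyword}]", " ".join(right))
# the price [is] high but
# the quality [is] good
```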


Visualization of analysis results

Helps in understanding and interpreting the analysis results.
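Even without a plotting library, a word-frequency bar chart drawn as text gives a quick overview. The sketch below is a minimal stand-in for real visualization such as graphs or word clouds. The function `text_bar_chart` and the sample tokens are made up for illustration.

```python
from collections import Counter


def text_bar_chart(tokens, top_n=3):
    """Return one line per word: the word, a bar of '#' characters, and its count."""
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return []
    width = max(len(word) for word, _ in counts)
    return [f"{word.ljust(width)} {'#' * count} {count}" for word, count in counts]


tokens = ["price", "high", "price", "good", "price", "good"]
chart = text_bar_chart(tokens)
print("\n".join(chart))
# price ### 3
# good  ## 2
# high  # 1
```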
