[PYTHON] [Natural language processing] Preprocessing for Japanese text

This post collects some preprocessing steps for Japanese text in natural language processing. (I will update it from time to time.)

Full-width -> half-width

>>> import unicodedata
>>>
>>> text = u'１９９４'  # full-width digits
>>> print(unicodedata.normalize('NFKC', text))
1994
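The same call can be wrapped in a small helper and tried on a few more inputs. This is only a minimal sketch for Python 3: the function name `normalize_width` and the sample strings are made up for illustration and are not part of any library. Note that NFKC is not strictly "full-width to half-width": full-width digits and Latin letters become ASCII, but half-width katakana become full-width katakana.

```python
import unicodedata


def normalize_width(text):
    """Fold compatibility characters (full-width ASCII, half-width kana) with NFKC.

    normalize_width is a hypothetical helper name used only in this sketch.
    """
    return unicodedata.normalize("NFKC", text)


# Illustrative samples: full-width digits, full-width Latin, half-width katakana.
samples = ["１９９４", "ＡＢＣ", "ｶﾀｶﾅ"]
for s in samples:
    print(repr(s), "->", repr(normalize_width(s)))
```

Applying this before tokenization keeps the same word from appearing in several width variants, which matters when the tokens are later counted or looked up in a dictionary.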

Cloud = proper noun???

I think most people use MeCab for morphological analysis of Japanese.

Many of them probably use mecab-ipadic-NEologd as the dictionary, but I ran into one odd result with it.

$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd
cloud
Cloud noun,Proper noun,General,*,*,*,cloud~,Spider spider,Spider spider
EOS

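"Kumo no Mukō, Yakusoku no Basho" ...? When I looked it up, it was an anime film directed by Makoto Shinkai (English title: The Place Promised in Our Early Days). In other words, with this dictionary the single word "cloud" (雲) is analyzed as a proper noun whose base form is the film title, not as a common noun.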
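One way to catch entries like this is to scan the analyzer output for proper nouns whose base form differs from the surface form. The following is a rough sketch, not a feature of MeCab or NEologd: it only parses text in MeCab's default output format (surface, a tab, then comma-separated features) and assumes the IPADIC-style field order, where index 1 is the part-of-speech subtype and index 6 is the base form. The function names and `PROPER_NOUN_LABELS` are hypothetical, and the sample line is the one quoted above with a tab assumed after the surface.

```python
# Labels treated as "proper noun"; the English label matches the output quoted in this post.
PROPER_NOUN_LABELS = {"固有名詞", "Proper noun"}


def parse_mecab_line(line):
    """Split one output line into (surface, list of features)."""
    surface, _, feature_str = line.rstrip("\n").partition("\t")
    return surface, feature_str.split(",")


def suspicious_proper_nouns(mecab_output):
    """Yield (surface, base_form) for proper nouns whose base form differs from the surface."""
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue
        surface, features = parse_mecab_line(line)
        # Assumed IPADIC-style layout: index 1 = POS subtype, index 6 = base form.
        if len(features) < 7:
            continue
        if features[1] in PROPER_NOUN_LABELS and features[6] != surface:
            yield surface, features[6]


sample = "Cloud\tnoun,Proper noun,General,*,*,*,cloud~,Spider spider,Spider spider\nEOS"
for surface, base in suspicious_proper_nouns(sample):
    print(surface, "->", base)
```

Tokens flagged this way can be reviewed by hand or re-analyzed with the plain IPA dictionary. Whether the replacement is wanted depends on the task, so this check only reports candidates and does not change them.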
