[PYTHON] That's right, let's eat it. [Natural language processing starting with Kyoto dialect]

Introduction

This is my first time doing natural language processing, and I'm excited. This article is an entry for [[Qiita x COTOHA API present plan] Let's analyze text with the COTOHA API!](https://zine.qiita.com/event/collaboration-cotoha-api/?utm_source=qiita&utm_medium=banner). ~~I want the prizes way too much!~~ I made it in time for the posting deadline.

Let's get straight to the main topic: what we're going to do.

First, a brief overview of what I built:

python3 bubuduke.py "Hetaxo"
"Don't be good"

We'll build a Kyoto dialect translator like this. Bubu-zuke is delicious! Yay!
[Image: bubu.jpg]

Flow of this article

- Get the COTOHA API working (my first time using it)
- What is natural language processing?
- Try natural language processing on text

Follow these steps and, before you know it, you'll have a handy tool like the one above. It's very easy.

Get the COTOHA API up and running

Register for free here. Submit your email address to create an account, then log in. You will see this screen. (End of the promotion.)
[Image: cotohaapi.png]

That's all we need from this site; later we'll only use the credentials shown there (the client ID and client secret).
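The scripts below paste the credentials straight into the `Parser(...)` call. A minimal alternative, assuming you'd rather keep secrets out of the source file, is to read them from environment variables. The variable names `COTOHA_CLIENT_ID` and `COTOHA_CLIENT_SECRET` and the helper `load_credentials` are hypothetical; pick whatever suits your setup.

```python
import os

# Hypothetical environment variable names (not defined by COTOHA itself)
ID_VAR = "COTOHA_CLIENT_ID"
SECRET_VAR = "COTOHA_CLIENT_SECRET"


def load_credentials() -> tuple[str, str]:
    """Read the COTOHA client ID and secret from the environment.

    Raises RuntimeError naming the missing variable(s), so a typo shows up
    immediately instead of as an authentication failure later.
    """
    client_id = os.environ.get(ID_VAR)
    client_secret = os.environ.get(SECRET_VAR)
    missing = [
        name
        for name, value in ((ID_VAR, client_id), (SECRET_VAR, client_secret))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return client_id, client_secret
```

With both variables exported, `*load_credentials()` can stand in for the two placeholder strings passed to `Parser(...)`.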

What is natural language processing

Put very simply, it means processing the words (= natural language) that humans use every day. Well, that's just what the name says. What makes it difficult is that **natural language, especially Japanese, is not well-defined**.

Japanese is not well-defined

"Not well-defined" means that the definition does not assign a unique interpretation or value. Here, it means a single sentence can be read in more than one way.

Consider a simple example, the sign below.
[Image: toilet.png]
Nobody who sees this sign thinks they aren't allowed to flush poop. However, if you write the sentence out literally as code, you get something like this:

thing_to_flush = "poop"
if thing_to_flush == "toilet paper":
    print("You may flush it.")

Huh? Read literally, it seems I can't flush poop.

The Kyoto dialect: the pinnacle of ambiguity

This time I decided to tackle the Kyoto dialect. Here is an example:
[Image: bubuduke.png]

Above is the famous bubu-zuke. In the Kyoto dialect, bubu-zuke simply means ochazuke (rice with tea poured over it). Even so, being offered bubu-zuke means "it's time for you to go home." I don't get it.
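To make "not well-defined" concrete, here is a toy model, assuming we simply list by hand every reading a listener might take from a phrase. The phrases and readings are illustrative paraphrases of the bubu-zuke example, not output from any API; `READINGS` and `is_well_defined` are hypothetical names.

```python
# Toy model: each phrase maps to the set of readings a listener might take.
# Hand-written paraphrases for illustration only.
READINGS: dict[str, set[str]] = {
    "How about some bubu-zuke?": {
        "Would you like some ochazuke?",   # literal reading
        "It's time for you to go home.",   # implied Kyoto reading
    },
    "Let's go to Kyoto.": {"Let's go to Kyoto."},
}


def is_well_defined(phrase: str) -> bool:
    """A phrase counts as well-defined here only if it has exactly one reading."""
    return len(READINGS.get(phrase, set())) == 1


for phrase in READINGS:
    print(phrase, "->", "well-defined" if is_well_defined(phrase) else "ambiguous")
```

A phrase with zero known readings is also treated as not well-defined, since no unique value is assigned to it.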

Anyway, the Kyoto dialect is insidious

Here is another example:
[Image: tokei.png]

Sorry for the long digression. There are many other snide Kyoto expressions, but I'll skip them. If you're interested, you can find plenty of them here. In short, **the Kyoto dialect is insidious, the very height of language that is not well-defined.**

Natural language processing analyzes the words and syntax of text written in natural languages like these. (So far, I've only been talking about natural language itself.) From here on, we'll implement a bot that processes Japanese and replies in an insidious Kyoto-dialect "style".

Try natural language processing on text

Let's set the bot aside for now and focus on the natural language processing. ~~To be honest, this is the core, so you don't need to read anything else.~~ This is where the core begins, but COTOHA is so impressive that it's over almost immediately.

Sample 1

To start, let's take some input and process it lightly: a demo that receives a sentence and returns only its nouns. I used an excellent article as a reference. The library is so powerful that you can do this without knowing anything. First, install the library:

pip install git+https://github.com/obilixilido/cotoha-nlp.git
This is the code I made.

samplecode1.py


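import sys

from cotoha_nlp.parse import Parser

# Replace the placeholders with the credentials from the COTOHA portal
parser = Parser("Client ID",
  "Client secret",
  "https://api.ce-cotoha.com/api/dev/nlp",
  "https://api.ce-cotoha.com/v1/oauth/accesstokens"
)

# Take the sentence from the command line, falling back to standard input
text = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else input()
s = parser.parse(text)

# COTOHA reports parts of speech in Japanese; "名詞" means noun
print(" ".join(token.form for token in s.tokens if token.pos == "名詞"))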

I'll link it again later, but the sample code is also on GitHub, so please take a look there too. Let's run it. Pass the sentence after the file name (quoted, so the shell treats it as one argument), and the script prints the result of processing it:

python samplecode1.py "Spring is Akebono. Yes, let's go to Kyoto."

It returns:

>>Spring Akebono Kyoto

How about that? With just this much code, we got decent natural language processing done. That's insane. ~~I don't understand any of it.~~
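Since the real call needs credentials and a network connection, the noun-filtering step can be checked offline with a stand-in for the token objects. `Token` below is a hypothetical dataclass exposing only `form` and `pos`, the two attributes the sample reads; it is not the cotoha_nlp class. It also assumes COTOHA labels nouns as "名詞", and the tokens for 「春はあけぼの。」 are hand-written, not real API output.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a parsed token (not the cotoha_nlp class)."""
    form: str  # surface text of the token
    pos: str   # part-of-speech label


NOUN = "名詞"  # assumption: COTOHA reports parts of speech in Japanese


def extract_nouns(tokens: list[Token]) -> list[str]:
    """Keep the surface form of every token tagged as a noun, in order."""
    return [t.form for t in tokens if t.pos == NOUN]


# Hand-written tokens for 「春はあけぼの。」 (illustrative only)
tokens = [Token("春", "名詞"), Token("は", "助詞"), Token("あけぼの", "名詞"), Token("。", "句点")]
print(" ".join(extract_nouns(tokens)))  # 春 あけぼの
```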

Completing the Kyoto dialect "style" conversion script

Next, let's start implementing the Kyoto dialect bot. We scrape the site introduced earlier, and if one of its plain-Japanese glosses matches an extracted noun, we return the corresponding Kyoto dialect phrase. Here is the flow for producing Kyoto dialect from ordinary Japanese input:
[Image: image.png]
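The matching step in this flow can be tried without any network access. A minimal sketch, assuming each scraped paragraph is available as a raw HTML string (as `str(tag)` would give) and that the Kyoto dialect phrase sits in the first `<strong>` tag; `lookup_kyoto` and the sample paragraphs are hypothetical. The regex is non-greedy so that a paragraph with two `<strong>` tags doesn't swallow the text between them, and nouns are matched against the visible text rather than the markup.

```python
import re

# Non-greedy: each <strong>...</strong> pair is matched on its own.
STRONG = re.compile(r"<strong>(.*?)</strong>", re.DOTALL)
# Crude tag stripper, used only to test whether a noun occurs in the visible text.
TAG = re.compile(r"<[^>]+>")


def lookup_kyoto(nouns: list[str], paragraphs: list[str]) -> list[str]:
    """Return the first <strong> text of every paragraph mentioning a noun.

    Paragraphs without a <strong> tag are skipped instead of raising an error.
    """
    results = []
    for noun in nouns:
        for html in paragraphs:
            if noun in TAG.sub("", html):
                match = STRONG.search(html)
                if match:
                    results.append(match.group(1))
    return results


# Hypothetical paragraphs standing in for the scraped page
paragraphs = [
    "<p>Meaning: annoying. <strong>Kyoto phrase A</strong></p>",
    "<p>No strong tag here, but also annoying.</p>",
    "<p>Meaning: late. <strong>Kyoto phrase B</strong> and <strong>extra</strong></p>",
]
print(lookup_kyoto(["annoying"], paragraphs))  # ['Kyoto phrase A']
```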

Since we'll be scraping, install a couple more libraries:

pip3 install requests
pip3 install beautifulsoup4
The code looks like this:

bubuduke.py


import sys

import requests
from bs4 import BeautifulSoup
from cotoha_nlp.parse import Parser

parser = Parser("Client ID",
  "Client secret",
  "https://api.ce-cotoha.com/api/dev/nlp",
  "https://api.ce-cotoha.com/v1/oauth/accesstokens"
)
# input: command-line argument, or standard input as a fallback
text = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else input()
s = parser.parse(text)

# get nouns (COTOHA labels nouns as "名詞")
nouns = [token.form for token in s.tokens if token.pos == "名詞"]

# web scraping: fetch the page once and collect its <p> elements
r = requests.get('https://iirou.com/kazoekata/', timeout=10)
r.raise_for_status()
soup = BeautifulSoup(r.content, "html.parser")
block = soup.find_all("p")

# output
for noun in nouns:
  for tag in block:
    # Match against the visible text, not the HTML markup
    if noun in tag.get_text():
      # The Kyoto dialect phrase is in the <strong> tag; skip paragraphs without one
      strong = tag.find("strong")
      if strong is not None:
        print(strong.get_text())

Let's run it right away.

python bubuduke.py "Annoying"

The insidious Kyoto dialect is back!

>>"Young lady, you're good at playing the piano."

That wraps up the natural language processing part. Next time I'll turn this into a bot; since that's a different genre, this article ends here.

What I want to do next time

I want to turn this into a LINE bot and improve its accuracy a bit. In particular, I want it to pick up words that don't match exactly; right now, very few words have a matching entry. I'll write about that next time.
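A minimal sketch of one option for picking up words that don't match exactly: character-level fuzzy matching with Python's standard `difflib`. The dictionary headwords below are hypothetical, and `closest_entries` is an illustrative helper; `difflib.get_close_matches` keeps candidates whose similarity ratio (between 0 and 1) is at least `cutoff`, best first.

```python
import difflib


def closest_entries(word: str, entries: list[str], cutoff: float = 0.6) -> list[str]:
    """Return up to three entries similar to `word`, most similar first."""
    return difflib.get_close_matches(word, entries, n=3, cutoff=cutoff)


entries = ["うるさい", "遅い", "下手"]  # hypothetical dictionary headwords
print(closest_entries("うるさいな", entries))  # ['うるさい']
```

Character similarity handles variants such as a trailing particle (うるさいな vs うるさい), but it won't connect synonyms that share no characters; those would need a different approach.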

In conclusion

- Repository of the code used this time
- Reference: in the end, the official reference is the best source for any language, library, or framework.

Thank you for reading to the end. ~~I want the prizes way too much~~ Please give it an LGTM. If that isn't a reason you'd LGTM for, please do so if you found it useful as a reference.
