[PYTHON] A program that automatically corrects "Takenoko no Sato" to "Kinoko no Yama" "correctly"

Introduction: a wonderful sweet and the existence of a degraded latecomer product

As you all know, there is a wonderful sweet called **"Kinoko no Yama"** [^1]. It is a chocolate snack that Meiji Co., Ltd. has manufactured and sold since 1975. It has a cute mushroom shape, with a cracker for the stem and chocolate for the cap. The best combination is said to have been found through trial and error over a five-year development period, so it is **easy to hold and functional**, while **plenty of chocolate is combined with a moderately salty, crisp cracker to achieve the supreme taste**. I can only bow my head to Meiji-sama for producing such a wonderful sweet.

On the other hand, four years after that explosive hit, a **similar but inferior product**, **"Takenoko no Sato"**, appeared. Perhaps because the development period was short, the chocolate was reduced by about 30%, the base is a crumbly, powdery cookie that is hard to hold, and the bamboo-shoot shape has no distinctive features or sharpness. It is an easy-to-recognize **degraded latecomer product**. Although the total amount was reduced by more than 5%, the price was raised, and the salt and shortening were increased to compensate for the weak punch, so **the nutritional aspect has also been criticized by a pharmacist** [^2]. As a bonus, it contains eggs, so people with egg allergies cannot eat it, which means **safety also gets a ×**. Because Kinoko no Yama sold so well, Meiji-sama probably tried to make a profit by releasing a **cost-cut version**. It is a business, so some of this cannot be helped. My heart goes out to the developers.

Now, anyone who has studied the historical background above cannot possibly **misunderstand**, but sometimes **newcomers** carelessly twist these facts and claim that there is a dispute between the two products. **Ignorance is a frightening thing.** In the latest large-scale survey as of January 2020 [^3], **Kinoko no Yama was supported by the people by an overwhelming margin**, so there should be no conflict at all. It is an undeniable fact that **Kinoko no Yama stands incomparably higher**. I wanted to **improve the world** by eliminating writing that contradicts this fact and sparks conflict.

Therefore, I have created a program that automatically corrects such errors using natural language processing. I hope you will use it and spread it for the realization of world peace.

[^1]: "Kinoko no Yama", Meiji Co., Ltd. https://www.meiji.co.jp/sweets/chocolate/kinotake/products/kinokonoyama/
[^2]: "Which is better, Kinoko no Yama or Takenoko no Sato? I asked the pharmacist." https://liginc.co.jp/267503
[^3]: "Kinoko no Yama / Takenoko no Sato National General Election 2019" ~~https://www.meiji.co.jp/sweets/chocolate/kinotake/cmp/2019senkyo/~~ (this page seems to have been deleted) → https://prtimes.jp/main/html/rd/p/000000012.000025200.html

Error correction example

It is easier to understand by looking at the real thing, so I will show some examples of the program in action first.

Example 1 that could be corrected correctly

For example, suppose you see the following tweet on Twitter.

Takenoko no Sato is delicious. Is it God's food?

**Poor thing.** The writer may not know the historical background that it is a degraded latecomer product, or may have inadvertently confused the product names. If such **harmful information** is spread to the public, there is a concern that it will have an adverse effect on young people and lead them into delinquency in the future, so let's correct it properly.

$ python correct_mistakes.py "Takenoko no Sato is delicious. Is it God's food?"
Correction result:Kinoko no Yama is delicious. Is it God's food?

In this way, the sentence was revised **to be convincing to everyone**.

Example 2 that could be corrected correctly

Next, suppose you see the following review on Amazon.

The harmony between the chocolate and the base is wonderful. Takenoko no Sato is unrivaled by other sweets. ☆ 5.

**It's sad.** The Ministry of Education, Culture, Sports, Science and Technology may be to blame for not including the acquisition of correct knowledge in the compulsory subjects of elementary school. Spreading incorrect information as it is counts as **pump and dump**, which is prohibited by law. Also, writing information that people dislike in a public place disturbs public morals and can be **harassment**, so I will correct it for the writer so that they do not suffer any disadvantage.

$ python correct_mistakes.py "The harmony between the chocolate and the base is wonderful. Takenoko no Sato is unrivaled by other sweets. ☆ 5."
Correction result:The harmony between the chocolate and the base is wonderful. Kinoko no Yama is unrivaled by other sweets. ☆ 5.

That's right. I fully agree. This is an **objective evaluation without bias**.

Example 3 that could be corrected correctly

As another example, suppose you see the following comment.

Kinoko no Yama is bad.

For a decent, mature adult, writing something like the above is **impossible and unacceptable**, but the writer may have eaten the degraded latecomer product while believing it was the original full version. We are all human. Anyone can make a mistake. By applying the program fully automatically, you can eliminate such **human error** and make the world a better place.

$ python correct_mistakes.py "Kinoko no Yama is bad."
Correction result:Takenoko no Sato is bad.

I feel there was no need to go that far and that a more generous heart would be desirable, but since **freedom of expression** is guaranteed by the Constitution, I would like to accept such an opinion. It is important to recognize each other's differences and **maintain diversity**.

Implementation

Wise readers may have already pressed the **"Like" button** by the time they have read this far, but I will introduce the implementation just in case.

Simply replacing the strings "Kinoko no Yama" and "Takenoko no Sato" will not work. First, sentences such as **"This mountain is dangerous in the event of a fire"** or **"Satomi boy who was struck by that girl"** may come in. In the original Japanese, such sentences happen to contain the characters of a product name running across word boundaries, so a replacement that ignores the delimiters would convert them wrongly. That would amount to **blasphemy** against Kinoko no Yama, so the proper nouns "Kinoko no Yama" and "Takenoko no Sato" must be extracted properly. Second, the processing must branch on whether the product is being praised or disparaged. **If it is praised, it can definitely be judged to be "Kinoko no Yama"**, and **if it is disparaged, it can be judged not to be "Kinoko no Yama"**.
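
The first problem can be made concrete with a minimal sketch. The Japanese sentence below is a reconstruction of the fire example (an assumption, since only its translation appears in this article), and `span_correct` with its `entities` argument is a hypothetical helper that mirrors the `form` and `begin_pos` fields the script reads from the extraction result. It is not part of the COTOHA API.

```python
# Product names in Japanese.
KINOKO = "きのこの山"      # Kinoko no Yama
TAKENOKO = "たけのこの里"  # Takenoko no Sato

# Assumed example: "this mountain is dangerous in the event of a fire".
# The characters き-の-こ-の-山 appear by accident across word boundaries.
sentence = "火事のときのこの山は危険"


def naive_correct(text):
    """Blindly replace every occurrence of the character sequence."""
    return text.replace(KINOKO, TAKENOKO)


def span_correct(text, entities, old, new):
    """Replace `old` only at positions reported as extracted entities.

    `entities` is a hypothetical list of dicts with 'form' and 'begin_pos'.
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["begin_pos"], reverse=True):
        if ent["form"] == old:
            start = ent["begin_pos"]
            text = text[:start] + new + text[start + len(old):]
    return text


# The naive version mangles a sentence that never mentioned the product.
print(naive_correct(sentence))
# With no product entity extracted, the sentence is left untouched.
print(span_correct(sentence, [], KINOKO, TAKENOKO))
```

The design point is only that replacement is driven by extracted spans rather than by raw substring matches; how well the extraction itself behaves depends on the API.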

Both requirements can be handled neatly with **natural language processing**. There are various ways to do natural language processing, but this time I will use the COTOHA API.

As an aside, both **Qiita**, where this article is being written, and **NTT Communications Co., Ltd.**, the developer of the COTOHA API, have been confirmed to be **Kinoko no Yama supporters**. The two companies are currently running a joint campaign, and one of the prizes includes **"Let's make! Kinoko no Yama"** (+ iPad). From this we can understand that **all Qiita readers are Kinoko no Yama supporters**, so I can write this article with confidence.

The COTOHA API provides many language processing functions, but this time I will use the "proper noun extraction API" (`nlp/v1/ne`) and the "emotion estimation API" (`nlp/v1/sentiment`).

  1. Use the "proper noun extraction API" to determine whether the text contains "Kinoko no Yama" or "Takenoko no Sato".
  2. For text that contains one of them, use the "emotion estimation API" to judge whether it is positive or negative. **If it contains "Kinoko no Yama" and is negative, the string is replaced**; **if it contains "Takenoko no Sato" and is positive, the string is replaced**. A score of 0.5 was used as the threshold for accepting the positive or negative judgment.
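
The two steps above boil down to a small decision rule. The sketch below isolates it as a pure function so that it can be checked without calling the API. The function name `decide` and its arguments are hypothetical; the labels "Positive" and "Negative" and the 0.5 threshold are the ones the script uses.

```python
THRESHOLD = 0.5  # score threshold stated in the article


def decide(kinoko_found, takenoko_found, sentiment, score, threshold=THRESHOLD):
    """Return (old_name, new_name) for the replacement to apply, or None.

    kinoko_found / takenoko_found: whether each product name was extracted.
    sentiment: "Positive" or "Negative" (other labels lead to no change).
    score: confidence of the sentiment judgment.
    """
    if score <= threshold:
        return None  # judgment is not confident enough
    if kinoko_found and sentiment == "Negative":
        return ("Kinoko no Yama", "Takenoko no Sato")
    if takenoko_found and sentiment == "Positive":
        return ("Takenoko no Sato", "Kinoko no Yama")
    return None


# Praise for Takenoko no Sato gets corrected.
print(decide(False, True, "Positive", 0.9))
# Praise for Kinoko no Yama is already correct, so nothing changes.
print(decide(True, False, "Positive", 0.9))
# A low-confidence negative judgment is ignored.
print(decide(True, False, "Negative", 0.3))
```

Keeping the rule separate from the HTTP calls is just one possible arrangement; the script in this article performs the same checks inline.
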
The code is below.

correct_mistakes.py

import requests
import json
import sys

BASE_URL = "https://api.ce-cotoha.com/api/dev/"
CLIENT_ID = "ID obtained by COTOHA API"
CLIENT_SECRET = "PASS obtained with COTOHA API"


def auth(client_id, client_secret):
    token_url = "https://api.ce-cotoha.com/v1/oauth/accesstokens"
    headers = {
        "Content-Type": "application/json",
        "charset": "UTF-8"
    }

    data = {
        "grantType": "client_credentials",
        "clientId": client_id,
        "clientSecret": client_secret
    }
    r = requests.post(token_url,
                      headers=headers,
                      data=json.dumps(data))
    return r.json()["access_token"]

def ne(sentence, access_token):
    base_url = BASE_URL
    headers = {
        "Content-Type": "application/json",
        "charset": "UTF-8",
        "Authorization": "Bearer {}".format(access_token)
    }
    data = {
        "sentence": sentence,
    }
    r = requests.post(base_url + "nlp/v1/ne",
                      headers=headers,
                      data=json.dumps(data))
    return r.json()


def sentiment(sentence, access_token):
    base_url = BASE_URL
    headers = {
        "Content-Type": "application/json",
        "charset": "UTF-8",
        "Authorization": "Bearer {}".format(access_token)
    }
    data = {
        "sentence": sentence,
    }
    r = requests.post(base_url + "nlp/v1/sentiment",
                      headers=headers,
                      data=json.dumps(data))
    return r.json()


if __name__ == "__main__":
    # Product names as written in this article. The original script presumably
    # matched the Japanese names (5 and 6 characters long), which would explain
    # the hard-coded offsets 5 and 6; len() is used here instead.
    KINOKO = "Kinoko no Yama"
    TAKENOKO = "Takenoko no Sato"
    SUGINOKO = "Suginoko Village"

    sentence = "Kinoko no Yama is delicious. Is it God's food?"
    if len(sys.argv) >= 2:
        sentence = sys.argv[1]

    access_token = auth(CLIENT_ID, CLIENT_SECRET)

    kinoko_flg = False
    kinoko_begin_pos = 0
    takenoko_flg = False
    takenoko_begin_pos = 0
    suginoko_flg = False

    ne_result = ne(sentence, access_token)
    sentiment_result = sentiment(sentence, access_token)

    # Record whether each product name was extracted, and where it starts.
    for chunk in ne_result['result']:
        if chunk['form'] == KINOKO:
            kinoko_flg = True
            kinoko_begin_pos = chunk['begin_pos']
        elif chunk['form'] == TAKENOKO:
            takenoko_flg = True
            takenoko_begin_pos = chunk['begin_pos']
        elif chunk['form'] == SUGINOKO:
            suginoko_flg = True

    polarity = sentiment_result['result']['sentiment']
    score = sentiment_result['result']['score']

    if kinoko_flg and polarity == "Negative" and score > 0.5:
        # A disparaged "Kinoko no Yama" must really be "Takenoko no Sato".
        print(sentence[:kinoko_begin_pos] + TAKENOKO + sentence[kinoko_begin_pos + len(KINOKO):])
    elif takenoko_flg and polarity == "Positive" and score > 0.5:
        # A praised "Takenoko no Sato" must really be "Kinoko no Yama".
        print(sentence[:takenoko_begin_pos] + KINOKO + sentence[takenoko_begin_pos + len(TAKENOKO):])
    elif suginoko_flg:
        print("Who are you")
    else:
        print(sentence)

Various results

$ python correct_mistakes.py "I love Takenoko no Sato"
I love Kinoko no Yama
$ python correct_mistakes.py "Kinoko no Yama is the best! It's great to eat chocolate first and eat only cookies!"
Kinoko no Yama is the best! It's great to eat chocolate first and eat only cookies!
$ python correct_mistakes.py "I'm eating Takenoko no Sato now"
I'm eating Kinoko no Yama now
$ python correct_mistakes.py "I think it's Kinoko no Yama because you can eat it without getting your hands dirty."
I think it's Kinoko no Yama because you can eat it without getting your hands dirty.
$ python correct_mistakes.py "I don't need Kinoko no Yama because it's cheap."
I don't need Takenoko no Sato because it's cheap.
# Added examples from the comment section (2/25)
$ python correct_mistakes.py "The seaweed of this Yamamotoyama at that time was the worst"
The seaweed of this Yamamotoyama at that time was the worst
$ python correct_mistakes.py "Suginoko Village"
Who are you

Summary

I was able to create a program that automatically corrects the notation to the correct one. By building this program into systems and applications everywhere and applying it fully automatically, you can **improve society and create a peaceful world without conflict**. I think the best approach is for **providers** to apply this program and **correct all text that is communicated**, so I would like the government to enact such a **law**. Everyone can contribute to making the world peaceful by using it in places such as Slack, Twitter, business systems, and in-house macros, so please cooperate.

Also, I think the people in charge of COTOHA API development could contribute to society by providing this function itself as an API, and could also report it as **CSR**, so please do consider offering such a service. Also, please give me "Let's make! Kinoko no Yama" and the **supplementary prize iPad**.

Finally, I hope this text reaches everyone who is about to start a dispute and prompts them to reflect.
