[PYTHON] Use MeCab to translate harsh sentences into "Yukkuri" style

When I surf the net, I get jumpy seeing taunts fly around under names like "Masakari" and "Ahat Ahat". If a stray bullet like that comes my way, how can I protect my mental health?

One solution is to defuse such mentally unhealthy sentences by converting them into "Yukkuri" style.

**Yukkuri Translation** http://needtec.sakura.ne.jp/yukkuri_translator/

Suppose someone tells you, "Don't just make shit videos, this de morons." If that is converted to "Yeah, don't make it, this Dote-san.", you won't get angry.

Here, we use MeCab to perform morphological analysis and convert sentences that are bad for mental health into Yukkuri style, so that they stop being unpleasant and instead feel heartwarming.

Source code

yukkuri_translator.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-
import MeCab
import jctconv
import sys
import codecs
# Python 2 only: make UTF-8 the default encoding and wrap stdout in a UTF-8 writer
reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)


# Surface form -> replacement word.
# Each key may appear only once: a repeated key in a dict literal silently
# overrides the earlier entry.
converter = {
    'eat': 'Mush Mush',
    'sleep': 'Suyasu',
    'Sleep': 'Suyasuyashi',
    'shit': 'Yes Yes',
    'Stool': 'Yes Yes',
    'Flight': 'Yes Yes',
    'urine': 'Shishi',
    'Piss': 'Shishi',
    'Sun': 'Sun',
    'Sanctions': 'At all',
    'Confectionery': 'Fair',
    'candy': 'Fair',
    'sugar': 'Fair',
    'juice': 'Fair',
    'Coordination': 'Coordination',
    'pregnancy': 'Ninshin'
}



class MarisaTranslator:
    def __init__(self, user_dic):
        self.mecab = MeCab.Tagger("-u " + user_dic)

    def _check_san(self, n):
        """Decide whether to append "san" after node n."""
        f = n.feature.split(',')
        # Only proper nouns and general nouns are candidates
        if f[0] != 'noun' or f[1] not in ('Proper noun', 'General'):
            return False
        if not n.next:
            return True
        # Check the next word
        nf = n.next.feature.split(',')
        if nf[0] in ['noun', 'Auxiliary verb']:
            # A noun or auxiliary verb follows, so do not add "san" here
            return False
        # Do not add "san" if the word already ends with an honorific
        if n.surface.endswith('Mr.'):
            return False
        # Do not add "san" if the word itself is an honorific
        if n.surface == 'Mr' or n.surface == 'Sama':
            return False
        return True


    def _check_separator(self, n):
        """Decide whether to append "、" after node n."""
        f = n.feature.split(',')
        if f[0] == 'Particle':
            if n.next:
                # Check the next word: no separator before a symbol or another particle
                nf = n.next.feature.split(',')
                if nf[0] in ['symbol', 'Particle']:
                    return False
                return True
        return False


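    def _get_gobi(self, n):
        """Return the Yukkuri-style sentence ending for node n, or None if none applies."""
        if n.next:
            f_next = n.next.feature.split(',')
            if n.next.surface == '、':
                return None
            # Only the last word before the end of a sentence (or a symbol) gets an ending
            if f_next[0] == 'BOS/EOS' or f_next[0] == 'symbol':
                f = n.feature.split(',')
                if f[0] in ['Particle', 'noun', 'symbol', 'Interjection']:
                    return None
                if f[5] in ['Command e', 'Continuous form']:
                    return None
                if n.surface in ['Is']:
                    # This word is replaced by the "Nanoze" ending
                    return 'Nanoze'
                else:
                    return n.surface + 'Noze'
        return None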

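    def translate(self, src):
        """Convert src into Yukkuri-style hiragana text."""
        n = self.mecab.parseToNode(src)
        text = ''
        while n:
            f = n.feature.split(',')
            if n.surface in converter:
                # Word registered in the converter table: replace it outright
                text += converter[n.surface]
            elif len(f) > 8:
                gobi = self._get_gobi(n)
                if gobi is not None:
                    # Sentence-final word: use the Yukkuri-style ending
                    text += gobi
                elif f[8] != '*':
                    # Use the katakana reading from the feature list
                    text += f[8]
                else:
                    text += n.surface
            else:
                # No reading available: keep the surface form
                text += n.surface
            if self._check_san(n):
                text += 'Mr.'  # honorific suffix ("san")
            elif self._check_separator(n):
                text += '、'
            n = n.next
        # The readings are katakana, so convert the whole text to hiragana
        return jctconv.kata2hira(text.decode('utf-8')).encode('utf-8')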

Example of using the above class:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from yukkuri_translator import MarisaTranslator


if __name__ == "__main__":
    t = MarisaTranslator('yukkuri.dic')
    print t.translate("Don't just make shit videos, this de morons.")

Description

Conversion to hiragana

Writing every character in hiragana makes the line sound as if it came from a bean-paste brain.

To do this, first run a morphological analysis with MeCab, which gives the reading of each word. The reading is the field at index 8 of the node's feature list (counting from 0). Since this reading is in katakana, jctconv is used to convert everything to hiragana.

Some words may get the wrong reading, but that is **by design**, because the speaker has only bean paste for brains.
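
As a rough, dependency-free illustration of this step (a sketch for Python 3, not the article's code), the snippet below takes a hand-written feature string standing in for MeCab output, picks the field at index 8, and shifts katakana to hiragana using the fixed offset between the two Unicode blocks. The names `kata_to_hira` and `reading_of` and the sample string are hypothetical; the article itself relies on `jctconv.kata2hira`.

```python
# Katakana U+30A1..U+30F6 and hiragana U+3041..U+3096 are laid out in
# parallel, so subtracting 0x60 maps one onto the other.
KATA_START, KATA_END, OFFSET = 0x30A1, 0x30F6, 0x60


def kata_to_hira(text):
    """Replace katakana letters with hiragana (stand-in for jctconv.kata2hira)."""
    return ''.join(
        chr(ord(ch) - OFFSET) if KATA_START <= ord(ch) <= KATA_END else ch
        for ch in text
    )


def reading_of(feature, surface):
    """Return field 8 of a comma-separated feature string, else the surface form."""
    fields = feature.split(',')
    if len(fields) > 8 and fields[8] != '*':
        return fields[8]
    return surface


# Hypothetical feature string, written by hand for illustration only.
sample_feature = '名詞,一般,*,*,*,*,動画,ドウガ,ドーガ'
hira = kata_to_hira(reading_of(sample_feature, '動画'))
print(hira)
```

The long-vowel mark "ー" lies outside the shifted range, so it passes through unchanged. A word whose feature list has no usable field 8 simply keeps its surface form, which mirrors the fallback in `translate`.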

Add "、" where appropriate, since the text is all hiragana

Yukkuri speech is written almost entirely in hiragana, which hurts readability. To compensate, a "、" is added after particles wherever possible. See "_check_separator" for the exact conditions.
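
The rule can be tried without MeCab. The following is a minimal Python 3 sketch, assuming tokens are already available as (surface, part-of-speech) pairs. `Token`, `needs_separator`, and `join_with_separators` are hypothetical names, and the English tag strings are stand-ins for whatever tags the MeCab dictionary in use actually emits.

```python
from collections import namedtuple

# Hypothetical token type: surface text plus a part-of-speech tag.
Token = namedtuple('Token', ['surface', 'pos'])


def needs_separator(tokens, i):
    """True if a separator should follow tokens[i]."""
    if tokens[i].pos != 'Particle':
        return False
    if i + 1 >= len(tokens):
        return False  # nothing follows the particle
    # No separator before a symbol or before another particle
    return tokens[i + 1].pos not in ('symbol', 'Particle')


def join_with_separators(tokens, sep='、'):
    """Concatenate the surfaces, inserting sep after qualifying particles."""
    parts = []
    for i, tok in enumerate(tokens):
        parts.append(tok.surface)
        if needs_separator(tokens, i):
            parts.append(sep)
    return ''.join(parts)


tokens = [
    Token('これ', 'noun'),
    Token('は', 'Particle'),
    Token('ねこ', 'noun'),
    Token('です', 'Auxiliary verb'),
    Token('。', 'symbol'),
]
print(join_with_separators(tokens))  # これは、ねこです。
```

Skipping the separator before another particle keeps compound particles such as "には" together.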

Add "san" after nouns

Adding "san" after a noun gives the text its Yukkuri flavor. There are exceptions, such as when another noun follows, so see "_check_san" for the details.
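
Here is a simplified Python 3 sketch of that decision, again on hand-made tokens rather than real MeCab nodes. `Token` and `needs_san` are hypothetical names, the tag strings are stand-ins, and the honorific check is reduced to a single suffix test, so it is not a faithful copy of "_check_san".

```python
from collections import namedtuple

# Hypothetical token type: surface, part of speech, and its subcategory.
Token = namedtuple('Token', ['surface', 'pos', 'subtype'])


def needs_san(tokens, i, honorific='さん'):
    """True if the honorific should be appended after tokens[i]."""
    tok = tokens[i]
    if tok.pos != 'noun' or tok.subtype not in ('Proper noun', 'General'):
        return False
    if i + 1 >= len(tokens):
        return True  # a noun at the very end always gets the honorific
    if tokens[i + 1].pos in ('noun', 'Auxiliary verb'):
        return False  # part of a compound, or followed by an auxiliary verb
    return not tok.surface.endswith(honorific)


tokens = [
    Token('ねこ', 'noun', 'General'),
    Token('が', 'Particle', '*'),
    Token('やま', 'noun', 'General'),
    Token('だ', 'Auxiliary verb', '*'),
]
flags = [needs_san(tokens, i) for i in range(len(tokens))]
print(flags)  # [True, False, False, False]
```

Only the first noun qualifies here: the second one is followed by an auxiliary verb, so it is left alone.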

Add "noze" to the end of the sentence

Yukkuri Marisa has a distinctive way of ending sentences: in many cases they end with "Noze" or "Nanoze", so I reproduced that.

For example, the input

Managing payments and spending is a matter of course.

becomes

Managing payments and spending is a matter of course, noze.

See "_get_gobi" for the conditions under which the ending is added.
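
The conditions can be sketched as a plain function in Python 3. This is an illustration under assumptions, not the article's implementation: `sentence_ending` is a hypothetical name, the tag strings are stand-ins for the dictionary's real tags, and the special case that maps "です" to "なのぜ" is my reading of the "Nanoze" ending mentioned above.

```python
def sentence_ending(surface, pos, form, next_surface, next_pos):
    """Return the replacement for a sentence-final word, or None to leave it as is."""
    if next_surface == '、':
        return None  # a comma follows, so the sentence continues
    if next_pos not in ('BOS/EOS', 'symbol'):
        return None  # not at the end of a sentence
    if pos in ('Particle', 'noun', 'symbol', 'Interjection'):
        return None  # these word classes keep their normal form
    if form in ('Command e', 'Continuous form'):
        return None  # imperative and continuative forms are left alone
    if surface == 'です':
        return 'なのぜ'  # assumed special case for the copula
    return surface + 'のぜ'


print(sentence_ending('だ', 'Auxiliary verb', 'Basic form', '。', 'symbol'))  # だのぜ
print(sentence_ending('ねこ', 'noun', '*', '。', 'symbol'))  # None
```

Checking for a following "、" first matters because the comma is itself a symbol and would otherwise be mistaken for the end of a sentence.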

Replace words

Some words are replaced outright. For example, the dirty word "shit" is replaced with "Yes Yes" to keep the reader calm. The replacements are taken from the entries registered in the converter variable.
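
The lookup itself is an ordinary dictionary access with a fallback. A minimal Python 3 sketch, using a few entries copied from the converter table (`replace_words` is a hypothetical helper that is not part of the article's class):

```python
# A few entries copied from the article's converter table.
converter = {
    'shit': 'Yes Yes',
    'urine': 'Shishi',
    'candy': 'Fair',
}


def replace_words(surfaces, table):
    """Swap each surface form found in table; leave the others untouched."""
    return [table.get(surface, surface) for surface in surfaces]


words = ['shit', 'videos', 'candy']
print(replace_words(words, converter))  # ['Yes Yes', 'videos', 'Fair']
```

Because the match is on the exact surface form, every inflected form that should be replaced needs its own entry in the table.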

Summary

By using MeCab's morphological analysis, I confirmed that sentences that are bad for mental health can be disguised so that they sound as if a cute Yukkuri were speaking.

The same approach could probably be applied to translate text into the speech styles of "Yukkuri Reimu", "Yukkuri Youmu", or "Yaruo".

The web application and its source code are available at the links below.

**Yukkuri Translation**
http://needtec.sakura.ne.jp/yukkuri_translator/
https://github.com/mima3/yukkuri_translator

That's all.
