I made a Python class that returns MeCab analysis results as ndarrays

0. How to use MeCab with Python

  1. Do your best to install MeCab. **If you are using 64-bit Python (3.7 or later) on Windows, don't be fooled by outdated information.** Installing MeCab via mecab-python-windows doesn't work. Instead, `pip install mecab` worked on the first try.
  2. `import MeCab`. Note that it is not `import Mecab`.
  3. Call it like `print(MeCab.Tagger('-Owakati').parse(text))`, where `text` is the Japanese sentence you want to split into words.

What you get back is one long string, not a list or tuple, which is a little inconvenient.
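As a minimal sketch of working around that, the string can be split on whitespace. The function name `split_wakati` and the sample string are assumptions made for illustration; the sample is typed by hand to resemble `-Owakati` output and was not produced by running MeCab here.

```python
def split_wakati(wakati_output):
    """Turn the space-separated string from -Owakati into a list of words."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so a trailing space or newline is discarded as well.
    return wakati_output.split()


# Hand-written stand-in for what parse() returns (assumption, not real MeCab output)
sample_output = "全米 が 泣い た 。 \n"
words = split_wakati(sample_output)
print(words)
```

This only recovers the surface forms. Part of speech, readings and so on are not present in `-Owakati` output at all, which is why the class in the next section uses `-Odump` instead.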

1. What I did

I wrote a class in Python. The constructor calls MeCab.Tagger('-Odump').parse() and stores everything MeCab returns in a field. Each method then pulls only the information it needs out of that field with a regular expression and returns it.
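The extraction idea can be tried in isolation, without MeCab installed. The sketch below is not the author's class: `SAMPLE_DUMP` is hand-written text shaped the way the class's regular expressions expect `-Odump` output to look (an id, the surface form, a comma-separated feature list, then numeric columns), and the numeric columns are placeholders. The helper name `extract` is also hypothetical.

```python
import re

# Hand-written text shaped like -Odump output (an assumption for illustration):
# "<id> <surface> <comma-separated features> <numeric columns...>".
# The numeric columns are placeholders, not values produced by MeCab.
SAMPLE_DUMP = (
    "0 BOS BOS/EOS,*,*,*,*,*,*,*,* 0 0 0\n"
    "1 全米 名詞,固有名詞,地域,一般,*,*,全米,ゼンベイ,ゼンベイ 0 6 0\n"
    "2 が 助詞,格助詞,一般,*,*,*,が,ガ,ガ 6 9 0\n"
    "3 泣い 動詞,自立,*,*,五段・カ行イ音便,連用タ接続,泣く,ナイ,ナイ 9 15 0\n"
    "4 た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ 15 18 0\n"
    "5 EOS BOS/EOS,*,*,*,*,*,*,*,* 18 18 0\n"
)


def extract(pattern, dump):
    """Return the captured group for every token line, without the EOS line."""
    # Each pattern starts with a newline, so the first (BOS) line never matches;
    # the last match is the EOS line and is dropped.
    return re.findall(pattern, dump)[:-1]


# Surface form: the field right after the numeric id
surfaces = extract(r"\n[0-9]+ ([^ ]*)", SAMPLE_DUMP)
# Part of speech: the first comma-separated feature
pos_tags = extract(r"\n[0-9]+ [^ ]* ([^,]*)", SAMPLE_DUMP)
# Base form: skip six features, capture the seventh
base_forms = extract(r"\n[0-9]+ [^ ]* (?:[^,]*,){6}([^,]*)", SAMPLE_DUMP)

print(surfaces)
print(pos_tags)
print(base_forms)
```

Changing the repeat count in `(?:[^,]*,){6}` selects a different feature column, which is how the methods of the class differ from one another.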

The code is as follows.

MeCab_handler.py


import re, MeCab
import numpy as np
import jaconv
from pykakasi import kakasi

class MeCab_handler:
    """
    Runs MeCab.Tagger('-Odump').parse() on the constructor argument.
    Each method returns part of the result as a one-dimensional ndarray.
    
    """
    def __init__(self, sentence):
        self.parse_result = MeCab.Tagger('-Odump').parse(sentence)

    def get_separated(self):
        """
        Surface form of each word (word segmentation)
        """
        tmp = np.array(re.findall('\n[0-9]+ ([^ ]*)', self.parse_result))
        return tmp[0:np.size(tmp)-1] #Cut EOS

    def get_words_basic(self):
        """
        Base (dictionary) form of each word
        """
        tmp = np.array(re.findall('\n[0-9]+ [^ ]* (?:[^,]*,){6,6}([^,]*)', self.parse_result))
        return tmp[0:np.size(tmp)-1]

    def get_POS(self, need_detail=False):
        """
        Part of speech.
        If the optional argument need_detail is True, the subcategories
        (up to 3 levels) are included where they exist.
        """
        if need_detail:
            tmp = np.array(re.findall('\n[0-9]+ [^ ]* ([^,]*(?:,[^*,]+(?:,[^*,]+(?:,[^*,]+)?)?)?)', self.parse_result))
        else:
            tmp = np.array(re.findall('\n[0-9]+ [^ ]* ([^,]*)', self.parse_result))
            
        return tmp[0:np.size(tmp)-1] #Cut EOS

    def get_conjugation_type(self):
        """
        Conjugation type (None where MeCab reports '*')
        """
        tmp = np.array(re.findall('\n[0-9]+ [^ ]* (?:[^,]*,){4,4}([^,]*)', self.parse_result), dtype='object')
        tmp = np.where(tmp=='*', None, tmp)
        return tmp[0:np.size(tmp)-1]

    def get_conjugation_form(self):
        """
        Conjugation form
        """
        tmp = np.array(re.findall('\n[0-9]+ [^ ]* (?:[^,]*,){5,5}([^,]*)', self.parse_result))
        return tmp[0:np.size(tmp)-1]

    def get_katakana(self):
        """
        Reading in katakana
        """
        tmp = np.array(re.findall('\n[0-9]+ [^ ]* (?:[^,]*,){7,7}([^,]*)', self.parse_result))
        return tmp[0:np.size(tmp)-1]

    def get_hiragana(self):
        """
        Reading in hiragana
        """
        katakanas = self.get_katakana()
        hiraganas = np.zeros(0, dtype=katakanas.dtype)
        for katakana in katakanas:
            hiraganas = np.append(hiraganas, jaconv.kata2hira(katakana))
        return hiraganas
            
        
    def get_how_to_speak(self):
        """
        Pronunciation in romaji (Hepburn).
        May differ from the readings returned by get_hiragana and get_katakana.
        """
        tmp = np.array(re.findall('\n[0-9]+ [^ ]* (?:[^,]*,){8,8}([^ ]*)', self.parse_result))
        katakanas = tmp[0:np.size(tmp)-1]

        kakac = kakasi()
        kakac.setMode("K", "a") #Katakana to ascii
        kakac.setMode("r", "Hepburn") #Hepburn is adopted for Romaji
        conv = kakac.getConverter()

        romans = np.zeros(0, dtype='object')        
        for katakana in katakanas:
            romans = np.append(romans, conv.do(katakana))
        return romans
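
One detail of the class is worth a note: get_hiragana and get_how_to_speak grow their result with np.append inside a loop, and np.append returns a new array each time, so the data is copied on every iteration. A possible alternative is to build an ordinary list first and convert it once at the end. In the sketch below, `kata_to_hira` is a hypothetical stand-in for `jaconv.kata2hira`, written only so the example runs without third-party packages. It is not jaconv's implementation.

```python
def kata_to_hira(text):
    """Shift katakana characters to hiragana, leaving everything else as is."""
    # Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above hiragana ぁ..ゖ.
    # The long-vowel mark ー and punctuation are outside that range and stay unchanged.
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def to_hiragana_all(katakanas):
    """Convert every reading in one pass and return a plain list."""
    return [kata_to_hira(k) for k in katakanas]


# Hand-written readings, in the shape get_katakana() is expected to return
readings = ["ゼンベイ", "ガ", "ナイ", "タ", "。"]
print(to_hiragana_all(readings))
```

Inside the class, the same pattern would be a single `np.array([jaconv.kata2hira(k) for k in self.get_katakana()])`, which keeps the ndarray return type.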

What each method does is described in the source code, and the table below summarizes it.

| Method | Example: output of print(MeCab_handler('The United States cried. Movie Doraemon "Nobita's Theory and Practice"').method()) |
| --- | --- |
| get_separated() | ['National' 'is' 'crying' '.' 'Movie' 'Doraemon' '"' 'Nobita' 'Theory' 'and' 'Practice' '"'] |
| get_words_basic() | ['National' 'is' 'cry' '.' 'Movie' 'Doraemon' '"' 'Nobita' 'Theory' 'and' 'Practice' '"'] |
| get_POS() | ['noun''particle'' particle''auxiliary verb''symbol''noun''noun''symbol''particle' |
| get_POS(True) | ['noun, proper noun, region, one'" particle, case particle, general'' verb, independence''particle'' symbol, punctuation''noun, general' '''Noun, proper noun, person's name, first name'' Noun, generalization''Noun, general''' Noun, case particle, general''Noun, Sahen connection''Noun, parenthesis closing'] |
| get_conjugation_type() | [None None 'Five-stage / Kakou Ionbin' 'Special / Ta' None None None None None None None None None None] |
| get_conjugation_form() | ['*' '*' 'Conjugated word' 'Uninflected word' '*' '*' '*' '*' '*' '*' '*' '*' '*' '*'] |
| get_katakana() | ['Zenbei' 'Ga' 'Nai' 'Ta' '.' 'Aiga' 'Doraemon' '"' 'Nobita' 'No' 'Lilon' 'To' 'Jissen' '"'] |
| get_hiragana() | ['Zenbei''is''not''wa'. ''Eiga''Doraemon' ' |
| get_how_to_speak() | ['zenbei' 'ga' 'nai' 'ta' '。' 'eiga' 'doraemon' '「' 'nobita' 'no' 'riron' 'to' 'jissen' '」'] |
