[PYTHON] When I try to tokenize a list with MeCab, I get "TypeError: in method 'Tagger_parse', argument 2 of type 'char const *'"

When I try to tokenize (wakati-gaki) the contents of a list with MeCab, I get "TypeError: in method 'Tagger_parse', argument 2 of type 'char const *'".

The error message says that argument 2 is wrong, so I suspected that either the format of my CSV file or the way I wrote the code was at fault, but I could not solve it.

My code is based on the reference site below. Since I do not need to index the labels, I deleted that part, and then got various errors one after another. I had not expected that many errors, because I intended to delete only the label-dependent variables. Reference site: https://qiita.com/Qazma/items/0daf927e34d22617ddcd

Sorry for the trouble, but I would appreciate help from anyone who understands the cause.

Supplement: The CSV file has a single column, with one sentence per line.
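As a small illustration of what that file layout means on the Python side, here is a minimal sketch. It assumes an in-memory sample (the two English sentences are made up, not taken from corpus_MEIDAI.csv) and only shows the shape of the data that `csv.reader` produces: every row is a list, even when the file has a single column.

```python
import csv
import io

# Hypothetical stand-in for a one-column CSV file with one sentence per line.
sample = io.StringIO("This is a pen\nThat is a cat\n")

rows = list(csv.reader(sample))
print(rows)           # each row is a list, even with a single column
print(type(rows[0]))  # <class 'list'>

# The sentence itself is the first element of the row.
first_sentence = rows[0][0]
print(type(first_sentence))  # <class 'str'>
```

So a loop such as `for text in csv.reader(f)` hands over a list on each iteration, not a string.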

```
2020-12-25 11:55:30.878680: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
C:\Users\Katuta\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.2) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "ex.py", line 5, in <module>
    padded, one_hot_y, word_index, tokenizer, max_len, vocab_size = wakatigaki.create_tokenizer()
  File "C:\Users\Katuta\gotou\wakatigaki.py", line 21, in create_tokenizer
    text_wakati = wakati.parse(text)
  File "C:\Users\Katuta\AppData\Local\Programs\Python\Python38\lib\site-packages\MeCab.py", line 293, in parse
    return _MeCab.Tagger_parse(self, *args)
TypeError: in method 'Tagger_parse', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'Tagger_parse'.
  Possible C/C++ prototypes are:
    MeCab::Tagger::parse(MeCab::Model const &,MeCab::Lattice *)
    MeCab::Tagger::parse(MeCab::Lattice *) const
    MeCab::Tagger::parse(char const *)
```

```python
import MeCab
import csv
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

def create_tokenizer() :
    text_list = []
    with open("C:/Users/Katuta/gotou/corpus_MEIDAI.csv",'r',encoding="utf-8",errors='ignore') as csvfile :
        texts = csv.reader(csvfile)

        for text in texts :
            text_list.append(text)

        # Use MeCab to divide Japanese text.
        wakati_list = []
        for text in text_list :
            text = list(map(str.lower,text))

            wakati = MeCab.Tagger("-O wakati")
            text_wakati = wakati.parse(text)
            wakati.parse('')
            wakati_list.append(text_wakati)

        # Find the number of elements in the longest sentence.
        # Create a list of text data to use in the tokenizer.
        max_len = -1
        split_list = []
        sentences = []
        for text in wakati_list :
            text = text.split()
            split_list.extend(text)
            sentences.append(text)

            if len(text) > max_len :
                max_len = len(text)
        print("Max length of texts: ", max_len)
        vocab_size = len(set(split_list))
        print("Vocabulary size: ", vocab_size)

        # Use Tokenizer to assign numbers to words, starting from index 1.
        # Also create a dictionary.
        tokenizer = tf.keras.preprocessing.text.Tokenizer(oov_token="<oov>")
        tokenizer.fit_on_texts(split_list)
        word_index = tokenizer.word_index
        print("Dictionary size: ", len(word_index))
        sequences = tokenizer.texts_to_sequences(sentences)

        # Use to_categorical() to create the one-hot vectors passed to the model as label data.
        one_hot_y = tf.keras.utils.to_categorical(sentences)

        # To match the size of the training data, pad shorter texts with 0 up to the length of the longest text.
        padded = pad_sequences(sequences, maxlen=max_len, padding="post", truncating="post")
        print("padded sequences: ", padded)

        return padded, one_hot_y, word_index, tokenizer, max_len, vocab_size
```

Self-resolved

```python
# MeCab's Tagger.parse() wraps a C++ function that takes a `char const *`,
# so from Python it must be given a str.
# Each row returned by csv.reader is a list, and passing that list to parse() raises the TypeError.
# Converting text to a string with str() avoids the error.
# Note: str() applied to a list keeps the brackets and quotes, e.g. "['sentence']".

# Use MeCab to divide Japanese text.
wakati_list = []
for text in text_list :
    text = str(text).lower()
    wakati = MeCab.Tagger("-Owakati")
    text_wakati = wakati.parse(text)
    wakati.parse('')
    wakati_list.append(text_wakati)
```
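As a possible alternative to calling `str()` on the whole row, the sketch below takes the first column of each row so that MeCab receives only the sentence text. This is an illustration under assumptions, not the method from the reference site: `rows_to_sentences` and `wakati_all` are hypothetical helper names, the sample text is made up, and the MeCab part is skipped when MeCab or its dictionary is not available.

```python
import csv
import io


def rows_to_sentences(rows):
    """Turn csv.reader rows (lists of str) into lowercased plain strings."""
    sentences = []
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines
            continue
        sentences.append(row[0].lower())  # first (and only) column
    return sentences


def wakati_all(sentences, parse):
    """Apply a str -> str function (e.g. a Tagger's parse method) to each sentence."""
    return [parse(sentence) for sentence in sentences]


# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO("今日は晴れです\n\nHello World\n")
sentences = rows_to_sentences(csv.reader(sample))
print(sentences)

try:
    import MeCab
    tagger = MeCab.Tagger("-Owakati")  # create the tagger once, outside the loop
    print(wakati_all(sentences, tagger.parse))
except (ImportError, RuntimeError):
    # MeCab or its dictionary is not available here; the string preparation above still applies.
    pass
```

Passing the parse function as an argument keeps the string handling separate from MeCab, so the row-to-string step can be checked on its own.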

