"Mecab" that can analyze Japanese morphological elements. It's also an excellent tool, and it's built into each programming language and used in many places.
However, when implemented on Python3, there are cases where "** characters cannot be acquired on node.surface, which should be able to acquire characters, resulting in an error **". Correspondence memo in such a case.
Execution environment
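Running code like the following triggers the bug.
import MeCab

sentence = 'すもももももももものうち'  # sample input (any Japanese text; not from the original post)
tagger = MeCab.Tagger('-Ochasen')
node = tagger.parseToNode(sentence)
while node:
    print(node.surface)  # <= the surface string cannot be read and an encoding error occurs
    node = node.next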
The workaround is to parse an empty string once, and only then parse the target string. (Reference: How to use MeCab on Ubuntu 14.04 and Python 3)
import MeCab

sentence = 'すもももももももものうち'  # sample input (any Japanese text; not from the original post)
tagger = MeCab.Tagger('-Ochasen')
tagger.parse('')  # <= parse an empty string first
node = tagger.parseToNode(sentence)
while node:
    print(node.surface)  # <= now the surface string is returned
    node = node.next
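If you call parseToNode in several places, it may be convenient to keep the workaround in one spot. The following is only a minimal sketch: the helper name iter_surfaces is hypothetical (it is not part of MeCab), and it assumes the object passed in behaves like MeCab.Tagger, using only the parse and parseToNode calls shown above.

```python
def iter_surfaces(tagger, sentence):
    """Yield the surface string of each node produced for `sentence`.

    `tagger` is assumed to behave like MeCab.Tagger (parse / parseToNode).
    This helper is a hypothetical wrapper, not part of the MeCab API.
    """
    # Workaround from this memo: parse an empty string once before the real input.
    tagger.parse('')
    node = tagger.parseToNode(sentence)
    while node:
        # Skip nodes with an empty surface (such as sentence start/end markers).
        if node.surface:
            yield node.surface
        node = node.next


# Usage with the real binding (requires the MeCab Python package):
#   import MeCab
#   tagger = MeCab.Tagger('-Ochasen')
#   print(list(iter_surfaces(tagger, sentence)))
```

Because the empty-string parse lives inside the helper, callers cannot forget it. The cost is one extra parse call per sentence.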
I'm not sure why this works, but it seems to be a known bug. I hope it gets fixed soon, because it is an easy trap to fall into...