Make pyknp (JUMAN, KNP) available on Windows

Table of contents

  1. Development environment
  2. Download various software
  3. Confirm at the command prompt
  4. pyknp installation
  5. Usage test
  6. Program rewriting
  7. Reference / Digression

I hope you find this helpful. Good luck with the setup!

1. Development environment

Environment
* Windows 10
* Python 3.7.5 (64-bit)

Terminal
* Command prompt

2. Download various software

First, download JUMAN and KNP.

1.JUMAN

About third from the top of the download section you will find JUMAN Ver.7.0 (Windows 64bit version) (with installer; 8,330,604 bytes). Download it and open it to start the installation. The default settings should be fine.

2.KNP

In the same way, find the following entry in the KNP download section and download it: KNP Ver.4.11 (Windows 64bit version) (with installer; 979,363,446 bytes). The file is close to 1 GB, so the download takes some time. Opening the downloaded file starts the installer; just follow its instructions.

Set PATH

You have to set the PATH yourself. I followed an existing guide for this. All it takes is changing a few system settings.

  1. Open the system properties (for example, by searching for it)
  2. Advanced settings
  3. [Environment Variables] at the bottom
  4. [Path] under System variables
  5. [Edit]
  6. Add [C:\Program Files\juman]
  7. Add [C:\Program Files\knp]
  8. [OK]

Some people may need to reboot. I had to reboot before the change took effect.

Digression: pyknp is meant to be used with JUMAN++ and KNP, but JUMAN++ was troublesome to set up on Windows, so I use the older JUMAN instead. How to make pyknp work with it is described in section 6.

3. Confirm at the command prompt

First, open a command prompt. (Every command below can be stopped with Ctrl + C.)

Confirmation method 1: run juman

C:~\> juman
Enter some text
(Successful output; the real output is in Japanese)
Something something something adverb 8 * 0 * 0 * 0 "Representative notation:Something/Something Standard:what/What+Or/Or"
Sentence Bunsho Sentence Noun 6 Common noun 1 * 0 * 0 "Representative notation:Sentence/Bunsho Category:Abstract"
To the particle 9 case particle 1 * 0 * 0 NIL
Input Nyuryoku Input Noun 6 Sahen Noun 2 * 0 * 0 "Representative notation:input/Nyuryoku Category:Abstract Domain:Science / Technology Antonym:noun-sahen noun:output/Shutsuryoku"
EOS

Confirmation method 2: run echo Some text | juman

C:~\>echo Enter some text| juman
Something something something adverb 8 * 0 * 0 * 0 "Representative notation:Something/Something Standard:what/What+Or/Or"
Sentence Bunsho Sentence Noun 6 Common noun 1 * 0 * 0 "Representative notation:Sentence/Bunsho Category:Abstract"
To the particle 9 case particle 1 * 0 * 0 NIL
Input Nyuryoku Input Noun 6 Sahen Noun 2 * 0 * 0 "Representative notation:input/Nyuryoku Category:Abstract Domain:Science / Technology Antonym:noun-sahen noun:output/Shutsuryoku"
  \  \Special 1 blank 6 * 0 * 0 NIL
EOS
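Each line of the output above describes one morpheme as space-separated columns. As a rough sketch of how such a line could be split by hand, the function below assumes the usual JUMAN column order (surface form, reading, base form, part of speech and its ID, subcategory and its ID, conjugation type and its ID, conjugated form and its ID, semantic information). The field names borrow pyknp's attribute names; the `_id` names, the function `parse_juman_line` and the sample line are illustrative assumptions, not captured output. Lines whose surface form is an escaped space, like the last morpheme above, are not handled.

```python
# Column names for one JUMAN output line (the *_id names are assumptions).
FIELDS = (
    "midasi", "yomi", "genkei",
    "hinsi", "hinsi_id",
    "bunrui", "bunrui_id",
    "katuyou1", "katuyou1_id",
    "katuyou2", "katuyou2_id",
    "imis",
)


def parse_juman_line(line):
    """Split one JUMAN output line into a dict, or return None if it is not a morpheme line."""
    line = line.rstrip("\r\n")
    # EOS ends a sentence; lines starting with "@" are alternative analyses.
    if line == "EOS" or line.startswith("@"):
        return None
    # The last column (semantic information) may itself contain spaces,
    # so split at most len(FIELDS) - 1 times.
    parts = line.split(" ", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


# Hand-written sample in the assumed column layout.
sample = '文章 ぶんしょう 文章 名詞 6 普通名詞 1 * 0 * 0 "代表表記:文章/ぶんしょう カテゴリ:抽象物"'
print(parse_juman_line(sample))
```

In practice pyknp does this parsing for you; the sketch is only meant to make the column layout easier to read.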

Confirmation method 3: run juman | knp

C:~\> juman | knp
Enter some text
# S-ID:1 KNP:4.11-CF1.1 DATE:2020/11/23 SCORE:-27.41598
Something ──┐
Sentence ──┤
input
EOS

(Failure example)

'juman' is not recognized as an internal or external command,
operable program or batch file.
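The failure message above means the command could not be found on the PATH. The same check can be done from Python with the standard library's `shutil.which`. This is a minimal sketch; the helper name `find_commands` is made up for illustration.

```python
import shutil


def find_commands(names=("juman", "knp")):
    """Map each command name to its full path, or to None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_commands().items():
    print(name, "->", path if path else "not found on PATH")
```

If either command is reported as not found, recheck the PATH entries from section 2 and reopen the terminal (or reboot).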

4. pyknp installation

Working in the Visual Studio Code terminal, install pyknp with pip.

Terminal


C:~\>pip install pyknp

If the installation fails (in my case it seemed to be a character encoding error), switch Windows to UTF-8 by following https://qiita.com/Nidhog-tm/items/c7e9d759ce1a0f5c85c6. After that, garbled characters will probably show up somewhere, so afterwards uncheck the option again and switch the language to English and then back to Japanese to fix it.

5. Usage test

At this point you will almost certainly get an error. From here on, we use the following script to check for errors. (Reference: https://pyknp.readthedocs.io/en/latest/)

test.py


# coding: utf-8
from __future__ import unicode_literals  # Not needed on Python 3.
from pyknp import Juman

# Juman() defaults to JUMAN++ (jumanpp=True); use Juman(jumanpp=False) for JUMAN.
jumanpp = Juman()
# The analyzer expects Japanese input; the sentence is shown here in translation.
result = jumanpp.analysis("The approach to Shimogamo Shrine was dark.")
for mrph in result.mrph_list():  # Access each morpheme
    print("Heading:%s,reading:%s,Prototype:%s,Part of speech:%s,Part of speech subcategory:%s,Utilization type:%s,Inflected form:%s,Semantic information:%s,Representative notation:%s" \
            % (mrph.midasi, mrph.yomi, mrph.genkei, mrph.hinsi, mrph.bunrui, mrph.katuyou1, mrph.katuyou2, mrph.imis, mrph.repname))

### (Successful output) ###
Heading:Shimogamo,reading:Shimogamo,Prototype:Shimogamo,Part of speech:noun,Part of speech subcategory:Place name,Utilization type:*,Inflected form:*,Semantic information:Automatic acquisition:Wikipedia WikipediaPlace name,Representative notation:
Heading:Shrine,reading:Shinto shrine,Prototype:Shrine,Part of speech:noun,Part of speech subcategory:common noun,Utilization type:*,Inflected form:*,Semantic information:Representative notation:Shrine/Shinto shrine Domain:Culture / Arts Category:place-Facility place name end, Representative notation:Shrine/Shinto shrine
Heading:of,reading:of,Prototype:of,Part of speech:Particle,Part of speech subcategory:conjunctive particle,Utilization type:*,Inflected form:*,Semantic information:NIL,Representative notation:
Heading:Approach,reading:Sando,Prototype:Approach,Part of speech:noun,Part of speech subcategory:common noun,Utilization type:*,Inflected form:*,Semantic information:Representative notation:Approach/Sando Domain:Culture / Arts Category:place-Facility, Representative notation:Approach/Sando
Heading:Is,reading:Is,Prototype:Is,Part of speech:Particle,Part of speech subcategory:adverbial particle,Utilization type:*,Inflected form:*,Semantic information:NIL,Representative notation:
Heading:It was dark,reading:It was easy,Prototype:dark,Part of speech:adjective,Part of speech subcategory:*,Utilization type:i-adjective (a/u/o row),Inflected form:ta-form,Semantic information:Representative notation:dark/About, Representative notation:dark/About
Heading:。,reading:。,Prototype:。,Part of speech:Special,Part of speech subcategory:Punctuation,Utilization type:*,Inflected form:*,Semantic information:NIL,Representative notation:

6. Program rewriting

This part honestly took me a whole day. Once you know the cause, though, it only takes a few minutes... (give me my time back). The files to rewrite are the .py files inside the pyknp package.
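If you are not sure where pip put the pyknp package, the standard library can tell you. The following is a small sketch; the helper name `package_dir` is made up for illustration, and it prints None if pyknp is not installed in the Python you run it with.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is installed in, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# knp.py, juman.py and process.py should be inside this directory.
print(package_dir("pyknp"))
```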

1.knp.py

knp.py


# (Line 29)
# What to change:
#   jumancommand: change 'jumanpp' to 'juman'
#   jumanpp:      change True to False

# Before rewriting
    def __init__(self, command='knp', server=None, port=31000, timeout=60,
                 option='-tab', rcfile='', pattern=r'EOS',
                 jumancommand='jumanpp', jumanrcfile='',
                 jumanoption='', jumanpp=True):
# After rewriting
    def __init__(self, command='knp', server=None, port=31000, timeout=60,
                 option='-tab', rcfile='', pattern=r'EOS',
                 jumancommand='juman', jumanrcfile='',
                 jumanoption='', jumanpp=False):

2.juman.py

juman.py


# (Line 27)
# What to change:
#   command: change 'jumanpp' to 'juman'
#   jumanpp: change True to False

# Before rewriting
    def __init__(self, command='jumanpp', server=None, port=32000, timeout=30,
                 option='', rcfile='', ignorepattern='',
                 pattern=r'^EOS$', jumanpp=True):
# After rewriting
    def __init__(self, command='juman', server=None, port=32000, timeout=30,
                 option='', rcfile='', ignorepattern='',
                 pattern=r'^EOS$', jumanpp=False):
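Since these values are ordinary keyword arguments, it should also be possible to pass them when creating the objects instead of editing the installed files. The sketch below uses only the parameter names visible in the signatures above and assumes that `Juman` and `KNP` can both be imported from the top level of pyknp (the test script in section 5 does this for `Juman`). The names `JUMAN_ARGS`, `KNP_ARGS` and `make_analyzers` are made up for illustration. The changes to process.py in the next step are still needed either way.

```python
# Keyword arguments that switch pyknp from JUMAN++ to JUMAN.
JUMAN_ARGS = {"command": "juman", "jumanpp": False}
KNP_ARGS = {"jumancommand": "juman", "jumanpp": False}


def make_analyzers():
    """Create Juman and KNP objects that call juman instead of jumanpp."""
    # Imported here so that this module can be loaded even without pyknp installed.
    from pyknp import Juman, KNP
    return Juman(**JUMAN_ARGS), KNP(**KNP_ARGS)
```

The advantage of this approach is that the setting survives a reinstall or upgrade of pyknp, whereas edits inside the package directory are overwritten.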

3.process.py

process.py


# (Line 72)
# Change 1: replace the SIGALRM-based alarm (not available on Windows) with threading.Timer.
#           This needs "import threading" at the top of process.py.
signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(self.process_timeout)
# ↓
alarm = threading.Timer(self.process_timeout, alarm_handler)
alarm.start()

# Change 2: encode the input as cp932 instead of utf-8
self.process.stdin.write(sentence.encode('utf-8') + six.b('\n'))
# ↓
self.process.stdin.write(sentence.encode('cp932') + six.b('\n'))

# Change 3: decode the output as cp932 instead of utf-8
line = self.process.stdout.readline().rstrip().decode('utf-8')
# ↓
line = self.process.stdout.readline().rstrip().decode('cp932')

# Change 4: cancel the timer instead of clearing the alarm
signal.alarm(0)
# ↓
alarm.cancel()


# After rewriting
    def query(self, sentence, pattern):
        assert(isinstance(sentence, six.text_type))

        # threading.Timer calls the handler without arguments, so both get defaults.
        # Caveat: an exception raised in the timer thread does not interrupt
        # a readline() that is blocking the main thread.
        def alarm_handler(signum=None, frame=None):
            raise subprocess.TimeoutExpired(self.process_command, self.process_timeout)

        # Change 1
        # signal.signal(signal.SIGALRM, alarm_handler)
        # signal.alarm(self.process_timeout)
        alarm = threading.Timer(self.process_timeout, alarm_handler)
        alarm.start()
        result = ""
        try:
            # Change 2
            # self.process.stdin.write(sentence.encode('utf-8') + six.b('\n'))
            self.process.stdin.write(sentence.encode('cp932') + six.b('\n'))
            self.process.stdin.flush()
            while True:
                # Change 3
                # line = self.process.stdout.readline().rstrip().decode('utf-8')
                line = self.process.stdout.readline().rstrip().decode('cp932')
                if re.search(pattern, line):
                    break
                result = "%s%s\n" % (result, line)
        finally:
            # Change 4
            # signal.alarm(0)
            alarm.cancel()
        return result

By the way, here is how the files depend on each other.

knp.py calls juman.py. (If the command name used at this point is jumanpp (JUMAN++) rather than juman, an error occurs.)

juman.py calls process.py. (The command used here must also be juman.)

process.py calls subprocess, the module that runs terminal commands. (Japanese Windows mostly uses the cp932 character encoding, so using 'utf-8' here causes problems and has to be changed. The alarm is also rewritten for Windows, because signal.SIGALRM is not available there. This part of pyknp does not seem to have been written with the Windows terminal in mind, which is what makes it troublesome.)
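To illustrate the timer idea on its own, here is a minimal sketch of a watchdog built with `threading.Timer` that works on Windows as well as Linux. It is not pyknp's code: the function `run_with_timeout` and the echo child process are made up for illustration, and instead of raising an exception the timer kills the child process, which makes the blocked read in the main thread return.

```python
import subprocess
import sys
import threading


def run_with_timeout(args, text, timeout, encoding="utf-8"):
    """Send text to a child process and return (exit code, decoded output).

    A threading.Timer kills the child if it is still running after `timeout` seconds.
    """
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    watchdog = threading.Timer(timeout, proc.kill)
    watchdog.start()
    try:
        out, _ = proc.communicate(text.encode(encoding))
    finally:
        # Stop the timer if the child finished in time.
        watchdog.cancel()
    return proc.returncode, out.decode(encoding)


# A stand-in child that echoes its input back, used here instead of juman.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
print(run_with_timeout(ECHO, "abc\n", timeout=5.0))
```

With juman on the PATH, the same function could in principle be called with `["juman"]` and `encoding="cp932"`, under the assumption that juman reads and writes cp932 as described above.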

7. Reference / Digression

References:
JUMAN >>> http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN
KNP >>> http://nlp.ist.i.kyoto-u.ac.jp/?KNP
pyKNP >>> http://nlp.ist.i.kyoto-u.ac.jp/?PyKNP

Reference sites:
About pyKNP >>> https://pyknp.readthedocs.io/en/latest/
About subprocess >>> https://docs.python.org/ja/3.5/library/asyncio-subprocess.html
A good site I found while writing this article >>> http://chuckischarles.hatenablog.com/entry/2019/09/12/150505

What I have tried so far

In the end, what I wanted to do was dependency parsing, and this is what I tried before settling on the setup above.

  1. There is a dependency parser called CaboCha, but apparently it does not support 64bit, only 32bit. I gave up because that was too much trouble, which is why I use KNP. (MeCab, the morphological analyzer it also requires, can probably be installed easily with pip.)

  2. pyknp originally targets JUMAN++, but JUMAN++ has no installer and is troublesome to use outside Linux. (On Windows, the path settings, environment settings, utf-8 and other settings were extremely difficult. I could not get it to work.)

  3. Python ships with a module called subprocess, but one of the difficulties is that it seems to have been designed with Linux in mind. For example, it seems the Windows shell cannot be used unless you pass shell=True, but that setting is dangerous and not recommended, so I gave up on it.

  4. Because of the above, converting a string to bytes with utf-8 seems to change the number of characters (?). So this time I changed the conversion to use cp932 instead of utf-8.
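The effect mentioned in the last item can be seen with a short experiment. The same string has a different number of bytes in utf-8 and in cp932, and bytes produced in one encoding generally cannot be read back in the other. The helper name `decodes_as` is made up for illustration.

```python
text = "日本語"

utf8_bytes = text.encode("utf-8")    # 3 bytes per character for these characters
cp932_bytes = text.encode("cp932")   # 2 bytes per character for these characters

# Number of characters, then the byte length in each encoding.
print(len(text), len(utf8_bytes), len(cp932_bytes))  # 3 9 6


def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(cp932_bytes, "cp932"))  # True
print(decodes_as(cp932_bytes, "utf-8"))  # False
```

This is why both the encode and the decode call in process.py have to be switched together: what matters is that they match the encoding the external command actually uses.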

This is my first post, so if you find any mistakes, please let me know and I will correct them. I hope this article helps someone!

Many thanks to the people in the laboratory who developed juman, knp and pyknp.
