[Python] [Introduction to RasPi4] Environment construction; natural language processing with MeCab, etc. ♪

To run the conversation app, install the natural language processing libraries. The procedure is almost the same as on the ~~-nano~~, but **I had a hard time**, so I'd like to describe it carefully. The reference below covers most of it, but some directories are different, so I will address those differences here. 【reference】 Install mecab on ubuntu 18.10

Install MeCab

$ sudo apt install mecab
$ sudo apt install libmecab-dev
$ sudo apt install mecab-ipadic-utf8

That completes the installation itself.

$ mecab
Limited express Hakutaka
Limited express noun,General,*,*,*,*,Limited express,Tokyu,Tokkyu
Is a particle,Particle,*,*,*,*,Is,C,Wow
Phrasal verb,Independence,*,*,Kuru,Word connection special 2,come,Ku,Ku
Auxiliary verb,*,*,*,Special,Uninflected word,Ta,Ta,Ta
Ka particle,Sub-particles / parallel particles / final particles,*,*,*,*,Or,Mosquito,Mosquito
EOS

You should get output like the above. Note that with the default dictionary the proper noun "Hakutaka" is not recognized and gets split into pieces.

Install neologd

$ git clone https://github.com/neologd/mecab-ipadic-neologd.git
$ cd mecab-ipadic-neologd
$ sudo bin/install-mecab-ipadic-neologd

This installs without any problems, although downloading the dictionary took a while (about 30 minutes).

Edit /etc/mecabrc

A problem occurred here. On Ubuntu, the dictionary is installed in the following directory, but on Raspbian the path is different.

dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd

So, search for the directory where the dictionary actually exists. 【reference】 Find files [find and locate]

$ sudo find / -name '*mecab-ipadic-neologd*'
/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd

Now open the configuration file with the following command and rewrite the path (for the editor itself, see the vi reference). 【reference】 Basic operation of vi

$ sudo vi /etc/mecabrc

So, I rewrote it as follows.

$ cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir =/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd 
; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n

Then confirm that the dictionary has changed: "Hakutaka" is now segmented as a single proper noun.

$ mecab
Limited express Hakutaka
Limited express noun,General,*,*,*,*,Limited express,Tokyu,Tokkyu
Hakutaka noun,Proper noun,General,*,*,*,Hakutaka,Hakutaka,Hakutaka
EOS

Make it available from Python 3

sudo apt install swig
sudo apt install python3-pip
sudo pip3 install mecab-python3

Now the reference sample works as below.

$ python3 mecab_sample.py
noun,Proper noun,General,*,*,*,Hakutaka,Hakutaka,Hakutaka
Hakutaka
noun,Proper noun,Area,General,*,*,Toyama,Toyama,Toyama
Toyama
noun,Proper noun,Area,General,*,*,Kanazawa,Kanazawa,Kanazawa
Kanazawa
noun,Proper noun,Area,General,*,*,Kenrokuen,Kenrokuen,Kenrokuen
Kenrokuen
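
For reference, here is a minimal sketch of what a script like mecab_sample.py might look like. This is an assumption for illustration, not the author's actual code; it uses the MeCab.Tagger API from mecab-python3, and the sample sentence is a placeholder.

# Minimal sketch (assumed): list the proper nouns MeCab + NEologd finds in a sentence.
import MeCab

tagger = MeCab.Tagger()  # uses the dicdir configured in /etc/mecabrc
tagger.parse("")         # workaround for a parseToNode quirk in older mecab-python3

# Placeholder sentence: ride the Hakutaka from Toyama to Kenrokuen in Kanazawa
node = tagger.parseToNode("はくたかで富山から金沢の兼六園へ行く")
while node:
    features = node.feature.split(",")
    if features[0] == "名詞" and features[1] == "固有名詞":  # proper nouns only
        print(node.feature)
        print(node.surface)
    node = node.next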

Install pyaudio

The conversation app outputs the conversation as speech, so it uses PyAudio. 【reference】 Install PyAudio | Python memorandum

$ sudo apt-get install python3-pyaudio

I was able to install it successfully.
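
To confirm that PyAudio works, a minimal WAV-playback sketch like the following can be used. This is only an illustrative assumption, not part of the conversation app itself, and test.wav is a placeholder file name.

# Minimal sketch (assumed): play a WAV file through PyAudio.
import wave
import pyaudio

wf = wave.open("test.wav", "rb")  # placeholder file name
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(1024)
while data:
    stream.write(data)            # send audio frames to the output device
    data = wf.readframes(1024)

stream.stop_stream()
stream.close()
p.terminate()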

Install Pykakasi

This is used for generating recorded audio (the file names must be alphabetic) and for converting the text of the generated speech.

$ pip3 install pykakasi --user

Check it with the code below.

# coding: utf-8
# Convert Japanese (hiragana/katakana/kanji) to romaji with the pykakasi 1.x API.
from pykakasi import kakasi

kakasi = kakasi()
kakasi.setMode('H', 'a')  # hiragana -> ascii (romaji)
kakasi.setMode('K', 'a')  # katakana -> ascii (romaji)
kakasi.setMode('J', 'a')  # kanji    -> ascii (romaji)
conv = kakasi.getConverter()

filename = '本日は晴天なり.jpg'  # "It's a sunny day today" + .jpg
print(filename)
print(type(filename))
print(conv.do(filename))         # -> honjitsuhaseitennari.jpg

Output example.


$ python3 pykakasi_ex.py
本日は晴天なり.jpg
<class 'str'>
honjitsuhaseitennari.jpg

Environment

$ uname -a
Linux raspberrypi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

Run the conversation app

gensm_ex1.py

$ python3 gensm_ex1.py

Start training
Epoch: 1
gensm_ex1.py:16: DeprecationWarning: Call to deprecated `iter` (Attribute will be removed in 4.0.0, use self.epochs instead).
  model.train(sentences, epochs=model.iter, total_examples=model.corpus_count)
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Epoch: 10
Epoch: 11
Epoch: 12
Epoch: 13
Epoch: 14
Epoch: 15
Epoch: 16
Epoch: 17
Epoch: 18
Epoch: 19
Epoch: 20
SENT_0
[('SENT_2', 0.08270145207643509), ('SENT_3', 0.0347767099738121), ('SENT_1', -0.08307887613773346)]
SENT_3
[('SENT_0', 0.0347767099738121), ('SENT_1', 0.02076556906104088), ('SENT_2', -0.003991239238530397)]
SENT_1
[('SENT_3', 0.02076556347310543), ('SENT_2', 0.010350690223276615), ('SENT_0', -0.08307889103889465)]
gensm_ex1.py:33: DeprecationWarning: Call to deprecated `similar_by_word` (Method will be removed in 4.0.0, use self.wv.similar_by_word() instead).
  print (model.similar_by_word(u"fish"))
[('now', 0.15166150033473969), ('Sea', 0.09887286275625229), ('tomorrow', 0.03284810855984688), ('Cat', 0.019402338191866875), ('Barked', -0.0008345211390405893), ('swim', -0.02624458074569702), ('today', -0.05557712912559509), ('dog', -0.0900348424911499)]
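
For reference, here is a rough sketch of the kind of Doc2Vec training that produces output like the above. This is my assumption, not the author's gensm_ex1.py: the tagged, tokenized sentences are placeholders, and the gensim 3.x API is used.

# Minimal sketch (assumed): train Doc2Vec on a few tagged, tokenized sentences.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Placeholder corpus: each sentence is pre-tokenized (e.g. with MeCab) and tagged SENT_n.
sentences = [
    TaggedDocument(["猫", "が", "鳴いた"], ["SENT_0"]),
    TaggedDocument(["犬", "が", "吠えた"], ["SENT_1"]),
    TaggedDocument(["魚", "が", "海", "を", "泳ぐ"], ["SENT_2"]),
    TaggedDocument(["明日", "は", "晴れ"], ["SENT_3"]),
]

model = Doc2Vec(vector_size=50, min_count=1, epochs=1)
model.build_vocab(sentences)

print("Start training")
for epoch in range(20):                      # the log above shows 20 passes
    print("Epoch:", epoch + 1)
    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)

print(model.docvecs.most_similar("SENT_0"))  # similar sentences by tag
print(model.wv.similar_by_word("魚"))         # similar words ("fish")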

RaspberryPi4_conversation/model_skl.py

$ python3 model_skl.py
TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, use_idf=True, vocabulary=None)
{'I': 5, 'Soy sauce': 6, 'ramen': 2, 'Tonkotsu': 1, 'Like': 4, 'is': 0, 'miso': 3}
{'Soy sauce': 4, 'ramen': 1, 'Tonkotsu': 0, 'Like': 3, 'miso': 2}
Soy sauce 4
Ramen 1
Tonkotsu 0
Like 3
Miso 2
['Tonkotsu', 'ramen', 'miso', 'Like', 'Soy sauce']
  (0, 4)	0.4976748316029239
  (0, 1)	0.7081994831914716
  (0, 0)	0.3540997415957358
  (0, 3)	0.3540997415957358
  (1, 1)	0.7081994831914716
  (1, 0)	0.3540997415957358
  (1, 3)	0.3540997415957358
  (1, 2)	0.4976748316029239
{'Soy sauce': 6, 'ramen': 3, 'Tonkotsu': 2, 'Like': 5, 'miso': 4, 'Katsudon': 1, 'Okonomiyaki': 0}
  (0, 6)	0.5486117771118656
  (0, 3)	0.6480379064629606
  (0, 2)	0.4172333972107692
  (0, 5)	0.3240189532314803
  (1, 3)	0.6480379064629607
  (1, 2)	0.41723339721076924
  (1, 5)	0.32401895323148033
  (1, 4)	0.5486117771118657
  (2, 3)	0.35959372325985667
  (2, 5)	0.35959372325985667
  (2, 1)	0.6088450986844796
  (2, 0)	0.6088450986844796
[[1.         0.69902512 0.34954555]
 [0.69902512 1.         0.34954555]
 [0.34954555 0.34954555 1.        ]]
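
The output above looks like TF-IDF vectorization followed by cosine similarity. A minimal sketch of that pattern with scikit-learn is shown below; the tokenized sentences are placeholders of my own, not the author's data.

# Minimal sketch (assumed): TF-IDF vectors and cosine similarity with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pre-tokenized (space-separated) sentences; placeholders for illustration.
docs = [
    "醤油 ラーメン が 好き",
    "味噌 ラーメン が 好き",
    "カツ丼 と お好み焼き が 好き",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # sparse document-term matrix
print(vectorizer.vocabulary_)            # word -> column index
print(tfidf)                             # (doc, term)  weight
print(cosine_similarity(tfidf))          # pairwise document similarity matrix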

With all of this in place, I finally arrived at the conversation app. RaspberryPi4_conversation/auto_conversation_.py

$ python3 auto_conversation_.py -i data/conversation_n.txt  -s data/stop_words.txt
data/conversation_n.txt
>It's nice weather today
(0.41):That's right.
>What is that
(0.55):What is a dog
>The dog is a dog
(0.41):Where was the dog
>It's an alley over there
(0.46):* * * What is it?
>Near the park
(0.00): """"Why do I have to do it alone? Please help me a little.","""
>What should i help
(0.46):(Yes, that's good)
>Somehow it's crazy
(0.38):Who are you talking about
>You see
(0.00):Say "Awanchan", and by the way, these people went over here, and we went over here.
>I don't remember that
(0.35):I'm not an idiot
>I think it's stupid
(0.33):He looks good.
> 
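
The similarity score printed before each reply suggests a retrieval-style approach: the user's input is compared against a corpus of utterance/response pairs and the reply whose utterance matches best is returned. A rough sketch of that idea (my assumption, not the actual auto_conversation_.py) could look like this:

# Minimal sketch (assumed): retrieval-based replies via MeCab tokenization + TF-IDF.
import MeCab
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tagger = MeCab.Tagger("-Owakati")        # tokenize into space-separated words

# Placeholder corpus of (utterance, reply) pairs.
pairs = [
    ("今日はいい天気ですね", "そうですね"),
    ("犬はどこにいましたか", "あそこの路地です"),
]
utterances = [tagger.parse(u).strip() for u, _ in pairs]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(utterances)

while True:
    text = input("> ")
    if not text:
        break
    vec = vectorizer.transform([tagger.parse(text).strip()])
    scores = cosine_similarity(vec, matrix)[0]
    best = scores.argmax()
    print("(%.2f):%s" % (scores[best], pairs[best][1]))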

Below are the installation steps for the remaining packages the app needs.

【reference】 How to install scipy and numpy on Ubuntu 16.04?

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install python3-numpy python3-scipy
$ sudo pip3 install numpy scipy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.16.2)
Requirement already satisfied: scipy in /usr/lib/python3/dist-packages (1.1.0)
$ pip3 install --user gensim

Successfully installed boto-2.49.0 boto3-1.11.14 botocore-1.14.14 gensim-3.8.1 jmespath-0.9.4 s3transfer-0.3.3 smart-open-1.9.0

【reference】 Install scikit-learn in Ubuntu

$ sudo pip3 install scikit-learn
...
Requirement already satisfied: scipy>=0.17.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.1.0)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.16.2)
Installing collected packages: joblib, scikit-learn
Successfully installed joblib-0.14.1 scikit-learn-0.22.1

Summary

・Installed the libraries required for natural language processing on the RasPi4
・For the time being, I was able to run a natural language application

・ I want to make the conversation app a little more decent

Bonus

I'll also record the installed packages here. 【reference】 Difference between pip list and freeze

$ pip3 freeze > requirements.txt

RaspberryPi4_conversation/requirements.txt

$ pip3 freeze
absl-py==0.9.0
arrow==0.15.5
asn1crypto==0.24.0
astor==0.8.1
astroid==2.1.0
asttokens==1.1.13
attrs==19.3.0
automationhat==0.2.0
backcall==0.1.0
beautifulsoup4==4.7.1
bleach==3.1.0
blinker==1.4
blinkt==0.1.2
boto==2.49.0
boto3==1.11.14
botocore==1.14.14
buttonshim==0.0.2
Cap1xxx==0.1.3
certifi==2018.8.24
chardet==3.0.4
Click==7.0
colorama==0.3.7
colorzero==1.1
cookies==2.2.1
cryptography==2.6.1
cycler==0.10.0
Cython==0.29.14
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
docutils==0.14
drumhat==0.1.0
entrypoints==0.3
envirophat==1.0.0
ExplorerHAT==0.4.2
Flask==1.0.2
fourletterphat==0.1.0
gast==0.3.3
gensim==3.8.1
google-pasta==0.1.8
gpiozero==1.5.1
grpcio==1.27.1
h5py==2.10.0
html5lib==1.0.1
idna==2.6
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.4
itsdangerous==0.24
jedi==0.13.2
jinja2-time==0.2.0
jmespath==0.9.4
joblib==0.14.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.1
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==17.1.1
keyrings.alt==3.1.1
kiwisolver==1.1.0
klepto==0.1.8
lazy-object-proxy==1.3.1
logilab-common==1.4.2
lxml==4.3.2
make==0.1.6.post1
Markdown==3.2
MarkupSafe==1.1.0
matplotlib==3.1.3
mccabe==0.6.1
mecab-python3==0.996.3
microdotphat==0.2.1
mistune==0.8.4
mote==0.0.4
motephat==0.0.2
mypy==0.670
mypy-extensions==0.4.1
nbconvert==5.6.1
nbformat==5.0.4
notebook==6.0.3
numpy==1.16.2
oauthlib==2.1.0
olefile==0.46
opencv-python==3.4.6.27
pandocfilters==1.4.2
pantilthat==0.0.7
parso==0.3.1
pexpect==4.8.0
pgzero==1.2
phatbeat==0.1.1
pianohat==0.1.0
picamera==1.13
pickleshare==0.7.5
piglow==1.2.5
pigpio==1.44
pox==0.2.7
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.11.3
psutil==5.5.1
ptyprocess==0.6.0
PyAudio==0.2.11
pygame==1.9.4.post1
Pygments==2.3.1
PyGObject==3.30.4
pyinotify==0.9.6
PyJWT==1.7.0
pykakasi==1.2
pylint==2.2.2
pyOpenSSL==19.0.0
pyparsing==2.4.6
pyrsistent==0.15.7
pyserial==3.4
python-apt==1.8.4.1
python-dateutil==2.8.1
PyYAML==5.3
pyzmq==18.1.1
qtconsole==4.6.0
rainbowhat==0.1.0
requests==2.21.0
requests-oauthlib==1.0.0
responses==0.9.0
roman==2.0.0
RPi.GPIO==0.7.0
RTIMULib==7.2.1
s3transfer==0.3.3
scikit-learn==0.22.1
scipy==1.1.0
scrollphat==0.0.7
scrollphathd==1.2.1
SecretStorage==2.3.1
Send2Trash==1.5.0
sense-hat==2.2.0
simplejson==3.16.0
six==1.12.0
skywriter==0.0.7
smart-open==1.9.0
sn3218==1.2.7
soupsieve==1.8
spidev==3.4
ssh-import-id==5.7
tensorboard==1.13.1
tensorflow-estimator==1.14.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
thonny==3.2.6
tornado==6.0.3
touchphat==0.0.1
traitlets==4.3.3
twython==3.7.0
unicornhathd==0.0.4
wcwidth==0.1.8
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==2.2.0
