[PYTHON] [Introduction to RasPi4] Environment setup: MeCab for natural language processing, and more ♪

Install the natural language processing libraries needed to run the conversation app. ~~It is almost the same as on the Nano, but~~ **I had a hard time**, so I want to describe it carefully. I mostly followed the reference, but some directories are different on this system, so I cover those differences here. 【Reference】

Installing MeCab on Ubuntu 18.10

Install MeCab

$ sudo apt install mecab
$ sudo apt install libmecab-dev
$ sudo apt install mecab-ipadic-utf8

With the packages installed so far, MeCab already runs:

$ mecab
Limited Express Hakutaka
Limited Express	noun,general,*,*,*,*,Limited Express,Tokkyu,Tokkyu
ha	particle,binding particle,*,*,*,*,ha,ha,wa
ku	verb,independent,*,*,ka-irregular (kuru),special pre-nominal form 2,kuru,ku,ku
ta	auxiliary verb,*,*,*,special (ta),base form,ta,ta,ta
ka	particle,adverbial/parallel/sentence-final particle,*,*,*,*,ka,ka,ka
EOS

You get the output above. With the default dictionary, the train name "Hakutaka" is split into four single-kana tokens (ha / ku / ta / ka).

Install neologd

$ git clone https://github.com/neologd/mecab-ipadic-neologd.git
$ cd mecab-ipadic-neologd
$ sudo bin/install-mecab-ipadic-neologd

Everything up to this point installs without problems. Downloading the dictionary took a long time (about 30 minutes).

Edit /etc/mecabrc

This is where I ran into a problem. On Ubuntu the dictionary is installed in the following directory, but on Raspbian it appears to be somewhere else.

dicdir = /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd

So search for the directory where the dictionary actually lives. 【Reference】 Searching for files [find and locate]

$ sudo find / -name '*mecab-ipadic-neologd*'
/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd
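
The same search can be sketched in Python, which may be handy if a setup script needs the path. This is only an illustration: `find_dirs` is a hypothetical helper, it matches directory names only (unlike `find -name`, which also matches files), and `/usr/lib` is an assumed starting point rather than the whole file system.

```python
import os


def find_dirs(root, name):
    """Return every directory under root whose name contains `name`."""
    matches = []
    for current, dirnames, _filenames in os.walk(root):
        for d in dirnames:
            if name in d:
                matches.append(os.path.join(current, d))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed starting point; the find command above searched all of /.
    for path in find_dirs("/usr/lib", "mecab-ipadic-neologd"):
        print(path)
```

Starting from a narrower root keeps the walk short; if nothing is printed, widen the root.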

Now the file can be rewritten with the following command. For the editing itself, see the vi reference. 【Reference】 Basic vi operations

$ sudo vi /etc/mecabrc

I rewrote it as follows.

$ cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir =/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd 
; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
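
If you would rather script the change than edit it in vi, the edit amounts to commenting out the active `dicdir` line and adding a new one. A minimal sketch, assuming the file uses `;` for comments as shown above; `set_dicdir` is a hypothetical helper that works on the file contents as a string and does not touch `/etc/mecabrc` itself.

```python
def set_dicdir(config_text, new_dicdir):
    """Return config_text with the active dicdir line pointing at new_dicdir.

    The previously active line is kept as a ';' comment.
    """
    out = []
    replaced = False
    for line in config_text.splitlines():
        stripped = line.strip()
        # Lines starting with ';' are comments, so they never match here.
        is_active = stripped.startswith("dicdir") and "=" in stripped
        if is_active and not replaced:
            out.append(";" + stripped)
            out.append("dicdir = " + new_dicdir)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append("dicdir = " + new_dicdir)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "; Configuration file of MeCab\ndicdir = /var/lib/mecab/dic/debian\n"
    print(set_dicdir(sample, "/usr/lib/arm-linux-gnueabihf/mecab/dic/mecab-ipadic-neologd"))
```

Writing the result back to `/etc/mecabrc` needs root privileges, so keeping a backup copy of the original file first is a sensible precaution.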

Then confirm that the dictionary has changed. "Hakutaka" is now kept together as a single token.

$ mecab
Limited Express Hakutaka
Limited Express	noun,general,*,*,*,*,Limited Express,Tokkyu,Tokkyu
Hakutaka	noun,proper noun,general,*,*,*,Hakutaka,Hakutaka,Hakutaka
EOS
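
To check the dictionary change from a script rather than by eye, one option is to parse MeCab's default output, where each line holds the surface form, a tab, and the comma-separated features, with `EOS` closing the sentence. The sketch below is an illustration only: `parse_mecab_output` and `is_single_token` are hypothetical helper names, and the sample strings are romanized stand-ins, not real MeCab output.

```python
def parse_mecab_output(text):
    """Split MeCab's default output into (surface, feature_list) pairs."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS" or not line.strip():
            continue
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


def is_single_token(text, word):
    """True if `word` appears as one surface form in the parsed output."""
    return any(surface == word for surface, _ in parse_mecab_output(text))


if __name__ == "__main__":
    # Romanized stand-in for output produced with the neologd dictionary.
    sample = "Hakutaka\tnoun,proper noun,general,*,*,*,Hakutaka,Hakutaka,Hakutaka\nEOS\n"
    print(is_single_token(sample, "Hakutaka"))
```

The text to parse could come from running the `mecab` command and capturing its standard output.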

Make it available from Python 3

$ sudo apt install swig
$ sudo apt install python3-pip
$ sudo pip3 install mecab-python3

The sample from the reference now runs as follows.

$ python3 mecab_sample.py
noun,proper noun,general,*,*,*,Hakutaka,Hakutaka,Hakutaka
Hakutaka
noun,proper noun,region,general,*,*,Toyama,Toyama,Toyama
Toyama
noun,proper noun,region,general,*,*,Kanazawa,Kanazawa,Kanazawa
Kanazawa
noun,proper noun,region,general,*,*,Kenrokuen,Kenrokuen,Kenrokuen
Kenrokuen

Install PyAudio

The conversation app uses PyAudio because it speaks its side of the conversation. 【Reference】 Installing PyAudio | Python Memorandum

$ sudo apt-get install python3-pyaudio

It installed successfully.

Install pykakasi

pykakasi is used when generating the recorded audio files (the file names are written in the Latin alphabet) and when converting generated audio to text.

$ pip3 install pykakasi --user

Check that it works with the code below.

# coding: utf-8
from pykakasi import kakasi

kks = kakasi()  # separate name so the kakasi class is not shadowed
kks.setMode('H', 'a')  # hiragana -> romaji
kks.setMode('K', 'a')  # katakana -> romaji
kks.setMode('J', 'a')  # kanji -> romaji
conv = kks.getConverter()
filename = '本日は晴天なり.jpg'
print(filename)  # 本日は晴天なり.jpg
print(type(filename))
print(conv.do(filename))

Example output:

$ python3 pykakasi_ex.py
本日は晴天なり.jpg
<class 'str'>
honjitsuhaseitennari.jpg
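
The `setMode()` / `getConverter()` calls above match pykakasi 1.2, the version installed here. As far as I know, pykakasi 2.x deprecates them in favor of `kakasi().convert()`, which returns a list of dictionaries with keys such as `orig` and `hepburn`. A hedged sketch of the newer style follows; `romanize` is a hypothetical helper, and the `convert()` call only runs if a 2.x pykakasi is installed.

```python
def romanize(items):
    """Join the Hepburn reading of every item in a convert() result."""
    return "".join(item["hepburn"] for item in items)


if __name__ == "__main__":
    try:
        import pykakasi  # assumed to be version 2.0 or later here
    except ImportError:
        print("pykakasi is not installed")
    else:
        kks = pykakasi.kakasi()
        print(romanize(kks.convert("本日は晴天なり.jpg")))
```

Keeping the joining step in its own function makes it easy to switch to another key, for example `kunrei`, if a different romanization is preferred for the file names.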

Environment

$ uname -a
Linux raspberrypi 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

Run the conversation app

gensm_ex1.py

$ python3 gensm_ex1.py

Start training
Epoch: 1
gensm_ex1.py:16: DeprecationWarning: Call to deprecated `iter` (Attribute will be removed in 4.0.0, use self.epochs instead).
  model.train(sentences, epochs=model.iter, total_examples=model.corpus_count)
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Epoch: 10
Epoch: 11
Epoch: 12
Epoch: 13
Epoch: 14
Epoch: 15
Epoch: 16
Epoch: 17
Epoch: 18
Epoch: 19
Epoch: 20
SENT_0
[('SENT_2', 0.08270145207643509), ('SENT_3', 0.0347767099738121), ('SENT_1', -0.08307887613773346)]
SENT_3
[('SENT_0', 0.0347767099738121), ('SENT_1', 0.02076556906104088), ('SENT_2', -0.003991239238530397)]
SENT_1
[('SENT_3', 0.02076556347310543), ('SENT_2', 0.010350690223276615), ('SENT_0', -0.08307889103889465)]
gensm_ex1.py:33: DeprecationWarning: Call to deprecated `similar_by_word` (Method will be removed in 4.0.0, use self.wv.similar_by_word() instead).
  print (model.similar_by_word(u"fish"))
[('now', 0.15166150033473969), ('sea', 0.09887286275625229), ('morning', 0.03284810855984688), ('cat', 0.019402338191866875), ('barked', -0.0008345211390405893), ('swim', -0.02624458074569702), ('today', -0.05557712912559509), ('dog', -0.0900348424911499)]

RaspberryPi4_conversation/model_skl.py

$ python3 model_skl.py
TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.float64'>, encoding='utf-8',
                input='content', lowercase=True, max_df=1.0, max_features=None,
                min_df=1, ngram_range=(1, 1), norm='l2', preprocessor=None,
                smooth_idf=True, stop_words=None, strip_accents=None,
                sublinear_tf=False, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, use_idf=True, vocabulary=None)
{'I': 5, 'soy sauce': 6, 'Ramen': 2, 'Tonkotsu': 1, 'like': 4, 'is': 0, 'Miso': 3}
{'soy sauce': 4, 'Ramen': 1, 'Tonkotsu': 0, 'like': 3, 'Miso': 2}
soy sauce 4
Ramen 1
Tonkotsu 0
like 3
Miso 2
['Tonkotsu', 'Ramen', 'Miso', 'like', 'soy sauce']
  (0, 4)	0.4976748316029239
  (0, 1)	0.7081994831914716
  (0, 0)	0.3540997415957358
  (0, 3)	0.3540997415957358
  (1, 1)	0.7081994831914716
  (1, 0)	0.3540997415957358
  (1, 3)	0.3540997415957358
  (1, 2)	0.4976748316029239
{'soy sauce': 6, 'Ramen': 3, 'Tonkotsu': 2, 'like': 5, 'Miso': 4, 'Katsudon': 1, 'Oshiyaki': 0}
  (0, 6)	0.5486117771118656
  (0, 3)	0.6480379064629606
  (0, 2)	0.4172333972107692
  (0, 5)	0.3240189532314803
  (1, 3)	0.6480379064629607
  (1, 2)	0.41723339721076924
  (1, 5)	0.32401895323148033
  (1, 4)	0.5486117771118657
  (2, 3)	0.35959372325985667
  (2, 5)	0.35959372325985667
  (2, 1)	0.6088450986844796
  (2, 0)	0.6088450986844796
[[1.         0.69902512 0.34954555]
 [0.69902512 1.         0.34954555]
 [0.34954555 0.34954555 1.        ]]
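
To see where these numbers come from, the TF-IDF weights and the cosine-similarity matrix can be recomputed by hand. The sketch below uses only the standard library and follows the settings printed above (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`), for which the weight is the raw term count times idf = ln((1 + n) / (1 + df)) + 1. The token lists are hypothetical English stand-ins that I inferred from the weights; the real input sentences of model_skl.py are not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalised TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard empty docs
        vectors.append([v / norm for v in raw])
    return vocab, vectors


def cosine_matrix(vectors):
    # The vectors are already unit length, so the dot product is the cosine.
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Hypothetical token lists, chosen so that the weights match the output above.
docs = [
    ["soy_sauce", "ramen", "tonkotsu", "ramen", "like"],
    ["miso", "ramen", "tonkotsu", "ramen", "like"],
    ["katsudon", "oshiyaki", "ramen", "like"],
]
vocab, vectors = tfidf_vectors(docs)
sim = cosine_matrix(vectors)
for row in sim:
    print(["%.8f" % x for x in row])
```

The first two documents share most of their tokens and so score about 0.70 against each other, while the third shares only two common words with them and scores about 0.35.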

With that, I safely got as far as the conversation app itself. RaspberryPi4_conversation/auto_conversation_.py

$ python3 auto_conversation_.py -i data/conversation_n.txt  -s data/stop_words.txt
data/conversation_n.txt
>The weather is nice today
(0.41):That's right.
>What is that
(0.55):What is a dog?
>A dog is a dog
(0.41):Where was the dog?
>In the alley over there
(0.46):* * * What is it?
>Near the park
(0.00): """"Why do I have to do it alone? Please help me a little.","""
>What should I help with
(0.46):(Yes, that's fine)
>That's kind of crazy
(0.38):Who are you talking about
>You see
(0.00):Say "Awanchan", and by the way, those people went this way, and we went this way.
>I don't remember that
(0.35):I'm not an idiot
>I think it's stupid
(0.33):He looks good.
>

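The number in parentheses looks like a similarity score between the input and a line of the conversation file. The script itself is not shown here, so the following is only a guess at the general idea, not the actual implementation: find the stored line most similar to the input and answer with the line that follows it. `cosine` and `reply` are hypothetical names, the tokens are plain word lists, and a simple word-count cosine stands in for whatever weighting the real script uses.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def reply(user_tokens, corpus, stop_words=frozenset()):
    """Return (score, next_line) for the corpus line closest to user_tokens.

    corpus is a list of token lists, one per line of the conversation file.
    """
    query = Counter(t for t in user_tokens if t not in stop_words)
    best_score, best_index = 0.0, 0
    for i, line in enumerate(corpus[:-1]):  # the last line has no follow-up
        score = cosine(query, Counter(t for t in line if t not in stop_words))
        if score > best_score:
            best_score, best_index = score, i
    return best_score, corpus[best_index + 1]


if __name__ == "__main__":
    corpus = [
        ["nice", "weather", "today"],
        ["that", "is", "right"],
        ["where", "is", "the", "dog"],
        ["in", "the", "alley"],
    ]
    score, line = reply(["the", "weather", "is", "nice"], corpus, {"the", "is"})
    print("(%.2f):%s" % (score, " ".join(line)))
```

In this sketch a score of 0.00 simply falls back to the second line of the corpus; the real app evidently picks something else in that case.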
The packages the app needs were installed as follows.

【Reference】 How to install scipy and numpy on Ubuntu 16.04?

$ sudo apt update
$ sudo apt upgrade

$ sudo apt install python3-numpy python3-scipy

$ sudo pip3 install numpy scipy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.16.2)
Requirement already satisfied: scipy in /usr/lib/python3/dist-packages (1.1.0)

$ pip3 install --user gensim

Successfully installed boto-2.49.0 boto3-1.11.14 botocore-1.14.14 gensim-3.8.1 jmespath-0.9.4 s3transfer-0.3.3 smart-open-1.9.0

【Reference】 Installing scikit-learn on Ubuntu

$ sudo pip3 install scikit-learn
...
Requirement already satisfied: scipy>=0.17.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.1.0)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python3/dist-packages (from scikit-learn) (1.16.2)
Installing collected packages: joblib, scikit-learn
Successfully installed joblib-0.14.1 scikit-learn-0.22.1

Summary

・ Installed the natural language processing libraries needed on the RasPi4
・ For the time being, I was able to run a natural language application

・ I want to make the conversation app a bit more respectable

Bonus

This is what ended up installed. 【Reference】・ Difference between pip list and pip freeze

$ pip3 freeze > requirements.txt

・ RaspberryPi4_conversation/requirements.txt

$ pip3 freeze
absl-py==0.9.0
arrow==0.15.5
asn1crypto==0.24.0
astor==0.8.1
astroid==2.1.0
asttokens==1.1.13
attrs==19.3.0
automationhat==0.2.0
backcall==0.1.0
beautifulsoup4==4.7.1
bleach==3.1.0
blinker==1.4
blinkt==0.1.2
boto==2.49.0
boto3==1.11.14
botocore==1.14.14
buttonshim==0.0.2
Cap1xxx==0.1.3
certifi==2018.8.24
chardet==3.0.4
Click==7.0
colorama==0.3.7
colorzero==1.1
cookies==2.2.1
cryptography==2.6.1
cycler==0.10.0
Cython==0.29.14
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
docutils==0.14
drumhat==0.1.0
entrypoints==0.3
envirophat==1.0.0
ExplorerHAT==0.4.2
Flask==1.0.2
fourletterphat==0.1.0
gast==0.3.3
gensim==3.8.1
google-pasta==0.1.8
gpiozero==1.5.1
grpcio==1.27.1
h5py==2.10.0
html5lib==1.0.1
idna==2.6
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isort==4.3.4
itsdangerous==0.24
jedi==0.13.2
jinja2-time==0.2.0
jmespath==0.9.4
joblib==0.14.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.1
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==17.1.1
keyrings.alt==3.1.1
kiwisolver==1.1.0
klepto==0.1.8
lazy-object-proxy==1.3.1
logilab-common==1.4.2
lxml==4.3.2
make==0.1.6.post1
Markdown==3.2
MarkupSafe==1.1.0
matplotlib==3.1.3
mccabe==0.6.1
mecab-python3==0.996.3
microdotphat==0.2.1
mistune==0.8.4
mote==0.0.4
motephat==0.0.2
mypy==0.670
mypy-extensions==0.4.1
nbconvert==5.6.1
nbformat==5.0.4
notebook==6.0.3
numpy==1.16.2
oauthlib==2.1.0
olefile==0.46
opencv-python==3.4.6.27
pandocfilters==1.4.2
pantilthat==0.0.7
parso==0.3.1
pexpect==4.8.0
pgzero==1.2
phatbeat==0.1.1
pianohat==0.1.0
picamera==1.13
pickleshare==0.7.5
piglow==1.2.5
pigpio==1.44
pox==0.2.7
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.11.3
psutil==5.5.1
ptyprocess==0.6.0
PyAudio==0.2.11
pygame==1.9.4.post1
Pygments==2.3.1
PyGObject==3.30.4
pyinotify==0.9.6
PyJWT==1.7.0
pykakasi==1.2
pylint==2.2.2
pyOpenSSL==19.0.0
pyparsing==2.4.6
pyrsistent==0.15.7
pyserial==3.4
python-apt==1.8.4.1
python-dateutil==2.8.1
PyYAML==5.3
pyzmq==18.1.1
qtconsole==4.6.0
rainbowhat==0.1.0
requests==2.21.0
requests-oauthlib==1.0.0
responses==0.9.0
roman==2.0.0
RPi.GPIO==0.7.0
RTIMULib==7.2.1
s3transfer==0.3.3
scikit-learn==0.22.1
scipy==1.1.0
scrollphat==0.0.7
scrollphathd==1.2.1
SecretStorage==2.3.1
Send2Trash==1.5.0
sense-hat==2.2.0
simplejson==3.16.0
six==1.12.0
skywriter==0.0.7
smart-open==1.9.0
sn3218==1.2.7
soupsieve==1.8
spidev==3.4
ssh-import-id==5.7
tensorboard==1.13.1
tensorflow-estimator==1.14.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
thonny==3.2.6
tornado==6.0.3
touchphat==0.0.1
traitlets==4.3.3
twython==3.7.0
unicornhathd==0.0.4
wcwidth==0.1.8
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==2.2.0
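
With a list this long, it can help to check just the packages this article depends on. A small sketch, assuming the `name==version` format shown above; `parse_freeze` is a hypothetical helper, and the package names checked are the ones installed in this article.

```python
def parse_freeze(text):
    """Map lower-cased package name -> pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins


if __name__ == "__main__":
    sample = "gensim==3.8.1\nmecab-python3==0.996.3\nPyAudio==0.2.11\npykakasi==1.2\nscikit-learn==0.22.1\n"
    pins = parse_freeze(sample)
    for name in ("mecab-python3", "pyaudio", "pykakasi", "gensim", "scikit-learn"):
        print(name, pins.get(name, "MISSING"))
```

Reading requirements.txt from disk and passing its contents to `parse_freeze` gives the same check for the saved file.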