[PYTHON] Raspberry Pi 3 x Julius (reading file and grammar file)

What I want to do

Julius seems to support both a reading file and a grammar file as ways to speed up recognition. I will try both to see which is easier to use.

Background

I am trying Julius speech recognition in order to build a Raspberry Pi robot that answers questions.

As shown in the figure below, the final goal is voice authentication and transcription with Raspberry Pi 3 x Julius x Watson (Speech to Text). (http://qiita.com/nanako_ut/items/1e044eb494623a3961a5)

This time, I verify Julius in parts ① and ② of the figure below.

(Figure: img20170324_14192489.jpg)

Environment

Prerequisites

The following are assumed to be ready. For reference, the titles of the sites I referred to are listed under each item.

- Enable the microphone on Raspberry Pi 3
  - Easy to do! Conversation with Raspberry pi using speech recognition and speech synthesis
  - Try voice recognition and voice synthesis with Raspberry Pi 2
- Julius installation on Raspberry Pi 3
  - Voice recognition by Julius - Utilization of domestic open source library

Procedure

  1. Create reading file
  2. Create grammar file
  3. Analyze Julius voice with Python
  4. Summary (difference between reading file and grammar file)

■ Reading file

1.1 Create a reading file

$ cat julius_watson.yomi
Raspberry pi
Watson started
Watson finished Watson Shuryo
Test test
Start
End end
End End
Reply to me
Say something
Regards

1.2 Convert to dictionary format

iconv -f utf8 -t eucjp ~/julius_watson.yomi | ~/julius-kits/dictation-kit-v4.3.1-linux/bin/yomi2voca.pl > ~/julius-kits/dictation-kit-v4.3.1-linux/julius_watson.dic
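
The iconv step re-encodes the UTF-8 reading file as EUC-JP before yomi2voca.pl reads it. As a rough illustration only, the same re-encoding can be sketched in Python. The function names and file paths here are hypothetical, and the sketch does not replace yomi2voca.pl itself.

```python
def utf8_to_eucjp(data):
    """Re-encode UTF-8 bytes as EUC-JP bytes (the job done by `iconv -f utf8 -t eucjp`)."""
    return data.decode("utf-8").encode("euc_jp")


def convert_file(src_path, dst_path):
    """Read src_path as UTF-8 bytes and write the EUC-JP version to dst_path.

    Both paths are placeholders chosen by the caller.
    """
    with open(src_path, "rb") as src:
        converted = utf8_to_eucjp(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


# Hiragana takes 3 bytes per character in UTF-8 and 2 bytes in EUC-JP.
sample = "わとそん".encode("utf-8")
print(len(sample), len(utf8_to_eucjp(sample)))
```

Working on bytes rather than text keeps line endings untouched, which matters when the result is piped into another tool.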

1.3 Create a configuration file

$ cat ~/julius-kits/dictation-kit-v4.3.1-linux/julius_watson.jconf
-w julius_watson.dic ← Specify the .dic file converted to dictionary format above
-v model/lang_m/bccwj.60k.htkdic
-h model/phone_m/jnas-tri-3k16-gid.binhmm
-hlist model/phone_m/logicalTri
-lmp 8.0 -2.0
-lmp2 8.0 -2.0
-b 1500
-b2 100
-s 500
-m 10000
-n 30
-output 1
-input mic
-zmeanframe
-rejectshort 800
-charconv EUC-JP UTF-8

1.4 Execution

$ cd julius-kits/dictation-kit-v4.3.1-linux
~/julius-kits/dictation-kit-v4.3.1-linux $ julius -C julius_watson.jconf -demo

STAT: include config: julius_watson.jconf
WARNING: m_chkparam: "-lmp" only for N-gram, ignored
WARNING: m_chkparam: "-lmp2" only for N-gram, ignored
STAT: jconf successfully finalized

~ (output omitted) ~

----------------------- System Information end -----------------------

Notice for feature extraction (01),
        *************************************************************
        * Cepstral mean normalization for real-time decoding:       *
        * NOTICE: The first input may not be recognized, since      *
        *         no initial mean is available on startup.          *
        *************************************************************

Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created

pass1_best:Regards ← Noise
sentence1:Regards ← Noise
pass1_best:Regards ← Say "Regards"
sentence1:Regards
pass1_best:Watson ← Say "Watson"
sentence1:Watson
pass1_best:Watson start ← Say "Watson start"
sentence1:Watson started
pass1_best:Raspberry Pi ← Say "I'm hungry"
sentence1:Raspberry pi
<<< please speak >>>^C
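
When reading console output like this, only the sentence1 lines carry the final result; pass1_best is the first-pass guess. The following is a minimal sketch, assuming the output has been captured as text lines, that keeps just the final results. The function name and the sample lines are illustrative.

```python
import re

# Matches final-pass result lines such as "sentence1:Watson started".
SENTENCE_RE = re.compile(r"^sentence(\d+):\s*(.*)$")


def final_sentences(lines):
    """Return the text of every sentenceN line, skipping pass1_best and status lines."""
    results = []
    for line in lines:
        match = SENTENCE_RE.match(line.strip())
        if match:
            results.append(match.group(2).strip())
    return results


# A few lines in the same shape as the console output.
sample = """\
pass1_best:Watson
sentence1:Watson
pass1_best:Watson start
sentence1:Watson started
pass1_best:Raspberry Pi
sentence1:Raspberry pi
<<< please speak >>>
""".splitlines()

print(final_sentences(sample))
```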

1.5 Impressions

Even noise is recognized as "Regards"... Is every input that cannot be matched interpreted as the last entry in the reading file?

■ Create a grammar file

2.1 Describe the pronunciation (phoneme sequences) in a voca file

$ cat julius_watson.voca
% WATSON
Watson w a t s o n
Raspberry pi r a z u p a i
NFL  a m e f u t o
Electric d e n k i
% WO
W o
% PLEASE
Put on t u k e t e
Erase k e sh i t e
% NS_B
[s]     silB
% NS_E
[s]     silE
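
A voca file groups words under "% CATEGORY" headers, with each word followed by its phoneme sequence. As a rough check before compiling, the categories and words can be counted with a few lines of Python. This is a sketch under assumptions: the word labels in the sample are romanized stand-ins, and the "% WATSON" header is assumed because the grammar file refers to a WATSON category.

```python
def parse_voca(text):
    """Parse voca text into {category: [(word, [phoneme, ...]), ...]}."""
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            # "% NAME" opens a new category
            current = line[1:].strip()
            categories.setdefault(current, [])
        elif current is not None:
            # first token is the word, the rest are its phonemes
            word, *phonemes = line.split()
            categories[current].append((word, phonemes))
    return categories


# Stand-in sample with the same shape as julius_watson.voca (labels are hypothetical).
SAMPLE_VOCA = """\
% WATSON
watson w a t s o n
raspi r a z u p a i
amefuto a m e f u t o
denki d e n k i
% WO
wo o
% PLEASE
tsukete t u k e t e
keshite k e sh i t e
% NS_B
[s] silB
% NS_E
[s] silE
"""

voca = parse_voca(SAMPLE_VOCA)
word_count = sum(len(words) for words in voca.values())
print(len(voca), "categories and", word_count, "words")
```

The two counts can be compared with the "categories" and "words" figures that mkdfa.pl reports when it compiles the grammar.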

2.2 Create a grammar file to enforce syntax constraints

$ cat julius_watson.grammar
S      : NS_B WATSON_ PLEASE NS_E
WATSON_ : WATSON
WATSON_ : WATSON WO
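
To see which category sequences these three rules accept, the grammar can be expanded by hand or with a short script. The following is a minimal sketch, assuming a non-recursive grammar like this one; the RULES list simply restates the rules above, and the expand function is a hypothetical helper, not part of Julius.

```python
# The three rules from julius_watson.grammar as (left-hand side, right-hand side).
RULES = [
    ("S", ["NS_B", "WATSON_", "PLEASE", "NS_E"]),
    ("WATSON_", ["WATSON"]),
    ("WATSON_", ["WATSON", "WO"]),
]


def expand(symbol, rules):
    """Return every sequence of word categories derivable from symbol.

    Assumes the grammar has no recursive rules, so the expansion terminates.
    """
    alternatives = [rhs for lhs, rhs in rules if lhs == symbol]
    if not alternatives:
        # No rule rewrites this symbol, so it is a word category from the voca file.
        return [[symbol]]
    sequences = []
    for rhs in alternatives:
        partial = [[]]
        for part in rhs:
            partial = [head + tail for head in partial for tail in expand(part, rules)]
        sequences.extend(partial)
    return sequences


for sequence in expand("S", RULES):
    print(" ".join(sequence))
```

Every accepted sequence contains a PLEASE word, so under this grammar the recognizer has no way to output a lone noun.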

2.3 Compile grammar files, configuration constraint files, etc.

cp julius-4.3.1/gramtools/mkdfa/mkfa-1.44-flex/mkfa julius-4.3.1/gramtools/mkdfa/mkfa
cp julius-4.3.1/gramtools/dfa_minimize/dfa_minimize julius-4.3.1/gramtools/mkdfa/dfa_minimize

sudo julius-4.3.1/gramtools/mkdfa/mkdfa.pl julius_watson

julius_watson.grammar has 3 rules
julius_watson.voca    has 5 categories and 9 words
---
Now parsing grammar file
Now modifying grammar to minimize states[-1]
Now parsing vocabulary file
Now making nondeterministic finite automaton[6/6]
Now making deterministic finite automaton[6/6]
Now making triplet list[6/6]
5 categories, 6 nodes, 6 arcs
-> minimized: 6 nodes, 6 arcs
---
generated: julius_watson.dfa julius_watson.term julius_watson.dict

2.4 Operation check

Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
pass1_best: [s]Watson started[s] ← Say "Watson"
pass1_best_wordseq: 3 0 2 4
pass1_best_phonemeseq: silB | w a t o s n | k a i s i | silE
pass1_best_score: -3108.902100
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 122
sentence1: [s]Watson started[s]
wseq1: 3 0 2 4
phseq1: silB | w a t o s n | k a i s i | silE
cmscore1: 1.000 0.482 0.476 1.000
score1: -3108.899414

pass1_best: [s]Raspberry Pi[s] ← Say "Raspberry Pi"
pass1_best_wordseq: 3 0 2 4
pass1_best_phonemeseq: silB | r a z u p a i | s i t e | silE
pass1_best_score: -3268.691406
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 132
sentence1: [s]Raspberry Pi[s]
wseq1: 3 0 2 4
phseq1: silB | r a z u p a i | s i t e | silE
cmscore1: 1.000 0.959 0.691 1.000
score1: -3268.694824

<<< please speak >>>

2.5 Impressions

When I say a single word, does Julius pad it out to noun + verb to fit the grammar? That does not feel right... Filling in something I never said is a bit much.
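
The cmscore line gives one confidence value per word, in the same order as the phseq segments, so the two can be paired to see which words Julius was unsure about. The following is a rough sketch using the first result above; the function names and the 0.5 threshold are hypothetical choices, not Julius settings.

```python
def word_confidences(phseq_line, cmscore_line):
    """Pair each phoneme segment of a phseq line with its cmscore value."""
    segments = [seg.strip() for seg in phseq_line.split(":", 1)[1].split("|")]
    scores = [float(value) for value in cmscore_line.split(":", 1)[1].split()]
    return list(zip(segments, scores))


def low_confidence(pairs, threshold=0.5):
    """Return the segments whose confidence is below threshold (threshold is arbitrary)."""
    return [segment for segment, score in pairs if score < threshold]


pairs = word_confidences(
    "phseq1: silB | w a t o s n | k a i s i | silE",
    "cmscore1: 1.000 0.482 0.476 1.000",
)
for segment, score in pairs:
    print(f"{score:.3f}  {segment}")
print("below threshold:", low_confidence(pairs))
```

In the two results above the word that was never spoken scored 0.476 and 0.691, so confidence is only a hint here, and no single threshold clearly separates padded words from spoken ones.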

■ Analyze Julius voice with Python

Starting Julius with the -module option lets other programs connect to it as clients. So I start Julius with the -module option, connect to the Julius server from Python, and print the recognition results.

3.1 Python program

A program that connects to Julius and prints the recognition results. I copied it from someone else's earlier code, but I have lost track of the source... I will add the credit as soon as I find it.

Julius_test.py


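#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script: connects to Julius (module mode) and prints each <RECOGOUT> block.
import socket
import cStringIO

host = 'XXX.XXX.XX.XX'  # ← Enter the local host address
port = 10500            # port Julius listens on in module mode

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

xml_buff = ""
in_recoguout = False
while True:
    # Read one chunk from Julius and walk through it line by line
    data = cStringIO.StringIO(sock.recv(4096))
    line = data.readline()
    while line:
        if line.startswith("<RECOGOUT>"):
            # Start of a recognition result: begin buffering
            in_recoguout = True
            xml_buff += line
        elif line.startswith("</RECOGOUT>"):
            # End of the result: print the buffered block and reset
            xml_buff += line
            print xml_buff
            in_recoguout = False
            xml_buff = ""
        else:
            # Keep only lines that fall inside a <RECOGOUT> block
            if in_recoguout:
                xml_buff += line
        line = data.readline()
sock.close()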
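
The listing above targets Python 2 (cStringIO and the print statement) and treats each recv() chunk on its own, so a line that happens to be split across two chunks could be missed. The following is a Python 3 sketch of the same idea that keeps the unfinished tail between chunks. The function names are hypothetical, and the UTF-8 decoding is an assumption that depends on how Julius is configured.

```python
def iter_recogout(chunks, encoding="utf-8"):
    """Yield each complete <RECOGOUT>...</RECOGOUT> block from an iterable of byte chunks."""
    pending = b""      # bytes of a line that has not been terminated yet
    block = []         # lines of the block currently being collected
    inside = False
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; keep the tail for later.
        *lines, pending = pending.split(b"\n")
        for raw in lines:
            line = raw.decode(encoding, errors="replace")
            if line.startswith("<RECOGOUT>"):
                inside = True
                block = [line]
            elif line.startswith("</RECOGOUT>"):
                if inside:
                    block.append(line)
                    yield "\n".join(block)
                inside = False
                block = []
            elif inside:
                block.append(line)


def socket_chunks(sock, size=4096):
    """Yield raw chunks from a connected socket until the peer closes it."""
    while True:
        data = sock.recv(size)
        if not data:
            return
        yield data


# Demo without a network connection: one result split across three chunks.
demo_chunks = [
    b"<RECOGOUT>\n  <SHY",
    b'PO RANK="1">\n  </SHYPO>\n</RECOG',
    b"OUT>\n.\n",
]
for result in iter_recogout(demo_chunks):
    print(result)
```

With a live Julius server, iter_recogout(socket_chunks(sock)) would take the place of the demo list, where sock is a socket already connected to port 10500.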

3.2 Execution

First, start Julius in module mode

~/julius-kits/dictation-kit-v4.3.1-linux $ julius -C main.jconf -C am-gmm.jconf -module

Execution result

$ python Julius_test.py
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-5520.531738">
    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.200"/>
    <WHYPO WORD="voice" CLASSID="voice+noun" PHONE="o N s e:" CM="0.187"/>
    <WHYPO WORD="Authentication" CLASSID="Authentication+noun" PHONE="n i N sh o:" CM="0.074"/>
    <WHYPO WORD="test" CLASSID="test+noun" PHONE="t e s u t o" CM="0.273"/>
    <WHYPO WORD="。" CLASSID="</s>" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
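
To use this result in a program, the words and confidence values need to be pulled out of the WHYPO lines. Note that CLASSID="<s>" puts a raw "<" inside an attribute value, which a strict XML parser would reject, so this sketch uses a regular expression instead. It assumes the attributes appear in the order shown above; the function name is hypothetical.

```python
import re

# Assumes the attribute order WORD, CLASSID, PHONE, CM seen in the output above.
WHYPO_RE = re.compile(r'<WHYPO WORD="(.*?)" CLASSID="(.*?)" PHONE="(.*?)" CM="(.*?)"/>')


def parse_whypo(block):
    """Return one dict per WHYPO element found in a RECOGOUT block."""
    return [
        {"word": word, "classid": classid, "phone": phone, "cm": float(cm)}
        for word, classid, phone, cm in WHYPO_RE.findall(block)
    ]


SAMPLE = """\
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-5520.531738">
    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.200"/>
    <WHYPO WORD="voice" CLASSID="voice+noun" PHONE="o N s e:" CM="0.187"/>
    <WHYPO WORD="test" CLASSID="test+noun" PHONE="t e s u t o" CM="0.273"/>
    <WHYPO WORD="。" CLASSID="</s>" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
"""

hypotheses = parse_whypo(SAMPLE)
for hypothesis in hypotheses:
    print(hypothesis["cm"], repr(hypothesis["word"]))
```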

■ Summary (difference between reading file and grammar file)

Here is a summary of the differences when running in module mode.

4.1 How to execute

■ Reading file
cd julius-kits/dictation-kit-v4.3.1-linux
julius -C julius_watson.jconf -module

■ Grammar file
julius -C julius-kits/grammar-kit-v4.1/hmm_mono.jconf -input mic -gram julius_watson
* The -module option is written in hmm_mono.jconf

4.2 Execution result

The result of saying "Watson started"

■ Grammar file

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2817.017578" GRAM="0">
    <WHYPO WORD="[s]" CLASSID="3" PHONE="silB" CM="1.000"/>
    <WHYPO WORD="Watson" CLASSID="0" PHONE="w a t s n" CM="0.973"/>
    <WHYPO WORD="erase" CLASSID="2" PHONE="k e s h i t e" CM="0.560"/>
    <WHYPO WORD="[s]" CLASSID="4" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>

■ Reading file

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2903.453613" GRAM="0">
    <WHYPO WORD="Watson" CLASSID="Watson" PHONE="silB w a t o s o N silE" CM="0.791"/>
  </SHYPO>
</RECOGOUT>

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-8478.763672" GRAM="0">
    <WHYPO WORD="Watson started" CLASSID="Watson started" PHONE="silB w a t o s o N k a i sh i silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>

4.3 Consideration

When I say "Watson started":
・ Grammar file ⇒ It matches on "Watson" and returns "Watson erase" with a high confidence score.
・ Reading file ⇒ "Watson" and "Watson started" are separate entries, so they are judged separately.

⇒ How should words be registered in the grammar file? Even for input that is not a full sentence, speaking in noun + verb form seems likely to increase misrecognition considerably. This time, the grammar file looks better.

Finally

Julius was said to be slow on the Raspberry Pi 2, but it felt quite fast on the Raspberry Pi 3. If the goal is only to improve recognition speed, a reading file or grammar file may not be necessary. If the words to be spoken can be limited to some extent, I would consider using a reading file or grammar file to improve the recognition rate.
