What I want to do

Julius apparently lets you use a reading file or a grammar file to speed up recognition. I will try both and see which one is easier to use.

Background

I am trying out Julius speech recognition as a step toward building a Raspberry Pi robot that answers questions.

As shown in the figure below, the final goal is speech recognition and transcription with Raspberry Pi 3 x Julius x Watson (Speech to Text) (http://qiita.com/nanako_ut/items/1e044eb494623a3961a5).

This time, I verify the Julius part, ① and ② in the figure below.

Environment

- Raspberry Pi 3
- USB microphone (SANWA SUPPLY MM-MCUSB16 USB microphone)
- Julius 4.3.1 (open source speech recognition library)

Prerequisites

The following are assumed to be set up already. For reference, here are the sites I referred to:

- Enabling the microphone on Raspberry Pi 3
  - Easy to do! Conversation with Raspberry Pi using speech recognition and speech synthesis
  - Try voice recognition and voice synthesis with Raspberry Pi 2
- Installing Julius on Raspberry Pi 3
  - Voice recognition by Julius: using a domestic open source library

Procedure

1. Create a reading file
2. Create a grammar file
3. Analyze Julius recognition results with Python
4. Summary (difference between reading file and grammar file)

■ Reading file

1.1 Create a reading file

$ cat julius_watson.yomi
Raspberry pi
Watson started
Watson finished Watson Shuryo
Test test
Start
End end
End End
Reply to me
Say something
Regards
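A reading file pairs each word with its reading in hiragana, one entry per line. The sketch below assumes the word and reading columns are separated by a tab, which is my understanding of what yomi2voca.pl expects. Under that assumption, the hypothetical helpers build such lines in Python 3. They reject any reading that is not hiragana, because yomi2voca.pl converts hiragana readings into phonemes.

```python
# Sketch: build and sanity-check reading (yomi) file lines.
# Assumption: each line is "<word>\t<reading in hiragana>".
# make_yomi_lines / is_hiragana_reading are hypothetical helper names.

def is_hiragana_reading(reading):
    """True if every character is hiragana or the long-vowel mark 'ー'."""
    return bool(reading) and all(
        "\u3041" <= ch <= "\u3096" or ch == "\u30fc" for ch in reading
    )


def make_yomi_lines(entries):
    """Turn (word, reading) pairs into yomi lines; reject non-hiragana readings."""
    lines = []
    for word, reading in entries:
        if not is_hiragana_reading(reading):
            raise ValueError("reading must be hiragana: %r" % reading)
        lines.append(word + "\t" + reading)
    return lines


if __name__ == "__main__":
    # Hypothetical entries for illustration.
    for line in make_yomi_lines([("ワトソン", "わとそん"), ("スタート", "すたーと")]):
        print(line)
```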

1.2 Convert to dictionary format

iconv -f utf8 -t eucjp ~/julius_watson.yomi | ~/julius-kits/dictation-kit-v4.3.1-linux/bin/yomi2voca.pl > ~/julius-kits/dictation-kit-v4.3.1-linux/julius_watson.dic
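The one-liner does two things. iconv converts the UTF-8 reading file to EUC-JP, and yomi2voca.pl turns the result into the dictionary (.dic) file. The dictionary stays in EUC-JP, which is presumably why the configuration file below sets -charconv EUC-JP UTF-8.

You can also drive this step from Python 3 with subprocess, as sketched below. The paths are the ones used above; to_eucjp and build_dic are hypothetical names. The script is run through perl explicitly in case it is not marked executable.

```python
# Sketch: Python 3 version of
#   iconv -f utf8 -t eucjp ~/julius_watson.yomi | yomi2voca.pl > julius_watson.dic
import os
import subprocess

KIT = os.path.expanduser("~/julius-kits/dictation-kit-v4.3.1-linux")
YOMI2VOCA = os.path.join(KIT, "bin", "yomi2voca.pl")


def to_eucjp(text):
    """Re-encode text as EUC-JP bytes (the iconv step)."""
    return text.encode("euc_jp")


def build_dic(yomi_path, dic_path, yomi2voca=YOMI2VOCA):
    """Feed the EUC-JP reading file to yomi2voca.pl and save its output as .dic."""
    with open(yomi_path, encoding="utf-8") as f:
        data = to_eucjp(f.read())
    result = subprocess.run(["perl", yomi2voca], input=data,
                            stdout=subprocess.PIPE, check=True)
    with open(dic_path, "wb") as f:
        f.write(result.stdout)  # written as-is (EUC-JP), like the shell redirect
```

For example, `build_dic(os.path.expanduser("~/julius_watson.yomi"), os.path.join(KIT, "julius_watson.dic"))` mirrors the shell pipeline above.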

1.3 Create a configuration file

$ cat ~/julius-kits/dictation-kit-v4.3.1-linux/julius_watson.jconf
-w julius_watson.dic ← specify the .dic file converted above
-v model/lang_m/bccwj.60k.htkdic
-h model/phone_m/jnas-tri-3k16-gid.binhmm
-hlist model/phone_m/logicalTri
-lmp 8.0 -2.0
-lmp2 8.0 -2.0
-b 1500
-b2 100
-s 500
-m 10000
-n 30
-output 1
-input mic
-zmeanframe
-rejectshort 800
-charconv EUC-JP UTF-8
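When this configuration is loaded, Julius prints `"-lmp" only for N-gram, ignored` (and the same for -lmp2). Those two options therefore have no effect in this word-list setup, and a script that generates the jconf can skip them. The sketch below (Python 3) writes a subset of the options above. jconf_text and NGRAM_ONLY are hypothetical names, and the skipped flags come only from those two warnings.

```python
# Sketch: generate a jconf for the word-list (-w) setup from Python 3.
# Assumption: only "-lmp" and "-lmp2" are skipped, because Julius reported
# them as "only for N-gram, ignored" when this configuration was loaded.
NGRAM_ONLY = {"-lmp", "-lmp2"}


def jconf_text(options):
    """options: list of (flag, value) pairs; value None means a bare flag."""
    lines = []
    for flag, value in options:
        if flag in NGRAM_ONLY:
            continue  # would only trigger the m_chkparam warning
        lines.append(flag if value is None else "{} {}".format(flag, value))
    return "\n".join(lines) + "\n"


OPTIONS = [
    ("-w", "julius_watson.dic"),
    ("-v", "model/lang_m/bccwj.60k.htkdic"),
    ("-h", "model/phone_m/jnas-tri-3k16-gid.binhmm"),
    ("-hlist", "model/phone_m/logicalTri"),
    ("-lmp", "8.0 -2.0"),
    ("-lmp2", "8.0 -2.0"),
    ("-input", "mic"),
    ("-zmeanframe", None),
    ("-rejectshort", "800"),
    ("-charconv", "EUC-JP UTF-8"),
]

if __name__ == "__main__":
    print(jconf_text(OPTIONS), end="")
```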

1.4 Execution

$ cd julius-kits/dictation-kit-v4.3.1-linux
~/julius-kits/dictation-kit-v4.3.1-linux $ julius -C julius_watson.jconf -demo

STAT: include config: julius_watson.jconf
WARNING: m_chkparam: "-lmp" only for N-gram, ignored
WARNING: m_chkparam: "-lmp2" only for N-gram, ignored
STAT: jconf successfully finalized

~ (output omitted) ~

----------------------- System Information end -----------------------

Notice for feature extraction (01),
        *************************************************************
        * Cepstral mean normalization for real-time decoding:       *
        * NOTICE: The first input may not be recognized, since      *
        *         no initial mean is available on startup.          *
        *************************************************************

Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created

pass1_best:Regards ← Noise
sentence1:Regards ← Noise
pass1_best:Regards ← Say "Thank you"
sentence1:nice to meet you
pass1_best:Watson ← Speak "Watson"
sentence1:Watson
pass1_best:Watson start ← Say "Watson start"
sentence1:Watson started
pass1_best:Raspberry Pi ← Say "I want to be hungry"
sentence1:Raspberry pi
<<< please speak >>>^C
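In the -demo output, `pass1_best:` is the first-pass hypothesis and `sentence1:` is the final result after the second pass. The two can differ, as in the "Regards" → "nice to meet you" case above. The minimal Python 3 sketch below (pass_pairs is a hypothetical name) pairs them from saved console output. It assumes each result appears as a `pass1_best:` line followed by a `sentence1:` line.

```python
# Sketch: pair first-pass and final hypotheses from Julius "-demo" output.
def pass_pairs(log_text):
    """Return [(pass1_best, sentence1), ...] in the order they appear."""
    pairs, pass1 = [], None
    for line in log_text.splitlines():
        if line.startswith("pass1_best:"):      # not "pass1_best_wordseq:" etc.
            pass1 = line.split(":", 1)[1].strip()
        elif line.startswith("sentence1:"):
            pairs.append((pass1, line.split(":", 1)[1].strip()))
            pass1 = None
    return pairs


if __name__ == "__main__":
    log = "pass1_best:Regards\nsentence1:nice to meet you\n"
    for first, final in pass_pairs(log):
        print(first, "->", final, "(changed)" if first != final else "")
```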

1.5 Impressions

Even noise comes out as "Regards". Is every utterance that matches nothing interpreted as the last entry in the reading file?

■ Create a grammar file

2.1 Describe the pronunciation phoneme sequences in a voca file

$ cat julius_watson.voca
% WATSON
Watson w a t s o n
Raspberry pi r a z u p a i
ＮＦＬ  a m e f u t o
Electric d e n k i
% WO
W o
% PLEASE
Put on t u k e t e
Erase k e sh i t e
% NS_B
[s]     silB
% NS_E
[s]     silE
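A .voca file groups words into categories. A line starting with % names a category, and each following line gives a word and its phoneme sequence. For this file, mkdfa.pl reports 5 categories and 9 words.

The minimal Python 3 parser below makes that structure explicit (parse_voca is a hypothetical name). It splits each line on whitespace and treats the first token as the word, so the sample uses single-token romanized words.

```python
# Sketch: parse a Julius .voca file into {category: [(word, phonemes), ...]}.
# Format assumed from the file above: "% NAME" starts a category and each
# following line is "<word> <phoneme> <phoneme> ...".
def parse_voca(text):
    categories = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            current = line[1:].strip()
            categories[current] = []
        elif current is not None:
            parts = line.split()
            categories[current].append((parts[0], " ".join(parts[1:])))
    return categories


SAMPLE = """% WATSON
Watson w a t s o n
denki d e n k i
% WO
wo o
% PLEASE
keshite k e sh i t e
% NS_B
[s] silB
% NS_E
[s] silE
"""

if __name__ == "__main__":
    voca = parse_voca(SAMPLE)
    print(len(voca), "categories,", sum(len(w) for w in voca.values()), "words")
```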

2.2 Create a grammar file to enforce syntax constraints

$ cat julius_watson.grammar
S      : NS_B WATSON_ PLEASE NS_E
WATSON_ : WATSON
WATSON_ : WATSON WO
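The grammar only describes sequences of categories. S must be NS_B, then WATSON_ (a WATSON word, optionally followed by WO), then PLEASE, then NS_E. Expanding it shows that every accepted sentence is a WATSON word plus a PLEASE word. That may explain the impression below that a single spoken word comes back padded into noun + verb.

The sketch below is Python 3 with hypothetical names. It expands a non-recursive grammar written as a dict and fills in the words listed in julius_watson.voca.

```python
# Sketch: enumerate the sentences accepted by julius_watson.grammar.
# Only works for non-recursive grammars like this one.
from itertools import product

GRAMMAR = {
    "S": [["NS_B", "WATSON_", "PLEASE", "NS_E"]],
    "WATSON_": [["WATSON"], ["WATSON", "WO"]],
}

VOCAB = {  # words per category, as listed in julius_watson.voca
    "WATSON": ["Watson", "Raspberry pi", "ＮＦＬ", "Electric"],
    "WO": ["W"],
    "PLEASE": ["Put on", "Erase"],
    "NS_B": ["[s]"],
    "NS_E": ["[s]"],
}


def expand(symbol, grammar):
    """Return every category sequence derivable from symbol."""
    if symbol not in grammar:  # a .voca category (terminal)
        return [[symbol]]
    results = []
    for rhs in grammar[symbol]:
        for parts in product(*(expand(s, grammar) for s in rhs)):
            results.append([cat for part in parts for cat in part])
    return results


def sentences(grammar, vocab, start="S"):
    """Fill each category sequence with every combination of its words."""
    out = []
    for seq in expand(start, grammar):
        for words in product(*(vocab[c] for c in seq)):
            out.append(" ".join(words))
    return out


if __name__ == "__main__":
    for s in sentences(GRAMMAR, VOCAB):
        print(s)
```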

2.3 Compile the grammar file and vocabulary (voca) file

cp julius-4.3.1/gramtools/mkdfa/mkfa-1.44-flex/mkfa julius-4.3.1/gramtools/mkdfa/mkfa
cp julius-4.3.1/gramtools/dfa_minimize/dfa_minimize julius-4.3.1/gramtools/mkdfa/dfa_minimize

sudo julius-4.3.1/gramtools/mkdfa/mkdfa.pl julius_watson

julius_watson.grammar has 3 rules
julius_watson.voca    has 5 categories and 9 words
---
Now parsing grammar file
Now modifying grammar to minimize states[-1]
Now parsing vocabulary file
Now making nondeterministic finite automaton[6/6]
Now making deterministic finite automaton[6/6]
Now making triplet list[6/6]
5 categories, 6 nodes, 6 arcs
-> minimized: 6 nodes, 6 arcs
---
generated: julius_watson.dfa julius_watson.term julius_watson.dict

2.4 Operation check

Stat: adin_oss: device name = /dev/dsp (application default)
Stat: adin_oss: sampling rate = 16000Hz
Stat: adin_oss: going to set latency to 50 msec
Stat: adin_oss: audio I/O Latency = 32 msec (fragment size = 512 samples)
STAT: AD-in thread created
pass1_best: [s]Watson started[s]← Speak "Watson"
pass1_best_wordseq: 3 0 2 4
pass1_best_phonemeseq: silB | w a t o s n | k a i s i | silE
pass1_best_score: -3108.902100
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 122
sentence1: [s]Watson started[s]
wseq1: 3 0 2 4
phseq1: silB | w a t o s n | k a i s i | silE
cmscore1: 1.000 0.482 0.476 1.000
score1: -3108.899414

pass1_best: [s]Raspberry Pi[s]← Say "Raspberry Pi"
pass1_best_wordseq: 3 0 2 4
pass1_best_phonemeseq: silB | r a z u p a i | s i t e | silE
pass1_best_score: -3268.691406
### Recognition: 2nd pass (RL heuristic best-first)
STAT: 00 _default: 23 generated, 23 pushed, 5 nodes popped in 132
sentence1: [s]Raspberry Pi[s]
wseq1: 3 0 2 4
phseq1: silB | r a z u p a i | s i t e | silE
cmscore1: 1.000 0.959 0.691 1.000
score1: -3268.694824

<<< please speak >>>
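In the operation check, phseq1 splits the recognized phonemes by word and cmscore1 gives one confidence value per word. Pairing them shows which words Julius was unsure about. For the utterance "Watson", the two non-silence words scored only 0.482 and 0.476. Below is a Python 3 sketch with a hypothetical helper name.

```python
# Sketch: pair each word's phonemes (phseq1) with its confidence (cmscore1)
# for every result in Julius console output.
def word_confidences(log_text):
    results, phseq = [], None
    for line in log_text.splitlines():
        if line.startswith("phseq1:"):
            phseq = [p.strip() for p in line.split(":", 1)[1].split("|")]
        elif line.startswith("cmscore1:") and phseq is not None:
            cms = [float(x) for x in line.split(":", 1)[1].split()]
            results.append(list(zip(phseq, cms)))
            phseq = None
    return results


if __name__ == "__main__":
    log = ("phseq1: silB | w a t o s n | k a i s i | silE\n"
           "cmscore1: 1.000 0.482 0.476 1.000\n")
    for result in word_confidences(log):
        print(result)
```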

2.5 Impressions

When I say a single word, it seems to be interpreted with a noun + verb filled in. That doesn't feel right: it adds words I never said.

■ Analyze Julius recognition results with Python

Starting Julius with the -module option apparently lets other programs connect to it as a server. So I start Julius with -module, connect to it from Python, and print the recognition results.
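In module mode, Julius sends its results as XML-like text over a TCP socket (port 10500 in the script below). As far as I know, each message ends with a line containing only a period. An alternative to watching for `<RECOGOUT>` tags is therefore to split the stream on those lines. Below is a minimal sketch under that assumption (Python 3, hypothetical helper name). It keeps incomplete data for the next recv().

```python
# Sketch: split text received from Julius module mode into messages.
# Assumption: every message is terminated by a line containing only ".".
def split_messages(buffer):
    """Return (complete_messages, leftover_text_for_next_recv)."""
    messages, current = [], []
    lines = buffer.split("\n")
    leftover = lines.pop()  # the last piece may be an incomplete line
    for line in lines:
        if line.strip() == ".":
            messages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:  # lines of an unfinished message go back into the buffer
        leftover = "\n".join(current) + "\n" + leftover
    return messages, leftover
```

In a receive loop, append each decoded `sock.recv()` chunk to the leftover text and call `split_messages` again.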

3.1 Python program

A program that connects to Julius and prints the recognition results. I copied it from code written by someone else, but I have lost track of the original source. I will add a link once I find it again.

`Julius_test.py`


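#!/usr/bin/python
# -*- coding: utf-8 -*-
# Connect to Julius running in module mode (-module) and print each
# <RECOGOUT> ... </RECOGOUT> block it sends. Runs on Python 2 and 3.
from __future__ import print_function
import socket

host = 'XXX.XXX.XX.XX'  # <- enter the address of the machine running Julius
port = 10500            # Julius module-mode port

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

pending = b""           # received bytes that do not yet end in a newline
xml_buff = ""           # lines of the <RECOGOUT> block being collected
in_recogout = False
try:
    while True:
        chunk = sock.recv(4096)
        if not chunk:               # Julius closed the connection
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()       # keep a partial last line for the next recv()
        for raw in lines:
            line = raw.decode("utf-8", "replace").rstrip()
            if line.startswith("<RECOGOUT>"):
                in_recogout = True
                xml_buff = line + "\n"
            elif line.startswith("</RECOGOUT>"):
                xml_buff += line
                print(xml_buff)
                in_recogout = False
                xml_buff = ""
            elif in_recogout:
                xml_buff += line + "\n"
finally:
    sock.close()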

3.2 Execution

First, start Julius in module mode

~/julius-kits/dictation-kit-v4.3.1-linux $ julius -C main.jconf -C am-gmm.jconf -module

Execution result

$ python Julius_test.py
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-5520.531738">
    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.200"/>
    <WHYPO WORD="voice" CLASSID="voice+noun" PHONE="o N s e:" CM="0.187"/>
    <WHYPO WORD="Authentication" CLASSID="Authentication+noun" PHONE="n i N sh o:" CM="0.074"/>
    <WHYPO WORD="test" CLASSID="test+noun" PHONE="t e s u t o" CM="0.273"/>
    <WHYPO WORD="。" CLASSID="</s>" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
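The WHYPO lines carry the word, class, phonemes and confidence (CM) of each recognized word. This output is not well-formed XML: CLASSID="<s>" puts a raw `<` inside an attribute value, so a strict XML parser such as xml.etree.ElementTree rejects the block. The regular-expression sketch below (Python 3, hypothetical names) still extracts the attributes.

```python
# Sketch: extract WHYPO attributes from a Julius <RECOGOUT> block with regexes,
# because the raw "<" in CLASSID="<s>" makes the block invalid XML.
import re
import xml.etree.ElementTree as ET

WHYPO_RE = re.compile(r'<WHYPO\s+((?:\w+="[^"]*"\s*)*)/>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_words(recogout):
    """Return [(word, classid, cm), ...] for each WHYPO element."""
    words = []
    for m in WHYPO_RE.finditer(recogout):
        attrs = dict(ATTR_RE.findall(m.group(1)))
        words.append((attrs.get("WORD", ""), attrs.get("CLASSID", ""),
                      float(attrs.get("CM", "nan"))))
    return words


def is_wellformed_xml(text):
    """True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


SAMPLE = """<RECOGOUT>
  <SHYPO RANK="1" SCORE="-5520.531738">
    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.200"/>
    <WHYPO WORD="voice" CLASSID="voice+noun" PHONE="o N s e:" CM="0.187"/>
    <WHYPO WORD="test" CLASSID="test+noun" PHONE="t e s u t o" CM="0.273"/>
    <WHYPO WORD="。" CLASSID="</s>" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>"""

if __name__ == "__main__":
    print(parse_words(SAMPLE))
```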

■ Summary (difference between reading file and grammar file)

Here is a summary of the differences when running in module mode.

4.1 How to execute

■ Reading file
cd julius-kits/dictation-kit-v4.3.1-linux
julius -C julius_watson.jconf -module

■ Grammar file
julius -C julius-kits/grammar-kit-v4.1/hmm_mono.jconf -input mic -gram julius_watson
※ Add the -module option to hmm_mono.jconf.

4.2 Execution result

The result of saying "Watson started"

■ Grammar file

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2817.017578" GRAM="0">
    <WHYPO WORD="[s]" CLASSID="3" PHONE="silB" CM="1.000"/>
    <WHYPO WORD="Watson" CLASSID="0" PHONE="w a t s n" CM="0.973"/>
    <WHYPO WORD="erase" CLASSID="2" PHONE="k e s h i t e" CM="0.560"/>
    <WHYPO WORD="[s]" CLASSID="4" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>

■ Reading file

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2903.453613" GRAM="0">
    <WHYPO WORD="Watson" CLASSID="Watson" PHONE="silB w a t o s o N silE" CM="0.791"/>
  </SHYPO>
</RECOGOUT>

<RECOGOUT>
  <SHYPO RANK="1" SCORE="-8478.763672" GRAM="0">
    <WHYPO WORD="Watson started" CLASSID="Watson started" PHONE="silB w a t o s o N k a i sh i silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>

4.3 Discussion

When you say "Watson started":

・Grammar file ⇒ "Watson" is matched, and the grammar returns "Watson erase" as its most likely result.
・Reading file ⇒ "Watson" and "Watson started" are separate entries, so each is judged on its own.

⇒ Maybe it depends on how words are registered in the grammar file? Even without full sentences, speaking in noun + verb seems likely to increase misrecognition considerably. For this use case, the grammar file looks better.
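In the grammar-file result in 4.2, the grammar can only produce WATSON + PLEASE, so the unregistered "started" was forced into "erase". Its confidence (CM 0.560) is clearly lower than that of "Watson" (0.973). One possible way to catch such forced matches is to flag words whose CM falls below a threshold. The Python 3 sketch below uses a hypothetical threshold of 0.7. That value is only an illustration and would need tuning on real recordings.

```python
# Sketch: flag words whose confidence (CM) falls below a threshold.
# THRESHOLD is a hypothetical value for illustration, not from Julius.
THRESHOLD = 0.7


def low_confidence(words, threshold=THRESHOLD):
    """words: [(word, cm), ...] -> words with cm < threshold, silence skipped."""
    return [w for w, cm in words if w != "[s]" and cm < threshold]


# CM values from the grammar-file result for the utterance "Watson started".
GRAMMAR_RESULT = [("[s]", 1.000), ("Watson", 0.973), ("erase", 0.560), ("[s]", 1.000)]

if __name__ == "__main__":
    print(low_confidence(GRAMMAR_RESULT))
```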

Finally

Julius was described as slow on Raspberry Pi 2, but on Raspberry Pi 3 it felt quite fast. If the only goal is recognition speed, a reading file or grammar file may not be necessary. If the words to be spoken can be limited to some extent, I think a reading file or grammar file would be worth using to improve the recognition rate.

[PYTHON] Raspberry Pi 3 x Julius (reading file and grammar file)
