Let's try Watson's Speech to Text. Following the sample demo below (https://www.ibm.com/blogs/watson/2016/07/getting-robots-listen-using-watsons-speech-text-service/), I try Watson's speech recognition (Speech to Text) to build a Raspberry Pi robot that can convert audio into text in real time.
As shown in the figure below, the final goal is speech recognition and transcription with Raspberry Pi 3 x Julius x Watson (Speech to Text). (http://qiita.com/nanako_ut/items/1e044eb494623a3961a5)
This time, we explore the Watson speech recognition step, part (4) of the figure.
The following is assumed to be ready:
- User registration with Watson (all services appear to be free for one month after registration)
- A Speech to Text service created in Watson, with credentials obtained
Specify the audio file (test.wav) and upload it to Watson over an HTTP connection:
curl -X POST -u username:password --header "Content-Type: audio/wav" --header "Transfer-Encoding: chunked" --data-binary @test.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=ja-JP_BroadbandModel"
Something came back. But the characters are garbled. The Raspberry Pi terminal is UTF-8, so are the Japanese analysis results in some other encoding (S-JIS?)?
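If the garbling is just JSON's \uXXXX escapes or an encoding mismatch, re-decoding the response on the Pi may be enough. A minimal sketch, assuming the response is ordinary UTF-8 JSON (the string below is a stand-in with the same shape, not real Watson output):

```python
import json

# Stand-in for the JSON that curl returns (assumed shape, not real output);
# the \u escapes spell out Japanese text.
raw = '{"results": [{"alternatives": [{"transcript": "\\u97f3\\u58f0"}]}]}'

data = json.loads(raw)  # parse the UTF-8 JSON response
# ensure_ascii=False re-serializes with readable Japanese characters
# instead of \uXXXX escapes.
print(json.dumps(data, ensure_ascii=False, indent=2))
```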
I implemented the following with reference to this sample source: Getting robots to listen: Using Watson's Speech to Text service.
Installing watson-developer-cloud-0.23.0, the Python library for Watson
Not required if pip is already installed. It wasn't on the Raspberry Pi I'm using, probably because I installed RASPBIAN JESSIE LITE on the Raspberry Pi 3.
$ python -m pip -V
/usr/bin/python: No module named pip
$ sudo apt-get install python-pip
Reading package lists... Done
Building dependency tree
(output omitted)
$ python -m pip -V
pip 1.5.6 from /usr/lib/python2.7/dist-packages (python 2.7)
Update pip:
$ sudo pip install -U pip
Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
Found existing installation: pip 1.5.6
Not uninstalling pip at /usr/lib/python2.7/dist-packages, owned by OS
Successfully installed pip
Cleaning up...
$ python -m pip -V
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)
$ sudo pip install --upgrade watson-developer-cloud
Collecting watson-developer-cloud
Downloading watson-developer-cloud-0.23.0.tar.gz (52kB)
(output omitted)
Successfully installed pysolr-3.6.0 requests-2.12.5 watson-developer-cloud-0.23.0
Copy the sample from the referenced site:
watson_test1.py
from watson_developer_cloud import SpeechToTextV1
import json
# Authenticate against the Speech to Text service with its credentials
stt = SpeechToTextV1(username="username", password="password")
audio_file = open("test1.wav", "rb")
# Recognize the audio and pretty-print the JSON result (Python 2 syntax)
print json.dumps(stt.recognize(audio_file, content_type="audio/wav"), indent=2)
Something came back, and it looks like text is being returned. However, the audio should have been longer, yet the transcript was cut off partway through!
{
"results": [
{
"alternatives": [
{
"confidence": 0.438,
"transcript": "so we know it's coming Julio just say yeah lost me grow mandatory right here shone like a great kid fifth grader etan Allemand planning his fifth critics "
}
],
"final": true
}
],
"result_index": 0
}
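To pull just the text out of a response like the one above, walking the results and alternatives lists is enough. A small sketch (the `extract_transcripts` helper is my own, not part of the library):

```python
# Hypothetical helper: collect all transcripts from a recognize() response.
def extract_transcripts(response):
    return [alt["transcript"]
            for result in response.get("results", [])
            for alt in result.get("alternatives", [])]

# A response shaped like the output shown above
sample = {
    "results": [
        {"alternatives": [{"confidence": 0.438,
                           "transcript": "so we know it's coming "}],
         "final": True}
    ],
    "result_index": 0,
}
print(extract_transcripts(sample))  # ["so we know it's coming "]
```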
It seems you can analyze voice in real time by using something called WebSocket.
(https://www.html5rocks.com/ja/tutorials/websockets/basics/) The WebSocket specification defines an API that establishes a "socket" connection between a web browser and a server. Simply put: there is a persistent connection between the client and the server, and either side can start sending data at any time. So it says.
(http://www.atmarkit.co.jp/ait/articles/1111/11/news135.html) HTML5 adds a new communication standard called "WebSocket". Its features:
- Once a connection is established between server and client, data can be exchanged over that socket, without worrying about the communication procedure, until it is explicitly disconnected.
- A server and all clients holding WebSocket connections can share the same data and send and receive it in real time.
- With conventional techniques, an HTTP header is attached to every exchange, so even small payloads generate extra traffic and consume resources in proportion to the number of connections.
- With WebSocket, the client sends a handshake request on the first connection; the server returns a handshake response, and that single connection is kept and reused from then on.
So it says.
I see.
Install the ws4py library for WebSocket
$ sudo pip install ws4py
Collecting ws4py
Downloading ws4py-0.3.5-py2-none-any.whl (40kB)
100% |################################| 40kB 661kB/s
Installing collected packages: ws4py
Successfully installed ws4py-0.3.5
Copy the sample from the referenced site:
watson_test2.py
from ws4py.client.threadedclient import WebSocketClient
import base64, time

class SpeechToTextClient(WebSocketClient):
    def __init__(self):
        ws_url = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
        username = "username"
        password = "password"
        # Build the HTTP Basic auth header from the credentials
        auth_string = "%s:%s" % (username, password)
        base64string = base64.encodestring(auth_string).replace("\n", "")
        try:
            WebSocketClient.__init__(self, ws_url,
                headers=[("Authorization", "Basic %s" % base64string)])
            self.connect()
        except:
            print "Failed to open WebSocket."

    def opened(self):
        # Start a recognition session for raw 16 kHz mono PCM audio
        self.send('{"action": "start", "content-type": "audio/l16;rate=16000"}')

    def received_message(self, message):
        print message

stt_client = SpeechToTextClient()
time.sleep(3)
stt_client.close()
Recognition results come back:
$ python watson_test2.py
opend
Message received: {u'state': u'listening'}
sleep audio
Recording raw data 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Message received: {u'results': [{u'alternatives': [{u'confidence': 0.713, u'transcript': u'over the entire course of the scalp was it was all the guys that one rings before imagine '}], u'final': True}], u'result_index': 0}
Hmm. Even though this is supposed to be real-time, no matter how much voice data I send, only the first message comes back. Is there an option I'm missing, or am I passing the data badly? It seems I need to investigate a little more.
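One likely culprit is the start action: in the Watson Speech to Text WebSocket API of this era, a session stops at the first pause unless continuous mode is enabled, and partial hypotheses only stream back when interim results are requested; a stop action then flushes the final result. A sketch of those messages (the "continuous" and "interim_results" parameter names are taken from Watson's documentation of the time; treat them as assumptions to verify against the current API reference):

```python
import json

# Start message: keep listening across pauses and stream partial results.
# "continuous" and "interim_results" are assumed parameter names from the
# Watson STT WebSocket docs of this period.
start_msg = json.dumps({
    "action": "start",
    "content-type": "audio/l16;rate=16000",
    "continuous": True,
    "interim_results": True,
})
# After streaming the binary audio chunks, a stop action ends the session.
stop_msg = json.dumps({"action": "stop"})
print(start_msg)
print(stop_msg)
```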
The Bluemix UI seems to be changing steadily, the Speech to Text URL differs from the sample, and the service is evidently still under development. The drawback is that investigation takes time.