[PYTHON] [Introduction to AWS] I tried porting the conversation app and playing with text2speech @ AWS ♪

I have been busy with Shogi AI lately and neglected writing articles, but I took part in the "Own AWS Hackathon (tentative)" from Friday to Saturday. I managed to do what the title says and made my AWS debut, so I am recording it here. All the participants were real beginners who leaned heavily on the organizer, but it was a very fulfilling 24 hours.

What I did in 24 hours

Below is a chronological history.
・ Connect to AWS
・ Check the types of AI solutions
・ Confirm speech2text
・ Train Shogi AI
・ Confirm pykakasi
・ Install MeCab @ AWS
・ Modify QA_conversation for AWS
・ Record to an mp3 file with text2speech @ AWS
・ Batch download of mp3 files @ TeraTerm
・ Publish mp3 files to S3 (a credentials file got in the way)
・ Automatic generation of public S3 audio files

The following could not be done because time ran out.
・ Mp3 transfer & resend as the conversation progresses
・ Voice-to-text conversion with speech2text
・ Conversation on a smartphone
・ Completing the app

What this article covers

・ Train Shogi AI in [email protected] environment
・ Port the conversation app
・ Play the converted voice on the Web with Polly's text2speech

・ Train Shogi AI in [email protected] environment

The environment was mostly provided by the organizer, and participants wrote their progress on a Wiki set up on ec2. In other words, each participant creates an instance suited to their app. I chose the GPU instance p2.xlarge for machine learning, which makes all the machine learning frameworks available. The procedure is almost exactly as in the reference below, except that the request to relax limits was unnecessary because the organizer had arranged it in advance. 【reference】 If you want to run Chainer on GPU, it's easy to use Deep Learning Base AMI (Ubuntu).

Once you also create a key pair, the instance launches without trouble

The flow is as in the article above, but I will add a few notes. The key pair is important: you need it when connecting over SSH, so store it in a safe place. Also, when creating an instance, the following information is set and can be downloaded as a file. It is essential when building an app with AWS solutions, so keep it in a safe place as well.

aws_access_key_id                     
aws_secret_access_key
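
One way to keep these two values out of source code is to read them from environment variables. The sketch below assumes they have been exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the standard names that boto3 also looks up by itself; the helper name load_aws_keys and the dummy values are hypothetical.

```python
import os


def load_aws_keys(env=os.environ):
    """Return (access_key_id, secret_access_key) read from environment variables."""
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Example with a stand-in mapping instead of the real environment (dummy values).
demo_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "secretexample"}
access_key_id, secret_access_key = load_aws_keys(demo_env)
```

Failing loudly when a variable is missing makes a forgotten export easier to spot than an authentication error later on.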

Connect with TeraTerm

・ Install TeraTerm.
・ Enter a host name such as [ec2-**-**-***-**.us-west-2.compute.amazonaws.com] in the Host field and click OK. You will then be prompted further, so enter the user name ubuntu, select the private key of the key pair, and click OK to connect.
・ Once connected, copy [source activate chainer_p36] from the site below and paste it into the terminal to activate the chainer_p36 environment. 【reference】 Chainer@aws
・ Zip the set of files, send it with TeraTerm's transfer function, and extract it into a suitable directory on ec2.

This time I tried running train_policy_value_resnet.py of Shogi AI, but it took much longer than in my local environment. The GPU was in use, but the CPU was at 100%. I use cuDNN locally, and I regret that I did not check whether it was installed on the instance.

This was not the main goal this time, so I did not pursue it.
As shown in the reference below, other frameworks are also supported. 【reference】 Activating the framework
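
For the cuDNN question raised above, a quick check like the following could be run inside the activated chainer_p36 environment. This is a minimal sketch, not something run during the hackathon; the function name gpu_status is hypothetical, and it returns None when Chainer is not installed.

```python
def gpu_status():
    """Report whether Chainer sees CUDA and cuDNN; None if Chainer is not installed."""
    try:
        from chainer.backends import cuda
    except ImportError:
        return None
    return {
        "cuda_available": bool(cuda.available),    # CuPy/CUDA could be loaded
        "cudnn_enabled": bool(cuda.cudnn_enabled),  # cuDNN was found as well
    }


print(gpu_status())
```

If cudnn_enabled comes back False, that would be one possible explanation for slow training, though a CPU pinned at 100% can also point to data loading.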

Porting the conversation app

For the files, I use the nano@ubuntu version of the conversation app from the other day and transfer that set of files. Since this app does not use deep learning, I created a new t2.large instance with Ubuntu 18.04 and ran it there. In terms of price, it is an order of magnitude cheaper, as shown below.

Instance    vCPU  ECU       Memory  Storage   Price
p2.xlarge   4     12        61 GiB  EBS only  0.9 USD/hour
t2.large    2     variable  8 GiB   EBS only  0.0928 USD/hour

【reference】 Amazon EC2 Pricing
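
To put "an order of magnitude" into numbers, the two hourly prices listed above can be compared directly. The 24-hour figure simply assumes an instance left running for the whole hackathon.

```python
# Hourly on-demand prices in USD, taken from the figures listed above.
P2_XLARGE = 0.9
T2_LARGE = 0.0928

ratio = P2_XLARGE / T2_LARGE
hackathon_hours = 24  # length of the event

print(f"p2.xlarge costs {ratio:.1f}x as much as t2.large per hour")
print(f"24 h on p2.xlarge: {P2_XLARGE * hackathon_hours:.2f} USD")
print(f"24 h on t2.large:  {T2_LARGE * hackathon_hours:.2f} USD")
```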

・ MeCab installation @ AWS

The biggest challenge in porting is installing MeCab. It gives me trouble in every environment, and t2.large with Ubuntu 18.04 on AWS was no exception. The basic procedure is the same as for nano, as in the reference below. 【reference】 Install mecab on ubuntu 18.10 However, this time too I got a No module named 'MeCab' error. The cause is unknown, but reinstalling with the command below made it work.

sudo pip3 install --no-cache-dir mecab-python3

The dictionary was also installed without trouble by the above method. I did not install pyaudio because I do not generate sound on the instance, but I installed pykakasi and confirmed that it works. With that, the app runs once the text2speak() function call in qa_conversation_nano.py is removed.
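
To confirm that both the module and the dictionary are usable after the reinstall, a small check along these lines may help. It is only a sketch: check_mecab is a hypothetical helper, and it returns None when MeCab cannot be imported or initialised instead of raising.

```python
def check_mecab(text="すもももももももものうち"):
    """Return the tokens MeCab produces for text, or None if MeCab is unusable."""
    try:
        import MeCab  # provided by the mecab-python3 package
    except ImportError:
        return None
    try:
        tagger = MeCab.Tagger("-Owakati")  # space-separated output
    except RuntimeError:
        # Raised when MeCab cannot be initialised, e.g. no dictionary is found.
        return None
    return tagger.parse(text).split()


print(check_mecab())
```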

・ Play the converted voice on the Web with text2speech of Amazon Polly

The eventual goal of the app is voice conversation. I wanted to use both speech2text and text2speech if they were available, but on AWS only text2speech supported Japanese, so I decided to use that.

Voice conversion with Polly's text2speech

AWS has plenty of documentation, but it seems to take some practice to find what you need. I found the following code. With it, the string given in Text is converted to audio, and the mp3 file is written to the specified directory on the ec2 server. Enter the keys generated when the instance above was created as the values of aws_access_key_id= and aws_secret_access_key=.

SynthesizeSpeech@AWS

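import boto3

# Create a Polly client; replace the placeholders with your own keys.
polly_client = boto3.Session(
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
    region_name='us-west-2').client('polly')

# Convert the text to speech in mp3 format.
response = polly_client.synthesize_speech(VoiceId='Joanna',
                                          OutputFormat='mp3',
                                          Text='This is a sample text to be synthesized.')

# Write the returned audio stream to a file.
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())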

For Japanese, set VoiceId='Mizuki' and put Japanese text in Text. Various adjustments seem to be possible, but this time I left everything at the defaults because of time constraints. 【reference】 Amazon Polly Voice
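
A Japanese variant that writes one numbered mp3 per answer might look like the sketch below. The function name synthesize_to_files and the sample sentences are hypothetical; the file naming (speech0.mp3, speech1.mp3, ...) is modelled on the file name used in the HTML later in this article. The Polly client is passed in, so the function itself carries no keys.

```python
def synthesize_to_files(polly_client, texts, voice_id="Mizuki", prefix="speech"):
    """Synthesize each text to <prefix><n>.mp3 and return the file names."""
    paths = []
    for n, text in enumerate(texts):
        response = polly_client.synthesize_speech(
            VoiceId=voice_id, OutputFormat="mp3", Text=text)
        path = f"{prefix}{n}.mp3"
        with open(path, "wb") as mp3:
            mp3.write(response["AudioStream"].read())
        paths.append(path)
    return paths


# Usage with a real client (not run here):
#   import boto3
#   polly_client = boto3.Session(region_name="us-west-2").client("polly")
#   synthesize_to_files(polly_client, ["こんにちは", "今日はいい天気ですね"])
```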

Transfer from ec2 to s3

I was able to do this with the following code, although I hit a problem here: an error saying that the keys did not match. Authentication takes place during the transfer, and it automatically looked at the stored credentials file and read an old key. This time I deleted that file, after which the upload worked normally.

# -*- coding: utf-8 -*-
import boto3

s3 = boto3.resource('s3')  # Get the S3 resource

# Replace the placeholder strings with your bucket name and paths.
bucket = s3.Bucket('s3 bucket-name')
bucket.upload_file('Specify ec2 mp3 file', 'dir specification of bucket of s3')
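
When a key mismatch like the one described above appears, it can help to see where boto3 actually took its credentials from. The following is a sketch under the assumption that the session's resolved credentials expose method and access_key, as botocore credentials objects do; describe_credentials is a hypothetical helper, and it shows only the last four characters of the key.

```python
def describe_credentials(session):
    """Summarise which credentials a boto3 session resolved, without printing secrets."""
    creds = session.get_credentials()
    if creds is None:
        return {"method": None, "access_key_tail": None}
    return {"method": creds.method, "access_key_tail": creds.access_key[-4:]}


# Usage on the instance (not run here):
#   import boto3
#   print(describe_credentials(boto3.Session()))
```

A method value such as shared-credentials-file would indicate that a stored file, rather than keys passed in code, is being used.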

This time I used boto3.resource; for reference, the same upload can be done with boto3.client as follows. 【reference】 Manipulate S3 files with boto3

import boto3

s3_client = boto3.client('s3')

# Upload the file to S3
s3_client.upload_file('test.txt', 'bucket-name', 'test-remote.txt')
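
For uploading the generated mp3 files one after another, a small wrapper like this could be used. It is a sketch with assumptions: upload_mp3 is a hypothetical name, the mp3/ key prefix mirrors the path seen in the public URL later in this article, and the ContentType setting is an addition that was not part of the original code. Passing the keys explicitly to boto3.Session, as in the usage comment, is one way to avoid picking up an old stored key.

```python
import os


def upload_mp3(bucket, local_path, key_prefix="mp3/"):
    """Upload local_path to the bucket under key_prefix and return the object key."""
    key = key_prefix + os.path.basename(local_path)
    # ContentType lets browsers treat the object as audio rather than a download.
    bucket.upload_file(local_path, key, ExtraArgs={"ContentType": "audio/mpeg"})
    return key


# Usage with a real bucket (not run here; the strings are placeholders):
#   import boto3
#   session = boto3.Session(aws_access_key_id="YOUR_ACCESS_KEY_ID",
#                           aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
#                           region_name="us-west-2")
#   bucket = session.resource("s3").Bucket("bucket-name")
#   upload_mp3(bucket, "speech0.mp3")
```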

Play the audio from the public s3 server

At this point, the audio can be played from ordinary HTML.

【reference】

<html>
<body>
<figure>
    <figcaption>Listen to the Answer:</figcaption>
    <audio
        autoplay
        controls
        src="https://******.s3-****.amazonaws.com/mp3/speech0.mp3">
            Your browser does not support the
            <code>audio</code> element.
    </audio>
</figure>
</body>
</html>
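
Since the file name changes with every answer, the page above could also be generated from Python. This is a minimal sketch: render_audio_page is a hypothetical helper, and example.com stands in for the real bucket URL, which is masked in this article.

```python
from html import escape

PAGE_TEMPLATE = """<html>
<body>
<figure>
    <figcaption>Listen to the Answer:</figcaption>
    <audio autoplay controls src="{src}">
        Your browser does not support the <code>audio</code> element.
    </audio>
</figure>
</body>
</html>
"""


def render_audio_page(mp3_url):
    """Return an HTML page that plays the mp3 at mp3_url."""
    return PAGE_TEMPLATE.format(src=escape(mp3_url, quote=True))


# example.com is a stand-in, not the real bucket URL.
print(render_audio_page("https://example.com/mp3/speech0.mp3"))
```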

Summary

・ I was able to make my AWS debut by participating in the "Own AWS Hackathon (tentative)"
・ I tried running the conversation app on AWS
・ I got text2speech working

・ Speech2text did not support Japanese on AWS, so countermeasures need to be considered.
・ Conversation on a smartphone is still to be done.