[PYTHON] Achieve "Balse" with Amazon Echo

This is the article for day 18 of the "I'll do it again this year! AWS Lambda" Advent Calendar 2015 on Qiita. It is tied to Lambda, but much of the material is about Amazon Echo and the Alexa Skills Kit.

What is Balse?

Needless to say, it is the terrifying word of destruction from Castle in the Sky. Our predecessors have already created commands that wreak destruction: https://github.com/qphoney/balus. This time, I won't be quite so ruthless.

What is Amazon Echo?

Amazon Echo is a device that answers questions and connects to services through voice recognition. You can add features using the Alexa Skills Kit, and third-party Skills can be installed from a dedicated site after review. https://youtu.be/7Jc82wIL7m4

What is the Alexa Skills Kit?

Alexa is the cloud-based speech recognition service used by Amazon Echo. The Alexa Skills Kit (ASK) provides the environment you need to easily build the features (Skills) that Alexa can use. See Amazon Alexa Skills Kit for details.

Create Lambda function for Alexa Skills Kit (ASK)

At re:Invent the other day it was announced that Lambda now supports Python, and Lambda functions can now be used as the backend for ASK. However, since ASK only supports the `us-east-1` region, the ASK template does not appear in the Lambda template list in other regions.

(Screenshot: the ASK template in the Lambda template list)

This time, I modified this template slightly to implement Balse.

# -*- coding: utf-8 -*-
from __future__ import print_function
import boto3
from time import gmtime, strftime

client = boto3.client('ec2')

#Entry point
def lambda_handler(event, context):
    print(strftime('%a, %d %b %Y %H:%M:%S +0000', gmtime()))
    print(event)

    if event['request']['type'] == "LaunchRequest":
        #request to start skill
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        #Intent call
        return on_intent(event['request'], event['session'])

    print("nothing and finish")
    return get_finish_response()

def on_launch(launch_request, session):
    #Return the magic charm
    return get_charm_response()

def on_intent(intent_request, session):

    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']
    #Finish unless the intent carries the word of destruction
    if 'Barusu' not in intent['slots'] or \
       'value' not in intent['slots']['Barusu'] or \
       not intent_name == "RunHorobi":
        return get_finish_response()
    
    print(intent['slots']['Barusu']['value'])

    stop_instance()
    return get_horobi_response()

#Generate the magic-charm response
def get_charm_response():

    session_attributes = {}
    card_title = "Charm"
    speech_output = "Lite Latobarita Urs Ariaros Bar Netril"  #charm text, shown on the card
    audio_url = "https://url/to/your/audio.mp3"
    should_end_session = False
    return build_response(session_attributes, build_audio_response(
        card_title, speech_output, audio_url, should_end_session))

#Generate a response that returns when the word of destruction is said
def get_horobi_response():

    session_attributes = {}
    card_title = "Horobi"
    speech_output = "Megaaaaa!"
    reprompt_text = speech_output
    should_end_session = True
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))

#Generate a response when nothing is done
def get_finish_response():

    session_attributes = {}
    card_title = "Words that should not be used"
    speech_output = "Don't say the word of horobee"
    reprompt_text = speech_output
    should_end_session = True
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))

#Get the instances to stop when Balse is spoken
def get_instances():
    response = client.describe_instances(
        Filters=[
            {
                'Name': 'tag-value','Values': [
                    'laputa',
                ]
            }
        ]
    )
    instance_ids = []
    for res in response['Reservations']:
        for item in res['Instances']:
            instance_ids.append(item['InstanceId'])
    return instance_ids

#Start the instances
def start_instance():
    print('start_instance')
    response = client.start_instances(
        InstanceIds=get_instances()
    )

#Stop the instances
def stop_instance():
    print('stop_instance')
    response = client.stop_instances(
        InstanceIds=get_instances()
    )

#Generate JSON of return value
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': 'SessionSpeechlet - ' + title,
            'content': 'SessionSpeechlet - ' + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }

#Generate SSML format return value JSON
def build_audio_response(title, output, audio_url, should_end_session):
    return {
        'outputSpeech': {
            'type': 'SSML',
            'ssml': '<speak><audio src="{0}" /></speak>'.format(audio_url)
        },
        'card': {
            'type': 'Simple',
            'title': 'SessionSpeechlet - ' + title,
            'content': 'SessionSpeechlet - ' + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'SSML',
                'ssml': '<speak><audio src="{0}" /></speak>'.format(audio_url)
            }
        },
        'shouldEndSession': should_end_session
    }

#Overall return value
def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }
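For reference, the `IntentRequest` event that `on_intent` receives looks roughly like this. This is a trimmed sketch based on the ASK request format: only the fields the handler actually reads are shown, and the values are illustrative.

```python
# A trimmed, illustrative IntentRequest event (not a full ASK payload).
event = {
    'request': {
        'type': 'IntentRequest',
        'intent': {
            'name': 'RunHorobi',
            'slots': {
                'Barusu': {'name': 'Barusu', 'value': 'barusu'}
            }
        }
    },
    'session': {}
}

intent = event['request']['intent']
# The same guard on_intent applies before stopping the instances:
is_horobi = ('Barusu' in intent['slots']
             and 'value' in intent['slots']['Barusu']
             and intent['name'] == 'RunHorobi')
print(is_horobi, intent['slots']['Barusu']['value'])  # True barusu
```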

With this script, when you say "Alexa, run laputa" to Amazon Echo, it answers with the protective charm `Lite Latobarita Urs Ariaros Bar Netril` for use in times of trouble.

Alexa Skills normally return text that Amazon Echo reads aloud, and there is also a way to speak using [SSML (Speech Synthesis Markup Language)](https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/speech-synthesis-markup-language-ssml-reference); furthermore, [a recent announcement](https://developer.amazon.com/public/community/post/Tx3FXYSTHS579WO/Announcing-New-Alexa-Skills-Kit-ASK-Features-SSML-Audio-Tags-and-Developer-Porta) made it possible to play arbitrary audio data. Alexa's pronunciation is based on English, so I use SSML to point at a pre-synthesized mp3 file (in build_audio_response).
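Concretely, the SSML that `build_audio_response` emits just wraps the mp3 in an `audio` tag (shown here with the placeholder URL from the handler above):

```python
# The SSML string produced by build_audio_response, using the placeholder URL.
audio_url = "https://url/to/your/audio.mp3"
ssml = '<speak><audio src="{0}" /></speak>'.format(audio_url)
print(ssml)  # <speak><audio src="https://url/to/your/audio.mp3" /></speak>
```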

Audio file restrictions

The audio files that Amazon Echo can play are subject to a few restrictions; they must be in the following format.

- A valid MP3 file (MPEG version 2)
- No longer than 90 seconds
- Bit rate of 48 kbps
- Sample rate of 16000 Hz
- Accessible over https from an endpoint with a trusted SSL certificate (a self-signed certificate won't work)
- Must not contain personal or confidential information

When using ffmpeg on a Mac, the following command converts a file to this format:

ffmpeg -i input.mp3 -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 output.mp3
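As a rough sanity check (my own back-of-the-envelope figure, not an official Amazon limit), the 90-second and 48 kbps restrictions together cap the file size:

```python
# Upper bound on file size implied by the 90 s / 48 kbps restrictions.
bitrate_bps = 48_000   # 48 kbps
max_seconds = 90
max_bytes = bitrate_bps * max_seconds // 8
print(max_bytes)  # 540000, i.e. roughly 530 KB
```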

Add Alexa Skills

Skill information: from the dashboard at https://developer.amazon.com/edw/home.html#/ select `Alexa Skills Kit` and register a new skill with `Add a New Skill`. (Screenshot: skill registration form) For `Endpoint`, select Lambda and specify the ARN of the Lambda function you just created.

Intent Schema

{
  "intents": [
    {
      "intent": "RunHorobi",
      "slots": [
        {
          "name": "Barusu",
          "type": "LIST_OF_BARUSU"
        }
      ]
    }
  ]
}

This defines the keyword (slot) attached to the intent, which maps to the handler on the Lambda side.

Custom Slot Types

barusu
bars
barus
barsu

Register the corresponding keywords here. (However, from what I heard at a workshop, the Lambda function may still unavoidably be invoked for keywords that are not registered here.)

Sample Utterances

RunHorobi {Barusu}

This specifies which intent to call for a given spoken phrase. Users phrase the same request in several ways, so anticipating that and associating multiple sentences with the intent makes it easier for ASK to identify.
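For example, a few additional hypothetical phrasings (illustrative only, not from the original skill) could be mapped to the same intent:

```
RunHorobi {Barusu}
RunHorobi say {Barusu}
RunHorobi cast {Barusu}
```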

Test

ASK has a mechanism (the Service Simulator) that lets you test a Skill without using an Amazon Echo. (Screenshot: Service Simulator) You can use it to test your Lambda function so you don't have to talk to the Echo while debugging. Click the play button at the bottom right and it will speak the returned text. (However, SSML is not currently supported.)

Actual machine confirmation

On the Dashboard for an Echo registered as your device, you can check the words actually spoken and the results. (Screenshot: Echo dashboard history)


Summary

By using Amazon Echo, you can now pull off a more realistic "Balse". (Triggering it with voice recognition makes it feel that much more exciting.) However, Amazon Echo doesn't support Japanese yet, so it can't deliver the "My eyes! My eyes!" line, which is disappointing. I hope Japanese support arrives as soon as possible.

On a serious note

Now that Lambda functions can be written in Python, coding is personally much easier. And with the addition of the Skill test simulator, you can build quite a lot without owning an Amazon Echo.
