[Python] [AWS: Introduction to Lambda] Part 2: Extract sentences from a JSON file and save them to S3 ♬

Last time, I used the following code to transcribe an mp3 file placed in an S3 bucket into text and to write the resulting JSON file to the S3 bucket given as OutputBucketName. This time, I will read that JSON file and extract the transcribed sentences. I am deliberately showing last time's code again because this time's code is similar.

import datetime
import urllib.parse

import boto3

s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        transcribe.start_transcription_job(
            TranscriptionJobName= datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '_Transcription',
            LanguageCode='ja-JP',
            Media={
                'MediaFileUri': 'https://s3.ap-northeast-1.amazonaws.com/' + bucket + '/' + key
            },
            OutputBucketName='lamoutput'
        )
...
        raise e

I was able to implement this with the code below. For saving to the S3 bucket, I followed the references.

【Reference】
① [AWS Lambda basic code 2] Save file to S3
② Manipulate S3 objects with Boto3 (high level API and low level API)

I have kept the comments from Reference ①. It worked with almost the same code. The difference is that it incorporates the JSON file handling I covered the other day. First, the libraries are as follows.

#① Import of library
import boto3
import urllib.parse
from datetime import datetime
import json

Next, following Reference ②, define the client.

print('Loading function')      #② Log that the function is being loaded
s3 = boto3.resource('s3')      #③ Get the S3 service resource
client = s3.meta.client        # Low-level client that belongs to the resource

Getting the bucket and key in lambda_handler is exactly the same as in the transcription code above (of course...).

#④ Lambda's main function
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
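To make the shape of the incoming event concrete, here is a minimal sketch of the same extraction run against a hand-written S3 event. The sample event, the bucket name example-bucket, and the helper name bucket_and_key are assumptions for illustration; a real S3 notification carries many more fields.

```python
import urllib.parse


def bucket_and_key(event):
    """Return (bucket, key) from the first record of an S3 event notification."""
    s3_info = event['Records'][0]['s3']
    bucket = s3_info['bucket']['name']
    # S3 URL-encodes object keys in the event (a space arrives as '+'),
    # so decode the key before passing it to get_object.
    key = urllib.parse.unquote_plus(s3_info['object']['key'], encoding='utf-8')
    return bucket, key


# Hypothetical, trimmed-down event for illustration only.
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'example-bucket'},
                'object': {'key': 'my+folder/%E9%9F%B3.json'}}}
    ]
}

print(bucket_and_key(sample_event))
```

Decoding matters mainly for keys that contain spaces or non-ASCII characters such as Japanese file names; plain ASCII keys pass through unchanged.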

Next, response['Body'] is read from the JSON file with the same code as Reference ②. However, I stumbled here. I assumed that body.decode('utf-8') would give me the Japanese sentences. In reality, what comes out is a JSON-looking character string. At first I did not realize it was a string and thought it was already a parsed JSON object. Once I noticed it was a string, I found that json.loads converts it into a Python dict, and I finally arrived at the code below. In short, body is a string (strictly speaking, a bytes object, which json.loads also accepts).

    response = client.get_object(Bucket=bucket, Key=key)
    body = response['Body'].read()

Convert the string to a Python dict with json.loads.

    dec = json.loads(body)
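The stumbling point above can be reproduced without S3. The following sketch uses a hand-made bytes value in place of response['Body'].read(); the sample content is an assumption for illustration, not real Transcribe output.

```python
import json

# Stand-in for response['Body'].read(), which returns bytes.
body = '{"results": {"transcripts": [{"transcript": "こんにちは"}]}}'.encode('utf-8')

text = body.decode('utf-8')   # still just a str that happens to look like JSON
dec = json.loads(body)        # json.loads accepts bytes (Python 3.6+) and returns a dict

print(type(text).__name__)    # str
print(type(dec).__name__)     # dict
```

Decoding alone only changes bytes into str; it is json.loads that turns the text into a structure that can be indexed by key.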

Because it is now a dict parsed from JSON, the Japanese sentences can be extracted easily as follows.

    con_el=dec["results"]["transcripts"][0]["transcript"]
    print('contents=',con_el)
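If the function is ever triggered by a file that is not a Transcribe result, the direct indexing above raises KeyError or IndexError. As a hedged sketch, a more defensive variant could look like the following; the helper name extract_transcript is hypothetical, and it relies only on the results / transcripts / transcript keys used above.

```python
def extract_transcript(dec):
    """Return the first transcript string, or '' if the expected keys are missing."""
    transcripts = dec.get('results', {}).get('transcripts', [])
    if not transcripts:
        return ''
    return transcripts[0].get('transcript', '')


# Hand-made samples for illustration only.
good = {'results': {'transcripts': [{'transcript': 'こんにちは'}]}}
bad = {'unrelated': True}

print(extract_transcript(good))
print(repr(extract_transcript(bad)))
```

Returning an empty string is only one possible choice; raising a clear error would be equally reasonable if a missing transcript should stop the function.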

The log output was:

contents= Hello Tokyo Yokohama also cloudy little voice is Mizuki's

Finally, save it to the specified S3 bucket as a .txt file with a timestamped key, as follows.

    bucket = 'muauanpub'    #⑤ Specify the output bucket name
    key = 'test_' + datetime.now().strftime('%Y-%m-%d-%H-%M-%S') + '.txt'  #⑥ Specify the object key (timestamped)
    file_contents = con_el          #⑦ File contents (the extracted transcript)
    obj = s3.Object(bucket, key)    #⑧ Specify the bucket name and key
    obj.put(Body=file_contents)     #⑨ Write the file to the bucket
    return
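The save step can also be tried locally without touching AWS. The sketch below separates key construction from the upload and swaps in an in-memory stand-in for the S3 resource. The names build_key, save_text, and FakeS3 and the bucket example-bucket are all hypothetical; the only assumption about the real resource is the Object(bucket, key).put(Body=...) call used above.

```python
from datetime import datetime


def build_key(now, prefix='test_'):
    """Build a timestamped object key such as test_2020-01-02-03-04-05.txt."""
    return prefix + now.strftime('%Y-%m-%d-%H-%M-%S') + '.txt'


def save_text(s3_resource, bucket, key, text):
    """Write text to the given bucket/key as UTF-8 bytes via an S3 resource-like object."""
    s3_resource.Object(bucket, key).put(Body=text.encode('utf-8'))
    return key


class FakeS3:
    """In-memory stand-in for boto3.resource('s3'), for local testing only."""

    def __init__(self):
        self.stored = {}

    def Object(self, bucket, key):
        fake = self

        class _Obj:
            def put(self, Body):
                # Record the upload instead of sending it to AWS.
                fake.stored[(bucket, key)] = Body

        return _Obj()


fake = FakeS3()
key = build_key(datetime(2020, 1, 2, 3, 4, 5))
save_text(fake, 'example-bucket', key, 'こんにちは')
print(key)
```

Inside Lambda, the same save_text call would receive boto3.resource('s3') instead of FakeS3. Encoding the text explicitly as UTF-8 makes it unambiguous how the Japanese text is stored.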

Summary

・ I was able to extract the sentences from the JSON file produced by transcribing an audio file and store them in an S3 bucket.
・ It is a two-step process, but once an mp3 file is placed in the S3 bucket, the transcribed text itself is saved to an S3 bucket automatically.
・ For the time being, I confirmed the flow Tera Term → EC2 → transfer to the S3 bucket, then download from the S3 bucket ⇒ display.

・ If an application that uploads audio files to this S3 bucket and an application that displays the text files in the S3 bucket can be built, it should make for an easier-to-use audio-to-text conversion app (a web version).
・ Even if the conversion takes a long time, both Lambda functions are invoked asynchronously, so it should be an app that is friendly to both money and time.
