Have Google Text-to-Speech create audio data (narration) for video material (with C# and Python samples)

Overview

This is an introductory memo on Google's speech synthesis service "**Cloud Text-to-Speech**". It explains, in as much detail as possible, the whole flow: enabling the API, obtaining the credentials file, and calling the service from your own program (C# or Python).

It basically walks through the official "Quickstart: Using client libraries" with screenshots (steps that seem unnecessary are skipped).

**Cloud Text-to-Speech** is a cloud service that generates **read-aloud audio** (.mp3) from text (Japanese is supported). The output sounds natural and quite close to a human voice. You can check the quality with any text of your own (Japanese included) using the demo on the official product page.

Register with Google Cloud Platform

Sign up as a user on Google Cloud (https://cloud.google.com/?hl=ja).

You can start a free trial with your Google account. A credit card is required at registration, but when the trial period ends you are **not automatically switched to a paid account**, and even if you do switch to a paid account, the fees are very reasonable (in my opinion). Go ahead and register.


Estimated usage fee

- Text-to-Speech charges (screenshot of the official pricing table)
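Text-to-Speech is billed per character of input, with a monthly free allowance, and the rate differs between voice types such as WaveNet and Standard. A rough monthly estimate can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the free allowance and rate are inputs you must take from the current pricing page (nothing here is a real price), and `len()` only approximates how the service counts characters.

```python
from typing import Iterable


def count_chars(lines: Iterable[str]) -> int:
    """Approximate character count of a narration script (len() counts code points)."""
    return sum(len(line) for line in lines)


def estimate_monthly_cost(chars_used: int, free_chars: int, usd_per_million: float) -> float:
    """Estimate one month's charge for a single voice type.

    Assumption: characters beyond the monthly free allowance are billed
    linearly at `usd_per_million` per 1,000,000 characters.
    """
    if chars_used < 0 or free_chars < 0 or usd_per_million < 0:
        raise ValueError("inputs must be non-negative")
    billable = max(0, chars_used - free_chars)
    return billable / 1_000_000 * usd_per_million
```

For example, with a hypothetical free allowance of 1,000,000 characters and a hypothetical rate of 16 USD per million, `estimate_monthly_cost(1_500_000, 1_000_000, 16.0)` returns 8.0 (only the 500,000 characters over the allowance are billed).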

The rest of this article assumes you have **registered** with Google Cloud.

Text-to-Speech service activation and authentication file acquisition

Access Google Cloud Platform and log in.

A dialog appears. Select "**New Project**".

Enter a suitable project name (here, Text To Speech 20xxx) and click "**Create**".

You are returned to the dashboard. **Switch to the project you just created.**

Click the menu in the upper left and go to "**APIs and Services**" > "**Dashboard**".

Select **Enable APIs and Services**.

Type "Text to Speech" in the text box.

Select **Cloud Text-to-Speech API**.

Select **Enable**.

To call the service from your own program you need credentials, so select "**Create Credentials**".

You are taken to the "**Add Credentials to Project**" screen. Select "**Cloud Text-to-Speech API**" from the drop-down list, then click "**Required credentials**".

The display changes. Select "**No, not in use**" and click "**Required credentials**" again.

Enter a **service account name** (here, "test"). **Do not select a role.** The "service account ID" is generated automatically. Select **Next**.

A dialog appears. Select "**Create without role**".

A dialog appears, and a **JSON file** containing the credentials is downloaded to your PC.

From here on, assume you have renamed this file to "credentials.json" and placed it in "C:\Users\xxx\Desktop".
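Before wiring the file into a program, it can help to confirm that the downloaded JSON really is a service-account key. The sketch below is a hypothetical helper (not part of any Google library) that uses only the standard library to check the `type` field and a few fields that service-account key files contain, and returns the service account's e-mail address.

```python
import json
from pathlib import Path

# A few of the fields found in a Google Cloud service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(path: str) -> str:
    """Return client_email if `path` looks like a service-account key, else raise ValueError."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict) or data.get("type") != "service_account":
        raise ValueError(f"{path} is not a service-account key file")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"{path} is missing fields: {sorted(missing)}")
    return data["client_email"]
```

Usage: `print(check_service_account_key(r"C:\Users\xxx\Desktop\credentials.json"))`. Note that the key file contains a private key, so keep it out of version control.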

The official Quickstart explains how to register the path of this file in the **environment variable** GOOGLE_APPLICATION_CREDENTIALS and have the program read the credentials through that variable. **Here, instead, the program specifies the file path directly, without registering it in an environment variable.**
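The two approaches differ only in where the program finds the key file. As a minimal sketch, the hypothetical helper below prefers an explicitly given path (this article's approach) and otherwise falls back to GOOGLE_APPLICATION_CREDENTIALS (the Quickstart's approach). `make_client` assumes google-cloud-texttospeech is installed and uses the client's `from_service_account_file` constructor; the library is imported lazily so the path logic works without it.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def resolve_credentials_path(explicit_path: Optional[str] = None) -> Path:
    """An explicit path wins; otherwise read the path from GOOGLE_APPLICATION_CREDENTIALS."""
    candidate = explicit_path or os.environ.get(ENV_VAR)
    if not candidate:
        raise RuntimeError(f"No credentials path given and {ENV_VAR} is not set")
    path = Path(candidate)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path


def make_client(explicit_path: Optional[str] = None):
    """Build a Text-to-Speech client from the resolved key file."""
    # Lazy import: requires `pip install google-cloud-texttospeech`.
    from google.cloud import texttospeech

    key_file = resolve_credentials_path(explicit_path)
    return texttospeech.TextToSpeechClient.from_service_account_file(str(key_file))
```

Usage: `client = make_client(r"C:\Users\xxx\Desktop\credentials.json")`. Passing no argument reproduces the environment-variable route.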

Calling from a C# (.NET Core) program

Start Visual Studio, select [**File**]-[**New**]-[**Project**], and choose "**Visual C#**" - "**Console App (.NET Core)**".

Select [**Tools**]-[**NuGet Package Manager**]-[**Package Manager Console**] from the menu. In the console, enter `Install-Package Google.Cloud.TextToSpeech.V1 -Pre` and run it.

PM> Install-Package Google.Cloud.TextToSpeech.V1 -Pre

Rewrite the contents of Program.cs as follows.

Program.cs


using System;
using System.IO;
using Google.Cloud.TextToSpeech.V1;
using System.Diagnostics;

public class QuickStart {
  public static void Main(string[] args) {

    var credentialsFilePath = @"C:\Users\xxx\Desktop\credentials.json";

    var textToSpeechClientBuilder = new TextToSpeechClientBuilder() {
      CredentialsPath = credentialsFilePath
    };
    var client = textToSpeechClientBuilder.Build();

    //Read-aloud text settings
    SynthesisInput input = new SynthesisInput {
      Text = "The destination is Nihonbashi."
    };

    //Voice type setting
    VoiceSelectionParams voice = new VoiceSelectionParams {
      Name = "ja-JP-Wavenet-D",
      LanguageCode = "ja-JP",
      SsmlGender = SsmlVoiceGender.Neutral
    };

    //Audio output settings
    AudioConfig config = new AudioConfig {
      AudioEncoding = AudioEncoding.Mp3,
      Pitch = -2.0
    };

    // Send the Text-to-Speech request
    var response = client.SynthesizeSpeech(new SynthesizeSpeechRequest {
      Input = input,
      Voice = voice,
      AudioConfig = config
    });

    // Save the Text-to-Speech response (audio file)
    var fileName = DateTime.Now.ToString("yyyy-MM-dd_HHmmss") + ".mp3";
    using (Stream output = File.Create(fileName)) {
      response.AudioContent.WriteTo(output);
      Console.WriteLine($"Audio content saved as '{fileName}'.");
    }

    Console.WriteLine("Open the folder containing the output file? [Y]/n");
    var k = Console.ReadKey();
    if (k.Key != ConsoleKey.N && k.Key != ConsoleKey.Escape) {
      Process.Start("explorer.exe", Directory.GetCurrentDirectory());
    }
  }
}

When you run it, an MP3 file that reads "The destination is Nihonbashi." is generated.

**Speech Synthesis Markup Language** (SSML) is also supported. If you change the input as follows, it reads "The destination is not **Nihonbashi** but **Nipponbashi**." (the `<sub alias='...'>` tag makes the engine read the enclosed text as the alias). You can also insert **pauses** with tags such as `<break time="200ms"/>`.

SSML format


SynthesisInput input = new SynthesisInput {
  Ssml = "<speak>The destination is not Nihonbashi, but <sub alias='Nipponbashi'>Nihonbashi</sub>.</speak>".Replace("'", "\"")
};
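Building SSML by string concatenation breaks as soon as the narration contains characters such as `&` or `<`. A minimal sketch, using hypothetical helper names and Python's standard `xml.sax.saxutils`, that escapes plain text and builds the `<sub>` and `<break>` tags described above. The resulting string would go into `Ssml` in C#, or into `texttospeech.SynthesisInput(ssml=...)` in Python.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(s: str) -> str:
    """Plain text, with &, < and > escaped so it cannot break the SSML."""
    return escape(s)


def sub(written: str, alias: str) -> str:
    """<sub>: read `alias` aloud in place of `written`."""
    return f"<sub alias={quoteattr(alias)}>{escape(written)}</sub>"


def pause(ms: int) -> str:
    """<break>: insert a silent interval of `ms` milliseconds."""
    if ms < 0:
        raise ValueError("pause length must be non-negative")
    return f'<break time="{int(ms)}ms"/>'


def speak(*parts: str) -> str:
    """Wrap already-built fragments in the <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    plain("The destination is not Nihonbashi, "),
    pause(200),
    plain("but "),
    sub("Nihonbashi", "Nipponbashi"),
    plain("."),
)
print(ssml)
```

Keeping escaping inside the helpers means only `speak` ever concatenates raw markup, so user-supplied text cannot accidentally open or close a tag.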

Calling from a Python program

pip install --upgrade google-cloud-texttospeech

python


from datetime import datetime
from pytz import timezone  # third-party: pip install pytz
from google.cloud import texttospeech
from google.oauth2 import service_account

# Load the service-account key directly (path relative to the current directory)
credentials = service_account.Credentials.from_service_account_file('credentials.json')
client = texttospeech.TextToSpeechClient(credentials=credentials)

# Read-aloud text settings
synthesis_input = texttospeech.SynthesisInput(
  text='The destination is Akihabara.')

# Voice type setting
voice = texttospeech.VoiceSelectionParams(
  language_code='ja-JP',
  name='ja-JP-Wavenet-D',
  ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)

# Audio output settings
audio_config = texttospeech.AudioConfig(
  audio_encoding=texttospeech.AudioEncoding.MP3,
  pitch=-2.0)

# Send the request. google-cloud-texttospeech 2.x removed the old
# `types`/`enums` modules and requires keyword arguments here.
response = client.synthesize_speech(
  input=synthesis_input, voice=voice, audio_config=audio_config)

# Save the response as a timestamped MP3 file
now = datetime.now(timezone('Asia/Tokyo'))
filename = now.strftime('%Y-%m-%d_%H%M%S.mp3')
with open(filename, 'wb') as out:
  out.write(response.audio_content)
  print(f'Audio content written to file {filename}')
