Playing speech generated by Amazon Polly in a Discord voice channel

Introduction

This article is the entry for day 20 of the Discord Advent Calendar 2017. I wanted a customizable text-to-speech bot that runs on Discord, so this is the story of building one with Amazon Polly.

Intended behavior

1. Stay in the General VoiceChannel at all times. Joining and leaving every time someone posts would be noisy for users who have notifications turned on.
2. Catch the content posted to a TextChannel.
3. Convert the posted content to audio with Polly.
4. Play the audio produced by Polly.

Main tools to use

- Java
- Discord4j
- Amazon Polly

Starting the bot

First, implement the entry-point class, based on Discord4j's sample code.

Main.java


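import sx.blah.discord.api.ClientBuilder;
import sx.blah.discord.api.IDiscordClient;
import sx.blah.discord.api.events.EventDispatcher;

public class Main {
	// Replace with your bot's token (keep the real value out of version control)
	private static final String TOKEN = "BOT_TOKEN";

	public static void main(final String[] args) {
		// Log in, then register the listener that handles ReadyEvent and MessageReceivedEvent
		final IDiscordClient client = Main.createClient(TOKEN, true);
		final EventDispatcher dispatcher = client.getDispatcher();
		dispatcher.registerListener(new Listener());
	}

	// Builds a client from the token; logs in immediately when login is true
	public static IDiscordClient createClient(final String token, final boolean login) {
		final ClientBuilder clientBuilder = new ClientBuilder().withToken(token);
		if (login) {
			return clientBuilder.login();
		}
		return clientBuilder.build();
	}
}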

Implementing the Listener

In Discord4j, you register listeners with the EventDispatcher to handle various events. There are two ways to implement a listener (the other is implementing the IListener interface); this time we use the EventSubscriber annotation.
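Annotation-based subscription of this kind generally relies on reflection: the dispatcher scans a registered object for annotated methods and calls the ones whose parameter type matches the event. The toy sketch below illustrates that idea in plain Java. Subscribe, ToyDispatcher and RecordingListener are hypothetical names, and this is not Discord4j's actual EventDispatcher implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for @EventSubscriber; RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Subscribe {}

// Toy dispatcher illustrating the general idea only (not Discord4j's code).
class ToyDispatcher {
    private final List<Object> listeners = new ArrayList<>();

    void registerListener(Object listener) {
        listeners.add(listener);
    }

    // Calls every public @Subscribe method whose single parameter type accepts the event.
    void dispatch(Object event) {
        for (Object listener : listeners) {
            for (Method method : listener.getClass().getMethods()) {
                if (!method.isAnnotationPresent(Subscribe.class)) {
                    continue;
                }
                Class<?>[] params = method.getParameterTypes();
                if (params.length == 1 && params[0].isInstance(event)) {
                    try {
                        method.invoke(listener, event);
                    } catch (IllegalAccessException | InvocationTargetException e) {
                        throw new IllegalStateException("listener failed", e);
                    }
                }
            }
        }
    }
}

// Example listener: the parameter type decides which events each handler receives.
class RecordingListener {
    final List<String> seen = new ArrayList<>();

    @Subscribe
    public void onText(String text) {
        seen.add("text:" + text);
    }

    @Subscribe
    public void onNumber(Integer number) {
        seen.add("number:" + number);
    }
}
```

Registering a RecordingListener and dispatching "hello" and then 42 calls onText and then onNumber; an event with no matching handler, such as a Double, is simply ignored.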

Listener.java


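import java.io.IOException;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.UnsupportedAudioFileException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.amazonaws.services.polly.model.OutputFormat;
import sx.blah.discord.api.IDiscordClient;
import sx.blah.discord.api.events.EventSubscriber;
import sx.blah.discord.handle.audio.IAudioManager;
import sx.blah.discord.handle.impl.events.ReadyEvent;
import sx.blah.discord.handle.impl.events.guild.channel.message.MessageReceivedEvent;
import sx.blah.discord.handle.obj.IGuild;
import sx.blah.discord.handle.obj.IMessage;
import sx.blah.discord.handle.obj.IVoiceChannel;
import sx.blah.discord.util.audio.providers.AudioInputStreamProvider;

public class Listener {
	private static final Logger log = LoggerFactory.getLogger(Listener.class);
	private final Synthesizer polly = new Synthesizer();
	// Check and set these IDs in advance
	private static final long GUILD_ID = 0L;
	private static final long VOICE_CHANNEL_ID = 0L;

	// Join the voice channel where the audio will be played
	@EventSubscriber
	public void onReadyEvent(final ReadyEvent event) {
		final IDiscordClient client = event.getClient();
		final IGuild guild = client.getGuildByID(GUILD_ID);
		final IVoiceChannel voiceChannel = guild.getVoiceChannelByID(VOICE_CHANNEL_ID);
		voiceChannel.join();
	}

	// Called for every post in a channel the bot can see
	@EventSubscriber
	public void onMessageReceivedEvent(final MessageReceivedEvent event) {
		try {
			final IMessage message = event.getMessage();
			final IGuild guild = message.getGuild();
			// Direct messages have no guild; posts on other servers are ignored as well
			if (guild == null || guild.getLongID() != Listener.GUILD_ID) {
				log.info("Ignoring an event from outside the server whose voice channel the bot joined");
				return;
			}
			final String content = message.getContent();
			// Strip <...> tags (custom emoji, mentions); [^>]+ stops at the first '>',
			// so text between two tags is kept
			final String speechContent = content.replaceAll("<[^>]+>", "").trim();
			if (speechContent.isEmpty()) {
				return;
			}
			final IAudioManager audioManager = guild.getAudioManager();
			final AudioInputStream input = this.polly.synthesize(speechContent, OutputFormat.Mp3);
			final AudioInputStreamProvider provider = new AudioInputStreamProvider(input);
			// Replaces whatever provider is currently set for this guild
			audioManager.setAudioProvider(provider);
			log.info("Playing content: {}", speechContent);
		} catch (IOException | UnsupportedAudioFileException e) {
			log.warn("Exception occurred while processing MessageReceivedEvent", e);
		}
	}
}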

You can send audio from the bot to a VoiceChannel by registering an IAudioProvider with the IAudioManager obtained from the IGuild object. Several classes implement IAudioProvider; here we use AudioInputStreamProvider, which works directly with the AudioInputStream class from the Java standard library.
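Because onMessageReceivedEvent calls setAudioProvider for every post, each new message takes over from the provider registered for the previous one. If you would rather have posts read out one after another, one option is to queue the clips yourself. The following is a minimal sketch in plain Java under that assumption; SpeechQueue is a hypothetical class, not part of Discord4j, and feeding its bytes to Discord (for example from your own IAudioProvider implementation) is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: hands out the bytes of queued clips in order, so that a new
// post waits for the previous clip instead of replacing it.
class SpeechQueue {
    private final Deque<InputStream> pending = new ArrayDeque<>();
    private InputStream current;

    // Adds a clip (e.g. the audio synthesized for one post) to the end of the queue.
    synchronized void enqueue(InputStream clip) {
        pending.addLast(clip);
    }

    // Reads up to buf.length bytes from the current clip, moving on to the next
    // clip when the current one is exhausted. Returns -1 when nothing is left.
    synchronized int read(byte[] buf) {
        if (buf.length == 0) {
            return 0;
        }
        try {
            while (true) {
                if (current == null) {
                    current = pending.pollFirst();
                    if (current == null) {
                        return -1; // queue is empty
                    }
                }
                int n = current.read(buf, 0, buf.length);
                if (n >= 0) {
                    return n;
                }
                current.close(); // clip finished: switch to the next one
                current = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 2-byte clip and a 1-byte clip queued, successive reads return 2, then 1, then -1.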

Points to note

Joining the server's voice channel

In Discord, a guild is what users see as a server. In the code above, when ReadyEvent fires, the bot looks up the guild by GUILD_ID and joins the voice channel identified by VOICE_CHANNEL_ID. Roughly speaking, it is fine to think of ReadyEvent as firing once the bot has logged in and is ready to work. For more information on ReadyEvent, see Discord's official documentation (https://discordapp.com/developers/docs/topics/gateway#events).

Restricting which events are processed

A bot can be added to more than one server. Since onMessageReceivedEvent reacts to every post visible to the bot user, the code above restricts processing to the target server via GUILD_ID. On servers with busy text channels you should also restrict which TextChannel is read aloud, but that is omitted here.
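If you also want to limit the TextChannel, the check can be pulled out into a small predicate. A minimal sketch, assuming the guild and channel IDs are passed in as plain longs so the logic can be tested without Discord4j; MessageFilter and ANY_CHANNEL are hypothetical names. A null guild ID stands for a direct message, which has no guild.

```java
// Hypothetical filter deciding whether a post should be read aloud.
final class MessageFilter {
    // Sentinel meaning "accept every text channel in the guild"
    static final long ANY_CHANNEL = 0L;

    private final long guildId;
    private final long textChannelId;

    MessageFilter(long guildId, long textChannelId) {
        this.guildId = guildId;
        this.textChannelId = textChannelId;
    }

    // messageGuildId is null for direct messages, which are always rejected.
    boolean accepts(Long messageGuildId, long messageChannelId) {
        if (messageGuildId == null || messageGuildId.longValue() != guildId) {
            return false;
        }
        return textChannelId == ANY_CHANNEL || messageChannelId == textChannelId;
    }
}
```

In the listener, the arguments would presumably come from the message, for example `guild == null ? null : guild.getLongID()` and `message.getChannel().getLongID()`, assuming the Discord4j 2.x getLongID accessors used elsewhere in this article.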

Handling emoji

The getContent method of the IMessage class returns the content posted to the text channel as a string. When a custom (server) emoji is posted, the content contains a string of the form <:emoji alias:emoji ID>, and reading that aloud means the long numeric emoji ID gets read out as well. It's a little rough, but I cut these out with a regular expression that removes everything in angle brackets. Note that this also removes mentions such as <@user ID>, which use the same bracket syntax; standard Unicode emoji arrive as ordinary characters and are not affected.
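Two details are worth keeping in mind when writing the pattern. If it is written as `<.+>`, the greedy `.+` runs from the first `<` to the last `>` on the line, so in a post like "<:wave:1> hello <:smile:2>" the word in between disappears too. And if you want to keep mentions and drop only custom emoji, a narrower pattern is possible. A minimal sketch in plain Java; EmojiStripper is a hypothetical name, and the pattern assumes the <:name:id> and <a:name:id> forms (the a: prefix marks animated emoji).

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing two ways of stripping emoji tags from message content.
final class EmojiStripper {
    private EmojiStripper() {}

    // Custom emoji: <:name:id>; animated custom emoji: <a:name:id>
    private static final Pattern CUSTOM_EMOJI = Pattern.compile("<a?:\\w+:\\d+>");

    // Greedy variant: ".+" matches as much as possible, from the first '<' to the last '>'
    private static final Pattern GREEDY = Pattern.compile("<.+>");

    // Removes only custom emoji, so mentions such as <@id> are left untouched
    static String strip(String content) {
        return CUSTOM_EMOJI.matcher(content).replaceAll("").trim();
    }

    static String stripGreedy(String content) {
        return GREEDY.matcher(content).replaceAll("");
    }
}
```

For "<:wave:123456> hello <:smile:654321>", strip returns "hello", whereas the greedy pattern removes the whole line.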

Generate audio with Polly

Synthesizer.java


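import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.List;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.Voice;

public class Synthesizer {
	private final AmazonPolly pollyClient;
	private final String languageCode = "ja-JP";
	private final Voice voice;
	private final Regions regions = Regions.AP_NORTHEAST_1;
	// Voice to use: "Takumi" (male) or "Mizuki" (female)
	private final static String VOICE_ID = "Takumi";
	// Placeholders: avoid committing real keys to source control
	private final static String ACCESS_KEY = "ACCESS_KEY";
	private final static String SECRET_KEY = "SECRET_KEY";

	public Synthesizer() {
		final BasicAWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
		this.pollyClient = AmazonPollyClientBuilder.standard()
				.withCredentials(new AWSStaticCredentialsProvider(credentials)).withRegion(this.regions).build();
		// List the voices available for Japanese
		final DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest()
				.withLanguageCode(this.languageCode);
		final DescribeVoicesResult describeVoicesResult = this.pollyClient.describeVoices(describeVoicesRequest);
		final List<Voice> voices = describeVoicesResult.getVoices();
		// Select by ID instead of relying on list order; fall back to the first voice
		this.voice = voices.stream().filter(v -> VOICE_ID.equals(v.getId()))
				.findFirst().orElse(voices.get(0));
	}

	public AudioInputStream synthesize(final String text, final OutputFormat format)
			throws UnsupportedAudioFileException, IOException {
		final SynthesizeSpeechRequest synthReq = new SynthesizeSpeechRequest().withText(text)
				.withVoiceId(this.voice.getId()).withOutputFormat(format);
		final SynthesizeSpeechResult synthRes = this.pollyClient.synthesizeSpeech(synthReq);
		// AudioSystem may need mark/reset to detect the format, so buffer the network stream.
		// Decoding Mp3 also requires an MP3 service provider (e.g. mp3spi) on the classpath.
		return AudioSystem.getAudioInputStream(new BufferedInputStream(synthRes.getAudioStream()));
	}
}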

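This is implemented with reference to Polly's sample code. Polly's SDK returns only a plain InputStream, which is converted to an AudioInputStream using the standard Java AudioSystem class. Note that the JDK has no built-in MP3 decoder, so AudioSystem can only read the Mp3 output when an MP3 service provider (for example mp3spi) is on the classpath.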
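Since the JDK itself cannot decode MP3, an alternative is to request raw PCM (OutputFormat.Pcm) from Polly. Raw PCM has no header, so AudioSystem.getAudioInputStream(InputStream) cannot identify it; instead you declare the format yourself. A minimal sketch, assuming Polly's documented PCM format of signed 16-bit little-endian mono, at 16000 Hz unless another rate is requested through the request's SampleRate parameter; PcmStreams.wrapPollyPcm is a hypothetical helper. Whether Discord4j's AudioInputStreamProvider accepts this format without further conversion is an assumption you would need to check.

```java
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper for OutputFormat.Pcm responses.
final class PcmStreams {
    private PcmStreams() {}

    // Wraps Polly's headerless PCM (signed 16-bit, mono, little-endian) in an
    // AudioInputStream. The length of a network stream is unknown, hence NOT_SPECIFIED.
    static AudioInputStream wrapPollyPcm(InputStream raw, float sampleRate) {
        AudioFormat format = new AudioFormat(
                sampleRate, // e.g. 16000f, Polly's default PCM sample rate
                16,         // bits per sample
                1,          // mono
                true,       // signed
                false);     // little-endian
        return new AudioInputStream(raw, format, AudioSystem.NOT_SPECIFIED);
    }
}
```

Inside synthesize, this would take the place of the AudioSystem call, for example `PcmStreams.wrapPollyPcm(synthRes.getAudioStream(), 16000f)` for a request made with OutputFormat.Pcm.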

Points to note

Polly's free tier covers 5 million characters per month for the first 12 months after your first request. At the time of writing, Polly offers two Japanese voices: one male (Takumi) and one female (Mizuki).
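To get a feel for how far the free tier goes, the arithmetic is simple: 5,000,000 characters over a 30-day month is about 166,666 characters per day, or roughly 8,333 posts per day at an average of 20 characters per post. The sketch below tracks usage against that allowance. FreeTierTracker is hypothetical, the 20-character average is only an illustrative assumption, and counting with String.length() (UTF-16 code units) only approximates how AWS counts billed characters.

```java
// Hypothetical usage tracker for Polly's free tier
// (5 million characters per month for the first 12 months).
final class FreeTierTracker {
    static final long MONTHLY_FREE_CHARACTERS = 5_000_000L;

    private long usedThisMonth;

    // Records the characters sent to Polly for one post.
    // String.length() counts UTF-16 code units, an approximation of AWS's count.
    void record(String text) {
        usedThisMonth += text.length();
    }

    // Characters left in this month's free allowance (never negative).
    long remaining() {
        return Math.max(0L, MONTHLY_FREE_CHARACTERS - usedThisMonth);
    }

    // Call at the start of each billing month.
    void resetForNewMonth() {
        usedThisMonth = 0L;
    }

    // Posts per day the allowance covers, for a given average post length.
    static long postsPerDay(int averageCharsPerPost, int daysInMonth) {
        return MONTHLY_FREE_CHARACTERS / daysInMonth / averageCharsPerPost;
    }
}
```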

Closing thoughts

My impression is that Discord4j lets you do just about anything you want. Next, I would like to let each user on the server choose their own Polly voice settings, so that conversations are easier to follow by ear.
