An example of real-time translation from English to Japanese in a Zoom meeting.
Zoom easily crosses national borders, but if you can't communicate in English you can't benefit from that, so I put together a simple mechanism to help.
As a rough flow:
Use Soundflower to route Zoom's audio output internally and feed it into Python, which performs real-time speech translation via the Microsoft Azure API. Python sends the translation result over OSC to TouchDesigner, which overlays it as subtitles on the webcam input. TouchDesigner outputs the composite via Syphon Spout Out, and CamTwist exposes it to Zoom as a virtual webcam. It's a bit of a brute-force setup.
Zoom doesn't need to be a Pro account at all.
・ Mac (Catalina) ・ Python 3.7 ・ Microsoft Azure account ・ TouchDesigner ・ Soundflower ・ CamTwist
Download from here
Soundflower https://github.com/mattingalls/Soundflower/releases/tag/2.0b2 (read the release notes carefully)
CamTwist http://camtwiststudio.com/
Once installed, a "Soundflower" item appears in both the input and output sections of the Mac sound menu; set both input and output to Soundflower (2ch). This lets you treat the sound you hear in Zoom as microphone input. On Windows, VoiceMeeter Banana works well for this. On Mac, so far only Soundflower has worked properly for me.
Within Azure, use an API called Cognitive Services. https://azure.microsoft.com/ja-jp/services/cognitive-services/ Register from that page. I'm on the free trial; if you want to use it seriously, it of course costs money.
After registering, make a note of your subscription key and region code.
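Rather than hard-coding the key and region into the sample files, you can also read them from environment variables. A minimal sketch — the variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are my own choice, not anything the SDK requires:

```python
import os

# Hypothetical environment variable names; the fallbacks match the
# placeholder strings in the Azure sample code.
speech_key = os.environ.get("AZURE_SPEECH_KEY", "YourSubscriptionKey")
service_region = os.environ.get("AZURE_SPEECH_REGION", "YourServiceRegion")

print(service_region)
```

This keeps the key out of any code you might later publish.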
The sample code is here: https://github.com/Azure-Samples/cognitive-services-speech-sdk — download it. In the files in the python/console folder, replace "YourSubscriptionKey" and "YourServiceRegion" with your own values.
Rewrite translation_sample.py so that it takes the Mac's audio input and retrieves real-time translation results.
Settings for OSC
# At the beginning of the file
from pythonosc import udp_client
from pythonosc.osc_message_builder import OscMessageBuilder

IP = '~'    # address of the machine running TouchDesigner
PORT = ...  # set appropriately (the port the OSC In node will listen on)
Set the translation target language to Japanese, and add code to send the result to TouchDesigner over OSC.
def translation_continuous():
    """performs continuous speech translation from the default microphone input"""
    # <TranslationContinuous>
    # set up translation parameters: source language and target languages
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region,
        speech_recognition_language='en-US',
        target_languages=('ja', 'fr'), voice_name="de-DE-Hedda")
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    # Creates a translation recognizer using the default microphone as input.
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_config)

    def result_callback(event_type, evt):
        """callback to display a translation result and forward it over OSC"""
        print("{}: {}\n\tTranslations: {}\n\tResult Json: {}".format(
            event_type, evt, evt.result.translations['ja'], evt.result.json))
        client = udp_client.UDPClient(IP, PORT)
        msg = OscMessageBuilder(address='/translation')
        msg.add_arg(evt.result.translations['ja'])
        m = msg.build()
        client.send(m)

    done = False
    # (rest omitted)
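The OSC packet that `OscMessageBuilder` and `UDPClient` produce above has a simple binary layout: a NUL-terminated address, a type tag string, then the arguments, each padded to a 4-byte boundary. As a stdlib-only sketch (following the OSC 1.0 spec, no python-osc required), the same single-string message can be built by hand:

```python
def osc_string(s: str) -> bytes:
    """Encode a string as OSC: UTF-8, NUL-terminated, padded to a 4-byte boundary."""
    b = s.encode("utf-8") + b"\x00"
    b += b"\x00" * (-len(b) % 4)
    return b

def build_osc_message(address: str, text: str) -> bytes:
    """Build an OSC message carrying one string argument (type tag ',s')."""
    return osc_string(address) + osc_string(",s") + osc_string(text)

packet = build_osc_message("/translation", "こんにちは")
print(len(packet))  # 16 (address) + 4 (type tags) + 16 (UTF-8 text) = 36 bytes
```

This is only to show what goes over the wire; in the script itself python-osc does the encoding for you.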
Now, if you run main.py in the console folder from the command line and play an English YouTube video, the translation result should appear in the console like this.
I've only used TouchDesigner a few times, so I'm feeling my way here. This part could probably also be implemented in openFrameworks (oF).
Select the following nodes from the menu and connect them.
・ (TOP) Video Device In: webcam input
・ (TOP) Text: displays the translated subtitles
・ (DAT) OSC In: updates the subtitle text in response to OSC
・ (TOP) Over: composites the webcam video and subtitles
・ (TOP) Syphon Spout Out: outputs via Syphon
By the way, Syphon is an open-source framework for sharing images between applications on macOS.
In the OSC In node, enter the port you chose in Python and rewrite the callback as follows.
def onReceiveOSC(dat, rowIndex, message, bytes, timeStamp, address, args, peer):
    # args holds the parsed OSC arguments; args[0] is the translated string
    op("text2").par.text = args[0]
    return
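One pitfall if you instead try to parse the raw message string with `message.strip("/translation ")`: `str.strip` treats its argument as a *set of characters* to remove from both ends, not a literal prefix, so it can eat letters of the translated text itself. A plain-Python illustration (no TouchDesigner needed):

```python
# str.strip removes any of the listed characters from both ends,
# not the literal prefix string.
message = "/translation nation state"
print(message.strip("/translation "))  # → "e" — almost everything is eaten

# Safer: drop the known address prefix explicitly.
prefix = "/translation "
clean = message[len(prefix):] if message.startswith(prefix) else message
print(clean)  # → "nation state"
```

Using the `args` list that the OSC In callback already provides avoids string parsing entirely.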
You should now see something like this:
Start CamTwist. Select Syphon and you should see a TouchDesigner item. This software converts the output from TouchDesigner into a virtual webcam.
Now start Zoom. CamTwist should appear in Zoom's camera selection; select it and the TouchDesigner screen becomes your video feed.
The accuracy is pretty good. If you swap the languages in the Python code, Japanese-to-English should work right away. None of this is particularly difficult, but it involves a lot of software, hence this note. Please comment if you know a better way.
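For the reverse direction, only the language parameters passed to `SpeechTranslationConfig` change; as a sketch, the values would become (locale codes per the Azure speech service):

```python
# Japanese -> English: only the language parameters change.
# These mirror the keyword arguments used in translation_sample.py above.
reverse_config = {
    "speech_recognition_language": "ja-JP",  # recognize Japanese speech
    "target_languages": ("en",),             # translate into English
}
print(reverse_config["target_languages"])
```

Everything else — the OSC sending and the TouchDesigner patch — stays the same.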