Converting text data into synthetic speech is called Text-to-Speech (TTS). I did not actually train a TTS model this time; instead, this post records a failure story: when Japanese text is given as input, the `transliteration_cleaners` pipeline used by Tacotron2-family models cannot convert it to romaji properly.
NVIDIA publishes several TTS repositories. This time I tried flowtron, but the unidecode failure on Japanese input appears to be common to the others as well: https://github.com/NVIDIA/flowtron https://github.com/NVIDIA/mellotron https://github.com/NVIDIA/tacotron2 (Incidentally, I don't know the differences between them in detail.)
I haven't prepared training data or trained a model, but to train on your own data, you create a file list yourself as shown below.
train.py

```python
...
data_config['training_files'] = 'filelists/train_filelist.txt'
data_config['validation_files'] = 'filelists/validation_filelist.txt'
data_config['text_cleaners'] = ['transliteration_cleaners']
train(n_gpus, rank, **train_config)
```
Each line of the file list appears to need the audio file path, the transcript text, and a speaker ID. When the training data mixes multiple speakers, the speaker IDs presumably must not collide (I think).
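As a minimal sketch of building such a filelist (the `path|text|speaker_id` layout follows the description above; the file names, transcripts, and speaker IDs here are invented for illustration):

```python
import os

# Hypothetical filelist writer. The "path|text|speaker_id" layout matches
# what the post describes; the entries themselves are made up.
entries = [
    ("wavs/speaker0_001.wav", "tesutodesu.", 0),
    ("wavs/speaker1_001.wav", "toukyou tokkyo kyoka kyoku", 1),
]
os.makedirs("filelists", exist_ok=True)
with open("filelists/train_filelist.txt", "w", encoding="utf-8") as f:
    for path, text, speaker_id in entries:
        f.write(f"{path}|{text}|{speaker_id}\n")
```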
data.py

```python
def load_filepaths_and_text(filename, split="|"):
    with open(filename, encoding='cp932') as f:  # changed encoding to cp932 (for Windows)
        ...

def get_text(self, text):
    print(text)  # added
    text = _clean_text(text, self.text_cleaners)
    print(text)  # added
```
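For completeness, here is a self-contained version of `load_filepaths_and_text` with the elided body filled in; the parsing logic follows the standard Tacotron2 implementation, so treat it as a sketch:

```python
# Sketch of load_filepaths_and_text with the body filled in,
# based on the standard Tacotron2 implementation.
def load_filepaths_and_text(filename, split="|"):
    with open(filename, encoding='cp932') as f:  # cp932 for Windows, per the change above
        filepaths_and_text = [line.strip().split(split) for line in f]
    return filepaths_and_text
```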
text/cleaners.py

```python
def transliteration_cleaners(text):
    '''Pipeline for non-English text that transliterates to ASCII.'''
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text
```
So if you want to read Japanese instead of English, you may need to change the file encoding to cp932 and set the cleaners to `['transliteration_cleaners']`. But the docstring says "Pipeline for non-English text that transliterates to ASCII", so for a moment I wondered whether this is really appropriate for Japanese input.
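To see concretely what the three steps do, here is a self-contained re-creation of the pipeline; `lowercase` and `collapse_whitespace` are simple helpers in the same file, re-implemented inline here (as assumptions) so the snippet runs on its own:

```python
import re
from unidecode import unidecode

def transliteration_cleaners(text):
    '''Pipeline for non-English text that transliterates to ASCII.'''
    text = unidecode(text)             # convert_to_ascii
    text = text.lower()                # lowercase
    text = re.sub(r'\s+', ' ', text)   # collapse_whitespace
    return text

print(transliteration_cleaners('Héllo   Wörld'))  # -> hello world
```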
This is the output of the print() statements added to get_text. I could confirm that hiragana and katakana (「テストです」) were converted successfully. On the other hand, kanji were converted to Chinese readings.
```
Epoch: 0
テストです。
tesutodesu.
東京特許許可局
dong jing te xu xu ke ju
マイクのテスト中。
maikunotesutozhong .
```
In the first place, a library called unidecode is used for the conversion from Japanese (Unicode) to ASCII.
```python
from unidecode import unidecode

def convert_to_ascii(text):
    return unidecode(text)
```
I tried some conversions with unidecode.
```python
# coding: cp932
from unidecode import unidecode

tests = ['あいうえお', 'ぁぃぅぇぉ',
         '相性', '相談',
         'こうてい', 'コウテイ', 'こおてい', 'こてい', 'こ〜てい', 'こーてい',
         'キャット', 'キヤツト',
         'かんい', 'かに']
for text1 in tests:
    text2 = unidecode(text1)
    print(text1)
    print(text2)
```
```
あいうえお
aiueo
ぁぃぅぇぉ
aiueo
相性
Xiang Xing
相談
Xiang Tan
こうてい
koutei
コウテイ
koutei
こおてい
kootei
こてい
kotei
こ〜てい
ko~tei
こーてい
ko-tei
キャット
kiyatsuto
キヤツト
kiyatsuto
かんい
kani
かに
kani
```
There are many problems:

- Kanji are converted to Chinese (pinyin) readings.
- The small kana 「ぁぃぅぇぉゃゅょっ」 are converted the same as 「あいうえおやゆよつ」.
- The long-vowel mark 「ー」 is not recognized.
- 「かんい」 and 「かに」 both convert to the same "kani".
Therefore, unidecode is simply not suitable for Japanese conversion in the first place.
Using pykakasi instead gives the following. The incomplete conversions from unidecode are improved. Also, `.setMode('s', True)` automatically inserts a space between words.
```python
# coding: cp932
from pykakasi import kakasi

kakasi = kakasi()
kakasi.setMode('H', 'a')   # hiragana -> romaji
kakasi.setMode('K', 'a')   # katakana -> romaji
kakasi.setMode('J', 'a')   # kanji -> romaji
kakasi.setMode('E', 'a')   # full-width symbols -> ASCII
kakasi.setMode('s', True)  # insert a space between words
conv = kakasi.getConverter()

for text in ['あいうえお、とぁぃぅぇぉ。',
             '相性と相談',
             'キャットとキヤツト',
             'ファイルとフアイル',
             'こうていとコウテイとこおていとこーていとこ〜てい',
             '東京特許許可局',
             '簡易とかに']:
    print(conv.do(text))
```
```
aiueo, to aiueo.
aishou to soudan
kyatto to kiyatsuto
fairu to fuairu
koutei to koutei to kootei to kootei to ko ~ tei
toukyou tokkyo kyoka kyoku
kan'i to kani
```
Or should OpenJTalk (pyopenjtalk) be installed? In that case, the text seems to be decomposed not into words but into phonemes. I don't know which is better, splitting by word or by phoneme (it probably depends on the model).
```python
import pyopenjtalk
print(pyopenjtalk.g2p("こんにちは"))
```

```
k o N n i ch i w a
```
In short, the unidecode step used by the Tacotron2 family is not suited to Japanese input, and it is a mistake to use **transliteration_cleaners** for Japanese. So if you want to use Japanese training data, you should write your own japanese_cleaners in text/cleaners.py (or prepare training data already converted to romaji in advance).