seedhartha

Speech Synthesis using NVIDIA Flowtron

Hello,

I recently discovered this video, which got me interested in the field of speech synthesis. After researching the subject, it turns out that today's technology is quite capable of producing semi-realistic speech, which I believe is more than good enough for modders to do voice-overs. I was able to replicate the results shown in the video and, while at it, wrote some Python scripts and a rough guide on how to train a model on Bastila's (or any other character's) voice. Here's the link:

https://github.com/seedhartha/reone/wiki/TTS-Research

Training an AI model is a tedious and error-prone process, but this step can be avoided altogether if you use a pre-trained model. I'm not sure I want to make my models public due to legal concerns, but contact me if you're interested in the topic (e.g. you need a voice-over for your mod), and we can work something out.


I believe @lachjames is also working on speech synthesis. Perhaps you should discuss the matter in a PM?

At the very least, you can ask him to share all of the files that I sent him so you have extra material to train your models.

5 minutes ago, Sith Holocron said:

I believe @lachjames is also working on speech synthesis. Perhaps you should discuss the matter in a PM?

At the very least, you can ask him to share all of the files that I sent him so you have extra material to train your models.

Yes, @lachjames and I are in close contact on the matter. I'm curious, what are these extra materials you're talking about? My primary sources are TLK and DLG files, and I have sufficient tooling to extract this data.
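For anyone wondering what happens with the extracted lines afterwards: each transcript has to be paired with its audio clip in a training filelist. Below is a minimal sketch of that step, not the actual scripts from the wiki. It assumes the pipe-separated `audio_path|text|speaker_id` layout used by the example filelists in the public Flowtron repository, and the function name, clip names and `wavs` directory are all made up for illustration.

```python
from pathlib import Path


def build_filelist(transcripts, audio_dir, speaker_id):
    """Return 'audio_path|text|speaker_id' lines, one per usable clip.

    `transcripts` maps a clip name (hypothetical, e.g. a sound resource
    name without extension) to the text spoken in that clip.
    """
    lines = []
    for clip_name, text in sorted(transcripts.items()):
        wav_path = Path(audio_dir) / f"{clip_name}.wav"
        # Collapse runs of whitespace so each entry stays on one line.
        cleaned = " ".join(text.split())
        # Skip empty lines and text containing the field separator.
        if not cleaned or "|" in cleaned:
            continue
        lines.append(f"{wav_path.as_posix()}|{cleaned}|{speaker_id}")
    return lines


if __name__ == "__main__":
    # Illustrative data only; real text would come from the TLK/DLG extraction.
    demo = {"clip_002": "  Hello   there. ", "clip_001": "First line."}
    print("\n".join(build_filelist(demo, "wavs", speaker_id=0)))
```

The sketch does not check that the audio files exist or resample them, so treat it as a starting point only.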


To give you an idea, here's what I've managed so far. This really isn't the best example, and @lachjames has gotten a lot further than that. Both models were trained for 10,000 iterations (around 1 hour of fine-tuning, after which the validation loss plateaus) using the method I described on my wiki page.
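In case "the validation loss plateaus" sounds vague, here is one simple way to make that stopping point concrete. This is a generic sketch and not part of Flowtron or of the wiki scripts: the function name, `patience` and `min_delta` are assumptions, and the loss values would be whatever your training run logs at each validation step.

```python
def has_plateaued(val_losses, patience=3, min_delta=1e-3):
    """Return True if the last `patience` validation checks failed to
    improve on the best earlier loss by at least `min_delta`."""
    if len(val_losses) <= patience:
        # Not enough history to compare against yet.
        return False
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    history = [1.0, 0.5, 0.3, 0.3, 0.3005, 0.2998]
    print(has_plateaued(history))
```

A larger `patience` makes the check more tolerant of noisy validation runs, at the cost of training a bit longer than strictly needed.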

Just to make it perfectly clear, this is not the project I'm actively working on, but rather a discovery I wanted to share, and a guide on how to get similar results.

 

atton_as_canderous.mp3 bastila_as_kreia.mp3

Guest Qui-Gon Glenn

My phone is somehow missing the codec required to listen to this. Regardless, interesting this is.
