

Text-to-Speech Demo

Below, we provide TTS samples grouped by dataset and TTS scenario.
  • 1. Multi-speaker (VCTK) - Expressive TTS
  • 2. Emotional multi-speaker (ESD) - Expressive TTS
  • 3. Single speaker (LJSpeech) - General TTS
  • 4. Multi-speaker (VCTK) - General TTS
  • 5. Additional samples

1. Multi-speaker (VCTK) - Expressive TTS

1.1) Seen speaker scenarios:

Text: The game is now open.

Reference Speaker: p279

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: He also insisted that no concessions had been made to the French.

Reference Speaker: p300

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

1.2) Unseen speaker scenarios:

Text: It was like a scene from the Holocaust.

Reference Speaker: p271

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: He said no survivors had been found.

Reference Speaker: p336

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

2. Emotional multi-speaker (ESD) - Expressive TTS

2.1) Seen speaker scenarios:

Text: I suppose no, it doesn't!

Reference Speaker & Emotion: 0012_Surprise

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: Over them swooped the eagles.

Reference Speaker & Emotion: 0016_Sad

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: I don't paint a tiger.

Reference Speaker & Emotion: 0015_Angry

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

2.2) Unseen speaker scenarios:

Text: Mum shuts one's mouth up, doesn't it.

Reference Speaker & Emotion: 0018_Neutral

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: Clear are your eyes and bright your breath!

Reference Speaker & Emotion: 0011_Angry

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

Text: Fur flew through the air, teeth gnashed.

Reference Speaker & Emotion: 0018_Happy

Reference DEX-TTS MetaStyleSpeech StyleTTS YourTTS GenerSpeech

3. Single speaker (LJSpeech) - General TTS

Text: GeDEX TTS is the general TTS version of the diffusion-based expressive TTS model.

Text: To inquire into and report upon the several jails and houses of correction in the counties, cities, and corporate towns within England and Wales.

GT GeDEX-TTS FastSpeech2 Grad-TTS ComoSpeech

Text: Since there was no background to the New Orleans FPCC, quote, organization, end quote, which consisted solely of Oswald.

GT GeDEX-TTS FastSpeech2 Grad-TTS ComoSpeech

4. Multi-speaker (VCTK) - General TTS

Text: They have now been banned from Celtic Park for life.

Speaker: p273

GT GeDEX-TTS FastSpeech2 Grad-TTS ComoSpeech

Text: He will be a very hard act to follow.

Speaker: p280

GT GeDEX-TTS FastSpeech2 Grad-TTS ComoSpeech

5. Additional samples

5.1) Unseen emotion scenarios: The following samples were synthesized by DEX-TTS from references whose emotion category was not seen during training.

Text: And his heart wagged with joy like a lamb's tail.

Reference Speaker & Emotion: 0017_Sad

Reference DEX-TTS

Text: From August eighteenth, of their divorce.

Reference Speaker & Emotion: 0014_Surprise

Reference DEX-TTS

Text: A tick a tack too.

Reference Speaker & Emotion: 0012_Angry

Reference DEX-TTS

Text: She has a high voice.

Reference Speaker & Emotion: 0013_Happy

Reference DEX-TTS

Text: Enough, you a foolish chatter.

Reference Speaker & Emotion: 0019_Surprise

Reference DEX-TTS

5.2) Samples for the paper: We provide samples corresponding to the visualization results in the paper.

Text: Goat Billy asked the old Chinese guy.

Reference Speaker & Emotion: 0012_Happy

Reference DEX-TTS

Text: He dreamt them all night

Reference Speaker & Emotion: 0012_Neutral

Reference DEX-TTS

5.3) Samples by NFE: We provide samples synthesized with different numbers of function evaluations (NFE = 10, 25, 50) for the VCTK and LJSpeech datasets.
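
For readers unfamiliar with the term, NFE counts how many times the sampler calls the model while integrating from noise to the output. The sketch below is only an illustration under stated assumptions: it uses a plain Euler solver and a toy drift function (`toy_drift`, hypothetical) in place of a learned network, and it is not the DEX-TTS sampler. It shows the general trade-off that more evaluations cost more compute and usually give a result closer to the exact solution.

```python
import math


def toy_drift(x, t):
    # Hypothetical stand-in for a learned network: dx/dt = x.
    # A real diffusion-based TTS model would evaluate a neural network here.
    return x


def euler_sample(drift, x_init, nfe):
    """Integrate dx/dt = drift(x, t) from t=1 down to t=0 with `nfe` Euler steps.

    Returns the final value and the number of drift evaluations performed.
    """
    x = x_init
    step = 1.0 / nfe
    calls = 0
    for i in range(nfe):
        t = 1.0 - i * step
        x = x - step * drift(x, t)  # one function evaluation per step
        calls += 1
    return x, calls


# Exact solution of the toy problem at t=0 when x(1) = 1.
exact = math.exp(-1.0)

# Absolute error of the Euler result for each NFE setting used on this page.
errors = {}
for nfe in (10, 25, 50):
    x_final, calls = euler_sample(toy_drift, 1.0, nfe)
    errors[nfe] = abs(x_final - exact)
    print(f"NFE={calls:2d}  result={x_final:.4f}  abs_error={errors[nfe]:.4f}")
```

In this toy setting the error shrinks as NFE grows from 10 to 50, which mirrors why the samples below are offered at several NFE settings.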

Text: There were no proposals on the table.

Reference Speaker: p300

Reference DEX-TTS-10 DEX-TTS-25 DEX-TTS-50

Text: Paying particular attention to the crowd for any unusual activity.


Thanks for your interest!