Abstract
It will be edited.
Text-to-Speech Demo
Below, we provide TTS samples grouped by dataset and TTS scenario.
- 1. Multi-speaker (VCTK) - Expressive TTS
- 2. Emotional multi-speaker (ESD) - Expressive TTS
- 3. Single speaker (LJSpeech) - General TTS
- 4. Multi-speaker (VCTK) - General TTS
- 5. Additional samples
1. Multi-speaker (VCTK) - Expressive TTS
1.1) Seen speaker scenarios:
Text: The game is now open.
Reference Speaker: p279
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: He also insisted that no concessions had been made to the French.
Reference Speaker: p300
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
1.2) Unseen speaker scenarios:
Text: It was like a scene from the Holocaust.
Reference Speaker: p271
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: He said no survivors had been found.
Reference Speaker: p336
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
2. Emotional multi-speaker (ESD) - Expressive TTS
2.1) Seen speaker scenarios:
Text: I suppose no, it doesn't!
Reference Speaker & Emotion: 0012_Surprise
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: Over them swooped the eagles.
Reference Speaker & Emotion: 0016_Sad
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: I don't paint a tiger.
Reference Speaker & Emotion: 0015_Angry
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
2.2) Unseen speaker scenarios:
Text: Mum shuts one's mouth up, doesn't it.
Reference Speaker & Emotion: 0018_Neutral
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: Clear are your eyes and bright your breath!
Reference Speaker & Emotion: 0011_Angry
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
Text: Fur flew through the air, teeth gnashed.
Reference Speaker & Emotion: 0018_Happy
Reference | DEX-TTS | MetaStyleSpeech | StyleTTS | YourTTS | GenerSpeech |
---|---|---|---|---|---|
3. Single speaker (LJSpeech) - General TTS
Text: GeDEX-TTS is the general TTS version of the diffusion-based expressive TTS model.
Text: To inquire into and report upon the several jails and houses of correction in the counties, cities, and corporate towns within England and Wales.
GT | GeDEX-TTS | FastSpeech2 | Grad-TTS | ComoSpeech |
---|---|---|---|---|
Text: Since there was no background to the New Orleans FPCC, quote, organization, end quote, which consisted solely of Oswald.
GT | GeDEX-TTS | FastSpeech2 | Grad-TTS | ComoSpeech |
---|---|---|---|---|
4. Multi-speaker (VCTK) - General TTS
Text: They have now been banned from Celtic Park for life.
Speaker: p273
GT | GeDEX-TTS | FastSpeech2 | Grad-TTS | ComoSpeech |
---|---|---|---|---|
Text: He will be a very hard act to follow.
Speaker: p280
GT | GeDEX-TTS | FastSpeech2 | Grad-TTS | ComoSpeech |
---|---|---|---|---|
5. Additional samples
5.1) Unseen emotion scenarios: The following samples are synthesized by DEX-TTS from references whose emotion category was not seen during training.
Text: And his heart wagged with joy like a lamb's tail.
Reference Speaker & Emotion: 0017_Sad
Reference | DEX-TTS |
---|---|
Text: From August eighteenth, of their divorce.
Reference Speaker & Emotion: 0014_Surprise
Reference | DEX-TTS |
---|---|
Text: A tick a tack too.
Reference Speaker & Emotion: 0012_Angry
Reference | DEX-TTS |
---|---|
Text: She has a high voice.
Reference Speaker & Emotion: 0013_Happy
Reference | DEX-TTS |
---|---|
Text: Enough, you a foolish chatter.
Reference Speaker & Emotion: 0019_Surprise
Reference | DEX-TTS |
---|---|
5.2) Samples for the paper: We provide samples corresponding to the visualization results in the paper.
Text: Goat Billy asked the old Chinese guy.
Reference Speaker & Emotion: 0012_Happy
Reference | DEX-TTS |
---|---|
Text: He dreamt them all night.
Reference Speaker & Emotion: 0012_Neutral
Reference | DEX-TTS |
---|---|
5.3) Samples depending on NFE: We provide samples synthesized with different numbers of function evaluations (NFE = 10, 25, 50) for the VCTK and LJSpeech datasets.
Text: There were no proposals on the table.
Reference Speaker: p300
Reference | DEX-TTS-10 | DEX-TTS-25 | DEX-TTS-50 |
---|---|---|---|
Text: Paying particular attention to the crowd for any unusual activity.
GT | GeDEX-TTS-10 | GeDEX-TTS-25 | GeDEX-TTS-50 |
---|---|---|---|
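
For readers unfamiliar with the term, the sketch below illustrates what NFE counts in a diffusion-style sampler. It is not the DEX-TTS sampler: it assumes a fixed-step, first-order Euler solver in which every step calls the model once, so the number of steps equals the NFE. The names `euler_sample` and `toy_drift` are hypothetical, and `toy_drift` is a simple analytic stand-in for the learned network so that the effect of the step count can be checked against an exact solution.

```python
import math


def euler_sample(drift, x0, nfe, t_start=1.0, t_end=0.0):
    """Integrate dx/dt = drift(x, t) from t_start to t_end with `nfe` Euler steps.

    Each step evaluates `drift` exactly once, so the returned call count
    equals `nfe` for this solver.
    """
    h = (t_end - t_start) / nfe  # signed step size (negative when going 1 -> 0)
    x, t = x0, t_start
    calls = 0
    for _ in range(nfe):
        x = x + h * drift(x, t)  # one function evaluation per step
        calls += 1
        t += h
    return x, calls


def toy_drift(x, t):
    # Hypothetical stand-in for a learned network: dx/dt = x,
    # whose exact solution from t=1 to t=0 is x0 * exp(-1).
    return x


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for nfe in (10, 25, 50):
        x, calls = euler_sample(toy_drift, 1.0, nfe)
        print(f"NFE={calls:2d}  result={x:.5f}  abs error={abs(x - exact):.5f}")
```

In this toy setting the error shrinks as the NFE grows, while the cost grows linearly with the number of model calls. The samples above let listeners judge the same trade-off by ear.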
Thanks for your interest!