Is guiding the pronunciation possible? #488

S-T-K · 2025-01-10T11:23:50Z

Is there a way to influence the pronunciation, pacing, and emotion in the TTS output?
For instance, in ElevenLabs, placing quotation marks around a word can create stronger emphasis. The only methods I’ve found to actively control pacing are punctuation marks (e.g., . , ; : ? !) and ellipses or dashes for pauses; see https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-tricks-to-get-the-model-to-say-things-correctly
Any other adjustments appear to be ignored.
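
To make the punctuation approach concrete, here is a minimal sketch of how the text could be pre-processed before it is sent to the TTS engine. The marker names (`[short]`, `[long]`, `[break]`) and the function `apply_pause_markers` are hypothetical and only for illustration; they are not part of AllTalk or any model, and how long each pause actually sounds depends on the model.

```python
import re

# Hypothetical marker-to-punctuation mapping (an assumption for illustration).
# The audible length of each pause depends on the TTS model.
PAUSE_MARKS = {
    "[short]": ",",
    "[long]": "...",
    "[break]": " -",
}


def apply_pause_markers(text: str) -> str:
    """Replace inline pause markers with punctuation before sending text to TTS."""
    for marker, punct in PAUSE_MARKS.items():
        # Drop whitespace before the marker so the punctuation attaches
        # to the preceding word.
        text = re.sub(r"\s*" + re.escape(marker), punct, text)
    # Collapse any repeated spaces left over from the substitutions.
    return re.sub(r" {2,}", " ", text).strip()


if __name__ == "__main__":
    print(apply_pause_markers("Wait [long] what did you say [short] exactly?"))
```

The resulting string would then be passed to the TTS engine as usual; this only rewrites the input text and does not touch any model settings.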

finefin · 2025-01-17T13:05:36Z

That depends on the model. A new model was released a few days ago that seems to be able to do what you ask for:
https://huggingface.co/OuteAI/OuteTTS-0.3-1B

1 reply

S-T-K Jan 21, 2025
Author

Oh, I didn't know about that one, thanks for pointing it out to me!
I tried it for a while, and yes, it's easier (or at least possible) to guide the emotional tone. Unfortunately, I found it a bit lacking in overall sound and prosody fidelity compared to XTTSv2, though.
