🌬️ Sopro TTS - Zero-Shot Voice Cloning
A lightweight (135M-parameter) text-to-speech model with zero-shot voice cloning, created by Samuel Vitorino. Upload a 3-12 second audio clip to clone a voice!
⚠️ Disclaimers
- Sopro can be inconsistent. If the output sounds glitchy, try tweaking the Temperature and Style Strength.
- Voice cloning quality depends heavily on the reference audio, in particular its microphone quality and ambient noise.
- Generation length is currently capped at ~32 seconds to prevent hallucinations.
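
The 3-12 second guideline for the reference clip can be checked locally before uploading. The following is a minimal sketch, assuming the clip is an uncompressed PCM WAV file; it uses only Python's standard `wave` module, and the helper names `clip_duration_seconds` and `is_usable_reference` are hypothetical and not part of Sopro.

```python
import wave

MIN_SECONDS = 3.0   # lower bound of the recommended reference length
MAX_SECONDS = 12.0  # upper bound of the recommended reference length


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the 3-12 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

A call such as `is_usable_reference("my_voice.wav")` (the file name is illustrative) checks only the length; it says nothing about microphone quality or background noise, which still need to be judged by ear.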