🌬️ Sopro TTS - Zero-Shot Voice Cloning

A lightweight (135M-parameter) text-to-speech model by Samuel Vitorino with zero-shot voice cloning. Upload a 3-12 second audio clip to clone a voice!

⚠️ Disclaimers

  • Sopro can be inconsistent. If the output sounds glitchy, try adjusting the Temperature and Style Strength settings.
  • Voice cloning quality is highly dependent on the microphone quality and ambient noise of the reference audio.
  • Generation length is currently capped at ~32 seconds to prevent hallucinations.
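
Because the reference clip is expected to be 3-12 seconds long, it can help to check its length locally before uploading. The following is a minimal sketch, not part of Sopro itself: it assumes the clip is an uncompressed PCM WAV file, and the helper names `clip_duration_seconds` and `is_valid_reference` are hypothetical, chosen only for illustration.

```python
import wave

# Bounds taken from the "3-12 second audio clip" guidance above.
MIN_SECONDS = 3.0
MAX_SECONDS = 12.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path):
    """Return True if the clip length is within the 3-12 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_valid_reference("my_voice.wav")` (a placeholder file name) reports whether a recording fits the window. The check covers duration only. It says nothing about microphone quality or background noise, which also affect the result.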