
MegaTTS3


Introduction:

MegaTTS3 is a lightweight, efficient speech synthesis model developed by ByteDance. It supports Chinese and English and offers high-quality voice cloning and accent intensity control.

MegaTTS3 is an open-source, lightweight text-to-speech (TTS) system developed by ByteDance. Its main features are:

  • Lightweight and Efficient: The TTS Diffusion Transformer backbone has only 0.45B parameters.
  • High-Quality Voice Cloning: Generates speech in a voice similar to a provided audio sample.
  • Bilingual Support: Supports Chinese, English, and text that mixes the two languages.
  • Controllability: Supports accent intensity control; finer-grained pronunciation and duration adjustment is planned.
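
To put the 0.45B-parameter figure in perspective, here is a rough estimate of the memory needed to hold the backbone weights alone. Only the parameter count comes from the description above; the numeric precisions are illustrative assumptions, and the estimate ignores activations, the other sub-modules, and any runtime overhead.

```python
# Back-of-the-envelope weight memory for a 0.45B-parameter backbone.
# Only PARAMS comes from the text; the precisions listed are assumptions
# chosen for illustration, not statements about how MegaTTS3 is shipped.
PARAMS = 0.45e9

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed to store the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

Under these assumptions the weights take about 1.8 GB in 32-bit floats and about 0.9 GB in 16-bit floats.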

The project also includes several sub-modules:

  • Aligner: A speech-text alignment model, usable for tasks such as dataset filtering, speech segmentation, and phoneme recognition.
  • Grapheme-to-Phoneme Model: Converts written text (graphemes) into phoneme sequences.
  • WaveVAE: A waveform VAE (variational autoencoder) that compresses speech into low-dimensional acoustic latent variables, which can serve as training targets for speech synthesis models or be used for voice conversion.
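
As a rough illustration of what grapheme-to-phoneme conversion means, the toy sketch below maps words to ARPAbet-style phoneme symbols with a hand-written lookup table. The lexicon and the `toy_g2p` function are hypothetical; MegaTTS3's own component is a learned model, and this sketch only shows the shape of the input and output, not how that model works.

```python
from typing import List

# Toy grapheme-to-phoneme lookup. TOY_LEXICON is a hand-written, hypothetical
# stand-in for illustration; it is not part of MegaTTS3.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def toy_g2p(text: str) -> List[str]:
    """Map each known word to its phonemes; emit "<unk>" for unknown words."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phonemes


print(toy_g2p("Hello world"))
```

A learned model replaces the fixed table so that words outside any lexicon can still be converted.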

In summary, MegaTTS3 combines a compact 0.45B-parameter TTS backbone with high-quality voice cloning and Chinese-English bilingual support, and it ships supporting tools (an aligner, a grapheme-to-phoneme model, and WaveVAE) for related speech processing tasks.