Speech Generation
Also known as text-to-speech (TTS), speech generation (or speech synthesis) is an integral part of modern AI systems. We have built this endpoint with strong support for African languages.
If Python isn’t your vibe, you can call the REST API directly; see the API Reference section. We’re working to add SDKs for more languages.
Request
The generate() function can be used to generate speech. Examples are provided below as a guide for you.
We highly recommend that you perform tone-marking before TTS. This allows the model to pronounce the words properly during speech generation.
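The recommended order (tone-mark first, then generate speech) can be sketched as a small wrapper. This is only an illustration: the wrapper name generate_with_tone_marks and both stub functions are hypothetical, and the real tone-marking and generate() calls from the SDK are passed in as callables because their exact signatures are not shown on this page.

```python
from typing import Callable, Iterable


def generate_with_tone_marks(
    text: str,
    tone_mark: Callable[[str], str],
    generate: Callable[[str], Iterable[bytes]],
) -> bytes:
    """Tone-mark `text` first, then pass the marked text to speech generation.

    `tone_mark` and `generate` are supplied by the caller (assumption: the
    real SDK functions can be wrapped to match these simple signatures).
    """
    marked_text = tone_mark(text)           # step 1: add tone marks
    audio_chunks = generate(marked_text)    # step 2: synthesize speech
    return b"".join(audio_chunks)           # collect the streamed bytes


# Hypothetical stand-ins so the sketch runs without the SDK or network access.
def stub_tone_mark(text: str) -> str:
    return f"<marked>{text}</marked>"


def stub_generate(text: str) -> Iterable[bytes]:
    yield text.encode("utf-8")


audio = generate_with_tone_marks("bawo ni", stub_tone_mark, stub_generate)
```

In real use, the two stubs would be replaced by thin wrappers around the SDK's tone-marking call and generate().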
Response
The response for speech generation is in bytes.
- The Content-Type is audio/wav.
- The content is streamed back to the caller.
- The file type of the generated audio is wav.
If you use the streaming interface (Python SDK), you can start to take action on the byte chunks as they arrive, e.g. stream them to a file.
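As a rough sketch of streaming to a file, the helper below writes byte chunks to disk as they arrive. It is SDK-agnostic: it accepts any iterable of bytes, on the assumption that the streaming interface yields chunks in that form. The names save_audio_stream and fake_wav_chunks are hypothetical, and fake_wav_chunks only simulates a streamed response with a short silent WAV so the example runs offline.

```python
import io
import wave
from typing import Iterable, Iterator


def save_audio_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write streamed byte chunks to `path`; return the number of bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if not chunk:  # ignore empty chunks
                continue
            out.write(chunk)
            total += len(chunk)
    return total


def fake_wav_chunks(chunk_size: int = 1024) -> Iterator[bytes]:
    """Stand-in for a streamed response: yields a short silent WAV in pieces."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)                  # mono
        wav.setsampwidth(2)                  # 16-bit samples
        wav.setframerate(16000)              # 16 kHz
        wav.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence
    data = buffer.getvalue()
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

With a real response, the chunk iterator from the streaming interface would be passed in place of fake_wav_chunks(), e.g. save_audio_stream(chunks, "speech.wav").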
Choosing a Voice
We currently have 8 characters with unique voices for the supported languages. Each of these characters has unique attributes, and we think you will find them fun to use. Feel free to try them out and let us know which one you love the most. 😉
Not all voices work for all languages. Ensure you select the voice that matches the language of your choice.
More info on voices can be found on the Voices page.
Language Support
The speech generation model supports the following languages:
- English: en
- Hausa: ha
- Igbo: ig
- Yoruba: yo
When generating speech, use the language code (e.g. en, yo, ig) and not the full language name.
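If your application stores full language names, a small lookup can convert them to the expected codes before a request is made. The sketch below uses only the four languages listed above; the helper name to_language_code is hypothetical and not part of the SDK.

```python
# Language names and codes taken from the list above.
SUPPORTED_LANGUAGES = {
    "english": "en",
    "hausa": "ha",
    "igbo": "ig",
    "yoruba": "yo",
}


def to_language_code(language: str) -> str:
    """Return the language code for a supported language name or code."""
    value = language.strip().lower()
    if value in SUPPORTED_LANGUAGES.values():  # already a code, e.g. "yo"
        return value
    if value in SUPPORTED_LANGUAGES:           # full name, e.g. "Yoruba"
        return SUPPORTED_LANGUAGES[value]
    supported = ", ".join(sorted(SUPPORTED_LANGUAGES.values()))
    raise ValueError(f"Unsupported language {language!r}; use one of: {supported}")
```

Rejecting unknown values early gives a clearer error than sending an unsupported code to the endpoint.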
More info on languages can be found on the Languages page.