Transcription
Also known as speech-to-text (STT), transcription is the process of converting speech to text. We have built this endpoint with strong support for African languages.
If Python isn’t your vibe, try using the REST API directly; see the API Reference section. We’re working to add SDKs for more languages.
Request
The `transcribe()` function can be used to transcribe audio. Pass either a `url` or a `content` to the `transcribe` function.
Examples are provided below as a guide for you.
If you provide the `url`, we will download the file from the specified location.
- You can provide either the `content` (file) or `url` (str), but do not provide both.
- The maximum file size is 25MB. We will support larger sizes in the future.
- We only support `wav` and `mp3` file formats.
- If you provide `url`, ensure that access to the file is not blocked by authentication.
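The rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `validate_transcribe_input` and its `filename` parameter are hypothetical names, and it assumes "25MB" means 25 × 1024 × 1024 bytes. The extension check only runs when a file name or URL path actually carries an extension.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumption: the documented "25MB" limit is interpreted as 25 MiB.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}


def validate_transcribe_input(content: Optional[bytes] = None,
                              url: Optional[str] = None,
                              filename: Optional[str] = None) -> None:
    """Raise ValueError if the input breaks one of the documented rules.

    Hypothetical helper for illustration; it does not call the API.
    """
    # Exactly one of content or url must be given, never both or neither.
    if (content is None) == (url is None):
        raise ValueError("Provide exactly one of content or url.")

    # The size limit can only be checked locally for uploaded content.
    if content is not None and len(content) > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the 25MB limit.")

    # Take the extension from the file name, or from the URL path.
    name = filename if content is not None else urlparse(url).path
    suffix = PurePosixPath(name).suffix.lower() if name else ""
    if suffix and suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError("Only wav and mp3 files are supported.")
```

Calling `validate_transcribe_input(url="https://example.com/clip.mp3")` returns silently, while passing both `content` and `url` raises `ValueError`. Whether a URL is blocked by authentication cannot be checked this way; that only shows up when the file is fetched.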
Response
The response for transcription is returned as JSON.
- The Content-Type is `application/json`.
- A `request_id` is returned for issue resolution with our support team.
Below is an example of a response from the transcription endpoint.
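When handling a response in Python, it can help to pull out the `request_id` first so it is available for support tickets. The sketch below assumes only what is stated here: the body is JSON and contains a `request_id`. The helper name `parse_transcription_response` is hypothetical, and treating `request_id` as a top-level key is an assumption; no other field names are assumed.

```python
import json
from typing import Any, Dict, Tuple


def parse_transcription_response(body: bytes) -> Tuple[str, Dict[str, Any]]:
    """Decode a JSON response body and split out the request_id.

    Hypothetical helper; assumes request_id is a top-level key.
    Returns (request_id, remaining_fields).
    """
    payload = json.loads(body)
    if not isinstance(payload, dict) or "request_id" not in payload:
        raise ValueError("Expected a JSON object containing request_id.")
    # Remove request_id so the rest of the payload can be handled separately.
    request_id = str(payload.pop("request_id"))
    return request_id, payload
```

Logging the returned `request_id` alongside any error makes it easier to quote when contacting support.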
Language Support
Our speech-to-text model supports the following languages:
- Hausa: ha
- Igbo: ig
- Yoruba: yo
- English: en
More info on languages can be found on the Languages page.
When transcribing, you should use the language code (e.g. `en`, `yo`, `ig`) and not the full language name.
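If your application stores full language names, a small lookup can convert them to codes before calling the API. This is a minimal sketch built from the four languages listed above; `to_language_code` is a hypothetical helper, not part of the SDK.

```python
# Codes taken from the supported-language list in this section.
SUPPORTED_LANGUAGES = {
    "hausa": "ha",
    "igbo": "ig",
    "yoruba": "yo",
    "english": "en",
}


def to_language_code(language: str) -> str:
    """Return the language code for a full name or an already-valid code."""
    key = language.strip().lower()
    if key in SUPPORTED_LANGUAGES.values():
        return key  # already a supported code
    if key in SUPPORTED_LANGUAGES:
        return SUPPORTED_LANGUAGES[key]
    raise ValueError(f"Unsupported language: {language!r}")
```

For example, `to_language_code("Yoruba")` gives `"yo"`, and an unlisted language raises `ValueError` rather than sending an invalid code.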