Also known as speech-to-text (STT), transcription is the process of converting speech to text. We have built this endpoint with strong support for African languages.

If Python isn’t your vibe, you can call the REST API directly; see the API Reference section. We’re working on SDKs for more languages.

Request

Use the transcribe() function to transcribe audio, passing either a url or content. Examples are provided below as a guide.

If you provide the url, we will download the file from the specified location.

  • You can provide either the content (file) or url (str), but do not provide both.
  • The maximum file size is 25MB. We will support larger sizes in the future.
  • We only support wav and mp3 file formats.
  • If you provide url, ensure that access to the file is not blocked by authentication.
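
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: the helper name validate_transcribe_input and its filename parameter are assumptions made for illustration, and 25MB is read here as 25 * 1024 * 1024 bytes, which the text does not specify. The format check only looks at the file extension, so it is a heuristic and is skipped when no extension is present.

```python
import os
from typing import Optional
from urllib.parse import urlparse

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def validate_transcribe_input(content: Optional[bytes] = None,
                              url: Optional[str] = None,
                              filename: Optional[str] = None) -> None:
    """Raise ValueError if the input breaks one of the documented rules.

    Hypothetical helper: `filename` is used only to inspect the extension
    of local content; it is not a parameter of transcribe().
    """
    # Exactly one of content or url must be given.
    if (content is None) == (url is None):
        raise ValueError("provide exactly one of content or url")

    if content is not None:
        if len(content) > MAX_BYTES:
            raise ValueError("file is larger than 25MB")
        name = filename
    else:
        name = urlparse(url).path

    # Extension check is a heuristic; it is skipped if there is no extension.
    if name:
        ext = os.path.splitext(name)[1].lower()
        if ext and ext not in ALLOWED_EXTENSIONS:
            raise ValueError("only wav and mp3 files are supported")
```

Calling the helper with both content and url, or with neither, raises ValueError, as does a name ending in an unsupported extension such as .ogg.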

Response

The response for transcription is a JSON object.

  • The Content-Type is application/json.
  • A request_id is returned for issue resolution with our support team.

Below is an example of a response from the transcription endpoint.

    {
      "request_id": "86095cea-77d5-45ba-a093-0f800ac2c7df",
      "text": "Báwo ni olólùfẹ́ mi?"
    }
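
A response like the one above can be read with any JSON parser. The sketch below uses Python's standard json module on the raw response body. The names TranscriptionResult and parse_transcription are illustrative and not part of the SDK, which may already return a parsed object.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    """Holds the two fields shown in the example response."""
    request_id: str
    text: str


def parse_transcription(body: bytes) -> TranscriptionResult:
    """Decode a JSON response body into a TranscriptionResult."""
    payload = json.loads(body)  # json.loads accepts UTF-8 encoded bytes
    return TranscriptionResult(
        request_id=payload["request_id"],
        text=payload["text"],
    )
```

Keeping the request_id alongside the text makes it easy to quote it when contacting support.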

Language Support

Our speech-to-text model supports the following languages:

  • Hausa: ha
  • Igbo: ig
  • Yoruba: yo
  • English: en

More information on languages can be found on the Languages page.

When transcribing, use the language code (e.g. en, yo, ig), not the full language name.
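
One way to guard against passing a full language name is to normalize the input to a code first. This is a sketch with a hypothetical helper, to_language_code, built only from the four languages listed above; it is not part of the SDK.

```python
# Language names and codes taken from the list above.
LANGUAGE_CODES = {
    "hausa": "ha",
    "igbo": "ig",
    "yoruba": "yo",
    "english": "en",
}


def to_language_code(language: str) -> str:
    """Return the code for a supported language name or code.

    Hypothetical helper: accepts either a code ("yo") or a name ("Yoruba"),
    ignoring case and surrounding whitespace.
    """
    key = language.strip().lower()
    if key in LANGUAGE_CODES.values():
        return key
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    raise ValueError(f"unsupported language: {language!r}")
```

With this helper, both "Yoruba" and "yo" resolve to "yo", while an unlisted language raises ValueError.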

Examples - file
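
A minimal sketch of transcribing a local file, under stated assumptions: client stands for an already-configured SDK client that exposes the transcribe() function described in the Request section, and the keyword names language and content are inferred from the text above rather than confirmed. The wrapper transcribe_file is a hypothetical name.

```python
from pathlib import Path
from typing import Any


def transcribe_file(client: Any, path: str, language: str) -> Any:
    """Read a wav or mp3 file and pass its bytes to client.transcribe().

    Assumptions: `client` exposes transcribe(), and `language` and
    `content` are its keyword names; check the API Reference to confirm.
    """
    audio = Path(path).read_bytes()
    return client.transcribe(language=language, content=audio)
```

For instance, transcribe_file(client, "greeting.wav", "yo") would send the bytes of a local file named greeting.wav (a placeholder name) for Yoruba transcription.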

Examples - url
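
A minimal sketch of transcribing from a URL, with the same caveats: client is assumed to be an already-configured SDK client exposing transcribe(), and the keyword names language and url are inferred from the text above. The wrapper transcribe_url is a hypothetical name, and the scheme check is an added precaution, not a documented requirement.

```python
from typing import Any
from urllib.parse import urlparse


def transcribe_url(client: Any, url: str, language: str) -> Any:
    """Ask the service to download and transcribe the audio at `url`.

    Assumptions: `client` exposes transcribe(), and `language` and `url`
    are its keyword names; check the API Reference to confirm.
    """
    # Precaution only: reject values that are clearly not web URLs.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must start with http:// or https://")
    return client.transcribe(language=language, url=url)
```

The file at the URL must be reachable without authentication, since the service downloads it on your behalf.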