27.3 openai transcribe


The whisper package provides the functionality behind the transcribe command. We expose the command through the MLHub package, but the whisper package also installs its own whisper command line utility, so you can use either. The MLHub command follows the same conventions as the transcribe commands of other MLHub packages, whilst the whisper command provides many more options. A particularly nice feature is output to the json, srt, tsv, txt, and vtt formats, of which srt and vtt are subtitle formats suitable for video.
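To give a feel for what a subtitle format such as srt contains, the following is a minimal Python sketch that renders timed text segments as SRT blocks. It is not whisper's own writer. The function names and the example segments are hypothetical, and the sketch only assumes that each segment is a dictionary with start and end times in seconds and a text string.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each SRT block: a counter, a time range, then the subtitle text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive blocks.
    return "\n".join(blocks)


# Hypothetical segments for illustration only (not real transcription output).
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " First example line."},
    {"start": 2.5, "end": 5.04, "text": " Second example line."},
]

print(segments_to_srt(example_segments))
```

Each block pairs a running counter with a time range and the text to display, which is what allows a video player to show the transcript as subtitles.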

The input can be any of wav, mp4, mp3, or flac.

Many whisper options are available but not yet exposed through the MLHub package.

wget https://github.com/realpython/python-speech-recognition/raw/master/audio_files/harvard.wav
ml transcribe openai harvard.wav

We can also run whisper standalone. The following uses FP32 mode (--fp16 False), as needed when running on a CPU, and specifies Indonesian as the language (optional, since the model is pretty good at identifying the language) to generate each of the supported output formats:

whisper --fp16 False --language id --output_format=all jokowi.wav
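When scripting several transcriptions it can help to assemble the same command line programmatically. The following is a minimal Python sketch, using only the flags shown above. The helper names build_whisper_command and run_whisper are hypothetical, and the sketch assumes the whisper utility is on the PATH when run_whisper is called.

```python
import subprocess


def build_whisper_command(audio_path, language=None, fp16=False, output_format="all"):
    """Return the whisper command line as a list of arguments.

    Hypothetical helper: it only uses the flags shown in the text above.
    """
    cmd = ["whisper", "--fp16", str(fp16), "--output_format", output_format]
    if language is not None:
        # Omit --language to let the model identify the language itself.
        cmd += ["--language", language]
    cmd.append(audio_path)
    return cmd


def run_whisper(audio_path, **options):
    """Run whisper on one file; assumes the whisper utility is installed."""
    return subprocess.run(build_whisper_command(audio_path, **options), check=True)


# Print the command that would be run for the Indonesian example.
print(" ".join(build_whisper_command("jokowi.wav", language="id")))
```

Calling run_whisper("harvard.wav") would leave the language to be detected by the model, whilst run_whisper("jokowi.wav", language="id") corresponds to the command shown above.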

