27.3 openai transcribe


The whisper package provides the functionality for the transcribe command. We expose the command through the MLHub package, but the package also installs the whisper command line utility, so you can use either. The MLHub command conforms to the transcribe commands of other MLHub packages, whilst the whisper command provides many more options. A particularly nice feature is output to the json, srt, tsv, txt, and vtt formats, which include the srt and vtt video subtitle formats.

The input can be a wav, mp4, mp3, or flac file.

Many whisper options are available but are not yet exposed through the MLHub package.

wget https://github.com/realpython/python-speech-recognition/raw/master/audio_files/harvard.wav
ml transcribe openai harvard.wav
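
To transcribe several files, the same command can be wrapped in a loop. The following is a minimal sketch rather than a feature of the package: the helper name is_supported is hypothetical, and the script assumes that ml transcribe openai prints the transcript to standard output so that it can be redirected to a text file.

```shell
#!/bin/sh
# Sketch: transcribe every supported audio file in the current directory.
# Assumption: `ml transcribe openai FILE` prints the transcript to stdout.

# Hypothetical helper: succeeds only for the input formats listed above.
is_supported() {
    case "$1" in
        *.wav|*.mp4|*.mp3|*.flac) return 0 ;;
        *) return 1 ;;
    esac
}

for f in *; do
    [ -f "$f" ] || continue
    is_supported "$f" || continue
    # When MLHub is not installed, only report what would be done.
    command -v ml >/dev/null 2>&1 || { echo "would transcribe: $f"; continue; }
    # Save the transcript next to the audio file, e.g. harvard.wav -> harvard.txt
    ml transcribe openai "$f" > "${f%.*}.txt"
done
```

Two audio files sharing a base name (say harvard.wav and harvard.mp3) would write to the same text file, so rename one of them first if that matters.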

We can also run whisper standalone. The following command uses FP32 mode, which suits running on a CPU, and specifies Indonesian as the language (not required, since the model is pretty good at identifying the language) to generate each of the supported output formats:

whisper --fp16 False --language id --output_format=all jokowi.wav
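
With --output_format=all the results are written as one file per format, named after the audio file. The sketch below collects them in a separate directory using whisper's --output_dir option and lists the file names to expect. The helper expected_outputs and the directory name transcripts are illustrative assumptions, and whisper is only run when it is installed and jokowi.wav is present.

```shell
#!/bin/sh
# Sketch: list the files that `whisper --output_format=all` is expected to
# write for one audio file. Assumes each output is named after the audio
# file's base name, one file per format, inside the output directory.

# Hypothetical helper: expected_outputs AUDIO [OUTDIR]
expected_outputs() {
    audio="$1"
    outdir="${2:-.}"
    base=$(basename "$audio")
    base="${base%.*}"            # strip the final extension only
    for ext in json srt tsv txt vtt; do
        printf '%s/%s.%s\n' "$outdir" "$base" "$ext"
    done
}

# Run whisper only if it is installed and the audio file is available.
if command -v whisper >/dev/null 2>&1 && [ -f jokowi.wav ]; then
    whisper --fp16 False --language id --output_format=all \
        --output_dir transcripts jokowi.wav
fi

# Print the names to look for, e.g. transcripts/jokowi.srt
expected_outputs jokowi.wav transcripts
```

The srt and vtt files are the ones to load as subtitles in a video player.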

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0