HOWTO transcribe from MP4 to TXT with Whisper AI

In an era where information is constantly flowing through various forms of media, the need to extract and transcribe audio content has become increasingly important. Whether you’re a journalist, a content creator, or simply someone looking to convert spoken words into written text, the process of transcribing audio can be a game-changer. In this guide, we’ll explore how to transcribe audio from an MP4 file to text using Whisper AI, a powerful automatic speech recognition (ASR) system developed by OpenAI.

What is Whisper AI?

Whisper AI is an advanced ASR system designed to convert spoken language into written text. It has been trained on an extensive dataset, making it capable of handling various languages and accents. Whisper AI has numerous applications, including transcription services, voice assistants, and more. In this guide, we will focus on using it for transcribing audio from MP4 files to text.

Prerequisites

Before you can start transcribing MP4 files with Whisper AI, make sure you have the following prerequisites in place:

  1. Docker: Docker is a platform for developing, shipping, and running applications in containers. You’ll need Docker installed on your system. If you don’t have it, download and install it from the official Docker website.
  2. MP4 to MP3 conversion: This guide uses MP3 audio files as input. Whisper decodes audio through FFmpeg, so other formats may work too, but the commands below assume MP3. If your recording is in MP4 format, extract its audio track to MP3 first. Various tools can do this; FFmpeg is a reliable and versatile choice:

ffmpeg -i 20230523_111106-Meeting\ Recording.mp4 20230523_111106-Meeting\ Recording.mp3
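The conversion can also be wrapped in a small helper so the output name always follows the input name. This is only a sketch: `mp3_name` and `extract_audio` are hypothetical helper names, and the `-vn`, `-c:a libmp3lame`, and `-q:a 4` options (drop the video stream, encode with LAME at a mid-range variable bitrate) are an assumption about what suits a speech recording, not something the original command used.

```shell
# Derive the MP3 file name from the MP4 name by swapping the extension.
mp3_name() {
  printf '%s\n' "${1%.*}.mp3"
}

# Extract the audio track of one MP4 into an MP3 next to it.
# -vn drops the video stream; -q:a 4 selects a mid-range VBR quality.
extract_audio() {
  ffmpeg -i "$1" -vn -c:a libmp3lame -q:a 4 "$(mp3_name "$1")"
}

# Example call (the file name is a placeholder):
# extract_audio "Meeting Recording.mp4"
```

Quoting `"$1"` means file names with spaces need no backslash escaping inside the function.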

Transcribing MP4 to TXT with Whisper AI

Now, let’s walk through the steps to transcribe an MP4 file to text using Whisper AI. We’ll assume you already have your MP4 file converted to MP3.

Step 1: Clone the Whisper AI Docker Repository

First, clone the Whisper AI Docker repository to your local machine. Open a terminal and run the following command:

git clone https://github.com/hisano/openai-whisper-on-docker.git

Step 2: Navigate to the Repository

Change your current directory to the cloned repository:

cd openai-whisper-on-docker

Step 3: Build the Docker Image

Build the Docker image for Whisper AI with the following command:

docker image build --tag whisper:latest .

Step 4: Set Up Volume and File Name

Set the VOLUME_DIRECTORY to your current directory and specify the name of your MP3 file. In this example, we’ll use “hello.mp3”:

VOLUME_DIRECTORY=$(pwd)

FILE_NAME=hello.mp3

Step 5: Copy Your MP3 File

Copy your MP3 file (the one you want to transcribe) to the current directory.

cp ../20230503_094932-Meeting\ Recording.mp3 ./$FILE_NAME

Step 6: Transcribe the MP3 File

Finally, use the following command to transcribe the MP3 file to text using Whisper AI. In this example, we’re specifying the model as “small” and the language as “Spanish.” Adjust these parameters according to your needs:

docker container run --rm --volume "${VOLUME_DIRECTORY}:/data" whisper --model small --language Spanish "/data/$FILE_NAME"

Once you execute this command, Whisper AI will process the audio file and provide you with the transcribed text output.
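If you transcribe files regularly, steps 4 to 6 can be folded into one function. The sketch below assumes the image is tagged `whisper` as in step 3. `transcribe` and the `DRY_RUN` switch are hypothetical names introduced here for illustration.

```shell
# Run the whisper image built earlier on one file from a host directory.
# Arguments: directory, file name, model (default small), language (default Spanish).
# Set DRY_RUN=1 to print the docker command instead of running it.
transcribe() {
  local dir file model language
  dir="$1"
  file="$2"
  model="${3:-small}"
  language="${4:-Spanish}"

  # Fail early if the input is missing, before starting a container.
  if [ ! -f "$dir/$file" ]; then
    echo "input not found: $dir/$file" >&2
    return 1
  fi

  # Reuse the positional parameters as the argument list, so paths
  # with spaces survive quoting.
  set -- docker container run --rm --volume "$dir:/data" \
    whisper --model "$model" --language "$language" "/data/$file"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example calls:
# DRY_RUN=1 transcribe "$(pwd)" hello.mp3
# transcribe "$(pwd)" hello.mp3 small Spanish
```

The dry run is a cheap way to check the mounted path and file name before committing to a long transcription.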

The transcription is written to stdout, so consider redirecting the output of the docker run to a file. The &> operator used here is Bash syntax and captures stderr as well:

docker container run --rm --volume "${VOLUME_DIRECTORY}:/data" whisper --model small --language Spanish "/data/$FILE_NAME" &> result.txt

You can follow the progress with:

tail -f result.txt
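When the run finishes, result.txt holds more than the spoken text. Assuming each transcribed segment is prefixed with a timestamp pair such as `[00:00.000 --> 00:04.000]` (an assumption based on the usual console output of the Whisper command-line tool, so check your own result.txt first), a small filter can keep just the text. `strip_timestamps` is a hypothetical helper name.

```shell
# Keep only timestamped segment lines and remove the "[start --> end]" prefix.
# Lines without that prefix (warnings, progress messages) are dropped.
strip_timestamps() {
  sed -n -E 's/^\[[0-9:.]+ --> [0-9:.]+\][[:space:]]*(.*)$/\1/p'
}

# Example:
# strip_timestamps < result.txt > transcript.txt
```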

If you see a warning like:

/usr/local/lib/python3.9/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead

it means that the container has no CUDA-capable GPU available, so Whisper AI runs on your CPU instead.

Also note that we’re using the small model here, which is good enough but can be slow on a CPU. On my machine, it takes about 2.5 hours to transcribe 3 hours of audio.
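That single measurement gives a rough ratio of 2.5 / 3 ≈ 0.83 hours of processing per hour of audio. The sketch below uses it to estimate a run before starting one. The ratio comes from one machine running the small model on CPU, so treat the result as a ballpark only. `estimate_minutes` is a hypothetical helper name.

```shell
# Estimate processing time in minutes for a given audio length in minutes,
# using the ratio measured above: 2.5 hours of work for 3 hours of audio.
estimate_minutes() {
  awk -v audio="$1" 'BEGIN { printf "%.0f\n", audio * 2.5 / 3 }'
}

# Example: a 60-minute recording
# estimate_minutes 60
```

A different model size or a GPU will change the ratio, so it is worth timing a short clip on your own hardware and replacing the constants.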

Conclusion

Transcribing audio from MP4 to text has never been easier, thanks to Whisper AI and the power of Docker. With this guide, you can efficiently convert spoken content into written text, opening up a world of possibilities for content creation, research, and more. Experiment with different Whisper AI models and languages to tailor your transcription experience to your specific needs. Happy transcribing!

Note: I’ve written this blog post with the help of ChatGPT based on my own experiments with Whisper AI. I’m just too lazy to write something coherent in English. Sorry for that, I hope you liked it anyway.


Prompt: “Write a blog post whose title is HOWTO transcribe from mp4 to txt with Whisper AI. It should explain what Whisper AI is but also explain how to extract mp3 from mp4, and the following commands, ignore first column:
10054 git clone https://github.com/hisano/openai-whisper-on-docker.git
10055 cd openai-whisper-on-docker
10056 docker image build --tag whisper:latest .
10057 VOLUME_DIRECTORY=$(pwd)
10058 FILE_NAME=hello.mp3
10059 cp ../20230503_094932-Meeting\ Recording.mp3 ./hello.mp3
10060 docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/hello.mp3”
After that, I added some extra useful information about performance.

Published by

Iván Mosquera Paulo

Software Engineer
