02 Jun 2025 1 min read Whisper AI

Get the Lyrics of a Song using Whisper AI on Ubuntu 24.04 LTS (AMD GPU)

When you’re listening to music and can’t find the lyrics, AI can save you some work by generating them for you. Whisper AI by OpenAI does this: it transcribes an mp3 audio file and writes the result to a .txt file.

To get this working on Ubuntu 24.04 LTS with an AMD GPU, you need a Python virtual environment with the ROCm version of Torch by AMD. I explained that here.

In order to run git commands and do other useful things, make sure you install the following tools.

Installation

Open a terminal by pressing the Super key (the Windows key on most keyboards). Then type ‘cmd’ and press Enter. Go to the home directory of the logged-in user:

cd ~

Install ffmpeg, which Whisper uses to decode audio files, in case it is not installed yet:

sudo apt install ffmpeg -y
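Before continuing, it can help to confirm that the command-line tools used in this guide are on your PATH. The following is a minimal sketch using Python's standard shutil.which. The tool list is an assumption based on the commands in this article, and missing_tools is just an illustrative helper name.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed tool list: the programs this guide calls from the terminal.
    missing = missing_tools(["git", "ffmpeg", "python3"])
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All tools found")
```

Save it under any name (for example check_tools.py) and run it with python3. Anything it reports as missing can be installed with apt before you go on.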

Clone the Whisper AI repository:

git clone https://github.com/openai/whisper

Change the directory into the cloned whisper repository:

cd whisper

Copy the venv folder from this article into the cloned repository (the command below assumes it is located at ../ai-venv/venv):

cp -a ../ai-venv/venv .

Activate the virtual environment:

source venv/bin/activate
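With the environment active, it may be worth checking that PyTorch can actually see the GPU before installing anything else. This is a small sketch, assuming the environment contains a ROCm build of PyTorch. describe_gpu is a hypothetical helper name, and the messages are only illustrative.

```python
def describe_gpu():
    """Return a one-line summary of what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no GPU is visible"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip = getattr(torch.version, "hip", None)
    backend = "ROCm/HIP " + hip if hip else "CUDA"
    return "GPU ready (" + backend + "): " + torch.cuda.get_device_name(0)


if __name__ == "__main__":
    print(describe_gpu())
```

If the script reports that no GPU is visible, the whisper command further down would most likely fail with --device cuda, so it is easier to fix the environment at this point.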

Install the Python requirements of whisper:

pip install -r requirements.txt

Then install Whisper itself from the cloned repository:

pip install .

Now run the following command, replacing audio.mp3 with the path to your own file:

whisper audio.mp3 --model small --language en --output_format txt --output_dir ./lyrics --device cuda

Explanation:

audio.mp3: This is your audio file.

--model small: Selects the model size. Available sizes include tiny, base, small, medium and large; larger models are more accurate but slower and need more GPU memory.

--language en: Skip automatic language detection and treat the audio as English; you can change it to other languages, like es (Spanish) for example.

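--task translate: Optional and not used in the command above. Add it to translate non-English audio into English text instead of transcribing it in the original language.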

--output_format txt: Write the result as a plain .txt file only.

--output_dir ./lyrics: Choose the output directory where you want to store the txt file with the lyrics.

--device cuda: Run the model on the GPU. ROCm builds of PyTorch expose AMD GPUs under the cuda device name, so this value is also correct for an AMD card.
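If you have a whole folder of songs, the same command can be repeated for each file. The sketch below builds the command from this article with the flags explained above and runs it for every .mp3 that does not have a .txt in the output directory yet. The function names and the default values are illustrative assumptions, and it assumes the whisper command is available because the virtual environment is active.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio, out_dir="lyrics", model="small", language="en"):
    """Build the whisper command from this article for one audio file."""
    return [
        "whisper", str(audio),
        "--model", model,
        "--language", language,
        "--output_format", "txt",
        "--output_dir", str(out_dir),
        "--device", "cuda",
    ]


def transcribe_folder(folder, out_dir="lyrics"):
    """Run whisper on every .mp3 in `folder` that has no .txt in `out_dir` yet."""
    for audio in sorted(Path(folder).glob("*.mp3")):
        # Whisper names the output after the audio file, e.g. song.mp3 -> song.txt.
        if (Path(out_dir) / (audio.stem + ".txt")).exists():
            continue
        # check=True stops the loop if whisper exits with an error.
        subprocess.run(build_command(audio, out_dir), check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        transcribe_folder(sys.argv[1])
    else:
        print("usage: python transcribe_folder.py MUSIC_FOLDER")
```

Skipping files that already have a .txt means you can interrupt the script and start it again later without transcribing the same songs twice.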
