Get the Lyrics of a Song using Whisper AI on Ubuntu 24.04 LTS (AMD GPU)

When you’re listening to music and can’t find the lyrics, AI can save you some work by generating them for you. One solution is Whisper AI, OpenAI’s speech recognition model: it transcribes an MP3 audio file and writes the lyrics to a .txt file for you.
To get this working on Ubuntu 24.04 LTS with an AMD GPU, you need a Python virtual environment with the ROCm build of PyTorch (ROCm is AMD’s GPU compute stack). I explained how to set that up here.
To run git commands and do other useful things, make sure you install the following tools.
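Before going further, you may want to confirm that the PyTorch in that virtual environment really is the ROCm build and can see your AMD GPU. The following is a sketch added to the original steps: the function name check_rocm_torch is hypothetical, and it assumes the venv lives at ~/ai-venv/venv, the path the cp command further down copies from. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API, which is also why Whisper’s --device cuda option works on an AMD card.

```bash
#!/usr/bin/env bash
# Sketch: check that a venv's PyTorch is a ROCm build that can see the GPU.
# Usage (assumed venv location): check_rocm_torch ~/ai-venv/venv
check_rocm_torch() {
  local python="$1/bin/python"
  if [ ! -x "$python" ]; then
    echo "No Python interpreter found in $1" >&2
    return 1
  fi
  # Calling the venv's own python uses its packages without activating it.
  # torch.version.hip is a version string on ROCm builds and None otherwise;
  # ROCm builds report AMD GPUs through the torch.cuda API.
  "$python" - <<'EOF'
import torch

print("HIP version:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
EOF
}
```

If it prints "GPU available: True", the --device cuda option used later will run on your AMD GPU. "False" means PyTorch cannot use the GPU (for example, a non-ROCm build or a driver problem), and Whisper would not be able to use it either.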
Installation
- Open a terminal: press the Windows (Super) key, type ‘cmd’, and press Enter (or simply press Ctrl+Alt+T). Then go to the home directory of the logged-in user:
cd ~
- Install ffmpeg if it is not installed yet (Whisper uses it to decode the audio file):
sudo apt install ffmpeg -y
- Clone the Whisper AI repository:
git clone https://github.com/openai/whisper
- Change into the cloned whisper repository:
cd whisper
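- Copy the venv folder created in the article linked above into the Git repository. Python virtual environments are not meant to be moved or copied, and the copy still points to ~/ai-venv/venv internally, so leave the original in place: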
cp -a ../ai-venv/venv .
- Activate the virtual environment:
source venv/bin/activate
- Install Whisper’s Python requirements:
pip install -r requirements.txt
- Then install Whisper itself from the cloned repository, which also provides the whisper command:
pip install .
- Now transcribe your audio file:
whisper audio.mp3 --model small --language en --output_format txt --output_dir ./lyrics --device cuda
Explanation:
- audio.mp3: your audio file.
- --model: the Whisper model to use: tiny, base, small, medium, or large. Larger models are usually more accurate but slower and need more GPU memory.
- --language en: tells Whisper the audio is in English instead of letting it auto-detect the language; you can change it to another language code, like es (Spanish).
- --task translate: not used in the command above; add it to translate non-English audio into English text.
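- --output_format txt: writes only the .txt file; by default Whisper writes every format (txt, vtt, srt, tsv and json).
- --output_dir ./lyrics: the directory where the .txt file with the lyrics is stored. The file is named after the audio file, so audio.mp3 produces ./lyrics/audio.txt.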
- --device cuda: runs the transcription on the GPU. With the ROCm build of PyTorch, AMD GPUs are addressed through the cuda device name, so this works on an AMD card too.
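If you have a whole folder of songs, you can wrap the same command in a small bash function. This is a sketch rather than part of the original steps: the function names lyrics_path and transcribe_folder are hypothetical, and it relies on Whisper naming the lyrics file after the audio file (song.mp3 becomes <output_dir>/song.txt). Songs that already have a lyrics file are skipped, and the remaining files are passed to a single whisper call, since the CLI accepts several audio files at once and then loads the model only once.

```bash
#!/usr/bin/env bash
# Sketch: transcribe every .mp3 in a folder with the options used above.
# Hypothetical helper names; run inside the activated venv so `whisper` is on PATH.

# Print where Whisper's txt output ends up for one audio file:
# <output_dir>/<audio file name without extension>.txt
lyrics_path() {
  local audio="$1" out_dir="$2"
  local name
  name="$(basename "$audio")"
  printf '%s/%s.txt\n' "$out_dir" "${name%.*}"
}

# Transcribe all .mp3 files in $1 (default: current folder) into $2
# (default: ./lyrics), skipping songs that already have a lyrics file.
transcribe_folder() {
  local songs_dir="${1:-.}" out_dir="${2:-./lyrics}"
  local audio restore_glob
  local pending=()

  # nullglob: an empty folder gives an empty loop instead of a literal '*.mp3'
  restore_glob="$(shopt -p nullglob)"
  shopt -s nullglob
  for audio in "$songs_dir"/*.mp3; do
    if [ ! -e "$(lyrics_path "$audio" "$out_dir")" ]; then
      pending+=("$audio")
    fi
  done
  eval "$restore_glob"  # put nullglob back the way it was

  if [ "${#pending[@]}" -eq 0 ]; then
    echo "Nothing to transcribe in $songs_dir"
    return 0
  fi

  # A single whisper call for all pending files loads the model only once
  whisper "${pending[@]}" --model small --language en --output_format txt \
    --output_dir "$out_dir" --device cuda
}
```

For example, after saving it to a file of your choice and sourcing it, transcribe_folder ~/Music ./lyrics transcribes every new song in ~/Music. Running it again only processes songs added since the last run.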