Using Whisper for Real-Time Audio Translation on AMD Threadripper 3995X with AMD 6900XT

This blog post will guide you through setting up OpenAI's Whisper for real-time audio transcription and translation, running on an AMD Threadripper 3995X CPU and an AMD Radeon RX 6900 XT GPU. We'll also integrate text-to-speech (TTS) so the translation is spoken aloud. One thing to know up front: Whisper's built-in translation always outputs English.

Prerequisites

Hardware:
AMD Threadripper 3995X
AMD Radeon RX 6900XT (ROCm must be installed for GPU acceleration; the ROCm builds of PyTorch installed below are Linux-only).
Software:
Python 3.10 or higher
ROCm-compatible PyTorch
Required packages: openai-whisper, SpeechRecognition (with PyAudio for microphone input), gTTS, and playsound, plus the ffmpeg command-line tool, which Whisper uses to decode audio.

Step 1: Environment Setup

Install dependencies:

```bash
conda create --name whisper_env python=3.10
conda activate whisper_env
# ffmpeg decodes audio for Whisper; PyAudio is required by speech_recognition.Microphone
conda install -c conda-forge ffmpeg pyaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
pip install openai-whisper SpeechRecognition gTTS playsound
```

Verify GPU compatibility:

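```python
import torch
print(torch.cuda.is_available())  # True if ROCm is set up correctly (ROCm builds reuse the torch.cuda API)
print(torch.version.hip)          # ROCm/HIP version string on ROCm builds; None otherwise
```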
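Before moving on, a quick preflight check can catch the most common setup gaps: a package that failed to install, PyAudio missing (SpeechRecognition's Microphone class needs it), or ffmpeg not on the PATH. This is a minimal sketch; `missing_dependencies` is a hypothetical helper, and the module list simply mirrors the packages used in this guide under their import names.

```python
import importlib.util
import shutil

# Import names of the packages this guide relies on:
# openai-whisper -> whisper, SpeechRecognition -> speech_recognition, gTTS -> gtts.
# PyAudio (pyaudio) is required by speech_recognition.Microphone.
REQUIRED_MODULES = ["torch", "whisper", "speech_recognition", "pyaudio", "gtts", "playsound"]


def missing_dependencies() -> list:
    """Return names of required modules or tools that cannot be found."""
    # find_spec() locates a module without importing it.
    missing = [name for name in REQUIRED_MODULES if importlib.util.find_spec(name) is None]
    # Whisper runs the ffmpeg binary to decode audio files.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg (not on PATH)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {', '.join(problems)}")
```

An empty result means every package and the ffmpeg binary were found; it does not prove the GPU works, which is what the `torch.cuda.is_available()` check covers.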

Step 2: Real-Time Transcription and Translation

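Create a Python script (real_time_translator.py). Keep in mind that Whisper's translate task always produces English: the language argument tells Whisper which language is being spoken, not which language to output. The script below therefore treats the microphone input as Spanish and speaks the English translation back. Producing any other output language would require a separate text-translation step.

```python
import os
import tempfile

import speech_recognition as sr
import whisper
from gtts import gTTS
from playsound import playsound

# Load Whisper once. load_model() picks "cuda" when torch.cuda.is_available(),
# which on a ROCm build of PyTorch means the Radeon GPU.
model = whisper.load_model("large")


def transcribe_and_translate(audio_data, source_lang="es"):
    """Translate captured speech into English text.

    Whisper's "translate" task always outputs English; `language` is the
    SPOKEN language (pass None to let Whisper auto-detect it).
    """
    # Write the captured audio to a WAV file that Whisper (via ffmpeg) can read.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
        temp_audio.write(audio_data.get_wav_data())
        temp_audio_path = temp_audio.name
    try:
        result = model.transcribe(temp_audio_path, task="translate", language=source_lang)
        return result["text"].strip()
    finally:
        os.remove(temp_audio_path)  # delete=False means we clean up ourselves


def text_to_speech(text, lang="en"):
    """Synthesize `text` with gTTS (an online Google service) and play it."""
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_audio:
        temp_audio_path = temp_audio.name
    try:
        gTTS(text=text, lang=lang).save(temp_audio_path)
        playsound(temp_audio_path)  # blocks until playback finishes
    finally:
        os.remove(temp_audio_path)


def main():
    recognizer = sr.Recognizer()
    mic = sr.Microphone()

    # Calibrate once; calling this on every loop adds about 1 s per phrase.
    with mic as source:
        recognizer.adjust_for_ambient_noise(source)

    print("Listening...")
    while True:
        with mic as source:
            audio = recognizer.listen(source)
        try:
            translated_text = transcribe_and_translate(audio, source_lang="es")
            if translated_text:  # gTTS rejects empty text, so skip silent phrases
                print(f"Translated Text: {translated_text}")
                text_to_speech(translated_text, lang="en")
        except Exception as e:
            print(f"Error: {e}")


if __name__ == "__main__":
    main()
```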

Run the script:

```bash
python real_time_translator.py
```
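The script is strictly sequential: while Whisper and gTTS are working, nothing is listening, so speech during that time is missed. A common remedy is to decouple capture from processing with a queue and a worker thread. This is a minimal sketch using only the standard library; `start_worker` is a hypothetical helper, not part of any library used here.

```python
import queue
import threading
from typing import Any, Callable


def start_worker(process: Callable[[Any], None], work: "queue.Queue[Any]") -> threading.Thread:
    """Process queued items on a background thread until None is received."""

    def worker() -> None:
        while True:
            item = work.get()
            try:
                if item is None:  # sentinel value: stop the worker
                    return
                process(item)
            except Exception as exc:  # one bad phrase should not kill the loop
                print(f"Error: {exc}")
            finally:
                work.task_done()  # runs for normal items and for the sentinel

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

With the Step 2 functions, the queue could be fed by SpeechRecognition's `listen_in_background()`, which captures phrases on its own thread and passes each one to a callback: for example `audio_queue = queue.Queue()`, `start_worker(lambda audio: text_to_speech(transcribe_and_translate(audio)), audio_queue)`, then `stop = recognizer.listen_in_background(mic, lambda _recognizer, audio: audio_queue.put(audio))`, with `stop()` ending capture. Because the microphone stays open during playback, headphones help keep it from picking up the spoken translation.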

Step 3: Optimizing for AMD GPUs

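Ensure the ROCm libraries are correctly installed so PyTorch can use the AMD Radeon RX 6900XT.
ROCm builds of PyTorch expose the GPU through the torch.cuda API, so the "cuda" device refers to the 6900XT. Load the model with whisper.load_model("large", device="cuda") to place it on the GPU explicitly, then check next(model.parameters()).device, which should print cuda:0.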
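Beyond checking the device, timing a transcription is a practical way to see whether the GPU is actually helping. The sketch below uses hypothetical helper names (`timed`, `real_time_factor`) and computes the real-time factor: processing time divided by audio duration, where values below 1.0 mean audio is processed faster than it plays.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time / audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s
```

For example, run `_, secs = timed(lambda: model.transcribe("sample.wav", task="translate"))` and then `real_time_factor(secs, 10.0)` for a 10-second clip, once with a model loaded via `device="cuda"` and once with `device="cpu"`. The file name and clip length are placeholders, and the first GPU call usually carries warm-up overhead, so time a second run.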

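This setup delivers phrase-by-phrase transcription, translation, and spoken output with Whisper on AMD hardware. Each phrase is processed only after you stop speaking, so expect a noticeable delay rather than truly simultaneous output; the latency discussion below covers how to reduce it.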


When using Whisper for real-time audio transcription and translation on an AMD Threadripper 3995X with an AMD 6900XT, latency is a critical factor to consider. Here's an overview of latency performance and optimization:

Whisper Latency:
Whisper is not inherently designed for real-time transcription, but implementations like Whisper-Streaming achieve around 3.3 seconds latency for unsegmented long-form speech transcription[1].
Hosted, Whisper-based APIs go further: Gladia reports latency as low as 300 milliseconds for its real-time API[3].
Fireworks reports around 200 milliseconds for its Whisper-based API, achieved through its own inference optimizations[5]. Both run on the vendors' servers rather than on your local hardware.
Hardware Impact:
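The Threadripper's high core count and memory bandwidth (eight channels on the Threadripper PRO 3995WX reviewed in [2]) comfortably handle CPU-side work such as audio capture, ffmpeg decoding, and running Whisper on the CPU if the GPU is unavailable. Memory latency (about 100 ns on octa-channel setups[2]) is orders of magnitude smaller than the time a single Whisper pass takes, so it has no practical effect on real-time responsiveness.
The AMD 6900XT runs Whisper's encoder and decoder through ROCm-compatible PyTorch, which is where most of the processing time goes, so confirming that inference actually runs on the GPU typically matters more than any CPU tuning.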
Challenges and Recommendations:
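Latency in Whisper depends on how the input is chunked and how many tokens are emitted. For real-time use, feeding smaller audio segments (about 1-2 seconds) is recommended[7][9]. Note that Whisper's encoder always processes a 30-second window and pads shorter input with silence, so short chunks shorten the wait before transcription starts but do not reduce the encoder's work per pass, and very short chunks give the model less context.
Translation adds further delay: because word order differs between languages, accurate translation needs full-sentence context, so streaming setups often hold output until a sentence is complete, which can add 1-2 seconds depending on the language pair[9].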
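The chunking advice above can be sketched in a few lines. This assumes the audio is already a 16 kHz mono float32 NumPy array (the format `whisper.load_audio()` returns); `chunk_audio`, `encoder_fill`, and the 2 s chunk / 0.5 s overlap defaults are illustrative choices, not values taken from the cited sources. The overlap gives each chunk a little shared context with its neighbor.

```python
from typing import Iterator, Tuple

import numpy as np

SAMPLE_RATE = 16_000      # Whisper works on 16 kHz mono audio
WHISPER_WINDOW_S = 30.0   # the encoder always sees a 30 s (padded) window


def chunk_audio(
    samples: np.ndarray, chunk_s: float = 2.0, overlap_s: float = 0.5
) -> Iterator[Tuple[float, np.ndarray]]:
    """Yield (start_time_s, chunk) pairs of overlapping fixed-length chunks.

    The final chunk may be shorter than chunk_s.
    """
    chunk = int(chunk_s * SAMPLE_RATE)
    hop = chunk - int(overlap_s * SAMPLE_RATE)
    if chunk <= 0 or hop <= 0:
        raise ValueError("chunk_s must be positive and longer than overlap_s")
    for start in range(0, len(samples), hop):
        yield start / SAMPLE_RATE, samples[start:start + chunk]
        if start + chunk >= len(samples):
            break  # this chunk already reached the end of the audio


def encoder_fill(chunk_s: float) -> float:
    """Fraction of Whisper's fixed 30 s encoder window filled by real audio."""
    return min(chunk_s, WHISPER_WINDOW_S) / WHISPER_WINDOW_S
```

Each chunk can be passed straight to `model.transcribe(chunk, task="translate")`, since `transcribe()` accepts NumPy arrays as well as file paths. `encoder_fill(2.0)` makes the padding point concrete: a 2 s chunk fills only about 7% of the 30 s window.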

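In practice, running Whisper-Streaming locally on the 6900 XT, with the Threadripper handling audio capture and decoding, is the most direct route to low-latency transcription and translation on this hardware. Hosted APIs such as Gladia's or Fireworks' advertise lower latency but move processing off your machine.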
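Much of Whisper-Streaming's low latency comes from how it decides which words are safe to emit: it re-transcribes a growing audio buffer and commits only the words that consecutive hypotheses agree on (its LocalAgreement policy). The sketch below is a simplified, hypothetical version of that idea operating on plain word lists; the actual ufal/whisper_streaming code also uses word timestamps and trims the audio buffer, which this omits.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest shared prefix of two word lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


class LocalAgreement:
    """Commit words once two consecutive hypotheses agree on them.

    Each hypothesis is assumed to be the full transcript of the current audio
    buffer, re-generated every time new audio arrives.
    """

    def __init__(self) -> None:
        self.committed: List[str] = []
        self._previous: List[str] = []

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the newest hypothesis and return any newly committed words."""
        agreed = common_prefix(self._previous, hypothesis)
        self._previous = hypothesis
        n = len(self.committed)
        # Committed words are never retracted, so only extend a matching prefix.
        if len(agreed) <= n or agreed[:n] != self.committed:
            return []
        new_words = agreed[n:]
        self.committed.extend(new_words)
        return new_words
```

The trade-off mirrors the translation point above: waiting for agreement delays each word by roughly one update, in exchange for not emitting words Whisper would later revise.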

Sources
[1] ufal/whisper_streaming: Whisper realtime streaming for … - GitHub https://github.com/ufal/whisper_streaming
[2] AMD Threadripper Pro 3995WX Review: Ripping With 8 Memory … https://www.tomshardware.com/reviews/amd-threadripper-pro-3995wx-review/3
[3] Real-Time Audio Transcription API: What it is, How it works - Gladia https://www.gladia.io/blog/real-time-transcription-powered-by-whisper-asr
[4] AMD Ryzen Threadripper 3990X review (Page 29) - www.guru3d.com https://www.guru3d.com/review/amd-ryzen-threadripper-3990x-review/page-29/
[5] 20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour … https://fireworks.ai/blog/audio-transcription-launch
[6] AMD Threadripper Pro 3995WX Review: Ripping With 8 Memory … https://www.reddit.com/r/hardware/comments/kymsw3/amd_threadripper_pro_3995wx_review_ripping_with_8/
[7] Can Whisper be used for real-time speech to text? - Hugging Face https://huggingface.co/spaces/openai/whisper/discussions/76
[8] How to maximize Threadripper Pro 3995WX memory bandwidth? https://forum.level1techs.com/t/how-to-maximize-threadripper-pro-3995wx-memory-bandwidth/171367
[9] Whisper Real-time Language Translation : r/OpenAIDev - Reddit https://www.reddit.com/r/OpenAIDev/comments/15yqx9w/whisper_realtime_language_translation/

Imported from rifaterdemsahin.com · 2025
