Speech-to-Text in Python: Tools, Code, and Comparisons

By manhnv, at: Nov. 26, 2024, 11:17 a.m.



Converting speech into text is a core feature of many applications, from personal assistants to transcription tools. In this post, we'll explore four popular options for speech-to-text in Python: SpeechRecognition, Google Cloud Speech-to-Text, Azure Speech Service, and Whisper by OpenAI. We'll provide sample code for each and compare them on accuracy, speed, and cost.

 

1. SpeechRecognition


Overview

  • Type: Open-source
     
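  • Dependencies: PyAudio for live microphone input; alternatively, pre-recorded WAV, AIFF, or FLAC files
     
  • Backend: Wraps several recognition engines, including the Google Web Speech API (used below, no account required) and CMU Sphinx for offline use (via the pocketsphinx package)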
     
  • Ideal for: Quick prototyping, hobby projects


Code Snippet

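import speech_recognition as sr  # pip install SpeechRecognition pyaudio

recognizer = sr.Recognizer()

# Capture one phrase from the default microphone (requires PyAudio)
with sr.Microphone() as source:
    # Calibrate the energy threshold to the room's background noise
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    print("Listening...")
    audio = recognizer.listen(source)

try:
    # Send the captured audio to the Google Web Speech API
    text = recognizer.recognize_google(audio, language="en-US")
    print("You said:", text)
except sr.UnknownValueError:
    # The speech was unintelligible
    print("Could not understand the audio.")
except sr.RequestError as e:
    # Network failure or the API rejected the request
    print(f"API error: {e}")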
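The overview lists pre-recorded files as an alternative to a microphone. A minimal sketch of that path follows, assuming a local WAV, AIFF, or FLAC file (SpeechRecognition's AudioFile class does not read MP3). The helper name transcribe_file is hypothetical, and the import sits inside the function only so the module loads even where SpeechRecognition is not installed.

```python
from typing import Optional


def transcribe_file(path: str, language: str = "en-US") -> Optional[str]:
    """Transcribe a WAV, AIFF, or FLAC file with the Google Web Speech API.

    Returns None when the speech cannot be understood; sr.RequestError
    (network or API failure) is left for the caller to handle.
    """
    # Imported here so this module loads even without SpeechRecognition installed
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile opens the file; record() with no duration reads all of it
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
```

Usage, with a hypothetical file name: print(transcribe_file("meeting.wav")).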

 

 

2. Google Cloud Speech-to-Text


Overview

  • Type: Cloud-based API
     
  • Dependencies: Google Cloud project with the Speech-to-Text API enabled, credentials, and the google-cloud-speech package
     
  • Ideal for: High accuracy, diverse language support
     

Code Snippet

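from google.cloud import speech  # pip install google-cloud-speech

# Authenticates with Application Default Credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS)
client = speech.SpeechClient()

with open("audio.wav", "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # 16-bit PCM
    sample_rate_hertz=16000,  # must match the WAV file's sample rate
    language_code="en-US",
)

# Synchronous recognition, intended for short clips (about one minute or less)
response = client.recognize(config=config, audio=audio)

# Each result covers a consecutive portion of the audio; print the top alternative
for result in response.results:
    print("Transcript:", result.alternatives[0].transcript)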
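The config above hardcodes sample_rate_hertz=16000, which only fits files recorded at that rate. One way to avoid a mismatch, sketched here with Python's standard wave module, is to read the rate and channel count from the WAV header. The helper linear16_params is hypothetical; it also rejects files that are not 16-bit PCM, since that is what the LINEAR16 encoding describes.

```python
import wave


def linear16_params(path: str) -> dict:
    """Read RecognitionConfig fields from a 16-bit PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample
        if sample_width != 2:
            raise ValueError(
                f"{path} uses {8 * sample_width}-bit samples; LINEAR16 expects 16-bit PCM"
            )
        return {
            "sample_rate_hertz": wav.getframerate(),
            "audio_channel_count": wav.getnchannels(),
        }
```

The returned dictionary can be unpacked into the config, e.g. speech.RecognitionConfig(encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, language_code="en-US", **linear16_params("audio.wav")). Google's documentation also indicates that the sample rate is optional for WAV and FLAC files, so omitting the field is another option worth checking.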

 

 

3. Azure Speech Service


Overview

  • Type: Cloud-based API
     
  • Dependencies: Azure subscription with a Speech resource (key and region) and the azure-cognitiveservices-speech package
     
  • Ideal for: Enterprise applications, integration with Microsoft ecosystem
     

Code Snippet

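import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

speech_key = "YOUR_AZURE_SPEECH_KEY"
service_region = "YOUR_SERVICE_REGION"  # e.g. "westeurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Without an audio_config, the recognizer listens on the default microphone
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Speak something...")
# recognize_once() returns after a single utterance
result = speech_recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized.")
elif result.reason == speechsdk.ResultReason.Canceled:
    details = result.cancellation_details
    print("Canceled:", details.reason, details.error_details)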
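Because recognize_once() stops after one utterance, it is a poor fit for longer recordings. The Speech SDK also supports event-driven continuous recognition. The sketch below assumes a local WAV file and the same key and region as above; transcribe_file_continuous is a hypothetical helper, and the SDK import sits inside the function so the module loads without the SDK installed.

```python
import threading


def transcribe_file_continuous(path: str, key: str, region: str) -> str:
    """Transcribe a whole WAV file with Azure continuous recognition."""
    # Imported here so this module loads even without the Speech SDK installed
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    pieces = []
    done = threading.Event()

    def on_recognized(evt):
        # Fires once per recognized utterance
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            pieces.append(evt.result.text)

    def on_stop(evt):
        # The session stops at end of file; cancellation also covers errors,
        # which this sketch does not report
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait()  # block until the SDK reports that the session is over
    recognizer.stop_continuous_recognition()
    return " ".join(pieces)
```

A production version would inspect the canceled event's cancellation details instead of silently returning a partial transcript.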

 

 

4. Whisper by OpenAI


Overview

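  • Type: Open-source, runs locally (works offline once the model is downloaded)
     
  • Dependencies: openai-whisper package, ffmpeg, and a pre-trained Whisper model (downloaded on first use)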
     
  • Ideal for: Offline transcription, high accuracy for varied accents
     

Code Snippet

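import whisper  # pip install openai-whisper (ffmpeg must also be installed)

# Available sizes include "tiny", "base", "small", "medium", and "large";
# larger models are more accurate but slower and need more memory
model = whisper.load_model("base")

# transcribe() decodes the file with ffmpeg and detects the language automatically
result = model.transcribe("audio.mp3")
print("Transcription:", result["text"])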
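Besides the full text, the dictionary returned by transcribe() contains a "segments" list whose entries carry "start" and "end" times in seconds plus the segment "text". A small sketch that turns those segments into SRT subtitles follows; format_timestamp and segments_to_srt are hypothetical helper names. The Whisper command-line tool can also write SRT directly with its --output_format srt option.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from Whisper-style segments with start, end, and text keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Whisper segment text usually begins with a space, so strip it
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # SRT cues are separated by a blank line
    return "\n".join(blocks)
```

Usage: after result = model.transcribe("audio.mp3"), write segments_to_srt(result["segments"]) to a file such as audio.srt opened with encoding="utf-8".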

 

 

Comparison Table

[Figure: Speech to text - services comparison]

 

Performance Insights

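  1. Accuracy:

    • Whisper and Google Cloud Speech-to-Text generally handle diverse accents and noisy recordings well.
       
    • SpeechRecognition's accuracy depends on the engine it calls; with the free Google Web Speech API, it can struggle with unclear audio.
       
  2. Speed:

    • The cloud services (Google and Azure) and SpeechRecognition's online engines return results quickly and are better suited to real-time tasks.
       
    • Whisper runs locally, so its speed depends on model size and hardware (larger models are slow without a GPU). In exchange, audio never leaves your machine, which makes it valuable for privacy-focused applications.
       
  3. Cost:

    • SpeechRecognition (with the free Google Web Speech API) and Whisper (compute costs only) charge no service fees, which suits smaller projects.
       
    • Cloud services (Google and Azure) bill by audio duration: an ongoing cost that pays for managed scaling and high accuracy on large workloads.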
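Accuracy comparisons like these are usually quantified with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a transcript into a reference transcript, divided by the number of reference words. To compare the tools on your own recordings, here is a minimal sketch computing WER via word-level edit distance. word_error_rate is a hypothetical helper that only lowercases and splits on whitespace, so punctuation differences count as errors; packages such as jiwer offer fuller text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # ref word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sit on mat" gives one substitution and one deletion, so WER = 2/6. In practice you would pass a hand-checked transcript as the reference and each tool's output (e.g. Whisper's result["text"]) as the hypothesis.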

 

Final Recommendations

  • Use SpeechRecognition for quick and small-scale projects.
     
  • Choose Google Cloud Speech-to-Text or Azure Speech Service for enterprise applications that need high accuracy and broad language support.
     
  • Opt for Whisper if offline capabilities or privacy are priorities.

 

