How to Extract Text from Images Using Python
By hientd, at: April 5, 2024, 8:17 p.m.
Extracting text from images, a process known as Optical Character Recognition (OCR), has numerous applications, from digitizing printed documents to reading street signs in real time. Python, with its rich ecosystem of libraries and APIs, offers several options for OCR. This article explores three popular Python libraries and three cloud APIs for extracting text from images.
Python Libraries for OCR
1. pytesseract
- Description: A wrapper for Google's Tesseract-OCR Engine.
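- Code Snippet: install the Python packages first. pytesseract is only a wrapper, so the Tesseract engine itself must also be installed and available on your PATH.
pip install pytesseract pillow
from PIL import Image
import pytesseract
# Open the image with Pillow and let Tesseract return the recognized text
text = pytesseract.image_to_string(Image.open('image.jpg'))
print(text)
- Pros: Free and open-source, supports multiple languages.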
- Cons: Can struggle with images containing complex layouts.
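For images with more complex layouts, it can help to ask Tesseract for word-level data instead of one flat string. The sketch below assumes the dictionary returned by pytesseract.image_to_data(..., output_type=Output.DICT) and regroups words into lines while dropping low-confidence words. The helper names lines_from_data and ocr_lines, and the cutoff of 60, are illustrative choices rather than part of pytesseract.

```python
from collections import OrderedDict


def lines_from_data(data, min_conf=60.0):
    """Rebuild text lines from a pytesseract image_to_data dictionary.

    `data` is expected to hold parallel lists under the keys 'text', 'conf',
    'block_num', 'par_num' and 'line_num'. `min_conf` is an illustrative
    cutoff on Tesseract's 0-100 confidence score.
    """
    lines = OrderedDict()
    for i, word in enumerate(data["text"]):
        # Skip empty rows (pages, blocks, lines) and low-confidence words.
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(word)
    return [" ".join(words) for words in lines.values()]


def ocr_lines(path, min_conf=60.0):
    """Run Tesseract on an image file and return its confident lines."""
    # Imported here so lines_from_data can be used without Tesseract installed.
    from PIL import Image
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(Image.open(path), output_type=Output.DICT)
    return lines_from_data(data, min_conf)
```

Calling ocr_lines('image.jpg') should return a list of strings, one per detected line. Lowering min_conf keeps more words at the cost of more noise.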
2. EasyOCR
- Description: A more recent, deep-learning-based library that supports over 80 languages and is designed for simplicity.
- Code Snippet: install via pip
pip install easyocr
import easyocr
reader = easyocr.Reader(['en'])
results = reader.readtext('image.jpg')
print(results)
- Pros: Easy to use, good performance on various image types.
- Cons: Larger size due to its deep learning models.
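The list printed above contains one entry per detected text region, each holding a bounding box, the recognized text, and a confidence between 0 and 1. A minimal sketch of turning that list into plain text might look like this. The helper names confident_text and read_image and the 0.5 threshold are assumptions made for illustration, not part of EasyOCR.

```python
def confident_text(results, min_conf=0.5):
    """Join the text of EasyOCR detections whose confidence is high enough.

    `results` is the list returned by reader.readtext(): each item holds
    (bounding_box, text, confidence), with confidence between 0 and 1.
    """
    kept = [text for _bbox, text, conf in results if conf >= min_conf]
    return "\n".join(kept)


def read_image(path, languages=("en",), min_conf=0.5):
    """Run EasyOCR on an image file and return the confident text."""
    # Imported lazily: importing easyocr also loads its deep learning stack.
    import easyocr

    reader = easyocr.Reader(list(languages))
    return confident_text(reader.readtext(path), min_conf)
```

If only the raw strings are needed and confidence does not matter, reader.readtext('image.jpg', detail=0) returns the text without boxes or scores.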
3. OCRopus
- Description: An OCR suite written in Python, focusing on historical document recognition.
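- Code Snippet: OCRopus is driven from the command line and recognizes text one line at a time, so a page is binarized and segmented before recognition. The model path below is an example; point it at whichever trained model you have.
# Shell commands (not Python): binarize, segment into lines, then recognize
ocropus-nlbin image.jpg -o book
ocropus-gpageseg 'book/????.bin.png'
ocropus-rpred -m models/en-default.pyrnn.gz 'book/????/??????.bin.png'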
- Pros: Good for historical documents, open-source.
- Cons: Less effective for modern text layouts, command-line based.
Cloud APIs for OCR
1. Microsoft OCR
- Description: Azure AI Document Intelligence (formerly Form Recognizer), a part of Azure AI designed for understanding, processing, and extracting information from documents.
- SDK: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/quickstarts/get-started-sdks-rest-api?view=doc-intel-4.0.0&preserve-view=true&pivots=programming-language-python
- Pros: Deep integration with other Microsoft services.
- Cons: Can be complex to set up for beginners.
2. Amazon Textract
- Description: Extracts text and data from scanned documents with machine learning.
- SDK: https://docs.aws.amazon.com/code-library/latest/ug/python_3_textract_code_examples.html
- Pros: Can process a large volume of documents, supports forms and tables.
- Cons: Usage costs can add up for large-scale applications.
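As a rough sketch of what a basic Textract call can look like from Python, the example below uses boto3's detect_document_text on a local image and keeps only the LINE blocks from the response. It assumes AWS credentials are already configured; the region default and the helper names lines_from_textract and detect_lines are placeholders chosen for illustration.

```python
def lines_from_textract(response):
    """Return the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(path, region_name="us-east-1"):
    """Send a local image to Textract and return its lines of text."""
    # Imported lazily so the parsing helper works without boto3 installed.
    import boto3

    # region_name is a placeholder; use the region of your own account.
    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return lines_from_textract(response)
```

Forms and tables go through a different Textract operation, so this sketch only covers plain text detection.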
3. Google Cloud Vision API
- Description: Provides powerful image analysis capabilities including text detection.
- SDK: https://cloud.google.com/python/docs/reference/vision/latest
- Pros: Highly accurate, easy to integrate with other Google services.
- Cons: Pricing can be a concern for high-volume users.
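A minimal text detection sketch with the google-cloud-vision client could look like the following. It assumes application default credentials are set up for a project with the Vision API enabled; the helper names full_text and detect_text are illustrative and not part of the SDK.

```python
def full_text(response):
    """Return the full detected text from a text_detection response."""
    if response.error.message:
        raise RuntimeError(response.error.message)
    annotations = response.text_annotations
    # The first annotation covers the whole image; later ones are single words.
    return annotations[0].description if annotations else ""


def detect_text(path):
    """Send a local image to the Vision API and return the detected text."""
    # Imported lazily so full_text can be used without the SDK installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    return full_text(client.text_detection(image=image))
```

For dense pages such as scanned documents, the client also offers document_text_detection, which may be a better fit than text_detection.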
Conclusion
Python's OCR ecosystem offers a versatile range of libraries and cloud APIs, each with its own strengths and weaknesses, covering use cases from simple text extraction to complex document analysis. Whether you're working with historical manuscripts or modern documents, there is a tool that fits. Choosing the right one depends on your specific needs, including accuracy, language support, cost, and ease of integration.
Since ChatGPT is now available through Azure services, Microsoft's Document Intelligence (or its OCR service) currently seems to be the best choice.