How to Write a PDF File Using Python: A Comprehensive Guide
By JoeVu, at: Nov. 30, 2023, 8:58 p.m.
Introduction
Python provides a versatile set of libraries for working with PDF files, enabling developers to create, modify, and manipulate PDF documents effortlessly. In this comprehensive guide, we will explore various Python libraries for writing PDF files, covering fundamental operations and showcasing the capabilities of popular libraries such as PyPDF2, ReportLab, pdfkit, pdfrw and PyMuPDF.
There is another blog post about reading pdf files
1. PyPDF2: Basic PDF Writing
Installation: pip install PyPDF2
Basic PDF Writing Example:
import PyPDF2
with open('output_pypdf2.pdf', 'wb') as output_file:
pdf_writer = PyPDF2.PdfFileWriter()
# Add a new page to the PDF
page = pdf_writer.addPage()
# Set content for the page
page.mergePage(PyPDF2.PdfReader('input.pdf').getPage(0))
# Encrypt the PDF (optional)
pdf_writer.encrypt('password')
# Write the modified PDF to the output file
pdf_writer.write(output_file)
This example demonstrates creating a new PDF file, adding a page, setting its content, and optionally encrypting the PDF using PyPDF2.
2. ReportLab: Advanced PDF Generation
Installation: pip install reportlab
Advanced PDF Generation Example:
from reportlab.pdfgen import canvas
def generate_pdf(file_path):
# Create a new PDF document
pdf_canvas = canvas.Canvas(file_path)
# Add content to the PDF
pdf_canvas.drawString(100, 750, "Hello, ReportLab!")
# Save the PDF
pdf_canvas.save()
# Generate a PDF using ReportLab
generate_pdf('output_reportlab.pdf')
In this example, ReportLab is used to generate a simple PDF document with text content at specified coordinates.
3. pdfkit: HTML to PDF Conversion
Installation: pip install pdfkit
HTML to PDF Conversion Example:
import pdfkit
# Convert HTML to PDF
pdfkit.from_file('input.html', 'output_pdfkit.pdf')
pdfkit leverages the wkhtmltopdf library to convert HTML content to PDF. In this example, we convert an HTML file to a PDF document.
4. PyMuPDF: PDF Writing with Text and Images
Installation: pip install PyMuPDF
PDF Writing with Text and Images Example:
import fitz # PyMuPDF
def write_pdf_with_text_and_images(file_path):
# Create a new PDF document
doc = fitz.open()
# Add a page to the PDF
page = doc.newPage()
# Add text to the page
page.insertText((100, 100), "Hello, PyMuPDF!")
# Add an image to the page
image_path = 'image.jpg'
image_rect = fitz.Rect(100, 200, 300, 400)
page.insertImage(image_rect, filename=image_path)
# Save the PDF
doc.save(file_path)
doc.close()
# Write a PDF with text and images using PyMuPDF
write_pdf_with_text_and_images('output_pymupdf.pdf')
In this PyMuPDF example, we create a new PDF, add a page, insert text and an image, and save the resulting document.
5. pdfrw: Low-Level PDF Writing
Installation: pip install pdfrw
Basic PDF Writing Example:
from pdfrw import PdfWriter, PdfDict, IndirectPdfDict
def write_pdf_with_pdfrw(file_path):
# Create a new PDF document
pdf_writer = PdfWriter()
# Add a page to the PDF
page = PdfDict(
Type='/Page',
Parent=IndirectPdfDict(),
Contents=IndirectPdfDict(stream="q 1 0 0 1 72 720 cm /F1 12 Tf (Hello, pdfrw!) Tj Q"),
)
# Set the page size
page.MediaBox = [0, 0, 612, 792]
# Add the page to the PDF
pdf_writer.pages.append(page)
# Save the PDF
with open(file_path, 'wb') as output_file:
pdf_writer.write(output_file)
# Write a PDF with pdfrw
write_pdf_with_pdfrw('output_pdfrw.pdf')
In this example, pdfrw
is used for low-level PDF writing. We create a new PDF document, add a page, set its content (text), and save the resulting document. pdfrw
provides flexibility for direct manipulation of PDF objects, making it suitable for scenarios requiring precise control over PDF structure and content.
Conclusion
Python offers a rich ecosystem of libraries for writing PDF files, each catering to different needs and scenarios. Whether you're performing basic operations with PyPDF2, generating sophisticated documents with ReportLab, converting HTML content with pdfkit, or combining text and images with PyMuPDF, you have a plethora of tools at your disposal.
Choose the library that best aligns with your specific requirements, and dive into the world of PDF generation and customization with Python. As you explore these libraries, you'll discover the flexibility and power they bring to your PDF-related projects. Happy coding!