Python Generator: What is this?

By khoanc, at: Nov. 3, 2023, 11:02 a.m.



1. What is a Generator?

A Python generator is a special kind of iterator that lets you iterate over a potentially large sequence of items without holding the entire sequence in memory. Unlike lists or tuples, generators don't store all their values at once; they produce each value on the fly as you iterate over them. Generators are defined using functions that contain the yield keyword.
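As a rough illustration of the memory difference, the sketch below compares a list with a generator expression (the inline counterpart of a yield function, not covered above). The variable names are made up for this example, and sys.getsizeof measures only the container object, not the integers it refers to, so treat the numbers as indicative.

```python
import sys

# A list comprehension builds every value up front.
squares_list = [n * n for n in range(100_000)]

# A generator expression with the same logic produces values on demand.
squares_gen = (n * n for n in range(100_000))

# Size of the container objects only (the list holds 100,000 references).
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)  # True

# Both forms produce the same values when consumed.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True
```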

 

2. Why are Generators Important?

Generators are essential for several reasons:

  • Memory Efficiency: Generators are memory-efficient because they don't load the entire sequence into memory. This is crucial when working with large datasets.

  • Lazy Evaluation: They support lazy evaluation, meaning values are generated only when needed, so no time or memory is spent on values that are never consumed.

  • Endless Sequences: Generators can represent endless sequences, allowing you to work with data streams that don't have a defined endpoint.

  • Simplifying Code: They make code more concise and readable by separating the data generation logic from the iteration logic.
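To illustrate the endless-sequence point from the list above, here is a minimal sketch. The naturals function is a hypothetical name chosen for this example; itertools.islice from the standard library takes a finite slice so the loop terminates.

```python
from itertools import islice

def naturals(start=0):
    """Yield start, start + 1, start + 2, ... without ever stopping."""
    n = start
    while True:
        yield n
        n += 1

# islice stops after five items, so the infinite generator is never fully consumed.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

Only the five requested values are ever computed; the generator simply pauses after the last one.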

An important use case for generators is reading a huge file piece by piece.

A common approach is to read every line into a list:

with open(filename) as file:
    lines = [line.rstrip() for line in file]


This loads the whole file into memory at once, which is slow for large files and can exhaust memory. A generator that reads the file in fixed-size chunks avoids the problem:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)  # process_data is a placeholder for your own handler
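For text files, a line-oriented variant may be more convenient, because a file object already yields its lines lazily. The sketch below assumes a UTF-8 text file; read_lines and the temporary sample file are hypothetical names used only to keep the example runnable.

```python
import os
import tempfile

def read_lines(path):
    """Yield one line at a time, with the trailing newline removed."""
    with open(path, encoding="utf-8") as file:
        for line in file:  # the file object reads lines lazily
            yield line.rstrip("\n")

# Build a small throwaway file so the sketch can run anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\ngamma-delta\n")
    sample_path = tmp.name

# Each pass holds only one line in memory at a time.
line_count = sum(1 for _ in read_lines(sample_path))
longest = max(read_lines(sample_path), key=len)
os.remove(sample_path)

print(line_count)  # 3
print(longest)     # gamma-delta
```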

 

3. How to Use Generators

To create and use a generator:

  • Define a function that contains the yield keyword.
  • Calling the function doesn't run its body; it returns a generator object, and the body runs only as you iterate over it.
  • Each yield hands a value to the caller and pauses the function; its local state is retained until the next value is requested.
 

def number_generator(n):
    for i in range(n):
        yield i

gen = number_generator(5)
for num in gen:
    print(num)  # Prints 0, 1, 2, 3, 4 (one per line)
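The for loop hides what happens at each step. The sketch below drives a variant of number_generator by hand with the built-in next(); the events list is a hypothetical addition that records when the function body actually runs.

```python
events = []

def number_generator(n):
    events.append("body started")   # runs on the first next(), not at call time
    for i in range(n):
        yield i
    events.append("body finished")

gen = number_generator(2)
started_at_call = bool(events)  # False: calling only creates the generator object

first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes right after the previous yield

try:
    next(gen)       # no values left: the body finishes and StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

print(started_at_call, first, second, exhausted)  # False 0 1 True
```

A for loop performs the same next() calls and treats StopIteration as the signal to stop.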

 

4. Issues with Generators

While generators offer memory efficiency, they come with trade-offs:

  • Slower Access: Generators don't support indexing, so reaching the n-th element means generating every element before it, whereas a list gives direct access by index.

  • One-Time Iteration: Generators are typically one-time use. Once the generator is exhausted, you cannot rewind it, unlike lists that you can iterate over repeatedly.

  • State Management: Managing the state of a generator and understanding when it gets exhausted can be challenging.

  • Limited Use Cases: Generators are most suitable for sequential data, making them less suitable for random access or complex data manipulation tasks.
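The one-time iteration and access limitations listed above can be seen in a few lines. This sketch reuses number_generator from the previous section; itertools.islice is one possible way to skip ahead, not the only one.

```python
from itertools import islice

def number_generator(n):
    for i in range(n):
        yield i

gen = number_generator(5)
first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # the generator is exhausted, so nothing is left

# Indexing such as gen[2] would raise TypeError. islice can skip ahead,
# but it still generates and discards the earlier values.
third_item = next(islice(number_generator(5), 2, None))

# For a repeat pass, create a fresh generator (or keep a list of the results).
fresh_pass = list(number_generator(5))

print(first_pass)   # [0, 1, 2, 3, 4]
print(second_pass)  # []
print(third_item)   # 2
```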

 

5. Libraries that Use Generators Extensively for Performance

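  • Python Standard Library: Many built-ins and standard modules produce values lazily. zip(), enumerate(), map(), and filter() return lazy iterators, and range() returns a lazy sequence object. Strictly speaking none of these are generators, but they share the same on-demand behavior. The itertools module provides building blocks for lazy pipelines, and asyncio's coroutines grew out of generator-based coroutines.

  • Third-Party Libraries: Libraries like Dask apply the same lazy-evaluation idea to high-performance data processing, deferring computation until a result is requested.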
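As a small sketch of how these lazy tools combine, the example below builds a pipeline from itertools.count, itertools.takewhile, and enumerate. The threshold of 50 and the variable names are arbitrary choices for illustration.

```python
import itertools

# An infinite, lazy stream of squares: 1, 4, 9, 16, ...
squares = (n * n for n in itertools.count(1))

# takewhile stops the stream at the first square that is not below 50.
small_squares = itertools.takewhile(lambda s: s < 50, squares)

# Nothing has been computed yet; list() finally pulls values through the pipeline.
labelled = list(enumerate(small_squares, start=1))
print(labelled)
# [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49)]
```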

 

6. Difference between Iterator and Generator

Generators are a type of iterator, but there are key differences:

  • Iterator: An iterator is the more general concept: any object that follows the iterator protocol (with __iter__() and __next__() methods). Iterators can be written as classes and don't necessarily involve lazy evaluation.

  • Generator: A generator is a specific type of iterator created using functions with the yield keyword. They are explicitly designed for lazy evaluation and are typically more memory-efficient.

Iterator Example

class MyIterator:
    def __init__(self, max_val):
        self.max_val = max_val
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < self.max_val:
            result = self.current
            self.current += 1
            return result
        else:
            raise StopIteration

my_iter = MyIterator(5)
for num in my_iter:
    print(num)  # Prints 0, 1, 2, 3, 4 (one per line)

 

Generator Example

def number_generator(n):
    for i in range(n):
        yield i

gen = number_generator(5)
for num in gen:
    print(num)  # Prints 0, 1, 2, 3, 4 (one per line)
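The relationship between the two can also be checked at runtime. This sketch uses the abstract base classes in collections.abc; the variable names are chosen for illustration only.

```python
from collections.abc import Generator, Iterator

def number_generator(n):
    for i in range(n):
        yield i

gen = number_generator(3)

# A generator object implements the iterator protocol for you.
is_iterator = isinstance(gen, Iterator)
is_generator = isinstance(gen, Generator)
iter_returns_self = iter(gen) is gen

# A list iterator is an iterator too, but it is not a generator.
list_iter = iter([1, 2, 3])
list_iter_is_iterator = isinstance(list_iter, Iterator)
list_iter_is_generator = isinstance(list_iter, Generator)

print(is_iterator, is_generator, iter_returns_self)   # True True True
print(list_iter_is_iterator, list_iter_is_generator)  # True False
```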

 

In summary, generators are a specialized form of iterators, tailored for memory efficiency and lazy evaluation, making them ideal for sequential data and large datasets.

