How to Create an Index in Django Without Causing Downtime

By JoeVu, at: 2024年2月10日18:05

Estimated Reading Time: 5 min read

How to Create an Index in Django Without Causing Downtime
How to Create an Index in Django Without Causing Downtime

How to Create an Index in Django Without Causing Downtime

 

Problem Introduction

Recently, I have had a small problem with Django migration. The current situation is:

  1. The application has a big table (100 millions rows), which is usually queried but just a few insertions per day.
  2. Querying the table takes a lot of time, I analyzed and it turned out the problem is due to there wasis_ no index solutions.
  3. I decided to add the index to some querying columns
    • created_date: DATETIME
    • is_checked: BOOLEAN

The problem is: adding indexes to this table took a lot of time in my machine (16GB RAM, 8 cores). As it requires exclusive lock on the table and a huge load to the database memory. 

As a result, the deployment process would take hours.

 

Understanding the Challenge

Adding indexes is a common database optimization task that becomes necessary as your dataset grows. However, index creation often requires an exclusive lock on the table, preventing data modification operations and potentially causing significant downtime for large tables.

A quick solution came to my mind: Some databases offer syntax for creating indexes without locking the table, such as CREATE INDEX CONCURRENTLY in PostgreSQL or CREATE INDEX ... ONLINE in Oracle. Unfortunately, Django's migration system does not automatically utilize these features.

How do we resolve this issue???

 

Solution Overview

This tutorial will guide you through the process of adding an index to a large table in Django without causing downtime, focusing on:

  • Inspecting Django-generated migration commands.
  • Modifying migrations to avoid downtime.
  • Utilizing database-specific syntax for non-locking index creation.

 

Step-by-Step Guide


1. Initial Setup

Assume a simple ABTesting model in your Django app:

# models.py
from django.db import models

class ABTesting(models.Model):
    created_date = models.DateTimeField(auto_now_add=True)
    is_checked = models.Boolean()


Create and apply the initial migration:

python manage.py makemigrations
python manage.py migrate


2. Adding an Index

Modify the ABTesting model to add an index on created_date:

# models.py
from django.db import models

class ABTesting(models.Model):
    created_date = models.DateTimeField(auto_now_add=True, db_index=True)  # adding index
    is_checked = models.Boolean()


Generate and inspect the migration without applying it:

python manage.py makemigrations --name add_index_without_lock

 

3. Modifying the Migration for a Non-locking Index Creation

To avoid locking the table, modify the generated migration file by the SQL command to include database-specific syntax, such as CONCURRENTLY for PostgreSQL:

CREATE INDEX CONCURRENTLY ix_abtesting_created_date ON app_abtesting (created_date);


Apply the modified migration using Django's RunSQL operation within a new migration file as follow:

# migrations/0002_add_index_without_lock.py
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('app', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            "CREATE INDEX CONCURRENTLY ix_abtesting_created_date ON app_abtesting (created_date);",
            reverse_sql="DROP INDEX CONCURRENTLY ix_abtesting_created_date;"
        ),
    ]

 

4. Applying and Faking Migrations

Manually apply the SQL command in your database, then mark the migration as applied in Django:

python manage.py migrate app 0002_add_index_without_lock


After adding the index, the queries, which are based on created_date indexing, run much faster.

 

Conclusion

By following the steps outlined, I was able to add indexes to our Django project without causing downtime, ensuring the application remains responsive even during schema changes. This approach requires a good understanding of both Django's migration system and your database's capabilities, highlighting the importance of database state and model state synchronization.

This tutorial has demonstrated:

  • The significance of inspecting and modifying Django-generated migrations.
  • Techniques for executing custom SQL in migrations, including non-locking index creation.
  • The concept of atomic and non-atomic migrations and their implications.

As you continue developing with Django, remember that understanding the underlying database and Django's abstraction layer is key to performing efficient and safe schema changes.