How to Create an Index in Django Without Causing Downtime
By JoeVu, at: 2024年2月10日18:05
How to Create an Index in Django Without Causing Downtime
Problem Introduction
Recently, I have had a small problem with Django migration. The current situation is:
- The application has a big table (100 millions rows), which is usually queried but just a few insertions per day.
- Querying the table takes a lot of time, I analyzed and it turned out the problem is due to there wasis_ no index solutions.
- I decided to add the index to some querying columns
created_date
: DATETIMEis_checked
: BOOLEAN
The problem is: adding indexes to this table took a lot of time in my machine (16GB RAM, 8 cores). As it requires exclusive lock on the table and a huge load to the database memory.
As a result, the deployment process would take hours.
Understanding the Challenge
Adding indexes is a common database optimization task that becomes necessary as your dataset grows. However, index creation often requires an exclusive lock on the table, preventing data modification operations and potentially causing significant downtime for large tables.
A quick solution came to my mind: Some databases offer syntax for creating indexes without locking the table, such as CREATE INDEX CONCURRENTLY
in PostgreSQL or CREATE INDEX ... ONLINE
in Oracle. Unfortunately, Django's migration system does not automatically utilize these features.
How do we resolve this issue???
Solution Overview
This tutorial will guide you through the process of adding an index to a large table in Django without causing downtime, focusing on:
- Inspecting Django-generated migration commands.
- Modifying migrations to avoid downtime.
- Utilizing database-specific syntax for non-locking index creation.
Step-by-Step Guide
1. Initial Setup
Assume a simple ABTesting
model in your Django app:
# models.py
from django.db import models
class ABTesting(models.Model):
created_date = models.DateTimeField(auto_now_add=True)
is_checked = models.Boolean()
Create and apply the initial migration:
python manage.py makemigrations
python manage.py migrate
2. Adding an Index
Modify the ABTesting
model to add an index on created_date
:
# models.py
from django.db import models
class ABTesting(models.Model):
created_date = models.DateTimeField(auto_now_add=True, db_index=True) # adding index
is_checked = models.Boolean()
Generate and inspect the migration without applying it:
python manage.py makemigrations --name add_index_without_lock
3. Modifying the Migration for a Non-locking Index Creation
To avoid locking the table, modify the generated migration file by the SQL command to include database-specific syntax, such as CONCURRENTLY
for PostgreSQL:
CREATE INDEX CONCURRENTLY ix_abtesting_created_date ON app_abtesting (created_date);
Apply the modified migration using Django's RunSQL
operation within a new migration file as follow:
# migrations/0002_add_index_without_lock.py
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
('app', '0001_initial'),
]
operations = [
migrations.RunSQL(
"CREATE INDEX CONCURRENTLY ix_abtesting_created_date ON app_abtesting (created_date);",
reverse_sql="DROP INDEX CONCURRENTLY ix_abtesting_created_date;"
),
]
4. Applying and Faking Migrations
Manually apply the SQL command in your database, then mark the migration as applied in Django:
python manage.py migrate app 0002_add_index_without_lock
After adding the index, the queries, which are based on created_date
indexing, run much faster.
Conclusion
By following the steps outlined, I was able to add indexes to our Django project without causing downtime, ensuring the application remains responsive even during schema changes. This approach requires a good understanding of both Django's migration system and your database's capabilities, highlighting the importance of database state and model state synchronization.
This tutorial has demonstrated:
- The significance of inspecting and modifying Django-generated migrations.
- Techniques for executing custom SQL in migrations, including non-locking index creation.
- The concept of atomic and non-atomic migrations and their implications.
As you continue developing with Django, remember that understanding the underlying database and Django's abstraction layer is key to performing efficient and safe schema changes.