Configuring AWS Aurora Endpoints in Django
At my work, we plan on migrating from AWS RDS Postgres to Aurora to enable autoscaling db instances. We hope that this will alleviate CPU utilization spikes typically caused by unpredictably complex SQL queries generated by Django ORM.
AWS Aurora Endpoints
Aurora databases are accessed via endpoints that reflect clusters of db instances. There are two out-of-the-box endpoints:
- The Cluster endpoint is intended for write-operations and, in a single-master cluster configuration, allows queries to the db via the master db instance.
- The Reader endpoint is intended solely for read-operations, and load-balances between read-replicas on the Aurora cluster.
Django Database Routing
In Django, each endpoint must be configured as a separate “database”:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'cluster',
'USER': 'postgres',
'PASSWORD': 'password',
'HOST': 'cluster.endpoint.here',
'PORT': '5432',
},
'readonly': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'readonly',
'USER': 'postgres',
'PASSWORD': 'password',
'HOST': 'readonly.endpoint.here',
'PORT': '5432',
}
}
Keeping default
as the name of the cluster endpoint allows the ./manage.py migrate
command to operate using the cluster endpoint as the target db. Same with makemigrations
.
Using database routing, we can specify which endpoints our models should use when reading or writing. In our case, all models will use the cluster endpoint to write and the readonly endpoint to read. Below is an example of such a router:
class AuroraRouter:
route_app_labels = {'app_name'}
def db_for_read(self, model, **hints):
return "readonly"
def db_for_write(self, model, **hints):
return "default"
def allow_relation(self, obj1, obj2, **hints):
return None
def allow_migrate(self, db, app_label, model_name=None, **hints):
return None
app_name
, as the name suggests, will need to be the name of your django application. In settings.py
:
DATABASE_ROUTERS = ['path.to.AuroraRouter']
Because Aurora separates the storage layer from the db instances, no synchronization (ie migrations) between readonly db instances and the cluster instance needs to be done.
For specific operations (such as reporting, exporting, etc), you may want to specify a db instance to use so as not to tie up your more i/o bound db instances that need to respond to queries in real-time. In these cases, one approach you can use is to specify which db your queryset should use:
User.objects.using('some_other_db_cluster').all()
In this case, some_other_db_cluster
would need to be listed in your DATABASES
setting. Other methods of ExampleModel.objects
may have different ways of specifying the cluster you wish to use. Here’s a list of examples.
Conclusion
As of this writing, there does not appear to be an easy way of declaring the desired Django database to use for the length of a request. This stackoverflow answer offers one possible way of doing this via a custom Django middleware, but I haven’t personally tried it. Ymmv.