Selective restore from database backups with Django
When things go wrong and you lose your database, backups will help save the day. If the whole production database is corrupted or lost, it’s simple to throw it away and restore it in its entirety from the latest backup.
If data loss caused by a user error has remained unnoticed for some time, valuable data may since have been stored, and restoring a complete backup is not an option. In such cases it’s useful to be able to do a partial restore of one or more tables while keeping the rest of the database.
For these complex cases, I’m going to describe a technique for restoring a subset of the data in one or more relational database tables using Django.
Django 1.2 or later is required since this technique depends on having multiple database connections (see Django’s multi-db support). This opened the possibility to copy data between databases using the Django ORM.
To cut some corners, I’ll use SQLite as the database engine in the example project. The shell examples assume an Ubuntu server.
The example project
The victim of simulated data loss is an example project which uses Photologue for managing photos and albums. If you’d like to follow along, you can create the project in its own virtualenv:
$ django-admin.py startproject pets
$ cd pets
$ virtualenv --distribute --system-site-packages venv
$ . venv/bin/activate
$ pip install django-photologue
Here’s a minimal settings.py file for the project. You can replace the generated file with it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | DATABASES = {
'default': {
'ENGINE': 'sqlite3',
'NAME': 'pets.sqlite'
}
}
INSTALLED_APPS = ['django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.admin',
'photologue']
ROOT_URLCONF = 'pets.urls'
|
The urls.py file should look like this:
1 2 3 4 5 6 7 8 9 | from django.conf.urls.defaults import patterns, include, url
from django.contrib import admin
admin.autodiscover()
urlpatterns = patterns(
'',
url(r'^admin/', include(admin.site.urls)),
)
|
Now initialize the database and run the development web server:
$ ./manage.py syncdb
$ ./manage.py runserver
Open http://localhost:8000/admin/ and log in with the admin credentials you entered for syncdb. Add several photos and create three galleries. Set the slug of one of the galleries to “pets”. This will be the gallery you’ll “accidentally” delete. Attach two photos into the “pets” gallery and some other photos into other galleries. Your list of galleries should look something like this:
and the “pets” gallery like this:
When you’re done, stop the development server and make a backup of the database. With SQLite it’s as easy as making a copy of the database file:
$ cp pets.sqlite pets-backup.sqlite
Start the development server again, and delete the “pets” gallery.
Restoring a backup
Let’s connect your project to the temporary backup database. To avoid touching the production codebase, it makes sense to create a separate db_restore_settings.py file:
1 2 3 4 5 6 | from settings import *
DATABASES['backup'] = {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': 'pets-backup.sqlite'
}
|
Now you can fire up the shell and start poking at your production and backup databases. Remember to use the settings created above.
$ ./manage.py shell --settings=db_restore_settings
Photologue doesn’t remove Photo instances from the database nor image files from the disk when albums are deleted. So, we have one missing Gallery instance, and all its many-to-many relations to Photos have been deleted as well.
We can take a look at the Gallery model in different databases with the Queryset.using() method:
>>> Gallery.objects.using('backup').count()
3
>>> Gallery.objects.using('default').count()
2
>>> Gallery.objects.using('backup').filter(title_slug='pets').count()
1
>>> Gallery.objects.using('default').filter(title_slug='pets').count()
0
(When using the default database, the .using('default') method call can be omitted, but I have included it in the examples for clarity’s sake.)
Restoring a single model instance
At this point, we can simply retrieve the gallery from the backup and save it as a new instance to the production database.
In Django, a new copy of a model instance can be made by setting its primary key to None before saving. To specify that the copy should be saved into another database, the save() method accepts a using= keyword argument as stated in the multi-db documentation. Again, using= could be omitted here since the default database is specified, but I included it for clarity:
>>> backup_pets = Gallery.objects.using('backup').get(title_slug='pets')
>>> backup_pets.id = None
>>> backup_pets.save(using='default')
You can now verify in the Django admin interface that an empty “pets” gallery has indeed been resurrected.
Restoring many-to-many links
You’ll notice that the gallery restored from the production database doesn’t yet contain the two deleted photos. We can verify in the shell that they do appear in the backed up gallery. The photo_count() instance method provided by Photologue makes this easy:
>>> restored_pets = Gallery.objects.using('default').get(title_slug='pets')
>>> restored_pets.photo_count()
0
>>> backup_pets = Gallery.objects.using('backup').get(title_slug='pets')
>>> backup_pets.photo_count()
In Photologue, the photos are connected to galleries using a many-to-many “through” table. To prepare for restoring the missing photos, first query the links from the “through” table. Then find out the primary keys of the corresponding photos. Here the flat=True option for .values_list() provides handy shortcut for getting a straight list for values of a single field instead of a list of dicts.
>>> Through = backup_pets.photos.through
>>> links = Through.objects.using('backup').filter(gallery=backup_pets)
>>> restored_photos = restored_pets.photos.using('default')
>>> restored_photo_ids = restored_photos.values_list('id', flat=True)
Now loop through the links in the backup database and make copies of them in the production database. Replace the reference to the gallery with the restored “pets” gallery. The if clause makes sure that we skip already restored links and photos which have been deleted.
>>> all_photo_ids = Photo.objects.using('default').values_list('id', flat=True)
>>> for link in links:
... pk = link.photo_id
... if pk not in restored_photo_ids and pk in all_photo_ids:
... link.gallery_id = restored_pets.id
... link.id = None
... link.save(using='default')
...
That’s all! Now you can verify in the admin interface that the images have been restored.
The method illustrated above may not be suitable as such for huge database tables for memory consumption and performance reasons. See these possible off-the-shelf solutions for optimizing the query/save loop:
- Thierry Schellenbach’s Django query set iterator – for really large, querysets
- David Cramer’s Large SQL Result Sets in Django
- Thomas Weholt’s DSE - Simplified “bulk” insert/update/delete for Django
