Migrating Data in Django: A Guide to Using RunPython in Migration Files

Migrating Data in Django: A Guide to Using RunPython in Migration Files

Learn how to safely migrate data while running database migrations in Django

While working on your project, you will have to add new fields to your database. When you add a new field to the database, what values should the existing rows get for this field? You may want it to be null or you may want to populate it with correct data.

How can you populate existing fields? You can write a script and run this script from the shell after running the migrations. Seems nice, but if you have multiple environments such as staging, production, etc. then you have to run this script manually everywhere.

It would be awesome if somehow we could run such a script in our migration file itself. Django offers us a special operation called RunPython which will help us achieve the same. This will enable us to run python scripts as a migration operation where we perform our data population or migrations.

You could also create an empty migration file using the python manage.py makemigrations --empty yourappname command and add a RunPython operation to run a script.

Let's jump into two examples that will provide much more clarity on this topic (Don't miss the second example).

Example 1: Populating existing records with data

We have an Employee model with fields name and hire_date. There are 5 records in the table.

class Employee(models.Models):

    name = models.CharField(max_length=64)
    hire_date = models.DateField()

existing employee records

We want to add a third field employee_id to store the Employee ID of the employee. As per the company policies, employee ID should have the format EMP<hiring_year>-<incrementing_number>

class Employee(models.Model):

    name = models.CharField(max_length=64)
    hire_date = models.DateField()
    employee_id = models.CharField(max_length=64, null=True, blank=True)

So, after adding employee_id in the Employee model, create the migration file using the command python manage.py makemigrations.

The migration file will have only the migrations.AddField operation for adding the employee_id. The migrations.RunPython operation can be added after the migrations.AddField operation to add the employee IDs to existing employees.

# Generated by Django 3.1.7 on 2021-05-11 13:29

from django.db import migrations, models


def add_employee_id_to_existing_employees(apps, schema_editor):
    Employee = apps.get_model('shop', 'Employee')
    employees = Employee.objects.all().order_by('id')
    for i, employee in enumerate(employees):
        employee_id = f'EMP{employee.hire_date.year}-{i + 1}'
        employee.employee_id = employee_id
        employee.save()


class Migration(migrations.Migration):

    dependencies = [
        ('shop', '0005_employee'),
    ]

    operations = [
        migrations.AddField(
            model_name='employee',
            name='employee_id',
            field=models.CharField(blank=True, max_length=64, null=True),
        ),
        migrations.RunPython(
            add_employee_id_to_existing_employees,
            migrations.RunPython.noop
        )
    ]

After we run migrate this file using the python manage.py migrate command, the employee IDs for the existing employees will be created.

Screenshot from 2021-05-11 19-07-13.png

Example 2: Comma-separated tags to m2m tags

We have a model Book for storing books with fields name and tags. The tags field is currently a CharField which stores tags related to the book in a comma-separated string. We have 5 books in the table.

class Book(models.Model):

    name = models.CharField(max_length=64)
    tags = models.CharField(max_length=256)

Screenshot from 2021-05-11 19-16-38.png

The comma-separated tags approach looks bad, right?

So we create a BookTag model where all the tags will be and add tags as a ManyToManyField in our Book model.

class BookTag(models.Model):

    name = models.CharField(max_length=64)


class Book(models.Model):

    name = models.CharField(max_length=64)
    # tags = models.CharField(max_length=256)
    tags = models.ManyToManyField(BookTag)

When we create the migration file for this, Django created 3 operations:

  • migrations.CreateModel for the BookTag model.

  • migrations.RemoveField for the tags (CharField).

  • migrations.AddField for adding BookTag as a ManyToManyField.

# Generated by Django 3.1.7 on 2021-05-11 13:48

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('shop', '0007_book'),
    ]

    operations = [
        migrations.CreateModel(
            name='BookTag',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=64)),
            ],
        ),
        migrations.RemoveField(
            model_name='book',
            name='tags',
        ),
        migrations.AddField(
            model_name='book',
            name='tags',
            field=models.ManyToManyField(to='shop.BookTag'),
        ),
    ]

If we migrate this file, all our existing comma-separated tags data will be lost since Django will run the RemoveField operation on our comma-separated tags column.

We need to move the data properly from the old tags column to the new tags column before removing the old one.

We could run the operations in a certain way as illustrated below.

sdasdas.png

# Generated by Django 3.1.7 on 2021-05-11 13:48

from django.db import migrations, models

def create_tags(apps, schema_editor):
    Book = apps.get_model('shop', 'Book')
    BookTag = apps.get_model('shop', 'BookTag')

    for book in Book.objects.all():
        print(f'For book {book.name} ...')
        old_book_tags = book.old_tags.split(',')
        for old_book_tag in old_book_tags:
            try:
                new_tag = BookTag.objects.get(name=old_book_tag)
                print(f'\tTag {new_tag.name} already present ...')
            except BookTag.DoesNotExist:
                new_tag = BookTag.objects.create(name=old_book_tag)
                print(f'\tTag {new_tag.name} created ...')
            book.tags.add(new_tag)
            print(f'\tTag {new_tag.name} added to {book.name}')

class Migration(migrations.Migration):

    dependencies = [
        ('shop', '0007_book'),
    ]

    operations = [
        migrations.CreateModel(
            name='BookTag',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=64)),
            ],
        ),

        migrations.RenameField(
            model_name='book',
            old_name='tags',
            new_name='old_tags',
        ),


        migrations.AddField(
            model_name='book',
            name='tags',
            field=models.ManyToManyField(to='shop.BookTag'),
        ),

        migrations.RunPython(
            create_tags,
            migrations.RunPython.noop
        ),

        migrations.RemoveField(
            model_name='book',
            name='old_tags',
        )
    ]

I had added print statements in the scripts, so while migrating we can see the script got executed.

Screenshot from 2021-05-11 19-34-55.png

We can check the books and their tags which are in a ManyToManyField tags field.

Screenshot from 2021-05-11 19-36-20.png

Tada! Easy, right? That's how you can migrate without worrying and keep your data clean.

Thanks for reading!