[Chore] Migrate to PostgreSQL #93212
Labels
No Label
legacy project
Infrastructure: blender.org
legacy project
Infrastructure: Websites
Priority
High
Priority
Low
Priority
Normal
Status
Archived
Status::Confirmed
Status
Duplicate
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Report
Type
To Do
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: infrastructure/blender-id#93212
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Goal
The goal of this migration is to have a more unified (in terms of number of technologies used) web infrastructure, starting with at least having all Django projects use the same RDBMS. This makes running local development setups and having exactly the same up-to-date dependencies everywhere easier.
One downside to this is the way PostgreSQL major version upgrades have to be done, but this would produce an issue once every several years, while upgrades of Django and surrounding boilerplate of DB client libs and other deps have to happen much more often.
The machine, which runs production Blender ID, already has a PostgreSQL container, so, if production data is moved successfully, doing the swithcover should be easy.
The main problem is copying the data with minimal disruptions.
How production data is copied
./manage.py dumpdata|loaddata
were considered, but without extra hacks, such as migrating first, then truncating all the tables,loaddata
won't actually be able to load all the data. Not only that, but it's also problematic becauseloaddata
will trigger all signals, which we don not want to do for all the production data.Luckily for us, latest https://pgloader.io/ (version 3.6.2 at the moment of writing) does a good job of copying data directly from a running MySQL to a running PostgreSQL.
It has to be built from sources because Debian/Ubuntu are much behind the currently stable working version, as usual. Let's assume we've done that and
pgloader
binary is now located at/opt/pgloader/pgloader-bundle-3.6.2
.In order to be able to copy from MySQL instance to PostgreSQL one,
pgloader
must have access to both, so new user, database and schema must be created in PostgreSQL instance (these can be the same ones Blender ID will later use). Something along these lines executed inpsql
should do the trick:pgloader
also needs to be able to connect to both instances, so we need to find IPs of their containers:This outputs something along the lines of
A script for copying the data can look as follows:
where contents of
mysql_to_psql.load
looks like this:Issues encountered
Invalid datetime values
The first attempt produced some errors due to invalid
datetime
values:This is because MySQL has a bug that allows inserting invalid
0000-00-00 00:00:00
DATETIME values, which are later treated asNULL
s, which in turn breaks theNON NULL
constraint for these values.Invalid inet values
PostgreSQL has a data types for IP addresses and these used by Django's
GenericIPAddressField
, which means that all values contained bylast_login_ip
andcurrent_login_ip
must be castable toinet
, otherwise row will fail to be copied with the following error:To find all offending records we can use:
These also have to be fixed manually before data can be copied fully.
Constraint names are truncated
Updating the rows with invalid values and running
pgloader
again produces no errors, but still produces a lot of warnings about some obnoxiously long index/constraint names being truncated. E.g.:As far as I can tell, this doesn't break anything because the truncated names were still unique and constraints and indices were still created, so for now we assume this is a non-issue.
At this point we have a full copy of Blender ID's production data in our PostgreSQL instance 🎉
To be continued:
Recap of the migration
switch to maintenance
backup everything
copy and switch
bring everything back
Added subscriber: @railla
This issue was referenced by
4ff88bee7b
This issue was referenced by 7a13e8c0a0b8e0bd0e047aed11f6b324cd37915d
This issue was referenced by 27c592e0654174d4e3872b0bdf8f5beee9bbaa84
This issue was referenced by
5ddeaf8047
This issue was referenced by
02cbd58391
Changed status from 'Needs Triage' to: 'Confirmed'
Changed status from 'Confirmed' to: 'Resolved'