Moving data, whether to other environments or to development machines, has traditionally been a problem for us. It is no longer a technical problem.
The Jenkins job which moves the data is scheduled to automatically run against staging every week on a Sunday. This way we always keep staging data close to production. The job can also be run manually against preview or staging, or just to create a clean dump shared on S3.
Moving data is easy as:

1. Visit our continuous integration server.
2. Choose your TARGET (i.e. the database or S3 bucket to fill with new data). Valid targets are `preview`, `staging` and `s3`.
3. Find the “Clean and apply database dump - TARGET” job.
4. Click “Build Now”.
5. Wait about 25 minutes. (Buy more biscuits if there are none left.)
You should now see a sanitized version of production on the target stage and a new dump in the S3 bucket.
It’s a bit of a fafflafel, but here’s basically what happens whenever you click “Build” (as seen above).
A new Docker container is run on Jenkins from a Postgres base image. This is where we clean up the production data.
Jenkins checks our database-backups S3 bucket for the latest production database dump (a new one is created every morning) and downloads it.
The dump is then decrypted with GPG using the private key in the credentials repo, unzipped, and then imported into the Docker container.
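The fetch-decrypt-restore step above might look something like this. This is a hedged sketch: the bucket name, dump filename, database name and connection details are assumptions, not the job’s real configuration.

```shell
#!/bin/sh
set -eu
# Assumed names; the real job's bucket and paths may differ.
BUCKET=database-backups
DUMP="prod-$(date +%Y-%m-%d).sql.gz.gpg"

fetch_and_restore() {
  # download the latest encrypted production dump
  aws s3 cp "s3://${BUCKET}/${DUMP}" .
  # decrypt with the GPG private key from the credentials repo,
  # unzip, and load into the Postgres container
  gpg --decrypt "${DUMP}" | gunzip \
    | psql -h localhost -U postgres digitalmarketplace
}

echo "would restore s3://${BUCKET}/${DUMP}"
```

The function is defined but not invoked here, so the sketch is safe to run without AWS or GPG credentials.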
The database in the container is then cleaned using a series of SQL queries that purge personal and confidential data. This step also generates pseudorandomly varied replacement data for some of the removed fields, so that test environments get a more realistic, diverse dataset.
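To make the cleaning step concrete, here is an illustrative example of the kind of sanitization SQL involved. The table and column names are invented for illustration; the real queries live in the job itself.

```shell
#!/bin/sh
set -eu
# Write an example cleanup script (names are hypothetical).
cat > clean.sql <<'SQL'
-- Purge personal data, substituting pseudorandom-but-plausible values
-- so test environments keep a realistic, diverse dataset.
UPDATE users
SET name          = 'Test user ' || id,
    email_address = 'user-' || id || '@example.com';

-- Remove confidential free text outright where no substitute is needed.
UPDATE audit_events SET data = '{}';
SQL

# The job would then apply this inside the container, e.g.:
#   psql -h localhost -U postgres digitalmarketplace -f clean.sql
echo "wrote $(wc -l < clean.sql) lines of cleanup SQL"
```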
The cleaned database is then queried for its alembic version. This is stored for later.
Jenkins then uses `pg_dump` to create a dump of this cleaned database, zips it, and saves it to disk.
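The version check and dump steps can be sketched as below. Host, user and filenames are assumptions; the functions show the shape of the commands without running them.

```shell
#!/bin/sh
set -eu
# Assumed database and output names.
DB=digitalmarketplace
OUT="cleaned-staging-$(date +%Y-%m-%d).sql.gz"

alembic_version() {
  # single-value query; -At strips headers and alignment
  psql -h localhost -U postgres -At \
    -c "SELECT version_num FROM alembic_version" "$DB"
}

make_dump() {
  # --clean emits DROP statements so a restore replaces existing objects
  pg_dump -h localhost -U postgres --clean "$DB" | gzip > "$OUT"
}

echo "dump will be saved as $OUT"
```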
If the TARGET is `s3`, then the script skips all the following steps up until the point where it uploads to S3.
If the TARGET is `preview` or `staging`, then Jenkins targets the appropriate Cloud Foundry space, fetches the credentials for the Postgres service containing that stage’s database, creates an SSH tunnel to it, and applies the cleaned dump to it with `psql`. The dumps are created with the `--clean` flag, so they will drop any existing database objects before recreating them. The cleaned dump is synced with S3 in the same way as mentioned above and is then deleted from Jenkins.
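Applying the dump to a stage might look roughly like this. The app name, tunnel host and ports are invented; the real job reads them from the Cloud Foundry service credentials.

```shell
#!/bin/sh
set -eu
STAGE=staging
DUMP="cleaned-${STAGE}-$(date +%Y-%m-%d).sql.gz"

apply_dump() {
  cf target -s "$STAGE"
  # tunnel a local port to the bound Postgres service
  # (app and host names here are hypothetical)
  cf ssh some-app -L 5433:postgres.internal:5432 -N &
  TUNNEL=$!
  # the dump was made with --clean, so it drops objects before recreating them
  gunzip -c "$DUMP" | psql -h localhost -p 5433 -U postgres digitalmarketplace
  kill "$TUNNEL"
}

echo "would apply $DUMP to $STAGE"
```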
There is a chance that the target stage may have more recent migrations than production. Jenkins compares the Alembic versions stored in the previous steps, and if they do not match it will run the database-migration-paas job against the target stage.
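The comparison itself is just a string equality check on the two stored versions. A toy illustration (the version values here are hardcoded; the real job reads them from the dump and the target database):

```shell
#!/bin/sh
set -eu
# Hypothetical version identifiers for illustration only.
DUMP_VERSION="a1b2c3"
TARGET_VERSION="d4e5f6"

if [ "$DUMP_VERSION" != "$TARGET_VERSION" ]; then
  echo "versions differ: run database-migration-paas against the target stage"
fi
```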
Next, as there is new data in the target stage’s database, the services and briefs need to be indexed. Jenkins uses the existing `index-<briefs | services>-<preview | staging>` jobs for this.
Because new indexes have been created, the aliases in use need to be shifted. Currently we only have one alias per framework type, i.e. `briefs-digital-outcomes-and-specialists`. Jenkins uses the `update-<preview | staging>-index-alias` job to move it to the new index.
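Under the hood an alias move like this is typically a single atomic call to Elasticsearch’s aliases API. A hedged sketch, with the host and the dated index name invented for illustration:

```shell
#!/bin/sh
set -eu
# Hypothetical new index; the alias name comes from the example above.
NEW_INDEX="briefs-digital-outcomes-and-specialists-2017-01-01"
ALIAS="briefs-digital-outcomes-and-specialists"

move_alias() {
  # remove the alias from whatever index currently holds it and
  # add it to the new index, in one atomic request
  curl -s -X POST "http://localhost:9200/_aliases" \
    -H 'Content-Type: application/json' \
    -d "{\"actions\": [
          {\"remove\": {\"index\": \"${ALIAS}-*\", \"alias\": \"${ALIAS}\"}},
          {\"add\":    {\"index\": \"${NEW_INDEX}\", \"alias\": \"${ALIAS}\"}}
        ]}"
}

echo "would point $ALIAS at $NEW_INDEX"
```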
Jenkins then syncs the zipped dump with an S3 bucket in the digitalmarketplace-development account, deletes its local dump, and we’re pretty much done. The sync process removes any old files so the directory doesn’t become massive.
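The sync step is the sort of thing `aws s3 sync --delete` handles: it uploads the new dump and removes remote files that no longer exist locally. The bucket name appears later in these docs; the exact flags the job passes are an assumption.

```shell
#!/bin/sh
set -eu
DUMP="cleaned-staging-$(date +%Y-%m-%d).sql.gz"
BUCKET="s3://digitalmarketplace-cleaned-db-dumps"

sync_and_tidy() {
  # --delete prunes old dumps from the bucket so it doesn't grow forever
  aws s3 sync . "$BUCKET/" --delete
  # remove the local copy once it is safely in S3
  rm -f "$DUMP"
}

echo "would sync $DUMP to $BUCKET"
```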
The final step is to clean up. The Docker container and its volume are destroyed, and Jenkins logs out of Cloud Foundry.
After all of this is complete, we should have filled one of our non-production databases with cleaned data and saved the dump in a shared place where other developers can access it. Pretty rad.
So it’s all well and good wiping out everything in the preview setup, but what about when we want to blow up everything in our own databases? How do we do that?
Luckily, this need has been anticipated and it’s actually fairly simple. If you’re using dmrunner, run `make data`. Otherwise, you’ll need to:
1. Visit the “digitalmarketplace-cleaned-db-dumps” bucket in S3 and download the latest dump. This was created the last time data was moved, so if you want a more up-to-date version run the “Clean and apply database dump” job with the `s3` target first.
2. Apply the dump to your local database: `gzcat ./cleaned-<stage>-<date>.sql.gz | psql -f - digitalmarketplace` (adjust for the target database name and the specific path of the downloaded file).
You may need to run migrations locally. If so, run `make run-migrations` from your local API repo.
Reindex the new data so it appears in search results.
Have one of those biscuits you bought earlier.