Our database backup and restore process is currently undergoing renewal. Information on this page could be outdated and/or inaccurate, and is subject to change. Before acting on any of the information on this page, discuss it with the rest of the team first.
Backups are generated by a Jenkins job at 4am every morning. A dump of the production database is taken, then
zipped, encrypted and uploaded to a replicated S3 bucket,
digitalmarketplace-database-backups, for storage. This is
done in the following steps:
Jenkins creates a unique file name for the dump in the format
Jenkins deploys a worker app,
db-backup, to the PaaS, using the
deploy-db-backup-app Makefile command in the digitalmarketplace-aws repo. This app has the scripts required to create and upload the dump baked into its Docker image (see Digital Marketplace Docker Hub).
The following environment variables are required by the app manifest:
- DUMP_FILE_NAME - the unique filename generated by Jenkins (passed as an argument to
- S3_POST_URL_DATA - a signed URL and extra data for POSTing the dump to S3. Explained below.
- RECIPIENT - used for encryption with GPG; it identifies which public key to encrypt with.
An additional variable,
PUBKEY (the public key used for encryption), is set after the app has spun up.
The db-backup app then starts a task container which executes the
create-db-dump.sh script. This container has its own disk and memory quotas, needed to handle the large file size.
The create-db-dump.sh script first imports
PUBKEY into GPG2. It then connects to the database instance in the PaaS and runs
pg_dump to create a plaintext dump with no owner and no access control list. The dump is streamed to gzip and then
straight to GPG2 for encryption before being written to disk.
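The streaming step can be sketched in Python as three processes joined by pipes, so that only the encrypted output ever touches the disk. This is an illustration, not the contents of create-db-dump.sh: the function names, the choice of `--trust-model always` and passing the connection string through `--dbname` are all assumptions.

```python
import subprocess


def build_dump_commands(database_url, recipient):
    """Return the three pipeline stages in order (flags are assumptions)."""
    pg_dump = [
        "pg_dump",
        "--format=plain",  # plaintext SQL dump
        "--no-owner",      # omit ownership commands
        "--no-acl",        # omit GRANT/REVOKE statements
        f"--dbname={database_url}",
    ]
    gzip = ["gzip"]  # reads stdin, writes compressed data to stdout
    gpg = [
        "gpg2", "--batch", "--trust-model", "always",
        "--encrypt", "--recipient", recipient,
    ]
    return [pg_dump, gzip, gpg]


def run_dump_pipeline(commands, output_path):
    """Chain the commands with pipes; only the last stage writes to disk."""
    with open(output_path, "wb") as output:
        procs = []
        upstream = None
        for index, command in enumerate(commands):
            is_last = index == len(commands) - 1
            proc = subprocess.Popen(
                command,
                stdin=upstream,
                stdout=output if is_last else subprocess.PIPE,
            )
            if upstream is not None:
                # Close our copy so an early exit downstream reaches upstream.
                upstream.close()
            upstream = proc.stdout
            procs.append(proc)
        return_codes = [proc.wait() for proc in procs]
    if any(return_codes):
        raise RuntimeError(f"dump pipeline failed with exit codes {return_codes}")
```

Checking every stage's exit code matters here: a failed pg_dump would otherwise still leave a small but validly encrypted file behind.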
Next, a Python script,
upload-dump-to-s3.py, is executed to upload the dump to S3. It uses
the signed S3 URL generated earlier and will return an error if the upload fails.
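As a rough sketch of what uploading with a signed POST involves, the code below builds a multipart/form-data body and sends it with the standard library. It is not the real upload-dump-to-s3.py. It assumes the signed data is JSON with `url` and `fields` keys (the shape boto3 returns for a presigned POST), the function names are hypothetical, and the whole file is held in memory for brevity, which a real dump may be too large for.

```python
import json
import urllib.request
import uuid


def build_multipart_body(fields, file_name, file_bytes, boundary):
    """Encode the signed form fields plus the file as multipart/form-data."""
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(f"{header}{value}\r\n".encode())
    # S3 ignores form fields that come after the file, so the file goes last.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode())
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts)


def upload_dump(post_url_data_json, file_name, file_bytes):
    """POST the dump to the signed URL; urlopen raises on a 4xx/5xx reply."""
    post = json.loads(post_url_data_json)  # assumed keys: "url", "fields"
    boundary = uuid.uuid4().hex
    body = build_multipart_body(post["fields"], file_name, file_bytes, boundary)
    request = urllib.request.Request(
        post["url"],
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because the signature lives in the form fields, the caller needs no AWS credentials of its own.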
Next, Jenkins checks that the new encrypted dump in S3 can be decrypted. This is to ensure that the private key used to
decrypt the dumps is the correct counterpart of the public key used to encrypt. If the private key was rotated and the public
key wasn’t for some reason, we wouldn’t know about it until too late without this check. Jenkins uses a script called
The decrypt script downloads the new dump from S3. It then decrypts the private GPG key from the credentials
repo and imports it. GPG then executes a
--list-packets command on the dump. We don’t actually care about the packets,
but the command needs the correct private key to operate successfully. It means we can test decryption without actually
having to decrypt. Finally it deletes the secret key as well as the downloaded dump.
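The check described above can be sketched as follows. This is an assumption-laden illustration rather than the actual decrypt script: the function names are hypothetical, and supplying the passphrase on stdin via `--pinentry-mode loopback` and `--passphrase-fd 0` assumes GnuPG 2.1 or later.

```python
import subprocess


def list_packets_command(dump_path):
    """Build the gpg2 call; the passphrase is read from stdin (fd 0)."""
    return [
        "gpg2", "--batch",
        "--pinentry-mode", "loopback",
        "--passphrase-fd", "0",
        "--list-packets", dump_path,
    ]


def dump_is_decryptable(dump_path, passphrase, run=subprocess.run):
    """Return True if gpg2 could open the dump with the imported private key.

    The packet listing itself is discarded; only the exit code matters.
    """
    result = run(
        list_packets_command(dump_path),
        input=passphrase.encode() + b"\n",
        capture_output=True,
    )
    return result.returncode == 0
```

The `run` parameter exists only so the check can be exercised without GPG installed; in normal use it defaults to `subprocess.run`.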
Finally, Jenkins alerts Slack with either a success or failure message and deletes the
db-backup app from the PaaS.
The bucket where zipped and encrypted dumps are stored in the first instance is called
digitalmarketplace-database-backups and is in the Digital Marketplace Backups AWS account.
This bucket has cross-region replication enabled and will replicate all new objects to another bucket called
digitalmarketplace-cross-region-database-backups in the
eu-west-2 (London) region.
The buckets are accessible to one group and the Jenkins role.
The group is called ‘backups’ and contains the users currently in the
This means that users on 2nd line support as well as permanent admins will be able to GET the backup files.
The Jenkins role only has PUT and GET permissions on the bucket, which prevents it from deleting dumps.
The buckets sit in the digitalmarketplace-backups account which can only be accessed using a password reset.
The backups in
retained for 180 and 7 days respectively.
The ‘S3_POST_URL_DATA’ is generated by a script in the AWS repo called
generate-s3-post-url-data.py. It needs to be
executed by an AWS entity with the correct rights to upload to the S3 Jenkins bucket. In our case this is the Jenkins role
assumed by the Jenkins server. The signed URL can then be used by an entity with no permissions on the bucket.
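A minimal sketch of how such signed data could be produced with boto3's `generate_presigned_post` is shown below. It is not the contents of generate-s3-post-url-data.py: the function name, the parameters and the one-hour expiry are assumptions, and the S3 client is passed in so that the caller decides which credentials sign the request.

```python
import json


def generate_post_url_data(s3_client, bucket, dump_file_name, expires_in=3600):
    """Return JSON holding the signed URL and form fields for one upload.

    s3_client is expected to be a boto3 S3 client created with credentials
    that are allowed to PUT objects into the bucket.
    """
    presigned = s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=dump_file_name,    # the signature is only valid for this object key
        ExpiresIn=expires_in,  # seconds until the signature stops working
    )
    # boto3 returns {"url": ..., "fields": {...}}; serialise it so it can be
    # handed to the app as a single environment variable.
    return json.dumps(presigned)
```

In use, the client would come from something like `boto3.client("s3")` running under the Jenkins role, and the returned string would be passed on as S3_POST_URL_DATA.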
The dumps are encrypted with GPG2. The public and private keys are kept in the digitalmarketplace-credentials repo. The private key is encrypted with SOPS in the usual way; the public key is unencrypted. The private key has a passphrase, which is required to use it. The passphrase is also in the credentials repo and is also encrypted with SOPS.
The keys use RSA 4096.
This process does not use the database backups described above, but instead uses the GOV.UK PaaS provided backups.
This method has not yet been tested.
There is no automatic process to restore the production database from one of the dumps. If we’re in a situation where that needs to happen, it’s probably quite serious and the restore should be done manually. The steps will be similar to those below:
Alert the team on the
#dm-release Slack channel, and grab the deploy gorilla.
Ensure that you’re logged in to Cloud Foundry and are in the production space (if that’s where you’re restoring to):
    cf target -s production
Follow the steps in the PaaS manual to create a new PostgreSQL service from a snapshot or point in time (depending on need). Give the new service a descriptive name like
The restore will take at least 15 minutes to run. If you need a cup of tea, now is the time.
Test that the data has restored correctly (https://dm-api-production.cloudapps.digital should respond even during maintenance mode).
Let stakeholders know that the restore has been completed.
Ensure the team has a plan for reconciling any lost data, and how this will be communicated to users.
Rename the existing
digitalmarketplace_api_db service to something like
digitalmarketplace_api_db_old (or just delete it altogether), and rename the
Revert the change to the production app manifest variables, and rerelease all apps as above.
Toggle maintenance mode to ‘recovery’ to restore access to the API apps only.
Re-sync the Elasticsearch indices for services and briefs, using the Jenkins catchup jobs:
Toggle maintenance mode to ‘live’ to restore access to the Frontend apps.