Backups are generated by a Jenkins job at 4am every morning. A dump is taken from the production database, which is then zipped, encrypted and uploaded to a replicated S3 bucket, `digitalmarketplace-database-backups`, for storage. This is done in the following steps:
- Jenkins creates a unique file name for the dump.
- Jenkins deploys a worker app, `db-backup`, to the PaaS, using the `deploy-db-backup-app` Makefile command in the digitalmarketplace-aws repo. This app has the scripts required to create and upload the dump baked into its Docker image (see Digital Marketplace Docker Hub).
- The following environment variables are required by the app manifest:
  - `DUMP_FILE_NAME` - the unique filename generated by Jenkins
  - `S3_POST_URL_DATA` - a signed URL and extra data for POSTing the dump to S3 (explained below)
  - `RECIPIENT` - used for encryption with GPG; it signifies which public key to use to encrypt
- An additional variable, `PUBKEY` (the public key used for encryption), is set after the app has spun up.
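As an illustration only (the key file path and the exact mechanism are assumptions, not taken from the Makefile), attaching a variable to an app that is already running looks something like:

```shell
# Hypothetical sketch: set PUBKEY on the running db-backup app.
# The key file path is an assumption. Depending on how the task
# container is started, a restart or restage may be needed before
# the new variable is visible.
cf set-env db-backup PUBKEY "$(cat gpg/database-backups/public.key)"
```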
The `db-backup` app then starts a task container which executes `create-db-dump.sh`. This container has its own disk and memory quotas, needed to handle the large file size.
The `create-db-dump.sh` script first imports `PUBKEY` into GPG2. It then connects to the database instance in the PaaS and runs `pg_dump` to create a plaintext dump with no owner and no access control list. The dump is streamed to gzip and then straight to GPG2 for encryption before being written to disk.
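A minimal sketch of what this step amounts to, using the variable names from the manifest (this is not the verbatim contents of `create-db-dump.sh`, and the non-interactive GPG flags are assumptions):

```shell
# The plaintext dump never touches disk: it streams from pg_dump
# through gzip into GPG, and only the ciphertext is written out.
echo "$PUBKEY" | gpg2 --import
pg_dump "$DATABASE_URL" --no-owner --no-acl \
  | gzip \
  | gpg2 --batch --trust-model always --encrypt --recipient "$RECIPIENT" \
  > "$DUMP_FILE_NAME"
```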
Next, a Python script, `upload-dump-to-s3.py`, is executed to upload the dump to S3. It uses `S3_POST_URL_DATA` (which includes the signed S3 URL generated earlier) and will return an error if the upload fails.
Next, Jenkins checks that the new encrypted dump in S3 can be decrypted. This is to ensure that the private key used to decrypt the dumps is the correct counterpart of the public key used to encrypt. If the private key was rotated and the public key wasn’t for some reason, we wouldn’t know about it until too late without this check. Jenkins runs a decrypt script for this check.
The decrypt script downloads the new dump from S3. It then decrypts the private GPG key from the credentials repo and imports it. GPG then executes a `--list-packets` command on the dump. We don’t actually care about the packets, but the command needs the correct private key to operate successfully, which means we can test decryption without actually having to decrypt. Finally, the script deletes the secret key as well as the downloaded dump.
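A hedged sketch of the check (file names and flags are illustrative, not taken from the script itself):

```shell
# --list-packets forces GPG to use the private key to recover the
# session key, proving decryption would succeed, without ever
# writing the plaintext dump to disk.
aws s3 cp "s3://digitalmarketplace-database-backups/${DUMP_FILE_NAME}" .
gpg2 --batch --import secret.key
gpg2 --batch --pinentry-mode loopback --passphrase "$PASSPHRASE" \
  --list-packets "$DUMP_FILE_NAME" > /dev/null
gpg2 --batch --yes --delete-secret-keys "$KEY_FINGERPRINT"
rm "$DUMP_FILE_NAME"
```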
Finally, Jenkins alerts Slack with either a success or failure message and deletes the `db-backup` app from the PaaS.
The bucket where zipped and encrypted dumps are stored in the first instance is called `digitalmarketplace-database-backups` and is in the Digital Marketplace Backups AWS account. This bucket has cross-region replication enabled and will replicate all new objects to another bucket called `digitalmarketplace-cross-region-database-backups` in the `eu-west-2` (London) region.
The buckets are accessible to one group and the Jenkins role. The group is called ‘backups’; its membership means that users on 2nd line support, as well as permanent admins, are able to GET the backup files.
The Jenkins role only has permissions to PUT or GET on the bucket to prevent deletion of dumps.
The buckets sit in the digitalmarketplace-backups account which can only be accessed using a password reset.
The backups in the two buckets are retained for 180 and 7 days respectively.
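For reference, a retention period like this is typically expressed as an S3 lifecycle rule. The command below is a hedged illustration only; the real buckets are most likely configured as code in the digitalmarketplace-aws repo rather than by hand, and their actual rules may differ:

```shell
# Expire objects 180 days after creation (illustrative rule ID).
aws s3api put-bucket-lifecycle-configuration \
  --bucket digitalmarketplace-database-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-dumps",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 180}
    }]
  }'
```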
The `S3_POST_URL_DATA` is generated by a script in the AWS repo called `generate-s3-post-url-data.py`. It needs to be executed by an AWS entity with the correct rights to upload to the S3 Jenkins bucket; in our case this is the Jenkins role assumed by the Jenkins server. The signed URL can then be used by an entity with no permissions on the bucket.
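To sketch what the unprivileged upload amounts to: an S3 presigned POST is a multipart form upload in which the signed fields accompany the file, and the `file` field must come last. The field names below follow the standard S3 POST-policy format; the script's actual payload may differ:

```shell
# Illustrative upload using the signed fields from S3_POST_URL_DATA.
# ...plus any other fields returned alongside the policy
# (e.g. x-amz-algorithm, x-amz-date).
curl --fail \
  -F "key=${DUMP_FILE_NAME}" \
  -F "policy=${BASE64_POLICY}" \
  -F "x-amz-credential=${CREDENTIAL}" \
  -F "x-amz-signature=${SIGNATURE}" \
  -F "file=@${DUMP_FILE_NAME}" \
  "https://digitalmarketplace-database-backups.s3.amazonaws.com/"
```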
The dumps are encrypted with GPG2. The public and private keys being used are kept in the digitalmarketplace-credentials repo. The private key is encrypted with SOPS in the usual way; the public key is unencrypted. The private key has a passphrase which is required to use it. This is also in the credentials repo, and is also encrypted with SOPS.
The keys use RSA 4096.
There is no automatic process to restore the production database from one of the dumps. If we’re in the situation where it needs to happen, it’s probably quite a serious situation and should probably be done manually. The steps will be similar to below:
Alert the team on the `#dm-release` Slack channel, and grab the deploy gorilla.
Download the required database dump file (usually the most recent) from the `digitalmarketplace-database-backups` S3 bucket. Members of the `admins` group and the `production-infrastructure` group will be able to do this via the GUI console by assuming the backups role in the digitalmarketplace-backups account. If no admin is available, you can SSH into Jenkins and download it via the CLI. The command will be:

```
aws s3 cp s3://digitalmarketplace-database-backups/<filename-on-s3> <path-to-local-destination>
```
The dump will need to be decrypted and unzipped, using GPG and gzip. (Mac users: install via `brew install gpg`.)
Use SOPS to decrypt the secret key from `digitalmarketplace-credentials/gpg/database-backups/secret.key.enc` and save it to a file:

```
./sops-wrapper -d gpg/database-backups/secret.key.enc > <path-to-unencrypted-secret-key>
```
Import the secret key:

```
gpg --import <path-to-unencrypted-secret-key>
```
You will be asked for a passphrase for the secret key. The passphrase can be decrypted with SOPS from the credentials repo.
Decrypt the database dump file and unzip it. You’ll be asked for the passphrase again:

```
gpg --decrypt <path-to-encrypted-dump> | gunzip > <path-to-local-destination>
```
Delete the secret key from your GPG keyring:

```
gpg --delete-secret-key 'Digital Marketplace DB backups'
```
Delete the decrypted secret key file:

```
rm <path-to-unencrypted-secret-key>
```
PaaS have an event trigger which invokes a function to reassign the owner of an object. The function checks if the current user has a particular role. If that role doesn’t exist in the database it causes an error. Removing the trigger prevents the function being executed. The trigger is removed in our cleaned database dump that is used by developers, however it is still present in the production dump.
Open the database dump file (use an editor that can handle an extremely large file, such as `vi`) and remove the following line:

```
CREATE EVENT TRIGGER reassign_owned ON ddl_command_end EXECUTE PROCEDURE public.reassign_owned();
```
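If the dump is too large to edit comfortably, the trigger statement can instead be filtered out with `grep`. In this sketch, `dump.sql` is a tiny stand-in created purely for illustration; in practice you would run the `grep` line against the real decrypted dump and check the result before using it:

```shell
# Create a two-line stand-in for the real dump (illustration only).
printf '%s\n' \
  'SELECT 1;' \
  'CREATE EVENT TRIGGER reassign_owned ON ddl_command_end EXECUTE PROCEDURE public.reassign_owned();' \
  > dump.sql

# Drop the trigger line; everything else passes through unchanged.
grep -v 'CREATE EVENT TRIGGER reassign_owned' dump.sql > dump-no-trigger.sql
```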
Ensure that you’re logged in to Cloud Foundry and are in the production space (if that’s where you’re restoring to):

```
cf target -s production
```
Run the `populate-paas-db.sh` script in the digitalmarketplace-aws repo to import the decrypted database dump into the database. PaaS will prompt for a password (this will have been copied to your clipboard by the script - it’s as easy as Ctrl+V!):

```
./digitalmarketplace-aws/scripts/populate-paas-db.sh <path-to-decrypted-dump-file>
```
The restore script will take at least 15 minutes to run. If you need a cup of tea, now is the time.
Test that the data has restored correctly (https://dm-api-production.cloudapps.digital should respond even during maintenance mode).
Delete the unencrypted database dump file from your local machine.
Let stakeholders know that the restore has been completed.
Ensure the team has a plan for reconciling any lost data, and how this will be communicated to users.
Toggle maintenance mode to ‘recovery’ to restore access to the API apps only.
Re-sync the Elasticsearch indices for services and briefs, using the Jenkins catchup jobs.
Toggle maintenance mode to ‘live’ to restore access to the Frontend apps.