Jenkins

What we use Jenkins for

We use Jenkins for:

  • deploying applications through our release pipeline
  • running smoke tests and functional tests
  • running scheduled scripts (usually overnight)
  • tagging packaged libraries and Docker images, after a change has been merged

Jenkins can be accessed at https://ci.marketplace.team.

Our Jenkins jobs are defined in the digitalmarketplace-jenkins repo, in the job definitions directory.

Accessing and deploying jobs to Jenkins

Developers can have access to Jenkins once they are security cleared.

For HTTPS and SSH access to Jenkins you must be on the GDS network or VPN; allowed IP addresses are defined in digitalmarketplace-credentials/terraform/common.json. Authentication is done using your GitHub account; the list of allowed users is found in digitalmarketplace-credentials/jenkins-vars/jenkins.yaml.

For SSH access to Jenkins, we use developers' GitHub SSH keys. These keys are gathered by the keys Ansible task, which needs to be re-run to propagate any changes made to the set of trusted SSH keys. We also have a shared key called ci in the aws-keys directory of the digitalmarketplace-credentials repo, but you should not generally need to use it.

See Adding and removing access for new starters / leavers for a step-by-step guide.

Instructions on how to SSH into the Jenkins box or deploy new Jenkins jobs can be found in the digitalmarketplace-jenkins repo README.

Jenkins Logging

For debugging and auditing purposes, we have configured Jenkins to log various events. You can log into the CloudWatch console and look at the relevant log groups to see what’s been happening.

All of the below log events are streamed to AWS CloudWatch using the Amazon CloudWatch Logs Agent (see PR #173 for details).

Types of events  Log file (relative to /var/log)  CloudWatch log group  Further details
Job events       jenkins/audit-trail.log          jenkins-audit         PR #170
Web access       jenkins/access.log               jenkins-access        PR #171
SSH access       auth.log                         server-login-access   PR #172

Jenkins infrastructure

The infrastructure on which Jenkins relies is created in the main AWS account.

Shared Infrastructure: AWS resources created once per account and shared across all Jenkins instances:

  • ELB wildcard certificate
  • IAM profile/policy document
  • EBS snapshot policy
  • S3 bucket to store access logs

These are defined in the Terraform main account.

Per-instance Infrastructure: AWS resources created once for each Jenkins instance. Each Jenkins instance has its own:

  • EC2 instance
  • Elastic load balancer (ELB), that uses the shared certificate
  • DNS ‘A’ record in Route 53
  • Security groups for the EC2 and ELB instances

These are defined in the jenkins.jenkins module and can be instantiated in the main account as many times as we like (typically once). See Creating a new Jenkins instance for details on how to create these resources.

Shared Infrastructure

ELB certificate

Follow along here

The route53.tf file in the main account defines the certificate used for the Jenkins ELB (see below). The certificate is a wildcard that covers all subdomains of marketplace.team, which should make it easier if we have to replace the current Jenkins instance with a new one. The certificate is validated by a DNS record, which is also defined here.

Roles and policies

Follow along here

Jenkins has an instance profile. An instance profile is a way of applying an AWS IAM role (and its associated permissions) to an AWS EC2 instance, so that any actions performed by that instance have the permissions the role defines. The permissions that Jenkins has are defined in the jenkins_instance_profile.tf file in the main account.

The role the instance profile uses is defined here, along with a policy document containing a number of statements. These statements define what Jenkins can do, including accessing various S3 buckets and sending application logs to CloudWatch. If you’re looking to change Jenkins permissions, this is probably where you want to do it.

Permissions defined on other resources need to know the ARN (Amazon Resource Name) of the Jenkins instance profile in order to grant Jenkins access. This is why the profile is defined here, rather than in the module. A new module can reuse this instance profile without worrying about breaking permissions elsewhere.

Access log S3 bucket

Follow along here

For audit purposes we want to keep a log of who is accessing the Jenkins server; one way we do this is to configure the ELB to export access logs. These go to an S3 bucket, which is created once in the main account using the jenkins.log_bucket Terraform sub-module.

EBS snapshots

Follow along here

This module defines the snapshot policies for EBS volumes tagged as jenkins data (currently every 24 hours at 23:30). It is created once in the main account using the jenkins.snapshots Terraform sub-module.
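
As a small illustration of the schedule described above, the sketch below works out when the next snapshot would be due, assuming a fixed daily start time of 23:30. The function name is made up for this example and time zones are ignored; the real schedule is whatever the Terraform snapshot policy defines.

```python
from datetime import datetime, time, timedelta

# Assumed schedule, taken from the description above: once every 24 hours at 23:30.
SNAPSHOT_TIME = time(hour=23, minute=30)


def next_snapshot(now: datetime) -> datetime:
    """Return the first scheduled snapshot time strictly after `now`."""
    candidate = datetime.combine(now.date(), SNAPSHOT_TIME)
    if candidate <= now:
        # Today's slot has already passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

This can be handy when estimating how much data could be lost if the volume had to be restored from the most recent snapshot.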

Per-instance Infrastructure

EC2

Follow along here

This is the instance that Jenkins runs on. It’s based on a public image, currently ami-01e6a0b85de033c99, which is an Ubuntu 18.04 LTS image. If you’re particularly interested in Ubuntu images, read this list. For more information on finding an AMI, look at this useful post.

We currently use a t3.large instance, as these are charged at a lower ‘reserved’ pricing tier agreed with AWS by the Reliability Engineering team.

This section defines an elastic IP address which is associated with the instance. This is important as it allows us to add this IP to our whitelist for the router. The environment specific whitelists can be found in the credentials repo in the vars folder.

We also define a key pair to use - the public part is injected into the Terraform as a variable by the Makefile we use to run our Terraform code. This key lives in our credentials repo under aws-keys.

An extra 100GB volume is defined and attached to the instance. This volume is mounted during the Ansible setup.

The instance is defined with a couple more attributes, specifically an instance profile and a security group, which are described below.

ELB

Follow along here

Jenkins runs behind an elastic load balancer (ELB). The main reason for this is to take advantage of the free certificate available from AWS for load balancers.

The ELB listens on ports 443 and 22 and proxies traffic on to the Jenkins instance on ports 80 and 22 respectively. It terminates our HTTPS traffic and sends it on as HTTP. Both the ELB and the instance have strict security groups to protect this traffic, as well as being in the same VPC and subnet.

The ELB performs health checks on the instances that are attached to it (just our Jenkins instance). It does this by opening a TCP connection to port 22. Usually a health check would be performed on port 80, but when standing up a brand new Jenkins instance with Terraform there is nothing listening on port 80 yet. If the health check fails twice in a row, the instance is removed from the ELB. It is not destroyed.
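
The health check behaviour described above can be sketched roughly as follows. This is a simplified model and not how AWS implements it: the function names are invented for the example, and it only covers the "two failures in a row" rule, not the rules for putting an instance back into service.

```python
import socket
from typing import Sequence


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def in_service(results: Sequence[bool], unhealthy_threshold: int = 2) -> bool:
    """Return False once the most recent `unhealthy_threshold` checks all failed."""
    recent = results[-unhealthy_threshold:]
    return not (len(recent) == unhealthy_threshold and not any(recent))
```

A TCP check like this only proves that something is accepting connections on the port (here, the SSH daemon); it says nothing about whether Jenkins itself is responding.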

To keep an audit trail, we configure the ELB to export access logs every 60 minutes. These are saved to an S3 bucket (see Access log S3 bucket above).

DNS record (Route 53)

Follow along here

We create an A record for the dns_name variable passed in to the module. This will be something like ci.marketplace.team, but can actually use any subdomain due to our use of a wildcard certificate. Usually an A record would point directly to an IP address, but AWS allows you to define an alias to a load balancer instead. This is convenient, as the DNS name of a load balancer is long and horrible and not guaranteed to remain consistent. The code here will automatically grab the details required for this alias from the load balancer created elsewhere in the module.

Security Groups

Follow along here

This is where we define who can access the ELB and the EC2 instance. Two security groups are defined, one for each of the ELB and EC2, with different ingress (who can talk to the box) and egress (who the box can talk to).

The ELB security group allows TCP access to port 22 only for IP addresses in our list of developer IPs. These can be found in digitalmarketplace-credentials/terraform/common.json. Any traffic received on port 22 from a whitelisted IP is proxied directly to port 22 on the EC2 instance. Rather than defining the location of the EC2 instance with a CIDR block, we just define the source security group (the security group of the EC2 instance).

The ELB security group allows HTTPS access on port 443 from the list of developer IPs as well as from the elastic IP used by the EC2 instance. This is important, as some of the Ansible code requires the instance to make an HTTP request to itself. HTTPS traffic is terminated here and proxied on to port 80 on the EC2 instance via HTTP. Both the ELB and EC2 are in our private AWS VPC and on the same subnet within that, so the unsecured traffic never leaves our own network.

The EC2 security group works in a similar way to the ELB security group, except ingress is restricted to ports 80 and 22 and only to traffic coming from the ELB (by defining the source security group as the ELB’s security group). Egress for the instance is unrestricted.

Figure: Jenkins security group setup (jenkins-security-group-setup.png)

(The security group diagram source can be opened using draw.io in Google Drive if it needs to be edited.)
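
To make the ingress rules described above easier to reason about, here is a minimal model of them in Python. It is an illustration only and is not generated from the Terraform: the IP ranges are placeholder documentation addresses, the group name jenkins-elb is hypothetical, and the rule structure is a simplification of what AWS security groups support.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class IngressRule:
    port: int
    cidrs: Sequence[str] = ()           # source CIDR blocks allowed on this port
    source_group: Optional[str] = None  # source security group allowed on this port


def allowed(rules: Sequence[IngressRule], port: int,
            source_ip: Optional[str] = None,
            source_group: Optional[str] = None) -> bool:
    """Return True if any rule permits traffic to `port` from the given source."""
    for rule in rules:
        if rule.port != port:
            continue
        if source_group is not None and rule.source_group == source_group:
            return True
        if source_ip is not None and any(
            ip_address(source_ip) in ip_network(cidr) for cidr in rule.cidrs
        ):
            return True
    return False


# Placeholder values (assumptions): the real lists live in the credentials repo.
DEVELOPER_IPS = ["192.0.2.0/24"]
JENKINS_ELASTIC_IP = "198.51.100.10/32"

# ELB: SSH from developers only; HTTPS from developers and Jenkins' own elastic IP.
ELB_RULES = [
    IngressRule(port=22, cidrs=DEVELOPER_IPS),
    IngressRule(port=443, cidrs=DEVELOPER_IPS + [JENKINS_ELASTIC_IP]),
]

# EC2: ports 22 and 80 only, and only from the ELB's security group.
EC2_RULES = [
    IngressRule(port=22, source_group="jenkins-elb"),
    IngressRule(port=80, source_group="jenkins-elb"),
]
```

For example, allowed(ELB_RULES, 443, source_ip="192.0.2.7") is true for a developer address in the placeholder range, while allowed(EC2_RULES, 80, source_ip="192.0.2.7") is false because the instance only accepts traffic that arrives via the ELB.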

Installing Jenkins itself

Jenkins itself is configured using Ansible, while the jobs within Jenkins use Jenkins Job Builder.

Once the infrastructure steps above have completed, run make jenkins TAGS=all from the digitalmarketplace-jenkins repo to install and configure Jenkins and Jenkins Job Builder.

Warning

It is possible to manually edit config and jobs in the Jenkins web UI. However, any config changes that deviate from what Jenkins Job Builder maintains will be lost whenever Jenkins restarts. Make sure any changes are committed to the digitalmarketplace-jenkins repo and applied using make jenkins TAGS=config or make jenkins TAGS=jobs. This includes enabling/disabling jobs as well as editing what the jobs actually do.

We also use various Jenkins plugins (such as the pipeline workflow). More details about these can be found in the digitalmarketplace-jenkins repo README.

Destroying Jenkins

The Terraform module for Jenkins enables instance termination protection to make it harder to accidentally delete the root volume for the Jenkins instance. While protection is enabled, the instance cannot be terminated from the AWS console or CLI. If you want to terminate the Jenkins instance you can either remove it from the Terraform, or follow the instructions on AWS on how to change termination protection.
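
If you go down the route of changing termination protection yourself, one option is the EC2 ModifyInstanceAttribute API. The sketch below assumes you pass in a boto3 EC2 client; boto3 is not otherwise part of this setup, and the helper names are made up for the example.

```python
from typing import Any, Dict


def termination_protection_params(instance_id: str, enabled: bool) -> Dict[str, Any]:
    """Build the arguments for EC2 ModifyInstanceAttribute."""
    return {
        "InstanceId": instance_id,
        # DisableApiTermination=True means the instance is protected.
        "DisableApiTermination": {"Value": enabled},
    }


def set_termination_protection(ec2_client: Any, instance_id: str, enabled: bool) -> None:
    """Enable or disable termination protection using a boto3 EC2 client."""
    ec2_client.modify_instance_attribute(
        **termination_protection_params(instance_id, enabled)
    )
```

With boto3 installed and AWS credentials configured, this could be called as set_termination_protection(boto3.client("ec2"), "i-0123456789abcdef0", enabled=False), where the instance ID is a placeholder. Bear in mind that Terraform is likely to re-enable protection on the next apply if the module still asks for it.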

Rebuilding Jenkins from scratch

If the worst happens (or we just fancy it) and we need to completely recreate Jenkins, please see here for a rough guide.

How Jenkins uses credentials

As a CI/CD server, Jenkins needs access to many of our credentials in order to run jobs that automate our build and deployment processes, as well as to run other jobs as part of maintaining the Digital Marketplace. These credentials are exposed to our jobs in a few different ways, though our plan going forward is primarily to pass any secrets through environment variables.

Jenkins global configuration

A number of our Jenkins jobs expect to be given certain environment variables containing the credentials they need. Generally, where this is the case, these are sourced from Jenkins global configuration (Manage Jenkins -> Configure System -> Global properties). Here we define a number of tokens (primarily for the Data and Search APIs across all environments, and for Notify/Mailchimp). These tokens are statically defined in the digitalmarketplace-credentials repo in jenkins-vars/jenkins.yaml and are updated when Jenkins config is applied (with make jenkins TAGS=config from the digitalmarketplace-jenkins repository).

If these tokens are not in sync with what the apps in those stages are expecting, a number of our jobs will fail because they are unable to authenticate with, for example, the APIs.

Some examples of jobs that read from Jenkins global configuration:
  • stats_snapshots
  • notify_suppliers_of_new_questions_answers
  • smoke_tests
  • functional_tests
  • index_briefs
  • index_services
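
A script run by one of these jobs could read its token from the environment along the following lines. This is a sketch only: the helper, the exception class and the variable name in the usage note are hypothetical and are not taken from our job definitions.

```python
import os
from typing import Mapping


class MissingCredentialError(RuntimeError):
    """Raised when an expected credential is absent from the environment."""


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of environment variable `name`, failing loudly if unset or empty."""
    value = environ.get(name, "")
    if not value:
        raise MissingCredentialError(
            f"Environment variable {name} is not set; "
            "check the Jenkins global properties and re-apply the Jenkins config."
        )
    return value
```

For example, token = require_env("DM_DATA_API_TOKEN_PREVIEW") (a made-up variable name) would stop the job immediately with a clear message, rather than letting it fail later with an authentication error from the API.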

Decrypting dm-credentials

Other Jenkins jobs use a local checkout of the dm-credentials repository maintained on the Jenkins host. A separate job, update-credentials, pulls the latest master branch of the repository to keep the checkout up to date. Most of the jobs that use this method are those which interact with Cloud Foundry to deploy/manage our applications and services.

Some examples of jobs that decrypt dm-credentials on demand:
  • clean_and_apply_db_dump
  • database_migration_paas
  • release_application_paas
  • build_image