Category Archives: Configuration Management

Slaying the Development branch – Evolving from git-flow (Part 1)

Continuous Integration and Continuous Delivery (CI/CD) are essential to any modern, mission-critical software development life cycle.

The economic logic is simple. If you’re paying a developer a lot of money to fix bugs or add features to your software, it doesn’t make sense to have those bug fixes and features sitting in a build pipeline for 2 months waiting to be deployed. It’s the equivalent of spending money on stock for your grocery store and leaving it on a shelf in your loading bay instead of putting it in the shop window.

But taking legacy software development life cycles and refactoring them so that they can use CI/CD is a significant challenge. It is much harder to refactor embedded, relatively stable processes than to design new ones from the ground up.

This was a challenge I was faced with in my most recent employment. This article, and its sequel, describe some of the challenges I encountered and how they were resolved, focusing specifically on how we evolved our source control management strategy from one based on git-flow to one that permitted merging code changes directly from Feature branches to Production.

I’ll begin by describing the environment as I found it.

This was a multi-tenant Software as a Service (SaaS) platform provided over the Internet on a business-to-business basis. The SaaS comprised 16 individual services, backed by a mix of MySQL and PostgreSQL data stores. The services were built with Java (for processing and ETL operations) and Rails (for Web UI and API operations).

The business profile required parallel development streams, so source control was based on the git-flow model. Each project had a development branch, from which feature branches were taken. Feature branches were merged into concurrent release branches. Builds were created from release branches and deployed to the infrastructure tiers (Dev, QA, UAT, Staging, Prod). There was no historical deployment branch and no tagging. Each release cycle lasted approximately 6 weeks. A loose Agile framework applied, in that stories were part of releases, but Agile processes were not strictly followed.

Infrastructure used in the software development life cycle was shared. There were monolithic, centralised Dev, QA and UAT environments. Local developer environments were not homogeneous: Java developers couldn’t run local Rails apps and vice versa. All code was tested manually in centralised, shared environments.

The situation described above would be reasonably typical of software development environments which have evolved without a DevOps culture and dedicated Operations resources (i.e. where development teams build the environments and processes).

While the development/deployment process in this environment was working, it was sub-optimal, resulting in delays, cost overruns and issues with product quality.

A plan was developed to incrementally migrate from the current process to a CI/CD-based process. This involved changes to various functions, but perhaps the most important change was to the source control management strategy, which is what I want to deal with in detail in this article.

A typical development cycle worked as follows.

In every git project, the development branch was the main branch. That is to say, the state of all other branches was relative to the development branch (ahead or behind in terms of commits).
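If you want to see that relative state for yourself, git can report how far two branches have diverged; the branch names below are purely illustrative:

git rev-list --left-right --count development...feature/my-feature
# prints two counts: commits only on development (left) and commits only on the feature branch (right)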

For a scheduled release, a release branch was created from the development branch. For the purposes of illustration, let’s call this release1. Stories scheduled for release1 were developed in feature branches taken from development, which were then merged into release1. These features also had to be merged to development. When all features were merged to release1, release1 was built and deployed to QA.

At the same time, work would start on release2, but a release2 branch would not be created, nor would release2 features be merged to development, as development was still being used as a source for release1 features. Only when development for release1 was frozen could release2 features be merged to development, and only when release1 was built for Production was a release2 branch created.

This system had been inherited from a simpler time when the company was younger and the number of applications comprising the platform was much smaller. Its limitations were obvious to all concerned, but the company didn’t have a dedicated “DevOps” function until later in its evolution, so no serious attempt had been made to re-shape it.

From talking to developers, it became clear that the primary source of frustration with the system was the requirement to merge features to multiple branches. This was particularly painful when a story was pulled from a release, where the commit was reverted in the release branch but not in the development branch. It was not infrequent for features to appear in one release when they were scheduled for another.

After talking through the challenge, we decided on a number of requirements:

1. Features would only be merged to one other branch

2. We could have concurrent release branches at any time

3. We would have a single historical “Production” branch, called “deploy”, which was tagged at each release

4. At the end of the process, we would only be one migration away from true CI/CD (merging features directly to deploy)

5. We would no longer have a development branch

From the outset, we knew the requirement that would present the biggest challenge was to be able to maintain concurrent release branches, because when multiple branches stem from the same source branch, you always run the risk of creating merge conflicts when you try to merge those branches back into the source.

At this juncture, it’s probably wise to recap what a merge conflict is, as this is necessary to understand why we approached the challenge in the way that we did.

A merge conflict occurs between 2 branches when those branches have a shared history, but an update is made to the same line in the same file after those branches have diverged from their common history. If a conflict exists, only one of the branches can be merged back to the common source cleanly; the second merge stops and waits for someone to resolve the conflicting lines by hand.
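If you want to reproduce one in a scratch repository, the following minimal sketch (all file and branch names are illustrative) creates two branches from the same commit, changes the same line on each, and then attempts to merge one into the other:

git init conflict-demo && cd conflict-demo
echo "timeout=30" > app.conf
git add app.conf && git commit -m "initial config"
git branch release1                                        # both branches share this history
git branch release2
git checkout release1
sed -i 's/timeout=30/timeout=60/' app.conf && git commit -am "release1: raise timeout"
git checkout release2
sed -i 's/timeout=30/timeout=90/' app.conf && git commit -am "release2: raise timeout further"
git merge release1                                         # CONFLICT: both branches changed the same line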

If you think of a situation in which 2 development teams are working on 2 branches of the same project taken from the same historical branch, and those 2 branches ultimately have to be merged back to that historical branch, you can see how this could present a problem.

When you then extrapolate that problem out over 16 individual development projects, you see how you’re going to need a very clearly defined strategy for dealing with merge conflicts.

Our first step was to define at which points in the development cycle interaction with the source control management system would be required. This was straightforward enough:

1. When a new release branch was created

2. When a new patch branch was created

3. When a release was deployed to Production

We understood that at each of these points, source control would have to be updated to ensure that all release branches were in sync, and that whatever method we used to keep them in sync would have to be automated. In this instance, “in sync” means that every release branch should have any commits that are necessary for that release. For instance, if release1 were being deployed to Production, release2 should have all release1 commits after the point of deployment. Similarly, if we were creating release3, release3 should have all commits from release2, and so on.

We knew that managing multiple branches in multiple projects in this way was bound to produce merge conflicts, but at the same time we didn’t want a situation in which a company-wide source control operation was held up by a single merge conflict in a single project.

In light of this, we decided to do something a little bit controversial.

If our aim was to keep branches in sync and up to date, we decided that branching operations should focus on this goal, and that we would use a separate mechanism to expose and resolve merge conflicts. Crucially, this part of the process would occur prior to global branching updates, so that all branches arrived at the point of synchronisation in good order.

So, to return to the 3 points where we interacted with source control, we decided on the following:

1. When a new release branch was created

This branch would be created from the latest tag on the historical production branch (“deploy”). All commits from all other extant release branches would be merged to this branch, resulting in a new release branch that already contained all forward development. When a merge conflict existed between the new branch and an extant release branch, the change from the extant branch would be automatically accepted (git merge -X theirs).
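In git terms, the operation looked roughly like the sketch below; branch, tag and remote names are purely illustrative (note that “theirs” is an option to the default recursive strategy, not a standalone merge strategy):

git fetch origin --tags
git checkout -b release3 1.42.0                  # 1.42.0 = latest tag on the deploy branch
git merge -X theirs origin/release1              # fold in forward development, accepting the
git merge -X theirs origin/release2              # extant branch's side of any conflict
git push -u origin release3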

2. When a new patch branch was created

This branch would be created from the latest tag on the deploy branch. No commits from any other extant release branches would be merged to this branch, because we did not want forward development going into a patch. Because no extant release branches were being merged to the patch, there was no need to deal with merge conflicts at this point.
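As a sketch, again with illustrative names, a patch branch was simply cut from the last production tag:

git fetch origin --tags
git checkout -b patch-1.42.1 1.42.0
git push -u origin patch-1.42.1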

3. When a release was deployed to Production

At this point, the release branch would be merged to the deploy branch, and the deploy branch would then be merged to any extant release branches. This would ensure that everything that had been deployed to Production was included in forward development. When a merge conflict existed between the deploy branch and an extant branch, the change from the extant branch would be automatically accepted (git merge -X ours). The release branch that had been merged would be deleted, and the deploy branch tagged with the release version number.
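A rough sketch of the deployment-time operation, again with illustrative names:

git checkout deploy
git merge release1                               # the released branch goes into the historical branch
git checkout release2
git merge -X ours deploy                         # forward development keeps its own side of any conflict
git push origin deploy release2
git push origin --delete release1                # retire the released branch
git branch -d release1
git tag 1.43.0 deploy && git push origin 1.43.0  # tag the deploy branch with the release version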

We decided to refer to the automatic acceptance of changes during merge operations as “merge conflict suppression”. In the next part of the article, I’ll explain how we decided to deal with merge conflicts in a stable and predictable way.


Thoughts on Ansible variables

If you want to use Ansible to really empower your configuration management function, it’s important to have a solid understanding of how variables work.

Here are a few must-knows:

Values in ansible.cfg are configuration settings, not script variables

The ansible.cfg file allows the user to set default values that are used when Ansible is executed from the local environment. It is an INI-style file rather than a YAML file, which is why assignments use “=” rather than “:”.

The values in this file are read as configuration settings when Ansible runs (each setting can also be supplied via a corresponding environment variable, such as ANSIBLE_REMOTE_USER). You cannot access these values directly as script variables, e.g.

remote_user = root

does not provide you with a

{{ remote_user }}

variable in your playbooks.

It’s important to create your variables in the right place: inventory or play

Generally, a variable will apply to either a host (or group of hosts), or to a task (play) within a playbook. Decide early where your variable applies and create it in the right place.

For variables that apply to hosts (eg a username to login with) create the variable in either:

Your inventory file:

[server_group_1]

server1 ansible_ssh_user=admin

Under your group_vars directory:

#file: ./group_vars/server_group_1

ansible_ssh_user: admin

Under your host_vars directory:

#file: ./host_vars/server1

ansible_ssh_user: admin

You can also create host-related variables deeper in your playbook:

- hosts: webservers
  remote_user: admin

but I don’t recommend this. Ansible provides sufficient functionality to create an abstraction layer for variables above the play/task level, and it makes sense to use it.

For variables that are specific to plays, the value can be set closer to the point of execution, for example:

After the hosts specification:

- hosts: webservers
  vars:
     app_version: 12.03

As a parameter for the role that is being applied to the hosts:

- hosts: webservers
  roles:
    - { role: app, app_version: 12.03 }

Variables in Ansible have precedence rules

Particular care needs to be paid to precedence. In some instances, you may want a variable to have an absolute value which cannot be overridden by an assignment in any other part of the playbook or from the command line. In other instances you may wish to allow a variable to be overridden. These behaviours are controlled by where you create the assignment of the variable.
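To make that concrete, here is a rough illustration (file names and version numbers are just examples): assignments made closer to the point of execution generally win, and --extra-vars passed on the command line overrides everything else:

# group_vars/webservers     ->  app_version: 12.01   (inventory level)
# vars: block in the play   ->  app_version: 12.02   (overrides group_vars)
# --extra-vars on the CLI   ->  app_version: 12.03   (overrides both)
ansible-playbook site.yml -e "app_version=12.03"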


Installing Passenger for Puppet on Amazon Linux

Introduction

Puppet ships with a built-in web server called WEBrick. This is fine for testing and for use with a small number of nodes, but will cause problems with larger fleets of nodes. It is recommended to use the Ruby application server Passenger to run Puppet in production environments.

Setup

Provision a new server instance.

Install required RPMs. Use Ruby 1.8 rather than Ruby 2.0. Both are shipped with the Amazon Linux AMI at the time of writing, but you need to set up the server to use version 1.8 by default.

sudo yum install -y ruby18 httpd httpd-devel mod_ssl ruby18-devel rubygems18 gcc mlocate
sudo yum install -y gcc-c++ libcurl-devel openssl-devel zlib-devel git

Make Ruby 1.8 the default

sudo alternatives --set ruby /usr/bin/ruby1.8

Set Apache to start at boot

sudo chkconfig httpd on

Install Passenger gem

sudo gem install rack passenger

Update the location DB (you will need this to find files later)

sudo updatedb

Find the path to the installer and add this to the path

locate passenger-install-apache2-module
sudo vi /etc/profile.d/puppet.sh
 
export PATH=$PATH:/usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/bin/
 
sudo chmod 755 /etc/profile.d/puppet.sh

Make some Linux swap space (the installer will fail on smaller instances if this doesn’t exist)

sudo dd if=/dev/zero of=/swap bs=1M count=1024
sudo mkswap /swap
sudo chmod 0600 /swap
sudo swapon /swap

At this point, open a separate shell to the server (you should have 2 shells). This isn’t absolutely essential, but the installer will ask you to update an Apache file mid-flow, so if you want to do things to the letter of the law, a second shell helps.

Next, run the installer, and accept the default options.

sudo /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/bin/passenger-install-apache2-module

The installer will ask you to add some Apache configuration before it completes. Do this in your second shell. Add the config to a file called /etc/httpd/conf.d/puppet.conf. You can ignore the warning about the PATH.

<IfModule mod_passenger.c>
  PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10
  PassengerDefaultRuby /usr/bin/ruby1.8
</IfModule>

Restart Apache after you add this and then press Enter to complete the installation

Next, make the necessary directories for the Ruby application

sudo mkdir -p /usr/share/puppet/rack/puppetmasterd
sudo mkdir /usr/share/puppet/rack/puppetmasterd/public /usr/share/puppet/rack/puppetmasterd/tmp

Copy the application config file to the application directory and set the correct permissions

sudo cp /usr/share/puppet/ext/rack/files/config.ru /usr/share/puppet/rack/puppetmasterd/
sudo chown puppet:puppet /usr/share/puppet/rack/puppetmasterd/config.ru

Add the necessary SSL config for the Ruby application to Apache. You can append this to the existing puppet.conf file you created earlier. Note that you need to update this file to specify the correct file names and paths for your Puppet certs (puppet.pem in the example below). The entire file should now look like this:

LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/buildout/apache2/mod_passenger.so
<IfModule mod_passenger.c>
  PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10
  PassengerDefaultRuby /usr/bin/ruby1.8
</IfModule>
# And the passenger performance tuning settings:
# Set this to about 1.5 times the number of CPU cores in your master:
PassengerMaxPoolSize 12
# Recycle master processes after they service 1000 requests
PassengerMaxRequests 1000
# Stop processes if they sit idle for 10 minutes
PassengerPoolIdleTime 600
Listen 8140
<VirtualHost *:8140>
    # Make Apache hand off HTTP requests to Puppet earlier, at the cost of
    # interfering with mod_proxy, mod_rewrite, etc. See note below.
    PassengerHighPerformance On
    SSLEngine On
    # Only allow high security cryptography. Alter if needed for compatibility.
    SSLProtocol ALL -SSLv2 -SSLv3
    SSLCipherSuite EDH+CAMELLIA:EDH+aRSA:EECDH+aRSA+AESGCM:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:+CAMELLIA256:+AES256:+CAMELLIA128:+AES128:+SSLv3:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!DSS:!RC4:!SEED:!IDEA:!ECDSA:kEDH:CAMELLIA256-SHA:AES256-SHA:CAMELLIA128-SHA:AES128-SHA
    SSLHonorCipherOrder     on
    SSLCertificateFile      /var/lib/puppet/ssl/certs/puppet.pem
    SSLCertificateKeyFile   /var/lib/puppet/ssl/private_keys/puppet.pem
    SSLCertificateChainFile /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCACertificateFile    /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCARevocationFile     /var/lib/puppet/ssl/ca/ca_crl.pem
    #SSLCARevocationCheck   chain
    SSLVerifyClient         optional
    SSLVerifyDepth          1
    SSLOptions              +StdEnvVars +ExportCertData
    # Apache 2.4 introduces the SSLCARevocationCheck directive and sets it to none
    # which effectively disables CRL checking. If you are using Apache 2.4+ you must
    # specify 'SSLCARevocationCheck chain' to actually use the CRL.
    # These request headers are used to pass the client certificate
    # authentication information on to the puppet master process
    RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
    RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
    RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e
    DocumentRoot /usr/share/puppet/rack/puppetmasterd/public
    <Directory /usr/share/puppet/rack/puppetmasterd/>
      Options None
      AllowOverride None
      # Apply the right behavior depending on Apache version.
      <IfVersion < 2.4>
        Order allow,deny
        Allow from all
      </IfVersion>
      <IfVersion >= 2.4>
        Require all granted
      </IfVersion>
    </Directory>
    ErrorLog /var/log/httpd/puppet-server.example.com_ssl_error.log
    CustomLog /var/log/httpd/puppet-server.example.com_ssl_access.log combined
</VirtualHost>

The Ruby application is now ready. Install the puppet master application. Note: do NOT start the puppetmaster service or set it to start at boot.

sudo yum install -y puppet-server

Restart Apache and test using a new puppet agent. You can also import the SSL assets from an existing puppet master into /var/lib/puppet/ssl. This will allow your existing puppet agents to continue to work.
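As a minimal check from a fresh agent node (assuming the master is reachable as “puppet”, matching the certificate paths used above):

sudo service httpd restart
puppet agent --test --server puppet --waitforcert 60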

Allowing puppet agents to manage their own certificates

What?

Why would you want to allow a puppet agent to manage the certificates the puppet master holds for that agent? Doesn’t that defeat the whole purpose of certificate-based authentication in puppet?

Well, yes, it does, but there are situations in which this is useful, provided security is not a concern!

Enter Cloud Computing.

Servers in Cloud Computing environments are like fruit flies. There are millions of them all over the world being born and dying at any given time. In an advanced Cloud configuration they can have lifespans of hours, if not minutes.

As puppet generally relies on fully qualified domain names to match agent requests to stored certificates, this can become a bit of a problem: server instances that come and go in something like Amazon AWS are sometimes required to have the same hostname at each launch.

Imagine the following scenario:

You are running automated performance testing, in which you want to test the amount of time it takes to re-stage an instance with a specific hostname and run some tests against it. Your script both launches the instance and expects the instance to contact a puppet master to obtain its application.

In this case, the first time the instance launches, the puppet agent will generate a client certificate signing request, send that to the master, get it signed and pull the necessary catalog. The puppet master will then have a certificate for that agent.

Now, you terminate the instance and re-launch it. The agent presents another signing request, with the same hostname, but this time the puppet master refuses to play, telling you that it already has a certificate for that hostname, and the one you are presenting doesn’t match.

You’re snookered.

Or so you think. The puppet master has a REST API that is disabled by default, but which you can open up to receive HTTP requests to manage certificates. To enable the necessary feature, add the following to your auth.conf file:

path /certificate_status
auth any
method find, save, destroy
allow *

Restart the puppet master when you’ve done this.


sudo service puppetmaster restart

Next, when you start your server instance, include the following script at boot. It doesn’t actually matter when this is run, provided it is run after the hostname of the instance has been set.


#!/bin/bash

# Revoke this host's existing certificate on the master (the master is assumed
# to be resolvable as "puppet" and the agent's certname to match $HOSTNAME)
curl -k -X PUT -H "Content-Type: text/pson" --data '{"desired_state":"revoked"}' https://puppet:8140/production/certificate_status/$HOSTNAME

# Remove the revoked certificate from the master entirely
curl -k -X DELETE -H "Accept: pson" https://puppet:8140/production/certificate_status/$HOSTNAME

# Remove the agent's local SSL assets so a fresh signing request is generated
rm -Rf /var/lib/puppet/ssl/*

# Run the agent: this generates a new CSR, gets it signed and applies the catalog
puppet agent -t

This will revoke and delete the agent certificate on the master, delete the agent’s local copy of the certificate and re-run the signing process, giving you new certs on the agent and master and allowing the catalog to be applied on the agent.

You can also pass a script like this as part of the Amazon EC2 process of launching an instance.

aws ec2 run-instances  --user-data file://./pclean.sh

Where pclean.sh is the name of the locally saved script file, saved in your current working directory (otherwise include the absolute path).

With this in place, each time you launch a new instance, regardless of its hostname, it will revoke any existing cert that has the same hostname, and generate a new one.

Obviously, if you are launching hundreds of instances at the same time, you may have concurrency issues, and some other solution will be required.

Again, this is only a solution for environments where security is not an issue.