Encryption made easy

Encryption can be a bit of a black box for people working in operations and engineering, but when it’s broken down into basic mathematics, it’s a lot more accessible.

I’ve written a collection of Python scripts that demonstrate the most common implementations of encryption: Diffie-Hellman Key Exchange, RSA Public Private Key Encryption, and Elliptic Curve Encryption.

https://github.com/garrethmcdaid/python-toy-encryption
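To give a flavour of the maths involved, here’s a minimal Diffie-Hellman key exchange sketch in Python. It is a toy illustration with deliberately tiny numbers, not code taken from the repository, and nowhere near production strength:

import secrets

# Publicly agreed parameters (toy values; real DH uses primes of 2048 bits or more)
p = 23   # a small prime modulus
g = 5    # a generator

# Each party picks a private key and derives a public key
alice_private = secrets.randbelow(p - 2) + 1
bob_private = secrets.randbelow(p - 2) + 1
alice_public = pow(g, alice_private, p)   # g^a mod p
bob_public = pow(g, bob_private, p)       # g^b mod p

# Each party combines the other's public key with its own private key
alice_shared = pow(bob_public, alice_private, p)   # (g^b)^a mod p
bob_shared = pow(alice_public, bob_private, p)     # (g^a)^b mod p

assert alice_shared == bob_shared   # both sides arrive at the same shared secret
print("shared secret:", alice_shared)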

The anatomy of a Bitcoin transaction

1. Origination

Every new Bitcoin transaction starts with an existing Bitcoin transaction.

For the purposes of this explanation, let’s assume that a Wallet means a single Public Private Key Pair. The Wallet will have previously scanned the blockchain and know that a particular transaction in a particular block has an output that contains an amount of Bitcoin that has been sent to the Wallet’s address (which is derived from the Wallet’s public key).

The Wallet can then begin to construct a new transaction based on the output of the existing transaction.

2. Inputs

The Wallet first creates a list of inputs. In our case, we will assume only 1 input, but there can be more than one.

This input refers to the ID of the existing transaction, and the sequence number of the output in that transaction (there can be multiple outputs in a transaction too).

The input also contains an unlocking script. This comprises a digital signature, created with the Wallet’s Private Key, and a copy of the Wallet’s Public Key (remember, the output represents Bitcoin sent to the Wallet creating the transaction).

When the transaction is later validated by a Bitcoin node, the digital signature will be checked to verify that it was created with the Private Key corresponding to the supplied Public Key.

3. Outputs

The Wallet now creates a list of outputs. In our case, we will assume only 1 output, but there can be more than one. In most cases, there are at least 2 outputs, where the 2nd output is created to return a balance of change to the Wallet that is sending the Bitcoin.

An output must contain the amount of Bitcoin that is being transferred ( this becomes a UTXO, an Unspent Transaction Output ).

An output will also include a locking script, which includes a hash of the Public Key of the Wallet to which the Bitcoin is being sent ( the “address” of the destination Wallet ).

4. Validation

Once the Wallet has completed the construction of the transaction, the transaction is passed to the nearest Bitcoin node, which forwards it to the Bitcoin node network. Each node that receives the transaction validates it.

The validation process can be summarised as follows:

  1. The signature in the Input unlocking script is checked to ensure it is a well-formed signature
  2. The Public Key in the Input unlocking script is hashed to produce a PublicKeyHash
  3. This PublicKeyHash is compared to the PublicKeyHash in the locking script of the Output being spent ( the address the Bitcoin was sent to == the spending Wallet’s address )
  4. The signature in the Input unlocking script is verified against the Public Key supplied with it, proving both were produced by the holder of the same Private Key
  5. The combined execution of the unlocking and locking scripts must evaluate to TRUE (a simplified sketch of these steps follows below)
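A highly simplified Python model of these steps is sketched below. It is not Bitcoin Script, and the signature check is a stand-in (real nodes verify an ECDSA signature over the transaction data against the supplied Public Key using secp256k1), but the flow is the same:

import hashlib

def hash_public_key(public_key: bytes) -> bytes:
    # Real Bitcoin uses RIPEMD160(SHA256(public key)); plain SHA-256 stands in here
    return hashlib.sha256(public_key).digest()

def verify_signature(signature: bytes, public_key: bytes, tx_data: bytes) -> bool:
    # Stand-in for real ECDSA verification of the signature over tx_data
    return signature == b"signature-from-matching-private-key"

def validate_input(unlocking_script, locking_script, tx_data: bytes) -> bool:
    signature, public_key = unlocking_script      # supplied by the spending Wallet
    expected_pubkey_hash = locking_script         # taken from the Output being spent

    # Steps 2 and 3: hash the supplied Public Key and compare to the locking script
    if hash_public_key(public_key) != expected_pubkey_hash:
        return False

    # Steps 1 and 4: verify the signature against the supplied Public Key
    return verify_signature(signature, public_key, tx_data)

pubkey = b"wallet-public-key"
locking_script = hash_public_key(pubkey)                              # previous Output
unlocking_script = (b"signature-from-matching-private-key", pubkey)   # new Input
print(validate_input(unlocking_script, locking_script, b"tx-data"))   # True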

Once a node validates the transaction in this way, it places it in the transaction pool. The transaction pool exists on every node, so every node that receives the transaction will have that transaction in its transaction pool.

5. Mining

Each mining node (which is also a standard node) will also have its own copy of the transaction pool. The miner will attempt to create a new block by bundling transactions from the transaction pool into a candidate block and attempting to solve the blockchain’s Proof of Work requirement for a new block.

6. Insertion in a new block

If the miner is successful in adding a new block to the blockchain, the transaction will be included in that block and be available for another Wallet to start the process of creating a new transaction.

Summary

A Wallet has a record of which blockchain transactions have sent it Bitcoin.

The Wallet makes a new transaction to spend these, providing proof that the previous transaction output belongs to it.

The new transaction indicates another Wallet which is entitled to the forwarded Bitcoin, and tells the destination Wallet what proof is required for it to use that output in a transaction of its own.

Understanding the nonce in blockchain validation

With a reasonable amount of research, the concept of a blockchain is accessible to most technically minded people. Simply put:

A blockchain is a public ledger. Participants compete to add blocks of transactions to the ledger, in return for a reward. This competition is based on problem solving. The participant who solves the problem gets to add the new block, and the transaction that creates their reward is included in that block. The block also contains the solution to the problem, so that other participants can verify that the participant solved the problem.

The detail is a little bit trickier, particularly when it comes to the problem solving step.

In the Bitcoin blockchain, problem solving means creating a hash of all of the contents of the candidate block and converting it into a binary string (e.g. 0000011001110010). The problem is solved when the participant (a miner in the case of Bitcoin) can produce a string with a minimum number of leading zeros. The number of leading zeros required is derived from the difficulty recorded in the previous block in the chain, which the network adjusts according to how quickly recent blocks were mined. Bitcoin self-regulates the rate at which new blocks are created by adjusting this “difficulty” level up and down.

But here’s the thing:

If a miner knows that it must calculate a binary string with x leading zeros, why doesn’t it just create such a string with a line of code (a fake or forged string, if you will), add that to the block, submit the block for validation by other miners and wait for its reward?

Bitcoin (and other blockchains) obviously deal with this, but how they do so may not be immediately apparent.

When a miner attempts to solve the puzzle, it bundles all the data in the block into a string, hashes it and then converts it to binary format. However, to produce a different string each time, in order to find one that meets the requirements of the puzzle, it needs to slightly alter the input to the hash, as otherwise it would just produce the same string on every attempt.

This altered input is the nonce. A basic nonce is just a number incremented by one on each attempt. For example, in the first iteration the miner would include a nonce of “1”, in the second “2”, in the third “3”, and so on.

If the miner finds a string that solves the puzzle, it includes both that string *and* the nonce in the block.

Now when another miner goes to validate that block, it includes the same nonce as the miner that originated the block in the hashing function, and derives the same binary string as is required by the puzzle.

If the originating miner had simply faked/forged the binary string, they would not know the nonce required to generate it, and no other miner would be able to validate it. The block would therefore be rejected.
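A rough Python sketch of the mine-and-validate cycle. This is a simplification (real Bitcoin double-hashes a block header with SHA-256 and compares the result against a numeric target rather than counting zeros in a string), but the role of the nonce is identical:

import hashlib

def hash_to_binary(data: str) -> str:
    # Hash the block contents and express the digest as a 256-character binary string
    digest = hashlib.sha256(data.encode()).hexdigest()
    return bin(int(digest, 16))[2:].zfill(256)

def mine(block_data: str, difficulty: int):
    nonce = 0
    while True:
        candidate = hash_to_binary(block_data + str(nonce))
        if candidate.startswith("0" * difficulty):
            return nonce, candidate   # both the nonce and the string go in the block
        nonce += 1

def validate(block_data: str, nonce: int, claimed: str, difficulty: int) -> bool:
    # Re-run the same hash with the published nonce; a forged string fails here
    recomputed = hash_to_binary(block_data + str(nonce))
    return recomputed == claimed and recomputed.startswith("0" * difficulty)

nonce, proof = mine("block transactions...", difficulty=16)
print(validate("block transactions...", nonce, proof, difficulty=16))   # True

Change the nonce, the block data or the claimed string and validate() returns False, which is exactly why a forged string gets a block rejected.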

Simple, but effective!

Basic guide to the issues around CNAME at the zone apex

What is a CNAME?

In DNS, CNAME stands for “canonical name”.

“Canonical” is one of those words you hear every now and then in technology discussions, but not many people are exactly sure what it means. In effect, it means “official”, which can be further extrapolated to “universally accepted”.

So in a DNS zone file, a CNAME looks like this:

www.nightbluefruit.com. 3600 IN CNAME www.nightbluefruit.hostingcompany.com.

In this case, the canonical name for the server which hosts www.nightbluefruit.com is:

www.nightbluefruit.hostingcompany.com

An alias (“also known as”) of that canonical name is:

www.nightbluefruit.com

So if you request this name, which is an alias, the response you get will be the data for the canonical name, for any type of record. For example, if you request the MX data for www.nightbluefruit.com, you will/should receive the MX data for www.nightbluefruit.hostingcompany.com.

At this point, it is important to understand that www.nightbluefruit.com is an alias of www.nightbluefruit.hostingcompany.com, and not the other way around.

What is the zone apex?

The zone apex is the part of a DNS zone for which data exists for the entire domain, rather than for a specific host in the domain.

So if you want to define an MX record for the entire domain, nightbluefruit.com, you would create that record at the apex:

nightbluefruit.com. 3600 IN MX 10 mail.nightbluefruit.com.

It’s worth noting that there is a difference between the zone apex and the default (wildcard) record for a domain:

*.nightbluefruit.com. 3600 IN A 192.168.100.1

The default (wildcard) value for a particular record type defines the data that should be returned if a request is received for a name which does not have a specific match in the zone file.

What’s a CNAME at the zone apex?

A CNAME at the zone apex looks like this:

nightbluefruit.com. 3600 IN CNAME www.nightbluefruit.hostingcompany.com.

This is a common requirement for website owners who want to advertise their website without the “www” prefix and who also use the services of a 3rd party web hosting company that cannot provide them with a dedicated IP address, and instead provides them with a canonical name for the website running on its infrastructure.

Why is it not allowed?

Given that so many web owners have the requirement outlined above, it seems incredible that this isn’t allowed, which is why this is such a hot topic.

Firstly, let’s clarify that there is no explicit RFC rule that says “you can’t have a CNAME at the zone apex”.

What the RFCs do say is that if you define a CNAME with a name (the left side of the record declaration) of “nightbluefruit.com”, you can’t create any other record of any other type using the same name. For example, the following would be non-compliant:

nightbluefruit.com. 3600 IN CNAME www.nightbluefruit.hostingcompany.com.
nightbluefruit.com. 3600 IN MX 10 mail.nightbluefruit.com.

The reason for this goes back to understanding that the “alias” is the left side and the “official” name is the right side. If something is “official”, there can only be one version of it. The first record in the above sequence tells DNS that www.nightbluefruit.hostingcompany.com is “official” for nightbluefruit.com, so DNS doesn’t want to know about any other records for nightbluefruit.com.

But, you’ll ask, why can’t DNS simply segregate “officialness” based on the record type? The answer to this is that the DNS standard came into being long before HTTP or any contemplation of shared web hosting, and it is no longer practical to reverse engineer all of the DNS software that has grown out of the standard to fit this specific use case.

Is this strict?

This is where it starts to get interesting.

Given that so many people want CNAMEs at their zone apex ($$$$), many software developers and their product managers have taken a sideways look at the RFC and determined that it permits a degree of flexibility in its implementation:

If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different.

The key phrase is “should be”. The argument runs that the absence of the phrase “must be” is in fact a license to interpret the standard more liberally, and there are DNS server implementations on the market that will allow CNAME records to co-exist with other records with the same name.

If you’re using a web hosting or DNS provider who says you can have a CNAME at the zone apex, they will be using such an implementation. This isn’t something that exists only in the dark backstreets of web hosting. Global internet service providers like Cloudflare have permitted CNAMEs at the zone apex.

In truth, this interpretation exists on very shaky foundations. There is a more detailed discussion of this issue here.

How does this problem manifest itself?

RFCs exist to allow different people to design software that will interoperate. If person A is writing code to implement a DNS server, and person B is writing software to implement a DNS client, and they both follow the standard, the 2 pieces of software should work together. The RFC doesn’t tell them how to write code, but it does tell them how their code should behave.

When people begin interpreting the intent of an RFC, nothing good comes of it. It may not be immediately apparent, but the longer software exists, the more edge cases it has to deal with, and that’s where it becomes important that one piece of software can anticipate the response of another.

In terms of a practical example, this is really good:
https://social.technet.microsoft.com/Forums/exchange/en-US/b3beefee-e353-44ec-b456-f2c70bcd1913/cname-issue?forum=exchange2010

In this case, MS Exchange Server 2010 stopped delivering mail to addresses whose DNS zone had a CNAME at the zone apex. The Exchange mail server was relying on a local DNS cache. Previously, someone had queried an A record for company.com, and received a CNAME response. That data was cached. Later, when the MX record for company.com was queried, the cache ignored the fact that the cached record was an A record (this was compliant behaviour) and returned a CNAME. The Exchange server correctly rejected this as CNAMEs are not valid data for MX queries.

Are there workarounds?

The first workaround is to not implement the RFC standard. Some providers will tell you that this is their workaround, but it isn’t. It’s just chicanery, and you should avoid it.

The big cloud hosting companies are the best place to go for workarounds. Amazon AWS have a black box solution in Route53 which allows CNAMEs at the zone apex if the canonical name is an AWS resource identifier (like an ARN for an Elastic Load Balancer).

The most en vogue workaround at the moment is what is called CNAME flattening.

What is CNAME flattening?

DNS server software that implements CNAME flattening permits a CNAME to co-exist with other records of the same name (which is unavoidable at the apex, where SOA and NS records must also exist), and this allows the user to create a CNAME at the zone apex. When you configure the zone file in this way, the server will accept it and start up as normal.

When a query is then received for one of these records, rather than return the CNAME value, the server will go off and query the CNAME value, and any subsequent values, until it gets to an IP address, and then return that IP address as the response to the original query.
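Conceptually, a flattening resolver behaves something like the following Python sketch, where the zone data is an in-memory dictionary purely for illustration:

# A toy zone: the apex has a CNAME, which resolves onwards to an A record
ZONE = {
    ("nightbluefruit.com.", "CNAME"): "www.nightbluefruit.hostingcompany.com.",
    ("www.nightbluefruit.hostingcompany.com.", "A"): "203.0.113.10",
}

def flattened_lookup(name: str, rtype: str = "A") -> str:
    # Follow the CNAME chain until a record of the requested type is found,
    # then return that record instead of exposing the CNAME to the client
    for _ in range(10):                       # guard against CNAME loops
        if (name, rtype) in ZONE:
            return ZONE[(name, rtype)]
        if (name, "CNAME") in ZONE:
            name = ZONE[(name, "CNAME")]
            continue
        raise LookupError(f"no {rtype} record for {name}")
    raise LookupError("CNAME chain too long")

print(flattened_lookup("nightbluefruit.com."))   # 203.0.113.10, not the CNAME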

Is CNAME flattening standards compliant?

Yes and no.

On the one hand it permits the existence of something that the RFC says is not permitted, but equally, it behaves in a way that is RFC compliant.

Whether a user wants to rely on CNAME flattening is a call they will have to make according to their individual circumstances.

Using the map directive to resolve multiple conditions in Nginx

As your Nginx configuration expands and becomes more complex, you will inevitably be faced with a situation in which you have to apply configuration directives on a conditional basis.

Nginx includes an if directive, but they really don’t want you to use it: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/

The alternative that is generally recommended is the map directive, which is super-efficient as map directives are only evaluated when they are used: http://nginx.org/en/docs/http/ngx_http_map_module.html

This isn’t quite as intuitive as the if directive, and newcomers to Nginx can struggle with its logic.

Put simply, map will create a new variable, and assign the value of that variable based on the value of another variable, e.g.

$new_variable = 1 if $existing_variable = true

or

$new_variable = 0 if $existing_variable = false.

map $existing_variable $new_variable {
    "true" 1;
    "false" 0;
}

You can leverage this in conditional configuration by assigning a configuration value to a new variable, and using that in your configuration. For example, use a different whitelist depending on the source ip address of the request:

map $remote_addr $whitelist {
    "192.168.100.1" "whitelist1.incl";
    "192.168.100.2" "whitelist2.incl";
}

location / {
   ...
   include $whitelist;
   ...
}

This works fine when you want to set the value of a new variable based on the evaluation of one other variable, but in a typical if statement, you can evaluate multiple variables at the same time:

if ( $a == 1 && $b == 2 ) etc

Can you do the same thing with map?

Some people will tell you that you can’t, which is technically true, but you can “couple” map blocks to produce the same effect. Let’s use the following example:

If a request is for host "example.com" and the source ip address is 192.168.100.1, return a different home page.

In this instance, we need 2 map blocks:

  • One to test if the host name is “example.com”
  • One to test if the source ip address is 192.168.100.1

and 2 variables:

  • One to hold the value of the home page file ($index_page)
  • One to link the map conditions together ($map_link)

#Test the host name, and assign the home page filename value to the $map_link variable
map $host $map_link {
    default "index.html";
    "example.com" "index_new.html";
}

#Test the source address, and if it matches the relevant address, interpolate the value assigned from the previous map
map $remote_addr $index_page {
    default "index.html";
    "192.168.100.1" "${map_link}";
}

location / {
    ....
    index $index_page;
    ....
}

The full logic here is as follows:

If the hostname is “example.com”, we should provisionally make the home page file “index_new.html”. Otherwise, the home page file should be “index.html”.

If the source ip address is 192.168.100.1, which is the second part of our test, we should refer to the result of the first part of our test. Otherwise, we can ignore the first part of our test and use the default value “index.html”.

What to expect when running a Bitcoin Core Full Node

This post is not about how to install Bitcoin Core and run it as a full node.

It’s about what you can expect when you install Bitcoin Core and run it as a full node.

About full nodes

Firstly, let’s clarify the role of a full node. A node is any system that connects to the Bitcoin network. A full node is any system that connects to the Bitcoin network and retains a full and up to date copy of the blockchain.

A full node is not a miner. Nodes simply relay information. By contrast, a miner specifically listens for transactions and tries to create new blocks. Miners do not typically hold a copy of the blockchain.

A full node fulfils one of two roles.

If you simply run a full node, and do not use it as a wallet, all your node is doing is adding to the capacity of the Bitcoin Network to relay information between nodes. This is of some value, but not hugely significant.

If you run a full node and use it as a wallet (either directly or by linking a client wallet to your node), your full node adds to the economic strength of the Bitcoin Network. It does this by enforcing the consensus rules for transactions. The more nodes that enforce the consensus rules, the more difficult it is for malicious nodes to break that consensus.

It is also worth pointing out that running your Bitcoin transactions through your own node is the purest form of transacting in Bitcoin. You have your own copy of the blockchain and can verify transactions for which you are a beneficiary without having to rely on someone else’s copy of the blockchain. It is the most accurate interpretation of the common Bitcoin axiom of “being your own bank”.

Hardware

If you’re an individual and not employed by a software firm involved in Bitcoin, or some other agency tasked with promoting Bitcoin, chances are you’re going to run Bitcoin Core on some old system that you might otherwise have recycled/dumped.

Generally, this is OK. Where your system is going to need the most poke is in downloading the blockchain and verifying each block as it comes in. Once you have downloaded the entire blockchain, a new block is created roughly every 10 minutes, so your system will have a 10 minute break between processing calls. Prior to this, when you’re downloading blocks one after the next in order to complete the blockchain, your system will exhaust its RAM and disk IO. It’s quite normal for your system to become momentarily unresponsive in this phase.

For reference, I downloaded the blockchain on an 8 year old mini-PC with 4GB of RAM and a 300GB disk. You’re going to need 180GB of disk at the time of writing to accommodate the current block chain.

Network

Something similar applies in respect of the network. The current blockchain is ~180GB, so you’re going to have to download this. There is no hurry with this. You can stop and start the Bitcoin daemon as often as you want. It will just pick up where it left off when you restart. I set up a cron schedule on mine to start the daemon at 23:00 and stop it again at 08:00, so that the node wasn’t interfering with my day to day download requirements. It took me 5-6 weeks to get the entire blockchain.

At the beginning blocks will rack up really quickly, as the first blocks weren’t full and were considerably smaller than the 1MB limit. As you get into the last 50k blocks (out of 500k at time of writing), where all blocks are full, things slow down significantly.

Once you have the entire chain, the load on the network eases, as you’re only picking up one new 1MB block every 10 minutes. There is also a bit of chatter re. notifications but nothing substantial.

One point to note:

If the Bitcoin daemon isn’t shut down cleanly, the next time it starts, it will re-verify all the blocks it downloaded during its last run. During this time, it won’t download any new blocks, and the RPC service won’t be available to process calls. If the previous run was particularly long, this process will also take a long time. You can check the log to see that this is happening. All you can do is let it run. If the daemon gets improperly killed again, the whole process will start again when the Bitcoin daemon is restarted. You should really, really try to avoid letting the daemon stop unexpectedly. Never kill the daemon.

Checking status

How do you know when you have downloaded the full blockchain?

One way you’ll know is when the Bitcoin daemon is running, but your disk isn’t thrashing and your system is generally responsive. That generally means you have all the blocks and the daemon is just sitting there waiting to pick up the next one that becomes available.

You can obviously verify this with an RPC call too:

root@ubuntu:~# bitcoin-cli getblockchaininfo | grep blocks -A 1
 "blocks": 508695,
 "headers": 508695,

This tells you that the service can see 508,695 block headers on the blockchain, and that all 508,695 blocks are present on your system.

If you stop your system for a few hours or days, and run this command again when you restart it, the number of blocks will be lower than the number of headers, and your system will start thrashing again as it catches up. The longer the gap, the longer the catch up period. When your system is catching up, it has no value to the Bitcoin network, so try and organise your system so that it is always on, with cron controlling whether or not the Bitcoin daemon is running.
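If you want to script this check (for example, to decide whether the node has caught up before you do anything else with it), a small Python wrapper around bitcoin-cli does the job. A sketch, assuming bitcoin-cli is on your PATH and can reach your node:

import json
import subprocess

def blockchain_info() -> dict:
    # Shell out to bitcoin-cli and parse its JSON output
    out = subprocess.run(
        ["bitcoin-cli", "getblockchaininfo"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

info = blockchain_info()
behind = info["headers"] - info["blocks"]
print(f"blocks: {info['blocks']}, headers: {info['headers']}, behind by: {behind}")
if behind == 0:
    print("Node appears to be fully synced")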

Everything you want to know about AWS Direct Connect but were afraid to ask

AWS Direct Connect is one of those AWS services that everybody knows about but not too many people use. I’ve recently been involved in the set up of a redundant AWS Direct Connect link. To assist others considering doing the same, I’m sharing what I’ve learned.

MTUs?

This is big.

Within an AWS availability zone, and between availability zones in the same region, EC2 instances use jumbo frames. However, jumbo frames are not supported on AWS Direct Connect links, so you will be limited to a maximum MTU of 1500. You may wish to consider the implications of this before you consider using AWS Direct Connect.

Update: as of Nov 2018, Jumbo Frames are now supported on AWS Direct Connect!

Otherwise…

What is AWS Direct Connect?

It’s a dedicated link between a 3rd party network and AWS. That means data flows over a dedicated, isolated connection, which means you get dedicated, consistent bandwidth, unlike a VPN, which flows over the public Internet.

How is it provisioned?

You have 2 choices. AWS partners with co-location data centre providers across their various regions. This involves AWS dropping wholesale connectivity directly into the Meet Me Rooms in these 3rd party data centres. If your equipment is located in one of these data centres, your AWS Direct Connect connection is then simply patched from your cabinet into the Meet Me Room. This is called a Cross Connect.

If you are not using one of AWS’s co-location data centre partners, you can still make a Direct Connect link from your corporate network to AWS. This involves linking your corporate network to one of the data centres where AWS has a presence in the Meet Me Room, from where you can make an onward connection to AWS. The Direct Connect documentation lists telecoms providers in each region who can provide this service, and the data centres to which they can make connections.
https://aws.amazon.com/directconnect/partners/

What speeds are available?

By default, you can get either a 10Gbps or 1Gbps connection, but you can also consult directly with the AWS partners to get lower speed connections.

What do you pay?

You pay per hour for the amount of time your connection is “up” (connected at both ends). What you pay per hour depends on the speed of your connection. If you provision a connection but it isn’t “up”, you don’t pay, unless you leave that unconnected connection in place for > 90 days (after which you start paying the standard rate).

You also pay per GB of data transferred from AWS to your location. You don’t pay for data transferred from your location to AWS.

What if I need more than 10GB?

You can aggregate multiple 10Gbps connections together.

How stable are the connections?

Whereas connecting to AWS with a VPN provides for 2 BGP routes from your location to AWS, a Direct Connect link is a single point of failure. It is thought (presumed?) that AWS provide for a certain level of redundancy once the connection leaves the Meet Me Room in the data centre, but there are no guarantees about this and AWS do not offer an SLA for connectivity.

What hardware do I need?

You will need L3 network hardware. It will need to be able to do BGP routing and support BGP MD5 authentication passphrases. It will need to have sufficient port speed to connect to the Direct Connect uplinks you have provisioned. If this is a virgin install in a co-location data centre, there are switches available that can do both L3 and L2, handle BGP and provide redundancy for 2 Direct Connect connections. This negates the need to purchase both routers and switches. You should be able to get this kit for < €20,000. Providers will almost certainly try to sell you more expensive kit. If you’re using Direct Connect, they presume money is no object for you.

What are the steps required to set up a connection?

Decide if you need a single connection or if you’re going to need a pair of redundant connections.

Decide what speed connection you need. Don’t guess this. Estimate it based on current network traffic in your infrastructure.

Design your IP topology.

If you are going to use one of the co-location data centres, contact them. Otherwise, contact one of the Telecoms Provider partners. They will provide pricing/guidance in terms of connecting your equipment or location to the relevant Meet Me Room.

Procure the termination hardware on your side of the connection.

Once you have provisioned your connection and hardware, start building your configuration on the AWS side of the connection.

What do I need in terms of configuring the VPC I am connecting to?

Typically, you will be connecting resources in a VPC to your co-location data centre or on-premises infrastructure. There are a number of hops between a VPC and a Direct Connect connection.

Working out from the VPC, the first thing you need is a Virtual Private Gateway (AWS denotes these as VGW, rather than VPG). This is logically a point of ingress/egress to your VPC. You will be asked to choose a BGP identifier when creating this. If you use BGP already, supply the value you need. Otherwise, let AWS generate one for you.

When you have created this, you next create a Route Table that contains a route for the CIDR of your co-location data centre or on-premises infrastructure that points to the VGW you created earlier.

Next, create a subnet (or subnets), or use an existing one, and attach the Route Table to that subnet. Any resources that need to use the Direct Connect connection must be deployed in that subnet. It’s probably worth deploying an EC2 instance in that subnet for testing.

This is all you need to do in the VPC configuration (you can apply NACLs, security etc later. Leave everything open for now for testing.)
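If you prefer to script the VPC side rather than click through the console, the same steps look roughly like this with boto3. This is an illustrative sketch only: the region, VPC ID, subnet ID and on-premises CIDR are placeholders you would substitute with your own values:

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

VPC_ID = "vpc-0123456789abcdef0"        # placeholder
SUBNET_ID = "subnet-0123456789abcdef0"  # placeholder
ONPREM_CIDR = "10.50.0.0/16"            # placeholder: your co-lo / on-premises range

# 1. Create the Virtual Private Gateway (VGW) and attach it to the VPC
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]["VpnGatewayId"]
ec2.attach_vpn_gateway(VpnGatewayId=vgw, VpcId=VPC_ID)

# 2. Create a Route Table with a route for the on-premises CIDR via the VGW
rtb = ec2.create_route_table(VpcId=VPC_ID)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rtb, DestinationCidrBlock=ONPREM_CIDR, GatewayId=vgw)

# 3. Associate the Route Table with the subnet that will use Direct Connect
ec2.associate_route_table(RouteTableId=rtb, SubnetId=SUBNET_ID)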

How do I set up the Direct Connect configuration on the AWS side?

Once you’ve configured your VPC, you now need to configure your Direct Connect service (you don’t need to do these in any particular order. You can start with Direct Connect if you like).

Create the connections (dxcon) you require in the AWS Direct Connect console. You’ll be asked for a location to connect to and to choose a speed of either 10Gbps or 1Gbps (if you want a lower speed, you’ll need to talk to your Telco or data centre before you can proceed).

The connection will be provisioned fairly quickly, and show itself in a “provisioning” state. After a few hours, it will be in a “down” state. At this point, you can select actions and download what is called a Letter of Authority (LOA) for the connection. This will specify what ports in the Meet Me Room your connection should be patched in to. You need to forward this to your co-location data centre or Telco for them to action.

Note: it is not infrequent to find the ports you have been allocated are already in use by someone else. In this case, delete the connection and start again. If you can, check with the data centre verbally that the ports are free before you submit the LOA to them. Repeat all of above if you have multiple connections. Redundancy is dealt with later in the process.
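While you wait for a connection to work through the provisioning and down states, you can also poll its state programmatically. A small boto3 sketch, with a placeholder connection ID:

import boto3

dx = boto3.client("directconnect", region_name="eu-west-1")

CONNECTION_ID = "dxcon-ffabc123"   # placeholder

# The connection moves through states such as "requested", "pending" and "down",
# and only becomes "available" once L2 connectivity and BGP are up
conn = dx.describe_connections(connectionId=CONNECTION_ID)["connections"][0]
print(conn["connectionName"], conn["connectionState"])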

To be able to use your connection, you now need to attach a Virtual Interface (dxvif) to it. You have options here, and as is always the case, options make things a bit more complicated.

You can connect a Virtual Interface to either a VGW (Virtual Private Gateway) or a Direct Connect Gateway (not the same thing as a Direct Connect connection).

If you connect to a VGW, you will only ever be able to connect to the VPC to which that VGW provides access.

If you connect to a Direct Connect Gateway, you can associate multiple VGWs with that Gateway, allowing you access to multiple VPCs *across all AWS regions*. If you want to use this option, you need to create a Direct Connect Gateway before you create a Virtual Interface.

I can’t see any reason other than corporate governance and security why you would not want to use a Direct Connect Gateway, so I’d suggest using that option if in doubt.

So now proceed and create your Virtual Interface. If you only want to attach it to the VGW you created earlier, that option is there for you. Otherwise, attach it to the Direct Connect Gateway you created.

Once you have your Virtual Interface, go back to the Connections panel and associate that with one of your connections. You will need a dedicated Virtual Interface for each connection (you can also attach multiple Virtual Interfaces to the same connection, but that isn’t relevant here).

The final step here only occurs if you are using a Direct Connect Gateway. If you are, you need to associate the VGW you created in your VPC with the Direct Connect Gateway. It should be presented as an option for you in the list of available VGWs. Start typing its identifier into the search field if not. The UI can be a bit flaky here.

That should be everything. Redundancy is the next piece.

How do I configure redundancy on the AWS side?

If you want redundant connectivity, you really need to use a Direct Connect Gateway rather than linking your connection directly to a VGW. I *think* this is a requirement for redundancy. If not, it’s still my recommendation.

If you have done that, you should now have 2 Virtual Interfaces and 1 VGW associated with your Direct Connect Gateway. Think of the Direct Connect Gateway as a router. The 2 Virtual Interfaces are on the external side of the router, linking in to 2 Direct Connect connections. The VGW is on the AWS side of the router, linking back to the VPC.

That should be all that is required. Traffic will flow out of the VPC through the VGW into the Direct Connect Gateway, which is BGP enabled and links into the 2 Virtual Interfaces, which are also BGP enabled. If one connection goes down, BGP routes the traffic on to the other connection. This is transparent to the VPC.

What about redundancy on the other side of the connection?

This is a matter for your network administrator or service provider. Typically, the 2 connections will terminate in a logical stack of redundant routers/switches which are BGP enabled and can transfer traffic flow between the external connections.

How do I know it’s working?

You won’t see the state of your connections and Virtual Interfaces switch to “available” until L2 connectivity is established and the necessary BGP authentication handshake has occurred. At that point, you should be able to send ICMP requests from your termination hardware to the EC2 instance you created in your VPC earlier.

Good luck!

Slaying the Development branch – Evolving from git-flow (Part 2)

In Part 1, I talked about how we developed a git branching strategy that allowed for both a single historical deployment branch, but also for multiple release branches to exist at the same time. The residual issue with this was that we had to suppress merge conflicts during our branch sync’ing operations, so we had to find another way of exposing merge conflicts in a timely and reliable way.

In dealing with this, we devolved responsibility for merge conflict detection to Jenkins.

Builds that were deployed to our Development, QA and UAT environments all originated in Jenkins Build jobs. We had lots of build jobs, but we maintained these programmatically using python, so updating them was not difficult.

We added an extra build step before our integration tests. This step queried the source control repo for all current release branches in a particular project. Each branch was checked out and merged to the branch being built. If any branch failed to merge, the Build job was failed. The only way the development team lead could get the Build to succeed was to resolve the merge conflict.
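The build step itself was driven by a Python script along the following lines. This is a simplified reconstruction rather than the original code, and the release branch naming convention is an assumption:

import subprocess
import sys

def git(*args):
    return subprocess.run(["git", *args], capture_output=True, text=True)

# Assumes the Jenkins workspace already has the branch being built checked out
git("fetch", "origin")

# Find all current release branches on the remote (naming convention assumed)
release_branches = [
    line.strip().replace("origin/", "")
    for line in git("branch", "-r").stdout.splitlines()
    if line.strip().startswith("origin/release")
]

conflicted = []
for branch in release_branches:
    # Trial-merge without committing, then abort so the workspace stays clean
    result = git("merge", "--no-commit", "--no-ff", f"origin/{branch}")
    git("merge", "--abort")
    if result.returncode != 0:
        conflicted.append(branch)

if conflicted:
    print("Merge conflicts detected with: " + ", ".join(conflicted))
    sys.exit(1)   # fail the Jenkins Build job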

When I first proposed this, there was some consternation. The main argument was that the development of one release branch should not be delayed by code in another release branch. If you consider release branches in isolation, there is some merit in this argument.

However, while timely deployment of releases is an important consideration, it is not the concern of Source Control. The concern of Source Control is that the code underpinning the product is well managed, which means conflicts between different development strands should be exposed and resolved as soon as possible, even if this impinges on one particular group who are contributing to the overall process.

It was also the case that merge conflicts were reasonably rare, and I was able to argue that a few minutes spent resolving a merge conflict every couple of months was a small price to pay to avoid having to unravel a bug caused by a merge conflict that had found its way into Production.

We proceeded with the system, and as predicted, Build failures due to merge conflicts were rare. However, they did happen, which was something of a relief, as it proved the system was working as designed.

More generally, the development teams were given a whole new lease of life by removing the development branch from the SDLC. In fact, it really only became apparent how much of a bind the development branch was after it had been removed.

The overall success of the system was apparent when a minor bug in the scripting that underpinned the system led to a brief period of confusion before it was found and resolved. During this incident, I offered the option of temporarily re-opening the development branch.

The resounding NO that this was greeted with gave me not insignificant satisfaction!

Renaming files with non-interactive sftp

SFTP hangs around the IT Operations world like a bit of a bad smell.

It’s pretty secure, it works, and it’s similar enough to FTP for software developers and business managers to understand, so it’s not uncommon to find it underpinning vast data transfer processes that have been designed in a hurry.

Of course, it’s very rudimentary in terms of what it can do, and very dependent on the underlying security of the OS on which it resides, so it’s not really something that should find a home in Enterprise IT solutions.

Anyway, sometimes you just have to deal with it. One problem that you will often encounter is that while you have SFTP access to a system, you may not have shell access via OpenSSH. This makes bulk operations on files a bit more difficult, but not impossible.

SFTP has a batch mode that allows you to pass STDIN commands to the processor. If used in conjunction with non-interactive login (i.e. an OpenSSH Public/Private Key Pair), you can actually process bulk operations.

Let’s say you want to rename 500 files in a particular directory:

You can list the files as follows:

echo "ls -1" | sftp -q -i ~/.ssh/id_rsa -b - user@sftp.mycompany.com:/dir1/

In this case, the parameter:

-b -

tells the processor to process the command coming from STDIN

You can now incorporate this into a BASH loop to complete the operation:

for f in `echo "ls -1" | sftp -q -i ~/.ssh/id_rsa -b - user@sftp.mycompany.com:/dir1/ | grep -v sftp | grep -v Changing`;
    do
    echo "Renaming $f...";
    echo "rename $f $f.renamed" | sftp -q -i ~/.ssh/id_rsa -b - user@sftp.mycompany.com:/dir1/;
done

Slaying the Development branch – Evolving from git-flow (Part 1)

Continuous Integration and Continuous Delivery (CI/CD) is essential to any modern day, mission critical software development life cycle.

The economic logic is simple. If you’re paying a developer a lot of money to fix bugs or add features to your software, it doesn’t make sense to have those bug fixes and features sitting in a build pipeline for 2 months waiting to be deployed. It’s the equivalent of spending money on stock for your grocery store and leaving it on a shelf in your loading bay instead of putting it in the shop window.

But taking legacy software development life cycles and refactoring them so that they can use CI/CD is a significant challenge. It is much harder to re-factor embedded, relatively stable processes than to design new ones from the ground up.

This was a challenge I was faced with in my most recent employment. This article, and its sequel, describe some of the challenges I encountered and how they were resolved, focusing specifically on how we evolved our source control management strategy from one based on git-flow to one that permitted merging code changes directly from Feature branches to Production.

I’ll begin by describing the environment as I found it.

This was a Multi Tenant Software as a Service (SAAS) provided over the Internet on a business to business basis. The SAAS comprised 16 individual services with a mix of MySQL and PostgreSQL data stores. The services were built with Java (for processing and ETL operations) and Rails (for Web UI and API operations).

The business profile required parallel development streams, so source control was based on the git-flow model. Each project had a development branch, from which feature branches were taken. Feature branches were merged into concurrent release branches. Builds were created from release branches and deployed in the infrastructure tiers (Dev, QA, UAT, Staging, Prod). There was no historical deployment branch and no tagging. Each release cycle lasted approximately 6 weeks. A loose Agile framework applied, in that stories were part of releases, but Agile processes were not strictly followed.

Infrastructure used in the software development life cycle was shared. There were monolithic central Dev, QA, UAT environments etc. Local developer environments were not homogeneous. Java developers couldn’t run local Rails apps and vice versa. All code was tested manually in centralised, shared environments.

The situation described above would be reasonably typical in software development environments which have evolved without a DevOps culture and dedicated Operations resources (ie where development teams build the environments and processes).

While the development/deployment process in this environment was working, it was sub-optimal, resulting in delays, cost overruns and issues with product quality.

A plan was developed to incrementally migrate from the current process to a CI/CD-based process. This involved changes to various functions, but perhaps the most important change was to the source control management strategy, which is what I want to deal with in detail in this article.

A typical development cycle worked as follows.

In every git project, the development branch was the main branch. That is to say, the state of all other branches was relative to the development branch (ahead or behind in terms of commits).

For a scheduled release, a release branch was created from the development branch. For the purposes of illustration, let’s call this release1. Stories scheduled for release1 were developed in feature branches taken from development, which were then merged into release1. These features also had to be merged to development. When all features were merged to release1, release1 was built and deployed to QA.

At the same time, work would start on release2, but a release2 branch would not be created, nor would release2 features be merged to development, as development was still being used as a source for release1 features. Only when development for release1 was frozen could release2 features be merged to development, and only when release1 was built for Production was a release2 branch created.

This system had been inherited from a simpler time when the company was younger and the number of applications comprising the platform was much smaller. Its limitations were obvious to all concerned, but the company did not have a dedicated “DevOps” function until later in its evolution, so no serious attempt had been made to re-shape it.

From talking to developers, it became clear that the primary source of frustration with the system was the requirement to have to merge features to multiple branches. This was particularly painful when a story was pulled from a release, where the commit was reversed in the release branch but not the development branch. It was not infrequent for features to appear in one release when they were scheduled for another.

After talking through the challenge, we decided on a number of requirements:

1. Features would only be merged to one other branch

2. We could have concurrent release branches at any time

3. We would have a single historical “Production” branch, called “deploy”, which was tagged at each release

4. At the end of the process, we would only be one migration away from true CI/CD (merging features directly to deploy)

5. We would no longer have a development branch

From the outset, we knew the requirement that would present the biggest challenge was to be able to maintain concurrent release branches, because when multiple branches are stemmed from the same source branch, you always run the risk of creating merge conflicts when you try to merge those branches back to the source.

At this juncture, it’s probably wise to recap on what a merge conflict is, as this is necessary to understand how we approached the challenge in the way that we did.

A merge conflict occurs between 2 branches when those branches have a shared history, but an update is made to the same line in the same file after those branches have diverged from their common history. If a conflict exists, only one of the branches can be merged back to the common source.

If you think of a situation in which 2 development teams are working on 2 branches of the same project taken from the same historical branch, and those 2 branches ultimately have to be merged back to that historical branch, you can see how this could present a problem.

When you then extrapolate that problem out over 16 individual development projects, you see how you’re going to need a very clearly defined strategy for dealing with merge conflicts.

Our first step was to define at which points in the development cycle interaction with the source control management system would be required. This was straightforward enough:

1. When a new release branch was created

2. When a new patch branch was created

3. When a release was deployed to Production

We understood that at each of these points, source control would have to be updated to ensure that all release branches were in sync, and that whatever method we used to ensure they were in sync would have to be automated. In this instance, “in sync” means that every release branch should have any commits that are necessary for that release. For instance, if release1 were being deployed to Production, it was important that release2 should have all release1 commits after the point of deployment. Similarly, if we were creating release3, release3 should have all commits from release2 etc etc.

However, we knew that managing multiple branches in multiple projects in this way was bound to produce merge conflicts, but at the same time, we didn’t want a situation in which a company-wide source control management operation was held up by a single merge conflict in a single project.

In light of this, we decided to do something a little bit controversial.

If our aim was to keep branches in sync and up to date, we decided that branching operations should focus on this goal, and that we would use a separate mechanism to expose and resolve merge conflicts. Crucially, this part of the process would occur prior to global branching updates, so that all branches arrived at the point of synchronisation in good order.

So, to return to the 3 points where we interacted with source control, we decided on the following:

1. When a new release branch was created

This branch would be created from the latest tag on the historical production branch ( “deploy”). All commits from all other extant release branches would be merged to this branch, resulting in a new release branch that already contained all forward development. When a merge conflict existed between the new branch and an extant release branch, the change from the extant branch would be automatically accepted (git merge -X theirs). A sketch of this operation appears after this list.

2. When a new patch branch was created

This branch would be created from the latest tag on the deploy branch. No commits from any other extant release branches would be merged to this branch, because we did not want forward development going into a patch. Because no extant release branches were being merged to the patch, there was no need to deal with merge conflicts at this point.

3. When a release was deployed to Production

At this point, the release branch would be merged to the deploy branch, and the deploy branch would then be merged to any extant release branches. This would ensure that everything that had been deployed to Production was included in forward development. When a merge conflict existed between the deploy branch and an extant branch, the change from the extant branch would be automatically accepted (git merge -X ours). The release branch that had been merged would be deleted, and the deploy branch tagged with the release version number.
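As referenced in point 1 above, here is a sketch of the release branch creation step in Python, driving git via subprocess. It is a reconstruction for illustration only: the branch and tag naming are assumptions, and error handling is omitted:

import subprocess

def git(*args):
    subprocess.run(["git", *args], check=True)

def create_release_branch(new_branch, extant_branches):
    git("fetch", "origin", "--tags")

    # Find the latest tag on the historical production branch ("deploy")
    latest_tag = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0", "origin/deploy"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

    # Create the new release branch from that tag
    git("checkout", "-b", new_branch, latest_tag)

    # Merge in every extant release branch, suppressing conflicts in their favour
    for branch in extant_branches:
        git("merge", "-X", "theirs", f"origin/{branch}")

    git("push", "origin", new_branch)

create_release_branch("release3", ["release1", "release2"])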

We decided to refer to the automatic acceptance of changes during merge operations as “merge conflict suppression”. In the next part of the article, I’ll explain how we decided to deal with merge conflicts in a stable and predictable way.