software development

Test Trade-Offs

TL;DR: Software developers often decide what tests to write based on technical considerations. Instead they should decide what tests to write based on what feedback is missing. The Test Trade-Offs Model can be used to make better decisions around what tests to write. Useful dimensions to use when deciding what type of tests you should write are primarily: speed of feedback, coverage and variation.

The current thinking in the software development industry is to have a lot of low-level unit tests, fewer integration tests, and even fewer higher-level tests like system or end-to-end tests. The Test Pyramid, shown below, is a common model used to describe the relative amounts or ratios of the different types of tests we should aim for.

Traditional Test Pyramid

This kind of thinking generally focuses on how quickly the tests run – i.e. speed of feedback – and also how easy it is to write the different types of tests. Both of these are technical considerations. The problem I have with this thinking is that it ignores the primary reason we have tests in the first place – to get feedback about our system. If technical considerations govern the types of tests we have, there may be a large number of tests we will never write, and thus a lot of feedback we’re not getting. For example, having lots of low-level unit tests doesn’t give us any information about how the system works as a whole. Evidence of this phenomenon is the multitude of memes around unit testing not being enough. Some of my favourites:

Unit testers be like

Focusing on technical considerations only leads us to make blind trade-offs: we’re not even aware of other dimensions we should be considering when deciding which tests to write. The Test Trade-Offs Model was developed so that teams can make deliberate trade-offs when deciding which tests to write, by making the other trade-off dimensions explicit. The model is predicated on the idea that different tests are valuable to different audiences at different times, for different reasons.

The dimensions currently in the model are:

  • Speed: How quickly does the test execute? How long do we have to wait to get the feedback the test gives us?
  • Coverage: How much of the system (vertically) does the test exercise? In general, the higher the coverage, the more confident we are about the behaviour of the system as a whole, since more of the system is being exercised. Coverage is also known as scope or depth.
  • Variation: How many near-identical variations of the test are there? E.g. if a test has lots of inputs, there may be very many combinations of inputs, with each combination requiring its own test. (This article is useful for more on this idea.)

In an ideal world, our tests would execute instantaneously, cover the entire system, and deal with every combination of inputs and states as well. Therefore, the ideal test would score very highly in all dimensions. Unfortunately this is not possible in the real world, since some of the dimensions have an inverse effect on others. The image below is a causal loop diagram showing the causal relationships between dimensions.

Causal Loop Diagram
  • An increase in Coverage generally leads to a decrease in speed of feedback. This is because the more of the system covered by the test, the longer the test takes to run.
  • An increase in Variation typically leads to a decrease in coverage. With high variation, there is usually a very high number of tests. If the suite of tests is to complete running in a reasonable timeframe, we usually decrease the coverage of these tests.
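
The arithmetic behind the second relationship can be sketched in a few lines of Python. The input dimensions, their values and the per-test durations below are all assumptions chosen for illustration; they are not measurements from a real system.

```python
from itertools import product
from math import prod

# Hypothetical input dimensions for a checkout feature (illustrative values only).
input_values = {
    "item_price": [-10, 0, 10],
    "current_total": [0, 10],
    "active_promotion": ["none", "3-for-2"],
}


def variation_count(dimensions: dict[str, list]) -> int:
    """Number of input combinations, assuming one test per combination."""
    return prod(len(values) for values in dimensions.values())


def suite_runtime_seconds(dimensions: dict[str, list], seconds_per_test: float) -> float:
    """Total run time if every combination gets its own test."""
    return variation_count(dimensions) * seconds_per_test


# Enumerating the combinations explicitly gives the same count.
combinations = list(product(*input_values.values()))
assert len(combinations) == variation_count(input_values)

# Assumed durations: 0.01 s for a narrow test, 5 s for a whole-system test.
print(variation_count(input_values))
print(suite_runtime_seconds(input_values, 0.01))
print(suite_runtime_seconds(input_values, 5.0))
```

Adding one more dimension with four values would quadruple the number of tests, which is why high-variation tests usually end up with low coverage.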

As the model shows, no test can ever maximise for all dimensions. Any test will compromise on some of the dimensions. We therefore need to choose which dimension to prioritise for a test. This is the trade-off. Each test should prioritise one of the dimensions. The trade-off of priorities should be based on what feedback about the system we need.

For example, if we need tests that give us information about the behaviour of the whole system, which will be valuable for a long time, we’re most likely willing to compromise on speed of execution and variation. The trade-off is now explicit and deliberate. Traditionally we would have ruled out such a test immediately because it would take too long to run.

The way I’d like to see the model being used is for teams to decide what system feedback they’re missing, decide what trade-offs to make, and then what kind of tests to write.

I believe this to be the first iteration of the model, and I expect it to evolve. I’m certain there are other dimensions I haven’t yet included, perhaps even more important dimensions. What dimensions do you use when deciding what type of tests to write? What dimensions do you think should be added to the model?

I would like to thank Louise Perold, Jacques de Vos and Cindy Carless, who helped me refine my thinking around this model and who helped improve this article.


The States, Interactions and Outcomes Model

TL;DR: The States, Interactions and Outcomes model provides a way for cross-functional teams to collaboratively explore, specify and document expected system behaviour.

Specification by Example (SbE) and Behaviour-Driven Development (BDD) can be an incredibly effective way for teams to explore and define their expectations for the behaviour of a system. The States, Interactions and Outcomes Model provides a set of steps, and a lightweight documentation structure for teams to use SbE and BDD more effectively. The best way of conveying the model is through a worked example.

Worked example
To demonstrate the model and the process, I will take you through applying it to a problem I use frequently in coaching and training. Imagine we are creating software to calculate the total cost of purchased items at a point of sale. (This problem is inspired by Dave Thomas’ Supermarket Pricing Kata.) You walk up to a till at a supermarket, hand the checkout person your items one-by-one, and the checkout person starts calculating the total of the items you want to purchase. The total is updated each time the checkout person records an item for purchase.

We would like to include a number of different ways of calculating the total price for purchased items, since the supermarket will want to run promotions from time to time. Some of the pricing methods we would like to include are:

  • Simple Pricing: the total cost is calculated simply by adding up the cost of each individual item recorded at the point of sale.
  • Three-for-Two Promotion: Buy three of any particular item, pay for only two. This promotion is specific to the type of item being sold. For example, buy three loaves of Brand-X bread, pay for only two.
  • Combo Deal: A discount is applied when a specific combination of items is purchased.
  • Bulk Discount: A discount is applied when more than a specific number of a particular item is purchased.

In this article I will deal with only ‘Simple Pricing’ and ‘Three-for-Two Promotion’. I will deal first with ‘Simple Pricing’ completely, and then start with ‘Three-for-Two Promotion’.

Simple Pricing

  • System boundaries: We are concerned only with the way the total for the purchased items is calculated. We are not concerned with things like how the cost of an item is acquired (e.g. barcode scanning), accepting payment etc.
  • Types of inputs: For Simple Pricing, the only input is the price of the item being recorded – item price.
  • Types of state: What affects calculating the total price besides item price? For Simple Pricing, the total after recording an item – the new total – is determined by both the price of the captured item, as well as the total before the item is captured. Therefore state consists of current total.
  • Outcome dimensions: For Simple Pricing, the outcome consists only of the total calculated as a result of capturing an item – new total.
  • Possible values for state types: Current total is an integer, which can be negative, 0, or positive.
  • Possible values for inputs: Item price is an integer, which can be negative, 0, or positive.

Expected outcomes for combinations of state and inputs:

| State: Current total | Interaction: Capture item that costs | Outcome: New total | Outcome: Error | Scenario Name |
| 0 | 0 | 0 | | Free first item |
| 0 | 10 | 10 | | First item |
| 10 | 10 | 20 | | Second item |
| 0 | -10 | | ERROR – item price can’t be negative | First item with negative price |
| 10 | -10 | | ERROR – item price can’t be negative | Second item with negative price |
| 10 | ABCDEF | | ERROR – invalid input | Text input |
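
The table above translates almost directly into an executable check. The following is a minimal sketch, not a reference implementation: the names `capture_item`, `PricingError` and `check_scenarios` are hypothetical, and the error messages are simply copied from the table.

```python
class PricingError(ValueError):
    """Raised when an interaction is invalid (hypothetical name)."""


def capture_item(current_total: int, item_price) -> int:
    """Return the new total after capturing one item under Simple Pricing."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(item_price, bool) or not isinstance(item_price, int):
        raise PricingError("invalid input")
    if item_price < 0:
        raise PricingError("item price can't be negative")
    return current_total + item_price


# (current total, item price, expected new total or error message, scenario name)
SCENARIOS = [
    (0, 0, 0, "Free first item"),
    (0, 10, 10, "First item"),
    (10, 10, 20, "Second item"),
    (0, -10, "item price can't be negative", "First item with negative price"),
    (10, -10, "item price can't be negative", "Second item with negative price"),
    (10, "ABCDEF", "invalid input", "Text input"),
]


def check_scenarios() -> list[str]:
    """Run every scenario and return the names of the ones that fail."""
    failures = []
    for current_total, item_price, expected, name in SCENARIOS:
        try:
            actual = capture_item(current_total, item_price)
        except PricingError as error:
            actual = str(error)
        if actual != expected:
            failures.append(name)
    return failures


print(check_scenarios())  # an empty list means every row of the table holds
```

Keeping the scenarios as plain data means the table and the check stay side by side: adding a row to the specification is a one-line change.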

Three-for-Two Promotion

  • System boundaries: The system boundaries don’t change compared to Simple Pricing.
  • Types of inputs: For Three-for-Two Promotion the type or name of the item is now also required as an input – item type.
  • Types of state: The outcome is now also affected by two other types of state: the types of items already captured – already captured items; and the type of Promotion currently active – Active Promotion.
  • Outcome dimensions: For Three-for-Two Promotion, the outcome consists of new total, as well as the new list of items that have been captured – new captured items.
  • Possible values for state types: Current total is an integer, which can be negative, 0, or positive. Active Promotion is a complex type. It can be ‘none’ or a promotion for a specific type of item, e.g. ‘Buy 3 Cokes, pay for 2’. Already captured items specifies the quantity and types of items already captured.
  • Possible values for inputs: Item price is an integer, which can be negative, 0, or positive. Item type is the name of the item being captured, e.g. ‘Coke’.

Expected outcomes for combinations of state and inputs:

| State: Active promotion | State: Current total | State: Items already captured | Interaction: Capture | Interaction: That costs | Outcome: New total | Outcome: New captured items | Outcome: Error | Scenario Name |
| none | 20 | 2 Cokes | Coke | 10 | 30 | 3 Cokes | | 3rd item with no promotion |
| Buy 3 Cokes pay for 2 | 20 | 2 Cokes | Coke | 10 | 20 | 3 Cokes | | 3rd qualifying item with 3 for 2 promotion |
| Buy 3 Cokes pay for 2 | 20 | 1 Coke, 1 bread | Coke | 10 | 30 | 2 Cokes, 1 bread | | 3rd item doesn’t trigger promotion |
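
The three scenarios above can be checked with a similar sketch. It makes two assumptions that the specification does not state: the active promotion is represented by the item type it applies to (or `None`), and every unit of an item type has the same price, so the free item is worth the price of the item being captured. The function name `capture_item` is hypothetical.

```python
from collections import Counter


def capture_item(active_promotion, current_total, captured_items, item_type, item_price):
    """Return (new_total, new_captured_items) after capturing one item.

    active_promotion is None, or the item type that a
    'buy 3, pay for 2' promotion applies to.
    """
    new_items = Counter(captured_items)
    new_items[item_type] += 1
    new_total = current_total + item_price
    # Every third unit of the promoted item type is free.
    if active_promotion == item_type and new_items[item_type] % 3 == 0:
        new_total -= item_price
    return new_total, dict(new_items)


# (promotion, current total, items already captured, item type, item price,
#  expected (new total, new captured items), scenario name)
SCENARIOS = [
    (None, 20, {"Coke": 2}, "Coke", 10,
     (30, {"Coke": 3}), "3rd item with no promotion"),
    ("Coke", 20, {"Coke": 2}, "Coke", 10,
     (20, {"Coke": 3}), "3rd qualifying item with 3 for 2 promotion"),
    ("Coke", 20, {"Coke": 1, "bread": 1}, "Coke", 10,
     (30, {"Coke": 2, "bread": 1}), "3rd item doesn't trigger promotion"),
]

for *inputs, expected, name in SCENARIOS:
    assert capture_item(*inputs) == expected, name
```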

There are several interesting things about the specifications above to which I’d like to draw particular attention:

  • All the words and concepts used are domain-level words and concepts. There are no implementation or software-specific words.
  • The specification describes the transactions and outcomes only, not how the work should be done.
  • The things that determine the outcome of a transaction are super-obvious and explicit. This makes it easier to detect and discuss edge cases.
  • Invalid states and interactions are easy to see.
  • The path to any particular state is clear and obvious.
  • Should we want to, it would be easy to automate the verification of a system which should satisfy these specifications.

As mentioned above, I developed and use this model during my coaching and training. It has proven very effective for quickly exploring and documenting system behaviour. In some BDD Bootcamps, we have explored and specified legacy systems running in production in about 3 hours. One of the ways this has proven useful is that people in the bootcamp who had not worked on those particular systems gained a very thorough high-level overview of the intention of the system.

The worked example above follows these steps:
1. Explicitly define and bound the system under specification. What is included, what is excluded?
2. What are the different inputs to the system?
3. What are the types of state that the system can have? Another way to ask this: Besides the inputs, what can affect the outcome of an interaction?
4. What constitutes system outcome? Is any output returned to the user? Note that an outcome must, by definition, include all states as identified above. Outcome can also include error conditions.
5. For each type of state, what are the possible values?
6. For each type of input, what are the possible values?
7. For each combination of state and interaction, what is the expected outcome (including all dimensions)?

The Thinking Behind The Model
The idea behind the model is that the outcome of a system interaction is a function of the interaction and the state of the system at the time of interaction. We can develop a complete and comprehensive specification of expected system behaviour by describing the expected outcome for every possible combination of state and interaction.
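
One way to make "every possible combination" concrete is to enumerate the combinations and look for the ones that have no specified outcome. The sketch below does this for Simple Pricing. The representative values per dimension are an assumption (one negative, one zero and one positive price; an empty and a non-empty total), and `unspecified_combinations` is a hypothetical helper name.

```python
from itertools import product

# Representative values per dimension (assumed, based on the Simple Pricing example).
STATE_VALUES = {"current_total": [0, 10]}
INTERACTION_VALUES = {"item_price": [-10, 0, 10]}

# Outcomes specified so far, keyed by (current_total, item_price).
SPECIFIED = {
    (0, 0): 0,
    (0, 10): 10,
    (10, 10): 20,
    (0, -10): "ERROR",
    (10, -10): "ERROR",
}


def unspecified_combinations(state_values, interaction_values, specified):
    """Return every state/interaction combination with no specified outcome."""
    all_dimensions = list(state_values.values()) + list(interaction_values.values())
    return [combo for combo in product(*all_dimensions) if combo not in specified]


print(unspecified_combinations(STATE_VALUES, INTERACTION_VALUES, SPECIFIED))
# → [(10, 0)]
```

With these representative values, the check reports that a free item captured when the total is already 10 has no scenario yet, which is exactly the kind of gap the model is meant to surface.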

Specification by Example and Behaviour-Driven Development
The model and the steps are largely based on the concepts of Specification by Example and Behaviour-Driven Development. Specification by Example (SbE) is the practice of specifying expected system behaviour using concrete values instead of natural-language descriptions. For more on Specification by Example, you can’t do better than Gojko Adzic’s book. Behaviour-Driven Development (BDD) uses SbE. One of the reasons I use SbE is that it allows us to work with something tangible, instead of ‘invisible ideas’. Some of the benefits of using BDD and SbE are:

  • Getting feedback on the work from a wider audience earlier in the process.
  • Making edge cases more obvious.

Ordinarily, we would need to write some software to achieve these things. By using BDD and SbE we can get these benefits before writing any software. However, it is not always easy to get started with these techniques.

A common challenge teams face when they start using BDD and SbE is the need to make every aspect of expected externally-observable system behaviour completely explicit. That is, all the factors which affect the behaviour of the system must be identified and made explicit. If any of these factors are missing or unknown, we cannot specify expected system behaviour completely and comprehensively – we will have gaps. It is difficult to develop a successful software product if there are gaps or inconsistencies in what we expect the software to do.

Understanding systems
The steps above are designed to help a team understand the system they’re dealing with. The simplest way we can understand the behaviour of a system is as a simple transaction: some entity is stimulated or exercised in a particular way, and the entity does some work. The simplest way of modeling a transaction is by stating that the input to a system determines the output.


In this view, the system output is determined only by the input to the system. I have come to use the terms ‘Interaction’ and ‘Outcome’ instead of ‘input’ and ‘output’ respectively, because they are closer to the way most people think about working with software products: “I interact with a system to achieve some outcome”.


However, it is important to understand that the outcome of an interaction with a system is determined not only by the interaction, but also by the state of the system at the time of the interaction.


The introduction of state into the picture often causes some challenges. The first challenge is differentiating between interaction and state. The easiest way to distinguish between them is by asking the question: ‘What determines the outcome of an interaction besides the input?’

The next challenge is understanding that system state is generally not described by a single value. System state is typically made up of multiple dimensions or types, and therefore must be expressed as a set of concrete values, one value per dimension. The same applies to values supplied to the system as part of an interaction.
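
In code, "one value per dimension" maps naturally onto a record type. The sketch below uses Python dataclasses to express one row of the Three-for-Two specification. The class and field names are hypothetical, chosen to mirror the column names in the worked example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One concrete value per state dimension."""
    active_promotion: str
    current_total: int
    captured_items: tuple  # pairs of (item type, quantity)


@dataclass(frozen=True)
class Interaction:
    """One concrete value per input dimension."""
    item_type: str
    item_price: int


@dataclass(frozen=True)
class Outcome:
    """One concrete value per outcome dimension."""
    new_total: int
    new_captured_items: tuple
    error: str = ""


# The '3rd qualifying item with 3 for 2 promotion' scenario as concrete values.
before = State("Buy 3 Cokes pay for 2", 20, (("Coke", 2),))
action = Interaction("Coke", 10)
expected = Outcome(20, (("Coke", 3),))
```

Frozen dataclasses compare by value, so an expected outcome can be compared directly with an actual one, dimension by dimension.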

Once a team begins thinking in terms of states, interactions and outcomes, they’re generally able to have more effective conversations around what behaviour they expect from their system.


Branching and Deployment Flow

TL;DR: We can deploy and test each feature in our dev environment independently and in combination before promotion. This is done through some simple Git and Jenkins setup and simple team discipline. Promotion-ready features are not blocked by ‘immature’ work-in-progress (WIP), but WIP is still independently testable. The build server tells us when Feature Branches are out of date.

I’m quite proud of the delivery flow that one of my teams is currently using. The setup and workflows/discipline are quite simple and relatively easy to create and have turned out to be quite big enablers for increased agility, responsiveness and quality.

The basic setup

  • We use GitFlow with Feature Branching, without release or hotfix branches
  • On Git Push each Feature Branch is built, tested and deployed to its own IIS Application/Virtual Directory in the dev environment (using msdeploy)
  • Jenkins tells us when there are Merge Conflicts with develop
  • Developers can do testing on the deployed feature before marking the work as Ready for Test
  • When the tester is happy with the feature, the developer merges the feature back into develop
  • If bugs are found, work continues in isolation in the Feature Branch
  • The changes to develop are merged into existing Feature Branches
  • Each push to develop triggers another build-test-deploy CI job to the ‘master’ dev environment
  • Deploys to the QA, Staging and Production environments are one-click and triggered manually.
  • Deploys to the Staging and Production environments are based on master
  • When features are ready for production, develop is merged into master
  • The team also makes use of a canary production server (served a fraction of live traffic through load balancing) with automated fan-out and roll-back.
  • The team has little automated acceptance testing at the moment, but is working to improve this.


This setup has solved some issues the team had in the past, such as not being able to deploy ready-for-test work to the QA environment, due to ‘contamination’ by not-ready-for-test work. This in turn was caused by unpredictable and highly varying priorities, and variable cycle times within the dev and ready-for-test phases.


Using feature branches means we’re not doing true Continuous Delivery or even Continuous Integration. We have tried to mitigate this by being very disciplined around not letting Feature Branches diverge too far from develop. We have a Jenkins job which attempts a ‘reverse merge’ of the feature into develop on every feature branch push, which fails if there are any merge conflicts.
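
A conflict check of this kind can be approximated with a trial merge that is always rolled back. The sketch below is not the team’s actual Jenkins job. It assumes the feature branch is checked out in `repo_dir`, that the other branch is available as `origin/develop`, and that trial-merging develop into the feature branch detects the same conflicts as merging the other way round. The function names are hypothetical; the Git options used (`--no-commit`, `--no-ff`, `--abort`) are standard.

```python
import subprocess


def merge_dry_run_command(branch: str) -> list[str]:
    """Git command that attempts a merge but stops before committing."""
    return ["git", "merge", "--no-commit", "--no-ff", branch]


def merges_cleanly(repo_dir: str, branch: str = "origin/develop") -> bool:
    """Return True if `branch` merges into the checked-out branch without conflicts."""
    attempt = subprocess.run(
        merge_dry_run_command(branch),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # Always roll back so the working tree is left as it was found.
    # This fails harmlessly when there was nothing to merge.
    subprocess.run(["git", "merge", "--abort"], cwd=repo_dir, capture_output=True)
    return attempt.returncode == 0
```

A build step could call `merges_cleanly(".")` and exit with a non-zero status when it returns `False`, which would mark the build as failed.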

There is additional housekeeping in that feature branches need to be created on commencement of work and deleted at end-of-life. Not only that, the IIS Applications created for each Feature Branch need to be deleted manually (creation is automatic). The Git housekeeping is made easier using SourceTree, which has great GitFlow support.

I’d love to hear your comments on this set-up, especially how it can be improved. If you want more details on any of the configuration/set-up/workflows/discipline, feel free to give me a shout 🙂