software development

The Essence of SAFe

Scaled Agile Framework (SAFe) can look quite complicated and intimidating at first glance. However, the essence of SAFe is pretty simple:

  • Fractal Plan-Do-Check-Act feedback loops (Team- and Program-level)
  • Fractal levels of work filtering and prioritisation
  • Priority alignment at scale (i.e. across a large number of people)

Around 6 months ago I got trained in the Scaled Agile Framework (SAFe) and became certified as a SAFe Program Consultant (SPC) for SAFe 4.0. Before doing the training, I had a somewhat negative opinion of SAFe, as well as of other scaling frameworks, e.g. LeSS, DAD, etc. I can now say that my former opinion was based on very little understanding of what SAFe is and how it works, and was without a doubt an uneducated opinion. I admit to being somewhat ashamed of having held the views I did then. Anyway, since the training, I have coached almost full-time in an organisation practising SAFe, and have done a fair amount of thinking, training and speaking on SAFe. The following is what I believe to be the essence of SAFe. (Disclosure: my training and certification are in SAFe 4.0. SAFe 4.5 was released relatively recently. I have not yet caught up with 4.5, though I believe my thoughts remain valid and accurate.)

Fractal feedback loops: SAFe has 2 main Plan-Do-Check-Act feedback loops. At the team level, Scrum (or Kanban); and at the program, or team-of-teams, level, the Program Increment. A Program Increment (PI) is analogous to a sprint in Scrum, that is, a timebox wherein a group of people plan, execute, reflect and adjust together. The differences between a PI and a sprint are that a PI involves more people, is longer lived (typically 4-5 sprints), and has ceremonies designed for that level of scale. Note that although a plan is made at PI Planning, this plan is not set in stone. It is revisited frequently during execution of the PI, much the same way a Sprint plan is revisited every day at a Daily Stand-Up or Daily Scrum (HT to Kevin Shine for reminding me about this point).

Fractal work filtering and prioritisation: Kanban is used at various levels (portfolio, value stream and program levels in SAFe 4) to filter and prioritise the work that the people in the teams do. This is to ensure as far as possible that we don’t waste time and effort working on the wrong things. This is similar to how work gets filtered and prioritised in Scrum. The feedback loops mentioned above provide a limit to the length of time the teams work on anything before getting feedback from stakeholders.

Priority alignment at scale: An Agile Release Train is a way of making sure that everybody, across the organisation, required for work to happen has the same single list of priorities. That is, all the necessary people are dedicated solely to this set of priorities. This is analogous to cross-functional teams in Scrum. When everyone has the same priorities, we have much less coordination overhead and much less waiting for each other compared to being dependent on people with other priorities.

SAFe does have a lot of roles and moving parts, there is no doubt. However, I believe that the three things mentioned above form the core and essence of the framework. I believe that most organisations, certainly those I’ve worked in, would benefit from all three of these structures and practices.

Kent Beck is fond of saying:

Work on the most important thing until either it’s done or it’s no longer the most important thing

If you look closely at the three things above, you might find that they’re simply ways of doing just this, when there are a large number of people working together on the same thing.


That’s Not Agile!

TL;DR People spend a lot of time arguing whether things are ‘Agile’ or not. ‘Agile’ has become the tail wagging the dog, and asking that question misses the point. The question we should be asking is “Does X help us (or our client) achieve the outcomes we desire?”


I recently spoke at a community meetup on the topic Agile Release Trains: Agile, or not so much?. The talk had 2 parts: the first explaining what Agile Release Trains are; the second discussing whether the question But is it Agile? is a useful one to ask. In my previous post (What is ‘Agile’?), I discussed my current interpretation of the Manifesto for Agile Software Development and what I believe ‘Agile’ means.

I no longer believe that there is much value in discussing whether something, e.g. Agile Release Trains, is Agile or not. The term ‘Agile’ has become a misnomer, a red herring. It has become the tail wagging the dog. In his Agile Africa 2015 Keynote, Kent Beck used the analogy of dogs and small children looking at the pointed finger, instead of what the finger is pointing at, to describe Agile. Alistair Cockburn, another Manifesto signatory, has launched his own ‘new branding’: Heart of Agile. I assume this was because he was frustrated by what Agile had become.

Agile has become a goal in its own right, and has moved away from its origins as a guideline for a way of operating. Instead of asking “does this help me (or my client) achieve the outcomes we seek?”, we get caught up in debating what agile is or isn’t, and spend a lot of time in No True Scotsman arguments.

Who cares if it’s agile? Are we moving in the direction of continuous improvement or not? Do we even know what continuous improvement means for us in our context? What would improvement look like? How would we measure it? Would we know it if it bit us in the backside? The next time you hear, or say, “That’s Not Agile!”, stop. Ask yourself what’s really being said and why.

For what it’s worth, my opinion is that any operating model that follows the 4 value statements described in the Manifesto is, by definition, agile.


What is ‘Agile’?

TL;DR Agile is a way of operating wherein the decisions we make are guided primarily by 4 trade-off statements expressed as value statements in the Agile Manifesto. Following this guidance is believed to lead to improved ways of developing software.


Recently I have again started to question what is meant by ‘Agile’ (I make no distinction between ‘Agile’ and ‘agile’). I have asked this question at a few conferences, Lean Coffees etc. This is my current interpretation of ‘Agile’, as informed by the Manifesto for Agile Software Development, specifically the first statement and the four value statements:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

These statements provide guidelines for trade-off decisions we should be making. If you’re in a situation where you can choose between working on either working software or comprehensive documentation, rather choose working software, especially if you don’t yet have working software.

If you’re in a situation where you have a plan but have discovered that the plan no longer makes sense, you can choose between either following the plan (even though you now know it no longer makes sense to do so) or responding to the change. The guidance from the Manifesto is to rather respond to the change. (I personally prefer adapt to learning over responding to change, but that’s another story.)

The same thinking applies to the other two value statements: should you have to choose either contract negotiation or customer collaboration, rather choose the latter. If you need to choose between processes and tools and individuals and interactions, again rather choose the latter.

The original authors and signatories of the Manifesto believed, based on their collective experience, that if you follow these decision making guidelines, you will become better at developing software. I certainly don’t disagree with them.


Test Trade-Offs

TL;DR: Software developers often decide what tests to write based on technical considerations. Instead, they should decide what tests to write based on what feedback is missing. The Test Trade-Offs Model can be used to make better decisions about which tests to write. Useful dimensions when deciding what type of tests to write are primarily speed of feedback, coverage and variation.


The current thinking in the software development industry is to have a lot of low-level unit tests, fewer integration tests, and even fewer higher-level tests like system or end-to-end tests. The Test Pyramid, shown below, is a common model used to describe the relative amounts or ratios of the different types of tests we should aim for.

Traditional Test Pyramid

This kind of thinking generally focuses on how quickly the tests run – i.e. speed of feedback – and also how easy it is to write the different types of tests. Both of these are technical considerations. The problem I have with this thinking is that it ignores the primary reason we have tests in the first place – to get feedback about our system. If technical considerations govern the types of tests we have, there may be a large number of tests we will never write, and thus a lot of feedback we’re not getting. For example, having lots of low-level unit tests doesn’t give us any information about how the system works as a whole. Evidence of this phenomenon is the multitude of memes around unit testing not being enough. Some of my favourites (click the pictures for the original Tweets):

Unit testers be like

Focusing only on technical considerations leads us to make blind trade-offs: we’re not even aware of other dimensions we should be considering when deciding which tests to write. The Test Trade-Offs Model was developed so that teams can make deliberate trade-offs when deciding which tests to write, by making the other trade-off dimensions explicit. The model is predicated on the idea that different tests are valuable to different audiences at different times, for different reasons.

The dimensions currently in the model are:

  • Speed: How quickly does the test execute? How long do we have to wait to get the feedback the test gives us?
  • Coverage: How much of the system (vertically) does the test exercise? In general, the higher the coverage, the more confident we are about the behaviour of the system as a whole, since more of the system is being exercised. Coverage is also known as scope or depth.
  • Variation: How many near-identical variations of the test are there? E.g. if a test has lots of inputs, there may be very many combinations of inputs, with each combination requiring its own test. (This article is useful for more on this idea.)

In an ideal world, our tests would execute instantaneously, cover the entire system, and deal with every combination of inputs and states as well. Therefore, the ideal test would score very highly in all dimensions. Unfortunately this is not possible in the real world, since some of the dimensions have an inverse effect on others. The image below is a causal loop diagram showing the causal relationships between dimensions.

Causal Loop Diagram
  • An increase in Coverage generally leads to a decrease in speed of feedback. This is because the more of the system covered by the test, the longer the test takes to run.
  • An increase in Variation typically leads to a decrease in coverage. With high variation, there is usually a very high number of tests. If the suite of tests is to complete running in a reasonable timeframe, we usually decrease the coverage of these tests.

As the model shows, no test can ever maximise for all dimensions. Any test will compromise on some of the dimensions. We therefore need to choose which dimension to prioritise for a test. This is the trade-off. Each test should prioritise one of the dimensions. The trade-off of priorities should be based on what feedback about the system we need.
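To make this concrete, here is a minimal sketch (in Python, with entirely hypothetical test names and scores – none of this is part of the model itself) of how a team might score candidate tests on the three dimensions and pick the one that supplies the feedback they currently lack:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Test Trade-Offs Model: each candidate test
# is scored (1 = low, 5 = high) on the three dimensions, and we pick the
# candidate that scores best on the dimension we currently need.
@dataclass
class CandidateTest:
    name: str
    speed: int      # how quickly the test executes
    coverage: int   # how much of the system it exercises
    variation: int  # how many input combinations it explores

def pick_test(candidates, missing_feedback):
    """Choose the candidate scoring highest on the dimension we lack."""
    return max(candidates, key=lambda t: getattr(t, missing_feedback))

candidates = [
    CandidateTest("unit test for price parser", speed=5, coverage=1, variation=4),
    CandidateTest("end-to-end checkout test", speed=1, coverage=5, variation=1),
]

# We lack feedback about the system as a whole, so prioritise coverage.
chosen = pick_test(candidates, "coverage")
print(chosen.name)  # → end-to-end checkout test
```

The point is not the scoring mechanics but that the dimension being prioritised is named explicitly, so the trade-off is deliberate rather than blind.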

For example, if we need tests that give us information about the behaviour of the whole system, which will be valuable for a long time, we’re most likely willing to compromise on speed of execution and variation. The trade-off is now explicit and deliberate. Traditionally we would have ruled out such a test immediately because it would take too long to run.

The way I’d like to see the model being used is for teams to decide what system feedback they’re missing, decide what trade-offs to make, and then what kind of tests to write.

I believe this to be the first iteration of the model, and I expect it to evolve. I’m certain there are other dimensions I haven’t yet included, perhaps even more important dimensions. What dimensions do you use when deciding what type of tests to write? What dimensions do you think should be added to the model?

Acknowledgement
I would like to thank Louise Perold, Jacques de Vos and Cindy Carless, who helped me refine my thinking around this model and who helped improve this article.


The States, Interactions and Outcomes Model

TL;DR: The States, Interactions and Outcomes model provides a way for cross-functional teams to collaboratively explore, specify and document expected system behaviour.


Specification by Example (SbE) and Behaviour-Driven Development (BDD) can be an incredibly effective way for teams to explore and define their expectations for the behaviour of a system. The States, Interactions and Outcomes Model provides a set of steps, and a lightweight documentation structure for teams to use SbE and BDD more effectively. The best way of conveying the model is through a worked example.

Worked example
To demonstrate the model and the process, I will take you through applying it to a problem I use frequently in coaching and training. Imagine we are creating software to calculate the total cost of purchased items at a point of sale. (This problem is inspired by Dave Thomas’ Supermarket Pricing Kata.) You walk up to a till at a supermarket, hand the checkout person your items one-by-one, and the checkout person starts calculating the total of the items you want to purchase. The total is updated each time the checkout person records an item for purchase.

We would like to include a number of different ways of calculating the total price for purchased items, since the supermarket will want to run promotions from time to time. Some of the pricing methods we would like to include are:

  • Simple Pricing: the total cost is calculated simply by adding up the cost of each individual item recorded at the point of sale.
  • Three-for-Two Promotion: Buy three of any particular item, pay for only two. This promotion is specific to the type of item being sold. For example, buy three loaves of Brand-X bread, pay for only two.
  • Combo Deal: A discount is applied when a specific combination of items is purchased.
  • Bulk Discount: A discount is applied when more than a specific number of a particular item is purchased.

In this article I will deal with only ‘Simple Pricing’ and ‘Three-for-Two Promotion’. I will deal first with ‘Simple Pricing’ completely, and then start with ‘Three-for-Two Promotion’.

Simple Pricing

  • System boundaries: We are concerned only with the way the total for the purchased items is calculated. We are not concerned with things like how the cost of an item is acquired (e.g. barcode scanning), accepting payment etc.
  • Types of inputs: For Simple Pricing, the only input is the price of the item being recorded – item price.
  • Types of state: What affects calculating the total price besides item price? For Simple Pricing, the total after recording an item – the new total – is determined by both the price of the captured item, as well as the total before the item is captured. Therefore state consists of current total.
  • Outcome dimensions: For Simple Pricing, the outcome consists only of the total calculated as a result of capturing an item – new total.
  • Possible values for state types: Current total is an integer, which can be negative, 0, or positive.
  • Possible values for inputs: Item price is an integer, which can be negative, 0, or positive.

Expected outcomes for combinations of state and inputs:

Current total (State) | Capture item that costs (Interaction) | New total / Error (Outcome) | Scenario Name
0 | 0 | 0 | Free first item
0 | 10 | 10 | First item
10 | 10 | 20 | Second item
0 | -10 | ERROR – item price can’t be negative | First item with negative price
10 | -10 | ERROR – item price can’t be negative | Second item with negative price
10 | ABCDEF | ERROR – invalid input | Text input
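A specification table like this translates almost mechanically into executable checks. Below is a minimal Python sketch of Simple Pricing; the function name `capture_item` and the exact error messages are illustrative assumptions, since the specification describes only behaviour, not an implementation:

```python
# A minimal sketch of Simple Pricing, derived from the specification table.
# The function name and error messages are illustrative, not from the article.
def capture_item(current_total, item_price):
    """Return the new total after capturing an item, or raise on bad input."""
    if not isinstance(item_price, int):
        raise ValueError("invalid input")  # 'Text input' scenario
    if item_price < 0:
        raise ValueError("item price can't be negative")
    return current_total + item_price

# Each non-error row of the table becomes one concrete check:
assert capture_item(0, 0) == 0      # Free first item
assert capture_item(0, 10) == 10    # First item
assert capture_item(10, 10) == 20   # Second item
```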

Three-for-Two Promotion

  • System boundaries: The system boundaries don’t change compared to Simple Pricing.
  • Types of inputs: For Three-for-Two Promotion the type or name of the item is now also required as an input – item type.
  • Types of state: The outcome is now also affected by two other types of state: the types of items already captured – already captured items; and the type of Promotion currently active – Active Promotion.
  • Outcome dimensions: For Three-for-Two Promotion, the outcome consists of new total, as well as the new list of items that have been captured – new captured items.
  • Possible values for state types: Current total is an integer, which can be negative, 0, or positive. Already captured items specifies the quantity and types of items already captured. Active Promotion is a complex type. It can be ‘none’ or a promotion for a specific type of item, e.g. ‘Buy 3 Cokes, pay for 2’.
  • Possible values for inputs: Item price is an integer, which can be negative, 0, or positive. Item type identifies the kind of item being captured.

Expected outcomes for combinations of state and inputs:

Active promotion (State) | Current total (State) | Items already captured (State) | Capture (Interaction) | That costs (Interaction) | New total (Outcome) | New captured items (Outcome) | Scenario Name
none | 20 | 2 Cokes | Coke | 10 | 30 | 3 Cokes | 3rd item with no promotion
Buy 3 Cokes, pay for 2 | 20 | 2 Cokes | Coke | 10 | 20 | 3 Cokes | 3rd qualifying item with 3-for-2 promotion
Buy 3 Cokes, pay for 2 | 20 | 1 Coke, 1 bread | Coke | 10 | 30 | 2 Cokes, 1 bread | 3rd item doesn’t trigger promotion
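Again, the rows can be executed directly. The following Python sketch is one illustrative way to satisfy them; the data representation (a list of item names, and a promotion naming the discounted item) is an assumption, not part of the specification:

```python
# A sketch of the Three-for-Two Promotion rows. The representation (a list
# of item names and a promotion naming the discounted item) is illustrative;
# the specification describes only behaviour.
def capture_item(promo, current_total, captured, item, price):
    """Return (new_total, new_captured) after recording one item."""
    new_captured = captured + [item]
    new_total = current_total + price
    # With a 'buy 3 X, pay for 2' promotion active, every third X is free.
    if promo == item and new_captured.count(item) % 3 == 0:
        new_total -= price
    return new_total, new_captured

# No promotion: the third Coke is charged.
assert capture_item(None, 20, ["Coke", "Coke"], "Coke", 10)[0] == 30
# Promotion on Cokes: the third Coke is free.
assert capture_item("Coke", 20, ["Coke", "Coke"], "Coke", 10)[0] == 20
# Promotion active, but this is only the second Coke: no discount triggered.
assert capture_item("Coke", 20, ["Coke", "bread"], "Coke", 10)[0] == 30
```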

There are several interesting things about the specifications above to which I’d like to draw particular attention:

  • All the words and concepts used are domain-level words and concepts. There are no implementation or software-specific words.
  • The specification describes the transactions and outcomes only, not how the work should be done.
  • The things that determine the outcome of a transaction are super-obvious and explicit. This makes it easier to detect and discuss edge cases.
  • Invalid states and interactions are easy to see.
  • The path to any particular state is clear and obvious.
  • Should we want to, it would be easy to automate the verification of a system which should satisfy these specifications.

As mentioned above, I developed and use this model during my coaching and training. It has proven very effective for quickly exploring and documenting system behaviour. In some BDD Bootcamps, we have explored and specified legacy systems running in production in about 3 hours. One way this has proven useful is that people in the bootcamp who had not worked on those particular systems gained a very thorough high-level overview of the intention of the system.

The worked example above follows these steps:
1. Explicitly define and bound the system under specification. What is included, what is excluded?
2. What are the different inputs to the system?
3. What are the types of state that the system can have? Another way to ask this: Besides the inputs, what can affect the outcome of an interaction?
4. What constitutes a system outcome? Is any output returned to the user? Note that an outcome must, by definition, include the resulting values of all the types of state identified above. Outcomes can also include error conditions.
5. For each type of state, what are the possible values?
6. For each type of input, what are the possible values?
7. For each combination of state and interaction, what is the expected outcome (including all dimensions)?

The Thinking Behind The Model
The idea behind the model is that the outcome of a system interaction is a function of the interaction and the state of the system at the time of interaction. We can develop a complete and comprehensive specification of expected system behaviour by describing the expected outcome for every possible combination of state and interaction.
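In code, this idea is simply a pure function from (state, interaction) to outcome. The sketch below uses the Simple Pricing example from earlier; the dictionary shapes are illustrative, not prescribed by the model:

```python
# The core idea of the model as a pure function: the outcome of an
# interaction is fully determined by the current state plus the
# interaction itself. Dictionary keys here are illustrative.
def outcome(state, interaction):
    current_total = state["current_total"]
    price = interaction["item_price"]
    if price < 0:
        return {"error": "item price can't be negative"}
    return {"new_total": current_total + price}

assert outcome({"current_total": 10}, {"item_price": 10}) == {"new_total": 20}
```

Enumerating the expected return value of this function for every combination of state and interaction is exactly what the specification tables above do.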

Specification by Example and Behaviour-Driven Development
The model and the steps are largely based on the concepts of Specification by Example and Behaviour-Driven Development. Specification by Example (SBE) is the practice of specifying expected system behaviour using concrete values instead of natural-language descriptions. For more on Specification by Example, you can’t do better than Gojko Adzic’s book. Behaviour-Driven Development (BDD) uses SBE. One of the reasons I use SBE is that it allows us to work with something tangible, instead of ‘invisible ideas’. Some of the benefits of using BDD and SBE are:

  • Getting feedback on the work from a wider audience earlier in the process.
  • Making edge cases more obvious.

Ordinarily, we would need to write some software to achieve these things. By using BDD and SBE we can get these benefits before writing any software. However it is not always easy to get started with these techniques.

A common challenge teams face when they start using BDD and SBE is the need to make every aspect of expected externally-observable system behaviour completely explicit. That is, all the factors which affect the behaviour of the system must be identified and made explicit. If any of these factors are missing or unknown, we cannot specify expected system behaviour completely and comprehensively – we will have gaps. It is difficult to develop a successful software product if there are gaps or inconsistencies in what we expect the software to do.

Understanding systems
The steps above are designed to help a team understand the system they’re dealing with. The simplest way we can understand the behaviour of a system is as a simple transaction: some entity is stimulated or exercised in a particular way, and the entity does some work. The simplest way of modelling a transaction is by stating that the input to a system determines the output.

input-system-output

In this view, the system output is determined only by the input to the system. I have come to use the terms ‘Interaction’ and ‘Outcome’ instead of ‘input’ and ‘output’ respectively, because they are closer to the way most people think about working with software products: “I interact with a system to achieve some outcome”.

interaction-system-outcome

However, it is important to understand that the outcome of an interaction with a system is determined not only by the interaction, but also by the state of the system at the time of the interaction.

state-interaction-system-outcome

The introduction of state into the picture often causes some challenges. The first challenge is differentiating between interaction and state. The easiest way to distinguish between them is by asking the question: What determines the outcome of an interaction, besides the input?

The next challenge is understanding that system state is generally not described by a single value. System state is typically made up of multiple dimensions or types, and therefore must be expressed as a set of concrete values, one value per dimension. The same applies to values supplied to the system as part of an interaction.

Once a team begins thinking in terms of states, interactions and outcomes, they’re generally able to have more effective conversations around what behaviour they expect from their system.


Issues I’ve seen with Jira (and other similar tools)

TL;DR: Jira (and similar tools) is great for keeping track of bugs, and for tracking, coordinating and auditing work. It is a terrible way to communicate and collaborate across a team. Don’t use it as a communication tool. Rather, speak in person and then document your shared understanding in a tool like Jira, if such documentation is required.


There are some common anti-patterns I’ve seen in organisations that use Jira (and other similar tools). This is what I’ve experienced in several environments: A team goes into Backlog Refinement and/or Sprint Planning and the Product Owner (or whomever is responsible for documenting requirements) projects Jira onto the screen in the boardroom. One by one, each work-item is opened individually and the PO reads through what he or she has written. There’s a little bit of discussion, the delivery team decides they’re ready to play planning poker, they do so and provide an estimate for the work-item, which is captured, and then they move on to the next work-item.

Instead of having a conversation about what problem we’re trying to solve and what outcome we want to achieve, what often happens is everyone just reads what’s been written down. Yes, there is some discussion and a few questions, but fundamentally the conversation is framed (and limited) by what the PO had written down previously (however long ago that was). “We don’t need a conversation because all you need to know is in Jira, just read it.”

Because there is a lot of content, and because the PO clearly took a lot of time and effort to write all of this content, there is a tendency to avoid asking difficult questions. “It must be the correct solution, right? The PO must have thought of everything, right?” How likely is the PO to be open to suggestions around solving the problem in a different way? (This is true for everyone, not just POs. Programmers are notorious for being attached to previous work.)

The opportunities for and likelihood of achieving shared understanding in this scenario are severely diminished. “There is so much written down, ‘communication’ must have taken place, right? We all understand what we’re trying to achieve, right?” The likelihood of coming up with a good solution to the problem is also severely diminished.

What typically happens is that, during development of the feature, there are multiple ad-hoc face-to-face conversations between the delivery team and the PO (assuming the PO is available to the delivery team), when it is discovered that the written content in Jira is deficient or defective in some way.

What would happen if, instead of the PO creating all the requirements documentation and then passing it to the delivery team, the parties had a real conversation, communicated effectively and gained shared understanding? If it is necessary to document this shared understanding, it can then be stored in a tool like Jira. This mitigates a lot of the pitfalls outlined above. I also think tools like Jira can be very useful for attaching artifacts to work items, like screenshots, designs, wireframes etc.

Another big problem with Jira and similar tools is tool enslavement. Instead of making the tool work for us, we begin to work for the tool. I’ve seen much time and effort wasted through configuring and troubleshooting things like Jira workflows. I’ve seen things like Team A’s Sprint being closed because Team B’s Sprint was correctly closed, and the Sprints happened to have the same name (based on the date). Guess how much fun that was to resolve.

In conclusion, Jira was originally created as a bug/work-tracking tool, and its best use is as a tool to plan, coordinate and audit work, not as a communication medium. If you need to document shared understanding, or what work was done when and by whom, experiment with capturing this information after the fact, instead of up-front.


How I do TDD

TL;DR: The specifications I write are domain-level, Ubiquitous Language-based specifications of system behaviour. These spec tests describe the functional behaviour of the system, as it would be specified by a Product Owner (PO), and as it would be experienced by the user; i.e. user-level functional specifications.


I get asked frequently how I do Test-Driven Development (TDD), especially with example code. In my previous post I discussed reasons why I do TDD. In this post I’ll discuss how I do TDD.

First off, in my opinion, there is no difference between TDD, Behaviour-Driven Development (BDD) and Acceptance-Test-Driven Development (ATDD). The fundamental concept is identical: write a failing test, then write the implementation that makes that test pass. The only difference is that these terms describe the ‘amount’ of the system being specified: with BDD, typically at a user level; with ATDD, at a user-acceptance level (like BDD); and with TDD, at the level of much finer-grained components or parts of the system. I see very little value in these distinctions: they’re all just TDD to me. (I do find it valuable to discuss the ROI of TDD specifications at different levels, as well as their cost and feedback speed. I’m currently working on a post discussing these ideas.) Like many practices and techniques, I see TDD as fractal. We can get the biggest ROI from a spec test that covers as much as possible.

The test that covers most of the system is a user-level functional test. That is, a test written at the level of a user’s interaction with the system – i.e. outside of the system – which describes the functional behaviour of the system. The ‘user-level’ part is important, since this level is by definition outside of the system, and covers the entirety of the system being specified.

It’s time for an example. The first example I’ll use is specifying Conway’s Game of Life (GoL).

These images represent an (incomplete) specification of GoL, and can be used to specify and validate any implementation of Conway’s Game of Life, from the user’s (i.e. a functional) point of view. These specifications make complete sense to a PO specifying GoL – in fact, this is often how GoL is presented.

These images translate to the following code:

[Test]
public void Test_Barge()
{
    var initialGrid = new char[,]
    {
        {'.', '.', '.', '.', '.', '.'},
        {'.', '.', '*', '.', '.', '.'},
        {'.', '*', '.', '*', '.', '.'},
        {'.', '.', '*', '.', '*', '.'},
        {'.', '.', '.', '*', '.', '.'},
        {'.', '.', '.', '.', '.', '.'},
    };

    Game.PrintGrid(initialGrid);

    var game = CreateGame(initialGrid);
    game.Tick();
    char[,] generation = game.Grid;

    Assert.That(generation, Is.EqualTo(initialGrid));
}

[Test]
public void Test_Blinker()
{
    var initialGrid = new char[,]
    {
        {'.', '.', '.'},
        {'*', '*', '*'},
        {'.', '.', '.'},
    };

    var expectedGeneration1 = new char[,]
    {
        {'.', '*', '.'},
        {'.', '*', '.'},
        {'.', '*', '.'},
    };

    var expectedGeneration2 = new char[,]
    {
        {'.', '.', '.'},
        {'*', '*', '*'},
        {'.', '.', '.'},
    };

    var game = CreateGame(initialGrid);
    Game.PrintGrid(initialGrid);
    game.Tick();

    char[,] actualGeneration = game.Grid;
    Assert.That(actualGeneration, Is.EqualTo(expectedGeneration1));

    game.Tick();
    actualGeneration = game.Grid;
    Assert.That(actualGeneration, Is.EqualTo(expectedGeneration2));
}

[Test]
public void Test_Glider()
{
    var initialGrid = new char[,]
    {
        {'.', '*', '.'},
        {'.', '.', '*'},
        {'*', '*', '*'},
    };

    var expectedGeneration1 = new char[,]
    {
        {'.', '.', '.'},
        {'*', '.', '*'},
        {'.', '*', '*'},
        {'.', '*', '.'},
    };

    var expectedGeneration2 = new char[,]
    {
        {'.', '.', '.'},
        {'.', '.', '*'},
        {'*', '.', '*'},
        {'.', '*', '*'},
    };

    var game = CreateGame(initialGrid);
    Game.PrintGrid(initialGrid);

    game.Tick();
    char[,] actualGeneration = game.Grid;

    Assert.That(actualGeneration, Is.EqualTo(expectedGeneration1));

    game.Tick();
    actualGeneration = game.Grid;
    Assert.That(actualGeneration, Is.EqualTo(expectedGeneration2));
}

All I’ve done to make the visual specification above executable is transcode it into a programming language – C# in this case. Note that the tests do not influence or mandate anything of the implementation, besides the data structures used for input and output.
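To make that concrete, here is one minimal implementation these tests could drive out. This is only a sketch: the tests above reference the `Game` class, the `Grid` property, `Tick()`, `CreateGame` and `PrintGrid`, but everything else here is an assumption. Note also that the glider test expects the grid to grow from 3×3 to 4×3, which implies the author's implementation expands the grid at the edges; this fixed-size sketch does not handle that.

```csharp
using System;

// One possible implementation driven by the spec tests above.
// Fixed-size grid only; the glider test implies the real grid grows.
public class Game
{
    public char[,] Grid { get; private set; }

    public Game(char[,] initialGrid)
    {
        Grid = initialGrid;
    }

    public void Tick()
    {
        int rows = Grid.GetLength(0), cols = Grid.GetLength(1);
        var next = new char[rows, cols];

        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                int n = LiveNeighbours(r, c);
                // B3/S23: a live cell survives with 2 or 3 live neighbours;
                // a dead cell becomes live with exactly 3.
                next[r, c] = (Grid[r, c] == '*' ? (n == 2 || n == 3) : n == 3) ? '*' : '.';
            }

        Grid = next;
    }

    private int LiveNeighbours(int row, int col)
    {
        int count = 0;
        for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
            {
                if (dr == 0 && dc == 0) continue;
                int r = row + dr, c = col + dc;
                if (r >= 0 && r < Grid.GetLength(0) &&
                    c >= 0 && c < Grid.GetLength(1) &&
                    Grid[r, c] == '*')
                    count++;
            }
        return count;
    }

    public static void PrintGrid(char[,] grid)
    {
        for (int r = 0; r < grid.GetLength(0); r++)
        {
            for (int c = 0; c < grid.GetLength(1); c++) Console.Write(grid[r, c]);
            Console.WriteLine();
        }
    }
}
```

Any implementation with the same input/output data structures would satisfy the specs equally well; the tests say nothing about how `Tick()` works inside.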

I’d like to introduce another example, this one a little more complicated and more real-world. I’m currently developing a book discovery and lending app called Lend2Me, the source code for which is available on GitHub. The app is built using Event Sourcing (and thus Domain-Driven Design and Command-Query Responsibility Segregation). The only tests in this codebase are user-level functional tests. Because I’m using DDD, the spec tests are written in the Ubiquitous Language of the domain, and actually describe the domain, not a system implementation. Since user interaction with the system takes the form of Command-Result and Query-Result pairs, these pairs are what I use to specify the system. Spec tests for a Command generally take the following form:

GIVEN:  a sequence of previously handled Commands
WHEN:   a particular Command is issued
THEN:   a particular result is returned

In addition to this basic functionality, I also want to capture the domain events I expect to be persisted (to the event store), which is still a domain-level concern, and of interest to the PO. Including event persistence, the tests, in general, look like:

GIVEN:  a sequence of previously handled Commands
WHEN:   a particular Command is issued
THEN:   a particular result is returned
AND:    a particular sequence of events is persisted

A concrete example from Lend2Me:

User Story: AddBookToLibrary – As a User I want to Add Books to my Library so that my Connections can see what Books I own.
Scenario: AddingPreviouslyRemovedBookToLibraryShouldSucceed

GIVEN:  Joshua is a Registered User AND Joshua has Added and Removed Oliver Twist from his Library
WHEN:   Joshua Adds Oliver Twist to his Library
THEN:   Oliver Twist is Added to Joshua's Library

This spec test is written at the domain level, in the Ubiquitous Language. The code for this test is:

[Test]
public void AddingPreviouslyRemovedBookToLibraryShouldSucceed()
{
    RegisterUser joshuaRegisters = new RegisterUser(processId, user1Id, 1, "Joshua Lewis", "Email address");
    AddBookToLibrary joshuaAddsOliverTwistToLibrary = new AddBookToLibrary(processId, user1Id, user1Id, title, author, isbnnumber);
    RemoveBookFromLibrary joshuaRemovesOliverTwistFromLibrary = new RemoveBookFromLibrary(processId, user1Id, user1Id, title, author, isbnnumber);

    UserRegistered joshuaRegistered = new UserRegistered(processId, user1Id, 1, joshuaRegisters.UserName, joshuaRegisters.PrimaryEmail);
    BookAddedToLibrary oliverTwistAddedToJoshuasLibrary = new BookAddedToLibrary(processId, user1Id, title, author, isbnnumber);
    BookRemovedFromLibrary oliverTwistRemovedFromJoshuaLibrary = new BookRemovedFromLibrary(processId, user1Id, title, author, isbnnumber);

    Given(joshuaRegisters, joshuaAddsOliverTwistToLibrary, joshuaRemovesOliverTwistFromLibrary);
    When(joshuaAddsOliverTwistToLibrary);
    Then(succeed);
    AndEventsSavedForAggregate(user1Id, joshuaRegistered, oliverTwistAddedToJoshuasLibrary, oliverTwistRemovedFromJoshuaLibrary, oliverTwistAddedToJoshuasLibrary);
}
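For readers wondering what sits behind `Given`, `When`, `Then` and `AndEventsSavedForAggregate`: the real helpers live in the Lend2Me codebase, but the following sketch shows one plausible shape for them. It is an assumption about their structure, not a copy of it (plain `object` stands in for the command, result and event types), and it is not executable as-is since the handler and event-store plumbing are left abstract.

```csharp
using System;
using System.Collections.Generic;
using NUnit.Framework;

// Hypothetical sketch of a Given/When/Then test base class; the real
// Lend2Me helpers may differ. Only the four method names come from the
// test above.
public abstract class SpecTest
{
    private object actualResult;

    protected void Given(params object[] commands)
    {
        // Replay the history through the real command handlers, so the
        // event store reaches the desired starting state.
        foreach (var command in commands) HandleCommand(command);
    }

    protected void When(object command)
    {
        actualResult = HandleCommand(command);
    }

    protected void Then(object expectedResult)
    {
        Assert.That(actualResult, Is.EqualTo(expectedResult));
    }

    protected void AndEventsSavedForAggregate(Guid aggregateId, params object[] expectedEvents)
    {
        // Read the persisted stream back out of the (real) event store.
        Assert.That(LoadEventsFor(aggregateId), Is.EqualTo(expectedEvents));
    }

    // Dispatch a command to the system's real CommandHandlers.
    protected abstract object HandleCommand(object command);

    // Load the persisted event stream for an aggregate.
    protected abstract IEnumerable<object> LoadEventsFor(Guid aggregateId);
}
```

The key point is that the harness exercises the real handlers and the real event store; only the test vocabulary is shaped to read like the Ubiquitous Language.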

The definitions of the Commands in this code (e.g. RegisterUser) would be specified as part of the domain:

public class RegisterUser : AuthenticatedCommand
{
    public long AuthUserId { get; set; }
    public string UserName { get; set; }
    public string PrimaryEmail { get; set; }

    public RegisterUser(Guid processId, Guid newUserId, long authUserId, string userName, string primaryEmail)
    : base(processId, newUserId, newUserId)
    {
        AuthUserId = authUserId;
        UserName = userName;
        PrimaryEmail = primaryEmail;
    }

}

Again, the executable test is just a transcoding of the Ubiquitous Language spec test into C#. Note how closely the code follows the Ubiquitous Language used in the natural-language specification. Again, the test does not make any assumptions about the implementation of the system. It would be very easy, for example, to change the test so that instead of calling CommandHandlers and QueryHandlers directly, HTTP requests are issued.

It could be argued that the design of this system makes it very easy to write user-level functional spec tests. However, these patterns can be used to specify any system, since it is possible to automatically interact with any system through the same interface a user uses. Similarly, it is possible to automatically assert any state in any boundary-interfacing system, e.g. databases, third-party web services, message buses etc. It may not be easy or cheap or robust, but it is possible. For example, a spec test could be:

GIVEN:  Pre-conditions
WHEN:   This HTTP request is issued to the system
THEN:   This response is returned
AND:    This HTTP request was made to this endpoint
AND:    This data is in the database
AND:    This message was put onto this message bus
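As a sketch of what that looks like in practice, here is the AddBookToLibrary spec re-expressed against an HTTP boundary. Everything below is hypothetical: the routes, JSON payloads and server address are assumptions for illustration, not the actual Lend2Me API, and the test needs a running server, so it is not executable standalone.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using NUnit.Framework;

[Test]
public async Task AddingPreviouslyRemovedBookToLibraryShouldSucceed_OverHttp()
{
    // Hypothetical endpoints and payloads -- assumptions for illustration.
    var client = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };
    const string oliverTwist =
        "{ \"title\": \"Oliver Twist\", \"author\": \"Charles Dickens\" }";

    // GIVEN: the history is replayed through the same public interface.
    await Post(client, "/commands/RegisterUser", "{ \"userName\": \"Joshua Lewis\" }");
    await Post(client, "/commands/AddBookToLibrary", oliverTwist);
    await Post(client, "/commands/RemoveBookFromLibrary", oliverTwist);

    // WHEN: the command under test is issued.
    var response = await Post(client, "/commands/AddBookToLibrary", oliverTwist);

    // THEN: the command succeeds. Nothing here constrains what sits behind
    // the endpoint; further AND clauses would query the database or the
    // message bus directly.
    Assert.That(response.StatusCode, Is.EqualTo(HttpStatusCode.OK));
}

private static Task<HttpResponseMessage> Post(HttpClient client, string path, string json) =>
    client.PostAsync(path, new StringContent(json, Encoding.UTF8, "application/json"));
```

The GIVEN/WHEN/THEN structure survives the move to the HTTP boundary unchanged; only the transport differs.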

This is how I do TDD: the spec tests are user-level functional tests. If it’s difficult, expensive or brittle to use the same interface the user uses, e.g. a GUI, then test as close to the system boundary, i.e. as close to the user, as you can.

For interest’s sake, all the tests in the Lend2Me codebase hit a live PostgreSQL database and an embedded event store client. There is no mocking anywhere. All interactions and assertions happen outside the system.