Story Based Testing and Automation

Did you know? We’re staging some webinars

Last night, we announced dates for two webinars that I will present on the subject, “Story-Based Test Automation Using Free Tools”. Nothing very exciting in that, except that it’s the first time we have used a paid-for service to host our own webinar and marketed that webinar ourselves. (In the past we have always pitched our talks through other people who marketed them).

Anyway, right now (8.40 PM GMT and less than 24 hours since we started the announcements) we have 96 people booked on the webinar. Our GoToWebinar account allows us to accept no more than 100. Looks like a sell-out. Great.

Coincidentally, James Bach and Michael Bolton have revisited and restated their positions on the “testing versus checking” and “manual versus automated testing” dichotomies (if you believe they are dichotomies, that is). You can see their position here:

I don’t think these two events are related, but it seemed to me that it would be a good time to make some statements that set the scene for what I am currently working on in general and the webinar specifically.

Business stories and testing

You might know that we (Gerrard Consulting) have written and promoted a software development method ( that uses the concept of business stories and have created a software as a service product ( to support the method. The method is not a test method, but it obviously involves a lot of testing. Testing that takes place throughout the development process – during the requirements phase, development phase, test phase and ongoing post-production phases.

Business stories are somewhat more to us than ‘a trigger for a conversation’, but we’ll use the term ‘stories’ to refer to them from now on.

In the context of these phases, the testing in scope might be called by other names and/or be part of processes other than ‘test’. Requirements prototyping, validation, (Specification by Example/Behaviour-Driven Development/Acceptance Test Driven Development/ Test-Driven Development – take your pick), feature-acceptance testing, system testing, user-testing and regression testing during and after implementation and go-live.

There’s quite a lot of this testing stuff going on. Right now, the Bach-Bolton dialogue isn’t addressing all of this in a general way, so I’m keeping a watching brief on events in that space. I look forward to a useful, informant outcome.

How we use (business) stories

In this blog, I want to talk specifically about the use of stories in a structured domain-specific language (using, for example Gherkin format (see to example (and that is a KEY word) requirements. I’m not interested in the Cucumber-specific extensions to the Gherkin syntax. I’m only interested in the feature heading (As a…/I want…/So that…) and the scenario structure (given…/when…/then…) etc. and how they are used to test in a broader sense:

  • Stories provide accessible examples in business language of features in use. They might be the starting point of a requirement, but usually not a full definition of a requirement. Without debating whether requirements can ever be complete, we argue that Specification by Example is not (in general) possible or desirable. See here:
  • If requirements provide definitions of behaviour in a general way, stories can be used to create examples of features described in requirements that are specific and, if carefully chosen, can be used to clarify understanding, to prototype behaviours and validate requirements in the eyes of stakeholders, authors and recipients of requirements. We describe this process here:
  • Depending on who creates these stories and scenarios and for what purpose, these scenarios can be used to feed a BDD, ATDD or Specification by Example approach. The terminology used in these approaches varies, but a tester would recognise them as a keyword-driven approach to test automation. Are these automated scenarios checks or tests? Probably checks. But these automated checks have multiple goals beyond ‘defect-detection’.

Story-based testing and automation

You see, the goals of an automated test (and let me persist in calling them tests for the time being) varies and there are several distinct goals of story-based scenarios as test definitions.

In the context of a programmer writing code, the rote automation of scenarios as tests gives the programmer a head start in their test-driven development approach. (And crafting scenarios in the language of users segues into BDD of course). The initial tests a programmer would have needed to write already exist so they have a clearer initial goal. Whether the scenarios exist at a sufficiently detailed level for programmers to use them as unit-tests is a moot point and not relevant right now. The real value of writing tests and running them first derives from:

  1. Early clarification of the goal of a feature when defined
  2. Immediate feedback of the behaviour of a feature when run
  3. When the goal is understood and the tests pass, then the programmer can more safely refactor their code

Is this testing? 2 is clearly an automated test. 3 is the reusable regression test that might find its way into a continuous integration and test regime. These tests typically exercise objects or features through a technical API. The user interface probably won’t be exercised.

There is another benefit of using scenarios as the basis of automated tests. The language of the scenario (which is derived from the businesses’ language in a requirement) can be expected to be reused in the test code. We can expect (or indeed mandate) the programmer to reuse that language in the naming of their variables and objects in code. The goals of Ubiquitous Language in systems (defined by Eric Evans and nicely summarised by Martin Fowler here are supported.

Teams needing to demonstrate acceptance of a feature (identified and defined by a story), often rely on manual tests executed by the user or tester. The tester might choose to automate these and/or other behaviour or user-oriented tests as acceptance regression tests.

Is that it? Automated story tests are ‘just’ regression tests? Well maybe so.

The world is going ‘software as a service’ and the development world moves closer to continuous delivery approaches every day. The time available to do manual testing is shrinking rapidly. In extremis, to avoid bottlenecks in the deployment pipeline ( there may be time only to perform cursory manual testing. Manual, functional testing of new features might take place in parallel with development and automation of functional tests must also happen ahead of deployment because automated testing becomes part of the deployment process itself. Perhaps manual testing becomes a test-as-we-develop activity?

But there are two key considerations for this high-automation approach to work:

  1. I’ve said elsewhere that Continuous Delivery is a beast that eats requirements ( and for CD to work, then the quality of requirements must be much higher than we are accustomed to. We use the term trusted requirements. You could say, tested and trusted. We, and I mean testers mostly, need to validate requirements using stories so the developers receive both trusted requirements and examples of features in use. Without trusted requirements, CD will just hit a brick wall faster.
  2. Secondly, it seems to me that for the testers not to be a bottleneck, then the manual checking that they do must be eliminated. Whichever tests can be automated should be. The responsibility for automation of checking must move from being a retrospective activity to possibly a developer activity. This will free the manual testers to conduct and optimise their activity in the short time they have available.
  3. There are several spin-off benefits of basing tests on stories and scenarios. Here’s two: if test automation is built early, then all checks can take advantage of it; if automation is built in parallel with the software under test, then the developers are much more likely to consider the test automation and build the hooks to allow it to operate effectively. The continuous automated testing provides the early warning system of continuous delivery regimes. These don’t ‘find bugs’, rather they signal functional equivalence. Or not.

    I wrote a series of four articles on ‘Anti-Regression Approaches’ here: What are the skills of setting up regression test regimes? Not necessarily the same as those required to design functional tests. Primarily, you need automation skills and a knowledge of the internals of the system under test. Are these testing skills? Not really. They are more likely to be found in developers. This might be a good thing. Would it not be best to place responsibility for regression detection on those people responsible for introducing regressions? Maybe developers can do it better?

    One final point. If testers are allowed (and I use that word deliberately) to test or validate requirements using stories in the way we suggest, then the quality of requirements provided to developers will improve. And so will the software they write. And the volume of testing we are currently expected to resource will reduce. So we need fewer testers. Or should I say checkers?

    This is the essence of the “redistributed testing” offer that we, as testers, can make to our businesses.

    The webinar is focused on our technical solution and is driven by the thinking above.

    Last time I looked we had 97 registrants on the 4th April Webinar. If you are interested, the 12th April webinar takes place at 10 AM GMT – you can register for it here:

Anti-Regression Approaches: Impact Analysis and Regression Testing Compared and Combined: Part IV: Automated Regression Testing

In Parts I and II of this article series, we introduced the nature of regression, impact analysis and regression prevention. In Part III we looked at Regression Testing and how we select regression tests. This article focuses on the automation of regression testing.

Automated Regression Testing is One Part of Anti-Regression

Sometimes it feels like more has been written about test automation, especially GUI test automation, than any other testing subject. My motivation in writing this article series was that most things of significance in test automation had been said 8, 10 or 15 years ago and not that much progress has been made since (notwithstanding the varying technology changes that have occurred). I suggest there’s been a lack of progress, because significant and sustained success with automation of (what is primarily) regression testing is still not assured. Evidence of failure, or at least troublesome implementations of automation, is still widespread.

My argument in the January 2010 Test Management Summit was that perhaps the reason for failure in test automation was that people didn’t think it through before they started. In this context, ‘started’ often means getting a good deal on a proprietary GUI Test Automation tool.

It’s obvious – buying a tool isn’t the best first step. Automating tests through the user interface may not be the most effective way to achieve anti-regression objectives. Test automation may not be an effective approach at all. It certainly shouldn’t be the only one considered. Test execution automation promises reliable, error-free, rapid, unattended test execution. In some environments, the promise is delivered, but in most – it is not.

In the mid 1990’s informal surveys revealed that very few (in one survey, less than 1% of) test automation users achieved ‘significant benefits’. The percentage is much higher nowadays – maybe as high as 50%, but that is probably because most practitioners have learnt their lessons the hard way. Regardless, success is not assured.

Much has been written on the challenges and pitfalls of test automation. The lessons learned by practitioners in the mid-90s are substantially the same as those facing practitioners today. I have to say that it’s a cause of some frustration that many companies still haven’t learnt them. In this article, there isn’t space to repeat those lessons. The referred papers, books and blogs at the end of this article focus on implementing automation, primarily from a user interface point of view, and sometimes as an end in itself. To complement these texts, to bring them up to date and focus them on our anti-regression objective, the remainder of this article will set out some wider considerations.

Regression test objectives and (or versus?) automation

The three main regression test objectives are set out below together with some suggestions for test automation. Although the objectives are distinct, the differences between regression testing and automation for the three objectives are somewhat blurred.

Anti-Regression Objective Source of Tests Automation Considerations
1. To detect unwanted changes to trusted functionality. Functional system tests

Integration tests

Consider the criteria in references 6, 7, 8

Most likely to be automated using drivers to component and sub-system interfaces

2. To detect unwanted changes (to support technical refactoring). Test-first, test-driven environments generate automated tests naturally Consider reference 9 and the discussion of testing in TDD and Agile in general.
3. To demonstrate to stakeholders that they can still do business. Acceptance Tests, business process flows, ‘end to end’ tests. Consider the criteria in references 6, 7, 8 but expect mostly manual testing for demonstration purposes.

See reference 10 for an introduction to Acceptance Driven Development.

Regression objectives reframed: detecting regression v providing confidence

Of the three regression test objectives above, objectives 1 and 2 are similar. What differentiates them is who (and where) they come from. Objective 1 comes from a system supplier perspective and tests are most likely to be sourced from system or integration tests that were previously run (either manually or automated). Objective 2 comes from a developer or technical perspective where the aim is to perform some refactoring in a safe environment. By and large, ‘safe’ refactoring is most viable in a Test-Driven environment where all unit tests are automated, probably in a Continuous Integration regime. (Although refactoring at any level benefits from automated regression testing).

If objectives 1 and 2 require tests to demonstrate ‘functional equivalence’, regression test coverage can be based on the need to exercise the underlying code and cover the system functionality. Potentially, tests based on equivalence partitioning ought to cover the branches in code (but not housekeeping or error-handling functionality – but see below). Tests covering edge conditions or boundary values should verify the ‘precision’ of those decisions. So a reasonable guideline could be – use automation to cover functional paths through the system and data-drive those tests to expand the coverage of boundary conditions. The automation does not necessarily have to execute tests that would be recognisable to the user, if the objective is to demonstrate functional equivalence.

Objective 3 – to provide confidence to stakeholders is slightly different. In this case, the purpose of a regression test is to demonstrate to end users that they can execute business transactions and use the system to support their business. In this respect, it may be that these tests could be automated and some automated tests that fall under the category 1 and 2 above will be helpful. But experience of testing GUI applications in particular suggests that end users sometimes only trust their own eyes and need to have a hands-on experience to give them the confidence that is required. Potentially, a set of automated tests might be used to drive a number of ‘end to end’ transactions, and reconciliation or control reports could be generated to be inspected by end users. There is a large spectrum of possibilities of course. In summary, automated tests could help, but in some environments, the need for manual tests as a ‘confidence builder’ cannot be avoided.

At what level(s) should we automate regression tests?

In Part III of this article series, we identified three levels at which regression testing might be implemented – at the component, system and business (or integrated system) levels. These levels should be considered as complementary and the choice is where to place emphasis, rather than which to include or exclude. The choice of automation at these levels is not really the point. Rather, a level of regression testing may be chosen primarily to achieve an objective, partly on the value of information generated and partly because of the ease with which the tests can be automated.

What are the technical considerations for automation?

At the most fundamental, technical level, there are four aspects of the system under test that must be considered. How the system under test is stimulated, and how the test outcomes of interest (with respect to regression) will be detected.

Mechanisms for stimulating the system under test

This aspect reflects how a test is driven by either a user or an automated tool. Nowadays, the number of user and technical interfaces in use is large – and growing. A table of the most common are presented and some suggestions made.

PC/Workstation-based applications and clients>
  • Proprietary or open source GUI-object based drivers
  • Hardware (keyboard, video, mouse) based tools – physically connected to clients
  • Software based automation tools driving clients working across VNC connections
Browser/web-based applications
  • Proprietary object-based agents
  • Open source JavaScript-based agents
  • Open source script languages and GUI toolkits
Web-Server-based functionality (HTTP)
  • Proprietary or open source webserver/HTTP/S drivers
Web services
  • Proprietary or open source web services drivers
Mobile applications
  • Mobile OS simulators driven by integrated or separate GUI based toolkits
  • Typically java-based toolkits
Error, failure, spate, race conditions or other situations
  • May be simulated by instrumentation, load generation tools or manipulation of the test environment or infrastructure
  • Don’t forget that environmental conditions influence the behaviour of ALL systems under test.

There are an increasing number of proprietary and open source unit and acceptance testing frameworks available to manage and control the test execution engines above.

Outcome/Output detection and capture

A regression can be detected in as many ways as any outcome (output, change of state etc.) of a system can be exposed and detected. Here’s a list of common outcome/output formats that we may have to deal with. This is not a definitive list.

Browser-rendered output
  • The state of any object on the document Object Model (DOM) exposed by a GUI tool
Any screen-based output
  • Image recognition by hardware or software based agents
Transaction response times
  • Any automated tool with response time capture capability
Database changes
  • Appropriate SQL or database query tool
Message output and content
  • Raw packets captured by network sniffers
  • Message payloads captured and analysed by protocol-specific tools
Client or server system resources
  • CPU, i/o, memory, network traffic etc. detected by performance monitors
Application or other infrastructure – changes of state
  • (Database, enterprise messaging, object request brokers etc. etc.) – dedicated system/resource monitors or custom-built instrumentation etc.
Changes in accessibility or usability (adherence to standards etc.)
  • Web page HTML scanners, character-based screen or report scanners or screen image scanners
Security (server)
  • Port scanning and server-penetration tools

Comparison of Outcomes

A fundamental aspect of regression testing is comparison of actual outcomes (in whatever format from whatever source above) to expected outcomes. If we are running a test again, the comparison is between the new ‘actual’ output/outcome and previously captured ‘baseline’ output/outcome.

Simple comparison functionality of numbers, text, system states, images, mark-up language, database content, reports, message payloads, system resource is not enough. We need to have a capability in our automation to:

Filter content: we may not need to compare ‘everything’. Subsets of database records, screen/image regions, branches or leaves in marked up text, some objects and states but not others etc. may be filtered out (of both actual and baseline content).

Mask content: of the content we filter out, we may wish to mask out certain patterns of content such as image regions that do not contain field borders; textual report columns or rows that contain dates/times, page numbers, varying/unique record ids etc.; screen fields or objects of certain colours, sizes, that are hidden/visible; patterns of text that can be matched using regular expressions and so on.

Calculate from content: the value, significance or meaning of content may have to be calculated: perhaps the number of rows displayed on a screen is significant; the error message, number or status code displayed on a screen image, extracted by text recognition; the result of a formula in which the variables are extracted from an outputted report and so on.

Identify content meeting/exceeding a threshold: the significance of output is determined by its proximity to thresholds such as: CPU, memory or network bandwidth usage compared to pre-defined limits; the value of a purchase order exceeds some limit; the response time of a transaction exceeds a requirement and so on.

System Architecture

The architecture of a system may have a significant influence over the choice of regression approach and automation in particular. An example will illustrate. An increasingly common software model is the MVC or model-view-controller architecture. Simplistically (from Wikipedia):

“The model is used to manage information and notify observers when that information changes; the view renders the model into a form suitable for interaction, typically a user interface element; the controller receives input and initiates a response by making calls on model objects. MVC is often seen in web applications where the view is the HTML or XHTML generated by the app. The controller receives GET or POST input and decides what to do with it, handing over to domain objects (i.e. the model) that contain the business rules and know how to carry out specific tasks such as processing a new subscription.”

A change to a ‘read-only’ view may be completely cosmetic and have no impact on models or controllers. Why regression test other views, models or controllers? Why automate testing at all – a manual inspection may suffice.

If a controller changes, the user interaction may be affected in terms of data captured and/or presented but the request/response dialogue may allow complete control of the transaction and examination of the outcome. In many situations, automated control of requests to and from controllers (e.g. HTTP GETs and POSTs) is easier to achieve than automating tests through the GUI (i.e. a rendered web page).

Note that cross-browser test automation, to verify the behaviour and appearance of a system’s web pages across different browser types, for example, cannot be handled this way. (Some functional automation may be possible, but some usability/accessibility tests will always be manual).

It is clear that the number and variety of the ways a system can be stimulated and potentially regressive outcomes can be observed is huge. Few, if any tools, proprietary or open source, have all the capabilities we need. The message is clear – don’t ever assume the only way to automate regression testing is to use a GUI-based test execution tool!

Regression test automation – summary

In summary, we strongly advise you to bear in mind the following considerations:

  1. What is the outcome of your impact analysis?
  2. What are the objectives of your anti-regression effort?
  3. How could regressions manifest themselves?
  4. How could those regressions be detected?
  5. How can the system under test be stimulated to exercise the modes of operation of concern?
  6. Where in the development and test process is it feasible to implement the regression testing and automation?
  7. What technology, tools, harnesses, custom utilities, skills, resources and environments do you need to implement the automated regression test regime?
  8. What will be your criteria for automating (new or existing, manual) tests?

Test Automation References

  1. Brian Marick, 1997, Classic Testing Mistakes,
  2. James Bach, 1999, Test Automation Snake Oil,
  3. Cem Kaner, James Bach, Bret Pettichord, 2002, Lessons Learned in Software Testing, John Wiley and Sons
  4. Dorothy Graham, Paul Gerrard, 1999, the CAST Report, Fourth Edition
  5. Paul Gerrard, 1998, Selecting and Implementing a CAST Tool,
  6. Brian Marick, 1998, When Should a Test be Automated?
  7. Paul Gerrard, 1997, Testing GUI Applications,
  8. Paul Gerrard, 2006, Automation below the GUI (blog posting),
  9. Scott Ambler, 2002-10, Introduction to Test-Driven Design,
  10. Naresh Jain, 2007, Acceptance-Test Driven Development,

In the final article of this series, we’ll consider how an anti-regression approach can be formulated, implemented and managed and take a step back to summarise and recap the main messages of these articles.

Paul Gerrard
21 June 2010.