Regression Testing v Recession Testing

Anne-Marie Charrett wrote a blog post that I commented on extensively. I’ve reproduced the comment here:

“Some to agree with here, and plenty to disagree with too…

1. Regression testing isn’t about finding bugs the same way as one might test new software to detect bugs (testing actually does not detect bugs, it exposes failure. Whatever.) It is about detecting unwanted changes in functionality caused by a change to software or its environment. Good regression tests are not necessarily ‘good functional tests’. They are tests that will flag up changes in behaviour – some changes will be acceptable, some won’t. A set of tests that purely achieve 80% branch coverage will probably be adequate to demonstrate functional equivalence of two versions of software with a high level of confidence – economically. They might be lousy functional tests “to detect bugs”. But that’s OK – ‘bug detection’ is a different objective.

2. Regression Testing is one of four anti-regression approaches. Impact analysis from a technical and business point of view are the two preventative approaches. Static code analysis is a rarely used regression detection approach. Fourthly…and finally … regression testing is what most organisations attempt to do. It seems to be the ‘easiest option’ and ‘least disruptive to the developers’. (Except that it isn’t easy and regression bugs are an embarrassing pain for developers). The point is one can’t consider regression testing in isolation. It is one of four weapons in our armoury (although the technical approaches require tools). It is also over relied-on and done badly (see 1 above and 3 below).

3. If Regression testing is about demonstrating functional equivalence (or not), then who should do it? The answer is clear. Developers introduce the changes. They understand or should understand the potential impact of planned changes on the code base before they proceed. Demonstrating functional equivalence is a purely technical activity. Call it checking if you must. Tools can do it very effectively and efficiently if the tests are well directed (80% branch coverage is a rule of thumb). Demonstrating functional equivalence is a purely technical activity that should be done by technicians.

Of course, what happens mostly is that developers are unable to perform accurate technical impact analyses and they don’t unit test well so they have no tests and certainly nothing automated. They may not be interested in and/or paid to do testing. So the poor old system or acceptance testers working purely from the user interface are obliged to give it their best shot. Of course, they try and re-use their documented tests or their exploratory nous to create good ones. And fail badly. Not only are tests driven from the UI point of view unlikely to cover the software that might be affected, the testers are generally uninformed of the potential impact of software changes so have no steer to choose good tests in the first place. By and large, they aren’t technical and aren’t privy to the musings of the developers, before they perform the code changes so they are pretty much in the dark.

So UI driven manual or automated regression testing is usually of low value (but high expense) *when intended to demonstrate functional equivalence*. That is not to say that UI driven testing has no value. Far from it. It is central to assessing the business impact of changes. Unwanted side effects may not be bugs in code. Unwanted side-effects are a natural outcome of the software changes requested by users. A common unwanted effect here is for example, a change in configuration in an ERP system. The users may not get what they wanted from the ‘simple change’. Ill-judged configuration changes in ERP systems designed to perform straight-through processing can have catastrophic effects. I know of one example that caused 75 man-years of manual data clean-up effort. The software worked perfectly – there was no bug. The business using the software did not understand the impact of configuration changes.

Last year I wrote four short papers on Anti-Regression Approaches (including regression testing) and I expand on the points above. You can see them here: http://gerrardconsulting.com/index.php?q=node/479 “

Anti-Regression Approaches: Anti-Regression Strategy – Making it Work

The first four articles in this series have set out the main approaches to combating regression in changing software systems. From a business and technical viewpoint, we have considered both pre-change regression prevention (impact analysis) and post-change regression detection (regression testing). In this final article of the series, we’ll consider three emerging approaches that promise to reduce the regression threat and present some considerations of an effective anti-regression strategy with a recap of the main messages of the article series.

Three Approaches: Test, Behaviour and Acceptance Test-Driven Development

There is an increasing amount of discussion on development approaches based on the test-driven model. Ten years or so ago, before lightweight (later named Agile) approaches became widely publicized, test-driven development (TDD) was rare. Some TDD happened, but mostly in high integrity environments where component development and testing was driven by the need to meet formal functional and structural test coverage targets.

Over the course of the last ten years however, the notion of developers creating automated tests typically based on stories and discussions with on-site customers is becoming more common. The leaders in the Agile community are tending to preach behaviour- (BDD) and even acceptance test-driven development (ATDD) to improve and make accessible the test assets in Agile projects. They are also an attempt to move the Agile emphasis from coding to delivery of stakeholder value.

The advocates of these approaches (see for example testdriven.net, gojko.net, behaviour-driven.org, ATDD in Practice) would say that the approaches are different and of course, in some respects they are. But from the point of view of our discussion of anti-regression approaches, the relevance is this:

  1. Regression testing performed by developers is probably the most efficient way to demonstrate functional equivalence of software (given the limited scope of unit testing).
  2. The test-driven paradigm ensures that regression test assets are acquired and maintained in synchrony with the code – so are accurate and constantly reusable.
  3. The existence of a set of trusted regression tests means that the programmer is protected (to some degree) from introducing regressions when they change code (to enhance, fix bugs in or refactor code).
  4. Programmers, once they commit to the test-first approach tend to find their design and coding activities more predictable and less stressful.

These approaches obviously increase the effort at the front-end and many programmers are not adopting (and may never adopt) them. However, the trend toward test-first does seem to be gaining momentum.

A natural extension of test-first in Agile and potentially more structured environments is the notion of live specifications. In this approach, the automated tests become the independent and executable definition of the behavior of the system. The tests define the behavior of a system by example, and can be considered to be executable specifications (of a sort). Of course, examples alone cannot define the behavior of systems completely and some level of logical specification will always be required. However, the live-specification approach holds great promise, particularly as a way of reducing regressions.

The ideal seems to be that where a change is required by users, the live specification is changed, new tests added and existing tests changed or retired as required. The software changes are made in parallel. The new and changed tests are run to demonstrate the changes work as required, and the existing (unchanged) tests are, by definition, the regression test pack. The format, content and structure of such live-specifications are evolving and a small number of organisations claim some successes. It will be interesting to see examples of the approach in action.
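To make the idea of a live specification a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the shipping rule, the function name shipping_charge and the table SPEC_EXAMPLES are hypothetical and do not come from any of the tools or approaches cited above. The point is only that the examples agreed with users are held as data and executed directly, so the rows that do not change from one release to the next act as the regression pack.

```python
from decimal import Decimal

# Hypothetical rule, used only for illustration: orders of 100.00 or more ship free.
def shipping_charge(order_total: Decimal) -> Decimal:
    """Stand-in for the real application code under test."""
    if order_total >= Decimal("100.00"):
        return Decimal("0.00")
    return Decimal("4.95")

# The 'live specification': each row is an example agreed with the users.
SPEC_EXAMPLES = [
    # (description, order total, expected charge)
    ("small order pays standard shipping", "20.00", "4.95"),
    ("just below the threshold still pays", "99.99", "4.95"),
    ("order at the threshold ships free", "100.00", "0.00"),
    ("large order ships free", "250.00", "0.00"),
]

def run_spec(examples):
    """Run every example and return a description of each one that fails."""
    failures = []
    for description, total, expected in examples:
        actual = shipping_charge(Decimal(total))
        if actual != Decimal(expected):
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures

if __name__ == "__main__":
    problems = run_spec(SPEC_EXAMPLES)
    print("specification satisfied" if not problems else "\n".join(problems))
```

When a change is requested, rows are added, amended or retired alongside the code change; the untouched rows should keep passing.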

Unified Requirements and Systems Testing

The test-first approaches discussed above are gaining popularity in Agile environments. But what can be done in structured, waterfall, larger developments?

Some years ago (in my first Eurostar paper in 1993), I proposed a ‘Unified Approach to System Functional Testing’. In that paper, I suggested that a tabular notation for capturing examples or test cases could be used to create crude prototypes, review check lists and structured walkthroughs of requirements. These ‘behaviours’ as I called them could be used to test requirements documents, but also reused as the basis of both system and acceptance testing later on. Other interests took priority and I didn’t take this proposal much further until recently.

Several developments in the industry make me believe that a practical implementation of this unified approach is now possible and attractive to practitioners. See for example the model-based papers here: www.geocities.com/model_based_testing/online_papers.htm or the tool described here: teststories.info. To date, these approaches have focused on high formality and embedded/industrial applications.

Our approach involves the following activities:

  1. Requirements are tabulated to allow cross-referencing.
  2. Requirements are analysed and stories, comprising feature descriptions and a covering set of scenarios and examples (acceptance criteria) are created
  3. The scenarios are mapped to paths through the business process and a data dictionary; paper and automated prototypes can be generated from the scenarios
  4. Using scenario walkthroughs, the requirements are evaluated, omissions and ambiguities identified and fixed.
  5. The process paths, scenarios and examples may be incorporated into software development contracts, if required.
  6. The process paths, scenarios and examples are re-used as the basis of the acceptance test which is conducted in the familiar way.

Essentially, the requirements are ‘exampled’, with features identified and a set of acceptance criteria defined for each – in a structured language. It is the structure of the scenarios that allows tabular definitions of tests for use in manual procedures as well as skeletal automated tests to be generated automatically. There are several benefits deriving from this approach, but the two that concern us here are:

  • The definition of tests and the ability to generate automated scripts occurs before code is written which means that the test-first approach is viable for all projects, not just Agile.
  • The requirements, processes, process paths, features, examples and data dictionary held in the database are cross-referenced. The database can be used to support more detailed business-oriented impact analysis.

The first benefit has been discussed in the previous section. The second has great potential.

The business knowledge captured in the process will allow some very interesting what-if questions to be asked and answered. If a business process is to change, the system features, requirements, scenarios and tests affected can be traced. If a system feature is to be changed, the scenarios, tests, requirements and business processes affected can be traced. This knowledge should provide, at least at a high level, a better understanding of the impact of change. Further, it promotes the notion of live specifications and Trusted Requirements.
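The kind of what-if query described above can be sketched as a walk over cross-reference links. The sketch below is illustrative only: the item names, the TRACE_LINKS dictionary and both functions are hypothetical, and a real requirements database would hold far richer data. It assumes each item records the items derived from it, so that tracing forwards answers 'what is affected if this changes?' and tracing over the inverted links answers 'where did this come from?'.

```python
from collections import deque

# Hypothetical cross-reference data: each item lists the items derived from it.
TRACE_LINKS = {
    "process:order-to-cash": ["requirement:R12", "requirement:R15"],
    "requirement:R12": ["feature:invoice-generation"],
    "requirement:R15": ["feature:credit-check"],
    "feature:invoice-generation": ["scenario:S3", "scenario:S4"],
    "feature:credit-check": ["scenario:S7"],
    "scenario:S3": ["test:T31"],
    "scenario:S4": ["test:T41", "test:T42"],
    "scenario:S7": ["test:T71"],
}

def impacted_items(start, links):
    """Breadth-first walk returning every item reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for linked in links.get(queue.popleft(), []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return sorted(seen)

def invert(links):
    """Reverse every link so the same walk can trace back towards the business process."""
    reverse = {}
    for source, targets in links.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse

if __name__ == "__main__":
    feature = "feature:credit-check"
    print("downstream:", impacted_items(feature, TRACE_LINKS))
    print("upstream:  ", impacted_items(feature, invert(TRACE_LINKS)))
```

The downstream result suggests which scenarios and tests to review or rerun; the upstream result suggests which requirements and business processes to consult before the change is made.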

There is a real possibility that the (typically) huge investment in requirements capture will not be wasted and the requirements may be accurately maintained in parallel with a covering set of scenarios. Further, the business knowledge captured in the requirements and the database can be retained for the lifetime of the system in question.

Improving Software Analysis Tools

The key barrier to performing better technical impact analyses is the lack (and expense) of appropriate tools to provide a range of source code analyses. Tools that provide visualisations of the architecture, relationships between components and hierarchical views of these relationships are emerging. Some obvious challenges make life somewhat difficult though:

  1. Tools are usually language dependent so mixed environments are troublesome.
  2. The source code for third-party components used in your system may not be available.
  3. Visualisation software is available, but for real-size systems, the graphical models can become huge and unworkable.

These tools are obviously focused at architects, designers and developers and are naturally technical in nature.

An example of how tools are evolving in this area is Structure101g (headwaysoftware.com). This tool can perform detailed structural analyses of several languages (“java, C/C++ and anything”) but can, in principle, provide visualisations, navigation and query facilities for any structural model. For example, with the necessary plugins, the tool can provide insights into XML/XSLT libraries and web site maps at varying levels of abstraction.

As tools like this become better established and more affordable, they will surely become ‘must-haves’ for architects and developers in large systems environments.
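For readers who have not seen a technical impact analysis in miniature, the following is a rough sketch of the underlying idea, not a description of how any commercial tool works. It uses Python's standard ast module to find which modules import which, then works outwards from a changed module to everything that depends on it. The four-module code base in SOURCES is hypothetical and held as strings so the example is self-contained. It also illustrates the first challenge listed above: the analysis only understands one language.

```python
import ast

# Hypothetical code base, held as strings so the sketch is self-contained.
SOURCES = {
    "billing": "import pricing\nfrom tax import vat_rate\n",
    "pricing": "import tax\n",
    "tax": "",
    "reports": "import billing\n",
}

def direct_imports(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def affected_by(changed, sources):
    """Modules that import `changed`, directly or through other modules."""
    imports = {name: direct_imports(text) for name, text in sources.items()}
    affected, frontier = set(), {changed}
    while frontier:
        # Next ring outwards: modules importing anything in the current frontier.
        frontier = {module for module, deps in imports.items()
                    if deps & frontier and module not in affected}
        affected |= frontier
    return sorted(affected)

if __name__ == "__main__":
    print("a change to 'tax' may affect:", affected_by("tax", SOURCES))
```

Even this toy version shows why visualisations of real-size systems become unworkable: the list of potentially affected components grows quickly as you move outwards from the change.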

Anti-Regression Strategy – Making it Happen

We’ll close this article series with some guidelines summarised from this and previous articles. Numerals in brackets refer to the article number.

  1. Regressions in working software affect business users, technical support, testers, developers, software and project management and stakeholders. It is everyone’s problem (I, V).
  2. A comprehensive anti-regression strategy would include both regression prevention and detection techniques from a technical and business viewpoint. (I, II).
  3. Impact analysis can be performed from both business and technical viewpoints. (all)
  4. Technical impact analysis really needs tool support; consider open source or proprietary tools (or consider building your own to meet your specific objectives).
  5. Regression testing may be your main defence against regression, but should never be the only one; impact analysis prevents regression and informs good regression testing. (I, II, IV).
  6. Regression testing can typically be performed at the component, system or business level. These test levels have different objectives, owners and may be automated to different degrees (III).
  7. Regression tests may be created in a test-driven regime, or as part of requirements or design based approaches. Reuse of tests saves time, but check that these tests actually meet your anti-regression objectives (III).
  8. Regression tests become less effective over time; review your test pack regularly, especially when you are about to add to it. (This could be daily in an Agile environment!) (III)
  9. Analyses of production data will tell you the formats, volumes and patterns of data that are most common – use them as a source of test data and a model for coverage, but don’t forget to include negative tests too! (III)
  10. If you need to be selective in the tests you retain and execute then you’ll need an agreed process, forum, decision-maker or makers or criteria for selection (agreed with all stakeholders in 1 above) (III).
  11. Most regression testing can and should be automated. Understand your context (objectives, test levels, risk areas, developer/tester motivations and capabilities etc.) before defining your automation strategy (III, IV).
  12. Consider what test levels, system stimulation and outcome detection methods, ownership, capabilities and tool usability are required before defining an automation regime (IV).
  13. Creating an automation regime retrospectively is difficult and expensive; test-first approaches build regression testing into the DNA of project teams (V).
  14. There is a lot of thinking, activity and new approaches/tools being developed to support requirements testing, exampling, live-specs and test automation; take a look (V).

I wish you the best of luck in your anti-regression initiatives.

I’d like to express sincere thanks to the Eurostar Team for asking me to write this article series and attendees at the Test Management Summit for inspiring it.

Paul Gerrard
23 August 2010.

Anti-Regression Approaches: Impact Analysis and Regression Testing Compared and Combined: Part IV: Automated Regression Testing

In Parts I and II of this article series, we introduced the nature of regression, impact analysis and regression prevention. In Part III we looked at Regression Testing and how we select regression tests. This article focuses on the automation of regression testing.

Automated Regression Testing is One Part of Anti-Regression

Sometimes it feels like more has been written about test automation, especially GUI test automation, than any other testing subject. My motivation in writing this article series was that most things of significance in test automation had been said 8, 10 or 15 years ago and not that much progress has been made since (notwithstanding the varying technology changes that have occurred). I suggest there’s been a lack of progress, because significant and sustained success with automation of (what is primarily) regression testing is still not assured. Evidence of failure, or at least troublesome implementations of automation, is still widespread.

My argument in the January 2010 Test Management Summit was that perhaps the reason for failure in test automation was that people didn’t think it through before they started. In this context, ‘started’ often means getting a good deal on a proprietary GUI Test Automation tool.

It’s obvious – buying a tool isn’t the best first step. Automating tests through the user interface may not be the most effective way to achieve anti-regression objectives. Test automation may not be an effective approach at all. It certainly shouldn’t be the only one considered. Test execution automation promises reliable, error-free, rapid, unattended test execution. In some environments, the promise is delivered, but in most – it is not.

In the mid-1990s, informal surveys revealed that very few (in one survey, less than 1% of) test automation users achieved ‘significant benefits’. The percentage is much higher nowadays – maybe as high as 50% – but that is probably because most practitioners have learnt their lessons the hard way. Regardless, success is not assured.

Much has been written on the challenges and pitfalls of test automation. The lessons learned by practitioners in the mid-90s are substantially the same as those facing practitioners today. I have to say that it’s a cause of some frustration that many companies still haven’t learnt them. In this article, there isn’t space to repeat those lessons. The referred papers, books and blogs at the end of this article focus on implementing automation, primarily from a user interface point of view, and sometimes as an end in itself. To complement these texts, to bring them up to date and focus them on our anti-regression objective, the remainder of this article will set out some wider considerations.

Regression test objectives and (or versus?) automation

The three main regression test objectives are set out below together with some suggestions for test automation. Although the objectives are distinct, the differences between regression testing and automation for the three objectives are somewhat blurred.

Anti-regression objectives, sources of tests and automation considerations:

1. To detect unwanted changes to trusted functionality.
  • Source of tests: functional system tests; integration tests.
  • Automation considerations: consider the criteria in references 6, 7, 8. Most likely to be automated using drivers to component and sub-system interfaces.
2. To detect unwanted changes (to support technical refactoring).
  • Source of tests: test-first, test-driven environments generate automated tests naturally.
  • Automation considerations: consider reference 9 and the discussion of testing in TDD and Agile in general.
3. To demonstrate to stakeholders that they can still do business.
  • Source of tests: acceptance tests, business process flows, ‘end to end’ tests.
  • Automation considerations: consider the criteria in references 6, 7, 8 but expect mostly manual testing for demonstration purposes. See reference 10 for an introduction to Acceptance Driven Development.

Regression objectives reframed: detecting regression v providing confidence

Of the three regression test objectives above, objectives 1 and 2 are similar. What differentiates them is whose perspective they come from. Objective 1 comes from a system supplier perspective and tests are most likely to be sourced from system or integration tests that were previously run (either manually or automated). Objective 2 comes from a developer or technical perspective where the aim is to perform some refactoring in a safe environment. By and large, ‘safe’ refactoring is most viable in a Test-Driven environment where all unit tests are automated, probably in a Continuous Integration regime. (Although refactoring at any level benefits from automated regression testing.)

If objectives 1 and 2 require tests to demonstrate ‘functional equivalence’, regression test coverage can be based on the need to exercise the underlying code and cover the system functionality. Potentially, tests based on equivalence partitioning ought to cover the branches in code (but not housekeeping or error-handling functionality – but see below). Tests covering edge conditions or boundary values should verify the ‘precision’ of those decisions. So a reasonable guideline could be – use automation to cover functional paths through the system and data-drive those tests to expand the coverage of boundary conditions. The automation does not necessarily have to execute tests that would be recognisable to the user, if the objective is to demonstrate functional equivalence.
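The guideline above can be sketched as a small data-driven check in Python. The discount rule and both implementations are hypothetical, invented for this example: discount_v1 stands for the trusted baseline and discount_v2 for a refactored version that ought to behave identically. One value is taken from each equivalence partition to exercise each branch, and values either side of each boundary check the ‘precision’ of the decisions.

```python
def discount_v1(quantity):
    """Trusted baseline implementation (hypothetical)."""
    if quantity < 10:
        return 0
    if quantity < 50:
        return 5
    return 10

def discount_v2(quantity):
    """Refactored implementation that should behave identically (hypothetical)."""
    bands = [(50, 10), (10, 5)]  # (lower bound, discount percent), highest band first
    for lower_bound, percent in bands:
        if quantity >= lower_bound:
            return percent
    return 0

# One representative value per equivalence partition: exercises each branch.
PARTITION_VALUES = [5, 30, 100]
# Values either side of each boundary: checks the precision of the decisions.
BOUNDARY_VALUES = [9, 10, 11, 49, 50, 51]

def equivalence_failures(old, new, inputs):
    """Return (input, old result, new result) for every input where the versions differ."""
    return [(x, old(x), new(x)) for x in inputs if old(x) != new(x)]

if __name__ == "__main__":
    differences = equivalence_failures(
        discount_v1, discount_v2, PARTITION_VALUES + BOUNDARY_VALUES)
    print("functionally equivalent on this data" if not differences else differences)
```

Note that none of these inputs need look like a test a user would recognise; extending coverage is a matter of adding rows of data, not writing new scripts.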

Objective 3 – to provide confidence to stakeholders – is slightly different. In this case, the purpose of a regression test is to demonstrate to end users that they can execute business transactions and use the system to support their business. In this respect, it may be that these tests could be automated and some automated tests that fall under categories 1 and 2 above will be helpful. But experience of testing GUI applications in particular suggests that end users sometimes only trust their own eyes and need to have a hands-on experience to give them the confidence that is required. Potentially, a set of automated tests might be used to drive a number of ‘end to end’ transactions, and reconciliation or control reports could be generated to be inspected by end users. There is a large spectrum of possibilities of course. In summary, automated tests could help, but in some environments, the need for manual tests as a ‘confidence builder’ cannot be avoided.

At what level(s) should we automate regression tests?

In Part III of this article series, we identified three levels at which regression testing might be implemented – at the component, system and business (or integrated system) levels. These levels should be considered as complementary and the choice is where to place emphasis, rather than which to include or exclude. The choice of automation at these levels is not really the point. Rather, a level of regression testing may be chosen primarily to achieve an objective, partly on the value of information generated and partly because of the ease with which the tests can be automated.

What are the technical considerations for automation?

At the most fundamental, technical level, there are four aspects of the system under test that must be considered: how the system under test is stimulated; how the test outcomes of interest (with respect to regression) will be detected and captured; how those outcomes will be compared with a baseline; and the architecture of the system itself.

Mechanisms for stimulating the system under test

This aspect reflects how a test is driven by either a user or an automated tool. Nowadays, the number of user and technical interfaces in use is large – and growing. The most common are listed below, with the kinds of tool that can drive each.

PC/Workstation-based applications and clients
  • Proprietary or open source GUI-object based drivers
  • Hardware (keyboard, video, mouse) based tools – physically connected to clients
  • Software based automation tools driving clients working across VNC connections
Browser/web-based applications
  • Proprietary object-based agents
  • Open source JavaScript-based agents
  • Open source script languages and GUI toolkits
Web-Server-based functionality (HTTP)
  • Proprietary or open source webserver/HTTP/S drivers
Web services
  • Proprietary or open source web services drivers
Mobile applications
  • Mobile OS simulators driven by integrated or separate GUI based toolkits
Embedded
  • Typically Java-based toolkits
Error, failure, spate, race conditions or other situations
  • May be simulated by instrumentation, load generation tools or manipulation of the test environment or infrastructure
Environments
  • Don’t forget that environmental conditions influence the behaviour of ALL systems under test.

There are an increasing number of proprietary and open source unit and acceptance testing frameworks available to manage and control the test execution engines above.
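As an illustration of the ‘error, failure … may be simulated by instrumentation’ entry in the list above, here is a minimal Python sketch that injects a failure using the standard library's unittest.mock. The PaymentGateway class and the take_payment function are hypothetical stand-ins; the technique shown is simply to replace a collaborator's method with one that raises the error of interest, so the error-handling path can be regression tested without a real outage.

```python
from unittest import mock

class PaymentGateway:
    """Hypothetical collaborator that would normally call an external service."""
    def charge(self, amount):
        raise NotImplementedError("real network call omitted in this sketch")

def take_payment(gateway, amount):
    """Code under test: it should degrade gracefully if the gateway times out."""
    try:
        gateway.charge(amount)
        return "paid"
    except TimeoutError:
        return "retry-later"

def simulate_timeout():
    """Stimulate the failure path by making charge() raise a timeout."""
    gateway = PaymentGateway()
    with mock.patch.object(gateway, "charge", side_effect=TimeoutError("simulated")):
        return take_payment(gateway, 25)

def simulate_success():
    """Stimulate the normal path with the external call stubbed out."""
    gateway = PaymentGateway()
    with mock.patch.object(gateway, "charge", return_value=None):
        return take_payment(gateway, 25)

if __name__ == "__main__":
    print("timeout  ->", simulate_timeout())
    print("success  ->", simulate_success())
```

Spate and race conditions usually need load generation or environment manipulation instead; instrumentation of this kind suits discrete failures best.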

Outcome/Output detection and capture

A regression can be detected in as many ways as any outcome (output, change of state etc.) of a system can be exposed and detected. Here’s a list of common outcome/output formats that we may have to deal with. This is not a definitive list.

Browser-rendered output
  • The state of any object in the Document Object Model (DOM) exposed by a GUI tool
Any screen-based output
  • Image recognition by hardware or software based agents
Transaction response times
  • Any automated tool with response time capture capability
Database changes
  • Appropriate SQL or database query tool
Message output and content
  • Raw packets captured by network sniffers
  • Message payloads captured and analysed by protocol-specific tools
Client or server system resources
  • CPU, i/o, memory, network traffic etc. detected by performance monitors
Application or other infrastructure – changes of state
  • (Database, enterprise messaging, object request brokers etc. etc.) – dedicated system/resource monitors or custom-built instrumentation etc.
Changes in accessibility or usability (adherence to standards etc.)
  • Web page HTML scanners, character-based screen or report scanners or screen image scanners
Security (server)
  • Port scanning and server-penetration tools
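To take one entry from the list above, database changes can be detected by capturing table contents before and after a transaction and examining the difference. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the schema, the place_order transaction and the helper names are all hypothetical. What matters for regression purposes is that both the intended change and any unintended side effect on other rows become visible.

```python
import sqlite3

def snapshot(conn, table):
    """Capture every row of a table, ordered so that snapshots compare reliably."""
    # The table name is interpolated; acceptable here only because it is a fixed literal.
    return conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()

def row_changes(before, after):
    """Rows found only in the earlier snapshot, and rows found only in the later one."""
    return sorted(set(before) - set(after)), sorted(set(after) - set(before))

def place_order(conn, customer, amount):
    """Hypothetical transaction under test."""
    conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount))
    conn.execute("UPDATE customers SET balance = balance + ? WHERE name = ?",
                 (amount, customer))

def run_check():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [("acme", 0), ("zenith", 0)])
    tables = ("customers", "orders")
    before = {table: snapshot(conn, table) for table in tables}
    place_order(conn, "acme", 120)
    after = {table: snapshot(conn, table) for table in tables}
    conn.close()
    return {table: row_changes(before[table], after[table]) for table in tables}

if __name__ == "__main__":
    for table, (removed, added) in run_check().items():
        print(table, "removed:", removed, "added:", added)
```

If a later version of place_order also touched the ‘zenith’ row, that row would appear in the differences even though no screen or report showed it.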

Comparison of Outcomes

A fundamental aspect of regression testing is comparison of actual outcomes (in whatever format from whatever source above) to expected outcomes. If we are running a test again, the comparison is between the new ‘actual’ output/outcome and previously captured ‘baseline’ output/outcome.
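At its simplest, a baseline comparison can be sketched with Python's standard difflib module. The invoice text is hypothetical and stands in for any captured output; the function name compare_to_baseline is likewise just an illustration. An empty result means no change was detected, and anything else is a change to be judged acceptable or not.

```python
import difflib

def compare_to_baseline(baseline, actual):
    """Return a unified diff as a list of lines; an empty list means no change detected."""
    return list(difflib.unified_diff(
        baseline.splitlines(), actual.splitlines(),
        fromfile="baseline", tofile="actual", lineterm=""))

# Hypothetical captured outputs from two runs of the same test.
BASELINE = "Invoice 42\nNet: 100.00\nVAT: 20.00\nTotal: 120.00"
ACTUAL = "Invoice 42\nNet: 100.00\nVAT: 17.50\nTotal: 117.50"

if __name__ == "__main__":
    differences = compare_to_baseline(BASELINE, ACTUAL)
    if differences:
        print("\n".join(differences))
    else:
        print("no change detected")
```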

Simple comparison of numbers, text, system states, images, mark-up language, database content, reports, message payloads and system resources is not enough. We need to have a capability in our automation to:

Filter content: we may not need to compare ‘everything’. Subsets of database records, screen/image regions, branches or leaves in marked up text, some objects and states but not others etc. may be selected for comparison (from both actual and baseline content).

Mask content: of the content we select, we may wish to mask out certain patterns of content such as image regions that do not contain field borders; textual report columns or rows that contain dates/times, page numbers, varying/unique record ids etc.; screen fields or objects of certain colours, sizes, that are hidden/visible; patterns of text that can be matched using regular expressions and so on.

Calculate from content: the value, significance or meaning of content may have to be calculated: perhaps the number of rows displayed on a screen is significant; the error message, number or status code displayed on a screen image, extracted by text recognition; the result of a formula in which the variables are extracted from an outputted report and so on.

Identify content meeting/exceeding a threshold: the significance of output is determined by its proximity to thresholds such as: CPU, memory or network bandwidth usage compared to pre-defined limits; the value of a purchase order exceeds some limit; the response time of a transaction exceeds a requirement and so on.
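The four capabilities above can each be sketched in a few lines of Python using regular expressions. The report text, its layout and all the function names are hypothetical, chosen only to give each capability something to work on; real outputs (images, database rows, message payloads) would need correspondingly different tooling.

```python
import re

# Hypothetical captured report from a test run.
REPORT = """\
Run date: 2010-06-21 09:15:02
Order 1001 total 250.00
Order 1002 total 75.50
Rows: 2
Response time: 1.8s
"""

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def filter_lines(text, prefix):
    """Filter: keep only the lines we actually want to compare."""
    return [line for line in text.splitlines() if line.startswith(prefix)]

def mask(text):
    """Mask: blank out content that is expected to differ between runs."""
    return TIMESTAMP.sub("<TIMESTAMP>", text)

def order_total(text):
    """Calculate: derive a significant value from the captured output."""
    return sum(float(amount) for amount in re.findall(r"total (\d+\.\d+)", text))

def response_time_exceeds(text, limit_seconds):
    """Threshold: flag a response time above a pre-defined limit."""
    seconds = float(re.search(r"Response time: ([\d.]+)s", text).group(1))
    return seconds > limit_seconds

if __name__ == "__main__":
    print(filter_lines(REPORT, "Order"))
    print(mask(REPORT).splitlines()[0])
    print(order_total(REPORT))
    print(response_time_exceeds(REPORT, 2.0))
```

In practice these steps are chained: filter and mask both the baseline and the actual output in the same way, then compare what remains, and apply calculated values and thresholds as separate checks.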

System Architecture

The architecture of a system may have a significant influence over the choice of regression approach and automation in particular. An example will illustrate. An increasingly common software model is the MVC or model-view-controller architecture. Simplistically (from Wikipedia):

“The model is used to manage information and notify observers when that information changes; the view renders the model into a form suitable for interaction, typically a user interface element; the controller receives input and initiates a response by making calls on model objects. MVC is often seen in web applications where the view is the HTML or XHTML generated by the app. The controller receives GET or POST input and decides what to do with it, handing over to domain objects (i.e. the model) that contain the business rules and know how to carry out specific tasks such as processing a new subscription.”

A change to a ‘read-only’ view may be completely cosmetic and have no impact on models or controllers. Why regression test other views, models or controllers? Why automate testing at all – a manual inspection may suffice.

If a controller changes, the user interaction may be affected in terms of data captured and/or presented but the request/response dialogue may allow complete control of the transaction and examination of the outcome. In many situations, automated control of requests to and from controllers (e.g. HTTP GETs and POSTs) is easier to achieve than automating tests through the GUI (i.e. a rendered web page).
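To show what driving a controller directly might look like, here is a self-contained Python sketch using only the standard library. The SubscriptionController, its /subscribe path and its responses are hypothetical, loosely echoing the ‘new subscription’ example in the quotation above; a tiny HTTP server stands in for the real application so the example can run anywhere. The test sends POST requests and examines the status and body, with no browser or rendered page involved.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SubscriptionController(BaseHTTPRequestHandler):
    """Hypothetical controller: POST /subscribe with an 'email' form field."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        email = form.get("email", [""])[0]
        status, body = (200, "subscribed") if "@" in email else (400, "invalid email")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        pass  # keep the test output quiet

# Ignore any proxy settings in the environment: the server is on the local machine.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def post(url, fields):
    """Send a form POST and return (status, body), including for 4xx responses."""
    request = urllib.request.Request(url, data=urllib.parse.urlencode(fields).encode())
    try:
        with OPENER.open(request) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as error:
        return error.code, error.read().decode()

def run_controller_checks():
    server = HTTPServer(("127.0.0.1", 0), SubscriptionController)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/subscribe"
    try:
        return post(url, {"email": "a@example.com"}), post(url, {"email": "nope"})
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(run_controller_checks())
```

Against a real system, only the post function and the checks would be needed; the responses captured this way can then be compared with a baseline like any other outcome.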

Note that cross-browser test automation, to verify the behaviour and appearance of a system’s web pages across different browser types, for example, cannot be handled this way. (Some functional automation may be possible, but some usability/accessibility tests will always be manual).

It is clear that the number and variety of ways in which a system can be stimulated, and in which potentially regressive outcomes can be observed, is huge. Few, if any, tools, proprietary or open source, have all the capabilities we need. The message is clear – don’t ever assume the only way to automate regression testing is to use a GUI-based test execution tool!

Regression test automation – summary

In summary, we strongly advise you to bear in mind the following considerations:

  1. What is the outcome of your impact analysis?
  2. What are the objectives of your anti-regression effort?
  3. How could regressions manifest themselves?
  4. How could those regressions be detected?
  5. How can the system under test be stimulated to exercise the modes of operation of concern?
  6. Where in the development and test process is it feasible to implement the regression testing and automation?
  7. What technology, tools, harnesses, custom utilities, skills, resources and environments do you need to implement the automated regression test regime?
  8. What will be your criteria for automating (new or existing, manual) tests?

Test Automation References

  1. Brian Marick, 1997, Classic Testing Mistakes,
    http://www.exampler.com/testing-com/writings/classic/checklist.html
  2. James Bach, 1999, Test Automation Snake Oil,
    http://www.satisfice.com/articles/test_automation_snake_oil.pdf
  3. Cem Kaner, James Bach, Bret Pettichord, 2002, Lessons Learned in Software Testing, John Wiley and Sons
  4. Dorothy Graham, Paul Gerrard, 1999, the CAST Report, Fourth Edition
  5. Paul Gerrard, 1998, Selecting and Implementing a CAST Tool,
    http://gerrardconsulting.com/?q=node/532
  6. Brian Marick, 1998, When Should a Test be Automated?
    http://www.stickyminds.com/sitewide.asp?Function=edetail&ObjectType=ART&ObjectId=2010
  7. Paul Gerrard, 1997, Testing GUI Applications,
    http://gerrardconsulting.com/?q=node/514
  8. Paul Gerrard, 2006, Automation below the GUI (blog posting),
    http://gerrardconsulting.com/index.php?q=node/555
  9. Scott Ambler, 2002-10, Introduction to Test-Driven Design,
    http://www.agiledata.org/essays/tdd.html
  10. Naresh Jain, 2007, Acceptance-Test Driven Development,
    http://www.slideshare.net/nashjain/acceptance-test-driven-development-350264

In the final article of this series, we’ll consider how an anti-regression approach can be formulated, implemented and managed and take a step back to summarise and recap the main messages of these articles.

Paul Gerrard
21 June 2010.