Testing the Code that Checks Code

Twitter is a lousy medium for debate, don’t you think?

I had a very brief exchange with Michael Bolton below.  (Others have commented on this thread this afternoon). To my apparently contradictory (and possibly stupid) comment, Michael responded with a presumably perplexed “?”

This blog is a partial explanation of what I said, and why I said it. You might call it an exercise in pedantry. (Without pedantry, there is less joy in the world – discuss). There’s probably a longer debate to be had, but certainly not on Twitter. Copenhagen perhaps, Michael? My response was to 3) Lesson.

3) Lesson: don’t blindly trust … your automated checks, lest they fail to reveal important problems in production code.

I took the lesson tweet out of context and ignored the first two tweets deliberately, and I’ll comment on those below. For the third, I also ignored the “don’t blindly trust your test code” aspect too and here’s why. If you have test code that operates at all, and you have automated checks that operate, you presumably trust the test code already. You will have already done whatever testing of the test code you deemed appropriate. I was more concerned with the second aspect. Don’t blindly trust the checks.

But you know what? My goal with automation is exactly that – to blindly trust automated checks.

If you have an automated check that runs at all, then given the same operating environment, test data, software versions, configuration and so on, you would hardly expect the repeated check to reveal anything new, unless you’d want to research its intricacies manually and embark on trying to find newer resolutions, through debate and comparisons; for example, the grafana vs kibana comparison. If it did ‘fail’, then it really ought to flag some kind of alarm. If you are not paying attention or are ignoring the alarm, then on your own head be it. But if I have to be paying attention all the time, effectively babysitting – then my automation is failing. It is failing to replace my manual labour (often the justification for automating in the first place).

A single check is most likely to be run as part of a larger collection of tests, perhaps thousands, so the notification process needs to be integrated with some form of automated interpretation or at least triggered when some pre-defined threshold is exceeded.

Why blindly? Well, we humans are cursed by our own shortcomings. We have low attention-spans, are blind to things we see but aren’t paying attention to and of course, we are limited in what we can observe and assimilate anyway. We use tools to replace humans not least because of our poor ability to pay attention.

So I want my automation to act as if I’m not there and to raise alarms in ways that do not require me to be watching at the time. I want my phone to buzz, or my email client to bong, or my chatOps terminal to beep at me. Better still, I want the automation to choose who to notify. I want to be the CC: or BCC: in the message, not necessarily the To: all the time.

I deliberately took an interpretation of Michael’s comment that he probably didn’t intend. (That’s Twitter for you).

When my automated checks run, I don’t expect to have to evaluate whether the test code is doing ‘the right thing’ every time. But over time, things do change – the environment, configuration and software under test – so I need to pay attention to whether these changes impact my test code. Potentially, the check needs adjustment, re-ordering, enhancing, replacing or removing altogether. The time to do this is before the test code is run – during requirements discussions or in collaboration with developers.

I believe this was what Michael intended to highlight: your test code needs to evolve with the system under test and you must pay attention to that.

Now, my response to the tweet suggests rather than babysit your automated checks, you should spend your time more wisely – testing the system in ways your test code cannot (economically).

To the other tweets:

1) What would you discover if you submitted your check code to the same review, scrutiny, and testing as your production code?

2) If you’re not scrutinizing test code, why do you trust it any more than your production code? Especially when no problems are reported?

Test code can be trivial, but can sometimes be more complex than the system under test. It’s the old old story, “who tests the test code and how?”. I have worked on a a few projects where test code was treated like any other code. High integrity projects and the like. but even then I didn’t see much ‘test the test code’ activity. I’d say there are some common factors that make it less likely you would test your test code, and feel safe (enough) not doing so.

  1. Test code is built incrementally, usually, so that it is ‘tried’ in isolation. Your test code might simulate a web or mobile transaction, for example? if you can watch it move to fields, enter data, check the outcomes you see correctly, most testers would be satisfied it works as a simple check. What other test is required than re-running it, expecting the same outcome each time?
  2. Where the check is data-driven, of course, the code uses prepared data to fill, click or check parameterised fields, buttons and outcomes respectively. On a GUI app this can be visibly checked. Should you try invalid data (not included in your planned test data) and so on? Why bother? If the test code fails, then that is notification enough that you screwed up – fix it. If the test code flags false negatives when for example your environment changes, then you have a choice: tidy up your environment, or add code to accommodate acceptable environmental variations.
  3. Now, when your test code loses synchronisation or encounters a real mismatch of outcomes, your code needs handlers for these situations. These handlers might be custom-built for every check (an expensive solution) or utilise system-wide procedures to log, recover, re-start or hand-off depending on the nature of the tests or failures. This ought to be where your framework or scaffolding code comes in.
  4. Surely the test code needs testing more than just ‘using it’? The thing is, your test code is not handed over to users for them to enter extreme, dubious or poor quality data. All the data it will ever handle is in the test suite you use to test the system under test. Another tester might add new rows of test data to feed it, but problems arising are as likely to be due to other things than new test data. At any rate, what tests would you apply to your test code? Your test data, selected to exercise extremes in your system under test is probably quite well suited to testing the test code anyway.
  5. When problems do arise when your test code is run, it is more likely to be caused by environmental/data problems or software changes, so your test code will be adapted in parallel with these changes, or made more resilient to variations (bearing in mind the original purpose of the test code).
  6. Your scaffolding code or home-grown test frameworks handles this doesn’t it? Pretty much the same arguments above apply. They are likely to be made more robust through use, evolution and adaptation than a lot of planned tests.
  7. Who tests the tests? Who tests the tests of the tests? Who tests the tests of …

Your scaffolding, framework, handlers, analysis, notifications, messaging capability need a little more attention. In fact, your test tools might need to be certified in some way (like standard C, C++, Java compilers for example) to be acceptable to use at all.

I’m not suggesting that under all circumstances, your test code doesn’t need testing. But it seems to me, that in most situations, the code that actually performs a check might be tested by your test data well enough and that most exceptions arise as you develop and refine that code and can be fixed before you come to rely on it.

Regression Testing – What to Automate and How

This was one of two presentations at the fourth Test Management Summit in London on 27 January 2010.

I don’t normally preach about test auotmation as, quite frankly, I find the subject boring. The last time I talked about automation was maybe ten years ago. This was the most popular topic in the session popularity survey we ran a few days before the event and it was very well attended.

The powerpoint slides were written on the morning of the event while other sessions weere taking place. It would seem that a lot of deep-seated frustrations with regression testing and automation came to the fore. The session itself became something of a personal rant and caused quite a stir.

The slides have been amended a little to include some of the ad-hoc sentiments I expressed in the session and also to clarify some of the messages that came ‘from the heart’.

I hope you find it interesting and/or useful.

Regression Testing – What to Automate and How

Tools for Testing Web Based Systems

The extraordinary growth in the Internet is sweeping through industry. Small companies can now compete for attention in the global shopping mall – the World Wide Web. Large corporations see ‘The Web’ not only as an inexpensive way to make company information on products and services available to anyone with a PC and browser, but increasingly as a means of doing on-line business with world wide markets. Companies are using the new paradigm in four ways:

  • Web sites – to publicise services, products, culture and achievements.
  • Internet products – on-line services and information to a global market on the Web.
  • Intranet products – on-line services and information for internal employees.
  • Extranet products – on-line services and information enabling geographically distributed organisations to collaborate.

Web-based systems can be considered to be a particular type of client/server architecture. However, the way that these systems are assembled and used means some specialist tools are required and since such tools are becoming available we will give them some consideration here.

The risks that are particular to Web based applications are especially severe where the system may be accessed by thousands or tens of thousands of customers. The very high visibility of some systems being built mean that failure in such systems could be catastrophic. Web pages usually comprise text base files in Hypertext Markup Language (HTML) and now contain executable content so that the traditional separation of ‘code and data’ is no longer possible or appropriate. Browsers, plug-ins, active objects and Java are also new concepts which are still immature.

There are four main categories of tools which support the testing of Web applications:

Application test running

Test running tools that can capture tests of user interactions with Web applications and replay them now exist. These tools are either enhancements to existing GUI test running tools or are new tools created specifically to drive applications accessed through Web browsers. The requirements for these tools are very similar to normal GUI test running tools, but there are some important considerations.

Firstly, the Web testing tool will be designed to integrate with specific browsers. Ask your vendor whether the tool supports the browsers your Web application is designed for, but also check whether old versions of these browsers are also supported. To test simple text-oriented HTML pages, the Web testing tool must be able to execute the normal web page navigation commands, recognise HTML objects such as tables, forms, frames, links to other pages and content consisting of text, images, video clips etc. HTML text pages are often supplemented by server-side programs typically in the form of Common Gateway Interface (CGI) Scripts which perform more substantial processing. These should be transparent to the test tool.

Increasingly, web applications will consist of simple text-based web pages, as before, but these will be complemented with ‘active content’ components. These components are likely to be Java applets, Netscape ‘plug-ins’, ActiveX controls. Tools are only just emerging which are capable of dealing with these components. Given the portable nature of the Java development language, tools written in Java may actually be completely cable of dealing with any legitimate Java object in your Web application, so may be an obvious choice. However, if other non-Java components are present in your application, a ‘pure-Java’ tool may prove inadequate. Another consideration is how tools cope with dynamically generated HTML pages – some tools cannot.

HTML source, link, file and image checkers

Tools have existed for some time which perform ‘static tests’ of Web pages and content. These tools open a Web page (a site Home page, for example) to verify the syntax of the HTML source and check that all the content, such as images, sounds and video clips can be accessed and played/displayed. Links to other pages on the site can be traversed, one by one. For each linked page, the content is verified until the tool runs out of unvisited links. These tools are usually configured to stop the search once they encounter a link to an off-server page or another site, but they can effectively verify every page and that every home-site based link and content is present. Most of these tools provide graphical reports on the structure of Web sites which highlight the individual pages, internal and external links, missing links and other missing content.

Component test-drivers

Advanced Web applications are likely to utilise active components which are not directly accessible, using a browser-oriented Web testing tool. Currently, developers have to write customised component drivers, for example using the main{} method in Java classes to exercise the methods in a the class without having to use other methods in other classes.

As web applications become more sophisticated, the demand for specialised component drivers to test low level functionality and integration of components will increase. Such tools may be delivered as part of a development toolkit, but unfortunately, development tool vendors are more often interested in providing the ‘coolest’ development, rather than testing tools.

Internet performance testing

Web applications are most easily viewed as a particular implementation of client/server. The performance testing tools for the web that are available are all enhanced versions of established client/server based tools. We can consider the requirements for load generation and client application response time measurement separately.

Load generation tools rely on a master or control program running on a server which drive physical workstations using a test running tool to drive the application, or test drivers which submit Web traffic across the network to the Web servers. In the first case, all that is new is that it is the Web-oriented test running tool which drives the client application through a browser. For larger tests, test drivers capable of generating Web traffic across the network are used. Here, the test scripts dispatch remote procedure calls to the web servers. Rather than a SQL-oriented database protocol such as ODBC, the test script will contain HTTP calls which use the HTTP protocol instead. All that has changed is the test driver programs.

Client application response time measurement is done using the Web test running tool. This may be a standalone tool running on a client workstation or the client application driver, controlled by the load generation tool.