Tenets of Testing – A comparison with Test Axioms

Nicholas Snogren posted a reference on LinkedIn to an “Axioms of Testing” presentation from 2009 and asked me to comment on his “Tenets of Software Testing”. There are some similarities, though not many I think, and some parallels too, but his question prompted me to give a longer response than he probably expected. I said…

“Hi, thanks for asking for my opinion. Your tenets look interesting – and although I don’t think they map directly to what I’ve written, they raise points in my mind that need a little airing – my mind grows cobwebby over time, and it’s good to brush off old ideas. A bit like exercising muscles that haven’t been used for a while haha.”

I give my response as a comparison with my Tester’s Pocketbook, the Test Axioms website and my presentations. (I notice that some of these posts are around 12 years old and some links don’t work anymore. Some are out of my control; others I’ll have to track down and correct – let me know if you want that.)

Tenets v Axioms

Firstly, let’s get our definitions right.

According to dictionary.com, a Tenet is “any opinion, principle, doctrine, dogma, etc., especially one held as true by members of a profession, group, or movement.” Tenets might be regarded as beliefs that don’t require proof and don’t provide a strong foundation.

From the same source, an Axiom is, “i) a self-evident truth that requires no proof. ii) a universally accepted principle or rule. iii) Logic, Mathematics. a proposition that is assumed without proof for the sake of studying the consequences that follow from it”

I favoured the use of Axioms as a foundation for thinking in the testing domain. Axioms, if they are defensible, would provide a stronger foundational set of ideas. When I proposed a set of Testing Axioms, there was some resistance – here’s a Prezi talk that introduces the idea.

James Bach in particular challenged the idea of Axioms and suggested I was creating a school of testing (when schools of testing were getting some attention) here.

By and large, because the Axioms are defined in context-neutral terms, challenges have tended to be easy to acknowledge, disarm and set aside. Critics, almost entirely from the context-driven school, jumped the gun, so to speak – they clearly hadn’t read what I had written before critiquing it. Only one or two people responded to James’ call to arms to criticise the Axioms and challenged them.

The Axioms are fully described in The Tester’s Pocketbook – http://testers-pocketbook.com/.

The Axioms of Testing website – https://testaxioms.com/ – sets out the Axioms with some explanation and provides around 50% of the pocketbook content for free.

Axioms caught the attention (and criticism) of people because I pitched them as universal principles or laws of testing. Tenets, being less strident in their definition, might not attract attention (or criticism) in the same way.

Immediate Comments on the Tenets

The Tenets are numbered and italicised. My comments in plain text.

  1. A software product’s behavior is exhibited by interactions.
  2. There is potentially an infinite number of possible behaviors in software.

These are properties of software. I’m not sure what 1 says other than that behavior is triggered by interactions and presumably observed through interactions. However, software behaving autonomously might respond to internal events, such as the passing of time, and might not exhibit any behaviour through interactions – a change of internal state, for example. I’m not sure 1 says much.

Tenet 2 is reasonable for reasonably sized software artefacts.

In the Tester’s Pocketbook, I hardly use the term software. I prefer that we test systems. Software is usually an important part of any system. Humans do not interact with software (except by reading or writing it). Software exists in the context of program compilation, hosted on operating systems, running on devices which have peripherals and other interconnected systems which may or may not have user interfaces.

Basing Axioms on Systems means that the Axioms are open to interpretation as Axioms of testing ANY system (i.e. anything – I don’t press that idea, but it’s an attractive one). Another ‘benefit’ is that all of the Systems Thinking principles can also be brought to bear on our arguments. Outside its context, Software is not a System.

3. Some of those behaviors are potentially negative, that is, would detract from the objectives of the software company or users.

I use the term Stakeholders to refer to parties interested in the valuable, reliable behavior of systems and the outcome and value of testing those systems.

4. The potentiality for that negative behavior is risk.

OK, but it could be better worded. I would simply say ‘potential modes of failure’ rather than negative behaviour.

5. It’s impossible to guarantee a lack of risk as it’s impossible to experience an infinite number of behaviors.

Not really. You can guarantee a no-risk situation if no one cares or no one cares enough to voice their concerns before testing (or after testing). There is always the potential for failure because systems are complex and we are not skilled enough to create perfect systems.

6. Therefore a subset of behaviors must be sampled to represent the risk.

Rather than represent, I would say trigger the failure(s) of concern to explore the risk and better inform a risk-assessment.

7. The ability to take an accurate sample, representative of the true risk, is a testing skill.

Not sure what you mean by sample – tests or test cases, I presume? Representative is a subjective notion, surely; ‘true’ I don’t understand; and a testing skill would need more definition than this, wouldn’t it?

8. A code change to an existing product may also affect the product in an infinite number of ways.

I’d use ‘ANY’ change, to a ‘SYSTEM’. Why ‘also’? What would you say fits into a ‘not only… but also…’ clause? But I’m not sure I agree with this assertion anyway. A code change changes some software artefact. The infinite effects (faulty behaviors?) derive from infinite tests (or uses in production) – which you say in 5 are impossible to achieve. I’m not sure what you’re trying to say here.

9. It is possible to infer that some behaviors are more likely to be affected by that change than others.

You can infer anything you like by calling upon the great Unicorn in the sky. How will you do this? Either you use tools which are limited in capability or you might use change and defect history or you might guess based on partial knowledge and experience.

10. The risk -of that change- is higher within the set of behaviors that are more likely to be affected by that change.

Do you mean the probability of failure or the consequence of failure? I assume probability. At any rate, this is redundant – you have already asserted it in 9. But it’s also more complicated than this: a cosmetic defect in an app can be catastrophic and a system failure negligible at times.

11. The ability to accurately estimate a scope of affected behavior is another testing skill.

I would call this the skill of impact analysis rather than testing. Developers are relatively poor at this, even though they have far deeper technical knowledge (either they aren’t able, or lack the time, to impact-analyse to any reliable degree). So we rely on testing to catch regressions, which is less than ideal. Testers depend on their experience rather than knowledge of system internals. But, since buggy systems behave in essentially unpredictable ways, we must admit our experience is limited and fallible. It’s not a ‘skill’ that I would dwell on.

12. The scope and sampling ideas alone are meaningless without empirical evidence.

The scope and sampling ideas have meaning regardless of whether you implement them. I suppose you might say they are useless ideas if you don’t gather evidence.

13. Empirical evidence is gathered through interactions with the product, observation of resultant behavior, and assessment of those observations.

The word empirical is redundant. I would use the word ‘some’ here. We also get evidence from operation in production, for example. (Unless you include that already?)

14. The accuracy and speed of scope estimation, behavior sampling, and gathering of evidence are key performance indicators for the tester.

If you are implying that the activities in 13 are tester skills, I suppose you could make this assertion. But you haven’t said what the value of evidence is yet. Is the purpose of testing only to evaluate the performance of testers? Hope not ;O)

15. Heuristics for the gathering of such evidence, the estimation of scope, and the sampling of behavior are defined in the Heuristic Test Strategy Model.

Heuristics are available in a wide range of sources including software, systems and engineering standards. Why choose such a limited source?

Inspiration for the Tenets

These tenets were inspired by James Bach’s “Risk Gap” and Doug Hubbard’s book “How to Measure Anything.” Both Bach and Hubbard discuss a very similar idea from different spaces. Hubbard suggests that by defining our uncertainty, we can communicate the value of reducing the uncertainty. Bach describes the “knowledge we need to know” as the “Risk Gap.” This Risk Gap is our uncertainty, and in defining it, we can compute the value of closing it. In testing, I realized we have three primary areas of uncertainty: 1) what is the “risk gap,” or knowledge we need to find out, 2) how can we know when we’ve acquired enough of that unknown knowledge, and 3) how can we design interactions with the program to efficiently reveal this knowledge.

There are several interesting anomalies to unpick here:

  • I recall James telling a story about Tom Gilb asserting anything could be measured. James suggested Love and Tom obliged. I don’t think James was impressed.
  • ‘Defining uncertainty’ – how do you do that reliably? Numerically? Objectively? We can put any numbers we like against probability and consequence. Being certain, with or without evidence, is always subjective. People can say they are more certain, but based on … what? How do we correlate data with a human emotion and use that to make engineering decisions? People can be easily deceived – by themselves, not just by others. Consider this, for example, and this.
  • Risk Gap – how is a quantity of knowledge measured? What units? With what certainty? These are aspects that James has argued against since the early 1990s.
  • Your three challenges 1), 2) and 3) are reasonable as goals. How do Bach and Hubbard argue you achieve them, if not by calling on the subjective opinions of other people?

Some More General Comments

You seem to be trying to ‘make a case’ for testing as a tool to address the risk of failure in systems. I (like and) use that same approach in a rounder sense in my conference talks and writings, when practicable. My observations on this are:

  1. The logic doesn’t flow as it should because of flaws in the individual statements
  2. You have no pre-definition of test, testing or its purpose at the outset, so it’s not clear what your destination is
  3. There’s no defined goal of testing, other than to gather evidence to (my words) reassess risks and thereby reduce uncertainty (but you don’t say why that’s a ‘good thing’)
  4. Testing enables a reassessment of risk, but that reassessment may increase risk if, for example, you find a bug in something that was previously deemed reliable. (there’s a bigger conversation to be had, but risk is not a BAD thing, it’s the barrier(s) you need to navigate or break through to gain your REWARD).
  5. Extant, significant risks are a BARRIER to accepting or releasing systems. As such, the goal of testing is to provide evidence to the people who make the decision that those risks are acceptable or negligible (reducing uncertainty, sure, but never eliminating it). But the ultimate goal of testing is to show that the system WORKS. Encountering failures is a detour from that goal. The testing goal is broader than exploring risks.
  6. You don’t mention stakeholders at all. Why do we test? To provide evidence to testing stakeholders – our customers – so they can make better-informed decisions.

Summary

I don’t want to give the impression that I’m criticising Nicholas or am arguing against the concept of Tenets or Principles or Axioms of testing. All I have tried to do is offer reasonable criticism of the Tenets to show that a) it is extremely difficult to postulate bullet-proof Tenets, Principles or Axioms and b) it is extremely easy to criticise such efforts by:

  • Exposing flaws in the language used and the logic in an argument that C follows B follows A etc.
  • Identifying implicit assumptions of meaning, scope, dependency and authority and
  • Offering examples of context that contradict, or expose flaws in, the statements made.

I do this because I have been there many times since 2008 and occasionally have to defend the Test Axioms from such criticisms. I have to say, Critical Thinking is STILL a rare skill – I wish criticism were more often proffered as a result of it.

References

  1. The Tester’s Pocketbook – http://testers-pocketbook.com/
  2. Axioms of Testing website – https://testaxioms.com/

Test Management is Dead, Long Live Test Management

Do you remember the ‘Testing is Dead’ meme that kicked off in 2011 or so? It was triggered by a presentation done by Alberto Savoia here. It caused quite a stir, some copycat presentations and a lot of counter-arguments. But I always felt most people missed the point being made. You just had to strip out the dramatics and Doors music.

The real message was that for some organisations, the old ways wouldn’t work any more, and as time has passed, that prediction has come true. With the advent of Digital, mobile, IoT, analytics, machine learning and artificial intelligence, some organisations are changing the way they develop software, and as a consequence, testing changes too.

As testing shifts left, with testers working more collaboratively with the business and developers, test teams are being disbanded and/or distributed across teams. With no test team to manage, the role of the test manager is affected. Or eliminated.

Test management thrives; test managers come and go.

It is helpful to think of testing as less of a role and more of an activity that people undertake in their projects or organisations. Everyone tests, but some people specialise and make a career of it. In the same way, test management is an activity associated with testing. Whether you are the tester in a team or running all the testing in a 10,000 man-year programme, you have test management activities.

For better or for worse, many companies have decided that the role of test managers is no longer required. Responsibility for testing in a larger project or programme is distributed to smaller, Agile teams. There might be only one tester in the team. The developers in the team take more responsibility for testing and run their own unit tests. There’s no need for a test manager as such – there is no test team. But many of the activities of test management still need to be done. It might be as mundane as keeping good records of tests planned and/or executed. It could be taking the overall project view on test coverage (of developer v tester v user acceptance testing for example).

There might not be a dedicated test manager, but some critical test management activities need to be performed. Perhaps the team jointly fulfil the role of a virtual test manager!

Historically, the testing certification schemes have focused attention on the processes you need to follow—usually in structured or waterfall projects. There’s a lot of attention given to formality and documentation as a result (and the test management schemes follow the same pattern). The processes you follow, the test techniques you use, the content and structure of reporting vary wherever you work. I call these things logistics.

Logistics are important, but vary in every situation.

In my thinking about testing, as far as possible, I try to be context-neutral. (Except my stories, which are grounded in real experience).

As a consultant to projects and companies, I never knew what situation would underpin my next assignment. Every organisation, project, business domain, company culture, and technology stack is different. As a consequence, I avoided having fixed views on how things should be done, but over twenty-five years of strategy consulting, test management and testing, certain patterns and some guiding principles emerged. I have written about these before[1].

To the point.

Simon Knight at Gurock asked me to create a series of articles on Test Management, but with a difference. Essentially, the fourteen articles describe what I call “Logistics-Free Test Management”. To some people that’s an oxymoron. But that is only because, in many places, we have become accustomed to treating test management as logistics management. Logistics aren’t unique to testing.

Logistics are important, but they don’t define test management.

I believe we need to think about testing as a discipline where logistics choices are made in parallel with the testing thinking. Test Management follows the same pattern. Logistics are important, but they aren’t testing. Test management aims to support the choices, sources of knowledge, test thinking and decision making separately from the practicalities – the logistics – of documentation, test process, environments and technologies used.

I derived the idea of a New Model for Testing – a way of visualising the thought processes of testers – in 2014 or so. Since then, I have presented it to thousands of testers and developers and I get very few objections. Honestly!

However, some people do say, with commitment, “that’s not new!”. And indeed it isn’t.

If the New Model reflects how you think, then it should be a comfortable fit. It is definitely not new to you!

One of the first talks I gave on the New Model is here. (Jump to 43m 50s to skip the value-of-testing talk and the long introduction.)

The New Model for Testing

Now, I might get a book out of the material (hard-copy and/or ebook formats), but more importantly, I’m looking to create an online and classroom course to share my thinking and guidance on test management.

Rather than offer you specific behaviours and templates to apply, I will try to describe the goals, motivations, thought processes, the sources of knowledge and the principles of application and use stories from my own experience to illustrate them. There will also be suggestions for further study and things to think about as exercises or homework.

You will need to adjust these lessons to your specific situation. It requires that you think for yourself – and that is no bad thing.

Here’s the deal in a nutshell: I’ll give you some interesting questions to ask. You need to get the answers from your own customers, suppliers and colleagues and decide what to do next.

I’ll be exploring these ideas in my session at the next Assurance Leadership Forum on 25 July. See the programme here and book a place.

In the meantime, if you want to know more, leave a comment or do get in touch at my usual email address.

[1] The Axioms of Testing in the Tester’s Pocketbook for example, https://testaxioms.com

Testing the Code that Checks Code

Twitter is a lousy medium for debate, don’t you think?

I had a very brief exchange with Michael Bolton below.  (Others have commented on this thread this afternoon). To my apparently contradictory (and possibly stupid) comment, Michael responded with a presumably perplexed “?”

This blog is a partial explanation of what I said, and why I said it. You might call it an exercise in pedantry. (Without pedantry, there is less joy in the world – discuss). There’s probably a longer debate to be had, but certainly not on Twitter. Copenhagen perhaps, Michael? My response was to 3) Lesson.

3) Lesson: don’t blindly trust … your automated checks, lest they fail to reveal important problems in production code.

I took the lesson tweet out of context and ignored the first two tweets deliberately; I’ll comment on those below. For the third, I ignored the “don’t blindly trust your test code” aspect too, and here’s why. If you have test code that operates at all, and you have automated checks that operate, you presumably trust the test code already. You will have already done whatever testing of the test code you deemed appropriate. I was more concerned with the second aspect. Don’t blindly trust the checks.

But you know what? My goal with automation is exactly that – to blindly trust automated checks.

If you have an automated check that runs at all, then given the same operating environment, test data, software versions, configuration and so on, you would hardly expect the repeated check to reveal anything new. If it did ‘fail’, then it really ought to flag some kind of alarm. If you are not paying attention or are ignoring the alarm, then on your own head be it. But if I have to be paying attention all the time, effectively babysitting – then my automation is failing. It is failing to replace my manual labour (often the justification for automating in the first place).

A single check is most likely to be run as part of a larger collection of tests, perhaps thousands, so the notification process needs to be integrated with some form of automated interpretation or at least triggered when some pre-defined threshold is exceeded.
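
To make that concrete, here is a minimal sketch of what that ‘automated interpretation’ might look like: run everything, collect failures and only raise the alarm when a threshold is breached. It is Python, and the check functions, the 2% threshold and the send_alert helper are all illustrative assumptions, not any particular tool’s API.

```python
# Minimal sketch: threshold-triggered alerting over a batch of checks.
# The checks, the 2% threshold and send_alert() are illustrative
# assumptions, not part of any particular framework.

def run_suite(checks):
    """Run every check function; collect (name, reason) for each failure."""
    failures = []
    for check in checks:
        try:
            check()
        except AssertionError as e:
            failures.append((check.__name__, str(e)))
    return failures

def notify_if_needed(failures, total, threshold=0.02):
    """Only raise the alarm when the failure rate exceeds the threshold."""
    if total and len(failures) / total > threshold:
        send_alert(f"{len(failures)} of {total} checks failed", failures)

def send_alert(summary, details):
    # Placeholder: wire this up to email, chat or a pager as appropriate.
    print("ALERT:", summary)
    for name, reason in details:
        print(" -", name, ":", reason)
```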

Why blindly? Well, we humans are cursed by our own shortcomings. We have short attention spans, are blind to things we see but aren’t paying attention to, and of course we are limited in what we can observe and assimilate anyway. We use tools to replace humans not least because of our poor ability to pay attention.

So I want my automation to act as if I’m not there and to raise alarms in ways that do not require me to be watching at the time. I want my phone to buzz, or my email client to bong, or my chatOps terminal to beep at me. Better still, I want the automation to choose who to notify. I want to be the CC: or BCC: in the message, not necessarily the To: all the time.
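
As a sketch of that kind of routing – with an invented routing table, made-up addresses and a placeholder send_mail helper, so purely illustrative – the automation might derive the To:, CC: and BCC: lists from the area that failed:

```python
# Minimal sketch: the automation chooses who is told directly and who
# is merely copied in. The routing table, addresses and send_mail()
# are all hypothetical.

ROUTES = {
    "payments": {"to": ["payments-team@example.com"], "cc": ["tester@example.com"]},
    "login":    {"to": ["identity-team@example.com"], "cc": ["tester@example.com"]},
}
DEFAULT_ROUTE = {"to": ["dev-leads@example.com"], "bcc": ["tester@example.com"]}

def route_alert(area, summary):
    """Pick recipients for a failure in the given functional area."""
    route = ROUTES.get(area, DEFAULT_ROUTE)
    send_mail(
        to=route.get("to", []),
        cc=route.get("cc", []),
        bcc=route.get("bcc", []),
        subject=f"[check failure] {area}: {summary}",
    )

def send_mail(to, cc, bcc, subject):
    # Placeholder: substitute your mail or chatOps client here.
    print(subject, "->", "to:", to, "cc:", cc, "bcc:", bcc)
```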

I deliberately took an interpretation of Michael’s comment that he probably didn’t intend. (That’s Twitter for you).

When my automated checks run, I don’t expect to have to evaluate whether the test code is doing ‘the right thing’ every time. But over time, things do change – the environment, configuration and software under test – so I need to pay attention to whether these changes impact my test code. Potentially, the check needs adjustment, re-ordering, enhancing, replacing or removing altogether. The time to do this is before the test code is run – during requirements discussions or in collaboration with developers.

I believe this was what Michael intended to highlight: your test code needs to evolve with the system under test and you must pay attention to that.

Now, my response to the tweet suggests that, rather than babysitting your automated checks, you should spend your time more wisely – testing the system in ways your test code cannot (economically).

To the other tweets:

1) What would you discover if you submitted your check code to the same review, scrutiny, and testing as your production code?

2) If you’re not scrutinizing test code, why do you trust it any more than your production code? Especially when no problems are reported?

Test code can be trivial, but can sometimes be more complex than the system under test. It’s the old, old story: “who tests the test code and how?”. I have worked on a few projects where test code was treated like any other code – high-integrity projects and the like. But even then I didn’t see much ‘test the test code’ activity. I’d say there are some common factors that make it less likely you would test your test code, and feel safe (enough) not doing so.

  1. Test code is built incrementally, usually, so that it is ‘tried’ in isolation. Your test code might simulate a web or mobile transaction, for example. If you can watch it move to fields, enter data and check the outcomes correctly, most testers would be satisfied that it works as a simple check. What other test is required than re-running it, expecting the same outcome each time?
  2. Where the check is data-driven, of course, the code uses prepared data to fill, click or check parameterised fields, buttons and outcomes respectively (see the sketch after this list). On a GUI app this can be visibly checked. Should you try invalid data (not included in your planned test data) and so on? Why bother? If the test code fails, then that is notification enough that you screwed up – fix it. If the test code flags false negatives when, for example, your environment changes, then you have a choice: tidy up your environment, or add code to accommodate acceptable environmental variations.
  3. Now, when your test code loses synchronisation or encounters a real mismatch of outcomes, your code needs handlers for these situations. These handlers might be custom-built for every check (an expensive solution) or utilise system-wide procedures to log, recover, re-start or hand-off depending on the nature of the tests or failures. This ought to be where your framework or scaffolding code comes in.
  4. Surely the test code needs testing more than just ‘using it’? The thing is, your test code is not handed over to users for them to enter extreme, dubious or poor quality data. All the data it will ever handle is in the test suite you use to test the system under test. Another tester might add new rows of test data to feed it, but problems that arise are as likely to be due to other things as to the new test data. At any rate, what tests would you apply to your test code? Your test data, selected to exercise extremes in your system under test, is probably quite well suited to testing the test code anyway.
  5. When problems do arise as your test code runs, they are more likely to be caused by environmental/data problems or software changes, so your test code will be adapted in parallel with these changes, or made more resilient to variations (bearing in mind the original purpose of the test code).
  6. Your scaffolding code or home-grown test framework handles this, doesn’t it? Pretty much the same arguments as above apply. They are likely to be made more robust through use, evolution and adaptation than by a lot of planned tests.
  7. Who tests the tests? Who tests the tests of the tests? Who tests the tests of …
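
To illustrate items 2 and 3 above, here is a minimal, hypothetical sketch of a data-driven check with a shared mismatch handler. The page object, its methods and the data rows are all assumptions for illustration, not any real framework’s API.

```python
# Minimal sketch: a data-driven check feeding prepared rows into a
# (hypothetical) page object, with a system-wide mismatch handler that
# logs and carries on rather than aborting the run.

TEST_DATA = [
    # (username, password, expected_outcome)
    ("alice", "correct-horse", "welcome"),
    ("alice", "wrong-password", "error"),
]

def run_login_checks(page):
    """Drive the UI with each data row and compare outcomes."""
    for username, password, expected in TEST_DATA:
        page.enter("username", username)
        page.enter("password", password)
        page.click("submit")
        actual = page.read_outcome()
        if actual != expected:
            handle_mismatch(username, expected, actual)

def handle_mismatch(case_id, expected, actual):
    # Shared handler: log the evidence, then continue with the next row.
    print(f"MISMATCH [{case_id}]: expected {expected!r}, got {actual!r}")
```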

Your scaffolding, framework, handlers, analysis, notification and messaging capabilities need a little more attention. In fact, your test tools might need to be certified in some way (like standard C, C++ or Java compilers, for example) to be acceptable for use at all.

I’m not suggesting that under all circumstances your test code doesn’t need testing. But it seems to me that, in most situations, the code that actually performs a check might be tested well enough by your test data, and that most exceptions arise as you develop and refine that code and can be fixed before you come to rely on it.