Freedom, Responsibility and the Tester’s Choice

Peter Farrell-Vinay posted the question “Does exploratory testing mean we’ve stopped caring about test coverage?” on LinkedIn here: http://www.linkedin.com/groupItem?view=&gid=690977&type=member&item=88040261&qid=75dd65c0-9736-4ac5-9338-eb38766e4c46&trk=group_most_recent_rich-0-b-ttl&goback=.gde_690977_member_88040261.gmr_690977

I’ve replied on that forum, but I wanted to restructure some of the various thoughts expressed there to make a different case.

Do exploratory testers care about coverage? If they don’t think and care about coverage, they absolutely should.

All test design is based on models

I’ve said this before: http://testaxioms.com/?q=node/11

Testing is a process in which we create mental models of the environment, the system, human nature, and the tests themselves. Test design is the process by which we select, from the infinite number possible, the tests that we believe will be most valuable to us and our stakeholders. Our test model helps us to select tests in a systematic way. Test models are fundamental to testing – however performed. A test model might be a checklist or set of criteria; it could be a diagram derived from a design document or an analysis of narrative text. Many test models are never committed to paper – they can be mental models constructed specifically to guide the tester whilst they explore the system under test.

From the tester’s point of view, a model helps us to recognise particular aspects of the system that could be the object of a test. The model focuses attention on areas of the system that are of interest. But, models almost always over-simplify the situation.

All models are wrong, some models are useful

This maxim is attributed to the statistician George Box. But it absolutely applies in our situation.

Here’s the rub with all models – an example will help. A state diagram is a model. Useful, but flawed and incomplete. It is incomplete because a real system has billions of states, not the three defined in a design document. (And the design might have a lot or a little in common with the delivered system itself, by the way.) So the model in the document is idealised, partial and incomplete – it is not reality. So, the formality of models does not equate to test accuracy or completeness in any way. All coverage is measured with respect to the model used to derive testable items (in this case it could be state transitions). Coverage of the test items derived from the model doesn’t usually (hardly ever?) indicate coverage of the system or technology.

The skill of testing isn’t mechanically following the model to derive testable items. The skill of testing is in the choice of the considered mix of various models. The choice of models ultimately determines the quality of the testing. The rest is clerical work and (most important) observation.

I’ve argued elsewhere that not enough attention is paid to the selection of test models. http://gerrardconsulting.com/index.php?q=node/495

Testing needs a test coverage model or models

I’ve said this before too: http://testaxioms.com/?q=node/14

Test models allow us to identify coverage items. A coverage item is something we want to exercise in our tests. When we have planned or executed tests that cover items identified by our model we can quantify the coverage achieved as a proportion of all items on the model – as a percentage.
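To make that concrete, here is a minimal sketch (in Python, with invented coverage items) of how coverage can be quantified against a model. Note that the percentage is only ever relative to the items the model identifies:

```python
# Hypothetical coverage items derived from a test model, e.g. state
# transitions identified on a state diagram.
model_items = {"idle->active", "active->paused", "paused->active", "active->idle"}

# Items actually exercised by the tests planned or executed so far.
covered_items = {"idle->active", "active->idle"}

def coverage_percent(model_items, covered_items):
    """Coverage achieved, as a percentage of all items on the model."""
    if not model_items:
        return 0.0
    return 100.0 * len(model_items & covered_items) / len(model_items)

print(coverage_percent(model_items, covered_items))  # 50.0
```

The calculation is trivial; the hard (and valuable) part is choosing the model that produced `model_items` in the first place.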

Numeric test coverage targets are sometimes defined in standards and plans and to be compliant these targets must be met. Identifiable aspects of our test model, such as paths through workflows, transitions in state models or branches in software code can be used as the coverage items.

Coverage measurement can help to make testing more ‘manageable’. If we don’t have a notion of coverage, we may not be able to answer questions like, ‘what has been tested?’, ‘what has not been tested?’, ‘have we finished yet?’, ‘how many tests remain?’ This is particularly awkward for a test manager.

Test models and coverage measures can be used to define quantitative or qualitative targets for test design and execution. To varying degrees, we can use such targets to plan and estimate. We can also measure progress and infer the thoroughness or completeness of the testing we have planned or executed. But we need to be very careful with any quantitative coverage measures or percentages we use.

Formal and Informal Models

Models and coverage items need not necessarily be defined by industry standards. Any model that allows coverage items to be identified can be used.

My definition is this: a Formal Model allows coverage items to be reliably identified on the model. A quantitative coverage measure can therefore be defined and used as a measurable target (if you wish).

Informal Models tend to be checklists or criteria used to brainstorm a list of coverage items or to trigger ideas for testing. These lists or criteria might be pre-defined or prepared as part of a test plan or adopted in an exploratory test session.

Informal models are different from formal models in that the derivation of the model itself depends on the experience, intuition and imagination of the practitioner using them, so coverage using these models can never be quantified meaningfully. We can never know what ‘complete coverage’ means with respect to these models.

Needless to say, tests derived from an informal model are just as valid as tests derived from a formal model if they increase our knowledge of the behaviour or capability of our system.

Risk-based testing is an informal model approach – there is no way to limit the number of risks that can be identified. Is that bad? Of course not. It’s just that we can’t define a numeric coverage target (other than ‘do some tests associated with every serious risk’). Risk identification and assessment are subjective. Different people would come up with different risks, described differently, with different probabilities and consequences. Different risks would be included or omitted; some risks would be split into micro-risks, some not. All risks aren’t the same, so a percentage coverage figure is meaningless. The formality associated with risk-based approaches relates mostly to the level of ceremony and documentation, not to the actual technique of identifying and assessing risks. It’s still an informal technique.

In contrast, two testers given the same state transition diagram or state table and asked to derive, say, the state transitions to be covered by tests would come up with the same list of transitions. Assuming a standard presentation for state diagrams can be agreed, you have an objective model (albeit flawed, as already suggested).
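A sketch may help show why such a model is objective: given the same state table, a mechanical procedure yields the same transition list for everyone (the toy table below is invented for illustration):

```python
# A toy state table: state -> {event: next_state}. Any two testers
# walking this table mechanically derive exactly the same transitions.
state_table = {
    "logged_out": {"login": "logged_in"},
    "logged_in": {"logout": "logged_out", "lock": "locked"},
    "locked": {"unlock": "logged_in"},
}

def derive_transitions(table):
    """Enumerate every (state, event, next_state) triple as a coverage item."""
    return sorted(
        (state, event, target)
        for state, events in table.items()
        for event, target in events.items()
    )

for transition in derive_transitions(state_table):
    print(transition)
```

The derivation is clerical; two people (or two programs) produce the identical list of four coverage items, which is what makes the model ‘formal’ in the sense defined above.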

Coverage does not equal quality

A coverage measure (based on a formal model) may be calculated objectively, but there is no formula or law that says X coverage means Y quality or Z confidence. All coverage measures give only indirect, qualitative, subjective insights into the thoroughness or completeness of our testing. There is no meaningful relationship between coverage and the quality of systems.

So, to return to Peter’s original question “Does exploratory testing mean we’ve stopped caring about test coverage?” Certainly not, if the tester is competent.

Is the value of testing less because informal test/coverage models are used rather than formal ones? No one can say – there is no data to support that assertion.

One ‘test’ of whether ANY tester is competent is to ask about their models and coverage. Most testing is performed by people who do not understand the concept of models because they were never made aware of them.

The formal/informal aspects of test models and coverage are not a criterion for deciding whether planned/documented or exploratory testing is best, because planned testing can use informal models and ET can use formal models.

Ad-Hoc Test Models

Some models can be ad-hoc – here and now, for a specific purpose – invented by the tester just before or even during testing. If, while testing, a tester sees an opportunity to explore a particular aspect of a system, he might use his experience to think up some interesting situations on-the-fly. Nothing may be written down at the time, but the tester is using a mental model to generate tests and speculate how the system should behave.

When a tester sees a new screen for the first time, they might look at the fields on screen (model: test all the data fields), they might focus on the validation of numeric fields (model: boundary values), they might look at the interactions between checkboxes and their impact on other fields’ visibility or outcomes (model: decision table?) or look at ways the screen could fail, e.g. extreme values, unusual combinations etc. (model: failure mode or risk-based). Whatever. There are hundreds of potential models that can be imagined for every feature of a system.
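One of those models, boundary values, can be applied mechanically once chosen. A minimal sketch (the field and its range are invented for illustration):

```python
def boundary_values(lo, hi):
    """Classic boundary value model for a numeric field accepting lo..hi:
    the edges, just inside them, and just outside them."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

# Hypothetical 'quantity' field that accepts values from 1 to 100.
print(boundary_values(1, 100))  # [0, 1, 2, 99, 100, 101]
```

The point is not the six values but the choice that preceded them: the tester decided the boundary model was the interesting one for this field, and only then did the clerical derivation begin.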

The very limited number of test models associated with textual requirements are just that – limited – to the common ones taught in certification courses. Are they the best models? Who knows? There is very little evidence to say they are. Are they formal? Yes, in so far as objective definitions of the models (often called test techniques) exist. Is formal better than informal/ad-hoc? That is a cultural or value-based decision – there’s little or no evidence other than anecdotal to justify the choice.

ET exists partly to allow testers to do much more testing than that limited by the common models. ET might be the only testing used in some contexts or it might be the ‘extra testing on the top’ of more formally planned, documented testing. That’s a choice made by the project.

Certification Promotes Testing as a Clerical Activity

This ‘clerical’ view of testing is what we have become accustomed to (partly because of certification). The handed-down or ‘received wisdom’ of off-the-shelf models is useful in that the models are accessible, easy to teach and mostly formal (in my definition). There were, when I last looked, 60+ different code coverage models possible in plain vanilla programming languages. My guess is there are dozens associated with narrative text analysis, dozens associated with usage patterns, dozens associated with integration and messaging strategies. And for every formal design model in say, UML, there are probably 3-5 associated test models – for EACH. Certified courses give us five or six models. Most testers actually use one or two (or zero).

Are the stock techniques efficient/effective? Compared to what? They are taught mostly as a way of preparing documentation to be used as test scripts. They aren’t taught as test models having more or less effectiveness or value for money, to be selected and managed. They are taught as clerical procedures. The problem with real requirements is that you need half a dozen different models on each page, on each paragraph even. Few people are trained/skilled enough to prepare well-designed, documented tests. When people talk about requirements coverage it’s as sophisticated as saying we have a test that someone thinks relates to something mentioned in that requirement. Hey – that’s subjective again – subjective, not very effective and also very expensive.

With Freedom of Model Choice Comes Responsibility

A key aspect of exploratory testing is that you should not be constrained but should be allowed and encouraged to choose models that align with the task in hand so that they are more direct, appropriate and relevant. But the ‘freedom of model choice’ applies to all testing, not just exploratory, because at one level, all testing is exploratory (http://gerrardconsulting.com/index.php?q=node/588).

In future, testers need to be granted the freedom of choice of test models, but for this to work, testers must hone their modelling skills. With freedom comes responsibility. Given freedom to choose, testers need to make informed choices of model that are relevant to the goals of their testing stakeholders. It seems to me that the testers who will come through the turbulent times ahead are those who step up to that responsibility.

Sections of the text in this post are lifted from the pocketbook http://testers-pocketbook.com

The higher the quality, the less effective we are at testing

It’s been interesting to me to watch, over the last 10 or maybe 15 years, the debate over whether exploratory or scripted testing is more effective. There’s no doubt that one can explore more of a product in the time it takes for someone to follow a script. But then again – how much time do exploratory testers lose bumbling around lost, aimlessly going over the same ground many times, hitting dead ends (because they have little or no domain or product knowledge to start with)? Compare that with a tester who has lived with the product requirements as they have evolved over time. They may or may not be blinkered, but they are better informed – sort of.

I’m not going to decry the value of exploration or planned tests – both have great value. But I reckon people who think exploration is better than scripted under all circumstances have lost sight of a thing or two. And that phrase ‘lost sight of a thing or two’ is significant.

I’m reading Joseph T. Hallinan’s book, “Why We Make Mistakes”. Very early on, in the first chapter no less, Hallinan suggests that “we’re built to quit”. It makes sense. So we are.

When humans are looking for something – smuggled explosives, tumours in X-rays, bugs in software – they are adept at spotting what they look for if, and it’s a big if, these things are common. In that case they are pretty effective, spotting what they look for most of the time.

But what if what they seek is relatively rare? Humans are predisposed to give up the search prematurely. It’s evolution, stupid! Looking for, and not finding, food in one place just isn’t sensible after a while. You need to move on.

Hallinan quotes (among others) the cases of people who look for guns in luggage at airports and tumours in X-rays. In these cases, people look for things that rarely exist. In the case of radiologists, mammograms reveal tumours only 0.3 percent of the time; 99.7 percent of the time the searcher will not find what they look for.

In the case of guns or explosives in luggage the occurrence is rarer still. In 2004, according to the Transportation Security Administration, 650 million passengers travelled in the US by air, but only 598 firearms were found – about one in a million occurrences.

Occupations that seek to find things that are rare have considerable error rates. The miss rate for radiologists looking for cancers is around 30%. In one study at the world-famous Mayo Clinic, 90% of the tumours missed by radiologists were visible in previous X-rays.

In 2008, I travelled from the UK to the US, to Holland and to Ireland. On my third trip, returning from Ireland with the same rucksack on my back (i.e. my sixth flight), I was called to one side by a security officer at the Dublin airport security check. A lock-knife with a 4.5 inch blade had been found in my rucksack. Horrified, when presented with the article I asked that it please be disposed of! It was mine, but in the bag by mistake – and it had been there for six months, unnoticed by me and by five airport security scans. This was the *sixth* flight with the offending article in the bag. Five previous scans at airport terminals had failed to detect a quite heavy metal object – pointed, and a potentially dangerous weapon. How could that happen? Go figure.

Back to software. Anyone can find bugs in crappy software. It’s like walking barefoot in a room full of loaded mousetraps. But if you are testing software of high quality, it’s harder to find bugs. It may be that you give up *before* you have given yourself time to find the really (or not so) subtle ones.

Would a script help? I don’t know. It might, because in principle you have to follow it. But it might make you even more bored. All testers get bored/hungry/lazy/tired and are more or less incompetent or uninformed – you might give up before you’ve given yourself time to find anything significant. Our methods, such as they are, don’t help much with this problem. Exploratory testing can be just as draining/boring as scripted testing.

I want people to test well. It seems to me that the need to test well increases with the criticality and quality of software, and motivation to test aligns pretty closely. Is exploration or scripted testing of very high quality software more effective? I’m not sure we’ll ever know until someone does a proper experiment (and I don’t mean testing a 2,000-line toy program in a website or a nuclear missile).

I do know that if you are testing high quality code – and just before release it usually is of high quality – then you have to have your eyes open and your brain switched on. Both of ’em.

Changing perceptions of exploratory testing

It seems to me that, to date, perceptions of exploration in communities that don’t practice it have always been that it is appropriate only for document- and planning-free contexts. It’s not been helped by the emphasis that is placed on these contexts by the folk who practice and advocate exploration. Needless to say, the certification schemes have made the same assumption and promote the same misconception.

But I’m sure that things will change soon. Agile is mainstream and a generation of developers, product owners and testers who might have known no other approach are influencing at a more senior level. Story-based approaches to define requirements or to test existing requirements ‘by example’ and drive acceptance (as well as being a source of tests for developers) are gaining followers steadily. The whole notion of story telling/writing and exampling is to ask and answer ‘what if’ questions of requirements, of systems, of thinking.

There’s always been an appetite for less test documentation but rarely the trust – at least in testers (and managers) brought up in process and standards-based environments. I think the structured story formats that are gaining popularity in Agile environments are attracting stakeholders, users, business analysts, developers – and testers. Stories are not new – users normally tell stories to communicate their initial views on requirements to business analysts. Business analysts have always known the value of stories in eliciting, confirming and challenging the thinking around requirements.

The ‘just-enough’ formality of business stories provides an ideal communication medium between users/business analysts and testers. Business analysts and users understand stories in business language. The structure of scenarios (given/when/then etc.) maps directly to the (pre-conditions/steps/post-conditions) view of a formal test case. But this format also provides a concise way of capturing exploratory tests.
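That mapping can be sketched in a few lines. This is only an illustration of the correspondence, with an invented banking scenario, not a real story framework:

```python
# A business story scenario in given/when/then form (wording invented).
scenario = {
    "given": "an account with a balance of 100",  # maps to pre-condition
    "when": "the user withdraws 30",              # maps to test steps
    "then": "the balance is 70",                  # maps to post-condition
}

# The same scenario viewed as a formal test case.
test_case = {
    "pre_condition": scenario["given"],
    "steps": scenario["when"],
    "expected_result": scenario["then"],
}

print(test_case["expected_result"])
```

The correspondence is one-to-one, which is why the same concise story format can serve business analysts, developers and testers alike.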

The conciseness and universality of stories might provide the ‘just enough’ documentation that allows folk who need documentation to explore with confidence.

I’ll introduce some ideas for exploratory test capture in my next post.