Critical Thinking – An Example

Background

Michael Bolton recently posted a message on LinkedIn (https://www.linkedin.com/posts/michael-bolton-08847_low-code-testing-tools-are-really-low-testing-activity-6921751556463685633-pcQS?utm_source=linkedin_share&utm_medium=member_desktop_web):

“Low-code testing tools” are really “low-testing code tools”

In what follows I imply no criticism of Michael or of the people who responded to the post, whether positively or negatively.

I want to use Michael’s proposition to explain why we need to be much more careful about:

  • How we interpret what other people say or write
  • How we assess its clarity, credibility, logical consistency or truthfulness
  • Whether we understand the assumptions, experience or agenda of the author, or of ourselves as the audience
  • And so on…

This is a critical-thinking exploration of a ten-word, five-term sentence.

I’ve been reading a useful book recently, “The Socratic Way of Questioning”, written by Michael Britton and published by Thinknetic. Thinknetic publish quite a few books in the critical thinking domain and you can see them here: https://www.amazon.co.uk/Thinknetic/e/B091TWBHXN/

I’ll use a few ideas I gleaned from the book, describe their relevance in a semi-serious analysis of Michael’s proposition and, hopefully, illustrate some aspects of Critical Thinking – which is the real topic of this article. I don’t claim to be an expert in the topic, but I’m a decent reviewer and can posit a reasonable argument. I’ll try to share my thought process concerning the proposition and take a few detours on the overall journey to expand on a few principles.

I’ve no doubt that some of you could do a better job than I have.

The importance of agreed definitions

The first consideration is that of definition. How can you have a meaningful discussion without agreed meanings for the words used in that discussion? Socrates himself is said to have spent at least half of his time in argument trying to get consensus on the meanings of words. Plato’s Republic, the most famous Socratic dialogue, spends most of its time defining just one term – justice.

Turning towards the proposition, what do the terms used actually mean? Do they mean the same thing to you as they mean to Michael? Is there an agreed, perhaps universally agreed definition of these terms? So, what do the following terms mean?

  • Code
  • Testing
  • Tools
  • Low-Code
  • Low-Testing

Now, in exploring the potential or actual meanings of these terms, I’ll have to use some, perhaps many, terms that I won’t define. Defining them all precisely in a short article would inevitably get very cumbersome – so I won’t do that. Of course, we need an agreed glossary of terms and their definitions as the basis for the definitions at hand. But it’s almost a never-ending circle of definitions. The Oxford English Dictionary, when I last looked, had compiled over 600,000 definitions of words and phrases covering “1000 years of English” (https://public.oed.com/history/).

Recently, I have been doing some research with ‘merged’ regular English dictionaries and some of the testing-related glossaries available on the internet. I now have some Python utilities, using open-source libraries, that scrape web pages and PDF documents and scan them for terms defined in these sources, as well as detecting phrases deemed ‘meaningful’. (What ‘meaningful’ means is not necessarily obvious.) I’ll let you know what I’m up to in a future post.
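
To give a flavour of what I mean, here is a minimal sketch of such a utility. It is illustrative only – the library choices (requests, BeautifulSoup) and the glossary file name are assumptions for this example, not a description of my actual code.

```python
# A minimal sketch of a glossary-term scanner. The libraries
# (requests, BeautifulSoup) and the glossary file name are
# illustrative assumptions, not a description of my actual utilities.
import re

import requests
from bs4 import BeautifulSoup


def load_glossary(path):
    """Load glossary terms, one per line, lower-cased for matching."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def scan_page(url, terms):
    """Count occurrences of each glossary term in the visible page text."""
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ").lower()
    counts = {}
    for term in terms:
        # Whole-word match, so 'test' doesn't count inside 'latest'.
        hits = re.findall(r"\b" + re.escape(term) + r"\b", text)
        if hits:
            counts[term] = len(hits)
    return counts


if __name__ == "__main__":
    glossary = load_glossary("testing_glossary.txt")  # hypothetical file
    print(scan_page("https://example.com/article", glossary))
```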

We have a real problem with the language used in our product and service marketing and, sadly, in our exchanges on testing topics in discussions and debates.

Definitions

Here are my definitions and sources, where I could identify and/or select a preferred source. I have tried to use what I think might be Michael’s intended definitions. (I am surely wrong, but here goes anyway.)

Code: generally defined as a programming or scripting language used to create functional software to: control devices; capture, store, manipulate and present data; provide services to other systems or to humans. In this context, we’re thinking of code that controls the execution of some software under test.

Testing: …is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting, which includes to some degree: questioning, study, modelling, observation, inference, etc. (Source: https://www.satisfice.com/blog/archives/856 – there’s a security vulnerability on the page by the way – so use an incognito window if your browser blocks it).

Tools: Oxford English Dictionary: a device or implement, especially one held in the hand, used to carry out a particular function. Merriam Webster: something (such as an instrument or apparatus) used in performing an operation or necessary in the practice of a vocation or profession. In our context – software test execution automation – we refer to software programs that can: stimulate a system under test (SUT); send inputs to and receive outputs from the SUT; record specific behaviours of the SUT; compare received outputs or recorded behaviours with prepared results. And so on.
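
In fact, the essence of such a tool can be sketched in a few lines of Python. This is a toy illustration of the stimulate/compare cycle described above – the SUT here is a stand-in function, and real tools add instrumentation, recording and recovery.

```python
# A toy test-execution 'tool': stimulate a system under test (SUT)
# with prepared inputs and compare actual outputs against expected
# results. The SUT here is a stand-in function for illustration.

def sut(price, quantity):
    """Stand-in system under test: computes an order total."""
    return round(price * quantity, 2)

# Prepared test data: (inputs, expected output).
test_cases = [
    ((9.99, 3), 29.97),
    ((0.10, 7), 0.70),
    ((5.00, 0), 0.00),
]

for inputs, expected in test_cases:
    actual = sut(*inputs)
    verdict = "PASS" if actual == expected else f"FAIL (got {actual})"
    print(f"sut{inputs} -> expected {expected}: {verdict}")
```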

Low-Code: There is no agreed definition. Here is an example relating to software-building which is reasonable (https://www.mendix.com/low-code-guide/): “Low-code is a visual approach to software development that optimizes the entire development process to accelerate delivery. With low-code, you can abstract and automate every step of the application lifecycle to streamline the deployment of a variety of solutions.” I’d substitute ‘software test development’ for ‘software development’. In the context of testing, the term ‘codeless’ is more often used. This definition appears on a tools review website (https://briananderson2209.medium.com/top-10-codeless-testing-tools-in-2020-every-tester-should-know-2cb4bd119313): “Simply put, codeless test automation means creating automated tests without the need to physically write codes. It provides testers lacking sophisticated programming skills with project templates for workflow, element libraries, and interface customization.”
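
The distinction is easier to see in miniature. In a ‘low-code’ or ‘codeless’ approach, the tester writes declarative test descriptions and a generic engine does the programming work. Here is a sketch of that idea – the step vocabulary and targets are invented for illustration.

```python
# A sketch of the 'low-code' idea: the test is declarative data, not
# code. A generic runner interprets the steps. The step vocabulary
# ('open', 'type', 'click', ...) is invented for illustration.

login_test = [
    {"step": "open", "target": "login_page"},
    {"step": "type", "target": "username", "value": "alice"},
    {"step": "type", "target": "password", "value": "secret"},
    {"step": "click", "target": "submit"},
    {"step": "assert_visible", "target": "welcome_banner"},
]

def run(steps):
    """Interpret each declarative step; a real engine would drive a UI."""
    for s in steps:
        print(f"executing {s['step']} on {s['target']}")
        # ...a real engine would locate the element and act on it...

run(login_test)
```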

Low-testing: I’m pretty certain Michael has invented this term without defining it. But we can guess, I think. There are two alternatives: either (a) the quantity of testing is lower (than what? We don’t know – but let’s assume it means ‘less than enough’), or (b) the quality of testing is lower than it should be (by Michael’s standard, or someone else’s – we don’t know). I suspect (b) is the intended interpretation.

Interpreting the Proposition

Since we now have some working definitions of the individual terms, we can expand the proposition into a sentence that might be more explicit. We’ll have to document our assumptions. This carries a risk: have we ‘guessed’ and misunderstood Michael’s purpose, thinking, assumptions and intended meaning?

Anyway. Here goes:

“Using test execution tools that require less (or no) coding means the quality (or quantity, possibly) of your testing will be reduced.”

Here are our assumptions in performing this expansion:

  • The ‘low-testing’ phrase implies that lower-quality or lower-quantity testing is achieved. (Does it matter which? It would probably help to know, of course.)
  • Only the use of such tools causes this loss of quality or quantity. (Does having such tools, but not using them, cause no loss?)
  • Knowledge of some of Michael’s previous writings about tools and testing justifies our (still ambiguous) expanded phrase, I believe.

Ambiguities

We have several ambiguities in our reformulation of the original proposition:

  • Do low-code tools have the effect of lowering the quality/quantity of testing or do ‘high-code’ – that is, all tools – do this?
  • Do tools lower the quantity, or do they lower the quality of testing? (Or both?)
  • Lowered compared to what? Testing without tools? (Or see below… tools with ‘higher-code’)
  • What is meant by ‘the quality of testing’? How is that defined? How could it be measured? Evaluated?
  • The quantity of testing might be reduced – but might this be a good thing, if the quality is the same or improved?
  • There’s no mention of the benefits of using (low-code) tools – but is the overall effect of using tools (low or high code) detrimental or beneficial?
  • There is no mention of context here. Is the proposition true for all contexts or just some? Could using such tools possibly be a universally best practice? Or a universally bad one?

Why This Proposition?

Is this proposition making a serious point? Does it inform a wider audience? I don’t think so. It’s clickbait. Some people will (perhaps angrily) agree and some (angrily) disagree. These people will probably be driven by preconceptions and existing beliefs. Do ‘thinking people’ respond to such invitations?

It’s a common tactic on internet forums and I’ve used the tactic myself from time to time, although I think it’s a lazy way of starting a debate: the ‘thoughts from abroad/musings/worldview/the toilet’ (strike out what doesn’t apply) tend not to trigger debate; they just cause some folk to extol or confirm their existing beliefs.

So, don’t ask me, ask Michael.

Summary

I’ve written over a thousand words so far, analysing a ten-word, five-term proposition. I’m sure there is much more I could have written if I thought more deeply about the topic. A word-ratio of 100 to 1 isn’t surprising. Many books have been written discussing less complicated but more significant propositions. Love, justice, quality, democracy… you know what I mean.

The lack of agreed definitions of the terms we use can be a real stumbling block, leading to misinterpretations and misconceptions hardly likely to improve our communications, consensus and understanding. Even with documented definitions there are problems if people dispute them.

Before you engage in a troublesome conversation about a proposition, ask for definitions of the terms used and try, with patience, to appreciate what the other person is actually saying. You may find gold, or you may demolish their proposition. Or both.

Finally

Am I suggesting you apply critical thinking to all conversations, all communications? No, of course not. Life is too short. And surely, if you analyse all your partner’s words, you are bound to land in scalding hot water. So, like testing and all criticism, it has its place. A weapon of choice, in its place.

Critical thinking, like testing, should be reserved for special occasions. When you really want to get to the bottom of what an author, speaker or presenter is saying or a supplier is supplying. How can you agree or disagree without understanding exactly what is being proffered? Critical thinking works.

What’s my Response to the Proposition?

“After all this circumlocution, what do you think of the proposition, Paul?”

I think there are more interesting propositions to debate.

Is Leadership Innate or can it be Learned?

It’s a common question. I need to lead, but do I have the ability to do that?

There’s lots of talk about ‘born leaders’ but what if I’m not a born leader? Can I still succeed?

In this video, I suggest leadership can be learned and enhanced with practice and support.

Tenets of Testing – A comparison with Test Axioms

Nicholas Snogren posted on LinkedIn a reference to an “Axioms of Testing” presentation from 2009 and asked me to comment on his “Tenets of Software Testing”. There are some similarities, though not many I think, and some parallels too; his question prompted me to give a longer response than, I guess, was expected. I said…

“Hi, thanks for asking for my opinion. Your tenets look interesting – and although I don’t think they map directly to what I’ve written, they raise points in my mind that need a little airing – my mind grows cobwebby over time, and it’s good to brush off old ideas. A bit like exercising muscles that haven’t been used for a while haha.”

I give my response as a comparison with my Tester’s Pocketbook and the Test Axioms website and presentations. (I notice that some of these posts are around 12 years old and some links no longer work. Some are out of my control; others I’ll have to track down and correct – let me know if you want that.)

Tenets v Axioms

Firstly, let’s get our definitions right.

According to dictionary.com, a Tenet is “any opinion, principle, doctrine, dogma, etc., especially one held as true by members of a profession, group, or movement.” Tenets might be regarded as beliefs that don’t require proof and don’t provide a strong foundation.

From the same source, an Axiom is, “i) a self-evident truth that requires no proof. ii) a universally accepted principle or rule. iii) Logic, Mathematics. a proposition that is assumed without proof for the sake of studying the consequences that follow from it”

I favoured the use of Axioms as a foundation for thinking in the testing domain. Axioms, if they are defensible, would provide a stronger foundational set of ideas. When I proposed a set of Testing Axioms, there was some resistance – here’s a Prezi talk that introduces the idea.

James Bach in particular challenged the idea of Axioms and suggested I was creating a school of testing (when schools of testing were getting some attention) here.

By and large, because the Axioms are defined in context-neutral terms, challenges have tended to be easy to acknowledge, disarm and set aside. Critics, almost entirely from the context-driven school, jumped the gun, so to speak – they clearly hadn’t read what I had written before critiquing it. Only one or two people responded to James’ call to arms to criticise the Axioms and challenged them.

The Axioms are fully described in The Tester’s Pocketbook – http://testers-pocketbook.com/.

The Axioms of Testing website – https://testaxioms.com/ – sets out the Axioms with some explanation and provides around 50% of the pocketbook content for free.

Axioms caught the attention (and criticism) of people because I pitched them as universal principles or laws of testing. Tenets, being less strident in their definition, might not attract attention (or criticism) in the same way.

Immediate Comments on the Tenets

The Tenets are numbered and italicised; my comments are in plain text.

  1. A software product’s behavior is exhibited by interactions.
  2. There is potentially an infinite number of possible behaviors in software.

These are properties of software. I’m not sure Tenet 1 says much, other than that behavior is triggered by interactions and, presumably, observed through interactions. Note, though, that software behaving autonomously might respond to internal events, such as the passing of time, and might not exhibit any behaviour through interactions – a change of internal state, for example.

Tenet 2 is reasonable for reasonably-sized software artefacts.
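
A back-of-the-envelope calculation illustrates why Tenet 2 is reasonable, even for tiny programs. The scenario below is invented, but the explosion is the point.

```python
# Why 'potentially infinite' is fair comment: even a tiny form explodes.
# Invented scenario: 3 fields, each a 10-character string drawn from 26
# letters, exercised in sequences of 1 to 5 interactions.
values_per_field = 26 ** 10                 # values of one field
single_inputs = values_per_field ** 3       # one complete form submission
sequences = sum(single_inputs ** n for n in range(1, 6))
print(f"{sequences:.3e} possible interaction sequences")
```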

In the Tester’s Pocketbook, I hardly use the term software. I prefer that we test systems. Software is usually an important part of every system. Humans do not interact with software (except by reading or writing it). Software exists in the context of program compilation, hosted on operating systems, running on devices which have peripherals and other interconnected systems which may or may not have user interfaces.

Basing Axioms on Systems means that the Axioms are open to interpretation as Axioms for testing ANY system (i.e. anything – I don’t press that idea, but it’s an attractive one). Another ‘benefit’ is that all of the Systems Thinking principles can also be brought to bear on our arguments. Outside its context, Software is not a System.

3. Some of those behaviors are potentially negative, that is, would detract from the objectives of the software company or users.

I use the term Stakeholders to refer to parties interested in the valuable, reliable behavior of systems and the outcome and value of testing those systems.

4. The potentiality for that negative behavior is risk.

OK, but it could be better worded. I would simply say ‘potential modes of failure’ rather than negative behaviour.

5. It’s impossible to guarantee a lack of risk as it’s impossible to experience an infinite number of behaviors.

Not really. You can guarantee a no-risk situation if no one cares or no one cares enough to voice their concerns before testing (or after testing). There is always the potential for failure because systems are complex and we are not skilled enough to create perfect systems.

6. Therefore a subset of behaviors must be sampled to represent the risk.

Rather than ‘represent’, I would say ‘trigger the failure(s) of concern’ – to explore the risk and better inform a risk assessment.

7. The ability to take an accurate sample, representative of the true risk, is a testing skill.

Not sure what you mean by sample – tests or test cases, I presume? ‘Representative’ is a subjective notion, surely; ‘true’ I don’t understand; and ‘a testing skill’ would need more definition than this, wouldn’t it?
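
To make the sampling idea concrete, here is one naive way to select test ideas in proportion to assessed risk. It is a sketch only – the risk scores are invented, and it illustrates sampling, not how risk should be assessed.

```python
# Naive risk-weighted sampling: select test ideas with probability
# proportional to their (subjective, invented) risk scores.
import random

test_ideas = {
    "checkout_payment": 9,   # invented risk scores, 1..10
    "report_export": 5,
    "search_filters": 4,
    "profile_update": 2,
}

ideas = list(test_ideas)
weights = [test_ideas[i] for i in ideas]
# random.choices samples with replacement; good enough for a sketch.
sample = random.choices(ideas, weights=weights, k=3)
print("selected for this cycle:", sample)
```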

8. A code change to an existing product may also affect the product in an infinite number of ways.

I’d use ‘ANY’ change, to a ‘SYSTEM’. Why ‘also’? What would you say fits into a ‘not only… but also…’ clause? But I’m not sure I agree with this assertion anyway. A code change changes some software artefact. The infinite effects (faulty behaviors?) derive from infinite tests (or uses in production) – which you say in 5 are impossible to achieve. I’m not sure what you’re trying to say here.

9. It is possible to infer that some behaviors are more likely to be affected by that change than others.

You can infer anything you like by calling upon the great Unicorn in the sky. How will you do this? Either you use tools (which are limited in capability), or you use change and defect history, or you guess based on partial knowledge and experience.
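
For example, a crude form of the change-and-defect-history approach can be sketched like this (the history data is invented; real impact analysis is much harder).

```python
# A crude 'hot spot' ranking from change and defect history: files
# changed often and implicated in past defects get attention first.
# The history data is invented for illustration.
changes = {"payments.py": 42, "search.py": 7, "profile.py": 15}
defects = {"payments.py": 9, "search.py": 1, "profile.py": 2}

scores = {f: changes[f] * (1 + defects.get(f, 0)) for f in changes}
for f, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{f}: hotspot score {score}")
```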

10. The risk -of that change- is higher within the set of behaviors that are more likely to be affected by that change.

Do you mean the probability of failure or the consequence of failure? I assume probability. At any rate, this is redundant – you have already asserted it in 9. But it’s also more complicated than this: a cosmetic defect in an app can be catastrophic, and a system failure negligible, at times.

11. The ability to accurately estimate a scope of affected behavior is another testing skill.

I would call this the skill of impact analysis rather than testing. Developers are relatively poor at this, even though they have far deeper technical knowledge (either they aren’t able, or they lack the time, to impact-analyse to any reliable degree). So we rely on testing to catch regressions, which is less than ideal. Testers depend on their experience rather than knowledge of system internals. But, since buggy systems behave in essentially unpredictable ways, we must admit our experience is limited and fallible. It’s not a ‘skill’ that I would dwell on.

12. The scope and sampling ideas alone are meaningless without empirical evidence.

The scope and sampling ideas have meaning regardless of whether you implement them. I suppose you might say they are useless ideas if you don’t gather evidence.

13. Empirical evidence is gathered through interactions with the product, observation of resultant behavior, and assessment of those observations.

The word empirical is redundant. I would use the word ‘some’ here. We also get evidence from operation in production, for example. (Unless you include that already?)

14. The accuracy and speed of scope estimation, behavior sampling, and gathering of evidence are key performance indicators for the tester.

If you are implying that the activities in 13 are tester skills, I suppose you could make this assertion. But you haven’t said what the value of evidence is yet. Is the purpose of testing only to evaluate the performance of testers? I hope not ;O)

15. Heuristics for the gathering of such evidence, the estimation of scope, and the sampling of behavior are defined in the Heuristic Test Strategy Model.

Heuristics are available in a wide range of sources including software, systems and engineering standards. Why choose such a limited source?

Inspiration for the Tenets

These tenets were inspired by James Bach’s “Risk Gap” and Doug Hubbard’s book “How to Measure Anything.” Both Bach and Hubbard discuss a very similar idea from different spaces. Hubbard suggests that by defining our uncertainty, we can communicate the value of reducing the uncertainty. Bach describes the “knowledge we need to know” as the “Risk Gap.” This Risk Gap is our uncertainty, and in defining it, we can compute the value of closing it. In testing, I realized we have three primary areas of uncertainty: 1) what is the “risk gap,” or knowledge we need to find out, 2) how can we know when we’ve acquired enough of that unknown knowledge, and 3) how can we design interactions with the program to efficiently reveal this knowledge.
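
For what it’s worth, Hubbard’s scheme can be sketched numerically. You model your uncertainty as a probability, compute the expected loss it exposes you to, and treat that expected loss as an upper bound on what information (from testing, say) is worth. The numbers below are invented for illustration.

```python
# A sketch of Hubbard-style value of information, with invented numbers.
# If a failure would cost 100,000 and we believe it is 30% likely, our
# expected loss is 30,000. Perfect information (from testing, say) is
# worth at most that expected loss - the 'expected value of perfect
# information' (EVPI).
failure_cost = 100_000   # consequence of releasing a faulty system
p_failure = 0.30         # our (subjective) prior probability of failure

expected_loss = p_failure * failure_cost
evpi = expected_loss     # upper bound on what testing info is worth
print(f"expected loss: {expected_loss:,.0f}; EVPI: {evpi:,.0f}")
```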

There are several interesting anomalies to unpick here:

  • I recall James telling a story about Tom Gilb asserting anything could be measured. James suggested Love and Tom obliged. I don’t think James was impressed.
  • ‘Defining uncertainty’ – how do you do that reliably? Numerically? Objectively? We can put any numbers we like against probability and consequence. Being certain, with or without evidence, is always subjective. People can say they are more certain, but based on … what? How do we correlate data with a human emotion and use that to make engineering decisions? People can be easily deceived – by themselves, not just by others. Consider this, for example, and this.
  • Risk Gap – how is a quantity of knowledge measured? What units? With what certainty? These are aspects that James has argued against since the early 1990s.
  • Your three challenges 1), 2) and 3) are reasonable as goals. How do Bach and Hubbard argue you achieve them, if not by calling on the subjective opinions of other people?

Some More General Comments

You seem to be trying to ‘make a case’ for testing as a tool to address the risk of failure in systems. I (like and) use that same approach in a rounder sense in my conference talks and writings, when practicable. My observations on this are:

  1. The logic doesn’t flow as it should because of flaws in the individual statements
  2. You have no pre-definition of test, testing or its purpose at the outset, so it’s not clear what your destination is
  3. There’s no defined goal of testing, other than to gather evidence to (my words) reassess risks and thereby reduce uncertainty (but you don’t say why that’s a ‘good thing’)
  4. Testing enables a reassessment of risk, but that reassessment may increase risk if, for example, you find a bug in something that was previously deemed reliable. (There’s a bigger conversation to be had, but risk is not a BAD thing; it’s the barrier(s) you need to navigate or break through to gain your REWARD.)
  5. Extant, significant risks are a BARRIER to accepting or releasing systems. As such, the goal of testing is to provide evidence, to the people who make the decision, that those risks are acceptable or negligible (reducing uncertainty, sure, but never eliminating it). But the ultimate goal of testing is to show that the system WORKS. Encountering failures is a detour from that goal. The testing goal is broader than exploring risks.
  6. You don’t mention stakeholders at all. Why do we test? To provide evidence to testing stakeholders – our customers – so they can make better-informed decisions.

Summary

I don’t want to give the impression that I’m criticising Nicholas or arguing against the concept of Tenets or Principles or Axioms of testing. All I have tried to do is offer reasonable criticism of the Tenets to show that a) it is extremely difficult to postulate bullet-proof Tenets, Principles or Axioms, and b) it is extremely easy to criticise such efforts by:

  • Exposing flaws in the language used and the logic in an argument that C follows B follows A etc.
  • Identifying implicit assumptions of meaning, scope, dependency and authority; and
  • Offering examples of context that contradict, or expose flaws in, the statements made.

I do this because I have been there many times since 2008 and occasionally have to defend the Test Axioms from such criticisms. I have to say, Critical Thinking is STILL a rare skill – I wish criticism were more often proffered as a result of it.

References

  1. The Tester’s Pocketbook – http://testers-pocketbook.com/
  2. Axioms of Testing website – https://testaxioms.com/