There are many methods of scientific reasoning. The previous post demonstrated a common form of abductive (abducktive?) reasoning that is very popular in the entertainment industry, known as the “Duck Test”: if it looks like a duck, floats like a duck, and quacks like a duck, then it is a duck. Abductive reasoning infers conclusion ‘A’ – “a duck” – from evidence ‘B’ – “looks, floats, quacks” – and thereby identifies the simplest plausible conclusion from the evidence. However, Boston Dynamics could create a roboduck that meets all of these criteria and more (eat, poop, fly, etc.), demonstrating that the method is not infallible. Sadly, the “Ark Test” fails at the second hurdle: “Noah’s Ark” floats like a lead balloon. The entertaining and compelling “Noah’s Ark” story is therefore exceptionally unlikely, and must be scientifically abandoned in favour of the more likely yet less entertaining “Debris Flow” theory.
Suppose you return from a weekend away to find your very nervous teenage offspring standing in front of your burnt-down house. “A comet hit it!” they wail. Occam’s razor (and abductive reasoning) states that the explanation with the fewest assumptions is usually correct, so you pat your kids on the head and commiserate: “How traumatic for you!”. Or not. While simple explanations are often preferable, parental experience suggests party + booze + fireworks is significantly more probable. Probabilities play a role in determining the truth: the combined probability of three assumptions (party!) can be significantly larger than that of a single, very improbable comet strike.
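To make that concrete, here is a back-of-the-envelope sketch; every number below is a made-up illustration of the reasoning, not a measurement:

```python
# Back-of-the-envelope comparison; all probabilities are invented for illustration.
p_party     = 0.30    # unsupervised teenager throws a party
p_booze     = 0.80    # alcohol turns up, given a party
p_fireworks = 0.10    # someone lights fireworks indoors, given booze
p_comet     = 1e-9    # a comet hits this particular house this particular weekend

p_party_story = p_party * p_booze * p_fireworks   # 0.024
print(f"party + booze + fireworks: {p_party_story:.3f}")
print(f"comet strike:              {p_comet:.0e}")
# Three fairly plausible assumptions multiplied together still dwarf
# one wildly improbable assumption.
```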
Suppose that a podcast proposes that Illuminati living in a moonbase have modified Nikola Tesla’s Teleforce Death Ray into a heat ray that is currently causing Climate Change. Sounds like the plot of a truly awful science fiction movie. The estimated probabilities (p) of the different premises are:
Illuminati moon base: p < 0.0001%
Nikola Tesla inventing Teleforce Death Ray: p = 50%
Illuminati scientists able to modify Death Ray into heat ray: p = 80%
Heat Ray causing Climate Change: p < 0.0001%
The combined probability of the podcast story is calculated by multiplying these probabilities, resulting in a very small number that indicates the story is exceptionally unlikely. Most podcast listeners will intuitively sense the unlikelihood of the low-probability premises and therefore find the theory crackpot and dull: we like to be entertained by creative stories and sudden plot twists, but they must be reasonably credible, i.e. p > 1%. Nikola Tesla famously claimed he had invented and built a Teleforce Death Ray, but no plans or prototypes were reportedly ever found. Still, Illuminati scientists modifying a Tesla invention sounds like the plot of an entertaining Dan Brown book, so it is a much better premise for a thriller (Based on a True Story!) than one involving Illuminati moonbases. Note that some will think an assigned probability of less than 1 in a million can still be embroidered into a “Scientists believe there’s a possibility …” story.
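For the record, this is what that multiplication looks like; the numbers are simply the estimates above, with “p < 0.0001%” read as an upper bound of 1 in a million:

```python
# The podcast story's combined probability: multiply the premise probabilities.
p_moonbase  = 1e-6    # Illuminati moon base (0.0001% taken as an upper bound)
p_teleforce = 0.50    # Tesla really invented the Teleforce Death Ray
p_heat_ray  = 0.80    # Illuminati scientists can turn it into a heat ray
p_climate   = 1e-6    # that heat ray is causing Climate Change

p_story = p_moonbase * p_teleforce * p_heat_ray * p_climate
print(f"p(story) = {p_story:.1e}")   # 4.0e-13: exceptionally unlikely
```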
Scientific reasoning uses high-probability (“true”) building blocks to reach a conclusion. For example, “All geologists drink beer” and “Koen is a geologist” can be combined to reach the conclusion “Koen drinks beer”. However, in the real world there’s always some wuss geologist who prefers shandies and Shirley Temples. While the premise “Koen is a geologist” is demonstrably true, the premise “All geologists drink beer” sounds dodgy, i.e. its probability is very likely less than 100%. The premise is testable (falsifiable) via a poll of geologists: only one geologist (the so-called anomaly or abnormality) has to admit to being a wuss to falsify a theory that was headed in the right direction. Suppose we change the reasoning to “95% of all geologists drink beer” and “Koen is a geologist”, therefore “Koen very likely drinks beer”. A much more satisfying and scientific conclusion is reached: the dodgy premise has been replaced by a high-probability building block derived from a measurement – the poll of geologists – and the overall conclusion has been qualified by weasel words that cater for the now demonstrably unlikely case that I’m a milksop. Overall, a similar approach is employed by major, less beverage-oriented scientific studies such as the IPCC report on climate change [1], which defines its weasel words based on probabilities:
In this Report, the following terms have been used to indicate the assessed likelihood of an outcome or a result:
Virtually certain: 99–100% probability
Very likely: 90–100%
Likely: 66–100%
About as likely as not: 33–66%
Unlikely: 0–33%
Very unlikely: 0–10%
Exceptionally unlikely: 0–1%
This practice will be adopted in all future posts.
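As a handy reference, the sketch below maps a probability to the corresponding IPCC wording. Note that the official ranges overlap (e.g. “very likely” sits inside “likely”), so picking the most specific term at each boundary is my own choice, not the IPCC’s:

```python
def likelihood_term(p):
    """Map a probability (0-1) to IPCC likelihood wording, most specific range first."""
    if p >= 0.99: return "virtually certain"
    if p >= 0.90: return "very likely"
    if p >= 0.66: return "likely"
    if p >= 0.33: return "about as likely as not"
    if p >= 0.10: return "unlikely"
    if p >= 0.01: return "very unlikely"
    return "exceptionally unlikely"

print(likelihood_term(0.95))    # "very likely": Koen very likely drinks beer
print(likelihood_term(4e-13))   # "exceptionally unlikely": the moonbase heat-ray story
```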
A standard Scientific Method does not exist, but until very recently Karl Popper’s method of “empirical falsifiability” was considered best practice by scientists. A theory or hypothesis is falsifiable if it can be demonstrated to be false by some kind of test. The classic example is the theory “All swans are white”, which can be – and was – demonstrated to be false by the discovery of black swans. Popper’s method encourages scientists to look for the evidence that falsifies their own theories. It’s no use looking for more white swans when evaluating the “All swans are white” theory, as that only feeds our confirmation bias. Instead we should be actively looking for non-white swans in order to formulate an improved theory, such as “All swans are white or black”. The main benefit of Popper’s method is that it makes scientific theories more robust and better able to predict the future, and therefore more useful to society.
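Popper-style falsification in miniature (my own toy example, with made-up field notes): a universal claim survives only until a single counterexample turns up.

```python
# One non-white swan is enough to sink a universal claim.
observed_swans = ["white", "white", "white", "black", "white"]   # made-up field notes

counterexamples = [colour for colour in observed_swans if colour != "white"]
if counterexamples:
    print(f"'All swans are white' is falsified: found a {counterexamples[0]} swan.")
else:
    print("Not falsified (yet), which is not the same as proven.")
```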
Popper’s scientific method works via the falsification of the null hypothesis, that is, the rejection of the claim that no relationship exists between the data sets being analysed. If this statement is confusing (rejection of no relationship?) then good, because then I won’t have pointlessly written up the following example, based on a true story (though the names have been changed etc. etc.). Suppose we have a married couple, say “Will” and “Hilaire”, and the latter accuses the former of an extra-marital relationship. Whether true or not, Will’s initial response to his accuser will likely be “Nuh-uh”. Will has stated a null hypothesis: there is no (extra-marital) relationship. Will’s next line is “Why would you ever think something like that, Sweetums?”, that is, finding out how much Sweetums has ferreted out: a call for evidence. The evidence in turn can be assigned a probability of being true. If the evidence consists of “I read it on the Drudge Report”, then the null hypothesis “There is no extra-marital relationship” cannot be rejected, as internet news sources often “embellish” the truth in order to get more clicks, that is, they are often (scientifically) unreliable. In science words, there is a significant, non-negligible probability that internet news items are false. If however the evidence consists of Will’s semen stains on a dress, then the null hypothesis “Nuh-uh” should be rejected, and the conclusion reached that it is “very likely” or “virtually certain” that Will had an extra-marital affair. The process is very similar to the US justice standard that calls for evidence to find a defendant guilty “beyond a reasonable doubt”: the null hypothesis “the defendant is not guilty” must be rejected by a convincing amount of evidence.
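For readers who prefer numbers to marital drama, here is a minimal sketch (my own example, with two invented data sets) of testing a “no relationship” null hypothesis by permutation: shuffle one variable many times and see how often chance alone produces a correlation as strong as the one observed.

```python
import random

# Null hypothesis: no relationship between beer consumption and rock collecting.
beers_per_week  = [2, 5, 1, 7, 4, 6, 3, 8]        # made-up data
rocks_collected = [10, 22, 6, 30, 18, 25, 12, 33]  # made-up data

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
observed = corr(beers_per_week, rocks_collected)

# How often do shuffled (random) pairings produce a correlation this strong?
shuffles = 10_000
as_extreme = sum(
    abs(corr(beers_per_week, random.sample(rocks_collected, len(rocks_collected)))) >= abs(observed)
    for _ in range(shuffles)
)
p_value = as_extreme / shuffles

print(f"observed correlation = {observed:.2f}, p-value = {p_value:.4f}")
# A tiny p-value means the "no relationship" claim ("Nuh-uh") should be rejected;
# a large one means it cannot (yet) be rejected.
```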
A scientific conclusion is reached when evidence indicates that the probability of the null hypothesis being true is so low – a cut-off level of 5% or 1% is often used – that it can be rejected. Note that mistakes are common: lotteries, casinos, and bookies make a comfortable living out of the fact that most humans (including scientists!) are terrible at estimating probabilities. A type I error (false positive) is the mistaken rejection of a null hypothesis that is actually true. For example, KFC recently sent out an ad urging their German customers to celebrate Kristallnacht with Cheesy Chicken (BBC), suggesting their AI ad-bot needs to learn a bit more about how German customs and traditions have changed since 1938. The null hypothesis “This ad will not increase sales” should not have been rejected. A type II error (false negative) occurs when a null hypothesis is not rejected even though it is actually false, i.e. when we fail to conclude that an actual relationship exists. Examples include Decca executive Dick Rowe’s dismissal of the Beatles (“Groups of four guitarists are on the way out.”) and Monarch executive Nate Duroff’s dismissal of Elvis Presley, because he knew for a fact that Western and Hillbilly music sales “stink”. In the case of Will and Hilaire, a type I error could have been made if some malicious individual had gotten hold of some of Will’s semen and smeared it on a dress, though this low-probability scenario is the stuff of Hollywood fiction (Presumed Innocent) and so-called “conspiracy theories”, discussed in the next post.
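To see how easily a type I error can happen even when everything is done by the book, here is a quick simulation of my own, with a fair coin standing in for a true null hypothesis: at a 5% cut-off, roughly one experiment in twenty rejects a null hypothesis that is perfectly true.

```python
import random
from math import erf, sqrt

# Every experiment below uses a genuinely fair coin, so the null hypothesis
# "the coin is fair" is always true; rejections are type I errors by construction.
random.seed(42)
ALPHA, EXPERIMENTS, FLIPS = 0.05, 10_000, 100

def p_value_fair_coin(heads, flips):
    """Crude two-sided p-value for 'the coin is fair' (normal approximation)."""
    z = abs(heads - flips / 2) / sqrt(flips / 4)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

false_positives = sum(
    p_value_fair_coin(sum(random.random() < 0.5 for _ in range(FLIPS)), FLIPS) < ALPHA
    for _ in range(EXPERIMENTS)
)
print(f"type I error rate: {false_positives / EXPERIMENTS:.1%}")   # in the ballpark of 5%
```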
References:
[1] Cubasch, U., et al., 2013, Introduction. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change