Just occasionally someone asks if everything we believe about causation-related science is wrong. This time, the use of the t-test is a cause of doubt.
In the interpretation of rat lab results, animal
experimenters use the t-test. The t-test, when used as originally designed,
compares the means of two data distributions. The standard deviation of each
distribution is first reduced to the standard error of the mean (SEM), and the
mean and SEM of one distribution are then compared with those of the other. If
p < 0.05 it is pronounced that the two distributions are probably different.
So, when comparing control animals with those dosed with a toxin, the t-test is
used to judge the likelihood that the toxin did anything.
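As a rough illustration of that procedure, here is a minimal Python sketch. The organ weights are invented for the example and the function names are my own; the unequal-variance (Welch) form of the statistic is an assumption, since the text does not say which variant experimenters use.

```python
import math
from statistics import mean, stdev

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

def t_statistic(a, b):
    """Difference between two means, in units of their combined SEM."""
    return (mean(a) - mean(b)) / math.sqrt(sem(a) ** 2 + sem(b) ** 2)

# Hypothetical organ weights in grams, one measurement per animal.
# These numbers are invented purely for illustration.
control = [10.1, 9.8, 10.4, 10.0, 9.7, 10.2]
dosed = [10.9, 11.2, 10.6, 11.0, 11.4, 10.8]

t = t_statistic(dosed, control)
print(f"t = {t:.2f}")
```

The p-value then comes from comparing t with the t distribution. The point to notice is that each group is a list of measurements, one per animal, which is what gives each group a mean and a standard deviation in the first place.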
The reason for doubt is that SEM comparisons are only valid
for true means. A single result, e.g. that 4 out of 90 rats developed lung
cancer, is not a mean. Despite this, experimental scientists use the t-test to
decide if 4/90 is different from 5/90. In fact the same experiment, when
repeated, has only a 20% chance of producing 4/90 in the control group on the
second occasion. A result of 5/90 is also a very likely finding. Despite this, animal
experimenters say that 5/90 is almost certainly a true difference from 4/90 (p
< 0.05). A more thoughtful analysis would say that 5/90 is not even probably
different from 4/90, let alone different with 95% certainty.
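The 20% figure can be checked with a few lines of Python. This sketch assumes each of the 90 animals independently develops a tumour at a true rate of exactly 4/90, so that the count follows a binomial distribution; the model and the function name are my assumptions for the purpose of the check.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k events in n independent trials at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 90
p = 4 / 90  # assumption: the true rate equals the observed control rate

p_four = binom_pmf(4, n, p)
p_five = binom_pmf(5, n, p)

print(f"P(exactly 4/90 on repeat) = {p_four:.2f}")  # 0.20
print(f"P(exactly 5/90 on repeat) = {p_five:.2f}")  # 0.16
```

On these assumptions a repeat of the control group returns 4/90 about one time in five and 5/90 about one time in six, with no toxin involved at all.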
This is a problem for causation arguments.
Surely, then, this problem has already been resolved,
and the t-test is understood by the court to be nothing more than an indicator
of a suspicion of marginal differences?
But if so, why do causation experts still make strong
statements about marginal results from animal tests? It is very easy to show
that 5/90 is not likely to be different from 4/90 no matter how much you reduce
the standard deviation to SEM. Precisely wrong is still wrong.
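One way to show it is with a standard test built for counts. The sketch below is Fisher's exact test for a 2x2 table, written with the Python standard library only. It is not my own 'probability of difference' test, and the function name and table layout are choices made for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x affected animals in the first
        # group, with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Rows are control and dosed; columns are tumour and no tumour.
p_value = fisher_exact_two_sided(4, 86, 5, 85)
print(f"p = {p_value:.2f}")
```

For 4/90 against 5/90 the p-value comes out at essentially 1, because no other split of nine tumours between two groups of 90 is more evenly balanced than 4 and 5.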
For a decade now, encouraged by insurance claims handlers, I’ve been using my own test of difference when evaluating animal lab results. Defendants could easily show the lack of relevance of marginal results from lab work. A ‘probability of difference’ test would actually be informative.
The background is explained in greater detail in an accompanying paper, in which the ‘probability of difference’ test is revealed to the wider public for the first time.
Is it time to reset the standard?