Statistics with single samples!

In the second post of the series on HOW NOT TO SCREW UP BIOCOMPUTATIONAL RESEARCH, I continue with this great example of epic failure, Lin et al. (2014)[1]. Below I address another rather amazing, yet completely avoidable, deficiency of the paper: relying on single samples.

Recall that Lin et al. are trying to make the argument that the expression of genes in, e.g., brain is influenced more by the species (mouse or human) than by the tissue. In other words, they claim that human-mouse brain differences in gene expression are greater than, e.g., mouse brain-liver or human brain-liver differences.

numbers matter

The first hint of something very wrong with this paper is that … there is no table detailing the numbers of samples. I would expect, and it is very common, to see a supplementary table that says something like “mouse brain: 5 samples; human brain: 6 samples” or some such. But nary a table is to be found.

The only place where the number of samples per tissue and species might be INFERRED is under the “Noncoding Transcript Analysis” heading in the Supplementary Information section. I say “inferred”, because of course the section seems to pertain only to noncoding transcripts, so what about the ones that do code? One is left to … infer … that the same samples are probably being used throughout.

In short, nowhere in the paper is there a definitive statement about the number of samples per tissue. For a given species, they seem to be reporting results from more or less one sample per tissue. Yup, one. Think of this: it amounts to saying that the whole universe of variance in the expression of genes in organ X, across all the organisms that form the species, is properly represented by the organ of a single member of that species. Does anyone recall the notion from basic stats that a single observation need not sit anywhere near the mean of its distribution, and says nothing at all about its spread? How about the wisdom of drawing conclusions from N=1 samples?
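To make the N=1 problem concrete, here is a minimal Python sketch (mine, not anything taken from the paper). It assumes, purely for illustration, that the expression of a gene is normally distributed across individuals; the function names, the standard deviation, and the threshold are all hypothetical. It shows two things: with a single sample the within-group variance cannot be estimated at all, and two groups drawn from the very same distribution look "different" much more often when each group is a single sample.

```python
import random
import statistics


def within_group_variance(values):
    """Return the unbiased sample variance, or None when it cannot be estimated."""
    if len(values) < 2:
        # With one observation there are n - 1 = 0 degrees of freedom.
        return None
    return statistics.variance(values)


def false_difference_rate(n_per_group, sd=1.0, threshold=1.0, trials=10_000, seed=0):
    """Fraction of trials in which two groups drawn from the SAME distribution
    differ in mean by more than `threshold` (all settings are illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        if abs(statistics.fmean(a) - statistics.fmean(b)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("variance from one sample:", within_group_variance([5.0]))
    print("spurious differences, N=1:", false_difference_rate(1))
    print("spurious differences, N=6:", false_difference_rate(6))
```

Under these made-up settings, the spurious-difference rate at N=1 is far higher than at N=6, even though the two groups are identical by construction.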

Since the paper is making assertions at the level of entire species, and therefore needs to cover a broad spectrum of factors such as the age and sex of the sample donors, and since the paper is heavily statistically motivated, you would think it crucial to detail the properties of the samples and to describe whatever was attempted toward capturing that variance.

This is even more important for the human samples, since they were obtained from deceased individuals at (presumably) wildly different ages, a very different situation from the mouse samples (all sacrificed at 10 weeks). And yet, there is no table for sex, age distribution, and cause of death for the human samples. Imagine that.

batch effects rear their heads again

Now, let’s think whether any of this might cause a batch effect. Hmm, all of the mice (which are clones, btw) are at 10 weeks of age, whereas the humans can be expected to be all over the place, although we don’t know, ’cause, you know, that’s just not important. Furthermore, let’s see, did they control for sex? Well, no. They don’t tell us anything about sex. I guess it’s not something known to have a huge impact on the organism in all sorts of ways, right?

So let’s summarize:

  1. single sample per organ and species
  2. no control for age
  3. no control for sex

By “control”, I mean an attempt to defuse a potential source of batch effects. For example, sex-balanced sampling from a set of samples with a range of ages.
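As a rough illustration of that kind of control, here is a small Python sketch that picks the same number of donors from each sex and spreads the picks across the available age range. The donor records, the field names (`id`, `sex`, `age`), and the function name are all hypothetical; nothing here comes from the paper, and a real design would also have to weigh cause of death, tissue quality, and so on.

```python
from collections import defaultdict


def pick_balanced(donors, n_per_sex):
    """Pick n_per_sex donors from each sex, spread evenly across the age range.

    donors: list of dicts with hypothetical keys 'id', 'sex' and 'age'.
    """
    if n_per_sex < 1:
        raise ValueError("n_per_sex must be at least 1")

    by_sex = defaultdict(list)
    for donor in donors:
        by_sex[donor["sex"]].append(donor)

    chosen = []
    for sex in sorted(by_sex):
        group = sorted(by_sex[sex], key=lambda d: d["age"])
        if len(group) < n_per_sex:
            raise ValueError(f"not enough donors of sex {sex!r}")
        if n_per_sex == 1:
            indices = [len(group) // 2]  # take the middle of the age range
        else:
            # Evenly spaced positions from the youngest to the oldest donor.
            last = len(group) - 1
            indices = [(i * last) // (n_per_sex - 1) for i in range(n_per_sex)]
        chosen.extend(group[i] for i in indices)
    return chosen


if __name__ == "__main__":
    donors = [
        {"id": "F1", "sex": "F", "age": 25},
        {"id": "F2", "sex": "F", "age": 40},
        {"id": "F3", "sex": "F", "age": 70},
        {"id": "M1", "sex": "M", "age": 30},
        {"id": "M2", "sex": "M", "age": 55},
        {"id": "M3", "sex": "M", "age": 62},
        {"id": "M4", "sex": "M", "age": 80},
    ]
    for donor in pick_balanced(donors, 2):
        print(donor)
```

The point is not the particular selection rule, which is just one simple choice, but that the sex and age make-up of the samples is decided on purpose and can be written down in a table.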

Some might argue that controlling for the above biases would have involved a lot more sequencing, to the point of rendering the experiment economically unfeasible. Putting aside that this argument implies that it is OK to publish very weak results, that is not necessarily the case. There are designs that would have involved exactly the same amount of sequencing. For example, for a given organ and species, they could have extracted samples from a range of donors, measured ribosomal RNA concentrations from each, mixed those samples in a 1:1 molar equivalent manner, and sequenced a single library made from those mixes. It’s not ideal in various ways, but it wouldn’t have involved more sequencing and it would have provided a truer reflection of the variance of gene expression for that organ.
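The arithmetic behind such a 1:1 molar pool is simple enough to sketch in Python. The snippet below assumes the concentrations have already been measured and expressed in nM; the donor names, the numbers, and the function name are made up for illustration. Since 1 nM is 1 fmol/µL, the volume to take from each sample is just the target amount divided by that sample's concentration.

```python
def equimolar_volumes(concentrations_nM, fmol_per_sample):
    """Volume (in µL) of each sample that contains `fmol_per_sample` femtomoles.

    1 nM equals 1 fmol/µL, so volume = amount / concentration.
    """
    volumes = {}
    for name, conc in concentrations_nM.items():
        if conc <= 0:
            raise ValueError(f"concentration for {name!r} must be positive")
        volumes[name] = fmol_per_sample / conc
    return volumes


if __name__ == "__main__":
    # Hypothetical measurements for three donors of the same organ.
    measured = {"donor_A": 10.0, "donor_B": 20.0, "donor_C": 40.0}
    for name, microlitres in equimolar_volumes(measured, 100.0).items():
        print(f"{name}: take {microlitres:.2f} µL")
```

Each donor then contributes the same molar amount to the pool, so no single individual dominates the one library that gets sequenced.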

Incidentally, the converse scenario (sequencing a wide range of individual samples) would have brought the benefit of many more samples. With more than just 13 samples, they would have been able to control for sequencing batch effects, which addresses a comment made by Lin et al. that “No study design given the current constraints of multiplexing and lane organization can account for both primer index and lane effect simultaneously” (see the comment section in Gilad and Mizrahi-Man’s re-analysis paper).

so how did this get published?

One might wonder as to how one gets something like this published. In other words, how could the reviewers let this pass?

Cutting to the chase, the only plausible explanation as to how this was published is that the authors were given a pass by some (all?) of the reviewers and by the editor (whose function it is to make sure the reviewers are doing their job). It is likely that such a pass might not have been forthcoming for less prominent researchers from lower profile universities than Stanford.

Folks, let’s remember that such massive failures are not consequence-free. This “research” was paid for mostly with our tax money. Taxpayers are entitled to expectations as to how these funds are being spent.

Beyond the dollars, let’s also bear in mind that whatever scientific benefit might have ensued if funding had gone elsewhere is now lost, since funding is a zero-sum game: dollars spent on X cannot be spent on Y.

And that’s not even addressing the human cost that results from the introduction of such noise into the literature. We are trying to understand life and cure disease here, and that task is not helped when finite resources are being consumed in this way.

how to not mess up

So how does one help prevent the problem? Here are three things you can do to increase the quality of research products, whether yours or others’:

  1. as a scientist: when reading a paper, start with the Methods section. It should contain a wealth of details and specifications. If it doesn’t, something is potentially fishy. Good papers have a good Methods section. Simple as that.
  2. as a reviewer: insist on basic details in the Methods and Results sections. Basic questions should always be addressed (how many, what type, source). This is especially true for reagents. That includes insisting on catalog numbers for the latter, something which journals appear to increasingly request.
  3. as a lab director: follow the advice in (2). Make sure you ask, receive, and understand the details. It’s all details!


1. Lin S, Lin Y, Nery JR, et al.: Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci U S A. 2014; 111(48): 17224–17229.
