One of the most common questions people use statistics to help answer is “Are these groups the same or different?”

A between-subjects design compares different groups of people (or products, web pages, or anything else you can measure) to judge whether those groups are more likely the same or different on a given measure. That’s a bit abstract, so let’s work through an example.

One of the most common between-subjects tests is the A/B test. In User Experience (UX) studies, companies might test whether different webpage designs net them more revenue due to the placement of different banners. Their question, then, is “Are these two banner placements probably the same or different when it comes to generating revenue?”

When comparing two groups on a continuous measure, the statistical test is called a t-test. There are other kinds of t-tests, but they all generally follow the same format, and here I’m talking about what’s called the Independent Samples t-test.

# t-test

The t-test works by simply asking whether two groups are different enough from each other that it’s more likely to assume they come from different Populations than from the same Population (see Inference or Description for an overview of Populations and Samples). If the difference between the averages of these groups is larger than we would expect by sampling error alone, then we conclude that there’s probably a real difference between them.
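To make "different enough" concrete, here is a minimal sketch of the Independent Samples t statistic: the difference between the two group means divided by an estimate of sampling error. It assumes the pooled-variance (equal variances) form of the test, and the function name `t_statistic` and the revenue numbers are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Independent-samples t statistic using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances, weighted by their degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error


# Hypothetical revenue per visitor (in dollars) for two banner placements
banner_a = [12.0, 15.0, 11.0, 14.0, 13.0]
banner_b = [10.0, 9.0, 12.0, 8.0, 11.0]

t = t_statistic(banner_a, banner_b)
print(f"t = {t:.2f}")  # prints "t = 3.00"
```

The further the statistic is from zero, the harder it is to explain the difference between the group means by sampling error alone.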

The standard in psychology and many other sciences is to set the threshold for this decision at p < 0.05. What this means is that we would expect to randomly sample mean differences this big only 5% of the time (or less) if they were actually sampled from the same population. Therefore, we decide that it is more likely that the groups are actually drawn from different populations – we decide that the two groups are probably actually different.
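The logic behind that threshold can be illustrated with a small simulation. Note that this sketch is a permutation test, not the t-test itself: it pretends both groups come from the same population by shuffling the group labels, then counts how often a mean difference at least as big as the observed one shows up by chance. The function name `permutation_p_value`, the number of shuffles, the seed, and the revenue numbers are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Share of label shuffles whose mean difference is at least as big as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        # Shuffling mimics "both groups were sampled from the same population"
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_big += 1
    return at_least_as_big / n_shuffles


# Hypothetical revenue per visitor (in dollars) for two banner placements
banner_a = [12.0, 15.0, 11.0, 14.0, 13.0]
banner_b = [10.0, 9.0, 12.0, 8.0, 11.0]

p = permutation_p_value(banner_a, banner_b)
print(f"estimated p = {p:.3f}")
print("below the 0.05 threshold" if p < 0.05 else "not below the 0.05 threshold")
```

If the estimated proportion falls below 0.05, differences this big would be rare under the "same population" assumption, which is the situation in which we decide the groups are probably different.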

# ANOVA

Although the t-test is a very useful analysis, it is limited in a very important way. Specifically, the t-test works by subtracting the mean (average) of one group from the mean of the other group, and then dividing that difference by an estimate of sampling error (the standard error of the difference). A single subtraction can only compare two means at a time. So what happens if we have more than two groups?

# Problems with Between-Subjects Designs

Between-subjects designs are a very useful tool, but they have their limitations. First, you often need a lot of participants to reliably detect differences between your groups, especially small ones. If you have lots of data, though, that’s not really a big deal. A more important problem is that these designs assume the groups were roughly the same before the treatment, so it takes a logical leap to infer that the treatment is the only cause of any differences you observe between your groups. And because each person experiences only one condition, you can’t tell how a specific person changes in response to your conditions.

# Summary

Between-subjects designs are used to see whether two or more groups of things are more likely the same or different. You can use them to test whether different groups of your employees are paid differently, whether different banner placements on your website generate more revenue, or whether some teams work more hours than others.