Tutorial 4:
Tests of Difference
Overview
In this tutorial you will learn about tests of difference. As the name suggests, these tests show you whether there is a difference between two or more groups.
The most common types of test here are:
Independent t-tests (Student's) and their alternatives (Welch's t-test and the non-parametric Mann-Whitney U)
Paired Sample t-tests and their non-parametric alternative (Wilcoxon signed-rank)
One-way ANOVAs and their non-parametric alternative (Kruskal-Wallis)
Repeated Measures ANOVAs and their non-parametric alternative (Friedman)
Factorial ANOVA
Please note that this section focuses on the Independent Samples t-test, One-Way ANOVA and Factorial ANOVA, although we will provide you with a quick overview of what paired and repeated measures tests are.
The concept is actually relatively simple. As suggested in the title of this tutorial, the purpose of these tests is to see whether or not the outcome differs between two or more groups. Note that the Dependent Variable must be continuous (it can be ordinal if you are using non-parametric tests).
In addition to these tests, we will introduce you to Effect Size. Effect Size allows you to see how big the difference is, rather than just saying a difference exists.
Additional Learning Materials
EASY - Davis, C (2019) Statistical Testing with Jamovi and Jasp: Open Source Software. Vor Press. Read: Chapter 6 & Chapter 10
EASY - Rowntree, D (2018) Statistics without Tears. Penguin Books. Read: Chapter 6 & 7
MODERATE - Frost, J (2020) Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions. Read: pp. 43-76 & 197-234
HARD - Navarro, D & Foxcroft, D (2022) Learning Statistics with Jamovi. Read: Chapter 11, 14 & 15
Variables required for tests of difference
Independent t-tests:
Dependent Variable: A continuous variable (can be ordinal if using a non-parametric test)
Independent Variable: This should be a nominal variable with 2 levels (e.g. Male/Female or Yes/No)
ANOVAs:
Dependent Variable: A continuous variable (can be ordinal if using a non-parametric test)
Independent Variable: This can be a nominal or ordinal variable and should have 3 or more levels (e.g. favourite colour, Ethnicity, etc).
Concept: Tests of Difference
Tests of difference allow you to check whether there are differences in Mean between two or more groups. Once you have selected your variables, the Stats program of your choice will allow you to check whether this difference is significant and how big the effect is.
T-Tests and the non-parametric alternatives allow you to test whether there is a difference between 2 groups only (e.g. Male/Female; Degree/No Degree).
ANOVAs and the non-parametric alternatives allow you to test whether there is a difference between 3 or more groups (conceptually this is similar to running a series of t-tests, but in a single overall test).
A Factorial ANOVA (called a two-way ANOVA when there are two factors) is an ANOVA that allows you to use more than 1 categorical independent variable. As well as the effect of each variable, this ANOVA will give you results about the interaction between the categorical variables. For example, what impact do Gender and Education have on racism? That said, consider using linear regression instead (we will cover it in more detail in a later tutorial). More details can be found in the step-by-step tutorial.
Note: Independent v Paired/Repeated Measures tests
In Social and Political Sciences you will mostly use independent t-tests and one-way ANOVAs. These tests assume that the groups (levels) are independent of one another.
Independent T-Test/ANOVAs: The variable extremism measures an individual's extremism score. You would expect the scores of men and women to be independent of each other.
That said, sometimes you may want to compare continuous variables that are paired (not independent from each other).
Paired T-test/Repeated Measures ANOVA: You run an assessment, then provide extra support and run the assessment again. Now you want to check whether your intervention has made a difference. The observations are paired (two scores from the same person) rather than independent from each other.
The Jamovi step-by-step guides only cover tests that assume your data is independent. More details on the Paired T-Test and Repeated Measures ANOVA are available on the links.
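To make the difference between paired and independent tests concrete, here is a minimal Python sketch using only the standard library. The scores are made up for illustration, and the function names `paired_t` and `welch_t` are our own rather than anything from Jamovi. It calculates the t statistic twice on the same numbers: once treating them as before/after pairs, and once (wrongly, for this design) treating them as two unrelated groups.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic for paired data: tests the mean of the per-person differences."""
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def welch_t(group_a, group_b):
    """t statistic for two independent groups (Welch's version)."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.mean(group_b) - statistics.mean(group_a)) / math.sqrt(var_a + var_b)


# Hypothetical assessment scores for the same five students,
# before and after the extra support.
before = [52, 60, 45, 70, 58]
after = [55, 64, 47, 75, 61]

print("Paired t:     ", round(paired_t(before, after), 2))
print("Independent t:", round(welch_t(before, after), 2))
```

With these made-up scores the paired statistic is much larger (about 6.7 versus about 0.5). Every student improved by a few points, which the paired test picks up, while the independent version is swamped by the large differences between students.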
Tests of difference assumptions:
Before we discuss these tests in more detail, it is important to cover assumptions. T-Tests and ANOVAs have certain assumptions which should be met. If the assumptions are not met, you should use a non-parametric test or a robust test.
Don't worry, most stats programmes allow you to check the assumptions easily. The assumptions are:
Normality
Homogeneity of variance
The assumptions for T-Tests and ANOVAs are the same.
It is worth noting that if your sample size is relatively small (<50), then you should use a non-parametric test instead. Non-parametric tests and robust tests do not have the above assumptions.
Robust tests deal with the outliers within the data, so if your data has extreme outliers, a robust test may be the way forward rather than a non-parametric test.
Non-parametric tests are generally less powerful than parametric tests, so where the assumptions are met a parametric test should be used.
Let's look at each of the assumptions.
Normality:
Remember the Normal distribution? Tests of difference assume that your data is approximately normally distributed. When you check this assumption, the computer compares your data with the normal distribution and tells you whether there is a difference between the two distributions. Essentially, the following hypothesis is checked:
H0: There is no difference between the normal distribution and the sample data.
Ha: There is a difference between the normal distribution and the sample data.
As with everything else, the computer will spew out a p-value. If you get a value of <0.05, we reject H0 and assume that your data is not approximately normally distributed, so we use a non-parametric test instead.
Homogeneity of Variance:
Here the assumption is that the data for your groups have a similar spread and shape: the variance (and therefore the Standard Deviation) of each group should be approximately equal. So you shouldn't have one group with a positive skew and the other with a negative skew.
Again the computer checks the following hypothesis:
H0: There is no difference in the distribution and structure of the two sample distributions.
Ha: There is a difference in the distribution and structure of the two sample distributions.
Again, if you get a value of <.05, we reject H0 and assume that there is a difference, so the assumption is not met.
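Both assumption checks can be sketched in Python. This is only an illustration, not the Jamovi procedure: it assumes the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance (as provided by the SciPy library), and the scores are simulated rather than real survey data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # the usual cut-off used throughout this tutorial

# Simulated (not real) scores for two hypothetical groups.
rng = np.random.default_rng(42)
men = rng.normal(loc=50, scale=10, size=200)
women = rng.normal(loc=52, scale=10, size=200)

# Normality: Shapiro-Wilk test, run separately for each group.
normality_p = {}
for label, scores in (("men", men), ("women", women)):
    statistic, p_value = stats.shapiro(scores)
    normality_p[label] = p_value
    verdict = "not met" if p_value < ALPHA else "met"
    print(f"Normality ({label}): W = {statistic:.3f}, p = {p_value:.3f}, assumption {verdict}")

# Homogeneity of variance: Levene's test across the two groups.
levene_stat, levene_p = stats.levene(men, women)
verdict = "not met" if levene_p < ALPHA else "met"
print(f"Homogeneity: W = {levene_stat:.3f}, p = {levene_p:.3f}, assumption {verdict}")
```

In both cases a p-value below 0.05 means we reject H0, so the assumption is treated as not met.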
Independent T-Tests
As outlined above, t-tests allow you to see whether there is a difference between the means of two groups. You can easily visualise the outcomes of a t-test using density plots or box plots.
Which test to select:
Assumptions of Normality and Homogeneity met: Use a Student's T-Test (note some argue that we should use a Welch's Test as standard).
Assumption of Normality met, Assumption of Homogeneity not met: Use the Welch's Test.
Assumptions of Normality and Homogeneity not met: Use the Mann-Whitney U Test or a robust t-test.
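The selection rules above can be written as a small decision function. This is only a sketch: the function name `choose_two_group_test` is made up for illustration, and it assumes that whenever normality is not met you move to the Mann-Whitney U or a robust t-test (following the Normality section above), whatever the homogeneity result.

```python
def choose_two_group_test(p_normality: float, p_homogeneity: float, alpha: float = 0.05) -> str:
    """Pick a two-group test from the p-values of the two assumption checks.

    An assumption counts as met when its p-value is NOT below alpha.
    """
    normality_met = p_normality >= alpha
    homogeneity_met = p_homogeneity >= alpha

    if normality_met and homogeneity_met:
        return "Student's t-test"
    if normality_met:
        return "Welch's t-test"
    # Assumption: if normality fails we go non-parametric or robust,
    # regardless of the homogeneity result.
    return "Mann-Whitney U test or robust t-test"


# Hypothetical assumption-check p-values.
print(choose_two_group_test(0.20, 0.30))
print(choose_two_group_test(0.20, 0.01))
print(choose_two_group_test(0.0005, 0.0005))
```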
Independent T-Test Example
In the example below we will test the following hypothesis:
H0: There is no difference in the racism score of men and women
Ha: There is a difference in the racism score of men and women.
You can easily visualise the outcomes of a t-test using density plots or box plots. Note these are descriptive plots, so you should always get the statistics from the relevant test. As you can see from the graphs, the distribution of both groups does not look like the normal distribution, but the shape of the two levels does look somewhat similar. The box plot nicely summarises your data for both men and women.
Let's check the assumptions.
In our case neither assumption is met (both p < .001). This means we should use a Mann-Whitney U test or a robust t-test. Below is the output of the Mann-Whitney U test.
The p-value suggests that we should reject H0 (p < .001): there is a statistically significant difference between the racism scores of men and women.
But hold on a minute, remember confidence intervals? It is worth looking at them as they give us a good idea about the range of the difference. In this case the confidence interval ranges from -0.397 to 0. So within our 95% confidence interval, a difference of 0 is plausible. This is a strong indication that we should consider retaining H0 rather than rejecting it.
This is supported by the effect size of 0.157, which suggests that the effect is negligible (but more on the effect size below).
Why is this? The bigger the sample, the more likely you are to get statistically significant results even if the effect is extremely small.
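A quick calculation shows how sample size alone can push a tiny effect below p = .05. The sketch below is a simplification, not the Mann-Whitney U test used above: it assumes two equal-sized groups, a known standard deviation and a large-sample normal approximation. The function name `two_sided_p` and the effect of 0.05 standard deviations are made up for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p(effect_in_sd: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a difference in means.

    effect_in_sd is the difference between the two group means expressed
    in standard deviations; both groups have n_per_group observations.
    """
    standard_error = math.sqrt(2 / n_per_group)
    z = effect_in_sd / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny difference (0.05 standard deviations) at growing sample sizes.
for n in (50, 500, 5000, 50000):
    print(f"n per group = {n:>6}: p = {two_sided_p(0.05, n):.4f}")
```

With 50 people per group the p-value is about 0.80. With 5,000 per group the very same difference gives a p-value of about 0.012, which would count as statistically significant even though the effect has not changed.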
One-Way ANOVA
A one-way ANOVA is conceptually similar to a series of t-tests. The ANOVA allows you to check whether there is a difference in mean between three or more groups (the levels of a factor, e.g. a Likert scale variable).
An ANOVA returns an overall result; however, a significant result does not mean that all means are statistically different from each other. To find out where the differences are, you need to run a post-hoc test.
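One reason for running a single overall test followed by a post-hoc test, rather than lots of separate t-tests, is that the chance of at least one false positive grows with every extra comparison. The sketch below illustrates this under the simplifying assumption that the comparisons are independent of each other. The function names are made up for illustration.

```python
from math import comb


def pairwise_comparisons(k_groups: int) -> int:
    """Number of distinct pairs of groups that could be compared."""
    return comb(k_groups, 2)


def familywise_error(k_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all uncorrected pairwise tests.

    Assumes the comparisons are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** pairwise_comparisons(k_groups)


for k in (3, 5):
    print(f"{k} groups: {pairwise_comparisons(k)} comparisons, "
          f"chance of a false positive = {familywise_error(k):.3f}")
```

With 3 groups there are 3 comparisons and the chance of at least one false positive is about 14%. With 5 groups there are 10 comparisons and it rises to about 40%. Post-hoc tests adjust for this.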
Which Test and post-hoc test to select:
Assumptions of Normality and Homogeneity met: Use a one-way ANOVA and the Tukey post-hoc test.
Assumption of Normality met, Assumption of Homogeneity not met: Use a one-way ANOVA (Welch's version, which does not assume equal variances) and the Games-Howell post-hoc test.
Assumptions of Normality and Homogeneity not met: Use the Kruskal-Wallis Test and DSCF pairwise comparison or a Robust ANOVA.
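For readers who want to try the overall tests outside Jamovi, here is a minimal sketch using the SciPy library. It runs a classic one-way ANOVA (which assumes equal variances) and a Kruskal-Wallis test on the same data. The three groups are simulated and the group names are hypothetical. The post-hoc tests listed above are not covered by this sketch.

```python
import numpy as np
from scipy import stats

# Simulated (not real) scores for three hypothetical education groups.
rng = np.random.default_rng(0)
no_qualification = rng.normal(loc=3.0, scale=1.0, size=80)
secondary = rng.normal(loc=2.8, scale=1.0, size=80)
higher_education = rng.normal(loc=2.3, scale=1.0, size=80)

# Parametric: classic one-way ANOVA.
f_stat, p_anova = stats.f_oneway(no_qualification, secondary, higher_education)
print(f"One-way ANOVA:  F = {f_stat:.2f}, p = {p_anova:.4f}")

# Non-parametric alternative: Kruskal-Wallis test.
h_stat, p_kruskal = stats.kruskal(no_qualification, secondary, higher_education)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kruskal:.4f}")
```

Both tests only tell you that a difference exists somewhere between the groups. You still need a post-hoc test to find out which groups differ.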
ANOVA Example
Let's look at an example:
H0: Levels of education have no impact on a person's racism score
Ha: Levels of education impact a person's racism score.
Before we run the test, we must check the assumptions. It looks like the normality and homogeneity assumptions are violated. This means we should select either a Kruskal-Wallis or Robust ANOVA test. The output for both is below.
Both the Kruskal-Wallis and Robust ANOVA return a p-value of <.001. This suggests that we should reject H0 in favour of Ha.
The level of education appears to affect individuals' racism scores.
The post-hoc test allows us to explore where the differences are.
Looking at the table above, you can see that there are, for example, statistically significant differences between those with No Qualification and those with Higher Education (p = 0.011). We can, of course, display the above table visually.
For a more detailed video on how ANOVAs and T-Tests work, watch StatQuest.
Concept: Effect Size
What is Effect Size?
P-values tell you whether the relationship in your data is statistically significant. A significant p-value, however, does not tell you anything about the strength of the relationship, just that it is unlikely to have occurred by chance alone.
This can be problematic: with larger datasets you often get significant p-values even though the actual effect is marginal at best. This is where the effect size comes in. The effect size tells you about the magnitude of the relationship.
In the previous tutorial, you learned about Pearson's r, Spearman's rho and Cramér's V. Yes, these are correlation coefficients, but essentially they are effect sizes that tell you about the strength of the relationship.
It is worth noting that when you get an effect size that is very small, you should consider rejecting Ha even when the p-value is significant. Why? Because although the result is unlikely to have occurred by chance, the effect size indicates that the difference is too small to matter in practice.
Below are some guides that will help you interpret the Effect Size when using T-Tests and ANOVAs.
Interpreting Effect size: Cohen's d & Pearson's r
Cohen's d is the effect size used for T-Tests. You should already be familiar with Pearson's r (yes, it is a correlation, but it essentially tells you the size of the effect).
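Cohen's d is simply the difference between the two group means divided by the pooled standard deviation. The sketch below calculates it with the Python standard library. The data is made up, and the labels use Cohen's commonly quoted benchmarks (0.2, 0.5 and 0.8), which are an assumption here because this text does not list them.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_variance)


def label_d(d):
    """Label the size of d using Cohen's conventional benchmarks (an assumption here)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up scores for two hypothetical groups.
group_a = [4, 5, 6, 7, 8]
group_b = [3, 4, 5, 6, 7]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f} ({label_d(d)})")
```

For these made-up scores the means differ by 1 point and the pooled standard deviation is about 1.58, so d is about 0.63.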
Interpreting Effect size: Use Eta Squared or Omega Squared for ANOVAs
For ANOVAs you will either use (Partial) Eta Squared (denoted by η²) or Omega Squared (denoted by ω²). Omega Squared is less common, but the better alternative.
(Partial) Eta Squared and Omega Squared are interpreted the same way. Note: if you have more than one Independent Variable, use Partial Eta Squared rather than Eta Squared.
(Partial) Eta Squared / Omega Squared
No effect: <0.01
Small effect: 0.01 to <0.06
Medium effect: 0.06 to <0.14
Large effect: 0.14 or larger
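To show where these numbers come from, here is a minimal sketch that calculates Eta Squared and Omega Squared for a one-way design using the standard textbook formulas and only the Python standard library. The three groups are made up, and the function names are our own. The labels follow the table above.

```python
import statistics


def anova_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way design."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)

    # Variation between the group means and variation inside each group.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ss_total = ss_between + ss_within

    k = len(groups)
    ms_within = ss_within / (len(all_values) - k)

    eta_squared = ss_between / ss_total
    omega_squared = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_squared, omega_squared


def label_effect(value):
    """Label an effect size using the thresholds in the table above."""
    if value < 0.01:
        return "no effect"
    if value < 0.06:
        return "small"
    if value < 0.14:
        return "medium"
    return "large"


# Made-up scores for three hypothetical groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
eta_sq, omega_sq = anova_effect_sizes(groups)
print(f"Eta squared   = {eta_sq:.3f} ({label_effect(eta_sq)})")
print(f"Omega squared = {omega_sq:.3f} ({label_effect(omega_sq)})")
```

For these made-up groups Eta Squared is 0.5 and Omega Squared is about 0.31. Omega Squared is the smaller of the two because it corrects for the variation within groups.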