# Step-by-Step Guide: (Multi) Linear Regression

## Overview

In this step-by-step guide, you will learn how to design a multiple linear regression and interpret its results.

Note that your results always depend on the variables in the model. Removing or adding variables will change your outcomes. For this reason, we will check the assumptions at the end, once the model is final. Do make sure to check your assumptions, and use an alternative test if they are not met.

## Dataset & Variables

Skoczylis, Joshua, 2021, "Extremism, Life Experiences and the Internet", https://doi.org/10.7910/DVN/ICTI8T, Harvard Dataverse, Version 3.

Dependent Variable(s): Continuous

Extremism Score Scaled: Measures an individual's extremism score on a scale of 1 to 10

Independent Variable(s): Continuous/Ordinal/Nominal

Age: Measures a participant's age

Social Media Use: Measures a participant's use of social media (scores are standardised)

Highest Qualification: Measures a participant's level of qualification

Household Income: Measures a participant's income on an ordinal scale

Strain: Measures a participant's level of strain (variable based on EFA; scores are standardised)
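Two of the variables above are described as standardised. As a minimal sketch of what that usually means, the snippet below converts raw scores into z-scores. The raw values are hypothetical, and the use of the sample standard deviation is an assumption; the dataset itself may have been standardised differently.

```python
from statistics import mean, stdev


def standardise(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD, with n - 1 in the denominator (an assumption)
    return [(v - m) / s for v in values]


# Hypothetical raw scores, not taken from the dataset
raw_scores = [2.0, 4.0, 4.0, 5.0, 10.0]
z_scores = standardise(raw_scores)
print([round(z, 3) for z in z_scores])
```

After standardising, the scores have a mean of 0 and a standard deviation of 1, so a coefficient for such a variable describes the change in the outcome per one standard deviation.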

## Linear Regression: Step-by-Step Video Guides

The video below takes you through how to run a linear regression in Jamovi.

Often you will have more than one independent predictor variable. You can just add these into your model, or you can use something called Block-wise entry. This tells you whether the additional variables you have entered make a difference to your overall model.

Once you are happy with your model, you will need to check whether the regression assumptions are met. If they are not, consider adding additional explanatory variables. But don't just add variables for the sake of it.

If the assumptions are still not met, consider transforming your variables or using a logistic regression instead.

## Hypothesis for Linear Regressions

H0: There is no correlation between Age, Social Media Use, Household Income and Strain and a person's extremism score.

Ha: There is a significant correlation between Age, Social Media Use, Household Income and Strain and a person's extremism score.

## Linear Regression: Step-by-Step Guide

1.

Analyses > Regression

Navigate to Analyses > Regression > Linear Regression.

You should now be able to add all of the above variables into the boxes.

2.

Simple Linear Regression

Let's start with a simple linear regression.

Select Extremism Score Scaled and drag it into the Dependent Variable box. Then drag Age into the Covariates (independent variable) box.

Initial results should appear on the right.

Let's add some additional information to our results. Navigate to Model Fit, deselect R and select R². The results should appear in your table instantly.

Now navigate to Model Coefficients. Here we can add the confidence intervals for our model coefficients. You can also add standardised estimates with their confidence intervals.

Let's also quickly visualise the outcome.

Go to Estimated Marginal Means > Drag Age to Term > Select Marginal Means Plot & Marginal Means Table.
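To see what a simple linear regression estimates behind the menus, here is a minimal sketch of an ordinary least squares fit with one predictor. The ages and scores are hypothetical values made up for illustration, and the function name is not part of Jamovi.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: age and extremism score for five people
ages = [20, 30, 40, 50, 60]
scores = [5.0, 4.6, 4.5, 4.1, 3.8]

intercept, slope = fit_simple_ols(ages, scores)
print(round(intercept, 2), round(slope, 3))
```

The slope is the expected change in the score for a one-unit increase in age, which is what the Estimate column in the Model Coefficients table reports.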

3.

Simple Linear Regression Results

Let's quickly explore the results. As you can see, the R² is very low at 0.030. This means that 3% of the variance in the extremism score is explained by age. The F-test returns a p-value of <.001, indicating that this model is a significant improvement on a model with no predictor variables.
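As a rough sketch of where these two numbers come from, R² compares the leftover (residual) variation with the total variation, and the overall F statistic can be computed from R², the sample size n and the number of predictors k. The observed and fitted values below are hypothetical, and the formula for R² assumes a least-squares model with an intercept.

```python
def r_squared(observed, fitted):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total


def overall_f(r2, n, k):
    """F statistic for testing the model against one with no predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical observed scores and the values a fitted line predicts for them
observed = [5.0, 4.6, 4.5, 4.1, 3.8]
fitted = [4.98, 4.69, 4.40, 4.11, 3.82]

r2 = r_squared(observed, fitted)
f_stat = overall_f(r2, n=len(observed), k=1)
print(round(r2, 3), round(f_stat, 1))
```

Jamovi converts the F statistic into the p-value shown in the table; that step is left out here because it needs the F distribution.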

The coefficient for age is -0.016, and the confidence interval tells us we would expect it to lie somewhere between -0.020 and -0.011. Note that although the relationship is significant (p <.001), it is not particularly strong.
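A confidence interval like this is built from the estimate and its standard error. The sketch below uses a large-sample normal approximation, which is an assumption: Jamovi works with the t distribution, so its limits can differ slightly. The standard error is a hypothetical value for illustration; read the real one from the SE column of your own table.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Approximate interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Estimate taken from the text; the standard error is hypothetical
lower, upper = confidence_interval(estimate=-0.016, standard_error=0.0023)
print(lower, upper)
```

If the interval does not include zero, the coefficient is significant at the matching level, which is consistent with the p-value reported above.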

In brief, this means that as age goes up, we expect a person's extremism score to go down.

The plot confirms the above. Note that the plot also shows you the negative slope and the confidence intervals.

A table with the Marginal means has also been included.

It is worth noting that at this point we have not checked the assumptions, so we don't know how reliable our model is.

As indicated above, this simple model does not explain a lot of the variance of the extremism score. In the next steps, we will use Block-wise-entry to add additional explanatory variables and see if we can improve our model.

Once we are happy with our model, we will check our assumptions.

## Linear Regression: Block-wise-entry

1.

Add Additional Independent Predictor Variables

Let's see if we can improve our model by adding the following variables to our model:

Gender, Household Income, Strain, Social Media Use and Highest Qualification.

The results of the regression appear on your right. You can skip the Block-wise-entry below if you wish and go straight to the full results.

Block-wise-entry allows you to check whether adding the new variables has improved your model's predictive power or not.

2.

Block-wise-entry

Navigate to Model Builder.

Create a second Block

Aside from Age, move the predictor variables from Block 1 into Block 2.

This allows us to compare our simple linear regression model with our new multiple regression model.

On your right you should now be able to select between Model 1 (your simple linear regression) and Model 2 (your multiple regression).

It is worth noting that you can also add interaction terms here. In a regression, an interaction means that the effect of an independent variable on the dependent variable depends on the value of another predictor variable.
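In practice, an interaction term is simply the product of two predictors, entered as an extra column in the model. The sketch below illustrates this with NumPy on hypothetical, noise-free data, so the fitted coefficients recover the values used to build the scores. The variable names echo this guide, but the numbers are made up.

```python
import numpy as np

# Hypothetical standardised predictors for six people
social_media = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
strain = np.array([0.5, -1.0, 1.0, 0.0, -0.5, 1.5])

# The interaction term is the element-wise product of the two predictors
interaction = social_media * strain

# Hypothetical scores built from known coefficients (no noise added)
score = 4.0 + 0.5 * social_media + 0.3 * strain + 0.2 * interaction

# Design matrix: intercept, two main effects, and the interaction
X = np.column_stack([np.ones_like(social_media), social_media, strain, interaction])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)
print(np.round(coefs, 3))
```

A non-zero interaction coefficient means the slope for social media use changes with the level of strain, and vice versa.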

3.

Compare your models

You should now be able to compare the two models. What we really want to know is whether adding the new variables has improved our model or not.

Before comparing the two models we will also add some additional information in the Model Fit.

In that section, under Overall Model Test, select F Test.

In the two tables below, we get all of the information about the two models.

The first table, Model Fit Measures, tells you the R², the Adjusted R² and the p-value for both models.

We can see that the Adjusted R² has improved from 0.040 for the first model to 0.099 for the second, and that both models are significant.
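The adjusted version is used for this comparison because plain R² never goes down when you add predictors. A minimal sketch of the usual adjustment is shown below; the R², sample size and number of predictors are hypothetical values, not figures from this dataset.

```python
def adjusted_r_squared(r2, n, k):
    """Penalise R-squared for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R-squared of 0.10 from 101 people and 5 predictors
adjusted = adjusted_r_squared(r2=0.10, n=101, k=5)
print(round(adjusted, 3))
```

With a large sample and few predictors the penalty is small, so the two values will be close.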

The second table, Model Comparison, tells us whether there is a significant difference between the two models. We get a p-value of <.001, which indicates that adding the additional predictor variables has indeed improved our model.

If the p-value is above .05, it suggests that the new variables have a limited impact on your model. It is also worth noting that you can have as many Blocks (models) as you want.
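The comparison between two blocks is based on an F statistic for the change in R². The sketch below assumes the smaller model is nested in the larger one, and all inputs are hypothetical. Here k counts the coefficients in each model excluding the intercept, so a nominal variable contributes one per non-reference level.

```python
def f_change(r2_small, r2_large, n, k_small, k_large):
    """F statistic for the improvement of a larger model over a nested smaller one."""
    added = k_large - k_small  # number of coefficients added in the new block
    numerator = (r2_large - r2_small) / added
    denominator = (1 - r2_large) / (n - k_large - 1)
    return numerator / denominator


# Hypothetical values: 1000 people, 1 predictor in Model 1 and 6 in Model 2
f_stat = f_change(r2_small=0.03, r2_large=0.10, n=1000, k_small=1, k_large=6)
print(round(f_stat, 2))
```

A larger F means the added block explains more than would be expected by chance; Jamovi turns this statistic into the p-value in the comparison table.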

4.

Reference Levels

Now that we have created our model, let's set the reference levels for our ordinal and nominal variables.

Jamovi already sets them for you, however, depending on what you need, you may need to change them.

Navigate to the Reference Levels section. You can now change your reference levels. Let's change the reference level for Household Income to £10,000 to £14,000. This will change your results.

In the regression output, the reference levels are the levels your results are compared to.

Here you might also want to consider creating new variables to add to your model (for example, one that indicates whether a household's income is above or below the median).
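To make the role of a reference level concrete, here is a minimal sketch of dummy coding, the usual way a categorical predictor enters a regression. The category labels are hypothetical placeholders rather than the income bands in the dataset, and the function is an illustration, not what Jamovi runs internally.

```python
def dummy_code(values, reference):
    """Create one 0/1 indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    return [
        {f"{level} vs {reference}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical income bands for four people
income = ["low", "middle", "high", "middle"]
rows = dummy_code(income, reference="middle")
for row in rows:
    print(row)
```

People in the reference level score 0 on every indicator, so each coefficient is the difference between one level and the reference. Changing the reference level changes what every coefficient is compared with, which is why the results change.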

5.

Remove Variables that are not significant

Let's have a quick look at the p-values in our model. When a variable comes back with a non-significant p-value, consider removing it from your model.

Looking at the above results, let's remove Household Income and Highest Qualification from our model, as both have p-values higher than .05.

6.

Generate Plots

We can also generate plots to visually represent the outcomes.

Navigate to Estimated Marginal Means.

Drag up to three variables into each Marginal Means term.

Create more Terms for more graphs.

You can also select to add Marginal Means tables, which you can use in your work.

## Linear Regression: Interpret your Results

1.

Check your results

Finally, let's look at the results of our model. The Adjusted R² is not great: the model only explains around 8.6% of the variation.

Looking at the coefficients, we can see, for example, that for each one-unit increase in age the extremism score goes down by 0.014 (confidence intervals are also included in the table).

We can also see that women score, on average, 0.629 lower than men. Again, confidence intervals are shown and should be mentioned in your write-up.
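The two coefficients can be combined to compare two people, as long as the other predictors are held constant. The sketch below uses the estimates quoted above and assumes there are no interaction terms in the model. It only returns differences, because the intercept is not reported in this guide.

```python
AGE_COEFFICIENT = -0.014   # change in score per one-unit increase in age (from the text)
WOMAN_VS_MAN = -0.629      # difference between women and men (from the text)


def predicted_difference(age_difference, woman_vs_man=False):
    """Expected difference in extremism score between two otherwise similar people."""
    difference = AGE_COEFFICIENT * age_difference
    if woman_vs_man:
        difference += WOMAN_VS_MAN
    return difference


# A person 10 units of age older, same gender
older = predicted_difference(10)
# A woman 10 units of age older than a man
older_woman = predicted_difference(10, woman_vs_man=True)
print(round(older, 3), round(older_woman, 3))
```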

Notice that strain is no longer significant. This is noteworthy, as our theory indicates that it should be, so here you would have something to discuss.

Below we have generated a plot that displays the relationship between Extremism Score, Social Media Use, Strain and Gender.

As you can see, the extremism score goes up as Social Media Use goes up. You can also see that women have, on average, a lower extremism score than men.

## Linear Regression: Check your Assumptions

1.

Check your Assumptions

Navigate to Assumptions Checks.

Select all of the applicable tests. Your results will appear on your right.

As you can see from the output below, the normality assumption is violated. You can clearly see that the residual plots show residuals that are not randomly scattered, and that the points in the Q-Q plot do not follow the line.
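A Q-Q plot pairs each standardised residual, sorted from smallest to largest, with the value a normal distribution would produce at the same rank. The sketch below computes those pairs for a handful of hypothetical residuals. The plotting positions (i - 0.5) / n are one common convention and an assumption here; Jamovi may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical quantile, standardised residual) pairs for a Q-Q plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    ordered = sorted((r - m) / s for r in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical, right-skewed residuals
residuals = [-0.9, -0.7, -0.5, -0.4, -0.2, 0.0, 0.3, 0.9, 1.5]
points = qq_points(residuals)
for theoretical, observed in points:
    print(round(theoretical, 2), round(observed, 2))
```

If the residuals were normal, the two numbers in each pair would be close to each other, which is what "following the line" means in the plot.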

Multicollinearity, on the other hand, falls within accepted parameters.
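Multicollinearity is commonly summarised with the variance inflation factor (VIF): each predictor is regressed on all the others, and VIF = 1 / (1 - R²) of that regression. The sketch below is a minimal NumPy version on hypothetical data. It is an illustration of the idea, not Jamovi's own code.

```python
import numpy as np


def vif(predictors):
    """Return one VIF per column of a 2-D array of predictors (no intercept column)."""
    X = np.asarray(predictors, dtype=float)
    n, p = X.shape
    result = []
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all remaining predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coefs, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coefs
        r2 = 1 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        result.append(float(1 / (1 - r2)))
    return result


# Two hypothetical, fairly strongly correlated predictors
data = np.column_stack([[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
vifs = vif(data)
print([round(v, 2) for v in vifs])
```

A VIF of 1 means a predictor is unrelated to the others; the higher the value, the more its coefficient's variance is inflated by overlap with other predictors.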

2.

The Assumptions are not met

You now have a number of options:

Consider whether there are more predictor variables that can help predict your dependent variable. If so, add them to the model and re-check whether your assumptions are met.

Check your data for outliers that affect your data - transform variables where necessary.

Transform your variables using logs. This is not covered here, as interpretation is more difficult.

Are there any curvilinear relationships? If so, add them to the model.

Use a quantile or robust regression.

Transform your dependent variable into an ordinal or nominal variable and use a logistic regression.
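Of the options above, adding a curvilinear relationship is the easiest to illustrate: you enter the square of a predictor as an extra term alongside the predictor itself. The sketch below does this with NumPy on hypothetical, noise-free data, so the fit recovers the coefficients used to build the outcome.

```python
import numpy as np

# Hypothetical predictor and an outcome that first rises and then falls
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x - 0.5 * x ** 2

# Design matrix: intercept, linear term, and squared term
X = np.column_stack([np.ones_like(x), x, x ** 2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, linear, quadratic = coefs
print(np.round(coefs, 3))
```

A negative coefficient on the squared term describes an inverted-U shape. After adding such a term, re-check the residual plots to see whether the pattern has gone.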