Get your Data Ready
Here you will learn the following:
How clean an existing dataset using Excel;
how to deal with missing values; and
how to create a Codebook using Excel.
Often you will already have a dataset (whether from Qualtrics, SurveyMonkey or MS Forms). Here we will teach you how to get your dataset ready for analysis. This is an extremely important step, but it can be very time-consuming.
If you need to create a dataset from scratch, then you can do so in Jamovi, this is covered here. Sometimes you may want to clean a dataset that on initial inspection looks ready. You can easily download your Jamovi dataset as a .csv file.
In the example below, we will be using a .csv file opened in Excel. However, you can use a multitude of programs such as Keynote. If you know how to code you can automate some of the data cleaning functions using R or Python (this is however not covered here). What your dataset looks like and how much cleaning is required depends on where your dataset is from.
For this example, you should have a basic knowledge of how to use Excel.
Skoczylis, Joshua, 2021, "Extremism, Life Experiences and the Internet", https://doi.org/10.7910/DVN/ICTI8T, Harvard Dataverse, Version 3.
Here we will be using the raw data for the dataset above. You can access the raw data here.
The code book available on the Harvard Dataverse will also be very helpful for this exercise here.
Cleaning your Dataset(s): Step-by-Step Guide
Using Find & Replace values
Filtering Data in Excel
Deleted Variables (columns) that are not required
Before you delete any variables make sure you do the following:
Save a copy of your Raw data - just in case something goes wrong
Go through your variables and decide which ones you will need and which ones you can delete.
The image below is raw data from Qualtrics. As you can see there are a lot of unnecessary variables.
Now select and delete all that are unnecessary or that you will not need.
Select Columns > Right-click > Delete
Rename Variables (where necessary)
Depending on the dataset you have got, you may want to rename your variables.
If you have conducted your own survey, many of the variables will have the Question as the column name. This is not great. Change them
e.g. 'What is your age' can be changed to 'Age' or
'Select your Gender' can be changed to 'Gender'
Keep your variables as short as possible - but give it a name that makes some sense.
Keep a record of the questions asked, as you can add that information into your codebook later or into the description box in Jamovi.
Delete Unnecessary rows (Header related)
Often you may have a number of rows, that give you information about your column that is unnecessary. Delete them, so that you are only left with one row that contains the Variable Name.
Select > RIght Click > Delete
Remove Observations (rows)
If your dataset is survey data, you will have many observations (rows) that are only partially completed. Most survey platforms will give you an extra column that tells you how much they have completed.
Decide on a cut of point and delete any observations below the agreed threshold in our case we have decided to remove observations that have completed less than 80%.
Navigate to the Data Tab > Select all your data
Toggle the Filter on
Navigate to your progress variable.
You can either sort it Ascending/Descending
Now highlight all rows that are below 80% and delete them.
Now delete the rows. Also delete any header rows that you no longer need. You should be left with only one header row.
When you download your data, make sure you look up what the missing value is. Often the missing value is -99, but do check.
This value indicates that an observation is missing. Jamovi recognises 'blanks' and 'na' as missing data. This means we have to convert the missing value of '-99' to 'na'.
You will notice that most datasets will also have blank cells - this may be because the participant was never shown certain questions. In some instances these may be treated as missing data - just leave it blank. In other instances, you may want to place different values in.
To do this do the following:
Navigate to the Search icon (top-right) and select the downward arrow
Select Replace. In the box add '-99' into the replace field and 'na' or 'blank' into the Replace with field
Click on Replace all.
All your missing values have now been replace with 'na'
Create a Codebook: Step-by-Step Guide
A codebook will allow everyone to make sense of your dataset. Codebooks should include the following:
list of the variable names
variable descriptions - this could be the questions
Type of data
Group details e.g. Male (0)/Female (1)
Having a good codebook is essential. The below video shows you how to easily create a codebook in Excel.
Creating a Codebook
by Courtney Donovan