In this simple guide, we will run through how to find outliers in Excel using a few different methods. We will cover:
- Why we should find outliers in our excel datasets
- Find outliers using sort
- Find outliers using interquartile range
Why we should find outliers in our data
Excel can be a very powerful tool for data analysis. While using databases with tools such as SQL, R or Python is generally preferred for large datasets and more advanced methods, using a nice simple Excel spreadsheet can more than do the job in a lot of cases.
That being said, when working with data and using statistical methods to reach conclusions, we need to ensure that the dataset we are using in the first place is of a high quality. One thing that can impact our findings is outliers in our data – those few numbers that are simply far too low or far too high in comparison to the majority of other figures which can have an impact on our final output analysis.
This is where we need to find a nice easy way to remove this data before we begin any analysis.
In our guide, we will present two approaches – one that is quick but manual, and the other which involves a little more work but is great for larger datasets, or where we want to be one hundred percent sure a figure really is an outlier.
Lets start with the manual approach:
Finding Outliers in Excel using sort
Lets say we are working with a small dataset of under 50 rows and only a single column that we need to filter numbers out of. In this case it can be easier to just sort the numbers from smallest to largest, and have a quick scan to see if any of the lowest and highest seem obviously out of place.
For example, below we have a quick sample of numbers:
If we highlight all of these cells, and then under the data tab select Sort, we can choose to organise this column of numbers from smallest to largest:
This results in the below output:
Here we can quickly and easily pick out a few numbers that are abnormally low or high – in this case its pretty clear the bottom number is out of place.
That being said, a few aren’t necessarily easy to work out if they should be there. The top few are small, figures, but are single digit figures really small enough in this dataset to be outliers?
For examples like this, or when working with far larger datasets, this is where a statistical approach is a far better option.
Finding Outliers using Interquartile Range
Using interquartile range to calculate outliers in excel is easily the stronger of the two approaches, but it does involve a bit of a set up. First, we need to calculate 5 different numbers:
- First Quartile
- Third Quartile
- Interquartile Range
- Lowest acceptable number
- Highest acceptable number
From here we can either scan through each value to see if it sits above or below the acceptable range, or use another formula or conditional formatting to see which rows we need to eliminate.
Let’s create a quick table in Excel for the above 5 values, using the same dataset as above.
First, to calculate the first and third quartiles, we use the QUARTILE.INC function, followed by a number – in this case 1 and 3 for first and third.
The interquartile range is a nice easy formula where we simply subtract the third quartile from the first.
Next, to calculate the lowest and highest acceptable numbers before they become outliers, we use the following formula:
Lowest = First Quartile – (Interquartile Range * 1.5)
Highest = Third Quartile + (Interquartile Range * 1.5)
All of these formulas, along with the outputs can be seen below:
Now that we have our lowest and highest values to calculate outliers from, we have three ways we can find outliers
- Scan through each row and see if the number is higher or lower
- Use a formula
- Use conditional formatting
Scanning through each row is self explanatory, so lets start by looking at the formula. Here we can use an OR statement, and display a message on whether or not the value is in fact an outlier. In this case TRUE or FALSE. The OR formula is as follows:
Outputs using our dataset is below:
This nicely displays TRUE if a value is an outlier. What is really interesting here is 332 turns out to be an outlier, which wasn’t too obvious initially when scanning through it all.
While the TRUE and FALSE indication works very well, we can also create an output on top of the data itself that is more visual. The other approach is setting up conditional formatting. If we select Conditional Formatting in the home tab, and under the highlight cells rules for Greater than and Less Than type the relevant figures and colour you would like.
The output then displays as follows:
This provides a nice clear visual of which figures in our dataset are outliers. From here we can make a call to wipe them from our analysis.
This sums up our guide on how to find outliers in Excel. This now puts us in a position to conduct further analysis on our data with more confidence in the outputs with all outliers identified and filtered out.
For more handy guides on working with Excel, be sure to check out our Excel Tips page.