SQL Aggregate Functions Explained
This tutorial is part of the SQL Made Easy series of posts.
In this tutorial we will discuss:
- What is an aggregate function in SQL?
- COUNT Function
- SUM Function
- MIN Function
- MAX Function
- AVG Function
- Combining SQL Aggregate Functions with GROUP BY
What are SQL Aggregate Functions?
When working with data in SQL, there are a lot of ways we can filter down the data to bring in exactly what we are after. This could be a range of dates, a specific set of characters in a string, the top selling products etc
Sometimes however we want to quickly look at a single value, or count of values to do some checks rather than look at full rows of data based on filters.
This is where SQL Aggregate Functions come in. There are 5 commonly used functions that we will look at in detail, each with their own specific use cases and output. These are:
Function | Description |
COUNT() | Counts the total number of rows |
SUM() | Adds together the total values in a column |
MIN() | Provides the lowest value in a column |
MAX() | Provides the highest value in a column |
AVG() | Calculates the average of all values in a column |
These aggregate functions can be used on their own, typically when working with a single column such as sales, but can also be paired with the GROUP BY statement to look at the aggregations of multiple factors.
As purely looking at the sum or max values in an entire table may not be super helpful in a lot of cases, the aggregations can be paired with additional filters within a WHERE statement to help refine the data we are after.
These queries are relatively straight forward to work with. The syntax is typically as follows:
SELECT AGGREGATE_FUNCTION_NAME(COLUMN_NAME) AS OUTPUTNAME
FROM TABLE
The most important thing to note with this type of query is that the column name we are looking to work with sits inside the parentheses that follows the aggregate function name.
As we are not actually pulling in a column but an aggregation of the column the query output will also have no column name, so to make things as easy and readable as possible it is also recommended to use an alias for the outputname.
Lets look at examples now of all 5 common SQL Aggregate Functions in action.
For the examples used in this tutorial we are going to look at a dataset of top selling video games that is available on Kaggle.
SQL Count Function
In our database of the top selling video games, let’s say we wanted to quickly see how many of the titles were published by Nintendo.
To get this figure a simple count function will do the job. As we do not need to specify a single column and are just looking for the number of records in general, we can simply use the * symbol inside the parentheses. The syntax is as follows.
SQL Query:
SELECT COUNT(*) AS Count_Of_Titles
FROM vgsales
WHERE Publisher = 'Nintendo'
Result:
Count_Of_Titles |
---|
703 |
SQL SUM Function
Next up lets have a look at the SUM function.
In this example, let’s say we wanted to have a look at the total number of global sales for all Racing games that sit within our database.
Here we use the SUM() function with the Global_Sales column included in the parentheses as we are only looking at this specific column, and give it a name such as ‘Sum_Global_Sales. This will look like the below.
SQL Query:
SELECT SUM(Global_Sales) as Sum_Global_Sales
FROM vgsales
WHERE Genre = 'Racing'
Result:
Sum_Global_Sales |
---|
732.04 |
SQL MIN Function
Rather than just counting the values or adding everything together, what if we wanted to see how far back this database goes in terms of year? Because our year column in this dataset is simply a 4 character number we can work with this column and the MIN() function to see what the lowest number appearing is. See below.
SQL Query:
SELECT MIN(Year) as Earliest_Year
FROM vgsales
Result:
Earliest_Year |
---|
1980 |
SQL MAX Function
To keep things simple, lets do the opposite here and find the most recent year represented in our table. To do this we simply run the same query but with the MAX() function instead.
SQL Query:
SELECT MAX(Year) as Latest_Year
FROM vgsales
Result:
Latest_Year |
---|
2020 |
SQL AVG Function
Finally, lets work with the AVG function to calculate averages.
Let’s say we wanted to see what the average sales were like for the Xbox 360 globally. Here we will use the AVG() function and global_sales, while also using a filter on platform to ensure we are only averaging out the numbers for X360. See below.
SQL Query:
SELECT AVG(Global_Sales) as Average_X360_Sales
FROM vgsales
WHERE Platform = 'X360'
Result:
Average_X360_Sales |
---|
0.774671937 |
Note that these numbers are in millions so the console wasn’t really averaging less than 1 sale per game! There are ways around this using multiplication operators which you can read about below, but for the sake of this tutorial let’s keep things simple
SQL Aggregate Functions and SELECT DISTINCT
We can also pair additional statements within the functions parentheses which can help provide a more accurate picture of the tables contents – for example only counting unique values.
Let’s say we want to see how many different platforms have titles in our database. We could try running a basic COUNT() against the platform column such as below:
SQL Query:
SELECT COUNT(Platform) as Platforms
FROM vgsales
Result:
Platforms |
---|
16598 |
Clearly there is something wrong here! This output has provided us with the total number of records full stop, rather than the number of unique consoles.
This is where combining our COUNT function with SELECT DISTINCT will do the job. For this to work we simply include DISTINCT inside of the parentheses. See below.
SQL Query:
SELECT COUNT(DISTINCT Platform) as Platforms
FROM PracticeDB.dbo.vgsales
Result:
Platforms |
---|
31 |
31 platforms sounds far more realistic!
SQL GROUP BY combined with Aggregate Functions
As we can see, all of the examples we have worked with provided a single number. If we are working with quick calculations to bring in a single figure that we need this is perfect, but in a lot of cases we could do with some more info.
This is where the GROUP BY statement comes in really handy.
The SQL GROUP BY statement comes after the final WHERE filter, and allows us to group together numbers based on another column. For anyone familiar with Excel – it is pretty much like making a pivot table.
For this to work we also need to include that additional column at the start of the SELECT query.
Lets look at an example.
Our database of top selling video games contains over 16,000 titles across and as we saw earlier, this is made up of 31 unique platforms.
What if we wanted to quickly see how many appeared for each video game system? To get these figures we can use a quick COUNT() and GROUP BY query to bring in a simple table.
To get an idea on the most popular systems we can also pair this with ORDER BY in descending order against the count of titles column below, and limit to the top 10 records.
SQL Query:
SELECT Platform, COUNT(*) as Titles
FROM vgsales
GROUP BY Platform
ORDER BY Titles DESC
LIMIT 10;
Result:
Platform | Titles |
---|---|
DS | 2163 |
PS2 | 2161 |
PS3 | 1329 |
Wii | 1325 |
X360 | 1265 |
PSP | 1213 |
PS | 1196 |
PC | 960 |
XB | 824 |
GBA | 822 |
This sums up the most common examples of SQL Aggregate Functions. These can be very handy to pull in quick figures or create tables that can be exported and used in reporting.
For more tutorials in the SQL Made Easy series check out the below: