SQL Aggregate Functions Explained

SQL Aggregate Functions Explained

This tutorial is part of the SQL Made Easy series of posts.

In this tutorial we will discuss:

  • What is an aggregate function in SQL?
  • COUNT Function
  • SUM Function
  • MIN Function
  • MAX Function
  • AVG Function
  • Combining SQL Aggregate Functions with GROUP BY

What are SQL Aggregate Functions?

When working with data in SQL, there are a lot of ways we can filter down the data to bring in exactly what we are after. This could be a range of dates, a specific set of characters in a string, the top selling products etc

Sometimes however we want to quickly look at a single value, or count of values to do some checks rather than look at full rows of data based on filters.

This is where SQL Aggregate Functions come in. There are 5 commonly used functions that we will look at in detail, each with their own specific use cases and output. These are:

FunctionDescription
COUNT()Counts the total number of rows
SUM()Adds together the total values in a column
MIN()Provides the lowest value in a column
MAX()Provides the highest value in a column
AVG()Calculates the average of all values in a column

These aggregate functions can be used on their own, typically when working with a single column such as sales, but can also be paired with the GROUP BY statement to look at the aggregations of multiple factors.

As purely looking at the sum or max values in an entire table may not be super helpful in a lot of cases, the aggregations can be paired with additional filters within a WHERE statement to help refine the data we are after.

These queries are relatively straight forward to work with. The syntax is typically as follows:

SELECT AGGREGATE_FUNCTION_NAME(COLUMN_NAME) AS OUTPUTNAME
FROM TABLE

The most important thing to note with this type of query is that the column name we are looking to work with sits inside the parentheses that follows the aggregate function name.

As we are not actually pulling in a column but an aggregation of the column the query output will also have no column name, so to make things as easy and readable as possible it is also recommended to use an alias for the outputname.

Lets look at examples now of all 5 common SQL Aggregate Functions in action.

For the examples used in this tutorial we are going to look at a dataset of top selling video games that is available on Kaggle.

SQL Count Function

In our database of the top selling video games, let’s say we wanted to quickly see how many of the titles were published by Nintendo.

To get this figure a simple count function will do the job. As we do not need to specify a single column and are just looking for the number of records in general, we can simply use the * symbol inside the parentheses. The syntax is as follows.

SQL Query:

SELECT COUNT(*) AS Count_Of_Titles
FROM vgsales
WHERE Publisher = 'Nintendo'

Result:

Count_Of_Titles
703

SQL SUM Function

Next up lets have a look at the SUM function.

In this example, let’s say we wanted to have a look at the total number of global sales for all Racing games that sit within our database.

Here we use the SUM() function with the Global_Sales column included in the parentheses as we are only looking at this specific column, and give it a name such as ‘Sum_Global_Sales. This will look like the below.

SQL Query:

SELECT SUM(Global_Sales) as Sum_Global_Sales
FROM vgsales
WHERE Genre = 'Racing'

Result:

Sum_Global_Sales
732.04

SQL MIN Function

Rather than just counting the values or adding everything together, what if we wanted to see how far back this database goes in terms of year? Because our year column in this dataset is simply a 4 character number we can work with this column and the MIN() function to see what the lowest number appearing is. See below.

SQL Query:

SELECT MIN(Year) as Earliest_Year
FROM vgsales

Result:

Earliest_Year
1980

SQL MAX Function

To keep things simple, lets do the opposite here and find the most recent year represented in our table. To do this we simply run the same query but with the MAX() function instead.

SQL Query:

SELECT MAX(Year) as Latest_Year
FROM vgsales

Result:

Latest_Year
2020

SQL AVG Function

Finally, lets work with the AVG function to calculate averages.

Let’s say we wanted to see what the average sales were like for the Xbox 360 globally. Here we will use the AVG() function and global_sales, while also using a filter on platform to ensure we are only averaging out the numbers for X360. See below.

SQL Query:

SELECT AVG(Global_Sales) as Average_X360_Sales
FROM vgsales
WHERE Platform = 'X360'

Result:

Average_X360_Sales
0.774671937

Note that these numbers are in millions so the console wasn’t really averaging less than 1 sale per game! There are ways around this using multiplication operators which you can read about below, but for the sake of this tutorial let’s keep things simple

SQL Operators

SQL Aggregate Functions and SELECT DISTINCT

We can also pair additional statements within the functions parentheses which can help provide a more accurate picture of the tables contents – for example only counting unique values.

Let’s say we want to see how many different platforms have titles in our database. We could try running a basic COUNT() against the platform column such as below:

SQL Query:

SELECT COUNT(Platform) as Platforms
FROM vgsales

Result:

Platforms
16598

Clearly there is something wrong here! This output has provided us with the total number of records full stop, rather than the number of unique consoles.

This is where combining our COUNT function with SELECT DISTINCT will do the job. For this to work we simply include DISTINCT inside of the parentheses. See below.

SQL Query:

SELECT COUNT(DISTINCT Platform) as Platforms
FROM PracticeDB.dbo.vgsales

Result:

Platforms
31

31 platforms sounds far more realistic!

SQL GROUP BY combined with Aggregate Functions

As we can see, all of the examples we have worked with provided a single number. If we are working with quick calculations to bring in a single figure that we need this is perfect, but in a lot of cases we could do with some more info.

This is where the GROUP BY statement comes in really handy.

The SQL GROUP BY statement comes after the final WHERE filter, and allows us to group together numbers based on another column. For anyone familiar with Excel – it is pretty much like making a pivot table.

For this to work we also need to include that additional column at the start of the SELECT query.

Lets look at an example.

Our database of top selling video games contains over 16,000 titles across and as we saw earlier, this is made up of 31 unique platforms.

What if we wanted to quickly see how many appeared for each video game system? To get these figures we can use a quick COUNT() and GROUP BY query to bring in a simple table.

To get an idea on the most popular systems we can also pair this with ORDER BY in descending order against the count of titles column below, and limit to the top 10 records.

SQL Query:

SELECT Platform, COUNT(*) as Titles
FROM vgsales
GROUP BY Platform
ORDER BY Titles DESC
LIMIT 10;

Result:

PlatformTitles
DS2163
PS22161
PS31329
Wii1325
X3601265
PSP1213
PS1196
PC960
XB824
GBA822

This sums up the most common examples of SQL Aggregate Functions. These can be very handy to pull in quick figures or create tables that can be exported and used in reporting.

For more tutorials in the SQL Made Easy series check out the below: