SQL Wildcard Characters Explained
This tutorial is part of the SQL Made Easy series of posts.
In this tutorial we will discuss:
- What is a SQL Wildcard Character?
- Working with SQL Wildcard Characters using the LIKE operator
- Example SQL Queries
What is a SQL Wildcard Character?
Wildcard characters in SQL are a series of specific characters that are used within the LIKE operator as part of a WHERE filter. They are a great way to find specific characters and strings within a dataset where there may be variations that we aren’t completely aware of, or maybe there are 5-10 different variations we do know of but want to ensure we cover without adding a high number of additional ‘WHERE COLUMN =’ statements.
For more information on the various operators used in SQL see the following article: SQL Operators
There are a range of SQL wildcard characters available to use, however there are two that are far and away the most commonly used, especially by beginners. These are:
Character | Description |
% | Percent Symbol. Matches against a set of characters in a string, regardless of overall length |
_ | Underscore. Each underscore represents one character in the length of a string. Used when looking for a specific number of characters |
These symbols can be used in different positions of a statement, depending on what kind of output you are looking for.
Starting with the percent symbol, there are three commonly used approaches where the % symbol appears at the end of a string, beginning of a string, or wraps the string. Lets break this down, looking at finding data containing the letter L.
Wildcard Example | Description |
‘L%’ | Searches for values beginning with ‘L’ |
‘%L’ | Searches for values ending with ‘L’ |
‘%L%’ | Searches for values containing the letter ‘L’ anywhere |
Moving on to the underscore symbol, there are a few ways that this can be used. In these examples we are looking for ways to find a string containing ‘London’.
Wildcard Example | Description |
‘Lon___’ | Searches for values beginning with ‘Lon’ and contains three more characters |
‘Lon_on’ | Searches for values beginning with ‘Lon’, ending with ‘on’ and containing a single unknown value as the fourth character |
‘__ndon’ | Searches for two unknown characters at the beginning of a string, ending with ‘ndon’ |
The specific number of underscores is helpful here. If we ran with ‘Lon%’ it would include words such as ‘Long’ in the result, but knowing the word we are looking for is exactly 6 characters this helps us to eliminate results we do not want to appear.
Finally, the two symbols can work together in a statement. This is useful when you want to specify specific characters in something that might be part of a longer string. Using our London example, the overall string might be ‘London, United Kingdom’. The below combination will work here:
Wildcard Example | Description |
‘_ondon%’ | Searches for a string where the characters ‘ondon’ begin from the second character, but can contain an unspecified number of additional characters |
Using the SQL LIKE operator with wildcard characters
Let’s look at a couple of quick examples of the SQL Wildcard in action using full queries.
For the examples used in this tutorial we are going to look at a dataset of top selling video games that is available on Kaggle.
Let’s say we want to generate a list of the top 10 selling games that begin with ‘Mario’. We Would run the below query.
SQL Query:
SELECT Name, Platform, Year, Global_Sales
FROM vgsales
WHERE Name LIKE 'Mario%'
LIMIT 10;
Result:
Name | Platform | Year | Global_Sales |
---|---|---|---|
Mario Kart Wii | Wii | 2008 | 35.82 |
Mario Kart DS | DS | 2005 | 23.42 |
Mario Kart 7 | 3DS | 2011 | 12.21 |
Mario Kart 64 | N64 | 1996 | 9.87 |
Mario Party DS | DS | 2007 | 9.02 |
Mario Party 8 | Wii | 2007 | 8.42 |
Mario & Sonic at the Olympic Games | Wii | 2007 | 8.06 |
Mario Kart 8 | WiiU | 2014 | 6.96 |
Mario Kart: Double Dash!! | GC | 2003 | 6.95 |
Mario Kart: Super Circuit | GBA | 2001 | 5.47 |
Not every Mario game however begins with ‘Mario’ though – after all the above results didn’t include any ‘Super Mario’ games. In this case lets run with the % wildcard wrapping the name rather than just including it at the end.
SQL Query:
SELECT Name, Platform, Year, Global_Sales
FROM vgsales
WHERE Name LIKE '%Mario%'
LIMIT 10;
Result:
Name | Platform | Year | Global_Sales |
---|---|---|---|
Super Mario Bros. | NES | 1985 | 40.24 |
Mario Kart Wii | Wii | 2008 | 35.82 |
New Super Mario Bros. | DS | 2006 | 30.01 |
New Super Mario Bros. Wii | Wii | 2009 | 28.62 |
Mario Kart DS | DS | 2005 | 23.42 |
Super Mario World | SNES | 1990 | 20.61 |
Super Mario Land | GB | 1989 | 18.14 |
Super Mario Bros. 3 | NES | 1988 | 17.28 |
Mario Kart 7 | 3DS | 2011 | 12.21 |
Super Mario 64 | N64 | 1996 | 11.89 |
Much better! This gives a far more accurate picture of the top 10 selling Mario games as it includes any game with his name in the title no matter where it sits.
Lets have a look at an example now using the underscore character. A good use case for this could be looking at a specific set of years. What if we wanted to just look at the top selling games of the 1990’s? We know from our dataset that the year column is always 4 characters, so we can be specific with the use of a single underscore such as ‘199_’ as the fourth character can be anything from 0 – 9 in the data we are after. See below example.
SQL Query:
SELECT Name, Year, Global_Sales
FROM vgsales
WHERE year LIKE '199_'
LIMIT 10;
Result:
Name | Year | Global_Sales |
---|---|---|
Pokemon Red/Pokemon Blue | 1996 | 31.37 |
Pokemon Gold/Pokemon Silver | 1999 | 23.1 |
Super Mario World | 1990 | 20.61 |
Pokémon Yellow: Special Pikachu Edition | 1998 | 14.64 |
Super Mario 64 | 1996 | 11.89 |
Super Mario Land 2: 6 Golden Coins | 1992 | 11.18 |
Gran Turismo | 1997 | 10.95 |
Super Mario All-Stars | 1993 | 10.55 |
Mario Kart 64 | 1996 | 9.87 |
Final Fantasy VII | 1997 | 9.72 |
These SQL Wildcard characters can be very powerful when working with messy data. In the real world not everything will be as clean as a single column for city that can be filtered to ‘London’, or a set of years that are just 4 characters.
By using the % and _ symbols alongside the SQL LIKE operator, we can very easily filter down our data to rows containing the specific information we are looking for, while allowing for a lot of flexibility to take into account variations or even typos.
This sums up the most common use cases of the SQL Wildcard characters. As mentioned, there are a few others (such as [], ^ and -), but these two covered will be used the majority of the time.
For more tutorials in the SQL Made Easy series check out the below: