SQL Wildcard Characters Explained

SQL Wildcard Characters Explained

This tutorial is part of the SQL Made Easy series of posts.

In this tutorial we will discuss:

  • What is a SQL Wildcard Character?
  • Working with SQL Wildcard Characters using the LIKE operator
  • Example SQL Queries

What is a SQL Wildcard Character?

Wildcard characters in SQL are a series of specific characters that are used within the LIKE operator as part of a WHERE filter. They are a great way to find specific characters and strings within a dataset where there may be variations that we aren’t completely aware of, or maybe there are 5-10 different variations we do know of but want to ensure we cover without adding a high number of additional ‘WHERE COLUMN =’ statements.

For more information on the various operators used in SQL see the following article: SQL Operators

There are a range of SQL wildcard characters available to use, however there are two that are far and away the most commonly used, especially by beginners. These are:

CharacterDescription
%Percent Symbol. Matches against a set of characters in a string, regardless of overall length
_Underscore. Each underscore represents one character in the length of a string. Used when looking for a specific number of characters

These symbols can be used in different positions of a statement, depending on what kind of output you are looking for.

Starting with the percent symbol, there are three commonly used approaches where the % symbol appears at the end of a string, beginning of a string, or wraps the string. Lets break this down, looking at finding data containing the letter L.

Wildcard ExampleDescription
‘L%’Searches for values beginning with ‘L’
‘%L’Searches for values ending with ‘L’
‘%L%’Searches for values containing the letter ‘L’ anywhere

Moving on to the underscore symbol, there are a few ways that this can be used. In these examples we are looking for ways to find a string containing ‘London’.

Wildcard ExampleDescription
‘Lon___’Searches for values beginning with ‘Lon’ and contains three more characters
‘Lon_on’Searches for values beginning with ‘Lon’, ending with ‘on’ and containing a single unknown value as the fourth character
‘__ndon’Searches for two unknown characters at the beginning of a string, ending with ‘ndon’

The specific number of underscores is helpful here. If we ran with ‘Lon%’ it would include words such as ‘Long’ in the result, but knowing the word we are looking for is exactly 6 characters this helps us to eliminate results we do not want to appear.

Finally, the two symbols can work together in a statement. This is useful when you want to specify specific characters in something that might be part of a longer string. Using our London example, the overall string might be ‘London, United Kingdom’. The below combination will work here:

Wildcard ExampleDescription
‘_ondon%’Searches for a string where the characters ‘ondon’ begin from the second character, but can contain an unspecified number of additional characters

Using the SQL LIKE operator with wildcard characters

Let’s look at a couple of quick examples of the SQL Wildcard in action using full queries.

For the examples used in this tutorial we are going to look at a dataset of top selling video games that is available on Kaggle.

Let’s say we want to generate a list of the top 10 selling games that begin with ‘Mario’. We Would run the below query.

SQL Query:

SELECT Name, Platform, Year, Global_Sales
FROM vgsales
WHERE Name LIKE 'Mario%'
LIMIT 10;

Result:

NamePlatformYearGlobal_Sales
Mario Kart WiiWii200835.82
Mario Kart DSDS200523.42
Mario Kart 73DS201112.21
Mario Kart 64N6419969.87
Mario Party DSDS20079.02
Mario Party 8Wii20078.42
Mario & Sonic at the Olympic GamesWii20078.06
Mario Kart 8WiiU20146.96
Mario Kart: Double Dash!!GC20036.95
Mario Kart: Super CircuitGBA20015.47

Not every Mario game however begins with ‘Mario’ though – after all the above results didn’t include any ‘Super Mario’ games. In this case lets run with the % wildcard wrapping the name rather than just including it at the end.

SQL Query:

SELECT Name, Platform, Year, Global_Sales
FROM vgsales
WHERE Name LIKE '%Mario%'
LIMIT 10;

Result:

NamePlatformYearGlobal_Sales
Super Mario Bros.NES198540.24
Mario Kart WiiWii200835.82
New Super Mario Bros.DS200630.01
New Super Mario Bros. WiiWii200928.62
Mario Kart DSDS200523.42
Super Mario WorldSNES199020.61
Super Mario LandGB198918.14
Super Mario Bros. 3NES198817.28
Mario Kart 73DS201112.21
Super Mario 64N64199611.89

Much better! This gives a far more accurate picture of the top 10 selling Mario games as it includes any game with his name in the title no matter where it sits.

Lets have a look at an example now using the underscore character. A good use case for this could be looking at a specific set of years. What if we wanted to just look at the top selling games of the 1990’s? We know from our dataset that the year column is always 4 characters, so we can be specific with the use of a single underscore such as ‘199_’ as the fourth character can be anything from 0 – 9 in the data we are after. See below example.

SQL Query:

SELECT Name, Year, Global_Sales
FROM vgsales
WHERE year LIKE '199_'
LIMIT 10;

Result:

NameYearGlobal_Sales
Pokemon Red/Pokemon Blue199631.37
Pokemon Gold/Pokemon Silver199923.1
Super Mario World199020.61
Pokémon Yellow: Special Pikachu Edition199814.64
Super Mario 64199611.89
Super Mario Land 2: 6 Golden Coins199211.18
Gran Turismo199710.95
Super Mario All-Stars199310.55
Mario Kart 6419969.87
Final Fantasy VII19979.72

These SQL Wildcard characters can be very powerful when working with messy data. In the real world not everything will be as clean as a single column for city that can be filtered to ‘London’, or a set of years that are just 4 characters.

By using the % and _ symbols alongside the SQL LIKE operator, we can very easily filter down our data to rows containing the specific information we are looking for, while allowing for a lot of flexibility to take into account variations or even typos.

This sums up the most common use cases of the SQL Wildcard characters. As mentioned, there are a few others (such as [], ^ and -), but these two covered will be used the majority of the time.

For more tutorials in the SQL Made Easy series check out the below: