SQL Primary Key Explained
This tutorial is part of the SQL Made Easy series of posts.
When first learning about SQL and relational databases in general, one of the terms you are likely to come across when looking at the data is Primary Key. This is a really important concept to learn about early on as it will make life a lot easier when needing to work with multiple databases and tables.
What is a SQL Primary Key?
Put simply – a Primary Key in SQL is a column within a table that is the single identifier of records.
There are two key rules that must be met in order for a column to be a primary key –
- There can not be any duplicates. Each record must be completely unique.
- There can not be any NULL values. In order to be a key record there must be data present.
What is the benefit of a Primary Key?
There are a couple of reasons why having a primary key within a table is valuable. Lets look at both in detail.
Firstly, it helps when we are analyzing the data in a table and want to work with distinct values. Knowing that there is a column that is guaranteed to be a unique identifier can make life a fair bit easier if we include it in our queries. There is always the option of working with the SELECT DISTINCT function, but this requires us having a specific filter in mind to work with for that column, and will sometimes require aggregation of figures to work correctly.
If we have the primary key column in our table we can be confident that if this is included in a query as part of the SELECT statement, then there will be no missing or incorrectly aggregated records.
Secondly, it makes a world of difference when using joins. Working with relational databases will often require the joining of multiple tables together in order to collect all of the data needed for analysis, but there are challenges along the way when making sure that the results of the queries are correct.
One issue that can occur frequently is known as many to many. This concept isn’t an issue in itself, but for example, in SQL when you run a join query on a column that has duplicates in the selected column in the right table, this will result in the rows of the query result being duplicated as many times as needed to bring in the new values.
Working with primary keys helps prevent this from occurring as you can be confident that the values you are joining on are distinct by default.
What can be a Primary Key?
SQL Primary Keys can consist of just about anything! If you have a table with statistics for each state in the USA, where there is just one line per state then ‘State’ can be the primary key. If you have a table of total daily sales, with one row per day, then ‘date’ can be the primary key. In some cases it can just be as simple as a column with record number – ‘Row ID’ for example.
SQL Primary Key vs Foreign Key
Another term that you will likely come across when learning about SQL is a Foreign Key.
This concept isn’t as commonly discussed, but still important – especially when working with joins.
In short – a Foreign Key is a column in a table that can be joined on to a Primary Key in another table.
Let’s look at a quick example. Going back to our USA locations example, let’s say we have two tables. in Table 1 we have a table of every city in the USA, with statistics such as population. For this table ‘City Name’ will be the Primary Key, but we also have a column listing the state this city falls in.
Table 2 will be our table we discussed earlier that has 50 records – one for each state. This may include a range of state specific metrics. In this table ‘State’ is our Primary Key, but it also acts as a Foreign Key to our cities table as this is the column that we can join on.
In this example both columns have the exact same name – ‘State’ – but they do not need to match. The SQL syntax can handle this if they have different names. This is why marking something as a Foreign Key can be useful.
With this example we can easily join the columns from the State table on to the City table using this foreign key, and because the values in the State table are unique we will not have to worry about duplication occurring in our query result.
How do we create a Primary Key?
Finally, lets have a look at the SQL syntax required to set up a Primary Key.
If you are importing a flat file into a system such as SQL Server, the on screen wizard will often have a check box you can mark for Primary Key. In this case you need to ensure the values are unique in advance.
When creating a table from scratch with a single Primary Key column using SQL Syntax the code will look as follows:
CREATE TABLE table_name (
column1 datatype PRIMARY KEY
column2 datatype
);
Using the US States example with State name as the primary key, and an additional column for population, it would look like this:
CREATE TABLE US_States (
State VARCHAR (255) PRIMARY KEY
Population INT
);
If you already have a table that does not have a column defined as the Primary Key, you can easily update this using the ALTER command. Using the US State example again this would be as simple as the below:
ALTER TABLE US_States
ADD PRIMARY KEY(State);
Note that for this to work you need to make sure the column is already set to not allow NULLs, and each value must be unique.
This sums up our intro to Primary Keys. For more tutorials in the SQL Made Easy series check out the below: