SQL Primary Key: A Comprehensive Technical Tutorial

Understand what an SQL primary key is and its function in database relationships and query performance in this technical tutorial.
Aug 7, 2025  · 10 min read

In relational databases, the primary key plays a crucial role in ensuring the uniqueness and integrity of data within a table. A primary key is a constraint that uniquely identifies each record in a table, acting as a foundational element for indexing, enforcing entity integrity, and establishing relationships between tables.

In this tutorial, I’ll walk you through an in-depth exploration of SQL primary keys, covering core principles, implementation methods, and optimization techniques. I’ll also cover some common mistakes and best practices, as well as real-world scenarios. If you’re still learning SQL, I recommend you check out our comprehensive SQL roadmap and take our SQL Fundamentals skill track.

What is an SQL Primary Key?

An SQL primary key is a column or a set of columns that uniquely identifies each row in a table. This means that no two rows can have the same primary key value. A table can have only one primary key, which may consist of one or multiple columns.

Primary keys are vital for enforcing data integrity, supporting database normalization, and enabling efficient querying. Let’s look at some foundational aspects.

Key properties of primary keys

  1. Uniqueness: Ensures that every value in the primary key column(s) is unique.
  2. Non-nullability: Primary key columns cannot contain NULL values.
  3. Immutability: Ideally, primary key values should remain unchanged throughout their lifespan to avoid inconsistencies in referencing tables.

Constraints and indexing

When a primary key is defined, the database automatically creates a unique index to enforce the constraint. This index accelerates query performance, especially for lookup operations.

The query planner can use this index to locate a row by its key without scanning the whole table, so lookups and joins on the primary key typically run faster.
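As a small sketch of the automatic index, the statements below create a throwaway table and list its indexes through PostgreSQL's pg_indexes view. The table name pk_index_demo is hypothetical and is not part of this tutorial's dataset.

```sql
-- Hypothetical throwaway table, used only for this illustration
CREATE TABLE pk_index_demo (
    id   INTEGER PRIMARY KEY,
    note TEXT
);

-- List the indexes PostgreSQL created for the table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'pk_index_demo';

-- Clean up
DROP TABLE pk_index_demo;
```

With default naming, the result should include a unique index called pk_index_demo_pkey on the id column, even though no CREATE INDEX statement was issued.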

Common data types for primary keys

Primary keys can use several data types. Here are the most common ones:

  • INTEGER or SERIAL: Efficient and most commonly used.
  • BIGINT: Useful for tables expected to store billions of rows.
  • UUID: Good for distributed systems and avoiding sequential predictability.
  • CHAR/VARCHAR: Used cautiously, typically in natural keys where necessary.

SQL Primary Key Types and Architecture

Primary keys can be classified into various types based on their structure and origin. Some examples are:

  • Natural keys: Derived from existing data (e.g., Social Security Number, email) and are easy to understand and often meaningful to business users. However, this brings the risk of updates, duplication, and potential privacy issues.
  • Surrogate keys: Artificial keys, such as system-generated IDs, that are usually implemented with SERIAL, BIGSERIAL, or UUID. Using such keys offers stability, privacy, and easier maintenance.

The following table summarizes how they differ:

| Feature | Natural Key | Surrogate Key |
| --- | --- | --- |
| Business Meaning | Yes | No |
| Stability | Often unstable | Highly stable |
| Performance | May require large fields | Generally compact |
| Privacy Concerns | Higher | Lower |

Composite primary key architecture

A composite primary key combines two or more columns whose values, taken together, uniquely identify each row, even though no single column does so on its own.

Such keys are common in junction or associative tables where the relationship itself requires uniqueness.

  • Example: A table course_enrollments might use a composite key of student_id and course_id.

Considerations:

  • Minimalism: Avoid unnecessary columns in the composite key.
  • Complexity: Joins and foreign key references become more complex.
  • Index Size: Indexes can become large and affect performance.
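The course_enrollments example above could be sketched as follows. The students and courses tables and the enrolled_on column are assumptions added so the example is self-contained; only the table name and the two key columns come from the text.

```sql
-- Referenced tables reduced to their keys (hypothetical; real ones would carry more columns)
CREATE TABLE students (
    student_id SERIAL PRIMARY KEY
);

CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY
);

-- Junction table: the pair (student_id, course_id) is the primary key
CREATE TABLE course_enrollments (
    student_id  INT NOT NULL REFERENCES students(student_id),
    course_id   INT NOT NULL REFERENCES courses(course_id),
    enrolled_on DATE NOT NULL DEFAULT CURRENT_DATE,  -- hypothetical extra column
    PRIMARY KEY (student_id, course_id)
);
```

A student can appear in many rows and so can a course, but the same student cannot be enrolled in the same course twice.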

Additional keys used in databases

Besides the primary key, there are several other keys that can be used, such as:

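  • Candidate key: A minimal set of columns that uniquely identifies a row; removing any column from it would break uniqueness.
  • Super key: Any set of columns that uniquely identifies a row. Every candidate key is a super key, and adding extra columns to a candidate key yields another super key.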
  • Alternate key: A candidate key that is not chosen as the primary key.

If you’re deciding on which data field to use as your primary key, you can consider the following factors:

  • Data change frequency: How often does the data change?
  • Column data type and size: Is the data type suitable for indexing?
  • Expected query patterns: Will this field be suitable for querying?

Defining and Modifying Primary Keys in SQL

Next, let’s have a look at how we can declare and manage primary keys in PostgreSQL using SQL commands. 

We'll walk through various scenarios, including table creation, modifying existing tables, and managing constraint behavior.

Declaration and modification syntax

During table creation

The simplest option is to declare the primary key when you create the table. In this case, we’ll generate an employee ID that consists of unique integers.

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,  -- Automatically generates unique integers
    name TEXT NOT NULL,              -- Basic text field, required
    email TEXT UNIQUE                -- Unique constraint to prevent duplicate emails
);

More on SQL commands in our SQL Basics Cheat Sheet.

This example demonstrates defining a primary key inline using the SERIAL pseudo-type. PostgreSQL treats SERIAL as an auto-incrementing integer column, ideal for surrogate keys. The email column is also given a UNIQUE constraint, allowing it to serve as an alternate key.

Using ALTER TABLE

If a table was created without a primary key, you can add one afterward with ALTER TABLE. The statement below assumes employees does not already have a primary key, since a table can hold only one.

ALTER TABLE employees ADD PRIMARY KEY (employee_id);

Before running it, make sure the target column contains no NULLs or duplicate values, or the command will fail.

Dropping primary key

With our primary key now set, you can also drop the primary key constraint.

ALTER TABLE employees DROP CONSTRAINT employees_pkey;

Use this statement to remove an existing primary key constraint, for example when redesigning the table schema or switching to a different primary key column. The constraint name employees_pkey follows PostgreSQL's default naming convention for primary keys (<table>_pkey). Dropping the constraint also drops its underlying unique index.
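If the constraint was created with a custom name, the default convention will not apply. One way to look the name up, sketched here against PostgreSQL's pg_constraint system catalog and assuming the employees table from earlier exists:

```sql
-- Find the name of the primary key constraint on employees
SELECT conname
FROM pg_constraint
WHERE conrelid = 'employees'::regclass  -- the table to inspect
  AND contype = 'p';                    -- 'p' marks primary key constraints
```

The returned name is what you would pass to ALTER TABLE ... DROP CONSTRAINT.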

Learn more about PostgreSQL in our cheat sheet.

Null handling and collision management

NULL values are not permitted in primary key columns, and inserting a duplicate key value raises a constraint violation error.

Some strategies to manage this include:

  • Using INSERT ... ON CONFLICT to avoid duplicates.
  • Using transactions to check for key availability before inserts.
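Here is a minimal sketch of the INSERT ... ON CONFLICT strategy, assuming the employees table defined earlier. The names and email addresses are made up for illustration.

```sql
-- Skip the row if employee_id 1 already exists
INSERT INTO employees (employee_id, name, email)
VALUES (1, 'Dana Ng', 'dana@example.com')
ON CONFLICT (employee_id) DO NOTHING;

-- Or overwrite the existing row instead (an "upsert")
INSERT INTO employees (employee_id, name, email)
VALUES (1, 'Dana N.', 'dana.n@example.com')
ON CONFLICT (employee_id) DO UPDATE
SET name  = EXCLUDED.name,
    email = EXCLUDED.email;
```

EXCLUDED refers to the row that was proposed for insertion. Keep in mind that supplying explicit values for a SERIAL column does not advance its sequence, so mixing manual and generated IDs can lead to later collisions.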

Auto-increment and global identifier strategies

PostgreSQL's SERIAL and BIGSERIAL pseudo-types, along with identity columns (GENERATED ... AS IDENTITY), simplify ID generation.
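As a sketch of the identity-column form, which is available from PostgreSQL 10 onward, consider the hypothetical invoices table below. It is not part of this tutorial's dataset.

```sql
-- Hypothetical table using a standard-SQL identity column
CREATE TABLE invoices (
    invoice_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    issued_on  DATE NOT NULL
);

-- The key is generated automatically; RETURNING hands it back to the caller
INSERT INTO invoices (issued_on)
VALUES ('2025-07-01')
RETURNING invoice_id;
```

With GENERATED ALWAYS, PostgreSQL rejects explicit values for invoice_id unless the insert uses OVERRIDING SYSTEM VALUE, which helps keep the sequence and the stored keys in step.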

Additionally, UUIDs provide safe global identifiers. Their non-sequential and unpredictable nature also enhances security, especially in preventing enumeration attacks. 

Here’s how a UUID can be generated and used as a primary key:

-- gen_random_uuid() is built in from PostgreSQL 13; earlier versions need the pgcrypto extension
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE users (
    user_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    username TEXT NOT NULL
);

Primary Keys in Relations and Boosting Performance

Primary keys are essential for ensuring consistency across relational tables and boosting performance.

Entity integrity enforcement

Entity integrity is enforced through primary keys by disallowing NULLs and duplicates. Having a unique identifier helps prevent data anomalies during insert/update operations.

Foreign keys and referential integrity

Foreign keys must match existing primary key values in the referenced table. Referential integrity ensures that all references between tables remain valid and prevents "orphan records" where records reference non-existent data.

Referential actions keep related rows consistent when a referenced primary key value is deleted or changed.

Common referential actions include:

  • ON DELETE CASCADE
  • ON UPDATE CASCADE
  • SET NULL and SET DEFAULT
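To make these actions concrete, here is a small sketch with two hypothetical tables, departments and staff, that are not part of this tutorial's dataset.

```sql
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    name          TEXT NOT NULL
);

CREATE TABLE staff (
    staff_id      SERIAL PRIMARY KEY,
    name          TEXT NOT NULL,
    -- Nullable on purpose, so that SET NULL has somewhere to go
    department_id INT REFERENCES departments(department_id)
        ON UPDATE CASCADE    -- renumbering a department updates its staff rows
        ON DELETE SET NULL   -- deleting a department keeps staff, unassigned
);
```

Choosing ON DELETE CASCADE instead would remove the staff rows along with the department, so the right action depends on whether child rows make sense without their parent.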

Indexing and query optimization

Unique indexes on primary keys speed up SELECT and JOIN queries. However, poor key design can lead to bloated indexes and degraded performance.

If your queries are slow and you suspect that it's due to bad indexing, use EXPLAIN ANALYZE to inspect query plans and assess index usage.
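For example, a primary key lookup on the employees table from earlier could be inspected like this. The ID value 42 is arbitrary.

```sql
-- EXPLAIN ANALYZE actually executes the query and reports the real plan and timings
EXPLAIN ANALYZE
SELECT name, email
FROM employees
WHERE employee_id = 42;
```

On a large table you would typically expect an index scan using employees_pkey. On a very small table, the planner may choose a sequential scan instead because reading a few rows directly is cheaper than going through the index.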

Design Best Practices and Potential Pitfalls

Good key design promotes maintainability, performance, and scalability. Let’s look at some best practices and common mistakes.

Key selection guidelines

  • Choose short, fixed-length data types (e.g., INT, UUID).
  • Avoid keys that are likely to change.
  • Ensure the key has no embedded business meaning to avoid logic entanglement.
  • Use surrogate keys for simplicity and normalization.

Common implementation errors

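  • Trying to define more than one primary key (only one is allowed per table; use a composite key or UNIQUE constraints instead).
  • Mislabeling unique constraints as primary keys.
  • Failing to define primary keys, which allows duplicate rows and leaves foreign keys without a reliable target.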

Fixes include:

  • Adding missing keys via ALTER TABLE.
  • Normalizing schema design.
  • Using consistent naming conventions.

Scalability considerations

When creating a database solution that scales, try to use surrogate keys to maintain uniformity across large datasets. Avoid composite keys in distributed systems unless necessary. Regular monitoring of your table/index size is also a must.
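For the monitoring part, PostgreSQL's built-in size functions give a quick overview. This sketch assumes the employees table from earlier; substitute any table name.

```sql
SELECT
    pg_size_pretty(pg_table_size('employees'))          AS table_size,
    pg_size_pretty(pg_indexes_size('employees'))        AS index_size,
    pg_size_pretty(pg_total_relation_size('employees')) AS total_size;
```

If index_size grows much faster than table_size, wide or composite keys are a likely place to start looking.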

Advanced Implementation Scenarios

Handling primary keys in complex systems, like distributed databases or legacy systems, requires advanced strategies.

Distributed database systems

Distributed databases are trickier because key generation must be conflict-free across nodes. Two common options are UUIDs and a centralized key management service.

Partitioning Considerations:

  • Ensure keys do not create hotspots.
  • Use hashed or randomized keys when sharding.
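PostgreSQL's declarative hash partitioning can serve as a single-node analogue of this idea. The sketch below assumes PostgreSQL 13 or later for the built-in gen_random_uuid(), uses a hypothetical events table, and picks four partitions arbitrarily. A real sharded system would distribute rows across servers with its own tooling.

```sql
-- Parent table: rows are routed by a hash of the primary key
CREATE TABLE events (
    event_id UUID DEFAULT gen_random_uuid(),
    payload  TEXT,
    PRIMARY KEY (event_id)  -- must include every partition key column
) PARTITION BY HASH (event_id);

-- Four partitions, one per hash remainder
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

Because random UUIDs hash evenly, inserts spread across all partitions rather than piling onto the most recent one, which is the hotspot that sequential keys tend to create.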

Legacy system migration

When performing a system migration from a legacy database, first identify existing unique identifiers. Next, gradually introduce surrogate keys alongside natural keys.

Take a cautious approach by running dual-key support before fully transitioning.

Migration Process:

  1. Add surrogate key column.
  2. Populate keys for existing records.
  3. Modify application queries.
  4. Drop old natural keys after verification.
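The steps above can be sketched in PostgreSQL as follows. The legacy_customers table, its national_id natural key, and the index name are all hypothetical. This version keeps the natural key as a UNIQUE column rather than dropping it outright.

```sql
-- Hypothetical legacy table keyed on a natural key
CREATE TABLE legacy_customers (
    national_id TEXT PRIMARY KEY,
    name        TEXT NOT NULL
);

-- Steps 1 and 2: add the surrogate column; PostgreSQL fills it in for existing rows
ALTER TABLE legacy_customers ADD COLUMN customer_id BIGSERIAL;

-- Dual-key phase: enforce uniqueness on the surrogate while the natural key is still primary
CREATE UNIQUE INDEX legacy_customers_customer_id_idx
    ON legacy_customers (customer_id);

-- Step 3 happens in the application: switch queries and joins to customer_id

-- Step 4: after verification, promote the surrogate key and demote the natural key.
-- The DROP fails if foreign keys still reference national_id, so repoint those first.
BEGIN;
ALTER TABLE legacy_customers
    DROP CONSTRAINT legacy_customers_pkey,
    ADD CONSTRAINT legacy_customers_pkey
        PRIMARY KEY USING INDEX legacy_customers_customer_id_idx;
ALTER TABLE legacy_customers
    ADD CONSTRAINT legacy_customers_national_id_key UNIQUE (national_id);
COMMIT;
```

Reusing the existing unique index avoids building a second index on the same column. If the natural key is no longer needed at all, the final ALTER TABLE could drop the national_id column instead.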

Demonstration of Primary Keys Using PostgreSQL

Before creating tables and inserting data, you need a working PostgreSQL environment. This section begins with a brief guide to setting up PostgreSQL, followed by instructions to define and use a sample dataset for practicing primary key concepts.

Step 1: Initial setup for PostgreSQL

If you haven't already installed PostgreSQL, follow these steps to get started:

  1. Install PostgreSQL
  2. Start the PostgreSQL service
    • After installation, the PostgreSQL service typically starts automatically.
    • You can also manually start it via your OS service manager (e.g., Services on Windows or systemctl on Linux).
  3. Launch pgAdmin or connect via CLI
    • Use pgAdmin to visually manage and query databases.
    • Alternatively, open a terminal and connect using: psql -U postgres
    • Replace postgres with your actual PostgreSQL username if different.
  4. Create a new database
    • In pgAdmin, right-click on "Databases" and select "Create > Database".

    • Or use the following SQL command:

CREATE DATABASE pk_tutorial;

Connect to the new database in psql with:

\c pk_tutorial

Once your environment is ready, you can set up the sample dataset.

Next, let's look at a step-by-step guide to creating a sample dataset using PostgreSQL. The dataset consists of two related tables: customers and orders.

Both tables use surrogate keys: the customers table uses an auto-incrementing integer for simplicity, while the orders table uses a UUID-based primary key to illustrate global uniqueness.

Step 2: Creating tables

The first step is to define the schema and establish primary key constraints for both tables. We also define a foreign key relationship between orders.customer_id and customers.customer_id to demonstrate relational integrity.

For the customers table, the primary key is customer_id.

For the orders table, the primary key is order_id.

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY, -- Auto-incremented surrogate key
    name TEXT NOT NULL,
    email TEXT UNIQUE -- Ensures no duplicate email addresses
);

CREATE TABLE orders (
    order_id UUID DEFAULT gen_random_uuid() PRIMARY KEY, -- Globally unique identifier
    order_date DATE NOT NULL,
    customer_id INT REFERENCES customers(customer_id) ON DELETE CASCADE -- Enforces referential integrity
);

Explanation:

  • customer_id uses the SERIAL keyword to generate unique IDs automatically.
  • email has a UNIQUE constraint to prevent duplicates.
  • order_id is a UUID that is auto-generated via gen_random_uuid(), which is built in from PostgreSQL 13 (earlier versions need the pgcrypto extension).
  • The ON DELETE CASCADE clause ensures that deleting a customer also deletes their associated orders.

[Screenshot: creating tables in pgAdmin]

Step 3: Inserting sample data

Next, we populate the tables with sample records to observe how primary and foreign keys behave during typical insert operations. This data simulates a basic e-commerce scenario.

INSERT INTO customers (name, email) VALUES
('Alice Tan', '[email protected]'),
('Ben Koh', '[email protected]'),
('Clara Lim', '[email protected]');

INSERT INTO orders (order_date, customer_id) VALUES
('2025-07-01', 1),
('2025-07-02', 2),
('2025-07-03', 3);

[Screenshot: inserting data into tables in pgAdmin]
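To see the constraints reject bad data, you can try a few inserts that are expected to fail. This sketch assumes the three sample customers received IDs 1 to 3 on a fresh table; the name, email address, and ID 999 are made up.

```sql
-- Fails with a unique violation: customer_id 1 already exists
INSERT INTO customers (customer_id, name, email)
VALUES (1, 'Dan Ong', 'dan@example.com');

-- Fails with a not-null violation: a primary key column cannot be NULL
INSERT INTO customers (customer_id, name, email)
VALUES (NULL, 'Dan Ong', 'dan@example.com');

-- Fails with a foreign key violation: there is no customer 999
INSERT INTO orders (order_date, customer_id)
VALUES ('2025-07-04', 999);
```

Each statement is rejected on its own and leaves the tables unchanged, which is entity integrity and referential integrity at work.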

Step 4: Querying data

Finally, we perform a SQL JOIN query to retrieve information from both tables, demonstrating how the foreign key allows for seamless relational joins between orders and customers.

Here’s the code that we’ll be running:

SELECT o.order_id, o.order_date, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

[Screenshot: output of the join query]

The result combines columns from both tables. For example, each row includes the order_date field from the orders table and the name field from the customers table.
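You can also watch the ON DELETE CASCADE clause from Step 2 in action. The sketch below assumes the sample data is loaded as shown, with one order per customer, and wraps the delete in a transaction that is rolled back so the dataset stays intact.

```sql
BEGIN;

-- Deleting a customer also deletes that customer's orders
DELETE FROM customers WHERE customer_id = 1;

-- Count the orders that survive the cascade
SELECT COUNT(*) AS remaining_orders FROM orders;

-- Undo the delete so the sample data is unchanged
ROLLBACK;
```

With the three sample orders, remaining_orders should read 2 inside the transaction and return to 3 after the rollback.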

Conclusion

SQL primary keys are foundational to relational databases, enforcing uniqueness, enabling efficient queries, and supporting data integrity. 

We’ve covered the principles, classifications, syntax, best practices, and advanced strategies for SQL primary keys. For more learning resources, check out our Introduction to Relational Databases in SQL course or the Joining Data in SQL course.

If you prefer tutorial readings, the Integrity Constraints in SQL guide or the SQL Joins Cheat Sheet should be a great help as well.

SQL Primary Key FAQs

What are the differences between natural keys and surrogate keys?

Natural keys are derived from real-world data that already exists (e.g., email or national ID), while surrogate keys are system-generated values (e.g., auto-incremented IDs) used purely for uniquely identifying rows.

How do composite primary keys improve database design?

Composite primary keys use multiple columns to ensure uniqueness, which can better model relationships in junction tables and enforce multi-attribute uniqueness in many-to-many relationships.

What are the best practices for choosing a primary key?

A good primary key should be unique, stable (not change over time), minimal in size, and ideally numeric for performance and indexing efficiency.

How do primary keys contribute to data integrity?

Primary keys ensure that each row in a table is uniquely identifiable, preventing duplicate records and supporting accurate referencing through foreign keys.

Can you explain the role of primary keys in establishing relationships between tables?

Primary keys serve as the reference points for foreign keys in other tables, enabling relational integrity and supporting one-to-many or many-to-many relationships in a normalized schema.


Author
Austin Chia

I'm Austin, a blogger and tech writer with years of experience both as a data scientist and a data analyst in healthcare. Starting my tech journey with a background in biology, I now help others make the same transition through my tech blog. My passion for technology has led me to my writing contributions to dozens of SaaS companies, inspiring others and sharing my experiences.
