
Primary Keys
A primary key is a column or set of columns that serves as the unique identifier for each row in a table. It enforces entity integrity, which is the rule that every row must be uniquely identifiable.
Core Purposes
- Uniqueness: Ensures each record can be uniquely identified and accessed
- Data integrity: Prevents duplicate records from being created
- Relational foundation: Provides a reliable point of reference for other tables to establish relationships
- Indexing: Automatically creates an index that speeds up data retrieval operations
Technical Characteristics
- Must contain a unique value for each record
- Cannot contain NULL values
- Should be immutable (rarely changed after creation)
- Is automatically indexed by the database management system
- Can be a single column (simple primary key) or multiple columns (composite primary key)
Example in an E-commerce Database
Consider a Customers
table in an e-commerce system:
CREATE TABLE Customers (
customer_id INT NOT NULL AUTO_INCREMENT,
email VARCHAR(100) NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
registration_date DATE NOT NULL,
PRIMARY KEY (customer_id)
);
In this example, customer_id
is the primary key. Each customer will have a unique ID automatically assigned by the database when a new record is created. Even if two customers share the same name, they will always have distinct customer_id
values.
Foreign Keys
Simple Definition
A foreign key is a column or set of columns in one table that establishes a link between the data in two tables. It creates and enforces a referential integrity constraint between the tables.
Core Purposes
- Creating relationships: Connects data between separate tables
- Referential integrity: Prevents orphaned records (records that reference non-existent data)
- Data consistency: Ensures that references across tables remain valid
- Logical structure: Expresses real-world relationships in database design
Technical Characteristics
- References a primary key (or unique key) in another table
- Can contain NULL values (unless explicitly constrained otherwise)
- May be part of a compound key that serves as the primary key in its own table
- Can have associated referential actions (CASCADE, SET NULL, etc.)
- Is not automatically indexed (though indexing is recommended)
Example in an E-commerce Database
Continuing with our e-commerce example, let's create an Orders
table that references the Customers
table:
CREATE TABLE Orders (
order_id INT NOT NULL AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATETIME NOT NULL,
total_amount DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
Here, customer_id
in the Orders
table is a foreign key that references the primary key customer_id
in the Customers
table. This establishes a relationship: each order belongs to one specific customer.
Primary Keys vs. Foreign Keys
Syntax
Primary Keys:
-- Method 1: During table creation
CREATE TABLE Products (
product_id INT NOT NULL AUTO_INCREMENT,
product_name VARCHAR(100) NOT NULL,
PRIMARY KEY (product_id)
);
-- Method 2: Altering an existing table
ALTER TABLE Products
ADD PRIMARY KEY (product_id);
Foreign Keys:
-- Method 1: During table creation
CREATE TABLE OrderItems (
item_id INT NOT NULL AUTO_INCREMENT,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
PRIMARY KEY (item_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
-- Method 2: Altering an existing table
ALTER TABLE OrderItems
ADD CONSTRAINT fk_product
FOREIGN KEY (product_id) REFERENCES Products(product_id);
The key differences in implementation are:
- Primary keys are simpler to define
- Foreign keys require specifying the reference table and column
- Foreign keys can have additional options (ON DELETE, ON UPDATE)
Uniqueness and NULL Value Constraints
Primary Keys:
- Must contain unique values; no duplicates allowed
- Cannot contain NULL values (SQLite is an exception)
- Automatically enforces these constraints
Foreign Keys:
- Can contain duplicate values (many records can reference the same parent record)
- Can contain NULL values (unless NOT NULL is specified)
- Values must exist in the referenced table's primary key column (or be NULL)
Cardinality Relationships
Primary Keys facilitate:
- One-to-one relationships (when both tables have a primary key that references the other)
- One-to-many relationships (when combined with foreign keys)
Foreign Keys facilitate:
- One-to-many relationships (most common use case)
- Many-to-many relationships
Example of a many-to-many relationship:
-- Products can belong to multiple categories, and categories can contain multiple products
CREATE TABLE ProductCategories (
product_id INT NOT NULL,
category_id INT NOT NULL,
PRIMARY KEY (product_id, category_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id),
FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);
Performance Implications
Primary Keys:
- Automatically indexed, improving query performance
- Optimized for uniqueness checks and frequent joins
- Clustered index in some database systems (physically orders data)
- Minimal storage overhead
Foreign Keys:
- Not automatically indexed (manual indexing recommended)
- Additional overhead during data modification to check constraints
- Can slow down write operations (inserts, updates, deletes)
- Improves query optimizer plans by providing relationship metadata
Modification Restrictions
Primary Keys:
- Difficult to change once data exists in the table
- Changing requires updating all foreign keys that reference it
- Can't be deleted if referenced by foreign keys
Foreign Keys:
- Can be modified if the new value exists in the referenced table
- Can have automatic cascading actions defined:
ON DELETE CASCADE
: Deletes child records when parent is deletedON UPDATE CASCADE
: Updates child records when parent key changesSET NULL
: Sets child keys to NULL when parent is deleted
CREATE TABLE OrderItems (
item_id INT NOT NULL AUTO_INCREMENT,
order_id INT NOT NULL,
product_id INT NOT NULL,
PRIMARY KEY (item_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id)
ON DELETE CASCADE,
FOREIGN KEY (product_id) REFERENCES Products(product_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);
Visual Representation
The diagram above illustrates the relationships between tables in our e-commerce database example:
-
The
CUSTOMERS
table has a primary key (customer_id
) that is referenced by the foreign key in theORDERS
table, creating a one-to-many relationship (one customer can place many orders). -
Similarly, the
ORDERS
table's primary key (order_id
) is referenced by theORDER_ITEMS
table, establishing another one-to-many relationship (one order can contain many items). -
The
PRODUCTS
table's primary key (product_id
) is referenced by theORDER_ITEMS
table, creating a one-to-many relationship (one product can be in many order items). -
The
PRODUCT_CATEGORIES
table demonstrates a many-to-many relationship between products and categories, implemented with composite primary key that consists of two foreign keys.
Common Misconceptions and Mistakes
Primary Key Misconceptions
-
Using natural data as primary keys
- Misconception: Using business data like email addresses or phone numbers as primary keys is a good practice.
- Reality: Business data can change. Using surrogate keys (like UUID or auto-incrementing integers) provides stability.
-
Composite primary keys are always problematic
- Misconception: You should always use single-column primary keys.
- Reality: While simpler, single-column keys aren't always appropriate. Composite keys can better represent certain relationships.
-
Primary keys must be integers
- Misconception: Only numeric values can be primary keys.
- Reality: Any data type can be used, though numeric types offer performance advantages.
Foreign Key Misconceptions
-
Foreign keys severely hurt performance
- Misconception: Foreign key constraints significantly slow down databases.
- Reality: While there is overhead, the benefits to data integrity often outweigh the performance impact. Proper indexing minimizes the impact.
-
Foreign keys must have the same column name as their referenced primary keys
- Misconception: If a primary key is called
id
, the foreign key must also be calledid
. - Reality: Foreign keys can have any name; it's the relationship definition that matters.
- Misconception: If a primary key is called
-
Foreign keys always protect against orphaned records
- Misconception: Having foreign keys automatically prevents all orphaned records.
- Reality: Without proper constraints (e.g.,
ON DELETE
actions), foreign keys can still lead to referential integrity issues.
Best Practices for Key Design
Primary Key Selection
-
Use surrogate keys for most tables
- Prefer system-generated values (
IDENTITY
,SEQUENCE
,UUID
) over business data - Auto-incrementing integers or GUIDs are typically best choices
- Prefer system-generated values (
-
Keep primary keys simple and stable
- Avoid using columns that might need to change
- Prefer single-column keys when possible
-
Consider the future
- Choose data types that allow for growth (e.g.,
INT
instead ofSMALLINT
) - Plan for potential data volume increases
- Choose data types that allow for growth (e.g.,
Foreign Key Design
-
Always index foreign key columns
CREATE INDEX idx_orders_customer_id ON Orders(customer_id);
-
Choose appropriate referential actions
- CASCADE: Use when child records should follow parent record actions
- RESTRICT/NO ACTION: Use when preventing orphaned records is critical
- SET NULL: Use when child records can exist independently
-
Name constraints meaningfully
ALTER TABLE Orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES Customers(customer_id);
-
Consider nullable vs. non-nullable
- Make foreign keys NOT NULL when the relationship is mandatory
- Allow NULL when the relationship is optional
Scalability Considerations
-
Primary key data types impact scalability
INT
(4 bytes): Supports up to ~2.1 billion recordsBIGINT
(8 bytes): Supports virtually unlimited recordsUUID/GUID
: Supports distributed systems better but has higher storage requirements
-
Consider sharding implications
- Auto-incrementing keys can be problematic in sharded databases
- Use UUIDs or composite keys for distributed systems
-
Performance at scale
- Join performance depends on key types and indexing
- Composite keys increase index size and may impact performance