
This is a series of articles about database keys:
- How to Choose Database UUID? (this one)
- How to Choose between UUID and Auto Increment Integer / Serial as the Primary Key?
What Are UUIDs and Why Use Them in Database?
A UUID (Universally Unique Identifier) is a 128-bit value designed to uniquely identify information in distributed systems without requiring a central coordination mechanism. UUIDs are typically represented as 32 hexadecimal digits displayed in five groups separated by hyphens, for example 123e4567-e89b-12d3-a456-426614174000.
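As a quick illustration of the textual and binary forms, here is a minimal sketch using Python's standard `uuid` module with the sample value from the paragraph above. The variable name `u` is only illustrative.

```python
import uuid

# Parse the canonical 8-4-4-4-12 text form.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

print(u.int < 2**128)  # True: the value fits in 128 bits
print(len(u.bytes))    # 16 raw bytes
print(len(u.hex))      # 32 hexadecimal digits
print(len(str(u)))     # 36 characters, including the 4 hyphens
print(u.version)       # 1: the first digit of the third group is the version
```

The same 128-bit value can be stored either as 16 raw bytes or as a 36-character string, which matters for the storage tradeoffs discussed below.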
Why use UUIDs in databases?
- Distributed systems: Generate IDs without coordination between nodes
- No sequential leaks: Unlike auto-incrementing IDs, UUIDs don't reveal information about record counts
- Merge-friendly: Helpful when merging data from different database instances
- Pregeneration: IDs can be created before database insertion
- Consistent API design: Same ID format across different resources
However, UUIDs come with tradeoffs. They consume more storage (16 bytes vs 4-8 bytes for integers) and can impact indexing performance.
Let's explore three popular UUID versions, UUIDv1, UUIDv4, and UUIDv7, and their implications for database usage.
UUIDv1: The Time-Based Pioneer
How UUIDv1 Works
UUIDv1 is generated using the current timestamp combined with the MAC address of the computer's network interface. Its structure includes:
- A 60-bit timestamp (measured in 100-nanosecond intervals since October 15, 1582)
- A 14-bit clock sequence (to avoid duplicates when the clock is set backward), stored next to the 2-bit variant field
- A 48-bit node identifier (typically the MAC address)
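The fields listed above can be read back from a UUIDv1 with Python's standard `uuid` module. This is a minimal sketch: `describe_uuid1` is a hypothetical helper, and the node and clock sequence values are made up so the output does not depend on the machine's real MAC address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by UUIDv1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def describe_uuid1(u: uuid.UUID) -> dict:
    """Hypothetical helper: split a UUIDv1 into its three logical parts."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    # u.time counts 100-nanosecond intervals; ten of them make one microsecond.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return {
        "created": created,
        "clock_seq": u.clock_seq,
        "node": f"{u.node:012x}",
    }


# Made-up node and clock sequence, used here instead of the real MAC address.
sample = uuid.uuid1(node=0x0242AC110002, clock_seq=0x1234)
info = describe_uuid1(sample)
print(info)
```

Anyone holding the UUID can recover the creation time and the node identifier in the same way, which is the root of the privacy concern discussed below.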
Pros of UUIDv1
- Chronologically sortable: UUIDv1s can be ordered by creation time once the timestamp fields are reassembled, although the raw value itself does not sort chronologically
- Guaranteed uniqueness: The combination of timestamp, clock sequence, and node ID ensures practical uniqueness
- Deterministic: Given the same inputs, UUIDv1 generation is reproducible
- Performance: Generation is computationally inexpensive (no need for cryptographically secure random numbers)
Cons of UUIDv1
- Privacy concerns❗️: Embeds the MAC address, potentially exposing network information
- Poor index performance: Despite being time-based, the low 32 bits of the timestamp come first, so the leading bytes change rapidly and cause index fragmentation
- Non-sequential writes: The timestamp is split into low, middle, and high fields stored in that order, which is not database-friendly
- Security implications: Predictability can be a security risk in some contexts
- Requires system clock: Vulnerable to clock skew in distributed systems
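To make the ordering problem concrete, the sketch below builds two UUIDv1 values whose timestamps differ by a single tick and compares them. `make_uuid1` is a hypothetical helper that packs a timestamp into the standard field layout, and the timestamps are chosen artificially so that the low 32 bits wrap around.

```python
import uuid


def make_uuid1(timestamp: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Hypothetical helper: pack a 60-bit timestamp into the UUIDv1 layout."""
    time_low = timestamp & 0xFFFFFFFF            # lowest 32 bits come first
    time_mid = (timestamp >> 32) & 0xFFFF        # then the middle 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # top 12 bits + version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # variant bits 10
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


earlier = make_uuid1(0x0000_0000_FFFF_FFFF)
later = make_uuid1(0x0000_0001_0000_0000)  # exactly one tick later

print(earlier)                      # ffffffff-0000-1000-8000-000000000000
print(later)                        # 00000000-0001-1000-8000-000000000000
print(earlier.time < later.time)    # True: the embedded timestamps are ordered
print(str(earlier) < str(later))    # False: the stored values are not
```

A B-tree index orders keys by their stored bytes, so in this case the newer row lands before the older one rather than at the end of the index.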
Real-World Use Cases for UUIDv1
- Legacy systems that require backward compatibility
- Applications where chronological sorting is required but security concerns are minimal
- Systems where efficiency of ID generation is prioritized over other concerns
UUIDv4: The Random Approach
How UUIDv4 Works
UUIDv4 is generated using random or pseudo-random numbers. It's essentially 122 random bits plus six fixed bits that indicate the version and variant:
- The high 4 bits of byte 6 (zero-indexed) are set to 0b0100 (the UUID version, 4)
- The high 2 bits of byte 8 (zero-indexed) are set to 0b10 (the UUID variant)
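The construction can be sketched by hand, assuming the standard layout in which the version occupies the high four bits of byte 6 and the variant the high two bits of byte 8 (both zero-indexed). `uuid4_from_random` is a hypothetical helper; in practice `uuid.uuid4()` does the same job.

```python
import os
import uuid
from typing import Optional


def uuid4_from_random(random_bytes: Optional[bytes] = None) -> uuid.UUID:
    """Hypothetical helper: turn 16 random bytes into a version 4 UUID."""
    raw = bytearray(os.urandom(16) if random_bytes is None else random_bytes)
    if len(raw) != 16:
        raise ValueError("need exactly 16 bytes")
    raw[6] = (raw[6] & 0x0F) | 0x40  # high nibble of byte 6 -> version 0b0100
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 8 -> variant 0b10
    return uuid.UUID(bytes=bytes(raw))


u = uuid4_from_random()
print(u, u.version, u.variant == uuid.RFC_4122)
```

Only six bits are fixed, so the remaining 122 bits carry all of the randomness.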
Pros of UUIDv4
- Maximum unpredictability: Provides strong protection against ID guessing attacks
- No privacy leakage: Contains no information about the generating system
- No system clock dependency: Generation doesn't rely on the system clock
- Truly distributed: Can be generated anywhere without coordination
- Widely supported: Available in most programming languages and databases
Cons of UUIDv4
- Not sortable: Random nature means they don't preserve insertion order
- Poor database performance: Random distribution causes index fragmentation and poor cache locality
- Higher collision possibility: Though extremely unlikely, has higher theoretical collision risk than time-based UUIDs in high-volume systems
- More intensive generation: Requires cryptographically secure random number generation
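To put the collision risk in perspective, the birthday bound gives a rough estimate. The sketch below assumes all 122 random bits are uniformly distributed and uses the approximation p ≈ 1 - exp(-n(n-1) / 2^123). `collision_probability` is a hypothetical helper, not a library function.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


print(collision_probability(10**9))   # one billion IDs: far below 1e-18
print(collision_probability(2**61))   # about 2.3e18 IDs: roughly 0.39
```

Under these assumptions, collisions only become likely when the number of generated IDs approaches 2^61, which is well beyond typical database sizes.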
Real-World Use Cases for UUIDv4
- Public-facing IDs where security and unpredictability are priorities
- Microservice architectures where ID generation needs to be fully distributed
- Multi-master database setups that require conflict-free ID generation
- Applications where privacy concerns outweigh performance considerations
UUIDv7: The Modern Solution
How UUIDv7 Works
UUIDv7 is one of the newer UUID versions, standardized in RFC 9562 (2024) and designed to address the limitations of earlier versions. It combines the sortability of time-based UUIDs with the unpredictability of random UUIDs. The structure includes:
- A 48-bit Unix timestamp (milliseconds since January 1, 1970)
- A 74-bit random number
- Version and variant bits
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
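The layout can be turned into a small generator. This is a minimal sketch rather than a production implementation: `uuid7_sketch` is a hypothetical helper, it fills `rand_a` and `rand_b` with fresh random bits on every call, and it makes no attempt to keep values monotonic within the same millisecond. Newer Python releases may ship a built-in generator, and the sketch avoids depending on one.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ts_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: assemble a UUIDv7 from a timestamp and random bits."""
    if unix_ts_ms is None:
        unix_ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0x0FFF                # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant

    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp, bits 127..80
    value |= 0x7 << 76                            # version 7, bits 79..76
    value |= rand_a << 64                         # rand_a, bits 75..64
    value |= 0b10 << 62                           # variant, bits 63..62
    value |= rand_b                               # rand_b, bits 61..0
    return uuid.UUID(int=value)


first = uuid7_sketch(1_700_000_000_000)
second = uuid7_sketch(1_700_000_000_001)
print(first, second, first < second)
```

Because the timestamp sits in the most significant bits, a value generated in a later millisecond always compares greater than one generated earlier, regardless of the random part.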
Pros of UUIDv7
- Time-sortable: The most significant bits are a Unix timestamp, making them chronologically sortable
- Database-friendly: Sequential generation leads to better index performance
- No privacy leakage: Unlike UUIDv1, doesn't include MAC address
- Reduced collision risk: Combines timestamp with random data
- Modern design: Addresses known issues with earlier UUID versions
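- Simpler timestamp: A 48-bit Unix millisecond timestamp stored as one contiguous field is easier to generate and parse than UUIDv1's 60-bit timestamp split across three fields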
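A rough way to see the indexing difference is to count how often a new key is larger than every key inserted before it, meaning it would land at the right-hand edge of an ordered index. This is only a proxy for B-tree behavior, not a benchmark. `make_uuid7` and `append_ratio` are hypothetical helpers, and the UUIDv7 timestamps are spaced one millisecond apart by construction.

```python
import os
import uuid


def make_uuid7(unix_ts_ms: int) -> uuid.UUID:
    """Hypothetical helper: UUIDv7 with a caller-supplied timestamp."""
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = (rand >> 62) & 0x0FFF
    rand_b = rand & ((1 << 62) - 1)
    value = (unix_ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


def append_ratio(keys) -> float:
    """Fraction of keys that are greater than all keys seen before them."""
    appended = 0
    current_max = None
    for key in keys:
        if current_max is None or key > current_max:
            appended += 1
            current_max = key
    return appended / len(keys)


v4_keys = [uuid.uuid4() for _ in range(1000)]
v7_keys = [make_uuid7(1_700_000_000_000 + i) for i in range(1000)]

print(f"UUIDv4 appends: {append_ratio(v4_keys):.3f}")
print(f"UUIDv7 appends: {append_ratio(v7_keys):.3f}")
```

In this setup every UUIDv7 key is appended at the end, while almost every UUIDv4 key falls somewhere in the middle of the existing range.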
Cons of UUIDv7
- Newer standard: Less widely supported in languages and frameworks
- Relies on system clock: Though less problematic than UUIDv1, still depends on correct system time
- Less random than UUIDv4: The timestamp portion is predictable
- Millisecond precision: Only millisecond precision (vs. 100-nanosecond in UUIDv1)
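The predictable timestamp portion is easy to read back, which is worth keeping in mind when UUIDv7 values are exposed publicly. The sketch below recovers the creation time from a sample UUIDv7. `uuid7_unix_ms` and `uuid7_datetime` are hypothetical helpers.

```python
import uuid
from datetime import datetime, timezone


def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Hypothetical helper: return the 48-bit Unix timestamp in milliseconds."""
    if u.version != 7:
        raise ValueError("expected a version 7 UUID")
    return u.int >> 80  # the timestamp occupies the top 48 of 128 bits


def uuid7_datetime(u: uuid.UUID) -> datetime:
    """Hypothetical helper: return the creation time as an aware UTC datetime."""
    return datetime.fromtimestamp(uuid7_unix_ms(u) / 1000, tz=timezone.utc)


sample = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_unix_ms(sample))   # 1645557742000
print(uuid7_datetime(sample))  # 2022-02-22 19:22:22+00:00
```

The creation time is visible to anyone who sees the ID, while the remaining 74 random bits stay unpredictable.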
Real-World Use Cases for UUIDv7
- Modern database applications where both performance and security matter
- Systems requiring time-ordered UUIDs without the privacy concerns of UUIDv1
- High-throughput applications that need efficient indexing
- New projects without legacy compatibility requirements
Comparison Table: UUIDv1 vs UUIDv4 vs UUIDv7
Feature | UUIDv1 | UUIDv4 | UUIDv7 |
---|---|---|---|
Storage size | 16 bytes | 16 bytes | 16 bytes |
Generation based on | Time + MAC address | Random | Time + Random |
Time-sortable | ✅ (but in non-ideal order) | ❌ | ✅ (optimized for databases) |
Privacy | ❌ (exposes MAC) | ✅ (fully private) | ✅ (fully private) |
Index performance | ⚠️ (poor, despite sortability) | ❌ (worst) | ✅ (best) |
Generation speed | ✅ (fastest) | ⚠️ (slowest) | ⚠️ (moderate) |
Security (unpredictability) | ❌ (most predictable) | ✅ (most unpredictable) | ⚠️ (partially predictable) |
Collision resistance | ✅ (in practice) | ✅ (theoretical risk in extreme volumes) | ✅ (in practice) |
Distributed generation | ✅ | ✅ | ✅ |
Clock dependency | ⚠️ (high) | ✅ (none) | ⚠️ (moderate) |
Wide adoption | ✅ | ✅ | ⚠️ (growing) |
Conclusion
- UUIDv1 is a legacy choice that offers time-sortability but comes with privacy concerns.
- UUIDv4 provides maximum unpredictability but at the cost of database performance.
- UUIDv7 offers the best of both worlds with time-sortability and privacy, making it the recommended choice for most new applications.
For most modern applications, UUIDv7 provides the best balance of features, addressing the weaknesses of both UUIDv1 and UUIDv4.