Open Addressing: Handling Collisions in Hash Tables
When two different keys map to the same index in a hash table, a collision occurs. Open addressing is a collision resolution technique where all elements are stored directly within the hash table array itself. Instead of using separate data structures (like linked lists in separate chaining), we probe for an alternative location within the table when a collision happens.
The Probing Process
In open addressing, when a collision occurs at index
h(key)
h(key, i)
i
Linear Probing
Linear probing is the simplest open addressing method. If a collision occurs at index
h(key)
(h(key) + 1) % table_size
(h(key) + 2) % table_size
h(key, i) = (h(key) + i) % table_size
Linear probing suffers from primary clustering, where occupied slots tend to form long contiguous runs, increasing search and insertion times.
Quadratic Probing
To mitigate primary clustering, quadratic probing uses a quadratic function to determine the next probe location. The probing sequence is typically
h(key, i) = (h(key) + c1*i + c2*i^2) % table_size
c1
c2
h(key, i) = (h(key) + i^2) % table_size
Quadratic probing reduces primary clustering but can lead to secondary clustering.
While quadratic probing spreads out probes more than linear probing, keys that initially hash to the same location will follow the same probe sequence, leading to secondary clustering.
The probing sequence in quadratic probing is designed to jump further away from the initial collision point. For example, if the initial hash is h
, the probes might be h
, h+1
, h+4
, h+9
, etc. (modulo table size). This helps to break up the long runs of occupied slots seen in linear probing. However, if two keys hash to the same initial index, they will follow the exact same sequence of probes, causing secondary clustering.
Double Hashing
Double hashing aims to eliminate both primary and secondary clustering by using a second hash function to determine the step size for probing. The probing sequence is
h(key, i) = (h1(key) + i * h2(key)) % table_size
h1
h2
h2(key)
The effectiveness of double hashing relies on the independence of the two hash functions. h1(key)
determines the initial slot, and h2(key)
determines the stride or step size for subsequent probes. A good h2
function will produce different step sizes for keys that might otherwise follow the same probe sequence. For example, h2(key) = R - (key % R)
, where R is a prime number smaller than the table size, is a common choice.
Text-based content
Library pages focus on text content
Comparison of Open Addressing Techniques
Feature | Linear Probing | Quadratic Probing | Double Hashing |
---|---|---|---|
Clustering | Primary Clustering | Secondary Clustering | Minimal Clustering |
Probing Function | (h(key) + i) % m | (h(key) + c1i + c2i^2) % m | (h1(key) + i * h2(key)) % m |
Complexity | Simpler to implement | More complex than linear | Most complex to implement |
Performance | Can degrade significantly with high load factors | Better than linear, but still susceptible to clustering | Generally offers the best performance |
Load Factor and Performance
The performance of open addressing techniques is highly dependent on the load factor (α), which is the ratio of the number of elements to the table size. As the load factor approaches 1, the probability of collisions increases dramatically, leading to longer probe sequences and degraded performance. It's generally recommended to keep the load factor below 0.7 for open addressing to maintain good performance. Rehashing (resizing the table and re-inserting elements) is often employed when the load factor becomes too high.
Primary clustering, where occupied slots form contiguous runs.
Double Hashing.
The ratio of the number of elements to the table size.
Learning Resources
Provides a clear explanation of open addressing, including linear probing, quadratic probing, and double hashing with examples and code snippets.
Details various collision resolution techniques, with a dedicated section on open addressing methods like linear, quadratic, and double hashing.
A visual explanation of open addressing in hash tables, covering linear probing, quadratic probing, and double hashing with clear diagrams.
A PDF document from Carnegie Mellon University that covers hash tables, including a section on open addressing and its probing techniques.
The Wikipedia page for hash tables, which includes a comprehensive overview of collision resolution strategies, including open addressing.
Programiz offers a beginner-friendly explanation of hash tables and collision handling, with a focus on open addressing methods.
Javatpoint provides a detailed explanation of open addressing, including the mathematical formulas for linear probing, quadratic probing, and double hashing.
This article from freeCodeCamp explains the concepts of hash table collisions and the different open addressing techniques with practical examples.
A comprehensive guide to hash tables, covering their implementation, operations, and collision resolution methods like open addressing.
Lecture notes from Princeton University's COS 226 course, which includes a thorough discussion on hash tables and open addressing techniques.