LibraryOpen Addressing: Linear Probing, Quadratic Probing, Double Hashing

Open Addressing: Linear Probing, Quadratic Probing, Double Hashing

Learn about Open Addressing: Linear Probing, Quadratic Probing, Double Hashing as part of GATE Computer Science - Algorithms and Data Structures

Open Addressing: Handling Collisions in Hash Tables

When two different keys map to the same index in a hash table, a collision occurs. Open addressing is a collision resolution technique where all elements are stored directly within the hash table array itself. Instead of using separate data structures (like linked lists in separate chaining), we probe for an alternative location within the table when a collision happens.

The Probing Process

In open addressing, when a collision occurs at index

code
h(key)
, we use a probing sequence to find the next available slot. This sequence is determined by a probing function. The general form of the probing function is
code
h(key, i)
, where
code
i
is the probe number (starting from 0). The goal is to find an empty slot for insertion or the target key for searching.

Linear Probing

Linear probing is the simplest open addressing method. If a collision occurs at index

code
h(key)
, we check the next slot
code
(h(key) + 1) % table_size
, then
code
(h(key) + 2) % table_size
, and so on, until an empty slot is found. The probing sequence is
code
h(key, i) = (h(key) + i) % table_size
.

Linear probing suffers from primary clustering, where occupied slots tend to form long contiguous runs, increasing search and insertion times.

Quadratic Probing

To mitigate primary clustering, quadratic probing uses a quadratic function to determine the next probe location. The probing sequence is typically

code
h(key, i) = (h(key) + c1*i + c2*i^2) % table_size
, where
code
c1
and
code
c2
are constants. A common choice is
code
h(key, i) = (h(key) + i^2) % table_size
.

Quadratic probing reduces primary clustering but can lead to secondary clustering.

While quadratic probing spreads out probes more than linear probing, keys that initially hash to the same location will follow the same probe sequence, leading to secondary clustering.

The probing sequence in quadratic probing is designed to jump further away from the initial collision point. For example, if the initial hash is h, the probes might be h, h+1, h+4, h+9, etc. (modulo table size). This helps to break up the long runs of occupied slots seen in linear probing. However, if two keys hash to the same initial index, they will follow the exact same sequence of probes, causing secondary clustering.

Double Hashing

Double hashing aims to eliminate both primary and secondary clustering by using a second hash function to determine the step size for probing. The probing sequence is

code
h(key, i) = (h1(key) + i * h2(key)) % table_size
, where
code
h1
is the primary hash function and
code
h2
is a secondary hash function. Crucially,
code
h2(key)
should never return 0 and should be relatively prime to the table size to ensure all slots can be probed.

The effectiveness of double hashing relies on the independence of the two hash functions. h1(key) determines the initial slot, and h2(key) determines the stride or step size for subsequent probes. A good h2 function will produce different step sizes for keys that might otherwise follow the same probe sequence. For example, h2(key) = R - (key % R), where R is a prime number smaller than the table size, is a common choice.

📚

Text-based content

Library pages focus on text content

Comparison of Open Addressing Techniques

FeatureLinear ProbingQuadratic ProbingDouble Hashing
ClusteringPrimary ClusteringSecondary ClusteringMinimal Clustering
Probing Function(h(key) + i) % m(h(key) + c1i + c2i^2) % m(h1(key) + i * h2(key)) % m
ComplexitySimpler to implementMore complex than linearMost complex to implement
PerformanceCan degrade significantly with high load factorsBetter than linear, but still susceptible to clusteringGenerally offers the best performance

Load Factor and Performance

The performance of open addressing techniques is highly dependent on the load factor (α), which is the ratio of the number of elements to the table size. As the load factor approaches 1, the probability of collisions increases dramatically, leading to longer probe sequences and degraded performance. It's generally recommended to keep the load factor below 0.7 for open addressing to maintain good performance. Rehashing (resizing the table and re-inserting elements) is often employed when the load factor becomes too high.

What is the primary disadvantage of linear probing?

Primary clustering, where occupied slots form contiguous runs.

Which open addressing technique uses a second hash function to determine the step size?

Double Hashing.

What is the load factor of a hash table?

The ratio of the number of elements to the table size.

Learning Resources

Open Addressing - GeeksforGeeks(blog)

Provides a clear explanation of open addressing, including linear probing, quadratic probing, and double hashing with examples and code snippets.

Hash Table Collision Resolution: Open Addressing(documentation)

Details various collision resolution techniques, with a dedicated section on open addressing methods like linear, quadratic, and double hashing.

Algorithms - Hash Tables (Open Addressing) - YouTube(video)

A visual explanation of open addressing in hash tables, covering linear probing, quadratic probing, and double hashing with clear diagrams.

Introduction to Hash Tables and Open Addressing(paper)

A PDF document from Carnegie Mellon University that covers hash tables, including a section on open addressing and its probing techniques.

Hash Table - Wikipedia(wikipedia)

The Wikipedia page for hash tables, which includes a comprehensive overview of collision resolution strategies, including open addressing.

Data Structures: Hash Tables - Open Addressing(blog)

Programiz offers a beginner-friendly explanation of hash tables and collision handling, with a focus on open addressing methods.

Algorithms - Hash Tables - Open Addressing(blog)

Javatpoint provides a detailed explanation of open addressing, including the mathematical formulas for linear probing, quadratic probing, and double hashing.

Understanding Hash Table Collisions: Linear Probing(blog)

This article from freeCodeCamp explains the concepts of hash table collisions and the different open addressing techniques with practical examples.

Data Structures and Algorithms: Hash Tables(documentation)

A comprehensive guide to hash tables, covering their implementation, operations, and collision resolution methods like open addressing.

Algorithms - Hash Table(paper)

Lecture notes from Princeton University's COS 226 course, which includes a thorough discussion on hash tables and open addressing techniques.