Open Addressing: Handling Collisions in Hash Tables

When two different keys map to the same index in a hash table, a collision occurs. Open addressing is a collision resolution technique where all elements are stored directly within the hash table array itself. Instead of using separate data structures (like linked lists in separate chaining), we probe for an alternative location within the table when a collision happens.

The Probing Process

In open addressing, when a collision occurs at index

code

h(key)

, we use a probing sequence to find the next available slot. This sequence is determined by a probing function. The general form of the probing function is

code

h(key, i)

, where

code

is the probe number (starting from 0). The goal is to find an empty slot for insertion or the target key for searching.

Linear Probing

Linear probing is the simplest open addressing method. If a collision occurs at index

code

h(key)

, we check the next slot

code

(h(key) + 1) % table_size

, then

code

(h(key) + 2) % table_size

, and so on, until an empty slot is found. The probing sequence is

code

h(key, i) = (h(key) + i) % table_size

Linear probing suffers from primary clustering, where occupied slots tend to form long contiguous runs, increasing search and insertion times.

Quadratic Probing

To mitigate primary clustering, quadratic probing uses a quadratic function to determine the next probe location. The probing sequence is typically

code

h(key, i) = (h(key) + c1*i + c2*i^2) % table_size

, where

code

c1

and

code

c2

are constants. A common choice is

code

h(key, i) = (h(key) + i^2) % table_size

Quadratic probing reduces primary clustering but can lead to secondary clustering.

While quadratic probing spreads out probes more than linear probing, keys that initially hash to the same location will follow the same probe sequence, leading to secondary clustering.

The probing sequence in quadratic probing is designed to jump further away from the initial collision point. For example, if the initial hash is h, the probes might be h, h+1, h+4, h+9, etc. (modulo table size). This helps to break up the long runs of occupied slots seen in linear probing. However, if two keys hash to the same initial index, they will follow the exact same sequence of probes, causing secondary clustering.

Double Hashing

Double hashing aims to eliminate both primary and secondary clustering by using a second hash function to determine the step size for probing. The probing sequence is

code

h(key, i) = (h1(key) + i * h2(key)) % table_size

, where

code

h1

is the primary hash function and

code

h2

is a secondary hash function. Crucially,

code

h2(key)

should never return 0 and should be relatively prime to the table size to ensure all slots can be probed.

The effectiveness of double hashing relies on the independence of the two hash functions. h1(key) determines the initial slot, and h2(key) determines the stride or step size for subsequent probes. A good h2 function will produce different step sizes for keys that might otherwise follow the same probe sequence. For example, h2(key) = R - (key % R), where R is a prime number smaller than the table size, is a common choice.

📚

Text-based content

Library pages focus on text content

Comparison of Open Addressing Techniques

Feature	Linear Probing	Quadratic Probing	Double Hashing
Clustering	Primary Clustering	Secondary Clustering	Minimal Clustering
Probing Function	(h(key) + i) % m	(h(key) + c1i + c2i^2) % m	(h1(key) + i * h2(key)) % m
Complexity	Simpler to implement	More complex than linear	Most complex to implement
Performance	Can degrade significantly with high load factors	Better than linear, but still susceptible to clustering	Generally offers the best performance

Load Factor and Performance

The performance of open addressing techniques is highly dependent on the load factor (α), which is the ratio of the number of elements to the table size. As the load factor approaches 1, the probability of collisions increases dramatically, leading to longer probe sequences and degraded performance. It's generally recommended to keep the load factor below 0.7 for open addressing to maintain good performance. Rehashing (resizing the table and re-inserting elements) is often employed when the load factor becomes too high.

What is the primary disadvantage of linear probing?

Primary clustering, where occupied slots form contiguous runs.

Which open addressing technique uses a second hash function to determine the step size?

Double Hashing.

What is the load factor of a hash table?

The ratio of the number of elements to the table size.

Learning Resources

Open Addressing - GeeksforGeeks(blog)

Provides a clear explanation of open addressing, including linear probing, quadratic probing, and double hashing with examples and code snippets.

Hash Table Collision Resolution: Open Addressing(documentation)

Details various collision resolution techniques, with a dedicated section on open addressing methods like linear, quadratic, and double hashing.

Algorithms - Hash Tables (Open Addressing) - YouTube(video)

A visual explanation of open addressing in hash tables, covering linear probing, quadratic probing, and double hashing with clear diagrams.

Introduction to Hash Tables and Open Addressing(paper)

A PDF document from Carnegie Mellon University that covers hash tables, including a section on open addressing and its probing techniques.

Hash Table - Wikipedia(wikipedia)

The Wikipedia page for hash tables, which includes a comprehensive overview of collision resolution strategies, including open addressing.

Data Structures: Hash Tables - Open Addressing(blog)

Programiz offers a beginner-friendly explanation of hash tables and collision handling, with a focus on open addressing methods.

Algorithms - Hash Tables - Open Addressing(blog)

Javatpoint provides a detailed explanation of open addressing, including the mathematical formulas for linear probing, quadratic probing, and double hashing.

Understanding Hash Table Collisions: Linear Probing(blog)

This article from freeCodeCamp explains the concepts of hash table collisions and the different open addressing techniques with practical examples.

Data Structures and Algorithms: Hash Tables(documentation)

A comprehensive guide to hash tables, covering their implementation, operations, and collision resolution methods like open addressing.

Algorithms - Hash Table(paper)

Lecture notes from Princeton University's COS 226 course, which includes a thorough discussion on hash tables and open addressing techniques.

Open Addressing: Linear Probing, Quadratic Probing, Double Hashing

Open Addressing: Handling Collisions in Hash Tables

The Probing Process

Linear Probing

Quadratic Probing

Quadratic probing reduces primary clustering but can lead to secondary clustering.

Double Hashing

Comparison of Open Addressing Techniques

Load Factor and Performance

Learning Resources