Running Kafka Brokers and ZooKeeper/KRaft
To use Apache Kafka effectively for real-time data engineering, you need to understand how to run its core components: Kafka brokers and the coordination service (ZooKeeper or KRaft). This section walks through the essential concepts and practical considerations for setting up and managing these components.
Understanding the Roles
Kafka brokers are the workhorses of the system, responsible for storing and serving data. The coordination service (historically ZooKeeper, now increasingly KRaft) manages the Kafka cluster's metadata, including broker registration, topic configurations, and leader election.
| Component | Primary Role | Key Responsibilities |
|---|---|---|
| Kafka Broker | Data Storage & Serving | Storing topic partitions, serving produce/consume requests, replicating data |
| ZooKeeper/KRaft | Cluster Coordination | Managing broker metadata, controller election, topic/partition leadership, access control |
ZooKeeper Mode: The Traditional Approach
For many years, Kafka relied on Apache ZooKeeper for cluster coordination. In this mode, ZooKeeper nodes form their own ensemble, and Kafka brokers register with and receive instructions from this ensemble.
ZooKeeper's role is to maintain the cluster's state and ensure consistency.
ZooKeeper acts as a distributed coordination service. It stores critical metadata like broker IDs, topic configurations, and partition leader information. Kafka brokers connect to ZooKeeper to register themselves and discover other brokers.
In a ZooKeeper-managed Kafka cluster, an ensemble of at least three ZooKeeper nodes is typically recommended so the ensemble can tolerate the loss of one node. Each Kafka broker is configured with the ZooKeeper connection string. ZooKeeper elects one broker as the cluster controller, and that controller performs partition leader election, ensuring that exactly one broker is the leader for a given partition at any time. This prevents data inconsistencies and supports reliable message delivery. However, managing a separate ZooKeeper ensemble adds operational overhead.
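As a concrete illustration, a broker's `server.properties` in ZooKeeper mode might look like the following minimal sketch; the hostnames (`broker1.example.com`, `zk1` through `zk3`) and the data path are placeholders:

```properties
# server.properties -- ZooKeeper mode, minimal sketch with placeholder hosts/paths
broker.id=1
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://broker1.example.com:9092
log.dirs=/var/lib/kafka/data
# Connection string for a three-node ZooKeeper ensemble
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
```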
KRaft Mode: The Modern, ZooKeeper-less Approach
Apache Kafka has evolved to include KRaft (Kafka Raft metadata mode), which allows Kafka to manage its own metadata without relying on ZooKeeper. This simplifies deployment and operations significantly.
KRaft eliminates the need for a separate ZooKeeper cluster.
KRaft uses the Raft consensus protocol internally to manage Kafka's metadata. Kafka brokers can act as both data brokers and metadata controllers, reducing the number of components to manage and simplifying the overall architecture.
In KRaft mode, nodes configured as controllers form a Raft quorum that elects an active controller and manages metadata. The active controller is responsible for tasks like partition leadership, topic creation, and configuration changes. A single controller is enough for development, but production deployments typically run three (or five) controller nodes so the quorum can tolerate failures. This approach streamlines operations, reduces operational complexity, and offers potential performance benefits by co-locating metadata management with data serving. It is the required path for new deployments: ZooKeeper mode was deprecated in Kafka 3.5 and removed in Kafka 4.0.
KRaft is now the standard way to deploy Kafka, offering a simpler, more integrated architecture by removing the ZooKeeper dependency.
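Unlike ZooKeeper mode, a KRaft cluster's storage must be formatted with a cluster ID before first startup. A minimal sketch using the scripts shipped with a Kafka 3.x distribution (the sample KRaft config path may differ across versions):

```bash
# Generate a cluster ID and format the metadata log (required once, before first start)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start the combined broker/controller process
bin/kafka-server-start.sh config/kraft/server.properties
```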
Running Kafka Brokers: Configuration Essentials
Regardless of whether you're using ZooKeeper or KRaft, configuring your Kafka brokers involves several key parameters. These settings dictate how brokers operate, connect to the coordination service, and manage data.
Key Kafka Broker Configuration Parameters:
- `broker.id`: A unique integer identifier for each broker.
- `listeners`: The network interfaces and ports the broker listens on for client connections (e.g., `PLAINTEXT://:9092`).
- `advertised.listeners`: The addresses clients should use to connect to the broker; especially important in distributed or containerized environments.
- `zookeeper.connect` (ZooKeeper mode): The connection string for the ZooKeeper ensemble (e.g., `zk1:2181,zk2:2181,zk3:2181`).
- `process.roles` (KRaft mode): The roles this node plays, e.g., `broker,controller` or `broker`.
- `node.id` (KRaft mode): The unique identifier for the node in KRaft mode.
- `controller.quorum.voters` (KRaft mode): A comma-separated list of `id@host:port` entries for the controller quorum.
- `log.dirs`: The directories where Kafka stores log segments (partition data).
- `num.partitions`: The default number of partitions for auto-created topics.
- `default.replication.factor`: The default replication factor for auto-created topics.
Deployment Considerations
When deploying Kafka brokers and the coordination service, consider scalability, fault tolerance, and monitoring. For production environments, running multiple brokers and coordination nodes is crucial. Tools like Docker, Kubernetes, or managed Kafka services can simplify deployment and management.
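For local experimentation, a single-node KRaft broker can be started with one command, assuming the official `apache/kafka` Docker image (published since Kafka 3.7); this is a development convenience, not a production layout:

```bash
# Single-node KRaft broker for local development (not production)
docker run -d --name kafka -p 9092:9092 apache/kafka:latest
```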
Monitoring and Maintenance
Effective monitoring of Kafka brokers and the coordination service is vital for maintaining a healthy data pipeline. Key metrics include broker health, topic throughput, consumer lag, and ZooKeeper/KRaft quorum status. Regular maintenance, such as log segment management and configuration updates, ensures optimal performance and stability.
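The stock Kafka CLI tools cover several of these checks; for example (the group name is a placeholder, and the quorum tool assumes a KRaft cluster on Kafka 3.3+):

```bash
# Describe a consumer group to see per-partition offsets and consumer lag
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group

# Check KRaft controller quorum status (leader, voters, observers)
bin/kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
```

For continuous monitoring, these point-in-time checks are typically complemented by Kafka's JMX metrics exported to a metrics system.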