Creating and Managing Kafka Topics
Topics are the fundamental unit of organization in Apache Kafka. They represent a category or feed name to which records are published. Understanding how to create and manage topics is crucial for building robust real-time data pipelines.
What is a Kafka Topic?
A Kafka topic is a named stream of records: a category or feed name to which producers publish and from which consumers read. Think of a topic as a named channel or a log file where related messages are stored; producers send messages to specific topics, and consumers subscribe to topics to receive them.
Each topic is divided into one or more partitions, and each partition is an ordered, immutable sequence of records in which every record is identified by an offset. This partitioning enables parallel processing and is key to Kafka's high throughput, scalability, and fault tolerance.
Creating a Kafka Topic
Topics can be created manually using the Kafka command-line tools or programmatically. When a producer attempts to send a message to a non-existent topic, Kafka can be configured to automatically create the topic (auto-topic creation). However, manual creation offers more control over topic configurations.
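Auto-topic creation is governed by a broker-level setting. A minimal server.properties fragment for disabling it, which is common in production so that every topic is created deliberately with explicit settings:

```properties
# server.properties
# Disable automatic topic creation so all topics are created explicitly,
# with deliberate partition and replication settings.
auto.create.topics.enable=false
```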
Manual Topic Creation (Command Line)
The kafka-topics.sh script, found in the bin/ directory of a Kafka installation, is the standard command-line tool for creating topics.
Example command:
bin/kafka-topics.sh --create --topic my-new-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
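Per-topic configuration overrides can also be supplied at creation time with repeated --config flags. A sketch, assuming a broker at localhost:9092 and a hypothetical topic name:

```
bin/kafka-topics.sh --create --topic user-events \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 1 \
  --config retention.ms=86400000 \
  --config cleanup.policy=delete
```

Here retention.ms=86400000 shortens retention to one day for this topic only, without touching the broker-wide default.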
Key Topic Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
| partitions | The number of partitions for the topic. More partitions allow higher parallelism but increase overhead. | 1 |
| replication.factor | The number of copies of each partition maintained across brokers for fault tolerance. Must not exceed the number of brokers. | 1 |
| cleanup.policy | How old data is removed: 'delete' removes old segments based on time or size; 'compact' keeps only the latest value for each key. | delete |
| segment.bytes | The maximum size of a log segment file before a new segment is rolled. | 1073741824 (1 GB) |
| retention.ms | How long data is retained in a partition before deletion (when cleanup.policy is 'delete'). | 604800000 (7 days) |
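As a quick sanity check on the defaults above, the 7-day retention period works out to 604800000 milliseconds:

```shell
# 7 days expressed in milliseconds: days * hours * minutes * seconds * 1000
retention_ms=$((7 * 24 * 60 * 60 * 1000))
echo "$retention_ms"   # prints 604800000
```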
Managing Existing Topics
Kafka provides tools to describe, list, alter, and delete topics. These operations are essential for maintaining and optimizing your Kafka cluster.
Describing and Listing Topics
You can view details about existing topics or list all topics in the cluster.
List all topics:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Describe a specific topic:
bin/kafka-topics.sh --describe --topic my-new-topic --bootstrap-server localhost:9092
Altering Topics
Certain topic configurations can be altered after creation, such as increasing the number of partitions or changing retention policies. Note that decreasing partitions is generally not supported.
Example: Increase partitions for a topic:
bin/kafka-topics.sh --alter --topic my-new-topic --bootstrap-server localhost:9092 --partitions 5
Increasing the number of partitions is a common way to scale throughput, but it cannot be undone, and for keyed messages it changes which partition a given key maps to, so per-key ordering is not preserved across the change.
Deleting Topics
Topics can be deleted using the kafka-topics.sh script with the --delete flag.
Example: Delete a topic:
bin/kafka-topics.sh --delete --topic my-new-topic --bootstrap-server localhost:9092
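Deletion only takes effect if it is enabled on the brokers. The relevant server.properties setting, which is true by default in modern Kafka:

```properties
# server.properties
# When false, topics cannot be deleted via the admin tools.
delete.topic.enable=true
```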
Because partition counts can only be increased, the only way to end up with fewer partitions is to delete the topic and recreate it with the desired count.
Best Practices for Topic Management
Choose the right number of partitions based on your expected throughput and consumer parallelism. Set an appropriate replication factor for fault tolerance. Configure retention policies carefully to balance data availability and storage costs.
A Kafka topic is logically divided into partitions. Each partition is an ordered, immutable sequence of records. Producers append records to partitions, and consumers read from partitions. The number of partitions determines the maximum parallelism for consumers reading from that topic. A replication factor ensures that each partition has copies on multiple brokers, providing fault tolerance. If a broker fails, another broker with a replica can take over.
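The mapping from a record key to a partition can be sketched as a hash modulo the partition count. This is a simplified illustration only (Kafka's default partitioner actually uses murmur2 hashing of the serialized key), and the key and partition count here are hypothetical:

```shell
# Simplified key-to-partition mapping: hash(key) mod num_partitions.
# cksum stands in for Kafka's murmur2 hash; the principle is the same:
# the same key always lands in the same partition, preserving per-key order.
key="user-42"
num_partitions=3
hash=$(printf '%s' "$key" | cksum | awk '{print $1}')
partition=$((hash % num_partitions))
echo "key=$key -> partition $partition"
```

Because the assignment is deterministic, all records with the same key are read back in the order they were written, which is the ordering guarantee Kafka provides per partition.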