Creating and Managing Topics

Learn about Creating and Managing Topics as part of Real-time Data Engineering with Apache Kafka

Topics are the fundamental unit of organization in Apache Kafka. They represent a category or feed name to which records are published. Understanding how to create and manage topics is crucial for building robust real-time data pipelines.

What is a Kafka Topic?

A Kafka topic is a named stream of records: producers write records to a topic, and consumers read records from it. Each topic is divided into one or more partitions, and each partition is an ordered, immutable sequence of records in which every record is identified by its offset.

Think of a topic as a named channel, or a log file, where related messages are stored. Producers send messages to specific topics, and consumers subscribe to topics to receive them.

Partitions can be spread across brokers and read in parallel, which is key to Kafka's throughput and scalability. Replicating each partition across brokers provides fault tolerance.
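The log-file analogy can be made concrete with a small shell sketch. It is a toy model, not how Kafka stores data: one temporary file stands in for a single partition, each line is a record, and a record's offset is its zero-based line position. The names append_record and read_from_offset are made up for this illustration.

```shell
#!/bin/sh
# Toy model of ONE partition: an append-only file where each line is a record
# and the offset is the zero-based line position. Not Kafka's real storage format.
PARTITION_LOG="$(mktemp)"

# Append a record to the end of the log and print the offset it was assigned.
append_record() {
  printf '%s\n' "$1" >> "$PARTITION_LOG"
  echo $(( $(wc -l < "$PARTITION_LOG") - 1 ))
}

# Print every record from the given offset onward, in order.
read_from_offset() {
  tail -n "+$(( $1 + 1 ))" "$PARTITION_LOG"
}

append_record "order-created"    # assigned offset 0
append_record "order-paid"       # assigned offset 1
append_record "order-shipped"    # assigned offset 2

# A consumer that has already processed offset 0 resumes at offset 1.
read_from_offset 1
```

Records are only ever added at the end, so existing offsets never change. This is what lets a consumer resume from a remembered position.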

Creating a Kafka Topic

Topics can be created manually with the Kafka command-line tools or programmatically. Kafka can also create a topic automatically the first time a producer sends a message to a topic that does not exist, provided the broker setting auto.create.topics.enable is enabled. Auto-created topics receive the broker's default partition count and replication factor, so explicit creation offers more control over topic configuration.

What is the primary purpose of a Kafka topic?

To serve as a named stream or feed for records, enabling producers to publish and consumers to subscribe to data.

Manual Topic Creation (Command Line)

The kafka-topics.sh script is commonly used to manage topics. To create a topic, you specify the topic name, the number of partitions, and the replication factor.

Example command:

bash
bin/kafka-topics.sh --create --topic my-new-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
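In provisioning scripts it is often convenient for topic creation to be safe to re-run. A minimal sketch, assuming the commands run from the Kafka installation directory: the --if-not-exists flag makes --create succeed without changes when the topic already exists. The helper name create_topic_if_missing, the KAFKA_TOPICS and BOOTSTRAP variables, and the topic name orders are placeholders chosen for this example.

```shell
#!/bin/sh
# Assumptions: the Kafka CLI tools live under ./bin and a broker listens on
# localhost:9092. Both can be overridden through the environment.
KAFKA_TOPICS="${KAFKA_TOPICS:-bin/kafka-topics.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Hypothetical helper: create a topic unless it already exists.
# Arguments: <topic> <partitions> <replication-factor>
create_topic_if_missing() {
  "$KAFKA_TOPICS" --create --if-not-exists \
    --topic "$1" \
    --bootstrap-server "$BOOTSTRAP" \
    --partitions "$2" \
    --replication-factor "$3"
}

# Example call, left commented out so the file can be sourced without side effects:
# create_topic_if_missing orders 3 1
```

Keeping the tool path in a variable is a design choice for portability, since installation layouts differ between distributions.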

Key Topic Configuration Parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| partitions | The number of partitions for the topic. More partitions allow for higher parallelism but can increase overhead. | 1 |
| replication.factor | The number of copies of each partition that will be maintained across brokers for fault tolerance. Must be less than or equal to the number of brokers. | 1 |
| cleanup.policy | Determines how old data is removed. Common policies are 'delete' (removes old segments based on time or size) and 'compact' (keeps the latest value for each key). | delete |
| segment.bytes | The maximum size of a log segment file before it is rolled over. | 1 GB |
| retention.ms | The amount of time Kafka will retain data in a topic partition before it is deleted (if cleanup.policy is 'delete'). | 604800000 ms (7 days) |
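Topic-level settings from the table can be supplied at creation time with repeated --config options. The sketch below also converts a retention period in days to the millisecond value that retention.ms expects. The helper names, the variables, and the topic name clickstream are illustrative assumptions.

```shell
#!/bin/sh
KAFKA_TOPICS="${KAFKA_TOPICS:-bin/kafka-topics.sh}"   # assumed install layout
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"              # assumed broker address

# Convert whole days to milliseconds, the unit used by retention.ms.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Hypothetical helper: create a topic with time-based deletion.
# Arguments: <topic> <partitions> <replication-factor> <retention-days>
create_topic_with_retention() {
  "$KAFKA_TOPICS" --create --topic "$1" \
    --bootstrap-server "$BOOTSTRAP" \
    --partitions "$2" --replication-factor "$3" \
    --config cleanup.policy=delete \
    --config retention.ms="$(days_to_ms "$4")"
}

days_to_ms 7    # prints 604800000, the 7-day default

# Example call (commented out): keep data for 3 days.
# create_topic_with_retention clickstream 3 1 3
```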

Managing Existing Topics

Kafka provides tools to describe, list, alter, and delete topics. These operations are essential for maintaining and optimizing your Kafka cluster.

Describing and Listing Topics

You can view details about existing topics or list all topics in the cluster.

List all topics:

bash
bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Describe a specific topic:

bash
bin/kafka-topics.sh --describe --topic my-new-topic --bootstrap-server localhost:9092
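Two optional flags are handy when inspecting a cluster. --exclude-internal hides Kafka's internal topics (such as __consumer_offsets) from the listing, and --under-replicated-partitions limits --describe output to partitions whose replicas are not all in sync. A sketch with hypothetical helper names, assuming the same install layout and broker address as the commands above:

```shell
#!/bin/sh
KAFKA_TOPICS="${KAFKA_TOPICS:-bin/kafka-topics.sh}"   # assumed install layout
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"              # assumed broker address

# List topics, hiding Kafka's internal ones.
list_user_topics() {
  "$KAFKA_TOPICS" --list --exclude-internal --bootstrap-server "$BOOTSTRAP"
}

# Show only partitions that are currently under-replicated, across all topics.
show_under_replicated() {
  "$KAFKA_TOPICS" --describe --under-replicated-partitions --bootstrap-server "$BOOTSTRAP"
}

# Example calls (commented out):
# list_user_topics
# show_under_replicated
```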

Altering Topics

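Some topic settings can be changed after creation, such as increasing the number of partitions or changing the retention period. Decreasing the number of partitions is not supported. Per-topic configuration such as retention is changed with the kafka-configs.sh tool rather than kafka-topics.sh.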
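Retention and other per-topic settings are altered through kafka-configs.sh. The following sketch sets, inspects, and removes a topic-level override. The helper names and the 3-day value (259200000 ms) are illustrative, and the path and broker address are assumptions.

```shell
#!/bin/sh
KAFKA_CONFIGS="${KAFKA_CONFIGS:-bin/kafka-configs.sh}"   # assumed install layout
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"                 # assumed broker address

# Set or change a topic-level override. Arguments: <topic> <key=value>
set_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config "$2"
}

# Show the overrides currently set on a topic. Arguments: <topic>
show_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" --describe
}

# Remove an override so the broker default applies again. Arguments: <topic> <key>
unset_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --delete-config "$2"
}

# Example calls (commented out):
# set_topic_config my-new-topic retention.ms=259200000
# show_topic_config my-new-topic
# unset_topic_config my-new-topic retention.ms
```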

Example: Increase partitions for a topic:

bash
bin/kafka-topics.sh --alter --topic my-new-topic --bootstrap-server localhost:9092 --partitions 5

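Increasing the number of partitions is a common way to scale throughput, but it cannot be undone: the partition count can never be reduced afterward. Existing records stay in their original partitions, and records with keys may map to different partitions than before, so per-key ordering across the change is not guaranteed.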
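The effect on keyed data can be seen with a toy partitioner. Kafka's default partitioner picks a partition for a keyed record from a hash of the key modulo the partition count. In the sketch below small integers stand in for key hashes, and partition_for is a hypothetical name. Going from 3 to 5 partitions, as in the command above, moves each of the sample keys to a different partition.

```shell
#!/bin/sh
# Toy stand-in for a hash partitioner: partition = key hash mod partition count.
# Arguments: <key-hash> <partition-count>
partition_for() {
  echo $(( $1 % $2 ))
}

for key_hash in 7 10 14; do
  echo "hash $key_hash: partition $(partition_for "$key_hash" 3) of 3, partition $(partition_for "$key_hash" 5) of 5"
done
```

Records written before the change remain where they were, so a consumer that relies on all records for one key living in a single partition may need to account for the switch.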

Deleting Topics

Topics can be deleted using the kafka-topics.sh script. This operation is irreversible and removes all data associated with the topic.

Example: Delete a topic:

bash
bin/kafka-topics.sh --delete --topic my-new-topic --bootstrap-server localhost:9092
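For cleanup scripts, the --if-exists flag lets the delete command succeed even when the topic is already gone. A small sketch follows. delete_topic_if_present is a hypothetical helper name, and the path and broker address are assumptions.

```shell
#!/bin/sh
KAFKA_TOPICS="${KAFKA_TOPICS:-bin/kafka-topics.sh}"   # assumed install layout
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"              # assumed broker address

# Hypothetical helper: delete a topic only if it exists. Arguments: <topic>
# Deletion cannot be undone, so call this deliberately.
delete_topic_if_present() {
  "$KAFKA_TOPICS" --delete --if-exists \
    --topic "$1" \
    --bootstrap-server "$BOOTSTRAP"
}

# Example call (commented out):
# delete_topic_if_present my-new-topic
```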

What is a key limitation when altering Kafka topics?

You cannot decrease the number of partitions for an existing topic.

Best Practices for Topic Management

Choose the right number of partitions based on your expected throughput and consumer parallelism. Set an appropriate replication factor for fault tolerance. Configure retention policies carefully to balance data availability and storage costs.
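One widely used rule of thumb for a starting partition count divides the target throughput by the throughput a single partition sustains on the producer side and on the consumer side, and takes the larger result. The sketch below applies it with made-up numbers. Per-partition throughput has to be measured on your own cluster, and the function names are hypothetical.

```shell
#!/bin/sh
# Integer ceiling division. Arguments: <numerator> <denominator>
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Suggest a starting partition count.
# Arguments: <target MB/s> <producer MB/s per partition> <consumer MB/s per partition>
suggest_partitions() {
  by_producer=$(ceil_div "$1" "$2")
  by_consumer=$(ceil_div "$1" "$3")
  if [ "$by_producer" -gt "$by_consumer" ]; then
    echo "$by_producer"
  else
    echo "$by_consumer"
  fi
}

suggest_partitions 100 10 25   # prints 10
```

Because the partition count can only grow, the result is better treated as a lower bound to revisit than as a final answer.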

A Kafka topic is logically divided into partitions. Each partition is an ordered, immutable sequence of records. Producers append records to partitions, and consumers read from partitions. The number of partitions determines the maximum parallelism for consumers reading from that topic. A replication factor ensures that each partition has copies on multiple brokers, providing fault tolerance. If a broker fails, another broker with a replica can take over.
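These relationships can be turned into a quick sanity check before creating a topic. The sketch assumes a hypothetical check_topic_plan helper that takes the broker count, partition count, replication factor, and the number of consumers in one consumer group. It relies on two facts: within a group each partition is read by at most one consumer, and a replication factor of N leaves a copy available after up to N - 1 broker failures.

```shell
#!/bin/sh
# Arguments: <brokers> <partitions> <replication-factor> <consumers-in-group>
check_topic_plan() {
  # Each replica of a partition must live on a different broker.
  if [ "$3" -gt "$1" ]; then
    echo "error: replication factor $3 exceeds broker count $1"
    return 1
  fi
  # Consumers beyond the partition count stay idle.
  if [ "$4" -lt "$2" ]; then active="$4"; else active="$2"; fi
  echo "active consumers: $active"
  echo "broker failures tolerated: $(( $3 - 1 ))"
}

# 3 brokers, 6 partitions, replication factor 3, 8 consumers in the group.
check_topic_plan 3 6 3 8
```

With these sample numbers, two of the eight consumers would receive no partitions, which suggests either adding partitions or running fewer consumers.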
