
Kafka tools

Overview

Apache Kafka is a distributed streaming platform for moving real-time data between systems or applications and for building applications that transform or react to those streams. Kafka runs as a cluster on one or more servers that can span multiple data centers. The cluster stores streams of data records in categories called topics. Each data record consists of a key, a value, and a timestamp.

Data Management's Kafka Input and Output tools let you move data records into and out of Kafka topics. These tools support a subset of Kafka functionality:

  • Support for auto-commit events only; no manual commits are allowed.

  • Support for serialization/deserialization of String, Integer, Long, Short, Bytes, Float, and Double data types.

  • No support for externally-managed offsets.

  • No support for transactions.

  • No support for manual partitioning.

  • No support for topics with heterogeneous Avro schemas.

  • Avro payloads with embedded schemas cannot be consumed by the Kafka Input tool because the Avro Input tool cannot parse an embedded schema from field input.
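As an illustration of what the supported data types look like on the wire, the following Python sketch mimics the byte layouts produced by Kafka's standard serializers: big-endian, fixed-width encodings for the numeric types and UTF-8 for strings. The names `FORMATS`, `serialize`, and `deserialize` are hypothetical helpers for this example only; they are not part of Data Management or of any Kafka client.

```python
import struct

# struct format codes for big-endian, fixed-width numeric encodings.
# Assumption: these mirror Kafka's stock Short/Integer/Long/Float/Double serializers.
FORMATS = {
    "Short": ">h",    # 2 bytes
    "Integer": ">i",  # 4 bytes
    "Long": ">q",     # 8 bytes
    "Float": ">f",    # 4 bytes, IEEE 754
    "Double": ">d",   # 8 bytes, IEEE 754
}


def serialize(type_name, value):
    """Encode a value as bytes according to the named data type."""
    if type_name == "String":
        return value.encode("utf-8")
    if type_name == "Bytes":
        return bytes(value)
    return struct.pack(FORMATS[type_name], value)


def deserialize(type_name, data):
    """Decode bytes back into a value according to the named data type."""
    if type_name == "String":
        return data.decode("utf-8")
    if type_name == "Bytes":
        return bytes(data)
    return struct.unpack(FORMATS[type_name], data)[0]
```

In this sketch, reading a value with a type other than the one it was written with (for example, decoding a 4-byte Integer as a Long) raises `struct.error`, which is why the deserializer chosen on the reading side should match the serializer used by the writer.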

Kafka tool shared settings

Data Management's Kafka tools use shared settings, which let you define a single set of configuration properties (typically access credentials) to share across multiple tools in your Data Management Site. You can override these settings on a per-tool basis by opening the Shared settings section on the tool's Properties pane, selecting Override shared settings, and specifying values for that specific tool.

To define Kafka shared tool settings

  1. Open the Tools folder under Settings in the repository.

  2. Select the Kafka tab.

  3. Go to the Properties pane.

  4. Configure the tool properties for your environment:

Property

Description

Bootstrap server

A comma-separated list of one or more host:port pairs giving the addresses of the Kafka brokers that a client contacts first to bootstrap its connection to the cluster, for example localhost:9092,another.host:9092.

Advanced settings

While Data Management only exposes a small subset of configuration options on the Kafka Input and Output tool Property pages, you can optionally define other options by specifying name/value pairs in the format:

auto.commit.interval.ms=1000
acks=all
retries=0
batch.size=16384
linger.ms=1
buffer.memory=33554432
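The two property formats described above (a comma-separated list of host:port pairs, and one name=value pair per line) can be sketched in Python as follows. This is only an illustration of the formats; how Data Management itself parses these fields is not documented here, and the function names `parse_bootstrap_servers` and `parse_advanced_settings` are hypothetical.

```python
def parse_bootstrap_servers(text):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    servers = []
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the last colon so the port is always the final component.
        host, _, port = pair.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {pair!r}")
        servers.append((host, int(port)))
    return servers


def parse_advanced_settings(text):
    """Turn newline-separated name=value lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {line!r}")
        settings[name.strip()] = value.strip()
    return settings
```

For example, `parse_advanced_settings("acks=all\nlinger.ms=1")` returns a dictionary with the keys `acks` and `linger.ms`. Values are kept as strings because the sketch does not know which type each Kafka option expects.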

To configure default shared tool settings from a Kafka Input or Output tool's Properties pane, open the Shared settings section, and select Edit default settings.

To override Kafka tool shared settings

  1. Select the desired Kafka tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Open the Shared settings section, select Override shared settings, and specify new values for the tool.

Kafka Input

The Kafka Input tool reads events from one or more Kafka topics and outputs them to a single M (Message/Events) connector. These events have the following format.

Field    Description
Key      The partition key of this event, represented as a TextVar.
Offset   The offset of this event, a sequential ID number that uniquely identifies the record within the partition.
Value    The event value, represented as a TextVar.

Kafka Input tool configuration parameters

The Kafka Input tool has a single set of configuration parameters in addition to the standard execution options.

Parameter

Description

Topics

Comma-separated list of topics that events should be read from.

Batch size

Specifies how many events the consumer will buffer before committing offsets to the topic. Adjust this property to balance performance and robustness. Smaller batch sizes may reduce throughput, while larger batch sizes may reduce the project’s durability in the event of an abnormal exit. Note that a very large batch size may exceed available memory.

Group ID

The name of the consumer group to which this input tool consumer belongs.

Auto offset

Defines the behavior when an existing offset cannot be found:

  • Latest: (default) begin reading from the topic's most recent offset.

  • Earliest: begin reading from the topic's oldest offset.

  • None: fail with an error if no existing offset can be found.

Poll timeout

The duration (in seconds) the consumer should wait for records to become available on the subscribed topics.

Key deserializer

Key field data type. One of String, Integer, Long, Short, Bytes, Float, or Double.

Value deserializer

Value field data type. One of String, Integer, Long, Short, Bytes, Float, or Double.

Event limit

The number of events the tool will read before exiting.

Time limit

The amount of time (in seconds) the tool will read events before exiting.

Override shared settings

If selected, uses the Bootstrap server and Advanced settings defined in the tool properties rather than the Kafka tool shared settings defined in the repository.
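The Auto offset options in the table above can be read as a small decision rule. The Python sketch below models that rule for a single partition; it is a simplified illustration rather than the tool's actual logic, and the function name `resolve_start_offset` and its parameters are hypothetical.

```python
def resolve_start_offset(committed, earliest, latest, auto_offset="Latest"):
    """Pick the offset at which a consumer starts reading one partition.

    committed: the existing offset for the consumer group, or None if none exists.
    earliest:  the oldest offset still available in the partition.
    latest:    the most recent offset in the partition.
    """
    if committed is not None:
        return committed  # an existing offset always takes precedence
    if auto_offset == "Latest":
        return latest
    if auto_offset == "Earliest":
        return earliest
    if auto_offset == "None":
        raise LookupError("no existing offset found for this consumer group")
    raise ValueError(f"unknown Auto offset value: {auto_offset!r}")
```

Note that Auto offset only matters when no offset exists for the Group ID. A group that has already read from the topic resumes from its existing offset regardless of this setting.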

Configure the Kafka Input tool

Before configuring a Kafka tool, you should have a Kafka connection defined in shared settings.

  1. Select the Kafka Input tool.

  2. Select the Configuration tab.

  3. Specify the comma-separated list of Topics that events will be read from.

  4. Optionally, edit Batch size.

  5. Specify Group ID as the name of the consumer group to which this input tool consumer belongs.

  6. Optionally, select Auto offset and select the tool's behavior when an existing offset can't be found:

    • Latest: begin reading from the topic's most recent offset.

    • Earliest: begin reading from the topic's oldest offset.

    • None: fail with an error if no existing offset can be found.

  7. Optionally, edit Poll timeout (amount of time in seconds the consumer waits for records to become available on the subscribed topics).

  8. Select Key deserializer and Value deserializer data types.

  9. Optionally, edit Event limit (number of events the tool will read before exiting) and Time limit (amount of time in seconds the tool will read events before exiting).

  10. Optionally, override shared settings.

  11. Optionally, go to the Execution tab, and then set Web service options.
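To show how Batch size, Event limit, and Time limit interact, here is a minimal Python simulation of a read loop. It is a sketch under stated assumptions, not the tool's implementation: `poll` is a hypothetical callable standing in for a Kafka consumer poll (it returns an empty list when the poll timeout expires), and "committing" is reduced to remembering the offset of the last event in each full batch.

```python
import time


def read_events(poll, batch_size, event_limit=None, time_limit=None,
                clock=time.monotonic):
    """Read events until the event limit or time limit is reached.

    poll: hypothetical callable returning a list of (offset, key, value)
          tuples, or an empty list if nothing arrived before the poll timeout.
    Returns (events, committed), where committed is the offset of the last
    event covered by a commit, or None if no full batch was committed.
    """
    started = clock()
    events, committed, pending = [], None, 0
    while True:
        if event_limit is not None and len(events) >= event_limit:
            break
        if time_limit is not None and clock() - started >= time_limit:
            break
        for event in poll():
            events.append(event)
            pending += 1
            if pending >= batch_size:
                committed = event[0]  # commit once a full batch is buffered
                pending = 0
            if event_limit is not None and len(events) >= event_limit:
                break
    return events, committed
```

In this model, any events read after the last commit would be read again if the project exited abnormally and restarted with the same Group ID. That is the trade-off behind Batch size: a larger value means fewer commits but more events to reprocess after a failure.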

Kafka Output

The Kafka Output tool writes events to a Kafka topic. The tool expects two fields on its single input connector:

  • A key field, designated by the Key field parameter.

  • An event field, designated by the Event field parameter.

The tool has two output connectors: Success (S) and Failure (F).

The Success output connector emits records that were successfully sent to the Kafka topic.

Field    Type                                          Description
Key      Specified by the Key serializer property.     Event key
Value    Specified by the Value serializer property.   Event value
Offset   Integer(8)                                    Event’s topic offset

The Failure output connector emits records that could not be sent to the Kafka topic.

Field       Type                                         Description
Key         Specified by the Key serializer property     Event key
Value       Specified by the Value serializer property   Event value
Exception   TextVar                                      Description of the error that prevented the record from being sent
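The split between the Success and Failure connectors can be pictured with a short Python sketch. It assumes a hypothetical `send` callable that returns the offset assigned to a record or raises an exception when the record cannot be sent; the dictionary keys follow the field names listed above, and nothing here reflects the tool's internal code.

```python
def send_all(records, send):
    """Route each (key, value) record to a success or failure list.

    send: hypothetical callable that returns the assigned topic offset,
          or raises an exception when the record cannot be sent.
    """
    success, failure = [], []
    for key, value in records:
        try:
            offset = send(key, value)
        except Exception as exc:
            # Failure records carry the error description instead of an offset.
            failure.append({"Key": key, "Value": value, "Exception": str(exc)})
        else:
            success.append({"Key": key, "Value": value, "Offset": offset})
    return success, failure
```

Because failed records keep their original Key and Value, a downstream step connected to the Failure output has what it needs to log the error or retry the send.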

Kafka Output tool configuration parameters

The Kafka Output tool has one set of configuration parameters in addition to the standard execution options.

Parameter

Description

Topics

Comma-separated list of topics that events should be written to.

Batch size

Specifies how many events the producer will buffer before sending them to the topic. Adjust this property to balance performance and robustness. Smaller batch sizes may reduce throughput, while larger batch sizes may reduce the project’s durability in the event of an abnormal exit. Note that a very large batch size may exceed available memory.

Key field

The partition key for the event. If present, Kafka will hash the key and use it to assign the event to a partition. If absent, Kafka will distribute events across all partitions.

Key serializer

Key field data type. One of String, Integer, Long, Short, Bytes, Float, or Double.

Value field

The field from which the event will be read.

Value serializer

Value field data type. One of String, Integer, Long, Short, Bytes, Float, or Double.

Override shared settings

If selected, uses the Bootstrap server and Advanced settings defined in the tool properties rather than the Kafka tool shared settings defined in the repository.
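The effect of the Key field on partition assignment can be illustrated with the following Python sketch. It deliberately substitutes CRC32 for the hash, which is not the hash Kafka uses (the Java client's default partitioner hashes the key bytes with murmur2), and it uses plain round-robin for keyless events, whereas real clients may batch keyless events differently. The function name `make_partitioner` is hypothetical.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping an optional key to a partition number."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key_bytes=None):
        if key_bytes is None:
            # No key: spread events across all partitions in turn.
            return next(round_robin)
        # Stand-in hash for illustration only: CRC32, NOT Kafka's own hash.
        return zlib.crc32(key_bytes) % num_partitions

    return choose
```

The property that matters is that events with the same key always land in the same partition, so their relative order is preserved, while events without a key carry no such guarantee.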

Configure the Kafka Output tool

Before configuring a Kafka tool, you should have a Kafka connection defined in shared settings.

  1. Select the Kafka Output tool.

  2. Go to the Configuration tab on the Properties pane.

  3. Specify the comma-separated list of Topics that events will be written to.

  4. Optionally, edit Batch size.

  5. Optionally specify the Key field, or leave blank to distribute events across all partitions.

  6. Specify the Key serializer data type: String, Integer, Long, Short, Bytes, Float, or Double.

  7. Specify the Event field.

  8. Specify the Value serializer data type: String, Integer, Long, Short, Bytes, Float, or Double.

  9. Optionally, override shared settings.

  10. Optionally, go to the Execution tab, and then set Web service options.
