Kafka Unclean Leader Election

Introduction

The unclean leader election configuration determines whether a replica that is not in-sync with the lead replica can itself become leader in a failure scenario. If this happens, any messages that the unclean leader does not have are lost.

It is therefore important to understand the consequences for the system based on how this parameter is configured, and the associated trade-offs that must be evaluated.

Availability vs Durability

Consider the following Kafka Broker state, with three Broker nodes. The diagram shows that node 1 is the lead replica for a particular topic partition, and the three messages written to the partition, namely ‘foo’, ‘bar’ and ‘xyz’, are being replicated to nodes 2 and 3.

When unclean leader election is enabled the following scenario results in message loss:

Three brokers, with broker 2 in-sync (up to date) with the lead broker 1.
Broker 3 is lagging behind and is not in-sync, having only messages (foo) and (bar).
Brokers 1 and 2 fail, leaving only broker 3 available.
Broker 3 is made leader.
Message (xyz) which had not replicated to broker 3 is lost.

Allowing the out-of-sync broker to become leader as a last resort keeps the topic partition available for new reads and writes, increasing availability at the expense of durability.

In this failure scenario, if unclean leader election were not enabled, clients would have to wait for an in-sync replica to come back online before they could perform further reads and writes to that topic partition. This configuration therefore favours durability over availability.
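
The failure scenario can be sketched as a toy model in both configurations. This is only an illustration under simplifying assumptions: the Replica class and the elect_leader and run_scenario functions are hypothetical names invented for this example, and the logic is a simplification rather than Kafka's actual controller implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Replica:
    """Hypothetical stand-in for one broker's copy of the partition."""
    broker_id: int
    log: List[str]
    online: bool = True


def elect_leader(replicas: List[Replica], isr: Set[int],
                 unclean_enabled: bool) -> Optional[int]:
    """Return the new leader's broker id, or None if the partition stays offline."""
    # Prefer any live replica that is still in the in-sync set.
    for replica in replicas:
        if replica.online and replica.broker_id in isr:
            return replica.broker_id
    # Last resort: a live out-of-sync replica, only if unclean election is allowed.
    if unclean_enabled:
        for replica in replicas:
            if replica.online:
                return replica.broker_id
    return None


def run_scenario(unclean_enabled: bool) -> Tuple[Optional[int], List[str]]:
    """Replay the scenario from the article; return (new leader, lost messages)."""
    written = ["foo", "bar", "xyz"]
    replicas = [
        Replica(1, list(written)),   # lead replica
        Replica(2, list(written)),   # in-sync follower
        Replica(3, ["foo", "bar"]),  # lagging follower, not in-sync
    ]
    isr = {1, 2}

    # Brokers 1 and 2 fail, leaving only broker 3 available.
    replicas[0].online = False
    replicas[1].online = False

    leader = elect_leader(replicas, isr, unclean_enabled)
    if leader is None:
        # Partition is unavailable, but nothing has been lost.
        return None, []
    new_log = next(r.log for r in replicas if r.broker_id == leader)
    lost = [message for message in written if message not in new_log]
    return leader, lost


if __name__ == "__main__":
    print(run_scenario(unclean_enabled=True))   # (3, ['xyz'])
    print(run_scenario(unclean_enabled=False))  # (None, [])
```

With unclean election enabled, the model elects broker 3 and reports (xyz) as lost. With it disabled, no leader is elected and the partition stays unavailable until an in-sync replica returns.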

Configuration

The configuration parameter is unclean.leader.election.enable. It can be configured on the broker, and can be overridden on a per topic basis by applying the configuration to the topic itself.
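
The precedence between the broker setting and a topic override might be pictured as follows. This is a minimal sketch, not Kafka's own code: effective_setting and topic_override_command are hypothetical helpers, the topic names and the localhost:9092 address are assumptions, and the second helper only builds an argument list for the kafka-configs.sh tool that ships with Kafka without running it.

```python
CONFIG_KEY = "unclean.leader.election.enable"


def effective_setting(broker_config, topic_overrides, topic):
    """Return the value that applies to a topic: the override if set, else the broker value."""
    override = topic_overrides.get(topic, {})
    if CONFIG_KEY in override:
        return override[CONFIG_KEY]
    # Fall back to the broker; False mirrors the Apache Kafka default since 0.11.0.0.
    return broker_config.get(CONFIG_KEY, False)


def topic_override_command(topic, enabled, bootstrap="localhost:9092"):
    """Build (but do not run) a kafka-configs.sh call that sets the topic override."""
    value = "true" if enabled else "false"
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,   # assumed broker address
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", f"{CONFIG_KEY}={value}",
    ]


if __name__ == "__main__":
    broker = {CONFIG_KEY: False}
    overrides = {"metrics": {CONFIG_KEY: True}}  # hypothetical topic name

    print(effective_setting(broker, overrides, "metrics"))   # True
    print(effective_setting(broker, overrides, "payments"))  # False
    print(" ".join(topic_override_command("metrics", True)))
```

Here the hypothetical metrics topic tolerates message loss and opts in to unclean election, while every other topic inherits the broker value.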

The default configuration for the core Apache Kafka Broker and topics is false, meaning durability is favoured over availability. Note that this default changed in version 0.11.0.0; earlier versions defaulted to true. As 0.11.0.0 was released in June 2017, those earlier versions are unlikely to still be in use.

If using a managed Kafka service provider this default may again vary, so it is vital to check the documentation to determine whether the configuration is as required.

At the time of writing, the Confluent offering defaults to false, while Amazon Managed Streaming for Apache Kafka (MSK) defaults to true.

Confluent defaults: https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html

MSK defaults: https://docs.aws.amazon.com/msk/latest/developerguide/msk-default-configuration.html

Kafka Provider	Unclean Leader Election	Favours
Non-managed (< 0.11.0.0)	true	Availability over durability
Non-managed (0.11.0.0+)	false	Durability over availability
Confluent	false	Durability over availability
AWS MSK	true	Availability over durability

Conclusion

For systems that deem message loss unacceptable, it is important to ensure that unclean leader election is not enabled. Where message loss is acceptable and high availability is considered more important, unclean leader election can be enabled. In some cases it may be prudent to enable unclean leader election on some topics and disable it on others, and Kafka provides the flexibility to achieve this.

