Kafka Unclean Leader Election

Lydtech

Introduction

The unclean leader election configuration determines whether a replica that is not in-sync with the lead replica can itself become leader in a failure scenario. If this were to happen, any messages that the unclean leader did not have would be lost.

It is therefore important to understand the consequences for the system based on how this parameter is configured, and the associated trade-offs that must be evaluated.

Availability vs Durability

Consider the following Kafka Broker state, with three Broker nodes. The diagram shows that node 1 is the lead replica for a particular topic partition, and the three messages written to the partition, namely ‘foo’, ‘bar’ and ‘xyz’, are being replicated to nodes 2 and 3.

Figure 1

When unclean leader election is enabled, the following scenario results in message loss:

  1. There are three brokers, with Broker 2 in-sync (up to date) with the lead Broker 1.
  2. Broker 3 is lagging behind and is not in-sync, having only Message 1 (foo) and Message 2 (bar).
  3. Broker 1 and Broker 2 fail, leaving only Broker 3 available.
  4. Broker 3 is made leader.
  5. Message 3 (xyz), which had not replicated to Broker 3, is lost.

Allowing the unclean broker to become leader as a last resort increases availability at the expense of durability, as new reads and writes are still possible for the topic partition.

In this failure scenario, if unclean leader election were not enabled, clients would have to wait for an in-sync replica to come back online before they could perform further reads and writes to that topic partition. This configuration therefore favours durability over availability.
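The scenario above can be sketched as a small simulation. This is a simplified model for illustration only, not how the Kafka controller is implemented: the names `Replica`, `elect_leader` and `run_scenario` are hypothetical, and the in-sync replica set is assumed to be a plain set of broker ids.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Replica:
    """Hypothetical model of one broker's copy of a topic partition."""
    broker_id: int
    log: List[str] = field(default_factory=list)
    online: bool = True


def elect_leader(replicas: List[Replica], isr_ids: Set[int],
                 unclean_enabled: bool) -> Optional[Replica]:
    """Return the new leader, or None if the partition must stay offline."""
    # A clean election only considers online replicas that are in-sync.
    for replica in replicas:
        if replica.online and replica.broker_id in isr_ids:
            return replica
    # Last resort: any online replica, but only if unclean election is enabled.
    if unclean_enabled:
        for replica in replicas:
            if replica.online:
                return replica
    return None


def run_scenario(unclean_enabled: bool) -> Tuple[Optional[int], List[str]]:
    """Replay the failure scenario; return (new leader id, lost messages)."""
    messages = ["foo", "bar", "xyz"]
    replicas = [
        Replica(1, list(messages)),  # lead replica, has all three messages
        Replica(2, list(messages)),  # in-sync follower
        Replica(3, messages[:2]),    # lagging follower, missing 'xyz'
    ]
    isr_ids = {1, 2}

    # Brokers 1 and 2 fail, leaving only broker 3 available.
    replicas[0].online = False
    replicas[1].online = False

    leader = elect_leader(replicas, isr_ids, unclean_enabled)
    if leader is None:
        # Partition is unavailable, but nothing has been discarded.
        return None, []
    lost = [m for m in messages if m not in leader.log]
    return leader.broker_id, lost


print(run_scenario(unclean_enabled=True))   # (3, ['xyz'])
print(run_scenario(unclean_enabled=False))  # (None, [])
```

With unclean election enabled the partition stays available under Broker 3 but 'xyz' is gone; with it disabled there is no leader until Broker 1 or Broker 2 returns, and no message is lost.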

Configuration

The configuration parameter is unclean.leader.election.enable. It can be configured on the broker, and can be overridden on a per topic basis by applying the configuration to the topic itself.

The default for the core Apache Kafka Broker and topics is false, meaning durability is favoured over availability. Note that this default was changed: prior to version 0.11.0.0 it was true. Version 0.11.0.0 was released in June 2017, however, so earlier versions are unlikely to still be in use.
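As a rough illustration of how the effective value is arrived at, the sketch below resolves the setting in the order described: a topic-level override first, then the broker-level setting, then the version default. It is a simplified model rather than Kafka's own resolution code. The function names are hypothetical, versions are assumed to be tuples of integers, and configuration values are assumed to be Python booleans rather than the strings Kafka actually stores.

```python
from typing import Dict, Tuple

CONFIG_KEY = "unclean.leader.election.enable"


def default_for_version(version: Tuple[int, ...]) -> bool:
    """Default described in the text: true before 0.11.0.0, false from then on."""
    # Pad short versions such as (0, 11) to four parts before comparing.
    padded = tuple(version) + (0,) * (4 - len(version))
    return padded < (0, 11, 0, 0)


def effective_setting(version: Tuple[int, ...],
                      broker_config: Dict[str, bool],
                      topic_config: Dict[str, bool]) -> bool:
    """Resolve the value that applies to one topic (simplified model)."""
    # A topic-level override wins over everything else.
    if CONFIG_KEY in topic_config:
        return topic_config[CONFIG_KEY]
    # Otherwise use the broker-level setting if one was configured.
    if CONFIG_KEY in broker_config:
        return broker_config[CONFIG_KEY]
    # Otherwise fall back to the default for this Kafka version.
    return default_for_version(version)


# Broker enables unclean election, but a critical topic opts out.
broker = {CONFIG_KEY: True}
print(effective_setting((3, 6, 0), broker, {}))                   # True
print(effective_setting((3, 6, 0), broker, {CONFIG_KEY: False}))  # False
```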

If using a managed Kafka service provider, this default may differ again, so it is vital to check the provider's documentation to confirm that the configuration is as required.

At the time of writing, the Confluent offering defaults to false, while Amazon Managed Streaming for Apache Kafka (MSK) defaults to true.

Confluent defaults: https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html

MSK defaults: https://docs.aws.amazon.com/msk/latest/developerguide/msk-default-configuration.html

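| Kafka Provider           | Unclean Leader Election | Favours                      |
|--------------------------|-------------------------|------------------------------|
| Non-managed (< 0.11.0.0) | true                    | Availability over durability |
| Non-managed (0.11.0.0+)  | false                   | Durability over availability |
| Confluent                | false                   | Durability over availability |
| AWS MSK                  | true                    | Availability over durability |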

Conclusion

For systems that deem message loss unacceptable, it is important to ensure that unclean leader election is not enabled. Where message loss is acceptable and high availability is considered more important, unclean leader election can be enabled. In some cases it may be prudent to enable unclean leader election on some topics and disable it on others, and Kafka provides the flexibility to achieve this.

