Customer Flexibility

Within Engineering@DataNexus, we work hard to build a data platform that works for all of our customers, regardless of size or complexity. One of our first internal engineering requirements is ubiquity: whether our customers run on AWS, Azure, OpenStack, or bare metal, our platform looks, feels, and behaves the same. To that end, we are huge fans of process automation, whether it's creating VPCs and networks, spinning virtual machines up or down, or configuring our data platform layer. We rely heavily on Ansible to stand up and configure our application and data layer, which consists of components such as Kafka, Cassandra, Postgres, and Elasticsearch, and to handle operational tasks such as wiring data sources to data destinations.

As most folks would expect, we organize each layer into one or more Git repositories. Our Kafka code is responsible for installing and configuring Apache Kafka, as well as both the Confluent enterprise and community editions. That encompasses Apache projects such as ZooKeeper and the Kafka brokers, plus Confluent-specific technologies such as Control Center and KSQL. Then we have higher-level code that handles TLS encryption and authentication, regardless of whether we installed these components ourselves or are tuning an existing cluster (more on that in a separate post).

Where it gets a little tricky is when we apply one of our next requirements: we should not touch our application Ansible code, regardless of customer environment complexity. Loosely, that means the entire platform is installable and configurable through a few customer-defined YAML files. To enforce that, we separate the YAML files into Git repos that are under our control, customer control, or joint control. Those YAML files are then included within the playbooks as optionally defined variables.
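
As a rough sketch of what "optionally defined" can look like in a playbook, the fragment below loads a customer file only if it exists. The kafka_broker host group and the customer_config_dir variable are hypothetical names chosen for illustration, and this is one common Ansible idiom rather than necessarily the exact mechanism we use:

```yaml
# Hypothetical playbook fragment; group name and paths are illustrative.
- hosts: kafka_broker
  become: true
  pre_tasks:
    - name: Load customer overrides when the file is present
      ansible.builtin.include_vars:
        file: "{{ item }}"
      with_first_found:
        - files:
            # customer_config_dir is an assumed variable pointing at the
            # checked-out customer repo.
            - "{{ customer_config_dir }}/kafka_broker.yml"
          # skip: true makes a missing file a no-op instead of an error.
          skip: true
  roles:
    - kafka
```

Because a missing file is skipped rather than treated as a failure, the role has to cope with every customer variable being undefined.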

Here's a simple example around topic deletion in Kafka. With a default Confluent Kafka install, topic deletion is enabled. In production, however, that behavior may be less desirable, especially in heavily regulated environments. Fortunately, this is easily configurable via a straightforward (and optional) flag in the /etc/kafka/server.conf file:

delete.topic.enable=false

The delete.topic.enable line is not present in the default installation, so if a customer wants topic deletion turned off, we append it on every broker. If they want to state explicitly that it's enabled (to avoid confusion), we also append the line with the appropriate value. Otherwise, we leave the server.conf file in a minimally changed state. That touches on another of our internal rules: unless otherwise specified, make the minimal number of changes needed to achieve the desired outcome. It makes debugging, tuning, and upgrades much simpler.

Here is how we handle that within our Ansible code. Within each customer Git repo, we have a kafka_broker.yml file that may (or may not) have a line that says:

topic_deletion: false
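
To make the options concrete, here is a sketch of the three stances a customer file can take. The comments are illustrative, and a real file would contain at most one of these lines:

```yaml
# kafka_broker.yml in a customer repo: pick at most one of the following.

# 1. Explicitly disable topic deletion on every broker.
topic_deletion: false

# 2. Explicitly state that topic deletion is enabled.
# topic_deletion: true

# 3. Take no stance: omit the key entirely, and the broker
#    configuration file is left untouched.
```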

Remember, we want our code to always do the smart thing, regardless of whether that variable is defined or left out. Within our platform kafka role, we set a variable hierarchy within roles/kafka/defaults/main.yml that looks something like the following:

kafka:
  config:
    topic_deletion: "{{ (topic_deletion is defined) | ternary(topic_deletion, none) }}"

In other words, the kafka.config.topic_deletion variable only has a usable value if the customer has taken a stance on topic deletion; otherwise it resolves to none. We can make use of this fact within the task that actually modifies the broker configuration:

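- name: Set delete.topic.enable only when the customer has taken a stance
  lineinfile:
    path: /etc/kafka/server.conf
    regexp: '^delete\.topic\.enable='
    # Lowercase the value so a YAML boolean renders as true/false, not True/False.
    line: "delete.topic.enable={{ kafka.config.topic_deletion | string | lower }}"
  when:
    - kafka.config.topic_deletion is defined
    - kafka.config.topic_deletion is not none
    - kafka.config.topic_deletion | string | trim != ''
  notify: restart broker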

In a nutshell, if the variable is defined and has a value, we set the specified value within /etc/kafka/server.conf and notify our broker handler to bounce the service when ready. This code stays the same regardless of how we choose to implement the variable definition, which means fewer changes to already-tested code as our platform evolves.
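
For completeness, the handler that the task notifies could look roughly like the sketch below. The file location follows the standard role layout, and the confluent-kafka service name is an assumption that depends on how Kafka was packaged; substitute whatever unit name your install uses:

```yaml
# roles/kafka/handlers/main.yml (sketch; the service name is assumed)
- name: restart broker
  ansible.builtin.service:
    name: confluent-kafka
    state: restarted
```

Handlers run once at the end of the play no matter how many tasks notify them, so several configuration changes on the same broker still result in a single restart.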

This pattern has proven very effective for building a flexible platform that has handled many deployment scenarios.

