Deploying code has a cost. It can take time, may require an application restart, and (hopefully) has safeguarding processes that add friction. That’s where dynamic configuration comes into play. As the name suggests, dynamic configuration is the ability to change the behavior of a system on the fly. This is incredibly useful for things like feature flags, dev-ops switches, network routing, and customizing behavior for different customers. Most companies have dynamic configuration of some kind, and some even develop their own systems for it, such as Netflix’s Archaius or Twitter’s ConfigBus.
In this blog post, I’ll talk about one particular tool we’ve used for dynamic configuration since very early on at Amplitude. We lovingly refer to it as DynConf.
What is DynConf?
At its core, DynConf is a wrapper around a DynamoDB table call that supports getting string key-value pairs. The wrapper simply adds a layer of local caching with a periodic refresh on any fetched keys, and also provides some type casting and defaulting of the fetched string values.
Besides the fantastic wordplay opportunity, there were several reasons for implementing DynConf this way:
- Reliability: Because it’s just a wrapper, DynConf inherits all the robustness of DynamoDB for free.
- Code Simplicity: There was existing code to communicate with DynamoDB, making DynConf very simple to write, use, and think about.
- Operational Simplicity: We didn’t have to worry about being able to scale or manage any clusters of hosts.
- Flexibility: With creativity, a generic key-value store can be adapted for almost every dynamic configuration use case (albeit, not optimally).
The initial design of DynConf was pretty simple.
Overall, we valued the simplicity and low operational overhead for the very small team at the time.
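To make the description above more concrete, here is a minimal Python sketch of a wrapper in that spirit. It is not Amplitude's implementation: the class name `DynConf`, the `fetch` callable, the `refresh_seconds` parameter, and the method names are all assumptions made for illustration. `fetch` stands in for the DynamoDB table call and is assumed to return the string value for a key, or `None` if the key is missing.

```python
import threading
import time
from typing import Callable, Dict, Optional


class DynConf:
    """Hypothetical sketch: cached, periodically refreshed string key-value lookups."""

    def __init__(self, fetch: Callable[[str], Optional[str]], refresh_seconds: float = 30.0):
        self._fetch = fetch                    # stand-in for the DynamoDB table call
        self._refresh_seconds = refresh_seconds
        self._cache: Dict[str, Optional[str]] = {}
        self._lock = threading.Lock()

    def start(self) -> None:
        """Start a daemon thread that periodically refreshes fetched keys."""
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def _refresh_loop(self) -> None:
        while True:
            time.sleep(self._refresh_seconds)
            try:
                self.refresh()
            except Exception:
                pass  # keep serving the last known values if a refresh fails

    def refresh(self) -> None:
        """Re-fetch every key that has been requested so far."""
        with self._lock:
            keys = list(self._cache)
        for key in keys:
            value = self._fetch(key)
            with self._lock:
                self._cache[key] = value

    def get_str(self, key: str, default: str) -> str:
        with self._lock:
            cached = key in self._cache
            value = self._cache.get(key)
        if not cached:
            value = self._fetch(key)           # first access goes to the backing store
            with self._lock:
                self._cache[key] = value
        return default if value is None else value

    def get_int(self, key: str, default: int) -> int:
        raw = self.get_str(key, "")
        try:
            return int(raw)
        except ValueError:
            return default                     # missing or malformed values fall back

    def get_bool(self, key: str, default: bool) -> bool:
        raw = self.get_str(key, "").strip().lower()
        if raw in ("true", "1"):
            return True
        if raw in ("false", "0"):
            return False
        return default


# Usage: a plain dict stands in for the DynamoDB table.
table = {"query.timeout_ms": "500"}
conf = DynConf(table.get)
timeout_ms = conf.get_int("query.timeout_ms", default=1000)
```

In this sketch, only keys that have been requested at least once are refreshed, so a process pays only for the settings it actually reads. A caller who wants background refresh would call `conf.start()` once at startup.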
How is it Used?
At Amplitude, DynConf settings usually fall into one of the following patterns:
- General toggles for behavior
- Numeric configuration
- Targeted toggles
- Customized configuration (depending on access pattern)
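As a rough illustration of how these four patterns might map onto string values in a key-value store, consider the sketch below. The key names, the comma-separated encoding for targeted toggles, and the JSON encoding for customized configuration are all assumptions, not a description of Amplitude's actual conventions.

```python
import json
from typing import Dict, Set

# Hypothetical settings as they might be stored: every value is a string.
settings: Dict[str, str] = {
    "new_query_engine.enabled": "true",                    # general toggle
    "query.max_concurrent": "32",                          # numeric configuration
    "new_charts.enabled_orgs": "101,202,303",              # targeted toggle
    "query.overrides": '{"202": {"timeout_ms": 60000}}',   # customized configuration
}


def is_enabled(key: str) -> bool:
    """General toggle: on only if the stored value is 'true'."""
    return settings.get(key, "false").strip().lower() == "true"


def get_number(key: str, default: int) -> int:
    """Numeric configuration: parse an integer, falling back to a default."""
    try:
        return int(settings[key])
    except (KeyError, ValueError):
        return default


def is_enabled_for(key: str, org_id: int) -> bool:
    """Targeted toggle: on only for org IDs in a comma-separated list."""
    raw = settings.get(key, "")
    enabled: Set[str] = {part.strip() for part in raw.split(",") if part.strip()}
    return str(org_id) in enabled


def get_override(key: str, org_id: int) -> dict:
    """Customized configuration: per-org JSON object, empty if none is set."""
    try:
        overrides = json.loads(settings.get(key, "{}"))
    except json.JSONDecodeError:
        return {}
    if not isinstance(overrides, dict):
        return {}
    return overrides.get(str(org_id), {})
```

Each helper falls back to the "off" or default behavior when a key is missing or malformed, so a bad value in the store degrades to the old code path rather than raising in the middle of a request.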
Here are some of the most valuable use cases we were able to cover using these patterns.
Feature Flags for New Behavior
General toggles in DynConf are extremely helpful when releasing a new feature or behavior, since they allow quick and easy rollback without needing to wait several minutes for a re-deploy.
We also often use targeted toggles to enable behavior for internal dogfooding and early beta testing.
Migrating or upgrading services often involves staged rollout of behavior, where deploying each time may be disruptive.
For example, if load is a concern for a new service, we use a numeric configuration to control the rate at which requests are redirected to the new service and observe performance metrics under production load.
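One way such a rate-controlled redirect could work is sketched below. This is an assumption about the mechanism, not Amplitude's actual routing code: the function names, the use of a hashed request ID, and the percentage encoding are all hypothetical. Hashing the ID (rather than drawing a random number) keeps the decision stable for a given request ID as the rate is raised.

```python
import hashlib


def bucket(request_id: str, buckets: int = 10_000) -> int:
    """Map a request ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def use_new_service(request_id: str, rollout_percent: float) -> bool:
    """Route to the new service if the request's bucket falls under the rollout rate."""
    percent = min(max(rollout_percent, 0.0), 100.0)   # clamp out-of-range values
    return bucket(request_id) < percent * 100          # 10,000 buckets = 0.01% steps


# rollout_percent would be read from dynamic configuration on each request,
# for example from a hypothetical key such as "new_service.rollout_percent".
routed = sum(use_new_service(f"req-{i}", 25.0) for i in range(10_000))
```

With the rate at 0 nothing is redirected, and at 100 everything is, so the same setting doubles as the rollback switch.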
Tuning Performance Configuration
When working with performance, it isn’t always clear what the best cutoffs or settings will be, especially for production load and data patterns. By making certain timeouts, limits, and thresholds numeric configurations, we are able to more quickly find the right trade-offs.
As a B2B company, we often have to support special behavior for a very small set of customers, such as special query semantics or temporary overrides. Targeted toggles and customized configuration are great for this.
What We Learned
After using DynConf for many years, we’ve realized some important strengths and weaknesses.
Managing dynamic configuration settings is hard
In general, it’s easy to accumulate random dynamic configuration scattered around the code. As a result, important information like what values keys are set to, what a given key does, or even what keys exist and matter can become tribal (or lost) knowledge.
Because DynConf is incredibly simple, it doesn’t really organize this information. This prompted us to make a basic internal admin tool for listing keys, finding keys, and setting configuration values. In addition to reducing the likelihood of mistakes when setting values, the tool also records changes to a MySQL database to track historical setting values.
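The core of such a tool can be quite small. The sketch below is a hypothetical illustration only: a plain dict stands in for the DynamoDB table, an in-memory SQLite database stands in for the MySQL database, and the table, column, and function names are invented for the example.

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple

config_store: Dict[str, str] = {}            # stand-in for the DynamoDB table
history = sqlite3.connect(":memory:")         # stand-in for the MySQL database
history.execute(
    "CREATE TABLE config_changes ("
    " config_key TEXT, old_value TEXT, new_value TEXT,"
    " changed_by TEXT, changed_at REAL)"
)


def set_value(key: str, new_value: str, changed_by: str) -> None:
    """Write a setting and append a row describing the change."""
    old_value: Optional[str] = config_store.get(key)
    config_store[key] = new_value
    history.execute(
        "INSERT INTO config_changes VALUES (?, ?, ?, ?, ?)",
        (key, old_value, new_value, changed_by, time.time()),
    )
    history.commit()


def find_keys(substring: str) -> List[str]:
    """List existing keys whose name contains the given text."""
    return sorted(k for k in config_store if substring in k)


def changes_for(key: str) -> List[Tuple[Optional[str], str, str]]:
    """Return (old_value, new_value, changed_by) for a key, oldest first."""
    rows = history.execute(
        "SELECT old_value, new_value, changed_by FROM config_changes"
        " WHERE config_key = ? ORDER BY rowid",
        (key,),
    )
    return rows.fetchall()
```

Routing every write through one function like `set_value` is what makes the history trustworthy: a change made directly against the table would bypass the log.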
The admin tool isn’t perfect though. We still occasionally run into issues where developers are confused by “abnormal” behavior caused by a DynConf setting they didn’t know about. Truly solving the problem would require investing in a more sophisticated system for managing dynamic configuration information.
Not everything makes sense as a DynConf
DynConf is incredibly flexible, but a specialized tool can often be a better choice.
You’ll get it into the wall, but there might be a better way…
There are several broad types of feature flags and toggles, and the simplicity and lack of structure in DynConf makes it unsuitable for some of them. For example, we’ve created a separate system for releasing new features to end-users and for managing the complexities of what feature offerings a customer might have.
On the other hand, the reliability of DynConf continues to make it the tool of choice for dev-ops kill switches in emergency situations. For example, we have a general toggle to disable real-time computation under heavy query load and a toggle to swap over to a backup Kafka cluster in our ingestion pipeline.
DynConf’s value lies in speed of iteration and peace of mind
Due to its reliability and flexibility in handling complex rollouts, DynConf makes it easy to release new behavior with a built-in rollback switch. This greatly mitigates risk when making changes to critical services, which means worriers like me can save several days or weeks of over-validating and over-testing before being confident enough to deploy.
Similarly, as a small team with limited resources, being able to dynamically tune performance configurations against conditions in the production environment lets us quickly find “good enough” settings and move on, knowing we can always easily retweak them if needed.
And, best of all, there’s been little overhead around scaling or availability of DynConf. The only major change has been extra caching layers using Redis or DAX to reduce DynamoDB costs.
While a bike isn’t as nice as a car, it’ll still get you from A -> B way faster than walking. It’s easier to assemble too!
All things considered, DynConf gave us a massive boost in development velocity for a very small amount of investment.
Over time, we’ll probably keep seeing configurations move off of DynConf, and managing its growing complexity will need further effort. Still, this simple, early investment into dynamic configuration has had a massive impact on the engineering team’s velocity getting to where we are today.
So, if you don’t have anything yet, adding something simple like Dyn(amic)Conf(ig) backed by your favorite reliable key-value store might be worth your time.