Application Configuration

2026-01-19

Let’s look at designing your application’s configuration. It is worth spending time on, because configuration is a common cause of outages, lost time, and bugs.

A good place to start is the Twelve-Factor App configuration guidelines. The Twelve-Factor App patterns have been around for a while and are really robust. If you read and follow all twelve factors, you will be well ahead of the curve.

We can summarize the recommendation as “Store config in the environment”.

Environment variables are well suited to this task. Every programming language has built-in mechanisms to access them, every OS supports them, they are fast, they support introspection, and they are automatically inherited by child processes.

I have encountered a surprising amount of resistance to them. I think it comes from a lack of familiarity. They are out of sight most of the time, and since they are frequently used for configuration, people may only have dealt with them while trying to solve configuration issues. Whatever pain they felt at the time has tainted environment variables by association.

Let’s get familiar with them.

To see the value of an environment variable in the shell, you can echo it, or list everything with env:

$ echo $USER
jordan

$ env
MAIL=/var/mail/jordan
USER=jordan
XDG_SEAT=seat0
XDG_SESSION_TYPE=wayland
SSH_AGENT_PID=3414
BUN_INSTALL=/home/jordan/.bun
EDITOR=emacsclient -t
....

You can, of course, search for a value using something like env | grep -i {query}.

You can set environment variables with export (export SPICE=Berbere) and unset them with unset. You can also set them as key-value pairs on the command line when starting an application, for example FOOD=stew ./recipe-manager. It is a common convention to uppercase them, but the environment system doesn’t care and will store any case.

By default any process you start will inherit your current environment.
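The same inheritance is visible from application code. Here is a minimal sketch in Python, reusing the SPICE and FOOD examples from above: a child process sees the parent’s variables, and handing a single child an extra variable (the equivalent of FOOD=stew ./recipe-manager) is done by passing a copy of the environment with one addition.

```python
import os
import subprocess
import sys

# Start from a known state for the demo: FOOD is not set, SPICE is.
os.environ.pop("FOOD", None)
os.environ["SPICE"] = "Berbere"  # like `export SPICE=Berbere`

# The child is another Python interpreter that reports what it sees.
CHILD = "import os; print(os.environ.get('SPICE'), os.environ.get('FOOD'))"

def run_child(env=None) -> str:
    # env=None means "inherit this process's environment".
    result = subprocess.run(
        [sys.executable, "-c", CHILD], env=env,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

inherited = run_child()
# Like `FOOD=stew ./recipe-manager`: copy the environment and add one
# variable for this child only; the parent's environment is left unchanged.
one_off = run_child({**os.environ, "FOOD": "stew"})

print(inherited)  # Berbere None
print(one_off)    # Berbere stew
```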

You can also inspect the environment of running processes. Here is how you would do that on Linux:

cat /proc/{pid}/environ | tr '\0' '\n'

Replace {pid} with the process ID of the process you want to inspect. The tr reformats it to display as one variable per line.
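If you want the same view from inside a program, a small sketch in Python (Linux only) reads the same file and splits on the NUL separators, which is what the tr does in the shell. The function names here are just illustrative.

```python
import os

def parse_environ(raw: bytes) -> dict[str, str]:
    """Split NUL-separated KEY=VALUE entries, like `tr '\\0' '\\n'` does."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the data ends with a trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

def read_environ(pid: int) -> dict[str, str]:
    """Environment of a running process (Linux only; other users' processes need privileges)."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read())
```

For example, read_environ(os.getpid()) returns this process’s own environment. Keep in mind that the file typically reflects the environment the process was started with; changes a process makes to its own environment afterwards usually do not show up there.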

0.1. What to configure

Let’s look back at the Twelve-Factor App definition of configuration:

An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc).

It is close to a tautology, but that is exactly what belongs in your configuration: only the things that actually change between deployments. You want to minimize both what needs to be configured and, more generally, what can be configured.

This is a really valuable point. Most applications I see have many configuration options, frequently in the hundreds. I try to keep it to fewer than seven. If you are at a baker’s dozen, it is time to clean house.

There is a horrible trap here. Programmers love complexity (we shouldn’t, but we do), and we generally want to make the application ready to handle anything in any environment. Don’t fall into that trap.

Also, slow compilation isn’t a legitimate excuse to move constants out of code and into configuration. The Java community did that, and now they do their high-level programming in XML (which wouldn’t be so bad if they had just kept it to themselves…)

0.2. Convention over configuration

I thought the Convention over configuration argument was settled around 2015, but I’ve seen it crop up again lately. You want conventions. Take away configuration knobs that are unimportant. Personally, I remove things until I am honestly feeling some pain or inconvenience, and then back off just a bit. Or don’t.

0.3. Be defensive

Watch for, and attempt to prevent, application misconfiguration. As a default opening position, I favor sane defaults, and immediate application termination with an error message if a required configuration option isn’t available.

Another reasonable position is that there are no defaults: if it is a configuration option, a value must be provided.
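A minimal sketch of the sane-defaults-plus-fail-fast position in Python. The setting names (DATABASE_URL, PORT, LOG_LEVEL) are hypothetical. Required names with no sensible default go in REQUIRED, and the process exits at once with a message naming what is missing. For the stricter no-defaults position, empty DEFAULTS and list every name in REQUIRED.

```python
import os
import sys
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical settings for illustration.
REQUIRED = ("DATABASE_URL",)                      # no sensible default exists
DEFAULTS = {"PORT": "8080", "LOG_LEVEL": "info"}  # sane defaults

@dataclass(frozen=True)
class Config:
    database_url: str
    port: int
    log_level: str

def load_config(env: Mapping[str, str] = os.environ) -> Config:
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Terminate immediately, naming exactly what is missing.
        sys.exit("configuration error: required variable(s) not set: " + ", ".join(missing))
    values = {**DEFAULTS, **env}  # environment values win over defaults
    return Config(
        database_url=values["DATABASE_URL"],
        port=int(values["PORT"]),
        log_level=values["LOG_LEVEL"],
    )
```

Loading everything into one frozen object at startup also means the rest of the code never reads the environment directly, so the list of knobs stays visible in one place.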

One school of thought hinges on a subtle distinction between application startup and ongoing operation. To illustrate, let’s consider a database connection string. I find it advantageous to treat application startup, the period right after the process starts, as a distinct phase with distinct requirements. This is the point at which I test for the existence of, and access to, file system resources, external commands, database connections, etc. If anything is missing, I can warn or shut down at once. Later, throughout the application, I don’t need to check whether a resource exists; I did all the checks at startup.
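A sketch of the check-everything-at-startup style, assuming the resources are a data directory, an SQLite database file, and some external commands (all stand-ins for whatever your application actually needs). It collects every problem rather than stopping at the first, so one failed start reports everything.

```python
import os
import shutil
import sqlite3
import sys

def startup_checks(data_dir: str, db_path: str, commands: list[str]) -> list[str]:
    """Check every external resource once, at startup; return a list of problems."""
    problems = []
    if not os.path.isdir(data_dir):
        problems.append(f"data directory {data_dir!r} does not exist")
    elif not os.access(data_dir, os.R_OK | os.W_OK):
        problems.append(f"data directory {data_dir!r} is not readable and writable")
    for cmd in commands:
        if shutil.which(cmd) is None:
            problems.append(f"required command {cmd!r} not found on PATH")
    try:
        # mode=rw: open an existing database read-write, never create a new one.
        conn = sqlite3.connect(f"file:{db_path}?mode=rw", uri=True)
        try:
            conn.execute("SELECT 1")
        finally:
            conn.close()
    except sqlite3.Error as exc:
        problems.append(f"cannot open database {db_path!r}: {exc}")
    return problems

def main() -> None:
    # Hypothetical settings; in a real app these come from the loaded config.
    problems = startup_checks(
        os.environ.get("DATA_DIR", "./data"),
        os.environ.get("DB_PATH", "./app.db"),
        ["git"],
    )
    if problems:
        sys.exit("startup checks failed:\n  " + "\n  ".join(problems))
    # From here on, the rest of the application can assume the resources exist.
```

Calling main() from the entry point is the only place these checks run; nothing downstream repeats them.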

Another school of thought is that resources should not be checked until time of use. You might, for example, require that a database connection string be available via configuration, but not check that it works until you actually need to make a database connection. This style is most useful when your service dependency graph is unknowable. For example, your team is responsible for one service at megacorp, and you don’t know who is using your service, or whether any of the services you depend on are using anything that depends on you…
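By contrast, a sketch of the check-at-time-of-use style. SQLite stands in for whatever database you use, and the class and its error message are hypothetical. The connection string is still required when the object is built, but whether it actually works is only discovered on the first query.

```python
import sqlite3
from functools import cached_property

class Database:
    """Holds the configured connection string; only connects on first use."""

    def __init__(self, path: str):
        if not path:
            # The configuration value itself is still required up front...
            raise ValueError("DB_PATH must be set")
        self.path = path

    @cached_property
    def conn(self) -> sqlite3.Connection:
        # ...but whether it works is only discovered here, at time of use.
        return sqlite3.connect(self.path)

    def query(self, sql: str, params=()) -> list:
        return self.conn.execute(sql, params).fetchall()
```

The trade-off is that a bad value surfaces later, in the middle of ordinary operation, so the error handling around query() has to carry the user feedback that a startup check would otherwise provide.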

0.4. User feedback

If the configuration is invalid, supply clear user feedback. If you can, say why/how it is invalid.
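One way to say why and how, sketched in Python with hypothetical PORT and LOG_LEVEL settings: validate everything, collect every problem, and have each message state the rule and the offending value, so a single failed start tells the user all they need to fix.

```python
from collections.abc import Mapping

LOG_LEVELS = ("debug", "info", "warning", "error")

def validate(env: Mapping[str, str]) -> list[str]:
    """Return every problem found, each saying why the value is invalid."""
    errors = []
    port = env.get("PORT", "8080")
    try:
        port_ok = 1 <= int(port) <= 65535
    except ValueError:
        port_ok = False
    if not port_ok:
        errors.append(f"PORT must be an integer from 1 to 65535, got {port!r}")
    level = env.get("LOG_LEVEL", "info")
    if level not in LOG_LEVELS:
        errors.append(f"LOG_LEVEL must be one of {', '.join(LOG_LEVELS)}; got {level!r}")
    return errors
```

At startup you would call validate(os.environ), and if the list is non-empty, print the messages and exit.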

0.5. Managing configuration

You are using env variables to configure your app/service, and you’ve trimmed things down to just what needs to change between deployments. How do you keep track of what needs to be set in dev, test, and prod?

There is a quote from the Twelve-Factor patterns that is often misunderstood.

…requires strict separation of config from code. Config varies substantially across deploys, code does not.

This does not mean you can’t store configuration in git, nor does it mean that you have to use a separate git repository.

Definitely keep things under revision control. Also, monorepos are really good sauce. The alternative, poly-repos or multi-repos, is both a time sink and error-prone as you attempt to keep multiple repos in sync. Put the configuration in a folder that contains only configuration, no code. Use it as the source of truth for your deployments. This doesn’t mean you need to check out the source code on your deployments, only that the source of truth for code and for configuration is unified.

How you write out the configurations to disk is up to you. Dotenv is really popular. I don’t see a good reason to add a dependency for such a trivial task. Sourcing a shell script is generally enough.

You might instead end up with docker-compose.yml or Helm charts. That is OK, if it works for you. Just set things through env vars, and keep the config minimal.

0.6. Avoid inheritance

A common anti-pattern I’ve seen: you get the dev-time config working. Then, for production, you create its config by (automatically) sourcing the dev-time config as a base and overlaying a limited number of production settings.

Don’t do it.

This is a really good way to accidentally break production. You just went through the work above of ensuring sane defaults in the code, or the absence of defaults, and now you are adding defaults back in another way.

You also just made dev-time configuration capable of breaking production. What if production doesn’t set a (newly added) required config option? Yup, it gets the dev-time value. I’ve seen variations on this failure mode so many times. Don’t do it.
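A guard you could run in CI, sketched under a few assumptions: each environment has its own file like config/dev.env or config/prod.env containing simple KEY=value or export KEY=value lines (no shell expansion), and the REQUIRED set is hypothetical. It flags any file that sources another one and any required key a file fails to define on its own.

```python
from pathlib import Path

# Hypothetical: the settings every environment must define for itself.
REQUIRED = {"DATABASE_URL", "PORT", "LOG_LEVEL"}

def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple `KEY=value` or `export KEY=value` lines (no shell expansion)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip().strip("'\"")
    return values

def check_standalone(config_dir: Path) -> dict[str, list[str]]:
    """For each *.env file, list what keeps it from standing on its own."""
    problems = {}
    for path in sorted(config_dir.glob("*.env")):
        text = path.read_text()
        issues = [
            f"sources another file: {line.strip()!r}"
            for line in text.splitlines()
            if line.strip().startswith(("source ", ". "))
        ]
        missing = REQUIRED - set(parse_env_file(text))
        issues += [f"does not define {key}" for key in sorted(missing)]
        if issues:
            problems[path.name] = issues
    return problems
```

A production file that starts with ". ./dev.env" and overrides only PORT would be reported twice over: once for the sourcing line, and once for each required key it silently borrows from dev.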

0.7. I’m still going to use configuration files

Well, if you must you must. A few pointers: don’t invent new configuration file syntax, don’t write your own parser, do use a format that supports comments.

There are many common formats: JSON, TOML, INI, YAML, etc. You can hack comments into JSON either by using one of the extended JSON specs that support comments (Hjson, JSON5, JSONC) or by adding a “comment” field to objects. JSON has the advantage that

YAML is an interesting beast. It is not simple, yet it is almost a DevOps standard. It has native support for way too much: references (anchors and aliases), custom tags, etc., and many tools bolt file includes on top. May you never need to write your own parser for YAML.

An interesting quirk is that YAML (as of version 1.2) is a superset of JSON. This is non-obvious, since most JSON and most YAML look nothing like each other. But any standards-compliant YAML 1.2 parser can parse JSON. So if you want, you can take your YAML config, convert it to JSON, and it will still work.

0.8. Override Conventions

If you support multiple sources of configuration (files, env vars, CLI arguments, etc.), there is an established convention for what happens when they conflict: configuration files are the base, env vars override files, and CLI arguments override everything. The principle is: the closer a source is to the human currently specifying behavior, the more it overrides.
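A minimal sketch of that precedence order in Python, assuming two hypothetical settings (port and log_level), a file that has already been parsed into a dict, and env vars named as the uppercase form of each key. Sources are applied lowest precedence first, so later ones win.

```python
import argparse
from collections.abc import Mapping, Sequence

DEFAULTS = {"port": "8080", "log_level": "info"}  # hypothetical settings

def resolve(file_values: Mapping[str, str], env: Mapping[str, str],
            argv: Sequence[str]) -> dict[str, str]:
    """Merge sources, lowest precedence first: defaults < file < env < CLI."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--port")
    parser.add_argument("--log-level", dest="log_level")
    # Only flags the user actually passed take part in the merge.
    cli = {k: v for k, v in vars(parser.parse_args(argv)).items() if v is not None}

    env_values = {key: env[key.upper()] for key in DEFAULTS if key.upper() in env}

    merged = dict(DEFAULTS)
    for source in (file_values, env_values, cli):
        merged.update(source)  # later sources win
    return merged
```

For example, with port set to 9000 in the file, PORT=9100 in the environment, and --port 9200 on the command line, the result is port 9200; drop the flag and it becomes 9100.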

Please keep the convention. You will make someone want to hunt you down if you don’t.

0.9. Stranger things

Configuration overlaps with data, though most people’s mental models don’t reflect that. Also, the format an application requires doesn’t have to be the format you build or store the configuration in. Both points seem self-evident, yet both are frequently missed.

So if you have a massive blob of “configuration” JSON, which is a pain to edit… break it up into pieces, stick it in a DB, generate it using code, etc. Just because the end result is a massive JSON blob doesn’t mean you need to manage it that way.

If it is “massive”, it more likely belongs in a DB.

Machines are good at translating between formats. Generate the actual configs just in time.
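A sketch of generating a large JSON config just in time from small, editable rows in SQLite. The routes table, its columns, and the hostnames are hypothetical; the point is that people edit and query rows, and code produces the blob the application wants.

```python
import json
import sqlite3

def build_config(conn: sqlite3.Connection) -> str:
    """Generate the JSON document the application wants from small DB rows."""
    rows = conn.execute(
        "SELECT name, upstream, timeout_s FROM routes ORDER BY name"
    ).fetchall()
    return json.dumps(
        {"routes": [{"name": n, "upstream": u, "timeout_s": t} for n, u, t in rows]},
        indent=2,
    )

# Hypothetical example data: each route is one row that is easy to edit and query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE routes (name TEXT PRIMARY KEY, upstream TEXT NOT NULL, timeout_s INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", [
    ("api", "http://api.internal:8000", 30),
    ("static", "http://cdn.internal", 5),
])
print(build_config(conn))
```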

By the same token, if you are creating the application that needs the massive or complicated config, don’t invent a new configuration language, or add extra interpreters or templates on top of YAML. Just use the native language & data structures of the system.

Said another way: don’t add programming features to configuration languages. Use the language the application is written in as the configuration language.
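For an application written in Python, that could be as plain as a module of typed data structures. The Worker fields and region names below are hypothetical. An ordinary loop and conditional do the job that template syntax layered on top of YAML would otherwise do, and the JSON rendering is only needed if some other tool consumes it.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Worker:
    name: str
    queue: str
    concurrency: int

REGIONS = ["us-east", "eu-west", "ap-south"]  # hypothetical

# A plain comprehension replaces templating; typos in field names fail loudly.
WORKERS = [
    Worker(
        name=f"worker-{region}",
        queue=f"jobs-{region}",
        concurrency=8 if region == "us-east" else 4,
    )
    for region in REGIONS
]

# Only if another tool needs a serialized form:
print(json.dumps([asdict(w) for w in WORKERS], indent=2))
```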

SQLite is a great file format.

The end.

May your configurations be always clear & simple.