(Better) Free SSL Certs

2019-08-01

A brief word on terminology: HTTPS stands for Hypertext Transfer Protocol Secure, but at the start, and for many years, that meant HTTP over Secure Sockets Layer (SSL). SSL has long been deprecated in favor of TLS (Transport Layer Security), but common parlance hasn't changed to match. Throughout this article I use HTTPS, SSL, and TLS interchangeably to refer to TLS.

SSL has been a good idea for a long time, and with all of the major browsers updating their user interfaces to mark non-HTTPS connections as insecure with ever-increasing intensity, it is time to have every site delivered over HTTPS.

There are a few ways to get that SSL cert: you could buy a "cheap" one from GoDaddy for $80/yr, a "high quality" one from DigiCert for $399/yr, or get a free one from Let's Encrypt. It does feel a bit odd to be talking about high and low quality bytes, especially standardized ones, and few enough that they could be printed on a napkin! But I can vouch that DigiCert's checkout process, customer support, and verification efforts feel like the quality you are paying for.

Being the frugal engineer|devops|sysadmin that you are, you set up Certbot, problem solved, and you go happily about your day.

...

Whoops, the SSL cert is expired.

Whoops, the HTTP redirect isn't working right.

Whoops, the cron script failed to renew the SSL cert, so it's expired again.

Whoops, the cert is renewed, but NGINX wasn't restarted so it didn't see the changed cert, so the one it is serving is expired.

Whoops, NGINX was restarted, sort of. It definitely went down... but there was a config error so it didn't come back up, and now the entire site is down.

Testing says: Why isn't testing and staging using SSL?

Your coworker says: Let's move to Docker containers!

Other coworker: We need to add an additional domain to each environment to serve static assets, and it has to be over SSL so that we don't get mixed content warnings.

By now you may have run screaming from the room shouting something about why didn't we just pay the $80 and not have to fix this again! Can you tell that I've done this a few times?

It turns out that the level of automation needed to support automated certs / Let's Encrypt is substantial. To start, you need a web server configured to respond to the ACME challenge. If you are redirecting all traffic to HTTPS, that web server needs to already have an SSL cert. You need to renew that cert on a relatively short (90 day) schedule. You cannot renew too frequently because the requests are rate limited; also, don't lose your cert, because even those free bytes are rate limited. You'll want a limited number of retries, with exponential backoff, to handle transient failures. You'll also need to tell the web server to reload its configuration or otherwise inform it that a new cert is in place, preferably without downtime (PS: for NGINX that is `nginx -s reload`). All of this coordination likely spans multiple machines or Docker containers. Also, did you notice that your has-to-be-rock-solid proxy is no longer stateless, has ad-hoc automation to restart it on demand, and your setup now requires a cron/time-based task system?
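To give a feel for how much glue that paragraph implies, here is a rough Python sketch of just three of those pieces: deciding when to renew, retrying with exponential backoff, and reloading NGINX only after its config validates. The function names, the 30-day renewal window, the 60-second base delay, and the five-attempt limit are all my own assumptions for illustration, not anything Certbot or Let's Encrypt prescribes. It also assumes `certbot` and `nginx` are on the PATH.

```python
import datetime as dt
import subprocess
import time


def needs_renewal(not_after, now, window_days=30):
    """True when the cert expires within the (assumed) renewal window."""
    return not_after - now <= dt.timedelta(days=window_days)


def retry_with_backoff(action, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    The last failure is re-raised so the caller can alert someone.
    """
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def renew_and_reload(run=subprocess.run):
    """Renew via Certbot, then reload NGINX only if its config validates."""
    retry_with_backoff(lambda: run(["certbot", "renew"], check=True))
    # Validate first: a failed `nginx -t` raises here instead of taking the site down.
    run(["nginx", "-t"], check=True)
    run(["nginx", "-s", "reload"], check=True)
```

Even this leaves out the scheduling, the ACME challenge routing, and any coordination across machines, which is rather the point.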

I've always found that developing the automation, even with the existing libraries, Certbot, etc., makes that $80/yr look cheap (and more reliable). But, but... I couldn't leave it at that! Those Let's Encrypt certs are free!

Fear not, what we need is a better proxy, a more modern and dynamic proxy, one with all that automation built in: Traefik.

Traefik is written in Go and boasts performance close to that of NGINX, all the while offering a slew of modern features.

Traefik really nailed it. The next time you are setting up a host give it a spin!