If you run Python inside containers, chances are you have seen Linux’s OOMKiller working …
I recently fall into a trap using Traefik as the ingress controller in one cluster. I decided to write about it with hopes it maybe help someone else.
We got the architecture like this:
Cloudflare -> Traefik LoadBalancer -> Traefik Pods -> App Pods
Traefik was running in a node pool that had only non-preemptible machines. The app pods were running in a mix of preemptible and not preemtible.
We also have other things running, so, in reality, we had a lot of nodes running neither the app nor Traefik pods, just
kube-system stuff, monitoring, batch jobs and so on.
Clients would sometimes report that Cloudflare returned a HTTP 520.
520 is not specified by any spec, but is used by Cloudflare to indicate “unknown connection issues between Cloudflare and the origin web server”.
We though maybe something was up on our app, or maybe even on Traefik instances. We looked into our logs and monitoring, but couldn’t find anything relevant.
I was pretty much losing my hair and was sure something was going on with Traefik and that for some reason I couldn’t find what it was.
Because of that, I decided to try to create a GCE ingress — which translates to GCE Load balancer directly.
Then it hit me… after days of looking into things that had nothing to do with the problem, as usually, it was something obvious (once you get it).
Traffic inside a Kubernetes cluster
Basically, any node can route to a pod running in any other node.
That is handled by
That, plus the following screenshot made clicked me:
Even though we had Traefik running only on non-preemptible nodes, all nodes, including the preemtible ones, were receiving traffic from the load balancer.
From there it is straightforward to figure out what can happen:
- Client access
- LB randomly routes to a preemptible node
- Request is still going
- Node is preempted
- Request is terminated mid way (we have some slow requests for several reasons)
- Cloudflare doesn’t know what the hell happened, and they have a status code for it: 520
So, what we need, ideally, is to have only the nodes that are actually running Traefik to be “green” on the load balancer, all others must fail the healthchecks.
After digging a bit, I found out about a setting called
The default for it is
Cluster, which makes the traffic being NATed. The main reason to change this is, usually, because you want to use
traefik.ingress.kubernetes.io/whitelist-source-range to filter traffic, but, since by default the requests get NATed, you can’t, because you won’t get the correct client IP.
Local has two effects:
- traffic is no longer NATed — which means
- oh, also: only the nodes that have Traefik running in then will pass the load balancer healthchecks.
Which, yeah, fixes our issue.
Literally one line of code…
After hours and hours investigating…
It seems to me
Local should be the default option instead of
Cluster, as it seems to be what you will usually want — maybe this is an old default they don’t want to change, didn’t get into it.
Local in a live cluster will cause some requests to fail. On some clusters it was only for a few seconds, in another one it took almost 1 minute to stabilize.