Awesome blog post! I very much enjoy hearing how large web properties implement these technologies and any issues they experience along the way.
Are you using Envoy at all in your main HTTP ingress path? You mentioned HAProxy and AWS ELBs, but it wasn't clear if Envoy is also being considered for public ingress traffic.
We have not yet put Envoy in our main HTTP ingress path, but internally we have designs and implementation paths ready to go, and it's definitely being considered for public ingress traffic. As we noted in the last "teaser" section of the post, we'd really like to leverage Envoy's routing functionality to facilitate migrating client-facing APIs in the backend without affecting frontend interfaces.
Our HAProxy layer that routes ingress traffic to the core backend infrastructure has considerable routing logic that can be moved to Envoy and then further extended. We'd love to explore that path in the coming months.
I look forward to hearing more about your plans for ingress and how the various pieces fit together (CDN, L4/L7 LBs, TLS termination, Geo/policy DNS balancing), especially regarding the performance and new features available with Envoy. I've used HAProxy before and it was great for simple routing/reverse proxying, but not so great at complex/dynamic configuration or cert management.
HAProxy supports quite complex configurations. We've actually found that many of our users are only using its most basic capabilities, so we have been working on increasing our blog content to help them take advantage of some of the more complex configurations that are possible. We've even found that many users are not aware that HAProxy now supports Hitless Reloads [1].
Quite a bit of complex routing and dynamic configuration can be handled with map files [2], and these, along with many other settings, can be updated directly through the Runtime API [3].
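To make that concrete, here's a minimal sketch of driving the Runtime API from a script. The stats socket path, map file path, and backend name are hypothetical; adjust them to whatever your own config uses:

```python
import socket

# Hypothetical paths -- substitute the ones from your HAProxy config.
RUNTIME_SOCKET = "/var/run/haproxy.sock"    # stats socket declared with "level admin"
MAP_FILE = "/etc/haproxy/maps/hosts.map"    # map file referenced by the frontend

def runtime_cmd(command: str) -> str:
    """Send one command to the HAProxy Runtime API socket and return its response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(RUNTIME_SOCKET)
        sock.sendall((command + "\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# Repoint a hostname at a different backend without reloading HAProxy.
print(runtime_cmd(f"set map {MAP_FILE} api.example.com new_api_backend"))
# Inspect the live contents of the map.
print(runtime_cmd(f"show map {MAP_FILE}"))
```

On the config side, the frontend would typically choose its backend by looking the request's Host header up in that same map (e.g. a `use_backend` rule with a `map()` fetch), so changes made over the socket take effect on new requests without a reload.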
With that said, we are actively working to make things even better and intend to introduce support for updating SSL certificates/keys directly through the Runtime API, as well as a Data Plane API for HAProxy.
We have a new release coming any day now, and it will lay the foundation that allows us to continue providing best-in-class performance while accelerating cutting-edge feature delivery.
Yes! HAProxy is a terrific piece of tech, and has been awesome for our use cases so far. We do quite a bit with it for our main ingress routing and it was basically flawless as our data plane in SmartStack.
I'm really excited about what we're building out for next year and can't wait to share as well. Feel free to reach out on reddit (u/wangofchung) or directly at courtney.wang@reddit.com for more in-depth discussion!
1. Have you considered, or are you considering, the Istio control plane for your Envoy fleet? Why or why not?
2. Did you containerize your applications before using Envoy? The blog post talks about running them on autoscaled EC2 instances, but it's not clear if you're running application binaries on those VMs or serving from containers.
1. We are considering Istio! This is especially true for our Kubernetes environment. We are already planning to deploy Pilot for the first iteration of our control plane in our non-K8s environment, so the other pieces that comprise Istio are a natural place for us to continue exploring.
2. We had not containerized prior to Envoy. We're still running application binaries provisioned with Puppet on EC2 for most of our infrastructure.
We run one proxy per machine, even when there are multiple services running on it. The proxy is just an abstraction over the downstream dependencies: even with multiple services per machine, they can all reach downstream services via the same proxy path.
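For illustration only, this is roughly what that shared-proxy path looks like from an application's point of view; the localhost port-per-downstream convention, port numbers, and service names below are made up rather than a description of our actual wiring:

```python
import json
import urllib.request

# Hypothetical local Envoy egress listeners: each downstream dependency is exposed
# on its own localhost port, so callers never need to know where it actually runs.
LOCAL_EGRESS_PORTS = {
    "user-service": 9001,
    "post-service": 9002,
}

def call_downstream(service: str, path: str) -> dict:
    """Reach a downstream dependency through the single proxy shared by this machine."""
    port = LOCAL_EGRESS_PORTS[service]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
        return json.load(resp)

# Every service on the host makes calls the same way; the one Envoy instance
# handles discovery, load balancing, and retries for all of them.
profile = call_downstream("user-service", "/v1/users/123")
```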
Thanks for all the answers, I really appreciate it! I've got one more question.
In the period when parts of your system had Envoy and parts didn't, did you route the outbound traffic from Envoy-equipped services through their local proxy before it reached its Envoy-less destination? Or did you omit Envoy then?
We route all outbound traffic from internal services through Envoy, even if the destination isn't running Envoy. We don't have Envoy running as a "front" proxy right now, i.e. our L4 setup isn't Envoy <-> Envoy, it's Envoy -> service directly. An example of this is the DB layer: traffic going to our DBs from services goes through Envoy on the service side, but Envoy isn't running on our DB instances.
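As a rough sketch of what that means on the service side, the only thing that changes for the application is the address it dials; the hostnames and port below are made up for illustration:

```python
# Before: the application connected straight to the database host (hypothetical name).
DIRECT_DB_URL = "postgresql://app@db-primary.internal:5432/appdb"

# After: the application connects to a local Envoy TCP listener instead; Envoy
# forwards the bytes to the database, which itself runs no Envoy at all.
PROXIED_DB_URL = "postgresql://app@127.0.0.1:6432/appdb"

def database_url(use_local_proxy: bool = True) -> str:
    """Pick the connection string; the DB driver is unaware a proxy is in the path."""
    return PROXIED_DB_URL if use_local_proxy else DIRECT_DB_URL
```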
What are the reasons you chose to do it that way? I mean going for a "back" proxy instead of Envoy <-> Envoy (which seems to be the most "advertised" approach) or a "front" proxy. As I understand it, this way you're losing the Envoy features for your most "shallow" services. Or do you also run Envoy on your ingresses?
The "back" proxy was the initial setup with SmartStack, so we went with that for minimal viable first steps. We wanted to make incremental changes, changing as little as possible, for this migration so we could monitor for correctness and performance at every step. The eventual plan is to run Envoy as a front proxy for ingress, and maybe even Envoy <-> Envoy everywhere, where we have Envoy as both a front and back proxy on every service deployment (instance, container, etc.)
Others have mentioned that there are some gotchas with Envoy, and you mention a few migration bumps in the post. Did you encounter other gotchas? And do you have any suggestions on how to avoid/mitigate their impact?
But as the above indicates, they were resolved _very_ quickly.
The most important thing when making a transition like this is to have as much monitoring and observability as possible independent of the new tech. We were able to quickly identify and respond to issues we had with Envoy based on existing application and system instrumentation that wasn't provided by Envoy itself, along with the vigilance of our engineering team.
Hey, I'm planning to introduce Envoy into an existing mixed Kubernetes/bare-metal architecture myself, with the same "one service at a time" considerations.
Have you been thinking about adopting Istio? If so, why didn't you?
We're currently evaluating the pieces that comprise Istio, both within Kubernetes and outside of it in our existing infrastructure.
We didn't adopt it immediately because we didn't want to update all of our technology at once, and we felt that a piecewise migration would be both the least disruptive to our infrastructure and the safest. I think of Istio like SmartStack in that it's not actually a single complete "thing" so much as a suite of technologies that can be individually evaluated and deployed. It's very easy to fall into the trap of wanting to do everything at once, and we opted to make small, progressive steps for this initiative.
Yeah, I'm thinking of going with only the minimal Istio installation (Pilot only) rather than rolling our own, and setting it up with our existing Consul service discovery.