Service meshes bring great value when running distributed applications at scale on Kubernetes. Many meshes are available nowadays. The usual suspects are Istio and Linkerd, but other meshes have emerged, such as Open Service Mesh (OSM) from Microsoft, which is available as an AKS addon.

The promises of a service mesh are:

- Increased agility, thanks to built-in support for various deployment/testing models
- Increased resilience, thanks to built-in retries, circuit breakers and fault injection (chaos engineering)
- Increased observability
- Increased security, thanks to mTLS and traffic policies
- Enhanced load balancing algorithms that understand the application layer

Every mesh implements the features listed above to a greater or lesser extent. These very handy capabilities come at a cost, since additional compute capacity must be provisioned to accommodate the needs of the mesh. This is mostly because every pod is injected with a sidecar container implementing the ambassador pattern, and each sidecar has its own memory and CPU footprint. That is expected, but you must keep it under control.

Before diving into OSM itself, let's first see what happens when a cluster is under memory pressure: AKS will start killing low-priority pods randomly. Even when cluster/node pool autoscaling is turned on, under high pressure, memory will be reclaimed at the expense of low-priority pods, and you will likely see unpleasant Kubernetes events such as evicted and OOM-killed pods.

Kubernetes gives us tools to control this possible chaos (Pod Priority and Preemption, Disruptions, and Resource Management for Pods and Containers), but whatever you define, you will be in trouble if there is no memory left in the cluster. There are multiple reasons why memory could be at risk in a cluster:

- Not enough worker nodes
- Memory leaks
- Memory consumption peaks

Unlike CPU, memory is not a compressible resource: running out of memory is not an option.

Now that we have seen the impact of excessive overall memory consumption, let's see what you must look at when working with OSM from that perspective. At the time of writing, when enabling OSM on a vanilla AKS cluster, the default mesh config spec is as follows:

```
spec:
  certificate:
    certKeyBitSize: 2048
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    localProxyMode: Localhost
    logLevel: debug
    resources: {}
    tlsMaxProtocolVersion: TLSv1_3
    tlsMinProtocolVersion: TLSv1_2
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    networkInterfaceExclusionList: []
    outboundIPRangeExclusionList: []
    outboundIPRangeInclusionList: []
    outboundPortExclusionList: []
```

Some default settings have a big impact on memory, most notably enableWASMStats and enablePermissiveTrafficPolicyMode. For very small-scale meshes, you might not really see the difference, but as soon as you inject more pods with these default settings, you will see excessive memory consumption.
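If you want to check which values your own cluster is currently running with, the MeshConfig shown above can be dumped with kubectl. The resource and namespace names below assume the AKS addon, which creates osm-mesh-config in kube-system; a self-managed OSM installation typically places it in the osm-system namespace instead:

```
kubectl get meshconfig osm-mesh-config -n kube-system -o yaml
```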
To showcase this excessive consumption, I wrote a simple console program that generates any number of service accounts, deployments and services:

```
using System.Text;

using (StreamWriter sw = new StreamWriter("autogeneratedosm.yaml"))
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < Convert.ToInt32(args[0]); i++)
    {
        // ServiceAccount, used by OSM to issue the service certificate (mTLS)
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: ServiceAccount\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("---\r\n");
        // Deployment, a single replica of the demo API container
        sb.Append("apiVersion: apps/v1\r\n");
        sb.Append("kind: Deployment\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  replicas: 1\r\n");
        sb.Append("  selector:\r\n");
        sb.Append("    matchLabels:\r\n");
        sb.AppendFormat("      app: api{0}\r\n", i);
        sb.Append("  template:\r\n");
        sb.Append("    metadata:\r\n");
        sb.Append("      labels:\r\n");
        sb.AppendFormat("        app: api{0}\r\n", i);
        sb.Append("    spec:\r\n");
        sb.AppendFormat("      serviceAccountName: api{0}\r\n", i);
        sb.Append("      containers:\r\n");
        sb.Append("      - name: api\r\n");
        sb.Append("        image: stephaneey/osmapi:dev\r\n");
        sb.Append("        imagePullPolicy: Always\r\n");
        sb.Append("---\r\n");
        // Service, so that Kubernetes programs the iptables rules for the pods
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: Service\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: apisvc{0}\r\n", i);
        sb.Append("  labels:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.AppendFormat("    service: apisvc{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  ports:\r\n");
        sb.Append("  - port: 80\r\n");
        sb.Append("    name: http\r\n");
        sb.Append("  selector:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.Append("---\r\n");
    }
    sw.Write(sb.ToString());
}
```

The resulting YAML manifest contains any number of triples: a ServiceAccount (used by OSM for mTLS), a Deployment (deploying the application containers into pods, which OSM injects with its sidecar) and a Service, which is simply there to let Kubernetes program the Linux iptables rules.

With the default settings, and more particularly when enablePermissiveTrafficPolicyMode is set to true, all meshed services can talk to each other without any restriction. Only non-meshed services are denied, because they do not present a client certificate issued by OSM. While we may think that this setting only impacts security, it also impacts memory. Indeed, when enablePermissiveTrafficPolicyMode is used in conjunction with enableWASMStats, the impact on memory is huge.

If we generate 60 ServiceAccount/Deployment/Service triples with the console program and deploy them without OSM, memory consumption is rather low: each .NET API pod consumes between 17 and 21 megabytes. Now, let's run these three commands:

```
kubectl scale deploy --all --replicas=0 -n osmdemo
osm namespace add osmdemo
kubectl scale deploy --all --replicas=1 -n osmdemo
```

We first stop all our APIs, then we ask OSM to monitor the osmdemo namespace, and we finally restart all of our deployments. Running the following command:

```
kubectl top pod -n osmdemo
```

quickly reveals excessive memory consumption, especially compared with the initial 17-21 MB baseline. This can quickly lead your cluster to the chaotic situation described earlier.

The memory killer is enableWASMStats, which enables live collection of the metrics exposed by the Envoy sidecar; with this setting turned on, an API is available to extract those metrics. Turning off this single setting is enough to come back to a "normal" memory consumption. However, by doing so you won't have the metrics anymore, so you are losing functionality.
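For illustration, the flag can be flipped with a merge patch against the MeshConfig; as before, the resource and namespace names assume the AKS addon:

```
kubectl patch meshconfig osm-mesh-config -n kube-system --type merge \
  -p '{"spec":{"featureFlags":{"enableWASMStats":false}}}'
```

Depending on the setting, already-injected pods may need to be restarted to pick up the new sidecar configuration.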
Let's turn enableWASMStats on again and disable enablePermissiveTrafficPolicyMode. With that configuration, memory consumption remains low and metrics are still collected, but services cannot talk to each other anymore. The only way communication can be authorized is through the HTTPRouteGroup and TrafficTarget resource types. When all services can talk to each other, the number of possible routes is huge, while when you explicitly define those routes, you only declare the ones that are really needed, which results in much less information for the mesh to keep track of.

Bottom line, if you want to keep memory under control with OSM, here are the combinations:

- enableWASMStats: true, enablePermissiveTrafficPolicyMode: true ==> bad
- enableWASMStats: true, enablePermissiveTrafficPolicyMode: false ==> ok
- enableWASMStats: false, enablePermissiveTrafficPolicyMode: false ==> ok
- enableWASMStats: false, enablePermissiveTrafficPolicyMode: true ==> ok

You just need to avoid setting both enableWASMStats and enablePermissiveTrafficPolicyMode to true, or else plan for a huge memory capacity. The winning combination is probably enableWASMStats: true with enablePermissiveTrafficPolicyMode: false, because you keep memory under control while enforcing a stronger security posture.

If you are unsure about what you plan to do, another precautionary measure is to define resource requests and limits for the sidecar. OSM makes this possible through the following section of the mesh config (a filled-in example is sketched below):

```
sidecar:
  resources: {}
```

However, keep in mind that defining low limits when both enableWASMStats and enablePermissiveTrafficPolicyMode are set to true will inevitably lead to meshed pods being killed, but you will at least prevent non-meshed pods from being evicted by Kubernetes.
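As a sketch, and assuming the sidecar resources section follows the usual Kubernetes requests/limits structure, a filled-in version could look like the snippet below; the values are purely illustrative and should be tuned against your own kubectl top measurements:

```
sidecar:
  resources:
    requests:
      cpu: 50m       # illustrative value only
      memory: 64Mi   # illustrative value only
    limits:
      memory: 256Mi  # illustrative value only
```

Setting a memory request also helps the scheduler account for the sidecars when placing pods, which is exactly the kind of headroom planning that avoids the memory-pressure scenario described at the beginning of this post.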