Preventing a polynomial memory consumption effect with Open Service Mesh's Envoy sidecar


Guest stephaneeyskens

Service meshes have great value when running distributed applications at scale on K8s. Many meshes are available nowadays: the usual suspects are Istio and Linkerd, but other meshes have emerged, such as Open Service Mesh (OSM) from Microsoft. OSM is available as an AKS add-on. The promises of a service mesh are:

  • Increased agility, thanks to built-in support for various deployment/testing models
  • Increased resilience, thanks to built-in retries, circuit breakers and fault injection (chaos engineering)
  • Increased observability
  • Increased security, thanks to mTLS and traffic policies
  • Enhanced load balancing algorithms that understand the application layer

 

All meshes implement, to a greater or lesser extent, all of the features listed above.

These very handy capabilities come at a cost, since additional compute capacity must be provisioned to accommodate the needs of the mesh. This is mostly because every pod is injected with a sidecar container that implements the ambassador pattern, and each sidecar has its own memory and CPU footprint.

This makes sense, but you must keep it under control. Before diving into OSM itself, let's first look at what happens when a cluster is under memory pressure:

[Screenshot: a cluster under memory pressure]

AKS will start killing low-priority pods randomly. Even when cluster/node pool autoscaling is turned on, under high pressure, memory will be reclaimed at the expense of low-priority pods. You will likely see unpleasant K8s events such as the ones below:

[Screenshot: K8s events showing pods being killed under memory pressure]

K8s gives us tools to control this possible chaos (Pod Priority and Preemption, Disruptions, and Resource Management for Pods and Containers), but whatever you define, you'll be in trouble if there is no memory left in the cluster.

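As a refresher on what those tools look like, here is a minimal sketch combining a PriorityClass with resource requests and limits; the names and values are purely illustrative:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical         # hypothetical class name
value: 1000000                    # higher value = less likely to be preempted
globalDefault: false
description: "Pods that must survive memory pressure"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-api              # hypothetical pod
spec:
  priorityClassName: business-critical
  containers:
  - name: api
    image: stephaneey/osmapi:dev  # the demo image used later in this post
    resources:
      requests:
        memory: "64Mi"
        cpu: "100m"
      limits:
        memory: "128Mi"
        cpu: "250m"
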
There are multiple reasons why memory could be at risk in a cluster:

 

 

 

  • Not enough worker nodes
  • Memory leaks
  • Memory consumption peaks

 

Unlike CPU, memory is not a compressible resource: running out of memory is NOT an option.

Now that we have seen the impact of excessive overall memory consumption, let's see what you must look at when working with OSM from that perspective. At the time of writing, when enabling OSM on a vanilla AKS cluster, the default mesh config spec is as follows:

spec:
  certificate:
    certKeyBitSize: 2048
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    localProxyMode: Localhost
    logLevel: debug
    resources: {}
    tlsMaxProtocolVersion: TLSv1_3
    tlsMinProtocolVersion: TLSv1_2
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    networkInterfaceExclusionList: []
    outboundIPRangeExclusionList: []
    outboundIPRangeInclusionList: []
    outboundPortExclusionList: []

Some default settings have a big impact on memory, in particular enableWASMStats and enablePermissiveTrafficPolicyMode. For very small-scale meshes, you might not really see the difference, but as soon as you inject more pods with these default settings, you will see excessive memory consumption. To showcase this, I wrote a small console program that generates any number of service accounts, deployments and services:

using System.Text;

using (StreamWriter sw = new StreamWriter("autogeneratedosm.yaml"))
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < Convert.ToInt32(args[0]); i++)
    {
        // ServiceAccount, used by OSM for mTLS
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: ServiceAccount\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("---\r\n");

        // Deployment running the demo API container
        sb.Append("apiVersion: apps/v1\r\n");
        sb.Append("kind: Deployment\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: api{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  replicas: 1\r\n");
        sb.Append("  selector:\r\n");
        sb.Append("    matchLabels:\r\n");
        sb.AppendFormat("      app: api{0}\r\n", i);
        sb.Append("  template:\r\n");
        sb.Append("    metadata:\r\n");
        sb.Append("      labels:\r\n");
        sb.AppendFormat("        app: api{0}\r\n", i);
        sb.Append("    spec:\r\n");
        sb.AppendFormat("      serviceAccountName: api{0}\r\n", i);
        sb.Append("      containers:\r\n");
        sb.Append("      - name: api\r\n");
        sb.Append("        image: stephaneey/osmapi:dev\r\n");
        sb.Append("        imagePullPolicy: Always\r\n");
        sb.Append("---\r\n");

        // Service exposing the deployment
        sb.Append("apiVersion: v1\r\n");
        sb.Append("kind: Service\r\n");
        sb.Append("metadata:\r\n");
        sb.AppendFormat("  name: apisvc{0}\r\n", i);
        sb.Append("  labels:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.AppendFormat("    service: apisvc{0}\r\n", i);
        sb.Append("spec:\r\n");
        sb.Append("  ports:\r\n");
        sb.Append("  - port: 80\r\n");
        sb.Append("    name: http\r\n");
        sb.Append("  selector:\r\n");
        sb.AppendFormat("    app: api{0}\r\n", i);
        sb.Append("---\r\n");
    }
    sw.Write(sb.ToString());
}

 

The resulting YAML manifest contains any number of triples: a ServiceAccount (used by OSM for mTLS), a Deployment (deploying application containers into pods, which OSM injects with the Envoy sidecar) and a Service, which simply lets K8s program the Linux iptables rules.

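For reference, a single iteration of the loop produces a triple along these lines (shown here for i = 0):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: api0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api0
  template:
    metadata:
      labels:
        app: api0
    spec:
      serviceAccountName: api0
      containers:
      - name: api
        image: stephaneey/osmapi:dev
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: apisvc0
  labels:
    app: api0
    service: apisvc0
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: api0
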

 

With the default settings, and more particularly when enablePermissiveTrafficPolicyMode is set to true, all meshed services can talk to each other without any restriction. Only non-meshed services are denied, because they do not present a client certificate issued by OSM. While we may think that this setting only impacts security, it also impacts memory. Indeed, when using enablePermissiveTrafficPolicyMode in conjunction with enableWASMStats, we can see a huge impact on memory. If we produce 60 ServiceAccount/Deployment/Service triples with our console program, without OSM, we can see that the memory consumption is rather low:

 

[Screenshot: kubectl top pod output without OSM, pods consuming roughly 17-21 MB each]

 

The .NET API consumes between 17 and 21 megabytes, which is rather low. Now, run these three commands:

 

 

kubectl scale deploy --all --replicas=0 -n osmdemo
osm namespace add osmdemo
kubectl scale deploy --all --replicas=1 -n osmdemo

 

 

 

We first scale all our APIs down to zero, then we ask OSM to monitor the osmdemo namespace, and finally we scale all of our deployments back up. Running the following command:

kubectl top pod -n osmdemo

 

 

 

quickly reveals excessive memory consumption:

[Screenshot: kubectl top pod output with the OSM default settings, showing a much higher memory consumption per pod]

especially given the initial 17-21 MB consumption. This can quickly lead your cluster to the chaotic situation I described earlier. The memory-killer setting is enableWASMStats, which enables live collection of the metrics made available by the Envoy sidecar. With this setting turned on, an API is available to extract metrics. Turning off this setting alone is enough to get back to a "normal" memory consumption:

[Screenshot: kubectl top pod output with enableWASMStats disabled, memory back to normal]

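For reference, here is a sketch of how that flag could be turned off on the AKS add-on; the MeshConfig name (osm-mesh-config) and namespace (kube-system) are assumptions that may differ in your environment:

# disable-wasm-stats.yaml (hypothetical file name) - merge patch for the OSM MeshConfig.
# Apply it with something along the lines of:
#   kubectl patch meshconfig osm-mesh-config -n kube-system --type merge --patch-file disable-wasm-stats.yaml
spec:
  featureFlags:
    enableWASMStats: false
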
However, doing so means you won't have the metrics anymore, so you are losing functionality here. Let's turn enableWASMStats on again and disable enablePermissiveTrafficPolicyMode instead! With that config, memory consumption remains low and metrics are still collected, but services cannot talk to each other anymore. The only way communication can be authorized is through the HTTPRouteGroup and TrafficTarget resource types (see the sketch below). When all services can talk to each other, the number of possible routes is huge, whereas when you explicitly define those routes, you only define the ones that are really needed, which results in a much smaller amount of information for each sidecar to keep track of.

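As an illustration, here is a sketch of such a policy allowing the api0 service account to call the api1 service account over HTTP; the resource names reuse the ones generated by the console program above, and the SMI API versions may differ depending on your OSM version:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: api1-routes
  namespace: osmdemo
spec:
  matches:
  - name: all-routes          # illustrative: allow every path and method
    pathRegex: ".*"
    methods: ["*"]
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: api0-to-api1
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: api1
    namespace: osmdemo
  sources:
  - kind: ServiceAccount
    name: api0
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: api1-routes
    matches:
    - all-routes
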
Bottom line, if you want to keep memory under control with OSM, here are the possible combinations:

  • enableWASMStats: true, enablePermissiveTrafficPolicyMode: true ==> bad
  • enableWASMStats: true, enablePermissiveTrafficPolicyMode: false ==> ok
  • enableWASMStats: false, enablePermissiveTrafficPolicyMode: false ==> ok
  • enableWASMStats: false, enablePermissiveTrafficPolicyMode: true ==> ok

 

You just need to avoid setting both enableWASMStats and enablePermissiveTrafficPolicyMode to true, or else plan for a huge memory capacity. The winning combination is probably enableWASMStats: true and enablePermissiveTrafficPolicyMode: false, because you keep memory under control while ensuring higher security.

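Expressed as a MeshConfig fragment, that winning combination boils down to the following sketch (only the relevant keys are shown):

spec:
  featureFlags:
    enableWASMStats: true                     # keep the Envoy metrics
  traffic:
    enablePermissiveTrafficPolicyMode: false  # force explicit SMI traffic policies
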
If you are unsure about what you plan to do, you can also, as a precautionary measure, define resource requests and limits for the sidecar. OSM makes this possible through this section of the mesh config:

sidecar:
  resources: {}

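For example, a starting point could look like the sketch below; the values are purely illustrative and must be tuned against your own workloads:

sidecar:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
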
However, keep in mind that defining low limits when both enableWASMStats and enablePermissiveTrafficPolicyMode are set to true will inevitably lead to meshed pods being killed, but you will at least prevent non-meshed pods from being evicted by K8s.

 
