This blog post is the successor to the HPA blog post my colleague Furkan published. We will go deeper into HPA, see what is really inside it, and explore what we can do with it at a more advanced level.
Let’s remember what HPA, the Horizontal Pod Autoscaler, does: it automatically scales the number of pods of a workload up or down based on observed metrics such as CPU or memory utilization.
Although those fundamental functions serve most use cases, knowing what more can be done with HPA helps us find better solutions for our problems.
HPA has an algorithm that looks like this:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
What does this mean? Let’s break it down!
desiredReplicas: Replica count that will be sent to the controller after calculations.
ceil(): This is a function that rounds a fractional number upwards. For example ceil(12.18) is 13.
currentReplicas: Current number of pods for a given Deployment, or any other object type that implements the “scale” subresource.
currentMetricValue: Current value of the metric used as the scaling factor. It can be 800m of CPU or 1.5Gi of memory; for custom metrics it can be, say, 500 events per second.
desiredMetricValue: The target value the HPA tries to keep the metric at. Eventually, with all the mechanisms HPA provides, your app runs at this metric value. This value should be neither too low nor too high.
Let’s go over an example. It is simple math, but we will dive into the philosophy of the formula, and a concrete calculation helps first.
Let’s say we have an HPA configuration with a target CPU usage of 60%, a minimum pod count of 5 and a maximum pod count of 14.
Current deployment status is: 8 pods averaging 70% usage.
desiredReplicas = ceil[8*(70/60)] = ceil(9.33) = 10
Since 10 is within the minimum of 5 and the maximum of 14, the deployment is scaled to 10 pods.
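To make the mechanics concrete, here is a minimal sketch of this calculation in Go. The function name and signature are my own for illustration, not the actual controller code; it simply applies the formula and clamps the result to the [minReplicas, maxReplicas] range:

package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA formula and clamps the result
// to the [minReplicas, maxReplicas] range.
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(current) * (currentMetric / targetMetric)))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// 8 pods averaging 70% CPU against a 60% target, bounded by [5, 14].
	fmt.Println(desiredReplicas(8, 70, 60, 5, 14)) // prints 10
}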
Note that not every pod is taken into account at face value. For example, pods that are being deleted (those with a deletionTimestamp set, like the one below) are ignored by HPA:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-10-27T14:00:05Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2020-10-27T14:00:39Z"
Let’s say we have an HPA configuration with a target CPU usage of 60%, a minimum pod count of 12 and a maximum pod count of 16.
Current deployment status is: 14 pods in total. 10 pods are averaging 85% usage, 2 pods are failing, and 2 pods are ready but have not been sending metrics for a while.
This is how the calculation differs from the first example: the 2 failing pods are counted as if they were using 0%, and the 2 ready pods with missing metrics are counted conservatively at the 60% target.
Now the average is [(10*85)+(2*0)+(2*60)]/14 = 69.28
desiredReplicas = ceil[14*(69.28/60)] = ceil(16.17) = 17, which is above the maximum of 16, so the deployment is scaled to 16 pods.
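Sketched in Go (again, my own simplification for illustration, not the real controller logic), the adjusted average and the clamping at maxReplicas look like this:

package main

import (
	"fmt"
	"math"
)

func main() {
	// 10 ready pods at 85%, 2 failing pods counted at 0%,
	// and 2 metric-less pods counted at the 60% target.
	average := (10*85.0 + 2*0.0 + 2*60.0) / 14 // ≈ 69.28

	desired := int(math.Ceil(14 * (average / 60))) // ceil(16.17) = 17
	if desired > 16 {                              // clamp to maxReplicas
		desired = 16
	}
	fmt.Println(desired) // prints 16
}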
We saw how we can set scaling options with controller-manager flags. Since Kubernetes 1.18 and the v2beta2 API, we also have a behavior field, which lets us configure all of these parameters per HPA object.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      selectPolicy: Disabled
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Max
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
behavior: This field is a declarative way to configure the scaling behavior.
selectPolicy: This field is used to select the preferred policy dynamically. The default value is “Max” and the field is not mandatory. It can be set to “Disabled”, which disables the defined scaling direction entirely.
scaleDown, scaleUp: These fields represent the scale-down and scale-up operations; they take the same configuration.
stabilizationWindowSeconds: This field sets how far back the autoscaler looks at its past recommendations; the most conservative recommendation within the window wins, which prevents the replica count from flapping.
policies: This field describes how scaling actually happens. It is an array-based field.
type: This field represents the type of the value for a given policy.
value: This field represents the value of the type for a given policy.
periodSeconds: This field represents the period between two scaling operations for a given policy.
behavior:
  scaleDown:
    selectPolicy: Disabled
  scaleUp:
    stabilizationWindowSeconds: 120
    policies:
    - type: Percent
      value: 30
      periodSeconds: 60
    - type: Pods
      value: 7
      periodSeconds: 60
    selectPolicy: Max
Let’s have a quick example: we have 18 pods and we need many more. First let’s calculate 30% of 18 pods; the answer is 5.4, which is rounded up to 6. The other option is a flat 7. Since selectPolicy is “Max”, the deployment will be scaled up by 7 pods. Now we need to wait 60 seconds before any further scale-up can happen, because periodSeconds is set to 60.
Now we have 25 pods and let’s say we still need to scale up by a large amount. 30% of 25 pods is 7.5, which is rounded up to 8. The other option is still the static 7. Since selectPolicy is “Max”, we scale up by 8 pods this time.
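Here is a rough Go sketch of that selection rule (my own illustration, not the controller’s implementation): with selectPolicy set to “Max”, the larger of the Percent and Pods policies wins.

package main

import (
	"fmt"
	"math"
)

// scaleUpLimit picks the larger of the Percent and Pods policies,
// which is what selectPolicy: Max does for a scale-up.
func scaleUpLimit(current, percent, pods int) int {
	byPercent := int(math.Ceil(float64(current) * float64(percent) / 100))
	if byPercent > pods {
		return byPercent
	}
	return pods
}

func main() {
	fmt.Println(scaleUpLimit(18, 30, 7)) // max(6, 7) = 7
	fmt.Println(scaleUpLimit(25, 30, 7)) // max(8, 7) = 8
}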
scaleTargetRef: This configuration block sets the HPA’s target object. This can be a Deployment, ReplicaSet, ReplicationController, or a custom resource that implements the “scale” subresource. This is why you need to provide apiVersion and kind, not just the name.
metrics: This configuration block sets the type of the metric and when the target object should scale.
Let’s examine a detailed example for this:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 1k
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1beta1
      kind: Ingress
      name: main-route
    target:
      type: Value
      value: 10k
As seen above, you can use custom metrics directly in the HPA configuration. This means custom metrics are first-class citizens in HPA. Let’s dive into the internals.
The metrics aggregator is a pathway for all metrics server implementations. It can query metrics from a series of aggregated APIs: “metrics.k8s.io”, “custom.metrics.k8s.io”, and “external.metrics.k8s.io”. This means that if you provide an adapter that implements one of these APIs, you can use your own metrics provider.
The most common metrics provider is indeed “metrics-server”, which is not included with Kubernetes natively. It implements the “metrics.k8s.io” API, and you can think of it as the default provider for CPU and memory metrics.
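If you are curious what this aggregated API returns, you can query it directly through the API server; for example, kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" dumps the node metrics that metrics-server provides (assuming metrics-server is installed in your cluster).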
We will not do a Prometheus example, but as you might understand by now, it is just an implementation over an abstraction and does not change how HPA works in any way. If you want to implement a Prometheus-based HPA solution, there are many great blog posts on the matter; one of them is listed in the “suggested reading” section below.
I tried to demystify HPA as much as possible, because I learn better when I know the internals. I also tried my best to provide examples and graphics that appeal to people with different learning styles.
Another advanced area worth exploring is the extensibility of HPA: implementing custom resources that expose the “scale” subresource, or implementing custom metrics and scaling based on those metrics.
Suggested reading: