Resource Limits
This page describes the resource requirements of the various CloudCasa server components. The exact requirements depend on many factors, such as the type of data (number and size of files), churn rate, number of clusters, and the frequency of operations (the schedules defined in policies). As such, start with the defaults and tune parameters as needed.
Workloads
The following workloads are present irrespective of any ongoing operations.
| Component | CPU (m, request/limit) | Memory (request/limit) | Replicas |
|---|---|---|---|
| apiserver | 250/500 | 512 Mi / 1 Gi | 3 |
| kas | 100/250 | 256 Mi / 1 Gi | 2 |
| eventreconciler | 50/250 | 512 Mi / 2 Gi | 3 |
| gandalf | 25/50 | 128 Mi / 1 Gi | 1 |
| cronjobber | 10/25 | 64 Mi / 500 Mi | 1 |
| amds-frontend | 10/25 | 16 Mi / 64 Mi | 1 |
| amds-envoy | 10/25 | 16 Mi / 64 Mi | 1 |
| grpcapiserver | 10/25 | 16 Mi / 64 Mi | 1 |
| dex | 5/25 | 32 Mi / 64 Mi | 1 |
| redis | 10/25 | 32 Mi / 64 Mi | 1 |
| fluentd | 25/50 | 256 Mi / 512 Mi | daemonset |
| minio | 25/50 | 256 Mi / 512 Mi | 1 |
| mongo (*) | 25/50 | 512 Mi / 1 Gi | 1 |
In addition to the components above, a pod is started to run each job, such as a backup or restore. Each such pod (called a “job runner”) consumes very few resources, since it only orchestrates the job and in most cases runs for no more than a few minutes. The only exception is when it has to wait for an overlapping job that is already running on the same cluster, but even then it does not consume many resources.
Similarly, a pod is started once a day (by default) to perform backup repository maintenance, such as removing expired backups. There is one such pod (called a “cloud task”) per object store. Its memory requirements vary with the type of data in the backup (number of files, size of files, etc.) and the number of backups to delete.
Currently, no limits are configured for cloud task and job runner pods.
Autoscaling
Horizontal Pod Autoscaling (HPA) can optionally be enabled for two CloudCasa server deployments: apiserver and kubeagentserver. When there are many active agents, it can be beneficial to enable autoscaling for these components, since their load varies with the number of running jobs and active users.
Use the following Helm values to enable HPA and set the target CPU utilization for each deployment:
```yaml
apiserver:
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPU: 80
kubeagentserver:
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPU: 80
```
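These values can be supplied at install or upgrade time, for example with `helm upgrade <release> cloudcasa/cloudcasa-server -f values.yaml` (substitute your own release name). With HPA enabled, each deployment scales between minReplicas and maxReplicas to keep average CPU utilization near the targetCPU percentage.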
Notes
- The mongo (*) settings apply only if the Mongo component installed by CloudCasa is used (the default behavior).
- The default settings above should be sufficient to support at least a hundred jobs, clusters, and policies.
- Requests, limits, and replicas can only be configured for a few components. For the others, the defaults should suffice in most cases. We will continue to make changes in this area as needed.
To find the supported configurable parameters, run the command

```
helm show values cloudcasa/cloudcasa-server
```

and look for the following components:

- apiserver
- cronjobber
- dex
- envoy
- eventreconciler
- fluentd
- frontend
- gandalf
- grpcapiserver
- kubeagentserver
- minio
- redis
Storage
The CloudCasa server creates the following PVCs. The default sizes should be sufficient for a large number of clusters, jobs, and other resources.
- PVC to host the logs bucket: the default size is 4 Gi and can be configured with the parameter `log.pv.size`.
- PVC to host the Mongo catalog: the default size is 16 Gi and can be configured with the parameter `mongo.pv.size`.
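For example, to increase both sizes, the corresponding Helm values would look like this (8 Gi and 32 Gi are illustrative; choose sizes that fit your log retention and catalog growth):

```yaml
# Example overrides for the PVC sizes; the values shown are illustrative.
log:
  pv:
    size: 8Gi     # logs bucket PVC (default 4 Gi)
mongo:
  pv:
    size: 32Gi    # Mongo catalog PVC (default 16 Gi)
```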
In addition to the PVCs, some of the components use “emptyDir” volumes as scratch space, which is allocated from node ephemeral storage. It is therefore recommended that nodes be configured with at least 25 GB of ephemeral storage.
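For reference, this is what an emptyDir scratch volume looks like in a pod spec (a generic Kubernetes example, not a CloudCasa manifest); without a sizeLimit, the volume can grow into the node's ephemeral storage:

```yaml
# Generic Kubernetes example of an emptyDir volume backed by node ephemeral storage.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        sizeLimit: 2Gi   # optional cap; omit it to allow use of all available ephemeral storage
```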