Server metrics

The CloudCasa server exposes metrics related to the execution of backup and restore jobs. These can be queried with a monitoring tool such as Prometheus, and used for basic reporting and alerting.

These metrics can be retrieved using the following command:

kubectl -n cloudcasa-server exec -it <gandalf-controller-pod-name> -- curl -v http://localhost:8080/metrics

Note

Alternatively, you can retrieve the metrics with the casactl command.

Server metrics provide insight into both backup and restore operations. All metric names are prefixed with “cloudcasa_” for easy lookup in Grafana and Prometheus, and every metric can be filtered by the labels listed for it below.
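As a rough illustration of the prefix and label filtering described above, the following Python sketch parses the text returned by the metrics endpoint and keeps only the CloudCasa samples. It assumes the endpoint returns the standard Prometheus text exposition format; the sample output, the "daily" job definition, and the helper names parse_metrics and filter_samples are hypothetical and exist only for this example.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefix="cloudcasa_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def filter_samples(samples, **wanted):
    """Keep samples whose labels contain every wanted key/value pair."""
    return [s for s in samples if all(s[1].get(k) == v for k, v in wanted.items())]


# Hypothetical scrape output, for illustration only.
EXAMPLE = '''\
# HELP cloudcasa_jobs_count Total jobs
# TYPE cloudcasa_jobs_count gauge
cloudcasa_jobs_count{cluster="my-test-cluster",job_type="K8S_SNAP",job_definition="daily"} 12
cloudcasa_jobs_count{cluster="other-cluster",job_type="K8S_SNAP",job_definition="daily"} 3
go_goroutines 42
'''

if __name__ == "__main__":
    matching = filter_samples(parse_metrics(EXAMPLE), cluster="my-test-cluster")
    for name, labels, value in matching:
        print(name, labels, value)
```

To try it against a real server, save the output of the curl command above to a file and pass its contents to parse_metrics instead of EXAMPLE.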

The following metrics are exposed:

  • Total Jobs - The cumulative count of all jobs (RUNNING and COMPLETED): cloudcasa_jobs_count

    • Available Labels:

      • job_type

      • cluster

      • job_definition

  • Completed Jobs - The total count of completed jobs: cloudcasa_jobs_completed_count

    • Available Labels:

      • job_type

      • cluster

      • job_definition

      • state

  • Running Jobs - The total count of running jobs: cloudcasa_jobs_running_count

    • Available Labels:

      • job_type

      • cluster

      • job_definition

  • Job Duration - The cumulative duration of completed jobs, in seconds: cloudcasa_jobs_duration_total

    • Available Labels:

      • job_type

      • cluster

      • job_definition

      • state

Example: the PromQL query cloudcasa_jobs_completed_count{cluster="my-test-cluster", job_type="K8S_SNAP"} returns the matching completed-job series, for instance:

    {cluster="my-test-cluster", state="SKIPPED", job_type="K8S_SNAP", __name__="cloudcasa_jobs_completed_count", instance="gandalf-controller-manager-metrics-service.cloudcasa-server:8443", job="cloudcasa-server-metrics"}
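The metrics and labels listed above can be combined into a few common queries. The Python sketch below only builds PromQL strings from the documented metric and label names; the helper names are hypothetical. The average-duration query is an assumption rather than a documented recipe: it relies on cloudcasa_jobs_duration_total and cloudcasa_jobs_completed_count carrying identical label sets, so that PromQL can pair the series one to one when dividing.

```python
def selector(metric, **labels):
    """Build a PromQL selector such as metric{key="value"}."""
    if not labels:
        return metric
    parts = []
    for key, value in sorted(labels.items()):
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(key + '="' + escaped + '"')
    return metric + "{" + ", ".join(parts) + "}"


def sum_by(metric, group_labels, **labels):
    """Aggregate a metric over the given labels, e.g. completed jobs per state."""
    return "sum by (" + ", ".join(group_labels) + ") (" + selector(metric, **labels) + ")"


def average_duration_query(**labels):
    """Average seconds per completed job: cumulative duration / completed count."""
    return (selector("cloudcasa_jobs_duration_total", **labels)
            + " / "
            + selector("cloudcasa_jobs_completed_count", **labels))


if __name__ == "__main__":
    print(selector("cloudcasa_jobs_completed_count",
                   cluster="my-test-cluster", job_type="K8S_SNAP"))
    print(sum_by("cloudcasa_jobs_completed_count", ["cluster", "state"]))
    print(average_duration_query(cluster="my-test-cluster"))
```

The generated strings can be pasted into the Prometheus expression browser or a Grafana panel.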
      

Scraping metrics using Prometheus

Update your Prometheus configuration on the Kubernetes cluster where the CloudCasa server runs to add a new job for scraping the CloudCasa metrics.

Example configuration:

scrape_configs:
- job_name: 'cloudcasa-server-metrics'
  static_configs:
  - targets: ['gandalf-controller-manager-metrics-service.cloudcasa-server:8443']
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true

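Ensure the Prometheus ServiceAccount has permission to GET the metrics endpoint, for example through a ClusterRole such as the following, bound to that ServiceAccount with a ClusterRoleBinding: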

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

After configuring Prometheus to scrape the metrics, verify that collection works by opening the Prometheus web UI and querying for the collected metrics, for example cloudcasa_jobs_count.
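The same check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below is a minimal example using only the Python standard library and the instant-query endpoint /api/v1/query. The Prometheus address http://localhost:9090 is an assumption (for instance, a port-forward to your Prometheus service), and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def series_from_response(body):
    """Extract (labels, value) pairs from an instant-query JSON response.

    Handles vector results only, which is what a plain metric selector returns.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: " + str(payload.get("error", "unknown error")))
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def run_query(base_url, promql, timeout=10):
    """Send the query to Prometheus and return the parsed series."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return series_from_response(resp.read().decode("utf-8"))


# Example usage (assumed address; adjust to where your Prometheus is reachable):
#
#   for labels, value in run_query("http://localhost:9090", "cloudcasa_jobs_count"):
#       print(labels, value)
```

An empty result usually means the scrape job has not collected any CloudCasa samples yet, so the target status in Prometheus is the next thing to check.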

See also

For more information, see the Prometheus documentation.

Grafana Dashboard

A template Grafana dashboard is available for download on the CloudCasa support portal at https://support.cloudcasa.io (under Resources). This dashboard includes time series views for completed jobs by cluster, status, and type, along with a table view for exploring detailed job metrics.

To use the dashboard, download the “CloudCasa_Metrics_Dashboard.json” file from the support portal and import it into Grafana, selecting your Prometheus data source.