Metrics Overview

The collector-manager exposes Prometheus-style metrics on the /metrics path, served over port 8080.
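As an illustration of reading this endpoint, the sketch below fetches the metrics page and parses the Prometheus text exposition format into a dictionary. The host name in METRICS_URL is an assumption (only the port and path come from this document), and fetch_metrics and parse_metrics are hypothetical helper names, not part of the collector-manager.

```python
import urllib.request

# Assumption: the collector-manager is reachable on localhost.
# Only port 8080 and the /metrics path are taken from the documentation.
METRICS_URL = "http://localhost:8080/metrics"


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Return the raw text served by the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample_key: float}.

    The key is the metric name, including its label set if one is present.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: everything up to the last closing brace is the key.
            key, rest = line.rsplit("}", 1)
            key += "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow and is ignored.
        samples[key] = float(fields[0])
    return samples
```

Calling parse_metrics(fetch_metrics()) against a running instance would return the current sample values keyed by metric name.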

Collection Metrics

  • Metric resource_collector_failure_count (counter)

    Total number of failed resource collection operations.

  • Metric resource_collector_success_count (counter)

    Total number of successful resource collection operations.
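Because both metrics are counters, their absolute values only grow; what usually matters is how much they increased between two scrapes. The sketch below computes a failure ratio from two scrapes, each represented as a dictionary mapping metric name to value. It assumes the two counters are exposed without labels, which this document does not confirm, and counter_delta and failure_ratio are hypothetical helper names.

```python
FAILURE = "resource_collector_failure_count"
SUCCESS = "resource_collector_success_count"


def counter_delta(previous, current):
    """Increase of a counter between two scrapes.

    If the current value is lower than the previous one, the process is
    assumed to have restarted and the counter to have reset to zero.
    """
    return current if current < previous else current - previous


def failure_ratio(prev, curr):
    """Fraction of collection operations that failed between two scrapes."""
    failures = counter_delta(prev[FAILURE], curr[FAILURE])
    successes = counter_delta(prev[SUCCESS], curr[SUCCESS])
    total = failures + successes
    # No operations in the interval: report 0.0 rather than dividing by zero.
    return failures / total if total else 0.0
```

For example, going from 2 failures and 98 successes to 5 failures and 195 successes means 3 of 100 operations failed in that interval.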

The collector-manager also exposes standard go-metrics, which provide insight into its performance and resource utilization.

Monitoring and Alerting

Monitoring these metrics is important for keeping the collector-manager healthy and performant. We recommend using tools such as Prometheus, Grafana, and Alertmanager to collect and visualize these metrics and to alert on them. Alerting thresholds on resource collection failures help surface problems early, before they affect the system's reliability.
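The thresholding idea can be sketched as follows: raise an alert only when the failure ratio has stayed above a limit for several consecutive checks, so that a single transient failure does not page anyone. The 5% threshold and the window of three checks are illustrative assumptions, not recommended values, and should_alert is a hypothetical helper. In a real deployment this logic would more likely live in a Prometheus alerting rule than in custom code.

```python
def should_alert(ratios, threshold=0.05, consecutive=3):
    """Return True when the most recent `consecutive` ratios all exceed `threshold`.

    `ratios` is a chronological list of failure ratios, one per check.
    Both default values are placeholders chosen for illustration only.
    """
    if consecutive <= 0 or len(ratios) < consecutive:
        return False
    return all(ratio > threshold for ratio in ratios[-consecutive:])
```

Requiring several consecutive breaches trades slower detection for fewer false alarms; tune both parameters to the collection interval and the failure rate that is tolerable in the environment.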