Metrics Overview
The collector-manager has a designated endpoint for Prometheus-style metrics, serving them on the /metrics path over port 8080.
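As a quick way to inspect the endpoint by hand, the following minimal sketch builds the scrape URL and downloads the raw metrics text. Only the /metrics path and port 8080 come from this page; the host name, the use of plain HTTP, and the helper names metrics_url and fetch_metrics are assumptions made for illustration.

```python
from urllib.request import urlopen


def metrics_url(host: str, port: int = 8080, path: str = "/metrics") -> str:
    """Build the scrape URL. The port and path defaults are the ones documented above.

    Plain HTTP is an assumption; adjust the scheme if your deployment uses TLS.
    """
    return f"http://{host}:{port}{path}"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the metrics endpoint and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (the host is hypothetical, e.g. a locally port-forwarded pod):
#   print(fetch_metrics(metrics_url("localhost")))
```

In a cluster, Prometheus would normally scrape this endpoint itself; a manual fetch like this is mainly useful for checking that the metrics are being served.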
Collection Metrics
- Metric resource_collector_failure_count (counter)
  Indicates the total count of resource collection failures.
- Metric resource_collector_success_count (counter)
  Indicates the total count of successful resource collection operations.
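To show how these two counters might be read out of a scrape, here is a minimal sketch that parses text in the Prometheus exposition format and sums each counter across any label sets. The sample payload and its values are invented for illustration, and whether the real metrics carry labels is not stated here, so the parser handles both cases. The names TRACKED, read_counters, and SAMPLE are hypothetical.

```python
TRACKED = (
    "resource_collector_failure_count",
    "resource_collector_success_count",
)


def read_counters(text, names=TRACKED):
    """Sum the tracked counters found in Prometheus text-format output."""
    totals = {name: 0.0 for name in names}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labeled sample: name{label="value",...} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabeled sample: name value [timestamp]
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            name, rest = parts
        fields = rest.split()
        if name in totals and fields:
            totals[name] += float(fields[0])
    return totals


# Invented sample payload; real output will contain other metrics as well.
SAMPLE = """\
# TYPE resource_collector_failure_count counter
resource_collector_failure_count 3
# TYPE resource_collector_success_count counter
resource_collector_success_count 97
"""

counters = read_counters(SAMPLE)
```

This hand-rolled parser is only meant to make the format concrete; a production setup would rely on Prometheus or an existing client library to parse scrapes.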
The collector-manager also exposes standard Go metrics that provide insight into its performance and resource utilization.
Monitoring and Alerting
Monitoring these metrics is crucial for ensuring the health and performance of the collector-manager. We strongly recommend using monitoring tools such as Prometheus, Grafana, and AlertManager to gather and visualize these metrics and to set up alerts based on them. Setting alert thresholds for resource collection failures helps you detect and resolve issues proactively, which improves the system's reliability.
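The idea of a failure threshold can be sketched as follows: take the counter values from two consecutive scrapes, compute the share of failed collections in that window, and compare it with a limit. The 5% threshold, the dictionary keys, and the function names are assumptions chosen for illustration, not recommended values; in a Prometheus setup this logic would normally live in an alerting rule rather than in application code.

```python
def counter_delta(previous: float, current: float) -> float:
    """Increase of a counter between two scrapes.

    If the counter went down, the process most likely restarted, so the
    current value is taken as the increase since the reset.
    """
    if current < previous:
        return current
    return current - previous


def failure_ratio(prev: dict, curr: dict) -> float:
    """Share of failed collections between two scrapes (0.0 if nothing ran)."""
    failures = counter_delta(prev["failure"], curr["failure"])
    successes = counter_delta(prev["success"], curr["success"])
    total = failures + successes
    return failures / total if total else 0.0


def should_alert(ratio: float, threshold: float = 0.05) -> bool:
    """True when the failure share exceeds the (hypothetical) threshold."""
    return ratio > threshold


# Invented scrape values for resource_collector_failure_count ("failure")
# and resource_collector_success_count ("success").
earlier = {"failure": 3, "success": 97}
later = {"failure": 8, "success": 187}
alert = should_alert(failure_ratio(earlier, later))
```

Comparing deltas rather than raw totals keeps the check sensitive to recent behavior, since counters only ever grow over the lifetime of the process.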