Metrics

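The Scalr Agent emits metrics that provide insight into its performance, covering agent startup, the internal worker pool, and each task type (IaC runs, policy checks, cost estimation, and Checkov scans).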

Configuration

To enable metrics export, set the SCALR_AGENT_OTLP_ENDPOINT environment variable to the host:port address of your OpenTelemetry collector (a gRPC server that accepts OTLP), and enable the SCALR_AGENT_OTLP_METRICS_ENABLED option.
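As a minimal sketch, the two settings can be assembled in Python before launching the agent. The helper name otlp_metrics_env is hypothetical, and the assumption that SCALR_AGENT_OTLP_METRICS_ENABLED accepts the strings "true" and "false" is not confirmed by this page; check the agent's configuration reference for the accepted values.

```python
def otlp_metrics_env(host: str, port: int, enabled: bool = True) -> dict[str, str]:
    """Build the environment variables that turn on OTLP metrics export.

    Hypothetical helper for illustration only; it is not part of the Scalr Agent.
    """
    if not host or ":" in host:
        raise ValueError("host must be a bare hostname or IPv4 address, without a port")
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1-65535")
    return {
        # host:port of the gRPC OTLP collector, as described above.
        "SCALR_AGENT_OTLP_ENDPOINT": f"{host}:{port}",
        # Assumption: the option is enabled with the string "true".
        "SCALR_AGENT_OTLP_METRICS_ENABLED": "true" if enabled else "false",
    }
```

For example, otlp_metrics_env("otel-collector.internal", 4317) yields an endpoint of otel-collector.internal:4317. The hostname is a placeholder, and 4317 is the conventional OTLP/gRPC port rather than a value taken from this page.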

Naming conventions

Scalr Agent metrics adhere to a consistent naming convention to provide clear context.

Structure

Metrics are prefixed with a component path, like scalr_agent.app.policy or scalr_agent.worker, to identify the source of the metric.

If a unit is necessary for clarity, it's appended to the metric name in full, such as _bytes or _seconds.

Data Types

  • Time: All timing metrics are measured in seconds with millisecond precision, aligning with established conventions like those used by Prometheus and OpenTelemetry.
  • Size: All data size metrics are measured in bytes.
  • Boolean: Boolean values are represented as the strings true or false.
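A consumer of these metrics might normalize the conventions above as follows. This is a sketch only: parse_bool and seconds_to_ms are hypothetical helper names, and rounding to whole milliseconds is an assumption based on the stated millisecond precision.

```python
def parse_bool(value: str) -> bool:
    """Convert the string form of a boolean ("true" or "false") to a bool."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


def seconds_to_ms(seconds: float) -> int:
    """Convert a timing value in seconds to whole milliseconds."""
    return round(seconds * 1000)
```

Rejecting anything other than the two documented strings keeps a mistyped or unexpected value from being silently treated as false.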

Labels

Metrics prefixed with scalr_agent.app include two specific labels for context: account_name and run_id.
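The label rule above can be expressed as a simple check, for instance when validating data points received by a collector. The function names are hypothetical, and the sketch assumes only what this page states: whether metrics outside scalr_agent.app carry other labels is not covered here.

```python
APP_PREFIX = "scalr_agent.app."
APP_LABELS = frozenset({"account_name", "run_id"})


def expected_labels(metric_name: str) -> frozenset[str]:
    """Return the context labels a metric should carry according to its prefix."""
    return APP_LABELS if metric_name.startswith(APP_PREFIX) else frozenset()


def missing_labels(metric_name: str, labels: dict[str, str]) -> set[str]:
    """Return the expected labels that are absent from a data point's labels."""
    return set(expected_labels(metric_name)) - set(labels)
```

For a data point of scalr_agent.app.cost.estimate_duration_seconds that carries only run_id, missing_labels reports account_name; for scalr_agent.worker.tasks_total it reports nothing.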

Main Metrics

Metrics from the Scalr Agent and its internal worker pool.

  • scalr_agent.cmd.startup_duration_seconds Type: Histogram Time taken for the Scalr Agent to be ready to accept incoming tasks after launch.

  • scalr_agent.cmd.uptime_seconds Type: Gauge Uptime of the service in seconds.

  • scalr_agent.python.mem_used_bytes Type: Gauge Resident memory used by the Python process.

  • scalr_agent.python.greenlets_count Type: Gauge Number of active Python greenlets.

  • scalr_agent.python.threads_count Type: Gauge Number of active Python threads.

  • scalr_agent.worker.cancel_duration_seconds Type: Histogram Time required to cancel a task.

  • scalr_agent.worker.cancel_errors_total Type: Counter Number of failed task cancellation attempts.

  • scalr_agent.worker.tasks_total Type: Gauge Total number of tasks on the Huey worker.

  • scalr_agent.worker.run_tasks_total Type: Gauge Total number of Scalr Run tasks on the Huey worker.

  • scalr_agent.worker.task.update_status_duration_seconds Type: Histogram Time taken to send a status update to the Scalr platform.

  • scalr_agent.worker.task.update_result_duration_seconds Type: Histogram Time taken to send a result update to the Scalr platform.

  • scalr_agent.consumer.acquire_tasks_duration_seconds Type: Histogram Time taken to acquire tasks on the Scalr Agent.

  • scalr_agent.consumer.connect_to_pool_duration_seconds Type: Histogram Time taken for the Scalr Agent to register with the platform.

  • scalr_agent.containers.schedule_duration_seconds Type: Histogram Time taken by Kubernetes to schedule a task Pod.

  • scalr_agent.containers.delegate_duration_seconds Type: Histogram Time taken by the worker to pick up a scheduled Pod.

  • scalr_agent.containers.cpu_limit_nanocores Type: Gauge CPU limit in nanocores for containers during Plan/Apply stages.

  • scalr_agent.containers.mem_limit_bytes Type: Gauge Memory limit in bytes for containers during Plan/Apply stages.


IaC Metrics

Metrics for OpenTofu, Terraform, and Terragrunt, emitted by terraform.plan and terraform.apply tasks.

  • scalr_agent.app.terraform.install_binaries_duration_seconds Type: Histogram Time taken to install binaries for IaC operations.

  • scalr_agent.app.terraform.download_configuration_version_duration_seconds Type: Histogram Time taken to download the configuration version.

  • scalr_agent.app.terraform.plan_command_duration_seconds Type: Histogram Time taken to execute the plan operation.

  • scalr_agent.app.terraform.show_command_duration_seconds Type: Histogram Time taken to run the show command and generate a JSON plan.

  • scalr_agent.app.terraform.upload_plan_duration_seconds Type: Histogram Time taken to compress and upload plan results (binary and JSON).

  • scalr_agent.app.terraform.init_command_duration_seconds Type: Histogram Time taken to execute the init operation.

  • scalr_agent.app.terraform.install_providers_duration_seconds Type: Histogram Time taken to install providers before init.

  • scalr_agent.app.terraform.cache_providers_duration_seconds Type: Histogram Time taken to cache providers after init.

  • scalr_agent.app.terraform.apply_command_duration_seconds Type: Histogram Time taken to execute the apply operation.

  • scalr_agent.app.terraform.provider_cache_total_used_size_bytes Type: Gauge Current size of the provider plugin cache.

  • scalr_agent.app.terraform.provider_cache_freed_size_bytes Type: Counter Total size of provider cache cleared by GC.

  • scalr_agent.app.terraform.provider_cache_usage_count Type: Gauge Usage count of provider plugins in the cache.

  • scalr_agent.app.terraform.provider_cache_usage_size_bytes Type: Gauge Disk space used per provider plugin in the cache.

  • scalr_agent.binary_cache.total_used_size_bytes Type: Gauge Current size of the binary cache.

  • scalr_agent.app.terraform.run_config_dir_used_size_bytes Type: Counter Size of the initialized run config directory after Plan/Apply.


Policy Metrics

Emitted by policy.check tasks for pre-init and post-plan stages.

  • scalr_agent.app.policy.download_tfinput_duration_seconds Type: Histogram Time taken to download tfinput files.

  • scalr_agent.app.policy.evaluate_policy_group_duration_seconds Type: Histogram Time taken to evaluate a policy group.

  • scalr_agent.app.policy.evaluate_policy_duration_seconds Type: Histogram Time taken to evaluate all files in a policy.

  • scalr_agent.app.policy.opa_eval_command_duration_seconds Type: Histogram Time taken to run the OPA evaluate command.

  • scalr_agent.app.policy.install_binaries_duration_seconds Type: Histogram Time taken to install binaries for policy checks.


Cost Estimation Metrics

Emitted by cost.estimate tasks during the cost estimation stage.

  • scalr_agent.app.cost.download_plan_json_duration_seconds Type: Histogram Time taken to download the plan JSON for cost analysis.

  • scalr_agent.app.cost.estimate_duration_seconds Type: Histogram Time taken to perform cost estimation.

  • scalr_agent.app.cost.install_binaries_duration_seconds Type: Histogram Time taken to install binaries for cost estimation.

  • scalr_agent.app.cost.infracost_breakdown_command_duration_seconds Type: Histogram Time taken to execute the Infracost breakdown command.

  • scalr_agent.app.cost.infracost_output_command_duration_seconds Type: Histogram Time taken to execute the Infracost output command.


Checkov Metrics

Emitted by checkov.analyze tasks during the pre-plan stage.

  • scalr_agent.app.checkov.download_configuration_version_duration_seconds Type: Histogram Time taken to download the configuration version for Checkov.

  • scalr_agent.app.checkov.download_external_checks_duration_seconds Type: Histogram Time taken to download external Checkov check files.

  • scalr_agent.app.checkov.checkov_analyze_command_duration_seconds Type: Histogram Time taken to execute the Checkov analyze command.

  • scalr_agent.app.checkov.install_binaries_duration_seconds Type: Histogram Time taken to install binaries for Checkov operations.