Metrics
The Scalr Agent generates various metrics that provide insights into its performance.
Configuration
To enable this feature, set the SCALR_AGENT_OTLP_ENDPOINT environment variable to the host:port address of your OpenTelemetry collector (a gRPC server that accepts OTLP), and enable the SCALR_AGENT_OTLP_METRICS_ENABLED option.
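As a minimal sketch of the configuration above, the following Python helper builds the two environment variables and checks that the endpoint has the host:port shape. The helper name otlp_metrics_env, the collector hostname, and the value "true" for SCALR_AGENT_OTLP_METRICS_ENABLED are illustrative assumptions, not documented Scalr values. Port 4317 is the conventional OTLP gRPC port.

```python
def otlp_metrics_env(endpoint: str) -> dict:
    """Return the environment variables that turn on OTLP metrics export.

    Raises ValueError when the endpoint is not in host:port form.
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return {
        "SCALR_AGENT_OTLP_ENDPOINT": endpoint,
        # Assumption: the option is switched on with the string "true".
        "SCALR_AGENT_OTLP_METRICS_ENABLED": "true",
    }


# Hypothetical collector address, used only for illustration.
env = otlp_metrics_env("otel-collector.example.internal:4317")
```

The resulting dictionary could be merged into os.environ, or passed as the env argument of subprocess.run, when launching the agent.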
Naming conventions
Scalr Agent metrics adhere to a consistent naming convention to provide clear context.
Structure
Metrics are prefixed with a component path, like scalr_agent.app.policy or scalr_agent.worker, to identify the source of the metric.
If a unit is necessary for clarity, it's appended to the metric name in full, such as _bytes or _seconds.
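The naming structure can be illustrated with a small parser that splits a metric name into its component path, base name, and optional unit suffix. This is a sketch only: split_metric_name and KNOWN_UNITS are hypothetical names, and the unit list covers just the two suffixes mentioned in the text.

```python
from typing import Optional, Tuple

# Units named in the text above; any other suffix is treated as part of the name.
KNOWN_UNITS = ("seconds", "bytes")


def split_metric_name(name: str) -> Tuple[str, str, Optional[str]]:
    """Split a metric name into (component path, base name, unit or None)."""
    # Everything before the last dot is the component path.
    component, _, metric = name.rpartition(".")
    # A trailing "_<unit>" is split off only when the unit is a known one.
    base, _, suffix = metric.rpartition("_")
    if base and suffix in KNOWN_UNITS:
        return component, base, suffix
    return component, metric, None


parsed = split_metric_name("scalr_agent.worker.cancel_duration_seconds")
no_unit = split_metric_name("scalr_agent.python.threads_count")
```

Here parsed separates the scalr_agent.worker component from the cancel_duration base and the seconds unit, while no_unit keeps threads_count whole because count is not in the unit list.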
Data Types
- Time: All timing metrics are measured in seconds with millisecond precision, aligning with established conventions like those used by Prometheus and OpenTelemetry.
- Size: All data size metrics are measured in bytes.
- Boolean: Boolean values are represented as the strings true or false.
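The three conventions above can be sketched as small normalization helpers, for example when post-processing exported values or writing test fixtures. The function names are hypothetical and are not part of the Scalr Agent.

```python
def normalize_seconds(duration: float) -> float:
    """Express a duration in seconds, rounded to millisecond precision."""
    return round(duration, 3)


def kib_to_bytes(kib: int) -> int:
    """Convert kibibytes to bytes, the unit used for all size metrics."""
    return kib * 1024


def bool_label(value: bool) -> str:
    """Render a boolean as the string "true" or "false"."""
    return "true" if value else "false"


duration = normalize_seconds(1.23456)
size = kib_to_bytes(4)
flag = bool_label(True)
```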
Labels
Metrics prefixed with scalr_agent.app include two specific labels for context: account_name and run_id.
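The label rule can be expressed as a simple prefix check, which may be handy when validating dashboards or alert queries. This is a sketch under the rule stated above; expected_labels is a hypothetical helper, and metrics outside scalr_agent.app may carry other labels that are not described here.

```python
APP_PREFIX = "scalr_agent.app"
APP_LABELS = ("account_name", "run_id")


def expected_labels(metric_name: str) -> tuple:
    """Return the context labels expected on a metric, based on its prefix."""
    # Match the prefix on a component boundary so that a name such as
    # "scalr_agent.application.x" is not treated as an app metric.
    if metric_name == APP_PREFIX or metric_name.startswith(APP_PREFIX + "."):
        return APP_LABELS
    return ()


app_labels = expected_labels("scalr_agent.app.policy.evaluate_policy_duration_seconds")
worker_labels = expected_labels("scalr_agent.worker.tasks_total")
```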
Main Metrics
Metrics from the Scalr Agent and internal worker pool.
- scalr_agent.cmd.startup_duration_seconds. Type: Histogram. Time taken for the Scalr Agent to be ready to accept incoming tasks after launch.
- scalr_agent.cmd.uptime_seconds. Type: Gauge. Uptime of the service in seconds.
- scalr_agent.python.mem_used_bytes. Type: Gauge. Resident memory used by the Python process.
- scalr_agent.python.greenlets_count. Type: Gauge. Number of active Python greenlets.
- scalr_agent.python.threads_count. Type: Gauge. Number of active Python threads.
- scalr_agent.worker.cancel_duration_seconds. Type: Histogram. Time required to cancel a task.
- scalr_agent.worker.cancel_errors_total. Type: Counter. Counter for failed task cancellation attempts.
- scalr_agent.worker.tasks_total. Type: Gauge. Total number of tasks on the Huey worker.
- scalr_agent.worker.run_tasks_total. Type: Gauge. Total number of Scalr Run tasks on the Huey worker.
- scalr_agent.worker.task.update_status_duration_seconds. Type: Histogram. Time taken to send a status update to the Scalr platform.
- scalr_agent.worker.task.update_result_duration_seconds. Type: Histogram. Time taken to send a result update to the Scalr platform.
- scalr_agent.consumer.acquire_tasks_duration_seconds. Type: Histogram. Time taken to acquire tasks on the Scalr Agent.
- scalr_agent.consumer.connect_to_pool_duration_seconds. Type: Histogram. Time taken for the Scalr Agent to register with the platform.
- scalr_agent.containers.schedule_duration_seconds. Type: Histogram. Time taken by Kubernetes to schedule a task Pod.
- scalr_agent.containers.delegate_duration_seconds. Type: Histogram. Time taken by the worker to pick up a scheduled Pod.
- scalr_agent.containers.cpu_limit_nanocores. Type: Gauge. CPU limit in nanocores for containers during Plan/Apply stages.
- scalr_agent.containers.mem_limit_bytes. Type: Gauge. Memory limit in bytes for containers during Plan/Apply stages.
IaC Metrics
Metrics for OpenTofu, Terraform, and Terragrunt, emitted by terraform.plan and terraform.apply tasks.
- scalr_agent.app.terraform.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for IaC operations.
- scalr_agent.app.terraform.download_configuration_version_duration_seconds. Type: Histogram. Time taken to download the configuration version.
- scalr_agent.app.terraform.plan_command_duration_seconds. Type: Histogram. Time taken to execute the plan operation.
- scalr_agent.app.terraform.show_command_duration_seconds. Type: Histogram. Time taken to run the show command and generate a JSON plan.
- scalr_agent.app.terraform.upload_plan_duration_seconds. Type: Histogram. Time taken to compress and upload plan results (binary and JSON).
- scalr_agent.app.terraform.init_command_duration_seconds. Type: Histogram. Time taken to execute the init operation.
- scalr_agent.app.terraform.install_providers_duration_seconds. Type: Histogram. Time taken to install providers before init.
- scalr_agent.app.terraform.cache_providers_duration_seconds. Type: Histogram. Time taken to cache providers after init.
- scalr_agent.app.terraform.apply_command_duration_seconds. Type: Histogram. Time taken to execute the apply operation.
- scalr_agent.app.terraform.provider_cache_total_used_size_bytes. Type: Gauge. Current size of the provider plugin cache.
- scalr_agent.app.terraform.provider_cache_freed_size_bytes. Type: Counter. Total size of provider cache cleared by GC.
- scalr_agent.app.terraform.provider_cache_usage_count. Type: Gauge. Usage count of provider plugins in the cache.
- scalr_agent.app.terraform.provider_cache_usage_size_bytes. Type: Gauge. Disk space used per provider plugin in the cache.
- scalr_agent.binary_cache.total_used_size_bytes. Type: Gauge. Current size of the binary cache.
- scalr_agent.app.terraform.run_config_dir_used_size_bytes. Type: Counter. Size of the initialized run config directory after Plan/Apply.
Policy Metrics
Emitted by policy.check tasks for pre-init and post-plan stages.
- scalr_agent.app.policy.download_tfinput_duration_seconds. Type: Histogram. Time taken to download tfinput files.
- scalr_agent.app.policy.evaluate_policy_group_duration_seconds. Type: Histogram. Time taken to evaluate a policy group.
- scalr_agent.app.policy.evaluate_policy_duration_seconds. Type: Histogram. Time taken to evaluate all files in a policy.
- scalr_agent.app.policy.opa_eval_command_duration_seconds. Type: Histogram. Time taken to run the OPA evaluate command.
- scalr_agent.app.policy.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for policy checks.
Cost Estimation Metrics
Emitted by cost.estimate tasks during the cost estimation stage.
- scalr_agent.app.cost.download_plan_json_duration_seconds. Type: Histogram. Time taken to download the plan JSON for cost analysis.
- scalr_agent.app.cost.estimate_duration_seconds. Type: Histogram. Time taken to perform cost estimation.
- scalr_agent.app.cost.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for cost estimation.
- scalr_agent.app.cost.infracost_breakdown_command_duration_seconds. Type: Histogram. Time taken to execute the Infracost breakdown command.
- scalr_agent.app.cost.infracost_output_command_duration_seconds. Type: Histogram. Time taken to execute the Infracost output command.
Checkov Metrics
Emitted by checkov.analyze tasks during the pre-plan stage.
- scalr_agent.app.checkov.download_configuration_version_duration_seconds. Type: Histogram. Time taken to download the configuration version for Checkov.
- scalr_agent.app.checkov.download_external_checks_duration_seconds. Type: Histogram. Time taken to download external Checkov check files.
- scalr_agent.app.checkov.checkov_analyze_command_duration_seconds. Type: Histogram. Time taken to execute the Checkov analyze command.
- scalr_agent.app.checkov.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for Checkov operations.