Metrics
The Scalr Agent emits metrics that describe its performance, covering the agent process, its worker pool, and the individual task types (IaC, policy, cost estimation, and Checkov).
Configuration
To enable metrics export, set the SCALR_AGENT_OTLP_ENDPOINT environment variable to the host:port address of your OpenTelemetry collector (a gRPC server that accepts OTLP), and enable the SCALR_AGENT_OTLP_METRICS_ENABLED option.
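As a rough illustration of the two settings above, the following Python sketch builds the environment for an agent process. The helper name otlp_env and the collector hostname are hypothetical, and the string "true" for SCALR_AGENT_OTLP_METRICS_ENABLED is an assumption, since this page does not state the accepted values. Port 4317 is the conventional OTLP gRPC port.

```python
import os


def otlp_env(endpoint: str, enabled: bool = True) -> dict:
    """Return the two environment variables described in this section.

    Hypothetical helper: it only checks that the endpoint looks like
    host:port, as the documentation requires.
    """
    host, sep, port = endpoint.rpartition(":")
    if "://" in endpoint or not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return {
        "SCALR_AGENT_OTLP_ENDPOINT": endpoint,
        # Assumption: the flag is enabled with the string "true".
        "SCALR_AGENT_OTLP_METRICS_ENABLED": "true" if enabled else "false",
    }


# Merge into the current environment before launching the agent process.
# "otel-collector.internal" is a placeholder hostname.
env = {**os.environ, **otlp_env("otel-collector.internal:4317")}
```

The resulting env mapping could then be passed to whatever launches the agent, for example the env argument of subprocess.run.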
Naming conventions
Scalr Agent metrics adhere to a consistent naming convention to provide clear context.
Structure
Metrics are prefixed with a component path, like scalr_agent.app.policy or scalr_agent.worker, to identify the source of the metric.
If a unit is necessary for clarity, it's appended to the metric name in full, such as _bytes or _seconds.
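To make the naming structure concrete, here is a minimal Python sketch that splits a metric name into its component path, short name, and unit suffix. The function split_metric_name and the UNIT_SUFFIXES tuple are illustrative assumptions, as is the choice to treat everything before the last dot as the component path. The suffixes are the ones that appear in the metric names on this page.

```python
from typing import Optional, Tuple

# Unit suffixes seen in the metric names on this page (not an official list).
UNIT_SUFFIXES = ("bytes", "seconds", "nanocores")


def split_metric_name(name: str) -> Tuple[str, str, Optional[str]]:
    """Split a metric name into (component_path, metric, unit)."""
    # Assumption: the component path is everything before the last dot.
    component, _, metric = name.rpartition(".")
    unit = next((u for u in UNIT_SUFFIXES if metric.endswith("_" + u)), None)
    return component, metric, unit


print(split_metric_name("scalr_agent.app.policy.evaluate_policy_duration_seconds"))
print(split_metric_name("scalr_agent.python.threads_count"))
```

Names without a unit, such as counts, yield None as the third element.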
Data Types
- Time: All timing metrics are measured in seconds with millisecond precision, aligning with established conventions like those used by Prometheus and OpenTelemetry.
- Size: All data size metrics are measured in bytes.
- Boolean: Boolean values are represented as the strings true or false.
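The conventions in this list can be sketched in Python as two small conversions. Both function names are hypothetical and are not part of the Scalr Agent. They only show what "seconds with millisecond precision" and "the strings true or false" look like as values.

```python
def to_metric_seconds(elapsed_ms: float) -> float:
    """Convert milliseconds to seconds, keeping millisecond precision."""
    return round(elapsed_ms / 1000.0, 3)


def to_metric_bool(value: bool) -> str:
    """Represent a boolean as the string "true" or "false"."""
    return "true" if value else "false"


print(to_metric_seconds(1534))  # 1.534
print(to_metric_bool(False))    # false
```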
 
Labels
Metrics prefixed with scalr_agent.app include two specific labels for context: account_name and run_id.
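When consuming these metrics downstream, it can help to verify that the expected labels are present. The sketch below assumes a data point is available as a metric name plus a plain dictionary of labels. The function missing_labels and the sample label values are made up for illustration.

```python
REQUIRED_APP_LABELS = {"account_name", "run_id"}


def missing_labels(metric_name: str, labels: dict) -> set:
    """Return the required labels that an app metric does not carry."""
    # Only metrics under scalr_agent.app are documented to have these labels.
    if not metric_name.startswith("scalr_agent.app."):
        return set()
    return REQUIRED_APP_LABELS - set(labels)


# Placeholder label values, for illustration only.
print(missing_labels(
    "scalr_agent.app.terraform.plan_command_duration_seconds",
    {"account_name": "example-account"},
))
```

Here the result is a set containing run_id, because the sample data point omits that label.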
Main Metrics
Metrics from the Scalr Agent and internal worker pool.
- scalr_agent.cmd.startup_duration_seconds. Type: Histogram. Time taken for the Scalr Agent to be ready to accept incoming tasks after launch.
- scalr_agent.cmd.uptime_seconds. Type: Gauge. Uptime of the service in seconds.
- scalr_agent.python.mem_used_bytes. Type: Gauge. Resident memory used by the Python process.
- scalr_agent.python.greenlets_count. Type: Gauge. Number of active Python greenlets.
- scalr_agent.python.threads_count. Type: Gauge. Number of active Python threads.
- scalr_agent.worker.cancel_duration_seconds. Type: Histogram. Time required to cancel a task.
- scalr_agent.worker.cancel_errors_total. Type: Counter. Number of failed task cancellation attempts.
- scalr_agent.worker.tasks_total. Type: Gauge. Total number of tasks on the Huey worker.
- scalr_agent.worker.run_tasks_total. Type: Gauge. Total number of Scalr Run tasks on the Huey worker.
- scalr_agent.worker.task.update_status_duration_seconds. Type: Histogram. Time taken to send a status update to the Scalr platform.
- scalr_agent.worker.task.update_result_duration_seconds. Type: Histogram. Time taken to send a result update to the Scalr platform.
- scalr_agent.consumer.acquire_tasks_duration_seconds. Type: Histogram. Time taken to acquire tasks on the Scalr Agent.
- scalr_agent.consumer.connect_to_pool_duration_seconds. Type: Histogram. Time taken for the Scalr Agent to register with the platform.
- scalr_agent.containers.schedule_duration_seconds. Type: Histogram. Time taken by Kubernetes to schedule a task Pod.
- scalr_agent.containers.delegate_duration_seconds. Type: Histogram. Time taken by the worker to pick up a scheduled Pod.
- scalr_agent.containers.cpu_limit_nanocores. Type: Gauge. CPU limit in nanocores for containers during Plan/Apply stages.
- scalr_agent.containers.mem_limit_bytes. Type: Gauge. Memory limit in bytes for containers during Plan/Apply stages.
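The two container limit gauges report raw units, which are easier to read on a dashboard after conversion. The following sketch uses the standard relationships (one core is 1,000,000,000 nanocores, one MiB is 1,048,576 bytes). The function names and sample values are hypothetical.

```python
NANOCORES_PER_CORE = 1_000_000_000
BYTES_PER_MIB = 1024 * 1024


def nanocores_to_cores(nanocores: int) -> float:
    """Convert a cpu_limit_nanocores reading to CPU cores."""
    return nanocores / NANOCORES_PER_CORE


def bytes_to_mib(size_bytes: int) -> float:
    """Convert a mem_limit_bytes reading to mebibytes."""
    return size_bytes / BYTES_PER_MIB


print(nanocores_to_cores(500_000_000))  # 0.5
print(bytes_to_mib(536_870_912))        # 512.0
```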
IaC Metrics
Metrics for OpenTofu, Terraform, and Terragrunt, emitted by terraform.plan and terraform.apply tasks.
- scalr_agent.app.terraform.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for IaC operations.
- scalr_agent.app.terraform.download_configuration_version_duration_seconds. Type: Histogram. Time taken to download the configuration version.
- scalr_agent.app.terraform.plan_command_duration_seconds. Type: Histogram. Time taken to execute the plan operation.
- scalr_agent.app.terraform.show_command_duration_seconds. Type: Histogram. Time taken to run the show command and generate a JSON plan.
- scalr_agent.app.terraform.upload_plan_duration_seconds. Type: Histogram. Time taken to compress and upload plan results (binary and JSON).
- scalr_agent.app.terraform.init_command_duration_seconds. Type: Histogram. Time taken to execute the init operation.
- scalr_agent.app.terraform.install_providers_duration_seconds. Type: Histogram. Time taken to install providers before init.
- scalr_agent.app.terraform.cache_providers_duration_seconds. Type: Histogram. Time taken to cache providers after init.
- scalr_agent.app.terraform.apply_command_duration_seconds. Type: Histogram. Time taken to execute the apply operation.
- scalr_agent.app.terraform.provider_cache_total_used_size_bytes. Type: Gauge. Current size of the provider plugin cache.
- scalr_agent.app.terraform.provider_cache_freed_size_bytes. Type: Counter. Total size of provider cache cleared by GC.
- scalr_agent.app.terraform.provider_cache_usage_count. Type: Gauge. Usage count of provider plugins in the cache.
- scalr_agent.app.terraform.provider_cache_usage_size_bytes. Type: Gauge. Disk space used per provider plugin in the cache.
- scalr_agent.binary_cache.total_used_size_bytes. Type: Gauge. Current size of the binary cache.
- scalr_agent.app.terraform.run_config_dir_used_size_bytes. Type: Counter. Size of the initialized run config directory after Plan/Apply.
Policy Metrics
Emitted by policy.check tasks for pre-init and post-plan stages.
- scalr_agent.app.policy.download_tfinput_duration_seconds. Type: Histogram. Time taken to download tfinput files.
- scalr_agent.app.policy.evaluate_policy_group_duration_seconds. Type: Histogram. Time taken to evaluate a policy group.
- scalr_agent.app.policy.evaluate_policy_duration_seconds. Type: Histogram. Time taken to evaluate all files in a policy.
- scalr_agent.app.policy.opa_eval_command_duration_seconds. Type: Histogram. Time taken to run the OPA evaluate command.
- scalr_agent.app.policy.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for policy checks.
Cost Estimation Metrics
Emitted by cost.estimate tasks during the cost estimation stage.
- scalr_agent.app.cost.download_plan_json_duration_seconds. Type: Histogram. Time taken to download the plan JSON for cost analysis.
- scalr_agent.app.cost.estimate_duration_seconds. Type: Histogram. Time taken to perform cost estimation.
- scalr_agent.app.cost.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for cost estimation.
- scalr_agent.app.cost.infracost_breakdown_command_duration_seconds. Type: Histogram. Time taken to execute the Infracost breakdown command.
- scalr_agent.app.cost.infracost_output_command_duration_seconds. Type: Histogram. Time taken to execute the Infracost output command.
Checkov Metrics
Emitted by checkov.analyze tasks during the pre-plan stage.
- scalr_agent.app.checkov.download_configuration_version_duration_seconds. Type: Histogram. Time taken to download the configuration version for Checkov.
- scalr_agent.app.checkov.download_external_checks_duration_seconds. Type: Histogram. Time taken to download external Checkov check files.
- scalr_agent.app.checkov.checkov_analyze_command_duration_seconds. Type: Histogram. Time taken to execute the Checkov analyze command.
- scalr_agent.app.checkov.install_binaries_duration_seconds. Type: Histogram. Time taken to install binaries for Checkov operations.
