Metrics
The Scalr Agent generates various metrics that provide insights into its performance.
Configuration
To enable metrics export, set the SCALR_AGENT_OTLP_ENDPOINT environment variable to the host:port address of your OpenTelemetry collector (a gRPC server that accepts OTLP), and enable the SCALR_AGENT_OTLP_METRICS_ENABLED option.
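As an illustration, the two settings could be assembled and sanity-checked before the agent is launched. This is a minimal sketch, not documented Scalr behavior: the helper name otlp_env, the example address otel-collector:4317, and the value "true" for the enabled option are all assumptions.

```python
def otlp_env(endpoint: str, enabled: bool = True) -> dict[str, str]:
    """Build the environment variables that turn on OTLP metrics export."""
    host, sep, port = endpoint.rpartition(":")
    # The endpoint is expected as host:port, without a URL scheme.
    if not sep or not host or not port.isdigit() or "://" in endpoint:
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return {
        "SCALR_AGENT_OTLP_ENDPOINT": endpoint,
        # Assumption: the option accepts the strings "true" / "false".
        "SCALR_AGENT_OTLP_METRICS_ENABLED": "true" if enabled else "false",
    }


# Hypothetical collector address, used only for illustration.
env = otlp_env("otel-collector:4317")
```

The resulting dictionary can be merged into whatever mechanism starts the agent, for example the env argument of subprocess.run.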
Naming conventions
Scalr Agent metrics adhere to a consistent naming convention to provide clear context.
Structure
Metrics are prefixed with a component path, such as scalr_agent.core or scalr_agent.app.policy, that identifies the source of the metric.
If a unit is necessary for clarity, it's appended to the metric name in full, such as _bytes or _seconds.
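The naming structure can be taken apart mechanically. The sketch below is an illustration only: it assumes the component path is everything before the last dot and recognizes only the unit suffixes that appear in this document. The names split_metric_name and UNIT_SUFFIXES are hypothetical.

```python
from typing import NamedTuple, Optional

# Unit suffixes that appear in metric names in this document.
UNIT_SUFFIXES = ("bytes", "seconds", "nanocores")


class MetricName(NamedTuple):
    component: str        # e.g. "scalr_agent.app.policy"
    name: str             # e.g. "opa_eval_command_duration_seconds"
    unit: Optional[str]   # e.g. "seconds", or None when no unit is appended


def split_metric_name(full_name: str) -> MetricName:
    """Split a metric name into component path, short name, and unit suffix."""
    component, _, name = full_name.rpartition(".")
    suffix = name.rsplit("_", 1)[-1]
    unit = suffix if suffix in UNIT_SUFFIXES else None
    return MetricName(component, name, unit)


parsed = split_metric_name("scalr_agent.app.policy.opa_eval_command_duration_seconds")
```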
Data Types
- Time: All timing metrics are measured in seconds with millisecond precision, aligning with established conventions like those used by Prometheus and OpenTelemetry.
- Size: All data size metrics are measured in bytes.
- Boolean: Values are represented as the strings "true" or "false".
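These conventions can be expressed as small conversion helpers. This is a sketch of the conventions only, not the agent's own code; the function names are hypothetical.

```python
def ms_to_seconds(milliseconds: int) -> float:
    """Express a millisecond measurement in seconds, keeping millisecond precision."""
    return round(milliseconds / 1000, 3)


def bool_label(value: bool) -> str:
    """Render a boolean as the string "true" or "false"."""
    return "true" if value else "false"
```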
Labels
All metrics follow Unified Service Tagging, including env, service, and version labels. The service is set to scalr-agent, version contains the version of the Scalr Agent emitting the metrics, and env is always set to prod.
Additional labels:
- app: The URL endpoint, without scheme, that the agent is connected to, e.g. myaccount.scalr.io.
- hostname: The agent hostname.
- kube_component: The Kubernetes component (controller or worker), related to the agent-k8s and agent-job Helm charts.
- kube_namespace: The Kubernetes namespace the agent is deployed to.
Metrics prefixed with scalr_agent.app include an additional context-specific label: account_name.
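To make the label set concrete, the sketch below assembles the labels described above into a dictionary. The function name build_labels and the sample values (version "1.2.3", hostname "agent-1", account name "myaccount") are hypothetical; only the label keys and the fixed env and service values come from this page.

```python
from typing import Optional


def build_labels(
    version: str,
    app: str,
    hostname: str,
    account_name: Optional[str] = None,
    kube_component: Optional[str] = None,
    kube_namespace: Optional[str] = None,
) -> dict[str, str]:
    """Assemble the label set attached to a metric."""
    labels = {
        "env": "prod",             # always set to prod
        "service": "scalr-agent",  # fixed service name
        "version": version,
        "app": app,
        "hostname": hostname,
    }
    # Optional labels are only added when a value is supplied.
    optional = {
        "account_name": account_name,      # scalr_agent.app metrics only
        "kube_component": kube_component,  # controller or worker
        "kube_namespace": kube_namespace,
    }
    labels.update({key: value for key, value in optional.items() if value is not None})
    return labels


# Sample values are illustrative, not real deployment data.
labels = build_labels("1.2.3", "myaccount.scalr.io", "agent-1", account_name="myaccount")
```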
Core Metrics
Metrics produced by the Scalr Agent runtime.
scalr_agent.core.update_status_duration_seconds
Time taken to send a status update to the Scalr platform.
Type: Histogram
Unit: seconds
scalr_agent.core.update_result_duration_seconds
Time taken to send a result update to the Scalr platform.
Type: Histogram
Unit: seconds
scalr_agent.core.cpu_limit_nanocores
CPU limit in nanocores for the container during Plan and Apply Scalr run stages.
Type: Gauge
Unit: nanocores
scalr_agent.core.mem_limit_bytes
Memory limit in bytes for the container during Plan and Apply Scalr run stages.
Type: Gauge
Unit: bytes
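When reading the two limit gauges, it can help to convert them to more familiar units. One core equals 1,000,000,000 nanocores and one MiB equals 1,048,576 bytes; the helper names below are hypothetical.

```python
NANOCORES_PER_CORE = 1_000_000_000
NANOCORES_PER_MILLICORE = 1_000_000
BYTES_PER_MIB = 1024 * 1024


def nanocores_to_cores(nanocores: int) -> float:
    """Convert a CPU limit from nanocores to whole cores."""
    return nanocores / NANOCORES_PER_CORE


def nanocores_to_millicores(nanocores: int) -> int:
    """Convert a CPU limit from nanocores to millicores (Kubernetes 'm' units)."""
    return nanocores // NANOCORES_PER_MILLICORE


def bytes_to_mib(size_bytes: int) -> float:
    """Convert a memory limit from bytes to MiB."""
    return size_bytes / BYTES_PER_MIB
```

For example, a cpu_limit_nanocores value of 500000000 corresponds to 0.5 cores, or 500m in Kubernetes notation.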
scalr_agent.core.kubernetes_job_max_scheduling_delay_seconds
A real-time gauge showing the maximum scheduling delay across all active jobs handled by the agent controller (using the kubernetes_job driver).
Type: ObservableGauge
Unit: seconds
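A maximum-across-jobs gauge of this kind can be pictured as follows. This sketch assumes, purely for illustration, that a job's scheduling delay is the time elapsed since the job was created; the agent's actual definition is not specified here, and the function name is hypothetical.

```python
def max_scheduling_delay(created_at: list[float], now: float) -> float:
    """Return the largest elapsed time, in seconds, among the active jobs.

    created_at holds one creation timestamp (seconds) per active job.
    Returns 0.0 when there are no active jobs.
    """
    return max((now - created for created in created_at), default=0.0)
```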
scalr_agent.core.kubernetes_job_startup_latency_seconds
The time taken by an agent worker (using the kubernetes_job driver) to pick up an agent task pod created by the agent controller.
Type: Histogram
Unit: seconds
scalr_agent.core.kubernetes_schedule_duration_seconds
The time taken by the agent controller (using the kubernetes or kubernetes_job driver) to schedule an agent task pod.
Type: Histogram
Unit: seconds
scalr_agent.core.kubernetes_delegate_duration_seconds
The time taken by an agent worker (using the kubernetes or kubernetes_job driver) to pick up an agent task pod created by an agent controller.
Type: Histogram
Unit: seconds
scalr_agent.core.acquire_tasks_duration_seconds
Time taken to acquire tasks on the Scalr Agent.
Type: Histogram
Unit: seconds
scalr_agent.core.tasks_total
Total number of tasks on the Huey worker.
Type: Gauge
scalr_agent.core.run_tasks_total
Total number of Scalr Run tasks on the Huey worker.
Type: Gauge
scalr_agent.core.import_duration_seconds
Time taken for the Python runtime to launch.
Type: Histogram
Unit: seconds
scalr_agent.core.startup_duration_seconds
Time taken for the agent to be ready to accept incoming tasks after launch.
Type: Histogram
Unit: seconds
scalr_agent.core.connect_to_pool_duration_seconds
Time taken to register with the Scalr platform.
Type: Histogram
Unit: seconds
scalr_agent.binary_cache.total_used_size_bytes
Current size of the binary cache.
Type: ObservableGauge
Unit: bytes
scalr_agent.binary_cache.module_cache_usage_size_bytes
Disk space used by each module in the cache.
Type: ObservableGauge
Unit: bytes
scalr_agent.core.cancel_duration_seconds
Time required to cancel a task.
Type: Histogram
Unit: seconds
scalr_agent.core.cancel_errors_total
Counter for failed task cancellation attempts.
Type: Counter
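A counter such as this one is usually read as an increase over a time window rather than as a raw value. The sketch below assumes the counter is exported with cumulative values that restart from zero when the agent restarts; that export behavior is an assumption, and the function name is hypothetical.

```python
def counter_increase(samples: list[int]) -> int:
    """Total increase across consecutive counter samples, tolerating resets.

    When a sample is lower than the previous one, the counter is assumed to
    have restarted from zero, so the new value itself is the increase.
    """
    total = 0
    for previous, current in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total
```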
Blob Storage Metrics
Emitted by the HTTP blob client during read, write, and extract operations against Scalr blob storage. Includes I/O transfer metrics for processing configuration versions, Run Stage logs, software binaries, plan files, state files, and similar artifacts.
scalr_agent.core.blobclient.upload_blob_duration_seconds
Duration of full blob uploads (HTTP PUT).
Type: Histogram
Unit: seconds
scalr_agent.core.blobclient.upload_blob_bytes
Total bytes uploaded (HTTP PUT).
Type: Counter
Unit: bytes
scalr_agent.core.blobclient.read_blob_duration_seconds
Duration of blob downloads (HTTP GET).
Type: Histogram
Unit: seconds
scalr_agent.core.blobclient.read_blob_bytes
Total bytes downloaded (HTTP GET).
Type: Counter
Unit: bytes
scalr_agent.core.blobclient.write_blob_duration_seconds
Duration of blob write operations (HTTP PATCH).
Type: Histogram
Unit: seconds
scalr_agent.core.blobclient.write_blob_bytes
Total bytes written to blob (HTTP PATCH).
Type: Counter
Unit: bytes
scalr_agent.core.blobclient.extract_blob_duration_seconds
Duration of blob extraction (download and decompress).
Type: Histogram
Unit: seconds
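Pairing a byte counter with the matching duration histogram gives an approximate transfer rate. The sketch below assumes you already have, for the same time window, the increase in a *_bytes counter and the increase in the sum of the corresponding *_duration_seconds histogram; the function name is hypothetical.

```python
def throughput_bytes_per_second(bytes_delta: int, seconds_delta: float) -> float:
    """Average transfer rate over a window, in bytes per second.

    bytes_delta:   increase of a byte counter (e.g. read_blob_bytes).
    seconds_delta: increase of the matching duration sum over the same window.
    Returns 0.0 when no time was recorded, to avoid dividing by zero.
    """
    if seconds_delta <= 0:
        return 0.0
    return bytes_delta / seconds_delta
```

For instance, 10 MiB (10485760 bytes) transferred over 4 seconds of recorded download time works out to 2621440 bytes per second.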
IaC Component Metrics
Metrics for OpenTofu, Terraform, and Terragrunt, emitted by terraform.plan and terraform.apply tasks.
scalr_agent.app.terraform.apply_command_duration_seconds
Time taken to execute the Terraform/OpenTofu apply command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.upload_configuration_version_changes_duration_seconds
Time taken to compress and upload configuration version changes to Scalr.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.init_command_duration_seconds
Time taken to execute the Terraform/OpenTofu init command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.validate_command_duration_seconds
Time taken to execute the Terraform/OpenTofu validate command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.fmt_command_duration_seconds
Time taken to execute the Terraform/OpenTofu fmt command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.get_command_duration_seconds
Time taken to execute the Terraform/OpenTofu get command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.install_providers_duration_seconds
Time taken to install providers before the Terraform/OpenTofu init command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.cache_providers_duration_seconds
Time taken to cache providers after the Terraform/OpenTofu init command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.install_modules_duration_seconds
Time taken to install modules before the Terraform/OpenTofu get command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.cache_modules_duration_seconds
Time taken to cache modules after the Terraform/OpenTofu get command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.plan_command_duration_seconds
Time taken to execute the Terraform/OpenTofu plan command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.show_command_duration_seconds
Time taken to execute the show operation that generates the JSON plan.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.upload_plan_duration_seconds
Time taken to compress and upload plan results back to Scalr in both binary and JSON formats.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.test_command_duration_seconds
Time taken to execute the OpenTofu test command.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.install_binaries_duration_seconds
Time taken to install binaries for IaC operations.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.download_configuration_version_duration_seconds
Time taken to download the configuration version for plan, apply, and test operations.
Type: Histogram
Unit: seconds
scalr_agent.app.terraform.download_configuration_version_changes_duration_seconds
Time taken to download configuration version changes for apply operations.
Type: Histogram
Unit: seconds
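The duration histograms in this section can be combined to see which step dominates a run. The sketch below takes one representative duration per metric (for example a mean derived from each histogram) and returns each step's share of the total. The function name is hypothetical, and any input values you pass are your own measurements; none are implied by this page.

```python
def phase_shares(durations: dict[str, float]) -> dict[str, float]:
    """Return each phase's share of the summed duration, as a percentage.

    durations maps a metric name to a duration in seconds.
    Percentages are rounded to one decimal place.
    """
    total = sum(durations.values())
    if total == 0:
        return {name: 0.0 for name in durations}
    return {name: round(100 * value / total, 1) for name, value in durations.items()}
```

A result in which plan_command_duration_seconds holds most of the share would suggest looking at the plan step first, whereas a large install_providers_duration_seconds share would point at provider installation instead.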
scalr_agent.app.terraform.provider_cache_freed_size_bytes
Total size in bytes of provider plugin cache data deleted by garbage collection.
Type: Counter
Unit: bytes
scalr_agent.app.terraform.run_config_dir_used_size_bytes
Total size in bytes of the initialized Scalr Run configuration directory, measured after a Plan or Apply operation completes.
Type: Counter
Unit: bytes
scalr_agent.app.terraform.provider_cache_total_used_size_bytes
Current size of the provider plugins cache directory.
Type: ObservableGauge
Unit: bytes
scalr_agent.app.terraform.provider_cache_usage_count
Number of times each provider plugin in the cache has been used.
Type: ObservableGauge
Unit: count
scalr_agent.app.terraform.provider_cache_usage_size_bytes
Disk space used by each provider plugin in the cache.
Type: ObservableGauge
Unit: bytes
Policy Component Metrics
Emitted by policy.check tasks for pre-init and post-plan stages.
scalr_agent.app.policy.opa_eval_command_duration_seconds
Time taken to call the opa eval command.
Type: Histogram
Unit: seconds
scalr_agent.app.policy.download_tfinput_duration_seconds
Time taken to download tfinput files.
Type: Histogram
Unit: seconds
scalr_agent.app.policy.evaluate_policy_group_duration_seconds
Time taken to evaluate a policy group.
Type: Histogram
Unit: seconds
scalr_agent.app.policy.evaluate_policy_duration_seconds
Time taken to evaluate all files within a policy.
Type: Histogram
Unit: seconds
scalr_agent.app.policy.install_binaries_duration_seconds
Time taken to install binaries for policy operations.
Type: Histogram
Unit: seconds
Cost Estimate Component Metrics
Emitted by cost.estimate tasks during the cost estimation stage.
scalr_agent.app.cost.infracost_breakdown_command_duration_seconds
Time taken to execute the Infracost breakdown command.
Type: Histogram
Unit: seconds
scalr_agent.app.cost.infracost_output_command_duration_seconds
Time taken to execute the Infracost output command.
Type: Histogram
Unit: seconds
scalr_agent.app.cost.download_plan_json_duration_seconds
Time taken to download the plan JSON file as input for the cost estimation workflow.
Type: Histogram
Unit: seconds
scalr_agent.app.cost.estimate_duration_seconds
Time taken for the cost estimation workflow.
Type: Histogram
Unit: seconds
scalr_agent.app.cost.install_binaries_duration_seconds
Time taken to install binaries for cost operations.
Type: Histogram
Unit: seconds
Checkov Component Metrics
Emitted by checkov.analyze tasks during the pre-plan stage.
scalr_agent.app.checkov.download_configuration_version_duration_seconds
Time taken to download the configuration version for Checkov operations.
Type: Histogram
Unit: seconds
scalr_agent.app.checkov.download_external_checks_duration_seconds
Time taken to download external Checkov check files.
Type: Histogram
Unit: seconds
scalr_agent.app.checkov.checkov_analyze_command_duration_seconds
Time taken to execute the Checkov analyze command.
Type: Histogram
Unit: seconds
scalr_agent.app.checkov.install_binaries_duration_seconds
Time taken to install binaries for Checkov operations.
Type: Histogram
Unit: seconds
