Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.57.0] - 2025-12-12

Fixed

Core: Minor stability improvements to asynchronous processing.
Local Driver: Minor fixes and logging improvements for termination signal propagation.
IaC: Fixed HCL configuration parsing issues related to user-tags for provider configurations.

[0.56.0] - 2025-11-20

Fixed

Provider Cache: Improvements in provider cache locking; minor logging improvements.
Core: Added periodic health monitoring for runner containers. The agent now detects when containers are killed manually or by the runtime (Kubernetes Pod evictions, OOM kills, etc.) and fails runs appropriately. Enabled by default with a 30-second check interval. Can be configured via:
- SCALR_AGENT_RUNNER_LIVECHECK_ENABLED (default: true)
- SCALR_AGENT_RUNNER_LIVECHECK_INTERVAL (default: 30 seconds)
Core: Fixed propagation of the SIGTERM signal to runner subprocess (e.g., Terraform, OpenTofu, OPA, etc.) during graceful shutdown.
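
The runner health-check options listed above can be pictured with a small sketch of how a process might read them from its environment. Only the two variable names and their defaults come from this changelog; the accepted boolean spellings, the parsing of the interval as a plain number of seconds, and the names LivecheckConfig and load_livecheck_config are assumptions made for illustration, not the agent's actual implementation.

```python
import os
from dataclasses import dataclass

# Assumption: these spellings count as "true"; the agent may accept others.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LivecheckConfig:
    enabled: bool
    interval_seconds: float


def load_livecheck_config(env=None) -> LivecheckConfig:
    """Read the livecheck options, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_enabled = env.get("SCALR_AGENT_RUNNER_LIVECHECK_ENABLED", "true")
    raw_interval = env.get("SCALR_AGENT_RUNNER_LIVECHECK_INTERVAL", "30")
    return LivecheckConfig(
        enabled=raw_enabled.strip().lower() in _TRUTHY,
        interval_seconds=float(raw_interval),
    )
```

With an empty environment, the sketch yields the documented defaults (enabled, 30 seconds).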

Removed

Removed experimental SCALR_AGENT_MEMRAY config option.

[0.55.2] - 2025-11-03

Fixed

Fixed an internal error with non-UTF-8 characters in logs.

[0.55.1] - 2025-11-03

Fixed

Fixed an agent failure when the output contained non-UTF-8 characters.

[0.55.1] - 2025-10-20

Added

Added support for building agent-level hooks directly into the Docker image.

[0.55.0] - 2025-10-15

Added

Kubernetes Driver: Added the kubernetes-job driver option to enable stateless Job workers for the Kubernetes runtime. When enabled, run stage workers are created as short-lived Kubernetes Jobs instead of relying on a persistent DaemonSet. This feature is in Alpha and implementation details are subject to change; please don't use it in production.

Changes

Core: Automatically select the local driver if a driver is not explicitly set and the Docker socket is not present.
Core: The command entrypoint now uses scalr-agent run as the default command when none is explicitly provided, to simplify command handling. The Scalr Agent can now be started with docker run scalr/agent --token=xxx, while the previous behavior (docker run scalr/agent run --token=xxx) is still supported.
Core: Minor update to the async library stack (gevent/greenlet).
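
The default-command behavior of the entrypoint described above can be sketched as a small argument normalizer. This is a hypothetical illustration: the rule "prepend run when the first argument is missing or looks like an option" and the name normalize_args are assumptions, not the agent's actual entrypoint code.

```python
DEFAULT_COMMAND = "run"


def normalize_args(argv):
    """Prepend the default command when the first argument is missing or an option."""
    if not argv or argv[0].startswith("-"):
        return [DEFAULT_COMMAND, *argv]
    return list(argv)
```

Under this sketch, both `--token=xxx` and `run --token=xxx` resolve to the same argument list, which matches the two invocations the entry says are supported.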

Fixed

Core: Fixed an issue where the agent could become stuck while writing to the run logs.

[0.54.0] - 2025-09-08

Added

IAC: Added support for agent-level hooks.
IAC: Added support for the SCALR_AWS_SERVICE_CREDENTIALS_SOURCE shell variable.

[0.53.0] - 2025-08-29

Added

agent-local: Added new labels to agent-local Kubernetes chart for Karpenter and GKE Autopilot to reduce the risk of pod eviction:
- karpenter.sh/do-not-evict: "true"
- karpenter.sh/do-not-disrupt: "true"
- autopilot.gke.io/priority: high
Provider Cache: SCALR_AGENT_PROVIDER_CACHE_DROP_KERNEL_PAGES configuration option (boolean). Controls whether to drop Linux kernel page cache entries for the OpenTofu/Terraform Provider Cache directory. Defaults to true. This frees cached filesystem pages but does not remove any provider files.
Provider Cache: Add an attempt to free cached kernel pages associated with the provider cache directory (best effort) using the POSIX_FADV_DONTNEED advice after each plan/apply stage is finished. This feature aims to reduce kernel page cache usage by OpenTofu/Terraform providers and improve memory reporting in the Kubernetes runtime.
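
As a rough illustration of the best-effort page cache drop described above, the following sketch walks a directory and applies the POSIX_FADV_DONTNEED advice to each file using Python's os.posix_fadvise. The function name drop_page_cache and the directory walk are assumptions for illustration; the agent's actual implementation is not shown in this changelog.

```python
import os


def drop_page_cache(directory: str) -> int:
    """Advise the kernel that cached pages for files under `directory` are not needed.

    Returns the number of files the advice was applied to. Files are never
    modified or removed.
    """
    if not hasattr(os, "posix_fadvise"):
        return 0  # Not available on this platform (e.g. macOS, Windows).
    advised = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                fd = os.open(path, os.O_RDONLY)
            except OSError:
                continue  # Best effort: skip files that vanished or are unreadable.
            try:
                # offset=0 and len=0 cover the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                advised += 1
            except OSError:
                pass  # Best effort: ignore filesystems that reject the advice.
            finally:
                os.close(fd)
    return advised
```

The advice is only a hint to the kernel, so the amount of memory actually released can vary between runs.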

[0.52.3] - 2025-08-21

Added

Docker: Include Git in the scalr/agent image to make it a suitable choice for the local driver. Previously, only the bulk scalr/agent-runner image could be used for the local driver due to the lack of Git in the basic agent image.
Kubernetes: Add SCALR_AGENT_KUBERNETES_TERMINATION_GRACE_PERIOD_SECONDS option to configure the termination grace period for agent task pods on Kubernetes driver. This allows better control over pod shutdown behavior. Default is 30 seconds.

Changes

Kubernetes: The agent task pod deletion now respects the SCALR_AGENT_KUBERNETES_TERMINATION_GRACE_PERIOD_SECONDS setting. The default value has been updated from 0 seconds (immediate force deletion) to 30 seconds.

Fixed

IAC: Fixed an issue with module reports on agent runs.

[0.52.2] - 2025-08-19

Fixed

IAC: Added support for variables with the OpenTofu extension
IAC: terragrunt hclvalidate fails with an empty list of invalid HCL files.

[0.52.1] - 2025-08-15

Fixed

Provider Cache: Resolve h1: checksums for the HCL lock file when SCALR_AGENT_PROVIDER_CACHE_WARM_UP_FROM_LOCKFILE is disabled.
Core: Add posting of state components to the Scalr API for reports.

[0.52.0] - 2025-08-08

Added

Core: Reporting of runtime and driver details to the Scalr API when connecting to an agent pool.
Core: Add unzip to the Docker image and use it as the preferred method to unpack provider plugins.
Core: Add new OpenTelemetry metrics to track RSS memory usage, threads and greenlets count.
Provider Cache: Add the SCALR_AGENT_PROVIDER_CACHE_WARM_UP_FROM_LOCKFILE config option to allow disabling provider pre-download by the Scalr Agent, useful for debugging or in case of incompatibilities or issues.

Fixed

Core: Fix intermittent timeout issues during provider plugin installation.

[0.51.0] - 2025-07-18

Changes

Updated SCALR_AGENT_PROVIDER_CACHE_SIZE_LIMIT_MB default from 2560 (2.5 GB) to 5120 (5 GB).

Fixed

Core: Fixed Provider Cache garbage collection issues where the cache could significantly exceed the expected threshold (5 GB by default). A reduction in disk usage is expected after the upgrade.
Core: Minor fixes in OTLP metrics collection pipelines.

[0.50.0] - 2025-07-11

Changes

Core: Disabled strict shell pipeline mode (set -e) for hooks. This reverts a change introduced in 0.48.0, where the mode was unintentionally applied to user-defined hooks.
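
The practical effect of `set -e` on a hook script can be reproduced outside the agent. The sketch below runs the same two-line script with and without errexit mode through POSIX sh; the sample script and the helper run_hook are made up for illustration, the example assumes `sh` is on the PATH, and it does not show how the agent itself invokes hooks.

```python
import subprocess

# Hypothetical hook: the first command fails, the second reports completion.
HOOK = "false\necho 'hook finished'\n"


def run_hook(script: str, strict: bool) -> subprocess.CompletedProcess:
    """Run `script` with POSIX sh, optionally in errexit (`set -e`) mode."""
    argv = ["sh", "-e", "-c", script] if strict else ["sh", "-c", script]
    return subprocess.run(argv, capture_output=True, text=True)


strict = run_hook(HOOK, strict=True)    # stops at `false`, non-zero exit code
relaxed = run_hook(HOOK, strict=False)  # continues past `false`, exit code 0
print(strict.returncode, repr(strict.stdout))
print(relaxed.returncode, repr(relaxed.stdout))
```

Hooks that relied on intermediate commands being allowed to fail behave like the relaxed case again after this change.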

Fixed

IAC: Fixed random Cannot configure volume errors when apply is triggered via the auto-approve workflow after a plan.
IAC: Fixed 'utf-8' codec can't decode byte... errors when parsing declared configuration version variables.

[0.49.0] - 2025-07-03

Added

Core: New metrics added for core, policy, cost, and checkov components.

Changes

Core: Updated HOME directory handling. Previously, the HOME directory could be /root or /tmp, depending on the user under which the agent was launched. For the local driver, HOME defaults to app/terraform/runs/run-xxx/tmp within the Scalr agent data directory. In containerized environments (Docker or Kubernetes drivers), the /tmp mount is used. This change ensures HOME is consistent across user configurations and improves run behavior predictability across various agent deployments.
Provider Configurations: Initialize Google/AWS OIDC provider configuration files before the init phase to make these files available in hooks for custom scripts. Previously, these files were initialized after init, so they weren’t available for pre-init hooks.

Fixed

Core: Resolved stat /root/.netrc: permission denied error when installing a Terraform module under the Docker agent running as a non-root user.
Core: Added new shell variables for the run environment:
- SCALR_AGENT_RUN_CONFIG_DIR – Root directory of the current configuration version containing the .tf files.
- SCALR_AGENT_RUN_DATA_DIR – Directory for data files and configs stored outside the OpenTofu/Terraform .tf configuration files.
- SCALR_AGENT_RUN_WORK_DIR – Terraform working directory within the configuration directory where all OpenTofu/Terraform commands are executed.
These variables may be useful for hook scripts under the local driver, where directories can differ between runs (e.g., /var/lib/scalr-agent/app/terraform/runs/run-xxx/data, /var/lib/scalr-agent/app/terraform/runs/run-xxx/workdir), compared to containerized drivers like Docker or Kubernetes, where directories are represented by static mounts inside containers (e.g., /opt/data, /opt/workdir, etc.).
Core: Resolved Python UserWarning on service startup.
Local Driver: Resolved issues with Google/AWS OIDC provider configuration on the local driver.

[0.48.0] - 2025-06-18

Added

Local Driver: Added the scalr/agent-runner image for the agent-local Kubernetes chart, based on scalr/runner:0.1.4. See scalr/runner 0.1.4 release for details.
Kubernetes/Docker Driver: Added an option to enforce the use of a custom runner image instead of the default scalr/runner:x.y.z - SCALR_AGENT_CONTAINER_TASK_IMAGE. It can also be used to pin a specific version of the scalr/runner image, as by default this is controlled by the Scalr platform settings. This option only applies if software binary releases are enabled for the agent pool’s account. This option will ignore SCALR_AGENT_CONTAINER_TASK_IMAGE_REGISTRY which was intended to match against a large set of images for different software versions. If you want to use a custom image registry for custom runner image (aka golden image), simply specify it with SCALR_AGENT_CONTAINER_TASK_IMAGE=registry.example.com/company/runner:1.2.3.
Kubernetes Driver: Added standardized labels for Pods spawned by the Scalr Agent controller:
- app.kubernetes.io/name: "agent-k8s-task"
- app.kubernetes.io/managed-by: "agent-k8s"
- app.kubernetes.io/component: "task"
- app.kubernetes.io/instance: "atask-xxx" (Scalr Agent Task ID)
These labels can be used by monitoring and logging tools, such as DataDog Agent or Grafana Alloy, to identify all pods related to the Scalr Agent installation.

Changes

Core: Updated the filesystem directory layout for run directories and GC behavior for stale run directories:
- Old path: /var/lib/scalr-agent/workspaces/ws-v0ord7i9m8b4h7bvg/runs/run-v0ormbhajtqial94u/plan-v0ormbtfj2avqk453
- New path: /var/lib/scalr-agent/app/terraform/runs/run-v0ormbhajtqial94u
The new path is simplified and consistent for both the plan and apply phases. Garbage collection will now remove stale run directories strictly older than 24 hours (based on st_mtime), without making additional API calls to the Scalr Platform to check the run status. Under normal conditions, run directories are temporary and are removed immediately after run stage completion.
Core: Explicitly export the $HOME directory as a shell variable by default for better compatibility with Python’s pip packages installations.
Core: Include $HOME/.local/bin in the $PATH by default for better compatibility with Python’s pip binary packages installations.
Core: Respect all Scalr Agent configuration options under the SCALR_AGENT_* environment variable prefix, in addition to the current SCALR_* prefix, except for SCALR_URL. This change improves configuration consistency by distinguishing Scalr platform options (SCALR_*) from Scalr Agent options (SCALR_AGENT_*). The old prefix remains backward compatible, and no changes to existing configuration are required.
Core: Upgraded Python Huey worker from 2.3.0 to 2.5.3.
Core: Increased the filesystem polling timeout for execution loop file polling from 5 to 15 seconds to improve behavior on NFS filesystems with metadata attribute caching.
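
The garbage collection rule for stale run directories described above (strictly older than 24 hours by st_mtime, with no API calls) can be sketched as follows. The function names and the flat scan of a single runs directory are assumptions for illustration; the agent's real collector may differ.

```python
import os
import shutil
import time
from typing import List, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, per the changelog entry


def find_stale_run_dirs(runs_root: str, now: Optional[float] = None) -> List[str]:
    """Return run directories whose st_mtime is strictly older than 24 hours."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(runs_root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > MAX_AGE_SECONDS:
                stale.append(entry.path)
    return sorted(stale)


def collect_garbage(runs_root: str) -> List[str]:
    """Delete stale run directories and return the paths that were selected."""
    removed = find_stale_run_dirs(runs_root)
    for path in removed:
        shutil.rmtree(path, ignore_errors=True)
    return removed
```

Because the decision depends only on the modification time, a directory that is still being written to is not selected.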

Fixed

Local Driver: Resolved the git must be available and on the PATH error when installing OpenTofu/Terraform modules via Git (fixed by adding the scalr/agent-runner image, which includes Git preinstalled).
Local Driver: Resolved the Error: failed to get shared config profile, default error when using AWS Provider Configurations with the Local Driver. This issue occurred because the AWS configuration path was inconsistent between the plan and apply phases (fixed by the filesystem directory layout changes).
Core: Fixed /usr/bin/git exited with 128: fatal: detected dubious ownership in repository at ... when cloning Git modules on NFS during CLI-driven runs.

[0.47.0] - 2025-05-30

Added

Introduced the $SCALR_AGENT_ENV environment variable to customize the Run environment through Workspace Hook scripts.

Changes

Failed tofu/terraform show commands will log their stderr output to the Run Console.
Local Driver: Shell environment variables from the host are propagated to the Scalr Run shell environment. The Scalr Agent configuration variables (SCALR_*) are an exception.
Upgraded Python version from 3.12.7 to 3.13.3.
Upgraded various Python libraries to their latest versions (gevent, greenlet, blinker, orjson, structlog, and some minor sub-dependencies).

Fixed

Fixed compatibility issues on NFS-backed storage. Previously, runs could get stuck during the initialization phase on Kubernetes agents using storages like AWS EFS. The container task entrypoint will now avoid using filesystem FIFO pipes to maintain compatibility with network filesystems, which have limited support for these features. The solution has been tested on NFSv4.1.
Fixed an issue where the formatting check integration could fail with exit code 1 and report an empty list of improperly formatted files.

[0.46.0] - 2025-05-20

Fixed

Fixed a random issue where the command failed with unexpected exit code 93.
Fixed a random issue where the command did not complete within 10 seconds.
Kubernetes Driver: Fixed an issue where Agent Workers could stop accepting incoming agent tasks due to problems tracking the resourceVersion while listening to the Pods event stream. The resourceVersion is now reset to 0 after 5 unsuccessful attempts to reconnect to the event stream.
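
The resourceVersion reset described above can be sketched as a reconnect loop. Everything except the threshold of 5 attempts and the reset to 0 is hypothetical: open_stream stands in for whatever opens the Pods event stream, events are simplified to flat dicts with a "resourceVersion" key, and max_rounds exists only so the sketch terminates.

```python
from typing import Callable, Iterable, Optional

MAX_RECONNECT_ATTEMPTS = 5  # from the changelog entry


def watch_pods(
    open_stream: Callable[[str], Iterable[dict]],
    handle_event: Callable[[dict], None],
    resource_version: str = "0",
    max_rounds: Optional[int] = None,
) -> str:
    """Consume an event stream, resetting resourceVersion after repeated failures.

    `open_stream(resource_version)` is a hypothetical callable that yields event
    dicts and raises ConnectionError when the stream cannot be (re)opened.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        try:
            for event in open_stream(resource_version):
                handle_event(event)
                resource_version = event["resourceVersion"]
                failures = 0  # any delivered event counts as a healthy stream
        except ConnectionError:
            failures += 1
            if failures >= MAX_RECONNECT_ATTEMPTS:
                resource_version = "0"  # drop the stale version and start over
                failures = 0
    return resource_version
```

Resetting to 0 trades a possible replay of already-seen events for the ability to resume listening at all.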

[0.45.0] - 2025-05-15

Added

Implemented the local driver. The Scalr Agent can now run tasks locally without isolation, removing the dependency on a Docker daemon or Kubernetes API. When the local driver is enabled, tasks are executed directly in the same environment the agent is running in. To enable it, start the agent with the --local flag or set the SCALR_DRIVER=local configuration option. This local driver is useful when running agents in environments that don’t require isolation themselves — such as serverless platforms (AWS Fargate, Cloud Run, Azure Container Apps, etc.) — or when you don’t want to grant agents access to the Docker socket and prefer to manage orchestration independently. The local driver is best used with Single Mode or by setting SCALR_AGENT_CONCURRENCY to 1 to ensure that only one run stage is executed at a time in such an environment.
Improved handling of the --url/SCALR_URL configuration. For agent pool tokens generated after Scalr version 8.162.0 this configuration is optional and the Scalr API endpoint can now be auto-extracted from the token payload. Explicitly setting the URL is still recommended for long-lived services to avoid issues if the account is renamed.
Introduced the runtime environment variable SCALR_RUN_CONTENT_ROOT, which contains the absolute path to the root of the configuration being executed.

Changes

Removed Docker socket requirement from VCS agents.
Multi-version OPA and Checkov tasks now require Software Binaries to be enabled.
In single mode, the agent will wait for a new incoming task instead of exiting immediately if the task queue is empty.
Modified exec-loop logging format and communication pipelines with the container entrypoint. The entrypoint now shares logs and exit codes via the filesystem, rather than relying on Docker or Kubernetes logging drivers.
Added detailed logging and tracing coverage for Checkov, Infracost, and OPA policy tasks.
Performance improvements for OPA policy tasks, the exec-loop entrypoint shell script, and log streaming of run stages.
The Policy Checks and Checkov run stages, which rely on multiple different versions of software, now require Software Binaries to be enabled.

[0.44.4] - 2025-05-06

Fixed

Fixed issues with filesystem permissions while starting the run container on the Docker backend.
Fixed an issue where Terraform variable files were not loaded during the init phase.

[0.44.2] - 2025-05-02

Fixed

Fixed an issue where memory and CPU limits (SCALR_CONTAINER_TASK_MEM_LIMIT and SCALR_CONTAINER_TASK_CPU_LIMIT) from the agent configuration were ignored and instead taken from the Scalr billing plan for Docker-based agents. This regression was introduced in version 0.42.0.

[0.44.0] - 2025-04-25

Changes

Bumped Docker version to 28.1.1.

Removed

Removed the workflow that attempted to retry failed run stages after unexpected agent shutdowns. This workflow was non-functional and caused multiple issues. Failed stages must now be restarted manually.

[0.43.1] - 2025-04-23

Fixed

Issue where the Scalr Agent could get stuck during provider downloads. Added stricter timeouts for fetching the provider registry and downloading provider artifacts.

Added

Introduced SCALR_PROVIDER_CACHE_INSTALL_TIMEOUT_SEC config option to control the maximum time (in seconds) allowed for downloading and installing provider plugins outside the Terraform init phase.
Introduced SCALR_PROVIDER_CACHE_CONCURRENCY config option to configure the number of concurrent threads used for provider installation. This value is global across all Scalr Agent runs.

[0.43.0] - 2025-04-18

Added

Added support for the arm64 architecture.
Added OpenTelemetry tracing (disabled by default). Use the SCALR_OTLP_ENDPOINT environment variable to set the host:port address of your OpenTelemetry collector — a gRPC server running an OTLP collector. Use SCALR_OTLP_TRACES_ENABLED to enable tracing for plan and apply tasks.

Changes

Change the container log format from JSON to plain text to simplify the logging pipeline and avoid issues related to double encoding messages into JSON. From now on, containers running Terraform/OpenTofu operations will log data in plain text.
Add detailed log timestamps. All initialization phases will now log more internal workflow details, including execution time.
Reworked provider cache storage and removed tofu/terraform providers lock from the pipeline. This command was originally introduced to ensure consistency of h1 checksums in cases where the lockfile doesn’t include the platform-specific checksum for the Scalr agent’s target platform, but it significantly impacted performance by forcing Terraform to re-download all providers every time. The Scalr agent now ensures h1 consistency independently by implementing the Provider Registry Protocol and storing providers in the cache with both h1 and zh checksums. If an HCL lockfile is present in the configuration, the agent will use it to download providers concurrently (across all runs) into the global cache, storing both zh and h1 checksums. The download is limited to 10 concurrent processes per Scalr agent service. If the provider from the HCL lockfile is missing the h1 checksum for the target platform, it will be added after validation using the zh (zip archive) checksum from the providers cache metadata. If the dependency lockfile is missing, or some providers are not listed in it, the remaining providers will be downloaded by tofu/terraform init and cached after the init step. The Provider Cache directory has changed from $SCALR_AGENT_DATA_DIR/plugins to $SCALR_AGENT_DATA_DIR/providers. The old directory will be removed by the Provider Cache garbage collector.
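
For readers unfamiliar with the two checksum kinds mentioned above, the sketch below shows how they are commonly computed. It assumes that zh: is the hex SHA-256 of the provider zip archive and that h1: follows the Hash1 scheme from Go's golang.org/x/mod/sumdb/dirhash package applied to the unpacked package contents. The function names and the in-memory file mapping are illustrative, and this is not the agent's implementation.

```python
import base64
import hashlib
from typing import Mapping


def zh_checksum(zip_bytes: bytes) -> str:
    """`zh:` entry: hex SHA-256 of the provider zip archive."""
    return "zh:" + hashlib.sha256(zip_bytes).hexdigest()


def h1_checksum(files: Mapping[str, bytes]) -> str:
    """`h1:` entry: base64 SHA-256 over a sorted manifest of per-file SHA-256 sums.

    `files` maps each path inside the unpacked package to its contents.
    """
    manifest = hashlib.sha256()
    for name in sorted(files):
        file_sum = hashlib.sha256(files[name]).hexdigest()
        # One manifest line per file: "<hex sum>  <name>\n" (two spaces).
        manifest.update(f"{file_sum}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(manifest.digest()).decode("ascii")
```

Since h1 is derived from the unpacked contents rather than the archive, it can be computed after a download has been validated against the zh value, which is the order of operations the entry describes.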

Removed

Support for the Terraform provider cache with Terraform versions before 0.14. The provider cache now only works with the HCL dependency lockfile.

Fixed

Issues with Terraform working directories that contain spaces in the path.

[0.42.0] - 2025-03-28

Added

Added support for the registry hooks execution.
Introduced a golden image approach to minimize excessive image polling. Instead of pulling multiple images, we now pull a single runner image and bind it into binaries for multiple software types (e.g., Terragrunt, OpenTofu, Checkov, OPA). Since this feature is currently in beta, it is disabled by default. The Scalr admin can enable it by setting the account option settings.agent_software_binaries_enabled to 1. Important: Customers using private registries with the container_task_image_registry option must upload the runner (golden) image to their private registry. The runner image is available at scalr/runner.

[0.41.0] - 2025-03-21

Changes

The backup HTTP polling acquisition task interval in Private Relay mode changed from 40 seconds to 15 seconds. This helps the agent acquire tasks with minimal delays when the Public Relay fails to deliver a message to the Private Relay. Once the Public Relay stabilizes, the timeout will be increased again.

[0.40.0] - 2025-03-07

Fixed

Checkov: Running multiple instances of Checkov for a single run led to an error.

[0.39.0] - 2025-02-21

[0.38.1] - 2025-02-14

Fixed

Implemented automatic refresh for short-lived Kubernetes service account tokens mounted via a file path (e.g., /var/run/secrets/kubernetes.io/serviceaccount/token). This change resolves an issue where services would encounter authentication errors when accessing the Kubernetes API after the token expired (typically 1 hour).
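
One way to picture the token refresh described above is a small holder that re-reads the token file whenever its modification time changes. This is a minimal sketch under that assumption; the class name FileToken and the mtime-based trigger are hypothetical, and the agent may use a different strategy (for example, a time-based reload).

```python
import os


class FileToken:
    """Serve a token from a file, re-reading it whenever the file changes."""

    def __init__(self, path: str):
        self._path = path
        self._mtime_ns = None
        self._value = ""

    def get(self) -> str:
        # os.stat follows symlinks, so a rotated symlink target is also noticed.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self._path, "r", encoding="utf-8") as f:
                self._value = f.read().strip()
            self._mtime_ns = mtime_ns
        return self._value
```

Callers would request the token through get() before each API call instead of caching the string for the lifetime of the process.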

[0.38.0] - 2025-02-07

Added

Added single mode, which can be activated with the --single option. In single mode, the agent runs only one task and then terminates. When there is no task to acquire, it terminates with exit code 2 (requires Scalr >= 8.174.0).
Added support for aliases in the Google OIDC provider (requires Scalr >= 8.171.0).
Added support for the TERRAGRUNT_INCLUDE_EXTERNAL_DEPENDENCIES environment variable (requires Scalr >= 8.174.0).

Fixed

Environment variables declared in pre-plan are now available in plan and post-plan stages. Handling of env vars is now consistent with Scalr hosted worker.

[0.37.1] - 2025-01-16

Added

Added SCALR_CONTAINER_TASK_PIDS_LIMIT option to control the maximum number of process IDs (PIDs) a container can spawn.
Bump default PIDs limit from 4096 to 8192.

[0.37.0] - 2025-01-15

[0.36.0] - 2025-01-10

Fixed

Terragrunt: run-all did not work when the working directory had no tf or hcl files.

[0.35.0] - 2025-01-03

Added

Support for the terragrunt run-all command.

Fixed

Kubernetes Agent: pod not being deleted after the task is completed.
KeyError when sending a relay command result.

[0.34.0] - 2024-12-20

Fixed

Policy checks with rules that evaluate to a non-string result caused the task to fail.

[0.33.0] - 2024-12-16

Added

The ca_cert configuration option to configure a custom SSL certificate bundle.
The HTTP proxy is configured using the HTTP_PROXY/HTTPS_PROXY environment variables.

Changed

Updated container_task_image_registry behavior. If the repository path ends with a trailing slash, the original repository will be included in the resulting image path. Example:
- mirror.io/myproject combined with scalr/opentofu:1.0.0 → mirror.io/myproject/opentofu:1.0.0
- mirror.io/myproject/ combined with scalr/opentofu:1.0.0 → mirror.io/myproject/scalr/opentofu:1.0.0
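
The two examples above can be expressed as a small path-rewriting function. The first two branches follow the documented examples; the handling of a bare registry host without a repository is an assumption, and the name rewrite_image is hypothetical.

```python
def rewrite_image(registry: str, image: str) -> str:
    """Map a default image (e.g. "scalr/opentofu:1.0.0") onto a custom registry."""
    if registry.endswith("/"):
        # Trailing slash: keep the original repository in the resulting path.
        return registry + image
    if "/" in registry:
        # The registry includes a repository, which replaces the original one.
        name = image.rsplit("/", 1)[-1]
        return f"{registry}/{name}"
    # Assumption: a bare registry host keeps the original repository.
    return f"{registry}/{image}"
```

For instance, rewrite_image("mirror.io/myproject/", "scalr/opentofu:1.0.0") keeps the scalr repository segment, while the same call without the trailing slash drops it.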

[0.32.0] - 2024-12-12

Added

Support the Agent Pool relay feature to enable HTTP Relay for Scalr-to-Agent communication. This feature maintains a persistent channel via an HTTP long-polling connection, allowing agent tasks and cancellation commands to be delivered almost instantly. HTTP Relay was already used for proxying VCS agent requests and has now been extended to the new command interface for all Agents. The feature is controlled by the Scalr Account settings and will be rolled out to all agents gradually.
Support for Checkov tasks.

Fixed

Fix iptables link for the DEB/RPM packages.

[0.31.0] - 2024-11-29

Changed

Include iptables in the DEB/RPM packages to fix package installation on distributions without system-provided iptables.
Updated the private pool idle timeout from 20 to 40 seconds. Now the agent will reestablish the pool connection every 40 seconds.
Updated container_task_image_registry behavior. If the repository path ends with a trailing slash, the original repository will be included in the resulting image path. Example:
- mirror.io/myproject combined with scalr/opentofu:1.0.0 → mirror.io/myproject/opentofu:1.0.0
- mirror.io/myproject/ combined with scalr/opentofu:1.0.0 → mirror.io/myproject/scalr/opentofu:1.0.0

Fixed

Improve handling of “no space left on device” errors raised by the local-exec TF provider.

[0.30.0] - 2024-11-27

Added

The container_task_image_registry option to enforce the use of a custom image registry to pull all container task images. All images must be preemptively pushed to this registry for the agent to work with this option. The registry path may include a repository to be replaced. Example: 'mirror.io' or 'mirror.io/myproject'.

[0.29.0] - 2024-11-20

Added

Support for SSH key usage during job execution on agents (requires Scalr >= 8.154.0)

Fixed

Kubernetes Agent: unable to create a policy pod with a large payload (requires Scalr >= 8.154.0)

[0.28.0] - 2024-11-12

Changed

Updated Python requirements to the latest versions (pip, gevent, greenlet, requests, cryptography, orjson, pydantic).

[0.27.0] - 2024-11-05

Changed

Kubernetes Agent: Add acknowledgment mechanism for delegating Pods from the controller to a worker.
Remove unused libraries from DEB/RPM packages (e.g., git, ruby, curl, cmake, pkg-config, and some minor libraries) to minimize package size and address security-related issues.
Remove unused libraries from the Docker package (e.g., git, ca-certificates, curl, openssl, python3.11, and various minor libraries) to minimize package size and address security-related issues.
Updated base Docker image from Debian Bookworm to Debian Trixie.
Bump Python version to 3.12.7. The OS-provided package has been replaced with a standalone build. The Python distribution’s path has changed from /usr/local/lib/python3.12 to /usr/lib/python3.12.

Fixed

Kubernetes Agent: Fixed "Container not found" error during the Cost Estimate Run stage.

[0.26.1] - 2024-10-25

[0.26.0] - 2024-10-25

Added

Support OPA policies with common functions

Removed

Remove support for Ubuntu 18.04