Installation

The Scalr Agent can be run as a Docker service, a Kubernetes deployment, or as a containerized service on platforms like AWS Fargate or Google Cloud Run.

Before installing an agent, you must first create an Agent Pool to connect it to, and obtain a Scalr agent token by registering a new agent on the Agent Pool page.

Docker

To deploy a run agent, use the following command:

docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/scalr-agent:/var/lib/scalr-agent \
  -e SCALR_AGENT_TOKEN={token} \
  --rm -it --pull=always --name=scalr-agent scalr/agent:latest run

Mount the agent data directory (SCALR_AGENT_DATA_DIR, /var/lib/scalr-agent in the command above) for data persistence, and provide the Docker socket to enable multi-concurrency with the Docker driver.

To deploy a VCS agent, provide only the Scalr agent token:

docker run \
  -e SCALR_AGENT_TOKEN={token} \
  --rm -it --pull=always --name=scalr-agent scalr/agent:latest run
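
The difference between the two commands can be sketched as a small Python helper that assembles the `docker run` arguments. The function name `build_agent_command` and the `agent_type` values are illustrative and not part of the Scalr tooling; the image, flags, and paths are copied from the commands above.

```python
from typing import List


def build_agent_command(token: str, agent_type: str = "run") -> List[str]:
    """Assemble the `docker run` argument list for a Scalr agent.

    agent_type is a hypothetical switch: "run" adds the Docker socket and
    data-directory mounts, "vcs" passes only the token.
    """
    if agent_type not in ("run", "vcs"):
        raise ValueError(f"unknown agent type: {agent_type!r}")

    command = ["docker", "run"]
    if agent_type == "run":
        # Run agents need the Docker socket and a persistent data directory
        command += [
            "-v", "/var/run/docker.sock:/var/run/docker.sock",
            "-v", "/var/lib/scalr-agent:/var/lib/scalr-agent",
        ]
    command += [
        "-e", f"SCALR_AGENT_TOKEN={token}",
        "--rm", "-it", "--pull=always", "--name=scalr-agent",
        "scalr/agent:latest", "run",
    ]
    return command
```

Passing the resulting list to `subprocess.run(...)` instead of a shell string avoids quoting problems if the token contains special characters.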

You can also use Docker Compose to run agents as a service. Below is an example docker-compose.yml file for a run agent:

version: "3.8"

services:
  scalr-agent:
    image: scalr/agent:latest
    container_name: scalr-agent
    environment:
      - SCALR_AGENT_TOKEN=${SCALR_AGENT_TOKEN}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/scalr-agent:/var/lib/scalr-agent
    command: run
    pull_policy: always

Save this content as docker-compose.yml and run the following command from the same directory:

docker-compose up -d

Kubernetes

The agent can be deployed onto a Kubernetes cluster and has multiple deployment modes. The Helm charts are available at https://github.com/Scalr/agent-helm

The first mode deploys the Scalr Agent on Kubernetes with the local driver. It is best suited for simple deployments and VCS agents.

The agent-local chart deploys the Scalr Agent as a single deployment with ephemeral storage for provider and binary caching. Storage can optionally be upgraded to persistent volumes to maintain cache persistence across pod restarts.

The chart uses the local Scalr Agent driver, where all operations are executed in local subprocesses. As a result, it uses the scalr/agent image as the base environment for runs instead of the scalr/runner image.

The concurrency of each agent instance is limited to 1. To scale concurrency, the recommended approach is to increase the replicaCount.
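
Because each replica handles one run at a time, sizing the deployment is a one-to-one mapping from target concurrency to replicaCount. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and only the `replicaCount` value name comes from the text above.

```python
from typing import Dict, List


def values_for_concurrency(concurrent_runs: int) -> Dict[str, int]:
    """One replica handles one run at a time, so replicas equal target runs."""
    if concurrent_runs < 1:
        raise ValueError("concurrent_runs must be at least 1")
    return {"replicaCount": concurrent_runs}


def as_set_flags(values: Dict[str, int]) -> List[str]:
    """Render overrides as `--set key=value` arguments for the Helm CLI."""
    flags: List[str] = []
    for key, value in values.items():
        flags += ["--set", f"{key}={value}"]
    return flags
```

For example, `as_set_flags(values_for_concurrency(4))` yields the arguments to append to a Helm install or upgrade command for four concurrent runs.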

Pros

  • Simple to deploy.
  • Scalr Agent service doesn’t require permissions to access the Kubernetes API.
  • Includes Provider Cache and Binary Cache by default.

Cons

  • Doesn’t support autoscaling out of the box. You need to manually increase or decrease the number of replicas or configure the Horizontal Pod Autoscaler.
  • Not cost-efficient for bursty workloads — e.g., deployments with a high number of runs during short periods and low activity otherwise, as resources remain allocated even when idle.
  • Low multi-tenant isolation. A sequence of Scalr runs shares the same container and data storage. This chart should only be used within a single RBAC perimeter and is unsuitable for untrusted environments.

Installation instructions are provided with the agent-local chart in the Helm charts repository linked above.

The second mode deploys the Scalr Agent on Kubernetes with the kubernetes driver in controller/worker mode.

Best suited for large-scale deployments and environments with strict multi-tenancy requirements. Requires more complex configuration and a separate node pool.

The Agent deploys as two components: a controller and a worker. The controller consumes jobs from Scalr and schedules pods, while the worker supervises the jobs.

The agent worker is a DaemonSet that scales up and down with the cluster, registering and deregistering agents from the pool. When an Agent controller receives a job from Scalr, it schedules a pod for execution. The Kubernetes workload scheduler assigns the pod to a specific node, where the Agent worker running on that node oversees the execution of the job. With the Kubernetes autoscaler enabled, Scalr Run workloads can scale linearly with the load.

Pros

  • Cost-efficient for bursty workloads (e.g., deployments with a high number of Runs during short periods and low activity otherwise), as resources are allocated on demand for each Scalr Run.
  • High multi-tenant isolation, as each Scalr Run always has its own newly provisioned environment.
  • Better observability, as each Scalr Run is tied to its own unique Pod.

Cons

  • Requires access to the Kubernetes API to launch new Pods.
  • Requires a ReadWriteMany Persistent Volume configuration for provider/binary caching. This type of volume is generally vendor-specific and not widely available across all cloud providers.
  • May spawn too many services unless it has its own dedicated node pool.
  • Relies on a hostPath volume.

Installation instructions are provided with the corresponding chart in the Helm charts repository linked above.

Serverless

For serverless workloads, deploy the scalr/agent-runner:latest image in your containerized environment as a persistent service using the SCALR_AGENT_DRIVER=local configuration option.

Serverless agents allow users to create agents on demand via webhook triggers, eliminating the need for persistent compute resources. When a run is triggered, Scalr calls your API Gateway to spin up a container task (e.g., on Fargate) for agent execution. Configure your agent pool with an API Gateway URL and optional custom headers, set up your serverless infrastructure (API Gateway → Lambda → Fargate), then enable serverless execution for the agent pool.

Before getting started in Scalr, create an API gateway in your cloud of choice, as its URL is needed to set up the agent pool in Scalr. Once the gateway is created, go to the agent pools page in Scalr and create the agent pool with the URL and optional headers.

Once the pool is created, generate a token that will be used in the gateway for Scalr to authenticate to it.

The token can now be added to the gateway for authentication and the remaining components needed for the serverless agent can be set up.

Agents appear in Scalr only while they are in use; at other times the agents page does not list them.

Please see the example below that shows how to set this up in AWS with API Gateway, Lambda, and Fargate.

Example

This guide walks through building a serverless AWS architecture that:

  • Uses API Gateway as the entry point.
  • Triggers an AWS Lambda function.
  • Lambda starts an ECS Fargate Task.
  • ECS task fetches a Scalr token from AWS Secrets Manager.
  • ECS task connects to Scalr API to pull and execute a run.

This approach enables a lightweight, decoupled, secure automation flow aligned with AWS and Scalr best practices.

Architecture Flow

[API Gateway] ---> [Lambda] ---> [ECS Fargate Task] ---> [Secrets Manager] ---> [Scalr API]

Prerequisites

AWS Account Requirements

  • AWS Services: Ensure the following services are available in your region:

    • Amazon API Gateway
    • AWS Lambda
    • Amazon ECS (Elastic Container Service)
    • AWS Secrets Manager
    • Amazon CloudWatch (for logging)
    • AWS IAM (Identity and Access Management)
  • Permissions: Your AWS user/role must have permissions to:

    • Create and manage IAM roles and policies
    • Create and configure ECS clusters and task definitions
    • Create and manage Lambda functions
    • Create API Gateway APIs
    • Create and access Secrets Manager secrets
    • Create CloudWatch log groups
    • Manage VPC resources (if creating custom VPC)
  • Network Requirements:

    • VPC with at least 2 subnets in different Availability Zones
    • Internet connectivity (either public subnets or NAT Gateway for private subnets)
    • Security groups configured for ECS tasks

Scalr Requirements

  • Scalr Token: Valid Scalr Agent Pool token. Obtain it from Scalr console: Settings > Agent Pools > Create/View Pool

Step 1: Store Scalr Token in AWS Secrets Manager

  1. Go to AWS Console > Secrets Manager

  2. Click Store a new secret

  3. Select Other type of secrets

  4. Enter key-value pair:

    • Key: token
    • Value: <your-scalr-token>
  5. Click Next, name the secret:

    • Name: scalr/api/token
  6. Skip rotation unless needed, then click Store

  7. Important: Note down the complete secret ARN from the secret details page. It will look like:

    arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf
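
The same secret can be created programmatically. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names are illustrative. The client is passed in as a parameter so the request can be inspected without calling AWS.

```python
import json


def secret_request(token: str, name: str = "scalr/api/token") -> dict:
    """Build the arguments for the Secrets Manager CreateSecret call."""
    return {
        "Name": name,
        # Key-value secrets are stored as a JSON object; the key is "token"
        "SecretString": json.dumps({"token": token}),
    }


def store_token(secrets_client, token: str) -> str:
    """Create the secret and return its complete ARN, including the random suffix."""
    response = secrets_client.create_secret(**secret_request(token))
    return response["ARN"]
```

A call such as `store_token(boto3.client("secretsmanager"), "<your-scalr-token>")` returns the ARN to use in the later steps.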

Step 2: Create IAM Roles and Policies

Note: Now that you have created the secret, you can use its exact ARN in the policy documents below.

2.1: Create ECS Task Role

This role allows the ECS task to access AWS Secrets Manager to retrieve the Scalr token.

Create ECS Task Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace the Resource ARN with your actual secret ARN from Step 1:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrECSTaskPolicy

  4. Click Create policy

Create ECS Task Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Elastic Container Service > Elastic Container Service Task
  4. Click Next
  5. Search and select the policy created above: ScalrECSTaskPolicy
  6. Click Next, enter role name: ScalrECSTaskRole
  7. Click Create role
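
For reference, the console steps in 2.1 correspond roughly to the following boto3 calls. This is a sketch under the assumption that boto3 is available and the caller may manage IAM; the function names are illustrative, while the policy and role names match the ones used above.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Policy document that allows reading a single secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": secret_arn,
        }],
    }


def ecs_task_trust_policy() -> dict:
    """Trust policy that lets ECS tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def create_task_role(iam_client, secret_arn: str) -> str:
    """Create ScalrECSTaskPolicy and ScalrECSTaskRole, then attach them."""
    policy = iam_client.create_policy(
        PolicyName="ScalrECSTaskPolicy",
        PolicyDocument=json.dumps(secret_read_policy(secret_arn)),
    )
    role = iam_client.create_role(
        RoleName="ScalrECSTaskRole",
        AssumeRolePolicyDocument=json.dumps(ecs_task_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName="ScalrECSTaskRole",
        PolicyArn=policy["Policy"]["Arn"],
    )
    return role["Role"]["Arn"]
```

The trust policy is what the console generates when the Elastic Container Service Task use case is selected.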

2.2: Create ECS Task Execution Role

This role allows ECS to pull container images, write logs to CloudWatch, and access secrets.

Create ECS Task Execution Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace the secret ARN with your actual secret ARN from Step 1:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrECSExecutionPolicy

  4. Click Create policy

Create ECS Task Execution Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Elastic Container Service > Elastic Container Service Task
  4. Click Next
  5. Search and select the policy created above: ScalrECSExecutionPolicy
  6. Click Next, enter role name: ScalrECSExecutionRole
  7. Click Create role

Step 3: Create CloudWatch Log Group

Before creating the ECS cluster, create a CloudWatch log group for the ECS tasks:

  1. Go to CloudWatch > Log groups
  2. Click Create log group
  3. Enter log group name: /ecs/scalr-agent-pool-cluster
  4. Set retention period as needed (e.g., 7 days)
  5. Click Create
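
A scripted equivalent of this step might look like the sketch below. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in; the function name is illustrative. Creating a group that already exists raises an error, so the sketch tolerates that case and then applies the retention setting.

```python
def ensure_log_group(logs_client,
                     name: str = "/ecs/scalr-agent-pool-cluster",
                     retention_days: int = 7) -> None:
    """Create the log group if needed and set its retention period."""
    try:
        logs_client.create_log_group(logGroupName=name)
    except logs_client.exceptions.ResourceAlreadyExistsException:
        # The group was created earlier; only the retention needs updating
        pass
    logs_client.put_retention_policy(
        logGroupName=name,
        retentionInDays=retention_days,
    )
```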

Step 4: Create ECS Fargate Objects

4.1: Create ECS Cluster

  1. Go to ECS > Clusters
  2. Click Create cluster
  3. Enter cluster name: ScalrServerless
  4. Infrastructure: AWS Fargate (serverless)
  5. Click Create

4.2: Create Task Definition

  1. Go to ECS > Task Definitions > Create new task definition
  2. Choose Create new task definition with JSON
  3. Replace the default JSON with the following configuration. ⚠️ Replace placeholders with your actual values, including the secret ARN from Step 1:
{
    "family": "scalr-agent-run",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ScalrECSTaskRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ScalrECSExecutionRole",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "2048",
    "memory": "4096",
    "containerDefinitions": [
        {
            "name": "scalr-agent-run",
            "image": "scalr/agent-runner:latest",
            "essential": true,
            "environment": [
                {
                    "name": "SCALR_SINGLE",
                    "value": "true"
                },
                {
                    "name": "SCALR_DRIVER",
                    "value": "local"
                }
            ],
            "secrets": [
                {
                    "name": "SCALR_TOKEN",
                    "valueFrom": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf:token::"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/scalr-agent-pool-cluster",
                    "awslogs-region": "<region>",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "stopTimeout": 120
        }
    ]
}
  4. Click Create to save the task definition
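
The `valueFrom` field of an ECS secret takes the complete secret ARN, optionally followed by a JSON key, version stage, and version ID. Wildcards are IAM policy syntax and do not apply here. The sketch below builds such a reference and registers a task definition from a JSON file; the helper names and the file path argument are hypothetical, and boto3 is assumed for the client.

```python
import json


def secret_value_from(secret_arn: str, json_key: str) -> str:
    """Return an ECS `valueFrom` reference to one key of a JSON secret.

    Format: <secret-arn>:<json-key>:<version-stage>:<version-id>. Leaving the
    last two fields empty selects the current version.
    """
    if "*" in secret_arn:
        raise ValueError("valueFrom needs the complete secret ARN, not a wildcard pattern")
    return f"{secret_arn}:{json_key}::"


def register_from_file(ecs_client, path: str) -> str:
    """Register the task definition stored as JSON at `path` and return its ARN."""
    with open(path) as handle:
        definition = json.load(handle)
    # The JSON keys (family, containerDefinitions, ...) match the API parameters
    response = ecs_client.register_task_definition(**definition)
    return response["taskDefinition"]["taskDefinitionArn"]
```

With the secret from Step 1, `secret_value_from(secret_arn, "token")` produces the value for the SCALR_TOKEN entry.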

Step 5: Configure Networking

5.1: Create or Identify VPC Resources

You'll need the following networking components for ECS Fargate tasks:

Option A: Use Default VPC (Simplest)
  1. Go to VPC Console
  2. Note down your default VPC ID
  3. Note down at least 2 subnet IDs from different availability zones
  4. Note down the default security group ID

Option B: Create Custom VPC (Recommended for Production)
  1. Go to VPC > Create VPC
  2. Choose VPC and more for guided setup
  3. Configure:
    • Name: scalr-serverless-vpc
    • IPv4 CIDR: 10.0.0.0/16
    • Availability Zones: 2
    • Public subnets: 2
    • Private subnets: 2 (if you want NAT Gateway)
    • NAT gateways: 1 (optional, for private subnets)
  4. Click Create VPC
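
Choosing subnets from different Availability Zones can be automated. The sketch below works on the subnet records returned by the EC2 DescribeSubnets API, which include `SubnetId` and `AvailabilityZone`; the function name is illustrative.

```python
from typing import Dict, List


def pick_subnets_across_azs(subnets: List[Dict[str, str]], count: int = 2) -> List[str]:
    """Return `count` subnet IDs, each from a different Availability Zone."""
    chosen: Dict[str, str] = {}
    for subnet in subnets:
        # Keep the first subnet seen in each Availability Zone
        chosen.setdefault(subnet["AvailabilityZone"], subnet["SubnetId"])
    if len(chosen) < count:
        raise ValueError(
            f"need subnets in {count} Availability Zones, found {len(chosen)}"
        )
    return list(chosen.values())[:count]
```

Assuming boto3, the input could come from `boto3.client("ec2").describe_subnets(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["Subnets"]`, where `vpc_id` is the VPC noted above.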

5.2: Create Security Group

  1. Go to EC2 > Security Groups > Create security group
  2. Configure:
    • Name: scalr-ecs-sg
    • Description: Security group for Scalr ECS tasks
    • VPC: Select your VPC
  3. Outbound rules: Keep default (All traffic to 0.0.0.0/0)
  4. Inbound rules: No inbound rules needed for this use case
  5. Click Create security group
  6. Note down the security group ID

Step 6: Create Lambda Function and Role

6.1: Create Lambda Execution Role

This role allows Lambda to trigger ECS tasks and pass the required ECS roles.

Create Lambda Execution Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace <region> and <account-id> with your actual values:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:<region>:<account-id>:*"
        },
        {
          "Effect": "Allow",
          "Action": "ecs:RunTask",
          "Resource": "arn:aws:ecs:<region>:<account-id>:task-definition/scalr-agent-run:*"
        },
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": [
            "arn:aws:iam::<account-id>:role/ScalrECSTaskRole",
            "arn:aws:iam::<account-id>:role/ScalrECSExecutionRole"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DeleteNetworkInterface"
          ],
          "Resource": "*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrLambdaExecutionPolicy

  4. Click Create policy

Create Lambda Execution Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Lambda
  4. Click Next
  5. Search and select the policy created above: ScalrLambdaExecutionPolicy
  6. Click Next, enter role name: ScalrLambdaExecutionRole
  7. Click Create role

6.2: Create Lambda Function

  1. Go to AWS Lambda > Create Function
  2. Choose Author from scratch
  3. Enter function name: ScalrServerless
  4. Runtime: Python 3.13
  5. Under Change default execution role, select Use an existing role
  6. Choose the role created above: ScalrLambdaExecutionRole
  7. Click Create function

Configure Lambda Function Code
  1. In the Lambda function console, scroll down to Code source

  2. Replace the default code with the following. ⚠️ Replace the subnet and security group IDs with the values from Step 5:

    import boto3
    import json
    import logging
    
    # Configure logging
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def lambda_handler(event, context):
      """
      Lambda function to trigger ECS Fargate task for Scalr agent execution
      """
      ecs_client = boto3.client('ecs')
      
      try:
          # Configure ECS task parameters
          cluster_name = 'ScalrServerless'
          task_definition = 'scalr-agent-run'
          
          # Network configuration - use values from Step 5
          subnet_ids = ['subnet-xxxxxx']  # Replace with subnet IDs from Step 5
          security_group_ids = ['sg-xxxxxx']  # Replace with security group ID from Step 5
          
          # Run the ECS task
          response = ecs_client.run_task(
              cluster=cluster_name,
              launchType='FARGATE',
              taskDefinition=task_definition,
              networkConfiguration={
                  'awsvpcConfiguration': {
                      'subnets': subnet_ids,
                      'securityGroups': security_group_ids,
                      'assignPublicIp': 'ENABLED'  # Set to DISABLED if using NAT Gateway
                  }
              }
          )
          
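          # run_task reports placement or capacity failures in the response instead of raising
          if not response.get('tasks'):
              raise RuntimeError(f"ECS returned no tasks: {response.get('failures')}")

          task_arn = response['tasks'][0]['taskArn']
          logger.info(f"ECS Task started successfully: {task_arn}")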
          
          return {
              "statusCode": 200,
              "body": json.dumps({
                  "message": "ECS Task triggered successfully",
                  "taskArn": task_arn
              })
          }
          
      except Exception as e:
          logger.error(f"Error triggering ECS task: {str(e)}")
          return {
              "statusCode": 500,
              "body": json.dumps({
                  "error": "Failed to trigger ECS task",
                  "details": str(e)
              })
          }
  3. Click Deploy to save the function
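
The ECS RunTask API can return successfully with an empty `tasks` list and the reasons listed under `failures`, for example when capacity is unavailable. A minimal sketch of interpreting the response is shown below; the function name and the 502 status code are illustrative choices, not part of the guide.

```python
from typing import Any, Dict


def summarize_run_task(response: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a RunTask response into a small result for the API caller."""
    tasks = response.get("tasks") or []
    if tasks:
        return {"statusCode": 200, "taskArn": tasks[0]["taskArn"]}
    # No task was started; collect the reasons ECS reported
    reasons = [failure.get("reason", "unknown") for failure in response.get("failures", [])]
    return {"statusCode": 502, "failures": reasons}
```

Logging the collected reasons makes launch problems visible in the Lambda log group rather than surfacing as a generic indexing error.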

Step 7: Create API Gateway

7.1: Create HTTP API

  1. Go to API Gateway > Create API
  2. Choose HTTP API and click Build
  3. Configure:
    • API name: scalr-serverless-api
    • Description: API to trigger Scalr serverless tasks
  4. Click Next

7.2: Configure Routes

  1. Method: POST
  2. Resource path: /trigger
  3. Integration target: Select your Lambda function (ScalrServerless)
  4. Click Next

7.3: Configure Stages

  1. Stage name: prod
  2. Auto-deploy: Enable
  3. Click Next, then Create

7.4: Configure API Key Authentication

  1. In the API Gateway console, go to your API (scalr-serverless-api)
  2. Click Routes in the left sidebar
  3. Select your POST /trigger route
  4. Click Edit
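  5. Under Authorization, select API Key Required: true. API keys and usage plans are available only for REST APIs in API Gateway; if this option is missing because the API was created as an HTTP API, recreate it as a REST API or protect the route with a Lambda authorizer instead.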
  6. Click Update

Create API Key
  1. Go to API Keys in the left sidebar
  2. Click Create API key
  3. Configure:
    • Name: scalr-serverless-key
    • Description: API key for Scalr serverless triggers
  4. Click Create
  5. Important: Copy the API key value - you won't be able to see it again

Create Usage Plan
  1. Go to Usage plans in the left sidebar
  2. Click Create usage plan
  3. Configure:
    • Name: scalr-serverless-plan
    • Description: Usage plan for Scalr serverless API
    • Throttling: Set limits as needed (e.g., 100 requests per second)
    • Quota: Set daily/monthly limits as needed (e.g., 1000 requests per day)
  4. Click Next
  5. Add API stage: Select your API and prod stage
  6. Click Next
  7. Add API keys: Select the API key created above
  8. Click Create

7.5: Test the Secured API

  1. Note down the Invoke URL from the API Gateway console
  2. Test using curl with the API key:
curl -X POST https://your-api-id.execute-api.region.amazonaws.com/prod/trigger \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY_HERE" \
  -d '{}'

Important: Always include the x-api-key header with your API key value.

The API should return a success response and trigger the ECS task.
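
The same test can be run from Python with only the standard library. This is a sketch: the function names are illustrative, and `invoke_url` is assumed to include the stage (for example, ending in `/prod`).

```python
import json
import urllib.request


def build_trigger_request(invoke_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST /trigger request with the API key header."""
    return urllib.request.Request(
        url=f"{invoke_url.rstrip('/')}/trigger",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def trigger(invoke_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_trigger_request(invoke_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a missing or invalid key surfaces as an exception rather than a return value.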

Security Best Practices

| Component | Practice |
|---|---|
| API Gateway | Require API keys for authentication, implement throttling and quotas |
| Secrets Manager | Store secrets as key-value or string, rotate if needed |
| IAM | Least privilege: Lambda can only run tasks; ECS task can only read token |
| Container Security | Do not store secrets in image; fetch at runtime |
| Logging | Enable CloudWatch logs for ECS, Lambda |

Example Use Case

Scalr webhook → API Gateway → Lambda → ECS → token fetch → Scalr API

This flow supports automation like:

  • Remote plan/apply runners
  • Remote policy checks
  • CI/CD integrations

Troubleshooting

Common Issues and Solutions

1. Lambda Function Issues

Error: "Task failed to start"

  • Cause: Incorrect subnet IDs or security group IDs in Lambda code
  • Solution: Verify subnet and security group IDs in Lambda function code
  • Check: Ensure subnets are in the same VPC and have internet access

Error: "Access Denied" when running ECS task

  • Cause: Lambda execution role missing permissions
  • Solution: Verify ScalrLambdaExecutionPolicy includes ecs:RunTask and iam:PassRole permissions
  • Check: Ensure task definition ARN matches the policy resource

2. ECS Task Issues

Error: "Task stopped with exit code 1"

  • Cause: Container cannot access Secrets Manager or Scalr API
  • Solution: Check CloudWatch logs at /ecs/scalr-agent-pool-cluster
  • Verify:
    • ECS task execution role has secretsmanager:GetSecretValue permission (ECS uses this role to inject the secret into the container)
    • Secret ARN format is correct in task definition
    • Container has internet access

Error: "CannotPullContainerError"

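  • Cause: ECS cannot pull the container image
  • Solution: Verify the task has outbound internet access to reach the image registry and, if the image is stored in Amazon ECR, that the ECS execution role has ECR permissions
  • Check: Ensure ScalrECSExecutionPolicy includes ECR permissions when pulling from ECR

3. Networking Issues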

Error: "Task failed to start" with networking errors

  • Cause: Subnet or security group configuration issues
  • Solution:
    • Verify subnets have internet access (public IP or NAT Gateway)
    • Check security group allows outbound traffic to internet
    • Ensure subnets are in different AZs

4. Secrets Manager Issues

Error: "Secrets Manager secret not found"

  • Cause: Incorrect secret ARN or name
  • Solution: Verify secret ARN format in task definition (the complete secret ARN from Step 1, followed by the JSON key):
    arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf:token::

5. API Gateway Issues

Error: "Forbidden" or "Missing Authentication Token"

  • Cause: Missing or invalid API key
  • Solution: Ensure you're including the x-api-key header with a valid API key
  • Check: Verify API key is associated with the usage plan and stage

Error: "Internal Server Error" from API

  • Cause: Lambda function error
  • Solution: Check Lambda function logs in CloudWatch
  • Verify: Lambda function has correct permissions and configuration

Debugging Steps

  1. Check CloudWatch Logs:

    • Lambda logs: /aws/lambda/ScalrServerless
    • ECS logs: /ecs/scalr-agent-pool-cluster
  2. Verify IAM Permissions:

    • Use AWS CLI to confirm which identity your credentials resolve to: aws sts get-caller-identity
    • Check role trust relationships and policies
  3. Test Components Individually:

    • Test Lambda function directly from AWS Console
    • Manually run ECS task from ECS Console
    • Verify secret access from ECS task
  4. Monitor ECS Task Status:

    • Go to ECS Console > Clusters > ScalrServerless > Tasks
    • Check task status and details for error messages
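
The task status check can also be scripted. The sketch below assumes a boto3 ECS client is passed in and uses the DescribeTasks API; the function name and the shape of the returned report are illustrative.

```python
from typing import Any, Dict, List


def explain_tasks(ecs_client, task_arns: List[str],
                  cluster: str = "ScalrServerless") -> List[Dict[str, Any]]:
    """Collect status, stop reason, and container exit codes for each task."""
    response = ecs_client.describe_tasks(cluster=cluster, tasks=task_arns)
    report = []
    for task in response.get("tasks", []):
        report.append({
            "taskArn": task["taskArn"],
            "lastStatus": task.get("lastStatus"),
            # stoppedReason is only present once the task has stopped
            "stoppedReason": task.get("stoppedReason"),
            "exitCodes": {
                container["name"]: container.get("exitCode")
                for container in task.get("containers", [])
            },
        })
    return report
```

The task ARN returned by the Lambda function in Step 6 can be passed directly as an element of `task_arns`.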

Next Steps

Enhancements

  • Error Handling: Add retry logic to Lambda in case of ECS launch failures
  • Monitoring: Set up CloudWatch alarms for task failures
  • Logging: Configure structured logging for better debugging
  • Security: Implement API Gateway authentication (API keys, JWT, etc.)
  • Scaling: Configure auto-scaling for ECS tasks if needed
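
As one possible shape for the retry logic, the sketch below wraps any launch call in a bounded retry with exponential backoff. The helper name and default values are assumptions; in the Lambda function, `action` would be a callable that invokes `run_task`. Delays should stay well within the Lambda and API Gateway timeouts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(action: Callable[[], T], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the backoff can be replaced in tests without waiting.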

Monitoring and Observability

  • CloudWatch Dashboards: Create dashboards to monitor API calls, Lambda executions, and ECS task status
  • AWS X-Ray: Enable tracing for end-to-end request tracking
  • EventBridge: Use AWS EventBridge to capture ECS task state changes for automated workflows
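
For the EventBridge option, ECS publishes task lifecycle events with the source `aws.ecs` and detail type `ECS Task State Change`. A minimal sketch of an event pattern that matches stopped tasks in one cluster follows; the function name is illustrative.

```python
import json


def stopped_task_pattern(cluster_arn: str) -> str:
    """Return an EventBridge event pattern for stopped tasks in a cluster."""
    pattern = {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
            "clusterArn": [cluster_arn],
            "lastStatus": ["STOPPED"],
        },
    }
    return json.dumps(pattern)
```

Assuming boto3, the pattern can be passed as `EventPattern` to `boto3.client("events").put_rule(...)` together with a rule name of your choice.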

Infrastructure as Code

  • Terraform: Use the included Terraform modules for automated deployment
  • CDK/CloudFormation: Convert to Infrastructure as Code for version control and repeatability

Resource Summary

After completing this guide, you will have created:

| Resource Type | Name | Purpose |
|---|---|---|
| IAM Policy | ScalrECSTaskPolicy | Allows ECS task to access Secrets Manager |
| IAM Role | ScalrECSTaskRole | Task role for ECS container |
| IAM Policy | ScalrECSExecutionPolicy | Allows ECS to pull images and write logs |
| IAM Role | ScalrECSExecutionRole | Execution role for ECS service |
| IAM Policy | ScalrLambdaExecutionPolicy | Allows Lambda to trigger ECS tasks |
| IAM Role | ScalrLambdaExecutionRole | Execution role for Lambda function |
| CloudWatch Log Group | /ecs/scalr-agent-pool-cluster | Stores ECS task logs |
| ECS Cluster | ScalrServerless | Fargate cluster for running tasks |
| ECS Task Definition | scalr-agent-run | Defines the Scalr agent container |
| Lambda Function | ScalrServerless | Triggers ECS tasks via API calls |
| API Gateway | scalr-serverless-api | HTTP API endpoint |
| Secrets Manager Secret | scalr/api/token | Stores Scalr agent token |

For Infrastructure as Code deployment using Terraform, see the included modules in this repository.