Run Agents

Overview

By default, a Terraform or OpenTofu run in Scalr.io executes on a shared pool of resources maintained by Scalr. This suffices for the majority of use cases, but security, compliance, or network requirements sometimes call for runs to be executed on a self-hosted pool of agents. Scalr self-hosted agent pools are deployed on your infrastructure, are fully encrypted, and only need network access back to Scalr.io to report the run results. Scalr.io never needs network access back to the agent.

Run agents are not included in the Scalr.io run concurrency. Each agent has a limit of 5 concurrent runs at a time to avoid overloading it. The agent was decoupled from the Scalr.io concurrency limit to allow customers to control their own concurrency if needed.

Example: If you have five concurrent runs on the scalr.io runners and two self-hosted agents running, you will have 15 concurrent runs (5 + 2 × 5).
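The arithmetic behind this example can be sketched in Python. The function and constant names are illustrative and are not part of any Scalr API; the per-agent limit of 5 is taken from the text above.

```python
# Illustrative only: these names are not part of any Scalr API.
AGENT_RUN_LIMIT = 5  # per-agent concurrent run limit stated above


def total_concurrent_runs(scalr_io_runs: int, self_hosted_agents: int) -> int:
    """Return scalr.io concurrency plus the capacity of all self-hosted agents."""
    return scalr_io_runs + self_hosted_agents * AGENT_RUN_LIMIT


print(total_concurrent_runs(scalr_io_runs=5, self_hosted_agents=2))  # prints 15
```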

Configuring Agent Pools

Prerequisites:

  • Run and VCS agents SHOULD NOT be deployed on the same infrastructure.
  • Agents can be deployed on:
    • Rocky Linux 9.x
    • Ubuntu 20.04/22.04
    • Docker (version 18+) containers
    • Kubernetes - The helm chart for Kubernetes can be found here.
  • ARM64 is supported - Minimum agent version is 0.43.0, and the golden image must be used.
  • The agents must have HTTPS connections to *scalr.io and *docker.io.
  • Agent sizing depends on your workloads. For most workloads, 512MB of RAM and 1 CPU allocated for each run/container will be sufficient. You may need to increase the memory size for larger workloads to ensure sufficient memory. If you need more than one concurrent run, the sizing to consider is calculated with RAM x Concurrency, where RAM is the amount of RAM allocated for a container, and concurrency is how many parallel runs are required. For example, if two concurrent runs are needed, then the sizing should be 1024MB RAM. Free RAM is the main factor with agents; always ensure there is enough allocated for the OS to continue to run as well. Each agent currently has a max of five concurrent runs.
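The RAM x Concurrency rule above can be sketched as a small helper. The function name and the `os_headroom_mb` parameter are assumptions made for illustration; how much headroom the OS needs depends on your host.

```python
# Illustrative sizing helper; the names and the headroom parameter are assumptions.
def agent_ram_mb(ram_per_run_mb: int, concurrency: int, os_headroom_mb: int = 0) -> int:
    """RAM x Concurrency, plus optional headroom left free for the OS."""
    if not 1 <= concurrency <= 5:  # each agent currently has a max of five concurrent runs
        raise ValueError("concurrency must be between 1 and 5")
    return ram_per_run_mb * concurrency + os_headroom_mb


print(agent_ram_mb(512, 2))                      # prints 1024
print(agent_ram_mb(512, 5, os_headroom_mb=512))  # prints 3072
```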

Agent pools are created at the account scope and can be made the default for the entire account or linked to specific workspaces. To create a pool, go to the account scope, expand the inventory menu, and click agent pools. Select Runs, and then follow the in-app instructions.

Assigning an Agent Pool

Agent pools can be set for the entire account, assigned to environments, or selected in the workspace settings. If an agent pool is set as the default for an account, all workspaces will inherit it unless a different agent pool is explicitly set in the workspace settings. If an agent pool is assigned to an environment, only that environment will be able to use it.

To set a default for the account, go to the agent pools page and click "make default" next to the agent pool.

To assign an agent pool to one or more environments, add the environments in the environment access settings.

Agent pools can also be selected in the workspace settings, where they override the default setting.

Managing Agent Pools

Once a pool is created, you can check the status of agents in the pool:

Logs

The logs for the agents can be seen by running the following commands depending on the platform the agent is running on:

  • VM: journalctl -xe -u scalr-agent > scalr_agent.logs
  • Docker: docker logs <container-name>
  • Kubernetes: kubectl logs <POD_NAME>

Customization

The instructions below are for VM or Docker-based deployments. For Kubernetes-based agents, see the helm chart options here.

If you need to customize the agent to add software, certs, or anything else that a Terraform run might need, you can do so with the following:

Create a Dockerfile that points to the Scalr Docker image, update the version as needed, and then add the customization:

FROM scalr/terraform:1.0.0
ADD ...
RUN ...

Once the Dockerfile is done, run the following command to build the image:

/opt/scalr-agent/embedded/bin/docker build . -t scalr/terraform:1.0.0

IMPORTANT: The image must be named scalr/terraform:<version> to ensure Scalr uses it.

For agent version 0.42.0 or higher, we suggest using the golden image option, which has the benefit of using a single image for all Terraform and OpenTofu versions.

Golden Image (Beta)

This feature is currently in beta and is disabled by default. If you’d like to enable it, please open a ticket at support.scalr.com and ensure you are using agent version 0.42.0 or greater.

Scalr now supports a "golden image", which reduces the overhead compared to the traditional images that Scalr agents used. With the golden image, users only need to maintain a single image for all required software versions. Previously, using the traditional image, customers would need to build an image for every Terraform, OpenTofu, OPA, and Infracost version that was used. Now only a single image is required, and the Scalr agent will pull the versions and cache them on the agent.

The customization of the image will remain the same, but the image name is now scalr/runner:

FROM scalr/runner:0.1.0

ADD... 
RUN...

If you use a private registry with the container_task_image_registry option, you must upload the image to your private registry.

Benefits

The golden image provides the following benefits over the traditional image:

  • Maintain a single image rather than one per Terraform and OpenTofu version.
  • Optimize Docker pulls to avoid rate limits.
  • Support the ARM64 architecture.

Adding a CA bundle

The instructions below are for VM or Docker-based deployments. For Kubernetes-based agents, use the agent.container_task_ca_cert setting in the helm chart to set the path to the certificate. See more here.

To configure SSL certificates globally, use the SCALR_CA_CERT variable option. To configure SSL certificates only for isolated containers for the tasks (e.g. tofu/terraform/infracost operations), set the SCALR_CONTAINER_TASK_CA_CERT option.

The CA file can be located on the agent's VM, allowing a certificate to be selected by its file path. If the agent is running within Docker, ensure the certificate is mounted into the agent container.

Alternatively, a base64-encoded string containing the certificate bundle can be used. Example of encoding a bundle:

$~ cat /path/to/bundle.ca | base64
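If a shell is not convenient, the same encoding can be sketched in Python with the standard library. The function name is illustrative. Unlike some `base64` command-line builds, this produces a single unwrapped line.

```python
import base64


def encode_ca_bundle(pem_bytes: bytes) -> str:
    """Return the certificate bundle as a single-line base64 string."""
    return base64.b64encode(pem_bytes).decode("ascii")


# Usage (the path is a placeholder):
# with open("/path/to/bundle.ca", "rb") as f:
#     print(encode_ca_bundle(f.read()))
print(encode_ca_bundle(b"example"))  # prints ZXhhbXBsZQ==
```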

Example of running an agent with custom CA certificates with a Docker deployment method:

$~ docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/scalr-agent:/var/lib/scalr-agent \
  -e SCALR_URL=https://<account>.scalr.io \
  -e SCALR_TOKEN=<token> \
  -e SCALR_DATA_HOME=/var/lib/scalr-agent \
  -e SCALR_CA_CERT=/var/lib/scalr-agent/ca.cert \
  --rm -it --pull=always --name=scalr-agent scalr/agent:latest run

Note that the certificate is located in the /var/lib/scalr-agent/ directory, which is mounted into the container.

You can optionally bundle your certificate into an agent image. Place the custom CA file at extra_ca_root.crt and build the customized image:

FROM scalr/agent:latest

ADD extra_ca_root.crt /usr/local/share/ca-certificates/extra-ca.crt
RUN apt update \
    && apt install ca-certificates -y \
    && chmod 644 /usr/local/share/ca-certificates/extra-ca.crt \
    && update-ca-certificates
ENV SCALR_CA_CERT="/etc/ssl/certs/ca-certificates.crt"

This step also bundles your certificate with the set of public certificates provided by the ca-certificates system package. You can optionally skip this step and instead point SCALR_CA_CERT to your certificate if it already includes public CA certificates or if they are not needed (e.g., in a setup completely hidden behind a proxy).

Note that by default, the Scalr agent uses the certificate bundle provided by the certifi package instead of the system certificate bundle provided by the ca-certificates package.

Adding a Proxy

The instructions below are for VM or Docker-based deployments. For Kubernetes-based agents, see the proxy settings in the helm chart here.

VM-Based

For a VM, if the agent requires a proxy to reach scalr.io, create a systemd drop-in directory:

mkdir -p  /etc/systemd/system/scalr-agent.service.d/

Create the /etc/systemd/system/scalr-agent.service.d/proxy.conf file, with the following contents:

[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"

Symlink the proxy.conf into the scalr-docker drop-in.

mkdir -p /etc/systemd/system/scalr-docker.service.d/
ln -s /etc/systemd/system/scalr-agent.service.d/proxy.conf \
   /etc/systemd/system/scalr-docker.service.d/proxy.conf

Once the above is added, execute the following commands:

systemctl daemon-reload
systemctl restart scalr-docker
systemctl restart scalr-agent

Docker-Based

For Docker, add the optional environment variables (HTTP_PROXY, HTTPS_PROXY, NO_PROXY):

$~ docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/scalr-agent:/var/lib/scalr-agent \
  -e SCALR_URL=https://<account>.scalr.io \
  -e SCALR_TOKEN=<token> \
  -e SCALR_DATA_HOME=/var/lib/scalr-agent \
  -e HTTP_PROXY="<proxy-address>" \
  -e HTTPS_PROXY="<proxy-address>" \
  -e NO_PROXY="<addr1>,<addr2>" \
  --rm -it --pull=always --name=scalr-agent scalr/agent:latest run

Other Configuration Options

Kubernetes Deployments

The instructions below are for VM or Docker-based deployments. For Kubernetes-based agents, see the helm chart here.

Docker & VM Deployments

The Docker and VM-based agents both use the same underlying application via a Docker backend. All options listed below can be applied to both, but they are set in a different way.

For example, to customize using the Docker installation method:

$~ docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/scalr-agent:/var/lib/scalr-agent \
  -e SCALR_URL=https://<account>.scalr.io \
  -e SCALR_TOKEN=<token> \
  -e SCALR_DATA_HOME=/var/lib/scalr-agent \
  -e HTTP_PROXY="<proxy-address>" \
  -e HTTPS_PROXY="<proxy-address>" \
  -e NO_PROXY="<addr1>,<addr2>" \
  -e SCALR_CONTAINER_TASK_MEM_LIMIT=16384 \
  --rm -it --pull=always --name=scalr-agent scalr/agent:latest run

To customize using the VM (RPM/DEB) method, set standard OS environment variables:

export variable=value

Below are all of the variable options that can be used for customization:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| SCALR_CONTAINER_TASK_CPU_REQUEST | float | 1.0 | CPU resource request defined in cores. If your container needs two full cores to run, you would put the value 2. If your container only needs ¼ of a core, you would put a value of 0.25 cores. |
| SCALR_CONTAINER_TASK_CPU_LIMIT | float | 8.0 | CPU resource limit defined in cores. If your container needs two full cores to run, you would put the value 2. If your container only needs ¼ of a core, you would put a value of 0.25 cores. |
| SCALR_CONTAINER_TASK_MEM_REQUEST | int | 1024 | Memory resource request defined in megabytes. |
| SCALR_CONTAINER_TASK_MEM_LIMIT | int | 16384 | Memory resource limit defined in megabytes. |
| SCALR_CONTAINER_TASK_CA_CERT | str | null | The CA certificate bundle to mount into the container task at /etc/ssl/certs/ca-certificates.crt. The CA file can be located inside the agent VM, allowing selection of a certificate by its path. If running the agent within Docker, ensure the certificate is mounted into the agent container. Alternatively, a base64 string containing the certificate bundle can be used. Example of encoding it: `cat /path/to/bundle.ca \| base64`. The bundle should include both your private CAs and the standard set of public CAs. |
| SCALR_CONTAINER_TASK_IMAGE_REGISTRY | str | null | Enforce the use of a custom image registry to pull all container task images. All images must be preemptively pushed to this registry for the agent to work with this option. The registry path may include a repository to be replaced. If the path ends with a trailing slash, it will be appended to the original repository. Example: 'mirror.io', 'mirror.io/myproject' or 'mirror.io/myproject/'. |

Custom .terraformrc Files

Add the SCALR_TERRAFORM_RC shell variable in a workspace to use a custom .terraformrc file. The variable's value should be the contents you would normally put in the .terraformrc file. This enables direct configuration of Terraform CLI settings, including credentials, plugin caching, and proxy configurations for private module registry access and enhanced security.

Serverless

Serverless agents allow users to create agents on demand via webhook triggers, eliminating the need for persistent compute resources. When a run is triggered, Scalr calls your API Gateway to spin up a container task (e.g., on Fargate) for agent execution. To set this up, configure your agent pool with an API Gateway URL and optional custom headers, set up your serverless infrastructure (API Gateway → Lambda → Fargate), and then enable serverless execution for the agent pool.

Before getting started in Scalr, create an API gateway in your cloud of choice, as its URL will be needed to set up the agent pool in Scalr. Once the gateway is created, go to the agent pools page in Scalr and create the agent pool with the URL and optional headers.

Once the pool is created, generate a token that will be used in the gateway for Scalr to authenticate to it.

The token can now be added to the gateway for authentication and the remaining components needed for the serverless agent can be set up.

The agents only appear in Scalr while they are in use; otherwise, the agents page will not show them.

Please see the example below that shows how to set this up in AWS with API Gateway, Lambda, and Fargate.

Example

This guide walks through building a serverless AWS architecture that:

  • Uses API Gateway as the entry point.
  • Triggers an AWS Lambda function.
  • Lambda starts an ECS Fargate Task.
  • ECS task fetches a Scalr token from AWS Secrets Manager.
  • ECS task connects to Scalr API to pull and execute a run.

This approach enables a lightweight, decoupled, secure automation flow aligned with AWS and Scalr best practices.

Architecture Flow

[API Gateway] ---> [Lambda] ---> [ECS Fargate Task] ---> [Secrets Manager] ---> [Scalr API]

Prerequisites

AWS Account Requirements

  • AWS Services: Ensure the following services are available in your region:

    • Amazon API Gateway
    • AWS Lambda
    • Amazon ECS (Elastic Container Service)
    • AWS Secrets Manager
    • Amazon CloudWatch (for logging)
    • AWS IAM (Identity and Access Management)
  • Permissions: Your AWS user/role must have permissions to:

    • Create and manage IAM roles and policies
    • Create and configure ECS clusters and task definitions
    • Create and manage Lambda functions
    • Create API Gateway APIs
    • Create and access Secrets Manager secrets
    • Create CloudWatch log groups
    • Manage VPC resources (if creating custom VPC)
  • Network Requirements:

    • VPC with at least 2 subnets in different Availability Zones
    • Internet connectivity (either public subnets or NAT Gateway for private subnets)
    • Security groups configured for ECS tasks

Scalr Requirements

  • Scalr Token: Valid Scalr Agent Pool token. Obtain it from Scalr console: Settings > Agent Pools > Create/View Pool

Step 1: Store Scalr Token in AWS Secrets Manager

  1. Go to AWS Console > Secrets Manager

  2. Click Store a new secret

  3. Select Other type of secrets

  4. Enter key-value pair:

    • Key: token
    • Value: <your-scalr-token>
  5. Click Next, name the secret:

    • Name: scalr/api/token
  6. Skip rotation unless needed, then click Store

  7. Important: Note down the complete secret ARN from the secret details page. It will look like:

    arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf
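The console steps above can also be sketched with boto3. This is a minimal sketch, not the guide's required method: the function name is hypothetical, and the client is passed in so the snippet does not need AWS credentials to be read or tested. `create_secret` with `Name` and `SecretString` is the standard Secrets Manager call, and its response includes the secret's `ARN`.

```python
import json


def store_scalr_token(secrets_client, token: str, name: str = "scalr/api/token") -> str:
    """Create a key-value secret holding the agent pool token and return its ARN.

    `secrets_client` is expected to be a boto3 Secrets Manager client,
    e.g. boto3.client("secretsmanager").
    """
    response = secrets_client.create_secret(
        Name=name,
        SecretString=json.dumps({"token": token}),  # key "token", as in step 4
    )
    return response["ARN"]


# Usage (requires boto3 and AWS credentials):
# import boto3
# arn = store_scalr_token(boto3.client("secretsmanager"), "<your-scalr-token>")
# print(arn)  # note this ARN down for the policies and task definition
```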

Step 2: Create IAM Roles and Policies

Note: Now that you have created the secret, you can use its exact ARN in the policy documents below.

2.1: Create ECS Task Role

This role allows the ECS task to access AWS Secrets Manager to retrieve the Scalr token.

Create ECS Task Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace the Resource ARN with your actual secret ARN from Step 1:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrECSTaskPolicy

  4. Click Create policy
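If you prefer to generate the policy document rather than edit JSON by hand, a small helper along these lines may help. The function name is hypothetical; the document it returns mirrors the policy above, scoped to the exact secret ARN from Step 1.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Build an IAM policy document that allows reading one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN in the format shown in Step 1; substitute your own.
arn = "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf"
print(json.dumps(secret_read_policy(arn), indent=2))
```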

Create ECS Task Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Elastic Container Service > Elastic Container Service Task
  4. Click Next
  5. Search and select the policy created above: ScalrECSTaskPolicy
  6. Click Next, enter role name: ScalrECSTaskRole
  7. Click Create role

2.2: Create ECS Task Execution Role

This role allows ECS to pull container images, write logs to CloudWatch, and access secrets.

Create ECS Task Execution Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace the secret ARN with your actual secret ARN from Step 1:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrECSExecutionPolicy

  4. Click Create policy

Create ECS Task Execution Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Elastic Container Service > Elastic Container Service Task
  4. Click Next
  5. Search and select the policy created above: ScalrECSExecutionPolicy
  6. Click Next, enter role name: ScalrECSExecutionRole
  7. Click Create role

Step 3: Create CloudWatch Log Group

Before creating the ECS cluster, create a CloudWatch log group for the ECS tasks:

  1. Go to CloudWatch > Log groups
  2. Click Create log group
  3. Enter log group name: /ecs/scalr-agent-pool-cluster
  4. Set retention period as needed (e.g., 7 days)
  5. Click Create

Step 4: Create ECS Fargate Objects

4.1: Create ECS Cluster

  1. Go to ECS > Clusters
  2. Click Create cluster
  3. Enter cluster name: ScalrServerless
  4. Infrastructure: AWS Fargate (serverless)
  5. Click Create

4.2: Create Task Definition

  1. Go to ECS > Task Definitions > Create new task definition
  2. Choose Create new task definition with JSON
  3. Replace the default JSON with the following configuration. ⚠️ Replace placeholders with your actual values, including the secret ARN from Step 1:
{
    "family": "scalr-agent-run",
    "taskRoleArn": "arn:aws:iam::<account-id>:role/ScalrECSTaskRole",
    "executionRoleArn": "arn:aws:iam::<account-id>:role/ScalrECSExecutionRole",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "2048",
    "memory": "4096",
    "containerDefinitions": [
        {
            "name": "scalr-agent-run",
            "image": "scalr/agent-runner:latest",
            "essential": true,
            "environment": [
                {
                    "name": "SCALR_SINGLE",
                    "value": "true"
                },
                {
                    "name": "SCALR_DRIVER",
                    "value": "local"
                }
            ],
            "secrets": [
                {
                    "name": "SCALR_TOKEN",
                    "valueFrom": "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf:token::"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/scalr-agent-pool-cluster",
                    "awslogs-region": "<region>",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "stopTimeout": 120
        }
    ]
}
  4. Click Create to save the task definition
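Because the secret in Step 1 is stored as a key-value pair, the container's secret reference can select the `token` key using the `<secret-arn>:<json-key>:<version-stage>:<version-id>` form that ECS accepts. The helper below is a sketch with a hypothetical name; it only builds the string.

```python
def secret_value_from(secret_arn: str, json_key: str = "token") -> str:
    """Return an ECS `valueFrom` reference to one JSON key of a secret.

    The two trailing empty fields are the version stage and version ID,
    which fall back to the current version when left blank.
    """
    if "*" in secret_arn:
        raise ValueError("valueFrom needs the full secret ARN, not a wildcard pattern")
    return f"{secret_arn}:{json_key}::"


arn = "arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf"
print(secret_value_from(arn))
# prints arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token-AbCdEf:token::
```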

Step 5: Configure Networking

5.1: Create or Identify VPC Resources

You'll need the following networking components for ECS Fargate tasks:

Option A: Use Default VPC (Simplest)
  1. Go to VPC Console
  2. Note down your default VPC ID
  3. Note down at least 2 subnet IDs from different availability zones
  4. Note down the default security group ID
Option B: Create Custom VPC (Recommended for Production)
  1. Go to VPC > Create VPC
  2. Choose VPC and more for guided setup
  3. Configure:
    • Name: scalr-serverless-vpc
    • IPv4 CIDR: 10.0.0.0/16
    • Availability Zones: 2
    • Public subnets: 2
    • Private subnets: 2 (if you want NAT Gateway)
    • NAT gateways: 1 (optional, for private subnets)
  4. Click Create VPC

5.2: Create Security Group

  1. Go to EC2 > Security Groups > Create security group
  2. Configure:
    • Name: scalr-ecs-sg
    • Description: Security group for Scalr ECS tasks
    • VPC: Select your VPC
  3. Outbound rules: Keep default (All traffic to 0.0.0.0/0)
  4. Inbound rules: No inbound rules needed for this use case
  5. Click Create security group
  6. Note down the security group ID

Step 6: Create Lambda Function and Role

6.1: Create Lambda Execution Role

This role allows Lambda to trigger ECS tasks and pass the required ECS roles.

Create Lambda Execution Policy
  1. Go to IAM > Policies > Create policy

  2. Switch to JSON mode and enter the following policy. ⚠️ Replace <region> and <account-id> with your actual values:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:<region>:<account-id>:*"
        },
        {
          "Effect": "Allow",
          "Action": "ecs:RunTask",
          "Resource": "arn:aws:ecs:<region>:<account-id>:task-definition/scalr-agent-run:*"
        },
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": [
            "arn:aws:iam::<account-id>:role/ScalrECSTaskRole",
            "arn:aws:iam::<account-id>:role/ScalrECSExecutionRole"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DeleteNetworkInterface"
          ],
          "Resource": "*"
        }
      ]
    }
  3. Click Next, enter policy name: ScalrLambdaExecutionPolicy

  4. Click Create policy

Create Lambda Execution Role
  1. Go to IAM > Roles > Create role
  2. Select Trusted entity type: AWS service
  3. Select Use case: Lambda
  4. Click Next
  5. Search and select the policy created above: ScalrLambdaExecutionPolicy
  6. Click Next, enter role name: ScalrLambdaExecutionRole
  7. Click Create role

6.2: Create Lambda Function

  1. Go to AWS Lambda > Create Function
  2. Choose Author from scratch
  3. Enter function name: ScalrServerless
  4. Runtime: Python 3.13
  5. Under Change default execution role, select Use an existing role
  6. Choose the role created above: ScalrLambdaExecutionRole
  7. Click Create function
Configure Lambda Function Code
  1. In the Lambda function console, scroll down to Code source

  2. Replace the default code with the following. ⚠️ Replace the subnet and security group IDs with the values from Step 5:

    import json
    import logging

    import boto3

    # Configure logging
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)


    def lambda_handler(event, context):
        """
        Lambda function to trigger ECS Fargate task for Scalr agent execution
        """
        ecs_client = boto3.client('ecs')

        try:
            # Configure ECS task parameters
            cluster_name = 'ScalrServerless'
            task_definition = 'scalr-agent-run'

            # Network configuration - use values from Step 5
            subnet_ids = ['subnet-xxxxxx']  # Replace with subnet IDs from Step 5
            security_group_ids = ['sg-xxxxxx']  # Replace with security group ID from Step 5

            # Run the ECS task
            response = ecs_client.run_task(
                cluster=cluster_name,
                launchType='FARGATE',
                taskDefinition=task_definition,
                networkConfiguration={
                    'awsvpcConfiguration': {
                        'subnets': subnet_ids,
                        'securityGroups': security_group_ids,
                        'assignPublicIp': 'ENABLED'  # Set to DISABLED if using NAT Gateway
                    }
                }
            )

            # run_task can return without starting a task; the reasons are listed under 'failures'
            if not response.get('tasks'):
                raise RuntimeError(f"ECS did not start a task: {response.get('failures')}")

            task_arn = response['tasks'][0]['taskArn']
            logger.info(f"ECS Task started successfully: {task_arn}")

            return {
                "statusCode": 200,
                "body": json.dumps({
                    "message": "ECS Task triggered successfully",
                    "taskArn": task_arn
                })
            }

        except Exception as e:
            logger.error(f"Error triggering ECS task: {str(e)}")
            return {
                "statusCode": 500,
                "body": json.dumps({
                    "error": "Failed to trigger ECS task",
                    "details": str(e)
                })
            }
  3. Click Deploy to save the function

Step 7: Create API Gateway

7.1: Create HTTP API

  1. Go to API Gateway > Create API
  2. Choose HTTP API and click Build
  3. Configure:
    • API name: scalr-serverless-api
    • Description: API to trigger Scalr serverless tasks
  4. Click Next

7.2: Configure Routes

  1. Method: POST
  2. Resource path: /trigger
  3. Integration target: Select your Lambda function (ScalrServerless)
  4. Click Next

7.3: Configure Stages

  1. Stage name: prod
  2. Auto-deploy: Enable
  3. Click Next, then Create

7.4: Configure API Key Authentication

  1. In the API Gateway console, go to your API (scalr-serverless-api)
  2. Click Routes in the left sidebar
  3. Select your POST /trigger route
  4. Click Edit
  5. Under Authorization, select API Key Required: true
  6. Click Update
Create API Key
  1. Go to API Keys in the left sidebar
  2. Click Create API key
  3. Configure:
    • Name: scalr-serverless-key
    • Description: API key for Scalr serverless triggers
  4. Click Create
  5. Important: Copy the API key value - you won't be able to see it again
Create Usage Plan
  1. Go to Usage plans in the left sidebar
  2. Click Create usage plan
  3. Configure:
    • Name: scalr-serverless-plan
    • Description: Usage plan for Scalr serverless API
    • Throttling: Set limits as needed (e.g., 100 requests per second)
    • Quota: Set daily/monthly limits as needed (e.g., 1000 requests per day)
  4. Click Next
  5. Add API stage: Select your API and prod stage
  6. Click Next
  7. Add API keys: Select the API key created above
  8. Click Create

7.5: Test the Secured API

  1. Note down the Invoke URL from the API Gateway console
  2. Test using curl with the API key:
curl -X POST https://your-api-id.execute-api.region.amazonaws.com/prod/trigger \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY_HERE" \
  -d '{}'

Important: Always include the x-api-key header with your API key value.

The API should return a success response and trigger the ECS task.
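The same test can be sketched in Python using only the standard library. The function name is hypothetical, and the invoke URL and API key are placeholders. The helper only builds the request; sending it is shown in the commented usage.

```python
import json
import urllib.request


def build_trigger_request(invoke_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers the Lambda."""
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + "/trigger",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Usage (sends a real request; URL and key are placeholders):
# req = build_trigger_request(
#     "https://your-api-id.execute-api.region.amazonaws.com/prod", "YOUR_API_KEY_HERE")
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read().decode())
```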

Security Best Practices

| Component | Practice |
| --- | --- |
| API Gateway | Require API keys for authentication, implement throttling and quotas |
| Secrets Manager | Store secrets as key-value or string, rotate if needed |
| IAM | Least privilege: Lambda can only run tasks; ECS task can only read token |
| Container Security | Do not store secrets in image; fetch at runtime |
| Logging | Enable CloudWatch logs for ECS, Lambda |

Example Use Case

Scalr webhook → API Gateway → Lambda → ECS → token fetch → Scalr API

This flow supports automation like:

  • Remote plan/apply runners
  • Remote policy checks
  • CI/CD integrations

Troubleshooting

Common Issues and Solutions

1. Lambda Function Issues

Error: "Task failed to start"

  • Cause: Incorrect subnet IDs or security group IDs in Lambda code
  • Solution: Verify subnet and security group IDs in Lambda function code
  • Check: Ensure subnets are in the same VPC and have internet access

Error: "Access Denied" when running ECS task

  • Cause: Lambda execution role missing permissions
  • Solution: Verify ScalrLambdaExecutionPolicy includes ecs:RunTask and iam:PassRole permissions
  • Check: Ensure task definition ARN matches the policy resource

2. ECS Task Issues

Error: "Task stopped with exit code 1"

  • Cause: Container cannot access Secrets Manager or Scalr API
  • Solution: Check CloudWatch logs at /ecs/scalr-agent-pool-cluster
  • Verify:
    • ECS task role has secretsmanager:GetSecretValue permission
    • Secret ARN format is correct in task definition
    • Container has internet access

Error: "CannotPullContainerError"

  • Cause: ECS cannot pull the container image
  • Solution: Verify ECS execution role has ECR permissions
  • Check: Ensure ScalrECSExecutionPolicy includes ECR permissions

3. Networking Issues

Error: "Task failed to start" with networking errors

  • Cause: Subnet or security group configuration issues
  • Solution:
    • Verify subnets have internet access (public IP or NAT Gateway)
    • Check security group allows outbound traffic to internet
    • Ensure subnets are in different AZs

4. Secrets Manager Issues

Error: "Secrets Manager secret not found"

  • Cause: Incorrect secret ARN or name
  • Solution: Verify secret ARN format in task definition:
    arn:aws:secretsmanager:<region>:<account-id>:secret:scalr/api/token:token::

5. API Gateway Issues

Error: "Forbidden" or "Missing Authentication Token"

  • Cause: Missing or invalid API key
  • Solution: Ensure you're including the x-api-key header with a valid API key
  • Check: Verify API key is associated with the usage plan and stage

Error: "Internal Server Error" from API

  • Cause: Lambda function error
  • Solution: Check Lambda function logs in CloudWatch
  • Verify: Lambda function has correct permissions and configuration

Debugging Steps

  1. Check CloudWatch Logs:

    • Lambda logs: /aws/lambda/ScalrServerless
    • ECS logs: /ecs/scalr-agent-pool-cluster
  2. Verify IAM Permissions:

    • Use the AWS CLI to confirm which identity your credentials resolve to: aws sts get-caller-identity
    • Check role trust relationships and policies
  3. Test Components Individually:

    • Test Lambda function directly from AWS Console
    • Manually run ECS task from ECS Console
    • Verify secret access from ECS task
  4. Monitor ECS Task Status:

    • Go to ECS Console > Clusters > ScalrServerless > Tasks
    • Check task status and details for error messages
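The task status check in step 4 can also be scripted. The sketch below assumes a boto3 ECS client is passed in; the function name is hypothetical, while `describe_tasks` and the `lastStatus`, `stoppedReason`, `containers`, and `failures` fields are part of the ECS API response.

```python
def summarize_tasks(ecs_client, cluster: str, task_arns: list) -> list:
    """Return one human-readable line per task with its status and stop reason."""
    response = ecs_client.describe_tasks(cluster=cluster, tasks=task_arns)
    lines = []
    for task in response.get("tasks", []):
        exit_codes = [c.get("exitCode") for c in task.get("containers", [])]
        lines.append(
            f"{task['taskArn']}: {task.get('lastStatus')} "
            f"reason={task.get('stoppedReason')} exit_codes={exit_codes}"
        )
    # Tasks that could not be described are reported under "failures".
    for failure in response.get("failures", []):
        lines.append(f"{failure.get('arn')}: {failure.get('reason')}")
    return lines


# Usage (requires boto3 and AWS credentials; the task ARN is a placeholder):
# import boto3
# for line in summarize_tasks(boto3.client("ecs"), "ScalrServerless", ["<task-arn>"]):
#     print(line)
```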

Next Steps

Enhancements

  • Error Handling: Add retry logic to Lambda in case of ECS launch failures
  • Monitoring: Set up CloudWatch alarms for task failures
  • Logging: Configure structured logging for better debugging
  • Security: Implement API Gateway authentication (API keys, JWT, etc.)
  • Scaling: Configure auto-scaling for ECS tasks if needed
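As one possible shape for the retry enhancement, the sketch below wraps `run_task` with exponential backoff. The function name, defaults, and injected `sleep` parameter are assumptions for illustration; in a Lambda, keep the total delay well below the function timeout.

```python
import time


def run_task_with_retry(ecs_client, run_task_kwargs: dict, attempts: int = 3,
                        base_delay: float = 1.0, sleep=time.sleep) -> str:
    """Call ecs_client.run_task, retrying with exponential backoff.

    Returns the ARN of the started task, or raises RuntimeError after the
    final attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            response = ecs_client.run_task(**run_task_kwargs)
            tasks = response.get("tasks", [])
            if tasks:
                return tasks[0]["taskArn"]
            # run_task can return no tasks and report the cause under "failures".
            last_error = response.get("failures")
        except Exception as exc:  # narrow this to botocore exceptions in real code
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"ECS task did not start after {attempts} attempts: {last_error}")
```

The `sleep` argument exists so the backoff can be replaced in tests; production code would leave it at the default.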

Monitoring and Observability

  • CloudWatch Dashboards: Create dashboards to monitor API calls, Lambda executions, and ECS task status
  • AWS X-Ray: Enable tracing for end-to-end request tracking
  • EventBridge: Use AWS EventBridge to capture ECS task state changes for automated workflows
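For the EventBridge item, a rule needs an event pattern. The sketch below builds one that matches ECS "Task State Change" events for stopped tasks in a single cluster. The function name is hypothetical and the cluster ARN is a placeholder; `source`, `detail-type`, `clusterArn`, and `lastStatus` are fields of the ECS task state change event.

```python
import json


def stopped_task_pattern(cluster_arn: str) -> dict:
    """Event pattern matching ECS tasks in one cluster that reached STOPPED."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {"clusterArn": [cluster_arn], "lastStatus": ["STOPPED"]},
    }


cluster = "arn:aws:ecs:<region>:<account-id>:cluster/ScalrServerless"
print(json.dumps(stopped_task_pattern(cluster), indent=2))
```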

Infrastructure as Code

  • Terraform: Use the included Terraform modules for automated deployment
  • CDK/CloudFormation: Convert to Infrastructure as Code for version control and repeatability

Resource Summary

After completing this guide, you will have created:

| Resource Type | Name | Purpose |
| --- | --- | --- |
| IAM Policy | ScalrECSTaskPolicy | Allows ECS task to access Secrets Manager |
| IAM Role | ScalrECSTaskRole | Task role for ECS container |
| IAM Policy | ScalrECSExecutionPolicy | Allows ECS to pull images and write logs |
| IAM Role | ScalrECSExecutionRole | Execution role for ECS service |
| IAM Policy | ScalrLambdaExecutionPolicy | Allows Lambda to trigger ECS tasks |
| IAM Role | ScalrLambdaExecutionRole | Execution role for Lambda function |
| CloudWatch Log Group | /ecs/scalr-agent-pool-cluster | Stores ECS task logs |
| ECS Cluster | ScalrServerless | Fargate cluster for running tasks |
| ECS Task Definition | scalr-agent-run | Defines the Scalr agent container |
| Lambda Function | ScalrServerless | Triggers ECS tasks via API calls |
| API Gateway | scalr-serverless-api | HTTP API endpoint |
| Secrets Manager Secret | scalr/api/token | Stores Scalr agent token |

For Infrastructure as Code deployment using Terraform, see the included modules in this repository.