Troubleshooting
This page covers remediation steps for common issues encountered during Scalr Agent maintenance.
Internal Errors
If you encounter internal system errors or unexpected behavior:
- If the issue appeared after a Scalr Agent upgrade, open a support ticket and consider downgrading to the previous version while we investigate your support request.
- Otherwise, ensure you're running the latest agent version — many issues are fixed in newer releases. Check the changelog to see if your issue is already addressed. Open a support ticket if you believe it's not covered.
Performance Issues
If you're experiencing slow runs, out-of-memory errors, or out-of-disk-space errors, verify that your agents have sufficient CPU, memory, and storage. See the hardware requirements for minimums.
You can configure OpenTelemetry Metrics and Tracing to gain detailed performance insights and identify bottlenecks in your on-premises setup.
If you're using Helm charts, follow the Performance Optimization guide for your chart.
If the issue persists or you need help investigating your on-premises setup, open a support ticket.
Delays in Run Stage Startup
Under normal circumstances, the agent picks up a queued run in a fraction of a second. If run stages are slow to start when using a self-hosted agent pool, check the following:
- Ensure there are enough agents in the pool with free capacity. If not, increase the number of agents.
- Use agent version 0.42.0 or later. Older versions may have up to a 10-second pickup delay.
- On Docker runtimes, cold starts may include time to pull the runner image. Pre-pull the image before the agent joins the pool.
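As a sketch of the pre-pull step (the image reference below is a placeholder; substitute the runner image your agent configuration actually references):

```shell
# Pre-pull the runner image on the Docker host before the agent joins the pool,
# so the first run does not pay the image download cost.
# NOTE: "example.registry/scalr-runner:latest" is a placeholder -- use the
# runner image reference from your agent configuration.
docker pull example.registry/scalr-runner:latest
```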
If you're using Helm charts, follow the Performance Optimization guide for your chart for additional tuning options.
Check agent logs for errors — unexpected agent termination or internal errors can also cause delays or prevent runs from being picked up.
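A quick way to triage collected logs, assuming you have already saved them to a file (the filename and pattern here are illustrative and may need adjusting to your agent version's log format):

```shell
# Scan a collected agent log for error-level entries.
# "scalr-agent-log.txt" is the file produced in Collecting Logs below;
# the pattern is illustrative, not an exhaustive list of error markers.
grep -iE "error|panic|fatal" scalr-agent-log.txt
```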
Collecting Logs
The Scalr Agent writes logs to stdout. Log storage is handled by the runtime or an external logging platform. If you have external monitoring configured, use your existing tools to retrieve logs; otherwise, collect them directly from the runtime using the instructions below.
If possible, enable debug logging using the SCALR_AGENT_DEBUG option (or agent.debug in Helm charts) before collecting.
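For example, on a Docker-based install you might pass the option as an environment variable, and with Helm set the chart value and roll the release (the container name, image placeholder, and Helm repo alias below are illustrative):

```shell
# Docker: recreate the agent container with debug logging enabled.
# (Other required options from your existing setup are omitted for brevity;
# <your-agent-image> is a placeholder.)
docker run -d --name scalr-agent -e SCALR_AGENT_DEBUG=true <your-agent-image>

# Helm: enable debug via the chart value; "scalr/agent-k8s" assumes your
# configured repo alias and chart name -- adjust to your release.
helm upgrade scalr-agent scalr/agent-k8s --reuse-values --set agent.debug=true
```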
Learn about Scalr Agent logging.
Docker
Use the docker logs command to capture logs from an affected agent (replace scalr-agent with the actual container name if needed):
docker logs scalr-agent > scalr-agent-log.txt
Kubernetes
Use kubectl logs to archive all logs from the Scalr Agent namespace into a single bundle. Replace the ns variable with your Helm release namespace and run:
ns="scalr-agent"
mkdir -p logs && for pod in $(kubectl get pods -n $ns -o name); do kubectl logs -n $ns $pod > "logs/${pod##*/}.log"; done && zip -r scalr-agent-logs.zip logs && rm -rf logs
This produces a ZIP bundle named scalr-agent-logs.zip containing logs from all pods in the namespace.
This method requires the kubectl and zip commands, with sufficient permissions to read pod logs from the agent release namespace.
Pull logs immediately after an incident — this command does not retrieve logs from restarted or terminated pods.
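If a pod has restarted, the prior container's logs may still be recoverable with kubectl's --previous flag; a sketch extending the collection loop:

```shell
ns="scalr-agent"   # your Helm release namespace
mkdir -p logs
for pod in $(kubectl get pods -n "$ns" -o name); do
  kubectl logs -n "$ns" "$pod" > "logs/${pod##*/}.log"
  # --previous reads the prior container instance after a restart;
  # it exits non-zero (harmlessly) when the pod has never restarted.
  kubectl logs -n "$ns" "$pod" --previous > "logs/${pod##*/}.previous.log" 2>/dev/null || true
done
```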
Creating a Support Ticket
Providing complete information upfront allows for faster resolution and less back-and-forth. When opening a ticket, consider including the following:
- The exact sequence of actions that triggered the issue and the full text of any error messages.
- Any special circumstances, such as whether it was a first occurrence or was triggered by a specific event, plus frequency of occurrence, business impact, and suggested urgency.
- Any component upgrades or configuration changes that occurred before the issue appeared (e.g. the agent, Docker, or Kubernetes was upgraded).
- Installation type (Docker package, Linux package, Kubernetes chart such as agent-k8s or agent-job), platform version (e.g. Kubernetes 1.34.4), and platform vendor (e.g. GKE, Amazon EKS, EC2).
- Log files covering the time window when the incident occurred. See Collecting Logs for instructions.
- Any non-default configuration such as HTTP proxies, self-signed SSL certificates, agent image customizations, custom Terraform mirrors, or anything else outside a standard setup.
For performance issues, also include:
- CPU and memory limits and utilization metrics
- Root disk/volume configuration, IOPS limits, and I/O throughput metrics if available
- External cache or network-attached disk configuration, if used (e.g. Amazon EFS, Google Filestore, Google Cloud Hyperdisk)
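On Kubernetes, one way to capture point-in-time utilization for the ticket is kubectl top, which requires the metrics-server add-on (the namespace below is illustrative):

```shell
ns="scalr-agent"   # your Helm release namespace
# Snapshot current CPU/memory usage of the agent pods (needs metrics-server).
kubectl top pods -n "$ns" > scalr-agent-utilization.txt
```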
Open a support request at the Scalr Support Center and attach all details and logs.
