Open Policy Agent
Introduction
Scalr utilizes Open Policy Agent (OPA) to govern Terraform and OpenTofu deployments. OPA is policy-as-code that uses the rego language to evaluate Terraform input data against administrator defined rules. Any data in the Terraform JSON can be evaluated as part of the OPA checks. The evaluation returns a result, which is then interpreted by Scalr to determine what course of action to take. To be clear, OPA does not make any decisions regarding a policy, and it simply returns a result (true, false, some text) which Scalr then uses to make a decision.
The OPA code is treated similarly to your Terraform code in that it is stored in a VCS provider and managed through a GitOps model. Administrators create, manage, and open pull requests on the policies directly in VCS providers and trigger speculative runs to preview the impact of policy changes.
There are two main sections that the policies can evaluate, the tfplan
and tfrun
. The Terraform input data will be the output of the Terraform plan (tfplan in rego) and the run time (tfrun in rego) environment data, such as workspace details, usernames, run sources, etc. From this data, OPA can evaluate the attributes of all Terraform providers, resources, data, and anything mentioned in the plan.
Pre-Plan vs Post Plan Policy Checks
Scalr implements OPA checks in two different stages: pre-plan and/or post-plan. The pre-plan stage can only evaluate information available before the plan, such as the run source, who executed the run, VCS details, and more. The post-plan stage can evaluate everything included in the pre-plan as well as information included in the terraform plan JSON, specifically deployment details about the resources being created, changed, and deleted.
Pre-Plan
The pre-plan checks execute right before the Terraform plan stage. At this stage, Scalr has access to the tfrun
data, which OPA can evaluate. The tfrun
data contains information such as the run source, the user who executed the run, commit authors, workspace name, and anything else that is around how the run execution started:
tfrun
"tfrun": {
"workspace": {
"name": "workspace-a",
"description": null,
"auto_apply": true,
"working_directory": "",
"environment_type": "unmapped",
"tags": [],
"module": null
},
"environment": {
"id": "env-tkaervoct5ou123",
"name": "Environment A"
},
"vcs": {
"repository_id": "rtfee/null_resource_module",
"path": "",
"branch": "main",
"commit": {
"sha": "83714928d9a5dec29268c40dcc31e472e521ceea",
"message": "Merge pull request #4 from user/dev\n\nUpdate main.tf",
"author": {
"name": "John",
"email": "[email protected]",
"username": "john123",
"teams": null
}
},
"pull_request": {
"author": "john123",
"merged_by": "john123"
}
},
"cost_estimate": null,
"source": "dashboard-workspace",
"message": "Merge pull request #4 from user/dev\n\nUpdate main.tf",
"is_destroy": false,
"is_dry": false,
"created_by": {
"name": null,
"email": "[email protected]",
"username": "[email protected]",
"teams": [
{
"id": "team-uc8rrb9qoudc123",
"name": "Module Team"
}
]
},
"shell_variables": [
{
"key": "SCALR_RUNNER_ROLE",
"value": "runner-account-access"
}
]
}
Pre-plan checks improve the developer experience by stopping runs that violate policy early on in the pipeline rather than waiting for the plan to run. Any runs that error or are denied at this stage are not billed for.
The following enforcement levels can be applied to pre-plan checks in the scalr-policy.hcl (see more on this below):
- Hard Mandatory - Causes the run to go into an errored state with an error message.
- Soft Mandatory - Causes the run to go into an errored state with an error message.
- Advisory - Provides a warning only, the run can continue.
You may be wondering why soft and hard policies end in the same result. The difference only applies during post-plan checks, as can be seen below.
Post-Plan
The post-plan checks execute right after the Terraform plan stage. At this stage, Scalr has access to the tfrun
and tfplan
data, which OPA can evaluate. The tfrun
data contains information such as the run source, the user who executed the run, commit authors, workspace name, and anything else that is around how the run execution started (as seen above). The tfplan
data is generated based on the Terraform plan output:
tfplan
"tfplan": {
"format_version": "1.2",
"terraform_version": "1.5.2",
"planned_values": {
"root_module": {
"resources": [
{
"address": "null_resource.example",
"mode": "managed",
"type": "null_resource",
"name": "example",
"provider_name": "registry.terraform.io/hashicorp/null",
"schema_version": 0,
"values": {
"id": "3177230123079022363",
"triggers": null
},
"sensitive_values": {}
},
{
"address": "null_resource.example2",
"mode": "managed",
"type": "null_resource",
"name": "example2",
"provider_name": "registry.terraform.io/hashicorp/null",
"schema_version": 0,
"values": {
"triggers": null
},
"sensitive_values": {}
}
]
}
},
"resource_changes": [
{
"address": "null_resource.example",
"mode": "managed",
"type": "null_resource",
"name": "example",
"provider_name": "registry.terraform.io/hashicorp/null",
"change": {
"actions": [
"no-op"
],
"before": {
"id": "3177230123079022363",
"triggers": null
},
"after": {
"id": "3177230123079022363",
"triggers": null
},
"after_unknown": {},
"before_sensitive": {},
"after_sensitive": {}
}
},
{
"address": "null_resource.example2",
"mode": "managed",
"type": "null_resource",
"name": "example2",
"provider_name": "registry.terraform.io/hashicorp/null",
"change": {
"actions": [
"create"
],
"before": null,
"after": {
"triggers": null
},
"after_unknown": {
"id": true
},
"before_sensitive": false,
"after_sensitive": {}
}
}
],
"prior_state": {
"format_version": "1.0",
"terraform_version": "1.5.2",
"values": {
"root_module": {
"resources": [
{
"address": "null_resource.example",
"mode": "managed",
"type": "null_resource",
"name": "example",
"provider_name": "registry.terraform.io/hashicorp/null",
"schema_version": 0,
"values": {
"id": "3177230123079022363",
"triggers": null
},
"sensitive_values": {}
}
]
}
}
},
"configuration": {
"provider_config": {
"null": {
"name": "null",
"full_name": "registry.terraform.io/hashicorp/null"
}
},
"root_module": {
"resources": [
{
"address": "null_resource.example",
"mode": "managed",
"type": "null_resource",
"name": "example",
"provider_config_key": "null",
"provisioners": [
{
"type": "local-exec",
"expressions": {
"command": {
"constant_value": "echo 'Congrats on your first run! Woohoo!!!!'"
}
}
}
],
"schema_version": 0
},
{
"address": "null_resource.example2",
"mode": "managed",
"type": "null_resource",
"name": "example2",
"provider_config_key": "null",
"provisioners": [
{
"type": "local-exec",
"expressions": {
"command": {
"constant_value": "echo 'Congrats on your first run! Woohoo!!!!123'"
}
}
}
],
"schema_version": 0
}
]
}
},
"timestamp": "2024-09-17T17:27:39Z"
}
Post-plan checks help with security standards regarding the creation and deployment of provider resources. For example, they can be used to prevent users from deploying public S3 buckets or restrict which Terraform providers can be used.
The following enforcement levels can be applied to pre-plan checks:
- Hard Mandatory - Causes the run to go into an errored state with an error message.
- Soft Mandatory - Causes the run to go into an approval state, users with the permission
policy-checks:override
can approve or deny the run. In plan-only runs, soft-mandatory policies behave like hard mandatory and will fail the PR. - Advisory - Provides a warning only; the run will continue.
Creating an OPA Policy
When writing an OPA policy to be used in Scalr, the package must be namedterraform
as seen in the example below.
Scalr requires that the OPA evaluation returns an array of strings named deny
. Scalr will interpret the presence of any values in the array as a policy failure. Thus, in policies used with Scalr, a rule named deny
must always perform the evaluation. Here is a very simple example that checks the cost of the resources in a Terraform run using the cost estimation data:
package terraform
import input.tfrun as tfrun
deny[reason] {
cost_delta = tfrun.cost_estimate.delta_monthly_cost
cost_delta > 10
reason := sprintf("Plan is too expensive: $%.2f, while up to $10 is allowed", [cost_delta])
}
An evaluation that detects the cost to be higher than $10 would return a result like this:
{
"deny": [
"Plan is too expensive: $12.00, while up to $10 is allowed"
]
}
All deny
rules are independent and if at least one rule fails - it will fail the entire group and then depending on the enforcement level will mark it either advisory or fail the run.
The import function will either be tfrun
or tfplan
or both, depending on the data you want to evaluate and if it is a pre or post plan policy:
import input.tfrun as tfrun
import input.tfplan as tfplan
Creating Policy Groups
Once OPA policies are created, you then want to create a policy group. A policy group is the collection of one or more OPA policies stored in a repository with a scalr-policy.hcl
file declaring if the policies are enabled and what the enforcement level is. One or more policy gorups can be attached to each environment, and it is normal to have a common policy group shared across all environments as well as environment-specific policy groups to ease overhead. Common shared policy groups could include restricting network access, preventing public S3 buckets, allowing specific Terraform modules, and much more. The following example helps with the basics, if you have commonly shared OPA code that you want to reuse, please see the functions section below.
Create policy rego files in a VCS repo:
Note: The above repo can be found here.
Add a scalr-policy.hcl
file to set the enforcement levels and if the policy is enabled or not for each policy file in the repo, as shown in this example. If a the name of the rego file is not included in the scalr-policy.hcl
, then it will be ignored. Note the version is required:
version = "v1"
#policy "terraform_version_check" {
# enabled = true
# enforcement_level = "soft-mandatory"
#}
policy "terraform_min_version" {
enabled = true
enforcement_level = "hard-mandatory"
}
policy "terraform_suggested_version" {
enabled = true
enforcement_level = "soft-mandatory"
}
policy "limit_modules" {
enabled = false
enforcement_level = "soft-mandatory"
}
policy "workspace_name_convention" {
enabled = true
enforcement_level = "advisory"
}
policy "limit_cost" {
enabled = true
enforcement_level = "soft-mandatory"
}
policy "protect_controller" {
enabled = true
enforcement_level = "soft-mandatory"
}
policy "destroy_approval" {
enabled = true
enforcement_level = "soft-mandatory"
}
After the files are created in your repo, go into Scalr, the run policy engine, and create the policy group. To use policy groups, it is necessary to create a VCS providerin Scalr if this has not already been done. Go to integrations at the account scope, then OPA, and create a policy group:
Select if the policy should run as a pre-plan or post-plan checks. Save the policy and then enforce it on environments by clicking on "Enforce run policy group":
Policy reports will appear within a few seconds (maybe longer depending on the amount of workspaces) once the policy is enforced. The policies will automatically be applied to all workspaces in the environments and the results can be viewed within a run:
Functions
Scalr supports OPA functions, allowing OPA admins to specify a path for common Rego functions. This enables Scalr to import shared functions before each policy evaluation, reducing code duplication and simplifying OPA policy management. To use functions, there are three main steps:
- Create the function
- Import the function into the main OPA rego file
- Set the path to the function in Scalr
Function:
Note that the file name in this case is utils.rego
package utils
array_contains(arr, elem) {
arr[_] = elem
}
get_basename(path) = basename{
arr := split(path, "/")
basename:= arr[count(arr)-1]
}
Main Rego File:
Note that the function is imported with import data.utils
as it references the function file name.
# Prevent specified providers from being used
package terraform
import input.tfplan as tfplan
import data.utils
# Blacklisted Terraform providers
not_allowed_provider = [
"null"
]
deny[reason] {
resource := tfplan.resource_changes[_]
action := resource.change.actions[count(resource.change.actions) - 1]
utils.array_contains(["create", "update"], action) # allow destroy action
provider_name := utils.get_basename(resource.provider_name)
utils.array_contains(not_allowed_provider, provider_name)
reason := sprintf(
"%s: provider type %q is not allowed",
[resource.address, provider_name]
)
}
The function is pulled into the policy check by Scalr when the function folder is defined:
Policy Reports
Once a policy group is created and assigned to environments, Scalr will automatically start populating the reports for that policy group. The report will retroactively run on all workspace the policy group is linked to and update once new run are executed within the workspace. The policy group reports gives you a single view of the current results of policies across all environments and workspaces.
Policy Impact Analysis
Scalr provides a preview of future policy changes based on pull requests against the repository linked to the policy group. Similar to a Terraform dry run, but for OPA! This report gives administrators the piece of mind to know if they made an error updating the policy or if there are workspaces that will violate the upcoming policy change, allowing the administrator to give workspace owners a warning before merging the PR. The results of the previews are also pushed back to the VCS provider with checks for branch policies. These previews do not cause current runs to fail.
Examples
Below are a few example policies, please see many more in this community repository
Limit Run Source
In the example, below we are limiting the run source. For example, the organization has decided that runs in a specific environment cannot be CLI based. We want to make sure we catch this as early as possible to improve development speed, so we will implement this as a pre-plan check. We have also added exception for certain user as needed. In this case, we will only input the tfrun
data:
package terraform
import input.tfrun as tfrun
allowed_cli_users = ["d.johnson", "j.smith"]
array_contains(arr, elem) {
arr[_] = elem
}
get_basename(path) = basename{
arr := split(path, "/")
basename:= arr[count(arr)-1]
}
deny["User is not allowed to perform runs from Terraform CLI"] {
"cli" == tfrun.source
not array_contains(allowed_cli_users, tfrun.created_by.username)
}
Prevent Auto Apply
In this example, we are blocking users from being able to execute runs that have auto-apply
enabled. Our organizational standard is that auto-apply
is ok for dev or lab environments, but not for production so we will only apply this policy to production environments. We want to make sure we catch this as early as possible to improve development speed, so we will implement this as a pre-plan check. In this case, we will only input the tfrun
data:
# Allprod runs must be approved
package terraform
import input.tfrun as tfrun
# Deny if auto-apply is enabled
deny["auto-apply is not allowed"] {
tfrun.workspace.auto_apply == true
}
Allowed Regions
In this example, we want to restrict which regions users are allows to deploy to depending on the cloud provider used. The regions are only available in the tfplan
data because the Terraform plan must be generated first so this will runs as a post-plan check:
# Enforce a list of allowed locations / availability zones for each provider
package terraform
import input.tfplan as tfplan
allowed_locations = {
"aws": ["us-east-1", "us-east-2"],
"azurerm": ["eastus", "eastus2"],
"google": ["us-central1-a", "us-central1-b", "us-west1-a"]
}
array_contains(arr, elem) {
arr[_] = elem
}
get_basename(path) = basename{
arr := split(path, "/")
basename:= arr[count(arr)-1]
}
eval_expression(plan, expr) = constant_value {
constant_value := expr.constant_value
} else = reference {
ref = expr.references[0]
startswith(ref, "var.")
var_name := replace(ref, "var.", "")
reference := plan.variables[var_name].value
}
get_location(resource, plan) = aws_region {
# registry.terraform.io/hashicorp/aws -> aws
provider_name := get_basename(resource.provider_name)
"aws" == provider_name
provider := plan.configuration.provider_config[_]
"aws" = provider.name
region_expr := provider.expressions.region
aws_region := eval_expression(plan, region_expr)
} else = azure_location {
provider_name := get_basename(resource.provider_name)
"azurerm" == provider_name
azure_location := resource.change.after.location
} else = google_zone {
provider_name := get_basename(resource.provider_name)
"google" == provider_name
google_zone := resource.change.after.zone
}
deny[reason] {
resource := tfplan.resource_changes[_]
location := get_location(resource, tfplan)
provider_name := get_basename(resource.provider_name)
not array_contains(allowed_locations[provider_name], location)
reason := sprintf(
"%s: location %q is not allowed",
[resource.address, location]
)
}
Policy Mocks
Policy mocks in Scalr are very helpful in creating OPA policies as they present all of the tfrun
and tfplan
information that is possible to use within an OPA policy. A policy mock can be downloaded per run in the individual run dashboard:
More Resources
For examples of OPA policies, please visit our GitHub repository: Open Policy Agent Examples
To learn more about using the terraform plan data with OPA please review our Scalr OPA Blog series, in particular OPA Series Part 3: How to analyze the JSON plan.
To learn more about writing OPA policies for Terraform and Scalr we recommend the following resources:
Video
More of a visual learner? Check out this feature on YouTube.
Updated about 1 month ago