Agent policies enable automated installation and maintenance of the
Google Cloud Observability agents
across a fleet of Compute Engine VMs that match
user-specified criteria. With one command, you can create a policy for a
Google Cloud project that governs existing and new VMs associated with that
Google Cloud project, ensuring proper installation
and optional auto-upgrade of all Google Cloud Observability agents
on those VMs.
You create and manage agent policies by using the
gcloud beta compute instances ops-agents policies command group in the
Google Cloud CLI. The commands in this group use the VM Manager suite of tools
in Compute Engine to manage OS policies, which can automate the deployment and
maintenance of software configurations like the Google Cloud Observability
agents: the Ops Agent, the legacy Monitoring agent, and the legacy Logging agent.
Supported operating systems
You can apply an agent policy to Compute Engine VM instances running the
operating systems shown in the following table.
In the table, the agent columns map to an agent type specified to the
gcloud beta compute instances ops-agents policies create invocation:
- Logging agent maps to policies with agent type logging.
- Monitoring agent maps to policies with agent type metrics.
- Ops Agent maps to policies with agent type ops-agent.
Operating system | Logging agent | Monitoring agent | Ops Agent |
CentOS 7 | | | |
CentOS 8 | | | |
Rocky Linux 8 | | | |
RHEL 6 | | | |
RHEL 7: rhel-7, rhel-7-6-sap-ha, rhel-7-7-sap-ha, rhel-7-9-sap-ha | | 1 | |
RHEL 8: rhel-8, rhel-8-4-sap-ha, rhel-8-6-sap-ha, rhel-8-8-sap-ha | | 1 | |
Debian 9 (Stretch) | | | |
Debian 10 (Buster) | | | |
Debian 11 (Bullseye) | | | |
Ubuntu LTS 18.04 (Bionic Beaver): ubuntu-1804-lts, ubuntu-minimal-1804-lts | | | |
Ubuntu LTS 20.04 (Focal Fossa): ubuntu-2004-lts, ubuntu-minimal-2004-lts | | | |
Ubuntu LTS 22.04 (Jammy Jellyfish): ubuntu-2204-lts, ubuntu-minimal-2204-lts | | | |
SLES 12: sles-12, sles-12-sp5-sap | | | |
SLES 15: sles-15, sles-15-sp2-sap, sles-15-sp3-sap, sles-15-sp4-sap, sles-15-sp5-sap | | | |
OpenSUSE Leap 15: opensuse-leap (opensuse-leap-15-3-*, opensuse-leap-15-4-*) | | | |
Windows Server: 2016, 2019, 2022, Core 2016, Core 2019, Core 2022 | | | |
1 The Monitoring agent is not supported on rhel-7-9-sap-ha, rhel-8-2-sap-ha,
or rhel-8-4-sap-ha.
Create an agent policy
To create an agent policy by using the Google Cloud CLI, complete the following
steps:
If you haven't done so already, install the Google Cloud CLI.
This document describes the beta command group for managing agent policies.
If you haven't done so already, install the beta component of the
gcloud CLI:
gcloud components install beta
To check whether the beta component is installed, run:
gcloud components list
If you previously installed the beta component, ensure you have the
latest version:
gcloud components update
Use the following script to enable the APIs and to set the proper permissions
for using the Google Cloud CLI: set-permissions.sh.
For information about the script, see
What's the set-permissions.sh script doing?
Use the gcloud beta compute instances ops-agents policies create command
to create a policy. For the syntax of the command, see the
gcloud beta compute instances ops-agents policies create documentation.
For examples showing how to format the command, see the Examples section in
the Google Cloud CLI documentation.
For more information about the other commands in the command group and
the available options, see the
gcloud beta compute instances ops-agents policies documentation.
Best practices for using agent policies
To control the impact to production systems during rollout, we recommend that
you use instance labels and zones to filter the instances that the policy
applies to.
If you're creating a policy for the Ops Agent, ensure that your VMs
don't have the legacy Logging agent or Monitoring agent installed on them. Running the
Ops Agent and the legacy agents on the same VM can cause ingestion of
duplicate logs or a conflict in metrics ingestion. If necessary,
uninstall the Monitoring agent and uninstall the Logging agent
before creating a policy to install the Ops Agent.
Here is an example of a phased rollout plan for CentOS 7 VMs in a project
called my_project:
Phase 1: Create a policy named ops-agents-policy-safe-rollout to install the
Ops Agent on all VMs with the labels env=test and app=myproduct.
gcloud beta compute instances \
ops-agents policies create ops-agents-policy-safe-rollout \
--agent-rules="type=ops-agent,version=current-major,package-state=installed,enable-autoupgrade=true" \
--os-types=short-name=centos,version=7 \
--group-labels=env=test,app=myproduct \
--project=my_project
For more information about specifying the operating system, see
gcloud beta compute instances ops-agents policies create.
Phase 2: Update that policy to target VMs in a single zone that have the
labels env=prod and app=myproduct.
gcloud beta compute instances \
ops-agents policies update ops-agents-policy-safe-rollout \
--group-labels=env=prod,app=myproduct \
--zones=us-central1-c
Phase 3: Update that policy to clear the zones filter so that it rolls out
globally.
gcloud beta compute instances \
ops-agents policies update ops-agents-policy-safe-rollout \
--clear-zones
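Phases 2 and 3 differ only in the filters passed to the update command. The following is a minimal sketch of a helper that prints the command for a phase so that you can review it before running it. The function name rollout_update_command and its argument order are hypothetical; the flags are the ones shown in the phases above.

```shell
# Hypothetical helper: prints (does not run) the update command for a phase.
# Usage: rollout_update_command PHASE POLICY_ID [LABELS ZONES]
rollout_update_command() {
  base="gcloud beta compute instances ops-agents policies update $2"
  case "$1" in
    # Phase 2: narrow the policy to the given labels and zones.
    2) printf '%s --group-labels=%s --zones=%s\n' "$base" "$3" "$4" ;;
    # Phase 3: remove the zone filter.
    3) printf '%s --clear-zones\n' "$base" ;;
    *) echo "unsupported phase: $1" >&2; return 1 ;;
  esac
}

# Print the phase 2 command from the example rollout plan.
rollout_update_command 2 ops-agents-policy-safe-rollout env=prod,app=myproduct us-central1-c
```

Printing the command instead of running it keeps the sketch safe to try on a workstation that has no access to the project.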
Limitations
For a policy to take effect on VMs that predate OS Config, additional setup is
needed to ensure the OS Config agent that the policy relies on is installed on
the VMs. To install the OS Config agent on a fleet of VMs, complete the
following steps:
Ensure you have run the set-permissions.sh script in the
Create an agent policy section.
Identify the VMs on which you want to install the OS Config agent and list
them in a CSV file. For example, to get a list of VMs that aren't managed
by Google Kubernetes Engine, App Engine, or other Google Cloud services and then save
it in a file called instances.csv, run the following command:
gcloud compute instances list \
--filter="-labels.list(show="keys"):goog-" \
--format="csv(name,zone)" \
| grep -v -x -F -f <(gcloud compute instances os-inventory list-instances \
--format="csv(name,zone)") \
| sed 's/$/,update/' > instances.csv
The grep section filters out the VMs that already have the OS Config agent
installed and enabled. The VM-label exclusion based on goog- filters out
Compute Engine VMs managed by GKE, App Engine, and other services.
To further filter the instances by zones or labels, change the value
of the --filter flag to something similar to the following:
"-labels.list(show="keys"):goog- AND zone:(ZONE_1,ZONE_2) AND labels.KEY_1:VALUE_1 AND labels.KEY_2=VALUE_2"
To install the OS Config agent on Linux VMs, download and run the
mass-install-osconfig-agent.sh script.
The following command installs the OS Config agent on the VMs specified
in the instances.csv file in the specified project:
bash mass-install-osconfig-agent.sh --project PROJECT_ID --input-file instances.csv
For more information about using the script, see the comments in the
script.
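Before running the script against a large fleet, it can help to sanity-check the input file. The following is a minimal sketch that assumes each line of instances.csv has the three-field form name,zone,update produced by the command in the previous step. The function name check_instances_csv is hypothetical, and the script itself might accept other layouts.

```shell
# Hypothetical helper: reports lines that are not of the form name,zone,update.
# Exits with status 0 when every line is well formed, 1 otherwise.
check_instances_csv() {
  awk -F, '
    NF != 3 || $1 == "" || $2 == "" || $3 != "update" {
      printf "line %d is malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Example usage (not executed here):
#   check_instances_csv instances.csv && \
#     bash mass-install-osconfig-agent.sh --project PROJECT_ID --input-file instances.csv
```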
Troubleshooting
The ops-agents policy commands fail
When a gcloud beta compute instances ops-agents policies command fails, the
response shows a validation error. Correct the errors by fixing the command
arguments and flags as suggested by the error message.
In addition to the validation errors, you might see the following errors:
Insufficient IAM permission
A sample error looks like:
ERROR: (gcloud.beta.compute.instances.ops-agents.policies.command) PERMISSION_DENIED: Caller does not have required permission to command
Make sure you run the set-permissions.sh script in the
Create an agent policy section to grant the appropriate
osconfig.guestPolicy* IAM role.
To verify whether you have a sufficient OS Config guest policy role
for the project, you can run the following command. In this example,
the command checks if the user has the roles/osconfig.guestPolicyAdmin
role. The GCLOUD_MEMBER value must be in the format of user:USER_EMAIL
or serviceAccount:SERVICE_ACCOUNT_EMAIL.
gcloud projects get-iam-policy PROJECT_ID \
--filter=--member=GCLOUD_MEMBER \
| grep "roles/osconfig.guestPolicyAdmin" -B 2
The expected output is:
- members:
  - GCLOUD_MEMBER
  role: roles/osconfig.guestPolicyAdmin
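For scripted checks, an exit status is easier to act on than grep output. The following is a minimal sketch that reads an IAM policy in the default YAML layout of gcloud projects get-iam-policy from standard input and succeeds only if the given member appears in a binding for the given role. The function name has_binding is hypothetical, and the parsing assumes the two-space indentation shown in the expected output above.

```shell
# Hypothetical helper: has_binding ROLE MEMBER < policy.yaml
# Exits with status 0 if MEMBER is listed under a binding whose role is ROLE.
has_binding() {
  awk -v role="$1" -v member="$2" '
    /^- /        { found = 0 }                      # a new binding starts
    /^  - /      { if ($2 == member) found = 1 }    # a member of this binding
    $1 == "role:" { if ($2 == role && found) ok = 1 }
    END          { exit (ok ? 0 : 1) }
  '
}

# Example usage (not executed here):
#   gcloud projects get-iam-policy PROJECT_ID \
#     | has_binding roles/osconfig.guestPolicyAdmin user:USER_EMAIL
```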
OS Config API is not enabled
A sample error looks like:
API [osconfig.googleapis.com] not enabled on project PROJECT_ID.
Would you like to enable and retry (this will take a few minutes)?
(y/N)?
Make sure you run the set-permissions.sh script in the
Create an agent policy section to enable the required APIs and grant the
necessary permissions.
To verify whether the OS Config API is enabled for the project, you can run the
following command:
gcloud services list --project PROJECT_ID \
| grep osconfig.googleapis.com
The expected output is:
osconfig.googleapis.com Cloud OS Config API
The policy does not exist
A sample error looks like:
NOT_FOUND: Requested entity was not found
This error indicates that no policy with the specified ID exists, for example
because the policy has already been deleted. Make sure the policy ID in the
describe, update, or delete command maps to an existing policy.
The policy is created, but seems to have no effect
OS Config agents are deployed to each Compute Engine instance to manage the
packages for the Logging and Monitoring agents.
The policy may seem to have no effect if the underlying OS Config agent isn't
installed.
LINUX
To verify that the OS Config agent is installed, run the following command:
gcloud compute ssh INSTANCE_ID \
--project PROJECT_ID \
-- sudo systemctl status google-osconfig-agent
A sample output is:
google-osconfig-agent.service - Google OSConfig Agent
Loaded: loaded (/lib/systemd/system/google-osconfig-agent.service; enabled; vendor preset:
Active: active (running) since Wed 2020-01-15 00:14:22 UTC; 6min ago
Main PID: 369 (google_osconfig)
Tasks: 8 (limit: 4374)
Memory: 102.7M
CGroup: /system.slice/google-osconfig-agent.service
└─369 /usr/bin/google_osconfig_agent
WINDOWS
To verify that the OS Config agent is installed, complete the following steps:
Connect to your instance using RDP or a similar tool and log in to Windows.
Open a PowerShell terminal, then run the following PowerShell command. You
don't need administrator privileges.
Get-Service google_osconfig_agent
A sample output is:
Status Name DisplayName
------ ---- -----------
Running google_osconfig_a… Google OSConfig Agent
SUSE and Ubuntu Compute Engine instances don't have the OS Config agent
preinstalled, so you need to follow the OS Config agent installation
instructions to get the OS Config agent installed on those Compute Engine
instances.
The OS Config agent is installed, but it does not install the Ops agents
To verify if there are any errors when the OS Config agent applies policies, you
can check the OS Config agent's log. This can be done either by using
Logs Explorer or using SSH or RDP to check individual Compute Engine
instances.
To view OS Config agent logs in Logs Explorer, use the following filter:
resource.type="gce_instance"
logName="projects/PROJECT_ID/logs/OSConfigAgent"
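The same filter can also be used from a terminal. The following is a minimal sketch that builds the filter string for a project; the function name osconfig_log_filter is hypothetical, and the two clauses are joined with an explicit AND, which is how Logs Explorer treats clauses on separate lines.

```shell
# Hypothetical helper: prints the OS Config agent log filter for a project ID.
osconfig_log_filter() {
  printf 'resource.type="gce_instance" AND logName="projects/%s/logs/OSConfigAgent"\n' "$1"
}

# Example usage (not executed here), reading the 20 most recent entries:
#   gcloud logging read "$(osconfig_log_filter PROJECT_ID)" \
#     --project=PROJECT_ID --limit=20
```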
To view OS Config agent logs by using SSH for individual Compute Engine Linux
instances, run the following command:
CentOS / RHEL / SLES / SUSE
gcloud compute ssh INSTANCE_ID \
--project PROJECT_ID \
-- sudo cat /var/log/messages \
| grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
Debian / Ubuntu
gcloud compute ssh INSTANCE_ID \
--project PROJECT_ID \
-- sudo cat /var/log/syslog \
| grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
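The two commands differ only in the log file they read. The following is a minimal sketch that selects the file from an OS name, which can be convenient when a script loops over a mixed fleet. The function name osconfig_log_file and the lowercase OS names it accepts are hypothetical; the file paths are the ones used in the commands above.

```shell
# Hypothetical helper: prints the system log file to search for a given OS name.
osconfig_log_file() {
  case "$1" in
    centos|rhel|sles|opensuse*) echo /var/log/messages ;;
    debian|ubuntu)              echo /var/log/syslog ;;
    *) echo "unsupported OS name: $1" >&2; return 1 ;;
  esac
}

# Example usage (not executed here):
#   gcloud compute ssh INSTANCE_ID --project PROJECT_ID \
#     -- sudo cat "$(osconfig_log_file debian)" \
#     | grep "OSConfigAgent\|google-fluentd\|stackdriver-agent"
```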
To view OS Config agent logs by using RDP for individual Compute Engine Windows
instances, complete the following steps:
Connect to your instance using RDP or a similar tool and log in to Windows.
Open the Event Viewer app, and under Windows Logs => Application, search
for logs with Source equal to OSConfigAgent.
If there is an error connecting to the OS Config service, make sure you run the
set-permissions.sh script in the Create an agent policy section to set up the
metadata.
To verify that the OS Config metadata is enabled, you can run the following
command:
gcloud compute project-info describe \
--project PROJECT_ID \
| grep "enable-osconfig\|enable-guest-attributes" -A 1
The expected output is:
- key: enable-guest-attributes
value: 'TRUE'
- key: enable-osconfig
value: 'TRUE'
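To use this check in a script, you can turn the key and value pairs into an exit status. The following is a minimal sketch that reads the YAML output of gcloud compute project-info describe from standard input and succeeds only if both keys are set to TRUE. The function name osconfig_metadata_enabled is hypothetical, and treating the value as case-insensitive is an assumption made here, not something this document states.

```shell
# Hypothetical helper: exits with status 0 only if both enable-osconfig and
# enable-guest-attributes have the value TRUE in the metadata read from stdin.
osconfig_metadata_enabled() {
  awk -v q="'" '
    $1 == "-" && $2 == "key:" { key = $3; next }   # remember the current key
    $1 == "value:" {
      val = $2
      gsub(q, "", val)                             # drop the YAML single quotes
      # Assumption: accept TRUE in any letter case.
      if (toupper(val) == "TRUE" &&
          (key == "enable-osconfig" || key == "enable-guest-attributes"))
        seen[key] = 1
    }
    END { exit ((seen["enable-osconfig"] && seen["enable-guest-attributes"]) ? 0 : 1) }
  '
}

# Example usage (not executed here):
#   gcloud compute project-info describe --project PROJECT_ID \
#     | osconfig_metadata_enabled && echo "OS Config metadata is enabled"
```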
Observability agents are installed, but not functioning properly
For information about debugging specific agents, see the following documents:
Enable debug-level logs for the OS Config agent
It can be useful to enable debug-level logging in the OS Config agent when
reporting an issue.
You can set the osconfig-log-level: debug metadata to enable debug-level
logging for the OS Config agent. The collected logs have more information to
help with the investigation.
To enable debug-level logging for the entire project, run the following command:
gcloud compute project-info add-metadata \
--project PROJECT_ID \
--metadata osconfig-log-level=debug
To enable debug-level logging for one VM, run the following command:
gcloud compute instances add-metadata INSTANCE_ID \
--project PROJECT_ID \
--metadata osconfig-log-level=debug
What's the set-permissions.sh script doing?
Given a project ID, an Identity and Access Management (IAM) role, and an email or a
service account, the set-permissions.sh script performs the following actions:
- Enables the Cloud Logging API, the Cloud Monitoring API, and the
  OS Config API for the project.
- Grants the roles/logging.logWriter and the roles/monitoring.metricWriter
  roles to the Compute Engine default service account so that the agents can
  write logs and metrics to the Logging and Cloud Monitoring APIs.
- Enables the OS Config metadata for the project so that OS Config agents
  get activated on the VMs.
- Grants the specified IAM role to the gcloud user or the service account.
  Project owners have full access to create and manage a policy. For all other
  users or service accounts, project owners must grant one of the following
  roles:
  - roles/osconfig.guestPolicyAdmin: Provides full access to a policy.
  - roles/osconfig.guestPolicyEditor: Allows users to get, update, and list a
    policy.
  - roles/osconfig.guestPolicyViewer: Provides read-only access to get and
    list a policy.
  When running the script, you only need to specify the guestPolicy* part of
  the role name. The script supplies the roles/osconfig. part of the name.
The following invocation of the script enables the APIs, grants the necessary
roles to the default service account, and enables the OS Config metadata:
bash set-permissions.sh --project=PROJECT_ID
To use the script to also grant one of the OS Config roles to a user who does
not have the roles/owner (Owner) role on the project, run the script as
follows:
bash set-permissions.sh --project=PROJECT_ID \
--iam-user=USER_EMAIL \
--iam-permission-role=guestPolicy[Admin|Editor|Viewer]
To use the script to also grant one of the OS Config roles to a non-default
service account, run the script as follows:
bash set-permissions.sh --project=PROJECT_ID \
--iam-service-account=SERVICE_ACCT_EMAIL \
--iam-permission-role=guestPolicy[Admin|Editor|Viewer]
For more information, see the contents of the script.
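To illustrate how the short role name passed to --iam-permission-role relates to the full IAM role name, the following is a minimal sketch of the expansion. It is not taken from set-permissions.sh; the function name full_osconfig_role is hypothetical, and the accepted values are the three roles listed earlier in this section.

```shell
# Hypothetical helper: expands a short role name to the full OS Config role.
full_osconfig_role() {
  case "$1" in
    guestPolicyAdmin|guestPolicyEditor|guestPolicyViewer)
      echo "roles/osconfig.$1" ;;
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}

# Print the full role name for the editor role.
full_osconfig_role guestPolicyEditor
```

Rejecting anything outside the three known names catches typos before a role is granted.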
What's the diagnose.sh script doing?
Given a project, a Compute Engine instance ID, and an Ops agent policy ID, the
diagnose.sh script automatically collects the necessary information to help
diagnose issues with the policy:
- The OS Config agent version
- The underlying OS Config guest policy
- The policies that are applicable to this Compute Engine instance
- The agent package repos that are pulled on to a Compute Engine instance
Terraform support is built on top of the Google Cloud CLI commands. To create an
agent policy using Terraform, follow the Terraform module instruction.