This guide explains some of the issues that might arise when you use
the Monitoring API.
The Monitoring API is one of the Cloud APIs, which share a common set of
error codes. For a list of the error codes defined by the Cloud APIs and
general suggestions on handling the errors, see Handling errors.
Use APIs Explorer for debugging
APIs Explorer is a widget built into the reference pages for API methods.
It lets you invoke the method by filling out fields; it doesn't require you
to write code.
If you are having trouble with a method invocation, use the APIs Explorer
(Try this API) widget on the reference page for that method to
debug your problem. For more information, see APIs Explorer.
General API errors
Here are some of the Monitoring API errors and messages you might
see from your API calls:
404 NOT_FOUND with "The requested URL was not found on this server":
Some part of the URL is incorrect. Compare the URL against
the URL for the method shown on the method's reference page.
This error might mean that there is a spelling error,
such as "project" instead of "projects", or a capitalization error,
such as "TimeSeries" instead of "timeSeries".
403 PERMISSION_DENIED with "User is not authorized to access the project
(or metric)": This error code typically indicates an authorization problem,
but it can also mean that there is an error in the project ID or metric type
name. Verify the spelling and capitalization.
If you aren't using APIs Explorer, then try using it. When your API
call works in APIs Explorer, there is probably an authorization
issue in the environment where you're making the API call. Go to the
API manager page to verify that the Monitoring API is enabled for your
project.
400 INVALID_ARGUMENT with "Field filter had an invalid value": Verify the
spelling and formatting of the monitoring filter. For more information, see
Monitoring Filters.
400 INVALID_ARGUMENT with "Request was missing field interval.endTime":
You see this message when the end time is missing, or when it is present but
not properly formatted. If you are using APIs Explorer, then don't quote the
value of the time field.
Here are some examples of valid time specifications:
2024-05-11T01:23:45Z
2024-05-11T01:23:45.678Z
2024-05-11T01:23:45.678+05:00
2024-05-11T01:23:45.678-04:30
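As a minimal sketch, the following Python snippet produces strings in the same format as the examples above, using only the standard library. The helper name `to_api_timestamp` is made up for illustration and is not part of any Monitoring client library.

```python
from datetime import datetime, timedelta, timezone


def to_api_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like the examples above."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is explicit")
    text = moment.isoformat(timespec="milliseconds")
    # isoformat() writes UTC as "+00:00"; the examples above use "Z".
    return text.replace("+00:00", "Z")


utc_time = datetime(2024, 5, 11, 1, 23, 45, 678000, tzinfo=timezone.utc)
offset_time = datetime(
    2024, 5, 11, 1, 23, 45, 678000,
    tzinfo=timezone(timedelta(hours=-4, minutes=-30)),
)

print(to_api_timestamp(utc_time))     # 2024-05-11T01:23:45.678Z
print(to_api_timestamp(offset_time))  # 2024-05-11T01:23:45.678-04:30
```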
Missing results
When an API call returns the status code 200 and an empty response, consider
the following:
- When the call uses a filter, the filter might not have matched
  anything. The filter match is case-sensitive. To resolve filter
  problems, start by specifying only one filter component, such as
  metric.type, and verify that you get results. Add the other filter
  components one by one to build up your request.
- When working with a custom metric, verify that you specified the project
  that defines the metric.
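The one-component-at-a-time approach can be sketched as a small Python helper that prints each intermediate filter so you can try them in order. The function name and the two example components are illustrative assumptions; substitute the components from your own request.

```python
def incremental_filters(components):
    """Yield filters that grow by one component at a time."""
    for count in range(1, len(components) + 1):
        yield " AND ".join(components[:count])


# Example components (assumed for illustration only).
components = [
    'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
    'resource.labels.zone = "us-central1-a"',
]

for candidate in incremental_filters(components):
    # Try each filter in turn; the first one that returns nothing
    # points at the component that fails to match.
    print(candidate)
```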
There are several reasons why data points might be missing when you use the
timeSeries.list method:
- The data might have aged out. For more information, see Data retention.
- The data might not have propagated to Monitoring yet. For more
  information, see Latency of metric data.
- The interval is invalid:
  - Verify that the end time is correct.
  - Verify that the start time is correct and that it is earlier than the
    end time. When the start time is missing or malformed, the API sets the
    start time to the end time. For GAUGE metrics, this time interval only
    matches points whose start and end times are exactly the interval's
    end time. For CUMULATIVE or DELTA metrics, which measure across
    time intervals, no points are matched.
  For more information, see Time intervals.
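The interval checks above can be run before sending a request. The following is a sketch under stated assumptions: `check_interval` is a hypothetical helper, and the returned keys mirror the `interval.startTime` and `interval.endTime` field names used in requests.

```python
from datetime import datetime, timedelta, timezone


def check_interval(start: datetime, end: datetime) -> dict:
    """Return interval fields after checking the conditions listed above."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if start >= end:
        raise ValueError("start time must be earlier than end time")
    return {
        "interval.startTime": start.isoformat(),
        "interval.endTime": end.isoformat(),
    }


# Example: a 20-minute window that ends now.
end = datetime.now(timezone.utc)
params = check_interval(end - timedelta(minutes=20), end)
print(params)
```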
Retrying API errors
Two of the Cloud APIs error codes indicate circumstances in
which it might be useful to retry the request:
- 503 UNAVAILABLE: Retries are useful when the problem is a short-lived or
  transient condition.
- 429 RESOURCE_EXHAUSTED: Retries are useful, after a delay, for
  long-running background jobs with time-based quota such as n calls per
  t seconds. Retries aren't useful when the problem is a short-lived or
  transient condition, or when you've exhausted a volume-based quota. For
  transient conditions, consider tolerating the failure. For quota-related
  issues, consider reducing your quota usage or requesting a quota increase.
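One way to encode this guidance is a small decision function. This is a simplified sketch: the function name and the `time_based_quota` flag are assumptions for illustration, and the caller has to know which kind of quota applies because the status code alone doesn't say.

```python
def retry_advice(status_code: int, time_based_quota: bool = False) -> str:
    """Map an HTTP status code to the retry guidance described above."""
    if status_code == 503:
        # Short-lived or transient condition.
        return "retry with backoff"
    if status_code == 429:
        if time_based_quota:
            # Quota of the form "n calls per t seconds" recovers over time.
            return "retry after a delay"
        return "do not retry; reduce usage or request more quota"
    return "do not retry"


print(retry_advice(503))
print(retry_advice(429, time_based_quota=True))
print(retry_advice(429))
```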
When writing code that might retry requests, first ensure that the request
is safe to retry.
Is the request safe to retry?
If your request is idempotent, then it is safe to retry. An idempotent
action is one where any change in state does not depend on the current state.
For example:
- Reading x is idempotent; there is no change to the value.
- Setting x to 10 is idempotent; this might change the state, if the
  value isn't already 10, but it doesn't matter what the current value is.
  And it doesn't matter how many times you attempt to set the value.
- Incrementing x is not idempotent; the new value depends on the current
  value.
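The three cases above can be illustrated with a toy in-memory state. This is not Monitoring API code; the `state` dictionary and function names exist only for the demonstration.

```python
state = {"x": 3}


def read_x() -> int:
    return state["x"]


def set_x_to_10() -> None:
    state["x"] = 10


def increment_x() -> None:
    state["x"] += 1


# Repeating the idempotent write leaves the same final state.
set_x_to_10()
set_x_to_10()
print(read_x())  # 10

# Repeating the increment changes the result each time, so a retried
# increment after an ambiguous failure could be applied twice.
increment_x()
increment_x()
print(read_x())  # 12
```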
Retry with exponential backoff
When implementing code to retry requests, you don't want to rapidly issue
new requests indefinitely. If a system is overloaded, this approach
contributes to the problem.
Instead, use a truncated exponential backoff approach.
When requests fail because of transient overloads rather than true
unavailability, the solution is to reduce the load. A truncated exponential
backoff follows this general pattern:
- Establish how long you are willing to wait while retrying or how
  many attempts you are willing to make. When this limit is exceeded,
  consider the service unavailable and handle that
  condition appropriately for your application. This is what makes
  the backoff truncated; you stop retrying at some point.
- Retry the request with increasingly long pauses to back off the
  frequency of retries. Retry until the request succeeds or your
  established limit is reached.
- The interval is typically increased by some function of the power of
  the retry count, making it an exponential backoff.
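The pattern above can be sketched in Python as follows. The `TransientError` class, the function name, and the default values are assumptions chosen for illustration; a real application would catch whatever exception its client raises for a retryable error and pick limits that suit it.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts - 1:
                # Limit reached: treat the service as unavailable.
                raise
            # The pause doubles with each attempt: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so that the pauses can be observed or skipped in tests. Only pass requests that are safe to retry.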
There are many ways to implement an exponential backoff.
The following is an example that adds an increasing backoff delay
to a minimum delay of 1000ms. The initial backoff delay is 2ms, and it
increases to 2^retry_count ms with each attempt.
The following table shows the retry intervals using the initial values:
- Minimum delay = 1s = 1000ms
- Initial backoff = 2ms
| Retry count | Additional delay (ms) | Retry after (ms) |
|---|---|---|
| 0 | 2^0 = 1 | 1001 |
| 1 | 2^1 = 2 | 1002 |
| 2 | 2^2 = 4 | 1004 |
| 3 | 2^3 = 8 | 1008 |
| 4 | 2^4 = 16 | 1016 |
| ... | ... | ... |
| n | 2^n | 1000 + 2^n |
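The table values follow directly from the formula in its last row. A short Python check, with the constant and function names chosen here for illustration:

```python
MINIMUM_DELAY_MS = 1000


def retry_after_ms(retry_count: int) -> int:
    """Minimum delay plus an additional delay of 2^retry_count ms."""
    return MINIMUM_DELAY_MS + 2 ** retry_count


for n in range(5):
    # Columns: retry count, additional delay (ms), retry after (ms)
    print(n, 2 ** n, retry_after_ms(n))
```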
You can truncate the retry cycle by stopping either after n attempts
or when the time spent exceeds a reasonable value for your application.
For more information, see the Wikipedia article Exponential backoff.