This document describes how you can request asynchronous (non-HTTPS)
background functions to retry on failure.
Semantics of retry
Cloud Functions guarantees at-least-once execution of an event-driven function
for each event emitted by an event source. By default, if a function
invocation terminates with an error, the function is not invoked again
and the event is dropped. When you enable retries on an event-driven
function, Cloud Functions retries a failed function invocation until
it completes successfully or the retry window expires.
For 2nd gen functions,
this retry window expires after 24 hours. For 1st gen functions, it expires
after 7 days. Cloud Functions retries newly created event-driven functions using
an exponential backoff strategy, with an increasing backoff of between 10
and 600 seconds. This policy is applied to new functions
the first time you deploy them. It is not retroactively applied to existing
functions that were first deployed before the changes described in
this release note
took effect, even if you redeploy the functions.
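To make the backoff behaviour concrete, the following is a minimal sketch of how a delay bounded between 10 and 600 seconds could grow from one attempt to the next. The 10 and 600 second bounds come from the text above; the doubling factor and the name backoffDelaySeconds are assumptions made for illustration, and the real schedule used by the service may differ.

```javascript
// Bounds taken from the text above.
const MIN_BACKOFF_SECONDS = 10;
const MAX_BACKOFF_SECONDS = 600;

// Returns the delay before retry number `attempt` (1-based).
// The growth factor of 2 is an illustrative assumption, not a documented value.
function backoffDelaySeconds(attempt, factor = 2) {
  const delay = MIN_BACKOFF_SECONDS * Math.pow(factor, attempt - 1);
  return Math.min(delay, MAX_BACKOFF_SECONDS);
}

const schedule = [1, 2, 3, 4, 5, 6, 7, 8].map((n) => backoffDelaySeconds(n));
console.log(schedule); // [10, 20, 40, 80, 160, 320, 600, 600]
```

Under these assumptions the delay reaches the 600 second ceiling on the seventh attempt and stays there for the rest of the retry window.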
When retries are not enabled for a function, which is the default, the function
always reports that it executed successfully, and 200 OK response codes might
appear in its logs. This occurs even if the function encountered an error. To
make it clear when your function encounters an error, be sure to report errors
appropriately.
Why event-driven functions fail to complete
On rare occasions, a function might exit prematurely due to an internal error,
and by default the function might or might not be automatically retried.
More typically, an event-driven function might fail to successfully complete due
to errors thrown in the function code itself. The reasons this might
happen include:
- The function contains a bug and the runtime throws an exception.
- The function cannot reach a service endpoint, or times out while trying to
do so.
- The function intentionally throws an exception (for example, when a parameter
fails validation).
- A Node.js function returns a rejected promise, or passes a non-null value to
a callback.
In any of the above cases, the function stops executing by default and the event
is discarded. To retry the function when an error occurs, you can
change the default retry policy by setting the "retry on failure" property.
This causes the event to be retried repeatedly until the
function successfully completes or the retry window expires.
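The failure modes listed above can be sketched in Node.js as follows. The handler names and the simplified (event, callback) signature are hypothetical and chosen only for illustration; they are not the exact signatures of any particular runtime version.

```javascript
// A bug or a failed validation surfaces as a thrown exception.
function validateAndProcess(event) {
  if (!event || typeof event.data !== 'string') {
    throw new Error('Invalid event payload');
  }
  return event.data.toUpperCase();
}

// A promise-based handler signals failure by returning a rejected promise.
function promiseHandler(event) {
  return Promise.resolve(event).then(validateAndProcess);
}

// A callback-based handler signals failure by passing a non-null value
// as the first argument of the callback.
function callbackHandler(event, callback) {
  let result;
  try {
    result = validateAndProcess(event);
  } catch (err) {
    callback(err);
    return;
  }
  callback(null, result);
}
```

With retries disabled, each of these failures ends the invocation and the event is dropped. With retries enabled, each one causes the same event to be delivered again.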
Enable or disable retries
If you're creating a new function:
- From the Create Function screen, select Add trigger and choose the type of
event to act as a trigger for your function.
- In the Eventarc trigger pane, select the Retry on failure checkbox to enable
retries.
If you're updating an existing function:
- From the Cloud Functions Overview page, click the name of the function you're
updating to open its Function details screen, then choose Edit from the menu
bar to display the HTTPS and Eventarc trigger panes.
- In the Eventarc trigger pane, click the edit icon to edit your trigger's
settings.
- In the Eventarc trigger pane, select or clear the Retry on failure checkbox
to enable or disable retries.
With Cloud Functions for Firebase, you can enable retries in the code for a
function. To do this for a background function such as
functions.foo.onBar(myHandler), use runWith and configure a failure policy:
functions.runWith({failurePolicy: true}).foo.onBar(myHandler);
Setting true as shown configures a function to retry on failure.
Best practices
This section describes best practices for using retries.
Use retry to handle transient errors
Because your function is retried continuously until successful execution,
permanent errors like bugs should be eliminated from your code through testing
before enabling retries. Retries are best used to handle intermittent/transient
failures that have a high likelihood of resolution upon retrying, such as a
flaky service endpoint or timeout.
Set an end condition to avoid infinite retry loops
It is best practice to protect your function against continuous looping when
using retries. You can do this by including a well-defined end condition
before the function begins processing. Note that this technique only works if
your function starts successfully and is able to evaluate the end condition.
A simple yet effective approach is to discard events with timestamps older than
a certain time. This helps to avoid excessive executions when failures are
either persistent or longer-lived than expected.
For example, this code snippet discards all events older than 10 seconds:
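// Age of the event, measured from the timestamp set by the event source.
const eventAgeMs = Date.now() - Date.parse(event.timestamp);
const eventMaxAgeMs = 10000;
if (eventAgeMs > eventMaxAgeMs) {
  // Too old: finish successfully without processing, so no retry is scheduled.
  console.log(`Dropping event ${event} with age[ms]: ${eventAgeMs}`);
  callback();
  return;
}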
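The age check can also be factored into a small helper so that it is easy to test. This is a minimal sketch: the name isEventExpired and the injectable now parameter are assumptions made for illustration. It also treats an unparseable timestamp as expired, which is a design choice rather than documented behaviour; without it, a NaN age would never exceed the limit and the event would never be dropped.

```javascript
// Returns true when the event is older than maxAgeMs and should be dropped.
// `now` defaults to the current time and can be overridden in tests.
function isEventExpired(timestamp, maxAgeMs, now = Date.now()) {
  const eventTimeMs = Date.parse(timestamp);
  if (Number.isNaN(eventTimeMs)) {
    // Assumption: an event whose age cannot be determined is not retried.
    return true;
  }
  return now - eventTimeMs > maxAgeMs;
}

// Example: evaluate two events against a fixed clock and a 10 second limit.
const fixedNow = Date.parse('2024-01-01T00:00:20Z');
console.log(isEventExpired('2024-01-01T00:00:00Z', 10000, fixedNow)); // true
console.log(isEventExpired('2024-01-01T00:00:15Z', 10000, fixedNow)); // false
```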
Use catch with Promises
If your function has retries enabled, any unhandled error will trigger a retry.
Make sure that your code captures any errors that should not result in a retry.
Here is an example of what you should do:
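return doFooAsync().catch((err) => {
  if (isFatal(err)) {
    // Fatal errors are logged and swallowed so that they do not trigger a retry.
    console.error(`Fatal error ${err}`);
    return;
  }
  // Any other error is passed on so that the invocation is retried.
  return Promise.reject(err);
});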
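The snippet above leaves isFatal undefined. One possible shape for it is sketched below, assuming that errors carry a code property and that the listed codes represent permanent failures. Both the property and the code values are hypothetical, so adapt them to the errors your own dependencies produce. The wrapper runWithRetryFilter is likewise an illustrative name.

```javascript
// Hypothetical error codes treated as permanent failures.
const FATAL_CODES = new Set(['INVALID_ARGUMENT', 'NOT_FOUND', 'PERMISSION_DENIED']);

// Returns true for errors that retrying cannot fix.
function isFatal(err) {
  return Boolean(err) && FATAL_CODES.has(err.code);
}

// Runs an async operation. Fatal errors are logged and swallowed (no retry);
// all other errors are rethrown so that the invocation is retried.
function runWithRetryFilter(operation) {
  return Promise.resolve()
    .then(operation)
    .catch((err) => {
      if (isFatal(err)) {
        console.error(`Fatal error, not retrying: ${err.message}`);
        return undefined;
      }
      throw err;
    });
}
```

A handler would then return runWithRetryFilter(() => doFooAsync()), so that only errors worth retrying reach the runtime as a rejected promise.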
Make retryable event-driven functions idempotent
Event-driven functions that can be retried must be idempotent. Here are some
general guidelines for making such a function idempotent:
- Many external APIs (such as Stripe) let you supply an idempotency key
as a parameter. If you are using such an API, you should use the event ID as
the idempotency key.
- Idempotency works well with at-least-once delivery, because it makes it safe to
retry. So a general best practice for writing reliable code is to combine
idempotency with retries.
- Make sure that your code is internally idempotent. For example:
- Make sure that mutations can happen more than once without changing the
outcome.
- Query database state in a transaction before mutating the state.
- Make sure that all side effects are themselves idempotent.
- Impose a transactional check outside the function, independent of the code.
For example, persist state somewhere recording that a given event ID has
already been processed.
- Deal with duplicate function calls out-of-band. For example, have a separate
cleanup process that cleans up after duplicate function calls.
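As an illustration of recording which event IDs have already been processed, here is a minimal sketch. The in-memory Set stands in for a durable store and is an assumption made to keep the example self-contained: a deployed function would need a transactional database instead, because memory is not shared across instances and does not survive restarts. The names processOnce and processedEventIds are hypothetical.

```javascript
// In-memory stand-in for a durable, transactional store (assumption).
const processedEventIds = new Set();

// Runs sideEffect at most once per event ID.
// Returns true if the side effect ran, false if the event was a duplicate.
function processOnce(eventId, sideEffect) {
  if (processedEventIds.has(eventId)) {
    return false;
  }
  // If sideEffect throws, the ID is not recorded, so a retry can run it again.
  sideEffect(eventId);
  processedEventIds.add(eventId);
  return true;
}

// Example: the same event delivered twice triggers the side effect once.
let chargeCount = 0;
const charge = () => { chargeCount += 1; };
console.log(processOnce('event-123', charge)); // true
console.log(processOnce('event-123', charge)); // false
console.log(chargeCount); // 1
```

Because the ID is recorded only after the side effect succeeds, a crash between the two steps can still produce a duplicate call. This is why the side effect itself should also be idempotent, for example by passing the event ID as the idempotency key to an external API that supports one.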
Configure the retry policy
Depending on the needs of your Cloud Function, you may want to configure the
retry policy directly. This would allow you to set up any combination of the
following:
- Shorten the retry window from 7 days to as little as 10 minutes.
- Change the minimum and maximum backoff time for the exponential backoff
retry strategy.
- Change the retry strategy to retry immediately.
- Configure a dead-letter topic.
- Set a maximum and minimum number of delivery attempts.
To configure the retry policy:
- Write an HTTP function.
- Use the Pub/Sub API to create a Pub/Sub subscription, specifying the URL of
the function as the target.
See Pub/Sub documentation on handling failures for more information on
configuring Pub/Sub directly.
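The options listed above could be gathered into a single subscription configuration, as in the sketch below. The helper buildPushSubscriptionOptions and its parameter names are hypothetical. The field names in the returned object are intended to mirror the Pub/Sub subscription resource (pushConfig, retryPolicy, deadLetterPolicy, messageRetentionDuration), and using messageRetentionDuration to express the retry window is an assumption, so verify all of them against the Pub/Sub reference before relying on them. The default values are illustrative only.

```javascript
// Builds an options object for a push subscription that targets an HTTP function.
function buildPushSubscriptionOptions({
  endpointUrl,
  deadLetterTopic,
  minBackoffSeconds = 10,
  maxBackoffSeconds = 600,
  retentionSeconds = 600, // 10 minutes, the shortest window mentioned above
  maxDeliveryAttempts = 5,
}) {
  if (minBackoffSeconds > maxBackoffSeconds) {
    throw new RangeError('minBackoffSeconds must not exceed maxBackoffSeconds');
  }
  return {
    // URL of the HTTP function that receives each message.
    pushConfig: {pushEndpoint: endpointUrl},
    // How long an unacknowledged message stays eligible for redelivery.
    messageRetentionDuration: {seconds: retentionSeconds},
    // Bounds for the exponential backoff between delivery attempts.
    retryPolicy: {
      minimumBackoff: {seconds: minBackoffSeconds},
      maximumBackoff: {seconds: maxBackoffSeconds},
    },
    // Where messages go after too many failed delivery attempts.
    deadLetterPolicy: {deadLetterTopic, maxDeliveryAttempts},
  };
}

// Example with placeholder project, region, and topic names.
const subscriptionOptions = buildPushSubscriptionOptions({
  endpointUrl: 'https://REGION-PROJECT_ID.cloudfunctions.net/myHttpFunction',
  deadLetterTopic: 'projects/PROJECT_ID/topics/my-dead-letter-topic',
});
console.log(JSON.stringify(subscriptionOptions, null, 2));
```

An object of this shape would then be supplied when creating the subscription through the Pub/Sub API or one of its client libraries.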