Lifecycle of an incident
Google Maps Platform adheres to the Google Cloud Platform Incident Management framework.
When an outage or service degradation occurs, the product engineering team and the
Google Maps Platform Support team work together to resolve the incident and communicate it to you.
Detection
Google uses internal and black box monitoring to detect incidents and trigger alerts to our
engineers for investigation. For more information, see
Chapter 6 of the Site Reliability Engineering book
.
If you detect an incident that has not yet been reported in the Issue Tracker,
go to the Google Maps Platform Support Create a Case page
(in the Google Cloud Console) and create a new support case.
Initial Response
When Google detects an incident, the Support team leads communication with you. Initial
notification of an incident is often sparse, frequently only mentioning the product
in question along with key symptoms. This is because we prioritize fast notification over detail.
As we learn more, additional details are provided in subsequent updates.
Incident communication channels
To provide the appropriate amount of information, the Google Maps Platform Support team offers
different incident communication channels, depending on the scope and severity of an issue:
The Maps Public Status Dashboard
is the first place to check when you discover that an issue is affecting you. The dashboard
shows incidents that affect many customers, so if you see an incident listed, it is likely related to
your problem. To indicate severity, the status dashboard marks each incident as a service outage,
a service disruption, or service information.
The Google Maps Platform Notifications Group
is a public Google group where all
widespread outages are reported, in addition to other technical updates about Google Maps Platform
APIs. All group members receive an email notification when an outage is first detected, followed by
subsequent updates until the issue is resolved.
The Maps Platform status card is an informational message that is always visible in the
Maps Support
section of the Cloud Console, showing the current status of Maps Platform APIs and
services. When there is an active incident, the card shows a message that identifies the affected product
and links to the Maps Public Status Dashboard, where you can see active incidents.
The
Issue Tracker
contains a reference list of all known incidents. You can view open incidents, follow their
progress by
subscribing
to them, and add comments to help our teams investigate. You can also find
the link to the Issue Tracker in the Google Maps Platform
support documentation
.
Support cases are used if the issue might be isolated to your project(s) or impacts a
limited number of customers. If no incident has been declared, but you are still experiencing an
issue, go to the
Google Maps Platform Support Create a Case page
(in the
Cloud Console) and create a new support case.
Investigation
Product engineering teams are responsible for investigating the root cause of incidents.
Incident management is often done by Site Reliability Engineers but might be done by
software engineers or others, depending on the situation and product. For more information, see
Chapter 12 of the Site Reliability Engineering Book
.
Mitigation/Fix
An issue is considered
fixed
only when changes have been made that Google is
confident will end the impact indefinitely. For example, the fix could be rolling back
a change that triggered an incident.
While an incident is in progress, the Support and Product teams will attempt to
mitigate
the issue. Mitigation occurs when the impact or scope of an issue can be reduced,
for example by temporarily providing additional resources to a service suffering overload.
If no mitigation has been found, the Support team will, when possible, find and communicate
workarounds. Workarounds are steps that you can take to meet the underlying need
despite the incident. For example, a workaround might be to use different settings for an API call to
avoid a problematic code path.
Follow Up
While an incident is ongoing, the Support team provides regular updates. Updates typically provide:
- More information about the incident, such as error messages, which features are affected,
and how widespread it is.
- Progress towards mitigation, including any workarounds.
- Timelines for communication, tailored to the incident.
- Changes in status, such as when an incident is fixed.
Postmortem
All incidents result in an internal postmortem (post-incident analysis) to fully
understand the incident and to identify reliability improvements that Google can make. These
improvements are then tracked and implemented. For more information on postmortems at Google,
see
Chapter 15 of the Site Reliability Engineering Book
.
Incident Report
When incidents have very wide and serious impact, Google provides incident reports that
outline the symptoms, impact, root cause, remediation, and future prevention of incidents.
As with postmortems, we pay particular attention to the steps that we take to learn from
the issue and improve reliability. Google's goal in writing and releasing postmortems is to
be transparent and demonstrate our commitment to building stable services for our customers.
FAQ
I want to get notified when there’s an ongoing outage. What should I do?
- Join the Google Maps Platform Notifications group
to get notified of ongoing
issues and to follow the progress of the incident in real time. This group will also help you
stay up to date with product and platform announcements.
- Use the
RSS Feed
or
JSON History
links at the bottom of the
Maps Public Status Dashboard
to view a feed of current and past incidents.
Every post to the Dashboard will trigger a post to the feed.
To keep you updated, each post to the feed will include all the messages and updates
pertaining to the corresponding Dashboard event. That way you won't need to dig through your
feed history to piece together how things are progressing.
RSS feeds are published in XML format. Browser extensions such as
RSS Subscription Extension (by Google)
allow you to preview the feed content and subscribe
through your favorite RSS reader. JSON History is a
JSON Web Feed
of past
incidents. A range of software libraries and web frameworks
support
content syndication via JSON Feed.
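As a rough illustration of consuming the JSON History feed, the following Python sketch summarizes incidents from feed text that has already been downloaded. The field names (`id`, `begin`, `end`, `updates`, `when`, `text`) and the sample payload are assumptions made for this example, not a documented schema, so inspect the actual feed before relying on them.

```python
import json
from datetime import datetime

# Hypothetical sample payload; the real feed's field names may differ.
SAMPLE_FEED = """
[
  {"id": "example-1", "begin": "2024-01-01T10:00:00+00:00", "end": null,
   "updates": [{"when": "2024-01-01T10:05:00+00:00", "text": "Investigating."}]},
  {"id": "example-2", "begin": "2023-12-01T08:00:00+00:00",
   "end": "2023-12-01T09:30:00+00:00",
   "updates": [{"when": "2023-12-01T09:30:00+00:00", "text": "Resolved."},
               {"when": "2023-12-01T08:10:00+00:00", "text": "Investigating."}]}
]
"""


def summarize_incidents(feed_text):
    """Return one summary dict per incident found in the feed text."""
    summaries = []
    for incident in json.loads(feed_text):
        updates = incident.get("updates", [])
        # Choose the most recent update by timestamp, not by list position.
        latest = max(
            updates,
            key=lambda u: datetime.fromisoformat(u["when"]),
            default=None,
        )
        summaries.append({
            "id": incident.get("id"),
            # Assumption: an incident with no end time is still ongoing.
            "ongoing": incident.get("end") is None,
            "latest_update": latest["text"] if latest else None,
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize_incidents(SAMPLE_FEED):
        print(summary)
```

Because each feed post is described above as carrying all updates for its Dashboard event, a consumer along these lines only needs the latest copy of the feed rather than a stored history.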
What type of status information can I find on the dashboard home page?
The Google Maps Public Status Dashboard
provides information on APIs and services that are part
of Google Maps Platform. If there is an active incident, information will be posted here for
each affected API and service within Google Maps Platform. A status indicator is always
shown for each API and service, representing its overall health as one of the following:
- Service Outage: A production system or service is down. A workaround is not
available or is not easily implemented.
- Service Disruption: A production system or service is partially impaired
and/or does not work as expected. A workaround exists.
- Service Information: A production system or
service is partially impaired and/or does not work as expected. Generally, the service is
still available, and the impact is minor and affects a small number of users.
- Available: Service is fully functional and working as expected.
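If you track these indicators in your own tooling, one possible way to model them is as an ordered enumeration. The sketch below is illustrative only: the numeric ordering is inferred from the definitions above, and the service names in the usage example are hypothetical placeholders rather than values returned by any Google API.

```python
from enum import IntEnum


class Status(IntEnum):
    """Dashboard status indicators, ordered from least to most severe."""
    AVAILABLE = 0
    SERVICE_INFORMATION = 1
    SERVICE_DISRUPTION = 2
    SERVICE_OUTAGE = 3


def worst_status(statuses):
    """Return the most severe status in a {service: Status} mapping.

    An empty mapping is treated as AVAILABLE.
    """
    return max(statuses.values(), default=Status.AVAILABLE)


if __name__ == "__main__":
    # Hypothetical service names, for illustration only.
    current = {
        "service-a": Status.AVAILABLE,
        "service-b": Status.SERVICE_DISRUPTION,
    }
    print(worst_status(current).name)
```

Using an `IntEnum` keeps severity comparisons explicit, so a rule such as "alert on anything at or above a disruption" becomes a simple `>=` check.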
Is the dashboard real-time?
The Maps Public Status Dashboard is intended to provide a near real-time status of products
that are generally available and covered by the Google Maps Platform SLA. All incidents are
verified before posting, so there may be a slight delay from the time they were first
detected. For this reason, the dashboard should not be used for uptime-tracking purposes.
The Maps Public Status Dashboard is not intended for monitoring the status of GMP services
based on the
GMP SLA
since the outage durations shown in the dashboard may not reflect actual "Downtime"
(as defined in the SLA) for your project, especially for lower-severity incidents.
Furthermore, the durations shown may include additional time after the issue was mitigated to
fully confirm the fix.
To monitor API usage, create dashboards, and create alerts, visit
Google Maps Platform Monitoring
.
What if I don't see an incident on the dashboard?
Not all customers and projects are impacted by every incident. Only broad and severe incidents
are reflected on the dashboard. If you experience an issue that’s not listed on the dashboard,
contact Support
.
The
History
page in the Maps Public Status Dashboard is a repository of disruptions and outages from the
past 365 days. Click an incident to review the posts about the incident while it was ongoing,
as well as any incident reports published by the Support team.
Who updates the dashboard?
The global Google Maps Platform Support team monitors the status of services using many
different types of signals and updates the dashboard in the event of a widespread issue. If
needed, they will also post a detailed analysis report after an incident has been resolved.
What is the difference between an "incident" and an "outage"?
Although these terms are often used interchangeably, the Maps Public Status Dashboard and our
external communications use "incident" to refer to any period of degraded service and "outage"
to refer only to the most serious impairment, where a service is nonfunctioning to the extent
that it renders our customers' experience effectively useless.