Using cached query results
BigQuery writes all query results to a table. The table is either
explicitly identified by the user (a destination table), or it is a temporary,
cached results table. If you run the exact same query again,
BigQuery returns the results from
the cached table, if it exists.
Temporary, cached results tables are maintained per-user,
per-project. Depending on your edition, you might have access to
cached results from other users
running queries in the
same project. There are no storage costs for cached query result tables, but if
you write query results to a permanent table, you are charged for
storing
the data.
All query results, including both
interactive and batch queries
,
are cached in temporary tables for approximately 24 hours with some
exceptions
.
Limitations
Using the query cache is subject to the following limitations:
- When you run a duplicate query, BigQuery attempts to reuse
cached results. To retrieve data from the cache, the duplicate query text must
be the same as the original query.
- For query results to persist in a cached results table, the result set must be
smaller than the maximum response size. For more information about managing
large result sets, see
Returning large query results
.
- You cannot target cached result tables with
DML
statements.
- Although current semantics allow it, the use of cached results as
input for dependent jobs is discouraged. For example, you shouldn't
submit query jobs that retrieve results from the cache table. Instead, write
your results to a named destination table. To simplify cleanup, features such
as the dataset level
defaultTableExpirationMs
property can expire the data
automatically after a given duration.
Pricing and quotas
Cached query results are stored as temporary tables. You aren't charged for the
storage of cached query results in temporary tables. When query results are
retrieved from a cached results table, the job statistics property
statistics.query.cacheHit
returns as
true
, and you are not charged for the
query. Though you are not charged for queries that use cached results, the
queries are subject to the BigQuery
quota policies
.
In addition to reducing costs, queries that use cached results are
significantly faster because BigQuery does not need to compute
the result set.
Exceptions to query caching
Query results are not cached:
- When a destination table is specified in the job configuration, the
Google Cloud console, the bq command-line tool, or the API.
- If any of the referenced tables or logical views have changed since the
results were previously cached.
- When any of the tables referenced by the query have recently received
streaming inserts (table has data in the write-optimized storage) even if no new
rows have arrived.
- If the query uses non-deterministic functions; for example, date and time
functions such as
CURRENT_TIMESTAMP()
and
CURRENT_DATE
, and
other functions such as
SESSION_USER()
, it returns different values
depending on when a query is executed.
- If you are querying multiple tables using a
wildcard
.
- If the cached results have expired; typical cache lifetime is 24 hours, but
the cached results are best-effort and may be invalidated sooner.
- If the query runs against an
external data source
.
other than Cloud Storage. (GoogleSQL
queries on Cloud Storage are supported by cached query results.)
- If the query runs against a table protected by
column-level security
,
results may not be cached.
- If the query runs against a table protected by
row-level security
,
then the results are not cached.
How cached results are stored
When you run a query, a temporary, cached results table is created in a special
type of
hidden dataset
referred to as an
anonymous dataset
.
Unlike regular datasets which
inherit permissions from the IAM resource hierarchy model (project and
organization permissions), access to anonymous datasets is restricted to the
owner. The owner of an anonymous dataset is the user who ran the query that
produced the cached result. In addition, the
bigquery.jobs.create
permission
is checked on the project to verify that the user has access to the project.
BigQuery doesn't support sharing anonymous datasets. If you
intend to share
query results, don't use the cached results stored in an anonymous dataset.
Instead, write the results to a named destination table.
Although the user that runs the query has full access to the dataset and the
cached results table, using them as inputs for dependent jobs is discouraged.
The names of anonymous datasets begin with an underscore. This hides them from
the datasets list in the Google Cloud console. You can list anonymous datasets
and audit anonymous dataset access controls by using the bq command-line tool or the API.
For more information about listing and getting information about datasets,
including anonymous datasets,
see
Listing datasets
.
Cross-user caching
If you are using the Enterprise or Enterprise Plus
edition
and have the required
permissions to execute a query that is cached in your project for another user,
then BigQuery produces the cached result. The cached result is
copied into your personal anonymous dataset and remains there for 24 hours from
when you ran the query. The same limits and exceptions for single-user caching
apply to cross-user caching.
Disabling retrieval of cached results
The
Use cached results
option reuses results from a previous run of the
same query unless the tables being queried have changed. Using cached results is
only beneficial for repeated queries. For new queries, the
Use cached
results
option has no effect, though it is enabled by default.
When you repeat a query with the
Use cached results
option disabled,
the existing cached result is overwritten. This requires BigQuery
to compute the query result, and you are charged for the query. This is
particularly useful in benchmarking scenarios.
If you want to disable retrieving cached results and force live evaluation of a
query job, you can set the
configuration.query.useQueryCache
property of your query job to
false
.
To disable the
Use cached results
option:
Console
Open the Google Cloud console.
Go to the BigQuery page
Click
Compose new query
.
Enter a valid SQL query in the
Query editor
text area.
Click
More
and select
Query settings
.
For
Cache preference
, clear
Use cached results
.
bq
Use the
nouse_cache
flag to overwrite the query cache. The following
example forces BigQuery to process the query without using the
existing cached results:
bq query \
--nouse_cache \
--batch \
'SELECT
name,
count
FROM
`my-project`.mydataset.names_2013
WHERE
gender = "M"
ORDER BY
count DESC
LIMIT
6'
API
To process a query without using the existing cached results, set the
useQueryCache
property to
false
in the
query
job configuration.
Ensuring use of the cache
If you use the
jobs.insert
method
to run a query, you can force a query job to fail unless cached results can be
used by setting the
createDisposition
property of the
query
job
configuration to
CREATE_NEVER
.
If the query result does not exist in the cache, a
NOT_FOUND
error is
returned.
bq
Use the
--require_cache
flag to require results from the query cache. The
following example forces BigQuery to process the query if its
results exist in the cache:
bq query \
--require_cache \
--batch \
'SELECT
name,
count
FROM
`my-project`.mydataset.names_2013
WHERE
gender = "M"
ORDER BY
count DESC
LIMIT
6'
API
To process a query with existing cached results, set the
createDisposition
property
to
CREATE_NEVER
in the
query
job configuration.
Verifying use of the cache
There are two ways to determine if BigQuery returned a result
using the cache:
- Using the Google Cloud console
. Go to
Query results
and click
Job Information
.
Bytes processed
shows
0 B (results cached)
.
- Using the
BigQuery API
.
The
cacheHit
property in the query result is set to
true
.
Impact of column-level security
By default, BigQuery caches query results for 24 hours, with the
exceptions
noted previously. Queries against a table
protected by
column-level security
might not be cached. If
BigQuery does cache the result, the 24-hour cache lifetime
applies.
A change such as removing a group or a user from the
Data Catalog Fine Grained Reader
role used for a policy tag does
not invalidate the 24-hour cache. A change to the
Data Catalog Fine
Grained Reader
access control group itself is propagated immediately, but the
change does not invalidate the cache.
The impact is if a user ran a query, the query results remain visible to the
user on screen. The user can also retrieve those results from the cache even if
they lost access to the data within the last 24 hours.
During the 24 hours after a user is removed from the
Data Catalog
Fine Grained Reader
role for a policy tag, the user can access the cached
data only for data that the user was previously allowed to see. If rows are
added to the table, the user can't see the added rows, even if the results
are cached.