Create materialized view replicas
This document describes how to create
materialized view replicas
in BigQuery. You can use materialized view replicas to make
Amazon Simple Storage Service (Amazon S3) data available locally for joins.
Before you begin
-
Sign in to your Google Cloud account. If you're new to
Google Cloud,
create an account
to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page,
select or
create a Google Cloud project
.
Go to project selector
-
Make sure that billing is enabled for your Google Cloud project
.
-
In the Google Cloud console, on the project selector page,
select or
create a Google Cloud project
.
Go to project selector
-
Make sure that billing is enabled for your Google Cloud project
.
Required permissions
Grant Identity and Access Management (IAM) roles that give users the necessary permissions
to perform each task in this document.
To create materialized view replicas, you need the following
IAM permissions:
bigquery.tables.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.replicateData
bigquery.jobs.create
Each of the following predefined IAM roles includes the
permissions that you need in order to create a materialized view:
bigquery.dataEditor
bigquery.dataOwner
bigquery.admin
For more information about
BigQuery Identity and Access Management (IAM), see
Introduction to IAM in BigQuery
.
Prerequisites
Before creating a materialized view replica, you must complete the following
tasks:
- Create a dataset
in a
region that supports Amazon S3
.
- Create an Amazon S3 BigLake table
in the dataset you created in Step 1. Enable
metadata caching
when creating the table.
- Create a materialized view
over
the Amazon S3 BigLake table in the dataset you created in
Step 1. You could also create the materialized view in a different dataset
that is in a region that supports Amazon S3.
- Authorize the materialized view
on the
datasets that contain the source Amazon S3 BigLake tables
used in the query that created the materialized view.
- If you configured manual metadata cache refreshing for the Amazon S3
BigLake table, run the
BQ.REFRESH_EXTERNAL_METADATA_CACHE
system procedure
to refresh the metadata cache.
- Run the
BQ.REFRESH_MATERIALIZED_VIEW
system procedure
to refresh the materialized view.
Create materialized view replicas
To create materialized view replicas, use the
CREATE MATERIALIZED VIEW AS REPLICA OF
statement
:
CREATE
MATERIALIZED
VIEW
PROJECT_ID
.
BQ_DATASET
.
REPLICA_NAME
OPTIONS(replication_interval_seconds=
REPLICATION_INTERVAL
)
AS REPLICA OF
PROJECT_ID
.
S3_DATASET
.
MATERIALIZED_VIEW_NAME
;
Replace the following:
PROJECT_ID
: the name of your
project in which you want to create the materialized view replica?for example,
myproject
.
BQ_DATASET
: the name of the
BigQuery dataset that you want to create the
materialized view replica in?for example,
bq_dataset
.
The dataset must be in the BigQuery
region
that maps to the region of the source materialized view.
REPLICA_NAME
: the name of the
materialized view replica that you want to create?for example,
my_mv_replica
.
REPLICATION_INTERVAL
: specifies how often to
replicate the data from the source materialized view to the replica, in
seconds. Must be a value between 60 and 3,600, inclusive. Defaults to
300 (5 minutes).
S3_DATASET
: the name of the
dataset that contains the source materialized view?for example,
s3_dataset
.
MATERIALIZED_VIEW_NAME
: the name of the
materialized view to replicate?for example,
my_mv
.
The following example creates a materialized view replica named
mv_replica
in
bq_dataset
:
CREATE
MATERIALIZED
VIEW `myproject.bq_dataset.mv_replica`
OPTIONS(
replication_interval_seconds=600
)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`
After you create the materialized view replica, the replication process
polls the source materialized view for changes and replicates data to the
materialized view replica, refreshing the data at the interval you specified
in the
replication_interval_seconds
option. If you query the replica before
the first backfill completes, you get a
backfill in progress
error. You can
query the data in the materialized view replica after the first replication
completes.