Introduction to external data sources
This page provides an overview of querying data stored outside of
BigQuery.
An external data source is a data source that you can query directly from
BigQuery, even though the data is not stored in
BigQuery storage. For example, you might have data in a
different Google Cloud database, in files in Cloud Storage, or in a
different cloud product altogether that you would like to analyze in
BigQuery, but that you aren't prepared to migrate.
Use cases for external data sources include the following:
- For extract-load-transform (ELT) workloads, loading and cleaning your data
in one pass and writing the cleaned result into BigQuery
storage, by using a
CREATE TABLE ... AS SELECT
query.
- Joining BigQuery tables with frequently changing data from
an external data source. By querying the external data source directly, you
don't need to reload the data into BigQuery storage every
time it changes.
BigQuery has two different mechanisms for querying external
data: external tables and federated queries.
External tables
External tables are similar to standard BigQuery tables, in
that these tables store their metadata and schema in BigQuery
storage. However, their data resides in an external source.
External tables are contained inside a dataset, and you manage them in
the same way that you manage a standard
BigQuery table. For example, you can
view the table's properties
,
set access controls
, and so
forth. You can query these tables and in most cases you can join them with
other tables.
There are four kinds of external tables:
- BigLake tables
- BigQuery Omni tables
- Object tables
- Non-BigLake external tables
BigLake tables
BigLake tables let you query structured data in
external data stores with access delegation. Access delegation
decouples access to the BigLake table from access to
the underlying data store. An
external connection
associated with a service account is used to connect to the data store. Because
the service account handles retrieving data from the data store, you only have
to grant users access to the BigLake table. This lets you enforce
fine-grained security at the table level, including
row-level
and
column-level
security. For
BigLake tables based on Cloud Storage, you can also use
dynamic data masking
. To learn more about
multi-cloud analytic solutions using BigLake tables with
Amazon S3 or Blob Storage data, see
BigQuery Omni
.
For more information, see
Introduction to BigLake tables
.
Object tables
Object tables let you analyze unstructured data in
Cloud Storage. You can perform analysis with remote functions or
perform inference by using BigQuery ML, and then join the results of
these operations with the rest of your structured data in BigQuery.
Like BigLake tables, object tables use access delegation,
which decouples access to the object table from access to the
Cloud Storage objects. An
external connection
associated with a service account is used to connect to Cloud Storage,
so you only have to grant users access to the object table. This lets you
enforce
row-level
security and manage
which objects users have access to.
For more information, see
Introduction to object tables
.
Non-BigLake external tables
Non-BigLake external tables let you query structured data
in external data stores. To query a non-BigLake external
table, you must have permissions to both the external table and the
external data source. For example, to query a non-BigLake
external table that uses a data source in Cloud Storage,
you must have the following permissions:
bigquery.tables.getData
bigquery.jobs.create
storage.buckets.get
storage.objects.get
For more information, see
Introduction to external tables
.
Federated queries
Federated queries let you send a query statement to AlloyDB, Spanner, or Cloud SQL databases
and get the result back as a temporary table. Federated queries use the
BigQuery Connection API to establish a connection with AlloyDB, Spanner, or Cloud SQL.
In your query, you use the
EXTERNAL_QUERY
function to send a
query statement to the external database, using that database's SQL dialect.
The results are converted to GoogleSQL data types.
For more information, see
Introduction to federated queries
.
External data source feature comparison
The following table compares the behavior of external data sources:
|
BigLake tables
|
Object tables
|
Non-BigLake external tables
|
Federated queries
|
Uses access delegation
|
Yes, through a service account
|
Yes, through a service account
|
No
|
Yes, through a database user account (Cloud SQL only)
|
Can be based on multiple source URIs
|
Yes
|
Yes
|
Yes (Cloud Storage only)
|
Not applicable
|
Row mapping
|
Rows represent file content
|
Rows represent file metadata
|
Rows represent file content
|
Not applicable
|
Accessible by other data processing tools by using connectors
|
Yes (Cloud Storage only)
|
No
|
Yes
|
Not applicable
|
Can be joined to other BigQuery tables
|
Yes (Cloud Storage only)
|
Yes
|
Yes
|
Yes
|
Can be accessed as a temporary table
|
Yes (Cloud Storage only)
|
No
|
Yes
|
Yes
|
Works with Amazon S3
|
Yes
|
No
|
No
|
No
|
Works with Azure Storage
|
Yes
|
No
|
No
|
No
|
Works with Bigtable
|
No
|
No
|
Yes
|
No
|
Works with Spanner
|
No
|
No
|
No
|
Yes
|
Works with Cloud SQL
|
No
|
No
|
No
|
Yes
|
Works with Google Drive
|
No
|
No
|
Yes
|
No
|
Works with Cloud Storage
|
Yes
|
Yes
|
Yes
|
No
|
What's next