Create Cloud Storage external tables
BigQuery supports querying Cloud Storage data in the
following formats:
- Comma-separated values (CSV)
- JSON (newline-delimited)
- Avro
- ORC
- Parquet
- Datastore exports
- Firestore exports
BigQuery supports querying Cloud Storage data from these storage classes:
- Standard
- Nearline
- Coldline
- Archive
To query a Cloud Storage external table, you must have permissions
on both the external table and the Cloud Storage files. We recommend
using a
BigLake table
instead if
possible. BigLake tables provide access delegation, so that
you only need permissions on the BigLake table in order to query
the Cloud Storage data.
Be sure to
consider the location
of your dataset and Cloud Storage bucket when you query data stored in
Cloud Storage.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document. The permissions required to perform a task (if any) are listed in the "Required permissions" section of the task.
Required roles
To create an external table, you need the bigquery.tables.create BigQuery Identity and Access Management (IAM) permission.
Each of the following predefined Identity and Access Management roles includes this permission:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Data Owner (roles/bigquery.dataOwner)
- BigQuery Admin (roles/bigquery.admin)
You also need the following permissions to access the Cloud Storage bucket that contains your data:
- storage.buckets.get
- storage.objects.get
- storage.objects.list (required if you are using a URI wildcard)
The Cloud Storage Storage Admin (roles/storage.admin) predefined Identity and Access Management role includes these permissions.
If you are not a principal in any of these roles, ask your administrator
to grant you access or to create the external table for you.
For more information on Identity and Access Management roles and permissions in BigQuery, see Predefined roles and permissions.
Access scopes for Compute Engine instances
If you need to query an external table that is linked to a Cloud Storage source from a Compute Engine instance, the instance must have at least the Cloud Storage read-only access scope (https://www.googleapis.com/auth/devstorage.read_only).
The scopes control the Compute Engine instance's access to Google Cloud
products, including Cloud Storage. Applications running on the instance use
the service account attached to the instance to call Google Cloud APIs.
If you set up a Compute Engine instance to run as the default Compute Engine service account, the instance is by default granted a number of default scopes, including the https://www.googleapis.com/auth/devstorage.read_only scope.
If instead you set up the instance with a custom service account, make sure to explicitly grant the https://www.googleapis.com/auth/devstorage.read_only scope to the instance.
For information about applying scopes to a Compute Engine instance, see Changing the service account and access scopes for an instance. For more information about Compute Engine service accounts, see Service accounts.
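For example, a minimal sketch of creating an instance that runs as a custom service account with the read-only storage scope (the instance and service account names here are illustrative):

gcloud compute instances create my-instance \
    --service-account=my-sa@my-project.iam.gserviceaccount.com \
    --scopes=https://www.googleapis.com/auth/devstorage.read_only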
Create external tables on unpartitioned data
You can create a permanent table linked to your external data source. Select one of the following options:
Console
- Go to the BigQuery page.
- In the Explorer pane, expand your project and select a dataset.
- Expand the Actions option and click Create table.
- In the Source section, specify the following details:
  - For Create table from, select Google Cloud Storage.
  - For Select file from GCS bucket or use a URI pattern, browse to select a bucket and file to use, or type the path in the format gs://bucket_name/[folder_name/]file_name. You can't specify multiple URIs in the Google Cloud console, but you can select multiple files by specifying one asterisk (*) wildcard character. For example, gs://mybucket/file_name*. For more information, see Wildcard support for Cloud Storage URIs. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.
  - For File format, select the format that matches your file.
- In the Destination section, specify the following details:
  - For Project, choose the project in which to create the table.
  - For Dataset, choose the dataset in which to create the table.
  - For Table, enter the name of the table you are creating.
  - For Table type, select External table.
- In the Schema section, you can either enable schema auto-detection or manually specify a schema if you have a source file. If you don't have a source file, you must manually specify a schema.
  - To enable schema auto-detection, select the Auto-detect option.
  - To manually specify a schema, leave the Auto-detect option unchecked. Enable Edit as text and enter the table schema as a JSON array.
- To ignore rows with extra column values that do not match the schema, expand the Advanced options section and select Unknown values.
- Click Create table.
After the permanent table is created, you can run a query against the table as if it were a native BigQuery table. After your query completes, you can export the results as CSV or JSON files, save the results as a table, or save the results to Google Sheets.
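For example, a minimal sketch of querying such a table (the dataset, table, and column names are illustrative):

SELECT Region, SUM(Total_sales) AS total_sales
FROM mydataset.mytable
GROUP BY Region;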
SQL
You can create a permanent external table by running the CREATE EXTERNAL TABLE DDL statement. You can specify the schema explicitly, or use schema auto-detection to infer the schema from the external data.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
  OPTIONS (
    format = "TABLE_FORMAT",
    uris = ['BUCKET_PATH'[,...]]
  );
Replace the following:
- PROJECT_ID: the name of the project in which you want to create the table, for example, myproject
- DATASET: the name of the BigQuery dataset that you want to create the table in, for example, mydataset
- EXTERNAL_TABLE_NAME: the name of the table that you want to create, for example, mytable
- TABLE_FORMAT: the format of the table that you want to create, for example, PARQUET
- BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format ['gs://bucket_name/[folder_name/]file_name'].
  You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the path. For example, ['gs://mybucket/file_name*']. For more information, see Wildcard support for Cloud Storage URIs.
  You can specify multiple buckets for the uris option by providing multiple paths.
  The following examples show valid uris values:
  - ['gs://bucket/path1/myfile.csv']
  - ['gs://bucket/path1/*.csv']
  - ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']
  When you specify uris values that target multiple files, all of those files must share a compatible schema.
  For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
Click Run.
For more information about how to run queries, see Run an interactive query.
Examples
The following example uses schema auto-detection to create an external table
named
sales
that is linked to a CSV file stored in Cloud Storage:
CREATE OR REPLACE EXTERNAL TABLE mydataset.sales
OPTIONS (
format = 'CSV',
uris = ['gs://mybucket/sales.csv']);
The next example specifies a schema explicitly and skips the first row in
the CSV file:
CREATE OR REPLACE EXTERNAL TABLE mydataset.sales (
  Region STRING,
  Quarter STRING,
  Total_Sales INT64
) OPTIONS (
  format = 'CSV',
  uris = ['gs://mybucket/sales.csv'],
  skip_leading_rows = 1);
bq
To create an external table, use the
bq mk
command
with the
--external_table_definition
flag. This flag contains either a path to a
table definition file
or an inline
table definition.
Option 1: Table definition file
Use the
bq mkdef
command to create a table definition file, and then pass the file path to
the
bq mk
command as follows:
bq mkdef --source_format=SOURCE_FORMAT \
  BUCKET_PATH > DEFINITION_FILE

bq mk --table \
  --external_table_definition=DEFINITION_FILE \
  DATASET_NAME.TABLE_NAME \
  SCHEMA
Replace the following:
- SOURCE_FORMAT: the format of the external data source. For example, CSV.
- BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.
  You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.
  You can specify multiple buckets for the uris option by providing multiple paths.
  The following examples show valid uris values:
  - gs://bucket/path1/myfile.csv
  - gs://bucket/path1/*.parquet
  - gs://bucket/path1/file1*, gs://bucket1/path1/*
  When you specify uris values that target multiple files, all of those files must share a compatible schema.
  For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
- DEFINITION_FILE: the path to the table definition file on your local machine.
- DATASET_NAME: the name of the dataset that contains the table.
- TABLE_NAME: the name of the table you're creating.
- SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,....
Example:
bq mkdef --source_format=CSV gs://mybucket/sales.csv > mytable_def
bq mk --table --external_table_definition=mytable_def \
mydataset.mytable \
Region:STRING,Quarter:STRING,Total_sales:INTEGER
To use schema auto-detection, set the
--autodetect=true
flag in the
mkdef
command and omit the schema:
bq mkdef --source_format=CSV --autodetect=true \
gs://mybucket/sales.csv > mytable_def
bq mk --table --external_table_definition=mytable_def \
mydataset.mytable
Option 2: Inline table definition
Instead of creating a table definition file, you can pass the table
definition directly to the
bq mk
command:
bq mk --table \
  --external_table_definition=@SOURCE_FORMAT=BUCKET_PATH \
  DATASET_NAME.TABLE_NAME \
  SCHEMA
Replace the following:
- SOURCE_FORMAT: the format of the external data source. For example, CSV.
- BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.
  You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.
  You can specify multiple buckets for the uris option by providing multiple paths.
  The following examples show valid uris values:
  - gs://bucket/path1/myfile.csv
  - gs://bucket/path1/*.parquet
  - gs://bucket/path1/file1*, gs://bucket1/path1/*
  When you specify uris values that target multiple files, all of those files must share a compatible schema.
  For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
- DATASET_NAME: the name of the dataset that contains the table.
- TABLE_NAME: the name of the table you're creating.
- SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,.... To use schema auto-detection, omit this argument.
Example:
bq mk --table \
  --external_table_definition=@CSV=gs://mybucket/sales.csv \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER
Create external tables on partitioned data
You can create an external table for Hive partitioned data that resides in
Cloud Storage. After you create an externally partitioned table, you
can't change the partition key. You need to recreate the table to change the
partition key.
To create an external table for Hive partitioned data, choose one of the
following options:
Console
In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project and select a dataset.
- Click View actions, and then click Create table. This opens the Create table pane.
- In the Source section, specify the following details:
  - For Create table from, select Google Cloud Storage.
  - For Select file from Cloud Storage bucket, enter the path to the Cloud Storage folder, using wildcards. For example, my_bucket/my_files*. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
  - From the File format list, select the file type.
  - Select the Source data partitioning checkbox, and then for Select Source URI Prefix, enter the Cloud Storage URI prefix. For example, gs://my_bucket/my_files.
  - In the Partition inference mode section, select one of the following options:
    - Automatically infer types: set the partition schema detection mode to AUTO.
    - All columns are strings: set the partition schema detection mode to STRINGS.
    - Provide my own: set the partition schema detection mode to CUSTOM and manually enter the schema information for the partition keys. For more information, see Provide a custom partition key schema.
  - Optional: To require a partition filter on all queries for this table, select the Require partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, see Requiring predicate filters on partition keys in queries.
- In the Destination section, specify the following details:
  - For Project, select the project in which you want to create the table.
  - For Dataset, select the dataset in which you want to create the table.
  - For Table, enter the name of the table that you want to create.
  - For Table type, select External table.
- In the Schema section, enter the schema definition.
  - To enable the auto detection of schema, select Auto detect.
  - To ignore rows with extra column values that do not match the schema, expand the Advanced options section and select Unknown values.
- Click Create table.
SQL
Use the CREATE EXTERNAL TABLE DDL statement.
The following example uses automatic detection of Hive partition keys:
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
WITH PARTITION COLUMNS
OPTIONS (
  format = 'SOURCE_FORMAT',
  uris = ['GCS_URIS'],
  hive_partition_uri_prefix = 'GCS_URI_SHARED_PREFIX',
  require_hive_partition_filter = BOOLEAN
);
Replace the following:
- SOURCE_FORMAT: the format of the external data source, such as PARQUET
- GCS_URIS: the path to the Cloud Storage folder, using wildcard format
- GCS_URI_SHARED_PREFIX: the source URI prefix without the wildcard
- BOOLEAN: whether to require a predicate filter at query time. This flag is optional. The default value is false.
The following example uses custom Hive partition keys and types by listing
them in the
WITH PARTITION COLUMNS
clause:
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
WITH PARTITION COLUMNS (
  PARTITION_COLUMN_LIST
)
OPTIONS (
  format = 'SOURCE_FORMAT',
  uris = ['GCS_URIS'],
  hive_partition_uri_prefix = 'GCS_URI_SHARED_PREFIX',
  require_hive_partition_filter = BOOLEAN
);
Replace the following:
- PARTITION_COLUMN_LIST: a list of columns that follow the same order as in the path of the Cloud Storage folder, in the format KEY1 TYPE1, KEY2 TYPE2
The following example creates an externally partitioned table. It uses schema auto-detection to detect both the file schema and the Hive partitioning layout. If the external path is gs://bucket/path/field_1=first/field_2=1/data.parquet, the partition columns are detected as field_1 (STRING) and field_2 (INT64).
CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);
The following example creates an externally partitioned table by explicitly
specifying the partition columns. This example assumes that the external file
path has the pattern
gs://bucket/path/field_1=first/field_2=1/data.parquet
.
CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING, -- column order must match the external path
  field_2 INT64
)
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);
bq
First, use the bq mkdef command to create a table definition file:
bq mkdef \
  --source_format=SOURCE_FORMAT \
  --hive_partitioning_mode=PARTITIONING_MODE \
  --hive_partitioning_source_uri_prefix=GCS_URI_SHARED_PREFIX \
  --require_hive_partition_filter=BOOLEAN \
  GCS_URIS > DEFINITION_FILE
Replace the following:
- SOURCE_FORMAT: the format of the external data source. For example, CSV.
- PARTITIONING_MODE: the Hive partitioning mode. Use one of the following values:
  - AUTO: Automatically detect the key names and types.
  - STRINGS: Automatically convert the key names to strings.
  - CUSTOM: Encode the key schema in the source URI prefix.
- GCS_URI_SHARED_PREFIX: the source URI prefix.
- BOOLEAN: specifies whether to require a predicate filter at query time. This flag is optional. The default value is false.
- GCS_URIS: the path to the Cloud Storage folder, using wildcard format.
- DEFINITION_FILE: the path to the table definition file on your local machine.
If PARTITIONING_MODE is CUSTOM, include the partition key schema in the source URI prefix, using the following format:
--hive_partitioning_source_uri_prefix=GCS_URI_SHARED_PREFIX/{KEY1:TYPE1}/{KEY2:TYPE2}/...
After you create the table definition file, use the bq mk command to create the external table:
bq mk --external_table_definition=DEFINITION_FILE \
  DATASET_NAME.TABLE_NAME \
  SCHEMA
Replace the following:
- DEFINITION_FILE: the path to the table definition file.
- DATASET_NAME: the name of the dataset that contains the table.
- TABLE_NAME: the name of the table you're creating.
- SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,.... To use schema auto-detection, omit this argument.
Examples
The following example uses
AUTO
Hive partitioning mode:
bq mkdef --source_format=CSV \
--hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix=gs://myBucket/myTable \
gs://myBucket/myTable/* > mytable_def
bq mk --external_table_definition=mytable_def \
mydataset.mytable \
Region:STRING,Quarter:STRING,Total_sales:INTEGER
The following example uses STRINGS Hive partitioning mode:
bq mkdef --source_format=CSV \
  --hive_partitioning_mode=STRINGS \
  --hive_partitioning_source_uri_prefix=gs://myBucket/myTable \
  gs://myBucket/myTable/* > mytable_def
bq mk --external_table_definition=mytable_def \
mydataset.mytable \
Region:STRING,Quarter:STRING,Total_sales:INTEGER
The following example uses
CUSTOM
Hive partitioning mode:
bq mkdef --source_format=CSV \
--hive_partitioning_mode=CUSTOM \
--hive_partitioning_source_uri_prefix=gs://myBucket/myTable/{dt:DATE}/{val:STRING} \
gs://myBucket/myTable/* > mytable_def
bq mk --external_table_definition=mytable_def \
mydataset.mytable \
Region:STRING,Quarter:STRING,Total_sales:INTEGER
API
To set Hive partitioning using the BigQuery API, include a
hivePartitioningOptions
object in the
ExternalDataConfiguration
object when you create the
table definition file
.
If you set the hivePartitioningOptions.mode field to CUSTOM, you must encode the partition key schema in the hivePartitioningOptions.sourceUriPrefix field as follows:
gs://BUCKET/PATH_TO_TABLE/{KEY1:TYPE1}/{KEY2:TYPE2}/...
To enforce the use of a predicate filter at query time, set the hivePartitioningOptions.requirePartitionFilter field to true.
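For illustration, a minimal table definition sketch that combines these fields with custom Hive partitioning might look like the following (the bucket, path, and partition keys are assumptions):

{
  "sourceFormat": "PARQUET",
  "sourceUris": ["gs://mybucket/mytable/*"],
  "hivePartitioningOptions": {
    "mode": "CUSTOM",
    "sourceUriPrefix": "gs://mybucket/mytable/{dt:DATE}/{val:STRING}",
    "requirePartitionFilter": true
  }
}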
Query external tables
For more information, see Query Cloud Storage data in external tables.
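For example, if a table was created with require_hive_partition_filter = true, every query must include a predicate on a partition key. A sketch using the partition columns from the earlier example:

SELECT field_2, COUNT(*) AS row_count
FROM dataset.CustomHivePartitionedTable
WHERE field_1 = 'first' -- predicate on the partition key satisfies the filter requirement
GROUP BY field_2;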
Upgrade external tables to BigLake
You can upgrade tables based on Cloud Storage to BigLake tables by associating the external table with a connection. If you want to use metadata caching with the BigLake table, you can specify settings for this at the same time. To get table details such as source format and source URI, see Get table information.
To update an external table to a BigLake table, select one of the
following options:
SQL
Use the CREATE OR REPLACE EXTERNAL TABLE DDL statement to update a table:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE OR REPLACE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
  WITH CONNECTION `REGION.CONNECTION_ID`
  OPTIONS (
    format = "TABLE_FORMAT",
    uris = ['BUCKET_PATH'],
    max_staleness = STALENESS_INTERVAL,
    metadata_cache_mode = 'CACHE_MODE'
  );
Replace the following:
- PROJECT_ID: the name of the project that contains the table
- DATASET: the name of the dataset that contains the table
- EXTERNAL_TABLE_NAME: the name of the table
- REGION: the region that contains the connection
- CONNECTION_ID: the name of the connection to use
- TABLE_FORMAT: the format used by the table. You can't change this when updating the table.
- BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format ['gs://bucket_name/[folder_name/]file_name'].
  You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the path. For example, ['gs://mybucket/file_name*']. For more information, see Wildcard support for Cloud Storage URIs.
  You can specify multiple buckets for the uris option by providing multiple paths.
  The following examples show valid uris values:
  - ['gs://bucket/path1/myfile.csv']
  - ['gs://bucket/path1/*.csv']
  - ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']
  When you specify uris values that target multiple files, all of those files must share a compatible schema.
  For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
- STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. For more information about metadata caching considerations, see Metadata caching for performance.
  To disable metadata caching, specify 0. This is the default.
  To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.
- CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.
  Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.
  Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.
  You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.
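For example, with MANUAL mode you can refresh the cache on your own schedule by calling the system procedure (the table path is illustrative):

CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('myproject.mydataset.mytable');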
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq mkdef and bq update commands to update a table:
Generate an external table definition that describes the aspects of the table to change:
bq mkdef --connection_id=PROJECT_ID.REGION.CONNECTION_ID \
  --source_format=TABLE_FORMAT \
  --metadata_cache_mode=CACHE_MODE \
  "BUCKET_PATH" > /tmp/DEFINITION_FILE
Replace the following:
- PROJECT_ID: the name of the project that contains the connection
- REGION: the region that contains the connection
- CONNECTION_ID: the name of the connection to use
- TABLE_FORMAT: the format used by the table. You can't change this when updating the table.
- CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.
  Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.
  Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.
  You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.
- BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format gs://bucket_name/[folder_name/]file_name.
  You can limit the files selected from the bucket by specifying one asterisk (*) wildcard character in the path. For example, gs://mybucket/file_name*. For more information, see Wildcard support for Cloud Storage URIs.
  You can specify multiple buckets for the uris option by providing multiple paths.
  The following examples show valid uris values:
  - gs://bucket/path1/myfile.csv
  - gs://bucket/path1/*.csv
  - gs://bucket/path1/*,gs://bucket/path2/file00*
  When you specify uris values that target multiple files, all of those files must share a compatible schema.
  For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
- DEFINITION_FILE: the name of the table definition file that you are creating.
Update the table using the new external table definition:
bq update --max_staleness=STALENESS_INTERVAL \
  --external_table_definition=/tmp/DEFINITION_FILE \
  PROJECT_ID:DATASET.EXTERNAL_TABLE_NAME
Replace the following:
- STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. For more information about metadata caching considerations, see Metadata caching for performance.
  To disable metadata caching, specify 0. This is the default.
  To enable metadata caching, specify an interval value between 30 minutes and 7 days, using the Y-M D H:M:S format described in the INTERVAL data type documentation. For example, specify 0-0 0 4:0:0 for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.
- DEFINITION_FILE: the name of the table definition file that you created or updated.
- PROJECT_ID: the name of the project that contains the table
- DATASET: the name of the dataset that contains the table
- EXTERNAL_TABLE_NAME: the name of the table
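For example, a sketch of the full upgrade flow with illustrative names (project myproject, connection us.myconnection, table mydataset.sales):

bq mkdef --connection_id=myproject.us.myconnection \
  --source_format=CSV \
  --metadata_cache_mode=AUTOMATIC \
  "gs://mybucket/sales.csv" > /tmp/sales_def

bq update --max_staleness='0-0 0 4:0:0' \
  --external_table_definition=/tmp/sales_def \
  myproject:mydataset.sales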
Cloud Storage resource path
When you create an external table based on a Cloud Storage data source,
you must provide the path to the data.
The Cloud Storage resource path contains your bucket name and your
object (filename). For example, if the Cloud Storage bucket is named
mybucket
and the data file is named
myfile.csv
, the resource path would be
gs://mybucket/myfile.csv
.
BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.
To retrieve the Cloud Storage resource path:
- Open the Cloud Storage console.
- Browse to the location of the object (file) that contains the source data.
- Click the name of the object. The Object details page opens.
- Copy the value provided in the gsutil URI field, which begins with gs://.
Wildcard support for Cloud Storage URIs
If your data is separated into multiple files, you can use an asterisk (*)
wildcard to select multiple files. Use of the asterisk wildcard must follow
these rules:
- The asterisk can appear inside the object name or at the end of the object
name.
- Using multiple asterisks is unsupported. For example, the path
gs://mybucket/fed-*/temp/*.csv
is invalid.
- Using an asterisk with the bucket name is unsupported.
Examples:
The following example shows how to select all of the files in all the folders that start with the prefix gs://mybucket/fed-samples/fed-sample:
gs://mybucket/fed-samples/fed-sample*
The following example shows how to select only files with a
.csv
extension
in the folder named
fed-samples
and any subfolders of
fed-samples
:
gs://mybucket/fed-samples/*.csv
The following example shows how to select files with a naming pattern of
fed-sample*.csv
in the folder named
fed-samples
. This example doesn't
select files in subfolders of
fed-samples
.
gs://mybucket/fed-samples/fed-sample*.csv
When using the bq command-line tool, you might need to escape the asterisk on some
platforms.
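For example, quoting the URI keeps the shell itself from expanding the asterisk (the paths are illustrative):

bq mkdef --source_format=CSV \
  "gs://mybucket/fed-samples/fed-sample*.csv" > mytable_def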
You can't use an asterisk wildcard when you create external tables linked to
Datastore or Firestore exports.
Limitations
For information about limitations that apply to external tables, see
External table limitations
.
What's next