Query Bigtable data
This document describes how to query data stored in a
Bigtable external table
.
Bigtable
is Google's sparsely populated NoSQL
database that can scale to billions of rows, thousands of columns, and
petabytes of data. For information on the Bigtable data model,
see
Storage model
.
Query permanent external tables
Before you begin, you or someone in your organization must create an external
table for you to use. For details and required permissions, see
Create a
BigQuery external table
.
Required roles
To query Bigtable permanent external tables, ensure
you have the following roles:
- BigQuery Data Viewer (
roles/bigquery.dataViewer
)
- BigQuery User (
roles/bigquery.user
)
- Bigtable Reader (
roles/bigtable.reader
)
Depending on your permissions, you can
grant these roles to yourself or ask your administrator
to grant them to you. For more information about granting roles, see
Viewing the grantable roles on resources
.
To see the exact BigQuery permissions that are required to query
external tables, expand the
Required permissions
section:
You might also be able to get these permissions with
custom roles
or other
predefined roles
.
Query the table
You can run a query against a permanent external Bigtable table
exactly as if it were a
standard BigQuery table
,
subject to the
limitations
on external data sources. For more information, see
Run interactive and batch
queries
.
Query temporary external tables
Querying an external data source using a temporary table is useful
for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL)
processes.
To query an external data source without creating a permanent table, you provide a table
definition for the temporary table, and then use that table definition in a command or call
to query the temporary table. You can provide the table definition in any of the following
ways:
The table definition file or supplied schema is used to create the temporary external table,
and the query runs against the temporary external table.
When you use a temporary external table, you do not create a table in one of your
BigQuery datasets. Because the table is not permanently stored in a dataset, it
cannot be shared with others.
Using a temporary external table instead of a permanent external
table has some limitations, including the following:
- You must have the Bigtable Admin (
roles/bigtable.admin
) role.
- This approach does not let you use the Google Cloud console to infer the
schema of the Bigtable table and automatically create the table
definition. You must create the table definition yourself.
Required roles
To query Bigtable temporary external tables, ensure
you have the following roles:
- BigQuery Data Viewer (
roles/bigquery.dataViewer
)
- BigQuery User (
roles/bigquery.user
)
- Bigtable Admin (
roles/bigtable.admin
)
Depending on your permissions, you can
grant these roles to yourself or ask your administrator
to grant them to you. For more information about granting roles, see
Viewing the grantable roles on resources
.
To see the exact BigQuery permissions that are required to query
external tables, expand the
Required permissions
section:
You might also be able to get these permissions with
custom roles
or other
predefined roles
.
Create and query the table
To query Bigtable data using a temporary external table, you:
Creating and querying a temporary external table is supported by the
bq command-line tool and the API.
bq
To query a temporary table using a table definition file, enter the
bq query
command with the
--external_table_definition
flag.
(Optional) Supply the
--location
flag and set the value to your
location
.
bq --location=
LOCATION
query \
--use_legacy_sql=false \
--external_table_definition=
TABLE
::
DEFINITION_FILE
\
'
QUERY
'
Replace the following:
LOCATION
: the name of your
location
.
The
--location
flag is optional.
TABLE
: the name of the temporary table you're creating.
DEFINITION_FILE
: the path to the
table definition file
on your local machine.
QUERY
: the query you're submitting to the temporary table.
For example, the following command creates and queries a temporary table
named
follows
using a table definition file named
follows_def
.
bq query \
--use_legacy_sql=false \
--external_table_definition=follows::/tmp/follows_def \
'SELECT
COUNT(rowkey)
FROM
follows'
API
Create a query. See
Querying data
for
information about creating a query job.
(Optional) Specify your location in the
location
property in the
jobReference
section of the
job resource
.
Specify the external data source properties by setting the
ExternalDataConfiguration
for the
table resource
.
The performance of queries against Bigtable external data sources
depends on three factors:
- The number of rows
- The amount of data read
- The extent of parallelization
BigQuery tries to read as little data as possible by only reading
the column families that are referenced in the query. The extent of
parallelization depends on how many nodes you have in your
Bigtable cluster and how many splits you have for your table.
Note that Bigtable auto-merges splits based on load. If your
table is not being read frequently, there will be fewer splits over time and
a gradual degradation in query performance. For more information on splitting
a table by row key, see
Managing tables
.
Querying Bigtable from BigQuery consumes
Bigtable CPU cycles. CPU consumption by BigQuery
might affect latency and throughput for other concurrent requests such as live
user traffic serving. For example, high CPU usage on Bigtable
affects long-tail queries and increases latency at the 99th percentile.
You should monitor Bigtable CPU usage to verify that you're
within the recommended bounds as noted on the Bigtable
monitoring dashboard in the Google Cloud console. Increasing the number of
nodes for your instance lets you handle both
BigQuery traffic and traffic from other
concurrent requests.
Query filters
Queries with a row equality filter only read that specific row. For example, in
GoogleSQL syntax:
SELECT
COUNT(follows.column.name)
FROM
`
dataset.table
`
WHERE
rowkey = "alice";
Range filters such as
rowkey > '1'
and
rowkey < '8'
are also supported, but
only when rowkey is read as a string with the
readRowkeyAsString
option.