Stay organized with collections
Save and categorize content based on your preferences.
Load and query data with the bq tool
Learn how to create a dataset, load sample data, and query tables with the
bq command-line tool.
To follow step-by-step guidance for this task directly in the
Google Cloud console, click
Guide me
:
Guide me
Before you begin
-
Sign in to your Google Cloud account. If you're new to
Google Cloud,
create an account
to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page,
select or
create a Google Cloud project
.
Go to project selector
-
In the Google Cloud console, on the project selector page,
select or
create a Google Cloud project
.
Go to project selector
-
Make sure that billing is enabled for your Google Cloud project
.
If you do not enable billing for the Google Cloud project that you use in this
tutorial, then you will work with data in the BigQuery sandbox. The
BigQuery sandbox lets you learn BigQuery with a limited
set of BigQuery features at no charge.
Ensure that the BigQuery API is enabled.
Enable the API
If you created a new project, the BigQuery API is automatically
enabled.
-
In the Google Cloud console, activate Cloud Shell.
Activate Cloud Shell
At the bottom of the Google Cloud console, a
Cloud Shell
session starts and displays a command-line prompt. Cloud Shell is a shell environment
with the Google Cloud CLI
already installed and with values already set for
your current project. It can take a few seconds for the session to initialize.
Download the source public data file
- Download the
baby names zip file
.
- Extract the zip file. It contains a file named
NationalReadMe.pdf
that
describes the dataset schema.
Learn more about the baby names dataset
.
- Open the
yob2010.txt
file. It's a comma-separated value (CSV) file that
contains three columns: name, assigned sex at birth, and number of children
with that name. The file has no header row.
- Move the file to your working directory.
- If you're working in Cloud Shell, click
more_vert
More
>
Upload
, click
Choose Files
, choose the
yob2010.txt
file, and then click
Upload
.
- If you're working in a local shell, copy or move the file
yob2010.txt
into the directory where you're running the bq tool.
Create a dataset
Create a dataset named
babynames
:
bq mk babynames
The output is similar to the following:
Dataset 'myproject:babynames' successfully created.
A dataset name can be up to 1,024 characters long and consist of A-Z, a-z,
0-9, and the underscore. The name cannot start with a number or underscore,
and it cannot have spaces.
Confirm that the dataset
babynames
now appears in your project:
bq ls
The output is similar to the following:
datasetId
-------------
babynames
Load data into a table
In the
babynames
dataset, load the source file
yob2010.txt
into a
new table that's named
names2010
:
bq load babynames.names2010 yob2010.txt name:string,assigned_sex_at_birth:string,count:integer
The output is similar to the following:
Upload complete.
Waiting on bqjob_r3c045d7cbe5ca6d2_0000018292f0815f_1 ... (1s) Current status: DONE
By default, when you load data, BigQuery expects UTF-8
encoded data. If you have data in ISO-8859-1 (or Latin-1) encoding and
you have problems with it, instruct BigQuery to treat
your data as Latin-1 using
bq load -E=ISO-8859-1
. For more information,
see
Encoding
.
Confirm that the table
names2010
now appears in the
babynames
dataset:
bq ls babynames
The output is similar to the following. Some columns are omitted to simplify
the output.
tableId Type
----------- ---------
names2010 TABLE
Confirm that the table schema of your new
names2010
table is
name: string
,
assigned_sex_at_birth: string
, and
count: integer
:
bq show babynames.names2010
The output is similar to the following. Some columns are omitted to simplify
the output.
Last modified Schema Total Rows Total Bytes
----------------- ------------------------------- ------------ ------------
14 Mar 17:16:45 |- name: string 34089 654791
|- assigned_sex_at_birth: string
|- count: integer
Query table data
Determine the most popular girls' names in the data:
bq query --use_legacy_sql=false \
'SELECT
name,
count
FROM
`babynames.names2010`
WHERE
assigned_sex_at_birth = "F"
ORDER BY
count DESC
LIMIT 5;'
The output is similar to the following:
+----------+-------+
| name | count |
+----------+-------+
| Isabella | 22925 |
| Sophia | 20648 |
| Emma | 17354 |
| Olivia | 17030 |
| Ava | 15436 |
+----------+-------+
Determine the least popular boys' names in the data:
bq query --use_legacy_sql=false \
'SELECT
name,
count
FROM
`babynames.names2010`
WHERE
assigned_sex_at_birth = "M"
ORDER BY
count ASC
LIMIT 5;'
The output is similar to the following:
+----------+-------+
| name | count |
+----------+-------+
| Aamarion | 5 |
| Aarian | 5 |
| Aaqib | 5 |
| Aaidan | 5 |
| Aadhavan | 5 |
+----------+-------+
The minimum count is 5 because the source data omits names with fewer than
5 occurrences.
Clean up
To avoid incurring charges to your Google Cloud account for
the resources used on this page, delete the Google Cloud project with the
resources.
Delete the project
If you used the
BigQuery sandbox
to query
the public dataset, then billing is not enabled for your project.
The easiest way to eliminate billing is to delete the project that you
created for the tutorial.
To delete the project:
-
In the Google Cloud console, go to the
Manage resources
page.
Go to Manage resources
-
In the project list, select the project that you
want to delete, and then click
Delete
.
-
In the dialog, type the project ID, and then click
Shut down
to delete the project.
Delete the resources
If you used an existing project, delete the resources that you created:
Delete the
babynames
dataset:
bq rm --recursive=true babynames
The
--recursive
flag deletes all tables in the dataset, including the
names2010
table.
The output is similar to the following:
rm: remove dataset 'myproject:babynames'? (y/N)
To confirm the delete command, enter
y
.
Except as otherwise noted, the content of this page is licensed under the
Creative Commons Attribution 4.0 License
, and code samples are licensed under the
Apache 2.0 License
. For details, see the
Google Developers Site Policies
. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-06-11 UTC.
[{
"type": "thumb-down",
"id": "hardToUnderstand",
"label":"Hard to understand"
},{
"type": "thumb-down",
"id": "incorrectInformationOrSampleCode",
"label":"Incorrect information or sample code"
},{
"type": "thumb-down",
"id": "missingTheInformationSamplesINeed",
"label":"Missing the information/samples I need"
},{
"type": "thumb-down",
"id": "otherDown",
"label":"Other"
}]
[{
"type": "thumb-up",
"id": "easyToUnderstand",
"label":"Easy to understand"
},{
"type": "thumb-up",
"id": "solvedMyProblem",
"label":"Solved my problem"
},{
"type": "thumb-up",
"id": "otherUp",
"label":"Other"
}]