Share sensitive data with data clean rooms
Data clean rooms provide a security-enhanced environment in which multiple
parties can share, join, and analyze their data assets without moving or
revealing the underlying data.
BigQuery data clean rooms are built on the Analytics Hub platform. While
standard Analytics Hub data exchanges provide a way to share data across
organizational boundaries at scale, data clean rooms help you address
sensitive and protected data-sharing use cases. Data clean rooms provide
additional security controls to help protect the underlying data and enforce
analysis rules that the data owner defines.
The following are primary use cases:
- Campaign planning and audience insights. Let two parties (such as sellers
and buyers) mix first-party data and improve data enrichment in a
privacy-centric way.
- Measurement and attribution. Match customer and media performance data to
better understand the effectiveness of marketing efforts and make more
informed business decisions.
- Activation. Combine customer data with data from other parties to enrich
understanding of customers, enabling improved segmentation capabilities and
more effective media activation.
There are also several data clean room use cases beyond the marketing industry:
- Retail and consumer packaged goods (CPG). Optimize marketing and
promotional activities by combining point-of-sale data from retailers and
marketing data from CPG companies.
- Financial services. Improve fraud detection by combining sensitive data
from other financial and government agencies. Build credit risk scoring by
aggregating customer data across multiple banks.
- Healthcare. Share data between doctors and pharmaceutical researchers to
learn how patients are reacting to treatments.
- Supply chain, logistics, and transportation. Combine data from suppliers
and marketers to get a complete picture of how products perform throughout
their lifecycle.
Roles
There are three main roles in BigQuery data clean rooms:
- Data clean room owner: a user that manages permissions, visibility, and
membership of one or more data clean rooms within a project. This role is
analogous to the Analytics Hub Admin.
- Data contributor: a user that is assigned by the data clean room owner
to publish data to a data clean room. In many cases, a data clean room owner
is also a data contributor. This role is analogous to the Analytics Hub
Publisher.
- Subscriber: a user that is assigned by the data clean room owner to
subscribe to the data published in a data clean room, letting them run
queries on the data. This role is analogous to a combination of the
Analytics Hub Subscriber and Analytics Hub Subscription Owner.
Subscribers must have non-edition offerings or the Enterprise Plus edition.
Architecture
BigQuery data clean rooms are built on a publish and subscribe
model of BigQuery data. BigQuery architecture
provides a separation between compute and storage, enabling data contributors to
share data without having to make multiple copies of the data. The following
image is an overview of the BigQuery data clean room
architecture:
Data clean room
A data clean room is an environment to share sensitive data where raw access
is prevented and query restrictions are enforced. Only users or groups that are
added as subscribers to a data clean room can subscribe to the shared data. Data
clean room owners can create as many data clean rooms as they want in
Analytics Hub.
Shared resources
A shared resource is the unit of data sharing in a data clean room. The
resource must be a BigQuery table or view. As a data contributor,
you create or use an existing BigQuery resource in your project
that you want to share with your subscribers.
Listings
A listing is created when a data contributor adds data into a data clean room.
It contains a reference to the data contributor's shared resource along with
descriptive information that helps subscribers use the data. As a data
contributor, you can create a listing and include information such as a
description, sample queries, and links to documentation for your subscribers.
Linked datasets
A linked dataset is a read-only BigQuery dataset that serves as
a symbolic link to all data in a data clean room. When subscribers query
resources in a linked dataset, data from the shared resources is returned,
subject to the analysis rules set by the data contributor. When you subscribe
to a data clean room, a linked dataset is created inside your project. No copy
of the data is created, and subscribers can't see certain metadata, such as
view definitions.
Analysis rules
As a data contributor, you configure analysis rules on the resources
that you share in the data clean room. Analysis rules prevent raw access to
underlying data and enforce query restrictions. For example, data clean rooms
support the aggregation threshold analysis rule, which lets subscribers
analyze data only through aggregation queries.
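To make the aggregation threshold rule more concrete, the following sketch
prints a view definition that carries such a rule in its privacy_policy option.
The dataset, view, table, and column names and the threshold of 50 are
hypothetical placeholders. In a data clean room, the Add data workflow creates
the view for you, so read this as an illustration of what the rule looks like
in SQL, not as a required step.

```shell
# Print a CREATE VIEW statement that attaches an aggregation threshold rule.
# All names passed in are hypothetical placeholders, not values from this guide.
aggregation_view_ddl() {
  view="$1"          # for example: my_dataset.conversions_view
  source_table="$2"  # for example: my_dataset.conversions
  unit_column="$3"   # column that identifies the privacy unit, such as user_id
  threshold="$4"     # minimum number of distinct privacy units per group
  cat <<EOF
CREATE OR REPLACE VIEW $view
OPTIONS (
  privacy_policy = '{"aggregation_threshold_policy": {"threshold": $threshold, "privacy_unit_columns": "$unit_column"}}'
)
AS SELECT * FROM $source_table;
EOF
}

# Show the statement for a sample view; nothing is executed in BigQuery.
aggregation_view_ddl my_dataset.conversions_view my_dataset.conversions user_id 50
```

To run the printed statement, you could pass it to the bq tool, for example
bq query --use_legacy_sql=false "$(aggregation_view_ddl ...)".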
Data egress controls
Data egress controls are automatically enabled to help prevent subscribers
from copying and exporting raw data from a data clean room. Data contributors
can configure additional controls to help prevent the copy and export of query
results that are obtained by the subscribers.
Limitations
BigQuery data clean rooms have the following limitations:
- You can set analysis rules only on views, not on tables or materialized
views. Due to this limitation, if a data contributor directly shares tables or
materialized views (or views without analysis rules) into a data clean room,
then subscribers have raw access to the data in those resources.
- As data clean rooms are built on the Analytics Hub platform, all
Analytics Hub limitations apply.
- Data clean rooms are only available in Analytics Hub regions.
- As a subscriber, you can't search for shared resources in Dataplex or
Data Catalog.
- As a subscriber, you can't query INFORMATION_SCHEMA views on linked
datasets.
- As a data contributor, you can't publish an entire dataset directly to a data
clean room.
- As a data contributor, you can't publish models or routines to a data clean
room.
- You can add a maximum of 100 shared resources to a data clean room. If you
need to increase this limit, contact bq-dcr-feedback@google.com.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions
to perform each task in this document, enable the Analytics Hub
API, and assign the Analytics Hub Admin role to your data clean
room owner (the user who will create the data clean room).
Required permissions
To get the permissions that you need to use data clean rooms, ask your
administrator to grant you the BigQuery Data Editor
(roles/bigquery.dataEditor) IAM role. For more information about granting
roles, see Manage access.
This predefined role contains the permissions required to use data clean
rooms. The following permissions are required:
- serviceUsage.services.get
- serviceUsage.services.list
- serviceUsage.services.enable
You might also be able to get these permissions with custom roles or other
predefined roles.
For more information about IAM roles and permissions in BigQuery, see
Introduction to IAM.
Enable the Analytics Hub API
To enable the Analytics Hub API, select one of the following
options:
Once you enable the Analytics Hub API, you can access the Analytics Hub page.
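One way to enable the API from a command line is with the gcloud CLI. The
following is a minimal sketch, assuming that gcloud is installed and
authenticated. PROJECT_ID is a placeholder, and the run helper is a
hypothetical convenience that only prints the command unless you set
DRY_RUN=0.

```shell
# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Enable the Analytics Hub API in the project given as the first argument.
enable_analytics_hub_api() {
  run gcloud services enable analyticshub.googleapis.com --project="$1"
}

# PROJECT_ID is a placeholder for the project that will host the clean room.
enable_analytics_hub_api PROJECT_ID
```

Keeping the default as a dry run lets you review the exact command before it
changes your project.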
Assign the Analytics Hub Admin role
Your data clean room owner must have the Analytics Hub Admin role
(roles/analyticshub.admin). To learn how to grant this role to other users,
see Create Analytics Hub administrators.
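As an illustration, a project-level grant of this role can be sketched with
the gcloud CLI. The project ID and the member address are placeholders, and
the run helper is a hypothetical wrapper that prints the command unless
DRY_RUN=0 is set.

```shell
# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Grant the Analytics Hub Admin role on a project to one principal.
grant_analytics_hub_admin() {
  project_id="$1"
  member="$2"  # for example: user:owner@example.com
  run gcloud projects add-iam-policy-binding "$project_id" \
    --member="$member" --role="roles/analyticshub.admin"
}

# Both arguments are placeholders.
grant_analytics_hub_admin PROJECT_ID user:owner@example.com
```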
Data clean room owner workflows
As a data clean room owner, you can do the following:
- Create a data clean room.
- Update data clean room properties.
- Delete a data clean room.
- Manage data contributors.
- Manage subscribers.
- Share a data clean room.
Additional data clean room owner permissions
You must have the Analytics Hub Admin role (roles/analyticshub.admin) on your
project to perform data clean room owner tasks.
Create a data clean room
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click Create clean room.
3. For Project, select the project for the data clean room. The
Analytics Hub API must be enabled for the project.
4. Specify the location, name, primary contact, icon (optional), and
description for the data clean room. Only resources that are in the same
region as the data clean room can be listed in the data clean room.
5. Click Create clean room.
6. Optional: In the Clean Room Permissions section, add other data clean
room owners, data contributors, or subscribers.
Update a data clean room
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that you want to update.
3. In the Details tab, click Edit clean room details.
4. Update the data clean room name, primary contact, icon, or description as
needed.
5. Click Save.
Delete a data clean room
1. In the Google Cloud console, go to the Analytics Hub page.
2. In the row of the data clean room that you want to delete, click
More actions > Delete.
3. To confirm, enter delete, and then click Delete. You can't undo this
action.
When you delete a data clean room, all the listings within it are deleted.
However, the shared resources and linked datasets are not deleted. The linked
datasets are unlinked from the source datasets, so querying resources in the
data clean room starts to fail for subscribers.
Manage data contributors
As a data clean room owner, you manage which users can add data to your data
clean rooms (your data contributors). To let a user add data to a data
clean room, grant them the Analytics Hub Publisher role
(roles/analyticshub.publisher) on a specific data clean room:
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that you want to grant
permissions to.
3. In the Details tab, click Set permissions.
4. Click Add principal.
5. For New principals, enter the usernames or emails of the data
contributors that you're adding.
6. For Select a role, select Analytics Hub > Analytics Hub Publisher.
7. Click Save.
You can delete and update data contributors at any time by clicking
Set permissions.
You can grant the Analytics Hub Publisher role for an entire project from the
IAM page, which gives a user permission to add data to any data clean room in
a project. However, we don't recommend this action, as it might result in
users having overly permissive access.
Manage subscribers
As a data clean room owner, you manage which users can subscribe to your data
clean rooms (your subscribers). To allow a user to subscribe to a data clean
room, grant them the Analytics Hub Subscriber (roles/analyticshub.subscriber)
and Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner)
roles on a specific data clean room:
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that you want to grant
permissions to.
3. In the Details tab, click Set permissions.
4. Click Add principal.
5. For New principals, enter the usernames or emails of the subscribers
that you're adding.
6. For Select a role, select Analytics Hub > Analytics Hub Subscriber.
7. Click Add another role.
8. For Select a role, select Analytics Hub > Analytics Hub Subscription Owner.
9. Click Save.
You can delete and update subscribers at any time by clicking
Set permissions.
You can grant the Analytics Hub Subscriber and Analytics Hub Subscription
Owner roles for an entire project from the IAM page, which gives a user
permission to subscribe to any data clean room in a project. However, we don't
recommend this action, as it might result in users having overly permissive
access.
Share a data clean room
You can directly share a data clean room with subscribers:
1. In the Google Cloud console, go to the Analytics Hub page.
2. In the row of the data clean room that you want to share, click
More actions > Copy share link.
3. Share the copied link with subscribers to let them view and subscribe to
the data clean room.
Data contributor workflows
As a data contributor, you can do the following:
- Add data to a data clean room by creating a listing.
- Update a listing.
- Delete a listing.
- Share a data clean room.
- Monitor listings.
Additional data contributor permissions
To perform data contributor tasks, you must have the Analytics Hub Publisher
role (roles/analyticshub.publisher) on a data clean room.
In addition, you need the bigquery.datasets.link permission for the datasets
that contain the resources that you want to list in a data clean room. You also
need the resourcemanager.organization.get permission if you want to view data
clean rooms in your organization that are not in your current project.
Create a listing (add data)
To prepare data with analysis rules and publish to a data clean room as a
listing, do the following:
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that you want to create a
listing in. If you're in a different organization than your data clean room
owner and the data clean room is not visible to you, ask the data clean room
owner for a direct link.
3. Click Add data.
4. For Select dataset and Table/view name, enter the table or view that
you want to list in the data clean room and its corresponding dataset. You
will add analysis rules to prevent raw access to this underlying data in a
few steps.
5. Select the columns of your resource that you want to publish.
6. Set the view name, primary contact, and description (optional) for the
listing.
7. Click Next.
8. Choose an analysis rule for your listing and configure the details.
9. Set data egress controls for the listing.
10. Click Next.
11. Review the data and analysis rule that you're adding to the data clean
room.
12. Click Add data. A view is created for your data and is added as a
listing to the data clean room. The source table or view itself isn't added.
By listing a resource in a data clean room, you grant all current and future data
clean room subscribers access to the data in your shared resource.
If you try to create a listing with a shared resource that doesn't have an
analysis rule, you're shown a warning that subscribers will be able to access
the raw data for that resource. If you confirm that you're willingly publishing
such resources without analysis rules, you can still create the listing.
If you get the Failed to save listing error, ensure that you have the
necessary permissions to perform data contributor tasks.
Update a listing
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that contains the listing.
3. In the row of the listing that you want to update, click
More actions > Edit listing.
4. Update the primary contact or description as needed.
5. Click Next.
6. Update the analysis rule as needed. You can only update the parameters of
the chosen rule. You can't switch to a different rule.
7. Click Next.
8. Review the listing and click Add data.
You can't change the source resource or data egress controls for a listing
after it's created.
Delete a listing
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room that contains the listing.
3. In the row of the listing that you want to delete, click
More actions > Delete listings.
4. To confirm, enter delete, and then click Delete. You can't undo this
action.
When you delete a listing, the shared resources and linked datasets are not
deleted. The linked datasets are unlinked from the source datasets, so querying
data in that listing starts to fail for subscribers.
Share a data clean room
You can directly share a data clean room with subscribers:
1. In the Google Cloud console, go to the Analytics Hub page.
2. In the row of the data clean room that you want to share, click
More actions > Copy share link.
3. Share the copied link with subscribers to let them view and subscribe to
the data clean room.
Monitor listings
You can view the usage metrics on the source datasets of the resources that you
share in a data clean room by querying the
INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
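As a rough sketch of such a usage query, the following function prints SQL
that summarizes jobs and rows processed per shared table. The region and
dataset ID are placeholders, and the column names (dataset_id, table_id,
job_id, num_rows_processed) are assumed from the view's published schema, so
verify them against the view reference before relying on the result.

```shell
# Print a query that summarizes subscriber usage of one shared dataset.
# Column names are assumptions about the view's schema; verify before use.
usage_query() {
  region="$1"      # for example: us
  dataset_id="$2"  # the source dataset that holds your shared resources
  cat <<EOF
SELECT
  dataset_id,
  table_id,
  COUNT(DISTINCT job_id) AS job_count,
  SUM(num_rows_processed) AS rows_processed
FROM \`region-$region\`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE dataset_id = '$dataset_id'
GROUP BY dataset_id, table_id
ORDER BY rows_processed DESC
EOF
}

# Show the query for a hypothetical dataset in the US multi-region.
usage_query us my_shared_dataset
```

You could run the printed query with
bq query --use_legacy_sql=false "$(usage_query us my_shared_dataset)".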
To view your listing subscribers, do the following:
1. In the Google Cloud console, go to the Analytics Hub page.
2. Click the display name of the data clean room.
3. In the row of a listing that you want to view, click
More actions > View subscriptions.
Subscriber workflows
A subscriber can view and subscribe to a data clean room. Subscribing to a data
clean room creates one linked dataset in the subscriber's project. Each linked
dataset has the same name as the data clean room.
You can't subscribe to a specific listing within a data clean room. You can only
subscribe to the data clean room itself.
Additional subscriber permissions
You must have the Analytics Hub Subscriber (roles/analyticshub.subscriber) and
Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner) roles
on a data clean room to perform subscriber tasks.
In addition, you need the bigquery.datasets.create permission in a project to
create a linked dataset when you subscribe to a clean room.
Subscribe to a data clean room
Subscribing to a data clean room gives you query access to the data in the
listings by creating a linked dataset in your project. To subscribe to a data
clean room, do the following:
1. In the Google Cloud console, go to the BigQuery page.
2. In the Explorer pane, click Add.
3. Select Analytics Hub. A discovery page opens.
4. To display the data clean rooms that you have access to, in the filters
list, select Clean rooms.
5. Click the data clean room that you want to subscribe to. A description page
of the data clean room opens.
6. Click Subscribe.
7. Select the destination project for the subscription and click Subscribe.
A linked dataset is now added to the project that you specified and is available
for query.
As a subscriber, you can edit some metadata of your linked datasets, such as
description and labels. You can also set permissions on your linked datasets.
However, changes to linked datasets don't affect the source datasets. You also
can't see view definitions.
Resources that are contained in linked datasets are read-only. As a subscriber,
you can't edit data or metadata for resources in linked datasets. You also can't
specify permissions for individual resources within the linked dataset.
To unsubscribe from the data clean room, delete your linked dataset.
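Deleting the linked dataset can also be done from a command line. The
following is a minimal sketch that uses the bq tool's rm command. The project
and dataset names are placeholders, and the run helper is a hypothetical
wrapper that only prints the command unless DRY_RUN=0 is set.

```shell
# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Delete a linked dataset, which ends the subscription that created it.
unsubscribe_from_clean_room() {
  project_id="$1"      # project that contains the linked dataset
  linked_dataset="$2"  # linked dataset created by the subscription
  run bq rm -r -d "$project_id:$linked_dataset"
}

# Both arguments are placeholders.
unsubscribe_from_clean_room PROJECT_ID campaign_analysis
```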
Query data in a linked dataset
To query data in a linked dataset, use the SELECT WITH AGGREGATION_THRESHOLD
syntax, which lets you run queries on analysis rule-enforced views. For an
example of this syntax, see Query an aggregation threshold analysis
rule-enforced view.
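For orientation, a query of this shape might look like the following sketch.
The function prints a statement that counts rows per campaign from a view in
a linked dataset. The view path and the campaign_id column are hypothetical,
and which columns and aggregations are permitted depends on the rule that the
data contributor configured.

```shell
# Print an aggregation threshold query against a view in a linked dataset.
# The view path and column name are hypothetical placeholders.
threshold_query() {
  view="$1"  # for example: my-project.campaign_analysis.publisher_conversions
  cat <<EOF
SELECT WITH AGGREGATION_THRESHOLD
  campaign_id,
  COUNT(*) AS conversions
FROM \`$view\`
GROUP BY campaign_id
EOF
}

# Show the query for a sample view; nothing is executed in BigQuery.
threshold_query my-project.campaign_analysis.publisher_conversions
```

You could run the printed query with
bq query --use_legacy_sql=false "$(threshold_query VIEW_PATH)", where
VIEW_PATH is the fully qualified name of the view in your linked dataset.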
Example scenario: Advertiser and publisher attribution analysis
An advertiser wants to track the effectiveness of its marketing campaigns. The
advertiser has first-party data on its customers, including their purchase
history, demographics, and interests. The publisher has data from its website,
including which ads were shown to visitors and their conversions.
The advertiser and publisher agree to use a data clean room to combine data and
measure the results of their campaigns. In this case, the publisher creates the
data clean room and makes their data available for the advertiser to perform the
analysis. The result is an attribution report that shows the advertiser which
ads were most effective in driving sales. The advertiser can then use this
information to improve its future marketing campaigns.
The advertiser and publisher orchestrate the BigQuery data clean
room through the following process:
Create the data clean room (publisher)
- A data clean room owner in the publisher organization enables the
Analytics Hub API in their BigQuery project and assigns User A as the data
clean room owner (Analytics Hub Admin).
- User A creates a data clean room called Campaign Analysis and assigns the
following permissions:
  - Data contributor (Analytics Hub Publisher): User B, a data
  engineer in the publisher organization.
  - Subscriber (Analytics Hub Subscriber and Subscription Owner):
  User C, a marketing analyst in the advertiser organization.
Add data to the data clean room (publisher)
- User B creates a new listing in the data clean room called
Publisher Conversion Data. As part of listing creation, a new view with
analysis rules is created.
Subscribe to the data clean room (advertiser)
- User C subscribes to the data clean room, which creates a linked dataset for
all listings in the data clean room, including the Publisher Conversion Data
listing.
- User C can now run aggregation queries to combine the data from this linked
dataset with their first-party data to measure the campaign effectiveness.
Entity resolution
Data clean room use cases often require linking entities across data contributor
and subscriber datasets that don't include a common identifier. Subscribers and
data contributors might represent the same records differently in multiple
datasets, either because datasets originate from different data sources or
because datasets use identifiers from different namespaces.
As a part of data preparation, entity resolution in BigQuery does the
following:
- For data contributors, it deduplicates and resolves records in their shared
resources by using identifiers from a common provider of their choice. This
process enables cross-contributor joins.
- For subscribers, it deduplicates and resolves records in their first-party
datasets and links to entities in data contributor datasets. This process
enables joins between subscriber and data contributor data.
To set up entity resolution with the identity provider of your choice, see
Configure and use entity resolution in BigQuery.
Discover data clean room assets
To find all the data clean rooms that you have access to, do the following:
- For data clean room owners and data contributors, in the
Google Cloud console, go to the Analytics Hub page. All the data clean rooms
that you can access are listed.
- For subscribers, do the following:
  1. In the Google Cloud console, go to the BigQuery page.
  2. In the Explorer pane, click Add.
  3. Select Analytics Hub. A discovery page opens.
  4. To display the data clean rooms that you have access to, in the filters
  list, select Clean rooms.
To find all the linked datasets created by data clean rooms in your project, run
the following commands in a command-line environment:
PROJECT=PROJECT_ID
# Skip the two header lines of the listing, then print each linked dataset.
for dataset in $(bq ls --project_id "$PROJECT" | tail -n +3); do
  bq show -d --project_id "$PROJECT" "$dataset" | grep -q LINKED && echo "$dataset"
done
Replace PROJECT_ID with the project that contains your linked datasets.
Pricing
Data contributors are only charged for data storage. Subscribers are only
charged for compute (analysis) when they run queries.
Subscribers must have non-edition offerings or the Enterprise Plus edition.
What's next