Use custom holidays in a time-series forecasting model
This tutorial shows you how to do the following tasks:
- Create an ARIMA_PLUS time-series forecasting model that uses only built-in holidays.
- Create an ARIMA_PLUS time-series forecasting model that uses custom holidays in addition to built-in holidays.
- Visualize the forecasted results from these models.
- Inspect a model to see which holidays it models.
- Evaluate the effects of the custom holidays on the forecasted results.
- Compare the performance of the model that uses only built-in holidays to the performance of the model that uses custom holidays in addition to built-in holidays.
This tutorial uses the bigquery-public-data.wikipedia.pageviews_* public tables.
Required permissions
- To create the dataset, you need the bigquery.datasets.create IAM permission.
- To create the connection resource, you need the following permissions:
  - bigquery.connections.create
  - bigquery.connections.get
- To create the model, you need the following permissions:
  - bigquery.jobs.create
  - bigquery.models.create
  - bigquery.models.getData
  - bigquery.models.updateData
  - bigquery.connections.delegate
- To run inference, you need the following permissions:
  - bigquery.models.getData
  - bigquery.jobs.create
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery: You incur costs for the data you process in BigQuery.
To generate a cost estimate based on your projected usage, use the pricing calculator.
New Google Cloud users might be eligible for a free trial.
For more information, see BigQuery pricing.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the BigQuery API.
Create a dataset
Create a BigQuery dataset to store your ML model:
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, click your project name.
- Click View actions > Create dataset.
- On the Create dataset page, do the following:
  - For Dataset ID, enter bqml_tutorial.
  - For Location type, select Multi-region, and then select US (multiple regions in United States). The public datasets are stored in the US multi-region. For simplicity, store your dataset in the same location.
  - Leave the remaining default settings as they are, and click Create dataset.
Prepare the time-series data
Aggregate the Wikipedia page view data for the Google I/O page into a single table, grouped by day:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
CREATE OR REPLACE TABLE `bqml_tutorial.googleio_page_views`
AS
SELECT
  DATETIME_TRUNC(datehour, DAY) AS date,
  SUM(views) AS views
FROM
  `bigquery-public-data.wikipedia.pageviews_*`
WHERE
  datehour >= '2017-01-01'
  AND datehour < '2023-01-01'
  AND title = 'Google_I/O'
GROUP BY
  DATETIME_TRUNC(datehour, DAY)
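Before training, it can help to confirm that the aggregated table covers the expected date range. The following query is a minimal sketch, assuming the table was created with the statement above. The output column names (first_day, last_day, day_count, total_views) are illustrative aliases, not part of the tutorial.

```sql
-- Sanity check on the aggregated training data.
-- Assumes `bqml_tutorial.googleio_page_views` exists with columns `date` and `views`.
SELECT
  MIN(date) AS first_day,      -- earliest day with recorded page views
  MAX(date) AS last_day,       -- latest day with recorded page views
  COUNT(*) AS day_count,       -- number of daily rows
  SUM(views) AS total_views    -- total page views across the whole range
FROM
  `bqml_tutorial.googleio_page_views`;
```

If day_count is lower than the number of calendar days between first_day and last_day, some days had no recorded views for this page and are missing from the table.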
Create a time-series forecasting model that uses built-in holidays
Create a model that forecasts daily page views for the Wikipedia "Google I/O" page, based on page view data before 2022 and taking built-in holidays into account:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    holiday_region = 'US',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'views',
    data_frequency = 'DAILY',
    horizon = 365)
AS
SELECT
  *
FROM
  `bqml_tutorial.googleio_page_views`
WHERE
  date < '2022-01-01';
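After the model is trained, you can optionally inspect what was fitted before looking at any forecasts. The following is a minimal sketch that uses the ML.ARIMA_EVALUATE function on the model created above. This step is not part of the original tutorial flow.

```sql
-- Inspect the fitted ARIMA_PLUS model.
-- The output is expected to include the ARIMA order, the detected seasonal
-- periods, and flags such as has_holiday_effect.
SELECT
  *
FROM
  ML.ARIMA_EVALUATE(MODEL `bqml_tutorial.forecast_googleio`);
```

A has_holiday_effect value of TRUE would suggest that the built-in US holidays were picked up during training.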
Visualize the forecasted results
After you create the model using built-in holidays, join the original data from the bqml_tutorial.googleio_page_views table with the forecasted value from the ML.EXPLAIN_FORECAST function, and then visualize it by using Looker Studio:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
SELECT
  original.date,
  original.views AS original_views,
  explain_forecast.time_series_adjusted_data
    AS adjusted_views_without_custom_holiday,
FROM
  `bqml_tutorial.googleio_page_views` original
INNER JOIN
  (
    SELECT
      *
    FROM
      ML.EXPLAIN_FORECAST(
        MODEL `bqml_tutorial.forecast_googleio`,
        STRUCT(365 AS horizon))
  ) explain_forecast
  ON
    TIMESTAMP(original.date)
    = explain_forecast.time_series_timestamp
ORDER BY
  original.date;
- In the Query results pane, click Explore data, and then click Explore with Looker Studio. Looker Studio opens in a new tab.
- In the Looker Studio tab, click Add a chart, and then click the time series chart.
- Place the chart on the report.
- On the Setup tab of the Chart pane, click Add metric and select adjusted_views_without_custom_holiday.
The chart looks similar to the following:
You can see that the forecasting model captures the general trend reasonably well. However, it isn't capturing the increased traffic related to previous Google I/O events, and it isn't able to generate an accurate forecast for 2022. The next sections show you how to deal with some of these limitations.
Create a time-series forecasting model that uses built-in holidays and custom holidays
As you can see in Google I/O history, the Google I/O event occurred on different dates between 2017 and 2022. To take this variation into account, create a model that forecasts page views for the Wikipedia "Google_I/O" page through 2022, based on page view data from before 2022, and using custom holidays to represent the Google I/O event each year. In this model, you also adjust the holiday effect window to cover three days around the event date (one day before and two days after), to better capture potential page traffic before and after the event.
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`
  OPTIONS (
    model_type = 'ARIMA_PLUS',
    holiday_region = 'US',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'views',
    data_frequency = 'DAILY',
    horizon = 365)
AS (
  training_data AS (
    SELECT *
    FROM `bqml_tutorial.googleio_page_views`
    WHERE date < '2022-01-01'
  ),
  custom_holiday AS (
    SELECT
      'US' AS region,
      'GoogleIO' AS holiday_name,
      primary_date,
      1 AS preholiday_days,
      2 AS postholiday_days
    FROM
      UNNEST(
        [
          DATE('2017-05-17'),
          DATE('2018-05-08'),
          DATE('2019-05-07'),
          -- cancelled in 2020 due to pandemic
          DATE('2021-05-18'),
          DATE('2022-05-11')])
        AS primary_date
  )
);
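To get a feel for which calendar days the holiday effect window touches, you can expand each primary date into its surrounding days. The following standalone query is only an illustration of the window arithmetic, assuming the window runs from preholiday_days before the primary date through postholiday_days after it. It is not how BigQuery ML computes the effect internally, and affected_date is a hypothetical alias introduced here.

```sql
-- Illustrative expansion of the holiday window for two of the event dates.
-- Uses 1 day before and 2 days after, matching the custom_holiday settings.
SELECT
  primary_date,
  affected_date
FROM
  UNNEST([DATE('2017-05-17'), DATE('2022-05-11')]) AS primary_date,
  UNNEST(
    GENERATE_DATE_ARRAY(
      DATE_SUB(primary_date, INTERVAL 1 DAY),    -- preholiday_days = 1
      DATE_ADD(primary_date, INTERVAL 2 DAY)))   -- postholiday_days = 2
    AS affected_date
ORDER BY
  primary_date,
  affected_date;
```

Under that assumption, each event contributes the primary date plus the three surrounding days, for example 2022-05-10 through 2022-05-13 for the 2022 event.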
Visualize the forecasted results
After you create the model using custom holidays, join the original data from the bqml_tutorial.googleio_page_views table with the forecasted value from the ML.EXPLAIN_FORECAST function, and then visualize it by using Looker Studio:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
SELECT
  original.date,
  original.views AS original_views,
  explain_forecast.time_series_adjusted_data
    AS adjusted_views_with_custom_holiday,
FROM
  `bqml_tutorial.googleio_page_views` original
INNER JOIN
  (
    SELECT
      *
    FROM
      ML.EXPLAIN_FORECAST(
        MODEL
          `bqml_tutorial.forecast_googleio_with_custom_holiday`,
        STRUCT(365 AS horizon))
  ) explain_forecast
  ON
    TIMESTAMP(original.date)
    = explain_forecast.time_series_timestamp
ORDER BY
  original.date;
- In the Query results pane, click Explore data, and then click Explore with Looker Studio. Looker Studio opens in a new tab.
- In the Looker Studio tab, click Add a chart, click the time series chart, and place the chart on the report.
- On the Setup tab of the Chart pane, click Add metric and select adjusted_views_with_custom_holiday.
The chart looks similar to the following:
As you can see, the custom holidays boosted the performance of the forecasting model. It now effectively captures the increase of page views caused by Google I/O.
Inspect the list of holidays that were taken into account during modeling by using the ML.HOLIDAY_INFO function:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
SELECT *
FROM
  ML.HOLIDAY_INFO(
    MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`);
The results show both Google I/O and the built-in holidays in the list of holidays:
Evaluate the effects of the custom holidays
Evaluate the effects of the custom holidays on the forecasted results by using the ML.EXPLAIN_FORECAST function:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
SELECT
  time_series_timestamp,
  holiday_effect_GoogleIO,
  holiday_effect_US_Juneteenth,
  holiday_effect_Christmas,
  holiday_effect_NewYear
FROM
  ML.EXPLAIN_FORECAST(
    MODEL
      `bqml_tutorial.forecast_googleio_with_custom_holiday`,
    STRUCT(365 AS horizon))
WHERE holiday_effect != 0;
The results show that Google I/O contributes a large amount of holiday effect to the forecasted results:
Use the ML.EVALUATE function to compare the performance of the first model created without custom holidays and the second model created with custom holidays. To see how the second model performs when it comes to forecasting a future custom holiday, set the time range to the week of Google I/O in 2022:
- Go to the BigQuery page.
- In the SQL editor pane, run the following SQL statement:
SELECT
  "original" AS model_type,
  *
FROM
  ML.EVALUATE(
    MODEL `bqml_tutorial.forecast_googleio`,
    (
      SELECT *
      FROM `bqml_tutorial.googleio_page_views`
      WHERE date >= '2022-05-08' AND date < '2022-05-12'
    ),
    STRUCT(365 AS horizon, TRUE AS perform_aggregation))
UNION ALL
SELECT
  "with_custom_holiday" AS model_type,
  *
FROM
  ML.EVALUATE(
    MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`,
    (
      SELECT *
      FROM `bqml_tutorial.googleio_page_views`
      WHERE date >= '2022-05-08' AND date < '2022-05-12'
    ),
    STRUCT(365 AS horizon, TRUE AS perform_aggregation));
The results show that the second model offers a significant performance improvement:
Clean up
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
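If you would rather keep the project and remove only the resources created in this tutorial, deleting the dataset is one option. The following statement is a sketch that assumes the dataset is named bqml_tutorial, as in this tutorial, and that it contains nothing you want to keep.

```sql
-- Delete the tutorial dataset and everything in it.
-- CASCADE also removes the googleio_page_views table and both forecasting models.
DROP SCHEMA IF EXISTS bqml_tutorial CASCADE;
```

This action can't be undone from the SQL editor, so check the dataset contents in the Explorer pane before you run it.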