Conversational Actions let you extend Google Assistant with your own
conversational interfaces that give users access to your products and
services. Actions leverage Assistant's powerful natural language
understanding (NLU) engine to process and understand natural language input
and carry out tasks based on that input.
Overview
A Conversational Action is a simple object that defines an
entry point (referred to as invocation) into a conversation:
- An invocation defines how users tell Assistant they want to start a
conversation with one of your Actions. An Action's invocation is defined by a
single intent that gets matched when users request the Action.
- A conversation defines how users interact with an Action after it's invoked.
You build conversations with intents, types, scenes, and prompts.
- In addition, your Actions can delegate extra work to fulfillment: web
services that communicate with your Actions via webhooks. Fulfillment lets you
validate data, call other web services, carry out business logic, and more.
You bundle one or many Actions together, based on the use cases that are
important for your users, into a logical container called an Actions project.
Your Actions project contains your entire invocation model (the collection of
all your invocations), which lets users start at logical places in your
conversation model (all the possible things users can say and all the possible
ways you respond back to users).
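The following minimal Python sketch models an Actions project as plain data to show how the container relates to its parts. The model is hypothetical and simplified, not the real project format. The `Intent` and `ActionsProject` classes, the `is_global` flag, and intent names such as `get_company_facts` are assumptions. A real project also holds types, scenes, prompts, and webhook settings. The sketch borrows the "Facts about Google" example from the Invocation section.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """An intent plus the training phrases that define its language model."""
    name: str
    training_phrases: list[str] = field(default_factory=list)
    is_global: bool = False  # global intents can also act as deep link invocations


@dataclass
class ActionsProject:
    """Hypothetical, simplified container for one or many Actions."""
    display_name: str
    intents: list[Intent] = field(default_factory=list)

    def invocation_model(self) -> list[str]:
        """Every intent that can start a conversation: MAIN plus all global intents."""
        return ["actions.intent.MAIN"] + [i.name for i in self.intents if i.is_global]


project = ActionsProject(
    display_name="Facts about Google",
    intents=[
        Intent("get_company_facts", ["get company facts"], is_global=True),
        Intent("get_history_facts", ["get history facts"], is_global=True),
        Intent("repeat", ["say that again"]),  # only usable mid-conversation in this model
    ],
)
print(project.invocation_model())
# ['actions.intent.MAIN', 'get_company_facts', 'get_history_facts']
```

In this model, the invocation model is just the main intent plus every global intent. All other intents belong to the conversation model and only come into play after a conversation has started.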
Invocation
Invocation is associated with a display name, which represents a brand,
name, or persona and lets users ask Assistant to invoke your Actions.
Users can say this display name on its own (called the main invocation) or in
combination with optional deep link phrases to invoke your Actions.
For example, users can say the following phrases to invoke three separate
Actions in a project with a display name of "Facts about Google":
- "Ok Google, talk to Facts about Google"
- "Ok Google, talk to Facts about Google to get company facts"
- "Ok Google, talk to Facts about Google to get history facts"
The first invocation in the example is the main invocation. This invocation is
associated with a special system intent named actions.intent.MAIN. The second
and third invocations are deep link invocations, which add phrases that let
users ask for specific functionality. These invocations correspond to user
intents that you designated as global. Each invocation in this example provides
an entry point into a conversation and corresponds to a single Action.
Figure 2 describes a typical main invocation flow:
- When users request an Action, they typically ask Assistant for it
by your display name.
- Assistant matches the user's request to the corresponding intent. In this
case, that intent is actions.intent.MAIN.
- The Action is notified of the intent match and responds with the
corresponding prompt to start a conversation with the user.
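The invocation flow above can be sketched in Python as a small router. This is a toy built on stated assumptions. It matches phrases exactly, whereas Assistant uses its NLU. The deep link table, the intent names `get_company_facts` and `get_history_facts`, and the prompt strings are all hypothetical. Only `actions.intent.MAIN` and the example phrases come from this page.

```python
import re
from typing import Optional

DISPLAY_NAME = "Facts about Google"

# Hypothetical global intents that serve as deep link invocations.
DEEP_LINKS = {
    "get company facts": "get_company_facts",
    "get history facts": "get_history_facts",
}

# Hypothetical opening prompt the Action sends for each invocation intent.
OPENING_PROMPTS = {
    "actions.intent.MAIN": "Welcome to Facts about Google! Company or history facts?",
    "get_company_facts": "Here's a fact about Google the company.",
    "get_history_facts": "Here's a fact about Google's history.",
}


def match_invocation(utterance: str) -> Optional[str]:
    """Map an invocation phrase to an intent name, or None if it doesn't name this Action."""
    text = re.sub(r"^ok google,?\s*", "", utterance.strip().lower())  # drop the hotword
    prefix = "talk to " + DISPLAY_NAME.lower()
    if not text.startswith(prefix):
        return None
    rest = text[len(prefix):].strip()
    if not rest:
        return "actions.intent.MAIN"  # display name on its own: main invocation
    rest = re.sub(r"^to\s+", "", rest)  # "to get history facts" -> "get history facts"
    return DEEP_LINKS.get(rest)  # deep link invocation, or None if unrecognized


def invoke(utterance: str) -> Optional[str]:
    """Match the invocation intent, then respond with that intent's opening prompt."""
    intent = match_invocation(utterance)
    return OPENING_PROMPTS[intent] if intent else None


print(invoke("Ok Google, talk to Facts about Google"))
```

The whole invocation model lives in a single table. The display name on its own routes to the main intent, and each additional phrase routes to exactly one global intent.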
Conversation
Conversation defines how users interact with an Action after it's invoked. You
build these interactions by defining the valid user input for your
conversation, the logic to process that input, and the corresponding prompts
to respond to the user with. The following figure and explanation show
how a typical conversation turn works with a conversation's low-level
components: intents, types, scenes, and prompts.
Figure 3 describes a typical conversation turn:
- When users say something, the Assistant NLU matches the input to an
appropriate intent. An intent is matched if the language model for that
intent can closely or exactly match the user input. You define the language
model by specifying training phrases, or examples of things users might want
to say. Assistant takes these training phrases and expands upon them to
create the intent's language model.
- When the Assistant NLU matches an intent, it can extract parameters that
you need from the input. These parameters have types associated with them,
such as a date or number. You annotate specific parts of an intent's training
phrases to specify what parameters you want to extract.
- A scene then processes the matched intent. You can think of scenes as the
logic executors of an Action, doing the heavy lifting and carrying out logic
necessary to drive a conversation forward. Scenes run in a loop, providing a
flexible execution lifecycle that lets you do things like validate intent
parameters, do slot filling, send prompts back to the user, and more.
- When a scene is done executing, it typically sends a prompt back to users
to continue the conversation, or it ends the conversation if appropriate.
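The conversation turn described above can be sketched end to end in Python. This is a hypothetical and heavily simplified stand-in. The `{name:type}` annotation syntax, the `TYPES` table, the intent names, and the slot rules are all assumptions. The matcher also only accepts exact (case-insensitive) matches of the training phrases, whereas Assistant's NLU expands the phrases so that close paraphrases match too. The sketch shows the shape of a turn: match an intent, extract typed parameters, then run one scene iteration that validates parameters, fills slots, and sends a prompt.

```python
import re
from typing import Optional

# Hypothetical types: a regex that recognizes a value, plus a converter for it.
TYPES = {
    "number": (r"\d+", int),
    "topic": (r"company|history", str.lower),
}

# Hypothetical intents; "{name:type}" annotates a parameter in a training phrase.
INTENTS = {
    "get_fact": ["tell me a {topic:topic} fact", "{topic:topic} facts",
                 "tell me a fact", "{topic:topic}"],
    "get_facts": ["tell me {count:number} {topic:topic} facts"],
}

ANNOTATION = re.compile(r"\{(\w+):(\w+)\}")


def compile_phrase(phrase: str) -> tuple[re.Pattern, dict[str, str]]:
    """Turn an annotated phrase into a regex with one named group per parameter."""
    parts = ANNOTATION.split(phrase)  # [literal, name, type, literal, name, type, ..., literal]
    pattern, param_types = re.escape(parts[0]), {}
    for i in range(1, len(parts), 3):
        name, type_name, literal = parts[i], parts[i + 1], parts[i + 2]
        param_types[name] = type_name
        pattern += f"(?P<{name}>{TYPES[type_name][0]})" + re.escape(literal)
    return re.compile(pattern, re.IGNORECASE), param_types


def match_intent(utterance: str) -> Optional[tuple[str, dict]]:
    """Exact-match stand-in for the NLU: return (intent name, typed params) or None."""
    for intent, phrases in INTENTS.items():
        for phrase in phrases:
            pattern, param_types = compile_phrase(phrase)
            m = pattern.fullmatch(utterance.strip())
            if m:
                return intent, {n: TYPES[t][1](m.group(n)) for n, t in param_types.items()}
    return None


def run_scene(slots: dict, utterance: str) -> tuple[str, bool]:
    """One iteration of a hypothetical scene loop; returns (prompt, end_conversation)."""
    matched = match_intent(utterance)
    if matched is None:
        return "Sorry, I didn't get that. Company or history facts?", False
    _, params = matched  # both intents feed this one scene, so only params matter
    # Validate parameters before accepting them into the scene's slots.
    if "count" in params and not 1 <= params["count"] <= 5:
        return "I can share between 1 and 5 facts at a time.", False
    slots.update(params)
    # Slot filling: keep prompting until the required "topic" slot has a value.
    if "topic" not in slots:
        return "Do you want company or history facts?", False
    return f"OK, {slots.get('count', 1)} {slots['topic']} fact(s) coming up.", True


slots: dict = {}
print(run_scene(slots, "tell me a fact"))  # ('Do you want company or history facts?', False)
print(run_scene(slots, "History"))         # ('OK, 1 history fact(s) coming up.', True)
```

The first call matches an intent but leaves the required topic slot empty, so the scene prompts for it. The second call fills the slot, and the scene ends the conversation.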
Fulfillment
During invocation or a conversation, your Action can trigger a webhook that
notifies a fulfillment service to carry out some tasks.
Figure 4 describes a common way to use fulfillment: generating prompts.
- At specific points of your Action's execution, it can trigger a webhook
that sends a request to a registered webhook handler (your fulfillment
service) with a JSON payload.
- Your fulfillment processes the request, for example by calling a REST API to
look up data or by validating data from the JSON payload. A very common
way to use fulfillment is to generate a dynamic prompt at runtime so your
conversations are more tailored to the current user.
- Your fulfillment returns a response containing a JSON payload to your Action.
Your Action can use the data from the payload to continue its execution and
respond to the user.
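A fulfillment handler of this kind can be sketched as a plain Python function. It takes the webhook's JSON body and returns a JSON response that contains a dynamic prompt. The field names (`handler.name`, `session.params`, `prompt.firstSimple.speech`) loosely follow the Actions Builder webhook format, but treat them as assumptions and check them against the webhook reference. The `get_fact` handler name, the `topic` parameter, and the `FACTS` table (a stand-in for a REST API lookup) are hypothetical. A real deployment would serve this function from an HTTPS endpoint registered as the project's webhook.

```python
import json

# Stand-in for a REST API or database lookup (hypothetical data).
FACTS = {
    "company": "Google's mission is to organize the world's information "
               "and make it universally accessible and useful.",
    "history": "Google was founded in 1998.",
}


def handle_webhook(body: str) -> str:
    """Process one webhook request and return a JSON response with a dynamic prompt."""
    request = json.loads(body)
    handler = request.get("handler", {}).get("name")
    session = request.get("session", {})
    params = session.get("params", {})

    if handler == "get_fact":
        # Build the prompt at runtime from a parameter the Action already collected.
        speech = FACTS.get(params.get("topic"), "Sorry, I don't have facts on that topic.")
    else:
        speech = "Sorry, I can't help with that yet."

    response = {
        # Return the session so the Action can continue with the same state.
        "session": {"id": session.get("id", ""), "params": params},
        "prompt": {"override": False, "firstSimple": {"speech": speech, "text": speech}},
    }
    return json.dumps(response)


request_body = json.dumps({
    "handler": {"name": "get_fact"},
    "session": {"id": "example-session", "params": {"topic": "history"}},
})
print(json.loads(handle_webhook(request_body))["prompt"]["firstSimple"]["speech"])
# Google was founded in 1998.
```

In this sketch, the handler only decides what the user hears next. It passes the session id and parameters back unchanged so the Action keeps its conversation state.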