This page describes best practices for implementing Neural Networks API (NNAPI)
drivers to allow for broad adoption of the NNAPI by app developers.
Keep startup times short
If your driver transforms the weights of a model on first use, make sure the
driver supports compilation caching, which reduces the compilation time when
an app starts. This is important because apps might avoid using hardware
acceleration if start-up times are too long. For example, some apps have
more than 100 MB of weights, and transforming them each time the app
launches is wasteful.
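One way to picture compilation caching is a cache keyed by the caching token the framework supplies with a model. The sketch below is illustrative C++ only: the names (`TransformWeights`, `CompilationCache`) are invented, the transformation is a dummy, and a real driver would persist transformed blobs to the cache files NNAPI provides rather than hold them in process memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Counts invocations so a cache hit is observable in this sketch.
static int g_transform_calls = 0;

// Hypothetical stand-in for an expensive vendor-specific weight
// transformation performed at compilation time.
std::vector<uint8_t> TransformWeights(const std::vector<uint8_t>& weights) {
    ++g_transform_calls;
    std::vector<uint8_t> blob(weights);
    for (auto& b : blob) b ^= 0xFF;  // dummy "compilation" work
    return blob;
}

// Minimal in-memory cache keyed by a token identifying the compiled model.
class CompilationCache {
  public:
    const std::vector<uint8_t>& Prepare(const std::string& token,
                                        const std::vector<uint8_t>& weights) {
        auto it = cache_.find(token);
        if (it == cache_.end()) {
            // First use: transform once and remember the result.
            it = cache_.emplace(token, TransformWeights(weights)).first;
        }
        // Later uses return the cached blob without re-transforming.
        return it->second;
    }

  private:
    std::map<std::string, std::vector<uint8_t>> cache_;
};
```

The point of the sketch is only the shape of the flow: the expensive transformation runs once per model, not once per app launch.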
Reduce minimal latency
To ensure that models use hardware acceleration, it's important to reduce the
minimal latency in drivers. Many apps use small models that are executed
multiple times. If the minimal latency to execute a workload is too high,
such as a few milliseconds, an app might run the workload on the CPU, which
takes only one or two milliseconds, instead of using hardware acceleration.
Be careful of costly thread synchronization.
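The trade-off above is simple break-even arithmetic: acceleration pays off only when the fixed per-execution driver overhead plus the accelerator compute time is less than the CPU compute time. A minimal illustration (all figures hypothetical, in milliseconds):

```cpp
#include <cassert>

// Break-even check for dispatching a workload to an accelerator:
// the fixed driver overhead plus accelerator compute time must beat
// the plain CPU compute time.
bool AcceleratorWins(double driver_overhead_ms,
                     double accel_compute_ms,
                     double cpu_compute_ms) {
    return driver_overhead_ms + accel_compute_ms < cpu_compute_ms;
}
```

With 3 ms of dispatch overhead, even a very fast accelerator loses to a 1-2 ms CPU run; with sub-millisecond overhead it wins, which is why shaving synchronization cost matters for small models.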
Use the NN HAL SchedTune group
In Android 11 and higher, AOSP includes a dedicated NN HAL SchedTune group
that allows interprocess NN HAL processes to use big cores, similar to
same-process execution within the predefined top-app cgroup. Using this
SchedTune group reduces driver overhead, especially for small models.
To use the SchedTune group, add the following line to the init.rc file of
the NN HAL process:
writepid /dev/stune/nnapi-hal/tasks
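As a sketch, that line would sit inside the driver's service definition in its init.rc. The service name, binary path, and user/group below are placeholders, not taken from this page; only the writepid line comes from the text above:

```
# Hypothetical init.rc service entry for a vendor NN HAL driver.
service vendor.nnapi-hal-example /vendor/bin/hw/vendor.nnapi-hal-example
    class hal
    user system
    group system
    writepid /dev/stune/nnapi-hal/tasks
```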