QNN LPAI Backend Configuration Guide¶
Overview¶
This document outlines the structure and usage of LPAI backend configuration files employed by QNN tools such as qnn-net-run and qnn-context-binary-generator.
These JSON-formatted files enable fine-grained control over model preparation, runtime behavior, debugging, profiling, and internal backend features.
There are two primary JSON configuration files:
Backend Extension Configuration File: Specifies the path to the LPAI backend extension shared library and the path to the LPAI backend configuration file.
Example usage:
--config_file <path_to_backend_extension_JSON>

Example format:

{
  "backend_extensions": {
    "shared_library_path": "path_to_Lpai_extension_shared_library",
    "config_file_path": "path_to_Lpai_extension_config_file"
  }
}
LPAI Backend Configuration File: Defines all configurable parameters for model generation and execution. This file is parsed by the LPAI backend extension library.
Configuration Schema¶
The configuration is organized into the following sections:
- lpai_backend: Global backend settings.
- lpai_graph: Graph generation and execution parameters.
- lpai_profile: Profiling options (optional).
- lpai_debug: Debug options (optional).
Each section and its parameters are described below.
lpai_backend¶
- target_env (string): Target environment for model execution. Options: arm, adsp, x86. Default: adsp
- enable_hw_ver (string): Hardware version of the target; refer to Supported Snapdragon Devices. Options: v5, v5_1, v6. Default: v6
lpai_graph¶
execute: Used by qnn-net-run during runtime execution.
- fps (integer): Target frames per second. Default: 1
- ftrt_ratio (integer): Frame-to-real-time ratio. Default: 10
- client_type (string): Type of workload. Options: real_time, non_real_time. Default: real_time
- affinity (string): Core affinity policy. Options: soft, hard. Default: soft
- core_selection (integer): Specific core number. Default: 0
lpai_profile (Optional)¶
- level (string): Profiling level. Options: basic, detailed. Default: basic. See LPAI Profiling for details.
lpai_debug (Optional)¶
- force_nhwc (bool): Enforce NHWC tensor layout. Default: false
QNN LPAI Backend Configuration Parameters¶
fps and ftrt_ratio Information¶
These parameters define how a client configures its processing behavior for eNPU hardware.
- fps (Frames Per Second)
Specifies how frequently inference must be completed.
For example, fps = 10 means the system must process one frame every 100 milliseconds (i.e., 1000 ms / 10).
This sets the overall time budget for each frame, including pre-processing, inference, and post-processing.
- ftrt_ratio (Factor to Real-Time Ratio)
Determines the hardware configuration to meet the latency requirement for inference.
If pre- and post-processing take up most of the frame time (e.g., 80 ms out of 100 ms), only 20 ms remain for inference.
To ensure inference completes within this reduced time window, the eNPU must be boosted.
Setting ftrt_ratio = 50 applies a multiplication factor of 5.0 to the base clock frequency, helping the eNPU meet the tighter latency constraint.
- Default Values
fps = 1 (1 frame per second, allowing 1000 ms per frame)
ftrt_ratio = 10 (moderate clock scaling factor)
These defaults imply a relaxed processing schedule and a balanced performance-power tradeoff.
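The worked example above (a 10 fps budget where pre- and post-processing consume most of the frame time, so the eNPU is boosted with a 5.0x factor) would map to an lpai_graph fragment like the following. This is an illustrative sketch of the parameter combination, not a tuned configuration:

```json
{
  "lpai_graph": {
    "execute": {
      "fps": 10,
      "ftrt_ratio": 50,
      "client_type": "real_time"
    }
  }
}
```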
Realtime vs Non-Realtime client¶
Real-time: Indicates that the model is intended for real-time use cases, where a specific performance threshold must be met. If the required performance cannot be achieved, the finalize function will return an error.
Non-real-time: Refers to models without strict performance requirements. In these cases, LPAI will make a best-effort attempt to accommodate the workload, and finalize will not fail due to performance limitations.
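For a best-effort workload that should never fail finalize on performance grounds, the client type can be set explicitly. A minimal fragment:

```json
{
  "lpai_graph": {
    "execute": {
      "client_type": "non_real_time"
    }
  }
}
```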
Core Selection & Affinity¶
Clients can configure core selection and affinity settings for the eAI to control how their model’s offloaded operations (Ops) are assigned to processing cores.
If no settings are provided:
- Core Selection defaults to 0x00 (no specific preference; any available core may be selected).
- Affinity defaults to soft affinity.
Core Selection¶
coreSelection is a bitmask that specifies which core(s) are eligible for selection. Each bit represents a core:
- 0x01 → selects core 0
- 0x02 → selects core 1
- 0x00 → no specific preference; any available core may be selected
Important
Mixed core selection (e.g., 0x03 to select both core 0 and core 1) is not yet supported.
Platform-Specific Guidance¶
For platforms with only one processing core, users should configure:
- coreSelection = 0x00 (no specific preference), or
- coreSelection = 0x01 (explicitly select core 0)
This ensures compatibility and avoids undefined behavior due to unsupported multi-core selection.
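On a single-core platform, the guidance above corresponds to a fragment like the following. Note that core_selection is written in decimal here, since standard JSON has no hex literals; 1 corresponds to the 0x01 bitmask selecting core 0:

```json
{
  "lpai_graph": {
    "execute": {
      "affinity": "soft",
      "core_selection": 1
    }
  }
}
```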
Important
The API does not expose core characteristics (e.g., whether a core is “big” or “small”).
Users should consult platform documentation to determine core capabilities and make informed decisions about core selection and affinity strategy.
Affinity Strategy¶
Hard Affinity: Forces Ops to run only on the selected core.
Soft Affinity: Prefers the selected core but allows fallback to another if the preferred is busy.
Guidance Based on Core Behavior¶
| Scenario | Recommended coreSelection | Affinity Type | Rationale |
|---|---|---|---|
| Heavy compute workloads (e.g., large convNets) | 0x02 (core 1) | Hard or Soft | Core 1 is typically a big core, offering better performance |
| Audio use cases | 0x01 (core 0) | Soft | Core 0 (small core) is sufficient and more power-efficient |
| Camera use cases | 0x02 (core 1) | Soft | Core 1 provides faster inference for image processing |
| Shared workloads (audio + camera) | 0x00 (no preference) | Soft | Allows dynamic load balancing across cores |
| Power-sensitive applications | 0x01 (core 0) | Soft | Core 0 consumes less power |
| Performance-critical apps | 0x02 (core 1) | Hard | Ensures consistent execution on the high-performance core |
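As an illustration of the performance-critical case, pinning a client to core 1 with hard affinity could look like the fragment below (2 is the decimal form of the 0x02 bitmask, since standard JSON has no hex literals):

```json
{
  "lpai_graph": {
    "execute": {
      "affinity": "hard",
      "core_selection": 2
    }
  }
}
```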
System-Level Considerations¶
Core affinity should be tuned based on:
- System concurrency
- Workload characteristics
- KPI targets
- Power budget
- Profiling results
Core shutdown is not required: Idle cores are automatically power collapsed, ensuring efficient power management.
Runtime Layout Control in LPAI¶
Purpose¶
The force_nhwc option is a runtime configuration setting used in Qualcomm’s LPAI (Low Power AI) backend to enforce NHWC tensor layout during model execution.
Its primary role is to help avoid automatic layout transformations—specifically TRANSPOSE operations—around convolutional layers, which can negatively impact performance and profiling clarity.
Why It Matters¶
When executing models on the eNPU, layout transformations often appear around operations like Conv2D, especially at graph boundaries.
These transformations are inserted to reconcile differences between the model’s tensor layout (e.g., NHWC) and the eNPU’s internal hardware-native layout, which is typically blocked or tiled.
Even if a model is converted with NHWC input/output layouts and no output layout is explicitly forced, the runtime may still insert TRANSPOSE operations unless force_nhwc is enabled.
These transformations can dominate execution time on the DSP and obscure the performance of the actual accelerated operation.
Recommended Usage¶
To minimize or eliminate layout transformations at graph boundaries:
- Set input and output tensor layouts to NHWC during model conversion.
- Enable force_nhwc in the runtime configuration. This instructs the runtime to preserve NHWC layout and avoid inserting layout transforms.
- Avoid forcing the output layout during conversion, which can trigger post-processing transforms.
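Since force_nhwc lives in the optional lpai_debug section of the configuration schema, enabling it in the runtime configuration looks like:

```json
{
  "lpai_debug": {
    "force_nhwc": true
  }
}
```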
Limitations¶
- If force_nhwc is not enabled, layout transforms will likely appear even if the graph is NHWC.
- For single operations at graph boundaries, layout transforms may still occur due to the eNPU's internal layout requirements.
- To fully avoid layout transforms, it is often necessary to chain multiple eNPU-compatible operations, allowing the internal layout to be reused across ops without conversion.
Summary¶
force_nhwc is a critical setting for developers aiming to optimize LPAI model execution and profiling.
It ensures that NHWC layouts are respected at runtime, reducing overhead and improving clarity in performance analysis.
However, due to hardware constraints, some layout transforms may still be unavoidable unless multiple operations are chained together.
Full JSON Schema¶
Below is the complete schema of the LPAI backend configuration file with all supported parameters:
{
"lpai_backend": {
// Selection of targets [options: arm/adsp/x86] [default: adsp] (Simulator or target)
// Used by qnn-context-binary-generator during offline generation
"target_env": "adsp",
// Corresponds to the LPAI hardware version [options: v5/v5_1/v6] [default: v6]
// Used by qnn-context-binary-generator during offline generation
"enable_hw_ver": "v6"
},
"lpai_graph": {
"execute": {
// Specify the fps rate number, used for clock voting [options: number] [default: 1]
// Used by qnn-net-run during execution
"fps": {"type": "integer"},
// Specify the ftrt_ratio number [options: number] [default: 10]
// Used by qnn-net-run during execution
"ftrt_ratio": {"type": "integer"},
// Definition of client type [options: real_time/non_real_time] [default: real_time]
// Used by qnn-net-run during execution
"client_type": {"type": "string"},
// Definition of affinity type [options: soft/hard] [default: soft]
// Used by qnn-net-run during execution
"affinity": {"type": "string"},
// Specify the core number [options: number] [default: 0]
// Used by qnn-net-run during execution
"core_selection": {"type": "integer"}
}
},
"lpai_profile": {
// Profiling level [options: basic/detailed] [default: basic]
"level": {"type": "string"}
},
"lpai_debug": {
// Enforce NHWC tensor layout [options: true/false] [default: false]
"force_nhwc": {"type": "boolean"}
}
}
Full JSON Example¶
Below is a complete example of the LPAI backend configuration file with all supported parameters:
{
"lpai_backend": {
"target_env": "adsp",
"enable_hw_ver": "v6"
},
"lpai_graph": {
"execute": {
"fps": 1,
"ftrt_ratio": 10,
"client_type": "real_time",
"affinity": "soft",
"core_selection": 0
}
}
Best Practices¶
Minimal Changes: Use default values unless specific tuning is required.
Validation: Ensure all values conform to expected types and allowed options.
Version Compatibility: Refer to the Supported Snapdragon Devices for supported LPAI versions.