QNN LPAI Backend Configuration Guide

Overview

This document outlines the structure and usage of LPAI backend configuration files employed by QNN tools such as qnn-net-run and qnn-context-binary-generator. These JSON-formatted files enable fine-grained control over model preparation, runtime behavior, debugging, profiling, and internal backend features.

There are two primary JSON configuration files:

  1. Backend Extension Configuration File: Specifies the path to the LPAI backend extension shared library and the path to the LPAI backend configuration file.

    Example usage: --config_file <path_to_backend_extension_JSON>

    Example format:

    {
        "backend_extensions" : {
            "shared_library_path" : "path_to_Lpai_extension_shared_library",
            "config_file_path" : "path_to_Lpai_extension_config_file"
        }
    }
    
  2. LPAI Backend Configuration File: Defines all configurable parameters for model generation and execution. This file is parsed by the LPAI backend extension library.

Configuration Schema

The configuration is organized into the following sections:

  • lpai_backend: Global backend settings.

  • lpai_graph : Graph generation and execution parameters.

  • lpai_profile: Profiling options (optional).

  • lpai_debug : Debug options (optional).

Each section and its parameters are described below.

lpai_backend

  • target_env (string):

    Target environment for model execution.

    Options: arm, adsp, x86. Default: adsp

  • enable_hw_ver (string):

    Hardware version of the target; refer to Supported Snapdragon Devices.

    Options: v5, v5_1, v6. Default: v6

lpai_graph

Graph generation and execution parameters (fps, ftrt_ratio, client_type, affinity, core_selection), described in detail under QNN LPAI Backend Configuration Parameters below.

lpai_profile (Optional)

  • level (string): Profiling level. Options: basic, detailed. Default: basic. See LPAI Profiling for details.

lpai_debug (Optional)

QNN LPAI Backend Configuration Parameters

fps and ftrt_ratio

These parameters define how a client configures its processing behavior for eNPU hardware.

  • fps (Frames Per Second)
    • Specifies how frequently inference must be completed.

    • For example, fps = 10 means the system must process one frame every 100 milliseconds (i.e., 1000 ms / 10).

    • This sets the overall time budget for each frame, including pre-processing, inference, and post-processing.

  • ftrt_ratio (Factor to Real-Time Ratio)
    • Determines the hardware configuration to meet the latency requirement for inference.

    • If pre- and post-processing take up most of the frame time (e.g., 80 ms out of 100 ms), only 20 ms remain for inference.

    • To ensure inference completes within this reduced time window, the eNPU must be boosted.

    • Setting ftrt_ratio = 50 applies a multiplication factor of 5.0 to the base clock frequency, helping the eNPU meet the tighter latency constraint.

  • Default Values
    • fps = 1 (1 frame per second, allowing 1000 ms per frame)

    • ftrt_ratio = 10 (a scaling factor of 1.0, i.e., the base clock frequency)

These defaults imply a relaxed processing schedule and a balanced performance-power tradeoff.
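The arithmetic above can be sketched as follows. This is illustrative Python only: the helper names are not part of any QNN API, and the ftrt_ratio-to-factor mapping is inferred from the ftrt_ratio = 50 → 5.0 example.

```python
# Illustrative sketch (not part of the QNN API): how fps and ftrt_ratio
# relate to the per-frame time budget and the eNPU clock scaling factor.

def frame_budget_ms(fps: int) -> float:
    """Total time budget per frame, in milliseconds."""
    return 1000.0 / fps

def clock_scale(ftrt_ratio: int) -> float:
    """Multiplication factor applied to the base clock frequency
    (inferred from ftrt_ratio = 50 -> factor 5.0)."""
    return ftrt_ratio / 10.0

print(frame_budget_ms(10))  # fps = 10 -> 100.0 ms per frame
print(clock_scale(50))      # ftrt_ratio = 50 -> 5.0x base clock
print(clock_scale(10))      # default ftrt_ratio = 10 -> 1.0x base clock
```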

Realtime vs Non-Realtime client

  • Real-time: Indicates that the model is intended for real-time use cases, where a specific performance threshold must be met. If the required performance cannot be achieved, the finalize function will return an error.

  • Non-real-time: Refers to models without strict performance requirements. In these cases, LPAI will make a best-effort attempt to accommodate the workload, and finalize will not fail due to performance limitations.
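For example, a client without strict deadlines can declare itself non-real-time via the client_type parameter, shown here as a minimal lpai_graph fragment (see the full example at the end of this document):

```json
{
   "lpai_graph": {
      "execute": {
         "client_type": "non_real_time"
      }
   }
}
```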

Core Selection & Affinity

Clients can configure core selection and affinity settings for the eAI to control how their model’s offloaded operations (Ops) are assigned to processing cores.

If no settings are provided:

  • Core Selection defaults to 0x00 (no specific preference — any available core may be selected).

  • Affinity defaults to soft affinity.

Core Selection

  • coreSelection is a bitmask that specifies which core(s) are eligible for selection.

  • Each bit represents a core:

    • 0x01 → selects core 0

    • 0x02 → selects core 1

    • 0x00 → no specific preference; any available core may be selected

Important

  • Mixed core selection (e.g., 0x03 to select both core 0 and core 1) is not yet supported.

Platform-Specific Guidance

For platforms with only one processing core, users should configure:

  • coreSelection = 0x00 (no specific preference), or

  • coreSelection = 0x01 (explicitly select core 0)

This ensures compatibility and avoids undefined behavior due to unsupported multi-core selection.
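The bitmask rules above can be sketched as follows. This is illustrative Python only; eligible_cores is a hypothetical helper, not part of the QNN API.

```python
# Illustrative helper (not part of the QNN API) showing the coreSelection
# bitmask semantics described above.

CORE_0 = 0x01
CORE_1 = 0x02
ANY_CORE = 0x00

def eligible_cores(core_selection: int) -> list[int]:
    """Return the core indices a coreSelection bitmask makes eligible.

    0x00 means "no specific preference": any available core may be chosen.
    Mixed selections (e.g., 0x03) are not yet supported.
    """
    if core_selection == ANY_CORE:
        return []  # no specific preference
    if bin(core_selection).count("1") > 1:
        raise ValueError("mixed core selection (e.g., 0x03) is not supported")
    return [core_selection.bit_length() - 1]

print(eligible_cores(0x01))  # [0] -> core 0
print(eligible_cores(0x02))  # [1] -> core 1
```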

Important

  • The API does not expose core characteristics (e.g., whether a core is “big” or “small”).

  • Users should consult platform documentation to determine core capabilities and make informed decisions about core selection and affinity strategy.

Affinity Strategy

  • Hard Affinity: Forces Ops to run only on the selected core.

  • Soft Affinity: Prefers the selected core but allows fallback to another if the preferred is busy.

Guidance Based on Core Behavior

Scenario | Recommended coreSelection | Affinity Type | Rationale
--- | --- | --- | ---
Heavy compute workloads (e.g., large convNets) | 0x02 (Core 1) | Hard or Soft | Core 1 is typically a big core, offering better performance
Audio use cases | 0x01 (Core 0) | Soft | Core 0 (small core) is sufficient and more power-efficient
Camera use cases | 0x02 (Core 1) | Soft | Core 1 provides faster inference for image processing
Shared workloads (audio + camera) | 0x00 (Any) | Soft | Allows dynamic load balancing across cores
Power-sensitive applications | 0x01 (Core 0) | Soft | Core 0 consumes less power
Performance-critical apps | 0x02 (Core 1) | Hard | Ensures consistent execution on the high-performance core
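As an illustration, the performance-critical scenario above could map to the following execute settings. This sketch assumes core_selection takes the bitmask value described earlier (so 2 selects core 1); verify this against your platform documentation.

```json
{
   "lpai_graph": {
      "execute": {
         "client_type": "real_time",
         "affinity": "hard",
         "core_selection": 2
      }
   }
}
```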

System-Level Considerations

  • Core affinity should be tuned based on:

    • System concurrency

    • Workload characteristics

    • KPI targets

    • Power budget

    • Profiling results

  • Core shutdown is not required: Idle cores are automatically power collapsed, ensuring efficient power management.

Runtime Layout Control in LPAI

Purpose

The force_nhwc option is a runtime configuration setting used in Qualcomm’s LPAI (Low Power AI) backend to enforce NHWC tensor layout during model execution. Its primary role is to help avoid automatic layout transformations—specifically TRANSPOSE operations—around convolutional layers, which can negatively impact performance and profiling clarity.

Why It Matters

When executing models on the eNPU, layout transformations often appear around operations like Conv2D, especially at graph boundaries. These transformations are inserted to reconcile differences between the model’s tensor layout (e.g., NHWC) and the eNPU’s internal hardware-native layout, which is typically blocked or tiled.

Even if a model is converted with NHWC input/output layouts and no output layout is explicitly forced, the runtime may still insert TRANSPOSE operations unless force_nhwc is enabled. These transformations can dominate execution time on the DSP and obscure the performance of the actual accelerated operation.
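As an illustration only: force_nhwc is described as a runtime configuration setting, but the schema in this document does not show where it lives, so the placement below is an assumption, not a confirmed key location. Consult the backend documentation before using it.

```json
{
   "lpai_graph": {
      "execute": {
         "force_nhwc": true
      }
   }
}
```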

Limitations

  • If force_nhwc is not enabled, layout transforms will likely appear even if the graph is NHWC.

  • For single operations at graph boundaries, layout transforms may still occur due to the eNPU’s internal layout requirements.

  • To fully avoid layout transforms, it is often necessary to chain multiple eNPU-compatible operations, allowing the internal layout to be reused across ops without conversion.

Summary

force_nhwc is a critical setting for developers aiming to optimize LPAI model execution and profiling. It ensures that NHWC layouts are respected at runtime, reducing overhead and improving clarity in performance analysis. However, due to hardware constraints, some layout transforms may still be unavoidable unless multiple operations are chained together.

Full JSON Schema

Below is the complete schema of the LPAI backend configuration file with all supported parameters:

{
   "lpai_backend": {

      // Selection of targets [options: arm/adsp/x86] [default: adsp] (Simulator or target)
      // Used by qnn-context-binary-generator during offline generation
      "target_env": "adsp",

      // Corresponds to the LPAI hardware version [options: v5/v5_1/v6] [default: v6]
      // Used by qnn-context-binary-generator during offline generation
      "enable_hw_ver": "v6"
   },
   "lpai_graph": {
      "execute": {

         // Specify the fps rate number, used for clock voting [options: number] [default: 1]
         // Used by qnn-net-run during execution
         "fps": {"type": "integer"},

         // Specify the ftrt_ratio number [options: number] [default: 10]
         // Used by qnn-net-run during execution
         "ftrt_ratio": {"type": "integer"},

         // Definition of client type [options: real_time/non_real_time] [default: real_time]
         // Used by qnn-net-run during execution
         "client_type": {"type": "string"},

         // Definition of affinity type [options: soft/hard] [default: soft]
         // Used by qnn-net-run during execution
         "affinity": {"type": "string"},

         // Specify the core number [options: number] [default: 0]
         // Used by qnn-net-run during execution
         "core_selection": {"type": "integer"}
      }
   }
}
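As a sketch, a parsed execute section can be sanity-checked against the types and options listed above. This is an unofficial helper with assumed names, not part of the QNN tooling.

```python
# Unofficial sanity check for the lpai_graph.execute section, based on the
# types and allowed options in the schema above.

ALLOWED = {
    "fps": int,                                    # default: 1
    "ftrt_ratio": int,                             # default: 10
    "core_selection": int,                         # default: 0
    "client_type": {"real_time", "non_real_time"}, # default: real_time
    "affinity": {"soft", "hard"},                  # default: soft
}

def validate_execute(execute: dict) -> list[str]:
    """Return a list of problems found (empty list means the section is valid)."""
    problems = []
    for key, value in execute.items():
        rule = ALLOWED.get(key)
        if rule is None:
            problems.append(f"unknown key: {key}")
        elif isinstance(rule, set):
            if value not in rule:
                problems.append(f"{key}: {value!r} not in {sorted(rule)}")
        elif not isinstance(value, rule):
            problems.append(f"{key}: expected {rule.__name__}")
    return problems

cfg = {"fps": 1, "ftrt_ratio": 10, "client_type": "real_time", "affinity": "soft"}
print(validate_execute(cfg))  # []
```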

Full JSON Example

Below is a complete example of the LPAI backend configuration file with all supported parameters:

{
   "lpai_backend": {
      "target_env": "adsp",
      "enable_hw_ver": "v6"
   },
   "lpai_graph": {
      "execute": {
         "fps": 1,
         "ftrt_ratio": 10,
         "client_type": "real_time",
         "affinity": "soft",
         "core_selection": 0
      }
   }
}

Best Practices

  • Minimal Changes: Use default values unless specific tuning is required.

  • Validation: Ensure all values conform to expected types and allowed options.

  • Version Compatibility: Refer to the Supported Snapdragon Devices for supported LPAI versions.