QNN LPAI Memory Management

This document describes how the QNN Low-Power AI (LPAI) runtime uses and manages memory. The runtime relies on user-allocated buffers that must obey backend-provided alignment constraints. Incorrect alignment or insufficient memory will cause initialization or execution failures.

Overview of Memory Types

The LPAI runtime uses three distinct memory pools, each required for correct graph execution:

  1. Scratch Memory

  2. Persistent Memory

  3. IO Memory

Each type has unique allocation rules, lifetime characteristics, and backend alignment requirements.

  • Scratch Memory: temporary, overwritable tensors.

  • Persistent Memory: long-lived tensors such as RNN state.

  • IO Memory: input/output tensors; may be user-provided or automatically placed into scratch memory.

All memory pools must be correctly aligned according to backend requirements.
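As a concrete illustration, size and start-address alignment can be checked with small helpers like the following (the helper names are illustrative, not part of the QNN API; backend-reported alignments are assumed to be powers of two):

```c
#include <stddef.h>
#include <stdint.h>

/* Round a size up to the next multiple of alignment.
 * Assumes alignment is a power of two, which is typical for
 * backend-reported constraints. */
static size_t align_up(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

/* Check that a buffer's start address satisfies the
 * start-address alignment requirement. */
static int is_start_aligned(const void *addr, size_t alignment) {
  return ((uintptr_t)addr % alignment) == 0;
}
```

For example, with a size alignment of 64 bytes, a requested 1000-byte buffer would be padded to 1024 bytes.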

Get Memory Alignment Requirements

Before allocating any memory, clients must retrieve backend alignment constraints. These constraints apply to:

  • Scratch memory

  • Persistent memory

  • User-provided IO buffers

To query backend alignment requirements:

QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option   = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

QnnBackend_Property_t backendProp;
backendProp.option         = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

QnnBackend_Property_t *backendPropPtrs[2] = {0};
backendPropPtrs[0] = &backendProp;

Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

if (error == QNN_SUCCESS) {
  *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
  *sizeAlignment      = bufferAlignmentReq.sizeAlignment;
}
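With the queried values in hand, a conforming buffer can then be allocated. A hypothetical wrapper around POSIX posix_memalign (the wrapper name is illustrative, not part of the QNN API) might look like:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Allocate a buffer whose start address is a multiple of
 * startAddrAlignment and whose size is padded up to a multiple of
 * sizeAlignment. Returns NULL on failure. */
static void *alloc_backend_buffer(size_t size,
                                  size_t startAddrAlignment,
                                  size_t sizeAlignment) {
  /* Pad the size to the backend's size granularity. */
  size_t paddedSize = (size + sizeAlignment - 1) / sizeAlignment * sizeAlignment;

  void *buf = NULL;
  /* posix_memalign requires the alignment to be a power of two and a
   * multiple of sizeof(void *). */
  if (posix_memalign(&buf, startAddrAlignment, paddedSize) != 0) {
    return NULL;
  }
  return buf;
}
```

Any allocator that honors both constraints works equally well; the buffer is released with free() when no longer needed.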

Scratch Memory

Scratch memory holds temporary intermediate results that the runtime can overwrite and reuse during execution.

Key Properties

  • Used for intermediate tensors across graph execution.

  • Fully memory-planned offline by the backend.

  • Size must be queried from the graph.

  • Must be provided before QnnGraph_finalize().

  • May be replaced at runtime but must always exist.

Querying Scratch Memory Requirements

uint32_t scratchSize = 0;

QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
customGraphProp.property = &scratchSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);

Allocating and Configuring Scratch Memory

QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size    = scratchSize;
lpaiGraphMem.addr    = scratchBuffer;

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);

Persistent Memory

Persistent memory stores intermediate tensors that cannot be overwritten, because they must persist across operations. Examples include RNN state tensors.

Key Properties

  • Holds long-lived intermediate data.

  • The user must allocate memory after querying the required size.

  • Must follow backend alignment constraints.

  • Must remain valid until QnnContext_free().

Querying Persistent Memory Requirements

uint32_t persistentSize = 0;

QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
customGraphProp.property = &persistentSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);

Allocating and Configuring Persistent Memory

QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size    = persistentSize;
lpaiGraphMem.addr    = persistentBuffer;

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);

IO Memory

IO memory contains all graph input and output tensors.

Key Properties

  • May be user-provided; by default, IO tensors are placed into scratch memory.

  • User-provided IO buffers must follow alignment requirements.

  • Must remain valid during graph execution.

Querying IO Memory Requirements

// QnnSystemInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/System/QnnSystemInterface.h
QnnSystemInterface_t* qnnSystemInterface;

// Init qnn system interface ......
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code
// Extract QNN binaryInfo
const QnnSystemContext_BinaryInfo_t* binaryInfo;
Qnn_ContextBinarySize_t binaryInfoSize;
qnnSystemInterface->systemContextGetBinaryInfo(qnnSystemCtxHandle,
                                               contextBinaryBuffer,
                                               contextBinaryBufferSize,
                                               &binaryInfo,
                                               &binaryInfoSize);
// Extract graph info from QNN binaryInfo, assume only one graph in the context
QnnSystemContext_GraphInfo_t* graphInfos = binaryInfo->contextBinaryInfoV1.graphs;
QnnSystemContext_GraphInfo_t* graphInfo  = &(graphInfos[0]);

// Extract tensor info from graphInfo
Qnn_Tensor_t* inputs     = graphInfo->graphInfoV1.graphInputs;
Qnn_Tensor_t* outputs    = graphInfo->graphInfoV1.graphOutputs;
size_t numInputs         = graphInfo->graphInfoV1.numGraphInputs;
size_t numOutputs        = graphInfo->graphInfoV1.numGraphOutputs;

Allocating and Configuring IO Memory

// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
Qnn_Tensor_t tensors[numTensors];

size_t startAddrAlignment, sizeAlignment;
// Retrieve buffer start address and size alignment requirements
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code

for (uint32_t i = 0; i < numTensors; i++) {
   Qnn_Tensor_t* tensor = &tensors[i];
   tensor->v1.memType   = QNN_TENSORMEMTYPE_RAW;
   int dataSize         = calculate_tensor_size(tensor->v1);
   tensor->v1.clientBuf.data =
      allocate_aligned_memory(startAddrAlignment, sizeAlignment, dataSize);
   tensor->v1.clientBuf.dataSize = dataSize;
}
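The loop above calls two helpers that the SDK does not provide. Hypothetical sketches, parameterized here over raw dimensions and element size rather than the Qnn_Tensor_t structure used above, might look like:

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Byte size of a dense tensor: product of its dimensions times the
 * element size. (A real implementation would derive elemSize from the
 * tensor's Qnn_DataType_t.) */
static size_t calculate_tensor_size(const uint32_t *dims, uint32_t rank,
                                    size_t elemSize) {
  size_t n = elemSize;
  for (uint32_t i = 0; i < rank; i++) {
    n *= dims[i];
  }
  return n;
}

/* Aligned allocation honoring both backend constraints: the returned
 * address is a multiple of startAddrAlignment, and the usable size is
 * padded up to a multiple of sizeAlignment. */
static void *allocate_aligned_memory(size_t startAddrAlignment,
                                     size_t sizeAlignment, size_t size) {
  size_t padded = (size + sizeAlignment - 1) / sizeAlignment * sizeAlignment;
  void *buf = NULL;
  if (posix_memalign(&buf, startAddrAlignment, padded) != 0) {
    return NULL;
  }
  return buf;
}
```

For instance, a 1x224x224x3 uint8 tensor occupies 150528 bytes; buffers allocated this way satisfy both the start-address and size alignment constraints queried earlier.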

Shared Buffers in the LPAI Backend

In the LPAI backend, shared buffers offer an efficient mechanism for moving data between the host CPU and the LPAI accelerator without requiring additional memory copies. Shared buffers allow both domains to reference the same underlying memory, enabling:

  • Zero-copy tensor transfers

  • Reduced latency during graph execution

  • Avoidance of redundant CPU-to-accelerator buffer duplication

  • Improved overall memory efficiency

Shared buffers are especially valuable when frequently updating input tensors or retrieving output tensors at high frame rates.

Registering and using shared buffers within the LPAI backend involves dedicated registration API calls and additional memory constraints; see the ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code for a complete example.

Memory Lifetime and Allocation Requirements

  • Scratch and persistent memory must be allocated and provided before QnnGraph_finalize().

  • Persistent memory must remain accessible for the entire lifetime of the LPAI context.

  • Scratch memory may be replaced dynamically but must always exist.

  • IO memory must remain valid throughout execution.