QNN LPAI Memory Management

This document describes how the QNN Low-Power AI (LPAI) runtime uses and manages memory. The runtime relies on user-allocated buffers that must obey backend-provided alignment constraints. Incorrect alignment or insufficient memory will cause initialization or execution failures.

Overview of Memory Types

The LPAI runtime uses three distinct memory pools, each required for correct graph execution:

  1. Scratch Memory

  2. Persistent Memory

  3. IO Memory

Each type has unique allocation rules, lifetime characteristics, and backend alignment requirements.

  • Scratch Memory: temporary, overwritable tensors.

  • Persistent Memory: long-lived tensors such as RNN state.

  • IO Memory: input/output tensors; may be user-provided or automatically placed into scratch memory.

All memory pools must be correctly aligned according to backend requirements.
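As a concrete illustration, size and start-address alignment can be checked with small helpers like the following (the helper names are illustrative, not part of the QNN API; backend-reported alignments are assumed to be powers of two):

```c
#include <stddef.h>
#include <stdint.h>

/* Round a size up to the next multiple of alignment.
 * Assumes alignment is a power of two, which is typical for
 * backend-reported constraints. */
static size_t align_up(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

/* Check that a buffer's start address satisfies the
 * start-address alignment requirement. */
static int is_start_aligned(const void *addr, size_t alignment) {
  return ((uintptr_t)addr % alignment) == 0;
}
```

For example, with a size alignment of 64 bytes, a requested 1000-byte buffer would be padded to 1024 bytes.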

Get Memory Alignment Requirements

Before allocating any memory, clients must retrieve backend alignment constraints. These constraints apply to:

  • Scratch memory

  • Persistent memory

  • User-provided IO buffers

To query backend alignment requirements:

QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option   = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

QnnBackend_Property_t backendProp;
backendProp.option         = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

QnnBackend_Property_t *backendPropPtrs[2] = {0};
backendPropPtrs[0] = &backendProp;

Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

if (error == QNN_SUCCESS) {
  *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
  *sizeAlignment      = bufferAlignmentReq.sizeAlignment;
}
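With the queried values in hand, a conforming buffer can then be allocated. A hypothetical wrapper around POSIX posix_memalign (the wrapper name is illustrative, not part of the QNN API) might look like:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Allocate a buffer whose start address is a multiple of
 * startAddrAlignment and whose size is padded up to a multiple of
 * sizeAlignment. Returns NULL on failure. */
static void *alloc_backend_buffer(size_t size,
                                  size_t startAddrAlignment,
                                  size_t sizeAlignment) {
  /* Pad the size to the backend's size granularity. */
  size_t paddedSize = (size + sizeAlignment - 1) / sizeAlignment * sizeAlignment;

  void *buf = NULL;
  /* posix_memalign requires the alignment to be a power of two and a
   * multiple of sizeof(void *). */
  if (posix_memalign(&buf, startAddrAlignment, paddedSize) != 0) {
    return NULL;
  }
  return buf;
}
```

Any allocator that honors both constraints works equally well; the buffer is released with free() when no longer needed.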

Scratch Memory

Scratch memory holds temporary intermediate results that the runtime can overwrite and reuse during execution.

Key Properties

  • Used for intermediate tensors across graph execution.

  • Fully memory-planned offline by the backend.

  • Size must be queried from the graph.

  • Must be provided before QnnGraph_finalize().

  • May be replaced at runtime but must always exist.

Querying Scratch Memory Requirements

uint32_t scratchSize = 0;

QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
customGraphProp.property = &scratchSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);

Allocating and Configuring Scratch Memory

QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size    = scratchSize;
lpaiGraphMem.addr    = scratchBuffer;

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);

Persistent Memory

Persistent memory stores intermediate tensors that cannot be overwritten, because they must persist across operations. Examples include RNN state tensors.

Key Properties

  • Holds long-lived intermediate data.

  • The user must allocate memory after querying the required size.

  • Must follow backend alignment constraints.

  • Must remain valid until QnnContext_free().

Querying Persistent Memory Requirements

uint32_t persistentSize = 0;

QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option   = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
customGraphProp.property = &persistentSize;

QnnGraph_Property_t graphProp;
graphProp.option         = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;

QnnGraph_getProperty(graphHandle, graphPropPtrs);

Allocating and Configuring Persistent Memory

QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size    = persistentSize;
lpaiGraphMem.addr    = persistentBuffer;

QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;

QnnGraph_Config_t graphConfig;
graphConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;

QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);

IO Memory

IO memory contains all graph input and output tensors.

Key Properties

  • May be user-provided; by default, IO tensors are placed into scratch memory.

  • User-provided IO buffers must follow alignment requirements.

  • Must remain valid during graph execution.

Querying IO Memory Requirements

// QnnSystemInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/System/QnnSystemInterface.h
QnnSystemInterface_t* qnnSystemInterface;

// Init qnn system interface ......
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code
// Extract QNN binaryInfo
const QnnSystemContext_BinaryInfo_t* binaryInfo;
Qnn_ContextBinarySize_t binaryInfoSize;
qnnSystemInterface->systemContextGetBinaryInfo(qnnSystemCtxHandle,
                                               contextBinaryBuffer,
                                               contextBinaryBufferSize,
                                               &binaryInfo,
                                               &binaryInfoSize);
// Extract graph info from QNN binaryInfo, assume only one graph in the context
QnnSystemContext_GraphInfo_t* graphInfos = binaryInfo->contextBinaryInfoV1.graphs;
QnnSystemContext_GraphInfo_t* graphInfo  = &(graphInfos[0]);

// Extract tensor info from graphInfo
Qnn_Tensor_t* inputs     = graphInfo->graphInfoV1.graphInputs;
Qnn_Tensor_t* outputs    = graphInfo->graphInfoV1.graphOutputs;
size_t numInputs         = graphInfo->graphInfoV1.numGraphInputs;
size_t numOutputs        = graphInfo->graphInfoV1.numGraphOutputs;

Allocating and Configuring IO Memory

// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
Qnn_Tensor_t tensors[numTensors];

size_t startAddrAlignment, sizeAlignment;
// Retrieve buffer start address and size alignment requirements
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code

for (uint32_t i = 0; i < numTensors; i++) {
   Qnn_Tensor_t* tensor = &tensors[i];
   tensor->v1.memType   = QNN_TENSORMEMTYPE_RAW;
   int dataSize         = calculate_tensor_size(tensor->v1);
   tensor->v1.clientBuf.data =
      allocate_aligned_memory(startAddrAlignment, sizeAlignment, dataSize);
   tensor->v1.clientBuf.dataSize = dataSize;
}
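The loop above calls two helpers that the SDK does not provide. Hypothetical sketches, parameterized here over raw dimensions and element size rather than the Qnn_Tensor_t structure used above, might look like:

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Byte size of a dense tensor: product of its dimensions times the
 * element size. (A real implementation would derive elemSize from the
 * tensor's Qnn_DataType_t.) */
static size_t calculate_tensor_size(const uint32_t *dims, uint32_t rank,
                                    size_t elemSize) {
  size_t n = elemSize;
  for (uint32_t i = 0; i < rank; i++) {
    n *= dims[i];
  }
  return n;
}

/* Aligned allocation honoring both backend constraints: the returned
 * address is a multiple of startAddrAlignment, and the usable size is
 * padded up to a multiple of sizeAlignment. */
static void *allocate_aligned_memory(size_t startAddrAlignment,
                                     size_t sizeAlignment, size_t size) {
  size_t padded = (size + sizeAlignment - 1) / sizeAlignment * sizeAlignment;
  void *buf = NULL;
  if (posix_memalign(&buf, startAddrAlignment, padded) != 0) {
    return NULL;
  }
  return buf;
}
```

For instance, a 1x224x224x3 uint8 tensor occupies 150528 bytes; buffers allocated this way satisfy both the start-address and size alignment constraints queried earlier.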

Shared Buffers in the LPAI Backend

In the LPAI backend, shared buffers offer an efficient mechanism for moving data between the host CPU and the LPAI accelerator without requiring additional memory copies. Shared buffers allow both domains to reference the same underlying memory, enabling:

  • Zero-copy tensor transfers

  • Reduced latency during graph execution

  • Avoidance of redundant CPU-to-accelerator buffer duplication

  • Improved overall memory efficiency

Shared buffers are especially valuable when frequently updating input tensors or retrieving output tensors at high frame rates.

Registering and using shared buffers within the LPAI backend involves dedicated registration API calls and additional memory constraints; see the ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code for a complete example.

Memory Lifetime and Allocation Requirements

  • Scratch and persistent memory must be allocated and provided before QnnGraph_finalize().

  • Persistent memory must remain accessible for the entire lifetime of the LPAI context.

  • Scratch memory may be replaced dynamically but must always exist.

  • IO memory must remain valid throughout execution.