QNN LPAI Memory Management¶
This document describes how the QNN Low-Power AI (LPAI) runtime uses and manages memory. The runtime relies on user-allocated buffers that must obey backend-provided alignment constraints. Incorrect alignment or insufficient memory will cause initialization or execution failures.
Overview of Memory Types¶
The LPAI runtime uses three distinct memory pools, each required for correct graph execution:
Scratch Memory: temporary, overwritable tensors.
Persistent Memory: long-lived tensors such as RNN state.
IO Memory: input/output tensors; may be user-provided or automatically placed into scratch memory.
Each type has its own allocation rules, lifetime characteristics, and backend alignment requirements.
All memory pools must be correctly aligned according to backend requirements.
Get Memory Alignment Requirements¶
Before allocating any memory, clients must retrieve backend alignment constraints. These constraints apply to:
Scratch memory
Persistent memory
User-provided IO buffers
To query backend alignment requirements:
QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option   = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

QnnBackend_Property_t backendProp;
backendProp.option         = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

// NULL-terminated array of property pointers
QnnBackend_Property_t *backendPropPtrs[2] = {0};
backendPropPtrs[0] = &backendProp;

// Capture the return code so the result is read only on success
Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

if (error == QNN_SUCCESS) {
  *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
  *sizeAlignment      = bufferAlignmentReq.sizeAlignment;
}
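Once the two alignment values are known, every buffer handed to the runtime has to honor them. The following is a minimal sketch of one way to do that in portable C, not the SDK's own implementation. The names `align_up`, `allocate_aligned_memory`, and `free_aligned_memory` are hypothetical client-side helpers, and the sketch assumes that `sizeAlignment` means "the allocated size must be a multiple of this value" and that `startAddrAlignment` applies to the buffer's start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round value up to the next multiple of alignment (alignment must be non-zero). */
static size_t align_up(size_t value, size_t alignment) {
    assert(alignment != 0);
    size_t remainder = value % alignment;
    return remainder == 0 ? value : value + (alignment - remainder);
}

/* Hypothetical helper, not a QNN API: returns a buffer whose start address is a
 * multiple of startAddrAlignment and whose usable size is dataSize rounded up
 * to a multiple of sizeAlignment. Returns NULL on bad input or failure.
 * Release the buffer with free_aligned_memory(), never with free(). */
static void *allocate_aligned_memory(size_t startAddrAlignment,
                                     size_t sizeAlignment,
                                     size_t dataSize) {
    if (startAddrAlignment == 0 || sizeAlignment == 0 || dataSize == 0) {
        return NULL;
    }
    size_t paddedSize = align_up(dataSize, sizeAlignment);
    /* One pointer slot for bookkeeping plus the worst-case alignment shift. */
    size_t overhead = sizeof(void *) + startAddrAlignment - 1;
    if (paddedSize < dataSize || paddedSize > SIZE_MAX - overhead) {
        return NULL; /* size arithmetic would overflow */
    }
    unsigned char *raw = malloc(paddedSize + overhead);
    if (raw == NULL) {
        return NULL;
    }
    /* Find the first suitably aligned address after the bookkeeping slot. */
    uintptr_t first = (uintptr_t)raw + sizeof(void *);
    size_t shift = (size_t)((startAddrAlignment - first % startAddrAlignment) %
                            startAddrAlignment);
    unsigned char *aligned = raw + sizeof(void *) + shift;
    /* Store the original pointer just before the aligned block. */
    void *rawPtr = raw;
    memcpy(aligned - sizeof(void *), &rawPtr, sizeof(rawPtr));
    return aligned;
}

/* Free a buffer obtained from allocate_aligned_memory(). */
static void free_aligned_memory(void *aligned) {
    if (aligned == NULL) {
        return;
    }
    void *rawPtr;
    memcpy(&rawPtr, (unsigned char *)aligned - sizeof(void *), sizeof(rawPtr));
    free(rawPtr);
}
```

Over-allocating with `malloc` and shifting the pointer avoids depending on `aligned_alloc` or `posix_memalign`, which may not be available on every target toolchain. Where a platform allocator with alignment support exists, it is usually the simpler choice.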
Scratch Memory¶
Scratch memory holds temporary intermediate results that the runtime can overwrite and reuse during execution.
Key Properties¶
Used for intermediate tensors across graph execution.
Fully memory-planned offline by the backend.
Size must be queried from the graph.
Must be provided before QnnGraph_finalize().
May be replaced at runtime but must always exist.
Querying Scratch Memory Requirements¶
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
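// property must reference client-owned storage that receives the size;
// scratchSize is assumed here to be a pointer to that storage
customGraphProp.property = scratchSize;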
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;
QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Scratch Memory¶
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = scratchSize;
lpaiGraphMem.addr = scratchBuffer;
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;
QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
Persistent Memory¶
Persistent memory stores intermediate tensors that cannot be overwritten, because they must persist across operations. Examples include RNN state tensors.
Key Properties¶
Holds long-lived intermediate data.
User must allocate memory after querying required size.
Must follow backend alignment constraints.
Must remain valid until QnnContext_free().
Querying Persistent Memory Requirements¶
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
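// property must reference client-owned storage that receives the size;
// persistentSize is assumed here to be a pointer to that storage
customGraphProp.property = persistentSize;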
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;
QnnGraph_Property_t *graphPropPtrs[2] = {0};
graphPropPtrs[0] = &graphProp;
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Persistent Memory¶
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = persistentSize;
lpaiGraphMem.addr = persistentBuffer;
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;
QnnGraph_Config_t *graphCfgPtrs[2] = {0};
graphCfgPtrs[0] = &graphConfig;
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
IO Memory¶
IO memory contains all graph input and output tensors.
Key Properties¶
Can be user-provided or mapped into scratch memory by default.
User-provided IO buffers must follow alignment requirements.
Must remain valid during graph execution.
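When an IO buffer comes from somewhere other than an alignment-aware allocator (for example, a buffer owned by another subsystem), it can be worth checking it before handing it to the runtime. The sketch below shows one such check. `is_buffer_aligned` is a hypothetical helper, not part of the QNN API, and it assumes that the backend's size alignment requires the buffer size to be an exact multiple of `sizeAlignment`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a QNN API: returns true when a user-provided buffer
 * satisfies both alignment constraints reported by the backend.
 * Zero alignment values are treated as a programming error. */
static bool is_buffer_aligned(const void *addr, size_t size,
                              size_t startAddrAlignment, size_t sizeAlignment) {
    assert(startAddrAlignment != 0 && sizeAlignment != 0);
    if (addr == NULL || size == 0) {
        return false;
    }
    bool startOk = ((uintptr_t)addr % startAddrAlignment) == 0;
    bool sizeOk  = (size % sizeAlignment) == 0;
    return startOk && sizeOk;
}
```

A caller would typically run this check once per tensor, right after obtaining the alignment values from the backend, and fall back to copying into a correctly aligned buffer if it fails.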
Querying IO Memory Requirements¶
// QnnSystemInterface is defined in ${QNN_SDK_ROOT}/include/QNN/System/QnnSystemInterface.h
QnnSystemInterface qnnSystemInterface;
// Init qnn system interface ......
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code
// Extract QNN binaryInfo
const QnnSystemContext_BinaryInfo_t* binaryInfo;
Qnn_ContextBinarySize_t binaryInfoSize;
qnnSystemInterface->systemContextGetBinaryInfo(qnnSystemCtxHandle,
                                               contextBinaryBuffer,
                                               contextBinaryBufferSize,
                                               &binaryInfo,
                                               &binaryInfoSize);
// Extract graph info from QNN binaryInfo, assume only one graph in the context
QnnSystemContext_GraphInfo_t* graphInfos = binaryInfo->contextBinaryInfoV1.graphs;
QnnSystemContext_GraphInfo_t* graphInfo = &(graphInfos[0]);
// Extract tensor info from graphInfo
Qnn_Tensor_t* inputs = graphInfo->graphInfoV1.graphInputs;
Qnn_Tensor_t* outputs = graphInfo->graphInfoV1.graphOutputs;
size_t numInputs = graphInfo->graphInfoV1.numGraphInputs;
size_t numOutputs = graphInfo->graphInfoV1.numGraphOutputs;
Allocating and Configuring IO Memory¶
// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
Qnn_Tensor_t tensors[numTensors];
size_t startAddrAlignment, sizeAlignment
// Retrieve buffer start address and size alignment requirements
// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp/SampleAppLPAI code
for (uint32_t i = 0; i < numTensors; i++) {
Qnn_Tensor_t* tensor = &tensors[i];
tensor->v1.memType = QNN_TENSORMEMTYPE_RAW;
int dataSize = calculate_tensor_size(qnnTensor->v1);
tensor->v1.clientBuf.data =
allocate_aligned_memory(startAddrAlignment, sizeAlignment, dataSize);
tensor->v1.clientBuf.dataSize = dataSize;
}
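The loop above depends on knowing each tensor's size in bytes. A minimal sketch of that calculation follows, assuming a dense tensor whose size is the product of its dimensions multiplied by the element width. `tensor_byte_size` is a hypothetical, SDK-independent helper: the caller is assumed to pass the tensor's dimensions and rank, and to map the tensor's data type to a byte width itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a QNN API: byte size of a dense tensor.
 * dimensions       array of rank extents (may be NULL only when rank is 0)
 * bytesPerElement  element width chosen by the caller from the data type
 * Returns 0 if the result does not fit in 32 bits. */
static uint32_t tensor_byte_size(const uint32_t *dimensions, uint32_t rank,
                                 uint32_t bytesPerElement) {
    assert(rank == 0 || dimensions != NULL);
    /* Accumulate in 64 bits: two 32-bit factors cannot overflow a uint64_t. */
    uint64_t size = bytesPerElement;
    for (uint32_t i = 0; i < rank; i++) {
        size *= dimensions[i];
        if (size > UINT32_MAX) {
            return 0; /* too large for a 32-bit buffer size field */
        }
    }
    return (uint32_t)size;
}
```

For example, a 1 x 224 x 224 x 3 tensor with one byte per element yields 150528 bytes. Returning 0 on overflow lets the caller reject the tensor instead of allocating a buffer that is silently too small.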
Memory Lifetime and Allocation Requirements¶
Scratch and persistent memory must be allocated and provided before QnnGraph_finalize().
Persistent memory must remain accessible for the entire lifetime of the LPAI context.
Scratch memory may be replaced dynamically but must always exist.
IO memory must remain valid throughout execution.
Recommended Workflow¶
Query backend alignment requirements.
Query scratch memory size.
Query persistent memory size.
Allocate aligned memory buffers.
Pass scratch and persistent memory to the graph using QnnGraph_setConfig().
Call QnnGraph_finalize().
Optionally provide user-defined IO buffers.
Execute the graph.