Revision History

This page contains the revision history starting from QAIRT SDK v2.34.0. For the revision history of earlier (QNN SDK) releases, refer to ReleaseNotes.txt in QNN_SDK_ROOT.


2.44.0 (Feb 2026)

  • Core: Added --deferred_init support to the SNPE libraries and snpe-net-run. {158344}
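As a sketch, the new flag would be passed to snpe-net-run like any other option. The container and input-list names below are placeholders, and the command is only assembled and printed here, not executed, since snpe-net-run requires the QAIRT SDK and a target device:

```shell
# Illustrative only: model.dlc and input_list.txt are placeholder names.
# The command is echoed rather than run.
cmd="snpe-net-run --container model.dlc --input_list input_list.txt --deferred_init"
echo "$cmd"
```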

  • CPU: Enhanced MatMul kernel reliability on ARM Cortex-A55 cores by rerouting to an optimized implementation. {165001}

  • Genie: Added the genie-app source code as an example in the SDK. {123560}

  • OpDef: Added Op definition for ElementWiseMux. {166667}

  • API:Genie: Added GenieNode_train and GenieNode_saveLora APIs. {155439}

  • Op:HTP: Added support for the ONNX RotaryEmbedding (RoPE) Op. {147230}

  • Op:HTP: Enhanced PoolMax2d Op edge-window handling when rounding_mode is set to ceil. {160577}

  • Tool:Converter: Added a new graph optimization for the GroupNorm Op that reshapes wide tensors to a more performant format, improving HTP performance. {159606}

  • Tool:Converter: Implemented a new remove_disconnected_nodes optimization pass to automatically prune unused nodes from the graph, which can help reduce model size. {158568}

  • Genie: Fixed an issue where the genie-app tool could return a success code even if an underlying Genie API call failed. {166170}

  • Genie: Fixed an issue where the token query API would always fail for dialogs with embedding LUT encoders. {167327}

  • GPU: Fixed graph prepare failures on certain Adreno GPU tiers caused by incorrect compatibility checks for image2darray. {165815}

  • HTP: Fixed a timeout in multi-threaded scenarios caused by a resource hang during cooperative pre-emption. {164978}

  • HTP: Fixed an issue where grouped LoRA adapter application failed on WoS. {161545}

  • LPAI: Fixed incorrect per-layer profiling time data caused by timestamp truncation. {165829}

  • SDK: Fixed an issue in the QNN sample app where graph finalization was unconditionally called on deserialized graphs, causing failures on backends that do not support this step. {163438}

  • SNPE: Fixed a crash that occurred when running the SNPE sample application on certain hardware targets. {165960}

  • Op:HTP: Fixed a context binary creation failure caused by missing INT16 to INT32 casting support in the Cast Op. {152878}

  • Op:HTP: Fixed an accuracy issue in the GatherND kernel. {163397}

  • Tool:Converter: Fixed a data type inference issue affecting multiple Ops during conversion, particularly when using 16-bit float precision. {164079}

  • Tool:Converter: Fixed an issue in the FoldMultipleTranspose pass where consecutive Transpose Ops were incorrectly pruned when their combined permutation was not an identity. {164340}

  • Tool:Converter: Fixed an issue in the matmul_to_fc optimization where the buffer shape of the bias was not correctly updated, causing conversion failures. {164520}

  • Tool:Converter: Fixed an issue in the Cast Op translation where BF16 data types were not handled correctly. {164517}

  • Tool:Converter: Fixed an issue where a graph output was incorrectly pruned by the remove-disconnected-nodes pass after LSTM unrolling. {166306}

  • Tool:Converter: Fixed an issue with Concat Op quantization where its output bit-width was incorrectly calculated, leading to model conversion failures. {159401}

  • Tool:Converter: Resolved an issue where model conversion would fail during BatchNormalization Op validation for some BF16 models. {165682}

  • Tool:Quantizer: Fixed a regression in a graph optimization pass that caused model conversion failures for models with a Transpose op of 7 or more dimensions with non-consecutive axes. {164567}

2.43.0 (Jan 2026)

  • API:HTP: Introduced a new Python API, tuner.optimize, to enable compiler option tuning to control tiling granularity for HTP backends. {127018}

  • CPU: Added support for the RoPE Op in the CPU backend. {147231}

  • Docs: Added documentation with instructions for executing models on the LPAI backend on Windows on Snapdragon (WoS) platforms. {156078}

  • Genie: Added support for weight-shared LoRA adapters via the weight-shared-lora JSON configuration option. {155687}

  • Genie: Added the GenieDialog_getValue API and GENIE_DIALOG_PARAM_CONTEXT_OCCUPANCY option. {145722}

  • Genie: Added the dialog rewindQuery and dialog setStopSequence commands to the genie-app tool. {163223}

  • Op:HTP: Added support for 5D GeLU Op for FP16 and FP32 data types on the HTP backend. {138861}

  • Op:LPAI: Added support for 32-bit integer data types for the Split and Concat Ops on the v5 platform. {164355}

  • Op:LPAI: Added support for 32-bit integer data types for the Split and Concat Ops on the v6 platform. {164353}

  • Tool:Converter:ONNX: Added support for BF16 datatype in Converter. {151133}

  • Tool:Genie: Added the ability to write output to a file for pipeline execute commands in the genie-app tool. {162608}

  • Tool:Quantizer: Added support for Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, and Q8_0 GGUF data types for quantization on the HTP backend. {129186}

  • Genie: Fixed a segmentation fault that could occur when using the save/restore KV cache feature with the GPU backend. {158549}

  • Genie: Fixed an issue in the QNN GenAiTransformer engine’s GGUF file reader that could cause parsing errors and segmentation faults. {163156}

  • HTP: Fixed an accuracy regression for the ReduceSum Op when using FP16 precision on certain hardware targets. {163097}

  • Op:CPU: Added INT8 and UINT8 support to the CPU FP32 Transpose execution. {155986}

  • Op:HTP: Added support for casting from INT16 to INT32 data types for the Cast Op on the HTP backend. {152878}

  • SDK: Fixed an issue in the SNPE Android sample APK where repeatedly building a network with User-Defined Operations (UDOs) could cause a crash or an “Invalid OpPackage” error. {163340}

  • Tool:Converter: Added support for the remove_unused_inputs parameter to the converter Python API, matching existing command-line functionality. {151089}

  • Tool:Converter: Fixed a model context saving failure by ensuring quantization encodings are preserved for the RMSNorm gamma parameter when it has multiple consumers. {154731}

  • Tool:Converter: Fixed an accuracy issue where quantization encodings were dropped during optimizations involving a static Slice op. {154605}

  • Tool:Converter: Fixed an issue in the Logit op’s data type inference that could cause op validation failures on the backend, particularly when using a separate quantization step. {153054}

  • Tool:Converter: Fixed an issue in the ONNX converter where an incorrect graph optimization would fold a Mul -> Softmax pattern, causing accuracy degradation in certain models. {116260}

  • Tool:Converter: Fixed an issue where the qairt-lora-adapter-bin-updater tool would fail when processing models that did not contain multiple graph splits. The tool’s logic for determining graph execution order now correctly handles models with a single graph. {161292}

  • Tool:Quantizer: Fixed a regression in a graph optimization pass that caused model conversion failures for models with a Transpose op of 7 or more dimensions with non-consecutive axes. {164567}

2.42.0 (Dec 2025)

  • API:HTP: Added support for a monolithic LSTM feature, configurable via a new graph option, to improve performance for certain LSTM model structures. {146918}

  • API:SNPE: Added new HTP backend option, monolithic_lstm (default: false). This flag can be controlled during offline preparation with snpe-dlc-graph-prepare or during online preparation through platform configuration options. {161369}

  • Docs: Added documentation for LPAI core control. {155151}

  • Docs: Updated the LPAI Sample App tutorial to remove an incorrect signing step for Android artifacts that could break the executables. {158732}

  • Genie: Added GenieDialog_tokenQuery support in genie-app. {135259}

  • Genie: Added a configuration option for controlling the KeyDiff anchor weight. {146125}

  • Genie: Added look-ahead decoding dialog support for GenieDialog_tokenQuery. {160627}

  • Genie: Added support for 16 KB page sizes in the Android sample application build to ensure compatibility with Android 15. {160693}

  • Genie: Added support for YaRN RoPE. {138733}

  • Genie: Added support for cross-attention. {154798}

  • Genie: Added support for linear RoPE. {157494}

  • Op:CPU: Resolved an issue that caused mismatched output for the FullyConnected Op when using a dynamic bias by correcting the kernel selection logic. {160281}

  • Op:HTP: Added support for 3D inputs for the LSTM Op. {143117}

  • Op:HTP: Added support for the INT16 data type for the ElementWiseDivide Op. {152779}

  • Op:HTP: Added support for the INT16 data type for the ElementWiseMaximum Op. {152780}

  • SDK: Updated the QNN sample apps to optionally compose a graph from a DLC file. {157809}

  • Tool: The qnn-context-binary-generator tool now includes a new profiling event to capture the time taken for finalizing a graph after tensor updates. {156608}

  • Tool:Converter: Added layout transformation support for IsInf, Convert, Dequantize, Quantize, CombinedNms, and Unpack Ops. {147648}

  • Tool:Converter: Added support for using the Quantizer v2 engine for calibration via the --use_quantize_v2 flag in qnn-onnx-converter. {142674}
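A hedged sketch of how the flag might be combined with a conversion invocation. The model name is a placeholder, the --input_network option is an assumption about the converter's interface, and the command is printed rather than executed:

```shell
# Illustrative only: model.onnx is a placeholder and --input_network is an
# assumed option name; qnn-onnx-converter ships with the QAIRT SDK.
# The command is echoed rather than run.
cmd="qnn-onnx-converter --input_network model.onnx --use_quantize_v2"
echo "$cmd"
```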

  • Tool:Converter: During quantization, the Buffer Op will now copy the quantization encodings from its input to its output to ensure consistency. {153288}

  • CPU: Fixed a memory leak on the CPU backend by ensuring that memory allocated by the underlying cpuinfo library is properly de-initialized. {147345}

  • Core: Fixed a race condition during the de-initialization sequence that could occur when multiple SNPE instances were used in different threads. {161975}

  • Genie: Fixed a bug in perplexity calculations with FP32 models or tokenized inputs. {154447}

  • Genie: Fixed an issue that caused a build error in the sample code on Windows. {160178}

  • Genie: Fixed an issue where grouped LoRA adapters would fail to be applied. {154778}

  • HTP: Fixed a Windows-specific execution failure for low-priority graphs that occurred when burst mode, high-performance mode, or RPC polling was enabled. {142747}

  • HTP: Fixed a bug in the HTP backend where an ArgMax kernel could return an uninitialized index value under certain data conditions. {159115}

  • HTP: Improved performance of window partition operations and multi-batch, single-headed attention in some transformer models. {148501}

  • HTP: Fixed a graph prepare failure caused by a large Broadcast Op. {157143}

  • Op: Added 6D support for Int32 Mul and 5D support for Int32 Pow with broadcasting. {159296}

  • Op:CPU: Fixed an accuracy issue with the ReluMinMax Op on ARMv7 (32-bit) devices. {155120}

  • Op:CPU: Fixed an issue in the ElementWiseMultiply Op. {160547}

  • Op:HTP: Fixed a graph finalization failure for 6D StridedSlice when the batch dimension equals one. {159012}

  • Op:HTP: Fixed an issue with the conversion of 5D PReLU to 4D for FP16 models. {158971}

  • Tool:Converter: Enabled the MatMul + Add fusion for the LPAI backend, which squashes the pattern into a single MatMul Op with bias. {155659}

  • Tool:Converter: Fixed a TFLite model conversion failure caused by incorrect handling of quantization offsets in pre-quantized models. {157117}

  • Tool:Converter: Fixed a model context saving failure by ensuring quantization encodings are preserved for the RMSNorm gamma parameter when it has multiple consumers. {154731}

  • Tool:Converter: Fixed an ‘index out of bounds’ error in the fold-concat graph optimization pass. {148528}

  • Tool:Converter: Fixed an issue where conversion of GGUF models failed during context binary generation due to an incorrect data type and quantization schema. {156493}

  • Tool:Converter: Fixed an issue where using the --preserve_io flag with a quantized ONNX model could cause an unexpected Convert Op to be added to the graph output. {137974}

  • Tool:Converter: Fixed a qairt-quantizer dequantization failure when the graph output tensor has consumers. {152814}

2.41.0 (Nov 2025)

  • SNPE Core: Added support for the new profiling level “qhas” for advanced HTP chrometrace profiling. The workflow is documented under Benchmarking and Accuracy, Benchmarking, QHAS profile. {144800}

  • SNPE Core: Added the --use_native_input_files argument to snpe-throughput-net-run, matching snpe-net-run. {149784}
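A minimal sketch of the new argument on snpe-throughput-net-run; the container and input-list names are placeholders, and the command is assembled and printed rather than run:

```shell
# Illustrative only: model.dlc and input_list.txt are placeholder names.
# The command is echoed rather than run.
cmd="snpe-throughput-net-run --container model.dlc --input_list input_list.txt --use_native_input_files"
echo "$cmd"
```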

  • API: Added support for the BF16 datatype to the core QNN API. {151153}

  • API:Genie: Added a new C/C++ API (GenieAccuracy.h) for calculating accuracy metrics for models. {115211}

  • Genie: Added multimodal RoPE support. {145055}

  • HTP: Added support for the BF16 datatype for runtime execution on the HTP backend. {152435}

  • HTP: Improved performance of the MobileBERT model. {155606}

  • Op:HTP: Added support for converting tensors from sfxp16 to fp16 datatype within the Convert Op on the HTP backend. {157470}

  • QNN Core: Fixed qnn-net-run exit error logs related to destroying the power config ID on some platforms. {146296}

  • SNPE Core: Fixed a backward compatibility issue where using a cached model with a resized input dimension could lead to incorrect predictions. {144435}

  • SNPE DSP: Fixed an issue where models with a 5D Split Op would fail on the DSP backend. {151456}

  • API:Python: Fixed an issue in the LoRA model transformation process where the optimizer would cause a KeyError when searching for LoRA encodings in a model split that contained no updatable tensors. {157941}

  • CPU: Resolved an accuracy issue with the MatMul Op for quantized models on certain targets. {149047}

  • Core: Resolved an issue in qnn-net-run where continuous profiling for the LPAI backend could incorrectly report a minimum execution time of zero. {153294}

  • DSP: Fixed a bug in the DSP backend that caused model execution to fail for certain depthwise convolution patterns that had zero left padding. {150291}

  • GPU: Resolved an inference failure in Conv2d on some targets. {156699}

  • GenAiTransformer: Enabled log support for custom ops. {147362}

  • Genie: Fixed an issue that could cause poor-quality output from certain 8-bit quantized decoder models by improving the detection logic for the prefill stage of the decoder. {156683}

  • Genie: Fixed an issue where the HTP engine did not mmap serialized binaries when the use-mmap flag was enabled. {157330}

  • Genie: Fixed an issue with building the example code on Windows platforms. {154894}

  • HTP: Fixed a LoRA-related checksum issue that occurred during context binary generation for models with a QINT32 zero-point bias. {155550}

  • HTP: Fixed a segmentation fault that could occur during context binary generation when using the concurrent_deserialize_patch=measure option for very small graphs. {153476}

  • HTP: Fixed an accuracy issue with the Dequantize Op when converting from uint8 to fp16 on the HTP backend. {154787}

  • HTP: Fixed an issue where context binary generation would fail for models with auxiliary graphs. {155944}

  • HTP: Improved the qnn-context-binary-generator tool by adding an informational log message. {137064}

  • HTP: Resolved a rare race condition that could cause a crash when executing graphs in asynchronous mode across multiple threads. {152851}

  • Op:CPU: Improved performance of the FullyConnected Op. {152835}

  • Op:CPU: Resolved an integer overflow issue in the tensor size calculation for the MatMul Op, which could cause execution failures. {152348}

  • Op:HTP: Added support for fusing Conv3D with Relu and ElementWiseNeuron operations on the HTP backend. {152639}

  • Op:HTP: Added support for the Conv3D+ReLU supergroup (fused operation) on the HTP backend. This resolves converter errors that previously occurred when quantizing models with this pattern. {148609}

  • Op:HTP: Improved the accuracy of the FP16 LogSoftmax Op for inputs with large variance. {148540}

  • Op:HTP: Resolved a corner-case failure during the creation of GroupedConv2D layers. {156026}

  • Op:HTP: Resolved a numerical accuracy issue with the Pad Op on the HTP backend. {154337}

  • Op:HTP: Resolved an execution failure for stateful FP16 LSTM models when the reset tensor was null. {146825}

  • SDK: Fixed a compilation error in the LPAI sample application for Android builds. {149435}

  • SDK: Improved the performance of the qnn-model-lib-generator tool on Windows. {136223}

  • SDK: Resolved an issue where an asynchronous graph execution failure could lead to incorrect error codes and incomplete resource cleanup. {156488}

  • Tool: Fixed a bug in the model preparation step where custom Op detection would fail. {154956}

  • Tool: Resolved an issue where the qairt-dlc-info tool failed on DLC files that were updated using qnn-context-binary-generator. {153977}

  • Tool: The qairt-lora-model-creator tool now correctly handles LoRA weight shapes when attaching to a Grouped Convolution layer. {148214}

  • Tool:Converter: Fixed an issue in the TFLite converter where fused activations for TransposeConv2D and TransposeConv3D Ops were not being handled correctly. {150806}

  • Tool:Converter: Resolved a GRU quantization issue by ensuring the Op uses its specific quantization logic instead of a common default. {151608}

  • Tool:Converter: Resolved an LSTM-related accuracy issue caused by an incorrect schema setting in the quantization optimization module. {155539}

  • Tool:Converter: Resolved an issue where the output layout was not preserved when using the --preserve_io option for ONNX models ending with a Softmax Op. {150766}

2.40.0 (Oct 2025)

  • API:SNPE: Added two new HTP backend options, advanced_activation_fusion (default: true) and high_precision_sigmoid (default: false). These flags can be controlled during offline preparation with snpe-dlc-graph-prepare or during online preparation through platform configuration options. {153694}

  • CPU: Added context binary support to the QNN CPU backend. {142754}

  • Genie: Aligned the attention mask behavior for Sliding Window Attention layers with Hugging Face implementations to improve model compatibility. {151861}

  • HTP: Added support for multi-graph switching on HNRD. {147026}

  • SNPE: Added new low-level performance voting corners for HTP: TURBO_L4 and TURBO_L5. {152587}

  • SNPE: Added support for QMX in the CPU runtime via a new SNPE Builder API: SNPEBuilder::setCpuQmxMode() / Snpe_SNPEBuilder_SetCpuQmxMode(). A new flag, --enable_cpu_qmx, has been added to the net-run apps as well. {150785}
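A hedged sketch of the command-line form of the new flag; the container and input-list names are placeholders, and the command is only printed:

```shell
# Illustrative only: model.dlc and input_list.txt are placeholder names.
# The command is echoed rather than run.
cmd="snpe-net-run --container model.dlc --input_list input_list.txt --enable_cpu_qmx"
echo "$cmd"
```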

  • Tool:Converter: Added support for signed asymmetric quantization in the ONNX converter and quantizer. {111082}

  • Tool:Quantizer: Added support for Quantizer v2 for calibration in the QNN Quantizer tool, enabled via the –use_quantize_v2 flag. {144453}

  • Tool:qairt-accuracy-debugger: Enhanced the accuracy debugger to allow comparison between two different backend configurations. This feature uses the one-shot algorithm and supports starting from pre-compiled DLC files. {149818}

  • Added the advanced_activation_fusion flag to enable or disable fusion of certain activations with the preceding convolution layer. {147374}

  • QNN Core: Fixed memory leaks in qnn-throughput-net-run during cleanup phase. {145228}

  • API: Resolved an IR graph serialization failure that occurred when converting certain LoRA models using the Python API. {151851}

  • CPU: Added support for the Convert Op with float input and output data types. {146089}

  • CPU: Fixed the LSTM flow to save the input_gate, forget_gate, cell_gate, output_gate, and hidden_state values from scratch memory to dedicated output memory. {149330}

  • CPU: Fixed padding for the count_pad_for_edges parameter. {143281}

  • CPU: Fixed the cell gate calculation to preserve scratch memory for final LSTM output. {149464}

  • Core: Fixed a bug in the snpe/qairt-dlc-diff tool when the --compare_layers argument is passed. {150517}

  • Core: Resolved warnings related to the priority management library in HTP non-RPC mode by ensuring the correct API symbol is used. {152714}

  • Core: The qairt-dlc-info and snpe-dlc-info tools now list input tensors in an order that follows the topological sort of the graph. {147076}

  • GPU: Resolved inference failures for models containing an AvgPool Op on QCM2290. {150565}

  • Genie: Fixed a memory deallocation issue that could cause a crash when executing a model with memory-mapped tensors enabled. {149468}

  • HTP: Added an option to disable Conv+Activation fusion to resolve accuracy issues with the GRU Op in certain models. {148860}

  • HTP: Fixed a bug that prevented some models from executing for multiple iterations in qnn-net-run or qnn-throughput-net-run. {145419}

  • HTP: Fixed a logging issue where a failed context deserialization was incorrectly reported as successful in verbose logs. {148549}

  • HTP: Fixed an issue related to a data type mismatch error during graph preparation. {152793}

  • HTP: Resolved an issue that caused model failures when using weight-sharing with multicore configurations due to an incorrect calculation of shared weight blobs. The recommended combination is udma=on + weight-sharing + multicore, without LoRA. {154732}

  • HTP: Resolved an issue where a signed PD session was not correctly configured during backend initialization, ensuring it is enabled as expected. {152031}

  • Op:CPU: Added support for signed 8-bit fixed-point weights in the FullyConnected Op. {149602}

  • Op:CPU: Fixed an issue in the Division Op to prevent potential division-by-zero with certain quantized inputs. {150626}

  • Op:CPU: Fixed an issue in the Relu Op implementation for Armv7 that caused incorrect outputs for uint8 quantized models. {149048}

  • Op:GPU: Resolved a performance regression for the Convolution Op on QCS8250. {139732}

  • Op:HTP: Fixed an accuracy issue for certain Concat operations that follow a Gather Op. {153063}

  • QNN: Reduced verbosity by suppressing ‘Bad quantization: zero scale!’ log messages, improving terminal readability. {123651}

  • Tool: Fixed an issue in the quantizer where the encoding offset could go out of range. {153153}

  • Tool: Resolved a Python error in the quantizer that occurred during an optimization pass by ensuring correct encoding information is used for activations. {153154}

  • Tool:Converter: Added support for converting the GroupNorm layer from TensorFlow models. {146973}

  • Tool:Converter: Fixed a bug in the Where Op translation that produced an incorrect output shape when the inputs were broadcastable but had different shapes. {143835}

  • Tool:Converter: For the LPAI backend, disabled the matmul_to_fc optimization and the automatic insertion of Convert Ops before Matmul to better support mixed-precision models and avoid performance issues. {152658}

  • Tool:Converter: Resolved a segmentation fault that occurred during model conversion for certain backends by implementing different schema selection strategies. {153271}

  • Tool:Converter: Resolved an accuracy regression that caused incorrect outputs in some Generative AI models by reverting a change related to asymmetric quantization for signed data types. {154206}

  • Tool:Converter: Resolved an issue where a graph’s output tensor could be incorrectly removed due to squashing of its producer. {152011}

  • Tool:Converter: Resolved an issue where models with Conv2d ops failed on the HTP backend due to unsupported input or output data types. {153277}

  • Tool:Converter: Resolved an issue where the LayerNorm Op failed validation due to an unsupported data type. {153276}

  • Tool:Converter: The qairt-lora-model-creator tool no longer restricts the quantization bitwidth of tensors in the LoRA branch. {148088}

  • Tool:Converter:ONNX: Added two pattern mappings to RMSNorm to avoid Ops falling back to float during quantization. {149931}

2.39.0 (Sep 2025)

  • API:Genie: Added the GenieDialog_embeddingTokenQuery API. {148803}

  • API:Genie: Added the GenieDialog_setMaxNumTokens API. {146820}

  • API:HTP: Added a new HTP-specific property to support a detachable buffers feature. {148227}

  • API:HTP: Enhanced profiling capabilities to expose detailed timing information for each component during the graph preparation phase (QnnGraph_finalize). {143804}

  • API:HTP: Implemented a feature allowing read-only weights buffers to be detached and unmapped. {141354}

  • API:HTP: Introduced new APIs and configuration options to support a detachable buffers feature. {143832}

  • API:SNPE: Added a new builder API for enabling accelerated HTP initialization with a pre-prepared cache: Snpe_SNPEBuilder_SetAcceleratedInit() / SNPEBuilder::setAcceleratedInit(). Support was also added to snpe-net-run, snpe-throughput-net-run, and snpe-parallel-run via the command-line argument --enable_htp_accelerated_init. {149873}
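A minimal sketch of the command-line route; the container and input-list names are placeholders, --use_dsp stands in for whatever runtime selection the model needs, and the command is only printed:

```shell
# Illustrative only: model.dlc and input_list.txt are placeholder names.
# The command is echoed rather than run.
cmd="snpe-net-run --container model.dlc --input_list input_list.txt --use_dsp --enable_htp_accelerated_init"
echo "$cmd"
```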

  • Docs: Updated documentation for qairt-accuracy-debugger to include support for the Windows on Snapdragon (WoS) platform, including updated help sections and sample commands. {149286}

  • Docs: Updated the LPAI documentation to include a summary of the required steps for model preparation. {142076}

  • Genie: Added new profiling option for collecting detailed trace events. {133638}

  • Genie: Added the GENIE_STATUS_ERROR_CONTEXT_EXCEEDED error code to provide a specific status when a prompt exceeds the model’s context length limit. {145721}

  • HTP: Added support for multi-graph switching, which allows multiple graphs to be loaded and retained in memory simultaneously. {139603}

  • HTP: Added support for several operator fusion patterns on the HTP backend, including combinations like Conv-Relu and Conv-Batchnorm-HardSwish. {125633}

  • HTP: Added support for the BFloat16 data type by including the necessary header and definitions in the HTP backend. {140994}

  • HTP: Minor performance improvement for benchmark models. {147751}

  • LPAI: Fixed an issue where the quantization process would incorrectly modify the offset specified in a quant.json file. {145916}

  • LPAI: Resolved an accuracy issue with audio context detection models on the LPAI backend. The issue was caused by incorrect bias quantization settings for convolution and GEMM operations. {146710}

  • Op:GPU: Added support for QNN_DATATYPE_INT_32 inputs to StridedSlice op. {142629}

  • Op:HTP: Added support for 6D variants of Cast, GatherElements, Pad, and StridedSlice with certain constraints. For GatherElements, input and index shapes must match except along the axis dimension. For Pad, padding is limited to dimensions 5D or smaller. For StridedSlice, slicing is limited to dimensions 5D or smaller, and some axis parameters are not supported. {147157}

  • Op:HTP: Enabled support for the SFIXED_POINT_16 data type for the Sqrt Op in QNN HTP Op validation flow. {142710}

  • OpDef: Added support for the RandomUniformLike Op. This includes the ONNX to QNN IR translation in the converter and the backend implementation. {138616}

  • OpDef: Updated the NonZero Op definition to clarify that it outputs -1 for padded values in static shapes. Also updated Gather and Scatter Ops to restrict index tensors to non-negative values, allowing -1 only as a sentinel value for indices generated from other Ops. {142505}

  • QNN: TFLite Delegate: Added support for the Broadcast_to Op. {149782}

  • Tool: Added native support for WoS to the Accuracy Evaluator tool. This includes updates to handle platform-specific file paths and resolves a file permission error in the SQuAD evaluation script on Windows. {136566}

  • Tool: Added support for multi-graph switching in qnn-net-run and qnn-throughput-net-run via the new custom configuration option graphs_retention_order. {145979}

  • Tool: Enabled support for the Windows on Snapdragon (WoS) platform in the accuracy debugger. Users can now debug models on WoS using both the CLI and Python API interfaces. {147963}

  • Tool:Converter: Added reference implementations for static tensor manipulation Ops, including Add, Mul, Sub, Div, Transpose, and Reshape. {133602}

  • Tool:Converter: Fixed a segmentation fault in qairt-converter that occurred during float fallback for models with external data. {147000}

  • Tool:Converter: Fixed an issue where FP16 constant tensors were not correctly interpreted at the Python layer. {147009}

  • Tool:Converter: Introduced new flags to provide fine-grained control over the IR optimizer passes. {135982}

  • Tool:Converter: RMSNorm node names now use either the common prefix of all matched nodes in the pattern or, if no common prefix exists, the output buffer name of the pattern. This replaces the previous rms_norm_i naming based on topological order. {146838}

  • Tool:Converter: Removed exception handling for 6D tensors in the converter. {144599}

  • API:HTA: Resolved an application crash that occurred when calling the QNN API to get the HTA device infrastructure for performance tuning. {146157}

  • DLC: Fixed issues within the DLC format when per-channel block quantization is employed on a multi-graph DLC. {138853}

  • GPU: Improved performance by updating heuristics for Pooling and Reduction Ops to better utilize hardware resources, addressing inference time regressions on some models. {147242}

  • Genie: Fixed an accuracy bug with cross-layer attention networks when the decoder block is a single context binary. {150908}

  • Genie: Fixed an issue that caused incorrect calculation of KV cache tensor sizes on the HTP backend, which could lead to segmentation faults. {148675}

  • Genie: Fixed an issue where no output was generated for certain models when the prompt prefill phase required multiple graph executions. {145896}

  • HTP: Enabled support for using the ScatterElements Op within LoRA-updatable models. {147845}

  • HTP: Fixed a checksum mismatch error that could occur during graph finalization for models using LoRA. {147901}

  • HTP: Fixed a crash that could occur during long-running stress tests involving VTCM sharing. {148064}

  • HTP: Fixed a graph finalization failure by adjusting the optimization pass order for certain Ops like Split and Unpack. {141064}

  • HTP: Fixed a memory leak that occurred in the HTP backend during repeated inference runs when performance profiling was enabled. {146627}

  • HTP: Fixed an Op package deregistration failure that could occur in specific multi-core use cases. {143977}

  • HTP: Fixed an issue preventing context binary generation for models using LoRA adapters where a MatMul operation of size 16x16 was present. {149711}

  • HTP: Fixed an issue that caused graph finalization failures for certain large models on specific SoCs. {147402}

  • HTP: Fixed an issue that caused incorrect error code translation when writing shared weight buffers. {147793}

  • HTP: Fixed an issue where applying a LoRA adapter binary would fail for multicore scenarios or float-precision graphs. {149995}

  • HTP: Fixed an issue where requesting a signed PD would fail on x86 simulation environments. The configuration is now ignored for x86, as it makes no difference in that context. {145651}

  • HTP: Fixed an occasional VTCM memory allocation error that could occur during context binary generation. {145879}

  • HTP: Optimized performance for a text encoder model by successfully applying MHA-to-SHA transformations, converting MatMuls to Convolutions, and ensuring correct quantization settings. {136947}

  • HTP: Resolved a failure in on-device context binary generation when using custom Ops. {147187}

  • HTP: Resolved an error where applying a LoRA adapter failed with the message “Apply cannot happen as context bin did not have serialized bin.” {149992}

  • HTP: Resolved an issue where using Op packages in multi-threaded applications could cause a QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED error, halting execution. {147431}

  • HTP: Resolved memory leaks observed under specific stress scenarios. {145181}

  • LPAI: Fixed an issue that caused the ADSP driver to fail to load on certain Windows on Snapdragon platforms. {149188}

  • Op:CPU: Fixed the Mod Op to align its calculation with the behavior of standard frameworks. {147060}

  • Op:CPU: Resolved an issue that caused model failures on the CPU backend when a quantized Div Op encountered a zero-valued divisor. {150630}

  • SDK: Optimized specific library functions on Windows by replacing parts of the C++ standard library with native Windows API calls, reducing the overall binary size. {150497}

  • SNPE:DSP: Resolved an issue where executing a model with a UDO package on the DSP backend could fail with a QNN_OP_PACKAGE_ERROR_LIBRARY_ALREADY_INITIALIZED error. {135967}

  • Tool: Fixed an input parsing issue in the ModelModifierArchChecker tool. {144884}

  • Tool: Resolved an issue where qnn-accuracy-debugger would fail with a FileNotFoundError when using a compiled model (--stage compiled). {149891}

  • Tool:Compiler: Fixed an issue in the context binary generator where a SpaceToDepth Op adjacent to a graph input could cause an error. {147548}

  • Tool:Converter: Enabled support for dynamic 16-bit weights by default in qairt-converter and qairt-quantizer. This resolves an issue where an unnecessary Convert Op was inserted for MatMul weights, which previously led to increased model size and reduced accuracy. A new --disable_dynamic_16_bit_weights flag has been added to revert to 8-bit conversion if needed. {147008}
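
    If the previous 8-bit behavior is needed, the flag can be passed at conversion time. A minimal sketch follows; the model path, output path, and their option names are illustrative assumptions, and the command is only composed and printed here, not executed:

    ```shell
    # Sketch only: paths and the --input_network/--output_path options are
    # hypothetical placeholders. Dynamic 16-bit weights are now the default;
    # the last flag reverts MatMul weights to the previous 8-bit conversion.
    CMD="qairt-converter --input_network model.onnx --output_path model.dlc --disable_dynamic_16_bit_weights"
    echo "$CMD"
    ```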

  • Tool:Converter: Fixed a bug in the quantizer where node-squashing logic could fail for nodes that were both a graph output and had inputs with multiple consumers. {136028}

  • Tool:Converter: Fixed a bug that could cause a ‘Duplicate buffer name’ error during certain graph optimizations. {145690}

  • Tool:Converter: Fixed a fatal “access violation” exception that occurred when running the ONNX converter on WoS devices. {149750}

  • Tool:Converter: Fixed an issue with generating quantization encodings for models containing LSTM or GRU layers. {146424}

  • Tool:Converter: Fixed an issue with handling dynamic inputs for the slope tensor in the PReLU Op. {145599}

  • Tool:Converter: Fixed an issue with the LoRA model conversion flow where certain graph optimization passes were not being applied consistently. {150868}

  • Tool:Converter: Fixed incorrect weight broadcasting behavior in the RMSNorm and LayerNorm fusion patterns within the ONNX converter. {124105}

  • Tool:Converter: Resolved an issue where certain graph optimizations could incorrectly remove a tensor that was also a graph output. {150933}

  • Tool:qairt-tool: Added support for the Clip, SpaceToDepth, and Relu Ops in mha2sha-v2. {149759}

  • KI: Models with very large buffers (~1 GB or more) can abort during execution with “Could not create context from binary” due to FastRPC mapping failures. {148198}

2.38.0

Aug 2025

  • API: Generalized the qairt.transform API to support multiple, interchangeable transformation implementations. {138775}

  • API:GPU: Added support for the QNN_GPU_PRECISION_USER_PROVIDED precision mode to the GPU backend extension API, allowing users to specify custom precision settings for a graph. {142096}

  • API:Genie: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_FULL_ATTN_MASK and GENIE_NODE_IMAGE_ENCODER_IMAGE_WINDOW_ATTN_MASK node inputs. {145051}

  • Genie: Added a source code example for genie-t2e-run to the SDK. {144427}

  • Genie: Added embeddingQuery support for offline embeddings in genie-app. {146044}

  • Genie: Added engine sharing support for models used across different dialogs, currently available for the HTP backend and applicable to basic and SSD dialogs. {147585}

  • Genie: Added support for encoder-decoder models in Gen AI Transformer. {136070}

  • HTP: Improved performance and reduced memory usage for certain vision models by removing redundant space_rearrange operations from the graph. {141570}

  • HTP: Removed the -ffast-math compiler flag from the build configuration to prevent potential numerical inconsistencies and improve accuracy alignment for floating-point operations. {139547}

  • Op:CPU: Added support for the Logit Op. {136656}

  • Op:GPU: Added support for INT32 data type inputs to the ArgMax Op on the GPU backend. {133989}

  • Op:GPU: Added support for the CumulativeSum Op. {38682}

  • Op:HTP: Added backend support for the STFT Op. {134956}

  • Op:HTP: Added documentation for dynamic dimension constraints in HTP Op definitions. {143878}

  • Op:HTP: Added support for Int32 ElementWiseAbs and ElementWiseUnary with Abs operation. {138856}

  • Op:HTP: Added support for signed int16 data type in Unpack Op validation. {142708}

  • Op:HTP: Enabled support for the 5D Cast Op. {143121}

  • Op:HTP: Enabled support for the 5D GatherElements Op with non-zero axis values. {143123}

  • Op:HTP: Enabled support for the 5D Pad Op with a constant padding scheme for FP16 and FP32 data types. {143122}

  • OpDef: Added Op definition for STFT. {134955}

  • OpDef: Added support for int32 and UFIXEDPOINT8 data types for the RandomUniformLike Op. {146810}

  • QNN: TFLite Delegate: Added support for the Broadcast_to Op. {138848}

  • QNN:HTP: Enabled the Op validator for Quantize and Dequantize Ops between FP32 and QINT16. {141056}

  • SDK: Added a new RandomUniformLike Op definition and reference implementation to align with the ONNX specification. {134859}

  • SDK: Enhanced OEM control over QNN priority levels, allowing more flexible configuration of graph execution priorities on the HTP backend. {126262}

  • SNPE: Added documentation for low-level performance APIs under “Tutorials and Examples” -> “Application Tips”. {145899}

  • Tool: Added the ability to debug a specific subgraph by introducing two new command-line options: --debug_subgraph_inputs and --debug_subgraph_outputs. These options allow specifying the input and output tensors that define the subgraph to be analyzed. {127762}
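
    For illustration, an invocation bounding a subgraph might look like the following sketch; the tool name and tensor names are hypothetical placeholders (the release notes do not specify which debugger tool hosts the options), and the command is only assembled and printed:

    ```shell
    # Sketch only: tool name and tensor names are hypothetical placeholders.
    # The two options name the tensors that bound the subgraph to analyze.
    CMD="qnn-accuracy-debugger --debug_subgraph_inputs conv1_out --debug_subgraph_outputs block3_out"
    echo "$CMD"
    ```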

  • Tool: Introduced a new Network Specialization module and API to programmatically convert and optimize models with multiple graph configurations into a single DLC file. This replaces the previous command-line-only workflow. {108571}

  • Tool:Converter: Added support for the Logit Op. {138107}

  • Tool:Converter: Added support for the ONNX RandomUniformLike Op. {134348}

  • Tool:Converter: Added support for the ONNX STFT Op. {134349}

  • Tool:Converter: Added support for the STFT Op in the ONNX converter. {138613}

  • Tool:Converter: Added support for the buffer_padding parameter in the Buffer Op. {128998}

  • Tool:Converter: Enhanced the converter to automatically apply a float-fallback quantization behavior for models that contain Quantize-Dequantize nodes or are provided with quantization overrides (e.g., for LoRA). {139341}

  • Tool:Converter: Released the first version (v0.1) of the QAIRT Quantization Specification, which supports schema version 2.0.0 of the quantization overrides file. {114160}

  • DSP: Significantly improved performance for models with a batch size greater than one by optimizing the 5D Reshape-Transpose-Gather pattern in the backend. {140837}

  • GPU: Improved inference performance for select models in GPU FP16 mode on certain chipsets. {144204}

  • Genie: Added the missing ‘type’ field to the sampler.json configuration example. {138004}

  • Genie: Fixed a regression in Eaglet token generation rate. {145608}

  • Genie: Fixed a segmentation fault caused by uninitialized variables. {144692}

  • Genie: Fixed a segmentation fault that occurred when running LLM models with the genie-t2t-run tool. {147760}

  • Genie: Fixed an issue loading lm_head or LoRA adapters on Windows platforms. {143661}

  • Genie: Fixed an issue where paused queries with LUT encoder models could not resume. {145135}

  • Genie: Fixed an issue where prompt templates were not applied when GenieEmbedding_generate outputs were truncated. {143445}

  • Genie: Fixed memory leaks occurring during GenieDialog_applyLora. {136542}

  • HTP: Added support for casting from uint8 to fp16 to resolve an accuracy issue where uint8 was incorrectly interpreted during a cast to a float type. {135317}

  • HTP: Enabled support for asynchronous context initialization in multi-core environments. {138427}

  • HTP: Fixed a memory corruption crash that could occur in multi-threaded applications during deinitialization. {144587}

  • HTP: Fixed a segmentation fault that occurred when using asynchronous initialization on multi-core HTP configurations. {138335}

  • HTP: Fixed an accuracy issue that produced incorrect output when using LPBQ. {146380}

  • HTP: Fixed an issue where models would crash or hang on the HTP backend when the inference batch size was greater than one. {144574}

  • HTP: Fixed an issue where the deviceGetPlatformInfo API returned incorrect SoC information when using the non-RPC path. {141569}

  • HTP: Implemented a fix to prevent a CDSP crash when Virtual Address space is exhausted during memory allocation. {145909}

  • HTP: Resolved an intermittent failure in asynchronous execution mode that could lead to errors. {138318}

  • HTP: Resolved an issue on certain platforms where a failure to lock the HMX context could cause a DMA execution failure. {138289}

  • HTP: Resolved execution failures for certain models in Gen AI corner cases. {129730}

  • HTP: Significantly improved performance for models using grouped TransposeConv2d by enabling an optimization that was previously restricted to operations with zero padding. {143544}

  • Op:HTP: Added support for FP32 weight-only quantization in fully connected layers. {131398}

  • Op:HTP: Fixed NullRequant Op registration failure when using w16 and per-channel quantization. {145523}

  • Op:HTP: Fixed a crash in PoolAvg2d Op when reducing NxM inputs to 1x1 with padding and count_pad=0. {131311}

  • Op:HTP: Fixed a crash occurring during GroupNorm fusion. {130501}

  • Op:HTP: Fixed a runtime failure during context creation when a spill_fill_buffer was configured. {143863}

  • Op:HTP: Fixed an accuracy issue in ElementWiseAdd Op when broadcasting a constant zero. {143254}

  • Op:HTP: Fixed an accuracy issue in FP16 models caused by a faulty SlicePad_shape->Transpose graph optimization rule. {145638}

  • Op:HTP: Improved performance of the ReduceSum Op for FP16 data types by ensuring a faster, optimized implementation is used. {143158}

  • Op:HTP: Resolved a performance regression affecting model execution. {145191}

  • Op:HTP: Resolved accuracy issue in Gather Op for depth=1 cases. {134448}

  • Op:HTP: Resolved performance regressions for select models. {143809}

  • SNPE: Added support for the --optimization_preset option in snpe-dlc-graph-prepare and enabled online preparation via platform options. {135223}

  • SNPE: Fixed an issue where setting HTP graph optimization levels in online preparation did not support distinct optimization levels for different SNPE instances. {142940}

  • SNPE: The snpe-dlc-info tool now displays input, output, and unconsumed tensors in topologically sorted order. {146793}

  • Tool: Fixed an accuracy regression that could occur in certain models due to an incorrect start index calculation in a transpose operation. {144858}

  • Tool: Fixed an issue where block quantized convolution with special dimensions could cause preparation failures. {144994}

  • Tool: Resolved an issue where snpe-parallel-run-cpp would crash when used with the --userbuffer_memorymapped argument. {119102}

  • Tool:Converter: Fixed a bug in Expand Op translation caused by incorrect data type population. {141810}

  • Tool:Converter: Fixed a bug in sink_transpose optimization where a transpose node could be consumed twice by the same node. {140535}

  • Tool:Converter: Fixed a bug that introduced redundant Convert nodes before LSTM/GRU nodes during mixed precision conversion. {145617}

  • Tool:Converter: Fixed an axis tracking issue in ONNX PRelu Op that could cause incorrect broadcasting. {142728}

  • Tool:Converter: Fixed an issue where 0D tensors were incorrectly retained as 1D tensors by propagating scalar tensor information as needed. {141899}

  • Tool:Converter: Fixed an issue where models with extremely small, near-zero quantization scale values (e.g., 1e-35) would fail during inference on the CPU backend. {127367}

  • Tool:Converter: Fixed an issue where the --float_bitwidth option could incorrectly update non-quantizable tensors. {145723}

  • Tool:Converter: Fixed an issue where the second input tensor of MatMul nodes from QDQ models was not correctly quantized. {136049}

  • Tool:Converter: Fixed an issue with encoding population in LayerNorm pattern matching. {141265}

  • Tool:Converter: Fixed issue where squashable elementwise operations following convolution operations caused errors when encodings of the convolution’s weights/bias were provided. {85485}

  • Tool:Converter: Improved validation in Resize optimization to prevent errors when invalid scale values are provided. {138778}

  • Tool:Converter: Resolved a model conversion failure for large ONNX models caused by excessive memory consumption. {122217}

  • Tool:Converter: Resolved an issue where recent updates to the model converter caused excessive memory consumption during graph serialization, leading to failures when creating context binaries for large models. {136952}

  • Tool:Converter: Squashed identity Expand and Tile nodes in the graph to remove redundant operations. {144693}

  • Tool:Converter: Updated the logic for matching RmsNorm patterns to improve pattern recognition. {146093}

2.37.0

July 2025

  • Docs: Updated the QNN HTP OpDef supplement documentation with descriptions of how to use the QNN_DEFINITION_IMPL_GENERATED encoding definition. {127977}

  • API:GPU: Added support for the Qnn_DeviceHandle_t argument in the QnnContext_create API. {123584}

  • API:GPU: Added support for the Qnn_GlobalConfig API. {135731}

  • Genie: Added an async command to genie-app allowing for execution of asynchronous statements. {137243}

  • Genie: Added support for non-updatable quantization (NUQ) and grouped LoRA adapters. {138782}

  • Genie: Added the cache-groups JSON configuration option allowing for the sliding window attention (SWA) cache management policy. {135552}

  • Genie: Introduced the SSD dialog “branch-mode” config option with “top-1” and “all-expand” supported values. {134925}

  • Genie: Added Eaglet dialog support for dual head draft models. {134373}

  • Genie:API: Added GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_SIN and GENIE_NODE_IMAGE_ENCODER_IMAGE_POS_COS node inputs. {133935}

  • HTP: Added support for the LoRA weight-sharing feature by extracting updatable weights across all graphs into a shared blob. {126930}

  • HTP: Added support for the QAIRT Block Ops (Stateful LSTM, Stateful GRU, and Buffer) at FP16 precision. {125048}

  • HTP: Added support for VA Reservation on Windows platforms. {138341}

  • HTP: Added support for the LoRA weight-sharing feature by extracting updatable weights across all graphs into a shared blob. {128558}

  • Op:GPU: Added support for the GatherND Op on the GPU backend. {61057}

  • OpDef: Added Op definition for IsNaN. {135847}

  • QNN: Fixed broken links in the HTML documentation to the SNPE documentation URL “Qualcomm Neural Processing SDK” under Overview -> Integration workflow and in the Utilizing DLCs tutorial. {143420}

  • Tool:LoRA Creator: Added support for any Conv kernel shape in a LoRA branch, removing the previous limitation that only 1x1 Conv was supported. {140575}

  • Tool:Converter: Added support for SparseConvolution2D. {118014}

  • Tool:Converter: Optimized Lora Importer for non-updatable quantization (NUQ). {127586}

  • Tool:Converter: Resolved performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter; now, clip is only added when required based on fused_activation_function. {123581}

  • Tool:Genie: Added support for GenieEmbedding APIs in genie-app. {123549}

  • Fixed incorrect freeing of RPC memory allocated for a LoRA adapter in scenarios where the context had multiple graphs. {138835}

  • Fixed an issue where LoRA weight tensor names were not found when graph transformations were involved. {136062}

  • QNN Docs: Corrected the HTML documentation for the qnn-net-run command-line argument from --output to --output_dir. {144805}

  • SNPE Tools: Fixed snpe-dlc-info and qairt dlc-info to display the correct graph optimization level for HTP cache records generated via the Snpe_SNPEBuilder_SetInitCacheMode() / SNPEBuilder::setInitCacheMode() APIs or the net-run --enable_init_cache option. {142514}

  • Added support for Conv2D Ops with the reuse_sparse_indices parameter defined, preventing prepare/graph finalization failures. {143040}

  • Tool:Converter: Fixed performance regressions observed on CPU/DSP backends by removing redundant clip operations in the TFLite converter; clip is now added only when required based on fused_activation_function. {141085}

  • Fixed an updatable-attribute tracking error for torch models. {145158}

  • CPU: Fixed quantization issues for large models by correcting the softmax Op implementation. {140260}

  • CPU: Resolved an issue with axis permutation for BW_AXIS_SCALE_OFFSET quantization encoding in Conv operations. {138266}

  • DLC: Fixed a small memory leak in DLC-based initialization in SNPE and QNN. {135810}

  • Genie: Fixed a crash when running SSD or SPD dialog types on certain Linux platforms. {137954}

  • Genie: Fixed an out of bounds read issue observed on uint16 embedding LUTs. {144801}

  • Genie: Fixed an issue where the first context binary split did not contain sufficient information about graph variants to properly initialize the KV$ Manager. {136530}

  • Genie: Fixed an issue where the draft model EOS token was not set, causing an Eaglet initialization failure. {145057}

  • Genie: Fixed minor memory leaks. {136813}

  • Genie: Fixed segmentation fault when graph switching is enabled along with memory mapping. {143826}

  • HTP: Fixed a deadlock issue that could cause the qnn-throughput-netrun application to hang under stress conditions. {142471}

  • KI: In the QNN HTP backend, an update to the prepare sequence is causing a regression on some specific models. This will be fixed in the next release (2.36). {136438}

  • Op:HTP: Optimized the qu16 Dequantize Op. {136231}

  • Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {134467}

  • Op:HTP: Optimized the TransposeConv2d-Dequantize pattern near the output. {136219}

  • Op:HTP: Reduced preparation time for 5D operations with large batch sizes. {130280}

  • SNPE: Fixed a crash in snpe-throughput-net-run when the container argument was not specified before certain optional arguments. {141598}

  • Tool: Handled calibration input validation, quantizer params, and input type conversion for the HTP memory pipeline. {138064}

  • Tool: Fixed a failure in the memory pipeline when filtered inference schemas were non-sequential. {142391}

  • Tool: Ordered ONNX Runtime outputs based on output name to resolve issues in memory pipeline inference. {136967}

  • Tool: Removed backend_info from the quantizer params to resolve an issue in memory pipeline compilation. {136586}

  • Tool: Updated how params are accessed on the pydantic object to resolve a preserve_io_datatype issue in the memory pipeline. {144331}

  • Tool:Converter: Added support for LayerNorm with multiple normalization dimensions. {137898}

  • Tool:Converter: Added support for matching new GeLU Op patterns that include Reshape operations, addressing an issue where semantic search models failed conversion with AutoMHA2SHA. {139465}

  • Tool:Converter: Fixed a bug in the Conv/MatMul quantizer optimization to ensure safe indexing. {142845}

  • Tool:Converter: Resolved performance regression on CPU/DSP backends by removing redundant clip operations in the TFLite converter; now, clip is only added when required based on fused_activation_function. {140762}

  • Tool:Converter: Updated conv node’s weight/bias naming during BatchNorm fusion to resolve quantization parameter naming conflicts. {139997}

  • Tool:Converter: Added support for a new pattern in RMSNorm pattern matching. {134922}

  • Tool:Converter: Added a fix to remove injected Ops that were blocking supergroups. {134113}

  • Tool:Converter: Fixed an accuracy drop in models with shared biases. {134589}

  • Tool:Converter: Updated the tensor name sanitization logic. {141135}

  • Tool:Converter: Updated the gamma and beta shapes of the ONNX LayerNorm Op. {130934}

  • Tool:Converter:TFLite: Added support for int64 quantized bias. {140882}

  • Tool:Converters: Fixed a LayerNorm pattern mismatch issue. {137459}

  • Tool:Converters: Added support for dynamic bias in the Conv Op. {142223}

  • Tool:qairt-accuracy-evaluator: Fixed inclusion of converter params in the execution summary. {140752}

  • Tool:qairt-accuracy-evaluator: Limited parallel QNN x86 evaluations to 1. {138075}

  • Tool:snpe-net-run: Fixed a dynamic resizing issue in the Conv Op when using the --input_dimensions option. {142139}

  • Tools:Converters: Reduced conversion time for large models with more than 10000 ops. {135822}

2.36.0

June 2025

  • API: Added LLM support in the Python API. {118016}

  • API: Added support for quantizer-specific options in the Converter Python API, including parameters for act_quantizer_schema, param_quantizer_schema, and target_backend. These options are now available through the CalibrationConfig object, improving feature parity with the command-line interface. {136135}

  • API: Added support for the Baichuan2-7b model through the high-level Generative AI Python API, enabling both builder and executor workflows. {126702}

  • API: Added support for the Phi-3.5-mini model through the high-level Generative AI Python API, enabling both builder and executor workflows. {138126}

  • API: Added support for the Qwen2-7b model through the high-level Generative AI Python API, enabling both builder and executor workflows. {132444}

  • API: Enabled the generation and consumption of JSON profiling data on Windows platforms. Users can now utilize the profiling capabilities of the Python API on Windows on Snapdragon (WoS) systems. {138647}

  • API: Introduced a model conversion capability to modify the Auto-Regression (AR) number and Context Length (CL) of ONNX-based language models. This allows for flexible adaptation of models to different deployment requirements. {123570}

  • API:Genie: Introduced Genie Dialog and Embedding APIs to set and get performance policy. {137070}

  • API:HTP: Added support for ContextFinalize for the HTP backend, enhancing context management capabilities. {136699}

  • API:HTP: Implemented a URI Builder abstraction to simplify the programmatic construction of FastRPC URIs used for opening sessions with the HTP backend. {110797}

  • Core: Added custom Op support to the oe-gcc 11.2 and oe-gcc 9.3 toolchains for QNN Op package support on LE targets for HTP. {130471}

  • Docs: Updated the LoRAv2 tutorial to indicate support for Windows operating systems in both offline and online workflows. {138772}

  • Genie: Added the skip-lora-validation option to reduce LoRA adapter switch time by allowing LoRA CRC checks to be skipped on QnnHtp engines. {134913}

  • Genie: Added experimental support for the arm64x-windows-msvc platform. {129093}

  • Genie: Added support for Non-Updateable Quantization (NUQ) and Grouped LoRA, allowing LoRA adapter groups to share encoding bins and supporting non-updateable quant adapters. {138782}

  • Genie: Added support for pausing and resuming active queries using a signal API, introducing an architecture for resuming paused queries in SSD and basic dialogs. {119704}

  • Genie: Added support for profiling and logging of GenieEngine APIs, enabling measurement of switch time, creation time, and other metrics. {131908}

  • Genie: Added support for repetition penalties in sampling within the Genie Sampler. {118081}

  • HTP: Added support for HTP online graph preparation optimization level via platform options. {138420}

  • HTP: Added validation to reject Per-Graph-Execution (PGE) configurations that specify incompatible features such as shared spill/fill buffers or VTCM backup sharing. A warning is now issued to prevent these unsupported setups. {128832}

  • HTP: Enabled 64-bit UDMA support in QNN HTP, allowing access to memory beyond 4GB for large neural networks, and implemented shared-weights far mapping. {91520}

  • HTP: Enabled multi-context spill/fill buffer sharing for QNX. {128061}

  • HTP: Enhanced the HTP backend polling mechanism to support separate polling contexts and threads for each execution priority level. This design improves performance and resource management for multithreaded applications that concurrently run graphs with different priorities. {131859}

  • LPAI: Added support for LPAI backend RPC mode and QNN_GRAPH_ERROR_EARLY_TERMINATION in qnn-throughput-net-run. {121599}

  • Op:CPU: Added support for Sparse Convolution 2D. {120883}

  • Op:CPU: Updated the Cast Op to correctly map NaN (Not a Number) inputs to True when casting floating-point values to BOOL8, aligning with the ONNX implementation. {136649}

  • Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}

  • Op:LPAI: Added support for the frame_pad parameter to the Buffer Op on the LPAI backend. {128999}

  • OpDef: Added an optional parameter reuse_sparse_indices to the Conv2d Op, with default support for AIC, GPU, HTA, and LPAI backends. {118012}

  • SDK: Introduced QAIRT_SDK_ROOT as the new primary environment variable for setting the SDK path. The previous QNN_SDK_ROOT and SNPE_ROOT variables are now deprecated and will be removed in a future release. For backward compatibility, they are currently set based on QAIRT_SDK_ROOT. {121206}
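
    As a minimal sketch of the migration (the SDK install path below is an illustrative placeholder), environment scripts can set the new variable and, until the deprecated names are removed, mirror them for older tooling:

    ```shell
    # Illustrative path only; adjust to the actual SDK install location.
    export QAIRT_SDK_ROOT="/opt/qairt/sdk"
    # Deprecated names, kept in sync for backward compatibility.
    export QNN_SDK_ROOT="$QAIRT_SDK_ROOT"
    export SNPE_ROOT="$QAIRT_SDK_ROOT"
    echo "$SNPE_ROOT"
    ```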

  • Tool: Enhanced layerwise debugging tools to accept externally provided “golden” reference outputs for comparison. This allows users to supply their own reference data. A new option to disable layout transformation during this process has also been added to accommodate various data sources. {122717}

  • Tool:Converter: Added support for the new Einsum equation nkctv,kvw->nctw, expanding the range of supported ONNX models. {126231}

  • Tool:Converter: Added support to serialize disconnected model inputs (dangling inputs) from the source framework into the DLC file. {139058}

  • Tool:Converter: Defer loading is now enabled by default for the ONNX converter to improve memory usage and processing time. To disable this feature, use the new --onnx_disable_defer_loading flag for the QAIRT converter or the --disable_defer_loading flag for the QNN/SNPE ONNX converter. {139858}

  • Tool:Converter: Enabled support for the --defer_loading option in the QNN ONNX converter when generating C++/binary outputs. This feature, which was previously unsupported for this output format, helps reduce memory consumption and processing time during conversion. {139859}

  • Tool:Converter: Removed a limitation in the ONNX converter that previously prevented using defer loading (--onnx_defer_loading) and ONNX model simplification in the same conversion. Both features can now be used simultaneously. {116422}

  • Tool:Converter:ONNX: Added support for the ONNX Size Op, which outputs the total number of elements of an input tensor as an int64 scalar. {138523}

  • API: Fixed a bug in the converter input configuration where the data type of the first input was incorrectly applied to all other inputs. {137113}

  • API: Fixed a bug in the model-level API where a typo in an internal variable could cause issues with input list file generation. {137830}

  • API: Fixed an issue in the Quantizer API where parsing an input list file containing comment lines (e.g., lines starting with ‘%’) could fail. {136414}

  • API: Fixed an issue where the GenAIExecutor would return invalid performance metrics, such as -1 or 0 for timing and tokens per second. {137575}

  • API: Reduced excessive warning messages generated by qairt.compile by correcting an internal log level configuration. {137628}

  • API: Refactored the Python API to ensure model configuration files (config.json) can be loaded correctly using standard methods like autoconfig.from_pretrained. {131057}

  • API:CPU: Fixed an issue where graph composition for the CPU backend would fail with an OpConfig validation error for the Transpose Op, particularly when using the float_precision=16 conversion option. {138242}

  • CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a GroupNorm operation failure. {135924}

  • Core: Improved model initialization time on the HTP backend by optimizing internal system calls during runtime setup. {136899}

  • Genie: Fixed LM head execution for split LEQ models during the last iteration of prefill. {139824}

  • Genie: Fixed a memory leak in the tokenizer implementation observed when running genie-t2t-run with the LoRA adapter. {130865}

  • Genie: Fixed an issue where LLM inference could produce random or incorrect output. {124867}

  • Genie: Fixed sampling for float16 models which would produce nonsensical response text. {134604}

  • Genie: Reduced peak RAM by removing unnecessary copies for embedding LUT encoders when running embeddings on CPU, addressing high memory usage for longer prompts. {134506}

  • Genie: Resolved a crash in the Genie runtime that occurred when using non-empty stop sequences in a dialogue query. {138311}

  • HTA: Fixed a segmentation fault that could occur when executing a cached model on the HTA backend if a subgraph fell back to the DSP backend. {127808}

  • HTP: Fixed a performance regression on the HTP backend that affected certain transformer models, including those using masked softmax. {137554}

  • HTP: Fixed an accuracy regression for models using the ResizeNearestNeighbour Op. The fix adapts the HTP backend to handle updated quantization parameters resulting from an improved CPU backend implementation of the Op. {116566}

  • HTP: Fixed an issue that prevented the DSP driver from loading correctly for multicore execution on Android. {135235}

  • HTP: Fixed memory deregistering failures in GenAI use cases by deallocating unused tensor buffers after inference completion in async mode. {129731}

  • HTP: Resolved a performance regression on the HTP backend that affected both synchronous and asynchronous inference modes for certain models. {137386}

  • HTP:Op: Fixed an ElementWiseFloorDiv name mismatch. {135158}

  • LPAI: Fixed an accuracy regression for models using asymmetric parameter quantization. A change was introduced to correctly handle the --param_quantizer_schema flag, which may require users to update their quantization settings. When a tensor’s encoding is symmetric, the quantizer schema must now be set to unsignedsymmetric to ensure correct behavior. {138453}

  • Op:CPU: Fixed a dynamic bias issue in the DepthwiseConv2d Op that caused a segmentation fault with the QNN CPU backend. {137313}

  • Op:CPU: Fixed a memory leak in the Expand Dims Op by ensuring the freeing of space created for axis data. {138049}

  • Op:CPU: Fixed an issue by adding INT8 support for the GroupNorm Op. {135932}

  • Op:DSP: Fixed a performance regression by preventing an unnecessary Reshape Op from being added by the LogSoftmax implementation when its input and output shapes are identical. {137013}

  • Op:HTP: Added 5D rank constraints for Softmax and Conv Ops, resolving an issue with ExecuTorch QNN Delegate model preparation. {137462}

  • Op:HTP: Fixed an accuracy drop in the HTP backend’s GridSample Op that occurred with multi-batch inputs (batch size > 1). {134663}

  • Op:HTP: Fixed an accuracy regression in the HTP backend implementation of the DepthToSpace Op. This change restores the behavior to align with previous versions, resolving potential output deviations for models utilizing this operation. {139578}

  • Op:HTP: Resolved an accuracy issue where models using the Concat Op on the HTP backend could produce different and less accurate results when running without the --debug flag in qnn-net-run. {134084}

  • Tool: Fixed an issue where an incorrect offset was generated during the dequantization of tensors with signed symmetric, per-channel encodings. {137056}

  • Tool: Resolved a segmentation fault that could occur in the qnn-context-binary-generator tool during the QnnContext_free call. {139746}

  • Tool:Converter: Added support for GRU Op quantization, specifically enabling quantization for LPAI backend by optimizing static inputs. {126350}

  • Tool:Converter: Corrected an issue that could lead to accuracy regressions on the LPAI backend for models using 4-bit activation quantization. The SDK now correctly enforces the use of 8-bit activation quantization, as 4-bit is not supported on the LPAI backend. {137976}

  • Tool:Converter: Enabled enableQnnQuant flag for Resize Op in-out optimization, resolving issues with Nearest Neighbor and Bilinear modes. {137641}

  • Tool:Converter: Fixed a bug in the Converter tool that ensures the correct order of input and output tensors in the QNN graph JSON file during serialization, aligning them with the IR graph. {118500}

  • Tool:Converter: Fixed a corner case in the Expand Op pattern matching, specifically resolving an issue in the Squash Tile Unsqueeze optimization that led to incorrect shape inference for multi-consumer cases. {136864}

  • Tool:Converter: Fixed a log print format issue that affected accuracy when converting LLM models with MaskedSoftmax. {137471}

  • Tool:Converter: Fixed an issue where Batch Normalization (BN) scales and offsets were not correctly obtained from QDQ models, ensuring proper application of BN parameter encodings. {129578}

  • Tool:Converter: Fixed an issue where ONNX Logsoftmax Opset11 would add unnecessary reshapes, leading to extra transpose operations, even when input/output shapes were identical. {137545}

  • Tool:Converter: Fixed an issue where per-Block/per-Channel encodings were not correctly applied for weights during QAIRT conversion, resolving the inability to quantize DLC with 4-bit BQ weights. {134363}

  • Tool:Converter: Fixed an issue where using multiple Static Tensor nodes in a single graph would fail due to duplicate output tensor names. {136080}

  • Tool:Converter: Fixed an issue with merging Mul and Add operations into Batchnorm by correcting pattern definitions and adding validation checks. {136756}

  • Tool:Converter: Reduced converter memory and time usage by avoiding unnecessary access to tensor weights. {137665}

  • Tool:Converter: Removed the beartype import in the PyTorch converter. {134045}

  • Tool:Converter: Resolved an issue in the Layout Transform post-optimization where a node could be incorrectly squashed multiple times, causing incorrect broadcasted output shapes for certain Reshape and Transpose operations. {139382}

  • Tool:Converter: Updated tensor name sanitization logic to ensure uniqueness and prevent conflicts, resolving issues like “Compose Graph failed: Sigmoid Tensor already exists”. {135409}

  • Tool:Converter:ONNX: Enhanced support for the If Op in the ONNX converter to allow subgraphs with multiple outputs. {136721}

  • Tool:Converter:ONNX: Resolved a NameError in the quantizer tool that occurred due to a missing internal logging function. {140893}

  • Tool:Quantizer: Resolved an issue in the quantizer to correctly apply per-channel quantization for grouped ConvTranspose Ops. {136585}

  • Tool:qnn-context-binary-generator: Enhanced qnn-context-binary-generator to precompute and validate adaptation weight metadata paths, allowing early error detection for erroneous LoRA config contents and avoiding long wait times. {126629}

  • Tool:qnn-model-lib-generator: Redirected error logs to stderr and all other logs to stdout. {135807}
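
With the streams separated, callers can capture logs and errors independently. A hedged Python illustration using a stand-in subprocess rather than the real tool:

```python
import subprocess
import sys

# Stand-in process (not qnn-model-lib-generator itself) that mimics the new
# behavior: informational logs on stdout, error logs on stderr.
proc = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('INFO: generating model library'); "
     "print('ERROR: bad input', file=sys.stderr)"],
    capture_output=True, text=True,
)

print("stdout:", proc.stdout.strip())  # only the INFO line
print("stderr:", proc.stderr.strip())  # only the ERROR line
```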

2.35.0

May 2025

  • API: Added LLM support in the Python API. {118016}

  • API:Genie: Added a data-alignment-size configuration option for dialog and embeddings APIs. {130270}

  • API:Genie: Introduced the GeniePipeline.h and GenieNode.h APIs, providing multimodal support. {123389}

  • API:Genie: Introduced the GenieTokenizer.h API. {126408}

  • API:HTP: Added support for new memory buffer types (QNN_HTP_MEM_WEIGHTS_BUFFER and QNN_HTP_MEM_SCRATCH_BUFFER) in the QnnMem_register and QnnMem_deregister APIs. {121766}

  • API:HTP: Introduced API changes to support external weights and spillfill buffers. {121760}

  • CPU: Added Phi 3 and Phi 3.5 model configurations to the Genie SDK. {134117}

  • CPU: Added support for dangling inputs in graphs. {134280}

  • Core: Added platform information to the JSON output of the context binary utility. {129905}

  • Docs: Updated QNN/SNPE documentation to include QCS8625 in the list of supported Snapdragon devices. {134450}

  • Genie: Added support for use-mmap on Windows platforms. {116519}

  • Genie: Enabled support for multi-modal inference with low latency through the GenIE pipeline, supporting various input/output modalities and utilizing shared embedding weights. {120507}

  • Genie: Removed printing of KPIs to stdout, favoring use of GenieProfile. {123352}

  • HTP: Added initial support for multi-core weight sharing during deserialization, including functions to handle VA allocation for weights per core and passing multi-core metadata. {124612}

  • HTP: Added multicore weight sharing support during deserialization to map shared weights to different cores without requiring VA reservations. {135411}

  • HTP: Added support for configuring extended_udma prepare time. {136435}

  • HTP: Added support for measuring end-to-end latency in the runtime. {98570}

  • HTP: Added support for the QNN_HTP_CONTEXT_CONFIG_OPTION_DEFER_GRAPH_INIT context configuration option to postpone graph-related tasks. {130605}

  • HTP: Added support for the QNN_HTP_CONTEXT_GET_PROP_BUFFER_START_ALIGNMENT context property to retrieve buffer start alignment. {134678}

  • HTP: Added support for the usage of external weights and scratch buffers on the HTP backend. {121767}

  • HTP: Added support to save the transport result for multicore transport during async execution. {132146}

  • HTP: Enabled support for dynamic input and output resolution for SD3 on the HTP backend. {105781}

  • HTP: Enabled the mmap budget feature for WoS to reduce peak RAM usage during context initialization for GenAI use cases. {131070}

  • HTP: Extended binary format support for spill/fill to include external buffers. {136017}

  • HTP: Implemented buffer size calculations for the HTP backend, including consideration for graph selection and calculation of maximum spill/fill buffer size. {121765}

  • HTP: Updated the Throughput Net Run (TNR) application to utilize thread_pool utilities for thread management. {113123}

  • Op:CPU: Added dynamic dimension support for AvgPool2D. {126775}

  • Op:CPU: Added dynamic dimension support for InstanceNorm Op. {101384}

  • Op:CPU: Added support for the ‘frame_pad’ parameter in Buffer Op. {133242}

  • Op:GPU: Added support for the Cast operation from INT64 to INT32 on Windows. {132750}

  • Op:HTP: Added INT16 support for the ElementWiseAsin Op on the HTP backend. {114479}

  • Op:HTP: Added support for the MaskedSoftmax Op on the HTP backend for LLM use cases. {110661}

  • Op:HTP: Implemented performance optimizations for the Score Filter and NMS operations on the HTP backend. {134740}

  • OpDef: Added Op definition for IsInf. {125370}

  • SDK: Added an option to enable optrace profiling in the TNR application. {135588}

  • SDK: Enabled SNPE, QNN, and QNN delegate support for the QCM8550 platform. {129533}

  • Tool:Converter: Added dynamic weights support for the Deconv Op in TensorFlow models. {109713}

  • Tool:Converter: Added support for Add, Subtract, Multiply, and Divide operations in Float32 precision for static tensor manipulation within the G2G IR. {125540}

  • Tool:Converter: Added support for ONNX 1.16.1 in the Ubuntu 20.04 (Focal) environment. {134975}

  • Tool:Converter: Added support for the Size operation and updated Relu opset versions in the ONNX converter to address unsupported operations in certain models. {133472}

  • Tool:Genie: Introduced the genie-app command-line utility. {123548}

  • Tool:HTP: Added support for the HTP MCP Binary format in the QnnHtpBinaryBufferPrinter tool, enabling proper parsing and printing of MCP binaries. {128507}

  • API: Allowed passing extra arguments through the Python API’s ConverterConfig to underlying modules. {133985}

  • API: Fixed an encodings path issue during the build phase with GenAI models using the Python API. {133815}

  • API: Fixed an issue where quantized and compiled models failed during execution with the Python API when using default CalibrationConfig values. {134858}

  • API: Fixed an issue where the QAIRT Python API failed to load backend libraries (QnnCpu.dll/QnnHtp.dll) on certain devices. {134461}

  • API: Fixed an issue with the JSON reader setting in QNN profiling on Windows. {134565}

  • CPU: Fixed a memory management issue for xnnpack Conv2D nodes. {132710}

  • CPU: Fixed an issue where certain models failed during inference due to an invalid layer parameter value resulting from a GroupNorm operation failure. {135924}

  • Core: Fixed cross SoC compatibility issues caused by unsynchronized GpuInfo fields between SocServer and SocUtility. {135786}

  • DSP: Fixed a context binary generation issue on OE Linux Platform. {124376}

  • DSP: Fixed an issue where snpe-net-run failed due to an unavailable runtime. {135399}

  • DSP: Fixed inference time regressions observed on HTP_FP16 and HTP backends by propagating DSP architecture characteristics to the HTP core. {133777}

  • GPU: Resolved model verification failures encountered with certain CNN models on the GPU backend, related to Conv Kernel processing. {130041}

  • Genie: Fixed an asynchronous initialization issue for Windows platforms. {135904}

  • Genie: Fixed an issue where GenieDialog_save/restore could not be used with GENIE_DIALOG_SENTENCE_REWIND. {135558}

  • Genie: Fixed an issue where GenieProfiling data could report invalid initialization time data. {134498}

  • Genie: Fixed an issue where stop sequences did not work with GenieDialog_embeddingQuery. {134592}

  • HTP: Adjusted max PD size calculation to correctly account for far weights, resolving an issue with unexpected secondary PD triggers during specific test conditions. {127268}

  • HTP: Fixed a stability issue with Llama 3 3B multicore models by updating the method for setting the mc_spill_fill buffer. {135253}

  • HTP: Fixed a crash occurring in multicore graphs due to incorrect identification of spillfill memory pools by the Hexagon NN API. {135543}

  • HTP: Fixed an issue where qnn-net-run failed to open a session due to library loading and device transport instance creation errors. {135028}

  • HTP: Fixed an issue where core information was not correctly captured in optrace for multicore execution. {133797}

  • HTP: Fixed an out-of-memory issue occurring when running Llama 3 8B models on a single core without splitting. {134696}

  • HTP: Fixed async execution failures observed while running certain models in a multicore configuration with shared buffers. {135047}

  • HTP: Fixed logic in graph switching to prevent a bug. {133794}

  • HTP: Fixed multicore async inference failures, including issues observed with Zero copy. {134701}

  • HTP: Improved model execution time performance on SM8750, addressing an issue where the execution time KPI was not being met. {128145}

  • HTP: Resolved a graph execution failure issue observed during the async_group_init_llama7b_graph_switch_no_shared_resources test. {126402}

  • HTP: Resolved an issue causing incorrect mapping of test failures in nightly reports. {125884}

  • HTP: Resolved an issue leading to a “Failed to deregister ion memory with the backend” log message during multi-threaded HTP binary execution with shared buffers. {129716}

  • HTP: Resolved differences in adapter switch time between Genie and qnn-net-run by addressing issues related to graph switching and power settings. {131776}

  • Op:CPU: Fixed TransposeConv2d for asymmetric kernels in Float execution. {133778}

  • Op:CPU: Fixed an issue with the GroupNorm Op by adding INT8 support. {135932}

  • Op:GPU: Fixed accuracy errors with the ReduceSum operation when used with Image2DArray for non-Mean ops and specific dimensions. {131616}

  • Op:GPU: Fixed inference failures in models with Argmax/Argmin Ops. {133052}

  • Op:HTP: Added support for LayerNorm when the constant input is FP16 converted to FP32. {131420}

  • Op:HTP: Enabled UINT_8 datatype support for the StridedSlice Op on the HTP backend, resolving model conversion and graph preparation failures. {125597}

  • Op:HTP: Fixed accuracy issue for GatherNd Op. {110126}

  • Op:HTP: Fixed an accuracy issue with LPBQ convolution for MOE on v73. {133134}

  • Op:HTP: Fixed an issue where the Genie output resulted in an infinite loop with WoS by updating the prompt file. {134680}

  • Op:HTP: Fixed an issue with high power consumption for DepthwiseConv op with asymmetric stride by optimizing the pattern on the HTP backend. {133635}

  • Op:HTP: Improved accuracy of the Swish Op. {133898}

  • Op:HTP: Improved performance of the MatMul Op running on HVX. {135210}

  • Op:HTP: Improved the performance of the 5D GridSample Op on the HTP backend for W8A16 quantization. {122831}

  • Op:HTP: Improved the performance of the GridSample Op on the HTP backend by addressing tiling and scheduling issues. {126462}

  • SDK: Fixed an issue where some models failed at the concat operation during graph preparation. {132887}

  • Tool: Added a validation check for float fallback to prevent quantizer failures when encodings or calibration lists are not provided. {133463}

  • Tool: Added support for the --onnx_batch and --tensorflow_batch options in Hypertuner after QAIRT converter changes. {131064}

  • Tool: Eliminated a misleading warning message “Function not called, PrepareLib isn’t loaded!” that would appear when running qnn-net-run successfully on HTP. {122382}

  • Tool: Fixed an issue where the is_symmetric value for 32-bit bias tensors was incorrectly reset during Float Fallback, causing failures when the output DLC was passed back to the quantizer. {135379}

  • Tool: Fixed quantizer to insert Convert Op for LayerNorm weights with external encoding. {134466}

  • Tool: Resolved an issue where snpe-dlc-graph-prepare failed for certain models due to incompatible float bitwidths when QParams were present, particularly in the float fallback path. {130558}

  • Tool:Converter: Added a fix for a bug in LayerNorm squeeze_axes. {126234}

  • Tool:Converter: Added a pattern that maps to the Expand Op, reducing inference time. {132363}

  • Tool:Converter: Added a warning message for the Non-Zero Op when the output shape is dynamic. {126185}

  • Tool:Converter: Added support for a new einsum equation, expanding the range of supported ONNX models. {133824}

  • Tool:Converter: Converter-generated FullyConnected Ops now have 2D input and 2D output. {127049}

  • Tool:Converter: Ensured that ApplyEncodings is called by the quantizer when --use_quantize_v2 is provided internally, even if not on the command line. {133705}

  • Tool:Converter: Fixed JSON dumping for 4-bit quantized tensors. {133481}

  • Tool:Converter: Fixed KernelScale expansion for scalars in TFLite DeConv dequantization. {128978}

  • Tool:Converter: Fixed a bug in NonZero Op translation constant folding. {127165}

  • Tool:Converter: Fixed a bug in the squash_node_into_nn_node optimization. {126354}

  • Tool:Converter: Fixed a conversion error that occurred when --float_bitwidth 16 was provided on the command line with existing quantization parameters. {134716}

  • Tool:Converter: Fixed a corner case in the DCE process in the converter to correctly handle node removal based on the number of consumers of output tensors. {129704}

  • Tool:Converter: Fixed an error in the squash_node_into_nn_node optimization. {132836}

  • Tool:Converter: Fixed an issue where output nodes for BatchMatMul and BatchMatMulV2 Ops were missing by adding support to convert them to FullyConnected Op. {127139}

  • Tool:Converter: Fixed an issue where the converter failed when using the --desired_input_layout argument with the new layout transform algorithm by unifying its behavior with custom_io. {136144}

  • Tool:Converter: Fixed an issue with 6D support for Concat and Constant Ops in the frontend, resolving a core dump error during quantization. {117698}

  • Tool:Converter: Fixed incorrect population of the “is_symmetric” flag, ensuring encodings are dumped correctly. {134673}

  • Tool:Converter: Fixed an issue observed when several GRU Ops share one initial hidden state, and added a unit test for bidirectional GRU. {91127}

  • Tool:Converter: Resolved an accuracy regression issue related to the squash_batchnorm optimization in the converter by ensuring the optimization correctly handles encodings. {130130}

  • Tool:Converter: Skipped adding dummy weights and bias tensors during LayerNorm pattern matching. {128870}

  • Tool:Converter:ONNX: Added a fix for axis_format handling in matmul_to_fc translation. {118318}

  • Tool:Converter:ONNX: Fixed a model conversion issue with the Resize operation in the ONNX converter. {131677}

  • Tool:Converter:ONNX: Fixed an ONNX conversion failure for the Sam2 Image Encoder model by addressing layout format issues for Matmul node inputs and outputs. {131098}

  • Tool:Op:HTP: Optimized the DepthwiseConv op with asymmetric stride to improve performance for specific models. {132474}

  • Tool:accuracy_debugger: Corrected a tensor shape issue for the oneshot algorithm with ONNX batch=1; the onnx_batch override option is no longer accessible. {133915}

  • Tool:qairt-accuracy-evaluator: Removed the preproc-file option from the Accuracy Evaluator CLI as it is no longer valid due to the deprecation of minimal mode. {129278}

  • Tool:qnn-onnx-converter: Fixed an issue where static tensor framework trace information was missing for some tensors. {120982}

  • Tool:qnn-tensorflow-converter: Added logic to ensure the min-max in TensorFlow FakeQuantPerChannel nodes are symmetric. {118672}
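
Symmetrizing a min/max pair means widening the narrower side so zero sits exactly at the center of the range; an illustrative helper (hypothetical, not the converter’s actual code):

```python
def symmetrize_range(t_min, t_max):
    """Return a (min, max) pair symmetric about zero that covers the input range."""
    m = max(abs(t_min), abs(t_max))
    return -m, m

print(symmetrize_range(-0.3, 1.0))  # (-1.0, 1.0)
```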

  • Tool:quantizer: Fixed an issue with 2-bit weight quantization calculation, resolving incorrect output values. {132048}

2.34.0

April 2025

  • API:Genie: Added GenieSampler_registerUserDataCallback API which adds a userData argument to the sampler custom callback. {130164}

  • API:Genie: Added GenieEngine.h, GenieDialog_getEngine, and GenieDialog_bindEngine APIs. {126715}

  • API:SNPE: Added Java API setUnconsumedTensorsOutput(), equivalent to the C/C++ builder API Snpe_SNPEBuilder_SetUnconsumedTensorsAsOutputs() / SNPEBuilder::setUnconsumedTensorsAsOutputs(). {125891}

  • CPU: Added BOOL support in CPU Concat Op. {130940}

  • CPU: Added axes parameter support in L2Norm. {121463}

  • DSP:SNPE: Added the ability to display the exact priority of the HVX thread in the log to help identify potential issues related to HVX concurrency scenarios. {117790}

  • Genie: Added KV quantization support for GenAiTransformer backend. {123438}

  • Genie: Added a LoRAv3 reference/sample Genie configuration to the SDK examples. {130008}

  • Genie: Added the Eaglet dialog type. {126452}

  • Genie: Added token-acceptance-rate to the GenieProfile output for some dialog types. {123350}

  • Genie: Introduced a performance optimization where logits are sampled using the native datatype output of the model. {121359}

  • HTP: Deprecated optrace collection via debug configuration files. Use optrace via profiling instead. {124739}

  • HTP: Fixed an issue where the number of items was missing in the multicore callback. {129636}

  • HTP: Implemented service call to do dspqueue_close for multicore environments. {126381}

  • HTP: Introduced parallel graph execution, enabling concurrent running of multiple graphs on a single HTP core to improve throughput and resource utilization. {89181}

  • HTP: Performance improvement for Softmax Op with 32 channels or less. {130819}

  • Op:GPU: Added support for GridSample Op. {127898}

  • Op:HTP: Optimized DepthwiseConv2d Op execution by ensuring it runs on HMX. {128655}

  • Op:HTP: Optimized DepthwiseConv op performance for an ASR model on SM8750 HTP W8A16. {129860}

  • OpDef: Added dynamic shape support for FullyConnected Op. {116235}

  • OpDef: Added optional parameter buffer_padding to Buffer Op. {125962}

  • Tool:Converter: Added support for BQ and LPBQ in JSON serializer and deserializer. {132650}

  • Tool:Converter: Added support for quantized DLC files as input to the quantizer module: if all tensors are quantized or overridden to float, the DLC is returned directly; if the DLC is half-quantized, the fixed-point tensors are dequantized back to float before quantization; all float tensors are then quantized. {129135}
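
The three cases above can be sketched as a small dispatch over a toy tensor representation (hypothetical names; not the actual quantizer code):

```python
# Toy model of the three-way handling of a DLC's tensors. Each tensor is a
# dict with 'dtype' ('float' or 'fixed'), a 'value', and a 'scale'.

def requantize_dlc(tensors, new_scale):
    # 1. If all tensors are already quantized, return directly.
    if all(t["dtype"] == "fixed" for t in tensors):
        return tensors
    # 2. Half-quantized DLC: dequantize fixed-point tensors back to float.
    for t in tensors:
        if t["dtype"] == "fixed":
            t["value"] = t["value"] * t["scale"]
            t["dtype"] = "float"
    # 3. Quantize all float tensors with the new encoding.
    for t in tensors:
        t["value"] = round(t["value"] / new_scale)
        t["scale"] = new_scale
        t["dtype"] = "fixed"
    return tensors

tensors = [{"dtype": "fixed", "value": 4, "scale": 0.5},
           {"dtype": "float", "value": 1.0, "scale": None}]
print(requantize_dlc(tensors, new_scale=0.25))  # both end up 'fixed'
```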

  • Tool:Converter: Added support to trigger Quantizer with float_fallback mode. {129131}

  • Tool:Converter: Fixed handling of dynamic input shapes with a more informative error message. {127631}

  • Tool:Converter: Introduced a new Converter argument to guide different Converter output export formats: --export_format ["DLC_DEFAULT", "DLC_STRIP_QUANT"]. {129132}

  • Tool:Converter: QAIRT Quantizer now skips quantization steps if float_fallback is specified for an input Quant DLC. {130397}

  • Tool:qnn-onnx-converter: Added the --preserve_onnx_output_order option to maintain ONNX output order in the converted graph. {126070}

  • QNN Core: Fixed an issue where QNN Savecontext failed for multiple models on Windows platforms due to the inability to find the graph in the DLC. {130104}

  • CPU: Added int32 data type support for ScatterElements. {126766}

  • CPU: Fixed L2Norm to handle multiple axes. {127053}

  • CPU: Fixed verifier failures for single-layer resize models on ONNX16 framework. {124524}

  • CPU: Implemented deep copy of opConfig in CPU to prevent model failures. {128204}

  • DSP: Fixed an SNPE inference failure due to QnnContext_createFromBinary failing with a memory allocation error. {127804}

  • DSP: Fixed an SNPE inference failure where multiple models failed due to errors obtaining input tensor names. {127809}

  • DSP: Fixed inference failures for specific models on HTP due to network partition issues. {131151}

  • GPU: Fixed accuracy error in QnnGpuOperationTestActivationAndroid. {125640}

  • GPU: Fixed accuracy error in QnnGpuOperationTestTransposeConvAndroid. {125992}

  • GPU: Fixed inference regressions in models having Convolution Op in gpu_fp16 mode for some devices. {120026}

  • Genie: Fixed issue in genie-t2t-run where dialog de-initialization data was not saved. {132621}

  • Genie: Fixed issue where GenieEmbedding_generate would return a rank of 0. {131581}

  • Genie: Fixed issue where quantized values may overflow or underflow. {125929}

  • HTP: Addressed inference time regressions on multiple chipsets for HTP and HTP_FP16 configurations. {128165}

  • HTP: Corrected the TransportResult resize function to properly set the number of cores. {132311}

  • HTP: Fixed a LayerNorm validation failure by checking rank of bias only if it’s present in LayerNorm Op. {106186}

  • HTP: Fixed a Windows compatibility issue related to non-shared weight VA reservation. {130567}

  • HTP: Fixed a crash in libQnnHtp.so that occurred in graph switch scenarios involving spill fill buffer sharing. {131575}

  • HTP: Fixed a deadlock in allocateAndMapPersistentSpillFillBuffer() that occurred due to locking conflicts. {132488}

  • HTP: Fixed a hang issue in GenAI TNR tests when using asynchronous group initialization with weight sharing and spill-fill sharing. {132586}

  • HTP: Fixed a multithreaded concurrency issue with LLM and small models that caused a ‘memHandles registration failure’. {131051}

  • HTP: Fixed a performance regression for a MobileBERT model that was introduced in a previous release. {132111}

  • HTP: Fixed a prepare failure for the L2Norm op with fp16 when the relaxed_precision_flag is not set during the converter stage. {129566}

  • HTP: Fixed an issue where QNN HTP inference failed during MC detailed profiling. {132564}

  • HTP: Fixed an issue where multiple VA sharing groups caused the error ‘Unable to map reserved buffer for non-shared weights’. {131009}

  • HTP: Fixed an issue where qnn-context-binary-generator would hang, consuming excessive CPU and memory. {126833}

  • HTP: Fixed intermittent hangs that occurred during the creation of a context from a binary in concurrent scenarios. {131049}

  • HTP: Fixed the checker failures related to the OpPackage example by correcting the include path. {130707}

  • HTP: Improved performance to address inference time regressions observed on multiple chipsets. {131073}

  • HTP: Resolved an issue related to spill-fill buffer sharing, which caused incorrect output. {124544}

  • HTP: Resolved x86_prepare failures during savecontext and addressed high CPU utilization during graph preparation. {125093}

  • HTP: Resolved failures in LoRA v2 test cases due to DSP transport call issues, impacting multi-model context and graph switch scenarios. {130142}

  • HTP: Resolved inference time regressions on SM8750 by avoiding broadcast overhead in the Mul Op, improving the performance of uint16 elementwise multiplication. {125746}

  • HTP: Reverted the enablement of the 64-bit flag to address reported hangs. {130301}

  • HTP: Updated the PGE support check to use supported features on the SoC model. {127754}

  • LPAI: Fixed a failure in LPAI direct mode. {131750}

  • LPAI: Fixed an issue where LPAI single layer models were failing. {130729}

  • Op:DSP: Added support for LayerNorm and modified the hard-coded check. {122112}

  • Op:HTP: Added 5D support for float Sigmoid. {128867}

  • Op:HTP: Addressed performance issues when converting models with w8a16 compared to w8a8 on SM8350 by optimizing the MatMul and Gemm Ops. {121404}

  • Op:HTP: Fixed ReduceMax FP16 compilation error. {127900}

  • Op:HTP: Fixed a QNN context-binary-generator failure due to a TCM insufficient tile error when processing a custom model. {129510}

  • Op:HTP: Fixed context binary generation failures for ArgMin/ArgMax ops due to TCM overflow. {108763}

  • Op:HTP: Fixed model validation errors during context saving, specifically addressing issues with the DepthToSpace Op. {131083}

  • Op:HTP: Fixed numerical issue for DepthwiseConv2d -> HardSwish in a MobileNetV3 model. {128158}

  • Op:HTP: Fixed rank constraints of Op replacement rule. {130194}

  • Op:HTP: Improved DepthwiseConv2D performance. {126421}

  • Op:HTP: Optimized Reshape Ops when PCQ is enabled on constant tensors going into a MatMul Op, improving performance. {130415}

  • Op:HTP: Registered QInt16 for Concat Op to resolve graph preparation failures when using QuantInt16 tensors. {125735}

  • Op:HTP: Resolved an issue where context binary size calculation failed during graph preparation. {124130}

  • Op:HTP: Resolved an on-device hang issue during execution of Dynamic MobileNet V2, specifically during the Transpose Op. {126806}

  • Op:HTP: Resolved context binary generation failures for the BevFormer model with AMP encodings. {129991}

  • SDK: Fixed build issues in Qnn SampleApp, Qnn SampleAppAsyncExecution and Qnn SampleAppSharedBuffer. {131442}

  • SDK: Removed “pytorch to onnx conversion avoidance suggestions” from QNN SDK Docs. {132125}

  • SDK: ReleaseNotes.txt renamed to QAIRT_ReleaseNotes.txt and now contains release notes for both Unix and WoS. {127817}

  • SNPE: Fixed API Snpe_SNPEBuilder_SetInitCacheMode()/SNPEBuilder::setInitCacheMode() breakage for non-HTP backends when using the snpe-net-run option --enable_init_cache. {129545}

  • SNPE: Fixed the --enable_init_cache option (API SNPEBuilder::setInitCacheMode()/Snpe_SNPEBuilder_SetInitCacheMode()) in net-run for AIP runtime. {131929}

  • Tool:Converter: Corrected an issue where qnn-context-binary-generator logged an incorrect QPC path when the --backend_binary option was used. {126169}

  • Tool:Converter: Corrected the allowed length for pad amounts for 4D tensors in the emitter. {132185}

  • Tool:Converter: Enabled data invariant optimizations for the Tile Op. If the input of Tile Op is quantized, the input dataType and qInfo are copied to the output. {126372}
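
Since Tile repeats values without changing them, the optimization amounts to letting the output inherit the input’s encoding; a hedged sketch with hypothetical field names (not QAIRT internals):

```python
# Illustrative data-invariant optimization for a Tile-like op: if the input
# is quantized, copy its dtype and quantization info to the output unchanged.

def propagate_tile_encoding(input_tensor, output_tensor):
    if input_tensor.get("qinfo") is not None:
        output_tensor["dtype"] = input_tensor["dtype"]
        output_tensor["qinfo"] = dict(input_tensor["qinfo"])
    return output_tensor

inp = {"dtype": "ufixed8", "qinfo": {"scale": 0.1, "offset": 0}}
out = {"dtype": "float32", "qinfo": None}
print(propagate_tile_encoding(inp, out))  # output now carries the input encoding
```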

  • Tool:Converter: Fixed Layout Transform to avoid unintentionally loading deferred weights. {132173}

  • Tool:Converter: Fixed a segfault issue in IrJsonDeserializer during deserialization of newly generated model JSON files. {129816}

  • Tool:Converter: Fixed an issue where Accuracy Evaluator runs failed at the Netrun stage. {129997}

  • Tool:Converter: Fixed an issue where FOLD_MULTIPLE_TRANSPOSE was incorrectly pruning graph outputs. {127963}

  • Tool:Converter: Fixed an issue where context binary generation failed with a ‘Graph Finalize failure’ when using multi-Qranium pipelined partitioning. {124908}

  • Tool:Converter: Fixed an issue where context binary generation failed for LVM UNet models due to tensor updateability and GroupNorm Op validation errors with the HTP backend. {127887}

  • Tool:Converter: Fixed an issue where the qnn-context-binary-generator tool failed on Windows-X86 when processing LoRAv3 models. {130894}

  • Tool:Converter: Fixed index error failure in remove identity optimization. {125867}

  • Tool:Converter: Fixed issue when folding multiple transposes to retain graph output names. {128685}

  • Tool:Converter: Resolved a serialization issue with MatMul ops involving int16*int16 data types when using dynamic 16-bit weights. {129733}

  • Tool:Converter:ONNX: Added support for dynamic inputs for Clip Op. {124203}

  • Tool:Converter:ONNX: Fixed an issue in the Converter to ensure correct name sanitization following C++ naming conventions. {129356}

  • Tool:Converter:ONNX: Fixed axis tracking in ScatterElements. {118614}

  • Tool:Converter:ONNX: Fixed issue for reverse GRU Op to ensure the correct order of input names for the first output. {130544}

  • Tool:Converter:ONNX: Updated translation for ExpandOp to reduce inference time. {127065}

  • Tool:qairt-accuracy-evaluator: Fixed issue where the input list was incorrectly passed to the quantizer. {130537}

  • Tool:qairt-accuracy-evaluator: Added support for the ‘algorithms’ quantizer parameter in the evaluator, and provided the input shape to the converter for PyTorch models. {126291}

  • Tool:qnn-accuracy-debugger: Enhanced the qnn-accuracy-debugger tool to provide more meaningful metrics for intermediate tensor cosine similarity. {126437}

  • Tool:qnn-net-run: Resolved an issue in accuracy evaluator runs where the error “‘Namespace’ object has no attribute ‘preserve_graph_output_order’” was encountered. {132180}

  • Tool:qnn-onnx-converter: Aligned the ONNX Resize Op translator’s behavior with ONNX definitions. {123092}

  • Tool:snpe-architecture-checker: Fixed an issue where snpe-architecture-checker would fail due to an uninitialized variable. {126778}

  • Tool:snpe-stress-net-run: Fixed a memory leak issue when loading QNN models. {128498}