Struct Qnn_BwBlockwiseExpansionMapped_t

Nested Relationships

struct Qnn_BwBlockwiseExpansionMapped_t

A struct to express bitwidth (bw) block-wise mapped expansion quantization parameters.

Note

This quantization encoding must not be used with dynamically shaped tensors.

Public Members

uint32_t bitwidth: Weight bitwidth. Must be <= the number of bits specified by the data type of the tensor.

Qnn_QuantizationEncodingMapping_t mapping: Specifies the mapping from low-bitwidth values to quantized values. For example, for a custom symmetric encoding with bitwidth=2 and mapping=QNN_QUANTIZATION_ENCODING_MAPPING_LINEAR_SYMMETRIC_EXCLUDE_ZERO, the signed values {-2, -1, 0, 1} map to the quantized values {-1.5, -0.5, 0.5, 1.5}, such that dequantized_values = scale * {-1.5, -0.5, 0.5, 1.5}. Backends are free to manage the integer representation at execution time. For the above example, if 4-bit values are used at execution time, the backend may use the mapping {-2, -1, 0, 1} -> {-3, -1, 1, 3} and adjust the scale to scale/2.
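The arithmetic in the bitwidth=2 example can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names exclude_zero_level, exclude_zero_int_level and dequantize_exclude_zero are hypothetical and not part of the QNN API, the half-step shift is inferred from the example values above, and any offset carried in scaleOffsets is ignored.

```c
#include <stdint.h>

/* Hypothetical helper (not a QNN function): stored signed value to
   quantized level, e.g. {-2, -1, 0, 1} -> {-1.5, -0.5, 0.5, 1.5}. */
static float exclude_zero_level(int32_t stored) {
    return (float)stored + 0.5f;
}

/* Hypothetical helper: the same level in an all-integer form,
   e.g. {-2, -1, 0, 1} -> {-3, -1, 1, 3}. Pair it with scale / 2. */
static int32_t exclude_zero_int_level(int32_t stored) {
    return 2 * stored + 1;
}

/* Hypothetical helper: dequantized value = scale * quantized level.
   Assumption: no offset is applied. */
static float dequantize_exclude_zero(int32_t stored, float scale) {
    return scale * exclude_zero_level(stored);
}
```

Both forms give the same dequantized value, since scale * (v + 0.5) equals (scale / 2) * (2v + 1) for every stored value v.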

Qnn_ScaleOffset_t *scaleOffsets: Array of scale-offset pairs of size axisSize.

uint32_t blockScaleBitwidth: Bitwidth of the per-block scale factor (e.g. 12 bits for a 4-bit to 16-bit expansion).

Qnn_BlockwiseExpansionBlockScaleStorageType_t blockScaleStorageType: Storage size of the block scalings. It must be able to store values of at least blockScaleBitwidth bits.
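The constraint between blockScaleBitwidth and the storage size can be illustrated with a small C check. This is a sketch, not SDK code: min_block_scale_storage_bits is a hypothetical helper, and it assumes that 8-bit and 16-bit storage are the only options, matching the two array members of the union in this struct.

```c
#include <stdint.h>

/* Hypothetical helper (not a QNN function): smallest storage width,
   in bits, that can hold a block scale of the given bitwidth.
   Returns 0 when the bitwidth is 0 or does not fit in 16 bits. */
static uint32_t min_block_scale_storage_bits(uint32_t blockScaleBitwidth) {
    if (blockScaleBitwidth == 0 || blockScaleBitwidth > 16) {
        return 0;   /* assumption: only 8- and 16-bit storage exist */
    }
    return (blockScaleBitwidth <= 8) ? 8 : 16;
}
```

For the 12-bit example above, the helper returns 16, so the 8-bit storage option would be too small.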

union unnamed

Public Members

uint8_t *blocksScale8: A contiguous array of block scalings of size axisSize*numBlocksPerAxis. The array is laid out such that an element can be accessed via blocksScale8[axisIter*numBlocksPerAxis+blockIter]. Used when blockScaleStorageType is QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_8.

uint16_t *blocksScale16: A contiguous array of block scalings of size axisSize*numBlocksPerAxis. The array is laid out such that an element can be accessed via blocksScale16[axisIter*numBlocksPerAxis+blockIter]. Used when blockScaleStorageType is QNN_BLOCKWISE_EXPANSION_BITWIDTH_SCALE_STORAGE_16.
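The row-major layout described for both arrays can be sketched in C as follows. The helpers block_scale_index and read_block_scale are hypothetical and not part of the QNN API. The sketch takes axisIter, blockIter and numBlocksPerAxis as plain arguments, and uses a simple use16 flag as a stand-in for inspecting blockScaleStorageType.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a QNN function): flat index of the block
   scale for a given position along the axis and a given block. */
static size_t block_scale_index(uint32_t axisIter, uint32_t blockIter,
                                uint32_t numBlocksPerAxis) {
    return (size_t)axisIter * numBlocksPerAxis + blockIter;
}

/* Hypothetical helper: read one block scale from whichever array is
   in use. use16 != 0 selects blocksScale16, otherwise blocksScale8.
   The caller must pass a valid pointer for the selected array and
   keep axisIter < axisSize and blockIter < numBlocksPerAxis. */
static uint32_t read_block_scale(const uint8_t *blocksScale8,
                                 const uint16_t *blocksScale16,
                                 int use16,
                                 uint32_t axisIter, uint32_t blockIter,
                                 uint32_t numBlocksPerAxis) {
    size_t idx = block_scale_index(axisIter, blockIter, numBlocksPerAxis);
    return use16 ? (uint32_t)blocksScale16[idx]
                 : (uint32_t)blocksScale8[idx];
}
```

With axisSize = 2 and numBlocksPerAxis = 3, for instance, the element for axisIter = 1 and blockIter = 2 sits at flat index 5, the last entry of the six-element array.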