torch_to_nnef.tensor.quant

Advanced QTensor (<= 8 bits) supporting complex quantization schemes not native to torch.

QScalePerGroupF16

QScalePerGroupF16(group_size: int, scale: torch.Tensor, n_bits: int)

Bases: QScheme

Per-group quantization scheme with an f16 scale only.

Aligned with tract, which uses negative scales.
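A minimal numeric sketch of what a per-group f16 scale means; the grouping, scaling, and rounding conventions below are illustrative assumptions, not tract's exact layout:

```python
import torch

# Illustrative only (assumed min-max style scaling, not tract's exact layout):
# flatten into groups of `group_size` values, one float16 scale per group.
fp = torch.randn(4, 32)
group_size, n_bits = 16, 4
qmax = 2 ** (n_bits - 1)  # 8 for 4-bit signed values
groups = fp.reshape(-1, group_size)
scale = (groups.abs().amax(dim=1) / qmax).clamp(min=1e-6).to(torch.float16)
q = torch.round(groups / scale.to(groups.dtype).unsqueeze(1)).clamp(-qmax, qmax - 1)
dequant = (q * scale.to(groups.dtype).unsqueeze(1)).reshape(fp.shape)
```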

QScheme

Bases: ABC

to_device
to_device(new_device)

Specific device handling.

Each QScheme may implement support for specific devices (GPU, ...) for its internal quant/dequant, allowing faster computation.

QTensor

QTensor(fp_tensor: torch.Tensor, qscheme: QScheme, dequant_to_dtype=torch.float32, u8_compressors: T.Optional[T.List[U8Compressor]] = None)

Bases: OpaqueTensor

Common interface for all compressed storage.
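A hedged construction sketch; the scale layout and placeholder values are assumptions, only the signatures documented on this page are used:

```python
import torch

from torch_to_nnef.tensor.quant import QScalePerGroupF16, QTensor

weight = torch.randn(256, 512)
group_size = 32
# Assumed layout: one f16 scale per group of `group_size` values (placeholders).
scale = torch.full((weight.numel() // group_size,), 0.01, dtype=torch.float16)
qscheme = QScalePerGroupF16(group_size=group_size, scale=scale, n_bits=4)

qt = QTensor(weight, qscheme=qscheme, dequant_to_dtype=torch.float32)
qt.to_device(torch.device("cpu"))  # QScheme-specific device handling
```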

to_device
to_device(new_device)

Specific device handling.

write_in_file
write_in_file(dirpath: T.Union[str, Path], label: str)

Called at NNEF write time.

Each specific inference-engine format should implement its preferred file dump.
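Continuing the sketch above, a hypothetical dump at NNEF write time (directory and label are placeholders):

```python
from pathlib import Path

out_dir = Path("./nnef_export")
out_dir.mkdir(exist_ok=True)
# Serializes the compressed tensor in the layout its inference engine expects.
qt.write_in_file(out_dir, label="decoder_layer0_weight")
```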

QTensorTract

QTensorTract(fp_tensor: torch.Tensor, qscheme: QScheme, dequant_to_dtype=torch.float32, u8_compressors: T.Optional[T.List[U8Compressor]] = None)

Bases: QTensor

All QTensorTract implementations.

QTensorTractScaleOnly

QTensorTractScaleOnly(*args, specific_machine: T.Optional[str] = None, **kwargs)

Bases: QTensorTract

Serializes to the tract data format Q4_0.

decompress
decompress()

Tract dequantization depends on hardware.

Typically, dequantization happens with f16 ops on ARM and f32 ops (scale directly cast) on other targets, so we override the function to stay consistent with tract.
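A purely numeric illustration of why the two paths can diverge (not tract's code): multiplying in f16 rounds each product to f16, while casting the scale to f32 first keeps the products in f32.

```python
import torch

q = torch.tensor([7, -8, 3], dtype=torch.int8)  # small quantized values
scale = torch.tensor(0.1234, dtype=torch.float16)

# ARM-like path: dequantize with f16 ops, widen afterwards.
arm_like = (q.to(torch.float16) * scale).to(torch.float32)

# Other targets: cast the scale to f32 and multiply in f32.
other = q.to(torch.float32) * scale.to(torch.float32)

print(arm_like - other)  # small but nonzero rounding differences
```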

U8Compressor

Abstract class to add u8 compression methods.

This can be used to:

- bit-pack elements below 8 bits
- apply a classic compression algorithm

Warning: the .shape of the compressed u8_tensor must be the same as its .shape once decompressed.

compress abstractmethod
compress(u8_tensor) -> torch.Tensor

Compress a u8 tensor (into u8).

Parameters:

    u8_tensor: tensor to be compressed, with dtype torch.uint8 (required)

Returns:

    compressed tensor with dtype torch.uint8

decompress abstractmethod
decompress(u8_tensor) -> torch.Tensor

Decompress a u8 torch tensor (into u8).

Parameters:

    u8_tensor: compressed tensor with dtype torch.uint8 (required)

Returns:

    decompressed tensor with dtype torch.uint8

to_device
to_device(new_device)

Specific device handling.

Each compressor may implement support for specific devices (GPU, ...), allowing faster computation.
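A minimal, hypothetical subclass sketch: it XOR-deltas consecutive bytes, a shape-preserving transform that satisfies the warning above while often making the stream easier to compress downstream. Only the abstract methods documented here are implemented.

```python
import torch

from torch_to_nnef.tensor.quant import U8Compressor

class XorDeltaU8Compressor(U8Compressor):
    """XOR each byte with its predecessor; shape is identical both ways."""

    def compress(self, u8_tensor) -> torch.Tensor:
        flat = u8_tensor.flatten()
        out = flat.clone()
        out[1:] = flat[1:] ^ flat[:-1]
        return out.reshape(u8_tensor.shape)

    def decompress(self, u8_tensor) -> torch.Tensor:
        flat = u8_tensor.flatten().clone()
        # Prefix-XOR undoes the delta; a plain loop keeps the sketch simple.
        for i in range(1, flat.numel()):
            flat[i] ^= flat[i - 1]
        return flat.reshape(u8_tensor.shape)
```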

fp_to_tract_q4_0_with_min_max_calibration

fp_to_tract_q4_0_with_min_max_calibration(fp_tensor, percentile: float = 1.0) -> QTensorTractScaleOnly

Min-max calibration method to quantize a float tensor to the tract-supported Q4_0 format.
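A hedged usage sketch (shapes are illustrative; Q4_0 packs values in fixed-size groups, so dimensions are assumed to be compatible):

```python
import torch

from torch_to_nnef.tensor.quant import (
    fp_to_tract_q4_0_with_min_max_calibration,
)

weight = torch.randn(128, 256)
q4 = fp_to_tract_q4_0_with_min_max_calibration(weight, percentile=1.0)
approx = q4.decompress()  # hardware-consistent dequantized tensor
print((weight - approx).abs().max())
```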

qscale_per_group_f16_min_max_calibration

qscale_per_group_f16_min_max_calibration(fp_tensor, n_bits: int, group_size: int, percentile: float = 1.0) -> QScalePerGroupF16

Build a QScalePerGroupF16 scheme and calibrate it on the requested float tensor.

Returns:

    Tuple(QScalePerGroupF16 qscheme, torch.Tensor[uint8])
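A hedged usage sketch, following the Returns note above (scheme plus quantized uint8 tensor; note the signature annotation and the Returns note differ, the tuple form is assumed here):

```python
import torch

from torch_to_nnef.tensor.quant import (
    qscale_per_group_f16_min_max_calibration,
)

weight = torch.randn(64, 128)
qscheme, q_u8 = qscale_per_group_f16_min_max_calibration(
    weight, n_bits=4, group_size=32, percentile=1.0
)
```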