

torch_to_nnef.tensor.quant.base

QScalePerGroupF16

QScalePerGroupF16(group_size: int, scale: torch.Tensor, n_bits: int)

Bases: QScheme

Per-group quantization using only an f16 scale (no zero-point).

Aligned with Tract by using negative scales.
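
As a rough sketch of the scheme's dequantization semantics (an assumption based on the description above, not the library's code): each contiguous group of group_size quantized values shares a single f16 scale.

```python
import torch


def dequant_per_group(
    q: torch.Tensor, scale: torch.Tensor, group_size: int
) -> torch.Tensor:
    # Each contiguous group of `group_size` values shares one f16 scale
    # (assumed layout; the library's internal layout may differ).
    groups = q.flatten().reshape(-1, group_size).to(torch.float32)
    out = groups * scale.to(torch.float32).unsqueeze(1)
    return out.reshape(q.shape)
```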

QScheme

Bases: ABC

to_device
to_device(new_device)

Specific device handling.

Each QScheme may implement support for switching its internal quant/dequant buffers to a specific device (e.g. GPU), allowing faster computation.
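
A minimal sketch of what such an override can look like, assuming the scheme's only internal state is a scale tensor (hypothetical subclass, not from the library):

```python
import torch

from torch_to_nnef.tensor.quant.base import QScheme  # assumed import path


class MyQScheme(QScheme):
    """Hypothetical scheme whose only internal buffer is a scale tensor.

    Other abstract methods of QScheme are elided for brevity.
    """

    def __init__(self, scale: torch.Tensor):
        self.scale = scale

    def to_device(self, new_device):
        # Move the internal buffers used by quant/dequant so subsequent
        # computation happens on `new_device` (e.g. a GPU).
        self.scale = self.scale.to(new_device)
```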

QTensor

QTensor(fp_tensor: torch.Tensor, qscheme: QScheme, dequant_to_dtype=torch.float32, u8_compressors: T.Optional[T.List[U8Compressor]] = None)

Bases: OpaqueTensor

Common interface for all compressed tensor storage.
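
Based only on the signatures shown on this page, building a QTensor might look like the sketch below; the import path and the scale layout expected by QScalePerGroupF16 are assumptions:

```python
import torch

from torch_to_nnef.tensor.quant.base import (  # assumed import path
    QScalePerGroupF16,
    QTensor,
)

weight = torch.randn(128, 64)
group_size = 32

# One f16 scale per contiguous group of 32 values; this layout is an
# assumption, not documented above.
scales = torch.full((weight.numel() // group_size,), 0.05, dtype=torch.float16)

qscheme = QScalePerGroupF16(group_size=group_size, scale=scales, n_bits=4)
qtensor = QTensor(weight, qscheme, dequant_to_dtype=torch.float32)
```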

to_device
to_device(new_device)

Specific device handling.

write_in_file
write_in_file(dirpath: T.Union[str, Path], label: str)

Called at NNEF write time.

Each inference-engine-specific format should implement its preferred file dump.
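
A hypothetical override, only to show the hook's role (the subclass name, file name, and payload below are all illustrative; a real implementation writes the binary layout its engine expects):

```python
import typing as T
from pathlib import Path

from torch_to_nnef.tensor.quant.base import QTensor  # assumed import path


class MyEngineQTensor(QTensor):
    """Hypothetical QTensor variant targeting a made-up engine format."""

    def write_in_file(self, dirpath: T.Union[str, Path], label: str):
        # `label` identifies the tensor inside the exported NNEF directory.
        path = Path(dirpath) / f"{label}.dat"
        # Placeholder payload: a real subclass would dump its quantized
        # storage here in the engine's expected binary layout.
        path.write_bytes(b"\x00")
```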

U8Compressor

Abstract class to add u8 compression methods.

This can be used to:

- bit-pack elements below 8 bits
- apply a classic compression algorithm

Warning: the .shape of the compressed u8_tensor must be the same as its .shape once decompressed.

compress abstractmethod
compress(u8_tensor) -> torch.Tensor

Compress a u8 tensor (into u8).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| u8_tensor | | tensor to be compressed with dtype torch.uint8 | required |

Return: compressed tensor with dtype torch.uint8

decompress abstractmethod
decompress(u8_tensor) -> torch.Tensor

Decompress a u8 torch tensor (into u8).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| u8_tensor | | compressed tensor with dtype torch.uint8 | required |

Return: decompressed tensor with dtype torch.uint8
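
To make the contract concrete, here is a hypothetical shape-preserving compressor using byte-wise delta coding; it is not part of torch_to_nnef and only illustrates that compress and decompress round-trip while keeping the same .shape and torch.uint8 dtype:

```python
import torch

from torch_to_nnef.tensor.quant.base import U8Compressor  # assumed import path


class DeltaU8Compressor(U8Compressor):
    """Hypothetical example: byte-wise delta coding (shape-preserving)."""

    def compress(self, u8_tensor) -> torch.Tensor:
        flat = u8_tensor.flatten()
        prev = torch.cat([flat.new_zeros(1), flat[:-1]])
        # uint8 subtraction wraps modulo 256, so deltas stay valid u8.
        return (flat - prev).reshape(u8_tensor.shape)

    def decompress(self, u8_tensor) -> torch.Tensor:
        flat = u8_tensor.flatten()
        # cumsum promotes to int64; wrap back modulo 256 into uint8.
        restored = flat.cumsum(0).remainder(256).to(torch.uint8)
        return restored.reshape(u8_tensor.shape)


# Round-trip check (assumes U8Compressor needs no constructor arguments):
x = torch.randint(0, 256, (4, 8), dtype=torch.uint8)
comp = DeltaU8Compressor()
assert torch.equal(comp.decompress(comp.compress(x)), x)
```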

to_device
to_device(new_device)

Specific device handling.

Each compressor may implement support for a specific device (e.g. GPU), allowing faster computation.

qscale_per_group_f16_min_max_calibration

qscale_per_group_f16_min_max_calibration(fp_tensor, n_bits: int, group_size: int, percentile: float = 1.0) -> QScalePerGroupF16

Build a QScalePerGroupF16 and calibrate the requested float tensor.

Return:

Tuple(QScalePerGroupF16 qscheme, torch.Tensor[uint8])
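
The min/max calibration itself reduces to a per-group scale computation; the sketch below shows the usual symmetric-signed variant (assumed semantics: the library's exact percentile handling, grouping, and the returned uint8 payload may differ):

```python
import torch


def minmax_group_scales(
    fp_tensor: torch.Tensor,
    n_bits: int,
    group_size: int,
    percentile: float = 1.0,
) -> torch.Tensor:
    assert fp_tensor.numel() % group_size == 0
    groups = fp_tensor.flatten().reshape(-1, group_size)
    # Per-group absolute maximum, optionally shrunk by a percentile factor.
    amax = groups.abs().amax(dim=1) * percentile
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 7 for 4-bit signed quantization
    return (amax / qmax).to(torch.float16)
```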