
torch_to_nnef.inference_target

Targeted inference engine.

We focus our effort mainly on best supporting the Sonos 'tract' inference engine.

A stricter Khronos NNEF specification mode also exists, but it is less extensively tested.
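
For illustration, a minimal sketch of building a target and passing it to an export call; the top-level entry point export_model_to_nnef, its argument names, and the tract version string are assumptions not documented on this page, while the TractNNEF constructor arguments come from the reference below.

import torch
from torch import nn

from torch_to_nnef import export_model_to_nnef  # assumed entry point
from torch_to_nnef.inference_target import TractNNEF

model = nn.Linear(8, 4).eval()
args = [torch.randn(2, 8)]  # example input used for tracing and the IO check

# target the tract inference engine (version string is illustrative)
inference_target = TractNNEF("0.21.13", check_io=True)

export_model_to_nnef(  # assumed argument names
    model=model,
    args=args,
    file_path_export="linear.nnef.tgz",
    inference_target=inference_target,
)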

InferenceTarget

InferenceTarget(version: T.Union[SemanticVersion, str], check_io: bool = False)

Base abstract class to implement a new inference engine target.

Init InferenceTarget.

Each inference engine is expected to provide at least a version and a way to check outputs given an input.

has_dynamic_axes property
has_dynamic_axes: bool

Define whether the user requested dynamic axes in the NNEF graph.

Some inference engines may not support it, hence False by default.

post_export
post_export(model: nn.Module, nnef_graph: NGraph, args: T.List[T.Any], exported_filepath: Path, debug_bundle_path: T.Optional[Path] = None)

Called after the NNEF model asset is generated.

This is typically where check_io is effectively applied.

post_trace
post_trace(nnef_graph: NGraph, active_custom_extensions: T.List[str])

Called just after the PyTorch graph is parsed.

pre_trace
pre_trace(model: nn.Module, input_names: T.Optional[T.List[str]], output_names: T.Optional[T.List[str]])

Called just before the PyTorch graph is traced (after the auto wrapper is applied).

specific_fragments
specific_fragments(model: nn.Module) -> T.Dict[str, str]

Optional custom fragments to pass.
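
To add a new engine, subclass InferenceTarget and override the hooks above. Below is a minimal sketch: the engine name and the check performed in post_export are hypothetical, and the assumption that the base init stores check_io as an instance attribute is not confirmed by this page.

import typing as T
from pathlib import Path

from torch_to_nnef.inference_target import InferenceTarget


class MyEngineNNEF(InferenceTarget):
    # hypothetical target for a custom NNEF runtime

    @property
    def has_dynamic_axes(self) -> bool:
        # this engine does not support dynamic axes
        return False

    def pre_trace(self, model, input_names, output_names):
        # called just before the PyTorch graph is traced (after auto wrapper):
        # validate engine-specific naming constraints here
        pass

    def post_trace(self, nnef_graph, active_custom_extensions):
        # called just after the PyTorch graph is parsed:
        # annotate or adjust the NNEF graph if needed
        pass

    def specific_fragments(self, model) -> T.Dict[str, str]:
        # optional custom NNEF fragments, keyed by name
        return {}

    def post_export(
        self,
        model,
        nnef_graph,
        args: T.List[T.Any],
        exported_filepath: Path,
        debug_bundle_path: T.Optional[Path] = None,
    ):
        # called after the NNEF asset is written; typically where check_io
        # is applied (assuming the base init stores it as self.check_io)
        if getattr(self, "check_io", False):
            ...  # run exported_filepath in the engine and compare with model(*args)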

KhronosNNEF

KhronosNNEF(version: T.Union[SemanticVersion, str], check_io: bool = True)

Bases: InferenceTarget

Khronos Specification compliant NNEF asset build.

If check_io=True, we evaluate the export against the NNEF-Tools NNEF-to-PyTorch converter and assert that the original and the reloaded PyTorch model produce the same outputs.

post_export
post_export(model: nn.Module, nnef_graph: NGraph, args: T.List[T.Any], exported_filepath: Path, debug_bundle_path: T.Optional[Path] = None)

Check io via the Torch interpreter of NNEF-Tools.
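
For illustration, a sketch of a spec-compliant target with IO checking enabled (the NNEF specification version string is illustrative):

from torch_to_nnef.inference_target import KhronosNNEF

# with check_io=True, the exported asset is reloaded through the NNEF-Tools
# NNEF-to-PyTorch converter and compared against the original model outputs
khronos_target = KhronosNNEF("1.0.0", check_io=True)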

TractNNEF

TractNNEF(version: T.Union[str, SemanticVersion], feature_flags: T.Optional[T.Set[TractFeatureFlag]] = None, check_io: bool = True, dynamic_axes: T.Optional[T.Dict[str, T.Dict[int, str]]] = None, specific_tract_binary_path: T.Optional[Path] = None, check_io_tolerance: TractCheckTolerance = TractCheckTolerance.APPROXIMATE, specific_properties: T.Optional[T.Dict[str, str]] = None, dump_identity_properties: bool = True, force_attention_inner_in_f32: bool = False, force_linear_accumulation_in_f32: bool = False, force_norm_in_f32: bool = False, reify_sdpa_operator: bool = False, upsample_with_debox: bool = False)

Bases: InferenceTarget

Tract NNEF inference target.

Init.

Parameters:

version (Union[str, SemanticVersion]), required:
tract version targeted for the export.

feature_flags (Optional[Set[TractFeatureFlag]]), default None:
set of optional feature flags from tract to enable (for example complex numbers).

check_io (bool), default True:
check that, given the provided inputs, the tract CLI and the original PyTorch model produce similar outputs.

dynamic_axes (Optional[Dict[str, Dict[int, str]]]), default None:
optional specification of dynamic dimensions (see the sketch after this parameter list). By default the exported model has the shapes of all input and output tensors set to exactly match those given in args. To declare tensor axes as dynamic (i.e. known only at runtime), set dynamic_axes to a dict with the following schema:
KEY (str): an input or output name; each name must also be provided in input_names or output_names.
VALUE (dict or list): if a dict, keys are axis indices and values are axis names; if a list, each element is an axis index.

specific_tract_binary_path (Optional[Path]), default None:
filepath of the tract CLI, for a custom non-released version of tract (for testing purposes).

check_io_tolerance (TractCheckTolerance), default APPROXIMATE:
tolerance level for differences between the original output values and those generated by tract (levels are defined by tract).

specific_properties (Optional[Dict[str, str]]), default None:
custom tract_properties to add inside the NNEF asset (parsed by tract as metadata).

dump_identity_properties (bool), default True:
add tract_properties about the user identity (host, username, OS, ...), helpful for debugging.

force_attention_inner_in_f32 (bool), default False:
control whether attention is forced to run in f32 internally (even if all inputs are f16); useful for numerically unstable networks like Qwen2.5.

force_linear_accumulation_in_f32 (bool), default False:
useful for f16 models to ensure that the output of f16 x f16 matmuls is accumulated in f32.

force_norm_in_f32 (bool), default False:
ensure that all normalization layers run in f32, whatever the original PyTorch modeling.

reify_sdpa_operator (bool), default False:
enable the conversion of scaled_dot_product_attention to a tract operator (instead of an NNEF fragment). Experimental feature.

upsample_with_debox (bool), default False:
use the debox upsample operator instead of deconvolution, which should be faster (if the tract version supports it). Experimental feature.
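
For illustration, a sketch combining several of the parameters above (the tract version, tensor names, axis names, and property value are illustrative):

from torch_to_nnef.inference_target import TractNNEF

tract_target = TractNNEF(
    "0.21.13",  # tract version targeted for export (illustrative)
    check_io=True,
    # mark batch and sequence axes of the named IO tensors as dynamic;
    # "input_ids" and "logits" must also appear in input_names/output_names
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq_len"},
        "logits": {0: "batch", 1: "seq_len"},
    },
    # extra tract_properties embedded as metadata in the NNEF asset
    specific_properties={"model_family": "demo"},
    dump_identity_properties=True,
)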
post_export
post_export(model: nn.Module, nnef_graph: NGraph, args: T.List[T.Any], exported_filepath: Path, debug_bundle_path: T.Optional[Path] = None)

Perform the IO check and build a debug bundle if it fails.

post_trace
post_trace(nnef_graph, active_custom_extensions)

Add dynamic axes to the NNEF graph.

pre_trace
pre_trace(model: nn.Module, input_names: T.Optional[T.List[str]], output_names: T.Optional[T.List[str]])

Check that dynamic_axes are correctly formatted.

specific_fragments
specific_fragments(model: nn.Module) -> T.Dict[str, str]

Optional custom fragments to pass.