torch_to_nnef.nemo_tract

Support for NVIDIA NeMo models export to NNEF (with TractNNEF focus).

Provides utilities to export NeMo models, particularly ASR models, to the NNEF format using TractNNEF. Includes functions to handle model subnets, dynamic axes, and custom extensions required for the export process.

DecoderWithoutTargetLength

DecoderWithoutTargetLength(decoder: torch.nn.Module, *, nemo_asr: InjectedNemoModule = INJECTED)

Bases: Module

Wraps the decoder or joint+decoder for export.

This removes the `target_length` parameter, which is not needed during inference.

Enabled classes:

- nemo.collections.asr.modules.rnnt.RNNTDecoderJoint
- nemo.collections.asr.modules.rnnt.RNNTDecoder

The forward pass is altered to automatically add the `target_length` parameter, derived from the batch size of the input tensors, as an array of shape `(batch_size, 1)` full of ones, and to then remove it from the output (where it is the 2nd element). This is only applied to the enabled classes.

At export time this should lead to complete removal of the unused `target_length`.
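The wrapping behavior described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the class names, the toy decoder, and the `int64` dtype for `target_length` are assumptions.

```python
import torch

class DecoderNoTargetLengthSketch(torch.nn.Module):
    """Hypothetical sketch of the wrapper described above."""

    def __init__(self, decoder: torch.nn.Module):
        super().__init__()
        self.decoder = decoder

    def forward(self, targets: torch.Tensor):
        batch_size = targets.shape[0]
        # auto-added parameter: (batch_size, 1) array full of ones
        target_length = torch.ones(batch_size, 1, dtype=torch.int64)
        outputs = self.decoder(targets, target_length)
        # drop the 2nd element (the echoed target_length) from the output
        return outputs[0]

class ToyDecoder(torch.nn.Module):
    """Stand-in for an RNNT decoder that echoes target_length as 2nd output."""

    def forward(self, targets, target_length):
        return targets.float() * 2.0, target_length

wrapped = DecoderNoTargetLengthSketch(ToyDecoder())
# caller no longer supplies target_length at all
out = wrapped(torch.zeros(4, 7, dtype=torch.int64))
```

When traced for export, the constant ones tensor and the dropped output leave `target_length` dead in the graph, which is what allows its removal.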

filter_original_input_example
filter_original_input_example(inputs: T.List[torch.Tensor]) -> T.List[torch.Tensor]

Filter out target_length from inputs.

WrapPreprocessorCast

WrapPreprocessorCast(preprocessor: torch.nn.Module, dtype: torch.dtype)

Bases: Module

Wraps the preprocessor to add a cast to float32 at the output.
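A minimal sketch of such a cast wrapper, assuming the preprocessor returns one tensor or a tuple of tensors (the class and toy preprocessor names here are illustrative, not the library's implementation):

```python
import torch

class PreprocessorCastSketch(torch.nn.Module):
    """Illustrative wrapper: run the preprocessor, then cast tensor outputs."""

    def __init__(self, preprocessor: torch.nn.Module,
                 dtype: torch.dtype = torch.float32):
        super().__init__()
        self.preprocessor = preprocessor
        self.dtype = dtype

    def forward(self, *args, **kwargs):
        out = self.preprocessor(*args, **kwargs)
        if isinstance(out, tuple):
            # cast only the tensor members, leave anything else untouched
            return tuple(
                t.to(self.dtype) if torch.is_tensor(t) else t for t in out
            )
        return out.to(self.dtype)

class ToyPreprocessor(torch.nn.Module):
    """Stand-in preprocessor emitting half-precision features."""

    def forward(self, x):
        return x.to(torch.float16) * 0.5

casted = PreprocessorCastSketch(ToyPreprocessor(), torch.float32)
features = casted(torch.randn(2, 80))
```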

build_custom_subnet_tract_properties

build_custom_subnet_tract_properties(subnet_name, subnet, *, nemo: InjectedNemoModule = INJECTED)

Build custom tract properties for a NeMo subnet.

build_dynamic_axes

build_dynamic_axes(subnet, nemo_dynamic_axes)

Build dynamic axes mapping and custom extensions for a NeMo subnet.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `subnet` | | NeMo subnet module | *required* |
| `nemo_dynamic_axes` | | dynamic axes info from NeMo export | *required* |

Returns:

| Name | Description |
| --- | --- |
| `dynamic_axes` | dynamic axes mapping for torch_to_nnef |
| `custom_extensions` | custom extensions for torch_to_nnef |

Note

This code will not scale well and should be refactored when more NeMo models are supported.
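The returned mapping follows the usual torch export convention of `{tensor_name: {axis_index: axis_name}}`. The concrete tensor names, axis names, and extension strings below are illustrative assumptions, not the actual NeMo subnet I/O names:

```python
# Illustrative shape of the two returned values (all names are assumptions):
dynamic_axes = {
    "audio_signal": {0: "batch", 2: "time"},  # batch and time axes are dynamic
    "length": {0: "batch"},
}
# custom extensions carry extra directives for the dynamic dims in tract NNEF
custom_extensions = [
    "tract_assert batch >= 1",
    "tract_assert time >= 1",
]
```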

decoder_fix_input_example_batch_size

decoder_fix_input_example_batch_size(input_example: T.Tuple[torch.Tensor, ...], batch_size: int) -> T.List[torch.Tensor]

Fix the batch size of the input example for decoder models.

The input example produced by the NeMo decoder's batch-size option is wrong; this function adjusts the input example to have the specified batch size.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input_example` | `Tuple[Tensor, ...]` | The original input example tuple `(input_ids, ...)`. | *required* |
| `batch_size` | `int` | The desired batch size. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `List[Tensor]` | The adjusted input example list with the specified batch size. |
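One way such an adjustment could work is to slice or tile the leading (batch) dimension of each tensor. This is a hypothetical re-implementation for illustration only, not the library's exact logic:

```python
import torch

def fix_batch_size_sketch(input_example, batch_size):
    """Slice or tile each tensor's leading (batch) dim to the requested size."""
    fixed = []
    for t in input_example:
        if t.shape[0] >= batch_size:
            # too many rows: slice down
            fixed.append(t[:batch_size])
        else:
            # too few rows: tile up, then trim the excess
            reps = [1] * t.dim()
            reps[0] = -(-batch_size // t.shape[0])  # ceil division
            fixed.append(t.repeat(*reps)[:batch_size])
    return fixed

example = (torch.zeros(1, 10, dtype=torch.int64), torch.full((1,), 10))
fixed = fix_batch_size_sketch(example, 4)
```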

export_nemo_asr_model

export_nemo_asr_model(asr_model, inference_target, export_dir: Path, compress_registry: str, compress_method: T.Optional[str] = None, skip_preprocessor: bool = False, split_joint_decoder: bool = False, extra_cfg: T.Optional[T.Dict[str, T.Any]] = None, float_dtype: T.Optional[torch.dtype] = None, remove_unused_inputs: bool = True, dump_checked_io: bool = False, *, omegaconf: InjectedOmegaConfModule = INJECTED, **kwargs)

Export a generic NeMo ASR model to NNEF format using TractNNEF.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `asr_model` | | The NeMo ASR model to export. | *required* |
| `inference_target` | | The inference target configuration for export. | *required* |
| `export_dir` | `Path` | Directory where the exported NNEF files will be saved. | *required* |
| `skip_preprocessor` | `bool` | If True, skip exporting the preprocessor subnet. | `False` |
| `split_joint_decoder` | `bool` | Whether to split the joint & decoder subnets at export. | `False` |
| `compress_registry` | `str` | Compression registry for the exported NNEF subnets. | *required* |
| `compress_method` | `Optional[str]` | Compression method for the exported NNEF subnets. If None, no compression is applied. | `None` |
| `extra_cfg` | `Optional[Dict[str, Any]]` | Additional configuration to save alongside the model. | `None` |
| `float_dtype` | `Optional[dtype]` | Optional float dtype to use for export. | `None` |
| `remove_unused_inputs` | `bool` | Whether to remove unused inputs from the exported model. This happens for decoder subnets that do not use `target_length`. | `True` |
| `dump_checked_io` | `bool` | Whether to dump checked input/output examples. | `False` |
| `omegaconf` | `InjectedOmegaConfModule` | Injected OmegaConf module. | `INJECTED` |
| `kwargs` | | Additional keyword arguments passed to the export function. | `{}` |
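A hedged call sketch using the argument names from the signature above. The model, inference target, and the `"default"` registry value are placeholders you would obtain from NeMo and torch_to_nnef respectively:

```python
from pathlib import Path

# Placeholder argument values; the registry name is an assumption.
export_kwargs = dict(
    export_dir=Path("./exported_asr_model"),
    compress_registry="default",
    compress_method=None,       # no compression
    skip_preprocessor=False,    # also export the preprocessor subnet
    split_joint_decoder=False,  # keep joint+decoder together
    remove_unused_inputs=True,  # strip unused target_length inputs
    dump_checked_io=False,
)
# export_nemo_asr_model(asr_model, inference_target, **export_kwargs)
```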

exportable_nemo_net

exportable_nemo_net(output_name, model, input_example, use_dynamo=False, batch_size: int = 1, float_dtype: T.Optional[torch.dtype] = None, *, nemo: InjectedNemoModule = INJECTED, pytorch_lightning: InjectedLightningModule = INJECTED)

Context manager that follows the export flow of NeMo models.

see: nemo.core.classes.Exportable._export

iter_export_params_for_generic_nemo_asr_model

iter_export_params_for_generic_nemo_asr_model(asr_model, inference_target, skip_preprocessor: bool = False, split_joint_decoder: bool = False, remove_unused_inputs: bool = True, float_dtype: T.Optional[torch.dtype] = None) -> T.Iterator[ExportParameters]

Iterator over export parameters for a generic NeMo ASR model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `asr_model` | | The NeMo ASR model to export. | *required* |
| `inference_target` | | The target inference type. | *required* |
| `skip_preprocessor` | `bool` | Whether to skip exporting the preprocessor subnet. | `False` |
| `split_joint_decoder` | `bool` | Whether to split the joint and decoder subnets at export. | `False` |
| `remove_unused_inputs` | `bool` | Whether to remove unused inputs from the exported model. | `True` |
| `float_dtype` | `Optional[dtype]` | Optional float dtype to use for export. | `None` |

Yields:

| Type | Description |
| --- | --- |
| `ExportParameters` | ExportParameters for each subnet of the ASR model, including the preprocessor. |

iter_nemo_model_subnets

iter_nemo_model_subnets(model, input_example=None, float_dtype: T.Optional[torch.dtype] = None, split_joint_decoder: bool = False, remove_unused_inputs: bool = True, apply_sequential_examples: bool = False, batch_size: int = 3)

Iterator over the exportable subnets of a NeMo model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | | NeMo model to iterate over. | *required* |
| `input_example` | | Optional input example to use for export. | `None` |
| `float_dtype` | `Optional[dtype]` | Optional float dtype to use for export. | `None` |
| `split_joint_decoder` | `bool` | Whether to split joint decoder subnets (if encountered). | `False` |
| `remove_unused_inputs` | `bool` | Whether to remove unused inputs from subnet exports. | `True` |
| `apply_sequential_examples` | `bool` | If True, use sequential input examples for each subnet. | `False` |
| `batch_size` | `int` | Batch size to use for dummy input generation. | `3` |

Yields:

| Name | Description |
| --- | --- |
| `subnet_name` | name of the subnet |
| `subnet` | the subnet module |
| `input_example` | input example for the subnet |
| `dynamic_axes` | dynamic axes info for the subnet |

see: nemo.core.classes.Exportable.export
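A hypothetical sketch of how the yielded 4-tuples would be consumed. The toy iterator below stands in for `iter_nemo_model_subnets(model)`; the subnet names and dynamic-axes entries are illustrative assumptions:

```python
def collect_subnet_names(subnet_iter):
    """Walk the (subnet_name, subnet, input_example, dynamic_axes) tuples."""
    names = []
    for subnet_name, subnet, input_example, dynamic_axes in subnet_iter:
        names.append(subnet_name)
    return names

# toy stand-in for iter_nemo_model_subnets(model)
toy_subnets = iter([
    ("preprocessor", None, None, {"audio_signal": {0: "batch", 2: "time"}}),
    ("encoder", None, None, {"audio_signal": {0: "batch", 2: "time"}}),
])
names = collect_subnet_names(toy_subnets)
```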

load_asr_model_from_nemo_slug

load_asr_model_from_nemo_slug(model_slug: str, *, nemo_asr: InjectedNemoModule = INJECTED, huggingface_hub: InjectedHuggingFaceHubModule = INJECTED)

Load a NeMo ASR model from a given model slug.

nemo_asr_hg_list

nemo_asr_hg_list(huggingface_hub: HuggingFaceHubModule)

Return the list of available NeMo ASR models from HuggingFace.

parser_cli

parser_cli()

Build the CLI parser for NeMo ASR model export to NNEF format.

setup_inference_target_from_cli_args

setup_inference_target_from_cli_args(args) -> TractNNEF

Set up the TractNNEF inference target from CLI arguments.

use_pytorch_sdpa

use_pytorch_sdpa(model: torch.nn.Module)

Modify the model to use PyTorch sdpa implementations where applicable.

This leverages attention modules configured in NeMo with the specific `use_pytorch_sdpa` flag.
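A minimal sketch of the mechanism, assuming it amounts to flipping the `use_pytorch_sdpa` flag on every submodule that exposes it (the flag name comes from the NeMo attention modules; the traversal below is an assumption, not the library's exact code):

```python
import torch

def use_pytorch_sdpa_sketch(model: torch.nn.Module) -> torch.nn.Module:
    """Enable the `use_pytorch_sdpa` flag on every submodule exposing it."""
    for module in model.modules():
        if hasattr(module, "use_pytorch_sdpa"):
            module.use_pytorch_sdpa = True
    return model

class ToyAttention(torch.nn.Module):
    """Stand-in for a NeMo attention module carrying the flag."""

    def __init__(self):
        super().__init__()
        self.use_pytorch_sdpa = False

toy = torch.nn.Sequential(ToyAttention(), torch.nn.Linear(4, 4))
use_pytorch_sdpa_sketch(toy)
```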