10. NeMo ASR support
Goals
By the end of this guide, you will know how to:
- Export a NeMo ASR model to NNEF using `t2n_export_nemo`
- Run WAV inference from a minimal Rust binary
- Run inference from Python using `tract`
- Evaluate the exported model using Word Error Rate (WER)
Prerequisites
- Basic Python knowledge
- Basic Rust knowledge
- Approximately 10 minutes to read this page
Overview
This page documents the end-to-end workflow for exporting an NVIDIA NeMo Automatic Speech Recognition (ASR) model to NNEF using torch-to-nnef, running inference with tract, and evaluating the exported model against standard ASR benchmarks.
Export a NeMo ASR model
The `t2n_export_nemo` command loads a pre-trained ASR model from the NeMo toolkit and exports it to the NNEF format.
If not already installed, install `torch_to_nnef` with the `nemo-tract` extra. This enables the NeMo-specific export command:
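A sketch of the install step, assuming the package is published under the name `torch_to_nnef` with the `nemo-tract` extra named above:

```shell
pip install "torch_to_nnef[nemo-tract]"
```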
```shell
# -e   : export directory
# --tract-specific-path : optional path to a tract binary
# -tt  : numerical tolerance for the NeMo vs tract output checks
t2n_export_nemo \
  -e ./dump_parakeet_v3_06B \
  --tract-specific-path $HOME/SONOS/src/tract/target/release/tract \
  -tt very
# Other options:
#   -s nvidia/parakeet-tdt-0.6b-v3      optional explicit model slug
#   -p ~/user/finetuned-parakeet.nemo   optional explicit path to a .nemo file
#   --compress-method min_max_q4_0_all  optional model compression
```
Since no `-s` argument is provided in this example, the command defaults to listing the known NeMo-compatible models on the Hugging Face Hub and NeMo registries (we have mostly tested parakeet and nemotron).
After the command completes, the export directory (e.g. ./dump_parakeet_v3_06B) will contain:
- The exported NNEF model files
- A model_config.json file describing the exported pipeline
- An export_config.json file with all export options used
- A .log file with export details
Additional export options are available; see the quick reference below. Note that some NeMo preprocessing components are not yet fully supported by tract. In such cases, options such as `--skip-preprocessor` can be used to exclude those stages from the export.
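For instance, combining flags from the quick reference below, a preprocessor-free export might look like this (the slug and directory are illustrative):

```shell
t2n_export_nemo \
  -e ./dump_parakeet_no_preproc \
  -s nvidia/parakeet-tdt-0.6b-v3 \
  --skip-preprocessor
```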
CLI flags quick reference
- `-e, --export-dir`: Output directory (must not pre-exist).
- `-s, --model-slug`: Explicit NeMo model slug; omit to choose interactively.
- `-p, --model-path`: Explicit local path to a `.nemo` file.
- `--tract-specific-version` / `--tract-specific-path`: Select the tract version or binary.
- `--tract-reify-sdpa`: Enable SDPA reification where supported by the selected tract.
- `-tt, --tract-check-io-tolerance`: IO check strictness (`exact`, `approximate`, `loose`, or `skip`).
- `--skip-preprocessor`: Export only the encoder/decoder/joint parts.
- `--split-joint-decoder`: Split `decoder` and `joint` into separate subnets.
- `--compress-registry` / `--compress-method`: Apply weight compression during export.
Run `t2n_export_nemo --help` for the full list of options.
Shape configuration (boundary remodeler)
See also: the dedicated remodeler tutorial for broader, provider-agnostic usage and API details.
In many cases you will want to control the symbolic shapes and boundary transforms used during export (e.g., set a stable BATCH symbol, collapse size-1 dims, bind a scalar to a dynamic size, or keep only a subset of outputs). You can manage this via a YAML shape config file passed to the CLI.
Generate a starting template aligned to your model with:
```shell
t2n_export_nemo \
  --inspect-signatures \
  --dump-shape-config ./shapes.yaml
# ... your usual flags (model slug/path, etc.)
```
The generated shapes.yaml uses a nested layout per subnet:
- `inputs`: mapping of input-name -> settings
- `outputs` (optional): mapping of output-name -> settings
- `renamed_symbols` (optional): `{ TARGET: [SOURCES...] }` aliasing of dynamic symbols
- `outputs_keep` (always present in the template): ordered list of output names to keep (default if omitted: keep all)
- `extensions` (optional): list of custom extension strings (e.g., `tract_assert` constraints for pulsification). For known pretrained models, these are auto-populated from a built-in registry.
Per-input settings under inputs:
- `original_shape`: list of dims (ints or strings)
- `collapse_dims` (optional): list of symbols to collapse at the boundary
- `bind_scalar_to_dim_size` (optional): dynamic source as `subnet.input.SYMBOL`
- `eval_symbols` (optional): `{ SYMBOL: int_value }` -- pin dynamic symbols to concrete sizes in test inputs during export (e.g., `{TARGETS__TIME: 1}` for single-step decoding)
Per-output settings under outputs:
- `collapse_dims` (optional): list of axis indices to squeeze from the output tensor (e.g., `[0]` to remove the batch axis)
Example (abbreviated):
```yaml
encoder:
  inputs:
    audio_signal:
      original_shape: [AUDIO_SIGNAL__BATCH, 128, AUDIO_SIGNAL__TIME]
      collapse_dims: [AUDIO_SIGNAL__BATCH]
    length:
      original_shape: [LENGTH__BATCH]
      collapse_dims: [LENGTH__BATCH]
      bind_scalar_to_dim_size: encoder.audio_signal.AUDIO_SIGNAL__TIME
  outputs:
    outputs:
      collapse_dims: [0]
decoder_joint:
  inputs:
    encoder_outputs:
      original_shape: [ENCODER_OUTPUTS__BATCH, 1024, ENCODER_OUTPUTS__TIME]
      collapse_dims: [ENCODER_OUTPUTS__BATCH, ENCODER_OUTPUTS__TIME]
decoder:
  renamed_symbols: { BATCH: [TARGETS__BATCH, STATES_0__BATCH, STATES_1__BATCH] }
  # Typical RNNT decoder outputs include: outputs, prednet_lengths, states_out
  # Keep only the ones you need (e.g., drop prednet_lengths)
  outputs_keep: [outputs, states_out]
  inputs:
    targets:
      original_shape: [TARGETS__BATCH, TARGETS__TIME]
      collapse_dims: [BATCH]
    states_0:
      original_shape: [2, STATES_0__BATCH, 640]
      collapse_dims: [BATCH]
    states_1:
      original_shape: [2, STATES_1__BATCH, 640]
      collapse_dims: [BATCH]
```
Decoder: dropping prednet_lengths while keeping IO aligned
When you exclude prednet_lengths from decoder outputs via outputs_keep,
also bind the target_length input to the TIME dimension of targets so it becomes
an internal scalar (and is no longer exposed as an external input):
```yaml
decoder:
  outputs_keep: [outputs, states_out]
  inputs:
    targets:
      original_shape: [TARGETS__BATCH, TARGETS__TIME]
      collapse_dims: []
    target_length:
      original_shape: [TARGET_LENGTH__BATCH]
      collapse_dims: []
      bind_scalar_to_dim_size: decoder.targets.TARGETS__TIME
    states_0:
      original_shape: [2, STATES_0__BATCH, 640]
      collapse_dims: [BATCH]
    states_1:
      original_shape: [2, STATES_1__BATCH, 640]
      collapse_dims: [BATCH]
```
This keeps the external input/output quantities consistent and makes the boundary contract explicit: `target_length = size(targets, TIME)`.
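As a plain-Python sketch of that contract (the function and names below are ours for illustration, not the exporter's code):

```python
def bind_target_length(targets_shape):
    """Boundary contract sketch: target_length is read off the TIME axis
    (axis 1 of [BATCH, TIME]) of `targets` instead of being an external input."""
    batch, time = targets_shape
    return time

# A targets tensor of shape [1, 7] yields an internal target_length of 7.
```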
Notes:
- Use namespaced symbols: batch axes appear as `INPUT__BATCH` per input.
- To expose a common tract-facing name (e.g., `BATCH`) across inputs, declare it via `renamed_symbols`.
- Aliases listed in `renamed_symbols` are accepted anywhere a symbol is referenced (collapse/bind).
- `renamed_symbols` targets cannot include themselves in their sources.
- `collapse_dims` (inputs) requires the symbol to be dynamic on that input at the selected stage.
- `collapse_dims` (outputs) takes axis indices (integers), not symbols. Only axes of size 1 are squeezed.
- `bind_scalar_to_dim_size` binds a dynamic size as an `int64` scalar.
- `outputs_keep` filters exported outputs; order follows the subnet's original `output_names`. The template always includes it so you can easily trim.
- When batch collapse is detected on inputs, the NeMo registry auto-populates `outputs.collapse_dims: [0]` for all outputs of that subnet. Explicit config takes precedence.
Boundary semantics
- Inputs that are Python tuples in the module API are flattened at the boundary (e.g., RNNT `states` → `states_0`, `states_1`).
- `collapse_dims` removes listed dynamic axes externally and reinserts them internally so inner modules see their expected rank.
- `bind_scalar_to_dim_size` removes the bound input from the external IO and injects `shape(source)[axis]` as a dynamic `int64` tensor.
- `renamed_symbols` only affects the tract-facing dynamic axes; inspector views remain namespaced by input (e.g., `TARGETS__BATCH`).
Quick commands
See also: the provider-agnostic remodeler tutorial for programmatic usage and richer inspection.
Inspect with config applied (human-rich):
```shell
t2n_export_nemo \
  --model-slug nvidia/parakeet-tdt-0.6b-v3 \
  --export-dir ./noop \
  --inspect-signatures \
  --inspect-stage final \
  --inspect-format human-rich \
  --shape-config shapes.yaml \
  --dry-run \
  --split-joint-decoder
```
Export with config:
```shell
t2n_export_nemo \
  --model-slug nvidia/parakeet-tdt-0.6b-v3 \
  --export-dir ./export_with_shapes \
  --shape-config shapes.yaml \
  --split-joint-decoder
```
Audio preprocessing requirements
All supported NeMo ASR models expect audio input with the following characteristics:
- 16 kHz sample rate
- Mono channel
- WAV format
Ensure that all input audio conforms to these requirements before running inference.
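A minimal stdlib check can catch non-conforming files before inference (this helper is ours, not part of the toolchain):

```python
import wave

def check_wav(path: str) -> None:
    """Raise if a WAV file is not 16 kHz mono, as the supported NeMo ASR
    models expect."""
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            raise ValueError(f"{path}: expected 16 kHz, got {w.getframerate()} Hz")
        if w.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono, got {w.getnchannels()} channels")
```

Resampling and downmixing themselves can be done beforehand with a tool such as ffmpeg or librosa.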
Note: the next sections are limited to RNNT and TDT models.
Due to limited time and resources, the following sections focus on RNNT and TDT models. Others are not guaranteed to work as-is, but contributions are welcome!
Example: Running a NeMo ASR model with tract
The code for this example lives in the repository's docs/examples/nemo_asr/ directory. The example uses a pre-trained ASR model from NVIDIA NeMo and shows how to perform inference using the exported NNEF artifacts.
Run the exported model in Rust
To run the exported NeMo ASR model from Rust, add the tract-nemo crate to your Cargo.toml:
```toml
[dependencies]
# Cargo locates the `tract-nemo` package by name inside the repository
# (it lives under docs/examples/nemo_asr/); git dependencies have no
# subdirectory key.
tract-nemo = { git = "https://github.com/sonos/torch-to-nnef.git", branch = "main" }
```
Rust inference example
```rust
use tract_nemo::nemo_asr::NemoAsrModel;

fn main() -> tract_nemo::TractResult<()> {
    // Load the exported NeMo ASR model
    let model_path = "./dump_parakeet_v3_06B";
    let mut asr_model = NemoAsrModel::load(model_path)?;

    let input_wavs = vec![
        // paths to input WAV files
    ];

    // Run inference
    let transcripts = asr_model.infer_from_wav_paths(&input_wavs)?;

    // Display results
    for (i, t) in transcripts.iter().enumerate() {
        println!("Transcription[{}]: '{}'", i, t.text);
        // Each transcript also contains detailed items:
        // - token
        // - logit
        // - emitted_at_encoder_timestep
        // - emitted_at_encoder_timestep_iteration
    }
    Ok(())
}
```
Run the exported model in Python
The exported NeMo ASR model can also be executed from Python using the tract-nemo Python bindings.
First, install the Python package:
```shell
pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"
```
Python inference example
```python
import nemo_asr_tract

def main():
    # Load the exported NeMo ASR model
    model_path = "./dump_parakeet_v3_06B"
    asr_model = nemo_asr_tract.nemo_asr.NemoAsrModel.load(model_path)

    input_wavs = [
        "path/to/your/input1.wav",
        "path/to/your/input2.wav",
    ]

    # Run inference
    transcripts = asr_model.infer_from_wav_paths(input_wavs)

    # Display results
    for i, t in enumerate(transcripts):
        print(f"Transcription[{i}]: '{t.text}'")
        print(f"Items[{i}]: {t.items}")

if __name__ == "__main__":
    main()
```
Evaluation
If not already installed, set up the same Python package as the one used for running the tract model, this time with the eval extra for evaluation:
```shell
pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract[eval]&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"
```
The Python tooling also supports evaluation of the exported model using standard ASR benchmarks and WER metrics.
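For reference, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. A minimal implementation of the standard definition (ours, not the package's scorer):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    r, h = reference.split(), hypothesis.split()
    # Levenshtein distance over words, rolling single-row dynamic programming
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + cost)
    return d[len(h)] / max(len(r), 1)

# One substitution out of three reference words -> WER of 1/3
print(wer("the cat sat", "the hat sat"))
```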
Run an ASR Open Leaderboard evaluation
This command runs an evaluation following the same protocol as the Hugging Face ASR Open Leaderboard.
It produces, for each dataset:
- `.jsonl` manifest files containing predictions and references
- Per-dataset WER scores
- Aggregated summary metrics
Use --help to inspect all available evaluation options.
Display sample-level differences between runners
This command displays side-by-side comparisons (by default, NeMo vs tract) for a subset of samples, sorted by absolute WER difference.
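The ranking logic amounts to something like the following sketch (the dict keys are our assumption, not the tool's actual manifest schema):

```python
def rank_disagreements(samples):
    """Sort sample-level comparisons by absolute WER difference, largest first."""
    return sorted(
        samples,
        key=lambda s: abs(s["wer_nemo"] - s["wer_tract"]),
        reverse=True,
    )

# Samples with the biggest NeMo-vs-tract gap come first.
```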
Recompute scores and display a summary table
This recomputes WER scores from the generated manifest files and prints a summary table. This is useful when experimenting with alternative scoring logic.
Custom runner support
For more advanced use cases, the evaluation framework supports custom runners and datasets.
To define a new runner or model, inherit from the base class and implement the required methods:
```python
from typing import List

import torch

from nemo_asr_tract.eval.runner import AsrRunner
# NOTE: `EvalConfig` and `clean_name` are assumed to be importable from the
# eval package; exact module paths may differ in your installed version.


class MyCustomRunner(AsrRunner):
    def __init__(self, model: str, device: int = 0):
        super().__init__(model, device)

    def name(self) -> str:
        my_super_model_and_runner_name = "dummy"
        return clean_name(my_super_model_and_runner_name)

    @classmethod
    def load_from_path(
        cls,
        *,
        cfg: EvalConfig,
        device: torch.device,
        dtype: torch.dtype,
    ) -> "AsrRunner":
        """Load the ASR runner from a model directory."""
        # `cfg` is assumed to carry the model location (field name may differ)
        return cls(cfg.model, device=0)

    def transcribe_from_wav_paths(self, wav_paths: List[str]):
        # Return one transcript per input WAV path
        return []
```
The custom runner can then be selected via the `--model_runner_class` argument in the evaluation CLI.
Tracking runner issues
In the past we have observed issues with exported models, such as mismatches between NeMo and tract runner outputs, or unexpected WER scores. To help track and debug these issues, we maintain a script where we log any runner-related discrepancy observed on a specific batch and hardware target (kernel precision differences can matter). Running it requires the eval extra.