10. NeMo ASR support

Goals

By the end of this guide, you will know how to:

  1. Export a NeMo ASR model to NNEF using t2n_export_nemo
  2. Run WAV inference from a minimal Rust binary
  3. Run inference from Python using tract
  4. Evaluate the exported model using Word Error Rate (WER)

Prerequisites

  • Basic Python knowledge
  • Basic Rust knowledge
  • Approximately 10 minutes to read this page

Overview

This page documents the end-to-end workflow for exporting an NVIDIA NeMo Automatic Speech Recognition (ASR) model to NNEF using torch-to-nnef, running inference with tract, and evaluating the exported model against standard ASR benchmarks.

Export a NeMo ASR model

The t2n_export_nemo command loads a pre-trained ASR model from the NeMo toolkit and exports it to the NNEF format.

If not already installed, install torch_to_nnef with the nemo-tract extra. This enables the NeMo-specific export command:

# -e: export directory
# --tract-specific-path: optional path to a tract binary
# -tt: numerical tolerance for the NeMo vs tract checks
t2n_export_nemo \
    -e ./dump_parakeet_v3_06B \
    --tract-specific-path $HOME/SONOS/src/tract/target/release/tract \
    -tt very

# Optional extras:
# -s nvidia/parakeet-tdt-0.6b-v3       # explicit model slug
# --compress-method min_max_q4_0_all   # model compression

Since no -s argument is provided in this example, the command defaults to listing the known NeMo-compatible models from the Hugging Face Hub and the NeMo registries (we have mostly tested parakeet and nemotron).

After the command completes, the export directory (e.g. ./dump_parakeet_v3_06B) will contain:

  • The exported NNEF model files
  • A model_config.json file describing the exported pipeline
  • An export_config.json file with all export options used
  • A .log file with export details

Additional export options are available via:

t2n_export_nemo --help

Some NeMo preprocessing components are not yet fully supported by tract. In such cases, options such as --skip-preprocessor can be used to exclude those stages from the export.

Audio preprocessing requirements

All supported NeMo ASR models expect audio input with the following characteristics:

  • 16 kHz sample rate
  • Mono channel
  • WAV format

Ensure that all input audio conforms to these requirements before running inference.


Note: due to limited time and resources, the following sections are limited to RNNT and TDT models. Other model families are not guaranteed to work as-is, but contributions are welcome!


Example: Running a NeMo ASR model with tract

The example in this directory uses a pre-trained ASR model from NVIDIA NeMo and shows how to perform inference using the exported NNEF artifacts.

Run the exported model in Rust

To run the exported NeMo ASR model from Rust, add the tract-nemo crate to your Cargo.toml:

[dependencies]
tract-nemo = { git = "https://github.com/sonos/torch-to-nnef.git", branch = "main" }

Cargo resolves git dependencies by package name anywhere in the repository (here, the crate lives under docs/examples/nemo_asr/), so no subdirectory key is needed; inline dependency tables must also fit on a single line.

Rust inference example

use tract_nemo::nemo_asr::NemoAsrModel;

fn main() -> tract_nemo::TractResult<()> {
    // Load the exported NeMo ASR model
    let model_path = "./dump_parakeet_v3_06B";
    let mut asr_model = NemoAsrModel::load(model_path)?;

    let input_wavs = vec![
        // paths to input WAV files
    ];

    // Run inference
    let transcripts = asr_model.infer_from_wav_paths(&input_wavs)?;

    // Display results
    for (i, t) in transcripts.iter().enumerate() {
        println!("Transcription[{}]: '{}'", i, t.text);

        // Each transcript also contains detailed items:
        // - token
        // - logit
        // - emitted_at_encoder_timestep
        // - emitted_at_encoder_timestep_iteration
    }

    Ok(())
}

Run the exported model in Python

The exported NeMo ASR model can also be executed from Python using the tract-nemo Python bindings.

First, install the Python package:

pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"

Python inference example

import nemo_asr_tract

def main():
    # Load the exported NeMo ASR model
    model_path = "./dump_parakeet_v3_06B"
    asr_model = nemo_asr_tract.nemo_asr.NemoAsrModel.load(model_path)

    input_wavs = [
        "path/to/your/input1.wav",
        "path/to/your/input2.wav",
    ]

    # Run inference
    transcripts = asr_model.infer_from_wav_paths(input_wavs)

    # Display results
    for i, t in enumerate(transcripts):
        print(f"Transcription[{i}]: '{t.text}'")
        print(f"Items[{i}]: {t.items}")

if __name__ == "__main__":
    main()

Evaluation

If not already installed, set up the same Python package used to run the tract model, this time with the eval extra for evaluation:

pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract[eval]&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"

The Python tooling also supports evaluation of the exported model using standard ASR benchmarks and WER metrics.

Run an ASR Open Leaderboard evaluation

nemo_tract_eval \
    -e ./dump_parakeet_v3_06B \
    -r ~/SONOS/data/test_asr_export_parakeet \
    --device 0

This command runs an evaluation following the same protocol as the Hugging Face ASR Open Leaderboard.

It produces, for each dataset:

  • .jsonl manifest files containing predictions and references
  • Per-dataset WER scores
  • Aggregated summary metrics

Use --help to inspect all available evaluation options.

Display sample-level differences between runners

nemo_tract_eval_compare_manifest \
    --results-dir ./../my-results-dir/ \
    --max-items 5

This command displays side-by-side comparisons (by default, NeMo vs tract) for a subset of samples, sorted by absolute WER difference.
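The same kind of side-by-side inspection is easy to script yourself. A sketch that ranks samples by how much two runners' outputs disagree, using a character-level similarity as a simple stand-in for the absolute WER difference the CLI sorts by:

```python
import difflib

def rank_disagreements(pairs, max_items=5):
    """pairs: iterable of (sample_id, text_a, text_b) tuples.

    Returns the max_items most dissimilar samples as
    (dissimilarity, sample_id, text_a, text_b), most-dissimilar first.
    """
    scored = [
        (1.0 - difflib.SequenceMatcher(None, a, b).ratio(), sid, a, b)
        for sid, a, b in pairs
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:max_items]
```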

Recompute scores and display a summary table

nemo_tract_eval_score_manifest ./../my-results-dir/

This recomputes WER scores from the generated manifest files and prints a summary table. This is useful when experimenting with alternative scoring logic.
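For experiments with alternative scoring logic, WER itself is compact to reimplement: it is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch (no text normalization, which real leaderboard scoring applies before comparison):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)
```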

Custom runner support

For more advanced use cases, the evaluation framework supports custom runners and datasets.

To define a new runner or model, inherit from the base class and implement the required methods:

from typing import List

import torch

# AsrRunner is the base class; EvalConfig and clean_name are assumed to be
# importable from the same nemo_asr_tract.eval package.
from nemo_asr_tract.eval.runner import AsrRunner

class MyCustomRunner(AsrRunner):
    def __init__(self, model: str, device: int = 0):
        super().__init__(model, device)

    def name(self) -> str:
        my_super_model_and_runner_name = "dummy"
        return clean_name(my_super_model_and_runner_name)

    @classmethod
    def load_from_path(
        cls,
        *,
        cfg: "EvalConfig",
        device: torch.device,
        dtype: torch.dtype,
    ) -> "AsrRunner":
        """Load the ASR runner from a model directory."""
        # cfg is assumed to carry the model location; adapt to your setup.
        return cls(cfg.model, device=0)

    def transcribe_from_wav_paths(self, wav_paths: List[str]):
        return []
The custom runner can then be selected via the --model_runner_class argument in the evaluation CLI.

Tracking runner issues

In the past we have observed issues with exported models, such as mismatches between NeMo and tract runner outputs, or unexpected WER scores. To help track and debug these issues, we maintain a script that logs any runner-related discrepancy observed on a specific batch and a specific hardware target (kernel precision differs between targets). Here is a sample usage (it requires the eval extra to run properly).

nemo_tract_eval_batch_align_checker \
    --results-dir ./../my-results-dir/ \
    --output-file ./runner_issues_log.jsonl \
    --model-dir ../../assets/model \
    --dataset librispeech \
    --split test.clean \
    --sample-idx 1000 \
    -o ~/SONOS/data/2026_02_05_debug_batched_metal \
    [--force-cpu]