10. NeMo ASR support
Goals
By the end of this guide, you will know how to:
- Export a NeMo ASR model to NNEF using t2n_export_nemo
- Run WAV inference from a minimal Rust binary
- Run inference from Python using tract
- Evaluate the exported model using Word Error Rate (WER)
Prerequisites
- Basic Python knowledge
- Basic Rust knowledge
- Approximately 10 minutes to read this page
Overview
This page documents the end-to-end workflow for exporting an NVIDIA NeMo Automatic Speech Recognition (ASR) model to NNEF using torch-to-nnef, running inference with tract, and evaluating the exported model against standard ASR benchmarks.
Export a NeMo ASR model
The t2n_export_nemo command loads a pre-trained ASR model from the NeMo toolkit and exports it to the NNEF format.
If not already installed, install torch_to_nnef with the nemo-tract extra. This enables the NeMo-specific export command:
t2n_export_nemo \
  -e ./dump_parakeet_v3_06B \
  --tract-specific-path $HOME/SONOS/src/tract/target/release/tract \
  -tt very
# -e                     export directory
# --tract-specific-path  optional path to a tract binary
# -tt                    numerical tolerance for the NeMo vs tract checks
# Optional:
# -s nvidia/parakeet-tdt-0.6b-v3        explicit model slug
# --compress-method min_max_q4_0_all    model compression
Since no -s argument is provided in this example, the command defaults to listing the known NeMo-compatible models on the Hugging Face Hub and NeMo registries (we have mostly tested parakeet and nemotron).
After the command completes, the export directory (e.g. ./dump_parakeet_v3_06B) will contain:
- The exported NNEF model files
- A model_config.json file describing the exported pipeline
- An export_config.json file with all export options used
- A .log file with export details
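As a quick sanity check after export, the two JSON files can be inspected from Python. A minimal sketch (the file names come from the list above; their internal structure is not specified here, and the function name is ours):

```python
import json
from pathlib import Path

def read_export_configs(export_dir: str) -> dict:
    """Load the JSON config files written next to the NNEF artifacts."""
    configs = {}
    for name in ("model_config.json", "export_config.json"):
        path = Path(export_dir) / name
        if path.exists():
            configs[name] = json.loads(path.read_text())
    return configs
```

For example, `read_export_configs("./dump_parakeet_v3_06B")` returns a dict keyed by file name, which is handy for logging which export options produced a given model directory.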
Additional export options can be listed with the command's --help flag.
Some NeMo preprocessing components are not yet fully supported by tract. In such cases, options such as --skip-preprocessor can be used to exclude those stages from the export.
Audio preprocessing requirements
All supported NeMo ASR models expect audio input with the following characteristics:
- 16 kHz sample rate
- Mono channel
- WAV format
Ensure that all input audio conforms to these requirements before running inference.
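These constraints can be verified programmatically before inference. A minimal check using Python's standard-library wave module (the function name is ours):

```python
import wave

def check_asr_wav(path: str) -> None:
    """Raise ValueError unless `path` is a 16 kHz mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()}")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
```

Running this over a batch of input files before calling the model avoids confusing downstream errors caused by mismatched sample rates.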
Due to limited time and resources, the following sections focus on RNNT and TDT models. Other architectures are not guaranteed to work as-is, but contributions are welcome!
Example: Running a NeMo ASR model with tract
The example in this directory uses a pre-trained ASR model from NVIDIA NeMo and shows how to perform inference using the exported NNEF artifacts.
Run the exported model in Rust
To run the exported NeMo ASR model from Rust, add the tract-nemo crate to your Cargo.toml:
[dependencies]
# Cargo resolves git dependencies by package name; `tract-nemo`
# lives under docs/examples/nemo_asr/ in the repository.
tract-nemo = { git = "https://github.com/sonos/torch-to-nnef.git", branch = "main" }
Rust inference example
use tract_nemo::nemo_asr::NemoAsrModel;

fn main() -> tract_nemo::TractResult<()> {
    // Load the exported NeMo ASR model
    let model_path = "./dump_parakeet_v3_06B";
    let mut asr_model = NemoAsrModel::load(model_path)?;

    let input_wavs = vec![
        // paths to input WAV files
    ];

    // Run inference
    let transcripts = asr_model.infer_from_wav_paths(&input_wavs)?;

    // Display results
    for (i, t) in transcripts.iter().enumerate() {
        println!("Transcription[{}]: '{}'", i, t.text);
        // Each transcript also contains detailed items:
        // - token
        // - logit
        // - emitted_at_encoder_timestep
        // - emitted_at_encoder_timestep_iteration
    }
    Ok(())
}
Run the exported model in Python
The exported NeMo ASR model can also be executed from Python using the tract-nemo Python bindings.
First, install the Python package:
pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"
Python inference example
import nemo_asr_tract

def main():
    # Load the exported NeMo ASR model
    model_path = "./dump_parakeet_v3_06B"
    asr_model = nemo_asr_tract.nemo_asr.NemoAsrModel.load(model_path)

    input_wavs = [
        "path/to/your/input1.wav",
        "path/to/your/input2.wav",
    ]

    # Run inference
    transcripts = asr_model.infer_from_wav_paths(input_wavs)

    # Display results
    for i, t in enumerate(transcripts):
        print(f"Transcription[{i}]: '{t.text}'")
        print(f"Items[{i}]: {t.items}")

if __name__ == "__main__":
    main()
Evaluation
If not already installed, set up the same Python package as the one used to run the tract model, this time with the eval extra for evaluation:
pip install "git+https://github.com/sonos/torch-to-nnef.git@main#egg=nemo-asr-tract[eval]&subdirectory=docs/examples/nemo_asr/src/nemo_asr_py"
The Python tooling also supports evaluation of the exported model using standard ASR benchmarks and WER metrics.
Run an ASR Open Leaderboard evaluation
This command runs an evaluation following the same protocol as the Hugging Face ASR Open Leaderboard.
It produces, for each dataset:
- .jsonl manifest files containing predictions and references
- Per-dataset WER scores
- Aggregated summary metrics
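The generated manifests are plain JSON-lines files, so they are straightforward to post-process. A sketch of loading prediction/reference pairs (the default field names text and pred_text follow common NeMo manifest conventions and are an assumption here; adjust them to match the files actually produced):

```python
import json

def load_pairs(manifest_path: str,
               ref_key: str = "text",
               hyp_key: str = "pred_text"):
    """Return (reference, hypothesis) pairs from a .jsonl manifest."""
    pairs = []
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                pairs.append((record[ref_key], record[hyp_key]))
    return pairs
```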
Use --help to inspect all available evaluation options.
Display sample-level differences between runners
This command displays side-by-side comparisons (by default, NeMo vs tract) for a subset of samples, sorted by absolute WER difference.
Recompute scores and display a summary table
This recomputes WER scores from the generated manifest files and prints a summary table. This is useful when experimenting with alternative scoring logic.
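For reference, WER is word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. A minimal self-contained implementation, useful as a starting point when experimenting with alternative scoring logic:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(
                d[j - 1] + 1,      # edit against shorter hypothesis prefix
                d[j] + 1,          # edit against shorter reference prefix
                prev + (r != h),   # substitution, or free match
            )
            prev = cur
    return d[-1] / max(len(ref), 1)
```

Note that production scorers usually normalize text (casing, punctuation) before computing WER; this function scores the strings exactly as given.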
Custom runner support
For more advanced use cases, the evaluation framework supports custom runners and datasets.
To define a new runner or model, inherit from the base class and implement the required methods:
from typing import List

import torch

from nemo_asr_tract.eval.runner import AsRRunner

class MyCustomRunner(AsRRunner):
    def __init__(self, model: str, device=0):
        super().__init__(model, device)

    def name(self) -> str:
        # clean_name is a helper from the eval framework that
        # normalizes runner names for reports.
        return clean_name("my-custom-runner")

    @classmethod
    def load_from_path(
        cls,
        *,
        cfg: EvalConfig,
        device: torch.device,
        dtype: torch.dtype,
    ) -> "AsRRunner":
        """Load the ASR runner from a model directory."""
        # The model location comes from the evaluation config
        # (the exact attribute name depends on EvalConfig).
        return cls(cfg.model, device=device)

    def transcribe_from_wav_paths(self, wav_paths: List[str]):
        # Return one transcript per input WAV path.
        return []
The custom runner can then be selected via the --model_runner_class argument in the evaluation CLI.
Tracking runner issues
In the past we have observed issues with exported models, such as mismatches between the NeMo and tract runner outputs, or unexpected WER scores. To help track and debug these issues, we maintain a script where we log any runner-related discrepancy observed on a specific batch with a specific hardware target (kernel precision differs across targets). Here is a sample usage (it requires the eval extra to run properly).