# 2. NNEF archive composition
## Goals

At the end of this tutorial you will know:

- The basics of the `.nnef` format
- Where to find the reference Khronos specification

## Prerequisites

- An understanding of what a neural network is
- 5 minutes to read this page

The NNEF generated by our getting-started example (and by any torch_to_nnef export) follows most of the format defined by the specification. In fact, this package uses the Khronos tools to perform the final serialization step of the export, which is why most of the information you may need to understand this format is available in the specification itself. This page provides a quick overview, with emphasis on key elements and differences, so you can navigate it easily.
## Inside the archive
As shown in the first tutorial, the export step creates a `${MY_MODEL_NAME}.nnef.tgz` archive. This is really just a classical tar archive with the gzip compression option. You can in fact control the compression ratio with the `compression_level: int` parameter of the export function (0 meaning no compression). Compression trades disk space and network transfer size against the time it takes to load the model the first time (decompression).
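Because the archive is just tar + gzip, the trade-off behind `compression_level` can be sketched with the Python standard library alone (the `weights.dat` name and zero-filled payload below are purely illustrative, not a real export):

```python
import io
import tarfile

# Stand-in for a parameter tensor file; real .dat payloads compress less well.
payload = bytes(1024) * 512  # 512 KiB of zeros

sizes = {}
for level in (0, 9):  # 0 = store only, 9 = best (and slowest) compression
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=level) as tar:
        info = tarfile.TarInfo("weights.dat")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    sizes[level] = len(buf.getvalue())

print(sizes[0] > sizes[9])  # True: level 0 stores the bytes as-is
```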
> **Info:** tract itself is agnostic to the specific archive or compression format: as long as you provide a `graph.nnef` text file and the related `.dat` files, it should be possible to run the model through tract. In fact, tract is even able to take the model from a bit stream, given the proper interface call.
Since it's just an archive, you can simply extract it:
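For example, with standard `tar` (a toy archive built on the spot stands in for a real `${MY_MODEL_NAME}.nnef.tgz` here, so the snippet is self-contained):

```shell
# Build a toy archive shaped like an export: a graph.nnef plus .dat files.
mkdir -p export_demo
printf 'version 1.0;\n' > export_demo/graph.nnef
printf 'fake-bytes' > export_demo/class_token.dat
tar -czf my_model.nnef.tgz -C export_demo graph.nnef class_token.dat

# Extracting a real export works exactly the same way:
tar -xzf my_model.nnef.tgz
ls *.nnef *.dat
```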
You should see a list of `.dat` binary files, each corresponding to a specific parameter tensor of the exported ViT neural network, and a `graph.nnef` file that contains the textual representation of the graph.
## graph.nnef
Let's start by looking at `graph.nnef`:
```
version 1.0;

extension tract_registry tract_core;

fragment tract_core_properties(
) -> (properties: (string, tensor<scalar>)[])
{
    properties = [
        ("tract_target_version", "0.21.13"),
        ("torch_to_nnef_version", "0.18.6"),
        //...
        ("export_cmd", "getting_started.py")
    ];
}

fragment tract_gelu( x: tensor<scalar> ) -> ( y: tensor<scalar> )
{
    y = 0.5 * x * ( 1.0 + tract_core_erf(x * 0.7071067811865475));
}

# ...

graph network(input) -> (output)
{
    input = tract_core_external(shape = [1, 3, 224, 224], datum_type = 'f32');
    class_token = variable<scalar>(label = 'class_token', shape = [1, 1, 768]);
    # ...
    output = linear(
        select0,
        heads_head_weight,
        heads_head_bias_aligned_rank_expanded
    );
}
```
First we observe the `extension` declarations: some of the operators used later are specific to tract, so the tract registry called `tract_core` is needed. tract has various registries for different purposes: `tract_core` (all classical tract operators), `tract_transformers` (operators specific to transformer models), `tract_onnx` (operators very specific to ONNX), `tract_extra` (peculiar operators such as exponential unit norm), `tract_resource` (loading custom variables inside your graph), and so on. These extensions are added automatically by torch_to_nnef, except if you use custom operators.
After that we see a set of `fragment`s. You can think of most of them as pure functions (there are a few exceptions, for instance when a `scan` operator is involved, but it remains a good approximation). Fragments allow the graph to be composed from smaller reusable blocks.
`tract_gelu` is interesting because it is replaced on the fly by a dedicated gelu operator when one exists in the selected registries for your hardware.
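The fragment's formula is the exact (erf-based) GELU, i.e. x · Φ(x) with Φ the standard normal CDF, which is easy to check numerically:

```python
import math

def tract_gelu(x: float) -> float:
    # Same expression as the NNEF fragment: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x * 0.7071067811865475))

def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# GELU(x) = x * Phi(x): both forms agree to floating-point precision.
for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    assert abs(tract_gelu(v) - v * normal_cdf(v)) < 1e-12

print(tract_gelu(1.0))  # ~0.8413
```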
Finally, there is the `network` graph, the main entry point: it describes the inference computations to perform from inputs to outputs by calling operators and fragments and assigning the results to temporary variables.
`tract_core_external` is like `external` from the original NNEF spec, but with a data type specification, allowing a fine-grained definition of what is expected. `variable<scalar>` is the standard declaration to load a parameter tensor, as described in the NNEF spec. The `label` attribute value directly matches the `${label_value}.dat` file that will be loaded.
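This label-to-file mapping can be illustrated in a few lines of Python. The first declaration below is taken from the graph above; the second (`encoder_pos_embedding`) is a hypothetical example in the same style:

```python
import re

graph_src = """
class_token = variable<scalar>(label = 'class_token', shape = [1, 1, 768]);
encoder_pos_embedding = variable<scalar>(label = 'encoder_pos_embedding', shape = [1, 197, 768]);
"""

# Each variable's `label` names the .dat file holding its data in the archive.
labels = re.findall(r"variable<[^>]*>\(label\s*=\s*'([^']+)'", graph_src)
dat_files = [f"{label}.dat" for label in labels]
print(dat_files)  # ['class_token.dat', 'encoder_pos_embedding.dat']
```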
> **Info:** As of today the graph is mostly 'flat', with the exception of a few fragments. The `torch.nn.Module` structure is not maintained in the final NNEF, so there are repetitions in the control flow expressed here. Nothing prevents us from further factorizing the graph in the future (beyond keeping this package's codebase simple), since this graph representation is NOT the final optimized graph that tract will run, but rather its blueprint.
## .dat files
We follow the `.dat` format specification defined in the Khronos spec, including support for q8 quantization. We also leverage the flexibility the spec leaves open to define new formats: for example, `Q4_0` `.dat` files use a format close to an isolated `GGML_TYPE_Q4_0`, and they can be exported for tract with our package seamlessly, as explained in the quantization tutorial.
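To give an idea of what such a block format looks like, here is a sketch of de-quantizing one GGML-style `Q4_0` block: 32 weights stored as a float16 scale `d` followed by 16 bytes of packed 4-bit values, each weight decoding to `(nibble - 8) * d`. This only illustrates the GGML scheme; the exact byte layout and nibble ordering inside a torch_to_nnef `.dat` file may differ:

```python
import struct

def dequant_q4_0_block(block: bytes) -> list[float]:
    # 2-byte float16 scale, then 16 bytes packing 32 unsigned 4-bit values.
    d = struct.unpack("<e", block[:2])[0]
    out = []
    for byte in block[2:18]:
        out.append(((byte & 0x0F) - 8) * d)  # low nibble
        out.append(((byte >> 4) - 8) * d)    # high nibble
    return out

# Toy block: scale 0.5, every nibble = 9, so every weight is (9 - 8) * 0.5.
block = struct.pack("<e", 0.5) + bytes([0x99] * 16)
print(dequant_q4_0_block(block)[:4])  # [0.5, 0.5, 0.5, 0.5]
```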