3. Model with multiple inputs or outputs
Goals
At the end of this tutorial you will know:
- How to export a neural network with multiple inputs and outputs
- Data type limitations of the NNEF representation
Prerequisites
- PyTorch and Python basics
- 10 min to read this page
Many neural network models require more than one input or output. In that case, we support this no less than the classical ONNX export does.
How to export?
To exemplify this, let's try to export a classic language model called ALBERT (from 'ALBERT: A Lite BERT for Self-supervised Learning of Language Representations', 2020) with the transformers library.
First, let's create a directory and install a few dependencies:
mkdir multi_io_py
cd multi_io_py
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install torch==2.7.0 \
    transformers==4.53.2 \
    sentencepiece==0.2.0 \
    torch_to_nnef
touch export_albert.py
We are now ready to start: load the model and prepare an input sample:
from transformers import AlbertModel, AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
inputs = tokenizer("Hello, I am happy", return_tensors="pt")
albert_model = AlbertModel.from_pretrained("albert-base-v2")
Using the basic export API
What would happen if we used the same call as in the getting started tutorial?
Let's not forget that the inputs generated by the tokenizer are wrapped in a Python BatchEncoding object. It contains the tensors that we will use in the forward pass of this network under the following attributes: input_ids, attention_mask, token_type_ids. So we need to add those attributes to args and reference them in input_names of the export API.
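To see why ordering matters, you can quickly inspect the BatchEncoding yourself (a small check added here for illustration; the key order is whatever the tokenizer produces and may differ from the .forward parameter order):

for name, tensor in inputs.items():
    # e.g. input_ids, token_type_ids, attention_mask with their shapes
    print(name, tuple(tensor.shape))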
Let's try together:
from pathlib import Path

from torch_to_nnef import export_model_to_nnef, TractNNEF

file_path_export = Path("albert_v2.nnef.tgz")
input_names = [
    'input_ids', 'attention_mask', 'token_type_ids'
]
export_model_to_nnef(
    model=albert_model,
    # here we can not simply write
    # args=inputs.values()
    # because the order of the values
    # differs from the .forward parameters!
    args=[inputs[k] for k in input_names],
    file_path_export=file_path_export,
    inference_target=TractNNEF(
        version="0.21.13",
        check_io=True,
    ),
    input_names=input_names,
    output_names=["output"],
    debug_bundle_path=Path("./debug.tgz"),
)
Warning
This export is for demonstration of multiple inputs and outputs only! The dynamic dimension specification is missing, which makes the exported NNEF model limited and sub-optimal.
If you run this script you should get a model very close to the definition in the transformers library, with a graph.nnef signature that looks like this:
graph network(
input_ids,
token_type_ids,
attention_mask
) -> (output_last_hidden_state, output_pooler_output)
Wait, but what just happened to the outputs?
- This transformer model returns a special Python object: BaseModelOutputWithPooling
- The torch_to_nnef export is called with output_names containing a single element named output
So how come we get 2 outputs?
It turns out torch_to_nnef tries hard to make sense of the provided inputs and outputs.
In this case, the output object has only been partially filled because of the inputs provided to the model. In the snippet above, we did not set the AlbertModel.forward parameters output_attentions or output_hidden_states to True. So the graph traced and exported follows the control-flow that does not collect those outputs.
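You can verify on the Python side that only two tensors are populated in this configuration (a quick sanity check, reusing the model and inputs defined above; the exact printed keys may vary with the transformers version):

out = albert_model(**inputs)
# only the fields actually computed are present,
# typically last_hidden_state and pooler_output
print(out.keys())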
Warning
This is one of the key limitations of the NNEF export: since it is based on the internal graph representation in PyTorch, it doesn't really know more than PyTorch does. All control-flow existing on the Python side is unknown. This is why correctly selecting your inputs, so that the right trace and outputs end up being exported, is very important.
That's also why conditional sub-model execution cannot be embedded directly into NNEF (think Mixture of Experts, for example). But fear not, we have solutions for that.
OK, now that's a bit clearer 😮💨, but why do the output names differ from those in the Python modeling?
Well, we requested the first output object to be named output, so all its internal torch tensors are mapped and automatically prefixed like this: output_{internal_object_key} in NNEF.
IO Specification
A bit lost about what is and is not possible to export as inputs/outputs?
Input(s) provided in the export parameter args can be:
- a simple torch.Tensor
- a list of supported elements

Those supported elements being:
- a torch.Tensor
- a dict of torch.Tensor
- a list or a tuple of torch.Tensor
- an object that mimics a dict by implementing the __getitem__ magic function
- containers (dict, list, tuple, mimic object) can themselves embed sub-containers that contain torch.Tensor (sketched below)
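As a minimal sketch of a nested-container input (the toy module, file name and tensors below are purely illustrative; input/output names are left out so they get auto-generated):

import torch
from pathlib import Path
from torch_to_nnef import export_model_to_nnef, TractNNEF

class ToyNestedInputs(torch.nn.Module):
    # hypothetical module: takes a plain tensor plus a dict of tensors
    def forward(self, x, extras):
        return x * extras["scale"] + extras["bias"]

export_model_to_nnef(
    model=ToyNestedInputs(),
    args=[
        torch.rand(2, 3),                                    # simple torch.Tensor
        {"scale": torch.rand(1), "bias": torch.rand(2, 3)},  # dict of torch.Tensor
    ],
    file_path_export=Path("toy_nested.nnef.tgz"),
    inference_target=TractNNEF(version="0.21.13", check_io=True),
    # input_names / output_names omitted: they will be auto-generated
    # following the input_{} / output_{} templates described below
)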
Important:
- Python primitives (boolean, integer, float, string values) are passed, but the trace and export will 'constantize' them, so they will no longer be variable in NNEF.
- Lists, tuples and dicts are likewise fixed at export time, in both their length and their possible keys.
Variable Python primitive in NNEF
To work around the constantization of Python primitives you can transform them into torch.Tensor. This will only work for primitives that do not change the control-flow.
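For instance (a sketch with a hypothetical module), instead of passing a float that would be baked in as a constant, pass it as a tensor:

import torch

class Scaler(torch.nn.Module):
    # passing `scale` as a Python float would 'constantize' it at export time;
    # passing it as a tensor keeps it a variable input in NNEF
    def forward(self, x, scale):
        return x * scale

args = [torch.rand(2, 3), torch.tensor(0.5)]  # scale stays a real input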
Outputs have the same object flexibility.
Also, if some names are not provided in input_names and output_names, they will be automatically generated with the templates input_{} and output_{}, where the content of the brackets depends on indexes and keys.
Selection of inputs and outputs to export
OK, that's nice: we should now start to better understand what is possible with a simple torch_to_nnef export call.
What if you want to export only the last_hidden_state?
In that case you can simply wrap the model into a new nn.Module like so:
import torch

class ALBERTModelWrapper(torch.nn.Module):
    def __init__(self, transformer_model: torch.nn.Module):
        super().__init__()
        self.transformer_model = transformer_model

    def forward(self, *args):
        outputs = self.transformer_model(*args)
        last_hidden_states = outputs.last_hidden_state
        return last_hidden_states
wrapped_model = ALBERTModelWrapper(albert_model)
You can now export this wrapped model with the same API call we used above.
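For instance (a sketch reusing the objects defined earlier; the file name and output name are our own choice):

export_model_to_nnef(
    model=wrapped_model,
    args=[inputs[k] for k in input_names],
    file_path_export=Path("albert_v2_last_hidden.nnef.tgz"),
    inference_target=TractNNEF(version="0.21.13", check_io=True),
    input_names=input_names,
    # the wrapper returns a single tensor, so a single name is enough
    output_names=["last_hidden_state"],
)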
The same logic applies if you wish to ignore some inputs: you can set those in __init__ and reference them in the forward pass.
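A sketch of that pattern (fixing token_type_ids at construction time is just an illustration):

class ALBERTNoTokenTypes(torch.nn.Module):
    # illustrative wrapper: token_type_ids is frozen at construction time
    # and no longer appears as an exported input
    def __init__(self, transformer_model: torch.nn.Module, token_type_ids: torch.Tensor):
        super().__init__()
        self.transformer_model = transformer_model
        self.register_buffer("token_type_ids", token_type_ids)

    def forward(self, input_ids, attention_mask):
        outputs = self.transformer_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=self.token_type_ids,
        )
        return outputs.last_hidden_state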
A concrete example of this is available in the torch_to_nnef codebase, in the LLM wrappers that need to handle the KV-cache properly: here.
Working around limitations
A dynamic number of inputs or outputs is not supported out of the box
If your network has a variable number of inputs or outputs with the same shape, you can wrap it inside a torch.nn.Module that concatenates them into a single tensor of variable size.
If the shapes vary in size, preventing direct concatenation, you can pad your tensors beforehand and add another tensor responsible for keeping track of the padding values.
This again will only work if the underlying PyTorch IR graph is not altered by the different inputs.
While suboptimal in RAM and compute, it should allow you to express any possible network in that respect.
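As a sketch of the padding idea (the helper below is illustrative and runs before calling the exported model):

import torch

def pack_variable_inputs(tensors):
    # pad a variable number of 1-D tensors to a common length, stack them
    # into a single (N, max_len) tensor, and keep the original lengths so
    # the padded positions can be ignored downstream
    lengths = torch.tensor([t.shape[0] for t in tensors])
    padded = torch.zeros(len(tensors), int(lengths.max()), dtype=tensors[0].dtype)
    for i, t in enumerate(tensors):
        padded[i, : t.shape[0]] = t
    return padded, lengths

stacked, lengths = pack_variable_inputs([torch.rand(5), torch.rand(3), torch.rand(7)])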