
7. Offloaded Tensors

Goals

At the end of this tutorial you will be able to:

  1. Use offloaded tensors when needed

Prerequisites

  • PyTorch and Python basics
  • 5 min to read this page

Offloaded tensors were developed to make it easier to manipulate and export large neural network models.

If you only want to export an offloaded LLM, you can follow our related LLM tutorial and do not need to look at what happens behind the scenes.

The class is defined as follows:

  • torch_to_nnef.tensor.offload.OffloadedTensor

    OffloadedTensor(elem, device, offload_dir: Path, name: str, offloaded_tensor_type: T.Type[torch.Tensor], force_gc_collect: bool = False)
    

    Bases: OpaqueTensor

    Tensor subclass that maintains data on disk.

    It holds a permanent virtual internal storage (on disk) and creates a temporary instantiation on the targeted device for each operation that accesses it.

    Warning

    We recommend using PyTorch > 1.12 for best compatibility.

    is_meta property

    is_meta: bool
    

    Whether the tensor is on the meta device.

    Always False as the tensor is (off|re)loaded from disk.

    from_original_tensor classmethod

    from_original_tensor(tensor: torch.Tensor, name: str, offload_dir: T.Optional[Path] = None, suffix_log_msg: str = '')
    

    Take a torch.Tensor or OpaqueTensor and offload it to disk.

    Parameters:

    tensor (Tensor, required): the torch.Tensor or torch_to_nnef.tensor.OpaqueTensor to dump to disk.

    name (str, required): the name of the tensor, used to build the filename stored on disk.

    offload_dir (Optional[Path], default None): the directory where the file will be stored (temporarily).

    suffix_log_msg (str, default ''): message suffix added to logs for context.
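
    For illustration, offloading a single tensor could look like the sketch below; the tensor name and sizes are made up, and the reload-per-operation behavior follows the class description above:

    a minimal offloading sketch
    import torch
    from torch_to_nnef.tensor.offload import OffloadedTensor

    weight = torch.randn(4096, 4096)  # ~64MiB in float32
    off_weight = OffloadedTensor.from_original_tensor(weight, name="layer0_weight")
    # each operation temporarily reloads the data from disk,
    # and RAM is freed again once the result is computed
    row_sums = off_weight.sum(dim=1)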

    to

    to(*args, **kwargs)
    

    Change the device the tensor targets when reloaded into memory.
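
    Continuing the sketch above (the device name is illustrative), this sets where the data materializes on the next reload:

    off_weight = off_weight.to("cpu")  # future reloads instantiate on CPU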

    update_values

    update_values(values: torch.Tensor, strict_shape: bool = True, strict_dtype: bool = True)
    

    Replace the offloaded tensor with the new 'values' tensor.

    Parameters:

    values (Tensor, required): the tensor that will replace the current one on disk; assertions ensure it matches the prior shape and dtype.

    strict_shape (bool, default True): if True, the new tensor must have the same shape as the prior one.

    strict_dtype (bool, default True): if True, the new tensor must have the same dtype as the prior one.
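
    Continuing the same sketch, replacing the on-disk values might look like this (the strict_dtype relaxation is inferred from the signature above):

    updating offloaded values
    new_weight = torch.zeros_like(weight)
    off_weight.update_values(new_weight)  # same shape and dtype: strict checks pass
    # storing a half-precision variant requires relaxing the dtype check
    off_weight.update_values(new_weight.half(), strict_dtype=False)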

You can directly load any .safetensors or .pt file into this object. It mimics a classical torch.Tensor, except that each access loads the tensor from disk and frees it from RAM as soon as it is no longer needed, which lets you manipulate very large models piece by piece. It composes with other torch_to_nnef.tensor.opaque.OpaqueTensor subclasses such as QTensor.

To load from disk without overhead, you can call t2n_load_checkpoint_and_dispatch with the appropriate options, as in the following example:

example of offload usage from disk (extracted from LLM exporter)
import tempfile
from pathlib import Path
from torch_to_nnef.tensor.offload import (
    ON_DISK_DEVICE_MAP_KEY,
    t2n_load_checkpoint_and_dispatch,
)
from torch_to_nnef.utils import init_empty_weights

from transformers import AutoModelForCausalLM
import huggingface_hub

slug = "meta-llama/Llama-3.2-1B-Instruct"
with init_empty_weights():
    # model instantiation with empty tensors
    # this can come from any library (here transformers)
    model = AutoModelForCausalLM.from_pretrained(slug)
hf_repo_files = huggingface_hub.list_repo_files(slug)
weights_location = Path(
    huggingface_hub.hf_hub_download(
        slug, hf_repo_files[-1]
    )  # assume at least 1 file is in targeted repo
).parent

# here the model tensors are actually loaded, offloaded onto disk
t2n_load_checkpoint_and_dispatch(
    model,
    weights_location,
    device_map=ON_DISK_DEVICE_MAP_KEY,
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_t2n")),
)

These OffloadedTensor objects are also very useful in quantization techniques, enabling very large models to be quantized with calibration based on observed values such as activation Hessians. In the Hessian case, the square matrices involved can be quite large, especially once multiplied by the number of activation sites in a big neural network.
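
For instance, a calibration loop could keep each per-layer Hessian accumulator offloaded between updates. The sketch below is illustrative: the model and the accumulate helper are made up, and it assumes arithmetic on an OffloadedTensor transparently reloads it, as the class description states.

hypothetical offloaded Hessian accumulators for calibration
import torch
from torch_to_nnef.tensor.offload import OffloadedTensor

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())

# one (in_features x in_features) accumulator per linear layer, kept on disk
hessians = {
    name: OffloadedTensor.from_original_tensor(
        torch.zeros(mod.in_features, mod.in_features),
        name=f"hessian_{name}",
    )
    for name, mod in model.named_modules()
    if isinstance(mod, torch.nn.Linear)
}

def accumulate(name: str, acts: torch.Tensor):
    # reload the accumulator, add acts.T @ acts, write the result back to disk;
    # only one Hessian at a time occupies RAM
    hessians[name].update_values(hessians[name] + acts.T @ acts)

accumulate("0", torch.randn(32, 512))  # e.g. fed from a forward hook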

If you only wish to keep a QTensor wrapped in an OffloadedTensor when the original float tensor was itself offloaded, you can use the following helper (a usage sketch follows its entry):

  • torch_to_nnef.compress.offloaded_tensor_qtensor

    offloaded_tensor_qtensor(q_fn, tensor: torch.Tensor, suffix_name: str) -> torch.Tensor
    

    Maintains a QTensor offloaded if original tensor is offloaded.
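
    The sketch below is hypothetical: the exact contract of q_fn (a callable mapping a tensor to a QTensor) is our assumption, and the identity stand-in is not a real quantizer.

    hypothetical offloaded quantization helper usage
    import torch
    from torch_to_nnef.compress import offloaded_tensor_qtensor
    from torch_to_nnef.tensor.offload import OffloadedTensor

    def q_fn(t: torch.Tensor) -> torch.Tensor:
        return t  # stand-in for a real quantization function returning a QTensor

    weight = OffloadedTensor.from_original_tensor(
        torch.randn(1024, 1024), name="w0"
    )
    # `weight` is offloaded, so the quantized result stays offloaded too
    q_weight = offloaded_tensor_qtensor(q_fn, weight, suffix_name="q0")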

If this is a new tensor, just use OffloadedTensor.from_original_tensor described above.