
7. Offloaded Tensors

Goals

At the end of this tutorial you will be able to:

  1. Use offloaded tensors when needed

Prerequisites

  • PyTorch and Python basics
  • 5 min to read this page

Offloaded tensors were developed to make it easier to manipulate and export large neural network models.

If you only want to export an offloaded LLM, you can follow our related LLM tutorial and do not need to look at what happens behind the scenes.

The class is defined as follows:

  • torch_to_nnef.tensor.offload.OffloadedTensor

    OffloadedTensor(elem, device, offload_dir: Path, name: str, offloaded_tensor_type: T.Type[torch.Tensor], force_gc_collect: bool = False)
    

    Bases: OpaqueTensor

    Tensor subclass that maintains data on disk.

    It holds a permanent virtual internal storage (on disk) and creates a temporary instantiation on the targeted device for each operation that accesses it.

    Warning

    We recommend using PyTorch > 1.12 for best compatibility.

    is_meta property

    is_meta: bool
    

    Whether the tensor is on the meta device.

    Always False as the tensor is (off|re)loaded from disk.

    from_original_tensor classmethod

    from_original_tensor(tensor: torch.Tensor, name: str, offload_dir: T.Optional[Path] = None, suffix_log_msg: str = '')
    

    Take a torch.Tensor or OpaqueTensor and offload it to disk.

    Parameters:

    tensor (Tensor, required): the torch.Tensor or torch_to_nnef.tensor.OpaqueTensor to dump to disk.

    name (str, required): the name of the tensor, used to build the filename stored on disk.

    offload_dir (Optional[Path], default None): the directory where the file will be stored (temporarily).

    suffix_log_msg (str, default ''): message suffix added to logs for context.
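
    For illustration, offloading a single tensor could look like the sketch below; the tensor name and sizes are made up, and the reload-per-operation behavior follows the class description above:

    a minimal offloading sketch
    import torch
    from torch_to_nnef.tensor.offload import OffloadedTensor

    weight = torch.randn(4096, 4096)  # ~64MiB in float32
    off_weight = OffloadedTensor.from_original_tensor(weight, name="layer0_weight")
    # each operation temporarily reloads the data from disk,
    # and RAM is freed again once the result is computed
    row_sums = off_weight.sum(dim=1)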

    to

    to(*args, **kwargs)
    

    Change the device the tensor targets when reloaded into memory.
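
    Continuing the sketch above (the device name is illustrative), this sets where the data materializes on the next reload:

    off_weight = off_weight.to("cpu")  # future reloads instantiate on CPU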

    update_values

    update_values(values: torch.Tensor, strict_shape: bool = True, strict_dtype: bool = True)
    

    Replace the offloaded tensor with the new 'values' tensor.

    Parameters:

    values (Tensor, required): the tensor that will replace the current one on disk; assertions ensure it matches the prior shape and dtype.

    strict_shape (bool, default True): if True, the new tensor must have the same shape as the prior one.

    strict_dtype (bool, default True): if True, the new tensor must have the same dtype as the prior one.
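
    Continuing the same sketch, replacing the on-disk values might look like this (the strict_dtype relaxation is inferred from the signature above):

    updating offloaded values
    new_weight = torch.zeros_like(weight)
    off_weight.update_values(new_weight)  # same shape and dtype: strict checks pass
    # storing a half-precision variant requires relaxing the dtype check
    off_weight.update_values(new_weight.half(), strict_dtype=False)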

You can directly load any .safetensors or .pt file into this object. It mimics a classical torch.Tensor, except that each access loads the tensor from disk and frees it from RAM as soon as it is no longer needed, which lets you manipulate very large models piece by piece. It composes with other torch_to_nnef.tensor.opaque.OpaqueTensor subclasses such as QTensor.

To load from disk without overhead, you can call t2n_load_checkpoint_and_dispatch with the appropriate options, as in the following example:

example of offload usage from disk (extracted from LLM exporter)
import tempfile
from pathlib import Path
from torch_to_nnef.tensor.offload import (
    ON_DISK_DEVICE_MAP_KEY,
    t2n_load_checkpoint_and_dispatch,
)
from torch_to_nnef.utils import init_empty_weights

from transformers import AutoModelForCausalLM
import huggingface_hub

slug = "meta-llama/Llama-3.2-1B-Instruct"
with init_empty_weights():
    # model instantiation with empty tensors
    # this can come from any library (here transformers)
    model = AutoModelForCausalLM.from_pretrained(slug)
hf_repo_files = huggingface_hub.list_repo_files(slug)
weights_location = Path(
    huggingface_hub.hf_hub_download(
        slug, hf_repo_files[-1]
    )  # assume at least 1 file is in targeted repo
).parent

# here the model tensors are actually loaded, offloaded onto disk
t2n_load_checkpoint_and_dispatch(
    model,
    weights_location,
    device_map=ON_DISK_DEVICE_MAP_KEY,
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_t2n")),
)

These OffloadedTensor objects are also very useful in quantization techniques, enabling very large models to be quantized with calibration based on observed values such as activation Hessians. In the Hessian case, the square matrices involved can be quite large, especially once multiplied by the number of activation sites in a big neural network.
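
For instance, a calibration loop could keep each per-layer Hessian accumulator offloaded between updates. The sketch below is illustrative: the model and the accumulate helper are made up, and it assumes arithmetic on an OffloadedTensor transparently reloads it, as the class description states.

hypothetical offloaded Hessian accumulators for calibration
import torch
from torch_to_nnef.tensor.offload import OffloadedTensor

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())

# one (in_features x in_features) accumulator per linear layer, kept on disk
hessians = {
    name: OffloadedTensor.from_original_tensor(
        torch.zeros(mod.in_features, mod.in_features),
        name=f"hessian_{name}",
    )
    for name, mod in model.named_modules()
    if isinstance(mod, torch.nn.Linear)
}

def accumulate(name: str, acts: torch.Tensor):
    # reload the accumulator, add acts.T @ acts, write the result back to disk;
    # only one Hessian at a time occupies RAM
    hessians[name].update_values(hessians[name] + acts.T @ acts)

accumulate("0", torch.randn(32, 512))  # e.g. fed from a forward hook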

If you only wish to keep a QTensor wrapped in an OffloadedTensor when the original float tensor was itself offloaded, you can use the following helper (a usage sketch follows its entry):

  • torch_to_nnef.compress.offloaded_tensor_qtensor

    offloaded_tensor_qtensor(q_fn, tensor: torch.Tensor, suffix_name: str) -> torch.Tensor
    

    Maintains a QTensor offloaded if original tensor is offloaded.
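
    The sketch below is hypothetical: the exact contract of q_fn (a callable mapping a tensor to a QTensor) is our assumption, and the identity stand-in is not a real quantizer.

    hypothetical offloaded quantization helper usage
    import torch
    from torch_to_nnef.compress import offloaded_tensor_qtensor
    from torch_to_nnef.tensor.offload import OffloadedTensor

    def q_fn(t: torch.Tensor) -> torch.Tensor:
        return t  # stand-in for a real quantization function returning a QTensor

    weight = OffloadedTensor.from_original_tensor(
        torch.randn(1024, 1024), name="w0"
    )
    # `weight` is offloaded, so the quantized result stays offloaded too
    q_weight = offloaded_tensor_qtensor(q_fn, weight, suffix_name="q0")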

If this is a new tensor, just use OffloadedTensor.from_original_tensor described above.