7. Offloaded Tensors
Goals
At the end of this tutorial you will be able to:
- Use offloaded tensors when needed
Prerequisites
- PyTorch and Python basics
- 5 min to read this page
Offloaded tensors were developed to make it easier to manipulate and export large neural network models.
Recall that if you only want to export an offloaded LLM, you can follow our related LLM tutorial and do not need to know what happens behind the scenes.
This class is defined as such:
torch_to_nnef.tensor.offload.OffloadedTensor
OffloadedTensor(elem, device, offload_dir: Path, name: str, offloaded_tensor_type: T.Type[torch.Tensor], force_gc_collect: bool = False)
Bases:
OpaqueTensor
Tensor subclass that maintains data on disk.
It holds a permanent virtual internal storage on disk and a temporary instantiation on the targeted device for each operation that accesses it.
Warning
We recommend PyTorch > 1.12 for best compatibility.
is_meta
property
Whether the tensor is on the meta device.
Always False, as the tensor is (off|re)loaded from disk.
from_original_tensor
classmethod
from_original_tensor(tensor: torch.Tensor, name: str, offload_dir: T.Optional[Path] = None, suffix_log_msg: str = '')
Take a torch.Tensor or OpaqueTensor and offload it to disk.
Parameters:
Name Type Description Default tensor
Tensor
the torch.Tensor or torch_to_nnef.tensor.OpaqueTensor to dump on disk
required name
str
the name of the tensor that will be used to create the filename store on disk
required offload_dir
Optional[Path]
The directory where this file will be stored (temporarly)
None
suffix_log_msg
str
Added message log suffix for context
''
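For example, an in-memory weight can be moved to disk in one call. The following is a minimal sketch using only the signature documented above; the tensor name and sizes are illustrative:

```python
import tempfile
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

weight = torch.randn(4096, 4096)
# dump the tensor to disk; RAM is freed until the tensor is accessed again
off_weight = OffloadedTensor.from_original_tensor(
    weight,
    name="my_layer_weight",
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_t2n")),
)
print(off_weight.shape, off_weight.dtype)  # behaves like a regular tensor
```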
update_values
update_values(values: torch.Tensor, strict_shape: bool = True, strict_dtype: bool = True)
Replace the offloaded tensor on disk with the new values tensor.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| values | Tensor | the tensor that will replace the offloaded one on disk; assertions ensure it has the same shape and dtype as the prior tensor | required |
| strict_shape | bool | if True (default) the shape of the new tensor must be the same as the prior one | True |
| strict_dtype | bool | if True (default) the dtype of the new tensor must be the same as the prior one | True |
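Continuing the sketch above, replacing the stored values might look like this (the default strict flags enforce matching shape and dtype):

```python
# write new values to the same on-disk storage;
# shape and dtype must match unless the strict flags are relaxed
off_weight.update_values(torch.zeros(4096, 4096))
```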
You can directly load any .safetensors or .pt file into this object, which mimics a classical torch.Tensor except that each access loads the tensor from disk and frees it from RAM as soon as it is no longer needed, allowing you to manipulate very large models bit by bit.
It is composable with other torch_to_nnef.tensor.opaque.OpaqueTensor subclasses such as QTensor.
To load from disk without overhead, you can call t2n_load_checkpoint_and_dispatch with appropriate options, as in the following example:
```python
import tempfile
from pathlib import Path

import huggingface_hub
from transformers import AutoModelForCausalLM

from torch_to_nnef.tensor.offload import (
    ON_DISK_DEVICE_MAP_KEY,
    t2n_load_checkpoint_and_dispatch,
)
from torch_to_nnef.utils import init_empty_weights

slug = "meta-llama/Llama-3.2-1B-Instruct"
with init_empty_weights():
    # model instantiation with empty tensors;
    # this can come from any library (here transformers)
    model = AutoModelForCausalLM.from_pretrained(slug)

hf_repo_files = huggingface_hub.list_repo_files(slug)
weights_location = Path(
    huggingface_hub.hf_hub_download(
        slug, hf_repo_files[-1]
    )  # assume at least 1 file is in the targeted repo
).parent

# here the model tensors are properly loaded into the model, backed by disk
t2n_load_checkpoint_and_dispatch(
    model,
    weights_location,
    device_map=ON_DISK_DEVICE_MAP_KEY,
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_t2n")),
)
```
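After dispatch, you can inspect how the parameters are now represented. This quick check is illustrative only, assuming the dispatch replaced the empty (meta) weights in place:

```python
# peek at a few parameters to confirm they are now disk-backed
for param_name, param in list(model.named_parameters())[:3]:
    print(param_name, type(param.data))
```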
These OffloadedTensor objects are also very useful in quantization techniques, to support quantization of very large models with calibration based on observed values, such as Hessians computed from activations.
Indeed, consider the Hessian example: these square matrices can be quite large, especially when multiplied by the number of activations in a big neural network.
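As a rough sketch of that pattern (illustrative only; it assumes arithmetic on an OffloadedTensor materializes a regular tensor for the result, and the names are hypothetical), a per-layer Hessian accumulator can live on disk and only be materialized while it is being updated:

```python
import tempfile
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

hidden_dim = 4096
# keep the (hidden_dim x hidden_dim) accumulator on disk between updates
hessian = OffloadedTensor.from_original_tensor(
    torch.zeros(hidden_dim, hidden_dim),
    name="layer0_hessian",
    offload_dir=Path(tempfile.mkdtemp(suffix="hessian_t2n")),
)

def accumulate(activations: torch.Tensor) -> None:
    """Add the X^T X contribution of one calibration batch."""
    update = activations.T @ activations  # (hidden_dim, hidden_dim)
    # reload from disk, add the contribution, write the result back
    hessian.update_values(hessian + update)

accumulate(torch.randn(8, hidden_dim))
```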
If you only wish to keep a QTensor inside an OffloadedTensor when the original float tensor was offloaded, you can use the dedicated helper.
If this is a new tensor, just use OffloadedTensor.from_original_tensor as defined above.