7. Offloaded Tensors
Goals
At the end of this tutorial you will be able to:
- Use offloaded tensors when needed
Prerequisites
- PyTorch and Python basics
- 5 min to read this page
Offloaded tensors were developed to make it easier to manipulate and export large neural network models.
Recall that if you only want to export an offloaded LLM, you can follow our related LLM tutorial and do not need to look at what happens behind the scenes.
The class is defined as follows:
torch_to_nnef.tensor.offload.OffloadedTensor(elem, device, offload_dir: Path, name: str, offloaded_tensor_type: T.Type[torch.Tensor], force_gc_collect: bool = False)

Bases: OpaqueTensor

Tensor subclass that maintains its data on disk.

It holds a permanent virtual internal memory storage and a temporary instantiation, on the targeted device, at each operation accessing it.

Warning: we recommend a version of PyTorch > 1.12 for best compatibility.

is_meta (property)
Whether the tensor is on the meta device. Always False, as the tensor is (off|re)loaded from disk.

from_original_tensor (classmethod)
from_original_tensor(tensor: torch.Tensor, name: str, offload_dir: T.Optional[Path] = None, suffix_log_msg: str = '')

Take a torch.Tensor or OpaqueTensor and offload it to disk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | the torch.Tensor or torch_to_nnef.tensor.OpaqueTensor to dump on disk | required |
| name | str | the name of the tensor, used to build the filename stored on disk | required |
| offload_dir | Optional[Path] | the directory where this file will be stored (temporarily) | None |
| suffix_log_msg | str | log message suffix added for context | '' |

update_values(values: Tensor, strict_shape: bool = True, strict_dtype: bool = True)

Replace the offloaded tensor by a new 'values' tensor.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| values | Tensor | the tensor that will replace it on disk; assertions ensure the same shape and dtype as the prior one | required |
| strict_shape | bool | if True (default), the shape of the new tensor must match the prior one | True |
| strict_dtype | bool | if True (default), the dtype of the new tensor must match the prior one | True |
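For instance, here is a minimal sketch of these two methods (tensor sizes and names are illustrative, not from the library):

import tempfile
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

# offload a float tensor to disk; only a lightweight handle stays in RAM
weight = torch.randn(1024, 1024)
off_weight = OffloadedTensor.from_original_tensor(
    weight,
    name="demo_weight",  # illustrative name
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_demo")),
)

# each access transparently reloads the data from disk
print(off_weight.sum())

# replace the on-disk values (same shape and dtype enforced by default)
off_weight.update_values(torch.zeros(1024, 1024))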
You can directly load any .safetensors or .pt file into this object, which mimics a classical
torch.Tensor except that each access loads the tensor from disk and frees it from RAM as
soon as it is no longer needed, allowing you to manipulate very large models bit by bit.
It is composable with other torch_to_nnef.tensor.opaque.OpaqueTensor subclasses, such as QTensor.
To load from disk without overhead,
you can call t2n_load_checkpoint_and_dispatch with appropriate options, as in the following example:
import tempfile
from pathlib import Path

from torch_to_nnef.tensor.offload import (
    ON_DISK_DEVICE_MAP_KEY,
    t2n_load_checkpoint_and_dispatch,
)
from torch_to_nnef.utils import init_empty_weights
from transformers import AutoModelForCausalLM
import huggingface_hub

slug = "meta-llama/Llama-3.2-1B-Instruct"
with init_empty_weights():
    # model instantiation with empty tensors
    # this can come from any library (here transformers)
    model = AutoModelForCausalLM.from_pretrained(slug)

hf_repo_files = huggingface_hub.list_repo_files(slug)
weights_location = Path(
    huggingface_hub.hf_hub_download(
        slug, hf_repo_files[-1]
    )  # assume at least 1 file is in the targeted repo
).parent

# here model tensors are properly loaded onto disk as offloaded tensors
t2n_load_checkpoint_and_dispatch(
    model,
    weights_location,
    device_map=ON_DISK_DEVICE_MAP_KEY,
    offload_dir=Path(tempfile.mkdtemp(suffix="offload_t2n")),
)
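As a quick, hypothetical sanity check (whether parameter data is exposed directly as an OffloadedTensor is an assumption, not documented here):

from torch_to_nnef.tensor.offload import OffloadedTensor

# inspect one parameter: its data should now live on disk
param = next(model.parameters())
print(type(param.data), param.shape, param.dtype)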
These OffloadedTensor objects are also very useful in quantization techniques, to
support very large model quantization with calibration based on observed values, such as Hessians computed from activations.
Indeed, in the Hessian example, these square matrices can be pretty large, especially
when multiplied by the number of activations in a big neural network.
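As a hedged sketch of this pattern (the calibration loop, batch sizes, and names are hypothetical; only from_original_tensor and update_values are the documented API), an offloaded Hessian accumulator could look like:

import tempfile
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

hidden_dim = 1024
# the Hessian accumulator lives on disk between calibration steps
hessian = OffloadedTensor.from_original_tensor(
    torch.zeros(hidden_dim, hidden_dim),
    name="hessian_layer_0",  # hypothetical name
    offload_dir=Path(tempfile.mkdtemp(suffix="hessian_offload")),
)

for _ in range(8):  # hypothetical calibration batches
    activations = torch.randn(16, hidden_dim)
    # reload from disk, accumulate, write back: only one Hessian-sized
    # tensor is resident in RAM at any time
    hessian.update_values(hessian + activations.T @ activations)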
If you only wish to keep a QTensor inside an OffloadedTensor when the original float tensor was offloaded, you can just use the dedicated helper.
If it is a new tensor, just use the OffloadedTensor.from_original_tensor classmethod defined above.