offload
torch_to_nnef.tensor.offload
Offload Tensor.
Tensor subclass to work around memory limits on various devices by offloading to disk or to a different 'memory' than the final one.
It holds an internal memory storage (permanent) and a temporary instantiation, on the targeted device, at each operation that accesses it.
HuggingFace 'accelerate' difference
This differs from HuggingFace 'accelerate', which lays out your network across the available devices once but then prevents moving data to another device afterward.
Indeed, we use the torch Tensor subclass API instead of torch.device("meta"), which allows holding more information such as the final targeted device.
This avoids any need for the hooking system used in accelerate, and removes the need to align the data-flow graph with pre- and post-casting.
In short, it is transparent for the end user, who can use these like read-only, device-movable tensors (mutation support could be envisioned if needed).
OffloadedTensor
OffloadedTensor(elem, device, offload_dir: Path, name: str, offloaded_tensor_type: T.Type[torch.Tensor], force_gc_collect: bool = False)
Bases: OpaqueTensor
Tensor subclass that maintains data on disk.
It holds a virtual internal memory storage (permanent) and a temporary instantiation, on the targeted device, at each operation that accesses it.
Warning
We recommend a PyTorch version > 1.12 for best compatibility.
is_meta
property
Whether the tensor is on the meta device.
Always False as the tensor is (off|re)loaded from disk.
from_original_tensor
classmethod
from_original_tensor(tensor: torch.Tensor, name: str, offload_dir: T.Optional[Path] = None, suffix_log_msg: str = '')
Take a torch.Tensor or OpaqueTensor and offload it to disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tensor | Tensor | The torch.Tensor or torch_to_nnef.tensor.OpaqueTensor to dump on disk. | required |
name | str | The name of the tensor, used to build the filename stored on disk. | required |
offload_dir | Optional[Path] | The directory where this file will be stored (temporarily). | None |
suffix_log_msg | str | Message suffix added to logs for context. | '' |
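Example (a minimal sketch; the transparent reload-on-access behavior follows the class description above, and the offload directory is a placeholder):

```python
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

weight = torch.randn(4096, 4096)

# Dump the tensor to disk; only a lightweight handle stays in RAM.
off_weight = OffloadedTensor.from_original_tensor(
    weight,
    name="decoder.layer0.weight",
    offload_dir=Path("/tmp/t2n_offload"),
)

# Per the class description, each operation temporarily reloads the data
# on the targeted device before computing.
result = off_weight @ torch.randn(4096, 8)
print(result.shape)  # torch.Size([4096, 8])
```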
update_values
Replace the offloaded tensor with a new 'values' tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
values | Tensor | The tensor that will replace the offloaded one on disk; assertions ensure it matches the prior shape and dtype. | required |
strict_shape | bool | If True (default), the shape of the new tensor must match the prior one. | True |
strict_dtype | bool | If True (default), the dtype of the new tensor must match the prior one. | True |
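Example (a sketch based on the parameter table above; tensor names and the offload directory are placeholders):

```python
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import OffloadedTensor

off_bias = OffloadedTensor.from_original_tensor(
    torch.ones(128),
    name="decoder.layer0.bias",
    offload_dir=Path("/tmp/t2n_offload"),
)

# Same shape and dtype: passes the default strict checks.
off_bias.update_values(torch.zeros(128))

# Different dtype: only allowed when the strict dtype check is relaxed.
off_bias.update_values(torch.zeros(128, dtype=torch.float16), strict_dtype=False)
```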
ctx_maybe_load_from_disk_as_offloaded
Context manager to force safetensors/torch.load to offload to disk.
Example:
Within this context, every loaded tensor is offloaded to disk as soon as possible.
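A hedged sketch of the intended usage (whether the context manager takes arguments is not documented here, so the no-argument form is an assumption):

```python
import torch

from torch_to_nnef.tensor.offload import ctx_maybe_load_from_disk_as_offloaded

# Inside the context, tensors read through safetensors/torch.load are
# offloaded to disk as soon as possible instead of staying in RAM.
with ctx_maybe_load_from_disk_as_offloaded():
    state_dict = torch.load("model.bin", map_location="cpu")
```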
load_state_dict
load_state_dict(checkpoint_file, device_map=None, offload_dir: T.Optional[Path] = None, apply_offload: bool = False)
Load a checkpoint from a given file.
If the checkpoint is in the safetensors format and a device map is passed, the weights can be fast-loaded directly on the GPU.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint_file | `str` | The path to the checkpoint to load. | required |
device_map | `Dict[str, Union[int, str, torch.device]]`, *optional* | A map that specifies where each submodule should go. It doesn't need to be refined down to each parameter/buffer name; once a given module name is in the map, every submodule of it will be sent to the same device. | None |
offload_dir | Optional[Path] | Optional directory in which offloaded tensors are stored. | None |
apply_offload | bool | If activated, each loaded tensor is offloaded as soon as possible (disabled in most cases to let set_module_tensor_to_device cast the dtype directly in memory). | False |
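Example (a sketch following the signature above; the checkpoint path and device map are placeholders, and using "disk" as a device-map value is an assumption inferred from the offload_dir descriptions in this section):

```python
from pathlib import Path

from torch_to_nnef.tensor.offload import load_state_dict

state_dict = load_state_dict(
    "model.safetensors",
    device_map={"encoder": "cpu", "decoder": "disk"},
    offload_dir=Path("/tmp/t2n_offload"),
    apply_offload=False,  # keep tensors in memory so dtype casting can happen there
)
```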
safe_load_file
safe_load_file(filename: T.Union[str, os.PathLike], device: TDEVICE = 'cpu', offload_dir: T.Optional[Path] = None, apply_offload: bool = False) -> T.Dict[str, torch.Tensor]
Loads a safetensors file into torch format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | `str`, or `os.PathLike` | The name of the file which contains the tensors. | required |
device | `Union[str, int]`, *optional*, defaults to `cpu` | The device where the tensors need to be located after load. Available options are all regular torch device locations. | 'cpu' |
offload_dir | Optional[Path] | Location where tensors with device 'disk' will be offloaded. | None |
apply_offload | bool | Whether the offload is applied or the tensors are left on CPU. | False |
Returns:
Type | Description |
---|---|
Dict[str, Tensor] | Dictionary with the tensor names as keys and the tensors as values. |
Example:
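A usage sketch based on the signature above (file name and offload directory are placeholders):

```python
from pathlib import Path

from torch_to_nnef.tensor.offload import safe_load_file

tensors = safe_load_file(
    "model.safetensors",
    device="cpu",
    offload_dir=Path("/tmp/t2n_offload"),
    apply_offload=True,  # offload each tensor to disk as it is read
)
for name, tensor in tensors.items():
    print(name, tuple(tensor.shape))
```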
set_module_tensor_to_device
set_module_tensor_to_device(mod_updater: ModTensorUpdater, tensor_name: str, device: TDEVICE, value: T.Optional[torch.Tensor] = None, dtype: T.Optional[T.Union[str, torch.dtype]] = None, offload_dir: T.Optional[Path] = None)
A helper function to set a given tensor (parameter or buffer) of a module on a device.
(Note that doing param.to(device) creates a new tensor not linked to the parameter, which is why we need this function.)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod_updater | `ModTensorUpdater` | The module updater instance that contains the module. | required |
tensor_name | `str` | The full name of the parameter/buffer. | required |
device | `int`, `str` or `torch.device` | The device on which to set the tensor. | required |
value | `torch.Tensor`, *optional* | The value of the tensor (useful when going from the meta device to any other device). | None |
dtype | `torch.dtype`, *optional* | If set, the value of the parameter will be cast to this dtype. | None |
offload_dir | Optional[Path] | The directory where tensors offloaded to disk will be stored. | None |
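Example (a hedged sketch: the ModTensorUpdater constructor is not documented in this section, so its construction below is hypothetical):

```python
import torch

from torch_to_nnef.tensor.offload import set_module_tensor_to_device

model = torch.nn.Linear(16, 16)
updater = ModTensorUpdater(model)  # hypothetical: wraps the module to update

# Re-assign the "weight" parameter on CPU with a float16 cast, keeping the
# module <-> parameter link intact (unlike a bare `param.to(device)`).
set_module_tensor_to_device(
    updater,
    "weight",
    device="cpu",
    dtype=torch.float16,
)
```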
t2n_load_checkpoint_and_dispatch
t2n_load_checkpoint_and_dispatch(model: nn.Module, checkpoint: Path, device_map: T.Optional[T.Union[str, T.Dict[str, T.Union[str, int, torch.device]]]], offload_dir: Path, strict: bool = False, offload_at_load_state_dict: bool = False)
Allows offloading as soon as possible.
This may be beneficial in the rare cases where partitioned safetensors files are too big for RAM; otherwise it is better to offload after the dtype cast in set_module_tensor_to_device.
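Example (a sketch following the signature above; the model, checkpoint path, and device map are placeholders, and "disk" as a device-map value is an assumption):

```python
from pathlib import Path

import torch

from torch_to_nnef.tensor.offload import t2n_load_checkpoint_and_dispatch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.Linear(1024, 1024),
)

t2n_load_checkpoint_and_dispatch(
    model,
    checkpoint=Path("checkpoint_dir/model.safetensors"),
    device_map={"0": "cpu", "1": "disk"},
    offload_dir=Path("/tmp/t2n_offload"),
    strict=False,
    offload_at_load_state_dict=False,  # offload after the in-memory dtype cast
)
```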