exporter
torch_to_nnef.llm_tract.exporter
LLMExporter
LLMExporter(hf_model_causal: nn.Module, tokenizer: AutoTokenizer, local_dir: T.Optional[Path] = None, force_module_dtype: T.Optional[DtypeStr] = None, force_inputs_dtype: T.Optional[DtypeStr] = None, num_logits_to_keep: int = 1)
Init LLMExporter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hf_model_causal` | `Module` | Any causal model from Hugging Face `transformers`. | required |
| `tokenizer` | `AutoTokenizer` | Any tokenizer from Hugging Face `transformers`. | required |
| `local_dir` | `Optional[Path]` | If set, the local directory from which the model was loaded. | `None` |
| `force_module_dtype` | `Optional[DtypeStr]` | Force the PyTorch dtype of the model parameters. | `None` |
| `force_inputs_dtype` | `Optional[DtypeStr]` | Force the PyTorch dtype of the model inputs. | `None` |
| `num_logits_to_keep` | `int` | Number of logits to keep (if 0, all are kept). For classical inference, 1 is fine; for speculative decoding it may be more (typically 2 or 3). | `1` |
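A minimal construction sketch, assuming a standard Hugging Face causal LM and tokenizer (the `gpt2` slug is illustrative, not part of this API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from torch_to_nnef.llm_tract.exporter import LLMExporter

# "gpt2" is only an illustrative slug; any causal LM from transformers works.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

exporter = LLMExporter(
    hf_model_causal=model,
    tokenizer=tokenizer,
    num_logits_to_keep=1,  # keep only the last-token logits for classical decoding
)
```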
apply_half_precision_fixes
Align float dtype arguments in a few graph ops.
LLMs are trained using GPU/TPU/CPU kernels, and the related PyTorch backends support the f16 dtype in some operators that PyTorch CPU inference does not (as of 2024-09-09).
To solve this issue, this CLI monkey-patches a few functional APIs.
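The exact set of patched ops is internal to the exporter; as a purely hypothetical sketch of the general idea (not the library's actual code), one could wrap `torch.nn.functional.layer_norm` so that f16 inputs on CPU are upcast to f32 and the result cast back:

```python
import torch
import torch.nn.functional as F

_orig_layer_norm = F.layer_norm

def _layer_norm_f32_fallback(x, normalized_shape, weight=None, bias=None, eps=1e-5):
    # Hypothetical illustration: compute in f32 when the f16 CPU kernel is
    # missing, then cast back so the rest of the graph keeps its dtype.
    if x.dtype == torch.float16 and x.device.type == "cpu":
        out = _orig_layer_norm(
            x.float(),
            normalized_shape,
            weight.float() if weight is not None else None,
            bias.float() if bias is not None else None,
            eps,
        )
        return out.to(torch.float16)
    return _orig_layer_norm(x, normalized_shape, weight, bias, eps)

F.layer_norm = _layer_norm_f32_fallback
```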
dump_all_io_npz_kind
Realistic dump of IOs.
export_model
export_model(export_dirpath: Path, inference_target: TractNNEF, naming_scheme: VariableNamingScheme = LM_VAR_SCHEME, log_level=logging.INFO, dump_with_tokenizer_and_conf: bool = False, check_inference_modes: bool = True, sample_generation_total_size: int = 0, ignore_already_exist_dir: bool = False, export_dir_struct: ExportDirStruct = ExportDirStruct.DEEP, debug_bundle_path: T.Optional[Path] = None)
Export the model as it currently is in self.hf_model_causal,
and dump some npz tests to check IO later on.
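A hedged usage sketch, reusing an `exporter` built as above; how to construct the `TractNNEF` inference target is not covered on this page:

```python
from pathlib import Path

# `exporter` is an LLMExporter instance; `target` must be a TractNNEF
# inference target (its construction is assumed here, see its own docs).
exporter.export_model(
    export_dirpath=Path("./exported_llm"),
    inference_target=target,
    dump_with_tokenizer_and_conf=True,  # also dump tokenizer and config files
    check_inference_modes=True,
)
```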
load
staticmethod
Load from either a Hugging Face Hub model slug or a local_dir.
prepare
prepare(compression_method: T.Optional[str] = None, compression_registry: str = DEFAULT_COMPRESSION_REGISTRY, test_display_token_gens: bool = False, wrapper_io_check: bool = True, export_dirpath: T.Optional[Path] = None, log_level: int = logging.INFO)
Prepare the model for export (f16 / compression / checks...).
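A sketch of a typical call on an existing `LLMExporter` instance, with compression left disabled (valid compression method names depend on the registry and are not listed here):

```python
# `exporter` is an LLMExporter instance prepared before calling export_model.
exporter.prepare(
    compression_method=None,       # e.g. a method registered in the compression registry
    test_display_token_gens=True,  # print a few token generations as a sanity check
    wrapper_io_check=True,         # verify the wrapped model IO before export
)
```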
StateLessF32LayerNorm
dump_llm
dump_llm(model_slug: T.Optional[str] = None, local_dir: T.Optional[Path] = None, force_module_dtype: T.Optional[DtypeStr] = None, force_inputs_dtype: T.Optional[DtypeStr] = None, merge_peft: T.Optional[bool] = None, num_logits_to_keep: int = 1, device_map: TYPE_OPTIONAL_DEVICE_MAP = None, **kwargs) -> T.Tuple[T.Union[Path, None], LLMExporter]
Utility to export an LLM model.
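A minimal sketch, assuming a Hugging Face model slug (`gpt2` is illustrative) and leaving the remaining options at their defaults; it returns the export path (or `None`) together with the underlying `LLMExporter`:

```python
from torch_to_nnef.llm_tract.exporter import dump_llm

# "gpt2" is illustrative; any Hugging Face causal-LM slug (or a local_dir) works.
export_path, exporter = dump_llm(
    model_slug="gpt2",
    num_logits_to_keep=1,
)
```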
find_subdir_with_filename_in
Find a subdirectory containing the given filename.