loss
torch_to_nnef.op.aten.loss
ATen loss-family op emitters (mse_loss, nll_loss, cross_entropy_loss, ...).
Each loss is decomposed into a pointwise NNEF fragment (where a
pointwise form makes sense -- mse, bce-with-logits, kl_div) plus a
full-tensor mean_reduce / sum_reduce + squeeze chain governed by
torch's reduction enum (0 = none, 1 = mean, 2 = sum).
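A minimal torch sketch of what that reduce + squeeze chain computes
(the helper name apply_reduction is hypothetical; the real emitters
produce NNEF ops, not torch calls):

    import torch

    def apply_reduction(pointwise: torch.Tensor, reduction: int) -> torch.Tensor:
        # 0 = none: return the pointwise loss tensor unchanged
        if reduction == 0:
            return pointwise
        dims = list(range(pointwise.dim()))
        if reduction == 1:  # 1 = mean: full-tensor mean_reduce
            out = pointwise.mean(dim=dims, keepdim=True)
        else:               # 2 = sum: full-tensor sum_reduce
            out = pointwise.sum(dim=dims, keepdim=True)
        return out.squeeze()  # squeeze away the kept singleton axes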
binary_cross_entropy_with_logits
Map aten::binary_cross_entropy_with_logits to NNEF.
Signature: (input, target, weight, pos_weight, reduction).
The pointwise BCE, in its numerically stable softplus formulation,
lives in the binary_cross_entropy_with_logits fragment; the weight /
pos_weight modulators are not currently supported.
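For reference, a sketch of the stable pointwise formula, assuming the
softplus(x) - x * t form (equivalent to
t * softplus(-x) + (1 - t) * softplus(x)); the fragment's exact NNEF
decomposition may differ:

    import torch
    import torch.nn.functional as F

    x = torch.randn(4, 3)  # logits
    t = torch.rand(4, 3)   # targets in [0, 1]
    pointwise = F.softplus(x) - x * t
    ref = F.binary_cross_entropy_with_logits(x, t, reduction="none")
    assert torch.allclose(pointwise, ref, atol=1e-6)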
cross_entropy_loss
Map aten::cross_entropy_loss to NNEF.
Lowers to nll_loss(log_softmax(input, dim=1), target, ...).
weight / ignore_index / label_smoothing are not currently
supported (we raise on non-default values).
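The equivalence the lowering relies on, as a quick PyTorch check:

    import torch
    import torch.nn.functional as F

    x = torch.randn(4, 5)          # (N, C) logits
    t = torch.randint(0, 5, (4,))  # class indices
    ref = F.cross_entropy(x, t)
    lowered = F.nll_loss(F.log_softmax(x, dim=1), t)
    assert torch.allclose(ref, lowered)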
huber_loss
Map PyTorch aten::huber_loss(input, target, reduction, delta) to NNEF.
Pointwise piecewise loss: quadratic (0.5 * diff^2) when
|input - target| < delta, linear (delta * (|diff| - 0.5 * delta))
otherwise. The reduction is applied by the emitter.
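A torch sketch of the piecewise form, checked against the reference op:

    import torch
    import torch.nn.functional as F

    x, t, delta = torch.randn(8), torch.randn(8), 1.0
    diff = x - t
    pointwise = torch.where(
        diff.abs() < delta,
        0.5 * diff ** 2,                    # quadratic inside the band
        delta * (diff.abs() - 0.5 * delta)  # linear outside
    )
    ref = F.huber_loss(x, t, reduction="none", delta=delta)
    assert torch.allclose(pointwise, ref)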
kl_div
Map aten::kl_div(input, target, reduction, log_target) to NNEF.
Two pointwise fragments, picked by log_target:
- kl_div (default): target * (log(target) - input)
- kl_div_log_target: exp(target) * (target - input)
input is assumed to be log-probabilities (caller normally feeds
log_softmax(...)). Torch's reduction='batchmean' is lowered to
sum plus an external division upstream of the aten op, so the
aten reduction enum here is only 0 / 1 / 2.
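Both fragments line up with torch's own pointwise kl_div, e.g.:

    import torch
    import torch.nn.functional as F

    logp = F.log_softmax(torch.randn(4, 5), dim=1)  # log-probabilities
    q = F.softmax(torch.randn(4, 5), dim=1)
    # default fragment: target * (log(target) - input)
    assert torch.allclose(q * (q.log() - logp),
                          F.kl_div(logp, q, reduction="none"))
    # log_target fragment: exp(target) * (target - input)
    logq = q.log()
    assert torch.allclose(logq.exp() * (logq - logp),
                          F.kl_div(logp, logq, reduction="none",
                                   log_target=True))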
mse_loss
Map PyTorch aten::mse_loss(input, target, reduction) to NNEF.
Pointwise (input - target) ** 2 is delegated to the mse_loss
fragment, then reduced if reduction != none. Torch broadcasts
input / target upstream of the aten op (we see a separate
aten::broadcast_tensors in the trace), so the fragment can assume
matching shapes.
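The pointwise/reduce split, spelled out in torch:

    import torch
    import torch.nn.functional as F

    x, t = torch.randn(4, 3), torch.randn(4, 3)  # broadcast upstream
    assert torch.allclose((x - t) ** 2, F.mse_loss(x, t, reduction="none"))
    assert torch.allclose(((x - t) ** 2).mean(), F.mse_loss(x, t))  # mean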
nll_loss
Map PyTorch's nll_loss family to NNEF.
Signature (all three variants):
nll_loss(input, target, weight, reduction, ignore_index).
The per-sample loss is -input[n, target[n], ...] gathered along the
class axis (dim 1). Class weighting and ignore-index masking are
common training-side knobs; we raise T2NErrorNotImplemented for both
until a real need shows up.
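The supported path, as a torch gather sketch:

    import torch
    import torch.nn.functional as F

    logp = F.log_softmax(torch.randn(4, 5), dim=1)
    t = torch.randint(0, 5, (4,))
    # -input[n, target[n]] along the class axis (dim 1)
    per_sample = -logp.gather(1, t.unsqueeze(1)).squeeze(1)
    assert torch.allclose(per_sample, F.nll_loss(logp, t, reduction="none"))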
smooth_l1_loss
Map aten::smooth_l1_loss(input, target, reduction, beta) to NNEF.
Same piecewise shape as huber_loss with a different scaling: the
quadratic branch is 0.5 * diff^2 / beta and the linear branch is
|diff| - 0.5 * beta (vs huber's delta * (|diff| - 0.5 * delta)).
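The scaling difference, checked against the reference op in torch:

    import torch
    import torch.nn.functional as F

    x, t, beta = torch.randn(8), torch.randn(8), 0.5
    diff = (x - t).abs()
    pointwise = torch.where(diff < beta,
                            0.5 * diff ** 2 / beta,  # quadratic branch
                            diff - 0.5 * beta)       # linear branch
    ref = F.smooth_l1_loss(x, t, reduction="none", beta=beta)
    assert torch.allclose(pointwise, ref)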