eazygrad.AdamW

class eazygrad.AdamW(parameters: Sequence[_Tensor], lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.99), eps: float = 1e-08, weight_decay: float = 0.01)

Bases: Adam

AdamW optimizer with decoupled weight decay.

Parameters:
  • parameters (sequence of _Tensor) – Sequence of tensors to optimize.

  • lr (float, default=1e-3) – Learning rate.

  • betas (tuple of float, default=(0.9, 0.99)) – Coefficients (beta1, beta2) used for the exponential running averages of the gradient and the squared gradient, respectively.

  • eps (float, default=1e-8) – Small value added for numerical stability.

  • weight_decay (float, default=0.01) – Decoupled weight decay coefficient. The decay is applied directly to the parameters before the Adam update, rather than being added to the gradient.
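
To illustrate how the parameters above interact, here is a minimal pure-Python sketch of a single AdamW step on a list of floats. It is not eazygrad's implementation: the function name adamw_step and its state arguments (m, v, t) are hypothetical, and it assumes the standard bias-corrected Adam update with the weight decay scaled by lr. Only the defaults and the ordering (decay before the Adam update) are taken from this page.

```python
import math


def adamw_step(param, grad, m, v, t,
               lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0.01):
    """Hypothetical single AdamW step on lists of floats.

    m and v are the running averages of the gradient and squared gradient;
    t is the 1-based step count. Returns (new_param, new_m, new_v).
    """
    beta1, beta2 = betas
    new_param, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(param, grad, m, v):
        # Decoupled weight decay: shrink the parameter directly,
        # before the Adam update and without touching the gradient.
        p = p * (1.0 - lr * weight_decay)

        # Exponential running averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g

        # Bias correction (assumed; standard for Adam-style optimizers).
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)

        # Adam update; eps keeps the division numerically stable.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)

        new_param.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_param, new_m, new_v


# With a zero gradient the Adam term vanishes, so only the decay acts.
decayed, _, _ = adamw_step([1.0], [0.0], [0.0], [0.0], t=1)
```

Because the decay multiplies the parameter instead of entering the gradient, it is not rescaled by the squared-gradient average; this is the practical difference from adding an L2 penalty to the loss under plain Adam.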