eazygrad.AdamW

class eazygrad.AdamW(parameters: Sequence[_Tensor], lr: float = 0.001, betas: tuple[float, float] = (0.9, 0.99), eps: float = 1e-08, weight_decay: float = 0.01)

Bases: Adam

AdamW optimizer with decoupled weight decay.

Parameters:

parameters (sequence of _Tensor) – Sequence of tensors to optimize.
lr (float, default=1e-3) – Learning rate.
betas (tuple of float, default=(0.9, 0.99)) – Exponential decay rates (beta1, beta2) for the running averages of the gradient and the squared gradient, respectively.
eps (float, default=1e-8) – Small value added for numerical stability.
weight_decay (float, default=0.01) – Decoupled weight decay coefficient applied before the Adam update.
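
The following is a minimal sketch of what "decoupled weight decay applied before the Adam update" can look like for a single scalar parameter. It is not taken from the eazygrad source: the function name adamw_step, the explicit state arguments m, v, and t, the bias-correction terms, and the placement of eps outside the square root are all assumptions based on the commonly used AdamW formulation. Only the default values mirror the signature above.

```python
import math


def adamw_step(p, g, m, v, t, lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0.01):
    """One AdamW step on a scalar parameter (hypothetical helper, illustration only).

    p: parameter value, g: gradient, m / v: running averages, t: step count (starts at 1).
    """
    beta1, beta2 = betas

    # Decoupled weight decay: shrink the parameter directly, before the Adam update.
    # The decay term never enters the gradient statistics m and v.
    p = p * (1.0 - lr * weight_decay)

    # Running averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g

    # Bias correction for the zero-initialized averages (assumed, as in standard Adam).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Adam update on the already-decayed parameter.
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# First step from p = 1.0 with gradient 0.5 and fresh state.
p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, 1)

# With a zero gradient only the decay acts: p is scaled by (1 - lr * weight_decay).
p_decay_only, _, _ = adamw_step(2.0, 0.0, 0.0, 0.0, 1)
```

Because the decay multiplies the parameter directly instead of being added to the gradient, its strength does not depend on the adaptive per-parameter scaling. This is the usual distinction between AdamW and Adam with an L2 penalty.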