fastai 16: Accelerated SGD
Generic optimizer

SGD | Stochastic Gradient Descent

With fastai's callback-based `Optimizer`, plain SGD is a single step callback that subtracts `lr` times the gradient from each parameter:

```python
from functools import partial
import torch
from fastai.optimizer import Optimizer

def sgd_cb(p, lr, **kwargs): p.data.add_(p.grad.data, alpha=-lr)  # p <- p - lr*grad

opt_func = partial(Optimizer, cbs=[sgd_cb])
```

Momentum

A stats callback maintains an exponential moving average of the gradients. It returns a dict; the `Optimizer` stores each entry as per-parameter state and passes it back into the callback on the next step:

```python
def average_grad(p, mom, grad_avg=None, **kwargs):
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    return {'grad_avg': grad_avg*mom + p.grad.data}
```
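To see how the state dict returned by a stats callback flows back in on the next step, here is a minimal plain-Python sketch of the mechanism. It does not use fastai or torch: `SimpleOpt` and `Param` are made-up stand-ins, and tensors are replaced by floats, so only the callback/state plumbing is illustrated.

```python
class SimpleOpt:
    # Made-up sketch of a fastai-style callback optimizer: each callback may
    # return a dict of stats (e.g. grad_avg), which is kept as per-parameter
    # state and merged back into the kwargs on the next step.
    def __init__(self, params, cbs, **defaults):
        self.params, self.cbs, self.defaults = list(params), cbs, defaults
        self.state = [{} for _ in self.params]  # one state dict per parameter

    def step(self):
        for i, p in enumerate(self.params):
            for cb in self.cbs:
                new_state = cb(p, **{**self.defaults, **self.state[i]})
                if new_state: self.state[i].update(new_state)

class Param:
    # Stand-in for a tensor parameter with a gradient attached.
    def __init__(self, data, grad): self.data, self.grad = data, grad

def average_grad(p, mom, grad_avg=0.0, **kwargs):
    return {'grad_avg': grad_avg*mom + p.grad}       # EMA of the gradients

def momentum_step(p, lr, grad_avg, **kwargs):
    p.data -= lr * grad_avg                          # step along the average

p = Param(data=1.0, grad=0.5)
opt = SimpleOpt([p], cbs=[average_grad, momentum_step], lr=0.1, mom=0.9)
opt.step()   # grad_avg = 0.5,                p.data = 1.0  - 0.1*0.5
opt.step()   # grad_avg = 0.9*0.5 + 0.5,      p.data = 0.95 - 0.1*0.95
print(round(p.data, 3))
```

The key design choice mirrored here is that callbacks stay stateless: all state lives in the optimizer and travels through keyword arguments, which is why `average_grad` can simply return `{'grad_avg': ...}` instead of mutating anything.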