PyTorch regularization terms, PyTorch functions

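When building a neural network with PyTorch, weight decay (that is, an L2-norm regularization term) is available through the "weight_decay" argument built into the framework's optimizers, for example: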

torch.optim.Adam(net.parameters(), weight_decay=0.1)

When we want a custom regularization term instead, such as an L1 penalty, the following approach can be used:
https://stackoverflow.com/a/50100180/9640590

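import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
lmbda = torch.tensor(10**-4)  # "lambda" is a reserved word in Python, so use another name
l1_reg = torch.tensor(0.)
for param in net.parameters():
    l1_reg = l1_reg + param.norm(1)  # accumulate the L1 norm of every parameter tensor
loss = loss_func(output, labels) + lmbda * l1_reg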

Other regularization methods can be plugged in the same way.
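As a minimal sketch of that idea, the loop can be wrapped in a small helper that takes the norm order as an argument. The name lp_penalty, the toy nn.Linear model, and the random data are assumptions made for illustration only; they are not part of the original answer.

```python
import torch
import torch.nn as nn

def lp_penalty(params, p=1):
    """Sum of the p-norms of all given parameter tensors (hypothetical helper)."""
    reg = torch.tensor(0.)
    for param in params:
        reg = reg + param.norm(p)
    return reg

# Toy model and data, assumed here only so the sketch runs on its own.
torch.manual_seed(0)
net = nn.Linear(4, 3)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))
loss_func = nn.CrossEntropyLoss()
lmbda = 1e-4

data_loss = loss_func(net(inputs), labels)
l1_reg = lp_penalty(net.parameters(), p=1)   # p=2 would give a sum of L2 norms
loss = data_loss + lmbda * l1_reg
loss.backward()                              # autograd differentiates the penalty too
```

Because the penalty is an ordinary tensor expression, autograd handles its gradient together with the data loss, and no change to the optimizer is needed.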

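The L2,1 norm, however, involves a square root. PyTorch handles the subgradient of the norm function (the subgradient at zero is taken to be 0), but it does not handle the subgradient of the square root itself: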

Note that with this, at 0, it is now different to use .norm() or .pow(2).sum().sqrt() as the first one will return 0 and the second one NaN
https://github.com/pytorch/pytorch/pull/2775
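The difference described in that note can be reproduced with a short check. This is only a sketch, assuming a recent PyTorch release; the variable names are made up for the demonstration.

```python
import torch

# Two all-zero parameters, one for each way of computing the L2 norm.
w_norm = torch.zeros(3, requires_grad=True)
w_sqrt = torch.zeros(3, requires_grad=True)

# Built-in norm: the subgradient at zero is defined as 0.
w_norm.norm(2).backward()

# Hand-written norm: the derivative of sqrt is infinite at 0,
# and multiplying it by the zero inner gradient gives NaN.
w_sqrt.pow(2).sum().sqrt().backward()

norm_grad_is_finite = bool(torch.isfinite(w_norm.grad).all())
sqrt_grad_has_nan = bool(torch.isnan(w_sqrt.grad).any())
print(w_norm.grad, w_sqrt.grad)
```

A single NaN gradient is enough to corrupt the parameters at the next optimizer step, which is why the zero case needs explicit handling.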

The zero entries therefore have to be filtered out by some means, for example:

import torch

def groupl1norm_l1(params):
    # A 0-dim CPU tensor can be added to tensors on any device,
    # so the deprecated Variable wrapper and the .cuda() call are not needed.
    gl1_reg = torch.tensor(0.)
    for param in params:
        norm2 = param.norm(2, dim=0)  # L2 norm of each group (taken along dim 0)
        d = torch.tensor(param.shape[0]).float()  # group size
        mask = norm2.gt(0)  # use a mask to filter out groups whose L2 norm is 0
        gl1_reg = gl1_reg + torch.sqrt(d) * torch.masked_select(norm2, mask).sum()
    return gl1_reg
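A small worked example may make the masking easier to follow. The sketch below applies the same computation to a single weight matrix whose second column is an all-zero group; the function name group_l21 and the numbers are assumptions chosen for illustration.

```python
import torch

def group_l21(weight):
    """Group penalty for one matrix, with one group per column (illustrative)."""
    norm2 = weight.norm(2, dim=0)      # L2 norm of every column
    mask = norm2 > 0                   # drop groups whose norm is exactly 0
    d = weight.shape[0]                # group size
    return d ** 0.5 * norm2[mask].sum()

# Column 0 has norm 5, column 1 is an all-zero group.
w = torch.tensor([[3., 0.],
                  [4., 0.]], requires_grad=True)
reg = group_l21(w)
reg.backward()
print(reg.item(), w.grad)
```

The zero group contributes nothing to the penalty and receives a zero gradient, while the nonzero group gets the usual gradient sqrt(d) * w / ||w||.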

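Alternatively, take a blunter approach and stop relying on automatic differentiation for the regularizer: compute the gradient of the regularization term for every model parameter by hand and apply it directly: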

loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())
loss = loss_func(output, labels)
loss.backward()
for group in optimizer.param_groups:
    for p in group['params']:
        if p.grad is not None:
            ...
            # p is one of the model's layer parameters; suppose the gradient
            # matrix we derived by hand is regu_grad
            p.grad = p.grad + lmbda * regu_grad
optimizer.step()

This approach is far more flexible than the previous one, but deriving the gradients by hand takes more work.
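To make the manual route concrete, here is a minimal sketch for the L1 penalty, whose subgradient is sign(p) (with 0 at zero). The toy model, the random data, the SGD optimizer, and the choice of L1 are all assumptions made for illustration; any hand-derived regu_grad could be substituted.

```python
import torch
import torch.nn as nn

# Toy model and data, assumed here only so the sketch runs on its own.
torch.manual_seed(0)
net = nn.Linear(4, 3)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
lmbda = 1e-4

optimizer.zero_grad()
loss = loss_func(net(inputs), labels)
loss.backward()                      # gradients of the data loss only

with torch.no_grad():
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is not None:
                regu_grad = torch.sign(p)        # hand-derived L1 subgradient
                p.grad.add_(lmbda * regu_grad)   # add the penalty's gradient

# Keep a copy for inspection before the parameters are updated.
manual_grads = [p.grad.clone() for p in net.parameters()]
optimizer.step()
```

For the L1 case this produces the same gradients that autograd would give for the penalized loss; the manual form pays off when the regularizer's gradient needs special treatment that autograd does not provide.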