
About PyTorch's Loss Source Code

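Anyone who knows PyTorch knows that it carries a heavy historical burden. It absorbed Caffe2's base code, borrowed part of that code to write the logic of its various ops, and finally exposed a Python interface to users.

As a result, PyTorch's source code may feel a bit unfamiliar the first time you read it. Still, most op implementations live under ATen/native. This article focuses on Loss.cpp.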

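MarginRankingLoss

The RankingLoss family computes distances between input samples, rather than regressing directly to a target the way MSELoss does. Its main ideas are Margin and Ranking.

MarginRankingLoss official documentation

The word margin originally refers to the blank space at the edge of a page. When we print something, the blank area outside the text is called the margin.

The loss uses the word in a similar sense. Once two inputs are already separated, in the required order, by more than the margin, they count as different enough and contribute no loss.

Ranking means ordering. When target = 1, x1 should rank higher than x2. When target = -1, x2 should rank higher than x1.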
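To make the margin and ranking ideas concrete, here is a minimal scalar sketch in plain Python. It assumes the per-pair formula max(0, -target * (x1 - x2) + margin) with target in {1, -1}. The helper name `margin_ranking_loss_pair` is hypothetical, not a PyTorch function.

```python
def margin_ranking_loss_pair(x1: float, x2: float, target: int, margin: float = 0.0) -> float:
    """Loss for one pair: max(0, -target * (x1 - x2) + margin)."""
    # target = 1 asks for x1 > x2; target = -1 asks for x2 > x1
    return max(0.0, -target * (x1 - x2) + margin)


# Correct order, and the gap 0.5 already exceeds the margin 0.1 -> 0.0
print(margin_ranking_loss_pair(0.8, 0.3, target=1, margin=0.1))
# Same scores, but target = -1 demands x2 > x1 -> about 0.5 + 0.1 = 0.6
print(margin_ranking_loss_pair(0.8, 0.3, target=-1, margin=0.1))
# Correct order, but the gap 0.05 is below the margin -> about 0.05
print(margin_ranking_loss_pair(0.35, 0.3, target=1, margin=0.1))
```

The sign of target only flips which input is expected to be larger. The margin keeps penalizing pairs that are ordered correctly but not separated enough.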

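The source logic is also simple. It computes the loss elementwise with the formula

$$\mathrm{loss}(x_1, x_2, y) = \max\big(0,\ -y\,(x_1 - x_2) + \mathrm{margin}\big)$$

and then reduces the result with mean or sum, depending on reduction.

Here is a numpy implementation corresponding to PyTorch's MarginRankingLoss: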

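```python
import numpy as np


def np_margin_ranking_loss(input1, input2, target, margin, reduction):
    # elementwise max(0, -target * (input1 - input2) + margin)
    output = np.maximum(0, -target * (input1 - input2) + margin)
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        return output
```

TripletMarginLoss

TripletLoss was originally proposed for face recognition. It measures the distance between different face features, which makes face recognition and clustering possible.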

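Compared with TripletLoss, TripletMarginLoss combines the ideas of TripletLoss and MarginRankingLoss. For details, see Learning local feature descriptors with triplets and shallow convolutional neural networks. The loss is

$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + \mathrm{margin},\ 0\}$$

where d is the p-norm distance function.

The p-norm distance is defined as

$$d(x_i, y_i) = \lVert x_i - y_i \rVert_p = \Big(\sum_j |x_{ij} - y_{ij}|^p\Big)^{1/p}$$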

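Depending on the triplet, the loss falls into one of three cases.

Easy samples, i.e. $d(a_i, p_i) + \mathrm{margin} < d(a_i, n_i)$

Here the anchor-negative distance $d(a_i, n_i)$ is still larger than the anchor-positive distance plus the margin, $d(a_i, p_i) + \mathrm{margin}$. The positive is already close enough compared with the negative, so no optimization is needed and the loss is 0.

Hard samples, i.e. $d(a_i, n_i) < d(a_i, p_i)$

Here the anchor-negative distance $d(a_i, n_i)$ is smaller than the anchor-positive distance $d(a_i, p_i)$, so optimization is needed.

Semi-hard samples, i.e. $d(a_i, p_i) < d(a_i, n_i) < d(a_i, p_i) + \mathrm{margin}$

Here the anchor-positive distance $d(a_i, p_i)$ is smaller than the anchor-negative distance $d(a_i, n_i)$. However, the gap has not yet exceeded the margin, so optimization is still needed.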
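The three cases can be checked mechanically, assuming the distances are already computed. The helper names `triplet_case` and `triplet_loss` are hypothetical. The boundary handling (strict vs. non-strict inequalities) is a choice made here for illustration.

```python
def triplet_case(d_ap: float, d_an: float, margin: float) -> str:
    """Label a triplet by its anchor-positive and anchor-negative distances."""
    if d_ap + margin < d_an:
        return "easy"       # negative is farther by more than the margin: loss is 0
    if d_an < d_ap:
        return "hard"       # negative is closer to the anchor than the positive
    return "semi-hard"      # d_ap <= d_an <= d_ap + margin: right order, margin not met


def triplet_loss(d_ap: float, d_an: float, margin: float) -> float:
    """max(d(a, p) - d(a, n) + margin, 0)."""
    return max(d_ap - d_an + margin, 0.0)


for d_ap, d_an in [(0.2, 1.5), (0.8, 0.5), (0.5, 0.9)]:
    print(triplet_case(d_ap, d_an, margin=1.0), triplet_loss(d_ap, d_an, margin=1.0))
```

With margin = 1.0, the three pairs land in the easy, hard, and semi-hard cases. Only the easy one has zero loss.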

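In addition, the paper's authors proposed a concept called swap. The formula above only uses the anchor's distance to the positive, $d(a_i, p_i)$, and to the negative, $d(a_i, n_i)$.

It ignores the distance between the positive and the negative. A swap operation can account for it by also measuring $d(p_i, n_i)$. In code, the operation simply takes the minimum:

Pseudocode: if swap: d(a, n) = min(d(a, n), d(p, n)). Taking the minimum this way makes the loss larger, which helps push negative samples apart.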
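A small numeric illustration of the swap, assuming plain Euclidean distance without PyTorch's eps term. The helper names are hypothetical. The positive lies between the anchor and the negative, so d(p, n) < d(a, n), and the swap turns a zero loss into a positive one.

```python
import math


def euclidean(u, v):
    """Plain L2 distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss_with_swap(anchor, positive, negative, margin=1.0, swap=False):
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    if swap:
        # distance swap: keep the smaller (harder) of the two negative distances
        d_an = min(d_an, euclidean(positive, negative))
    return max(d_ap - d_an + margin, 0.0)


anchor, positive, negative = [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]
print(triplet_loss_with_swap(anchor, positive, negative))             # 0.0, since d(a, n) = 2
print(triplet_loss_with_swap(anchor, positive, negative, swap=True))  # 1.0, since d(p, n) = 1
```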

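With this groundwork in place, PyTorch's TripletMarginLoss source code is also very simple.

TripletMarginLoss source

at::pairwise_distance is the distance function. The code first computes the distances from the anchor to the positive and to the negative. Then, depending on the swap argument, it decides whether to also consider the positive-negative distance.

Finally, output is computed according to the formula. The corresponding numpy code is: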

```python
import numpy as np


def np_triplet_margin_loss(anchor, positive, negative, margin, swap, reduction="mean", p=2, eps=1e-6):
    def _np_distance(input1, input2, p, eps):
        # p-norm of (input1 - input2 + eps) over the last axis
        np_pnorm = np.power(np.abs(input1 - input2 + eps), p)
        np_pnorm = np.power(np.sum(np_pnorm, axis=-1), 1.0 / p)
        return np_pnorm

    dist_pos = _np_distance(anchor, positive, p, eps)
    dist_neg = _np_distance(anchor, negative, p, eps)
    if swap:
        # distance swap: also consider the positive-negative distance, keep the smaller one
        dist_swap = _np_distance(positive, negative, p, eps)
        dist_neg = np.minimum(dist_neg, dist_swap)
    output = np.maximum(margin + dist_pos - dist_neg, 0)
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        return output
```

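The easy pitfall here is the p-norm computation. Suppose the formula $\left(\sum_j x_j^p\right)^{1/p}$ is applied literally to the differences without taking absolute values. Then negative components can make the result invalid.

For p = 2, squaring hides the problem. For an odd p such as 3, however, the sum can turn negative, which is not a valid norm. Raising a negative number to the power 1/p also fails in floating point.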
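The problem is easy to reproduce in plain Python. This sketch applies the formula with and without absolute values to a difference vector with negative components. The helper names are hypothetical.

```python
import math


def pnorm_literal(diff, p):
    """(sum x_i^p)^(1/p) applied literally, WITHOUT absolute values (the pitfall)."""
    return math.pow(sum(x ** p for x in diff), 1.0 / p)


def pnorm_abs(diff, p):
    """(sum |x_i|^p)^(1/p), the actual p-norm definition."""
    return math.pow(sum(abs(x) ** p for x in diff), 1.0 / p)


diff = [-1.0, -2.0]  # e.g. anchor - positive, with negative components
print(pnorm_abs(diff, 3))  # well defined: (1 + 8) ** (1/3)
try:
    pnorm_literal(diff, 3)  # (-1) + (-8) = -9, and math.pow(-9, 1/3) is a domain error
except ValueError as err:
    print("literal formula failed:", err)
```

With p = 2 both versions agree, because squaring already removes the sign.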

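So we look for clues starting from the distance function, and find that it calls at::norm:

pairwise_distance

According to the PyTorch documentation, the norm takes absolute values during the computation. This keeps negative values out and the result well defined.

Norm documentation

KLDivLoss

This loss function computes the KL divergence (i.e. relative entropy), which can be used to measure how different two distributions are.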

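Basic definition of KL divergence:

$$D_{KL}(P\,\|\,Q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$$

The closer the distributions p and q are, the closer p(x)/q(x) gets to 1. After the log, the loss approaches 0.

When the distributions differ a lot, the loss is correspondingly high.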
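Here is a minimal sketch of the discrete definition. It assumes both arguments are probability lists of the same length, and that q has no zeros where p is positive. The function name is hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute 0 (the limit of t * log t as t -> 0).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]
print(kl_divergence(p, p))                # identical distributions -> 0.0
print(kl_divergence(p, [0.4, 0.4, 0.2]))  # close distributions -> small value
print(kl_divergence(p, [0.1, 0.1, 0.8]))  # very different distributions -> larger value
```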

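PyTorch's formula is slightly different. The input x is expected to hold log-probabilities, while the target y holds probabilities.

PyTorch's KLDivLoss formula:

$$\ell(x, y) = y\,(\log y - x)$$

With log_target=True, the target is itself a log-probability, and the formula becomes $\ell(x, y) = e^{y}\,(y - x)$.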

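Now let's look at the corresponding PyTorch source.

KLDivLoss source

Besides the usual input, target and reduction, there is an extra argument, log_target, which indicates whether target has already been passed through a log. Depending on this argument, KLDivLoss is split into two functions, _kl_div_log_target and _kl_div_non_log_target.

_kl_div_log_target is straightforward: it simply computes the formula.

_kl_div_non_log_target is slightly different, because the range of target is not guaranteed and the log of a non-positive number is invalid. PyTorch therefore creates an all-zero array. In the final loss, it fills 0 wherever target is not positive (target ≤ 0), which keeps NaN values from appearing.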
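The two branches can be mirrored pointwise in plain Python. This is an illustrative sketch, not PyTorch's code: the function names only echo `_kl_div_log_target` and `_kl_div_non_log_target`. The inputs are assumed to be log-probabilities, as KLDivLoss expects.

```python
import math


def kl_div_non_log_target(inputs, targets):
    """Pointwise target * (log(target) - input); `inputs` are log-probabilities.

    Positions with target <= 0 contribute 0 instead of evaluating log(target).
    """
    return [t * (math.log(t) - x) if t > 0 else 0.0 for x, t in zip(inputs, targets)]


def kl_div_log_target(inputs, log_targets):
    """Pointwise exp(log_target) * (log_target - input); both arguments in log space."""
    return [math.exp(lt) * (lt - x) for x, lt in zip(inputs, log_targets)]


log_q = [math.log(v) for v in (0.25, 0.25, 0.5)]  # model output, already log-probabilities

# A target with a zero entry: the guard keeps the sum finite, giving KL(p || q) = log 2
p = [0.5, 0.5, 0.0]
print(sum(kl_div_non_log_target(log_q, p)))

# With a strictly positive target, both branches give the same total
p2 = [0.2, 0.3, 0.5]
print(sum(kl_div_non_log_target(log_q, p2)))
print(sum(kl_div_log_target(log_q, [math.log(v) for v in p2])))
```

Note that the log_target branch cannot represent a zero target without reaching log(0), which is one reason the zero guard only appears in the non-log branch.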

The corresponding numpy implementation is:

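```python
import numpy as np


def np_kldivloss(input, target, log_target, reduction="mean"):
    if log_target:
        # target is already log(p): exp(target) * (target - input)
        output = np.exp(target) * (target - input)
    else:
        # np.where evaluates both branches, so silence log warnings for target <= 0
        with np.errstate(divide="ignore", invalid="ignore"):
            output_pos = target * (np.log(target) - input)
        zeros = np.zeros_like(input)
        # fill 0 wherever target <= 0, where log(target) is invalid
        output = np.where(target > 0, output_pos, zeros)
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        return output
```

BCEWithLogitsLoss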

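Anyone familiar with the binary cross-entropy loss BCELoss knows that its input is a class probability in the range 0 to 1, from which the cross-entropy is computed. BCEWithLogitsLoss instead takes raw logits and applies the sigmoid itself. Let's first look at its parameters.

BCEWithLogitsLoss parameters

weight: a rescaling weight applied to the final loss

reduction: whether to apply mean, sum, or none to the final output

pos_weight: a weight for positive samples, i.e. the positive weight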

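Its formula is shown below. Here σ denotes the sigmoid function, $w_n$ the weight, and $p_n$ the pos_weight:

$$\ell_n = -w_n\left[p_n\,y_n\log\sigma(x_n) + (1-y_n)\log\left(1-\sigma(x_n)\right)\right]$$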
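For reference, here is a direct transcription of this formula for a single scalar logit. The helper is hypothetical and written in plain Python. It is deliberately naive: it computes the sigmoid first and then takes logs.

```python
import math


def naive_bce_with_logits(x, y, weight=1.0, pos_weight=1.0):
    """-weight * [pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].

    Numerically naive on purpose: the sigmoid is computed before the logs.
    """
    s = 1.0 / (1.0 + math.exp(-x))  # sigmoid(x)
    return -weight * (pos_weight * y * math.log(s) + (1.0 - y) * math.log(1.0 - s))


print(naive_bce_with_logits(0.0, 1.0))                  # log 2, since sigmoid(0) = 0.5
print(naive_bce_with_logits(0.0, 1.0, pos_weight=3.0))  # positive term scaled by 3
print(naive_bce_with_logits(0.0, 0.0, pos_weight=3.0))  # negatives are unaffected by pos_weight
```

At x = 0 every loss is log 2 scaled by the relevant weights. pos_weight only touches the y = 1 term. For large |x| this version breaks down: `math.exp(-x)` overflows for very negative x, and `1.0 - s` becomes exactly 0 for very positive x.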

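BCEWithLogitsLoss is equivalent to sigmoid + BCELoss. For better numerical stability, though, PyTorch does not actually implement it that way. Let's look at the corresponding source code.

PyTorch's BCEWithLogitsLoss source

This source code is not very intuitive to read, so let's look at the corresponding numpy code: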

```python
import numpy as np


def np_bce_with_logits_loss(np_input, np_target, np_weight=None, np_pos_weight=None, reduction="mean"):
    # max_val = max(-x, 0) keeps both exponents below <= 0 (log-sum-exp trick)
    max_val = np.maximum(-np_input, 0)
    # log(1 + exp(-x)) = max_val + log(exp(-max_val) + exp(-x - max_val))
    log1p_exp = np.log(np.exp(-max_val) + np.exp(-np_input - max_val)) + max_val
    if np_pos_weight is not None:
        # pos_weight scales only the positive (y = 1) part of the log term
        log_weight = ((np_pos_weight - 1) * np_target) + 1
        loss = (1 - np_target) * np_input + log_weight * log1p_exp
    else:
        loss = (1 - np_target) * np_input + log1p_exp
    output = loss if np_weight is None else loss * np_weight
    if reduction == "mean":
        return np.mean(output)
    elif reduction == "sum":
        return np.sum(output)
    else:
        return output
```

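Because a sigmoid is involved, the log terms can be rewritten with the following identities:

$$\log\sigma(x) = -\log\left(1+e^{-x}\right),\qquad \log\left(1-\sigma(x)\right) = -x - \log\left(1+e^{-x}\right)$$

so the per-element loss becomes

$$\ell_n = w_n\left[(1-y_n)\,x_n + \big((p_n-1)\,y_n + 1\big)\log\left(1+e^{-x_n}\right)\right]$$

If x is very large or very small, the exponential in this computation overflows or underflows. We can use the log-sum-exp trick to avoid this; the derivation follows (special thanks to Depeng!).

Summary of the derivation: with $m = \max(-x, 0)$,

$$\log\left(1+e^{-x}\right) = m + \log\left(e^{-m} + e^{-x-m}\right)$$

Both exponents, $-m$ and $-x-m$, are at most 0, so neither exponential can overflow. This is exactly the role of max_val in the numpy code.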
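Here is the same trick in scalar form, as a plain-Python sketch with a hypothetical helper name that mirrors the numpy reference. Because both exponents are non-positive, it stays finite at x = ±1000. The naive sigmoid route fails there: `math.exp(1000)` raises OverflowError, and `math.log(1 - sigmoid(1000))` hits log(0).

```python
import math


def stable_bce_with_logits(x, y, weight=1.0, pos_weight=1.0):
    """weight * [(1 - y) * x + l_w * log(1 + exp(-x))], with l_w = (pos_weight - 1) * y + 1."""
    max_val = max(-x, 0.0)
    # log(1 + exp(-x)) = max_val + log(exp(-max_val) + exp(-x - max_val)); both exponents <= 0
    log1p_exp = max_val + math.log(math.exp(-max_val) + math.exp(-x - max_val))
    log_weight = (pos_weight - 1.0) * y + 1.0
    return weight * ((1.0 - y) * x + log_weight * log1p_exp)


for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    # columns: logit, loss for y = 1, loss for y = 0
    print(x, stable_bce_with_logits(x, 1.0), stable_bce_with_logits(x, 0.0))
```

For moderate logits it matches the direct formula. At the extremes it returns the correct limits, e.g. a loss of 1000 for x = -1000 with y = 1.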

Reading source code is not as hard as it seems. Drop the mystique, be willing to try, and you too can lift the veil on the source~