PaddlePaddle/Paddle

CrossEntropyLoss documentation formulas, and how the loss is computed when the weight parameter is provided

NKNaN opened this issue · 3 comments

NKNaN commented

Document Links & Description

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/CrossEntropyLoss_cn.html

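  • In the first four formulas, if the summation starts at i=0, its upper limit should be C-1.
  • I. Softmax cross entropy > 2.2. When use_softmax=False, I think the formula should be
    $$loss_j = -\sum_{i=0}^{C-1}(label_i \cdot \log(P_{i})), j=1,...,N$$
    With soft labels, $label = [label_0,...,label_{C-1}]$ holds the actual class probabilities, so each $label_i$ is a probability rather than a class index, and the original $P_{label_{i}}$ is incorrect.
  • II. Weight and reduction handling > 1.2. Soft labels (soft_label = True)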
        if weight is not None:
            # trans weight from class to sample, shape:N or [N,H,W] for 1d and 2d cases.
            if soft_label:
                # chajchaj:
                # weight's shape is C, where C is class num.
                # for 1d case: label's shape is [N,C], weight_gather's shape is N.
                # for 2d case: label's shape is [N,H,W,C], weight_gather's shape is [N,H,W].
                weight_gather = paddle.matmul(
                    x=paddle.cast(label, weight.dtype),
                    y=weight,
                    transpose_x=False,
                    transpose_y=True,
                )
                out_shape = list(out.shape)
                weight_gather_reshape = reshape(weight_gather, shape=out_shape)
                out = paddle.cast(out, weight_gather_reshape.dtype)

                out = _C_ops.multiply(out, weight_gather_reshape)
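
To make the shapes in the excerpt above concrete, here is a minimal NumPy sketch of the 1-D soft-label case. The sample values are made up for illustration, and `label @ weight` stands in for the `paddle.matmul` call. It is not Paddle's implementation, only the same contraction over the class axis.

```python
import numpy as np

# Hypothetical 1-D case: N = 2 samples, C = 3 classes (values are illustrative).
label = np.array([[0.2, 0.3, 0.5],
                  [1.0, 0.0, 0.0]])   # soft labels, shape [N, C]
weight = np.array([0.5, 1.0, 2.0])     # per-class weights, shape [C]

# Contract the class axis: each sample gets sum_i label_i * weight_i.
weight_gather = label @ weight         # shape [N]

# Stand-in for the unweighted per-sample loss `out`, shape [N].
out = np.array([1.0, 2.0])

# Element-wise scaling of each sample's loss by its gathered weight.
weighted = out * weight_gather.reshape(out.shape)
print(weight_gather.shape, weight_gather, weighted)
```

Each sample ends up with a single scalar weight, so the class weights are no longer applied term by term inside the cross-entropy sum.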

The Paddle documentation currently states
$$loss_j = loss_j^{\prime} \cdot \sum_{i} (weight[label_i] \cdot logits_i)$$
Judging from the code, however, this step actually computes
$$loss_j = loss_j^{\prime} \cdot \sum_{i=0}^{C-1} label_i \cdot weight_i$$
$$loss_j^{\prime} = -\sum_{i=0}^{C-1} label_i \cdot (logits_i - \log(\sum_{k=0}^{C-1} \exp(logits_k)))$$
where $loss_j^{\prime}$ is `out` in the code, with shape [N] or [N,H,W].
I don't quite understand this part. According to the torch documentation, it should be
$$loss_j = -\sum_{i=0}^{C-1}weight_i \cdot label_i \cdot (logits_i - \log(\sum_{k=0}^{C-1} \exp(logits_k)))$$
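
A short sketch of how the code's form and the torch form relate, using only the formulas above. The symbols $c$ (the true class of a one-hot label) and $\log p_i = logits_i - \log(\sum_{k=0}^{C-1} \exp(logits_k))$ are shorthand introduced here. For a one-hot label with $label_c = 1$, both forms reduce to the same expression:
$$loss_j = -weight_c \cdot \log p_c$$
For a genuinely soft label they differ in general, because the code's form multiplies two separate sums while the torch form weights each term inside a single sum:
$$\left(\sum_{i=0}^{C-1} label_i \cdot weight_i\right) \cdot \left(-\sum_{i=0}^{C-1} label_i \cdot \log p_i\right) \neq -\sum_{i=0}^{C-1} weight_i \cdot label_i \cdot \log p_i$$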

Please give your suggestion

No response

NKNaN commented
import paddle

inp = paddle.to_tensor([1.1, 1.2, 1.3, 1.4])
y = paddle.to_tensor([0.1, 0.2, 0.3, 0.4])
w = paddle.to_tensor([0.4, 0.3, 0.2, 0.1])
paddle.nn.functional.cross_entropy(inp, y, weight=w, reduction="none", soft_label=True)
# Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
#        [0.26850712])
import torch

inp = torch.tensor([1.1, 1.2, 1.3, 1.4])
y = torch.tensor([0.1, 0.2, 0.3, 0.4])
w = torch.tensor([0.4, 0.3, 0.2, 0.1])
torch.nn.functional.cross_entropy(inp, y, weight=w, reduction="none")
# tensor(0.2785)
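## Computation by Paddle (with weight)
import numpy as np

logits = np.array([1.1, 1.2, 1.3, 1.4])
y = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([0.4, 0.3, 0.2, 0.1])
s = 0
denom_for_logits = np.sum(np.exp(logits))
normalized_logits = np.exp(logits) / denom_for_logits  # softmax probabilities

# Unweighted soft-label cross entropy: -sum_c y[c] * log(p[c])
for c in range(len(logits)):
    s += -np.log(normalized_logits)[c] * y[c]
# Scale the whole loss by a single per-sample weight: sum_c w[c] * y[c]
s = s * np.dot(w, y)
print(s)
# 0.26850710589103255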
## Computation by torch (with weight)
import numpy as np

logits = np.array([1.1, 1.2, 1.3, 1.4])
y = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([0.4, 0.3, 0.2, 0.1])
s = 0
denom_for_logits = np.sum(np.exp(logits))
normalized_logits = np.exp(logits) / denom_for_logits  # softmax probabilities

# Apply the class weight to each term: -sum_c w[c] * y[c] * log(p[c])
for c in range(len(logits)):
    s += -w[c] * np.log(normalized_logits)[c] * y[c]
print(s)
# 0.27850710589103256
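
The two hand computations above can also be written as one batched NumPy helper. This is only a sketch: `weighted_soft_ce` and its `per_class` flag are hypothetical names introduced here, not part of Paddle or torch. The first row reuses the soft label from the snippets above. The second row is a one-hot label added for comparison.

```python
import numpy as np

def weighted_soft_ce(logits, label, weight, per_class):
    """Weighted soft-label cross entropy over the last (class) axis.

    per_class=False: loss' * sum_i(label_i * weight_i)      (the Paddle-style computation above)
    per_class=True:  -sum_i weight_i * label_i * log(p_i)   (the torch-style computation above)
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    if per_class:
        return -(weight * label * log_p).sum(axis=-1)
    unweighted = -(label * log_p).sum(axis=-1)
    return unweighted * (label * weight).sum(axis=-1)

logits = np.array([[1.1, 1.2, 1.3, 1.4],
                   [1.1, 1.2, 1.3, 1.4]])
label = np.array([[0.1, 0.2, 0.3, 0.4],    # soft label from the snippets above
                  [0.0, 0.0, 0.0, 1.0]])   # one-hot label, added for comparison
weight = np.array([0.4, 0.3, 0.2, 0.1])

sample_weighted = weighted_soft_ce(logits, label, weight, per_class=False)
class_weighted = weighted_soft_ce(logits, label, weight, per_class=True)
print(sample_weighted)
print(class_weighted)
```

With these inputs the two results differ for the soft-label row and agree for the one-hot row, which suggests the discrepancy only shows up when `soft_label=True` is used with non-one-hot labels.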

@NKNaN Would you like to submit a PR to fix this directly?

Sure, I can do that.