CrossEntropyLoss documentation formulas and the computation when the weight parameter is provided
NKNaN opened this issue · 3 comments
NKNaN commented
Document Links & Description
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/CrossEntropyLoss_cn.html
- In the first four formulas, if the summation index starts at i=0 it should presumably run up to C-1.
- I. softmax cross entropy > 2.2. When use_softmax=False, the formula should presumably be
$$loss_j = -\sum_{i=0}^{C-1}(label_i \cdot \log(P_{i})), j=1,...,N$$
because if $label = [label_0,...,label_{C-1}]$ represents the actual per-class probabilities, the $P_{label_{i}}$ written in the current docs does not make sense.
- II. Weight and reduction handling > 1.2. Soft labels (soft_label = True)
if weight is not None:
    # trans weight from class to sample, shape:N or [N,H,W] for 1d and 2d cases.
    if soft_label:
        # chajchaj:
        # weight's shape is C, where C is class num.
        # for 1d case: label's shape is [N,C], weight_gather's shape is N.
        # for 2d case: label's shape is [N,H,W,C], weight_gather's shape is [N,H,W].
        weight_gather = paddle.matmul(
            x=paddle.cast(label, weight.dtype),
            y=weight,
            transpose_x=False,
            transpose_y=True,
        )
        out_shape = list(out.shape)
        weight_gather_reshape = reshape(weight_gather, shape=out_shape)
        out = paddle.cast(out, weight_gather_reshape.dtype)
        out = _C_ops.multiply(out, weight_gather_reshape)
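For intuition, here is a minimal numpy sketch (values and shapes are invented for illustration, not taken from the issue) of what the `weight_gather` above computes in the 1-D soft-label case:

```python
import numpy as np

# hypothetical 1-D case: N = 2 samples, C = 3 classes
label = np.array([[0.2, 0.3, 0.5],
                  [0.7, 0.1, 0.2]])   # soft labels, shape [N, C]
weight = np.array([0.4, 0.3, 0.2])    # per-class weight, shape [C]

# matmul(label, weight) yields one scalar per sample: sum_i label_i * weight_i
weight_gather = label @ weight        # shape [N] -> [0.27, 0.35]
```

The unreduced loss `out` (shape [N]) is then multiplied elementwise by this per-sample factor.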
The paddle documentation currently writes one formula for this step (section II.1.2 of the linked page). Looking at the code, the handling is

$$loss_j = \left(\sum_{i=0}^{C-1} weight_i \cdot label_i\right) \cdot loss_j, \quad j = 1,...,N$$

i.e. `out`, whose shape is [N] or [N,H,W], is multiplied elementwise by `weight_gather_reshape`. I don't quite understand this part; if I follow the torch documentation it should be

$$loss_j = -\sum_{i=0}^{C-1} weight_i \cdot label_i \cdot \log(P_i), \quad j = 1,...,N$$
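As a side note (my own algebra, not something stated in either documentation): the two expressions only differ for genuinely soft labels combined with a non-uniform weight vector, since

$$-\sum_{i=0}^{C-1} w \cdot label_i \cdot \log(P_i) = w \cdot \left(-\sum_{i=0}^{C-1} label_i \cdot \log(P_i)\right) = \left(\sum_{i=0}^{C-1} w \cdot label_i\right) \cdot loss_j$$

when every $weight_i$ equals the same constant $w$ (using $\sum_i label_i = 1$), and for a one-hot label on class $k$ both reduce to $-weight_k \cdot \log(P_k)$.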
Please give your suggestion
No response
NKNaN commented
import paddle

inp = paddle.to_tensor([1.1, 1.2, 1.3, 1.4])
y = paddle.to_tensor([0.1, 0.2, 0.3, 0.4])
w = paddle.to_tensor([0.4, 0.3, 0.2, 0.1])
paddle.nn.functional.cross_entropy(inp, y, weight=w, reduction="none", soft_label=True)
# Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
#        [0.26850712])
import torch

inp = torch.tensor([1.1, 1.2, 1.3, 1.4])
y = torch.tensor([0.1, 0.2, 0.3, 0.4])
w = torch.tensor([0.4, 0.3, 0.2, 0.1])
torch.nn.functional.cross_entropy(inp, y, weight=w, reduction="none")
# tensor(0.2785)
## computation by paddle (with weight)
import numpy as np

logits = np.array([1.1, 1.2, 1.3, 1.4])
y = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([0.4, 0.3, 0.2, 0.1])
s = 0
denom_for_logits = np.sum(np.exp(logits))
normalized_logits = np.exp(logits) / denom_for_logits  # softmax probabilities
for c in range(len(logits)):
    s += -np.log(normalized_logits)[c] * y[c]  # unweighted soft-label cross entropy
s = s * np.dot(w, y)                           # weight applied once, outside the sum
print(s)
# 0.26850710589103255
## computation by torch (with weight)
logits = np.array([1.1, 1.2, 1.3, 1.4])
y = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([0.4, 0.3, 0.2, 0.1])
s = 0
denom_for_logits = np.sum(np.exp(logits))
normalized_logits = np.exp(logits) / denom_for_logits  # softmax probabilities
for c in range(len(logits)):
    s += -w[c] * np.log(normalized_logits)[c] * y[c]   # weight applied per class, inside the sum
print(s)
# 0.27850710589103256
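A compact, vectorized version of the two loops above (my own rewrite using a log-softmax; variable names are only illustrative) that reproduces both numbers:

```python
import numpy as np

logits = np.array([1.1, 1.2, 1.3, 1.4])
y = np.array([0.1, 0.2, 0.3, 0.4])   # soft labels
w = np.array([0.4, 0.3, 0.2, 0.1])   # per-class weights

log_p = logits - np.log(np.sum(np.exp(logits)))   # log-softmax of the logits

paddle_style = -np.sum(y * log_p) * np.dot(w, y)  # weight applied outside the sum
torch_style = -np.sum(w * y * log_p)              # weight applied inside the sum

print(paddle_style)  # ~0.268507, matches the paddle result above
print(torch_style)   # ~0.278507, matches the torch result above
```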