Pytorch中Softmax和LogSoftmax的使用详解

2025-04-01 16:09:00

一、函数解释

1.Softmax函数常用的用法是指定参数dim就可以：

（1）dim=0：对每一列的所有元素进行softmax运算，并使得每一列所有元素和为1。

（2）dim=1：对每一行的所有元素进行softmax运算，并使得每一行所有元素和为1。

class Softmax(Module):
    r"""Applies the Softmax function to an n-dimensional input Tensor
    rescaling them so that the elements of the n-dimensional output Tensor
    lie in the range [0,1] and sum to 1.
    Softmax is defined as:
    .. math::
        \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
    Shape:
        - Input: :math:`(*)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(*)`, same shape as the input
    Returns:
        a Tensor of the same dimension and shape as the input with
        values in the range [0, 1]
    Arguments:
        dim (int): A dimension along which Softmax will be computed (so every slice
            along dim will sum to 1).
    .. note::
        This module doesn't work directly with NLLLoss,
        which expects the Log to be computed between the Softmax and itself.
        Use `LogSoftmax` instead (it's faster and has better numerical properties).
    Examples::
        >>> m = nn.Softmax(dim=1)
        >>> input = torch.randn(2, 3)
        >>> output = m(input)
    """
    __constants__ = ['dim']

    def __init__(self, dim=None):
        super(Softmax, self).__init__()
        self.dim = dim

    def __setstate__(self, state):
        self.__dict__.update(state)
        if not hasattr(self, 'dim'):
            self.dim = None

    def forward(self, input):
        return F.softmax(input, self.dim, _stacklevel=5)

    def extra_repr(self):
        return 'dim={dim}'.format(dim=self.dim)

2.LogSoftmax其实就是对softmax的结果进行log，即Log(Softmax(x))

class LogSoftmax(Module):
    r"""Applies the :math:`\log(\text{Softmax}(x))` function to an n-dimensional
    input Tensor. The LogSoftmax formulation can be simplified as:
    .. math::
        \text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right)
    Shape:
        - Input: :math:`(*)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(*)`, same shape as the input
    Arguments:
        dim (int): A dimension along which LogSoftmax will be computed.
    Returns:
        a Tensor of the same dimension and shape as the input with
        values in the range [-inf, 0)
    Examples::
        >>> m = nn.LogSoftmax()
        >>> input = torch.randn(2, 3)
        >>> output = m(input)
    """
    __constants__ = ['dim']

    def __init__(self, dim=None):
        super(LogSoftmax, self).__init__()
        self.dim = dim

    def __setstate__(self, state):
        self.__dict__.update(state)
        if not hasattr(self, 'dim'):
            self.dim = None

    def forward(self, input):
        return F.log_softmax(input, self.dim, _stacklevel=5)

二、代码示例

输入代码

import torch
import torch.nn as nn
import numpy as np

batch_size = 4
class_num = 6
inputs = torch.randn(batch_size, class_num)
for i in range(batch_size):
    for j in range(class_num):
        inputs[i][j] = (i + 1) * (j + 1)

print("inputs:", inputs)

得到大小batch_size为4，类别数为6的向量（可以理解为经过最后一层得到）

tensor([[ 1., 2., 3., 4., 5., 6.],
[ 2., 4., 6., 8., 10., 12.],
[ 3., 6., 9., 12., 15., 18.],
[ 4., 8., 12., 16., 20., 24.]])

接着我们对该向量每一行进行Softmax

Softmax = nn.Softmax(dim=1)
probs = Softmax(inputs)
print("probs:\n", probs)

得到

tensor([[4.2698e-03, 1.1606e-02, 3.1550e-02, 8.5761e-02, 2.3312e-01, 6.3369e-01],
[3.9256e-05, 2.9006e-04, 2.1433e-03, 1.5837e-02, 1.1702e-01, 8.6467e-01],
[2.9067e-07, 5.8383e-06, 1.1727e-04, 2.3553e-03, 4.7308e-02, 9.5021e-01],
[2.0234e-09, 1.1047e-07, 6.0317e-06, 3.2932e-04, 1.7980e-02, 9.8168e-01]])

此外，我们对该向量每一行进行LogSoftmax

LogSoftmax = nn.LogSoftmax(dim=1)
log_probs = LogSoftmax(inputs)
print("log_probs:\n", log_probs)

得到

tensor([[-5.4562e+00, -4.4562e+00, -3.4562e+00, -2.4562e+00, -1.4562e+00, -4.5619e-01],
[-1.0145e+01, -8.1454e+00, -6.1454e+00, -4.1454e+00, -2.1454e+00, -1.4541e-01],
[-1.5051e+01, -1.2051e+01, -9.0511e+00, -6.0511e+00, -3.0511e+00, -5.1069e-02],
[-2.0018e+01, -1.6018e+01, -1.2018e+01, -8.0185e+00, -4.0185e+00, -1.8485e-02]])

验证每一行元素和是否为1

# probs_sum in dim=1
probs_sum = [0 for i in range(batch_size)]

for i in range(batch_size):
    for j in range(class_num):
        probs_sum[i] += probs[i][j]
    print(i, "row probs sum:", probs_sum[i])

得到每一行的和，看到确实为1

0 row probs sum: tensor(1.)
1 row probs sum: tensor(1.0000)
2 row probs sum: tensor(1.)
3 row probs sum: tensor(1.)

验证LogSoftmax是对Softmax的结果进行Log

# to numpy
np_probs = probs.data.numpy()
print("numpy probs:\n", np_probs)

# np.log()
log_np_probs = np.log(np_probs)
print("log numpy probs:\n", log_np_probs)

得到

numpy probs:
[[4.26977826e-03 1.16064614e-02 3.15496325e-02 8.57607946e-02 2.33122006e-01 6.33691311e-01]
[3.92559559e-05 2.90064461e-04 2.14330270e-03 1.58369839e-02 1.17020354e-01 8.64669979e-01]
[2.90672347e-07 5.83831024e-06 1.17265590e-04 2.35534250e-03 4.73083146e-02 9.50212955e-01]
[2.02340233e-09 1.10474026e-07 6.03167746e-06 3.29318427e-04 1.79801770e-02 9.81684387e-01]]
log numpy probs:
[[-5.4561934e+00 -4.4561934e+00 -3.4561934e+00 -2.4561932e+00 -1.4561933e+00 -4.5619333e-01]
[-1.0145408e+01 -8.1454077e+00 -6.1454072e+00 -4.1454072e+00 -2.1454074e+00 -1.4540738e-01]
[-1.5051069e+01 -1.2051069e+01 -9.0510693e+00 -6.0510693e+00 -3.0510693e+00 -5.1069155e-02]
[-2.0018486e+01 -1.6018486e+01 -1.2018485e+01 -8.0184851e+00 -4.0184855e+00 -1.8485421e-02]]

验证完毕

三、整体代码

import torch
import torch.nn as nn
import numpy as np

batch_size = 4
class_num = 6
inputs = torch.randn(batch_size, class_num)
for i in range(batch_size):
    for j in range(class_num):
        inputs[i][j] = (i + 1) * (j + 1)

print("inputs:", inputs)
Softmax = nn.Softmax(dim=1)
probs = Softmax(inputs)
print("probs:\n", probs)

LogSoftmax = nn.LogSoftmax(dim=1)
log_probs = LogSoftmax(inputs)
print("log_probs:\n", log_probs)

# probs_sum in dim=1
probs_sum = [0 for i in range(batch_size)]

for i in range(batch_size):
    for j in range(class_num):
        probs_sum[i] += probs[i][j]
    print(i, "row probs sum:", probs_sum[i])

# to numpy
np_probs = probs.data.numpy()
print("numpy probs:\n", np_probs)

# np.log()
log_np_probs = np.log(np_probs)
print("log numpy probs:\n", log_np_probs)

基于pytorch softmax,logsoftmax 表达

import torch
import numpy as np
input = torch.autograd.Variable(torch.rand(1, 3))

print(input)
print('softmax={}'.format(torch.nn.functional.softmax(input, dim=1)))
print('logsoftmax={}'.format(np.log(torch.nn.functional.softmax(input, dim=1))))

以上为个人经验，希望能给大家一个参考，也希望大家多多支持我们。

PyTorch的SoftMax交叉熵损失和梯度用法

在PyTorch中可以方便的验证SoftMax交叉熵损失和对输入梯度的计算关于softmax_cross_entropy求导的过程,可以参考HERE 示例: # -*- coding: utf-8 -*- import torch import torch.autograd as autograd from torch.autograd import Variable import torch.nn.functional as F import torch.nn as nn import nu
浅谈pytorch中torch.max和F.softmax函数的维度解释

在利用torch.max函数和F.Ssoftmax函数时,对应该设置什么维度,总是有点懵,遂总结一下: 首先看看二维tensor的函数的例子: import torch import torch.nn.functional as F input = torch.randn(3,4) print(input) tensor([[-0.5526, -0.0194, 2.1469, -0.2567], [-0.3337, -0.9229, 0.0376, -0.0801], [ 1.4721, 0.1
PyTorch: Softmax多分类实战操作

多分类一种比较常用的做法是在最后一层加softmax归一化,值最大的维度所对应的位置则作为该样本对应的类.本文采用PyTorch框架,选用经典图像数据集mnist学习一波多分类. MNIST数据集 MNIST 数据集(手写数字数据集)来自美国国家标准与技术研究所, National Institute of Standards and Technology (NIST). 训练集 (training set) 由来自 250 个不同人手写的数字构成, 其中 50% 是高中学生, 50% 来自人口
Pytorch中Softmax和LogSoftmax的使用详解

一.函数解释 1.Softmax函数常用的用法是指定参数dim就可以: (1)dim=0:对每一列的所有元素进行softmax运算,并使得每一列所有元素和为1. (2)dim=1:对每一行的所有元素进行softmax运算,并使得每一行所有元素和为1. class Softmax(Module): r"""Applies the Softmax function to an n-dimensional input Tensor rescaling them so that th
对Pytorch中nn.ModuleList 和 nn.Sequential详解

简而言之就是,nn.Sequential类似于Keras中的贯序模型,它是Module的子类,在构建数个网络层之后会自动调用forward()方法,从而有网络模型生成.而nn.ModuleList仅仅类似于pytho中的list类型,只是将一系列层装入列表,并没有实现forward()方法,因此也不会有网络模型产生的副作用. 需要注意的是,nn.ModuleList接受的必须是subModule类型,例如: nn.ModuleList( [nn.ModuleList([Conv(inp_dim
在Pytorch中计算卷积方法的区别详解(conv2d的区别)

在二维矩阵间的运算: class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True) 对由多个特征平面组成的输入信号进行2D的卷积操作.详解 torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
PyTorch中torch.manual_seed()的用法实例详解

目录一.torch.manual_seed(seed) 介绍 torch.manual_seed(seed) 功能描述语法参数返回二.类似函数的功能三.实例实例 1 :不设随机种子,生成随机数实例 2 :设置随机种子,使得每次运行代码生成的随机数都一样实例 3 :不同的随机种子生成不同的值总结一.torch.manual_seed(seed) 介绍 torch.manual_seed(seed) 功能描述设置 CPU 生成随机数的种子 ,方便下次复现实验结果. 为 CP
PyTorch中的拷贝与就地操作详解

前言 PyTroch中我们经常使用到Numpy进行数据的处理,然后再转为Tensor,但是关系到数据的更改时我们要注意方法是否是共享地址,这关系到整个网络的更新.本篇就In-palce操作,拷贝操作中的注意点进行总结. In-place操作 pytorch中原地操作的后缀为_,如.add_()或.scatter_(),就地操作是直接更改给定Tensor的内容而不进行复制的操作,即不会为变量分配新的内存.Python操作类似+=或*=也是就地操作.(我加了我自己~) 为什么in-place操作可以
Pytorch中TensorBoard及torchsummary的使用详解

1.TensorBoard神经网络可视化工具 TensorBoard是一个强大的可视化工具,在pytorch中有两种调用方法: 1.from tensorboardX import SummaryWriter 这种方法是在官方还不支持tensorboard时网上有大神写的 2.from torch.utils.tensorboard import SummaryWriter 这种方法是后来更新官方加入的 1.1 调用方法 1.1.1 创建接口SummaryWriter 功能:创建接口调用方法:
Pytorch中的学习率衰减及其用法详解

Pytorch 学习率衰减及其用法学习率衰减是一个非常有效的炼丹技巧之一,在神经网络的训练过程中,当accuracy出现震荡或loss不再下降时,进行适当的学习率衰减是一个行之有效的手段,很多时候能明显提高accuracy. Pytorch中有两种学习率调整(衰减)方法: 使用库函数进行调整: 手动调整. 1. 使用库函数进行调整: Pytorch学习率调整策略通过 torch.optim.lr_sheduler 接口实现.pytorch提供的学习率调整策略分为三大类,分别是: (1)有序调整
PyTorch中torch.nn.functional.cosine_similarity使用详解

目录概述按照dim=0求余弦相似: 按照dim=1求余弦相似: 总结概述根据官网文档的描述,其中 dim表示沿着对应的维度计算余弦相似.那么怎么理解呢? 首先,先介绍下所谓的dim: a = torch.tensor([[ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] ], dtype=torch.float) print(a.shape) """ [ [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] ] &qu
Pytorch中Softmax与LogSigmoid的对比分析

Pytorch中Softmax与LogSigmoid的对比 torch.nn.Softmax 作用: 1.将Softmax函数应用于输入的n维Tensor,重新改变它们的规格,使n维输出张量的元素位于[0,1]范围内,并求和为1. 2.返回的Tensor与原Tensor大小相同,值在[0,1]之间. 3.不建议将其与NLLLoss一起使用,可以使用LogSoftmax代替之. 4.Softmax的公式: 参数: 维度,待使用softmax计算的维度. 例子: # 随机初始化一个tensor a
Pytorch十九种损失函数的使用详解

损失函数通过torch.nn包实现, 1 基本用法 criterion = LossCriterion() #构造函数有自己的参数 loss = criterion(x, y) #调用标准时也有参数 2 损失函数 2-1 L1范数损失 L1Loss 计算 output 和 target 之差的绝对值. torch.nn.L1Loss(reduction='mean') 参数: reduction-三个值,none: 不使用约简:mean:返回loss和的平均值: sum:返回loss的和.默认: