Tensorflow中k.gradients()和tf.stop_gradient()用法说明

2025-10-23 09:16:40

上周在实验室开荒某个代码，看到中间这么一段，对Tensorflow中的stop_gradient()还不熟悉，特此周末进行重新并总结。

y = xx + K.stop_gradient(rounded - xx)

这代码最终调用位置在tensoflow.python.ops.gen_array_ops.stop_gradient(input, name=None)，关于这段代码为什么这样写的意义在文末给出。

【stop_gradient()意义】

用stop_gradient生成损失函数w.r.t.的梯度。

【tf.gradients()理解】

tf中我们只需要设计我们自己的函数，tf提供提供强大的自动计算函数梯度方法，tf.gradients()。

tf.gradients(
 ys,
 xs,
 grad_ys=None,
 name='gradients',
 colocate_gradients_with_ops=False,
 gate_gradients=False,
 aggregation_method=None,
 stop_gradients=None,
 unconnected_gradients=tf.UnconnectedGradients.NONE
)

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

1、tf.gradients()实现ys对xs的求导

2、ys和xs可以是Tensor或者list包含的Tensor

3、求导返回值是一个list，list的长度等于len(xs)

eg.假设返回值是[grad1, grad2, grad3]，ys=[y1, y2]，xs=[x1, x2, x3]。则计算过程为:

import numpy as np
import tensorflow as tf

#构造数据集
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)
gradients_node = tf.gradients(loss_op, w)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for i in range(20):
 _, gradients, loss = sess.run([train_op, gradients_node, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
 print("epoch: {} \t loss: {} \t gradients: {}".format(i, loss, gradients))
sess.close()

自定义梯度和更新函数

import numpy as np
import tensorflow as tf

#构造数据集
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
# train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

#自定义权重更新
grad_w, grad_b = tf.gradients(loss_op, [w, b])
new_w = w.assign(w - 0.01 * grad_w)
new_b = b.assign(b - 0.01 * grad_b)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(20):
 _, gradients, loss = sess.run([new_w, new_b, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
 print("epoch: {} \t loss: {} \t gradients: {}".format(i, loss, gradients))
sess.close()

【tf.stop_gradient()理解】

在tf.gradients()参数中存在stop_gradients，这是一个List，list中的元素是tensorflow graph中的op，一旦进入这个list，将不会被计算梯度，更重要的是，在该op之后的BP计算都不会运行。

import numpy as np
import tensorflow as tf

a = tf.constant(0.)
b = 2 * a
c = a + b
g = tf.gradients(c, [a, b])

with tf.Session() as sess:
 tf.global_variables_initializer().run()
 print(sess.run(g))

#输出[3.0, 1.0]

在用一个stop_gradient()的例子

import tensorflow as tf

#实验一
w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)

# b=w1*3.0*w2
b = tf.multiply(a_stoped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
#输出[None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]

#实验二
a = tf.Variable(1.0)
b = tf.Variable(1.0)
c = tf.add(a, b)
c_stoped = tf.stop_gradient(c)
d = tf.add(a, b)
e = tf.add(c_stoped, d)
gradients = tf.gradients(e, xs=[a, b])
with tf.Session() as sess:
 tf.global_variables_initializer().run()
 print(sess.run(gradients))

#因为梯度从另外地方传回，所以输出 [1.0, 1.0]

【答案】

开始提出的问题，为什么存在那段代码：

t = g(x)

y = t + tf.stop_gradient(f(x) - t)

这里，我们本来的前向传递函数是XX，但是想要在反向时传递的函数是g(x)，因为在前向过程中，tf.stop_gradient()不起作用，因此+t和-t抵消掉了，只剩下f(x)前向传递；而在反向过程中，因为tf.stop_gradient()的作用，使得f(x)-t的梯度变为了0，从而只剩下g(x)在反向传递。

以上这篇Tensorflow中k.gradients()和tf.stop_gradient()用法说明就是小编分享给大家的全部内容了，希望能给大家一个参考，也希望大家多多支持我们。

Keras 使用 Lambda层详解

我就废话不多说了,大家还是直接看代码吧! from tensorflow.python.keras.models import Sequential, Model from tensorflow.python.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Dropout, Conv2DTranspose, Lambda, Input, Reshape, Add, Multiply from tensorflow.python.ker
Keras之自定义损失(loss)函数用法说明

在Keras中可以自定义损失函数,在自定义损失函数的过程中需要注意的一点是,损失函数的参数形式,这一点在Keras中是固定的,须如下形式: def my_loss(y_true, y_pred): # y_true: True labels. TensorFlow/Theano tensor # y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true . . . return scalar #返回一个标量
keras-siamese用自己的数据集实现详解

Siamese网络不做过多介绍,思想并不难,输入两个图像,输出这两张图像的相似度,两个输入的网络结构是相同的,参数共享. 主要发现很多代码都是基于mnist数据集的,下面说一下怎么用自己的数据集实现siamese网络. 首先,先整理数据集,相同的类放到同一个文件夹下,如下图所示: 接下来,将pairs及对应的label写到csv中,代码如下: import os import random import csv #图片所在的路径 path = '/Users/mac/Desktop/wxd/fl
keras打印loss对权重的导数方式

Notes 怀疑模型梯度爆炸,想打印模型 loss 对各权重的导数看看.如果如果fit来训练的话,可以用keras.callbacks.TensorBoard实现. 但此次使用train_on_batch来训练的,用K.gradients和K.function实现. Codes 以一份 VAE 代码为例 # -*- coding: utf8 -*- import keras from keras.models import Model from keras.layers import Input
Tensorflow中k.gradients()和tf.stop_gradient()用法说明

上周在实验室开荒某个代码,看到中间这么一段,对Tensorflow中的stop_gradient()还不熟悉,特此周末进行重新并总结. y = xx + K.stop_gradient(rounded - xx) 这代码最终调用位置在tensoflow.python.ops.gen_array_ops.stop_gradient(input, name=None),关于这段代码为什么这样写的意义在文末给出. [stop_gradient()意义] 用stop_gradient生成损失函数w.r.
Tensorflow中的图（tf.Graph）和会话（tf.Session）的实现

Tensorflow编程系统 Tensorflow工具或者说深度学习本身就是一个连贯紧密的系统.一般的系统是一个自治独立的.能实现复杂功能的整体.系统的主要任务是对输入进行处理,以得到想要的输出结果.我们之前见过的很多系统都是线性的,就像汽车生产工厂的流水线一样,输入->系统处理->输出.系统内部由很多单一的基本部件构成,这些单一部件具有特定的功能,且需要稳定的特性:系统设计者通过特殊的连接方式,让这些简单部件进行连接,以使它们之间可以进行数据交流和信息互换,来达到相互配合而完成具体工作的目的
Tensorflow中的降维函数tf.reduce_*使用总结

在使用tensorflow时常常会使用到tf.reduce_*这类的函数,在此对一些常见的函数进行汇总 1.tf.reduce_sum tf.reduce_sum(input_tensor , axis = None , keep_dims = False , name = None , reduction_indices = None) 参数: input_tensor:要减少的张量.应该有数字类型. axis:要减小的尺寸.如果为None(默认),则缩小所有尺寸.必须在范围[-rank(in
在TensorFlow中实现矩阵维度扩展

一般TensorFlow中扩展维度可以使用tf.expand_dims().近来发现另一种可以直接运用取数据操作符[]就能扩展维度的方法. 用法很简单,在要扩展的维度上加上tf.newaxis就行了. foo = tf.constant([[1,2,3], [4,5,6], [7,8,9]]) print(foo[tf.newaxis, :, :].eval()) # => [[[1,2,3], [4,5,6], [7,8,9]]] print(foo[:, tf.newaxis, :].eva
TensorFlow中tf.batch_matmul()的用法

TensorFlow中tf.batch_matmul()用法如果有两个三阶张量,size分别为 a.shape = [100, 3, 4] b.shape = [100, 4, 5] c = tf.batch_matmul(a, b) 则c.shape = [100, 3, 5] //将每一对 3x4 的矩阵与 4x5 的矩阵分别相乘.batch_size不变 100为张量的batch_size.剩下的两个维度为数据的维度. 不过新版的tensorflow已经移除了上面的函数,使用时换为tf.
Tensorflow中tf.ConfigProto()的用法详解

参考Tensorflow Machine Leanrning Cookbook tf.ConfigProto()主要的作用是配置tf.Session的运算方式,比如gpu运算或者cpu运算具体代码如下: import tensorflow as tf session_config = tf.ConfigProto( log_device_placement=True, inter_op_parallelism_threads=0, intra_op_parallelism_threads=0,
关于Tensorflow中的tf.train.batch函数的使用

这两天一直在看tensorflow中的读取数据的队列,说实话,真的是很难懂.也可能我之前没这方面的经验吧,最早我都使用的theano,什么都是自己写.经过这两天的文档以及相关资料,并且请教了国内的师弟.今天算是有点小感受了.简单的说,就是计算图是从一个管道中读取数据的,录入管道是用的现成的方法,读取也是.为了保证多线程的时候从一个管道读取数据不会乱吧,所以这种时候读取的时候需要线程管理的相关操作.今天我实验室了一个简单的操作,就是给一个有序的数据,看看读出来是不是有序的,结果发现是有序的,所以
浅谈tensorflow中几个随机函数的用法

如下所示: tf.constant(value, dtype=None, shape=None) 创建一个常量tensor,按照给出value来赋值,可以用shape来指定其形状.value可以是一个数,也可以是一个list. 如果是一个数,那么这个常亮中所有值的按该数来赋值. tf.random_normal(shape,mean=0.0,stddev=1.0,dtype=tf.float32) tf.truncated_normal(shape, mean=0.0, stddev=1.0,
tensorflow中tf.slice和tf.gather切片函数的使用

tf.slice(input_, begin, size, name=None):按照指定的下标范围抽取连续区域的子集 tf.gather(params, indices, validate_indices=None, name=None):按照指定的下标集合从axis=0中抽取子集,适合抽取不连续区域的子集输出: input = [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]], [[5, 5, 5], [6, 6, 6]]] tf.slice(
浅谈tensorflow 中tf.concat()的使用

concat()是将tensor沿着指定维度连接起来.其中tensorflow1.3版中是这样定义的: concat(values,axis,name='concat') 一.对于2维来说,0表示行,1表示列 t1 = [[1, 2, 3], [4, 5, 6]] t2 = [[7, 8, 9], [10, 11, 12]] with tf.Session() as sess: print(sess.run(tf.concat([t1, t2], 0) )) 结果为:[[1, 2, 3], [4

Tensorflow中k.gradients()和tf.stop_gradient()用法说明

相关推荐

随机推荐