Building Single-Layer and Multi-Layer Bi-LSTM Models with TensorFlow 2.4

2025-06-07 15:02:41

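Preface

This article uses the CPU build of TensorFlow 2.4 to build a single-layer Bi-LSTM model and a multi-layer Bi-LSTM model for a text classification task.

Make sure your numpy version is around 1.19.0; otherwise calling TextVectorization may raise a NotImplementedError.

Implementation

1. Getting the data

(1) The data used in this article is a set of movie reviews. Each sample consists of a review text and a sentiment label, where 1 marks a positive review and 0 a negative one, so this is a binary classification dataset.

(2) With the tensorflow_datasets package (tfds.load), we can download the imdb_reviews data from the network to the local disk and take out the training and test splits.

(3) Using tf.data.Dataset methods, we shuffle the training data and batch both the training and the test data with a batch size of 64. Each sample has the form (text, label). Below we print the first two review texts and sentiment labels from one batch.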

import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf
import matplotlib.pyplot as plt
tfds.disable_progress_bar()
BUFFER_SIZE = 10000  # size of the shuffle buffer
BATCH_SIZE = 64
# Download imdb_reviews on first use and load it as (text, label) pairs
dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
# Only the training data is shuffled; both splits are batched and prefetched
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
# Show the first two reviews and labels of one training batch
for example, label in train_dataset.take(1):
    print('text: ', example.numpy()[:2])
    print('label: ', label.numpy()[:2])

Sample output:

text: [
b"First of all, I have to say I have worked for blockbuster and have seen quite a few movies to the point its tough for me to find something I haven't seen. Taking this into account, I want everyone to know that this movie was by far the worst film ever made, it made me pine for Gigli, My Boss's Daughter, and any other piece of junk you've ever seen. BeLyt must be out of his mind, I've only found one person who liked it and even they couldn't tell me what the movie was about. If you are able to decipher this movie and are able to tell me what it was about you have to either be the writer or a fortune teller because there's any other way a person could figure this crap out.<br /><br />FOR THE LOVE OF G-D STAY AWAY!"
b"Just got out and cannot believe what a brilliant documentary this is. Rarely do you walk out of a movie theater in such awe and amazement. Lately movies have become so over hyped that the thrill of discovering something truly special and unique rarely happens. Amores Perros did this to me when it first came out and this movie is doing to me now. I didn't know a thing about this before going into it and what a surprise. If you hear the concept you might get the feeling that this is one of those touchy movies about an amazing triumph covered with over the top music and trying to have us fully convinced of what a great story it is telling but then not letting us in. Fortunetly this is not that movie. The people tell the story! This does such a good job of capturing every moment of their involvement while we enter their world and feel every second with them. There is so much beyond the climb that makes everything they go through so much more tense. Touching the Void was also a great doc about mountain climbing and showing the intensity in an engaging way but this film is much more of a human story. I just saw it today but I will go and say that this is one of the best documentaries I have ever seen."
]
label: [0 1]
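To make the shuffle and batch steps more concrete, here is a minimal pure-Python sketch of buffer-based shuffling followed by batching. It is a simplified illustration, not the actual tf.data implementation, and the helper names shuffle_with_buffer and batch are hypothetical. The last two lines assume that the IMDB training split holds 25,000 reviews, which would explain the 391 steps per epoch in the training logs.

```python
import math
import random


def shuffle_with_buffer(items, buffer_size, seed=0):
    """Yield items in randomized order using a fixed-size buffer (simplified sketch)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)        # fill the buffer first
        else:
            idx = rng.randrange(buffer_size)
            yield buffer[idx]          # emit a random buffered element
            buffer[idx] = item         # and replace it with the new one
    rng.shuffle(buffer)                # drain what is left in random order
    yield from buffer


def batch(items, batch_size):
    """Group items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:                        # the last batch may be smaller
        yield current


shuffled = list(shuffle_with_buffer(range(100), buffer_size=10))
batches = list(batch(shuffled, batch_size=64))
print([len(b) for b in batches])       # [64, 36]

# Assumption: 25,000 training reviews and a batch size of 64
steps_per_epoch = math.ceil(25000 / 64)
print(steps_per_epoch)                 # 391
```

In this sketch a buffer smaller than the dataset only mixes elements locally, so a larger buffer gives a more thorough shuffle at the cost of memory.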

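2. Processing the data

(1) To train a model on this data, the tokens in each text must be converted to integers the machine can work with. The simplest way is to build an encoder with TextVectorization. Here the vocabulary is limited to 1000 entries built from the most frequent tokens, and every review is limited to a length of 200 after processing: longer reviews are truncated, and shorter ones are padded with 0, the integer of the padding token.

(2) The output below shows one sample after integer mapping; the resulting integer array has length 200.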

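MAX_SEQ_LENGTH = 200  # every review is padded or truncated to this length
VOCAB_SIZE = 1000     # maximum vocabulary size
encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(max_tokens=VOCAB_SIZE, output_sequence_length=MAX_SEQ_LENGTH)
# Build the vocabulary from the training texts only (labels are dropped)
encoder.adapt(train_dataset.map(lambda text, label: text))
vocab = np.array(encoder.get_vocabulary())
# Encode the first review of the batch fetched earlier
encoded_example = encoder(example)[:1].numpy()
print(encoded_example)
print(label[:1])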

Integer mapping result for one randomly chosen sample:

[[ 86 5 32 10 26 6 130 10 26 926 16 1 3 26 108 176 4 164
93 6 2 215 30 1 16 70 6 160 140 10 731 108 647 11 78 1
10 178 305 6 118 12 11 18 14 33 234 2 240 20 122 91 9 91
70 1 16 1 56 1 580 3 99 81 408 5 1 825 122 108 1 217
28 46 5 25 349 195 61 249 29 409 37 405 9 3 54 35 404 360
70 49 2 18 14 43 45 23 24 491 6 1 11 18 3 24 491 6
360 70 49 9 14 43 23 26 6 352 28 2 762 42 4 1 1 80
213 99 81 97 4 409 96 811 11 638 1 13 16 2 116 5 1 766
242 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0]]
tf.Tensor([0], shape=(1,), dtype=int64)
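The mapping above can be sketched in plain Python. This is a simplified stand-in for what the encoder does, not the layer's actual implementation: the helper names tokenize, build_vocab and encode are hypothetical, and the tiny corpus is made up for illustration. It follows the convention that index 0 is the padding token and index 1 stands for out-of-vocabulary tokens, which is why so many 1s appear in the encoded review above.

```python
import string
from collections import Counter

PAD, UNK = "", "[UNK]"  # index 0 = padding, index 1 = out-of-vocabulary


def tokenize(text):
    """Lowercase, strip punctuation and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_vocab(texts, max_tokens):
    """Keep the most frequent tokens; two slots are reserved for PAD and UNK."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return [PAD, UNK] + most_common


def encode(text, vocab, seq_len):
    """Map tokens to integers, then truncate or pad with 0 to seq_len."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in tokenize(text)][:seq_len]
    return ids + [0] * (seq_len - len(ids))


# Hypothetical toy corpus, only for illustration
corpus = ["The movie was great", "The movie was bad", "great great acting"]
vocab = build_vocab(corpus, max_tokens=6)
encoded = encode("The acting was great, truly great!", vocab, seq_len=8)
truncated = encode("The acting was great, truly great!", vocab, seq_len=3)
print(vocab)
print(encoded)
print(truncated)
```

Here "acting" and "truly" fall outside the six-entry vocabulary, so both are mapped to 1, and the two trailing positions are filled with 0.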

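3. Single-layer Bi-LSTM model

(1) The first layer is the encoder defined above, which maps the input text to integers.

(2) The second layer is an Embedding layer; each token gets a 32-dimensional embedding.

(3) The third layer is a Bi-LSTM layer. The LSTM in each direction has an output dimension of 16, so the concatenated output of the two directions has 32 dimensions.

(4) The fourth layer is a fully connected layer that outputs an 8-dimensional vector and uses the relu activation function.

(5) The fifth layer is Dropout with a drop rate of 0.5, mainly to reduce overfitting.

(6) The sixth layer is a fully connected layer that outputs a 1-dimensional vector. This is the output layer, and its value is the sample's logit.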

model = tf.keras.Sequential([
    encoder,
    # mask_zero=True makes the following layers ignore the padding index 0
    tf.keras.layers.Embedding(input_dim=len(encoder.get_vocabulary()), output_dim=32, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)  # no activation: the output is a logit
])
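As a rough check on the size of this architecture, the trainable parameters of each layer can be worked out by hand. The sketch below assumes the Keras defaults (LSTM with biases, Bidirectional concatenating both directions) and assumes that the vocabulary is filled up to its 1000-entry limit; the helper function names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # one dim-sized vector per vocabulary entry
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)


def bi_lstm_params(input_dim, units):
    # the forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    return input_dim * units + units


VOCAB_SIZE = 1000  # assumption: the vocabulary reaches max_tokens

layers = {
    "embedding": embedding_params(VOCAB_SIZE, 32),
    "bi_lstm": bi_lstm_params(32, 16),
    "dense_8": dense_params(2 * 16, 8),  # input is the 32-dim concatenation
    "dense_1": dense_params(8, 1),
}
total = sum(layers.values())
for name, count in layers.items():
    print(name, count)
print("total", total)  # total 38545
```

Under these assumptions most of the parameters sit in the Embedding layer (32,000 of 38,545). The Dropout layer and the encoder contribute no trainable weights.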

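(7) We first run a prediction with the untrained model. An output below 0 means a negative review and an output above 0 a positive one. This review is clearly positive, yet the output logit is negative, so the untrained model wrongly classifies it as negative.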

sample_text = ('The movie was cool. The animation and the graphics were out of this world. I would recommend this movie.')
model.predict(np.array([sample_text]))

The prediction is:

array([[-0.01437075]], dtype=float32)
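Because the model outputs a logit rather than a probability, it can help to convert the value with the sigmoid function. The sketch below applies it to the logit printed above; the helper names sigmoid and classify are hypothetical and not part of the model code.

```python
import math


def sigmoid(logit):
    """Map a logit in (-inf, +inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def classify(logit):
    """A logit above 0 corresponds to a probability above 0.5."""
    return "positive" if logit > 0 else "negative"


untrained_logit = -0.01437075
prob = sigmoid(untrained_logit)
print(round(prob, 4), classify(untrained_logit))  # 0.4964 negative
```

A probability this close to 0.5 shows that the untrained model is essentially guessing.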

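(8) We use BinaryCrossentropy as the loss function. Note that if the model passes a logit (range (-∞, +∞)) to BinaryCrossentropy, from_logits=True must be set; if it passes a probability (range [0, 1]), from_logits=False must be set.

(9) We use Adam as the optimizer with a learning rate of 1e-3.

(10) We use accuracy as the evaluation metric. Keras most likely resolves 'accuracy' to binary accuracy here, which thresholds the model output at 0.5 by default. Since this model outputs logits, tf.keras.metrics.BinaryAccuracy(threshold=0.0) would match the sign-based decision rule more closely.

(11) We train for 10 epochs on the training data. After each epoch the model is evaluated on validation data, which here is 30 batches of the test set (validation_steps=30).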

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-3),
              metrics=['accuracy'])
# validation_steps=30 limits each validation run to 30 batches of test_dataset
history = model.fit(train_dataset, epochs=10, validation_data=test_dataset, validation_steps=30)

Training log:

Epoch 1/10
391/391 [==============================] - 30s 65ms/step - loss: 0.6461 - accuracy: 0.5090 - val_loss: 0.4443 - val_accuracy: 0.8245
Epoch 2/10
391/391 [==============================] - 23s 58ms/step - loss: 0.4594 - accuracy: 0.6596 - val_loss: 0.3843 - val_accuracy: 0.8396
...
Epoch 10/10
391/391 [==============================] - 22s 57ms/step - loss: 0.3450 - accuracy: 0.8681 - val_loss: 0.3920 - val_accuracy: 0.8417
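The from_logits setting matters because the two forms of binary cross-entropy expect different inputs. The following sketch, written with the standard library only, compares the loss computed directly from a logit (using the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|))) with the loss computed from sigmoid(z). It illustrates the math and is not the Keras implementation; the function names and example values are chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Binary cross-entropy when the model outputs a probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def bce_from_logit(y_true, z):
    """Binary cross-entropy when the model outputs a logit z (stable form)."""
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


# (label, logit) pairs chosen only for illustration
examples = [(1, 3.5), (0, -1.6), (1, -0.01), (0, 2.0)]
losses = [
    (bce_from_logit(y, z), bce_from_probability(y, sigmoid(z)))
    for y, z in examples
]
for (y, z), (from_logit, from_prob) in zip(examples, losses):
    print(y, z, round(from_logit, 6), round(from_prob, 6))
```

Both columns agree, which is the point: passing logits with from_logits=True is equivalent to applying a sigmoid first and using from_logits=False. Feeding logits to the probability form without the sigmoid would give wrong or undefined losses.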

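(12) After training, we evaluate the model on the full test set; the accuracy reaches 0.8319. Tuning the hyperparameters and training for longer may improve this further.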

model.evaluate(test_dataset)

Result:

391/391 [==============================] - 6s 15ms/step - loss: 0.3964 - accuracy: 0.8319

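(13) We use the trained model to classify a review. It identifies the sentiment of the text correctly: the output is negative, and a negative value indicates negative sentiment.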

sample_text = ('The movie was not cool. The animation and the graphics were bad. I would not recommend this movie.')
model.predict(np.array([sample_text]))

Result:

array([[-1.6402857]], dtype=float32)

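4. Multi-layer Bi-LSTM model

(1) Above we built a single Bi-LSTM layer. Here we stack two, adding a second Bidirectional layer after the first. The first layer sets return_sequences=True so that it passes its output at every time step to the second layer. Stacking gives the model more capacity, although it does not guarantee better accuracy. The loss function, optimizer and evaluation metric are the same as above.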

model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(len(encoder.get_vocabulary()), 32, mask_zero=True),
    # return_sequences=True: output every time step so the next LSTM receives a sequence
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-3),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=10, validation_data=test_dataset, validation_steps=30)

Training log:

Epoch 1/10
391/391 [==============================] - 59s 124ms/step - loss: 0.6170 - accuracy: 0.5770 - val_loss: 0.3931 - val_accuracy: 0.8135
Epoch 2/10
391/391 [==============================] - 45s 114ms/step - loss: 0.4264 - accuracy: 0.7544 - val_loss: 0.3737 - val_accuracy: 0.8380
...
Epoch 10/10
391/391 [==============================] - 45s 114ms/step - loss: 0.3138 - accuracy: 0.8849 - val_loss: 0.4069 - val_accuracy: 0.8323
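What the Bidirectional wrapper and return_sequences do can be illustrated with a toy recurrence in plain Python. This sketch swaps the LSTM for a much simpler scalar tanh recurrence, and for brevity both directions share the same made-up weights, whereas in Keras each direction has its own. The function names run_rnn and bidirectional are hypothetical.

```python
import math


def run_rnn(xs, w_x=0.5, w_h=0.8):
    """Toy scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}); returns every h_t."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs


def bidirectional(xs, return_sequences=False):
    """Run the recurrence in both directions and concatenate the results."""
    forward = run_rnn(xs)
    backward = run_rnn(xs[::-1])      # process the sequence right to left
    if return_sequences:
        aligned = backward[::-1]      # re-align backward outputs with time
        return [[f, b] for f, b in zip(forward, aligned)]
    # only the final output of each direction, concatenated
    return [forward[-1], backward[-1]]


xs = [1.0, -2.0, 0.5, 3.0]
sequence_out = bidirectional(xs, return_sequences=True)
final_out = bidirectional(xs)
print(len(sequence_out), len(sequence_out[0]))  # 4 2
print(len(final_out))                           # 2
```

With return_sequences=True there is one concatenated output per time step, which is what a second recurrent layer needs as input. Without it, only one vector per sequence remains, which suits the Dense layers that follow the last Bi-LSTM.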

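(2) After training, we evaluate the model on the full test set; the accuracy reaches 0.8217, slightly below the single-layer model's 0.8319 in this run. Tuning the hyperparameters and training for longer may improve this further.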

model.evaluate(test_dataset)

Result:

391/391 [==============================] - 14s 35ms/step - loss: 0.4021 - accuracy: 0.8217

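(3) We use the trained model to classify a review. It identifies the sentiment of the text correctly: the output is positive, and a positive value indicates positive sentiment.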

sample_text = ('The movie was good. The animation and the graphics were very good. you should love movie.')
model.predict(np.array([sample_text]))

Result:

array([[3.571126]], dtype=float32)
