— 8 minute read —
In this article you will learn:
- How to call a ResNet network in TensorFlow
- How to train the network and save the model
- How to load the model and predict results
Preface
In deep learning, optimization becomes harder as network depth grows: gradients can vanish or explode, and training may fail to converge at all. ResNet (Residual Networks) was proposed to solve this problem. Here we simply call an existing ResNet implementation for training; many articles explain ResNet's internals in detail if you want the theory.
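The core idea is a shortcut connection: each block learns a residual F(x) and outputs F(x) + x, so gradients can flow through the identity path. A minimal sketch of the idea (illustrative only; ResNet-50 itself uses 1x1-3x3-1x1 bottleneck blocks with batch normalization):

```python
import tensorflow as tf

def residual_block(inputs, channels, scope):
    # Two 3x3 convolutions plus an identity shortcut; assumes `inputs`
    # already has `channels` channels so the addition shapes match.
    with tf.variable_scope(scope):
        f = tf.layers.conv2d(inputs, channels, 3, padding='same',
                             activation=tf.nn.relu)
        f = tf.layers.conv2d(f, channels, 3, padding='same')
        return tf.nn.relu(f + inputs)  # output = F(x) + x
```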
Building the training network
If you have worked through the earlier preparation posts on image preprocessing and building the tfrecord format, you should already have data files in tfrecord format. We now build the network to handle a 100-class logo image classification problem, feeding the prepared tfrecord data into the ResNet network through TensorFlow's queue system for training.

First, import the required libraries:
```python
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
```
The nets module bundles implementations of many well-known networks (AlexNet, Inception, ResNet, VGG) that can be called directly. Here we use resnet_v2_50, the 50-layer ResNet.
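Swapping in a different architecture is mostly a one-line change; for example, VGG-16 from the same module (assuming x is the input placeholder defined later):

```python
# Alternative architectures bundled with slim, e.g. VGG-16:
pred, end_points = nets.vgg.vgg_16(x, num_classes=100, is_training=True)
```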
Next, define a function to read and decode the tfrecord files:
```python
def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example,
                                       features={'label': tf.FixedLenFeature([], tf.int64),
                                                 'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.cast(features['label'], tf.int32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0  # normalize pixel values to [0, 1]
    return img, label
```
Define the model save path and set the hyperparameters; a smaller batch size gave better training results here. Then put the tfrecord files in the current directory into a list:
```python
save_dir = r"./train_image_63.model"  # model save path
batch_size_ = 2
lr = tf.Variable(0.0001, dtype=tf.float32)  # learning rate
x = tf.placeholder(tf.float32, [None, 224, 224, 3])  # images are 224*224*3
y_ = tf.placeholder(tf.float32, [None])
# All the tfrecord files we produced, each holding at most 1000 images
train_list = ['traindata_63.tfrecords-000', 'traindata_63.tfrecords-001', 'traindata_63.tfrecords-002',
              'traindata_63.tfrecords-003', 'traindata_63.tfrecords-004', 'traindata_63.tfrecords-005',
              'traindata_63.tfrecords-006', 'traindata_63.tfrecords-007', 'traindata_63.tfrecords-008',
              'traindata_63.tfrecords-009', 'traindata_63.tfrecords-010', 'traindata_63.tfrecords-011',
              'traindata_63.tfrecords-012', 'traindata_63.tfrecords-013', 'traindata_63.tfrecords-014',
              'traindata_63.tfrecords-015', 'traindata_63.tfrecords-016', 'traindata_63.tfrecords-017',
              'traindata_63.tfrecords-018', 'traindata_63.tfrecords-019', 'traindata_63.tfrecords-020',
              'traindata_63.tfrecords-021']

# Shuffle the data order randomly
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                capacity=10000, min_after_dequeue=9900)
```
Note the use of tf.train.shuffle_batch here, which randomly shuffles the order of the data in the queue. num_threads is the number of reader threads; capacity is the queue capacity, set to 10000 here; min_after_dequeue is the minimum number of elements kept in the queue after dequeuing, and it also controls how thorough the shuffling is. Setting it to 9900 means that once 100 elements have been dequeued and 9900 remain, 100 new elements are pulled in and the queue is reshuffled. If you want the data fed in order instead, switch to tf.train.batch and drop the min_after_dequeue argument, as sketched below. Tune all of these values to match your machine.
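For reference, the ordered variant mentioned above would look like this (same tensors, no shuffling):

```python
# FIFO batching, preserving the order in which records are read:
img_batch, label_batch = tf.train.batch([img, label], num_threads=2,
                                        batch_size=batch_size_, capacity=10000)
```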
Next, one-hot encode the labels by calling tf.one_hot directly. Since there are 100 classes here, depth is set to 100. (For intuition, tf.one_hot(indices=[2, 0], depth=4) evaluates to [[0, 0, 1, 0], [1, 0, 0, 0]].)
```python
# One-hot encode the labels
one_hot_labels = tf.one_hot(indices=tf.cast(y_, tf.int32), depth=100)
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])
```
We call nets.resnet_v2.resnet_v2_50 to build the ResNet-50 network directly, with num_classes set to the total number of classes. The is_training flag controls whether layers such as batch normalization run in training mode: set it to True during training and False for inference. (It does not by itself freeze any layer's weights; which variables are updated is decided by the optimizer.)
With the network in place, we define the loss function and the optimizer. The loss is sigmoid cross-entropy and the optimizer is Adam:
```python
# Define the loss and the optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=one_hot_labels))
optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
```
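Since each logo belongs to exactly one class, softmax cross-entropy would be the more conventional loss for this problem; a drop-in alternative (not what the original uses; the _v2 name requires TF 1.5 or later, older releases use tf.nn.softmax_cross_entropy_with_logits):

```python
# Softmax cross-entropy treats the 100 classes as mutually exclusive:
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred, labels=one_hot_labels))
```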
Define the accuracy; tf.argmax returns the index of the largest value along the given axis:
```python
# Accuracy
a = tf.argmax(pred, 1)
b = tf.argmax(one_hot_labels, 1)
correct_pred = tf.equal(a, b)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```
Finally, we build the Session and set the network running:
```python
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Create a coordinator to manage the threads
    coord = tf.train.Coordinator()
    # Start the QueueRunners; the filename queue is now populated
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    while True:
        i += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        _, loss_, y_t, y_p, a_, b_ = sess.run([optimizer, loss, one_hot_labels, pred, a, b],
                                              feed_dict={x: b_image, y_: b_label})
        print('step: {}, train_loss: {}'.format(i, loss_))
        if i % 20 == 0:
            _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
            print('--------------------------------------------------------')
            print('step: {} train_acc: {} loss: {}'.format(i, acc_train, _loss))
            print('--------------------------------------------------------')
        if i == 200000:
            saver.save(sess, save_dir, global_step=i)
        elif i == 300000:
            saver.save(sess, save_dir, global_step=i)
        elif i == 400000:
            saver.save(sess, save_dir, global_step=i)
            break
    coord.request_stop()
    # This call returns only after all other threads have shut down
    coord.join(threads)
```
When using the queue system, you must create a coordinator inside the Session to manage the threads. We print the training accuracy every 20 steps, and the model is saved automatically at steps 200000, 300000, and 400000.
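As a small variation (hypothetical, not from the original post), the three hard-coded save points could be collapsed into one periodic rule inside the training loop:

```python
# Hypothetical variant: checkpoint every 100000 steps from step 200000 on,
# stopping after 400000 steps.
if i % 100000 == 0 and i >= 200000:
    saver.save(sess, save_dir, global_step=i)
    if i == 400000:
        break
```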
After training you will find the checkpoint files the saver produced (a checkpoint file plus .meta, .index and .data files for each saved step); I kept only the 300000-step model here.
The complete training code:
```python
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets


def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example,
                                       features={'label': tf.FixedLenFeature([], tf.int64),
                                                 'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.cast(features['label'], tf.int32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0  # normalize pixel values to [0, 1]
    return img, label


save_dir = r"./train_image_63.model"
batch_size_ = 2
lr = tf.Variable(0.0001, dtype=tf.float32)
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
y_ = tf.placeholder(tf.float32, [None])
train_list = ['traindata_63.tfrecords-000', 'traindata_63.tfrecords-001', 'traindata_63.tfrecords-002',
              'traindata_63.tfrecords-003', 'traindata_63.tfrecords-004', 'traindata_63.tfrecords-005',
              'traindata_63.tfrecords-006', 'traindata_63.tfrecords-007', 'traindata_63.tfrecords-008',
              'traindata_63.tfrecords-009', 'traindata_63.tfrecords-010', 'traindata_63.tfrecords-011',
              'traindata_63.tfrecords-012', 'traindata_63.tfrecords-013', 'traindata_63.tfrecords-014',
              'traindata_63.tfrecords-015', 'traindata_63.tfrecords-016', 'traindata_63.tfrecords-017',
              'traindata_63.tfrecords-018', 'traindata_63.tfrecords-019', 'traindata_63.tfrecords-020',
              'traindata_63.tfrecords-021']

# Shuffle the data order randomly
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                capacity=10000, min_after_dequeue=9900)

# One-hot encode the labels
one_hot_labels = tf.one_hot(indices=tf.cast(y_, tf.int32), depth=100)
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])

# Define the loss and the optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=one_hot_labels))
optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

# Accuracy
a = tf.argmax(pred, 1)
b = tf.argmax(one_hot_labels, 1)
correct_pred = tf.equal(a, b)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Create a coordinator to manage the threads
    coord = tf.train.Coordinator()
    # Start the QueueRunners; the filename queue is now populated
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    while True:
        i += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        _, loss_, y_t, y_p, a_, b_ = sess.run([optimizer, loss, one_hot_labels, pred, a, b],
                                              feed_dict={x: b_image, y_: b_label})
        print('step: {}, train_loss: {}'.format(i, loss_))
        if i % 20 == 0:
            _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
            print('--------------------------------------------------------')
            print('step: {} train_acc: {} loss: {}'.format(i, acc_train, _loss))
            print('--------------------------------------------------------')
        if i == 200000:
            saver.save(sess, save_dir, global_step=i)
        elif i == 300000:
            saver.save(sess, save_dir, global_step=i)
        elif i == 400000:
            saver.save(sess, save_dir, global_step=i)
            break
    coord.request_stop()
    # This call returns only after all other threads have shut down
    coord.join(threads)
```
Prediction results
We evaluate the model on 1000 test images; here is the code:
```python
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
from PIL import Image
import os

test_dir = r'./test'  # original test folder holding the images to predict
model_dir = r'./train_image_63.model-300000'  # model path
test_txt_dir = r'./test.txt'  # original test.txt file
result_dir = r'./result.txt'  # output file for the results

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
# Label order (must match the order used when building the tfrecords)
classes = ['1', '10', '100', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21',
           '22', '23', '24', '25', '26', '27', '28', '29', '3', '30', '31', '32', '33', '34', '35',
           '36', '37', '38', '39', '4', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49',
           '5', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '6', '60', '61', '62',
           '63', '64', '65', '66', '67', '68', '69', '7', '70', '71', '72', '73', '74', '75', '76',
           '77', '78', '79', '8', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '9',
           '90', '91', '92', '93', '94', '95', '96', '97', '98', '99']

# is_training=True is kept from the original post; is_training=False is the
# usual setting for inference, but it only gives correct results if the
# batch-norm moving averages were updated during training.
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])
a = tf.argmax(pred, 1)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    with open(test_txt_dir, 'r') as f:
        data = f.readlines()
    for i in data:
        test_name = i.split()[0]
        for pic in os.listdir(test_dir):
            if pic == test_name:
                img_path = os.path.join(test_dir, pic)
                img = Image.open(img_path)
                img = img.resize((224, 224))
                img = tf.reshape(img, [1, 224, 224, 3])
                img1 = tf.reshape(img, [1, 224, 224, 3])  # raw copy (unused below)
                img = tf.cast(img, tf.float32) / 255.0
                b_image, b_image_raw = sess.run([img, img1])
                t_label = sess.run(a, feed_dict={x: b_image})
                index_ = t_label[0]
                predict = classes[index_]
                with open(result_dir, 'a') as f1:
                    print(test_name, predict, file=f1)
                break
```
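One caveat about the loop above: the tf.reshape and tf.cast calls inside it add new nodes to the graph for every image, so the graph keeps growing as prediction runs. A sketch of the same preprocessing done in NumPy instead, which keeps the graph fixed:

```python
import numpy as np

# Preprocess each image in NumPy so the loop adds no new graph ops:
img = Image.open(img_path).resize((224, 224))
b_image = np.asarray(img, dtype=np.float32)[np.newaxis] / 255.0  # (1, 224, 224, 3)
t_label = sess.run(a, feed_dict={x: b_image})
```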
Note that the test set was not converted to tfrecord format; the images are simply loaded one by one and run through the model. The generated result file is mainly for submission to the competition. I have put the original data and the model here (password: 8xbi) for anyone interested.
With that, we have completed a CNN image recognition project.