
[Deep Learning in Practice 03] — Running YOLO in TensorFlow and Source Code Analysis

This is the third article in the Deep Learning in Practice series, focused on running the code plus source-code analysis.

Please credit the original when reposting: https://blog.csdn.net/c20081052/article/details/80260726

The code can be downloaded from: https://github.com/hizhangp/yolo_tensorflow

After downloading, it is worth reading the README carefully.

This article has two parts: 1. reading the YOLO source code; 2. running the code.

Part 1: Source Code Walkthrough

Download YOLO_tensorflow-master.zip to your usual working directory and unzip it to get the project layout (the data folder is one I created myself).

1. Main files in the utils folder: pascal_voc.py and timer.py.

pascal_voc.py mainly returns a batch of images resized to 448x448 and normalized to [-1, 1], together with the corresponding ground-truth labels.
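The [-1, 1] normalization mentioned here is easy to verify by hand; a minimal plain-Python sketch (the `normalize` helper is mine, assuming 8-bit pixel values in [0, 255]):

```python
def normalize(pixel):
    # map an 8-bit pixel value from [0, 255] to [-1, 1],
    # the same arithmetic as inputs = (inputs / 255.0) * 2.0 - 1.0 in the repo
    return (pixel / 255.0) * 2.0 - 1.0

print(normalize(0), normalize(127.5), normalize(255))  # -> -1.0 0.0 1.0
```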


timer.py (mainly used for timing) is annotated as follows:



import time
import datetime


class Timer(object):
    '''A simple timer.'''

    def __init__(self):
        self.init_time = time.time()
        self.total_time = 0.
        self.calls = 0
        self.start_time = 0.
        self.diff = 0.
        self.average_time = 0.
        self.remain_time = 0.

    def tic(self):
        # using time.time instead of time.clock because time.clock
        # does not normalize for multithreading
        self.start_time = time.time()   # current system time

    def toc(self, average=True):
        self.diff = time.time() - self.start_time          # elapsed time since tic()
        self.total_time += self.diff                       # accumulated elapsed time
        self.calls += 1                                    # number of calls
        self.average_time = self.total_time / self.calls   # average time per call
        if average:
            return self.average_time
        else:
            return self.diff

    def remain(self, iters, max_iters):
        # estimated time needed to finish the remaining iterations
        if iters == 0:
            self.remain_time = 0
        else:
            self.remain_time = (time.time() - self.init_time) * \
                (max_iters - iters) / iters
        return str(datetime.timedelta(seconds=int(self.remain_time)))
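The remain() estimate is just a proportional extrapolation of the elapsed time; a small standalone sketch of the same arithmetic (the function name and numbers are mine, for illustration):

```python
import datetime

def remain_estimate(elapsed_seconds, iters, max_iters):
    # same formula as Timer.remain(): scale the elapsed time by the
    # ratio of remaining iterations to completed iterations
    if iters == 0:
        return str(datetime.timedelta(seconds=0))
    remain = elapsed_seconds * (max_iters - iters) / iters
    return str(datetime.timedelta(seconds=int(remain)))

# after 100 of 15000 iterations taking 50 s, about 7450 s remain
print(remain_estimate(50.0, 100, 15000))  # -> 2:04:10
```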

2. Main files in the yolo folder: config.py and yolo_net.py.

config.py is annotated as follows:

import os

#
# path and dataset parameter
#

DATA_PATH = 'data'

PASCAL_PATH = os.path.join(DATA_PATH, 'pascal_voc')   # <cwd>/data/pascal_voc

CACHE_PATH = os.path.join(PASCAL_PATH, 'cache')       # <cwd>/data/pascal_voc/cache

OUTPUT_DIR = os.path.join(PASCAL_PATH, 'output')      # <cwd>/data/pascal_voc/output

WEIGHTS_DIR = os.path.join(PASCAL_PATH, 'weights')    # <cwd>/data/pascal_voc/weights

WEIGHTS_FILE = None
# WEIGHTS_FILE = os.path.join(DATA_PATH, 'weights', 'YOLO_small.ckpt')

CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
           'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
           'train', 'tvmonitor']   # the 20 PASCAL VOC object classes

FLIPPED = True    # whether to augment with horizontally flipped images

#
# model parameter
#

IMAGE_SIZE = 448
CELL_SIZE = 7
BOXES_PER_CELL = 2
ALPHA = 0.1
DISP_CONSOLE = False

# the four loss-term coefficients
OBJECT_SCALE = 1.0
NOOBJECT_SCALE = 1.0
CLASS_SCALE = 2.0
COORD_SCALE = 5.0

#
# solver parameter
#

GPU = ''
LEARNING_RATE = 0.0001
DECAY_STEPS = 30000
DECAY_RATE = 0.1
STAIRCASE = True
BATCH_SIZE = 45
MAX_ITER = 15000
SUMMARY_ITER = 10
SAVE_ITER = 1000

#
# test parameter
#

THRESHOLD = 0.2
IOU_THRESHOLD = 0.5

These are the configuration settings used when training and testing the network.
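A quick sanity check of the output dimension these parameters imply (plain Python, values copied from config.py):

```python
CELL_SIZE = 7
BOXES_PER_CELL = 2
NUM_CLASS = 20   # len(CLASSES)

# each cell predicts 20 class probabilities plus 5 numbers
# (x, y, w, h, confidence) for each of its 2 boxes
output_size = CELL_SIZE * CELL_SIZE * (NUM_CLASS + BOXES_PER_CELL * 5)
print(output_size)  # -> 1470
```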

yolo_net.py is annotated as follows:

import numpy as np
import tensorflow as tf
import yolo.config as cfg

slim = tf.contrib.slim


class YOLONet(object):

    def __init__(self, is_training=True):
        self.classes = cfg.CLASSES                 # object classes
        self.num_class = len(self.classes)         # number of classes: 20
        self.image_size = cfg.IMAGE_SIZE           # input size: 448
        self.cell_size = cfg.CELL_SIZE             # grid size: 7
        self.boxes_per_cell = cfg.BOXES_PER_CELL   # boxes per grid cell: 2
        self.output_size = (self.cell_size * self.cell_size) *\
            (self.num_class + self.boxes_per_cell * 5)   # output dimension: 7x7x(20+2x5)
        self.scale = 1.0 * self.image_size / self.cell_size   # 448/7 = 64
        self.boundary1 = self.cell_size * self.cell_size * self.num_class   # 7x7x20
        self.boundary2 = self.boundary1 +\
            self.cell_size * self.cell_size * self.boxes_per_cell
        # boundary2 = 7x7x20 + 7x7x2: 49 cells' class probabilities plus 98 box confidences

        self.object_scale = cfg.OBJECT_SCALE       # 1.0, weight when an object is present
        self.noobject_scale = cfg.NOOBJECT_SCALE   # 1.0, weight when no object is present (the paper uses 0.5)
        self.class_scale = cfg.CLASS_SCALE         # 2.0, weight of the classification loss
        self.coord_scale = cfg.COORD_SCALE         # 5.0, weight of the coordinate loss

        self.learning_rate = cfg.LEARNING_RATE     # 0.0001
        self.batch_size = cfg.BATCH_SIZE           # 45
        self.alpha = cfg.ALPHA                     # 0.1

        # reshape a 2x7x7 array into 7x7x2: per-cell column indices (offsets)
        self.offset = np.transpose(np.reshape(np.array(
            [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
            (self.boxes_per_cell, self.cell_size, self.cell_size)), (1, 2, 0))

        self.images = tf.placeholder(
            tf.float32, [None, self.image_size, self.image_size, 3],
            name='images')                         # input placeholder, 448x448, 3 channels
        self.logits = self.build_network(          # network predictions
            self.images, num_outputs=self.output_size, alpha=self.alpha,
            is_training=is_training)

        if is_training:
            self.labels = tf.placeholder(
                tf.float32,
                [None, self.cell_size, self.cell_size, 5 + self.num_class])  # ground-truth placeholder
            self.loss_layer(self.logits, self.labels)      # build the loss
            self.total_loss = tf.losses.get_total_loss()   # collect all losses
            tf.summary.scalar('total_loss', self.total_loss)

    def build_network(self,          # build the network (conv + pooling + fully connected layers)
                      images,        # input images [None, 448, 448, 3]
                      num_outputs,   # output dimension [None, 7x7x30]
                      alpha,
                      keep_prob=0.5, # dropout keep probability
                      is_training=True,
                      scope='yolo'):
        with tf.variable_scope(scope):
            with slim.arg_scope(
                [slim.conv2d, slim.fully_connected],
                activation_fn=leaky_relu(alpha),                  # leaky ReLU activation
                weights_regularizer=slim.l2_regularizer(0.0005),  # L2 weight regularization
                weights_initializer=tf.truncated_normal_initializer(0.0, 0.01)  # N(0, 0.01) init
            ):
                # zero-pad 3 rows/cols on each side; batch and channel dims untouched
                net = tf.pad(
                    images, np.array([[0, 0], [3, 3], [3, 3], [0, 0]]),
                    name='pad_1')
                net = slim.conv2d(   # 64 filters, 7x7 kernel, stride 2
                    net, 64, 7, 2, padding='VALID', scope='conv_2')  # already padded, so VALID -> 224x224x64
                net = slim.max_pool2d(net, 2, padding='SAME', scope='pool_3')   # 2x2 max pool -> 112x112x64
                net = slim.conv2d(net, 192, 3, scope='conv_4')                  # 3x3 conv -> 112x112x192
                net = slim.max_pool2d(net, 2, padding='SAME', scope='pool_5')   # -> 56x56x192
                net = slim.conv2d(net, 128, 1, scope='conv_6')                  # 1x1 conv -> 56x56x128
                net = slim.conv2d(net, 256, 3, scope='conv_7')                  # -> 56x56x256
                net = slim.conv2d(net, 256, 1, scope='conv_8')                  # -> 56x56x256
                net = slim.conv2d(net, 512, 3, scope='conv_9')                  # -> 56x56x512
                net = slim.max_pool2d(net, 2, padding='SAME', scope='pool_10')  # -> 28x28x512
                net = slim.conv2d(net, 256, 1, scope='conv_11')   # four repeated 256/512 conv pairs
                net = slim.conv2d(net, 512, 3, scope='conv_12')
                net = slim.conv2d(net, 256, 1, scope='conv_13')
                net = slim.conv2d(net, 512, 3, scope='conv_14')
                net = slim.conv2d(net, 256, 1, scope='conv_15')
                net = slim.conv2d(net, 512, 3, scope='conv_16')
                net = slim.conv2d(net, 256, 1, scope='conv_17')
                net = slim.conv2d(net, 512, 3, scope='conv_18')
                net = slim.conv2d(net, 512, 1, scope='conv_19')   # -> 28x28x512
                net = slim.conv2d(net, 1024, 3, scope='conv_20')  # -> 28x28x1024
                net = slim.max_pool2d(net, 2, padding='SAME', scope='pool_21')  # -> 14x14x1024
                net = slim.conv2d(net, 512, 1, scope='conv_22')   # two repeated 512/1024 conv pairs
                net = slim.conv2d(net, 1024, 3, scope='conv_23')
                net = slim.conv2d(net, 512, 1, scope='conv_24')
                net = slim.conv2d(net, 1024, 3, scope='conv_25')
                net = slim.conv2d(net, 1024, 3, scope='conv_26')  # -> 14x14x1024
                # pad 1 row/col of zeros on each side; batch and channel dims untouched
                net = tf.pad(
                    net, np.array([[0, 0], [1, 1], [1, 1], [0, 0]]),
                    name='pad_27')
                net = slim.conv2d(
                    net, 1024, 3, 2, padding='VALID', scope='conv_28')  # stride 2 -> 7x7x1024
                net = slim.conv2d(net, 1024, 3, scope='conv_29')
                net = slim.conv2d(net, 1024, 3, scope='conv_30')        # -> 7x7x1024
                net = tf.transpose(net, [0, 3, 1, 2], name='trans_31')  # -> [batch, 1024, 7, 7]
                net = slim.flatten(net, scope='flat_32')                # -> [batch, 7*7*1024]
                net = slim.fully_connected(net, 512, scope='fc_33')     # -> [batch, 512]
                net = slim.fully_connected(net, 4096, scope='fc_34')    # -> [batch, 4096]
                net = slim.dropout(   # dropout to reduce overfitting
                    net, keep_prob=keep_prob, is_training=is_training,
                    scope='dropout_35')
                net = slim.fully_connected(   # final layer -> [batch, 7x7x30]
                    net, num_outputs, activation_fn=None, scope='fc_36')
        return net   # predictions, 7x7x30 per image

    def calc_iou(self, boxes1, boxes2, scope='iou'):
        """calculate ious
        Args:
          boxes1: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]
                  ====> (x_center, y_center, w, h)
          boxes2: 5-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]
                  ====> (x_center, y_center, w, h)
        Return:
          iou: 4-D tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        """
        with tf.variable_scope(scope):
            # transform (x_center, y_center, w, h) to (x1, y1, x2, y2)
            boxes1_t = tf.stack([boxes1[..., 0] - boxes1[..., 2] / 2.0,   # x - w/2 = x1 (top-left)
                                 boxes1[..., 1] - boxes1[..., 3] / 2.0,   # y - h/2 = y1 (top-left)
                                 boxes1[..., 0] + boxes1[..., 2] / 2.0,   # x + w/2 = x2 (bottom-right)
                                 boxes1[..., 1] + boxes1[..., 3] / 2.0],  # y + h/2 = y2 (bottom-right)
                                axis=-1)   # stack along the last dimension
            boxes2_t = tf.stack([boxes2[..., 0] - boxes2[..., 2] / 2.0,
                                 boxes2[..., 1] - boxes2[..., 3] / 2.0,
                                 boxes2[..., 0] + boxes2[..., 2] / 2.0,
                                 boxes2[..., 1] + boxes2[..., 3] / 2.0],
                                axis=-1)

            # calculate the top-left and bottom-right corners of the overlap
            lu = tf.maximum(boxes1_t[..., :2], boxes2_t[..., :2])
            rd = tf.minimum(boxes1_t[..., 2:], boxes2_t[..., 2:])

            # intersection
            intersection = tf.maximum(0.0, rd - lu)
            inter_square = intersection[..., 0] * intersection[..., 1]   # overlap area

            # calculate the boxes1 area and boxes2 area
            square1 = boxes1[..., 2] * boxes1[..., 3]   # w1 * h1
            square2 = boxes2[..., 2] * boxes2[..., 3]   # w2 * h2

            union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)

        return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)   # clamp IOU to [0, 1]

    def loss_layer(self, predicts, labels, scope='loss_layer'):
        with tf.variable_scope(scope):
            predict_classes = tf.reshape(   # predicted class probabilities, [batch, 7, 7, 20]
                predicts[:, :self.boundary1],
                [self.batch_size, self.cell_size, self.cell_size, self.num_class])
            predict_scales = tf.reshape(    # predicted confidences, [batch, 7, 7, 2]
                predicts[:, self.boundary1:self.boundary2],
                [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
            predict_boxes = tf.reshape(     # predicted boxes, [batch, 7, 7, 2, 4]
                predicts[:, self.boundary2:],
                [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])

            response = tf.reshape(   # label position 0: whether an object is present
                labels[..., 0],
                [self.batch_size, self.cell_size, self.cell_size, 1])
            boxes = tf.reshape(      # label positions 1-4: object coordinates
                labels[..., 1:5],
                [self.batch_size, self.cell_size, self.cell_size, 1, 4])
            # each cell predicts boxes_per_cell boxes, so tile the ground-truth box along
            # that dimension, then normalize the coordinates by the image size
            boxes = tf.tile(
                boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size
            classes = labels[..., 5:]   # label positions 5-24: class one-hot

            offset = tf.reshape(        # reshape offset from 7x7x2 to 1x7x7x2
                tf.constant(self.offset, dtype=tf.float32),
                [1, self.cell_size, self.cell_size, self.boxes_per_cell])
            offset = tf.tile(offset, [self.batch_size, 1, 1, 1])   # tile to [batch, 7, 7, 2]
            # swapped row/column offsets (note: the author may not handle
            # non-square grids such as 7x8 here)
            offset_tran = tf.transpose(offset, (0, 2, 1, 3))
            predict_boxes_tran = tf.stack(
                [(predict_boxes[..., 0] + offset) / self.cell_size,        # (predicted x + column offset) / 7
                 (predict_boxes[..., 1] + offset_tran) / self.cell_size,   # (predicted y + row offset) / 7
                 tf.square(predict_boxes[..., 2]),    # the net predicts sqrt(w), so square it
                 tf.square(predict_boxes[..., 3])], axis=-1)   # likewise for sqrt(h)

            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)   # IOU against ground truth

            # calculate I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
            # the box with the highest IOU in each cell is responsible for the object
            object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
            object_mask = tf.cast(
                (iou_predict_truth >= object_mask), tf.float32) * response

            # calculate no_I tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]:
            # ones everywhere minus the object mask
            noobject_mask = tf.ones_like(
                object_mask, dtype=tf.float32) - object_mask

            # ground truth in the same parameterization as the predictions
            boxes_tran = tf.stack(
                [boxes[..., 0] * self.cell_size - offset,
                 boxes[..., 1] * self.cell_size - offset_tran,
                 tf.sqrt(boxes[..., 2]),
                 tf.sqrt(boxes[..., 3])], axis=-1)

            # class_loss: classification error in cells that contain an object
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                name='class_loss') * self.class_scale

            # object_loss: confidence error of boxes that contain an object
            object_delta = object_mask * (predict_scales - iou_predict_truth)
            object_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                name='object_loss') * self.object_scale

            # noobject_loss: confidence error of boxes that contain no object
            noobject_delta = noobject_mask * predict_scales
            noobject_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(noobject_delta), axis=[1, 2, 3]),
                name='noobject_loss') * self.noobject_scale

            # coord_loss: only the box responsible for the object in cell i contributes
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                name='coord_loss') * self.coord_scale

            tf.losses.add_loss(class_loss)
            tf.losses.add_loss(object_loss)
            tf.losses.add_loss(noobject_loss)
            tf.losses.add_loss(coord_loss)

            # summaries for TensorBoard
            tf.summary.scalar('class_loss', class_loss)
            tf.summary.scalar('object_loss', object_loss)
            tf.summary.scalar('noobject_loss', noobject_loss)
            tf.summary.scalar('coord_loss', coord_loss)
            tf.summary.histogram('boxes_delta_x', boxes_delta[..., 0])
            tf.summary.histogram('boxes_delta_y', boxes_delta[..., 1])
            tf.summary.histogram('boxes_delta_w', boxes_delta[..., 2])
            tf.summary.histogram('boxes_delta_h', boxes_delta[..., 3])
            tf.summary.histogram('iou', iou_predict_truth)


def leaky_relu(alpha):   # leaky ReLU activation
    def op(inputs):
        return tf.nn.leaky_relu(inputs, alpha=alpha, name='leaky_relu')
    return op

The downloaded code is quite recent and differs somewhat from the annotations in older blog posts. This file builds the YOLO network structure, which deserves careful study.
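calc_iou above is vectorized over 5-D tensors, but the geometry is easy to check on a single pair of boxes; a plain-Python sketch of the same corner computation (the function name and box values are mine, for illustration):

```python
def iou_xywh(b1, b2):
    # boxes given as (x_center, y_center, w, h), same convention as calc_iou
    x1_min, y1_min = b1[0] - b1[2] / 2.0, b1[1] - b1[3] / 2.0
    x1_max, y1_max = b1[0] + b1[2] / 2.0, b1[1] + b1[3] / 2.0
    x2_min, y2_min = b2[0] - b2[2] / 2.0, b2[1] - b2[3] / 2.0
    x2_max, y2_max = b2[0] + b2[2] / 2.0, b2[1] + b2[3] / 2.0
    # overlap rectangle, clamped at zero for disjoint boxes
    iw = max(0.0, min(x1_max, x2_max) - max(x1_min, x2_min))
    ih = max(0.0, min(y1_max, y2_max) - max(y1_min, y2_min))
    inter = iw * ih
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / max(union, 1e-10)   # same epsilon guard as the TF version

# two 2x2 boxes offset by 1 in x: intersection 2, union 6 -> IOU 1/3
print(iou_xywh((1.0, 1.0, 2.0, 2.0), (2.0, 1.0, 2.0, 2.0)))
```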




3. train.py and test.py


train.py can be used to train your own weights; the provided code trains on the pascal_voc dataset. It is annotated as follows:



import os
import argparse
import datetime
import tensorflow as tf
import yolo.config as cfg
from yolo.yolo_net import YOLONet
from utils.timer import Timer
from utils.pascal_voc import pascal_voc

slim = tf.contrib.slim   # TensorFlow's slim module


# this part trains the network weights on the pascal_voc2007 data
class Solver(object):

    def __init__(self, net, data):
        self.net = net
        self.data = data
        self.weights_file = cfg.WEIGHTS_FILE             # pretrained weights, None by default
        self.max_iter = cfg.MAX_ITER                     # 15000
        self.initial_learning_rate = cfg.LEARNING_RATE   # 0.0001
        self.decay_steps = cfg.DECAY_STEPS               # 30000
        self.decay_rate = cfg.DECAY_RATE                 # 0.1
        self.staircase = cfg.STAIRCASE
        self.summary_iter = cfg.SUMMARY_ITER             # log every 10 steps
        self.save_iter = cfg.SAVE_ITER                   # save every 1000 steps
        self.output_dir = os.path.join(                  # output/<year>_<month>_<day>_<hour>_<minute>
            cfg.OUTPUT_DIR, datetime.datetime.now().strftime('%Y_%m_%d_%H_%M'))
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        self.save_cfg()

        self.variable_to_restore = tf.global_variables()
        self.saver = tf.train.Saver(self.variable_to_restore, max_to_keep=None)
        self.ckpt_file = os.path.join(self.output_dir, 'yolo')   # checkpoint path: <output_dir>/yolo
        self.summary_op = tf.summary.merge_all()
        self.writer = tf.summary.FileWriter(self.output_dir, flush_secs=60)

        self.global_step = tf.train.create_global_step()
        # exponentially decayed learning rate:
        # learning_rate = initial_learning_rate * decay_rate^(global_step / decay_steps)
        self.learning_rate = tf.train.exponential_decay(
            self.initial_learning_rate, self.global_step, self.decay_steps,
            self.decay_rate, self.staircase, name='learning_rate')
        self.optimizer = tf.train.GradientDescentOptimizer(
            learning_rate=self.learning_rate)
        self.train_op = slim.learning.create_train_op(
            self.net.total_loss, self.optimizer, global_step=self.global_step)

        gpu_options = tf.GPUOptions()
        config = tf.ConfigProto(gpu_options=gpu_options)
        self.sess = tf.Session(config=config)
        self.sess.run(tf.global_variables_initializer())

        if self.weights_file is not None:   # restore pretrained weights if a file is given
            print('Restoring weights from: ' + self.weights_file)
            self.saver.restore(self.sess, self.weights_file)

        self.writer.add_graph(self.sess.graph)

    def train(self):
        train_timer = Timer()
        load_timer = Timer()

        for step in range(1, self.max_iter + 1):   # up to 15000 iterations

            load_timer.tic()
            images, labels = self.data.get()   # read a batch of images and labels from pascal_voc
            load_timer.toc()
            feed_dict = {self.net.images: images,   # feed dictionary of images and labels
                         self.net.labels: labels}

            if step % self.summary_iter == 0:   # every 10 steps: record a summary
                if step % (self.summary_iter * 10) == 0:   # every 100 steps: also print a log line

                    train_timer.tic()
                    summary_str, loss, _ = self.sess.run(
                        [self.summary_op, self.net.total_loss, self.train_op],
                        feed_dict=feed_dict)
                    train_timer.toc()

                    log_str = '''{} Epoch: {}, Step: {}, Learning rate: {},'''\
                        ''' Loss: {:5.3f}\nSpeed: {:.3f}s/iter,'''\
                        ''' Load: {:.3f}s/iter, Remain: {}'''.format(
                        datetime.datetime.now().strftime('%m-%d %H:%M:%S'),
                        self.data.epoch,
                        int(step),
                        round(self.learning_rate.eval(session=self.sess), 6),
                        loss,
                        train_timer.average_time,
                        load_timer.average_time,
                        train_timer.remain(step, self.max_iter))
                    print(log_str)

                else:   # train and time it without printing
                    train_timer.tic()
                    summary_str, _ = self.sess.run(
                        [self.summary_op, self.train_op],
                        feed_dict=feed_dict)
                    train_timer.toc()

                self.writer.add_summary(summary_str, step)

            else:   # all other steps: just train and time it
                train_timer.tic()
                self.sess.run(self.train_op, feed_dict=feed_dict)
                train_timer.toc()

            if step % self.save_iter == 0:   # save a checkpoint every 1000 steps
                print('{} Saving checkpoint file to: {}'.format(
                    datetime.datetime.now().strftime('%m-%d %H:%M:%S'),
                    self.output_dir))
                self.saver.save(
                    self.sess, self.ckpt_file, global_step=self.global_step)

    def save_cfg(self):
        # write the current configuration to <output_dir>/config.txt
        with open(os.path.join(self.output_dir, 'config.txt'), 'w') as f:
            cfg_dict = cfg.__dict__
            for key in sorted(cfg_dict.keys()):
                if key[0].isupper():
                    cfg_str = '{}: {}\n'.format(key, cfg_dict[key])
                    f.write(cfg_str)


def update_config_paths(data_dir, weights_file):
    cfg.DATA_PATH = data_dir
    cfg.PASCAL_PATH = os.path.join(data_dir, 'pascal_voc')
    cfg.CACHE_PATH = os.path.join(cfg.PASCAL_PATH, 'cache')
    cfg.OUTPUT_DIR = os.path.join(cfg.PASCAL_PATH, 'output')
    cfg.WEIGHTS_DIR = os.path.join(cfg.PASCAL_PATH, 'weights')   # weights live in pascal_voc/weights
    cfg.WEIGHTS_FILE = os.path.join(cfg.WEIGHTS_DIR, weights_file)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', default="YOLO_small.ckpt", type=str)
    parser.add_argument('--data_dir', default="data", type=str)
    parser.add_argument('--threshold', default=0.2, type=float)
    parser.add_argument('--iou_threshold', default=0.5, type=float)
    parser.add_argument('--gpu', default='', type=str)
    args = parser.parse_args()

    if args.gpu is not None:   # pass the --gpu argument through to the config
        cfg.GPU = args.gpu

    if args.data_dir != cfg.DATA_PATH:   # update the config paths if --data_dir differs
        update_config_paths(args.data_dir, args.weights)

    os.environ['CUDA_VISIBLE_DEVICES'] = cfg.GPU

    yolo = YOLONet()              # instantiate the network
    pascal = pascal_voc('train')  # training data loader

    solver = Solver(yolo, pascal)

    print('Start training ...')
    solver.train()
    print('Done training.')


if __name__ == '__main__':
    # python train.py --weights YOLO_small.ckpt --gpu 0
    main()
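The exponential_decay schedule the Solver uses can be reproduced by hand; a plain-Python sketch of the formula with the config values (the `decayed_lr` helper is mine; with staircase=True the exponent is truncated to an integer):

```python
def decayed_lr(initial_lr, global_step, decay_steps, decay_rate, staircase=True):
    # learning_rate = initial_lr * decay_rate ** (global_step / decay_steps)
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps   # integer division: decay in discrete jumps
    return initial_lr * decay_rate ** exponent

# with DECAY_STEPS=30000 and MAX_ITER=15000, the staircase never fires,
# so the rate stays at 0.0001 for the whole default training run
print(decayed_lr(0.0001, 15000, 30000, 0.1))  # -> 0.0001
print(decayed_lr(0.0001, 60000, 30000, 0.1))  # roughly 1e-06
```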

If you just want to try the model, you can load pretrained weights (the YOLO_small.ckpt linked at the top of this article) or the weights produced by train.py. test.py is annotated as follows:



import os
import cv2
import argparse
import numpy as np
import tensorflow as tf
import yolo.config as cfg
from yolo.yolo_net import YOLONet
from utils.timer import Timer

# this part loads trained weights for testing; the weights can be the downloaded
# YOLO_small.ckpt or your own trained checkpoint


class Detector(object):

    def __init__(self, net, weight_file):
        self.net = net
        self.weights_file = weight_file
        self.classes = cfg.CLASSES
        self.num_class = len(self.classes)
        self.image_size = cfg.IMAGE_SIZE
        self.cell_size = cfg.CELL_SIZE
        self.boxes_per_cell = cfg.BOXES_PER_CELL
        self.threshold = cfg.THRESHOLD
        self.iou_threshold = cfg.IOU_THRESHOLD
        self.boundary1 = self.cell_size * self.cell_size * self.num_class
        self.boundary2 = self.boundary1 +\
            self.cell_size * self.cell_size * self.boxes_per_cell

        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

        print('Restoring weights from: ' + self.weights_file)
        self.saver = tf.train.Saver()
        self.saver.restore(self.sess, self.weights_file)   # load the weight file

    def draw_result(self, img, result):
        # draw each detection on img: box, class name, and probability
        for i in range(len(result)):
            x = int(result[i][1])       # box center x
            y = int(result[i][2])       # box center y
            w = int(result[i][3] / 2)   # half width
            h = int(result[i][4] / 2)   # half height
            cv2.rectangle(img, (x - w, y - h), (x + w, y + h), (0, 255, 0), 2)   # bounding box
            cv2.rectangle(img, (x - w, y - h - 20),
                          (x + w, y - h), (125, 125, 125), -1)   # grey label background
            lineType = cv2.LINE_AA if cv2.__version__ > '3' else cv2.CV_AA   # handle both OpenCV versions
            cv2.putText(
                img, result[i][0] + ' : %.2f' % result[i][5],   # probability with two decimals
                (x - w + 5, y - h - 7), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                (0, 0, 0), 1, lineType)

    def detect(self, img):
        img_h, img_w, _ = img.shape
        inputs = cv2.resize(img, (self.image_size, self.image_size))   # resize to 448x448
        inputs = cv2.cvtColor(inputs, cv2.COLOR_BGR2RGB).astype(np.float32)   # OpenCV reads BGR; convert to RGB
        inputs = (inputs / 255.0) * 2.0 - 1.0   # normalize to [-1, 1]
        inputs = np.reshape(inputs, (1, self.image_size, self.image_size, 3))   # -> [1, 448, 448, 3]

        result = self.detect_from_cvmat(inputs)[0]

        for i in range(len(result)):
            # the detections are in 448x448 coordinates; scale back to the original image
            result[i][1] *= (1.0 * img_w / self.image_size)
            result[i][2] *= (1.0 * img_h / self.image_size)
            result[i][3] *= (1.0 * img_w / self.image_size)
            result[i][4] *= (1.0 * img_h / self.image_size)

        return result   # detections in original-image coordinates

    def detect_from_cvmat(self, inputs):
        net_output = self.sess.run(self.net.logits,   # run the network on inputs [1, 448, 448, 3]
                                   feed_dict={self.net.images: inputs})
        results = []
        for i in range(net_output.shape[0]):   # decode each image's output
            results.append(self.interpret_output(net_output[i]))

        return results   # detections in 448x448 coordinates

    def interpret_output(self, output):
        probs = np.zeros((self.cell_size, self.cell_size,
                          self.boxes_per_cell, self.num_class))   # class scores for all 98 boxes, [7,7,2,20]
        class_probs = np.reshape(
            output[0:self.boundary1],   # the first 7x7x20 numbers: per-cell class probabilities
            (self.cell_size, self.cell_size, self.num_class))
        scales = np.reshape(
            output[self.boundary1:self.boundary2],   # next 7x7x2 numbers: per-box confidences
            (self.cell_size, self.cell_size, self.boxes_per_cell))
        boxes = np.reshape(
            output[self.boundary2:],   # remaining numbers: box coordinates, [7,7,2,4]
            (self.cell_size, self.cell_size, self.boxes_per_cell, 4))
        offset = np.array(
            [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell)
        offset = np.transpose(
            np.reshape(
                offset,
                [self.boxes_per_cell, self.cell_size, self.cell_size]),
            (1, 2, 0))   # offset: [2,7,7] -> [7,7,2]

        boxes[:, :, :, 0] += offset
        boxes[:, :, :, 1] += np.transpose(offset, (1, 0, 2))
        boxes[:, :, :, :2] = 1.0 * boxes[:, :, :, 0:2] / self.cell_size
        boxes[:, :, :, 2:] = np.square(boxes[:, :, :, 2:])

        boxes *= self.image_size   # map cell-relative offsets onto the 448x448 image

        for i in range(self.boxes_per_cell):
            for j in range(self.num_class):
                probs[:, :, i, j] = np.multiply(   # class probability x box confidence
                    class_probs[:, :, j], scales[:, :, i])

        filter_mat_probs = np.array(probs >= self.threshold, dtype='bool')   # keep scores >= 0.2
        filter_mat_boxes = np.nonzero(filter_mat_probs)   # indices of the surviving boxes
        boxes_filtered = boxes[filter_mat_boxes[0],
                               filter_mat_boxes[1], filter_mat_boxes[2]]
        probs_filtered = probs[filter_mat_probs]
        classes_num_filtered = np.argmax(
            filter_mat_probs, axis=3)[
            filter_mat_boxes[0], filter_mat_boxes[1], filter_mat_boxes[2]]

        argsort = np.array(np.argsort(probs_filtered))[::-1]
        boxes_filtered = boxes_filtered[argsort]     # sort by score, descending
        probs_filtered = probs_filtered[argsort]
        classes_num_filtered = classes_num_filtered[argsort]

        # non-maximum suppression: zero out boxes that overlap a higher-scoring box
        for i in range(len(boxes_filtered)):
            if probs_filtered[i] == 0:
                continue
            for j in range(i + 1, len(boxes_filtered)):
                if self.iou(boxes_filtered[i], boxes_filtered[j]) > self.iou_threshold:
                    probs_filtered[j] = 0.0

        filter_iou = np.array(probs_filtered > 0.0, dtype='bool')
        boxes_filtered = boxes_filtered[filter_iou]
        probs_filtered = probs_filtered[filter_iou]
        classes_num_filtered = classes_num_filtered[filter_iou]

        result = []
        for i in range(len(boxes_filtered)):
            result.append(
                [self.classes[classes_num_filtered[i]],
                 boxes_filtered[i][0],
                 boxes_filtered[i][1],
                 boxes_filtered[i][2],
                 boxes_filtered[i][3],
                 probs_filtered[i]])

        return result   # surviving classes and their box coordinates

    def iou(self, box1, box2):
        tb = min(box1[0] + 0.5 * box1[2], box2[0] + 0.5 * box2[2]) - \
            max(box1[0] - 0.5 * box1[2], box2[0] - 0.5 * box2[2])   # overlap width
        lr = min(box1[1] + 0.5 * box1[3], box2[1] + 0.5 * box2[3]) - \
            max(box1[1] - 0.5 * box1[3], box2[1] - 0.5 * box2[3])   # overlap height
        inter = 0 if tb < 0 or lr < 0 else tb * lr                  # overlap area
        return inter / (box1[2] * box1[3] + box2[2] * box2[3] - inter)   # IOU = inter / union

    def camera_detector(self, cap, wait=10):
        # detect on camera frames, 10 ms delay between frames
        detect_timer = Timer()
        ret, _ = cap.read()

        while ret:
            ret, frame = cap.read()
            detect_timer.tic()
            result = self.detect(frame)
            detect_timer.toc()
            print('Average detecting time: {:.3f}s'.format(
                detect_timer.average_time))

            self.draw_result(frame, result)
            cv2.imshow('Camera', frame)
            cv2.waitKey(wait)

            ret, frame = cap.read()

    def image_detector(self, imname, wait=0):
        # detect on a single image and display it until a key is pressed
        detect_timer = Timer()
        image = cv2.imread(imname)

        detect_timer.tic()
        result = self.detect(image)
        detect_timer.toc()
        print('Average detecting time: {:.3f}s'.format(
            detect_timer.average_time))

        self.draw_result(image, result)
        cv2.imshow('Image', image)
        cv2.waitKey(wait)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', default="YOLO_small.ckpt", type=str)
    parser.add_argument('--weight_dir', default='weights', type=str)
    parser.add_argument('--data_dir', default="data", type=str)
    parser.add_argument('--gpu', default='', type=str)
    args = parser.parse_args()

    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu

    yolo = YOLONet(False)
    weight_file = os.path.join(args.data_dir, args.weight_dir, args.weights)   # weight-file path
    detector = Detector(yolo, weight_file)

    # detect from camera
    # cap = cv2.VideoCapture(-1)
    # detector.camera_detector(cap)

    # detect from image file
    imname = 'test/person.jpg'
    detector.image_detector(imname)


if __name__ == '__main__':
    main()
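The suppression loop in interpret_output is a standard greedy NMS; a standalone plain-Python sketch of the same idea (function names, boxes, and scores are mine, boxes given as (x_center, y_center, w, h)):

```python
def iou(b1, b2):
    # overlap / union for (x_center, y_center, w, h) boxes, same formula as Detector.iou
    tb = min(b1[0] + b1[2] / 2, b2[0] + b2[2] / 2) - max(b1[0] - b1[2] / 2, b2[0] - b2[2] / 2)
    lr = min(b1[1] + b1[3] / 2, b2[1] + b2[3] / 2) - max(b1[1] - b1[3] / 2, b2[1] - b2[3] / 2)
    inter = 0.0 if tb < 0 or lr < 0 else tb * lr
    return inter / (b1[2] * b1[3] + b2[2] * b2[3] - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    # sort by score descending, then zero out any box that overlaps a
    # higher-scoring kept box, mirroring the probs_filtered[j] = 0.0 loop
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    boxes = [boxes[i] for i in order]
    scores = [scores[i] for i in order]
    for i in range(len(boxes)):
        if scores[i] == 0.0:
            continue
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > iou_threshold:
                scores[j] = 0.0
    return [(b, s) for b, s in zip(boxes, scores) if s > 0.0]

# two heavily overlapping boxes and one distant box: the weaker duplicate is dropped
kept = greedy_nms([(100, 100, 50, 50), (105, 100, 50, 50), (300, 300, 40, 40)],
                  [0.9, 0.6, 0.8])
print(len(kept))  # -> 2
```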

To test a different image, just replace test/person.jpg with the path and filename of your own image.

To use a camera as the input source instead, uncomment

    # cap = cv2.VideoCapture(-1)
    # detector.camera_detector(cap)

and comment out the following two lines:

    imname = 'test/person.jpg'
    detector.image_detector(imname)




My results are as follows (environment: Windows 10, running in Spyder; TensorFlow 1.4 or later is recommended):


I placed the downloaded YOLO_small.ckpt in the weights directory. Detecting a single image took 2.527 s.

If you get an error when running on video, add reuse=True to

with tf.variable_scope(scope):

and it should work. Whether to pass 0, -1, or 1 as the camera index depends on your device.

YOLO recognizes cartoons surprisingly well; the only drawback is that the current model covers too few classes.


References:

https://blog.csdn.net/qq1483661204/article/details/79681926

https://blog.csdn.net/qq_34784753/article/details/78803423

