YOLOv1 Code Reproduction

Author: Min2502857657_377 | Source: Internet | 2023-10-10 14:29


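1. YOLO v1 Overview
Two-stage object detection algorithms split detection and recognition into two steps: region proposal extraction followed by object recognition. Because a proposal step precedes classification and box regression, two-stage algorithms reach relatively high recognition rates and box accuracy, but at a very high computational cost. YOLOv1, the first of the YOLO series, skips region proposal extraction altogether: it divides the input image into a grid, and each grid cell is responsible for predicting the objects whose center falls inside it. The absence of region proposals is also why its regression accuracy is comparatively lower. YOLO v1 is end-to-end: it predicts directly, rather than using region proposals to turn detection into a classification problem.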

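**One-stage vs. two-stage**

| | One-stage | Two-stage |
|---|---|---|
| Advantages | Fast inference and fast training; low background false-detection rate | High accuracy; high localization accuracy and high detection rate |
| Disadvantages | Low localization accuracy and low detection rate; poor small-object detection | Slow inference and slow training; high background false-detection rate |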

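2. YOLOv1 Network Structure
In the version of YOLO v1 implemented by its author, the input image size is fixed at 448×448. The 24 convolutional layers produce a 7×7×1024 feature map, which the 2 fully connected layers then turn into predictions. This matches the idea of dividing the original image into S×S cells: every vector on the feature map carries the high-level abstract semantic information needed by the later prediction task.
As shown in the figure, YOLO v1 divides an image into S×S cells, which the author calls grid cells. For a 448×448 image, the convolutional layers extract a 7×7×1024 feature map. Each 1×1×1024 vector on the feature map holds the features extracted from one grid cell of the original image, and different channels correspond to different abstract semantic information. Each grid cell predicts two bounding boxes plus the class of the object it contains, and a final NMS step removes redundant bounding boxes to produce the detections.
As shown in the figure, the YOLO v1 architecture consists of 24 convolutional layers, 4 max-pooling layers and 2 fully connected layers. The convolution and pooling layers extract features; the fully connected layers make the predictions. The fully connected output is 7×7×30 (30 = 2 boxes × 5 values + 20 class probabilities), where 7×7 corresponds to the 7×7 grid cells the original image is divided into.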
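The grid bookkeeping described above can be sketched in a few lines of plain Python. This is only an illustration under the settings stated in the article (448×448 input, S = 7, B = 2, 20 VOC classes); the helper name `responsible_cell` is hypothetical and does not come from the author's repository.

```python
S, B, C = 7, 2, 20      # grid size, boxes per cell, number of classes (VOC)
INPUT_SIZE = 448        # side length of the network input in pixels


def responsible_cell(cx, cy, input_size=INPUT_SIZE, s=S):
    """Return (row, col) of the grid cell that contains the pixel center (cx, cy).

    Hypothetical helper for illustration only.
    """
    cell = input_size / s                  # 64 pixels per cell for 448 / 7
    col = min(int(cx // cell), s - 1)      # clamp points lying on the far edge
    row = min(int(cy // cell), s - 1)
    return row, col


# Each cell predicts B boxes of 5 values (x, y, w, h, confidence) plus C class scores.
depth = B * 5 + C
print(depth)                         # 30
print(responsible_cell(224, 100))    # (1, 3)
```

An object centered at pixel (224, 100) therefore belongs to the cell in row 1, column 3, and that cell's 30-dimensional vector is the one trained to describe it.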
Definition of the pretraining model structure:
    import torch
    import torch.nn as nn


    class Convention(nn.Module):
        # Conv2d -> LeakyReLU -> optional BatchNorm2d
        def __init__(self, in_channels, out_channels, conv_size, conv_stride, padding, need_bn=True):
            super(Convention, self).__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, conv_size, conv_stride, padding, bias=False if need_bn else True)
            self.leaky_relu = nn.LeakyReLU(inplace=True, negative_slope=1e-1)
            self.need_bn = need_bn
            if need_bn:
                self.bn = nn.BatchNorm2d(out_channels)

        def forward(self, x):
            return self.bn(self.leaky_relu(self.conv(x))) if self.need_bn else self.leaky_relu(self.conv(x))

        def weight_init(self):
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    torch.nn.init.kaiming_normal_(m.weight.data)
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()


    class YOLO_Feature(nn.Module):
        def __init__(self, classes_num=80):
            super(YOLO_Feature, self).__init__()
            self.Conv_Feature = nn.Sequential(
                Convention(3, 64, 7, 2, 3),
                nn.MaxPool2d(2, 2),

                Convention(64, 192, 3, 1, 1),
                nn.MaxPool2d(2, 2),

                Convention(192, 128, 1, 1, 0),
                Convention(128, 256, 3, 1, 1),
                Convention(256, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                nn.MaxPool2d(2, 2),

                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
                nn.MaxPool2d(2, 2),
            )
            self.Conv_Semanteme = nn.Sequential(
                Convention(1024, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
                Convention(1024, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
            )
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.linear = nn.Linear(1024, classes_num)

        def forward(self, x):
            x = self.Conv_Feature(x)
            x = self.Conv_Semanteme(x)
            x = self.avg_pool(x)
            # batch_size * channel * width * height
            x = x.permute(0, 2, 3, 1)
            x = torch.flatten(x, start_dim=1, end_dim=3)
            x = self.linear(x)
            return x

        # weight initialization
        def initialize_weights(self):
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    torch.nn.init.kaiming_normal_(m.weight.data)
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
                elif isinstance(m, nn.Linear):
                    torch.nn.init.kaiming_normal_(m.weight.data)
                    m.bias.data.zero_()
                elif isinstance(m, Convention):
                    m.weight_init()
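As a sanity check on the layer list above, the spatial size of the feature map can be traced with the standard convolution output-size formula. The sketch below lists only the layers that change the resolution (the stride-1 1×1 and padded 3×3 convolutions keep it unchanged). The names `conv_out`, `BACKBONE` and `HEAD_DOWNSAMPLE` are illustrative; the last entry stands for the stride-2 3×3 convolution that appears in the detection network later in the article.

```python
def conv_out(n, kernel, stride, padding):
    """Output side length of a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) of the size-changing layers in Conv_Feature.
BACKBONE = [
    ("conv 7x7, stride 2", 7, 2, 3),
    ("maxpool 2x2", 2, 2, 0),
    ("maxpool 2x2", 2, 2, 0),
    ("maxpool 2x2", 2, 2, 0),
    ("maxpool 2x2", 2, 2, 0),
]
# The only size-changing layer of the detection head (Conv_Back).
HEAD_DOWNSAMPLE = ("conv 3x3, stride 2", 3, 2, 1)


def trace(n, layers):
    """Push an input side length n through the listed layers."""
    for _, kernel, stride, padding in layers:
        n = conv_out(n, kernel, stride, padding)
    return n


print(trace(256, BACKBONE))                      # 8  (pretraining input, before global pooling)
print(trace(448, BACKBONE))                      # 14
print(trace(448, BACKBONE + [HEAD_DOWNSAMPLE]))  # 7  (the 7x7 detection grid)
```

With the 256×256 pretraining input the backbone ends at 8×8 before global average pooling; with the 448×448 detection input it ends at 14×14, and the extra stride-2 convolution brings it down to the 7×7 grid.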
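Note: here the author replaced the first 20 plain convolutions of YOLOv1 with Conv+BN blocks, using the BN layers to speed up convergence. Today BN plus residual connections has become standard for CNN feature extraction. It is worth noting, though, that in a GAN generator layer normalization (LayerNorm) is generally the better fit; different fields have their own suitable methods.
Pretraining uses objects from the COCO dataset (if you have the resources, pretraining on ImageNet is still recommended):
Note: the author previously trained on the Tiny ImageNet dataset and found the results very poor. Inspecting the data showed that in the Tiny datasets the object to be classified occupies only a small fraction of the image, and because the YOLOv1 pretraining network uses global average pooling, this makes the convergence problem worse. For example, suppose both images belong to a fish class: one shows a fish filling most of the frame, the other shows a person holding a fish. After global pooling the second image carries many human features, yet we push it toward the fish class. When we later meet an image whose label is person, we have to push similar features toward the person class. The network therefore keeps swinging between "a person is a fish" and "a person is a person" and cannot converge.
The author's alternative is to use the existing bounding-box annotations of the COCO dataset, crop out the image regions with the least irrelevant information, and train on those regions.
Advantage: less irrelevant information, so the network trains and converges more easily.
Disadvantage: simpler images make the task simpler, and the network may not learn to use the background to help judge the object.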
    import cv2
    import os
    import time
    import random
    import imagesize
    import numpy as np
    from utils import image
    from torch.utils.data import Dataset
    import torchvision.transforms as transforms


    class coco_classify_dataset(Dataset):
        def __init__(self, imgs_path="../DataSet/COCO2017/Train/Imgs", txts_path="../DataSet/COCO2017/Train/Labels", is_train=True, edge_threshold=200, class_num=80, input_size=256):  # input_size: side length of the input image
            img_names = os.listdir(txts_path)
            self.is_train = is_train

            self.transform_common = transforms.Compose([
                transforms.ToTensor(),  # height * width * channel -> channel * height * width
                transforms.Normalize(mean=(0.408, 0.448, 0.471), std=(0.242, 0.239, 0.234))  # after normalization, exploding gradients are less likely
            ])

            self.input_size = input_size
            self.train_data = []  # [img_path, [[coord, class_id]]]

            for img_name in img_names:
                img_path = os.path.join(imgs_path, img_name.replace(".txt", ".jpg"))
                txt_path = os.path.join(txts_path, img_name)

                coords = []

                with open(txt_path, 'r') as label_txt:
                    for label in label_txt:
                        label = label.replace("\n", "").split(" ")
                        class_id = int(label[4])

                        if class_id >= class_num:
                            continue

                        xmin = round(float(label[0]))
                        ymin = round(float(label[1]))
                        xmax = round(float(label[2]))
                        ymax = round(float(label[3]))

                        if (xmax - xmin)
                        # (the listing is cut off at this point in the source)
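Because the listing above is cut off, the following is only a guess at the kind of filtering the `edge_threshold` parameter suggests, not the author's code. It assumes that boxes whose width or height is below the threshold are skipped and that the remaining boxes are cropped out as classification samples. The function names `select_boxes` and `crop`, and the use of nested lists as a stand-in for an image array, are hypothetical.

```python
def select_boxes(boxes, edge_threshold=200, class_num=80):
    """Keep boxes with a valid class whose width and height both reach edge_threshold.

    Assumption: the source listing is truncated, so this rule is inferred,
    not copied from the author's implementation.
    """
    kept = []
    for xmin, ymin, xmax, ymax, class_id in boxes:
        if class_id >= class_num:
            continue
        if (xmax - xmin) < edge_threshold or (ymax - ymin) < edge_threshold:
            continue
        kept.append((xmin, ymin, xmax, ymax, class_id))
    return kept


def crop(image_rows, box):
    """Cut the box region out of an image stored as a list of pixel rows."""
    xmin, ymin, xmax, ymax, _ = box
    return [row[xmin:xmax] for row in image_rows[ymin:ymax]]


boxes = [(0, 0, 250, 300, 3), (10, 10, 50, 400, 5), (0, 0, 300, 300, 90)]
kept = select_boxes(boxes)
print(kept)                            # [(0, 0, 250, 300, 3)]

image_rows = [[0] * 500 for _ in range(400)]   # dummy 400 x 500 image
patch = crop(image_rows, kept[0])
print(len(patch), len(patch[0]))       # 300 250
```

With an image loaded by `cv2.imread`, the same crop is the array slice `img[ymin:ymax, xmin:xmax]`.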
Note: the author applies data augmentation during training and none during validation.
Training procedure:
    #------0.common variable definition------
    import torch
    import argparse
    import torch.nn as nn
    from tqdm import tqdm
    import torch.optim as optim
    from utils.model import accuracy
    from tensorboardX import SummaryWriter
    from torch.utils.data import DataLoader
    from utils.model import feature_map_visualize
    from YOLO.PreTrain.YOLO_Feature import YOLO_Feature
    from YOLO.PreTrain.COCO_Classify_DataSet import coco_classify_dataset

    if torch.cuda.is_available():
        device = torch.device('cuda:0')
        torch.backends.cudnn.benchmark = True
    else:
        device = torch.device('cpu')

    if __name__ == "__main__":
        # 1.training parameters
        parser = argparse.ArgumentParser(description="YOLO_Feature train config")
        parser.add_argument('--batch_size', type=int, help="YOLO_Feature train batch_size", default=32)
        parser.add_argument('--num_workers', type=int, help="YOLO_Feature train num_worker num", default=4)
        parser.add_argument('--lr', type=float, help="lr", default=3e-4)
        parser.add_argument('--weight_decay', type=float, help="weight_decay", default=0.0005)
        parser.add_argument('--epoch_num', type=int, help="YOLO_Feature train epoch_num", default=200)
        parser.add_argument('--epoch_interval', type=int, help="save YOLO_Feature interval", default=10)
        parser.add_argument('--class_num', type=int, help="YOLO_Feature train class_num", default=80)
        parser.add_argument('--train_imgs', type=str, help="YOLO_Feature train train_imgs", default="../../DataSet/COCO2017/Train/Imgs")
        parser.add_argument('--train_labels', type=str, help="YOLO_Feature train train_labels", default="../../DataSet/COCO2017/Train/Labels")
        parser.add_argument('--val_imgs', type=str, help="YOLO_Feature train val_imgs", default="../../DataSet/COCO2017/Val/Imgs")
        parser.add_argument('--val_labels', type=str, help="YOLO_Feature train val_labels", default="../../DataSet/COCO2017/Val/Labels")
        parser.add_argument('--grad_visualize', type=bool, help="YOLO_Feature train grad visualize", default=False)
        parser.add_argument('--feature_map_visualize', type=bool, help="YOLO_Feature train feature map visualize", default=False)
        parser.add_argument('--restart', type=bool, help="YOLO_Feature train from zero?", default=True)
        parser.add_argument('--pre_weight_file', type=str, help="YOLO_Feature pre weight path", default="./weights/YOLO_Feature_20.pth")
        args = parser.parse_args()

        batch_size = args.batch_size
        num_workers = args.num_workers
        epoch_num = args.epoch_num
        epoch_interval = args.epoch_interval
        class_num = args.class_num

        if args.restart == True:
            lr = args.lr
            param_dict = {}
            epoch = 0
            epoch_val_loss_min = 999999999
        else:
            param_dict = torch.load(args.pre_weight_file, map_location=torch.device("cpu"))
            optimal_dict = param_dict['optimal']
            epoch = param_dict['epoch']
            epoch_val_loss_min = param_dict['epoch_val_loss_min']

        # 2.dataset
        train_dataSet = coco_classify_dataset(imgs_path=args.train_imgs, txts_path=args.train_labels, is_train=True, edge_threshold=200)
        val_dataSet = coco_classify_dataset(imgs_path=args.val_imgs, txts_path=args.val_labels, is_train=False, edge_threshold=200)

        # 3-4.network - optimizer
        yolo_feature = YOLO_Feature(classes_num=class_num)
        if args.restart == True:
            yolo_feature.initialize_weights()
            optimizer = optim.Adam(params=yolo_feature.parameters(), lr=args.lr, weight_decay=args.weight_decay)
        else:
            yolo_feature.load_state_dict(param_dict['model'])
            optimizer = param_dict['optimizer']
        yolo_feature.to(device=device, non_blocking=True)

        # 5.loss
        loss_function = nn.CrossEntropyLoss().to(device=device)

        # 6.train and record
        input_size = 256
        writer = SummaryWriter(logdir='./log', filename_suffix=' [' + str(epoch) + '~' + str(epoch + epoch_interval) + ']')

        while epoch
        # (the listing is cut off at this point in the source)

3. YOLOv1 Output Structure
As shown in the figure, the author trained and tested YOLO v1 on the VOC dataset (20 classes). In the predicted output tensor, the first two groups of 5 values hold, for each of the two bounding boxes, the object confidence together with the box center coordinates, width and height; the last 20 values are the probabilities of the 20 classes.

IoU (intersection over union)
In object detection IoU is an important metric: the ratio of the intersection area to the union area of two boxes measures how closely they overlap. For computing rectangle intersections, see: 223. 矩形面积 (Rectangle Area), The Shawshank Redemption, CSDN blog.

    def iou(self, box1, box2):  # compute the IoU of two boxes
        # box: lx = top-left x, ly = top-left y, rx = bottom-right x, ry = bottom-right y; y runs to the right of the image, x runs downward
        # 1. top-left and bottom-right corners of the intersection rectangle
        interLX = max(box1[0], box2[0])
        interLY = max(box1[1], box2[1])
        interRX = min(box1[2], box2[2])
        interRY = min(box1[3], box2[3])

        # 2. area of each rectangle
        Area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        Area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

        # 3. no intersection
        if interRX
        # (the rest of this listing is missing in the source)

Confidence: the author predicts a value that accounts both for whether an object is present and for how accurate the localization is (in the YOLOv1 paper, Pr(Object) × IoU between the predicted box and the ground truth). We use the sigmoid function to squash the output into (0, 1); when building the ground truth, we simply compute it according to this convention.

Training for object detection on the VOC dataset
The YOLOv1 network structure used for object detection:

    import torch
    import torch.nn as nn
    from YOLO.PreTrain.YOLO_Feature import Convention


    class YOLOv1(nn.Module):
        def __init__(self, B=2, classes_num=20):
            super(YOLOv1, self).__init__()
            self.B = B
            self.classes_num = classes_num

            self.Conv_Feature = nn.Sequential(
                Convention(3, 64, 7, 2, 3),
                nn.MaxPool2d(2, 2),

                Convention(64, 192, 3, 1, 1),
                nn.MaxPool2d(2, 2),

                Convention(192, 128, 1, 1, 0),
                Convention(128, 256, 3, 1, 1),
                Convention(256, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                nn.MaxPool2d(2, 2),

                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 256, 1, 1, 0),
                Convention(256, 512, 3, 1, 1),
                Convention(512, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
                nn.MaxPool2d(2, 2),
            )

            self.Conv_Semanteme = nn.Sequential(
                Convention(1024, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
                Convention(1024, 512, 1, 1, 0),
                Convention(512, 1024, 3, 1, 1),
            )

            self.Conv_Back = nn.Sequential(
                Convention(1024, 1024, 3, 1, 1, need_bn=False),
                Convention(1024, 1024, 3, 2, 1, need_bn=False),
                Convention(1024, 1024, 3, 1, 1, need_bn=False),
                Convention(1024, 1024, 3, 1, 1, need_bn=False),
            )

            self.Fc = nn.Sequential(
                nn.Linear(7 * 7 * 1024, 4096),
                nn.LeakyReLU(inplace=True, negative_slope=1e-1),
                nn.Linear(4096, 7 * 7 * (B * 5 + classes_num)),
            )

            self.sigmoid = nn.Sigmoid()
            self.softmax = nn.Softmax(dim=3)

        def forward(self, x):
            x = self.Conv_Feature(x)
            x = self.Conv_Semanteme(x)
            x = self.Conv_Back(x)
            # batch_size * channel * height * width -> batch_size * height * width * channel
            x = x.permute(0, 2, 3, 1)
            x = torch.flatten(x, start_dim=1, end_dim=3)
            x = self.Fc(x)
            x = x.view(-1, 7, 7, (self.B * 5 + self.classes_num))
            #print("x seg:{}".format(x[:,:,:,0 : self.B * 5]))
            bnd_coord = self.sigmoid(x[:, :, :, 0 : self.B * 5])
            #print("bnd_coord:{}".format(bnd_coord))
            bnd_cls = self.softmax(x[:, :, :, self.B * 5 :])
            bnd = torch.cat([bnd_coord, bnd_cls], dim=3)
            #x = self.sigmoid(x.view(-1,7,7,(self.B * 5 + self.classes_num)))
            #x[:,:,:, 0 : self.B * 5] = self.sigmoid(x[:,:,:, 0 : self.B * 5])
            #x[:,:,:, self.B * 5 : ] = self.softmax(x[:,:,:, self.B * 5 : ])
            return bnd

        # weight initialization
        def initialize_weights(self, net_param_dict):
            for name, m in self.named_modules():
                if isinstance(m, nn.Conv2d):
                    torch.nn.init.kaiming_normal_(m.weight.data)
                elif isinstance(m, nn.BatchNorm2d):
                    m.weight.data.fill_(1)
                    m.bias.data.zero_()
                elif isinstance(m, nn.Linear):
                    torch.nn.init.kaiming_normal_(m.weight.data)
                    m.bias.data.zero_()
                elif isinstance(m, Convention):
                    m.weight_init()

            # copy over the pretrained parameters whose names match
            self_param_dict = self.state_dict()
            for name, layer in self.named_parameters():
                if name in net_param_dict:
                    self_param_dict[name] = net_param_dict[name]
            self.load_state_dict(self_param_dict)

The VOC object detection dataset class:

    from torch.utils.data import Dataset
    import os
    import cv2
    import xml.etree.ElementTree as ET
    import torchvision.transforms as transforms
    import numpy as np
    import random
    import torch
    from utils import image


    class VOC_Detection_Set(Dataset):
        def __init__(self, imgs_path="../DataSet/VOC2007+2012/Train/JPEGImages", annotations_path="../DataSet/VOC2007+2012/Train/Annotations", classes_file="../DataSet/VOC2007+2012/class.data", is_train=True, class_num=20, label_smooth_value=0.05, input_size=448, grid_size=64, loss_mode="mse"):  # input_size: side length of the input image
            self.label_smooth_value = label_smooth_value
            self.class_num = class_num
            self.imgs_name = os.listdir(imgs_path)
            self.input_size = input_size
            self.grid_size = grid_size
            self.is_train = is_train
            self.transform_common = transforms.Compose([
                transforms.ToTensor(),  # height * width * channel -> channel * height * width
                transforms.Normalize(mean=(0.408, 0.448, 0.471), std=(0.242, 0.239, 0.234))  # after normalization, exploding gradients are less likely
            ])
            self.imgs_path = imgs_path
            self.annotations_path = annotations_path
            self.class_dict = {}
            self.loss_mode = loss_mode

            class_index = 0
            with open(classes_file, 'r') as file:
                for class_name in file:
                    class_name = class_name.replace('\n', '')
                    self.class_dict[class_name] = class_index  # build the index from the class name
                    class_index = class_index + 1

        def __getitem__(self, item):
            img_path = os.path.join(self.imgs_path, self.imgs_name[item])
            annotation_path = os.path.join(self.annotations_path, self.imgs_name[item].replace(".jpg", ".xml"))
            img = cv2.imread(img_path)
            tree = ET.parse(annotation_path)
            annotation_xml = tree.getroot()

            objects_xml = annotation_xml.findall("object")
            coords = []

            for object_xml in objects_xml:
                bnd_xml = object_xml.find("bndbox")
                class_name = object_xml.find("name").text
                if class_name not in self.class_dict:  # not one of the classes we defined
                    continue
                xmin = round(float(bnd_xml.find("xmin").text))
                ymin = round(float(bnd_xml.find("ymin").text))
                xmax = round(float(bnd_xml.find("xmax").text))
                ymax = round(float(bnd_xml.find("ymax").text))
                class_id = self.class_dict[class_name]
                coords.append([xmin, ymin, xmax, ymax, class_id])

            # sort by box area, smallest first
            coords.sort(key=lambda coord: (coord[2] - coord[0]) * (coord[3] - coord[1]))

            if self.is_train:
                transform_seed = random.randint(0, 4)

                if transform_seed == 0:  # original image
                    img, coords = image.resize_image_with_coords(img, self.input_size, self.input_size, coords)
                    img = self.transform_common(img)

                elif transform_seed == 1:  # scale + center crop
                    img, coords = image.center_crop_with_coords(img, coords)
                    img, coords = image.resize_image_with_coords(img, self.input_size, self.input_size, coords)
                    img = self.transform_common(img)

                elif transform_seed == 2:  # translation
                    img, coords = image.transplant_with_coords(img, coords)
                    img, coords = image.resize_image_with_coords(img, self.input_size, self.input_size, coords)
                    img = self.transform_common(img)

                else:  # exposure adjustment
                    img, coords = image.resize_image_with_coords(img, self.input_size, self.input_size, coords)
                    img = image.exposure(img, gamma=0.5)
                    img = self.transform_common(img)

            else:
                img, coords = image.resize_image_with_coords(img, self.input_size, self.input_size, coords)
                img = self.transform_common(img)

            ground_truth, ground_mask_positive, ground_mask_negative = self.getGroundTruth(coords)
            return img, [ground_truth, ground_mask_positive, ground_mask_negative, img_path]

            #ground_truth, ground_mask_positive, ground_mask_negative = self.getGroundTruth(coords)
            # channel swap: img = img[:, :, ::-1]
            #return img, ground_truth, ground_mask_positive, ground_mask_negative

        def __len__(self):
            return len(self.imgs_name)

        def getGroundTruth(self, coords):
            feature_size = self.input_size // self.grid_size
            #ground_mask_positive = np.zeros([feature_size, feature_size, 1], dtype=bool)
            #ground_mask_negative = np.ones([feature_size, feature_size, 1], dtype=bool)
            ground_mask_positive = np.full(shape=(feature_size, feature_size, 1), fill_value=False, dtype=bool)
            ground_mask_negative = np.full(shape=(feature_size, feature_size, 1), fill_value=True, dtype=bool)

            if self.loss_mode == "mse":
                ground_truth = np.zeros([feature_size, feature_size, 10 + self.class_num + 2])
            else:
                ground_truth = np.zeros([feature_size, feature_size, 10 + 1])

            for coord in coords:
                xmin, ymin, xmax, ymax, class_id = coord

                ground_width = (xmax - xmin)
                ground_height = (ymax - ymin)

                center_x = (xmin + xmax) / 2
                center_y = (ymin + ymax) / 2

                index_row = int(center_y * feature_size)
                index_col = int(center_x * feature_size)

                # class label with label smoothing
                if self.loss_mode == "mse":
                    # convert to one-hot encoding and smooth it
                    class_list = np.full(shape=self.class_num, fill_value=1.0, dtype=float)
                    deta = 0.01
                    class_list = class_list * deta / (self.class_num - 1)
                    class_list[class_id] = 1.0 - deta
                elif self.loss_mode == "cross_entropy":
                    class_list = [class_id]
                else:
                    raise Exception("the loss mode can't be support now!")

                # localization targets
                ground_box = [center_x * feature_size - index_col, center_y * feature_size - index_row,
                              ground_width, ground_height, 1,
                              round(xmin * self.input_size), round(ymin * self.input_size),
                              round(xmax * self.input_size), round(ymax * self.input_size),
                              round(ground_width * self.input_size * ground_height * self.input_size)]
                ground_box.extend(class_list)
                ground_box.extend([index_col, index_row])

                ground_truth[index_row][index_col] = np.array(ground_box)
                ground_mask_positive[index_row][index_col] = True
                ground_mask_negative[index_row][index_col] = False

            return ground_truth, torch.BoolTensor(ground_mask_positive), torch.BoolTensor(ground_mask_negative)

[Note]: in YOLO v1 each grid cell predicts two bounding boxes, but only one of them ends up being effective, so at most 7×7×1 = 49 objects can be detected. For simplicity, when the centers of several objects fall into the same grid cell (which is very unlikely), this implementation lets the last one processed be the object that cell is responsible for. The masks trade some GPU memory for fast computation of the positive-sample and negative-sample losses.
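The IoU listing in this section is cut off in the source, so here is a minimal, self-contained sketch of the same computation. It is written from the standard definition of IoU, not recovered from the author's code; boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples.

```python
def iou(box1, box2):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # 1. Corners of the intersection rectangle.
    inter_lx = max(box1[0], box2[0])
    inter_ly = max(box1[1], box2[1])
    inter_rx = min(box1[2], box2[2])
    inter_ry = min(box1[3], box2[3])

    # 2. Areas of the two boxes.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # 3. No overlap: the intersection rectangle is empty.
    if inter_rx <= inter_lx or inter_ry <= inter_ly:
        return 0.0

    # 4. Intersection area divided by union area.
    inter = (inter_rx - inter_lx) * (inter_ry - inter_ly)
    return inter / (area1 + area2 - inter)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, about 0.1429
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))   # 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))   # 0.0
```

The two 2×2 boxes in the first call share a 1×1 region, so the IoU is 1 / (4 + 4 - 1) = 1/7.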
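4. YOLO v1 Loss Function
The loss function is the crucial "baton" of a deep learning model: it steers the network's task and learning direction by back-propagating the error between predictions and ground truth to guide parameter updates. We treat bounding boxes that contain an object as positive samples and those that do not as negative samples. In practice, positives and negatives are decided by the IoU between a predicted bounding box and the ground-truth box: the box with the largest IoU with the ground truth is the positive sample and the remaining boxes are negative samples. The YOLO v1 loss therefore has two parts, one for positive samples (boxes responsible for predicting an object) and one for negative samples (boxes not responsible for predicting an object). Positive samples have confidence 1 and negative samples confidence 0. The positive-sample loss comprises a confidence loss, a box regression loss and a class loss, while the negative-sample loss has only a confidence loss.

[Note]: to explain, we preset S×S×B bounding boxes, but some of them may not predict any target at all. Boxes that predict a target are positive samples, and boxes that do not are negative samples. When YOLO v1 was created, detection datasets did not yet contain particularly dense objects, so negative samples were numerous.

The YOLO v1 loss consists of 5 parts, all using mean squared error:
(1) The first part is the loss on the center coordinates of positive samples, introducing …

[Note]: in YOLOv1, whether a box is a positive or negative sample depends on the IoU between the predicted box and the ground-truth box at prediction time. If an object falls inside a cell, then of the two boxes predicted by that cell, the one with the larger IoU with the ground truth is responsible for fitting it and is the positive sample; the other is a negative sample. In addition, at the start of training the backbone is frozen for 10 epochs so that the prediction part is trained first, after which the prediction part and the feature extraction part are trained together.

5. YOLOv1 Post-processing: the NMS Algorithm
A detection network usually outputs many bounding boxes for the same targets. The common practice is to pass all boxes through non-maximum suppression (NMS) to remove the redundant ones and keep the best.

NMS algorithm
Input: a set p of bounding boxes, an IoU threshold and a confidence threshold.
Output: the set q of bounding boxes with redundancy removed.
1. Remove from p every bounding box whose confidence is below the confidence threshold.
2. Pick the box with the highest confidence in p, move it from p to q, compute the IoU between it and each box remaining in p, and remove those whose IoU with it exceeds the IoU threshold.
3. Repeat step 2 until p is empty.
4. Output q as the result.

NMS:

    import numpy as np

    # bounding_boxes must already be decoded into their actual form
    def NMS(bounding_boxes, confidence_threshold, iou_threshold):
        # boxRow : x y dx dy c
        # 1. initial filtering: of the two bounding boxes predicted by each grid cell, keep the one with the higher confidence
        boxes = []
        for boxRow in bounding_boxes:
            # neither of the two boxes predicted by this grid cell reaches the confidence threshold
            # (part of this listing is missing in the source; the next line is shown as it survives)
            if boxRow[4] boxRow[9]:
                box = boxRow[0:4]
            else:
                box = boxRow[5:9]
            # box : x y dx dy class_probality_index class_probality
            box.append(class_probality_index)
            box.append(class_probality)
            boxes.append(box)

        # 2. loop until the set of candidate boxes is empty
        predicted_boxes = []
        while len(boxes) != 0:
            # sort the boxes by confidence, highest first
            boxes = sorted(boxes, key=(lambda x: [x[4]]), reverse=True)
            # the box with the highest confidence is selected
            choiced_box = boxes[0]
            predicted_boxes.append(choiced_box)
            # drop the selected box and every box whose IoU with it exceeds the threshold
            # (the original loop iterated over len(boxes) and popped while iterating)
            boxes = [box for box in boxes[1:] if iou(box, choiced_box) <= iou_threshold]
        return predicted_boxes

6. YOLO v1 Analysis
1. Strengths of the YOLO v1 network
① A 3×3 convolution is followed by a 1×1 convolution with fewer channels, which compresses the feature channels and reduces computation; the extra convolution layer also increases the model's non-linear capacity.
② Dropout and data augmentation are used during training to prevent overfitting.
③ No anchor mechanism is introduced. Box size and position are predicted directly in each region, relying on the position information carried by the region itself and on the fact that object scales lie within the range the network can regress, which turns object detection into a regression problem.
④ YOLO v1 predicts the object class and the object confidence separately, which simplifies the problem. Experiments show that YOLO v1 has a lower background false-detection rate than Fast R-CNN and that its main source of error is localization, as shown in Figure 4-7.

2. Weaknesses of YOLO v1
① Each region predicts only two boxes, and they share a single class vector. YOLO v1 can therefore detect only a limited number of objects and performs poorly on small objects and on objects that are close together. In practice, of the 7×7×2 = 98 predicted bounding boxes at most 49 are effective, so YOLO v1 predicts at most 49 objects per image.
② Because there is no anchor mechanism and box shapes are learned directly from the data, the model generalizes poorly to objects with new or unusual aspect ratios. In addition, the large downsampling factor limits the precision of box regression.
③ In the v1 loss, large and small objects have the same localization loss weight. For the same relative localization error, a large object produces a larger loss than a small one, so small objects contribute little to the total loss. In reality, a small error on a small box affects IoU much more than the same error on a large box, which leads to inaccurate localization. The author was aware of this, and to keep YOLO v1 simple handled it by taking the square root of the box scale, which raises the relative weight of the scale loss for small objects.

3. Performance of YOLO v1 compared with other networks
Compared with traditional methods such as DPM, YOLO is more accurate. Compared with two-stage algorithms represented by Fast R-CNN, YOLO is slightly less accurate but far ahead in FPS. It balances real-time speed and accuracy, which makes deep-learning object detection practical in industry.

7. Personal Training Optimization Strategies (removed; the reproduction now largely follows the YOLOv1 paper)
1. Fully convolutional structure
To avoid the feature-map scrambling caused by reshaping the convolution output into an ordinary tensor, the author also proposed a fully convolutional structure as an experiment in improving YOLO V1's inference ability. It uses 1×1 convolutions for feature compression instead of direct downsampling, in order to retain more of the useful features.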
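2. Multi-step learning rate adjustment
In deep learning the learning rate is usually large at the beginning, which both speeds up training and helps the network escape saddle points and some local optima. Later, once the network has stably converged toward some minimum, the learning rate should be reduced so that the network neither diverges nor keeps oscillating around the minimum, and can keep descending toward it. (That minimum may in fact still be a local one: deep learning is not a convex optimization problem, so we are unlikely to land exactly on the optimum, but a learning algorithm can still reach a fairly good solution.)
3. Monitoring training with Tensorboard
To monitor training more closely, the author added Tensorboard support to the project.
4. Future plans
The author intends to first reproduce a network that is reasonably complete in functionality, then add features such as dataset expansion, and continue optimizing the network's computation speed and GPU memory usage.
5. Current state of the networks
Convergence of the fully convolutional network; convergence of the original YOLO V1 network.
GitHub address of the reproduction: after several rounds of refactoring the original repository became too large and slow to upload, so everything has been moved to a new repository: GitHub - ProgrammerZhujinming/YOLO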
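The multi-step learning rate adjustment described in item 2 of section 7 can be sketched as a small pure-Python function. The article does not give the milestones or decay factor, so the values used here (`milestones=[100, 150]`, `gamma=0.1`) are placeholders; only the base rate of 3e-4 comes from the default in the training script. The rule mirrors what PyTorch's `torch.optim.lr_scheduler.MultiStepLR` computes.

```python
def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate at a given epoch: multiply by gamma once per milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


# Placeholder schedule: the milestones are assumptions, not taken from the article.
for epoch in (0, 99, 100, 150, 199):
    print(epoch, multistep_lr(3e-4, epoch, milestones=[100, 150]))
```

The rate stays at its initial value until epoch 100, drops by a factor of ten there, and drops by another factor of ten at epoch 150.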




    
        
                        蜡笔小新   2023-12-09 10:02:11

















    

    
        
            
            
                
                
            

            