Getting Started with YOLOv3: The YOLOv3 Algorithm Explained

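I skimmed YOLOv1 and YOLOv2, then read YOLOv3 in detail. At first it left me thoroughly confused. After studying it, I will try to record a few key points step by step: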

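v2 and v3 use anchors, and these differ somewhat from the anchors in Faster R-CNN. How should these anchors be understood?

My understanding, in plain language: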

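(1) We start from a set of annotated bboxes, each described by its top-left and bottom-right coordinates. The bboxes are clustered into several groups, and the cluster centers serve as the preset anchor widths and heights. The input format is the XML annotation format used by the VOC dataset.

The code extracts the width and height of each annotated box and normalizes them by the image width and height.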

import glob
import xml.etree.ElementTree as ET
import numpy as np

def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)
        # image size, used to normalize the box coordinates
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))
        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height
            # keep only the normalized [width, height] of each box
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)

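(2) How exactly are the boxes grouped? K-means splits all annotated bboxes into clusters according to their width and height. The VOC data is split into 9 clusters, and the distance is defined as distance = 1 - IoU.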
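To make the distance concrete, here is a minimal sketch that computes 1 - IoU for one box and one cluster center. It assumes both are given as (width, height) and aligned at a common corner, so only their sizes matter. The helper name `wh_iou` and the sample values are made up for illustration.

```python
def wh_iou(box, centroid):
    """IoU of two boxes given as (width, height), both anchored at the origin."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

box = (0.2, 0.4)        # normalized width, height of an annotated box
centroid = (0.4, 0.4)   # normalized width, height of a cluster center

# The box covers half of the centroid's area, so IoU is about 0.5
# and the distance is about 0.5 as well.
distance = 1 - wh_iou(box, centroid)
print(distance)
```

A box identical to the center has IoU 1 and distance 0, so a smaller distance always means a better shape match, regardless of where the box sits in the image.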

import numpy as np

"""
(1) K-means collects all n ground-truth boxes in the data, takes their widths and heights, and randomly picks 9 of them as the initial centers.
(2) For every bbox, the IoU-based distance to each of the 9 (width, height) centers is computed, giving a distance matrix with n rows and 9 columns.
(3) The minimum of each row assigns every bbox to one of the 9 clusters; each center is then updated to the median of the bboxes in its cluster.
(4) This repeats until the 9 centers no longer change. The final 9 centers are the 9 anchors for the whole dataset, i.e. box widths and heights.
"""

def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i.e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where k is the number of clusters
    """
    # IoU between one box and each of the 9 clusters
    # box: a single [width, height] taken from [[width, height], [width, height], ...]
    # clusters: the 9 cluster centers, each [width, height]
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")
    intersection = x * y
    # area of the box and of each cluster center
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
    iou_ = intersection / (box_area + cluster_area - intersection)
    return iou_

def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    # for each box take its best-matching cluster, then average over all boxes
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])

def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)

def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: function used to compute each new cluster center (np.median by default)
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    # initialize the k cluster centers by picking k rows at random from the data
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            # Distance metric: d(box, centroid) = 1 - IoU(box, centroid).
            # A smaller distance to the center is better, but a larger IoU is better,
            # so 1 - IoU makes a small distance correspond to a large IoU.
            # This fills the (rows, k) matrix of distances between boxes and clusters.
            distances[row] = 1 - iou(boxes[row], clusters)
        # print(distances)

        # assign each annotated box to its nearest center
        # (for every box, pick the center with the smallest distance)
        nearest_clusters = np.argmin(distances, axis=1)

        # stop once the assignments no longer change (the centers have stabilized)
        if (last_clusters == nearest_clusters).all():
            break

        # recompute each center (here the median of each cluster is the new center)
        for cluster in range(k):
            # select the boxes assigned to this cluster and take their
            # per-column median as the new center
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters
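The steps above can also be tried on synthetic data without any annotation files. The following is a minimal, self-contained sketch, not the code from this post: it is a vectorized variant that computes the whole IoU matrix at once, uses a fixed seed, caps the number of iterations, and keeps the old center when a cluster ends up empty. The names `iou_matrix`, `kmeans_iou`, `seed`, and `max_iter` are my own assumptions for illustration.

```python
import numpy as np

def iou_matrix(boxes, clusters):
    """Pairwise IoU between (r, 2) boxes and (k, 2) clusters, all as (width, height)."""
    inter = (np.minimum(boxes[:, None, 0], clusters[None, :, 0])
             * np.minimum(boxes[:, None, 1], clusters[None, :, 1]))
    box_area = (boxes[:, 0] * boxes[:, 1])[:, None]
    cluster_area = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (box_area + cluster_area - inter)

def kmeans_iou(boxes, k, seed=0, max_iter=100):
    """K-means with distance 1 - IoU and median updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # pick k distinct boxes as the initial centers
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    assignment = np.full(len(boxes), -1)
    for _ in range(max_iter):
        nearest = np.argmin(1 - iou_matrix(boxes, clusters), axis=1)
        if np.array_equal(nearest, assignment):
            break  # assignments stopped changing
        for c in range(k):
            members = boxes[nearest == c]
            if len(members):  # assumption: keep the old center for an empty cluster
                clusters[c] = np.median(members, axis=0)
        assignment = nearest
    return clusters

# Synthetic data: two well-separated groups of normalized (width, height) pairs.
rng = np.random.default_rng(42)
small = rng.uniform(0.05, 0.10, size=(50, 2))
large = rng.uniform(0.60, 0.80, size=(50, 2))
boxes = np.vstack([small, large])

anchors = kmeans_iou(boxes, k=2)
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area
print(anchors)
```

With two clearly separated groups, one anchor should land in the small range and one in the large range. On real annotations the clusters overlap, so results depend on the random initialization.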

Running the code:

import glob
import xml.etree.ElementTree as ET
import numpy as np
from kmeans import kmeans, avg_iou

# ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 9

def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))
        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)

ANNOTATIONS_PATH = "path to your own data"  # replace with your annotation directory
data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)

print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
# print("Boxes:\n {}".format(out))
# scale the normalized widths and heights to a 416x416 input
print("Boxes:\n {}-{}".format(out[:, 0] * 416, out[:, 1] * 416))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

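I ran this on the VOC2007 dataset, which has 9,963 annotated samples in total. The resulting anchors differ slightly from those given in the paper, probably because of the difference between COCO and VOC2007.

The results are as follows: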

Accuracy: 67.22%

Boxes (I reformatted these and rounded them to integers, so the ratios below do not match exactly):

[347,327 40,40 76,77 184,277 89,207 162,134 14,27 44,128 23,72]

Ratios:

[0.32, 0.35, 0.43, 0.55, 0.67, 0.99, 1.02, 1.06, 1.21]
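As a quick check of the rounding mismatch mentioned above, the ratios can be recomputed from the rounded boxes. This is a small sketch that only uses the numbers listed in this post; the variable names are arbitrary.

```python
# Rounded (width, height) anchors at 416x416, copied from the results above.
boxes = [(347, 327), (40, 40), (76, 77), (184, 277), (89, 207),
         (162, 134), (14, 27), (44, 128), (23, 72)]

# Width/height ratio of each rounded anchor, sorted ascending.
ratios = sorted(round(w / h, 2) for w, h in boxes)
print(ratios)
# [0.32, 0.34, 0.43, 0.52, 0.66, 0.99, 1.0, 1.06, 1.21]
```

The small anchors show the largest gaps against the reported ratios (for example 0.52 versus 0.55 for the 14x27 box), which is consistent with integer rounding affecting small boxes the most.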