R-CNN explained: a detailed walkthrough of the Faster R-CNN code

This post walks through generate_anchors, which helps in understanding how anchors are generated.

First, let's look at the main function:

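    if __name__ == '__main__':
        import time
        t = time.time()
        a = generate_anchors()  # the most important function here
        print(time.time() - t)
        print(a)
        from IPython import embed; embed()  # drop into an interactive IPython shell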

    import numpy as np

    def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                         scales=2**np.arange(3, 6)):
        """
        Generate anchor (reference) windows by enumerating aspect ratios X
        scales wrt a reference (0, 0, 15, 15) window.
        """
        base_anchor = np.array([1, 1, base_size, base_size]) - 1
        print('base_anchor', base_anchor)
        ratio_anchors = _ratio_enum(base_anchor, ratios)
        print('anchors after ratio', ratio_anchors)
        anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                             for i in xrange(ratio_anchors.shape[0])])
        print('anchors after ratio and scale', anchors)
        return anchors

1. base_size=16

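This parameter sets the size of the base reference region. After several rounds of convolution and pooling, one point on the feature map corresponds to a region of the original image. Here it is set to 16, meaning one feature-map point corresponds to a 16x16 region of the original image; this matches the backbone's total downsampling stride (16 for VGG16). You can change it to suit your own network.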
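A minimal sketch of where the 16 comes from, assuming a VGG16-style backbone whose conv5_3 feature map sits after four stride-2 pooling layers (the backbone is an assumption; the post only states the value 16). The helpers `feature_stride` and `cell_to_image_box` are hypothetical names for illustration; the base anchor is built exactly as in the code below.

```python
import numpy as np

def feature_stride(num_pools, pool_stride=2):
    """Hypothetical helper: total downsampling after num_pools stride-2 pooling layers."""
    return pool_stride ** num_pools

# Assumption: VGG16 conv5_3 follows 4 max-pooling layers, so the stride is 2**4 = 16
base_size = feature_stride(4)

# Same construction as generate_anchors: a 16x16 box in 0-based, inclusive pixel coords
base_anchor = np.array([1, 1, base_size, base_size]) - 1
print(base_size, base_anchor)

def cell_to_image_box(i, j, stride=base_size):
    """Hypothetical helper: the stride x stride block of input pixels mapped to cell (row i, col j)."""
    return np.array([j * stride, i * stride,
                     j * stride + stride - 1, i * stride + stride - 1])

print(cell_to_image_box(0, 1))  # the block just to the right of the first cell
```

The `- 1` gives inclusive coordinates, so a 16-pixel-wide box runs from 0 to 15.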

2. ratios=[0.5, 1, 2]

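This parameter reshapes the 16x16 region into three aspect ratios, 1:2, 1:1 and 2:1, as shown in the figure below. In the code, ratio means height/width, so ratio 0.5 produces a wide box.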

Figure 1: aspect-ratio transformation
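The ratio step keeps the area close to 16x16 = 256 while changing the shape. The sketch below isolates just that arithmetic, using the same formulas as `_ratio_enum` (shown later in the post); it assumes ratio = height/width, as in the code.

```python
import numpy as np

base_size = 16
ratios = np.array([0.5, 1, 2])          # ratio = height / width

area = base_size * base_size            # 256, kept approximately constant
ws = np.round(np.sqrt(area / ratios))   # widths:  sqrt([512, 256, 128]) rounded
hs = np.round(ws * ratios)              # heights: width * ratio, rounded

for r, w, h in zip(ratios, ws, hs):
    print("ratio %.1f -> %d x %d (area %d)" % (r, w, h, w * h))
```

Rounding makes the areas only approximately 256: the boxes come out as 23x12 (276), 16x16 (256) and 11x22 (242). `np.round` rounds halves to the nearest even number, so 23 * 0.5 = 11.5 becomes 12.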

3. scales=2**np.arange(3, 6)

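This parameter enlarges the width and height of the input region by three factors: 2^3 = 8, 2^4 = 16 and 2^5 = 32. For example, the 16x16 region becomes (16*8)x(16*8) = 128x128 at scale 8, (16*16)x(16*16) = 256x256 at scale 16, and (16*32)x(16*32) = 512x512 at scale 32.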

Figure 2: area scaling transformation
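A short sketch of the scale arithmetic for the 1:1 box; it only restates the numbers above in NumPy.

```python
import numpy as np

scales = 2 ** np.arange(3, 6)   # 2^3, 2^4, 2^5
base_size = 16

sides = base_size * scales      # side length of the square (1:1) anchor at each scale
for s, side in zip(scales, sides):
    print("scale %2d -> %d x %d" % (s, side, side))
```

Because both width and height are multiplied, the area grows by the square of the scale: 64x, 256x and 1024x the original 16x16 region.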

Next, let's look at the first piece of code:

    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    '''the value of base_anchor is [0, 0, 15, 15]'''

The statement ratio_anchors = _ratio_enum(base_anchor, ratios) applies the ratio transformation to the 16x16 region above, producing 3 anchors with different aspect ratios.

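    def _ratio_enum(anchor, ratios):
        """
        Enumerate a set of anchors for each aspect ratio wrt an anchor.
        """
        w, h, x_ctr, y_ctr = _whctrs(anchor)  # (16, 16, 7.5, 7.5) for the base anchor
        size = w * h                          # 16 * 16 = 256
        size_ratios = size / ratios           # 256 / [0.5, 1, 2] = [512, 256, 128]
        # np.round() rounds to the nearest integer; np.sqrt() returns the square root
        ws = np.round(np.sqrt(size_ratios))   # [23, 16, 11]
        hs = np.round(ws * ratios)            # [12, 16, 22]
        # convert each rectangle given by (width, height, x center, y center)
        # into a rectangle given by its 4 corner coordinates
        anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
        return anchors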

The inputs are one anchor (given by four coordinates) and three aspect ratios (0.5, 1, 2).

This function in turn calls _whctrs, defined below. Its job is to convert the four coordinates of the input anchor into the form (width, height, x center, y center).

    def _whctrs(anchor):
        """
        Return width, height, x center, and y center for an anchor (window).
        """
        w = anchor[2] - anchor[0] + 1
        h = anchor[3] - anchor[1] + 1
        x_ctr = anchor[0] + 0.5 * (w - 1)
        y_ctr = anchor[1] + 0.5 * (h - 1)
        return w, h, x_ctr, y_ctr

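After this conversion, the original anchor (0, 0, 15, 15) becomes w = 16, h = 16, x_ctr = 7.5, y_ctr = 7.5. The aspect-ratio transformation then proceeds as described in the comments in _ratio_enum. The three anchors with different aspect ratios that the function outputs are: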

    ratio_anchors = _ratio_enum(base_anchor, ratios)
    '''[[ -3.5,   2. ,  18.5,  13. ],
        [  0. ,   0. ,  15. ,  15. ],
        [  2.5,  -3. ,  12.5,  18. ]]'''

After the aspect-ratio transformation above, the area (scale) transformation is applied:

anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) for i in xrange(ratio_anchors.shape[0])])

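The key function here is _scale_enum, defined below. It applies each of the three scales to each of the three aspect-ratio anchors in ratio_anchors from the previous step; three aspect ratios combined with three scales give 9 anchors. These are the 9 anchors per feature-map point described in the paper.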

    def _scale_enum(anchor, scales):
        """
        Enumerate a set of anchors for each scale wrt an anchor.
        """
        w, h, x_ctr, y_ctr = _whctrs(anchor)
        ws = w * scales
        hs = h * scales
        anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
        return anchors
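To make "3 aspect ratios x 3 scales = 9 anchors" concrete, the sketch below enumerates the (width, height) of every combination, starting from the rounded 23x12, 16x16 and 11x22 boxes of the ratio step. The ordering (ratio in the outer loop, scale in the inner loop) mirrors the vstack over ratio_anchors.

```python
from itertools import product

# (width, height) after the ratio step, in the order of ratios [0.5, 1, 2]
ratio_boxes = [(23, 12), (16, 16), (11, 22)]
scales = [8, 16, 32]

# product() varies the last argument fastest: ratio is outer, scale is inner
sizes = [(w * s, h * s) for (w, h), s in product(ratio_boxes, scales)]
print(len(sizes))
for w, h in sizes:
    print(w, "x", h)
```

These (width, height) pairs are exactly the first two entries of the (width, height, x center, y center) column in the comparison table further down.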

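_scale_enum likewise first converts each aspect-ratio anchor (ratio_anchor) into the form (width, height, x center, y center), multiplies both the width and the height by the scale, and then converts the result back into four coordinates. The coordinates of the 9 anchors obtained from the aspect-ratio and scale transformations are: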

    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) for i in xrange(ratio_anchors.shape[0])])
    '''[[ -84.  -40.   99.   55.]
     [-176.  -88.  191.  103.]
     [-360. -184.  375.  199.]
     [ -56.  -56.   71.   71.]
     [-120. -120.  135.  135.]
     [-248. -248.  263.  263.]
     [ -36.  -80.   51.   95.]
     [ -80. -168.   95.  183.]
     [-168. -344.  183.  359.]]'''
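For readers on Python 3, here is a self-contained consolidation of the functions discussed above, with xrange replaced by range and the debug prints removed. `_mkanchors` is again an assumed inverse of `_whctrs`, not code shown in the post. It should reproduce the 9x4 array above.

```python
import numpy as np

def _whctrs(anchor):
    """(x1, y1, x2, y2) -> (w, h, x center, y center), inclusive pixel convention."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def _mkanchors(ws, hs, x_ctr, y_ctr):
    """Assumed inverse of _whctrs: widths/heights around one center -> N x 4 corners."""
    ws = np.asarray(ws, dtype=float)[:, np.newaxis]
    hs = np.asarray(hs, dtype=float)[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

def _ratio_enum(anchor, ratios):
    """Keep the area roughly fixed and vary height/width."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ratios = np.asarray(ratios, dtype=float)
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    return _mkanchors(ws, hs, x_ctr, y_ctr)

def _scale_enum(anchor, scales):
    """Keep the shape fixed and multiply width and height by each scale."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    scales = np.asarray(scales, dtype=float)
    return _mkanchors(w * scales, h * scales, x_ctr, y_ctr)

def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=2 ** np.arange(3, 6)):
    base_anchor = np.array([1, 1, base_size, base_size]) - 1   # [0, 0, 15, 15]
    ratio_anchors = _ratio_enum(base_anchor, ratios)            # 3 x 4
    return np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                      for i in range(ratio_anchors.shape[0])])   # 9 x 4

if __name__ == '__main__':
    anchors = generate_anchors()
    print(anchors.shape)
    print(anchors)
```

Passing the parameters explicitly, e.g. `generate_anchors(ratios=[1], scales=[8])`, returns a single 128x128 box, which is a quick way to check one row at a time.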

The table below compares how each of the 9 anchors is obtained:

| base_anchor | ratio (w x h) | (width, height, x center, y center) | scale | coordinates (x1, y1, x2, y2) |
|---|---|---|---|---|
| 16x16 | 23x12 (2:1) | [184, 96, 7.5, 7.5] | 8 | [-84, -40, 99, 55] |
| | | [368, 192, 7.5, 7.5] | 16 | [-176, -88, 191, 103] |
| | | [736, 384, 7.5, 7.5] | 32 | [-360, -184, 375, 199] |
| | 16x16 (1:1) | [128, 128, 7.5, 7.5] | 8 | [-56, -56, 71, 71] |
| | | [256, 256, 7.5, 7.5] | 16 | [-120, -120, 135, 135] |
| | | [512, 512, 7.5, 7.5] | 32 | [-248, -248, 263, 263] |
| | 11x22 (1:2) | [88, 176, 7.5, 7.5] | 8 | [-36, -80, 51, 95] |
| | | [176, 352, 7.5, 7.5] | 16 | [-80, -168, 95, 183] |
| | | [352, 704, 7.5, 7.5] | 32 | [-168, -344, 183, 359] |

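As I understand it, these anchor coordinates are relative to the original image: a feature map is typically only around 60x40, while the coordinates above run into the hundreds, so the 9 combinations are sized for the original, full-size image. These sizes can cover almost any object in the image; if the image contains exceptionally large objects, the scales should be increased accordingly so that the anchors can enclose them.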
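To illustrate why these coordinates live in image space, here is a sketch of laying the 9 base anchors over every cell of a feature map: the cell in column x and row y shifts all 9 boxes by (16x, 16y) image pixels. The 60x40 map size comes from the paragraph above; the helper name `shift_anchors` is hypothetical. py-faster-rcnn's RPN layers perform a similar shift, but this is an independent sketch, not that code.

```python
import numpy as np

# The 9 base anchors from the output above (x1, y1, x2, y2), all centered on (7.5, 7.5)
base_anchors = np.array([
    [-84, -40, 99, 55], [-176, -88, 191, 103], [-360, -184, 375, 199],
    [-56, -56, 71, 71], [-120, -120, 135, 135], [-248, -248, 263, 263],
    [-36, -80, 51, 95], [-80, -168, 95, 183], [-168, -344, 183, 359],
], dtype=float)

def shift_anchors(anchors, feat_w, feat_h, stride=16):
    """Hypothetical helper: replicate the base anchors at every feature-map cell."""
    shift_x = np.arange(feat_w) * stride        # image-space x offset of each column
    shift_y = np.arange(feat_h) * stride        # image-space y offset of each row
    sx, sy = np.meshgrid(shift_x, shift_y)      # both of shape (feat_h, feat_w)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)  # (K, 4)
    # (K, 1, 4) + (1, A, 4) -> (K, A, 4): every cell receives all A anchors
    return (shifts[:, np.newaxis, :] + anchors[np.newaxis, :, :]).reshape(-1, 4)

all_anchors = shift_anchors(base_anchors, feat_w=60, feat_h=40)
print(all_anchors.shape)   # 60 * 40 cells * 9 anchors

widths = base_anchors[:, 2] - base_anchors[:, 0] + 1
heights = base_anchors[:, 3] - base_anchors[:, 1] + 1
print(widths.min(), widths.max(), heights.min(), heights.max())
```

The base anchors range from 88 to 736 pixels wide and 96 to 704 pixels tall, far larger than a 60x40 feature map, which supports reading them as original-image coordinates.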