神经网络的数据处理,神经网络训练结果

神经网络学习小记录27——数据预处理学习前言处理长宽不同的图片数据增强1、在数据集内进行数据增强2、在读取图片的时候数据增强3、目标检测中的数据增强

学习前言

进行训练的话，如果直接用原图进行训练，也是可以的（就如我们最喜欢Mnist手写体），但是大部分图片长和宽不一样，直接resize的话容易出问题。
除去resize的问题外，有些时候数据不足该怎么办呢，当然要用到数据增强啦。
这篇文章就是记录我最近收集的一些数据预处理的方式。

处理长宽不同的图片

对于很多分类、目标检测算法，输入的图片长宽是一样的，如224,224、416,416等。

直接resize的话，图片就会失真。

但是我们可以采用如下的代码，使其用padding的方式不失真。

from PIL import Imagedef letterbox_image(image, size): # 对图片进行resize，使图片不失真。在空缺的地方进行padding iw, ih = image.size w, h = size scale = min(w/iw, h/ih) nw = int(iw*scale) nh = int(ih*scale) image = image.resize((nw,nh), Image.BICUBIC) new_image = Image.new('RGB', size, (128,128,128)) new_image.paste(image, ((w-nw)//2, (h-nh)//2)) return new_imageimg = Image.open("2007_000039.jpg")new_image = letterbox_image(img,[416,416])new_image.show()

得到图片为：

数据增强 1、在数据集内进行数据增强

这个的意思就是可以直接增加图片的方式进行数据增强。
其主要用到的函数是：

ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False, samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0, width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0, xhdsb_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False, vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None, validation_split=0.0, dtype=None)

对于我而言，常用的方法如下：

datagen = ImageDataGenerator( rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.2, xhdsb_range=0.1, horizontal_flip=False, brightness_range=[0.1, 2], fill_mode='nearest')

其中，参数的意义为：
1、rotation_range：旋转范围
2、width_shift_range：水平平移范围
3、height_shift_range：垂直平移范围
4、shear_range：float, 透视变换的范围
5、xhdsb_range：缩放范围
6、horizontal_flip：水平反转
7、brightness_range：图像随机亮度增强，给定一个含两个float值的list，亮度值取自上下限值间
8、fill_mode：‘constant’，‘nearest’，‘reflect’或‘wrap’之一，当进行变换时超出边界的点将根据本参数给定的方法进行处理。

实际使用时可以利用如下函数生成图像：

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_imgimport os datagen = ImageDataGenerator( rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, shear_range=0.2, xhdsb_range=0.1, horizontal_flip=False, brightness_range=[0.1, 2], fill_mode='nearest')trains = os.listdir("./train/")for index,train in enumerate(trains): img = load_img("./train/" + train) x = img_to_array(img) x = x.reshape((1,) + x.shape) i = 0 for batch in datagen.flow(x, batch_size=1, save_to_dir='./train_out', save_prefix=str(index), save_format='jpg'): i += 1 if i > 20: break

生成效果为：

2、在读取图片的时候数据增强

ImageDataGenerator是一个非常nice的增强方式，不过如果不想生成太多的图片，然后想要直接在读图的时候处理，也是可以的。
我们用到PIL中的ImageEnhance库。
1、亮度增强ImageEnhance.Brightness(image)
2、色度增强ImageEnhance.Color(image)
3、对比度增强ImageEnhance.Contrast(image)
4、锐度增强ImageEnhance.Sharpness(image)

在如下的函数中，可以通过改变Ehance函数中的参数实现不同的增强方式。

import osimport numpy as npfrom PIL import Imagefrom PIL import ImageEnhancedef Enhance_Brightness(image): # 变亮，增强因子为0.0将产生黑色图像,为1.0将保持原始图像。 # 亮度增强 enh_bri = ImageEnhance.Brightness(image) brightness = np.random.uniform(0.6,1.6) image_brightened = enh_bri.enhance(brightness) return image_brighteneddef Enhance_Color(image): # 色度,增强因子为1.0是原始图像 # 色度增强 enh_col = ImageEnhance.Color(image) color = np.random.uniform(0.4,2.6) image_colored = enh_col.enhance(color) return image_coloreddef Enhance_contrasted(image): # 对比度，增强因子为1.0是原始图片 # 对比度增强 enh_con = ImageEnhance.Contrast(image) contrast = np.random.uniform(0.6,1.6) image_contrasted = enh_con.enhance(contrast) return image_contrasteddef Enhance_sharped(image): # 锐度，增强因子为1.0是原始图片 # 锐度增强 enh_sha = ImageEnhance.Sharpness(image) sharpness = np.random.uniform(0.4,4) image_sharped = enh_sha.enhance(sharpness) return image_sharpeddef Add_pepper_salt(image): # 增加椒盐噪声 img = np.array(image) rows,cols,_=img.shape random_int = np.random.randint(500,1000) for _ in range(random_int): x=np.random.randint(0,rows) y=np.random.randint(0,cols) if np.random.randint(0,2): img[x,y,:]=255 else: img[x,y,:]=0 img = Image.fromarray(img) return imgdef Enhance(image_path, change_bri=1, change_color=1, change_contras=1, change_sha=1, add_noise=1): #读取图片 image = Image.open(image_path) if change_bri==1: image = Enhance_Brightness(image) if change_color==1: image = Enhance_Color(image) if change_contras==1: image = Enhance_contrasted(image) if change_sha==1: image = Enhance_sharped(image) if add_noise==1: image = Add_pepper_salt(image) image.save("0.jpg")Enhance("2007_000039.jpg")

原图：

效果如下：

3、目标检测中的数据增强

在目标检测中如果要增强数据，并不是直接增强图片就好了，还要考虑到图片扭曲后框的位置。
也就是框的位置要跟着图片的位置进行改变。
原图：

增强后：

from PIL import Image, ImageDrawimport numpy as npfrom matplotlib.colors import rgb_to_hsv, hsv_to_rgbdef rand(a=0, b=1): return np.random.rand()*(b-a) + adef get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True): '''random preprocessing for real-time data augmentation''' line = annotation_line.split() image = Image.open(line[0]) iw, ih = image.size h, w = input_shape box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) # resize image new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter) scale = rand(.7, 1.3) if new_ar < 1: nh = int(scale*h) nw = int(nh*new_ar) else: nw = int(scale*w) nh = int(nw/new_ar) image = image.resize((nw,nh), Image.BICUBIC) # place image dx = int(rand(0, w-nw)) dy = int(rand(0, h-nh)) new_image = Image.new('RGB', (w,h), (128,128,128)) new_image.paste(image, (dx, dy)) image = new_image # flip image or not flip = rand()<.5 if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) # distort image hue = rand(-hue, hue) sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat) val = rand(1, val) if rand()<.5 else 1/rand(1, val) x = rgb_to_hsv(np.array(image)/255.) x[..., 0] += hue x[..., 0][x[..., 0]>1] -= 1 x[..., 0][x[..., 0]<0] += 1 x[..., 1] *= sat x[..., 2] *= val x[x>1] = 1 x[x<0] = 0 image_data = hsv_to_rgb(x) # numpy array, 0 to 1 # correct boxes box_data = np.zeros((max_boxes,5)) if len(box)>0: np.random.shuffle(box) box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy if flip: box[:, [0,2]] = w - box[:, [2,0]] box[:, 0:2][box[:, 0:2]<0] = 0 box[:, 2][box[:, 2]>w] = w box[:, 3][box[:, 3]>h] = h box_w = box[:, 2] - box[:, 0] box_h = box[:, 3] - box[:, 1] box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box if len(box)>max_boxes: box = box[:max_boxes] box_data[:len(box)] = box return image_data, box_dataif __name__ == "__main__": line = r"F:Collectionyolo_Collectionkeras-yolo3-masterVOCdevkit/VOC2007/JPEGImages/00001.jpg 738,279,815,414,0" image_data, box_data = get_random_data(line,[416,416]) left, top, right, bottom = box_data[0][0:4] img = Image.fromarray((image_data*255).astype(np.uint8)) draw = ImageDraw.Draw(img) draw.rectangle([left, top, right, bottom]) img.show()