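This article describes how to develop a license plate object detection model from scratch. The complete project also includes an API built with Flask. Here we cover how to train a custom object detection model from the very beginning.

Project Architecture

Let's look at the architecture of the license plate recognition and OCR project we intend to build.

The architecture above has six modules: labeling, training, saving the model, OCR, the model pipeline, and a RESTful API. This article covers only the first three in detail. The process is as follows. First, collect images. Next, annotate the license plates in the images with an image annotation tool, an open-source application with a GUI written in Python. After labeling, preprocess the data, then build and train a deep learning object detection model (Inception ResNet V2) in TensorFlow 2. Once training is complete, use the model to crop the part of the image that contains the license plate, also called the region of interest (ROI), and pass that ROI to the Tesseract API in Python. PyTesseract extracts the text from the image. Finally, all of these steps are combined into a deep learning model pipeline. In the last module, you build a web app project with Flask in Python so that the app can be published for others to use.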

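Labeling

To build license plate recognition, we need data. That means collecting images of vehicles in which the license plate is visible. For image labeling, I used the LabelImg image annotation tool. Download labelImg from GitHub and follow the instructions to install the package. Once it is open, follow the GUI prompts: click CreateRectBox, draw a box as shown below, and save the output as XML.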

pip install pyqt5

pip install lxml

pyrcc5 -o libs/resources.py resources.qrc

python labelImg.py

python labelImg.py [IMAGE_PATH] [PRE-DEFINED CLASS FILE]

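This is a manual process and has to be done for every image. Label carefully, because this step directly affects the accuracy of the model.

Parsing Information from XML

Once labeling is complete, the data needs to be preprocessed.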

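Because the labeling output is XML, it has to be converted before it can be used for training. We therefore extract the useful information from each label, namely the diagonal corner points of the bounding box: xmin, ymin, xmax, and ymax. As shown in Figure 3, we need to extract this information and save it in a convenient format. In this example, we convert the bounding box information to CSV and then use Pandas to turn it into an array. Now let's see how to parse the information with Python.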
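To make the XML layout concrete, here is a minimal sketch that parses one annotation from an inline string. LabelImg saves Pascal VOC style XML by default; the sample below is hand-written for illustration, and the label name, image size, and the helper name parse_bndbox are assumptions rather than values from the project.

```python
import xml.etree.ElementTree as xet

# A hand-written annotation in the Pascal VOC layout.
# The label name and image size are made up for illustration.
SAMPLE_XML = """
<annotation>
    <filename>N1.jpeg</filename>
    <size><width>1920</width><height>1080</height><depth>3</depth></size>
    <object>
        <name>number_plate</name>
        <bndbox>
            <xmin>1093</xmin><ymin>645</ymin>
            <xmax>1396</xmax><ymax>727</ymax>
        </bndbox>
    </object>
</annotation>
"""


def parse_bndbox(xml_text):
    """Return (xmin, xmax, ymin, ymax) of the first <object> in the annotation."""
    root = xet.fromstring(xml_text.strip())
    box = root.find('object').find('bndbox')
    # Same ordering as the columns used in the rest of the article.
    return tuple(int(box.find(tag).text) for tag in ('xmin', 'xmax', 'ymin', 'ymax'))


print(parse_bndbox(SAMPLE_XML))
```

The function reads only the first object, which matches the one-plate-per-image assumption used throughout this article; images with several plates would need root.findall('object') instead.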

We use the xml.etree Python library to parse the XML data, and we also import pandas and glob. First, use glob to collect all the XML files produced during labeling.

import pandas as pd
from glob import glob
import xml.etree.ElementTree as xet

# collect every annotation file produced during labeling
path = glob('./images/*.xml')
labels_dict = dict(filepath=[], xmin=[], xmax=[], ymin=[], ymax=[])
for filename in path:
    info = xet.parse(filename)
    root = info.getroot()
    member_object = root.find('object')
    labels_info = member_object.find('bndbox')
    xmin = int(labels_info.find('xmin').text)
    xmax = int(labels_info.find('xmax').text)
    ymin = int(labels_info.find('ymin').text)
    ymax = int(labels_info.find('ymax').text)
    # print(xmin, xmax, ymin, ymax)
    labels_dict['filepath'].append(filename)
    labels_dict['xmin'].append(xmin)
    labels_dict['xmax'].append(xmax)
    labels_dict['ymin'].append(ymin)
    labels_dict['ymax'].append(ymax)

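In the code above, we take each file in turn, parse it with xml.etree, and locate object -> bndbox. We then extract xmin, xmax, ymin, and ymax and store the values in a dictionary. Next, we convert the dictionary into a pandas DataFrame and save it to a CSV file, as follows.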

df = pd.DataFrame(labels_dict)
df.to_csv('labels.csv', index=False)
df.head()

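With the code above, we have extracted the diagonal corner positions for each image and converted the data from an unstructured format into a structured one.

Now let's extract the image file name that corresponds to each XML file.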

import os

def getFilename(filename):
    # the <filename> tag stores the name of the annotated image
    filename_image = xet.parse(filename).getroot().find('filename').text
    filepath_image = os.path.join('./images', filename_image)
    return filepath_image

image_path = list(df['filepath'].apply(getFilename))
image_path

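Verifying the Data

Everything so far has been done by hand, so it is important to verify that the information we obtained is valid. We simply check that the bounding box is drawn in the right place for a given image.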

import cv2

file_path = "N1.jpeg"
xmin, xmax, ymin, ymax = 1093, 1396, 645, 727
img = cv2.imread(file_path)
# the two corners are (xmin, ymin) and (xmax, ymax)
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3)
cv2.namedWindow('example', cv2.WINDOW_NORMAL)
cv2.imshow('example', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
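Checking one image by eye does not scale to a whole dataset, so a quick programmatic check can complement it. The following is a minimal sketch, not part of the original project: the helper name find_bad_boxes and the width and height fields are assumptions, and the rows are invented sample data.

```python
def find_bad_boxes(rows):
    """Return the file paths whose box is empty, inverted, or outside the image.

    Each row is a dict with keys filepath, xmin, xmax, ymin, ymax, width, height.
    """
    bad = []
    for r in rows:
        # a valid box has its min corner strictly before its max corner
        ordered = r['xmin'] < r['xmax'] and r['ymin'] < r['ymax']
        # and lies fully inside the image
        inside = (0 <= r['xmin'] and r['xmax'] <= r['width']
                  and 0 <= r['ymin'] and r['ymax'] <= r['height'])
        if not (ordered and inside):
            bad.append(r['filepath'])
    return bad


# Invented sample rows: one valid, one with inverted x, one past the right edge.
rows = [
    {'filepath': 'a.xml', 'xmin': 10, 'xmax': 50, 'ymin': 20, 'ymax': 40, 'width': 100, 'height': 80},
    {'filepath': 'b.xml', 'xmin': 60, 'xmax': 30, 'ymin': 20, 'ymax': 40, 'width': 100, 'height': 80},
    {'filepath': 'c.xml', 'xmin': 10, 'xmax': 120, 'ymin': 20, 'ymax': 40, 'width': 100, 'height': 80},
]
print(find_bad_boxes(rows))
```

Files flagged this way are worth reopening in the annotation tool before training.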

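Data Processing

This is a very important step. We take each image, convert it to an array with OpenCV, and resize it to 224 x 224, the standard input size compatible with pretrained transfer learning models.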

from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import load_img, img_to_array
import cv2
import numpy as np

labels = df.iloc[:, 1:].values
data = []
output = []
for ind in range(len(image_path)):
    image = image_path[ind]
    img_arr = cv2.imread(image)
    h, w, d = img_arr.shape
    # preprocessing
    load_image = load_img(image, target_size=(224, 224))
    load_image_arr = img_to_array(load_image)
    norm_load_image_arr = load_image_arr / 255.0  # normalization
    # normalization to labels
    xmin, xmax, ymin, ymax = labels[ind]
    nxmin, nxmax = xmin / w, xmax / w
    nymin, nymax = ymin / h, ymax / h
    label_norm = (nxmin, nxmax, nymin, nymax)  # normalized output
    # -------------- append
    data.append(norm_load_image_arr)
    output.append(label_norm)

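We normalize each image by dividing by the maximum pixel value, which is 255 for an 8-bit image.

The labels also need to be normalized, because the output of the deep learning model should lie between 0 and 1. To normalize the labels, we divide the x coordinates of the corner points by the image width and the y coordinates by the image height.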
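As a small worked example of the label normalization, the sketch below normalizes one box and maps it back to pixels. The helper names and the 1920 x 1080 image size are assumptions for illustration; the coordinate order (xmin, xmax, ymin, ymax) follows the article.

```python
def normalize_box(box, width, height):
    """Scale pixel coordinates (xmin, xmax, ymin, ymax) into the range 0..1."""
    xmin, xmax, ymin, ymax = box
    return (xmin / width, xmax / width, ymin / height, ymax / height)


def denormalize_box(norm_box, width, height):
    """Map normalized coordinates back to integer pixel positions."""
    nxmin, nxmax, nymin, nymax = norm_box
    return (round(nxmin * width), round(nxmax * width),
            round(nymin * height), round(nymax * height))


box = (1093, 1396, 645, 727)
width, height = 1920, 1080  # assumed image size, for illustration only

norm = normalize_box(box, width, height)
restored = denormalize_box(norm, width, height)
print(norm)
print(restored)
```

Rounding is used here when mapping back to pixels; truncating with an integer cast is also possible but can shift a coordinate down by one pixel.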

X = np.array(data, dtype=np.float32)
y = np.array(output, dtype=np.float32)

sklearn's train_test_split function makes it easy to split the data into training and test sets.

x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

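Training

We are now ready to train a deep learning model for object detection. In this article, we use the InceptionResNetV2 model with pretrained weights and train it on our data. First, import the necessary libraries from TensorFlow 2.3.0.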

from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model
import tensorflow as tf

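What we need is an object detection model with 4 outputs (the coordinates of the diagonal corner points). We add a small stack of dense layers on top of the transfer learning model, shown as the headmodel layers in the code below.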

inception_resnet = InceptionResNetV2(weights="imagenet", include_top=False,
                                     input_tensor=Input(shape=(224, 224, 3)))
inception_resnet.trainable = False
# ---------------------
headmodel = inception_resnet.output
headmodel = Flatten()(headmodel)
headmodel = Dense(500, activation="relu")(headmodel)
headmodel = Dense(250, activation="relu")(headmodel)
headmodel = Dense(4, activation='sigmoid')(headmodel)
# ---------- model
model = Model(inputs=inception_resnet.input, outputs=headmodel)

Now compile and train the model.

# compile model
model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4))
model.summary()

from tensorflow.keras.callbacks import TensorBoard

# log training metrics to the 'object_detection' directory
tfb = TensorBoard('object_detection')
history = model.fit(x=x_train, y=y_train, batch_size=10, epochs=200,
                    validation_data=(x_test, y_test), callbacks=[tfb])

Training the model typically takes 3 to 4 hours, depending on the speed of the computer. Here we use TensorBoard to log the model's loss during training.
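To give a feel for the loss values that show up in the log, here is a plain-Python sketch of the mean squared error for a single sample, the quantity that loss='mse' is based on (Keras additionally averages across the batch). The target and prediction values are invented for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error over the four normalized box coordinates."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented normalized boxes in (xmin, xmax, ymin, ymax) order.
target = (0.50, 0.70, 0.60, 0.68)
prediction = (0.52, 0.69, 0.58, 0.70)

loss = mse(target, prediction)
print(loss)
```

Because the coordinates are normalized to the range 0 to 1, an error of about 0.02 per coordinate corresponds to a loss on the order of 0.0003, so small loss values do not by themselves mean pixel-accurate boxes on large images.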

Making Bounding Box Predictions

This is the final step. Here we put everything together and get the prediction for a given image.

import matplotlib.pyplot as plt

# create pipeline
def object_detection(path):
    # read image
    image = load_img(path)  # PIL object
    image = np.array(image, dtype=np.uint8)  # 8 bit array (0,255)
    image1 = load_img(path, target_size=(224, 224))
    # data preprocessing: convert into array and normalize
    image_arr_224 = img_to_array(image1) / 255.0
    h, w, d = image.shape
    test_arr = image_arr_224.reshape(1, 224, 224, 3)
    # make predictions
    coords = model.predict(test_arr)
    # denormalize the values
    denorm = np.array([w, w, h, h])
    coords = coords * denorm
    coords = coords.astype(np.int32)
    # draw bounding box on top of the image
    xmin, xmax, ymin, ymax = coords[0]
    pt1 = (xmin, ymin)
    pt2 = (xmax, ymax)
    print(pt1, pt2)
    cv2.rectangle(image, pt1, pt2, (0, 255, 0), 3)
    return image, coords

# ------ get prediction
path = './test_images/N207.jpeg'
image, cods = object_detection(path)
plt.figure(figsize=(10, 8))
plt.imshow(image)
plt.show()

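This article covers only 50% of the project architecture. The next stage involves extracting the text from the license plate and developing a RESTful API in Flask. Here is the output of the complete project.

Author: DEVI GUSKRA

Translated by the deephub translation team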
