Mask RCNN code walkthrough, CNN algorithm model

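Over the past couple of days I have been planning some modifications to a network based on Mask RCNN. Once my ideas were sorted out, I set out to run Mask RCNN, since the base model has to run before any modification experiments can start. In the end, getting the Mask RCNN demo environment working took two days. To commemorate those two glorious days, I decided to write this blog post.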

1. Experimental environment
2. Network model
3. Pitfalls encountered
    1. AttributeError: module 'tensorflow' has no attribute 'log'
    2. AttributeError: module 'tensorflow._api.v2.sets' has no attribute 'set_intersection'
    3. ValueError: Tried to convert 'shape' to a tensor and failed. Error: None values not supported.
    4. AttributeError: module 'keras.engine.saving' has no attribute 'load_weights_from_hdf5_group_by_name'
    5. Unable to open file (truncated file: eof = 7340032, sblock->base_addr = 0, stored_eof = 126651688)
    6. Long stalls when running the model in Jupyter notebook
    7. The demo runs normally but prints *** No instances to display ***
4. Does it run on TensorFlow 2.X?
5. Finally, a look at the COCO-based demo results
6. Summary

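1. Experimental environment

Getting a model to run requires understanding it better. Finding a base implementation that reproduces the model well helps you understand its details thoroughly (after reading only the paper you may feel you have mastered the model, but the details are in the code). What you want is a base implementation that also makes it easy to run your own experiments or train on your own dataset. My experimental environment: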

GPU: RTX 3090; RAM: 64 GB; CPU: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz; OS: Ubuntu 18.04

2. Network model

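After searching through the code available online, I found that this implementation reflects a good understanding of the model and also has the most stars; the crowd's judgment really is sharp: model link. Its dependencies are as follows: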

numpy
scipy
Pillow
cython
matplotlib
scikit-image
tensorflow>=1.3.0
keras>=2.0.8
opencv-python
h5py
imgaug
IPython[all]

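With the background and environment covered, here are the pitfalls I ran into; I hope they help. This post is not a detailed guide to environment setup. If you need one, see my earlier article: Deep learning environment setup, which was written for an RTX 2070 card. As for deploying the model itself, the README at the model link is enough: just follow its download and installation steps.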

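3. Pitfalls encountered

Back to the point. Once the environment was set up I started testing, and all sorts of problems appeared. The code changes below are all made in model.py: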

1. AttributeError: module 'tensorflow' has no attribute 'log'

Solution:

# Change the log2_graph function as follows:
def log2_graph(x):
    """Implementation of Log2. TF doesn't have a native implementation."""
    return tf.math.log(x) / tf.math.log(2.0)

2. AttributeError: module 'tensorflow._api.v2.sets' has no attribute 'set_intersection'

Solution:

# Switch the TensorFlow import to the v1 compatibility API. Change
import tensorflow as tf
# to:
import tensorflow.compat.v1 as tf

3. ValueError: Tried to convert 'shape' to a tensor and failed. Error: None values not supported.

Solution:

# Change the following code:
mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
# to:
if s[1] is None:
    mrcnn_bbox = KL.Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)
else:
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)

# Change the following code:
indices = tf.stack([tf.range(probs.shape[0]), class_ids], axis=1)
# to:
indices = tf.stack([tf.range(tf.shape(probs)[0]), class_ids], axis=1)

4. AttributeError: module 'keras.engine.saving' has no attribute 'load_weights_from_hdf5_group_by_name'

Solution:

# Change the following code:
if by_name:
    saving.load_weights_from_hdf5_group_by_name(f, layers)
else:
    saving.load_weights_from_hdf5_group(f, layers)
# to:
keras_model.load_weights(filepath, by_name=by_name)

5. Unable to open file (truncated file: eof = 7340032, sblock->base_addr = 0, stored_eof = 126651688)

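Solution:

Check your local copy of the COCO pretrained weights, mask_rcnn_coco.h5, and compare its size with the file size listed at the original download link. If they differ, download it again. The error message itself points in this direction: the file on disk ends at 7340032 bytes, while its header records 126651688 bytes, which suggests the download was cut short.

6. Long stalls when running the model in Jupyter notebook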

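Problem description: the program stalls for a long time both when importing the local Mask RCNN library and when running detection inference. I timed it at roughly 10 minutes. Take the following local-library import code as an example: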

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
print(1)
import tensorflow as tf
print(2)
tf.debugging.set_log_device_placement(True)
print(3)
gpus = tf.config.list_physical_devices('GPU')
print(4)
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
print(5)
os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # Select which GPU to train on
config = tf.compat.v1.ConfigProto()
print(6)
# Optionally cap GPU memory use at 80% of the card's memory
# config.gpu_options.per_process_gpu_memory_fraction = 0.8
config.gpu_options.allow_growth = True  # Allocate GPU memory on demand
print(7)
sess = tf.compat.v1.Session(config=config)
print(8)
tf.compat.v1.disable_eager_execution()  # Ensures sess.run() works
hello = tf.constant('hello,tensorflow')
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))  # TF 2.0 form of the call
print(sess.run(hello))

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

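The code above is the part of the base model's demo that imports the local library and sets the model paths; I only added control over how GPU memory grows. With just this code it stalled for 10 minutes, before the model was even loaded (the COCO pretrained weights had already been downloaded locally). Where was the problem? As the code shows, I put a print(NUM) after every step, and found that it was already stuck while importing the basic libraries, almost always at steps involving the TensorFlow library.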
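The print(NUM) markers show which step stalls, but not for how long. A minimal sketch of the same idea with timings, using only the standard library, might look like the following. The helper names timed_import and profile_imports are hypothetical and are not part of the Mask RCNN repository.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds_taken)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return module, elapsed


def profile_imports(module_names):
    """Import each module in order and return a list of (name, seconds).

    The seconds value is None when the module is not installed, so one
    missing package does not abort the whole measurement.
    """
    results = []
    for name in module_names:
        try:
            _, elapsed = timed_import(name)
        except ImportError:
            elapsed = None  # module not available in this environment
        results.append((name, elapsed))
    return results
```

Calling profile_imports(["numpy", "tensorflow", "mrcnn.model"]) in a fresh kernel would then list how long each import took, which makes a multi-minute TensorFlow import easy to spot. Note that a module already imported in the same kernel returns almost instantly, so the measurement is only meaningful right after a restart.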

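Solution:

The cause is that the versions of the NVIDIA driver, TensorFlow, Keras, CUDA, and cuDNN must match each other. My NVIDIA driver version is 460.X, which corresponds to CUDA 11.2, with cuDNN 8.X. The TensorFlow version is 2.6.0, with the matching Keras version 2.6.0. I will not recount the process, which was truly painful; these are the versions I ended up with, and the program has run without stalls ever since.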
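Before a long debugging session, it can help to confirm which package versions the notebook kernel actually sees. The sketch below assumes Python 3.8+ (for importlib.metadata) and compares against the TensorFlow and Keras versions reported above. The EXPECTED table and the function names are illustrative assumptions, and the driver, CUDA, and cuDNN versions are not checked here.

```python
from importlib import metadata

# Versions the post reports as working together (taken from the text above).
EXPECTED = {"tensorflow": "2.6.0", "keras": "2.6.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to (found_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {EXPECTED[name]}, found {found} [{status}]")
```

The lookup parameter exists only so the comparison can be exercised without TensorFlow installed; in normal use check_versions(EXPECTED) is enough. A mismatch here does not prove the stall comes from versions, but it is a cheap first thing to rule out.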

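7. The demo runs normally but prints *** No instances to display ***

Solution:

This problem bothered me for a long time. Many posts online suggest changing this or that piece of code. I tried them all, but the real fix is again making sure the versions of all components match when setting up the environment. If your setup matches mine, my configuration should resolve it: NVIDIA driver 460.X, with the corresponding CUDA 11.2 and cuDNN 8.X; TensorFlow 2.6.0 with Keras 2.6.0. For older versions, the version compatibility information available online will generally resolve it as well.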

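4. Does it run on TensorFlow 2.X?

See issue 7 in section 3: I switched TensorFlow to the latest version, 2.6.0, and it still runs; you only need to change Keras to the matching 2.6.0 version. I have also tested a port of this TensorFlow 1.3-based code to the TensorFlow 2 series (source code), and it runs normally.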

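5. Finally, a look at the COCO-based demo results:

6. Summary

I fought with these pitfalls for more than two days. I will spare you the painful details; if you found this helpful, please follow and give it a like ^_^.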