Parsing and Converting PGM Files in Python

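After downloading the ORL face database I found that its images are stored as pgm files. I had run into this format before, so this time I looked at it closely and wrote a script to convert between image formats.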

References:

pgm

Netpbm format

Parsing PGM

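pgm (Portable Gray Map) is an image format designed by the Netpbm open source project. Besides pgm, the project also defines pbm and ppm.

A pgm file can contain one or more pgm images; each image is laid out as follows: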

1. A "magic number" for identifying the file type. A pgm image's magic number is the two characters "P5".
2. Whitespace (blanks, TABs, CRs, LFs).
3. A width, formatted as ASCII characters in decimal.
4. Whitespace.
5. A height, again in ASCII decimal.
6. Whitespace.
7. The maximum gray value (Maxval), again in ASCII decimal. Must be less than 65536, and more than zero.
8. A single whitespace character (usually a newline).
9. A raster of Height rows, in order from top to bottom. Each row consists of Width gray values, in order from left to right. Each gray value is a number from 0 through Maxval, with 0 being black and Maxval being white. Each gray value is represented in pure binary by either 1 or 2 bytes. If the Maxval is less than 256, it is 1 byte. Otherwise, it is 2 bytes. The most significant byte is first.
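The header rules above can be sketched as a small tokenizer. This is only an illustration, not the script used later in this post: the function name read_pgm_header is made up here, and it also skips '#' comment lines, which Netpbm allows in the header even though the list above does not mention them. Comments that start in the middle of a token are not handled.

```python
import io


def read_pgm_header(stream):
    """Read a binary PGM (P5) header and return (width, height, maxval).

    After the call, the stream is positioned at the first raster byte.
    Hypothetical helper for illustration only.
    """
    def next_token():
        # Skip whitespace and '#' comments, then collect one token.
        token = b""
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise ValueError("unexpected end of file in PGM header")
            if ch == b"#" and not token:
                stream.readline()  # discard the rest of the comment line
                continue
            if ch.isspace():
                if token:
                    return token  # the terminating whitespace is consumed
                continue
            token += ch

    if next_token() != b"P5":
        raise ValueError("not a binary PGM (P5) file")
    width = int(next_token())
    height = int(next_token())
    maxval = int(next_token())
    if not 0 < maxval < 65536:
        raise ValueError("Maxval must be in 1..65535")
    return width, height, maxval


# A header with the same values as the ORL images (no raster attached).
print(read_pgm_header(io.BytesIO(b"P5\n92 112\n255\n")))  # (92, 112, 255)
```

Reading one byte at a time is slow but keeps the stream position exact, so the raster can be read immediately after the single whitespace character that ends the header.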

Opening a PGM file in Notepad++ shows:

P5
92 112
255
01-/19'*515<L[c_PKB6/12+.5=FTi厒n^Qk_P9...

Its magic number is P5, its width is 92, its height is 112, and its maximum value is 255.

Plain PGM

There is another variant of the PGM format, called Plain PGM, whose magic number is P2. It differs from the format above as follows:

1. There is exactly one image in a file.
2. The magic number is P2 instead of P5.
3. Each pixel in the raster is represented as an ASCII decimal number (of arbitrary size).
4. Each pixel in the raster has white space before and after it. There must be at least one character of white space between any two pixels, but there is no maximum.
5. No line should be longer than 70 characters.
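Because every value in a Plain PGM file is ASCII text separated by whitespace, parsing it mostly comes down to splitting tokens. The following is a minimal sketch under that assumption. The function name parse_plain_pgm and the tiny 3x2 sample are made up for illustration, and '#' comments are assumed to run to the end of the line.

```python
def parse_plain_pgm(text):
    """Parse Plain PGM (P2) text into (width, height, maxval, rows).

    Hypothetical helper for illustration only.
    """
    tokens = []
    for line in text.splitlines():
        # Drop everything from '#' to the end of the line (a comment).
        tokens.extend(line.split("#", 1)[0].split())

    if not tokens or tokens[0] != "P2":
        raise ValueError("not a Plain PGM (P2) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    pixels = [int(t) for t in tokens[4:]]
    if len(pixels) != width * height:
        raise ValueError("expected %d pixels, got %d" % (width * height, len(pixels)))
    if any(p < 0 or p > maxval for p in pixels):
        raise ValueError("pixel value outside 0..Maxval")
    # Cut the flat pixel list into rows, top to bottom.
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows


sample = "P2\n# tiny made-up image\n3 2\n15\n0 7 15\n15 7 0\n"
print(parse_plain_pgm(sample))  # (3, 2, 15, [[0, 7, 15], [15, 7, 0]])
```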

An example:

P2
# feep.pgm
24 7
15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 3 3 3 3 0 0 7 7 7 7 0 0 11 11 11 11 0 0 15 15 15 15 0
0 3 0 0 0 0 0 7 0 0 0 0 0 11 0 0 0 0 0 15 0 0 15 0
0 3 3 3 0 0 0 7 7 7 0 0 0 11 11 11 0 0 0 15 15 15 15 0
0 3 0 0 0 0 0 7 0 0 0 0 0 11 0 0 0 0 0 15 0 0 0 0
0 3 0 0 0 0 0 7 7 7 7 0 0 11 11 11 11 0 0 15 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Why would a grayscale value take two bytes?

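We usually work in RGB mode, where each value lies in [0, 255], so a single byte is enough.

Higher bit depths exist, though. A 16-bit grayscale image, for example, needs 2 bytes per value.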

Reference: RGB Color & Bit Depth, & Memory Cost of Images
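To make the one-byte versus two-byte cases concrete, here is a small sketch that decodes raw raster bytes using only the standard library. The function name decode_gray_values is made up for illustration. It assumes the most significant byte comes first for two-byte values, as the format description requires.

```python
import struct


def decode_gray_values(raw, maxval):
    """Decode raw PGM raster bytes into a flat list of integers.

    Hypothetical helper for illustration only.
    """
    if maxval < 256:
        return list(raw)  # one byte per gray value
    if len(raw) % 2:
        raise ValueError("a 16-bit raster needs an even number of bytes")
    count = len(raw) // 2
    # '>' = big-endian (most significant byte first), 'H' = unsigned 16-bit.
    return list(struct.unpack(">%dH" % count, raw))


print(decode_gray_values(b"\x00\xff", 255))            # [0, 255]
print(decode_gray_values(b"\x01\x00\xff\xff", 65535))  # [256, 65535]
```

The same four bytes would decode to four separate values if Maxval were below 256, which is why the header has to be read before the raster.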

Format conversion

I wrote a Python program that can convert either a single image or a whole batch.

# -*- coding: utf-8 -*-
from __future__ import print_function

import argparse
import os
import time

import cv2
import numpy as np
from PIL import Image

__author__ = 'zj'

image_formats = ['jpg', 'JPG', 'jpeg', 'JPEG', 'png', 'PNG']


def is_pgm_file(in_path):
    """Return True if in_path is an existing file with a .pgm extension."""
    if not isinstance(in_path, str) or not in_path.endswith('.pgm'):
        return False
    return os.path.isfile(in_path)


def convert_pgm_by_PIL(in_path, out_path):
    if not is_pgm_file(in_path):
        raise Exception("%s is not a PGM file" % in_path)
    # Read the file and let PIL pick the output format from the extension
    im = Image.open(in_path)
    im.save(out_path)


def convert_pgm_P5(in_path, out_path):
    """
    Convert a binary (P5) pgm file to another image format.
    Reads the magic number, then the width and height, then the maximum value.
    Assumes the header occupies three lines and contains no comments.
    :param in_path: path to the input pgm file
    :param out_path: path to the output file
    """
    if not is_pgm_file(in_path):
        raise Exception("%s is not a PGM file" % in_path)
    with open(in_path, 'rb') as f:
        # Read the first line (the magic number) and decode it to a string
        magic_number = f.readline().strip().decode('utf-8')
        if magic_number != "P5":
            raise Exception("Not a valid P5 image")
        # Read the width and height
        width, height = f.readline().strip().decode('utf-8').split()
        width = int(width)
        height = int(height)
        # Read the maximum gray value
        maxval = int(f.readline().strip())
        # One byte per gray value if maxval < 256, otherwise two (big-endian)
        dtype = np.dtype('u1') if maxval < 256 else np.dtype('>u2')
        raw = f.read(width * height * dtype.itemsize)
    if len(raw) != width * height * dtype.itemsize:
        raise Exception("%s is truncated" % in_path)
    # Build the image with shape (rows, columns) = (height, width)
    img = np.frombuffer(raw, dtype=dtype).reshape(height, width)
    # Convert to native byte order before handing the array to OpenCV;
    # 16-bit data can only be written to formats that support it, such as PNG
    img = img.astype(np.uint8 if maxval < 256 else np.uint16)
    cv2.imwrite(out_path, img)
    print('%s save ok' % out_path)


def convert_pgm_P5_batch(in_dir, out_dir, res_format):
    """
    Convert PGM files in batch.
    :param in_dir: path to the folder of pgm files
    :param out_dir: path to the output folder
    :param res_format: output image format
    """
    if not os.path.isdir(in_dir):
        raise Exception('%s is not a directory' % in_dir)
    if not os.path.isdir(out_dir):
        raise Exception('%s is not a directory' % out_dir)
    if res_format not in image_formats:
        raise Exception('%s is not supported yet' % res_format)
    file_list = os.listdir(in_dir)
    for file_name in file_list:
        file_path = os.path.join(in_dir, file_name)
        if is_pgm_file(file_path):
            # A pgm file: convert it
            file_out_path = os.path.join(out_dir, os.path.splitext(file_name)[0] + '.' + res_format)
            convert_pgm_P5(file_path, file_out_path)
        elif os.path.isdir(file_path):
            # A directory: create the matching output directory and recurse
            file_out_dir = os.path.join(out_dir, file_name)
            if not os.path.exists(file_out_dir):
                os.mkdir(file_out_dir)
            convert_pgm_P5_batch(file_path, file_out_dir, res_format)
    print('batch operation over')


if __name__ == '__main__':
    script_start_time = time.time()

    parser = argparse.ArgumentParser(description='Format Converter - PGM')
    # Optional arguments
    parser.add_argument('-i', '--input', type=str, help='Path to the pgm file')
    parser.add_argument('-o', '--output', type=str, help='Path to the result file')
    parser.add_argument('--input_dir', type=str, help='Dir to the pgm files')
    parser.add_argument('--output_dir', type=str, help='Dir to the result files')
    parser.add_argument('-f', '--format', default='png', type=str, help='result image format')
    parser.add_argument('-b', '--batch', action="store_true", default=False, help='Batch processing')
    args = vars(parser.parse_args())

    in_path = args['input']
    out_path = args['output']
    isbatch = args['batch']
    in_dir = args['input_dir']
    out_dir = args['output_dir']
    res_format = args['format']

    if in_path is not None and out_path is not None:
        # Convert a single pgm file
        convert_pgm_P5(in_path, out_path)
        # convert_pgm_by_PIL(in_path, out_path)
    elif isbatch and in_dir is not None and out_dir is not None:
        # Convert a whole folder
        convert_pgm_P5_batch(in_dir, out_dir, res_format)
    else:
        print('Please provide the required arguments')
    print('Script took %s seconds.' % (time.time() - script_start_time,))

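PIL can also read a PGM file and save it in another format (see convert_pgm_by_PIL). I wrote my own routine that parses the binary file directly, and it runs about 2.5 times faster than calling PIL.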

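To convert a single PGM file:

python PGMConverter.py -i INPUT -o OUTPUT

For example:

python PGMConverter.py -i 1.pgm -o 3.png

To convert a whole folder of PGM files:

python PGMConverter.py --batch --input_dir INPUT_DIR --output_dir OUTPUT_DIR -f FORMAT

Replace INPUT_DIR with the path to the PGM folder, OUTPUT_DIR with the path to the output folder (which must be created beforehand), and FORMAT with the output image format.

For example: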

python PGMConverter.py --batch --input_dir c:\face\att_faces --output_dir c:\face\att_face_png -f png
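The batch mode expects the output folder to exist already. One way to lift that requirement is sketched below, assuming the goal is simply to mirror the input tree. It uses os.walk and os.makedirs from the standard library. The helper name plan_batch_outputs is made up for illustration and is not part of the script above.

```python
import os


def plan_batch_outputs(in_dir, out_dir, res_format="png"):
    """Yield (pgm_path, output_path) pairs, creating output folders as needed.

    Hypothetical helper for illustration only; it plans the conversion
    but does not read or write any image data itself.
    """
    for root, _dirs, files in os.walk(in_dir):
        # Mirror the current sub-folder under out_dir.
        rel = os.path.relpath(root, in_dir)
        target_dir = out_dir if rel == "." else os.path.join(out_dir, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".pgm":
                yield (os.path.join(root, name),
                       os.path.join(target_dir, stem + "." + res_format))
```

Each pair it yields could then be passed to a converter such as convert_pgm_P5(src, dst). Because the function is a generator, the folders are only created as the pairs are consumed.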
