
Given a list of fastq files, compute the read lengths of each fastq file in parallel, then calculate the N50 once all files are done

Posted: 2023-05-05 16:47:37 Views: 216751 Author: 2672

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import multiprocessing
import gzip
import sys


## compute the length of every read in one gzipped fastq file
def cal_length(f):
    seq = []
    reads = 0
    with gzip.open(f, 'rt') as fh:  ## text mode, so lines are str rather than bytes
        tt = fh.readlines()
    plen = len(tt) // 4  ## each fastq record spans 4 lines
    for i in range(plen):
        pdna = tt[i * 4 + 1].strip()  ## 2nd line of the record is the sequence; drop the newline
        reads += 1
        seq.append(len(pdna))
    return (f, reads, seq)


if __name__ == '__main__':
    fasta_files = sys.argv[1]  ## file listing the input fastq paths
    with open(fasta_files) as fh:
        flist = [f.strip() for f in fh if f.strip()]  ## read the list file
    pool = multiprocessing.Pool(10)  ## pool of 10 worker processes
    results = pool.map(cal_length, flist)  ## one (file, reads, lengths) tuple per file
    pool.close()  ## no more tasks will be submitted
    pool.join()
    lengths = []  ## read lengths pooled across all files
    for res in results:
        # print('fastq file', res[0], 'reads number:', res[1])
        lengths.extend(res[2])
    ## N50 calculation
    N50_pos = sum(lengths) / 2.0
    lengths.sort(reverse=True)
    ValueSum, N50 = 0, 0
    for value in lengths:
        ValueSum += value
        if N50_pos <= ValueSum:
            N50 = value
            break
    print("fastq files N50:", N50)
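To make the N50 step easier to check by hand, here is a minimal sketch that isolates it in a standalone function. The name `n50` and the sample lengths are illustrative assumptions and not part of the original script. The logic is the same as in the script: sort lengths from longest to shortest and return the first length at which the running sum reaches half of the total.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (0 if it is empty)."""
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # First length at which the running sum covers half of all bases.
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up lengths: total = 30, half = 15; running sums are 10, then 18.
    print(n50([2, 4, 6, 8, 10]))  # prints 8
```

With these sample values the reads of length 10 and 8 together cover 18 of the 30 bases, so the N50 is 8.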

1. Usage example:

python n50_cal_batch.py f.list

f.list is a list of the absolute paths of the gzip-compressed fastq files, one path per line.
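If the list file does not exist yet, it could be generated along these lines. This is only a sketch: the helper name `write_file_list` and the `*.fastq.gz` pattern are assumptions about how the files are named, not something the original post specifies.

```python
from pathlib import Path


def write_file_list(directory, list_path, pattern="*.fastq.gz"):
    """Write the absolute paths of files matching `pattern`, one per line."""
    # `pattern` is an assumed naming convention; adjust it to your files.
    paths = sorted(str(p.resolve()) for p in Path(directory).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths
```

For example, `write_file_list("/path/to/fastq_dir", "f.list")` (a hypothetical directory) would produce a file that can be passed directly to the script above.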

