Transcriptome data analysis software and computer setup for transcriptome analysis

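Switch the mirror source. To speed up downloads, replace the default mirrors in /etc/apt/sources.list. The method follows the USTC (University of Science and Technology of China) open-source mirror site.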

Back up

cd /etc/apt/

sudo cp sources.list sources.list.bk

Replace

sudo sed -i 's/http/https/g' sources.list

sudo sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' sources.list

sudo sed -i 's/security.ubuntu.com/mirrors.ustc.edu.cn/g' sources.list

Update

sudo apt-get update

sudo apt-get upgrade
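The three sed substitutions can be wrapped in one function that takes the file as an argument, so they can be tried on a copy before editing /etc/apt/sources.list itself. A minimal sketch, assuming the stock Ubuntu hostnames archive.ubuntu.com and security.ubuntu.com; the name `switch_to_ustc` is hypothetical. It matches `http:` rather than bare `http`, so running it twice does not turn `https` into `httpss`.

```shell
# switch_to_ustc FILE  (hypothetical helper)
# Backs FILE up to FILE.bk, then rewrites it in place:
#   http: -> https:, archive.ubuntu.com and security.ubuntu.com -> mirrors.ustc.edu.cn
switch_to_ustc() {
    file=$1
    cp "$file" "$file.bk" || return 1
    sed -i \
        -e 's/http:/https:/g' \
        -e 's/archive\.ubuntu\.com/mirrors.ustc.edu.cn/g' \
        -e 's/security\.ubuntu\.com/mirrors.ustc.edu.cn/g' \
        "$file"
}
```

For example, `cp /etc/apt/sources.list /tmp/test.list && switch_to_ustc /tmp/test.list && diff /tmp/test.list.bk /tmp/test.list` shows exactly what would change; once the diff looks right, apply the same substitutions to the real file as root.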

Download the data: the single-cell RNA-seq runs (SRR accessions)

for ele in {511..519}
do
echo "https://sra-download.ncbi.nlm.nih.gov/srapub/SRR2089$ele" >> download.txt
done

wget -ci download.txt
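Because the loop appends (>>) to download.txt, re-running it lists every URL twice; `wget -c` then only resumes, but the list gets messy. A small function that overwrites the file instead is sketched below. `make_download_list` is a hypothetical name, and the srapub URL pattern is the one used in this post, which NCBI may no longer serve.

```shell
# make_download_list PREFIX FIRST LAST OUTFILE  (hypothetical helper)
# Writes https://sra-download.ncbi.nlm.nih.gov/srapub/<PREFIX><n> for n = FIRST..LAST,
# one URL per line, overwriting OUTFILE.
make_download_list() {
    prefix=$1; first=$2; last=$3; out=$4
    for n in $(seq "$first" "$last"); do
        printf 'https://sra-download.ncbi.nlm.nih.gov/srapub/%s%s\n' "$prefix" "$n"
    done > "$out"
}
```

Usage: `make_download_list SRR2089 511 519 download.txt && wget -ci download.txt`.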

Download and install sratoolkit. Purpose: download, manipulate and validate next-generation sequencing data from NCBI SRA.

URL: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

Steps:

cd src

wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.8.2-1/sratoolkit.2.8.2-1-ubuntu64.tar.gz

tar -zxvf sratoolkit.2.8.2-1-ubuntu64.tar.gz

mv sratoolkit.2.8.2-1-ubuntu64 ~/biosoft

Add to PATH

echo 'PATH=$PATH:~/biosoft/sratoolkit.2.8.2-1-ubuntu64/bin' >> ~/.bashrc

source ~/.bashrc
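Each `echo ... >> ~/.bashrc` adds a new line every time it runs, so repeating an install step (here, and again for samtools and stringtie) leaves duplicate PATH entries. A guard that appends only when the exact line is missing avoids this. Sketch only; `add_to_path_rc` is a hypothetical name, and the rc file is a parameter so it can be tried on a scratch file first.

```shell
# add_to_path_rc DIR RCFILE  (hypothetical helper)
# Appends the line 'PATH=$PATH:DIR' to RCFILE unless that exact line is already present.
add_to_path_rc() {
    dir=$1; rc=$2
    line="PATH=\$PATH:$dir"
    # -x: match the whole line; -F: fixed string, since the line contains '$' and '.'
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

Usage: `add_to_path_rc ~/biosoft/sratoolkit.2.8.2-1-ubuntu64/bin ~/.bashrc && source ~/.bashrc`.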

Test

prefetch -v

prefetch -c SRR2413322

Convert SRA to fastq format
for i in {511..519}
do
echo "fastq-dump --split-3 SRR2089$i -O ./fastq" >> ./fastq/fastq.sh
done

Run fastq.sh: bash ./fastq/fastq.sh
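With --split-3, fastq-dump writes the two mates of each paired-end spot to <run>_1.fastq and <run>_2.fastq and reads without a mate to <run>.fastq; a single-end run gives only <run>.fastq. A quick check of what each run produced, sketched under the assumption of uncompressed output in ./fastq as in the script above; `check_layout` is a hypothetical name.

```shell
# check_layout DIR RUN...  (hypothetical helper)
# Prints "RUN paired" if DIR/RUN_1.fastq and DIR/RUN_2.fastq exist,
# "RUN single" if only DIR/RUN.fastq exists, otherwise "RUN missing".
check_layout() {
    dir=$1; shift
    for run in "$@"; do
        if [ -f "$dir/${run}_1.fastq" ] && [ -f "$dir/${run}_2.fastq" ]; then
            echo "$run paired"
        elif [ -f "$dir/$run.fastq" ]; then
            echo "$run single"
        else
            echo "$run missing"
        fi
    done
}
```

Usage: `check_layout ./fastq $(seq -f 'SRR2089%g' 511 519)`.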

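Download and install SAMtools. The Sequence Alignment/Map (SAM) format is the standard format for storing read alignments from high-throughput sequencing; it can also store unaligned reads. The main tool for processing SAM files is SAMtools, written by Heng Li. Its main subcommands are:

view: convert SAM to BAM and BAM to SAM, and extract subsets of alignments

sort: sort alignments

merge: merge multiple sorted alignment files

index: index sorted alignments

faidx: build a FASTA index and extract subsequences

tview: text-based alignment viewer

pileup: position-based output and consensus/indel calling (replaced by mpileup in SAMtools 1.x)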
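Since one SAM file can hold both aligned and unaligned reads, the FLAG field (column 2) tells them apart: bit 0x4 marks an unmapped segment. The awk sketch below counts mapped and unmapped records in a plain-text SAM file, skipping @ header lines. It only illustrates the format; on real data `samtools flagstat` or `samtools view -c -F 4` does this much faster. The function name is hypothetical, and secondary/supplementary records are counted like any other line.

```shell
# count_mapped SAMFILE  (hypothetical helper)
# Prints "mapped=<n> unmapped=<m>" based on bit 0x4 of the FLAG column.
# POSIX awk has no bitwise operators, so the bit is tested with integer arithmetic.
count_mapped() {
    awk -F '\t' '
        /^@/ { next }                                   # header lines
        { if (int($2 / 4) % 2 == 1) u++; else m++ }
        END { printf "mapped=%d unmapped=%d\n", m, u }
    ' "$1"
}
```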

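Download: wget https://nchc.dl.sourceforge.net/project/samtools/samtools/1.6/samtools-1.6.tar.bz2

Extract and compile (the tarball contains source code): tar jxvf samtools-1.6.tar.bz2 && cd samtools-1.6 && ./configure && make

Add to PATH

echo 'PATH=$PATH:~/biosoft/samtools-1.6' >> ~/.bashrc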

source ~/.bashrc

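Build the HISAT2 genome index. Prebuilt indexes for human and mouse are generally available, so downloading them is recommended:

http://ccb.jhu.edu/software/hisat2/index.shtml

wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/hg19.tar.gz

tar -zxvf hg19.tar.gz

There is no prebuilt index for the pig genome, so it has to be built manually.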

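HISAT2 includes two Python scripts that convert a GTF file into files hisat2-build can use (named hisat2_extract_exons.py and hisat2_extract_splice_sites.py in newer releases). Both write to standard output:

extract_exons.py Sus_scrofa.Sscrofa11.1.90.chr.gtf > genome.exon

extract_splice_sites.py Sus_scrofa.Sscrofa11.1.90.chr.gtf > genome.ss

Finally, build the index:
hisat2-build --ss genome.ss --exon genome.exon Sus_scrofa.Sscrofa11.1.dna.toplevel.fa Sus_tran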
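hisat2-build writes the index as eight files, <basename>.1.ht2 through <basename>.8.ht2 (a large-index build uses .ht2l instead), and `hisat2 -x` takes that basename, not a file name. A quick completeness check before aligning; `check_ht2_index` is a hypothetical helper that only looks for the standard .ht2 files.

```shell
# check_ht2_index BASENAME  (hypothetical helper)
# Succeeds if BASENAME.1.ht2 ... BASENAME.8.ht2 all exist; reports missing files on stderr.
check_ht2_index() {
    base=$1; missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -f "$base.$i.ht2" ]; then
            echo "missing: $base.$i.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_ht2_index ./ref/Sus_tran/Sus_tran && echo "index OK"`.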

HISAT2 alignment

Align the RNA-seq reads with hisat2:
hisat2 -p 8 --dta -x ./ref/Sus_tran/Sus_tran -1 ./fastq/Blast_1.clean.fq -2 ./fastq/Blast_2.clean.fq -S ./hisat2-out/Blast.sam

The alignment rate reached 96.85%. Done.
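HISAT2 prints its alignment summary to standard error, ending with a line such as `96.85% overall alignment rate`. Redirecting stderr to a per-sample log (for example `hisat2 ... 2> ./hisat2-out/Blast.log`, a file name chosen here for illustration) makes it easy to collect the rates of all samples. `alignment_rates` is a hypothetical helper that pulls the percentage out of such logs.

```shell
# alignment_rates LOG...  (hypothetical helper)
# Prints "<log><TAB><rate>" per HISAT2 log, taken from the "...% overall alignment rate" line;
# prints NA when no such line is found.
alignment_rates() {
    for log in "$@"; do
        rate=$(sed -n 's/^\([0-9.]*%\) overall alignment rate.*/\1/p' "$log")
        printf '%s\t%s\n' "$log" "${rate:-NA}"
    done
}
```

Usage: `alignment_rates ./hisat2-out/*.log`.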

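Install HTSeq

Install directly with pip:
pip install HTSeq
If that fails, install the dependencies first:
pip install 'matplotlib>=1.4'
pip install Cython
pip install 'pysam>=0.9'
pip install HTSeq
If it still fails, install with setup.py:
wget https://github.com/simon-anders/htseq/archive/master.zip
Unzip it, enter the directory and install: unzip master.zip && cd htseq-master && python setup.py install --user
The program is installed in ~/.local/bin

Add that directory to PATH, or call htseq-count by its full path:
/home/yczuo/.local/bin/htseq-count Blast.sam /home/yczuo/ref/Sus.gtf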

Counting with htseq-count

Convert the SAM file to BAM:
samtools view -S ./hisat2-out/Blast.sam -b > ./BAM/Blast.bam
Sort the BAM file by read name (-n); this is paired-end data, and htseq-count expects name-sorted input by default:
samtools sort -n -o ./BAM/Blast_sort.bam ./BAM/Blast.bam
samtools view -h ./BAM/Blast_sort.bam > ./SAM/Blast_sort.sam
htseq-count -s no ./SAM/Blast_sort.sam genes.gtf > ./count/Blast.count
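htseq-count writes one `gene_id<TAB>count` line per gene, followed by special counters whose names start with a double underscore (__no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique). These are not genes, but they are a useful sanity check. The sketch splits a .count file into the total assigned to genes plus the special counters; the function name is hypothetical.

```shell
# summarize_counts COUNTFILE  (hypothetical helper)
# Prints "assigned<TAB><sum over genes>", then every "__" counter line unchanged.
summarize_counts() {
    awk -F '\t' '
        $1 ~ /^__/ { special = special $0 "\n"; next }   # htseq-count summary counters
        { assigned += $2 }                               # ordinary gene lines
        END { printf "assigned\t%d\n%s", assigned, special }
    ' "$1"
}
```

Usage: `summarize_counts ./count/Blast.count`.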

Write a loop that generates a script covering all samples:
for ele in Blast ICM Morula Oocyte P1_cell P2_cell P4_cell P8_cell PFF TE
do
echo -e "samtools view -S ./hisat2-out/$ele.sam -b > ./BAM/$ele.bam\nsamtools sort -n -o ./BAM/$ele.sort.bam ./BAM/$ele.bam\nsamtools view -h ./BAM/$ele.sort.bam > ./SAM/$ele.sort.sam\nhtseq-count -s no ./SAM/$ele.sort.sam ./ref/Sus_scrofa.Sscrofa11.1.90.chr.gtf > ./count/$ele.count" >> htseq.sh
done
chmod +x ./htseq.sh
nohup ./htseq.sh > htseq.log 2>&1 &

Check the running job
jobs -l
ps -ef |grep htseq

The output is a read count for each gene.
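Each sample ends up with its own .count file; downstream differential-expression tools usually want one gene × sample matrix. A sketch with awk, assuming (as holds when every sample is counted against the same GTF) that all files list the same genes in the same order. `merge_counts` is a hypothetical name; it drops the `__` counters and stops if a gene ID differs between files.

```shell
# merge_counts COUNTFILE...  (hypothetical helper)
# Prints a tab-separated matrix: header "gene_id" plus one column per file (file name
# without directory and .count suffix), then one row per gene.
merge_counts() {
    awk -F '\t' '
        FNR == 1 {
            name = FILENAME; sub(/.*\//, "", name); sub(/\.count$/, "", name)
            header = header "\t" name; nfile++
        }
        $1 ~ /^__/ { next }                            # htseq-count summary counters
        nfile == 1 { ids[FNR] = $1; n = FNR }          # gene order taken from the first file
        nfile > 1 && ids[FNR] != $1 {
            print "gene order differs in " FILENAME > "/dev/stderr"; bad = 1; exit 1
        }
        { row[FNR] = row[FNR] "\t" $2 }
        END {
            if (bad) exit 1
            print "gene_id" header
            for (i = 1; i <= n; i++) print ids[i] row[i]
        }
    ' "$@"
}
```

Usage: `merge_counts ./count/*.count > count_matrix.txt`.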

Install StringTie

wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.Linux_x86_64.tar.gz
Extract: tar zxvf stringtie-1.3.3b.Linux_x86_64.tar.gz
Add to PATH
echo 'PATH=$PATH:~/biosoft/stringtie-1.3.3b.Linux_x86_64' >> ~/.bashrc
source ~/.bashrc

Transcript processing with StringTie

1. Assemble transcripts with StringTie (first convert each SAM file to BAM and sort it by coordinate, then assemble the transcripts of each sample)

for ele in Blast ICM Morula Oocyte P1_cell P2_cell P4_cell P8_cell PFF TE
do
echo -e "samtools view -S $ele.sam -b > $ele.bam\nsamtools sort -@ 8 -o $ele.sorted.bam $ele.bam\nstringtie -p 8 -G Sus.gtf -o $ele.gtf $ele.sorted.bam" >> out.sh
done

2. Merge transcripts with StringTie (merge the transcripts of all samples)
stringtie --merge -p 8 -G Sus.gtf -o stringtie_merged.gtf mergelist.txt #mergelist.txt is created by hand, e.g. with the loop below

for ele in Blast ICM Morula Oocyte P1_cell P2_cell P4_cell P8_cell PFF TE
do
echo "./$ele.gtf" >> mergelist.txt
done
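mergelist.txt is just one GTF path per line, and because the loop appends with >>, running it twice lists every sample twice. A variant that rewrites the file and warns about GTFs that StringTie has not produced yet; `write_mergelist` is a hypothetical name and assumes the per-sample GTFs sit in the current directory, as in step 1.

```shell
# write_mergelist OUTFILE SAMPLE...  (hypothetical helper)
# Overwrites OUTFILE with one "./SAMPLE.gtf" per line; warns on stderr about
# any listed GTF that does not exist yet and returns 1 in that case.
write_mergelist() {
    out=$1; shift
    status=0
    : > "$out"
    for s in "$@"; do
        printf './%s.gtf\n' "$s" >> "$out"
        if [ ! -f "./$s.gtf" ]; then
            echo "warning: ./$s.gtf not found" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `write_mergelist mergelist.txt Blast ICM Morula Oocyte P1_cell P2_cell P4_cell P8_cell PFF TE`.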

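3. Estimate expression with StringTie (compute abundances and produce the input files for the Ballgown package)
for ele in Blast ICM Morula Oocyte P1_cell P2_cell P4_cell P8_cell PFF TE
do
echo "stringtie -p 8 -G stringtie_merged.gtf -e -B -o ballgown/$ele/$ele.gtf $ele.sorted.bam" >> out2.sh
done

With -B, StringTie writes the following Ballgown table files into the same directory as the -o output GTF (here ballgown/<sample>/):
e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab
e stands for exon, i for intron and t for transcript: e2t links exons to transcripts, i2t links introns to transcripts, e_data and i_data hold exon- and intron-level data, and t_data holds transcript-level data.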
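Ballgown loads these five tables from every sample subdirectory, so it is worth confirming they are all there before starting R. A pre-check over ballgown/<sample>/, assuming the layout produced by out2.sh; `check_ballgown_dirs` is a hypothetical name.

```shell
# check_ballgown_dirs DATADIR  (hypothetical helper)
# For each subdirectory of DATADIR, reports on stderr any of the five .ctab tables
# that is missing; returns 1 if anything is missing.
check_ballgown_dirs() {
    status=0
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue                 # no subdirectories: glob stays literal
        for t in e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab; do
            if [ ! -f "$dir$t" ]; then
                echo "missing: $dir$t" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

Usage: `check_ballgown_dirs ./ballgown && Rscript expr.R`.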

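Expression analysis with Ballgown

1. Install Ballgown
source("http://bioconductor.org/biocLite.R")
biocLite("ballgown")
# on Bioconductor 3.8 and later, biocLite is replaced by: BiocManager::install("ballgown")
2. Prepare the files
Write the sample grouping information into a CSV file, here pheno_data.csv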
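The first column of this table should hold the sample IDs, matching the sample directory names under ./ballgown and in the same order as those directories are read; the remaining columns carry the grouping. A sketch that writes such a CSV from sample:group pairs. The column names `ids` and `group`, the C-locale sort (on the assumption that it matches the directory order), and the function name are all assumptions, and the group labels in the usage line are placeholders for the real design.

```shell
# write_pheno OUTFILE SAMPLE:GROUP...  (hypothetical helper)
# Writes a CSV with header "ids,group" and one row per SAMPLE:GROUP argument,
# sorted by sample ID in C-locale order.
write_pheno() {
    out=$1; shift
    {
        echo "ids,group"
        for pair in "$@"; do
            printf '%s,%s\n' "${pair%%:*}" "${pair#*:}"
        done | LC_ALL=C sort -t, -k1,1
    } > "$out"
}
```

Usage (placeholder groups): `write_pheno pheno_data.csv Blast:groupA ICM:groupA TE:groupB`.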

3. Run the R script expr.R (listed below) for the analysis
Rscript expr.R

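library(ballgown)
library(genefilter)
a <- read.csv("pheno_data.csv")
# samplePattern is a regular expression matched against the sample subdirectory names
# under ./ballgown (Blast, ICM, ... with the loop above); adjust "Sample" to match them
bg <- ballgown(dataDir = "ballgown", samplePattern = "Sample", pData = a)
bg_filt <- subset(bg, "rowVars(texpr(bg)) > 0.1", genomesubset = TRUE)
dir.create("./FPKM", showWarnings = FALSE)  # write.csv fails if the folder is missing
gene_expression <- gexpr(bg_filt)
write.csv(gene_expression, "./FPKM/gene_expression.csv")
transcripts_expression <- texpr(bg_filt)
write.csv(transcripts_expression, "./FPKM/transcripts_expression.csv")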