Learning small RNA (Part 8): Degradome Analysis

2019-04-16 TOP生物信息

1. What is the degradome (degradome sequencing), and how does the analysis work?

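In plants, a miRNA usually pairs tightly with its target sequence and cleaves the mRNA directly between positions 10 and 11 of the binding site, thereby regulating downstream mRNA expression. Cleavage splits the mRNA into two fragments, one of which carries the 3' poly(A) tail and has no 5' cap. Degradome sequencing is the targeted capture and sequencing of exactly this kind of fragment.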

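So what does the analysis look like?
First, align the degradome reads to the transcripts and focus on the positions where the alignment depth is clearly elevated. For each such position, take roughly 15 bp upstream and downstream of its 5' end and collect these windows as set A. Then compare the small RNA sequences against set A to see whether they are reverse complementary. (Equivalently: align the small RNAs to the transcripts, call the aligned sites set B, and check whether sets A and B overlap.) If they are, record the correspondence between the small RNA and the transcript.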
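The matching step can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions: it demands perfect complementarity, whereas CleaveLand4 relies on GSTAr/RNAplex alignments that tolerate mismatches. The function names `revcomp` and `predicted_slice_site` and the toy sequences are hypothetical and not part of CleaveLand4.

```shell
# Reverse-complement a small RNA and write it in the DNA alphabet,
# so it can be compared directly with a transcript sequence.
revcomp() {
    printf '%s\n' "$1" |
        tr 'acgtuU' 'ACGTTT' |
        tr 'ACGT' 'TGCA' |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

# Print the 1-based transcript position that pairs with small RNA
# nucleotide 10 (the expected 5' end of the 3' cleavage fragment).
# Returns 1 when the transcript has no perfectly complementary site.
# Assumes an upper-case DNA transcript and a small RNA of >= 10 nt.
predicted_slice_site() {
    local mirna=$1 transcript=$2 site prefix
    site=$(revcomp "$mirna")
    case $transcript in
        *"$site"*) ;;
        *) return 1 ;;
    esac
    prefix=${transcript%%"$site"*}   # everything upstream of the first site
    echo $(( ${#prefix} + ${#mirna} - 9 ))
}

# Toy example: a 20-nt small RNA and a transcript carrying its target site
predicted_slice_site UGACAGAAGAGAGUGAGCAC AAAAAGTGCTCACTCTCTTCTGTCACCCCC   # prints 16
```

If the printed position coincides with a position where degradome read 5' ends pile up, the small RNA and the transcript form a candidate pair.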

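It follows from this principle that a degradome analysis mainly involves three data files:

Degradome sequencing data
Transcript sequences
Small RNA sequences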

2. Installation

Perl modules and programs that need to be in place beforehand

Getopt::Std
Math::CDF

bowtie (version 0.12.x or 1.x)
bowtie-build
RNAplex (from Vienna RNA package)
GSTAr.pl (Version 1.0 or higher -- distributed with CleaveLand4)
R
samtools
# Make sure all of these are on your PATH

For installation details see https://github.com/MikeAxtell/CleaveLand4.git; they are not repeated here.
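Before the first run it can help to confirm that the prerequisites listed above are actually visible. The sketch below is one possible check, not something shipped with CleaveLand4; the helper names `missing_tools` and `missing_perl_modules` and the `PERL` variable are hypothetical.

```shell
# Print the name of every program in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Print the name of every Perl module in the argument list that fails to load.
# Set PERL to the absolute path of the Perl you intend to run CleaveLand4 with.
missing_perl_modules() {
    local mod
    for mod in "$@"; do
        "${PERL:-perl}" -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Anything printed here still needs to be installed or added to PATH
missing_tools bowtie bowtie-build RNAplex GSTAr.pl R samtools
missing_perl_modules Getopt::Std Math::CDF
```

No output means every program and module was found.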

3. An example

# If several Perl installations are available, call the one that was used to install the prerequisite modules, by its absolute path.
perl CleaveLand4.pl \
-e degradome.clean.fa.gz \
-u osa-miR211a.fasta \
-n xxx.fasta \
-p 0.05 -c 2 \
-t -o ./test_plot > out.txt
#-e: Path to FASTA-formatted degradome reads
#-u: Path to FASTA-formatted small RNA queries
#-n: Path to FASTA-formatted transcriptome
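#-p: p-value threshold, used to filter out unreliable small RNA/target pairs
#-c: maximum DegradomeCategory to report; there are 5 categories, 0-4, and a smaller number means a more reliable pair
#-t: write tab-delimited output
#-o: name of the directory that will hold the T-plots; it must not be a nested path such as ./test/plot1, i.e. no subdirectories

A look at the results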

$ ls
out2.txt  degradome.clean.fa.gz_dd.txt  test2_plot

$ head -n 30 degradome.clean.fa.gz_dd.txt # result of aligning the degradome reads to the transcriptome -hsy
# CleaveLand4 degradome density
# Wed Mar  6 10:19:26 CST 2019
# Degradome Reads:./degradome.clean.fa.gz
# Transcriptome:./xxx.fasta
# TranscriptomeCharacters:12257028
# Mean Degradome Read Size:20 # mean length of the degradome reads -hsy
# Estimated effective Transcriptome Size:11901028
# Category 0:8011 # these categories are explained below -hsy
# Category 1:1616
# Category 2:9783
# Category 3:10382
# Category 4:74247
@ID:chr1A:4210110-4210858
@LN:748
77  1   4 # position  read count  category
78  1   4
621 1   4
626 1   4
634 2   1
649 2   1
698 1   4
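The density file is easy to query with awk. The sketch below lists the positions at or below a chosen category; it assumes the layout shown in the listing above (`#` header lines, an `@ID:` line per transcript, then whitespace-separated position, read count and category), and the function name `dd_peaks` is hypothetical.

```shell
# List candidate cleavage sites from a degradome density file.
# $1 = density file, $2 = maximum category to keep (default 2)
# Output columns (tab-separated): transcript ID, position, read count, category
dd_peaks() {
    awk -v maxcat="${2:-2}" '
        /^#/    { next }                       # header comments
        /^@ID:/ { id = substr($0, 5); next }   # start of a transcript block
        /^@/    { next }                       # other @ lines such as @LN:
        NF >= 3 && $3 + 0 <= maxcat + 0 { print id "\t" $1 "\t" $2 "\t" $3 }
    ' "$1"
}

# Usage:
#   dd_peaks degradome.clean.fa.gz_dd.txt 2
```

For the transcript shown above, only positions 634 and 649 (category 1) would pass a cutoff of 2.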

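# out2.txt holds the correspondence between small RNA sequences and transcript sequences (positions), plus several metrics
# the test2_plot directory holds the plots, one per record in out2.txt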

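4. About the software

CleaveLand defines several modes and categories; let's go through them.
Modes:

1. Align degradome data, align small RNA queries, and analyze. # the mode used in my example above: the three raw input files are supplied directly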
  REQUIRED OPTIONS: -e, -u, -n
  DISALLOWED OPTIONS: -d, -g
2. Use existing degradome density file, align small RNA queries, and analyze. # -d: e.g. the degradome.clean.fa.gz_dd.txt file produced by my example above
  REQUIRED OPTIONS: -d, -u, -n
  DISALLOWED OPTIONS: -e, -g
3. Align degradome data, use existing small RNA query alignments, and analyze. # -g: the existing alignments of small RNAs to the transcriptome
  REQUIRED OPTIONS: -e, -n, -g
  DISALLOWED OPTIONS: -d, -u
  IRRELEVANT OPTIONS: -a, -r
4. Use existing degradome density file and existing small RNA query alignments, and analyze.
  REQUIRED OPTIONS: -d, -g
  DISALLOWED OPTIONS: -e, -u
  IRRELEVANT OPTIONS: -a, -r
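The four option combinations can be summarised as a small lookup. The helper below is a hypothetical convenience, not part of CleaveLand4: it only restates the REQUIRED and DISALLOWED lists above and prints the mode a given set of options corresponds to.

```shell
# Print the mode (1-4) implied by the given option letters, or return 1
# if the combination matches none of the four modes listed above.
cleaveland_mode() {
    local e=0 d=0 u=0 g=0 n=0 opt
    for opt in "$@"; do
        case $opt in
            -e) e=1 ;;
            -d) d=1 ;;
            -u) u=1 ;;
            -g) g=1 ;;
            -n) n=1 ;;
        esac
    done
    # Flags are concatenated in the order e d u g n
    case "$e$d$u$g$n" in
        10101) echo 1 ;;    # -e -u -n
        01101) echo 2 ;;    # -d -u -n
        10011) echo 3 ;;    # -e -n -g
        0101?) echo 4 ;;    # -d -g (-n is not listed as required)
        *) return 1 ;;
    esac
}

cleaveland_mode -e -u -n    # prints 1
cleaveland_mode -d -g       # prints 4
```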

Categories:

Category 4: Just one read at that position
Category 3: >1 read, but below or equal to the average* depth of coverage on the transcript
Category 2: >1 read, above the average* depth, but not the maximum on the transcript
Category 1: >1 read, equal to the maximum on the transcript, when there is >1 position at maximum value
Category 0: >1 read, equal to the maximum on the transcript, when there is just 1 position at the maximum value
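To make the rules concrete, here is a minimal awk sketch that assigns a category to each position of a single transcript. Two points are assumptions rather than documented behaviour: the average is taken over the positions that have at least one read (the footnote behind "average*" is not reproduced here), and the rules are tested in the order 4, 3, 2, 1, 0. The function name `assign_categories` is hypothetical.

```shell
# Input (stdin): "position read_count" per line for ONE transcript,
#                listing only positions with at least one read.
# Output:        "position read_count category"
assign_categories() {
    awk '
        {
            pos[NR] = $1; cnt[NR] = $2 + 0; sum += $2
            if (cnt[NR] > max) { max = cnt[NR]; nmax = 1 }
            else if (cnt[NR] == max) nmax++
        }
        END {
            if (NR == 0) exit
            avg = sum / NR      # ASSUMPTION: mean over the listed positions
            for (i = 1; i <= NR; i++) {
                c = cnt[i]
                if (c == 1)        category = 4
                else if (c <= avg) category = 3
                else if (c < max)  category = 2
                else if (nmax > 1) category = 1
                else               category = 0
                print pos[i], c, category
            }
        }'
}

# Positions and read counts of the transcript from the density file listing
printf '%s\n' '77 1' '78 1' '621 1' '626 1' '634 2' '649 2' '698 1' | assign_categories
```

Under these assumptions the sketch reproduces the categories in the density file shown earlier: positions 634 and 649 share the maximum of 2 reads and fall into category 1, and every other position is category 4.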

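5. Another example

What if there are several small RNAs to query?

Split the file containing the small RNAs into one FASTA file per small RNA and run mode 1 on each in turn, as in my example above.
That is still wasteful, though, because aligning the degradome reads to the transcriptome gives the same result every time. It is enough to run mode 1 once and run the remaining n-1 queries in mode 2.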
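The splitting step itself can be done with awk. The sketch below writes one file per record, named num1_miRNA.fasta, num2_miRNA.fasta and so on to match the naming used in the loop further down; the function name `split_fasta` and the input file name in the usage line are hypothetical.

```shell
# Split a multi-sequence FASTA into one file per record.
# $1 = input FASTA, $2 = output directory
# Prints the number of records written.
split_fasta() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)             # finish the previous record
            n++
            out = dir "/num" n "_miRNA.fasta"
        }
        out != "" { print > out }                 # header and sequence lines
        END { print n + 0 }
    ' "$1"
}

# Usage:
#   split_fasta all_miRNAs.fasta ./queries
```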

For example, with 10 small RNAs:

        # path1..path6 and sample are assumed to be set earlier in the script;
        # each path should be absolute and end with "/"
        for i in {1..10}
        do
                mkdir -p "${path6}${sample}/${i}"
                cd "${path6}${sample}/${i}" || exit 1
                if [ "${i}" -eq 1 ]; then
                        # mode 1: align the degradome reads, which also produces the _dd.txt density file
                        "${path1}perl" "${path2}CleaveLand4.pl" -e "${path3}degradome.clean.fa.gz" \
                        -u "${path4}num${i}_miRNA.fasta" -n "${path5}${sample}.xxx.fasta" \
                        -p 0.05 -c 2 -t -o "./plot${i}" > "out${i}.txt"
                else
                        # mode 2: reuse the density file from the first run
                        "${path1}perl" "${path2}CleaveLand4.pl" -d "${path3}degradome.clean.fa.gz_dd.txt" \
                        -u "${path4}num${i}_miRNA.fasta" -n "${path5}${sample}.xxx.fasta" \
                        -p 0.05 -c 2 -t -o "./plot${i}" > "out${i}.txt"
                fi
        done