miRNA Series | Data Filtering, Alignment & Target Gene Prediction

2021-07-18  新_世_界




For details, see [miRNA 数据过滤我使用cutadapt](miRNA 数据过滤我使用cutadapt), which I have reorganized here. Thanks to the original blogger for sharing.

I. miRNA data filtering (plants: 18~30 nt)

cutadapt  -a  AGATCGGAAGAGCACACGTCT  -m  15  -q  20  --discard-untrimmed  -o  outname.fa  input.fastq
# input.fastq: placeholder name for the raw reads file
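
The heading gives 18~30 nt as the size range for plant miRNAs, while the command above only enforces a minimum length of 15 nt. As a minimal sketch, a second pass over the trimmed FASTA could keep only reads in that range. The function names `read_fasta`, `filter_by_length` and `filter_fasta_file` are hypothetical helpers written for this illustration, not part of cutadapt or miR-PREFeR.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=18, max_len=30):
    """Keep only records whose sequence length lies within [min_len, max_len]."""
    for header, seq in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq


def filter_fasta_file(in_path, out_path, min_len=18, max_len=30):
    """Write the length-filtered records of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in filter_by_length(read_fasta(src), min_len, max_len):
            dst.write(">%s\n%s\n" % (header, seq))
```

For example, `filter_fasta_file("outname.fa", "outname.18-30.fa")` would write only the 18~30 nt reads to a new file; both file names are placeholders.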


II. miRNA alignment

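To reduce alignment time, identical reads within each sample can be collapsed before alignment and written to a FASTA file. Sequence names follow the pattern sample_rN_xM, where the number after r is the read's serial number and the number after x is how many times that read occurs.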
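
The collapsing step can be sketched in a few lines of Python. This is only an illustration of the naming rule described above: the function names `collapse_reads` and `to_fasta` are hypothetical, and starting the serial number at 1 and ordering reads by descending count are assumptions, not something the post specifies.

```python
from collections import Counter


def collapse_reads(sequences, sample):
    """Collapse identical reads into (name, sequence) pairs.

    Names follow sample_rN_xM: N is a running serial number and M is the
    number of times the read occurs. Serial numbers start at 1 and reads
    are ordered by descending count (both are assumptions).
    """
    counts = Counter(seq.strip().upper() for seq in sequences if seq.strip())
    # Sort by count (high to low), then by sequence, so output is reproducible.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    collapsed = []
    for serial, (seq, n) in enumerate(ordered, start=1):
        collapsed.append(("%s_r%d_x%d" % (sample, serial, n), seq))
    return collapsed


def to_fasta(collapsed):
    """Format collapsed reads as FASTA text."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in collapsed)
```

Given the three reads `TGACAGAAGAGAGTGAGCAC`, `TGACAGAAGAGAGTGAGCAC` and `TCGGACCAGGCTTCATTCCCC` for a sample called `sampleA`, `collapse_reads` yields two records named `sampleA_r1_x2` and `sampleA_r2_x1`.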

Using the miR-PREFeR software

Introduction: miR-PREFeR (microRNA PREdiction From small RNAseq data). This post mainly follows the tutorial on GitHub.


1. Required programs (prerequisites to install)

a. Install ViennaRNA beforehand, preferably version 1.8.5, 2.1.2, or 2.1.5 and later.

tar  zvxf  ViennaRNA-2.4.18.tar.gz
cd  ViennaRNA-2.4.18
./configure --prefix="/user/tools/ViennaRNA/" --without-perl
make
make  install

b. Install samtools (version 0.1.15 or later)

cd   /manager/biosoft/
tar  jxf  samtools-0.1.19.tar.bz2
cd  samtools-0.1.19
make       #  build samtools, then make sure this directory is in the PATH


c. Python: the current version has only been tested under Python 2.6.7, 2.7.2 and 2.7.3, and should work under any Python 2.6.x or 2.7.x.

2. Obtain and install the pipeline (download and install miR-PREFeR)

git clone


3. Test the pipeline (for checking the installation; can be skipped)



1. Test the pipeline.

# The package provides a small example dataset for testing the pipeline. The
# dataset is for Arabidopsis, chromosome 1. To run the example, first change
# directory to the example folder:

cd  example
tar  xvf  exampledata.tar.gz       #  then decompress the exampledata.tar.gz file

# Then open the config.example file and change PIPELINE_PATH to the path where
# you put the miR-PREFeR package folder. For example, if you put miR-PREFeR at
# /home/username/tools/miR-PREFeR-v0.09, then set PIPELINE_PATH to that path.

# Save the config.example file. In the example folder, execute command:
python  ../  -L  -k  pipeline  config.example

# The -L option generates a log file in the output directory example-result. The
# -k option keeps the temp directory used to store the intermediate files. The
# temp directory is in the example-result directory.

# If you have python, samtools, RNALfold installed and in the PATH, you should be
# able to run the test program. It takes about one or two minutes to
# finish. You'll be able to see the result in the example-result folder.

2. Test how to do checkpointing.

# Before testing this, if you have already run the pipeline with the config.example
# file in this folder, please remove the example-result folder first.

# Then change the 'CHECKPOINT_SIZE' option to a smaller value (30, for
# example). The reason to do this is that by default the pipeline makes a
# checkpoint after finishing folding every 3000 sequences, but the sample data is
# so small that the total number of sequences is smaller than the default.

# Then run the pipeline with 'pipeline' command:
python  ../  -L  -k  pipeline  config.example

# Let it run long enough to make at least one checkpoint (about 10 seconds, for
# example; a "Done" is shown each time a checkpoint is made), then kill the
# process with Ctrl-C. To check where the pipeline was stopped, run:
python ../ -L check config.example

# This will show the checkpoint information.

# To restart the pipeline from where it was stopped, run:
python  ../  -L  recover  config.example

# The pipeline will continue and finish the job specified in the config.example file.
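
To make the checkpoint/recover idea concrete, here is a toy sketch of the general technique. It is not miR-PREFeR's actual implementation: the function name `run_with_checkpoints`, the JSON state file and its `done` field are all assumptions made for illustration. It also shows why a small `CHECKPOINT_SIZE` is needed for a small dataset: with fewer items than the checkpoint size, no intermediate checkpoint is ever written.

```python
import json
import os


def run_with_checkpoints(items, work, state_path, checkpoint_size=3000):
    """Apply work() to each item, saving progress every checkpoint_size items.

    If state_path already exists, processing resumes after the last saved
    item. Returns the total number of items processed so far.
    """
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = json.load(fh)["done"]
    for index, item in enumerate(items):
        if index < done:
            continue  # already finished before the interruption
        work(item)
        done = index + 1
        if done % checkpoint_size == 0:
            with open(state_path, "w") as fh:
                json.dump({"done": done}, fh)
    # Final save so that a completed run is recorded as well.
    with open(state_path, "w") as fh:
        json.dump({"done": done}, fh)
    return done
```

If the run is interrupted, calling the function again with the same state file redoes only the items processed after the last checkpoint, which mirrors what the recover command is described as doing above.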

4. How to run the pipeline (now for the real work)

a. Prepare input data for the pipeline.
  1. A fasta file, which contains the genome sequences of the species under study.
  2. One or more SAM files, which contain the alignments of the small RNAseq data to the genome.
  3. (Optional) A GFF file, which lists regions in the genome sequences that should be excluded from the miRNA analysis.
a). Genome fasta file (details on the fasta file listed above)

The genome file must be in standard FASTA format. In miR-PREFeR, for the string following ">", only the first word, delimited by any whitespace character (space, tab, etc.), is used. For example, for the following sequence, 'ath-MIR773a' is used as the identifier of the sequence. Thus, please ensure that all the sequences in the FASTA files have different identifiers.

>ath-MIR773a MI0005103
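
Because only the first word of each header is used, two headers that differ only after the first whitespace would collide. A small check along the following lines could catch that before running the pipeline. This is a sketch: `fasta_identifiers` and `duplicate_identifiers` are hypothetical helper names, not functions provided by miR-PREFeR.

```python
from collections import Counter


def fasta_identifiers(lines):
    """Yield each header's identifier: the first whitespace-delimited word after '>'."""
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            yield words[0] if words else ""


def duplicate_identifiers(lines):
    """Return a sorted list of identifiers that occur more than once."""
    counts = Counter(fasta_identifiers(lines))
    return sorted(name for name, n in counts.items() if n > 1)
```

Passing an open genome file to `duplicate_identifiers` returns an empty list when every identifier is unique.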
b). SAM alignment files (details on the SAM files listed above)

The miR-PREFeR pipeline takes SAM format alignment files. SAM alignment files can be generated by many aligners. Here we use Bowtie as an example.




