Protocols for RNA-seq data analysis

2020-08-20  烦啦烦啦
1. Rice is used as the example throughout.
2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members).

Before you start:

  1. Create a root directory to store all project data
  2. Create a reference subdirectory, then download the reference genome and its annotations
  3. Build a genome index with the aligner of your choice (HISAT2 is used below)
  4. Create further subdirectories for the different kinds of data, such as raw data, count matrices and scripts
code:
$ mkdir Drought_stress                                   # project root
$ mkdir Drought_stress/Rice && cd Drought_stress/Rice
$ mkdir data matrix homology olddata reference src_rice  # data, matrices, scripts, etc.
$ mkdir reference/IRGSP && cd reference/IRGSP
# Ensembl Plants release 47: IRGSP-1.0 genome (FASTA) and gene annotation (GTF and GFF3)
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gtf.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gff3.gz
$ gunzip *.gz
$ module load Anaconda3 hisat2                           # cluster-specific environment modules
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP  # 8 threads, index prefix hsindex/IRGSP
$ module unload Anaconda3 hisat2
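Before you move on, it is worth confirming that hisat2-build actually finished, because a killed job can leave a partial index behind. The sketch below assumes the standard HISAT2 layout of eight index files, `<prefix>.1.ht2` through `<prefix>.8.ht2`, or `.ht2l` for very large genomes. The function name `check_hisat2_index` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all eight HISAT2 index files for PREFIX
# exist and are non-empty (small-index .ht2 or large-index .ht2l naming).
check_hisat2_index() {
    local prefix=$1 ext i ok
    for ext in ht2 ht2l; do
        ok=1
        for i in 1 2 3 4 5 6 7 8; do
            # -s: the file exists and has a size greater than zero
            [ -s "$prefix.$i.$ext" ] || { ok=0; break; }
        done
        if [ "$ok" -eq 1 ]; then
            echo "found $ext index: $prefix"
            return 0
        fi
    done
    echo "missing or incomplete HISAT2 index: $prefix" >&2
    return 1
}

# Usage, from reference/IRGSP after hisat2-build:
#   check_hisat2_index hsindex/IRGSP
```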



Workflow:

Steps 1-3 run on the server, steps 4-7 on a personal computer, and steps 8-9 on the server again.
  1. Find BioProjects that match the conditions of interest (drought stress, roots, etc.)
  2. In the data subdirectory, create samplelist.txt containing the SRA accession numbers to download
  3. Run the workflow script in the background: nohup sh RNAseq_workflow.sh &
code:
$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt  # then enter the SRA accession numbers to download
$ cd ../src_rice
$ nohup sh RNAseq_workflow.sh &  # this script is in the attachment
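The attached RNAseq_workflow.sh is not reproduced here. To show what a per-sample loop over samplelist.txt could look like, here is a hypothetical skeleton. It is not the author's script. It assumes the following:

- reads are paired-end;
- the HISAT2 index and GTF built in the setup step are available;
- fasterq-dump (SRA Toolkit), hisat2, samtools and featureCounts (Subread 2.0.2 or later) are on PATH.

Choosing featureCounts as the counting tool is also an assumption. The names `process_samples`, `run`, `DRY_RUN`, `REF` and `THREADS` are illustrative. With `DRY_RUN=1` the skeleton only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical per-sample loop; NOT the attached RNAseq_workflow.sh.
# Assumes paired-end reads and that fasterq-dump, hisat2, samtools and
# featureCounts (Subread >= 2.0.2) are installed.
set -eo pipefail

REF=${REF:-../reference/IRGSP}   # reference directory, relative to src_rice
THREADS=${THREADS:-8}

# Echo a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        eval "$*"
    fi
}

process_samples() {
    local list=$1 outdir=$2 acc
    mkdir -p "$outdir"
    # Strip Windows line endings; keep only lines that look like run accessions (SRR/ERR/DRR).
    for acc in $(tr -d '\r' < "$list" | grep -E '^[SED]RR[0-9]+$'); do
        # Download and split paired reads into ${acc}_1.fastq and ${acc}_2.fastq.
        run "fasterq-dump --split-3 -e $THREADS -O $outdir $acc"
        # Align to the IRGSP index and write a coordinate-sorted BAM.
        run "hisat2 -p $THREADS -x $REF/hsindex/IRGSP -1 $outdir/${acc}_1.fastq -2 $outdir/${acc}_2.fastq | samtools sort -@ $THREADS -o $outdir/$acc.bam -"
    done
    # Count read pairs per gene across all BAMs into a single table.
    run "featureCounts -T $THREADS -p --countReadPairs -a $REF/Oryza_sativa.IRGSP-1.0.47.gtf -o $outdir/counts.txt $outdir/*.bam"
}

# Usage, from src_rice (preview first, then run for real in the background):
#   DRY_RUN=1 process_samples ../data/samplelist.txt ../data
```

The dry-run mode is there so the generated commands can be checked against the real file layout before a long background job is started.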

  4. Transfer the count files to your local computer for downstream analysis (the server's R version is too new to support the R package "biomaRt")
    (Use the scp command or FileZilla to move files between the server and your computer.)
  5. Create an R project in RStudio locally, and use DESeq2 for differential expression analysis and biomaRt for annotation
  6. Run the following R scripts in order: downstream.R, then Deseq2analysis.R, then merge_desingn.R (the whole project is in the attachment Rice4.zip)
  7. Copy the differential gene table and the gene count table back to the server and put them in '~/Drought_stress/Rice/homology'
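Steps 4 and 7 move files between the server and the local computer. Bundling the files and checking a checksum after the copy helps catch a truncated transfer. The helper below is a minimal sketch, and `bundle_with_md5` is a hypothetical name. It packs the given files into a gzipped tarball and writes the MD5 checksum next to it. It assumes GNU `md5sum` on both machines. On macOS, `md5 -r` or `shasum` would be needed instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack files into a gzipped tarball and store its MD5
# checksum alongside, so the copy can be verified later with `md5sum -c`.
bundle_with_md5() {
    local archive=$1; shift
    tar -czf "$archive" "$@"
    # Record the bare file name so `md5sum -c` works from the archive's directory.
    ( cd "$(dirname "$archive")" && md5sum "$(basename "$archive")" ) > "$archive.md5"
}

# Usage (user@server and the remote path are placeholders):
#   on the server:  bundle_with_md5 counts.tar.gz *.txt
#   locally:        scp 'user@server:/path/to/counts.tar.gz*' .
#                   md5sum -c counts.tar.gz.md5
```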

  8. Go to the src_rice subdirectory
  9. Run the annotation and merge scripts (anno.sh and merge.sh)
code:
$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh anno.sh &
$ nohup sh merge.sh &   # if merge.sh reads the output of anno.sh, wait for anno.sh to finish first
# Both scripts are in the attachment
# Rice.anno.txt is in the attachment
# head.txt holds the column names of the final output table; it was edited and combined from the column names of the input files used
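The attached anno.sh and merge.sh are not shown, so the join they perform can only be sketched. This is a hypothetical stand-in, not the author's scripts. It assumes all inputs are tab-separated files keyed by gene ID in column 1:

- an annotation file such as Rice.anno.txt;
- a result table with no header line;
- a one-line header file such as head.txt.

It prints the header, then appends each gene's annotation, or `NA` when the gene has none. The function name `annotate_table` is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical annotation merge; NOT the attached anno.sh/merge.sh.
# Assumes tab-separated files with the gene ID in column 1:
#   anno   - gene_id <TAB> annotation columns...  (e.g. Rice.anno.txt)
#   table  - gene_id <TAB> result columns...      (no header line)
#   header - one tab-separated header line         (e.g. head.txt)
annotate_table() {
    local anno=$1 table=$2 header=$3
    cat "$header"
    awk -F'\t' -v OFS='\t' '
        # First file: remember everything after the gene ID.
        FILENAME == ARGV[1] { id = $1; $1 = ""; ann[id] = substr($0, 2); next }
        # Second file: append the annotation, or NA when the gene has none.
        { print $0, (($1 in ann) ? ann[$1] : "NA") }
    ' "$anno" "$table"
}

# Usage: annotate_table Rice.anno.txt diff_genes.txt head.txt > final_table.txt
# (diff_genes.txt and final_table.txt are placeholder names)
```

Missing genes get a single `NA` column. If the annotation file has several columns, the missing-gene rows will be shorter, which may need padding in downstream tools.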

Contact:

If you have any suggestions or comments, please contact the author at xuyp8121@mail.ustc.edu.cn.
We look forward to hearing from anyone who shares our interest in systems biology and comparative biology!