Protocols for RNA-seq data analysis
2020-08-20
1. Using rice as an example
2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members)
Before starting:
- Create a root directory to store all future data
- Create a subdirectory, then download the reference genome and its annotation
- Build a genome index with the alignment software of your choice
- Create other subdirectories for the different kinds of data, such as raw data, matrices and scripts
$ mkdir Drought_stress
$ mkdir Drought_stress/Rice && cd Drought_stress/Rice
$ mkdir data matrix homology olddata reference src_rice
$ mkdir reference/IRGSP && cd reference/IRGSP
$ wget
$ wget
$ wget
$ gunzip *.gz
$ module load Anaconda3 hisat2
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP
$ module unload Anaconda3 hisat2
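Before moving on, it can be worth confirming that the index build actually finished. hisat2-build normally writes eight files named <prefix>.1.ht2 to <prefix>.8.ht2 (the extension is .ht2l for very large genomes). The helper below is only a minimal sketch of such a check; the function name check_hisat2_index is made up for illustration and is not part of the shared scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original scripts): checks that all
# eight HISAT2 index files exist and are non-empty for a given index prefix.
check_hisat2_index() {
    local prefix="$1"
    local i
    for i in 1 2 3 4 5 6 7 8; do
        # Small indexes end in .ht2, large ones in .ht2l
        if [ ! -s "${prefix}.${i}.ht2" ] && [ ! -s "${prefix}.${i}.ht2l" ]; then
            echo "missing index file: ${prefix}.${i}.ht2" >&2
            return 1
        fi
    done
    echo "index OK: ${prefix}"
}
```

Run from reference/IRGSP, the call would be `check_hisat2_index hsindex/IRGSP`, using the same prefix that was passed to hisat2-build above.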
Steps 1-3: run on the server. Steps 4-7: run on a personal computer. Steps 8-9: run on the server.
- Find BioProjects that match drought, root and other conditions of interest
- Make a samplelist.txt under the data subdirectory and save the SRA accession numbers to be downloaded in it
- Command: nohup sh &
$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt # Then enter the SRA accession numbers we want to download
$ cd ../src_rice
$ nohup sh & # This script can be found in the attachment
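The download script itself is only available as an attachment, so what follows is not that script. It is a minimal sketch of how such a script could walk through samplelist.txt, assuming one SRA run accession per line. It only prints the commands (prefetch and fasterq-dump from sra-tools) instead of running them. The function name plan_sample_commands, the paired-end layout implied by --split-files and the fastq output directory are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the attached script: prints (does not run) the
# sra-tools commands for every accession listed in a sample list file.
# Assumes one SRA run accession per line.
plan_sample_commands() {
    local list="$1"
    local acc
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Remove stray whitespace, including a trailing carriage return
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')
        # Skip blank lines and lines starting with '#'
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        echo "prefetch $acc"
        # "fastq" as output directory is an assumed name
        echo "fasterq-dump $acc --split-files -O fastq"
    done < "$list"
}
```

From src_rice, `plan_sample_commands ../data/samplelist.txt` lists what would be run; once the output looks right, it could be piped to `sh` to execute it.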
- Send the count files to the local computer for downstream analysis (the R version on the server is too new to support the R package "biomaRt")
  (We can use the scp command or FileZilla to transfer files between the local computer and the server)
- Build an R project and use DESeq2 and biomaRt for differential expression analysis and annotation in RStudio locally
- Run the following R scripts in sequence: downstream.R > Deseq2analysis.R > merge_desingn.R (Whole project can be found in the attachment named
- Send the differential gene table and the gene count table to the server and put them in the '~/Drought_stress/Rice/homology' directory
- Go to the src_rice subdirectory
- Run the related scripts
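For the two transfer steps in the list above (count files down to the local computer, result tables back up to the server), the scp calls could look roughly like the sketch below. The helpers only print the commands so the paths can be checked first. The function names, the user@server login and the guess that the count files sit under matrix/ are assumptions, not something the original scripts confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, meant to be run on the local computer. They only
# print scp commands; nothing is copied.
# The first argument is the server login, e.g. user@server (a placeholder).

# Server -> local: fetch the count files (assumed to live under matrix/)
print_fetch_counts() {
    echo "scp -r ${1}:~/Drought_stress/Rice/matrix ./matrix"
}

# Local -> server: upload result tables into the homology directory
print_push_tables() {
    local login="$1"
    shift
    echo "scp $* ${login}:~/Drought_stress/Rice/homology/"
}
```

For example, `print_push_tables user@server diff_genes.txt gene_counts.txt` prints a single scp command for both tables; the two file names here are placeholders as well.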
$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh &
$ nohup sh &
# The scripts can be found in the attachment
# Rice.anno.txt can be found in the attachment
# head.txt holds the column names for the final output table; it was edited and combined from the column names of the raw files we used.
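The role of head.txt can be pictured as putting a one-line header on top of a table that has no column names yet. The attached scripts may well do this differently; the sketch below only illustrates the idea. It assumes tab-separated files, and the function name add_header and the file name body.tsv are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first line of a header file followed by a
# headerless, tab-separated table, after checking that the number of
# columns in the header matches the first row of the table.
add_header() {
    local header="$1" body="$2"
    local n_head n_body
    n_head=$(head -n 1 "$header" | awk -F'\t' '{print NF}')
    n_body=$(head -n 1 "$body" | awk -F'\t' '{print NF}')
    if [ "$n_head" != "$n_body" ]; then
        echo "column mismatch: header has $n_head, table has $n_body" >&2
        return 1
    fi
    # awk re-prints the header line, so it always ends with a newline
    awk 'NR==1 {print}' "$header"
    cat "$body"
}
```

A call such as `add_header head.txt body.tsv > final_table.txt` would then write the labelled table; the column check is there because a header assembled by hand from several raw files can easily end up one column short.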