RNA-seq workshop-Day 1

2018-06-12, by 猪猪头看世界


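This week's Data Workshop has started. This time it is built around using R to analyse RNA-seq and scRNA-seq data. Today we mainly reviewed the introduction to R, went back over some of the commands we will need later on, and then walked through an overview of the RNA-seq workflow.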

1. Introduction to R (Dr. Rocio T Martinez-Nunez)

1.1 Objects

Use <- to assign things to objects (vectors, tables, single values, functions).
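As a minimal sketch of assignment (all object names and values below are made up for illustration):

```r
# A single value
n_samples <- 6

# A vector, built with c()
read_lengths <- c(50, 75, 100)

# A table (data frame) with one row per sample
samples <- data.frame(
  name      = c("ctrl_1", "ctrl_2", "treated_1"),
  condition = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

# A function is an object too
square <- function(x) x^2

# Typing an object's name prints it; calling the function applies it
samples
square(read_lengths)
```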

1.2 Commenting your code

Just add # before whatever you want to comment; R ignores everything after the # on that line.

1.3 system(): communicates with the shell on your computer

system("ls -F /")  # list the root directory; -F marks directories with a trailing /

1.4 cmd as a group of commands

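# fastq.files is a character vector of FASTQ file paths (defined elsewhere in the workshop)
cmd <- paste("gunzip -c", fastq.files, "| head")
cmd  # print the commands to inspect them (this does not run them)
system(cmd[1])  # run only the first command in cmd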
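To see why cmd is "a group of commands", here is a small sketch with hypothetical file names standing in for the workshop's fastq.files. paste() is vectorised, so it produces one command string per file:

```r
# Hypothetical file names, standing in for the workshop's fastq.files
fastq.files <- c("sample1.fastq.gz", "sample2.fastq.gz", "sample3.fastq.gz")

# The fixed strings are recycled across the vector: one shell command per file
cmd <- paste("gunzip -c", fastq.files, "| head")

length(cmd)  # 3
cmd[1]       # "gunzip -c sample1.fastq.gz | head"

# To run every command rather than only the first, loop over the vector.
# Left commented out here because the files above do not exist.
# for (one.cmd in cmd) system(one.cmd)
```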

1.5 Some R tips

1.5.1 ask for help

# in R: ? followed by the function name
?system
# in the shell: most tools print their help with -h
system("trim_galore -h")

1.5.2 Tab: autocompletes what you are typing and shows the list of matching names in R.

1.5.3 Arrow keys: the up arrow brings back the last thing you typed in.

1.5.4 Pipes: %>% in R or | in the shell

install.packages("tidyverse")  # install the package (only needed once)
library("tidyverse")  # load the package
download.file("website", "path and name.csv")  # download a file ("website" and the path are placeholders)
surveys <- read_csv("path and name.csv")  # read the file into R
str(surveys)  # inspect the data: an overview of an object's structure and its elements
dim(surveys)  # size: number of rows and number of columns
head(surveys)  # check the top (first six rows) of the data frame
surveys_new <- surveys %>%  # pipe surveys into the next step
  filter(weight < 5) %>%  # keep only rows where weight is below 5
  select(species_id, sex, weight)  # keep only these three columns
str(surveys_new)  # inspect the data: an overview of an object's structure and its elements
dim(surveys_new)  # size: number of rows and number of columns
head(surveys_new)  # check the top (first six rows) of the data frame

%>% only works once tidyverse has been installed and loaded.
Keyboard shortcut for %>% on a PC (in RStudio): Ctrl + Shift + M
%>% reads as "then": the thing we want to pipe goes on the left, and the function we want to pipe it into goes on the right.
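The "then" reading can be checked with a small sketch. The table below is a toy stand-in for surveys with invented values, and it assumes the dplyr package (part of tidyverse) is installed. The piped and nested forms give the same result:

```r
library(dplyr)  # part of tidyverse; provides %>%, filter() and select()

# Toy stand-in for the surveys table (values invented for illustration)
surveys <- data.frame(
  species_id = c("NL", "DM", "PF", "PE"),
  sex        = c("M", "F", "M", "F"),
  weight     = c(4, 40, 3, 22),
  year       = c(1977, 1977, 1978, 1978),
  stringsAsFactors = FALSE
)

# Piped version: read top to bottom as "take surveys, then filter, then select"
piped <- surveys %>%
  filter(weight < 5) %>%
  select(species_id, sex, weight)

# Equivalent nested version: has to be read from the inside out
nested <- select(filter(surveys, weight < 5), species_id, sex, weight)

identical(piped, nested)  # TRUE
```

The nested form gets harder to read with every extra step, which is the main reason to prefer the pipe.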

1.6 Some R functions we will be using:

# create the commands: trim_galore and its flags, pasted onto each file in fastq.files
cmd <- paste("trim_galore --length 21 --output_dir trimgalore", fastq.files)
# run only the first of the commands
system(cmd[1])
# create a vector with the squares of 1, 2 and 3:
sapply(1:3, function(x) x^2)
# [1] 1 4 9

system(): communicates with the shell.
dir.create(): create directories.
list.files(): list the files in your working directory.
paste(): concatenates vectors after converting into character.
data.frame(): generates a data frame.
sapply(): applies a function to an object and returns a simplified object.
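A rough sketch of how these functions fit together, run inside a temporary directory so that nothing real is touched. The directory and file names are hypothetical, and the trim_galore commands are only built, never run:

```r
# Work inside a temporary directory
out.dir <- file.path(tempdir(), "trimgalore_demo")
dir.create(out.dir, showWarnings = FALSE)

# Create two empty placeholder files (hypothetical names)
file.create(file.path(out.dir, c("a.fastq.gz", "b.fastq.gz")))

# list.files() with a pattern returns only the matching file names
fastq.files <- list.files(out.dir, pattern = "\\.fastq\\.gz$")

# Collect each file name next to the command that would be run on it
jobs <- data.frame(
  file = fastq.files,
  cmd  = paste("trim_galore --length 21 --output_dir trimgalore", fastq.files),
  stringsAsFactors = FALSE
)

# sapply() applies nchar() to every file name and simplifies the result to a vector
name.lengths <- sapply(jobs$file, nchar)
name.lengths
```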

1.7 Loops: vectorization & sapply

for (year in c(2010, 2011, 2012, 2013, 2014, 2015)) {
  print(paste("The year is", year))
}
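The same messages can be produced without writing an explicit loop. This is a sketch of the two alternatives named in the heading, not necessarily the exact code used in the workshop:

```r
years <- 2010:2015

# 1. Vectorised: paste() already works element by element, so no loop is needed
msg.vec <- paste("The year is", years)

# 2. sapply(): apply a function to each element and collect the results
msg.sapply <- sapply(years, function(year) paste("The year is", year))

identical(msg.vec, msg.sapply)  # TRUE
msg.vec[1]                      # "The year is 2010"
```

Unlike the for loop, both versions return the results as a vector that can be stored and reused.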

2. Introduction to RNA-seq data analysis (Dr. Alessandra Vigilante)

2.1 What is NGS

Next-generation sequencing (NGS), also known as high-throughput sequencing, is the term used to describe a number of different modern sequencing technologies. Its applications include RNA-seq, scRNA-seq and ChIP-seq, among others.

2.2 Eight stages in RNA-seq Analysis

2.2.1 Define the question of interest (what RNA-seq data can tell us)

Relative expression levels within a biological sample
Gene expression differences between biological samples
Quantify alternative transcript levels
Confirm annotated 5′ and 3′ ends of genes
Map exon/intron boundaries

2.2.2 Get the data (data formats)

Raw data: FASTQ
Aligned data: SAM, BAM, CRAM
Genome annotation: GFF
Intervals: BED
Variants: VCF, BCF
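To make the raw-data format concrete, here is a minimal base-R sketch that writes and reads back a tiny FASTQ file. The reads are invented, and the parsing assumes the usual layout of exactly four lines per record; real files are normally gzipped and much larger, so a dedicated package is the better choice in practice.

```r
# Write a tiny two-read FASTQ file (sequences and qualities are invented)
fq <- tempfile(fileext = ".fastq")
writeLines(c(
  "@read1", "ACGTACGT", "+", "IIIIIIII",
  "@read2", "TTGCA",    "+", "IIIII"
), fq)

fq.lines <- readLines(fq)

# Each record is 4 lines: header (@id), sequence, separator (+), quality string
reads <- data.frame(
  id   = sub("^@", "", fq.lines[seq(1, length(fq.lines), by = 4)]),
  seq  = fq.lines[seq(2, length(fq.lines), by = 4)],
  qual = fq.lines[seq(4, length(fq.lines), by = 4)],
  stringsAsFactors = FALSE
)

# The quality string has one character per base
reads$length <- nchar(reads$seq)
reads
```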

2.2.3 Clean the data (quality control)

FastQC for quality reports; Trimmomatic or cutadapt for trimming
The ShortRead package in R/Bioconductor, using the qa() and report() functions

2.2.4 Map the data

Challenges: high memory requirements; introns; frequent updates to reference genomes, tools and software.
Mapping strategies: de novo assembly, align to transcriptome, align to genome.
Tools: Bowtie 2, TopHat 2, STAR
Pseudo-alignment: Kallisto (presented in the workshop as faster and more accurate)
If you have SAM files you have to convert them to BAM
You can visualise your BAM files in IGV
Use either your BAM file or the transcript abundance file (from Kallisto) to generate a count table
Perform differential expression analysis and downstream analyses
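As a rough illustration of what a count table looks like and why it needs normalising before samples are compared, here is a toy sketch. The counts are invented, and the naive fold change below is only for intuition; it is not the statistical method used in the workshop.

```r
# Toy count table: rows are genes, columns are samples (numbers invented)
counts <- matrix(
  c(100, 120, 300, 280,
     50,  40,  45,  55,
    850, 840, 655, 665),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl_1", "ctrl_2", "treated_1", "treated_2"))
)

# Library size = total number of counts in each sample
lib.size <- colSums(counts)

# Counts per million: scale each column by its library size
cpm <- t(t(counts) / lib.size) * 1e6

# Naive log2 fold change of mean CPM, treated versus control
log2fc <- log2(rowMeans(cpm[, 3:4]) / rowMeans(cpm[, 1:2]))
log2fc
```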

2.2.5 Explore the data

2.2.6 Fit statistical models

2.2.7 Make your analysis reproducible

RNA-seq workflow in the workshop

3. Learning experience

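I was the first to arrive at the workshop today. I came well prepared and stayed focused all day.
Today's sessions covered a lot of different ground, and I ran into many new problems and challenges that I need to digest properly.
Today I met a Chinese PhD student in dentistry from Guy's Campus and we had a great chat. KCL's dentistry is now ranked second in the world. I learned more about how PhD students abroad live and study, and their new techniques and methods are worth learning from.
I also met a bioinformatics expert from the Denmark Hill Campus, who was very helpful and told us about how he learned the field. I hope to keep asking them for advice so that we can help each other.

These notes draw on the KCL Workshop's learning materials and slides. Please do not repost them; if you need to quote them, please credit the source.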