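Getting started with the single-cell literature: filling in background knowledge
They say a course paired with notes makes for stress-free learning.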
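Library construction
Review: Comparative Analysis of Single-Cell RNA Sequencing Methods, 2017 (doi: 10.1016/j.molcel.2017.01.023)
It covers six library construction methods (CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2); a useful follow-up is to find the paper for each method, six papers in total.
Findings: Smart-seq2 detects the most genes per cell, but it is also relatively expensive; when only a small number of cells is profiled, MARS-seq, SCRB-seq, and Smart-seq2 are more efficient.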
Normalization
Paper 1: Assessment of Single Cell RNA-Seq Normalization Methods, 2017 (doi: 10.1534/g3.117.040683)
It evaluates several normalization methods:
fragments per kilobase of transcript per million mapped reads (FPKM) (Mortazavi et al., 2008)
upper quartile (UQ) (Bullard et al., 2010)
trimmed mean of M-values (TMM) (Robinson and Oshlack, 2010)
DESeq (Love et al., 2014)
remove unwanted variation (RUV) (Risso et al., 2014)
gamma regression model (GRM) (Ding et al., 2015)
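To make the first two entries concrete for myself, here is a minimal sketch of FPKM and upper-quartile scaling for a single sample. It is not taken from the paper: the function names, the toy counts, and the gene lengths are all made up, and the UQ variant below (75th percentile of the non-zero counts within one sample) is a simplifying assumption.

```python
import statistics


def fpkm(counts, lengths_bp):
    """FPKM for one sample: counts[i] fragments mapped to gene i of length lengths_bp[i]."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no mapped fragments")
    # fragments / (gene length in kilobases) / (total mapped fragments in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def upper_quartile_scale(counts):
    """Divide every count by the 75th percentile of the sample's non-zero counts."""
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        raise ValueError("sample has no non-zero counts")
    if len(nonzero) == 1:
        uq = nonzero[0]
    else:
        # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3
        uq = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / uq for c in counts]


# Hypothetical toy sample: three genes with made-up counts and lengths
toy_counts = [100, 300, 600]
toy_lengths = [1000, 2000, 3000]
print(fpkm(toy_counts, toy_lengths))
print(upper_quartile_scale(toy_counts))
```

FPKM corrects for both gene length and sequencing depth, while UQ only rescales each sample by a depth-like factor, so the two outputs are not on comparable scales.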
Paper 2: Performance Assessment and Selection of Normalization Procedures for Single-Cell RNA-Seq, 2019 (doi: https://doi.org/10.1016/j.cels.2019.03.010)
It mainly presents the scone method: a flexible framework for assessing performance based on a comprehensive panel of data-driven metrics (http://bioconductor.org/packages/scone/)
There are many other methods as well, for example LSF (Lun Sum Factors), BigNorm, SCnorm, BASiCS, and RLE (size factor relative log expression).
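As a reminder of what an RLE-style size factor is, below is a minimal sketch of the median-of-ratios idea: each cell's counts are compared with a per-gene geometric mean across cells, and the median ratio becomes that cell's size factor. The function name and the toy matrix are hypothetical, and restricting the calculation to genes detected in every cell is an assumption of this sketch rather than a description of any particular package.

```python
import math
import statistics


def rle_size_factors(counts):
    """counts[g][c] is the raw count of gene g in cell c; returns one size factor per cell."""
    n_cells = len(counts[0])
    # Keep only genes detected in every cell so that the geometric mean is defined
    usable = [row for row in counts if all(v > 0 for v in row)]
    if not usable:
        raise ValueError("no gene is detected in every cell")
    # Log of the per-gene geometric mean across cells
    log_geo_means = [sum(math.log(v) for v in row) / n_cells for row in usable]
    factors = []
    for c in range(n_cells):
        log_ratios = [math.log(row[c]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Hypothetical toy matrix: cell 2 was sequenced twice as deeply as cell 1
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # dropped from the calculation because of the zero
]
print(rle_size_factors(toy))
```

Dividing each cell's counts by its size factor puts the cells on a common scale. With very sparse single-cell data, few genes may be non-zero in every cell, which is one reason so many alternative methods exist.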
Dimensionality reduction
PDF: https://lib.ugent.be/fulltxt/RUG01/002/349/740/RUG01-002349740_2017_0001_AC.pdf
Worth a careful read; it covers a lot about the principles and applications of dimensionality reduction.
Section 1.5.1 (Clustering high-dimension to identify subtypes) states:
Importantly, the reduced dimensionality data are less noisy than the high-dimensional data but lose some of the biological variance.
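The trade-off in that sentence can be seen in a small PCA sketch: projecting onto the first principal component keeps most, but not all, of the variance. This is only an illustration under my own assumptions: it finds the first component by power iteration on the covariance matrix using nothing but the standard library, and the function name and toy data are made up.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, scores, explained_fraction) for the first principal component.

    rows: list of samples (e.g. cells), each a list of feature values (e.g. genes).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the features (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))
    if total_variance == 0:
        raise ValueError("all samples are identical")
    # Power iteration; the asymmetric start makes it unlikely (not impossible)
    # that the starting vector is orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("unlucky starting vector; try another one")
        v = [x / norm for x in w]
    # Variance captured along v, as a fraction of the total variance
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores, eigenvalue / total_variance


# Hypothetical toy data: two strongly correlated features measured on four samples
cells = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
direction, scores, explained = first_principal_component(cells)
print(direction, explained)
```

The explained fraction is below 1 here, which is the "lose some of the variance" part; the part that is dropped is the small scatter around the main axis.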
Article 1: PCA, MDS, k-means, Hierarchical clustering and heatmap.
Article 2: Outlier Preservation by Dimensionality Reduction Techniques
"MDS best choice for preserving outliers, PCA for variance, & T-SNE for clusters"
Identifying cell populations
Each term below corresponds to a paper.
Dimensionality reduction: PCA, tSNE, DM (diffusion maps)
Feature selection: M3Drop (Michaelis-Menten Modelling of Dropouts), HVG (highly variable genes), spike-in based methods, correlated expression
Seurat: an R package designed for the analysis and visualization of single cell RNA-seq data. It contains easy-to-use implementations of commonly used analytical techniques, including the identification of highly variable genes, dimensionality reduction (PCA, ICA, t-SNE), standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means), and the discovery of differentially expressed genes and markers.
SC3: SC3 achieves high accuracy and robustness by consistently integrating different clustering solutions through a consensus approach. Tests on twelve published datasets show that SC3 outperforms five existing methods while remaining scalable, as shown by the analysis of a large dataset containing 44,808 cells. Moreover, an interactive graphical implementation makes SC3 accessible to a wide audience of users, and SC3 aids biological interpretation by identifying marker genes, differentially expressed genes and outlier cells.
tSNE + k-means
SNN-Cliq: doi: 10.1093/bioinformatics/btv088
SINCERA: SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis.
Review: A systematic performance evaluation of clustering methods for single-cell RNA-seq data (SC3 and Seurat show the most favorable results)
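For the "tSNE + k-means" entry in the list above, the clustering half can be sketched as plain k-means (Lloyd's algorithm) run on 2-D embedding coordinates. The t-SNE step itself is not implemented here: the coordinates below are invented stand-ins for a t-SNE output, and the function name and the "first k points" initialisation are choices made only to keep the sketch short and deterministic.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm on a list of coordinate tuples; returns (labels, centroids)."""
    # Deterministic initialisation for this sketch: use the first k points
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared distance)
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D embedding of six cells forming two well-separated groups
embedding = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.3), (0.1, 0.4), (10.2, 9.8)]
labels, centroids = kmeans(embedding, k=2)
print(labels)
```

In practice k-means results depend on the initialisation and on the choice of k, so real pipelines usually run it several times.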
A catalogue of single-cell tools: https://www.scrna-tools.org/
The accompanying paper: Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database
It can be found on the 单细胞天地 WeChat official account.
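#Summary of notes on the first series of single-cell videos
Judging from the table of contents, most of the videos teach how to write the code that produces the desired results, so I have chosen to first spend two days (12.15-12.16) filling in background knowledge, and then work through concrete implementations based on the video content.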