Applications of 10X single-cell technology in a COVID-19 paper
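Hello everyone! Today we are sharing a paper from Zemin Zhang's team, COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas, published in Cell in April 2021. The authors performed single-cell sequencing on 284 samples from 196 individuals, building a comprehensive immune landscape of 1.46 million cells.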
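This large dataset allowed the authors to identify changes in different peripheral immune subtypes that are associated with distinct clinical features of COVID-19, such as age, sex, severity and disease stage.
SARS-CoV-2 RNA was found in multiple epithelial and immune cell types, accompanied by dramatic transcriptomic changes within virus-positive cells.
Systemic upregulation of S100A8/A9, mainly by megakaryocytes and monocytes in the peripheral blood, may contribute to the cytokine storms frequently observed in severe patients.
These data provide a rich resource for understanding the pathogenesis of COVID-19 and for developing effective treatment strategies.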
As bioinformaticians, however, we care more about how these results were obtained, so this post focuses on the bioinformatics methods used in the paper.
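1. Quality control. Let's start with QC. For general approaches to filtering 10X single-cell data, see our earlier post "10X single-cell (10X spatial transcriptomics) data analysis: all about cell filtering". Here is how the paper filtered cells: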
First, UMI count and gene count: "cells with less than 1000 UMI counts and 500 detected genes were filtered" (a very standard choice).
Mitochondrial fraction: "cells with more than 10% mitochondrial gene counts" were removed as well (also standard).
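The two filters above can be sketched as a single predicate. This is a minimal illustration, not the authors' code: `CellStats` and `passes_basic_qc` are hypothetical names, and because the quoted phrase "less than 1000 UMI counts and 500 detected genes" is ambiguous, the sketch assumes a cell failing either lower bound is removed.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Per-cell QC summary (hypothetical container for illustration)."""
    barcode: str
    n_umi: int       # total UMI counts in the cell
    n_genes: int     # number of genes with at least one UMI
    pct_mito: float  # percentage of UMIs from mitochondrial (MT-) genes


def passes_basic_qc(cell: CellStats, min_umi: int = 1000, min_genes: int = 500,
                    max_pct_mito: float = 10.0) -> bool:
    """Assumption: a cell below EITHER lower bound, or above the mito cap, is dropped."""
    return (cell.n_umi >= min_umi
            and cell.n_genes >= min_genes
            and cell.pct_mito <= max_pct_mito)


cells = [
    CellStats("AAAC-1", 5200, 1800, 4.2),   # typical cell: kept
    CellStats("AAAG-1", 800, 450, 3.0),     # too few UMIs and genes: dropped
    CellStats("AACT-1", 6100, 2100, 18.5),  # mitochondrial fraction too high: dropped
]
kept = [c.barcode for c in cells if passes_basic_qc(c)]
print(kept)  # ['AAAC-1']
```

In scanpy, the three per-cell values would normally come from `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)` (columns `total_counts`, `n_genes_by_counts`, `pct_counts_mt`) after flagging mitochondrial genes in `adata.var["mt"]`.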
Doublets: for more on doublet removal, see our earlier post "10X single-cell (10X spatial transcriptomics) doublet removal with Chord". The authors handled doublets separately by tissue:
- PBMC: cells with UMI counts above 25,000 and detected genes above 5,000 are filtered out.
- Other tissues: cells with UMI counts above 70,000 and detected genes above 7,500 are filtered out.
- Software-assisted identification with Scrublet, a Python package for detecting doublets in single-cell data, to identify potential doublets. The doublet score of each cell and a threshold based on the bimodal distribution of scores were calculated with default parameters; the expected doublet rate was set to 0.08, and cells predicted to be doublets or with a doublet score above 0.25 were filtered out.
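The three doublet rules above can be combined into one check. This is a sketch under stated assumptions: `UPPER_LIMITS` and `is_doublet` are hypothetical names, every tissue other than PBMC uses the "other tissues" limits, and exceeding either the UMI or the gene limit is treated as sufficient (the text's "and" could also be read literally). The Scrublet scores are taken as precomputed inputs.

```python
# Per-tissue upper limits quoted in the text: (max UMI counts, max detected genes).
UPPER_LIMITS = {"PBMC": (25_000, 5_000), "other": (70_000, 7_500)}
SCRUBLET_SCORE_CUTOFF = 0.25

# Scores/predictions are assumed to be precomputed, e.g. with Scrublet:
#   scr = scrublet.Scrublet(counts_matrix, expected_doublet_rate=0.08)
#   doublet_scores, predicted_doublets = scr.scrub_doublets()


def is_doublet(tissue: str, n_umi: int, n_genes: int,
               doublet_score: float, predicted_doublet: bool) -> bool:
    """Return True if a cell should be removed as a likely doublet."""
    max_umi, max_genes = UPPER_LIMITS["PBMC" if tissue == "PBMC" else "other"]
    # Assumption: exceeding EITHER limit is enough (the text's "and" is ambiguous).
    too_large = n_umi > max_umi or n_genes > max_genes
    return too_large or predicted_doublet or doublet_score > SCRUBLET_SCORE_CUTOFF


print(is_doublet("PBMC", 30_000, 4_000, 0.10, False))  # True: above the PBMC UMI limit
print(is_doublet("BALF", 30_000, 4_000, 0.10, False))  # False: within the limits for other tissues
print(is_doublet("BALF", 10_000, 3_000, 0.31, False))  # True: Scrublet score above 0.25
```

Keeping the size limits and the Scrublet rule in one function makes it easy to see that a cell is dropped if any single rule fires.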
Normalization: this was done with the scran package, after which the downstream analysis proceeds.
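To show what size-factor normalization does, here is a deliberately simplified stand-in. It is not scran: scran estimates size factors by pooling cells and deconvolving the pooled factors, whereas this sketch uses plain library-size factors centred to a mean of 1 (a common convention), followed by `log1p`. The function name `size_factor_normalize` is hypothetical.

```python
import math


def size_factor_normalize(counts):
    """Library-size normalisation followed by log1p (simplified stand-in for scran).

    counts: cells x genes matrix as a list of lists of raw UMI counts.
    Assumes QC has already removed cells with zero total counts.
    """
    lib_sizes = [sum(cell) for cell in counts]
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    # Centre the size factors so that their mean is 1.
    factors = [size / mean_lib for size in lib_sizes]
    normalized = [[math.log1p(x / f) for x in cell] for cell, f in zip(counts, factors)]
    return normalized, factors


norm, factors = size_factor_normalize([[2, 0, 2], [4, 0, 4]])
# The second cell was sequenced twice as deeply, so after scaling both rows match.
print(norm)
```

The point of any size-factor method, including scran's, is the same: remove differences in sequencing depth between cells before comparing expression.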
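2. Batch correction. For background on batch correction, see our earlier post "10X single-cell (10X spatial transcriptomics) data analysis: some analysis details". Here is the authors' approach.
The authors "used the harmony algorithm to do batch effect correction". This step is worth walking through in detail.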
First, highly variable gene (HVG) selection: "to detect the most variable genes used for harmony algorithm, we performed variable gene selection separately for each sample" (HVGs are first computed within each sample). "A consensus list of 1,500 variable genes was then formed by selecting the genes with the greatest recovery rates across samples, with ties broken by random sampling" (this is the key step: a shared list of 1,500 HVGs). All ribosomal, mitochondrial and immunoglobulin genes were then removed from the list (also worth noting).
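The consensus step can be sketched as follows, assuming each sample's HVG list has already been computed. Here "recovery rate" is taken to mean the number of samples in which a gene was selected, ties are broken by a seeded shuffle, and the gene-name prefixes used to drop ribosomal, mitochondrial and immunoglobulin genes are a crude assumption. `consensus_hvgs` is a hypothetical name.

```python
import random
from collections import Counter

# Assumed prefixes for ribosomal, mitochondrial and immunoglobulin genes (a crude rule).
EXCLUDED_PREFIXES = ("RPL", "RPS", "MT-", "IGH", "IGK", "IGL")


def consensus_hvgs(per_sample_hvgs, n_top=1500, seed=0):
    """Build a consensus HVG list from per-sample HVG lists.

    per_sample_hvgs: dict mapping sample name -> HVGs selected within that sample.
    Genes are ranked by how many samples recovered them; ties are broken randomly.
    """
    recovery = Counter()
    for genes in per_sample_hvgs.values():
        recovery.update(set(genes))  # count each gene at most once per sample
    ranked = sorted(recovery)  # deterministic starting order before shuffling
    random.Random(seed).shuffle(ranked)  # random order among tied genes
    # Python's sort is stable (also with reverse=True), so ties keep the shuffled order.
    ranked.sort(key=lambda g: recovery[g], reverse=True)
    selected = ranked[:n_top]
    # Drop ribosomal, mitochondrial and immunoglobulin genes after selection, as described.
    return [g for g in selected if not g.startswith(EXCLUDED_PREFIXES)]


samples = {
    "s1": ["CD3E", "LYZ", "MT-CO1"],
    "s2": ["CD3E", "LYZ", "RPL13"],
    "s3": ["CD3E", "MS4A1"],
}
print(consensus_hvgs(samples, n_top=3))
```

scanpy's `sc.pp.highly_variable_genes` with `batch_key` also ranks genes by the number of batches in which they are variable, but it breaks ties differently, so it is only a close analogue of the procedure described here.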
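Harmony correction: "we calculate a PCA matrix with 20 components using such informative genes and then feed this PCA matrix into HarmonyMatrix() function implemented in R package Harmony. We set sample and dataset as two technical covariates for correction with theta set as 2.5 and 1.5, respectively". Theta is Harmony's diversity penalty: the larger it is for a covariate, the more strongly clusters are pushed to mix cells across that covariate's levels, so here sample is corrected more aggressively than dataset. (For more on Harmony's parameters, see our earlier post "10X single-cell (10X spatial transcriptomics) integration analysis: notes on the approach of a Nature paper".) Downstream analysis is then done in scanpy.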
One detail of the downstream analysis is a second round of QC: "single cells expressing two sets of well-studied canonical markers of major cell types were labeled as doublets and excluded from the following analysis." This is the second doublet-removal criterion. Cells highly expressing HBA, HBB and HBD, which are erythrocyte markers, were also excluded (removal of red blood cells).
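Both second-round rules can be sketched per cell. Everything specific here is an assumption for illustration: the marker panels are hypothetical examples (the paper's exact canonical marker sets are not given in this post), a cell counts as positive for a type when at least two of its markers are detected, and `hb_cutoff` is an invented threshold for "highly expressed". HBA1/HBA2 are used because they are the actual alpha-globin gene symbols.

```python
# Hypothetical marker panels; the paper's exact canonical marker sets are not reproduced here.
MARKERS = {
    "T": {"CD3D", "CD3E"},
    "B": {"CD79A", "MS4A1"},
    "Myeloid": {"CD14", "LYZ"},
}
# The text names HBA, HBB and HBD; HBA1/HBA2 are the actual alpha-globin gene symbols.
ERYTHROCYTE_GENES = {"HBA1", "HBA2", "HBB", "HBD"}


def positive_types(counts, min_hits=2):
    """Cell types for which at least `min_hits` marker genes are detected (UMI > 0)."""
    return {cell_type for cell_type, genes in MARKERS.items()
            if sum(counts.get(g, 0) > 0 for g in genes) >= min_hits}


def flag_cell(counts, hb_cutoff=10):
    """Label one cell (dict gene -> UMI count) as 'doublet', 'erythrocyte' or 'ok'.

    hb_cutoff is a hypothetical threshold for "highly expressed" haemoglobin genes.
    """
    if len(positive_types(counts)) >= 2:
        return "doublet"  # expresses marker sets of two different major cell types
    if sum(counts.get(g, 0) for g in ERYTHROCYTE_GENES) >= hb_cutoff:
        return "erythrocyte"
    return "ok"


print(flag_cell({"CD3D": 3, "CD3E": 2, "CD79A": 1, "MS4A1": 4}))  # doublet
print(flag_cell({"HBB": 50, "HBA1": 20}))                         # erythrocyte
print(flag_cell({"CD3D": 5, "CD3E": 1, "LYZ": 2}))                # ok
```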
Detection and processing of scRNA-seq data with viral RNA
First, a customized reference genome was built, "in which the SARS-CoV-2 genome (NC_045512, NCBI RefSeq) was added as an additional chromosome to the human reference genome" (a fairly standard approach). Single cells with viral reads (UMI > 0) were retained.
Cells with fewer than 200 expressed genes or more than 20% mitochondrial counts were excluded, as were those labeled as doublets following the protocol described above. (Note that the QC thresholds here differ from those used in the first round.)
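A sketch of this selection, assuming a count matrix produced against the combined human + SARS-CoV-2 reference. The viral gene names in `VIRAL_GENES` are the SARS-CoV-2 ORFs, but the IDs that actually appear in a matrix depend on the GTF used to build the reference, so treat them as hypothetical. The doublet step is not repeated here; `select_viral_cells` is a hypothetical name.

```python
# Hypothetical gene names for the viral "chromosome"; the real IDs depend on the GTF used
# when NC_045512 was appended to the human reference.
VIRAL_GENES = {"ORF1ab", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b", "ORF8", "N", "ORF10"}


def select_viral_cells(cells, min_genes=200, max_pct_mito=20.0):
    """Keep virus-positive cells that pass the second-round QC thresholds.

    cells: dict barcode -> dict gene -> UMI count. Doublet removal is not repeated here.
    """
    kept = []
    for barcode, counts in cells.items():
        viral_umi = sum(n for g, n in counts.items() if g in VIRAL_GENES)
        if viral_umi == 0:  # retain only cells with viral reads (UMI > 0)
            continue
        total = sum(counts.values())
        n_genes = sum(1 for n in counts.values() if n > 0)
        mito = sum(n for g, n in counts.items() if g.startswith("MT-"))
        if n_genes >= min_genes and 100.0 * mito / total <= max_pct_mito:
            kept.append(barcode)
    return kept
```

Exposing `min_genes` and `max_pct_mito` as parameters makes the point in the text explicit: the same QC logic is reused, but with looser thresholds (200 genes, 20% mitochondrial) than in the first round.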
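Another difference: only the top 500 HVGs were used for dimensionality reduction (low by usual standards), and Harmony correction was not rerun here. Presumably this is because cell types had already been defined and clustered above, so transferring the labels should be enough.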
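The post does not say how labels would be transferred, so the sketch below swaps in the simplest option: nearest-centroid assignment. Each reference label gets a mean expression vector over the chosen genes (for example the 500 HVGs), and each query cell takes the label of the closest centroid by Euclidean distance. Function names and the toy data are hypothetical; in scanpy, `sc.tl.ingest` is one real alternative for this kind of label transfer.

```python
import math
from collections import defaultdict


def label_centroids(ref_cells, ref_labels, genes):
    """Mean expression of `genes` (e.g. the 500 HVGs) for each reference label."""
    sums = defaultdict(lambda: [0.0] * len(genes))
    sizes = defaultdict(int)
    for cell, label in zip(ref_cells, ref_labels):
        vec = sums[label]
        for i, g in enumerate(genes):
            vec[i] += cell.get(g, 0.0)
        sizes[label] += 1
    return {label: [v / sizes[label] for v in vec] for label, vec in sums.items()}


def transfer_labels(query_cells, centroids, genes):
    """Give each query cell the label of its nearest centroid (Euclidean distance)."""
    labels = []
    for cell in query_cells:
        x = [cell.get(g, 0.0) for g in genes]
        labels.append(min(centroids, key=lambda lab: math.dist(x, centroids[lab])))
    return labels


genes = ["CD3E", "LYZ"]
ref = [{"CD3E": 5.0}, {"CD3E": 4.0}, {"LYZ": 6.0}, {"LYZ": 5.0}]
cents = label_centroids(ref, ["T", "T", "Mono", "Mono"], genes)
print(transfer_labels([{"CD3E": 3.0}, {"CD3E": 0.5, "LYZ": 7.0}], cents, genes))  # ['T', 'Mono']
```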
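TCR/BCR detection: the paper also includes a multi-sample clonotype analysis, but it stops short of explaining what the key clonotypes it found actually do. It would be great to have a database that characterizes the function of each clonotype.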
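A basic multi-sample clonotype summary can be sketched as clone sizes per sample (expanded clones have size > 1) plus the clonotypes shared between samples. This assumes each cell has already been assigned a clonotype ID; using a single CDR3 amino-acid sequence as the ID, as in the toy records, is a simplification, and `clonotype_summary` is a hypothetical name.

```python
from collections import Counter, defaultdict


def clonotype_summary(records):
    """records: iterable of (sample, cell_barcode, clonotype_id) tuples.

    Returns (clone sizes per sample, sorted list of clonotypes seen in >= 2 samples).
    """
    sizes = defaultdict(Counter)
    for sample, _barcode, clone in records:
        sizes[sample][clone] += 1
    samples_per_clone = Counter()
    for clone_counts in sizes.values():
        samples_per_clone.update(clone_counts.keys())  # each clone counted once per sample
    shared = sorted(c for c, n in samples_per_clone.items() if n >= 2)
    return sizes, shared


records = [
    ("P1", "AAAC-1", "CASSLGQETQYF"),
    ("P1", "AAAG-1", "CASSLGQETQYF"),
    ("P1", "AACT-1", "CASRGDTEAFF"),
    ("P2", "ACGT-1", "CASSLGQETQYF"),
]
sizes, shared = clonotype_summary(records)
print(sizes["P1"].most_common(1))  # [('CASSLGQETQYF', 2)]: an expanded clone in P1
print(shared)                      # ['CASSLGQETQYF']
```

Summaries like this show which clones expand or recur across patients, but, as noted above, they do not by themselves say what those clones recognize or do.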
Cell-cell communication analysis between PBMC and BALF by iTALK
To identify and visualize the possible cell-cell interactions in terms of cytokine storm between the highly inflammation-correlated cell types (evaluated by the inflammation score within each tissue), and the crosstalk between lung and circulating blood, we employed the R package iTALK (Wang et al., 2019). The cytokine/chemokine category (n = 320) in the ligand-receptor database was selectively used for our purpose. The Wilcoxon rank sum test was used to identify the differentially expressed genes (DEGs) between the progression (severe) and progression (moderate) patient groups for each cell type. DEGs were then matched and paired against the ligand-receptor database to construct a putative cell-cell communication network. An interaction score, defined as the product of the log fold changes of the ligand and the receptor, was used to rank these interactions. In addition, the expression levels of both ligand and receptor were considered. We defined a severe gained interaction as one in which a ligand gene was upregulated in the progression (severe) group and its paired gene was upregulated or unchanged. We defined a severe lost interaction as one in which a ligand (receptor) gene was downregulated in the progression (severe) group, regardless of the expression level of its paired gene.
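The scoring and gained/lost rules described above can be sketched in a few lines. This is not iTALK's API (iTALK is an R package); it is a plain re-statement of the quoted definitions under assumptions: the per-cell-type DEG log fold changes are taken as precomputed (for example with a Wilcoxon test such as scanpy's `sc.tl.rank_genes_groups(..., method="wilcoxon")`), genes absent from the DEG table count as unchanged, and the cell-type names and pairs in the example are hypothetical. The extra filtering on ligand/receptor expression level is not modelled.

```python
def classify_interactions(lr_pairs, deg_lfc, sender, receiver):
    """Score ligand-receptor pairs from `sender` to `receiver` as severe gained/lost.

    lr_pairs: (ligand, receptor) pairs, e.g. the cytokine/chemokine subset of the database.
    deg_lfc: dict (cell_type, gene) -> log fold change (severe vs moderate) for DEGs only;
             genes missing from it are treated as unchanged (logFC = 0).
    """
    results = []
    for ligand, receptor in lr_pairs:
        lfc_l = deg_lfc.get((sender, ligand), 0.0)
        lfc_r = deg_lfc.get((receiver, receptor), 0.0)
        if lfc_l < 0 or lfc_r < 0:
            status = "lost"    # ligand or receptor down, regardless of its partner
        elif lfc_l > 0:
            status = "gained"  # ligand up, receptor up or unchanged
        else:
            continue           # neither definition applies (e.g. both unchanged)
        # Interaction score: product of ligand and receptor logFC, used for ranking.
        results.append((ligand, receptor, status, lfc_l * lfc_r))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


pairs = [("IL1B", "IL1R1"), ("CXCL8", "CXCR2"), ("CCL2", "CCR2"), ("TNF", "TNFRSF1A")]
deg = {
    ("Monocyte", "IL1B"): 1.5, ("Neutrophil", "IL1R1"): 0.8,
    ("Monocyte", "CXCL8"): 2.0,
    ("Monocyte", "CCL2"): -1.2, ("Neutrophil", "CCR2"): 0.5,
}
for row in classify_interactions(pairs, deg, "Monocyte", "Neutrophil"):
    print(row)
```

Checking the "lost" rule first mirrors the definition: a downregulated ligand or receptor marks the interaction as lost no matter what its partner does, so "gained" only applies when neither side is down.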
For more on iTALK, see our earlier post "Cell-cell communication: how to use iTALK".
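As the saying goes, the details decide success or failure, and nowhere is this truer than in bioinformatics analysis.
Life is good, and even better with you~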