Literature reading

Reading notes on a head and neck cancer (NPC) paper

2020-03-06  小梦游仙境

Paper title: Genomic Analysis of Nasopharyngeal Carcinoma Reveals TME-Based Subtypes

Paper link: https://mcr.aacrjournals.org/content/15/12/1722.abstract

image-20200131200726438

Just looking at the dark color scheme of the title page, I can tell this paper is not a simple one...

Vocabulary

TIL: tumor-infiltrating lymphocytes

mutation load: see the Nature paper "Estimating the mutation load in human genomes"

The process of mutation constantly creates deleterious variation in a population. These mutations can persist for some time, depending on the intensity of drift and purifying selection. The burden of deleterious variants carried by a population was the subject of classical work in population genetics during the mid‑twentieth century and was termed mutation load.

Objectives

Our objectives were to identify distinct gene expression–based subtypes of this disease, to define the tumor and microenvironment gene expression features of these subtypes, to examine how these features influence prognosis, and to characterize the mutational landscape of NPC and how it relates to subtype.

Question 1:

Is the goal here to define a subtype classification that did not exist before? If the authors are defining it for the first time in this paper, then it is a result. So why is it already used in the very first results figure?

Methods

Gene expression–based subtypes and analysis

Reading on, the authors describe it as follows: Clustering was performed using the Affinity Propagation algorithm, varying the self-similarity parameter to obtain clustering solutions varying from two to six clusters. For each solution, the WB index was calculated as the product of the number of clusters and the ratio of within cluster to between cluster variance. Three clusters minimized the WB index.

We found that the best number of clusters was between 2 and 4 in the large majority of trials, with three clusters the most frequent number identified.
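To make the WB index concrete for myself, here is a minimal sketch in plain Python. It is not the authors' code: the points and the three candidate labelings are made up, the labelings are assigned by hand rather than by Affinity Propagation, and I am assuming that "variance" can be read as sums of squared distances (within clusters, and between cluster centroids and the overall mean).

```python
def wb_index(points, labels):
    """WB index = k * SSW / SSB (assumed reading of the paper's definition)."""
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    ssw = 0.0  # within-cluster sum of squares
    ssb = 0.0  # between-cluster sum of squares
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        ssw += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
        ssb += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
    return k * ssw / ssb


# Toy data: three well-separated blobs of four points each (made up).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]

# Hand-assigned candidate solutions with 2, 3 and 4 clusters.
candidates = {
    2: [0] * 4 + [1] * 4 + [0] * 4,        # first and third blob merged
    3: [0] * 4 + [1] * 4 + [2] * 4,        # one cluster per blob
    4: [0, 0, 1, 1] + [2] * 4 + [3] * 4,   # first blob split in two
}

scores = {k: wb_index(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the three-cluster labeling gives the smallest index. Multiplying by the number of clusters is what penalizes over-splitting: the four-cluster solution has a smaller within-cluster spread, but not small enough to offset the factor of 4.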

image-20200131233858574

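The image above is the first supplementary figure. I was confused right from the start, and everything after it felt hard to follow.

Question 2: I have no idea what the following sentence is saying.

Across 1,000 random samplings, we computed the probability that two patient samples assigned to the same subtype would be grouped into the same cluster in the subsampled dataset.

The authors then say this is better than the permuted approach in the figure below. I do not understand what that means.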

image-20200131234516413
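Here is how I currently read the subsampling sentence, as a small sketch. Everything in it is an assumption for illustration: the sample names, the two "subsampled" clusterings, and above all my guess that "permuted" means shuffling the subtype labels to get a chance-level baseline. The paper's actual procedure may differ.

```python
import random
from itertools import combinations


def coclustering_probability(full_labels, subsample_runs):
    """Fraction of same-subtype pairs that share a cluster in the subsampled runs."""
    same_cluster = 0
    total_pairs = 0
    for run in subsample_runs:  # run: {sample: cluster id in that subsample}
        for a, b in combinations(sorted(run), 2):
            if full_labels[a] == full_labels[b]:
                total_pairs += 1
                if run[a] == run[b]:
                    same_cluster += 1
    return same_cluster / total_pairs


def permuted_baseline(full_labels, subsample_runs, seed=0):
    """Same statistic after shuffling subtype labels (assumed meaning of 'permuted')."""
    rng = random.Random(seed)
    samples = sorted(full_labels)
    shuffled = [full_labels[s] for s in samples]
    rng.shuffle(shuffled)
    return coclustering_probability(dict(zip(samples, shuffled)), subsample_runs)


# Made-up subtype assignments from the full dataset.
full = {"s1": "I", "s2": "I", "s3": "II", "s4": "II", "s5": "III", "s6": "III"}

# Two made-up reclusterings of random subsets (the paper uses 1,000).
runs = [
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
    {"s1": 0, "s2": 1, "s5": 2, "s6": 2},
]

observed = coclustering_probability(full, runs)
baseline = permuted_baseline(full, runs)
print(observed, baseline)
```

In this toy case three of the four same-subtype pairs stay together, so the observed value is 0.75. If the subtypes are stable, the observed value should sit well above what shuffled labels give, which is how I understand "better than permuted".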

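My own understanding of the above: the authors took the gene expression matrix, ran the Affinity Propagation algorithm on it, and divided the patients into expression-based subtypes I, II, and III. This matches the clinical information (phenotype) we downloaded later, in which every patient has an expression subtype.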

image-20200131235022734

Whole-exome data analysis

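111 patients

57 have both tumor tissue and a blood sample; this is the paired cohort. Another 13 were added later, for 70 in total: blood samples (70) as a reference panel to assist somatic mutation analysis.

54 have tumor tissue only; this is the unpaired cohort.

The results for nonsynonymous mutations are in Tables S2 and S3.

I also have a question about the part underlined in red below.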

image-20200201000318847

Detection of somatic mutations

image-20200201000918332

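The figure above describes three kinds of SNVs that were filtered out. Without the background, I do not know exactly what was removed.

Results

Result 1: Gene expression–based subtypes

The authors used unsupervised clustering on mRNA expression to divide the patients into three groups. The algorithm is the Affinity Propagation algorithm mentioned earlier, though we probably do not need to care much about it.

Panels A and B: among the three subtypes, the one containing only the Differentiated histological type has the lowest survival rate.

Panel C shows the histological assessment: subtype III has the most TILs (tumor-infiltrating lymphocytes) and subtype II the fewest.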

image-20200201110449710 image-20200201111157289

Next the authors mention two tests, the Kruskal–Wallis test and the MWU (Mann–Whitney U) test. I had never seen either, but that does not scare me. The gist is that one test found a significant difference and the other did not, as described below:

Group differences did not reach statistical significance by a Kruskal–Wallis test, although the difference in stromal TILs between subtypes II and III was nominally significant (P = 0.03, MWU test). Median tumor cellularity was similar across subtypes (Supplementary Fig. S4), although subtype I was observed to have significantly higher tumor content than subtype II (P = 0.006, MWU test).
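To get a feel for the MWU test, here is a rough plain-Python sketch of the U statistic with a normal-approximation p-value. The TIL percentages are made up, and the approximation has no tie or continuity correction, so this is for intuition only. For real data I would expect to use `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` instead.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using a plain normal approximation."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0      # x value ranks above y value
            elif a == b:
                u += 0.5      # ties count half
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Made-up stromal TIL percentages for two subtypes.
tils_subtype_2 = [10, 20, 30, 40, 50]
tils_subtype_3 = [5, 15, 25, 60, 70]

u_stat, p_value = mann_whitney_u(tils_subtype_2, tils_subtype_3)
print(u_stat, p_value)
```

As far as I understand, Kruskal–Wallis is the all-groups-at-once version of this rank test, while MWU compares two groups. That would explain the quoted result: the overall test across three subtypes is not significant, yet one pairwise comparison reaches P = 0.03, and "nominally" signals that this pairwise p-value is not adjusted for multiple comparisons.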

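Supplementary Fig. S4 is shown below. It shows the tumor content of each subtype. I am actually puzzled here: which item in the clinical information does this tumor content come from?

image-20200201111906959

I looked through the clinical information, and this should be it. My guess is that tumor content is the intratumoral or stromal TILs. Both are TILs (tumor-infiltrating lymphocytes), so both are lymphocytes; one is inside the tumor tissue and the other is in the stroma. This is a matter of background knowledge. So the violin plot above would be one of intratumoral or stromal. That said, the quoted sentence calls the quantity "tumor cellularity", which reads more like the fraction of tumor cells in the sample than a TIL count, so this guess may be wrong.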

image

Result 2: Tumor microenvironment characteristics of subtypes

1. Original text

Principal components analysis of gene expression data revealed that the first four principal components were significantly different across subtypes, and that they explained approximately 36% of variance in gene expression levels (Supplementary Fig. S5).

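Question: with four principal components, how do you get the percentage of variance explained by each one, and how is the figure drawn? I know there are more components than PC1 and PC2, but usually only PC1 and PC2 are plotted. I have never worked with four, five, or six components. See Supplementary Fig. S5.

image-20200201113150772

The paper says the first four principal components explain approximately 36% of variance in gene expression levels, meaning the percentages of the first four components add up to about 36%. The choice of four is probably not arbitrary: according to the quoted sentence, these are the components that differed significantly across subtypes.

The second panel in the figure above uses five principal components to separate the three subtypes. By eye, the first principal component does separate the three subtypes most clearly.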
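On where the per-component percentages come from: each principal component has a variance (its eigenvalue), and its percentage is that variance divided by the total over all components. A minimal sketch with made-up variances (not the paper's values):

```python
from itertools import accumulate


def explained_variance(pc_variances):
    """Return (ratio per PC, cumulative ratio) from per-PC variances."""
    total = sum(pc_variances)
    ratios = [v / total for v in pc_variances]
    return ratios, list(accumulate(ratios))


# Made-up variances for four components, largest first.
ratios, cumulative = explained_variance([4.0, 3.0, 2.0, 1.0])
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.0%} of variance, {c:.0%} cumulative")
```

The paper's "approximately 36%" would be the cumulative value at PC4. Plotting more than two components then just means one panel (or one box plot per subtype) per component instead of a single PC1 versus PC2 scatter.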

2. Original text

For each principal component (PC), we identified the most correlated and anticorrelated genes (|Pearson R| > 0.6) and assessed their canonical pathway enrichment in the MSIGDB collection (Supplementary Tables S5 and S6; ref. 32).
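The gene selection step in this sentence can be sketched as follows. The gene names, expression values, and PC scores are all hypothetical; only the 0.6 threshold comes from the paper. The pathway enrichment against MSigDB is a separate step that this sketch does not cover.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_genes(expression, pc_scores, threshold=0.6):
    """Split genes into correlated / anticorrelated with one PC."""
    correlated, anticorrelated = [], []
    for gene, values in expression.items():
        r = pearson_r(values, pc_scores)
        if r > threshold:
            correlated.append(gene)
        elif r < -threshold:
            anticorrelated.append(gene)
    return correlated, anticorrelated


# Hypothetical PC scores for five patients and expression of three genes.
pc1_scores = [-2, -1, 0, 1, 2]
expression = {
    "GENE_A": [1, 2, 3, 4, 5],   # rises with the PC
    "GENE_B": [5, 4, 3, 2, 1],   # falls as the PC rises
    "GENE_C": [3, 1, 4, 1, 3],   # unrelated to the PC
}

up, down = correlated_genes(expression, pc1_scores)
print(up, down)
```

Each of the two gene lists would then go into the enrichment analysis, which is how one PC ends up with "positively correlated" and "anticorrelated" pathways.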

Below are the genes and pathways the authors list for each principal component (Tables S5 and S6):

image-20200201120049216 image-20200201120027241

PC1 was positively correlated with immune system genes and anticorrelated with cell cycle and proliferation markers.

PC2 was positively correlated with TGFβ signaling.

PC3 was positively correlated with integrins, collagens, and genes involved in extracellular matrix receptor interactions.

PC4 was negatively correlated with genes annotated as being involved in cancer-specific pathways.

The strong association of PC1 and PC3 with immune and stromal cell genes, and the trend to differences by subtype in percent stromal TILs, prompted us to characterize the microenvironment of these tumor samples by examining signatures comprised of genes with enriched expression in various immune and stromal cell types, genes induced by interferons, and canonical proliferation markers (Supplementary Table S7; Supplementary Fig. S6).

Question: from the English paragraph above, the authors arrive at the results in Table S7 and Fig. S6 below. That must have taken a lot of time, effort, and brainpower. Where do the T-cell, B-cell, and other labels in the left column of Table S7 come from?

image-20200201115917409 image-20200201115953983 image-20200201120923918

The figure below is Fig. 1E.

image-20200201125759343

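My understanding of the figures above: the authors picked out B cells, T cells, cytotoxic cells, interferon, macrophages, fibroblasts, type I interferon, and finally proliferation. That is, they selected the cell types they consider relevant to tumor immunity and mapped each one to a set of genes, then asked which subtype expresses each set most highly. Each point is one patient's expression level for the genes representing that cell type. Which genes are included was decided by the authors from prior experience and the literature (Table S7 uses the phrase "pre-specified gene signatures"). That is the content of Table S7; the gene column on the right is short because the authors list only some of the genes.

Fig. S6 is then built from the expression of the genes chosen in Table S7: "A TSNE plot of the genes constituting the pre-specified gene signatures examined in this study is shown in the left panel." In other words, the authors ran a dimensionality reduction on the expression of these genes (t-SNE here, not PCA) to see whether the pre-specified gene signatures separate the cell types they represent, such as B cells, T cells, and cytotoxic cells. One small question: which patients did they use? Fig. S6 shows that the signatures do separate, and the heatmap on the right shows their correlations.

The supplementary Table S7 and Fig. S6 above serve to show that the authors' pre-specified gene signatures are usable, before Fig. 1E is presented.

Result 3: EBV gene expression patterns by subtype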

image-20200201130837442 image-20200201131233493 image-20200201133731558

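In Fig. 1D the authors show three EBV genes, LMP1, LMP2B, and EBNA1, across the three subtypes. The supplement also shows how other genes and pathways correlate with the EBV genes. Following the original description below definitely requires background knowledge; at the very least you need to know which pathways are cancer related.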

image-20200201132228695

Result 4: The mutational landscape of NPC

The data used: whole-exome sequencing and SNP 6.0 array profiling on tumor samples from 94 patients that were also profiled by RNA-seq

Of the 94 patients, 51 also had blood samples sequenced, which serve as normal.

The other 43 had no matched normal blood sample, or the sample failed quality control.

In the paired cohort, we identified a total of 1,520 nonsilent somatic mutations across 1,380 genes. The median mutation rate was 22 nonsynonymous lesions per tumor, which is comparable with the values reported in previous studies

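On the nonsilent somatic mutations above, I found a post from the boss's 菜鸟团 account: 经典的癌症外显子数据如何分析 (how to analyze classic cancer exome data).

image-20200201134102116

For an explanation of the figure above, see the original paper.

Result 5: TILs are associated with better PFS

1. Original text

High stromal TILs, but not intratumoral TILs, were significantly associated with better PFS (Fig. 3A; P = 0.013, log-rank test). Interestingly, PC2 was found to be significantly higher in the low stromal TIL samples versus high stromal TIL samples (Fig. 3B; P = 0.02, t test).

That is, patients with more TILs in the stroma have a better prognosis. Interestingly, PC2 is higher in samples with low stromal TILs than in samples with high stromal TILs.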

image-20200201134821824
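The survival curves in Fig. 3A are Kaplan–Meier curves, and the log-rank test compares two such curves. Below is a minimal Kaplan–Meier estimator in plain Python, with made-up follow-up times, just to see how a curve steps down at each progression event and how censored patients leave the risk set without causing a step. It does not implement the log-rank test itself.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each event time.

    events: 1 = progression observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Made-up PFS times in months; the patient at month 8 is censored.
curve = kaplan_meier(times=[5, 8, 12, 20], events=[1, 0, 1, 1])
print(curve)
```

Running one such estimate for the high stromal TIL group and one for the low group gives the two curves that the log-rank P = 0.013 refers to.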

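For the result above, the authors presumably tested each of the principal components and found that PC2 differed significantly, which suggests that PC2 is related to prognosis. What is PC2? It is derived from gene expression. It is one of the principal components from the earlier figure, the ones that separate the three subtypes, and each component is a weighted combination of the expression of many genes.

2. Original text

image-20200201135544879

The reasoning here needs to be laid out clearly. First, genes are pulled from PC2, and then 9 gene sets are found. These are gene sets, not genes. In effect this is an enrichment analysis that yields nine significant pathways, and each pathway naturally contains genes; the authors simply describe them as gene sets.

Then they apparently also looked for the pathways most correlated with the most significant pathway, the TGF pathway.

3. Original text

image-20200201140625880 image-20200201140704682

The figures above show that the PC2 genes correlate quite strongly with the pathways, with significant p-values.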

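Result 6: A proliferation signature is associated with poor PFS

1. Original text

image-20200201141850724

This description feels much more familiar: it is a Cox regression model. Here PC1 from the earlier PCA, together with the authors' pre-specified gene signatures (T-cell, proliferation, and so on), goes into a Cox regression, and each one is judged by its HR. Proliferation has an HR of 1.1, so it is a risk factor. See the table below.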

image-20200201141908664 image-20200201142356893
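A quick note to self on reading an HR of 1.1. In a Cox model the HR is the exponential of the regression coefficient and applies per one-unit increase of the covariate, so the effect compounds over several units. The sketch below only does that arithmetic. I do not know what unit the proliferation score is measured in, so the 10-unit difference is a hypothetical example.

```python
from math import exp, log


def hazard_ratio(beta):
    """HR for a one-unit increase, given a Cox coefficient."""
    return exp(beta)


def hazard_ratio_for_difference(hr_per_unit, units):
    """HR between two patients whose covariate differs by `units`."""
    return hr_per_unit ** units


beta = log(1.1)                       # coefficient that corresponds to HR = 1.1
hr = hazard_ratio(beta)
hr_10_units = hazard_ratio_for_difference(1.1, 10)  # hypothetical 10-unit gap
print(round(hr, 3), round(hr_10_units, 3))
```

So an HR that looks small per unit can still mean a large difference in hazard between patients at the two ends of the score range. An HR above 1 is a risk factor, below 1 is protective.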

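Fig. A is a survival analysis of the proliferation signature from the table above. The routine is: get the HR first, then do a K-M survival analysis.

Fig. B shows that for subtype I, proliferation and mutation load are not correlated, whereas for subtypes II and III they are strongly correlated (Fig. 4C; P = 2.1e-3, t test). So that is how to read a box plot. Learned something!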

image-20200201141445101

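Fig. C: the authors use multivariable Cox regression and conclude that low proliferation with high TIL has the best prognosis; that is the green line.

Combining two factors to say which situation has the best prognosis: learned that too!

Fig. D covers several other EBV-related genes, listed in the supplement earlier, including LMP1, LMP2B, RPMS1, A73, and EBNA1. The finding: Within subtype I, but not subtypes II and III, expression of EBV transcripts RPMS1 and A73 (-0.49, 6.9e-5) was associated with lower proliferation signature. My first reading was that, for low proliferation (good prognosis), RPMS1 is uncorrelated and A73 is positively correlated, although the quoted sentence names both transcripts as associated.

Question: so is it high or low A73 expression that goes with lower proliferation? I still need to get a better grasp of correlation analysis.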
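A tentative answer to my own question, assuming the -0.49 in parentheses is a correlation coefficient between A73 expression and the proliferation signature (and 6.9e-5 its p-value): a negative coefficient means the two move in opposite directions, so higher A73 goes with lower proliferation. A toy check with made-up numbers:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values for six patients: expression rises, proliferation mostly falls.
a73_expression = [1, 2, 3, 4, 5, 6]
proliferation_score = [6, 4, 5, 2, 3, 1]

r = pearson_r(a73_expression, proliferation_score)
print(round(r, 3))
```

The sign carries the direction and the magnitude carries the strength. Under that reading, -0.49 is a moderate negative correlation, which fits the paper's wording that expression "was associated with lower proliferation signature".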

image-20200201141502585