TCGA Homologous Recombination Deficiency (HRD)

2023-01-20  小洁忘了怎么分身

0. Background

I have recently been reading a new paper O(∩_∩)O:

Genomic, epigenomic, and transcriptomic signatures for telomerase complex components: a pan‐cancer analysis

It states:

Genomic instability included aneuploidy, somatic total mutation burden (TMB), somatic copy number alterations (SCNA), loss of heterozygosity (LOH), and homologous recombination deficiency (HRD).

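Homologous recombination repair (HRR) is the preferred, high-fidelity pathway for repairing DNA double-strand breaks (DSBs).

Homologous recombination deficiency (HRD) generally refers to impaired HRR function at the cellular level. It can result from germline or somatic mutations in HRR-related genes, epigenetic inactivation, and other causes. HRD occurs in many malignancies and is especially prominent in ovarian, breast, pancreatic ductal, and prostate cancers.

HRD is commonly quantified with three genomic scar measures:

Loss of heterozygosity (LOH): LOH regions longer than 15 Mb but shorter than the whole chromosome;

Telomeric allelic imbalance (TAI): regions of allelic imbalance longer than 11 Mb that extend to one of the subtelomeres without crossing the centromere;

Large-scale state transitions (LST): chromosomal breakpoints between two adjacent regions, each at least 10 Mb long and less than 3 Mb apart. The total number of such breakpoints in a tumor genome can be used to describe its genomic instability.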

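Genetic Tumor Markers Collaboration Group, Tumor Markers Committee of the Chinese Anti-Cancer Association; Molecular Pathology Group, Pathology Branch of the Chinese Medical Association. Expert consensus on clinical testing and application of homologous recombination deficiency (2021 edition) [J]. 中国癌症防治杂志 (Chinese Journal of Oncology Prevention and Treatment), 2021, 13(4): 329-338.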
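To make the HRD-LOH definition above concrete, here is a minimal R sketch that counts qualifying segments in a toy table. Everything in it is an assumption for illustration: the `segs` table, the rounded `chrom_len` values, and the helper name `count_hrd_loh` are hypothetical. Real pipelines start from allele-specific copy number calls and do more preprocessing than this.

```r
# Toy segment table (hypothetical values, positions in bp).
segs <- data.frame(
  chrom  = c("1", "1", "2", "3"),
  start  = c(1, 60e6, 1, 10e6),
  end    = c(20e6, 70e6, 242e6, 26e6),
  is_loh = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical, rounded chromosome lengths (bp), for illustration only.
chrom_len <- c("1" = 248e6, "2" = 242e6, "3" = 198e6)

# Count LOH segments longer than 15 Mb that do not span the whole chromosome.
count_hrd_loh <- function(segs, chrom_len, min_len = 15e6) {
  seg_len <- segs$end - segs$start + 1
  whole   <- seg_len >= chrom_len[as.character(segs$chrom)]
  sum(segs$is_loh & seg_len > min_len & !whole)
}

count_hrd_loh(segs, chrom_len)
```

Only the first segment counts here: the second is too short, the third covers its whole chromosome, and the fourth is not LOH.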

After some searching, I found precomputed HRD scores here:

https://gdc.cancer.gov/about-data/publications/PanCan-DDR-2018

DNA damage repair (DDR)

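The page provides a set of TSV files and one large Excel workbook. Together they cover everything in the study about DNA damage repair, and the collection is very comprehensive.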

dir("TCGA_DDR_Data_Resources/")
##  [1] "DDRscores.tsv"          "GeneAlterations.tsv"    "GeneDeletions.tsv"     
##  [4] "GeneMutations.tsv"      "Genes.tsv"              "GeneSilencing.tsv"     
##  [7] "PathwayAlterations.tsv" "PathwayDeletions.tsv"   "PathwayMembership.tsv" 
## [10] "PathwayMutations.tsv"   "Pathways.tsv"           "PathwaySilencing.tsv"  
## [13] "Samples.tsv"            "Scores.tsv"

The files are described in a PDF that can be downloaded from the same page.

DDRscores.tsv

Data table listing the scores of the 43 DDR footprints and the RPPA-based DDR score across all samples (n=9,125).

The order of the genes, pathways, samples and footprint scores in these TSV files is as given in Genes.tsv, Pathways.tsv, Samples.tsv and Scores.tsv
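Because the matrices carry no labels of their own, a small helper that attaches labels and checks dimensions can catch a mismatched file early. The sketch below is only one way to do it: `read_labeled` is a hypothetical helper, the demo runs on tiny temporary files with made-up values, and trimming whitespace from the column labels is my own choice rather than something the data resource requires.

```r
# Read a header-less matrix and label it from two companion files.
read_labeled <- function(matrix_file, row_file, col_file) {
  m  <- read.delim(matrix_file, header = FALSE)
  rn <- as.character(read.delim(row_file, header = FALSE)[[1]])
  cn <- as.character(read.delim(col_file, header = FALSE)[[1]])
  # Fail early if the label files do not match the matrix shape.
  stopifnot(nrow(m) == length(rn), ncol(m) == length(cn))
  rownames(m) <- rn
  colnames(m) <- trimws(cn)  # optional: drop stray trailing spaces
  m
}

# Demo on temporary files (made-up values, not TCGA data).
d <- tempfile()
dir.create(d)
writeLines(c("1\t2", "3\t4", "5\t6"), file.path(d, "m.tsv"))
writeLines(c("s1\tACC", "s2\tACC", "s3\tOV"), file.path(d, "rows.tsv"))
writeLines(c("HRD_TAI", "HRD_LST "), file.path(d, "cols.tsv"))

toy <- read_labeled(file.path(d, "m.tsv"),
                    file.path(d, "rows.tsv"),
                    file.path(d, "cols.tsv"))
toy
```

Which label file belongs to the rows and which to the columns of each matrix is left to the caller, since that depends on the file.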

1. Building the DDRscores table

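# Score matrix: one row per sample, one column per footprint score, no header row
sc = read.delim("TCGA_DDR_Data_Resources/DDRscores.tsv", header = FALSE)
# Row labels (sample ID, cancer type) and column labels (score names)
s = read.delim("TCGA_DDR_Data_Resources/Samples.tsv", header = FALSE)
cl = read.delim("TCGA_DDR_Data_Resources/Scores.tsv", header = FALSE)
# The label files follow the same order as the matrix, so assign them directly
stopifnot(nrow(sc) == nrow(s), ncol(sc) == nrow(cl))
rownames(sc) = s$V1
colnames(sc) = cl$V1
colnames(sc)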

##  [1] "mutLoad_silent"         "mutLoad_nonsilent"      "mutSig1"               
##  [4] "mutSig2"                "mutSig3"                "mutSig4"               
##  [7] "mutSig5"                "mutSig6"                "mutSig7"               
## [10] "mutSig8"                "mutSig9"                "mutSig10"              
## [13] "mutSig11"               "mutSig12"               "mutSig13"              
## [16] "mutSig14"               "mutSig15"               "mutSig16"              
## [19] "mutSig17"               "mutSig18"               "mutSig19"              
## [22] "mutSig20"               "mutSig21"               "CNA_n_segs"            
## [25] "CNA_frac_altered "      "CNA_n_focal_amp_del"    "aneuploidy_score"      
## [28] "aneuploidy_score_prime" "LOH_n_seg"              "LOH_frac_altered "     
## [31] "purity"                 "ploidy"                 "genome_doublings"      
## [34] "subclonal_frac"         "HRD_TAI"                "HRD_LST"               
## [37] "HRD_LOH"                "HRD_Score"              "eCARD"                 
## [40] "PARPi7"                 "PARPi7_bin"             "RPS"                   
## [43] "tp53_score"             "rppa_ddr_score"

# HRD scores (columns 35 to 38), selected by name
scores = sc[, c("HRD_TAI", "HRD_LST", "HRD_LOH", "HRD_Score")]
head(scores)

##                 HRD_TAI HRD_LST HRD_LOH HRD_Score
## TCGA-OR-A5J1-01       3       2       2         7
## TCGA-OR-A5J2-01       4       2       3         9
## TCGA-OR-A5J3-01       0       0       0         0
## TCGA-OR-A5J5-01       2       2       4         8
## TCGA-OR-A5J6-01       3       1       1         5
## TCGA-OR-A5J7-01      10       8       3        21

nrow(scores)

## [1] 9125
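According to the footprint summary sheet in the Excel file, HRD_Score is the sum of the three component scores (TAI + LST + HRD LOH). A quick sanity check, sketched here on the six rows printed by head(scores) above so that it runs without the download (`chk` and `recomputed` are just illustrative names):

```r
# The six rows shown by head(scores), typed in by hand.
chk <- data.frame(
  HRD_TAI   = c(3, 4, 0, 2, 3, 10),
  HRD_LST   = c(2, 2, 0, 2, 1, 8),
  HRD_LOH   = c(2, 3, 0, 4, 1, 3),
  HRD_Score = c(7, 9, 0, 8, 5, 21)
)

# Recompute the total from its three components and compare.
recomputed <- with(chk, HRD_TAI + HRD_LST + HRD_LOH)
all(recomputed == chk$HRD_Score)
```

The same comparison could be run on the full table, keeping in mind that samples with missing scores would need `na.rm = TRUE` or prior filtering.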

2. Adding cancer type and plotting

head(s)

##                V1  V2
## 1 TCGA-OR-A5J1-01 ACC
## 2 TCGA-OR-A5J2-01 ACC
## 3 TCGA-OR-A5J3-01 ACC
## 4 TCGA-OR-A5J5-01 ACC
## 5 TCGA-OR-A5J6-01 ACC
## 6 TCGA-OR-A5J7-01 ACC

identical(s$V1,rownames(scores))

## [1] TRUE

# Attach sample ID and cancer type (row order was verified above)
scores = cbind(s, scores)
colnames(scores)[1:2] = c("Id", "Project")
library(tidyverse)  # loads dplyr, tidyr and ggplot2
library(RColorBrewer)
# Drop samples without an HRD score
dat = drop_na(scores, HRD_Score)
# Order cancer types by decreasing median HRD score
su = group_by(dat, Project) %>%
  summarise(med = median(HRD_Score)) %>%
  arrange(desc(med))
dat$Project = factor(dat$Project, levels = su$Project)
# Stretch the 8-color Set1 palette to one color per cancer type
mypalette <- colorRampPalette(brewer.pal(8, "Set1"))
# guide_legend(nrow = 3, byrow = TRUE) lays the legend out in 3 rows
ggplot(dat, aes(x = Project, y = HRD_Score, fill = Project)) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.x = element_text(vjust = 1, hjust = 1, angle = 45), legend.position = "bottom") +
  scale_fill_manual(values = mypalette(nlevels(dat$Project))) +
  guides(fill = guide_legend(nrow = 3, byrow = TRUE))

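Every sample in this dataset is a tumor sample. Since the boxes are ordered by median, the plot shows that ovarian cancer (OV) has the highest median HRD score.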
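The ordering step in the plotting code (group_by, summarise, arrange, then factor) can also be done in one line with base R's reorder(). The sketch below uses a toy data frame with made-up scores, not the TCGA values, just to show the idea.

```r
# Toy data (hypothetical scores for three cancer types).
toy <- data.frame(
  Project   = c("ACC", "ACC", "OV", "OV", "OV", "LUAD", "LUAD"),
  HRD_Score = c(7, 9, 40, 55, 48, 20, 30),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by increasing FUN value, so negate the score
# to get levels in order of decreasing median.
toy$Project <- reorder(factor(toy$Project), -toy$HRD_Score, FUN = median)
levels(toy$Project)
```

With these toy numbers the medians are 48 (OV), 25 (LUAD) and 8 (ACC), so the levels come out in that order. Samples with missing scores should be dropped first, as the post does with drop_na().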

3. Annotations for the column names

xs = rio::import_list("TCGA_DDR_Data_Resources.xlsx")
names(xs)

##  [1] "DDR genes and pathways"          "DDR gene alterations ONCOPRINT" 
##  [3] "DDR gene mutations"              "DDR deep deletions"             
##  [5] "DDR epigenetic silencing"        "DDR gene alterations"           
##  [7] "DDR footprint summary"           "DDR footprints"                 
##  [9] "DDR pathway alterations ONCOPR " "DDR pathway mutations "         
## [11] "DDR pathway deletions"           "DDR pathway silencing"          
## [13] "DDR pathway alterations"         "ME CO analysis core pathways"   
## [15] "ME CO analysis inclusive pathw " "DDR Survival Univariate"        
## [17] "DDR Survival Multivariate"       "DDR gene fusions"               
## [19] "TP53 predictor"

xs[[7]][14:17,1]
## [1] "TAI"       "LST"       "HRD LOH"   "HRD Score"
xs[[7]][14:17,2]
## [1] "number of subchromosomal regions with allelic imbalance extending to the telomere"           
## [2] "number of chromosomal breaks between adjacent regions of at least 10Mb"                      
## [3] "the number of LOH regions of intermediate size (> 15MB but < whole chromosome in length)"    
## [4] "Homologous recombination deficiency score calculated from three scores (TAI + LST + HRD LOH)"
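Indexing the sheet by row numbers (14:17) works, but it breaks if the sheet layout ever changes. A named lookup is one alternative. The sketch below rebuilds the four rows printed above in a small data frame so that it runs on its own. The column names `name` and `description` are hypothetical, since the real sheet's headers are not shown here.

```r
# The four annotation rows printed above, typed in by hand.
footprints <- data.frame(
  name = c("TAI", "LST", "HRD LOH", "HRD Score"),
  description = c(
    "number of subchromosomal regions with allelic imbalance extending to the telomere",
    "number of chromosomal breaks between adjacent regions of at least 10Mb",
    "the number of LOH regions of intermediate size (> 15MB but < whole chromosome in length)",
    "Homologous recombination deficiency score calculated from three scores (TAI + LST + HRD LOH)"
  ),
  stringsAsFactors = FALSE
)

# Named character vector: look up a description by footprint name.
lookup <- setNames(footprints$description, footprints$name)
lookup[["LST"]]
```

On the real workbook the same idea would apply to the sheet accessed as xs[["DDR footprint summary"]], using whatever its first two columns are actually called.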