GEO - Microarray Data Statistics Study Notes

R | Illumina bead-series arrays: lumi

2021-04-19  高大石头

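Illumina is one of the three major microarray manufacturers. The lumi package targets Illumina's bead-based expression and methylation arrays: it performs quality control and normalization and returns an expression matrix. Its wrapper function lumiExpresso() bundles four steps, B, T, N and Q (background correction, transformation, normalization, quality control), and runs them in that order.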

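A LumiBatch object is obtained by reading files that have already been processed by Illumina's BeadStudio toolkit. lumiR() reads a single file, as in the example below; lumiR.batch() reads several files and combines them.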

The examples below use GSE40553.

1. Illumina BeadStudio (GenomeStudio) file

1.1 Downloading and reading the data

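gse <- "GSE40553"
library(lumi)
library(GEOquery)
#getGEOSuppFiles(gse)  # download the supplementary files into ./GSE40553/ (run once)
#setwd(gse)
#R.utils::gunzip("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt.gz")
# Read the BeadStudio output into a LumiBatch object
x.lumi <- lumiR("GSE40553_Original_IlluinaFile_FromBeadStudio_UKpatients.txt")
## Perform Quality Control assessment of the LumiBatch object ...
pd <- pData(phenoData(x.lumi))  # sample annotation (phenotype data)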

1.2 Quality control and normalization

data.eset <- lumiExpresso(x.lumi)  # background correction, VST, quantile normalization, QC
## Background Correction: bgAdjust 
## Variance Stabilizing Transform method: vst 
## Normalization method: quantile 
## 
## 
## Background correction ...
## Perform bgAdjust background correction ...
## The data has already been background adjusted!
## done.
## 
## Variance stabilizing ...
## Perform vst transformation ...
## 2021-04-13 18:20:28 , processing array  1 
## 2021-04-13 18:20:28 , processing array  2 
## ...
## 2021-04-13 18:20:31 , processing array  39 
## done.
## 
## Normalizing ...
## Perform quantile normalization ...
## done.
## 
## Quality control after preprocessing ...
## Perform Quality Control assessment of the LumiBatch object ...
## done.
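data.exprs <- exprs(data.eset)  # normalized expression matrix (probes x arrays)
colors <- rainbow(ncol(data.exprs)*1.2)  # palette slightly longer than the number of arrays
boxplot(data.exprs, col = colors, main = "expression value1", las = 2)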

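Note: with its default settings, lumiExpresso() performed background correction (bgAdjust, skipped here because the data had already been background adjusted), a variance-stabilizing transform (vst, whose output is on a log2-like scale) rather than a plain log2 transform, and quantile normalization. That already covers the whole preprocessing.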
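lumiExpresso() used quantile normalization here. To make the idea concrete, below is a minimal base-R sketch of textbook quantile normalization on simulated data. The helper quantile_normalize() and the data are made up for illustration; lumi's own implementation may handle ties differently, so in practice lumiN(x, method = "quantile") remains the route to use.

```r
# Sketch: textbook quantile normalization (rows = probes, columns = arrays).
# Assumes a numeric matrix without missing values; quantile_normalize() is a
# hypothetical helper, not a lumi function.
quantile_normalize <- function(m) {
  sorted <- apply(m, 2, sort)                        # sort each array separately
  reference <- rowMeans(sorted)                      # mean value at each rank
  ranks <- apply(m, 2, rank, ties.method = "first")  # rank of each value within its array
  out <- apply(ranks, 2, function(r) reference[r])   # map every rank to the reference value
  dimnames(out) <- dimnames(m)
  out
}

# Simulated log2-scale data: 1000 probes on 4 arrays with shifted intensity levels
set.seed(1)
m <- sapply(c(0, 0.5, 1, 1.5), function(shift) rnorm(1000, mean = 8 + shift))
colnames(m) <- paste0("array", 1:4)

qn <- quantile_normalize(m)
round(apply(m, 2, median), 2)   # medians differ before normalization
round(apply(qn, 2, median), 2)  # and are identical afterwards
```

After normalization every array holds exactly the same set of values, only in a different order, which is why the boxes in the plot above line up.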

2. Non-normalized data

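rm(list = ls())  # start from a clean workspace
library(lumi)
data.Nnorm <- data.table::fread("GSE40553_non-normalized_UKLong.txt", data.table = FALSE)
head(data.Nnorm)  # one signal column per sample, each followed by its "Detection Pval" column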
##         REF_ID 2083-2 Detection Pval 2055-5 Detection Pval 2083-4
## 1 ILMN_1802380  378.4        0.00000  365.6        0.00000  401.2
## 2 ILMN_1893287  -20.7        0.98571    0.5        0.48571   12.5
## 3 ILMN_3238331  -10.6        0.79091   -2.0        0.58052   -5.6
## 4 ILMN_1736104   -7.2        0.71818   12.6        0.06883   -3.4
## 5 ILMN_1792389   23.5        0.03247   32.4        0.00000   70.7
## 6 ILMN_1854015   28.3        0.01299   17.7        0.02468   30.5
##   Detection Pval 2030-5 Detection Pval 2065-1 Detection Pval 2074-2
## 1        0.00000  224.9        0.00000  448.0        0.00000  588.1
## 2        0.12468   13.4        0.03117    6.0        0.27403    7.2
## 3        0.68701  -10.9        0.95325  -15.4        0.91818  -17.0
## 4        0.60649   -1.4        0.56494    1.4        0.43377   14.2
## 5        0.00000   20.9        0.00000   12.1        0.14026   60.1
## 6        0.00519   17.4        0.00390   17.7        0.07532   22.7
##   Detection Pval 2074-5 Detection Pval 2009-4 Detection Pval 2065-2
## 1        0.00000  402.4        0.00000  397.7        0.00000  246.6
## 2        0.27403    7.6        0.25714   -3.2        0.62987    2.3
## 3        0.92078  -13.8        0.85455    0.0        0.47792   -8.8
## 4        0.12208   26.4        0.01818    6.6        0.21818    0.2
## 5        0.00000   45.3        0.00130   26.3        0.00519   23.8
## 6        0.04545   33.1        0.00649   20.4        0.01818   13.0
##   Detection Pval 2030-4 Detection Pval 2004-2 Detection Pval 2055-1
## 1        0.00000  300.0        0.00000  504.2        0.00000  470.3
## 2        0.38442  -16.0        0.93117    6.6        0.28701   12.3
## 3        0.83766   -1.3        0.52987   -5.5        0.63636   -6.9
## 4        0.47273  -10.0        0.76883    2.5        0.41429   -9.3
## 5        0.00260   82.9        0.00000   26.6        0.02727   68.9
## 6        0.06364   10.0        0.19091   24.8        0.02987   30.4
##   Detection Pval 2055-2 Detection Pval 2083-3 Detection Pval 2074-4
## 1        0.00000  375.2        0.00000  520.5        0.00000  260.6
## 2        0.18442   12.1        0.14935    4.1        0.36234    0.7
## 3        0.67662   -4.9        0.64545   -3.6        0.59740    7.5
## 4        0.72857   -4.2        0.62597   -3.7        0.59740   -7.9
## 5        0.00000   31.4        0.00519   33.4        0.00260   76.0
## 6        0.01429   32.5        0.00130    5.9        0.31558   18.0
##   Detection Pval 2074-1 Detection Pval 2074-3 Detection Pval 2009-1
## 1        0.00000  291.0        0.00000  296.4        0.00000  439.8
## 2        0.45844    5.5        0.24156    2.3        0.37662   -0.7
## 3        0.21688   -0.7        0.51818   -1.8        0.56883   -9.0
## 4        0.77273   -0.3        0.50390    0.3        0.46234   -7.9
## 5        0.00000   14.7        0.04026   13.5        0.05844   18.1
## 6        0.04286   23.9        0.00519   22.0        0.00649   12.7
##   Detection Pval 2083-5 Detection Pval 2009-5 Detection Pval 2098-1
## 1        0.00000  337.9        0.00000  483.9        0.00000  194.9
## 2        0.53117    8.7        0.22078   -8.0        0.73377    5.4
## 3        0.82857  -17.1        0.95844   -6.5        0.70000  -16.7
## 4        0.78312    5.3        0.33117    1.1        0.44675    6.6
## 5        0.01818   40.2        0.00000   25.0        0.02208   46.4
## 6        0.06623   17.4        0.05844   17.8        0.06623   26.1
##   Detection Pval 2098-3 Detection Pval 2083-1 Detection Pval 2004-5
## 1        0.00000  293.9        0.00000  417.6        0.00000  315.8
## 2        0.32597    7.7        0.34805   15.5        0.09351   -8.3
## 3        0.90519  -10.0        0.69091    2.3        0.41688  -20.2
## 4        0.29740  -10.1        0.69091    7.9        0.23247    0.1
## 5        0.00130   21.2        0.07662   28.2        0.01039   32.1
## 6        0.02208   22.7        0.06364   24.1        0.02468   27.4
##   Detection Pval 2065-3 Detection Pval 2065-5 Detection Pval 2004-1
## 1        0.00000  166.2        0.00000  265.9        0.00000  432.9
## 2        0.70779    2.7        0.35714   -0.6        0.50519    2.7
## 3        0.94286   -3.3        0.62597    3.1        0.39481   -9.1
## 4        0.49481    3.4        0.32857   17.3        0.06364    4.6
## 5        0.00779   33.1        0.00000   56.7        0.00000   13.5
## 6        0.02597   26.7        0.00260   10.7        0.15714    5.1
##   Detection Pval 2065-4 Detection Pval 2030-2 Detection Pval 2030-1
## 1        0.00000  239.8        0.00000  244.5        0.00000  298.1
## 2        0.34935   -3.5        0.57922   15.9        0.06753    6.4
## 3        0.85714   -5.2        0.62597   -9.2        0.80000   -5.0
## 4        0.29351   -2.5        0.55844    8.6        0.18571    5.4
## 5        0.06494   26.4        0.02078   31.3        0.00649   51.5
## 6        0.27792   16.1        0.11558   17.4        0.05455   21.1
##   Detection Pval 2098-5 Detection Pval 2009-3 Detection Pval 2009-2
## 1        0.00000  128.8        0.00000  410.7        0.00000  515.7
## 2        0.29351   12.9        0.12208  -12.0        0.86234    6.2
## 3        0.62727  -11.9        0.83377  -11.6        0.85325   -1.7
## 4        0.32468   -9.1        0.75325    8.6        0.21299  -12.4
## 5        0.00000   21.1        0.03247    6.3        0.26883   22.1
## 6        0.04416    7.8        0.23377   -3.2        0.57532   18.6
##   Detection Pval 2098-4 Detection Pval
## 1        0.00000  164.3        0.00000
## 2        0.28571    1.5        0.38961
## 3        0.54805   -0.4        0.47403
## 4        0.84156    0.9        0.41558
## 5        0.03766   22.9        0.00909
## 6        0.06883    8.8        0.16883
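The table alternates one signal column per sample with a "Detection Pval" column that carries no sample name. read.ilmn() in the next step takes care of this pairing; to show the layout explicitly, here is a minimal base-R sketch that splits such a table into a signal matrix and a detection p-value matrix. It assumes each "Detection Pval" column directly follows its sample column, as in the head() output above; split_ilmn_table() and the toy values are hypothetical.

```r
# Hypothetical helper: split an Illumina-style table whose columns are
# ID, sample1, "Detection Pval", sample2, "Detection Pval", ...
split_ilmn_table <- function(df, id_col = "REF_ID", pval_label = "Detection Pval") {
  nm <- names(df)
  p_idx <- which(nm == pval_label)                 # detection p-value columns
  e_idx <- which(nm != pval_label & nm != id_col)  # signal columns (one per sample)
  # Assumption: every p-value column sits right after its sample column
  stopifnot(length(e_idx) == length(p_idx), all(p_idx == e_idx + 1))
  expr <- as.matrix(df[, e_idx, drop = FALSE])
  pval <- as.matrix(df[, p_idx, drop = FALSE])
  colnames(pval) <- colnames(expr)                 # label p-values by sample
  rownames(expr) <- df[[id_col]]
  rownames(pval) <- df[[id_col]]
  list(E = expr, detection = pval)
}

# Toy table with made-up values in the same layout as the GSE40553 file
toy <- data.frame(
  REF_ID = c("probe1", "probe2", "probe3"),
  S1 = c(350.0, -12.0, 25.0),
  "Detection Pval" = c(0.000, 0.900, 0.030),
  S2 = c(360.0, 1.0, 30.0),
  "Detection Pval" = c(0.000, 0.400, 0.001),
  check.names = FALSE
)
res <- split_ilmn_table(toy)
res$E
res$detection
```

On the real file the call would be split_ilmn_table(data.Nnorm).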

2.1 Reading the data

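library(limma)
# expr is matched against the column headers to locate the signal columns; the default "AVG_Signal"
# matches nothing here, so it must be set or read.ilmn() errors. "20" occurs in every sample ID.
data.non <- read.ilmn("GSE40553_non-normalized_UKLong.txt", probeid = "REF_ID",
                      other.columns = "Detection Pval", sep = "\t", expr = "20")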
## Reading file GSE40553_non-normalized_UKLong.txt ... ...
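Because expr is a plain string matched against the headers, it is worth checking in advance which columns it selects. Judging from the sample names in section 2.3 (83-2 instead of 2083-2), read.ilmn() also removes the expr string from the names it keeps. The base-R sketch below mimics both effects on a few of this file's headers; it approximates that behavior and is not limma's code.

```r
# A few column headers copied from GSE40553_non-normalized_UKLong.txt
hdr <- c("REF_ID", "2083-2", "Detection Pval", "2055-5", "Detection Pval",
         "2009-2", "Detection Pval")

# Which headers contain the expr string "20"? Only sample columns should match.
is_expr <- grepl("20", hdr, fixed = TRUE)
expr_cols <- hdr[is_expr]
expr_cols     # "2083-2" "2055-5" "2009-2"

# Dropping the matched string reproduces the shortened sample names
short_names <- sub("20", "", expr_cols, fixed = TRUE)
short_names   # "83-2" "55-5" "09-2"
```

A sample ID that does not contain "20" would be skipped silently, so comparing the number of matched columns with the expected 34 samples is a cheap sanity check.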

2.2 Preprocessing

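# normexp background correction, quantile normalization and log2 transform; no negative-control
# file is supplied here, so the background is estimated from the detection p-values
data.exp <- neqc(data.non, detection.p = "Detection Pval")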
dim(data.exp)
## [1] 47323    34
data.exp1 <- data.exp$E
colors <- rainbow(ncol(data.exp1)*1.2)
boxplot(data.exp1, col = colors, las = 3, main = "neqc-after")

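As the plot shows, neqc() performs background correction, normalization and log transformation in a single call, which is convenient. What remains is the usual differential-expression analysis and probe annotation. See also: Q: Microarray data normalization workflow?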
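After neqc() the boxes line up by construction, because quantile normalization gives every array the same distribution; the boxplot is therefore most informative on data before normalization, where it can reveal an array that is shifted as a whole. Below is a minimal numeric companion to that visual check. flag_shifted_arrays() and the 0.5 log2-unit tolerance are hypothetical choices, and the data are simulated.

```r
# Hypothetical QC helper: flag arrays whose median is far from the typical median.
# m: probes x arrays matrix on a log2-like scale, without missing values.
flag_shifted_arrays <- function(m, max_shift = 0.5) {
  med <- apply(m, 2, median)     # per-array median (the line inside each box)
  dev <- abs(med - median(med))  # distance from the median of the medians
  names(med)[dev > max_shift]
}

# Simulated data: 5 arrays of 1000 probes, the last one shifted up by 2 log2 units
set.seed(42)
m <- matrix(rnorm(5000, mean = 8), ncol = 5,
            dimnames = list(NULL, paste0("array", 1:5)))
m[, 5] <- m[, 5] + 2

flag_shifted_arrays(m)   # "array5"
```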

2.3 Probe filtering

data.non$other$`Detection Pval`[1:4,1:4]
##                 83-2    55-5    83-4    30-5
## ILMN_1802380 0.00000 0.00000 0.00000 0.00000
## ILMN_1893287 0.98571 0.48571 0.12468 0.03117
## ILMN_3238331 0.79091 0.58052 0.68701 0.95325
## ILMN_1736104 0.71818 0.06883 0.60649 0.56494
# keep probes detected (p < 0.05) in at least 3 arrays
index <- rowSums(data.exp$other$`Detection Pval` < 0.05) >= 3
table(index)
## index
## FALSE  TRUE 
## 23425 23898
data.exp2 <- data.exp[index,]
dim(data.exp2)
## [1] 23898    34
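The filter keeps a probe when its detection p-value is below 0.05 in at least 3 arrays. Applied to the 4 x 4 excerpt of detection p-values printed above (first four arrays only), the same rule looks like this; detected_in_n() is a hypothetical wrapper added for readability.

```r
# Detection p-values copied from the 4 x 4 excerpt shown above
pvals <- matrix(
  c(0.00000, 0.00000, 0.00000, 0.00000,
    0.98571, 0.48571, 0.12468, 0.03117,
    0.79091, 0.58052, 0.68701, 0.95325,
    0.71818, 0.06883, 0.60649, 0.56494),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ILMN_1802380", "ILMN_1893287", "ILMN_3238331", "ILMN_1736104"),
                  c("83-2", "55-5", "83-4", "30-5"))
)

# Hypothetical wrapper around the rowSums() rule used above
detected_in_n <- function(p, cutoff = 0.05, min_arrays = 3) {
  rowSums(p < cutoff) >= min_arrays
}

keep <- detected_in_n(pvals)
keep            # only ILMN_1802380 is detected in >= 3 of these 4 arrays
pvals[keep, , drop = FALSE]
```

Lowering cutoff or raising min_arrays makes the filter stricter; the post uses 0.05 and 3.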

