R package:xcms(三):原始数据的检查
2024-04-14 本文已影响0人
佳名
library("xcms")
myfiles <- list.files(pattern = "^neg")
myfiles
pd <- data.frame(sample_name = sub(basename(myfiles), pattern = ".mzML",replacement = "",
fixed = TRUE),sample_group = c(rep("FA", 4), rep("HFD", 4)),stringsAsFactors = FALSE)
pd
data_raw <- readMSData(files = myfiles, pdata = new("NAnnotatedDataFrame", pd),mode = "onDisk")
定义分组颜色
group_colors <- c("blue","red")
names(group_colors) <- c("FA", "HFD")
tc <- split(tic(raw_data), f = fromFile(raw_data))
type(tc)
#[1] "list"
tic这个函数用于计算每个样本所有保留时间内的总的离子强度(包括二级质谱),split用于将计算得到的总离子强度根据样本拆分为列表。
par(mfrow = c(1, 1))
boxplot(tc, col = group_colors[raw_data$sample_group],
ylab = "intensity", main = "Total ion current")
TIC boxplot.png
tc <- spectra(raw_data) |>
tic() |>
split(f = fromFile(raw_data))
boxplot(tc, col = group_colors[sampleData(raw_data)$sample_group],
ylab = "intensity", main = "Total ion current")
计算样品之间的相关性
bpis <- chromatogram(raw_data, aggregationFun = "max")
#aggregationFun也可以为sum
## Bin the BPC
bpis_bin <- MSnbase::bin(bpis, binSize = 2)
bpis为列表,一个列表代表一个样品,此时每个列表的长度并不一样长,所以需要将bpi的每个列表按同等长度取样,取样间隔为2秒。
## Calculate correlation on the log2 transformed base peak intensities
cormat <- cor(log2(do.call(cbind, lapply(bpis_bin, intensity))))
将bpis_bin列表中的 intensity提取出来按列组成矩阵,求对数,然后求矩阵的相关系数。
#列名等于行名等于样品名
colnames(cormat) <- rownames(cormat) <- raw_data$sample_name
## Define which phenodata columns should be highlighted in the plot
ann <- data.frame(group = raw_data$sample_group)
rownames(ann) <- raw_data$sample_name
## Perform the cluster analysis
pheatmap(cormat, annotation = ann,
annotation_color = list(group = group_colors))
pheatmap.png
参考资料:
LC-MS data preprocessing and analysis with xcms (bioconductor.org)