A Roundup of Data Standardization Methods in R
2019-04-21
Oodelay
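decostand is a function in vegan, an R package widely used in community ecology; it provides many mainstream and efficient data standardization methods.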
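Basic syntax
decostand(x, method, MARGIN, range.global, logbase = 2, na.rm = FALSE, ...)
Standardization, as opposed to transformation, expresses each value relative to other entries (for example the row or column total). It aims to reduce the heterogeneity that differences in magnitude, units and the like introduce into the data.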
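To make the idea of relative values concrete, here is a minimal base-R sketch. The matrix `counts` is hypothetical: two samples with the same composition but a tenfold difference in total count.

```r
# Hypothetical counts: same composition, tenfold difference in total count
counts <- matrix(c(10, 30, 60,
                   100, 300, 600),
                 nrow = 3,
                 dimnames = list(paste0('OTU_', 1:3), c('smp1', 'smp2')))

# Raw column totals differ by an order of magnitude
colSums(counts)

# Dividing each column by its total gives relative abundances,
# which are identical for the two samples
rel <- sweep(counts, 2, colSums(counts), `/`)
rel
```

After standardization the two columns are directly comparable even though the raw magnitudes were not.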
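Example
library('vegan')
library('dplyr')   # provides the %>% pipe used below
# 20 OTUs (rows) x 5 samples (columns), filled with a random permutation of 1..100
(dat = matrix(sample(seq(100)),nrow = 20,dimnames = list(paste0('OTU_',seq(20)),paste0('smp',seq(5)))))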
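- Divide by the margin total so that each sum equals 1 ('total'; default MARGIN = 1, rows)
decostand(dat,'total') %>% rowSums()      # each row (OTU) sums to 1
decostand(dat,'total',2) %>% colSums()    # each column (sample) sums to 1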
- 1.1 Other ways to divide each column by its sum so that every column sums to 1
t(t(dat)/colSums(dat))
dat/matrix(rep(colSums(dat),nrow(dat)), nrow = nrow(dat), byrow = TRUE)
sweep(dat,2,colSums(dat),`/`)
scale(dat, center=FALSE, scale=colSums(dat))
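As a quick sanity check, the four base-R alternatives can be compared against each other. This is only a sketch: the seed is arbitrary and the variable names (`by_t`, `by_matrix`, `by_sweep`, `by_scale`) are made up for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
dat <- matrix(sample(seq(100)), nrow = 20,
              dimnames = list(paste0('OTU_', seq(20)), paste0('smp', seq(5))))

by_t      <- t(t(dat) / colSums(dat))
by_matrix <- dat / matrix(rep(colSums(dat), nrow(dat)),
                          nrow = nrow(dat), byrow = TRUE)
by_sweep  <- sweep(dat, 2, colSums(dat), `/`)
by_scale  <- scale(dat, center = FALSE, scale = colSums(dat))

all.equal(by_t, by_matrix)
all.equal(by_t, by_sweep)
# scale() attaches a "scaled:scale" attribute, so compare values only
all.equal(by_t, by_scale, check.attributes = FALSE)

colSums(by_t)  # every column sums to 1
```

Of the four, `sweep()` states the intent most directly, because the margin and the operator are both explicit.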
- Divide by the margin maximum ('max'; default MARGIN = 2, columns)
decostand(dat,'max') %>% summary()
decostand(dat,'max',1) %>% summary()
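A base-R sketch of the same computation, assuming the definition of 'max' in the vegan documentation (divide by the margin maximum). The toy matrix `x` is hypothetical.

```r
# Hypothetical 3 x 3 abundance matrix
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# Column-wise: divide every column by its own maximum
x_max <- sweep(x, 2, apply(x, 2, max), `/`)
x_max

apply(x_max, 2, max)  # the largest value in each column is now 1
```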
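- Scale so that the mean of the non-zero entries is 1 ('frequency'; default MARGIN = 2, columns). dat contains no zeros, so here the plain means are 1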
decostand(dat,'frequency') %>% colMeans()
decostand(dat,'frequency',1) %>% rowMeans()
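The role of zeros is easier to see on a matrix that has some. The sketch below follows the documented definition of 'frequency' (divide by the margin total, multiply by the number of non-zero items); the toy matrix `x` and the helper names are hypothetical.

```r
# Hypothetical 3 x 3 abundance matrix containing zeros
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

col_tot   <- colSums(x)      # column totals
n_nonzero <- colSums(x > 0)  # number of non-zero entries per column

# Divide by the column total, multiply by the number of non-zero entries
x_freq <- sweep(x, 2, n_nonzero / col_tot, `*`)

# Mean over the non-zero entries only is 1 in every column
nonzero_mean <- apply(x_freq, 2, function(v) mean(v[v > 0]))
nonzero_mean

# The plain column mean is below 1 because zeros are counted too
colMeans(x_freq)
```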
- Scale so that the sum of squares equals 1 ('normalize'; default MARGIN = 1, rows)
decostand(dat,'normalize') %>% apply(1,function(x) sum(x^2))
decostand(dat,'normalize',2) %>% apply(2,function(x) sum(x^2))
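In other words, each row (or column) is divided by its Euclidean length. A minimal base-R sketch, with a hypothetical toy matrix `x`:

```r
# Hypothetical 3 x 3 abundance matrix
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# Row-wise: divide each row by the square root of its sum of squares.
# A vector of length nrow(x) is recycled down the columns, so element
# [i, j] is divided by the i-th row norm.
x_norm_row <- x / sqrt(rowSums(x^2))
rowSums(x_norm_row^2)

# Column-wise variant
x_norm_col <- sweep(x, 2, sqrt(colSums(x^2)), `/`)
colSums(x_norm_col^2)
```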
- Rescale to the range 0~1 ('range'; default MARGIN = 2, columns)
decostand(dat,'range') %>% summary() # same as apply(dat, 2, function (x) (x-min(x))/(max(x)-min(x)))
decostand(dat,'range',1) %>% summary() # same as t(apply(dat, 1, function (x) (x-min(x))/(max(x)-min(x))))
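A worked min-max example in base R, using a small hypothetical matrix `y` whose column minima are not zero so that the subtraction step is visible:

```r
# Hypothetical matrix: columns are (2, 4, 10) and (10, 20, 30)
y <- matrix(c(2, 4, 10,
              10, 20, 30),
            nrow = 3,
            dimnames = list(paste0('OTU_', 1:3), c('smp1', 'smp2')))

col_min <- apply(y, 2, min)
col_max <- apply(y, 2, max)

# (value - min) / (max - min), column by column
y_range <- sweep(sweep(y, 2, col_min, `-`), 2, col_max - col_min, `/`)
y_range
```

Each column now runs from 0 (its former minimum) to 1 (its former maximum).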
- z-score standardization: mean 0, variance 1 ('standardize'; default MARGIN = 2, columns)
decostand(dat, 'standardize') %>% summary()
decostand(dat, 'standardize',1) %>% summary()
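The same result can be sketched with base R's `scale()`, which centers on the mean and divides by the sample standard deviation (n - 1 denominator). The toy matrix `x` is hypothetical.

```r
# Hypothetical 3 x 3 abundance matrix
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# Column-wise z-scores
z_col <- scale(x)
colMeans(z_col)      # numerically zero
apply(z_col, 2, sd)  # all 1

# Row-wise z-scores: transpose, scale, transpose back
z_row <- t(scale(t(x)))
rowMeans(z_row)      # numerically zero
apply(z_row, 1, sd)  # all 1
```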
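- chi.square: divide each row by its row sum, then divide each column by the square root of its column sum, and finally multiply by the square root of the matrix total (default MARGIN = 1)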
decostand(dat,'chi.square')
# (dat / rowSums(dat)) %*% diag(1/sqrt(colSums(dat))) * sqrt(sum(dat))
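The three steps can be written in base R in two equivalent ways. This is a sketch under the assumption that all row and column sums are non-zero; the toy matrix `x` and the names `chi_outer` and `chi_matmul` are hypothetical.

```r
# Hypothetical 3 x 3 abundance matrix (no empty rows or columns)
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# Element [i, j] becomes sqrt(total) * x[i, j] / (rowsum[i] * sqrt(colsum[j]))
chi_outer <- sqrt(sum(x)) * x / outer(rowSums(x), sqrt(colSums(x)))

# Same computation as a matrix product with a diagonal scaling matrix
chi_matmul <- (x / rowSums(x)) %*% diag(1 / sqrt(colSums(x))) * sqrt(sum(x))

# The product drops the column names, so compare values only
all.equal(chi_outer, chi_matmul, check.attributes = FALSE)
```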
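- log: log_b(x) + 1 for x > 0, with zeros left as zeros (base b = logbase, 2 by default); note that this is not log(x + 1)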
decostand(dat,'log') %>% summary()
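A short base-R sketch of this rule, assuming the definition in the vegan documentation (log_b(x) + 1 for positive values, zeros untouched) and base 2. The toy matrix `x` is hypothetical.

```r
# Hypothetical 3 x 3 abundance matrix containing zeros
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

x_log <- x
pos <- x > 0
# Only positive entries are transformed; zeros stay zero
x_log[pos] <- log(x[pos], base = 2) + 1
x_log

# For comparison: the more familiar log2(x + 1) gives different values
log2(x + 1)
```

With base 2, a count of 1 maps to 1, 2 maps to 2, and 4 maps to 3, whereas log2(x + 1) maps 2 to about 1.58.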
- Replace the values with their increasing ranks, leaving zeros unchanged ('rank'; default MARGIN = 1, rows)
decostand(dat,'rank',2)
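A column-wise version of this ranking can be sketched in base R. The helper `rank_nonzero` and the toy matrix `x` are hypothetical names introduced here; the sketch assumes, as the vegan documentation describes, that zeros are excluded from ranking and stay zero.

```r
# Hypothetical 3 x 3 abundance matrix containing zeros
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# Rank only the positive entries of a vector; zeros keep the value 0
rank_nonzero <- function(v) {
  r <- v
  r[v > 0] <- rank(v[v > 0])
  r
}

# MARGIN = 2: rank within each column (sample)
x_rank <- apply(x, 2, rank_nonzero)
x_rank
```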
- Convert to binary presence/absence, 0/1 ('pa')
decostand(dat,'pa') %>% summary()
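Because dat has no zeros, every entry becomes 1 in the example above. A hypothetical toy matrix `x` with some zeros shows the effect more clearly, here in plain base R:

```r
# Hypothetical 3 x 3 abundance matrix containing zeros
x <- matrix(c(0, 2, 4,
              1, 3, 0,
              5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0('OTU_', 1:3), paste0('smp', 1:3)))

# 1 where the OTU is present (count > 0), 0 where it is absent
x_pa <- ifelse(x > 0, 1, 0)
x_pa

# Column sums of a presence/absence matrix give the number of OTUs per sample
colSums(x_pa)
```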