Introduction to FDR and z-score
2022-03-13
期待未来
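- z-score: In practical data analysis, when we need to compare values from different data sets (whose means and standard deviations both differ), we usually bring in the z-score. A z-score is computed by subtracting the mean from a value and then dividing by the standard deviation: z = (x - mean) / SD. Applying this z-transformation to the raw data converts each data set into a new distribution with mean 0 and standard deviation 1.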
- FDR (false discovery rate): Many bioinformatics tools use the FDR to control false positives. In RNA-seq differential expression analysis, for example, the adjusted p-value (padj) is the p-value after FDR correction. The FDR exists to control the false positives that pile up when many comparisons are made at once. Any single statistical test carries some false positive probability, equal to the significance level α, which is usually set to 5%; this level is generally considered acceptable. In more detail: When conducting multiple comparisons (e.g. thousands of hypothesis tests are often conducted simultaneously when analyzing results from genome-wide studies) there is an increased probability of false positives. While there are a number of approaches to overcome problems due to multiple testing, most of them attempt to reduce the p-value threshold from 5% to a more reasonable value. In 1995, Benjamini and Hochberg introduced the concept of the False Discovery Rate (FDR) as a way to allow inference when many tests are being conducted. The FDR is the expected ratio of the number of false positive results to the number of total positive test results. A p-value threshold of 0.05 implies that 5% of all truly null tests are expected to result in false positives. An FDR-adjusted p-value (also called q-value) threshold of 0.05 indicates that 5% of significant tests are expected to be false positives. In other words, an FDR of 5% means that, among all results called significant, about 5% are expected to be truly null.
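
Both ideas above can be sketched in a few lines of Python. This is only an illustrative sketch under some assumptions, not the implementation any particular tool uses: the function names `z_scores` and `bh_adjust` are hypothetical, the z-score uses the population standard deviation (the text does not say which one is meant), and the FDR correction follows the usual step-up form of the Benjamini-Hochberg procedure. The sample numbers are made up purely for demonstration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = mean(values)
    # Population SD is an assumption; the text does not specify which SD to use.
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mu) / sigma for x in values]


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up example data.
    heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
    print(z_scores(heights_cm))

    raw_p = [0.01, 0.04, 0.03, 0.005]
    adj_p = bh_adjust(raw_p)
    print(adj_p)
    # Tests called significant at an FDR of 5%.
    print([p <= 0.05 for p in adj_p])
```

Comparing each adjusted value with 0.05 is what "controlling the FDR at 5%" amounts to in practice. The adjusted values are never smaller than the raw p-values, so fewer tests pass than with an unadjusted 0.05 cutoff.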