科研-统计学相关

How to understand FDR, adjusted

2018-10-13  本文已影响121人  OmicsAcademy

The thing to understand is that terms like FDR and q-value were defined in

specific ways by their original inventors but are used in more generic

ways by later researchers who adapt, modify or use the ideas.

The term "false discovery rate (FDR)" was created by Benjamini and

Hochberg in their 1995 paper.  They gave a particular definition of what

they meant by FDR.  Their procedure accepted or rejected hypotheses, but

did not produce adjusted p-values.

Benjamini and Yekutieli presented another more conservative algorithm to

control the FDR in a 2001 paper.  Same definition of FDR, but a different

algorithm.

In 2002, I re-interpreted the Benjamini and Hochberg (BH) and Benjamini

and Yekutieli (BY) procedures in terms of adjusted p-values.  I

implemented the resulting algorithms in the function p.adjust() in the

stats package, and used them in the limma package, and this lead to the

concept of an FDR adjusted p-value.  The terminology used by the

p.adjust() function and limma packages has lead people to refer to "BH

adjusted p-values".

The adjusted p-value definition that you give is essentially the same as

the BH adjusted p-value, except that you omitted the last step in the

procedure.  Your definition as it stands is not an increasing function of

the original p-values.

In 2002, John Storey created a new definition of "false discovery rate".

Storey's definition is based on Benjamini and Hochberg's original idea,

but is mathematically a bit more flexible.  John Storey also created the

terminology "q-value" for a quantity estimates his definition of FDR.  He

implemented q-value estimation procedures in an R package called qvalue.

So, strictly speaking, the q-value and the FDR adjusted p-value are

similar but not quite the same.  However the terms q-value and FDR

adjusted p-value are often used generically by the Bioconductor community

to refer to any quantity that controls or estimates any definition of the

FDR.  In this general sense the terms are synonyms.

The lesson to draw from this is that different methods and different

packages are trying to do slighty different things and give slightly

different results, and you should always cite the specific software and

method that you have used.

Best wishes

Gordon

Reference:

https://stat.ethz.ch/pipermail/bioconductor/2012-December/049902.html

上一篇下一篇

猜你喜欢

热点阅读