差异分析:fold change(差异倍数)、P-value(差
在做基因表达分析时必然会要做差异分析(DE)
DE的方法主要有两种:
- Fold change
- t-test
fold change的意思是样本质检表达量的差异倍数,log2 fold change的意思是取log2,这样可以可以让差异特别大的和差异比较小的数值,之间的差距缩小。
Let's say there are 50 read counts in control and 100 read counts in treatment for gene A. This means gene A is expressing twice in treatment as compared to control (100 divided by 50 =2) or fold change is 2. This works well for over expressed genes as the number directly corresponds to how many times a gene is overexpressed. But when it is other way round (i.e, treatment 50, control 100), the value of fold change will be 0.5 (all underexpressed genes will have values between 0 to 1, while overexpressed genes will have values from 1 to infinity). To make this leveled, we use log2 for expressing the fold change. I.e, log2 of 2 is 1 and log2 of 0.5 is -1.
Q-value,是P-value校正值,P值是统计差异的显著性的。Q值比P值更严格的一种统计。
p-value 就是 t-test 来的:
The mean UMI counts per cell of this gene in cluster i
The log2 fold-change of this gene's expression in cluster i relative to other clusters
The p-value denoting significance of this gene's expression in cluster i relative to other clusters, adjusted to account for the number of hypotheses (i.e. genes) being tested.
10x流程解释
The differential expression analysis seeks to find, for each cluster, genes that are more highly expressed in that cluster relative to the rest of the sample. Here a differential expression test was performed between each cluster and the rest of the sample for each gene.
The Log2 fold-change (L2FC) is an estimate of the log2 ratio of expression in a cluster to that in all other cells. A value of 1.0 indicates 2-fold greater expression in the cluster of interest.
The p-value is a measure of the statistical significance of the expression difference and is based on a negative binomial test. The p-value reported here has been adjusted for multiple testing via the Benjamini-Hochberg procedure.
In this table you can click on a column to sort by that value. Also, in this table genes were filtered by (Mean UMI counts > 1.0) and the top N genes by L2FC for each cluster were retained. Genes with L2FC < 0 or adjusted p-value >= 0.10 were grayed out. The number of top genes shown per cluster, N, is set to limit the number of table entries shown to 10000; N=10000/K^2 where K is the number of clusters. N can range from 1 to 50. For the full table, please refer to the "differential_expression.csv" files produced by the pipeline.