DL notes[3]: classifier significance

2019-02-16  isSen

0. Cross Validation

# RepeatedKFold repeats K-Fold n times. 
# It can be used when one requires to run KFold n times, 
# producing different splits in each repetition.
# https://scikit-learn.org/stable/modules/cross_validation.html
import numpy as np
from sklearn.model_selection import RepeatedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
random_state = 12883823
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=random_state)
for train, test in rkf.split(X):
    print("%s %s" % (train, test))

http://appliedpredictivemodeling.com/blog/2014/11/27/vpuig01pqbklmi72b8lcl3ij5hj2qm
https://stats.stackexchange.com/questions/218060/does-repeated-k-fold-cross-validation-give-the-same-answers-each-time
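
A minimal sketch of how RepeatedKFold is typically combined with cross_val_score: each repetition contributes one score per fold, and the resulting vector of scores is what the significance tests below are applied to. The toy dataset and the LogisticRegression classifier are my own illustrative assumptions, not part of the references above.

# Sketch: collect per-split scores from repeated k-fold CV (toy data, illustrative only)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=12883823)
# one accuracy score per (fold, repeat) pair: 5 folds * 3 repeats = 15 scores
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=rkf)
print('%.3f +/- %.3f' % (scores.mean(), scores.std()))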

1. Nonparametric Statistical Significance Tests in Python

Friedman Test
If the samples are paired in some way, such as repeated measures, then the Kruskal-Wallis H test would not be appropriate. Instead, the Friedman test can be used, named for Milton Friedman.

The Friedman test is the nonparametric version of the repeated measures analysis of variance test, or repeated measures ANOVA. It can be thought of as the repeated-measures counterpart of the Kruskal-Wallis H test for comparing more than two samples.

# Friedman test
from numpy.random import seed
from numpy.random import randn
from scipy.stats import friedmanchisquare
# seed the random number generator
seed(1)
# generate three samples (drawn independently here for illustration; Friedman expects paired/repeated measures)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 50
data3 = 5 * randn(100) + 52
# compare samples
stat, p = friedmanchisquare(data1, data2, data3)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Same distributions (fail to reject H0)')
else:
    print('Different distributions (reject H0)')

https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/
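
Because the Friedman test assumes paired measurements, a more faithful sketch for classifier comparison is to feed it the per-split scores of several classifiers evaluated on the same RepeatedKFold splits. The dataset and the three classifiers below are illustrative assumptions of mine, not from the tutorial above.

# Sketch: Friedman test on paired CV scores of three classifiers (same splits for all)
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
models = [LogisticRegression(max_iter=1000), GaussianNB(), DecisionTreeClassifier(random_state=1)]
# scores[i, j] = score of model i on split j; columns are paired because cv is fixed
scores = np.array([cross_val_score(m, X, y, cv=cv) for m in models])
stat, p = friedmanchisquare(*scores)
print('Statistics=%.3f, p=%.3f' % (stat, p))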

2. Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric alternative to the dependent (paired-samples) t-test. It is also called the Wilcoxon T test, most commonly when the statistic value is reported as a T value.

from scipy import stats
# df is a pandas DataFrame of paired measurements (e.g. blood pressure before/after treatment)
stats.wilcoxon(df['bp_before'], df['bp_after'])

https://pythonfordatascience.org/wilcoxon-sign-ranked-test-python/
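
A self-contained sketch on synthetic paired data (the numbers below are made up for illustration); in the classifier setting the two arrays would be the per-fold scores of two models evaluated on the same CV splits.

# Sketch: Wilcoxon signed-rank test on synthetic paired measurements
import numpy as np
from scipy import stats

before = np.array([142, 138, 150, 145, 139, 147, 151, 143, 148, 140])
after  = np.array([137, 136, 148, 141, 135, 146, 149, 140, 144, 138])
stat, p = stats.wilcoxon(before, after)
print('Statistics=%.3f, p=%.3f' % (stat, p))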

3. Post-hoc Test

Suppose the Friedman test finds a statistically significant difference (p < 0.05); we then still need a follow-up post-hoc analysis. In other words, the Friedman test only tells us whether there is a significant difference among the algorithms; it does not tell us which algorithms differ in performance. To locate the specific pairs that differ, a post-hoc analysis is required.
Python packages: scikit-posthocs, Orange

import scikit_posthocs as sp
# each inner list is one group of measurements (groups may have different sizes)
tot_data = [[1, 2, 3], [4, 5, 6, 7], [6, 7, 8, 9, 10]]
sp.posthoc_nemenyi(tot_data)
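
Note that posthoc_nemenyi is the companion of the Kruskal-Wallis test (independent groups); for a Friedman-style block design (rows = CV splits/datasets, columns = algorithms) scikit-posthocs also provides posthoc_nemenyi_friedman. A minimal sketch, with made-up scores:

# Sketch: Nemenyi post-hoc for a Friedman block design
# rows = blocks (e.g. CV splits), columns = algorithms; numbers are illustrative
import numpy as np
import scikit_posthocs as sp

scores = np.array([[0.81, 0.79, 0.84],
                   [0.78, 0.77, 0.83],
                   [0.82, 0.80, 0.85],
                   [0.80, 0.78, 0.84]])
print(sp.posthoc_nemenyi_friedman(scores))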

Even when the Friedman test shows statistical significance, directly applying the Nemenyi test, the Conover test, or the Wilcoxon signed-rank test will not necessarily lead to the same conclusion (i.e. p < 0.05); the pairwise comparisons can also be done manually (see the sketch below).
https://www.zhihu.com/question/27306416/answer/372241948
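
A sketch of the manual pairwise comparison mentioned above, using Wilcoxon signed-rank tests with a Bonferroni correction; the algorithm names and score arrays are made-up placeholders for per-fold CV scores.

# Sketch: manual pairwise Wilcoxon signed-rank tests with Bonferroni correction
from itertools import combinations
import numpy as np
from scipy.stats import wilcoxon

scores = {
    'algo_A': np.array([0.81, 0.78, 0.82, 0.80, 0.79]),
    'algo_B': np.array([0.79, 0.77, 0.80, 0.78, 0.78]),
    'algo_C': np.array([0.84, 0.83, 0.85, 0.84, 0.82]),
}
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted significance level
for a, b in pairs:
    stat, p = wilcoxon(scores[a], scores[b])
    print('%s vs %s: p=%.4f (%s at corrected alpha=%.4f)'
          % (a, b, p, 'significant' if p < alpha else 'not significant', alpha))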
