VC-Dimension and Rademacher Comp

2020-05-20 本文已影响0人抄书侠

VC-Dimension和Rademacher complexity是机器学习中常提到的度量复杂的的概念，一直远观而没有亵玩，今天对这个概念进行学习记录。

VC-Dimension

全称为Vapnik-Chervonenkis dimension，从wiki上搞来一段定义

In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the capacity (complexity, expressive power, richness, or flexibility) of a space of functions that can be learned by a statistical classification algorithm. It is defined as the cardinality of the largest set of points that the algorithm can shatter. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.^[1]

即：在VC 理论中，VC维度是一种度量由统计分类算法得到的函数空间的容量(复杂度，表达能力，丰富性或灵活性)的方法。它被定义为算法可以破坏的最大点集的基数。

集合族的VC维

令 $H$ 为集合族， $C$ 是一个集合。它们的交被定义为
$H \cap C:=\{h \cap C | h \in H\}$
我们说集合 $C$ 被 $H$ shatter，如果 $H\cap C$ 包含 $C$ 的所有子集，即：
$|H \cap C|=2^{|C|}$
$H$ 的VC维 $D$ 是被 $H$ 破坏的最大基数，如果任意大的子集能被破坏，那么VC维是 $\infty$ 。

分类模型的VC维

一个分类模型 $f$ 有着参数向量 $\theta$ 被称为shatter一个集合的数据点 $x_1,x_2,\ldots,x_n$ 如果对这些数据点所有赋予的标签，都存在 $\theta$ 使得 $f$ 在评估这个集合的数据点的时候无错误。
模型 $f$ 的VC维是能被 $f$ shatter的最大点数。

例子

例如一条直线在平面分类，那么最多二分类的点数为3，所以VC维为3.

Rademacher complexity

In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of real-valued functions with respect to a probability distribution.

在计算学习理论中(机器学习和计算理论)，Rademacher comlexity由Hans Rademacher命名，度量一类基于概率分布实值函数的丰富性。

集合的Rademacher comlexity

给定集合 $A\subset \mathbb{R}^m$ ， $A$ 的Rademacher complexity定义如下：
$\operatorname{Rad}(A):=\frac{1}{m} \mathbf{E}\left[\sup _{a \in A} \sum_{i=1}^{m} \sigma_{i} a_{i}\right]$
其中 $\sigma_1,\sigma_2,\ldots,\sigma_m$ 是Rademacher分布得到的独立随机变量。 $\operatorname{Pr}\left(\sigma_{i}=+1\right)=\operatorname{Pr}\left(\sigma_{i}=-1\right)=1 / 2$ for $i=1,2, \ldots, m,$ and $a=\left(a_{1}, \ldots, a_{m}\right)$ 。许多作者在取上界前取了和的绝对值，但是如果 $A$ 是对称的，那么这就没区别。

函数类的Rademacher comlexity

给定样本 $S=\left(z_{1}, z_{2}, \dots, z_{m}\right) \in Z^{m}$ ，类别 $F$ 定义在空间 $Z$ ， $F$ 的实验Rademacher complexity对给定的 $S$ 定义为：
$\operatorname{Rad}_{S}(F)=\frac{1}{m} \mathrm{E}\left[\sup _{f \in F} \sum_{i=1}^{m} \sigma_{i} f\left(z_{i}\right)\right]$
这也可以使用之前的定义来写：
$\operatorname{Rad}_{S}(F)=\operatorname{Rad}(F \circ S)$
其中 $F \circ S$ 表示函数复合，即：
$F \circ S:=\left\{\left(f\left(z_{1}\right), \ldots, f\left(z_{m}\right)\right) | f \in F\right\}$
令 $P$ 是 $Z$ 的概率分布。函数类 $F$ 的Rademacher complexity基于 $P$ 且样本大小为 $m$ 为：
$\operatorname{Rad}_{P, m}(F):=\mathrm{E}_{S \sim P^{m}}\left[\operatorname{Rad}_{S}(F)\right]$
其中上述期望都是从 $P$ 中采样独立同分布的样本