再读****(机器学习)Dynamic Mobile Edge

2018-02-28 本文已影响35人生病喝药水

Peng Yang∗, Ning Zhang†, Shan Zhang†, Li Yu∗, Junshan Zhang‡, and Xuemin (Sherman) Shen†
∗School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
†Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
‡School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona, USA
Email: ∗{yangpeng, hustlyu}@hust.edu.cn, †{n35zhang, s372zhan, sshen}@uwaterloo.ca, ‡junshan.zhang@asu.edu

本文主要围绕机器学习中的岭回归。
Linear model是为了获取线性回归系数做准备；Location differentiated content caching algorithm即是获取回归系数；regret analysis即是回归的显著性检验

Online learning algorithm

Hit rate definition

The number of content Requests

Linear model

为下文中的机器学习做准备.png

Problem definition: max->min

问题最初

F*nt定义

long-term regret定义

因本文中的hit rate的定义为content Requests rather than rate。最优的文件缓存策略中文件被请求次数与正常情况下的文件请求次数相差越少越好。

问题最终形式

Location differentiated content caching algorithm

A. Predicting content hit rate

缓存需要考虑update,更新时间在本文中用文件f在n路侧单元第几次缓存表示。即：文件在路侧单元的缓存频率

historical feature vectors of file.png

。实际如何更新并未有明确解释。

机器学习优化.png
其中，

经验值的定义.png

显著性检验部分仍有看不懂。。。再说吧

霍夫丁不等式(此处理解并不系统，待后再补。。。)

常见形式

image.png

N为样本容量。具体表示见下图。

image.png

对于多个预测点：
抽样存在很多情况，难免出现Bad sample（Ein和Eout相差很大的sample）。霍夫丁不等式说明针对一个h出现bad sample的几率很小。但是当有很多个h时，bad data就很可能出现（如课件中抛硬币的例子），当bad sample的Ein又很小时，我们作出选择时就会worse情况。Bad sample也就是Bad Data。

image.png

含义：给出二项分布均值偏离期望的概率上限(置信区间)

一般形式

image.png
证明过程没看懂。。。霍夫丁不等式及其证明
举例
从罐子中抽样100个小球，其中70个绿色，30个橙色。绿球的比例抽样统计值v=70%,以ϵ=10%为衡量标准，不等式变为：

P[|0.7-u|>0.1]<=2exp(-20.01100)=0.27

即：真实概率落在[0.6,0.8]区间内的概率为73%；有27%的概率，真实统计值与抽样统计值相差超过10%。