[Paper Reading] User Profiling based Deep

2018-04-18 大魔王是本人

Terminology

word2vec:

An introduction to word2vec

doc2vec embedding:

Recommendation System

User profile

R: the number of articles the user has read

r_h: the doc2vec embedding of the h-th article read

U: the user profile, which accounts for both the short-term and the long-term interests of the user.

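Three forms of user profile are used in order to capture temporal patterns. The discounted variants give more weight to recently read articles and less weight to older ones.

Centroid representation

Base-2 discount

Exponential (base-e) discount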
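The notes do not give the three formulas. A minimal NumPy sketch follows. It assumes the centroid is the plain mean of the r_h and that the discounted variants weight article h by 1/2^(R-h) and 1/e^(R-h) respectively, so the most recent article (h = R) gets weight 1. These weightings are assumed forms, divided by R to mirror the centroid. The function name `user_profiles` is hypothetical.

```python
import numpy as np

def user_profiles(embeddings):
    """Compute three user-profile variants from a user's reading history.

    embeddings: array-like of shape (R, d); row h-1 holds r_h, the doc2vec
    vector of the h-th article read, in chronological order (last row = newest).
    The discount weights are ASSUMED forms, not taken from the paper.
    """
    r = np.asarray(embeddings, dtype=float)
    R = r.shape[0]
    age = np.arange(R - 1, -1, -1)        # R - h for h = 1..R; newest article has age 0
    centroid = r.mean(axis=0)             # plain average of all read articles
    w_base2 = 2.0 ** (-age)               # assumed weight 1 / 2^(R-h)
    w_exp = np.exp(-age)                  # assumed weight 1 / e^(R-h)
    base2 = (w_base2[:, None] * r).sum(axis=0) / R
    exponential = (w_exp[:, None] * r).sum(axis=0) / R
    return centroid, base2, exponential
```

Dividing by the sum of the weights instead of R would be an equally plausible normalization. For a fixed R it only rescales U.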

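DSSM (Deep Structured Semantic Model)

Model overview. The left side computes the user profile. The right side takes one positive sample (item+, one further article chosen apart from the already-read articles used for the profile) and n negative samples (randomly sampled articles the user has not read).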
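A minimal sketch of assembling one training instance. It assumes the last article in a user's chronological reading list serves as item+, the earlier ones feed the profile, and negatives are sampled uniformly without replacement from unread articles. `make_training_instance` and its parameters are hypothetical.

```python
import random

def make_training_instance(read_articles, all_articles, n_neg, rng=random):
    """Build one (history, positive, negatives) training instance.

    ASSUMPTION: the last read article is held out as item+, the earlier ones
    form the profile history, and n_neg negatives are drawn uniformly
    (without replacement) from articles the user has not read.
    """
    history, positive = read_articles[:-1], read_articles[-1]
    read = set(read_articles)
    unread = [a for a in all_articles if a not in read]
    negatives = rng.sample(unread, n_neg)  # distinct unread articles
    return history, positive, negatives
```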

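Treating the targets as Gaussian (the assumption behind a squared-error loss) is no longer appropriate, because the latent data and the ranking also have to be taken into account. The loss function is therefore modified as follows.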

The posterior probability that the given user clicks an item, where item+ denotes the clicked item and R(·) denotes the inner-product function.
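The notes do not reproduce the formula. In the original DSSM formulation the posterior is a softmax over the positive and the sampled negatives. Written with the symbols above, and assuming a smoothing factor γ that the notes do not mention, it would read:

```latex
P(\mathrm{item}^+ \mid U) = \frac{\exp\big(\gamma\, R(U, \mathrm{item}^+)\big)}{\sum_{\mathrm{item}' \in I} \exp\big(\gamma\, R(U, \mathrm{item}')\big)},
\qquad I = \{\mathrm{item}^+, \mathrm{item}^-_1, \dots, \mathrm{item}^-_n\}

\mathcal{L} = -\log \prod_{(U,\ \mathrm{item}^+)} P(\mathrm{item}^+ \mid U)
```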

The model is trained to maximize this probability.
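Maximizing this probability is equivalent to minimizing its negative log. A minimal NumPy sketch of that loss for one user follows. It assumes the user and item vectors are the outputs of the two sides of the model and that R(·) is the inner product. γ is a hypothetical smoothing factor (default 1.0), and `dssm_loss` is a hypothetical name.

```python
import numpy as np

def dssm_loss(user_vec, pos_vec, neg_vecs, gamma=1.0):
    """Negative log posterior of item+ among 1 positive and n negatives.

    R(., .) is the inner product; gamma is an ASSUMED smoothing factor.
    """
    items = np.vstack([pos_vec, neg_vecs])          # row 0 is item+
    scores = gamma * items @ np.asarray(user_vec)  # gamma * R(U, item)
    scores = scores - scores.max()                 # stabilize the softmax
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[0]                           # -log P(item+ | U)
```

Subtracting the maximum score before exponentiating leaves the softmax unchanged but avoids overflow for large γ.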

Experiments

1. Settings

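Dataset: CLEF NewsREEL 2017. gensim is used to learn the doc2vec embeddings (vector size 300). In the dataset, 77% of users read fewer than 3 articles. To target the cold-start problem, users who read 10 to 15 articles are used for training and users who read 2 to 4 articles are used for testing. Users with more than 15 articles vary considerably in frequency (?), so they are excluded.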
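The user selection above amounts to a filter on history length. A sketch, assuming reading histories are available as a dict from user id to article list (a hypothetical format). `split_users_by_history` is a hypothetical name.

```python
def split_users_by_history(user_reads):
    """Split users into train/test groups by how many articles they read.

    user_reads: dict mapping a user id to that user's list of read articles.
    10 to 15 articles -> training, 2 to 4 -> testing; everyone else
    (including users with more than 15 articles) is dropped.
    """
    train, test = {}, {}
    for user, articles in user_reads.items():
        n = len(articles)
        if 10 <= n <= 15:
            train[user] = articles
        elif 2 <= n <= 4:
            test[user] = articles
    return train, test
```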

Evaluation uses the leave-one-out method. Performance is measured by HR@k, which checks whether the test item appears in the top-k list, and by NDCG, which accounts for the position of the hit by assigning higher scores to hits at top ranks.
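Under leave-one-out there is exactly one relevant item per test user, so both metrics reduce to simple functions of the held-out item's rank. A minimal sketch using the common formulation NDCG@k = 1/log2(rank + 2) for a 0-based rank (the ideal DCG is 1 when there is a single relevant item). The function names are hypothetical.

```python
import math

def hit_ratio_at_k(ranked_items, test_item, k):
    """1.0 if the held-out test item appears in the top-k list, else 0.0."""
    return 1.0 if test_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, test_item, k):
    """Single-relevant-item NDCG@k: 1 / log2(rank + 2) for a 0-based hit rank."""
    top_k = ranked_items[:k]
    if test_item not in top_k:
        return 0.0
    rank = top_k.index(test_item)  # 0-based position of the hit
    return 1.0 / math.log2(rank + 2)
```

The per-user scores are then averaged over all test users.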

Baselines (matrix factorization): BPR, eALS, NeuMF, and others (to be looked up).

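The model is implemented in Keras, with a 4:1 ratio between the training and validation sets. The weights of the fully connected layers are initialized from a uniform distribution within a range, the batch size is 256, and the optimizer is Adadelta.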
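A minimal NumPy sketch of the 4:1 split and 256-sample mini-batches, assuming the data are already in arrays `X` and `y` (hypothetical names). Keras's `model.fit` exposes the same settings through `batch_size=256` and `validation_split=0.2`. Note that `validation_split` takes the last 20% of the samples without shuffling them first, whereas this sketch shuffles before splitting.

```python
import numpy as np

def train_val_batches(X, y, batch_size=256, val_fraction=0.2, seed=0):
    """Shuffle, hold out val_fraction of the samples for validation
    (4:1 by default), and return the training set as mini-batches."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_val, y_val = X[val_idx], y[val_idx]
    batches = [(X[train_idx[i:i + batch_size]], y[train_idx[i:i + batch_size]])
               for i in range(0, len(train_idx), batch_size)]
    return batches, (X_val, y_val)
```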