An Analysis of the Deep Interest Network (DIN)

2019-03-21, 康英永

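1. Source data

1.1 Training data

Each line is one sample. Its fields are the user id, the list of item ids the user has visited (the history), the item the user visits this time (positive samples are real data; negative samples are randomly generated), and the positive/negative label.

1.2 Test data

Each line is one sample. Its fields are the user id, the list of item ids the user has visited, and a pair (the positive item the user actually visited, a negative item randomly generated by the system).

1.3 Categories

Each element is the category of the corresponding item.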
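As a rough illustration of the three structures described above, the following sketch builds one positive and one negative training sample, one test sample and a small category table in plain Python. All ids are made up, and the tuple layout and the helper `category_of` are assumptions based on the field order given in the text, not the dataset's actual on-disk format.

```python
# Hypothetical ids, chosen only for illustration.
# Training sample: (user id, history of item ids, candidate item id, label)
train_sample = (7, [2, 4, 3], 1, 1)   # label 1: the item was really visited
neg_sample = (7, [2, 4, 3], 0, 0)     # label 0: randomly generated item

# Test sample: (user id, history of item ids, (positive item, negative item))
test_sample = (7, [2, 4, 3, 1], (0, 2))

# Category table: cate_list[item_id] is the category id of that item.
cate_list = [0, 2, 2, 1, 0]

def category_of(item_id, table):
    """Return the category id stored for item_id."""
    return table[item_id]

user, history, candidate, label = train_sample
pos_item, neg_item = test_sample[2]
history_categories = [category_of(item, cate_list) for item in history]
```

Here `history_categories` holds one category id per history item, which is the same item-to-category mapping the model applies later to its inputs.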

1.4 Other parameters

Number of users: 192403

Number of items: 63001

Number of categories: 801

2. Model

2.1 Model structure

2.2 Model details

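2.2.1 Input parameters

u: int, list of user ids, length batch_size

i: int, list of candidate item ids; each element is the id of the item shown to the corresponding user, length batch_size

y: float, positive/negative label, 0 or 1, length batch_size

hist_i: int, users' click histories; each element is the id of a previously clicked item, shape batch_size * maximum click-history length

sl: int, actual click-history length of each user, length batch_size

lr: learning rate, a training hyperparameter

cate_list: int, item-to-category lookup table; each element is the cate id of an item, length equal to the total number of items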
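Because hist_i is rectangular while real histories differ in length, the batch has to be padded and the true lengths kept in sl. The sketch below shows one way to do this in plain Python; the helper name `pad_histories` and the padding id 0 are assumptions made for illustration.

```python
def pad_histories(histories, pad_id=0):
    """Pad variable-length click histories to a rectangular batch.

    Returns (hist_i, sl): hist_i is a list of B rows of length T, where T is
    the longest history in the batch, and sl holds the true lengths.
    pad_id=0 is an assumption; padded positions are ignored later via sl.
    """
    sl = [len(h) for h in histories]
    max_len = max(sl)
    hist_i = [h + [pad_id] * (max_len - len(h)) for h in histories]
    return hist_i, sl

hist_i, sl = pad_histories([[5, 9], [3], [8, 1, 4]])
# hist_i == [[5, 9, 0], [3, 0, 0], [8, 1, 4]], sl == [2, 1, 3]
```

The padded zeros are indistinguishable from a real item id 0, which is why sl has to travel with hist_i.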

2.2.2 Trainable parameters

item_emb_w: item embeddings, shape item count * 64

item_b: item biases, shape item count

cate_emb_w: item category embeddings, shape cate count * 64
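For a sense of scale, the sizes of these three tables can be worked out from the counts in section 1.4. This is plain arithmetic on the stated shapes; the weights of the attention network and the final MLP are not counted.

```python
# Counts taken from section 1.4; 64 is the embedding width given above.
item_count, cate_count, emb_dim = 63001, 801, 64

item_emb_params = item_count * emb_dim   # item_emb_w
item_bias_params = item_count            # item_b
cate_emb_params = cate_count * emb_dim   # cate_emb_w

table_params = item_emb_params + item_bias_params + cate_emb_params
print(item_emb_params, item_bias_params, cate_emb_params, table_params)
# prints: 4032064 63001 51264 4146329
```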

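2.2.3 Overall flow

2.2.3.1 Converting input data to embedding vectors

Given the ids in the input data (item id, cate id), look up the matching embeddings in the corresponding trainable parameters (the embedding lookup tables).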

ic = tf.gather(cate_list, self.i)  # category ids of the candidate items, [B]
i_emb = tf.concat(values=[
    tf.nn.embedding_lookup(item_emb_w, self.i),
    tf.nn.embedding_lookup(cate_emb_w, ic),
], axis=1)  # [B, H]
i_b = tf.gather(item_b, self.i)  # [B]

hc = tf.gather(cate_list, self.hist_i)  # category ids of the history items, [B, T]
h_emb = tf.concat([
    tf.nn.embedding_lookup(item_emb_w, self.hist_i),
    tf.nn.embedding_lookup(cate_emb_w, hc),
], axis=2)  # [B, T, H]
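The lookup-and-concatenate step above can be traced on a toy example without TensorFlow. The sketch below uses nested lists; the helper `lookup` and all table values are hypothetical and only mirror the shapes, with 4 items, 2 categories and an embedding width of 2 per table.

```python
def lookup(table, ids):
    """Row lookup: the list-based analogue of an embedding lookup."""
    return [table[i] for i in ids]

# Toy tables (hypothetical values): 4 items, 2 categories, width 2 each.
item_emb_w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
cate_emb_w = [[1.0, 1.1], [2.0, 2.1]]
cate_list = [0, 1, 1, 0]          # category id of each item

i = [2, 0]                        # candidate item per user, B = 2
hist_i = [[1, 3], [2, 0]]         # padded histories, T = 2

# Candidate embedding: item vector concatenated with its category vector.
ic = lookup(cate_list, i)
i_emb = [a + b for a, b in zip(lookup(item_emb_w, i), lookup(cate_emb_w, ic))]

# History embedding: the same lookup applied to every history position.
h_emb = []
for row in hist_i:
    hc = lookup(cate_list, row)
    h_emb.append([a + b for a, b in
                  zip(lookup(item_emb_w, row), lookup(cate_emb_w, hc))])
```

In this toy setup i_emb has shape [2, 4] and h_emb has shape [2, 2, 4]; in the model the concatenated width is 64 + 64.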

2.2.3.2 Building the user embedding from i_emb, h_emb and sl with the attention mechanism

The user embedding has the following form:
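As a sketch consistent with the attention code in section 2.2.4, the pooled history vector can be written as below. The symbols are introduced here for illustration and are assumptions, not notation from the original text: $v_i$ is the candidate item embedding (i_emb), $h_t$ the embedding of the $t$-th clicked item (h_emb), $L$ the actual history length (sl), $H$ the embedding width, and $a(\cdot,\cdot)$ the score produced by the attention MLP.

```latex
w_t = \frac{\exp\!\big(a(v_i, h_t)/\sqrt{H}\big)}
           {\sum_{s=1}^{L} \exp\!\big(a(v_i, h_s)/\sqrt{H}\big)},
\qquad
\tilde{u} = \sum_{t=1}^{L} w_t \, h_t
```

In the code, u_emb is then obtained by passing $\tilde{u}$ through batch normalization and one dense layer.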

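2.2.3.3 Concatenating u_emb and i_emb as the MLP input; the MLP outputs the user's pCTR for the item, which is combined with the true click label to form the loss used for optimization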

din_i = tf.concat([u_emb, i_emb], axis=-1)
din_i = tf.layers.batch_normalization(inputs=din_i, name='b1')
d_layer_1_i = tf.layers.dense(din_i, 80, activation=None, name='f1')
d_layer_1_i = dice(d_layer_1_i, name='dice_1')
d_layer_2_i = tf.layers.dense(d_layer_1_i, 40, activation=None, name='f2')
d_layer_2_i = dice(d_layer_2_i, name='dice_2')
d_layer_3_i = tf.layers.dense(d_layer_2_i, 1, activation=None, name='f3')
d_layer_3_i = tf.reshape(d_layer_3_i, [-1])  # [B, 1] -> [B], so it adds element-wise to i_b
self.logits = i_b + d_layer_3_i  # [B]
self.loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=self.logits,
        labels=self.y)
)
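To make the loss concrete, here is a minimal plain-Python sketch of the sigmoid cross-entropy on logits, using the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), where x is the logit and z the label. The function names are illustrative, not part of the model code.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Numerically stable -[z*log(p) + (1-z)*log(1-p)] with p = sigmoid(x).

    Uses the equivalent form max(x, 0) - x*z + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean_loss(logits, labels):
    """Average the per-sample losses over the batch."""
    losses = [sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)

loss = mean_loss([2.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

The stable form avoids computing exp of a large positive number, so very confident wrong predictions still yield a finite loss.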

2.2.4 Attention-based user embedding

'''
B is the batch size, T is the maximum length of the user behavior history, H is the embedding length
queries: [B, H]
keys: [B, T, H]
keys_length: [B]
'''
queries_hidden_units = queries.get_shape().as_list()[-1]  # size of the last dimension, H
queries = tf.tile(queries, [1, tf.shape(keys)[1]])  # [B, T*H]
queries = tf.reshape(queries, [-1, tf.shape(keys)[1], queries_hidden_units])  # [B, T, H]
din_all = tf.concat([queries, keys, queries-keys, queries*keys], axis=-1)  # [B, T, 4H]; the click history combined with the candidate item forms the MLP input
d_layer_1_all = tf.layers.dense(din_all, 80, activation=tf.nn.sigmoid, name='f1_att', reuse=tf.AUTO_REUSE)
d_layer_2_all = tf.layers.dense(d_layer_1_all, 40, activation=tf.nn.sigmoid, name='f2_att', reuse=tf.AUTO_REUSE)
d_layer_3_all = tf.layers.dense(d_layer_2_all, 1, activation=None, name='f3_att', reuse=tf.AUTO_REUSE)
d_layer_3_all = tf.reshape(d_layer_3_all, [-1, 1, tf.shape(keys)[1]])  # [B, 1, T]
outputs = d_layer_3_all

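queries (i_emb) is tiled to the shape of keys (h_emb). Its difference and element-wise product with keys, together with queries and keys themselves, are concatenated into the MLP input (din_all). The MLP then produces one weight coefficient for each item in the click history.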
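The feature construction described above can be sketched for a single user in plain Python. The function name `attention_inputs` is hypothetical; it only mirrors the tiling and concatenation, not the MLP that follows.

```python
def attention_inputs(query, keys):
    """Build one MLP input row per history position.

    query: list of H floats (candidate item embedding).
    keys:  list of T lists of H floats (history embeddings).
    Returns T rows of length 4*H: [q, k, q - k, q * k], where the
    product is element-wise.
    """
    rows = []
    for key in keys:
        diff = [q - k for q, k in zip(query, key)]
        prod = [q * k for q, k in zip(query, key)]
        rows.append(list(query) + list(key) + diff + prod)
    return rows

din_all = attention_inputs([1.0, 2.0], [[0.5, 1.0], [3.0, 0.0]])
# din_all[0] == [1.0, 2.0, 0.5, 1.0, 0.5, 1.0, 0.5, 2.0]
```

Reusing the same query for every key plays the role of tf.tile plus tf.reshape in the TensorFlow code.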

# Mask
key_masks = tf.sequence_mask(keys_length, tf.shape(keys)[1])  # [B, T]
key_masks = tf.expand_dims(key_masks, 1)  # [B, 1, T]
paddings = tf.ones_like(outputs) * (-2 ** 32 + 1)
outputs = tf.where(key_masks, outputs, paddings)  # [B, 1, T]

# Scale
outputs = outputs / (keys.get_shape().as_list()[-1] ** 0.5)

# Activation
outputs = tf.nn.softmax(outputs)  # [B, 1, T]

# Weighted sum
outputs = tf.matmul(outputs, keys)  # [B, 1, H]

hist_i = outputs
hist_i = tf.layers.batch_normalization(inputs=hist_i)
hist_i = tf.reshape(hist_i, [-1, hidden_units], name='hist_bn')  # [B, H]
hist_i = tf.layers.dense(hist_i, hidden_units, name='hist_fcn')
u_emb = hist_i

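Because users' click histories differ in length, a mask tensor is needed to screen out the invalid (padded) positions.

Once the relevance weights between the candidate item and each clicked history item are available, the history item embeddings are combined as a weighted sum and then normalized, which gives the user embedding.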
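The mask, scale, softmax and weighted-sum steps can be traced for a single user with the standard library alone. This is a minimal sketch: the function name `attend` is hypothetical, the raw scores are passed in directly instead of coming from the attention MLP, and the trailing batch normalization and dense layer are left out.

```python
import math

def attend(scores, keys, length):
    """Masked, scaled softmax over scores followed by a weighted sum of keys.

    scores: T raw attention scores for one user.
    keys:   T history embeddings, each a list of H floats.
    length: number of valid (non-padded) history positions.
    """
    hidden = len(keys[0])
    neg_inf = float(-2 ** 32 + 1)            # same padding value as above
    masked = [s if t < length else neg_inf for t, s in enumerate(scores)]
    scaled = [s / math.sqrt(hidden) for s in masked]
    top = max(scaled)                        # subtract max for stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * k[d] for w, k in zip(weights, keys))
              for d in range(hidden)]
    return weights, pooled

weights, pooled = attend([0.0, 0.0, 5.0],
                         [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], 2)
```

With length 2 the third position is padding, so its large raw score of 5.0 receives zero weight and the two valid positions share the weight equally.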

2.2.5 Dice activation

def dice(_x, axis=-1, epsilon=0.0000001, name=''):
    alphas = tf.get_variable('alpha' + name, _x.get_shape()[-1],
                             initializer=tf.constant_initializer(0.0), dtype=tf.float32)
    input_shape = list(_x.get_shape())  # [batch_size, hidden_unit_size]
    reduction_axes = list(range(len(input_shape)))  # [0, 1]
    del reduction_axes[axis]  # [0]
    broadcast_shape = [1] * len(input_shape)  # [1, 1]
    broadcast_shape[axis] = input_shape[axis]  # [1, hidden_unit_size]

    # case: train mode (uses stats of the current batch)
    # mean and std below are only needed by the commented-out manual normalization
    mean = tf.reduce_mean(_x, axis=reduction_axes)  # [1 * hidden_unit_size]
    brodcast_mean = tf.reshape(mean, broadcast_shape)  # [1 * hidden_unit_size]
    std = tf.reduce_mean(tf.square(_x - brodcast_mean) + epsilon, axis=reduction_axes)  # [1 * hidden_unit_size]
    std = tf.sqrt(std)
    brodcast_std = tf.reshape(std, broadcast_shape)  # [1 * hidden_unit_size]
    # x_normed = (_x - brodcast_mean) / (brodcast_std + epsilon)
    x_normed = tf.layers.batch_normalization(_x, center=False, scale=False, training=True)  # a simple way to use BN to calculate x_p
    x_p = tf.sigmoid(x_normed)
    return alphas * (1.0 - x_p) * _x + x_p * _x

How Dice varies with y for different alpha values is plotted below:

ReLU and Leaky ReLU are shown below:
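For a numeric feel of how Dice compares with ReLU and Leaky ReLU, the sketch below evaluates all three on a small batch for one hidden unit. It is a simplified plain-Python version that takes alpha as a fixed argument rather than a learned variable and uses the statistics of the given batch; the function names and the sample batch are illustrative assumptions.

```python
import math

def dice(values, alpha, epsilon=1e-7):
    """Dice over one batch of scalar activations (a single hidden unit).

    p = sigmoid((x - mean) / sqrt(var + epsilon)) uses batch statistics;
    the output is p * x + (1 - p) * alpha * x.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    out = []
    for v in values:
        p = 1.0 / (1.0 + math.exp(-(v - mean) / math.sqrt(var + epsilon)))
        out.append(p * v + (1.0 - p) * alpha * v)
    return out

def leaky_relu(values, alpha):
    """Leaky ReLU; alpha = 0 gives plain ReLU."""
    return [v if v > 0 else alpha * v for v in values]

batch = [-2.0, -1.0, 0.0, 1.0, 2.0]
for alpha in (0.0, 0.25):
    print(alpha, dice(batch, alpha), leaky_relu(batch, alpha))
```

Unlike ReLU, Dice with alpha = 0 still passes a small negative value for negative inputs, because the gate p moves smoothly between 0 and 1 around the batch mean instead of switching at zero. With alpha = 1 the function reduces to the identity.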

3. References

1. Deep Interest Network for Click-Through Rate Prediction

2. GitHub: https://github.com/zhougr1993/DeepInterestNetwork

3. https://www.jianshu.com/p/73b6f5d00f46