A Summary of Factorization Machine (FM) Principles and a TensorFlow Implementation
Beyond single features, a model usually needs crossed feature combinations; the FM family and tree-based models are the most common ways to handle feature crossing.
Expression of a linear model with second-order feature crosses
$$\hat{y}(x) = b + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} w_{ij}\,x_i x_j$$
1.On top of linear regression, add second-order interaction terms; the interactions cover only the upper triangle (i < j), with no squared terms on the diagonal
2.The prediction is the weighted sum of the first-order coefficients times the first-order features and the second-order coefficients times the pairwise product features
3.This expression can do both regression and classification; for classification, wrap it in a sigmoid or softmax
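As a quick illustration, here is a minimal numpy sketch of evaluating this brute-force second-order model (the names b, w, W2 and x are hypothetical, not from the original post):
import numpy as np

n = 4
rng = np.random.RandomState(0)
x = rng.rand(n)       # feature vector
b = 0.1               # bias
w = rng.rand(n)       # first-order weights
W2 = rng.rand(n, n)   # second-order weights; only the upper triangle is used

# y_hat = b + sum_i w_i*x_i + sum_{i<j} w_ij*x_i*x_j
pairwise = sum(W2[i, j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
y_hat = b + w @ x + pairwise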
Solving the parameters of the second-order linear model
1.With n features, the model has 1 + n + n(n-1)/2 parameters after adding second-order interactions: 1 bias b, n first-order coefficients, and n(n-1)/2 second-order coefficients
2.The forward pass produces a prediction whose residual against the label forms the loss; each weight w is updated by the gradient of the loss times the learning rate. Because the gradient of a second-order weight contains the product of the two feature values, if either feature is 0 the interaction is 0, the gradient update is 0, and that weight can never be updated
3.When the data has many categorical features, most entries are 0 after one-hot encoding; in addition, continuous variables are often discretized into bins for better crosses, so this brute-force parameterization cannot be learned accurately (see the sketch below)
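To make point 2 concrete, a minimal sketch (hypothetical names): in the brute-force model the gradient of w_ij with respect to the prediction is x_i * x_j, which vanishes on one-hot data unless both features fire in the same sample.
import numpy as np

x = np.array([1.0, 0.0, 0.0, 1.0])  # a one-hot style sample
# d(y_hat)/d(w_ij) = x_i * x_j for the brute-force second-order model
grad_w2 = np.triu(np.outer(x, x), k=1)
print(grad_w2)
# only the (0, 3) entry is nonzero: every w_ij whose two features do not
# co-occur in this sample receives a zero gradient and is never updated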
The FM expression
1.FM resolves the difficulty of estimating the second-order coefficients by factorizing the symmetric coefficient matrix W into V·Vᵀ
2.The FM expression is:
$$\hat{y}(x) = b + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$
3.Construct an n × k matrix V, where n, the number of first-order features, is the row dimension, and k, a hyperparameter, is the column dimension. Randomly initialize every entry; multiplying V by its transpose Vᵀ (k × n) yields a symmetric n × n matrix that serves as the initial second-order coefficients (FM only uses its upper triangle). Simulating the process with numpy:
import numpy as np
X = np.random.rand(12).reshape(4, 3) # (4, 3)
# array([[0.70977022, 0.88085034, 0.9889709 ],
# [0.81632231, 0.85085403, 0.11926545],
# [0.52438335, 0.59454945, 0.29532991],
# [0.56476118, 0.66161352, 0.3078488 ]])
X.T # (3, 4)
# array([[0.70977022, 0.81632231, 0.52438335, 0.56476118],
# [0.88085034, 0.85085403, 0.59454945, 0.66161352],
# [0.9889709 , 0.11926545, 0.29532991, 0.3078488 ]])
np.dot(X, X.T) # (4, 4)
# array([[2.25773453, 1.44682639, 1.18797346, 1.28808668],
# [1.44682639, 1.40455894, 0.96916328, 1.06067941],
# [1.18797346, 0.96916328, 0.71568671, 0.78043028],
# [1.28808668, 1.06067941, 0.78043028, 0.85145853]])
Solving the FM parameters
1.There are 1 + n + nk parameters to estimate: 1 constant term, n first-order coefficients, and nk second-order coefficients. Computed naively, the time complexity is O(kn²): each second-order coefficient requires multiplying a k-dimensional row vector by a column vector (k multiplications plus a sum), and there are n(n-1)/2 second-order terms. A reformulation reduces the complexity to O(kn):
$$\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle\, x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f}\,x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\,x_i^{2}\right]$$
2.The FM gradients, corresponding to the constant term, the first-order terms, and the second-order terms respectively:
$$\frac{\partial \hat{y}}{\partial \theta} = \begin{cases} 1, & \theta = b \\ x_i, & \theta = w_i \\ x_i \sum_{j=1}^{n} v_{j,f}\,x_j - v_{i,f}\,x_i^{2}, & \theta = v_{i,f} \end{cases}$$
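A quick numpy check (a minimal sketch with hypothetical names) that the O(kn) reformulation gives the same value as the brute-force O(kn²) pairwise sum:
import numpy as np

rng = np.random.RandomState(42)
n, k = 6, 3
x = rng.rand(n)
V = rng.rand(n, k)

# brute force: sum over the upper triangle of <v_i, v_j> * x_i * x_j
brute = sum(V[i] @ V[j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))

# O(kn) reformulation: 0.5 * sum_f ((sum_i v_if*x_i)^2 - sum_i (v_if*x_i)^2)
vx = V * x[:, None]  # shape (n, k), entries v_if * x_i
fast = 0.5 * np.sum(vx.sum(axis=0) ** 2 - np.sum(vx ** 2, axis=0))

assert np.isclose(brute, fast)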
FM vs. GBDT
GBDT:
1.Extremely strong at splitting continuous variables and automatically finding nonlinear decision boundaries, but this advantage vanishes when features are sparse
2.With very high-dimensional features, GBDT's limited depth makes it hard to attend to every feature; it tends to reuse a few features and cannot fully exploit them all
3.If a pair of features is highly discriminative together while each is weak alone, GBDT may fail to mine that second-order feature: each level greedily splits on a single feature over all samples, never considering how discriminative two features are jointly. This is especially pronounced with high-dimensional features, where it is hard to reach the long tail of second-order interactions.
FM:
1.Attends to all features and all second-order interaction terms, so it mines second-order feature crosses well
2.Well suited to high-dimensional sparse features
3.FM's complexity, for both training and prediction, grows linearly with the number of features
Predicting customer churn with an FM binary classifier in TensorFlow
The FM model class
Define the model's network structure; the output is wrapped in a sigmoid layer for binary classification.
import tensorflow as tf

class FM(object):
    def __init__(self, feature_size, fm_v_size=8, loss_fuc="Cross_entropy", train_optimizer="Adam",
                 learning_rate=0.1, reg_type="l2_reg", reg_param_w=0.0, reg_param_v=0.0, decaylearning_rate=0.9):
        self.input_x = tf.placeholder(tf.float32, [None, feature_size], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, 1], name="input_y")
        self.global_step = tf.Variable(0, name="global_step", trainable=False)
        with tf.name_scope("fm_layers"):
            # first-order coefficients, shape (n,)
            FM_W = tf.get_variable(shape=[feature_size], initializer=tf.glorot_normal_initializer(), name="fm_beta1")
            # second-order latent matrix V, shape (n, k)
            FM_V = tf.get_variable(shape=[feature_size, fm_v_size], initializer=tf.glorot_normal_initializer(), name="fm_beta2")
            # constant term w0
            FM_B = tf.Variable(tf.constant(0.0), dtype=tf.float32, name="fm_bias")
            # first-order term: w_i * x_i
            Y_first = tf.multiply(FM_W, self.input_x)
            # second-order interactions via the O(kn) reformulation
            embeddings = tf.multiply(FM_V, tf.reshape(self.input_x, (-1, feature_size, 1)))  # None * n * k
            summed_features_emb = tf.reduce_sum(embeddings, 1)                  # sum(v*x)
            summed_features_emb_square = tf.square(summed_features_emb)         # (sum(v*x))^2
            squared_features_emb = tf.square(embeddings)                        # (v*x)^2
            squared_sum_features_emb = tf.reduce_sum(squared_features_emb, 1)   # sum((v*x)^2)
            Y_second = 0.5 * tf.subtract(summed_features_emb_square, squared_sum_features_emb)  # 0.5*((sum(v*x))^2 - sum((v*x)^2))
            # first-order + second-order, then add the bias
            FM_out_lay1 = tf.concat([Y_first, Y_second], axis=1)
            y_out = tf.reduce_sum(FM_out_lay1, 1)
            y_d = tf.reshape(y_out, shape=[-1])
            y_bias = FM_B * tf.ones_like(y_d, dtype=tf.float32)
            self.output = tf.add(y_out, y_bias, name='output')
        with tf.name_scope("predict"):
            self.logit = tf.nn.sigmoid(self.output, name='logit')
            self.auc_score = tf.metrics.auc(self.input_y, self.logit)
        with tf.name_scope("loss"):
            if reg_type == 'l1_reg':
                regularization = tf.contrib.layers.l1_regularizer(reg_param_w)(FM_W) + \
                                 tf.contrib.layers.l1_regularizer(reg_param_v)(FM_V)
            elif reg_type == 'l2_reg':
                regularization = reg_param_w * tf.nn.l2_loss(FM_W) + reg_param_v * tf.nn.l2_loss(FM_V)
            else:
                regularization = reg_param_w * tf.nn.l2_loss(FM_W) + reg_param_v * tf.nn.l2_loss(FM_V)
            if loss_fuc == 'Squared_error':
                # reshape output to [None, 1] so the subtraction does not broadcast
                self.loss = tf.reduce_mean(tf.reduce_sum(
                    tf.square(self.input_y - tf.reshape(self.output, [-1, 1])),
                    reduction_indices=[1])) + regularization
            elif loss_fuc == 'Cross_entropy':
                self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
                    logits=tf.reshape(self.output, [-1]),
                    labels=tf.reshape(self.input_y, [-1]))) + regularization
        with tf.name_scope("optimizer"):
            if decaylearning_rate != 1:
                learning_rate = tf.train.exponential_decay(learning_rate, self.global_step, 100, decaylearning_rate)
            if train_optimizer == 'Adam':
                optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
            elif train_optimizer == 'Adagrad':
                optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)
            elif train_optimizer == 'Momentum':
                optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.95)
            self.train_step = optimizer.minimize(self.loss, global_step=self.global_step)
        with tf.name_scope("summaries"):
            tf.summary.scalar("loss", self.loss)
            tf.summary.scalar("auc", self.auc_score[0])
            tf.summary.histogram("FM_W", FM_W)
            tf.summary.histogram("FM_V", FM_V)
            self.summary_op = tf.summary.merge_all()
Data input
The training data is preprocessed into libsvm format, which requires a one-hot mapping file churn_featindex.txt and a csv-to-libsvm conversion script libsvm_transform.py. The converted format is shown below: the first column is the label, feature position indices start from 0, and there are 186 features in total:
head -2 churn_train_sample.svm
1 1:1 7:1 13:1 21:1 28:1 34:1 42:1 55:1 61:1 67:1 76:1 81:1 86:1 93:1 98:1 104:1 109:1 115:1 120:1 125:1 131:1 137:1 146:1 148:1 151:1 154:1 158:1 160:1 163:1 166:1 169:1 172:1 175:1 178:1 181:1 184:1
0 5:1 7:1 15:1 22:1 31:1 36:1 39:1 59:1 62:1 69:1 76:1 81:1 86:1 94:1 99:1 106:1 110:1 115:1 121:1 125:1 131:1 137:1 143:1 148:1 151:1 154:1 157:1 160:1 164:1 166:1 169:1 173:1 175:1 179:1 182:1 185:1
Use load_svmlight_file from sklearn.datasets to read the files directly as sparse matrices, loading the training and test sets separately:
from sklearn.datasets import load_svmlight_file

x_train, y_train = load_svmlight_file("./churn_train.svm", zero_based=True)
x_test, y_test = load_svmlight_file("./churn_test.svm", zero_based=True)
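One caveat (not from the original post, but worth checking): load_svmlight_file infers the feature dimension from the largest index it sees, so if the trailing features never occur in a file the matrix can come out narrower than 186. Passing n_features pins the shape to match feature_size:
x_train, y_train = load_svmlight_file("./churn_train.svm", n_features=186, zero_based=True)
print(x_train.shape)  # (num_train_samples, 186)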
Model parameter settings
feature_size: feature dimension after one-hot encoding
fm_v_size: latent vector dimension, default 8
loss_fuc: loss function, default cross entropy
train_optimizer: optimization algorithm, default Adam
learning_rate: learning rate, default 0.1
reg_type: regularization method, default L2
reg_param_w: first-order regularization factor, default 0
reg_param_v: second-order regularization factor, default 0
decaylearning_rate: learning rate decay ratio, default 0.9
epoches: number of passes over the training data, default 100
batch_size: number of training samples per batch, default 2048
FLAGS = tf.app.flags.FLAGS
# model parameters
tf.app.flags.DEFINE_integer("feature_size", 186, "number of fields")
tf.app.flags.DEFINE_integer("fm_v_size", 8, "number of implicit vector dimensions")
tf.app.flags.DEFINE_string("loss_fuc", "Cross_entropy", "loss function")
tf.app.flags.DEFINE_string("train_optimizer", "Adam", "optimizer method")
tf.app.flags.DEFINE_float("learning_rate", 0.1, "initial learning rate")
tf.app.flags.DEFINE_string("reg_type", "l2_reg", "regularization method")
tf.app.flags.DEFINE_float("reg_param_w", 0.0, "first order beta regularization param")
tf.app.flags.DEFINE_float("reg_param_v", 0.0, "second order beta regularization param")
tf.app.flags.DEFINE_float("decaylearning_rate", 0.9, "decay learning rate param")
# data parameters
tf.app.flags.DEFINE_integer("epoches", 100, "number of data repeat time")
tf.app.flags.DEFINE_integer("batch_size", 2048, "number of train data each batch")
tf.reset_default_graph()
model = FM(FLAGS.feature_size, FLAGS.fm_v_size, FLAGS.loss_fuc, FLAGS.train_optimizer, FLAGS.learning_rate, FLAGS.reg_type, FLAGS.reg_param_w, FLAGS.reg_param_v, FLAGS.decaylearning_rate)
Training the model
import shutil
import numpy as np

with tf.Session() as sess:
    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer(), tf.tables_initializer())
    sess.run(init_op)
    shutil.rmtree("./FM_churn.log", ignore_errors=True)
    writer = tf.summary.FileWriter("./FM_churn.log", sess.graph)
    # get_batch yields mini-batches of the training set (a sketch follows below)
    batches = get_batch(FLAGS.epoches, FLAGS.batch_size)
    x_test, y_test = load_svmlight_file("./churn_test.svm", zero_based=True)
    for x_batch, y_batch in batches:
        feed_dict = {model.input_x: x_batch.toarray(), model.input_y: np.reshape(y_batch, [-1, 1])}
        train_op = [model.train_step, model.global_step, model.loss, model.auc_score, model.summary_op]
        _, step, loss_val, auc_val, merged = sess.run(train_op, feed_dict=feed_dict)
        writer.add_summary(merged, step)
        if step % 100 == 0:
            print("step:", step, "loss:", loss_val, "auc:", auc_val[0])
        if step % 1000 == 0:
            feed_dict = {model.input_x: x_test.toarray(), model.input_y: np.reshape(y_test, [-1, 1])}
            loss_val, auc_val = sess.run([model.loss, model.auc_score], feed_dict=feed_dict)
            print("[evaluation]", "loss:", loss_val, "auc:", auc_val[0])
            print(" ")
Training progress
step: 76100 loss: 0.5005622 auc: 0.82709
step: 76200 loss: 0.50755 auc: 0.8270913
step: 76300 loss: 0.48795617 auc: 0.8270925
step: 76400 loss: 0.5073022 auc: 0.8270925
step: 76500 loss: 0.5022451 auc: 0.8270947
step: 76600 loss: 0.5266277 auc: 0.8270936
step: 76700 loss: 0.50896007 auc: 0.8270941
step: 76800 loss: 0.46825206 auc: 0.8270943
step: 76900 loss: 0.49328235 auc: 0.8270949
step: 77000 loss: 0.5090138 auc: 0.82709527
[evaluation] loss: 0.4988083 auc: 0.82709527
Deploying the model as a service
Use the docker tensorflow/serving image (tensorflow_model_server) to serve the model as a RESTful API:
docker run -t --rm -p 8501:8501 -v "/****/customer_churn_prediction/FM/fm_csv/FM_churn.pb:/models/FM/" -e MODEL_NAME=FM tensorflow/serving
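The FM_churn.pb directory mounted above is a SavedModel export. The export code is not part of this excerpt; below is a minimal sketch of how it might be produced inside the training session, where the version subdirectory "1" and the choice of tensors are assumptions, while the signature name my_signature matches the curl call that follows:
# assumed sketch: export the trained graph as a SavedModel for tensorflow serving
builder = tf.saved_model.builder.SavedModelBuilder("./FM_churn.pb/1")
signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={"input_x": tf.saved_model.utils.build_tensor_info(model.input_x)},
    outputs={"output": tf.saved_model.utils.build_tensor_info(model.logit)},
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={"my_signature": signature})
builder.save()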
Testing the API
curl -d '{"instances": [{"input_x": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,0,0,0,1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]}], "signature_name":"my_signature"}' -X POST http://localhost:8501/v1/models/FM:predict
{
    "predictions": [0.472961]
}
Full code
The complete code is available at https://github.com/xiaogp/customer_churn_prediction/tree/master/FM/fm_csv