
Mahout | Collaborative Filtering Algorithms

2019-07-02  icebreakeros

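Collaborative filtering

Collaborative filtering comes in two forms: user-based (User CF) and item-based (Item CF).
Item CF suits recommender systems for e-commerce sites.
User CF suits recommender systems for news, blogs, and micro-content.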
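To make the difference between the two views concrete, here is a minimal sketch on hypothetical toy ratings (the users u1..u3 and items A..C are made up for illustration). It uses a plain co-occurrence count as a crude stand-in for a real similarity measure, so it should be read as an illustration of the idea rather than how Mahout computes similarity.

```shell
#!/usr/bin/env bash
# Hypothetical toy ratings in user,item,rating form (not from the article).
ratings='u1,A,5
u1,B,4
u2,A,4
u2,B,5
u2,C,1
u3,B,2
u3,C,5'

# Item CF view: for every pair of items, count the users who rated both.
item_cooccurrence() {
  awk -F, '
    { items[$1] = items[$1] " " $2 }   # collect the items each user rated
    END {
      for (u in items) {
        n = split(items[u], it, " ")
        for (i = 1; i <= n; i++)
          for (j = 1; j <= n; j++)
            if (it[i] < it[j]) pair[it[i] "," it[j]]++
      }
      for (p in pair) print p "," pair[p]
    }' | LC_ALL=C sort
}

# User CF view: swap the user and item columns, then reuse the same counting,
# which yields the number of items each pair of users has in common.
user_overlap() {
  awk -F, '{ print $2 "," $1 "," $3 }' | item_cooccurrence
}

echo "item pairs (itemA,itemB,shared users):"
printf '%s\n' "$ratings" | item_cooccurrence
echo "user pairs (userA,userB,shared items):"
printf '%s\n' "$ratings" | user_overlap
```

Item CF recommends items that co-occur with what a user already rated, while User CF looks for users with overlapping histories and borrows their items.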

Advantages

Disadvantages

Taste

Taste is an efficient implementation of collaborative filtering algorithms provided by Apache Mahout.

Taste basic architecture

Example

The data consists of three files. Only ratings.dat is read by the script:

/usr/local/mahout/examples/bin/factorize-movielens-1M.sh

if [[ -z "$MAHOUT_WORK_DIR" ]]; then
  WORK_DIR=/tmp/mahout-work-${USER}
else
  WORK_DIR=$MAHOUT_WORK_DIR
fi
mkdir -p ${WORK_DIR}/movielens
# convert ratings.dat from "::"-separated fields to CSV, keeping the first three fields
sed -e 's/::/,/g' "${WORK_DIR}/ratings.dat" | cut -d, -f1,2,3 > \
"${WORK_DIR}/movielens/ratings.csv"

hdfs dfs -mkdir -p ${WORK_DIR}
hdfs dfs -mkdir ${WORK_DIR}/movielens
hdfs dfs -put ${WORK_DIR}/movielens ${WORK_DIR}/

# create a 90% training set and a 10% probe set
mahout splitDataset \
--input ${WORK_DIR}/movielens/ratings.csv \
--output ${WORK_DIR}/dataset \
--trainingPercentage 0.9 \
--probePercentage 0.1 \
--tempDir ${WORK_DIR}/dataset/tmp

# run distributed ALS-WR to factorize
# the rating matrix defined by the training set
mahout parallelALS \
--input ${WORK_DIR}/dataset/trainingSet/ \
--output ${WORK_DIR}/als/out \
--tempDir ${WORK_DIR}/als/tmp \
--numFeatures 20 \
--numIterations 10 \
--lambda 0.065 --numThreadsPerSolver 2

# compute predictions against the probe set, measure the error
mahout evaluateFactorization \
--input ${WORK_DIR}/dataset/probeSet/ \
--output ${WORK_DIR}/als/rmse/ \
--userFeatures ${WORK_DIR}/als/out/U/ \
--itemFeatures ${WORK_DIR}/als/out/M/ \
--tempDir ${WORK_DIR}/als/tmp

# compute recommendations
mahout recommendfactorized \
--input ${WORK_DIR}/als/out/userRatings/ \
--output ${WORK_DIR}/recommendations/ \
--userFeatures ${WORK_DIR}/als/out/U/ \
--itemFeatures ${WORK_DIR}/als/out/M/ \
--numRecommendations 6 --maxRating 5 --numThreads 2

echo -e "\nRMSE is:\n"
cat ${WORK_DIR}/als/rmse/rmse.txt
echo -e "\n"

echo -e "\nSample recommendations:\n"
shuf ${WORK_DIR}/recommendations/part-m-00000 |head
echo -e "\n\n"
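
Two steps of the script can be tried in isolation, without a Hadoop cluster: the ratings conversion and the RMSE metric. The sketch below assumes input lines in the MovieLens layout user::movie::rating::timestamp. The sample values and the "actual,predicted" input format of the hypothetical rmse helper are illustrative assumptions, not Mahout's own file formats.

```shell
#!/usr/bin/env bash
# Illustrative lines in the layout user::movie::rating::timestamp
# (sample values, not taken from the article).
sample='1::1193::5::978300760
1::661::3::978302109
2::1357::5::978298709'

# Same conversion as the script: "::" becomes ",", then keep the first three fields.
to_csv() {
  sed -e 's/::/,/g' | cut -d, -f1,2,3
}

# RMSE over lines of the hypothetical form "actual,predicted":
# square root of the mean squared difference.
rmse() {
  awk -F, '{ d = $1 - $2; sum += d * d; n++ }
           END { if (n) printf "%.4f\n", sqrt(sum / n) }'
}

printf '%s\n' "$sample" | to_csv
printf '5,4\n3,3\n1,2\n' | rmse
```

The first command prints user,movie,rating rows such as 1,1193,5. The second prints 0.8165, the square root of 2/3, since the three squared errors are 1, 0, and 1.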