Spark MLlib: Collaborative Filtering
Published: 2019-06-20


A collaborative filtering example with Spark MLlib:


import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

import scala.Tuple2;

public class SparkMLlibColbFilter {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Java Collaborative Filtering Example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load and parse the data: each line is "user,product,rating"
        String path = "file:///data/hadoop/spark-2.0.0-bin-hadoop2.7/data/mllib/als/test.data";
        JavaRDD<String> data = sc.textFile(path);
        JavaRDD<Rating> ratings = data.map(new Function<String, Rating>() {
            @Override
            public Rating call(String s) throws Exception {
                String[] sarray = s.split(",");
                return new Rating(Integer.parseInt(sarray[0]), Integer.parseInt(sarray[1]),
                        Double.parseDouble(sarray[2]));
            }
        });

        // Build the recommendation model using ALS
        int rank = 10;          // number of latent factors per user/product
        int numIterations = 10; // number of ALS iterations
        double lambda = 0.01;   // regularization parameter
        MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), rank, numIterations, lambda);

        // Evaluate the model on the training data: first extract the (user, product) pairs...
        JavaRDD<Tuple2<Object, Object>> userProducts = ratings.map(
            new Function<Rating, Tuple2<Object, Object>>() {
                @Override
                public Tuple2<Object, Object> call(Rating r) throws Exception {
                    return new Tuple2<Object, Object>(r.user(), r.product());
                }
            });
        // ...then predict a rating for each pair, keyed by (user, product)
        JavaPairRDD<Tuple2<Integer, Integer>, Double> predictions = JavaPairRDD.fromJavaRDD(
            model.predict(JavaRDD.toRDD(userProducts)).toJavaRDD().map(
                new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() {
                    @Override
                    public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r) throws Exception {
                        return new Tuple2<>(new Tuple2<>(r.user(), r.product()), r.rating());
                    }
                }));
        // Key the actual ratings the same way and join: each value is (actual, predicted)
        JavaRDD<Tuple2<Double, Double>> ratesAndPreds = JavaPairRDD.fromJavaRDD(ratings.map(
            new Function<Rating, Tuple2<Tuple2<Integer, Integer>, Double>>() {
                @Override
                public Tuple2<Tuple2<Integer, Integer>, Double> call(Rating r) throws Exception {
                    return new Tuple2<>(new Tuple2<>(r.user(), r.product()), r.rating());
                }
            })).join(predictions).values();
        // Mean squared error between actual and predicted ratings
        double MSE = JavaDoubleRDD.fromRDD(ratesAndPreds.map(
            new Function<Tuple2<Double, Double>, Object>() {
                @Override
                public Object call(Tuple2<Double, Double> pair) throws Exception {
                    return Math.pow(pair._1() - pair._2(), 2);
                }
            }).rdd()).mean();
        System.out.println("Mean Squared Error = " + MSE);

        // Save and load the model (sameModel is a reloaded copy of model)
        model.save(sc.sc(), "target/tmp/myCollaborativeFilter");
        MatrixFactorizationModel sameModel = MatrixFactorizationModel.load(sc.sc(),
            "target/tmp/myCollaborativeFilter");

        // Recommend products for every user; the results could be stored in Redis or HBase
        // with the user id as the key and the recommendation list as the value
        int numRecommendations = 10; // number of products to recommend per user
        List<String> users = data.map(new Function<String, String>() {
            @Override
            public String call(String s) throws Exception {
                return s.split(",")[0];
            }
        }).distinct().collect();
        for (String user : users) {
            int userId = Integer.parseInt(user);
            Rating[] rs = model.recommendProducts(userId, numRecommendations);
            // Value format: "product:rating,product:rating,..."
            StringBuilder value = new StringBuilder();
            for (Rating r : rs) {
                value.append(r.product()).append(':').append(r.rating()).append(',');
            }
            System.out.println(userId + " " + value);
        }

        sc.stop();
    }
}
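To make the three numbers passed to `ALS.train` above (`rank`, `numIterations` and the regularization value 0.01) more concrete, here is a toy, single-machine sketch of alternating least squares. It is not Spark's implementation: the class `ToyAls` and all of its methods are hypothetical, it assumes a small in-memory array of {user, product, rating} triples with ids starting at 0 and a seeded random initialization, and it has none of Spark's partitioning. It does scale the regularization by each user's or product's rating count, the "ALS-WR" scaling that the Spark MLlib guide describes for its own ALS.

```java
import java.util.Random;

/** Toy, single-machine ALS sketch (hypothetical class); ratings are {user, product, rating} with 0-based ids. */
public class ToyAls {

    /** Solves A x = b for a symmetric positive-definite A by Gaussian elimination (overwrites a and b). */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {           // forward elimination
            for (int row = col + 1; row < n; row++) {
                double f = a[row][col] / a[col][col]; // SPD: pivots stay positive, no row swaps needed
                for (int k = col; k < n; k++) a[row][k] -= f * a[col][k];
                b[row] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int row = n - 1; row >= 0; row--) {      // back substitution
            double s = b[row];
            for (int k = row + 1; k < n; k++) s -= a[row][k] * x[k];
            x[row] = s / a[row][row];
        }
        return x;
    }

    /** Predicted rating: dot product of the user's and the product's factor vectors. */
    public static double predict(double[][] users, double[][] products, int u, int p) {
        double s = 0.0;
        for (int k = 0; k < users[u].length; k++) s += users[u][k] * products[p][k];
        return s;
    }

    /** Half-sweep: re-solves every row of `solved` exactly while `fixed` is held constant. */
    static void halfSweep(double[][] ratings, double[][] solved, double[][] fixed,
                          int idCol, int otherCol, double lambda) {
        int rank = fixed[0].length;
        for (int id = 0; id < solved.length; id++) {
            double[][] a = new double[rank][rank]; // sum of y * y^T over this id's ratings
            double[] b = new double[rank];         // sum of rating * y
            int n = 0;                             // number of ratings involving this id
            for (double[] r : ratings) {
                if ((int) r[idCol] != id) continue;
                double[] y = fixed[(int) r[otherCol]];
                for (int i = 0; i < rank; i++) {
                    b[i] += r[2] * y[i];
                    for (int j = 0; j < rank; j++) a[i][j] += y[i] * y[j];
                }
                n++;
            }
            if (n == 0) continue; // no ratings: keep the current factor
            for (int i = 0; i < rank; i++) a[i][i] += lambda * n; // lambda scaled by rating count (ALS-WR)
            solved[id] = solve(a, b);
        }
    }

    /** Returns {userFactors, productFactors} after `iterations` full sweeps from a seeded random start. */
    public static double[][][] train(double[][] ratings, int numUsers, int numProducts,
                                     int rank, int iterations, double lambda, long seed) {
        Random rnd = new Random(seed);
        double[][] users = new double[numUsers][rank];
        double[][] products = new double[numProducts][rank];
        for (double[] row : users) for (int k = 0; k < rank; k++) row[k] = rnd.nextDouble();
        for (double[] row : products) for (int k = 0; k < rank; k++) row[k] = rnd.nextDouble();
        for (int it = 0; it < iterations; it++) {
            halfSweep(ratings, users, products, 0, 1, lambda); // products fixed, solve users
            halfSweep(ratings, products, users, 1, 0, lambda); // users fixed, solve products
        }
        return new double[][][] {users, products};
    }

    /** Objective each half-sweep minimizes: squared error + lambda * (|x_u|^2 + |y_p|^2) per rating. */
    public static double objective(double[][] ratings, double[][] users, double[][] products, double lambda) {
        double total = 0.0;
        for (double[] r : ratings) {
            int u = (int) r[0], p = (int) r[1];
            double err = r[2] - predict(users, products, u, p);
            double norms = 0.0;
            for (int k = 0; k < users[u].length; k++) {
                norms += users[u][k] * users[u][k] + products[p][k] * products[p][k];
            }
            total += err * err + lambda * norms;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up toy ratings {user, product, rating}
        double[][] ratings = {{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 1}, {2, 2, 5}};
        double[][][] f = train(ratings, 3, 3, 2, 10, 0.01, 42L);
        System.out.println("objective = " + objective(ratings, f[0], f[1], 0.01));
        System.out.println("predicted rating of user 1 for product 1 = " + predict(f[0], f[1], 1, 1));
    }
}
```

Here `rank` is the length of each factor vector, `iterations` plays the role of `numIterations`, and `lambda` the role of 0.01. Because each half-sweep solves every small rank × rank system exactly, the value returned by `objective` should not increase from one sweep to the next, which is a handy sanity check when experimenting with these parameters.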
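The recommendation process with the ALS collaborative filtering algorithm in the example above is as follows:

1. Load the data into the ratings RDD; each record contains user, product, rate.
2. Derive the user-product dataset from ratings: (user, product).
3. Train a model on ratings with ALS.
4. Use the model to predict a rating for each user-product pair: ((user, product), rate).
5. Derive the actual user-product ratings from ratings: ((user, product), rate).
6. Join the predicted and actual ratings and compute the mean squared error (MSE).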
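Steps 4 to 6 can be mimicked without Spark to see exactly what the join and the MSE compute. The sketch below is a hypothetical `RatingMse` class, not part of Spark: it keys both datasets by a (user, product) list, performs the inner join that `JavaPairRDD.join` does in the example (pairs present on only one side are dropped), and averages the squared differences. The sample ratings in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper mirroring the evaluation steps: join on (user, product), then average squared errors. */
public class RatingMse {

    /** Composite (user, product) key; Arrays.asList provides value-based equals/hashCode. */
    public static List<Integer> key(int user, int product) {
        return Arrays.asList(user, product);
    }

    /** Inner-joins actual and predicted ratings on their keys and returns the MSE over matched pairs. */
    public static double meanSquaredError(Map<List<Integer>, Double> actual,
                                          Map<List<Integer>, Double> predicted) {
        double sum = 0.0;
        int matched = 0;
        for (Map.Entry<List<Integer>, Double> e : actual.entrySet()) {
            Double p = predicted.get(e.getKey());
            if (p == null) {
                continue; // no prediction for this pair: dropped, as in an inner join
            }
            double err = e.getValue() - p;
            sum += err * err;
            matched++;
        }
        if (matched == 0) {
            throw new IllegalArgumentException("no (user, product) pairs in common");
        }
        return sum / matched;
    }

    public static void main(String[] args) {
        Map<List<Integer>, Double> actual = new HashMap<>();
        actual.put(key(1, 1), 5.0);
        actual.put(key(1, 2), 3.0);
        actual.put(key(2, 1), 4.0);
        Map<List<Integer>, Double> predicted = new HashMap<>();
        predicted.put(key(1, 1), 4.0);
        predicted.put(key(1, 2), 3.0);
        predicted.put(key(3, 3), 1.0);
        // Only (1,1) and (1,2) match: ((5-4)^2 + (3-3)^2) / 2
        System.out.println("Mean Squared Error = " + meanSquaredError(actual, predicted)); // prints 0.5
    }
}
```

In the Spark example every (user, product) pair comes from the same ratings RDD, so the join matches all pairs; the unmatched entries here only show what the join discards.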

Reposted from: http://cxqqx.baihongyu.com/
