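Running Spark GraphX Jobs in an E-MapReduce Cluster
Spark GraphX is a popular graph computing framework. If you use Alibaba Cloud's E-MapReduce service, you can run graph computing jobs on it with little setup.
The following uses PageRank as an example to show how to run a GraphX job. The example comes from the official Spark examples (examples/src/main/scala/org/apache/spark/examples/graphx/PageRankExample.scala). It calls the pageRank method of GraphOps directly to compute the ranks: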
// Imports required by the example
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession

object PageRankExample {
  def main(args: Array[String]): Unit = {
    // Creates a SparkSession.
    val spark = SparkSession
      .builder
      .appName(s"${this.getClass.getSimpleName}")
      .getOrCreate()
    val sc = spark.sparkContext
    // $example on$
    // Load the edges as a graph (one "srcId dstId" pair per line)
    val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
    // Run PageRank until convergence, with a tolerance of 0.0001
    val ranks = graph.pageRank(0.0001).vertices
    // Join the ranks with the usernames
    val users = sc.textFile("data/graphx/users.txt").map { line =>
      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }
    val ranksByUsername = users.join(ranks).map {
      case (id, (username, rank)) => (username, rank)
    }
    // Print the result
    println(ranksByUsername.collect().mkString("\n"))
    // $example off$
    spark.stop()
  }
}
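To give a feel for what graph.pageRank(0.0001) computes, here is a minimal plain-Scala sketch that needs no Spark. It assumes the unnormalized update PR(v) = resetProb + (1 - resetProb) * sum(PR(u) / outDegree(u)) over incoming edges, with resetProb = 0.15 (the default of GraphOps.pageRank), and stops once no rank changes by more than the tolerance. The object name PageRankSketch and the three-vertex graph are hypothetical, and this is not GraphX's actual Pregel-based implementation, whose behavior may also differ between Spark versions.

```scala
// Plain-Scala illustration only; not the GraphX implementation.
object PageRankSketch {
  // edges: (srcId, dstId) pairs; assumed to be non-empty.
  def pageRank(edges: Seq[(Long, Long)],
               tol: Double,
               resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (s, es) => s -> es.size }
    var ranks = vertices.map(v => v -> 1.0).toMap
    var maxDelta = Double.MaxValue
    while (maxDelta > tol) {
      // Each source splits its current rank evenly across its out-edges.
      val incoming = edges
        .map { case (s, d) => d -> ranks(s) / outDegree(s) }
        .groupBy(_._1)
        .map { case (d, cs) => d -> cs.map(_._2).sum }
      val next = vertices.map { v =>
        v -> (resetProb + (1 - resetProb) * incoming.getOrElse(v, 0.0))
      }.toMap
      // Largest change in any rank during this iteration.
      maxDelta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical graph: 1 and 2 follow each other, 3 follows 1.
    val ranks = pageRank(Seq((1L, 2L), (2L, 1L), (3L, 1L)), tol = 0.0001)
    ranks.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Under this update rule a vertex with no incoming edges settles at exactly resetProb, that is 0.15. This would be consistent with a 0.15 rank in the job output if that user has no followers in the input data.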
Now let's see how to run this example. First log in to the Master node of the E-MapReduce cluster, then run the following commands in order:
- cd /usr/lib/spark-current
- hadoop fs -mkdir -p data
- hadoop fs -put data/graphx data/
- run-example graphx.PageRankExample
Once the job has been submitted and has finished, it prints the following result:
(justinbieber,0.15)
(matei_zaharia,0.7013599933629602)
(ladygaga,1.390049198216498)
(BarackObama,1.4588814096664682)
(jeresig,0.9993442038507723)
(odersky,1.2973176314422592)
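The result is printed in no particular order. As a small illustration, the plain-Scala sketch below parses the lines printed above and sorts them by rank in descending order. The object name SortRanks and its helper functions are hypothetical. Inside the Spark job itself, one option (an assumption about how you might extend the example, not part of the official code) would be to call ranksByUsername.sortBy(_._2, ascending = false) before collect().

```scala
// Sorts the printed (username, rank) lines by rank, highest first.
object SortRanks {
  // Output lines copied from the run above.
  val printed: Seq[String] = Seq(
    "(justinbieber,0.15)",
    "(matei_zaharia,0.7013599933629602)",
    "(ladygaga,1.390049198216498)",
    "(BarackObama,1.4588814096664682)",
    "(jeresig,0.9993442038507723)",
    "(odersky,1.2973176314422592)"
  )

  // Parse a "(name,rank)" line into a (String, Double) pair.
  def parse(line: String): (String, Double) = {
    val Array(name, rank) = line.stripPrefix("(").stripSuffix(")").split(",")
    (name, rank.toDouble)
  }

  def sortedByRank(lines: Seq[String]): Seq[(String, Double)] =
    lines.map(parse).sortBy { case (_, rank) => -rank }

  def main(args: Array[String]): Unit =
    sortedByRank(printed).foreach { case (name, rank) =>
      println(f"$name%-15s $rank%.4f")
    }
}
```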
Last updated: 2017-07-24 16:02:36