Author: 赛亚兔备_393 | Source: Internet | 2023-02-03 06:22
I am experimenting with Spark's HashPartitioner in spark-shell. The error is shown below:
scala> val data = sc.parallelize(List((1, 3), (2, 4), (3, 6), (3, 7)))
data: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
<console>:26: error: type HashPartitioner is not a member of org.apache.spark.sql.SparkSession
       val partitionedData = data.partitionBy(new spark.HashPartitioner(2))
                                                        ^

scala> val partitionedData = data.partitionBy(new org.apache.spark.HashPartitioner(2))
partitionedData: org.apache.spark.rdd.RDD[(Int, Int)] = ShuffledRDD[1] at partitionBy at <console>:26
The second command fails while the third one works. Why does spark-shell resolve spark.HashPartitioner against org.apache.spark.sql.SparkSession instead of the org.apache.spark package?
1> 小智:

In spark-shell, spark is the predefined SparkSession object, not the org.apache.spark package, so spark.HashPartitioner is looked up as a member of SparkSession. You should import org.apache.spark.HashPartitioner, or use the fully qualified class name, e.g.:
import org.apache.spark.HashPartitioner
val partitionedData = data.partitionBy(new HashPartitioner(2))
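For intuition about where the keys end up, HashPartitioner assigns a key to partition nonNegativeMod(key.hashCode, numPartitions). A minimal plain-Scala sketch of that logic (no Spark required; the helper names here are illustrative, mirroring the behavior rather than calling Spark's internals):

```scala
object HashPartitionSketch {
  // Like x % mod, but always returns a value in [0, mod), even for
  // negative hash codes (a plain % can return negative values in Scala).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // Mirrors how a hash partitioner picks a partition for a key.
  def getPartition(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // The pairs from the question, partitioned across 2 partitions.
    val data = List((1, 3), (2, 4), (3, 6), (3, 7))
    val byPartition = data.groupBy { case (k, _) => getPartition(k, 2) }
    // Keys 1 and 3 hash to partition 1; key 2 hashes to partition 0,
    // so both (3, 6) and (3, 7) land in the same partition.
    println(byPartition)
  }
}
```

Because equal keys always hash to the same partition, partitionBy with a HashPartitioner guarantees that all values for a given key are co-located, which is what later per-key operations rely on.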