I am trying to learn Spark SQL. I have been following the example described here: http://spark.apache.org/docs/1.0.0/sql-programming-guide.html
Everything works fine in the spark-shell, but when I try to build a batch version with sbt, I get the following error message:
object sql is not a member of package org.apache.spark
Unfortunately, I am fairly new to sbt, so I don't know how to fix this. I suspect I need to include additional dependencies, but I can't figure out how.
Here is the code I am trying to compile:
/* TestApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

case class Record(k: Int, v: String)

object TestApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._  // brings the createSchemaRDD implicit into scope

    val data = sc.parallelize(1 to 100000)
    val records = data.map(i => Record(i, "value = " + i))
    // createSchemaRDD takes just the RDD; the schema is inferred from the case class
    val table = createSchemaRDD(records)
    println(">>> " + table.count)
  }
}
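If I read the programming guide correctly, import sqlContext._ also makes the conversion from an RDD of case classes implicit, so presumably the last two lines could be written as below (my reading of the guide, untested since nothing compiles for me yet):

// alternative form relying on the implicit conversion (my reading of the guide)
val table: org.apache.spark.sql.SchemaRDD = records
println(">>> " + table.count)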
The error is flagged on the line where I try to create the SQLContext.
Here are the contents of my sbt file:
name := "Test Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
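My best guess is that the Spark SQL classes live in a separate artifact and that I need an extra dependency line like the one below, but I have not been able to confirm that spark-sql is the correct artifact name:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"  // my guess, unverified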
Thanks for your help.