Question

How to set the master address for Spark examples from the command line

冒泡鱼的快乐2011 posted on 2023-01-07 10:05

spark

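Note: the author is looking for answers that set the Spark master when running the Spark examples without any source code changes, ideally using only command-line options.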

Consider the run() method of the BinaryClassification example:

  def run(params: Params) {
    val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
    val sc = new SparkContext(conf)

Note that this code gives the SparkConf no way to configure the Spark master.

When running this program from IntelliJ with the following arguments:

--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt

the following error occurs:

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
    at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)

I also tried adding the Spark master URL to the arguments (although the code does not seem to support it):

  spark://10.213.39.125:17088   --algorithm LR --regType L2 --regParam 1.0 
  data/mllib/sample_binary_classification_data.txt

and

--algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088
data/mllib/sample_binary_classification_data.txt

Neither works. Both fail with the error:

Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt'

Here is the argument parsing for reference. It has nothing to do with the Spark master:

val parser = new OptionParser[Params]("BinaryClassification") {
  head("BinaryClassification: an example app for binary classification.")
  opt[Int]("numIterations")
    .text("number of iterations")
    .action((x, c) => c.copy(numIterations = x))
  opt[Double]("stepSize")
    .text(s"initial step size, default: ${defaultParams.stepSize}")
    .action((x, c) => c.copy(stepSize = x))
  opt[String]("algorithm")
    .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
    s"default: ${defaultParams.algorithm}")
    .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
  opt[String]("regType")
    .text(s"regularization type (${RegType.values.mkString(",")}), " +
    s"default: ${defaultParams.regType}")
    .action((x, c) => c.copy(regType = RegType.withName(x)))
  opt[Double]("regParam")
    .text(s"regularization parameter, default: ${defaultParams.regParam}")
  arg[String]("<input>")
    .required()
    .text("input paths to labeled examples in LIBSVM format")
    .action((x, c) => c.copy(input = x))

So... yes, I could go ahead and modify the source code. But I suspect I am missing an available tuning knob that makes this work without modifying the source.

5 Answers

If you want to do this from code, you can call .setMaster(...) when creating the SparkConf:

val conf = new SparkConf().setAppName("Simple Application")
                          .setMaster("spark://myhost:7077")

Belated edit (per the comments)

For sessions in Spark 2.x+:

val spark = SparkSession.builder()
                        .master("spark://myhost:7077") // assumed value, mirroring the SparkConf example
                        .appName("app_name")
                        .getOrCreate()

From the command line (2.x), assuming a local standalone cluster:

spark-shell --master spark://localhost:7077

Answered 2023-01-07 10:07

爷们郭子

You can set the Spark master from the command line by adding a JVM argument:
```
-Dspark.master=spark://myhost:7077
```
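
This works because a SparkConf created with its defaults loads every JVM system property whose key starts with `spark.`. In IntelliJ, the flag goes in the run configuration's "VM options" field, not in the program arguments. The sketch below is a simplified stand-in for that lookup and is not Spark's own code; the class name `SparkPropsSketch` and the method `sparkProperties` are made up for illustration.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: shows which properties a
// default configuration would see when the JVM is started with -D flags.
final class SparkPropsSketch {

    // Collects every property whose key starts with "spark.".
    static Map<String, String> sparkProperties(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("spark.")) {
                result.put(key, props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same effect as launching with -Dspark.master=spark://myhost:7077
        System.setProperty("spark.master", "spark://myhost:7077");
        System.out.println(sparkProperties(System.getProperties()));
    }
}
```

Since the example's `new SparkConf()` reads these properties itself, the source needs no changes.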
Answered 2023-01-07 10:07

采蘑菇的雨天戴草帽_412_715
So this is the solution.
Answered 2023-01-07 10:07

中华oc博弈网络志
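I downloaded Spark 1.3.0 and wanted to test the Java examples using Eclipse Luna 4.4. I found that to run the Java examples, you need to add spark-assembly-1.3.0-hadoop2.4.0.jar to the project as a referenced library.

The quickest way to get started with Spark using Java is to run the JavaWordCount example. To fix the problem above, add the following line for the Spark configuration: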
```
SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]").set("spark.executor.memory","1g");
```
That's it. Try running it from Eclipse and it should succeed. If you see the following error:
```
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
```
just ignore it and scroll down the console. You will see each line of the input text file, followed by the word counts.

This is a quick way to get started with Spark on Windows without worrying about installing Hadoop. You only need JDK 6 and Eclipse.
Answered 2023-01-07 10:07

zhengping4476
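As the documentation states: setMaster(String master)

The master URL to connect to, such as local to run locally with one thread, local[4] to run locally with 4 cores, or spark://master:7077 to run on a Spark standalone cluster.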
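
To make the forms quoted above concrete, here is a rough sketch that classifies a master string. It is not how Spark parses the value. The class `MasterUrlSketch` and its `describe` method are hypothetical, and it covers only the forms mentioned here plus `local[*]`. Spark accepts other masters (for example YARN) that this sketch reports as unrecognized.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: describes a few master URL forms.
final class MasterUrlSketch {
    private static final Pattern LOCAL_N =
        Pattern.compile("local\\[(\\d+|\\*)\\]");
    private static final Pattern STANDALONE =
        Pattern.compile("spark://([^:/]+):(\\d+)");

    static String describe(String master) {
        if (master.equals("local")) {
            return "local, 1 thread";
        }
        Matcher local = LOCAL_N.matcher(master);
        if (local.matches()) {
            String n = local.group(1);
            return n.equals("*")
                ? "local, one thread per available core"
                : "local, " + n + " threads";
        }
        Matcher standalone = STANDALONE.matcher(master);
        if (standalone.matches()) {
            return "standalone cluster at "
                + standalone.group(1) + ":" + standalone.group(2);
        }
        return "unrecognized by this sketch";
    }

    public static void main(String[] args) {
        String[] samples = {"local", "local[4]", "spark://master:7077"};
        for (String master : samples) {
            System.out.println(master + " -> " + describe(master));
        }
    }
}
```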

Answered 2023-01-07 10:08

赢在青春创业团队
