Question

How do I set the number of Spark executors?

Posted by 乃ah麟 on 2022-12-18 16:20

java

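How can I configure the number of executors from Java (or Scala) code, given a SparkConf and a SparkContext? I constantly see only 2 executors. It looks like spark.default.parallelism does not help here and is about something different.

I just need to set the number of executors equal to the cluster size, but there are always only 2 of them. I know my cluster size. I run on YARN, if that matters.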

3 answers

You can also do this programmatically by setting the parameters "spark.executor.instances" and "spark.executor.cores" on the SparkConf object.

Example:
```
import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
      // request 4 executors
      .set("spark.executor.instances", "4")
      // 5 cores for each executor
      .set("spark.executor.cores", "5");
```
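The second parameter applies only to YARN and standalone mode. It allows an application to run multiple executors on the same worker, provided that the worker has enough cores.

Answered 2022-12-18 16:23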

顺hw应大自然改造大自然
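OK, got it. The number of executors is not actually a Spark property itself, but rather an option of the driver used to place the job on YARN. Since I use the SparkSubmit class as the driver, and it has the appropriate --num-executors parameter, that is exactly what I need.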
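As a rough illustration of that approach, the sketch below only assembles a spark-submit style argument list that includes --num-executors. The class name `SubmitArgs`, the main class `com.example.MyJob`, and the jar `my-job.jar` are hypothetical placeholders, not names from this thread. The resulting arguments would then be handed to spark-submit or, as described above, to the SparkSubmit class.

```java
import java.util.ArrayList;
import java.util.List;

class SubmitArgs {
    // Builds spark-submit style arguments for a YARN job.
    // The main class and jar below are hypothetical placeholders.
    static List<String> build(int numExecutors, int executorCores) {
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add("yarn");
        args.add("--num-executors");
        args.add(String.valueOf(numExecutors));
        args.add("--executor-cores");
        args.add(String.valueOf(executorCores));
        args.add("--class");
        args.add("com.example.MyJob"); // hypothetical main class
        args.add("my-job.jar");        // hypothetical application jar
        return args;
    }

    public static void main(String[] unused) {
        // Print the arguments as they would appear on a command line.
        System.out.println(String.join(" ", build(4, 5)));
    }
}
```

Keeping the executor count as a parameter makes it easy to pass in the known cluster size instead of hard-coding it.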

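Update:

For some jobs I no longer follow the SparkSubmit approach. I cannot use it mainly for applications where the Spark job is only one of the application's components (and may even be optional). For these cases I use a spark-defaults.conf attached to the cluster configuration, with the spark.executor.instances property set in it. This approach is much more universal and lets me balance resources properly depending on the cluster (developer workstation, staging, production).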
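A minimal sketch of what such a file might contain is shown here. The values are placeholders chosen for illustration and would differ for each cluster; spark-defaults.conf uses one property per line, with the key and value separated by whitespace.

```
# spark-defaults.conf (placeholder values, adjust per cluster)
spark.executor.instances   4
spark.executor.cores       5
```

Each environment (workstation, staging, production) can then ship its own copy of the file without any change to the application code.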

Answered 2022-12-18 16:25

搞笑--林佳豪_533_654
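In Spark 2.0+:

Use the Spark session variable to set the number of executors dynamically (from within the program). Note that executor settings are generally read when the application starts, so they may need to be supplied before the session is created, for example via SparkSession.builder().config(...), to take effect.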

```
spark.conf.set("spark.executor.instances", 4)
spark.conf.set("spark.executor.cores", 4)
```

In the above case, at most 16 tasks will be executed at any given time (4 executors × 4 cores each).
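The figure of 16 follows from simple arithmetic, sketched below. The class `TaskSlots` and its method are hypothetical helpers written for this illustration. The `cpusPerTask` parameter is assumed to correspond to the spark.task.cpus setting, which defaults to 1.

```java
class TaskSlots {
    // Upper bound on simultaneously running tasks:
    // each executor offers (coresPerExecutor / cpusPerTask) task slots.
    static int maxConcurrentTasks(int executors, int coresPerExecutor, int cpusPerTask) {
        return executors * (coresPerExecutor / cpusPerTask);
    }

    public static void main(String[] args) {
        // 4 executors with 4 cores each, 1 CPU per task
        System.out.println(maxConcurrentTasks(4, 4, 1)); // prints 16
    }
}
```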

Another option is dynamic allocation of executors, as below:

```
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.executor.cores", 4)
spark.conf.set("spark.dynamicAllocation.minExecutors", "1")
spark.conf.set("spark.dynamicAllocation.maxExecutors", "5")
```

This way you can let Spark decide how many executors to allocate, based on the processing and memory requirements of the running job.
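To give a feel for how the minimum and maximum bound the executor count, here is a deliberately simplified model. It is not Spark's actual allocation logic, which also involves backlog and idle timeouts. The class `ExecutorTarget` and its method are hypothetical, and the model assumes one task per core.

```java
class ExecutorTarget {
    // Simplified model: enough executors to run all pending tasks at once,
    // clamped to the configured [minExecutors, maxExecutors] range.
    static int target(int pendingTasks, int coresPerExecutor,
                      int minExecutors, int maxExecutors) {
        // Ceiling division: executors needed to give every pending task a core.
        int needed = (pendingTasks + coresPerExecutor - 1) / coresPerExecutor;
        return Math.max(minExecutors, Math.min(maxExecutors, needed));
    }

    public static void main(String[] args) {
        // Settings from the answer: 4 cores per executor, min 1, max 5.
        System.out.println(target(0, 4, 1, 5));   // idle: stays at the minimum
        System.out.println(target(10, 4, 1, 5));  // moderate backlog
        System.out.println(target(100, 4, 1, 5)); // large backlog: capped at the maximum
    }
}
```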

I feel the second option works better than the first and is widely used.

Hope this will help.

Answered 2022-12-18 16:25

huangpeishan49
