Author: mobiledu2502883317 | Source: Internet | 2022-12-06 15:28
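I'm using Spark 2.2 Streaming with Kafka from Python 3.5, and my script won't run because the Kafka library is missing.
What confuses me is why the library is missing or can't be found, even though the dependency information comes from Spark's own website.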
groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.2.0
I ran "spark-submit script.py", and the error said the Kafka library is required.
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars ...
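On the next run, I used "spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10:2.2.0 script.py" so that the Kafka library would be downloaded.
This time, the error shows that the library could not be found or downloaded.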
Ivy Default Cache set to: C:\Users\james\.ivy2\cache
The jars for the packages stored in: C:\Users\james\.ivy2\jars
:: loading settings :: url = jar:file:/D:/programs/spark-2.2.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-kafka-0-10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 2908ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.apache.spark#spark-streaming-kafka-0-10;2.2.0
==== local-m2-cache: tried
file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
==== local-ivy-cache: tried
C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\ivys\ivy.xml
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\jars\spark-streaming-kafka-0-10.jar
==== central: tried
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom
-- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1177)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:298)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
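The URLs listed under "central: tried" in the log follow the standard Maven repository layout: the group ID becomes a directory path, followed by the artifact ID, the version, and a file named artifact-version. The sketch below rebuilds those URLs from a coordinate, so a coordinate can be checked by hand (for example in a browser) before passing it to spark-submit. The function name `maven_central_urls` is made up for illustration; it is not part of Spark or Ivy.

```python
def maven_central_urls(group_id, artifact_id, version,
                       base="https://repo1.maven.org/maven2"):
    """Return the (pom_url, jar_url) pair for a coordinate under the Maven layout."""
    # Dots in the group ID become path separators: org.apache.spark -> org/apache/spark
    group_path = group_id.replace(".", "/")
    stem = "{}/{}/{}/{}/{}-{}".format(
        base, group_path, artifact_id, version, artifact_id, version)
    return stem + ".pom", stem + ".jar"


# The coordinate from the failing command, exactly as Ivy received it.
pom_url, jar_url = maven_central_urls(
    "org.apache.spark", "spark-streaming-kafka-0-10", "2.2.0")
print(pom_url)
print(jar_url)
```

The two printed URLs are the ones the log reports as tried on central. Ivy does no fuzzy matching, so the artifact ID has to match the published name exactly.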
T. Gawęda:
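First: as discussed on the developers mailing list, Kafka is not included in the binary distribution. That is why it is not on the classpath.
Second: in the --packages option you need to specify the Scala version as a suffix of the artifact ID. This is required not only in SBT; spark-submit uses Ivy under the hood, which looks up the artifact by its exact name.
So, try: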
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 script.py
Side note: I may open a PR to change the description, since it is misleading.
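As a rough sketch of how the Scala suffix could be chosen rather than guessed: this assumes the Spark binary distribution ships a scala-library-<version>.jar in its jars directory (the same directory that holds ivy-2.4.0.jar in the log above). The helper names `scala_binary_version` and `spark_package` are hypothetical, and the jar listing is an illustrative stand-in for the real directory contents.

```python
import re


def scala_binary_version(jar_names):
    """Infer the Scala binary version (e.g. '2.11') from a scala-library jar name."""
    for name in jar_names:
        match = re.match(r"scala-library-(\d+)\.(\d+)\.\d+.*\.jar$", name)
        if match:
            return "{}.{}".format(match.group(1), match.group(2))
    return None  # no scala-library jar found in the listing


def spark_package(artifact, scala_version, spark_version,
                  group="org.apache.spark"):
    """Build a --packages coordinate with the Scala suffix on the artifact ID."""
    return "{}:{}_{}:{}".format(group, artifact, scala_version, spark_version)


# Illustrative listing (assumption). In practice it could come from
# os.listdir(os.path.join(spark_home, "jars")).
jars = ["ivy-2.4.0.jar", "scala-library-2.11.8.jar"]

scala = scala_binary_version(jars)
package = spark_package("spark-streaming-kafka-0-10", scala, "2.2.0")
command = ["bin/spark-submit", "--packages", package, "script.py"]
print(" ".join(command))
```

With the listing above, this reproduces the command suggested in the answer. The suffix and the Spark version should both match the local installation; a coordinate is only resolvable if that exact combination was published.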
I even added the Scala version, and the library still can't be found. My command: /path/to/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.12:2.4.0 script.py. I get the following: