Author: 好几个健康2002_408 | Source: Internet | 2022-12-06 19:34
I am trying to connect from Spark 2.3 running on IBM Analytics Engine to a ScyllaDB database running on IBM Cloud.
I start the spark-shell like this...
$ spark-shell --master local[1] \
--files jaas.conf \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0,datastax:spark-cassandra-connector:2.3.0-s_2.11,commons-configuration:commons-configuration:1.10 \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
--conf spark.cassandra.connection.host=xxx1.composedb.com,xxx2.composedb.com,xxx3.composedb.com \
--conf spark.cassandra.connection.port=28730 \
--conf spark.cassandra.auth.username=scylla \
--conf spark.cassandra.auth.password=SECRET \
--conf spark.cassandra.connection.ssl.enabled=true \
--num-executors 1 \
--executor-cores 1
Then I run the following Spark Scala code:
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
val stocksRdd = sc.cassandraTable("stocks", "stocks")
stocksRdd.count()
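For comparison, the same table can also be read through the connector's DataFrame API. This is a sketch, assuming the same spark-shell session and the same `stocks` keyspace and table as in the RDD example above:

```scala
// Sketch: read the same table via the DataFrame API of the
// spark-cassandra-connector (keyspace "stocks", table "stocks").
val stocksDf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "stocks", "table" -> "stocks"))
  .load()

stocksDf.count()
```

Both paths go through the same connection settings passed to spark-shell, so they should produce the same warnings and the same count.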
However, I see a bunch of warnings:
18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:06 WARN Session: Error creating pool to /xxx.xxx.xxx.xxx:28730
com.datastax.driver.core.exceptions.ConnectionException: [/xxx.xxx.xxx.xxx:28730] Pool was closed during initialization
...
However, after the stack traces in the warnings, I see the output I expect:
res2: Long = 4
If I navigate to the Compose UI, I see a map JSON:
[
{"xxx.xxx.xxx.xxx:9042":"xxx1.composedb.com:28730"},
{"xxx.xxx.xxx.xxx:9042":"xxx2.composedb.com:28730"},
{"xxx.xxx.xxx.xxx:9042":"xxx3.composedb.com:28730"}
]
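That map suggests Compose translates the cluster's internal addresses (port 9042) into the external portal addresses (port 28730) that clients connect to. You can see what the cluster itself advertises by querying `system.peers` through the connector's session. This is a sketch, assuming the same spark-shell session as above:

```scala
import scala.collection.JavaConverters._
import com.datastax.spark.connector.cql.CassandraConnector

// Sketch: list the peer addresses the cluster itself advertises.
// The driver compares your contact points against these rows; a
// mismatch is what triggers the "system.peers" warning at startup.
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute("SELECT peer, rpc_address FROM system.peers")
    .asScala.foreach(println)
}
```

If the addresses returned here are the internal 9042 endpoints rather than the `composedb.com:28730` contact points, the warning is expected.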
The warnings appear to be related to this map.
What do the warnings mean? Can I ignore them?
Note: I have seen similar questions, but I believe this one is different because of the map, and I have no control over how the ScyllaDB cluster is set up by Compose.
1> Moreno Garci..:
These are just warnings. They occur because the IPs that Spark is trying to reach are not known to Scylla itself. Apparently Spark is connecting to the cluster and retrieving the expected information anyway, so you should be fine.