
Creating a Spark RDD on an S3 file


I'm trying to create a JavaRDD on an S3 file but am not able to create the RDD. Can someone help me solve this problem?

Code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
    JavaSparkContext javaSparkContext = new JavaSparkContext(conf);

    // S3 credentials and filesystem implementation for the s3:// scheme
    javaSparkContext.hadoopConfiguration().set("fs.s3.awsAccessKeyId", accessKey);
    javaSparkContext.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", secretKey);
    javaSparkContext.hadoopConfiguration().set("fs.s3.impl",
            "org.apache.hadoop.fs.s3native.NativeS3FileSystem");

    JavaRDD<String> rawData = javaSparkContext.textFile("s3://mybucket/sample.txt");

This code throws the following exception:

2015-05-06 18:58:57 WARN  LoadSnappy:46 - Snappy native library not loaded
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
    at org.apache.hadoop.fs.Path.initialize(Path.java:148)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:50)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:177)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1189)
    at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:477)
    at org.apache.spark.api.java.JavaRDD.first(JavaRDD.scala:32)
    at com.cignifi.DataExplorationValidation.processFile(DataExplorationValidation.java:148)
    at com.cignifi.DataExplorationValidation.main(DataExplorationValidation.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
    at java.net.URI$Parser.fail(URI.java:2829)
    at java.net.URI$Parser.failExpecting(URI.java:2835)
    at java.net.URI$Parser.parse(URI.java:3038)
    at java.net.URI.<init>(URI.java:753)
    at org.apache.hadoop.fs.Path.initialize(Path.java:145)
    ... 36 more

Some more details


Spark version 1.3.0.


Running in local mode using spark-submit.

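Note that the hard-coded .setMaster("local") in the code above takes precedence over any --master flag passed to spark-submit. A minimal sketch of the usual alternative, assuming the same appName variable, is to leave the master out of the code and let spark-submit supply it:

    // Omit setMaster() so spark-submit's --master flag decides
    // where the job runs (local, a cluster on EC2, etc.).
    SparkConf conf = new SparkConf().setAppName(appName);
    JavaSparkContext javaSparkContext = new JavaSparkContext(conf);

This does not affect the exception itself, which comes from the URI scheme.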

I tried this on my local machine and on an EC2 instance; in both cases I get the same error.

1 Solution

#1


It should be s3n:// instead of s3://.
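A minimal corrected sketch of the question's code, assuming the same accessKey and secretKey variables: the URI scheme changes to s3n://, and the credential keys change to the matching fs.s3n.* names (NativeS3FileSystem is already bound to s3n:// by default, so the fs.s3.impl override is no longer needed):

    // s3n:// selects Hadoop's NativeS3FileSystem, which reads ordinary
    // S3 objects; its credentials live under fs.s3n.*, not fs.s3.*.
    javaSparkContext.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", accessKey);
    javaSparkContext.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", secretKey);

    JavaRDD<String> rawData = javaSparkContext.textFile("s3n://mybucket/sample.txt");

In Hadoop of this era, s3:// referred to the block-based S3 filesystem and s3n:// to the native one; later Hadoop versions replace both with s3a://.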

See "External Datasets" in the Spark Programming Guide.

