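I'm new to Spark and tried to run the JavaSparkPi.java example. It runs fine, but because I need to use it from another Java class, I copied everything from main into a method of the class and call that method from main. Now it fails with: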
org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException
The code looks like this:
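import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.VoidFunction;

public class JavaSparkPi {

    public void cal() {
        JavaSparkContext jsc = new JavaSparkContext("local", "JavaLogQuery");
        int slices = 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }
        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);
        System.out.println("count is: " + dataSet.count());

        // Anonymous class created inside an instance method
        dataSet.foreach(new VoidFunction<Integer>() {
            public void call(Integer i) {
                System.out.println(i);
            }
        });

        // 1 if a random point falls inside the unit circle, else 0; then sum
        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) throws Exception {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) throws Exception {
                return integer + integer2;
            }
        });
        System.out.println("Pi is roughly " + 4.0 * count / n);
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi myClass = new JavaSparkPi();
        myClass.cal();
    }
}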
Does anyone have an idea what is going on? Thanks!
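The nested functions (the anonymous classes created inside cal()) hold a reference to the enclosing object (JavaSparkPi), so that object gets serialized along with them. For this to work, it needs to be serializable. That is easy to do: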
public class JavaSparkPi implements Serializable { ...
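The effect can be reproduced without Spark, using only plain Java serialization. The following is a minimal sketch, not Spark's actual code path: `CaptureDemo`, `Task`, `Outer`, `SerializableOuter`, and `StaticTask` are hypothetical names made up for illustration, with `Task` standing in for Spark's serializable function types. The anonymous class reads a field of its enclosing object so that it certainly holds a reference to it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {

    // Hypothetical stand-in for a serializable function type.
    interface Task extends Serializable {
        int call(int i);
    }

    // Not Serializable, like the JavaSparkPi class in the question.
    static class Outer {
        int offset = 1;

        Task anonymousTask() {
            // Anonymous class in an instance method: it reads 'offset',
            // so it keeps a hidden reference to this Outer instance.
            return new Task() {
                @Override
                public int call(int i) {
                    return i + offset;
                }
            };
        }
    }

    // Same code, but the enclosing class is Serializable (the fix above).
    static class SerializableOuter implements Serializable {
        int offset = 1;

        Task anonymousTask() {
            return new Task() {
                @Override
                public int call(int i) {
                    return i + offset;
                }
            };
        }
    }

    // Alternative: a static nested class has no enclosing instance at all.
    static class StaticTask implements Task {
        private final int offset;

        StaticTask(int offset) {
            this.offset = offset;
        }

        @Override
        public int call(int i) {
            return i + offset;
        }
    }

    // Returns true if plain Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Outer().anonymousTask()));             // false
        System.out.println(canSerialize(new SerializableOuter().anonymousTask())); // true
        System.out.println(canSerialize(new StaticTask(1)));                       // true
    }
}
```

Making the enclosing class Serializable is the smallest change, but it means the whole object travels with every function. If the class holds fields that cannot or should not be serialized, moving the functions into static nested classes (or top-level classes) avoids capturing it in the first place.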