我正在Cloudera集群上以YARN客户端模式启动分布式Spark应用程序.过了一段时间,我在Cloudera Manager上看到了一些错误.一些执行器断开连接,系统地发生这种情况.我想调试该问题,但YARN没有报告内部异常.
Exception from container-launch with container ID: container_1417503665765_0193_01_000003 and exit code: 1 ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
如何查看异常的堆栈跟踪?似乎YARN仅报告应用程序异常退出.有没有办法在YARN配置中查看spark executor日志?