3.2 When master is local
First, look at the combinations that SparkSubmit rejects when an application is submitted; the corresponding code is shown below:
private[deploy] def prepareSubmitEnvironment(args: SparkSubmitArguments)
……
case (LOCAL, CLUSTER) =>
  printErrorAndExit("Cluster deploy mode is not compatible with master \"local\"")
……
That is, with the local and local-cluster master settings, submitting an application in CLUSTER deploy mode is not supported.
Accordingly, all of the cases below submit the application in CLIENT deploy mode.
3.2.1 Environment-variable approach
1. Command:
SPARK_CLASSPATH=$SPARK_HOME/ojdbc14.jar …
2. Notes:
Setting SPARK_CLASSPATH is equivalent to setting both the driver and the executor classpath. Per the earlier analysis, in local mode only the driver-side classpath actually matters; the required jar must also be placed at that path by hand, otherwise a driver-class-not-found exception is thrown.
See the highlighted portion of the startup log in the earlier startup section: SPARK_CLASSPATH has been deprecated, and the corresponding configuration properties should be used instead.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
16/04/18 11:56:35 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '$SPARK_HOME/lib/ojdbc14.jar' as a work-around.
16/04/18 11:56:35 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '$SPARK_HOME/lib/ojdbc14.jar' as a work-around.
3. Test case 1: the required jar is absent from the path named in the environment variable, so class loading fails on both the driver and the executor side. The command and its exception log are shown below:
[hdfs@nodemaster spark-1.5.2-bin-hadoop2.6]$ SPARK_CLASSPATH=$SPARK_HOME/ojdbc14.jar $SPARK_HOME/bin/spark-submit --master local \
> --deploy-mode client \
> --driver-memory 2g \
> --driver-cores 1 \
> --total-executor-cores 2 \
> --executor-memory 4g \
> --conf "spark.ui.port"=4081 \
> --class com.mb.TestJarwithOracle \
> /tmp/test/Spark15.jar
16/04/26 10:31:39 INFO spark.SparkContext: Running Spark version 1.5.2
16/04/26 10:31:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/26 10:31:40 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/ojdbc14.jar').
This is deprecated in Spark 1.0+.
Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
16/04/26 10:31:40 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/ojdbc14.jar' as a work-around.
16/04/26 10:31:40 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/ojdbc14.jar' as a work-around.
16/04/26 10:31:40 INFO spark.SecurityManager: Changing view acls to: hdfs
16/04/26 10:31:40 INFO spark.SecurityManager: Changing modify acls to: hdfs
16/04/26 10:31:40 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
16/04/26 10:31:41 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/04/26 10:31:41 INFO Remoting: Starting remoting
16/04/26 10:31:42 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.149.86:32898]
16/04/26 10:31:42 INFO util.Utils: Successfully started service 'sparkDriver' on port 32898.
16/04/26 10:31:42 INFO spark.SparkEnv: Registering MapOutputTracker
16/04/26 10:31:42 INFO spark.SparkEnv: Registering BlockManagerMaster
16/04/26 10:31:42 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-43894e1f-4546-477c-91f9-766179306112
16/04/26 10:31:42 INFO storage.MemoryStore: MemoryStore started with capacity 1060.3 MB
16/04/26 10:31:42 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-0294727b-0b57-48ff-9f36-f441fa3604aa/httpd-cec03f6e-ca66-49b8-94b9-39427a86ed65
16/04/26 10:31:42 INFO spark.HttpServer: Starting HTTP Server
16/04/26 10:31:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/26 10:31:42 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57115
16/04/26 10:31:42 INFO util.Utils: Successfully started service 'HTTP file server' on port 57115.
16/04/26 10:31:42 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/04/26 10:31:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/26 10:31:57 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4081
16/04/26 10:31:57 INFO util.Utils: Successfully started service 'SparkUI' on port 4081.
16/04/26 10:31:57 INFO ui.SparkUI: Started SparkUI at http://192.168.149.86:4081
16/04/26 10:31:57 INFO spark.SparkContext: Added JAR file:/tmp/test/Spark15.jar at http://192.168.149.86:57115/jars/Spark15.jar with timestamp 1461637917682
16/04/26 10:31:57 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/04/26 10:31:57 INFO executor.Executor: Starting executor ID driver on host localhost
16/04/26 10:31:58 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 23561.
16/04/26 10:31:58 INFO netty.NettyBlockTransferService: Server created on 23561
16/04/26 10:31:58 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/04/26 10:31:58 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:23561 with 1060.3 MB RAM, BlockManagerId(driver, localhost, 23561)
16/04/26 10:31:58 INFO storage.BlockManagerMaster: Registered BlockManager
16/04/26 10:31:59 INFO scheduler.EventLoggingListener: Logging events to hdfs://nodemaster:8020/user/hdfs/sparklogs/local-1461637917735
delete from TEST_TABLE where log_date in ('date')
java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.mb.TestJarwithOracle$.deleteRecodes(TestJarwithOracle.scala:38)
    at com.mb.TestJarwithOracle$.main(TestJarwithOracle.scala:27)
    at com.mb.TestJarwithOracle.main(TestJarwithOracle.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/26 10:31:59 INFO spark.SparkContext: Starting job: foreachPartition at TestJarwithOracle.scala:28
16/04/26 10:31:59 INFO scheduler.DAGScheduler: Got job 0 (foreachPartition at TestJarwithOracle.scala:28) with 1 output partitions
16/04/26 10:31:59 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (foreachPartition at TestJarwithOracle.scala:28)
16/04/26 10:31:59 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/26 10:31:59 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/26 10:32:00 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at TestJarwithOracle.scala:26), which has no missing parents
16/04/26 10:32:00 INFO storage.MemoryStore: ensureFreeSpace(1200) called with curMem=0, maxMem=1111794647
16/04/26 10:32:00 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1200.0 B, free 1060.3 MB)
16/04/26 10:32:00 INFO storage.MemoryStore: ensureFreeSpace(851) called with curMem=1200, maxMem=1111794647
16/04/26 10:32:00 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 851.0 B, free 1060.3 MB)
16/04/26 10:32:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:23561 (size: 851.0 B, free: 1060.3 MB)
16/04/26 10:32:00 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/04/26 10:32:00 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at TestJarwithOracle.scala:26)
16/04/26 10:32:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/04/26 10:32:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2310 bytes)
16/04/26 10:32:00 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
16/04/26 10:32:00 INFO executor.Executor: Fetching http://192.168.149.86:57115/jars/Spark15.jar with timestamp 1461637917682
16/04/26 10:32:00 INFO util.Utils: Fetching http://192.168.149.86:57115/jars/Spark15.jar to /tmp/spark-0294727b-0b57-48ff-9f36-f441fa3604aa/userFiles-9b936a62-13aa-4ac4-8c26-caabe7bd4367/fetchFileTemp8857984169855770119.tmp
16/04/26 10:32:00 INFO executor.Executor: Adding file:/tmp/spark-0294727b-0b57-48ff-9f36-f441fa3604aa/userFiles-9b936a62-13aa-4ac4-8c26-caabe7bd4367/Spark15.jar to class loader
java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.mb.TestJarwithOracle$.insertInto(TestJarwithOracle.scala:61)
    at com.mb.TestJarwithOracle$$anonfun$main$1.apply(TestJarwithOracle.scala:28)
    at com.mb.TestJarwithOracle$$anonfun$main$1.apply(TestJarwithOracle.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/04/26 10:32:00 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 915 bytes result sent to driver
16/04/26 10:32:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 235 ms on localhost (1/1)
The two exceptions correspond to the driver side and the executor side, respectively.
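Both exceptions are the same Class.forName lookup failing in two different JVMs' effective classpaths. A minimal sketch of that check (only the Oracle class name is taken from the log; everything else is illustrative):

```java
// Minimal sketch: the exceptions above are Class.forName failing because the
// named class is not on the effective classpath of the JVM doing the lookup.
public class DriverCheck {
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With ojdbc14.jar absent from the classpath this prints false,
        // matching the ClassNotFoundException in the log.
        System.out.println(isLoadable("oracle.jdbc.driver.OracleDriver"));
    }
}
```

Running this inside the driver and inside a task would show exactly which side is missing the jar.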
4. Test case 2: the required jar is present at the path named in the environment variable, so class loading succeeds on both the driver and the executor side.
The environment variable sets both the driver-side and the executor-side classpath; as long as the jar is at that path, the driver class loads.
Next, the same settings are made individually via configuration properties and tested.
3.2.2 Configuration-property approach
1. Command for submitting the application via configuration properties:
$SPARK_HOME/bin/spark-submit --master spark://masternode:7078 \
  --conf "spark.executor.extraClassPath"="$SPARK_HOME/lib/ojdbc14.jar" \
  --conf "spark.driver.extraClassPath"="$SPARK_HOME/lib/ojdbc14.jar" \
  --conf "spark.ui.port"=4061 \
  --class com.TestClass \
  /tmp/test/SparkTest.jar
From the earlier analysis, under local mode only "spark.driver.extraClassPath" takes effect.
2. Notes:
Here the configuration properties merely put the jar's path on the classpath; the jar itself has not been deployed to that path, so loading the driver class throws java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver.
3. The execution log is shown below:
16/04/14 14:23:18 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
16/04/14 14:23:19 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, 192.168.149.98): java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.TestClass$.insertInto(WebLogExtractor.scala:48)
    at com.TestClass$$anonfun$main$1.apply(WebLogExtractor.scala:43)
    at com.TestClass$$anonfun$main$1.apply(WebLogExtractor.scala:43)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
4. Fix:
Copy the ojdbc14.jar to the path given by "spark.driver.extraClassPath" on the current node.
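Before copying, it is worth confirming that the jar actually contains the driver class; a small JDK-only sketch (jar paths and names here are illustrative, not from the source):

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// A jar is a zip archive, so checking for the expected .class entry only
// needs java.util.zip. Run this against the jar before copying it to the
// spark.driver.extraClassPath location.
public class JarCheck {
    static boolean containsClass(String jarPath, String className) throws IOException {
        // e.g. "oracle.jdbc.driver.OracleDriver" -> "oracle/jdbc/driver/OracleDriver.class"
        String entry = className.replace('.', '/') + ".class";
        try (ZipFile zf = new ZipFile(jarPath)) {
            return zf.getEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(containsClass(args[0], "oracle.jdbc.driver.OracleDriver"));
    }
}
```

A wrong or repackaged jar (containing the driver under a different package) fails this check and would fail at run time the same way.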
5. Extension:
You can test the various combinations of setting or not setting "spark.executor.extraClassPath" and "spark.driver.extraClassPath" to verify the earlier analysis that under local mode only "spark.driver.extraClassPath" takes effect.
To summarize again: it is enough to set "spark.driver.extraClassPath" and place the jar at that path.
3.2.3 Automatic jar upload approach
When the deploy mode is CLIENT, the application (i.e., the driver) sets childMainClass to the supplied mainClass and then launches the JVM process; the corresponding code is shown below:
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (isUserJar(args.primaryResource)) {
    childClasspath += args.primaryResource
  }
  if (args.jars != null) { childClasspath ++= args.jars.split(",") }
  if (args.childArgs != null) { childArgs ++= args.childArgs }
}
In client mode, the application's main class is launched directly, and both the main class's jar and any jars added via the arguments are placed on the runtime classpath; that is, the driver's classpath automatically includes the jars given by --jars.
In addition, the driver serves these jars over the HTTP file server it starts; each executor downloads them at run time and places them on its own process classpath.
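The executor-side step, visible in the log line "Adding file:...Spark15.jar to class loader", amounts to resolving classes through a URLClassLoader that includes the fetched jar. A hedged sketch of just that mechanism, with no Spark internals and a hypothetical jar URL:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of the executor-side mechanism: after fetching a jar from the
// driver's HTTP file server, classes are resolved through a URLClassLoader
// whose URL list includes the downloaded jar.
public class ExecutorClasspathSketch {
    static ClassLoader loaderWith(URL[] fetchedJars) {
        // Parent delegation keeps classes already on the process classpath visible.
        return new URLClassLoader(fetchedJars, ExecutorClasspathSketch.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // With a real fetched jar you would pass its file:// URL here, e.g.
        // new URL("file:/tmp/.../Spark15.jar"); an empty list still delegates
        // to the parent for classes already on the classpath.
        ClassLoader cl = loaderWith(new URL[0]);
        System.out.println(cl.loadClass("java.lang.String").getName());
    }
}
```

This is why --jars works without any extraClassPath setting: both sides end up with the jar reachable through their class loaders.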
Building the test case: remove the environment variable and the two configuration properties used earlier, and instead specify the required third-party jar (here, the driver-class jar) with the --jars option. For example:
$SPARK_HOME/bin/spark-submit --master local \
  --deploy-mode client \
  --driver-memory 2g \
  --driver-cores 1 \
  --total-executor-cores 2 \
  --executor-memory 4g \
  --conf "spark.ui.port"=4081 \
  --class com.mb.TestJarwithOracle \
  --jars "$SPARK_HOME/thirdlib/ojdbc14.jar" \
  /tmp/test/Spark15.jar
In this case the jars specified by --jars are added directly to the classpath, so the driver class loads successfully.
When the application is submitted to a cluster, behavior differs by deploy mode (--deploy-mode), so the following sections analyze each deploy mode separately.