1概述
本文主要内容包含Spark SQL读写Oracle表数据的简单案例,并针对案例中比较常见的几个问题给出解决方法。
最后从常见的java.lang.ClassNotFoundException(无法找到驱动类)的异常问题出发,分析相关的几种解决方法,以及各个解决方法之间的异同点。
2案例中比较常见问题及其解决方法
2.1 启动
首先查看Spark 官网给出的SparkSQL的编程指南部分(http://spark.apache.org/docs/latest/sql-programming-guide.html)的JDBC To Other Databases 内容。参考命令:
SPARK_CLASSPATH=postgresql-9.3-1102-jdbc41.jar bin/spark-shell |
对应写出访问 Oracle的命令,如下:
SPARK_CLASSPATH=$SPARK_HOME/ojdbc14.jar bin/spark-shell --master local |
其中,CLASSPATH相关内容会在后一章节给出详细分析,在此仅针对其他一些常见问题给出解决方法。
启动过程如下(部分字符串已经被替换,如:$SPARK_HOME):
[hdfs@masternode spark-1.5.2-bin-hadoop2.6]$ SPARK_CLASSPATH=$SPARK_HOME/lib/ojdbc14.jar bin/spark-shell --master local …… Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.5.2 /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71) Type in expressions to have them evaluated. Type :help for more information. 16/04/18 11:56:35 INFO spark.SparkContext: Running Spark version 1.5.2 16/04/18 11:56:35 WARN spark.SparkConf: SPARK_CLASSPATH was detected (set to '$SPARK_HOME/lib/ojdbc14.jar'). This is deprecated in Spark 1.0+.
Please instead use: - ./spark-submit with --driver-class-path to augment the driver classpath - spark.executor.extraClassPath to augment the executor classpath
16/04/18 11:56:35 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '$SPARK_HOME/lib/ojdbc14.jar' as a work-around. 16/04/18 11:56:35 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '$SPARK_HOME/lib/ojdbc14.jar' as a work-around. …… 16/04/18 11:56:51 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 16/04/18 11:56:51 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 16/04/18 11:56:51 INFO ui.SparkUI: Started SparkUI at http://192.168.149.86:4040 …… 16/04/18 11:56:57 INFO repl.SparkILoop: Created sql context (with Hive support).. SQL context available as sqlContext.
|
下面给出简单读写案例中的常见问题及其解决方法。
2.2 访问表数据时报无法找到驱动类
scala> val url = "jdbc:oracle:thin:@xx.xx.xx.xx:1521:dbsid"
url: String = jdbc:oracle:thin:@xx.xx.xx.xx:1521:dbsid
// 通过format指定访问jdbc,通过options指定访问时的参数,然后load加载数据
scala> val jdbcDF =sqlContext.read.format("jdbc").options( Map( "url" ->url, "dbtable" -> "MyTable", "user" ->"username", "password" -> " password ")).load()
java.sql.SQLException: No suitable driverfound for jdbc:oracle:thin:@xx.xx.xx.xx:1521:dbsid
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:188)
atorg.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
atorg.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:121)
atorg.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
atorg.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
……
报错信息为:
java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:@xx.xx.xx.xx:1521:dbsid |
即报错提示无法找到合适的驱动时,此时可以通过在options方法中指定”driver”属性为具体的驱动类来解决,如下所示:
scala> val jdbcDF =sqlContext.read.format("jdbc").options( Map( "url" ->url, "dbtable" -> "TEST_TABLE", "user" ->"userName", "password" -> "password","driver" -> "oracle.jdbc.driver.OracleDriver")).load()
jdbcDF: org.apache.spark.sql.DataFrame = [ID:decimal(0,-127), XX: string, XX: string, XX: string, XX: decimal(0,-127)]
注意:在1.5版本中,当Oracle表字段为Number时,对应DataType为decimal,此时会由于scala的精度断言抛出异常——可以在stackoverflow网站查找该异常——应该是个bug,在1.6中应该已经解决。有兴趣可以试下下。——如果继续show数据的话,会抛出该异常。
补充:手动加载驱动类也可以解决。
2.3 访问表数据时报无效调用参数
另外,当属性名写错时的错误信息:
——由于在通常写jdbc的连接时,使用username标识用户名,(比如在sqoop工具中),在SparkSQL中需要使用user,否则会报以下错误(可以在官网或API的描述中查找相关属性名):
scala> val jdbcDF =sqlContext.read.format("jdbc").options( Map( "url" ->url, "dbtable" -> "TEST_TABLE", "username"-> "userName", "password"-> "password")).load()
java.sql.SQLException: invalid arguments incall
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:236)
当出现invalid arguments in call错误,表示调用时使用了无效参数,可以检查options中的参数名是否写错,尤其是user是否写成了username。
下面解析最常见的无法找到驱动类的问题及其解决方法。