
Roadblocks You Will Hit Running Spark + Hive in Production

若泽大数据 2019-10-17

None of these problems is hard. What matters is how you go about troubleshooting them.


1. Error: Unable to instantiate SparkSession with Hive support because Hive classes are not found.

Before running anything, copy Hadoop's core-site.xml and Hive's hive-site.xml into the project.
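A quick way to confirm the copy worked is to ask the class loader for both files. The sketch below is only an illustration: it assumes the files were copied to a location that ends up on the classpath (for example src/main/resources in a Maven layout, which the original does not specify), and the object and method names are hypothetical.

```scala
object ConfigOnClasspath {
  // Returns the subset of `names` that the context class loader cannot find.
  def missingResources(names: Seq[String]): Seq[String] = {
    val loader = Thread.currentThread().getContextClassLoader
    names.filter(name => loader.getResource(name) == null)
  }

  def main(args: Array[String]): Unit = {
    // The two files named in the text; their location is an assumption.
    val missing = missingResources(Seq("core-site.xml", "hive-site.xml"))
    if (missing.isEmpty) println("Both config files are on the classpath.")
    else println(s"Not on the classpath: ${missing.mkString(", ")}")
  }
}
```

If a file is reported as missing, the IDE or build tool is probably not treating the directory it was copied into as a resource root.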
1.1 Test code

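import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val spark: SparkSession = SparkSession
    .builder()
    .appName("www.ruozedata.com")
    .master("local[2]")
    .enableHiveSupport() // the exception in 1.2 is thrown here
    .getOrCreate()
  // Read the Hive table and print the first 10 rows
  val userClickDF = spark.table("user_click")
  userClickDF.show(10)
}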

1.2 Error

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
    at homework0522.OverwriteTopN$.main(OverwriteTopN.scala:12)
    at homework0522.OverwriteTopN.main(OverwriteTopN.scala)

1.3 Reading the source

// SparkSession.scala
/**
 * Enables Hive support, including connectivity to a persistent Hive metastore, support for
 * Hive serdes, and Hive user-defined functions.
 *
 * @since 2.0.0
 */
def enableHiveSupport(): Builder = synchronized {
  // Note: the Hive classes are not found when this condition is evaluated
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
}

/**
 * @return true if Hive classes can be loaded, otherwise false.
 */
private[spark] def hiveClassesArePresent: Boolean = {
  try {
    // Note: the two classes below are looked up via Class.forName;
    // the lookup already fails on the first one
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}

1.4 The missing class is HiveSessionStateBuilder

private val HIVE_SESSION_STATE_BUILDER_CLASS_NAME =
  "org.apache.spark.sql.hive.HiveSessionStateBuilder"

1.5 Solution
Add spark-hive_2.11-2.4.2.jar and spark-hive-thriftserver_2.11-2.4.2.jar from $HIVE_HOME/lib to the project.
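To check whether the jars are actually visible to the application, the same lookup that hiveClassesArePresent performs can be repeated by hand. This is a minimal sketch using only Class.forName; the object and method names are hypothetical, and it loads classes without initializing them, which is a simplification rather than a copy of Spark's Utils.classForName.

```scala
object HiveClassProbe {
  // A class counts as present if the context class loader can load it.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // The two class names that the Spark source shown in 1.3 and 1.4 checks for.
    Seq(
      "org.apache.spark.sql.hive.HiveSessionStateBuilder",
      "org.apache.hadoop.hive.conf.HiveConf"
    ).foreach { name =>
      println(s"$name -> ${if (isLoadable(name)) "found" else "MISSING"}")
    }
  }
}
```

Running this from the same module as the failing job shows which of the two classes is still missing, before SparkSession is involved at all.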


2. Next error: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME

2.1 Error

Exception in thread "main" java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
    at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:194)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:285)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)

2.2 Reading the source

// HiveUtils.scala
/**
 * Change time configurations needed to create a [[HiveClient]] into unified [[Long]] format.
 */
private[hive] def formatTimeVarsForHiveClient(hadoopConf: Configuration): Map[String, String] = {
  // Hive 0.14.0 introduces timeout operations in HiveConf, and changes default values of a bunch
  // of time `ConfVar`s by adding time suffixes (`s`, `ms`, and `d` etc.). This breaks backwards-
  // compatibility when users are trying to connecting to a Hive metastore of lower version,
  // because these options are expected to be integral values in lower versions of Hive.
  //
  // Here we enumerate all time `ConfVar`s and convert their values to numeric strings according
  // to their output time units.
  Seq(
    ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY -> TimeUnit.SECONDS,
    ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT -> TimeUnit.SECONDS,
    // Note: this is where it fails; the value cannot be read
    ConfVars.METASTORE_CLIENT_SOCKET_LIFETIME -> TimeUnit.SECONDS,
    ...
  ).map { case (confVar, unit) =>
    confVar.varname -> HiveConf.getTimeVar(hadoopConf, confVar, unit).toString
  }.toMap
}


Step into ConfVars:

// HiveConf.java
public static enum ConfVars {
    SCRIPTWRAPPER("hive.exec.script.wrapper", (Object)null, ""),
    PLAN("hive.exec.plan", "", ""),
    ...

The constants defined in ConfVars do not include METASTORE_CLIENT_SOCKET_LIFETIME. This HiveConf.java comes from hive-exec-1.1.0-cdh5.7.0.jar, which shows that Hive 1.1.0 had not yet added this parameter.
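The same finding can be reproduced without opening the decompiled source, by asking the JVM which jar a class was loaded from and whether it declares a given constant. The helper below is a hedged sketch built only on standard reflection; the object and method names are hypothetical, and it relies on the fact that enum constants are public static fields.

```scala
object ConfVarProbe {
  private def load(className: String): Class[_] =
    Class.forName(className, false, Thread.currentThread().getContextClassLoader)

  // True if the class loads and exposes a public field with this name.
  def hasField(className: String, fieldName: String): Boolean =
    try {
      load(className).getField(fieldName)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError |
           _: NoSuchFieldException => false
    }

  // Location (usually a jar) the class was loaded from, if the JVM reports one.
  def jarOf(className: String): Option[String] =
    try {
      Option(load(className).getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }

  def main(args: Array[String]): Unit = {
    // Binary name of the nested enum; `$` is literal in a plain string.
    val confVars = "org.apache.hadoop.hive.conf.HiveConf$ConfVars"
    println(s"loaded from: ${jarOf(confVars).getOrElse("unknown")}")
    println(s"has METASTORE_CLIENT_SOCKET_LIFETIME: " +
      hasField(confVars, "METASTORE_CLIENT_SOCKET_LIFETIME"))
  }
}
```

When several Hive versions are on the classpath, the reported location tells you which one actually wins.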

2.3 Solution

Switch the Hive dependency to 1.2.1.

<properties>
  ...
  <!-- <hive.version>1.1.0-cdh5.7.0</hive.version> -->
  <hive.version>1.2.1</hive.version>
</properties>

...
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
</dependency>


3. Next error: Could not connect to meta store

3.1 Error

Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.reflect.InvocationTargetException
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
Caused by: java.net.ConnectException: Connection refused: connect

3.2 Solution
This happens because Hive is not running on the remote side. Hive has to be started with the metastore service running:
$HIVE_HOME/bin/hive --service metastore &
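Since the root cause is a refused TCP connection, it can help to test the metastore endpoint directly before rerunning the Spark job. The sketch below is an assumption-laden illustration: the object and method names are hypothetical, the host defaults to localhost, and port 9083 is only the usual metastore default; the real host and port should be taken from hive.metastore.uris in hive-site.xml.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object MetastoreProbe {
  // True if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // ConnectException, SocketTimeoutException and UnknownHostException
      // are all IOExceptions.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed defaults; override with: MetastoreProbe <host> <port>
    val host = if (args.length > 0) args(0) else "localhost"
    val port = if (args.length > 1) args(1).toInt else 9083
    val state = if (canConnect(host, port)) "reachable" else "NOT reachable"
    println(s"metastore at $host:$port is $state")
  }
}
```

A successful connection only proves that something is listening on the port. It does not verify that the listener is a Hive metastore.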


