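Importing Hive Metadata into Apache Atlas and Common Atlas Configuration
Metadata-driven: using data to solve data problems; Apache Atlas Hook & Bridge for Apache Hive.
Outline:
1. Importing Hive metadata
2. Common Apache Atlas configuration
This post continues from the previous two:
[Building, Deploying, Configuring, and Running Apache Atlas from Source]
[Getting Started with Apache Atlas Metadata Management]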
Importing Hive metadata
Edit Hive's configuration file hive-site.xml and add the hive.exec.post.hooks parameter:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
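Before restarting Hive it can help to confirm that the property actually landed in the file. The following is a minimal sketch, assuming the <name> and <value> elements sit on adjacent lines as in the snippet above; the function name has_atlas_hook is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: check whether a hive-site.xml registers the Atlas Hive hook.
# Assumption: <name> and <value> are within two lines of each other,
# as in the snippet above. has_atlas_hook is a hypothetical helper name.

# has_atlas_hook FILE
# Exit status 0 if hive.exec.post.hooks is set to the Atlas HiveHook class.
has_atlas_hook() {
    local site_file="$1"
    grep -A 2 '<name>hive.exec.post.hooks</name>' "$site_file" \
        | grep -q 'org.apache.atlas.hive.hook.HiveHook'
}
```

Usage would look like: has_atlas_hook /Users/shaozhipeng/Development/hive-3.1.1/conf/hive-site.xml && echo "hook registered"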
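Locate the hive-hook package and unpack it. It is in the build output directory from the previous post, apache-atlas-sources-2.1.0/distro/target/, as the archive apache-atlas-2.1.0-hive-hook.tar.gz. After unpacking, copy its contents into the Atlas deployment directory: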
$ tar zxvf apache-atlas-2.1.0-hive-hook.tar.gz
$ ls apache-atlas-hive-hook-2.1.0/
hook hook-bin
$ mkdir apache-atlas-2.1.0/hook/
$ cp -r apache-atlas-hive-hook-2.1.0/hook/hive apache-atlas-2.1.0/hook/
$ cp -r apache-atlas-hive-hook-2.1.0/hook-bin apache-atlas-2.1.0
$ ls apache-atlas-2.1.0
DISCLAIMER.txt NOTICE conf hook logs server
LICENSE bin data hook-bin models tools
In Hive's environment configuration file hive-env.sh, add the path to the hive-hook jars:
export HIVE_AUX_JARS_PATH=/Users/shaozhipeng/Development/project/java/atlas-download/apache-atlas-2.1.0/hook/hive
Link Atlas's configuration file atlas-application.properties and its environment script atlas-env.sh into Hive's conf directory:
$ pwd
/Users/shaozhipeng/Development/hive-3.1.1/conf
$ ln -s /Users/shaozhipeng/Development/project/java/atlas-download/apache-atlas-2.1.0/conf/atlas-application.properties /Users/shaozhipeng/Development/hive-3.1.1/conf/atlas-application.properties
$ ln -s /Users/shaozhipeng/Development/project/java/atlas-download/apache-atlas-2.1.0/conf/atlas-env.sh /Users/shaozhipeng/Development/hive-3.1.1/conf/atlas-env.sh
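The two ln commands above can be wrapped so they are safe to re-run. This is a rough sketch rather than part of Atlas; link_atlas_conf is a hypothetical helper, and it assumes both files already exist in the Atlas conf directory.

```shell
#!/usr/bin/env bash
# Sketch: link the two Atlas config files into Hive's conf directory.
# link_atlas_conf is a hypothetical helper, not an Atlas script.

# link_atlas_conf ATLAS_CONF_DIR HIVE_CONF_DIR
link_atlas_conf() {
    local atlas_conf="$1"
    local hive_conf="$2"
    local name
    for name in atlas-application.properties atlas-env.sh; do
        if [ ! -f "$atlas_conf/$name" ]; then
            echo "missing: $atlas_conf/$name" >&2
            return 1
        fi
        # -s creates a symbolic link; -f replaces a link left by an earlier run
        ln -sf "$atlas_conf/$name" "$hive_conf/$name"
    done
}
```

With the paths used in this post, the call would be: link_atlas_conf /Users/shaozhipeng/Development/project/java/atlas-download/apache-atlas-2.1.0/conf /Users/shaozhipeng/Development/hive-3.1.1/conf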
Create the Solr index edge_index:
$ ~/Development/solr-7.5.0/bin/solr create -c edge_index -d ~/Development/solr-7.5.0/atlas_conf
...
Created new core 'edge_index'
Import the Hive metadata
Note that the Atlas username/password is admin/admin. If the import fails, check Atlas's logs/application.log; error details are usually written to that file.
$ ls
import-hive.sh
$ sh import-hive.sh
...
... INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Successfully imported 26 tables from database default
Hive Meta Data imported successfully!!!
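When the import does fail, the log can be long. A small sketch for pulling out the most recent error lines is shown below; it assumes log lines carry the level as the literal text ERROR (the same style as the INFO line above), and show_recent_errors is a name invented here.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent ERROR lines from an Atlas log file.
# Assumption: the log level appears as the literal text "ERROR".
# show_recent_errors is a hypothetical helper name.

# show_recent_errors LOG_FILE [COUNT]
# Prints up to COUNT (default 20) matching lines, prefixed with line numbers.
show_recent_errors() {
    local log_file="$1"
    local count="${2:-20}"
    grep -n 'ERROR' "$log_file" | tail -n "$count"
}
```

For example, from the Atlas deployment directory: show_recent_errors logs/application.log 10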
Common Apache Atlas configuration
Atlas Server memory settings in atlas-env.sh
# Set Atlas memory
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
# Recommended for JDK 1.7 (keep only one of the two ATLAS_SERVER_HEAP lines)
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=3072m -XX:PermSize=100M -XX:MaxPermSize=512m"
# Recommended for JDK 1.8
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
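# Mac OS users need the following (note: this assignment replaces the ATLAS_SERVER_OPTS value set earlier in the file)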
export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Parameter note: -XX:SoftRefLRUPolicyMSPerMB=0
This setting is particularly helpful for GC performance under query-heavy workloads with many concurrent users.
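Since the file lists two ATLAS_SERVER_HEAP values and only the last export takes effect, one option is to pick the value by JDK version. The sketch below simply returns the flags quoted in this post for JDK 7 or 8; atlas_heap_opts is a hypothetical helper, and other JDK versions are deliberately left unhandled because the post gives no recommendation for them.

```shell
#!/usr/bin/env bash
# Sketch: choose the ATLAS_SERVER_HEAP value from this post by JDK major version.
# atlas_heap_opts is a hypothetical helper; the flag values are copied from above.

# atlas_heap_opts JAVA_MAJOR   (7 or 8)
atlas_heap_opts() {
    case "$1" in
        7)  # JDK 7 still has a permanent generation
            echo "-Xms15360m -Xmx15360m -XX:MaxNewSize=3072m -XX:PermSize=100M -XX:MaxPermSize=512m" ;;
        8)  # JDK 8 replaced PermGen with Metaspace
            echo "-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m" ;;
        *)
            echo "no heap recommendation for JDK $1 in this post" >&2
            return 1 ;;
    esac
}
```

In atlas-env.sh this could be used as: export ATLAS_SERVER_HEAP="$(atlas_heap_opts 8)". The 15360m heap is the post's example size and would need adjusting to the memory actually available.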
Username and password configuration
The default account is admin/admin.
Authentication methods are set in atlas-application.properties:
$ vi atlas-application.properties
# Authentication config
atlas.authentication.method.kerberos=false
# File-based authentication is the default
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
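If two or more authentication methods are set to true, authentication falls back to the next method when the earlier one fails. For example, if both Kerberos and LDAP authentication are set to true, a request that arrives without a Kerberos principal and keytab falls back to LDAP authentication.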
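To see which methods take part in that fallback chain, the enabled ones can be listed from the properties file. This is only a sketch: it assumes keys of the form atlas.authentication.method.<name>=true as in the snippet above, and enabled_auth_methods is a name made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list authentication methods that are switched on.
# Assumption: keys look like atlas.authentication.method.<name>=true.
# enabled_auth_methods is a hypothetical helper name.

# enabled_auth_methods PROPERTIES_FILE
# Prints one enabled method name per line, in file order.
enabled_auth_methods() {
    awk -F= '
        /^[ \t]*#/ { next }    # skip comment lines
        $1 ~ /^atlas\.authentication\.method\.[a-z]+$/ && $2 ~ /^true[ \t]*$/ {
            n = split($1, parts, ".")
            print parts[n]     # last key segment, e.g. "file"
        }
    ' "$1"
}
```

Run against the settings shown above, this prints only file, since kerberos is false and the ldap.type key is not a true/false switch.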
File-based authentication is the default. Usernames and passwords are set in users-credentials.properties:
$ vi users-credentials.properties
#username=group::sha256-password
admin=ADMIN::a4a88c0872bf652bb9ed803ece5fd6e82354838a9bf59ab4babb1dab322154e1
rangertagsync=RANGER_TAG_SYNC::0afe7a1968b07d4c3ff4ed8c2d809a32ffea706c66cd795ead9048e81cfaf034
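Here admin is the username and ADMIN is the group; the text after :: is the SHA-256 hash of the password. For reference, 8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918 is the plain SHA-256 digest of the default password admin. Note that the admin entry in the listing above carries a different hash value.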
To add a user, for example zlxx with password zlxxAdmin, compute the SHA-256 digest of the password: 1e71d0be866581347570c9680a4b7eb8535fb8c91b490ab7f7c8d35fe1ddeb86
$ echo -n "zlxxAdmin"|sha256sum
1e71d0be866581347570c9680a4b7eb8535fb8c91b490ab7f7c8d35fe1ddeb86 -
$ vim users-credentials.properties
#username=group::sha256-password
admin=ADMIN::a4a88c0872bf652bb9ed803ece5fd6e82354838a9bf59ab4babb1dab322154e1
rangertagsync=RANGER_TAG_SYNC::0afe7a1968b07d4c3ff4ed8c2d809a32ffea706c66cd795ead9048e81cfaf034
zlxx=ADMIN::1e71d0be866581347570c9680a4b7eb8535fb8c91b490ab7f7c8d35fe1ddeb86
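The hash-and-edit steps above can be combined into a helper that prints a ready-to-paste entry. This is a sketch under the same assumption the post makes, namely that the stored value is the plain SHA-256 hex digest of the password; sha256_hex and credential_line are hypothetical names, and the shasum fallback is there because macOS does not ship sha256sum by default.

```shell
#!/usr/bin/env bash
# Sketch: build a users-credentials.properties entry.
# Assumption (as in this post): the stored value is the plain SHA-256 hex
# digest of the password. sha256_hex and credential_line are hypothetical names.

# sha256_hex TEXT
# Prints the SHA-256 hex digest of TEXT (no trailing newline is hashed).
sha256_hex() {
    if command -v sha256sum >/dev/null 2>&1; then
        printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
    else
        # fallback for systems that provide shasum instead
        printf '%s' "$1" | shasum -a 256 | cut -d ' ' -f 1
    fi
}

# credential_line USER GROUP PASSWORD
# Prints "USER=GROUP::<digest>", the format shown in the file above.
credential_line() {
    printf '%s=%s::%s\n' "$1" "$2" "$(sha256_hex "$3")"
}
```

For example, credential_line zlxx ADMIN zlxxAdmin >> users-credentials.properties would append the new user. Passing a password on the command line leaves it in shell history, so this is suited to a local test setup like the one in this post.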
Restart Atlas Server, then log in as zlxx:
$ ./bin/atlas_stop.py
stopping atlas...
Apache Atlas Server stopped!!!
$ ./bin/atlas_start.py
starting atlas on host localhost
starting atlas on port 21000
.............................
Apache Atlas Server started!!!
Login succeeds.






