Importing MySQL Data into Hive with Sqoop

BigData Community, 2019-07-18


This article walks through several practical examples of using Sqoop to import MySQL data into Hive.

1. Basic import

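Import the toppw100 table into Hive //the Hive table does not exist yet, so --create-hive-table is added; with this flag Sqoop creates the table and the job fails if the table already exists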

sqoop import --connect jdbc:mysql://192.168.15.119:3306/big14 --username root --password root --table toppw100 --hive-import --create-hive-table --hive-database big14 --hive-table toppw100 --delete-target-dir --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1
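The two delimiter options determine how each imported row is laid out in the HDFS data file that Hive reads. Each row becomes one line, and its columns are separated by a tab. The following is a minimal local illustration in plain shell. No Sqoop is involved, and the row values 1, tom, 20 are made up:

```shell
#!/bin/sh
# Made-up row values, used only to illustrate the file layout.
id=1; name=tom; age=20

# One record as laid out with --fields-terminated-by '\t' --lines-terminated-by '\n':
# fields joined by a tab, the record ended by a newline.
line=$(printf '%s\t%s\t%s\n' "$id" "$name" "$age")

# Reading it back splits on the same tab; cut uses tab as its default delimiter.
printf '%s\n' "$line" | cut -f2   # prints: tom
```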

2. Overwrite import //--hive-overwrite replaces the data already in the Hive table

sqoop import --connect jdbc:mysql://192.168.15.119:3306/big14 --username root --password root --table toppw100 --hive-import  --hive-database big14 --hive-table toppw100 --delete-target-dir --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1 --hive-overwrite

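3. Import into a partitioned table //only a single partition key is supported; multi-level partitions cannot be imported this way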

sqoop import --connect jdbc:mysql://192.168.79.1:3306/big14 --username root --password root --table users --hive-import --create-hive-table --hive-database big14 --hive-table sqoop_users --delete-target-dir --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1 --hive-partition-key province --hive-partition-value beijing
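In Hive, each partition of a table is stored as a subdirectory named key=value under the table's directory, so this import lands in a province=beijing directory. The sketch below computes that path. It assumes a managed table under the default warehouse location /user/hive/warehouse, and the helper name partition_path is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop or Hive): HDFS directory of one partition
# of a managed table. Assumes the default warehouse dir /user/hive/warehouse unless
# WAREHOUSE_DIR is set; a custom database LOCATION would change the prefix.
partition_path() {
  db=$1 table=$2 key=$3 value=$4
  printf '%s\n' "${WAREHOUSE_DIR:-/user/hive/warehouse}/${db}.db/${table}/${key}=${value}"
}

partition_path big14 sqoop_users province beijing
# prints: /user/hive/warehouse/big14.db/sqoop_users/province=beijing
```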

4. Insert the rows of users with an odd id into the partitioned table sqoop_user, in partition province=beijing

sqoop import --connect jdbc:mysql://192.168.79.1:3306/big14 --username root --password root --hive-import --create-hive-table --hive-overwrite  --hive-database big14 --hive-table sqoop_user --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1 --hive-partition-key province --hive-partition-value beijing -e 'select * from users where id%2=1 and $CONDITIONS' --target-dir user/sqoop/sqoop_user
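A free-form query passed with -e (--query) must contain the literal token $CONDITIONS, which Sqoop replaces with each mapper's split condition. That is why the query is wrapped in single quotes. The small shell demonstration below shows what each quoting style hands to Sqoop. Nothing here runs Sqoop:

```shell
#!/bin/sh
# Make sure the demo variable is unset, as it normally is in an interactive shell.
unset CONDITIONS

# Single quotes: the shell passes $CONDITIONS through literally, so Sqoop can replace it.
single='select * from users where id%2=1 and $CONDITIONS'

# Double quotes: the shell expands $CONDITIONS (to nothing here) before Sqoop sees it,
# so the placeholder is lost unless it is escaped as \$CONDITIONS.
double="select * from users where id%2=1 and $CONDITIONS"
escaped="select * from users where id%2=1 and \$CONDITIONS"

printf '%s\n' "$single"    # prints: select * from users where id%2=1 and $CONDITIONS
printf '%s\n' "$double"    # prints the query with the placeholder gone
printf '%s\n' "$escaped"   # prints: select * from users where id%2=1 and $CONDITIONS
```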

How it works:

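Sqoop goes through intermediate files. It first writes the MySQL data to files in HDFS (the --target-dir), then runs a LOAD DATA statement that moves those files directly into the Hive partition.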

For example:

LOAD DATA INPATH 'hdfs://mycluster/user/sqoop/sqoop_user' OVERWRITE INTO TABLE `big14`.`sqoop_user` PARTITION (province='beijing')
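The LOAD DATA statement above is assembled from the target directory, the Hive database and table, and the partition key/value. The helper below is a hypothetical name, and Sqoop generates the real statement internally. It reassembles the statement from those pieces. It assumes that OVERWRITE appears only when --hive-overwrite is given, as in step 4; without that flag the files are appended to the partition:

```shell
#!/bin/sh
# Hypothetical helper: rebuilds the shape of the LOAD DATA statement Sqoop runs.
# Arguments: source dir, database, table, partition key, partition value,
# and "yes"/"no" for whether --hive-overwrite was given (default yes).
build_load_stmt() {
  src=$1 db=$2 table=$3 key=$4 value=$5 overwrite=${6:-yes}
  mode=""
  if [ "$overwrite" = yes ]; then
    mode=" OVERWRITE"   # replace the partition's data instead of appending
  fi
  printf "LOAD DATA INPATH '%s'%s INTO TABLE \`%s\`.\`%s\` PARTITION (%s='%s')\n" \
    "$src" "$mode" "$db" "$table" "$key" "$value"
}

build_load_stmt 'hdfs://mycluster/user/sqoop/sqoop_user' big14 sqoop_user province beijing
```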

5. Insert a row (101, jerry, 20) into users, then import it incrementally into Hive, in partition province=beijing

sqoop import --connect jdbc:mysql://192.168.79.1:3306/big14 --username root --password root --table users --hive-import --hive-database big14 --hive-table sqoop_user --target-dir user/sqoop/sqoop_user --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1 --hive-partition-key province --hive-partition-value beijing --incremental append --check-column id --last-value 4 
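With --incremental append --check-column id --last-value 4, Sqoop imports the rows whose id is greater than 4. Every such row is picked up, not only the newly inserted 101. When the job finishes, Sqoop reports the value to pass as --last-value on the next run. The filtering rule can be simulated locally with plain shell and awk. The table contents below are made up:

```shell
#!/bin/sh
# Made-up contents of the users table (id name age), for illustration only.
rows=$(printf '%s\n' '1 tom 18' '4 bob 22' '101 jerry 20')
last_value=4

# Append mode keeps rows whose check column (id) is strictly greater than --last-value.
new_rows=$(printf '%s\n' "$rows" | awk -v last="$last_value" '$1 + 0 > last + 0')

# The largest imported id becomes the --last-value for the next run.
next_last_value=$(printf '%s\n' "$new_rows" | awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { print max }')

printf '%s\n' "$new_rows"          # prints: 101 jerry 20
printf '%s\n' "$next_last_value"   # prints: 101
```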

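Core parameters

------------------------------------------

--hive-database <database-name> //the Hive database to import into (Hive's default database if omitted)

--hive-import //import into a Hive table

--hive-table <table-name> //the Hive table to import into (defaults to the --table name if omitted)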

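Optional parameters

----------------------------------------------

--fields-terminated-by <char> //field delimiter

--lines-terminated-by <char> //line (record) delimiter

--create-hive-table //create the Hive table; the job fails if the table already exists

--external-table-dir <hdfs path> //create the table as an external table at this HDFS path

--hive-overwrite //overwrite the existing data in the Hive table (or partition)

--hive-partition-key <partition-key> //Hive partition key, e.g. province

--hive-partition-value <partition-value> //Hive partition value, e.g. beijing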
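To show how these parameters combine, the hypothetical dry-run helper below prints only the Hive-related flags. The name hive_import_flags is not part of Sqoop, and the helper never runs Sqoop. It always prints the core flags. It adds the partition pair only when both key and value are given, and appends --hive-overwrite when OVERWRITE=yes is set. It is a sketch for illustration, not a wrapper to use as-is:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the Hive-related part of a sqoop import command.
# Arguments: database, table, optional partition key, optional partition value.
hive_import_flags() {
  db=$1 table=$2 part_key=$3 part_value=$4
  flags="--hive-import --hive-database $db --hive-table $table"
  # A static partition needs both the key and the value.
  if [ -n "$part_key" ] && [ -n "$part_value" ]; then
    flags="$flags --hive-partition-key $part_key --hive-partition-value $part_value"
  fi
  # Optional: replace existing data instead of appending.
  if [ "${OVERWRITE:-no}" = yes ]; then
    flags="$flags --hive-overwrite"
  fi
  printf '%s\n' "$flags"
}

hive_import_flags big14 toppw100
hive_import_flags big14 sqoop_user province beijing
```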
