Aapche Drill 是什么?
Apache Drill™ is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
优缺点
优势
支持自定义的嵌套数据结构
兼容 Hive(包括 Hive 的 UDF,且支持自定义 UDF)
高性能、低延迟的 SQL 查询
支持多数据源(插件化,包括 Apache Kafka、Apache HBase、Apache Hive、OpenTSDB、S3 等)
UDF(User Defined Funcation):用户定义普通函数,只作用于单行记录
UDAF(User Defined Aggregation Funcation):用户定义聚合函数,只作用于多行记录
UDTF(User Defined Table Generating Funcation):用户定义表生成函数,可以输入一行记录输出多行记录
劣势
与标准 SQL 略有不同
外部依赖较多(基于 Apache ZooKeeper 实现分布式、基于 Apache Calcite 实现 SQL 解析)
比较小众,相关资料缺乏
比对
Apache Drill vs Presto
Apache Drill | Presto | |
---|---|---|
针对领域 | 非关系型数据库 | 分布式数据库 |
企业级 | ✔ | ✔ |
成功案例 | MapR™ | Teradata™ |
部署 | 较繁琐 | 较快捷 |
实战
下载
$ wget https://mirror.bit.edu.cn/apache/drill/drill-1.18.0/apache-drill-1.18.0.tar.gz
$ tar zxvf apache-drill-1.18.0.tar.gz
$ ln -s apache-drill-1.18.0 drill
$ cd drill
复制
启动
$ bin/drill-embedded
复制
Apache Drill 1.18.0
"In Drill We Trust."
复制
展示所有 Tables
apache drill> !tables
复制
+-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
+-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| DRILL | information_schema | CATALOGS | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | COLUMNS | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | FILES | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | PARTITIONS | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | SCHEMATA | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | TABLES | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | information_schema | VIEWS | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | boot | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | connections | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | drillbits | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | functions | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | internal_options | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | internal_options_old | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | memory | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | options | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | options_old | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | profiles | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | profiles_json | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | threads | SYSTEM TABLE | null | null | null | null | null | null |
| DRILL | sys | version | SYSTEM TABLE | null | null | null | null | null | null |
+-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
复制
查询
apache drill> select * from cp.`employee.json` limit 2;
复制
+-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+
| employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_date | hire_date | salary | supervisor_id | education_level | marital_status | gender | management_role |
+-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+
| 1 | Sheri Nowmer | Sheri | Nowmer | 1 | President | 0 | 1 | 1961-08-26 | 1994-12-01 00:00:00.0 | 80000.0 | 0 | Graduate Degree | S | F | Senior Management |
| 2 | Derrick Whelply | Derrick | Whelply | 2 | VP Country Manager | 0 | 1 | 1915-07-03 | 1994-12-01 00:00:00.0 | 40000.0 | 1 | Graduate Degree | M | M | Senior Management |
+-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+
2 rows selected (0.281 seconds)
复制
退出
apache drill> !quit
复制
踩过的坑
默认时区是 UTC
解决
SELECT to_timestamp(CONCAT(`timestamp`, ' +0800'), 'YYYY-MM-dd HH:mm:ss.SSS Z') AS tm FROM `yuzhouwan`.`blog`
复制