暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Aapche Drill:一款分布式查询引擎

宇宙湾 2020-10-25
399

Aapche Drill 是什么?

Apache Drill™ is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

优缺点

优势

  • 支持自定义的嵌套数据结构

  • 兼容 Hive(包括 Hive 的 UDF,且支持自定义 UDF)

  • 高性能、低延迟的 SQL 查询

  • 支持多数据源(插件化,包括 Apache Kafka、Apache HBase、Apache Hive、OpenTSDB、S3 等)

UDF(User Defined Funcation):用户定义普通函数,只作用于单行记录

UDAF(User Defined Aggregation Funcation):用户定义聚合函数,只作用于多行记录

UDTF(User Defined Table Generating Funcation):用户定义表生成函数,可以输入一行记录输出多行记录

劣势

  • 与标准 SQL 略有不同

  • 外部依赖较多(基于 Apache ZooKeeper 实现分布式、基于 Apache Calcite 实现 SQL 解析)

  • 比较小众,相关资料缺乏

比对

Apache Drill vs Presto


Apache DrillPresto
针对领域非关系型数据库分布式数据库
企业级
成功案例MapR™Teradata™
部署较繁琐较快捷

实战

下载

  1. $ wget https://mirror.bit.edu.cn/apache/drill/drill-1.18.0/apache-drill-1.18.0.tar.gz

  2. $ tar zxvf apache-drill-1.18.0.tar.gz

  3. $ ln -s apache-drill-1.18.0 drill

  4. $ cd drill

复制

启动

  1. $ bin/drill-embedded

复制
  1. Apache Drill 1.18.0

  2. "In Drill We Trust."

复制

展示所有 Tables

  1. apache drill> !tables

复制
  1. +-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+

  2. | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |

  3. +-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+

  4. | DRILL | information_schema | CATALOGS | SYSTEM TABLE | null | null | null | null | null | null |

  5. | DRILL | information_schema | COLUMNS | SYSTEM TABLE | null | null | null | null | null | null |

  6. | DRILL | information_schema | FILES | SYSTEM TABLE | null | null | null | null | null | null |

  7. | DRILL | information_schema | PARTITIONS | SYSTEM TABLE | null | null | null | null | null | null |

  8. | DRILL | information_schema | SCHEMATA | SYSTEM TABLE | null | null | null | null | null | null |

  9. | DRILL | information_schema | TABLES | SYSTEM TABLE | null | null | null | null | null | null |

  10. | DRILL | information_schema | VIEWS | SYSTEM TABLE | null | null | null | null | null | null |

  11. | DRILL | sys | boot | SYSTEM TABLE | null | null | null | null | null | null |

  12. | DRILL | sys | connections | SYSTEM TABLE | null | null | null | null | null | null |

  13. | DRILL | sys | drillbits | SYSTEM TABLE | null | null | null | null | null | null |

  14. | DRILL | sys | functions | SYSTEM TABLE | null | null | null | null | null | null |

  15. | DRILL | sys | internal_options | SYSTEM TABLE | null | null | null | null | null | null |

  16. | DRILL | sys | internal_options_old | SYSTEM TABLE | null | null | null | null | null | null |

  17. | DRILL | sys | memory | SYSTEM TABLE | null | null | null | null | null | null |

  18. | DRILL | sys | options | SYSTEM TABLE | null | null | null | null | null | null |

  19. | DRILL | sys | options_old | SYSTEM TABLE | null | null | null | null | null | null |

  20. | DRILL | sys | profiles | SYSTEM TABLE | null | null | null | null | null | null |

  21. | DRILL | sys | profiles_json | SYSTEM TABLE | null | null | null | null | null | null |

  22. | DRILL | sys | threads | SYSTEM TABLE | null | null | null | null | null | null |

  23. | DRILL | sys | version | SYSTEM TABLE | null | null | null | null | null | null |

  24. +-----------+--------------------+----------------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+

复制

查询

  1. apache drill> select * from cp.`employee.json` limit 2;

复制
  1. +-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+

  2. | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_date | hire_date | salary | supervisor_id | education_level | marital_status | gender | management_role |

  3. +-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+

  4. | 1 | Sheri Nowmer | Sheri | Nowmer | 1 | President | 0 | 1 | 1961-08-26 | 1994-12-01 00:00:00.0 | 80000.0 | 0 | Graduate Degree | S | F | Senior Management |

  5. | 2 | Derrick Whelply | Derrick | Whelply | 2 | VP Country Manager | 0 | 1 | 1915-07-03 | 1994-12-01 00:00:00.0 | 40000.0 | 1 | Graduate Degree | M | M | Senior Management |

  6. +-------------+-----------------+------------+-----------+-------------+--------------------+----------+---------------+------------+-----------------------+---------+---------------+-----------------+----------------+--------+-------------------+

  7. 2 rows selected (0.281 seconds)

复制

退出

  1. apache drill> !quit

复制

踩过的坑

默认时区是 UTC

解决

  1. SELECT to_timestamp(CONCAT(`timestamp`, ' +0800'), 'YYYY-MM-dd HH:mm:ss.SSS Z') AS tm FROM `yuzhouwan`.`blog`

复制


文章转载自宇宙湾,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论