Apache Calcite 是什么?
Apache Calcite™ is a dynamic data management framework.
基本概念
Catelog
用于定义 SQL 语义相关的元数据与命名空间
SQL Parser
负责将 SQL 转化成 AST
SQL Validator
负责通过 Catalog 对 AST 进行校证
Query Optimizer
负责将 AST 转化成物理执行计划、优化物理执行计划
SQL Generator
负责将物理执行计划反向转化成 SQL 语句
特性
支持标准 SQL 语言
通过适配器(Adapter)可以支持连接任何数据源
支持丰富的关系代数(并集、交集、连接、笛卡尔积等)
支持对逻辑规划规则进行定制(例如 Filter 下推)
支持成本模型优化(CBO, Cost-Based Optimizer 和 RBO, Rule-Based Optimizer)
支持管理物化视图(Materialized view)
支持查询流式数据
稳定可靠(开发迭代 10 年以上)
已贡献给 Apache 基金会(于 2013 年)
开源社区活跃(Apache Druid、Apache Hive、Apache Drill、Apache Flink、Apache Phoenix 等项目均在使用)
Apache Calcite 借助开源的 JavaCC 完成 SQL 解析,将 SQL 语句转化为 Java 代码
Apache Calcite 还使用了轻量级 Janino 编译运行时 Java 代码,以便灵活地管理元数据
架构
专注的层面
其中,数据管理和数据存储,交由第三方计算和存储引擎实现
解析流程
RBO
谓词下推
列裁剪
常量折叠
比对
查询语言 | JDBC | SQL 解析与校验 | Relational Algebra | Execution Engine | |
---|---|---|---|---|---|
Apache Drill | SQL | ✔ | ✔ | ✔ | Native |
Apache Druid | SQL | ✔ | ✔ | ✔ | Native |
Apache Phoenix | SQL | ✔ | ✔ | ✔ | Apache HBase |
Apache Solr | SQL | ✔ | ✔ | ✔ | Native, Enumerable, Apache Lucene |
Apache Storm | SQL | ✔ | ✔ | ✔ | Native |
Apache Kylin | SQL | ✔ | ✔ | Enumerable, Apache HBase | |
MapD | SQL | ✔ | ✔ | Native | |
Lingual | SQL | ✔ | ✔ | Cascading | |
Apache Hive | SQL | ✔ | Apache Tez, Apache Spark | ||
Qubole Quark | SQL | ✔ | ✔ | ✔ | Apache Hive, Presto |
Apache Apex | Streaming SQL | ✔ | ✔ | ✔ | Native |
Apache Flink | Streaming SQL | ✔ | ✔ | ✔ | Native |
Apache Samza | Streaming SQL | ✔ | ✔ | ✔ | Native |
实战
下载
$ $ git clone --depth 1 --single-branch --branch master https://github.com/apache/calcite.git
复制
编译
$ cd calcite/example/csv
$ ./sqlline
复制
链接
sqlline> !connect jdbc:calcite:model=src/test/resources/model.json admin admin
复制
展示所有 Tables
0: jdbc:calcite:model=src/test/resources/mode> !tables
复制
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| | SALES | DEPTS | TABLE | | | | | | |
| | SALES | EMPS | TABLE | | | | | | |
| | SALES | SDEPTS | TABLE | | | | | | |
| | metadata | COLUMNS | SYSTEM TABLE | | | | | | |
| | metadata | TABLES | SYSTEM TABLE | | | | | | |
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
复制
查询
0: jdbc:calcite:model=src/test/resources/mode> SELECT * FROM emps;
复制
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
| EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | SLACKER | MANAGER | JOINEDAT |
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
| 100 | Fred | 10 | | | 30 | 25 | true | false | 1996-08-03 |
| 110 | Eric | 20 | M | San Francisco | 3 | 80 | | false | 2001-01-01 |
| 110 | John | 40 | M | Vancouver | 2 | null | false | true | 2002-05-03 |
| 120 | Wilma | 20 | F | | 1 | 5 | | true | 2005-09-07 |
| 130 | Alice | 40 | F | Vancouver | 2 | null | false | true | 2007-01-01 |
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
5 rows selected (1.266 seconds)
复制
Join
0: jdbc:calcite:model=src/test/resources/mode> SELECT d.name, COUNT(*)
. . . . . . . . . . . . . . . . . . semicolon> FROM emps AS e
. . . . . . . . . . . . . . . . . . semicolon> JOIN depts AS d ON e.deptno = d.deptno
. . . . . . . . . . . . . . . . . . semicolon> GROUP BY d.name
复制
+-----------+--------+
| NAME | EXPR$1 |
+-----------+--------+
| Sales | 1 |
| Marketing | 2 |
+-----------+--------+
2 rows selected (0.336 seconds)
复制
内置函数
0
jdbc:calcite:model=src/test/resources/mode> VALUES CHAR_LENGTH('Hello, ' || 'world!');复制
+--------+
| EXPR$0 |
+--------+
| 13 |
+--------+
1 row selected (0.092 seconds)
复制
社区发展
Star 趋势
个人贡献
Pull Request
详见:《如何成为 Apache 的 PMC》