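Time-series databases are now widely used in IoT, industrial monitoring, financial analytics, and similar fields. TDengine, a high-performance time-series database, performs well in both storage and query efficiency thanks to its storage architecture and compression algorithms. As data volumes keep growing, however, how to reduce CPU usage while preserving data security and storage efficiency becomes a question worth examining in depth.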

Test environment
System: Darwin Kernel Version 23.6.0
taosd version:
TDengine Enterprise Edition
taosd version: 3.3.5.2.alpha compatible_version: 3.0.0.0
git: 0a42d321120b313019f0ee9b1d7e23599bfd462d
gitOfInternal: ab27dbaf76fa60c57363a3053c9c5b012fafddad
build: macOS-arm64 2025-01-22 15:59:30 +0800
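Test preparation
The encryption algorithm must be specified when the database is created. taosBenchmark does not support creating an encrypted database, so create it manually first: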
create database test ENCRYPT_ALGORITHM 'sm4';
insert.json:
{
  "filetype": "insert",
  "cfgdir": "/etc/taos",
  "host": "localhost",
  "port": 6030,
  "user": "root",
  "password": "taosdata",
  "connection_pool_size": 8,
  "num_of_records_per_req": 20000,
  "thread_count": 8,
  "create_table_thread_count": 10,
  "result_file": "./insert_res_mix.txt",
  "confirm_parameter_prompt": "no",
  "insert_interval": 0,
  "continue_if_fail": "yes",
  "databases": [
    {
      "dbinfo": {
        "name": "test", "drop": "no", "vgroups": 1, "replica": 1, "stt_trigger": 1,
        "minRows": 100, "WAL_RETENTION_PERIOD": 10, "maxRows": 4096
      },
      "super_tables": [
        {
          "name": "meters",
          "child_table_exists": "no",
          "auto_create_table": "no",
          "childtable_count": 10000,
          "insert_rows": 100,
          "childtable_prefix": "d",
          "insert_mode": "stmt2",
          "insert_interval": 0,
          "timestamp_step": 900000,
          "start_timestamp": "2022-09-01 10:00:00",
          "disorder_ratio": 0,
          "update_ratio": 0,
          "delete_ratio": 0,
          "continue_if_fail": "yes",
          "disorder_fill_interval": 0,
          "update_fill_interval": 0,
          "generate_row_rule": 0,
          "columns": [
            {"type": "binary", "compress": "lz4", "name": "val", "len": 64},
            {"type": "binary", "compress": "lz4", "name": "order_no", "len": 64},
            {"type": "binary", "compress": "lz4", "name": "production_no", "len": 64},
            {"type": "binary", "compress": "lz4", "name": "modal_no", "len": 64}
          ],
          "tags": [
            {"type": "binary", "name": "device_no", "len": 64,
             "values": ["San Francisco", "Los Angles", "San Diego", "San Jose", "Palo Alto", "Campbell", "Mountain View", "Sunnyvale", "Santa Clara", "Cupertino"]},
            {"type": "int", "name": "channel_id", "max": 100, "min": 0},
            {"type": "binary", "name": "point_no", "len": 64,
             "values": ["San Francisco", "Los Angles", "San Diego", "San Jose", "Palo Alto", "Campbell", "Mountain View", "Sunnyvale", "Santa Clara", "Cupertino"]},
            {"type": "int", "name": "datatype", "max": 100, "min": 0},
            {"type": "int", "name": "business_type", "max": 100, "min": 0},
            {"type": "binary", "name": "unit", "len": 16,
             "values": ["San Francisco", "Los Angles", "San Diego", "San Jose", "Palo Alto", "Campbell", "Mountain View", "Sunnyvale", "Santa Clara", "Cupertino"]}
          ]
        }
      ]
    }
  ]
}
Test results
Scenario 1: sm4 encryption & lz4 compression
Results:

Compression (LZ4compress): 0.76% + 2.84% (table data compress) + 0.1% (Stt)
Decryption (SM4_decrypt): 5.87% (MergeFile) + 1.12% (MergeFile)
Encryption (SM4_encrypt): 59.02% (WAL) + 10.68% (table data) + 6.97% (table data end) + 2.04% (Stt)
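Conclusion: encryption consumes far more CPU than compression, reaching about 70%. Compression and decompression are invoked only when data is generated, whereas encryption is applied along the entire write path: writing the WAL, writing Meta data, and flushing to TSDB. Decryption, in turn, is required at system startup when reading data that is still in the WAL and has not yet been flushed, the first time data is read from TSDB, and the first time Meta data is accessed.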
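The Scenario 1 figures can be added up to compare the three categories directly. The sketch below does only that arithmetic. It assumes the listed percentages are non-overlapping shares of total CPU time, which the article does not state, so the totals are indicative rather than exact. The function and array names are illustrative and are not part of TDengine.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a fixed-size array. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Scenario 1 CPU shares (percent), copied from the results above. */
static const double SM4_ENCRYPT[] = {59.02, 10.68, 6.97, 2.04}; /* WAL, table data, table data end, Stt */
static const double SM4_DECRYPT[] = {5.87, 1.12};               /* MergeFile (two entries) */
static const double LZ4_COMPRESS[] = {0.76, 2.84, 0.10};        /* unlabeled, table data compress, Stt */

/* Sum a list of percentages.
 * Assumption: the entries do not overlap in the profile. */
double sum_percent(const double *shares, size_t count) {
    assert(shares != NULL || count == 0);
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += shares[i];
    }
    return total;
}

double scenario1_encrypt_total(void) {
    return sum_percent(SM4_ENCRYPT, COUNT_OF(SM4_ENCRYPT));
}

double scenario1_decrypt_total(void) {
    return sum_percent(SM4_DECRYPT, COUNT_OF(SM4_DECRYPT));
}

double scenario1_compress_total(void) {
    return sum_percent(LZ4_COMPRESS, COUNT_OF(LZ4_COMPRESS));
}
```

Under that assumption the encryption entries total 78.71%, decryption 6.99%, and compression 3.70%. The first two encryption entries alone (WAL and table data) come to 69.70%, and the first three to 76.67%; the article does not say which entries its rounded figure counts.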
Scenario 2: lz4 compression and decompression
for (int i = 0; i < 10000; i++) {
  sprintf(sql, "select * from d_%d", i);  /* one query per child table */
  do_query(taos, sql);
}
Results:

Compression (compressData): 3.33% (table data) + 1.01% (table data end)
Decompression (ColDataDecompress/decompressData): 1.31% + 0.66% + 0.22% + 0.18%
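Conclusion: compression and decompression account for only a small share of CPU time here. Most of the time is spent on LRU cache switching, because the test issues too many separate queries, so this scenario is not a good measure of compression cost.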
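Why a long run of one-off queries can keep an LRU cache busy is easy to see in a toy model. The sketch below is not TDengine's cache. It is a minimal LRU over integer table ids, with a hypothetical capacity, that counts hits and misses while tables are visited in order. All names in it are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long hits;
    long misses;
} CacheStats;

/* Toy LRU cache simulation (an illustration, not TDengine code).
 * Visits tables 0..num_tables-1 in order, `passes` times, through a
 * cache holding `capacity` entries, and counts hits and misses. */
CacheStats simulate_lru_scan(int capacity, int num_tables, int passes) {
    assert(capacity > 0 && num_tables > 0 && passes > 0);

    CacheStats stats = {0, 0};
    int *keys = malloc((size_t)capacity * sizeof *keys);
    long *last_used = malloc((size_t)capacity * sizeof *last_used);
    assert(keys != NULL && last_used != NULL);

    int used = 0;   /* number of occupied slots */
    long tick = 0;  /* logical clock for recency */

    for (int p = 0; p < passes; p++) {
        for (int t = 0; t < num_tables; t++) {
            tick++;

            /* Look for the table in the cache. */
            int slot = -1;
            for (int i = 0; i < used; i++) {
                if (keys[i] == t) {
                    slot = i;
                    break;
                }
            }

            if (slot >= 0) {
                stats.hits++;
            } else {
                stats.misses++;
                if (used < capacity) {
                    slot = used++;  /* fill a free slot */
                } else {
                    /* Evict the least recently used entry. */
                    slot = 0;
                    for (int i = 1; i < used; i++) {
                        if (last_used[i] < last_used[slot]) {
                            slot = i;
                        }
                    }
                }
                keys[slot] = t;
            }
            last_used[slot] = tick;
        }
    }

    free(keys);
    free(last_used);
    return stats;
}
```

In this model, a single pass over 10000 distinct tables misses on every access whatever the capacity, and once the cache is full each miss also evicts an entry. Repeating the scan does not help while the capacity is below the table count: `simulate_lru_scan(100, 1000, 2)` produces no hits at all, whereas `simulate_lru_scan(1000, 1000, 2)` hits on the whole second pass.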
Scenario 3: larger data volume and fewer queries, measuring lz4 compression and decompression
select * from meters;
Results:

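Compression: 4.93% (table data end) + 7.3% (table data) + 0.44% (table data end)
Decompression: 0.95% + 0.51%
Conclusion: under normal conditions, compression and decompression together account for about 15% of the CPU cost of the whole query. Because they are invoked only when data is generated, and the data is processed in blocks, they are far more efficient than encryption and decryption.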
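Closing remarks
Profiling the CPU usage of compression/decompression and encryption/decryption in TDengine's write and query paths shows that encryption has a large impact on data ingestion, taking about 77% of CPU resources. Writing the WAL, writing Meta data, and flushing to TSDB all involve encryption, while decryption is required at system startup when reading unflushed data still in the WAL, the first time data is read from TSDB, and the first time Meta data is accessed. By comparison, compression and decompression have a much smaller impact on data import and export, taking only about 15% of CPU resources. This is because they are invoked only when data is generated and the data is processed in blocks, which makes them far more efficient than encryption and decryption.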
