If performance_schema is configured badly, its memory consumption can be frightening!
Let's look at the primary in our production environment:
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+---------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+---------------------------------------------------------------------------+---------------+
| memory/innodb/buf_buf_pool | 4.08 GiB |
| memory/performance_schema/events_statements_history.digest_text | 2.44 GiB |
| memory/performance_schema/events_statements_history.sql_text | 2.44 GiB |
| memory/performance_schema/events_statements_history | 355.47 MiB |
| memory/innodb/hash0hash | 114.83 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 97.66 MiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 97.66 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 97.66 MiB |
| memory/sql/TABLE | 73.76 MiB |
| memory/performance_schema/events_statements_summary_by_digest | 39.67 MiB |
+---------------------------------------------------------------------------+---------------+
10 rows in set (0.01 sec)
mysql> system free -m
total used free shared buff/cache available
Mem: 11852 8393 1158 102 2300 3049
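Swap: 24571 190 24381
Row 1 is the InnoDB buffer pool (innodb_buffer_pool_size), the data cache, at about 4 GiB.
That is simply the value we configured for it.
Row 2 is the DIGEST_TEXT column of events_statements_history, at 2.44 GiB.
Row 3 is the SQL_TEXT column of the same table, also at 2.44 GiB.
Row 4 is the remaining columns of that table, 355 MiB together.
Row 5 is the InnoDB hash tables (memory/innodb/hash0hash), 114 MiB.
Row 6 is SQL_TEXT of the statement history "long" table, 97 MiB.
Row 7 is DIGEST_TEXT of the statement summary-by-digest table, 97 MiB.
Row 8 is DIGEST_TEXT of the statement history "long" table, 97 MiB.
Row 9 is the table definitions (memory/sql/TABLE), 73 MiB.
Row 10 is the statement summary-by-digest table itself, 39 MiB.
As you can see, most of the top entries belong to the statement-event tables of performance_schema, and two columns eat by far the most memory:
the dreaded DIGEST_TEXT and SQL_TEXT!
SQL_TEXT is the original text of the SQL statement, literal values included.
DIGEST_TEXT is the original SQL with the literal values removed and replaced by ?.
In Oracle terms these are bind variables. Both columns are LONGTEXT.
DIGEST is a 64-character hash of DIGEST_TEXT, the counterpart of Oracle's SQL_ID.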
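To make the relationship between SQL_TEXT, DIGEST_TEXT and DIGEST concrete, here is a minimal Python sketch. It is an illustration only: the regular expressions are a crude stand-in for the server's parser-based normalization, and hashing the normalized text with SHA-256 is an assumption chosen because it yields 64 hexadecimal characters, so the values will not match what MySQL itself stores. The names toy_digest_text and toy_digest are made up for this example.

```python
import hashlib
import re


def toy_digest_text(sql: str) -> str:
    """Rough stand-in for statement normalization (not MySQL's real algorithm)."""
    text = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)    # quoted string literals -> ?
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals -> ?
    return re.sub(r"\s+", " ", text).strip()         # collapse newlines and runs of spaces


def toy_digest(sql: str) -> str:
    """SHA-256 of the normalized text, as 64 hexadecimal characters."""
    return hashlib.sha256(toy_digest_text(sql).encode("utf-8")).hexdigest()


a = "SELECT *\n  FROM orders\n WHERE id = 42 AND status = 'paid'"
b = "SELECT * FROM orders WHERE id = 7 AND status = 'new'"

print(toy_digest_text(a))                 # same normalized text for a and b
print(toy_digest(a) == toy_digest(b))     # the two statements share one digest
print(len(toy_digest(a)))                 # length of the hex digest
```

Two statements that differ only in literal values and whitespace collapse to one digest, which is why a summary-by-digest table needs far fewer rows than a raw statement history.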
These are the four statement tables we use most:
| events_statements_current |
| events_statements_history |
| events_statements_history_long |
| events_statements_summary_by_digest |
Parameter settings:
mysql> show variables like '%performance%';
+----------------------------------------------------------+-------+
| Variable_name | Value |
+----------------------------------------------------------+-------+
|performance_schema | ON |
|performance_schema_accounts_size | -1 |
|performance_schema_digests_size | 10000 |
|performance_schema_events_statements_history_long_size | 10000 |
|performance_schema_events_statements_history_size | 1000 |
|performance_schema_max_digest_length | 10240 |
|performance_schema_max_sql_text_length | 10240 |
+----------------------------------------------------------+-------+
45 rows in set (0.01 sec)
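The original values were 1000 rows and a statement length of 1024!
I figured that keeping a week's worth of executed SQL would take at least 10,000 rows. On top of that, the SQL generated by our Java MyBatis-Plus layer contains lots of line breaks and spaces, so 1 K characters is often not enough. So I went for 10 K.
Well, that did it! performance_schema grabbed all of that memory right at startup; it is not the allocate-as-you-go kind.
And these are static variables, so only a restart releases the memory!
What a trap.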
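IV. Optimizing memory release
The biggest problem with PFS memory release is that memory, once allocated, is never released until shutdown.
With a hot workload, many pages are allocated during peak hours and still not released during off-peak hours.
Reclaiming memory periodically without hurting allocation efficiency calls for a lock-free reclamation mechanism, which is fairly complex to build.
The main points to consider are:
Release has to happen per page, which means every record in the page to be released must be free, and the page must be guaranteed not to be handed out again.
Allocation is random: overall there may be memory to reclaim, yet every page may still hold a few busy records. How can this be coordinated better?
How should the release threshold be chosen so that frequent allocate-then-release cycles are avoided?
PolarDB has already developed and shipped a feature that periodically reclaims PFS memory; for reasons of space it is left for a later article.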
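----------------
Copyright notice: this is an original article by the CSDN blogger 「阿里云云栖号」, licensed under CC 4.0 BY-SA. Reproduction must include the original source link and this notice.
Original link: https://blog.csdn.net/yunqiinsight/article/details/120861933
However, all the memory spent on these four tables did not pay off, because 10,000 rows hold only about half a day of SQL,
even though the names carry HISTORY and even LONG. All bark and no bite.
These three tables correspond to Oracle's V$SQL: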
| events_statements_current |
| events_statements_history |
| events_statements_history_long |
while this one is the real counterpart of V$SQLAREA:
| events_statements_summary_by_digest |
Now let's look at memory on the replica.
Production replica:
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+---------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+---------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_history.digest_text | 2.44 GiB |
| memory/performance_schema/events_statements_history.sql_text | 2.44 GiB |
| memory/performance_schema/events_statements_history | 355.47 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 97.66 MiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 97.66 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 97.66 MiB |
| memory/sql/TABLE | 69.17 MiB |
| memory/performance_schema/events_statements_summary_by_digest | 39.67 MiB |
| memory/innodb/memory | 35.20 MiB |
+---------------------------------------------------------------------------+---------------+
10 rows in set (0.01 sec)
mysql> system free -m
total used free shared buff/cache available
Mem: 9503 3790 1445 145 4267 5261
Swap: 12283 3650 8633
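mysql> exit
The memory consumption is the same here.
All things considered, only this table is reasonably practical and dependable:
| events_statements_summary_by_digest |
events_statements_summary_by_digest.digest_text | 97.66 MiB |
events_statements_summary_by_digest | 39.67 MiB |
The memory tied to it comes to about 137 MiB.
The main reason is that identical SQL statements, once their literals are replaced by placeholders, are merged into a single row.
How many rows can events_statements_summary_by_digest be set to hold at most?
Let me try it on the SQLE audit database.
Memory use before the experiment: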
+---------------------------------------------------------------------------+------------+
| event_name | MEMORY |
+---------------------------------------------------------------------------+------------+
| memory/performance_schema/events_statements_history_long | 1.36 GiB |
| memory/performance_schema/events_statements_summary_by_digest | 396.73 MiB |
| memory/performance_schema/events_transactions_history_long | 328.06 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 243.19 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 243.19 MiB |
| memory/performance_schema/events_waits_history_long | 167.85 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/performance_schema/events_stages_history_long | 99.18 MiB |
| memory/mysys/IO_CACHE | 32.09 MiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 24.32 MiB |
+---------------------------------------------------------------------------+------------+
10 rows in set, 1 warning (0.04 sec)
mysql> show variables like '%performance%';
+----------------------------------------------------------+---------+
| Variable_name | Value |
+----------------------------------------------------------+---------+
| performance_schema | ON |
| performance_schema_accounts_size | -1 |
| performance_schema_digests_size | 100000 |
| performance_schema_events_stages_history_long_size | 1000000 |
| performance_schema_events_stages_history_size | 10 |
| performance_schema_events_statements_history_long_size | 1000000 |
| performance_schema_events_statements_history_size | 10 |
| performance_schema_events_transactions_history_long_size | 1000000 |
| performance_schema_events_transactions_history_size | 10 |
| performance_schema_events_waits_history_long_size | 1000000 |
| performance_schema_events_waits_history_size | 10 |
| performance_schema_max_digest_length | 255 |
| performance_schema_max_sql_text_length | 255 |
+----------------------------------------------------------+---------+
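45 rows in set (0.01 sec)
This instance is set to 100,000 digest rows and 1,000,000 rows for the history_long tables, with the text length reduced to 255 characters.
Memory use is far less dramatic: the DIGEST_TEXT and SQL_TEXT columns take about 243 MiB each.
Instead it is events_statements_history_long itself that takes 1.36 GiB.
The test environment, configured like production: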
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+---------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+---------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 976.56 MiB |
| memory/performance_schema/events_statements_summary_by_digest | 396.73 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 97.66 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 97.66 MiB |
| memory/performance_schema/events_statements_current.sql_text | 25.00 MiB |
| memory/performance_schema/events_statements_history.digest_text | 25.00 MiB |
| memory/performance_schema/events_statements_current.digest_text | 25.00 MiB |
| memory/performance_schema/events_statements_history.sql_text | 25.00 MiB |
| memory/innodb/ut0link_buf | 24.00 MiB |
+---------------------------------------------------------------------------+---------------+
10 rows in set (0.00 sec)
mysql> show variables like '%performance%';
+----------------------------------------------------------+--------+
| Variable_name | Value |
+----------------------------------------------------------+--------+
| performance_schema | ON |
| performance_schema_accounts_size | -1 |
| performance_schema_digests_size | 100000 |
| performance_schema_events_stages_history_long_size | 10000 |
| performance_schema_events_stages_history_size | 10 |
| performance_schema_events_statements_history_long_size | 10000 |
| performance_schema_events_statements_history_size | 10 |
| performance_schema_events_transactions_history_long_size | 10000 |
| performance_schema_events_transactions_history_size | 10 |
| performance_schema_events_waits_history_long_size | 10000 |
| performance_schema_events_waits_history_size | 10 |
| performance_schema_max_digest_length | 10240 |
| performance_schema_max_sql_text_length | 10240 |
| performance_schema_show_processlist | ON |
+----------------------------------------------------------+--------+
45 rows in set (0.01 sec)
mysql> system free -m
total used free shared buff/cache available
Mem: 7821 3461 3759 3 600 3680
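Swap: 8063 183 7880
From this it looks as if PFS memory is also allocated dynamically rather than all at once at startup. That production instance has very little traffic, so little that it doubles as a test database. Yet even an instance with practically zero QPS produces a large volume of SQL, enough to nearly exhaust memory.
performance_schema_digests_size=100,000 starts fine; with 1,000,000 rows the server does not start:
[Server] Failed to allocate 10240000000 bytes for buffer 'memory/performance_schema/events_statements_history_long.digest_text' due to out-of-memory.
One million rows cannot be allocated:
10240 x 1,000,000 = 10,240,000,000 bytes = 9765.625 MiB, about 9.5 GiB of memory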
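The arithmetic behind the failed startup can be checked with a few lines of Python. This is a small sketch that assumes every row reserves the full maximum text length for one text column, which is what the byte count in the error message suggests; the helper name text_buffer_bytes is made up for the example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def text_buffer_bytes(max_text_length: int, rows: int) -> int:
    """Bytes needed if each row reserves max_text_length bytes for one text column."""
    return max_text_length * rows


need = text_buffer_bytes(10240, 1_000_000)
print(need)                    # 10240000000, the figure in the error message
print(need / MIB)              # 9765.625
print(round(need / GIB, 2))    # 9.54

# Same row count with a text length of 1024
print(text_buffer_bytes(1024, 1_000_000) / MIB)   # 976.5625
```

On a host with under 8 GiB of RAM, a single 9.5 GiB buffer cannot be satisfied, while the 1024-byte variant needs under 1 GiB per text column.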
Changing the parameter from 10240 back to the default of 1024 lets the server start with 1,000,000 rows,
but the memory use is alarming:
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+-----------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+-----------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_summary_by_digest | 3.87 GiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 976.56 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/innodb/ut0link_buf | 24.00 MiB |
| memory/performance_schema/events_statements_history_long | 13.89 MiB |
| memory/performance_schema/events_errors_summary_by_thread_by_error | 12.52 MiB |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name | 10.20 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 9.77 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 9.77 MiB |
| memory/performance_schema/memory_summary_by_thread_by_event_name | 9.32 MiB |
+-----------------------------------------------------------------------------+---------------+
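10 rows in set (0.09 sec)
That looks odd: this experiment suggests the memory is allocated statically, otherwise the server would have started!
1024 x 1,000,000 = 976.56 MiB
events_statements_summary_by_digest alone takes 3.87 GiB. What is going on?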
mysql> desc events_statements_summary_by_digest;
+-----------------------------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+-----------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | YES | MUL | NULL | |
| DIGEST | varchar(64) | YES | | NULL | |
| DIGEST_TEXT | longtext | YES | | NULL | |
| COUNT_STAR | bigint unsigned | NO | | NULL | |
| SUM_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MIN_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| AVG_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MAX_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| SUM_LOCK_TIME | bigint unsigned | NO | | NULL | |
| SUM_ERRORS | bigint unsigned | NO | | NULL | |
| SUM_WARNINGS | bigint unsigned | NO | | NULL | |
| SUM_ROWS_AFFECTED | bigint unsigned | NO | | NULL | |
| SUM_ROWS_SENT | bigint unsigned | NO | | NULL | |
| SUM_ROWS_EXAMINED | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_DISK_TABLES | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_TABLES | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_RANGE_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE_CHECK | bigint unsigned | NO | | NULL | |
| SUM_SELECT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_SORT_MERGE_PASSES | bigint unsigned | NO | | NULL | |
| SUM_SORT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SORT_ROWS | bigint unsigned | NO | | NULL | |
| SUM_SORT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_NO_INDEX_USED | bigint unsigned | NO | | NULL | |
| SUM_NO_GOOD_INDEX_USED | bigint unsigned | NO | | NULL | |
| FIRST_SEEN | timestamp(6) | NO | | NULL | |
| LAST_SEEN | timestamp(6) | NO | | NULL | |
| QUANTILE_95 | bigint unsigned | NO | | NULL | |
| QUANTILE_99 | bigint unsigned | NO | | NULL | |
| QUANTILE_999 | bigint unsigned | NO | | NULL | |
| QUERY_SAMPLE_TEXT | longtext | YES | | NULL | |
| QUERY_SAMPLE_SEEN | timestamp(6) | NO | | NULL | |
| QUERY_SAMPLE_TIMER_WAIT | bigint unsigned | NO | | NULL | |
+-----------------------------+-----------------+------+-----+---------+-------+
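35 rows in set (0.06 sec)
There are 35 columns in total; leaving out the two LONGTEXT columns, the rest add up to only 364 bytes.
364 x 1,000,000 = 347 MiB
Adding the two LONGTEXT columns gives 976 MiB x 2 + 347 MiB = 2299 MiB.
That still does not add up to 3.87 GiB!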
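One way to approach the mismatch is to work backwards from the measurements and ask how many bytes each row of events_statements_summary_by_digest costs. The sketch below uses only the sizes reported in this post for three values of performance_schema_digests_size; it does not explain where the bytes go. Note also that digest_text is reported under its own instrument in the listings, so it is presumably accounted separately from the table figure rather than inside it.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# (performance_schema_digests_size, reported size of the table instrument in bytes)
observations = [
    (10_000, 39.67 * MIB),
    (100_000, 396.73 * MIB),
    (1_000_000, 3.87 * GIB),
]


def bytes_per_row(rows: int, total_bytes: float) -> float:
    """Average in-memory cost of one summary row."""
    return total_bytes / rows


for rows, total in observations:
    print(rows, round(bytes_per_row(rows, total)))
```

All three measurements come out at a bit over 4,100 bytes per row, so the table grows linearly with the configured row count and each in-memory row is roughly eleven times the 364 bytes estimated from the visible column widths. A plausible but unverified contributor is per-row data backing the QUANTILE_95/99/999 columns.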
System memory does not look good either:
[root@localhost aesygo_mapper]# free -m
total used free shared buff/cache available
Mem: 7821 6816 314 4 690 289
Swap: 8063 181 7882
After changing the length to 512, digest_text shrank but the table itself stayed the same:
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+-----------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+-----------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_summary_by_digest | 3.87 GiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 488.28 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/innodb/ut0link_buf | 24.00 MiB |
| memory/performance_schema/events_statements_history_long | 13.89 MiB |
| memory/performance_schema/events_errors_summary_by_thread_by_error | 12.52 MiB |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name | 10.20 MiB |
| memory/performance_schema/memory_summary_by_thread_by_event_name | 9.32 MiB |
| memory/performance_schema/table_handles | 9.06 MiB |
| memory/mysys/KEY_CACHE | 8.00 MiB |
+-----------------------------------------------------------------------------+---------------+
10 rows in set (0.21 sec)
A good deal of system memory was freed:
[root@localhost aesygo_mapper]# free -m
total used free shared buff/cache available
Mem: 7821 5814 1306 4 700 1291
Swap: 8063 181 7882
Next, 100,000 rows with the default length of 1024:
performance_schema=ON
#performance_schema_events_statements_history_long_size=10000
performance_schema_digests_size=100000
#performance_schema_events_waits_history_long_size=1000
#performance_schema_events_stages_history_long_size=1000
#performance_schema_events_transactions_history_long_size=10000
#performance_schema_max_sql_text_length=1024
#performance_schema_max_digest_length=1024
performance-schema-instrument='wait/lock/metadata/sql/mdl=ON'
+-----------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+-----------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_summary_by_digest | 396.73 MiB |
| memory/innodb/buf_buf_pool | 130.69 MiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 97.66 MiB |
| memory/innodb/ut0link_buf | 24.00 MiB |
| memory/performance_schema/events_statements_history_long | 13.89 MiB |
| memory/performance_schema/events_errors_summary_by_thread_by_error | 12.52 MiB |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name | 10.20 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 9.77 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 9.77 MiB |
| memory/performance_schema/memory_summary_by_thread_by_event_name | 9.32 MiB |
+-----------------------------------------------------------------------------+---------------+
10 rows in set (0.00 sec)
A large amount of system memory was freed:
[root@localhost aesygo_mapper]# free -m
total used free shared buff/cache available
Mem: 7821 1467 5652 4 700 5637
Swap: 8063 180 7883
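Are the other three tables perhaps allocated dynamically?
On the primary, the parameters are 10,000 rows for the history_long table, 1,000 rows for the history table, and a maximum length of 10240: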
mysql> select event_name,current_alloc from sys.memory_global_by_current_bytes limit 10;
+---------------------------------------------------------------------------+---------------+
| event_name | current_alloc |
+---------------------------------------------------------------------------+---------------+
| memory/performance_schema/events_statements_history.digest_text | 2.44 GiB |
| memory/performance_schema/events_statements_history.sql_text | 2.44 GiB |
| memory/performance_schema/events_statements_history | 355.47 MiB |
| memory/performance_schema/events_statements_history_long.sql_text | 97.66 MiB |
| memory/performance_schema/events_statements_summary_by_digest.digest_text | 97.66 MiB |
| memory/performance_schema/events_statements_history_long.digest_text | 97.66 MiB |
| memory/performance_schema/events_statements_summary_by_digest | 39.67 MiB |
+---------------------------------------------------------------------------+---------------+
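10 rows in set (0.01 sec)
For events_statements_history, the two text columns take 2.44 GiB each and the table itself takes 355 MiB.
For events_statements_history_long, the text columns take 97 MiB each; the size of the table itself is not shown.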
performance_schema_digests_size=10000
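/*
Controls the maximum number of rows in events_statements_summary_by_digest.
If more statement digests are produced than this maximum, the extra ones cannot be stored in the table,
and performance_schema increments a status variable instead.
*/
performance_schema_events_statements_history_long_size=10000
/*
Controls the maximum number of rows in events_statements_history_long.
This is the total number of event records that all sessions together can keep in events_statements_history_long;
once the limit is exceeded, the oldest records are overwritten.
The autosized value is normally 10000.
*/
performance_schema_events_statements_history_size=10
/*
Controls the maximum number of rows per thread (session) in events_statements_history.
This is the number of event records a single session can keep in events_statements_history;
once the limit is exceeded, that session's oldest records are overwritten.
The autosized value is normally 10.
*/
The key point is:
performance_schema_events_statements_history_size is the maximum number of rows per thread (session) in events_statements_history.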
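The "per thread, oldest record overwritten" behaviour described in the comments above can be pictured as one bounded ring buffer per session. The following Python sketch is only a model of that behaviour, not how the server implements it; HISTORY_SIZE stands in for performance_schema_events_statements_history_size and the thread ids are invented.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 3  # stands in for performance_schema_events_statements_history_size

# One bounded buffer per thread id; a full deque drops its oldest entry on append.
history = defaultdict(lambda: deque(maxlen=HISTORY_SIZE))


def record_statement(thread_id: int, sql_text: str) -> None:
    history[thread_id].append(sql_text)


for n in range(5):
    record_statement(1, f"SELECT {n}")
record_statement(2, "SELECT 'other session'")

print(list(history[1]))   # ['SELECT 2', 'SELECT 3', 'SELECT 4']
print(list(history[2]))   # ["SELECT 'other session'"]
```

Because the limit applies to each thread separately, the total number of rows, and with it the memory for the text columns, scales with the number of thread slots as well as with the history size.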
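Summary
We end up with more questions than answers:
1 events_statements_summary_by_digest is allocated statically.
2 events_statements_history is allocated dynamically, per thread.
3 SQL_TEXT memory is performance_schema_max_sql_text_length multiplied by the corresponding number of rows.
4 How is the memory taken by events_statements_summary_by_digest calculated?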
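Points 2 and 3 can be tested against the numbers measured earlier. The sketch below assumes that the text buffers of events_statements_history are sized as text length x history size x number of thread slots, and that 256 thread slots exist; 256 is the per-page record count quoted for global_thread_container in the background section, and using it here is an inference from the measurements, not something confirmed from the source code.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

ASSUMED_THREAD_SLOTS = 256  # assumption: one page of 256 thread records


def history_text_bytes(max_text_length: int, history_size: int, thread_slots: int) -> int:
    """Bytes for one text column of events_statements_history under the stated assumption."""
    return max_text_length * history_size * thread_slots


primary = history_text_bytes(10240, 1000, ASSUMED_THREAD_SLOTS)
test_env = history_text_bytes(10240, 10, ASSUMED_THREAD_SLOTS)

print(round(primary / GIB, 2))   # 2.44, matches the 2.44 GiB seen on the primary
print(test_env / MIB)            # 25.0, matches the 25.00 MiB seen in the test environment
```

Both measurements fit the formula, which suggests that raising the per-thread history size from 10 to 1000 multiplied the text buffers by 100, and that the text length of 10240 multiplied them by another 10.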
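Related article:
Background:
MySQL Performance Schema (PFS) is a powerful performance monitoring and diagnostic tool provided by MySQL.
It offers a way to inspect the server's internal execution at run time.
PFS collects information by monitoring events registered inside the server.
An event can in principle be any execution activity or resource usage inside the server,
such as a function call, a wait on a system call, the parsing or sorting stage of an SQL query,
or memory usage.
PFS stores the collected performance data in the performance_schema storage engine,
which is an in-memory table engine,
so all collected diagnostic information is kept in memory.
Collecting and storing diagnostic information adds overhead, and to affect the workload as little as possible,
the performance and memory management of PFS matter a great deal. The source-code analysis is based on MySQL 8.0.24.
II. Memory management model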
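PFS memory management has several key characteristics:
Memory is allocated in units of pages, and one page can hold multiple records
Some pages are preallocated at startup and more are added on demand at run time, but pages are only ever added, never reclaimed
Allocating and freeing a record are both lock-free
1 Core data structure
PFS_buffer_scalable_container is the core data structure of PFS memory management.
A container holds multiple pages, and each page has a fixed number of records.
Each record corresponds to one event object, for example a PFS_thread.
The number of records per page never changes, but the number of pages grows as load increases.
2 Page selection strategy on allocate
The key data structures involved in memory allocation are:
PFS_PAGE_SIZE: the size of each page in records, 256 by default for global_thread_container
PFS_PAGE_COUNT: the maximum number of pages, 256 by default for global_thread_container
class PFS_buffer_scalable_container
{
  PFS_cacheline_atomic_size_t m_monotonic;            // monotonically increasing atomic, used for lock-free page selection
  PFS_cacheline_atomic_size_t m_max_page_index;       // highest page index allocated so far
  size_t m_max_page_count;                            // maximum number of pages; no new page is allocated beyond it
  std::atomic<array_type *> m_pages[PFS_PAGE_COUNT];  // page array
  native_mutex_t m_critical_section;                  // lock taken when a new page is created
};
m_pages is an array. Each page may have free records,
or the whole page may be busy. MySQL uses a fairly simple strategy:
it polls the pages one by one to see whether any has a free slot, until an allocation succeeds.
If polling all pages still fails, a new page is created to expand the container,
until the page limit is reached.
Polling does not start from the first page every time.
It starts from the position recorded in the atomic variable m_monotonic,
which is incremented by 1 each time allocation in a page fails.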
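The page selection strategy described above can be modelled in a few lines. The following is a single-threaded Python toy, not the lock-free C++ implementation: the class and method names are invented for the sketch, and the atomics and the mutex are deliberately left out. It shows the scan starting at a monotonic cursor, growth by one page when every page is full, and the fact that pages are never removed.

```python
class Page:
    """Fixed-size array of record slots; True means the slot is in use."""

    def __init__(self, size: int):
        self.slots = [False] * size

    def allocate(self):
        for i, busy in enumerate(self.slots):
            if not busy:
                self.slots[i] = True
                return i
        return None  # page is full


class ScalableContainer:
    """Toy model: pages are added on demand and never removed."""

    def __init__(self, page_size: int, max_pages: int):
        self.page_size = page_size
        self.max_pages = max_pages
        self.pages = []
        self.monotonic = 0  # position where the next scan starts

    def allocate(self):
        for _ in range(len(self.pages)):
            index = self.monotonic % len(self.pages)
            slot = self.pages[index].allocate()
            if slot is not None:
                return index, slot
            self.monotonic += 1  # this page was full, move on to the next one
        if len(self.pages) < self.max_pages:
            self.pages.append(Page(self.page_size))  # grow by one page
            return len(self.pages) - 1, self.pages[-1].allocate()
        return None  # page limit reached and every page is full

    def free(self, page_index: int, slot: int) -> None:
        self.pages[page_index].slots[slot] = False  # the page itself is kept


demo = ScalableContainer(page_size=2, max_pages=2)
print([demo.allocate() for _ in range(5)])  # fifth request fails: both pages are full
demo.free(0, 1)
print(demo.allocate())                      # the freed slot is found again
print(len(demo.pages))                      # still 2 pages, nothing was given back
```

Freeing a record only flips a slot back to unused; the page count stays where the peak left it, which is the "grow only" behaviour the excerpt describes.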
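3 Record selection strategy within a page
PFS_buffer_default_array is the class with which each page manages its set of records.
The key data structure is:
class PFS_buffer_default_array
{
  PFS_cacheline_atomic_size_t m_monotonic;  // monotonically increasing atomic, used to pick a free record
  size_t m_max;                             // maximum number of records
  T *m_ptr;                                 // the PFS objects backing the records, e.g. PFS_thread
};
Each page is really a fixed-length array, and each record is in one of three states: FREE, DIRTY,
or ALLOCATED. FREE means the record is available, ALLOCATED means it has been allocated successfully,
and DIRTY is an intermediate state meaning the record has been claimed but the allocation is not yet complete.
Selecting a record is essentially a matter of polling for a record in the FREE state and claiming it.
The overall flow is much like selecting a page,
except that the number of records in a page is fixed, so there is no expansion logic.
Because the selection strategy is the same, it has the same problem as page selection:
the increment of the m_monotonic atomic is performed concurrently by many threads,
so under high concurrency some records get skipped,
and a page may not have its free records chosen even though it has some.
So even when page selection skips nothing,
records inside the page can still be skipped and missed, which makes the memory growth even worse.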
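4 pfs_lock
Each record has a pfs_lock
that maintains its allocation state within the page (free/dirty/allocated) together with version information.
The key data structure:
struct pfs_lock {
  std::atomic<uint32> m_version_state;
};
pfs_lock uses a single 32-bit unsigned integer to store version + state.
state
The low 2 bits hold the allocation state.
state PFS_LOCK_FREE = 0x00
state PFS_LOCK_DIRTY = 0x01
state PFS_LOCK_ALLOCATED = 0x02
version
The initial version is 0 and it is incremented by 1 on every successful allocation,
so the version gives the number of times the record has been allocated successfully.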
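How a version and a 2-bit state share one 32-bit word can be sketched as follows. This is an illustrative Python model, not the MySQL code: the mask values and the choice of 0x02 for the allocated state (so that it fits in two bits) are assumptions made for the sketch, and the real implementation performs these transitions with atomic operations.

```python
STATE_MASK = 0x00000003    # low 2 bits hold the state
VERSION_MASK = 0xFFFFFFFC  # remaining 30 bits hold the version
VERSION_INC = 4            # adding 4 raises the version by one

FREE, DIRTY, ALLOCATED = 0x00, 0x01, 0x02


def state(word: int) -> int:
    return word & STATE_MASK


def version(word: int) -> int:
    return (word & VERSION_MASK) >> 2


def free_to_dirty(word: int) -> int:
    assert state(word) == FREE
    return (word & VERSION_MASK) | DIRTY


def dirty_to_allocated(word: int) -> int:
    assert state(word) == DIRTY
    bumped = ((word & VERSION_MASK) + VERSION_INC) & VERSION_MASK  # wraps at 32 bits
    return bumped | ALLOCATED


def allocated_to_free(word: int) -> int:
    assert state(word) == ALLOCATED
    return (word & VERSION_MASK) | FREE  # the version is kept


word = 0
for _ in range(2):  # two full allocate/free cycles
    word = allocated_to_free(dirty_to_allocated(free_to_dirty(word)))
print(state(word), version(word))  # 0 2
```

Keeping the version when a record is freed lets a reader that remembers an older version notice that the slot has since been reused.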
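5 PFS memory release
Releasing PFS memory is comparatively simple.
Because every record keeps track of the container and page it belongs to,
calling the deallocate interface ultimately just sets the state back to free.
At the lowest level this always goes through pfs_lock to update the state: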
struct pfs_lock
{
  void allocated_to_free(void)
  {
    /*
      If this record is not in the ALLOCATED state and the caller is trying
      to free it, this is a bug: the caller is confused,
      and potentially damaging data owned by another thread or object.
    */
    uint32 copy = copy_version_state();
    /* Make sure the record was ALLOCATED. */
    assert(((copy & STATE_MASK) == PFS_LOCK_ALLOCATED));
    /* Keep the same version, set the FREE state */
    uint32 new_val = (copy & VERSION_MASK) + PFS_LOCK_FREE;
    m_version_state.store(new_val);
  }
};