Author
digoal
Date
2019-11-20
Tags
PostgreSQL , linux , combined , NIC queues , irqbalance
Background
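When interrupt handling is confined to one or a few CPU cores, those cores become a bottleneck.
When the NIC does not have multiple queues enabled, it becomes a bottleneck under high throughput. QPS can drop from several hundred thousand to tens of thousands.
How can these two bottlenecks be resolved?
Unevenly distributed interrupts:
"Repost - Binding hardware interrupts to different CPUs on multi-core Linux (IRQ Affinity)"
"Repost - User space and kernel space, process context and interrupt context [summary]"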
```
# Alternatively, let irqbalance distribute the interrupts directly
service irqbalance start
```
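To check whether interrupts really are spread across cores, before and after starting irqbalance, the per-CPU counters in `/proc/interrupts` can be summed for the NIC. The following is a minimal sketch, not part of the original post: the helper name `irq_per_cpu` is hypothetical, and the pattern that matches the NIC's interrupt lines (for example `eth0` or `virtio`) depends on the driver, so it is an assumption to adjust per machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum interrupt counts per CPU for all lines of
# /proc/interrupts-formatted input (stdin) that match a pattern.
irq_per_cpu() {
    local pattern="$1"
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }      # header row: CPU0 CPU1 ...
        $0 ~ pat {                       # an interrupt line for our device
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }
    '
}

# Example call (the pattern is an assumption, check your own /proc/interrupts):
#   irq_per_cpu 'eth0|virtio' < /proc/interrupts
```

If nearly all counts land on a single CPU, interrupt handling is concentrated on that core.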
NIC queue problem:
Set the number of queues with ethtool -L.
```
ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 4    # the maximum is 4
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 1
# Set it to the maximum, 4
ethtool -L eth0 combined 4
ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 4
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 4
```
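The manual steps above (read the pre-set maximum, compare it with the current value, then raise it) can be sketched as a small script. This is an illustration under stated assumptions rather than a definitive tool: the function names `combined_channels` and `set_max_combined` are hypothetical, the parser assumes the `ethtool -l` layout shown above (first `Combined:` line is the maximum, second is the current setting), and changing the setting requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ethtool -l <dev>` output on stdin and print
# "<max> <current>" for the Combined channel count.
combined_channels() {
    awk '
        $1 == "Combined:" { v[++n] = $2 }
        END { if (n >= 2) print v[1], v[2]; else exit 1 }
    '
}

# Hypothetical helper: raise <dev> to its maximum combined count if it is
# currently lower. Must be run as root.
set_max_combined() {
    local dev="$1" max cur
    read -r max cur < <(ethtool -l "$dev" | combined_channels) || return 1
    if [ "$cur" -lt "$max" ]; then
        ethtool -L "$dev" combined "$max"
    fi
}

# Example call (device name is an assumption):
#   set_max_combined eth0
```

Whether the maximum is the best value depends on the workload and core count. The test below used 4 queues on the client and 8 on the database server.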
Performance comparison
ECS client: 16 cores
ECS database server: 32 cores
Both in the same data center
pgbench -i -s 1000
pgbench -M prepared -n -r -P 1 -c 192 -j 192 -T 120 -S
With NIC multi-queue disabled, TPC-B select-only test:
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 192
number of threads: 192
duration: 120 s
number of transactions actually processed: 34502551
latency average = 0.664 ms
latency stddev = 0.134 ms
tps = 287512.936228 (including connections establishing)
tps = 288934.928415 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.667 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
With NIC multi-queue enabled (4 queues on the client, 8 on the database server), TPC-B select-only test:
transaction type: <builtin: select only>
scaling factor: 1000
query mode: prepared
number of clients: 192
number of threads: 192
duration: 120 s
number of transactions actually processed: 55518366
latency average = 0.403 ms
latency stddev = 0.520 ms
tps = 462553.463542 (including connections establishing)
tps = 467220.064086 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.403 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Notes
The server running the test client must have NIC multi-queue enabled.
The database server must have NIC multi-queue enabled as well.
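Summary
NIC multi-queue matters a great deal for high-concurrency workloads. With it disabled, the CPU could not be saturated and throughput was about 289,000 QPS. With it enabled, throughput reached about 467,000 QPS and the CPU was almost fully used. Enabling multi-queue gave roughly 1.6 times the throughput (about 62% more), and average latency fell from 0.664 ms to 0.403 ms.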
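The size of the gain can be checked directly from the two "excluding connections establishing" tps figures reported in the pgbench output above. A minimal sketch (the helper name `tps_gain` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ratio and percentage gain between a
# "before" and an "after" tps figure.
tps_gain() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        ratio = after / before
        printf "%.2fx (+%.0f%%)\n", ratio, (ratio - 1) * 100
    }'
}

# tps excluding connection establishment: queues disabled vs. enabled
tps_gain 288934.928415 467220.064086   # prints: 1.62x (+62%)
```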
References
man irqbalance
man ethtool
```
-L --set-channels
Changes the numbers of channels of the specified network device.
rx N Changes the number of channels with only receive queues.
tx N Changes the number of channels with only transmit queues.
other N Changes the number of channels used only for other purposes e.g. link interrupts or SR-IOV co-ordination.
combined N Changes the number of multi-purpose channels.
```
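"Repost - Binding hardware interrupts to different CPUs on multi-core Linux (IRQ Affinity)"
"Repost - User space and kernel space, process context and interrupt context [summary]"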