暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

linux 内存文件系统使用 - tmpfs, ramfs, shmfs

digoal 2019-02-11
679

作者

digoal

日期

2019-02-11

标签

PostgreSQL , hugetlbfs , hugepage , memory filesystem , ramfs , tmpfs , shmfs


背景

在做一些测试时,如果IO设备很烂的话,可以直接使用内存文件系统,避免IO上引入的一些开销影响测试结果。

用法很简单:

tmpfs or shmfs

mount a shmfs with a certain size to /dev/shm, and set the correct permissions.

For tmpfs you do not need to specify a size. Tmpfs or shmfs allocated memory is pageable.

For example:

Example Mount shmfs:

```

mount -t shm shmfs -o size=20g /dev/shm

Edit /etc/fstab:

shmfs /dev/shm shm size=20g 0 0
```

OR

Example Mount tmpfs:

```

mount –t tmpfs tmpfs /dev/shm

Edit /etc/fstab:

none /dev/shm tmpfs defaults 0 0
```

ramfs

ramfs is similar to shmfs, except that pages are not pageable or swappable.

This approach provides the commonly desired effect. ramfs is created by:

```
umount /dev/shm

mount -t ramfs ramfs /dev/shm
```

例子

[root@pg11-test ~]# mkdir /mnt/tmpfs [root@pg11-test ~]# mkdir /mnt/ramfs

1、tmpfs

mount -t tmpfs tmpfs /mnt/tmpfs -o size=10G,noatime,nodiratime,rw mkdir /mnt/tmpfs/a chmod 777 /mnt/tmpfs/a

2、ramfs

mount -t ramfs ramfs /mnt/ramfs -o noatime,nodiratime,rw,data=writeback,nodelalloc,nobarrier mkdir /mnt/ramfs/a chmod 777 /mnt/ramfs/a

ramfs无法在mount时限制大小,即使限制了也不起作用,在df结果中也看不到这个挂载点,但是实际上已经挂载。

```
[root@pg11-test ~]# mount
tmpfs on /mnt/tmpfs type tmpfs (rw,noatime,nodiratime,size=10485760k)
ramfs on /mnt/ramfs type ramfs (rw,noatime,nodiratime,data=writeback,nodelalloc,nobarrier)

[root@pg11-test ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 197G 17G 171G 9% /
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 936K 252G 1% /dev/shm
tmpfs 252G 676K 252G 1% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
/dev/mapper/vgdata01-lv03 4.0T 549G 3.5T 14% /data03
/dev/mapper/vgdata01-lv02 4.0T 335G 3.7T 9% /data02
/dev/mapper/vgdata01-lv01 4.0T 1.5T 2.6T 37% /data01
tmpfs 51G 0 51G 0% /run/user/0
/dev/mapper/vgdata01-lv04 2.0T 621G 1.3T 32% /data04
tmpfs 10G 0 10G 0% /mnt/tmpfs
```

内存文件系统性能

PostgreSQL fsync测试接口,测试内存文件系统fsync性能。

```
su - digoal

digoal@pg11-test-> pg_test_fsync -f /mnt/tmpfs/a/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync n/a
fdatasync 1137033.436 ops/sec 1 usecs/op
fsync 1146431.736 ops/sec 1 usecs/op
fsync_writethrough n/a
open_sync n/a

* This file system and its mount options do not support direct
I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync n/a
fdatasync 622763.705 ops/sec 2 usecs/op
fsync 625990.998 ops/sec 2 usecs/op
fsync_writethrough n/a
open_sync n/a

* This file system and its mount options do not support direct
I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write n/a
2 * 8kB open_sync writes n/a

4 * 4kB open_sync writes n/a
8 * 2kB open_sync writes n/a

16 * 1kB open_sync writes n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 317779.892 ops/sec 3 usecs/op
write, close, fsync 317769.037 ops/sec 3 usecs/op

Non-sync'ed 8kB writes:
write 529490.541 ops/sec 2 usecs/op

digoal@pg11-test-> pg_test_fsync -f /mnt/ramfs/a/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync n/a
fdatasync 1146515.453 ops/sec 1 usecs/op
fsync 1149912.760 ops/sec 1 usecs/op
fsync_writethrough n/a
open_sync n/a

* This file system and its mount options do not support direct
I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync n/a
fdatasync 621456.930 ops/sec 2 usecs/op
fsync 624811.200 ops/sec 2 usecs/op
fsync_writethrough n/a
open_sync n/a

* This file system and its mount options do not support direct
I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write n/a
2 * 8kB open_sync writes n/a

4 * 4kB open_sync writes n/a
8 * 2kB open_sync writes n/a

16 * 1kB open_sync writes n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 314754.770 ops/sec 3 usecs/op
write, close, fsync 314509.045 ops/sec 3 usecs/op

Non-sync'ed 8kB writes:
write 517299.869 ops/sec 2 usecs/op
```

本地磁盘性能如下:

```
digoal@pg11-test-> pg_test_fsync -f /data01/digoal/1
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 46574.176 ops/sec 21 usecs/op
fdatasync 40183.743 ops/sec 25 usecs/op
fsync 36875.852 ops/sec 27 usecs/op
fsync_writethrough n/a
open_sync 42927.560 ops/sec 23 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 17121.111 ops/sec 58 usecs/op
fdatasync 26438.641 ops/sec 38 usecs/op
fsync 24562.907 ops/sec 41 usecs/op
fsync_writethrough n/a
open_sync 15698.199 ops/sec 64 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 28793.172 ops/sec 35 usecs/op
2 * 8kB open_sync writes 15720.156 ops/sec 64 usecs/op
4 * 4kB open_sync writes 10007.818 ops/sec 100 usecs/op
8 * 2kB open_sync writes 5698.259 ops/sec 175 usecs/op
16 * 1kB open_sync writes 3116.232 ops/sec 321 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 33399.473 ops/sec 30 usecs/op
write, close, fsync 33216.001 ops/sec 30 usecs/op

Non-sync'ed 8kB writes:
write 376584.982 ops/sec 3 usecs/op
```

性能对比,显而易见。

其他

mount hugetlbfs,使用huge page的文件系统,但是不支持read, write接口,需要使用mmap的用法。

详见

https://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/index.html

参考

https://docs.oracle.com/cd/E11882_01/server.112/e10839/appi_vlm.htm#UNXAR397

http://www.cnblogs.com/jintianfree/p/3993893.html

https://lwn.net/Articles/376606/

https://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/index.html

《PostgreSQL Huge Page 使用建议 - 大内存主机、实例注意》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论