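Introduction
Adding or removing OSDs during Ceph maintenance triggers data migration. Yet when someone asks how much data will move, the usual answer is "a lot" or "it depends on the environment". Is there a precise way to say how much data actually has to migrate? I had thought about this before; my idea at the time was to compare the PG distribution before and after the change and compute the difference. While going through some material I came across a blog post by alram, a programmer at Inktank (the company Sage works at), who wrote a Python script that does this. This post walks through the idea behind the comparison and shows the script in action.
Computing the migration volume only requires a modified crushmap. The calculation is done offline, so it has no impact on the cluster.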
Running the script
Prepare the modified crushmap
Get the current crushmap
ceph osd getcrushmap -o crushmap
Decompile the crushmap
crushtool -d crushmap -o crushmap.txt
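Edit crushmap.txt
Edit it into whatever crushmap you want to end up with; the change can add OSDs or remove them.
Calculation when removing an OSD
Suppose we remove osd.5: how much data has to migrate?
Set the weight of osd.5 in crushmap.txt to 0, then recompile: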
crushtool -c crushmap.txt -o crushmapnew
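The weight edit can also be scripted before recompiling. The following is a minimal sketch, assuming the decompiled crushmap lists bucket members as `item osd.N weight X` lines; the helper name `zero_osd_weight` and the sample bucket are hypothetical and only serve as illustration.

```python
import re


def zero_osd_weight(crushmap_text, osd_name):
    """Return the crushmap text with every `item <osd_name> weight X` set to 0.000.

    Assumes the decompiled format `item osd.N weight X` inside bucket blocks.
    """
    pattern = re.compile(
        r'^(\s*item\s+' + re.escape(osd_name) + r'\s+weight\s+)\S+',
        re.MULTILINE,
    )
    # Keep everything up to the weight value and replace only the value.
    return pattern.sub(lambda m: m.group(1) + '0.000', crushmap_text)


# Hypothetical bucket, shaped like crushtool -d output.
sample = """host lab8106 {
\tid -2
\talg straw
\titem osd.4 weight 1.000
\titem osd.5 weight 1.000
}
"""

print(zero_osd_weight(sample, 'osd.5'))
```

The pattern requires whitespace right after the OSD name, so asking for osd.5 does not touch osd.50. The rewritten text would then be saved as crushmap.txt and compiled with the crushtool command above.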
Run the calculation script
[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  59                   6157238296           1469
data                 54                   5918162968           1412
metadata             53                   5825888280           1390
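This shows the amount of data that will be migrated.
The REMAPPED OSDs column is the number of PG replicas that have to move. It is computed by comparing the mapping before and after the change:
[1,2] -> [1,2]: 0 moved
[1,2] -> [4,2]: 1 moved
[1,2] -> [4,3]: 2 moved
Because the figure counts per-replica moves like these, it is neither exactly a number of PGs nor a number of OSDs. Think of it as the number of PG data copies, since a single PG may need to move one copy, two copies, or more.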
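The counting rule above can be written down in a few lines. This is a small sketch of the same idea the script uses (pool size minus the number of OSDs that appear in both mappings); the function names `moved_replicas` and `moved_bytes` are mine, and the byte figure is only an estimate that multiplies the replica count by the PG's size.

```python
def moved_replicas(old_mapping, new_mapping, pool_size):
    """Number of PG copies that must move: replicas not kept on the same OSD."""
    kept = set(old_mapping) & set(new_mapping)
    return pool_size - len(kept)


def moved_bytes(old_mapping, new_mapping, pool_size, pg_bytes):
    """Estimated bytes moved for one PG: moved copies times the PG's size."""
    return moved_replicas(old_mapping, new_mapping, pool_size) * pg_bytes


# The three cases from the text, for a pool with two replicas.
for old, new in [([1, 2], [1, 2]), ([1, 2], [4, 2]), ([1, 2], [4, 3])]:
    print(old, '->', new, 'moves', moved_replicas(old, new, 2))
```

Note that the comparison is set-based, so a change of order within the mapping (for example [1,2] -> [2,1]) counts as zero moved copies.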
Calculation when adding an OSD
If we add osd.6, how much data has to migrate?
Run the script directly
[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  0                    0                    0
data                 0                    0                    0
metadata             0                    0                    0
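Every value is zero. This is because one step inside the script failed: Ceph has an internal restriction that, when a crushmap is imported into an osdmap, it verifies that the number of OSDs in the osdmap matches the number in the crushmap. The script discards the output of the import step, so the failure is silent and the "new" osdmap stays identical to the original.
So one extra step is needed here: set the OSD count to match the planned layout. This is the only change made to the live cluster.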
[root@lab8106 ceph]# ceph osd setmaxosd 7
set new max_osd = 7
Then run the script again and it works
[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  31                   3590324224           856
data                 34                   3372220416           804
metadata             41                   4492099584           1071
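The max_osd mismatch described above can be spotted before running the script. Below is a rough sketch that reads the `device N osd.N` lines of a decompiled crushmap and reports the max_osd value it would need. It assumes max_osd must be at least the highest device id plus one, which is consistent with the `setmaxosd 7` used for osd.6 here; the helper name `required_max_osd` and the sample text are hypothetical.

```python
import re


def required_max_osd(crushmap_text):
    """Highest `device N osd.N` id in the decompiled crushmap, plus one.

    Returns 0 when no device lines are found.
    """
    ids = [
        int(m.group(1))
        for m in re.finditer(r'^device\s+(\d+)\s+osd\.\d+', crushmap_text, re.MULTILINE)
    ]
    return max(ids) + 1 if ids else 0


# Hypothetical device section after adding osd.6 to the map.
sample = "# devices\n" + "".join("device %d osd.%d\n" % (i, i) for i in range(7))

print(required_max_osd(sample))
```

If the value is larger than the max_osd line printed by `osdmaptool --print`, the import step would presumably fail in the way shown above.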
That is what running the script looks like. Next, let's look at the logic inside.
Code and code analysis
Code
#!/usr/bin/env python

import ast
import json
import os
import subprocess
import argparse
import sys

FNULL = open(os.devnull, 'w')

# assume the osdmap test output
# is the same lenght and order...
# if add support for PG increase
# that's gonna break
def diff_output(original, new, pools):
    number_of_osd_remap = 0
    osd_data_movement = 0

    results = {}

    pg_data, pg_objects = get_pg_info()

    for i in range(len(original)):
        orig_i = original[i]
        new_i = new[i]

        if orig_i[0].isdigit():
            pg_id = orig_i.split('\t')[0]
            pool_id = pg_id[0]
            pool_name = pools[pool_id]['pool_name']

            if not pool_name in results:
                results[pool_name] = {}
                results[pool_name]['osd_remap_counter'] = 0
                results[pool_name]['osd_bytes_movement'] = 0
                results[pool_name]['osd_objects_movement'] = 0

            original_mappings = ast.literal_eval(orig_i.split('\t')[1])
            new_mappings = ast.literal_eval(new_i.split('\t')[1])
            intersection = list(set(original_mappings).intersection(set(new_mappings)))

            osd_movement_for_this_pg = int(pools[pool_id]['pool_size']) - len(intersection)
            osd_data_movement_for_this_pg = int(osd_movement_for_this_pg) * int(pg_data[pg_id])
            osd_object_movement_for_this_pg = int(osd_movement_for_this_pg) * int(pg_objects[pg_id])

            results[pool_name]['osd_remap_counter'] += osd_movement_for_this_pg
            results[pool_name]['osd_bytes_movement'] += int(osd_data_movement_for_this_pg)
            results[pool_name]['osd_objects_movement'] += int(osd_object_movement_for_this_pg)

        elif orig_i.startswith('#osd'):
            break

    return results


def get_pools_info(osdmap_path):
    pools = {}
    args = ['osdmaptool', '--print', osdmap_path]
    osdmap_out = subprocess.check_output(args, stderr=FNULL).split('\n')
    for line in osdmap_out:
        if line.startswith('pool'):
            pool_id = line.split()[1]
            pool_size = line.split()[5]
            pool_name = line.split()[2].replace("'","")
            pools[pool_id] = {}
            pools[pool_id]['pool_size'] = pool_size
            pools[pool_id]['pool_name'] = pool_name
        elif line.startswith('max_osd'):
            break

    return pools


def get_osd_map(osdmap_path):
    args = ['sudo', 'ceph', 'osd', 'getmap', '-o', osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)


def get_pg_info():
    pg_data = {}
    pg_objects = {}
    args = ['sudo', 'ceph', 'pg', 'dump']
    pgmap = subprocess.check_output(args, stderr=FNULL).split('\n')

    for line in pgmap:
        if line[0].isdigit():
            pg_id = line.split('\t')[0]
            pg_bytes = line.split('\t')[6]
            pg_obj = line.split('\t')[1]
            pg_data[pg_id] = pg_bytes
            pg_objects[pg_id] = pg_obj
        elif line.startswith('pool'):
            break

    return pg_data, pg_objects


def osdmaptool_test_map_pgs_dump(original_osdmap_path, crushmap):
    new_osdmap_path = original_osdmap_path + '.new'
    get_osd_map(original_osdmap_path)
    args = ['osdmaptool', '--test-map-pgs-dump', original_osdmap_path]
    original_osdmaptool_output = subprocess.check_output(args, stderr=FNULL).split('\n')

    args = ['cp', original_osdmap_path, new_osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)
    args = ['osdmaptool', '--import-crush', crushmap, new_osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)
    args = ['osdmaptool', '--test-map-pgs-dump', new_osdmap_path]
    new_osdmaptool_output = subprocess.check_output(args, stderr=FNULL).split('\n')

    pools = get_pools_info(original_osdmap_path)
    results = diff_output(original_osdmaptool_output, new_osdmaptool_output, pools)

    return results


def dump_plain_output(results):
    sys.stdout.write("%-20s %-20s %-20s %-20s\n" % ("POOL", "REMAPPED OSDs", "BYTES REBALANCE", "OBJECTS REBALANCE"))

    for pool in results:
        sys.stdout.write("%-20s %-20s %-20s %-20s\n" % (
            pool,
            results[pool]['osd_remap_counter'],
            results[pool]['osd_bytes_movement'],
            results[pool]['osd_objects_movement']
            ))


def cleanup(osdmap):
    FNULL.close()
    new_osdmap = osdmap + '.new'
    os.remove(new_osdmap)


def parse_args():
    parser = argparse.ArgumentParser(description='Ceph CRUSH change data movement calculator.')

    parser.add_argument(
        '--osdmap-file',
        help="Where to save the original osdmap. Temp one will be <location>.new. Default: tmp/osdmap",
        default="/tmp/osdmap",
        dest="osdmap_path"
        )
    parser.add_argument(
        '--crushmap-file',
        help="CRUSHmap to run the movement test against.",
        required=True,
        dest="new_crushmap"
        )

    parser.add_argument(
        '--format',
        help="Output format. Default: plain",
        choices=['json', 'plain'],
        dest="format",
        default="plain"
        )

    args = parser.parse_args()
    return args

if __name__ == '__main__':
    ctx = parse_args()

    results = osdmaptool_test_map_pgs_dump(ctx.osdmap_path, ctx.new_crushmap)
    cleanup(ctx.osdmap_path)

    if ctx.format == 'json':
        print json.dumps(results)
    elif ctx.format == 'plain':
        dump_plain_output(results)
The code is included here for easy copying; it is also available from the original author's gist.
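One detail worth noting when reading the script: it takes the pool id as `pg_id[0]`, the first character of the PG id, which only works while pool ids are a single digit. The sketch below shows one possible way to parse a mapping line more defensively by splitting on the dot. It assumes the line layout the script itself relies on (tab-separated, PG id first, OSD list second); `parse_pg_line` is a hypothetical helper, not part of the original script.

```python
import ast


def parse_pg_line(line):
    """Split one mapping line into (pg_id, pool_id, osd_list).

    Assumes tab-separated fields: PG id first, OSD list such as [1,2] second.
    """
    fields = line.split('\t')
    pg_id = fields[0]
    # Everything before the dot is the pool id, so pool 12 stays '12'.
    pool_id = pg_id.split('.')[0]
    osds = ast.literal_eval(fields[1])
    return pg_id, pool_id, osds


print(parse_pg_line('0.1f\t[1,2]\t1'))
print(parse_pg_line('12.3f\t[4,3]\t4'))
```

With `pg_id[0]`, the second line would be attributed to pool 1 instead of pool 12.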
Analysis of the main logic
First, get the osdmap
ceph osd getmap -o /tmp/osdmap
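Get the original PG distribution
osdmaptool --test-map-pgs-dump /tmp/osdmap
Get the new crushmap
This is the crushmap you edited by hand into the layout you want.
Inject the new crushmap into the osdmap to produce a new osdmap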
osdmaptool --import-crush crushmap /tmp/new_osdmap_path
Compute the new distribution from the new osdmap
osdmaptool --test-map-pgs-dump /tmp/new_osdmap_path
Finally, compare the two outputs to get the result.
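The steps above can be summarized as an ordered list of commands. This sketch only builds and prints the command lines so it can be read or run without a cluster; the commands themselves are the ones the script issues, while the function name `plan_commands` and the example paths are assumptions for illustration.

```python
def plan_commands(osdmap_path, crushmap_path):
    """Return the commands of the offline comparison, in execution order."""
    new_osdmap_path = osdmap_path + '.new'
    return [
        # 1. Save the cluster's current osdmap (read-only on the cluster).
        ['ceph', 'osd', 'getmap', '-o', osdmap_path],
        # 2. Dump the PG -> OSD mapping of the original osdmap.
        ['osdmaptool', '--test-map-pgs-dump', osdmap_path],
        # 3. Work on a copy so the original osdmap file stays untouched.
        ['cp', osdmap_path, new_osdmap_path],
        # 4. Inject the edited crushmap into the copy.
        ['osdmaptool', '--import-crush', crushmap_path, new_osdmap_path],
        # 5. Dump the PG -> OSD mapping of the modified osdmap.
        ['osdmaptool', '--test-map-pgs-dump', new_osdmap_path],
    ]


for cmd in plan_commands('/tmp/osdmap', 'crushmapnew'):
    print(' '.join(cmd))
```

The outputs of steps 2 and 5 are what gets compared line by line, so both dumps need to list the same PGs in the same order.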
Related links
Calculate data migration when changing the CRUSHmap
alram/crush_data_movement_calculator.py
Change log
| Why | Who | When |
|---|---|---|
| Created | 武汉-运维-磨渣 | 2017-02-08 |




