# author: Min Chen(minchen@ubuntukylin.com) 2014 2015

------------- ceph rbd recover tool -------------

  ceph rbd recover tool is used to recover ceph rbd images when all ceph services are killed.
it is based on ceph-0.80.x (Firefly and newer).
  sometimes the ceph services (ceph-mon, ceph-osd) become unavailable because of bugs or other
problems, especially on a large scale ceph cluster, so that the cluster cannot serve requests
and rbd images cannot be accessed. in this case, a tool to recover rbd images is necessary.
  ceph rbd recover tool is built for exactly this: it collects all objects of an image from the
distributed osd nodes, using the latest pg epoch, and splices the objects by offset into a
complete image. to make sure object data is complete, the tool flushes the osd journal on each
osd node before recovering.
  however, there are some limitations:
-needs ssh service and an unobstructed network
-osd data must be accessible on a local disk
-clone images are not supported, while snapshots are supported
-only replicated pools are supported

before you run this tool, you should make sure that:
1). all ceph processes (ceph-osd, ceph-mon, ceph-mds) are shut down
2). the ssh daemon is running & the network is ok (ssh to each node without a password)
3). ceph-kvstore-tool is installed (for ubuntu: apt-get install ceph-test)
4). the osd disks are not crashed and their data can be accessed on the local filesystem

-architecture:

                          +---- osd.0
                          |
       admin_node --------+---- osd.1
                          |
                          +---- osd.2
                          |
                          ......

-files:
admin_node: {rbd-recover-tool  common_h  epoch_h  metadata_h  database_h}
osd:        {osd_job  common_h  epoch_h  metadata_h}    #/var/rbd_tool/osd_job
in this architecture, admin_node acts as the client and the osd nodes act as servers,
so they run different files:
on admin_node run:  rbd-recover-tool <action> [<parameters>]
on osd node run:    ./osd_job <function> <parameters>
admin_node will copy the files osd_job, common_h, epoch_h, metadata_h to each remote osd node

-config file
before you run this tool, make sure to write the config files first
osd_host_path:  osd hostnames and osd data paths        #user input
        osdhost0  /var/lib/ceph/osd/ceph-0
        osdhost1  /var/lib/ceph/osd/ceph-1
        ......
mon_host:       all mon node hostnames                  #user input
        monhost0
        monhost1
        ......
mds_host:       all mds node hostnames                  #user input
        mdshost0
        mdshost1
        ......
then the init_env_admin function will create the file osd_host
osd_host:       all osd node hostnames  #generated by admin_job, users can ignore it
        osdhost0
        osdhost1
        ......

-usage:
rbd-recover-tool <operation>
<operation> :
database        #generate the offline database: hobject path, node hostname, pg_epoch and image metadata
list            #list all images from the offline database
lookup <pool_id>/<image_name>[@<snap_name>]     #look up image metadata in the offline database
recover <pool_id>/<image_name>[@<snap_name>] [/path/to/store/image]     #recover image data according to image metadata

-steps:
1. stop all ceph services: ceph-mon, ceph-osd, ceph-mds
2. setup config files: osd_host_path, mon_host, mds_host
3. rbd-recover-tool database    # wait a long time
4. rbd-recover-tool list
5. rbd-recover-tool recover <pool_id>/<image_name>[@<snap_name>] [/path/to/store/image]
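as a concrete illustration, a full recovery session might look like the following sketch
(the pool id 2, image name foo and output path /tmp are hypothetical values, not part of the tool):

        # on admin_node, after writing osd_host_path, mon_host and mds_host
        rbd-recover-tool database      # scan all osds, flush journals, build the offline database
        rbd-recover-tool list          # suppose it shows image foo in pool 2
        rbd-recover-tool lookup 2/foo  # check image metadata before recovering
        rbd-recover-tool recover 2/foo /tmp     # splice the objects of foo into an image file under /tmp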
-debug & error check
if an admin_node operation fails, you can check it on the osd node:
cd /var/rbd_tool/osd_job
./osd_job <operation>
<operation> :
do_image_id <image_id_hobject>          #get image id of image format v2
do_image_id <image_header_hobject>      #get image id of image format v1
do_image_metadata_v1 <image_header_hobject>     #get image metadata of image format v1, maybe pg epoch is not the latest
do_image_metadata_v2 <image_header_hobject>     #get image metadata of image format v2, maybe pg epoch is not the latest
do_image_list   #get all images on this osd (image head hobjects)
do_pg_epoch     #get all pg epochs and store them in /var/rbd_tool/single_node/node_pg_epoch
do_omap_list    #list all omap headers and omap entries on this osd

-FAQ
the file FAQ lists some common confusing cases encountered while testing
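for example, if the database step fails on one osd node, the per-node jobs can be re-run by hand
there to narrow down the problem (the hostname osdhost0 is hypothetical):

        ssh osdhost0
        cd /var/rbd_tool/osd_job
        ./osd_job do_pg_epoch   # rebuild /var/rbd_tool/single_node/node_pg_epoch for this osd
        ./osd_job do_image_list # list the image head hobjects found on this osd
        ./osd_job do_omap_list  # dump omap headers and entries to verify the osd data is readable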