ceph-rbd-recover-tool

ceph offline recover tool for ceph rbd images

# author: Min Chen (minchen@ubuntukylin.com) 2014 2015

------------- ceph rbd recover tool -------------

  ceph rbd recover tool is used for recovering a ceph rbd image when all ceph services are down.
It is based on ceph-0.80.x (Firefly and newer).
  Occasionally the ceph services (ceph-mon, ceph-osd) become unavailable because of bugs or other
problems, especially on a large-scale ceph cluster, so the cluster cannot serve requests and rbd
images cannot be accessed. In that case, a tool to recover rbd images is necessary.
  ceph rbd recover tool is built for exactly this: it collects all objects of an image from the
distributed osd nodes at the latest pg epoch, and splices the objects by offset into a complete
image. To make sure the object data is complete, the tool flushes the osd journal on each osd node
before recovering.
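
the splicing step conceptually works like the sketch below; this is an illustrative sketch only,
and the 4MB object size, the collected-object directory and the hex-index object naming used here
are assumptions, not the tool's exact implementation:

# write each collected object into a sparse image file at its byte offset
obj_size=$((4 * 1024 * 1024))                     # assumed default rbd object size
out=/path/to/store/image
for obj in /path/to/collected/objects/rbd_data.*; do
    idx=$((16#$(basename "$obj" | awk -F. '{print $NF}')))   # object index from hex suffix
    dd if="$obj" of="$out" bs=$obj_size seek=$idx conv=notrunc 2>/dev/null
done
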
  but, there are some limitations:
-ssh service and an unobstructed network are required
-osd data must be accessible on a local disk
-clone images are not supported, while snapshots are supported
-only replicated pools are supported

before you run this tool, you should make sure that:
1). all ceph processes (ceph-osd, ceph-mon, ceph-mds) are shut down
2). the ssh daemon is running and the network is ok (ssh to each node without a password)
3). ceph-kvstore-tool is installed (for ubuntu: apt-get install ceph-test)
4). the osd disks are not crashed and their data can be accessed on the local filesystem
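
a rough pre-flight check for points 1), 2) and 4) might look like the sketch below; it is only an
illustration and reads the osd_host_path config file described later in this document:

# illustrative pre-flight check driven by the osd_host_path config file
while read host path; do
    ssh -n -o BatchMode=yes "$host" true \
        || echo "WARN: passwordless ssh to $host failed"
    ssh -n "$host" "test -d $path" \
        || echo "WARN: osd data path $path not found on $host"
    ssh -n "$host" "pgrep -x ceph-osd >/dev/null" \
        && echo "WARN: ceph-osd is still running on $host"
done < osd_host_path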

-architecture:

                      +---- osd.0
                      |
admin_node -----------+---- osd.1
                      |
                      +---- osd.2
                      |
                      ......

-files:
admin_node: {rbd-recover-tool  common_h  epoch_h  metadata_h  database_h}
osd:        {osd_job           common_h  epoch_h  metadata_h} #/var/rbd_tool/osd_job
in this architecture, admin_node acts as the client and the osds act as servers,
so they run different files:
on admin_node run:  rbd-recover-tool <action> [<parameters>]
on osd node run:    ./osd_job <function> <parameters>
admin_node copies the files osd_job, common_h, epoch_h and metadata_h to each remote osd node
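
the copy is handled by the tool itself, but conceptually it amounts to something like the
following sketch (osd_host is the generated hostname list described in the config section below):

# conceptual sketch of the file distribution step
while read host; do
    ssh -n "$host" "mkdir -p /var/rbd_tool/osd_job"
    scp osd_job common_h epoch_h metadata_h "$host":/var/rbd_tool/osd_job/
done < osd_host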


-config file
before you run this tool, make sure the config files are written first
osd_host_path: osd hostnames and osd data paths #user input
  osdhost0	/var/lib/ceph/osd/ceph-0
  osdhost1	/var/lib/ceph/osd/ceph-1
  ......
mon_host: all mon node hostnames #user input
  monhost0
  monhost1
  ......
mds_host: all mds node hostnames #user input
  mdshost0
  mdshost1
  ......
then, the init_env_admin function will create the file: osd_host
osd_host: all osd node hostnames #generated by admin_job, the user can ignore it
  osdhost0
  osdhost1
  ......
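
for example, on a small three-node cluster the user-supplied config files could be created as
below; the hostnames node1..node3 and the osd data paths are illustrative only:

cat > osd_host_path <<EOF
node1	/var/lib/ceph/osd/ceph-0
node2	/var/lib/ceph/osd/ceph-1
node3	/var/lib/ceph/osd/ceph-2
EOF

cat > mon_host <<EOF
node1
node2
node3
EOF

cat > mds_host <<EOF
node1
EOF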


-usage:
rbd-recover-tool <operation>
<operation> :
database		#generate the offline database: hobject paths, node hostnames, pg_epoch and image metadata
list			#list all images from the offline database
lookup <pool_id>/<image_name>[@<snap_name>]	#look up image metadata in the offline database
recover <pool_id>/<image_name>[@<snap_name>] [/path/to/store/image]	#recover image data according to image metadata
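
for example (pool id 2, image name "foo" and the output directory are placeholders):

rbd-recover-tool lookup 2/foo
rbd-recover-tool lookup 2/foo@snap1
rbd-recover-tool recover 2/foo /mnt/recovered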

-steps:
1. stop all ceph services: ceph-mon, ceph-osd, ceph-mds
2. set up the config files: osd_host_path, mon_host, mds_host
3. rbd-recover-tool database 	# this can take a long time
4. rbd-recover-tool list
5. rbd-recover-tool recover <pool_id>/<image_name>[@<snap_name>] [/path/to/store/image]
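
the steps above amount to roughly the following scripted sketch; the service-stop command depends
on your init system, and 2/foo plus /mnt/recovered are placeholders:

# stop ceph daemons on every node listed in the config files
for h in $(cat mon_host mds_host) $(awk '{print $1}' osd_host_path); do
    ssh -n "$h" "service ceph stop" 2>/dev/null
done
rbd-recover-tool database               # scans every osd, can take a long time
rbd-recover-tool list                   # find the image to restore
rbd-recover-tool recover 2/foo /mnt/recovered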


-debug & error check
if an admin_node operation fails, you can check it on the osd node:
cd /var/rbd_tool/osd_job
./osd_job <operation>
<operation> :
do_image_id <image_id_hobject>		#get image id of image format v2 
do_image_id <image_header_hobject>	#get image id of image format v1
do_image_metadata_v1 <image_header_hobject>  	#get image metadata of image format v1, the pg epoch may not be the latest
do_image_metadata_v2 <image_header_hobject>  	#get image metadata of image format v2, the pg epoch may not be the latest
do_image_list 				#get all images on this osd (image head hobjects)
do_pg_epoch				#get all pg epoch and store it in /var/rbd_tool/single_node/node_pg_epoch
do_omap_list    			#list all omap headers and omap entries on this osd
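
an example session on an osd node; <image_header_hobject> stands for the full path of the image's
header object under the osd data directory, taken from the do_image_list output:

cd /var/rbd_tool/osd_job
./osd_job do_pg_epoch           # refresh pg epochs for this osd
./osd_job do_image_list         # list the image head hobjects found on this osd
./osd_job do_image_metadata_v2 "<image_header_hobject>"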


-FAQ
the FAQ file lists some common confusing cases encountered while testing