gluster/glusterd2

Healinfo for disperse volumes errors out

shtripat opened this issue · 3 comments

Observed behavior

For dispersed volumes, the heal info endpoint errors out, reporting that the heal info operation failed.

Expected/desired behavior

Heal info should be shown for dispersed volumes (only split-brain heal info is not applicable to them).

Details on how to reproduce (minimal and precise)

  1. Set up a 3-node Gluster cluster
  2. Install glusterd2 on the storage nodes and peer probe the nodes
  3. Create a disperse volume using glustercli
  4. Access the endpoint http://{node-ip}:24007/v1/volumes/{volname}/heal-info
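
Step 4 can be sketched with curl as below. This is only a rough illustration: the helper names `heal_info_url` and `get_heal_info` are hypothetical, the 120-second client-side limit is an arbitrary choice, and it assumes the endpoint is reachable without REST authentication, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: builds the heal-info URL used in step 4.
# $1 = node IP or hostname, $2 = volume name, $3 = port (defaults to 24007, as in step 4)
heal_info_url() {
    printf 'http://%s:%s/v1/volumes/%s/heal-info\n' "$1" "${3:-24007}" "$2"
}

# Hypothetical helper: requests heal info and prints the response body
# followed by the HTTP status code, so a failed operation is easy to spot.
# $1 = node IP or hostname, $2 = volume name, $3 = curl time limit in seconds (assumed default: 120)
get_heal_info() {
    # -sS: no progress meter, but still show errors
    # --max-time: upper bound on the whole request, enforced by curl itself
    # -w: append the HTTP status code after the body
    curl -sS --max-time "${3:-120}" \
        -w '\nHTTP status: %{http_code}\n' \
        "$(heal_info_url "$1" "$2")"
}
```

After sourcing the functions, `get_heal_info {node-ip} {volname}` would issue the request. Running it once against a pure disperse volume and once against a distribute-disperse volume makes the two responses easy to compare.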

Information about the environment:

  • Glusterd2 version used (e.g. v4.1.0 or master): glusterd2-5.0-0.dev.98.git5dd1254.el7.x86_64
  • Operating system used: CentOS Linux release 7.4.1708 (Core)
  • Glusterd2 compiled from sources, as a package (rpm/deb), or container: rpm
  • Using External ETCD: (yes/no, if yes ETCD version): yes
  • If container, which container image: NA
  • Using kubernetes, openshift, or direct install: direct install
  • If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: NA

Other useful information

  • glusterd2 config files from all nodes (default /etc/glusterd2/glusterd2.toml)
  • glusterd2 log files from all nodes (default /var/log/glusterd2/glusterd2.log)
  • ETCD configuration
  • Contents of uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
  • Output of statedump from any one of the nodes

Useful commands

  • To get glusterd2 version
    glusterd2 --version
    
  • To get ETCD version
    etcd --version
    
  • To get output of statedump
    curl http://glusterd2-IP:glusterd2-Port/statedump
    

@shtripat Have you tried this with a higher timeout value (--timeout parameter)? The heal info command might time out within 30 secs, especially with a disperse volume, where the total number of bricks can be on the higher side.

@atinmu I tried with a bigger timeout, but it did not help.
There is another finding: if I create a pure disperse volume, heal-info works in gd2, but for a distribute-disperse volume (which I initially started testing with) it gives the error.