cea-hpc/clustershell

pbrun with clush on remote host is not printing out to stdout

kiransgithub opened this issue · 3 comments

Hi Team,

I have a requirement where I need to execute a clush command on Server B from Server A. I created a Python script (using subprocess.Popen) which works fine when I execute it directly on Server B. However, when I run the same script from Server A over ssh, it fails with exit code 255, and only at the step where clush -w is used. The clush commands with the -a option work both on Server B directly and over ssh from Server A.
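For context, here is a minimal sketch of how the script invokes these commands; the run_cmd helper is illustrative, not the exact code:

```python
import subprocess

def run_cmd(cmd):
    # Illustrative helper (hypothetical name): run a shell command and
    # return its exit code, stdout, and stderr.
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err

# The step that fails when the script runs over ssh from Server A
rc, out, err = run_cmd(
    "clush -w SERVER_C -B \"runuser -l mapr -c 'yarn rmadmin -refreshQueues'\""
)
print("rc:", rc)
print("std :" + out)
if err:
    print("ste:" + err)
```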

Server B:

==============================
[hdpsupl2@SERVER-B(ADMINNODE) /home/hdpsupl2] $ pbrun /opt/automation/L2/command_parse_executer.py -c refreshQueues
command:  refreshQueues
Finding active RM host...maprcli node list -columns svc,h -filter [svc==resourcemanager]| awk '/RM/ {print $1}'
ACTIVE_RM_HOST : SERVER_C
Executing command :  echo y | clush -a  --copy /opt/mapr/hadoop/hadoop/etc/hadoop/fair-scheduler.xml --dest /opt/mapr/hadoop/hadoop/etc/hadoop/ARCHIVE/fair-scheduler.xml.12072020_1419
rc: 0
std :Running command on 'all' group. Do you want to continue?(y/n)
Executing command :  echo y | clush -a  --copy /home/hdpsupl2/fair-scheduler.xml --dest /opt/mapr/hadoop/hadoop/etc/hadoop/fair-scheduler.xml
rc: 0
std :Running command on 'all' group. Do you want to continue?(y/n)
Executing command :  clush -w SERVER_C -B "runuser -l mapr -c 'yarn rmadmin -refreshQueues';echo -e 'command_status=$?'"
rc: 0
std :---------------
SERVER_C
---------------
20/12/07 14:19:49 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to SERVER-C./11.12.16.52:8033
command_status=0
=====================

From Server A, executing the same Python script via ssh:

=========================================
bash-4.2$ ssh -t -q SERVER-B bash -c "whoami;date;pbrun /opt/automation/L2/command_parse_executer.py -c refreshQueues"
hdpsupl2
Mon Dec  7 14:20:07 MST 2020
command:  refreshQueues
Finding active RM host...maprcli node list -columns svc,h -filter [svc==resourcemanager]| awk '/RM/ {print $1}'
ACTIVE_RM_HOST : SERVER_C
Executing command :  echo y | clush -a  --copy /opt/mapr/hadoop/hadoop/etc/hadoop/fair-scheduler.xml --dest /opt/mapr/hadoop/hadoop/etc/hadoop/ARCHIVE/fair-scheduler.xml.12072020_1420
rc: 0
std :Running command on 'all' group. Do you want to continue?(y/n)
Executing command :  echo y | clush -a  --copy /home/hdpsupl2/fair-scheduler.xml --dest /opt/mapr/hadoop/hadoop/etc/hadoop/fair-scheduler.xml
rc: 0
std :Running command on 'all' group. Do you want to continue?(y/n)
Executing command :  clush -w SERVER_C -B "runuser -l mapr -c 'yarn rmadmin -refreshQueues';echo -e 'command_status=$?'"
rc: 1
ste:clush: SERVER_C: exited with exit code 255

rc: 1

Any hints on this are highly appreciated. I tried ssh with -tt and with -T, but no luck. It always fails on clush with -w.

Hi @kiransgithub, the output you provided is not easy to read, so it's hard to get a clear picture of what you're doing.
The first difference I see is that when running clush -a you use echo y | first, which makes a difference: the clush process reads its stdin from this echo command.
Also, clush -a is used here to copy files, while clush -w is used to run a command; those are really different operations.
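If stdin handling is the suspect, a quick check (a rough sketch, not necessarily how your script should be written) is to answer the confirmation through Popen's stdin explicitly and capture stderr, since exit code 255 is usually ssh itself failing and the real reason only shows up on stderr:

```python
import subprocess

# Rough sketch: feed the "y" answer via stdin instead of a shell pipe,
# and capture stderr so the reason behind ssh's exit code 255 is visible.
proc = subprocess.Popen(
    ["clush", "-w", "SERVER_C", "-B",
     "runuser -l mapr -c 'yarn rmadmin -refreshQueues'"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
out, err = proc.communicate(input="y\n")
print("rc:", proc.returncode)
print("stdout:", out)
print("stderr:", err)
```

clush also has a -n/--nostdin option to stop it from watching stdin at all, which can help when it runs without a terminal attached.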

The -a option just takes the node list from the all group, while -w takes the node list you specify; beyond that there is very little difference between them.
Could you first try running the clush command manually, with and without ssh, before using your script?

I did not realize you closed this one. I'm moving my note to the other ticket.

Duplicate of #456