gearman/gearmand

gearadmin --priority-status does not return expected values

Closed this issue ยท 8 comments

We're using Gearman 1.1.19.1 with one queue/function and all 3 priorities. We want to build a Grafana dashboard with the queue structure stacked in terms of priorities. This would allow us to view how many HIGH jobs are waiting, estimate when NORMAL jobs will start, etc.. .

However, in testing, we're unable to get the expected results from gearadmin --priority-status.

For example, we've pushed 100 high, 50 normal and 10 low jobs to the foo function with 0 workers (so all 160 are waiting) and this is the output we're getting:

gearadmin --priority-status
bar 0 0 0 0
convert 0 0 0 25
foo 1 1 2 0

So only 1 high, 1 medium and 2 low ?! gearadmin --show-unique-jobs shows all 160 jobs/uuids.

We've repeatedly run variations of this test, and the numbers never get close to what's waiting in the queue so we're either missing something or the functionality is bugged. The same numbers are returned when we telnet into gearmand and type prioritystatus.

Initial merge for the --priority-status feature: 33fccc5.

@octavn :

  1. Are you sure the payloads for each of the 160 jobs you submitted are different? If the payloads are the same, gearmand might be combining them somehow into the same job when listed in the prioritystatus output. (It's a theory. I'm just guessing.)

  2. Would you share your code or script for submitting the 160 "foo" jobs with different priorities? I know it's probably super-simple, but it might help with testing.

  3. Would you be willing to test this with older gearmand releases? prioritystatus was first added in 1.1.15, so it would be nice to confirm if the problem was evident in that version or if it only appeared in later versions.

Joining the discussion here:

  1. The payloads were the same when first testing, but to check your theory I've also tested with distinct payloads and had the same result, the issue still persists.
  2. Here is the code for the gearman client:
<?php

//instantiate the client
$client = new GearmanClient();
$client->addServer("localhost", 4730);

for ($i = 0; $i < 100; $i++){
   $jobHandle = $client->doHighBackground("reverse", "test payload high");
   echo "Added job $jobHandle \n";
}

for ($i = 0; $i < 50; $i++){
   $jobHandle = $client->doBackground("reverse", "test payload normal");
   echo "Added job $jobHandle \n";
}

for ($i = 0; $i < 10; $i++){    
   $jobHandle = $client->doLowBackground("reverse", "test payload low");
   echo "Added job $jobHandle \n";
}

And the code for the gearman worker:

<?php

$worker= new GearmanWorker();
$worker->addServer("localhost", 4730);
$worker->addFunction("reverse", "reverse_fn");

while($worker->work()){
    if ($worker->returnCode() != GEARMAN_SUCCESS) {
        break;
    }
}

//the classic example of the string reverse
function reverse_fn($job){
	echo "Received job: " . $job->handle() . "\n";

	$workload = $job->workload();
	$workload_size = $job->workloadSize();

	echo "Workload: $workload ($workload_size)\n";

	// This status loop is not needed, just showing how it works
	for ($x= 0; $x < $workload_size; $x++){
		echo "Sending status: " . ($x + 1) . "/$workload_size complete\n";
		$job->sendStatus($x, $workload_size);
		sleep(1);
	}

	$result= strrev($workload);
	echo "Result: $result\n";

	return $result;
}
  1. The build process failed when trying to build 1.1.15 from the repo, but managed to build 1.1.16 just fine and the problem existed back then as well.

Confirming here that prioritystatus does not seem to work. Taking a look now to see what's wrong.

Thanks so much for the bug report @octavn . This has basically always been broken. It appeared to work because it would at least print 1 for "there are things in this priority" and 0 for "there are no things in this priority". We'll get that PR merged and then cut a release for a 1.1.20.

Thank you for quickly fixing this issue.

@SpamapS wrote:

It appeared to work because it would at least print 1 for "there are things in this priority" and 0 for "there are no things in this priority".

I think the output @octavn posted above had a "2" in the low priority column?

Ok so I looked deeper and found what that ->next attribute really is. I can't tell you all how much I really don't want to read C/C++ macros as much as we do here, and I hope one day I'll convince you all to move over to rustygear.

This line is where that field is maintained:

GEARMAND_HASH__ADD(server->job, key, server_job);

That is the hash that contains all of the server's jobs by job ID, so that the STATUS commands work. This is maintained by this macro:

https://github.com/gearman/gearmand/blob/master/libgearman-server/common.h#L198

These fields really have no business being accessible to other things, but hey, we're half C++ half C so this is what we get.

The function_next field, however, is maintained directly here:

https://github.com/gearman/gearmand/blob/master/libgearman-server/job.cc#L431

And just like the fixed prioritystatus does now, it is used to take jobs off the queue here:

https://github.com/gearman/gearmand/blob/master/libgearman-server/gearmand_con.cc#L399