mortonne/aperture

apply_to_subj: findTask not able to obtain job task state

GoogleCodeExporter opened this issue · 1 comment

Starting today, whenever I run anything that uses apply_to_subj, findTask 
fails to read the states of the individual tasks.  My jobs continue to run, 
but because the error interrupts apply_to_subj, the exp structure is never 
updated.

I'm not 100% sure this is an eeg_ana problem, since I haven't updated anything 
since yesterday, when it was working.  It could be a problem with the distcomp 
toolbox (unlikely, for the same reason), or an ACCRE problem.


What steps will reproduce the problem?

Here's the code I was attempting to use, with the error message that I 
received.  This is calling apply_to_pat, but I get the same error message using 
just apply_to_subj as well.

>> % time bin 2600
>> params = [];
>> params.overwrite = 1;
>> params.timebins = make_bins(10, -200, 2600);
>> params.save_as = 'vf_2600';
>> 
>> exp.subj = apply_to_pat(exp.subj, 'volt_filtered', @bin_pattern, {params}, ...
                        dist, 'memory', mem, 'walltime', walltime);
job 7 submitted with 39 tasks.
??? Error using ==> distcomp.fileserializer.getFields at 83
State file does not contain string data

Error in ==> distcomp.abstractjob.findTask>iGetTasksByState at 76
states = job.Serializer.getFields(tasks, {'state'});

Error in ==> distcomp.abstractjob.findTask at 45
        [p{i}, r{i}, f{i}] = iGetTasksByState(jobs(i));

Error in ==> apply_to_subj at 123
      [p, r, f] = findTask(c{3}(j));

Error in ==> apply_to_subj_obj at 68
  temp_subj = apply_to_subj(temp_subj, @apply_to_obj, ...

Error in ==> apply_to_pat at 72
  subj = apply_to_subj_obj(subj, {'pat', pat_names{i}}, fcn_handle, ...




Afterwards I loaded the job scheduler manually and was able to query the task 
states, but for some reason the call inside apply_to_subj cannot, which causes 
the error and the resulting failure.

Original issue reported on code.google.com by joshua.d...@vanderbilt.edu on 19 Aug 2011 at 10:22

I've run into this problem occasionally too.  It seems to be an intermittent 
problem with ACCRE.  I've committed a change to apply_to_subj that may help; it 
wraps the state query in a try/catch block, so if the query fails, the function 
waits a bit and tries again later.  Ideally it would have a timeout mechanism, 
but it doesn't yet, so the new version could hang if there is a long-term 
problem with accessing the scheduler.
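
For reference, here is a minimal sketch of what a retry loop with a timeout 
could look like.  This is not the code that was committed; the helper name 
retry_until_timeout and its pause_sec and timeout_sec arguments are 
hypothetical and only illustrate the idea.

```matlab
function varargout = retry_until_timeout(query_fcn, pause_sec, timeout_sec)
%RETRY_UNTIL_TIMEOUT  Call query_fcn, retrying on error until a deadline.
%
%  Hypothetical helper (not part of apply_to_subj or eeg_ana).
%  query_fcn   - function handle taking no inputs, e.g. @() findTask(job)
%  pause_sec   - seconds to wait between attempts
%  timeout_sec - give up once another wait would pass this many seconds

start = tic;
varargout = cell(1, nargout);
while true
  try
    % Forward however many outputs the caller asked for.
    [varargout{1:nargout}] = query_fcn();
    return
  catch err
    % Stop retrying if the next wait would exceed the deadline.
    if toc(start) + pause_sec > timeout_sec
      error('retry:timeout', ...
            'Query still failing after %g s. Last error: %s', ...
            timeout_sec, err.message);
    end
    pause(pause_sec);
  end
end
end
```

Under these assumptions, the state query at line 123 of apply_to_subj could be 
written as [p, r, f] = retry_until_timeout(@() findTask(c{3}(j)), 10, 3600); 
so that a long-term scheduler problem produces an error instead of a hang.  The 
10 second pause and one hour limit are arbitrary placeholder values.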

Original comment by morto...@gmail.com on 19 Aug 2011 at 10:47

  • Changed state: Fixed