apply_to_subj: findTask not able to obtain job task state
GoogleCodeExporter opened this issue · 1 comments
GoogleCodeExporter commented
Starting today, whenever I run something that uses apply_to_subj, the findTask
function fails to read the individual tasks' states. My jobs continue to run,
but the exp structure is not updated, because the error interrupts the call
before it returns.
I'm not 100% sure this is an eeg_ana problem, since I haven't updated anything
since yesterday and it was working. It could be a problem with the distcomp
toolbox (unlikely, since I did not have a problem yesterday), or an ACCRE
problem.
What steps will reproduce the problem?
Here's the code I was attempting to use, with the error message that I
received. This is calling apply_to_pat, but I get the same error message using
just apply_to_subj as well.
>> % time bin 2600
>> params = [];
>> params.overwrite = 1;
>> params.timebins = make_bins(10, -200, 2600);
>> params.save_as = 'vf_2600';
>>
>> exp.subj = apply_to_pat(exp.subj, 'volt_filtered', @bin_pattern, {params}, ...
dist, 'memory', mem, 'walltime', walltime);
job 7 submitted with 39 tasks.
??? Error using ==> distcomp.fileserializer.getFields at 83
State file does not contain string data
Error in ==> distcomp.abstractjob.findTask>iGetTasksByState at 76
states = job.Serializer.getFields(tasks, {'state'});
Error in ==> distcomp.abstractjob.findTask at 45
[p{i}, r{i}, f{i}] = iGetTasksByState(jobs(i));
Error in ==> apply_to_subj at 123
[p, r, f] = findTask(c{3}(j));
Error in ==> apply_to_subj_obj at 68
temp_subj = apply_to_subj(temp_subj, @apply_to_obj, ...
Error in ==> apply_to_pat at 72
subj = apply_to_subj_obj(subj, {'pat', pat_names{i}}, fcn_handle, ...
After the error, I loaded up the job scheduler and was able to find the task
states manually, but the internal function does not seem to be able to for some
reason, which causes the error and the resulting failure to update exp.
Original issue reported on code.google.com by joshua.d...@vanderbilt.edu
on 19 Aug 2011 at 10:22
GoogleCodeExporter commented
I've occasionally run into this problem too. It seems to be an intermittent
problem with ACCRE. I've committed a change to apply_to_subj that may help; it
wraps the state query in a try/catch block, so if the function has trouble
getting the state, it waits a bit and tries again. Ideally it would have a
timeout mechanism, but it doesn't yet, so the new version could hang if there
is a long-term problem with accessing the scheduler.
Original comment by morto...@gmail.com
on 19 Aug 2011 at 10:47
- Changed state: Fixed
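
The retry described in the comment above, plus the timeout it mentions as missing, could be sketched roughly as follows. This is only an illustration, not the change that was committed: query_with_retry, timeout, and wait_time are hypothetical names, and it assumes the failing call is the findTask line shown in the stack trace.

```matlab
function varargout = query_with_retry(query_fcn, timeout, wait_time)
%QUERY_WITH_RETRY   Call a function, retrying on error until a timeout.
%
%  Hypothetical helper (not part of eeg_ana).
%
%  query_fcn - handle to a function taking no inputs, e.g.
%              @() findTask(job)
%  timeout   - maximum total time to keep retrying, in seconds
%  wait_time - pause between attempts, in seconds

start = tic;
while true
  try
    % request as many outputs as the caller asked for
    [varargout{1:nargout}] = query_fcn();
    return
  catch err
    % give up once the time budget is spent, reporting the last error
    if toc(start) >= timeout
      rethrow(err);
    end
    % otherwise wait and try the query again
    pause(wait_time);
  end
end
```

Under those assumptions, the state query in apply_to_subj might be written as `[p, r, f] = query_with_retry(@() findTask(c{3}(j)), 600, 10);`, so a scheduler that stays unreachable surfaces the original error after about ten minutes instead of hanging indefinitely. The 600 and 10 second values are arbitrary placeholders.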