ryantate/typingpool

HITs getting "hung up" because of an expired approval?

Closed this issue · 6 comments

I skipped an approval, went to bed, and six hours later, I assume it auto-paid.

After that timeout, tp-review did not present it to me again, but it also did not present any new approvals:

Figuring out what needs to be assigned
41 assignments total
10 assignments completed
31 assignments outstanding
Nothing to assign

I've continued to receive no assignments, and there has been no change in tp-review reporting, but looking at Amazon's HIT review interface, every assignment now has a "Reviewed Assignment" which I did not manually approve:

[Screenshot taken 2013-12-03 at 12:12:09 pm]

tp-finish did not retrieve any of the auto-approved entries, and there does not appear to be a way to download those results, which are no longer present on the MTurk site. (Since they were individual HITs and not a batch, it also did not appear that I could have downloaded the results other than one at a time.)

Is this enough information or would you like a more detailed test case?

Oh, hm, just saw your comment on the other one. If I had run tp-collect, would that have allowed the rest of the items to be tp-reviewed?

Running tp-collect automatically adds them to your transcript, since once they are auto-approved it is not possible to reject them.

This is by design, but obviously the design did not serve you well in this case -- if you ran tp-finish PROJECTNAME those HITs are gone. Oy. Ack.

I am under the impression that my experience was:

  • Skip response A
  • Other responses B, C, and D arrive
  • Amazon auto-approves A
  • I tp-review but don't see B, C, and D because I haven't tp-collected A
  • Amazon auto-approves B, C, and D

Is that the case? Or was it just a weird coincidence of timing that I never saw B, C, and D?

You are right up until bullet point 4. tp-review will show you any incoming submissions ("assignments" in mturk terms) that have not expired (and thus been auto-approved by Amazon) and that you have not already manually approved or rejected. It doesn't care about, and should not be affected by, whatever you have skipped. If it didn't show you something, it should be because that submission was already auto-approved when the auto-approval deadline ran out, which would fit with you going to bed if the HITs came in during the night.
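To make the deadline logic concrete, here is a minimal sketch in Python. It is not typingpool code: the function names are invented for illustration, and the only assumption is that each HIT carries an auto-approval delay in seconds, counted from the moment the worker submits.

```python
from datetime import datetime, timedelta


def auto_approval_time(submitted_at, delay_seconds):
    """Moment at which Amazon would approve the submission on its own.

    Hypothetical helper: assumes the delay is counted from submission time.
    """
    return submitted_at + timedelta(seconds=delay_seconds)


def still_reviewable(submitted_at, delay_seconds, now):
    """True while the submission can still be approved or rejected by hand."""
    return now < auto_approval_time(submitted_at, delay_seconds)
```

With a made-up six-hour delay, something submitted at 11 pm stops being reviewable at 5 am, so a review session the next morning would never see it.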

The fundamental difference between tp-review and tp-collect is that tp-review is for submissions that require your action, while tp-collect is for items that do not require your action (or that have been approved via other means, e.g. manually through the Amazon web interface). Both add items to your transcript, but different types of items.
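Roughly, the split looks like the sketch below. This is an illustration in Python, not the actual typingpool implementation: the `Assignment` class and both function names are hypothetical, and the only thing borrowed from MTurk is the set of assignment status values (Submitted, Approved, Rejected).

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    assignment_id: str
    status: str  # one of "Submitted", "Approved", "Rejected"


def needs_review(assignments):
    """What a tp-review-style step would show: work awaiting a decision."""
    return [a for a in assignments if a.status == "Submitted"]


def collectable(assignments):
    """What a tp-collect-style step would pick up: work already approved,
    whether by hand, through the web interface, or by auto-approval."""
    return [a for a in assignments if a.status == "Approved"]
```

A skipped submission stays in the first group until its deadline passes, then moves to the second group without any action from you.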

I freely admit this distinction can seem a bit arbitrary and that tp-review should just collect all types of submissions. And tp-finish should probably throw a warning before deleting data you have not put into your transcript. I feel terrible that you lost data you paid for.

Okay! This all seems to work as it's supposed to, and I was just wrong the first time through. No worries about data loss, looks like it was user error.

I edited tp-assign to pause for a minute between HIT assignments, and submitted two projects with a short deadline and short approval time, to help get at least one file submitted fast, which I could then skip to test my case up there.

After I skipped the first response, and Amazon auto-approved it, subsequent submissions that hadn't timed out were retrieved by tp-review, and tp-collect did indeed retrieve the skipped one.

I had completely missed tp-collect being a thing. Perhaps add a line to the example workflow showing it? Or, if tp-review returns no data, suggest the user try tp-collect for auto-approved items? And that tp-finish warning sounds nice.
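For what it's worth, the tp-finish warning could be as simple as the following sketch. It is written in Python purely for illustration, and everything in it is hypothetical: the function names, the idea of comparing lists of assignment IDs, and the wording of the message are my assumptions, not anything typingpool does today.

```python
def uncollected(approved_ids, transcript_ids):
    """Approved assignments whose text is not yet in the local transcript."""
    seen = set(transcript_ids)
    return [i for i in approved_ids if i not in seen]


def finish_warning(approved_ids, transcript_ids):
    """Return a warning string if finishing would discard paid-for work,
    or None when everything approved is already in the transcript."""
    missing = uncollected(approved_ids, transcript_ids)
    if not missing:
        return None
    return (
        f"{len(missing)} approved assignment(s) are not in your transcript; "
        "run tp-collect before tp-finish."
    )
```

The same check could drive a hint from tp-review when it has nothing to show but approved work is waiting.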

tp-review now collects auto-approved HITs, along with anything else tp-collect would collect. It even passes the exact same tests :-)

Will close this issue with the new commit shortly.