user_folder=True download_vott_json only pulls from one folder
abfleishman opened this issue · 15 comments
I am trying the workflow with user_folders=True for the first time, and it looks like after I initialized the project it is only pulling images from one of the folders when I run download_vott_json.py. Ping me when you are ready to address this and I can provide the info you need to recreate it.
Hey Abram I'm actually flying back to San Diego right now but one quick thing you could try is setting user_folder=True. The program checks for that exact spelling/capitalization, so that might have caused an issue.
This is likely because it is less certain about images in the random folder than it is about ones in infusions. The idea behind user_folders was that if it picked images from only one folder each time, it would be easier for the tagger to tag. If it keeps picking random, that's probably just because the "randomness" in random means that there are always photos it's uncertain about.
I have not trained a model yet though. I want to label images in both folders (maybe one at a time) but I have only just initialized the project and I pulled 5 batches of 50 photos and they all came from the random folder. Now that I have trained a version of the model I am still only pulling photos from the random folder, and I have set pick_max both true and false. Thoughts? Seems like a bug to me, at least if I understand the user_folders param.
Yeah that definitely seems like an issue. Again assuming pick_max was set to True (capital T), it should have picked photos from infusions. I'll look into it later today.
yes pick_max=True
Hey Abram,
I looked at the code and nothing sticks out as the reason. Could you send me the totag_{timestamp}.csv generated after the model was trained?
Yup! here it is. I just picked a version of the files since I have many. So I downloaded images and tagged them ~20 times yesterday and it did eventually pick a small batch from the infusions folder (just once) and the rest of the time it picked from random.
totag_1537917161.zip
Hey Abram,
The CSV file you sent me doesn't seem to have any predictions in it. In that case, the defaulting to random folder is, ironically, random. If possible, could you send a version of the file that has predictions on it (i.e. not all NULL)?
Thanks!
ooops
totag_1537923981118.zip
Hey Abram,
I ran the script on the local CSV files and the behaviour seems expected. If you do pick_max=True and run it a couple of times you should definitely get a few sets from the infusions folder.
We have started to get images from infusions, but it is less common, and the first ~20 times I did not. It would be nice to be able to force it to pick a specific folder, or to sample from multiple folders at once as an option.
That sounds like a good feature to have. It's a little complicated to implement - a good way to do it would be to (in download_vott_json.py) look at how many images from each folder are in the tagged.csv file, then bias towards picking images from the folder that currently has fewer images. I'll leave this one to @olgaliak since the code will have to be significantly changed.
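A minimal sketch of that counting idea, not the code in download_vott_json.py. It assumes tagged.csv has a column naming each image's folder; the column name "folder" and the function least_tagged_folder are hypothetical, made up for illustration.

```python
import csv
import io
from collections import Counter


def least_tagged_folder(tagged_csv_text, all_folders, folder_column="folder"):
    """Return the folder with the fewest rows in the tagged CSV text.

    folder_column is an assumed column name, not the real tagged.csv schema.
    """
    reader = csv.DictReader(io.StringIO(tagged_csv_text))
    counts = Counter(row[folder_column] for row in reader)
    # Folders missing from the tagged file count as zero.
    # Sorting first makes ties resolve alphabetically.
    return min(sorted(all_folders), key=lambda folder: counts[folder])


# Made-up sample data: two tagged images from random, one from infusions.
sample = "filename,folder\na.jpg,random\nb.jpg,random\nc.jpg,infusions\n"
print(least_tagged_folder(sample, ["random", "infusions"]))
```

With this sample the next batch would come from infusions, since it has the fewest tagged images so far.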
I was looking at the code in download_vott_json and decided for now to give each folder "equal chances": 634aed8
It will result in a bigger number of images pulled to the user's machine, but from all folders. Let's see how useful it will be.
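For readers following along, here is a rough sketch of what "equal chances" per folder could mean. It is not the code in 634aed8; the function sample_per_folder and the folder-to-filenames dict are hypothetical. It also shows why the total grows with the number of folders.

```python
import random


def sample_per_folder(images_by_folder, num_per_folder, seed=None):
    """Draw up to num_per_folder images from every folder."""
    rng = random.Random(seed)
    picked = []
    for folder in sorted(images_by_folder):
        files = images_by_folder[folder]
        # A folder with fewer images than requested contributes all of them.
        k = min(num_per_folder, len(files))
        picked.extend((folder, name) for name in rng.sample(files, k))
    return picked


# Made-up data: 30 images in random, 5 in infusions.
images = {
    "random": [f"random_{i}.jpg" for i in range(30)],
    "infusions": [f"infusions_{i}.jpg" for i in range(5)],
}
batch = sample_per_folder(images, 20, seed=0)
print(len(batch))  # 25: 20 from random plus all 5 from infusions
```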
@olgaliak I think the equal chances idea mentioned above makes sense, except I have a use case where it does not. I have just initialized a project with 70 user folders. Each folder has images that were tagged by a client as having presence/absence of a different type of animal. I asked for 20 images but it pulled 20 images from each folder!
It would be nice to have the config allow the user to pull images from a specific folder or list of folders, to avoid getting thousands of images at a time. My idea would be to allow the user to focus on specific classes (or sites, or subsets of photos that they have organized themselves) to start with.
Maybe this should be a new Issue?
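To make the request concrete, a sketch of the behaviour I have in mind, assuming a hypothetical allowed_folders setting (no such config option exists as far as this thread shows). The function name and the data layout are made up for illustration.

```python
import random


def sample_from_folders(images_by_folder, allowed_folders, total, seed=None):
    """Draw at most `total` images, only from the folders the user listed."""
    rng = random.Random(seed)
    pool = [
        (folder, name)
        for folder in allowed_folders
        for name in images_by_folder.get(folder, [])
    ]
    # The cap applies to the whole batch, not to each folder.
    return rng.sample(pool, min(total, len(pool)))


# Made-up data: 70 folders with 50 images each.
images = {
    f"site_{i:02d}": [f"site_{i:02d}_{j}.jpg" for j in range(50)]
    for i in range(70)
}
batch = sample_from_folders(images, ["site_03", "site_10"], total=20, seed=0)
print(len(batch))  # 20, regardless of how many folders exist
```

Here asking for 20 images returns 20 in total, all from the two listed folders, rather than 20 from each of the 70.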