nicholas-leonard/dp

Multi-threaded loading of multi-label dataset

jrbtaylor opened this issue · 9 comments

A flat folder structure is not possible with a multi-label dataset (in this case, video). Is it possible to set up asynchronous loading (and label extraction from a .json file) in dp?

Perhaps a more specific question: which method should I override to load batches with a custom function, so I can otherwise use the dp framework for training?

Sports-1m? Maybe pre-process and save tensors to disk the hard way, and use dp just to load those tensors? What did you end up doing?

It was for the ActivityNet challenge. I ended up using optim instead of dp and then it was pretty simple with the threads package.

So while loading data asynchronously, you handled the workers yourself?

Yes, that is the only way I found to do it. The threads package is pretty easy to use once you grasp the basic concept. You pass each worker two functions: one it runs independently of the other workers, and one that is called by the main thread (not a worker) once the first completes. Data loading happens in the worker thread (the first function). GPU training has to occur in the second function: each worker has no idea what the others are doing, so they might write to overlapping GPU memory.
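The worker/callback split described above can be sketched with the torch threads package roughly like this (a minimal sketch, not the actual ActivityNet code; the batch shapes and `nbatches` are placeholders):

```lua
local threads = require 'threads'

local nthread = 4
local pool = threads.Threads(
   nthread,
   function(threadid)
      -- runs once in each worker at startup; require what the job needs
      require 'torch'
   end
)

local nbatches = 100 -- placeholder
for i = 1, nbatches do
   pool:addjob(
      -- first function: runs in a worker thread; do the slow I/O here
      -- (e.g. read video frames and parse labels from the .json file)
      function()
         local inputs = torch.randn(32, 3, 224, 224) -- stand-in for real loading
         local targets = torch.zeros(32)
         return inputs, targets
      end,
      -- second function: runs in the MAIN thread when the job finishes;
      -- this is the safe place for GPU work, since only one thread touches the GPU
      function(inputs, targets)
         -- e.g. copy inputs/targets to the GPU and run one training step
      end
   )
end

pool:synchronize() -- wait for all jobs
pool:terminate()
```

The return values of the worker function are handed to the main-thread callback, which is what keeps all GPU writes serialized in one thread.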

Thanks! I was considering ElementResearch's dataload. In particular, AsyncIterator provides overridable methods.
Perhaps close this issue with a link to your code, for future searchers?

AsyncIterator looks great. Torch was lacking a general solution like that (or maybe I missed that entirely back when I was working on it).

If you're curious, here's my code: https://github.com/jrbtaylor/ActivityNet

@jrbtaylor, that link is broken. Good example, thanks!

I fixed the link. Thanks.