dicarlolab/archconvnets

Unknown Issue with data providers

Closed this issue · 3 comments

@ardila @daseibert

In thinking about the dataprovider issue that we discussed yesterday, I think it's important to understand the primary reason why I re-implemented part of the code from the original dataprovider: that you have to provide a method that will read in the data, as well as simply putting the data in batches.

There is no spec for what the data looks like in the batch files. It is entirely up to the user to determine what that format is. Then, you provide a method for reading the data in. What IS specified is what the data looks like once it's been read in. If there is an error in the data provider as I've written it, perhaps it's there. But of course, we have a bunch of tests that seem to show it works exactly as it ought to in the test cases.

So, with regard to the strategy of just writing out batches and then pointing the "standard" dataprovider to that path: It might be possible to re-engineer the written-out format of the data as used by the existing data providers, so that it can be read in using the existing data provider. However, there is no spec'ed out description of what that format is.

I believe the intended use pattern is that the user writes a data provider for their own read-in/write-out format, re-implementing a small part of the data provider code, as we have done. If there's some error there, let's just figure out what it is.
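To make that pattern concrete, here is a minimal sketch, not the actual archconvnets or cuda-convnet code. The class and method names (`SketchDataProvider`, `read_batch_file`, `write_example_batch`) and the pickle-of-dict file layout are hypothetical. The in-memory layout it targets (float32, C-contiguous, one column per case, labels as a single row) is an assumption about what the network expects and should be checked against the real base class.

```python
import os
import pickle

import numpy as np


def write_example_batch(path, images, labels):
    """Hypothetical writer: the on-disk layout is entirely the user's choice."""
    with open(path, "wb") as f:
        pickle.dump({"images": images, "labels": labels}, f)


class SketchDataProvider:
    """Hypothetical provider: private on-disk format, fixed in-memory format."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def batch_path(self, batch_num):
        return os.path.join(self.data_dir, "data_batch_%d" % batch_num)

    def read_batch_file(self, path):
        """User-supplied reader; must match whatever the writer produced."""
        with open(path, "rb") as f:
            d = pickle.load(f)
        return d["images"], d["labels"]

    def get_batch(self, batch_num):
        """Convert the user's format into the assumed in-memory contract:
        data   -> float32, C-contiguous, shape (num_pixels, num_cases)
        labels -> float32, C-contiguous, shape (1, num_cases)
        """
        images, labels = self.read_batch_file(self.batch_path(batch_num))
        images = np.asarray(images)
        n = images.shape[0]
        # Flatten each image to a row, then transpose so cases are columns.
        data = np.require(images.reshape(n, -1).T,
                          dtype=np.float32, requirements="C")
        labels = np.require(np.asarray(labels).reshape(1, n),
                            dtype=np.float32, requirements="C")
        return data, labels
```

Only `read_batch_file` depends on the file format, so swapping formats means replacing that one method while `get_batch` keeps the in-memory contract in one place.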

@ardila if you provide a clear example of where the behavior is different from what you expect it to be, I will work on debugging it -- since I wrote the code to begin with.

I believe Darren has already re-engineered the format so I will have to ask
him. I still want a method in the dataset base class that writes out
batches in an efficient way (which is probably not preprocessing one batch
at a time if you want to take advantage of multiple CPUs well)
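A rough sketch of what such a batch-writing method could look like, written as free functions rather than against the real dataset base class. All names here (`preprocess`, `write_one_batch`, `write_batches`), the pickle file layout, and the `data_batch_N` naming are hypothetical. The idea is just to build all batch jobs up front and hand them to an executor instead of processing them one at a time.

```python
import os
import pickle
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def preprocess(image):
    """Placeholder per-image preprocessing: scale uint8 pixels to [0, 1]."""
    return image.astype(np.float32) / 255.0


def write_one_batch(job):
    """Preprocess and write a single batch; returns the path written."""
    out_path, images, labels = job
    batch = {
        "images": np.stack([preprocess(im) for im in images]),
        "labels": list(labels),
    }
    with open(out_path, "wb") as f:
        pickle.dump(batch, f, protocol=pickle.HIGHEST_PROTOCOL)
    return out_path


def write_batches(images, labels, batch_size, out_dir, executor=None):
    """Split the data into batches and write them concurrently.

    If no executor is given, a thread pool is used; pass a process pool
    for preprocessing that is CPU-bound in pure Python.
    """
    os.makedirs(out_dir, exist_ok=True)
    jobs = []
    for i, start in enumerate(range(0, len(images), batch_size)):
        stop = start + batch_size
        path = os.path.join(out_dir, "data_batch_%d" % (i + 1))
        jobs.append((path, images[start:stop], labels[start:stop]))
    if executor is not None:
        return list(executor.map(write_one_batch, jobs))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(write_one_batch, jobs))
```

To spread the work over several CPUs, a `concurrent.futures.ProcessPoolExecutor` can be passed as `executor`, created under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly.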

I will write up some code that will reproduce the two main errors we've
been talking about

  1. Dataset-based providers hang when you try to run a network and
    the first batch is not yet cached
  2. All other things being equal, networks train and test much faster on
    dataset-based providers, but test error never goes below what is
    expected by chance, even if you make the images as similar as possible to
    CIFAR images

(Reply sent by email to Dan Yamins's original post of Friday, November 8, 2013.)

Great -- I'll look at these examples when you have them -- I want to fix this right away since it seems like a significant stumbling block, and one where we really don't understand what the problem is. Let's chat about the exact structure/use-case when you're in the lab. I'll also interface with Darren to make sure I understand what he's done.

We have solved this issue with the current DLDataProvider.