Performance issues in /capslayer/data/datasets (by P3)

Question

Opened this issue 3 years ago · 1 comment

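Hello! I've found a performance issue in /capslayer/data/datasets: batch() should be called before map(), which could make your program more efficient. The TensorFlow documentation supports this: when the dataset is batched first, the mapped function runs once per batch instead of once per element, so the per-call overhead is amortized.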

Detailed description is listed below:

/cifar10/reader.py: dataset.batch(batch_size) should be called before dataset.map(parse_fun).
/fashion_mnist/reader.py: dataset.batch(batch_size) should be called before dataset.map(parse_fun).
/mnist/reader.py: dataset.batch(batch_size) should be called before dataset.map(parse_fun).
/cifar100/reader.py: dataset.batch(batch_size) should be called before dataset.map(parse_fun).

Besides, you need to check whether the function passed to map() (e.g., parse_fun in dataset.map(parse_fun)) is affected by the change, so that the modified code still works properly. For example, if parse_fun expects input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after it.
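To make the shape change concrete, here is a minimal sketch of both orderings. It does not reproduce the actual reader.py code: the parse functions, the flat 3072-value record layout, the 10-class labels, and the sample data are all assumptions made up for illustration. Only `tf.data.Dataset.from_tensor_slices`, `map`, and `batch` are real TensorFlow calls.

```python
import tensorflow as tf

# Hypothetical data: 8 flat records of 32*32*3 bytes each, with integer labels.
raw = tf.cast(tf.reshape(tf.range(8 * 3072) % 256, [8, 3072]), tf.uint8)
labels = tf.range(8, dtype=tf.int32)
batch_size = 4


def parse_fun(record, label):
    # Per-element version (assumed): receives one record of shape (3072,).
    image = tf.reshape(record, [32, 32, 3])
    image = tf.cast(image, tf.float32) / 255.0
    return image, tf.one_hot(label, depth=10)


def parse_fun_batched(records, batch_labels):
    # Batched version (assumed): receives records of shape (batch_size, 3072),
    # so the reshape needs a leading batch dimension.
    images = tf.reshape(records, [-1, 32, 32, 3])
    images = tf.cast(images, tf.float32) / 255.0
    return images, tf.one_hot(batch_labels, depth=10)


# Before the fix: map() runs once per element, then batch().
before = (
    tf.data.Dataset.from_tensor_slices((raw, labels))
    .map(parse_fun)
    .batch(batch_size)
)

# After the fix: batch() first, then map() runs once per batch.
after = (
    tf.data.Dataset.from_tensor_slices((raw, labels))
    .batch(batch_size)
    .map(parse_fun_batched)
)
```

Both pipelines should yield the same batches. The difference is that the second one calls the mapped function once per batch rather than once per element, which is why any shape-sensitive operation inside it (here, the reshape) has to be adapted.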

Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.

Answer 1 · 2021-11-04T09:31:19.000Z

Hello, I'm looking forward to your reply~