Performance issue in tasks.py

Question

DLPerf opened this issue 3 years ago · 4 comments

Describe the bug
I've found a performance issue in "tasks.py": dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) should be called before dataset = dataset.map(_get_output). Batching first lets the mapped function run once per batch instead of once per element, which should make the program more efficient.
The TensorFlow tf.data performance guide supports this ordering; it describes it as vectorizing the mapping.

To Reproduce
Steps to reproduce the behavior:

1. Go to "tasks.py"
2. Scroll down to line 104
3. See error

Expected behavior
Call dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) before dataset = dataset.map(_get_output).

Proposed solution
Swap the order of dataset = dataset.map(_get_output) and dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) in "tasks.py".
You also need to check whether the function _get_output passed to dataset.map() is affected by the swap, so that the changed code still works properly. For example, if _get_output took input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after it.
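To illustrate the swap, here is a minimal sketch, assuming TensorFlow 2.x with eager execution. The _get_output below is a hypothetical stand-in (the real function in "tasks.py" may differ), and the eval_batch_size value and toy data are made up. It is written to reduce over the last axis so that it accepts both a single example and a batch, and the check confirms both orderings produce the same batches.

```python
import tensorflow as tf

# Assumed stand-ins: the real params and _get_output live in tasks.py and may differ.
params = {"eval_batch_size": 4}


def _get_output(tokens):
    """Hypothetical mapped function that works on one example or on a batch."""
    tokens = tf.cast(tokens, tf.int32)
    # Reducing over the last axis handles both shape (n,) and shape (batch, n).
    nonzero_count = tf.reduce_sum(
        tf.cast(tf.not_equal(tokens, 0), tf.int32), axis=-1
    )
    return tokens, nonzero_count


def build(batch_first):
    """Build the toy pipeline with either ordering of map and batch."""
    # Ten toy examples of eight token ids each.
    data = tf.reshape(tf.range(80, dtype=tf.int64), (10, 8))
    dataset = tf.data.Dataset.from_tensor_slices(data)
    if batch_first:
        # Proposed order: _get_output is called once per batch.
        dataset = dataset.batch(params["eval_batch_size"], drop_remainder=True)
        dataset = dataset.map(_get_output)
    else:
        # Current order: _get_output is called once per example.
        dataset = dataset.map(_get_output)
        dataset = dataset.batch(params["eval_batch_size"], drop_remainder=True)
    return dataset


def pipelines_match():
    """Return True if both orderings yield identical batches."""
    before = list(build(batch_first=False).as_numpy_iterator())
    after = list(build(batch_first=True).as_numpy_iterator())
    return len(before) == len(after) and all(
        (b_tok == a_tok).all() and (b_cnt == a_cnt).all()
        for (b_tok, b_cnt), (a_tok, a_cnt) in zip(before, after)
    )


if __name__ == "__main__":
    print(pipelines_match())
```

If the real _get_output indexes or reshapes along a fixed axis, it would need the same kind of adjustment before the swap is safe.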

Looking forward to your reply. By the way, I'd be glad to open a PR to fix it if you are too busy.

Answer 1 · 2021-08-19T22:26:11.000Z

Thanks for letting us know! It would be awesome if you could submit a PR with plots showing the performance improvement.

Answer 2 · 2021-08-20T05:19:04.000Z

Thanks for your reply! Is there a benchmark that measures the performance of the function lambada_input? @StellaAthena

Answer 3 · 2021-08-20T05:23:51.000Z

> Thanks for your reply! Is there a benchmark that measures the performance of the function lambada_input? @StellaAthena

Maybe I’m misunderstanding, but I was expecting you to run the code both ways and use a timer to show how long it takes.
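One way to do that kind of comparison is sketched below, using only the Python standard library. The helper name time_iteration is hypothetical, and the sketch assumes each pipeline can be iterated directly (as a tf.data.Dataset can under eager execution). Plain ranges stand in for the two pipeline variants here so the snippet runs on its own.

```python
import time


def time_iteration(make_dataset, repeats=3):
    """Return the best wall-clock time in seconds to fully iterate make_dataset()."""
    best = float("inf")
    for _ in range(repeats):
        dataset = make_dataset()  # rebuild each time so every run starts fresh
        start = time.perf_counter()
        for _ in dataset:
            pass  # consume every element; only the iteration cost is measured
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-ins for the two variants; swap in functions that build the real pipelines.
    map_then_batch_time = time_iteration(lambda: range(100_000))
    batch_then_map_time = time_iteration(lambda: range(100_000))
    print(f"map then batch: {map_then_batch_time:.6f} s")
    print(f"batch then map: {batch_then_map_time:.6f} s")
```

Taking the best of several repeats reduces noise from one-off costs such as warm-up. The resulting numbers could then be plotted for the PR.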

Answer 4 · 2021-08-31T07:05:13.000Z

OK, I'll try my best to run the code and measure the time it takes.
Thank you, dear Stella~ @StellaAthena