Performance issue in tasks.py
DLPerf opened this issue · 4 comments
Describe the bug
I've found a performance issue in "tasks.py": dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) should be called before dataset = dataset.map(_get_output). Batching first lets the mapped function run once per batch rather than once per element, which would make your program more efficient.
The TensorFlow documentation supports this change.
To Reproduce
Steps to reproduce the behavior:
- Go to "tasks.py"
- Scroll down to line 104
- See error
Expected behavior
Call dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) before dataset = dataset.map(_get_output).
Proposed solution
Swap the order of dataset = dataset.map(_get_output) and dataset = dataset.batch(params['eval_batch_size'], drop_remainder=True) in "tasks.py".
Besides, you need to check whether the function _get_output called in dataset.map() is affected by the swap, so that the changed code still works properly. For example, if _get_output takes input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after it.
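The shape change described above can be sketched as follows. This is only an illustration: get_output_per_example and get_output_batched are hypothetical stand-ins for _get_output (the real function in "tasks.py" may do something different), and the toy data replaces the real evaluation set. It shows that the mapped function has to index along an extra leading batch axis once batch() runs first, and that both orderings should yield the same batches.

```python
import tensorflow as tf


def get_output_per_example(tokens):
    # Hypothetical stand-in for _get_output; tokens has shape (seq_len,).
    return tokens[:-1], tokens[1:]


def get_output_batched(tokens):
    # Same logic after the swap; tokens has shape (batch_size, seq_len),
    # so the slicing moves to the last axis.
    return tokens[:, :-1], tokens[:, 1:]


def build_map_then_batch(data, batch_size):
    # Current order: map each element, then batch.
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.map(get_output_per_example)
    return ds.batch(batch_size, drop_remainder=True)


def build_batch_then_map(data, batch_size):
    # Proposed order: batch first, then map once per batch.
    ds = tf.data.Dataset.from_tensor_slices(data)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds.map(get_output_batched)


# Toy data: 10 sequences of 4 tokens each (an assumption for the demo).
data = tf.reshape(tf.range(40), (10, 4))
old = list(build_map_then_batch(data, 4).as_numpy_iterator())
new = list(build_batch_then_map(data, 4).as_numpy_iterator())
```

If the two lists hold identical arrays, the batched version of the function is a safe replacement for the per-example one.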
Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.
Thanks for letting us know! It would be awesome if you could submit a PR with plots showing the performance improvement.
Thanks for your reply! Is there any benchmark that measures the performance of the function lambada_input? @StellaAthena
Maybe I’m misunderstanding, but I was expecting you to run the code both ways and use a timer to show how long it takes.
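A timing comparison along these lines could look roughly like the sketch below. time_pipeline is a hypothetical helper, not something in the repository: it takes a zero-argument function that builds the pipeline, iterates it to the end, and reports the best wall-clock time over a few repeats. The range() call is only a placeholder for the real dataset.

```python
import time


def time_pipeline(make_iterable, repeats=3):
    """Return the best wall-clock time in seconds for one full pass."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Consume every element so the whole pipeline actually runs.
        for _item in make_iterable():
            pass
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder pipeline; swap in a function that builds the dataset.
elapsed = time_pipeline(lambda: range(100_000))
print(f"best of 3 runs: {elapsed:.6f} s")
```

Calling it once with a builder that maps before batching and once with a builder that batches before mapping would give the two numbers to compare or plot.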
OK, I'll try my best to run the code and measure the time it takes.
Thank you, Dear Stella~ @StellaAthena