titu1994/neural-image-assessment

Performance issues in utils/data_loader.py (by P3)

Opened this issue · 1 comment

Hello! I've found a performance issue in utils/data_loader.py: batch() should be called before map(), which could make your program more efficient, because the mapped function then runs once per batch instead of once per element. The TensorFlow documentation supports this.

The details are listed below:

  • data_loader.py: train_dataset.batch(batchsize) (line 103) should be called before train_dataset.map(parse_data, num_parallel_calls=2) (line 101).
  • data_loader.py: val_dataset.batch(batchsize) (line 140) should be called before val_dataset.map(parse_data_without_augmentation) (line 138).
  • data_loader.py: train_dataset.batch(batchsize) (line 194) should be called before train_dataset.map(parse_single_record, num_parallel_calls=4) (line 192).

In addition, you need to check whether the function passed to map() (e.g., parse_single_record in train_dataset.map(parse_single_record, num_parallel_calls=4)) is affected by the change, so that the modified code still works properly. For example, if parse_single_record expects input with shape (x, y, z) before the fix, it would receive input with shape (batch_size, x, y, z) after the fix and must be adapted to handle it.
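To illustrate the reordering and the shape change, here is a minimal sketch. It is not the code from data_loader.py: the tensors and the functions scale_single and scale_batched are hypothetical stand-ins, and it assumes the mapped function can be written to work on a whole batch at once.

```python
import tensorflow as tf

# Hypothetical stand-in data: 8 "images" of shape (4, 4, 3) with scalar scores.
images = tf.zeros([8, 4, 4, 3], dtype=tf.float32)
scores = tf.range(8, dtype=tf.float32)
batch_size = 4


def scale_single(image, score):
    # Called once per element: image has shape (4, 4, 3).
    return image / 255.0, score


def scale_batched(image_batch, score_batch):
    # Called once per batch: image_batch has shape (batch_size, 4, 4, 3).
    return image_batch / 255.0, score_batch


# Current order: map() runs per element, then the results are batched.
map_then_batch = (
    tf.data.Dataset.from_tensor_slices((images, scores))
    .map(scale_single, num_parallel_calls=2)
    .batch(batch_size)
)

# Suggested order: batch first, then map() runs once per batch.
batch_then_map = (
    tf.data.Dataset.from_tensor_slices((images, scores))
    .batch(batch_size)
    .map(scale_batched, num_parallel_calls=2)
)

# Both pipelines yield elements with the same structure and shapes.
print(map_then_batch.element_spec)
print(batch_then_map.element_spec)
```

Both pipelines produce batches of the same shape. The only difference is what the mapped function sees as input, which is why each function passed to map() has to be checked before swapping the calls.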

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.

Hello, I'm looking forward to your reply~