pytorch-ignite/code-generator

Remove local_rank and `idist.barrier()` from data.py if no distributed configuration selected

vfdev-5 opened this issue · 4 comments

Clear and concise description of the problem

Got feedback that this code

    local_rank = idist.get_local_rank()

    ...

    if local_rank > 0:
        # Ensure that only rank 0 downloads the dataset
        idist.barrier()

    ...

    if local_rank == 0:
        # Rank 0 has downloaded the dataset; let the other ranks proceed
        idist.barrier()

looks a bit strange if no distributed configuration is selected

Let's put template conditions here as well, so that the rank checks and barriers are only generated when a distributed configuration is selected.
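
For illustration, here is a minimal sketch of the two shapes the generated data setup could take. This is not the template's actual code: the function names and the torchvision MNIST dataset below are stand-ins chosen for the example.

    # Sketch only: one shape for the non-distributed case, one for the distributed case.
    import ignite.distributed as idist
    from torchvision import datasets, transforms


    def setup_data_simple(root: str = "./data"):
        # Shape when no distributed configuration is selected:
        # a plain download, no rank checks, no barriers.
        return datasets.MNIST(root, train=True, download=True,
                              transform=transforms.ToTensor())


    def setup_data_distributed(root: str = "./data"):
        # Shape kept for distributed configurations: only local rank 0
        # downloads, the other ranks wait at the barrier until the data
        # is on disk.
        local_rank = idist.get_local_rank()

        if local_rank > 0:
            # Non-zero ranks wait here while rank 0 downloads the dataset
            idist.barrier()

        dataset = datasets.MNIST(root, train=True, download=True,
                                 transform=transforms.ToTensor())

        if local_rank == 0:
            # Rank 0 is done downloading; release the waiting ranks
            idist.barrier()

        return dataset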

I was thinking about this issue. Initially, my thought was that distributed-ready code would be better for the user than sequential code, whether or not the user asked for it (sequential code = parallel code with 1 process). However, I understand perfectly well that code-generator users inspecting the generated code would be surprised to see distributed tricks.

In the end, I think the user would appreciate sequential-looking code, even if it is actually distributed. Sometimes, even often, the distributed logic is visible outside our handlers and tools, through if clauses on the rank. The more I think about it, the more I believe we should propose a collective API for the handlers. It would allow generating better code, with minimal distribution-specific details. A good example in that spirit is auto_* from idist.
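
As a rough illustration of that idea (a sketch with a toy dataset and model, not code from the generator), the auto_* helpers and idist.Parallel already keep most of the rank handling out of user code:

    import torch
    import ignite.distributed as idist
    from torch import nn, optim
    from torch.utils.data import TensorDataset


    def training(local_rank, config):
        dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))

        # auto_dataloader adds a DistributedSampler and adjusts batch size /
        # num_workers per process when a distributed config is active,
        # and behaves like a regular DataLoader setup otherwise.
        loader = idist.auto_dataloader(dataset, batch_size=32, shuffle=True)

        # auto_model / auto_optim wrap the model and optimizer (DDP, etc.)
        # only when needed; no explicit rank checks in user code.
        model = idist.auto_model(nn.Linear(10, 2))
        optimizer = idist.auto_optim(optim.SGD(model.parameters(), lr=0.1))
        # ... trainer / evaluator setup as usual ...


    if __name__ == "__main__":
        # backend=None runs the exact same code sequentially (single process)
        with idist.Parallel(backend=None) as parallel:
            parallel.run(training, {})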

Just a little thought 😊

Hey, is the issue still open? I would love to contribute. Also, I was looking for the data.py file in the codebase but could not find it.
Thank you!!

@sayantan1410 thanks for your interest in helping with this issue.

> Also I was looking for the data.py file in the codebase but could not find it.

Each template has its own data.py file; for example, for vision classification: https://github.com/pytorch-ignite/code-generator/blob/main/src/templates/template-vision-classification/data.py

Okay, thank you!!