Lightning-Universe/lightning-transformers

compact packaging - all inside namespace

Closed this issue ยท 11 comments

Borda commented

๐Ÿš€ Feature

refactor to make the package more compact, move configs and train inside

Motivation

be able to call it from everywhere as python -m lightning_transformers.train --some asrgs

Pitch

Easier to use from everywhere
reduce collision with configs (if any) from other packages

Alternatives

Additional context

I don't think we should do this:

  • train: the only reason we have train.py is for people who rely on having these scripts at the root level. If we move them inside, their purpose is defeated. We have the console entry points defined for those who would want to do python -m lightning_transformers.train ...
  • configs: I also like having these at the root. It also means that if users want to modify them, they don't have to make changes to the source code directory. We just need to rename it to avoid collisions

If anyone can explain to me the issue with the conf/ directory please do! If we have to rename, would config/ be ok?

Borda commented
  • train: the only reason we have train.py is for people who rely on having these scripts at the root level. If we move them inside, their purpose is defeated. We have the console entry points defined for those who would want to do python -m lightning_transformers.train ...

well you can still have just a wrapper in the root (almost no code duplication)

  • configs: I also like having these at the root. It also means that if users want to modify them, they don't have to make changes to the source code directory. We just need to rename it to avoid collisions

Well here you can use the configs in the package which would be kind of frozen, but still, you are free to use any config from your local/custom path

If anyone can explain to me the issue with the conf/ directory please do! If we have to rename, would config/ be ok?

In the case if you install the package the config folder is copied to the site-packages which can/would be overwritten by any other package as this would be a common name...

@Borda but couldn't you simply put them to the ignore part of find_packages in setup.py?

Borda commented

@Borda but couldn't you simply put them to the ignore part of find_packages in setup.py?

yes, you can also ignore them in the manifest, but that is not the point, you as a user want to have the configs, right? not need aways to clone the repo, just to install it with pip and use...

well you can still have just a wrapper in the root (almost no code duplication)

This is already what we have:

https://github.com/PyTorchLightning/lightning-transformers/blob/master/train.py

Well here you can use the configs in the package which would be kind of frozen, but still, you are free to use any config from your local/custom path

But then you would have to copy the train.py code and change the hydra path to the config, so we want people to use our config directory

If anyone can explain to me the issue with the conf/ directory please do! If we have to rename, would config/ be ok?

https://github.com/PyTorchLightning/internal-dev/issues/132

but couldn't you simply put them to the ignore part of find_packages in setup.py?

Yes for the tests, (see issue above) but not for the configs as we want to distribute them.

What if instead of yaml configs we ship dataclass configs as part of the package? Then the location doesn't matter so much since they can be imported.

Hi @carmocca @Borda @SeanNaren, Gianluca's here. I've a few contributions on PL and wanted to give my opinion on the issue. I think having everything inside the package could be really useful for end-user. Since users usually install packages from pypi, we cannot rely on the repository to distribute training scripts and configurations. I think having all inside a unique package, instead of having "lightinin-transformer-train" and so on, would be cleaner for the users.

What if instead of yaml configs we ship dataclass configs as part of the package? Then the location doesn't matter so much since they can be imported.

I don't think this is supported by Hydra. And even if it did, we want to facilitate users modifying/extending these configs and yaml files are simpler for this purpose

Since users usually install packages from pypi, we cannot rely on the repository to distribute training scripts and configurations.

Why not? This is exactly the point of the MANIFEST.in file

I think having all inside a unique package, instead of having "lightinin-transformer-train" and so on, would be cleaner for the users.

Are you saying you prefer?:

$ python -m lightning_transformers.cli.train ...

over

$ ligtning-transformers-train ...

This is totally fine, but note that these two options are not mutually exclusive.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.