compact packaging - all inside namespace
Closed this issue ยท 11 comments
๐ Feature
refactor to make the package more compact, move configs and train inside
Motivation
be able to call it from everywhere as python -m lightning_transformers.train --some asrgs
Pitch
Easier to use from everywhere
reduce collision with configs (if any) from other packages
Alternatives
Additional context
I don't think we should do this:
- train: the only reason we have
train.py
is for people who rely on having these scripts at the root level. If we move them inside, their purpose is defeated. We have the console entry points defined for those who would want to dopython -m lightning_transformers.train ...
- configs: I also like having these at the root. It also means that if users want to modify them, they don't have to make changes to the source code directory. We just need to rename it to avoid collisions
If anyone can explain to me the issue with the conf/
directory please do! If we have to rename, would config/
be ok?
- train: the only reason we have
train.py
is for people who rely on having these scripts at the root level. If we move them inside, their purpose is defeated. We have the console entry points defined for those who would want to dopython -m lightning_transformers.train ...
well you can still have just a wrapper in the root (almost no code duplication)
- configs: I also like having these at the root. It also means that if users want to modify them, they don't have to make changes to the source code directory. We just need to rename it to avoid collisions
Well here you can use the configs in the package which would be kind of frozen, but still, you are free to use any config from your local/custom path
If anyone can explain to me the issue with the
conf/
directory please do! If we have to rename, wouldconfig/
be ok?
In the case if you install the package the config
folder is copied to the site-packages
which can/would be overwritten by any other package as this would be a common name...
@Borda but couldn't you simply put them to the ignore part of find_packages in setup.py?
@Borda but couldn't you simply put them to the ignore part of find_packages in setup.py?
yes, you can also ignore them in the manifest, but that is not the point, you as a user want to have the configs, right? not need aways to clone the repo, just to install it with pip and use...
well you can still have just a wrapper in the root (almost no code duplication)
This is already what we have:
https://github.com/PyTorchLightning/lightning-transformers/blob/master/train.py
Well here you can use the configs in the package which would be kind of frozen, but still, you are free to use any config from your local/custom path
But then you would have to copy the train.py code and change the hydra path to the config, so we want people to use our config directory
If anyone can explain to me the issue with the
conf/
directory please do! If we have to rename, wouldconfig/
be ok?
https://github.com/PyTorchLightning/internal-dev/issues/132
but couldn't you simply put them to the ignore part of find_packages in setup.py?
Yes for the tests, (see issue above) but not for the configs as we want to distribute them.
What if instead of yaml configs we ship dataclass configs as part of the package? Then the location doesn't matter so much since they can be imported.
Hi @carmocca @Borda @SeanNaren, Gianluca's here. I've a few contributions on PL and wanted to give my opinion on the issue. I think having everything inside the package could be really useful for end-user. Since users usually install packages from pypi, we cannot rely on the repository to distribute training scripts and configurations. I think having all inside a unique package, instead of having "lightinin-transformer-train" and so on, would be cleaner for the users.
What if instead of yaml configs we ship dataclass configs as part of the package? Then the location doesn't matter so much since they can be imported.
I don't think this is supported by Hydra. And even if it did, we want to facilitate users modifying/extending these configs and yaml
files are simpler for this purpose
Since users usually install packages from pypi, we cannot rely on the repository to distribute training scripts and configurations.
Why not? This is exactly the point of the MANIFEST.in
file
I think having all inside a unique package, instead of having "lightinin-transformer-train" and so on, would be cleaner for the users.
Are you saying you prefer?:
$ python -m lightning_transformers.cli.train ...
over
$ ligtning-transformers-train ...
This is totally fine, but note that these two options are not mutually exclusive.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.