thebjorn/pydeps

Support pyproject.toml for options

Closed this issue · 10 comments

The pyproject.toml file is slowly becoming the standard file for a module's metadata and for configuring the tools it uses. It is encouraged in modern PEPs, and many popular tools such as black and isort already support it.

Hi Nils, and thank you for your interest in pydeps.

What problem would pyproject.toml solve for Pydeps? black and isort are not good examples for me (e.g. black is one person's code style preferences heaved onto the entire Python community without any discussion nor configurability).

I'm not saying I would reject a PR, but I'd like to understand why first...

Those were just examples of tools which already support reading configuration from this file. It would not solve a problem per se, but it would allow pydeps to be configured in a place which slowly seems to become the standard in the Python ecosystem. The pyproject.toml file was introduced in PEP 518, which reserves the tool namespace for the settings of tools. This way the root folder of a project does not have to be filled with small settings files which only contain a few lines for each tool (like linters etc.).

[...] allow pydeps to be configured in a place which slowly seems to become the standard in the Python ecosystem [...]

After searching, I see many examples of people wanting this to be true, but of the hundreds of available tools, few have actually done this. I'm guessing some of the reasons include not having a parser in the Python stdlib, the TOML spec not being at v1.0 yet, and the Python parsers that exist only supporting v0.5.0 of the TOML spec.

TOML is an ugly file format, so I have a feeling that pyproject.toml is not going to survive as a repository for the universe of tools. The fact that Guido doesn't like it is also not helpful (https://mail.python.org/pipermail/python-ideas/2018-October/054126.html).

I will consider a PR when there's an obvious and stable choice of TOML parsing library that supports v1+ of the spec.

I understand that decision. However, the v0.5 spec can be considered stable and backwards compatible, and release candidates for v1.0 are currently being published.

I also hope that we will get a TOML parser in the stdlib after the spec is finalized; there have already been some discussions on the mailing lists about this. I think pip currently vendors promo.

I will consider a PR when there's an obvious and stable choice of TOML parsing library that supports v1+ of the spec.

There is now tomllib in the standard library

You can check out the dev/pyproject-toml branch for my attempt.

My thoughts:

TOML is just an ugly syntax and I don't like it.

There are too many ways to write the same thing:

license = {text = "BSD"}

vs

license.text = "BSD"

Multi-line strings are not handled smoothly

description = """pydeps is a python module dependency analysis tool.
    It parses the import statements in python source files and
    generates a dependency graph from them."""

gives you lots of unnatural white-space:

$> tomlq .project.description testoml.toml
"pydeps is a python module dependency analysis tool.\n        It parses the import statements in python source files and\n        generates a dependency graph from them."

It is too verbose, consider this toml (1112 characters):

[project]
    name = "pydeps"
    version = "1.11.0"
    authors = [{name = "bjorn", email = "bp@datakortet.no"}]
    license.text = "BSD"
    description = """pydeps is a python module dependency analysis tool.
        It parses the import statements in python source files and
        generates a dependency graph from them."""
    keywords = ["Python", "Module", "Dependency", "graphs"]
    readme = "README.rst"
    classifiers = [
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "Natural Language :: English",
        "License :: OSI Approved :: BSD License",
        "Operating System :: OS Independent",
        "Programming Language :: Python",
        "Programming Language :: Python :: 2",
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ]
    urls = {Homepage = "https://github.com/thebjorn/pydeps"}
    dependencies = [
        'enum34; python_version < "3.4"',
        "stdlib_list",
    ]

vs. the same structure in yaml (977 characters):

---
project:
  name: pydeps
  version: 1.11.0
  authors:
    - name: bjorn
      email: bp@datakortet.no
  license: 
    text: BSD
  description: >
    pydeps is a python module dependency analysis tool.
    It parses the import statements in python source files and
    generates a dependency graph from them.
  keywords: [Python, Module, Dependency, graphs]
  readme: README.rst
  classifiers: |
    Development Status :: 5 - Production/Stable
    Intended Audience :: Developers
    Natural Language :: English
    License :: OSI Approved :: BSD License
    Operating System :: OS Independent
    Programming Language :: Python
    Programming Language :: Python :: 2
    Programming Language :: Python :: 2.7
    Programming Language :: Python :: 3
    Topic :: Software Development :: Libraries :: Python Modules
  urls:
    Homepage: http://github.com/thebjorn/pydeps
  dependecies: |
    enum34        ; python_version<"3.4"
    stdlib-list

What are the extra 135 characters buying you..? I find the yaml version both easier to read and to write.

sub-mappings are repetitive:

[tool.dkbuild]
    [tool.dkbuild.package]
    name = "pydeps"
    description = "Display module dependencies"
    version = "1.11.0"
    created = "2014"

    [tool.dkbuilds.build]
    venv = "pydeps"

    [tool.dkbuild.requirements]
    update = "manual"

vs.

tool:
  dkbuild:
    package:
      name: pydeps
      description: Display module dependencies
      version: 1.11.0
      created: 2014

    build:
      venv: pydeps

    requirements:
      update: manual

If I were a tool-creator, why would I want my users to do this..? (I see now that created has the wrong type in the toml-document, which it can't in the yaml version since it would cause a schema validation error).

Also, typos in sub-mapping declarations (did you catch the one above - I didn't until I did tomq . testtoml.yml) lead to data structured vastly differently than the visual representation.

It doesn't help that there is, in general, no way to use a pyproject.toml file to get the version number of a package without installing it (i.e. there is no python setup.py --version analogue).

There is also no way of finding the source directory from the pyproject.toml file (sure, if you knew all build-systems in existence you would know to look in tomlq .tools.setuptools.packages pyproject.toml but that seems very fragile...)

I'm really hoping there will be a pyproject.yml format coming soon ;-) -- or really a .pyproject package-root folder...

I respond to the different points you make about TOML (vs YAML) below the ruler, however, I think it's missing the point a bit.

Supporting pyproject.toml should be very easy, as all the config currently in .pydeps is in INI format anyhow. Parsing pyproject.toml and looking for a [tool.pydeps] section can be done trivially. The resulting data would not need to be transformed but can simply be used the same way as the output from the INI configparser.
This would allow people who prefer pyproject.toml to use it if they wish. While this might not be everybody, it is certainly a large share, and it seems to be where the Python ecosystem is headed.


TLDR:
TOML and YAML shine in different scenarios. pyproject.toml contains mostly very flat data and would not gain a lot from the tree structure which YAML allows and instead suffer from the drawbacks.

You can check out the dev/pyproject-toml branch for my attempt.

That seems to be about changing this repo to use pyproject.toml, not about supporting a [tool.pydeps] section inside it for pydeps configuration.

My thoughts:

TOML is just an ugly syntax and I don't like it.

That is very subjective. To me it is primarily an extension to INI to allow nesting and better defined types.

There are too many ways to write the same thing:

license = {text = "BSD"}

vs

license.text = "BSD"

True, but not necessarily a bad thing. The latter is mostly a shorthand for the former when one wants to set only a single value. Compared to YAML, as you do below, this is worlds simpler.

Multi-line strings are not handled smoothly

description = """pydeps is a python module dependency analysis tool.
    It parses the import statements in python source files and
    generates a dependency graph from them."""

gives you lots of unnatural white-space:

$> tomlq .project.description testoml.toml
"pydeps is a python module dependency analysis tool.\n        It parses the import statements in python source files and\n        generates a dependency graph from them."

This is actually equivalent to how it is handled in Python. It's debatable whether that should be stripped or not, but at least it's consistent with Python (and there is textwrap.dedent should that be bothering someone).
It's furthermore very easy to learn, and one has to be explicit about what to strip/include, unlike the 6 ways to use multi-line strings in YAML (>, |, >+, |+, >-, |-).

It is too verbose, consider this toml (1112 characters): [...]

vs. the same structure in yaml (977 characters): [...]

What are the extra 135 characters buying you..? I find the yaml version both easier to read and to write.

In your case about half of that is due to indentation (4 spaces for TOML vs 2 for YAML over ~30 lines). For the rest you mostly gain more explicit types, which won't do things you do not expect. E.g. classifiers and dependencies are proper lists instead of multi-line strings, and you do not risk having your version number (e.g. 2.10) converted to a float and back to a string when you use it, resulting in 2.1.

sub-mappings are repetitive: [...]

If I were a tool-creator, why would I want my users to do this..?

For me TOML and YAML are geared towards different stuff. I would never use TOML for deeply nested configurations (e.g. GitHub Actions pipelines) and I would never use YAML for a mostly flat configuration file (such as .pydeps).

(I see now that created has the wrong type in the toml-document, which it can't in the yaml version since it would cause a schema validation error).

There are already tools which can check TOML schema (and are compatible with e.g. existing JSON schema definitions).

Also, typos in sub-mapping declarations (did you catch the one above - I didn't until I did tomq . testtoml.yml) lead to data structured vastly differently than the visual representation.

Yes, that's why I would not suggest TOML for highly nested stuff.

It doesn't help that there is, in general, no way to use a pyproject.toml file to get the version number of a package without installing it (i.e. there is no python setup.py --version analogue).

That is not a problem of/discussion about pyproject.toml/TOML but rather a unique feature of old setuptools setups. setup.py files are actually already discouraged and one should either use setup.cfg or directly switch to pyproject.toml. I am not quite sure in which case this would be relevant: When you inspect an already downloaded package, look into some repo...? During runtime you can easily get the version regardless.

There is also no way of finding the source directory from the pyproject.toml file (sure, if you knew all build-systems in existence you would know to look in tomlq .tools.setuptools.packages pyproject.toml but that seems very fragile...)

That again is very setuptools/setup.py specific which is by no means the only player on the block anymore.

I'm really hoping there will be a pyproject.yml format coming soon ;-) -- or really a .pyproject package-root folder...

I see little advantage in YAML over TOML for what is usually inside pyproject.toml. The only one I can think of is if one also wanted to move the pre-commit hook config into it.

TOML is just an ugly syntax and I don't like it.

That is very subjective.

Very true :-)

I might have misunderstood this issue though. If it's only about the ability to look for .pydeps configuration data in pyproject.toml that is certainly doable.

Check out v1.12.5 now available on PyPI.

I might have misunderstood this issue though. If it's only about the ability to look for .pydeps configuration data in pyproject.toml that is certainly doable.

Yes, my request was only for reading the config from other places :D As I am not actively contributing, I couldn't care less where you put the build/package information, so feel free to stick with what you prefer ;)