karlicoss/HPI

Figure out the 'core' and the module system

Opened this issue · 6 comments

I can manage, say, 30 modules I'm using personally.

If there are 200 modules, I will be completely overwhelmed (e.g. see what happened to oh-my-zsh or spacemacs).

I guess I need to figure out the 'core' of the system and a good way to make 'plugins', so you can use third party modules without merging them in the main repository/maintaining a fork.

Python packages kind of work, but modules need to be more lightweight. Ideally you wouldn't need to turn a module into a proper python package, as long as you accept that you manage its dependencies yourself.

Hi,
I'm still analyzing the source code, and so far I've realized that, given the assumptions I made here, the HPI core could really be a library of helper functions (mainly configuration loading, plus the common and error modules) for the other modules/packages.

That said, if you really want to avoid pypi/packages (in the sense of setup.py-backed packages), it could be done like this (I never did it, but I think it could work):

  • plugins will be stored in a user directory (e.g. ~/.config/my/plugins)
  • each plugin will be a package (that is, they are in their own folder with an empty __init__.py file)
  • the plugin directory will be added to sys.path by HPI during initialization or explicitly using a helper function (just because I don't like relying on __init__.py files);
  • the user config (in ~/.config/my.yml or the like) lists the modules/plugin to be used, along with their settings;
  • If plugins have their own git repository, the repo address could be specified in the config, and a script (a module listed in the scripts parameter of setup.py to be easily called via CLI) could be used to fetch the missing packages into the plugin directory;
  • Obviously, plugins can also be placed manually by users if they don't want to publish them.

All of this doesn't solve the "module not found" issues in IDEs while importing the modules, because the plugin directory is still dynamically loaded.
Plugin dependencies also need to be resolved - maybe a manifest.json file in the plugin folder that describes the plugin and lists the dependencies (I'm borrowing the idea from Ulauncher extension architecture)? This sounds like a makeshift setup.py to me...

IMO the most pythonic way to handle this is by using namespace packages, so that all the modules/plugins created can be installed separately and seen as a single package.
Obviously this needs a setup.py file, but:

  • there's no need to publish them on pypi, as you can pip install from a git repo or from a local directory;
  • as usual, pip install -e allows you to edit the source on the go;
  • plugins will be available (in terms of IDE commodity features) to other plugins and to the final user without further work;
  • A script can be used to automate the calls to pip and install the plugins that the user specifies (via package name if published to PyPI, git repo url or local folder);
  • Everything could be installed in its own virtualenv for a bit of isolation.
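The pip-automation bullet could be as small as a thin wrapper around pip itself (a sketch; install_plugin is a hypothetical helper name, not an existing HPI function):

```python
import subprocess
import sys

def install_plugin(spec: str) -> None:
    """Install a plugin distribution with pip; spec can be a PyPI name,
    a git URL (git+https://...), or a local directory path."""
    subprocess.run([sys.executable, "-m", "pip", "install", spec], check=True)
```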

Let me know what you think!

Hi! Sorry for a late reply, was a bit busy over the weekend...

Thanks for your thorough comments and time spent analyzing the code. Really appreciate it, and your thoughts!

I decided to respond to your other comment here too, because the issues kind of overlap anyway.

my first choice for the config location would be a user directory like ~/.config/my

Thanks, actually a really good idea. I'll make it a default :)

I would avoid using py files for configuration (mainly for security issues and to be a bit more user "friendly"), I would rather read JSON/YAML/INI/TOML files with sane defaults fallback.

I actually published a whole post about it just yesterday, it's kind of relevant to this package too!

Overall:

  • I feel like ability to hack the code is going to be an essential part of the system anyway to make it flexible

  • I'm not sure how much of an issue is security

    • you edit your own config
    • you run untrusted third party Python modules anyway
  • in terms of plain config being friendlier than Python file

    Maybe, but I guess would be good to figure out the architecture of the system first to understand what should go in the config and what shouldn't.

    Although I feel that simple Python is as easily readable as, say, simple JSON

  • if you are using a Python config, you can automatically find 90% of errors with pylint/mypy. If you're using JSON, you need to write a separate validation/unit tests/etc.
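To illustrate that last point, a config written as plain Python with type annotations gets checked for free by mypy/pylint (a hypothetical schema, not HPI's actual config):

```python
from pathlib import Path
from typing import NamedTuple, Sequence

# hypothetical schema; field names are illustrative
class PdfsConfig(NamedTuple):
    search_paths: Sequence[Path]
    ignore_files: Sequence[str]

# the config file is plain Python, so static tools catch mistakes:
# e.g. passing a bare str where Sequence[Path] is expected would be flagged by mypy
pdfs = PdfsConfig(
    search_paths=[Path("~/pdfs").expanduser()],
    ignore_files=["*.draft.pdf"],
)
```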

I like the idea of making this, and every other repo in the ecosystem, a real python package that can act as a library
core could really be a library with helper functions (mainly configuration loading, common and error modules)

Yep, feels about right!

the dependencies of the various my modules can be made optional using extras_require as you did with testing
if you really want to avoid pypi/packages (in the sense of setup.py-backed packages)

Not that I have anything against it, but most modules are fairly simple and consist of a single file.
I want to make it as easy as possible to extend and modify the system.
So I'd prefer if one could simply add a python file as a plugin, and carry on.

Maybe some 'heavy' modules could indeed be properly packaged when matured (with setup.py, etc) -- have nothing against it!

plugins will be stored in a user folder directory

Sure, seems like a reasonable default!

For inspiration, might also be a good idea to look how tools like pytest or mypy load their plugins (which can be separate packages).
Wonder if they just go through all the installed packages and filter them.
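For reference, that "go through all installed packages and filter them" approach could look something like this with the stdlib (the my- prefix is a hypothetical naming convention):

```python
from importlib.metadata import distributions

def find_plugins(prefix: str) -> list:
    """Scan every installed distribution and keep those whose name matches a convention."""
    found = []
    for dist in distributions():
        name = dist.metadata["Name"] or ""
        if name.startswith(prefix):
            found.append(name)
    return sorted(found)
```

pytest itself actually uses entry points rather than name filtering, but the scanning approach is what tools like flake8-style checkers historically did.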

each plugin will be a package

Yeah, it's kind of the way it is now (just within the namespace package instead of separate packages).

Some of them are single-file modules, and I just didn't bother creating an unnecessary directory for them.
From Python's perspective, I don't think module.py vs module/__init__.py are any different?

the plugin directory will be added to sys.path by HPI during initialization

Yeah, sure, don't have any strong opinion about that -- some like it more explicit, some more implicit.

E.g. I personally use it with a helper script.

the user config (in ~/.config/my.yml or the like) lists the modules/plugin to be used, along with their settings

Yep, sure. That actually might work with namespace packages too.. E.g.:

~/.config/my/__init__.py
~/.config/my/plugins/food/...
~/.config/my/plugins/exercise/...
~/.config/my/plugins/books/...
~/.config/my/mycfg.py

Then mycfg.py contains:

from .plugins import food, exercise, books
ACTIVE_MODULES = [food, exercise, books]
# by default, it could just load everything from plugins??

Haven't tested, but seems like it could work?
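The "load everything from plugins by default" idea could be sketched with pkgutil (untested against HPI's actual layout; my.plugins is the hypothetical package from the example above):

```python
import importlib
import pkgutil

def load_all(plugins_pkg: str):
    """Import every submodule/subpackage found under plugins_pkg --
    the 'just load everything by default' behaviour."""
    pkg = importlib.import_module(plugins_pkg)
    return [importlib.import_module(f"{plugins_pkg}.{m.name}")
            for m in pkgutil.iter_modules(pkg.__path__)]
```

For instance, load_all("json") imports json.decoder, json.encoder, and the other stdlib submodules, which shows the mechanism works on any package.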

script could be used to fetch the missing packages in the plugin directory

Maybe, but that makes it quite a bit more complicated.
Actually, it almost feels like that kind of problem must occur in other projects as well, so maybe there's an existing solution to reuse.

As you mentioned, virtualenv or something similar could be used? Seems that it might work in conjunction with pip/setup.py.
Yep, agree that it doesn't even need to be exposed to the user, and a programmatic interface could be used.

Obviously, plugins can also be placed manually by users if they don't want to publish them

Yep!

All of this doesn't solve the "module not found" issues in IDEs while importing the modules, because the plugin directory is still dynamically loaded.
Plugins dependencies also need to be resolved - maybe a manifest.json file in the plugin folder that describes the plugin and lists the dependencies (I'm borrowing the idea from Ulauncher extension architecture)? This sounds like a makeshift setup.py to me...

IMO the most pythonic way to handle this is by using namespace packages

100% agree! That's actually what I've been doing so far with external dependencies.

Obviously this needs a setup.py file, but

Actually, not necessarily! I've been getting away without one by simply using symlinks! E.g. see the example in the repository

That makes it transparent and pylint/mypy/IDE friendly.
I'm not sure if Windows has symlinks? And maybe it's good to have an alternative means of installing modules too.
So agree with what you mention about virtualenv/pip.


Wow, that's quite long. Again, thanks for your elaborate comments :) Hopefully this helps with simplifying and documenting the system!

Hi,
let me add some more info/clarifications (to make this discussion even longer :P )

Let's say that I'm biased towards a separation of configuration from the code, mainly because I write tools for colleagues that don't know python but can handle (more or less) an ini or json file.

As I stated in the other issue, a library like file-config makes configuration loading easier, with type hinting and validation at runtime, and you have the entire configuration object definition already available for the logic to work.

As of now, to know the configuration options that can/should be present in mycfg, one needs to inspect every single file in search for mycfg imports and guess the structure from the usage.
In my projects, I usually have a single module that handles configuration loading, and all the options are there for the other modules to read.

This is an (untested) example of a refactored configuration for rss and pdfs

# my/config.py
from pathlib import Path
from typing import List
from file_config import config, var

@config(title="Feedbin", description="Feedbin export paths configuration")
class Feedbin:
    enabled = var(bool, default=False)
    export_dir = var(str)

@config(title="Feedly", description="Feedly export paths configuration")
class Feedly:
    enabled = var(bool, default=False)
    export_dir = var(str)

@config(title="RSS", description="RSS configuration")
class RSS:
    feedbin = var(Feedbin)
    feedly = var(Feedly)

@config(title="PDF", description="PDF annotations configuration")
class PDFs:
    search_paths = var(List[str])
    ignore_files = var(List[str])

@config(title="My Config", description="HPI modules configuration")
class MyConfig:
    pdfs = var(PDFs)
    rss = var(RSS)

def load_user_config() -> MyConfig:
    cfg_path = Path.home() / ".config" / "my" / "config.toml"
    if cfg_path.exists():
        with cfg_path.open() as tomlconfig:
            return MyConfig.load_toml(tomlconfig)
    raise FileNotFoundError(f"no user config found at {cfg_path}")
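For reference, a config.toml matching the schema above might look like this (all paths and values are illustrative):

```toml
# hypothetical ~/.config/my/config.toml
[pdfs]
search_paths = ["~/Documents/papers", "~/Downloads"]
ignore_files = ["*.draft.pdf"]

[rss.feedbin]
enabled = true
export_dir = "~/data/feedbin"

[rss.feedly]
enabled = false
export_dir = "~/data/feedly"
```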

To make it extendable, you can modify this pattern by making plugins declare their own configuration object, to be added to the main config via injection if it has to be shared with other modules, or just use their own config object, leveraging HPI only for a configuration loading function.

Again, from what I saw, it seems that the various modules can't be called proper plugins, since there's no logic that ties them together. So the only "logic" that I see is to have everything inside the my package.

For namespace packages I meant what's stated in the documentation I linked:

Namespace packages allow you to split the sub-packages and modules within a single package across multiple, separate distribution packages (referred to as distributions in this document to avoid ambiguity).

my-rss/
    setup.py
    my/
        rss/
            __init__.py
            config.py
            logic.py

my-fb/
    setup.py
    my/
        fb/
            __init__.py
            config.py
            logic.py

That way, users can create and publish their own subpackage that will be added to the my main package.
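A minimal setup.py for the my-rss distribution above might look roughly like this (a sketch, assuming a reasonably recent setuptools that provides find_namespace_packages; name and version are illustrative):

```python
# my-rss/setup.py
from setuptools import find_namespace_packages, setup

setup(
    name="my-rss",
    version="0.1.0",
    # picks up my/rss/* while leaving the top-level 'my' namespace
    # to be shared with other distributions like my-fb
    packages=find_namespace_packages(include=["my.*"]),
)
```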

In the case I am totally off, and indeed a plugin architecture is needed, here are some other thoughts:

Plugin discovery and loading is a topic already covered by multiple projects in multiple ways;
during my search for a good plugin management system for a program developed for my company, I've stumbled across stevedore, part of openstack, and fell in love because of the wonderful explanation made by the author.
It simply wraps what is already there, setuptools entrypoints.
Of course, it builds on the fact that the program and the plugins are proper python (distribution) packages.
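For illustration, loading plugins registered under an entry-point group needs only the stdlib (the group name my.plugins is hypothetical; stevedore adds error handling and manager abstractions on top of this same mechanism):

```python
from importlib.metadata import entry_points

def load_entry_point_plugins(group: str) -> dict:
    """Load every plugin registered under a setuptools entry-point group."""
    eps = entry_points()
    # entry_points() returned a dict-like mapping before Python 3.10; handle both shapes
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

A plugin distribution would declare itself with something like entry_points={"my.plugins": ["rss = my.rss:main"]} in its setup.py.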

Previously mentioned Ulauncher, instead, implements its own plugin discovery and management: it looks for folders with a manifest.json in its extension directory, validates the manifest, and launches the plugin's main.py as a separate process (not what you need) without adding them to sys.path.
To me it seems overkill to implement all this logic just to avoid using standard python packaging tools.

But then again, I'm biased because I need to deploy apps that don't need tinkering on the user side.

I guess the discussion is too complex to solve in short messages :D

Let's say that I'm biased towards a separation of configuration from the code, mainly because I write tools for colleagues that don't know python but can handle (more or less) an ini or json file.

Sure. But for now the whole system is so far from being accessible for an average user (i.e. you have to get oauth tokens, run cron jobs, and so on), that I'd rather not trade the flexibility at this stage. I think adding plain yaml configs would be easy to do later if it feels necessary.

As of now, to know the configuration options that can/should be present in mycfg, one needs to inspect every single file in search for mycfg imports and guess the structure from the usage.
In my projects, I usually have a single module that handles configuration loading, and all the options are there for the other modules to read.

Ah, yep, totally agree about that! I've just been postponing it so far until things relatively settle!
It would also help with writing the config, allowing it to be typechecked properly.

As I stated in the other issue, a library like file-config makes configuration loading easier, with type hinting and validation at runtime

Hmm, it looks nice indeed, thanks! At first glance it looks similar to mypy's Protocols, but having extra runtime checks is probably better for an average user, who wouldn't bother running mypy.

For namespace packages I meant what it's stated in the documentation I linked:

Yep! So far, I've applied your suggestion of namespace packages, and split out the config into a namespace package. So at the moment it's:

my/
    common.py
    error.py
    ...
    hypothesis.py
    pinboard.py
    youtube.py

(in the future common/error might be moved into my.core and the individual modules into separate 'plugins').

And the private configuration in ~/.config/my:

my/
    config/
        __init__.py

And thanks for your links regarding packaging! Bookmarked and will check them out.

Namespace packages turned out to be even more flexible than I expected... So I managed to simplify it quite a bit, hopefully can finally extract the interfaces as the next step! #41

I thought about the configs specifically a bit more, and ended up writing a whole document about it, figured it's worth a separate issue #46