Digital-Sapphire/PyUpdater

Project Update v5 - Clean Slate

JMSwag opened this issue ยท 37 comments

Hey All,

v4 was a cluster of my own making. Essentially not giving the project enough of my time. My sinceriest apologies. There is some good news though. I've finally got over my fear of front end development and have been learning quiet a lot in that area and have some cool projects under my belt. I finally feel like a half way complete developer. ๐Ÿ˜…

I initially wrote this project while learning how to code. Shotly after going though learn python the hard way. Shout out to Zed Shaw.

Ok let's get down to business.

Lots of thinks I'd like to improve on.

PyUpdater v5 will be a clean break, saving the good stuff. Thinking in the since of a toolkit where parts are easily replaced.

Gaols

  • Remove all compatiabilty code
  • Remove as my 3rd party libraries as possible.
  • Only supported versions of python. We end when the PSF ends.
  • Support other hosting proviers. .i.e Github releases
  • Make a way to migrate from v3 PyUpdatr projects
  • Best effort on migrating from v4 PyUpdater projects, The config file could be broken beyound repair. Gather feedback during alpha/beta phases.

Questions / Comments / Concerns are very welccome.

@NicklasTegner @dennisvang

Fantastic.
I have the following questions.

  • Do you want to rewrite it from scratch, or split apart what we already have.
  • By supported python and PyInstaller versions, I'm assuming we are talking about Python 3.x and PyInstaller 3x and 4x
  • Where do you want to begin?

In addition, if you need help approving code, managing the documentation, feel free to tag me or any of the sorts.
Let's get started :)

@NicklasTegner
Glad you're on board.

  • Ideally keep the good parts of what we have and improve on what's feasible. I'm fine with everything else being thrown out.
  • I've simplified my initial comment. Essentially we support what's supported by the PSF
  • Not quiet sure just yet. I do have a milestone created. Let me go over existing issues & the codebase. After analysis I'll start creating a few tickets / assigning existing tickets to the v5 milestone.

Will do a the help is greatly appreciated!

@JMSwag nice. Looking forward to a roadmap / todo list of some kind.
May I suggest linking the milestone in the first post, and pinning this issue.

@JMSwag Looks like a good plan. :)

I particularly like the idea of minimizing third-party dependencies and simplifying where possible.

Some steps towards simplifying the code base:

  • drop support for Python 3.6, which has reached end-of-life
  • replace os.path by pathlib equivalents, where possible
  • only create new directories and files in one place, to prevent side effects such as caused by PackageHandler._setup_work_dirs()
  • issue #323

It might also be worth having a discussion about the data structure for the versions.gz file and config.pyu file, and the general workflow:

  • What needs to remain, and what can go? For example, are "platform" keys required in the versions.gz file, or would it make more sense only to create a separate versions file for each platform?
  • What would be the workflow for multi-platform development?
  • I also wonder if it is necessary to keep the channel variables/arguments/keys throughout the code. The "channel" information is already contained in the version string (or Version object), so this feels redundant. Getting rid of both "channel" and "platform" keys from the versions file, etc., would certainly simplify things.
  • I suppose the new release should at least be able to produce a versions.gz file (and keys.gz) that can be read by the Client from older releases. In my opinion, clients in the field should be able to update from 3.* to 4.* to 5.* without issue.

A clear definition of "release channels" would also help guide further development:

  • The concepts of "release cycle" and "release channel," as implemented in PyUpdater, should be clearly defined in the documentation.
  • Currently, it looks like release channels are treated as separate, isolated/parallel paths. Is that the intention? That would imply it is not possible to update from e.g. an alpha version to a beta version or stable version. The fact that there is only one patch counter, on the other hand, suggests the opposite.
  • The basic software release cycle (...->alpha->beta->release candidate->production (stable)->alpha->...) suggests a linear path where alpha/beta/rc are intermediate steps between the "stable" releases. If I'm on the alpha channel, I expect to get any available update, i.e. alpha, beta, rc, or stable, whichever is the latest. If I'm on the beta channel, I expect to get only beta, rc, and stable updates, and so on and so forth.
  • Instead of the current release channels "alpha", "beta", "stable", it would be more convenient to adhere to PEP440 terminology. Among other things, PEP440 distinguishes between "final release", i.e. our "stable", and "pre-release", i.e. "alpha", "beta", or "release candidate". I would propose to drop the "stable" terminology, and, instead, let channel=None (or channel="", if you prefer) correspond with a "final release". We should then support pre-release channels as "a", "b", "rc".

In the meantime I've started fixing some of the issues related to the 4.0 release. I think that's coming along reasonably well, but I've taken a lot of liberty there, as you can see e.g. in PR #316...

@JMSwag Could you help me understand the reasoning behind the current structure of the version manifest, as in versions.gz?

Here's an example what it looks like, currently:

{
  "latest": {
    "Acme": {
      "alpha": {"win": "1.0.2.0.0"},
      "stable": {"win": "1.0.1.2.0"}
    }
  },
  "updates": {
    "Acme": {
      "1.0.0.2.0": {
        "win": {
          "file_hash": "***",
          "file_size": 279320368,
          "filename": "Acme-win-1.0.0.zip"
        }
      },
      "1.1.0.2.0": {
        "win": {
          "file_hash": "***",
          "file_size": 275047978,
          "filename": "Acme-win-1.1.0.zip",
          "patch_hash": "***",
          "patch_name": "Acme-win-stable-0",
          "patch_size": 3063604
        }
      },
      "1.2.0.0.0": {
        "win": {
          "file_hash": "***",
          "file_size": 301965712,
          "filename": "Acme-win-1.2.0-alpha.zip"
        }
      }
    }
  },
  "signature": "***"
}

Now, I'm wondering:

  • Is there a use case where the manifest file would contain multiple different app-name keys (like "Acme")?
  • Is there a use case where the manifest file would contain multiple different platform keys (like "win")?
  • Is the "latest" object really necessary? The actual latest version must be determined dynamically anyway, based on channel, as in get_highest_version().

The reason I'm asking is this:

Following the discussion around issue #255, I'm thinking of an alternative implementation, along the lines of separate manifest files for each app and supported platform, with a simplified, shallower, structure.

For example something like this:

{
  "app_name": "Acme",
  "platform": "win",
  "signature": "***",
  "archives": [
    {
      "version": "1.0.0",
      "hash": "***",
      "size": 279320368,
      "filename": "Acme-win-1.0.0.zip"
    },
    {
      "version": "1.1.0",
      "hash": "***",
      "size": 275047978,
      "filename": "Acme-win-1.1.0.zip"
    },
    {
      "version": "1.2.0a0",
      "hash": "***",
      "size": 301965712,
      "filename": "Acme-win-1.2.0a0.zip"
    }
  ],
  "patches": [
    {
      "version": "1.1.0",
      "hash": "***",
      "size": 3063604,
      "filename": "Acme-win-0"
    },
    {
      "version": "1.2.0a0",
      "hash": "***",
      "size": 2487417,
      "filename": "Acme-win-1"
    }
  ]
}

This puts the version string inside the archive/patch objects, instead of using the version as a key. This may be slightly less convenient, but it ensures that we always know which version corresponds with the object. In addition, JSON objects are unordered, whereas arrays are ordered.

We could then do things like:

import json
from packaging.version import Version

manifest = json.load(<manifest file defined above>)

current_version = Version("1.0.0")
latest_version = Version("1.2.0a0")

# we can easily collect the patches required to update from current to latest version
required_patches = [
    patch for patch in manifest["patches"]
    if current_version < Version(patch["version"]) <= latest_version
]

# patches can be sorted like this, whenever necessary
required_patches.sort(key=lambda patch: Version(patch["version"]))

To improve cohesion, I'm also thinking about something like a VersionManifest class which would handle (de)serialization of the version manifest, determining the latest version, collecting the required patches for an update, etc.

For backward compatibility, the structure above can be converted to a separate file in the old format.

What do you think?

@dennisvang I have always wondered about that myself.
It might be of interest to check out the --split-version argument during signing e.g:
pyupdater pkg --sign --split-version
As it splits the versions.gz file up into platform specific ones.

As for the application names, that could (but I'm not sure) be because PyUpdater also supports tracking and updating of asset files, hence why I don't think your suggested implementation would work.
What do you think :)

@NicklasTegner Thanks, those are good points. :)
I never used asset updates before, so didn't think of those.
I'll have a closer look at them.

@dennisvang I coded around my workflow. At the time of writing, I was using PyUpdater by syncing source code to VMs for each supported platform, then building and packaging. Syncing artifacts back to my main dev machine then signing and uploading. This was before the CI/CD craze. When PyUpdater usage picked up and people started wanting to use it in CI, the --split-version option was added. Never needed it personally for my use case.

PyUpdater supports versioned assets with are treated as another app internally.

@JMSwag Thanks for the explanation. That provides some insight into possible workflows for multiple platforms.

@NicklasTegner I think it should be possible to use a simplified version manifest like the one I described above. Maybe not exactly the same, but similar.

At the very least I think we should aim to remove as much nesting as possible from the manifest (as in my example), or, alternatively, change the order of nesting so that app name and platform are at the top level.

I don't think the app name or platform can change within one "session", right? If so, then, both on the client side and on the package-creation side, the app name and platform should only be needed once, viz. at the start of the "session", to load the version data from the manifest for that specific combination of app name and platform.

This way, we don't need the app_name and platform keys every time we access the manifest data, e.g. here, and don't need to use things like EasyAccessDict, e.g. here.

For example, version information in the current manifest is nested like this

{"updates": {"Acme": {"1.0.0.2.0": {"win": {}}}}}

So we need to do version_data["updates"]["Acme"]["1.0.0.2.0"]["win"] every time we need info for "1.0.0.2.0", even though "Acme" and "win" are always the same in one session.

It would already be a great improvement if the order were different:

{"updates": {"Acme": {"win": {"1.0.0.2.0": {}}}}}

Then we could assign version_data = manifest["updates"]["Acme"]["win"] somewhere at the start, and only need to do version_data["1.0.0.2.0"] whenever we need it.

I'm convinced a simplification of the manifest structure would greatly improve code readability.

@dennisvang I concur.

@JMSwag Just a quick question related to removal of non-essential code: Is there a compelling reason to hide/unhide the files in AppUpdate._win_rename? See e.g. this line.

@dennisvang On Windows you cannot overwrite the exe from a running process. This is a workaround to simulate the behavior on Linux/Mac.

@JMSwag: Thanks. Does that mean it won't work if you don't hide the file on windows?

Another question, to help us understand the reasoning behind the cryptographic approach.

As I see it, currently, the versions.gz file contains hashes for the package files (archives and patches). The versions.gz file itself is signed. If the versions.gz file signature is verified, the hashes from that file can be trusted. As a result, these hashes can be used to verify not only integrity but also authenticity of the package files.

I wonder why this approach was chosen, instead of just signing all individual package files?
I assume it is a performance optimization, as calculating a hash is typically (much) faster than verifying a signed message.
Is that assumption correct?

I'm asking because I think removing the hash part, and just signing all files instead, would enable another significant simplification of the code (and workflow).

Based on a quick timing comparison, verifying a 1 GiB signed file takes approx. two to three times as long as verifying the hash for the same file, on my system: 5.5 s vs 2.4 s.
The signing itself takes approx. twice that.

The relative time difference is large, but it occurs only incidentally: before deploying a new update, on the dev side, and after downloading a new update, on the client side. Moreover, compared with e.g. download time, it can probably be neglected.

What do you think?

@dennisvang NP! It will still work but the old executable will be visible next to the current executable.

It was a solution copied form youtube-dl. I was a noob at the time and figured it was sufficient. If signing each file individually simplifies the implementation, I'm all for it!

@JMSwag Thanks again for clarifying. :-)

@dennisvang My pleasure ๐Ÿ˜‰

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this?
I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

Please correct me, if I'm wrong :)

Also another suggestions. Would it maybe be a good idea to split PyUpdater app updates and PyUpdater assets updates (think it's called library updates) into separate packages, as I think (but I'm not sure) that the majority of people using PyUpdater are using it for the application update capabilities?

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this? I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

@NicklasTegner you're right, but you can create a new branch in your own fork, for the time being.

Also another suggestions. Would it maybe be a good idea to split PyUpdater app updates and PyUpdater assets updates (think it's called library updates) into separate packages, as I think (but I'm not sure) that the majority of people using PyUpdater are using it for the application update capabilities?

@NicklasTegner Although I've never used the asset update capability myself, I'm not sure it is worth moving it into a separate repo. The asset updates do not require that much extra code.

On the client side, AppUpdate is actually a subclass of LibUpdate (i.e. the asset update class), see source.

On the dev/build/server side (whatever you call it in this context), the actual workflow for creating asset updates is slightly different from that for app updates, but the asset-specific code is mostly limited to ExternalLib (i.e. external asset) and create_asset_archive, see source.

True, there is some juggling with app-name/asset-name throughout the rest of the codebase, but that can be taken care of by a simplified implementation of the file manifest.

@JMSwag @NicklasTegner

I suppose it would be useful to create a list of minimum requirements:

  • what functionality is considered essential?
  • what functionality is optional? (asset updates? uploader? plugins?)
  • what functionality can be dropped altogether?

Based on that, we could set some targets, etc.

Also:

  • which python versions do we want to support? (3.6 is end-of-life, 3.7 still has life left, but imposes some minor limitations)
  • which OS/platforms do we want to support?
  • CI/CD support? (what are example use cases?)

you're right, but you can create a new branch in your own fork, for the time being.

That just seems very problematic, if everyone starts writing their own implementation/code.

Although I've never used the asset update capability myself, I'm not sure it is worth moving it into a separate repo.

I see your point with this comment, but it definitely neds a cleanup :)

That just seems very problematic, if everyone starts writing their own implementation/code.

Personally I think it would be very useful to have a few different ideas and implementations to compare.

As long as there is no clear tactical plan...

@JMSwag @NicklasTegner

I suppose it would be useful to create a list of minimum requirements:

  • what functionality is considered essential?
  • what functionality is optional? (asset updates? uploader? plugins?)
  • what functionality can be dropped altogether?

Based on that, we could set some targets, etc.

Also:

  • which python versions do we want to support? (3.6 is end-of-life, 3.7 still has life left, but imposes some minor limitations)
  • which OS/platforms do we want to support?
  • CI/CD support? (what are example use cases?)

I'd say that existing functionality is essential. Not sure of anything that could be dropped.

We'd want to support any supported version of Python. 3.7+ at this point in time.

At a minimum we should support Linux, Mac & Windows.

I don't think explicit support for a particular CI/CD is needed as long as PyUpdater is flexible enough to be used in CI/CD.

any updates on this?

@dennisvang @JMSwag

@JonThom Version 4.0 has some issues. See e.g. PR #322 for pending fixes.

We've been using the 3.1 release for several years now without major issues, although I've not tried any of the fancy stuff such as plugins and assets.

@NicklasTegner While awaiting guidance here, I've been working on an alternative implementation that uses a version manifest structured as a flat JSON array of file-info objects, as follows:

[
    {"filename": "Acme-win-1.0.zip", "file_size": 12345, "file_hash": "...."},
    {"filename": "Acme-win-1.1a0.zip", "file_size": 13456, "file_hash": "...."},
    {"filename": "Acme-win-1.1a0.patch", "file_size": 1111, "file_hash": "...."},
    ...,
]

App/asset name, platform, version (and release channel), and file type are parsed from the filenames. App archives, patches, and asset archives, for different platforms and release channels, all co-exist peacefully in the same array. No nesting.

Although the work is far from finished, I am about ready to publish to my fork. At the very least, I'm hoping that will spark the discussion here.

@dennisvang @NicklasTegner I think I've responded to the outstanding questions. Please let me know if otherwise.

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this? I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

@JMSwag I guess above comment from @NicklasTegner is the only one that has not been answered yet. :-)

I think the main question is: How would you like to proceed? Do you have any specific tasks in mind that we could work on?

@JMSwag @NicklasTegner It turns out python-tuf, i.e. the reference implementation for TUF (The Update Framework), has recently reached version 1.0.0, and is now production-ready.

I'm just wondering, would it be an option to build on top of that? It would allow pyupdater to focus on high-level functionality.

@dennisvang seems like a fantastic idea

Any updates on the progress of v5? Very interested in building on top of this! Thanks!

Any updates?

Project is dead