Repeated runs should not download Conda each time
probonopd opened this issue · 23 comments
Repeated runs should not download Conda each time. Especially during development of a "recipe"...
There should be a download stage and a separate stage where the downloaded files go into the AppDir. On a re-run, only the AppDir should be deleted, not the downloaded files, similar to pkg2appImage.
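Something along these lines, as a rough sketch only (the directory names are made up; `-b`/`-p` are the Miniconda installer's batch-mode and prefix options):

```bash
# Download stage: fetch the installer only if it is not cached yet.
DOWNLOADS=downloads
APPDIR=AppDir
INSTALLER=Miniconda3-latest-Linux-x86_64.sh

mkdir -p "$DOWNLOADS"
[ -f "$DOWNLOADS/$INSTALLER" ] || \
    wget -P "$DOWNLOADS" "https://repo.anaconda.com/miniconda/$INSTALLER"

# AppDir stage: wipe only the AppDir, then install from the cached file.
rm -rf "$APPDIR"
bash "$DOWNLOADS/$INSTALLER" -b -p "$APPDIR/usr/conda"
```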
...
Re-setting up from scratch makes sure you have a clean environment. You're free not to re-run the plugin on future invocations (just remove the `--plugin conda` part). You can call tools like `pip` etc. yourself directly, too. I don't see any reason to change this behavior in this plugin. If you need caching, you should think about asking the conda folks to implement something in their installer.
I don't know how you develop a script like this, but to get it into this shape I had to run it about 30 times, and each time I had to download PyQt and everything...
I don't want to bloat the plugin script by implementing a reliable update and caching system... PRs welcome?
Can it be achieved (easily) by using a local HTTP(S) proxy/cache? Do you know one that is super quick to set up (as part of my "build" scripts)?
Would need to be HTTPS-enabled, no? You'd probably have to get a generic SSL cert, trust it on your system, etc. Wouldn't take too long for an experienced user, but it's not "super quick".
Miniconda doesn't provide "pre-built tarballs" which you could use as a base, as in "just extract and then install whatever you want on top". They want you to use their shell script and download everything package by package.
Can you open a new issue about caching that?
PRs welcome, but this is nothing I will work on actively.
> this is nothing I will work on actively
Do you get your AppImages right on the first try? This is the #1 annoyance for me: I always need at least tens of tries, and during each one I have to wait many minutes for downloads.
You are free to send a PR, but I personally don't need this, and the other users don't seem to need it either.
Please reconsider and make this high priority, as I now understand that it might technically be easy to keep the cached pip downloads (by simply not deleting them).
Having to download everything each time (especially big packages like Qt) is what made me stop using this tool.
Yet again, I can only tell you that the main concern is that we have to set up conda from scratch every time to make sure we get a clean environment. We use the Miniconda installer for this, so it's not as easy as just extracting an archive over an existing directory. That is the problem you need to solve. If you come up with a solution that works around that problem, please be my guest. I don't think there are Miniconda tarballs that you can just download and extract yourself. If there are, we can use those without breaking anything.
All I know is that with `pip install`, the downloaded archives get cached, so that if some other operation (in a completely different virtualenv) requires the same packages, they don't have to be downloaded again. This is currently the reason why I avoid Conda if I can get away with just using `pip`.
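For illustration, a rough sketch of that default pip behaviour (a vanilla, non-conda pip is assumed; the package name and cache location are just examples):

```bash
# Two separate virtualenvs; the second install reuses wheels from pip's
# per-user cache instead of downloading them again.
python3 -m venv env1 && env1/bin/pip install pyqt5   # downloads and caches the wheels
python3 -m venv env2 && env2/bin/pip install pyqt5   # served from the cache
env1/bin/pip cache dir                               # prints the cache location, e.g. ~/.cache/pip (pip >= 20.1)
```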
The `pip` in a conda environment does not cache on the local system, for good reasons: it's a completely portable setup, after all. A cache is also not needed if we manage to just keep/overwrite the existing conda env, as pip would not reinstall dependencies that are already installed.
Please see 4bf0143 and 417ecdb. By default the plugin still re-downloads the conda installer, but you can specify a cache dir; if one is set, the installer is cached there and not downloaded again. The second commit allows re-running the plugin on an existing AppDir, at the risk of breaking things. It shows a warning in that case, though.
Feature implemented, we can close this issue.
> you can specify a cache dir, and if it's used it will cache the file there
Can we make it so that the cache dir will be automatically created and used by default without the user needing to do anything special?
Already thought about it, but I cannot come up with a good solution. We'd need something like `/tmp`, but user-specific. `~/.cache` is a bad idea IMO. Maybe we need to invent our own user tempdir like Mozilla did?
In any case, can you please make this a new issue? I don't think it's really that important, now that it's possible via opt-in. Also it's easier to keep track there.
Possibly use something like `/tmp/linuxdeploy-plugin-conda-$UID/` as a cache directory.
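A minimal sketch of that proposal (nothing the plugin does today; the path is just the one suggested above):

```bash
# Create the per-user cache directory up front and keep it private, since
# /tmp is shared between all users on the machine.
CACHE_DIR="/tmp/linuxdeploy-plugin-conda-$UID"
mkdir -p "$CACHE_DIR"
chmod 700 "$CACHE_DIR"
```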
I have re-evaluated the situation. Right now, the user has to set a variable to get a persistent cache. If we change this to what we discussed, we have a huge problem: the plugin can no longer safely run concurrently, as those runs could influence each other.
You might think "no problem", but actually, it is. Many CI servers (maybe not the cloud-based ones) run builds as the same user on the same machine. With this proposal, two instances of `wget` could get into a data race where both try to download the same file at once.
The earlier comparison with Mozilla Thunderbird is therefore a bad one. They actively ensure that only one instance is running at a time, and hence can use a predictable temporary directory name.
Unless someone implements a lockfile scheme to prevent this, we should not touch it any more.
I consider this issue resolved. Using a data cache is not at all difficult, and it's documented. I don't have time to implement proper locking right now.
If you think it's worth the effort, this time please create a new issue.
I just discovered a simple and probably portable way to lock with `flock`. It even comes with Alpine (it's one of the tools provided by busybox) and can be used to lock single files. It should be possible to get rudimentary mutually exclusive access to the downloaded files by just calling `wget` through `flock`.
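A rough sketch of how that could look (the cache path and installer name are assumptions for illustration, not something the plugin currently uses):

```bash
# Serialise the download with flock(1): a second concurrent run blocks on
# the lock file until the first one has finished, avoiding the data race.
CACHE_DIR="/tmp/linuxdeploy-plugin-conda-$UID"
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"
mkdir -p "$CACHE_DIR"

(
    flock -x 9   # exclusive lock on fd 9, i.e. the lock file opened below
    if [ ! -f "$CACHE_DIR/$INSTALLER" ]; then
        wget -c -P "$CACHE_DIR" "https://repo.anaconda.com/miniconda/$INSTALLER"
    fi
) 9>"$CACHE_DIR/.download.lock"
```

busybox's flock appears to support the same fd-based form, but that would need verifying on Alpine.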
Thank you very much.
By the way, I checked a couple of systems, and the "per-user /tmp" indeed seems to be `/run/user/$UID` these days. Usually a tmpfs is mounted there. Learned something new!
And that is exactly why we shouldn't use it. These conda installers are at least 80-100 MiB in size, from my tests. Imagine occupying that space until the computer is shut down. I'm usually a strong advocate for building in RAM disks, but caching in RAM disks requires more RAM than many machines can afford. In fact, building in RAM disks is already highly problematic, if not impossible, on CI systems like Travis.
I see...says the guy who is running everything on Live systems where everything lives in tmpfs... ;)
You cannot always make such assumptions. Just because something works on your system doesn't mean it will automatically work on other people's systems, too. That's a misconception which makes people develop bad software all the time.