official-stockfish/fishtest

Python packages exist inside the uploaded version of the repo

Opened this issue · 27 comments

Python packages are not fishtest code, and should not be a part of the repo for couple of reasons:

1- Copywrite concerns
2- Bloated repo (upload, download)
3- Creates confusion

Ideally, there should be a mechanism like when installing node modules in a javascript project.
The config.json tells pip or whatever installer to install specific versions of all the packages.. and we can view which version is used in fishtest and we can update that regularly.

To be honest I believe none of your reasons is compelling. On the other hand I also don't know why the worker (not the server!) is shipped with some pre-installed packages. @ppigazzini no doubt can explain.

One practical issue is the expression package. There are many packages (which could be present on the host system) that have a module called expression, but we want to make sure we import the right one. This problem is real and has occurred on my system!

To avoid this we could run the worker in a virtual environment (these are isolated python environments which are created via python -m venv <directory>).

I dont see how it's fine to have the packages directly in this repository and at the same time deny the addition of docker. So we are saying we dont care how the user is running the fishtest worker. It's his operating system, everything required needs to be there, but apparently we do care? A bit contradicting...

The way as run the worker does not belong to this repository, every user has his operating system, his use cases and his preferences.

The hard coded packages probably aim to prevent some conflicts on the users operating system, but this would have been solved long ago by directing the users to setup a venv or even using docker. Which is just the common way to run such apps I guess.

Some additional points:

  • What if someone wants to update the packages? One copies over all thousands of file changes? Who is reviewing the that contributor didnt add anything malicious in the update?
  • 1- Copywrite concerns @vdbergh why do you think this is not important? idna is distributed under the BSD License license. Where can I find the license ? I dont see it in this repo.

I dont see how it's fine to have the packages directly in this repository and at the same time deny the addition of docker. So we are saying we dont care how the user is running the fishtest worker. It's his operating system, everything required needs to be there, but apparently we do care? A bit contradicting...

Obviously @ppigazzini does not want to support yet another way of running the worker. I can understand that. He is already doing an infinite amount of work.

Personally I think docker is not really a good fit for the worker. To get the same functionality as the standard worker one has to start the container via a script (e.g. to supply a stable UUID). But of course people will just start the container with docker run or docker compose up.

The way as run the worker does not belong to this repository, every user has his operating system, his use cases and his preferences.

The hard coded packages probably aim to prevent some conflicts on the users operating system, but this would have been solved long ago by directing the users to setup a venv or even using docker. Which is just the common way to run such apps I guess.

Some additional points:

* What if someone wants to update the packages? One copies over all thousands of file changes? Who is reviewing the that contributor didnt add anything malicious in the update?

* `1- Copywrite concerns` @vdbergh why do you think this is not important? `idna` is distributed under the `BSD License` license. Where can I find the license ? I dont see it in this repo.

If a package is distributable then distributing it is not a copyright violation. If the idna package does not contain a LICENSE file, then that is a problem with the package, which should be reported upstream.

If a package is distributable then distributing it is not a copyright violation. If the idna package does not contain a LICENSE file, then that is a problem with the package, which should be reported upstream.

The repo itself has a license https://github.com/kjd/idna/tree/v2.1, and the license clearly says Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. So I dont understand how redistributing it without a license is not problematic.

I didnt want to spark another discussion about docker because this is not the point of this issue, only that it is contradicting imo.

If a package is distributable then distributing it is not a copyright violation. If the idna package does not contain a LICENSE file, then that is a problem with the package, which should be reported upstream.

The repo itself has a license https://github.com/kjd/idna/tree/v2.1, and the license clearly says Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. So I dont understand how redistributing it without a license is not problematic.

As I said this is a bug in the package and it should be reported upstream. We could temporarily add a copy of the LICENSE file to the worker copy of package though (if @ppigazzini thinks we should keep it).

It's not really a "bug", even the official sample project has their license at the root of the project https://github.com/pypa/sampleproject. It's where the license belongs. Additionally one might add the license to each source file, but I think this is very much outdated.
Simply said the packages were never really meant to be copied 1:1 into the source directory of one's project.

To make a long story short, the enthusiast Linux users, lack knowledge to proper setup, configure, maintain a Linux system, as an enthusiast Linux user myself, I do know first hand the issue.
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

  • The project started with python 2.x and for several months it supported both python 2.x and 3.x
  • In the past the folder was re added because the enthusiast linux users have not the requests package installed, so the workers went down during an update, see #267
  • I already updated the requests package to the last version compatible with noob fleet to fix in advance a problem with a future python version dropping a deprecated package, see #1017.
    The process used to get the requests package with its dependencies:
  python3 -m venv env
  python3 -m pip install --upgrade pip
  python3 -m pip install --upgrade wheel setuptools
  python3 -m pip install requests

So the code distributed is the same installed by pip at the time of those PR.

  • System python on every linux distro is intended to be used by system applications, not by user applications. This explain why the system python is usually not updatable and crippled respect to a standard build. Ubuntu at example compiles system python without the modules pip, ensurepip and venv, pip can be installed with apt but that can be saw as a moral suasion to have a not crippled and up to date python user installation in place using other ways (with pyenv, conda etc.).

As for docker, we have an official repo.
The code provided at the moment does not work with any docker installations on Ubuntu (official docker, docker.io, podman-docker, snapcraft etc), and there are not proper instructions about anything needed to run the container. But, as in any open source project, people work pro bono in their spare time, so consider that as a WIP repo.
I am confident that the docker repo will improve over time as contributors improve their knowledge about containers and orchestration of containers.
For example, contrary to the common wisdom of the enthusiast linux user, the biggest concern with docker is the security model (root privilege of the daemon, kernel exploits, denial-of-service from one bad container, container breakouts, poisoned base images etc.), I'm sure that the docker repo contributors/maintainers will advise in proper way the user base before distributing an official containerized worker at large.

I dont care about docker, you (ppigazzini) said in the docker issue it's not fishtests/the workers responsability to setup/provide something which setups/includes the necessary packages for the worker. "His os, his preferences", so I dont see why having the packages in the repository is the right thing to do.

I know that it will probably be impossible to immediately throw away the packages directory and hope that every user setup their enviornment correctly because this is likely not the case, however we might start preparing for such a future.

Which I think you agree with? If I understand the comments on #1017 correctly.
So we might as well just start to update the wiki installation script.

Ah I see that the https://github.com/official-stockfish/fishtest/wiki/Running-the-worker-on-Linux#worker-setup-script-for-ubuntu script already has this but I doubt people actually use this script.

Surprise! :)

Why do I need files to exist in my oploaded project that I don't intend to modify? It makes 0 sense from the dev perspective nor from the downloader POV.
Fishtest is not the place to get them.
Where are their licenses here?
Them existing on PIP is not a direct approval of redistribution but a direct approval of only usage.

Also, I dont know how docker came into this.. distributing packages without their root liciense is plain wrong.. we never edit the package, and if we did we only then want to distribute it again correctly and not install it as it becomes a code related to fishtest and needs distribution.

If a package cannot be installed by the worker or server owner then the problem lies in their python version insaller version then we can enforce a version to be supported and should not be solved by distributing the package again..

Screenshot of when the last time we modified any of these packages
Screenshot_20231224-181347_Chrome~2

Cant we just use this?
pip install packageName --target pathOfDirectorty

Cant we just use this? pip install packageName --target pathOfDirectorty

And then import packages from this directory, without any local enviroment or using their unreliable global packages.

No, because nothing assure that pip is available.

No, because nothing assure that pip is available.

And why this could be a problem, either we install pip before or make whoever installed that package also install it here.

Cant we just use this? pip install packageName --target pathOfDirectorty

And then import packages from this directory, without any local enviroment or using their unreliable global packages.

It doesn't work, see
https://stackoverflow.com/questions/6570635/installing-multiple-versions-of-a-package-with-pip

The only correct way is to setup a virtual environment, as in the wiki setup scripts (linux/windows).
But nobody read the wiki, so I'm expecting that several workers are running with system python.

So you are saying that installing a different version of a package into a specifc directory overwrites the global one the user have?
This sounds like a huge design issue in their solution.

To overtake the issue python/pip has the user mode python3 -m pip --user install (but this could messes with another package version installed in user mode) and, the only safe way, the virtual environment python3 -m venv env && env/bin/python3 -m pip install.
ps: there is another safe way, installing a new python distribution alongside your application.

Okay but inside that virtual enviroment cannot a config file tell pip what versions to install, i.e. what is the need for serving packages from fishtest repo, instead of installing them in that enviroment that we can specify beforehand?

Is the issue only dependant on misbehaving users finding it more convenient to use system python when running multiple workers somehow?

To overtake the issue python/pip has the user mode python3 -m pip --user install (but this could messes with another package version installed in user mode) and, the only safe way, the virtual environment python3 -m venv env && env/bin/python3 -m pip install. ps: there is another safe way, installing a new python distribution alongside your application.

We setup the server with setuptools (poetry is a more recent alternative to install python applications). But there is still the problem that is responsibility of the user to setup a virtual environment.

https://github.com/official-stockfish/fishtest/blob/master/server/setup.py

https://github.com/official-stockfish/fishtest/wiki/Fishtest-server-setup#server-setup-1

# download fishtest
sudo -i -u ${user_name} << EOF
git clone --single-branch --branch master https://github.com/official-stockfish/fishtest.git
cd fishtest
git config user.email "${git_user_email}"
git config user.name "${git_user_name}"
EOF

# setup fishtest
sudo -i -u ${user_name} << 'EOF'
python3 -m venv ${VENV}
${VENV}/bin/python3 -m pip install --upgrade pip setuptools wheel
cd ${HOME}/fishtest/server
${VENV}/bin/python3 -m pip install -e .
EOF

Okay great! Now I understand some stuff that are not relevant to the issue.
What this issue is about is how hard is it to clone/install the packages we use instead of having them uploaded to github.

I propose one way to solve this problem.
We use a wiki script..
We in the wiki script clone all the packages inside the correct path.
This way we solve this issue of people not using the wiki.
And we solve the issue of us having python packages inside the uploaded repo (by removing them).

I dont see how it's fine to have the packages directly in this repository and at the same time deny the addition of docker. So we are saying we dont care how the user is running the fishtest worker. It's his operating system, everything required needs to be there, but apparently we do care? A bit contradicting...

Obviously @ppigazzini does not want to support yet another way of running the worker. I can understand that. He is already doing an infinite amount of work.

I don't want to maintain an official way to setup and run the worker (and the server), I always wrote the instructions and the scripts or on the wiki or in a personal repo (windows setup script).
About docker, I suggested to open another repository in the organization.