kluctl/go-embed-python

Are you able to give suggestion as to why sklearn will not install?

Closed this issue · 18 comments

If you add sklearn to the requirements.txt (scikit-learn) the install of packages always fails. Are you able to suggest a reason for this that I can look into?
Realise you're busy, just looking for a nudge in the right direction

Can you provide an error message?

Hey @codablock - apologies, I completely missed you responded.
In my requirements.txt I have

--find-links https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/scikit-learn
numpy
pandas
jax==0.1.23
scikit-learn
seaborn
matplotlib

(note the --find-links was just me trying things as it wasn't working so added that, but no fix)

and when I run go generate ./... I get

ERROR: Could not find a version that satisfies the requirement scikit-learn (from versions: none)
ERROR: No matching distribution found for scikit-learn

[notice] A new release of pip is available: 23.0 -> 23.2.1
[notice] To update, run: /tmp/python-pip-darwin-arm64-macosx_11_0_arm64/bin/python3 -m pip install --upgrade pip
panic: exit status 1

goroutine 1 [running]:
main.main()
        /Users/irenespalletti/go/src/github.com/amlwwalker/jupyter-editor/internal/generate/main.go:10 +0x50
exit status 2
internal/dummy.go:3: running "go": exit status 1

and then removing that from the requirements.txt and it works. I have same issue with tensorflow tensorflow==1.1.* in the requirements.txt

Would love to get these libraries all working. Any help greatly appreciated.

My main.go looks like

func main() {
	name := "example"
	ep, err := python.NewEmbeddedPython(name)
	if err != nil {
		panic(err)
	}

	c := make(chan os.Signal)
	signal.Notify(c, os.Interrupt, syscall.SIGHUP)
	go func() {
		<-c
		fmt.Println("received an exit code and overriding")
		os.Exit(0)
	}()

	lib, err := embed_util.NewEmbeddedFiles(data.Data, name)
	if err != nil {
		panic(err)
	}
	ep.AddPythonPath(lib.GetExtractedPath())

	//for {
	content, err := os.ReadFile("./scripts/scikit.py")
	if err != nil {
		log.Fatalln("could not read file contents ", err)
	}
	cmd := ep.PythonCmd("-c", string(content))
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	err = cmd.Run()
	if err != nil {
		//log the python error
	}
}

It all works without those libraries so I think I have everything setup correctly otherwise. I'm running off your master branch here. My directory structure is

.
├── data/
├── go.mod
├── go.sum
├── internal
│   ├── data/
│   ├── dummy.go
│   ├── generate
│   │   └── main.go
│   └── requirements.txt
├── main.go
├── pkg
├── plot.pdf
└── scripts
    └── scikit.py

Hey @codablock any chance you got to have a look at this? Would really appreciate it, thank you!

@amlwwalker Sorry that this took so long. I think I have found the issue. The package in question was not available as a MacOS 11 Wheel. go-embed-python tries its best to use the lowest available macos versions here to stay compatible. I have now added MacOS 12 as an additional version to try after 11 fails. This should allow you to use the package in question.

Aaand it looks like you'll probably run into further issue due to missing ctypes support in go-embed-python. Just verified this by trying to run an example with scikit-learn. I will need to think more about how to proceed with this situation in general.

Thanks @codablock .

I was actually using previously a previous commit of your master before you removed ctypes. Could I cherry pick your fix on to there?

I currently tend to re-add ctypes and completely move away from musl based linux builds. I also had to do this for linux-arm64 as python-standalone was simply not available in this combination.

At the same time, my main project (kluctl) moved to wolfi based images which use a rebuilt version of Alpine with glibc instead of musl, so my initial requirement for musl has gone actually.

I will later push a dev version to this fork: https://github.com/codablock/go-embed-python
It will then also contain tags with specific python version combinations. Would you be willing to test this out? It will also contain a new release process that allows to support multiple Python versions.

Hey @codablock - more than happy to test whatever's needed out. Its a very very cool thing having python interacting with Go. Please do send me whatever you would like to test. Should I move to that fork for my usage as well then?

So to be clear, are you saying that that fork has no ctype issues that this version has once you push to it, or atleast thats your objective with this? My knowledge of the internal workings of python aren't great, but what i think you are saying is you start with a version of python that does not have ctypes and then you area dding them back in?

I'd also be very interested if you had come to a convention as to how to pass data (variables, arrays, objects) between python and Go at all?

Edit:

I did just try to use the fork instead, but I think its module is named incorrectly? I am getting

go: github.com/codablock/go-embed-python@v0.0.0-3.10.13-20231002: parsing go.mod:
        module declares its path as: github.com/kluctl/go-embed-python
                but was required as: github.com/codablock/go-embed-python

note when i use the latest github.com/kluctl/go-embed-python and try to pull scikit learn I am still getting

ERROR: Could not find a version that satisfies the requirement scikit-learn (from versions: none)
ERROR: No matching distribution found for scikit-learn

[notice] A new release of pip is available: 23.0 -> 23.3.1
[notice] To update, run: /tmp/python-pip-darwin-arm64-macosx_11_0_arm64/bin/python3 -m pip install --upgrade pip
panic: exit status 1

goroutine 1 [running]:
main.main()
        /Users/alexwalker/go/src/github.com/amlwwalker/python-embed/internal/python-libs/generate/main.go:10 +0x50
exit status 2
internal/python-libs/dummy.go:3: running "go": exit status 1

have i missed something?

To test the fork, try to add replace github.com/kluctl/go-embed-python => github.com/codablock/go-embed-python v0.0.0-3.10.13-20231002 to your go.mod. The original dependency must be left untouched.

So to be clear, are you saying that that fork has no ctype issues that this version has once you push to it, or atleast thats your objective with this? My knowledge of the internal workings of python aren't great, but what i think you are saying is you start with a version of python that does not have ctypes and then you area dding them back in?
So far, I've used the python-standalone distro from here. It contains ctypes, but as part of the go-embed-python build, I remove it from the packaged source. My fork now omits this "removing". When it works as expected, I will merge these changes into the main repo/branch.

Regarding interaction between go and python...this is currently out-of-scope for this library. This library really just allows you to start the python binary with whatever arguments you like.

So, interaction is on you. You could look into https://github.com/kluctl/go-jinja2 to see an example on how you could do it. That example uses stdout/stdin to exchange json strings between Go and the Python application which runs in the background. You could also use TCP/IP, Unix Sockets, and so on. As an alternative to json, gRPC is also an option.

I thanks @codablock I'll try that. To be clear do you think using the fork, scikit learn will work?

@amlwwalker yes, scikit should work (verified this locally already).

This is great thanks @codablock - sklearn now working a charm. Trying a few other libraries out now inc grpc.

Couple of thoughts/questions that i wondered if could be applied to the fork.

  1. With regards to this other question - can the pip version be configured and not hardcoded?
  2. I notice that you added/hardcoded the solution to my question in the fork for sklearn which specifies the versions of python etc. I wondered if this could also be configurable? Perhaps an optional yaml file or something, not sure. Another benefit to this is then can just specify a version for the current machine rather than pulling all code for all versions that are hardcoded, speeding up the pip install ?

EDIT

https://pypi.org/project/grpcio-tools/ results in

ERROR: Could not find a version that satisfies the requirement grpcio-tools (from versions: none)
ERROR: No matching distribution found for grpcio-tools

[notice] A new release of pip is available: 23.0 -> 23.3.1
[notice] To update, run: /tmp/python-pip-linux-arm64-manylinux_2_28_aarch64/bin/python3 -m pip install --upgrade pip
ERROR: Could not find a version that satisfies the requirement grpcio-tools (from versions: none)
ERROR: No matching distribution found for grpcio-tools

[notice] A new release of pip is available: 23.0 -> 23.3.1
[notice] To update, run: /tmp/python-pip-linux-arm64-manylinux2014_aarch64/bin/python3 -m pip install --upgrade pip
panic: exit status 1

goroutine 1 [running]:
main.main()
        /Users/alexwalker/go/src/github.com/amlwwalker/python-embed/internal/python-libs/generate/main.go:10 +0x50
exit status 2
internal/python-libs/dummy.go:3: running "go": exit status 1

is this in another library like before that needs adding?

@amlwwalker Can you try the freshly pushed tags?
v0.0.0-3.10.13-20231002-6
v0.0.0-3.11.6-20231002-4
v0.0.0-3.12.0-20231002-4

These should support grpcio-tools (and any other package that only supplies manylinux_2_17 wheels).

@amlwwalker ping :) After you verify that the new tags work with grpcio-tools, I'm able to merge all this back into the upstream repo.

trying now, apologies!

Answer:

yep all seems great my end on all those tags.
Before you merge back, any thoughts on making the OSes dynamic (perhaps a build tag?) and ability to upgrade the pip version?

@amlwwalker both topics are independent of the ones from this issue. I was thinking about a way to allow choosing versions more dynamically but that's something I probably won't have time for in the next weeks

The changes discussed in this issue have been pushed to the upstream fork, meaning that this can be closed now.