sugarme/gotch

Mac Support

Opened this issue · 4 comments

Hi there @sugarme - I've been evaluating gotch for a project at work.

Thanks for the fix to the rnn example - I'll create a PR for the small API suggestion I had (for TextDataIter) for tracking progress.

I've been using gotch in a Linux Docker container and was wondering about the possibility of running it natively on a Mac. A couple of weeks ago I built an ARM64 build of PyTorch for both the Mac and my Linux container (I wasn't able to find those among the archived binaries on the PyTorch site). Then today I managed to get a Mac build of gotch running locally, though the cgo configuration of lib.go involved a fair bit of guesswork. Anyway, I attempted to run the rnn example on my Mac; it ran but crashed in the sample function here:

func sample(data *ts.TextData, lstm *nn.LSTM, linear *nn.Linear, device gotch.Device) string {
[SNIP]
	for i := 0; i < int(SamplingLen); i++ {
[SNIP]
		// 1. Delete inState tensors (from C land memory)
		inState.(*nn.LSTMState).Tensor1.MustDrop()

with the error:

char-rnn(42548,0x104a30580) malloc: Incorrect checksum for freed object 0x12b1cc680: probably modified after being freed.

As a test - I tried commenting out the above MustDrop() (free) and it just crashed at a different point in the sample function with a SIGBUS error. I'm guessing these errors are a result of not setting up pytorch + gotch correctly as opposed to an actual allocation/deallocation error - but wanted to get your opinion.

I'm not blocked - I can just switch back to my linux docker container (which doesn't have these crashes).

@source-transformer ,

Thanks for sharing. Can I just verify that

  1. You built PyTorch (for Python) or libtorch (for C++) from source? Can you point to the source or describe how you did it? (The latest gotch uses libtorch version 1.11, so if you build from source you should try the same version.)
  2. You built the library for Mac M1? (I can run gotch on a Mac with an Intel CPU just fine, with no CUDA.)

I don't have a Mac M1 to try this on, so I am purely guessing.

  • I often build libtorch from source using the official PyTorch script provided here. Just install the dependencies and run `python YOUR_LOCAL_CLONE_OF_PYTORCH/tools/build_libtorch.py`. I am not sure whether it works on Mac M1. Since you were able to build and link libtorch without issues, the build may not be the problem here.
  • A quick search for the error you got suggests it may be similar to this: malloc: Incorrect checksum for freed object 0x12b1cc680: probably modified after being freed.
  • If so, it is probably not related to ts.MustDrop() but rather to an incorrect type cast in the libtorch binding, which was probably already fixed on the master branch.
  • Have you tried the latest gotch? If not, please try `go get github.com/sugarme/gotch@latest`.

Let me know how things go. Thanks.

Thanks for the reply!

Yes - for both my Linux Docker container and my local Mac usage, I had to build libtorch from source. As I said, I wasn't able to find an ARM64 build of PyTorch in their archives (e.g. https://pytorch.org/get-started/previous-versions/).

Here is how I cloned their git repository and ensured that I was on 1.11 (essentially a copy/paste from here: https://pytorch.org/get-started/previous-versions/#from-source):

cd ~/
mkdir github
cd github
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout v1.11.0
git submodule sync
git submodule update --init --recursive
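
Since the maintainer notes that the latest gotch uses libtorch 1.11, it can help to confirm that the tag checked out above matches before starting a long build. The following is a minimal sketch of such a comparison; the helper names `sameMinor` and `majorMinor` are hypothetical and are not part of gotch or PyTorch.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor returns the "major.minor" part of a version string,
// ignoring surrounding whitespace and an optional leading "v".
func majorMinor(v string) string {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return v
	}
	return parts[0] + "." + parts[1]
}

// sameMinor reports whether two version strings share major.minor.
func sameMinor(a, b string) bool {
	return majorMinor(a) == majorMinor(b)
}

func main() {
	// "v1.11.0" is the tag checked out above; "1.11" is the libtorch
	// version mentioned earlier in this thread.
	fmt.Println(sameMinor("v1.11.0", "1.11")) // prints "true"
}
```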

In my Linux Docker container I did run into some difficulties getting the PyTorch cmake script to recognize my installation of Python (I think their Python version check isn't working properly - I can elaborate in another post if you're interested), so I ended up having to install the dev version of Python, i.e.

sudo apt-get install python3-dev

After that - I was able to run the following (to generate an ARM64 build of PyTorch for Linux):

python3 setup.py develop

Then I updated the environment variables mentioned in the gotch scripts in my .bashrc:

GOTCH_LIBTORCH, LIBRARY_PATH, CPATH, LD_LIBRARY_PATH, GOTCH_VER

After that it more or less just worked in my linux docker container (i.e I was able to reference gotch from my go application).
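
When one of these variables is missing, the failure tends to surface later as a compiler or linker error rather than a clear message, so a quick preflight check can be useful. This is only a sketch: the variable names come from the list above, while `missingVars` is a hypothetical helper and not something gotch provides.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars are the variable names listed in the post above.
var requiredVars = []string{
	"GOTCH_LIBTORCH", "LIBRARY_PATH", "CPATH", "LD_LIBRARY_PATH", "GOTCH_VER",
}

// missingVars returns the names for which lookup reports no value or
// an empty value. lookup has the same shape as os.LookupEnv.
func missingVars(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingVars(requiredVars, os.LookupEnv); len(missing) > 0 {
		fmt.Println("unset or empty:", missing)
		return
	}
	fmt.Println("all gotch build variables are set")
}
```

Passing the lookup function in, instead of calling os.LookupEnv directly, keeps the check easy to exercise without touching the real environment.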

For running any examples in the gotch repository on my mac locally (i.e. not in a linux docker container) - I did run into some difficulty with the default lib.go. I know the setup-gotch.sh essentially stomps on that file in the installed gotch package. I've never done anything with cgo before (mostly trying to understand the errors returned from go's darwin_arm64/link) - so it was a lot of trial and error before I ended up with something like this:

package libtch

// #cgo CFLAGS: -I<root dir>/<pytorch dir>/torch/include -O3 -Wall -Wno-unused-variable -Wno-deprecated-declarations -Wno-c++11-narrowing -g -Wno-sign-compare -Wno-unused-function
// #cgo CFLAGS: -I/usr/local/include
// #cgo CFLAGS: -D_GLIBCXX_USE_CXX11_ABI=1
// #cgo LDFLAGS: -lstdc++ -ltorch -lc10 -ltorch_cpu -L<root dir>/<pytorch dir>/torch/lib/
// #cgo CXXFLAGS: -std=c++17 -I<root dir>/<pytorch dir>/torch/include -g -O3
// #cgo CFLAGS: -I<root dir>/<pytorch dir>/torch/lib -I<root dir>/<pytorch dir>/torch/include -I<root dir>/<pytorch dir>/libtorch/include/torch/csrc
// #cgo LDFLAGS: -L<root dir>/<pytorch dir>/torch/lib
// #cgo CXXFLAGS: -I<root dir>/<pytorch dir>/torch/lib -I<root dir>/<pytorch dir>/torch/include -I<root dir>/<pytorch dir>/libtorch/include/torch/csrc
import "C"

As you mentioned, there is no CUDA on Mac, so I arrived at the above after repeated errors from Go's link command failing to locate the various libraries.

If you suspect it is my build of PyTorch, I can try what you mentioned:

I often build libtorch from source using official Pytorch script provided here. Just install dependencies and run `python YOUR_LOCAL_CLONE_OF_PYTORCH/tools/build_libtorch.py`

Thanks again - I might just abandon trying to use gotch on anything but my linux docker container since it technically works there.

Okay, in case anyone else sees this, here is how I got the current version 0.7 of gotch working on a Mac M1.

  • Use a Python venv and the libtorch that ships with a pip-installed PyTorch, as suggested here (see also here)
  • Plus a few fixes here and there (see below)

# using a venv at /some_path/.venv, use these two commands to verify before starting

which python
which pip

# the output above should look like this
# /some_path/.venv/bin/python
# /some_path/.venv/bin/pip

# then install pytorch for M1 with ARM build.

pip install torch==1.11.0

Then set up the environment variables as usual, pointing libtorch at the PyTorch install location inside the venv folder.

export GOTCH_LIBTORCH="/some_path/.venv/lib/python3.10/site-packages/torch/"
export LIBRARY_PATH="$LIBRARY_PATH:$GOTCH_LIBTORCH/lib"
export CPATH="$CPATH:$GOTCH_LIBTORCH/lib:$GOTCH_LIBTORCH/include:$GOTCH_LIBTORCH/include/torch/csrc/api/include"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$GOTCH_LIBTORCH/lib"

Then set up gotch with the script.

chmod +x setup-gotch.sh
export CUDA_VER=cpu && export GOTCH_VER=v0.7.0 && bash setup-gotch.sh

# If the build fails while trying to link -lcuda, clean and run the setup again.
# go clean
# go clean -cache

You might also need to change three places in tensor.go; see #44 (comment).

Now building and running the Go program should work as shown below.

go build . 
go run -exec "env DYLD_LIBRARY_PATH=$GOTCH_LIBTORCH/lib" .

Closing for now, given the solution above.