ebitengine/purego

purgo build fail at linux

Cyberhan123 opened this issue · 28 comments

I try to build an app using purego on ubuntu but it fails on linux

2024-01-07T05:12:29.4981059Z # github.com/cyberhan123/stable-diffusion-desktop
2024-01-07T05:12:29.4983718Z dlopen: unhandled relocation for purego_dlopen (type 46 (SDYNIMPORT) rtype 7 (R_CALL))
2024-01-07T05:12:29.4989379Z dlsym: unhandled relocation for purego_dlsym (type 46 (SDYNIMPORT) rtype 7 (R_CALL))
2024-01-07T05:12:29.4991109Z dlerror: unhandled relocation for purego_dlerror (type 46 (SDYNIMPORT) rtype 7 (R_CALL))
2024-01-07T05:12:29.4992706Z dlclose: unhandled relocation for purego_dlclose (type 46 (SDYNIMPORT) rtype 7 (R_CALL))
2024-01-07T05:12:29.5058987Z   �[90m�[90m•�[0m�[0m �[39m�[39mCompiling application: �[0m�[0m�[30;101m�[30;101m  ERROR  �[0m�[0m �[91m�[91mexit status 1�[0m�[0m

The following is the complete github action log:

App, linuxamd64, ubuntu-latest.txt

Hey @Cyberhan123 👋

Can you also provide the output of go env (feel free to redact any value that you don't feel comfortable sharing)?

In fact, there is in the log.

I do not. Seems very strange since no changes to the dynamic linking have been made in quite a while

@Cyberhan123 does it work if you set CGO_ENABLED=0?

If I am not mistaken, if relocations happens during build process it means the library is being statically linked. Now the question is: why would anyone want to static link libdl? I believe this could be caused by another package somewhere in the dependency tree that added -static or equivalent in the LDFLAGS directive for cgo.

Adding -x and -work flags should give us enough info to understand if it's the case by grepping in the files and logs. I should be able to try it myself tomorrow.

Relocations happen for any symbol that is in a different object file. Even if that symbol is one that is dynamically linked it must be relocated to the GOT & POT at build time.

However, I think you might be correct that there is something causing it to try and be statically linked which isn't possible with purego. Please do let me know what you find!

I was able to confirm on Ubunutu arm64 that building with #cgo LDFLAGS: -static produces the same output that is displayed above. This would suggest that this isn't an issue with purego as there is no way to statically link with purego since it requires libdl.so. Someone could rewrite the entire dynamic linker in Go but that sounds like it's begging to break.

@Cyberhan123 does it work if you set CGO_ENABLED=0?

If it is pure go code, there is no problem,If cgo exists, the above error will be reported.

I was able to confirm on Ubunutu arm64 that building with #cgo LDFLAGS: -static produces the same output that is displayed above. This would suggest that this isn't an issue with purego as there is no way to statically link with purego since it requires libdl.so. Someone could rewrite the entire dynamic linker in Go but that sounds like it's begging to break.

I'm using wails to build the desktop app ,I need to investigate it, thanks for the heads up.

My code below also causes this problem:

CGO_ENABLED=1 go build main.go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/png"
	"os"

	sd "github.com/seasonjs/stable-diffusion"
)


// #include <stdio.h>
// #include <stdlib.h>
//
// void __lsan_do_leak_check(void);
//
//#cgo CFLAGS: -fsanitize=address
//#cgo LDFLAGS: -fsanitize=address
import "C"

func MemoryLeakCheck() error {
	diffusion, err := sd.NewCStableDiffusion("./deps/linux/lbsd-abi.so")
	if err != nil {
		return err
	}
	diffusion.SetLogCallBack(func(level sd.LogLevel, text string) {
		fmt.Printf("%s", text)
	})
	ctx := diffusion.NewCtx("./models/miniSD.ckpt", "", "", "", false, false, true, 4, sd.F16, sd.CUDA_RNG, sd.DEFAULT)
	defer diffusion.FreeCtx(ctx)

	images := diffusion.PredictImage(ctx, "british short hair cat, high quality", "", 0, 7.0, 128, 128, sd.EULER_A, 10, 43, 1)

	err = writeToFile(images[0].Data, int(images[0].Height), int(images[0].Width), "./assets/love_cat1.png")
	if err != nil {
		return err
	}
	C.__lsan_do_leak_check()
	return nil
}

func writeToFile(byteData []byte, height int, width int, outputPath string) error {

	img := image.NewRGBA(image.Rect(0, 0, width, height))

	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			idx := (y*width + x) * 3
			img.Set(x, y, color.RGBA{
				R: byteData[idx],
				G: byteData[idx+1],
				B: byteData[idx+2],
				A: 255,
			})
		}
	}

	file, err := os.Create(outputPath)
	if err != nil {
		return err
	}
	defer file.Close()

	err = png.Encode(file, img)
	if err != nil {
		return err
	}
	return nil
}

func main() {
	MemoryLeakCheck()
}

There is definitely something going on with the connection between having import "C" and import "github.com/seasonjs/stable-diffusion" because doing CGO_ENABLED=1 go build with the below code still produces the error. Is there another import "C" in stable-diffusion? I still don't think this is an issue fixable inside purego.

package main

import (
	_ "github.com/seasonjs/stable-diffusion"
)


import "C"


func main() {
}

No cgo code is used in stable-diffusion

The solution I am thinking of now is that if there is cgo, then my library needs to use cgo for compatibility,

So I removed everything from stable-diffusion except import "github.com/ebitengine/purego" and it still fails but succeeds when it's removed.

Do you mean there are dependencies using cgo?

No, I'm saying if a main.go imports "C" but also imports another package that imports purego then you get this error message. This is very strange.

Forcing internal link mode with Cgo enabled works. CGO_ENABLED=1 go build -ldflags="-linkmode internal" So this is an issue with external linking

My code below also causes this problem:

CGO_ENABLED=1 go build main.go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/png"
	"os"

	sd "github.com/seasonjs/stable-diffusion"
)


// #include <stdio.h>
// #include <stdlib.h>
//
// void __lsan_do_leak_check(void);
//
//#cgo CFLAGS: -fsanitize=address
//#cgo LDFLAGS: -fsanitize=address
import "C"

func MemoryLeakCheck() error {
	diffusion, err := sd.NewCStableDiffusion("./deps/linux/lbsd-abi.so")
	if err != nil {
		return err
	}
	diffusion.SetLogCallBack(func(level sd.LogLevel, text string) {
		fmt.Printf("%s", text)
	})
	ctx := diffusion.NewCtx("./models/miniSD.ckpt", "", "", "", false, false, true, 4, sd.F16, sd.CUDA_RNG, sd.DEFAULT)
	defer diffusion.FreeCtx(ctx)

	images := diffusion.PredictImage(ctx, "british short hair cat, high quality", "", 0, 7.0, 128, 128, sd.EULER_A, 10, 43, 1)

	err = writeToFile(images[0].Data, int(images[0].Height), int(images[0].Width), "./assets/love_cat1.png")
	if err != nil {
		return err
	}
	C.__lsan_do_leak_check()
	return nil
}

func writeToFile(byteData []byte, height int, width int, outputPath string) error {

	img := image.NewRGBA(image.Rect(0, 0, width, height))

	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			idx := (y*width + x) * 3
			img.Set(x, y, color.RGBA{
				R: byteData[idx],
				G: byteData[idx+1],
				B: byteData[idx+2],
				A: 255,
			})
		}
	}

	file, err := os.Create(outputPath)
	if err != nil {
		return err
	}
	defer file.Close()

	err = png.Encode(file, img)
	if err != nil {
		return err
	}
	return nil
}

func main() {
	MemoryLeakCheck()
}

@TotallyGamerJet When I try to build using the command you suggested on Linux, I get an error:

/usr/local/go/pkg/tool/linux_amd64/link: internal linking requested but external linking required: some packages could not be built to support internal linking

Right now my leading theory is that the external linker cannot handle the linker directives generated by go:cgo_import_dynamic. This branch fixes the reproducer on my end.

Since it seems that we kinda have different errors than you. @Cyberhan123 Would you mind cloning my repo on my branch and connecting it with a go.work file to check if it indeed, works out?

@eliottness Thank you, I still have some questions to ask, which is the problem of struct transfer. When cgo is enabled, the memory of the struct cannot be aligned.

type cImage struct {
	width   uint32
	height  uint32
	channel uint32
	data    uintptr
}
func (c *CStableDiffusionImpl) UpscaleImage(ctx *CUpScalerCtx, img Image, upscaleFactor uint32) Image {
        // not work with cgo
	uptr := c.upscale(ctx.ctx, uintptr(unsafe.Pointer(&img)), upscaleFactor)
	ptr := *(*unsafe.Pointer)(unsafe.Pointer(&uptr))
	if ptr == nil {
		return Image{}
	}
	cimg := (*cImage)(ptr)
	dataPtr := *(*unsafe.Pointer)(unsafe.Pointer(&cimg.data))
	return Image{
		Width:   cimg.width,
		Height:  cimg.height,
		Channel: cimg.channel,
		Data:    unsafe.Slice((*byte)(dataPtr), cimg.channel*cimg.width*cimg.height),
	}
}

I think Elliottness is right about the cause. I just find it strange that reproducing only happens when another package imports a package with purego in it. Otherwise it does work. But if your fix solves this then that's great!

Thank God, this fix seems to also fix the struct alignment issue? ? ? I'm really confused about the underlying logic. It would be great if someone from Golang official could explain it.

Let me make a small attempt: @rsc

It likely has to do with the fact that //go:cgo_import_dynamic is for internal linking while //go:cgo_import_static is for external linking. The former is what purego uses and the later can't be used outside the compiler. I just don't know why it worked before.

I'll tag v0.5.2 and v0.6.0-alpha.4 a few days later, if we don't find any other issues.

Let me reopen this. We are not able to cherry-pick this to 0.5 yet.

Done