daulet/tokenizers

invalid argument when running example/main.go

Closed this issue · 7 comments

I compiled the project with make build and put the resulting libtokenizers.a file in the root directory of the tokenizers project.

I can run all the test cases in tokenizer_test.go normally through go test.

However, when I run the example with go run main.go, the program finds libtokenizers.a correctly but then panics:

panic: invalid argument

goroutine 1 [running]:
main.main()
        /search/odin/liliang/tokenizers-0.7.1/example/main.go:12 +0x2f2
exit status 2

I've tried master/v0.7.1/v0.8.0/v0.60.0, and they all have the same problem.

My golang version info is: go version go1.22.0 linux/amd64

@lianoid you are probably missing the build parameter -ldflags="-extldflags '-L./'", which is required in the latest release to tell the linker where the native library is.

@daulet Thank you for your answer.
I added the parameter you mentioned and ran go run again, but the program still reported the same error.
Using tokenizers.FromBytes(bertBaseTokenizeModelData) instead of tokenizers.FromFile("./test/data/bert-base-uncased.json") works around the problem.

I also tried running this:

	tk1, err1 := tokenizers.FromBytes(bertBaseTokenizeModelData)
	if err1 != nil {
		panic(err1)
	}
	// release native resources
	defer tk1.Close()

	tk, err := tokenizers.FromFile("./test/data/bert-base-uncased.json")
	if err != nil {
		panic(err)
	}
	// release native resources
	defer tk.Close()

	fmt.Println("Test2 Vocab size:", tk.VocabSize())
	// Vocab size: 30522
	fmt.Println(tk.Encode("brown fox jumps over the lazy dog", false))

With this code, the tokenizer tk loaded by FromFile also encodes normally. But after I removed the loading of tk1, I got the previous invalid argument error again, incredible as that sounds.

Can you paste the whole program you are trying to run and the error you are getting? It's unclear what exactly is failing in your case.

I have the same issue: invalid argument.
The workaround above of using FromBytes instead of FromFile fixes the issue for now.

Looks like there are issues in FromFile

There are tests for FromFile in the module, so it is not completely broken. To make progress someone needs to share a full repro.

I have the same problem when using the FromFile function. The invalid argument error is puzzling; it seems like something is missing in FromFile.

I'll close the issue until folks provide a repro. I trust there is an issue, but no one has shared a repro and I couldn't reproduce it myself.