add_library INTERFACE library requires no source arguments.
songkq opened this issue · 12 comments
Thanks for sharing the bindings for the Hugging Face tokenizers. When I build them, the build fails with the following error. Could you please share a usage tutorial?
CMake Error at CMakeLists.txt:109 (add_library):
add_library INTERFACE library requires no source arguments.
CMake Error at CMakeLists.txt:110 (target_link_libraries):
Cannot specify link libraries for target "tokenizers_c" which is not built
by this project.
Interesting, this could be due to a difference in CMake version. Can you check your CMake version?
@tqchen Thanks. I have confirmed that CMake 3.26.4 and -std=c++17 are required.
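Since the build turned out to need -std=c++17, a project that consumes tokenizers_cpp.h could add a compile-time guard so that a missing standard flag fails with a readable message rather than an obscure template error. This is a hypothetical sketch, not something tokenizers-cpp ships; the names kCxxStandard and kCompiledAsCxx17 are made up for illustration.

```cpp
// Hypothetical C++17 guard for code that includes tokenizers_cpp.h.
// GCC and Clang set __cplusplus to 201703L for C++17. MSVC keeps __cplusplus
// at 199711L unless /Zc:__cplusplus is given, but always reports the real
// language level in _MSVC_LANG, so prefer that when it exists.
#if defined(_MSVC_LANG)
constexpr long kCxxStandard = _MSVC_LANG;
#else
constexpr long kCxxStandard = __cplusplus;
#endif

// True when the translation unit is compiled as C++17 or newer.
constexpr bool kCompiledAsCxx17 = kCxxStandard >= 201703L;

static_assert(kCompiledAsCxx17,
              "code using tokenizers-cpp should be compiled with -std=c++17 or later");
```

Putting the check in a header that every consumer includes surfaces the flag problem at the first compiled file instead of deep inside the library headers.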
set(TOKENIZERS_RUST_LIB "${TOKENIZERS_CPP_CARGO_BINARY_DIR}/libtokenizers_c.a")
As shown in the CMakeLists.txt, does this mean that I need to compile the huggingface/tokenizers library as libtokenizers_c.a first?
It should get compiled automatically.
By the way, did you confirm that you have Rust and Cargo installed?
@tqchen @junrushao Thanks. After setting up the Cargo environment, it compiles automatically.
After the build finishes, I get the libraries libtokenizers_cpp.a and libtokenizers_c.a.
Suppose I only want to use the SentencePieceTokenizer interface. Then only the library libtokenizers_cpp.a and the header tokenizers_cpp.h need to be added to my program, right?
Here is a SentencePiece test case. However, the returned result is empty. Could you please give some advice?
[debug] input_text = hello world
[debug] token_ids =
[debug] recover_text =
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include "tokenizers_cpp.h"
#include "sentencepiece_tokenizer.cc"

int main() {
  std::unique_ptr<tokenizers::Tokenizer> tokenizer =
      std::make_unique<tokenizers::SentencePieceTokenizer>("spiece.model");
  std::string text = "hello world";
  printf("[debug] input_text = %s\n", text.c_str());
  auto token_ids = tokenizer->Encode(text);
  printf("[debug] token_ids = ");
  for (const int token_id : token_ids) {
    printf("%d, ", token_id);
  }
  printf("\n");
  auto recover_text = tokenizer->Decode(token_ids);
  printf("[debug] recover_text = %s\n", recover_text.c_str());
  return 0;
}
All the interfaces take the model as a binary blob, not a file name.
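Because the constructors expect the model contents rather than a path, the caller has to read the model file into memory first. Below is a minimal standard-library sketch of such a loader. The name LoadBytesFromFile is illustrative; the factory in the usage comment (tokenizers::Tokenizer::FromBlobSentencePiece) is an assumption about tokenizers_cpp.h and should be checked against the header before use.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read an entire file (for example a SentencePiece "spiece.model") into a
// std::string. Binary mode keeps the serialized protobuf bytes unchanged,
// including embedded '\0' characters.
std::string LoadBytesFromFile(const std::string& path) {
  std::ifstream fs(path, std::ios::in | std::ios::binary);
  if (!fs) {
    throw std::runtime_error("cannot open " + path);
  }
  return std::string(std::istreambuf_iterator<char>(fs),
                     std::istreambuf_iterator<char>());
}

// Usage sketch (assumed factory name, verify against tokenizers_cpp.h):
//   std::string blob = LoadBytesFromFile("spiece.model");
//   auto tokenizer = tokenizers::Tokenizer::FromBlobSentencePiece(blob);
//   std::vector<int> ids = tokenizer->Encode("hello world");
```

Returning a std::string keeps the blob compatible with APIs that take `const std::string&`, and throwing on a missing file avoids silently handing an empty blob to the tokenizer, which would again produce empty results.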
I added some examples to https://github.com/mlc-ai/tokenizers-cpp. Please check them out, and feel free to send a PR to improve them further.
@tqchen Thanks for sharing the examples. The line target_link_libraries(tokenizers_c INTERFACE ${TOKENIZERS_RUST_LIB} ${CMAKE_DL_LIBS}) needs to be set in the tokenizers-cpp CMakeLists.txt; otherwise the build fails with the following error:
tokenizers/release/libtokenizers_c.a(std-946b15357ac77df4.std.1ade4ed0-cgu.0.rcgu.o): In function `std::sys::unix::weak::fetch':
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/sys/unix/weak.rs:138: undefined reference to `dlsym'
collect2: error: ld returned 1 exit status
CMakeFiles/example.dir/build.make:99: recipe for target 'example' failed
make[2]: *** [example] Error 1
CMakeFiles/Makefile2:165: recipe for target 'CMakeFiles/example.dir/all' failed
make[1]: *** [CMakeFiles/example.dir/all] Error 2
Makefile:155: recipe for target 'all' failed
make: *** [all] Error 2
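The undefined `dlsym` reference comes from the Rust standard library bundled into libtokenizers_c.a, which looks up some libc symbols by name at run time (the weak.rs frame in the log). On glibc versions before 2.34, dlsym lives in libdl rather than libc, so the final link needs -ldl; on Linux, ${CMAKE_DL_LIBS} expands to that library. The standalone sketch below performs the same kind of by-name lookup in C++ and, on such systems, likewise only links when -ldl is passed. The helpers LookupSymbol and HasSymbol are hypothetical names used for illustration.

```cpp
// RTLD_DEFAULT is a GNU extension on glibc; define _GNU_SOURCE before any
// system header is included (g++ normally predefines it already).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>  // dlsym, dlerror, RTLD_DEFAULT

#include <string>

// Look up a symbol by name in the objects already loaded into this process,
// similar in spirit to Rust std's weak-symbol fetch. Returns nullptr if the
// symbol is absent, so callers can fall back instead of failing.
// Build: g++ -std=c++17 lookup.cc -ldl   (-ldl is required on glibc < 2.34)
void* LookupSymbol(const std::string& name) {
  dlerror();  // clear any stale error state before the lookup
  return dlsym(RTLD_DEFAULT, name.c_str());
}

// True if the named symbol is available at run time.
bool HasSymbol(const std::string& name) {
  return LookupSymbol(name) != nullptr;
}
```

A static archive cannot record its own link dependencies, which is why the consuming target (here tokenizers_c) has to list the dl library explicitly next to the Rust archive.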
Thanks @songkq, would you mind sending a PR? I think we can detect the Linux system name and set it here https://github.com/mlc-ai/tokenizers-cpp/blob/main/CMakeLists.txt#L20 (just like Foundation for iOS).