mlc-ai/tokenizers-cpp

undefined symbol: open64 when running build.sh in web dir

helloburke opened this issue · 1 comments

When I run build.sh in the web directory, I get the following error:
`burke@instance-1:~/project/tmp/tokenizers-cpp-0.1.0/web$ ./build.sh

  • rustup target add wasm32-unknown-emscripten
    info: component 'rust-std' for target 'wasm32-unknown-emscripten' is up to date
  • mkdir -p build
  • cd build
  • emcmake cmake ../.. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-O3
    configure: cmake ../.. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-O3 -DCMAKE_TOOLCHAIN_FILE=/home/burke/project/emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake -DCMAKE_CROSSCOMPILING_EMULATOR=/home/burke/project/emsdk/node/16.20.0_64bit/bin/node
    -- system-nameEmscripten
    -- VERSION: 0.2.00
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
    -- Found Threads: TRUE
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/burke/project/tmp/tokenizers-cpp-0.1.0/web/build
  • emmake make tokenizers_cpp tokenizers_c sentencepiece-static -j8
    make: make tokenizers_cpp tokenizers_c sentencepiece-static -j8
    [ 2%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/arena.cc.o
    [ 4%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/arenastring.cc.o
    [ 6%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/bytestream.cc.o
    [ 8%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/common.cc.o
    [ 11%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/extension_set.cc.o
    [ 13%] Generating wasm32-unknown-emscripten/release/libtokenizers_c.a
    [ 15%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/coded_stream.cc.o
    [ 17%] Building CXX object CMakeFiles/tokenizer_cpp_objs.dir/src/sentencepiece_tokenizer.cc.o
    Updating crates.io index
    Compiling libc v0.2.147
    Compiling proc-macro2 v1.0.66
    Compiling unicode-ident v1.0.11
    Compiling cfg-if v1.0.0
    Compiling autocfg v1.1.0
    [ 20%] Building CXX object CMakeFiles/tokenizer_cpp_objs.dir/src/huggingface_tokenizer.cc.o
    [ 22%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/generated_enum_util.cc.o
    [ 24%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/generated_message_table_driven_lite.cc.o
    [ 26%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/generated_message_util.cc.o
    Compiling quote v1.0.32
    [ 28%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/implicit_weak_message.cc.o
    Compiling crossbeam-utils v0.8.16
    [ 31%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/int128.cc.o
    Compiling memchr v2.5.0
    Compiling syn v1.0.109
    [ 31%] Built target tokenizer_cpp_objs ] 11/103: syn(build.rs), memchr(build.rs)
    [ 33%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/io_win32.cc.o
    Compiling memoffset v0.9.0
    [ 35%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/message_lite.cc.o
    [ 37%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/parse_context.cc.o
    [ 40%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/repeated_field.cc.o
    [ 42%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/status.cc.o
    [ 44%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/statusor.cc.o
    [ 46%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/stringpiece.cc.o
    Compiling crossbeam-epoch v0.9.15
    [ 48%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/stringprintf.cc.o
    Compiling strsim v0.10.0
    [ 51%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/structurally_valid.cc.o
    [ 53%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/strutil.cc.o
    Compiling fnv v1.0.7
    Compiling ident_case v1.0.1
    [ 55%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/time.cc.o
    [ 57%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/wire_format_lite.cc.o
    [ 60%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/zero_copy_stream.cc.o
    [ 62%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/zero_copy_stream_impl.cc.o
    Compiling scopeguard v1.2.0
    [ 64%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/protobuf-lite/zero_copy_stream_impl_lite.cc.o
    Compiling serde v1.0.183
    [ 66%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/builtin_pb/sentencepiece.pb.cc.o
    Compiling darling_core v0.14.4
    [ 68%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/builtin_pb/sentencepiece_model.pb.cc.o
    [ 71%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/bpe_model.cc.o
    [ 73%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/char_model.cc.o
    [ 75%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/error.cc.o
    [ 77%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/filesystem.cc.o
    [ 80%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/model_factory.cc.o
    [ 82%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/model_interface.cc.o
    [ 84%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/normalizer.cc.o
    [ 86%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/sentencepiece_processor.cc.o
    [ 88%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/unigram_model.cc.o
    Compiling darling_macro v0.14.4
    [ 91%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/util.cc.o
    [ 93%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/word_model.cc.o
    [ 95%] Building CXX object sentencepiece/src/CMakeFiles/sentencepiece-static.dir/__/third_party/absl/flags/flag.cc.o
    Compiling cc v1.0.82
    Compiling syn v2.0.28
    Compiling either v1.9.0
    Compiling rayon-core v1.11.0
    Compiling pkg-config v0.3.27
    Compiling serde_derive v1.0.183
    Compiling onig_sys v69.8.1
    Compiling darling v0.14.4
    Compiling crossbeam-deque v0.8.3
    Compiling crossbeam-channel v0.5.8
    [ 97%] Linking CXX static library libsentencepiece.aossbeam-channel, serde
    [ 97%] Built target sentencepiece-static
    Compiling getrandom v0.2.10
    Compiling num_cpus v1.16.0
    Compiling paste v1.0.14
    Compiling rand_core v0.6.4
    Compiling derive_builder_core v0.12.0
    Compiling aho-corasick v1.0.3
    Compiling regex-syntax v0.7.4
    Compiling esaxx-rs v0.1.8
    Compiling thiserror v1.0.44
    Compiling minimal-lexical v0.2.1
    Compiling ppv-lite86 v0.2.17
    Compiling serde_json v1.0.104
    Compiling rand_chacha v0.3.1
    Compiling nom v7.1.3
    Compiling regex-automata v0.3.6
    Compiling derive_builder_macro v0.12.0
    Compiling rayon v1.7.0
    Compiling thiserror-impl v1.0.44
    Compiling monostate-impl v0.1.9
    Compiling itertools v0.8.2
    Compiling unicode-segmentation v1.10.1
    Compiling base64 v0.13.1
    Compiling itoa v1.0.9
    Compiling once_cell v1.18.0
    Compiling macro_rules_attribute-proc_macro v0.1.3
    Compiling ryu v1.0.15
    Compiling bitflags v1.3.2
    Compiling smallvec v1.11.0
    Compiling unicode-normalization-alignments v0.1.12
    Compiling onig v6.4.0
    Compiling macro_rules_attribute v0.1.3
    Compiling spm_precompiled v0.1.4
    Compiling rayon-cond v0.1.0
    Compiling monostate v0.1.9
    Compiling regex v1.9.3
    Compiling derive_builder v0.12.0
    Compiling rand v0.8.5
    Compiling itertools v0.9.0
    Compiling aho-corasick v0.7.20
    Compiling regex-syntax v0.6.29
    Compiling log v0.4.19
    Compiling lazy_static v1.4.0
    Compiling unicode_categories v0.1.1
    Compiling tokenizers v0.13.3
    Compiling tokenizers-c v0.1.0 (/home/burke/project/tmp/tokenizers-cpp-0.1.0/rust)
    Finished release [optimized] target(s) in 3m 47s
    [ 97%] Built target tokenizers_c
    [100%] Linking CXX static library libtokenizers_cpp.a
    [100%] Built target tokenizers_cpp
    [100%] Built target tokenizers_c
    Consolidate compiler generated dependencies of target sentencepiece-static
    [100%] Built target sentencepiece-static
  • cd ..
  • emcc --bind -o src/tokenizers_binding.js src/tokenizers_binding.cc build/libtokenizers_cpp.a build/libtokenizers_c.a build/sentencepiece/src/libsentencepiece.a -O3 -s EXPORT_ES6=1 -s MODULARIZE=1 -s SINGLE_FILE=1 -s EXPORTED_RUNTIME_METHODS=FS -s ALLOW_MEMORY_GROWTH=1 -I../include
    wasm-ld: error: build/libtokenizers_c.a(std-7970bb38e707128f.std.754a0857678cf097-cgu.0.rcgu.o): undefined symbol: open64
    wasm-ld: error: build/libtokenizers_c.a(std-7970bb38e707128f.std.754a0857678cf097-cgu.0.rcgu.o): undefined symbol: fstat64
    wasm-ld: error: build/libtokenizers_c.a(std-7970bb38e707128f.std.754a0857678cf097-cgu.0.rcgu.o): undefined symbol: lseek64
    emcc: error: '/home/burke/project/emsdk/upstream/bin/wasm-ld -o src/tokenizers_binding.wasm /tmp/emscripten_temp_ldj__90h/tokenizers_binding_0.o build/libtokenizers_cpp.a build/libtokenizers_c.a build/sentencepiece/src/libsentencepiece.a -L/home/burke/project/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten --whole-archive -lembind-rtti --no-whole-archive -lGL -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpzpxlwl8ulibemscripten_js_symbols.so --strip-debug --export-if-defined=main --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export=__get_temp_ret --export=__set_temp_ret --export=__wasm_call_ctors --export-table -z stack-size=65536 --initial-memory=16777216 --no-entry --max-memory=2147483648 --global-base=1024' failed (returned 1)`

How can I fix this?

Change this command in web/build.sh:

emcc --bind -o src/tokenizers_binding.js src/tokenizers_binding.cc \
  build/libtokenizers_cpp.a build/libtokenizers_c.a build/sentencepiece/src/libsentencepiece.a \
  -O3 -s EXPORT_ES6=0 -s MODULARIZE=1 -s SINGLE_FILE=1 -s EXPORTED_RUNTIME_METHODS=FS -s ALLOW_MEMORY_GROWTH=1 \
  -I../include

to:

emcc --bind -o src/tokenizers_binding.js src/tokenizers_binding.cc \
  build/libtokenizers_cpp.a build/libtokenizers_c.a build/sentencepiece/src/libsentencepiece.a \
  -O3 -s EXPORT_ES6=0 -s MODULARIZE=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0 -s SINGLE_FILE=1 -s EXPORTED_RUNTIME_METHODS=FS -s ALLOW_MEMORY_GROWTH=1 \
  -I../include

This fixes the build error.

Adding -s ERROR_ON_UNDEFINED_SYMBOLS=0 makes emcc tolerate the undefined symbols (open64, fstat64, lseek64) at link time instead of failing. The symbols are still missing, so a call to one of them would fail at runtime.
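
If you would rather not edit the file by hand, here is a minimal sketch of making the same change from the command line. It assumes the emcc command in web/build.sh contains the literal text `-s MODULARIZE=1` on a single line, as in the command quoted above. The function name `add_undefined_symbols_flag` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of tokenizers-cpp): inserts
# "-s ERROR_ON_UNDEFINED_SYMBOLS=0" into the emcc command of a build script.
# Assumption: the script contains the literal text "-s MODULARIZE=1".
add_undefined_symbols_flag() {
  script="$1"

  # Do nothing if the flag is already there, so running this twice is safe.
  if grep -q 'ERROR_ON_UNDEFINED_SYMBOLS' "$script"; then
    return 0
  fi

  # Insert the flag right after "-s MODULARIZE=1".
  # "-i.bak" edits in place and keeps a backup copy at "$script.bak";
  # this spelling is accepted by both GNU sed and BSD sed.
  sed -i.bak \
    's/-s MODULARIZE=1/-s MODULARIZE=1 -s ERROR_ON_UNDEFINED_SYMBOLS=0/' \
    "$script"
}
```

From the repository root this could be run as `add_undefined_symbols_flag web/build.sh`, followed by `./build.sh` in the web directory. The original script stays available as web/build.sh.bak in case the change needs to be reverted.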