fp16 does not work on CPU
nicolabertoldi commented
Bug description
I am not able to run inference with fp16 on CPU: marian-decoder aborts with "Unsupported type for element-wise operation: float16" (full log below).
How to reproduce
echo "▁Hello" | marian-decoder -m model.bin -v model.spv model.spv --cpu-threads 1 --precision float16
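For comparison (and as noted further below), the same command completes on CPU when the default float32 precision is used:
# works on CPU: default precision (float32)
echo "▁Hello" | marian-decoder -m model.bin -v model.spv model.spv --cpu-threads 1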
Context
- Marian version: v1.12.0 65bf82f 2023-02-21 09:56:29 -0800
- CMake command:
cmake .. -DCOMPILE_CPU=on -DCOMPILE_FP16=on
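For completeness, the rough build sequence was the following (the build directory name and make invocation are my reconstruction; only the cmake flags above are exact):
mkdir -p build && cd build
cmake .. -DCOMPILE_CPU=on -DCOMPILE_FP16=on
make -j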
./marian-decoder --build-info all
AVX2_FOUND=true
AVX512_FOUND=true
AVX_FOUND=true
BLAS_flexiblas_LIBRARY=BLAS_flexiblas_LIBRARY-NOTFOUND
BLAS_goto2_LIBRARY=BLAS_goto2_LIBRARY-NOTFOUND
BLAS_mkl_LIBRARY=BLAS_mkl_LIBRARY-NOTFOUND
BLAS_mkl_em64t_LIBRARY=BLAS_mkl_em64t_LIBRARY-NOTFOUND
BLAS_mkl_ia32_LIBRARY=BLAS_mkl_ia32_LIBRARY-NOTFOUND
BLAS_mkl_intel_LIBRARY=BLAS_mkl_intel_LIBRARY-NOTFOUND
BLAS_mkl_intel_lp64_LIBRARY=BLAS_mkl_intel_lp64_LIBRARY-NOTFOUND
BLAS_mkl_rt_LIBRARY=BLAS_mkl_rt_LIBRARY-NOTFOUND
BLAS_openblas_LIBRARY=/usr/lib/x86_64-linux-gnu/libopenblas.so
BUILD_ARCH=native
CMAKE_ADDR2LINE=/usr/bin/addr2line
CMAKE_AR=/usr/bin/ar
CMAKE_BUILD_TYPE=Release
CMAKE_COLOR_MAKEFILE=ON
CMAKE_CXX_COMPILER=/usr/bin/c++
CMAKE_CXX_COMPILER_AR=/usr/bin/gcc-ar-9
CMAKE_CXX_COMPILER_RANLIB=/usr/bin/gcc-ranlib-9
CMAKE_CXX_FLAGS=-std=c++11 -pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DUSE_SENTENCEPIECE -DCUDA_FOUND -DUSE_NCCL -DMKL_ILP64 -m64
CMAKE_CXX_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_CXX_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_CXX_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_C_COMPILER=/usr/bin/cc
CMAKE_C_COMPILER_AR=/usr/bin/gcc-ar-9
CMAKE_C_COMPILER_RANLIB=/usr/bin/gcc-ranlib-9
CMAKE_C_FLAGS=-pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -march=native -DMKL_ILP64 -m64
CMAKE_C_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_C_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_C_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -g -rdynamic
CMAKE_DLLTOOL=CMAKE_DLLTOOL-NOTFOUND
CMAKE_INSTALL_BINDIR=bin
CMAKE_INSTALL_DATAROOTDIR=share
CMAKE_INSTALL_INCLUDEDIR=include
CMAKE_INSTALL_LIBDIR=lib
CMAKE_INSTALL_LIBEXECDIR=libexec
CMAKE_INSTALL_LOCALSTATEDIR=var
CMAKE_INSTALL_OLDINCLUDEDIR=/usr/include
CMAKE_INSTALL_PREFIX=/usr/local
CMAKE_INSTALL_SBINDIR=sbin
CMAKE_INSTALL_SHAREDSTATEDIR=com
CMAKE_INSTALL_SYSCONFDIR=etc
CMAKE_LINKER=/usr/bin/ld
CMAKE_MAKE_PROGRAM=/usr/bin/make
CMAKE_NM=/usr/bin/nm
CMAKE_OBJCOPY=/usr/bin/objcopy
CMAKE_OBJDUMP=/usr/bin/objdump
CMAKE_RANLIB=/usr/bin/ranlib
CMAKE_READELF=/usr/bin/readelf
CMAKE_SKIP_INSTALL_RPATH=NO
CMAKE_SKIP_RPATH=NO
CMAKE_STRIP=/usr/bin/strip
CMAKE_TAPI=CMAKE_TAPI-NOTFOUND
CMAKE_VERBOSE_MAKEFILE=FALSE
COMPILE-FP16=on
COMPILE_AMPERE=ON
COMPILE_AMPERE_RTX=ON
COMPILE_AVX=ON
COMPILE_AVX2=ON
COMPILE_AVX512=ON
COMPILE_CPU=on
COMPILE_CUDA=ON
COMPILE_EXAMPLES=OFF
COMPILE_KEPLER=OFF
COMPILE_LIBRARY_ONLY=OFF
COMPILE_MAXWELL=OFF
COMPILE_PASCAL=ON
COMPILE_SERVER=OFF
COMPILE_SSE2=ON
COMPILE_SSE3=ON
COMPILE_SSE4_1=ON
COMPILE_SSE4_2=ON
COMPILE_TESTS=OFF
COMPILE_TURING=ON
COMPILE_VOLTA=ON
CUDA_64_BIT_DEVICE_CODE=ON
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE=ON
CUDA_BUILD_CUBIN=OFF
CUDA_BUILD_EMULATION=OFF
CUDA_CUDART_LIBRARY=/usr/local/cuda-11.8/lib64/libcudart.so
CUDA_CUDA_LIBRARY=/usr/lib/x86_64-linux-gnu/libcuda.so
CUDA_HOST_COMPILATION_CPP=ON
CUDA_HOST_COMPILER=/usr/bin/cc
CUDA_NVCC_EXECUTABLE=/usr/local/cuda-11.8/bin/nvcc
CUDA_NVCC_FLAGS=-DUSE_SENTENCEPIECE-DCUDA_FOUND-DUSE_NCCL--default-streamper-thread-O3-g--use_fast_math-Wno-deprecated-gpu-targets-gencode=arch=compute_60,code=sm_60-gencode=arch=compute_61,code=sm_61-arch=sm_70-gencode=arch=compute_70,code=sm_70-gencode=arch=compute_70,code=compute_70-gencode=arch=compute_75,code=sm_75-gencode=arch=compute_75,code=compute_75-gencode=arch=compute_80,code=sm_80-gencode=arch=compute_80,code=compute_80-gencode=arch=compute_86,code=sm_86-gencode=arch=compute_86,code=compute_86-ccbin/usr/bin/cc-std=c++11-Xcompiler -fPIC-Xcompiler -Wno-unused-result-Xcompiler -Wno-deprecated-Xcompiler -Wno-pragmas-Xcompiler -Wno-unused-value-Xcompiler -Werror-DDETERMINISTIC=0
CUDA_OpenCL_LIBRARY=/usr/local/cuda-11.8/lib64/libOpenCL.so
CUDA_PROPAGATE_HOST_FLAGS=OFF
CUDA_SDK_ROOT_DIR=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION=OFF
CUDA_TOOLKIT_INCLUDE=/usr/local/cuda-11.8/include
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8
CUDA_USE_STATIC_CUDA_RUNTIME=ON
CUDA_VERBOSE_BUILD=OFF
CUDA_VERSION=11.8
CUDA_cublasLt_LIBRARY=/usr/local/cuda-11.8/lib64/libcublasLt.so
CUDA_cublas_LIBRARY=/usr/local/cuda-11.8/lib64/libcublas.so
CUDA_cudadevrt_LIBRARY=/usr/local/cuda-11.8/lib64/libcudadevrt.a
CUDA_cudart_static_LIBRARY=/usr/local/cuda-11.8/lib64/libcudart_static.a
CUDA_cufft_LIBRARY=/usr/local/cuda-11.8/lib64/libcufft.so
CUDA_cupti_LIBRARY=/usr/local/cuda-11.8/extras/CUPTI/lib64/libcupti.so
CUDA_curand_LIBRARY=/usr/local/cuda-11.8/lib64/libcurand.so
CUDA_cusolver_LIBRARY=/usr/local/cuda-11.8/lib64/libcusolver.so
CUDA_cusparse_LIBRARY=/usr/local/cuda-11.8/lib64/libcusparse.so
CUDA_nppc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppc.so
CUDA_nppial_LIBRARY=/usr/local/cuda-11.8/lib64/libnppial.so
CUDA_nppicc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppicc.so
CUDA_nppidei_LIBRARY=/usr/local/cuda-11.8/lib64/libnppidei.so
CUDA_nppif_LIBRARY=/usr/local/cuda-11.8/lib64/libnppif.so
CUDA_nppig_LIBRARY=/usr/local/cuda-11.8/lib64/libnppig.so
CUDA_nppim_LIBRARY=/usr/local/cuda-11.8/lib64/libnppim.so
CUDA_nppist_LIBRARY=/usr/local/cuda-11.8/lib64/libnppist.so
CUDA_nppisu_LIBRARY=/usr/local/cuda-11.8/lib64/libnppisu.so
CUDA_nppitc_LIBRARY=/usr/local/cuda-11.8/lib64/libnppitc.so
CUDA_npps_LIBRARY=/usr/local/cuda-11.8/lib64/libnpps.so
CUDA_nvToolsExt_LIBRARY=/usr/local/cuda-11.8/lib64/libnvToolsExt.so
CUDA_rt_LIBRARY=/usr/lib/x86_64-linux-gnu/librt.so
DETERMINISTIC=OFF
DOXYGEN_DOT_EXECUTABLE=/usr/bin/dot
DOXYGEN_EXECUTABLE=DOXYGEN_EXECUTABLE-NOTFOUND
GENERATE_MARIAN_INSTALL_TARGETS=OFF
GIT_EXECUTABLE=/usr/bin/git
INTEL_ROOT=/opt/intel
INTGEMM_CPUID_ENVIRONMENT=ON
INTGEMM_DONT_BUILD_TESTS=ON
MKL_CORE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_core.a
MKL_INCLUDE_DIR=/opt/intel/mkl/include
MKL_INCLUDE_DIRS=/opt/intel/mkl/include
MKL_INTERFACE_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a
MKL_LIBRARIES=-Wl,--start-group/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a/opt/intel/mkl/lib/intel64/libmkl_sequential.a/opt/intel/mkl/lib/intel64/libmkl_core.a-Wl,--end-group
MKL_ROOT=/opt/intel/mkl
MKL_SEQUENTIAL_LAYER_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_sequential.a
SPM_ARTIFACT_NAME=sentencepiece
SPM_BUILD_TEST=OFF
SPM_COVERAGE=OFF
SPM_ENABLE_NFKC_COMPILE=OFF
SPM_ENABLE_SHARED=OFF
SPM_ENABLE_TCMALLOC=ON
SPM_ENABLE_TENSORFLOW_SHARED=OFF
SPM_NO_THREADLOCAL=OFF
SPM_TCMALLOC_STATIC=OFF
SPM_USE_BUILTIN_PROTOBUF=ON
SQLITE_ENABLE_ASSERT_HANDLER=OFF
SQLITE_ENABLE_COLUMN_METADATA=ON
SQLITE_USE_LEGACY_STRUCT=OFF
SSE2_FOUND=true
SSE3_FOUND=true
SSE4_1_FOUND=true
SSE4_2_FOUND=true
SSSE3_FOUND=true
TCMALLOC_LIB=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
USE_APPLE_ACCELERATE=OFF
USE_CCACHE=OFF
USE_CUDNN=OFF
USE_DOXYGEN=ON
USE_FBGEMM=OFF
USE_MKL=ON
USE_MPI=OFF
USE_NCCL=ON
USE_OPENMP=OFF
USE_SENTENCEPIECE=ON
USE_STATIC_LIBS=OFF
WORMHOLE=OFF
cblas_openblas_INCLUDE=/usr/include/x86_64-linux-gnu
cblas_openblas_LIBRARY=/usr/lib/x86_64-linux-gnu/libopenblas.so
- Log file:
Here is the marian-decoder log:
[2023-12-19 18:14:58] [config] allow-unk: false
[2023-12-19 18:14:58] [config] authors: false
[2023-12-19 18:14:58] [config] beam-size: 12
[2023-12-19 18:14:58] [config] bert-class-symbol: "[CLS]"
[2023-12-19 18:14:58] [config] bert-mask-symbol: "[MASK]"
[2023-12-19 18:14:58] [config] bert-masking-fraction: 0.15
[2023-12-19 18:14:58] [config] bert-sep-symbol: "[SEP]"
[2023-12-19 18:14:58] [config] bert-train-type-embeddings: true
[2023-12-19 18:14:58] [config] bert-type-vocab-size: 2
[2023-12-19 18:14:58] [config] best-deep: false
[2023-12-19 18:14:58] [config] build-info: ""
[2023-12-19 18:14:58] [config] check-nan: false
[2023-12-19 18:14:58] [config] cite: false
[2023-12-19 18:14:58] [config] cpu-threads: 1
[2023-12-19 18:14:58] [config] data-threads: 4
[2023-12-19 18:14:58] [config] dec-cell: ssru
[2023-12-19 18:14:58] [config] dec-cell-base-depth: 2
[2023-12-19 18:14:58] [config] dec-cell-high-depth: 1
[2023-12-19 18:14:58] [config] dec-depth: 2
[2023-12-19 18:14:58] [config] devices:
[2023-12-19 18:14:58] [config] - 0
[2023-12-19 18:14:58] [config] dim-emb: 256
[2023-12-19 18:14:58] [config] dim-rnn: 512
[2023-12-19 18:14:58] [config] dim-vocabs:
[2023-12-19 18:14:58] [config] - 16000
[2023-12-19 18:14:58] [config] - 16000
[2023-12-19 18:14:58] [config] dump-config: ""
[2023-12-19 18:14:58] [config] enc-cell: gru
[2023-12-19 18:14:58] [config] enc-cell-depth: 1
[2023-12-19 18:14:58] [config] enc-depth: 4
[2023-12-19 18:14:58] [config] enc-type: bidirectional
[2023-12-19 18:14:58] [config] factors-combine: sum
[2023-12-19 18:14:58] [config] factors-dim-emb: 0
[2023-12-19 18:14:58] [config] force-decode: false
[2023-12-19 18:14:58] [config] gemm-type: float32
[2023-12-19 18:14:58] [config] ignore-model-config: false
[2023-12-19 18:14:58] [config] input:
[2023-12-19 18:14:58] [config] - stdin
[2023-12-19 18:14:58] [config] input-types:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] interpolate-env-vars: false
[2023-12-19 18:14:58] [config] layer-normalization: true
[2023-12-19 18:14:58] [config] lemma-dependency: ""
[2023-12-19 18:14:58] [config] lemma-dim-emb: 0
[2023-12-19 18:14:58] [config] log: ""
[2023-12-19 18:14:58] [config] log-level: info
[2023-12-19 18:14:58] [config] log-time-zone: ""
[2023-12-19 18:14:58] [config] max-length: 1000
[2023-12-19 18:14:58] [config] max-length-crop: false
[2023-12-19 18:14:58] [config] max-length-factor: 3
[2023-12-19 18:14:58] [config] maxi-batch: 1
[2023-12-19 18:14:58] [config] maxi-batch-sort: none
[2023-12-19 18:14:58] [config] mini-batch: 1
[2023-12-19 18:14:58] [config] mini-batch-words: 0
[2023-12-19 18:14:58] [config] model-mmap: false
[2023-12-19 18:14:58] [config] models:
[2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.bin
[2023-12-19 18:14:58] [config] n-best: false
[2023-12-19 18:14:58] [config] no-spm-decode: false
[2023-12-19 18:14:58] [config] normalize: 0
[2023-12-19 18:14:58] [config] num-devices: 0
[2023-12-19 18:14:58] [config] optimize: false
[2023-12-19 18:14:58] [config] output: stdout
[2023-12-19 18:14:58] [config] output-approx-knn:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] output-omit-bias: false
[2023-12-19 18:14:58] [config] output-sampling:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] precision:
[2023-12-19 18:14:58] [config] - float16
[2023-12-19 18:14:58] [config] quantize-range: 0
[2023-12-19 18:14:58] [config] quiet: false
[2023-12-19 18:14:58] [config] quiet-translation: false
[2023-12-19 18:14:58] [config] relative-paths: false
[2023-12-19 18:14:58] [config] right-left: false
[2023-12-19 18:14:58] [config] seed: 0
[2023-12-19 18:14:58] [config] shortlist:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] skip: false
[2023-12-19 18:14:58] [config] skip-cost: false
[2023-12-19 18:14:58] [config] stat-freq: 0
[2023-12-19 18:14:58] [config] tied-embeddings: false
[2023-12-19 18:14:58] [config] tied-embeddings-all: true
[2023-12-19 18:14:58] [config] tied-embeddings-src: false
[2023-12-19 18:14:58] [config] transformer-aan-activation: swish
[2023-12-19 18:14:58] [config] transformer-aan-depth: 2
[2023-12-19 18:14:58] [config] transformer-aan-nogate: false
[2023-12-19 18:14:58] [config] transformer-decoder-autoreg: rnn
[2023-12-19 18:14:58] [config] transformer-decoder-dim-ffn: 0
[2023-12-19 18:14:58] [config] transformer-decoder-ffn-depth: 0
[2023-12-19 18:14:58] [config] transformer-depth-scaling: false
[2023-12-19 18:14:58] [config] transformer-dim-aan: 1024
[2023-12-19 18:14:58] [config] transformer-dim-ffn: 1024
[2023-12-19 18:14:58] [config] transformer-ffn-activation: relu
[2023-12-19 18:14:58] [config] transformer-ffn-depth: 2
[2023-12-19 18:14:58] [config] transformer-guided-alignment-layer: last
[2023-12-19 18:14:58] [config] transformer-heads: 4
[2023-12-19 18:14:58] [config] transformer-no-projection: false
[2023-12-19 18:14:58] [config] transformer-pool: false
[2023-12-19 18:14:58] [config] transformer-postprocess: dan
[2023-12-19 18:14:58] [config] transformer-postprocess-emb: d
[2023-12-19 18:14:58] [config] transformer-postprocess-top: ""
[2023-12-19 18:14:58] [config] transformer-preprocess: ""
[2023-12-19 18:14:58] [config] transformer-rnn-projection: false
[2023-12-19 18:14:58] [config] transformer-tied-layers:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] transformer-train-position-embeddings: false
[2023-12-19 18:14:58] [config] tsv: false
[2023-12-19 18:14:58] [config] tsv-fields: 0
[2023-12-19 18:14:58] [config] type: transformer
[2023-12-19 18:14:58] [config] ulr: false
[2023-12-19 18:14:58] [config] ulr-dim-emb: 0
[2023-12-19 18:14:58] [config] ulr-trainable-transformation: false
[2023-12-19 18:14:58] [config] version: v1.10.14; 1cc16bc4 2022-10-10 12:00:00 +0200
[2023-12-19 18:14:58] [config] vocabs:
[2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.spv
[2023-12-19 18:14:58] [config] - /data/workspace/experiments/marian/models/default/model.spv
[2023-12-19 18:14:58] [config] weights:
[2023-12-19 18:14:58] [config] []
[2023-12-19 18:14:58] [config] word-penalty: 0
[2023-12-19 18:14:58] [config] word-scores: false
[2023-12-19 18:14:58] [config] workspace: 512
[2023-12-19 18:14:58] [config] Loaded model has been created with Marian v1.10.14; 1cc16bc4 2022-10-10 12:00:00 +0200
[2023-12-19 18:14:58] [data] Loading vocabulary from text file /data/workspace/experiments/marian/models/default/model.spv
[2023-12-19 18:14:58] [data] Loading vocabulary from text file /data/workspace/experiments/marian/models/default/model.spv
[2023-12-19 18:14:58] Loading model from /data/workspace/experiments/marian/models/default/model.bin
[2023-12-19 18:14:58] [memory] Extending reserved space to 512 MB (device cpu0)
[2023-12-19 18:14:58] Loaded model config
[2023-12-19 18:14:58] Loading scorer of type transformer as feature F0
[2023-12-19 18:14:59] [memory] Reserving 81 MB, device cpu0
[2023-12-19 18:14:59] Error: Unsupported type for element-wise operation: float16
[2023-12-19 18:14:59] Error: Aborted from void marian::cpu::Element(const Functor&, marian::Tensor, Tensors ...) [with Functor = marian::functional::Assign<marian::functional::Var<1>, marian::functional::BinaryFunctor<marian::functional::elem::Mult, marian::functional::Capture, marian::functional::Assignee<2> > >; Tensors = {IntrusivePtr<marian::TensorBase>}; marian::Tensor = IntrusivePtr<marian::TensorBase>] in /data/workspace/code/marian/src/tensors/cpu/element.h:122
[CALL STACK]
[0x55c3c328d12e] void marian::cpu:: Element <marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>,IntrusivePtr<marian::TensorBase>>(marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>> const&, IntrusivePtr<marian::TensorBase>, IntrusivePtr<marian::TensorBase>) + 0x2ce
[0x55c3c328d76c] void marian:: Element <marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>,IntrusivePtr<marian::TensorBase>>(marian::functional::Assign<marian::functional::Var<1>,marian::functional::BinaryFunctor<marian::functional::elem::Mult,marian::functional::Capture,marian::functional::Assignee<2>>>, IntrusivePtr<marian::TensorBase>, IntrusivePtr<marian::TensorBase>) + 0x1bc
[0x55c3c328d8a1] std::_Function_handler<void (),marian::ScalarMultNodeOp::forwardOps()::{lambda()#1}>:: _M_invoke (std::_Any_data const&) + 0xa1
[0x55c3c32a653f] marian::Node:: forward () + 0x21f
[0x55c3c311f8f5] marian::ExpressionGraph:: forward (std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&, bool) + 0x205
[0x55c3c3121019] marian::ExpressionGraph:: forwardNext () + 0x2d9
[0x55c3c32f5459] marian::BeamSearch:: search (std::shared_ptr<marian::ExpressionGraph>, std::shared_ptr<marian::data::CorpusBatch>) + 0x42f9
[0x55c3c2fb2e8a] marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}:: operator() (unsigned long) const + 0x6ba
[0x55c3c2fb4de4] marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1}:: operator() () const + 0x34
[0x55c3c2fb59c4] std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>:: _M_invoke (std::_Any_data const&) + 0x34
[0x55c3c2f5865d] std::__future_base::_State_baseV2:: _M_do_set (std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 0x2d
[0x7fdeb06c34df] + 0x114df
[0x55c3c2f59b7c] std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&,unsigned long>(std::result_of&&,(marian::Translate<marian::BeamSearch>::run()::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>:: _M_run () + 0xfc
[0x55c3c2f5900e] std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>:: _M_run () + 0x16e
[0x7fdeb05a6df4] + 0xd6df4
[0x7fdeb06ba609] + 0x8609
[0x7fdeb0291353] clone + 0x43
Aborted (core dumped)
Additional observations:
- Inference works properly on CPU without --fp16.
- Inference works properly on GPU both with and without --fp16.
- The decoder cell type is: dec-cell: ssru
Actually, looking at this, it seems that float16 is not enabled on CPU at all, whereas the documentation (https://marian-nmt.github.io/docs/cmd/marian-decoder/) says it is supported.
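As a quick sanity check, the binary's own help output can be inspected for the --precision option (assuming the help text lists it, as in other Marian builds):
# show what the installed binary reports for --precision
marian-decoder --help | grep -A 2 "precision"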
emjotde commented
Hi, fp16 isn't a CPU type; it's GPU-only. The error message could be a bit clearer, but this is not supposed to work.
nicolabertoldi commented
Thank you very much for your reply.
I would suggest clarifying this in the documentation as well.
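As a workaround I will stick with CPU inference at the default float32 precision. If I need more CPU throughput, I may also try converting the model to an 8-bit format with marian-conv (I believe it supports --gemm-type intgemm8, but I still have to verify the exact flags against my build):
# hypothetical paths; marian-conv usually takes the original .npz checkpoint
marian-conv -f model.npz -t model.intgemm8.bin --gemm-type intgemm8
echo "▁Hello" | marian-decoder -m model.intgemm8.bin -v model.spv model.spv --cpu-threads 1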