`minimum` makes Julia crash on A64FX
giordano opened this issue · 4 comments
$ JULIA_LLVM_ARGS="--aarch64-sve-vector-bits-min=512" julia -q
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Linux (aarch64-unknown-linux-gnu)
CPU: unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, a64fx)
Environment:
JULIA_LLVM_ARGS = --aarch64-sve-vector-bits-min=512
julia> minimum([1])
LLVM ERROR: Cannot select: 0xadee70: v2i64 = AArch64ISD::DUPLANE64 0xada7e0, Constant:i64<1>, reduce.jl:638
0xada7e0: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0x917e98, 0xadd400, reduce.jl:638
0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xadeed8: i64 = TargetConstant<2>
0x917e98: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0xae1c30, 0xadce50, reduce.jl:638
0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xadeed8: i64 = TargetConstant<2>
0xae1c30: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0xae24b8, 0xadcf20, reduce.jl:638
0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xadeed8: i64 = TargetConstant<2>
0xae24b8: nxv2i64 = insert_subvector undef:nxv2i64, 0xada230, Constant:i64<0>, reduce.jl:638
0xadefa8: nxv2i64 = undef
0xada230: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %58, reduce.jl:638
0xadf280: v2i64 = Register %58
0xada640: i64 = Constant<0>
0xadcf20: nxv2i64 = insert_subvector undef:nxv2i64, 0xadcf88, Constant:i64<0>, reduce.jl:638
0xadefa8: nxv2i64 = undef
0xadcf88: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %59, reduce.jl:638
0xadd878: v2i64 = Register %59
0xada640: i64 = Constant<0>
0xadce50: nxv2i64 = insert_subvector undef:nxv2i64, 0xadcff0, Constant:i64<0>, reduce.jl:638
0xadefa8: nxv2i64 = undef
0xadcff0: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %60, reduce.jl:638
0xada298: v2i64 = Register %60
0xada640: i64 = Constant<0>
0xadd400: nxv2i64 = insert_subvector undef:nxv2i64, 0xaaf548, Constant:i64<0>, reduce.jl:638
0xadefa8: nxv2i64 = undef
0xaaf548: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %61, reduce.jl:638
0xadd6d8: v2i64 = Register %61
0xada640: i64 = Constant<0>
0xada438: i64 = Constant<1>
In function: julia_mapreduce_impl_65
signal (6): Aborted
in expression starting at REPL[2]:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so (unknown line)
Allocations: 649721 (Pool: 649264; Big: 457); GC: 1
Aborted
This is a reduced reproducer of the crashes you get with `BenchmarkTools.@benchmark` (in particular the `BenchmarkTools.asciihist` function).
Edit: looking at the error message referencing `mapreduce_impl`, I've got a more basic reproducer:
julia> Base.mapreduce_impl(identity, min, [1], 1, 1)
LLVM ERROR: Cannot select: 0xad9c80: v2i64 = AArch64ISD::DUPLANE64 0x917dc8, Constant:i64<1>, reduce.jl:638
0x917dc8: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xadd608, 0xadf968, reduce.jl:638
0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xae5210: i64 = TargetConstant<2>
0xadd608: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xadd878, 0xadeda0, reduce.jl:638
0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xae5210: i64 = TargetConstant<2>
0xadd878: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xae57c0, 0xadf760, reduce.jl:638
0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
0xae5210: i64 = TargetConstant<2>
0xae57c0: nxv2i64 = insert_subvector undef:nxv2i64, 0xadf558, Constant:i64<0>, reduce.jl:638
0xad9c18: nxv2i64 = undef
0xadf558: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %58, reduce.jl:638
0xaddae8: v2i64 = Register %58
0xae1ea0: i64 = Constant<0>
0xadf760: nxv2i64 = insert_subvector undef:nxv2i64, 0xae28c8, Constant:i64<0>, reduce.jl:638
0xad9c18: nxv2i64 = undef
0xae28c8: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %59, reduce.jl:638
0x9477b8: v2i64 = Register %59
0xae1ea0: i64 = Constant<0>
0xadeda0: nxv2i64 = insert_subvector undef:nxv2i64, 0xae22b0, Constant:i64<0>, reduce.jl:638
0xad9c18: nxv2i64 = undef
0xae22b0: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %60, reduce.jl:638
0xada7e0: v2i64 = Register %60
0xae1ea0: i64 = Constant<0>
0xadf968: nxv2i64 = insert_subvector undef:nxv2i64, 0xad9d50, Constant:i64<0>, reduce.jl:638
0xad9c18: nxv2i64 = undef
0xad9d50: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %61, reduce.jl:638
0xae4fa0: v2i64 = Register %61
0xae1ea0: i64 = Constant<0>
0xae5688: i64 = Constant<1>
In function: julia_mapreduce_impl_277
First part of the backtrace in GDB:
(gdb) bt
#0 0x0000400000132bec in raise () from /lib64/libc.so.6
#1 0x000040000012096c in abort () from /lib64/libc.so.6
#2 0x0000400001199c60 in llvm::report_fatal_error(llvm::Twine const&, bool) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#3 0x0000400001199d98 in llvm::report_fatal_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#4 0x00004000019c2698 in llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#5 0x00004000019c5320 in llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#6 0x0000400003017994 in (anonymous namespace)::AArch64DAGToDAGISel::Select(llvm::SDNode*) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#7 0x00004000019c1310 in llvm::SelectionDAGISel::DoInstructionSelection() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#8 0x00004000019c7fa8 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#9 0x00004000019cab30 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#10 0x00004000019cc414 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) [clone .part.869] () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#11 0x00004000015b7ad4 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#12 0x00004000013705c4 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#13 0x0000400001370d08 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#14 0x000040000136fa54 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#15 0x00004000004741d0 in JuliaOJIT::CompilerT::operator() (this=0x0, M=...) at /buildworker/worker/package_linuxaarch64/build/src/jitlayers.cpp:612
#16 0x0000400002b0f394 in llvm::orc::IRCompileLayer::emit(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >, llvm::orc::ThreadSafeModule)
() from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#17 0x0000400002b197fc in llvm::orc::BasicIRLayerMaterializationUnit::materialize(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#18 0x0000400002af0fb0 in llvm::orc::ExecutionSession::materializeOnCurrentThread(std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#19 0x0000400002aef2e4 in std::_Function_handler<void (std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >), void (*)(std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >)>::_M_invoke(std::_Any_data const&, std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >&&, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >&&) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#20 0x0000400002aefa98 in llvm::orc::ExecutionSession::dispatchOutstandingMUs() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#21 0x0000400002af75cc in llvm::orc::ExecutionSession::OL_completeLookup(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >, std::shared_ptr<llvm::orc::AsynchronousSymbolQuery>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#22 0x0000400002af7ac4 in llvm::orc::InProgressFullLookupState::complete(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#23 0x0000400002ae8f7c in llvm::orc::ExecutionSession::OL_applyQueryPhase1(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >, llvm::Error) ()
from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#24 0x0000400002aefd64 in llvm::orc::ExecutionSession::lookup(llvm::orc::LookupKind, std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet, llvm::orc::SymbolState, llvm::unique_function<void (llvm::Expected<llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol> > >)>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#25 0x0000400002af03f0 in llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet const&, llvm::orc::LookupKind, llvm::orc::SymbolState, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#26 0x0000400002af074c in llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolStringPtr, llvm::orc::SymbolState) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
Contrary to #44263, LLVM can generate the code:
julia> @code_llvm debuginfo=:none Base.mapreduce_impl(identity, min, [1], 1, 1)
define i64 @julia_mapreduce_impl_186({}* nonnull align 16 dereferenceable(40) %0, i64 signext %1, i64 signext %2) #0 {
top:
%3 = alloca [1 x i64], align 8
%4 = add i64 %1, -1
%5 = bitcast {}* %0 to i64**
%6 = load i64*, i64** %5, align 8
%7 = getelementptr inbounds i64, i64* %6, i64 %4
%8 = load i64, i64* %7, align 8
%9 = add i64 %1, 1
%10 = add i64 %1, 253
%11 = add i64 %2, -3
%.not63 = icmp sgt i64 %10, %11
br i1 %.not63, label %L107, label %L30.lr.ph
L30.lr.ph: ; preds = %top
%12 = getelementptr inbounds [1 x i64], [1 x i64]* %3, i64 0, i64 0
%13 = bitcast {}* %0 to {}**
%14 = getelementptr inbounds {}*, {}** %13, i64 3
%15 = bitcast {}** %14 to i64*
br label %L30
L30: ; preds = %L98, %L30.lr.ph
%value_phi570 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi21, %L98 ]
%value_phi469 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi20, %L98 ]
%value_phi368 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi19, %L98 ]
%value_phi267 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi18, %L98 ]
%value_phi165 = phi i64 [ %9, %L30.lr.ph ], [ %40, %L98 ]
%value_phi64 = phi i64 [ %10, %L30.lr.ph ], [ %41, %L98 ]
%16 = call i64 @j_steprange_last_188(i64 signext %value_phi165, i64 signext 4, i64 signext %value_phi64) #0
%.not41 = icmp sgt i64 %value_phi165, %16
br i1 %.not41, label %L81, label %L47.preheader
L47.preheader: ; preds = %L30
%17 = load i64*, i64** %5, align 8
br label %L47
L47: ; preds = %L47, %L47.preheader
%value_phi9 = phi i64 [ %24, %L47 ], [ %value_phi267, %L47.preheader ]
%value_phi10 = phi i64 [ %28, %L47 ], [ %value_phi368, %L47.preheader ]
%value_phi11 = phi i64 [ %32, %L47 ], [ %value_phi469, %L47.preheader ]
%value_phi12 = phi i64 [ %21, %L47 ], [ %value_phi570, %L47.preheader ]
%value_phi13 = phi i64 [ %33, %L47 ], [ %value_phi165, %L47.preheader ]
%18 = add i64 %value_phi13, -1
%19 = getelementptr inbounds i64, i64* %17, i64 %18
%20 = load i64, i64* %19, align 8
%.not42 = icmp slt i64 %20, %value_phi12
%21 = select i1 %.not42, i64 %20, i64 %value_phi12
%22 = getelementptr inbounds i64, i64* %17, i64 %value_phi13
%23 = load i64, i64* %22, align 8
%.not43 = icmp slt i64 %23, %value_phi9
%24 = select i1 %.not43, i64 %23, i64 %value_phi9
%25 = add i64 %value_phi13, 1
%26 = getelementptr inbounds i64, i64* %17, i64 %25
%27 = load i64, i64* %26, align 8
%.not44 = icmp slt i64 %27, %value_phi10
%28 = select i1 %.not44, i64 %27, i64 %value_phi10
%29 = add i64 %value_phi13, 2
%30 = getelementptr inbounds i64, i64* %17, i64 %29
%31 = load i64, i64* %30, align 8
%.not45 = icmp slt i64 %31, %value_phi11
%32 = select i1 %.not45, i64 %31, i64 %value_phi11
%.not46 = icmp eq i64 %value_phi13, %16
%33 = add i64 %value_phi13, 4
br i1 %.not46, label %L81, label %L47
L81: ; preds = %L47, %L30
%value_phi18 = phi i64 [ %value_phi267, %L30 ], [ %24, %L47 ]
%value_phi19 = phi i64 [ %value_phi368, %L30 ], [ %28, %L47 ]
%value_phi20 = phi i64 [ %value_phi469, %L30 ], [ %32, %L47 ]
%value_phi21 = phi i64 [ %value_phi570, %L30 ], [ %21, %L47 ]
%34 = add i64 %value_phi64, 3
%35 = load i64, i64* %15, align 8
%36 = icmp slt i64 %34, 1
%37 = icmp sgt i64 %34, %35
%38 = or i1 %36, %37
br i1 %38, label %L96, label %L98
L96: ; preds = %L81
store i64 %34, i64* %12, align 8
%39 = call nonnull {}* @j_throw_boundserror_189({}* nonnull %0, [1 x i64]* nocapture readonly %3) #0
call void @llvm.trap()
unreachable
L98: ; preds = %L81
%40 = add i64 %value_phi165, 256
%41 = add i64 %value_phi64, 256
%.not = icmp sgt i64 %41, %11
br i1 %.not, label %L6.L107_crit_edge, label %L30
L103: ; preds = %L126, %middle.block, %L107
%merge = phi i64 [ %44, %L107 ], [ %68, %middle.block ], [ %72, %L126 ]
ret i64 %merge
L6.L107_crit_edge: ; preds = %L98
store i64 %34, i64* %12, align 8
br label %L107
L107: ; preds = %L6.L107_crit_edge, %top
%value_phi1.lcssa = phi i64 [ %40, %L6.L107_crit_edge ], [ %9, %top ]
%value_phi2.lcssa = phi i64 [ %value_phi18, %L6.L107_crit_edge ], [ %8, %top ]
%value_phi3.lcssa = phi i64 [ %value_phi19, %L6.L107_crit_edge ], [ %8, %top ]
%value_phi4.lcssa = phi i64 [ %value_phi20, %L6.L107_crit_edge ], [ %8, %top ]
%value_phi5.lcssa = phi i64 [ %value_phi21, %L6.L107_crit_edge ], [ %8, %top ]
%.not47 = icmp slt i64 %value_phi2.lcssa, %value_phi5.lcssa
%42 = select i1 %.not47, i64 %value_phi2.lcssa, i64 %value_phi5.lcssa
%.not48 = icmp slt i64 %value_phi4.lcssa, %value_phi3.lcssa
%43 = select i1 %.not48, i64 %value_phi4.lcssa, i64 %value_phi3.lcssa
%.not49 = icmp slt i64 %43, %42
%44 = select i1 %.not49, i64 %43, i64 %42
%.not50 = icmp sgt i64 %value_phi1.lcssa, %2
%45 = add i64 %value_phi1.lcssa, -1
%46 = select i1 %.not50, i64 %45, i64 %2
%.not51 = icmp slt i64 %46, %value_phi1.lcssa
br i1 %.not51, label %L103, label %L126.preheader
L126.preheader: ; preds = %L107
%47 = load i64*, i64** %5, align 8
%48 = add i64 %46, 1
%49 = sub i64 %48, %value_phi1.lcssa
%min.iters.check = icmp ult i64 %49, 8
br i1 %min.iters.check, label %L126, label %vector.ph
vector.ph: ; preds = %L126.preheader
%n.vec = and i64 %49, -8
%ind.end = add i64 %value_phi1.lcssa, %n.vec
%minmax.ident.splatinsert = insertelement <2 x i64> poison, i64 %44, i32 0
%minmax.ident.splat = shufflevector <2 x i64> %minmax.ident.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
br label %vector.body
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%vec.phi = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %63, %vector.body ]
%vec.phi103 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %64, %vector.body ]
%vec.phi104 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %65, %vector.body ]
%vec.phi105 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %66, %vector.body ]
%offset.idx = add i64 %value_phi1.lcssa, %index
%50 = add i64 %offset.idx, -1
%51 = getelementptr inbounds i64, i64* %47, i64 %50
%52 = bitcast i64* %51 to <2 x i64>*
%wide.load = load <2 x i64>, <2 x i64>* %52, align 8
%53 = getelementptr inbounds i64, i64* %51, i64 2
%54 = bitcast i64* %53 to <2 x i64>*
%wide.load106 = load <2 x i64>, <2 x i64>* %54, align 8
%55 = getelementptr inbounds i64, i64* %51, i64 4
%56 = bitcast i64* %55 to <2 x i64>*
%wide.load107 = load <2 x i64>, <2 x i64>* %56, align 8
%57 = getelementptr inbounds i64, i64* %51, i64 6
%58 = bitcast i64* %57 to <2 x i64>*
%wide.load108 = load <2 x i64>, <2 x i64>* %58, align 8
%59 = icmp slt <2 x i64> %wide.load, %vec.phi
%60 = icmp slt <2 x i64> %wide.load106, %vec.phi103
%61 = icmp slt <2 x i64> %wide.load107, %vec.phi104
%62 = icmp slt <2 x i64> %wide.load108, %vec.phi105
%63 = select <2 x i1> %59, <2 x i64> %wide.load, <2 x i64> %vec.phi
%64 = select <2 x i1> %60, <2 x i64> %wide.load106, <2 x i64> %vec.phi103
%65 = select <2 x i1> %61, <2 x i64> %wide.load107, <2 x i64> %vec.phi104
%66 = select <2 x i1> %62, <2 x i64> %wide.load108, <2 x i64> %vec.phi105
%index.next = add i64 %index, 8
%67 = icmp eq i64 %index.next, %n.vec
br i1 %67, label %middle.block, label %vector.body
middle.block: ; preds = %vector.body
%rdx.minmax.cmp = icmp slt <2 x i64> %63, %64
%rdx.minmax.select = select <2 x i1> %rdx.minmax.cmp, <2 x i64> %63, <2 x i64> %64
%rdx.minmax.cmp109 = icmp slt <2 x i64> %rdx.minmax.select, %65
%rdx.minmax.select110 = select <2 x i1> %rdx.minmax.cmp109, <2 x i64> %rdx.minmax.select, <2 x i64> %65
%rdx.minmax.cmp111 = icmp slt <2 x i64> %rdx.minmax.select110, %66
%rdx.minmax.select112 = select <2 x i1> %rdx.minmax.cmp111, <2 x i64> %rdx.minmax.select110, <2 x i64> %66
%rdx.shuf = shufflevector <2 x i64> %rdx.minmax.select112, <2 x i64> poison, <2 x i32> <i32 1, i32 undef>
%rdx.minmax.cmp113 = icmp slt <2 x i64> %rdx.minmax.select112, %rdx.shuf
%rdx.minmax.select114 = select <2 x i1> %rdx.minmax.cmp113, <2 x i64> %rdx.minmax.select112, <2 x i64> %rdx.shuf
%68 = extractelement <2 x i64> %rdx.minmax.select114, i32 0
%cmp.n = icmp eq i64 %49, %n.vec
br i1 %cmp.n, label %L103, label %L126
L126: ; preds = %L126, %middle.block, %L126.preheader
%value_phi25 = phi i64 [ %73, %L126 ], [ %ind.end, %middle.block ], [ %value_phi1.lcssa, %L126.preheader ]
%value_phi27 = phi i64 [ %72, %L126 ], [ %68, %middle.block ], [ %44, %L126.preheader ]
%69 = add i64 %value_phi25, -1
%70 = getelementptr inbounds i64, i64* %47, i64 %69
%71 = load i64, i64* %70, align 8
%.not52 = icmp slt i64 %71, %value_phi27
%72 = select i1 %.not52, i64 %71, i64 %value_phi27
%.not53 = icmp eq i64 %value_phi25, %46
%73 = add i64 %value_phi25, 1
br i1 %.not53, label %L103, label %L126
}
Ok, the title of the issue may not be accurate: `minimum([1])` works on the latest nightly:
julia> versioninfo()
Julia Version 1.9.0-DEV.106
Commit 394af38501 (2022-02-28 23:39 UTC)
Platform Info:
OS: Linux (aarch64-unknown-linux-gnu)
CPU: 50 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, a64fx)
Threads: 1 on 50 virtual cores
Environment:
JULIA_LLVM_ARGS = --aarch64-sve-vector-bits-min=512
However, `BenchmarkTools.asciihist([1])` still crashes with the same error reported above. The code reported above in this issue was a reduced reproducer, in Julia v1.7, of the crash I always get with the ASCII histogram generated by `BenchmarkTools.@benchmark`.
Try to reproduce on LLVM master with `llc` and then file the reproducer upstream?
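One possible way to get an input file for `llc` is to dump the optimized LLVM module for the failing call from Julia itself. The sketch below assumes that `code_llvm` with `dump_module=true` emits IR that a standalone `llc` accepts; `dump_module_ir` is a hypothetical helper name, not part of Julia or BenchmarkTools.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical helper: write the optimized LLVM module for one call
# signature to `path`, so it can be passed to a standalone `llc`.
function dump_module_ir(path::AbstractString, f, argtypes::Type{<:Tuple})
    open(path, "w") do io
        # dump_module=true emits the whole module (declarations included)
        # rather than only the body of the requested function.
        code_llvm(io, f, argtypes; dump_module=true, debuginfo=:none)
    end
    return path
end

# Same signature as the reduced reproducer:
# Base.mapreduce_impl(identity, min, [1], 1, 1)
ir_path = dump_module_ir(
    joinpath(mktempdir(), "mapreduce_impl.ll"),
    Base.mapreduce_impl,
    Tuple{typeof(identity), typeof(min), Vector{Int}, Int, Int},
)
```

The resulting file could then be tried with something like `llc -mtriple=aarch64-unknown-linux-gnu -mcpu=a64fx -aarch64-sve-vector-bits-min=512 mapreduce_impl.ll -o /dev/null`. Whether these flags match the configuration used by Julia's JIT closely enough to trigger the same `Cannot select` error is an assumption that would need checking.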
I still haven't had the time to (re)build Julia with LLVM master, because compiling anything on A64FX is excruciatingly slow (and compiling LLVM even more so), but the error message looks like llvm/llvm-project#53331.
It appears `@benchmark` finally works on Julia `master` with LLVM 14 (although llvm/llvm-project#53331 is still open):
$ JULIA_LLVM_ARGS="--aarch64-sve-vector-bits-min=512" ./julia -q
julia> using BenchmarkTools
julia> function sumsimd(x)
           s = zero(eltype(x))
           @simd for xi in x
               s += xi
           end
           s
       end
sumsimd (generic function with 1 method)
julia> @benchmark sumsimd(x) setup=(x = [randn(Float64) for _ in 1:1_000_000])
BenchmarkTools.Trial: 94 samples with 1 evaluation.
Range (min … max): 185.532 μs … 250.102 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 206.148 μs ┊ GC (median): 0.00%
Time (mean ± σ): 205.783 μs ± 12.503 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▄▁ █ ▃ ▁ ▆ ▁
▇▄▁▄▄███▄▆▇▆▄▁▄▄▄▆▆█▁▆█▇▆█▆▇▄█▆█▁▄▆▆▆▄▁▁▁▇▁▁▁▁▁▁▁▁▄▁▁▁▁▄▁▁▁▁▄ ▁
186 μs Histogram: frequency by time 242 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sumsimd(x) setup=(x = [randn(Float32) for _ in 1:1_000_000])
BenchmarkTools.Trial: 92 samples with 1 evaluation.
Range (min … max): 78.991 μs … 88.341 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 80.206 μs ┊ GC (median): 0.00%
Time (mean ± σ): 80.670 μs ± 1.603 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▁█
█▇▁▆▃██▇▆▆▇▇▇▆▄▆▃▄▆▄▄▁▁▃▄▁▄▁▁▄▁▃▁▃▄▁▃▁▁▃▃▁▁▁▃▁▃▃▁▁▁▁▁▁▁▁▁▁▃ ▁
79 μs Histogram: frequency by time 85.6 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sumsimd(x) setup=(x = [randn(Float16) for _ in 1:1_000_000])
BenchmarkTools.Trial: 92 samples with 1 evaluation.
Range (min … max): 41.970 μs … 47.470 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 42.961 μs ┊ GC (median): 0.00%
Time (mean ± σ): 43.265 μs ± 1.025 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▆▄▆▄▆█ ▂
▆▄▆▆▆▁▆█████████████▄█▆▆▁▁▁▄▁▁▁▁▄▄▄▁▁▁▁▄▆▁▁▁▆▁▄▁▁▁▆▁▁▄▁▄▁▁▄ ▁
42 μs Histogram: frequency by time 46.1 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.9.0-DEV.809
Commit 9b83dd8920 (2022-06-19 19:31 UTC)
Platform Info:
OS: Linux (aarch64-unknown-linux-gnu)
CPU: 48 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.3 (ORCJIT, a64fx)
Threads: 1 on 48 virtual cores
Performance is the same as in #40308 (comment), which is good, but issue #44263 is still relevant.