ChenHuajun/pg_roaringbitmap

bitmap or calculation may cause postgres crash

Closed · 2 comments

Problem

A bitmap OR calculation may crash the PostgreSQL backend with signal 4 (illegal instruction):

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: admin olap_pre [l'.
Program terminated with signal 4, Illegal instruction.
#0  bitset_set_list (bitset=0x2cdd9a0, list=0x29bcf98, length=<optimized out>) at roaring.c:2733
2733	        __asm volatile(
Missing separate debuginfos, use: debuginfo-install audit-libs-2.4.1-5.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 elfutils-libs-0.163-3.el7.x86_64 glibc-2.17-105.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2..el7.x86_64 libcurl-7.29.0-35.el7.centos.x86_64 libgcc-4.8.5-11.el7.x86_64 libgcrypt-1.5.3-12.el7_1.1.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.1.2-174 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-13.el7.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pam-1.1.8-18.el7.x86_64 systemd-lib0.9.2-1.rhel7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  bitset_set_list (bitset=0x2cdd9a0, list=0x29bcf98, length=<optimized out>) at roaring.c:2733
#1  0x00007fc1ab22b3f1 in array_array_container_inplace_union (src_1=src_1@entry=0x7fc1a9d35cb0, src_2=0x7fc1a8d886c0, dst=dst@entry=0x7fff852087b0) at roar
#2  0x00007fc1ab2395ff in container_ior (result_type=0x7fff852087ac "\002\002\002\002PPө\301\177", type2=2 '\002', c2=<optimized out>, type1=2 '\002', c1=<o
#3  roaring_bitmap_or_inplace (x1=x1@entry=0x7fc1a9d2f338, x2=x2@entry=0x7fc1a8d82c40) at roaring.c:8347
#4  0x00007fc1ab24518c in rb_or_trans (fcinfo=<optimized out>) at roaringbitmap.c:1486
#5  0x00000000005fcba2 in advance_transition_function (aggstate=aggstate@entry=0x27ed228, pertrans=pertrans@entry=0x280ce08, pergroupstate=0x7fc1a9d2f1e8) a
#6  0x00000000005fe814 in advance_aggregates (aggstate=aggstate@entry=0x27ed228, pergroup=pergroup@entry=0x0, pergroups=0x28b6ae0) at nodeAgg.c:1113
#7  0x00000000005fed60 in agg_fill_hash_table (aggstate=0x27ed228) at nodeAgg.c:2553
#8  ExecAgg (pstate=0x27ed228) at nodeAgg.c:2151

Env

  • CentOS 7.2
  • PostgreSQL 10.7
  • pg_roaringbitmap 0.5.0
  • Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz

Reason

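bitset_set_list() in roaring.c uses the inline assembly instruction "shrx %[shift], %[pos], %[offset]\n" to optimize performance. SHRX belongs to the BMI2 instruction set extension, which was introduced with Intel's Haswell generation; the Xeon E5-2430 v2 listed above is an Ivy Bridge part and does not implement it. If pg_roaringbitmap is compiled on a machine that supports this instruction and then run on a machine that does not, the backend hits the unsupported instruction and crashes.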
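
To check whether a given host can execute shrx, one option is to query CPUID directly. The following is a minimal sketch, assuming an x86/x86-64 machine and a GCC or Clang toolchain that ships <cpuid.h>; the helper name cpu_has_bmi2 is hypothetical and not part of pg_roaringbitmap or roaring.c.

```c
#include <cpuid.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the CPU reports BMI2 support
 * (the extension that provides SHRX), 0 otherwise.
 * BMI2 is reported in CPUID leaf 7, sub-leaf 0, EBX bit 8.
 */
static int cpu_has_bmi2(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 is not available on older CPUs; treat that as "no BMI2". */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 8) & 1u);
}
```

Calling cpu_has_bmi2() on both the build host and the database host shows whether the two differ: a result of 1 on the build host and 0 on the database host matches the situation described in this issue.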

Reproduction

with a as(
  select rb_build_agg(id) bitmap from generate_series(1,3000)id
)
select rb_or(bitmap,bitmap) from a;

Solution

Rebuild pg_roaringbitmap in the environment where it will run.
Or turn off the optimizations that rely on CPU-specific instructions, as follows:

Makefile:

roaringbitmap.o: override CFLAGS += -march=native -std=c99 -Wno-error=maybe-uninitialized
=>
roaringbitmap.o: override CFLAGS += -std=c99 -Wno-error=maybe-uninitialized
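
The effect of dropping -march=native can be observed through the compiler's predefined macros: GCC and Clang define __BMI2__ only when the selected target includes BMI2. The following is a minimal sketch; the helper name built_with_bmi2 is hypothetical, and whether roaring.c keys its assembly path on this exact macro is an assumption not confirmed in this issue.

```c
/*
 * Hypothetical helper: returns 1 if this translation unit was compiled
 * with BMI2 enabled (for example via -march=native on a BMI2-capable
 * build host, or via -mbmi2), 0 otherwise.
 */
static int built_with_bmi2(void)
{
#ifdef __BMI2__
    return 1;
#else
    return 0;
#endif
}
```

Compiled with the original flags on a BMI2-capable build host, this would be expected to return 1; compiled without -march=native and with no other BMI2-enabling flag, it returns 0, so the resulting object file does not depend on the build host's CPU for this extension.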

That sounds right to me.

Fixed by c32cadf