Bus error on sparc64 in smoc code
df7cb opened this issue · 2 comments
I'm only filing this for reference since I was curious and poked around a bit with it. I don't expect any fixes, just writing it down in case I get curious again in the future. :)
On Debian's unofficial sparc64 architecture, pgsphere is failing the moc regression tests:
2023-11-16 14:17:18.560 UTC [658036] LOG: server process (PID 660738) was terminated by signal 10: Bus error
2023-11-16 14:17:18.560 UTC [658036] DETAIL: Failed process was running: select '1/1'::smoc;
(gdb) bt
#0 order_break (outputs=std::vector of length 2, capacity 2 = {...}, x=..., max_order=1) at src/process_moc.cpp:697
#1 0xfff8000113b33f98 in ascii_out (m_s="", s=0x7fefff73c98 "", moc=0x10000ac1128, begin=72, end=88, entry_size=16)
at src/process_moc.cpp:749
#2 0xfff8000113b344d0 in create_moc_out_context (moc=0x10000ac1128, end=88,
error_out=0xfff8000113b0ec14 <moc_error_out>) at src/process_moc.cpp:791
On sparc64, a SIGBUS like this usually points to an unaligned memory access:
(gdb) p x
$1 = (const moc_interval &) @0x10000ac1174: {first = 72057594037927936, second = 144115188075855872}
(gdb) l
692 order_break(output_map & outputs, const moc_interval & x, int max_order)
693 {
694 int order;
695 hpint64 mask = 0;
696 mask = ~mask ^ 3;
697 hpint64 first = x.first >> 2 * (29 - max_order);
698 hpint64 second = x.second >> 2 * (29 - max_order);
699 for (order = max_order; order > 0; --order, first >>= 2, second >>= 2)
700 {
701 if (second == first)
(gdb) f 1
#1 0xfff8000113b33f98 in ascii_out (m_s="", s=0x7fefff73c98 "", moc=0x10000ac1128, begin=72, end=88, entry_size=16)
at src/process_moc.cpp:749
749 order_break(outputs, *interval_ptr(moc, j), order);
(gdb) l
744 {
745 // page bumps
746 int32 mod = (j + entry_size) % PG_TOAST_PAGE_FRAGMENT;
747 if (mod > 0 && mod < entry_size)
748 j += entry_size - mod;
749 order_break(outputs, *interval_ptr(moc, j), order);
750 }
751 for (int k = 0; k <= order; ++k)
752 {
753 const moc_map & output = outputs[k];
As seen above, the address of x (0x10000ac1174) is only 4-byte aligned, not 8-byte aligned, although it is read as a pair of 64-bit values.
The reason is somewhere in *interval_ptr(moc, j) and how the offsets are computed.
static
moc_interval* interval_ptr(Smoc* moc, int32 offset)
{
	return data_as<moc_interval>(detoasted_offset(moc, offset));
}
static
char* detoasted_offset(Smoc* moc, size_t offset = 0)
{
	return offset + reinterpret_cast<char*>(moc) + offsetof(Smoc, version);
}
/*
* this particular layout should prevent the compiler from introducing unwanted
* padding
*/
typedef struct
{
	char	vl_len_[4];	/* size of PostgreSQL variable-length data */
	uint16	version;	/* version of the 'toasty' MOC data structure */
	uint8	order;		/* actual MOC order */
	uint8	depth;		/* depth of B+-tree */
	hpint64	first;		/* first Healpix index in set */
	hpint64	last;		/* 1 + (last Healpix index in set) */
	hpint64	area;		/* number of covered Healpix cells */
	int32	tree_begin;	/* start of B+ tree, past the options block */
	int32	data_begin;	/* start of Healpix intervals, bypassing the tree */
	int32	data[1];	/* no need to optimise for empty MOCs */
} Smoc;
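Plugging the gdb values into the detoasted_offset arithmetic reproduces the faulting address. The following is only a sketch: SmocMirror is a hypothetical stand-in for Smoc that uses fixed-width standard types instead of the PostgreSQL/pgsphere typedefs, and the helper names are made up for illustration. It assumes the usual layout where version sits at byte 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the Smoc header, with standard fixed-width types
// standing in for uint16, uint8, hpint64 and int32.
struct SmocMirror
{
    char          vl_len_[4];
    std::uint16_t version;
    std::uint8_t  order;
    std::uint8_t  depth;
    std::int64_t  first;
    std::int64_t  last;
    std::int64_t  area;
    std::int32_t  tree_begin;
    std::int32_t  data_begin;
    std::int32_t  data[1];
};

// Same arithmetic as detoasted_offset(): base + offsetof(version) + offset,
// done on plain integers so no real memory is touched.
inline std::uint64_t interval_address(std::uint64_t moc_base, std::uint64_t offset)
{
    return moc_base + offsetof(SmocMirror, version) + offset;
}

// Values from the backtrace: moc=0x10000ac1128, begin=72.
inline void check_against_gdb()
{
    const std::uint64_t addr = interval_address(0x10000ac1128ULL, 72);
    assert(addr == 0x10000ac1174ULL); // the address gdb prints for x
    assert(addr % 8 == 4);            // 4-byte aligned, not 8-byte aligned
}
```

The base pointer itself is 8-aligned, and the offset 72 is a multiple of 8, so the extra 4 bytes from offsetof(Smoc, version) are what push the interval onto a 4-byte boundary.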
My suspicion is that the offsetof should be taken relative to data rather than version, and that the data field should be declared as hpint64 so that the intervals land on 8-byte boundaries.
Since I don't want to redesign the Smoc struct, I'm stopping here.
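For the record, a workaround that would not need a layout change is to copy each interval out of the buffer with memcpy instead of dereferencing the misaligned pointer. This is an untested sketch, not something the pgsphere code does: IntervalCopy and load_interval are hypothetical names, and the two-int64 layout is assumed from the gdb output (first, second) and entry_size=16.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for moc_interval: two 64-bit Healpix indices.
struct IntervalCopy
{
    std::int64_t first;
    std::int64_t second;
};

// Copy an interval out of a possibly misaligned position in the MOC buffer.
// memcpy has no alignment requirement on its arguments, so no 8-byte load is
// issued against a 4-byte-aligned address.
inline IntervalCopy load_interval(const char* p)
{
    IntervalCopy out;
    std::memcpy(&out, p, sizeof out);
    return out;
}

// Round-trip through a deliberately misaligned offset (base + 4).
inline void demo_misaligned_roundtrip()
{
    alignas(8) char buf[32] = {};
    const IntervalCopy in{72057594037927936LL, 144115188075855872LL};
    std::memcpy(buf + 4, &in, sizeof in);
    const IntervalCopy out = load_interval(buf + 4);
    assert(out.first == in.first && out.second == in.second);
}
```

With something like this, the call in ascii_out would pass a copy to order_break rather than a reference into the buffer. Every other place that reads through interval_ptr would need the same treatment, so it is not a small change either.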
As said above, I don't expect any fixes - sparc64 is an old architecture only barely kept alive, so I'll close this immediately again.
I'd leave the issue open at least, but, yeah, I doubt there's much interest in fixing this issue.