Sort and finalize can reorder the fields in a segment and crash with static schema
Describe the bug
Sort and finalize uses merge_descriptors to generate the field descriptor for the newly added segment. After that, during the merging phase, it creates an Aggregator and strips the fields from the merged descriptor, leaving the aggregator to rebuild the field collection. This later leads to a crash on write, because the segment's field descriptor differs from the one in the header.
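A minimal pure-Python model of the mismatch may help. It does not use ArcticDB; `Field`, `aggregator_fields` and `check_segment_matches_header` are hypothetical stand-ins for the C++ descriptor, the Aggregator's field collection, and the write-time consistency check. It contrasts building the field list in first-seen row order (the behaviour described above) with seeding it from the merged descriptor. The second path is an assumed fix direction, not the project's confirmed fix.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an entry in a field descriptor (not the ArcticDB API).
@dataclass(frozen=True)
class Field:
    name: str
    dtype: str


def check_segment_matches_header(header, segment):
    """Mimic a write-time check: same fields at the same positions, else raise."""
    if header != segment:
        raise ValueError(
            f"segment fields {[f.name for f in segment]} "
            f"do not match header {[f.name for f in header]}"
        )


def aggregator_fields(rows, preset=None):
    """Collect fields while appending rows one by one.

    preset=None: fields are added in the order they are first seen (buggy path).
    preset=list: start from the merged descriptor, so its order is preserved.
    """
    fields = list(preset) if preset is not None else []
    for row in rows:
        for field in row:
            if field not in fields:
                fields.append(field)
    return fields


idx = Field("index", "datetime64[ns]")
a = Field("a", "float64")
b = Field("b", "int64")

header = [idx, a, b]                       # order produced by merging descriptors
rows = [[idx, b], [idx, a, b], [idx, b]]   # rows after sorting by index

try:
    check_segment_matches_header(header, aggregator_fields(rows))
except ValueError as err:
    print("mismatch:", err)

# Seeding the aggregator with the header fields keeps the two in sync.
check_segment_matches_header(header, aggregator_fields(rows, preset=header))
```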
In the example below, merge_descriptors orders the fields in the field descriptor by order of appearance: index at position 0, a at 1, b at 2. The final sorted segment, however, adds rows one by one in index order. The earliest row (2024-01-01) comes from the second staged frame, which only has column b, so b is reported first and gets position 1; a is reported afterwards and gets position 2.
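The two orderings can be reproduced with plain pandas using the staged frames from the reproduction. This is a sketch of the ordering logic only. The helpers `appearance_order` and `row_by_row_order` are hypothetical and only approximate what merge_descriptors and the Aggregator do internally.

```python
import numpy as np
import pandas as pd

# The two staged frames from the reproduction
df1 = pd.DataFrame(
    {"a": np.array([1], dtype="float"), "b": np.array([22250], dtype="int64")},
    index=pd.DatetimeIndex([pd.Timestamp("2024-01-02")]),
)
df2 = pd.DataFrame(
    {"b": np.array([-53979, -53973], dtype="int64")},
    index=pd.DatetimeIndex([pd.Timestamp("2024-01-03"), pd.Timestamp("2024-01-01")]),
)


def appearance_order(frames):
    """Union of column names in the order the staged frames introduce them."""
    order = ["index"]
    for frame in frames:
        for col in frame.columns:
            if col not in order:
                order.append(col)
    return order


def row_by_row_order(frames):
    """Column order obtained by visiting every row in sorted index order."""
    rows = []
    for frame in frames:
        for ts, row in frame.iterrows():
            rows.append((ts, list(row.index)))  # row.index holds the column names
    rows.sort(key=lambda r: r[0])
    order = ["index"]
    for _, cols in rows:
        for col in cols:
            if col not in order:
                order.append(col)
    return order


print(appearance_order([df1, df2]))  # ['index', 'a', 'b']
print(row_by_row_order([df1, df2]))  # ['index', 'b', 'a']
```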
Steps/Code to Reproduce
import numpy as np
import pandas as pd
import arcticdb

ac = arcticdb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)

# First staged segment: columns a and b, one row on 2024-01-02
idx1 = pd.DatetimeIndex([
    pd.Timestamp("2024-01-02")
])
df1 = pd.DataFrame({
    "a": np.array([1], dtype="float"),
    "b": np.array([22250], dtype="int64")
}, index=idx1)

# Second staged segment: only column b, unsorted index; 2024-01-01 is the earliest row overall
b = np.array([-53979, -53973], dtype="int64")
idx = pd.DatetimeIndex([
    pd.Timestamp("2024-01-03"),
    pd.Timestamp("2024-01-01")
])
df2 = pd.DataFrame({"b": b}, index=idx)

lib.write("sym", df1, staged=True)
lib.write("sym", df2, staged=True)
# Sorting puts df2's 2024-01-01 row first, so column b is seen before column a
lib.sort_and_finalize_staged_data("sym")
lib.read("sym")
Expected Results
sort_and_finalize_staged_data should create the correct field descriptor and should not throw.
OS, Python Version and ArcticDB Version
Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: dev