huggingface/datasets

Adding a column with a dict structure in map() leads to the wrong key order

Describe the bug

In the map() function, I want to add a new column with a dict structure:

def map_fn(example):
  example['text'] = {'user': ..., 'assistant': ...}
  return example

However, this leads to a different key order, {'assistant': ..., 'user': ...}, in the resulting dataset.
As a result, I can't concatenate two datasets, because their feature structures differ.
This seems to be an issue in the low-level pyarrow library rather than in datasets. However, I think datasets should allow concatenating two datasets whose features have the same structure and differ only in key order.
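
As a possible stopgap, and only a sketch rather than a confirmed fix, the dicts could be given one canonical key order before they are written, so that both datasets end up with the same field order. The helper name sort_keys_recursively and the placeholder strings below are hypothetical and stand in for the elided values above. The sketch uses plain Python only and assumes the mismatch is purely a matter of key order.

```python
from typing import Any


def sort_keys_recursively(value: Any) -> Any:
    """Return a copy of `value` in which every dict has its keys in sorted order.

    Hypothetical helper, not part of datasets or pyarrow.
    """
    if isinstance(value, dict):
        # Python dicts preserve insertion order, so inserting keys in
        # sorted order fixes the order seen by anything that iterates them.
        return {key: sort_keys_recursively(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_recursively(item) for item in value]
    return value


def map_fn(example: dict) -> dict:
    # Placeholder strings replace the elided values from the report.
    example["text"] = sort_keys_recursively(
        {"user": "placeholder question", "assistant": "placeholder answer"}
    )
    return example
```

Applying the same normalization to both datasets before concatenating them would be the intended use. Whether that is enough depends on how the installed datasets and pyarrow versions infer struct field order, which I have not verified.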

Steps to reproduce the bug

Here is a minimal reproducible example

Expected behavior

The two datasets can be concatenated.

Environment info

N/A