Series.to_json(orient='records') does not return records-based JSON
klenium opened this issue
klenium commented
import databricks.koalas as ks

df = ks.DataFrame([['a', 'b'], ['c', 'd']], columns=['col 1', 'col 2'])

def add_json(row):
    # serialize the whole row to JSON and store it in a new column
    row['serialized_row_content'] = row.to_json()
    return row

df = df.apply(add_json, axis=1)
print(df)
  col 1 col 2    serialized_row_content
0     a     b {"col 1":"a","col 2":"b"}
1     c     d {"col 1":"c","col 2":"d"}
That works as expected. The documentation says:
orient : str, default ‘records’
    It should be always ‘records’ for now.
So if I write row.to_json(orient='records') instead of row.to_json(), the output should be the same. But it's not:
  col 1 col 2 serialized_row_content
0     a     b              ["a","b"]
1     c     d              ["c","d"]
This looks like pandas' 'values' format instead.
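The same two outputs can be reproduced with plain pandas. This is a minimal sketch assuming, as the type check further down in this thread shows, that each row handed to the function is a pandas Series. For a pandas Series, to_json() defaults to orient='index' (an object keyed by column name), while orient='records' gives a bare list of the values:

```python
import pandas as pd

pdf = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['col 1', 'col 2'])

# With axis=1, each row is passed to the function as a pandas Series
# whose index holds the column names.
default_json = pdf.apply(lambda row: row.to_json(), axis=1)
records_json = pdf.apply(lambda row: row.to_json(orient='records'), axis=1)

# pandas Series.to_json() defaults to orient='index': {"col 1":"a","col 2":"b"}
print(default_json.tolist())
# orient='records' on a Series emits only the values: ["a","b"]
print(records_json.tolist())
```

So the Koalas documentation quoted above (default 'records') does not seem to describe what runs inside apply here; the pandas defaults do.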
klenium commented
Very interesting, I don't see the reason for this behavior in its source code. :)
klenium commented
row['type'] = str(type(row))
-> <class 'pandas.core.series.Series'>
Well, that's unexpected. Why is a pandas Series used there?
And why doesn't it return records-based JSON?
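If records-style JSON (a list of objects) is what's wanted for each row, one possible workaround, assuming the row is a pandas Series, is to turn it into a one-row DataFrame first, because DataFrame.to_json(orient='records') produces a list of objects. The helper name row_to_records_json is hypothetical, just for illustration:

```python
import pandas as pd

pdf = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['col 1', 'col 2'])

def row_to_records_json(row: pd.Series) -> str:
    # Hypothetical helper: Series -> one-row DataFrame (columns = original
    # column names), then DataFrame 'records' JSON: [{"col 1":"a",...}]
    return row.to_frame().T.to_json(orient='records')

# Confirm what apply(axis=1) actually hands to the function.
row_types = pdf.apply(lambda row: str(type(row)), axis=1)
print(row_types.tolist())

records_json = pdf.apply(row_to_records_json, axis=1)
print(records_json.tolist())
```

Note this yields a one-element list per row. If a single object per row is enough, the default Series.to_json() output already has that shape.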
klenium commented
The same applies to pandas-on-Spark. If I follow the documentation and call to_json('records'), the output is None, so I get errors.
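A likely reason for the None: as far as I can tell from the documented signatures, both pandas Series.to_json (first parameter path_or_buf) and the Koalas / pandas-on-Spark to_json (first parameter path) take the output location as the first positional argument, not orient. So to_json('records') writes to a file or path named records and returns None. A minimal pandas sketch, writing into a temporary directory instead of the working directory:

```python
import os
import tempfile

import pandas as pd

row = pd.Series(['a', 'b'], index=['col 1', 'col 2'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'records')
    # A positional string is treated as path_or_buf, so this writes a file
    # (with the default orient='index') and returns None.
    returned = row.to_json(path)
    print(returned)  # None
    with open(path) as f:
        written = f.read()
    print(written)  # {"col 1":"a","col 2":"b"}

# Passing orient by keyword returns the JSON string instead.
as_string = row.to_json(orient='records')
print(as_string)  # ["a","b"]
```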