ctable.fromdataframe doesn't preserve datetime indices.
Kyrish opened this issue · 0 comments
Consider the following case:
import bcolz
import numpy as np
import pandas as pd

date_range = pd.date_range(start='2017-01-01', end='2017-01-03', freq='D')
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=date_range, data=np.random.rand(len(date_range), len(cols)), columns=cols)
df
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 2017-01-01 | 0.140845 | 0.619423 | 0.697564 | 0.975552 | 0.419815 |
| 2017-01-02 | 0.327959 | 0.657860 | 0.234037 | 0.705020 | 0.656515 |
| 2017-01-03 | 0.065697 | 0.543179 | 0.517165 | 0.692339 | 0.340754 |
df_table = bcolz.ctable.fromdataframe(df)
df_table
ctable((3,), [('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8')])
nbytes: 120; cbytes: 80.00 KB; ratio: 0.00
cparams := cparams(clevel=5, shuffle=1, cname='blosclz', quantize=0)
[( 0.14084531, 0.61942324, 0.69756392, 0.9755517, 0.41981526)
( 0.3279591, 0.65785982, 0.23403727, 0.70502042, 0.65651498)
( 0.06569704, 0.54317941, 0.51716506, 0.69233858, 0.34075387)]
If we then convert the table back to a DataFrame, the datetime index is lost and replaced by a default integer index:
df_ = df_table.todataframe()
df_
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 0.140845 | 0.619423 | 0.697564 | 0.975552 | 0.419815 |
| 1 | 0.327959 | 0.657860 | 0.234037 | 0.705020 | 0.656515 |
| 2 | 0.065697 | 0.543179 | 0.517165 | 0.692339 | 0.340754 |
A possible workaround is to copy the index into a regular column:
df['_index'] = df.index
df_table = bcolz.ctable.fromdataframe(df)
df_table.attrs['_index'] = '_index'
In this case, bcolz preserves the datetime type of that column, and after converting back to a DataFrame you can restore the index from it.
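The restore step might look roughly like the sketch below. The helper name `restore_index` is hypothetical and not part of bcolz or pandas. To keep the snippet runnable without bcolz, `round_tripped` is a stand-in that only mimics what `df_table.todataframe()` returned above (a default integer index plus the `_index` column), which is an assumption for illustration.

```python
import numpy as np
import pandas as pd


def restore_index(frame, column="_index"):
    """Hypothetical helper: move `column` back into the index and drop it."""
    restored = frame.set_index(column)
    restored.index.name = None  # the original index had no name
    return restored


date_range = pd.date_range(start="2017-01-01", end="2017-01-03", freq="D")
df = pd.DataFrame(np.random.rand(len(date_range), 2), index=date_range, columns=["A", "B"])
df["_index"] = df.index  # the workaround: keep the index as a regular column

# Stand-in for bcolz.ctable.fromdataframe(df).todataframe() (assumption):
# the integer index replaces the dates, but the '_index' column survives.
round_tripped = df.reset_index(drop=True)

restored = restore_index(round_tripped)
print(restored.index)
```

The restored index holds the same timestamps as the original, but its `freq` attribute is generally not carried over, so reassign it if later code depends on it.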