Blosc/bcolz

ctable.fromdataframe doesn't preserve datetime indices.

Kyrish opened this issue.

Consider the following case:

import numpy as np
import pandas as pd
import bcolz

date_range = pd.date_range(start='2017-01-01', end='2017-01-03', freq='D')
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=date_range, data=np.random.rand(len(date_range), len(cols)), columns=cols)
df
                   A         B         C         D         E
2017-01-01  0.140845  0.619423  0.697564  0.975552  0.419815
2017-01-02  0.327959  0.657860  0.234037  0.705020  0.656515
2017-01-03  0.065697  0.543179  0.517165  0.692339  0.340754
df_table = bcolz.ctable.fromdataframe(df)
df_table
ctable((3,), [('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8')])
  nbytes: 120; cbytes: 80.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=1, cname='blosclz', quantize=0)
[( 0.14084531,  0.61942324,  0.69756392,  0.9755517,  0.41981526)
 ( 0.3279591,  0.65785982,  0.23403727,  0.70502042,  0.65651498)
 ( 0.06569704,  0.54317941,  0.51716506,  0.69233858,  0.34075387)]

If we then convert the table back with todataframe(), the datetime index is gone and a default integer index takes its place:

df_ = df_table.todataframe()
df_
          A         B         C         D         E
0  0.140845  0.619423  0.697564  0.975552  0.419815
1  0.327959  0.657860  0.234037  0.705020  0.656515
2  0.065697  0.543179  0.517165  0.692339  0.340754
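To make the cause easier to see, the sketch below mimics the same round trip without bcolz. It assumes, as the output above suggests, that fromdataframe stores only the DataFrame's columns and ignores its index. The helpers to_column_store and from_column_store are hypothetical stand-ins (a plain dict of NumPy arrays), not part of the bcolz API.

```python
import numpy as np
import pandas as pd


def to_column_store(df):
    """Hypothetical stand-in for ctable.fromdataframe: keep only the columns."""
    return {name: df[name].to_numpy() for name in df.columns}


def from_column_store(store):
    """Hypothetical stand-in for ctable.todataframe: rebuild from the columns."""
    return pd.DataFrame(store)


date_range = pd.date_range(start='2017-01-01', end='2017-01-03', freq='D')
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=date_range,
                  data=np.random.rand(len(date_range), len(cols)),
                  columns=cols)

df_ = from_column_store(to_column_store(df))
print(df_.index)  # a default RangeIndex; the dates were never stored
```

The values survive the round trip; only the index, which lives outside the columns, is lost.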

A possible workaround is to copy the index into a regular column and record that column's name in the table's attrs:

df['_index'] = df.index
df_table = bcolz.ctable.fromdataframe(df)
df_table.attrs['_index'] = '_index'

In this case, bcolz preserves the datetime64 dtype of the '_index' column, so after retrieving the DataFrame with todataframe() you can set that column back as the index.
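A minimal sketch of the full workaround, including the restore step, is shown below. It again uses a dict of NumPy arrays as a stand-in for ctable (so it runs without bcolz installed) and a plain dict in place of df_table.attrs; to_column_store, from_column_store and INDEX_COL are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

INDEX_COL = '_index'  # column name used by the workaround in this issue


def to_column_store(df):
    """Hypothetical stand-in for ctable.fromdataframe plus the attrs entry."""
    df = df.copy()                     # leave the caller's frame untouched
    df[INDEX_COL] = df.index           # copy the DatetimeIndex into a column
    store = {name: df[name].to_numpy() for name in df.columns}
    attrs = {'_index': INDEX_COL}      # mimics df_table.attrs['_index'] = '_index'
    return store, attrs


def from_column_store(store, attrs):
    """Rebuild the frame and move the saved column back into the index."""
    df = pd.DataFrame(store).set_index(attrs['_index'])
    df.index.name = None               # the original index had no name
    return df


date_range = pd.date_range(start='2017-01-01', end='2017-01-03', freq='D')
cols = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(index=date_range,
                  data=np.random.rand(len(date_range), len(cols)),
                  columns=cols)

store, attrs = to_column_store(df)
restored = from_column_store(store, attrs)
print(restored.index.dtype)  # datetime64[ns]
```

With bcolz itself, the corresponding (untested here) restore step would be df_ = df_table.todataframe().set_index(df_table.attrs['_index']). Note that the index frequency (freq='D') is not part of the stored values, so the restored index comes back with freq=None; restored.asfreq('D') can reattach it if needed.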