d6t/d6tflow

TaskCSVPandas loses the index in the CSV output file

wj-c opened this issue · 3 comments

wj-c commented

An example:

import d6tflow
import pandas as pd
import numpy as np

class test(d6tflow.tasks.TaskCSVPandas):

    def run(self):
        data = np.random.randn(2, 2)
        data = pd.DataFrame(data, index=['a', 'b'], columns=['c', 'd'])
        self.save(data)

d6tflow.run(test())

The actual content of the output CSV file is:
c,d
1.5490553923182304,-0.3279984496021263
0.7946535471877705,0.5790784973358706

However, the output file is expected to include the index column (the values differ from the run above because the data is random):
,c,d
a,0.20200720089562157,0.5134288778567592
b,2.918867471040273,-0.5393324706416279

The index is lost when using TaskCSVPandas to save data.

wj-c commented

Is the repository actively maintained?

Yes, that's a feature. You can use self.save(data, index=True) if you want to keep the index. Better still, use TaskPqPandas instead of TaskCSVPandas.

And yes, the repository is actively maintained.
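
For reference, the effect of the index flag can be reproduced with pandas alone. This is a minimal sketch, assuming that TaskCSVPandas passes keyword arguments from self.save through to DataFrame.to_csv with index=False as the default (the thread implies this but does not show it). The to_csv_text helper and the fixed sample values are illustrative only and are not part of d6tflow.

```python
import io

import pandas as pd

# Fixed values instead of random data so the result is reproducible.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["a", "b"], columns=["c", "d"])


def to_csv_text(frame, **kwargs):
    """Hypothetical helper: serialize a DataFrame to CSV text in memory."""
    buf = io.StringIO()
    frame.to_csv(buf, **kwargs)
    return buf.getvalue()


# index=False drops the row labels; the header line is "c,d".
without_index = to_csv_text(df, index=False)

# index=True keeps the row labels; the header line is ",c,d".
with_index = to_csv_text(df, index=True)

# Reading back with index_col=0 restores the labels as the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)

print(without_index)
print(with_index)
print(restored)
```

Note that a CSV file written without the index cannot recover the row labels on load, which is one reason a format that stores the index with the data, such as Parquet via TaskPqPandas, may be the more convenient choice.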