TaskCSVPandas loses the index in the output CSV file

Question


wj-c opened this issue 5 years ago · 3 comments

An example:

import d6tflow
import pandas as pd
import numpy as np

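class test(d6tflow.tasks.TaskCSVPandas):

    def run(self):
        # 2x2 frame of random floats with a custom string index ('a', 'b')
        data = np.random.randn(2, 2)
        data = pd.DataFrame(data, index=['a', 'b'], columns=['c', 'd'])
        # Persist the frame through the task's CSV target
        self.save(data)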

d6tflow.run(test())

The actual contents of the output CSV file:
c,d
1.5490553923182304,-0.3279984496021263
0.7946535471877705,0.5790784973358706

However, I expected the output file to keep the index as its first column:
,c,d
a,0.20200720089562157,0.5134288778567592
b,2.918867471040273,-0.5393324706416279

The index is lost when using TaskCSVPandas to save data.
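Both outputs above can be reproduced with plain pandas. This is a minimal sketch, assuming TaskCSVPandas ultimately writes through DataFrame.to_csv with index=False by default; the maintainer's reply calls dropping the index a feature, which is consistent with that, but the d6tflow internals are not shown here. pandas itself writes the index unless told otherwise.

```python
import numpy as np
import pandas as pd

# Same layout as in the issue: random floats with a string index 'a', 'b'
df = pd.DataFrame(np.random.randn(2, 2), index=["a", "b"], columns=["c", "d"])

# index=False reproduces the "actual" file: the header is just "c,d"
without_index = df.to_csv(index=False)

# index=True (pandas' own default) reproduces the "expected" file:
# an empty-named first column holding 'a' and 'b'
with_index = df.to_csv(index=True)

print(without_index.splitlines()[0])  # c,d
print(with_index.splitlines()[0])     # ,c,d
```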

Answer 1 · 2019-12-29T09:58:47.000Z

Is the repository actively maintained?

Answer 2 · 2019-12-31T21:03:29.000Z

Yes, that's a feature. You can use self.save(data, index=True) if you want to keep the index. Better still, use TaskPqPandas instead of TaskCSVPandas.
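Here is a sketch of two ways to keep the index, in plain pandas, with StringIO buffers standing in for the output file. Option 1 mirrors the index=True suggestion, assuming d6tflow forwards that keyword to DataFrame.to_csv. The file then also has to be read back with index_col=0; otherwise the index comes back as an "Unnamed: 0" column. Option 2 is an alternative not mentioned in the thread. It moves the index into an ordinary column with reset_index() before saving, so the index survives however the writer treats it, and set_index() restores it after loading.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), index=["a", "b"], columns=["c", "d"])

# Option 1: write the index, then tell read_csv that column 0 is the index
buf = io.StringIO()
df.to_csv(buf, index=True)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Option 2: make the index an ordinary column (named "index" because the
# original index is unnamed), so it is kept even when writing with index=False
buf2 = io.StringIO()
df.reset_index().to_csv(buf2, index=False)
buf2.seek(0)
# Move the column back into the index and drop the temporary "index" name
restored2 = pd.read_csv(buf2).set_index("index").rename_axis(None)

print(list(restored.index), list(restored2.index))  # ['a', 'b'] ['a', 'b']
```

Parquet sidesteps this bookkeeping: pandas' to_parquet stores a non-default index along with the columns by default, which is presumably why TaskPqPandas is the suggested alternative.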

Answer 3 · 2019-12-31T21:03:43.000Z

And yes, it is actively maintained.