sjwhitworth/golearn

Integrating DataFrame-go with goLearn

Yushgoel opened this issue · 5 comments

There is a Go version of pandas (the biggest data processing library in Python) being developed here: https://github.com/rocketlaunchr/dataframe-go

This library makes for a much easier data-handling pipeline. With it, data cleaning and feature engineering could be done in Go before handing the data to golearn, since golearn's FixedDataGrid isn't designed for heavy data processing.

Since FixedDataGrid is already deeply integrated into golearn, it would not be feasible to change everything to support dataframe-go. Instead, could we build a function that converts a dataframe-go object into a golearn FixedDataGrid? That way, the two libraries could be integrated easily, with minimal changes.
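To make the converter idea concrete, here is a minimal sketch of the shape such a function could take. It uses only the standard library: `Frame`, `Grid`, and `FrameToGrid` are hypothetical stand-ins invented for illustration, not the actual dataframe-go or golearn types, and a real converter would also need to handle non-numeric columns and class attributes.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is a hypothetical stand-in for a column-oriented dataframe:
// Columns[i] holds the values of the column named Names[i].
type Frame struct {
	Names   []string
	Columns [][]float64
}

// Grid is a hypothetical stand-in for a fixed-size, row-oriented data grid.
type Grid struct {
	Attributes []string
	Rows       [][]float64
}

// FrameToGrid copies a Frame into a Grid, transposing columns into rows.
// It returns an error if the columns do not all have the same length.
func FrameToGrid(f Frame) (Grid, error) {
	if len(f.Names) != len(f.Columns) {
		return Grid{}, errors.New("names and columns differ in length")
	}
	nRows := 0
	if len(f.Columns) > 0 {
		nRows = len(f.Columns[0])
	}
	for i, col := range f.Columns {
		if len(col) != nRows {
			return Grid{}, fmt.Errorf("column %q has %d rows, want %d", f.Names[i], len(col), nRows)
		}
	}
	g := Grid{
		Attributes: append([]string(nil), f.Names...), // copy, so the grid owns its names
		Rows:       make([][]float64, nRows),
	}
	for r := range g.Rows {
		row := make([]float64, len(f.Columns))
		for c, col := range f.Columns {
			row[c] = col[r]
		}
		g.Rows[r] = row
	}
	return g, nil
}

func main() {
	f := Frame{
		Names:   []string{"sepal_length", "sepal_width"},
		Columns: [][]float64{{5.1, 4.9, 4.7}, {3.5, 3.0, 3.2}},
	}
	g, err := FrameToGrid(f)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(g.Attributes, g.Rows)
}
```

The point of the sketch is that the conversion is a one-way copy at the boundary, so nothing inside golearn would need to change.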

Sounds like a great idea - it might be possible to phase out the FixedDataGrid, assuming that dataframe-go supports similar(ish) operations. The FixedDataGrid design made sense to me at the time, but the pandas/dataframe approach has clearly won out.

By phasing out the FixedDataGrid, do you mean that we eventually replace it completely with the dataframe approach?

The FixedDataGrid interface is heavily integrated into golearn, so phasing it out would require a major change across the library. That's why I suggested a converter function that simply converts the dataframe object to a FixedDataGrid, which can then be used with the rest of the library. Eventually, though, having easy integration with dataframe would be better. What do you think?

I need something in the other direction. Have you thought about a function to convert a FixedDataGrid to a Go dataframe?

My use case (because there may be a better solution):
I'm a newcomer to golang and golearn (though not to ML) and am not quite sure how I can use MapOverRows to iterate over and plot FixedDataGrids. I have used golearn with KNNs and I want to visualise the results.
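For the reverse direction, one possible pattern is to walk the grid row by row with a callback and gather each attribute into its own slice, which is the shape most plotting code expects. The sketch below is standard-library only and rests on assumptions: `Grid`, its `MapOverRows` method, and `GridToColumns` are hypothetical stand-ins, and golearn's real `MapOverRows` has a different signature, so check the golearn documentation before adapting this.

```go
package main

import "fmt"

// Grid is a hypothetical stand-in for a fixed-size, row-oriented data grid.
type Grid struct {
	Attributes []string
	Rows       [][]float64
}

// MapOverRows calls fn once per row, in order. Iteration stops early if fn
// returns false or a non-nil error. (Assumed signature, for illustration only.)
func (g Grid) MapOverRows(fn func(row []float64, rowNo int) (bool, error)) error {
	for i, row := range g.Rows {
		ok, err := fn(row, i)
		if err != nil {
			return err
		}
		if !ok {
			break
		}
	}
	return nil
}

// GridToColumns gathers each attribute's values into its own slice, keyed by
// attribute name, so that two columns can be used as x/y series in a plot.
func GridToColumns(g Grid) (map[string][]float64, error) {
	cols := make(map[string][]float64, len(g.Attributes))
	err := g.MapOverRows(func(row []float64, rowNo int) (bool, error) {
		if len(row) != len(g.Attributes) {
			return false, fmt.Errorf("row %d has %d values, want %d", rowNo, len(row), len(g.Attributes))
		}
		for c, name := range g.Attributes {
			cols[name] = append(cols[name], row[c])
		}
		return true, nil
	})
	if err != nil {
		return nil, err
	}
	return cols, nil
}

func main() {
	g := Grid{
		Attributes: []string{"x", "y"},
		Rows:       [][]float64{{1, 10}, {2, 20}, {3, 30}},
	}
	cols, err := GridToColumns(g)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(cols["x"], cols["y"])
}
```

The resulting per-attribute slices could then be fed to a plotting library or loaded into a dataframe as series.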