[FEATURE] Merge dataframes with different columns
Is your feature request related to a problem? Please describe.
I'd like to merge DataFrames with different columns.
Describe the solution you'd like
I'd like to have a df1.merge(df2)
way to automatically merge two dataframes, even if a column is in df1 but not in df2, filling it with
Describe alternatives you've considered
Here is a snippet from @lmeyerov that I found (and completed) in issue #15, which does just what I want:
function unionDFs(a, b, fill='n/a') {
    // Merge two dataframes with different columns
    const aCols = a.listColumns(); // this line was missing on lmeyerov's original snippet
    const bCols = b.listColumns(); // this line was missing on lmeyerov's original snippet
    const aNeeds = b.listColumns().filter((v) => aCols.indexOf(v) === -1);
    const bNeeds = a.listColumns().filter((v) => bCols.indexOf(v) === -1);
    const a2 = aNeeds.reduce((df, name) => df.withColumn(name, () => fill), a);
    const b2 = bNeeds.reduce((df, name) => df.withColumn(name, () => fill), b);
    return a2.union(b2);
}
A better implementation of the unionDFs snippet:
DataFrame.prototype.merge = function(df2, fill = null) {
    // Merge two dataframes with different columns
    const otherCols = df2.listColumns();
    const thisCols = this.listColumns();
    // Columns that exist in this dataframe but are missing from df2
    const otherNeeds = thisCols.filter((v) => otherCols.indexOf(v) === -1);
    // Columns that exist in df2 but are missing from this dataframe
    const thisNeeds = otherCols.filter((v) => thisCols.indexOf(v) === -1);
    // Add each missing column, filled with the fill value
    const other2 = otherNeeds.reduce((df, name) => df.withColumn(name, () => fill), df2);
    const this2 = thisNeeds.reduce((df, name) => df.withColumn(name, () => fill), this);
    return other2.union(this2);
};
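To show the idea behind both snippets without the library, here is a minimal sketch on plain arrays of row objects. It assumes each row is a plain object keyed by column name. `unionRows` is a hypothetical helper for illustration and is not part of dataframe-js.

```javascript
// Hypothetical helper (not a dataframe-js API): union two arrays of row
// objects whose keys (columns) may differ, filling missing values.
function unionRows(a, b, fill = null) {
  const rows = [...a, ...b];
  // Collect every column name seen in either input, in first-seen order.
  const columns = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  // Rebuild each row over the full column list, using `fill` for gaps.
  const pad = (row) =>
    Object.fromEntries(
      columns.map((c) => [
        c,
        Object.prototype.hasOwnProperty.call(row, c) ? row[c] : fill,
      ])
    );
  return rows.map(pad);
}

const left = [{ id: 1, name: "a" }];
const right = [{ id: 2, age: 30 }];
console.log(unionRows(left, right));
// [ { id: 1, name: 'a', age: null }, { id: 2, name: null, age: 30 } ]
```

Computing the full column list once and padding both inputs against it gives every output row the same columns in the same order, whichever input has more columns.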
This bug can be particularly insidious - if one dataframe's columns are a subset of another's, the behavior is inconsistent.
- If you concatenate the df with fewer columns to the one with all columns, the union will execute without issue.
- If you concatenate the df with all columns to the one with fewer, then it will fail.
This error is due to the use of an incorrect column comparison. It is still an issue in master:
Line 15 in aebcd1b
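One way a column comparison can produce this asymmetry is by checking only that one column list is contained in the other. The sketch below assumes that is the cause here. `isContainedIn` and `sameColumns` are hypothetical names for illustration, not functions from the library.

```javascript
// Hypothetical one-directional check: true when every name in `a` is in `b`.
// On its own, this lets a subset pass as if the two lists were equal.
function isContainedIn(a, b) {
  const inB = new Set(b);
  return a.every((name) => inB.has(name));
}

// Symmetric check: both lists must contain each other's names.
function sameColumns(a, b) {
  return isContainedIn(a, b) && isContainedIn(b, a);
}

const few = ["id", "name"];
const all = ["id", "name", "age"];

console.log(isContainedIn(few, all)); // true  (subset passes)
console.log(isContainedIn(all, few)); // false (superset fails)
console.log(sameColumns(few, all));   // false
console.log(sameColumns(all, few));   // false
```

With a symmetric check the result no longer depends on which dataframe is on which side of the call.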