capitalone/datacompy

Fugue support for extra helper functions from core

fdosani opened this issue · 2 comments

Currently there are some helper functions as part of the core Pandas code which I think are generally very helpful.
We need to spend some time exploring those and seeing which ones can be mirrored/included via the Fugue implementation.

This is a list of most of the functions. Not all of them will make sense to move over, but we should investigate which ones do:

  • df1_unq_columns (#217)
  • df2_unq_columns (#217)
  • intersect_columns (#217)
  • all_columns_match (#219)
  • all_rows_overlap (#244)
  • count_matching_rows (#294)
  • intersect_rows_match
  • matches (is_match)
  • subset
  • sample_mismatch
  • all_mismatch
  • columns_equal
  • compare_string_and_date_columns

Need to look into these a bit more. Low priority for now:

  • get_merged_columns
  • temp_column_name
  • calculate_max_diff
  • generate_id_within_group
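
For reference, here is a minimal sketch of what the column-level helpers at the top of the list compute. It works on plain lists of column names, so it does not depend on any backend. The function names `unq_columns`, `intersect_columns`, and `all_columns_match` below are hypothetical stand-ins, not the datacompy or Fugue API, and the real helpers may differ in return types and in how they normalize column names.

```python
from typing import Iterable, List


def unq_columns(cols: Iterable[str], other_cols: Iterable[str]) -> List[str]:
    """Columns that appear in `cols` but not in `other_cols`, in original order.

    Hypothetical stand-in for df1_unq_columns / df2_unq_columns:
    swap the argument order to get the other side.
    """
    other = set(other_cols)
    return [c for c in cols if c not in other]


def intersect_columns(cols1: Iterable[str], cols2: Iterable[str]) -> List[str]:
    """Columns present on both sides, in the order they appear in `cols1`."""
    other = set(cols2)
    return [c for c in cols1 if c in other]


def all_columns_match(cols1: Iterable[str], cols2: Iterable[str]) -> bool:
    """True when neither side has a column the other lacks (order is ignored)."""
    cols1, cols2 = list(cols1), list(cols2)
    return not unq_columns(cols1, cols2) and not unq_columns(cols2, cols1)


# Example schemas (made up for illustration).
df1_cols = ["id", "name", "amount"]
df2_cols = ["id", "name", "balance"]

print(unq_columns(df1_cols, df2_cols))        # ['amount']
print(unq_columns(df2_cols, df1_cols))        # ['balance']
print(intersect_columns(df1_cols, df2_cols))  # ['id', 'name']
print(all_columns_match(df1_cols, df2_cols))  # False
```

These need only the schemas of the two inputs, so they seem like the easiest ones to mirror. The row-level helpers would presumably need an actual join on the backend.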

@goodwanghan @kvnkho FYI, no pressure to contribute, but this is something in our backlog that I'm thinking about to ensure full parity in terms of functionality.

Sounds good, let's chat about it