VIDA-NYU/bdi-kit

New data integration API


We'll use this issue to track the implementation of the new bdi-kit API.

Implementation:

  • bdi.match_columns()
  • bdi.top_matches()
  • bdi.preview_domains()
  • bdi.preview_value_matches()
  • bdi.update_matches()
  • bdi.match_values()
  • bdi.materialize_mapping()

Documentation:

  • Make sure that all public functions have docstrings
  • Make sure that readthedocs is generating API documentation
  • Create a notebook demonstrating the API usage

I suggest some naming refactoring before merging this branch into devel. For example:
Instead of bdikit.mapping_algorithms.column_mapping.algorithms, we could have bdikit.matching.column.algorithms, and likewise bdikit.matching.value.algorithms for value matching. As a naming standard, I suggest we stick to the schema matching/mapping conventions from a textbook such as:
https://link.springer.com/book/10.1007/978-3-642-16518-4

So far, algorithms is a single Python module, but we will have to split it as we add new methods. At that point, bdikit.matching.column.algorithms will become a package with modules such as algorithm_type1.py, algorithm_type2.py, etc. Otherwise, we will end up with a lot of code in a single module, which is harder to maintain. I think it's better to do this refactoring in a separate PR.
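To illustrate the proposed split, here is a minimal sketch that writes a throwaway package tree to a temporary directory and imports from it. The module names algorithm_type1.py and algorithm_type2.py are the placeholders from the comment above; the classes AlgorithmType1 and AlgorithmType2 and their name attribute are hypothetical stand-ins, not actual bdi-kit code. The point it shows is one possible design: if the package's __init__.py re-exports the public classes, callers can keep importing from bdikit.matching.column.algorithms whether it is a single module or a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout following the proposal: one module per algorithm family,
# with the package __init__.py re-exporting the public classes.
FILES = {
    "bdikit/__init__.py": "",
    "bdikit/matching/__init__.py": "",
    "bdikit/matching/column/__init__.py": "",
    "bdikit/matching/column/algorithms/__init__.py": (
        "from .algorithm_type1 import AlgorithmType1\n"
        "from .algorithm_type2 import AlgorithmType2\n"
        "__all__ = ['AlgorithmType1', 'AlgorithmType2']\n"
    ),
    # Placeholder algorithm modules (hypothetical contents).
    "bdikit/matching/column/algorithms/algorithm_type1.py": (
        "class AlgorithmType1:\n"
        "    name = 'type1'\n"
    ),
    "bdikit/matching/column/algorithms/algorithm_type2.py": (
        "class AlgorithmType2:\n"
        "    name = 'type2'\n"
    ),
    # Value matching keeps a single module until it also needs splitting.
    "bdikit/matching/value/__init__.py": "",
    "bdikit/matching/value/algorithms.py": "",
}


def build_layout(root: Path) -> None:
    """Write the placeholder package tree under root."""
    for rel_path, source in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


with tempfile.TemporaryDirectory() as tmp:
    build_layout(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # files were created after startup
    try:
        # Callers only see the package path, not the individual modules.
        algorithms = importlib.import_module("bdikit.matching.column.algorithms")
        print(sorted(algorithms.__all__))  # ['AlgorithmType1', 'AlgorithmType2']
        print(algorithms.AlgorithmType1.name, algorithms.AlgorithmType2.name)  # type1 type2
    finally:
        sys.path.remove(tmp)
```

Adding a new method under this scheme would mean adding a module next to algorithm_type1.py and one re-export line in __init__.py, so existing import statements would not need to change.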