skrub-data/skrub

Enable setting the Joiner threshold in kilometers when joining on (latitude, longitude) columns

Problem Description

When joining on coordinates, the quality of matches is currently assessed with the Euclidean distance between pairs of (latitude, longitude) coordinates.
It would be much easier to use, and would give better results, if we could compute geodesic distances instead. The user could then say, for example, "match airports to the closest weather station, but only if they are less than 50 km apart".
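
To illustrate the difference, here is a minimal standalone sketch (not skrub code). It uses the haversine formula as a spherical approximation of the geodesic distance. The helper names `haversine_km` and `closest_within_km`, the `max_km` argument, and the coordinates are all made up for the example. One degree of longitude covers only about half as many kilometers at 60° latitude as it does at the equator, so the Euclidean distance in degrees can pick the wrong station.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_within_km(point, candidates, max_km):
    """Return (name, distance_km) of the nearest candidate, or None if
    the nearest one is farther than max_km (hypothetical threshold)."""
    if not candidates:
        return None
    lat, lon = point
    name, dist = min(
        ((n, haversine_km(lat, lon, clat, clon)) for n, clat, clon in candidates),
        key=lambda pair: pair[1],
    )
    return (name, dist) if dist <= max_km else None


# Made-up coordinates: an "airport" and two "weather stations".
airport = (60.0, 10.0)
stations = [("A", 60.0, 10.5), ("B", 60.3, 10.0)]

# Euclidean distance in degrees prefers B (0.3 degrees vs 0.5 degrees) ...
euclid_best = min(stations, key=lambda s: math.hypot(s[1] - 60.0, s[2] - 10.0))[0]
# ... but on the ground A is closer (roughly 28 km vs 33 km).
geo_best = closest_within_km(airport, stations, max_km=50)
too_strict = closest_within_km(airport, stations, max_km=20)  # no match
```

With a threshold expressed in kilometers, the "less than 50 km" rule becomes a direct comparison, whereas a threshold on Euclidean distance in degrees would mean a different physical distance at each latitude.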

Feature Description

Not sure yet. Options include adding a parameter to the Joiner, adapting the existing parameters, or adding a new joiner.

Alternative Solutions

No response

Additional Context

No response