openvenues/jpostal

Bindings for near-duplicate detection and address deduping

iantabolt opened this issue · 3 comments

After reading over openvenues/libpostal#294, we are very interested in making use of some of these new features. Which methods would be best to expose in the jpostal bindings?

I saw that you mentioned in that libpostal PR that you already have pypostal bindings that use the new API from lieu, but I can't seem to find them. If you could point me towards these Python bindings, I'd be happy to port them over to jpostal and open a PR.

Many many thanks!!

I am working on this now, starting with the dedupe.h bindings. To answer my own question for reference, the Python bindings are found at https://github.com/openvenues/pypostal/blob/master/postal/pydedupe.c and https://github.com/openvenues/pypostal/blob/master/postal/dedupe.py

Hey, sorry, have been super booked lately working on a voting rights restoration campaign for November. That's exciting, and yes, those are the files to look at on the pypostal side. Also keep in mind the concurrency/synchronization stuff we do for the other jpostal bindings (it's just on the Java side so should be more familiar to folks, see e.g. https://github.com/openvenues/jpostal/blob/master/src/main/java/com/mapzen/jpostal/AddressExpander.java for details).
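For anyone following along, here is a rough sketch of the kind of Java-side synchronization meant above: a lazily created singleton with a synchronized entry point, so that one-time setup runs once and calls into non-thread-safe native code are serialized. The class name `DuplicateDetectorSketch` and the method `isExactDuplicate` are hypothetical and not part of jpostal, and the string comparison is only a stand-in for the JNI call into libpostal. Check AddressExpander.java for what the real bindings do.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper, not part of jpostal. A real binding would call into
// libpostal through JNI where this sketch compares strings in plain Java.
public final class DuplicateDetectorSketch {
    // volatile is required for double-checked locking to be safe.
    private static volatile DuplicateDetectorSketch instance;

    // Counts how often the (stand-in) one-time setup has run.
    private static final AtomicInteger SETUP_CALLS = new AtomicInteger();

    private DuplicateDetectorSketch() {
        // Stand-in for one-time native setup, e.g. loading library data.
        SETUP_CALLS.incrementAndGet();
    }

    // Lazily creates the single instance; only the first callers contend
    // for the lock, later calls just read the volatile field.
    public static DuplicateDetectorSketch getInstance() {
        if (instance == null) {
            synchronized (DuplicateDetectorSketch.class) {
                if (instance == null) {
                    instance = new DuplicateDetectorSketch();
                }
            }
        }
        return instance;
    }

    public static int setupCalls() {
        return SETUP_CALLS.get();
    }

    // synchronized so only one thread at a time reaches the section where
    // the native call would go. The comparison is a placeholder only.
    public synchronized boolean isExactDuplicate(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

Callers would go through `DuplicateDetectorSketch.getInstance()` rather than constructing the class directly, which is what keeps the setup from running more than once.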

Absolutely no problem. Thanks for the heads up and all the awesome work you've already done!

I am basing it closely on the code that already exists in AddressExpander and AddressParser, so it's pretty straightforward. Once I finish the bindings for the fuzzy and toponym duplicate methods, I'll open the PR. This is my WIP branch: https://github.com/openvenues/jpostal/compare/master...iantabolt:dedupe-bindings?expand=1