alexharri/beygla

Beygla for street addresses

Closed this issue · 7 comments

Hi!

Could Beygla's approach to declining names also be used to decline Icelandic street addresses? It would be awesome to be able to do something like:

import { applyCaseToAddress } from 'beygla'

applyCaseToAddress('ef', 'Dúfnahólar 10') // RESULT: Dúfnahóla 10

A list of all Icelandic street addresses is available in Staðfangaskrá 👀

Thanks for your time ☺️

We should be able to do something like this. We'll have to see how well the BÍN data covers address declensions (I can generate some statistics later about the coverage).

I don't want to increase the bundle size for beygla, so we can create a new submodule beygla/addresses which would expose the same applyCase method:

import { applyCase as applyCaseToAddress } from 'beygla/addresses'

applyCaseToAddress('ef', 'Dúfnahólar 10');

And I imagine, given #14, that you'd be interested in guaranteed correctness for address declension. If that's the case, I figure we'd add beygla/addresses/strict (or beygla/addresses-strict).

How does this approach sound @oddsson?

Guaranteed correctness is less of an issue for street addresses than for names, since we are strictly using Icelandic street names in our use case. Otherwise, this approach sounds good to me ☺️

Hey @oddsson, I've been quite busy (finishing a new article for alexharri.com + personal stuff), but I should be able to start taking a look this weekend.

@oddsson Here are the results after including addresses in the pipeline:

Running 'group-addresses'

4696 of 11595 addresses (40.50%) in 'address-cases.csv' are not present in 'words.csv' and are not included.

Found 2 addresses with multiple genders. They are omitted from Beygla.

It seems that the BÍN data covers about 60% of addresses (6,899 of 11,595). Here is the "excluded addresses" list: excluded-addresses.json
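As a quick check, the coverage figure follows directly from the counts in the log output. The snippet below only redoes that arithmetic; the variable and function names are illustrative and not part of beygla.

```typescript
// Counts copied from the 'group-addresses' log output above.
const totalAddresses = 11595;
const excludedAddresses = 4696;

// Formats part/whole as a percentage with two decimals.
function toPercent(part: number, whole: number): string {
  return `${((part / whole) * 100).toFixed(2)}%`;
}

const includedAddresses = totalAddresses - excludedAddresses;
const excludedShare = toPercent(excludedAddresses, totalAddresses);
const includedShare = toPercent(includedAddresses, totalAddresses);

console.log(
  `${includedAddresses} included (${includedShare}), ` +
    `${excludedAddresses} excluded (${excludedShare})`
);
```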

After some random sampling, most of these places either:

  • don't seem to be addresses, or
  • are not listed on Google Maps

When looking at the list of "included addresses", it seems to cover quite obscure places, such as:

  • Sela-Kirkjuból
  • Hornafjarðarflugvöllur
  • Broddadalsá

Either way, the coverage is more impressive than I expected.

I'll have an update on the expected bundle size of beygla/addresses soon.

Bundle size results are very good!

Output size:
        beygla.js
                Minified: 12.65 kB
                Gzipped: 4.7 kB
        beygla.esm.js
                Minified: 12.58 kB
                Gzipped: 4.69 kB
        strict.js
                Minified: 44.12 kB
                Gzipped: 14.79 kB
        strict.esm.js
                Minified: 44.02 kB
                Gzipped: 14.77 kB
        addresses.js
                Minified: 15.5 kB
                Gzipped: 5.57 kB
        addresses.esm.js
                Minified: 15.43 kB
                Gzipped: 5.55 kB

A gzipped size under 6 kB is lower than the 10-20 kB I expected. This makes beygla/addresses very much usable for client-side applications.

(PS: This is for the non-strict version of beygla/addresses. The strict version, which I haven't created yet, would be much larger.)

Awesome! Thanks for implementing this 🙏 Looking forward to seeing this shipped 🛳️

Hey @oddsson. Version 1.5.1 has been released to npm, which includes the beygla/addresses module. See https://www.npmjs.com/package/beygla#Addresses.

Addresses with dashes (e.g. "Litla-Breiðuvík") caused a bit of extra work. I ended up splitting them on the dash and applying the case to each part separately (so "Litla" is declined separately from "Breiðuvík").
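A minimal sketch of that splitting step, not beygla's actual implementation. `applyCaseToParts`, `DeclineWord`, and the toy lookup table are hypothetical. The table holds only the Efri-Hvoll forms listed further down in this comment, assuming they are given in the order nf, þf, þgf, ef. Only the 'ef' case code appears in this thread; the other three are assumed.

```typescript
// Case codes: 'ef' appears in the thread; the other three are assumed here.
type Case = "nf" | "þf" | "þgf" | "ef";

// Hypothetical per-word declension function (stand-in for the real lookup).
type DeclineWord = (word: string, grammaticalCase: Case) => string;

// Split on dashes, decline each part on its own, then re-join with dashes.
function applyCaseToParts(
  name: string,
  grammaticalCase: Case,
  declineWord: DeclineWord
): string {
  return name
    .split("-")
    .map((part) => declineWord(part, grammaticalCase))
    .join("-");
}

// Toy data taken from the Efri-Hvoll forms quoted in this thread.
const toyForms: Record<string, Record<Case, string>> = {
  Efri: { "nf": "Efri", "þf": "Efri", "þgf": "Efri", "ef": "Efri" },
  Hvoll: { "nf": "Hvoll", "þf": "Hvol", "þgf": "Hvol", "ef": "Hvols" },
};

const toyDecline: DeclineWord = (word, grammaticalCase) => {
  const forms = toyForms[word];
  // Words missing from the toy table are returned unchanged in this sketch.
  return forms ? forms[grammaticalCase] : word;
};

console.log(applyCaseToParts("Efri-Hvoll", "ef", toyDecline));
```

Passing the per-word function in as a parameter keeps the splitting logic independent of wherever the declension data actually comes from.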

There was also some trouble with inconsistent prefixes, e.g.:

Efri-Hreppur
Efra-Hrepp
Efra-Hreppi
Efra-Hrepps

vs

Efri-Hvoll
Efri-Hvol
Efri-Hvol
Efri-Hvols

Notice the inconsistency in Efri/Efra: the prefix changes to Efra in the oblique cases of Efri-Hreppur, but stays Efri throughout for Efri-Hvoll. Similar problems occur for a small group of names (e.g. Litli/Litla). I list these as "known problem names" in addresses.spec.ts (see source). These are declined incorrectly by beygla. I'll open an issue to resolve this some time in the future.
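To make the inconsistency concrete, the sketch below pulls the prefix out of each form listed above and compares the two names. It only illustrates the data from this thread; `prefixOf` and the array names are hypothetical and not part of beygla.

```typescript
// Forms exactly as listed above, in the order they were given.
const efriHreppur = ["Efri-Hreppur", "Efra-Hrepp", "Efra-Hreppi", "Efra-Hrepps"];
const efriHvoll = ["Efri-Hvoll", "Efri-Hvol", "Efri-Hvol", "Efri-Hvols"];

// Returns the text before the first dash, or the whole string if there is none.
function prefixOf(form: string): string {
  const dash = form.indexOf("-");
  return dash === -1 ? form : form.slice(0, dash);
}

const hreppurPrefixes = efriHreppur.map(prefixOf);
const hvollPrefixes = efriHvoll.map(prefixOf);

// True only if both names use the same prefix form in every case.
const samePrefixForms = hreppurPrefixes.every(
  (prefix, i) => prefix === hvollPrefixes[i]
);

console.log(hreppurPrefixes.join(", "));
console.log(hvollPrefixes.join(", "));
console.log(samePrefixForms);
```

Both names start from the same prefix, "Efri", yet the prefix forms diverge in the remaining cases. This suggests that a lookup keyed on the prefix alone cannot reproduce both listings.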

Anyway, try it out and let me know how it works for your use case!