[IMPROVE] remove accented characters from name column

Question

maread99 opened this issue 2 years ago · 2 comments

Hi, what an awesome resource you've put together here! Thank you.

I've noticed that some of the entries have accented characters in the name column which don't appear to decode correctly (at least for me), for example:

equities.search(name="telef", exchange="MCE")

So neither of the following returns anything:

>>> df = equities.search(name="telefonica", exchange="MCE")
>>> df.empty
True
>>> df = equities.search(name="telefónica", exchange="MCE")
>>> df.empty
True

The following returns a load of instruments because many entries for Telefonica do not include the accented 'o':

>>> df = equities.search(name="telefonica")
>>> len(df)
20

...although it won't include the Madrid listing (and other entries that have the accented o in the name).

To ensure consistent querying I'd suggest replacing all accented characters in the name column with their unaccented equivalents.
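As a rough illustration of that suggestion, here is a minimal sketch of stripping accents with the standard library's `unicodedata` module. The helper name `strip_accents` is my own and is not part of the library; it only shows the kind of transformation that could be applied to the name column.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFKD splits "ó" into "o" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Telefónica, S.A."))  # prints: Telefonica, S.A.
```

Note that this only handles characters that decompose into a base letter plus a combining mark; letters such as "ø" or "ß" have no such decomposition and would pass through unchanged.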

If I get a moment (unlikely tbh) I'll contribute the change. Thought I'd raise the issue in the meantime in case anyone else runs into this and has the opportunity to make the changes.

Thanks again for the library!
Marcus

Answer 1 · 2023-06-22T05:01:09.000Z

Good call! I would suggest not replacing the actual name but instead making sure the accented characters are properly included.

It should then also include a Boolean ("accent_sensitive") parameter in case you do want to search specifically for "telephóne" instead of "telephone". By default this is off so you get both results.

We'd need to figure out whether this is easily doable.
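To make the idea concrete, here is a minimal sketch of how an accent-insensitive match with an `accent_sensitive` switch could behave. It works on a plain list of names rather than the library's data, and `search_names` and `_fold` are hypothetical helpers for illustration only, not the library's actual search implementation. The sample names are made up for the example.

```python
import unicodedata


def _fold(text: str, accent_sensitive: bool) -> str:
    """Normalise text for comparison (hypothetical helper)."""
    text = text.casefold()
    if accent_sensitive:
        # Keep accents, but use one canonical form so "ó" always compares equal.
        return unicodedata.normalize("NFC", text)
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_names(names, query, accent_sensitive=False):
    """Return the names containing query; accents are ignored by default."""
    needle = _fold(query, accent_sensitive)
    return [n for n in names if needle in _fold(n, accent_sensitive)]


# Made-up sample data, standing in for the name column.
names = ["Telefónica, S.A.", "Telefonica Brasil S.A.", "Telecom Italia"]

print(search_names(names, "telefonica"))  # both Telefonica entries
print(search_names(names, "telefónica"))  # both Telefonica entries
print(search_names(names, "telefónica", accent_sensitive=True))  # accented entry only
```

With this approach the stored names keep their accents, and folding happens only at comparison time, so the default search finds both spellings while `accent_sensitive=True` narrows it to exact matches.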

Answer 2 · 2023-08-24T15:39:41.000Z

This issue has been resolved. I've corrected the names of a whole lot of tickers. Quite a cumbersome task but it's done!