[IMPROVE] remove accented characters from name column
maread99 opened this issue · 2 comments
Hi, what an awesome resource you've put together here! Thank you.
I've noticed that some of the entries have accented characters in the name column which don't appear to decode correctly (at least for me), for example:
equities.search(name="telef", exchange="MCE")
So neither of the following returns anything:
>>> df = equities.search(name="telefonica", exchange="MCE")
>>> df.empty
True
>>> df = equities.search(name="telefónica", exchange="MCE")
>>> df.empty
True
The following returns a load of instruments because many entries for Telefonica do not include the accented 'o':
>>> df = equities.search(name="telefonica")
>>> len(df)
20
...although it won't include the Madrid listing (or any other entry that has the accented 'o' in the name).
To ensure consistent querying I'd suggest replacing all accented characters in the name column with their unaccented equivalents.
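As a rough illustration of that replacement, here is a minimal sketch using only the standard library. The helper name `strip_accents` is hypothetical and not part of this library; it assumes that decomposing each character and dropping the combining marks is an acceptable definition of "unaccented equivalent".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return `text` with combining accent marks removed (hypothetical helper)."""
    # NFD splits a precomposed character such as "ó" into "o" followed by
    # U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Telefónica, S.A."))  # prints "Telefonica, S.A."
```

Characters with no decomposition (for example "ø" or "ß") pass through unchanged, so this would not cover every case on its own.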
If I get a moment (unlikely tbh) I'll contribute the change. Thought I'd raise the issue in the meantime in case anyone else runs into this and has the opportunity to make the changes.
Thanks again for the library!
Marcus
Good call! I would suggest not replacing the actual name but instead making sure the accented characters are properly included.
It should then also include a Boolean ("accent_sensitive") parameter in case you do want to search specifically for "telephóne" instead of "telephone". By default this is off so you get both results.
We'd need to figure out whether this is easily doable.
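One possible shape for this, as a sketch only: the stored names stay untouched and the accents are folded at comparison time. The function `search_names`, the helper `_fold`, and the sample names below are all made up for illustration; only the `accent_sensitive` parameter name comes from the suggestion above, and nothing here reflects how `equities.search` is actually implemented.

```python
import unicodedata


def _fold(text: str, accent_sensitive: bool) -> str:
    """Normalise text for comparison (hypothetical helper)."""
    text = text.casefold()  # case-insensitive matching
    if accent_sensitive:
        # Keep accents, but use one canonical form so "ó" always compares equal.
        return unicodedata.normalize("NFC", text)
    # Drop accents: decompose, then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_names(names, query, accent_sensitive=False):
    """Return the names containing `query`; accents are ignored by default."""
    needle = _fold(query, accent_sensitive)
    return [n for n in names if needle in _fold(n, accent_sensitive)]


# Illustrative sample data, not taken from the database.
names = ["Telefónica, S.A.", "Telefonica Brasil S.A."]

print(search_names(names, "telefonica"))  # both names
print(search_names(names, "telefónica", accent_sensitive=True))  # accented name only
```

With the default `accent_sensitive=False`, both "telefonica" and "telefónica" match both entries, which is the "you get both results" behaviour described above.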
