Potentially wrong results returned
jannikmi opened this issue · 17 comments
I wrote a package using the same data and compared my results to the results of this package for aprox. 5M points. Those are a few of the points where I got mismatches:
(lng, lat), result timezonefinder, result pytzwhere
(-110.58195555806134, 35.53178795327783), 'America/Phoenix', 'America/Denver']
(-110.41118747093144, 35.76969036563554), 'America/Phoenix', 'America/Denver']
(-110.69776990691068, 35.59283347945522), 'America/Phoenix', 'America/Denver']
(-110.91327628315915, 36.11131540134208), 'America/Phoenix', 'America/Denver']
(-110.26814562443687, 35.98507252905026), 'America/Phoenix', 'America/Denver']
(-133.73396065378114, 68.38068073677294), 'America/Inuvik', 'America/Yellowknife']
These might be errors in my algorithm, but I still though you might want to check this out.
This is my package:
https://github.com/MrMinimal64/timezonefinder
Thanks for the info. As far as I remember that area of Arizona is a mess due to the Navajo Nation and the Hopi. I will check it out though :)
I checked the points manually. It looks like pytzwhere is not dealing with polygons within polygons correctly. Your results are indeed correct.
I also think it is because of polygons within polygons. My algorithms is stopping when the point was found to be included in a polygon, too. I decided not to change my algorithm, because actually this is a mistake within the data and correcting it would make it ugly and slow.
That makes sense. However, it is actually not a mistake in the data. It is legitimate for timezones to have timezones within them.
Why? Then your data would suggest that this area is in two timezones... I don't think that makes sense.
I should rephrase this for clarity. I didn't mean to say that a point would actually be in two polygons, but within the American/Denver timezone there is a part that is American/Phoenix. This video does a much better job at explaining it than I ever could.
But still a timezone has to be unique and therefore the surrounding timezone would need to be excluded from the surrounded tz. Otherwise algorithms don't really have a chance of deciding which timezone it is (except hardcoding it) or am I wrong here?
Polygons can't have holes in them. To ensure every coordinate was located within exactly one polygon would require changing the data to have a "seam" connecting the hole with the outer boundary. So it's just a matter of convention. We pull the data from http://efele.net/maps/tz/world/ and they follow one convention. The only solution here would be to reprocess the data to remove all layered polygons.
The original data from Eric Muller does have holes though. I just checked.
"Holes" as in the polygon defines both an outer and inner boundary?
For me the ideal solution would be to convince them to correct the data (for me layered polygons are mistakes), but I don't know realistic this is.
By the way do you have a way of automatically creating your .csv from efele.net data?
I thought about that issue , but I just decided to leave it that way for now. There are just a few points where this matters and the effort for fixing this seemed to big.
Yes a "hole" as in having an outer and inner boundary
I remember running into issues with "holes" when initially working on this project. I can dig in next week to see if I can figure it out
I think I know how to fix this. I am working on it.
This should now be fixed in the development branch. The development branch now correctly deals with holes. There are a few other little things I would like to add to that branch before we can release it but for testing timezonefinder
it should already be sufficient