tdowd/excel-geocoder

Frequent timeouts and hangs

Closed this issue · 13 comments

First thanks so much for this tool. I was doing geocoding on Fusion Tables but ran into daily quota limits there (there is no paid option), so this is allowing me to slowly geocode more than 40,000 addresses offline using a paid API plan.

My ongoing problem is frequent hangs and timeouts. I've worked around some of them by disabling automatic recalculation and turning off "recalculate before saving" in Excel, but the script will still frequently hang in the middle of geocoding just a few hundred addresses. Sometimes it simply stops ticking up and I'm forced to start that batch over, and other times the batch acts like it's ended but just a handful of results were received.

Are you aware of any workarounds that might allow me to, ideally, start a run of 40,000 addresses overnight and have it complete by morning? Would introducing a delay in the formula or in the geocoding script code possibly have any benefit?

BTW, this is the formula I'm using, which first checks that the result cell is empty (to avoid recalculating unnecessarily), grabs a street address from the same row, and hands it to the geocoding script:

=IF(INDIRECT("AA"&ROW())<>"",AddressGeocode(INDIRECT("Y"&ROW())),"")

Any ideas?

Thanks,
Joel Risberg

tdowd commented

joel - thanks for posting this issue! i have actually been trying to solve for this problem very recently as i noticed the same thing happening. i tied to add a "wait" clause into the API call out to google, but did not have a whole lot of success.

one thing i was considering was making more of a "template" where you could input a bunch of addresses into the sheet and then click a button to process them all consecutively. the downside of this is that it would probably lock up your computer for some time if you have a bunch of them - the upside would be that i could time the API requests so that they ideally would not hang anymore (or, at least if something timed out, we could re-try the request after waiting to ensure that the request eventually goes through).

my understanding is that there used to be another excel geocoding product on github that did this - but i don't think it is being supported anymore.

any thoughts on if the template version could help? it sounds like it might be a step in the right direction for doing thousands of them.

thanks for the help,
ted

I'm not sure if your suggestion makes sense with our workflow. We start with a spreadsheet containing 40,000 addresses (among other things) and need to generate lat/long for each row in a new column.

I've considered playing with the "number of processors" option in Excel in case multithreading is overloading the script. I'll try reducing it to one processor and will let you know if that helps, but if you have any other ideas I'd be very interested.

Thanks,
Joel

tdowd commented

gotcha, i see. just to confirm, you'll have a spreadsheet of 40,000 rows, use the AddressGeocode() formula you have in your first post, and just drag it down all the rows and let it process?

if so, i can do some more testing and see if i can come up with a solution. i have a feeling i may be able to build in a small delay in the script or some way to prevent the requests from piling up.

That's correct. It's just that it fails so regularly that I'm forced to drag just a few hundred rows at a time, wait to see if it actually worked, then drag a few hundred more. That gets old quickly on 40,000 rows.

tdowd commented

ok, understood. let me play around with it and i'll report back with what i find. thanks again for bringing this up!

FYI I was running into problems with the previous formula, which may have confused Excel with a circular reference. I'm using this simpler one for now:

=AddressGeocode(INDIRECT("Y"&ROW()))

tdowd commented

i just updated the code and added a sleep function which should stop us from pinging the google API too frequently, which i believe is what was causing some of the hanging. using a 1,000 property test set, i was able to get geocodes back for all of them in about 30 seconds. would you mind giving the new app a try and let me know if it helps at all? i did not get a chance to test it on a 32-bit system (so if you are stuck on 32-bit windows we might have another issue) but hopefully this is a step in the right direction.

I'm using your modified code for the first time today and I'm afraid the results seem to be worse. Running geocoding on just 500 addresses at a time right now results in about 30% failure -- meaning no latitude/longitude returned.

It looks like I'll have to do a sort on blank results and re-run those rows in hopes of catching them on a second pass. Any other ideas?

Also, I don't get geocoding performance even approaching what you mentioned. Geocoding 500 addresses takes something like 5 minutes on my 64-bit machine with Excel 2016.

FYI, I modify the code to return nothing when geocoding fails (line 69) and to use a space between latitude and longitude (line 66) rather than a comma as specified by Fusion Tables. Maybe these could be configurable options?

Thanks,
Joel

tdowd commented

darn - okay, thank you for testing it out. are all your addresses located in the US? just want to make sure that my test set is relatively similar to yours.

that's a good idea on making the delimiter configurable - i can work on that.

the performance kind of worries me as i was able to get through them incredibly quickly. let me see if i can test on a couple other setups and that may give me some other ideas as to what i can do to help.

one thing you might try is in line 54, change Sleep (5) to Sleep (10). this is the lag i put in to slow down the queries we are sending to the google geocoder. my fist inclination was that the issue you were experiencing was because we were querying google too fast and needed to slow it down. adding a little more delay might increase the time it takes to run but it may give you a higher hit rate (if my theory is correct).

i will do some more testing in the mean-time.

I may have spoken too soon. I was inadvertently using the free geocoder (I'd forgotten to update my key when replacing the geocoder code with the new version). This was my first batch in over a week so I don't understand why the failure rate would be so high even with the free service. The good news is that when using a premium key the rate is not noticeably faster, but success rates are now as expected (98 to 99%).

And yes, my addresses are 100% U.S. and generally return results 98 to 99% of the time.

So I guess the news is mixed, but it's not as bad as I thought earlier. Sorry for the confusion.

Thanks.

tdowd commented

ah, okay, gotcha. so basically, it seems like it's working and you are getting accurate results, but it takes a while to get through them?

Yes. I'd say the geocoding speed isn't significantly changed from before your recent change, but it's not noticeably worse.

Thanks.

tdowd commented

alright, cool. i'll take accuracy over speed in the short term, but i'll let you know if i can think of a way to speed it up. just need to do a little more stress testing.