elaske/tsp-data

Compare CSV and HTML data gathering methods.

Closed this issue · 3 comments

The most obvious question is which method is faster. The CSV version should gain from needing less parsing, but the HTML version might get posted back to us quicker (less server processing time?).

We need to measure this to be sure which option is better. The faster one will become the default; the other will stay in the code as a user option.

In a preliminary test, the only change was the form data ("CSV" versus "Retrieve"); the script otherwise ran exactly the same way and stopped as soon as the response had been received from the server. Both variants took about the same time, 2.2 to 2.4 seconds. This was only a rough, informal check, but it suggests the biggest difference is going to come from the processing of the data.

Created some tests using timeit in 4eb748c.

Results (100 loops, in seconds as reported by timeit):
CSV request into list of row lists = 26.6854787864
HTML request into data_dict-type = 32.8403614132
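For reference, a comparison of this kind can be sketched with timeit roughly as follows. This is not the code from 4eb748c: the sample payloads, the `TableParser` class, and the function names are all hypothetical, and the sketch times only the parsing step on in-memory strings, whereas the numbers above include the request itself.

```python
import csv
import io
import timeit
from html.parser import HTMLParser

# Hypothetical sample payloads; the real ones come back from the server.
CSV_TEXT = "date,a,b\n2012-01-03,1.0,2.0\n2012-01-04,1.5,2.5\n"
HTML_TEXT = (
    "<table>"
    "<tr><th>date</th><th>a</th><th>b</th></tr>"
    "<tr><td>2012-01-03</td><td>1.0</td><td>2.0</td></tr>"
    "<tr><td>2012-01-04</td><td>1.5</td><td>2.5</td></tr>"
    "</table>"
)


def parse_csv(text):
    """Parse CSV text into a list of row lists (header row included)."""
    return list(csv.reader(io.StringIO(text)))


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> into a list of row lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def parse_html(text):
    """Parse an HTML table into a dict keyed by the first column.

    The dict layout is an assumption; the real data_dict may differ.
    """
    parser = TableParser()
    parser.feed(text)
    parser.close()
    header, *body = parser.rows
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}


def time_parsers(loops=100):
    """Return the total seconds each parser needs for `loops` runs."""
    return {
        "csv": timeit.timeit(lambda: parse_csv(CSV_TEXT), number=loops),
        "html": timeit.timeit(lambda: parse_html(HTML_TEXT), number=loops),
    }


if __name__ == "__main__":
    for name, seconds in time_parsers().items():
        print(name, seconds)
```

Timing the parsers separately from the request would make it easier to tell how much of the gap is processing and how much is the server.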

The CSV version is faster (about 19% less time), but part of that lead will be eroded, because the CSV test does not yet include the processing required to build a data structure comparable to the HTML version's data_dict.
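To make the comparison fair, the CSV path would need an extra step along these lines before timing. This is only a sketch: the layout of data_dict is not shown in this thread, so the shape used here (first column as key, remaining columns as a nested dict) and the name `rows_to_dict` are assumptions.

```python
def rows_to_dict(rows):
    """Turn a list of row lists (header first) into a dict of dicts.

    Assumed layout: {first_column_value: {column_name: value, ...}}.
    The real data_dict may be structured differently.
    """
    if not rows:
        return {}
    header, *body = rows
    # Skip blank rows, which csv.reader yields for empty lines.
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body if row}


if __name__ == "__main__":
    sample = [["date", "a", "b"], ["2012-01-03", "1.0", "2.0"]]
    print(rows_to_dict(sample))
```

Including a step like this inside the timed CSV function would show how much of the CSV advantage survives once both methods end at the same data structure.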

CSV has one other advantage: network usage. The CSV requests averaged about 25 kB/s down on my computer versus about 75 kB/s for the HTML requests.

This might mean we can choose one of #8 and #16 rather than having to support both.