http://money.cnn.com/news/specials/storysupplement/circuitcity/
http://articles.latimes.com/2009/jan/17/business/fi-circuitcity17/2
Data src (see locations.txt)
- save list of addresses as json
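A minimal Python sketch of this step, assuming locations_pretty.txt holds one address per line. The function name addresses_to_json and the output path are hypothetical:

```python
import json


def addresses_to_json(txt_path, json_path):
    """Read one address per line and save the non-empty lines as a JSON list."""
    with open(txt_path, encoding="utf-8") as f:
        # strip() removes the newline plus any stray spaces left by the regex cleanup
        addresses = [line.strip() for line in f.readlines() if line.strip()]
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(addresses, f, indent=2)
    return addresses
```

For example, addresses_to_json("locations_pretty.txt", "addresses.json"), where the JSON filename is only an illustration.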
selenium:
- go to maps.google.com
- wait 5 seconds
- paste address into search bar
- hit enter key on search bar
- wait 10 seconds
- click 360 view image on left side panel (to escape "map view" and get satellite view)
- wait 10 seconds
- click zoom out button 3 times
- wait 15 seconds
- take screenshot (https://stackoverflow.com/a/6282203/1757149)
- capture name of current store from ad in left side panel
- create new json with uid (matching saved image filename), address, and current store name
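The last step could look like the following sketch. The zero-padded index used as the uid, the field names, and both helper names are hypothetical; the image name is what you would pass to Selenium's driver.save_screenshot() so that each JSON record and its screenshot share the same uid:

```python
import json


def record_for(index, address, store_name):
    """Build one output record; the uid doubles as the screenshot filename stem."""
    uid = f"{index:04d}"  # hypothetical scheme: zero-padded position in the address list
    return {
        "uid": uid,
        "image": f"{uid}.png",  # filename to hand to driver.save_screenshot()
        "address": address,
        "current_store": store_name,
    }


def write_records(records, json_path):
    """Save all collected records as one JSON list."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```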
Step 0:
The data contains the following fields for every store: Store #, Store Name, Street Address, City, State, ZIP. Use regex find/replace on locations.txt in Sublime Text to remove everything but the address:
[ ]{1}[0-9]{5}(?![0-9])
Finds every ZIP code (a space followed by exactly five digits), so that a newline can be added immediately after each one to delineate list items when readlines() is run on this file from Python.
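The same find/replace can be reproduced with Python's re module as a sanity check. The sample entries (store numbers, names, addresses) are made up; re.sub with \g<0> re-inserts the whole match in front of the new newline. One caveat: this pattern would also match a five-digit store or street number preceded by a space.

```python
import re

# Step 0 pattern 1: a single space followed by exactly five digits (a ZIP code)
ZIP = re.compile(r"[ ]{1}[0-9]{5}(?![0-9])")


def break_after_zips(text):
    """Insert a newline after every ZIP so each entry ends its own line."""
    return ZIP.sub(r"\g<0>\n", text)
```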
Next:
[\n][0-9]*(?=[ ])
Finds any series of digits between a newline and a space, i.e. the first set of digits in each entry (the Store #), in order to delete them.
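In Python, again with made-up sample entries. The match includes the leading newline, so deleting it outright would join each entry onto the previous line; this sketch replaces each match with a bare newline instead, which is an assumption about the intended replacement:

```python
import re

# Step 0 pattern 2: a newline, then the run of digits right before the first space
STORE_NO = re.compile(r"[\n][0-9]*(?=[ ])")


def drop_store_numbers(text):
    # The newline is part of the match, so put one back to keep entries on separate lines
    return STORE_NO.sub("\n", text)
```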
Then:
[\n](\D)+(?![0-9])
Finds all non-digit characters between a newline and the first digit (the store name), in order to delete them and be left with just the address of each entry.
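Same idea for the store name, on made-up entries and again replacing each match with a newline (an assumption). Because (\D)+ is greedy and the lookahead forbids a digit right after the match, the regex engine backs off by one character, so a single space stays in front of the street number; stripping each line afterwards removes it:

```python
import re

# Step 0 pattern 3: a newline followed by a run of non-digits (the store name)
STORE_NAME = re.compile(r"[\n](\D)+(?![0-9])")


def drop_store_names(text):
    # The newline is part of the match, so it is put back here as well
    return STORE_NAME.sub("\n", text)
```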
Finally, delete the Puerto Rico address because its format is different. Result: locations_pretty.txt
selenium.common.exceptions.ElementNotInteractableException: Message: Element <input id="searchboxinput" class="tactile-searchbox-input" name="q"> is not reachable by keyboard
Constantly having to start over, possibly because the page doesn't load fast enough :((
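One way to avoid restarting from scratch is to replace the fixed "wait N seconds" steps with a poll-until-ready loop that treats errors like the one above as "not ready yet". This is a generic, library-free sketch (wait_until and its parameters are hypothetical). Selenium also ships its own explicit wait, e.g. WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "searchboxinput"))), which may be the more idiomatic fix when working with Selenium directly:

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(Exception,)):
    """Call condition() until it returns something truthy or `timeout` seconds pass.

    Exceptions listed in `ignored` (e.g. an element that is not interactable yet)
    count as "not ready yet" instead of aborting the whole run.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored as exc:  # not ready yet: remember why, then retry
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(poll)
```

With Selenium, condition could be a small function that calls driver.find_element(By.ID, "searchboxinput") and returns the element only once is_displayed() and is_enabled() are both true, so the script waits only as long as the page actually needs.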