Two versions of a scraping script for tabulated weather data from a Davis® Weatherlink web page. If that page's HTML changes, the scripts can easily break; they were created for testing and personal use. The following modules are imported in one or both scripts:
- `urllib3` is the HTTP client; its `PoolManager` can pool connections to multiple servers.
- `BeautifulSoup` is a Python wrapper around an HTML parser, used for traversing, searching and modifying the parsed tree.
- `AsciiTable` uses `-` for horizontal lines, `|` for vertical lines and `+` for intersections to construct tabular data grids.
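To illustrate that grid style, here is a minimal pure-Python sketch of the `-`/`|`/`+` bordered layout. This is only an illustration of the output format, not the `AsciiTable` library itself; the `ascii_grid` helper and the sample rows are made up:

```python
def ascii_grid(rows):
    """Render a list of rows (lists of strings) as a +-| bordered grid."""
    # Column width = widest cell in that column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    # Horizontal rule: "-" segments joined by "+" at each intersection.
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [rule]
    for row in rows:
        # Pad each cell to its column width, separate columns with "|".
        cells = " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + cells + " |")
        lines.append(rule)
    return "\n".join(lines)

main_data = [["Sensor", "Value"], ["Outside Temp", "72.4 F"], ["Humidity", "41%"]]
print(ascii_grid(main_data))
```

Note that the real `AsciiTable` only draws a rule after the header row by default; this sketch rules every row for simplicity.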
```python
http = urllib3.PoolManager()
```
: urllib3's `PoolManager()` handles arbitrary server requests.

```python
req = http.request("GET", "http://www.weatherlink.com/user/gooselakewx/index.php?view=summary&headers=0")
```
: the `GET` request.
```python
if req.status == 200:
    blob = req.data
else:
    print("Check req.status")
```
- the conditional statement above checks for an `OK` connection, i.e. status `200`.

```python
soup = BeautifulSoup(blob, "html.parser")
```
: parses the HTML so text can be pulled from elements; in this repo, mostly `<td>` elements from tabular data.

```python
f'{soup.select("td:nth-of-type(14)")[0].string}'
```
: in this f-string, the `select()` method targets a specific `<td>` element, and `.string` returns its text.
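To show what that kind of `<td>` extraction does without a live page, here is a stdlib-only sketch using `html.parser`. The `TdTextCollector` class and the sample HTML are assumptions for illustration, not code from this repo:

```python
from html.parser import HTMLParser

class TdTextCollector(HTMLParser):
    """Collect the text content of every <td> element, in document order."""
    def __init__(self):
        super().__init__()
        self._in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # accumulate text inside the current <td>

sample = "<table><tr><td>Temp</td><td>72.4 F</td></tr></table>"
parser = TdTextCollector()
parser.feed(sample)
print(parser.cells[1])  # prints "72.4 F", like selecting the second <td>
```

Indexing into the collected cells plays the role the CSS `nth-of-type` selector plays in the BeautifulSoup version.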
```python
table = AsciiTable(main_data)
r_table = AsciiTable(rainfall)
s_table = AsciiTable(soil)

print(table.table)
print(r_table.table)
print(s_table.table)
```
: in the variables defined above, the `AsciiTable()` class receives the data arrays and formats them into the three tables shown below: