/gwells_geocode_and_archive_data

schedule_github_actions_to_save_csv_to_amazon_s3

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0)

GWELLS Download, Geocode, QA and archive data re

The purpose of the code in this repo is to maintain, in the data/ folder, an archive of the registered groundwater wells data provided by the Government of British Columbia.

Three CSVs appear in the data/ folder. They are all updated on a daily basis:

  • gwells_data_first_appearance.csv keeps a record of each well as it was on the day it was added to the gwells csv. A well's information is never updated afterwards, which allows us to go back in time and generate a record for any time period. Wells are identified by their well_tag_number. The columns are the same as in gwells.csv, plus a date_added column holding the first date on which this script saw that well_tag_number.

  • wells_geocoded.csv is the result of passing gwells_data_first_appearance.csv through the Python gwells_locationqa geocode script.

  • gwells_locationqa.csv is the result of passing gwells_data_first_appearance.csv and wells_geocoded.csv through the Python gwells_locationqa qa script.
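The "first appearance" rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code this repo runs: the function name append_first_appearances and the street_address column are hypothetical, and only well_tag_number and date_added come from the description above.

```python
from datetime import date


def append_first_appearances(archive_rows, todays_rows, today):
    """Return the archive plus any wells that are not in it yet.

    Rows already in the archive are kept untouched, even if the same
    well_tag_number shows up with different values in today's extract.
    """
    seen = {row["well_tag_number"] for row in archive_rows}
    merged = list(archive_rows)
    for row in todays_rows:
        tag = row["well_tag_number"]
        if tag not in seen:
            # First time this well is seen: stamp it with today's date.
            merged.append({**row, "date_added": today.isoformat()})
            seen.add(tag)
    return merged


# street_address is a placeholder column used for illustration only.
archive = [
    {"well_tag_number": "1", "street_address": "123 Main St", "date_added": "2022-01-01"},
]
todays_extract = [
    {"well_tag_number": "1", "street_address": "123 Main Street"},  # edited upstream
    {"well_tag_number": "2", "street_address": "456 Oak Ave"},  # new well
]
merged = append_first_appearances(archive, todays_extract, date(2022, 1, 2))
```

In this sketch well 1 keeps its original archived values, and well 2 is appended with the date of the run in which it first appeared.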

The updates run daily thanks to a scheduled GitHub Action that depends on the Docker image created specifically for this project. The image is built on the rocker/geospatial:4.1.2 Docker image and was tailored to include all the R, Python and spatial dependencies required to run the Python scripts created by Simon Norris.

The three CSVs are then used to feed the Shiny app (code) created for this Code With Us project.

The information below comes from another source:

Well extracts for GWELLS (Groundwater Wells and Aquifers)

CSV Format notes

https://docs.python.org/3/library/csv.html:

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

Well extracts are generated in Python 3 using the csv library "excel" dialect (see https://docs.python.org/3/library/csv.html#csv.excel)

Recommendations

It is suggested that you use a mature library rather than attempting to write bespoke code to read CSV data. For example, Python 3 comes with the csv module to read and write CSV data.
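As a minimal sketch of that recommendation, the snippet below reads an extract with the standard csv module and the "excel" dialect. The helper name read_wells, the comments column and the utf-8 encoding mentioned in the comment are assumptions made for the example.

```python
import csv
import io


def read_wells(csv_file):
    """Read a well extract into a list of dicts keyed by column name."""
    reader = csv.DictReader(csv_file, dialect="excel")
    return list(reader)


# With a real file, open it with newline="" so the csv module sees the
# original line endings (the utf-8 encoding here is an assumption):
#     with open("gwells.csv", newline="", encoding="utf-8") as f:
#         rows = read_wells(f)

# In-memory sample with a hypothetical "comments" column that contains a comma.
sample = io.StringIO('well_tag_number,comments\r\n1,"deep, drilled well"\r\n', newline="")
rows = read_wells(sample)
```

The quoted comma stays inside the comments value instead of splitting the row into an extra column.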

Common problems when reading GWELLS CSV data

Blank records or data that doesn't match up with columns

It may be that your application does not correctly handle escaped line-break characters. See RFC 4180, Section 2, point 6:

Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

"aaa","b CRLF

bb","ccc" CRLF

zzz,yyy,xxx
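To make the problem concrete, the sketch below parses the RFC 4180 example above twice: once by naively splitting on line breaks and commas, and once with Python's csv module. The variable names are illustrative only.

```python
import csv
import io

# The RFC 4180 example, with CRLF written out as \r\n.
data = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx\r\n'

# Naive approach: every line break is treated as the end of a record,
# so the quoted field is cut in half and 3 misaligned records come out.
naive_records = [line.split(",") for line in data.splitlines()]

# csv.reader keeps the line break inside the quoted field and yields
# 2 records of 3 fields each.
parsed_records = list(csv.reader(io.StringIO(data, newline="")))
```

If your own reader produces the first result, blank or shifted records are the typical symptom.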

Columns/data suddenly no longer available or changed

The data in the well export closely matches the current state of the data in the GWELLS web application. As a result, the structure may change from time to time.

  • Column names and positions may change at any time.
  • Columns may be added or removed at any time.
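One way to cope with this, sketched below, is to read columns by name rather than by position and to fail loudly when an expected column is missing. The helper read_required and the REQUIRED_COLUMNS set are hypothetical; adapt the set to the columns your application actually needs.

```python
import csv
import io

# Columns this consumer cannot work without (an assumption for the example).
REQUIRED_COLUMNS = {"well_tag_number"}


def read_required(csv_file, required=REQUIRED_COLUMNS):
    """Return only the required columns, looked up by name, not by position."""
    reader = csv.DictReader(csv_file)
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"extract is missing expected columns: {sorted(missing)}")
    return [{name: row[name] for name in required} for row in reader]


# A reordered extract with an unknown extra column is still read correctly.
sample = io.StringIO("extra,well_tag_number\r\nx,42\r\n", newline="")
rows = read_required(sample)
```

A removed or renamed column then raises a clear error at load time instead of silently producing shifted data.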

Frequency

Data extracts should be generated daily but may fail to be generated for various reasons.

Other sources for well information