/esmr_data

Electronic Self Monitoring Report parsing and analysis scripts

Primary LanguageJupyter NotebookMIT LicenseMIT

Electronic Self Monitoring Data (ESMR)

California Integrated Water Quality System Project (CIWQS) collects the data for National Pollutant Discharge Elimination System (NPDES). Electronic Self Monitoring Data is available via the webpage https://ciwqs.waterboards.ca.gov/ciwqs/readOnly/CiwqsReportServlet?inCommand=reset&reportName=esmrAnalytical

This data is also available from California's Open Data website: https://data.ca.gov/dataset/water-quality-effluent-electronic-self-monitoring-report-esmr-data. This project parses the data from the .csv file (large > 8GB+) and uses the descriptions as defined in the data dictionary available that site to define classes that can make sense of the data

Conceptual model

The main class is ESMR which takes a dataframe of the entire .csv file in its construction. This class can return a list of Facility names and also a Facility given a specific name.

The key concept is that of a Facility (usually a Waster Water Treament Plant (WWTP)). A Facility has a name and id and can have many Location(s).

Each Location has a name, e.g. EFF-001, of type such as 'Effluent Montioring' in this example. In addition it has also has the concept of a geo referenced latitude and longitude (this may be empty in many cases)

Each Location can measure various Parameter(s). Each Parameter has a name and is represented by one or more Variable(s).

A Variable has a name (same as Parameter), units, calculated method (such as daily mean, daily max, weekly mean, etc.). The result pandas DataFrame attached to the Variable contains the time series of values for that variable

Each of the concepts above has an attached pandas DataFrame, df, that is a subset corresponding that to that concept. E.g. the Facility has a df attribute that has a dataframe for just that Facility.

classDiagram
class ESMR{
df: DataFrame
get_region_names(): list(str)
get_facility_names(region:str): list(str)
get_facility(name): Facility
}

class Facility{
df: DataFrame
name: str
region: str
place_id: int
get_location_names(): list(str)
get_location(location:str): Location
get_locations_of_type(location_place_type='Effluent Monitoring'): list(Location)
}

class Location{
df: DataFrame
name: str
facility: Facility
place_id: int
place_type: str
desc: str
get_parameter_names(): list(str)
get_parameter(parameter): Parameter
}

class Parameter {
df: DataFrame
name: str
location: Location
get_variables(): list(Variable)
}

class Variable{
    df: DataFrame
    name: str
    calculated_method: str
    units: str
    parameter: Parameter
    result: DataFrame 
    analytical_method_code: str
    qualifier: str
    mdl: DataFrame
    ml: DataFrame
    rl: DataFrame
}

ESMR *-- Facility
Facility *-- Location
Location --> Facility
Location *-- Parameter
Parameter --> Location
Parameter *-- Variable
Variable --> Parameter