Data Analysis Using SQLAlchemy

Background

For this challenge, I decided to treat myself to a long holiday vacation in Honolulu, Hawaii. To help with my trip planning, I decided to do a climate analysis about the area. The following sections outline the steps that I took to accomplish this task.

Part 1: Analyzing and Exploring the Climate Data

In this section, I used Python and SQLAlchemy to do a basic climate analysis and data exploration of my climate database. Specifically, I utilized SQLAlchemy Object Relational Mapper queries, Pandas, and Matplotlib. I used the provided hawaii.sqlite database to store the data and climate-analysis.ipynb to complete my climate analysis. Using these files, I completed the following steps:

Used the SQLAlchemy create_engine() function to connect to my SQLite database.
Used the SQLAlchemy automap_base() function to reflect my tables into classes, and then saved references to the classes named station and measurement.
Linked Python to the database by creating a SQLAlchemy session.

Figure 1. Importing dependencies, creating the engine, reflecting the tables, creating a session, and saving the measurement and station information into variables

Precipitation Analysis

I performed a precipitation analysis by completing the steps.

Found the most recent date in the dataset.
Using that date, I obtained the previous 12 months of precipitation data by querying the previous 12 months of data.
Selected only the "date" and "prcp" values.
Loaded the query results into a Pandas DataFrame. I explicitly set the column names.
Sorted the DataFrame values by "date".
Plotted the results by using the DataFrame plot method, as the following image shows:

Figure 2. Bar graph of montly precipitation data.

Use Pandas to print the summary statistics for the precipitation data.

Figure 2. Summary statistics of precipitation data.

Station Analysis

I performed a station analysis by completing the steps:

Designed a query to calculate the total number of stations in the dataset.
Designed a query to find the most-active stations (that is, the stations that have the most rows). To do so, I completed the following steps:
- Listed the stations and observation counts in descending order.
- The station id with the greatest number of observations is USC00519281
Designed a query that calculates the lowest, highest, and average temperatures for station USC00519281 (the most-active station).

Figure 3. Queries to locate the most-active station and the lowest, highest, and average temperatures for the most-active station.

Designed a query to get the previous 12 months of temperature observation (TOBS) data. To do so, I completed the following steps:
- Filtered by the station that has the greatest number of observations.
- Queried the previous 12 months of TOBS data for that station.
- Plotted the results as a histogram with bins=12, as the following image shows:

Figure 4. Bar Graph of the frequency of each temperature for the most-active station.

Closed my session.

Part 2: Climate App Using Flask

After I completed my climate analysis in jupyter, I designed a Flask API based on the queries that I just developed above. To do so, I used Flask to create my routes as follows:

Route 1: Homepage

Start at the homepage.
Lists all the available routes.

Route 2: /api/v1.0/precipitation

Converts the query results from my precipitation analysis (i.e. retrieve only the last 12 months of precipitation data) to a dictionary using date as the key and prcp as the value.
Returns the JSON representation of my dictionary.

Route 3: /api/v1.0/stations

Returns a JSON list of stations from the dataset.

Route 4: /api/v1.0/tobs

Queries the dates and temperature observations of the most-active station for the previous year of data.
Returns a JSON list of temperature observations for the previous year.

Route 5: /api/v1.0/<start> and /api/v1.0/<start>/<end>

Returns a JSON list of the minimum temperature, the average temperature, and the maximum temperature for a specified start or start-end range.
For a specified start, the max, min, and average temperature will be calculated for all the dates greater than or equal to the start date.
For a specified start date and end date, the max, min, and average temperature will be calculated for the dates from the start date to the end date, inclusive.