- Course: HHA 504
- Homework assignment #3: Introduction to the world of databases, starting with SQLite. Integrate data processing with Python, utilize Pandas for exploratory data analysis, and conduct database operations using SQLite.
- Used datasets from New York Presbyterian Hospital and St. Joseph Hospital. The datasets contain information about standard pricing for hospital services.
- Conducted summary statistics for both datasets. For the columns with numerical data, I calculated the count, mean, minimum, maximum, standard deviation, and quartiles. For the columns with catergorical data, I calculated the count, frequency, top data, and unique data.
- Created histograms for both datasets. For the New York Presbyterian Hospital dataset, I specifically looked at the "minimum negotiated charges" column. For the St. Joseph Hospital dataset, I specifically looked at the "package size" column.
- Import SQLite:
import sqlite3
. - Import create_engine:
from sqlalchemy import create_engine
.
- Create a connection object using
sqlite3.connect(<'name of database'>)
. - Create a cursor object using
.cursor()
. - Create a new table with columns using the SQL command
CREATE TABLE <name of table>
. Then, execute it usingexecute()
. Make sure to commit the changes using.commit()
. - Create and insert new rows with data into the table using the SQL command
INSERT INTO <name of table>
. Then, execute it usingexecute()
. Make sure to commit the changes using.commit()
- Turn the table into a pandas dataframe using
pd.read_sql()
.
- Close the connection and cursor using
.close()
. For more details about the code click here