- Install pipenv
- Install the Microsoft ODBC drivers OR install Docker
- Install SQLSorcery
- Set up a `.env` file:

```
MS_SERVER=dchess.database.windows.net
MS_DB=data_whiz
MS_SCHEMA=dbo
MS_USER=sql_admin
MS_PWD=DataWhiz2019
```
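SQLSorcery reads these connection settings from the environment. As a rough sketch only (the exact connection-string format SQLSorcery builds internally may differ), the variables combine into a SQLAlchemy-style URL:

```python
import os

# The values from the .env file above; in practice SQLSorcery loads
# them from the file rather than having them set in code.
os.environ.update({
    "MS_SERVER": "dchess.database.windows.net",
    "MS_DB": "data_whiz",
    "MS_SCHEMA": "dbo",
    "MS_USER": "sql_admin",
    "MS_PWD": "DataWhiz2019",
})

# Illustrative only: the general shape of a pyodbc-backed
# SQL Server connection URL, not SQLSorcery's exact output.
url = (
    f"mssql+pyodbc://{os.environ['MS_USER']}:{os.environ['MS_PWD']}"
    f"@{os.environ['MS_SERVER']}/{os.environ['MS_DB']}"
)
print(url)
```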
- Download the 2019 SBAC Research Files

This can be done manually or using some simple Python:

```python
import urllib.request

url = "http://caaspp-elpac.cde.ca.gov/caaspp/researchfiles/sb_ca2019_1_csv_v2.zip"
urllib.request.urlretrieve(url, "sb_ca2019_1_csv_v2.zip")
```
- Unzip the files

Again, this can be done manually in the file system or with a little Python:

```python
from zipfile import ZipFile

with ZipFile("sb_ca2019_1_csv_v2.zip", "r") as z:
    z.extractall()
```
-
Read the CSV files into Pandas dataframes
import pandas as pd entities = pd.read_csv("sb_ca2019entities_csv.txt", encoding="ISO-8859-1") scores = pd.read_csv("sb_ca2019_1_csv_v2.txt", encoding="ISO-8859-1")
The additional encoding is necessary because the SBAC files do not use the standard UTF-8.
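To see why the explicit encoding matters, here is a small illustration (the accented string is a made-up example, not taken from the files): bytes encoded as ISO-8859-1 are not, in general, valid UTF-8:

```python
# A Latin-1 (ISO-8859-1) byte sequence containing a non-ASCII
# character, the kind of data that appears in the SBAC files.
data = "Española".encode("ISO-8859-1")

# Decoding it as UTF-8 fails, which is why pd.read_csv needs
# the explicit encoding argument.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(utf8_ok)                    # False
print(data.decode("ISO-8859-1"))  # Española
```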
- Load them as tables in the database

```python
from sqlsorcery import MSSQL

sql = MSSQL()
sql.insert_into("sbac_entities_2019", entities, chunksize=1000)
sql.insert_into("sbac_scores_2019", scores, chunksize=1000)
```

The additional `chunksize` parameter allows for batched insertions, which greatly improves performance.
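The effect of `chunksize` can be sketched with plain pandas: rows are split into fixed-size batches, and each batch becomes one bulk insert rather than one statement per row. The `chunked` helper below is illustrative, not part of SQLSorcery:

```python
import pandas as pd

def chunked(df, size):
    """Yield successive row batches of a dataframe, mirroring how a
    chunksize parameter groups rows into bulk insert statements."""
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]

# A stand-in dataframe; the real scores file has far more rows.
df = pd.DataFrame({"score": range(2500)})

batch_sizes = [len(batch) for batch in chunked(df, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```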
- Now you can query as you like. A sample is provided:

```python
from sqlsorcery import MSSQL

sql = MSSQL()
df = sql.query("SELECT * FROM sbac_entities_2019")
```
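The query result comes back as an ordinary pandas dataframe, so it can be filtered and aggregated locally. A minimal sketch with hypothetical column names (the real SBAC column names differ; check the downloaded files):

```python
import pandas as pd

# Stand-in for the dataframe returned by sql.query(...);
# the column names here are illustrative, not the actual SBAC schema.
df = pd.DataFrame({
    "CountyName": ["Los Angeles", "San Diego", "Los Angeles"],
    "SchoolName": ["School A", "School B", "School C"],
})

la_schools = df[df["CountyName"] == "Los Angeles"]
print(len(la_schools))  # 2
```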
- Run the source code

If you are using the provided Dockerfile:

```shell
$ docker build -t data_whiz .
$ docker run --rm -it data_whiz
```

If you have installed the Microsoft ODBC drivers locally:

```shell
$ pipenv run python main.py
```