README

Prerequisites

Install pipenv
Install odbc drivers OR install Docker

Tutorial

Install SQLSorcery

Set up .env file

MS_SERVER=dchess.database.windows.net
MS_DB=data_whiz                                                                         
MS_SCHEMA=dbo                                                                      
MS_USER=sql_admin                                                                  
MS_PWD=DataWhiz2019

Download 2019 SBAC Research Files

This can be done manually or using some simple python:

import urllib.request

url = "http://caaspp-elpac.cde.ca.gov/caaspp/researchfiles/sb_ca2019_1_csv_v2.zip"
urllib.request.urlretrieve(url, "sb_ca2019_1_csv_v2.zip")

Unzip the files

Again, this can be done manually in the file system or with a little python:

from zipfile import ZipFile

with ZipFile("sb_ca2019_1_csv_v2.zip", "r") as z:
    z.extractall()

Read the CSV files into Pandas dataframes

import pandas as pd

entities = pd.read_csv("sb_ca2019entities_csv.txt", encoding="ISO-8859-1")
scores = pd.read_csv("sb_ca2019_1_csv_v2.txt", encoding="ISO-8859-1")

The additional encoding is necessary because the SBAC files do not use the standard UTF-8.

Load them as tables in the database

from sqlsorcery import MSSQL

sql = MSSQL()
sql.insert_into("sbac_entities_2019", entities, chunksize=1000)
sql.insert_into("sbac_scores_2019", scores, chunksize=1000)

The additional chunksize param allows for batch insertions, greatly improving performance.

Now you can query as you like. Some samples are provided.

from sqlsorcery import MSSQL

sql = MSSQL()
df = sql.query("SELECT * FROM sbac_entities_2019")

Run the source code

If you are using the provided Dockerfile:
```
$ docker build -t data_whiz .
$ docker run --rm -it data_whiz
```
If you have installed the Microsoft ODBC drivers locally:
```
$ pipenv run python main.py
```

dchess/data_whiz

README

Prerequisites

Tutorial