This project helps detect and log mismatches in student academic records (entry, dropout, GPA issues) using Python and SQL. The outputs are prepared for use in Power BI dashboards for risk analysis and early intervention.
- Python (Pandas, SQLite3)
- SQL (SQLite)
- Power BI (for dashboards)
main.py– Python script to run validations and log issuessql_queries.sql– SQL code to create tables and query datastudent_records.db– SQLite database (auto-created)
Ensure you have Python 3.x installed. Install pandas if not already available:
pip install pandas
Run the CREATE TABLE statements in sql_queries.sql using a SQLite client or add them inside main.py for first-time setup.
Optionally, insert mock student data into student_records.
Execute the Python script to find and log data quality issues:
python main.py
This will populate the issue_log table with detected mismatches.
Use the following SQL query from sql_queries.sql to get summary output:
SELECT
sr.program,
sr.term_status,
COUNT(il.issue_id) AS issue_count
FROM student_records sr
LEFT JOIN issue_log il ON sr.student_id = il.student_id
GROUP BY sr.program, sr.term_status;
Load this result into Power BI to create visuals like:
- Bar chart: Program vs. Issue Count
- Filters: GPA range, term status
- Conditional formatting: Highlight high-risk programs
- Missing program or term status
- GPA < 2.0 not flagged as "At Risk"
- Dropped students still marked "Active"