
mlinspect-SQL

This is an SQL extension to the mlinspect framework that transpiles Python library functions to SQL for execution within a database system.
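
As a purely conceptual illustration (this is not output produced by the framework): relational operations written with pandas have direct SQL counterparts, which is what the transpilation exploits.

import pandas as pd

# A pandas selection such as this one ...
patients = pd.DataFrame({"age": [25, 67], "smoker": [True, False]})
seniors = patients[patients["age"] > 60]

# ... conceptually corresponds to a SQL query along the lines of
#   SELECT * FROM patients WHERE age > 60
# so that the filtering runs inside the database system instead of in Python.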

Run mlinspect locally

Prerequisite: Python 3.8

  1. Clone this repository

  2. Set up the environment

    python -m venv venv
    source venv/bin/activate

  3. If you want to use the visualisation functions we provide, install graphviz, which cannot be installed via pip

    Linux: apt-get install graphviz
    macOS: brew install graphviz

  4. Install pip dependencies

    pip install -e .[dev]

  5. To ensure everything works, you can run the tests (without graphviz, the visualisation test will fail)

    python setup.py test
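
As an additional quick sanity check (independent of the test suite), you can verify that the package imports from the activated environment:

# Minimal import check: this only succeeds if mlinspect was installed correctly
from mlinspect import PipelineInspector
print(PipelineInspector)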

How to use the SQL backend

We prepared two examples: the first demonstrates the execution of machine learning pipelines only; the second demonstrates a full end-to-end machine learning pipeline and compares the performance of different backends.

To run the latter, you need a PostgreSQL database system running in the background (on port 5432) with a user luca (password password) that is allowed to copy from CSV files and has access to the respective database. (https://www.postgresql.org/download/linux/ubuntu/)

# After installing PostgreSQL, switch to the postgres user and open psql:
sudo -i -u postgres
psql

-- Inside psql, create the benchmark user and database:
create user luca;
alter role luca with password 'password';
grant pg_read_server_files to luca;
create database healthcare_benchmark;
grant all privileges on database healthcare_benchmark to luca;

To also run the benchmarks in Umbra, you need an Umbra server running at port 5433.

For more information on the functions supported with respect to execution outsourced to the DBMS, please see here.

How to use mlinspect

mlinspect makes it easy to analyze your pipeline and automatically check for common issues.

from mlinspect import PipelineInspector
from mlinspect.inspections import MaterializeFirstOutputRows
from mlinspect.checks import NoBiasIntroducedFor

IPYNB_PATH = ...

inspector_result = PipelineInspector\
        .on_pipeline_from_ipynb_file(IPYNB_PATH)\
        .add_required_inspection(MaterializeFirstOutputRows(5))\
        .add_check(NoBiasIntroducedFor(['race']))\
        .execute()

extracted_dag = inspector_result.dag
dag_node_to_inspection_results = inspector_result.dag_node_to_inspection_results
check_to_check_results = inspector_result.check_to_check_results
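
Both result attributes behave like plain Python mappings (as their names suggest), so a quick way to get an overview is to iterate over them; a minimal sketch using only the objects created above:

# Print the inspection results per DAG node and the outcome of each check
for dag_node, inspection_results in dag_node_to_inspection_results.items():
    print(dag_node, inspection_results)

for check, check_result in check_to_check_results.items():
    print(check, check_result)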

With execution outsourced to a Database Management System (DBMS):

from mlinspect.to_sql.dbms_connectors.postgresql_connector import PostgresqlConnector
from mlinspect import PipelineInspector
from mlinspect.inspections import MaterializeFirstOutputRows
from mlinspect.checks import NoBiasIntroducedFor

dbms_connector = PostgresqlConnector(...)

IPYNB_PATH = ...

inspector_result = PipelineInspector\
        .on_pipeline_from_ipynb_file(IPYNB_PATH)\
        .add_required_inspection(MaterializeFirstOutputRows(5))\
        .add_check(NoBiasIntroducedFor(['race']))\
        .execute_in_sql(dbms_connector=dbms_connector, mode="VIEW", materialize=True)

extracted_dag = inspector_result.dag
dag_node_to_inspection_results = inspector_result.dag_node_to_inspection_results
check_to_check_results = inspector_result.check_to_check_results
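
To fill in the PostgresqlConnector(...) placeholder, pass the connection details of the database prepared above. A minimal sketch, assuming keyword arguments along these lines (the exact parameter names are an assumption; check the constructor's signature):

# Illustrative only: the parameter names below are assumptions, not the verified signature
dbms_connector = PostgresqlConnector(dbname="healthcare_benchmark",
                                     user="luca",
                                     password="password",
                                     port=5432,
                                     host="localhost")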