TorchQL is a query language for Python-based machine learning models and datasets.
Try TorchQL using our demo in this Colab notebook.
The easiest way to install TorchQL is as a Python package:
pip install torchql
Alternatively, install TorchQL from source to contribute and keep up with the latest updates:
git clone https://github.com/TorchQL/torchql.git
cd torchql
pip install . # add the -e option to install an editable version of the package
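After either install route, a quick way to confirm that the package is visible to your interpreter is to look it up by its import name. The snippet below is a minimal sketch using only the Python standard library; the import name torchql is taken from the example that follows, and the variable name installed is just illustrative.

```python
import importlib.util

# find_spec returns None when the named top-level package cannot be located
# on the current interpreter's import path.
installed = importlib.util.find_spec("torchql") is not None
print("torchql importable:", installed)
```

If this reports False, the package was probably installed into a different Python environment than the one running the check.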
Here is a simple example of a query that can be written in TorchQL. It loads the MNIST training dataset and extracts samples with the label equal to 7.
First, we set up a TorchQL database:
from torchvision import datasets
from torchql import Database, Query
train_data = datasets.MNIST(
    root='data',
    train=True,
    download=True,
)
db = Database("mnist")
db.register_dataset(train_data, "train")
Observe that we can directly instantiate a TorchQL table from the PyTorch MNIST train dataset. Next, we write the query and execute it on this dataset:
q = Query('seven', base='train').filter(lambda img, label: label == 7)
print(q(db).sample())
The TorchQL Query object is instantiated with a name (here seven) and a base table over which operations can be specified (here train).
We then specify a filter operation that keeps only the records whose label is 7.
Each record contains an image and its label.
We run this query on the database using q(db), and randomly sample a single record from the resulting table.
This is the output of running the above code:
Filtering: 100%|██████████| 60000/60000 [00:00<00:00, 992096.76it/s]
(<PIL.Image.Image image mode=L size=28x28>, 7)
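To make the filter-then-sample semantics concrete, here is a plain-Python sketch of the same selection that does not use TorchQL at all. It is only an illustration of what the query computes, not how TorchQL implements it: the records list and the filter_records helper are hypothetical stand-ins, and strings replace the PIL images so the sketch runs without torchvision.

```python
import random

# Hypothetical stand-in for the (image, label) records in the "train" table.
# Labels cycle through 0..9, so exactly 10 of the 100 records have label 7.
records = [(f"img_{i}", i % 10) for i in range(100)]


def filter_records(rows, predicate):
    """Keep the rows for which predicate(*row) is true.

    Each row is unpacked into the predicate's arguments, which is why the
    lambda below can take (img, label) just like the one in the query above.
    """
    return [row for row in rows if predicate(*row)]


sevens = filter_records(records, lambda img, label: label == 7)

# Draw one record at random, analogous to sampling from the query result.
sample = random.choice(sevens)
print(len(sevens), sample[1])
```

As in the TorchQL example, the predicate receives the fields of each record as separate arguments, and the sampled record always carries the label 7.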
Please refer to the documentation and the demo for an in-depth description of each TorchQL feature.
You can find more documentation on TorchQL here.
@inproceedings{naik2023torchql,
title={TorchQL: A Programming Framework for Integrity Constraints in Machine Learning},
author={Aaditya Naik and Adam Stein and Yinjun Wu and Mayur Naik and Eric Wong},
booktitle={OOPSLA},
year={2024}
}