Concrete ML: Privacy Preserving ML framework built on top of Concrete, with bindings to traditional ML frameworks.

📒 Documentation | 💛 Community support | 📚 FHE resources by Zama

About

What is Concrete ML

Concrete ML is an open-source set of Privacy-Preserving Machine Learning (PPML) tools built on top of Concrete by Zama.

It simplifies the use of fully homomorphic encryption (FHE) for data scientists so that they can automatically turn machine learning models into their homomorphic equivalents, and use them without knowledge of cryptography.

Concrete ML is designed with ease of use in mind. Data scientists can use models with APIs that are close to the frameworks they already know well, while additional options to those models allow them to run inference or training on encrypted data with FHE. The Concrete ML model classes are similar to those in scikit-learn and it is also possible to convert PyTorch models to FHE.

Main features

  • Built-in models: Ready-to-use FHE-friendly models with a user interface equivalent to that of their scikit-learn and XGBoost counterparts.
  • Custom models: Concrete ML supports models that can use quantization-aware training. These are developed by the user with PyTorch or Keras/TensorFlow and are imported into Concrete ML through ONNX.

Learn more about Concrete ML features in the documentation.

Use cases

By leveraging FHE, Concrete ML can unlock many new use cases for machine learning, such as enabling secure and private data collaboration, protecting sensitive data while still allowing for analysis, and facilitating machine learning on data-sets that are subject to strict data privacy regulations. For instance:

  • Healthcare data analysis: Improve patient care while maintaining privacy by allowing secure, confidential data sharing between healthcare providers.
  • Financial services: Facilitate secure financial data analysis for risk management and fraud detection, keeping client information encrypted and safe.
  • Ad campaign tracking: Create targeted advertising and campaign insights in a post-cookie era, ensuring user privacy through encrypted data analysis.
  • Industries: Enable predictive maintenance in the cloud while keeping sensitive data confidential, enhancing efficiency and data security.
  • Biometrics: Build user authentication applications without revealing users' identities.
  • Government: Enable governments to create digitized versions of their services without having to trust cloud providers.

See more use cases in the list of demos.

Getting Started

Installation

Depending on your OS, Concrete ML may be installed with Docker or with pip:

OS / HW                                   Available on Docker   Available on pip
Linux                                     Yes                   Yes
Windows                                   Yes                   Coming soon
Windows Subsystem for Linux               Yes                   Yes
macOS 11+ (Intel)                         Yes                   Yes
macOS 11+ (Apple Silicon: M1, M2, etc.)   Yes                   Yes

Note: Concrete ML only supports Python 3.8, 3.9 and 3.10. Concrete ML can be installed on Kaggle (see this question on the community for more details) and on Google Colab.
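Before installing, it can help to confirm that the active interpreter is one of the versions listed in the note above. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and is not part of Concrete ML.

```python
import sys

# Versions taken from the note above; adjust this tuple if the supported range changes.
SUPPORTED_VERSIONS = ((3, 8), (3, 9), (3, 10))


def is_supported_python(version=None):
    """Return True if the (major, minor) pair is one of the listed versions.

    Hypothetical helper for illustration; it is not provided by Concrete ML.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "listed as supported" if is_supported_python() else "not listed as supported"
    print(f"Python {current} is {status} for Concrete ML")
```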

Docker

To install with Docker, pull the concrete-ml image as follows:

docker pull zamafhe/concrete-ml:latest

Pip

To install Concrete ML from PyPI, run the following:

pip install -U pip wheel setuptools
pip install concrete-ml

Find more detailed installation instructions in this part of the documentation.

A simple example

Here is a simple logistic regression example whose code is very close to its scikit-learn equivalent:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Let's create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)

# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)

# We then compile on a representative set
model.compile(X_train)

# Finally, we run the inference on encrypted inputs
y_pred_fhe = model.predict(X_test, fhe="execute")

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")

# Output:
    # In clear  : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    # In FHE    : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    # Similarity: 100%
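To give an intuition for what the `n_bits` argument controls in the example above, here is a minimal pure-Python sketch of uniform quantization. It is an illustration under stated assumptions, not Concrete ML's actual quantization scheme: the functions `quantize`, `dequantize` and `max_round_trip_error` and the sample weights are hypothetical.

```python
def quantize(values, n_bits):
    """Map floats onto integers in [0, 2**n_bits - 1] with a uniform affine scheme."""
    low, high = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Guard against a constant input, where the range would be zero.
    scale = (high - low) / levels if high > low else 1.0
    q_values = [round((v - low) / scale) for v in values]
    return q_values, scale, low


def dequantize(q_values, scale, low):
    """Map integers back to approximate floats."""
    return [q * scale + low for q in q_values]


def max_round_trip_error(values, n_bits):
    """Largest absolute difference after a quantize/dequantize round trip."""
    q_values, scale, low = quantize(values, n_bits)
    restored = dequantize(q_values, scale, low)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [-1.30, -0.42, 0.0, 0.57, 2.10]  # hypothetical trained weights
for n_bits in (2, 4, 8):
    print(f"n_bits={n_bits}: max error = {max_round_trip_error(weights, n_bits):.4f}")
```

In this sketch the round-trip error is bounded by half the quantization step, so it shrinks as `n_bits` grows.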



It is also possible to call encryption, model prediction, and decryption functions separately as follows. Executing these steps separately is equivalent to calling predict_proba on the model instance.

# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)
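The separate steps above (quantize, compute on integers, de-quantize, post-process) can be mimicked in the clear to see why the two approaches should agree. The sketch below is a simplified stand-in that assumes symmetric quantization and a sigmoid post-processing step. It performs no encryption, and none of its function names, weights or inputs come from Concrete ML.

```python
import math


def symmetric_quantize(values, n_bits):
    """Quantize floats to signed integers; returns (integers, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / (2 ** (n_bits - 1) - 1)
    return [round(v / scale) for v in values], scale


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_float(x, weights, bias):
    """Reference computation in floating point."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def predict_proba_quantized(x, weights, bias, n_bits=8):
    """Same computation, with the dot product done on integers only."""
    q_w, s_w = symmetric_quantize(weights, n_bits)
    q_x, s_x = symmetric_quantize(x, n_bits)         # analogous to quantizing the input
    q_y = sum(qw * qx for qw, qx in zip(q_w, q_x))   # integer-only part (the step that would run in FHE)
    y = q_y * s_w * s_x + bias                       # analogous to de-quantizing the output
    return sigmoid(y)                                # analogous to post-processing


# Hypothetical weights and input, chosen for illustration only.
weights, bias = [0.8, -1.5, 0.3], 0.1
x = [1.2, 0.4, -2.0]
print("float    :", predict_proba_float(x, weights, bias))
print("quantized:", predict_proba_quantized(x, weights, bias))
```

The two printed probabilities should be close but not necessarily identical, since quantization introduces a small rounding error.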

This example is explained in more detail in the linear model documentation.

Concrete ML built-in models have APIs that are almost identical to their scikit-learn counterparts. It is also possible to convert PyTorch networks to FHE with the Concrete ML conversion APIs. Please refer to the linear models, tree-based models and neural networks documentation for more examples, showing the scikit-learn-like API of the built-in models.

Resources

Demos

Live demos on Hugging Face

  • Credit card approval: a credit card approval application in which sensitive data can be shared and analyzed without exposing the actual information to either the three parties involved or the server processing it.
    • Check the code here
  • Sentiment analysis with transformers: predicting if an encrypted tweet / short message is positive, negative or neutral, using FHE.
  • Health diagnosis: giving a diagnosis based on a patient's symptoms, history and other health factors, using FHE to preserve the patient's privacy.
    • Check the code here
  • Encrypted image filtering: filtering encrypted images by applying filters such as black-and-white, ridge detection, or your own filter.
    • Check the code here

Other demos

  • Encrypted Large Language Model: converting a user-defined part of a Large Language Model for encrypted text generation. This demo shows the trade-off between quantization and accuracy for text generation and shows how to run the model in FHE.
  • Private inference for federated learned models: privately training a Logistic Regression model, then importing it into Concrete ML and performing encrypted prediction.
  • Titanic: solving the Kaggle Titanic competition. Implemented with XGBoost from Concrete ML, this example is a companion to the Kaggle notebook and was the subject of a blog post on KDnuggets.
  • CIFAR10 FHE-friendly model with Brevitas: training a VGG9 FHE-compatible neural network using Brevitas, and a script to run the neural network in FHE. Execution in FHE takes ~4 minutes per image and shows an accuracy of 88.7%.
  • CIFAR10 / CIFAR100 FHE-friendly models with Transfer Learning approach: a series of three notebooks that convert a pre-trained FP32 VGG11 neural network into a quantized model using Brevitas. The model is fine-tuned on the CIFAR data-sets, converted for FHE execution with Concrete ML and evaluated using FHE simulation. For CIFAR10 and CIFAR100, respectively, our simulations show an accuracy of 90.2% and 68.2%.
  • FHE neural network splitting for client/server deployment: explaining how to split a computationally intensive neural network model into two parts. The first part is executed on the client side in the clear, and its output is encrypted. The second part of the model is then evaluated with FHE to complete the computation. This tutorial also shows the impact of the FHE speed/accuracy trade-off on CIFAR10: limiting programmable bootstrapping (PBS) to 8 bits achieves 62% accuracy.
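The client/server splitting idea from the last demo can be sketched in a few lines of plain Python. This is a toy illustration only: the layers, weights and helper names are hypothetical, and no encryption is performed. Comments mark where the client would encrypt and where the server would compute in FHE.

```python
def dense(weights, bias):
    """Build a fully connected layer as a function on lists of floats."""
    def layer(vector):
        return [
            sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)
        ]
    return layer


def relu(vector):
    return [max(0.0, v) for v in vector]


def run(layers, vector):
    """Apply the layers in order."""
    for layer in layers:
        vector = layer(vector)
    return vector


def split_model(layers, split_index):
    """Return (client_part, server_part) for a given split point."""
    return layers[:split_index], layers[split_index:]


# Hypothetical three-layer model.
layers = [
    dense([[0.5, -1.0], [1.5, 0.25]], [0.0, 0.1]),
    relu,
    dense([[1.0, -2.0]], [0.3]),
]
client_part, server_part = split_model(layers, 2)

x = [1.0, 2.0]
intermediate = run(client_part, x)        # client side, in the clear
# In a real deployment, the client would encrypt `intermediate` here.
output = run(server_part, intermediate)   # server side, would be evaluated in FHE
print(output)
```

Where to place the split is a design choice: moving more layers to the client reduces the FHE workload on the server but means more of the model runs on the client in the clear.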

If you have built awesome projects using Concrete ML, please let us know and we will be happy to showcase them here!

Tutorials

Explore more useful resources in the Awesome Zama repo.

Documentation

Full, comprehensive documentation is available here: https://docs.zama.ai/concrete-ml.

Working with Concrete ML

Citations

To cite Concrete ML in academic papers, please use the following entry:

@Misc{ConcreteML,
  title={Concrete {ML}: a Privacy-Preserving Machine Learning Library using Fully Homomorphic Encryption for Data Scientists},
  author={Zama},
  year={2022},
  note={\url{https://github.com/zama-ai/concrete-ml}},
}

Contributing

To contribute to Concrete ML, please refer to this section of the documentation.

License

This software is distributed under the BSD-3-Clause-Clear license. If you have any questions, please contact us at hello@zama.ai.

Support

🌟 If you find this project helpful or interesting, please consider giving it a star on GitHub! Your support helps to grow the community and motivates further development.
