A privacy preserving NLP framework

Language: Python (3.6) | License: Apache-2.0

SyferText

SyferText is a Python library for privacy-preserving Natural Language Processing. It leverages PySyft to perform Federated Learning and encrypted computation (Multi-Party Computation, MPC) on text data. The two main usage scenarios of SyferText are:

  • 🔥 Secure plaintext pre-processing: Enables pre-processing of text located on a remote machine without breaking data privacy.
  • 🚀 Secure pipeline deploy: Starting from version 0.1.0, SyferText will be able to bundle the complete pipeline of pre-processing components and trained PySyft models and to securely deploy it to PyGrid.
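
The first scenario can be pictured with a toy sketch in plain Python. It does not use the SyferText or PySyft APIs: `ToyWorker` and its methods `register`, `run`, and `token_count` are hypothetical names invented for this illustration, and the "remote" machine is simulated in-process. The point is only that the caller holds opaque handles while the text and its tokens stay with the worker that owns them.

```python
import re
import uuid
from typing import Any, Callable, Dict, List


class ToyWorker:
    """Hypothetical stand-in for a remote machine that owns private text."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[str, Any] = {}  # private objects; never returned directly

    def register(self, obj: Any) -> str:
        """Store an object on this worker and return an opaque handle to it."""
        handle = uuid.uuid4().hex
        self._store[handle] = obj
        return handle

    def run(self, fn: Callable[[Any], Any], handle: str) -> str:
        """Apply fn to a stored object; the result stays here, only a handle leaves."""
        return self.register(fn(self._store[handle]))

    def token_count(self, handle: str) -> int:
        """Release a single aggregate number instead of the stored tokens."""
        return len(self._store[handle])


def tokenize(text: str) -> List[str]:
    """A deliberately simple tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


# The data owner registers a private string on their own worker.
bob = ToyWorker("bob")
text_handle = bob.register("Private text stays on the remote machine.")

# The caller asks the worker to tokenize and gets back a handle, not the tokens.
tokens_handle = bob.run(tokenize, text_handle)
count = bob.token_count(tokens_handle)
```

In this sketch the caller learns only the token count. A real deployment would also need to decide which statistics are safe to release at all, a question this toy example leaves open.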

For a more detailed introduction to SyferText, watch the 🎥 OpenMined AMA with Alan Aboudib, available on YouTube.

Installation

In order to install and start using SyferText, you first have to install git-lfs by following this short guide.

Then go ahead and install our experimental language model, which we adapted from spaCy's en_core_web_lg model. This should take a few minutes, since the model is over 800 MB in size.

$ pip install git+git://github.com/Nilanshrajput/syfertext_en_core_web_lg@master

If you installed syfertext_en_core_web_lg before installing git-lfs, please do the following:

  1. Uninstall syfertext_en_core_web_lg.
  2. Install git-lfs.
  3. Reinstall syfertext_en_core_web_lg.

Now you can go ahead and install SyferText:

$ git clone https://github.com/OpenMined/SyferText.git
$ cd SyferText
$ python setup.py install

That's it, you are good to go!

Getting Started

SyferText can be used to work with datasets residing on a local machine (or a local worker as we call it in PySyft), as well as with private datasets on remote workers. Here is a list of tutorials that you can follow to get more familiar with SyferText:

Code Examples

  1. Tokenizing local strings
  2. Tokenizing remote strings
  3. Using the SimpleTagger

Use Cases

  1. Training a sentiment classifier on multiple private datasets

More tutorials are coming soon. Stay tuned!

Our Team

SyferText is created and maintained by the NLP team at OpenMined and by volunteer contributors from all around the world. Here are the current members of the core NLP team. The team is growing!

  • Alan Aboudib: Team Lead / Author
  • Nilansh Rajput: OM NLP team / Core Dev
  • Jatin Prakash: OM NLP team / Core Dev
  • Sachin Kumar: OM NLP team / Core Dev
  • Bachir Chihani: OM NLP team / Core Dev
  • Márcio Porto: OM NLP team / Core Dev
  • Antonio Lopardo: OM NLP team / Documentation

Events

Demo on remote blind tokenization with SyferText.

Demo on sentiment analysis with SyferText on multiple private datasets.

SyferText vision and encrypted sentiment analyzer demo.

Introduction to SyferText.

SyferText vision and encrypted sentiment analyzer demo.

About SyferText and my Open Source Contribution Experience with OpenMined

  • (September 16th, 2020 at 5:30PM GMT): OpenMined AMA. (Cancelled)

Introducing SyferText 0.1.0

News

To get news about feature and tutorial releases, follow Alan Aboudib on Twitter and join the #lib_syfertext channel on Slack.

Support

To get support in using this library, please join the #lib_syfertext Slack channel. If you’d like to follow along with any code changes to the library, please join the #code_syfertext Slack channel. Click here to join our Slack community!

Contributors ✨

CONTRIBUTORS.md

This project follows the all-contributors specification. Contributions of any kind are welcome!

Call for Partners

We, at the NLP team, are eager to learn about new real-world use-cases around which new features in SyferText could be built.

If you think that SyferText, in its current state or by adding more features, could be useful to your research or company, please contact us as indicated below in the Contact Us section, and let us discuss how we can help.

Contact Us

You can reach out to us by contacting Alan on one of the following channels:

LinkedIn | Slack | Twitter

License

Apache License 2.0