SyferText is a library for privacy-preserving Natural Language Processing in Python. It leverages PySyft to perform Federated Learning and encrypted computation (Multi-Party Computation, MPC) on text data. The two main usage scenarios of SyferText are:
- 🔥 Secure plaintext pre-processing: Enables pre-processing of text located on a remote machine without breaking data privacy.
- 🚀 Secure pipeline deployment: Starting from version 0.1.0, SyferText will be able to bundle the complete pipeline of pre-processing components and trained PySyft models, and securely deploy it to PyGrid.
For a more detailed introduction to SyferText, watch the 🎥 OpenMined AMA with Alan Aboudib, available on YouTube.
To install and start using SyferText, you first have to install git-lfs by following this short guide.
Then go ahead and install our experimental language model, which we adapted from spaCy's en_core_web_lg model. This should take a few minutes since the model is larger than 800 MB.
$ pip install git+git://github.com/Nilanshrajput/syfertext_en_core_web_lg@master
If you had already installed syfertext_en_core_web_lg prior to installing git-lfs, please do the following:
- Uninstall syfertext_en_core_web_lg.
- Install git-lfs.
- Reinstall syfertext_en_core_web_lg.
Now you can go ahead and install SyferText:
$ git clone https://github.com/OpenMined/SyferText.git
$ cd SyferText
$ python setup.py install
That's it, you are good to go!
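As a quick sanity check after the installation steps above, the following minimal sketch reports whether the relevant packages can be located by Python's import system. It uses only the standard library. The module names `syft`, `syfertext`, and `syfertext_en_core_web_lg` are assumptions based on the package names mentioned in this README; adjust them if your installation exposes different import names.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


# Assumed import names (not confirmed by this README): PySyft, SyferText,
# and the adapted language model package.
for name in ("syft", "syfertext", "syfertext_en_core_web_lg"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If the language model is reported as missing, the uninstall and reinstall steps described earlier (with git-lfs installed first) are the likely remedy.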
SyferText can be used to work with datasets residing on a local machine (or a local worker as we call it in PySyft), as well as with private datasets on remote workers. Here is a list of tutorials that you can follow to get more familiar with SyferText:
| Code Examples | Use Cases |
|---|---|
| 1. Tokenizing local strings | 1. Training a sentiment classifier on multiple private datasets |
| 2. Tokenizing remote strings | |
| 3. Using the SimpleTagger | |
More tutorials are coming soon. Stay tuned!
SyferText is created and maintained by the NLP team at OpenMined and by volunteer contributors from all around the world. Here are the current members of the core NLP team. The team is growing!
Alan Aboudib Team Lead / Author |
Nilansh Rajput OM NLP team / Core Dev |
Jatin Prakash OM NLP team / Core Dev |
Sachin Kumar OM NLP team / Core Dev |
Bachir Chihani OM NLP Team / Core Dev |
Márcio Porto OM NLP team / Core Dev |
Antonio Lopardo OM NLP team / Documentation |
- (October 26th, 2019) DevFest2019, Reading, UK.
Demo on remote blind tokenization with SyferText.
- (March 19th, 2020) GDG Meetup, Reading, UK. (Cancelled due to COVID-19)
Demo on sentiment analysis with SyferText on multiple private datasets.
- (May 13th, 2020): OpenMined AMA. (Cancelled due to COVID-19)
- (June 17th, 2020): OpenMined AMA.
SyferText vision and encrypted sentiment analyzer demo.
- (June 18th, 2020): The Federated Learning Conference.
Introduction to SyferText.
- (July 8th, 2020): OpenMined Paris Meetup.
SyferText vision and encrypted sentiment analyzer demo.
- (July 29th, 2020): MLH Fellowship Talk.
About SyferText and my Open Source Contribution Experience with OpenMined
- (September 16th, 2020 at 5:30PM GMT): OpenMined AMA. (Cancelled)
Introducing SyferText 0.1.0
To get news about feature and tutorial releases, follow:
Alan Aboudib: @twitter
and join the #lib_syfertext channel on Slack.
To get support in using this library, please join the #lib_syfertext Slack channel. If you’d like to follow along with any code changes to the library, please join the #code_syfertext Slack channel. Click here to join our Slack community!
This project follows the all-contributors specification. Contributions of any kind are welcome!
We, the NLP team, are eager to learn about new real-world use cases around which new features in SyferText could be built.
If you think that SyferText, in its current state or with additional features, could be useful to your research or company, please contact us as indicated below in the Contact Us section, and let us discuss how we can help.
You can reach out to us by contacting Alan on one of the following channels: