ParallelTextProcessing: A Jupyter Notebook repository from rafaelvalero

I would like to process text with as much parallelism as possible. The standard multiprocessing library can raise errors in some cases; one way to work around them is to use pathos.

Examples of errors and how to install pathos: https://kampta.github.io/Parallel-Processing-in-Python/
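One common class of multiprocessing errors, assumed here to be the kind meant above, comes from pickling: multiprocessing serializes the worker function with the standard pickle module, and pickle cannot handle lambdas. The sketch below only checks what pickle accepts; `tokenize` and `can_pickle` are hypothetical helpers and are not taken from the notebooks.

```python
import pickle


def tokenize(text):
    """Hypothetical text-processing step: lowercase and split on whitespace."""
    return text.lower().split()


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# multiprocessing sends the worker function to other processes with pickle,
# so a function that cannot be pickled cannot be handed to Pool.map.
print("named function:", can_pickle(tokenize))
print("lambda:", can_pickle(lambda text: text.lower().split()))
```

pathos serializes with dill instead of pickle, which is why it can accept functions such as lambdas that the standard library rejects.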

For this I provide two Jupyter notebooks:

- parallelizing_text_processing.ipynb
- parallelizing_text_processing_pathos.ipynb, which uses pathos.

The results are the same in both notebooks in this case, though.
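As a rough illustration of the idea (not the notebooks' actual code), a minimal sketch with the standard library could look like this. The `tokenize`, `process_serial` and `process_parallel` names and the sample documents are assumptions made for the example.

```python
from multiprocessing import Pool


def tokenize(text):
    """Hypothetical per-document step: lowercase and split on whitespace."""
    return text.lower().split()


def process_serial(docs):
    """Reference implementation: process the documents one by one."""
    return [tokenize(doc) for doc in docs]


def process_parallel(docs, workers=2):
    """Distribute the documents over a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # Pool.map returns results in the same order as the input.
        return pool.map(tokenize, docs)


if __name__ == "__main__":
    docs = ["Parallel text processing", "with a pool of workers"]
    # Both approaches should produce identical output.
    print(process_parallel(docs) == process_serial(docs))
```

The worker function is defined at module level so that pickle can find it by name. With pathos, the pool would instead come from `pathos.multiprocessing.ProcessingPool`, which also exposes a `map` method.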

References:

Related questions online:
- Parallelization using sklearn and TfidfVectorizer: https://stackoverflow.com/questions/28396957/sklearn-tfidf-vectorizer-to-run-as-parallel-jobs
- A really good answer on parallelizing with dataframes in Python: http://blog.adeel.io/2016/11/06/parallelize-pandas-map-or-apply/
- To redo it using joblib: https://joblib.readthedocs.io/en/latest/auto_examples/parallel_memmap.html
