
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.

Primary language: Jupyter Notebook. License: MIT.

map-reduce-and-multiprocessing

Multiprocessing can be an effective tool for speeding up a time-consuming workflow, because it makes it possible to execute portions of the workflow in parallel across multiple CPU cores. However, in more complex workflows, data and control-flow dependencies between the parallel portions can lead to race conditions, which make implementation, debugging, and maintenance more challenging.
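For contrast, here is a minimal sketch of the shared-state style that the paragraph above warns about. It is not taken from the repository's notebook; the function names `increment_many` and `run_shared_counter` are hypothetical. Several processes increment one shared integer (`multiprocessing.Value`). Because `counter.value += 1` is a read-modify-write rather than an atomic operation, every increment has to be wrapped in the value's lock. Without that lock, updates from different processes can be lost.

```python
import multiprocessing as mp


def increment_many(counter, n):
    """Increment a shared counter n times (hypothetical helper for illustration)."""
    for _ in range(n):
        # The read-modify-write must be serialized; without the lock,
        # concurrent increments from other processes can be lost.
        with counter.get_lock():
            counter.value += 1


def run_shared_counter(num_workers=4, n=10_000):
    """Start several processes that all mutate the same shared integer."""
    counter = mp.Value("i", 0)  # shared C int with an associated lock
    procs = [
        mp.Process(target=increment_many, args=(counter, n))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for every worker before reading the result
    return counter.value


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn new interpreters.
    print(run_shared_counter())  # 40000 with the lock in place
```

The correctness of this version depends on remembering to acquire the lock around every access to shared state. That kind of coordination is easy to get wrong as a workflow grows.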

When planning a new workflow, one question worth asking is whether the workflow is amenable to a more functional implementation built from map and reduce operations (i.e., whether it is compatible with the MapReduce paradigm). This article illustrates how multiprocessing can be used in a more concise and less error-prone way when parallelizing a MapReduce-like workflow.
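As a rough sketch of the idea, assuming a word-count task chosen purely for illustration (the repository's notebook may use a different workload), the map step can run in parallel with `multiprocessing.Pool.map` and the reduce step can merge the partial results with `functools.reduce`. The helper names `map_count_words`, `reduce_counts`, and `word_count` are hypothetical. Each map call depends only on its own input chunk and shares no mutable state, so no locks are needed.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def map_count_words(chunk):
    """Map step: count the words in one chunk, independently of all other chunks."""
    return Counter(chunk.lower().split())


def reduce_counts(acc, partial):
    """Reduce step: merge one partial count into the accumulator.

    Counter addition is associative and commutative, so the final result
    does not depend on how the partial counts are grouped.
    """
    acc.update(partial)
    return acc


def word_count(chunks, processes=4):
    """Run the map step in parallel worker processes, then reduce in the parent."""
    with Pool(processes=processes) as pool:
        partials = pool.map(map_count_words, chunks)  # one Counter per chunk
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    counts = word_count(docs)
    print(counts["the"], counts["fox"], counts["dog"])  # 3 2 1
```

Compared with the shared-counter pattern, the worker functions here are pure. Parallelism is confined to a single `pool.map` call, and combining results happens in one place in the parent process.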