This project implements a machine learning system that teaches itself by collecting data from Google with a web crawler. The system is split into three major stages: the crawler stage, the feature extraction stage, and the SVM modeling stage. Each stage is time-consuming, so we apply several parallelization techniques to speed them up. Overall, we achieve roughly an 85x speedup over the serial program.
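As a rough illustration of the parallel approach (not the project's actual code), the idea is that the per-sample work in a stage such as feature extraction is independent, so it can be fanned out across worker processes. The function names and the toy "feature" below are hypothetical placeholders:

```python
from multiprocessing import Pool

def extract_features(item):
    # Placeholder for the real per-sample feature computation;
    # the actual project extracts features from crawled data.
    return sum(ord(c) for c in item)

def extract_all(items, workers=4):
    # Fan the independent samples out across worker processes,
    # preserving input order in the returned list.
    with Pool(processes=workers) as pool:
        return pool.map(extract_features, items)

if __name__ == "__main__":
    print(extract_all(["cat", "dog", "bird"]))
```

Because the samples share no state, the speedup from this pattern scales with the number of cores until I/O or the GPU stage becomes the bottleneck.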
Hardware:
- CPU: Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
- Cores: 2 × (12 cores / 24 threads)
- GPU: RTX 2080 Ti 11GB
OS:
- CentOS 8
To reproduce our implementation, follow the steps below.
Using Anaconda is strongly recommended.
```shell
conda create -n pp_smls python=3.6
conda activate pp_smls
git clone https://github.com/vbnmzxc9513/Parallel-self-learning-model.git
cd Parallel-self-learning-model
pip install -r requirement.txt
```