The goal is to cluster resumes into relevant categories using the CorEx topic model and analyze the most common words and topics. This can help with tasks like automatically scanning and classifying resumes for job applications, making the process faster and more efficient.
- Python: the core language used for development
- Pandas, NumPy: for data manipulation and analysis
- Scikit-learn: for preprocessing
- corextopic: for CorEx topic modeling
- Matplotlib: for visualizations
- NLP Techniques: Count Vectorizer, text cleaning, tokenization
- Flask: for deployment as a web application
- Data Preparation: Load and preprocess the resume dataset.
- Text Processing: Apply cleaning, stopword removal, and tokenization.
- Topic Modeling: Use the CorEx model to assign topics to resumes.
- Analysis: Visualize the distribution of topics and explore which resumes fall under specific categories.
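
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the names `clean_text` and `fit_topics`, the tiny stopword set, and the default `n_topics=5` are hypothetical. It relies on scikit-learn's `CountVectorizer` and the `corextopic` package, imported lazily so the cleaning helper can be tried on its own.

```python
import re

# Tiny illustrative stopword set (an assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def clean_text(text):
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 1]


def fit_topics(resumes, n_topics=5, seed=42):
    """Fit a CorEx topic model and return (model, most probable topic per resume)."""
    # Imported here so clean_text works even without these packages installed.
    from sklearn.feature_extraction.text import CountVectorizer
    from corextopic import corextopic as ct

    docs = [" ".join(clean_text(r)) for r in resumes]

    # CorEx works on binary word-presence features, hence binary=True.
    vectorizer = CountVectorizer(binary=True)
    doc_word = vectorizer.fit_transform(docs)
    words = list(vectorizer.get_feature_names_out())

    model = ct.Corex(n_hidden=n_topics, seed=seed)
    model.fit(doc_word, words=words)

    # p_y_given_x has shape (n_docs, n_topics); pick the most probable topic.
    assignments = model.p_y_given_x.argmax(axis=1)
    return model, assignments
```

After fitting, `model.get_topics(n_words=10)` returns the top words for each topic, which is one way to inspect what each cluster of resumes is about.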
- Clone this repository:
git clone https://github.com/pelinoz/Ozgecmis-Tarama-Sistemi.git
- Run the core script:
python app.py
- View the results of the topic modeling in the output folder, or visualize the topics using Matplotlib plots.
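
As a rough sketch of the visualization step, the snippet below counts how many resumes fall under each topic and saves a bar chart. The function names and the output file name `topic_distribution.png` are placeholders chosen for illustration, and `assignments` is assumed to be a sequence with one topic index per resume; the repository may organize its plots differently.

```python
from collections import Counter


def topic_counts(assignments, n_topics):
    """Count how many resumes were assigned to each topic index."""
    counts = Counter(int(t) for t in assignments)
    return [counts.get(i, 0) for i in range(n_topics)]


def plot_topic_distribution(assignments, n_topics, path="topic_distribution.png"):
    """Save a bar chart of resumes per topic; the file name is a placeholder."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    counts = topic_counts(assignments, n_topics)
    fig, ax = plt.subplots()
    ax.bar(range(n_topics), counts)
    ax.set_xlabel("Topic index")
    ax.set_ylabel("Number of resumes")
    ax.set_title("Resumes per CorEx topic")
    fig.savefig(path)
    plt.close(fig)
    return path
```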