Install package in development mode:
pip install -e .
The package name is fydjob. Example:
import fydjob.utils as utils
from fydjob.utils import tokenize_text_field
Scrapes job offers. To use it, download chromedriver from the Google Drive folders and place it in drivers/.
Supports Indeed API parameters. When not specified, the default parameters are:
start = 0 #the job offer at which to start
filter = 1 #the API tries to filter out duplicate postings
sort = 'date' #get the newest job offers (alternative is 'relevant')
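As a rough illustration of how defaults like these are typically combined with user-supplied values, here is a minimal sketch. The names `DEFAULT_PARAMS` and `build_params` are hypothetical and are not taken from the fydjob code; only the three default values come from the list above.

```python
# Defaults copied from the list above. The names below are hypothetical.
DEFAULT_PARAMS = {"start": 0, "filter": 1, "sort": "date"}


def build_params(**overrides):
    """Return the defaults with any caller-supplied values applied on top."""
    params = dict(DEFAULT_PARAMS)  # copy so the defaults are never mutated
    params.update(overrides)
    return params


if __name__ == "__main__":
    # Ask for the most relevant offers, starting at the 21st result.
    print(build_params(sort="relevant", start=20))
    # → {'start': 20, 'filter': 1, 'sort': 'relevant'}
```

Parameters that are not passed keep their default, so `build_params()` with no arguments returns the defaults unchanged.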
To run the scraper:
pip install -r requirements.txt
python -m fydjob.IndeedScraper
Enter the job title, the location, and a limit on the number of job offers to extract.
Output is saved in fydjob/output/indeed_scrapes/. The filename format is jobtitle_location_date_limit.
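The filename pattern can be sketched as follows. This is only an illustration: `build_filename` is a hypothetical helper, and the ISO date format, the lowercasing, and the replacement of spaces with hyphens are assumptions, since the README only states the order of the four parts.

```python
from datetime import date


def build_filename(job_title, location, limit, day=None):
    """Join the four parts with underscores: jobtitle_location_date_limit."""
    day = day or date.today()
    # Assumption: ISO date, lowercase text, spaces replaced by hyphens.
    parts = [job_title, location, day.isoformat(), str(limit)]
    return "_".join(p.strip().lower().replace(" ", "-") for p in parts)


if __name__ == "__main__":
    print(build_filename("Data Scientist", "Berlin", 100, date(2020, 11, 30)))
    # → data-scientist_berlin_2020-11-30_100
```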
Loads the JSON files from fydjob/output/indeed_scrapes and the Kaggle file from fydjob/output/kaggle. Joins the dataframes and applies basic preprocessing. To run as a script:
python -m fydjob.IndeedProcessor
To use it as a class:
from fydjob.IndeedProcessor import IndeedProcessor
ip = IndeedProcessor()
Output is saved in fydjob/output/indeed_proc
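The loading step described above can be pictured with the sketch below. It is not the IndeedProcessor implementation: `load_scrapes` is a hypothetical helper that uses only the standard library, and the layout of each JSON file (a list of offers or a single offer) is an assumption.

```python
import json
from pathlib import Path


def load_scrapes(folder="fydjob/output/indeed_scrapes"):
    """Read every *.json file in the folder into one combined list of records."""
    records = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: a file holds either a list of offers or one offer dict.
        records.extend(data if isinstance(data, list) else [data])
    return records
```

The combined list could then be handed to a dataframe library for joining and preprocessing.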
The skills dictionary is assembled here. Download the spreadsheet as an Excel file and place it at fydjob/data/dicts/skills_dict.xlsx. Then:
from fydjob import utils
utils.save_skills() #extracts skills and saves them in JSON
utils.load_skills() #loads the skills from JSON file
This is only the setup step. If you haven't changed the pipeline, just run utils.load_skills to get the skills.
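The save-once, load-many pattern described above can be illustrated with a plain JSON round trip. This is a sketch under stated assumptions, not the code in fydjob.utils: `save_skills_json` and `load_skills_json` are hypothetical names, and the mapping of category to a list of skills is an assumed data shape.

```python
import json
from pathlib import Path


def save_skills_json(skills, path):
    """Write a {category: [skill, ...]} mapping to a JSON file (hypothetical helper)."""
    text = json.dumps(skills, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_skills_json(path):
    """Read the mapping back from the JSON file (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Saving only needs to happen again when the spreadsheet changes; every other run can read the JSON file directly.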
- Document here the project: find-your-dream-job
- Description: Project Description
- Data Source:
- Type of analysis:
Please document the project as well as you can.
The initial setup.
Create virtualenv and install the project:
sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
pip install pip -U; pip install -r requirements.txt
Run the unit tests:
make clean install test
Check for find-your-dream-job in github.com/{group}. If your project is not set up, please add it:
- Create a new project on github.com/{group}/find-your-dream-job
- Then populate it:
## e.g. if group is "{group}" and project_name is "find-your-dream-job"
git remote add origin git@github.com:{group}/find-your-dream-job.git
git push -u origin master
git push -u origin --tags
Functional test with a script:
cd
mkdir tmp
cd tmp
find-your-dream-job-run
Go to https://github.com/{group}/find-your-dream-job to see the project, manage issues, set up your SSH public key, ...
Create a python3 virtualenv and activate it:
sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv -ppython3 ~/venv ; source ~/venv/bin/activate
Clone the project and install it:
git clone git@github.com:{group}/find-your-dream-job.git
cd find-your-dream-job
pip install -r requirements.txt
make clean install test # install and test
Functional test with a script:
cd
mkdir tmp
cd tmp
find-your-dream-job-run