Automated tool for scraping job postings into a .csv file.
- Never see the same job twice!
- Browse all search results at once, in an easy-to-read, sortable spreadsheet.
- Keep track of all new job postings in your area.
- See jobs from multiple job search sites all in one place.
The spreadsheet for managing your job search:
JobFunnel requires Python 3.6 or later.
All dependencies are listed in setup.py and can be installed automatically with pip when installing JobFunnel.
pip install git+https://github.com/PaulMcInnis/JobFunnel.git
funnel --help
If you want to develop JobFunnel, you may want to install it in-place:
git clone git@github.com:PaulMcInnis/JobFunnel.git jobfunnel
pip install -e ./jobfunnel
funnel --help
- Set your job search preferences in the yaml configuration file (or use -kw).
- Run funnel to scrape all available job listings.
- Review jobs in the master-list and update each job's status to other values, such as interview or offer.
- Set the status of any undesired job to archive; these jobs will be removed from the .csv the next time you run funnel.
- Check out demo/readme.md if you want to try the demo.
Note: rejected jobs will be filtered out and will disappear from the output .csv.
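The workflow above amounts to merging each new scrape into the existing master-list. The Python sketch below illustrates that idea under stated assumptions: every job has a unique id, unseen jobs are added with the status new (the term used under Running Filters), and removed ids are remembered in a hypothetical blocked_ids set. It is not JobFunnel's actual implementation.

```python
# Hypothetical sketch, not JobFunnel's code. Assumptions: every job has a
# unique id, unseen jobs get the status "new", and ids of removed jobs are
# remembered in a blocked_ids set so they stay hidden on later runs.
REMOVED_STATUSES = {"archive", "rejected"}


def update_master_list(master, scraped, blocked_ids):
    """Merge one scrape into the master-list.

    master and scraped map a job id to a dict of column values.
    blocked_ids is updated in place. Returns (updated master-list, new ids).
    """
    updated = {}
    for job_id, job in master.items():
        if job.get("status", "").strip().lower() in REMOVED_STATUSES:
            blocked_ids.add(job_id)  # archived/rejected: drop and remember
        else:
            updated[job_id] = job  # custom statuses such as "applied" survive
    new_ids = []
    for job_id, job in scraped.items():
        # Skip jobs that are already listed or were previously removed.
        if job_id not in updated and job_id not in blocked_ids:
            updated[job_id] = {**job, "status": "new"}
            new_ids.append(job_id)
    return updated, new_ids


master = {
    "a1": {"title": "Python Developer", "status": "applied"},
    "b2": {"title": "QA Analyst", "status": "archive"},
}
scraped = {
    "a1": {"title": "Python Developer"},
    "b2": {"title": "QA Analyst"},
    "c3": {"title": "ML Engineer"},
}
blocked = set()
updated, new_ids = update_master_list(master, scraped, blocked)
print(sorted(updated), new_ids, blocked)  # ['a1', 'c3'] ['c3'] {'b2'}
```

Under these assumptions, feeding the same scrape in again returns no new ids, which models the "never see the same job twice" behaviour from the feature list.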
-
Custom Status
Note that any custom statuses (e.g. applied) are preserved in the spreadsheet.
-
Running Filters
To update active filters and to see any new jobs going forward, just run funnel again and review the .csv file.
-
Recovering Lost Master-list
If your master-list ever gets deleted, you still have the historic pickle files. Simply run funnel --recover to generate a new master-list.
-
Managing Multiple Searches
You can keep multiple search results across multiple .csv files:
funnel -kw Python -o python_search
funnel -kw AI Machine Learning -o ML_search
-
Filtering Undesired Companies
Filter out undesired companies by providing your own yaml configuration and adding them to the black list (see JobFunnel/jobfunnel/config/settings.yaml).
-
Filtering Old Jobs
Filter out jobs that you think are too old. For example, the following command filters out job listings that are older than 30 days:
funnel -s JobFunnel/demo/settings.yaml --max_listing_days 30
-
Automating Searches
JobFunnel can easily be automated to run nightly with crontab. For more information, see the crontab document.
-
Glassdoor Notes
The GlassDoor scraper has two versions: GlassDoorStatic and GlassDoorDynamic. Both give you the same end result: they scrape Glassdoor and dump your job listings into your master_list.csv. We recommend always running GlassDoorStatic (this is the default preset in our demo settings.yaml file) because it is a lot faster than GlassDoorDynamic. However, if GlassDoorStatic fails, you may use GlassDoorDynamic instead. It is very slow, but you will still be able to scrape Glassdoor.
When using GlassDoorDynamic, Glassdoor might require a human to complete a CAPTCHA. Therefore, if you automate JobFunnel with something like cron, you might need to be physically present to complete the Glassdoor CAPTCHA. Of course, instead of using GlassDoorDynamic you may also disable the Glassdoor scraper in your settings.yaml so that you do not have to complete any CAPTCHA at all:
- 'Indeed'
- 'Monster'
# - 'GlassDoorStatic'
# - 'GlassDoorDynamic'
-
Reviewing Jobs in Terminal
You can review the job list in the command line:
column -s, -t < master_list.csv | less -#2 -N -S
-
Saving Duplicates
You can save removed duplicates in a separate file, which is stored in the same place as your master-list:
funnel --save_dup
-
Respectful Delaying
Respectfully scrape your job posts with our built-in delaying algorithm, which can be configured using a config file (see JobFunnel/jobfunnel/config/settings.yaml) or with command line arguments:
-d lets you set your max delay value:
funnel -s demo/settings.yaml -kw AI -d 15
-r lets you specify whether to use random delaying, and uses -d to control the range from which random delays are drawn:
funnel -s demo/settings.yaml -kw AI -r
-c specifies converging random delay, which is an alternative mode of random delay. Random delay needs to be turned on as well for it to work, so proper usage would look something like this:
funnel -s demo/settings.yaml -kw AI -r -c
-md lets you set a minimum delay value:
funnel -s demo/settings.yaml -d 15 -md 5
--fun can be used to set which mathematical function (constant, linear, or sigmoid) is used to calculate delay:
funnel -s demo/settings.yaml --fun sigmoid
--no_delay turns off delaying, but its usage is not recommended.
To better understand how to configure delaying, check out this Jupyter Notebook breaking down the algorithm step by step with code and visualizations.
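The real delay algorithm is the one documented in the notebook mentioned above. As a rough, hypothetical illustration of how a maximum delay (-d), a minimum delay (-md), a shape function (--fun) and random delaying (-r) could combine, the sketch below computes a per-request delay schedule. The ramp shapes, the sigmoid steepness of 10 and the uniform random draw are assumptions; converging random delay (-c) is not modelled.

```python
import math
import random


def delay_schedule(n, max_delay, min_delay=0.0, fun="linear", rand=False, seed=None):
    """Return a list of n per-request delays in seconds (illustrative only).

    Hypothetical shapes, each ramping from min_delay up to max_delay:
      constant: every delay equals max_delay
      linear:   evenly spaced ramp
      sigmoid:  logistic S-curve centred on the middle request
    With rand=True, each delay is drawn uniformly between min_delay and the
    curve value, so max_delay still bounds the random range.
    """
    rng = random.Random(seed)
    delays = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0  # progress through the requests, 0..1
        if fun == "constant":
            frac = 1.0
        elif fun == "linear":
            frac = t
        elif fun == "sigmoid":
            frac = 1.0 / (1.0 + math.exp(-10.0 * (t - 0.5)))  # steepness 10 is arbitrary
        else:
            raise ValueError(f"unknown delay function: {fun!r}")
        value = min_delay + frac * (max_delay - min_delay)
        if rand:
            value = rng.uniform(min_delay, value)
        delays.append(value)
    return delays


# Loosely analogous to: funnel -s demo/settings.yaml -d 15 -md 5 --fun linear
print([round(d, 2) for d in delay_schedule(4, max_delay=15, min_delay=5, fun="linear")])
# [5.0, 8.33, 11.67, 15.0]
```

In a scraper loop, each value would be passed to time.sleep() before the corresponding request; the sigmoid shape keeps early requests fast and flattens out near max_delay.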
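Several items above (Filtering Undesired Companies, Filtering Old Jobs and Saving Duplicates) describe filters applied to scraped jobs. The sketch below shows one hypothetical way to express them in Python. The field names title, company and date, exact case-insensitive company matching, and the title-plus-company duplicate rule are all assumptions, not JobFunnel's documented behaviour.

```python
from datetime import date, timedelta


def filter_blacklisted(jobs, blacklist):
    """Drop jobs whose company exactly matches a blacklist entry (case-insensitive)."""
    blocked = {name.strip().lower() for name in blacklist}
    return [job for job in jobs if job["company"].strip().lower() not in blocked]


def filter_old_jobs(jobs, max_listing_days, today):
    """Keep jobs posted at most max_listing_days days before today."""
    cutoff = today - timedelta(days=max_listing_days)
    return [job for job in jobs if job["date"] >= cutoff]


def split_duplicates(jobs):
    """Split jobs into (unique, duplicates) using a title + company key.

    The first occurrence is kept; later matches go to the duplicates list,
    the kind of list that --save_dup writes to a separate file.
    """
    seen = set()
    unique, duplicates = [], []
    for job in jobs:
        key = (job["title"].strip().lower(), job["company"].strip().lower())
        if key in seen:
            duplicates.append(job)
        else:
            seen.add(key)
            unique.append(job)
    return unique, duplicates


# Hypothetical job records; field names are assumptions for illustration.
today = date(2020, 6, 30)
jobs = [
    {"title": "ML Engineer", "company": "Initech", "date": date(2020, 6, 25)},
    {"title": "ml engineer", "company": "Initech ", "date": date(2020, 6, 26)},
    {"title": "Python Developer", "company": "Acme Corp", "date": date(2020, 6, 20)},
    {"title": "Data Analyst", "company": "Acme Corp", "date": date(2020, 4, 1)},
    {"title": "QA Analyst", "company": "Globex", "date": date(2020, 6, 29)},
]

jobs = filter_blacklisted(jobs, ["globex"])
jobs = filter_old_jobs(jobs, max_listing_days=30, today=today)
unique, duplicates = split_duplicates(jobs)
print([job["title"] for job in unique])  # ['ML Engineer', 'Python Developer']
```

Exact matching is the conservative choice for a blacklist: substring matching would also hide companies whose names merely contain a blocked word.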
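The column command in the Reviewing Jobs in Terminal item splits on every comma, so a quoted field such as "Toronto, ON" ends up spread over two columns. If that becomes a problem, a small helper built on Python's standard csv module can do the alignment instead. The sketch below is an alternative, not part of JobFunnel, and its 30-character truncation width is an arbitrary assumption.

```python
import csv
import io


def align_csv(text, max_width=30):
    """Align CSV text into columns, similar to `column -s, -t`.

    Parsing with the csv module keeps quoted commas inside one cell.
    Cells longer than max_width characters are truncated.
    """
    rows = [[cell[:max_width] for cell in row] for row in csv.reader(io.StringIO(text))]
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Width of each column is its longest cell across all rows.
    widths = [max(len(row[i]) for row in rows if i < len(row)) for i in range(n_cols)]
    return "\n".join(
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    )


sample = 'title,company,location\nML Engineer,Initech,"Toronto, ON"\n'
print(align_csv(sample))
# title        company  location
# ML Engineer  Initech  Toronto, ON
```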