A command-line interface to scrape jobs from Indeed.com with search terms. It is configured to look for jobs in Israel.
Note: the tags in Indeed website can change and if it does you will see that no results are being found and the Dataframe
that comes back is None
.
Currently this version works with Unix-based systems only (tested on MacOS with python 3.7 and Google Chrome)
- This package uses
Selenium
which can emulate browser usage within python, with Google Chrome browser. You need to install Chrome and the correct driver to use with Selenium in addition to theselenium
package. Please refer to https://pypi.org/project/selenium/ for further details. - Clone the repo to your local machine.
- Give execution permissions to
jobs.py
like so:$ chmod +x jobs.py
You can always issue $ ./jobs.py -h
for help and usage instructions.
For example, in order to grab the 30 first (sorted by date with newest first) jobs for "data scientist" query
and keep only the jobs that have at least one of "machine learning, engineer, python" keywords (separated by ,
) inside
the (description + requirements) text you would type:
$ ./jobs.py "data scientist" -t "machine learning, engineer, python" -n 30
If you wanted the first 30 jobs to be sorted by relevance instead, you would add -sbr
to the former.
And if you wanted to keep only the jobs that have all the keywords in the text you would add -a
to the command above.
The output is an HTML table with job title, company, text, and a clickable link as its columns. This table is being saved as
jobs_[Current_Date].html
to your main folder where you cloned the repo to.
Current_Date is the current date like so: 2020-03-14.
The command keeps logs for different stages of the process. There is one log file for each date so if you run this couple of
times at the same day, the logs get appended to the same file. The logs are saved in the logs
folder under you main folder
where you cloned the repo to.
- Specify the country in which to look. This could possibly have issues with html tags compatibility across different indeed country-specific sites.
- Add option to send jobs file to email. Potential problem (for the free options) when using Gmail is a trade-off between allowing non-secure connections to your Gmail or generating a refresh token that will expire once in a while, so the solution is not robust enough.
- Optional - install the
jobs.py
script to system PATH. - Add example for usage with cron
- Make the code work with Windows