JobPulse.fyi tracks software engineering and product manager openings for students. This repository is part of the JobPulse.fyi project and scrapes job information from company websites using the Google Custom Search JSON API.
- Job search: Given a query and a website, the scraper searches for job listings that match the query.
- Data extraction: The scraper visits each job listing page and extracts relevant data, such as job title, years of experience, company, application link, location, and job description.
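The search step can be pictured as a request to the Google Custom Search JSON API that is restricted to one website. The sketch below is a minimal standard-library illustration, not this repository's code. It assumes the site restriction is done with the API's `siteSearch` parameter (the project may instead append a `site:` operator to the query or use `requests`). The names `build_search_url` and `search_job_links` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Google Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, site, api_key, cx, start=1):
    """Build a Custom Search request URL limited to a single website (hypothetical helper)."""
    params = {
        "key": api_key,           # value of GOOGLE_API_KEY
        "cx": cx,                 # Search Engine ID (CX_KEY)
        "q": query,               # e.g. "software engineer intern"
        "siteSearch": site,       # restrict results to this site
        "siteSearchFilter": "i",  # "i" = include only results from siteSearch
        "start": start,           # 1-based index of the first result
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def search_job_links(query, site, api_key, cx):
    """Return (title, link) pairs from the first page of results (hypothetical helper)."""
    with urllib.request.urlopen(build_search_url(query, site, api_key, cx)) as resp:
        data = json.load(resp)
    # The response has no "items" key when the search finds nothing.
    return [(item["title"], item["link"]) for item in data.get("items", [])]
```

Each returned link is a candidate job listing page that the data-extraction step would then visit.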
Requirements:
- Python 3.7 or above
- Python packages: BeautifulSoup, Selenium, pytz, requests
- Google API key and Search Engine ID (cx key)
- OpenAI API key
Setup:
- Clone this repository.
- Install the required packages:
  pip install -r requirements.txt
- Get a Google API key: follow the steps in the Google Custom Search JSON API documentation to obtain a Google API key and a Search Engine ID (cx key).
- Get an OpenAI API key: follow OpenAI's instructions to create an API key.
- Set the environment variables: copy the .env.example file to a new file named .env and fill in the appropriate keys:
  GOOGLE_API_KEY=your_google_api_key
  CX_KEY=your_cx_key
  OPENAI_KEY=your_openai_key
- Modify the query and site variables in the main function to match the roles and company websites you want to search.
- Run the code:
  python3 src/main.py --run_pure
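To confirm the environment-variable step worked, a short standard-library check can read the three keys named in .env.example. This is a sketch under assumptions: the project may load .env with a library such as python-dotenv rather than a hand-written parser, and the names `parse_env`, `load_keys`, and `REQUIRED_KEYS` are hypothetical.

```python
import os

# Key names taken from .env.example.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "CX_KEY", "OPENAI_KEY")


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_keys(path=".env"):
    """Merge .env (if present) into os.environ and return the required keys (hypothetical helper)."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for key, value in parse_env(f.read()).items():
                # Variables already set in the shell take precedence over .env.
                os.environ.setdefault(key, value)
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError("Missing keys: " + ", ".join(missing))
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing early with the names of the missing keys is easier to debug than an authentication error from the Google or OpenAI API later in the run.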
This project is licensed under the MIT License - see the LICENSE file for details.
Please feel free to contact us if you have any questions about the project.
Join us on Discord.
Happy Coding!
This README is subject to change; please check back for updates.
jobPosting schema class
Mandatory:
- apply_link: str
- company: str
- date_added: str
- title: str
Optional:
- description: str
- location: str
- category: str (e.g. "Software Engineer")
- title_correct_by_gpt: bool (e.g. True)
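The jobPosting schema maps naturally onto a Python dataclass. The sketch below is an illustration, not the repository's actual class. The class name `JobPosting`, the `None` defaults for the optional fields, the non-empty check on mandatory fields, and the `to_dict` helper are all assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobPosting:
    # Mandatory fields
    apply_link: str
    company: str
    date_added: str
    title: str
    # Optional fields (None when the scraper could not fill them)
    description: Optional[str] = None
    location: Optional[str] = None
    category: Optional[str] = None               # e.g. "Software Engineer"
    title_correct_by_gpt: Optional[bool] = None  # e.g. True

    def __post_init__(self):
        # Reject empty values for the mandatory string fields.
        for name in ("apply_link", "company", "date_added", "title"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def to_dict(self):
        """Return a plain dict, dropping optional fields that are unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Example: `JobPosting(apply_link="https://example.com/jobs/1", company="Example Corp", date_added="2024-01-15", title="Software Engineer Intern").to_dict()` contains only the four mandatory fields. The date string format here is arbitrary; the schema only specifies that date_added is a str.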