This repository is a web scraping project that collects data about internships in various cities. It relies heavily on JobSpy.
We have several tasks planned for the project's development:
- Move these tasks to the Issues tab: track each task as a GitHub issue for better organization and collaboration.
- Divide cities into two files: keep the general list as it is now (it can contain any cities you want), and add a second list specific to Indeed, which accepts only some cities (see JobSpy for the supported ones).
- Queries by source: run one query over all cities for LinkedIn, and one query over only the Indeed-specific cities for Indeed.
- Add a 'closed' column: indicate whether a position is closed. Users can supply this information through pull requests, or it can be filled in by automated scraping (if feasible).
- Add Indeed contributions: implement scraping for Indeed job postings, similar to the existing LinkedIn scraping.
- Data representation: instead of storing data as lines in .txt files, pass pandas DataFrames between scripts for more efficient data management.
- Explore additional data sources: expand scraping beyond city-based data to include company-based data. This may involve scraping company career sites or LinkedIn searches filtered by specific companies.
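As a starting point for the city split, per-source query, and 'closed' column tasks, here is a minimal sketch using only the standard library. It is not the project's current code: the city lists, the `Query` class, `plan_queries`, `add_closed_column`, and the `job_url` key are all hypothetical names chosen for illustration, and the actual scraping call is left out on purpose.

```python
from dataclasses import dataclass

# Hypothetical lists; in the project these would be loaded from two files.
GENERAL_CITIES = ["Berlin", "Lisbon", "Zurich", "Tallinn"]
INDEED_CITIES = ["Berlin", "Zurich"]  # assumed: only the cities Indeed accepts


@dataclass(frozen=True)
class Query:
    """One planned search: which source to ask, for which city."""
    source: str
    city: str
    search_term: str = "internship"


def plan_queries(general_cities, indeed_cities, search_term="internship"):
    """Build one batch of queries per source.

    LinkedIn gets every city in the general list; Indeed gets only
    the cities from its own list.
    """
    return {
        "linkedin": [Query("linkedin", c, search_term) for c in general_cities],
        "indeed": [Query("indeed", c, search_term) for c in indeed_cities],
    }


def add_closed_column(rows, closed_urls=frozenset()):
    """Return copies of the rows with a boolean 'closed' field.

    A row is marked closed when its (assumed) 'job_url' value appears in
    closed_urls, e.g. URLs reported through pull requests.
    """
    return [{**row, "closed": row.get("job_url") in closed_urls} for row in rows]


if __name__ == "__main__":
    plan = plan_queries(GENERAL_CITIES, INDEED_CITIES)
    for source, queries in plan.items():
        print(source, [q.city for q in queries])
```

Each batch could then be handed to the scraper for its source, and the resulting rows passed through `add_closed_column` before being stored. Keeping the planning step separate from the scraping step means the city files can change without touching the scraping code.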
Feel free to contribute to these tasks and help make this project better.
Explore the visual representation of the internship data collected by this project in the European Tech Internships 2024 Repository.
We appreciate your support and contributions to this open-source project!