All the data from GSoC-archive in JSON format.
- Data/
  - orgs/: all orgs that have been a part of GSoC from 2005 to 2017
  - projects/: all projects completed under the GSoC program from 2005 to 2017
- Scrapers/: contains all the scrapers used for scraping the data

- organiztion_2005.json - organization_2008
  - link: URL of the org
  - name: Name of the org
- organization_2009-2013.json
  - about: Work that the org does
  - link: URL of the org
  - mail: Mailing list of the org
  - name: Name of the org
  - page: Idea page of the org
- organization_2014-2015.json
  - link: URL of the org
  - mail: Mailing list of the org
  - page: Idea page of the org
  - name: Name of the org selected
- organization_2016-2017.json
  - about: Info about the organization
  - link: URL of the org
  - name: Name of the org
- project_2005.json - project_2008.json
  - Mentor: Name of the mentor of the project
  - project: Name of the project
  - student: Name of the student
- project_2009-2013.json & project_2014-2015.json
  - Organization: Name of the organization
  - detail: Detail about the project
  - link: Link to the project
  - student: Name of the student selected
  - title: Name of the project
- project_2016-2017.json
  - Organization: Name of the organization
  - link: Link to the project
  - mentors: Names of all the mentors
  - student: Name of the student
  - title: Name of the project

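Because the project files use three different key layouts, it can help to map every record to one shape before analysing them together. The following is a minimal sketch, not part of this repository: `normalize_project` is a hypothetical helper, and it assumes each record is a JSON object carrying the keys listed above. Whether `mentors` holds a string or a list is not stated here, so the sketch accepts both.

```python
def normalize_project(record):
    """Map a project record from any of the three layouts to one shape.

    Assumption: `record` is a dict with the keys documented above.
    Keys missing from a given layout come back as None (or an empty list).
    """
    # 2005-2008 files use "Mentor"; 2016-2017 files use "mentors".
    mentors = record.get("mentors", record.get("Mentor"))
    if mentors is None:
        mentors = []
    elif isinstance(mentors, str):
        mentors = [mentors]

    return {
        "organization": record.get("Organization"),
        # 2005-2008 files name the project under "project", later ones under "title".
        "title": record.get("title", record.get("project")),
        "student": record.get("student"),
        "mentors": list(mentors),
        "link": record.get("link"),
        "detail": record.get("detail"),
    }
```

For example, `normalize_project({"Mentor": "M", "project": "P", "student": "S"})` yields a record whose `title` is `"P"` and whose `mentors` is `["M"]`, with the fields absent from the 2005-2008 layout set to `None`.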
This data will be used for improving the functionality of Soccer.
It can also be used to generate various stats and plots, or to answer data-related questions like:
- Who completed the most GSoCs, and under which org?
- Which org has the highest student-to-mentor conversion rate? (Students who first did GSoC under the org and later became mentors.)
- Do the project descriptions over the years show a trend toward ML-related projects?
etc. etc.
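As a rough illustration of the first question, the sketch below counts how often each student name appears across the project files. It is only a starting point under stated assumptions: each file is assumed to contain a JSON array of objects, the files are assumed to sit in `Data/projects`, and `load_records` and `most_frequent_students` are hypothetical helper names, not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load one data file, assuming it holds a JSON array of objects."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array of records")
    return data


def most_frequent_students(records, top=5):
    """Return the `top` most common values of the `student` key."""
    counts = Counter()
    for record in records:
        student = record.get("student")
        # Skip records where the key is missing or not a usable string.
        if isinstance(student, str) and student.strip():
            counts[student.strip()] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    data_dir = Path("Data/projects")  # assumed location of the project files
    records = []
    if data_dir.is_dir():
        for path in sorted(data_dir.glob("project_*.json")):
            records.extend(load_records(path))
    print(most_frequent_students(records))
```

Matching on the exact name string is a simplification: two students who share a name are merged, and one student whose name is spelled differently across years is split, so the result should be treated as an approximation.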
Feel free to open issues to discuss any more ideas!