/gcsj-profile-scraper

Google Cloud Study Jam 2023 Public Profile Scraper for realtime stats and leaderboard.

Google Cloud Study Jam Profile Scraper

About

This project scrapes the public badge profiles of Google Cloud Study Jam participants and stores the results in a MongoDB database, where any other project can read them. This is especially helpful for creating a leaderboard for the program.
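
As a rough illustration of the leaderboard use case, the sketch below ranks participant records by badge count. The `ParticipantRecord` shape and the `buildLeaderboard` helper are hypothetical names chosen for this example; the documents the scraper actually writes to MongoDB may use different fields.

```typescript
// Hypothetical record shape; the real schema stored by the scraper may differ.
interface ParticipantRecord {
  name: string;
  profileUrl: string;
  badgeCount: number;
}

interface LeaderboardEntry extends ParticipantRecord {
  rank: number;
}

// Sort by badge count (descending), break ties alphabetically by name,
// and give participants with equal badge counts the same rank.
function buildLeaderboard(records: ParticipantRecord[]): LeaderboardEntry[] {
  const sorted = [...records].sort(
    (a, b) => b.badgeCount - a.badgeCount || a.name.localeCompare(b.name)
  );
  const entries: LeaderboardEntry[] = [];
  sorted.forEach((record, index) => {
    const previous = entries[index - 1];
    const rank =
      previous !== undefined && previous.badgeCount === record.badgeCount
        ? previous.rank
        : index + 1;
    entries.push({ ...record, rank });
  });
  return entries;
}
```

Records read from the database can be passed straight to `buildLeaderboard`; the input array is copied before sorting, so the caller's data is left untouched.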

How to use

  • Clone the repository into your desired directory
  • Store the public profile URLs in the ./src/data directory as input.csv
    (Don't worry, you only need to do this once. The file is required to fetch the public badge profile URLs.)
  • (IMPORTANT!) The CSV must be formatted with the two fields: Student Name and Profile URL
  • Create your own .env file and store your MongoDB Atlas URI there (see .env.example)
  • Install the dependencies with
    npm install
    
    or
    yarn install
    
    or
    pnpm install
    
  • Run locally using
    npm run dev
    
    or
    yarn dev
    
    or
    pnpm dev
    
  • All scraped data is used to update the MongoDB database. You can then use that data to create your own leaderboard.
  • You can create a cron job to automate the scraping, for example once every 30 minutes to 1 hour.
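
To make the input.csv requirement above concrete, here is a minimal sketch of how a file with the two fields could be validated and parsed. The `parseInputCsv` helper and the `InputRow` type are hypothetical and not taken from this repository. The sketch also assumes a header row spelled exactly `Student Name,Profile URL` and no quoted fields or embedded commas, and the URLs shown are placeholders.

```typescript
// Hypothetical row type for the two required fields.
interface InputRow {
  studentName: string;
  profileUrl: string;
}

// Parse CSV text with a "Student Name,Profile URL" header row.
// Assumption: no quoted fields and no commas inside names or URLs.
function parseInputCsv(text: string): InputRow[] {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const [header, ...rows] = lines;
  if (header === undefined) {
    throw new Error("input.csv is empty");
  }

  const columns = header.split(",").map((column) => column.trim());
  if (
    columns.length !== 2 ||
    columns[0] !== "Student Name" ||
    columns[1] !== "Profile URL"
  ) {
    throw new Error('Expected header "Student Name,Profile URL"');
  }

  return rows.map((row, index) => {
    const parts = row.split(",").map((part) => part.trim());
    const [studentName, profileUrl] = parts;
    if (parts.length !== 2 || !studentName || !profileUrl) {
      // index + 2: one for the header row, one for 1-based line numbers
      // (blank lines are skipped, so this counts non-empty lines).
      throw new Error(`Malformed row at non-empty line ${index + 2}: "${row}"`);
    }
    if (!/^https?:\/\//.test(profileUrl)) {
      throw new Error(`Profile URL must start with http:// or https://: "${profileUrl}"`);
    }
    return { studentName, profileUrl };
  });
}

// Example with placeholder data.
const sampleRows = parseInputCsv(
  "Student Name,Profile URL\nAlice,https://example.com/public_profiles/alice\n"
);
```

Failing fast on a wrong header or a malformed row surfaces formatting mistakes before any scraping starts, which matters when the job runs unattended on a schedule.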

ENJOY 😉☕

MIT License