
Scraper Op

A platform developed for police departments that lets officers gather publicly available data about an individual from different social media sites.

Demo

Video Link

Platform Glance

Register Page | Login Page
Home Page | Data Fetched
(screenshots)

Features

  • Phone Number Scraping

    • Scrape data associated with a mobile number in India.
    • Get details of the SIM's registered location with a proper address.
    • Receive data in JSON format.
    • Download data as a CSV file.
    • Screen Sharing
  • Tweets Toxicity Detection (X-factor / Flagship)

    • Archive an individual's tweets and report toxicity details about them.
    • Toxicity is scored on 7 different parameters.
    • Uses a TensorFlow model to detect the toxicity category.
  • Twitter Scraper

    • Search for a Twitter account's details by providing a username.
    • Extract all of the individual's tweets.
    • Receive data in JSON format.
    • Download data as a CSV file.
  • Linkedin Scraper

    • Extract data from a user's LinkedIn profile.
    • Receive data in JSON format.
    • Download data as a CSV file.
  • Instagram Scraper

    • Extract data from a user's Instagram profile.
    • Obtain the profile image, follower & following counts, and much more.
    • Receive data in JSON format.
    • Download data as a CSV file.
  • Facebook Scraper

    • Extract data from a user's Facebook profile.
    • Obtain the profile image, follower & following counts, and much more.
    • Receive data in JSON format.
    • Download data as a CSV file.
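Every scraper above returns JSON that can also be downloaded as CSV. A minimal sketch of that conversion, assuming hypothetical field names (`name`, `followers`, `following`) rather than the platform's actual schema:

```python
import csv
import io
import json

def json_to_csv(json_str: str) -> str:
    """Convert a JSON array of scraped records into CSV text."""
    records = json.loads(json_str)
    buf = io.StringIO()
    # Use the first record's keys as the CSV header row.
    writer = csv.DictWriter(buf, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Hypothetical scraped payload; real field names depend on the scraper.
payload = '[{"name": "jdoe", "followers": 120, "following": 80}]'
print(json_to_csv(payload))
```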

Tech Stack

Key Technologies Used

  1. Front End / Client Side

    • ReactJS
    • Bootstrap - CSS and other components
  2. Backend Servers ( a two-backend architecture is used to distribute load across servers ):

    • Flask Backend

      • Facebook Scraper - Takes a Facebook username or ID and scrapes data from the user's profile by parsing the site.
      • Twitter Scraper - Takes a Twitter username and scrapes data from the user's profile by parsing the site.
    • MongoDB Backend

      • Phone Number Scraper - Scrapes data associated with a mobile number.
      • Instagram Scraper - Takes an Instagram username and scrapes data from the user's profile by parsing the site.
      • LinkedIn Scraper - Takes a LinkedIn username and scrapes data from the user's profile by parsing the site.
  3. Data Management (Databases):

    • MongoDB Atlas - Data management and user details
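The two-backend split above can be pictured as a simple routing table on the client: each scraper talks to exactly one of the two servers. The base URLs and port numbers here are assumptions for illustration, not the project's actual configuration:

```python
# Hypothetical mapping of scrapers to the two backend servers.
# The ports are placeholders; the real values live in the project config.
BACKENDS = {
    "flask": "http://localhost:5000",   # Flask backend
    "mongo": "http://localhost:8000",   # MongoDB backend
}

SCRAPER_BACKEND = {
    "facebook": "flask",
    "twitter": "flask",
    "phone": "mongo",
    "instagram": "mongo",
    "linkedin": "mongo",
}

def endpoint_for(scraper: str) -> str:
    """Return the base URL of the backend handling a given scraper."""
    return BACKENDS[SCRAPER_BACKEND[scraper]]

print(endpoint_for("twitter"))
```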

Installation

Pre-Requisites:

  1. Install Git Version Control [ https://git-scm.com/ ]

  2. Install Python Latest Version [ https://www.python.org/downloads/ ]

  3. Install Pip (Package Manager) [ https://pip.pypa.io/en/stable/installing/ ]

  4. Install MongoDB Compass and connect it to localhost:27017 [ the Atlas connection is quite slow and may not work every time ]

  • Uncomment the following code in app.py to change the connection as required.

    (screenshot of the connection code in app.py)
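The connection snippet itself is only shown as a screenshot, so as an illustration, a typical way to switch between local MongoDB and Atlas looks roughly like this. The environment-variable flag and the Atlas URI are assumptions, not the project's actual code:

```python
import os

# Assumed flag: set USE_ATLAS=1 to use the cloud cluster instead of localhost.
USE_ATLAS = os.environ.get("USE_ATLAS") == "1"

# Placeholder Atlas URI; the real one comes from your Atlas dashboard.
ATLAS_URI = "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net/scraperop"
LOCAL_URI = "mongodb://localhost:27017/scraperop"

MONGO_URI = ATLAS_URI if USE_ATLAS else LOCAL_URI

# In app.py this URI would then be handed to the client, e.g.:
# client = pymongo.MongoClient(MONGO_URI)
print(MONGO_URI)
```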

Clone the project:

  git clone https://github.com/rajprem4214/ScrapperOP.git

Go to the project directory

  cd ScrapperOP

Backend Server:

Go to backend folder

  cd backend-python

Create a Virtual Environment and Activate:

Install Virtual Environment

  pip install virtualenv

Create Virtual Environment:

  virtualenv venv

Go to the venv folder and activate the virtual environment

  cd venv

Run the following command ( Windows PowerShell; on Linux/macOS run 'source venv/bin/activate' from the backend folder instead )

  .\Scripts\activate.ps1

Go back to backend folder

  cd ..

Install Requirements from 'requirements.txt'

  pip install -r requirements.txt

Start the backend server

 flask run

Other Backend Server:

Go to the FinalKSP folder

 cd FinalKSP

Navigate to server folder

 cd Server

Run the following command ( requires nodemon; install it with npm install -g nodemon if missing )

 nodemon index.js

Frontend Server:

Go to frontend folder

 cd instaSCRAP

Install all dependencies

 npm install

Start frontend server

 npm run start

Local URL for Server:

Optimizations

  • Reduced scraping time by distributing server load across 2 servers.
  • For toxicity detection, a sorted subset of tweets is passed instead of all tweets, for faster detection.
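The second optimization can be sketched as follows. The sort key (tweet length) and the cutoff are assumed heuristics for illustration; the project may rank tweets differently:

```python
def select_tweets_for_toxicity(tweets: list[str], limit: int = 50) -> list[str]:
    """Pick a small, sorted subset of tweets to score instead of all of them.

    Sorting by descending length is an assumed heuristic: longer tweets
    give the toxicity model more text to work with.
    """
    return sorted(tweets, key=len, reverse=True)[:limit]

tweets = ["ok", "a much longer tweet with plenty of content", "short one"]
print(select_tweets_for_toxicity(tweets, limit=2))
```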

Future Aspects

  • Increase API rate limits to extract data seamlessly.
  • Adopt a cloud services architecture to improve the platform's performance.

Authors

  • Prem Raj
  • Saishwar Anand
  • Utsav Sinha