Freadom

A simple tool to scrape articles. It works even on websites that don't allow users to read the content without a login.

WEC NITK GSDC Task ID: Web Scraper

This simple tool lets the user provide the URL of an article. The script scrapes the article's contents and serves them to the user as a .docx file.


USP: It even works on websites that don't allow a user to read an article without a subscription or login! One such article is here
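The scraping step described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names `fetch_html`, `ParagraphExtractor` and `extract_paragraphs` are hypothetical, the sketch assumes the article body lives in `<p>` tags, and it makes no attempt to reproduce how the tool handles login-gated pages.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it (hypothetical helper, not project code)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None    # text pieces of the <p> being read, or None
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()      # an unclosed <p> ends where the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace so each paragraph is one clean line.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def extract_paragraphs(html):
    """Return the article's paragraphs as a list of plain-text strings."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # keep a trailing paragraph that was never closed
    return parser.paragraphs
```

With these helpers, `extract_paragraphs(fetch_html(url))` would give the list of paragraphs to write into the document.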


Demo

Setting up the project:


Installing and using a Virtual Environment

pip install virtualenvwrapper-win
mkvirtualenv test   # "test" is the name of the virtual environment


Install required packages:

pip install -r requirements.txt

To run project:

First make sure the virtual environment is active (if it is not, run workon test), then:

python manage.py makemigrations
python manage.py migrate
python manage.py runserver

Visit the development server at http://127.0.0.1:8000/



Tech Stack

Python HTML CSS

Implemented Features

  • Scrape articles
  • Download the scraped article as a Word document (.docx) without storing the file on the server
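The "without storing it on server side" part comes down to building the file in memory instead of on disk. The sketch below shows that pattern with `io.BytesIO`. To stay free of third-party dependencies it writes a minimal .docx package by hand with `zipfile`; this is a stand-in for illustration, not the project's code, and `build_docx_bytes` is a hypothetical name.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# A .docx file is a ZIP archive; these are the smallest parts a reader needs.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_docx_bytes(paragraphs):
    """Build a .docx entirely in memory and return its raw bytes."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buffer = io.BytesIO()      # in-memory file: nothing is written to disk
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

In a Django view, bytes like these could be returned in an `HttpResponse` with the content type `application/vnd.openxmlformats-officedocument.wordprocessingml.document` and a `Content-Disposition: attachment` header. With python-docx the same pattern applies, since `Document.save()` accepts a file-like object such as a `BytesIO` buffer.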

Known Bugs and Improvements

  • Doesn't work on Medium articles
  • No progress indicator is shown while the file is being processed

References:

Django's Official Documentation
Python Docx Documentation
Stack Overflow
Running a python script from Django
Exporting docx with Django