This project is a web scraper that extracts data from a website and stores it in a database. It is built with Python and the Flask web framework and deployed on AWS Elastic Beanstalk.
Before you start, you will need an AWS account, the AWS Command Line Interface (CLI) installed on your machine, and a database (such as MySQL or PostgreSQL) set up and running.
Getting the scraper into production involves three broad steps: set up the Elastic Beanstalk environment, configure the application to connect to your database, and deploy.
To install the scraper, clone the repository, change into the project directory, and install the dependencies:
git clone https://github.com/FahimaChowdhury/scrapper.git
cd scrapper
pip install -r requirements.txt
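The run step later in this document assumes a virtual environment, so you may want to create and activate one before installing dependencies. A minimal sketch using the standard library's venv module:

```shell
# Create an isolated environment in ./venv (stdlib venv module).
python -m venv venv
# Activate it (on Windows use: venv\Scripts\activate).
source venv/bin/activate
```

With the environment active, pip installs packages into ./venv instead of the system Python.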
To deploy the scraper to AWS Elastic Beanstalk, follow these steps:
- Create an Elastic Beanstalk application using the AWS Management Console or the AWS CLI.
- Create an environment for that application (the environment is where your code actually runs).
- Deploy the application code to the environment.
- For more detailed instructions on deploying a Flask application on Elastic Beanstalk, see the official AWS documentation.
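As one concrete path, the steps above map onto the Elastic Beanstalk CLI (`eb`, a separate tool from the general-purpose AWS CLI). The application name, environment name, and platform version below are illustrative, not taken from this project:

```shell
# Initialize an EB application in the project directory
# (prompts for region and platform if not given).
eb init -p python-3.11 scrapper
# Create an environment and deploy the current code to it.
eb create scrapper-env
# Redeploy after making local changes.
eb deploy
```

These commands require AWS credentials to be configured; see the official AWS documentation linked above for the full workflow.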
To configure the scraper, edit application.py and set the database connection details:
DB_USER = 'username'
DB_PASSWORD = 'password'
DB_HOST = 'hostname'
DB_NAME = 'database_name'
You can also set other configuration options, such as the URL to scrape and the time interval between scrapes.
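A hedged sketch of what that configuration section of application.py might look like; every name here (including the helper function and the extra options) is illustrative, not the project's actual code:

```python
# Hypothetical configuration block for application.py -- names are illustrative.
DB_USER = 'username'
DB_PASSWORD = 'password'
DB_HOST = 'hostname'
DB_NAME = 'database_name'

# Assumed additional options mentioned in the text above.
SCRAPE_URL = 'https://example.com/page-to-scrape'
SCRAPE_INTERVAL_SECONDS = 3600  # one hour between scrapes

def database_url(driver: str = 'postgresql') -> str:
    """Build a SQLAlchemy-style connection URL from the settings above."""
    return f'{driver}://{DB_USER}:{DB_PASSWORD}@{DB_HOST}/{DB_NAME}'
```

Keeping the settings in one place like this makes it easy to swap in environment variables later, which is the usual practice on Elastic Beanstalk rather than hard-coding credentials.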
To run the scraper locally, activate the virtual environment and run application.py:
python application.py
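Note that Elastic Beanstalk's Python platform looks for a module-level WSGI callable named `application` by default, which is why the entry-point file is named application.py. A minimal sketch of the Flask wiring (the route and message are placeholders, not this project's actual endpoints):

```python
from flask import Flask

# EB's Python platform expects a module-level callable named `application`.
application = Flask(__name__)

@application.route('/')
def index():
    # Placeholder endpoint; the real app would trigger or report on scrapes.
    return 'scraper is running'

if __name__ == '__main__':
    # Local development server; on EB the app runs behind the platform's
    # own WSGI server instead.
    application.run()
```

Naming the callable anything else would require overriding WSGIPath in the Elastic Beanstalk configuration.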