SkalTrial

Scraping a site

Primary language: Python

SKAL TRIAL SCRAPING

Create a project:

scrapy startproject projectname

Generate a spider inside the project:

scrapy genspider spidername URL

Scrapy version used: 2.1.0

Splash with Scrapy, for accessing JavaScript-rendered objects

Run the Splash browser with Docker (requires some knowledge of Lua scripts):

sudo docker run -it -p 8050:8050 --name splash scrapinghub/splash
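
To give an idea of what a Lua script for Splash looks like, here is a small sketch that only uses the Python standard library. It builds a POST request for Splash's /execute endpoint, assuming the container is reachable at http://localhost:8050 as in the docker command. The helper name build_splash_request and the one-second default wait are made up for this example.

```python
import json
from urllib.request import Request

# Minimal Splash script: load the page, wait, then return the rendered HTML.
LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return {html = splash:html()}
end
"""


def build_splash_request(target_url, wait=1.0, splash_url="http://localhost:8050"):
    """Build (but do not send) a POST request for Splash's /execute endpoint.

    build_splash_request is a hypothetical helper, not part of any library.
    """
    # Extra JSON keys such as "url" and "wait" are exposed to the script as args.
    payload = {"lua_source": LUA_SCRIPT, "url": target_url, "wait": wait}
    return Request(
        splash_url + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the container running, passing the result to urllib.request.urlopen and decoding the JSON body should give a dict whose html key holds the rendered page.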

Beginner friendly

Selenium for a headless browser (scrapy-selenium): https://github.com/clemfromspace/scrapy-selenium

ChromeDriver (the Chrome driver for Selenium) download: https://chromedriver.storage.googleapis.com/index.html?path=81.0.4044.69/
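
A possible settings.py fragment for scrapy-selenium with Chrome is sketched below. The setting names follow the scrapy-selenium README. It is an assumption of this example that chromedriver can be found on PATH, so replace the lookup with an absolute path if the driver was unpacked somewhere else.

```python
from shutil import which

# settings.py fragment for scrapy-selenium
SELENIUM_DRIVER_NAME = "chrome"
# Assumption: chromedriver is on PATH; which() returns None when it is not found.
SELENIUM_DRIVER_EXECUTABLE_PATH = which("chromedriver")
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]  # run Chrome without a visible window

DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

In the spider, pages that need the browser would then be requested with SeleniumRequest (imported from scrapy_selenium) instead of scrapy.Request.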

Install pymongo:

pip3 install pymongo

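Start MongoDB with Docker (MongoDB listens on port 27017 inside the container, and the network some-network must already exist):

docker run -d --network some-network --name some-mongo -e MONGO_INITDB_ROOT_USERNAME=mongoadmin -e MONGO_INITDB_ROOT_PASSWORD=secret -p 27017:27017 mongo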

or

see the mongodb.yaml file and start it with docker-compose:

docker-compose -f mongodb.yaml up -d
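
Once MongoDB is running, scraped items can be written to it from an item pipeline. The sketch below follows the general shape of the MongoDB pipeline example in the Scrapy documentation. The settings names MONGO_URI and MONGO_DATABASE, the collection name, the default database name, and the helper build_mongo_uri are assumptions of this example. It also assumes the database is reachable on localhost at MongoDB's default port 27017, so adjust that to whatever port the container actually publishes.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017):
    """Return a mongodb:// URI with the credentials percent-encoded.

    build_mongo_uri is a hypothetical helper for this example.
    """
    return "mongodb://%s:%s@%s:%d" % (quote_plus(user), quote_plus(password), host, port)


class MongoPipeline:
    """Store every scraped item in a MongoDB collection."""

    collection_name = "scraped_items"  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed names defined in settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "skaltrial"),
        )

    def open_spider(self, spider):
        # Imported here so pymongo is only required once a crawl actually starts.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # dict(item) works for both plain dict items and scrapy.Item objects.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

To use it, the class would be added to ITEM_PIPELINES in settings.py, with MONGO_URI set, for example, to build_mongo_uri("mongoadmin", "secret") for the credentials used in the docker command.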

For images, Pillow is required:

pip3 install pillow
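
Pillow is what Scrapy's built-in images pipeline relies on. A minimal sketch of enabling that pipeline is shown below. The storage folder name and the helper make_image_item are assumptions of this example, and the two parts would normally live in different files (settings.py and the spider module).

```python
from urllib.parse import urljoin

# settings.py fragment: enable Scrapy's built-in images pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}
IMAGES_STORE = "images"  # hypothetical local folder for the downloaded files


# Spider-side helper (hypothetical): the pipeline reads URLs from "image_urls".
def make_image_item(page_url, image_srcs):
    """Return an item whose image_urls are absolute, as the pipeline expects."""
    return {"image_urls": [urljoin(page_url, src) for src in image_srcs]}
```

A spider's parse method could then yield make_image_item(response.url, response.css("img::attr(src)").getall()), and the pipeline would record the download results under the item's images key.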