Legal500-Web-Scrapper
This is a Scrapy Web Spider that scraps information about companies listed on the Legal500 directory site.
How to run the script
-
cd to the directory where the requirements.txt file is located
-
run: pip install -r requirements.txt in your shell to install the required dependencies
-
run: scrapy crawl legal500_spider1 -a directory_url="name_of_legal500_directory" -O "filepath_to_save_output.json"
The -a command allows you to pass an argument to the directory_url parameter
The -O command allows you to define the location and file to store the output
NB: Wait for the script to finish running
- run: scrapy crawl legal500_spider2 -a filename="filepath_to_saved_output_from_spider1" -O "filepath_to_save_output.json"
Some commands to get you started
To run the first script: scrapy crawl legal500_spider1 -a directory_url="https://www.legal500.com/c/germany/directory/" -O "../scrapped_data/firm_urls.json"
To run the second script: scrapy crawl legal500_spider2 -a filename="../scrapped_data/firm_urls.json" -O "../scrapped_data/firm_details.json"