This corpus was built during a summer internship. We use Scrapy spiders/crawlers to crawl Moroccan newspaper websites and save the scraped data to JSON or plain-text files. We built spiders/crawlers for the following news websites:
Each folder is the Scrapy project folder for one newspaper.
To scrape data from any of the newspapers above:

1. Download its project folder.
2. On the command line, change directory into the project folder.
3. Run the following command to start scraping the website: `scrapy crawl <name of the spider> -o <name of the file>.json`
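As an illustration, the snippet below composes the crawl command for a hypothetical spider named `akhbarona` (the real spider names live inside each project folder); it only prints the command, so nothing is actually crawled:

```shell
# Placeholder names: substitute the spider defined in the project folder
# you downloaded, and whatever output file name you want.
spider="akhbarona"
out="${spider}.json"          # use .xml instead for an XML feed
cmd="scrapy crawl ${spider} -o ${out}"
echo "$cmd"
```

Running the printed command from inside the project folder starts the crawl and writes the scraped items to the named feed file.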
Note
In addition to the JSON or XML output file you specify when you run the spider on the command line, every spider/crawler automatically saves a plain-text file.
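One common way to get that behaviour in Scrapy is an item pipeline that mirrors every scraped item into a text file alongside the feed export. The sketch below is an illustration, not the repository's actual code, and the item fields `title` and `body` are assumptions:

```python
class TextFilePipeline:
    """Append every scraped item to a single plain-text file.

    Scrapy item pipelines are ordinary classes: Scrapy calls
    open_spider / process_item / close_spider on them, so this
    sketch needs no scrapy import.
    """

    def __init__(self, path="articles.txt"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Write the (assumed) title and body fields, separated by
        # a blank line, then pass the item on to the feed exporter.
        self.file.write(item.get("title", "") + "\n")
        self.file.write(item.get("body", "") + "\n\n")
        return item
```

To enable a pipeline like this one, it would be registered under `ITEM_PIPELINES` in the project's `settings.py`.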
About 2 GB of scraped text can be downloaded here: https://drive.google.com/open?id=1w2-DTJF2phU3fVf4XkDh1tsN-O3N_baF