This repo scrapes hebrew songs from 3 different sites.
In order to run it you need to:
- Run wget_all.sh - This script will start getting the html files from the different sites (After running this you'll see directory created for each site)
- Wait a while so the script will start getting some html files to scrape.
- Run scrape_all.sh - This script will scrape the html files and will output files with only the songs text (All in the same file) It will be named _output.txt
Check it out. You should see something similar to this: הגבול הלגיטימי לגעגועים
(ככל דבר סכריני אחר) עובר מן מחוזות השעמום וחבלי הנידחות לנפת הברירה נטולת הגבולות של עולם הציד החופן בחובו מועדון רוק אפלולי, נוטף זכרונות -
The scrapers might output 'Processing error' for some files. In this case it ignores it and continues to the next one. Each scraper looks for a specific tag in the html and if it doesn't finds it then it will not work. Not all files contain songs so this it not really a problem.