gyanesh-m/Sentiment-analysis-of-financial-news-data

Directory finallinks not found.

Closed this issue · 3 comments

Hi,
I tried scraping URLs with archive_scrapper and it worked smoothly.
But when I tried scraping the content out of those URLs by running the scrape script, it looked for the directory "links/finallinks", and no such finallinks folder is created by archive_scrapper.

@bhangaboy Hi, you need to run merger.py before running scrape_with_bs4.py. It will create a separate final data file for each company. This data merges the Google search results and all the web archives together, which is required to get rid of duplicate entries from different sources. scrape_with_bs4.py scrapes the data sequentially; to scrape it in parallel, use quick_scraper.py.
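For anyone trying to follow what the merge step does conceptually, here is a minimal sketch of merging link lists from several sources and dropping duplicate URLs. This is only an illustration: the `merge_links` function, file-name pattern, and paths are assumptions, not the repository's actual merger.py.

```python
# Illustrative sketch only -- not the repository's actual merger.py.
# It shows the idea of merging link lists from several sources and
# dropping duplicate URLs before they are scraped.
import glob
import os

def merge_links(pattern, output_path):
    """Collect URLs from every file matching `pattern`, keep each one once,
    and write the unique set to `output_path`."""
    seen = set()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as f:
            for line in f:
                url = line.strip()
                if url:
                    seen.add(url)
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(seen)))

# Hypothetical usage: combine Google-search and web-archive link files
# for one company into a single deduplicated file.
merge_links("links/company_x_*.txt", "links/finallinks/company_x.txt")
```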

Hi gyanesh,
Thanks for replying. I tried running merger.py and a new file was created for each company, but even after that I was getting the same directory error. It would be a great help if you could provide the steps in more detail.

@bhangaboy Hi, after running merger.py to get the unique links for each company, follow these steps:

  1. Create a folder called finallinks inside the links folder and copy the unique/full output you got from running merger.py into this folder.
  2. Create an empty file tracker.data inside the links folder. On Linux, you can do touch tracker.data.

After these steps, run python scrape_with_bs4.py to get the data.
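For clarity, here is a minimal sketch of the setup the two steps above describe, using only the Python standard library. The paths links/finallinks and links/tracker.data come from the comments above; the naming pattern of the merged output files is an assumption and may need to be adjusted.

```python
# Minimal sketch of the setup described above. Only links/finallinks and
# links/tracker.data are taken from the thread; other details are assumptions.
import shutil
from pathlib import Path

links_dir = Path("links")
final_dir = links_dir / "finallinks"

# Step 1: create links/finallinks and copy the merged per-company output into it.
final_dir.mkdir(parents=True, exist_ok=True)
for merged_file in links_dir.glob("*_merged.txt"):  # assumed naming of merger.py output
    shutil.copy(merged_file, final_dir / merged_file.name)

# Step 2: create an empty tracker file (equivalent to `touch links/tracker.data`).
(links_dir / "tracker.data").touch()

# After this, run: python scrape_with_bs4.py
```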