istresearch/scrapy-cluster

runspider: error: Unable to load 'link_spider.py': attempted relative import with no known parent package

BeamoINT opened this issue · 4 comments

Hello, I got this error when trying to run `scrapy runspider` on Scrapy Cluster so that I could feed URLs into it. I have everything set up properly: Kafka, Redis, and Zookeeper are all working. I just don't know what is causing this. Thanks so much!

root@crawler:~/scrapy-cluster/crawler/crawling/spiders# scrapy runspider link_spider.py
Usage

scrapy runspider [options] <spider_file>
runspider: error: Unable to load 'link_spider.py': attempted relative import with no known parent package

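The docs here should show you how to run your spider properly. Note that the path in the command is relative to the crawler directory, so run it from ~/scrapy-cluster/crawler rather than from inside crawling/spiders: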

scrapy runspider crawling/spiders/link_spider.py

I have been looking through the docs and have not found the cause yet, but I have a few more questions. Does this command automatically start the crawler without anything having to be fed into it?

scrapy runspider crawling/spiders/link_spider.py

If so, is there a starting URL in the settings, and does it branch off from there to crawl multiple URLs from the seed URL? If you do have to feed a URL into it to start it, does it then automatically start crawling other URLs from there? Sorry for so many questions, and thank you for your help.

Scrapy Cluster runs on inbound requests via Kafka, so the spider sits idle until a crawl request is pushed in. Please see the API documentation on how to push requests into the cluster.
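As a rough illustration of what "pushing a request" can look like, here is a minimal sketch that builds a crawl request as JSON and prints a command for feeding it in. It rests on assumptions not stated in this thread: that the crawl API takes `url`, `appid`, and `crawlid` fields plus an optional `maxdepth`, and that the Kafka Monitor is fed with `python kafka_monitor.py feed '<json>'`. The helper names `build_crawl_request` and `feed_command` are hypothetical. Please check the field names and the command against the API documentation before relying on it.

```python
import json
import shlex


def build_crawl_request(url, appid, crawlid, maxdepth=0):
    """Build a crawl request dict.

    The field names (url, appid, crawlid, maxdepth) are assumed from the
    Scrapy Cluster crawl API; verify them against the API documentation.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    return {
        "url": url,
        "appid": appid,        # identifies the application submitting the request
        "crawlid": crawlid,    # identifies this particular crawl job
        "maxdepth": maxdepth,  # assumed: 0 means only the seed URL is crawled
    }


def feed_command(request):
    """Return a shell command that would feed the request to the Kafka Monitor.

    Assumes the command is run from the kafka-monitor directory.
    """
    payload = json.dumps(request, sort_keys=True)
    # shlex.quote keeps the JSON intact as one shell argument.
    return "python kafka_monitor.py feed " + shlex.quote(payload)


if __name__ == "__main__":
    req = build_crawl_request("http://example.com", "testapp", "abc123", maxdepth=1)
    print(feed_command(req))
```

If those assumptions hold, running the printed command while the spider is up should hand it the seed URL, and a `maxdepth` above 0 would be what lets the crawl follow links beyond that seed.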

Please close this issue if the original request has been answered.

Closing due to inactivity