A problem running some scripts
Opened this issue · 5 comments
Hi there... I can run some of this project's scripts successfully, but there is a problem with the others.
By default I disable Elasticsearch, then I push a .onion domain with push.sh, and I see the results in the domain and page tables. But after that, when I run scraper-service.sh, I run into a loop like the one below:
As you can see from this part of my terminal output while running scraper-service.sh, Scrapy opens and closes again and again and cannot get out of this loop by itself...
The second problem is with Elasticsearch! When I enable it, none of the scripts run and I get this error:
Would you please guide me? If you need any files, I will share them. Thanks...
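One quick sanity check before digging into the scripts: confirm that Elasticsearch is actually listening before the project tries to talk to it. This is a generic diagnostic sketch, not part of the project; the default host and port (127.0.0.1:9200) are assumptions, so adjust them to your configuration.

```python
# Hedged diagnostic: is anything accepting TCP connections on the
# (assumed) default Elasticsearch host/port? If this returns False,
# the scripts will fail long before any scraping starts.
import socket

def elasticsearch_reachable(host="127.0.0.1", port=9200, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_reachable())
```

If this prints False, fix the Elasticsearch service (or keep it disabled, as the updated README suggests) before re-running the scripts.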
I think you had a problem in your initial configuration. If you want to restart your project and try it without Elasticsearch, you can follow the updated README: https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md
hi @mrL3x ,
Thanks for your guidance...
I ran the project successfully, but some tables in my DB are empty, including category, category_link, headless bot, open_port, web component, and web component link! I know it is because of something in the code and scripts, but I don't know what I should do. Also, I think some .sh and .py files are missing from this clone, such as corpus.py. Please check the project's files and guide me to complete the run...
Thanks...
Hi @davisbra ,
If you cloned the project, you should have all the files you need to run it. I am not missing any files; the project ran perfectly for me. I think the maintainer updated the database schema to add more functionality but then decided to stop, so you may have the beginnings of features that were never finished. It's only a theory. Like I said before, my project works well without these tables, and I don't know whether they are useful or not. If you followed the README file that I sent you, you should be all good. How many onions do you have? How many of them are valid (alive/green)?
Hi @mrL3x ,
I checked the commits and saw some files (like the autocategorise folder and corpus.py inside it) that I didn't have in my clone! As I said, the project ran without them, but some tables are empty... There are about 30,000 domains, of which about 7,000 are alive... Are these results OK?
I have another question: do you have any idea about forum spidering? This crawler works well, but it seems that it cannot enter private forums that require login or registration!
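For what it's worth, the usual approach to spidering a private forum is to perform the login POST first and keep the session cookie for all later requests (in Scrapy this is typically done with `FormRequest.from_response` in the spider's first callback). Below is a minimal stdlib sketch of that session-based flow; the URL and form-field names are hypothetical placeholders, real forums differ (many also require a CSRF token scraped from the login page), and .onion hosts would additionally need a Tor SOCKS proxy in front of the opener.

```python
# Hedged sketch: log in to a (hypothetical) forum, then reuse the
# authenticated session for crawling. Field names and URLs are
# placeholders, not taken from any real forum.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

LOGIN_URL = "http://forum.example/login"  # hypothetical

def build_login_request(username, password):
    """Encode the credentials as a POST body for the login form."""
    data = urllib.parse.urlencode(
        {"username": username, "password": password}).encode()
    return urllib.request.Request(LOGIN_URL, data=data, method="POST")

def make_session_opener():
    """An opener with a cookie jar, so the login cookie survives
    and later page fetches appear 'logged in'."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

# Usage sketch (network calls omitted here):
#   opener = make_session_opener()
#   opener.open(build_login_request("user", "pass"))
#   html = opener.open("http://forum.example/private-board").read()
```

Forums that gate registration behind invites or CAPTCHAs cannot be entered this way at all; automated login also raises obvious legal and ethical questions for hidden services, so tread carefully.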