dirtyfilthy/freshonions-torscraper

a problem in running some scripts

Opened this issue · 5 comments

hi there... I have run some scripts of this project successfully, but there is a problem with a few of them.
By default I disabled Elasticsearch, then I pushed a .onion domain with push.sh and saw the results in the domain and page tables. But after that, when I run scraper-service.sh, I face a loop like below:
[screenshot from 2017-10-15 12-57-50]
[screenshot from 2017-10-15 12-59-27]

As you can see from this part of my terminal while running scraper-service.sh, Scrapy opens and closes again and again and cannot get out of this loop by itself...
The second problem is with Elasticsearch! When I enable it, none of the scripts run and I get this error:
[screenshot from 2017-10-15 13-48-50]
[screenshot from 2017-10-15 13-51-04]

Would you please guide me... and if you need some files, I will share them. Thanks...
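Before enabling Elasticsearch, it can help to check that a node is actually reachable at all. A minimal sketch, assuming Elasticsearch listens on the default `http://localhost:9200` (the host and port here are assumptions, not taken from the project's config):

```python
import json
import urllib.request
import urllib.error

# Assumed default Elasticsearch endpoint; adjust to your torscraper config.
ES_URL = "http://localhost:9200"

def elasticsearch_reachable(url: str = ES_URL, timeout: float = 5.0) -> bool:
    """Return True if an Elasticsearch node answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
            # A healthy node reports its version number in the root document.
            print("Elasticsearch", info.get("version", {}).get("number", "unknown"))
            return True
    except (urllib.error.URLError, OSError) as exc:
        print("Elasticsearch not reachable:", exc)
        return False

if __name__ == "__main__":
    elasticsearch_reachable()
```

If this prints "not reachable", the scripts will fail regardless of their own configuration, so it narrows down where the error comes from.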

I think you had a problem in your initial configuration. If you want to restart your project and try without Elasticsearch, you can follow the updated README: https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

hi @mrL3x,
thanks for your guidance...
I ran the project successfully, but some tables in my database are empty, including category, category_link, headless_bot, open_port, web_component and web_component_link! I know it is because of something in the code and scripts that I don't understand. Also, I think some .sh and .py files are missing from this clone, such as corpus.py. Please check the project's files and guide me to a complete run...
thanks...

Hi @davisbra ,
If you cloned the project you are supposed to have all the files you need to run it. No files are missing; the project ran perfectly for me. I think the maintainer updated the database schema to add more functionality but then decided to stop, so you may have the beginnings of features that were never finished. It's only a theory. Like I said before, my setup works well without these tables, and I don't know whether they are useful or not. If you followed the README file I sent you, you should be all good. How many onions do you have? How many of them are valid (alive/green)?

Hi @mrL3x ,
I checked the commits and saw some files (like the autocategorise folder and corpus.py inside it) which I didn't have in my clone! As I said, the project ran without them, but some tables are empty... There are about 30,000 domains, of which about 7,000 are alive... Are these results OK?
I have another question: do you have any idea about forum spidering? This crawler works well, but it seems it cannot enter private forums which require login or registration!

Hi @davisbra,

We created a channel about forum spidering: #19

I think the alive/down ratio is correct. I didn't see any other files that might be missing. Also, I didn't pay much attention to the empty tables; I didn't consider them the most important thing, but it could be a good point to check in the future.
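If anyone wants to follow up on the empty-table point, listing which tables hold no rows is straightforward. A sketch using sqlite3 purely for illustration (the torscraper itself targets MySQL, so treat this as the idea rather than a drop-in script):

```python
import sqlite3

def empty_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of tables that contain zero rows."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return [n for n in names
            if conn.execute(f"SELECT COUNT(*) FROM {n}").fetchone()[0] == 0]

if __name__ == "__main__":
    # Tiny in-memory demo: one populated table, one empty table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domain (id INTEGER)")
    conn.execute("CREATE TABLE category (id INTEGER)")
    conn.execute("INSERT INTO domain VALUES (1)")
    print(empty_tables(conn))  # only the 'category' table is empty
```

On MySQL the same idea works by querying `information_schema.tables` and counting rows per table.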