s-rah/onionscan

Is there any alternative for 'snapshot'?

powerfulTrouser opened this issue · 1 comments

I'm a student, and I'm trying to follow this tutorial

http://www.automatingosint.com/blog/2016/09/dark-web-osint-part-four-using-scikit-learn-to-find-hidden-service-clones/

to use machine learning to analyze the dark web.
However, I found that 'snapshot' is no longer available.
Then I found an issue saying that this functionality had been moved to dat_0.
My dat_0 file is about 10 GB.
I tried to parse it with Python and Kaitai Struct, but failed.
onions.py.txt
parsedat.py.txt
Is there any way to at least reproduce the analysis from the website?
(For example, by using an old version of onionscan, or a tutorial on how to achieve the same goal with the new onionscan.)

Thanks!

In the end, I used Python to split dat_0 into a large number of JSON files:

```python
# coding: utf-8
import json

# Marker that starts every crawl record in dat_0; used to split the file.
KNIFE = '{"Page":{"Status":'
OUT_DIR = '/Home/parse dat-1/'


def clean_json(text):
    """Return text as a valid JSON string, or None if it cannot be repaired."""
    try:
        json.loads(text)
        return text
    except ValueError:
        pass
    # Retry after dropping everything past the second-to-last '}'.
    trimmed = text.rsplit('}', 2)[0] + '}'
    try:
        json.loads(trimmed)
    except ValueError as e:
        print(e)
        print(text)
        return None
    return trimmed


fallback_count = 0
# errors='ignore' drops bytes that are not valid UTF-8 (dat_0 is not pure text).
with open('/Home/dat_0.json', encoding='utf-8', errors='ignore') as src:
    for line in src:
        for frag in line.split(KNIFE):
            if not frag:
                continue
            # Cut the trailing junk after the last '}' and restore the marker.
            frag = KNIFE + frag.rsplit('}', 1)[0] + '}'
            cleaned = clean_json(frag)
            if cleaned is None:
                continue
            record = json.loads(cleaned)
            # Skip pages that returned 403 or 404.
            if record['Page']['Status'] in (403, 404):
                continue
            print('next')
            # Strip the leading 'http://' and the last character of the URL.
            name = record['URL'][7:-1].replace('/', '_slash_')
            try:
                out = open(OUT_DIR + name + '.json', 'w', encoding='utf-8')
            except IOError as e:
                # Fall back to a numbered name if the URL is not a usable file name.
                print(e)
                out = open(OUT_DIR + 'problem' + str(fallback_count) + '.json',
                           'w', encoding='utf-8')
                fallback_count += 1
            with out:
                out.write(cleaned)
```
It does not write a JSON file for pages whose status is 403 or 404.
I use '{"Page":{"Status":' to split the file, and I wonder whether there is a better string to cut on.
This is not a beautiful solution, but it works.
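
On the question of a better cut string: one option that might avoid the manual '}' trimming is `json.JSONDecoder.raw_decode` from the Python standard library, which parses a single JSON value starting at a given index and returns the position where it ended. The following is only a minimal sketch. It assumes the records are plain JSON objects embedded in the file and that each one begins with the same '{"Page":{"Status":' marker. The names `iter_records` and `sample` are hypothetical, and the sample data is made up for illustration, not taken from a real dat_0.

```python
import json

# Same marker as in the script above; assumed to start every record.
MARKER = '{"Page":{"Status":'


def iter_records(text, marker=MARKER):
    """Yield every JSON object in text that starts at an occurrence of marker."""
    decoder = json.JSONDecoder()
    pos = text.find(marker)
    while pos != -1:
        try:
            # raw_decode parses one JSON value and reports where it ended,
            # so trailing non-JSON data needs no manual trimming.
            obj, end = decoder.raw_decode(text, pos)
        except ValueError:
            # No complete object starts here; move past this marker.
            end = pos + len(marker)
        else:
            yield obj
        pos = text.find(marker, end)


# Made-up input: two records surrounded by non-JSON junk.
sample = ('\x00junk{"Page":{"Status":200},"URL":"http://a.onion/"}\x01\x02'
          '{"Page":{"Status":404},"URL":"http://b.onion/"}tail')
records = [r for r in iter_records(sample)
           if r['Page']['Status'] not in (403, 404)]
print(records)
```

Note that this sketch works on a string that is already in memory. For a file of about 10 GB it would probably have to be applied to chunks or lines, with some care for records that cross a chunk boundary.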