Unreachable code
delirious-lettuce opened this issue · 3 comments
Line 65 in b3af99c
To get inside of this if statement, link.get('href') must start with http, but then it can never start with #, tel:, or mailto:.
Since link.get('href') is used so frequently in this function, a better option would be to assign it to a variable, e.g. current_link = link.get('href'), and to use Python's str.startswith instead of slices:
>>> link = {'href': 'http://www.google.com'}
>>> current_link = link.get('href')
>>> current_link[:4] == 'http'
True
>>> current_link.startswith('http')
True
>>> current_link.startswith(('#', 'tel:', 'mailto:'))
False
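One detail worth keeping in mind with this approach: .get('href') returns None when there is no href attribute, so the variable should be checked before calling startswith on it. A minimal sketch, using plain dicts as stand-ins for the parsed tags; the sample data and the http_links helper are made up for illustration and are not part of the script:

```python
# Plain dicts stand in for parsed <a> tags here; this is illustrative data only.
links = [
    {'href': 'http://www.google.com'},
    {'href': '#top'},
    {'href': 'tel:555-0100'},
    {},  # an anchor with no href attribute
]


def http_links(links):
    """Return the href values that start with 'http', skipping missing ones."""
    found = []
    for link in links:
        current_link = link.get('href')  # look the value up once per link
        if current_link is None:
            continue  # nothing to inspect; slicing or startswith would fail on None
        if current_link.startswith('http'):
            found.append(current_link)
    return found


print(http_links(links))
```

Only the first sample entry passes the http check, which is the same reason the #, tel: and mailto: branches nested under that check can never run.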
I commented out these two seemingly unreachable sections to highlight them.
if link.get('href')[:4] == "http":
    # SAME ORIGIN
    if domain in link.get('href'):
        # IF URL IS DYNAMIC
        if "?" in link.get('href'):
            print OKRED + "[+] Dynamic URL found! " + link.get('href') + " " + RESET
            urls.write(link.get('href') + "\n")
            urls_saved.write(link.get('href') + "\n")
            dynamic_saved.write(link.get('href') + "\n")
        # # DOM BASED LINK
        # elif link.get('href')[:1] == "#":
        #     print OKBLUE + "[i] DOM based link found! " + link.get('href') + " " + RESET
        # # TELEPHONE
        # elif link.get('href')[:4] == "tel:":
        #     s = link.get('href')
        #     phonenum = s.split(':')[1]
        #     print OKORANGE + "[i] Telephone # found! " + phonenum + " " + RESET
        #     phones_saved.write(phonenum + "\n")
        # # EMAIL
        # elif link.get('href')[:7] == "mailto:":
        #     s = link.get('href')
        #     email = s.split(':')[1]
        #     print OKORANGE + "[i] Email found! " + email + " " + RESET
        #     emails_saved.write(email + "\n")
        # FULL URI OF SAME ORIGIN
        else:
            print link.get('href')
            urls.write(link.get('href') + "\n")
            urls_saved.write(link.get('href') + "\n")
    # EXTERNAL LINK FOUND
    else:
        # IF URL IS DYNAMIC
        if "?" in link.get('href'):
            print COLOR2 + "[+] External Dynamic URL found! " + link.get('href') + " " + RESET
        # # DOM BASED LINK
        # elif link.get('href')[:1] == "#":
        #     print COLOR2 + "[i] External DOM based link found! " + link.get('href') + " " + RESET
        # # TELEPHONE
        # elif link.get('href')[:4] == "tel:":
        #     s = link.get('href')
        #     phonenum = s.split(':')[1]
        #     print OKORANGE + "[i] External Telephone # found! " + phonenum + " " + RESET
        # # EMAIL
        # elif link.get('href')[:7] == "mailto:":
        #     s = link.get('href')
        #     email = s.split(':')[1]
        #     print OKORANGE + "[i] External Email found! " + email + " " + RESET
        # FULL URI OF EXTERNAL ORIGIN
        else:
            print COLOR2 + "[i] External link found! " + link.get('href') + " " + RESET
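If the #, tel: and mailto: cases are worth keeping rather than deleting, one option would be to test for them before the http check so that they become reachable. This is only a rough sketch of that idea: the classify_link name, the string labels and the 'other' fallback are hypothetical, and it returns a label where the real function prints and writes to files.

```python
def classify_link(href, domain):
    """Return a (kind, value) pair describing a single href string.

    The labels are illustrative; the original code prints and writes to
    files at each of these points instead of returning anything.
    """
    # Non-http schemes are tested first so these branches can actually run.
    if href.startswith('#'):
        return ('dom', href)
    if href.startswith('tel:'):
        # maxsplit=1 keeps everything after the first colon
        return ('phone', href.split(':', 1)[1])
    if href.startswith('mailto:'):
        return ('email', href.split(':', 1)[1])
    if href.startswith('http'):
        # Same substring test as the original "domain in href" check.
        origin = 'same-origin' if domain in href else 'external'
        if '?' in href:
            return (origin + '-dynamic', href)
        return (origin, href)
    # Relative paths and other schemes fall through to here.
    return ('other', href)


for sample in ('#top', 'tel:555-0100', 'http://example.com/?q=1', 'http://other.org/'):
    print(classify_link(sample, 'example.com'))
```

Because this version has no outer http condition wrapping the other tests, each branch is reachable for some input.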
Thanks for the heads up. I removed the affected lines from the master branch, so should be good now.
@1N3, no problem. I just wasn't sure which way you wanted to go with it (just deleting those sections or reworking them in some other way).
I've also been experimenting with using a scrapy spider to get the links instead of how it is currently being done.
@delirious-lettuce, cool. If there's a better way to do it, I'd be interested to see it.