DanMcInerney/xsscrapy

Can't pass a cookie instead of login data?

kapsolas opened this issue · 9 comments

I have reviewed the code in the loginform.py file and see that the script is looking for a submit button in order to submit the login information.

In my application, there is no input of type submit; the "Sign in" control is implemented differently.

Because of this, the scanner cannot sign in and redirect into the application. Can we have an enhancement so the scanner accepts an authenticated cookie? That way you don't have to do the login at all.

I'm not very familiar with the scrapy library, so I couldn't tell where to pass this cookie value. FormRequest didn't appear to have any parameter for accepting a cookie.

Thanks!

That's a good idea. I could've sworn I looked into doing that already and hit a stumbling block with the way scrapy handles cookies, but once I'm done with my new project I'll look into adding this.


Great! I'll poke around a bit too with the scrapy library. Maybe the Request object can take the cookie directly in the start_requests(self) method.
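
For what it's worth, scrapy.Request does accept a cookies argument, so a minimal sketch of that idea might look like the following (the cookie_str argument name and its parsing are assumptions for illustration, not xsscrapy code):

import scrapy

class XSSSpider(scrapy.Spider):
    name = 'xsscrapy'

    def __init__(self, url=None, cookie_str=None, *args, **kwargs):
        super(XSSSpider, self).__init__(*args, **kwargs)
        self.start_urls = [url]
        # Turn "k1=v1; k2=v2" into the dict form that Request expects
        self.cookies = (dict(pair.split('=', 1) for pair in cookie_str.split('; '))
                        if cookie_str else {})

    def start_requests(self):
        for url in self.start_urls:
            # cookies= attaches the parsed cookies to the initial request
            yield scrapy.Request(url, cookies=self.cookies)

    def parse(self, response):
        # the spider's existing crawling/XSS logic would go here
        pass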

To anyone else looking to do this -

Looked into this a little bit. According to the docs these are the default headers sent with requests:

{
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}

They can be changed in the spider via the DEFAULT_REQUEST_HEADERS setting. So in xsscrapy.py, when accepting args, it seems that if you accept a Cookie header value you could build out the default headers plus the Cookie value by doing something like:

headers = """{
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'Cookie': '%s',
}""" % user_provided_cookie

Then, in the execute call, set DEFAULT_REQUEST_HEADERS to this new value by doing something like this:

...
try:
    execute(['scrapy', 'crawl', 'xsscrapy', 
             '-a', 'url=%s' % args.url, '-a', 'user=%s' % args.login, '-a', 
             'pw=%s' % args.password, '-a', 'basic=%s' % args.basic, 
             '-s', 'CONCURRENT_REQUESTS=%s' % args.connections,
             '-s', 'DOWNLOAD_DELAY=%s' % rate,
             '-s', 'DEFAULT_REQUEST_HEADERS=%s' % headers])
...

Of course, modifying request headers also opens other doors, like including X-Forwarded-For and the like while crawling. It might be a better idea to let a user submit a string containing whatever additional headers they want, instead of just a Cookie.
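
As a rough illustration of that idea (the function name and the newline-separated "Name: value" input format are assumptions, not anything in xsscrapy):

def build_headers(extra=None):
    """Merge user-supplied 'Name: value' pairs (newline-separated here,
    purely as an assumed format) into scrapy's default request headers."""
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
    }
    if extra:
        for line in extra.splitlines():
            name, _, value = line.partition(':')
            if name.strip() and value.strip():
                headers[name.strip()] = value.strip()
    return headers

# e.g. build_headers('Cookie: sessionid=abc123\nX-Forwarded-For: 127.0.0.1')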

I don't have time at the moment to put in a proper pull request, but I will be testing this out and hopefully submitting something soon. Just wanted the info out there in case it helps anyone move forward.

This is great! I'm absolutely swamped for a while, so without a PR I won't get a chance to implement this for at least a month, but I wanted to implement cookie XSS scanning too, and this should be an easy way to do that as well.


I tried the suggestion above, but the problem is passing a dictionary as a command-line argument; as far as I can tell, that isn't possible. It seems the recommended way to do this is to enable CookiesMiddleware:

http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.cookies

So I'll try that next.
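
For context, CookiesMiddleware is enabled by default (COOKIES_ENABLED = True), so any cookies attached to the first request get persisted across the rest of the crawl automatically. Passing the raw cookie string through -a would also sidestep the dictionary-on-the-command-line problem, since spider arguments are plain strings. A hedged sketch (the cookie argument name is hypothetical):

from scrapy.cmdline import execute

# args comes from argparse, as in the snippet above
execute(['scrapy', 'crawl', 'xsscrapy',
         '-a', 'url=%s' % args.url,
         '-a', 'cookie=%s' % args.cookie,  # e.g. "sessionid=abc123"
         '-s', 'COOKIES_ENABLED=True'])    # already the default; shown for clarity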

Sorry, I meant to follow up: I tried as well and failed. I couldn't figure out a way around it either. Looking forward to what you come up with! I may have time next week and will post back if I figure anything out.

When will you be able to complete the cookie support?

@kapsolas and @DanMcInerney nailed it. Using the built-in cookie middleware seems to be working. As @kapsolas suggested, adding the cookies to the requests in start_requests seems like a solid way to do it.

Pull request submitted: #27

Note: the implementation in the PR can accept cookies plus login data.
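
Roughly, combining the two might look something like this sketch (the login URL and form field names are hypothetical, not the PR's actual code):

from scrapy.http import FormRequest

# inside the spider: FormRequest inherits the cookies argument from Request,
# so a login POST can carry pre-set cookies at the same time
def make_login_request(self):
    return FormRequest('http://example.com/login',
                       formdata={'user': self.user, 'pw': self.pw},
                       cookies=self.cookies,
                       callback=self.after_login)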

binarycanary fixed this issue.