rajatomar788/pywebcopy

Login before save

tadam98s opened this issue · 14 comments

Hi,

I have a website running in Docker that is accessed locally:
http://localhost:9000/dashboard?id=face-animation

I need to log in with two fields: login and password.

config = get_config('http://localhost:9000/dashboard?id=face-animation')
wp = config.create_page()
wp.get(config['project_url'])
form = wp.get_forms()[0]
form.inputs['login'].value = 'my_user' # etc
form.inputs['password'].value = 'my_password' # etc
wp.submit_form(form)
wp.get_links()

When I run it I get on wp.get(config['project_url']):

Exception has occurred: KeyError
Exception has occurred: UrlDisallowed
Access to ['http://localhost:9000/dashboard?id=face-animation'] disallowed by the Session rules.
  File "D:\download\tests\scripts\clone_test.py", line 10, in <module>
    wp.get(config['project_url'])
pywebcopy.session.UrlDisallowed: Access to ['http://localhost:9000/dashboard?id=face-animation'] disallowed by the Session rules.

How do I write the code to save this website?
When I log in to the site it creates two cookies:
JWT-SESSION
XSRF-TOKEN
which I need to carry over into pywebcopy.

You need to pass bypass_robots=True to the get_config function. The error states that your local website has a robots.txt rule that prohibits bot or script access. It can be bypassed with this argument.

config = get_config(url, bypass_robots=True)
wp = config.create_page()
wp.get(config['project_url'])
form = wp.get_forms()[0]

Exception has occurred: IndexError
list index out of range
  File "D:\download\test\scripts\clone_test.py", line 21, in <module>
form = wp.get_forms()[0]
IndexError: list index out of range

You need to verify whether there are any forms before applying the [0] index. Common sense, yaar. First check the url property of the wp object to see whether there were any redirects. Then check the available forms using the get_forms method.
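
The check described above could be sketched as a small guard. This is only a minimal sketch, not part of pywebcopy: `first_form_or_none` is a hypothetical helper name, and it assumes only what the thread mentions, namely a page object with a `url` attribute and a `get_forms()` method that returns a list.

```python
def first_form_or_none(page, expected_url):
    """Return the first form on `page`, or None if the page has no forms.

    `page` is assumed to expose `.url` and `.get_forms()`, as the `wp`
    object in this thread does. The helper name itself is hypothetical.
    """
    # A differing URL suggests the request was redirected somewhere else.
    if page.url != expected_url:
        print("Redirected: expected %s, got %s" % (expected_url, page.url))

    forms = page.get_forms()
    if not forms:
        # Nothing to index into, so indexing [0] would raise IndexError.
        print("No forms found at %s" % page.url)
        return None
    return forms[0]
```

It would be called as `form = first_form_or_none(wp, config['project_url'])`, followed by a check for `None` before filling in any inputs.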

When I open the site manually I get:
[image]

If I log in manually, can I copy the cookies and pass them to pywebcopy?
Apparently there is a JavaScript file that shows the login/password form. Is there a way to provide it with the answers programmatically, or to continue after a manual login?

It is only showing the spinner, and the JavaScript file /js/outBIMYN2XL.js that opens the login/password form does not appear to be executed.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8" charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <link rel="apple-touch-icon" href="/apple-touch-icon.png">
    <link rel="apple-touch-icon" sizes="57x57" href="/apple-touch-icon-57x57.png">
    <link rel="apple-touch-icon" sizes="60x60" href="/apple-touch-icon-60x60.png">
    <link rel="apple-touch-icon" sizes="72x72" href="/apple-touch-icon-72x72.png">
    <link rel="apple-touch-icon" sizes="76x76" href="/apple-touch-icon-76x76.png">
    <link rel="apple-touch-icon" sizes="114x114" href="/apple-touch-icon-114x114.png">
    <link rel="apple-touch-icon" sizes="120x120" href="/apple-touch-icon-120x120.png">
    <link rel="apple-touch-icon" sizes="144x144" href="/apple-touch-icon-144x144.png">
    <link rel="apple-touch-icon" sizes="152x152" href="/apple-touch-icon-152x152.png">
    <link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon-180x180.png">
    <link rel="icon" type="image/x-icon" href="/favicon.ico">
    <meta name="application-name" content="test" />
    <meta name="msapplication-TileColor" content="#FFFFFF" />
    <meta name="msapplication-TileImage" content="/mstile-512x512.png" />
    <title>test</title>

    <link rel="stylesheet" href="/js/outWHCP76XN.css" />
</head>

<body>
    <div id="content" data-base-url="" data-server-status="UP" data-instance="test" data-official="true">
        <div class="global-loading">
            <i class="spinner global-loading-spinner"></i>
            <span aria-live="polite" class="global-loading-text">Loading...</span>
        </div>
    </div>

    <script type="module" src="/js/outBIMYN2XL.js"></script>
</body>

</html>
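
One way to confirm what the HTML above suggests (no login form in the static markup, only a script that builds the page) is to count the tags with the standard library. This is a rough sketch for diagnosis only; `ShellPageProbe` and `probe` are hypothetical names, and the sample string is a shortened copy of the page above.

```python
from html.parser import HTMLParser


class ShellPageProbe(HTMLParser):
    """Count <form> tags and collect <script src=...> URLs in static HTML."""

    def __init__(self):
        super().__init__()
        self.form_count = 0
        self.script_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_count += 1
        elif tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_sources.append(src)


def probe(html_text):
    """Return (number of forms, list of external script URLs)."""
    parser = ShellPageProbe()
    parser.feed(html_text)
    parser.close()
    return parser.form_count, parser.script_sources


# Shortened copy of the page shown above.
sample = """
<body>
    <div id="content"><div class="global-loading">Loading...</div></div>
    <script type="module" src="/js/outBIMYN2XL.js"></script>
</body>
"""
forms, scripts = probe(sample)
print(forms, scripts)  # 0 ['/js/outBIMYN2XL.js']
```

Zero forms together with an external script is a strong hint that the form is created in the browser, which a client that does not run JavaScript will never see.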

Just log in with your browser and then copy the cookies into the pywebcopy session headers.

You can access the session using the .session attribute of the wp object that you created. Then use .headers attribute of the session to set the headers including cookies.
The session object is a requests library session. You can read up online how to manage a requests.Session object.
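
As a rough illustration of the header approach described above, the copied cookies can be joined into a single `Cookie` header value. This is a sketch under assumptions: `build_cookie_header` and `attach_cookies` are hypothetical helpers, the cookie values are placeholders, and the session is assumed to expose a dict-like `headers` attribute as a `requests.Session` does.

```python
def build_cookie_header(cookies):
    """Join name/value pairs into a Cookie request-header value."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def attach_cookies(session, cookies):
    """Set a fixed Cookie header on a requests.Session-like object."""
    session.headers["Cookie"] = build_cookie_header(cookies)
    return session


# Placeholder values; copy the real ones from the browser's developer tools.
browser_cookies = {
    "JWT-SESSION": "some value",
    "XSRF-TOKEN": "some value",
}
print(build_cookie_header(browser_cookies))
```

With the page object from this thread, that would be `attach_cookies(wp.session, browser_cookies)` before calling `wp.get(...)`. Since the session is a `requests.Session`, `wp.session.cookies.update(browser_cookies)` is an alternative that stores the values in the cookie jar instead of a fixed header.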

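import requests

URLmain = "http://localhost:9000/"
session = requests.Session()
my_cookies = {'JWT-SESSION': 'some value',
              'XSRF-TOKEN': 'some value'}
# Store the cookies on the session so every later session request sends them
session.cookies.update(my_cookies)
r = session.post(URLmain)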

This sets the cookies correctly. But the subsequent save_website call is not tied to this session, as it takes the url as a parameter, not the session.
How do I connect this session to the save_website call that follows?

Use the wp object style approach as you did at the start. Use the wp.get method to open pages. Then the session remains the same for all the requests.

I have built a url that sets cookies.
url_cookies = f"{url}/cookies/set?JWT-SESSION={my_cookies['JWT-SESSION']}&XSRF-TOKEN={my_cookies['XSRF-TOKEN']}"
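
A query string with more than one parameter uses `?` once and `&` between the pairs, and the values should be percent-encoded. A minimal sketch with the standard library follows; `build_cookie_set_url` is a hypothetical helper, the cookie values are made up, and the `/cookies/set` path is taken from the post as-is. It only has an effect if the server actually exposes such a route (httpbin does; whether this site does is not known).

```python
from urllib.parse import urlencode


def build_cookie_set_url(base_url, cookies):
    """Build '<base>/cookies/set?name=value&...' with encoded values."""
    return "%s/cookies/set?%s" % (base_url.rstrip("/"), urlencode(cookies))


# Made-up values for illustration only.
my_cookies = {"JWT-SESSION": "abc.def", "XSRF-TOKEN": "x y"}
url_cookies = build_cookie_set_url("http://localhost:9000/", my_cookies)
print(url_cookies)
```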

Then I used get_config to start a session
config = get_config(url, project_folder, project_name=projectName, bypass_robots=True, debug=True, delay=None, threaded=False)

Then I was not sure how to use url_cookies to set the cookies. I tried:

crawler = config.create_crawler()
crawler.get(url_cookies)
crawler.get(url)
crawler.save_complete(pop=True)

But this did not set the cookies.
I am not sure how to use wp here: if I already have the cookies, I am not sure I need to open a form. Please advise.

Anyway, copying the cookies may not work, as they are JWTs and could be tied to some seed in each instance.
That may bring me back to the original question: how to log in.

You may have to proceed by trial and error. pywebcopy has no JavaScript support, so each JavaScript-based site requires a different approach to get around that. At the moment I can only point you to the requests.Session usage and documentation, because cookies and auth are handled by that quite capable library.