the-convocation/twitter-scraper

scraper.login gives: Invalid ATT for the flow

I am trying to use the Twitter search, for which I need to log in.

I am using the following code to log in, but it gives me an error:

"use client";

import { Scraper, Tweet } from "@the-convocation/twitter-scraper";
import { useEffect, useMemo, useState } from "react";

export default function Home() {
  const scraper = useMemo(
    () =>
      new Scraper({
        transform: {
          request(input: RequestInfo | URL, init?: RequestInit) {
            if (input instanceof URL) {
              const proxy =
                "https://corsproxy.io/?" +
                encodeURIComponent(input.toString());
              return [proxy, init];
            } else if (typeof input === "string") {
              const proxy =
                "https://corsproxy.io/?" + encodeURIComponent(input);
              return [proxy, init];
            } else {
              throw new Error("Unexpected request input type");
            }
          },
        },
      }),
    [],
  );
  const [tweet, setTweet] = useState<Tweet | null>(null);

  useEffect(() => {
    async function getTweet() {
      await scraper.login("user", "pass");

      if (await scraper.isLoggedIn()) {
        console.log('Logged in successfully');
        setTimeout(async () => {
          const latestTweet = await scraper.getLatestTweet("saadqbal");
          if (latestTweet) {
            setTweet(latestTweet);
          }
        }, 1000);
      } else {
        console.log('Failed to login');
      }
      // const latestTweet = await scraper.getLatestTweet("saadqbal");
      // if (latestTweet) {
      //   setTweet(latestTweet);
      // }
    }

    getTweet();
  }, [scraper]);

  return (
    <main className="flex min-h-screen flex-col items-center justify-between p-24">
          {tweet?.text}
    </main>
  );
}

I get this error in the browser console:

auth.js:23     POST https://corsproxy.io/?https%3A%2F%2Fapi.twitter.com%2F1.1%2Fonboarding%2Ftask.json 400


auth-user.js:161 Uncaught (in promise) Error: {"errors":[{"code":366,"message":"Invalid ATT for the flow"}]}
    at TwitterUserAuth.executeFlowTask (webpack-internal:///(app-client)/./node_modules/@the-convocation/twitter-scraper/dist/auth-user.js:161:44)
    at async handleFlowTokenResult (webpack-internal:///(app-client)/./node_modules/@the-convocation/twitter-scraper/dist/auth-user.js:41:28)
    at async TwitterUserAuth.login (webpack-internal:///(app-client)/./node_modules/@the-convocation/twitter-scraper/dist/auth-user.js:55:9)
    at async Scraper.login (webpack-internal:///(app-client)/./node_modules/@the-convocation/twitter-scraper/dist/scraper.js:162:9)
    at async getTweet (webpack-internal:///(app-client)/./src/app/twitter/page.tsx:40:13)

Could you please let me know what I am doing wrong?

Owen3H commented

To clarify, you aren't using "user" and "pass" as the real inputs, but something different, correct?

Also, I wouldn't advise calling .login() every time you get a tweet, or it could cause Twitter to flag logins as suspicious and require an email/phone number to be confirmed.

Not sure it will help, but it might be worth trying to pass an email in case that's what is happening.

await scraper.login("user", "pass", "email");
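
One way to avoid calling .login() on every fetch is to share a single login attempt per scraper instance. The following is only a minimal sketch: `createLoginOnce` and the `LoginCapable` interface are hypothetical names introduced here, and the only library calls assumed are `login()` and `isLoggedIn()` as they appear in this thread.

```typescript
// Minimal shape of the scraper used here; only login() and isLoggedIn()
// (both seen in this thread) are assumed.
interface LoginCapable {
  login(username: string, password: string, email?: string): Promise<void>;
  isLoggedIn(): Promise<boolean>;
}

// Hypothetical helper: returns a function that logs in at most once per
// scraper instance and shares one in-flight attempt among concurrent callers.
function createLoginOnce(
  scraper: LoginCapable,
  username: string,
  password: string,
  email?: string,
): () => Promise<boolean> {
  let pending: Promise<boolean> | null = null;

  return () => {
    if (pending !== null) return pending;

    const attempt = (async () => {
      // Skip the login flow entirely if a session already exists.
      if (await scraper.isLoggedIn()) return true;
      await scraper.login(username, password, email);
      return scraper.isLoggedIn();
    })();

    pending = attempt;
    // On failure, forget the attempt so a later call can retry.
    attempt.catch(() => {
      pending = null;
    });
    return attempt;
  };
}
```

In a React component, the returned function could be created once (for example alongside the memoized scraper) and awaited before each fetch, so that re-renders and repeated effects do not trigger new login flows.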

I am using the actual username and password, and I have made sure that they are valid. I have also tried it with an email, but it still gives the same error.

If you have 2FA configured on the account, there are additional auth steps that haven't been implemented yet.

Nope, 2FA is not configured.

Also, I wouldn't advise calling .login() every time you get a tweet or it could cause Twitter to flag logins as suspicious and require an email/phone number to be confirmed.

fwiw the tests do this, and every single login is marked as suspicious in the account info. The account was never blocked from logging in over that, though — that only happened once I tried to log in from another country (and even then I just had to reset the password via the confirmed phone number).

Actually, I never tested logging in using a browser environment, there might be something specific to the CORS proxy etc. causing auth errors there. I'll see if I can log in to test that later tonight.

Also having this issue in a React environment. It works fine in regular Node.js with the same function call, and it also works there when setting up the scraper with a proxy, i.e., this works:

const {Scraper} = require("@the-convocation/twitter-scraper");

const scraper = new Scraper({
    transform: {
      request(input, init) {
        // The arguments here are the same as the parameters to fetch(), and
        // are kept as-is for flexibility of both the library and applications.
        if (input instanceof URL) {
          const proxy =
            "https://corsproxy.io/?" +
            encodeURIComponent(input.toString());
          return [proxy, init];
        } else if (typeof input === "string") {
          const proxy =
            "https://corsproxy.io/?" + encodeURIComponent(input);
          return [proxy, init];
        } else {
          // Omitting handling for example
          throw new Error("Unexpected request input type");
        }
      },
    },
  });

async function login(username, password)
{
    await scraper.login(username, password);
    console.log("Logged in: " + await scraper.isLoggedIn());
}

login("username", "password")

But this function called on the frontend side from React will fail with the same login data.

export async function test_login()
{
  const scraper = new Scraper({
    transform: {
      request(input, init) {
        // The arguments here are the same as the parameters to fetch(), and
        // are kept as-is for flexibility of both the library and applications.
        if (input instanceof URL) {
          const proxy =
            "https://corsproxy.io/?" +
            encodeURIComponent(input.toString());
          return [proxy, init];
        } else if (typeof input === "string") {
          const proxy =
            "https://corsproxy.io/?" + encodeURIComponent(input);
          return [proxy, init];
        } else {
          // Omitting handling for example
          throw new Error("Unexpected request input type");
        }
      },
    },
  });

    await scraper.login("username", "password");
    console.log("Logged in: " + await scraper.isLoggedIn());
}

POST https://corsproxy.io/?https%3A%2F%2Fapi.twitter.com%2F1.1%2Fonboarding%2Ftask.json 400

Uncaught (in promise) Error: {"errors":[{"code":366,"message":"Invalid ATT for the flow"}]}
    at TwitterUserAuth.executeFlowTask (auth-user.ts:241:38)
    at async handleFlowTokenResult (auth-user.ts:94:22)
    at async TwitterUserAuth.login (auth-user.ts:109:5)
    at async Scraper.login (scraper.ts:236:5)
    at async test_login (twitter_utils.ts:68:5)

Are there any errors besides that in the browser console? I'm wondering if it's actually a side-effect of the CORS proxy.

No other error messages in the console. I don't really know enough about the auth flow to tell what exactly is going on, but it looks like a "flow_token" is successfully obtained (the server responds with 200 and a "flow_token" to the request with {flow_name: "login",...}). The server then responds with 400 in the subsequent step, when the flow_token is sent in the payload, with the response {"errors":[{"code":366,"message":"Invalid ATT for the flow"}]}

I don't think I can debug this until I get back from my vacation in ~4 days (logging in from another country always flags my test account), but my guess is that the ATT is somehow derived from the non-cookie request headers, which are being set automatically by the browser in a way that differs from a pure Node.js client. The best way to debug this would be to progressively add headers present in the browser (but not in Node) to the Node request until something breaks, I think.

If anyone wants to give that a go, it should be simple enough to do without cloning the codebase, using the new request interceptor.
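
A minimal sketch of that experiment, using the same `transform.request` hook shown earlier in this thread. `withExtraHeaders` is a hypothetical helper name, the header values are placeholders to be copied from the browser's network tab, and the sketch assumes that `init.headers` is something the global `Headers` constructor accepts.

```typescript
// Same shape as the request transform used in the snippets above:
// it receives fetch()-style arguments and returns them, possibly modified.
type RequestTransform = (
  input: RequestInfo | URL,
  init?: RequestInit,
) => [RequestInfo | URL, RequestInit | undefined];

// Hypothetical helper: builds a transform that adds (or overrides) headers
// on every outgoing request while leaving the URL and other options alone.
function withExtraHeaders(extra: Record<string, string>): RequestTransform {
  return (input, init) => {
    const headers = new Headers(init?.headers);
    for (const [name, value] of Object.entries(extra)) {
      headers.set(name, value);
    }
    return [input, { ...init, headers }];
  };
}

// Options object to pass to `new Scraper(...)` in Node.js. Add candidate
// browser headers one at a time until the login starts failing.
const scraperOptions = {
  transform: {
    request: withExtraHeaders({
      "Accept-Language": "en-US,en;q=0.9", // placeholder value
      "Sec-Fetch-Mode": "cors", // placeholder value
    }),
  },
};
```

Starting from an empty header set and adding one header per run keeps the experiment bisectable: the first header that turns a working Node.js login into a 400 would be the likely culprit.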

So from what I can tell, these are the headers being set by Node.js (defined in auth-user.js, executeFlowTask(data)):

async executeFlowTask(data) {
        const onboardingTaskUrl = 'https://api.twitter.com/1.1/onboarding/task.json';
        const token = this.guestToken;
        if (token == null) {
            throw new Error('Authentication token is null or undefined.');
        }
        const headers = new headers_polyfill_1.Headers({
            authorization: `Bearer ${this.bearerToken}`,
            cookie: await this.jar.getCookieString(onboardingTaskUrl),
            'content-type': 'application/json',
            'User-Agent': 'Mozilla/5.0 (Linux; Android 11; Nokia G20) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.88 Mobile Safari/537.36',
            'x-guest-token': token,
            'x-twitter-auth-type': 'OAuth2Client',
            'x-twitter-active-user': 'yes',
            'x-twitter-client-language': 'en',
        });
    ...
}

and this is the call that fails (also in auth-user.js):

            .then((ft) => executeFlowSubtask({
            flow_token: ft,
            subtask_inputs: [
                {
                    subtask_id: 'LoginJsInstrumentationSubtask',
                    js_instrumentation: {
                        response: '{}',
                        link: 'next_link',
                    },
                },
            ],
        }))

The full list of additional non-cookie headers in my browser is:
:authority, :method, :path, :scheme (which I believe are stream headers that are probably being added in regular node as well?)
Accept, Accept-Encoding, Accept-Language, Content-Length, Origin, Referer, Sec-Ch-Ua, Sec-Ch-Ua-Mobile, Sec-Ch-Ua-Platform, Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site

With Requestly, I could remove Origin, Referer, and all the Accept* and Sec-* headers, which leaves the request as below.

[image: the request after removing those headers]

But that still leads to the 400 error at the LoginJsInstrumentationSubtask step.

Actually, I just saw as well that the ATT is sent as a response cookie in the previous request, i.e. this one

        await executeFlowSubtask({
            flow_name: 'login',
            input_flow_data: {
                flow_context: {
                    debug_overrides: {},
                    start_location: {
                        location: 'splash_screen',
                    },
                },
            },
        });

And it is set for the domain .twitter.com. Could that be an issue when the proxy is being used, since the domain does not match? I really don't know enough about these kinds of flows to say for sure, but I can't see the ATT being sent in the subsequent request in any part of the network tab in my browser.

Edit: Yeah, there is no cookie header being sent in any of the requests being made. Neither the guest_id nor the ATT is passed on.
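
To illustrate the domain mismatch, here is a simplified sketch of cookie domain matching in the spirit of RFC 6265. `domainMatches` is a hypothetical helper written for this example, not something from the library, and it ignores edge cases such as IP addresses and public suffixes.

```typescript
// Simplified domain-match check: a cookie scoped to `cookieDomain` is only
// sent to hosts that equal that domain or are subdomains of it.
// Assumption: hostnames only (no IP addresses, no public-suffix handling).
function domainMatches(requestHost: string, cookieDomain: string): boolean {
  const host = requestHost.toLowerCase();
  // A leading dot in the Domain attribute is ignored when matching.
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  return host === domain || host.endsWith("." + domain);
}

// Direct request to Twitter: the cookie applies.
console.log(domainMatches("api.twitter.com", ".twitter.com")); // true

// Proxied request: the browser is talking to corsproxy.io, so a cookie
// scoped to .twitter.com does not apply.
console.log(domainMatches("corsproxy.io", ".twitter.com")); // false
```

Under this rule, a cookie scoped to .twitter.com would never be attached to a request whose host is corsproxy.io, which is consistent with the missing cookie header described above.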

That's probably the issue, the cookie jar implementation needs to be using the proxied host, and that can probably also be ignored altogether in the browser in favor of the regular cookie manager.

I was looking deeper into this and found some new issues - most response headers seem to not be available in the browser at all as a security mechanism. As such, the cookies can't be manually stored by the library, which means it's not possible to do things like change the cookie domain dynamically. Even though the library has its own cookie manager, it's unused here.

I think this means that the proxy needs to either set Access-Control-Expose-Headers (although browsers never expose Set-Cookie to scripts, so that alone may not be enough) or rewrite the Set-Cookie headers itself.
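
As a rough sketch of what rewriting Set-Cookie on the proxy side could look like: `rewriteSetCookieDomain` is a hypothetical function for illustration only, it assumes a proxy you control (a public proxy such as corsproxy.io may not allow this), and `proxy.example` is a placeholder host.

```typescript
// Hypothetical proxy-side helper: removes the Domain attribute from a
// Set-Cookie header value and optionally replaces it with the proxy's own
// domain. Without a Domain attribute, the cookie becomes host-only for
// whichever host served the response.
function rewriteSetCookieDomain(setCookie: string, newDomain?: string): string {
  const [nameValue, ...attributes] = setCookie
    .split(";")
    .map((part) => part.trim());

  // Keep every attribute except the original Domain.
  const kept = attributes.filter(
    (attr) => attr !== "" && !/^domain\s*=/i.test(attr),
  );
  if (newDomain !== undefined) {
    kept.push(`Domain=${newDomain}`);
  }
  return [nameValue, ...kept].join("; ");
}

const original = "att=abc; Domain=.twitter.com; Path=/; Secure; HttpOnly";

console.log(rewriteSetCookieDomain(original));
// att=abc; Path=/; Secure; HttpOnly

console.log(rewriteSetCookieDomain(original, "proxy.example"));
// att=abc; Path=/; Secure; HttpOnly; Domain=proxy.example
```

Even with the domain rewritten, the browser would only attach the cookie to cross-origin requests made with credentials enabled, and the cookie would likely need SameSite=None and Secure, so this covers only one part of the problem.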