Throw error when redirecting to authwall
acanimal opened this issue · 2 comments
It is possible for your cookie credentials to become invalid, in which case LinkedIn redirects to the "authwall", where you need to log in again.
The current code simply returns an empty profile object, which leads to an error like: Cannot read property 'name' of undefined at module.exports (xxx/node_modules/scrapedin/src/profile/cleanProfileData.js:5:23)
At least for me, in those cases it's necessary to know whether the profile failed due to an auth error, so I have slightly modified the profile.js file with the following lines:
module.exports = async (browser, cookies, url, waitTimeToScrapMs = 500, hasToGetContactInfo = false, puppeteerAuthenticate = undefined) => {
...
  const page = await openPage({ browser, cookies, url, puppeteerAuthenticate })
  // Set to true when any response redirects to LinkedIn's authwall
  let authwall = false;
  page.on('response', response => {
    const status = response.status()
    if ((status >= 300) && (status <= 399)) {
      // Not every 3xx response carries a Location header, so default to ''
      const location = response.headers()['location'] || '';
      if (location.includes('authwall')) {
        authwall = true;
      }
    }
  })
  const profilePageIndicatorSelector = '.pv-profile-section'
  await page.waitFor(profilePageIndicatorSelector, { timeout: 5000 })
    .catch(() => {
      // why not throw an error here instead of continuing to scrape?
      // because it can be just a false negative, meaning LinkedIn only changed that selector but everything else is fine :)
      logger.warn('profile selector was not found')
    })
  // If a redirect to the authwall was detected, throw an error
  if (authwall) {
    const msg = 'Redirected to authwall :( You need new credentials';
    logger.warn(msg);
    throw new Error(msg);
  }
...
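The redirect check above could also be pulled out into a small pure function, which makes it testable without a browser. This is only a sketch: `isAuthwallRedirect` and `AuthwallError` are hypothetical names that do not exist in scrapedin, and the URL in the example is purely illustrative. It assumes the headers object has lower-cased keys, as Puppeteer's `response.headers()` returns.

```javascript
// Hypothetical error type (not part of scrapedin) so that callers can tell
// an auth failure apart from any other scraping error.
class AuthwallError extends Error {
  constructor (message = 'Redirected to authwall: new credentials are needed') {
    super(message)
    this.name = 'AuthwallError'
  }
}

// Hypothetical helper: returns true when the response is a 3xx redirect
// whose Location header mentions "authwall".
// `status` is the numeric HTTP status, `headers` a plain object with
// lower-cased header names.
function isAuthwallRedirect (status, headers = {}) {
  if (status < 300 || status > 399) return false
  // A redirect without a Location header is treated as "not the authwall"
  const location = headers['location'] || ''
  return location.includes('authwall')
}

// Illustrative inputs only
console.log(isAuthwallRedirect(302, { location: 'https://www.linkedin.com/authwall?trk=x' })) // true
console.log(isAuthwallRedirect(302, {})) // false
console.log(isAuthwallRedirect(200, { location: 'https://www.linkedin.com/authwall' })) // false
```

Inside the response listener this would be called as `isAuthwallRedirect(response.status(), response.headers())`, and throwing `new AuthwallError()` instead of a plain `Error` would let callers check `err instanceof AuthwallError` or `err.name` rather than matching on the message text.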
I don't know if this is something you want to integrate into the project. If so, let me know and I will send a PR.
Thanks in advance.
Send a PR for sure. I'm very busy relocating right now; however, there are more people who can review and approve it. Once that's done, I'll just publish the npm package.
Thank you.