Rcrawler not crawling some websites
Rifakh opened this issue · 4 comments
Hi Salim,
I am running Rcrawler on a vector of websites and have noticed that it fails to crawl some of them, e.g.:
http://www.alahleia.com
http://www.almalki.com
I have tried several depth levels and timeout values.
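For reference, a minimal sketch of the kind of call being described (the parameter names `MaxDepth`, `Timeout`, `no_cores`, and `no_conn` are from RCrawler's documented interface; the specific values here are assumptions, not the reporter's actual settings):

```r
library(RCrawler)

# Sites from the report that fail to crawl
sites <- c("http://www.alahleia.com", "http://www.almalki.com")

# Loop over the vector of websites, varying depth/timeout as described
for (s in sites) {
  Rcrawler(Website = s,
           MaxDepth = 2,    # assumed depth level
           Timeout  = 10,   # assumed timeout, in seconds
           no_cores = 2,
           no_conn  = 2)
}
```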
thank you
I'm having the same problem. I was only having the issue with https:// sites, but confirmed that the ones you listed were not working for me as well. Some that I was having trouble with were:
https://manager.submittable.com/beta/discover/?page=1&sort=
https://www.estheticapostle.com/
Subscribe to our mailing list to receive notification of the release: http://eepurl.com/dMv_7s
@amarbut
Good news: password-protected websites can be scraped with the latest version. For your case:
```r
LS <- run_browser()
LS <- LoginSession(Browser = LS,
                   LoginURL = 'https://manager.submittable.com/login',
                   LoginCredentials = c('your email', 'your password'),
                   cssLoginFields = c('#email', '#password'),
                   XpathLoginButton = '//*[@type="submit"]')

# Then scrape data with the session
DATA <- ContentScraper(Url = 'https://manager.submittable.com/beta/discover/119087',
                       XpathPatterns = c('//*[@id="submitter-app"]/div/div[2]/div/div/div/div/div[3]',
                                         '//*[@id="submitter-app"]/div/div[2]/div/div/div/div/div[2]/div[1]/div[1]'),
                       PatternsName = c("Article", "Title"),
                       astext = TRUE,
                       browser = LS)
```
Check the update to learn about more features.