medialab/artoo

artoo.ajaxSpider on dynamic data

chrisvasey opened this issue · 9 comments

Hi there!

I have been using the artoo.waitFor helper to load dynamic content on a single page.
The issue I am facing is that with the ajaxSpider method I cannot get artoo to work with .waitFor, as the spider just returns the raw HTML.

Is it possible for me to crawl the dynamic content, or am I misunderstanding something?

The content I am trying to scrape is set using JS, so pulling the DOM sadly does not help.

Yes, dynamic content is the limit here. If what you retrieve is HTML, you could try injecting it into an iframe or something, but this is a long shot. It seems you need browser emulation at this point, or a Chrome/Firefox extension taking control of your browser.
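
For anyone wanting to try the iframe idea, here is a minimal sketch. `renderInIframe` is a hypothetical helper, not part of artoo, and it assumes the retrieved HTML is available as a string. It relies on the standard `srcdoc` attribute so that the scripts in the HTML get a chance to run inside the frame.

```javascript
// Sketch only: `renderInIframe` is a hypothetical helper, not an artoo API.
// `doc` defaults to the page's own document; it is a parameter only so the
// helper can be exercised outside a browser.
function renderInIframe(html, doc) {
  doc = doc || document;

  var iframe = doc.createElement('iframe');

  // Keep the frame out of sight; it only serves as a rendering sandbox.
  iframe.style.display = 'none';

  // srcdoc makes the browser parse the string as the frame's document,
  // which lets scripts embedded in the HTML execute.
  iframe.srcdoc = html;

  doc.body.appendChild(iframe);
  return iframe;
}
```

Once the frame has loaded, `iframe.contentDocument` could be inspected for the generated content. This remains a long shot: relative URLs in the injected HTML resolve against the current page rather than the page the HTML came from, so its scripts may well fail to load or fetch their data.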

Sadly the client I am using needs to fetch some data

What is the format of this data? JSON? HTML? Have you tried reverse-engineering the API?
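
If the page turns out to load its data from a JSON endpoint (visible in the browser's network tab), that endpoint can be requested directly instead of scraping rendered HTML. The sketch below assumes this is the case; `fetchJsonPages`, the URLs passed to it, and the shape of the responses are all hypothetical.

```javascript
// Sketch only: `fetchJsonPages` is a hypothetical helper, and the endpoint
// URLs are assumed to have been found by inspecting the page's own requests.
// `fetchImpl` is optional and defaults to the standard global fetch.
async function fetchJsonPages(urls, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  var results = [];

  // Request the URLs one after another to avoid hammering the server.
  for (var i = 0; i < urls.length; i++) {
    var response = await doFetch(urls[i], {
      headers: { Accept: 'application/json' }
    });

    if (!response.ok) {
      throw new Error('Request failed for ' + urls[i] + ': ' + response.status);
    }

    results.push(await response.json());
  }

  return results;
}
```

Run from the console on the target site, this sidesteps the dynamic rendering entirely, provided the endpoint accepts requests from that page.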

Hi @chrisvasey,

Have you managed to fetch dynamic data using artoo?

Hi @santteegt, in the current setup I am not sure it is possible in JS.
I ended up using Python for this task, which is a shame because I could not do it directly in the browser.

I will close this thread as I didn't realise it was still open.

I was only able to get this working with a recursive function and setInterval. If you use waitFor or loop, it doesn't work due to the fact that JS is an asynchronous language.

> If you use waitFor or loop, it doesn't work due to the fact that JS is an asynchronous language.

What do you mean? waitFor actually uses setInterval. A loop indeed won't work, because it blocks the main thread and nothing else can run in the meantime.
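
To illustrate the difference, here is a minimal polling sketch built on setInterval. `pollUntil` is a hypothetical helper written for this comment, not artoo's actual waitFor implementation, and the `.results` selector in the usage part is made up.

```javascript
// Sketch only: `pollUntil` is a hypothetical helper, not artoo's waitFor.
// It calls `check` every `interval` ms and reports through `done`.
function pollUntil(check, done, options) {
  var interval = (options && options.interval) || 30;
  var timeout = (options && options.timeout) || 5000;
  var elapsed = 0;

  // Each tick runs as a separate task, so the page's own scripts get to
  // run (and update the DOM) between two checks.
  var timer = setInterval(function () {
    elapsed += interval;

    if (check()) {
      clearInterval(timer);
      done(null);
    } else if (elapsed >= timeout) {
      clearInterval(timer);
      done(new Error('pollUntil: condition not met within ' + timeout + 'ms'));
    }
  }, interval);

  return timer;
}

// Usage in a browser console (the selector is an assumption):
if (typeof document !== 'undefined') {
  pollUntil(
    function () { return document.querySelector('.results') !== null; },
    function (err) { console.log(err ? err.message : 'content is ready'); }
  );
}
```

By contrast, a busy loop such as `while (!check()) {}` never returns control to the browser, so the scripts that would make the condition true never get to run.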