currentslab/extractnet

Does not parse the page vk.com

Vponed opened this issue · 1 comments

import requests
from extractnet import Extractor

raw_html = requests.get('https://vk.com/neurosciencenews').text
results = Extractor().extract(raw_html)

It returns almost nothing. Why might that be? It works great with other sites.
Also, I would like to know more about working with the extractor. I am curious whether it is possible to get not only the extracted data from it, but also how it extracted that data.

My guess is that this page is generated client-side: the content is loaded by scripts after the initial page load. Using requests alone returns a nearly empty page (the content has not been loaded yet). You might need to render the page first and try again.
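A rough sketch of that idea, not something from the extractnet codebase: first check whether the HTML from requests has any visible text at all, and if not, render the page in a headless browser before extracting. The helper names (visible_text_length, looks_client_rendered, fetch_rendered_html) and the 200-character threshold are made up for illustration, and Playwright is just one possible renderer; it is assumed to be installed separately (pip install playwright, then playwright install chromium).

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style> and <noscript>."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text_length(html):
    """Number of characters of visible text in an HTML string."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks))


def looks_client_rendered(html, min_chars=200):
    """Heuristic (threshold is arbitrary): little visible text suggests a JS shell."""
    return visible_text_length(html) < min_chars


def fetch_rendered_html(url):
    """Load the page in headless Chromium and return the DOM after scripts ran."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html
```

If looks_client_rendered(raw_html) is true for the requests output, you could try raw_html = fetch_rendered_html('https://vk.com/neurosciencenews') and pass that to Extractor().extract(raw_html) instead. I have not tested this against vk.com, and pages that require a login or block headless browsers may still come back mostly empty.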

You can look at these two files to understand how the extraction works