Generated sitemap contains current url only
LinusGeffarth opened this issue · 6 comments
Hey, love the idea and could really make use of this library. I tested it out on the website I'm working on. It's a single-page application built with React.js.
I ran the extension and it immediately finished and downloaded the following xml file:
Why is that? There are (obviously) more links on my website...
So you didn't change anything, just ran it twice?
How many links did it produce for you?
My second run had 72 Links.
I was running into this problem trying to scrape SPAs: the scraper needs to wait until the web page has fully rendered, which this extension does a decent job of in my use case. It looks like it might have issues on slower pages.
If you run your site in Chrome and open Lighthouse in the console, you'll see you have some serious site speed issues going on.
Theoretically, it should have a few thousand links...
I don't think the site speed is necessarily the issue. The speed index you're seeing applies to mobile on a 3G network (which indeed is really bad).
On desktop & fast network, the speed is much higher.
I'll investigate this though. Thanks for the hint about the speed.
Site speed would not be an issue if the site loaded everything at once. But if the page finishes its initial render before the async calls that populate the content have returned, the page scan takes a snapshot of the body before that async content is rendered.
I ran into this a lot trying to scrape, and ended up needing to delay the snapshot 5-10s after page load for it to capture the content and not just the layout and placeholders.
I haven't looked into how this project handles the need for a delay, but if your site has slow-loading data sources, you may need to add some delay before trying to snapshot the HTML.
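For illustration, here is a rough sketch of the delay idea in TypeScript. It is not taken from this project's code: `delay`, `isStable`, and `waitForStableLinkCount` are hypothetical names, and the interval and timeout defaults are guesses loosely based on the 5-10s figure above. Instead of always sleeping for a fixed time, it polls the link count and stops once the count has not changed for a few polls in a row, with a timeout as a fallback.

```typescript
// Hypothetical helpers for illustration; not part of the extension's actual API.

/** Resolves after `ms` milliseconds. */
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** True when the last `required` samples are all equal, i.e. the count stopped changing. */
function isStable(samples: number[], required: number): boolean {
  if (required < 1 || samples.length < required) return false;
  const tail = samples.slice(-required);
  return tail.every((n) => n === tail[0]);
}

/**
 * Polls `countLinks` until the count is unchanged for `required` polls in a row,
 * or until `timeoutMs` has elapsed. Returns the last observed count.
 */
async function waitForStableLinkCount(
  countLinks: () => number,
  { intervalMs = 500, required = 3, timeoutMs = 10000 } = {},
): Promise<number> {
  const samples: number[] = [];
  const deadline = Date.now() + timeoutMs;
  while (true) {
    samples.push(countLinks());
    if (isStable(samples, required) || Date.now() >= deadline) {
      return samples[samples.length - 1];
    }
    await delay(intervalMs);
  }
}

// In a page context, the counter could be something like:
//   await waitForStableLinkCount(() => document.querySelectorAll("a[href]").length);
// and the HTML snapshot would be taken only after that promise resolves.
```

One caveat with this approach: if the page shows only placeholders for longer than `required * intervalMs`, the count looks stable too early, so a plain `await delay(5000)` before the first poll may still be needed on slow data sources.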
Oh I see, that makes sense.
I'll see if I can find a way to add the delay.
Thanks for your help! 🙏