Vercel WebApp
Is your feature request related to a problem? Please describe.
I'm currently working on improving Scrape-ML's ability to handle websites with dynamically loaded content. This is a common challenge because websites often use JavaScript to fetch and display content after the initial page load. Scrape-ML's current static parsing approach often misses this dynamically generated content, leading to incomplete data extraction.
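To make the problem concrete, here is a minimal sketch using only Python's standard library. The page markup, the `/api/products` endpoint, and the `TagCounter` helper are all made up for illustration and are not taken from Scrape-ML. It shows that a parser which only reads the initial HTML sees the empty container but none of the items the script would add later.

```python
from html.parser import HTMLParser

# Hypothetical page: the list is empty in the initial HTML and would only be
# filled in by JavaScript after the page loads in a real browser.
RAW_HTML = """
<html><body>
  <ul id="products"></ul>
  <script>
    fetch("/api/products").then(r => r.json()).then(items => {
      for (const item of items) {
        const li = document.createElement("li");
        li.textContent = item.name;
        document.getElementById("products").appendChild(li);
      }
    });
  </script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Counts start tags found in the static markup (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    return parser.counts


counts = count_tags(RAW_HTML)
# The <ul> container is present, but no <li> exists until the script runs.
print("ul:", counts.get("ul", 0), "li:", counts.get("li", 0))
```

A static parser never executes the script, so the `li` count stays at zero no matter how the selectors are written.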
Describe the solution you'd like
I propose implementing a feature that utilizes browser automation to handle dynamic content. This could be achieved by integrating with a library like Selenium or Puppeteer. These libraries allow Scrape-ML to simulate a real browser, execute JavaScript code, and wait for the dynamically loaded content to appear before parsing the page.
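A rough sketch of what this could look like with Selenium in Python follows. It is only an illustration under stated assumptions, not Scrape-ML's actual code: the function names `fetch_rendered_html` and `scrape_with_chrome`, the timeout values, and the headless Chrome setup are all hypothetical. The wait loop is written out by hand so the idea is visible and so it can be tried with a stand-in driver; in practice Selenium's own `WebDriverWait` with `expected_conditions` would normally do this job.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def fetch_rendered_html(driver, url, css_selector, timeout=10.0, poll=0.25):
    """Load `url`, wait until `css_selector` matches, return the rendered HTML.

    `driver` is expected to behave like a Selenium WebDriver: it needs
    get(), find_elements(by, value) and a page_source attribute.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list until the content has appeared.
        if driver.find_elements(CSS_SELECTOR, css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{css_selector!r} did not appear within {timeout} seconds"
            )
        time.sleep(poll)


def scrape_with_chrome(url, css_selector):
    """Hypothetical entry point; assumes selenium and Chrome are installed."""
    from selenium import webdriver  # imported lazily, only needed here

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        return fetch_rendered_html(driver, url, css_selector)
    finally:
        driver.quit()  # always release the browser process
```

The returned HTML could then be passed to the existing parsing step unchanged, since by that point it already contains the content that JavaScript added.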
Describe alternatives you've considered
I've explored using Scrape-ML's existing features like custom selectors and regular expressions to target specific elements within the source code. However, this approach becomes cumbersome and unreliable for complex websites with intricate JavaScript interactions. Additionally, it requires a deep understanding of the website's underlying code, making it difficult for users who are not familiar with web development.
Additional context
Several popular web scraping frameworks utilize browser automation for handling dynamic content. This functionality has become a critical aspect of modern web scraping due to the prevalence of dynamic websites.
Thank you for raising an issue. We hope you are enjoying open source. We will try to reply or assign it as soon as possible. Connect with a mentor.
I request that you assign this feature request to me under GSSOC'24 (Level 3).
Hey @sanjay-kv @Sitevity, if this issue is available, I would like to work on it.
It's already assigned. If you want to collaborate, reach out to the assigned person.
This issue has been automatically closed because it has been inactive for more than 30 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!