[QUESTION] Are there cases where BrowserFetcher does not fully support CSR?
pistolcaffe opened this issue · 1 comments
Describe what you want to achieve
I am creating a user guide page for my app, and I need to crawl that page from within the app. (I need to crawl certain URLs in the app as well as Notion pages.)
https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4
However, even when I use the default BrowserFetcher, I cannot get the title of the loaded page.
Please let me know if there is another way to do this.
Code Sample
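import it.skrape.core.htmlDocument
import it.skrape.fetcher.BrowserFetcher
import it.skrape.fetcher.response
import it.skrape.fetcher.skrape

fun main(args: Array<String>) {
    skrape(BrowserFetcher) {
        request {
            url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        }
        response {
            htmlDocument {
                // Prints the <title> of the document as the fetcher saw it
                println("title: $titleText")
            }
        }
    }
}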
[expect] title: 인사이트 플로우 가이드
[but] title: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
If that is not possible, please consider providing a waitUntil option similar to the one in Playwright and Puppeteer, with values such as load, networkidle, and documentLoaded.
When using HtmlUnit directly, I got the following exception: net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: identifier is a reserved word: class (https://fundevstudio.notion.site/8402-8521e6e24e557272e4c0.js#1)
Since HtmlUnit is using an outdated Rhino, I think we may need to consider porting it to a V8 engine or something similar.
Of course, it is only speculation that this engine exception is the direct cause. If I find any additional information, I will add a comment.