IllegalStateException when parsing specific URL with Readability4J (topCandidate.parent() must not be null)
samtheeagle578 opened this issue · 0 comments
Hello,
First, I would like to express my appreciation to @dankito and everyone else involved for developing such a useful library as Readability4J.
I encountered an issue while parsing content from the following URL: https://www.whitecoatinvestor.com/high-yield-savings-accounts-364/. When attempting to parse the page, Readability4J throws an IllegalStateException.
Here is the stack trace of the exception:
java.lang.IllegalStateException: topCandidate.parent() must not be null
    at net.dankito.readability4j.processor.ArticleGrabber.getTextDirection(ArticleGrabber.kt:1118)
    at net.dankito.readability4j.processor.ArticleGrabber.grabArticle(ArticleGrabber.kt:167)
    at net.dankito.readability4j.processor.ArticleGrabber.grabArticle$default(ArticleGrabber.kt:57)
    at net.dankito.readability4j.Readability4J.parse(Readability4J.kt:101)
I am not well-versed in Kotlin, and although I have debugged the issue, I unfortunately have no meaningful insight to offer toward resolving it. The full HTML content that triggers the exception is available for review here:
https://1drv.ms/u/s!AnpDf81AVQi-ht46xIci3w9gWPfCNA?e=mOy7e7
Additionally, here is how I am using Readability4J in my code:
// link is the page URL and content is the page's HTML, both as Strings
Readability4J readability4J = new Readability4J(link, content);
Article article = readability4J.parse();
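Until the root cause is found, one possible stopgap is to catch the exception so that a single failing page does not abort the rest of a run. The sketch below only hides the failure and does not fix it. It assumes the error always surfaces as an IllegalStateException, as in the stack trace above. SafeParse and tryParse are hypothetical names for illustration, not part of Readability4J.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Readability4J.
final class SafeParse {
    private SafeParse() {}

    // Runs the given parse step and returns its result, or Optional.empty()
    // if the step throws IllegalStateException or returns null.
    static <T> Optional<T> tryParse(Supplier<T> parseStep) {
        try {
            return Optional.ofNullable(parseStep.get());
        } catch (IllegalStateException e) {
            // Report and skip this page; the underlying bug is still there.
            System.err.println("Parse failed: " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

With this in place, the call above would presumably become SafeParse.tryParse(() -> new Readability4J(link, content).parse()), and an empty result would mean the page could not be parsed.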
Any guidance on how to resolve or work around this problem would be greatly appreciated.
Thank you for your support and efforts!
Kind regards,
Sam