dankito/Readability4J

IllegalStateException when parsing specific URL with Readability4J (topCandidate.parent() must not be null)

samtheeagle578 opened this issue · 0 comments

Hello,

First, I would like to express my appreciation to @dankito and everyone else involved for developing such a useful library as Readability4J.

I encountered an issue while parsing content from the following URL: https://www.whitecoatinvestor.com/high-yield-savings-accounts-364/. When attempting to parse the page, Readability4J throws an IllegalStateException.

Here is the stack trace of the exception:

java.lang.IllegalStateException: topCandidate.parent() must not be null
	at net.dankito.readability4j.processor.ArticleGrabber.getTextDirection(ArticleGrabber.kt:1118)
	at net.dankito.readability4j.processor.ArticleGrabber.grabArticle(ArticleGrabber.kt:167)
	at net.dankito.readability4j.processor.ArticleGrabber.grabArticle$default(ArticleGrabber.kt:57)
	at net.dankito.readability4j.Readability4J.parse(Readability4J.kt:101)

I am not well-versed in Kotlin. I debugged the issue but, unfortunately, have no meaningful insight that could help resolve it. The full HTML content that triggers the exception is available for review here:

https://1drv.ms/u/s!AnpDf81AVQi-ht46xIci3w9gWPfCNA?e=mOy7e7

Additionally, here is how I am using Readability4J in my code:

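// link is the page URL, content is the page's HTML as a String
Readability4J readability4J = new Readability4J(link, content);
Article article = readability4J.parse(); // throws the IllegalStateException above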
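
As a stopgap, the call could be guarded so that the exception does not propagate. The following is only a minimal sketch of that idea, not a fix for the underlying problem. `SafeParse` and `tryParse` are hypothetical names introduced here for illustration and are not part of Readability4J.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not part of Readability4J): wraps a parse step and
// converts an IllegalStateException into an empty Optional.
final class SafeParse {

    private SafeParse() {
    }

    static <T> Optional<T> tryParse(Supplier<T> parseStep) {
        try {
            // A null result is also mapped to Optional.empty()
            return Optional.ofNullable(parseStep.get());
        } catch (IllegalStateException e) {
            // Report the failure and let the caller decide on a fallback
            System.err.println("Parsing failed: " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

With this in place, the call above would become `Optional<Article> article = SafeParse.tryParse(() -> new Readability4J(link, content).parse());`, and an empty result would mean the page could not be parsed.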

Any guidance on how to resolve or work around this problem would be greatly appreciated.

Thank you for your support and efforts!

Kind regards,
Sam