Some web pages did not get readable mode
JadeVane opened this issue · 3 comments
Most of the pages I visit have Chinese content, for example:
https://finance.sina.com.cn/tech/it/2022-07-27/doc-imizirav5659240.shtml
https://www.zhihu.com/question/346862321/answer/2573127062
Only archived versions of these pages are available, not readable versions. What puzzles me is that the first page I added to shiori got its readable version correctly, and it comes from this link: https://www.zhihu.com/question/546215156/answer/2605044965 , while other links from the same site do not get a readable version.
As you can see, both of them are from zhihu.com
This issue has been automatically marked as stale because it has not had any activity for quite some time.
It will be closed if no further activity occurs.
Thank you for your contributions.
https://finance.sina.com.cn/tech/it/2022-07-27/doc-imizirav5659240.shtml and https://www.zhihu.com/question/346862321/answer/2573127062 are actually readable, but the CheckDocument() function fails on them: their content consists of many small paragraphs, so the 140-character minimum a paragraph needs in order to count toward the final score is not reached.
https://www.zhihu.com/question/546215156/answer/2605044965 has a paragraph longer than 140 characters and its calculated score is over 20, so the CheckDocument() function does not fail and caching can be done.
https://habr.com/ru/company/selectel/blog/684162/ is OK, while https://habr.com/ru/post/683052/ needs this commit