buriy/python-readability
fast python port of arc90's readability tool, updated to match latest readability.js!
Python · Apache-2.0
Issues
- 1
  significant portion of content missed by readability
  #119 opened by robh71
- 5
  Consider switching from lxml's clean_html for enhanced security (and possibly performance)
  #179 opened by frenzymadness
- 5
  cannot import name 'Document'
  #118 opened by 123tobias123
- 0
  Readability of MSN articles
  #181 opened by rpdelaney
- 0
  Summary is fooled by a modal popup
  #180 opened by rpdelaney
- 2
  Issue with utf8 and HTML entities
  #175 opened by uuencode
- 4
  .text may guess the encoding incorrectly
  #163 opened by 097115
- 3
  isProbablyReaderable
  #174 opened by Uzay-G
- 0
  Problems with thecyberwire.com
  #171 opened by 097115
- 0
  <p> wrongly inserted before <i> or <b>
  #170 opened by ploum
- 0
  Does not handle github pages
  #169 opened by ploum
- 3
  REGEXES["divToPElementsRe"] logical error
  #160 opened by luoqishuai
- 1
  Error when using positive_keywords (or negative_keywords) argument with python >= 3.7
  #161 opened by nbtravis
- 0
  Missing <p>-text
  #159 opened by adbar
- 4
  Missing image when process medium page
  #117 opened by jerryan9999
- 1
  RuntimeWarning and Correct invocation on the shell command line (not python script)
  #158 opened by m040601
- 1
  No chance for GitHub commit page?
  #157 opened by 097115
- 0
  Inlining images?
  #154 opened by valexandersaulys
- 0
  Orphan links in doc.summary()
  #153 opened by adbar
- 2
  New release to PyPI?
  #149 opened by raharrison
- 1
  please replace log.info() with log.debug() in Document.select_best_candidates()
  #141 opened by yevgenpapernyk
- 0
  Break Inline tags
  #143 opened by Amecom
- 3
  Syntax error while installing using pip3
  #98 opened by alandria
- 2
  Leave necessary images
  #121 opened by ozhyrenkov
- 1
  Some tags create unnecessary paragraphs
  #130 opened by GabMus
- 2
  Pass LXML object straight to readability?
  #140 opened by adbar
- 1
  Issue with self-closing tags
  #125 opened by azmeuk
- 3
  Documentation
  #110 opened by tocka9
- 1
  Will it make the task unable to proceed and end?
  #123 opened by kjxy
- 6
  Remove distracting and unnecessary tags
  #122 opened by rien333
- 0
  Unparseable: Invalid IPv6 URL
  #99 opened by mskrajnowski
- 4
  Splitting the text in scoring
  #113 opened by haziyevv
- 4
  get_clean_html: lxml error
  #108 opened by dufferzafar
- 0
  Donc summary() won't work on this web site
  #112 opened by MChrys
- 1
  summary removing non breaking space
  #111 opened by dosomder
- 3
  Can not import class Document
  #104 opened by simhavas
- 2
  Don't use drop_tree() while iterating
  #102 opened by P1zz4br0etch3n
- 3
  Drop support for Python 2.6
  #97 opened by cmermingas
- 1
  No module named langdata
  #95 opened by sandeepsingh