appledora/mwparserfromhtml
An unofficial mirror of our repository for the `mwparserfromhtml` package, a Python library for working with the HTML dumps. Since this is only a mirror, DO NOT open pull requests here.
Python · MIT
Issues
- 3
Contribution Guideline and Tutorial Notebook
#42 opened by appledora - 0
Add logging to indicate mismatch between HTML spec version and html dumps version
#44 opened by appledora - 3
- 0
Allow raw article html strings to be passed without all the additional metadata in the dump
#46 opened by appledora - 0
- 0
Split plaintext by sections and paragraphs
#40 opened by appledora - 6
Initial template for packaging - [merged]
#61 opened by appledora - 1
Reduce down requirements.txt
#38 opened by appledora - 22
- 1
add function to extract media to library
#25 opened by appledora - 19
feature: metadata extraction - [merged]
#62 opened by appledora - 1
additional metadata from json
#39 opened by appledora - 11
feature: added namespace attribute to Wikilink instances, language attribute... - [merged]
#58 opened by appledora - 6
discuss python packaging
#33 opened by appledora - 57
Resolve "Create Documentation" - [merged]
#60 opened by appledora - 1
Create Documentation
#37 opened by appledora - 0
- 1
Add namespace attribute to Wikilink objects
#20 opened by appledora - 17
- 4
add functions to extract plaintexts to library
#30 opened by appledora - 0
add functions to extract tables to library
#27 opened by appledora - 1
Choose a license
#36 opened by appledora - 3
- 3
Add tests to CI pipeline
#14 opened by appledora - 0
write test for template extraction method
#23 opened by appledora - 0
- 0
write test for header extraction method
#22 opened by appledora - 0
write test for comment extraction method
#21 opened by appledora - 0
Write test for category extraction method
#15 opened by appledora - 0
Write test for external links extraction method
#16 opened by appledora - 0
write test for wikilinks extraction method
#18 opened by appledora - 1
add function to extract references to library
#26 opened by appledora - 11
- 1
write test for section extraction method
#19 opened by appledora - 21
- 0
determine how to identify hidden categories
#35 opened by appledora - 0
reduce redundancy in testing module
#34 opened by appledora - 0
Write test for dump module
#31 opened by appledora - 0
add functions to extract parents to library
#29 opened by appledora - 0
add functions to extract ancestors to library
#28 opened by appledora - 0
pretty print article information
#24 opened by appledora - 2
add function to extract templates to library
#12 opened by appledora - 47
feature: template extraction method - [merged]
#53 opened by appledora - 3
add static namespace list and utility for generating it to help with namespace... - [merged]
#52 opened by appledora - 44
- 59
feature: extract external links - [merged]
#51 opened by appledora - 0