Democritus Html

Democritus functions^[1] for working with HTML.

[1] Democritus functions are simple, effective, modular, well-tested, and well-documented Python functions.

We use d8s (pronounced "dee-eights") as an abbreviation for democritus.

Installation

pip install d8s-html

Usage

Import the library like this:

from d8s_html import *

Once imported, you can use any of the functions listed below.
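To give a feel for the kind of operations involved, here is a minimal sketch of an escape/unescape round trip. It uses only Python's standard-library `html` module, not d8s-html itself; whether `html_escape` and `html_unescape` produce identical output is an assumption, not something this README confirms.

```python
import html

raw = '<p class="note">Tom & Jerry</p>'

# Escaping replaces characters that are special in HTML (<, >, &, quotes)
# with character entities so the markup can be shown as literal text.
escaped = html.escape(raw)
print(escaped)

# Unescaping converts the entities back into the original characters.
restored = html.unescape(escaped)
print(restored == raw)
```

The round trip returns the original string, which is a handy sanity check when escaping content before embedding it in a page.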

Functions

def html_text(html_content: StringOrBeautifulSoupObject) -> str:
    """."""

def html_unescape(html_content: StringOrBeautifulSoupObject) -> str:
    """."""

def html_escape(html_content: StringOrBeautifulSoupObject) -> str:
    """."""

def html_to_markdown(html_content: StringOrBeautifulSoupObject, **kwargs) -> str:
    """Convert the html string to markdown."""

def html_find_comments(html_content: StringOrBeautifulSoupObject) -> str:
    """Get a list of all of the comments in the html strings."""

def html_soupify(html_string: str, parser: str = 'html.parser') -> bs4.BeautifulSoup:
    """Return an instance of beautifulsoup with the html."""

def html_remove_tags(html_content: StringOrBeautifulSoupObject) -> bs4.BeautifulSoup:
    """."""

def html_remove_element(html_content: StringOrBeautifulSoupObject, element_tag: str) -> bs4.BeautifulSoup:
    """."""

def html_find_css_path(html_content: StringOrBeautifulSoupObject, css_path: str) -> ListOfBeautifulSoupTags:
    """Find the given css_path in the html_content."""

def html_elements_with_class(
    html_content: StringOrBeautifulSoupObject, html_element_class: str
) -> ListOfBeautifulSoupTags:
    """Find all elements with the given class from the html string."""

def html_elements_with_id(html_content: StringOrBeautifulSoupObject, html_element_id: str) -> ListOfBeautifulSoupTags:
    """Find all elements with the given html_element_id from the html_content."""

def html_elements_with_tag(html_content: StringOrBeautifulSoupObject, tag: str) -> ListOfBeautifulSoupTags:
    """."""

def html_headings_table_of_contents(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
    """."""

def html_headings_table_of_contents_string(
    html_content: StringOrBeautifulSoupObject, *, indentation: str = '  '
) -> str:
    """."""

def html_headings(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
    """."""

def html_to_json(html_content: StringOrBeautifulSoupObject, *, convert_only_tables: bool = False):
    """Convert the html to json using https://gitlab.com/fhightower/html-to-json."""

def html_soupify_first_arg_string(func):
    """Return a Beautiful Soup instance of the first argument (if it is a string)."""

Development

👋 If you want to get involved in this project, we have some short, helpful guides below:

If you have any questions or there is anything we did not cover, please raise an issue and we'll be happy to help.

Credits