Democritus Html

The Democritus Project uses semver version 2.0.0 and black to format code. License: LGPL v3.

Democritus functions[1] for working with HTML.

[1] Democritus functions are simple, effective, modular, well-tested, and well-documented Python functions.

We use d8s (pronounced "dee-eights") as an abbreviation for democritus.

Installation

pip install d8s-html

Usage

Import the library like this:

from d8s_html import *

Once imported, you can use any of the functions listed below.
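Two of the functions in the list are named for escaping and unescaping HTML. As a rough illustration of what those operations mean, here is a sketch that uses only Python's standard `html` module. It does not call d8s_html, and whether `html_escape` and `html_unescape` behave identically to the standard-library calls is an assumption.

```python
import html

# Sample markup, invented for this illustration.
raw = '<a href="/docs?x=1&y=2">Tom & Jerry</a>'

# Escaping replaces characters that are special in HTML with entities,
# so the markup can be displayed as literal text.
escaped = html.escape(raw)

# Unescaping converts the entities back to the original characters.
restored = html.unescape(escaped)

print(escaped)
print(restored == raw)
```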

Functions

  • def html_text(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
  • def html_unescape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
  • def html_escape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
  • def html_to_markdown(html_content: StringOrBeautifulSoupObject, **kwargs) -> str:
        """Convert the html string to markdown."""
  • def html_find_comments(html_content: StringOrBeautifulSoupObject) -> str:
        """Get a list of all of the comments in the html strings."""
  • def html_soupify(html_string: str, parser: str = 'html.parser') -> bs4.BeautifulSoup:
        """Return an instance of beautifulsoup with the html."""
  • def html_remove_tags(html_content: StringOrBeautifulSoupObject) -> bs4.BeautifulSoup:
        """."""
  • def html_remove_element(html_content: StringOrBeautifulSoupObject, element_tag: str) -> bs4.BeautifulSoup:
        """."""
  • def html_find_css_path(html_content: StringOrBeautifulSoupObject, css_path: str) -> ListOfBeautifulSoupTags:
        """Find the given css_path in the html_content."""
  • def html_elements_with_class(
        html_content: StringOrBeautifulSoupObject, html_element_class: str
    ) -> ListOfBeautifulSoupTags:
        """Find all elements with the given class from the html string."""
  • def html_elements_with_id(html_content: StringOrBeautifulSoupObject, html_element_id: str) -> ListOfBeautifulSoupTags:
        """Find all elements with the given html_element_id from the html_content."""
  • def html_elements_with_tag(html_content: StringOrBeautifulSoupObject, tag: str) -> ListOfBeautifulSoupTags:
        """."""
  • def html_headings_table_of_contents(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
  • def html_headings_table_of_contents_string(
        html_content: StringOrBeautifulSoupObject, *, indentation: str = '  '
    ) -> str:
        """."""
  • def html_headings(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
  • def html_to_json(html_content: StringOrBeautifulSoupObject, *, convert_only_tables: bool = False):
        """Convert the html to json using https://gitlab.com/fhightower/html-to-json."""
  • def html_soupify_first_arg_string(func):
        """Return a Beautiful Soup instance of the first argument (if it is a string)."""
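The last entry, `html_soupify_first_arg_string`, takes a function as its argument, which suggests it is a decorator that lets the other functions accept either a string or an already-parsed object. The sketch below shows that general pattern using only the standard library. `TextCollector`, `parse_first_arg_string`, and `visible_text` are hypothetical names invented for this example; they stand in for Beautiful Soup and the library's own code, whose actual implementation may differ.

```python
from functools import wraps
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for a parsed document; it only collects text nodes."""

    def __init__(self, markup: str):
        super().__init__()
        self.chunks = []
        self.feed(markup)
        self.close()

    def handle_data(self, data):
        self.chunks.append(data)


def parse_first_arg_string(func):
    """Parse the first argument if it is a string; pass anything else through unchanged."""

    @wraps(func)
    def wrapper(first_arg, *args, **kwargs):
        if isinstance(first_arg, str):
            first_arg = TextCollector(first_arg)
        return func(first_arg, *args, **kwargs)

    return wrapper


@parse_first_arg_string
def visible_text(document: TextCollector) -> str:
    """Join the text nodes of the parsed document."""
    return ''.join(document.chunks)


# Both call styles reach the function body with a parsed object.
from_string = visible_text('<p>Hello <b>world</b></p>')
from_parsed = visible_text(TextCollector('<p>Hello <b>world</b></p>'))
print(from_string, from_parsed == from_string)
```

Putting the conversion in a decorator keeps each function body free of type checks, which would fit the `StringOrBeautifulSoupObject` annotation used throughout the list above.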

Development

👋  If you want to get involved in this project, we have some short, helpful guides below:

If you have any questions or there is anything we did not cover, please raise an issue and we'll be happy to help.

Credits

This package was created with Cookiecutter and Floyd Hightower's Python project template.