
This is a simple Polars plugin that de-unicodes strings, i.e. transliterates Unicode text into plain ASCII, using the deunicode crate.


polars-deunicode-string



Installation

pip install polars-deunicode-string

Basic Example

import polars as pl
from polars_deunicode_string import decode_string  # module name uses underscores, not hyphens


df: pl.DataFrame = pl.DataFrame(
    {
        "text": ["Nariño", "Jose Fernando Ramírez Güiza",
                 "Córdoba", "Hello World!", None],
    }
)

Let's de-unicode the "text" column and store the result in a new column prefixed with "decode_":

result_df: pl.DataFrame = (
    df.lazy()
    # decode_string keeps the input name ("text"); prefix it to get "decode_text"
    .with_columns([decode_string("text").name.prefix("decode_")])
    .collect()
)
print(result_df)

shape: (5, 2)
┌─────────────────────────────┬─────────────────────────────┐
│ text                        ┆ decode_text                 │
│ ---                         ┆ ---                         │
│ str                         ┆ str                         │
╞═════════════════════════════╪═════════════════════════════╡
│ Nariño                      ┆ Narino                      │
│ Jose Fernando Ramírez Güiza ┆ Jose Fernando Ramirez Guiza │
│ Córdoba                     ┆ Cordoba                     │
│ Hello World!                ┆ Hello World!                │
│ null                        ┆ null                        │
└─────────────────────────────┴─────────────────────────────┘
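To sanity-check results like the table above without building the plugin, here is a rough standard-library approximation. It is a sketch under stated assumptions, not how the plugin works: it swaps the deunicode crate's transliteration tables for Unicode NFKD decomposition plus ASCII filtering. That is enough for accented Latin text such as the sample column, but characters from scripts that do not decompose to ASCII (e.g. CJK) are simply dropped rather than transliterated. The function name `approx_deunicode` and its `lowercase` flag are hypothetical and not part of this package.

```python
import unicodedata
from typing import Optional


def approx_deunicode(text: Optional[str], lowercase: bool = False) -> Optional[str]:
    """Hypothetical helper: fold accented Latin text to ASCII.

    Approximation only; unlike the deunicode crate, non-Latin scripts
    are dropped instead of transliterated.
    """
    if text is None:
        # Pass nulls through, matching the null -> null row in the table.
        return None
    # NFKD splits e.g. "ñ" into "n" followed by U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every non-ASCII code point, which removes the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower() if lowercase else ascii_only


samples = ["Nariño", "Jose Fernando Ramírez Güiza", "Córdoba", "Hello World!", None]
print([approx_deunicode(s) for s in samples])
# ['Narino', 'Jose Fernando Ramirez Guiza', 'Cordoba', 'Hello World!', None]
```

For real data, prefer the plugin: it runs deunicode natively inside Polars, while this helper would need a per-row Python call.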