[BUG] cell_len inaccurate for OSC8 links, occasionally breaking text wrapping

Question

[BUG] cell_len inaccurate for OSC8 links, occasionally breaking text wrapping

Closed this issue a month ago · 7 comments

I've checked docs and closed issues for possible solutions.
I can't find my issue in the FAQ.

Describe the bug

It seems that cell_len does not return the visible cell length for OSC8 links

I've reproduced and actually just had to patch this for my own project. Attaching a repro and the fix below.

Can PR too but thought I'd confirm this should be fixed, as probably my fix affects performance or
interacts with other things I've not thought of?

It may not be something everyone notices, but if you do use OSC8 links, it can make line breaking not work correctly. Among other things, rich can break a link in the middle (making a fragment of the URL visible).

Thanks so much!

Platform

Click to expand

What platform (Win/Linux/Mac) are you running on? What terminal software are you using?

v13.9.4 on macos, but as noted above you can see the issue on main branch.

I may ask you to copy and paste the output of the following commands. It may save some time if you do it now.

If you're using Rich in a terminal:

python -m rich.diagnose
pip freeze | grep rich

If you're using Rich in a Jupyter Notebook, run the following snippet in a cell
and paste the output in your bug report.

from rich.diagnose import report
report()

"""
Monkey-patch to fix Rich's cell_len method to correctly handle OSC-8 links.
"""

from functools import lru_cache
from typing import Callable

import rich.cells
from rich.cells import _is_single_cell_widths, get_character_cell_size


def strip_control_sequences(text: str) -> str:
    """Strips ANSI control sequences, including OSC-8 links, from the text."""
    from rich.ansi import _ansi_tokenize

    plain_text_parts = []
    for token in _ansi_tokenize(text):
        if token.plain:
            plain_text_parts.append(token.plain)
    return "".join(plain_text_parts)


@lru_cache(4096)  # noqa: F821
def cached_cell_len(text: str) -> int:
    """Get the number of cells required to display text.

    This method always caches, which may use up a lot of memory. It is recommended to use
    `cell_len` over this method.

    Args:
        text (str): Text to display.

    Returns:
        int: Get the number of cells required to display text.
    """
    if _is_single_cell_widths(text):
        return len(text)
    return sum(map(get_character_cell_size, strip_control_sequences(text)))


def cell_len(text: str, _cell_len: Callable[[str], int] = cached_cell_len) -> int:
    """Get the number of cells required to display text.

    Args:
        text (str): Text to display.

    Returns:
        int: Get the number of cells required to display text.
    """
    if len(text) < 512:
        return _cell_len(text)
    if _is_single_cell_widths(text):
        return len(text)
    return sum(map(get_character_cell_size, strip_control_sequences(text)))


# Monkey patch!
rich.cells.cached_cell_len = cached_cell_len
rich.cells.cell_len = cell_len


## Tests


def _make_osc_link(url: str, text: str) -> str:
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"


_plain_text = "ansi🤔"
_short_osc_link = _make_osc_link("http://example.com/", _plain_text)
_long_osc_link = _make_osc_link("http://example.com/" + "x" * 100, _plain_text)


def test_old_cell_len_bug():
    from rich.cells import cell_len as old_cell_len

    print(
        f"old lengths: plain_text={old_cell_len(_plain_text)} "
        f"short_osc_link={old_cell_len(_short_osc_link)} "
        f"long_osc_link={old_cell_len(_long_osc_link)}"
    )
    assert old_cell_len(_plain_text) == 6
    # Without patching:
    # assert old_cell_len(_short_osc_link) == 35  # Wrong!
    # assert old_cell_len(_long_osc_link) == 135  # Wrong!
    # If this patch is loaded:
    assert old_cell_len(_short_osc_link) == cell_len(_short_osc_link)
    assert old_cell_len(_long_osc_link) == cell_len(_long_osc_link)


def test_cell_len():

    print(
        f"new lengths: plain_text={cell_len(_plain_text)} "
        f"short_osc_link={cell_len(_short_osc_link)} "
        f"long_osc_link={cell_len(_long_osc_link)}"
    )
    assert cell_len(_plain_text) == 6
    assert cell_len(_short_osc_link) == cell_len(_plain_text)
    assert cell_len(_long_osc_link) == cell_len(_plain_text)

Answer 1 · 2024-11-15T03:34:15.000Z

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

Answer 2 · 2024-11-15T11:18:46.000Z

I don't think cell_len is intended for text with ANSI codes?

If you convert your links into Rich Text objects, the cell_len is as expected:

from rich.text import Text
from rich import print


def make_osc_link(url: str, text: str) -> str:
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"

plain_text = "ansi"
short_link_ansi = make_osc_link("http://example.com/", plain_text)
long_link_ansi = make_osc_link("http://example.com/" + "x" * 100, plain_text)

short_link_text = Text.from_ansi(short_link_ansi)
long_link_text = Text.from_ansi(long_link_ansi)

print(short_link_text, short_link_text.cell_len)
print(long_link_text, long_link_text.cell_len)

Answer 3 · 2024-11-15T11:31:40.000Z

Tom is correct -- as usual!

cell_len was never intended to parse ansi escape sequences.

It is better to let Rich generate all the escape sequences, or if you can't avoid escape sequences in your output, use Text.from_ansi.

Answer 4 · 2024-11-16T17:23:06.000Z

Ah so Rich is intended to work with preexisting ANSI codes only via Text.from_ansi(), including all OSC codes? That makes sense!

I'll give it a try and see if that fixes everything.

I had text created with Text() and as you can imagine that led to subtle/latent wrapping bugs. Perhaps a warning in the docs and/or a method for creating OSC links in Rich itself (or even an exception for misuse like this?) would encourage correct usage? Thanks again!

Answer 5 · 2024-11-17T09:39:52.000Z

,,,a method for creating OSC links in Rich itself

https://rich.readthedocs.io/en/stable/markup.html#links

Answer 6 · 2024-11-21T08:07:52.000Z

https://rich.readthedocs.io/en/stable/markup.html#links

Ah thanks, yes though in my use case it makes more sense to generate links programmatically.

Closing this since it's clear Text.from_ansi() is the best answer.

Also just want to say, looking more closely the codebase have noticed how unusually clean and refreshingly well designed it is! Nice work @willmcgugan and team.

Answer 7 · 2024-11-21T08:08:18.000Z

I hope we solved your problem.

If you like using Rich, you might also enjoy Textual