jgm/djot

Limit numbers in ordered list markers

hellux opened this issue · 5 comments

hellux commented

There currently is no limit to how large the numbers in ordered lists can be.

The reference implementation runs on JavaScript, which means it just uses
floating-point numbers:

9999999999999999999999999.
<ol start="1e+25">
<li>
</li>
</ol>
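Python floats are IEEE 754 doubles, the same representation JavaScript uses for its numbers, so a minimal Python sketch can reproduce the rounding seen above. This only illustrates the float behaviour; it is not djot.js code.

```python
# A 25-digit list marker number, as in the example input above.
marker_digits = "9" * 25

as_int = int(marker_digits)      # exact arbitrary-precision integer
as_float = float(marker_digits)  # IEEE 754 double, like a JavaScript number

print(as_int)    # 9999999999999999999999999
print(as_float)  # 1e+25

# The double cannot hold the exact value, so it is rounded to the nearest
# representable number, which prints as 1e+25.
print(int(as_float) == as_int)  # False
```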

It might be useful to set a reasonable maximum so that implementations using
floats and implementations using integers can easily match. Preferably the
limit is on the number of digits, so that no number parsing is required when
identifying list items.

CommonMark has a limit of 9 decimal digits, which fits in both a 32-bit unsigned integer and a double-precision float.
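A quick check of that claim, sketched in Python: the largest 9-digit number stays below 2^32 and is represented exactly as a double (every integer below 2^53 is), while the largest 10-digit number already overflows 32 bits.

```python
MAX_9_DIGITS = 999_999_999
MAX_10_DIGITS = 9_999_999_999

UINT32_LIMIT = 2**32  # smallest value that no longer fits in a 32-bit unsigned int
DOUBLE_EXACT = 2**53  # every integer below this is exactly representable in a double

print(MAX_9_DIGITS < UINT32_LIMIT)          # True
print(MAX_9_DIGITS < DOUBLE_EXACT)          # True
print(float(MAX_9_DIGITS) == MAX_9_DIGITS)  # True, no rounding
print(MAX_10_DIGITS < UINT32_LIMIT)         # False
```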

However, we also have alphabetic and roman numerals. Roman numerals can be very
long and still represent small numbers, but alphabetic numerals grow fast. The
largest number of alphabetic digits we can use with 32-bit unsigned integers is
6 (zzzzzz < (1<<32) < zzzzzzz). 6 would be pretty limiting for roman numerals,
so they might need a separate limit, even though a marker such as i can
initially be ambiguous between the two styles.
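Assuming multi-letter markers continue as bijective base-26 numerals (a = 1, ..., z = 26, aa = 27, the way spreadsheet columns are named), a short Python sketch confirms the bound: the largest 6-letter marker fits in 32 bits and the largest 7-letter marker does not. The function name is hypothetical.

```python
def letters_to_int(s: str) -> int:
    """Value of a lowercase alphabetic marker in bijective base 26 (a=1, z=26, aa=27)."""
    n = 0
    for ch in s:
        if not "a" <= ch <= "z":
            raise ValueError(f"not a lowercase letter: {ch!r}")
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n

UINT32_LIMIT = 1 << 32

print(letters_to_int("z"), letters_to_int("aa"))  # 26 27
print(letters_to_int("zzzzzz"))                   # 321272406
print(letters_to_int("zzzzzz") < UINT32_LIMIT < letters_to_int("zzzzzzz"))  # True
```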

jgm commented

That probably makes sense, but you're right that roman numerals make it tricky.

bpj commented

Pandoc, last time I looked, didn't allow alphabetic markers after z, so no aa,
ab, ac, .... I've never actually run into it, but I think it should be possible
in principle. The source code for the Perl module Number::Latin has a
conversion algorithm.
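For illustration, converting a number to an alphabetic marker is the standard bijective base-26 algorithm. The Python sketch below is written from that general technique, not taken from Number::Latin, whose actual code may differ; the function name is hypothetical.

```python
def int_to_letters(n: int) -> str:
    """Bijective base-26 marker for n >= 1: 1 -> 'a', 26 -> 'z', 27 -> 'aa'."""
    if n < 1:
        raise ValueError("alphabetic markers start at 1")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift to 0-based before taking each digit
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))

print([int_to_letters(n) for n in (1, 26, 27, 28, 52, 53, 702, 703)])
# ['a', 'z', 'aa', 'ab', 'az', 'ba', 'zz', 'aaa']
```

The "n - 1" shift is what makes the numbering bijective: there is no zero digit, so z is followed directly by aa rather than by a "ba"-style carry.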

The highest number which can be expressed as a roman numeral with ASCII letters
is 4999. After that you get overbars like V̅ 5000, X̅ 10000 or cute looking things
like IƆƆ 5000, CCIƆƆ 10000. I don't know how many letters the numerals up to
4999 can have, but you should hardly run into a problem on that account.

I wonder how high list markers would reasonably get in the wild. I'd
imagine both HTML and TeX (for example) have some limit, at least in practice,
but one would think lists so long as to run into technical problems are
exceedingly rare, so I can't imagine it would be a problem for the actual
limit to be implementation-dependent.

jgm commented

@bpj do you happen to know what the longest (character-length) roman numeral is?

bpj commented

@jgm I was mistaken that the highest number expressible in ASCII letters is 4999. If you use the subtractive model for 4, 9, 14, 19…, which most modern algorithms do, the limit is 3999, which is MMMCMXCIX, although the Romans themselves would have written 4000 as MMMM and 4999 as MMMMDCCCCLXXXXVIIII.
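The modern subtractive model is commonly implemented greedily over a table of value/symbol pairs. A minimal Python sketch (the helper name int_to_roman is hypothetical) shows why 3999 is the ceiling: there is no ASCII symbol above M, and this convention does not allow a fourth M.

```python
# Value/symbol pairs for the modern subtractive notation, largest first.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def int_to_roman(n: int) -> str:
    """Subtractive-form Roman numeral using only ASCII letters (1..3999)."""
    if not 1 <= n <= 3999:
        raise ValueError("ASCII subtractive numerals only cover 1..3999")
    out = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)  # how many times this symbol fits
        out.append(symbol * count)
    return "".join(out)

print(int_to_roman(3999))                  # MMMCMXCIX
print(int_to_roman(14), int_to_roman(19))  # XIV XIX
```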

If I am not mistaken, the longest possible numeral with ASCII letters under the modern subtractive model would be 3888, which is MMMDCCCLXXXVIII.
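This can be checked by brute force. The Python sketch below builds each numeral digit by digit from per-digit lookup tables (an alternative to the greedy approach) and searches 1..3999 for the longest one. Each decimal digit contributes at most 4 letters (VIII, LXXX, DCCC) and the thousands at most 3 (MMM), so 15 is the maximum and only 3888 reaches it.

```python
# Per-digit Roman forms in subtractive notation (index = digit value).
ONES = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
THOUSANDS = ["", "M", "MM", "MMM"]

def roman(n: int) -> str:
    """Subtractive Roman numeral for 1 <= n <= 3999, built digit by digit."""
    return (THOUSANDS[n // 1000] + HUNDREDS[n // 100 % 10]
            + TENS[n // 10 % 10] + ONES[n % 10])

longest = max(range(1, 4000), key=lambda n: len(roman(n)))
print(longest, roman(longest), len(roman(longest)))
# 3888 MMMDCCCLXXXVIII 15
```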

From 4000/5000 and up, once you start using inverted C’s, there isn’t really any limit, since at least in theory you just keep adding (inverted) C’s for each power ad infinitum. The highest number for which Unicode has a glyph is 100,000, which is ↈ or CCCIƆƆƆ, and that may well be a limit in practice. With the overline/vinculum, which multiplies the number it is placed over by 1,000, the highest single-letter number would be M̅ 1,000,000, although nothing stops you from writing, say, M̅M̅ for 2,000,000. The Romans would not likely ever have dealt with numbers that high, and for computing they would have used an abacus anyway, or possibly Milesian (Greek) numerals.

AFAIK the only ones who (still) use high roman numerals routinely are the BBC, who give the year (EDIT: of publication) in roman numerals in the end credits of their programmes. For example, 1988 will have been MCMLXXXVIII, which should be the longest they ever used, since 1998 is MCMXCVIII, which is two characters shorter.

hellux commented

> Pandoc last time I looked didn't allow alphabetic markers after z, so no aa, ab, ac, ....

Djot.js also seems to parse only single-letter markers.

> If I am not mistaken the longest possible with ASCII letters with the modern subtractive model would be 3888 which is MMMDCCCLXXXVIII.

So reasonable limits could be

  • 9 for decimal,
  • 1 for alphabetic and
  • 15 for roman numerals?
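A sketch of how these proposed limits could be checked purely by counting characters, with no number parsing, as the issue suggests. The function name and style labels are hypothetical; the caller is assumed to have already stripped the "." or ")" delimiter and decided whether an ambiguous marker such as i is alphabetic or roman. The roman branch checks only the letters used, not whether the numeral is well formed.

```python
# Proposed maximum marker lengths, in characters, per numbering style.
MAX_MARKER_LEN = {"decimal": 9, "alpha": 1, "roman": 15}

def within_limit(marker: str, style: str) -> bool:
    """True if the marker body is short enough for its style.

    Only the length and the character class are checked; the marker is
    never converted to a number.
    """
    if not marker or len(marker) > MAX_MARKER_LEN[style]:
        return False
    if style == "decimal":
        return all(c in "0123456789" for c in marker)  # ASCII digits only
    if style == "alpha":
        return marker.isascii() and marker.isalpha()
    # roman: all-lowercase or all-uppercase numeral letters
    return set(marker) <= set("ivxlcdm") or set(marker) <= set("IVXLCDM")

print(within_limit("999999999", "decimal"))      # True
print(within_limit("1000000000", "decimal"))     # False (10 digits)
print(within_limit("aa", "alpha"))               # False
print(within_limit("MMMDCCCLXXXVIII", "roman"))  # True (15 letters)
```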