multiformats/multibase

Please clarify "highest char/letter"

kevina opened this issue · 8 comments

According to rfc4648, 'z' (uppercase 'Z' in base32) is included in the base32, base64, and base64url alphabets, and yet we have this:

base32        U, u    rfc4648 - highest letter
base64        y       rfc4648 highest char
base64url     Y       rfc4648 highest char

I'm confused by what "highest letter/char" is supposed to mean.

It's just an indication that those chars happen to be the highest in that particular alphabet. These also include f (for hex). It's just a note of no spec importance.
(In reply to Kevin Atkinson's post of Tue, Aug 23, 2016 at 17:51.)

Except it is wrong: base32 includes 'Z' in its alphabet, which is higher than 'U':

                     Table 3: The Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

The same can be said for base64 and base64url: both alphabets contain 'Z' and 'z', so 'y' and 'Y' are not their highest letters either.
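To make the check concrete, here is a small Python sketch that writes out each RFC 4648 alphabet in value order and picks the letter with the highest code point. The alphabet strings are transcribed from RFC 4648 (hex is written lowercase to match the f code), `highest_letter` is a hypothetical helper, and "highest" is taken to mean "last by ASCII code point", which is only one possible reading of the original note.

```python
import string

# RFC 4648 alphabets in value order: the index of a character is its encoded value.
ALPHABETS = {
    # RFC 4648 lists base16 in uppercase; lowercase is used here to match the 'f' code.
    "base16": "0123456789abcdef",
    "base32": string.ascii_uppercase + "234567",
    "base64": string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/",
    "base64url": string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_",
}


def highest_letter(alphabet: str) -> str:
    """Return the letter with the highest code point among the alphabet's letters."""
    return max(c for c in alphabet if c.isalpha())


if __name__ == "__main__":
    for name, alphabet in ALPHABETS.items():
        # Also show the last symbol, i.e. the one with the highest encoded value.
        print(f"{name:10} highest letter: {highest_letter(alphabet)}  last symbol: {alphabet[-1]}")
```

Under this reading the sketch reports Z for base32 and z for both base64 variants, not U, y, or Y; only the hex entry (f) agrees with the note.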

@kevina hm, must be a typo. Want to PR a fix? (Just remove the note.)

I am trying to understand the logic behind the code assignments. Using "highest letter/char" breaks down after base 16. If it is still open for discussion I would like to propose the following alternative assignments that make a lot more sense to me.

base32        H, h    rfc4648 - useful for Humans and Hostnames
base64        m       rfc4648 - originates from the MIME content transfer encoding
base64url     u       rfc4648 - useful for URLs and filenames
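A prefix table only works if each code maps to exactly one encoding, so the proposed codes can be illustrated with a tiny encode/decode dispatcher. This is a hypothetical sketch built on Python's standard `base64` module using the H/h, m, and u codes proposed above; it is not the multibase reference implementation, the `encode`/`decode` names are illustrative, and it keeps `=` padding because the padding question is handled separately in #9.

```python
import base64

# Hypothetical prefix codes from the proposal in this thread; NOT the
# assignments in the multibase spec. "=" padding is kept throughout.
ENCODERS = {
    "H": lambda b: base64.b32encode(b).decode("ascii"),          # base32, uppercase
    "h": lambda b: base64.b32encode(b).decode("ascii").lower(),  # base32, lowercase
    "m": lambda b: base64.b64encode(b).decode("ascii"),          # base64, standard alphabet
    "u": lambda b: base64.urlsafe_b64encode(b).decode("ascii"),  # base64url
}

DECODERS = {
    "H": lambda s: base64.b32decode(s),
    "h": lambda s: base64.b32decode(s, casefold=True),  # accept lowercase base32
    "m": lambda s: base64.b64decode(s, validate=True),  # reject non-alphabet characters
    "u": lambda s: base64.urlsafe_b64decode(s),
}


def encode(code: str, data: bytes) -> str:
    """Encode `data` with the given code and prefix the result with that code."""
    return code + ENCODERS[code](data)


def decode(text: str) -> bytes:
    """Pick the decoder from the first character and decode the remainder."""
    if not text or text[0] not in DECODERS:
        raise ValueError(f"unknown or missing prefix in {text!r}")
    return DECODERS[text[0]](text[1:])


if __name__ == "__main__":
    for code in "Hhmu":
        s = encode(code, b"hello")
        print(s, decode(s))
```

Because the prefix is a single character, a base32 code meant to cover both cases uses up two slots (H and h), so the letter choices also interact with how case is handled.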
  • The "using highest letter" rule was just a convenience; it doesn't have to hold.
  • 👎 on semantic assignments like that. They only work in a subset of languages (English) and are obscure rather than obvious. It's not a good design principle.

@jbenet fair enough, but for base64url the 'u' is in the name, so I think it makes more sense to use that.

@kevina the point is to associate it with the other char for base64. If you want to change the table and give u/U to the base64s and something else to the base32s, I can consider it. But it loses the semantic differentiation you wanted (u for URL).

The choice of letters also involves how we handle padding, see #9.

I am going to close this issue.