proposal: Go 2: remove `byte` alias and always use `uint8`

Question

hajimehoshi opened this issue 7 years ago · 19 comments

IMO, using the byte alias doesn't hide the fact that the number is an unsigned 8-bit integer, and it doesn't change code readability. Rather, uint8 is more explicit and fits better with the Go way.

On the other hand, I don't have a strong opinion on rune, which is an alias for int32. I think rune makes code more readable to some extent.

Answer 1 · 2017-10-08T22:15:08.000Z

and this doesn't change code readability. Rather, uint8 is more explicit and fits more with Go way.

I disagree with this in particular

 Directory of C:\Go\src\uint8s

09/16/2017  12:29 PM    <DIR>          .
09/16/2017  12:29 PM    <DIR>          ..
08/24/2017  09:50 PM            14,753 buffer.go
08/24/2017  09:50 PM            15,880 buffer_test.go
08/24/2017  09:50 PM            20,308 uint8s.go
08/24/2017  09:50 PM             2,837 uint8s_amd64.go
08/24/2017  09:50 PM               869 uint8s_decl.go
08/24/2017  09:50 PM               980 uint8s_generic.go
08/24/2017  09:50 PM             2,845 uint8s_s390x.go
08/24/2017  09:50 PM            39,028 uint8s_test.go
08/24/2017  09:50 PM             4,597 compare_test.go
08/24/2017  09:50 PM             1,332 equal_test.go
08/24/2017  09:50 PM             7,245 example_test.go
08/24/2017  09:50 PM               310 export_test.go
08/24/2017  09:50 PM             3,422 reader.go
08/24/2017  09:50 PM             7,049 reader_test.go

cat buffer.go
// Read reads the next len(p) uint8s from the buffer or until the buffer
// is drained. The return value n is the number of uint8s read. If the
// buffer has no data to return, err is io.EOF (unless len(p) is zero);
// otherwise it is nil.
func (b *Buffer) Read(p []uint8) (n int, err error) {
        b.lastRead = opInvalid
        if b.off >= len(b.buf) {
                // Buffer is empty, reset to recover space.
                b.Reset()
                if len(p) == 0 {
                        return
                }
                return 0, io.EOF
        }
        n = copy(p, b.buf[b.off:])
        b.off += n
        if n > 0 {
                b.lastRead = opRead
        }
        return
}

Answer 2 · 2017-10-08T23:45:49.000Z

The purpose of the alias is to make it clear when one is using bytes as character string elements as opposed to small integers. It adds clarity to the code and should stay.

Answer 3 · 2017-10-09T00:15:08.000Z

@fcntl please remember this issue tracker is governed by the code of conduct. Please refrain from ad hominem attacks in the future.

Answer 4 · 2017-10-09T00:21:18.000Z

@fcntl

There's nothing crazy here. I think the idea had good intentions to simplify the language.

@hajimehoshi

I see from your repositories that you have quite a lot of experience with Go. Is there a particular section of code in stdlib or otherwise that made you think of this proposal? I more so would like to understand the reason you feel this change would make things more clear.

Answer 5 · 2017-10-09T04:47:51.000Z

The purpose of the alias is to make it clear when one is using bytes as character string elements as opposed to small integers. It adds clarity to the code and should stay.

Thank you, I didn't know about that intention. Should we follow the intention strictly? As io.Reader is a general byte stream, I was wondering why byte is used in io.Reader. Not all streams are for characters.

There are some usages that aren't related to characters, e.g. https://golang.org/src/image/ycbcr.go#L167

I disagree with this in particular

@as I'm not suggesting replacing file names or comments, so the code would be:

cat buffer.go
// Read reads the next len(p) bytes from the buffer or until the buffer
// is drained. The return value n is the number of bytes read. If the
// buffer has no data to return, err is io.EOF (unless len(p) is zero);
// otherwise it is nil.
func (b *Buffer) Read(p []uint8) (n int, err error) {
        b.lastRead = opInvalid
        if b.off >= len(b.buf) {
                // Buffer is empty, reset to recover space.
                b.Reset()
                if len(p) == 0 {
                        return
                }
                return 0, io.EOF
        }
        n = copy(p, b.buf[b.off:])
        b.off += n
        if n > 0 {
                b.lastRead = opRead
        }
        return
}

I see from your repositories that you have quite a lot of experience with Go. Is there a particular section of code in stdlib or otherwise that made you think of this proposal? I more so would like to understand the reason you feel this change would make things more clear.

Thank you. No, there isn't. I forgot in what situation I started to feel like byte is not needed. I wasn't sure in which situation byte is preferable until @robpike commented above. I'm still not sure why io.Reader takes byte, not uint8.

Another discussion of why I prefer uint8 to byte is here:
https://www.reddit.com/r/golang/comments/6i6xks/gomp3_an_mp3_decoder_in_pure_go/dj5b02x/

Answer 6 · 2017-10-09T05:38:31.000Z

I'm 👎 on this. byte is clearer than uint8, which is an implementation detail and people usually know it.

The solution is to have both types be transparently interchangeable, so that you can use a uint8 where Read takes a []byte. Have you tried with Go 1.9?

Answer 7 · 2017-10-09T05:40:59.000Z

The solution is to have both types be transparently interchangeable, so that you can use a uint8 where Read takes a []byte. Have you tried with Go 1.9?

Thank you for the opinion. byte was already an alias for uint8, and the two were interchangeable, before Go 1.9.
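A minimal sketch of that interchangeability, for illustration only. The helper name readInto is hypothetical; bytes.Buffer and its Read method are from the standard library. A slice declared as []uint8 is passed to a method declared with []byte, with no conversion:

```go
package main

import (
	"bytes"
	"fmt"
)

// readInto (hypothetical helper) reads up to n elements from src into a
// slice declared as []uint8. No conversion is needed because []uint8 and
// []byte are the same type.
func readInto(src string, n int) ([]uint8, error) {
	buf := bytes.NewBufferString(src)
	p := make([]uint8, n)
	m, err := buf.Read(p) // Read is declared as Read(p []byte)
	return p[:m], err
}

func main() {
	got, err := readInto("gopher", 3)
	fmt.Println(string(got), err) // prints "gop <nil>"
}
```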

Answer 8 · 2017-10-09T06:07:18.000Z

OK, so that's good news for you? It means that you can use whatever type you want instead of the one used in the stdlib?

The fact that byte is backed by a uint8 should be written in the documentation, and it is here.

What would it change to switch the type in the stdlib given that they are interchangeable? Most people won't even notice it.

Answer 9 · 2017-10-09T06:12:47.000Z

The fact that byte is backed by a uint8 should be written in the documentation, and it is here.

What would it change to switch the type in the stdlib given that they are interchangeable? Most people won't even notice it.

My intention is to make the Go spec a little simpler. As everyone knows byte is exactly the same as uint8, can't we say byte is redundant?

I didn't know when to use byte rather than uint8, or vice versa, until @robpike mentioned it, but does everyone know that?

Answer 10 · 2017-10-10T01:48:47.000Z

Whether they know it or not, it has long been documented and is easy to understand.

Answer 11 · 2017-10-10T02:37:57.000Z

Thanks, I found https://golang.org/pkg/builtin/#byte

byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.

I couldn't understand what the difference between byte and uint8 is here...

Answer 12 · 2017-10-10T02:42:09.000Z

couldn't understand what is the difference between byte and uint8 here...

In practice there is no difference. Conceptually a value of type uint8 can hold an integer between 0 and 255. A value of type byte represents an opaque 8-bit piece of data. Obviously there's a lot of overlap between these two definitions, that's why they are aliases of one another.
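To illustrate the "no difference in practice" point, here is a small sketch (typeName is a hypothetical helper, not part of any library). Both names report the same type at run time, and values move between them by plain assignment:

```go
package main

import "fmt"

// typeName (hypothetical helper) reports the type of v as fmt sees it.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var b byte = 'A'
	var u uint8 = b // plain assignment, no conversion required

	// Both variables have the identical type, which fmt names "uint8".
	fmt.Println(typeName(b), typeName(u), b == u) // prints "uint8 uint8 true"
}
```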

Answer 13 · 2017-10-10T02:44:45.000Z

I still feel like they are the same even conceptually, but I need to learn more. I appreciate your elaborating.

Answer 14 · 2017-10-10T03:02:01.000Z

IMO, strictly speaking, uint8 sits on an axis of widths that leads on to uint16 and uint32. It also sits on a signed/unsigned axis, as in uint8/int8. If you want the name of a function or variable to express that directionality, you will provide APIs like the ones below and use uint8 instead of byte.

func ReadUint8() uint8 { ... }
func ReadUint16() uint16 { ... }
func ReadUint32() uint32 { ... }

If you are just reading a buffer, you use byte instead of uint8. In many cases, ReadByte is enough to read a stream.

func ReadByte() byte { ... }

We can choose between the names according to whether the value has directionality or not.

Answer 15 · 2017-10-10T04:11:00.000Z

@robpike has already said most of this very succinctly. I'm elaborating here to drive the point home once more:

@hajimehoshi First and foremost, byte and uint8 are simply type names for a predeclared type. We all know that this type represents the set of all unsigned 8-bit integer values represented in two's complement arithmetic - next to a bit probably the most basic data type in computing.

The only way to refer to this type is by giving it a name (there's no way to construct it from more basic things). In Go we decided from the start to give this type two names, byte and uint8. The reason was not to sow confusion but to have a choice: Sometimes we want to emphasize the byte nature (usually when we talk about the space consumed, or data); sometimes we want to emphasize the integer nature, a small number with which we do arithmetic. That is, the name is a simple (if primitive) way to express more meaning in the code.

We do this everywhere in programming: For instance we may call a struct{ x, y float64 } a Point rather than a Pair or a Tuple because we want to express the fact that we're dealing with a point in a 2D coordinate system (for instance). And so forth.

It just so happens that byte is often what we mean when we have an 8-bit unsigned integer, so it's nice to give it a good name.

I'm not saying people are doing this consistently, or even should follow this as a hard and fast rule. Guidelines can only go so far - good naming requires experience and is more art than science. So to answer your question explicitly: No, there should be no rule to be followed strictly.

You yourself mention that you don't see a problem with int32 and rune. The exact same thing is going on there as with byte and uint8. It just may be the case that more often than not people do arithmetic with bytes than with runes, which perhaps colors your impression (I'm speculating here).

Finally, and especially now that the language supports type aliases as a first-class construct, there's really no complexity to speak of here, neither in the implementation nor the spec.

In closing, there doesn't seem to be anything to gain here by removing byte from the language. Most likely you will just find people defining type byte = uint8 all over the place. This doesn't seem to be worth it.
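As a rough illustration of the alias declaration mentioned above, the sketch below declares a user-defined alias with the Go 1.9 syntax. The names octet and countZeros are hypothetical and chosen for the example; bytes.Count is the standard library function.

```go
package main

import (
	"bytes"
	"fmt"
)

// octet is a hypothetical user-defined alias, declared with the alias
// syntax added in Go 1.9. It names the same type as uint8 and byte.
type octet = uint8

// countZeros (hypothetical) accepts []octet and hands it to bytes.Count,
// which is declared with []byte parameters, without any conversion.
func countZeros(data []octet) int {
	return bytes.Count(data, []byte{0})
}

func main() {
	fmt.Println(countZeros([]octet{0, 1, 0, 2})) // prints "2"
}
```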

I'm against this proposal. There are bigger fish to fry.

Answer 16 · 2017-10-14T18:41:46.000Z

I'm not saying people are doing this consistently, or even should follow this as a hard and fast rule. Guidelines can only go so far - good naming requires experience and is more art than science. So to answer your question explicitly: No, there should be no rule to be followed strictly.

Well, that's the point. As there is no strict rule here compared to rune, my suggestion was to unify them to uint8 and avoid bike-shedding. Now I am starting to be convinced, though not fully yet, that they have different contexts.

You yourself mention that you don't see a problem with int32 and rune. The exact same thing is going on there as with byte and uint8. It just may be the case that more often than not people do arithmetic with bytes than with runes, which perhaps colors your impression (I'm speculating here).

Right. Probably I'd be fine even if rune and int32 were not interchangeable and required explicit conversion. However, I'd not be happy if byte and uint8 were not interchangeable.

Answer 17 · 2017-10-16T22:19:02.000Z

@hajimehoshi Before removing byte, I'd probably remove uint8. The byte type is just much more common. I'd use the uint8 type for variables that really are integers but where we know that we only need small values and thus can save space. The byte type on the other hand is fundamentally the smallest data type we can address directly with a pointer. We really want to keep that name, it's a good name.

As I said above, it's all about being able to choose a fitting name depending on context.

Answer 18 · 2018-01-14T04:56:18.000Z

Perhaps drop all of uint8, uint16, uint32 and uint64 and replace them with the names byte, byte2, byte4, and byte8. Many times we use an intxx instead of a uintxx anyway to represent positive-only numbers, e.g. runes, which can't be negative.

Perhaps there could even be an optimized conversion between bytex and byte[x] arrays.
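For comparison, a conversion of this kind can already be written with the standard encoding/binary package. The sketch below is an assumption-laden illustration, not the proposed feature: toBytes4 and fromBytes4 are hypothetical helpers, and little-endian order is chosen arbitrarily.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toBytes4 (hypothetical) encodes v into a 4-byte array in little-endian
// order. The byte order is an assumption for this sketch.
func toBytes4(v uint32) [4]byte {
	var out [4]byte
	binary.LittleEndian.PutUint32(out[:], v)
	return out
}

// fromBytes4 (hypothetical) decodes a little-endian 4-byte array back
// into a uint32.
func fromBytes4(b [4]byte) uint32 {
	return binary.LittleEndian.Uint32(b[:])
}

func main() {
	b := toBytes4(0x01020304)
	fmt.Println(b, fromBytes4(b) == 0x01020304) // prints "[4 3 2 1] true"
}
```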

Answer 19 · 2018-03-21T21:25:54.000Z

There is little support for this proposal. Declined.