proposal: Go 2: remove `byte` alias and always use `uint8`
hajimehoshi opened this issue · 19 comments
IMO, using the byte alias doesn't hide the fact that the number is an unsigned 8-bit integer, and this doesn't change code readability. Rather, uint8 is more explicit and fits the Go way better.
On the other hand, I don't have a strong opinion on rune, which is an alias for int32. I think rune makes code readable to some extent.
and this doesn't change code readability. Rather, uint8 is more explicit and fits the Go way better.
I disagree with this in particular
Directory of C:\Go\src\uint8s
09/16/2017 12:29 PM <DIR> .
09/16/2017 12:29 PM <DIR> ..
08/24/2017 09:50 PM 14,753 buffer.go
08/24/2017 09:50 PM 15,880 buffer_test.go
08/24/2017 09:50 PM 20,308 uint8s.go
08/24/2017 09:50 PM 2,837 uint8s_amd64.go
08/24/2017 09:50 PM 869 uint8s_decl.go
08/24/2017 09:50 PM 980 uint8s_generic.go
08/24/2017 09:50 PM 2,845 uint8s_s390x.go
08/24/2017 09:50 PM 39,028 uint8s_test.go
08/24/2017 09:50 PM 4,597 compare_test.go
08/24/2017 09:50 PM 1,332 equal_test.go
08/24/2017 09:50 PM 7,245 example_test.go
08/24/2017 09:50 PM 310 export_test.go
08/24/2017 09:50 PM 3,422 reader.go
08/24/2017 09:50 PM 7,049 reader_test.go
cat buffer.go
// Read reads the next len(p) uint8s from the buffer or until the buffer
// is drained. The return value n is the number of uint8s read. If the
// buffer has no data to return, err is io.EOF (unless len(p) is zero);
// otherwise it is nil.
func (b *Buffer) Read(p []uint8) (n int, err error) {
	b.lastRead = opInvalid
	if b.off >= len(b.buf) {
		// Buffer is empty, reset to recover space.
		b.Reset()
		if len(p) == 0 {
			return
		}
		return 0, io.EOF
	}
	n = copy(p, b.buf[b.off:])
	b.off += n
	if n > 0 {
		b.lastRead = opRead
	}
	return
}
The purpose of the alias is to make it clear when one is using bytes as character string elements as opposed to small integers. It adds clarity to the code and should stay.
@fcntl please remember this issue tracker is governed by the code of conduct. Please refrain from ad hominem attacks in the future.
There's nothing crazy here. I think the idea was well-intentioned: to simplify the language.
I see from your repositories that you have quite a lot of experience with Go. Is there a particular section of code in the stdlib or elsewhere that made you think of this proposal? Mostly I would like to understand why you feel this change would make things clearer.
The purpose of the alias is to make it clear when one is using bytes as character string elements as opposed to small integers. It adds clarity to the code and should stay.
Thank you, I didn't know that intention. Should we follow the intentions strictly? As io.Reader is a general byte stream, I was wondering why byte is used in io.Reader. Not all streams are for characters.
There are some usages that aren't related to characters. e.g. https://golang.org/src/image/ycbcr.go#L167
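As a hypothetical illustration of a non-character use (this is not the code at the linked ycbcr.go line), the sketch below treats a []uint8 as 8-bit pixel intensities and does arithmetic on them. The function name brighten and the clamping rule are assumptions made for the example.

```go
package main

import "fmt"

// brighten is a hypothetical helper: it adds delta to every 8-bit
// intensity in pix, clamping to [0, 255] instead of letting uint8
// arithmetic wrap around. pix is a buffer of small integers, not text.
func brighten(pix []uint8, delta int) {
	for i, v := range pix {
		x := int(v) + delta // widen to int so the sum cannot wrap
		if x > 255 {
			x = 255
		} else if x < 0 {
			x = 0
		}
		pix[i] = uint8(x)
	}
}

func main() {
	pix := []uint8{10, 128, 250}
	brighten(pix, 10)
	fmt.Println(pix) // [20 138 255]
}
```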
I disagree with this in particular
@as I'm not suggesting replacing file names or comments, so the code would be:
cat buffer.go
// Read reads the next len(p) bytes from the buffer or until the buffer
// is drained. The return value n is the number of bytes read. If the
// buffer has no data to return, err is io.EOF (unless len(p) is zero);
// otherwise it is nil.
func (b *Buffer) Read(p []uint8) (n int, err error) {
	b.lastRead = opInvalid
	if b.off >= len(b.buf) {
		// Buffer is empty, reset to recover space.
		b.Reset()
		if len(p) == 0 {
			return
		}
		return 0, io.EOF
	}
	n = copy(p, b.buf[b.off:])
	b.off += n
	if n > 0 {
		b.lastRead = opRead
	}
	return
}
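Because byte is an alias, a []uint8 can already be passed wherever a []byte is expected, so the rewritten signature above is type-identical to the original one. A minimal sketch using the real (*bytes.Buffer).Read; the helper readPrefix is hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
)

// readPrefix (hypothetical) reads up to n bytes of s through a
// bytes.Buffer. The destination is declared as []uint8 and passed
// unchanged to Read, whose parameter is declared as []byte.
func readPrefix(s string, n int) (string, error) {
	buf := bytes.NewBufferString(s)
	p := make([]uint8, n)
	got, err := buf.Read(p) // no conversion: []uint8 and []byte are identical
	return string(p[:got]), err
}

func main() {
	s, err := readPrefix("hello", 3)
	fmt.Println(s, err) // hel <nil>
}
```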
I see from your repositories that you have quite a lot of experience with Go. Is there a particular section of code in the stdlib or elsewhere that made you think of this proposal? Mostly I would like to understand why you feel this change would make things clearer.
Thank you. No, there isn't. I forget in what situation I started to feel that byte is not needed. I wasn't sure in which situations byte is preferable until @robpike commented above. I'm still not sure why io.Reader takes byte, not uint8.
Another discussion of why I prefer uint8 to byte:
https://www.reddit.com/r/golang/comments/6i6xks/gomp3_an_mp3_decoder_in_pure_go/dj5b02x/
I'm 👎 on this. byte is clearer than uint8, which is an implementation detail, and people usually know it.
The solution is to have both types be transparently interchangeable, so that you can use a uint8 where Read takes a []byte. Have you tried it with Go 1.9?
The solution is to have both types be transparently interchangeable, so that you can use a uint8 where Read takes a []byte. Have you tried it with Go 1.9?
Thank you for the opinion. byte is already an alias for uint8, and the two were interchangeable even before Go 1.9.
OK, so that's good news for you? It means you can use whichever type you want instead of the one used in the stdlib?
The fact that byte is backed by a uint8 should be written in the documentation, and it is.
What would switching the type in the stdlib change, given that they are interchangeable? Most people won't even notice it.
The fact that byte is backed by a uint8 should be written in the documentation, and it is.
What would switching the type in the stdlib change, given that they are interchangeable? Most people won't even notice it.
My intention is to make the Go spec a little simpler. As everyone knows byte is exactly the same as uint8, can't we say byte is redundant?
I didn't know when to use byte over uint8 and vice versa until @robpike mentioned it, but does everyone know that?
Whether they know it or not, it has long been documented and is easy to understand.
Thanks, I found https://golang.org/pkg/builtin/#byte
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
I couldn't understand what the difference between byte and uint8 is here...
I couldn't understand what the difference between byte and uint8 is here...
In practice there is no difference. Conceptually, a value of type uint8 can hold an integer between 0 and 255. A value of type byte represents an opaque 8-bit piece of data. Obviously there's a lot of overlap between these two definitions; that's why they are aliases of one another.
I still feel like they are the same even conceptually, but I need to learn more. I appreciate you elaborating.
IMO, in a strict sense, uint8 has a direction toward uint16 and uint32. It also has a signed/unsigned direction, as in uint8/int8. If you want the name of a function or variable to express that direction, as in the APIs below, you will use uint8 instead of byte.
func ReadUint8() uint8 { ... }
func ReadUint16() uint16 { ... }
func ReadUint32() uint32 { ... }
If you just read a buffer, you use byte instead of uint8. In many cases, ReadByte is enough to read a stream.
func ReadByte() byte { ... }
We can choose the name based on whether or not it has directionality.
@robpike has already said most of this very succinctly. I'm elaborating here to drive the point home once more:
@hajimehoshi First and foremost, byte and uint8 are simply type names for a predeclared type. We all know that this type represents the set of all unsigned 8-bit integer values represented in two's complement arithmetic. Next to a bit, it is probably the most basic data type in computing.
The only way to refer to this type is by giving it a name (there's no way to construct it from more basic things). In Go we decided from the start to give this type two names, byte and uint8. The reason was not to sow confusion but to have a choice: sometimes we want to emphasize the byte nature (usually when we talk about the space consumed, or data); sometimes we want to emphasize the integer nature, a small number with which we do arithmetic. That is, the name is a simple (if primitive) way to express more meaning in the code.
We do this everywhere in programming: for instance, we may call a struct{ x, y float64 } a Point rather than a Pair or a Tuple because we want to express the fact that we're dealing with a point in a 2D coordinate system (for instance). And so forth.
It just so happens that byte is often what we mean when we have an 8-bit unsigned integer, so it's nice to give it a good name.
I'm not saying people are doing this consistently, or even should follow this as a hard and fast rule. Guidelines can only go so far - good naming requires experience and is more art than science. So to answer your question explicitly: No, there should be no rule to be followed strictly.
You yourself mention that you don't see a problem with int32 and rune. The exact same thing is going on there as with byte and uint8. It just may be the case that people do arithmetic with bytes more often than with runes, which perhaps colors your impression (I'm speculating here).
Finally, and especially now that the language supports type aliases as a first-class construct, there's really no complexity to speak of here, neither in the implementation nor the spec.
In closing, there doesn't seem to be anything to gain by removing byte from the language. Most likely you will just find people defining type byte = uint8 all over the place. This doesn't seem worth it.
I'm against this proposal. There's bigger fish to fry.
I'm not saying people are doing this consistently, or even should follow this as a hard and fast rule. Guidelines can only go so far - good naming requires experience and is more art than science. So to answer your question explicitly: No, there should be no rule to be followed strictly.
Well, that's the point. As there is no strict rule here compared to rune, my suggestion was to unify them as uint8 and avoid bike-shedding. Now I'm starting to be convinced, though not fully yet, that they have different contexts.
You yourself mention that you don't see a problem with int32 and rune. The exact same thing is going on there as with byte and uint8. It just may be the case that people do arithmetic with bytes more often than with runes, which perhaps colors your impression (I'm speculating here).
Right. I'd probably be fine even if rune and int32 were not interchangeable and required explicit conversion. However, I wouldn't be happy if byte and uint8 were not interchangeable.
@hajimehoshi Before removing byte, I'd probably remove uint8. The byte type is just much more common. I'd use the uint8 type for variables that really are integers but where we know that we only need small values and thus can save space. The byte type, on the other hand, is fundamentally the smallest data type we can address directly with a pointer. We really want to keep that name; it's a good name.
As I said above, it's all about being able to choose a fitting name depending on context.
Perhaps drop all of uint8, uint16, uint32 and uint64 and replace them with the names byte, byte2, byte4 and byte8. Many times we use an intxx instead of a uintxx anyway to represent positive-only numbers, e.g. runes, which can't be negative.
Perhaps there could even be an optimized conversion between bytex and byte[x] arrays.
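Go has no byte2/byte4/byte8 types, but the suggested conversion between an integer and a byte array can be approximated today with the standard encoding/binary package. The helpers toBytes4 and fromBytes4 are hypothetical, and little-endian order is an assumption, since the comment does not specify one.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toBytes4 (hypothetical) converts a 4-byte integer to a [4]byte array
// in little-endian order.
func toBytes4(v uint32) [4]byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], v)
	return b
}

// fromBytes4 (hypothetical) is the inverse of toBytes4.
func fromBytes4(b [4]byte) uint32 {
	return binary.LittleEndian.Uint32(b[:])
}

func main() {
	b := toBytes4(0x01020304)
	fmt.Println(b)                           // [4 3 2 1]
	fmt.Println(fromBytes4(b) == 0x01020304) // true
}
```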
There is little support for this proposal. Declined.