dotnet/runtime

[Analyzer Proposal]: Analyzer/fixer for converting Stream.Read calls to ReadAtLeast and ReadExactly

eerhardt opened this issue ยท 16 comments

With the addition of the new Stream ReadAtLeast and ReadExactly methods, there is a pattern of mistakes we can catch with an analyzer and suggest the caller use the new APIs. For example:

stream.Read(buffer, 0, buffer.Length);

where the return value isn't checked, and recommend it be changed to a call to ReadExactly.

Or cases where the return value is checked but required to be the same as what was requested:

if (stream.Read(buffer, 0, buffer.Length) != buffer.Length) throw ...;

and similarly suggest it use ReadExactly.

If the caller is using a length that is different than the length of the buffer being passed in, and not checking the return value, we can suggest to use ReadAtLeast:

stream.Read(buffer, 0, count);

would become:

stream.ReadAtLeast(buffer, count);

However, note in the ReadAtLeast case, the caller should still be checking the return value because the ReadAtLeast call could be returning more than count bytes, and this data would be missed if the caller only consumed count bytes from buffer.

One problem with this analyzer might be when we implement #34098. If the caller saw the "Do not discard results of methods marked as DoNotIgnoreReturnValue" warning first and addressed it, they would lose out on this fixer.

cc @stephentoub

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

With the addition of the new Stream ReadAtLeast and ReadExactly methods, there is a pattern of mistakes we can catch with an analyzer and suggest the caller use the new APIs. For example:

stream.Read(buffer, 0, buffer.Length);

where the return value isn't checked, and recommend it be changed to a call to ReadExactly.

Or cases where the return value is checked but required to be the same as what was requested:

if (stream.Read(buffer, 0, buffer.Length) != buffer.Length) throw ...;

and similarly suggest it use ReadExactly.

If the caller is using a length that is different than the length of the buffer being passed in, and not checking the return value, we can suggest to use ReadAtLeast:

stream.Read(buffer, 0, count);

would become:

stream.ReadAtLeast(buffer, count);

However, note in the ReadAtLeast case, the caller should still be checking the return value because the ReadAtLeast call could be returning more than count bytes, and this data would be missed if the caller only consumed count bytes from buffer.

One problem with this analyzer might be when we implement #34098. If the caller saw the "Do not discard results of methods marked as DoNotIgnoreReturnValue" warning first and addressed it, they would lose out on this fixer.

cc @stephentoub

Author: eerhardt
Assignees: -
Labels:

api-suggestion, area-System.IO, code-analyzer, code-fixer

Milestone: -

One problem with this analyzer might be when we implement #34098. If the caller saw the "Do not discard results of methods marked as DoNotIgnoreReturnValue" warning first and addressed it, they would lose out on this fixer.

I don't think these conflict, as the methods aren't pure (changing state of the stream) and won't be marked as such?

[edit] I was pointed at the fact that the referenced issue is no longer about purity alone

@jeffhandley @stephentoub with 3.1 EOL in 3 months, and based on some of the issues posted here recently, we can expect a number of folks migrating before support expires. Some of them will hit this and get a hang in their app. It might take a while to connect to the breaking change notice because there isn't an obvious "smell" (the doc doesn't contain the word "hang" for example). Once they've made that connection, they may have many places in their codebase that need auditing for the bug and fixing.

An analyzer/fixer could help with both those hurdles. I wonder whether we should start now to front load getting one into the SDK ..

The intent had been that this be addressed in .NET 7. It unfortunately got pushed out due to lack of time. We should tackle it soon.

Note that another analyzer that would help in this case is #34098. Code that doesn't capture the result of Stream.Read or BinaryReader.Read would get flagged by that analyzer as well (along with other APIs that shouldn't have their results ignored).

However, the downfall of #34098 is that the necessary attribute didn't get added to net7.0, so we can't mark the APIs as "do not discard" yet.

Estimates:

  • Analyzer: Medium
  • Fixer: Medium

Video

Looks good as proposed.

The diagnostic (possibly with a different diagnostic ID) should still fire on older TFMs to report that the call made to Read is unreliable as written.

Category: Reliability
Severity: Warning

I would like to be assigned to this one, please.

I would like to be assigned to this one, please.

Terrific. Done. Thanks!

What is the reason for suggesting to call ReadAtLeast instead of ReadExactly when the length is different from the buffer size?
Is it due to the difference in how the end of the stream is handled in ReadAtLeast (but wouldn't this also apply to the first case)?

What is the reason for suggesting to call ReadAtLeast instead of ReadExactly when the length is different from the buffer size? Is it due to the difference in how the end of the stream is handled in ReadAtLeast (but wouldn't this also apply to the first case)?

I'm actually not sure now... @eerhardt, do you remember what you had in mind there?

@stephentoub I think the initial post is based on a comment from you when discussing the implementation of ReadExactly/ReadAtLeast.

One of the reasons why I am following up on the second code fix is because I am not sure how we can enforce the following:

However, note in the ReadAtLeast case, the caller should still be checking the return value because the ReadAtLeast call could be returning more than count bytes, and this data would be missed if the caller only consumed count bytes from buffer.

The code is still unreliable (in a different way) if we just change the call to ReadAtLeast. We could generate something like var bytesRead = stream.ReadAtLeast(buffer, count); to encourage the user to use the return value.

in fact, if I read this right, if you change from Read to ReadAtLeast you may introduce additional bugs because now you can overwrite data past the original count being passed in? it might be best to always suggest ReadExactly when the return value isn't checked in the first place? maybe the description of the diagnostic can mention the other options

Thinking about this now, I think ReadExactly is the only correct suggestion here. It's the only method you can call without checking the return value, as it is the only method that ensures you got exactly count bytes. When calling ReadAtLeast, you still need to check the return value in case of EOS (when throwOnEndOfStream = false), or when more data is read than minimumBytes.

Yeah, for the life of me I can't think of why I wrote ReadAtLeast in my original comment. It should be ReadExact.

Thank you all for helping to clarify ๐Ÿ˜ƒ.