microsoft/referencesource

HttpWebRequest BeginRead different behaviour on .NET Core vs Framework

thenik opened this issue · 5 comments

Hello,

If you run this code sample
https://learn.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.begingetresponse?view=net-7.0

on .NET Framework 4.6.2 or 4.7.2, the BeginRead callback runs on a ThreadPool thread, which is the standard behaviour I expect.
But on .NET Core 2.2, 3.x and above, the BeginRead callback runs on the caller's stack. That looks like wrong behaviour, because when you load a big file the nested callbacks end in a StackOverflowException.

So the question is: how can I run BeginRead for the Stream from GetResponseStream on the ThreadPool on .NET 6/7?

Here is my small modification of the original sample code; it triggers the stack overflow on .NET 6 and .NET 7:

using System;
using System.Net;
using System.IO;
using System.Text;
using System.Threading;

public class RequestState
{
    // This class stores the State of the request.
    const int BUFFER_SIZE = 1;//1024;
    public StringBuilder requestData;
    public byte[] BufferRead;
    public HttpWebRequest request;
    public HttpWebResponse response;
    public Stream streamResponse;
    public RequestState()
    {
        BufferRead = new byte[BUFFER_SIZE];
        requestData = new StringBuilder("");
        request = null;
        streamResponse = null;
    }
}

class HttpWebRequest_BeginGetResponse
{
    public static ManualResetEvent allDone = new ManualResetEvent(false);
    const int BUFFER_SIZE_ONE = 1;//1024;
    const int DefaultTimeout = 2 * 60 * 1000; // 2 minutes timeout

    // Abort the request if the timer fires.
    private static void TimeoutCallback(object state, bool timedOut)
    {
        if (timedOut)
        {
            HttpWebRequest request = state as HttpWebRequest;
            if (request != null)
            {
                request.Abort();
            }
        }
    }

    static void Main()
    {

        try
        {
            // Create a HttpWebrequest object to the desired URL.
            HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create("https://mydataprovider.com");

            /**
              * If you are behind a firewall and you do not have your browser proxy setup
              * you need to use the following proxy creation code.

                // Create a proxy object.
                WebProxy myProxy = new WebProxy();

                // Associate a new Uri object to the _wProxy object, using the proxy address
                // selected by the user.
                myProxy.Address = new Uri("http://myproxy");


                // Finally, initialize the Web request object proxy property with the _wProxy
                // object.
                myHttpWebRequest.Proxy=myProxy;
              ***/
            // Create an instance of the RequestState and assign the previous myHttpWebRequest
            // object to its request field.
            RequestState myRequestState = new RequestState();
            myRequestState.request = myHttpWebRequest;

            // Start the asynchronous request.
            IAsyncResult result =
              (IAsyncResult)myHttpWebRequest.BeginGetResponse(new AsyncCallback(RespCallback), myRequestState);

            // this line implements the timeout, if there is a timeout, the callback fires and the request becomes aborted
            ThreadPool.RegisterWaitForSingleObject(result.AsyncWaitHandle, new WaitOrTimerCallback(TimeoutCallback), myHttpWebRequest, DefaultTimeout, true);

            // The response came in the allowed time. The work processing will happen in the
            // callback function.
            allDone.WaitOne();

            // Release the HttpWebResponse resource.
            myRequestState.response.Close();
        }
        catch (WebException e)
        {
            Log("\nMain Exception raised!");
            Log("\nMessage:{0}", e.Message);
            Log("\nStatus:{0}", e.Status);
            Log("Press any key to continue..........");
        }
        catch (Exception e)
        {
            Log("\nMain Exception raised!");
            Log("Source :{0} ", e.Source);
            Log("Message :{0} ", e.Message);
            Log("Press any key to continue..........");
            //Console.Read();
        }

        Log("finished");
    }
    private static void RespCallback(IAsyncResult asynchronousResult)
    {
        try
        {
            // State of request is asynchronous.
            RequestState myRequestState = (RequestState)asynchronousResult.AsyncState;
            HttpWebRequest myHttpWebRequest = myRequestState.request;
            myRequestState.response = (HttpWebResponse)myHttpWebRequest.EndGetResponse(asynchronousResult);

            // Read the response into a Stream object.
            Stream responseStream = myRequestState.response.GetResponseStream();
            myRequestState.streamResponse = responseStream;

            // Begin the Reading of the contents of the HTML page and print it to the console.
            IAsyncResult asynchronousInputRead = responseStream.BeginRead(myRequestState.BufferRead, 0, BUFFER_SIZE_ONE, new AsyncCallback(ReadCallBack), myRequestState);
            return;
        }
        catch (WebException e)
        {
            Log("\nRespCallback Exception raised!");
            Log("\nMessage:{0}", e.Message);
            Log("\nStatus:{0}", e.Status);
        }
        allDone.Set();
    }
    private static void ReadCallBack(IAsyncResult asyncResult)
    {
        try
        {

            RequestState myRequestState = (RequestState)asyncResult.AsyncState;
            Stream responseStream = myRequestState.streamResponse;
            int read = responseStream.EndRead(asyncResult);
            // Read the HTML page and then print it to the console.
            if (read > 0)
            {
                myRequestState.requestData.Append(Encoding.ASCII.GetString(myRequestState.BufferRead, 0, read));
                IAsyncResult asynchronousResult = responseStream.BeginRead(myRequestState.BufferRead, 0, BUFFER_SIZE_ONE, new AsyncCallback(ReadCallBack), myRequestState);
                return;
            }
            else
            {
                Log("\nThe contents of the Html page are : ");
                if (myRequestState.requestData.Length > 1)
                {
                    string stringContent;
                    stringContent = myRequestState.requestData.ToString();
                    Log(stringContent);
                }
                Log("Press any key to continue..........");
                //Console.ReadLine();

                responseStream.Close();
            }
        }
        catch (WebException e)
        {
            Log("\nReadCallBack Exception raised!");
            Log("\nMessage:{0}", e.Message);
            Log("\nStatus:{0}", e.Status);
        }
        catch(Exception e2)
        {
            Log(e2.Message);
        }
        allDone.Set();
    }

    static void Log(string m)
    {
        Console.WriteLine(m);
    }

    static void Log(string m, string m2)
    {
        Log(m + " - " + m2);
    }

    static void Log(string m, WebExceptionStatus m2)
    {
        Log(m + " - " + m2);
    }
}

That code sample is buggy and does not correctly follow the APM pattern. BeginXx methods may complete the operation synchronously, which is why the IAsyncResult.CompletedSynchronously property exists. The callback needs to check that property and, if it's true, exit immediately and let the BeginXx call site perform the continuation instead. Otherwise, a string of synchronously completing operations may stack dive. The docs should be fixed, but this isn't a bug in .NET Core, nor is it configurable. .NET Core is simply faster, so operations are much more likely to complete synchronously.
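For anyone landing here later, the pattern described above could look roughly like the sketch below. It is only an illustration, not the corrected docs sample: the names ApmReadLoop, ReadState, Pump and ProcessRead are made up for this example, and it reads from any Stream so it can be tried without a network call. The call site loops while reads complete synchronously, and the callback returns at once in that case, so the stack does not grow.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

// Hypothetical helper, not part of the original sample.
public static class ApmReadLoop
{
    private sealed class ReadState
    {
        public readonly Stream Stream;
        public readonly byte[] Buffer = new byte[1]; // tiny on purpose, like the repro
        public readonly StringBuilder Data = new StringBuilder();
        public readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);
        public Exception Error;

        public ReadState(Stream stream) { Stream = stream; }
    }

    // Blocks until the whole stream has been read through BeginRead/EndRead.
    public static string ReadAll(Stream stream)
    {
        var state = new ReadState(stream);
        Pump(state);
        state.Done.Wait();
        if (state.Error != null) throw state.Error;
        return state.Data.ToString();
    }

    // Call site: keeps issuing reads in a loop while they complete synchronously.
    private static void Pump(ReadState state)
    {
        try
        {
            while (true)
            {
                IAsyncResult ar = state.Stream.BeginRead(
                    state.Buffer, 0, state.Buffer.Length, ReadCallback, state);

                // Truly asynchronous: the callback continues later on another thread.
                if (!ar.CompletedSynchronously) return;

                // Completed synchronously: finish it here, then loop instead of recursing.
                if (!ProcessRead(state, ar)) return;
            }
        }
        catch (Exception e)
        {
            state.Error = e;
            state.Done.Set();
        }
    }

    private static void ReadCallback(IAsyncResult ar)
    {
        // The BeginRead call site handles synchronous completions.
        if (ar.CompletedSynchronously) return;

        var state = (ReadState)ar.AsyncState;
        try
        {
            if (ProcessRead(state, ar)) Pump(state);
        }
        catch (Exception e)
        {
            state.Error = e;
            state.Done.Set();
        }
    }

    // Ends one read; returns false (and signals completion) at end of stream.
    private static bool ProcessRead(ReadState state, IAsyncResult ar)
    {
        int read = state.Stream.EndRead(ar);
        if (read > 0)
        {
            state.Data.Append(Encoding.ASCII.GetString(state.Buffer, 0, read));
            return true;
        }
        state.Done.Set();
        return false;
    }
}
```

In the sample from the issue, the same idea would apply to ReadCallBack and to the BeginRead call in RespCallback, for example by passing the stream from GetResponseStream to something like ApmReadLoop.ReadAll.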

Thanks, got it.
It is clear how it works now =)
Funny that this should have worked the same way since .NET 4.5, but in practice I do not see it even on .NET 4.8.
Only .NET Core returns CompletedSynchronously == true.

I also found your comment at dotnet/runtime#29024,
and it tells the same story.

If anyone works at Microsoft or knows somebody who works at Microsoft,
please ask them to fix these samples =) because they are wrong now.

For this case, the most relevant reference is
https://learn.microsoft.com/en-us/dotnet/api/system.iasyncresult.completedsynchronously?view=net-7.0

where you can find:

Notes to Callers
Use this property to determine if the asynchronous operation completed synchronously. For example, this property can return true for an asynchronous I/O operation if the I/O request was small.

I hope somebody else finds this via Google too.
Thanks for your help again!
Best regards.
Nik

svick commented

@thenik

If anyone works at Microsoft or knows somebody who works at Microsoft,
please ask them to fix these samples =) because they are wrong now.

Most things are open now in .NET, so we don't have to rely on Microsoft employees. Namely:

  1. Anyone can open an issue, making sure the problem is properly tracked. I just did that at dotnet/dotnet-api-docs#8662.
  2. Anyone can open a Pull Request, fixing the documentation. If you want to do that, click the pencil icon on the documentation page, then click another pencil icon on the opened GitHub page.

@thenik I wonder what is the reason you are using HttpWebRequest and not HttpClient?

@antonfirsov it is an old legacy project (https://mydataprovider.com web scraper).
It started on Microsoft .NET Framework 2.0.
I am migrating it to .NET 6, so it is hard to switch to HttpClient.
Additionally, in real life I have often seen
WebClient, HttpClient and HttpWebRequest not work as expected;
all these classes can hang under high load.
So if you want reliable software (a web scraper), you have to manage all HTTP requests yourself.
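For a legacy code base that has to stay on HttpWebRequest, one possible lower-effort option might be to keep the request class but replace the BeginRead/EndRead callbacks with an awaited read loop. The sketch below is an assumption about how that could look, not something proposed in this thread: AsyncDownload, ReadAllAsync and DownloadAsync are hypothetical names, and on .NET 6+ WebRequest.Create compiles with an obsolescence warning. An awaited loop simply continues in place when a read completes synchronously, so it should not stack dive.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original sample.
public static class AsyncDownload
{
    // Reads a stream to the end with awaited reads instead of APM callbacks.
    public static async Task<string> ReadAllAsync(Stream stream, int bufferSize = 1024)
    {
        var buffer = new byte[bufferSize];
        var data = new StringBuilder();
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            // ASCII decoding mirrors the original sample; real pages may need another encoding.
            data.Append(Encoding.ASCII.GetString(buffer, 0, read));
        }
        return data.ToString();
    }

    // Still uses HttpWebRequest, only through its Task-based methods.
    public static async Task<string> DownloadAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse response = await request.GetResponseAsync().ConfigureAwait(false))
        using (Stream stream = response.GetResponseStream())
        {
            return await ReadAllAsync(stream).ConfigureAwait(false);
        }
    }
}
```

Timeout handling, which the original sample does with RegisterWaitForSingleObject and Abort, is left out here and would still need to be added.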