Jayx239/BrowseSharp

Some sites aren't correctly navigated to

Closed this issue · 8 comments

Describe the bug
When navigating a site, the site does not respond, returning nothing.

To Reproduce
(from Visual Studio Immediate window)
var browser = new Browser();
{BrowseSharp.Browser}
Authenticator: null
AutomaticDecompression: true
BaseHost: null
BaseUrl: null
CachePolicy: null
ClientCertificates: null
CookieContainer: {System.Net.CookieContainer}
DefaultParameters: Count = 1
DefaultUriProtocol: "http"
Document: '(browser = new Browser()).Document' threw an exception of type 'System.InvalidOperationException'
Documents: Count = 0
Encoding: {System.Text.UTF8Encoding}
FollowRedirects: true
ForwardHistory: Count = 0
History: Count = 0
JavascriptEngine: {BrowseSharp.Javascript.JavascriptEngine}
JavascriptScrapingEnabled: true
MaxHistorySize: -1
MaxRedirects: null
Pipelined: false
PreAuthenticate: false
Proxy: null
ReadWriteTimeout: 0
RemoteCertificateValidationCallback: null
StyleEngine: {BrowseSharp.Style.StyleEngine}
StyleScrapingEnabled: true
Timeout: 0
UseSynchronizationContext: false
UserAgent: null
_history: {BrowseSharp.History.HistoryManager}
_javascriptEngine: {BrowseSharp.Javascript.JavascriptEngine}
_restClient: {RestSharp.RestClient}
_styleEngine: {BrowseSharp.Style.StyleEngine}

var document = browser.Navigate("https://perdaman.com.au")
{BrowseSharp.Document}
Body: {AngleSharp.Dom.Html.HtmlBodyElement}
Children: {AngleSharp.Dom.Collections.HtmlCollection<AngleSharp.Dom.IElement>}
Data: null
FirstElementChild: {AngleSharp.Dom.Html.HtmlHtmlElement}
Forms: Count = 0
Head: {AngleSharp.Dom.Html.HtmlHeadElement}
HtmlDocument: {AngleSharp.Dom.Html.HtmlDocument}
LastElementChild: {AngleSharp.Dom.Html.HtmlHtmlElement}
Request: {RestSharp.RestRequest}
RequestUri: {https://perdaman.com.au/}
Response: "StatusCode: 0, Content-Type: , Content-Length: 0)"
Scripts: Count = 0
Styles: Count = 0

Expected behavior
Was expecting a Content-Length greater than zero.

Desktop (please complete the following information):

  • OS: Windows Server 2012 R2

Additional context
Background article https://dev.to/bugmagnet/doing-google-s-natural-language-classifytext-in-asp-net-4can

Hi,
Can you replicate this issue outside of the Immediate window by writing the code to a file and running it? I was having trouble running this code in the Immediate window, but when I compiled and executed it I was able to get a response with the document's content, although the Content-Length was set to -1.

Also, can you give me a little more info?
Are you using the NuGet package or building from source:
Package version:
.NET version:

I think we can close this issue. I have just been working with 0.1.1-beta and have been able to access the troublesome site. I had been using 0.0.8.

public static Browser NavigateToUrl(string url)
{
    var browser = new BrowseSharp.Browser()
    {
        FollowRedirects = true,
        AutomaticDecompression = true,
        Timeout = 45000,
        JavascriptScrapingEnabled = true
    };
    browser.Navigate(url);
    return browser;
}

// Call site:
Browser browser = NavigateToUrl(url);
var document = browser.Document;
var html = document.Response.Content;

The URL https://searchsmart.com.au gives the following for browser and document, and an empty string for html.

browser
{BrowseSharp.Browser}
    Authenticator: null
    AutomaticDecompression: true
    BaseHost: null
    BaseUrl: {https://searchsmart.com.au/}
    CachePolicy: null
    ClientCertificates: null
    CookieContainer: {System.Net.CookieContainer}
    DefaultParameters: Count = 1
    DefaultUriProtocol: "http"
    Document: {BrowseSharp.Common.Document}
    Documents: Count = 1
    Encoding: {System.Text.UTF8Encoding}
    FollowRedirects: true
    ForwardHistory: Count = 0
    History: Count = 1
    JavascriptEngine: {BrowseSharp.Common.Javascript.JavascriptEngine}
    JavascriptScrapingEnabled: true
    MaxHistorySize: -1
    MaxRedirects: null
    Pipelined: false
    PreAuthenticate: false
    Proxy: null
    ReadWriteTimeout: 0
    RemoteCertificateValidationCallback: null
    StyleEngine: {BrowseSharp.Common.Style.StyleEngine}
    StyleScrapingEnabled: true
    Timeout: 45000
    UseSynchronizationContext: false
    UserAgent: null
    _history: {BrowseSharp.Common.History.HistoryManager}
    _javascriptEngine: {BrowseSharp.Common.Javascript.JavascriptEngine}
    _restClient: {RestSharp.RestClient}
    _styleEngine: {BrowseSharp.Common.Style.StyleEngine}
document
{BrowseSharp.Common.Document}
    Body: {AngleSharp.Html.Dom.HtmlBodyElement}
    Children: {AngleSharp.Dom.HtmlCollection<AngleSharp.Dom.IElement>}
    Data: null
    FirstElementChild: {AngleSharp.Html.Dom.HtmlHtmlElement}
    Forms: Count = 0
    Head: {AngleSharp.Html.Dom.HtmlHeadElement}
    HtmlDocument: {AngleSharp.Html.Dom.HtmlDocument}
    LastElementChild: {AngleSharp.Html.Dom.HtmlHtmlElement}
    Request: {RestSharp.RestRequest}
    RequestUri: {https://searchsmart.com.au/}
    Response: "StatusCode: 0, Content-Type: , Content-Length: 0)"
    Scripts: Count = 0
    Styles: Count = 0

Am I supposed to be providing something for the HTTPS to work? I ask this because document.Response gives the following:

document.Response
"StatusCode: 0, Content-Type: , Content-Length: 0)"
    Content: ""
    ContentEncoding: null
    ContentLength: 0
    ContentType: null
    Cookies: Count = 0
    ErrorException: {"The request was aborted: Could not create SSL/TLS secure channel."}
    ErrorMessage: "The request was aborted: Could not create SSL/TLS secure channel."
    Headers: Count = 0
    IsSuccessful: false
    ProtocolVersion: null
    RawBytes: null
    Request: {RestSharp.RestRequest}
    ResponseStatus: Error
    ResponseUri: null
    Server: null
    StatusCode: 0
    StatusDescription: null

Note the ErrorMessage: "The request was aborted: Could not create SSL/TLS secure channel." Is it up to me to create an "SSL/TLS secure channel" or is that something your code is supposed to be handling?
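
The dump above shows a StatusCode of 0 together with a non-empty ErrorMessage, which suggests the request failed before any HTTP response arrived. A minimal sketch of checking for that before reading Content follows. ResponseSnapshot, ResponseCheck and Describe are hypothetical names used only for illustration; in real code the three values would be read from document.Response.

```csharp
using System;

// Hypothetical stand-in that mirrors three fields from the dump above.
// It is not a BrowseSharp or RestSharp type.
public sealed class ResponseSnapshot
{
    public int StatusCode;
    public string Content;
    public string ErrorMessage;
}

public static class ResponseCheck
{
    // Summarizes a response, reporting transport-level failures first.
    public static string Describe(ResponseSnapshot r)
    {
        // A populated ErrorMessage means the request never completed
        // (for example, the TLS handshake failed).
        if (!string.IsNullOrEmpty(r.ErrorMessage))
            return "transport error: " + r.ErrorMessage;

        // StatusCode 0 without an error message still means no HTTP reply.
        if (r.StatusCode == 0)
            return "no HTTP response received";

        return "HTTP " + r.StatusCode + ", " + (r.Content ?? "").Length + " characters";
    }
}
```

Checking ErrorMessage first separates "the site returned nothing" from "the connection was never established", which is the distinction this issue turned on.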

Hey, I think I ran into this issue before. As I recall, the default security protocol setting before .NET 4.7.1 is different, and that was causing issues. The snippet below previously fixed issues like this for me:

ServicePointManager.SecurityProtocol |= SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls; // Updates protocol for target framework 4.5.2

That being said, when I tried to test your code with this, I got the exception below:
{"Unable to connect to the remote server"}

Can you give this a try and see if the change in security protocol fixes your issue?
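
A narrower variant of the snippet earlier in this comment, as a sketch only: it adds TLS 1.2 and leaves out SSL 3.0, which is deprecated. It assumes the target server accepts TLS 1.2, which this thread does not confirm. TlsSetup and EnableTls12 are hypothetical names; ServicePointManager and SecurityProtocolType come from System.Net.

```csharp
using System;
using System.Net;

public static class TlsSetup
{
    // Adds TLS 1.2 to the process-wide protocol set without removing
    // protocols that are already enabled, and returns the resulting set.
    public static SecurityProtocolType EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
        return ServicePointManager.SecurityProtocol;
    }
}
```

The setting is process-wide, so TlsSetup.EnableTls12() would need to run once before the first call to browser.Navigate(url).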

A bit of an improvement. At least now I'm getting something out of BrowseSharp. However, in the case of https://searchsmart.com.au the html starts with "\vD]\0�������\u000e����fU�Y�m�\u0011�\u0011a\u001e��Q��"

Okay, after a reboot, the fix is working and I'm not getting random gibberish.

That's great. So is this issue resolved?

Yes