Some sites aren't correctly navigated to
Describe the bug
When navigating a site, the site does not respond, returning nothing.
To Reproduce
(from Visual Studio Immediate window)
var browser = new Browser();
{BrowseSharp.Browser}
Authenticator: null
AutomaticDecompression: true
BaseHost: null
BaseUrl: null
CachePolicy: null
ClientCertificates: null
CookieContainer: {System.Net.CookieContainer}
DefaultParameters: Count = 1
DefaultUriProtocol: "http"
Document: '(browser = new Browser()).Document' threw an exception of type 'System.InvalidOperationException'
Documents: Count = 0
Encoding: {System.Text.UTF8Encoding}
FollowRedirects: true
ForwardHistory: Count = 0
History: Count = 0
JavascriptEngine: {BrowseSharp.Javascript.JavascriptEngine}
JavascriptScrapingEnabled: true
MaxHistorySize: -1
MaxRedirects: null
Pipelined: false
PreAuthenticate: false
Proxy: null
ReadWriteTimeout: 0
RemoteCertificateValidationCallback: null
StyleEngine: {BrowseSharp.Style.StyleEngine}
StyleScrapingEnabled: true
Timeout: 0
UseSynchronizationContext: false
UserAgent: null
_history: {BrowseSharp.History.HistoryManager}
_javascriptEngine: {BrowseSharp.Javascript.JavascriptEngine}
_restClient: {RestSharp.RestClient}
_styleEngine: {BrowseSharp.Style.StyleEngine}
var document = browser.Navigate("https://perdaman.com.au")
{BrowseSharp.Document}
Body: {AngleSharp.Dom.Html.HtmlBodyElement}
Children: {AngleSharp.Dom.Collections.HtmlCollection<AngleSharp.Dom.IElement>}
Data: null
FirstElementChild: {AngleSharp.Dom.Html.HtmlHtmlElement}
Forms: Count = 0
Head: {AngleSharp.Dom.Html.HtmlHeadElement}
HtmlDocument: {AngleSharp.Dom.Html.HtmlDocument}
LastElementChild: {AngleSharp.Dom.Html.HtmlHtmlElement}
Request: {RestSharp.RestRequest}
RequestUri: {https://perdaman.com.au/}
Response: "StatusCode: 0, Content-Type: , Content-Length: 0)"
Scripts: Count = 0
Styles: Count = 0
Expected behavior
Was expecting a Content-Length greater than zero.
Desktop (please complete the following information):
- OS: Windows Server 2012 R2
Additional context
Background article https://dev.to/bugmagnet/doing-google-s-natural-language-classifytext-in-asp-net-4can
Hi,
Can you replicate this issue outside of the Immediate window by writing the code to a file and running it? I had trouble running this code in the Immediate window, but when I compiled and executed it I got a response with the document's content, although the Content-Length was set to -1.
Also, can you give me a little more info.
Are you using the nuget package or building from source:
Package version:
.NET version:
I think we can close this issue. I have just been working with 0.1.1-beta and have been able to access the troublesome site. I had been using 0.0.8.
public static Browser NavigateToUrl(string url)
{
    var browser = new BrowseSharp.Browser()
    {
        FollowRedirects = true,
        AutomaticDecompression = true,
        Timeout = 45000,
        JavascriptScrapingEnabled = true
    };
    browser.Navigate(url);
    return browser;
}
Browser browser = NavigateToUrl(url);
var document = browser.Document;
var html = document.Response.Content;
The URL https://searchsmart.com.au gives the following for browser and document. It gives an empty string for html.
browser
{BrowseSharp.Browser}
Authenticator: null
AutomaticDecompression: true
BaseHost: null
BaseUrl: {https://searchsmart.com.au/}
CachePolicy: null
ClientCertificates: null
CookieContainer: {System.Net.CookieContainer}
DefaultParameters: Count = 1
DefaultUriProtocol: "http"
Document: {BrowseSharp.Common.Document}
Documents: Count = 1
Encoding: {System.Text.UTF8Encoding}
FollowRedirects: true
ForwardHistory: Count = 0
History: Count = 1
JavascriptEngine: {BrowseSharp.Common.Javascript.JavascriptEngine}
JavascriptScrapingEnabled: true
MaxHistorySize: -1
MaxRedirects: null
Pipelined: false
PreAuthenticate: false
Proxy: null
ReadWriteTimeout: 0
RemoteCertificateValidationCallback: null
StyleEngine: {BrowseSharp.Common.Style.StyleEngine}
StyleScrapingEnabled: true
Timeout: 45000
UseSynchronizationContext: false
UserAgent: null
_history: {BrowseSharp.Common.History.HistoryManager}
_javascriptEngine: {BrowseSharp.Common.Javascript.JavascriptEngine}
_restClient: {RestSharp.RestClient}
_styleEngine: {BrowseSharp.Common.Style.StyleEngine}
document
{BrowseSharp.Common.Document}
Body: {AngleSharp.Html.Dom.HtmlBodyElement}
Children: {AngleSharp.Dom.HtmlCollection<AngleSharp.Dom.IElement>}
Data: null
FirstElementChild: {AngleSharp.Html.Dom.HtmlHtmlElement}
Forms: Count = 0
Head: {AngleSharp.Html.Dom.HtmlHeadElement}
HtmlDocument: {AngleSharp.Html.Dom.HtmlDocument}
LastElementChild: {AngleSharp.Html.Dom.HtmlHtmlElement}
Request: {RestSharp.RestRequest}
RequestUri: {https://searchsmart.com.au/}
Response: "StatusCode: 0, Content-Type: , Content-Length: 0)"
Scripts: Count = 0
Styles: Count = 0
Am I supposed to be providing something for the HTTPS to work? I ask this because document.Response gives the following:
document.Response
"StatusCode: 0, Content-Type: , Content-Length: 0)"
Content: ""
ContentEncoding: null
ContentLength: 0
ContentType: null
Cookies: Count = 0
ErrorException: {"The request was aborted: Could not create SSL/TLS secure channel."}
ErrorMessage: "The request was aborted: Could not create SSL/TLS secure channel."
Headers: Count = 0
IsSuccessful: false
ProtocolVersion: null
RawBytes: null
Request: {RestSharp.RestRequest}
ResponseStatus: Error
ResponseUri: null
Server: null
StatusCode: 0
StatusDescription: null
Note the ErrorMessage: "The request was aborted: Could not create SSL/TLS secure channel."
Is it up to me to create an "SSL/TLS secure channel" or is that something your code is supposed to be handling?
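The failure above only shows up when document.Response is inspected, because Content is simply an empty string. A minimal sketch of a guard that surfaces such errors before the HTML is used might look like the following. The class and method names (ResponseCheck, DescribeFailure) are hypothetical and not part of BrowseSharp or RestSharp; the three inputs correspond to the IsSuccessful, StatusCode and ErrorMessage values shown in the dump.

```csharp
using System;

public static class ResponseCheck
{
    // Hypothetical helper: returns null when the response looks usable,
    // otherwise a short description of what went wrong.
    public static string DescribeFailure(bool isSuccessful, int statusCode, string errorMessage)
    {
        if (isSuccessful)
        {
            return null;
        }

        // A non-empty error message means the request failed before any
        // HTTP status was received (for example, a TLS handshake failure).
        if (!string.IsNullOrEmpty(errorMessage))
        {
            return "Transport error: " + errorMessage;
        }

        // Otherwise the server answered, but with a non-success status.
        return "HTTP error: status " + statusCode;
    }
}
```

With the code above it could be called as ResponseCheck.DescribeFailure(document.Response.IsSuccessful, (int)document.Response.StatusCode, document.Response.ErrorMessage), and the html would only be read when the result is null.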
Hey, I think I ran into this issue before. What I recall is that the default setting for the security protocol before .NET 4.7.1 is different and was causing issues. Using the below snippet previously fixed issues like this for me:
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls; // Updates protocol for target framework 4.5.2
That being said, when I tried to test your code with this I was getting the below exception:
{"Unable to connect to the remote server"}
Can you give this a try and see if the change in security protocol fixes your issue?
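The snippet above also enables SSL 3.0, which has been deprecated (RFC 7568). A narrower variant, sketched here on the assumption that the target site accepts TLS 1.2, adds only that protocol to whatever is already enabled. TlsSetup and EnableTls12 are hypothetical names; ServicePointManager.SecurityProtocol and SecurityProtocolType.Tls12 are the same framework members used in the snippet.

```csharp
using System;
using System.Net;

public static class TlsSetup
{
    // Hypothetical helper: adds TLS 1.2 to the currently enabled protocols
    // and returns the resulting setting so the caller can log or check it.
    public static SecurityProtocolType EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
        return ServicePointManager.SecurityProtocol;
    }
}
```

It would need to run once, before the first request is made, for example at application start-up and ahead of any call to browser.Navigate.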
A bit of an improvement. At least now I'm getting something out of BrowseSharp. However, in the case of https://searchsmart.com.au
the html starts with "\vD]\0�������\u000e����fU�Y�m�\u0011�\u0011a\u001e��Q��"
Okay, after a reboot, the fix is working and I'm no longer getting random gibberish.
That's great. So is this issue resolved?
Yes