SimpleBrowserDotNet/SimpleBrowser

Request is not being validated

wallacemariadeandrade opened this issue · 8 comments

Could you guys help me submit a request to bgp.he.net? I'm trying to extract data from a search such as bgp.he.net/AS15169, but the server's validation is blocking the request.

I'm using the following code:

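// LastRequestFailed, WriteFile, OnBrowserRequestLogged and OnBrowserMessageLogged are helper methods
// defined elsewhere in the program (similar helpers exist in SimpleBrowser's Sample project)
var browser = new Browser();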

// log the browser request/response data to files so we can interrogate them in case of an issue with our scraping
browser.RequestLogged += OnBrowserRequestLogged;
browser.MessageLogged += new Action<Browser, string>(OnBrowserMessageLogged);

// we'll fake the user agent for websites that alter their content for unrecognised browsers
browser.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10";

// browse to bgp.he.net (the server redirects http:// to https://)
browser.Navigate("http://bgp.he.net/");
if(LastRequestFailed(browser)) return; // always check the last request in case the page failed to load

browser.Log("Searching by input field");
var searchInput = browser.Find("search_search");
var commitSearch = browser.Find("input", FindBy.Name, "commit");
if(searchInput.Exists && commitSearch.Exists)
{
    searchInput.Value = "AS15169";
    commitSearch.SubmitForm();
    if(LastRequestFailed(browser)) return;
}
else
    browser.Log($"Search input exists? {searchInput.Exists}\nCommit search button exists? {commitSearch.Exists}");


WriteLine($"Request OK! Saving output to {WriteFile("content.html", browser.CurrentHtml)}");

Hi Wallace,

Happy to help. I ran your code and got this in the console:

 -> GET request to http://bgp.he.net/
 <- Response status code: 301
 -> GET request to https://bgp.he.net/
 <- Response status code: 200
Searching by input field
New HTML result set obtained, containing 0 element(s)
New HTML result set obtained, containing 0 element(s)
New HTML result set obtained, containing 1 element(s)
New HTML result set obtained, containing 1 element(s)
Setting the value of <input id="search_search" name="search[search]" size="15" type="text" /> to AS15169
Submitting parent/ancestor form of: <input name="commit" type="submit" value="Search" />
New HTML result set obtained, containing 0 element(s)
New HTML result set obtained, containing 0 element(s)
 -> GET request to https://bgp.he.net/search?search%5bsearch%5d=AS15169
 <- Response status code: 302
New HTML result set obtained, containing 0 element(s)
 -> GET request to https://bgp.he.net/cc
 <- Response status code: 200
Request OK! Saving output to C:\dev\SimpleBrowser\SimpleBrowserDotNet\Sample\bin\Debug\netcoreapp2.0\Logs\content.html

As far as I can tell, this code is doing what is expected. Are you getting an error, or are you just not getting the response you expected? What do you mean by "server validation is blocking the request"?

Kevin

Hi Kevin! I'm not getting the response I expected. If you look at the content.html file, you'll see this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- rmosher 2010 - 2016 -->
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<script src="/javascripts/jquery/jquery-1.4.4.js?1414109767" type="text/javascript"></script>
<script src="/javascripts/jquery/jquery.history.js?1364589087" type="text/javascript"></script>
<script src="/javascripts/jquery/jquery-ui.js?1269850573" type="text/javascript"></script>
<script src="/javascripts/jquery/jrails.js?1269850578" type="text/javascript"></script>
<script src="/javascripts/bgp.js?1260526324" type="text/javascript"></script>
<link href="/stylesheets/bgp.css?1553925714" media="all" rel="stylesheet" type="text/css" />


<script src="/javascripts/jstest.js?1442294748" type="text/javascript"></script>
<meta http-equiv="refresh" content="15; url=/jf">


</head>

<body>
	<div id='header'>
		<a href="//www.he.net/"><img alt='Hurricane Electric' src='/helogo.gif' /></a>
		<form action="/search" method="get">
			<div class='search'>
			<input id="search_search" name="search[search]" size="15" type="text" />
			<input name="commit" type="submit" value="Search" />
		</div>
		</form>
		
		<div class='clear'></div>
		<div class='floatleft'>
			<div class='leftsidemenu'>
				<div class='menuheader'>Quick Links</div>
				<ul class='leftsidemenuitems'>
					<li><a href='//bgp.he.net/'>BGP Toolkit Home</a></li>
					<li><a href="/report/prefixes">BGP Prefix Report</a></li>
					<li><a href="/report/peers">BGP Peer Report</a></li>
					<li><a href="/report/exchanges">Exchange Report</a></li>
					<li><a href="/report/bogons">Bogon Routes</a></li>
					<li><a href="/report/world">World Report</a></li>
					<li><a href="/report/multi-origin-routes">Multi Origin Routes</a></li>
					<li><a href="/report/dns">DNS Report</a></li>
					<li><a href="/report/tophosts">Top Host Report</a></li>
					<li><a href="/report/netstats">Internet Statistics</a></li>
					<li><a href='//lg.he.net/'>Looking Glass</a></li>
					<li><a href='//networktools.he.net/'>Network Tools App</a></li>
					<li><a href='//tunnelbroker.net/'>Free IPv6 Tunnel</a></li>
					<li><a href='//ipv6.he.net/certification/'>IPv6 Certification</a></li>
					<li><a href='//bgp.he.net/ipv6-progress-report.cgi'>IPv6 Progress</a></li>
					<li><a href='//bgp.he.net/going-native.pdf'>Going Native</a></li>
					<li><a href='//bgp.he.net/contact/'>Contact Us</a></li>
				</ul>
	
			</div>
			<div class='clear'></div>
			<div class='social'>
			
				<a href="/r/Twitter" title="Hurricane Electric on Twitter"><img alt="Hurricane Electric on Twitter" src="/images/twitter.png?1215539178" /></a>
			
				<a href="/r/Facebook" title="Hurricane Electric on Facebook"><img alt="Hurricane Electric on Facebook" src="/images/facebook.png?1215539178" /></a>
			
			</div>
		</div>
	</div>

	<div id='content'>
			
		

		



<div class='clear'></div>

<div id='error' class='tabdata'>
Please wait while we validate your browser.
</div>
<script type='text/javascript'>
var _0xb539=["\x62\x67\x70\x2E\x68\x65\x2E\x6E\x65\x74\x20\x72\x65\x71\x75\x69\x72\x65\x73\x20\x6A\x61\x76\x61\x73\x63\x72\x69\x70\x74\x20\x61\x6E\x64\x20\x63\x6F\x6F\x6B\x69\x65\x73\x20\x74\x6F\x20\x66\x75\x6E\x63\x74\x69\x6F\x6E\x2E\x20\x20\x50\x6C\x65\x61\x73\x65\x20\x65\x6E\x61\x62\x6C\x65\x20\x74\x68\x65\x73\x65\x20\x69\x6E\x20\x79\x6F\x75\x72\x20\x62\x72\x6F\x77\x73\x65\x72\x2E","\x74\x65\x78\x74","\x23\x65\x72\x72\x6F\x72","\x68\x61\x73\x68","\x6C\x6F\x63\x61\x74\x69\x6F\x6E","\x3F\x68\x3D","\x72\x65\x73\x70\x6F\x6E\x73\x65","\x70\x61\x74\x68","\x63\x6F\x6F\x6B\x69\x65","\x6A\x73\x74\x65\x73\x74","\x70\x6F\x73\x74","\x61\x6A\x61\x78"];function printerror(){$(_0xb539[2])[_0xb539[1]](_0xb539[0])}function doredirect(_0x1cc4x3){url='/cr';if(window[_0xb539[4]][_0xb539[3]]){url+=_0xb539[5]+encodeURIComponent(window[_0xb539[4]][_0xb539[3]])};window[_0xb539[4]]=url;}$(function(){$[_0xb539[11]]({url:'/i',dataType:_0xb539[1],complete:function(_0x1cc4x3){ip=_0x1cc4x3[_0xb539[6]];$[_0xb539[11]]({url:'/jc',data:{p:$[_0xb539[9]]($[_0xb539[8]](_0xb539[7])),i:$[_0xb539[9]](ip)},type:_0xb539[10],error:printerror,complete:doredirect});},error:printerror})});
</script>



	</div>
	
	<div id='footer'>
	Updated 23 Oct 2020 12:47 PST &copy; 2020 Hurricane Electric
	</div>
	
	<script type="text/javascript">
		var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
		document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
	</script>
	<script type="text/javascript">
		try {
			var pageTracker = _gat._getTracker("UA-12276073-1");
			pageTracker._trackPageview();
		} catch(err) {}
	</script>
</body>
</html>

As you can see, the page appears to be stuck in the validation process; nothing changes.

This isn't server validation; it's browser validation. The server has sent a page containing JavaScript that verifies certain browser features (JavaScript and cookies) are enabled. The JavaScript is in this line:

var _0xb539=["\x62\x67\x70\x2E\x68\x65\x2E\x6E\x65\x74\x20\x72\x65\x71\x75\x69\x72\x65\x73\x20\x6A\x61\x76\x61\x73\x63\x72\x69\x70\x74\x20\x61\x6E\x64\x20\x63\x6F\x6F\x6B\x69\x65\x73\x20\x74\x6F\x20\x66\x75\x6E\x63\x74\x69\x6F\x6E\x2E\x20\x20\x50\x6C\x65\x61\x73\x65\x20\x65\x6E\x61\x62\x6C\x65\x20\x74\x68\x65\x73\x65\x20\x69\x6E\x20\x79\x6F\x75\x72\x20\x62\x72\x6F\x77\x73\x65\x72\x2E","\x74\x65\x78\x74","\x23\x65\x72\x72\x6F\x72","\x68\x61\x73\x68","\x6C\x6F\x63\x61\x74\x69\x6F\x6E","\x3F\x68\x3D","\x72\x65\x73\x70\x6F\x6E\x73\x65","\x70\x61\x74\x68","\x63\x6F\x6F\x6B\x69\x65","\x6A\x73\x74\x65\x73\x74","\x70\x6F\x73\x74","\x61\x6A\x61\x78"];function printerror(){$(_0xb539[2])[_0xb539[1]](_0xb539[0])}function doredirect(_0x1cc4x3){url='/cr';if(window[_0xb539[4]][_0xb539[3]]){url+=_0xb539[5]+encodeURIComponent(window[_0xb539[4]][_0xb539[3]])};window[_0xb539[4]]=url;}$(function(){$[_0xb539[11]]({url:'/i',dataType:_0xb539[1],complete:function(_0x1cc4x3){ip=_0x1cc4x3[_0xb539[6]];$[_0xb539[11]]({url:'/jc',data:{p:$[_0xb539[9]]($[_0xb539[8]](_0xb539[7])),i:$[_0xb539[9]](ip)},type:_0xb539[10],error:printerror,complete:doredirect});},error:printerror})});

If you "nicify" that line (deobfuscate and pretty-print it, then resolve the string-table lookups), you get this:

// String table, decoded from the \xNN escapes
var _0xb539 = ["bgp.he.net requires javascript and cookies to function.  Please enable these in your browser.", "text", "#error", "hash", "location", "?h=", "response", "path", "cookie", "jstest", "post", "ajax"];

// Shows the error message in <div id='error'>
function printerror() {
  $("#error").text(_0xb539[0]);
}

// Sends the browser to /cr, passing the current URL hash (if any) as ?h=...
function doredirect(xhr) {
  var url = "/cr";
  if (window.location.hash) {
    url += "?h=" + encodeURIComponent(window.location.hash);
  }
  window.location = url;
}

// Runs once the DOM is ready
$(function() {
  // Step 1: GET /i; the response body (the client's IP address) is kept as ip
  $.ajax({
    url : "/i",
    dataType : "text",
    complete : function(xhr) {
      var ip = xhr.response;
      // Step 2: POST jstest() of the "path" cookie and jstest() of the IP to /jc
      $.ajax({
        url : "/jc",
        data : {
          p : $.jstest($.cookie("path")),
          i : $.jstest(ip)
        },
        type : "post",
        error : printerror,
        complete : doredirect
      });
    },
    error : printerror
  });
});
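The "numbers" in the `_0xb539` string table are plain ASCII characters written as `\xNN` hex escapes; any JavaScript engine turns them back into text when it parses the string literal. If you want to decode them yourself while inspecting the raw page source (for example, the HTML saved from `browser.CurrentHtml`), a minimal JavaScript sketch follows. `decodeHexEscapes` is a hypothetical helper, not part of SimpleBrowser or the site's scripts.

```javascript
// Hypothetical helper: turns literal "\xNN" sequences (as they appear in
// the raw page source) into the characters they encode.
function decodeHexEscapes(raw) {
  return raw.replace(/\\x([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Two entries copied from the _0xb539 array, escaped as they appear in the HTML.
const entries = ["\\x6A\\x73\\x74\\x65\\x73\\x74", "\\x3F\\x68\\x3D"];
console.log(entries.map(decodeHexEscapes).join(", ")); // jstest, ?h=
```

Anything that isn't a `\xNN` sequence passes through unchanged, so the same helper works on a whole copied line.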

In a browser that supports JavaScript, the "$(function()" callback runs as soon as the DOM is ready. I haven't spent a lot of time figuring out exactly what this JavaScript does, but I can see that it makes two API calls to the server: a GET to /i and then a POST to /jc. When the /jc call completes, it calls doredirect() to send the browser to /cr (with the page's URL hash appended as ?h=... if there is one). Note that jQuery's complete callback fires whether the call succeeded or failed.

SimpleBrowser can't execute JavaScript, so the redirect can't happen automatically. That said, all is not lost. You can get the content of the script tag, drill into the doredirect() method and do manually what that method does: navigate to the URL it builds. Again, I haven't spent much time looking at the JavaScript; it doesn't look very involved, but I could be wrong.
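Doing manually what doredirect() does mostly means building the same URL. The sketch below mirrors that logic in JavaScript, assuming you know the page's URL hash (the value of `location.hash`, including its leading `#`). When SimpleBrowser lands on the validation page there is usually no hash, so the result is just `/cr`. Whether /cr then serves the real content without the validation cookies is not established in this thread; this only reproduces the URL. `buildRedirectUrl` is a hypothetical name.

```javascript
// Mirrors the URL that the page's doredirect() builds before setting window.location.
// `hash` stands in for window.location.hash (e.g. "" or "#something").
function buildRedirectUrl(hash) {
  let url = "/cr";
  if (hash) {
    url += "?h=" + encodeURIComponent(hash);
  }
  return url;
}

// Resolve against the site root to get an absolute URL to navigate to.
const base = "https://bgp.he.net/";
console.log(new URL(buildRedirectUrl(""), base).href);   // https://bgp.he.net/cr
console.log(new URL(buildRedirectUrl("#x"), base).href); // https://bgp.he.net/cr?h=%23x
```

From C#, the resulting absolute URL is what you would pass to `browser.Navigate()`.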

Wow! I didn't realize those numbers were just text encoded as hex. So what's happening is simply client-side validation through JavaScript, to make sure the requesting browser isn't fake? Looks like a smart way to validate requests.

I'll work on getting around this JavaScript validation. Thanks a lot for your help!

And SimpleBrowser is amazing, man. Good work.

Thanks, but it's not just me. I came in late. I merely seem to be the only one left. :)

I think the validation is there to prevent bots from scraping data. The code calls the server's /i endpoint, which returns the user's IP address. It then runs that IP, along with the value of the "path" cookie, through jstest() and posts both results to the /jc endpoint. That endpoint likely sets a cookie in the browser. Later, if that cookie is present and valid, the browser is considered validated. If the cookie is missing or invalid, the validation process starts again.
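The /jc call itself is an ordinary form-encoded POST. jQuery serializes a flat data object roughly the way `$.param` does: `encodeURIComponent` on each key and value, pairs joined with `&`, and `%20` turned into `+`. Below is a sketch of building that body, assuming you already have the "path" cookie value, the IP returned by /i, and some implementation of jstest(). The `jstest` argument here is a hypothetical stand-in, because the real algorithm lives in jstest.js and isn't shown in this thread.

```javascript
// Builds the /jc request body the way jQuery's $.param serializes a flat object.
// `jstest` is a placeholder for the site's hashing function (not shown here).
function buildJcBody(pathCookie, ip, jstest) {
  const data = { p: jstest(pathCookie), i: jstest(ip) };
  return Object.entries(data)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&")
    .replace(/%20/g, "+");
}

// Fake hash, for illustration only.
const fakeJstest = (s) => "h(" + s + ")";
console.log(buildJcBody("/", "192.0.2.1", fakeJstest)); // p=h(%2F)&i=h(192.0.2.1)
```

Passing the hash function in as a parameter keeps the request-building part testable while the real jstest() is still unknown.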

I'm fairly certain this can be automated using a combination of C#'s WebClient and SimpleBrowser. You'll need to use WebClient to make the two calls the JavaScript makes, to the /i and /jc endpoints, to get the cookie values (there are actually two). Once you have them, you can create the cookies in SimpleBrowser before navigating to bgp.he.net/AS15169. That should validate the SimpleBrowser session.
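Once /jc responds, the cookie values have to be pulled out of its Set-Cookie response headers and carried over to SimpleBrowser. Below is a minimal JavaScript sketch of that extraction, assuming you have the raw Set-Cookie header strings (in C# they would come from the WebClient response headers). The cookie names in the example are made up; the real names aren't shown in this thread.

```javascript
// Extracts name/value pairs from raw Set-Cookie header strings,
// ignoring attributes such as Path, Expires or HttpOnly.
function parseSetCookies(headers) {
  const jar = {};
  for (const header of headers) {
    const pair = header.split(";", 1)[0]; // "name=value" comes before the first ';'
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip malformed entries
    jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return jar;
}

// Formats the collected pairs as a single Cookie request header.
function toCookieHeader(jar) {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

// Hypothetical headers; the real cookie names set by /jc are not shown in the thread.
const jar = parseSetCookies(["first=abc; path=/", "second=x=y; path=/; HttpOnly"]);
console.log(toCookieHeader(jar)); // first=abc; second=x=y
```

Only the first `=` separates name from value, so values that themselves contain `=` (common in encoded hashes) survive intact.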

If you try that, let me know if it works.

The call to the /i endpoint is fine; it just returns the user's IP address. The problem is the call to the /jc endpoint, which uses JavaScript functions delivered by external scripts:

<script src="/javascripts/jquery/jquery-1.4.4.js?1414109767" type="text/javascript"></script>
<script src="/javascripts/jquery/jquery.history.js?1364589087" type="text/javascript"></script>
<script src="/javascripts/jquery/jquery-ui.js?1269850573" type="text/javascript"></script>
<script src="/javascripts/jquery/jrails.js?1269850578" type="text/javascript"></script>
<script src="/javascripts/bgp.js?1260526324" type="text/javascript"></script>
<script src="/javascripts/jstest.js?1442294748" type="text/javascript"></script>

For example, jstest(), which computes a hash from the IP address, is defined in the jstest.js file. The only ways I've found to get around this are to translate the function to C# or to somehow execute this script. Do you have any other ideas?

Unfortunately, that's the only way: translate jstest() to C#.

I wish SimpleBrowser could run JavaScript (even without DOM manipulation) for cases just like this, where a chunk of JS needs to run just to perform a calculation. I've tried to integrate a JS engine into SimpleBrowser a couple of times, but it takes a great deal of work. With the amount of time I have available, it would take years to complete on my own.

I imagine. Well, in my case I really need JS working, and translating to C# would also take time I don't have, so I'll look for other options.

I really appreciate your help! Thanks!