Athlon1600/SerpScraper

All keywords starting with "test" return the same results when using proxies

Closed this issue · 5 comments

There seems to be a problem with keywords starting with "test". They all return the same 10 results:

[[
"http://www.test.com/",
"http://en.wikipedia.org/wiki/Test_cricket",
"http://en.wikipedia.org/wiki/Test",
"http://www.speedtest.net/",
"http://www.humanmetrics.com/cgi-win/jtypes2.asp",
"http://test.org.uk/",
"http://www.speakeasy.net/speedtest/",
"http://omgipv6day.com/",
"http://www.adobe.com/shockwave/welcome/",
"http://html5test.com/"
]]

This is what I'm getting:

$res = $engine->search("test");

gives:

  [0]=>
  string(25) "http://www.speedtest.net/"
  [1]=>
  string(21) "https://www.test.com/"
  [2]=>
  string(34) "https://en.wikipedia.org/wiki/Test"
  [3]=>
  string(36) "https://www.speakeasy.net/speedtest/"
  [4]=>
  string(56) "https://www.google.com/webmasters/tools/mobile-friendly/"
  [5]=>
  string(47) "http://www.humanmetrics.com/cgi-win/jtypes2.asp"
  [6]=>
  string(29) "http://speedtest.xfinity.com/"
  [7]=>
  string(53) "https://www.16personalities.com/free-personality-test"
  [8]=>
  string(29) "http://www.att.com/speedtest/"
  [9]=>
  string(52) "https://implicit.harvard.edu/implicit/takeatest.html"
  [10]=>
  string(21) "http://test-ipv6.com/"
$res = $engine->search("test kit");

gives:

  [0]=>
  string(23) "http://testkitplus.com/"
  [1]=>
  string(44) "http://testkitplus.com/product/mdma-test-kit"
  [2]=>
  string(47) "https://dancesafe.org/testing-kit-instructions/"
  [3]=>
  string(56) "https://dancesafe.org/product/mecke-reagent-testing-kit/"
  [4]=>
  string(43) "https://www.youtube.com/watch?v=WQe4vNap168"
  [5]=>
  string(23) "https://www.eztest.com/"
  [6]=>
  string(64) "http://www.amazon.com/5ML-Marquis-Reagent-Test-Kit/dp/B00NS6Q1VK"
  [7]=>
  string(56) "http://doc.akka.io/docs/akka/snapshot/scala/testing.html"
  [8]=>
  string(48) "http://whatismolly.com/product/mdma-testing-kit/"
  [9]=>
  string(31) "http://testkit.sourceforge.net/"
  [10]=>
  string(22) "http://bunkpolice.com/"

Are you doing something different?

It seems to have something to do with proxies. I tried again and can confirm that it works normally without proxies, but when I use $google->setProxy("myproxy"); every keyword starting with "test" returns the same response as above.

Maybe a Guzzle bug?

I just tested it with a proxy and everything works fine. Show me the exact code you're using; I'm very curious how that could be happening.

Maybe the proxy is giving you false URLs/results. Check the raw HTML output from that proxy.
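
One way to inspect the raw HTML is to bypass the library and request the results page through the proxy directly. The following is only a minimal sketch using PHP's cURL extension. The helper names `buildSearchUrl` and `fetchRawHtml`, the user agent string, and the proxy address are placeholders assumed for illustration; none of them are part of SerpScraper.

```php
<?php
// Hypothetical debugging helpers; not part of SerpScraper.

// Build a search URL for a keyword; http_build_query URL-encodes the value.
function buildSearchUrl(string $keyword, string $domain = 'www.google.com'): string
{
    return 'https://' . $domain . '/search?' . http_build_query(['q' => $keyword]);
}

// Fetch a URL, optionally through a proxy, and return the raw body (null on failure).
function fetchRawHtml(string $url, ?string $proxy = null): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (debug script)', // placeholder
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Example usage (placeholder proxy address), saving both responses for comparison:
// $url = buildSearchUrl('test kit');
// file_put_contents('direct.html', (string) fetchRawHtml($url));
// file_put_contents('proxied.html', (string) fetchRawHtml($url, 'myproxy'));
```

If the two saved files differ for "test kit" but the proxied file matches what the proxy returns for "test", that would point at the proxy rather than at the scraper or Guzzle.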

https://gist.github.com/kjellberg/cdf111ae55119109f29a74bb061c7afe

Here is part of my code. It works perfectly with all other keywords except those starting with "test".

I tried changing both the proxies and google_domain, but it doesn't help.

I also tried "test+kit" and "test20%kit". The HTML response still shows the 10 results for the keyword "test":
http://imgur.com/D4D3KZm
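
For reference, a space in a query string is written either as `+` or as `%20` (percent sign first). The short sketch below shows what PHP's built-in encoders produce for the keyword from this thread. Whether `search()` encodes the keyword itself is not confirmed here, so the last line only illustrates what would happen if an already-encoded keyword were encoded again.

```php
<?php
// Standard PHP encoders applied to the keyword from this thread.
$keyword = 'test kit';

$form = urlencode($keyword);       // "test+kit"   (space becomes "+")
$raw  = rawurlencode($keyword);    // "test%20kit" (space becomes "%20")

// Decoding a pre-encoded keyword recovers the original text.
$decoded = urldecode('test+kit');  // "test kit"

// Encoding an already-encoded keyword escapes the "+" a second time.
$double = urlencode('test+kit');   // "test%2Bkit"
```

Passing the plain string "test kit" and letting the HTTP layer do the encoding avoids the double-encoding case.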

Weird but interesting...