/RandomWalk

Random Walk stuff

Primary LanguageJava

#Rooted Random Walk

Walking on a rooted graph makes sure on every step there's a probability p that the walker will come back to root and start over. In other words, the walk is guaranteed to converge no matter where one starts. After some number of iterations we will get a stationary distribution over the whole graph.

I wrote some code for search keywords discovery. Given a brand, or a URL, what search keywords will most likely lead to it and what URLs people are most likely to visit? This is useful for brand awareness measurement.

#Input

  • search log file: tab separated 3 cols (query url frequency).
  • starting points: initial distribution over nodes. can be queries, URLs, or mixture of both
  • other parameters: number of iterations, p of coming to root, minimal frequency, etc.

#Output Sample output when starting node is query "verizon" with prob = 1.0:

Node Relative probability
q_verizon 5873.263398339937
u_verizonwireless.com 1455.9084581603506
q_verizon wireless 401.6830507740922
u_www.verizonwireless.com 390.5675267279395
u_www.verizon.com/myverizon 168.0251122644921
u_fios.verizon.com 125.3538862114845
u_www.verizon.com 103.63712605159769
q_verizon.com 81.49887513592103
u_login.verizonwireless.com/amserver/ui/login 78.14558939858978
u_verizon.net 75.8823853015666
q_verizon wireless my account 52.246465116221806
u_verizonwirelessdeals.com 49.96107820664791
q_verizonwireless.com 23.877661355753848
u_verizon.com.bundlensave.com 23.017610430971754
q_my verizon 20.610333293360657
u_verizonfiosbundles.com 19.583158701515075
q_verizon.net 19.35466362865535
q_my verizon account login 18.106004623570044
q_verizon fios 17.52378657847812