icy/google-group-crawler

ajax-crawling

Closed this issue · 4 comments

Thanks for making this.
But isn't this using the new technology described at https://developers.google.com/webmasters/ajax-crawling/docs/specification?

icy commented

Hi @tinku99,

I think the answer is yes. In my script, I had to use _escaped_fragment_ to download data from Google. Basically, my script benefits from the fact that Google follows the specification :) You can see how a group is organized here [1].
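To make the idea concrete, here is a minimal sketch of the URL mapping that the specification describes: a "#!" URL is rewritten so that the fragment travels in an _escaped_fragment_ query parameter, which a plain HTTP client can then fetch. This is not the code from craw.sh. The function name to_escaped_fragment and the group name example-group are made up for illustration, and the percent-escaping of special characters that the specification requires is left out.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not taken from craw.sh): map a "#!" URL to its
# _escaped_fragment_ form, following the basic rule of the AJAX crawling
# specification.
# Simplification: special characters in the fragment (e.g. %, #, &, +)
# are NOT percent-escaped here, although the specification requires it.
to_escaped_fragment() {
  local url="$1"
  local marker='#!'

  # URLs without "#!" are returned unchanged.
  case "$url" in
    *"$marker"*) ;;
    *) printf '%s\n' "$url"; return 0 ;;
  esac

  # Everything before the first "#!" and everything after it.
  local base="${url%%"$marker"*}"
  local fragment="${url#*"$marker"}"

  # Use "&" if the URL already carries a query string, "?" otherwise.
  local sep='?'
  case "$base" in
    *'?'*) sep='&' ;;
  esac

  printf '%s%s_escaped_fragment_=%s\n' "$base" "$sep" "$fragment"
}
```

For example, to_escaped_fragment 'https://groups.google.com/forum/#!forum/example-group' prints https://groups.google.com/forum/?_escaped_fragment_=forum/example-group, and the result can be passed to wget or lynx like any other URL.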

My script is written in #bash, and it uses some well-known tools (lynx, wget, ...) to download data. I believe someone could improve it by rewriting it in Python, Ruby, etc. For me, #bash is just enough.

[1] https://github.com/icy/google-group-crawler/blob/master/craw.sh#L28

[OT] (j/k) As @icy really loves bashing!

icy commented

@cmpitg ;) I like the idea of using pipes (|) to glue small things together. Let's see how https://github.com/matz/streem would help ^^
