G-Collector is a Rust-based tool that automates the collection of URLs from Google search results. It uses the `thirtyfour` Rust crate to automate what would otherwise be a manual web-scraping process.
The G-Collector project originated as a proof of concept to explore Rust's capabilities in browser automation. It has evolved into a convenient tool for web scraping and is shared here to assist others in getting started with browser automation in Rust.
Before you begin, ensure you have the following in place:
- Rust and Cargo installed (see the official Install Rust guide).
- A compatible operating system; Linux Mint, Ubuntu, or Debian are preferred.
- Note that Google frequently updates its search result pages, which may affect the program's functionality. There are no guarantees that it will work as expected.
- Be aware that making too many requests to Google can lead to IP blocking. Use the program responsibly.
- chromedriver and the Chrome browser installed.
- Check your Chrome version (Help -> About Google Chrome).
- Download the appropriate `chromedriver` zip file for your Chrome version.
- Extract the zip file and run `./chromedriver` in a command terminal. This starts the WebDriver server at `http://localhost:9515`.
- You can also download the `chrome` binary from chrome-for-testing. This version of Chrome is specifically designed for automated browser testing.
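The download-and-run steps above can be sketched as a short shell script. The bucket URL pattern and the `120.0.6099.109` version string are assumptions for illustration; substitute the version reported by your own Chrome build.

```shell
# Hypothetical version string; replace with yours (Help -> About Google Chrome).
CHROME_VERSION="120.0.6099.109"

# chrome-for-testing publishes a matching chromedriver build per version/platform.
BASE="https://storage.googleapis.com/chrome-for-testing-public"
URL="${BASE}/${CHROME_VERSION}/linux64/chromedriver-linux64.zip"
echo "$URL"

# Uncomment to download, extract, and start the WebDriver server on port 9515:
# curl -sSLO "$URL"
# unzip -o chromedriver-linux64.zip
# ./chromedriver-linux64/chromedriver
```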
Use the following command from the repository root to install G-Collector:

```shell
cargo install --path .
```
G-Collector provides a set of command-line arguments for customizing your search:
```shell
$ g-collector --help
A command line program to scrape google results urls given a keyword to search for

Usage: g-collector [OPTIONS] --chrome <CHROME> --search-for <SEARCH_FOR>

Options:
  -c, --chrome <CHROME>                        Full path of the chrome binary
  -p, --proxy <PROXY>                          Proxy to use if any
  -s, --search-for <SEARCH_FOR>                The keyword(s) to search for in Google
  -n, --number-of-scrolls <NUMBER_OF_SCROLLS>  Number of times to scroll down the page [default: 1]
  -h, --help                                   Print help
  -V, --version                                Print version
$
```
To collect results for a keyword:

```shell
g-collector -c /path/to/chrome-for-testing/chrome-linux64/chrome -s "i love rust"
```

To use a SOCKS5 proxy (note: Google might block Tor exit nodes):

```shell
g-collector -c /path/to/chrome-for-testing/chrome-linux64/chrome -s "i love rust" -p "socks5://localhost:9050"
```

Adjust the number of scrolls to control how many results are collected:

```shell
g-collector -c /path/to/chrome-for-testing/chrome-linux64/chrome -s "i love rust" -n 20
```
G-Collector consists of a library and a binary. The library exposes a single function, `scrape`, which the binary calls. This separation allows you to use the library programmatically from other Rust code and process the results as needed.
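As a sketch of that programmatic use: the exact signature of `scrape` is not documented here, so the snippet below assumes an async function taking the Chrome path, an optional proxy, the search phrase, and a scroll count, and returning the collected URLs. Treat the names and types as illustrative assumptions, not the crate's actual API.

```rust
// Hypothetical programmatic use of the g-collector library.
// Assumed (not confirmed by this README) signature:
//   async fn scrape(chrome: &str, proxy: Option<&str>, query: &str, scrolls: u32)
//       -> Result<Vec<String>, Box<dyn std::error::Error>>
use g_collector::scrape;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let urls = scrape(
        "/path/to/chrome-for-testing/chrome-linux64/chrome",
        None,          // no proxy
        "i love rust",
        5,             // scroll five times to load more results
    )
    .await?;

    // Post-process the results however you like, e.g. keep only HTTPS links.
    for url in urls.iter().filter(|u| u.starts_with("https://")) {
        println!("{url}");
    }
    Ok(())
}
```

Because the browser drives the whole session, the call is async end to end; any runtime (tokio is assumed here) will do.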
This project is licensed under the MIT License. You are free to use, modify, and distribute it according to the terms of this license.