/gcs

Program shows how easy it is to use the Google Custom Search Engine from the command line

Primary LanguageShell

## Overview

Update April 2018 : Wrote in 2013 and not tested recently.

I wrote this script as a prood of concept after blogging about how the API to
the Google Custom Search Engine works.

This script performs a Google Custom Search from the command-line using the API,
so the query follows Google's terms and conditions.


http://dazdaztech.wordpress.com/2013/08/03/using-google-custom-search-api-from-the-command-line/

Here is how to use the Google Custom Search API from the command line. I
could.nt find this information clearly laid out anywhere on the web in a simple
5 step process, so here it is.

If you did.nt know, Google have for a long time banned unauthenticated bot usage
on Google Search (your IP will be blocked if you do so) to prevent abuse, this is
why we are using Google CSE, which requires authentication.

The first 100 searches per day are free. Any more, then you have to pay $5 per
1000 queries, for up to 10,000 queries per day, just enable billing to do so.

Google searches and Google Custom Searches will yield slightly different
results. You can read up on the reasons why here :

http://support.google.com/customsearch/bin/answer.py?hl=en&answer=2633385

Read up on Google CSE here -> https://www.google.com/cse/compare

And see https://developers.google.com/custom-search/v1/overview

 

Step 1 - Create a CSE A CSE is a Google Custom Search Engine. Typically you
configure it to search your own website, however we can also configure it to
search the entire web.

1. Goto https://www.google.com/cse/all
2. Create a CSE, enter .www.google.de. and then click create
3. Then goto Control Panel / Setup / Basics / Sites to search = Search entire web
```
It looks like .cx: yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy.
```
Test some queries using the CSE within a web browser before crafting a URL for use with curl.
https://www.google.com/cse/publicurl?cx=yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy



Step 2 - Create a unique Developer API Key
https://code.google.com/apis/console/



Step 3 - Send the query
Perform a query using :

key = developers API Key
cx = custom search
q = query string which needs to be encoded
start = is the URL position, and is from 1 to 101. So start=11 is page 2 of the results
num = maximum of 10 search results are allowed

More information of the query parameters can be found here.
https://developers.google.com/custom-search/v1/using_rest?hl=en#query-params

Output is in JSON [JavaScript Object Notation] format.

Each query returns a maximum of 10 results (URL.s and more).
```
$ curl "https://www.googleapis.com/customsearch/v1?key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&cx=yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy&q=Singapore&filter=0&start=1&num=10&alt=json" > search.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18389    0 18389    0     0  12291      0 --:--:--  0:00:01 --:--:-- 13984
```
* If the Dload is more than 1000 bytes in size, then it was successful, else you
  have errors in the output file.



Step 4 - Download a JSON command-line parser
Homepage for jq http://stedolan.github.io/jq/

$ sudo wget http://stedolan.github.io/jq/download/linux64/jq -O /usr/local/bin/jq



Step 5 - Parse the JSON output file, to see our results
```
$ jq '.items[].link' search.json | awk '{print substr($0, 2, length() - 2)}'
```
http://en.wikipedia.org/wiki/Singapore
http://www.singaporeair.com/
http://www.yoursingapore.com/
http://wikitravel.org/en/Singapore
http://www.gov.sg/
https://www.cia.gov/library/publications/the-world-factbook/geos/sn.html
http://www.lonelyplanet.com/singapore
http://topics.bloomberg.com/singapore/
http://www.singapore.sg/
http://www.bbc.co.uk/news/world-asia-15961759




When you've reached your daily quota, this is what you will see in the JSON output.
```
{
"error": {
"errors": [
{
"domain": "usageLimits",
"reason": "dailyLimitExceeded",
"message": "Daily Limit Exceeded"
}
],
"code": 403,
"message": "Daily Limit Exceeded"
}
}
```

If you don't encode the query string, you'll see this.
"Error 400
Your client has issued a malformed or illegal request."


## Ideas for improvement
Count how many queries per day
Check json output for maximum free daily quota reached
Parse results through a link checker to save on wasted time
Test more advance queries