aviaryan/python-gsearch

Against Google ToS

trobertsca opened this issue · 4 comments

Per /u/LightShadow on reddit.com, this library violates the Google ToS:

This is against Google ToS.

The reason no other library exists is because they shouldn't exist at all.

If you want to use Google Search API you need to pay for credits.

JSON/Atom Custom Search API provides 100 search queries per day for free. If you need more, you may sign up for billing in the API Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day.
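To put those numbers in perspective, here is a rough sketch of the daily cost under the pricing quoted above. The function name and the reading that the free 100 queries count toward the 10k daily cap are assumptions for illustration, not anything taken from Google's documentation.

```python
# Figures taken from the pricing text quoted above.
FREE_QUERIES_PER_DAY = 100
USD_PER_1000_QUERIES = 5.0
MAX_QUERIES_PER_DAY = 10_000  # assumption: the cap includes the free queries


def daily_cost_usd(queries: int) -> float:
    """Estimate the cost in USD of running `queries` searches in one day."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    if queries > MAX_QUERIES_PER_DAY:
        raise ValueError("quoted pricing allows at most 10k queries per day")
    # Only queries beyond the free daily allowance are billed.
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * USD_PER_1000_QUERIES / 1000


if __name__ == "__main__":
    for n in (100, 1_100, 10_000):
        print(n, daily_cost_usd(n))
```

So, if the quoted pricing still holds, maxing out the daily cap would come to a bit under $50 per day.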

Thanks for the issue.
I have created a T&C for the project to make users aware of this situation.
https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md


I don't think creating a library like this is against the ToS, but using it for scraping is. So having a T&C like above should work fine in this case. What do you think?

IANAL, but I don't think creating the library could be against the ToS in any way, just using it like you said. Just wanted to make sure any end-users are aware of the risk.

Thanks for publishing the library! I've been slowly going through it to brush up on my regex skills, which are unfortunately lacking.

byrro commented

I find it sad that the first issue in an open-source software project is related to... "legalities" and not... software. I'm not a lawyer and this is not legal advice. I just like to study this topic and I'd like to contribute my ideas, as a critical reflection on a controversial subject.

Short story

Google doesn't own exclusive rights over "automated data gathering" processes and they can't prohibit others from doing it. Remember: they do it themselves thousands of times every single second. I could have a ToS stating that once Google indexes my website, they enter into a contract and give me the right to index their website as well.

Google PageRank is protected by database copyright, and that's a fair protection. I cannot copy the ordering of pages and create another search service using it. But in most countries, I can use the search results to create another service, something innovative, without infringing copyright. Google's willingness to restrict other people's right to innovate (a right they themselves benefit from, using other people's content) is outrageous.

Long Story

Be careful about considering ToS blindly as law

Treating a website's Terms of Service (ToS) blindly as "law" may lead to outcomes that one could reasonably call undesirable. Take this example: Internet Archive v. Suzanne Shell. Suzanne put a notice in her website ToS stating that, once you copy data from her website, you enter into a contract and you owe her US$5,000 per web page copied. Yes, this is true: the lawsuit really happened.

If ToS is law, I can have a ToS giving me the same rights as Google gives itself to scrape the web

If I really thought that anything written in a ToS is enforceable and I still wanted to scrape Google, I'd just put a ToS on my website stating something like: if you scrape any content from our website, you automatically give us an irrevocable equal right to also scrape any content from your website(s). Since I blindly treat ToS as "law", once Google indexes my website I would have the right to scrape Google's website as well.

Google doesn't have exclusive rights over "automated data gathering"

Honestly, it is outright hypocrisy for Google to prohibit "automated data gathering". Google's entire business model is based on automated data gathering and processing. Google DOES NOT own automated data gathering processes and DOES NOT have any exclusive rights over them. They didn't even create them.

Two cents about database copyrights

Setting aside the discussion about "manually is ok, automated is not ok", it all comes down to the database itself. The question is: does Google hold copyright over the content they serve on their Search service? The answer is: it depends on the country.

In the UK and the EU, there's the old-fashioned "Sui Generis Database Right". I've heard that no court has ever enforced it in Europe, but, in theory, the answer is yes: Google owns copyright over the list of web pages they serve on their Search service and you cannot use it (whether gathered automatically or manually) without their prior consent.

Most other world regions (including most countries, if not all, in North, Central, and South America) adopted a different approach that is friendlier to innovation. It's based on the Berne Convention. In summary, a database owner only has rights over the ordering of the database (if the ordering involves creative work), not the content itself (unless the content is a creative work, which is not the case for Google). For example, Google worked very hard to come up with a way to order web pages, which is called "PageRank". The approach proposed by the Berne Convention is that Google owns the ordering, but not the content that is ordered, which I think is fair.

Google's business model would be dead if they needed to respect the "Sui Generis Database" rights. Google is one of the biggest beneficiaries of the innovation-friendly ideas conveyed by the Berne Convention. Google should be the first to advocate protecting, not restricting, everyone's right to use non-creative works in innovative ways, like they did. Google's ToS is actually infringing other people's right to innovate, IMHO.

I cannot copy Google's ordering (PageRank), but I can derive innovative works from Search Results

Now, in light of the Berne Convention, if I scrape Google and create another search service following the same ordering of pages, I would be infringing a copyright. And that's fair, because Google invested so much time and money in PageRank and this must be protected. On the other hand, if I scrape Google and use the content to do something else (combine it with other data, create something different, new, innovative), then I'm not infringing any copyright.

I'm kind of wondering whether the electricity cost is considered at all, since it is paid by the server owner and may vary per site (presumably between Google and other, normal sites).