alaz/legitbot

Fetch Googlebot IP ranges from their published JSON resource

Closed this issue · 1 comments

alaz commented

Google publishes the current IP ranges for Googlebot: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot#automatic

Of course Legitbot could fetch them with fetch:url, similarly to how it works for Ahrefs:

# @fetch:url https://api.ahrefs.com/v3/public/crawler-ip-ranges?output=json
# @fetch:jsonpath $.prefixes[*].ipv4Prefix

But we don't know the cadence of changes to this list, and fetch:url only updates the Legitbot sources. Even with automatic detection in place, any change would have to wait until the next release.

To fetch the Googlebot IP ranges dynamically from the published JSON, an ip_ranges block can be used, similarly to how it works for Facebook:

ip_ranges do
  client = Irrc::Client.new
  client.query :radb, AS, source: :radb
  results = client.perform
  %i[ipv4 ipv6].map do |family|
    results[AS][family][AS]
  end.flatten
end
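For Googlebot, the body of such a block could fetch and parse the published JSON at runtime. The following is only a sketch: the URL and the "ipv4Prefix"/"ipv6Prefix" key names follow the format of Google's published googlebot.json, and the helper name is hypothetical, not part of Legitbot.

```ruby
require 'json'
require 'net/http'
require 'uri'

# URL of Google's published Googlebot ranges (as documented on the
# "Verifying Googlebot" page linked above).
GOOGLEBOT_JSON = 'https://developers.google.com/static/search/apis/ipranges/googlebot.json'

# Hypothetical helper: pull CIDR strings out of the "prefixes" array.
# Each entry carries either an "ipv4Prefix" or an "ipv6Prefix" key.
def googlebot_prefixes(json)
  JSON.parse(json)['prefixes'].flat_map do |prefix|
    prefix.values_at('ipv4Prefix', 'ipv6Prefix').compact
  end
end

# Inside a Legitbot bot class this would then become roughly:
#   ip_ranges { googlebot_prefixes(Net::HTTP.get(URI(GOOGLEBOT_JSON))) }
```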

We probably need fetch:url factored out of the RuboCop cop sources though, so it is easily accessible.

alaz commented

Though I have to add that I am against making pre-fetching of the IP ranges list the default behaviour.

The currently implemented DNS-based approach is superior because it relies on DNS caching (including eviction). Only the first request may be slow; all subsequent requests utilise the cache. The slightly increased latency of that first request is not a big deal for web crawlers, and it does not affect human visitors.
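For reference, the DNS-based verification boils down to a reverse PTR lookup followed by a forward confirmation. A minimal sketch using Ruby's stdlib resolver (simplified; the real implementation handles caching and error cases more carefully):

```ruby
require 'resolv'

# Reverse-then-forward DNS check, as recommended by Google:
# 1. resolve the visitor IP to a host name (PTR),
# 2. require a googlebot.com / google.com host name,
# 3. resolve that name back and confirm it includes the original IP.
def googlebot_via_dns?(ip)
  name = Resolv.getname(ip)
  return false unless name.end_with?('.googlebot.com', '.google.com')

  Resolv.getaddresses(name).include?(ip)
rescue Resolv::ResolvError
  false
end
```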

By contrast, if someone wants to fetch IP ranges from an external resource, they would also be responsible for refreshing the list regularly using reload_ips.
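One way to do that refresh is a background thread. The wrapper below is illustrative only; the interval and the exact reload_ips receiver are assumptions, not something Legitbot prescribes.

```ruby
# Hypothetical helper: invoke a reload callable every `interval` seconds
# on a background thread. In an app this might be scheduled as, say:
#   start_ip_refresher(24 * 3600) { Legitbot::Google.reload_ips }
# (receiver assumed; adapt to however reload_ips is exposed).
def start_ip_refresher(interval, &reload)
  Thread.new do
    loop do
      sleep interval
      reload.call
    end
  end
end
```

A cron job or your framework's job scheduler would work just as well; the point is only that the refresh responsibility moves to the application.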