Explore integrating with Cloudflare's Domain Intelligence API
rviscomi opened this issue · 4 comments
https://api.cloudflare.com/#domain-intelligence-properties
This API could help with categorizing websites by type, e.g. Travel, Technology, News, etc. It's been on our wish list for a long time and would unlock new kinds of analysis.
We need to look into what the requirements and limitations are. Is it free? Can we get enough quota for our crawl rate? Is it only supported for websites that use Cloudflare? Is our use case aligned with the TOS?
We'll also need to assess how it would be integrated with the crawl and how the data would be exposed. cc @pmeenan
It looks like Domain Intelligence only covers ~100k domains (or at least the rank info does). We wouldn't want to call the API directly as part of the crawl.
The Cloudflare API sets a maximum of 1,200 requests per five-minute period.
I can ping the team to see if they would be interested in offering the raw dataset to HA to merge with the crawl data. However, that would basically mean dumping and exposing their full database every month, and I'm not sure that's fair to their IP (happy to ask, though).
@pmeenan did you have any feedback from Cloudflare?
It seems free accounts are limited to 100 requests per month.
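To put the quoted limits in perspective, here is a back-of-the-envelope sketch in Python. It takes the ~100k-domain figure, the 1,200-requests-per-five-minutes limit, and the 100-requests-per-month free-tier limit from the comments above. It assumes one API request per domain. That assumption is hypothetical: we haven't checked whether the API supports batch lookups. The helper name `windows_needed` is illustrative only.

```python
import math


def windows_needed(domains: int, limit_per_window: int) -> int:
    """Rate-limit windows needed to look up every domain,
    assuming (hypothetically) one request per domain."""
    return math.ceil(domains / limit_per_window)


DOMAINS = 100_000  # approximate coverage mentioned in the thread

# General API limit quoted in the thread: 1,200 requests per 5-minute window.
rate_windows = windows_needed(DOMAINS, 1_200)
rate_hours = rate_windows * 5 / 60  # upper bound: counts every window in full

# Free-tier limit quoted in the thread: 100 requests per month.
free_months = windows_needed(DOMAINS, 100)

print(f"rate limit: {rate_windows} windows, ~{rate_hours:.1f} hours")
print(f"free tier: {free_months} months")
```

Under these assumptions, the five-minute limit needs 84 windows (about 7 hours) per full refresh. The free tier would need 1,000 months.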
I pinged them but didn't get a response. I wouldn't plan on having access to it, though. As I mentioned, it is a commercial product, and we'd basically be exposing most of its data to the web for free.
OK, let's focus on the Topics API data for now, then.
Will close this.