Category API
rviscomi opened this issue · 8 comments
For feature parity in v1 we'll also need an API to list all of the technologies for each category.
You can see how it works in the existing dashboard:
1. Enter a category name.
2. The Technology dropdown updates to show only the technologies in the selected category.
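On the client, the dropdown behavior above is a lookup in the category object. The following is a minimal sketch that assumes the response has already been parsed into a Python dict mapping category names to popularity-ordered technology arrays (the shape proposed next). `technologies_for` and the sample data are hypothetical and do not come from the dashboard code.

```python
from typing import Dict, List

# Hypothetical sample data in the proposed shape:
# category name -> technologies, most popular first (placeholder names).
CATEGORIES: Dict[str, List[str]] = {
    "Blogs": ["Tech A", "Tech B"],
    "Domain parking": ["Tech C"],
}


def technologies_for(categories: Dict[str, List[str]], category: str) -> List[str]:
    """Return the options for the Technology dropdown.

    Unknown categories yield an empty list instead of raising, so the
    dropdown simply renders without options. A copy is returned so the
    caller cannot mutate the shared category data.
    """
    return list(categories.get(category, []))


print(technologies_for(CATEGORIES, "Blogs"))  # ['Tech A', 'Tech B']
```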
The API response should be an object whose keys are category names, ordered by total number of origins, and whose values are arrays of technologies sorted by popularity:
{
  "Most popular category by total number of origins": [
    "Most popular technology in the category",
    "Second most popular technology",
    "..."
  ],
  "Second most popular category": [
    "..."
  ]
}

Here's an example query to extract the categories:
WITH categories AS (
  -- Number of distinct origins that use at least one technology in each category
  SELECT
    category,
    COUNT(DISTINCT root_page) AS origins
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t,
    UNNEST(t.categories) AS category
  WHERE
    date = '2023-08-01' AND
    client = 'mobile'
  GROUP BY
    category
),

technologies AS (
  -- Number of distinct origins that use each technology, per category
  SELECT
    category,
    technology,
    COUNT(DISTINCT root_page) AS origins
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t,
    UNNEST(t.categories) AS category
  WHERE
    date = '2023-08-01' AND
    client = 'mobile'
  GROUP BY
    category,
    technology
)

-- One row per category, with its technologies ordered from most to least popular
SELECT
  category,
  categories.origins,
  ARRAY_AGG(technology ORDER BY technologies.origins DESC) AS technologies
FROM
  categories
JOIN
  technologies
USING
  (category)
GROUP BY
  category,
  categories.origins
ORDER BY
  categories.origins DESC

I've formatted the output and saved the results to a static file: https://github.com/HTTPArchive/tech-report-apis/blob/main/static/categories.json
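The formatting step from query rows to the proposed object can be sketched in Python. This assumes the query results have been fetched as (category, origins, technologies) tuples; the function name and sample rows are hypothetical, and this is not necessarily how categories.json was actually produced.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Each row mirrors one output row of the query above:
# (category, origins, technologies already sorted by origins DESC).
Row = Tuple[str, int, List[str]]


def to_categories_object(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Build {category: [technologies, ...]} ordered by category popularity."""
    # Sort by origins, descending, in case rows arrive unordered.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    # Python dicts keep insertion order, and json.dumps serializes in that order.
    return {category: technologies for category, _origins, technologies in ordered}


rows = [
    ("Category B", 10, ["Tech X"]),
    ("Category A", 25, ["Tech Y", "Tech Z"]),
]
print(json.dumps(to_categories_object(rows)))
# {"Category A": ["Tech Y", "Tech Z"], "Category B": ["Tech X"]}
```

One design caveat: JSON objects are formally unordered (RFC 8259), so a consumer that must rely on popularity order may be better served by an array of {name, technologies} objects.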
Also available via the CDN: https://cdn.httparchive.org/reports/cwvtech/categories.json
I'd say only the category name should be a required parameter, but I'll defer to @sarahfossheim on whether it'd be useful to have any special behavior when it's omitted. For example, maybe it could list only the category names.
We do need the list of category names as well (for the category filter dropdown), so yes, that would be useful.
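One way to implement the behavior suggested above is to return only the category names when no category is requested, and the matching categories with their technologies otherwise. This is a minimal sketch; `get_categories` and its signature are hypothetical, and the examples later in this thread use an explicit onlyname=true parameter rather than relying on an omitted one.

```python
from typing import Dict, Iterable, List, Optional, Union

CategoryMap = Dict[str, List[str]]


def get_categories(
    categories: CategoryMap, names: Optional[Iterable[str]] = None
) -> Union[List[str], CategoryMap]:
    """Hypothetical handler logic for a categories endpoint.

    - names is None: return only the category names, in popularity order,
      for the category filter dropdown.
    - names given: return those categories with their technologies.
    """
    if names is None:
        return list(categories)
    wanted = set(names)
    # Iterate over the source dict (not `wanted`) to keep popularity order.
    return {name: techs for name, techs in categories.items() if name in wanted}


sample = {"Blogs": ["Tech A"], "Domain parking": ["Tech B"], "CMS": ["Tech C"]}
print(get_categories(sample))             # ['Blogs', 'Domain parking', 'CMS']
print(get_categories(sample, ["CMS", "Blogs"]))  # {'Blogs': ['Tech A'], 'CMS': ['Tech C']}
```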
Example of how to consume this endpoint, for one category or multiple categories:
curl --get 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories' \
  --data-urlencode 'category=["Blogs"]'
curl --get 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories' \
  --data-urlencode 'category=["Blogs","Domain parking"]'
Or, to get only the category names:
curl --request GET \
--url 'https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?onlyname=true'
@rviscomi @sarahfossheim let me know if this works for you.
Per our chat, let's switch to comma-separated values instead (here and in the other APIs):
https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=Blogs,Domain%20parking
On the frontend, we'll need to URL-encode each input parameter.
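Encoding each value separately and joining with literal commas produces exactly the format above. The sketch below, in Python for illustration (the real frontend presumably does this in JavaScript), uses the dev gateway URL from this thread; `build_categories_url` and `parse_category_param` are hypothetical helpers, not part of the tech-report-apis code.

```python
from typing import List
from urllib.parse import quote, unquote

BASE_URL = "https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories"


def build_categories_url(categories: List[str]) -> str:
    """URL-encode each category, then join the encoded values with commas."""
    # safe="" percent-encodes every reserved character, including any comma
    # inside a category name, so the joining commas stay unambiguous.
    encoded = ",".join(quote(name, safe="") for name in categories)
    return f"{BASE_URL}?category={encoded}"


def parse_category_param(raw: str) -> List[str]:
    """Split the raw, still-encoded value on commas, then decode each part."""
    return [unquote(part) for part in raw.split(",") if part]


print(build_categories_url(["Blogs", "Domain parking"]))
# https://dev-gw-2vzgiib6.ue.gateway.dev/v1/categories?category=Blogs,Domain%20parking
```

Splitting before decoding assumes the server can read the raw query string. Many frameworks decode query parameters first, in which case an encoded comma inside a category name becomes indistinguishable from a separator.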
@rviscomi @sarahfossheim all the changes discussed are already deployed.
New URL: https://dev-gw-2vzgiib6.uk.gateway.dev/v1/categories
Documentation: https://github.com/HTTPArchive/tech-report-apis#get-categories
Hi @rviscomi,
why does the categories query filter on client = 'mobile'? Are there no categories for desktop?
Every technology category that exists on desktop pages almost certainly exists on mobile, so this was a small query optimization to avoid processing half the dataset.