HTTPArchive/data-pipeline

Keep secondary page data separate from home page data (for now)

rviscomi opened this issue · 0 comments

The pages.2022_06_01_desktop table contains desktop page data for home pages only. It's not obvious, however, that the pages.2022_06_09_desktop table contains desktop page data for both home pages and secondary pages.

Secondary pages are still an experimental feature, so we should keep them separate from the stable home page data. We're doing that to some extent by reserving 01 as the DD (day) part of home-only table names, but this may still be causing confusion about which table to use, and it is already interfering with our automated analysis pipeline for reports on httparchive.org.

To alleviate this until the "all" dataset is ready in #15, move the home+secondary page tables to experimental_-prefixed datasets. The DD part of their names could then be set back to 01 for consistency with their corresponding home-only tables.

| Old | New | Eventually |
| --- | --- | --- |
| pages.2022_06_09_desktop | experimental_pages.2022_06_01_desktop | all.pages (date=2022-06-01, client=desktop) |
| lighthouse.2022_05_16_mobile | experimental_lighthouse.2022_05_01_mobile | all.pages (date=2022-05-01, client=mobile) |
| summary_requests.2022_06_09_desktop | experimental_summary_requests.2022_06_01_desktop | all.requests (date=2022-06-01, client=desktop) |
| summary_pages.2022_06_01_mobile | (no change, already home-only) | all.pages (date=2022-06-01, client=mobile) |
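
The Old to New mapping in the table above could be sketched roughly as follows. This is only an illustration, not code from the pipeline: the helper name `experimental_name` and the regular expression are hypothetical, and it assumes that every table ID has the form dataset.YYYY_MM_DD_client and that a DD of 01 always means a home-only table.

```python
import re

# Hypothetical pattern for table IDs of the form dataset.YYYY_MM_DD_client.
# Assumption: the only clients are desktop and mobile.
TABLE_ID = re.compile(
    r"^(?P<dataset>\w+)\."
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})_"
    r"(?P<client>desktop|mobile)$"
)


def experimental_name(table_id: str) -> str:
    """Return the proposed name for a table, per the Old/New columns above."""
    match = TABLE_ID.match(table_id)
    if match is None:
        raise ValueError(f"unrecognized table ID: {table_id!r}")

    # Assumption: DD == 01 marks a home-only table, which stays where it is.
    if match["day"] == "01":
        return table_id

    # Home+secondary table: prefix the dataset and set the day back to 01.
    return (
        f"experimental_{match['dataset']}."
        f"{match['year']}_{match['month']}_01_{match['client']}"
    )


if __name__ == "__main__":
    for old in (
        "pages.2022_06_09_desktop",
        "lighthouse.2022_05_16_mobile",
        "summary_requests.2022_06_09_desktop",
        "summary_pages.2022_06_01_mobile",
    ):
        print(old, "->", experimental_name(old))
```

The sketch only computes names. Actually moving the tables (and checking that the target name isn't already taken) would be a separate step.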