Secondary pages marked as root pages in parsed CSS table
rviscomi opened this issue
rviscomi commented
The experimental_parsed_css.2022_07_01_* tables include both home pages and secondary pages, and rows for both kinds of page have is_root_page set to true.
Secondary pages should have this field set to false.
Because of this bug, secondary pages are mistakenly included in the almanac.parsed_css table.
rviscomi commented
Overwriting the existing tables with BigQuery DML:
```sql
-- Any page whose URL is absent from summary_pages is not a home page, so mark it as secondary.
UPDATE
  `httparchive.experimental_parsed_css.2022_07_01_mobile`
SET
  is_root_page = FALSE
WHERE
  page NOT IN (SELECT url AS page FROM `httparchive.summary_pages.2022_07_01_mobile`);
```

We also need to fix the data pipeline so that it writes the correct is_root_page value.
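To sanity-check the UPDATE logic before running it in BigQuery, here is a minimal sketch that replays the same statement against a toy in-memory SQLite database using Python's built-in sqlite3 module. The table names `parsed_css` and `summary_pages`, their columns, and the example URLs are hypothetical stand-ins for the BigQuery tables. SQLite is only an approximation of BigQuery's SQL dialect.

```python
import sqlite3


def mark_secondary_pages(conn: sqlite3.Connection) -> int:
    """Set is_root_page = 0 for rows whose page is not listed in summary_pages.

    Mirrors the BigQuery UPDATE: a page is treated as a root (home) page
    only if its URL appears in summary_pages. Returns the number of updated rows.
    """
    cur = conn.execute(
        """
        UPDATE parsed_css
        SET is_root_page = 0
        WHERE page NOT IN (SELECT url AS page FROM summary_pages)
        """
    )
    conn.commit()
    return cur.rowcount


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical toy tables that reproduce the bug."""
    conn = sqlite3.connect(":memory:")
    # url is NOT NULL so that the NOT IN predicate behaves as intended.
    conn.execute("CREATE TABLE summary_pages (url TEXT NOT NULL)")
    conn.execute("CREATE TABLE parsed_css (page TEXT, is_root_page INTEGER)")
    conn.executemany(
        "INSERT INTO summary_pages (url) VALUES (?)",
        [("https://example.com/",), ("https://example.org/",)],
    )
    # Every row starts out flagged as a root page, matching the buggy tables.
    conn.executemany(
        "INSERT INTO parsed_css (page, is_root_page) VALUES (?, 1)",
        [
            ("https://example.com/",),
            ("https://example.com/about",),
            ("https://example.org/",),
            ("https://example.org/blog/post",),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print("rows updated:", mark_secondary_pages(conn))
    for page, flag in conn.execute(
        "SELECT page, is_root_page FROM parsed_css ORDER BY page"
    ):
        print(flag, page)
```

Both home pages keep is_root_page = 1, and the two secondary pages are flipped to 0. One caveat applies to the real query as well: if the subquery returns any NULL url, `NOT IN` matches no rows, so `NOT EXISTS` is a safer form when NULLs cannot be ruled out.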