Process new meta tag scans
Closed this issue · 3 comments
gbinal commented
gbinal commented
Stats:
- 28,624 target URLs
- 15,133 are live.
- 1,006 tags were detected on the other 13,491 non-live target URLs.
Compare the lower table against the numbers for the current tag scans, shown first:
Field | Number of the 15031 sites that have a value for it | % |
---|---|---|
title | 14466 | 96.24% |
description | 4615 | 30.70% |
og_title | 3854 | 25.64% |
og_description | 3018 | 20.08% |
og_article_published | 260 | 1.73% |
og_article_modified | 683 | 4.54% |
canonical_link | 4677 | 31.12% |
viewport_meta_tag | 10929 | 72.71% |
main_element_present | 5383 | 35.81% |
[Legend for below: *=search.gov recommended; +=civichackingagency recommended; ^=open graph; ~=dublincore; #=schema.org]
Tag | Number of the 15133 sites that use it | % |
---|---|---|
meta_keywords_content* | 1733 | 11.45% |
meta_robots_content*+ | 2702 | 17.86% |
meta_article_section_content*^ | 0 | 0.00% |
meta_article_tag_content*^ | 3 | 0.02% |
og_image_final_url*+^ | 3198 | 21.13% |
dcterms_keywords_content*~ | 0 | 0.00% |
dc_subject_content*~ | 41 | 0.27% |
dcterms_subject_content*~ | 369 | 2.44% |
dcterms_audience_content*~ | 48 | 0.32% |
dc_type_content*~ | 41 | 0.27% |
dcterms_type_content*~ | 507 | 3.35% |
dc_date_content*~ | 4 | 0.03% |
dc_date_created_content*~ | 21 | 0.14% |
dcterms_created_content*~ | 297 | 1.96% |
og_locale_content+ | 1293 | 8.54% |
og_site_name_content+ | 2962 | 19.57% |
og_type_content+ | 3094 | 20.45% |
og_url_content+ | 3483 | 23.02% |
og_image_alt_content+ | 690 | 4.56% |
revised_content | 11 | 0.07% |
last_modified_content | 32 | 0.21% |
language_content | 205 | 1.35% |
date_content | 16 | 0.11% |
subject_content | 64 | 0.42% |
owner_content | 26 | 0.17% |
pagename_content | 0 | 0.00% |
dc_title_content~ | 158 | 1.04% |
og_site_name^ | 29 | 0.19% |
item_type_content# | 682 | 4.51% |
item_scope_content# | 176 | 1.16% |
item_prop_content# | 859 | 5.68% |
vocab_content# | 8 | 0.05% |
type_of_content# | 969 | 6.40% |
property_content# | 4979 | 32.90% |
context_content# | 31 | 0.20% |
type_content# | 12271 | 81.09% |
html_lang_content | 12181 | 80.49% |
href_lang_content | 775 | 5.12% |
me_content | 0 | 0.00% |
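The two tables use different denominators: 15,031 sites in the first table's header and 15,133 live sites in the second. The following is a minimal Python sketch for rechecking the percentages and the non-live count; the constant and function names are illustrative and do not come from the scanner code.

```python
# Figures quoted in the stats and table headers above.
TOTAL_TARGET_URLS = 28624
LIVE_URLS = 15133            # denominator of the second (new tags) table
CURRENT_SCAN_SITES = 15031   # denominator of the first (current tags) table


def adoption_pct(count, denominator):
    """Return count as a percentage of denominator, rounded to 2 places."""
    return round(count / denominator * 100, 2)


non_live = TOTAL_TARGET_URLS - LIVE_URLS

print(non_live)                                 # non-live target URLs
print(adoption_pct(14466, CURRENT_SCAN_SITES))  # title
print(adoption_pct(1733, LIVE_URLS))            # meta_keywords_content
print(adoption_pct(12271, LIVE_URLS))           # type_content
```

Running it against a few rows reproduces the figures in the tables, which suggests the mismatch between 15,031 and 15,133 is a difference in scan populations rather than a typo.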
gbinal commented
Here's our proposal of what to keep, what to not keep, and why.
Keep:
- <meta name='keywords' - Decent adoption and useful content indicators
- <meta property="og:image" - Important component of OG implementation
- <meta property="og:type" - Important component of OG implementation
- <meta property="og:url" - Important component of OG implementation
- <html lang= - Important data for multilingual content analysis
- <link hreflang= - Important data for multilingual content analysis
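As a rough illustration of what scanning for the kept tags involves, here is a sketch using Python's standard-library `html.parser`. It is not the scanner's actual implementation; the class name, function name, and output field names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical output field names; the proposal lists tags, not column names.
KEPT_META = {
    ("name", "keywords"): "meta_keywords",
    ("property", "og:image"): "og_image",
    ("property", "og:type"): "og_type",
    ("property", "og:url"): "og_url",
}


class KeptTagParser(HTMLParser):
    """Collects only the tags proposed under 'Keep'."""

    def __init__(self):
        super().__init__()
        self.found = {"hreflang": []}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.found["html_lang"] = a["lang"]
        elif tag == "link" and a.get("hreflang"):
            self.found["hreflang"].append(a["hreflang"])
        elif tag == "meta":
            for (attr, value), field in KEPT_META.items():
                if (a.get(attr) or "").lower() == value:
                    self.found[field] = a.get("content")


def extract_kept_tags(html):
    """Return a dict of the kept tag values found in an HTML string."""
    parser = KeptTagParser()
    parser.feed(html)
    return parser.found
```

For example, `extract_kept_tags('<html lang="en"><meta name="keywords" content="a, b"></html>')` yields a dict with `html_lang` set to `"en"` and `meta_keywords` set to `"a, b"`.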
Don't keep:
- <meta name="robots" - Content isn't currently actionable and if we wanted to pursue it, we should first do more with the robots.txt
- <meta name="article:section" - Low adoption by agencies
- <meta name="article:tag" - Low adoption by agencies
- <meta name="dcterms.keywords" - Low adoption by agencies
- <meta name="dc.subject" - Low adoption by agencies
- <meta name="dcterms.subject" - Low adoption by agencies
- <meta name="dcterms.audience" - Low adoption by agencies
- <meta name="dc.type" - Low adoption by agencies
- <meta name="dcterms.type" - Low adoption by agencies
- <meta name="dc.date" - Low adoption by agencies
- <meta name="dc.date.created" - Low adoption by agencies
- <meta name="dcterms.created" - Low adoption by agencies
- <meta property="og:locale" - Decent adoption but almost universally just an indicator of EN or EN_US. Only 4 records are otherwise.
- <meta property="og:site_name" - Good adoption but not clear that it's central to OG implementation and the information appears likely to be duplicative with e.g. page title.
- <meta property="og:image:alt" - Low adoption by agency and not clear that it's necessarily an important component of accessibility
- <meta name="revised" - Low adoption by agency
- <meta http-equiv="last-modified" - Low adoption by agency
- <meta name='language' - Low adoption and almost universally just an indicator of EN or EN_US. Only 3 records are otherwise.
- <meta name='date' - Low adoption by agency
- <meta name='subject' - Low adoption by agency
- <meta name='owner' - Low adoption by agency
- <meta name='pagename' - Low adoption by agency
- <meta name='DC.title' - Low adoption by agency
- <meta name='og:site_name' - Low adoption by agency
- <link rel="me" - Low adoption by agency
- itemtype="" - Mainly scanned to determine schema.org adoption levels
- itemscope="" - Mainly scanned to determine schema.org adoption levels
- itemprop="" - Mainly scanned to determine schema.org adoption levels
- vocab="" - Mainly scanned to determine schema.org adoption levels
- typeof="" - Mainly scanned to determine schema.org adoption levels
- property="" - Mainly scanned to determine schema.org adoption levels
- context="" - Mainly scanned to determine schema.org adoption levels
- type="" - Mainly scanned to determine schema.org adoption levels
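The reasoning for dropping `og:locale` and `<meta name='language'` rests on counting how many values are something other than EN or EN_US. Here is a minimal sketch of that kind of check; the function name and the sample values are made up for illustration and are not taken from the scan data.

```python
def non_english_values(values):
    """Return the values that are not some spelling of en or en_US."""
    english = {"en", "en_us"}
    others = []
    for value in values:
        if not value:
            continue  # skip missing or empty values
        normalized = value.strip().lower().replace("-", "_")
        if normalized not in english:
            others.append(value)
    return others


# Hypothetical sample, not real scan output.
sample = ["en_US", "EN", "en-us", "es_ES", None, "en_US"]
print(non_english_values(sample))  # ['es_ES']
```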
gbinal commented
This is done. I need to add links to this data in the documentation before closing it though.