A very, very immature tool for extracting semantic data from web pages, primarily targeting the browser, with no external runtime dependencies.
Include it on your page:
<script src="woven.min.js"></script>
Or `require` it in Node/io.js:
var woven = require("woven")
All methods take an `HTMLDocument` or `HTMLElement` as the first argument. You can get hold of one in the browser:
// as the main document global variable
var doc = document
// with any `getElement-*` method
var elem = document.getElementById("some-element")
// by parsing some HTML
var parser = new DOMParser()
var html = "<meta name='a' content='b'>"
var docFragment = parser.parseFromString(html, "text/html")
// with some help (e.g. jQuery)
var elem = $("#some-element")[0]
In Node/io.js, you can use jsdom (or similar):
var jsdom = require("jsdom")
var html = "<meta name='a' content='b'>"
var docFragment = jsdom.jsdom(html)
Schema.org Data
Given a `docFragment`:
<div itemscope itemtype="http://data-vocabulary.org/Person">
My name is <span itemprop="name">Bob Smith</span>,
but people call me <span itemprop="nickname">Smithy</span>.
Here is my homepage:
<a href="http://www.example.com" itemprop="url">www.example.com</a>.
I live in
<span itemprop="address" itemscope
itemtype="http://data-vocabulary.org/Address">
<span itemprop="locality">Albuquerque</span>,
<span itemprop="region">NM</span>
</span>
and work as an <span itemprop="title">engineer</span>
at <span itemprop="affiliation">ACME Corp</span>.
</div>
woven.extractSchemaItems(docFragment) // =>
[ { itemtype: 'http://data-vocabulary.org/Person',
name: 'Bob Smith',
nickname: 'Smithy',
url: 'www.example.com',
address:
{ itemtype: 'http://data-vocabulary.org/Address',
locality: 'Albuquerque',
region: 'NM' },
title: 'engineer',
affiliation: 'ACME Corp' } ]
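The result is a plain array of objects, so it can be filtered and traversed with ordinary JavaScript. As a minimal sketch, assuming the output shape shown above (`itemsOfType` is a hypothetical helper, not part of woven):

```javascript
// This array mirrors the extractSchemaItems output shown above.
var items = [
  { itemtype: "http://data-vocabulary.org/Person",
    name: "Bob Smith",
    nickname: "Smithy",
    url: "www.example.com",
    address: {
      itemtype: "http://data-vocabulary.org/Address",
      locality: "Albuquerque",
      region: "NM"
    },
    title: "engineer",
    affiliation: "ACME Corp" }
]

// Hypothetical helper (not part of woven): keep only top-level items
// whose itemtype matches exactly.
function itemsOfType(items, itemtype) {
  return items.filter(function (item) {
    return item.itemtype === itemtype
  })
}

var people = itemsOfType(items, "http://data-vocabulary.org/Person")
console.log(people[0].name)             // "Bob Smith"
console.log(people[0].address.locality) // "Albuquerque"
```

Nested items such as `address` are ordinary nested objects, so they are only found here through their parent item.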
Given a page:
<html>
<head>
<meta name="title" content="I Am a Teapot">
<meta name="keywords" content="self being vessel">
<meta property="og:title" content="I Am a Teapot">
<meta property="not-real-property" content="418">
</head>
<body>
<h1>I Am a Teapot</h1>
</body>
</html>
woven.extractDocumentMeta(document) // =>
{ title: 'I Am a Teapot',
keywords: 'self being vessel',
'og:title': 'I Am a Teapot',
'not-real-property': '418' }
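To make the mapping from tags to keys concrete, here is a rough sketch of how such a lookup could be built. It is not woven's actual source: `collectMeta` and `fakeMeta` are hypothetical names, and the rule "use `name`, else `property`, as the key" is an assumption inferred from the example output above.

```javascript
// Hypothetical sketch, not woven's implementation.
// `metaElements` is any array-like of objects exposing getAttribute(),
// e.g. document.getElementsByTagName("meta") in a browser.
function collectMeta(metaElements) {
  var result = {}
  for (var i = 0; i < metaElements.length; i++) {
    var el = metaElements[i]
    // Assumption: the key comes from `name`, falling back to `property`.
    var key = el.getAttribute("name") || el.getAttribute("property")
    var value = el.getAttribute("content")
    if (key && value !== null) result[key] = value
  }
  return result
}

// Stand-in for a DOM element so the sketch runs without a browser.
function fakeMeta(attrs) {
  return {
    getAttribute: function (attrName) {
      return Object.prototype.hasOwnProperty.call(attrs, attrName)
        ? attrs[attrName]
        : null
    }
  }
}

var meta = collectMeta([
  fakeMeta({ name: "title", content: "I Am a Teapot" }),
  fakeMeta({ name: "keywords", content: "self being vessel" }),
  fakeMeta({ property: "og:title", content: "I Am a Teapot" }),
  fakeMeta({ property: "not-real-property", content: "418" })
])
console.log(meta["og:title"]) // "I Am a Teapot"
```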
Microformat support is a work in progress. So far only hAudio is covered. (Look, it's the one I needed.)
Given a `docFragment`:
<div class="haudio">
<span class="fn">Start Wearing Purple</span> by
<span class="contributor">
<span class="vcard">
<span class="fn org">Gogol Bordello</span>
</span>
</span>
found on
<span class="album">Underdog World Strike</span>
</div>
woven.extractHAudio(docFragment) // =>
[ { fn: 'Start Wearing Purple',
contributor: 'Gogol Bordello',
album: 'Underdog World Strike' } ]
Tests are written in Mocha:
$ mocha test/*
Or, to re-run them automatically on changes:
$ mocha --watch test/*
$ gulp
Builds the Browserified version, a minified version of it, and the corresponding source map.
- `extractAll` method
- `extractMicroformats` method
- More individual microformats
- Meaningful breakdown of common `meta` tags
- Interface for page meta with fallthrough for values
- More real-world example tests
- Browser-based tests?
- Visualizer?
- Commandline tool?
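
The "fallthrough for values" item in the list above does not exist yet. One way it might look, as a sketch only: `firstMeta` is a hypothetical name, and the input is assumed to be the plain object returned by `extractDocumentMeta`.

```javascript
// Hypothetical helper (not part of woven): return the value of the first
// key in `keys` that holds a non-empty string, or undefined if none does.
function firstMeta(meta, keys) {
  for (var i = 0; i < keys.length; i++) {
    var value = meta[keys[i]]
    if (typeof value === "string" && value !== "") return value
  }
  return undefined
}

// Same shape as the extractDocumentMeta output shown earlier.
var meta = {
  title: "I Am a Teapot",
  keywords: "self being vessel",
  "og:title": "I Am a Teapot",
  "not-real-property": "418"
}

console.log(firstMeta(meta, ["og:title", "title"]))             // "I Am a Teapot"
console.log(firstMeta(meta, ["og:description", "description"])) // undefined
```

Passing the keys in priority order keeps the policy (which tag wins) with the caller rather than hard-coding it in the library.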