/prism

A Ruby microformat parser and HTML toolkit powered by Nokogiri

Primary LanguageRubyMIT LicenseMIT

Prism

Ruby microformat parser and HTML toolkit

RDoc | Gem | Metrics | Microformats.org

What Prism is:

  • A robust microformat parser
  • A command-line tool for parsing microformats from a url or a string of markup
  • A DSL for defining semantic markup patterns
  • Export microformats to other standards:
    • hCard => vCard

It is your lowercase semantic web friend.

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).

Learn more about Microformats at http://microformats.org.

Usage

The command line tool takes a SOURCE from the Standard Input or as an argument:

$: curl http://markwunsch.com | prism --hcard > ~/Desktop/me.vcf

OR

$: prism --hcard http://markwunsch.com > ~/Desktop/me.vcf

Installation

With Ruby and Rubygems:

gem install prism

Or clone the repository and run bundle install to get the development dependencies.

Requirements:

Microformats supported (right now, as of this very moment)

More on the way.

Finding Microformats:

# All microformats
Prism.find 'http://foobar.com'

# A specific microformat
Prism.find 'http://foobar.com', :hcard

# Search HTML too
Prism.find big_string_of_html

Parsing Microformats:

twitter_contacts = Prism.find 'http://twitter.com/markwunsch', :hcard
me = twitter_contacts.first
me.fn
#=> "Mark Wunsch"
me.n.family_name
#=> "Wunsch"
me.url
#=> ["http://markwunsch.com/"]
File.open('mark.vcf','w') {|f| f.write me.to_vcard }
## Add me to your address book!	

POSH DSL

The Prism module defines a group of methods to search, validate, and extract nodes out of a Nokogiri document.

All microformats inherit from Prism::POSH, because all microformats begin as POSH formats. If you wanted to create your own POSH format, you'd do something like this:

class Navigation < Prism::POSH
	search {|document| document.css('ul#navigation') }
	# Search a Nokogiri document for nodes of a certain type
	
	validate {|node| node.matches?('ul#navigation') }
	# Validate that a node is the right element we want
	
	has_many :items do
		search {|doc| doc.css('li') }
	end
	# has_many and has_one define properties, which themselves inherit from
	# Prism::POSH::Base, so you can do :has_one, :has_many, :search, :extract, etc.
end

Now you can do:

nav = Navigation.parse_first(document) 
# document is a Nokogiri document. 
# parse_first extracts just the first example of the format out of the document

nav.items
# Returns an array of contents
# This method comes from the has_many call up above that defines the :items property

Other Microformat parsers

  • Mofo is a Ruby microformat parser backed by Hpricot.
  • Sumo is a JavaScript microformat parser.
  • Operator is a Firefox extension.
  • hKit is a microformat parser for PHP.
  • Oomph is a microformat toolkit add-in for Internet Explorer.

Feature wishlist:

  • HTML outliner (using HTML5 sectioning)
  • HTML5 article, time, etc POSH support
  • Extensions so you can do something like: String.is_a_valid? :hcard in your tests
  • Extensions to turn Ruby objects into semantic HTML. Hash.to_definition_list, Array.to_ordered_list, etc.

TODO:

  • Code is ugly. Especially XOXO.
  • Better recursive parsing of trees. See above.
  • Tests are all kinds of disorganized. And slow.
  • Broader support for some of the weirder Patterns, like object[data]
  • Man pages (see Ron)

License

Prism is licensed under the MIT License and is Copyright (c) 2010 Mark Wunsch.