url-info-scraper

Utility for retrieving a small amount of metadata (page title, MIME type, favicon URL) from a URL

Install

$ npm install --save url-info-scraper

Usage

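var urlInfoScraper = require('url-info-scraper');

urlInfoScraper('http://en.wikipedia.org/wiki/Wikipedia', function(error, linkInfo) {
  if (error) {
    // The lookup failed; don't rely on linkInfo in this case
    console.error(error);
    return;
  }
  var title = linkInfo.title; // e.g. 'Wikipedia - Wikipedia, the free encyclopedia'
  console.log(title);
});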
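Because the callback follows Node's usual (error, result) convention, it should be possible to wrap it with Node's built-in util.promisify and use it with async/await. This is a minimal sketch, assuming the exported function takes exactly (url, callback) as in the example above; toPromise and getTitle are hypothetical helper names, not part of the package.

```javascript
const { promisify } = require('util');

// Hypothetical helper: turns a callback-style scraper, called as
// scraper(url, function (error, linkInfo) { ... }), into one that returns a Promise.
function toPromise(scraper) {
  return promisify(scraper);
}

// Hypothetical example: resolve to the page title, or null when the
// link is not a valid web resource. Errors reject the returned Promise.
async function getTitle(scraper, url) {
  const linkInfo = await toPromise(scraper)(url);
  return linkInfo.isWebResource ? linkInfo.title : null;
}
```

With the package installed, usage would look like `getTitle(require('url-info-scraper'), 'http://en.wikipedia.org/wiki/Wikipedia').then(console.log)`.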

The callback's second argument (linkInfo in the example above) is an object with the following properties:

{
  isWebResource: boolean, // true if the link is valid
  title: string,          // title of the requested page
  mime: string,           // Content-Type header of the response, e.g. image/jpeg
  parsable: boolean,      // false if the content type is an 'application' type (e.g. application/pdf)
  tooLarge: boolean,      // true if the response body is larger than 5 MB
  faviconUrl: string      // URL of the root site's favicon, or null if none was found
}
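A client might combine these flags to decide how to present a link, for example in a link-preview UI. The sketch below is illustrative only: describeLink and its return values are hypothetical, and it relies solely on the fields documented above.

```javascript
// Hypothetical helper: pick a display strategy for a link from the
// linkInfo object passed to the url-info-scraper callback.
function describeLink(url, linkInfo) {
  if (!linkInfo || !linkInfo.isWebResource) {
    // Invalid link: show the raw URL only.
    return { kind: 'plain', text: url };
  }
  if (linkInfo.tooLarge || !linkInfo.parsable) {
    // Large or non-parsable bodies (e.g. downloads): label with the MIME type.
    return { kind: 'file', text: url + ' (' + (linkInfo.mime || 'unknown type') + ')' };
  }
  if (linkInfo.mime && linkInfo.mime.startsWith('image/')) {
    // Images can be previewed inline.
    return { kind: 'image', text: url };
  }
  // Regular page: prefer the title, and include the favicon when one was found.
  return {
    kind: 'page',
    text: linkInfo.title || url,
    icon: linkInfo.faviconUrl || null
  };
}
```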

Todo

  • Rewrite tests to use mocked resources instead of real ones
  • Favicon support
  • "Best image" support
  • Store additional metadata (response time etc.)
  • Screenshots
  • ...?

License

MIT © Paul Cleary