Data extraction, identification, and shaping functionality. This repository is for the REST-based interfaces to xSkrape.com, which offers these parsing services in a hosted environment.

Primary language: C#. License: MIT.

xSkrape.APIWrapper.REST

xSkrape provides data parsing for structured, semi-structured, and unstructured data sources. Extract tabular and discrete data from sources with minimal coding. Interact with HTML, JSON, XML, CSV, Excel, and other sources over HTTP/HTTPS using simple directives. Pull data from Google Docs, shape data from web APIs, merge data over multiple requests, and more.

This assembly interacts with the Web API services offered at www.xskrape.com. Most functionality requires a client key, which you can obtain by creating a free account at xskrape.com, confirming your email address, and visiting the Queries page under My Account. You receive free credits each month, so most light to moderate usage is free. Heavier usage that exceeds your free credit limit can be covered by purchasing additional usage credits. More details can be found here: https://www.xskrape.com/home/faq. Examples of usage can be found here: https://github.com/codexguy/xSkrape.APIWrapper.REST/blob/master/xSkrape.APIWrapper.REST.Sample/RESTExamples.cs.

One example pulls tabular data from an HTML source, in this case a spreadsheet published in Google Docs:

var result = await xSkrapeREST.GetDataTable(CLIENT_KEY, "https://docs.google.com/spreadsheets/d/1r_gYGu8nawdIk7wpUrbL1evCqE0eygC-TZwVD9ViS-o/edit?usp=sharing", "columnname=Name");

Note that a single line of code fully expresses where the data is, along with a hint about what it looks like (a column titled "Name"). As a second example:

var url = "http://www.ndbc.noaa.gov/data/latest_obs/46042.rss";
// Each entry maps a result name to the directive that locates its value.
Dictionary<string, string> queries = new Dictionary<string, string>()
{
  { "name", "firstelement=title" },
  { "windspeed", @"numberfollowsnear=Wind\ Speed" },
  { "winddir", @"followinginnertext=Wind\ Direction" },
  { "pubdate", "xpath=/rss[1]/channel[1]/pubDate[1]/text()" }
};
var result = await xSkrapeREST.GetMultiple(CLIENT_KEY, url, queries);

Here we're pulling four discrete values from a single page source using four different approaches. The last approach, an XPath expression, works for virtually all pages, even malformed HTML. However, the simplest approach is to use matching terms such as "numberfollowsnear", which are easy to understand and use. We also have a tool that suggests how to extract values and tables: see https://www.xskrape.com/Home/XSPageExplorer.
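
To see what the XPath directive in the example selects, here is a minimal local sketch that evaluates the same expression with the standard System.Xml classes. It does not call xSkrape and says nothing about how the hosted service evaluates directives. The XPathDemo class, the SelectText helper, and the sample RSS fragment are all made up for illustration.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Made-up RSS fragment for illustration only; this is not real NDBC data.
    public const string SampleRss =
        "<rss version=\"2.0\"><channel>" +
        "<title>Sample Station</title>" +
        "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>" +
        "</channel></rss>";

    // Hypothetical helper: returns the text of the first node matching
    // the XPath expression, or an empty string when nothing matches.
    public static string SelectText(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var node = doc.SelectSingleNode(xpath);
        if (node == null)
        {
            return "";
        }
        // Text nodes carry their content in Value; elements expose it via InnerText.
        return node.Value ?? node.InnerText;
    }

    public static void Main()
    {
        // Same expression as the "pubdate" entry, without the "xpath=" prefix.
        Console.WriteLine(SelectText(SampleRss, "/rss[1]/channel[1]/pubDate[1]/text()"));
    }
}
```

Run against the sample fragment, the expression resolves to the text inside the first pubDate element of the first channel. The positional [1] predicates pin each step to the first matching element.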

Because the REST library wraps what's available on xskrape.com, it can also send emails and SMS messages without requiring your own SMTP server, and generate random data as described here: https://www.xskrape.com/Home/ObfuscationPatterns

Looking for a feature or have a cool idea? Drop us a line at admin@codexframework.com.