/xll_inet

Win Inet for Excel

Primary LanguageC++

WinInet for Excel, or what WEBSERVICE wants be when it grows up.

A limitation of WEBSERVICE is that it returns a string. Strings in Excel are limited to 32767 = 215 - 1 characters. Most web pages are larger than that. Much larger. The function \URL.VIEW(url) returns a handle to all the characters returned from the URL. It uses InternetOpenUrl , InternetReadFile , and a memory mapped file to buffer data to memory.

VIEW

The function \URL.VIEW(url) reads url and returns a view of the characters returned. Use VIEW(view, offset, length) to return the characters in view. The default value of offset is 0 and if length is not specified then all characters from the offset are returned. The first dozen or so characters let you identify what type of document was returned.

HTML/XML

This library uses libxml2 for HTML/XML parsing and XPath. Install libxml2 using vcpkg. This is what FILTERXML wants to be when it grows up.

Call \HTML.DOCUMENT(view) to get a handle to an HTML document containing the parsed view. The function \XML.DOCUMENT(view) is also provided for parsing XML documents. A XML document is an ordered tree of nodes. Every XML document has a root node. All nodes except the root node have a unique parent node. Nodes having the same parent are siblings and are the children of the parent. The node ordering is called document order.

Nodes have a type, name, path, and content. The most common node types are element, attribute, and text. Element nodes have the form <name attribute*>content</name> where attribute has the form key="value" and * indicates zero or more occurences. Elements with no content can be abbreviated as <name attribute* /> The path is the node name and list of parents names to the root.

Use XML.DOCUMENT.ROOT(doc) to get the root node for either type of document. In addition to XML.NODE.TYPE, XML.NODE.NAME, XML.NODE.PATH, and XML.NODE.CONTENT, the function XML.NODE.ATTRIBUTES(node) will return a two row range of all node attributes with keys in the first row and values in the second. The functions XML.NODE.NEXT(node, type) and XML.NODE.PREV(node, type) find the first sibling node following or preceding node of the given type. If type is missing the next node is returned. The function XML.NODE.CHILD(node, type) finds the first child of node with type and XML.NODE.PARENT(node) returns the unique parent of node. Node types are defined in the XML_*_NODE() enumeration. These functions can be used to (tediously) traverse a document node-by-node. The functions XML.NODE.SIBLINGS(node) and XML.NODE.CHILDREN(node) return handles to all sibling and children nodes that have type XML_NODE_ELEMENT(). Element nodes are usually what you want when traversing a document.

XPath

XPATH.QUERY(doc, query) returns all nodes of doc matching query. A simple way to get a full picture of the result of a URL query is to call \URL.VIEW(url), \XML.DOCUMENT(view), XPATH.QUERY(document, "//*"), then call XML.NODE.* functions to get types, names, paths, and content.

CSV

Comma separated strings are parsed into 2-dimensional ranges by CSV.PARSE(view, rs, fs, esc, offset, count) where rs is the record separator, fs is the field separator, esc is the escape character, offset is the number of initial lines to skip, and count is the number of rows to return. The default record separator is comma (','), field separator is newline ('\n'), and escape character is (''). The default offset is 0 and if count is missing then all lines are returned. All range elements are returned as strings.

Use CSV.CONVERT(range, types, index) to convert columns specified by (0-based) index into corresponding types from the TYPE_* enumeration.

JSON

JSON strings are parsed using JSON.PARSE into values. Objects are parsed into two row arrays where the first row contains the keys and the second row contains their corresponding values. Arrays are parsed into one row ranges. Use JSON.TYPE(value) to detect if the value is an object, array, string, number, boolean, or null.

The function JSON.INDEX(value, index) retrieves values from a JSON value. If value is an object and index is a string this is equivalent to HLOOKUP(index, object, 2, FALSE). If value is an array and index is a number this is equivalent to INDEX(array, index + 1). The index can be an array of keys and will lookup values recursively. The index can also be specifed in dotted jq style.

Unfiled

http://worldtimeapi.org/pages/schema
http://worldtimeapi.org/api/timezone/Europe/London.txt
https://www.wikidata.org/wiki/Wikidata:Data_access
Each item or property has a persistent URI that you obtain by appending its ID (such as Q42 or P31) to the Wikidata concept namespace: http://www.wikidata.org/entity/ Linked data clients would receive Wikidata's data about the entity in a different format such as JSON or RDF, depending on the HTTP Accept: header of their request. For cases in which it is inconvenient to use content negotiation (e.g. to view non-HTML content in a web browser), you can also access data about an entity in a specific format by extending the data URL with an extension suffix to indicate the content format that you want, such as .json, .rdf, .ttl, .nt or .jsonld

https://www.w3.org/TR/rdf11-concepts/

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Berlin&props=descriptions&languages=en&format=json