Tarql is a command-line tool for converting CSV files to RDF using SPARQL 1.1 syntax. It's written in Java and based on Apache ARQ.
In Tarql, the following SPARQL query:
CONSTRUCT { ... }
FROM <file:table.csv>
WHERE {
...
}
is equivalent to executing the following over an empty graph:
CONSTRUCT { ... }
WHERE {
VALUES (...) { ... }
...
}
In other words, the CSV file's contents are input into the query as a table of bindings. This allows manipulation of CSV data using the full power of SPARQL 1.1 syntax, and in particular the generation of RDF using CONSTRUCT
queries. See below for more examples.
Get the latest releases from here: http://lab.linkeddata.deri.ie/2013/tarql/
For Unix, the executable script is bin/tarql
. For Windows, bin\tarql.bat
. Example invocations:
tarql --help
to show full command line helptarql my_mapping.sparql input1.csv input2.csv
to translate two CSV files using the same mappingtarql --header my_mapping.sparql input1.csv
to force use of column headers as variable namestarql my_mapping.sparql
to translate a CSV file defined in SPARQLFROM
clausetarql --test my_mapping.sparql
to show only the CONSTRUCT template, variable names, and a few input rows (useful for CONSTRUCT query development)`
The input CSV file can be specified using FROM
or on the command line. Use FROM <file:filename.csv>
to load a file from the current directory.
Tarql auto-detects the input CSV file's encoding. This is a guess and may fail sometimes.
Variable names: By default, variable ?a
will contain values from the first column, ?b
from the second, and so on.
If the CSV file already has column headings in the first row, Tarql can be instructed to use the column headings as variable names by appending OFFSET 1
to the SPARQL query. In standard SPARQL, this has the effect of skipping the first row. Tarql will take it as a sign to use the first row as variable names. In the variable names, spaces will be replaced with underscores. If any column doesn't contain a valid variable name, then the default name (?a
, ?b
, etc.) will be used for the column.
All-empty rows are skipped automatically.
Syntactically, a Tarql mapping is one of the following:
- A SPARQL 1.1 SELECT query
- A SPARQL 1.1 ASK query
- One or more consecutive SPARQL 1.1 CONSTRUCT queries, where
PREFIX
andBASE
apply to any subsequent query
In Tarql mappings with multiple CONSTRUCT
queries, the triples generated by previous CONSTRUCT
clauses can be queries in subsequent WHERE clauses to retrieve additional data. When using this capability, note that the order of OPTIONAL
and BIND
is significant in a SPARQL query.
Tarql supports a magic ?ROWNUM
variable for accessing the number of the row within the input CSV file. Numbering starts at 1, and skips empty rows.
SELECT ?First_name ?Last_name ?Phone_number
WHERE { ... }
OFFSET 1
The OFFSET 1
indicates that the first row is to be used to provide variable names.
SELECT DISTINCT ?id ?name
FROM <file:filename.csv>
WHERE {}
LIMIT 100
OFFSET 1
Note that using OFFSET 1 we could not start by a specific offset, and we have to add a FILTER.
SELECT ...
WHERE { FILTER (BOUND(?d)) }
SELECT ...
WHERE {
BIND (URI(CONCAT('http://example.com/ns#', ?b)) AS ?uri)
BIND (STRLANG(?a, 'en') AS ?with_language_tag)
}
CONSTRUCT {
?URI a ex:Organization;
ex:name ?NameWithLang;
ex:CIK ?CIK;
ex:LEI ?LEI;
ex:ticker ?Stock_ticker;
}
FROM <file:companies.csv>
WHERE {
BIND (URI(CONCAT('companies/', ?Stock_ticker)) AS ?URI)
BIND (STRLANG(?Name, "en") AS ?NameWithLang)
}
OFFSET 1
...
WHERE {
BIND (URI(CONCAT('companies/', STR(?ROWNUM))) AS ?URI)
}
...
Get the code from GitHub: http://github.com/cygri/tarql
Tarql uses Maven. To create executable scripts for Windows and Unix in /target/appassembler/bin/tarql
:
mvn package appassembler:assemble
Otherwise it's standard Maven.
- Set base URI from command line
- Choice of output format, writing to file, etc.
- Extract and document a proper API with TarqlQuery, TarqlExecution, etc.
- Allow feeding of triples from RDF files into the mapping process
- Experiment with input files in TSV, different quoting styles, etc.
- Allow overriding of encoding on command line
- Optionally generate VoID/PROV triples
- Find a way of producing consistent blank nodes from strings (esp. in multiple CONSTRUCT clauses)
- Find a way of using the table multiple times, e.g.:
{ { SELECT ?author1 { TABLE } } UNION { SELECT ?author2 { TABLE } } }
- Handle tables where repeated values within a column are omitted, maybe tarql:valueFromPreviousRow(?a))
- Make a proper homepage and set it up at http://tarql.deri.ie/
- Support Excel files?
- Find a way of streaming the CSV parser output into ARQ
- Read input from stdin?
- Web service?
- Get this into ARQ!?