tql: A Python repository from dwelch91

tql - Read, Modify, and Generate Tabular Data

Inspired by theq tool (https://harelba.github.io/q/)...

Features

Support for reading of CSV, TSV, HTML, JSON, LTSV, Markdown, MediaWiki, Excel, etc. formats
Extensive support for data pre-filtering, conversions, etc. of input data before it is added to database
Full support for all of SQLite features
Support for writing in CSV, TSV, Excel, LaTex, Markdown, JSON, rst, TOML, etc. formats
Create database, add data to existing database, or use in-memory database
Table name remapping
Column name remapping
Automatic or user specified headers/column names
Full Python 3.5+ support (sorry, no Python 2.x support)
MIT license

Planned Features

Input from stdin
Regex-based filters
Locale based filters
Encodings support
CSV column merging and splitting
Modification queries (TBD)
Callable API

Installation

Documentation

Usage

In a nutshell:

tql "<SQL>" [options]

Where the <SQL> is standard SQLite compatible SQL with the following modifier - when loading CSV data into a table, use the format @<filename> (for <filename>s w/o spaces), @"<filename>" or @'<filename>' in the place of table names in FROM clause(s).

Example:

tql "SELECT filename, size FROM @./data.csv WHERE size > 1024 SORT BY size DESC;" --auto-filter

Detailed Usage

usage: tql [-h] [--input-format {csv, json}] [--skip-lines SKIP_LINES]
           [--input-delimiter INPUT_DELIMITER]
           [--input-encoding INPUT_ENCODING] [--headers HEADERS]
           [--filter FILTER] [--auto-filter] [--remap-column REMAP_COLUMN]
           [--remap-table REMAP_TABLE] [--save-db SAVE_DB | --load-db LOAD_DB]
           [--output OUTPUT] [--output-format {csv, table, md, markdown}]
           [--output-delimiter OUTPUT_DELIMITER]
           [--output-quotechar OUTPUT_QUOTECHAR] [--aws-profile AWS_PROFILE]
           [--gcp-profile GCP_PROFILE] [--debug] [--filters-list]
           [--replacements-list]
           [sql [sql ...]]
positional arguments:
  sql                   The SQL to execute. Use filenames surrounded by single
                        or double quotes to specify CSV sources instead of
                        existing tables in the FROM clause(s). You can use
                        [:...:] replacements for special characters (see
                        --help-filters for more information.
optional arguments:
  -h,  --help            show this help message and exit
  --input-format {csv, json},  --in-format {csv, json},  --in-fmt {csv, json},
  -f {csv, json}
                        Input format. Valid value are csv,  json. Default is
                        `csv`.
  --skip-lines SKIP_LINES,  --skip SKIP_LINES,  -k SKIP_LINES
                        Skip `SKIP_LINES` lines at the beginning of the file.
                        Default is 0.
  --input-delimiter INPUT_DELIMITER,  -d INPUT_DELIMITER
                        Specify the CSV delimiter to use. Default is a comma
                        (, ).
  --input-encoding INPUT_ENCODING
                        Specify the input file encoding. Defaults to 'utf8'.
  --headers HEADERS,  -r HEADERS
                        Don't use the first non-skipped line for header/column
                        names,  use these header/column names instead. Format
                        is a comma separated list of column names. Column
                        names must not be SQLite reserved words.
  --filter FILTER,  -e FILTER
                        Specify a column filter. Use one filter per
                        switch/param. Format is <column_name>|filter|<0 or
                        more params or additional filters in filter chain>.
                        Filters have a variable number of parameters (0+).
                        Filters may be chained.
  --auto-filter,  -a     Automatically apply the `num` filter to all column
                        data.
  --remap-column REMAP_COLUMN,  --remap-header REMAP_COLUMN,  -m REMAP_COLUMN
                        A single column re-map in the form
                        <col_name>=<new_col_name>. Use one switch for each
                        column re-mapping. This overrides any column/header
                        names that are auto-discovered or passed in via
                        --headers/-r. You can use [:...:] replacements for
                        special characters (see --help-filters for more
                        information.
  --remap-table REMAP_TABLE,  --remap-file REMAP_TABLE,  -T REMAP_TABLE
                        A single table re-map in the form
                        <table_name>=<new_table_name>. Use one switch for each
                        table re-mapping. This overrides any table names that
                        are auto-generated from filenames passed in via the
                        SQL statement. You can use [:...:] replacements for
                        special characters (see --help-filters for more
                        information.
  --save-db SAVE_DB,  -s SAVE_DB
                        Specify a SQLite database to use (instead of using an
                        in-memory database. The database will remain after tql
                        exits.
  --load-db LOAD_DB,  -l LOAD_DB
                        Load an existing database instead of creating a new
                        one.
  --output OUTPUT,  -o OUTPUT
                        Output file. Default is stdout (-).
  --output-format {csv, table, md, markdown},  --out-format {csv, table, md,
                        markdown},  --out-fmt {csv, table, md, markdown},  -F {csv, table, md, markdown}
                        Output format. Valid value are csv,  table,  md, 
                        markdown. Default is table.
  --output-delimiter OUTPUT_DELIMITER,  -D OUTPUT_DELIMITER
                        Specify the CSV delimiter to use for output. Default
                        is a comma (, ).
  --output-quotechar OUTPUT_QUOTECHAR,  -Q OUTPUT_QUOTECHAR,  --output-quote-
                        char OUTPUT_QUOTECHAR
                        Specify the CSV quote character for output. Default is
                        double quote (").
  --aws-profile AWS_PROFILE
  --gcp-profile GCP_PROFILE
  --debug,  -g           Turn on debug output.
  --filters-list,  --filter-list,  --help-filters
  --replacements-list,  --replacement-list,  --help-replacements

Data Filtering

Available Data Filters

Filter	Num. Params	Syntax**	In type*	Out type	Description
`abs`	0	`<col>\|abs`	`num`	`num`	Take the absolute value of a number.
`add`	1	`<col>\|add:<value>`	`num`	`num`	Add `<value>` to number.
`backticks`	0	`<col>\|backticks`	`str`	`str`	Wrap a string in backticks.
`capitalize`	0	`<col>\|capitalize`	`str`	`str`	Capitalize string.
`ceil`	0	`<col>\|ceil`	`num`	`num`	Return the ceiling value of a number.
`center`	1	`<col>\|center:<width>`	`str`	`str`	Center string in `<width>` spaces.
`datetime`	0	`<col>\|datetime`	`str`	`datetime`	Parse a datetime string.
`datetime_tz`	1	`<col>\|datetime_tz:<tz>`	`str`	`datetime`	Parse a datetime string with the specified `<tz>` timezone.
`dehumanize`	0	`<col>\|dehumanize`	`str`	`num`	Convert from human string to number.
`div`	1	`<col>\|div:<value>`	`num`	`num`	Divide number by `<value>`.
`dquotes`	0	`<col>\|dquote`	`str`	`str`	Wrap a string in double quotes.
`float`	0	`<col>\|float`	`str`	`float`	Convert to float.
`floor`	0	`<col>\|floor`	`num`	`num`	Return the floor value of a number.
`format`	1	`<col>\|format:<format>`	`str`	`str`	Format data using Python's `format()` function.
`humanize`	1	`<col>\|humanize:<unit>`	`num`	`str`	Format number to human string.
`int`	0	`<col>\|int`	`str`	`int`	Convert to integer.
`iso8601`	0	`<col>\|iso8601`	`datetime`	`str`	Convert a datetime to an ISO8601 string representation.
`length`	0	`<col>\|length`	`str`	`str`	Return the length of the string.
`ljust`	1	`<col>\|ljust:<width>`	`str`	`str`	Left justify string in `<width>` spaces.
`lower`	0	`<col>\|lower`	`str`	`str`	Convert string to lowercase.
`lstrip`	1	`<col>\|lstrip:<chars>`	`str`	`str`	Strip `<chars>` from the left end of the string.
`ltrim`	1	`<col>\|ltrim`	`str`	`str`	Strip whitespace characters from the left end of the string.
`mult`	1	`<col>\|mult:<value>`	`num`	`num`	Multiply number by `<value>`.
`num`	0	`<col>\|num`	`str`	`num`	Convert to integer or float.
`number`	0	`<col>\|number`	`str`	`num`	Convert to integer or float.
`ordinal`	0	`<col>\|ordinal`	`num`	`str`	Convert number to ordinal string.
`prefix`	1	`<col>\|prefix:<prefix>`	`str`	`str`	Prefix the string with `<prefix>`.
`replace`	2	`<col>\|replace:<from>,<to>`	`str`	`str`	Replace sub-string `<from>` with `<to>`.
`reverse`	0	`<col>\|reverse`	`str`	`str`	Reverse the characters in the string.
`rjust`	1	`<col>\|rjust:<width>`	`str`	`str`	Right justify string in `<width>` spaces.
`round`	1	`<col>\|round:<digits>`	`num`	`num`	Round number to `<digits>` digits.
`rstrip`	1	`<col>\|rstrip:<chars>`	`str`	`str`	Strip `<chars>` from the right end of the string.
`rtrim`	1	`<col>\|rtrim`	`str`	`str`	Strip whitespace characters from the right end of the string.
`squotes`	0	`<col>\|squote`	`str`	`str`	Wrap a string in single quotes.
`str`	0	`<col>\|str`	`any`	`str`	Convert to string.
`strftime`	1	`<col>\|strftime:<format>`	`datetime`	`str`	Format a datetime using `strftime()`.
`sub`	1	`<col>\|sub:<value>`	`num`	`num`	Subtract `<value>` from number.
`substr`	2	`<col>\|substr:<start>,<end>`	`str`	`str`	Return a sub-string.
`suffix`	1	`<col>\|suffix:<suffix>`	`str`	`str`	Suffix the string with `<suffix>`.
`swapcase`	0	`<col>\|swapcase`	`str`	`str`	Swap string case.
`thousands`	0	`<col>\|thousands`	`num`	`str`	Format number with thousands separators.
`title`	0	`<col>\|title`	`str`	`str`	Convert string to title case.
`trunc`	0	`<col>\|trunc`	`num`	`num`	Return the number truncated.
`tz`	1	`<col>\|tz:<tz>`	`datetime`	`datetime`	Convert a datetime to a new `<tz>` timezone.
`upper`	0	`<col>\|upper`	`str`	`str`	Convert string to uppercase.
`utc`	0	`<col>\|utc`	`datetime`	`datetime`	Convert a datetime to UTC.
`zfill`	1	`<col>\|zfill:<width>`	`str`	`str`	Zero fill string to `<width>` size.

Most filters that take numeric inputs will automatically apply the num filter to the column data prior to filtering. Filters can be chained together using the pipe (|) character. For example, c1|num|add|1|human The type of the data after the last filter has run will be the type that is added to the database.

Auto Filtering

Column Remapping

Table Remapping

Character Replacements

Sequence	Description
`[:space:]`	Space ( )
`[:pipe:]`	Pipe (\|)
`[:backslash:]`	Blackslash ()
`[:backtick:]`	Backtick (`)
`[:squote:]`	Single quote (')
`[:dquote:]`	Double quote (")
`[:tab:]`	Tab (\t)
`[:cr:]`	Carriage return (\r)
`[:newline:]`	Newline (\n)
`[:n:]`	Newline (\n)
`[:comma:]`	Comma (,)
`[:colon:]`	Colon (:)
`[:amp:]`	Ampersand (&)
`[:ampersand:]`	Ampersand (&)
`[:gt:]`	Greater than (>)
`[:lt:]`	Less than (<)

Save Database

Load Database

Output Formats

Use Cases

CSV -> filter/sort/aggregate/etc (SQL) -> CSV (or table)

Without -s/--save-db or -l/--load-db, tql will load the CSV into an in-memory database, perform the SQL query, and output the result in the chosen output format. If the data set is too large for memory, use -s/--save-db to force tql to use a on-disk database and will only then be limited by available disk space and/or SQLite database size limitations.

CSV -> filter/sort/aggregate/etc (SQL) -> database -> CSV (or table)

Using -s/--save-db will force tql to retain an on-disk database of the loaded CSV data. It will still produce results in the chosen output format. The database can then be used in subsequent -l/--load-db workflows or manipulated using any desired SQLite database tool.

CSV - database -> filter/sort/aggregate/etc (SQL) -> database -> CSV (or table)

Using the -l/--load-db switch will cause tql to load an existing database and load the CSV data into a new table in the database. It will still produce results in the chosen output format. The database can then be used in subsequent -l/--load-db workflows or manipulated using any desired SQLite database tool.

Database -> filter/sort/aggregate/etc (SQL) -> CSV (or table)

Using the -l/--load-db switch will cause tql to load an existing database. If using regular tables for the FROM clauses (ie, non-CSV files), the specified SQL will be run against the database and a CSV (or table) will produce results in the chosen output format.

FAQs

How do I change the order of the columns in the output?

Use SELECT with the column names listed out; the order which they appear will produce output columns in the same order.

How do I filter, sort, etc on a numeric column?

tql does not automatically convert CSV data into integers or floats. There are two options:

Coerce the data to either an integer or float using a filter (e.g., int) on the target column and then SQLite will be able to operate on the column as a numeric value for sorting, comparisons, etc.
Use the -a/--auto-filter switch that will try to convert all column data to numeric values. Data that cannot be converted to a numeric value will remain a string.