sql_to_ibis
is a Python package that translates SQL syntax into ibis expressions.
This provides the capability of using only one SQL dialect to target many different
backends
pip install sql_to_ibis
To use an ibis table in sql_to_ibis you must register it. Note that for joins or queries that involve more than one table you must use the same ibis client when creating both ibis tables. Once the table is registered you can query it using SQL with the query function. In the example below, we create and query a pandas DataFrame
import ibis.pandas
import pandas
import sql_to_ibis
df = pandas.DataFrame({"column1": [1, 2, 3], "column2": ["4", "5", "6"]})
ibis_table = ibis.pandas.Backend().from_dataframe(
df, name="my_table", client=ibis.pandas.connect({})
)
sql_to_ibis.register_temp_table(ibis_table, "my_table")
sql_to_ibis.query(
"select column1, cast(column2 as integer) + 1 as my_col2 from my_table"
).execute()
This would output a DataFrame that looks like:
column1 | my_col2 |
---|---|
1 | 5 |
2 | 6 |
3 | 7 |
The sql syntax for sql_to_ibis is as follows (Note that all syntax is case insensitive):
SELECT [{ ALL | DISTINCT }]
{ [ <expression> ] | <expression> [ [ AS ] <alias> ] } [, ...]
[ FROM <from_item> [, ...] ]
[ WHERE <bool_expression> ]
[ GROUP BY { <expression> [, ...] } ]
[ HAVING <bool_expression> ]
Example:
SELECT
column4,
Sum(column1)
FROM
my_table
WHERE
column3 = 'yes'
AND column2 = 'no'
GROUP BY
column4
Note that columns with spaces in them can be expressed using double quotes. For example:
SELECT
"my column 1",
column2 as "the second column"
FROM
my_table
<select_statement1>
{UNION [DISTINCT] | UNION ALL | INTERSECT [DISTINCT] | EXCEPT [DISTINCT] | EXCEPT ALL}
<select_statment2>
Example:
SELECT
*
FROM
table1
UNION
SELECT
*
FROM
table2
INNER, CROSS, FULL OUTER, LEFT OUTER, RIGHT OUTER, FULL, LEFT, RIGHT
SELECT
*
FROM
table1
CROSS JOIN
table2
SELECT
*
FROM
table1
JOIN
table2
ON table1.column1 = table2.column1
<set>
[ORDER BY <expression>]
[LIMIT <number>]
Example:
SELECT
*
FROM
table1
ORDER BY
column1
LIMIT 5
<aggregate>() OVER(
[PARTITION BY (<expresssion> [, <expression>...)]
[ORDER_BY (<expresssion> [, <expression>...)]
[ ( ROWS | RANGE ) ( <preceding> | BETWEEN <preceding> AND <following> ) ]
)
<preceding>: UNBOUNDED PRECEDING | <unsigned_integer> PRECEDING | CURRENT ROW
<following>: UNBOUNDED FOLLOWING | <unsigned_integer> FOLLOWING | CURRENT ROW
+, -, *, /
CASE WHEN <condition> THEN <result> [WHEN ...] ELSE <result> END
SUM, AVG, MIN, MAX
{RANK | DENSE_RANK} OVER([PARTITION BY (<expresssion> [, <expression>...])])
CAST (<expression> AS <data_type>)
<expression> is null
<expression> is not null
COALESCE(<expresssion> [, <expression>...])
- Anything in <> is meant to be some string
- Anything in [] is optional
- Anything in {} is grouped together
- VARCHAR, STRING
- INT16, SMALLINT
- INT32, INT, INTEGER
- INT64, BIGINT
- FLOAT16
- FLOAT32
- FLOAT, FLOAT64
- BOOL
- DATETIME64, TIMESTAMP
- CATEGORY
- OBJECT