DBIx::PreQL - less awful SQL generation through templates
use DBIx::PreQL ();
my $q = <<END_SQL;
* SELECT
& count(*), !total!
* name,
END_SQL
my $q = <<END_SQL;
* SELECT
& count(*), !total!
D name,
D height,
* FROM tbl_monkey
* WHERE
& AND barrel_id = ?barrel_id?
& AND name ILIKE '%' || ?monkey_name? || '%'
& AND color ?=monkey_color?
& AND ARRAY[type] <@ ?\@types? -- "IN"
& ORDER BY name !~total!
END_SQL
my $total = undef;
my $rows = $db->select_all( DBIx::PreQL->build_query(
query => $q,
data => {
barrel_id => 32,
monkey_color => \'NULL',
total => $total,
types => [ 'ape', 'chimp' ],
},
wanted => ! $total ? ['D'] : [], # Want 'D'ata if not $total
) );
# Runs the following with @params = ( 32, ['ape','chimp'] ):
# SELECT
# name,
# height
# FROM tbl_monkey
# WHERE
# barrel_id = ?
# AND color IS NULL
# AND ARRAY[type] <@ ? -- "IN"
# ORDER BY name
This module generates queries based on a query template, a hash of related data, and possibly a list of wanted tags or a function that determines which tags are wanted.
This templating system adds only a handful of concepts to standard SQL text.
The query template is processed on a line-by-line basis.
Each line consists of a tag followed by the body of the line. That is, each line is composed of: optional whitespace, a tag composed of one or more non-whitespace characters, a separator of one or more whitespace characters, and the remainder of the line is the embedded SQL (the body).
Each line body consists of plain SQL text and perhaps some named place-holders and/or dependency markers.
- Named place-holders
A named place-holder is a name bracketed by question marks, like
?key_name?
. Use named place-holders where you would normally use
place-holders (?
) in SQL (with DBI).
When a named place-holder is included in a query, it gets replaced by just
a question mark (?
, a regular DBI place-holder) and the named value gets
pushed onto the list of query parameters.
There are also special forms of place-holders (?=key_name?
, ?!key_name?
,
?@key_name?
, and ?"key_name?
) which we will describe later.
- Dependency markers
A dependency marker is a name bracketed by exclamation points, like
!key_name!
. Use dependency markers to indicate that a given line
should only be included in the generated query if the named value is
defined in the data hash.
When the template is processed, dependency markers are just removed (and nothing is added to the list of query parameters).
If a line has multiple dependency markers, then you can request that the line be included if any of them are defined or only if all of them are defined (in the data hash). See the Tags section for how this is done.
You can also use !~key_name!
to negate the dependency. Lines marked with
a negated dependency are only included if the named key is undefined (or
missing) from the data hash.
Each tag indicates how to decide whether to include the tagged line of SQL in the generated query.
Tags must be one or more non-whitespace characters.
There are several pre-defined tags that don't require the use of a 'wanted' list / function:
*
A tag of *
(asterisk) means always include this line. The skeleton of
your query will be lines starting with *
.
#
A tag of #
(pound sign) means never include this line. Yep, they are
just comments.
&
A tag of &
(ampersand) means include this line if we have data for
ALL named place-holders and ALL dependency markers. These lines are
the work-horse lines that will handle most of the dynamic query assembly.
You can also use a custom tag that starts with an ampersand (&
), like
'&TOT'
. For such tags, first we check that we have data for all named
place-holders and all dependency markers. If not, then the line is simply
excluded. Otherwise, the &
is stripped and the remainder of the tag
is treated as a custom tag and will be checked against your 'wanted' list /
function.
The tag is &
to match Perl's &&
operator since a line like
& LIMIT ?limit? !~total! !paged!
only gets included if
defined $data->{limit}
&& ! defined $data->{total}
&& defined $data->{paged}
|
A tag of |
(vertical bar) means include this line if we have data for
ANY dependency markers as well as for ALL named place-holders. These
lines will be less common.
Similar to &
, you can use a custom tag that starts with a vertical
bar (|
), like '|SUM'
. For such tags, first we check that we have data
for ANY dependency markers and for ALL named place-holders. If not, then the
line is simply excluded. Otherwise, the |
is stripped and the remainder
of the tag is treated as a custom tag and will be checked against your
'wanted' list / function.
The tag is a |
because of how it treats dependency markers. A line with
two dependency markers, say !tot!
and !~sum!
, only gets included if
defined $data->{tot} || ! defined $data->{sum}
and the |
tag was chosen to match the ||
operator in that expression.
(Named place-holders are treated the same as for the &
tag since having
an undefined or missing value for a place-holder would just be fatal.)
Any other block of characters is a custom tag. Whether to include a line marked by a custom tag or not is determined by a 'wanted' list / function. If you use custom tags, then you must supply a 'wanted' list / function (described later).
To catch accidental omission of a tag, tags that are common SQL keywords
(like 'FROM') or that end with a comma are fatally rejected (unless you
specify a 'known_tags'
list).
For the same reason, an empty SQL line is fatally rejected unless you used
the '#'
tag or also omitted the tag. So there is no way to include a
blank line in the generated SQL because we want to catch cases like:
query => [
'* SELECT',
' *', # Oops, left off the tag on this line
'* , CASE ... END AS ...',
'* FROM ...',
...
],
You can use "* --"
to include a nearly-blank (SQL comment) line in the
generated SQL.
- SELECT trailing-comma clean-up
When including a line of SQL that begins with the word FROM
(case
insensitive, ignoring white-space), we remove the last character of the
previous (included) line, if and only if it is a comma (,
).
So, please put a comma after the last value in your SELECT list (if you can
follow it by a line that starts with FROM
) in order to simplify editing
of the template.
No special provisions are made for handling trailing commas anywhere else in SQL.
- WHERE clause leading-AND clean-up
To greatly simplify the very common case of building a WHERE
clause from
a subset of several optional conditional expressions that should all be
separated by AND
, we can also remove an AND
that appears immediately
after a WHERE
.
Specifically, if we include a line of SQL that ends with the word WHERE
(case insensitive, ignoring white-space) and the next (included) line of SQL
begins with the word AND
(case insensitive, ignoring white-space), then we
will replace the AND
with spaces.
Alternatively, you can replace your 'WHERE'
with 'WHERE TRUE'
. You
should certainly do this if there is a chance that sometimes all of the
conditional expressions will be omitted. For example,
* SELECT some, stuff
* WHERE
& AND foo = ?foo?
& AND bar = ?bar?
* ORDER BY some
will result in an SQL syntax error if neither 'foo' nor 'bar' keys are present (and defined) in the data hash. But this can and should be prevented by instead writing:
* SELECT some, stuff
* WHERE TRUE
& AND foo = ?foo?
& AND bar = ?bar?
* ORDER BY some
?=key_name?
A special form of named place-holder includes an equals sign (=
) before
the key name. This place-holder does special handling for NULL
values.
To specify a NULL
value, use \'NULL'
as the associated value (a SCALAR
reference to the string 'NULL'
).
So, a template line like:
& AND affil_parent ?=parent?
will become (if the 'parent' key is defined) either:
AND affil_parent = ?
or
AND affil_parent IS NULL
The second case (where the place-holder is replaced by 'IS NULL'
) happens
if $data-
{parent}> is \'NULL'
(ignoring case and external whitespace).
For this case, the list of query parameters is not added to.
The first case (where the place-holder is replaced by '= ?'
) happens
if $data-
{parent}> is not a reference (but is defined). For this case,
$data-
{parent}> (or just 'parent') is pushed onto the list of query
parameters.
?!key_name?
A similar special form of named place-holder includes an exclamation point
(!
) before the key name. This place-holder similarly supports \'NULL'
as the associated value. But this place-holder represents "distinct from"
(the opposite meaning compared to ?=key_name?
). Note that actually using
"is distinct from" or "is not distinct from" in your template is discouraged
for these cases as the Postgres query optimizer can be hampered by such.
So, a template line like:
& AND affil_parent ?!parent?
will become (if the 'parent' key is defined) either:
AND affil_parent <> ?
or
AND affil_parent IS NOT NULL
The second case (where the place-holder is replaced by 'IS NOT NULL'
)
happens if $data-
{parent}> is \'NULL'
(ignoring case and external
whitespace). For this case, the list of query parameters is not added to.
The first case (where the place-holder is replaced by '<> ?'
)
happens if $data-
{parent}> is not a reference (but is defined). For
this case, $data-
{parent}> (or just 'parent') is pushed onto the list
of query parameters.
?@key_name?
A place-holder with an at sign (like ?@key_name?
) requires that the
associated value be an ARRAY reference but otherwise behaves identically
to a plain, named placed-holder. DBD::Pg will treat the array reference
as a Postgres array value.
There are a few gotchas with using Postgres array values and ?@key_name?
so let's give an example of typical usage. First, let's show the typical
case that one would end up replacing with a use of ?@key_name?
:
push @where, 'account_id IN (' . join(',',('?')x@sub_accts) . ')';
push @param, @sub_accts;
# ...
join( ' AND ', @where )
# ...
which ends up generating SQL that includes something like:
... AND account_id IN (?,?,?,?,?) AND ...
where the number of '?'
s matches the number of elements in @sub_accts
.
The SQL we want to generate to use a Postgres array value instead would be:
... AND ARRAY[account_id] <@ ? AND ...
ARRAY[account_id]
makes a Postgres array value containing a single value
(the value of account_id). <@
means "is contained in". And the ?
is a plain DBI place-holder.
So the new code would look something like:
$rows = $db->select_all( DBIx::PreQL->build_query( {
data => {
sub_accts => \@sub_accts,
# ...
},
query => [
# ...
"& AND ARRAY[account_id] <@ ?\@sub_accts?",
# ...
],
# ...
} ) );
Note how I had to put a backslash (\
) in front of the at sign in
?\@sub_accts?
because my SQL template string was enclosed in double
quotes. If I hadn't done that, the query would not have worked. Luckily,
build_query() would almost certainly have complained because that line had
a '&'
tag but no named place-holders.
Note that a reference to an empty array would mean that
ARRAY[account_id] <@ ?\@empty?
would never be true (it would generate an SQL syntax error using the old method). Sometimes that is what is wanted. Often, an empty array means that the condition should just be ignored. You can accomplish that easily as follows:
data => {
sub_accts => @sub_accts ? \@sub_accts : undef,
# ...
},
query => [
# ...
"& AND ARRAY[account_id] <@ ?\@sub_accts?",
# ...
],
?"key_name?
For the rare, complicated case, you can put a double quote before the key name in a named place-holder. This place-holder will be replaced by the string value associated with that key name in the data hash. So the string value should be a snippet of valid SQL.
For example:
& AND affil_parent IS ?"parent? NULL
could be combined with:
data => {
parent => $has_parent ? 'NOT' : '',
# ...
},
to only find items with a non-NULL affil_parent if $has_parent
is true and
vice versa.
The values in the 'data'
hash are usually expected to be strings (or maybe
numeric values). But some other types of values are also handled by this
templating system.
undef()
The named key being present but with an undefined value associated with it causes the templating system to act the same as if the key were not present.
\$sql
If the associated value for a named place-holder is a reference to a scalar,
then the referenced scalar is expected to contain a valid snippet of literal
SQL (similar to how AT::SQL and other helpers treat such SCALAR refs). The
named place-holder will be replaced with the literal SQL (not with '?'
)
and the list of query parameters will not be added to.
For ?=key_name?
, '= '
will also be inserted just prior to the literal
SQL snippet, unless the snippet is equivalent to NULL
, in which case it
will be preceded with 'IS '
instead. While for ?!key_name?
, '<
'>
will precede the snippet, except for NULL
which will be preceded by
'IS NOT '
.
For ?@key_name?
, a reference to a scalar is a fatal error.
For dependency markers, the value being a reference does not (currently) matter.
\'NULL'
A reference to a string of 'NULL'
(ignoring case and external whitespace)
is treated differently from a reference to some other snippet of SQL only for
the ?=key_name?
and ?!key_name?
place-holders (as documented elsewhere).
\@list
DBD::Pg can use ARRAY references to represent a Postgres array value, including in a query parameter. So it is good to allow an array reference as a data value. However, it is quite hard to imagine a spot in an SQL template where a Postgres array and a non-array value would both be equally valid.
So we require you to declare whether or not you expect the place-holder to
take an ARRAY reference. ?@key_name?
requires an array reference. Other
place-holders treat an array reference as a fatal error.
- Stringifier
If a data value is a reference to a blessed object that overloads stringification, then no special behavior is triggered. The object may be pushed onto the list of query parameters where it will likely later be stringified.
In such a case, the blessed object being a reference to a SCALAR or to an
ARRAY will be ignored. So, for example, a blessed reference to an ARRAY
that overloads stringification is a fatal error for a ?@key_name?
place-holder.
- Other references
Other types of references are treated as fatal errors by named place-holders. Dependency markers currently treat any kind of reference the same as a non-reference. But these behaviors should not be relied upon.
Future versions of this module may add additional special treatments for different types of references, including changing how dependency markers treat reference values.
The 'wanted'
argument can be a reference to an array containing just the
custom tags whose lines should be included in the generated query.
For truly complicated cases, the 'wanted'
argument can be a CODE reference
that is called for each custom tag. The associated line(s) will be included
in the generated query if and only if the sub returns a true value for that
tag.
A 'wanted'
function takes two arguments: a tag and the 'data'
hash-ref.
For example, here is one that includes lines for tags that are the same as any
(lower-case) 'data'
hash key having a defined value:
wanted => sub {
my( $tag, $data ) = @_;
return defined $data->{ lc $tag };
},
Here is a more complex example. Not that this is a great example of
a case where a 'wanted'
function is preferred over a simpler
'wanted'
list. But it is a good example of a relatively sane
'wanted'
function that also avoids the example being extremely complex.
my( $query, @params ) = DBIx::PreQL->build_query(
data => \%data,
query => [
"* SELECT",
"C count(*),",
"D m.name,",
"D m.height,",
"* FROM tbl_monkey AS m",
"T JOIN tbl_tree AS t USING( monkey_id )",
"* WHERE",
"&T AND t.height >= ?min_height?",
"&T AND t.bark = ?bark?",
"* AND barrel_id ?=barrel_id?",
"* AND m.name ILIKE '%' || ?monkey_name? || '%'",
"* AND m.color ?!skip_color?",
"D ORDER BY name",
],
known_tags => [qw< C D T >],
wanted => sub {
my( $tag, $data ) = @_;
return defined $data->{total}
if 'C' eq $tag; # 'C'ount
return ! defined $data->{total}
if 'D' eq $tag; # 'D'ata (not 'count')
return 0 < grep defined $data->{$_}, 'min_height', 'bark';
if 'T' eq $tag; # 'T'rees
die "Unknown tag ($tag)";
},
);
But note that this equivalent version is simpler in several respects:
my( $query, @params ) = DBIx::PreQL->build_query(
data => \%data,
query => [
"* SELECT",
"& count(*), !total!",
"D m.name,",
"D m.height,",
"* FROM tbl_monkey AS m",
"| JOIN tbl_tree AS t USING( monkey_id ) !bark! !min_height!",
"* WHERE",
"& AND t.height >= ?min_height?",
"& AND t.bark = ?bark?",
"* AND barrel_id ?=barrel_id?",
"* AND m.name ILIKE '%' || ?monkey_name? || '%'",
"* AND m.color ?!skip_color?",
"& ORDER BY name !~total!",
],
wanted => defined $total ? [] : ['D'],
known_tags => ['D'],
);
This is also the our only example that makes use of the '|'
tag.
Conditional generation of SQL is a problem with many bad solutions and no really good ones. This library attempts to offer a solid less-bad solution that keeps SQL near the surface and favors simplicity and readability over enforced correctness. If you want correctness, you will have to provide it yourself.
Don't get too fancy with your 'wanted' subroutines.
Pay attention to your AND
s, OR
s, and other joining words / characters.
NONE, but please include the empty list when you use
the module:
use DBIx::PreQL ();
so the fact that nothing is being imported is obvious to the person reading that code.
build_query is called as a class method with arguments in name/value pairs:
( $query, @params ) = DBIx::PreQL->build_query(
query => $template_string,
data => \%data,
wanted => \@wanted_tags,
known_tags => \@tag_list,
keep_keys => $boolean,
);
Most of the prior documentation covers the details of using this method, the only functionality provided by this module.
- query
Required. A string of several (tagged) lines that is your query template. You can also pass a reference to an array of lines.
- data
Virtually required. A reference to a hash whose keys can be referenced in your query template and whose values can end up in the resulting parameter list (or even incorporated into the generated SQL).
Technically, you don't have to provide a 'data'
argument. But not
providing one rather severely restricts your templating options and so
is a rather unlikely scenario.
- wanted
This argument is required if you use any custom tags. This argument is ignored otherwise.
Either 1) a reference to an array containing the list of (custom) tags whose lines should be included in the generated SQL.
Or 2) a reference to a subroutine that will be passed a $tag
and the
'data'
hash-ref and that returns a true value if lines having that $tag
should be included (and returns a false value if such lines should be
excluded).
- known_tags
Optional. A reference to an array of custom tag strings. Used to catch typos when composing custom tags. Finding a custom tag not in this list is a fatal error. Having a tag in this list that is never found is a warning.
- keep_keys
Optional. A boolean value. Unlikely to be used.
When true, place-holder names (the keys to the 'data'
hash-ref) are
what get pushed onto the query parameter list. When false (the default),
what gets pushed onto the query parameter list are the values from the
'data'
hash-ref.
Returns a string which is the generated SQL query and a list of query parameters to be used with that query.