## MaCaQue : Macros for Caml Queries MaCaQue (or macaque) is a DSL for SQL Queries in Caml. It has the following objectives : - be highly compositional - use expressive types to avoid runtime errors macaque is still a work in progress, please send feedback to `bluestorm dot dylc (gmail)`. I'm interested in bug reports and reasonable feature requests. Usual makefile provided (in src/) : make, make install, bleh, make uninstall Depends on PG'Ocaml Summary of this document : - [Important macaque types and structures](#macaque_types) - [Sending queries to the SQL server](#sending_queries) - [Processing query results](#processing_results) - [Single value queries](#single_value_query) - [General syntax : values, comprehensions, queries](#syntax_description) - [View expressions](#view) - [View results](#view_results) - [View modifiers](#view_modifiers) - [Comprehension items](#comprehension_items) - [GROUP BY expressions and aggregate functions](#group_by) - [Empty GROUP part](#empty_group) - [Restrictions on the GROUP record, accumulators \[..\] syntax](#accumulators) - [Static GROUP BY typing errors](#static_group_by_typing) - [Non-grouped aggregates](#non_grouped_aggregates) - [Description syntax](#descriptions) - [Auto-incrementing columns](#incr_column) - [Sequence descriptions](#sequence_descriptions) - [Checking descriptions](#check_descriptions) - [Sql functions, operators and data types](#SQL_operators) - [Sql value operators](#value_operators) - [Value types antiquotations](#value_antiquotations) - [Column types](#column_types) - [Sequence operators](#sequence_ops) - [View antiquotations](#view_antiquotations) - [phantom types](#phantom_types) - [Specific type fourberies](#type_subtleties) - [Sql interface safety](#interface_safety) - [Update subtyping problem](#update_subtyping) - [Remarks](#general_remarks) - [Side effects](#side_effects) - [Semantic of row bindings](#row_binding_semantics) - [Camlp4 use](#camlp4_use) - [Macaque and PG'OCaml](#pgocaml_compat) - [NULL and semantic issues](#null_inference) ### Important macaque types and structures [.macaque_types] macaque builds SQL queries/views/values at three different levels : ##### query Macaque queries represent `SELECT`, `INSERT`, `DELETE` and `UPDATE` queries. They can not be composed. Examples: - Select all rows from table `sql_table` whose `id` is inferior to `50` : <:select< row | row in $sql_table$; row.id < 50 >> - Delete all rows from `sql_table` whose `id` is superior to `50` : <:delete< row in $sql_table$ | row.id > 50 >> Type : macaques queries have type `'a Sql.query`, where `'a` is the return type of the query : a row list for `SELECT` queries, unit otherwise. ##### view Views represent SQL tables, views and (sub-)-`SELECT` results; view are composable and can be used nearly anywhere a SQL table can. Examples : - User name and category of the user with id superior to `42` <:view< {user = u.name; category = cat.name} | u in $users$; u.id > 32 + 10; cat in $categories$; cat.id = u.cat >> - Names of all simple users of the 'prev_view' view (wich might be the previous example view) <:view< {name = row.user} | row in $prev_view$; row.category = "simple user" >> - Generic function that will project any view on its "name" column let names view = <:view< {name = t.name} | t in $view$ >> - Generic function that will build an associative table, wich the 'name' column in the 'name' field and the whole column in the 't' field. let names view = <:view< {name = t.name; t = t} | t in $view$ >> Type : `('a, 'b) Sql.view`, where `'a` is the type of a row (considered as a value). `'b` is either `non_writable` or `'c writable` : only writable views correspond to real tables in the SQL database, and thus insert/update/delete won't accept non-writable views. The `'c` type parameter in `'c writable` is useless for now, it will eventually be used to transport information about default values for table fields. :Note: views and select queries are distinct objects (view = reusable set of rows, select = final query), but views are very common so a specific library function is provided to directly send a view to the SQL server, without having to manually build a select query first. ##### value Values represent SQL values and expressions : `int`, `float`, `string`, and results of SQL data functions and operators. Examples : - `2` (as a Sql Value, not a Caml int) <:value< 1 + 1 >> - `true` (as a Sql Value, not a Caml bool) <:value< 1 = 1 >> - field 'id' of table 'foo', casted as a nullable value (wich can take the `NULL` value) <:value< nullable foo.id >> - returns twice the given int, as a Sql value let double (n : int) = <:value< 2 * $int:n$ >> - returns the immediate row with column `foo` and `bar` of values `1` and `baz` <:value< { foo = 1; bar = "baz" } >> Type : `'a Sql.t`, the most pervasive type in macaque. `'a` is actually a rich type containing the value type (int, string, a row, etc.) but also several phantom types witnessing different properties of the value (detailed description in the "phantom types" section of this documentation). View rows are also values, and rows can be nested at will : let strange_pack foo = <:value< { a = 1; b = { c = 2; d = $foo$ } } >> ### Sending queries to the SQL server [.sending_queries] Most of macaque efforts are directed towards producing safe and composables SQL queries, and rebuilding typed values from the all-string result of SQL queries. The actual interaction with the SQL server is relatively minor and well separated. It is done through the `Query` module. The `Query` module is designed to work with PGOCaml (and is actually the only part of macaque with a hard dependence on PGOCaml). It thus reproduces the functorized design of PGOCaml, wich is intended to support flexible threading/concurrence models. See PGOCaml documentation for more information. The casual user can use the non-functorized values, wich are a directly-usable implementation of the `QUERY` interface, with no threading integrated. The core `Query` interface is quite simple : val query : _ Db.t -> ?log:out_channel -> 'a Sql.query -> 'a Db.monad val view : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a list Db.monad val view_one : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a Db.monad val view_opt : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a option Db.monad (in the simple case, `_ Db.t` is `_ PGOCaml.t`, and `'a Db.monad` is just `'a`) The main function is `query`, wich accepts a database handler (the one produced by `PGOCaml.connect`), the query, and returns the typed result : in case of `SELECT`, a list of value rows, otherwise (`UPDATE`, `INSERT`...) `unit`. An optional parameter `?log` will output the query string in the given output channel, if any. `view`, `view_one` and `view_opt` are given for convenience. `view_one` expect exactly one result, and `view_opt` at most one result; if they get more results, they will raise a `Failure` exception. :Note: The query module is very loosely integrated with the rest of Macaque. In particular, all the typed result processing is done by the `Sql` modules, through the `Sql.handle_query_results` function. It makes the `Query` module inessential : an user willing to experiment other interfaces to the SQL server could easily get the SQL string itself (`Sql.sql_of_query`, `Sql.sql_of_view`), call the SQL server in her preferred way, and give back the results to `Sql.handle_query_results`. Beware however that `Sql.handle_query_results` is unsafe, as it strongly relies on results corresponding to the given query; giving incorrect results (eg. coming from another query) results in an undefined behaviour, even possibly a Segfault. You should really not try anything clever except giving the query string to a PostGreSQL server (in any way you like), and sending back the results to Macaque. ### Processing query results [.processing_results] macaque produces `Sql.t` values from the query results. The real caml values can be accessed using the `Sql.get` and `Sql.getn` operators : val get : < get : _; nul : non_nullable; t : 't #type_info > t -> 't val getn : < get : _; nul : nullable; t : 't #type_info > t -> 't option `Sql.get` operates on non-nullable values, while `Sql.getn` operates no nullable values and returns an `option`. A common use case is to access fields from a row, leading to code such as `Sql.get row#field`. The problem with this syntax is that it is not chainable : to access field of a subrow (row inside the result row), wich is a Sql.t value itself, one has to write code such as `Sql.get (Sql.get row#sub_row)#field`. To lighten the notation, a specific syntaxic sugar is integrated with the `pa_bananas` syntax extension : `a#!b` is equivalent to `Sql.get a#b`, and `a#?b` is equivalent to `Sql.getn a#b`. It is thus possible to write `row#!sub_row#!field` directly. Note that `a#?b` still returns an option, so you can't have something like `a#?b#?c` (as `a#?b` is not an object) : you still need to do pattern matching on the option value yourself. I could create an unsafe constructor wich would raise an exception in the `None` case, but that would defeat the point of macaque (as few runtime errors as possible). I also feels that such facilities would encourage nullability-laziness from the user : you have to work on your database design upfront to eradicate nullable types were possible; in the case were a `NULL` case is a semantic necessity, you will want the explicit option matching anyway. #### Single value queries [.single_value_query] It is sometimes useful to query single values from the database, instead of the usual list-of-rows SELECT query -- for example to request the current time, or the value of a specific counter. For this purpose, Macaque was extended with: - A `Sql.value` function building a query from a `Sql.t` value; This query, once run through `Query.query`, agains return a `Sql.t` value, on which `Sql.get` or `Sql.getn` can be used depending on its nullability. - Convenience functions in Query to get the extracted value directly without using Sql.get(n): `Query.value` for non-nullable value, and `Query.value_opt` for nullable ones. As an example, the following three definitions of `get_time` are equivalent: let time = <:value< current_timestamp >> let get_time dbh = Sql.get (Query.view_one dbh <:view< {x = $time$} >>)#x let get_time dbh = Sql.get (Query.query dbh (Sql.value time)) let get_time dbh = Query.value dbh time ### General syntax : values, comprehensions, queries [.syntax_description] macaque use the Camlp4 quotations system to integrate Domain-specific syntaxes into the Objective Caml language. General principles are as follows : ##### quotations `<:foo< ... >>` is a quotation using the 'foo' syntax expander. macaque provides 6 syntax expanders : - 'select', 'insert', 'delete', 'update' : queries syntax expanders - 'view' : view syntax expander - 'value' : value syntax expander 'view' is the default syntax expander : `<< ... >>` is equivalent to `<:view< ... >>` ##### antiquotations Inside a quotations, `$foo$` is an antiquotation with denotes a caml-level value : let value v = <:value< $v$ >> (* the identity function on values *) antiquotations can also be named in specific context : let int n = <:value< $int:n$ + 0 >> (* function that transforms a caml int into a Sql.t value *) named antiquotations are not hardcoded in the syntax definition : they rely on values of the `Sql` module, wich are in the appropriate submodule : `$foo:bar$` will expand to `Sql.Module.foo bar`, where the `Foo` module depends on the antiquotation position (`View`, `Value`, `Table_type`); this way, on can read the `Sql` interface to discover named parameters and their meaning; similarly, operators are defined in `Sql.Op`. ##### Loosely defined syntax - camlp4 quotations symbols aren't represented - `[foo]` means optional `foo` (one or none) - `foo sep ...` means list of `foo`s separated by `sep` (";" or ",") :BNF: select ::= view insert ::= table ':=' value [refinement] delete ::= row name 'in' table refinement update ::= row name 'in' table ':=' value refinement view ::= comprehension | view-op '(' view ')' '(' view ')' ... comprehension ::= comp-result comp-modifiers [refinement] refinement ::= '|' comp-items ';' ... comp-result ::= value | 'group' tuple ['by' tuple] comp-modifiers ::= ['order' 'by' order ',' ...] ['limit' value] ['offset' value] order ::= value ['asc' | 'desc'] comp-item ::= row-binding | guard row-binding ::= row-name 'in' view guard ::= boolean value value ::= litteral constants | 0-ary operator : 'null', 'current_timestamp' | 1-ary operator : 'nullable' value | infix binary operator : value '+' value | tuple : tuple | row '.' field : field-access | row '?' field : default value for a field | 'cast' value 'as' column-type tuple ::= '{' tuple-field ';' ... '}' tuple-field ::= field-name '=' value | field-access field-access ::= value '.' field-name '.' ... view-op ::= | 'union' | 'union_all' | 'intersect' | 'intersect_all' | 'except" | 'except_all' column-type ::= identifier (see the "Column Types" section) :Note: A few remarks. - ";"-separated lists allows an optional ending ";", to match OCaml syntax : {a;b;} - 'table' denotes a place where an `'a table` type is required - antiquotations can be used in 'view', 'table' and 'value' places. You can use antiquoted expressions in all the places where a 'view', 'table' (view) or 'value' (including rows, but not row names or field names) is accepted - Tuples, comprehension and guard lists can be empty : - `<:view< $row$ | >>` is the view with only one line, wich is 'row' - `<:insert< $table$ := $row$ | >>` inserts a single row - `<:delete< row in $table$ | >>` deletes a whole table - `<:update< row in $table$ := $value$ | >>` updates a whole table In the 'view' and 'insert' cases, the ending "|" is optional. It is however not the case for 'update' and 'delete' : see the remark at the end of this section. - In order to simplify the common `{ a = foo.a; b = bar.foobar.b }` use case, tuple field names are optional when the value is a field access : in that case, the name of the accessed field is used (in case of nested access, eg.`'foo.bar.baz`, the rightmost name is used, here `baz`); the example can thus be written `{ foo.a; bar.foobar.b }` - if the `BY` record of a `group .. by` expression is empty, the `BY` part (`by {}`) is optinal. See the [GROUP BY expressions and aggregate functions](#group_by) section. ##### Remark : implicit exhaustive manipulation [.exhaustive_manipulations] It was found that user sometimes forgot the guards in an `UPDATE` or `DELETE` query. The results of that mistake are usually rather displeasing (deleting or modifying the whole table). In order to make the mistake more visible and less likely to happen, the "|" before the guards of an `UPDATE` or `DELETE` query are not optional by default : if the user forgets the guards altogether, she will also forget the "|" and will get an error. ### View expressions [.view] :BNF: view ::= comprehension | view-op '(' view ')' '(' view ')' ... comprehension ::= comp-result comp-modifiers [refinement] view-op ::= | 'union' | 'union_all' | 'intersect' | 'intersect_all' | 'except" | 'except_all' #### View results [.view_results] View results are either a simple value or a `GROUP BY` expression (see the [GROUP BY expressions and aggregate functions](#group_by) part of this document). The usual return value is a SQL tuple: `{a = foo; b = bar}` will return a view with columns named `a` and `b` and values corresponding to the `foo` and `bar` expressions. You can also give the name of a row : `<< t | t in .. >>` Columns values can be tuples themselves. This is a quite Macaque-specific feature, wich makes the language much more homogenous. Tuple types (column or immediate records) can be nested arbitrarily. For example, a cartesian product view : let cartesian_product va vb = << {a = a; b = b} | a in $va$; b in $vb$ >> Care is taken that tuple values support all operations the plain SQL values support : Macaque works hard to translate them to expressions the usual SQL server can understand (tuple flattening). This part of macaque is rather delicate and you might encounter bugs with no-yet-well-tested combinations of nested tuples and other macaque features. #### View modifiers [.view_modifiers] View modifiers correspond to following SQL features : `LIMIT`, `OFFSET` and `ORDER BY`. Their behaviour should not come as a surprise : `ORDER BY` will sort the returned rows according to the specified sorting criteria, `LIMIT` will restrict the number of returned rows, and `OFFSET` will skip a given number of rows (for example, `LIMIT 2 OFFSET 3` will return the 4th and 5th row of a view). `ORDER BY` syntax mimics the usual SQL syntax : << ... order by foo asc, bar desc, ... >> The ordering modifier (`asc` | `desc`) is optional, `asc` will be choosed by default. If the ordering expression `(foo, bar)` is a record (immediate tuple or row), no guarantee is given as to the column ordering decided (fields of a macaque record are not ordered, so a lexicographic order cannot be chosen) : the view result will be sorted on each column of the tuple, with the given ordering modifier. `ORDER BY` expressions are in the comprehension items scope : they can depend on the rows bound in the latter part of the comprehension. `LIMIT` and `OFFSET` are not : obviously `<< foo LIMIT foo.id | foo in $..$ >>` is not a well-formed query. #### Comprehension items [.comprehension_items] There is one important thing to know about comprehension items \: row bindings are not sequential, they're simultaneous (`let .. and .. and .. in ..`) ! This is a not so happy feature, as the comprehension syntax strongly suggest a sequential binding : there is an impedence mismatch between the comprehension syntax and the SQL behaviour. For further discussion, see the [Remarks > semantic of row bindings](#row_binding_semantics) section of this manual. Of course, guards can depend on the declared rows. #### View operators The usual UNION, INTERSECT and EXCEPT operators can be used in prefix position, with parentheses around their parameters. The syntax allows to pass arbitrary number of parameters, and a left-associativity is applied: except (foo...) (bar...) (baz...) is equivalent to except (except (foo...) (bar...)) (baz...) ### GROUP BY expressions and aggregate functions [.group_by] The SQL query `SELECT fields GROUP BY group_fields` roughly translate in the `group {fields} by {group_fields}` expression : after `group` the aggregating part, and after `by` the grouping part. The returned row are the concatenation of the `GROUP` record (`{fields}`) and the `BY` records (`{group_fields}`). Example : << group {subtotal = sum[t.a]} by {k = t.b} | t in $view$ >> Will expand to SELECT SUM(t.a) AS subtotal, t.b AS k GROUP BY t.b FROM (...) AS t #### Empty GROUP part [.empty_group] `GROUP BY` expressions with all fields in the `BY` part are equivalent to a `SELECT DISTINCT` query. #### Restrictions on the GROUP record, accumulators [..] syntax [.accumulators] The `BY` values (here `t.b`) can be any expression depending on the bound view (here `t`, but possibly more than one), and anything else in scope. The group values are restricted to match the `GROUP .. BY` semantic : - values depending on the bound tables of the view must be inside "accumulators", that is square brackets : - `<:view< group {c = t.c} by {k = t.b} | ... >>` is illegal (and won't type), - accumulators must be used by an aggregate function, and nowhere else : - `<:view< group {c = [t.c]} by {k = t.b} | ... >>` is illegal (and won't type), - `<:view< group {c = count[t.c]} by {k = t.b} | ... >>` is legal - fields name bound in the `BY` record can be used freely : - `<:view< group {c = k; d = k + k; e = count[l]} by {k = t.b; l = t.c} | ... >>` is legal #### Corresponding typing errors [.static_group_by_typing] - Use of group-varying values outside accumulators Error: This expression has type Sql.grouped_row but an expression was expected of type < nul : Sql.non_nullable; t : 'a #Sql.row_t; .. > Sql.t Cause : use of a bound row in the `GROUP` record outside an accumulator - Use of aggregate functions outside accumulators Error: This expression has type < nul : 'a; t : 'b > Sql.group Sql.unsafe but an expression was expected of type < nul : 'c; t : 'd; .. > Sql.t Sql.unsafe Cause : one of the field values is an accumulator expression; accumulators should be consumed by aggregate functions and cannot be returned directly #### Non-grouped aggregates [.non_grouped_aggregates] Aggregates functions *cannot* be used outside group by expressions. If you want to use an aggregate functions over all the rows of a table, use a group by expression with an empty BY record : `SELECT max(t.id) FROM ...` -> `group {max = max[t.id]} by {} | ...` In that case, `by` is optional : `group {max = max[t.id]} | ...` Rationale : aggregate functions are not regular operators; they have a non-trivial semantic, wich is modeled by the typing transformations inside the group .. by expressions. Using them outside those expressions would not be safe and could lead to runtime errors if the SQL server doesn't accept an ill-formed query. It is a design goal of macaque to protect from runtime errors. ### Description syntax [.descriptions] Description syntax is used to describe existing database tables, making them as macaque views (internally, it builds a runtime description of the data-base typing informations, to be used by further macaque processing). The syntax is a far relative of `SQL CREATE TABLE` statements, and can be discovered in the example base.ml file. Currently, no other information that name, type, nullability and default value are accepted. <:table< recipes ( recipe bigint NOT NULL, name text, amount bigint NOT NULL DEFAULT($ <:value< 0L >> $), category text NOT NULL DEFAULT($ <:value< "unknown category" >> $) ) >> This is not a table creation/specification tool : it does not free you from the need of creating your tables in the database (with a potentially richer description : foreign keys, etc.). If you change a table description, you will have to duplicate the changes in the caml description as well. #### Auto-incrementing columns [.incr_column] `SERIAL` columns are not supported (yet). It is however possible to emulate this feature with explicit `SEQUENCE` manipulations ([sequence operators](#sequence_ops) section) : let my_table = <:table< my_sql_table ( id bigint NOT NULL DEFAULT(nextval $table_id_seq$), ... ) >> At insertion site, one can then use the default access syntax to get the next identifier: <:insert< my_table := { id = my_table?id; ... } >> #### sequence descriptions [.sequence_descriptions] It is not possible to create new sequences from Macaque (this is coherent with the choice that macaques allows descriptions, not declarations), but macaque support sequence descriptions. Sequence creation operators are part of the `Sql.Sequence` module. let table_id_seq = <:sequence< serial "the_sql_sequence_name" >> #### Checking descriptions [.check_descriptions] To help you with the macaque/database synchronization, the `Check` module provide coherence check routines (`check_table`, `check_sequence`). It will raise errors if the caml-side description is not faithful to the real table structure (as described by the SQL server table informations). It is possible to enforce an automatic runtime check of every macaque-described table or sequence with the '-check_tables' command line option (camlp4 time), wich can be enabled by the `use_check` ocamlbuild flag (see the [OCamlbuild](#ocamlbuild) section of this document). ### OCamlbuild [.ocamlbuild] I have created a macaque-specific OCamlbuild plugin. It was originally intended to help during macaque development only, but is probably a valuable ressource if you want to use Macaque inside your project. Besides the classical ocamlfind integration, you'll find the `Sql_syntax` modules with support for the following tags : - `use_macaque` : enable the macaque syntax extensions - `use_check` : enable the `-check_tables` flag (see the [Decription syntax > Checking table descriptions](#check_descriptions) subsection) You should generally use the `macaque` tag, the three other preprocessing tags being there for finer-grained control. ### Sql functions, operators and data types [.SQL_operators] #### Sql value operators [.value_operators] macaque can use all function and operators defined in the `Sql.Op` module, using the standard ocaml syntax (and operator associativities and precedences) : - `<:value< $a$ + $b$ >>` is equivalent to `Sql.Op.(+) a b` - `<:value< nullable $foo$ >>` is equivalent to `Sql.Op.nullable foo` ##### aggregate function `Sql.Op` has some aggregate functions. See the [GROUP BY expressions and aggregate functions](#group_by) section of this document for more information. #### Value types antiquotations [.value_antiquotations] macaque supports some SQL data types, some of them having a litteral syntax (ints and string : `<:value< 2 >>`, `<:value< "foo" >>`). They can all be constructed by using named antiquotations : <:value< $bool:true$ >> <:value< $float:cos 1.2$ >> More generally, the type constructors are the values in Sql.Value : `<:value< $foo:bar$ >>` is equivalent to `Sql.Value.foo bar` #### Column types [.column_types] Data types used in table descriptions are defined in the `Sql.Table_type` module. We use a different set of type names, in order to mimic SQL type names and ease specification derivation from existing SQL tables. For example, "integer" is used instead of "int", and will expand to a `Sql.Table_type.integer` value. #### Casts [.casts] You can use an explicit cast of the form `cast foo as bar`, where `bar` is an identifier for a column-type as described in the "Column types" section above. This is useful in particular to work around some limitations of PostgreSQL type inference for nulls (see the "NULL and semantic issues" subsection): if a `null` somewhere is "too polymorphic" for PostgreSQL, you may want to use for example `cast null as integer` instead. #### Sequence description Sequence creators are defined in the `Sql.Sequence` module. #### Sequence operators [.sequence_ops] Sql.Op has some sequence functions. Currently supported are `currval` and `nextval`. They can be used to have an auto-incrementing identifier : <:insert< $table$ := {id = nextval $table_id_seq$; .. } >> See the src/tests/sequence.ml for example. #### View antiquotations [.view_antiquotations] In view positions, macaque supports antiquotations through the `Sql.View` module : `$foo:bar$` will expand to `Sql.View.foo bar`. ### phantom types [.phantom_types] Sql.t types sure are heavy. You *will* be confronted to cluttered error messages with ugly as hell unification problems. Hopes this section helps. Sql values are packed in a phantom type providing several information about the value. It is an object type with a field `t` containing type information, and other fields for value information. Type information fields : - 'typ' : the corresponding caml type (eg. `<typ : int>`); `Sql.null` has a polymorphic `t` field, as `None` for option types val null : < t : < typ : 'a; numeric : 'b >; nul : nullable; get : unit > t - 'numeric' : allows for numerical operators overloading (see `Sql.Op.(+)` type) Value information fields : - 'nul' field : nullability information, is either `Sql.nullable` or `Sql.non_nullable`. Constants from the `Sql.Data` operators have a polymorphic `nul` field, so that they can be used in both nullability context - 'get' : just-built sql values (<:value< 1 >>, etc.) or values retrieved from a SQL query are gettable : you can ask for their corresponding caml value values using the two accessors Sql.get and Sql.getn (get nullable) : val get : < t : < typ : 'a; ..>; nul : non_nullable; get : _; .. > Sql.t -> 'a val getn : < t : <typ : 'a; ..>; nul : nullable; get : _; .. > Sql.t -> 'a option On the contrary, values built from SQL operators or field access can not be transformed back into caml values without being first sent to the SQL server as part of a query : they don't have a 'get' field. Now, you probably understand why the error messages tends to be slightly longer than usual. And all sql values carry such information, including (nested) rows... ### Specific type fourberies [.type_subtleties] macaque being strongly (and intricately) typed, you will often find yourself confronted to hostile error messages, wich means someone has done something wrong. I hope that you will more often be wrong than macaque. There are nonetheless some specificites that you should be aware of. #### Sql interface safety [.interface_safety] Macaque syntax extensions transform user code into complicated caml code. But the produced codes still lies outside macaque module boundaries, and has access to the same information that the user has. Concretely, as most of the operations needed by those extensions are not typable inside the OCaml type system (for example dynamic construction of an object based on a list of field names and values), some unsafe operations had to be exposed through Sql interface^α. They are marked with the "unsafe" parametrized type (wich is only there for documentation purposes) and you should NOT use them : every function having an "unsafe" type somewhere in its interface is to be considered forbidden. All the other functions should be type-safe, otherwise it's a bug. :Footnote: α: and there is some Obj magic behind the scene; but it's protected by typing and you won't get a segfault, I hope. #### Update subtyping problem [.update_subtyping] An example of update syntax is << t in $tab$ := {amount = t.amout + 1} >>,- wich increment the "amout" column of all rows in table `tab`. Table `tab` probably has more fields than just the "amount" colum, so the type of the tuple on the right should be a subtype of `table` row type. The problem is that polymorphic subtyping quantification is not available in OCaml type system : subtype relations have to be explicitely constructed from the object types. This is fine when, as in the given example, the right tuple object type is known as camlp4-time (an object with only one field 'amout'), but not when the corresponding value is an antiquotation : let update table value predicate = <:update< row in $table$ := $value row$ | $predicate row$ >> In this case, the update syntax is used to define a generic update operation^β : I know of no syntaxic way to impose that the right tuple type be a subtype of the table row type. To keep things simple, there is an ad-hoc rule for the specific update problem : when tuple type is known as camlp4-time, a subtyping relation is used, but when there is a quotation, the two values types are unified : the set tuple has to have exactly the same columns as the table, no less. This loss of generality could surprise the innocent user. In order to prevent hair scratching, a warning is emitted at camlp4-time when an antiquoted caml value is used for the update tuple. This warning can be disabled using the -sql-nowarn-undetermined-update command line option. Of course, it is always possible to manually expand an antiquotated value, when the fields name are known at development-time : <:update< t in $table$ := $tup$ | ... >> should be rewritten into <:update< t in $table$ := { foo = $tup$.foo; bar = $tup$.bar } | ... >> :Footnote: β: notice how `value` and `predicate` are actually functions depending on `row` values; this kind of thing can greatly improve code factorization among your SQL queries ### Remarks [.general_remarks] #### Side effects [.side_effects] It is probably an obvious thing to say, but users should not put expressions wich have side-effect when evaluated inside macaque expressions. No guarantee is given that any part of a macaque expression will be evaluated at all, or only once. In case of doubt, you should explicitely evaluate expressions before handling them to the macaque expression : let my_val = my_expr in << ... $my_val$ ... >> #### Semantic of row bindings [.row_binding_semantics] From the [View expressions > Comprehension items](#comprehension_items) part of this document : % > There is one important thing to know about comprehension items : row > bindings are not sequential, they're simultaneous (let .. and .. and > .. in ..) ! This is a not so happy feature, as the comprehension > syntax strongly suggest a sequential binding : there is an impedence > mismatch between the comprehension syntax and the SQL behaviour. The rationale behind this choice is purely pragmatic : SQL queries use simultaneous bindings, and reproducing that choice much facilitate the translation. It would be possible to have sequential bindings, but that's somewhat more complicated, and I supposed it was unnatural to the SQL query writer anyway. I'm not quite satisfacted with this state of affairs, so it is possible that I replace the current behaviour with sequential declarations in the future. Please do not write code that would break if that was the case : `<< .. | t1 in $..$; t2 in $ .. t1 $ >>` when `t1` is already in the scope. Such trickery tremendously hurts readability anyway, so you should not use it even if it wasn't for compatibility reasons. #### Camlp4 use [.campl4_use] I am under the impression that some "serious" ocaml users try to avoid to include camlp4 in their compilation chain if possible, and to minimize their reliance on camlp4 extensions. Camlp4 is a not-so-simple (and really-not-so-well-documented) tool, and camlp4 extensions are sometimes fragile and more subject to bugs than other pieces of ocaml software; syntaxic bugs can have far-reaching effects (they are, however, nearly always spotted at compilation-time) and are difficult to spot for the non-camlp4-aware user. Specific care has been taken in Macaque to make the camlp4 processing as solid as possible : - camlp4 processing is split in two independent extensions (`pa_macaque`, `pa_bananas`) so that the user can make fine-grained decisions as to what extension to activate for a specific compilation - `pa_macaque` relies on the "quotation" mechanism of the existing ocaml camlp4 grammar : they DO NOT modify the ocaml grammar but instead reuse the `<:..< .. >>` mechanism wich is widely used inside camlp4 itself, and thus can be reasonably trusted - `pa_bananas` is the only extension modifying the OCaml grammar itself; it is a simple extension wich affects a localized part of the parsing process (the "." level of expressions : `a.b`, `a#b`..), with no precedence/associativity subtleties. It is a relatively trustable extension, and any spotted defects would be easy to fix. Moreover, `pa_bananas` is only a convenience extension and it is easy not to use it (use `Sql.get`/`Sql.getn` instead of `#!`/`#?`). If a problem were to arise, it would be easy for a user to mechanically replace all `pa_bananas` syntax uses by camlp4-free equivalent expressions, and drop `pa_bananas` from her compilation chain entirely. #### Macaque and PG'OCaml [.pgocaml_compat] Macaque relies on PG'OCaml low-level interface. It is fully compatible with PG'OCaml : you can use a database handler for Macaque queries and PG'OCaml queries at the same time (PG'OCaml stocks some private information in those handler; Macaque doesn't access nor modify them). PG'OCaml is more mature and its general approach leads to a safer software : Macaque will probably catch less errors at compile-time than PG'OCaml (see the PostGreSQL typing issues of the next section for an area where I believe Macaque lacks behind PG'OCaml in terms of safety), and it is very young software wich probably have quite a few bugs remaining. This inconvenient comes with the flexibility Macaque brings : it is composable, and does not need the SQL server available at compile time (or, more exactly, camlp4-time). I also believe that PG'OCaml prepared queries model is a bit more efficient than Macaque raw-SQL-queries output. I have done absolutely no work on Macaque optimisation, and my priority are rather safety and composability. If you're interested in investigating Macaque performances, let me know. #### NULL and semantic issues [.null_inference] I have tried to reproduce SQL behaviour as closely as possible. In particular, all operators are really SQL operators, so they will have the SQL behaviour, for example `null = null` is null, not false. I have encountered issues with PostGreSQL typing system though. PostGreSQL gives NULL the type 'unknown', wich is problematic because sql operators are strongly typed, but the type system is not very expressive. For example `NULL + 0` will work okay, but : base=> SELECT e.n + 0 FROM (SELECT NULL AS n) AS e; ERROR: failed to find conversion function from unknown to integer I have tried to work around those issues, but think some of them are probably lurking somewhere. It is unfortunate as it undermines Macaque static safety : these runtime errors are subtle and quite difficult to prevent. Please report them if you find some. I hope not.