Introduce sideways open dicts
torbjornlager opened this issue · 8 comments
Here is a proposal for sideways open dicts to complement the currently existing dicts (which we will refer to as closed dicts). Our point of departure is the following quote from the manual at http://swish.swi-prolog.org/pldoc/man?section=dicts:
In the current implementation, two dicts unify only if they have the same set of keys and the tags and values associated with the keys unify. In future versions, the notion of unification between dicts could be modified such that two dicts unify if their tags and the values associated with common keys unify, turning both dicts into a new dict that has the union of the keys of the two original dicts.
A dict is a structured compound datatype. A closed dict has a tag and a fixed number of pairs of keys and values. Two dicts unify, even if their key-value pairs comes in a different order:
?- point{x:1, y:2} = Tag{y:2, x:X}.
Tag = point,
X = 1.
?-
There is a path notation that allows us to select individual values if we have the key:
?- D = foo{x:1, y:2}, Val = D.x .
D = foo{x:1, y:2},
Val = 1.
?-
If the key does not exist, an exception is thrown:
?- D = foo{x:1, y:2}, Val = D.z .
ERROR: '$get_dict_ex'/3: key `z' does not exist in foo{x:1,y:2}
?-
Dicts can also be open. Two open dicts unify if their tags and the values associated with common keys unify, turning both dicts into a new dict that has the union of the keys of the two original dicts.
To keep dicts maximally useful, we are going to need both open and closed dicts as well as ways to convert between them. Our suggestion is to use the notation foo{a:1, b:2, ...}
for open dicts.
Unification of two open dicts results in an open dict:
?- OD = foo{x:Val, y:2, ...}, OD = foo{x:1, z:3, ...}.
OD = foo{x:1, y:2, z:3, ...},
Val = 1.
?-
Unification of a closed dict with an open dict results in a closed dict:
?- OD = foo{x:Val, y:2, ...}, D = foo{x:1, z:3}, D = OD.
D = OD, OD = foo{x:1, y:2, z:3},
Val = 1.
?-
With open dicts the path notation should work like so, rather than throwing an exception:
?- OD = foo{x:1, y:2, ...}, Val = OD^z .
OD = {x:1, y:2, z:_G123, ...},
Val = _G123.
?-
Note that the path notation uses ^
rather than a dot (.
) when we are dealing with open dicts. As we saw above, having two different path languages is probably necessary.[1]
We should be able to close an open dict:
?- dict_close(foo{x:1, y:2, ...}, D).
D = foo{x:1, y:2}.
?-
and open a closed one:
?- dict_open(foo{x:1, y:2}, D).
D = foo{x:1, y:2, ...}.
?-
We illustrate with an example from computational linguistics. A Definite Clause Grammar (DCG) is a unification-based grammar that uses Prolog terms to represent syntactic categories. In a term-based grammar formalism, the structure of a syntactic category is determined by the predicate symbol, the arguments, and the order of the arguments.
In a feature-structure based formalism, the structure of a category is defined by its features and their values. The value of a feature may be any element, either atomic or complex.
It is easy to see that we can use open dicts instead of terms when parsing and generating with ordinary DCGs. The following is a so called unification-based grammar for a tiny fragment of English. The use of feature structures implemented as the open dicts guarantees number agreement between the subject and the verb of a sentence:[2]
s(S) --> np(NP), vp(VP), {
S^head = VP^head,
S^head^subject = NP^head
}.
np(NP) --> [uther], {
NP^head^agreement^number = singular,
NP^head^agreement^person = third
}.
vp(VP) --> v(V), {
VP^head = V^head
}.
v(V) --> [sleeps], {
V^head^form = finite,
V^head^subject^agreement^number = singular,
V^head^subject^agreement^person = third
}.
v(V) --> [sleep], {
V^head^form = finite,
V^head^subject^agreement^number = plural
}.
An example run:
?- phrase(s(S), [uther, sleeps]).
S = _{head:_{form:finite,
subject:_{agreement:_{number:singular,
person:third,
...},
...},
...},
...}.
?- phrase(s(S), [uther, sleep]).
false.
?-
[1] We are also inspired by the Oz programming language, which boosts open as well as closed records with two different path languages, with a functionality very similar to the one that our open and closed dicts give us.
[2] The grammar example is borrowed from Stuart Shieber, see
https://dash.harvard.edu/handle/1/11576719.
As I mentioned in the email on the swipl mailing list, I have implemented this using attributed variables, as Jan suggested. The pack is here. In addition, I've implemented some syntactic sugar for that tries to be as similar to your suggestion here as possible. But since _{...}
is not a valid dict, for now the syntax is _{... : ...}
(spaces required). I'm open to alternatives. I use the function_expansions
library to implement this. And I also use it to implement that open dict key paths as you suggest, but I use the operator .^
instead of ^
so as not to conflict with other uses of ^
.
So with the current syntactic sugar choices, the synopsis of the functionality is:
?- use_module(library(open_dicts)).
?- A = _{a: 1, ... : ...}, B = _{b : 1, ... : ...}, A = B.
A = B,
open_dict(_104{a:1, b:1}, B).
?- A = _{a: X, ... : ...}, contains(A, [a-1, b-2]).
X = 1,
open_dict(_3520{a:1, b:2}, A).
?- A = _{a:Val, b:2, ... : ...}, B = _{a:1, c:3, ... : ...}, A = B.
A = B,
Val = 1,
open_dict(_4266{a:1, b:2, c:3}, B).
?- A = _{a:Val, b:2, ... : ...}, B = _{a:1, c:3, ... : ...}, A = B, close_dict(B).
A = B, B = _50{a:1, b:2, c:3},
Val = 1.
?- A = _{a: _{b:1, ... : ...}, ... : ...}, B = A.^a.^b.
B = 1,
open_dict(_1430{a:_1434}, A),
open_dict(_1450{b:1}, _1434).
?- A = _{a: _{b:X, ... : ...}, ... : ...}, A.^a.^b = 1.
X = 1,
open_dict(_5378{a:_5382}, A),
open_dict(_5398{b:1}, _5382).
Cool! But there is a definitely a need for prettier syntax. That's probably a job for Jan though, if he want to go further with the idea.
I changed the syntax last night. A dict with a + appended to it is an open dict. So the above code now looks like:
?- use_module(library(open_dicts)).
?- A = _{a: 1}+, B = _{b : 1}+, A = B.
A = B,
open_dict(_104{a:1, b:1}, B).
?- A = _{a: X}+, contains(A, [a-1, b-2]).
X = 1,
open_dict(_3520{a:1, b:2}, A).
?- A = _{a:Val, b:2}+, B = _{a:1, c:3}+, A = B.
A = B,
Val = 1,
open_dict(_4266{a:1, b:2, c:3}, B).
?- A = _{a:Val, b:2}+, B = _{a:1, c:3}+, A = B, close_dict(B).
A = B, B = _50{a:1, b:2, c:3},
Val = 1.
?- A = _{a: _{b:1}+}+, B = A.^a.^b.
B = 1,
open_dict(_1430{a:_1434}, A),
open_dict(_1450{b:1}, _1434).
?- A = _{a: _{b:X}+}+, A.^a.^b = 1.
X = 1,
open_dict(_5378{a:_5382}, A),
open_dict(_5398{b:1}, _5382).
For completeness, here is your parsing example (I added the word "we"):
:- use_module(library(open_dicts)).
s(S) --> np(NP), vp(VP), {
S.^head = VP.^head,
S.^head.^subject = NP.^head
}.
np(NP) --> [we], {
NP.^head.^agreement.^number = plural,
NP.^head.^agreement.^person = first
}.
np(NP) --> [uther], {
NP.^head.^agreement.^number = singular,
NP.^head.^agreement.^person = third
}.
vp(VP) --> v(V), {
VP.^head = V.^head
}.
v(V) --> [sleeps], {
V.^head.^form = finite,
V.^head.^subject.^agreement.^number = singular,
V.^head.^subject.^agreement.^person = third
}.
v(V) --> [sleep], {
V.^head.^form = finite,
V.^head.^subject.^agreement.^number = plural
}.
go :-
forall(phrase(s(S), X, []),
(close_dicts_in_term(S, Closed),
format('Sentence: ~p~n', [X]),
format('Parse: ~p~n', [Closed]),
nl
)
).
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
?- go.
Sentence: [we,sleep]
Parse: _708{head:_730{form:finite,subject:_756{agreement:_778{number:plural,person:first}}}}
Sentence: [uther,sleeps]
Parse: _1528{head:_1550{form:finite,subject:_1576{agreement:_1598{number:singular,person:third}}}}
true.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
Great, that looks quite pretty!
I still think this would be a great feature to have! I spent the last hour reading through the dict-functions documentation, looking exactly for how to create/unify dictionaries that only have a subset of keys in common.
It would be also quite handy to have some syntax to easily match structure of a dict right in the clause header, for example when processing parsed JSON documents. The :<
is of limited use, as it doesn't handle nested dicts so something that would ideally write as match(_{foo: _{bar: Bar}}) :- ...
becomes a quite verbose match(S) :- _{foo: Foo} :< S, _{bar: Bar} :< Foo, ...
.
It is not very hard to define some syntax and do term expansion. As is though, Prolog head matching is unification. We can introduce alternative neck symbols to have other semantics, but this can be left to libraries.