Compositionality of Atomese processing stages (was "Stop using SetLinks!")
linas opened this issue · 2 comments
Issue #1502 "Stop using SetLink for search results!" highlights a problem with the compositionality of the search and processing primitives in Atomese. Search results were (and still are) delivered wrapped in a `SetLink`. This makes the cleanup of search results difficult, and raises various other core issues (see the issues described on the SetLink wiki page).
Basically, it is difficult to create long processing pipelines, such as `(cog-evaluate! (SequentialAnd ... (Put .... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ...` with the top-level `SequentialAnd` written to be tail-recursive. Such long pipelines have been implemented (in the robot code, circa 2017), and they do work (see, for example, the tail-recursion demo and older copies of the https://github.com/opencog/ros-behavior-scripting repo). However, there are problems:
- It is difficult to delete the wrapper `SetLink`s when they are no longer needed.
- Writing long chained pipelines seems more difficult than it should be.
- The internal implementation of how `SetLink`s are handled when they are passed to `PutLink` and other assorted functions (e.g. `PlusLink`, when given a set of numbers to add) is awkward.
A replacement solution is needed. Desirable properties:
- Ability to get search results incrementally, as they come in, rather than getting one big blob at the end.
- Ability to run one query in parallel mode.
- Ability to handle processing pipelines on streaming data, e.g. from the `LinkStream` Value.
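As a rough illustration of the second property, here is a minimal C++ sketch of running one query in parallel: the candidate set is split into chunks, each chunk is matched on its own thread via `std::async`, and the chunk results are gathered from their futures. The function `parallel_query`, and the use of a plain predicate in place of a pattern match, are assumptions for illustration only; this is not how the query engine is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical parallel "query": `match` stands in for a pattern match.
// The candidates are split into roughly n_chunks pieces (n_chunks > 0).
inline std::vector<int> parallel_query(const std::vector<int>& candidates,
                                       const std::function<bool(int)>& match,
                                       std::size_t n_chunks) {
    std::vector<std::future<std::vector<int>>> parts;
    const std::size_t chunk = (candidates.size() + n_chunks - 1) / n_chunks;

    // Launch one asynchronous task per chunk.
    for (std::size_t begin = 0; begin < candidates.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, candidates.size());
        parts.push_back(std::async(std::launch::async,
            [&candidates, &match, begin, end] {
                std::vector<int> hits;
                for (std::size_t i = begin; i < end; ++i)
                    if (match(candidates[i])) hits.push_back(candidates[i]);
                return hits;
            }));
    }

    // Gather results chunk by chunk; get() blocks until that chunk is done.
    std::vector<int> results;
    for (auto& f : parts) {
        std::vector<int> hits = f.get();
        results.insert(results.end(), hits.begin(), hits.end());
    }
    return results;
}
```

Results here arrive per chunk, in chunk order, rather than per match; pushing each hit into a thread-safe queue instead would give the incremental delivery asked for in the first property.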
So basically, there are two issues being explored here, in tandem:
- How to make the query subsystem stream? (Well, it already does, but how should it be used effectively?)
- How to build a general streaming subsystem?
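One possible shape for a general streaming subsystem is sketched below in C++, under the assumption that a stream is simply a pull function returning the next item, or nothing at end of stream. The names `Stream`, `from_vector`, `map_stream`, `filter_stream`, `concat_stream` and `collect` are hypothetical. The point is that every stage takes a stream and returns a stream, so stages compose by nesting, and concatenation is just one more stage.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// A pull-based stream: each call yields the next item, or std::nullopt once
// exhausted. Exhausted streams are assumed to keep returning std::nullopt.
template <typename T>
using Stream = std::function<std::optional<T>()>;

// Source stage: stream the elements of a vector.
template <typename T>
Stream<T> from_vector(std::vector<T> items) {
    return [items = std::move(items), pos = std::size_t{0}]() mutable
           -> std::optional<T> {
        if (pos < items.size()) return items[pos++];
        return std::nullopt;
    };
}

// Transform stage: apply f to each item (U is the output type).
template <typename U, typename T, typename F>
Stream<U> map_stream(Stream<T> in, F f) {
    return [in, f]() mutable -> std::optional<U> {
        if (auto x = in()) return f(*x);
        return std::nullopt;
    };
}

// Filter stage: pass through only items satisfying pred.
template <typename T, typename P>
Stream<T> filter_stream(Stream<T> in, P pred) {
    return [in, pred]() mutable -> std::optional<T> {
        while (auto x = in())
            if (pred(*x)) return x;
        return std::nullopt;
    };
}

// Concatenation stage: drain a, then b.
template <typename T>
Stream<T> concat_stream(Stream<T> a, Stream<T> b) {
    return [a, b]() mutable -> std::optional<T> {
        if (auto x = a()) return x;
        return b();
    };
}

// Sink: pull everything into a vector.
template <typename T>
std::vector<T> collect(Stream<T> s) {
    std::vector<T> out;
    while (auto x = s()) out.push_back(*x);
    return out;
}
```

Because the stages are pull-based, nothing is computed until `collect` asks for it; a push-based design, where producers write into queues, is the other obvious option.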
A wholly unexplored idea:
- Monads -- The `PutLink` and other miscellaneous links currently accept either individual atoms or sets of atoms as input. In the case of sets, the contained atoms are automatically unwrapped and processed. This suggests that the current use of `SetLink` is as a kind of sloppy monad, and thus what we really need is a cleaned-up monad to hold multiple-value results. The precise way to do this is unclear.
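To make the monad idea a little more concrete, here is a hedged C++ sketch of a cleaned-up multi-value container: the textbook list monad, with `unit` and a flattening bind. `Multi`, `unit` and `mbind` are hypothetical names, not anything currently in Atomese. Instead of each link silently unwrapping a `SetLink`, every stage is written for a single value, and `mbind` does the fan-out and flattening in one place.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical multi-value result: an explicit container of results.
template <typename T>
struct Multi {
    std::vector<T> values;
};

// unit (a.k.a. return): wrap a single value.
template <typename T>
Multi<T> unit(T x) {
    return Multi<T>{{x}};
}

// bind: apply f (which maps one value to a Multi) to every value, and
// flatten the results. f never sees a container; only mbind unwraps.
template <typename T, typename F>
auto mbind(const Multi<T>& m, F f) -> decltype(f(std::declval<T>())) {
    decltype(f(std::declval<T>())) out;
    for (const T& x : m.values) {
        auto r = f(x);
        out.values.insert(out.values.end(), r.values.begin(), r.values.end());
    }
    return out;
}
```

Chaining `mbind` calls gives a pipeline of single-value stages, which is roughly what `PutLink` does today with `SetLink` arguments, but with the unwrapping rule made explicit.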
The best idea seems to be:
- Futures/Promises subsystem -- Compositionality can be achieved by creating a subsystem that supports futures and streams. Some partial work in this direction has already been done, with `FormulaStream` and `FutureStream`, and also with `QueueValue`. Currently, there is no way to concatenate `QueueValue`s.
- Generic Publish-Subscribe system -- Streams/futures are one-to-one: one producer, one consumer. It seems like multiplexers should be provided as well, allowing multiple consumers and producers. The `QueueValue` already multiplexes, in a way. This should be provided as an add-on to the above. (Some old, obsolete ideas were discussed in issue #1750.)
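A minimal sketch of the multiplexer idea, assuming an in-process fan-out in which each subscriber owns a mailbox and every published item is copied to every current subscriber. `Mailbox` and `Topic` are hypothetical names, and this is only one of several possible publish-subscribe designs.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical per-subscriber mailbox: thread-safe, non-blocking.
template <typename T>
class Mailbox {
public:
    void push(const T& x) {
        std::lock_guard<std::mutex> lock(mtx_);
        items_.push_back(x);
    }
    // Returns the oldest item, or std::nullopt if the mailbox is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (items_.empty()) return std::nullopt;
        T x = items_.front();
        items_.pop_front();
        return x;
    }
private:
    std::mutex mtx_;
    std::deque<T> items_;
};

// Fan-out multiplexer: any number of producers may call publish()
// concurrently, and each published item is copied to every subscriber.
template <typename T>
class Topic {
public:
    std::shared_ptr<Mailbox<T>> subscribe() {
        auto box = std::make_shared<Mailbox<T>>();
        std::lock_guard<std::mutex> lock(mtx_);
        subscribers_.push_back(box);
        return box;
    }
    void publish(const T& x) {
        std::lock_guard<std::mutex> lock(mtx_);
        for (auto& box : subscribers_) box->push(x);
    }
private:
    std::mutex mtx_;
    std::vector<std::shared_ptr<Mailbox<T>>> subscribers_;
};
```

Subscribers only see items published after they subscribe. Holding the mailboxes through `std::weak_ptr` instead would let abandoned subscribers be dropped automatically.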
The following progress has been made:
- QueryLink returning QueueValue -- implemented, works, documented, unit-tested. See #2571. How to do compositionality with this is unexplored.
The cogutils library provides five thread-safe tools for building these things:
- concurrent_queue.h -- thread-safe FIFO
- concurrent_set.h -- thread-safe version of `std::set`. Note that it provides deduplication (just like `std::set` does).
- concurrent_stack.h -- thread-safe LIFO
- async_method_caller.h -- asynchronous method caller. Manages a collection of threads that call some method on some data, at a later time, in some other thread than the current thread. Useful if the method is slow or might block. Avoids overflow; guarantees forward progress. Data is placed on a queue (FIFO) and is processed in order of arrival.
- async_buffer.h -- Same as above, except that data is placed in a set. This does provide deduplication, if the same request is made multiple times. It loses the ability to guarantee that really old data eventually gets handled. Data is processed in the same order as what `std::set` provides, i.e. in `std::less` order.
Note that `QueueValue` is built on top of `concurrent_queue.h`.
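For readers unfamiliar with the pattern, the C++ sketch below shows the kind of closable, thread-safe FIFO that a `QueueValue`-style stream can be built on, together with a producer and a consumer that handles each result as it arrives. It is written from scratch with a mutex and a condition variable; `ClosableQueue` and `stream_results` are hypothetical names, and the actual `concurrent_queue.h` interface may well differ.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical closable FIFO (not the cogutils API).
template <typename T>
class ClosableQueue {
public:
    void push(T x) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            items_.push_back(std::move(x));
        }
        cv_.notify_one();
    }
    // Signals that no more items will be pushed; wakes all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns std::nullopt only once the
    // queue is both closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T x = std::move(items_.front());
        items_.pop_front();
        return x;
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};

// A producer streams n "search results" while a consumer thread handles each
// one as it arrives, rather than waiting for one big blob at the end.
inline std::vector<int> stream_results(int n) {
    ClosableQueue<int> q;
    std::vector<int> seen;  // touched only by the consumer until join()
    std::thread consumer([&] {
        while (auto x = q.pop()) seen.push_back(*x);
    });
    for (int i = 0; i < n; ++i) q.push(i * i);
    q.close();
    consumer.join();
    return seen;
}
```

The `close()` call is what lets a consumer tell "no result yet" apart from "no more results", which is the distinction a streaming query needs.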
Implementing compositionality requires finishing work on the `$vau`-ization (fexpr-ization) of Atomese functions. See `$vau` (aka fexpr) on Wikipedia. Many or most functions now work like this; the grand exceptions include the `PutLink` and most of the TruthValue subsystem.
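To illustrate the distinction, a small C++ sketch contrasts an eager `if`, whose arguments are all evaluated before the call, with an operative that receives its arguments unevaluated (as thunks) and decides which ones to evaluate. The names `Expr`, `eager_if` and `fexpr_if` are hypothetical, and only the "operands arrive unevaluated" half of `$vau` is modelled here; the environment argument is omitted.

```cpp
#include <cassert>
#include <functional>

// An unevaluated argument: a thunk the callee may run zero or more times.
using Expr = std::function<int()>;

// Eager (applicative-order) "if": the caller evaluates every argument
// before the call, so both branches run even though one result is discarded.
inline int eager_if(int cond, int then_val, int else_val) {
    return cond ? then_val : else_val;
}

// fexpr-style "if": arguments arrive unevaluated, and the operative itself
// chooses which of them to evaluate.
inline int fexpr_if(const Expr& cond, const Expr& then_e, const Expr& else_e) {
    return cond() ? then_e() : else_e();
}
```

An operative that controls when its inputs are evaluated is also free to pull them lazily, one at a time, which is what a streaming stage needs.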
Possible building blocks for monads or related constructs:
- AtomSpaceNode -- This suggests treating the AtomSpace as a kind of mutable Link. Thus, the query would return an AtomSpace, layered on the main space, holding the search results. Partly implemented: see #2865. The `AtomSpacePtr` is now a kind of `ValuePtr`. No one is using this for anything just yet. There is no way to automatically project/collapse the contents of derived atomspaces back into the main atomspace. (You'd have to do it by hand.) This could generalize: AtomSpaces could be layered arbitrarily deep, passed around, and hold temporary results that disappear when all references to the AtomSpace disappear. AtomSpaces could be stored in ordinary Links.
- QueueValue -- This is currently used to hold search results from `QueryLink` and `MeetLink`. The whole value flow subsystem is envisioned as being able to handle flows of ... values ... and not flows of atoms.
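A toy C++ model of the layered-AtomSpace idea, assuming atoms can be stood in for by strings: a child layer sees everything in its parent, its own additions stay invisible to the parent, it is freed when its last reference goes away, and projecting its contents back into the parent is an explicit, by-hand step. `Space` and `collapse_into_parent` are hypothetical names, not the AtomSpace API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical layered store; strings stand in for atoms.
class Space {
public:
    explicit Space(std::shared_ptr<Space> parent = nullptr)
        : parent_(std::move(parent)) {}

    void add(const std::string& atom) { atoms_.insert(atom); }

    // Lookups fall through to parent layers, arbitrarily deep.
    bool contains(const std::string& atom) const {
        if (atoms_.count(atom)) return true;
        return parent_ && parent_->contains(atom);
    }

    // Manual projection: copy this layer's own contents into its parent.
    void collapse_into_parent() {
        if (!parent_) return;
        for (const auto& a : atoms_) parent_->add(a);
    }

    // Number of atoms held directly in this layer.
    std::size_t local_size() const { return atoms_.size(); }

private:
    std::shared_ptr<Space> parent_;  // keeps the parent alive
    std::set<std::string> atoms_;
};
```

Since layers are reference-counted objects, they can be passed around like any other value; a temporary result layer is released with its last `shared_ptr`, and only what is explicitly collapsed survives in the parent.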
To explore these issues, and possible solutions, some demos are suggested. So far, we have the following demos and related issues:
- dot product -- Demo of how to compute the dot-product of two vectors, where the two vectors are sets of atoms returned by a search query. See dot-product.scm.
- recursive query -- Demo of how a recursive query can be written. See recursive.scm.
- #2752 -- Transient Atoms -- this describes another data flow issue.
- #2215 -- SatisfactionLink used to leave behind a `*-PatternGroundingKey-*` that told you how it was satisfied. But this crumbled away.
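The arithmetic behind the dot-product demo can be sketched as follows, assuming each search result is a set of atoms carrying a numeric value, represented here as a map from an atom name to a number. `SparseVec` and `dot` are hypothetical; the demo itself is written in Atomese, and this only shows what it computes. Atoms that appear in only one of the two results contribute nothing.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a search result: atom name -> attached number.
using SparseVec = std::map<std::string, double>;

// Sparse dot product: iterate over the smaller vector and look each key up
// in the larger one; only keys present in both contribute.
inline double dot(const SparseVec& a, const SparseVec& b) {
    const SparseVec& small = a.size() <= b.size() ? a : b;
    const SparseVec& large = a.size() <= b.size() ? b : a;
    double sum = 0.0;
    for (const auto& [key, x] : small) {
        auto it = large.find(key);
        if (it != large.end()) sum += x * it->second;
    }
    return sum;
}
```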
At this time, building on top of `QueueValue` seems the most sensible thing to do. This could be formalized by providing an API within `QueueValue` itself, instead of inheriting from what's in cogutils.
The following attempt was made:
- Anchor proposal -- This suggests that an AnchorLink can be specified in the query, and results are chained on with MemberLink as they show up. This has been implemented, documented and unit-tested. See #2500. No one uses it. It was reverted earlier today, in eee7a61.
This was reverted because it didn't seem to be needed, and there seems to be a more generic solution: if one really needs stuff stuck to an AnchorNode, then create a new kind of Value that dequeues from the QueueValue as results come in, and sticks them onto an AnchorNode. This could be done for any kind of data stream, not just for queries.
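A hedged C++ sketch of that more generic solution: a forwarding stage that runs on its own thread, repeatedly dequeues from a source as items arrive, and hands each one to a sink. The source is assumed to be any blocking pop function that returns `std::nullopt` at end of stream (a QueueValue-like source), and the sink stands in for attaching results to an AnchorNode. `Forwarder` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical forwarding stage; nothing here is query-specific.
template <typename T>
class Forwarder {
public:
    Forwarder(std::function<std::optional<T>()> pop,
              std::function<void(const T&)> sink)
        : worker_([pop = std::move(pop), sink = std::move(sink)] {
              // Each item goes to the sink as soon as pop() delivers it.
              while (auto item = pop()) sink(*item);
          }) {}

    // Waits until the source reports end-of-stream.
    ~Forwarder() {
        if (worker_.joinable()) worker_.join();
    }

    Forwarder(const Forwarder&) = delete;
    Forwarder& operator=(const Forwarder&) = delete;

private:
    std::thread worker_;
};
```

Because the sink is just a callback, the same stage could post results to an AnchorNode, to a log, or into another queue, which is what makes it more general than the reverted query-only option.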
There has been some minor exploration of how compositionality works; it is demoed in the (still existing) example `query.scm`. This example uses AnchorNodes, but without needing pull req #2500 to do it. It "works". It's even multi-threaded, so it's "naturally" parallel. Is it clunky? I dunno. It posts results to the AtomSpace ... but why? Was this really needed?