opencog/atomspace

Compositionality of Atomese processing stages (was "Stop Using SetLinks!")

linas opened this issue · 2 comments

linas commented

Issue #1502 "Stop using SetLink for search results!" highlights a problem with the compositionality of the search and processing primitives in Atomese. Search results were (and still are) delivered wrapped in a SetLink. This makes it hard to clean up search results, and causes various other core issues (see the issues described on the SetLink wiki).

Basically, it is difficult to create long processing pipelines, such as (cog-evaluate! (SequentialAnd ... (Put ... (State ... (Get ... (SequentialOr ... (Delete ... (Bind ... with the top-level SequentialAnd written to be tail recursive. Such long pipelines have been implemented (in the robot code, circa 2017) and they do work (see, for example, the tail-recursion demo and older copies of the https://github.com/opencog/ros-behavior-scripting repo). However, there are problems:

  • There is difficulty in deleting the wrapper SetLinks when they are no longer needed.
  • Writing long chained pipelines seems more difficult than it should be.
  • The internal implementation of how SetLinks are handled is awkward when they are passed to PutLink and other assorted functions (e.g. PlusLink when given a set of numbers to add).

A replacement solution is needed. Desirable properties:

  • Ability to get search results incrementally, as they come in, rather than getting one big blob at the end.
  • Ability to run one query in parallel mode.
  • Ability to handle processing pipelines on streaming data, e.g. from the LinkStream Value.
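
The first and third properties can be illustrated outside of Atomese. The following is a minimal Python sketch, using only the standard library, of a producer that posts results to a thread-safe queue as they are found, and a consumer that handles each result as it arrives instead of waiting for one big blob. The names `search`, `stream` and `_DONE` are hypothetical; this is not the AtomSpace or cogutil API, just an illustration of the desired behavior.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


def search(candidates, predicate, out):
    """Producer: post each match to `out` as soon as it is found."""
    for item in candidates:
        if predicate(item):
            out.put(item)
    out.put(_DONE)  # tell the consumer that no more results will come


def stream(q):
    """Consumer side: yield results one at a time until the stream closes."""
    while True:
        item = q.get()  # blocks until the producer posts something
        if item is _DONE:
            return
        yield item


results = queue.Queue()
worker = threading.Thread(
    target=search, args=(range(20), lambda n: n % 3 == 0, results))
worker.start()

# Downstream stages are ordinary generators, so they compose by nesting.
squares = [n * n for n in stream(results)]
worker.join()
print(squares)
```

Because each stage only sees one item at a time, there is no wrapper collection to delete afterwards; the queue is simply empty once the stream is closed.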

So basically, there are two issues being explored here, in tandem:

  • How to make the query subsystem stream? (Well, it already does, but how should it be used effectively?)
  • How to build a general streaming subsystem?

A wholly unexplored idea:

  • Monads -- The PutLink and other misc links currently accept either individual atoms, or sets of atoms, as input. In the case of sets, the contained atoms are automatically unwrapped and processed. This suggests that the current use of SetLink is as a kind of sloppy monad, and thus what we really need is a cleaned-up monad to hold multiple-value results. The precise way to do this is unclear.
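
To make the monad idea concrete, here is a minimal Python sketch of the usual list/set monad shape: a `unit` that wraps one value, and a `bind` that does the unwrapping explicitly, in one place, instead of every processing stage doing it implicitly. All names here (`unit`, `bind`, `neighbours`, `keep_even`) are hypothetical, and plain integers stand in for atoms; how this would map onto Atomese is exactly the unclear part.

```python
def unit(value):
    """Wrap a single value as a one-element result container."""
    return frozenset([value])


def bind(results, step):
    """Apply `step` to every contained value and merge the outputs.

    `step` takes one plain value and returns a result container, so the
    unwrapping happens here and nowhere else.
    """
    merged = set()
    for value in results:
        merged |= step(value)
    return frozenset(merged)


def neighbours(n):
    """Hypothetical processing stage: one input, several outputs."""
    return frozenset([n - 1, n + 1])


def keep_even(n):
    """Hypothetical filter stage: zero or one outputs."""
    return unit(n) if n % 2 == 0 else frozenset()


start = frozenset([1, 5])
out = bind(bind(start, neighbours), keep_even)
print(sorted(out))
```

With this shape, a stage never has to guess whether it was handed one value or a set of values: stages always take one value, and only `bind` knows about the container.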

The best idea seems to be:

  • Futures/Promises subsystem -- Compositionality can be achieved by creating a subsystem that supports futures and streams. Some partial work in this direction has already been done, with the FormulaStream and FutureStream and also with QueueValue. Currently, there is no way to concatenate QueueValues.
  • Generic Publish-Subscribe system -- Streams/futures are one-to-one: one producer, one consumer. It seems like multiplexers should be provided as well, allowing multiple consumers/producers. The QueueValue already multiplexes, in a way. This should be provided as an add-on to the above. (Some old, obsolete ideas were discussed in issue #1750.)
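
As an illustration of the multiplexer idea, the following Python sketch fans one producer out to several consumers by giving each subscriber its own thread-safe queue. The `Multiplexer` class, its method names and the `_DONE` marker are all hypothetical; nothing like this is claimed to exist in the AtomSpace, and QueueValue itself is not modeled here.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


class Multiplexer:
    """Hypothetical fan-out: every subscriber sees every published item."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []

    def subscribe(self):
        """Register a new consumer and return its private queue."""
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, item):
        """Copy `item` into the queue of every current subscriber."""
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(item)

    def close(self):
        """Signal end-of-stream to all subscribers."""
        self.publish(_DONE)


def drain(q, sink):
    """Consumer loop: append items to `sink` until the stream closes."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        sink.append(item)


mux = Multiplexer()
seen_a, seen_b = [], []
threads = [
    threading.Thread(target=drain, args=(mux.subscribe(), seen_a)),
    threading.Thread(target=drain, args=(mux.subscribe(), seen_b)),
]
for t in threads:
    t.start()
for n in range(5):
    mux.publish(n)
mux.close()
for t in threads:
    t.join()
print(seen_a, seen_b)
```

One design question this sketch leaves open is what a late subscriber should see: here, anything published before `subscribe()` is called is simply missed.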

The following progress has been made:

  • QueryLink returning QueueValue -- implemented, works, documented, unit tested. See #2571. How to do compositionality with this is unexplored.

The cogutils provides five thread-safe tools for building these things:

  • concurrent_queue.h -- thread-safe FIFO
  • concurrent_set.h -- thread-safe version of std::set. Note it provides deduplication (just like std::set does.)
  • concurrent_stack.h -- thread-safe LIFO
  • async_method_caller.h -- asynchronous method caller. Manages a collection of threads that call some method on some data, at a later time, in some other thread than the current thread. Useful if the method is slow or might block. Avoids overflow; guarantees forward progress. Data is placed on a queue (FIFO) and is processed in order of arrival.
  • async_buffer.h -- Same as above, except data is placed in a set. This provides deduplication if the same request is made multiple times. It loses the guarantee that really old data eventually gets handled. Data is processed in the order that std::set provides, i.e. in std::less order.
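
The difference between the last two tools is mostly a difference in the pending-work container. The following single-threaded Python sketch contrasts the two policies as described above: a FIFO keeps duplicates and arrival order, while a set deduplicates and drains in sorted order. The class names `FifoBuffer` and `SetBuffer` are hypothetical, and the sketch deliberately omits the locking and worker threads that the real cogutil headers provide.

```python
from collections import deque


class FifoBuffer:
    """Pending work kept in arrival order; duplicates are kept."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, item):
        self._pending.append(item)

    def drain(self):
        """Return all pending items, oldest first, and empty the buffer."""
        out = list(self._pending)
        self._pending.clear()
        return out


class SetBuffer:
    """Pending work kept in a set: deduplicated, drained in sorted order."""

    def __init__(self):
        self._pending = set()

    def enqueue(self, item):
        self._pending.add(item)  # a repeated request collapses into one

    def drain(self):
        """Return all pending items in sorted order and empty the buffer."""
        out = sorted(self._pending)
        self._pending.clear()
        return out


requests = [3, 1, 3, 2, 1]
fifo, dedup = FifoBuffer(), SetBuffer()
for r in requests:
    fifo.enqueue(r)
    dedup.enqueue(r)

fifo_order = fifo.drain()
set_order = dedup.drain()
print(fifo_order, set_order)
```

The sorted drain order is also why the set-based variant cannot promise that old requests are eventually handled: if items that sort early keep arriving, an item that sorts late can wait indefinitely.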

Note that QueueValue is built on top of concurrent_queue.h.

Implementing compositionality requires finishing work on the $vau-ization (fexpr-ization) of Atomese functions. See $vau (aka fexpr) on Wikipedia. Many or most functions now work like this; the major exceptions include the PutLink and most of the TruthValue subsystem.

Possible building block for monads or related:

  • AtomSpaceNode -- This suggests treating the AtomSpace as a kind of mutable Link. Thus, the query would return an AtomSpace, layered on the main space, holding the search results. Partly implemented: see #2865. The AtomSpacePtr is now a kind of ValuePtr. No one is using this for anything, just yet. There is no way to automatically project/collapse the contents of derived AtomSpaces back into the main AtomSpace. (You'd have to do it by hand.) This could generalize: AtomSpaces could be layered arbitrarily deep, passed around, and hold temporary results that disappear when all references to the AtomSpace disappear. AtomSpaces could be stored in ordinary Links.
  • QueueValue -- This is currently used to hold search results from QueryLink and MeetLink. The whole value flow subsystem is envisioned as being able to handle flows of ... values ... and not flows of atoms.

To explore these issues, and possible solutions, some demos are suggested. So far, we have the following demos and related issues:

  • dot product -- Demo of how to compute the dot-product of two vectors, where the two vectors are sets of atoms returned by a search query. See dot-product.scm
  • recursive query -- Demo of how a recursive query can be written. See recursive.scm
  • #2752 -- Transient Atoms -- this describes another data flow issue.
  • #2215 -- SatisfactionLink used to leave behind a *-PatternGroundingKey-* that told you how it was satisfied. But this crumbled away.

linas commented

At this time, building on top of QueueValue seems the most sensible thing to do. This could be formalized by providing an API within QueueValue itself, instead of inheriting from what's in cogutils.

linas commented

The following attempt was made:

  • Anchor proposal -- This suggests that an AnchorLink can be specified in the query, and results are chained on with MemberLink, as they show up. This has been implemented and documented and unit-tested. See #2500. No one uses it. It was reverted earlier today, in eee7a61.

It was reverted because it didn't seem to be needed, and there seems to be a more generic solution: if one really needs stuff stuck to an AnchorNode, then create a new kind of Value that dequeues from the QueueValue, as results come in, and sticks them onto an AnchorNode. This could be done for any kind of data stream, and not just for queries.
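
The shape of that more generic solution can be sketched in a few lines of Python: a small pump thread that takes items off a queue as they arrive and attaches them to an anchor object. The `Anchor` class and `attach_stream` function are hypothetical stand-ins for an AnchorNode and the proposed new Value; the sketch only shows that the pump is independent of where the stream comes from.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


class Anchor:
    """Hypothetical stand-in for an AnchorNode: a thread-safe bag of members."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._members = []

    def attach(self, item):
        with self._lock:
            self._members.append(item)

    def members(self):
        with self._lock:
            return list(self._members)


def attach_stream(source, anchor):
    """Start a thread that moves items from `source` to `anchor` as they arrive."""
    def pump():
        while True:
            item = source.get()  # blocks until something is available
            if item is _DONE:
                return
            anchor.attach(item)

    t = threading.Thread(target=pump)
    t.start()
    return t


found_queue = queue.Queue()
anchor = Anchor("query results")
pump_thread = attach_stream(found_queue, anchor)

# Any producer will do; here the "query" is just a loop posting strings.
for found in ["a", "b", "c"]:
    found_queue.put(found)
found_queue.put(_DONE)
pump_thread.join()
print(anchor.members())
```

Since `attach_stream` only needs something with a blocking `get()`, the same pump works for any data stream, which is the point made above.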

There has been some minor exploration of how compositionality works; it is demoed in the (still existing) example query.scm. This example uses AnchorNodes, but without needing pull req #2500 to do it. It "works". It's even multi-threaded, so it's "naturally" parallel. Is it clunky? I dunno. It posts results to the AtomSpace, ... but why? Was this really needed?