mthom/scryer-prolog

Partial strings:

UWN opened this issue ยท 31 comments

UWN commented
?- X = [a,b,c|_], X = [a,b,c|Y].
X = [a, b, c | _1], Y = _1.                             %%% OK
?- partial_string("abc",X), X = [a,b,c|Y].
X = [a, b, c | _], Y = _.                                 %%% ???

Why this difference?

mthom commented

In the second instance, Y points to the end of a singly allocated string buffer, which is not part of the heap, so there is no heap cell to refer to.

UWN commented

That does not make sense to me. The end must be a real variable.

UWN commented

On the heap, Xs0 = [a,b,c|Xs], Xs0 = [_|Xs1] would look like (byte-by-byte)"

a <---- Xs0 points here
b <---- Xs1 points here
c
\0\ % zero to terminate the actual string
\0\ % further zeroes for padding, if needed
Xs_b1 % four-byte address on a 32bit system (eight on a 64)
Xs_b2
Xs_b3
Xs_b4

UWN commented

See #24

UWN commented
chars_partialstring(Chs, Cs0,Cs) :-
    must_be(chars, Chs),
    can_be(chars, Cs0),
    can_be(chars, Cs),
   /* now the hard part */
UWN commented

Currently I get:

?- partial_string(_,_).
error: exception thrown: error(syntax_error(inadmissible_query_term), repl/0)

But this is not a syntax error. And further, "inadmissible_query_term" is what is found and not what is expected. See N226 for a lengthy analysis of what errors are about. TL;DR: They signal primarily what is expected and not what is found. In the long run, you will find yourself going through that table of error classification quite frequently.

An existence_error or a permission_error is preferable in this situation. OK, you can also signal a system_error.

UWN commented

Do you think you will be able to continue work on partial strings?

mthom commented

I do. Things are sluggish right now because I'm preparing to leave in two weeks.

UWN commented

How do you intend to proceed? It seems in spite of this, you have still these _-variables. There are no such variables in Prolog. The underscore is just a unique variable name. But the variable itself has an identity

mthom commented

I will change the _-variables to something else, probably using a specialized convention to denote that the variable points to the end of a character buffer.

UWN commented

Here is a related problem:

?- partial_string("abc", L), length(L, N), N = 4.
false.

Expected: success.

UWN commented

Yet,

?- partial_string("abc", L), L = [_,_,_,d|_],length(L,N), N = 4.
L = [a,b,c,d|_], N = 4, _180 = a, _189 = b, _192 = c, _196 = _.
mthom commented

I don't understand why the partial string "abc" should have 4 elements.

UWN commented

Because partial_string("abc",L) should be equivalent (but not internally equivalent) to L = [a,b,c|_]. And now,

?- L = [a,b,c|_], length(L, N), N = 4.

succeeds as well.

UWN commented

In other words:

?- partial_string("abc",L), length(L, N).
L = [a,b,c|_], N = 3.
% Expected: More answers
?- L = [a,b,c|_], length(L, N).
L = [a,b,c], N = 3, _181 = [] ;
L = [a,b,c,_10], N = 4, _181 = [_10] ;
L = [a,b,c,_10,_14], N = 5, _181 = [_10,_14] ;
L = [a,b,c,_10,_14,_18], N = 6, _181 = [_10,_14,_18] ...

( In any case: very nice ... )

UWN commented

Alternatively:

ulrich@p0:~$ export RUST_BACKTRACE=1
ulrich@p0:~$ echo $RUST_BACKTRACE
1
ulrich@p0:~$ /opt/gupu/scryer-prolog/target/debug/scryer-prolog 
?- [user].
l([],0).
l([_|L], s(X)) :- l(L,X).
end_of_file.
?- partial_string("abc",L),l(L,N).
thread 'main' panicked at 'index out of bounds: the len is 18 but the index is 18', libcore/slice/mod.rs:2448:10
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::panicking::panic_bounds_check
   9: <usize as core::slice::SliceIndex<[T]>>::index
             at libcore/slice/mod.rs:2448
  10: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at libcore/slice/mod.rs:2316
  11: <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index
             at liballoc/vec.rs:1653
  12: <scryer_prolog::prolog::machine::heap::Heap as core::ops::index::Index<usize>>::index
             at src/prolog/machine/heap.rs:83
  13: scryer_prolog::prolog::machine::machine_state_impl::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::execute_fact_instr
             at src/prolog/machine/machine_state_impl.rs:1745
  14: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::dispatch_instr
             at src/prolog/machine/mod.rs:705
  15: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::execute_instr
             at src/prolog/machine/mod.rs:727
  16: scryer_prolog::prolog::machine::<impl scryer_prolog::prolog::machine::machine_state::MachineState>::query_stepper
             at src/prolog/machine/mod.rs:819
  17: scryer_prolog::prolog::machine::Machine::run_query
             at src/prolog/machine/mod.rs:546
  18: scryer_prolog::prolog::machine::Machine::submit_query
             at src/prolog/machine/mod.rs:317
  19: scryer_prolog::prolog::machine::compile::compile_term
             at src/prolog/machine/compile.rs:201
  20: scryer_prolog::prolog::machine::Machine::handle_toplevel_command
             at src/prolog/machine/mod.rs:395
  21: scryer_prolog::prolog::machine::Machine::run_query
             at src/prolog/machine/mod.rs:552
  22: scryer_prolog::prolog::machine::Machine::run_toplevel
             at src/prolog/machine/mod.rs:212
  23: scryer_prolog::main
             at src/main.rs:28
  24: std::rt::lang_start::{{closure}}
             at libstd/rt.rs:74
  25: std::panicking::try::do_call
  26: __rust_maybe_catch_panic
  27: std::rt::lang_start_internal
  28: std::rt::lang_start
             at libstd/rt.rs:74
  29: main
  30: __libc_start_main
  31: <unknown>
ulrich@p0:~$ 
UWN commented

@triska, can you reproduce this?

Yes, I can reproduce this, but unfortunately I cannot help with this issue!

mthom commented

I will have a look tomorrow.

mthom commented

In other words:

?- partial_string("abc",L), length(L, N).
L = [a,b,c|], N = 3.
% Expected: More answers
?- L = [a,b,c|
], length(L, N).
L = [a,b,c], N = 3, _181 = [] ;
L = [a,b,c,_10], N = 4, _181 = [_10] ;
L = [a,b,c,_10,_14], N = 5, _181 = [_10,_14] ;
L = [a,b,c,_10,_14,_18], N = 6, _181 = [_10,_14,_18] ...

( In any case: very nice ... )

I'm not sure how to make sense of that. Internally, a partial string is a buffer of UTF-8 characters. If the buffer is expanded to accommodate new characters, how would that work with respect to unification? The variables beyond c could only be characters, supposing that they are indices into the partial string (or character codes if double_quotes is set to codes). Unless the partial string is continued as an ordinary list somehow, but then, it ceases to be a partial string, doesn't it?

UWN commented

What about talking about it?

mthom commented

Sure, we can work out a time over email.

mthom commented

I'm about ready to take a second crack at this from Unsafe Rust. You gave this example layout a few months back:

a <---- Xs0 points here
b <---- Xs1 points here
c
\0\ % zero to terminate the actual string
\0\ % further zeroes for padding, if needed
Xs_b1 % four-byte address on a 32bit system (eight on a 64)
Xs_b2
Xs_b3
Xs_b4

To be sure I understand the example properly, the final four/eight bytes are self-referential until bound, correct? This is how unbound variables are implemented elsewhere.

UWN commented

Yes. Let's consider a built_in chars_partialstring(Chars, Xs0, Xs):

?- partial_string([a,b,c], Xs0, Xs).

creates above term. Please keep in mind that

?- partial_string([a,b,c], Xs0, []).

might now have just the code for [ ] at the place of Xs_b1 ... Xs_b4/8

mthom commented

Alright. And the pointer at Xs could also point into the WAM heap or stack?

UWN commented

Xs in this example resides on the heap and therefore can only point to the heap. It can never point to the stack, since only stackelements (local variables) can point to other stackelements.

edit
should have read: since only stackelements and (argument and temporary) registers can point to other stackelements.

And thus, strictly speaking, also choicepoints point to other stackelements

UWN commented

Here is an extreme example, "ab\0\d". This is a four-element list with '\0\' at the third position. How can we represent zero within a zero-terminated UTF-string? Well, we can't!

?- chars_partiallist("ab\0\de", Xs0,Xs).

And these are the resulting terms on the heap of a 64 bit machine:

0'a <------------ Xs0
0'b
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes list pointer to next cell]
[8 bytes for atom '\0\'] this is the car of the list
[8 bytes partial list pointer to next byte] this is the cdr of the list
0'd
0'e
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes self-referential variable] <------- Xs

In another notation we have:

    Xs0 = [a,b|_I1],            % partial string
    _I1 = '.'('\0\',_I2),       % normal dotted pair
    _I2 = [d,e|Xs]              % partial string

It should be noted that zeroes in strings are extremely rare, so it is not worth doing any extra optimization for them.

UWN commented

Another request that calls for this feature:

https://news.ycombinator.com/item?id=21937272

mthom commented

I suppose if we have an example a bit more general than your last:

?- chars_partiallist([a,b,X,d,e], Xs0,Xs).

with the uninstantiated X in place of the null character, something similar would occur in the heap:

0'a <------------ Xs0
0'b
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes list pointer to next cell]
[8 byte self-referential pointer to X] this is the car of the list
[8 bytes partial list pointer to next byte] this is the cdr of the list
0'd
0'e
0'\0\ to terminate the string
0'\0\ up to 7 for padding, if needed
[8 bytes self-referential variable] <------- Xs

UWN commented

See #95 (comment)
The first argument must be a list of characters. And a variable there will thus produce an instantiation error.