Chez Scheme: performance of vector reads
NHALX opened this issue · 7 comments
I realize this isn't Shen-scheme's fault, since it just uses Chez's vectors directly, but the performance of vector reads seems quite bad. In every situation I've come across where an array would normally outperform a linked list, the opposite has been true.
Do you have any examples of this? Are you using `<-address` or `<-vector`?

`<-address` compiles into a call to Scheme's `vector-ref`. Apart from the checks it makes to ensure the object is a vector and that the access is within bounds, it shouldn't have much overhead (whether those checks happen also depends on how aggressively the Chez compiler is set to optimize; for now Shen/Scheme uses the safest option).
`<-vector`, on the other hand, is implemented in Shen, and in addition to the above it checks that the index argument is not 0 and that the object read from the vector is not the fail object.
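As a rough sketch of the difference (not the port's actual code — `fail-object` and the error messages are illustrative names):

```
;; Hedged sketch. `fail-object` is an illustrative stand-in for the
;; sentinel Shen stores in unwritten slots.
(define fail-object (list 'fail!))

;; <-address: a plain vector-ref; Chez adds the type and bounds checks
;; itself when compiling at the safe optimize level.
(define (address-ref v n)
  (vector-ref v n))

;; <-vector: the same read, plus two extra checks on the Shen side.
(define (shen-vector-ref v n)
  (if (= n 0)
      (error 'shen.<-vector "vector index out of bounds") ; slot 0 reserved
      (let ([x (vector-ref v n)])
        (if (eq? x fail-object)
            (error 'shen.<-vector "vector element not found") ; never written
            x))))
```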
Lists, on the other hand, only have the overhead of checking that the object is a `cons`. There is more indirection and more memory access when following a list, but if the list is not huge and all its fragments were allocated together, it will not cost that much more in terms of computation (I'm not familiar with the internals, but I would guess that, Scheme being a list-heavy language, Chez spends some effort allocating nodes together in memory when possible).
Looks like the main problem was actually vector construction rather than reads (probably a kernel issue more than anything then).
One could argue that vectors aren't supposed to be used in a functional update style, but I think Shen tuples are implemented over vectors.
```
vec-read:          run time: 5.890625 secs
vec-read+build@v:  run time: 50.671875 secs
vec-read+build:    run time: 27.34375 secs
list-hd:           run time: 4.984375 secs
list-hd+cons:      run time: 6.171875 secs
tuple-fst+build:   run time: 21.75 secs
control:           run time: 3.75 secs
```
```
(package null []

(define r
  V 1000000000 -> ok
  V N -> (r V (+ N (<-address V 1))))

(define rb1
  V 1000000000 -> ok
  V N -> (rb1 (@v 1 <>) (+ N (<-address V 1))))

(define rb2
  V 1000000000 -> ok
  V N -> (rb2 (address-> (vector 1) 1 1) (+ N (<-address V 1))))

(define c
  C 1000000000 -> ok
  C N -> (c C (+ N (hd C))))

(define c/b
  C 1000000000 -> ok
  C N -> (c/b [1] (+ N (hd C))))

(define t/b
  C 1000000000 -> ok
  C N -> (t/b (@p 1 []) (+ N (fst C))))

(define control
  C 1000000000 -> ok
  C N -> (control C (+ N 1))))

(do (scm.collect 4) (time (r (@v 1 <>) 1)))
(do (scm.collect 4) (time (rb1 (@v 1 <>) 1)))
(do (scm.collect 4) (time (rb2 (@v 1 <>) 1)))
(do (scm.collect 4) (time (c [1] 1)))
(do (scm.collect 4) (time (c/b [1] 1)))
(do (scm.collect 4) (time (t/b (@p 1 []) 1)))
(do (scm.collect 4) (time (control unit 1)))
```
Ok, I see what you mean. You will face the same problem in any port; this is not a problem with Chez.
> Looks like the main problem was actually vector construction rather than reads (probably a kernel issue more than anything then).
>
> One could argue that vectors aren't supposed to be used in a functional update style, but I think Shen tuples are implemented over vectors.
That is correct. Vectors already have overhead in Scheme (and on most platforms) because they have to store their capacity and validate accesses. Shen then duplicates this overhead on top (when using `vector` instead of `absvector`), because it needs to keep track of the capacity too and, in addition, fill every slot with the `(fail)` object. To make things worse, a construct like `@v` has to instantiate and copy vectors each time it is used, whether as a deconstructor in pattern matching or as a constructor.
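As a simplified illustration (not the port's actual code, and with the slot layout simplified), each use of `@v` ends up doing something equivalent to this:

```
;; Hedged sketch: why @v is expensive. Every use allocates a fresh
;; vector, pre-fills it (Shen vectors initialise slots to the fail
;; object), and then copies the old contents across.
(define (naive-@v x v)
  (let* ([n (vector-length v)]
         [new (make-vector (+ n 1) 'fail)]) ; every slot pre-filled
    (vector-set! new 0 x)                   ; the new first element
    (let loop ([i 0])
      (if (= i n)
          new
          (begin
            (vector-set! new (+ i 1) (vector-ref v i)) ; copy old slots
            (loop (+ i 1)))))))
```

So each `(@v X V)` costs O(n) in the length of `V`, and a loop that rebuilds a vector through repeated `@v` becomes quadratic.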
I tried a version of `rb2` that uses `(absvector 2)` to construct the vector, and it takes half the time (16.04s vs 34.66s), because the Shen-side initialisation doesn't happen in that case.
I think I can make `vector` work almost as fast as `absvector` by overriding it. I will give that a try, but it is not really going to solve the problem you see here.
Tuples can probably be optimised too, by overriding the tuple operations to work on some datatype defined natively in Scheme: one that doesn't have to keep the capacity as part of the object like vectors do, and that doesn't have to validate indexes on each access.
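For example (a hedged sketch, not what the port actually does), an R6RS record type in Chez gives exactly that: no capacity word, no index validation, fixed-offset field access:

```
;; Hypothetical native tuple representation; names are illustrative.
(define-record-type tuple
  (fields fst snd))

(define t (make-tuple 1 '()))
(tuple-fst t)   ; direct field access, no bounds check, no tag slot
```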
**Small rant about `@v`**

I don't like `@v` at all (neither as constructor nor as deconstructor), at least not in its current form: it is too wasteful and slow, and because vectors are mutable there is no way around it, since sharing the underlying data is not possible. `@s` has the same problem, but since strings are immutable it is possible to share the underlying data and use heuristics to make it perform better (or even use a different representation for strings that fits the pattern better).
Vectors could have "views", so that `@v` works efficiently with that functional update (and decomposition) pattern, but that would require the kernel and the spec to change. I haven't done anything in that direction because I don't want the OS kernel to be incompatible with the SP one in any way if I can avoid it, but maybe it is something that could be tried as an extension.
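The "views" idea could be sketched like this (purely illustrative, not part of the kernel or spec; all names are hypothetical):

```
;; Hedged sketch: a view pairs a shared backing vector with an offset,
;; so @v-style decomposition needs no copying.
(define-record-type vview
  (fields backing offset))

(define (vview-ref w n)          ; read through the view
  (vector-ref (vview-backing w) (+ (vview-offset w) n)))

(define (vview-rest w)           ; O(1) "all but the first element"
  (make-vview (vview-backing w) (+ (vview-offset w) 1)))
```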
I replaced Shen's implementation of `vector` with a more performant one in f4ee005. This should bring the speed of vector construction closer to that of `absvector`.
Your `rb2` function now runs in 19.55s on my notebook (down from 34.66s before the change).
Added a faster version of the `@p` constructor in b010cda; time went down from 26.27s to 12.79s for the `t/b` function.
Those are pretty decent speed improvements.
Too bad tuples can't be cons cells, since Shen uses the first element of a vector for type disambiguation. I wish the system handled type metadata differently.
Closing this one because there isn't much else I can do here. `@v` (and `@s`) could be optimized by the compiler to construct the vector at once, bypassing the intermediate steps, but that can only be done safely when the contents of all the inputs are known, so I don't think it is worth the effort.