Rc variant and compressing pointer size

Question

Rc variant and compressing pointer size

amarmaduke opened this issue 2 years ago · 15 comments

Hi,

I've been expermenting with hconsing with NBE for lambda calculus. This library is the primary inspiration, which I've mostly copied. Originally, I was using the library, but ran into some performance problems:

I believe the uid is not needed, the internal NonNull pointer data in the Rc is enough to act as a unique reference. As every Rc is the spoke on a fuzzy ball surrounding a RcBox, even if the location of the RcBox were to somehow move, it must move in a way such that all constructed Rc's remain sound. Of course, the Weak stored in the HConsign has the same properties, because it references the same RcBox. Moreover, once the pointer is reclaimed, that is the exact condition in which the unique id would be reusable because the original Weak would fail to upgrade.
The usage of Arc with no ability to switch to an Rc. In my case, there is no point in being multithreaded. Parallelization in evaluating lambda terms is not very good unless you take an optimal lambda evaluator route.

"Fixing" those two problems were about a 2-3x speedup for my benchmarks, especially because lambda terms are explosive in terms of size, so halving the pointer size ends up being a useful win.

Ideally, I'd like these changes to make it into this library, because then I can just pull it from cargo :)

Unrelated, I would prefer if the pointer types where called Hc and Ahc, but this is a minor thing that is not important to me, I can just rename the types in my project.

If you are interested I could draft a PR with the above changes.

Answer 1 · 2023-02-13T17:24:29.000Z

I would be interested in seeing this as a pr, even if just to add this to my own fork. Do you have a benchmark you can add to the repository to show the speedup?

Answer 2 · 2023-02-15T07:27:23.000Z

Thank you (both) for your PR/issues ❤️
I will finally have time to spend on hashconsing this week and go over your feedback and contributions, just give me a day or two

Answer 3 · 2023-02-18T10:40:40.000Z

The usage of Arc with no ability to switch to an Rc. In my case, there is no point in being multithreaded. Parallelization in evaluating lambda terms is not very good unless you take an optimal lambda evaluator route.

So do you not use a static factory? If you do, what do you store in the map?

Other than that, I think @Pat-Lafon's PR on having a derive-macro could help by having an attribute for that eventually. What do you think?

Answer 4 · 2023-02-18T10:45:16.000Z

Regarding the uid, I see your point. Could you provide a link to your version so that I can take a look?

Answer 5 · 2023-02-18T17:20:57.000Z

Here is a gist of what I'm currently using: https://gist.github.com/amarmaduke/d5abb2cc8f7f1b769f4af661bf701a32

I'm not using a static store. I pass around a container of all application state via mutable reference, the hashcons table is stored there.

Another ask (that I'm willing to contribute): into_raw and from_raw mirroring the Rc variants. Worse case these leak memory by not updating the strong-count, however, it's useful for me because I need to emulate a call stack (Rust is not great with recursion) and I would like to do it in a type-agonistic way.

Answer 6 · 2023-02-18T18:54:21.000Z

Thank you for the link, I will take a look.

I'm not using a static store. I pass around a container of all application state via mutable reference, the hashcons table is stored there.

Yeah I'm not surprised, that's the problem with Rc: you can't put them in a static storage. So we could have a derive-attribute that would exclude static-factory stuff.

Answer 7 · 2023-02-27T08:48:50.000Z

@amarmaduke I'm a bit concerned with your hash function.

It seems to me like like whenever a term is moved by memory management its hash will change. It will stay equal to itself, but if its placement in the factory relies on a previous value of the pointer then you won't find it since the hash has change, it might look in the wrong bucket.

Does this make sense or am I speaking nonsense?

Answer 8 · 2023-02-27T19:57:36.000Z

This seems like a problem, but it hasn't been in practice for me. Here is my theory as to why that is:

When we construct a new Hc we save a WeakHc in the table. However, now we have two pointers pointing to the same RcBox. Suppose that someone, compiler or otherwise, wanted to move the RcBox via a Hc, how would they correctly update the WeakHc pointer? The RcBox only holds data and strong/weak counts. If you want to move the location of it you would need to remember all referencing objects and update them too. Thus, I would argue that the WeakHc is "pinning" the pointer.

When we upgrade a WeakHc to a Hc, we (can, I updated to code to do it this way to make things more obvious) upgrade through a &WeakHc, but per the pin documentation, a const reference cannot move. This means the underlying pointer of the WeakHc remains the same. I did some experimenting to check if, after upgrading, the pointer could ever have updated, and in my benchmarks, which does a lot of cloning and dropping of Hcs it never changed.

The crate pin-weak gives an example of where generating a Weak from a Pin<Rc<T>> can cause undefined behavior. In other words, I think your worries are very justified, and the API surface area would have to treat Hc contents as if it was pinned, but I don't think there is a use case where you end up moving the data anyway, the whole point is you want the table to record a reference so you can always generate Hcs as long as one exists in the wild.

When the strong count falls to 0, then the Weak could theoretically move, but I don't find this to be a problem, either, because there are no Hcs to break the hash of.

I did some more small benchmarking on various encodings for Hc, and this is what I found:

Hc(u64, Rc) ~43s
Hc(Rc<(u64, T)>) ~18s
Hc(Pin<Rc>) ~14s
Hc(Rc) ~12s

However, the Pin<Rc> case was pretty variable for some reason (from 12s to 16s), not sure why. The other benchmarks after a warm-up run did not have nearly as much deviation.

To me it seems like there is a tradeoff here between two possibilities: 1. Rc<(u64, T)> and 2. Rc<T>. For the first case, you don't need to worry about satisfying an implicit pin invariant as the API surface grows. Certainly, this makes for better code maintainability. In the second case, you win some performance but need to maintain that invariant. For example, any function that leaks the inner Rc<T> or Weak<T> must be private or marked unsafe (or both) since anyone could start using mem::replace or what have you to break the pin invariant.

Answer 9 · 2023-02-28T10:17:58.000Z

Thank you for this great answer. I'm not an expert in pinning, I'm going to research this a bit on my end to convince myself.

Answer 10 · 2023-02-28T10:19:43.000Z

Short term, how about this: we can have a branch on this repo that's the same but without the u64 uid.

That way you can use this branch instead of your fork, and we can run experiments on a WIP version to make sure there's no problem.

Answer 11 · 2023-02-28T10:28:25.000Z

Alternatively, we could just feature-gate no-uid in the main branch and say it's experimental somewhere as long as we're not sure it's safe

Answer 12 · 2023-02-28T10:43:26.000Z

It's starting to sound like you're right:

https://users.rust-lang.org/t/can-a-rc-move-location-behind-my-back/28828/2

Since hashconsing will not let you mutate terms anyways, I think addresses must be stable like you said. Inner mutability would potentially break this, but it breaks hashconsing altogether anyways. That should probably be documented somewhere actually >_<

Answer 13 · 2023-02-28T10:48:21.000Z

By the way @amarmaduke, it sounds like we will have quite a few configuration options soon, so I think @Pat-Lafon's PR #14 is the way to go. We would have a derive macro where you could (de)activate Arc/Rc, static factory and so on. It would also probably generate the HConsed type directly in your crate so that you can decide how it's called and provide implementations.

Would that work for you?

Answer 14 · 2023-02-28T18:17:09.000Z

Yeah, that would likely work for me, worst case it is future-proof to more knobs :)

Answer 15 · 2023-02-28T22:19:13.000Z

My thoughts exactly 😸

Let me take care of the proc-macro PR first then