phillord/horned-owl

RFC: Significant Model change

phillord opened this issue · 26 comments

I am trialing out a significant change to the data model for the next release. The main idea is to move IRI<Rc<str>> to being IRI<Borrow<str>>.

This addresses a long standing problem; horned is currently only usuable in one thread, because of the Rc. This change means that IRI can wrap Rc but also Arc or even 'str.

The big disadvantage is that it means that I have to make all most every method generic. This is turning out to be a significant PITA because it breaks lots of the macros taht I have used both in model and in the IO libraries. Model I have managed to fix. The IO libraries, I think, are too hard so I will either have to unwind all the macros or turn them into proceedural ones. Either way, it's a big change and I'd like to get it right.

I am still debating whether this should be IRI<Borrow<str>> or the weaker IRI<AsRef<str>>. I think that the core model does not require Borrow<str>, but any form of caching does. But the tighter restriction could be added where needed.

Currently, Build also includes an Rc which means it cannot be used cross thread. Although I put the Rc into make build easy to clone and share, I actually seem to have forgotten about it, because all the IO libraries use &Build rather than just cloning it. So, rather than make this generic, I have simply removed the Rc although maintained the interior mutability. I think this makes sense. Making Build generic over Arc would be a pain and the multi-threaded the cost of co-ordination would probably outweigh the benefit of caching.

As I don't have a working version yet (because of all the IO changes) I've added an outline version of IRI and Build here.

https://github.com/phillord/horned-owl/tree/scratch/borrow-model

@althonos @jannahastings Thoughts welcome

I think this is a quite common problem in Rust, using String for storing something, then Rc<str> to reuse memory, then thinking how to allow multithreading with Arc. There are two strategies I can think of, I'm more leaning towards the former but you want to go with the latter. Ultimately this is a design choice I guess. But my questions are ultimately:

  • Who needs control over the storage type: you, because of the multithreaded parser, or the end user?
  • Are multiple storage types going to coexist in a library: will there be developers having
  • If so, how will the two co-operate? Can you convert from a SetOntology<Rc<str>> to a SetOntology<Arc<str>> (kinda, but you lose shared strings, or you need some kind of a map to convert from one to the other)

Regarding the approaches:

Feature-based

Most approaches I've seen take this path, using a dedicated feature for enabling the multithreading code, turning the interior Rc with Arc.

Pros:

  • You don't need to rewrite everything
  • Users not interested in multi-threading can still get the Rc implementation
  • Downstream users can also feature-gate

Cons:

  • You can only have a single storage type in a final compiled artifact
  • Not as much flexibility since the library developers must manually add new storage types (often limited to Rc<str> and Arc<str> but you may want to add more, e.g. ArcStr)

Generic-based

Some more feature-complete libraries chose this:

Pros:

  • Gives more flexibility in choosing the storage type (e.g. a user could choose to use smartstring without you having to support it explicitly)
  • Allows multiple storage types to coexist (à la nalgebra)

Cons:

  • Harder to use, unless a default type is specified
  • If a default type is specified and users don't change it, there's virtually no advantage over the feature-based option

@althonos That's interesting. I had not thought of the possibility of using features. Indeed that would be a very simple change -- just a couple of lines for IRI and Build based on devel at the moment, and just one line if I removed the Rc from Build which I think I should do anyway.

You are absolutely correct, the generic approach would be harder to use if for no other reason than all the generic bounds that require potentially long trait bounds. We can't type alias these to reduce typing (although there does seem to be a workaround)

I think this would likely mostly be for the library users. I guess people would either want Arc or Rc and not mix the two at any point. So, probably the feature would work for most use cases. I can think of one little use case. Tests would be a lot easier to write with 'str rather than Rc, but that hardly justifies the cost .In terms of converting between SetOntology<Rc<str>> and SetOntology<Arc<str>> I think would be possible using the conversion methods that I have already got, or with a small extension of them (i.e. turn into an iterator and do one axiom a time).

But, I don't know for sure. A multi-threaded parser for an XML closure would be very possible. For RDF, rather more of a pain to do but still possible. And I am aware of going to the simple approach to find out it was too simple later.

For my use case of bridging to Python, the option to have Arc rather than Rc everywhere, controllable by a flag, would be fantastic.

@jannahastings I know! That's my main use case.

I've never used the python binding tools, but if you have time to wrap this tiny version of the library up in python and see if it works that would be really great. Last thing I want it to move the whole lot, then find it fails for python.

My guess is for you that either the feature or generics would work equally well -- with the later you'd just need to change how you instantiate the Build to get an arc rather than rc. With the former, you'd fiddle with the build scripts. But it would be good to know for sure.

OK! I'll give it a try and let you know ... :-)

@phillord sorry for the delay!

I've now tried to implement this. As might be expected, I can't just use IRI<Borrow<str>> as the type for the generic IRI in the Python library, it complains that it wants a sized type at compile time. However, using IRI<Arc<str>> does seem to be accepted by the #pyclass annotation macro, at least at compile time. Beyond that, I can't really test the use in the library in context as that would require the transformation of the IRIs in horned-owl's model.

https://github.com/phillord/horned-owl/tree/scratch/iri-with-for-iri-trait

I have now implemented this. It is a rather depressingly massive diff (around 2000 changes). And it rather reduces the ergonomics of the library. (Although that is partly because of a problem in the design of the library that I didn't realise was there, so needs fixing anyway).

I will post more on this later, once I have had time to digest, but for now, I thought it was better to get the port out for comments.

@jannahastings Indeed, I would not have expected the generic to cross the python boundary. But it's really good to know that IRI<Arc<str>> works as expected.

At the moment, I have moved IRI but OntologyIndex is still defined in terms of Rc<AnnotatedAxiom> . I haven't worked out how to do that one yet.

@althonos still thinking about the possibility of using features. Do you have any examples of libraries that have done this?

And benches in place which means I can sanely compare Rc vs Arc. Short answer, the library is 10-20% faster with Rc.

test big_tree_arc       ... bench: 165,599,993 ns/iter (+/- 8,014,632)
test big_tree_rc        ... bench: 144,422,594 ns/iter (+/- 16,506,717)
test big_tree_string    ... bench: 230,809,716 ns/iter (+/- 32,193,460)

Interesting. The build.rs malarky in im-rs worries me a bit. It's evident that this is much simpler.

It wouldn't allow stuff like

fn big_tree_rc(bench: &mut Bencher) {
bench.iter(|| {
bigger_tree(Build::new_rc())
})
}

But then how often am I going to need to do this, other than in benching?

The build.rs in im-rs is because they are really doing some weird thing by releasing two versions of the library, one with reference counting (im-rc), one with direct references (im), while keeping the code shared between the two, hence the build.rs. Normally it's not needed for the feature approach to work.

To bench fastobo (or generally, the same library with different features) I use cargo benchcmp:

$ cargo bench --bench ms --no-default-features > sequential.bench
$ cargo bench --bench ms --features threading  > parallel.bench
$ cargo benchcmp sequential.bench parallel.bench
 name                     sequential.bench ns/iter  parallel.bench ns/iter  diff ns/iter  diff %  speedup 
 bench_sequential         53,699,698 (13 MB/s)      52,069,054 (14 MB/s)      -1,630,644  -3.04%   x 1.03

And the implementation is now complete including re-writing OntologyIndex.

https://github.com/phillord/horned-owl/tree/scratch/iri-with-for-iri-trait

Conclusions are: it's a very big change; that currently it reduces the ergonomics of the library somewhat, although that can be addressed to some extent with more type aliases and some more tweaking and regularity.

I haven't tried the feature solution but it would obviously be a lot simpler.

So the question has to be is the complexity worth it, both in terms of the changes now (which I've already done!) and future implementation. There is no doubt that the library is more functional with this, but is that functionality valuable enough. Something to reflect on for a while I guess.

And benchmarks which, perhaps surprisingly, actually make sense.

SetOntology (i.e. with no index, using Rc<str>) is fastest, then SetIndex (i.e. with an index, using two layers of Rc indirection), then SetOntology with Arc, then SetIIndex with Arc. Performance difference is 25% between best and worst. So significant, but not massive.

test a_thousand_classes        ... bench:   1,372,816 ns/iter (+/- 89,291)
test big_tree                  ... bench:  11,503,391 ns/iter (+/- 829,396)
test bigger_tree_arc           ... bench: 165,433,996 ns/iter (+/- 22,100,913)
test bigger_tree_rc            ... bench: 141,900,835 ns/iter (+/- 14,939,904)
test bigger_tree_set_index_arc ... bench: 179,722,862 ns/iter (+/- 10,148,303)
test bigger_tree_set_index_rc  ... bench: 159,411,837 ns/iter (+/- 17,930,706)
test bigger_tree_string        ... bench: 218,724,476 ns/iter (+/- 10,902,210)
test io_read                   ... bench:     275,290 ns/iter (+/- 15,189)

One thing that I hadn't thought of it is that this also means I can do this:

let o:OneIndexedOntology<Rc<str>, AnnotatedAxiom<Rc<str>>,_> = OneIndexedOntology::new(SetIndex::new());

Currently I have a SetOntology and a SetIndex. The only point of the former is to avoid the use of an Rc at all (which is a bit pointless in any single indexed ontology. Basically the same code twice. But with these generics in place, I can simply remove the Rc and store AnnotatedAxiom directly.

The benchmarks for this are below.

test bigger_tree_rc                               ... bench: 139,750,703 ns/iter (+/- 3,751,951)
test bigger_tree_set_index_annotated_axiom_rc_iri ... bench: 139,628,828 ns/iter (+/- 4,015,430)

Basically identical.

https://github.com/phillord/horned-owl/tree/scratch/rc-arc-conditional

And the really rather simpler conditional compilation solution.

I am still in the process of implementing the generics approach all the way through, especially the IRI-based index that the Python library needs. This looks much simpler! I will get back to you soon once I have both running.

There is no doubt that the conditional compile is more simple. It's obviously also less functional and I worry that if I go for the easy option I might need to re-implement it again in the future. I still haven't come up with a great use case for threading though (other than your python example), at least not one that requires the ability to switch it on and off. Threaded parsing might be an example, I guess.

Thanks for testing. The ergonomics of the generic version can for definately be improved; so please do write down any pain point!

@phillord I've been working on the implementation of the Python library around the feature-driven version, and noticed that the RDF reader and writer still use Rc everywhere, (not RcT) which means that I am getting asked to convert between Arc and Rc during processing. I'm not sure if this is what you intended? What do you think about trying to schedule a short meeting to discuss this and other questions in the next weeks?

@jannahastings Yep, looks like my "got conditional compilation working" was distinctly preliminary. Worse, Build in model.rs suffers from the same issue. This seems one of the issues with the conditional compilation solution -- it's not always so obvious what is happening in the code.

It's going to take me a few days to fix this (although PRs are of course welcome!).

Be happy to catch up in person, but github issues are probably not the best place to fix meeting schedules. Will email!

Wasn't too bad. Can you try again?

I am coming to the conclusion that the generic solution is probably the way forward. Despite being a PITA to implement, it's not too bad in usage, and I have already found it useful both internally to horned (with SetOntology) and also in testing (esp Build which pointlessly records IRIs but otherwise nothing).

If there are no shop stoppers from the python side, then this is the way I will go. @althonos I still agree with your point that most uses of the library will use either Rc<str> or Arc<str> and otherwise never convert between them. But conditional compliation just feels too messy.

Closed by d568520

Congrats @phillord on implementing this! The proposal for conditional compilation was intended to speed-up the switch from Rc to Arc, but with a fully generic implementation I agree that it's even easier to test and to integrate elsewhere. I'll update horned-visit and horned-functional when I get some time 😃

When I was half way through my second attempt, I did think of giving up and going conditional. But overall, I think it is better this way.

I am afraid I didn't know about horned-visit or I would have stolen it, rather than writing my own. We should do a compare and contrast and pick one. I wouldn't want horned-owl getting too big but there is no reason either to have stuff in lots of independent crates just because. I pulled in Janna's iri-mapped index for this reason and needed a visitor to implement that.