google/rust_icu

Thread safety

nkovacs opened this issue · 5 comments

According to https://unicode-org.github.io/icu/userguide/icu/design.html#thread-safe-const-apis, using the methods that take a const this pointer is thread safe.

In my case I'd like to use UCollator across multiple threads, which should be safe since ucol_strcoll takes a const pointer, but I can't because it's not Sync+Send. I've looked at the source of rust_icu_ucol, and all the methods that take a non-const pointer in C are &mut self in Rust, except for set_attribute, which is generated by the generalized_fallible_setter macro. If generalized_fallible_setter is fixed to generate &mut self, would it be safe to mark UCollator (and other service objects, assuming the same is true for their methods) as Sync (edit: and Send)?

I think it definitely makes conceptual sense to provide Send if possible. For that to happen we'd also need to test the change in a multithreaded context.

Similar issue here - having UCollator !Sync !Send is bit limiting, if it is thread safe according to ICU doc (question is from which version).
Most efficient for me would be to share it across threads, rather then recreating in each thread.
!Sync and !Send are auto implemented because of NonNull pointer in UCollator I think, so need to unsafe implemented them , if we are sure about thread safety.

You can use this wrapper to work around it, create an UCollator, set it up, then convert it to the wrapper and share it.

use rust_icu_common as common;
use rust_icu_ucol as ucol;
use rust_icu_ustring as ustring;
use std::cmp::Ordering;

/// Collator is a thread-safe version of UCollator.
/// It achieves this by only allowing access to UCollator's const this methods,
/// which are thread safe: https://unicode-org.github.io/icu/userguide/icu/design.html#thread-safe-const-apis
pub struct Collator(ucol::UCollator);

// SAFETY: Collator only exposes methods that take a const this, and those are thread-safe: https://unicode-org.github.io/icu/userguide/icu/design.html#thread-safe-const-apis
unsafe impl std::marker::Sync for Collator {}
unsafe impl std::marker::Send for Collator {}

impl From<ucol::UCollator> for Collator {
    fn from(src: ucol::UCollator) -> Self {
        Self(src)
    }
}

impl Collator {
    /// Compares strings `first` and `second` according to the collation rules in this collator.
    ///
    /// Returns [Ordering::Less] if `first` compares as less than `second`, and for other return
    /// codes respectively.
    ///
    /// Implements `ucol_strcoll`
    pub fn strcoll(&self, first: &ustring::UChar, second: &ustring::UChar) -> Ordering {
        self.0.strcoll(first, second)
    }

    /// Compares strings `first` and `second` according to the collation rules in this collator.
    ///
    /// Returns [Ordering::Less] if `first` compares as less than `second`, and for other return
    /// codes respectively.
    ///
    /// In contrast to [UCollator::strcoll], this function requires no string conversions to
    /// compare two rust strings.
    ///
    /// Implements `ucol_strcoll`
    /// Implements `ucol_strcollUTF8`
    pub fn strcoll_utf8(
        &self,
        first: impl AsRef<str>,
        second: impl AsRef<str>,
    ) -> Result<Ordering, common::Error> {
        self.0.strcoll_utf8(first, second)
    }
}

The current version of UCollator is not thread safe because generalized_fallible_setter generates a &self receiver instead of &mut self. This would be a breaking change.

@nkovacs Thanks a lot I already used similar wrapper - I'm interested just in strcoll_utf8 method so it's simpler.
Is it save for all supported ICU versions? I'm on 66 on Ubuntu 20.04 where it looks fine, but just wondering if works universally.

I'm not sure. https://unicode-org.atlassian.net/browse/ICU-8202 says that all const methods should be thread safe, and that's marked as 50.1.