huggingface/tokenizers

Cross-compilation fails for custom target

semaraugusto opened this issue · 1 comments

I'm compiling tokenizers for a custom target that is neither x86 nor unstable_wasm. I cannot use oniguruma on this target, but I could use the fancy-regex library instead.

The solution would be as simple as changing the cfg gates in tokenizers/src/utils/mod.rs from

#[cfg(feature = "unstable_wasm")]
mod fancy;
#[cfg(feature = "unstable_wasm")]
pub use fancy::SysRegex;
#[cfg(not(feature = "unstable_wasm"))]
mod onig;
#[cfg(not(feature = "unstable_wasm"))]
pub use crate::utils::onig::SysRegex;

to

#[cfg(feature = "fancy-regex")]
mod fancy;
#[cfg(feature = "fancy-regex")]
pub use fancy::SysRegex;
#[cfg(feature = "onig")]
mod onig;
#[cfg(feature = "onig")]
pub use crate::utils::onig::SysRegex;

Since onig is enabled by default and unstable_wasm "requires" fancy-regex, this change should not break anything that I'm aware of.
It would only break if users enabled neither of the regex features (no SysRegex would be exported) or enabled both at once (SysRegex would be imported twice under the same name).
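
For illustration, here is a minimal sketch of one way the gates could tolerate both edge cases. It is not what the pull request does. It assumes onig takes precedence when both features are enabled, and the module names `onig_backend`, `fancy_backend`, and `no_backend`, the `NAME` constant, and `backend_name` are hypothetical stand-ins for the real `onig` and `fancy` modules.

```rust
// Hypothetical stand-in for the real `onig` module.
#[cfg(feature = "onig")]
mod onig_backend {
    pub struct SysRegex;
    impl SysRegex {
        pub const NAME: &'static str = "onig";
    }
}

// Hypothetical stand-in for the real `fancy` module.
// Only compiled when `onig` is NOT enabled, so the two backends can
// never both export `SysRegex` (assumed precedence: onig wins).
#[cfg(all(feature = "fancy-regex", not(feature = "onig")))]
mod fancy_backend {
    pub struct SysRegex;
    impl SysRegex {
        pub const NAME: &'static str = "fancy-regex";
    }
}

// Fallback when neither feature is enabled. A real crate would more
// likely emit `compile_error!("enable `onig` or `fancy-regex`")` here;
// a stub is used so this sketch compiles standalone.
#[cfg(not(any(feature = "onig", feature = "fancy-regex")))]
mod no_backend {
    pub struct SysRegex;
    impl SysRegex {
        pub const NAME: &'static str = "none";
    }
}

// Exactly one of these re-exports is active for any feature combination.
#[cfg(feature = "onig")]
pub use onig_backend::SysRegex;
#[cfg(all(feature = "fancy-regex", not(feature = "onig")))]
pub use fancy_backend::SysRegex;
#[cfg(not(any(feature = "onig", feature = "fancy-regex")))]
pub use no_backend::SysRegex;

/// Reports which backend the cfg gates selected at compile time.
pub fn backend_name() -> &'static str {
    SysRegex::NAME
}
```

Because the three cfg predicates are mutually exclusive and cover every combination, `SysRegex` is always defined exactly once. Whether silent precedence is preferable to a hard compile error when both features are set is a design choice for the maintainers.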

This change would also make it easier to switch between regex implementations and provide new ones in the future.

I'm opening a pull request with these changes. Let me know if there are any issues with it.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.