string identification
atstp opened this issue · 6 comments
use case
When I was putting together #20, I reached for `numeric?`, only to find out that it only matched digits (granted, it's clear in the docs).
tiny fix
When it comes to identifying strings, a `number?` function would be a nice tool, one that would match "55" as well as "-55.0", with the idea that this would mostly* work:
(= (cuerdas.core/number? "-55.0")
(clojure.core/number? (clojure.edn/read-string "-55.0")))
(*) I don't have a strong stance on `1N`, `1M`, and `1/3`.
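A minimal sketch of the comparison above, assuming JVM Clojure. `number-string?` and `edn-number?` are hypothetical helpers written for illustration only; they are not part of cuerdas, and the regex is just one possible reading of "matches "55" as well as "-55.0"".

```clojure
(require '[clojure.edn :as edn])

(defn number-string?
  "Hypothetical helper (not a cuerdas function): true when s is an
  optionally signed run of digits with an optional decimal part."
  [s]
  (and (string? s)
       (some? (re-matches #"[-+]?\d+(\.\d+)?" s))))

(defn edn-number?
  "True when the first form read from s by clojure.edn is a number.
  Reader errors are treated as 'not a number'."
  [s]
  (try
    (number? (edn/read-string s))
    (catch Exception _ false)))

;; The two predicates agree on the plain cases from the issue:
(= (number-string? "-55.0") (edn-number? "-55.0"))
(= (number-string? "55") (edn-number? "55"))
```

With this particular regex, `1N`, `1M`, and `1/3` are rejected by `number-string?` even though EDN reads them as numbers, which is where the "mostly" comes in.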
larger changes
This introduces awkwardness with `numeric?`, `alpha-numeric?`, and, by relation, `alpha?`. (Breaking change suggestion) In my opinion, they could be scrapped in favor of `digits?`, `letters?`, and `letters-and-digits?`, though the latter seems less useful and could probably just be dropped entirely.
At least based on a gut reaction, the familiar `alpha` and `numeric` names would leave a hole if they weren't around. To fill that gap, `posix-` prefixed functions would serve well: `posix-alphas?`, `posix-alnums?`, `posix-blanks?`, etc. would all provide the stable, well-known role that they exist for, leaving cuerdas to provide concisely-named, modern equivalents like it already does with `words`, which breaks tradition by including "-".
A bonus with `posix-` prefixed functions is that the explicit "old" naming makes cuerdas' modern equivalents expected.
Here are some tests that would pass with what I'm suggesting:
;; `are` comes from clojure.test; the predicates are the proposed functions
(are [tst val] (tst val)
  number? "-99.8"
  digits? "99"
  letters? "abcde"
  word? "this-that"
  word? "This_that"
  posix-alphas? "aBc"
  posix-digits? "12345"
  posix-alnums? "abc123"
  posix-word? "This_That")
I'm up for putting most (all) of the work in on this if it's likely to get accepted. Thanks!
I agree with almost all of the proposed changes, including the breaking ones (though functions that are not in conflict could perhaps be marked as deprecated first).
This is my list of doubts:
- Why the `posix-` prefixed functions? I don't clearly understand their purpose.
- I miss a function that replaces `alpha-numeric?`, because it is pretty useful.
Feel free to work on it; I'm glad to have new and better predicates for identifying strings.
the `posix-` prefix
You're right, `posix` is a bad prefix. Sorry to cloud the idea with a bad name. Generally, `<prefix>-` would provide matchers that match ascii/unicode-latin:
- `<prefix>-<just alpha>?`: `[a-zA-Z]`, in comparison to generic unicode "letters"
- `<prefix>-<alpha and numeric>?`: characters like `[a-zA-Z0-9]`/`[:alnum:]`, for product numbers and such
- `<prefix>-word?`: the unfortunately common (`[a-zA-Z0-9_]`, `\w`, or `[:word:]`) meaning of "word"
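The three matchers above could be sketched roughly as follows, assuming JVM Clojure (where regex literals compile to `java.util.regex.Pattern`). The `ascii-` names and `unicode-letters?` are hypothetical, chosen only for illustration; no prefix has been settled on in this thread.

```clojure
;; Hypothetical names; not part of cuerdas.
(defn ascii-alpha?
  "ASCII letters only."
  [s]
  (some? (re-matches #"[a-zA-Z]+" s)))

(defn ascii-alnum?
  "ASCII letters and digits, e.g. for product numbers."
  [s]
  (some? (re-matches #"[a-zA-Z0-9]+" s)))

(defn ascii-word?
  "The traditional \\w meaning of 'word': ASCII letters, digits, underscore."
  [s]
  (some? (re-matches #"[a-zA-Z0-9_]+" s)))

(defn unicode-letters?
  "Any Unicode letter, via the \\p{L} category (JVM regex)."
  [s]
  (some? (re-matches #"\p{L}+" s)))

;; "\u00f1and\u00fa" spells a word containing n-with-tilde and u-with-acute.
(ascii-alpha? "\u00f1and\u00fa")     ; rejected: not plain ASCII
(unicode-letters? "\u00f1and\u00fa") ; accepted: all Unicode letters
```

The contrast between the last two calls is the point of the prefix: the prefixed functions stay fixed to the traditional ASCII ranges, while the unprefixed ones are free to follow Unicode.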
While they aren't going to change the world, they fill a common need. `simple-`, `latin-`, or `traditional-` could work as well. Generally, they serve enough purpose to be useful, but not enough to nudge out the more useful `word?` and `letter?`.
This would allow `letter?` and `word?` to adapt as Java and JavaScript support unicode better. Perhaps this is for another issue, but `letter?` and `word?` could match based on letters for the locale or any unicode language.
`alpha-numeric?` replacement
Yeah, would `letters-and-numbers?` work? Should it respect locale?
Aha, it seems I'm starting to understand. If I understand it correctly, your proposal is to have the "standard" or "traditional" behavior prefixed with `posix` or `traditional`, and to have the unprefixed functions improve on the traditional approach.
I'm pretty convinced by the approach; feel free to make a PR with that and we will make the final adjustments on the working code. Thank you very much for taking care of this!
We only need to choose the final prefix. For now use `posix`; now that I understand the motivation behind the name, it makes sense to me.
great, glad to hear it!
After analyzing the situation, I have taken a slightly different approach. I haven't done the renames because most of the names are already obvious:
`alpha?` is alpha independently of unicode, so it can be kept as is. `alnum?` is always alpha + num, so it does not need to be renamed. Other functions, such as `letters?` and `word?`, are unicode aware without any specific prefix.
In fact, `word?` is a unicode aware alternative to `alnum?`.
With these changes I can consider this issue fixed ;)