dash character and -a option
aborruso opened this issue · 9 comments
Hi,
I have these two input files:
# input_01.csv
Name,Age
Andy,32
Mary-Jane,43
# input_02.csv
Name,City
Andy,Rome
Mary Jane,New York
If I run
csvmatch -i -a -n input_01.csv input_02.csv --fields1 "Name" --fields2 "Name"
I get
Name,Name
Andy,Andy
Using -a, "Mary-Jane" should be equal to "Mary Jane". Isn't the dash a non-alphanumeric character?
Thank you
If I create this rule.txt file
$
and run
csvmatch -i -a -n -l rule.txt input_01.csv input_02.csv --fields1 "Name" --fields2 "Name"
I also get Mary Jane
Name,Name
Andy,Andy
Mary-Jane,Mary Jane
But this makes no sense to me, because I have added only a $, a white space at the end of the sentence.
Thank you
Ok, I'm not sure if this is a bug or just something that isn't clear in the documentation.
The reason it happens is that 'ignoring' nonalphanumeric characters means they are removed, so Mary-Jane becomes MaryJane, which doesn't match Mary Jane. A workaround would be to use something like Levenshtein, which even with a high threshold like 85% should produce a match.
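To make that concrete, here is a rough Python sketch of the behaviour described above. It is not csvmatch's actual code: the regular expression (which keeps whitespace, as this thread implies), the helper names, and the similarity formula are all assumptions for illustration.

```python
import re

def strip_nonalphanumeric(value):
    # Assumption: remove everything except word characters (letters, digits,
    # underscore) and whitespace. Removed characters are not replaced.
    return re.sub(r'[^\w\s]', '', value)

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity(a, b):
    # Assumption: similarity is 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

left = strip_nonalphanumeric('Mary-Jane').lower()   # hyphen removed
right = strip_nonalphanumeric('Mary Jane').lower()  # space kept

print(left == right)                    # exact comparison fails
print(similarity(left, right) >= 0.85)  # fuzzy comparison passes
```

Under these assumptions the two values differ by a single inserted space, so the edit distance is 1 over 9 characters, which clears an 85% threshold.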
One option to resolve problems like this would be to replace nonalphanumerics with spaces instead of removing them. Of course with cases like Lastname, Firstname you'd end up with two spaces, but with a flag to ignore repeating whitespace characters (such as your suggestion in #29) it could work quite well.
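A minimal sketch of that idea in Python, purely as an illustration of the proposal and not of how csvmatch implements anything. The function name `normalise` and both regular expressions are hypothetical.

```python
import re

def normalise(value):
    # Step 1 (assumed): replace each nonalphanumeric, non-whitespace
    # character with a space instead of removing it.
    spaced = re.sub(r'[^\w\s]', ' ', value)
    # Step 2 (assumed): collapse runs of whitespace into a single space
    # and trim the ends.
    return re.sub(r'\s+', ' ', spaced).strip()

print(normalise('Mary-Jane') == normalise('Mary Jane'))
print(normalise('Lastname, Firstname'))
```

With both steps applied, the hyphenated and spaced forms normalise to the same string, and the comma case no longer leaves a double space.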
Hi @maxharlow thank you.
But why does it work with $ in the -l file? Why does "Mary Jane" match "MaryJane"?
Afraid I wasn't able to replicate that. Would you mind checking again?
@maxharlow look here http://youtu.be/cUfAunJnUuU?hd=1
My files are
# rule.txt
$
# input_01.csv
Name,Age
Andy,32
Mary-Jane,43
Andrè,50
# input_02.csv
Name,City
Andy,Rome
Mary Jane,New York
Andre',Palermo
How odd. I've tried it with files exactly the same as yours. A single space in rule.txt would make sense, as that would remove the space from the second file, and the -a would remove the hyphen from the first. But with the $, it shouldn't match, and I don't know why it does for you.
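To illustrate the difference between the two rules, here is a small Python sketch. It assumes each line of the -l file is treated as a regular expression whose matches are removed from the values, and that -a strips nonalphanumerics but keeps whitespace. Both helpers are hypothetical stand-ins, not csvmatch's own functions.

```python
import re

def strip_nonalphanumeric(value):
    # Assumed behaviour of -a: drop punctuation, keep whitespace.
    return re.sub(r'[^\w\s]', '', value)

def apply_ignore_list(value, patterns):
    # Assumed behaviour of -l: remove every match of each pattern.
    for pattern in patterns:
        value = re.sub(pattern, '', value)
    return value

first = strip_nonalphanumeric('Mary-Jane')
second = strip_nonalphanumeric('Mary Jane')

with_space_rule = apply_ignore_list(second, [' '])   # rule.txt holds one space
with_dollar_rule = apply_ignore_list(second, ['$'])  # rule.txt holds only $

print(first == with_space_rule)   # space removed, so the values agree
print(first == with_dollar_rule)  # $ matches only an empty position
```

Under these assumptions the $ pattern matches the empty position at the end of the string, so removing it changes nothing and the two names still differ.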
OK @maxharlow, I'm closing this; I have had a good answer to my question.
Thank you
As of v1.19 this should now work as you originally expected.
Wow, I'm very proud of it too :)