Grep and mutate (grepl)
Closed this issue · 2 comments
iferres commented
I may not be seeing the answer, but I think it is not possible right now. I want to create a new column based on the presence or absence (logical) of a substring inside a string column.
With grep I get the matches:
echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" |
csvtk -H grep -f1 -r -p "ipsum"
# loremipsum,232
# ipsum,214234
With mutate I create a new column with the substring, but if doesn't exists, it returns the whole column:
echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" |
csvtk -H mutate -f1 -p "(ipsum)" |
csvtk -H pretty
# lorem 1 lorem
# loremipsum 232 ipsum
# lorem 334 lorem
# ipsum 214234 ipsum
# foobar 4 foobar
Would be possible to ask if substring exists, and return it in a logical column? The expected output would be:
# lorem 1 false
# loremipsum 232 true
# lorem 334 false
# ipsum 214234 true
# foobar 4 false
Or the possibility of leaving blank (or NA/null) cells in case no matches are detected, something like:
# lorem 1 NA
# loremipsum 232 ipsum
# lorem 334 NA
# ipsum 214234 ipsum
# foobar 4 NA
or just leaving them empty.
shenwei356 commented
There's a --na
flag.
--na for unmatched data, use blank instead of original data
$ echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" \
| csvtk mutate -H -f1 -p "(ipsum)" --na \
| csvtk pretty -H
lorem 1
loremipsum 232 ipsum
lorem 334
ipsum 214234 ipsum
foobar 4
Well, if you'd like a NA.
$ echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" \
| csvtk mutate -H -f1 -p "(ipsum)" --na \
| csvtk replace -H -f3 -p '^$' -r NA \
| csvtk pretty -H
lorem 1 NA
loremipsum 232 ipsum
lorem 334 NA
ipsum 214234 ipsum
foobar 4 NA
iferres commented
Awesome, many thanks!