shenwei356/csvtk

Grep and mutate (grepl)

Closed this issue · 2 comments

I may not be seeing the answer, but I think it is not possible right now. I want to create a new column based on the presence or absence (logical) of a substring inside a string column.
With grep I get the matches:

echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" |
   csvtk -H grep -f1 -r -p  "ipsum"  
# loremipsum,232
# ipsum,214234

With mutate I create a new column with the substring, but if doesn't exists, it returns the whole column:

echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4" | 
   csvtk -H mutate -f1 -p  "(ipsum)" | 
   csvtk -H pretty
# lorem        1        lorem 
# loremipsum   232      ipsum 
# lorem        334      lorem 
# ipsum        214234   ipsum 
# foobar       4        foobar

Would be possible to ask if substring exists, and return it in a logical column? The expected output would be:

# lorem        1        false 
# loremipsum   232      true 
# lorem        334      false 
# ipsum        214234   true 
# foobar       4        false

Or the possibility of leaving blank (or NA/null) cells in case no matches are detected, something like:

# lorem        1        NA 
# loremipsum   232      ipsum 
# lorem        334      NA 
# ipsum        214234   ipsum 
# foobar       4        NA

or just leaving them empty.

There's a --na flag.

      --na               for unmatched data, use blank instead of original data
$ echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4"  \
    | csvtk mutate -H -f1 -p "(ipsum)" --na \
    | csvtk pretty -H
lorem        1             
loremipsum   232      ipsum
lorem        334           
ipsum        214234   ipsum
foobar       4

Well, if you'd like a NA.

$ echo -e "lorem,1\nloremipsum,232\nlorem,334\nipsum,214234\nfoobar,4"  \
    | csvtk mutate -H -f1 -p "(ipsum)" --na \
    | csvtk replace -H -f3 -p '^$' -r NA \
    | csvtk pretty -H
lorem        1        NA   
loremipsum   232      ipsum
lorem        334      NA   
ipsum        214234   ipsum
foobar       4        NA 

Awesome, many thanks!