Extract numbers with digit range
bes827 opened this issue · 2 comments
Hello
This package has been very helpful and efficient; I am able to achieve a lot with much less code than with stringr or other packages. I am trying to perform the task below, which is a bit complex:
I have a large number of text files that I converted to a data frame, and I am now trying to extract a particular number (a serial number; the only unique pattern is that it is the first number with 5-8 digits).
I have tried a couple of approaches, including the sapply call you posted in a previous question, but with no luck so far. The issues I am running into are:
- I can't find the regex for a digit count in a range (5-8); the example below only matches exactly 5 digits.
- I believe my code does not search text after a newline; is there a way to fix that?
thanks a lot
# Create a sample data frame:
text = c("name: xyz, abc age: 23, serial: 12345, dob: 1/1/2011, other: 0000000" , "name: aaa, bbb
age: 21, serial: 123456, DOB: 1/2/1234", "name: ccc, ddd
age:42
number: 1234567
dob: 1/1/111")
df <- data.frame(text)
# Attempt to extract the number into a new variable:
library(qdapRegex)
df$serial = sapply(qdapRegex::rm_number(df$text, pattern = "(?<!\\d)\\d{5}(?!\\d)", extract=TRUE) , `[`, 1)
df$serial
> I can't find the regex for a digit count in a range (5-8); the example below only matches exactly 5 digits.
\\d{5,8} says 5 to 8 digits.
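A minimal sketch of the full extraction with the 5-8 digit range, assuming base R's `regexpr()`/`regmatches()` with `perl = TRUE` instead of `qdapRegex::rm_number()` (swapped in here only to keep the example dependency-free; the same pattern can presumably replace `\\d{5}` in the `rm_number` call above). The sample strings use `"\n"` in place of the literal line breaks, and `stringsAsFactors = FALSE` is an added assumption so the column is character on older R versions. `regexpr()` returns only the first match per string, which matches the "first occurrence" requirement.

```r
# Sample data from the question; "\n" stands in for the literal line breaks.
text <- c(
  "name: xyz, abc age: 23, serial: 12345, dob: 1/1/2011, other: 0000000",
  "name: aaa, bbb\nage: 21, serial: 123456, DOB: 1/2/1234",
  "name: ccc, ddd\nage:42\nnumber: 1234567\ndob: 1/1/111"
)
df <- data.frame(text, stringsAsFactors = FALSE)

# \\d{5,8} accepts 5 to 8 digits; the lookarounds stop it from matching
# a 5-8 digit slice of a longer run of digits.
pattern <- "(?<!\\d)\\d{5,8}(?!\\d)"

# regexpr() finds only the first match in each string (-1 if there is none).
m <- regexpr(pattern, df$text, perl = TRUE)

# Start with NA, then fill in the matched substrings for rows that matched.
df$serial <- NA_character_
df$serial[m != -1] <- regmatches(df$text, m)
df$serial
```

Rows with no 5-8 digit number keep `NA` in `serial`.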
> I believe my code does not search text after a newline; is there a way to fix that?
Not sure what you mean. Can you provide an example that fails, along with your desired output?
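One possible reading of the newline concern, sketched here as an assumption since the OP never clarified: `\\d` and lookarounds are unaffected by line breaks, so the digit pattern finds a number on any line. What does stop at a line break, with `perl = TRUE`, is `.`, unless the `(?s)` ("dotall") flag is set. The string below is the third sample from the question with `"\n"` for the line breaks.

```r
x <- "name: ccc, ddd\nage:42\nnumber: 1234567"

# The digit pattern finds the number even though it sits on the third line.
found <- regmatches(x, regexpr("(?<!\\d)\\d{5,8}(?!\\d)", x, perl = TRUE))

# With perl = TRUE, "." does not match "\n" by default, so this fails...
plain_dot <- grepl("name:.*number", x, perl = TRUE)

# ...while the (?s) flag lets "." match "\n" as well.
dotall <- grepl("(?s)name:.*number", x, perl = TRUE)

print(found)
print(c(plain_dot = plain_dot, dotall = dotall))
```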
Closing because the OP never responded to the request for clarification.