`get_sentences` no longer works on dataframe after commit `645401e`
Closed this issue · 4 comments
Hi,
I love this package! I've been using it in regular sentiment analysis and had previously set up something like the example:
presidential_debates_2012 %>%
    get_sentences()
This used to produce a dataframe with all columns retained plus the individual sentences, which I could then run `sentiment()` on. This no longer works, and I'm struggling to find a way to get sentiment at the sentence level while preserving all of the remaining data in the original dataframe.
Here is a reprex for what I'm working through:
library(sentimentr)
library(magrittr)
library(dplyr)
x <- tibble(text = c('I like data a lot.',
                     'sentimentr is great for sentiment analysis. I use it a lot.'),
            rating = c(5, 4))
x
# Does not preserve `rating` column and is not at sentence level:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence)
# Preserves `rating` column, but does not allow for sentence-level sentiment:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence, list(rating))
# Allows for sentence-level sentiment, but does not preserve `rating` column:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment(sentence)
I've also tried a combination of `get_sentences()` + `tidyr::unnest()` and then using `bind_cols()` on a separate dataset built using `sentiment()`. This works most of the time, but fails when, for whatever reason, `sentiment()` produces more rows than its supporting dataframe (produced by `get_sentences()` + `tidyr::unnest()`).
There's a lot of information here. It sounds like the problem is that you can't extract the sentences from a dataframe like you used to be able to, as shown below:
library(sentimentr)
library(tidyverse)
presidential_debates_2012 %>%
    sentimentr:::get_sentences()
## Error in get_sentences.data.frame(.) : object 'text.var' not found
Can you confirm this is the error you are getting?
I think this has been fixed now. Can you re-try?
If you want to keep the original text plus the new text, here is one way to do it in the vein you were trying (not the most efficient):
library(sentimentr)
library(magrittr)
library(dplyr)
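x <- tibble(text = c('I like data a lot.',
                     'sentimentr is great for sentiment analysis. I use it a lot.'),
            rating = c(5, 4))
## helper function to return sentence-level results (text and scores)
sentiment_sentences <- function(x){
    sents <- get_sentences(x)
    ## one row per sentence: the sentence text alongside its sentiment() scores
    bind_cols(sentences = unlist(sents), sentiment(sents))
}
sentiment_sentences(x$text)
## group by every column so each original row is scored on its own;
## across() needs explicit columns in current dplyr, hence everything()
x %>%
    group_by(across(everything())) %>%
    ## note: in dplyr >= 1.1.0, reframe() is the recommended replacement for
    ## summarize() when a group returns more than one row
    summarize(
        sentiment_sentences(text)
    )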
Which yields:
text rating id sentences element_id sentence_id word_count sentiment
<chr> <dbl> <int> <chr> <int> <int> <int> <dbl>
1 I like data a lot. 5 1 I like data a lot. 1 1 5 0.224
2 sentimentr is great for sentiment analysis. I use it a lot. 4 2 sentimentr is great for sentiment analysis. 1 1 6 0.204
3 sentimentr is great for sentiment analysis. I use it a lot. 4 2 I use it a lot. 1 2 5 0
I'd probably approach this more as an aggregation-and-rejoin problem, which I'd guess is much faster since it isn't operating row by row:
x$id <- seq_len(nrow(x))
x %>%
    get_sentences() %>%
    sentiment() %>%
    left_join(x %>% select(original = text, id), by = 'id') %>%
    relocate(original, .before = text)
Which yields:
original text rating id element_id sentence_id word_count sentiment
1: I like data a lot. I like data a lot. 5 1 1 1 5 0.2236068
2: sentimentr is great for sentiment analysis. I use it a lot. sentimentr is great for sentiment analysis. 4 2 2 1 6 0.2041241
3: sentimentr is great for sentiment analysis. I use it a lot.
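On the `bind_cols()` row mismatch mentioned in the original post: `bind_cols()` matches rows by position, so it breaks as soon as the two tables differ in length. Joining on the `element_id` and `sentence_id` keys avoids that. The following is only a minimal sketch using dplyr alone; the `sentences` and `scores` tibbles are hypothetical stand-ins (not real `get_sentences()` or `sentiment()` output), with an extra row added to `scores` on purpose to mimic the mismatch.

```r
library(dplyr)

# Hypothetical stand-in for the get_sentences() + tidyr::unnest() result
sentences <- tibble(
  element_id  = c(1L, 2L, 2L),
  sentence_id = c(1L, 1L, 2L),
  sentence    = c("I like data a lot.",
                  "sentimentr is great for sentiment analysis.",
                  "I use it a lot."),
  rating      = c(5, 4, 4)
)

# Hypothetical stand-in for sentiment() output, deliberately one row longer
scores <- tibble(
  element_id  = c(1L, 2L, 2L, 2L),
  sentence_id = c(1L, 1L, 2L, 3L),
  sentiment   = c(0.224, 0.204, 0, NA)
)

# bind_cols(sentences, scores) would fail here because the row counts differ.
# A keyed join lines rows up by element/sentence instead of by position;
# score rows with no matching sentence get NA in the joined columns.
joined <- left_join(scores, sentences, by = c("element_id", "sentence_id"))
joined
```

Rows that have no counterpart show up with `NA` in `sentence` and `rating`, which makes the mismatch easy to inspect rather than silently misaligning the data.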
Closing as there is no response from the OP.