trinker/sentimentr

`get_sentences` no longer works on dataframe after commit `645401e`


Hi,

I love this package! I've been using it in regular sentiment analysis and had previously set up something like the example:

presidential_debates_2012 %>%
    get_sentences()

This used to produce a dataframe with all columns retained and the individual sentences that I could then run sentiment() on. This no longer works, and I'm struggling to find a way to get sentiment on a sentence level while preserving all of the remaining data in the original dataframe.

Here is a reprex for what I'm working through:

library(sentimentr)
library(magrittr)
library(dplyr)

x <- tibble(text = c('I like data a lot.', 
                     'sentimentr is great for sentiment analysis.  I use it a lot.'),
            rating = c(5, 4))
x

# Does not preserve `rating` column and is not at sentence level:
x %>%
 mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence)

# Preserves `rating` column, but does not allow for sentence-level sentiment:
x %>%
 mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence, list(rating))

# Allows for sentence-level sentiment, but does not preserve `rating` column
x %>%
  mutate(sentence = get_sentences(text)) %$%
    sentiment(sentence)

I've also tried combining get_sentences() with tidyr::unnest() and then using bind_cols() to attach a separate dataset built with sentiment(). This works most of the time, but fails when, for whatever reason, sentiment() produces more rows than the dataframe it is being bound to (the one produced by get_sentences() + tidyr::unnest()).
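The row-count mismatch described above comes from binding columns by position. One way to sidestep it is to join on a key instead. The following is only a minimal sketch, assuming that the `element_id` column returned by `sentiment()` is the position of each text in the input vector (as the output later in this thread suggests). The names `scores`, `keyed`, and `result` are made up for illustration.

```r
library(sentimentr)
library(dplyr)

x <- tibble(
    text = c('I like data a lot.',
             'sentimentr is great for sentiment analysis.  I use it a lot.'),
    rating = c(5, 4)
)

# sentiment() on the split text returns one row per sentence.
# Assumption: element_id is the index of the source text in x$text.
scores <- as_tibble(sentiment(get_sentences(x$text)))

# Give each original row the same positional key.
keyed <- x %>%
    mutate(element_id = row_number())

# Join on the key rather than binding columns by position, so the two
# tables do not need to have the same number of rows.
result <- scores %>%
    left_join(keyed, by = 'element_id')

result
```

Because every sentence row looks up its parent row by `element_id`, `rating` and any other columns are repeated for each sentence of the same text, regardless of how many sentences a text is split into.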

There's a lot of information here. It sounds like the problem is that you can't extract the sentences from a dataframe like you used to be able to, as shown below:

library(sentimentr)
library(tidyverse)

presidential_debates_2012 %>%
    sentimentr:::get_sentences()

## Error in get_sentences.data.frame(.) : object 'text.var' not found

Can you confirm this is the error you are getting?

I think this has been fixed now. Can you re-try?

If you want to keep the original text plus the new text, here is one way to do it in the vein you were trying (not the most efficient):

library(sentimentr)
library(magrittr)
library(dplyr)

x <- tibble(text = c('I like data a lot.', 
                     'sentimentr is great for sentiment analysis.  I use it a lot.'),
            rating = c(5, 4))

## helper function to return sentence-level results (text and scores)

sentiment_sentences <- function(x){

    sents <- get_sentences(x)
    bind_cols(sentences = unlist(sents), sentiment(sents))

}

sentiment_sentences(x$text)


x %>%
    group_by(across(everything())) %>%
    summarize(
        sentiment_sentences(text)
    ) 

Which yields:

  text                                                         rating    id sentences                                   element_id sentence_id word_count sentiment
  <chr>                                                         <dbl> <int> <chr>                                            <int>       <int>      <int>     <dbl>
1 I like data a lot.                                                5     1 I like data a lot.                                   1           1          5     0.224
2 sentimentr is great for sentiment analysis.  I use it a lot.      4     2 sentimentr is great for sentiment analysis.          1           1          6     0.204
3 sentimentr is great for sentiment analysis.  I use it a lot.      4     2 I use it a lot.                                      1           2          5     0    

I'd probably approach this more as an aggregation-and-rejoin problem, and I'm guessing it's much faster since it's not working rowwise:

x$id <- seq_len(nrow(x))
x %>%
    get_sentences() %>%
    sentiment() %>%
    left_join(x %>% select(original = text, id), by = 'id') %>%
    relocate(original, .before = text)

Which yields:

                                                       original                                        text rating id element_id sentence_id word_count sentiment
1:                                           I like data a lot.                          I like data a lot.      5  1          1           1          5 0.2236068
2: sentimentr is great for sentiment analysis.  I use it a lot. sentimentr is great for sentiment analysis.      4  2          2           1          6 0.2041241
3: sentimentr is great for sentiment analysis.  I use it a lot.                   

Closing as there is no response from the OP.