ivan-rivera/RedditExtractor

Error in data.frame(url = strip_json(request_url), author = extract_comments_attributes(json,: arguments imply differing number of rows: 1, 0

Closed this issue · 5 comments

Hello,

I get the error message below when crawling some Reddit forums (e.g. https://www.reddit.com/r/sports/, https://www.reddit.com/r/COVID19_support/) with the following code. I am unsure why the code throws an error on some, but not all, Reddit forums. What have I missed? Any help will be much appreciated!

Error in data.frame(url = strip_json(request_url), author = extract_comments_attributes(json, :
arguments imply differing number of rows: 1, 0

top_urls <- find_thread_urls(subreddit="sports", sort_by="top")
threads_contents <- get_thread_content(top_urls$url)

Hi Phoebe, thanks again for reporting this.

I'm afraid I cannot reproduce the problem. When I run these two commands, threads_contents is created successfully. I'm guessing that we might be getting slightly different URLs from the Reddit API. Are you able to narrow this problem down to a single URL?
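
One way to narrow it down is to fetch the threads one URL at a time and record which ones fail. The following is only a minimal sketch: `find_failing_urls` is a hypothetical helper (not part of RedditExtractoR), and it assumes that `get_thread_content()` accepts a single URL, as the call above suggests.

```r
# Hypothetical helper (not part of RedditExtractoR): call `fetch` on each URL
# separately and return the URLs for which it raised an error.
find_failing_urls <- function(urls, fetch) {
  failed <- vapply(urls, function(u) {
    tryCatch({
      fetch(u)
      FALSE                      # fetched without error
    }, error = function(e) {
      message("Failed: ", u, " (", conditionMessage(e), ")")
      TRUE                       # remember this URL as failing
    })
  }, logical(1))
  unname(urls[failed])
}

# Usage, assuming RedditExtractoR is loaded and `top_urls` exists:
# bad_urls <- find_failing_urls(top_urls$url, get_thread_content)
```

Passing the fetch function as an argument keeps the helper independent of the package, so it can be tried with a stand-in function before hitting the Reddit API.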

Some URLs that triggered the error message:
https://www.reddit.com/r/sports/comments/qgblmv/braves_vs_astros_world_series_pits_father_versus/
https://www.reddit.com/r/sports/comments/qmu3qr/buffalo_sabres_agree_to_deal_jack_eichel_to_vegas/
https://www.reddit.com/r/sports/comments/qc4yk4/sports_everything_you_need_to_know_about_the/
https://www.reddit.com/r/sports/comments/qd3h9r/chairs_maloney_and_krishnamoorthi_launch/
https://www.reddit.com/r/sports/comments/qblwzq/mlb_qualifying_offer_value_drops_by_500000_to_184/
https://www.reddit.com/r/sports/comments/qkbbxp/coach_gary_patterson_out_at_tcu_after_20_years/
https://www.reddit.com/r/sports/comments/qgneuw/solskjær_is_staying/
https://www.reddit.com/r/sports/comments/qcpvym/fifa_says_57_more_refugees_evacuated_from/
https://www.reddit.com/r/sports/comments/qn7pzx/hutton_resigns_as_yorkshire_chairman/
https://www.reddit.com/r/sports/comments/qdnnqt/that_solidified_me_being_brave_simone_biles_most/
https://www.reddit.com/r/sports/comments/qhitdb/gladbach_crowd_partying_with_the_players_after/
https://www.reddit.com/r/sports/comments/qjhk7n/new_research_on_concussions_recovery_finds/
https://www.reddit.com/r/sports/comments/qb1bk0/espn_broadcaster_dick_vitale_reveals_lymphoma/

https://www.reddit.com/r/COVID19_support/comments/q5ghzt/irregular_sticky_october_2021_new_request_for/

I'm running:

  • RStudio 2021.09.0+351 "Ghost Orchid" Release (077589bcad3467ae79f318afe8641a1899a51606, 2021-09-20) for macOS
  • R version 4.1.1 (2021-08-10)
  • RedditExtractoR version 3.0.3

Thanks!

Ah, I see, I think I know what the problem is. You are probably running into a bug that was recently fixed in this PR. It is caused by threads that, according to Reddit, contain comments even though no comments can actually be seen. Perhaps Reddit does not update the comment count when comments get deleted. Anyway, that issue should be fixed now, so I suggest that you upgrade to version 3.0.5 and try again.
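
For illustration, the same error message can be produced in base R by building a data frame from one column with a single value and another with no values. This is presumably what happens when the URL is present but no comment authors can be extracted, judging from the call shown in the error; the URL below is a placeholder, not a real thread.

```r
# One column has a single value, the other is empty (as if a thread reported
# comments but none could be extracted). The URL is a placeholder.
result <- tryCatch(
  data.frame(url = "https://example.com/thread", author = character(0)),
  error = function(e) conditionMessage(e)
)
print(result)
# [1] "arguments imply differing number of rows: 1, 0"
```

After upgrading, `packageVersion("RedditExtractoR")` can be used to confirm which version is installed.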

Wow! Now it works! Many thanks!

Glad it works :)