Very puzzling issue with get_thread_content -- 503 Service Unavailable
Closed this issue · 3 comments
I used the following query, which returned a dataframe with 237 threads:
mydata <- RedditExtractoR::find_thread_urls(
  keywords  = "psychology",
  subreddit = "askphilosophy",
  sort_by   = "relevance",
  period    = "all"
)
I then used those threads in get_thread_content(mydata$url), but received the following error repeatedly:
Error in value[3L] :
Cannot read from Reddit, check your inputs or internet connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antidühring/.json?limit=500': HTTP status was '503 Service Unavailable'
Here is where it becomes puzzling: When I search for that thread in mydata$url, it's there, but it doesn't end with ".json?limit=500". That is being added somewhere in get_thread_content(). It's also only happening for that one link. When I remove just that one link from mydata before using get_thread_content(), then get_thread_content() works (i.e., it doesn't appear to be a rate limit issue). To make matters even more confusing, when I went to the thread (after removing the "/.json?limit=500") to manually check it out, nothing weird about it pops out. It's not a deleted post, a heavily downvoted post, a private post, from a private account, or anything like that. Just a normal reddit thread.
In the end I made a workaround, but I don't understand what the root cause of the issue was or how to prevent it.
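For anyone hitting the same error before the fix lands, here is a minimal sketch of one possible workaround, assuming the root cause is the non-ASCII "ü" in the thread URL: percent-encode each URL with base R's utils::URLencode() before passing it to get_thread_content(). The example URL vector below is illustrative, and the final get_thread_content() call is left commented out since it requires network access.

```r
# Hypothetical example URL containing a non-ASCII character ("\u00fc" = ü),
# mirroring the thread that triggered the 503 above.
urls <- c(
  "https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antid\u00fchring/"
)

# URLencode() with reserved = FALSE leaves ASCII and reserved URL characters
# alone but percent-encodes multibyte UTF-8 characters (ü becomes %C3%BC).
encoded <- vapply(urls, utils::URLencode, character(1), reserved = FALSE)

# encoded[1] now ends in ".../antid%C3%BChring/", which the server accepts.
# content <- RedditExtractoR::get_thread_content(encoded)
```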
Thanks for reporting! A weird issue indeed. I noticed that the thread contains some special characters in it -- do the other URLs that you get from your query also contain special characters? FYI, the suffix (JSON + limit) is being appended here.
I'll look into this issue in the next few days.
FYI, the issue is caused by special characters; I'll try to fix it.
The issue is fixed, but the update won't appear on CRAN until at least next week (as of 31/08/21).