Very puzzling issue with get_thread_content -- 503 Service Unavailable
Closed this issue · 3 comments
I used the following query, which returned a dataframe with 237 threads:
mydata <- RedditExtractoR::find_thread_urls(
  keywords  = "psychology",
  subreddit = "askphilosophy",
  sort_by   = "relevance",
  period    = "all"
)
I then used those threads in get_thread_content(mydata$url), but received the following error repeatedly:
Error in value[3L] :
Cannot read from Reddit, check your inputs or internet connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antidühring/.json?limit=500': HTTP status was '503 Service Unavailable'
Here is where it becomes puzzling: When I search for that thread in mydata$url, it's there, but it doesn't end with ".json?limit=500". That is being added somewhere in get_thread_content(). It's also only happening for that one link. When I remove just that one link from mydata before using get_thread_content(), then get_thread_content() works (i.e., it doesn't appear to be a rate limit issue). To make matters even more confusing, when I went to the thread (after removing the "/.json?limit=500") to manually check it out, nothing weird about it pops out. It's not a deleted post, a heavily downvoted post, a private post, from a private account, or anything like that. Just a normal reddit thread.
In the end I made a workaround, but I don't understand what the root cause of the issue was or how to prevent it.
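For anyone hitting the same error before the fix lands, here is a minimal sketch of one possible workaround, assuming the root cause is the non-ASCII "ü" in the thread URL: percent-encode each URL with base R's utils::URLencode() before passing it to get_thread_content(). The example URL vector below is illustrative, and the final get_thread_content() call is left commented out since it requires network access.

```r
# Hypothetical example URL containing a non-ASCII character ("\u00fc" = ü),
# mirroring the thread that triggered the 503 above.
urls <- c(
  "https://www.reddit.com/r/askphilosophy/comments/o4rbkx/engels_on_the_a_priori_in_antid\u00fchring/"
)

# URLencode() with reserved = FALSE leaves ASCII and reserved URL characters
# alone but percent-encodes multibyte UTF-8 characters (ü becomes %C3%BC).
encoded <- vapply(urls, utils::URLencode, character(1), reserved = FALSE)

# encoded[1] now ends in ".../antid%C3%BChring/", which the server accepts.
# content <- RedditExtractoR::get_thread_content(encoded)
```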
Thanks for reporting! A weird issue indeed. I noticed that the thread contains some special characters in it -- do the other URLs that you get from your query also contain special characters? FYI, the suffix (JSON + limit) is being appended here.
I'll look into this issue in the next few days.
FYI, the issue is caused by special characters; I'll try to fix it.
The issue is fixed, but the update won't appear on CRAN until at least next week (as of 31/08/21).