ivan-rivera/RedditExtractor

Thread content URL containing Permalink instead of URL to external site

Describe the bug
get_thread_content() returns a list containing two data frames, one of which holds metadata about the thread.
This data frame includes a variable url. However, that variable does not contain the URL of the external website the thread links to (if there is one); it contains the thread's permalink instead.

As an example, please check this thread:
https://www.reddit.com/r/worldnews/comments/ebamnt/venezuelas_civilian_militia_surpasses_target.json

The correct url would be "https://venezuelanalysis.com/news/14742", but instead the permalink "https://www.reddit.com/r/worldnews/comments/ebamnt/venezuelas_civilian_militia_surpasses_target" is listed.

I suggest adding the external URL as an additional variable in the "thread" data frame, either under a new name such as "external_url" or by renaming the variables to match the API results ("url" for the external link, "permalink" for the thread itself).
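
To make the suggestion concrete, here is a minimal sketch of how the two fields could be kept apart. It is written in Python purely for illustration and is not the package's code. It assumes the post object in the API's JSON carries a relative "permalink", a "url" and an "is_self" flag; the function name thread_urls, the output key "external_url" and the trimmed sample record are hypothetical.

```python
from urllib.parse import urljoin

REDDIT_BASE = "https://www.reddit.com"


def thread_urls(post):
    """Return the thread permalink and the external link as separate fields.

    `post` is assumed to be the post object from the thread's JSON,
    with a relative "permalink", a "url" and an "is_self" flag.
    """
    # The API gives the permalink as a path, so make it absolute.
    permalink = urljoin(REDDIT_BASE, post["permalink"])
    url = post.get("url")
    # Text-only (self) posts have no external site to link to.
    external_url = None if post.get("is_self", False) or not url else url
    return {"permalink": permalink, "external_url": external_url}


# Hand-trimmed sample record using the values from the thread above
# (not a real API response).
sample_post = {
    "permalink": "/r/worldnews/comments/ebamnt/venezuelas_civilian_militia_surpasses_target/",
    "url": "https://venezuelanalysis.com/news/14742",
    "is_self": False,
}

result = thread_urls(sample_post)
print(result["permalink"])
print(result["external_url"])
```

Keeping both values as separate columns would let users pick the one they need without a second request to the API.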