InkBunny Scraper
Hear me out! While InkBunny is one of the only furry sites that allows cub, and is therefore known as a cub site now, it has many fabulous artists on it that don't do cub, such as Dripponi, TheSecretCave, Chunie, Tsaiwolf, and more! Unlike FA, SoFurry, etc., InkBunny has an amazing API, so writing a scraper for it should be super easy.
To get a SID, make an InkBunny account, go to the account settings on InkBunny, and enable API access.
Then request https://inkbunny.net/api_login.php?username=derpibooru&password=hunter2
and it will return the SID as JSON. This way you could have people log in from their user settings, or, my suggestion, put one SID in the config.
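Here's a rough sketch of what fetching the SID could look like in Elixir, reusing the same Philomena.Http helper the scraper below uses; the function name and where the credentials come from are just placeholders:

# Minimal sketch: log in to the InkBunny API and pull the "sid" field out of
# the JSON response. Credentials would come from config or user settings.
defp fetch_inkbunny_sid(username, password) do
  login_url =
    "https://inkbunny.net/api_login.php?" <>
      URI.encode_query(%{"username" => username, "password" => password})

  {:ok, %Tesla.Env{status: 200, body: body}} = Philomena.Http.get(login_url)

  body
  |> Jason.decode!()
  |> Map.fetch!("sid")
end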
You can also log in as a guest, but then you can only view General-rated posts. Use https://inkbunny.net/api_login.php?username=guest
which will output a SID you can use to view a post by converting
https://inkbunny.net/s/2436862
-> https://inkbunny.net/api_submissions.php?show_description=yes&sid=4X88ktV7jxywp65Ng40ez1qTJd&submission_ids=2436862
which gives you a beautiful JSON to mine for info. You'd probably only pull file_url_full for the file, username for the artist namespace, and description for the description. Obviously tags are free-form writing and would likely just muddy the database.
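For reference, the decoded response looks roughly like this (abbreviated and from memory, so double-check it against the live API; note that both "submissions" and "files" are lists):

%{
  "submissions" => [
    %{
      "submission_id" => "2436862",
      "username" => "artist_name",
      "description" => "Submission description text...",
      "files" => [
        %{"file_url_full" => "https://..."}
      ]
    }
  ]
}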
Is your feature request related to a problem? Please describe.
Currently you can only use direct image links, and the scraper struggles when the image is behind an auth wall (anything rated Mature or higher).
Describe the solution you'd like
Either hard-code an API key (SID) tied to a username in the config, or allow users to put in their own InkBunny SID.
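For the config route, it could look something like this in config/runtime.exs (the :inkbunny_sid key name is just a suggestion; it has to match whatever key the scraper reads):

# config/runtime.exs -- read the SID from the environment so it stays out of
# the repo; the key name :inkbunny_sid is only a suggestion
config :philomena,
  inkbunny_sid: System.get_env("INKBUNNY_SID")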
Describe alternatives you've considered
Put in the direct image link.
I tried putting together a scraper .ex file but haven't tested it yet because I know it's wrong. I can't figure out how to pull the submission ID out of the URL to plug it into the API.
defmodule Philomena.Scrapers.Inkbunny do
  @url_regex ~r|\Ahttps?://inkbunny\.net/s/(\d+)/?|

  @spec can_handle?(URI.t(), String.t()) :: true | false
  def can_handle?(_uri, url) do
    String.match?(url, @url_regex)
  end

  def scrape(_uri, url) do
    # capture: :all_but_first returns only the capture groups, so this binds
    # the submission ID from the /s/<id> part of the URL
    [submission_id] = Regex.run(@url_regex, url, capture: :all_but_first)

    api_url =
      "https://inkbunny.net/api_submissions.php?show_description=yes&sid=#{inkbunny_sid()}&submission_ids=#{submission_id}"

    {:ok, %Tesla.Env{status: 200, body: body}} = Philomena.Http.get(api_url)
    json = Jason.decode!(body)

    # "submissions" is a list even when asking for a single ID
    [submission] = json["submissions"]

    # "files" is also a list; collect the full-size URL of each file
    images = Enum.map(submission["files"], fn file -> %{url: file["file_url_full"]} end)

    %{
      source_url: url, # reuse the scraped page URL
      author_name: submission["username"],
      description: submission["description"],
      images: images
    }
  end

  # Assumes the SID lives in application config (set e.g. from the runtime
  # config); the key name is only a suggestion
  defp inkbunny_sid do
    Application.get_env(:philomena, :inkbunny_sid)
  end
end
At a glance it seems like it should work; is there a specific issue you are having?
I got it working! I had to also update the runtime config and the scrapers file, obviously (a sketch of that registration is below). I also got it working for FurAffinity and e621. NSFW works without issue.
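For anyone following along, the scrapers file change is roughly this, assuming the dispatcher still walks a @scrapers module attribute in lib/philomena/scrapers.ex (existing entries abbreviated):

# lib/philomena/scrapers.ex -- add the new module to the list of scrapers
# that get tried in turn (existing entries abbreviated)
@scrapers [
  Philomena.Scrapers.Inkbunny,
  # ...the existing scrapers...
]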
I'm working on Pixiv, but it's a little tricky.
How can I submit my code to this repository? Are there any policies on that?
To contribute, use the GitHub UI to fork the repository under your own account, add the relevant code to a new branch, and use the GitHub UI to submit a pull request.