MAL-Scraper

MyAnimeList data scraper intended for academic purposes.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

What is this about?

This is a scraper that uses the Jikan API to crawl data from MyAnimeList. It serves as an introductory, educational tool for deeper analysis of trends in the anime community.

Explanation

Because a large number of anime have been released since the first one was registered in the MAL database, a multi-threaded approach is highly recommended for scraping a dataset of this size. Our scraper does not currently use that approach, but it is multi-thread-compatible. We have also introduced a version that can easily be deployed to a free cloud application platform such as Heroku and linked directly to a Google Spreadsheet through OAuth2, using a few open-source APIs.
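To illustrate what a multi-threaded run could look like, here is a minimal sketch using only the Python standard library. It is not the scraper's actual code: `RateLimiter`, `scrape_ids`, the `fetch` callable, and the `min_interval` value are all hypothetical. Jikan enforces rate limits, so the sketch spaces requests out across threads; the interval shown is a placeholder that should be checked against the JikanV4 docs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most one call every `min_interval` seconds across all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def scrape_ids(ids, fetch, max_workers=4, min_interval=0.5):
    """Call `fetch(anime_id)` for every ID concurrently.

    Returns a dict mapping each ID to the fetched result, or to None
    when the fetch raised an exception.
    """
    limiter = RateLimiter(min_interval)

    def task(anime_id):
        limiter.wait()
        try:
            return anime_id, fetch(anime_id)
        except Exception:
            # A missing or failing ID should not abort the whole run.
            return anime_id, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, ids))
```

Here `fetch` is any function that takes an anime ID and returns its data, so the threading logic stays independent of how the request itself is made.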

As for how the scraper works, we highly recommend going through the JikanV4 docs first, as the scraper relies heavily on them.
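For readers who have not used Jikan before, the following is a rough sketch of a single lookup against the public JikanV4 REST endpoint for one anime, using only the standard library. The helper names (`anime_url`, `extract_data`, `fetch_anime`) and the `User-Agent` string are made up for this example; the scraper's own request code may look different.

```python
import json
import urllib.error
import urllib.request

JIKAN_BASE = "https://api.jikan.moe/v4"


def anime_url(anime_id):
    """Build the JikanV4 URL for a single anime entry."""
    return f"{JIKAN_BASE}/anime/{anime_id}"


def extract_data(payload):
    """JikanV4 wraps the anime record in a top-level `data` object."""
    return payload.get("data")


def fetch_anime(anime_id, timeout=10):
    """Return the record for one anime, or None if the ID does not exist."""
    request = urllib.request.Request(
        anime_url(anime_id),
        headers={"User-Agent": "mal-scraper-example"},  # hypothetical value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Not every numeric ID maps to an anime; treat gaps as "no data".
            return None
        raise
    return extract_data(payload)
```

Calling `fetch_anime(1)` would issue one HTTP request and return a dictionary with fields such as `mal_id` and `title`, assuming the API is reachable.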

Pre-scraped Data

For those who are short on time or want a preview of the data, we have published a spreadsheet.

Date Data
28/05/2022 Spreadsheet

How to use

Example: start scraping anew.

py src/scraping.py

Resume scraping from ID 100.

py src/scraping.py --start 100 --resume
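The two flags used above could be parsed along the following lines. This is only a guess at the interface based on the example commands: the default for `--start`, the exact meaning of `--resume`, and the `output_mode` helper are assumptions, not taken from `src/scraping.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Scrape MyAnimeList entries via Jikan."
    )
    parser.add_argument(
        "--start", type=int, default=1,
        help="first anime ID to scrape (assumed default: 1)",
    )
    parser.add_argument(
        "--resume", action="store_true",
        help="keep existing output and continue from --start (assumed meaning)",
    )
    return parser.parse_args(argv)


def output_mode(args):
    """Pick a file mode: 'a' keeps previously scraped rows, 'w' starts fresh."""
    return "a" if args.resume else "w"
```

Under these assumptions, `--start 100 --resume` would continue from ID 100 and append to the existing output rather than overwrite it.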

Reference

JikanV4, on which this scraper relies heavily.

Google OAuth2, the web-server authentication protocol we use.

Google API, for server-to-server authentication with Google services.

Gspread, for a simple, automatable way to access and edit a spreadsheet from the server side.
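As a rough illustration of how Gspread can push scraped rows into a spreadsheet, consider the sketch below. It assumes a service-account credentials file rather than the OAuth2 web-server flow mentioned above, and the column selection, `anime_to_row`, `append_to_sheet`, `spreadsheet_key`, and `credentials_path` are all hypothetical.

```python
def anime_to_row(anime):
    """Flatten one Jikan anime record into a spreadsheet row (assumed columns)."""
    return [
        anime.get("mal_id"),
        anime.get("title"),
        anime.get("type"),
        anime.get("episodes"),
        anime.get("score"),
    ]


def append_to_sheet(rows, spreadsheet_key, credentials_path):
    """Append rows to the first worksheet of the given spreadsheet."""
    # Imported here so the rest of the module works without gspread installed.
    import gspread  # third-party: pip install gspread

    client = gspread.service_account(filename=credentials_path)
    worksheet = client.open_by_key(spreadsheet_key).sheet1
    worksheet.append_rows(rows)
```

Appending rows in batches with `append_rows` keeps the number of Google API calls low compared with writing one row per request.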