This repo contains the code used to create a dataset of the anime list from the MyAnimeList (MAL) website, together with user data. It is written in Python and scrapes the data with Jikan, an unofficial MAL API, as well as BeautifulSoup4.
For the anime list file, you only need a range of mal_id values to go through, from 1 up to a limit you choose.
To change the limit, edit the argument passed to the function call inside the main function.
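The ID sweep described above could be sketched roughly as follows. This is only an illustration, not the repo's actual code: `collect_anime`, `fetch`, `delay`, and `fake_fetch` are hypothetical names, and the fetcher is passed in as a parameter so the loop can be tried without network access. In a real run, `fetch` would ask Jikan for one `mal_id` and return `None` for IDs that do not exist.

```python
import time
from typing import Callable, Optional


def collect_anime(limit: int,
                  fetch: Callable[[int], Optional[dict]],
                  delay: float = 0.0) -> list:
    """Walk mal_id 1..limit (inclusive) and keep every record fetch returns."""
    rows = []
    for mal_id in range(1, limit + 1):
        record = fetch(mal_id)   # hypothetical fetcher; None means "no such anime"
        if record is not None:
            rows.append(record)
        if delay:
            time.sleep(delay)    # optional pause between requests
    return rows


# Stand-in fetcher for demonstration only: pretend that only odd IDs exist.
def fake_fetch(mal_id: int) -> Optional[dict]:
    return {"animeID": mal_id} if mal_id % 2 else None


print(collect_anime(5, fake_fetch))
```

Gaps in the ID range are expected, so the sketch skips missing IDs instead of stopping at the first one.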
- animeID: ID of the anime, as in the anime URL https://myanimelist.net/anime/ID
- name: title of the anime
- premiered: when the anime premiered (default format: season year)
- genre: list of genres
- type: type of anime (for example TV, Movie)
- episodes: number of episodes
- studios: list of studios
- source: source material of the anime (for example original, manga, game)
- scored: score of the anime
- scoredBy: number of members who scored the anime
- members: number of members who added the anime to their list
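As a rough sketch of how one API record might be flattened into the columns listed above: the input keys used here (`mal_id`, `title`, `season`, `year`, `genres`, `studios`, `score`, `scored_by`, and so on) are an assumption based on the shape of Jikan v4 anime responses, and the repo may rely on a different Jikan version or wrapper. `to_anime_row` is a hypothetical helper, and the sample record is made up.

```python
def to_anime_row(data: dict) -> dict:
    """Map one Jikan-style anime record (assumed shape) to the dataset columns."""
    season, year = data.get("season"), data.get("year")
    # "season year" format, e.g. "Spring 2020"; None when either part is missing
    premiered = f"{season.capitalize()} {year}" if season and year else None
    return {
        "animeID": data.get("mal_id"),
        "name": data.get("title"),
        "premiered": premiered,
        "genre": [g["name"] for g in data.get("genres", [])],
        "type": data.get("type"),
        "episodes": data.get("episodes"),
        "studios": [s["name"] for s in data.get("studios", [])],
        "source": data.get("source"),
        "scored": data.get("score"),
        "scoredBy": data.get("scored_by"),
        "members": data.get("members"),
    }


# Made-up record for illustration; not real MAL data.
sample = {
    "mal_id": 1,
    "title": "Example Anime",
    "season": "spring",
    "year": 2020,
    "genres": [{"name": "Action"}, {"name": "Drama"}],
    "type": "TV",
    "episodes": 12,
    "studios": [{"name": "Example Studio"}],
    "source": "Manga",
    "score": 7.5,
    "scored_by": 1000,
    "members": 5000,
}

print(to_anime_row(sample))
```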
For the user data, you can use the following script. It uses the Jikan API to fetch each user's data and BS4 to collect usernames directly from the MAL website, because a username is required to look up a user's data.
- user_id: id of user
- username: username of the user
- gender: gender of the user
- birthday: birthday of the user
- location: location of the user
- joined: date joined
- days_watched: days spent watching
- mean_score: mean score given
- watching: number of anime currently being watched
- completed: number of anime completed
- on_hold: number of anime on hold
- dropped: number of anime dropped
- plan_to_watch: number of anime planned to watch
- total_entries: total number of anime entries
- rewatched: number of anime rewatched
- episodes_watched: total episodes watched
python getUser.py UserList.txt user.csv
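The command above takes a list of usernames and produces a CSV with the user columns. A minimal sketch of that input/output handling is shown below. It assumes that `UserList.txt` holds one username per line, which the README does not state explicitly. `read_usernames` and `write_users` are hypothetical helpers, the names `alice` and `bob` are placeholders, and the actual Jikan lookup is left out.

```python
import csv
import io

USER_COLUMNS = [
    "user_id", "username", "gender", "birthday", "location", "joined",
    "days_watched", "mean_score", "watching", "completed", "on_hold",
    "dropped", "plan_to_watch", "total_entries", "rewatched",
    "episodes_watched",
]


def read_usernames(lines):
    """Return unique, non-empty usernames in first-seen order."""
    seen = set()
    names = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def write_users(rows, out):
    """Write user rows as CSV; columns missing from a row are left empty."""
    writer = csv.DictWriter(out, fieldnames=USER_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder input standing in for the lines of UserList.txt.
names = read_usernames(["alice\n", "\n", "bob\n", "alice\n"])
rows = [{"username": name} for name in names]  # a real run would add API fields

buf = io.StringIO()  # stands in for the user.csv file handle
write_users(rows, buf)
print(buf.getvalue())
```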
For this, you need the topic ID. Go to MAL -> Community -> Forums -> Select a forum
For example, the topic ID for each of the following forum links is shown after the arrow:
https://myanimelist.net/forum/?topicid=1699126 -> 1699126
https://myanimelist.net/forum/?topicid=1696289 -> 1696289
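If you would rather not copy the ID by hand, it can be read from the link's query string. The snippet below is a small sketch using only the standard library. `id_from_url` is a hypothetical helper and is not part of this repo.

```python
from urllib.parse import urlparse, parse_qs


def id_from_url(url: str, param: str = "topicid") -> str:
    """Return the value of a query parameter such as topicid from a MAL link."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    if not values:
        raise ValueError(f"no {param} parameter in {url!r}")
    return values[0]


print(id_from_url("https://myanimelist.net/forum/?topicid=1699126"))  # prints 1699126
```

The same helper should also work for club links by passing `param="cid"`.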
After getting the topic ID, you can use the createUserListFromPost script.
python getUserFromPost.py topicID UserList.txt
For this, you need the club ID. Go to MAL -> Community -> Clubs -> Select a club
For example, the club ID for each of the following club links is shown after the arrow:
https://myanimelist.net/clubs.php?cid=72250 -> 72250
https://myanimelist.net/clubs.php?cid=32683 -> 32683
After getting the club ID, you can use the createUserListFromClub script.
python getUserFromClub.py clubID UserList.txt
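Both list-building scripts come down to pulling usernames out of a MAL page. The sketch below illustrates that idea on a made-up HTML fragment. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages, and it assumes that usernames appear in links of the form `/profile/<username>`. The real page markup may differ, and `ProfileLinkParser` and `usernames_from_html` are hypothetical names.

```python
from html.parser import HTMLParser


class ProfileLinkParser(HTMLParser):
    """Collect usernames from <a href=".../profile/NAME"> links."""

    def __init__(self):
        super().__init__()
        self.usernames = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        marker = "/profile/"
        if marker in href:
            # Keep only the path segment right after /profile/
            name = href.split(marker, 1)[1].strip("/").split("/")[0]
            if name and name not in self.usernames:
                self.usernames.append(name)


def usernames_from_html(html: str) -> list:
    """Return unique usernames in the order they first appear."""
    parser = ProfileLinkParser()
    parser.feed(html)
    return parser.usernames


# Made-up fragment for illustration; not copied from a real MAL page.
sample_html = """
<table>
  <tr><td><a href="/profile/alice">alice</a></td></tr>
  <tr><td><a href="https://myanimelist.net/profile/bob">bob</a></td></tr>
  <tr><td><a href="/profile/alice">alice</a></td></tr>
  <tr><td><a href="/clubs.php?cid=72250">club</a></td></tr>
</table>
"""

# One username per line, the layout assumed here for UserList.txt.
print("\n".join(usernames_from_html(sample_html)))
```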