IMDb Series Scraper is a Python script that scrapes episode data from IMDb for a given TV series. It fetches each episode's title, thumbnail, and description, and saves them in a JSON file.
.
├── README.md
├── main.py
└── requirements.txt
The project dependencies are listed in the requirements.txt file. They include:
- beautifulsoup4
- requests
- Clone the repository
  git clone https://github.com/Odisseu93/imdb-series-scraper
  cd imdb-series-scraper
- Create and activate a virtual environment (optional but recommended)
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required packages
  pip install -r requirements.txt
To run the script, execute the following command:
python main.py
Follow the on-screen prompts to search for a series or get episode data for a specific series by its IMDb ID.
- Find series: Search for TV series on IMDb by entering a query.
- Get series data: Enter the IMDb ID of a series to fetch and save its episode data in a JSON file.
- Exit: Exit the script.
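The three menu options above can be handled with a small dispatch loop. The following is a minimal sketch, not the code in main.py: the names `MENU`, `parse_choice`, and `run_menu` are hypothetical, and mapping Exit to option 3 is an assumption (only options 1 and 2 are numbered in this README).

```python
from typing import Callable, Optional

# Hypothetical mapping from menu input to an action name.
# Options 1 and 2 follow the README; "3" for Exit is an assumption.
MENU = {
    "1": "find_series",
    "2": "get_series_data",
    "3": "exit",
}


def parse_choice(raw: str) -> Optional[str]:
    """Return the action name for a menu entry, or None if it is not valid."""
    return MENU.get(raw.strip())


def run_menu(read: Callable[[str], str] = input,
             write: Callable[[str], None] = print) -> None:
    """Show the menu repeatedly until the user picks the exit option."""
    while True:
        write("1. Find series\n2. Get series data\n3. Exit")
        action = parse_choice(read("Select an option: "))
        if action is None:
            write("Invalid option, please try again.")
        elif action == "exit":
            break
        else:
            # A real script would call the search or scrape routine here.
            write(f"Selected: {action}")
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal; calling `run_menu()` with no arguments uses `input` and `print`.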
- Find a series:
  - Select option 1 and enter a search query.
  - The script will display a list of series with their respective IMDb IDs.
- Get series data:
  - Select option 2 and enter the IMDb ID of the desired series (e.g., tt0386676).
  - The script will fetch the episode data for all seasons and save it in a JSON file named after the series in the data directory.
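Before fetching anything, it can help to check that the entered ID looks like an IMDb title ID. The sketch below is illustrative only: `is_valid_imdb_id` and `season_url` are hypothetical helpers, the pattern assumes IDs are `tt` followed by at least seven digits, and the per-season URL layout is an assumption about IMDb's site rather than something this README specifies.

```python
import re

# Assumption: IMDb title IDs are "tt" followed by at least seven digits.
IMDB_ID_PATTERN = re.compile(r"tt\d{7,}")


def is_valid_imdb_id(value: str) -> bool:
    """Return True if the whole string looks like an IMDb title ID."""
    return IMDB_ID_PATTERN.fullmatch(value.strip()) is not None


def season_url(imdb_id: str, season: int) -> str:
    """Build a per-season episode-list URL (assumed URL layout)."""
    if not is_valid_imdb_id(imdb_id):
        raise ValueError(f"Not a valid IMDb ID: {imdb_id!r}")
    if season < 1:
        raise ValueError("season must be 1 or greater")
    return f"https://www.imdb.com/title/{imdb_id.strip()}/episodes?season={season}"
```

For example, `season_url("tt0386676", 2)` builds the address for the second season, while an input such as `"0386676"` is rejected with a `ValueError` before any request is made.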
The script creates a directory named data (if it doesn't already exist) and saves the scraped episode data in a JSON file with the following structure:
[
  {
    "title": "series_name"
  },
  {
    "season_number": 1,
    "episodes": [
      {
        "title": "Episode 1",
        "thumbnail": "url_to_thumbnail",
        "description": "Episode description"
      },
      ...
    ]
  },
  ...
]
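As a rough illustration of how a file with this structure could be written, the sketch below builds the list and saves it under the data directory. It is not the code in main.py: `build_payload` and `save_series` are hypothetical names, and the rule for turning a series name into a file name is an assumption.

```python
import json
import re
from pathlib import Path


def build_payload(series_name, seasons):
    """Build the list shown above.

    `seasons` maps a season number to a list of episode dicts, each with
    "title", "thumbnail", and "description" keys.
    """
    payload = [{"title": series_name}]
    for number in sorted(seasons):
        payload.append({"season_number": number, "episodes": seasons[number]})
    return payload


def save_series(series_name, seasons, data_dir="data"):
    """Write the payload to <data_dir>/<series name>.json and return the path."""
    directory = Path(data_dir)
    # Create the directory only if it does not already exist.
    directory.mkdir(parents=True, exist_ok=True)
    # Assumed naming rule: replace runs of unsafe characters with "_".
    safe_name = re.sub(r"[^\w\-]+", "_", series_name).strip("_")
    path = directory / f"{safe_name}.json"
    with path.open("w", encoding="utf-8") as handle:
        json.dump(build_payload(series_name, seasons), handle,
                  ensure_ascii=False, indent=2)
    return path
```

Reading the file back with `json.load` gives a list whose first element holds the series title and whose remaining elements are the seasons, so `data[1:]` iterates over the seasons only.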
This project is licensed under the MIT License. See the LICENSE file for details.
- BeautifulSoup for parsing HTML.
- Requests for making HTTP requests.