This is a simple web scraper for the Films of Africa website.
- Scrape each movie page for the name, description, director, genre and length of the movie
- Interactively move on to the next or the previous movie on the website
- Clone this repository into your project
- Install the requirements listed in `requirements.txt` (for example, with `pip install -r requirements.txt`)
- Configure your SQLAlchemy database in `settings.py`
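As a rough illustration of the configuration step, a database setting might look like the sketch below. The name `DATABASE_URL` is an assumption made for illustration; check `settings.py` for the name the project actually uses. The URL itself follows SQLAlchemy's standard `dialect+driver://user:password@host:port/database` form.

```python
# settings.py (sketch). DATABASE_URL is a hypothetical setting name,
# not confirmed by this project; adapt it to what settings.py defines.

# A local SQLite file, created relative to the working directory.
DATABASE_URL = "sqlite:///movies.db"

# A PostgreSQL server would use the same URL shape, for example:
# DATABASE_URL = "postgresql+psycopg2://user:password@localhost:5432/films"
```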
To run the program, create a `Scraper` object with a valid URL:
from scraper import scraper
s = scraper.Scraper('http://www.filmsofafrica.com/Ethiopia/Adwa.htm')
# The command above scrapes the given URL
# You can access the movie using the movie attribute
print(s.movie)
# To save the movie to the database use the save method
s.movie.save()
# To move to the next page, use the next property
# To move to the previous page, use the prev property
next_page = s.next
prev_page = s.prev
next_page.movie.save()
prev_page.movie.save()
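To visit several pages in a row, the `next` property can be followed in a loop. The sketch below is not part of the package: `walk_forward` is a hypothetical helper, and `FakePage` is a stand-in for `Scraper` so the example runs without network access. It assumes `next` returns `None` after the last page, which this README does not confirm; adapt the stop condition if the real property behaves differently.

```python
def walk_forward(start, limit):
    """Yield `start` and the pages after it, at most `limit` pages in total."""
    page = start
    count = 0
    while page is not None and count < limit:  # None is the assumed end signal
        yield page
        count += 1
        if count < limit:
            # Only follow the link when another page is still wanted,
            # so no page is fetched needlessly.
            page = page.next


class FakePage:
    """Stand-in for Scraper: exposes `movie` and `next` like the usage above."""

    def __init__(self, titles, index=0):
        self._titles = titles
        self._index = index
        self.movie = titles[index]

    @property
    def next(self):
        following = self._index + 1
        if following < len(self._titles):
            return FakePage(self._titles, following)
        return None


titles = ["Adwa", "Journey To Lasta", "Placeholder Title"]
visited = [page.movie for page in walk_forward(FakePage(titles), limit=2)]
print(visited)  # ['Adwa', 'Journey To Lasta']
```

With a real `Scraper` object `s`, the same helper could be used as `for page in walk_forward(s, limit=5): page.movie.save()`, provided the assumption about `next` holds.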
To retrieve the movies from the database, use the `MovieManager` class:
from scraper.models import MovieManager
m = MovieManager()
all_movies = m.get_all() # get all the movies in the database
m.get(uid=8) # get the movie with the uid: 8
m.get(name='Journey To Lasta') # get the movie with the name: Journey To Lasta
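The list returned by `get_all()` can be post-processed with ordinary Python. As a sketch, the hypothetical helper below groups movies by genre. It assumes each movie object exposes `name` and `genre` attributes, which matches the scraped fields but is not confirmed by this README. `SimpleNamespace` objects with placeholder values stand in for database rows so the example runs without a database.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_genre(movies):
    """Map each genre to the names of the movies in it, in input order."""
    groups = defaultdict(list)
    for movie in movies:
        groups[movie.genre].append(movie.name)
    return dict(groups)


# Placeholder stand-ins for the objects get_all() is assumed to return.
sample = [
    SimpleNamespace(name="Movie A", genre="Documentary"),
    SimpleNamespace(name="Movie B", genre="Drama"),
    SimpleNamespace(name="Movie C", genre="Drama"),
]
print(group_by_genre(sample))
```

With the real manager, the call would presumably be `group_by_genre(m.get_all())`.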
- This has only been tested on pages under http://www.filmsofafrica.com/Ethiopia/