A project for crawling real estate agency websites and extracting listing data such as price and location.
Built with Scrapy, a Python framework for extracting data from web pages.
The project currently has spiders for the following websites:
Country | Agency |
---|---|
Brazil | Stória Imóveis |
Brazil | ImovelWeb |
Brazil | ZapImóveis |
Brazil | VivaReal |

Dependencies and their versions:

Package | Version |
---|---|
Python | v3.6.5 |
Selenium | v3.12.0 |
GeckoDriver¹ | v0.20.1 |

¹ GeckoDriver can also be installed with the command npm install -g geckodriver
To clone the repository, run the following on the command line:
$ git clone http://github.com.br/MatheusDosReis/real-estate-agency-scraper
$ cd real-estate-agency-scraper
Install the dependencies with the command below:
$ pip install -r requirements.txt
Create the results directory:
$ mkdir results
List of names of the available spiders:
- storia
- imovelweb
- zapimoveis
- vivareal
To run a specific spider:
$ scrapy crawl <name_of_the_spider>
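
To run every spider in one go, the command above can be wrapped in a small Python script. This is only a sketch under assumptions: the helper names `build_crawl_command` and `crawl_all` are hypothetical and not part of this repository, and writing one JSON file per spider into results through Scrapy's `-o` option is an assumed layout, since the project may already export its data differently.

```python
import subprocess
from pathlib import Path

# Spider names as listed in this README.
SPIDERS = ["storia", "imovelweb", "zapimoveis", "vivareal"]


def build_crawl_command(spider, out_dir="results"):
    """Return the argument list for `scrapy crawl <spider> -o <out_dir>/<spider>.json`."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    # Assumption: one JSON feed per spider; the project may export differently.
    return ["scrapy", "crawl", spider, "-o", f"{out_dir}/{spider}.json"]


def crawl_all(out_dir="results"):
    """Run every spider in turn; call this from the project root."""
    # Like `mkdir results`, but without an error if the directory already exists.
    Path(out_dir).mkdir(exist_ok=True)
    for spider in SPIDERS:
        # check=True raises CalledProcessError if a crawl exits with an error.
        subprocess.run(build_crawl_command(spider, out_dir), check=True)
```

Calling `crawl_all()` from the repository root would then run the four spiders one after another and stop at the first one that fails.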