A scraper that gathers data from real estate ads.
Country | Website |
---|---|
Brazil | ZAP Imóveis |
Requirements |
---|
Python 3.6 |
MongoDB |
Clone this repository using git and cd into the project folder:
git clone https://github.com/pauloromeira/realestate-scraper.git && \
cd realestate-scraper
Inside the project folder, install the Python requirements using pip:
pip install -r requirements.txt
First, start the MongoDB server:
mongod &
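Because `mongod &` returns immediately, it can be useful to confirm the server is accepting connections before starting a crawl. The sketch below is not part of this project; it assumes MongoDB is listening on its default port (27017) on localhost, and the helper name `mongod_is_listening` is made up for illustration. It only checks that something accepts TCP connections on that port, not that it is actually MongoDB.

```python
import socket


def mongod_is_listening(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("MongoDB reachable:", mongod_is_listening())
```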
Then use the following command to start crawling:
scrapy crawl zap [-a url=<zapimoveis-url>] [-a start=n] [-a count=n] [-a seed=<seed>]
Currently, only ZAP Imóveis is supported.
Arguments:
- count: maximum number of pages to crawl. By default, the crawler continues until the last page.
- start: page to start crawling from. The default is 1.
- url: website URL to perform the search on.
- seed: seed for the website search engine.
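Scrapy hands every `-a name=value` pair to the spider as a keyword argument, and the values always arrive as strings. The following is a minimal sketch of how `start` and `count` could be turned into a sequence of page numbers under that assumption; `pages_to_crawl` is a hypothetical helper, not this project's actual code.

```python
from itertools import count as count_from, islice


def pages_to_crawl(start="1", count=None):
    """Return an iterator of page numbers beginning at `start`.

    If `count` is given, the iterator stops after that many pages;
    otherwise it is unbounded and the caller decides when to stop.
    """
    first = int(start)             # command-line values arrive as strings
    if first < 1:
        raise ValueError("start must be 1 or greater")
    pages = count_from(first)      # first, first + 1, first + 2, ...
    if count is None:
        return pages               # no limit requested
    return islice(pages, int(count))
```

With `-a start=4 -a count=3`, this sketch would produce pages 4, 5 and 6. Without `count`, a real spider would need its own stop condition, such as a page with no results.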
- Default values: properties in Pernambuco, Brazil. Crawl all pages:
scrapy crawl zap
- Olinda-PE. Crawl the first 4 pages:
scrapy crawl zap -a count=4 -a urls="https://www.zapimoveis.com.br/venda/imoveis/pe+olinda/"
- Rio de Janeiro-RJ, south zone. Starting at page 100, crawl until the end:
scrapy crawl zap -a start=100 -a urls="https://www.zapimoveis.com.br/venda/imoveis/agr+rj+rio-de-janeiro+zona-sul/"
- All places. Starting from page 4, crawl 3 pages:
scrapy crawl zap -a start=4 -a count=3 -a urls="https://www.zapimoveis.com.br/venda/imoveis/"
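When several crawls like the ones above need to run in a row, the command lines can be generated from Python instead of typed by hand. This is a rough sketch, not something shipped with the project: `crawl_command` is a hypothetical helper, and the argument names are simply forwarded as given, so they must match whatever the spider accepts.

```python
import shlex


def crawl_command(spider="zap", **spider_args):
    """Build the argument list for `scrapy crawl`, one `-a name=value` per argument."""
    command = ["scrapy", "crawl", spider]
    for name, value in spider_args.items():
        if value is not None:      # skip unset arguments so the spider defaults apply
            command += ["-a", f"{name}={value}"]
    return command


jobs = [
    crawl_command(count=4, urls="https://www.zapimoveis.com.br/venda/imoveis/pe+olinda/"),
    crawl_command(start=4, count=3, urls="https://www.zapimoveis.com.br/venda/imoveis/"),
]
for job in jobs:
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in job))
```

Each list can also be passed directly to `subprocess.run(job, check=True)` from inside the project folder, which avoids shell quoting altogether.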