This project scrapes commercial real estate listings from the Domain.com.au website using the Scrapy framework. For each listing, it collects the property name, location, description, agent contact details, and listing URL.
The goal of this project is to extract commercial real estate listings data from Domain.com.au for further analysis or integration into other applications.
- Clone this repository to your local machine:
git clone https://github.com/yourusername/domain-commercial-real-estate-scraper.git
- Install Scrapy if you haven't already:
pip install scrapy
- Navigate to the project directory:
cd domain-commercial-real-estate-scraper
- Run the Scrapy spider:
scrapy crawl domain_commercial -o output.json
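This command runs the Scrapy spider named domain_commercial and saves the scraped data to a JSON file named output.json. Note that -o appends to an existing file, so a second run leaves output.json as invalid JSON (two arrays back to back); use -O instead to overwrite the file on each run.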
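If you prefer not to pass -o on every run, Scrapy's FEEDS setting can configure the same export in domain_commercial/settings.py. The fragment below is a minimal sketch, assuming a recent Scrapy release that supports the overwrite feed option; the values are illustrative and are not taken from the project's actual settings.

```python
# Hypothetical excerpt from domain_commercial/settings.py (illustrative values).

# Export every scraped item to output.json as a JSON array, replacing
# the file on each run instead of appending to it.
FEEDS = {
    "output.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,
    },
}
```

With this in place, scrapy crawl domain_commercial writes output.json without the -o flag.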
The project structure is organized as follows:
domain-commercial-real-estate-scraper/
│
├── domain_commercial/
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
│       ├── __init__.py
│       └── domain_commercial_spider.py
│
├── README.md
└── scrapy.cfg
- domain_commercial/: Contains the Scrapy project files.
- domain_commercial/items.py: Defines the data items to be scraped.
- domain_commercial/middlewares.py: Contains custom middleware classes.
- domain_commercial/pipelines.py: Defines the item pipelines for processing scraped data.
- domain_commercial/settings.py: Contains project settings, such as the user agent and enabled item pipelines.
- domain_commercial/spiders/: Directory containing the Scrapy spiders.
- domain_commercial/spiders/domain_commercial_spider.py: Scrapy spider for scraping commercial real estate listings from Domain.com.au.
- scrapy.cfg: Scrapy configuration file.
- README.md: This file, providing an overview of the project and instructions for usage.
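The role of pipelines.py can be illustrated with a small cleaning pipeline. This is a minimal sketch, not the repository's actual code: the class name CleanListingPipeline is hypothetical, and it assumes items are dicts carrying the field names described in this README. Scrapy calls process_item(item, spider) on each enabled pipeline for every scraped item and passes the returned item along.

```python
# Hypothetical pipeline for domain_commercial/pipelines.py.
# Assumes items are plain dicts using this README's field names.

TEXT_FIELDS = ("name", "location", "description", "agent_contact", "website")


class CleanListingPipeline:
    """Normalize whitespace in the text fields of each scraped listing."""

    def process_item(self, item, spider):
        # `spider` is part of Scrapy's pipeline signature; unused here.
        for field in TEXT_FIELDS:
            value = item.get(field)
            if isinstance(value, str):
                # Collapse runs of spaces and newlines left over from HTML markup.
                item[field] = " ".join(value.split())
        # The returned item goes on to the next pipeline or the feed exporter.
        return item
```

To enable it, register the class in settings.py, for example ITEM_PIPELINES = {"domain_commercial.pipelines.CleanListingPipeline": 300}; pipelines with lower numbers run first, and values are customarily kept in the 0 to 1000 range.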
The following fields are extracted for each commercial real estate listing:
- name: Name or title of the listing.
- location: Location or address of the property.
- description: Description of the property.
- agent_contact: Contact information for the listing agent.
- website: URL of the commercial real estate listing.
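These fields could be declared in domain_commercial/items.py. The repository's actual definition is not reproduced here; the sketch below uses a dataclass, which Scrapy accepts as an item type since version 2.2, and the class name ListingItem is chosen for illustration. A scrapy.Item subclass with one scrapy.Field() per field would work equally well.

```python
# Hypothetical sketch of domain_commercial/items.py.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListingItem:
    """One commercial real estate listing scraped from Domain.com.au."""

    name: Optional[str] = None           # name or title of the listing
    location: Optional[str] = None       # location or address of the property
    description: Optional[str] = None    # description of the property
    agent_contact: Optional[str] = None  # contact information for the agent
    website: Optional[str] = None        # URL of the listing page
```

Inside a spider callback, an item would then be produced with something like yield ListingItem(name=title, website=response.url), where title stands for whatever value the spider extracted.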
The scraped data is saved to a JSON file named output.json in the project directory.
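Because the JSON feed is a single array of records, the results can be loaded with Python's standard json module for further analysis. A minimal sketch, assuming each record uses the field names listed above; the helper names load_listings and count_by_location are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_listings(path="output.json"):
    """Read the JSON array written by Scrapy's feed exporter."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_by_location(listings):
    """Count listings per location, skipping records without one."""
    return Counter(item["location"] for item in listings if item.get("location"))
```

For example, count_by_location(load_listings()).most_common(5) returns the five most frequent locations with their listing counts.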
Contributions are welcome! Feel free to open an issue or submit a pull request.
For questions or inquiries, please contact FaeyO.
Replace yourusername in the clone command with the GitHub username of the account that hosts the repository.