
A Scrapy project for demonstration and educational purposes.

Primary language: Python. License: MIT.

Github-Trending-Bot

A Scrapy project for demonstration and educational purposes. For more information, read GitHub's terms of service regarding scraping.

Scrapy Topics

  • Data Extraction
  • XPath and CSS
  • Rule
  • Settings
  • Item and Data Storage

Extracted data files

Github-Trending-Bot extracts the URL and description of each trending repository. The collected data is stored using the commands below:

scrapy crawl GithubTrendingRepoCrawler -o extracted_data_files/links_CSV.csv
scrapy crawl GithubTrendingRepoCrawler -o extracted_data_files/links_JSON.json
scrapy crawl GithubTrendingRepoCrawler -o extracted_data_files/links_XML.xml

The exported files are available in the extracted_data_files directory in three formats: CSV, JSON, and XML.
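
Once a JSON export exists, it can be inspected with the standard library alone. The sketch below assumes each exported item has `url` and `description` keys. Those field names are an assumption based on the description above, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_trending_items(path):
    """Return the items stored in a Scrapy JSON feed export."""
    # Scrapy's JSON exporter writes a single array of objects.
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    # Keep only entries that carry the assumed "url" key.
    return [item for item in items if item.get("url")]


def format_items(items):
    """Format each item as 'url: description' for quick inspection."""
    return [f"{item['url']}: {item.get('description', '')}" for item in items]
```

For example, `format_items(load_trending_items("extracted_data_files/links_JSON.json"))` would return one line of text per exported repository.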

Spiders

$ scrapy list
GithubTrendingRepo
GithubTrendingRepoCrawler
  • GithubTrendingRepo: Demonstrates CSS selectors, XPath, and LinkExtractor with a regular expression.
  • GithubTrendingRepoCrawler: Demonstrates a recursive approach that follows each extracted link and collects its data and description.

Execute Spiders

Use the scrapy crawl command from the project directory to run a particular spider by name:

$ scrapy crawl GithubTrendingRepoCrawler

Alternatively, run a single spider file directly with the runspider command:

$ scrapy runspider github_trending_bot/spiders/GithubTrendingRepo.py

License

MIT