Keywords-scraper

This script crawls the URLs returned by a Google search and extracts the 20 most frequently used keywords in their title, h2 and h3 tags and alt attributes.

Primary Language: Python

Introduction

Tags Scraper is a working set of Python scripts that crawls the URLs returned by a Google search and reports the words most frequently used by the pages in the search results. Knowing the most-used keywords can help with optimisation.

Description

The project is split into 4 modules:

  • URL_generator : An independent module that generates a Google-accepted URL with all the valid parameters. It can be extended to support other search engines as well.

  • SearchResults : Takes the search query as input and returns the list of links found in the " " tag, which are the links to the search-result pages.

  • Fetch_Tags : Takes the list of URLs generated by SearchResults, visits each URL, scrapes the text of the title, h2 and h3 tags and the alt attributes, and counts how many times each word is used across the webpages.

  • Main : The module you interact with. It combines the functionality of the other three modules into one whole. The generated data is stored in the KwywordsData.txt file.
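The module split described above can be illustrated with a small sketch. This is not the project's actual code: the names `build_search_url`, `TagTextParser` and `count_keywords` are hypothetical, the standard-library `html.parser` is used here only to keep the example dependency-free, and counting total occurrences (rather than once per page) is an assumption about what "most frequently used" means.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(query, **extra_params):
    """Hypothetical stand-in for URL_generator: build a Google search URL."""
    params = {"q": query, **extra_params}
    return "https://www.google.com/search?" + urlencode(params)


class TagTextParser(HTMLParser):
    """Collect text inside <title>, <h2>, <h3> and any alt attribute."""

    WANTED = {"title", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside a wanted tag
        self.chunks = []  # pieces of text gathered so far

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._depth += 1
        alt = dict(attrs).get("alt")
        if alt:
            self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.WANTED and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def count_keywords(pages, top_n=20):
    """Hypothetical stand-in for Fetch_Tags: count words over HTML strings."""
    counts = Counter()
    for html in pages:
        parser = TagTextParser()
        parser.feed(html)
        parser.close()
        # Assumption: a "word" is a lowercase run of letters or digits.
        words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
        counts.update(words)
    return counts.most_common(top_n)
```

A driver in the spirit of the Main module would call `build_search_url` with the user's query, collect the result links, download each page, and pass the HTML strings to `count_keywords`. A real run would probably also want to filter out stop words such as "the" or "and", which this sketch does not do.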

Working

  • After installing all the required libraries, run the main.py script. main.py is the driver module; run it from the CLI or in an IDE.

  • When you run the main module, it prompts "Enter Search Query". Enter the query for which you want the top keywords, just as you would type it in the Google search bar, and it will start fetching results.

  • It tries to fetch 100 results, but the number can be lower because of inaccessible pages and broken links.
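The last point, that fewer than 100 pages may come back when some links are broken or inaccessible, could be handled with a loop that skips failures instead of stopping. The sketch below is an assumption about how that might look, not the project's code: `default_fetch`, `fetch_pages`, the `fetch` parameter, the 10-second timeout and the User-Agent header are all hypothetical choices made for illustration.

```python
from http.client import HTTPException
from itertools import islice
from urllib.request import Request, urlopen


def default_fetch(url, timeout=10):
    """Download one page with the standard library and return it as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_pages(urls, fetch=default_fetch, limit=100):
    """Try up to `limit` URLs; return (pages, skipped) without raising."""
    pages, skipped = [], []
    for url in islice(urls, limit):
        try:
            pages.append(fetch(url))
        except (OSError, ValueError, HTTPException) as error:
            # URLError, HTTPError and timeouts are OSError subclasses;
            # malformed URLs raise ValueError. Record the failure and move on.
            skipped.append((url, error))
    return pages, skipped
```

Passing the fetcher in as a parameter keeps the loop testable without network access, and the `skipped` list makes it easy to report how many of the attempted results were actually usable.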

[Screenshot: searching]

[Screenshot: results]