This is a simple yet powerful crawler that gets the list of all public APIs and stores it in a database.
- The code follows object-oriented programming principles, making use of encapsulation and constructor invocation.
- End-to-end handling of the server's authentication requirements and token expiration: a new token is fetched at regular intervals, well before the current one expires (see the token-refresh sketch after this list).
- Complete pagination support when fetching all data, covering index categories (e.g. Animals) as well as sub-categories (e.g. Dogs); paginated requests are illustrated in the sketch after this list.
- Avoids the server's rate limits by enforcing a regular cool-down period before sending the next request (the same sketch also shows this cool-down).
- Crawls all public APIs, stores them in an sqlite3 database, and also generates an Excel sheet for ease of visualization (see the storage sketch after this list).
- Special attention is paid to URL construction. Since each endpoint URL is generated dynamically, characters like ' ' and '&' are replaced with their percent-encoded values, i.e. %20 and %26 respectively (see the encoding sketch after this list).
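The token-refresh behaviour described above could look roughly like the sketch below. The auth endpoint, the `token` and `expires_in` field names, and the 60-second refresh margin are assumptions for illustration, not the exact implementation in `main.py`.

```python
import time
import requests

# Hypothetical auth endpoint; the real crawler's endpoint and credentials may differ.
AUTH_URL = "https://api.example.com/auth/token"


class TokenManager:
    """Fetch a bearer token and refresh it well before it expires."""

    def __init__(self, refresh_margin=60):
        self.refresh_margin = refresh_margin  # seconds before expiry to refresh
        self.token = None
        self.expires_at = 0.0

    def get_token(self):
        # Refresh when the token is missing or close to expiring.
        if self.token is None or time.time() >= self.expires_at - self.refresh_margin:
            response = requests.get(AUTH_URL, timeout=10)
            response.raise_for_status()
            data = response.json()
            self.token = data["token"]                           # assumed field name
            self.expires_at = time.time() + data["expires_in"]   # assumed field name
        return self.token
```

Refreshing ahead of expiry means a request is never sent with a token that lapses mid-flight.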
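Pagination and the cool-down between requests could be combined as in this minimal sketch. The endpoint URL, the `categories`/`totalPages` response fields, and the response shape are assumptions; the 20-second pause matches the interval mentioned in this README.

```python
import time
import requests

# Hypothetical paginated endpoint; the real API's URL and parameters may differ.
BASE_URL = "https://api.example.com/categories"
COOL_DOWN_SECONDS = 20  # per-request interval mentioned below


def fetch_all_pages(headers):
    """Walk every page of a paginated endpoint, pausing between requests."""
    results, page = [], 1
    while True:
        response = requests.get(
            BASE_URL, params={"page": page}, headers=headers, timeout=10
        )
        response.raise_for_status()
        payload = response.json()
        results.extend(payload.get("categories", []))   # assumed response shape
        if page >= payload.get("totalPages", page):      # assumed field name
            break
        page += 1
        time.sleep(COOL_DOWN_SECONDS)  # cool down to respect the server's rate limit
    return results
```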
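Storing the crawled rows in sqlite3 and mirroring them to an Excel sheet could be done roughly as follows. The table name, column layout, and file names are assumptions for illustration, not the crawler's actual schema.

```python
import sqlite3

import pandas as pd


def store_and_export(rows, db_path="apis.db", excel_path="apis.xlsx"):
    """Persist crawled rows to SQLite and export the same data to Excel."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS apis (category TEXT, name TEXT, url TEXT)"
        )
        conn.executemany("INSERT INTO apis VALUES (?, ?, ?)", rows)
        conn.commit()
        # Read back through pandas so the Excel sheet mirrors the database contents.
        df = pd.read_sql_query("SELECT * FROM apis", conn)
    df.to_excel(excel_path, index=False)  # writing .xlsx requires openpyxl
```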
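The percent-encoding of dynamically built URLs can be handled with `urllib.parse.quote`, as in the sketch below; the crawler may perform the replacement differently (e.g. with plain string replacement), but the result is the same encoding described above.

```python
from urllib.parse import quote


def encode_category(category):
    """Percent-encode characters that are unsafe in a dynamically built URL."""
    # quote() replaces ' ' with %20 and '&' with %26, among other unsafe characters.
    return quote(category, safe="")


# Example: encode_category("Dogs & Cats") -> "Dogs%20%26%20Cats"
```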
- Open a command prompt or terminal.
- Run `git clone https://github.com/purvansh11/CrawlAPIs.git` in any directory.
- Run `cd CrawlAPIs`.
- Run `pip install -r requirements.txt`.
- Run `python main.py`.
- Since a request to each sub-category is sent only every 20 seconds, generating the final output takes about 45 minutes.
- Once the code runs successfully, check the output of the sample query in the command prompt.
- In the same directory (CrawlAPIs), open the generated Excel sheet to visualize the API list.
- The final output consists of 640 rows of data.
- All the points from the "Points to be achieved" section have been taken care of.
- Given more time, I would improve two things. First, I would optimize the code, since it currently takes 45 minutes to produce the output file. Second, I would deploy it as a tool that lets any user enter an SQL query and returns the relevant data in whatever form is required.
Feel free to reach out to me at purvansh11@gmail.com with any feedback :)