Data-Collector

Collect data from structured sources on the domain - "Companies registered in Telangana"

Primary Language: Jupyter Notebook

Note: The data folder in this repository is out of date because of size constraints. The current data folder is located at https://drive.google.com/drive/folders/122s1rhnApXVexOag-IvDUW1pEmWbukFE?usp=sharing.

The final dataset is data/integrated.json. Because the file is large, it is also available at https://drive.google.com/drive/folders/14RuhmEesHnLbOMxDmH0KzSa0YDJuVK95?usp=sharing.
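As a quick sanity check after downloading the dataset, a minimal Python sketch like the one below can report its top-level shape. The helper name `summarize_dataset` is hypothetical and not part of this repository, and the sketch assumes nothing about the schema beyond the file being valid JSON saved locally as data/integrated.json.

```python
import json
from pathlib import Path


def summarize_dataset(path):
    """Hypothetical helper: load a JSON file and return
    (top-level type name, number of top-level entries)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)  # reads the whole file into memory
    # Lists and dicts have a meaningful length; any other value counts as one entry.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    # Assumed local path; adjust it to wherever the file was downloaded.
    dataset_path = Path("data/integrated.json")
    if dataset_path.exists():
        kind, size = summarize_dataset(dataset_path)
        print(f"{dataset_path}: top-level {kind} with {size} entries")
    else:
        print(f"{dataset_path} not found; download it from the Drive folder first")
```

Since `json.load` reads the entire file at once, this approach may need a streaming parser instead if the file does not fit in memory.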

Repository Structure:

The repository has a single main branch, organized into the following folders:

  • The data folder holds all the intermediate data and the final dataset, integrated.json. The copy in the repository is out of date; the current data folder is at https://drive.google.com/drive/folders/122s1rhnApXVexOag-IvDUW1pEmWbukFE?usp=sharing. The folder is divided into subfolders, one for each source tried.
  • The src folder holds all the code. It is divided into subfolders, one for each source tried and one for the final integration. All code is documented.
  • The docs folder contains all relevant documents and reports.

Video Folder:

The video folder contains a subfolder called “Module Wise Videos”, with one video for each major module of the project, and finalvideo.mp4, which is a concatenation of all the module videos. Link: https://drive.google.com/drive/folders/1UvdJwSpws59-7u0dzs7D4J8NB-0fUgpP?usp=sharing