We also thank the DFINITY Foundation for their support and guidance throughout the project. We extend our heartfelt gratitude to Zesen Zhuang, whose contributions during the exploratory phases of this project, particularly in data querying and visualization, were invaluable.
- SQLite
  - Description
  - Use Cases
  - Local Management Tools
  - Cloud Hosting Options
- MySQL
  - Description
  - Use Cases
  - Local Management Tools
  - Cloud Hosting Options
- PostgreSQL
  - Description
  - Use Cases
  - Local Management Tools
  - Cloud Hosting Options
- Advanced Options
This project utilizes `pipenv` for package management. Ensure you have it installed and set up correctly by following the steps below.

Tip: We recommend using the ChatGPT code interpreter for debugging purposes.

- **Install Pipenv:** Install the package and confirm its version with:

  ```bash
  pip install pipenv
  pipenv --version
  ```
- **Add Pipenv to the System Path:** To make `pipenv` accessible from any terminal window, you might need to add it to your system's PATH. First, find the installation location:

  ```bash
  pip show pipenv | grep Location
  ```

  Then add it to the PATH, replacing `path_to_site-packages` with the actual path returned by the command above:

  ```bash
  export PATH="$PATH:/path_to_site-packages/pipenv"
  ```
- **Create the Virtual Environment:** Create and activate the project's virtual environment with:

  ```bash
  pipenv shell
  ```
The project can be configured to use various database management systems. For each option, we've detailed the primary local management tools and notable cloud hosting services available:
- **SQLite:**
  - Description: SQLite is a C-language library that provides a lightweight, serverless, self-contained SQL database engine.
  - Use Cases: Ideal for small to medium-sized applications, prototypes, or when a minimal setup with no configuration is preferred.
  - Local Management Tools:
  - Cloud Hosting Options: Given SQLite's serverless and embedded nature, dedicated cloud-hosting solutions are uncommon; SQLite databases are typically embedded within applications hosted on cloud platforms like AWS, Azure, or Google Cloud. See the usage sketch after this list.
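As a minimal illustration (not part of this project's required setup), SQLite can be used directly from Python's standard library; the database file, table, and values below are placeholders:

```python
import sqlite3  # ships with the Python standard library; no server needed

# Opens (or creates) a local database file in the working directory.
conn = sqlite3.connect("proposals.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS proposals (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("INSERT INTO proposals (title) VALUES (?)", ("Upgrade ledger canister",))
conn.commit()
print(cur.execute("SELECT id, title FROM proposals").fetchall())
conn.close()
```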
- **MySQL:**
  - Description: MySQL is an open-source relational database system known for its speed and reliability.
  - Use Cases: Suitable for a wide range of applications, especially web applications. See the connection sketch after this list.
  - Local Management Tools:
  - Cloud Hosting Options:
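For illustration, here is a hedged sketch of connecting to a MySQL server from Python using the mysql-connector-python package; the host, credentials, and database name are placeholders, not values defined by this project:

```python
import mysql.connector  # pip install mysql-connector-python

# Placeholder connection details; substitute your own server settings.
conn = mysql.connector.connect(
    host="localhost",
    user="app_user",
    password="app_password",
    database="proposals",
)
cur = conn.cursor()
cur.execute("SELECT VERSION()")  # simple connectivity check
print(cur.fetchone())
cur.close()
conn.close()
```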
- **PostgreSQL:**
  - Description: PostgreSQL is an open-source relational database that focuses on extensibility and compliance with SQL standards.
  - Use Cases: Ideal for complex applications requiring advanced data types or geospatial support. See the connection sketch after this list.
  - Local Management Tools:
  - Cloud Hosting Options:
    - Amazon RDS (PostgreSQL): AWS's managed service that makes it easy to set up, operate, and scale PostgreSQL deployments. Website
    - Google Cloud SQL for PostgreSQL: Managed PostgreSQL service offered by Google Cloud. Website
    - Azure Database for PostgreSQL: Fully managed, enterprise-ready PostgreSQL database service provided by Azure. Website
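As a sketch, connecting to PostgreSQL from Python is commonly done with psycopg2; the connection parameters below are placeholders and apply equally to a local server or any of the managed services above:

```python
import psycopg2  # pip install psycopg2-binary

# Placeholder parameters; point these at your local server or a managed
# instance (Amazon RDS, Cloud SQL, or Azure Database for PostgreSQL).
conn = psycopg2.connect(
    host="localhost",
    dbname="proposals",
    user="app_user",
    password="app_password",
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")  # simple connectivity check
    print(cur.fetchone())
conn.close()
```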
- **Advanced Options:**
  - GraphQL Integration:
    - Description: Transform your PostgreSQL database into a GraphQL API without manually defining all resolvers. See the query sketch after this list.
    - Tools:
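To make the idea concrete, here is a hedged sketch of querying such an auto-generated GraphQL API from Python with the requests library; the endpoint URL and schema field names are hypothetical and depend on the tool you choose:

```python
import requests  # pip install requests

# Hypothetical endpoint exposed by a PostgreSQL-to-GraphQL layer.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Hypothetical schema: field names are derived from your tables by the tool.
query = """
{
  allProposals(first: 5) {
    nodes { id title }
  }
}
"""

response = requests.post(GRAPHQL_URL, json={"query": query})
response.raise_for_status()
print(response.json())
```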
- **Initialize Git LFS:** Before you can use Git LFS, you must initialize it for your repository. This only needs to be done once per repository.

  ```bash
  git lfs install
  ```
- **Track Files:** You need to tell Git LFS which files to track. This is done using the `git lfs track` command. For example, to track all `.pkl` files, you'd run:

  ```bash
  git lfs track "*.pkl"
  ```

  This command updates the `.gitattributes` file, indicating which file patterns should be managed by LFS.
- **Add the Files:** Now, add the files to your repository like you would with standard Git:

  ```bash
  git add .
  ```

  Note: This will add all changes. If you want to add specific files, replace the `.` with the specific file paths.
- **Commit and Push:** Commit your changes:

  ```bash
  git commit -m "Add large files with Git LFS"
  ```

  Then push to your remote repository:

  ```bash
  git push origin main
  ```
Remember, each time you add a new type of file that you want to be managed by Git LFS, you need to use the `git lfs track` command with the appropriate file pattern. The `git lfs install` command, however, only needs to be run once per repository.
Kaggle is a renowned platform for data science competitions, public datasets, and knowledge sharing. One such invaluable dataset hosted on Kaggle is related to the Internet Computer Protocol NNS Proposals. The dataset can be accessed via the following link: ICP NNS Proposals Dataset.
The ICP NNS Proposals data is available in three primary file formats, catering to various analysis and computational needs:
- **Pickle:**
  - Description: Pickle is a Python-specific binary serialization format. It's efficient for storing Python objects, especially pandas data frames, but can't be easily read by other programming languages.
  - How to Use: In Python, you can load a pickle file using the pandas library:

    ```python
    import pandas as pd

    data = pd.read_pickle("path_to_file.pkl")
    ```
- **JSON (JavaScript Object Notation):**
  - Description: JSON is a lightweight data-interchange format that is easy for humans to read and write. It's commonly used for configuration files and data exchange between languages with different data structures.
  - How to Use: In Python, the `json` module can be used to load JSON files:

    ```python
    import json

    with open('path_to_file.json', 'r') as file:
        data = json.load(file)
    ```
- **CSV (Comma-Separated Values):**
  - Description: CSV is a plain-text format used for structured data. It's straightforward and supported by many data processing tools, databases, and spreadsheet applications.
  - How to Use: In Python, a CSV file can be loaded using the pandas library:

    ```python
    import pandas as pd

    data = pd.read_csv("path_to_file.csv")
    ```
To dive deep into the dataset and explore its potential, visit the provided Kaggle link. Kaggle offers an interactive environment, called Kaggle Kernels, which allows for in-browser data analysis. This feature is especially handy for those who want to explore the data without downloading it or setting up a local environment. We also provide a demo notebook that can be used as a starting point for your analysis in Kaggle Kernels: Notebook URL.
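If you prefer to work locally, the dataset can also be fetched programmatically with Kaggle's official Python client; the dataset slug below is a placeholder, so substitute the identifier from the Kaggle link above:

```python
from kaggle.api.kaggle_api_extended import KaggleApi  # pip install kaggle

api = KaggleApi()
api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json

# "owner/icp-nns-proposals" is a placeholder slug; replace it with the
# dataset's actual identifier from the Kaggle page.
api.dataset_download_files("owner/icp-nns-proposals", path="data", unzip=True)
```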
The Internet Computer Protocol (ICP) operates under a decentralized governance system known as the Network Nervous System (NNS). This dataset provides a comprehensive account of proposals within the NNS, capturing the democratic governance process of the Internet Computer.
Variable | Definition | Description | Frequency | Range | Unit | Type | Sample Observations
---|---|---|---|---|---|---|---
`action` | Type of action | Indicates the intent of the proposal (propose, reject, execute) | Per proposal | - | - | String | "ExecuteNnsFunction"
`action_nns_function` | Specific NNS function | Identifies the exact NNS function addressed by the proposal | Per proposal | - | - | String | "NnsCanisterUpgrade"
`deadline_timestamp_seconds` | Voting deadline | Timestamp for the end of the voting period | Per proposal | - | Seconds | Integer | 1620340878
`decided_timestamp_seconds` | Decision time | Timestamp when the decision on the proposal was made | Per proposal | - | Seconds | Float | 1620340878.0
`executed_timestamp_seconds` | Execution time | Timestamp when the proposal was executed | Per proposal | - | Seconds | Integer | 0
`failed_timestamp_seconds` | Failure time | Timestamp when the proposal was deemed unsuccessful | Per proposal | - | Seconds | Integer | 1
`id` | Proposal identifier | Unique ID for each proposal | Per proposal | 0-104331 | - | Integer | 3
`known_neurons_ballots` | Voting neurons | Record of neurons that voted | Per proposal | - | - | JSON | []
`latest_tally` | Vote tally | Most recent vote count | Per proposal | - | - | JSON | {'no': 0, 'yes': 31539577669890139}
`payload` | Proposal details | Detailed JSON object of the proposal | Per proposal | - | - | JSON | {...}
`proposal_id` | Designated proposal ID | ID for the proposal | Per proposal | 0-104331 | - | Integer | 3
`proposal_timestamp_seconds` | Proposal creation time | Timestamp for the inception of the proposal | Per proposal | - | Seconds | Integer | 1620339017
`proposer` | Proposer identifier | ID of the neuron that submitted the proposal | Per proposal | - | - | String | "35.0"
`reject_cost_e8s` | Rejection cost | Expense linked to a proposal's rejection | Per proposal | - | e8s (10^-8 ICP) | Integer | 100000000
`reward_status` | Reward status | Current state of the proposal's reward | Per proposal | - | - | String | "SETTLED"
`settled_at` | Conclusion time | Timestamp marking the proposal's conclusion | Per proposal | - | Date & Time | String | "2021-05-06 16:00:00"
`status` | Proposal status | Current standing of the proposal | Per proposal | - | - | String | "EXECUTED"
`summary` | Proposal overview | Concise overview of the proposal | Per proposal | - | - | String | "Upgrade ledger canister..."
`title` | Proposal title | Headline of the proposal | Per proposal | - | - | String | "Upgrade ledger canister to git commit 8a560f9..."
`topic` | Proposal topic | Primary theme or subject | Per proposal | - | - | String | "TOPIC_NETWORK_CANISTER_MANAGEMENT"
`updated_at` | Last update time | Last moment the proposal received modifications | Per proposal | - | Date & Time | String | "2021-08-05 15:50:43.155180"
`url` | Proposal link | Direct web link to the proposal | Per proposal | - | - | String | "GitHub Link"
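Several of the columns above are Unix timestamps in seconds; as a hedged sketch (assuming the pickle release of the dataset), they can be converted to pandas datetimes like this:

```python
import pandas as pd

df = pd.read_pickle("path_to_file.pkl")

# Convert epoch-second columns to datetimes; values of 0 (as in the
# executed_timestamp_seconds sample) likely mean the event never occurred.
for col in ["proposal_timestamp_seconds",
            "decided_timestamp_seconds",
            "executed_timestamp_seconds"]:
    df[col.replace("_seconds", "") + "_dt"] = pd.to_datetime(df[col], unit="s")

print(df[["proposal_id", "proposal_timestamp_dt"]].head())
```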
For detailed information on data generation and updates, visit our GitHub repository, which also includes auto-update scheduling provisions for this dataset.