🗳️ We are building a collaborative platform to create and collect key datasets.
Our public institutions are horrible at data management. This is a fact known to most researchers, students, and journalists. If you browse through government websites for long enough, you can find a plethora of data. The problem lies in how it is organized. Here is a set of problems we at WatchDog constantly have to deal with:
- Data is stuck inside poorly formatted PDF or PPT files.
- Data discovery happens almost by accident. Datasets are poorly organized and search functions are non-existent.
- Data is not machine-readable.
- The Government Open Data Portal (https://data.gov.lk/) exists but hasn’t been updated in years, making it practically useless.
- Data archiving practices are poor, so data disappears into the void.
The core platform, all datasets, governance, and discussions will live inside a single repository on GitHub. Here’s a breakdown of our thinking:
- Simplicity
  - By hosting everything in a simple repository on GitHub, we eliminate the need for:
    - External databases
    - Implementing user management and authentication logic, administration interfaces, dataset management logic, etc.
  - Contributions can happen through Pull Requests. This also comes with the ability to manage revisions, comments, and rejections.
  - You can request a dataset by simply raising an issue on GitHub. For non-technical users and for the sake of consistency, we’ll create a simple form on the public-facing portal which uses the GitHub APIs.
- Transparency
  - We noticed that the https://data.gov.lk/ portal lets users suggest and upload datasets. But there’s no transparency as to what happens to these requests or to uploaded data waiting for approval. By using GitHub’s Pull Request and Issues functionality, we can make sure that all of these contributions are visible to the public.
  - Git provides us with an audit trail and shows who changed what and when.
- Resiliency
  - If WatchDog ceases to exist as an organization, our community should be able to continue updating the portal without our direct involvement.
  - This can happen in two ways:
    - Transferring ownership of the repository to a different organization (action needed from the repository owner, Team WatchDog)
    - Forking the repository together with the core architecture and all underlying datasets (no action required on behalf of the repository owner)
To contribute a dataset:
- Clone the repository to your computer: `git clone <address>`
- Create a new branch for your dataset, using the branch naming convention `new-dataset/<dataset_identifier>`: `git checkout -b new-dataset/<dataset_identifier>`
- Create a new directory for your dataset inside `datasets/`.
- Add your files to the directory you created.
- Use the Meta Generator Tool to generate the metadata file for your dataset.
- Copy the generated JSON structure into a file named `meta.json` inside the directory you created.
- Here’s some information you should include inside the `Notes` section:
  - About the dataset
  - Links to the original source and retrieval dates
  - Collection methodology and tools used
  - Copyright information
  - Contributor information (optional)
- Stage and commit your changes: `git add datasets/` followed by `git commit -m "Insert commit notes here"`. New files must be staged first, because `git commit -a` only picks up files Git already tracks.
- Push your changes to the repository: `git push origin new-dataset/<dataset_identifier>`
- Go to the branch on the GitHub web interface and create a Pull Request to the `main` branch.
- Members of our core team will review your Pull Request and let you know if any changes are necessary.
- When our core team approves your changes, the data portal will be updated with your dataset. Congratulations and thank you for your contribution! 🥳
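
The command-line part of the steps above can be sketched as a small shell script. This is an illustration under assumptions, not part of the repository: the function names `branch_for` and `commit_dataset` are hypothetical, and the script assumes you run it from the root of your clone with your files already placed in `datasets/<dataset_identifier>/`. It refuses to commit when `meta.json` is missing, and it leaves pushing and opening the Pull Request as manual steps.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the repository.
set -eu

# Print the branch name for a dataset, following the
# new-dataset/<dataset_identifier> convention.
branch_for() {
  printf 'new-dataset/%s\n' "$1"
}

# Create the branch, stage the dataset directory, and commit it.
#   $1 = dataset identifier, $2 = commit message
commit_dataset() {
  # Stop early if the metadata file has not been added yet.
  if [ ! -f "datasets/$1/meta.json" ]; then
    echo "datasets/$1/meta.json is missing" >&2
    return 1
  fi
  git checkout -b "$(branch_for "$1")"
  # "git add" is needed because the dataset files are new (untracked).
  git add "datasets/$1"
  git commit -m "$2"
}
```

With a made-up identifier such as `census-2012`, you would call `commit_dataset census-2012 "Add census-2012 dataset"` and then push with `git push origin "$(branch_for census-2012)"`.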
To request a dataset:
- Create a GitHub account (github.com) if you don’t already have one.
- Navigate to the Issues section of the `databank-sri-lanka` repository: https://github.com/team-watchdog/databank-sri-lanka/issues
- Click on the “New Issue” button.
- Provide as much information as possible, including but not limited to:
  - Why are you requesting this dataset?
  - Links to the original data
  - How do you want the data to be formatted?
  - Any related files, included as attachments
- Submit the request by clicking on “Submit new issue”.
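
For reference, the same kind of request can also be filed without the web interface, through the GitHub REST API's issue-creation endpoint (`POST /repos/{owner}/{repo}/issues`). The sketch below is one possible approach, not the portal's actual implementation: the helper names and the `GITHUB_TOKEN` environment variable (holding a personal access token) are assumptions, and the JSON escaping only covers backslashes, double quotes, and newlines.

```shell
#!/bin/sh
# Sketch only: helper names and GITHUB_TOKEN are assumptions.
set -eu

REPO="team-watchdog/databank-sri-lanka"

# Escape backslashes, double quotes, and newlines for use inside a JSON string.
# Other control characters (tabs, etc.) are not handled by this sketch.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

# Build the JSON request body from an issue title ($1) and body ($2).
issue_payload() {
  printf '{"title":"%s","body":"%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Create the issue. Requires curl and a token allowed to open issues.
request_dataset() {
  : "${GITHUB_TOKEN:?set GITHUB_TOKEN to a personal access token}"
  curl -sS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -d "$(issue_payload "$1" "$2")" \
    "https://api.github.com/repos/$REPO/issues"
}
```

A call would look like `request_dataset "Dataset request: <name>" "Why, links to original data, preferred format"`, mirroring the information listed in the steps above.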
- For files larger than 20MB, upload them to a third-party service such as S3 and add the link to the `Notes` section of the dataset’s metadata file.
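
To spot files that cross this limit before you commit, a check along these lines may help. It is a minimal sketch: the function name `large_files` is hypothetical, and it treats the 20MB limit as 20 MiB, which is an assumption about how the limit is measured.

```shell
#!/bin/sh
# Sketch only: large_files is a hypothetical helper name.
set -eu

# List regular files under directory $1 that are larger than 20 MiB.
# The "M" size suffix is supported by GNU and BSD find, but is not in POSIX.
large_files() {
  find "$1" -type f -size +20M
}
```

Running `large_files datasets/<dataset_identifier>` prints one path per oversized file and prints nothing when every file is within the limit.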
To run the project locally:
- Install NodeJS.
- Run `yarn install`.
- Run `yarn dev`.
- Visit http://localhost:3000.