This project is the data repository code behind the cloud-hosted Microsoft Research Open Data Repository. The code can be used to instantiate a highly customizable, cloud-based data repository for hosting and sharing datasets under a flexible licensing infrastructure with a high level of security and privacy. It can deploy datasets directly to an Azure Data Science VM, allowing development with popular open source tools such as Python and R in Jupyter notebooks, as well as deep learning frameworks.
This code is forked from the original code that implements Microsoft Research Open Data. See the Microsoft Research Open Data site for a working example of what a data repository instance based on this code looks like.
The repository code can be used to:
- Create a storage service for producers to securely store datasets in the Azure cloud
- Create customizable, fully responsive, cross-device (web + mobile + tablet), and cross-platform user interfaces:
  - for users to browse, download, or consume the datasets on Azure computational resources
  - for dataset administrators to onboard, administer, and monitor usage of these datasets
- Create distinct authenticated experiences for regular users and organizational users
- Allow dataset owners to administer datasets once they have been onboarded
- Monitor dataset usage analytics, which you can visualize in Power BI or in your own dashboards through API access
A detailed technical overview can be viewed HERE.
Detailed instructions on deploying the repository are available HERE.
Once deployed, guidance on onboarding and administering datasets is available HERE.
Additional videos that demonstrate application functionality can be found HERE.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT license.
For any questions, email us at odr@microsoft.com