filecoin-project/devgrants

Cloud Sentinel Proposal


Open Grant Proposal:

Project Title: Cloud Sentinel

Proposal Category: Integration/Storage

Individual or Entity Name: Emmanuel Damilare Adediji

Proposer: emmYgd

Project Repo(s): Not yet created; will be published as open source upon approval at https://github.com/emmYgd

(Optional) Filecoin ecosystem affiliations: None

(Optional) Technical Sponsor: None

Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, GPL, and APACHE2 licenses?: Yes

Project Summary

INTRODUCTION:

In data breaches, cyber warfare, hacking, hijacking and other cyber-attacks, the adversary typically aims to gain access to and insight into some form of data, whether for data theft and piracy, cyber-terrorism, cyber-bullying, ransom demands, etc.

For enterprises, this could mean embarrassment, profit or revenue loss, a dip in the bottom line, customer or regulatory-compliance lawsuits and forced closure; for individuals, it could mean depression or suicidal inclinations.

Aside from general protection requirements, individuals and enterprises alike also need protection that covers aspects unique to their use-cases - for instance, “Can I set this data’s expiration?”, “Will this data be accessible outside France - we don’t want it to be!”, “This design is a business secret, we don’t want it accessed outside a 10-meter radius of our office!”, “How can I comply with the digital security regulations that apply to my business in my country?”

In short, they want not only to secure and protect their data in ways unique to them, but also holistically govern their data even when it is sent outside their domain.

The digital landscape has been changing in recent times; new breaches & attacks are being recorded at lightning speed. Security, protection, standardization & governance at the data layer, exposed over a myriad of networks & protocols, form a unique strategy.

NETWORKS AND PROTOCOLS:

There are diverse and varied networks, protocols & infrastructure available for use by individuals, enterprises, governmental parastatals and organizations to move, share and store data. Examples include:

  • high level networks & protocols e.g. HTTP(s)/REST, SOAP, RPC, etc.

  • messaging/queuing networks & protocols e.g. Kafka, AMQP, etc.

  • Storage Networks & protocols e.g. Filecoin, IPFS, Amazon-S3, etc.

It must be noted that both modern and long-established enterprises are giving decentralized, distributed networks, protocols, infrastructure and solutions strong consideration, while many others have already embraced and implemented them.

Decentralized Storage - Filecoin

The future of data storage is truly decentralized & distributed in a zero-trust manner. This has already begun to shift the focus from the initial centralized storage solutions, based on the services of single storage providers or vendors, to truly decentralized, distributed storage solutions that leverage the power of various robust implementations of the blockchain and allied technologies.

In turn, this has begun to yield effectiveness and efficiency through cost reduction, speed, high availability, censorship resistance, etc.

Filecoin, with its underlying technology IPFS, is a content-addressable decentralized storage network that offers the aforesaid. How this is made possible is explained well in the associated whitepapers.
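To make "content-addressable" concrete, here is a minimal sketch of the idea, not of IPFS or Filecoin themselves. It assumes a plain SHA-256 hex digest as the address and an in-memory dictionary as the store; real IPFS content identifiers (CIDs) use multihash encoding and chunk large files, which this toy omits. The `ContentStore` class is a hypothetical name used only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory store keyed by the SHA-256 digest of the content (an assumption)."""

    def __init__(self):
        self._blocks = {}  # address (hex digest) -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hashing on read lets the reader detect tampering without trusting the store.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"hello filecoin")
```

Storing the same bytes twice yields the same address, which is why content-addressed networks deduplicate naturally and why a retrieved block can be verified by anyone who knows its address.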

SUMMARY PROPOSED SOLUTION:

Cloud Sentinel exposes and implements data security & protection, various encryption use-cases, data standardization and governance, diverse cryptographic operations, key management & rotation, (de-)compression, robust policy enforcements & complex permission provisioning, authorization & authentication, data logging and finger-printing, provenance and reporting, over diverse networks/protocols server and client implementations, on top of the Filecoin network.

It also offers secured data transport, movement, migration, partitioning & backup in the same manner - on top of the Filecoin network.

The server implementation is proposed to be hybrid: while being a server infrastructure in its own right, it also serves as a gateway built on top of and interfacing with the Filecoin storage network.

Hence, the implementation performs a dual service of not only exposing data standardization, security, protection, governance, provenance and such other features as mentioned above, but also acting as a gateway to our decentralized and distributed storage - the Filecoin network.
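One of the features listed above, data logging with provenance and fingerprinting, could be approached with a hash-chained log. The sketch below is an illustration under stated assumptions rather than the proposed design: `ProvenanceLog`, its event fields and the all-zero genesis value are hypothetical, and events are assumed to be JSON-serializable dictionaries.

```python
import hashlib
import json


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Each entry's fingerprint covers the previous fingerprint plus the event body.
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


class ProvenanceLog:
    """Hypothetical append-only log; changing any past event breaks the chain."""

    GENESIS = "0" * 64  # assumed starting value for the chain

    def __init__(self):
        self.entries = []  # list of (event, hash) tuples

    def append(self, event: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = _entry_hash(prev, event)
        self.entries.append((dict(event), digest))
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for event, digest in self.entries:
            if _entry_hash(prev, event) != digest:
                return False
            prev = digest
        return True


log = ProvenanceLog()
log.append({"actor": "alice", "action": "upload", "object": "design.pdf"})
log.append({"actor": "bob", "action": "read", "object": "design.pdf"})
```

Calling `log.verify()` recomputes every fingerprint from the start, so editing an earlier event after the fact makes verification fail.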

Introducing the EMMA protocol:

At the core of my proposed solution is the Electronic Masking for Many (EMMA) Protocol. This protocol is implemented to prepare data received in various forms and from other disparate and often diverse networks, protocols, services and infrastructure into uniform formats that are compact, discoverable, standardized, ACID-compliant, transactional, indexable, searchable, schema-enforceable, versioned, key-time-stamped, storage-ready and performant. Out-of-core and memory-efficient compute strategies, architecture and designs will be implemented.
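The proposal does not specify the EMMA record format, so the following is only a guess at what a uniform, versioned, time-stamped, storage-ready record might look like. Every name here (`Envelope`, `wrap`, `unwrap`, the field names and the `raw/v1` schema label) is hypothetical, and zlib plus SHA-256 stand in for whatever compression and fingerprinting the protocol would actually use.

```python
import hashlib
import json
import time
import zlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical uniform record; every field name here is an assumption."""
    schema: str        # schema identifier the payload is expected to follow
    version: int       # version number of the data stored under this key
    key: str           # caller-chosen logical key
    created_at: float  # UNIX timestamp taken at wrap time
    digest: str        # SHA-256 of the uncompressed payload
    payload_hex: str   # zlib-compressed payload, hex-encoded for JSON transport


def wrap(key: str, payload: bytes, version: int = 1, schema: str = "raw/v1") -> Envelope:
    return Envelope(
        schema=schema,
        version=version,
        key=key,
        created_at=time.time(),
        digest=hashlib.sha256(payload).hexdigest(),
        payload_hex=zlib.compress(payload).hex(),
    )


def unwrap(env: Envelope) -> bytes:
    payload = zlib.decompress(bytes.fromhex(env.payload_hex))
    # Reject the record if the payload no longer matches its stored digest.
    if hashlib.sha256(payload).hexdigest() != env.digest:
        raise ValueError("payload digest mismatch")
    return payload


def to_json(env: Envelope) -> str:
    # sort_keys gives a stable serialization, which keeps records comparable.
    return json.dumps(asdict(env), sort_keys=True)


def from_json(text: str) -> Envelope:
    return Envelope(**json.loads(text))


env = wrap("report-2024", b"patient,score\n1,0.9\n")
```

A record produced by `wrap` survives a round trip through `to_json` and `from_json`, and `unwrap` returns the original bytes only if the digest still matches.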

In conclusion, all the aforementioned will be accomplished using a blend of solid designs & architecture, battle-hardened infrastructure & systems, well-optimized algorithms and data structures & a robust technology strategy that scales seamlessly.

Impact

By adopting the aforementioned technology, the projected value involves the following:

Existing infrastructural improvements:
The solution will add value to the existing Filecoin infrastructure and ecosystem by providing the various aforesaid offerings. Filecoin becomes more like a secured data lakehouse system upon which even more complex systems could be built.

Storage Decentralization and Vendor Neutrality:
Rather than having to delegate to a centralized vendor (storage provider), user data coming from disparate sources is stored on the Filecoin network, hence eliminating reliance on centralized storage providers and encouraging Filecoin’s adoption.

Advanced Data Privacy:
A blend of fast and strong encryption protects stored data from unauthorized access, even by Filecoin network operators.

Future-Proof System:
A mixture of the default decentralization and the post-quantum cryptography implemented by the system helps ensure a truly future-proof system against quantum attacks and breaches regarding user data.

Granular Access Control:
Users define who can gain access to their data, how their data can be accessed, where it can be accessed and when it can be accessed through various policy enforcement strategies, permission controls and rules. In addition, they can also revoke access at any time.
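A minimal sketch of the who/where/when checks and revocation described above, assuming a simple in-memory policy object. `AccessPolicy` and its fields are hypothetical, and how the caller's country is determined (for example, IP geolocation) is deliberately left outside the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessPolicy:
    """Hypothetical policy: who may read, from where, and until when."""
    allowed_users: set
    allowed_countries: set   # ISO 3166-1 alpha-2 codes, e.g. {"FR"}
    expires_at: datetime     # timezone-aware expiry of the data
    revoked_users: set = field(default_factory=set)

    def revoke(self, user: str) -> None:
        # Revocation takes effect on the next permits() call.
        self.revoked_users.add(user)

    def permits(self, user: str, country: str, now: datetime) -> bool:
        if user in self.revoked_users or user not in self.allowed_users:
            return False
        if country not in self.allowed_countries:
            return False
        return now < self.expires_at


policy = AccessPolicy(
    allowed_users={"alice", "bob"},
    allowed_countries={"FR"},
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
```

With this policy, a request by `alice` from France is allowed until the expiry date, while the same request from another country, after expiry, or after `policy.revoke("alice")` is refused.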

Complex Cryptography and Governance Use-Cases/Scenarios:
Complex Use-cases and Scenarios like encrypted search, homomorphic/malleable cryptography, location-aware access control, role-based multi-party permission provisioning and other cross-cutting concerns are implemented by the system.

Wider Adoption:
Exposing the system over an array of networks and protocols, settings, configurations and possibly later over User Interfaces (desktop, web and mobile) also serves as a plus regarding flexibility, ease of use and adoption by different platforms and users.

Zero Trust Protection:
In a multi-connected environment, this adds a layer of security in which protection travels with the data even outside the domain of the original data owner.

Cost Reduction:
The cost of storage by traditional centralized providers is usually higher when compared with that of decentralized technologies. The offering of highly optimized compression algorithms implemented by the proposed system also goes a step further to reduce storage costs for individuals and organizations alike.
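As a rough illustration of the fast-versus-strong compression trade-off, the sketch below compares two zlib levels on a repetitive sample. zlib and the synthetic CSV-like sample are assumptions made for the example; the proposal does not name the compression algorithms it would use, and actual savings depend entirely on the data.

```python
import zlib


def compare_levels(data: bytes) -> dict:
    """Return the byte sizes of the input and of two zlib compression levels."""
    fast = zlib.compress(data, 1)    # level 1: favors speed
    strong = zlib.compress(data, 9)  # level 9: favors compression ratio
    return {"original": len(data), "fast": len(fast), "strong": len(strong)}


# Synthetic, highly repetitive sample (an assumption; real data compresses less well).
sample = b"timestamp,sensor,value\n" + b"2024-01-01T00:00:00Z,temp,21.5\n" * 2000
sizes = compare_levels(sample)
```

Since stored bytes are what a storage deal pays for, the ratio `sizes["strong"] / sizes["original"]` gives a first estimate of how much of the storage bill compression could remove for a given dataset.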

True Data Ownership:
This combined approach not only empowers data owners (viz individuals or enterprises), but also gives them back control over their valuable data in their own way, hence, aligning with the core values of the internet of the future - a truly secured, decentralized and distributed web where, no matter their location, user data can be truly said to be SECURED AND OWNED.

Lowering the barrier of entry:
By providing services such as secured data migration, backup and partitioning, the barrier of entry experienced by organizations and individuals in embracing the web-3 due to the data silos that they already have with existing centralized cloud storage providers is greatly lowered. Not only can they move to the decentralized storage, they automatically have access to all the services including security, cryptography, governance etc. offered by the proposed solution and its underlying decentralized Filecoin infrastructure.

Flexibility & Dynamism:
The high penetration of the proposed system in its exposure over networks and protocols like HTTP(s)/REST on one end and possible expansions to RPC, MQTT even for IOT devices - makes the system flexible and dynamic in its adaptation and use-cases.

Outcome

Deliverables:

While there are plans to expand the system, the deliverables for the early working version are laid down below:

Stable version of Cloud Sentinel exposing Cryptography, Standardization, Compression, Governance, Monitoring and the other concerns mentioned above.

Stable RESTFUL API 
Stable gRPC API
Alternate Web UI for the Admin Panel Dashboard
Documentation and usage guides

Success Metrics:

This project has not been implemented yet. It’s still at the architectural, planning, and envisioning stage. As such, success metrics are at best still abstract. However, both technical metrics (performance evaluation, throughput, latency, deployment and scaling metrics) and non-technical metrics (community feedback, adoption rate, ubiquity and versatility, dynamism, etc.) will be considered.

Adoption, Reach & Growth Strategies

Target Audience:
Any individual, enterprise, and organization that deals with data - video files, documents, e-books, pictures, business designs & trade secrets, health-related reports and images, texts and numerical data, metrics, etc.

More specifically, the product’s applications include but are not limited to:
Ordinary/Everyday Users
Hospitals & Healthcare Outfits
Modern Data Engineering companies
Machine Learning Outfits
Banks & Financial Institutions
Schools & Educational Institutions
Entertainment Industry
Governmental Organizations & Parastatals
Military
Storage Network Solutions
Resource Constrained Infrastructures (e.g. IOT)
Legal and paralegal Organizations
Life Research (Bio-informatics, Genomics)

Customer Acquisition:
Once a sufficient milestone is reached, we will first run a holistic, targeted digital marketing campaign across relevant social media platforms. The aim is to get the word out.

Another strategy is reaching out personally to organizations, companies and parastatals that might be interested in such technology. Another way is through community engagements, seminars and tech events.

Reach:
Basically, the service will be offered on a subscription basis, primarily as B2B (Business to Business) via APIs, SDKs and glue layers with existing enterprise infrastructure, and also as B2C (Business to Customer) through easy-to-use web and mobile apps.

In addition to that, there are plans to render a B2G (Business to Government) service in the form of government contracts and public projects in the cyber-space.

Development Roadmap

Week 1:
MAJOR
Project/Infrastructure setup
Environment installation and Configuration of Docker, Poetry, Celery, Redis, etc.
Setting up project directory structure
Initializing abstract classes (interfaces), base design templates and dependencies for all layers (Server Gateway, Storage Layer, API Layer)

MINOR
Planning, Strategizing & Design-Thinking on System Interactions and various Flow Considerations

Week 2 - 3:
MAJOR
Server Gateway Core (1)
Non-Deterministic Crypto Strategy
(As-)Symmetric Crypto Algos
Block/Stream Crypto Algos
Post-Quantum Crypto Algos
Fast Compression Algos
Strong Compression Algos
Data Metadata & Standardization(1)

MINOR
Unit/Integration testing for Server Gateway classes
Beginning the RESTFUL API planning and setup

Week 4 - 6:
MAJOR
Server Gateway Core (2)
Key Storage, Management & Rotation
Authorization & Authentication
Permission Provisioning
Caching (1)
Monitoring & Logging(1)
RESTFUL API exposure (1)

MINOR
Unit/Integration testing for Server Gateway classes
Planning, Strategizing & Design-Thinking on Data Governance Considerations
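The key storage, management and rotation item in the Week 4 - 6 block could be prototyped along these lines. This is a sketch under loud assumptions: the proposal names Cosmian KMS for key management, whereas this example swaps in an in-memory dictionary, and it produces HMAC-SHA256 integrity tags rather than performing encryption. `KeyRing` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


class KeyRing:
    """Hypothetical versioned key store; old versions stay available for verification."""

    def __init__(self):
        self._keys = {}   # version number -> 32-byte key
        self.current = 0
        self.rotate()     # create version 1

    def rotate(self) -> int:
        # New tags use the newest key; earlier keys are kept, not deleted.
        self.current += 1
        self._keys[self.current] = secrets.token_bytes(32)
        return self.current

    def sign(self, message: bytes):
        tag = hmac.new(self._keys[self.current], message, hashlib.sha256).hexdigest()
        return self.current, tag

    def verify(self, message: bytes, version: int, tag: str) -> bool:
        key = self._keys.get(version)
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking information through comparison timing.
        return hmac.compare_digest(expected, tag)


ring = KeyRing()
v1, tag1 = ring.sign(b"design.pdf")
ring.rotate()
v2, tag2 = ring.sign(b"design.pdf")
```

Storing the key version next to each tag is what lets data signed before a rotation remain verifiable afterwards; a production design would also need a policy for retiring old versions.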

Week 7 - 9:
MAJOR
Server Gateway Core (3)
Data Governance Use-cases
Policy Enforcement
Data Metadata & Standardization(2)
Format Preserving Crypto (Sensitive Metadata Masking)
Caching (2)
Monitoring & Logging(2)
Data Provenance & Fingerprinting
RESTFUL API exposure (2)

MINOR
Unit/Integration testing for Server Gateway classes
Planning, Strategizing & Design-Thinking on Persistence Layer Integration.

Week 10 - 12:
MAJOR
Persistence Layer Implementations
Transactioning & STM
Filecoin Storage/Retrieval interactions
Dynamic Space Allocation & Query
Persistence Time- & Key- Versioning and Stamping
Out-of-core computing

MINOR
Unit/Integration testing for Persistence Layer Classes
RESTFUL API exposure (3)
Planning, Strategizing & Design-Thinking on gRPC.

Week 11 - 12:
MAJOR
Documentation and user guide (REST)
gRPC Implementation (1)
Finalizing the Persistence Layer
Thorough Integrations & Functional Testing of Persistence and Server Gateway Layers.
System Deployment (1)

MINOR
Holistic Debugging and optimization
Alternate Web UI Dashboard Panel - Designing & Envisioning

Week 13 - 14:
MAJOR
System Deployment (2)
Documentation and user guide (REST) - (2)
Performance metrics gathering
Documentations and user guide (gRPC)
Alternate Web UI Dashboard Panel - Full Implementation & Backend connection for Admin Panel - (1)

MINOR
Holistic, thorough & system-wide features and Integrations Testing

Week 15 - 16:
MAJOR
Alternate Web UI Dashboard - Full Implementation & Backend connection for Admin Panel - (2)
Documentations and user guide (gRPC) - (2)

MINOR
Enhancement thinking, Improvements envisioning, Finalization, Sign-off

Total Budget Requested

Budget Breakdown:

Development (4 Months): $19,200 (48 hrs/week, $25/hr)
Justification: Working 48 hours per week for 4 months allows for focused and efficient development.

Deployment: $900
Justifications:

Cloud Resources: ($500)
Justification: Estimated cost for a CPU/IO-bound virtual machine instance on a cloud platform, leveraging the free tier as much as possible

CI/CD: ($200)
Justification: For automated builds and deployment

Hosting/Domain Name: ($200)
Justification: Cost of domain registration and general hosting for the server gateway service

Hardware: $1,400
Justifications:

1 MacBook Pro: ($1300)
Justification: This budget allows for a used MacBook Pro with sufficient performance to supplement the one I currently have. Exploring used options helps minimize budget allocation for essential project components.

1 5G Router for Internet Connectivity: ($100)
Justification: As I live in a developing country with unstable internet access, a standby device for connectivity will boost my productivity and efficiency in getting the job done.

Workspace Rental: $700
Justification: Renting a dedicated workspace with constant electricity is necessary for the fast and effective completion of the project. As I live in a developing country with an unsteady power supply, this will boost my productivity and workflow.

Contingencies: $200
Justification: Unforeseen and unplanned situations and circumstances

Total Project Budget: $22,400
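The line items above can be cross-checked with a few lines of arithmetic. The only assumption is that the 4 months of development correspond to the 16 weeks of the roadmap.

```python
HOURS_PER_WEEK = 48
WEEKS = 16       # assumption: 4 months taken as the 16 roadmap weeks
RATE_USD = 25

development = HOURS_PER_WEEK * WEEKS * RATE_USD
deployment = 500 + 200 + 200   # cloud resources, CI/CD, hosting/domain name
hardware = 1300 + 100          # MacBook Pro, 5G router
workspace = 700
contingencies = 200

total = development + deployment + hardware + workspace + contingencies
print(development, deployment, hardware, total)
```

Under that assumption the development figure comes to $19,200 and the categories sum to the stated total of $22,400.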

Maintenance and Upgrade Plans
Subsequent maintenance and upgrade plans are envisioned.

UPGRADES

  • Intention to add a front-end user interface for ordinary and non-technical users to suit their use-case (different from the admin UI app explicated above).

  • Making the system become more intelligent and “living” like a sentinel by implementing the following:

    Anomaly Detection: Using Random Forest Machine Learning models on various standard log and monitoring data to detect unusual patterns that could strongly indicate intrusion or breach.

    Signature-Based Detection: Scanning uploaded files and comparing them against a database of known malware signatures.

    Notification/Alerting System: Notification or alerting system (either via registered emails or phone numbers) to notify stakeholders of possible breaches, perceived threats and other pre-set conditions (based on configuration or rules)

  • Expansion to other Networks/Protocols/Use-cases:
    Expanding the system and its various functionalities in order to expose it over other protocols like: MQTT for IoT devices, SOAP for alternate consumption, Apache Thrift/Protocol Buffers for large enterprises, or even WebRTC for streaming use-cases.

  • Enterprise Integration:
    Integration with popular and in-demand enterprise stacks like the ELK Stack for better and even more robust monitoring, reporting, logging, visualization and performance tracking.

  • Data migration, partitioning or backups onto the Filecoin network from other sources (especially centralized storage servers) via the Cloud Sentinel gateway in order to prevent vendor lock-in, especially for already existing data silos. Existing stacks like Rclone and Kafka could be leveraged.

  • Homomorphic/Malleable Cryptography:
    In combination with other systems suited for machine learning and data science operations in and outside the Filecoin eco-system, Cloud Sentinel could be made inter-operable to secure data even at the computation layer - while still ensuring conventional secured storage on Filecoin even after computations.

  • Other language libraries and SDKs:
    Libraries and SDKs can be written in other languages that will consume the various APIs underneath. SDKs and libraries could be implemented in such well-known languages as Java, C#, Node-JS, PHP, and others.
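Of the upgrades listed above, signature-based detection is the simplest to illustrate. The sketch below assumes signatures are plain SHA-256 digests of known-bad files; `KNOWN_BAD` and its single entry are made up for the example and do not come from any real malware database.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad files.
KNOWN_BAD = {hashlib.sha256(b"EXAMPLE-MALICIOUS-PAYLOAD").hexdigest()}


def is_known_malware(content: bytes, signatures=KNOWN_BAD) -> bool:
    """Return True if the upload's digest appears in the signature set."""
    return hashlib.sha256(content).hexdigest() in signatures


flagged = is_known_malware(b"EXAMPLE-MALICIOUS-PAYLOAD")
clean = is_known_malware(b"quarterly-report")
```

Exact-digest matching misses any file that differs from a known sample by even one byte, so a real scanner would need richer signatures than this.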

Note that: all the aforementioned upgrade plans are very real, broad and intense developmental efforts that will be presented in various other grant proposals to be considered after the completion and signing off on the current project scope.

MAINTENANCE
Active collaboration with the community and ecosystem in order to gather feedback, suggestions, constructive criticism, possible alternative strategies and probable evolution insights so as to keep maintaining and improving this project.

Project dependencies will be updated from time to time to account for advances in scope and technology.

Team

This is a single developer project.

Team Members

Name: Emmanuel Damilare Adediji

LinkedIn: https://www.linkedin.com/in/emmanuel-damilare-adediji

GitHub: https://github.com/emmYgd/

Relevant Experiences & Team Code Repositories
I am a polyglot software engineer with 6+ years of experience. While my repository (github.com/emmYgd) showcases lots of projects that I have been a part of, for the sake of this application, I will be specific about some examples:

(1)
A fault-tolerant system for an IoT infrastructure written in both Java and Groovy.
Role: My part was to implement the networking and encryption layer - I utilized the Java Networking API and Java Cryptography Extension(JCE). Concurrency was handled using the Groovy GPARS library.

Code Patches and Samples can be found here:
https://github.com/emmYgd/IOT-centric-Server-Projects

(2)
A cloud infrastructure whose core is in the similitude of a message queue.
Role: My part was to implement the event-bus layer using the “subscribe-consume” model - I utilized Python’s Twisted networking and Thespian for actor-based concurrency.

Code Patches and Samples can be found here:
https://github.com/emmYgd/Event-Bus

(3)
Encryption and base-layer components implemented as part of academic research for a client’s freelance post-grad project on Data Governance.
Role: My part was to implement the base layer core for the research project in Python. I utilized PyNaCl, PyCryptoDome and documented their Data Standards.

Code Patches and Samples can be found here:
https://github.com/emmYgd/CloudGuards

In addition to that, I have participated in various projects involving different programming languages, technology stacks and frameworks.
From Backend API for an e-commerce shopping project at:
https://github.com/WiCartitOrg/WiCartIt-Laravel-BE to real estate app that connects house owners to prospective tenants:
https://github.com/emmYgd/Rentium

From Typescript projects at peiges.co: https://github.com/peiges to various other research projects in my career.

For a broader view of my professional background, career trajectory and past projects, you can read through my LinkedIn profile at: https://linkedin.com/in/emmanuel-damilare-adediji

Additional Information:
Some development feature concerns are cross-cutting and may involve going back and forth in an overlapping manner…

Other measures to ensure speed, quality and effectiveness:

  • A hybrid of both Defensive Programming (Exception Handling) and DBC (Design By Contract) practices coupled with aforesaid unit-testing will be used to detect bugs quickly & early, while not sacrificing on quality and effective development moving forward.

  • SOLID, DRY and other software engineering principles will be strictly adhered to.

  • DI/IOC, Python’s Poetry & Docker container will be utilized to streamline and manage cross-dependencies, pre-requisites and complex operational requirements at the code, system and infrastructure levels respectively.
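The hybrid of Design by Contract and defensive programming mentioned in the measures above might look like the following. This is a sketch, not the project's code: the `contract` decorator, `split_chunks` and `safe_split` are hypothetical names invented for the example.

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical decorator: check a precondition on inputs and a postcondition on the result."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ValueError(f"precondition failed for {func.__name__}")
            result = func(*args, **kwargs)
            if post is not None and not post(result):
                raise AssertionError(f"postcondition failed for {func.__name__}")
            return result
        return wrapper
    return decorator


@contract(pre=lambda data, chunk_size: chunk_size > 0,
          post=lambda chunks: all(len(c) > 0 for c in chunks))
def split_chunks(data: bytes, chunk_size: int):
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def safe_split(data: bytes, chunk_size: int):
    # Defensive layer: turn a contract violation into a safe fallback.
    try:
        return split_chunks(data, chunk_size)
    except ValueError:
        return [data] if data else []


chunks = split_chunks(b"abcdef", 4)
fallback = safe_split(b"abcdef", 0)
```

The contract makes the function's assumptions explicit and fails fast when they are violated, while the defensive wrapper decides at the call site how such a failure should be handled.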

Coupled with the aforementioned, listed below are strategies that will help in further giving me development speed gains:

  • I will be using a robust language suited for rapid prototyping, gluing systems together, data security, processing and compute (Python)

  • I will not be re-inventing the wheel often, as my solution will consist of utilizing and re-using well-established open-source Python libraries, frameworks and toolsets (e.g. Celery for Async Queue, Thespian for Actor-based concurrency, PyNaCl and PyCryptoDome for encryption, Dogpile for Cache, Cosmian KMS for key management, Dask for out-of-core computing, GitPython for versioning/stamping etc.), standards (e.g. Dublin Core, ISO) and file formats (e.g. HDF5)

  • Existing and alternative libraries and APIs will be explored and analyzed to select the best for Filecoin storage network connectivity and interactivity.

  • Agile development through horizontal cross-development and quick Unit Testing.

Hi @emmYgd, Thank you for your time with this proposal and for your patience with our review. Unfortunately, we will not be moving forward with a grant at this time. To contact our team with inquiries regarding our review or grants program, please send an email to grants@fil.org.

Wishing you the best with your future projects!