CNCF Landscape Graph, data model, and applications.

CNCF Landscape Graph

Initial, open, active development.

Join us @ #landscape-graph. Here are our current activities. A formal plan and roadmap are in progress.


Often, we need to understand how an open source project interacts with others, how it's changing over time, and who's enabling its continued success. We want to understand what alternatives exist, or how complementary projects might be combined in purpose-fit or novel ways. We might want to dive in and contribute! This is how projects and ecosystems grow to meet the business challenges facing modern organizations.

Landscape Graph Data Model

Graphs can facilitate rich analysis of our vibrant and dynamic communities, the humans they comprise, and the clusters of contribution and thought leadership they produce.

Using the data underlying the existing landscape as input, a Labeled Property Graph (LPG) is constructed using Cypher (SQL for Graphs), resulting in a Neo4j graph database.
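
As a rough illustration of the step described above, the sketch below turns a couple of landscape-style records into parameterized Cypher `MERGE` statements. The `Project` and `Organization` labels, the `MAINTAINS` relationship type, the property names, and the sample records are all assumptions made for this example; the project's actual schema and loading code may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled property graph node: one label plus a property map."""
    label: str
    key: str  # name of the property assumed to identify the node uniquely
    props: dict = field(default_factory=dict)


def merge_node(node):
    """Build a parameterized Cypher MERGE for one node (hypothetical schema)."""
    query = f"MERGE (n:{node.label} {{{node.key}: $key}}) SET n += $props"
    return query, {"key": node.props[node.key], "props": node.props}


def merge_relationship(src, rel_type, dst):
    """Build a Cypher statement linking two existing nodes with a typed edge."""
    query = (
        f"MATCH (a:{src.label} {{{src.key}: $src}}), "
        f"(b:{dst.label} {{{dst.key}: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src.props[src.key], "dst": dst.props[dst.key]}


# Hypothetical sample records, not taken from the real landscape data.
org = Node("Organization", "name", {"name": "Example Org"})
project = Node("Project", "name", {"name": "example-project", "stars": 123})

statements = [
    merge_node(org),
    merge_node(project),
    merge_relationship(org, "MAINTAINS", project),
]
for query, params in statements:
    print(query, params)
```

With the official Neo4j Python driver, each `(query, params)` pair could presumably be passed to `session.run(query, params)`; the sketch only prints them so that it runs without a database.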

Here's the schema:

landscape-graph-data-model


"Origin Story"

In November of 2018 there were 25 CNCF projects.

At the time, Ayrat Khayretdinov published the "Beginner's Guide to the CNCF Landscape." It opened with:

The cloud native landscape can be complicated and confusing. Its myriad of open source projects are supported by the constant contributions of a vibrant and expansive community. The Cloud Native Computing Foundation (CNCF) has a landscape map that shows the full extent of cloud native solutions, many of which are under their umbrella.

It described the CNCF Mission in these terms:

The CNCF fosters this landscape of open source projects by helping provide end-user communities with viable options for building cloud native applications. By encouraging projects to collaborate with each other, the CNCF hopes to enable fully-fledged technology stacks comprised solely of CNCF member projects. This is one way that organizations can own their destinies in the cloud.

We. Have. Grown.

Today there are 5.4 million humans using Kubernetes and the landscape continues to expand.

| 2022 Q2 | Cards | | cap | funding |
|---|---|---|---|---|
| projects | 111 | 614,394 | $291.4 M | $29.6 M |
| ecosystem | 1,061 | 3,066,372 | $15.7 T | $29.1 B |

We have a "good" problem

The CNCF Landscape aggregates summary data from GitHub, Crunchbase, Yahoo Finance, Twitter, and other sources while providing the ability to quickly find, filter, and group the more than 1,000 Cards across numerous dimensions. It is automagically updated daily. It continues to work as designed.

landscape-all

With a single well-placed click, a wealth of data can be summoned. Here's the "Card" for Neo4j:

neo4j-card

This is perfect when we know what we're looking for (specifically).

Technical TLDR

source: grandstack.io/docs/...

GRANDstack is a combination of technologies that work together to enable developers to build data intensive full stack applications. The components of GRANDstack are:

  • GraphQL - A new paradigm for building APIs, GraphQL is a way of describing data and enabling clients to query it.
  • React - A JavaScript library for building component based reusable user interfaces.
  • Apollo - A suite of tools that work together to create great GraphQL workflows.
  • Neo4j Database - The native graph database that allows you to model, store, and query your data the same way you think about it: as a graph.

Here's how it all fits together in the context of a movie search app:

grand-arch

Additional tools and frameworks

TODO: #27

| Component | What it is |
|---|---|
| Neo4j GraphQL Library | {neo}/product/graphql-library, (dev blog) |
| Neo4j Streams | {neo}/labs/kafka, {gh}/neo4j-contrib/neo4j-streams |
| gitbase | Git history as MySQL, src-d/gitbase |
| JavaFX | UI, 3d, openjfx.io |
| Quarkus | AoT, minify, Dev UX, quarkus.io |

Graph Data Science Algorithms ("Why Neo4j?")

https://neo4j.com/developer/graph-data-science/graph-algorithms

Graph databases “perform the join on insert” instead of at query time: relationships are stored alongside the nodes they connect, so traversing them requires no joins or table scans.
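
To make the "join on insert" idea concrete, here is a minimal, pure-Python sketch. It contrasts a relational-style lookup, which scans a table of foreign-key rows at query time, with a graph-style store that records each node's neighbors when the relationship is written. The class name, function names, and sample data are hypothetical, and this is only an analogy for how native graph storage behaves, not a description of Neo4j's internals.

```python
from collections import defaultdict

# Relational style: relationships live in a separate table of rows.
contributions = [
    ("alice", "project-a"),
    ("bob", "project-a"),
    ("alice", "project-b"),
]


def projects_for_relational(person):
    """Resolve the relationship at query time by scanning every row."""
    return [project for who, project in contributions if who == person]


class AdjacencyGraph:
    """Graph style: each node keeps direct references to its neighbors."""

    def __init__(self):
        self.out_edges = defaultdict(list)

    def add_edge(self, src, dst):
        # The "join" is paid here, once, when the relationship is inserted.
        self.out_edges[src].append(dst)

    def neighbors(self, node):
        # Query time only follows stored references; no scan over all edges.
        return list(self.out_edges.get(node, []))


graph = AdjacencyGraph()
for who, project in contributions:
    graph.add_edge(who, project)

print(projects_for_relational("alice"))
print(graph.neighbors("alice"))
```

Both calls return the same projects; the difference is where the work happens. The relational lookup touches every row on every query, while the adjacency lookup touches only the edges attached to the starting node.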

A graph data model (vs. a rectangular, relational one) can bring to bear all that we’ve learned from ad tech, fintech, security tech, big data, ML, etc.

graph-data-science-pic

Graph Data Science Algorithm Types

Docs --> https://neo4j.com/docs/graph-data-science/current

| Type | Definition |
|---|---|
| Path Finding | Help find the shortest path or evaluate the availability and quality of routes |
| Centrality | Determine the importance of distinct nodes in a network |
| Community Detection | Evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart |
| Similarity | Help calculate the similarity of nodes |
| Topological link prediction | Determine the closeness of pairs of nodes |
| Node Embeddings | Compute vector representations of nodes in a graph. |
| Node Classification | Uses machine learning to predict the classification of nodes. |
| Link prediction | Use machine learning to predict new links between pairs of nodes. |
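
For intuition, the sketch below implements toy versions of three of the algorithm types listed above (path finding, centrality, and similarity) in plain Python on a small undirected graph. It does not use the Neo4j Graph Data Science library; the graph, node names, and function names are invented for illustration, and the library's own algorithms are considerably more sophisticated.

```python
from collections import deque

# Hypothetical undirected graph: contributors and projects as plain strings.
edges = [
    ("alice", "project-a"),
    ("bob", "project-a"),
    ("bob", "project-b"),
    ("carol", "project-b"),
    ("carol", "project-c"),
]

adjacency = {}
for left, right in edges:
    adjacency.setdefault(left, set()).add(right)
    adjacency.setdefault(right, set()).add(left)


def shortest_path(start, goal):
    """Path finding: breadth-first search for a shortest unweighted path."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal is not reachable from start


def degree_centrality():
    """Centrality: each node's degree divided by the maximum possible degree."""
    max_degree = len(adjacency) - 1
    return {node: len(neigh) / max_degree for node, neigh in adjacency.items()}


def jaccard_similarity(a, b):
    """Similarity: shared neighbors divided by all neighbors of either node."""
    na, nb = adjacency.get(a, set()), adjacency.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


print(shortest_path("alice", "carol"))
print(degree_centrality())
print(jaccard_similarity("alice", "bob"))
```

Here the Jaccard score treats two contributors as similar when they touch the same projects, which is one simple way a question like "who works on related things?" can be phrased over a graph.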

Cypher ("SQL for Graphs")

https://github.com/opencypher/openCypher

Cypher is a declarative graph query language that allows for expressive and efficient querying, updating and administering of the graph. It is designed to be suitable for both developers and operations professionals. Cypher is designed to be simple, yet powerful; highly complicated database queries can be easily expressed, enabling you to focus on your domain, instead of getting lost in database access.

On its influences and roots:

Cypher is inspired by a number of different approaches and builds on established practices for expressive querying. Many of the keywords, such as WHERE and ORDER BY, are inspired by SQL. Pattern matching borrows expression approaches from SPARQL. Some of the list semantics are borrowed from languages such as Haskell and Python. Cypher’s constructs, based on English prose and neat iconography, make queries easy, both to write and to read.
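
To give a feel for what declarative pattern matching means, here is a toy Python sketch that evaluates something analogous to the Cypher query shown in its opening comment against an in-memory list of relationships. The labels, relationship type, property names, and data are hypothetical, and a real Cypher engine plans and executes queries very differently.

```python
# Roughly the query being imitated (hypothetical schema):
#
#   MATCH (p:Person)-[:CONTRIBUTES_TO]->(r:Project)
#   WHERE r.stars > 100
#   RETURN p.name AS person, r.name AS project
#   ORDER BY person, project

nodes = {
    1: {"label": "Person", "name": "alice"},
    2: {"label": "Person", "name": "bob"},
    3: {"label": "Project", "name": "project-a", "stars": 250},
    4: {"label": "Project", "name": "project-b", "stars": 40},
}
relationships = [
    (1, "CONTRIBUTES_TO", 3),
    (2, "CONTRIBUTES_TO", 3),
    (2, "CONTRIBUTES_TO", 4),
]


def match(src_label, rel_type, dst_label):
    """Yield (source, target) node pairs that fit the pattern
    (src:src_label)-[:rel_type]->(dst:dst_label)."""
    for src_id, kind, dst_id in relationships:
        src, dst = nodes[src_id], nodes[dst_id]
        if (
            kind == rel_type
            and src["label"] == src_label
            and dst["label"] == dst_label
        ):
            yield src, dst


# WHERE becomes a filter, RETURN a projection, ORDER BY a sort.
rows = sorted(
    (p["name"], r["name"])
    for p, r in match("Person", "CONTRIBUTES_TO", "Project")
    if r["stars"] > 100
)
print(rows)
```

The Cypher version states only the shape of the data being asked for; the loop, filter, and sort that the Python code spells out by hand are left to the database.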

How to Contribute

License

This repository contains data received from Crunchbase. This data is not licensed pursuant to the Apache License. It is subject to Crunchbase’s Data Access Terms, available at https://data.crunchbase.com/docs/terms, and is only permitted to be used with Linux Foundation landscape projects.

Everything else is under the Apache License, Version 2.0, except for project and product logos, which are generally copyrighted by the company that created them, and are simply cached here for reliability.