CondensationDS/Condensation

Project Status

ansarizafar opened this issue · 22 comments

Is this project dead?

Hello, thanks for your message. It is not: after a short break, we are preparing a new setup.

We are currently delivering a few projects with Condensation, which help us prepare the JavaScript version and demos/tutorials for web applications. We are also investigating new business opportunities.

We will communicate soon about the status of the project.

Thanks for the reply. I am happy to know that this project is not dead. Developers like me are looking for a better database to replace decades-old RDBMSs.

I am unable to find information about a query language in the documentation. I suggest a Datalog-like query language (https://terminusdb.com/docs/terminushub/reference/woql or https://docs.flur.ee/guides/1.0.0/analytical-queries/inner-joins-in-fluree). I also suggest a Discord server for community building.

d28b commented

Thank you for the suggestions.

Condensation does not currently have a query language in the conventional sense, and this is not even a goal. Condensation offers the following structures:

  • Synchronized documents: This works great for generic data which should be available on multiple devices and synchronized across them. Examples: a messenger (Twelve), a photo album, or any business-specific app where people work on the same data. A document is kept in memory and can be traversed using the tools your programming language offers. There is no need to learn a query language. This works great for what I call "small data" (< 10 GB, small enough to keep in memory). Most data people deal with is < 100 MB.

  • Timeline data: Timeline data is ubiquitous: any sensor measuring something (IoT) produces timeline data, and your email feed or any chat history is timeline data too. With Condensation, such data is stored in a blockchain-like structure, and you can efficiently query slices (i.e., all data from time X to time Y). Once you have such a slice, you again process it with the tools provided by your programming language. This works equally well for "small data" and "big data", because you always work with a manageable slice.

  • Index: This is basically a searchable index, mapping keywords/ids to documents or objects, e.g. the list of all employees of a large corporation, or an index of videos by title, or pictures by location. This is not really a query language either, or just a very simple one. Since we are operating in a distributed system, the index is not always consistent with the data - it is eventually consistent.
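
To illustrate the timeline access pattern (fetch a slice from time X to time Y, then process it with ordinary language tools), here is a minimal in-memory sketch. It is not Condensation's API or its blockchain-like storage format; the class `Timeline` and its methods are hypothetical and only mimic the "query a slice" idea using binary search over sorted timestamps.

```python
from bisect import bisect_left, bisect_right


class Timeline:
    """Hypothetical in-memory timeline (NOT Condensation's storage format).

    Entries are kept sorted by timestamp so that a time-range slice can be
    located with binary search instead of scanning everything."""

    def __init__(self):
        self._times = []     # sorted timestamps
        self._payloads = []  # payloads, parallel to self._times

    def append(self, t, payload):
        # Insert at the right position; entries usually arrive nearly in order.
        i = bisect_right(self._times, t)
        self._times.insert(i, t)
        self._payloads.insert(i, payload)

    def slice(self, start, end):
        """Return (timestamp, payload) pairs with start <= timestamp < end."""
        lo = bisect_left(self._times, start)
        hi = bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._payloads[lo:hi]))
```

Once you have a slice, plain code does the rest, e.g. averaging sensor readings: `values = [v for _, v in tl.slice(x, y)]` followed by `sum(values) / len(values)`.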

The concept of a query language is tightly linked with the structure of relational databases (whether SQL or NoSQL). Condensation is more like a network of documents linked with each other, and you are navigating through that network.

Condensation is like a graph database, and all graph databases have a query language. It would be difficult to build apps without a proper query language.

d28b commented

No, Condensation is not a graph database. Just like an animal with pointy ears isn't necessarily a cat.

Query languages are of interest if the computation is carried out on another computer. With Condensation, data is kept locally anyway. Your app can just read and process the data. You don't need a query language. It's much simpler than that. Once you've opened the data, you navigate through a tree (a bit like a filesystem), and gather the information you need.
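
A minimal sketch of "navigate through a tree and gather the information you need", assuming the opened data is represented as nested Python dicts and lists. The document layout, `get_path`, and `photos_tagged` are hypothetical illustrations, not Condensation's data model or API.

```python
# Hypothetical in-memory document (NOT Condensation's actual data model).
document = {
    "albums": {
        "holidays-2020": {
            "title": "Holidays 2020",
            "photos": [
                {"file": "beach.jpg", "tags": ["sea", "family"]},
                {"file": "hike.jpg", "tags": ["mountain"]},
            ],
        },
        "birthday": {
            "title": "Birthday",
            "photos": [{"file": "cake.jpg", "tags": ["family"]}],
        },
    }
}


def get_path(tree, path):
    """Follow a filesystem-like path such as 'albums/birthday/title'."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return node


def photos_tagged(tree, tag):
    """Gather matching photos with ordinary loops instead of a query language."""
    return [
        photo["file"]
        for album in tree["albums"].values()
        for photo in album["photos"]
        if tag in photo["tags"]
    ]
```

The "query" is just a comprehension over data that is already local, which is the point being made above.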

How is it possible to store a whole database (multiple GBs) on a user's device, especially on mobile phones?

d28b commented

In case you are looking for a traditional centralized database with a query language, then Condensation isn't for you, I'm afraid. There are plenty of centralized cloud solutions out there. We have no intention of competing in that market.

Regarding user data:

When using Condensation, you design your system in such a way that the users' data can reside on the users' mobile phones. The approach we take is very different from a centralized cloud solution.

Since users only keep their own data on the device, the device memory is usually big enough. Take a typical messenger app, such as WhatsApp: the users' data is stored on the device, sometimes taking several GB (with photos and media). Many other apps require far less data.

There are projects where the device memory is not enough, and there are a few simple solutions for that:

  • In some situations, old data can be archived or deleted. Since Condensation allows for easy data synchronization, it is fairly straightforward to push such data to a server. However, you'll have to provide enough server space (# users * many GB).

  • When talking about a large library, say a video platform, users would query an index server (see above) with a keyword to find their video, and download that video on demand. Condensation can do such things very efficiently.

An index server is like a traditional centralized database without a query language, right? I am very much interested in a new database; I am just trying to understand how Condensation works.

d28b commented

We are talking about a distributed system. An index server is comparable to an expert in our society. If you have a question regarding a specific topic, you contact this expert.

You may have an index server knowing all employees by name, for example. If you are an employee of the company, you'd register yourself with that index server, and this index server adds you to the list. If you're looking for another employee, you can send a query to the index server.

An index server infrastructure actually consists of two parts:

  • The index master, who creates and maintains the index. Usually a single machine.
  • Index query servers who reply to queries. These servers receive regular updates from the index master. Depending on the size of your system, you may have dozens of such index servers.

For performance reasons, an index query server typically keeps the whole index in memory (or on high speed disks). Queries are often made using HTTP/REST requests.

Index masters/query servers contain some application-specific code. The index master may have to identify you as an employee, for example. The index master may also aggregate or compute data. An index query server may limit the number of results to prevent misuse.
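
The master/query-server split described above can be sketched in a few lines. This is a hypothetical, single-process illustration: `IndexMaster` and `IndexQueryServer` are made-up names, "regular updates" are modelled as read-only snapshots pushed on `publish()`, and the employee check stands in for whatever application-specific verification the master performs. Because query servers only see the last published snapshot, the sketch also shows the eventual consistency mentioned below.

```python
from types import MappingProxyType


class IndexMaster:
    """Hypothetical index master: creates and maintains a single index
    (employee name -> actor hash) and pushes snapshots to query servers."""

    def __init__(self):
        self._index = {}
        self._servers = []

    def attach(self, server):
        self._servers.append(server)
        server.receive_update(self._snapshot())

    def register(self, name, actor_hash, is_employee):
        # Application-specific check: only verified employees are indexed.
        if not is_employee:
            return False
        self._index[name.lower()] = actor_hash
        return True

    def publish(self):
        # Push a read-only copy to every query server ("regular updates").
        snap = self._snapshot()
        for server in self._servers:
            server.receive_update(snap)

    def _snapshot(self):
        return MappingProxyType(dict(self._index))


class IndexQueryServer:
    """Hypothetical query server: answers from an in-memory copy that may lag
    behind the master, and caps the number of results to prevent misuse."""

    def __init__(self, max_results=10):
        self._index = MappingProxyType({})
        self.max_results = max_results

    def receive_update(self, snapshot):
        self._index = snapshot

    def query(self, prefix):
        prefix = prefix.lower()
        hits = sorted(
            (name, h) for name, h in self._index.items() if name.startswith(prefix)
        )
        return hits[: self.max_results]
```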

For maximum scalability, every index master/server manages a single index only. For small systems with little load, they may all run on the same physical/virtual server, however.

For maximum security, the index master runs in a secure location outside of the datacenter, and reads messages asynchronously. Index query servers run inside the datacenter and are publicly accessible, of course.

Unlike with classical databases, indices are eventually consistent only, and sometimes not consistent at all by design. E.g., you may be able to participate in the system without registering with the index of employees. These things are application-specific.

As you can see, with Condensation you are building a (distributed) data system with multiple "actors" (as we call them) that communicate with each other. For small systems, we usually have all actors running on the same virtual machine in some datacenter. For larger systems, you can easily scale up.

Thanks for the detailed explanation.

Thank you for your questions, Ansar. They are very helpful in preparing the next step for Condensation, which is to create a more comprehensive, step-by-step introduction. For that, we are preparing further materials, which will be published on a dedicated website this autumn. If you are interested in contributing and doing a deep dive into the core, please contact Thomas directly by email.

Yes I am interested. Could you please share Thomas's email address?

Sure, you can find his details here (https://viereck.ch/thomas/)

I am about to start a new project and I would love to use Condensation. Is it possible to play with the JavaScript version?

You wrote that "every index master/server manages a single index only." If we have separate index servers for customers, products, and orders, how can we query all customers who bought a particular product in the last six months?

d28b commented

Answer 1

In a typical SQL database, you'd have three tables: customers, products, and orders. With Condensation, you would think about this differently:

  • The customer is the actor who places the order. Technically, you don't need a customer "table" at all. The customer data belongs to the customer and is managed by them.
  • Products are a public list. Every product could be a separate object/document. Whenever you modify a product, you get a new hash. Hence, products are automatically versioned.
  • Orders are timeline data. You're mostly interested in the most recent orders (perhaps the last 2 years or so), and older data is archived. Every order is signed by a customer. It contains a list of references to products, a delivery address and other relevant information.
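
The point that modifying a product yields a new hash (and therefore automatic versioning) can be sketched as a small content-addressed store. This is an assumption-laden illustration: canonical JSON and SHA-256 are stand-ins chosen here, not necessarily Condensation's object encoding, and `ProductStore` / `product_hash` are hypothetical names.

```python
import hashlib
import json


def product_hash(product):
    """Hash a canonical JSON encoding of the product.
    (Stand-in encoding; Condensation uses its own object format.)"""
    encoded = json.dumps(product, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class ProductStore:
    """Hypothetical content-addressed store: each object lives under its hash,
    so old versions are never overwritten and stay referenceable."""

    def __init__(self):
        self.objects = {}

    def put(self, product):
        h = product_hash(product)
        self.objects[h] = dict(product)
        return h
```

An order that references the hash of the first version keeps pointing at exactly the product data the customer saw, even after the price changes.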

With this structure, your query would be executed as follows:

  1. Get a slice of the last few months of orders.
  2. Loop through these orders.
  3. If the order contains the product we are looking for, add the customer (actor hash) to the result set.
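
The three steps translate almost directly into code. In this sketch, `Order` and `customers_who_bought` are hypothetical; customer and product identifiers are placeholder strings standing in for actor and object hashes, and step 1 is a simple filter rather than an efficient timeline slice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Order:
    created: datetime
    customer: str   # placeholder for the signing customer's actor hash
    products: tuple  # placeholders for hashes of the referenced products


def customers_who_bought(orders, product, since):
    result = set()
    # 1. Get a slice of the last few months of orders.
    recent = [o for o in orders if o.created >= since]
    # 2. Loop through these orders.
    for order in recent:
        # 3. If the order contains the product, add the customer to the result set.
        if product in order.products:
            result.add(order.customer)
    return result
```

Usage: `customers_who_bought(orders, "kettle", now - timedelta(days=182))` returns the set of customers who ordered that product in roughly the last six months.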

Answer 2

Let's assume you have a similar problem, for which you really have three "tables" with three indices.

You would create an analytics actor. This actor would load all three indices directly (not query the index servers, but get the indices from the respective index masters), and process that data. Implementing this is not as elegant as writing a SQL statement, but it's just a couple of loops, nothing complicated.

If you do such queries a lot, you'd build a new index with exactly this data, organized in an efficient way, and set up an index server to respond to queries.
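
A rough sketch of such an analytics actor, assuming the three indices have already been loaded from their index masters into plain dicts/lists (the field names and `build_buyers_index` are hypothetical). It joins them with a single loop and produces a derived index that an index server could then serve.

```python
from collections import defaultdict


def build_buyers_index(customers, products, orders):
    """Hypothetical analytics actor: join three loaded indices with plain loops
    and produce a new index mapping product name -> sorted customer names."""
    buyers = defaultdict(set)
    for order in orders:
        customer = customers.get(order["customer_id"])
        product = products.get(order["product_id"])
        if customer is None or product is None:
            # Indices are only eventually consistent: skip dangling references.
            continue
        buyers[product["name"]].add(customer["name"])
    return {name: sorted(names) for name, names in buyers.items()}
```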

For small systems, you could just create a single index master + index server, create all indices there, and keep everything in memory.

Answer 1 looks viable, but the question is where all the customer/product/order data will be stored and where the computation will be performed. If we are building an eCommerce platform for a big retailer with thousands of customers/products/orders, surely it can't be done on the user's device. We also need customer/product names and other properties, not just hashes.

I am asking these questions because we need to show developers that CondensationDB is better than other available solutions.

d28b commented

Every actor stores all data they need. More precisely:

  • Every user (actor) stores their profile and cart locally¹.
  • The product manager (actor) stores the list of all products.
  • The eCommerce platform (actor) stores all orders. Since orders reference products, it also stores a copy² of these products.

An order contains/references everything necessary to fully process the order. It would contain a shipping address and a payment method, for instance.

Some eCommerce platforms let users store more than one delivery address and pick one when ordering. The list of delivery addresses is user-private data. The order contains the chosen delivery address.

When submitting an order, the user actor sends a message with the order to the eCommerce actor. The eCommerce actor verifies the order and replies with an appropriate message.
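
The order exchange can be sketched as a message type plus a handler on the eCommerce actor. This is a deliberately simplified, hypothetical model: `OrderMessage`, `ECommerceActor`, and the reply format are made up, signatures are omitted, and the asynchronous message delivery is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class OrderMessage:
    order_id: str
    customer: str           # placeholder for the sender's actor hash
    products: list          # placeholders for product hashes
    shipping_address: str   # the delivery address chosen by the user
    payment_method: str


@dataclass
class ECommerceActor:
    catalog: set            # product hashes known to the platform
    orders: dict = field(default_factory=dict)

    def handle(self, msg):
        """Verify the order and return a reply message (a plain dict here)."""
        unknown = [p for p in msg.products if p not in self.catalog]
        if not msg.products or unknown or not msg.shipping_address:
            return {"order_id": msg.order_id, "status": "rejected", "unknown": unknown}
        self.orders[msg.order_id] = msg
        return {"order_id": msg.order_id, "status": "accepted"}
```

Because the order message carries everything needed (products, address, payment method), the eCommerce actor can process it without looking up the user's private data.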

Every product is an object (or small tree) containing all information about the product (id, name, description, image references³, ...).

When searching for products, the user queries the product index, and then retrieves the found products. A user doesn't generally store products locally, except perhaps for products in his cart / wishlist, or recently visited products.

Footnotes:

  1. If you provide a login server, users can log in using a username and password, and an encrypted copy of the user data will be stored on your Condensation store. A login server sort of reverts Condensation back to a cloud solution (from the user's perspective). You still get all the advantages of Condensation. Using a login server is optional, and even if you provide one, users can still opt out and manage their data on their own, without using the login server.
  2. The product actor and the eCommerce actor may use the same Condensation store, so that the data is not physically duplicated.
  3. Images, technical datasheets, ... are stored as Condensation objects. No separate storage for that is necessary.

This means the CondensationDB design is very flexible and can be used for different use cases.

Messages are like bidirectional streaming RPCs via web sockets, right?

d28b commented

Yes, you can look at messages as RPCs.

Protocols:

  • Most of the time, we use HTTP(S) to connect to Condensation stores. We have a well-defined protocol for that, and every implementation supports this.
  • Inside our own systems, we sometimes use a simple proprietary protocol over TCP. This is slightly more efficient.
  • For IoT sensors submitting low-bandwidth measurements, we use a simple UDP protocol. We are about to have a working implementation, and I'll publish/document this within the next few months.
  • Web sockets: we don't currently have any web socket implementation, but this would be feasible.

Note three things regarding messages:

  • Messages are asynchronous. You are posting a message, and the corresponding actor is processing this message at some time in the future ("eventually"). For actors running all the time, messages are delivered immediately (i.e., within network/processing delay).
  • There is no intrinsic order. If you send messages A and B, the recipient may process them in any order.
  • Messages can be updated. You can send A1, then update it to A2. The recipient may receive A1 and A2 (in any order), or only A2.
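
One way a recipient can cope with these three properties is to keep only the newest version of each message, regardless of arrival order. The sketch below assumes each message carries an id and an increasing version number; that versioning scheme and the names `Inbox` / `final_state` are hypothetical, since the exact update mechanism is not described here.

```python
class Inbox:
    """Hypothetical recipient-side inbox. An update (A1 -> A2) reuses the
    message id with a higher version number."""

    def __init__(self):
        self.latest = {}  # message id -> (version, body)

    def deliver(self, msg_id, version, body):
        current = self.latest.get(msg_id)
        # Ignore a stale version that arrives after a newer one.
        if current is None or version > current[0]:
            self.latest[msg_id] = (version, body)

    def bodies(self):
        return {msg_id: body for msg_id, (_, body) in self.latest.items()}


def final_state(deliveries):
    inbox = Inbox()
    for msg_id, version, body in deliveries:
        inbox.deliver(msg_id, version, body)
    return inbox.bodies()
```

Whether A1 arrives before A2, after it, or not at all, the recipient ends up acting on A2.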

I can work on the documentation site if you need a helping hand.

I will close this issue. Ansar, to continue the conversation, I invite you to use the Discussions section of the GitHub repository.