biokoda/actordb

How to choose actors when users share data?

spiffytech opened this issue · 3 comments

Are there guidelines for choosing actor types when users share data?

In the following scenarios, every actor breakdown I come up with requires selecting data from multiple actors and joining it manually in my app, rather than relying on SQL operations inside individual actors. That sounds like it violates the intent of ActorDB. The alternative is extensive denormalization, which is error-prone and possibly storage-heavy.

  • Jukebox software with music collections and playlists, where users can share playlists and music collections. My actor ideas: collection, playlist, user.
  • A to-do app where users can share tasks or task categories. My actor ideas: user, category.
  • An RSS reader that captures RSS items in one place, and separately keeps track of each item's unread status for each user. My actor ideas: feed, user.

How could these be elegantly solved in ActorDB?

I see nothing wrong with querying multiple actors. It's just a read operation. I would choose the actor types just as you have proposed.

Using these actor choices seems less elegant than classic monolithic SQL queries to me. Perhaps I'm missing something?

For example, consider retrieving the contents of all unread items in the RSS reader. With classic SQL, it would be approximately: SELECT body FROM items JOIN item_statuses ON item_statuses.item_id = items.id WHERE item_statuses.user_id = :user_id AND item_statuses.is_read = false. Very straightforward.

With ActorDB, if one actor type stores feeds and another actor type stores each user's per-feed-item read/unread status, I'd need one query to select all unread item IDs from the user(user_id) actor, then a second query to grab the contents of all those items by ID from feed(feed_id). Maybe with IN? Or pack it all into one query with an ActorDB loop and variables?

Using either multiple queries and an IN holding possibly thousands of IDs, or using a query with a loop, both seem less elegant than the classic SQL solution. Is either of these the recommended ActorDB approach?

Or maybe a better approach is for each feed actor to contain a table with the read status of all items for all users?

With each actor being a complete database, I'm very unclear on where classic SQL design and queries stop and ActorDB-specific operations and design begin.

I'd need one query to select all unread item IDs from the user(user_id) actor, then a second query to grab the contents of all those items by ID from feed(feed_id)

Yes, this is how I would do it. The user actor stores the position of the last read item for every feed; then you go to the feeds themselves to check whether there is anything new.
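A minimal sketch of that two-step read, assuming item IDs increase within each feed and a user reads a feed in order, so one last-read position per feed is enough. Each actor is simulated here with its own in-memory SQLite database rather than a real ActorDB connection; the table names (items, last_read) and helper names are hypothetical.

```python
import sqlite3

# Hypothetical schemas: one database per actor, simulated with in-memory SQLite.
FEED_SCHEMA = "CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT);"
USER_SCHEMA = (
    "CREATE TABLE last_read (feed_id TEXT PRIMARY KEY, item_id INTEGER NOT NULL);"
)


def make_actor(schema):
    """Create a stand-in for one actor: an isolated SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript(schema)
    return db


feeds = {"news": make_actor(FEED_SCHEMA), "blog": make_actor(FEED_SCHEMA)}
user = make_actor(USER_SCHEMA)

feeds["news"].executemany(
    "INSERT INTO items (id, body) VALUES (?, ?)", [(1, "n1"), (2, "n2"), (3, "n3")]
)
feeds["blog"].executemany(
    "INSERT INTO items (id, body) VALUES (?, ?)", [(1, "b1"), (2, "b2")]
)
user.executemany(
    "INSERT INTO last_read (feed_id, item_id) VALUES (?, ?)",
    [("news", 1), ("blog", 2)],
)


def unread_items(user_db, feed_dbs):
    """Step 1: read positions from the user actor. Step 2: one small read per feed."""
    unread = {}
    positions = user_db.execute("SELECT feed_id, item_id FROM last_read").fetchall()
    for feed_id, last_id in positions:
        rows = feed_dbs[feed_id].execute(
            "SELECT body FROM items WHERE id > ? ORDER BY id", (last_id,)
        ).fetchall()
        unread[feed_id] = [body for (body,) in rows]
    return unread
```

Storing a position instead of a per-item flag means the second step is a range query per feed, so no IN list with thousands of IDs is needed. If users can read items out of order, this shortcut no longer applies and per-item status would be required.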

Using either multiple queries and an IN holding possibly thousands of IDs, or using a query with a loop, both seem less elegant than the classic SQL solution

Yes, but what you described has to happen either way. Either you do the join yourself or the database does it. A single-server database handles it easily because it has all that data readily available. A distributed system must join across servers for this to work.

With ActorDB it is very clear what has to happen, so you can design your system with this in mind. When you are displaying the feeds to the user, they can be loaded asynchronously. You do not have to read everything at once. You are not doing joins across large tables; you are making simple read queries.
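As a rough illustration of loading feeds asynchronously, the sketch below issues one independent read per feed and handles each result as soon as it arrives. The feed data is a hypothetical in-memory stand-in, and fetch_unread only simulates a round trip to a feed actor; it is not an ActorDB client call.

```python
import asyncio

# Hypothetical stand-in for feed actors: feed_id -> list of (item_id, body).
FEED_ITEMS = {
    "news": [(1, "n1"), (2, "n2"), (3, "n3")],
    "blog": [(1, "b1"), (2, "b2")],
}


async def fetch_unread(feed_id, last_read_id):
    """Simulate one small read against a single feed actor."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    bodies = [
        body for item_id, body in FEED_ITEMS[feed_id] if item_id > last_read_id
    ]
    return feed_id, bodies


async def load_unread(last_read):
    """Start one read per feed and collect results in completion order."""
    tasks = [
        asyncio.create_task(fetch_unread(feed_id, position))
        for feed_id, position in last_read.items()
    ]
    results = {}
    for finished in asyncio.as_completed(tasks):
        feed_id, bodies = await finished
        results[feed_id] = bodies  # a UI could render this feed immediately
    return results


def load_unread_sync(last_read):
    """Convenience wrapper for callers that are not already async."""
    return asyncio.run(load_unread(last_read))
```

Calling load_unread_sync({"news": 1, "blog": 0}) would read both feeds concurrently; a slow feed does not block the others from being shown.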

I suggest worrying less about what is "elegant" and more about what is scalable and reasonable. A lot of the time people consider one solution more elegant than another simply because it shifts the complexity to some other system. It's elegant to them because they don't have to deal with the messy details.

With each actor being a complete database, I'm very unclear on where classic SQL design and queries stop and ActorDB-specific operations and design begin.

Usually the correct design is one that avoids multi-actor write transactions. In this case, your crawler updates the feed actors and your client-side code updates the user actors. This keeps your write and read load distributed across your cluster.
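A small sketch of that split, again simulating each actor with a separate in-memory SQLite database: every write below touches exactly one actor, so no transaction spans two of them. The function and table names are hypothetical and chosen only for illustration.

```python
import sqlite3

# One simulated feed actor and one simulated user actor.
feed_db = sqlite3.connect(":memory:")
feed_db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")

user_db = sqlite3.connect(":memory:")
user_db.execute(
    "CREATE TABLE last_read (feed_id TEXT PRIMARY KEY, item_id INTEGER NOT NULL)"
)


def crawler_store_item(feed_actor, body):
    """Crawler path: writes only to a feed actor."""
    with feed_actor:  # commits on success, rolls back on error
        cursor = feed_actor.execute("INSERT INTO items (body) VALUES (?)", (body,))
    return cursor.lastrowid


def client_mark_read(user_actor, feed_id, item_id):
    """Client path: writes only to a user actor."""
    with user_actor:
        user_actor.execute(
            "INSERT OR REPLACE INTO last_read (feed_id, item_id) VALUES (?, ?)",
            (feed_id, item_id),
        )


first = crawler_store_item(feed_db, "first post")
second = crawler_store_item(feed_db, "second post")
client_mark_read(user_db, "news", first)
client_mark_read(user_db, "news", second)
```

Because the crawler and the client never write to the same actor, their writes can land on different servers without coordinating with each other.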