pubkey/rxdb

[RFC] autoClean for crdt operations.

1yasa opened this issue · 10 comments

I tested the performance of CRDT under frequent and intense operations, typically in a massive canvas used for collaborative online drawing. Additionally, I evaluated its maintainability to ensure it doesn't slow down over time. The results confirmed the need for a cleaning mechanism in CRDT, akin to the garbage collection operation in programming languages. Think of it as the GC operation ensuring that the performance of your CRDT application remains consistently high.

While this may sacrifice a certain degree of data traceability (the number of steps allowed for redo), it significantly enhances the adaptability of the CRDT plugin in various scenarios.

I copied the CRDT plugin and tested auto-clean against my use case. Here is the code; it only takes a few lines:

Define options and clean function

// trigger_counts - reduce_counts should be > 30
const config = {
	trigger_counts: 120,
	reduce_counts: 60,
	auto_clean: true
}

// Once the history reaches trigger_counts entries, drop the oldest reduce_counts entries.
const cleanOperations = <RxDocType>(operations: Array<Array<CRDTOperation<RxDocType>>>) => {
	if (!config.auto_clean) return operations
	if (operations.length < config.trigger_counts) return operations

	return operations.slice(config.reduce_counts)
}
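
To make the effect of these thresholds concrete, here is a minimal, self-contained sketch of the same trimming rule applied to a plain array. `Op`, `CleanConfig`, `trimOperations`, and the sample data are hypothetical names used only for illustration; they are not part of the RxDB API.

```typescript
// Hypothetical stand-in for a CRDT operation; the real plugin stores richer objects.
type Op = { time: number };

interface CleanConfig {
    trigger_counts: number;
    reduce_counts: number;
    auto_clean: boolean;
}

// Same rule as cleanOperations above: once the history reaches trigger_counts
// entries, drop the oldest reduce_counts entries and keep the rest.
function trimOperations<T>(operations: T[], config: CleanConfig): T[] {
    if (!config.auto_clean) return operations;
    if (operations.length < config.trigger_counts) return operations;
    return operations.slice(config.reduce_counts);
}

const exampleConfig: CleanConfig = { trigger_counts: 120, reduce_counts: 60, auto_clean: true };

// 120 operations reach the trigger, so the 60 oldest are dropped.
const fullHistory: Op[] = Array.from({ length: 120 }, (_, i) => ({ time: i }));
const trimmedHistory = trimOperations(fullHistory, exampleConfig);

// 119 operations stay below the trigger, so nothing is dropped.
const shortHistory: Op[] = Array.from({ length: 119 }, (_, i) => ({ time: i }));
const untouchedHistory = trimOperations(shortHistory, exampleConfig);
```

With these values the history oscillates between 60 and 120 entries, so the cost of hashing and replaying operations stays bounded.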

Call the function from updateCRDT:

[Screenshot 2024-01-04 20 01 19]

Disable the value check based on config.auto_clean:

[Screenshot 2024-01-04 20 03 49]

RxDB support

createRxDatabase would accept a CRDTConfig:

interface CRDTConfig {
	triggerCounts: number,
	reduceCounts: number,
	autoClean: boolean
}
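
If such a config were accepted, it would probably need a sanity check. The sketch below is only an assumption about what that could look like: `validateCRDTConfig` and `MIN_GAP` are hypothetical names, and the margin of 30 is taken from the "should be > 30" comment on the config earlier in this issue, not from RxDB itself.

```typescript
interface CRDTConfig {
    triggerCounts: number;
    reduceCounts: number;
    autoClean: boolean;
}

// Assumed minimum gap between the two thresholds, taken from the
// "trigger_counts - reduce_counts should be > 30" comment in this issue.
const MIN_GAP = 30;

// Hypothetical helper: returns a list of problems, empty when the config is usable.
function validateCRDTConfig(config: CRDTConfig): string[] {
    const problems: string[] = [];
    if (!Number.isInteger(config.triggerCounts) || config.triggerCounts <= 0) {
        problems.push("triggerCounts must be a positive integer");
    }
    if (!Number.isInteger(config.reduceCounts) || config.reduceCounts <= 0) {
        problems.push("reduceCounts must be a positive integer");
    }
    if (config.triggerCounts - config.reduceCounts <= MIN_GAP) {
        problems.push("triggerCounts - reduceCounts must be greater than " + MIN_GAP);
    }
    return problems;
}

const okProblems = validateCRDTConfig({ triggerCounts: 120, reduceCounts: 60, autoClean: true });
const tooCloseProblems = validateCRDTConfig({ triggerCounts: 50, reduceCounts: 40, autoClean: true });
```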

Due to formatting and code style issues in the editor, I didn't submit a pull request to avoid any inconvenience.

I also found that the cleanup function of the cleanup plugin could be more performant:

[Screenshot 2024-01-04 21 45 26]

I asked GPT, and it suggested using and() to execute the query for better performance:

const remove_items = await storage.dexieTable
	.where('_deleted')
	.equals('1')
	.and(item => item['_meta']['lwt'] < maxDeletionTime)
	.toArray()

await storage.dexieTable.bulkDelete(
	remove_items.map(item => item[this.schema.primaryPath])
)

Some things I have in mind:

We do not have to disable the value check. Instead of fully purging some CRDT operations, they should be replaced with an insert-like operation. This would ensure that in all cases it is able to rebuild the document from the currently stored CRDT operations.

The purging of operations should not run in the cleanup() of the RxStorage because that would require all storages to implement the exact behavior. Instead I think the CRDT plugin should use the plugin hooks to strip away crdt operations at the correct times.

The purging strategy with triggerCounts etc should be defined in the schema. This ensures that all nodes using that schema have the exact same behavior.
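
The first point above (replace purged operations with an insert-like operation) can be sketched roughly as follows. This is a simplified model under stated assumptions, not RxDB's CRDT format: `SetOp`, `replay`, and `compact` are hypothetical names, and each operation here just shallow-merges fields into the document.

```typescript
// Simplified, hypothetical operation model: each operation shallow-merges
// its fields into the document. Real RxDB CRDT operations are richer.
type Doc = Record<string, unknown>;
type SetOp = { set: Doc; time: number };

// Rebuild a document by replaying every stored operation in order.
function replay(operations: SetOp[]): Doc {
    return operations.reduce<Doc>((doc, op) => ({ ...doc, ...op.set }), {});
}

// Replace the oldest `purgeCount` operations with a single insert-like
// snapshot, so the stored history still rebuilds the same document.
function compact(operations: SetOp[], purgeCount: number): SetOp[] {
    const count = Math.min(purgeCount, operations.length);
    if (count < 2) return operations; // nothing to gain from replacing fewer than two operations
    const purged = operations.slice(0, count);
    const kept = operations.slice(count);
    const snapshot: SetOp = {
        set: replay(purged),
        // The snapshot takes the time of the newest operation it replaces.
        time: purged.reduce((latest, op) => Math.max(latest, op.time), 0)
    };
    return [snapshot, ...kept];
}

const sampleOps: SetOp[] = [
    { set: { x: 1 }, time: 1 },
    { set: { y: 2 }, time: 2 },
    { set: { x: 3 }, time: 3 },
    { set: { z: 4 }, time: 4 }
];
const compactedOps = compact(sampleOps, 3);
const docBefore = replay(sampleOps);
const docAfter = replay(compactedOps);
```

Because the snapshot sits at the front of the history, a value check that replays the stored operations still arrives at the same document, so it would not need to be disabled.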

  • You are right, a better way is to "compress" the CRDT operations instead of just reducing them.
  • Actually, the purging of operations I have in mind runs in updateCRDT in the CRDT plugin.
  • triggerCounts could be defined in both the database config and the schema. A value defined in the schema would override the database config, because in most cases this is more convenient.
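
The override order described in the last point could look roughly like this. It is a sketch under assumptions: `CRDTCleanOptions` and `resolveCleanOptions` are hypothetical names, and neither createRxDatabase nor the RxDB schema currently accepts such options.

```typescript
// Hypothetical option shape, mirroring the CRDTConfig proposed in this issue.
interface CRDTCleanOptions {
    triggerCounts: number;
    reduceCounts: number;
    autoClean: boolean;
}

// Schema-level values win over database-level defaults, field by field.
// Fields the schema leaves undefined fall back to the database config.
function resolveCleanOptions(
    dbConfig: CRDTCleanOptions,
    schemaConfig?: Partial<CRDTCleanOptions>
): CRDTCleanOptions {
    return {
        triggerCounts: schemaConfig?.triggerCounts ?? dbConfig.triggerCounts,
        reduceCounts: schemaConfig?.reduceCounts ?? dbConfig.reduceCounts,
        autoClean: schemaConfig?.autoClean ?? dbConfig.autoClean
    };
}

const dbDefaults: CRDTCleanOptions = { triggerCounts: 120, reduceCounts: 60, autoClean: true };

// A collection that only overrides the trigger threshold.
const canvasOptions = resolveCleanOptions(dbDefaults, { triggerCounts: 500 });

// A collection without schema-level options uses the database defaults.
const defaultOptions = resolveCleanOptions(dbDefaults);
```

Note that defaults taken from the database config can differ between nodes, so this only gives every node the same behavior when the schema sets all the fields.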

In order to prevent replication errors, I updated the clean function for CRDT operations:

const cleanOperations = async <RxDocType>(args: {
	crdtDocField: CRDTDocumentField<RxDocType>
	docData: WithDeleted<RxDocType>
	storageToken: string
	hashFunction: HashFunction
}) => {
	const { crdtDocField, docData, storageToken, hashFunction } = args

	// Nothing to do when auto-clean is off or the history is still short.
	// The caller ignores the return value; crdtDocField is mutated in place.
	if (!config.auto_clean) return
	if (crdtDocField.operations.length < config.trigger_counts) return

	// Drop the oldest reduce_counts operations.
	const target_operations = crdtDocField.operations.slice(config.reduce_counts)

	// Append one operation that sets the full current document state,
	// so the remaining history still ends at the same data.
	target_operations.push([
		{
			body: [
				{
					ifMatch: {
						$set: omit(docData, 'crdts', '_meta', '_rev', '_attachments', '_deleted')
					}
				}
			],
			creator: storageToken,
			time: now()
		}
	] as Array<CRDTOperation<RxDocType>>)

	// Store the trimmed history and refresh its hash.
	crdtDocField.operations = target_operations
	crdtDocField.hash = await hashCRDTOperations(hashFunction, crdtDocField)
}

export async function updateCRDT<RxDocType>(
	this: RxDocument<RxDocType>,
	entry: CRDTEntry<RxDocType> | CRDTEntry<RxDocType>[]
) {
	entry = overwritable.deepFreezeWhenDevMode(entry) as any

	const jsonSchema = this.collection.schema.jsonSchema
	if (!jsonSchema.crdt) {
		throw newRxError('CRDT1', {
			schema: jsonSchema,
			queryObj: entry
		})
	}
	const crdtOptions = ensureNotFalsy(jsonSchema.crdt)
	const storageToken = await this.collection.database.storageToken

	return this.incrementalModify(async docData => {
		const crdtDocField: CRDTDocumentField<RxDocType> = clone(getProperty(docData as any, crdtOptions.field))

		const operation: CRDTOperation<RxDocType> = {
			body: toArray(entry),
			creator: storageToken,
			time: now()
		}

		const lastAr: CRDTOperation<RxDocType>[] = [operation]

		crdtDocField.operations.push(lastAr)

		crdtDocField.hash = await hashCRDTOperations(this.collection.database.hashFunction, crdtDocField)

		docData = runOperationOnDocument(this.collection.schema.jsonSchema, docData, operation)

		await cleanOperations({
			crdtDocField,
			docData,
			storageToken,
			hashFunction: this.collection.database.hashFunction
		})

		setProperty(docData, crdtOptions.field, crdtDocField)

		return docData
	}, RX_CRDT_CONTEXT)
}

After cleaning, a full-document $set operation is performed once to ensure the final data stays consistent.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed soon. If you still have a problem, make a PR with a test case or to prove that you have tried to fix the problem. Notice that only bugs in the rxdb premium plugins are ensured to be fixed by the maintainer. Everything else is expected to be fixed by the community, likely you must fix it by yourself.

just leaving CR-SQLite

Issues are autoclosed after some time. If you still have a problem, make a PR with a test case or to prove that you have tried to fix the problem.