nalgeon/sqlean

varargs for hashing?

tantaman opened this issue · 6 comments

I noticed that the hash functions only accept 1 argument

sqlite3_create_function(db, "md5", 1, flags, (void*)5, crypto_hash, 0, 0);
sqlite3_create_function(db, "sha1", 1, flags, (void*)1, crypto_hash, 0, 0);
sqlite3_create_function(db, "sha256", 1, flags, (void*)2256, crypto_hash, 0, 0);
sqlite3_create_function(db, "sha384", 1, flags, (void*)2384, crypto_hash, 0, 0);
sqlite3_create_function(db, "sha512", 1, flags, (void*)2512, crypto_hash, 0, 0);

but the underlying implementations do appear to be able to update an existing hash with more data.
E.g., from sha1.c

sqlean/src/crypto/sha1.c

Lines 202 to 221 in ab0e7e2

void sha1_update(SHA1Context* ctx, const unsigned char* data, size_t len) {
unsigned int i, j;
j = ctx->count[0];
if ((ctx->count[0] += len << 3) < j) {
ctx->count[1] += (len >> 29) + 1;
}
j = (j >> 3) & 63;
if ((j + len) > 63) {
(void)memcpy(&ctx->buffer[j], data, (i = 64 - j));
SHA1Transform(ctx->state, ctx->buffer);
for (; i + 63 < len; i += 64) {
SHA1Transform(ctx->state, &data[i]);
}
j = 0;
} else {
i = 0;
}
(void)memcpy(&ctx->buffer[j], &data[i], len - i);
}

Could the extension be updated to accept varargs for hash code generation? E.g., for the use case of hashing an entire selection set?

cc @schickling

What do you mean by "use case of hashing an entire selection set"? Please provide an example.

SELECT sha1(*) FROM foo;

where sha1 here would return the hash code of all the cells taken together for each row.

E.g.,

.mode column
CREATE TABLE foo (a, b, c);
INSERT INTO foo VALUES (1,2,3), (4,5,6), (7,8,9);
SELECT sha1(*) as hash FROM foo;
hash
----
row1_hash
row2_hash
row3_hash

Thanks!

That's not exactly how SQL works. To apply a function (other than count) to all columns in a row, you must list them explicitly:

select sha1(a, b, c) as hash from foo;

Could you please clarify why it would be helpful to compute a single hash over multiple columns? Do you have a real-world use case?

Do you have a real-world use case?

@schickling does

He wants to compare whether or not new query results differs from his previous query results. Other ideas would be:

  • compare each item in the results, cell by cell
  • add counter columns to each table that are incremented on any mutation of a row. This of course get problematic when joining many tables since counters need to be selected and concatenated.
  • any other ideas?

I'm not sure I follow. Let's say sha1 can compute the hash over multiple columns (sha1(a, b, c)). What do I do with that? What exactly does it mean "to compare whether or not new query results differs from his previous query results"? What are the "query results" and "previous query results" here? I'd appreciate a complete example.

All right, then 🤷‍♀️