resurrecting-open-source-projects/dcfldd

[master] Undefined behavior in `src/hash.c`

hartwork opened this issue ยท 19 comments

Symptom when compiled with UndefinedBehaviorSanitizer:

# src/dcfldd if=.github/workflows/test.txt of=/tmp/test2.txt hash=md5,sha1,sha256,sha384,sha512
hash.c:227:9: runtime error: call to function MD5Init through pointer to incorrect function type 'void (*)(void *)'
([..]/src/dcfldd+0x137df8): note: MD5Init defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.c:227:9 

I can offer a fix, but let's first get pull request #27 reviewed an merged.

#27 is merged now, let me try a pull request for a fixโ€ฆ

@davidpolverari would you rather want me to go (a)

--- void SHA256_Init(SHA256_CTX *);
--- void SHA256_Update(SHA256_CTX*, const uint8_t*, size_t);
--- void SHA256_Final(SHA256_CTX *, uint8_t[SHA256_DIGEST_LENGTH]);
+++ void SHA256_Init(void*);
+++ void SHA256_Update(void*, void*, size_t);
+++ void SHA256_Final(void*, void*);

and cast inside the function body or (b) add a wrapper function doing the casting in between, and keep the original functions untouched? Other ideas?

Shouldn't we fix src/hash.c?

hashtype_t hashops[] =
{
    {"md5",
     1,
     &MD5_window_context,
     &MD5_total_context,
     &MD5_vwindow_context,
     &MD5_vtotal_context,
     (void (*)(void *)) MD5Init,     /* if we set the parameter to "MD5_CTX *", as defined in `src/md5.h` */
     (void (*)(void *, const void *, size_t)) MD5Update,
     (void (*)(void *, void *)) MD5Final,
     &MD5_hashstr[0],
     sizeof (MD5_hashstr),
     NULL},

See the comment I added to the above code snippet. What do you think?

Shouldn't we fix src/hash.c?

@davidpolverari that won't be enough. We cannot cast MD5Init to (void (*)(void *)) with its current signature and the problem multiplies times 3 times 6 or so. The undefined behaviors is that we're calling a function pointer with a signature mismatch at runtime.

See the comment I added to the above code snippet. What do you think?

I diffed that against master and the only change is adding the comment. I don't get the idea you have for a fix then. What's the idea?

I don't have another idea at the moment. I will have to take a deeper look on how it is implemented. I am thinking something along the lines of using preprocessor concatenation to try to solve it, if possible. But I can't be sure if it will be useful without having a deeper look, and I will not be able to do this right now.

@davidpolverari okay, I will pause my own work about it until I hear back from you on your own experiments. See you ๐Ÿป

I diffed that against master and the only change is adding the comment. I don't get the idea you have for a fix then. What's the idea?
From what I glanced, I was thinking about changing all (void *) parameters to (xxx_CTX *) ones upon hashops initialization. But it was just a thought, without having thought about how everything was working. I will really need to debug that one further to think about a solution.

I really prefer to avoid changing the functions signatures if we can.

@davidpolverari okay, I will pause my own work about it until I hear back from you on your own experiments. See you ๐Ÿป

See you ๐Ÿป

PS: when possible, please let me know which flags you used to build dcfldd with UBSan, and under which compiler. Using only -fsanitize=undefined under GCC 13 I wasn't able to repro the runtime message.

@davidpolverari sure!

# clang-18 --version | head -n1
clang version 18.1.4

# lsb-release -i
Distributor ID: Gentoo

# ./autogen.sh && ./configure CC=clang-18 CFLAGS='-fsanitize=address,undefined,leak -fno-sanitize-recover=all -fno-omit-frame-pointer' LDFLAGS='-fsanitize=address,undefined,leak' && make clean all
[..]

# src/dcfldd if=.github/workflows/test.txt of=/tmp/test2.txt hash=md5,sha1,sha256,sha384,sha512 
hash.c:227:9: runtime error: call to function MD5Init through pointer to incorrect function type 'void (*)(void *)'
(/home/sping/__playground/dcfldd-fork/src/dcfldd+0x137df8): note: MD5Init defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.c:227:9 

Thanks!

@davidpolverari would you rather want me to go (a)

--- void SHA256_Init(SHA256_CTX *);
--- void SHA256_Update(SHA256_CTX*, const uint8_t*, size_t);
--- void SHA256_Final(SHA256_CTX *, uint8_t[SHA256_DIGEST_LENGTH]);
+++ void SHA256_Init(void*);
+++ void SHA256_Update(void*, void*, size_t);
+++ void SHA256_Final(void*, void*);

and cast inside the function body or (b) add a wrapper function doing the casting in between, and keep the original functions untouched? Other ideas?

fwiu, the issue UBSan warns us about would only present practical problems if/when we build dcfldd with CFI (Control Flow Integrity), which distros are beginning to do in any case. In that case, I think adding wrapper functions would be better, something like what sqlite has done to solve this same issue.

What do you think about it?

@davidpolverari

  • I'm good with going for adding a layer of wrap ๐Ÿ‘
  • Chimera Linux is already using CFI on a distro level today (but not across shared objects).

Would you rather be in the developer seat or in the review seat with this? I'm good with either, with a secret slight preference ๐Ÿ˜„

How would you like to continue?

Would you rather be in the developer seat or in the review seat with this? I'm good with either, with a secret slight preference ๐Ÿ˜„

How would you like to continue?

I definitely rather be in the review seat in this case ๐Ÿ˜„

Would you rather be in the developer seat or in the review seat with this? I'm good with either, with a secret slight preference ๐Ÿ˜„
How would you like to continue?

I definitely rather be in the review seat in this case ๐Ÿ˜„

@davidpolverari here you go:

Wow. I thought it would take a while for you to send the fixes ๐Ÿ˜†. Anyways, I will probably take a little longer to have a look at this one, given the extent of the changes. But I will make some remarks about some of the changes in advance.

Wow. I thought it would take a while for you to send the fixes ๐Ÿ˜†.

It became smaller after I started, except the side show bug that I ran into.

Anyways, I will probably take a little longer to have a look at this one, given the extent of the changes. But I will make some remarks about some of the changes in advance.

Review on commit-level should be most fun. Through that lense, things become quite small, only the combined diff may seem a bitter bigger at first, I think.

I'm looking forward to the continuation of this.

It's fine for me that way. It really eases the reviewing process.