pacman82/odbc-api

Many APIs are not unsafe

Closed this issue · 7 comments

In the context of this crate, a potential source of unsoundness is reading from uninitialized values. However, most structs in this crate do initialize the data before presenting it to driver via Cdata's pointers. Therefore, there is no need for these interfaces to be marked as unsafe.

The ODBC may not write any data to the pointer, but because it can't re-allocate them, the regions are always initialized.

Examples of APIs that do not need unsafe:

  • ColumnWithIndicator::iter
  • TextColumn::indicator_at and TextColumn::value_at

The situation where these APIs would need to be unsafe is if they would be initialized with Vec::with_capacity, filled with ODBC, and then a call of Vec::set_len would be made in order to avoid the initialization during allocation.

Even in that case, I do not think that they would be unsafe because we would still need to know if the ODBC wrote something to them in order to correctly set_len on them (or we would be exposed via a faulty ODBC implementation).

Ah, you have a good eye. I was unsure about the safety status of these myself. Happy to discuss this with someone. Here is my reasoning though, why I decided to error on the site of caution and declare these unsafe:

It is entirely up to the ODBC driver that values the "unused" parts of the buffers hold. So while we do not access invalid memory, we can't make any reliable statement about their contents either. Depending on the circumstances an odbc driver may even memcopy some slices of partially initialized memory over to these.

Most likely though a driver just doesn't care about these values. Yet, even in these scenarios the contents of the buffer can be at very least confusing. The unused values may all be neatly set to zero in the beginnig, but once you start reusing the same buffer over and over again, they may just as well be random. Any application basing decisions on the contents of these buffers should be careful and is likely to be suprised by the behaviour of at least some ODBC drivers.

Tl;dr: We wont't access invalid memory, but we also can not give any guarantee about the contents either.

Thanks for the explanation.

unsafe is used to declare an invariant that the Rust compiler can't prove it is upheld, that would otherwise result in unsoundness. My understanding is that reading an initialized memory region is always sound.

We have exactly the same situation in arrow - the values buffer of an arrow array contains both the valid and null values, and it is the validity that declares what is valid. There is nothing unsound or unsafe in reading those values, they are just semantically meaningless. We also still need to initialize the values; the operation of creating a slice over something un-initialized is unsound.

The only situation where random values cause a soundness problem is when the data type has trap values or otherwise invalid bit patterns. The only example I can find in this crate for this is the u8 -> bool, which we properly protect here, since everything else fits into bytemuck::Pod definition.

My understanding is that this crate would like to use unsafe to mark methods that are sound but where the content they produce may be random. I can't see how that fits in the intended use of the unsafe keyword. By these standards, almost all code that depends on random numbers would be unsafe.

imo the contract is very clear: values' are meaningful in indexes where the corresponding indicator is different from -1 and some extra rules for Text and Binary, where the indicator also represents the length of the value. This is unrelated to safety, it is just a semantic invariant of the data.

You make some valid points. I do not think that at this point in time it is well defined in edge cases what safe code is allowed to do. As far is I understand Rusts safety guarantees are fundamentally about preventing undefined behaviour. What exactly counts as undefined behaviour is not fully specified. Yet I am more inclined to share your view, then before. However exposing these buffers is even more dangerous, than one might assume. E.g. the indicator values could also be longer than the actual values. They are also used to indicate exception behaviour.

Contrary to my previous statement I would be up for another pair programming session on the weekend (if you are game), in order to better understand how exactly you want to use this buffers. I am hoping to gain a better abstraction out of it.

Safe or not using this buffers is at least very error prone.

@jorgecarleitao Our last pair programming session has been very eduactional for me. Thanks for sharpening the distinction between safe and unsafe for me. I of course agree with you now, that the examples you are mentioning are indeed safe.

These apis should now be safe in version 0.35.1

Likewise, it was fantastic to know more about ODBC and work with you!