Re-interpreting a value of a large type as an array of a smaller type
ergawy opened this issue · 8 comments
I want to re-interpret a value as an array of values of a smaller type. For example, an i32 as an array of 4 i8s. Also, the source (big) and target (small) types are polymorphic so the size of the target array is not fixed (in terms of the number of elements rather than the total size in bytes). Can I do that?
No, bitcasting i32 to [i8 * 4] is not allowed. Doing stuff like this is not even allowed in C. For example, the following C code is illegal:
int32_t* p = /*...*/;
int16_t* q = (int16_t*) p;
int16_t i = q[1]; // the C standard does not allow this.
I don't understand the second part of your question. Can you give me some more context or an example?
EDIT:
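I just wanted to clarify my intent with a simpler example. Here is some Rust code that does it:
fn main() {
    use std::mem;
    let x: i32 = 200;
    // Copy the four bytes of x into an array of i8.
    let one: [i8; 4] = unsafe { mem::transmute_copy(&x) };
    // On a little-endian target the low byte 0xC8 comes first and reads as -56.
    assert_eq!(-56, one[0]);
    assert_eq!(0, one[1]);
}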
ORIGINAL COMMENT:
Sorry for not giving an example from the start.
I want to use the PTX warp shuffle instruction, roughly like this:
fn shuffle_words[T, T2](input: T, src_lane: i32) -> () {
    let num_words = sizeof[T]() / sizeof[T2]();
    // I know that I can't use an expr for allocation here, but I am just trying to explain my point.
    let in_words : [T2 * num_words] = bitcast[[T2 * num_words]](&input);
    let out_words : [T2 * num_words];
    for i in range(0, num_words) {
        out_words(i) = shuffle(in_words(i), src_lane);
    }
}
For more context, I am implementing parallel operations equivalent to what is provided by CUDA's CUB library. Here is an implementation of the shuffle operation from CUB's code:
template <typename T>
__device__ __forceinline__ T ShuffleDown(
    T input,                                    ///< [in] The value to broadcast
    int src_offset,                             ///< [in] The relative up-offset of the peer to read from
    int last_lane = CUB_PTX_WARP_THREADS - 1)   ///< [in] Index of first lane in segment
{
    typedef typename UnitWord<T>::ShuffleWord ShuffleWord;

    const int WORDS = (sizeof(T) + sizeof(ShuffleWord) - 1) / sizeof(ShuffleWord);

    T output;
    ShuffleWord *output_alias = reinterpret_cast<ShuffleWord *>(&output);
    ShuffleWord *input_alias = reinterpret_cast<ShuffleWord *>(&input);

    unsigned int shuffle_word;
    asm volatile("shfl.down.b32 %0, %1, %2, %3;"
        : "=r"(shuffle_word) : "r"((unsigned int) input_alias[0]), "r"(src_offset), "r"(last_lane));
    output_alias[0] = shuffle_word;

    #pragma unroll
    for (int WORD = 1; WORD < WORDS; ++WORD)
    {
        asm volatile("shfl.down.b32 %0, %1, %2, %3;"
            : "=r"(shuffle_word) : "r"((unsigned int) input_alias[WORD]), "r"(src_offset), "r"(last_lane));
        output_alias[WORD] = shuffle_word;
    }

    // ShuffleDown(input_alias, output_alias, src_offset, last_lane, Int2Type<WORDS - 1>());

    return output;
}
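For comparison, here is a minimal sketch in plain C of the same word-by-word pattern using byte copies instead of pointer aliases. It is not taken from CUB or from the compiler under discussion: map_words, word_fn and flip_bits are made-up names, and flip_bits only stands in for the per-word shuffle, which has no CPU equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-word operation (the shuffle on a GPU). */
typedef uint32_t (*word_fn)(uint32_t word);

/* Example word operation used only for illustration: invert every bit. */
uint32_t flip_bits(uint32_t word)
{
    return ~word;
}

/* Apply f to each 32-bit word of an object that is `size` bytes long.
 * `in` and `out` may point to objects of any type; only bytes are copied,
 * so no object is ever accessed through an lvalue of the wrong type. */
void map_words(const void *in, void *out, size_t size, word_fn f)
{
    const unsigned char *src = in;
    unsigned char *dst = out;
    size_t offset;

    assert(in != NULL && out != NULL && f != NULL);

    for (offset = 0; offset < size; offset += sizeof(uint32_t)) {
        size_t remaining = size - offset;
        size_t n = remaining < sizeof(uint32_t) ? remaining : sizeof(uint32_t);
        uint32_t word = 0;

        memcpy(&word, src + offset, n);  /* read one (possibly partial) word */
        word = f(word);
        memcpy(dst + offset, &word, n);  /* write the result back as bytes */
    }
}
```

Calling map_words(&x, &y, sizeof x, flip_bits) on two doubles transforms both words of x into y. The last iteration copies only the remaining bytes, so sizes that are not a multiple of four are handled without reading past the object.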
Hi,
We should be careful here: line 3 is certainly legal (indexing an
array). The only issue is the pointer cast, which is also not "illegal"
but probably means that the results are undefined (subsequent accesses
might give a "bus error" or similar due to misalignment).
However, if handled properly (alignment and so on), the results can be
well defined and are actually really useful. So it makes sense to think
about how we can design the system to allow this (with enough
safeguards). For example, a pointer cast between 32-bit and 16-bit ints
should always be fine on x86, as far as I know, as long as the input
pointer is OK.
Best,
Philipp
On 14.04.2016 at 11:50, Roland Leißa wrote:
No, bitcasting i32 to [i8 * 4] is not allowed. Doing stuff like this is
not even allowed in C. For example, the following C code is illegal:
int32_t* p = /*...*/;
int16_t* q = (int16_t*) p;
int16_t i = q[1]; // the C standard does not allow this.
I don't understand the second part of your question. Can you give me
some more context or an example?
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern
Geschäftsführung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973, Steuernummer: 19/673/0060/3
No, line 3 is illegal (i.e. undefined behavior). According to the C standard (§6.5, paragraph 7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object
[...]
And the effective type of the object accessed through q[1] is int32_t. Line 2 is legal as long as the implementation-defined alignment requirements are obeyed.
But I agree that we want to support such things in the long run. I'm just saying that there are good reasons for C to prohibit them: allowing such accesses costs performance elsewhere, because the compiler misses optimization opportunities.
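For reference, a minimal sketch of how the same read can be written in C without running into the rule quoted above: copy the bytes with memcpy instead of dereferencing a cast pointer. The helper name half_of_i32 is made up for illustration, and which half ends up at index 0 depends on the target's endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the index-th 16-bit half of a 32-bit value
 * without ever accessing the int32_t object through an int16_t lvalue. */
int16_t half_of_i32(int32_t value, size_t index)
{
    int16_t halves[2];

    assert(index < 2);

    /* memcpy copies the object representation byte by byte, which the
     * aliasing rule permits. The order of the halves follows the byte
     * order of the target. */
    memcpy(halves, &value, sizeof halves);
    return halves[index];
}
```

Compilers typically turn a fixed-size memcpy like this into a plain register move, so the copy usually costs nothing at run time.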
@KareemErgawy: I will check out your example at the weekend.
OK, I see. This is certainly not how C has been used for ages, but it
seems to be undefined after all.
Thanks,
Philipp
On 15.04.2016 at 18:43, Roland Leißa wrote:
No, line 3 is illegal (i.e. undefined behavior). According to the C
standard (§6.5 - 7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object
[...]
And the effective type of q[1] is int32_t. Line 2 is legal as long
as the implementation-defined alignment requirements are obeyed.
But I agree that we want to support such things in the long run. I'm
just saying, that there are good reasons for C to prohibit such things.
Allowing such things will give us performance problems (missed
optimization opportunities) at other locations.
@KareemErgawy (https://github.com/KareemErgawy): I will check out your
example at the weekend.
Yes, it's a big problem. Many C programmers don't understand the standard, and there are many myths about the language. Programmers then curse the compiler engineers when the C compiler apparently emits "incorrect" code, although the input program was actually broken to begin with. That's why icc, for instance, has very conservative optimization flags regarding aliasing rules, even with -O3.
But anyway, we're getting off-topic :)
@KareemErgawy: I think the easiest solution would be to simply use Rust's idea with this transmute_copy. Would this be fine for you?
Yes, that would be totally fine. Also, for now I ended up using bitwise operators as suggested by Arsene, which do the same trick in a less elegant way.
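A minimal C sketch of what such a bitwise workaround can look like, assuming the goal is to read the individual bytes of an i32 as in the Rust example earlier in the thread. The function name byte_of_i32 is hypothetical, and this is not the code that was actually used here.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte `index` of `value`, counting from the least significant
 * byte, reinterpreted as a signed 8-bit integer. */
int8_t byte_of_i32(int32_t value, unsigned index)
{
    uint32_t bits = (uint32_t)value;  /* shifting an unsigned value is well defined */
    uint8_t byte;

    assert(index < 4);

    byte = (uint8_t)((bits >> (8u * index)) & 0xFFu);

    /* Map 128..255 to -128..-1 explicitly instead of relying on the
     * implementation-defined conversion of an out-of-range value. */
    return (int8_t)(byte >= 128 ? (int)byte - 256 : (int)byte);
}
```

Because the bytes are selected by shifting rather than by memory layout, byte_of_i32(200, 0) yields -56 and byte_of_i32(200, 1) yields 0 regardless of the target's endianness.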
The downside is that this will only let us read the words of a large variable, not write them (because it is just a copy, not an actual alias). But I understand that a real alias would go against the design choice for more type safety.
But again, this is totally fine.
Thanks.