oschulz/BitOperations.jl

add bflip, mention Int32 bit indexing

JeffreySarnoff opened this issue · 25 comments

It would be helpful to toggle bit[s].

v0.5:
@inline bflip(x::Integer, bit::Integer) = x $ bmask(typeof(x), bit)
@inline bflip{U <: Integer}(x::Integer, bits::UnitRange{U}) = x $ bmask(typeof(x), bits)
v0.6:
@inline bflip(x::Integer, bit::Integer) = xor(x, bmask(typeof(x), bit))
@inline bflip{U <: Integer}(x::Integer, bits::UnitRange{U}) = xor(x, bmask(typeof(x), bits))
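For experimentation, here is a minimal, self-contained sketch in v0.6 syntax. The bmask stand-in (single bit as 1 << bit, a range as a contiguous mask) is my assumption about the package's semantics, not its actual implementation:

```julia
# Stand-in bmask (assumed semantics, not the package's real implementation):
bmask(::Type{T}, bit::Integer) where {T<:Integer} = one(T) << bit
bmask(::Type{T}, bits::UnitRange) where {T<:Integer} =
    ((one(T) << length(bits)) - one(T)) << first(bits)

# The proposed bflip, v0.6 style:
bflip(x::Integer, bit::Integer) = xor(x, bmask(typeof(x), bit))
bflip(x::Integer, bits::UnitRange) = xor(x, bmask(typeof(x), bits))

bflip(0x01, 0)    # 0x00: bit 0 toggled off
bflip(0x00, 1:3)  # 0x0e: bits 1..3 toggled on
```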

Julia generates tighter code when the bit, bits, and nbits arguments are Int32 (or Int16 or Int8).

On a 64-bit [32-bit] system, bmask(Int32, 5) is bmask(Int32, 5::Int64) [bmask(Int32, 5::Int32)].

#                                                                               using bit::Int32

julia> code_llvm(bmask, (Type{Int32}, Int32))
define i32 @julia_bmask_62946(%jl_value_t*, i32) #0 !dbg !5 {
top:
  %2 = call i32 @"julia_<<_62947"(i32 1, i32 %1) #0
  ret i32 %2
}

julia> code_native(bmask, (Type{Int32}, Int32))
	.text
Filename: bitops.jl
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 32
	movabsq	$"<<", %rax
	movl	$1, %edi
	callq	*%rax
	popq	%rbp
	retq
	nopw	(%rax,%rax)


#                                                                               using bit::Int64

julia> code_llvm(bmask, (Type{Int32}, Int64))
define i32 @julia_bmask_62948(%jl_value_t*, i64) #0 !dbg !5 {
top:
  %2 = icmp slt i64 %1, 0
  %3 = trunc i64 %1 to i32
  %4 = shl i32 1, %3
  %5 = icmp ugt i64 %1, 31
  %6 = select i1 %5, i32 0, i32 %4
  %7 = sub i64 0, %1
  %8 = trunc i64 %7 to i32
  %9 = lshr i32 1, %8
  %10 = icmp ugt i64 %7, 31
  %11 = select i1 %10, i32 0, i32 %9
  %12 = select i1 %2, i32 %11, i32 %6
  ret i32 %12
}

julia> code_native(bmask, (Type{Int32}, Int64))
	.text
Filename: bitops.jl
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 32
	movl	$1, %eax
	movl	$1, %edx
	movl	%esi, %ecx
	shll	%cl, %edx
	xorl	%edi, %edi
	cmpq	$31, %rsi
	cmoval	%edi, %edx
	movq	%rsi, %rcx
	negq	%rcx
	shrl	%cl, %eax
	cmpq	$31, %rcx
	cmoval	%edi, %eax
	testq	%rsi, %rsi
	cmovnsl	%edx, %eax
	popq	%rbp
	retq
	nopw	%cs:(%rax,%rax)

Sorry, @JeffreySarnoff, I completely missed this (watch repo was off for some reason). Is this still an issue (i.e. are you still interested in something like bflip)? Thanks for your remarks concerning 64-bit vs. 32-bit bit-indices!

I was planning to register this package soon - remarks and ideas are very welcome.

Oh, good. Yes, flipping bits is something I use with multibit sets packed into a word (2 bits for an entity that self-evinces in two or three manners).

The focus on individual bit manipulation is essential; it allows extensions that work with bit fields of small extent. With Julia v0.6, support for [U]Int128 is solid, portable, and rather quick with bit ops and +, -, even *.

Let me know if you would like help with some capability.
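As a quick illustration of the [U]Int128 point (plain Base operations only, nothing package-specific):

```julia
x = UInt128(1) << 100          # set bit 100
y = x | (UInt128(1) << 3)      # also set bit 3
count_ones(y)                  # 2
trailing_zeros(y)              # 3
y + y == y << 1                # true: +, -, * work alongside the bit ops
```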

Consider not using Integer: it includes Bool, with attendant dispatch-resolution ambiguities.
For bit manipulation functions, I use Signed or Unsigned as the immediate Integer abstractions, and
typealias SUN Union{Signed, Unsigned} or, for v0.6, const SUN = Union{Signed, Unsigned}
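In v0.6 syntax, the effect of the suggested union is that Bool, while an Integer, falls outside SUN (bitwidth below is a hypothetical example function, not part of the package):

```julia
const SUN = Union{Signed, Unsigned}

Bool <: Integer     # true
Bool <: SUN         # false

# A method written against SUN never competes with a Bool method:
bitwidth(x::SUN) = 8 * sizeof(x)
applicable(bitwidth, Int32(0))   # true
applicable(bitwidth, true)       # false
```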

With Julia v0.6, support for [U]Int128 is solid [...]

Hadn't seen that yet - very nice!

Consider not using Integer: it includes Bool, with attendant dispatch-resolution ambiguities.

Hm, true - but using the union is a bit cumbersome as well. Is the dispatch ambiguity really a problem? You can even use a Bool as an array index, so if it can (in principle) be used as a bit index, what's the harm?
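For context, Bool already behaves like a small integer in Base arithmetic and as a shift count:

```julia
1 << true          # 2: true acts as a shift count of 1
1 << false         # 1
true + true        # 2: Bool promotes in arithmetic
```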

I think the harm is less in v0.6 than in v0.5. In v0.5, there are a half-dozen unrelated packages I wrote where, out of Narnia: Dispatch Ambiguity: fn(x,y) matches fn(Integer, Real) or fn(Bool, Real) [<--- not verbatim, but close].

Using Signed is better than using Integer if your targets are Int_s and not UInt_s.
Using Unsigned is better than using Integer if your targets are UInt_s and not Int_s.
Using e.g. typealias SystemInt Union{..} or typealias MachineInteger ... etc. is better than Integer
--- unless you make explicit signatures including Bool where Signed or Unsigned could appear

Well, even though I'm still on 0.5 for my daily work, its days are numbered. :-) Is this currently a problem for you, or more a matter of principle? If it's a problem, I can introduce something like

const BitIdx = Union{Signed, Unsigned}

Maybe truncating integers with more than 32 bits to 32 bits would also be helpful, considering your remarks about 64-bit indices resulting in less optimal code?
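A hypothetical helper along those lines (the name bitidx and its behavior are my assumption, not the package's API):

```julia
# Truncate a wider bit index to Int32. rem (%) with a target type performs a
# two's-complement truncation, so guard against out-of-range inputs first.
function bitidx(b::Integer)
    0 <= b <= typemax(Int32) || throw(ArgumentError("bit index out of range: $b"))
    b % Int32
end

bitidx(Int64(5))    # Int32(5)
bitidx(Int32(7))    # Int32(7)
```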

It is a daily problem -- and even if it were not, your package is more widely useful if it accepts both kinds.
The 32-bit speedup applies to the type of the variable used as the "how many bits to shift" value.
There is (or was) more of a relative slowdown (on my machine) doing this:

value_to_shift = Int32(1234); number_of_bits_to_shift = Int64(5)
result = value_to_shift << number_of_bits_to_shift

than doing this

value_to_shift = Int64(1234); number_of_bits_to_shift = Int32(5)
result = value_to_shift << number_of_bits_to_shift

relative to

value_to_shift = Int32(1234); number_of_bits_to_shift = Int32(5)
result = value_to_shift << number_of_bits_to_shift
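All three combinations produce the same value, of course -- the difference is purely in the generated code (actual timings would need something like BenchmarkTools.jl):

```julia
a = Int32(1234) << Int64(5)   # Int32 value, Int64 shift count (slowest above)
b = Int64(1234) << Int32(5)   # Int64 value, Int32 shift count
c = Int32(1234) << Int32(5)   # all-Int32 (fastest above)
a == b == c                   # true: all equal 39488
```

Note that the result type follows the shifted value, not the shift count: a and c are Int32, b is Int64.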

I have been curious about how you chose this way

function msbmask end
export msbmask


function msbmask ...

It's perfectly legit -- I had not seen it before.

I have been curious about how you chose this way [...] function msbmask end

I can't find the docs now, but AFAIK it's the official way to define a new function without methods. It's useful for the main docstring, I guess (well, I still have to add those ...).
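A sketch of the pattern: declare the generic function with zero methods (attaching a docstring), then add methods later. The docstring and method below are illustrative, not the package's actual definition:

```julia
"""
    msbmask(T)

Hypothetical docstring: a mask with only the most significant bit of `T` set.
"""
function msbmask end      # declares the function, adds no methods

# A method can be added separately, e.g.:
msbmask(::Type{T}) where {T<:Unsigned} = one(T) << (8 * sizeof(T) - 1)

msbmask(UInt8)    # 0x80
msbmask(UInt16)   # 0x8000
```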

The 32-bit speedup applies to the type of the variable used as the "how many bits to shift" value.

I looked at some @code_llvm output, and I see the same. However, when running things in a loop, things seem to be more complex. Take

A = rand(Int32, 5)
f!(A, b) = @inbounds @simd for i in eachindex(A) A[i] = bget(A[i], b) end

Now, for 64-bit bit numbers:

@code_llvm f!(A, Int64(5))
@code_llvm f!(A, Int64(2):Int64(5))

the loop is vectorized (at least on my 64-bit machine) in both cases. But for 32-bit bit numbers

@code_llvm f!(A, Int32(5))
@code_llvm f!(A, Int32(2):Int32(5))

the loop is not vectorized in either case - even though A is a 32-bit array.
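One way to make comparisons like this less eyeball-driven is to search the IR text for LLVM's vectorized-loop label. This is a hedged sketch: "vector.body" is an LLVM naming convention, not a stable interface, so its absence is only a hint.

```julia
using InteractiveUtils  # code_llvm (in Base on Julia <= 0.6)

# Capture the LLVM IR as text and look for the vectorized-loop label.
# (occursin is the Julia 1.0 name; on 0.6 this was contains(s, sub).)
function llvm_has_vector_body(f, types)
    buf = IOBuffer()
    code_llvm(buf, f, types)
    occursin("vector.body", String(take!(buf)))
end

sumloop(A) = (s = zero(eltype(A)); @inbounds @simd for i in eachindex(A) s += A[i] end; s)
llvm_has_vector_body(sumloop, Tuple{Vector{Float64}})
```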

hmm -- let's ask this as a question for the gitter.im folks, if that's OK with you.

it would help to know why that is so, and whether there is a way to force the vectorization in both cases

Sure - I'd certainly like to learn more about this!

Take a look at the current dev branch - it uses BitNumber instead of Integer for bit indices/counts now, and it also adds bflip.

The best way is to give a minimal example -- you are better suited to that than I.
If you can show the behavior with shorter output, I'll post it as a question.

(thanks)

Join the chat room and I'll start things rolling by bringing you in as the question source.

Here's a short example:

julia> f!(A, n) = @inbounds @simd for i in eachindex(A) A[i] = A[i] >> n end

julia> @code_llvm f!(rand(Int32, 10), Int64(5))

[...]
vector.ph:                                        ; preds = %overflow.checked
[...]
vector.body:                                      ; preds = %vector.body, %vector.ph
[...]

julia> @code_llvm f!(rand(Int32, 10), Int32(5))

[...]
if10.lr.ph.us17:                                  ; preds = %top
[...]
if10.us18:                                        ; preds = %if10.us18, %if10.lr.ph.us17
[...]

so ... nobody who knows the answer is there right now. I am trying to create the same effect more simply.

It seems the "bad case" is rand(Int32, n) .<< Int64; the other combinations behave better.
I don't know if that covers the same ground you were seeing.
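A value-level check of that case (the results agree either way; only the generated loop code differs):

```julia
A = Int32[1, 2, 3, 4]
B = A .<< Int64(2)    # Int32 array, Int64 shift count: the "bad case"
C = A .<< Int32(2)    # all-Int32 combination
B == C                # true: same values, B == C == Int32[4, 8, 12, 16]
```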

It was found to be a bug, and the issue was posted.
Write your code to avoid it until it's fixed -- very good find.

I think I have a solution now, have a look at the current "dev" branch. Seems to yield tight code, and bit manipulations (those I've tested so far) vectorize nicely.

looks good