/bsat_wo_paddle

Boolean Satisfaction Mermurings

Primary LanguageC

UP B-SAT WITHOUT A PADDLE =============================================================================================

    ████
████░░░░████
██░░░░░░░░░░▓▓
██░░░░░░▓▓██████
██▒▒░░▓▓▒▒▒▒▒▒▒▒████░░░░░░░░░░
██░░▒▒██▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░░░░░░░
██▒▒░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░░░░░░░
  ██▒▒  ██▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░░░░░░░
    ██▒▒░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▓▓██░░░░░░░░░░░░░░░░░░
    ░░██▒▒░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░░░░░
  ░░░░██▒▒▒▒░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░
  ░░&&░░██▒▒▒▒░░▓▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▓▓░░░░░░░░░░
  ░░&&░░░░██▒▒▒▒░░██▒▒▒▒▒▒▒▒▒▒████▒▒████░░░░░░░░
  ░░&&░░░░░░██▒▒▒▒░░██▒▒▒▒▒▒██▓▓▓▓████▒▒██░░░░░░
  ░░&&░░░░░░░░██▒▒▒▒░░██▒▒██▓▓▓▓▓▓▓▓▓▓██▒▒██░░░░░░░░░░░░░░░░
  ░░░░&&░░░░░░░░██▒▒░░██▒▒██▓▓▓▓▓▓▓▓▓▓▓▓████████▒▒░░░░░░░░░░░░
  ░░░░░░&&&&░░░░░░██▒▒░░██▒▒██▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▒▒░░░░░░░░░░░░
  ░░░░░░░░░░&&░░░░░░██▒▒░░████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▒▒░░░░░░░░░░░░
  ░░░░░░░░░░░░&&░░░░░░██▒▒░░██▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██░░░░░░░░░░░░
  ░░░░░░░░░░░░░░&&░░░░░░██▒▒░░▓▓▓▓██████████████▒▒██░░░░░░░░░░░░░░░░
  ░░░░░░░░░░░░░░&&░░░░░░░░██▒▒░░██▒▒▒▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██░░░░░░░░░░░░░░░░
  ░░░░░░░░░░░░░░&&░░░░░░░░░░██▒▒░░████░░██████████▓▓▓▓████░░░░░░░░░░░░░░░░░░░░░░
    ░░░░░░░░░░░░&&░░░░░░░░░░░░██▒▒░░██ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▓▓██░░░░░░&&░░░░░░░░░░░░░░░░
    ░░░░░░░░░░░░&&░░░░░░░░░░░░░░██▒▒░░██▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▓▓██░░░░░░((&|!)░░░░░░░░░░░░
    ░░░░░░░░░░░░&&░░░░░░░░!!░░░░░░██▒▒░░▓▓░░██▓▓▓▓▓▓▓▓▓▓▓▓██▓▓▓▓▓▓▓▓▓▓░░░░&&░░||||||░░░░░░
      ░░░░░░░░░░&&░░░░░░░░!!░░░░░░░░██▒▒░░████▒▒██████████▓▓▓▓▓▓▓▓▓▓▓▓██░░░░░░░░░░░░!!░░░░░░
      ░░░░░░░░░░░░&&░░░░░░░░!!░░░░░░░░██▒▒░░▒▒██▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▒▒██░░░░░░░░░░░░░░░░░░░░
        ░░░░░░░░░░&|&░░░░░░░░░!!░░░░░░░░▓▓▒▒▒▒████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓██▒▒▒▒▒▒▓▓░░░░░░░░░░░░░░░░░░
        ░░░░░░░░░░░░&&░░░░░░░░░░!!░░░░░░░░██▒▒██░░██████▓▓▓▓▓▓▓▓██▒▒▒▒▒▒▒▒▒▒████░░░░░░░░░░░░░░
        ░░░░░░░░░░░░░░(&░░░░░░░░!!░░░░░░░░██▒▒██▒▒░░██▒▒████████░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░░░
        ░░░░!!░░░░░░░░&)░░░░░░░░!!░░░░░░░░░░████▒▒▒▒░░██▒▒▒▒▒▒░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░░░
        ░░░░░░!!░░░░░░░░&&░░░░░░░░░░░░░░░░░░░░░░██▒▒▒▒░░████░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░░░
          ░░░░!(░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██▒▒▒▒░░░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░
          ░░░░░░|)░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████▒▒▒▒░░██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░
          ░░░░░░&&░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██▒▒▒▒░░████▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒██░░░░░░░░
          ░░░░░░!!░░░░░░░░░░░░░░░░||░░░░░░░░░░░░░░░░░░░░░░██▒▒▒▒░░░░██▒▒▒▒▒▒▒▒▒▒████▒▒██░░░░░░░░
            ░░░░))░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██▒▒▒▒▒▒░░██▒▒▒▒████▒▒▒▒░░░░████░░░░
            ░░░░((░░░░░░░░░░░░░░░░░░░░&||&░░░░░░░░░░░░░░░░░░░░░░████▒▒▒▒░░████▒▒▒▒░░░░░░░░░░▒▒▓▓░░
              ░░^^░░░░░░░░░░░░░░&&░░░░░░░░&&░░░░░░░░░░░░░░░░░░░░░░████▒▒▒▒▒▒░░░░░░░░░░░░░░░░▒▒██
              ░░░░||&&░░░░░░░░░░░░&&&&░░░░░░&&░░░░░░░░░░||||░░░░░░░░░░████▒▒▒▒░░░░░░░░░░▒▒████░░
                ░░░░░░!!!!░░░░░░░░░░░░&&&&░░░░((()))░░░░░░░░||░░░░░░░░░░░░████▒▒▒▒░░▒▒████░░░░
                    ░░░░░░&&░░░░░░░░░░░░░░&&░░░░░░░░&&░░░░░░░░||░░░░░░░░░░░░░░████▒▒██░░░░░░░░
                        ░░░░░░░░░░░░░░░░░░░░&&░░░░░░&&░░░░░░░░||░░░░░░░░░░░░░░░░░░██░░░░░░░░░░
                              ░░░░░░░░░░░░░░░░&&░░░░░░&&░░░░░░░░||░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                                        ░░░░░░&&░░░░░░░░░░░░░░░░░░^^^^^^░░░░░░░░░░░░░░░░░░░░
                                              ░░░░░░        ░░░░░░░░░░░░&&░░░░░░░░░░░░░░░░░░
                                                                    ░░░░░░░░░░░░░░░░░░░░

Preface =============================================================================================

pre-preface: I wrote this a few years ago and once finishing I read something which made me think the conclusion I came to was "CNF vs DNF" format. I brushed it off as "p vs np is a compression problem" and never uploaded this. More of a backup than anything, but either way here it is. See: FOLLOW_UP.md

I am a programmer and computer nerd, and not a math nerd. The words you are about to read will from a programming point of view. From a lax, joking, and standards hating pupil of the world :]

Rarely do I read math related content. I barely read papers written in official formats for any fields. Occasionally interest like biology or an exploit write-up warrants a read. Excuse any misunderstanings or lack of math-based expressions. Boolean will be in C-like syntax, until it isn't.

Assumptions will be made about the reader's understanding of boolean logic, programming, algeabra and complex correlation.

The boolean satisfiability problem makes some assumptions I believe, and hope to show, are misconstrued.

Boolean Basics =============================================================================================

At the heart of boolean math, you have 2 core functions. These are the AND && and OR || functions. For simplicity I will refer to them as & and |. There are other operators, such as NOT ! groups () and even XOR ^. These represent a simple binary switch, a 1 or 0, on or off.

Let's explore.

AND

a & b

If a is true, and b is true, the statement is true.

Simple enough. How about the | operator?

OR

a | b

If a is true, or b is true, the statement is true.

If you have programmed or seen the P vs NP problem, you are no doubt familiar with the basics.

From there we get more complexities, groupings, variables. But all of these will be clearly calculatable, just set said variables to true and you're done.

Here we come to the ! operator.

NOT

!a

NOT: meaning we flip the value of a. If it was 1, it is now 0. If 0, now it's 1. It is easy to see if we set this to 0 then it will ultimately pass this simple example.

The operation of ! is what makes the boolean satisfying difficult. Later we will massively 'expand' on this issue. But for now we will keep going forward.

If you have

a & !a

There is no way to solve this. You can not have, and not have, a. It is a single state.

LAZY

Before we get into operations, I would like to simplify a couple more things. The first is important to the whole argument being made. Everything, EVERYTHING, in boolean math can be reduced to what we will call chains or and chains.

This is a larger boolean example than those above.

a & b | c & ( d & e)

We haven't touched too much on expansion and compression yet, but every boolean problem, no matter the operations, can be expressed as sets of &s separated by |s. With or without !s, even ^s. This is key.

You have 2 actual problems here. Each side representing an & chain.

a & b

|

c & d & e

Going forward, and in code, the & will be implied.

ab | cde is much easier to read without &. It will also become important in the way code behaves.

Note: | implies () in above example `(abc) | (cde)` which is clearer to read

Turing Solutions and Operations =============================================================================================

In progamming it is relatively easy to flip bits, move memory and reduce complexity of operations by using a binary format.

For example, I am running a x86_64 ISA, 64bit amd64 based CPU. The 64 bits used for data can be taken advantage of to reduce operations. A lookup-table referring to bits can be used to set values.

e.g.


// ( = 0x28 hex = 40 decimal
lookup_table[255] = { 0 };
...
lookup_table[33]  = 8; // 00001000 binary, ! points here
...
lookup_table[40]  = 2; // 00000010 binary, ( points here
lookup_table[41]  = 1; // 00000001 binary, ) points here
...
lookup_table[124] = 4; // 00000100 binary, | points here
...

t_master[j] |= lookup_table[(unsigned char)input[i]];

By doing this you can assign a bit value 1, 2, 4, 8...n per token. This limits you to only 64 tokens, since we have a 64bit machine. It does however, allow for quick instructions like the |= which will add the bit value to a variable. Unfortunately we do not have "infinite-bit" computers, but this method is used in computing today.

Note: add+ and subtract- can also be used, but we will stick with boolean operations for the sake of B-SAT

Now that we've established how we will be using values. We will do a check for more complex & ! | chains. We will limit the bits to 8 so examples are managable.

For a chain of abc the bits in memory would look like

11100000
abc_____

Back to our simple ! example

a!a

The unsolvable a but !a. This requires 2 variables, pos[itive] and neg[ative], for inclusions and exclusions.

pos = 10000000

neg = 10000000

There is a conflict here, both can't be true. Easy enough to test for: & pos and neg, if your value is non-0, there is a conflict and this can not be true.

1000 & 1000 = 1000

1000 != 0

So far the examples have all been a few operations per-input.

But I would like to ask this before we go any further.

Let's say your friend sends you a file. But you don't know what this file is. So you ask...

yyy: What's this file?
fff: It's 10MB
yyy: I see that, but I can't open it
fff: It's compressed. You have to unzip it
yyy: Oh, so it's not 10MB?
fff: No, it's 10MB

You spend time figuring out what the file is: tar, zip, 7z, rar - you decompress it. Now it's a 100MB file.

Did your friend send you a 10MB file? Or a 100MB file?

You surely can't use the 10MB version. It's only when you expand and unravel the encoded data that you have what something you recognize.

Clearly, I side with the 100MB file.

The reduction of expressions has been shown by various people in various fields, but the expansion is what I'm concerned with. I do not believe reduced problems adequately represent their size; nor present the amount of operations required obviously to the observer. "This is my thesis statement." [Planes, Trains, and Plantains]

With math we recognize what's going on without reformatting the original problem. Whether it's Σ, some other loop, function, or what we will be using here: distributive property.

Expand/Distribute/Condense/Compress/Simplify =============================================================================================

As previously stated, all boolean math can be expanded, with the distributive property, to chains of & statements separated by | statements. This is contrast to the commonly used CNF format, which uses | chains separated by & statements. Also, as previously stated, the only reason we're working these problems at all is because they have potential for a conflicting ! value in them.

The distribution of a & operator cleans up string syntax for parsing issues. As shown previously, this is also useful because you can set the bit for a given token in less space/variables.

a(b|c)

Becomes

(ab) | (ac)

Note there are now only 2 & chains instead of 3. The ab could be represented in memory as 11000000, and the ac as 10100000.

This is much like the distributive property for basic algebra.

4(1 + 4)

Becomes

(1 + 1 + 1 + 1 + 4 + 4 + 4 + 4)

Of course, we would not add, but instead multiply, because we know the shortcut.

Here is a larger problem.

abc((d|be)|c)(((bc|ae)d)ad|c)a(bd|dc)

Without going step-by-step, the end result looks like

(abcd|abcbe|abcc)(abcbcdada|abcaedada|ca)(bda|dca)

This is not a & chain separated by | yet though. So we continue.

(abcd|xxx|xxx)(abcbcdada|xxx|xxx)(bda|xxx)
(abcd|xxx|xxx)(abcbcdada|xxx|xxx)(xxx|dca)
(abcd|xxx|xxx)(xxx|abcaedada|xxx)(bda|xxx)
...
(xxx|xxx|abcc)(xxx|xxx|ca)(xxx|dca)

And the real problem at the end of it

single line
abcdabcbcdadabda | abcdabcbcdadadca | abcdabcaedadabda|abcdabcaedadadca | abcdcabda | abcdcadca | abcbeabcbcdadabda | abcbeabcbcdadaca | abcbeabcaedadabda | abcbeabcaedadadca | abcbecabda | abcbecadca | abccabcbcdadabda | abccabcbcdadadca | abccabcaedadabda | abccabcaedadadca | abcccabda | abcccadca

multi line
abcd   abcbcdada  bda
abcd   abcbcdada  dca
abcd   abcaedada  bda
abcd   abcaedada  dca
abcd   ca         bda
abcd   ca         dca
abcbe  abcbcdada  bda
abcbe  abcbcdada  ca
abcbe  abcaedada  bda
abcbe  abcaedada  dca
abcbe  ca         bda
abcbe  ca         dca
abcc   abcbcdada  bda
abcc   abcbcdada  dca
abcc   abcaedada  bda
abcc   abcaedada  dca
abcc   ca         bda
abcc   ca         dca

The total of actual & chains can be found by

(||)   (||)  (|)
 2+1   2+1    1+1
  3  *   3  *   2 = 18

The chains outside of the | inside each () signify what's really going on.

This is not a single 38 boolean input problem. This is a 18 set of & chains separated by | operators, 267 input problem. Though we look at it differently in our minds, rationalizing and compressing/decompressing data to simplify our understanding - this is a way more complex problem than it first appeared.

This example still lacks !s that are required to qualify as unsolveable. But these tokens could be anything. Simplified representations of !s that get translated to a x e.g. x = !a, or an enitre set of variables abc so that you only have to check 1 bit instead of 3. But this is all logic we create. Not all transmutations are efficient for every problem, but they are doable.

When a computer compiles code, we introduce our own condensing and logic so that it produces efficient and fast results: transmutation. In code, the if-statements we write become cmp, jne, and je instructions. Constantly checking your cmp and jmp in-line while checking has 2 issues: first, it adds operations while running; second, some of those operations may leave the code flow, which does not give us "worst case" per a run. Other algorithms use such techniques. I would say they are not suited for proof-of-concept code.

I want to get into these algorithms, but you may have wondered why ! examples have been so elusive up until now. I suppose it's time I tell you why :[

NOT Hell

NOT, ! - This may be the largest reason B-SAT distribution is so difficult. It seems innocent at first, just !a, a pos and neg 64bit variable, & them, no issue. But when you add the distributive property...

Take : !( (bc|ae) (a|b) )

expand ! to each side of either | in (|)(|)

( (!b!c|!a!e) (!a|!b) )

However... (!b!c|!a!e) isn't quiet the same as !(bc|ae).

In the orignal if you had bc or ae the statement could be true, but 1 of each could be wrong.

(!b|c)(!a|!e) follows the same logic. This means ! flips & and | symbols when expanded.

Following through gives us : ( (!b|!c)(!a|!e) | (!a!b) )

This means 3 | operators - compared to 2 before.

So instead of what appeared to be 4 branches of | chains

(|)  (|)
1+1  1+1
2   *  2 = 4

There are over 4x more branches to solve the original !() - This ! drastically reduced the actual size of this problem.

(|)   (|)   (|)
2+1    2+1    1+1
3    *   3    *  2 = 18

Another example : !( !(bc|ae) d (a|b) ) Even worse - double !s

Expand non ! : !( !(dbc|dae) (da|db) )

This becomes... !( (!d|!b|!c)(!d|!a|!e) (da|db) )

then ( !(!d|!b|!c) | !(!d|!a|!e) | !(da|db) )

then further ( !(!d|!b|!c) | !(!d|!a|!e) | ((!d|!a)(!d|!b)) )

Now !(!x) is a double negative. Which means all of the | and ! must go away. If any one of them is false, the whole expression becomes true

and that's NOT what you want

(dbc)|(dae)

Well, wait, that isn't right. Originally we had !( !(dbc|dae) ...) so if EITHER dbc OR dae became true, we ! that true we fail the inside chain, which the oustside !false passes - we have to add the | back in

( (dbc|dae) (!d|!a)(!d|!b) )

but even !( !(dbc|dae) (da|db) ) is wrong

! expansion order

In our original !( !(bc|ae) d (a|b) ) if d was false, the outside !() would become true. So the order of expansion matters too.

Trying again...

Start with : !( !(bc|ae) d (a|b) )

Expand outside ! : ( (bc|ae) | !d | !(a|b) ) The inside double !(!()) cancels out and the implied &s become |s

Expand the newly added !() : ( (bc|ae) | !d | (!a!b) )

Is this right?

compare ! distribution

!( !(bc|ae) d (a|b) )

So if bc were true, we would ! it. The inside !(true) would become false, failing the rest we would !false, and be true.

If bc OR ae were not true, say !c and !e we would have a !false: true. !true: false.

Move to d, if that were true, !true: false.

Move to (a|b), say either are true we end up with !true - and we fail overall.

Again:

If bc or ae chains were true...we're fine. If d is false...we're fine. If both in the a|b set fail...we're fine.

All of these conditions seem to be true for : ( (bc|ae) | !d | (!a!b) ) and !( !(bc|ae) d (a|b) )

The original had 4 total (|) paths. The real representation happens to have 4 as well.

But, you can see in a simpler example that we could easily increase our total | count.

!(abcdefg) becomes !a|!b|!c|!d|!e|f|!g

enough of NOT

It would seem ! hides and adds most of the complexity behind statements. & become |, | become &, includes become excludes. !! doubles up and wreak inside ()

Algebra expansion is less thought provoking

4(5*1+1)

We wouldn't multiply 5 by itself 4 times. But the () does mean we multiply 5 by 4 once in this case.

The only "positive" thing to be said, is parsing is a bit easier when you can flip & to |, and | to &. But the actions involved with !() expansion imply a lot of complexity in a single token.

! is the bane of boolean logic. It changes entire regions into | segments.

XOR

I won't go into much on XOR ^, but similar to NOT ! there are hidden expansions and compressed problems.

a^b expands to (a && !b) | (!a && b)

And further down the line

!( !(a^b | c^a) d^e)

!( !( (a!b | !ab) | (c!a | !ca) ) (d!e|!de) )

!( (!aba!b) (!cac!a) | !ded!e)

( !(!aba!b) | !(!cac!a) !(!ded!e) )

( (a|!b|!a|b) | (c|!a|!c|a) (d|!e|!d|e) )

( (a|!b|!a|b|c|!a|!c|a) (d|!e|!d|e) )

We went from 2 | paths to 10. Of course in this example we could reduce more. There are duplicates and counters like !a|a

But it is another very compressed represenation of what is actually going on under the hood. Even executing a single ^ is a not a simple task.

Algorithmic Basic BSAT Solver

Given an & chain, e.g. abcde!f, the solution is a simple loop to perform a few operations per input.

do_bsat(...)

...

  register int i;
  register unsigned long int pos = 0;
  register unsigned long int neg = 0;
  register unsigned char cur = 0;

  for (i=0; i < strlen(b); i++) {
     cur = (unsigned char)b[i];
     switch(cur) {
       case '!':
         cur = (unsigned char)b[++i];
         neg |= lookup[dec(cur)];
         break;
       default:
         cur = (unsigned char)b[i];
         pos |= lookup[dec(cur)];
     }
  } // for strlen(b)

  return pos & neg;

Algorithmic Expansion - untie NOTs

NOT ! distribution should be done first to avoid any logic from backwards distributed values and complexity. This makes the code easier to understand for humans too :]

 for (i=0; i < in_sz ; i++) {
    cur_tok = in[i];
    switch (cur_tok) {
      case ' ':             continue;
      case '!':
         cur_tok = in[++i];     // look ahead
         if (cur_tok == '(') {  // !()
           l_pos++;
           not_layer++;
         } else                 // !x
           ADD(dec('!') );
         break;
      case '(':
         l_pos++;
         break;
      case ')':
         l_pos--;
         if (l_pos < not_layer)
            not_layer--;
         break;
      case '|':
         i++;   // skip expected next |
         if (is_notting)    continue;
         else               or_tot++;
         break;
      case '&': // only used with !() dist
         i++;   // skip expected next &
         if (is_notting) {
           cur_tok = dec('|');
           or_tot++;
           break;
         }
         continue;
      default:  // cur_tok > 16
         if (is_notting)    ADD(dec('!') );
         break;
    } // switch(cur_tok)
    ADD(cur_tok);
  } // for strlen(in)

Algorithmic Expansion - distribute x()x

void do_dist(chain *or, p_tree p, int e) {

  int s;
  for (s=p.start; s <= e; s++) {
    or[s].pos |= p.pos_dist;
    or[s].neg |= p.neg_dist;
  }
  p.pos_dist  = 0;
  p.neg_dist  = 0;
} // void do_dist

...

  while (in[i] != 0x0) {
    cur_tok = in[i];
    switch (cur_tok) {
      case 1: // (
         p_layers[l_pos].start = chain_pos;
         p_layers[l_pos].pos_dist |= cur_p;
         p_layers[l_pos].neg_dist |= cur_n;
         cur_p = cur_n = is_disting = 0;
         l_pos++;
         break;
      case 2: // )
         if (is_disting == 1) { // NOT (x|x)
            p_layers[l_pos].pos_dist |= cur_p;
            p_layers[l_pos].neg_dist |= cur_n;
         } else {
            or_ch[chain_pos].pos |= cur_p;
            or_ch[chain_pos].neg |= cur_n;
            chain_pos++;
         }
         is_disting = 1;
         cur_p = cur_n = 0;
         l_pos--;
         if (l_pos == 0) { // back out to layer 0
           csz_pos++;
           is_disting = 0; // ok bc ( assumes disting
         } else {
           do_dist(or_ch, p_layers[l_pos], chain_pos);
         }
         break;
      case 4: // |
         if (l_pos > 0) // need to add layer 0
           c_szs[csz_pos]++;
         if (is_disting == 1) {
            p_layers[l_pos].pos_dist |= cur_p;
            p_layers[l_pos].neg_dist |= cur_n;
            do_dist(or_ch, p_layers[l_pos], chain_pos);
         } else {
            or_ch[chain_pos].pos |= cur_p;
            or_ch[chain_pos].neg |= cur_n;
            chain_pos++;
         }
         cur_p = cur_n = is_disting = 0; // has to be x|x
         p_layers[l_pos].start = chain_pos + 1;
         break;
      case 8: // !
         // these should only be !x now
         cur_tok = in[++i];
         cur_n |= cur_tok;
         break;
      default:  // cur_tok > 16
         cur_p |= cur_tok;
         break;
    } // switch(cur_tok)
    i++;
  } // while strlen(in)

...
// wrap up code excluded in doc
// do_dist once more for layer 0
...

Algorithmic OR chain to AND chain conversion

A recursive function with the right multi-array data can work through the hidden complexities of the | chains we've determined.

chain expand(...)
{
  for (n=0 ; n < cur_sz ; n++) {
    cur_ch.pos = or_ch[or_start + n].pos | dist.pos;
    cur_ch.neg = or_ch[or_start + n].neg | dist.neg;
    if (csz_idx < c_szs[0]) { // max or-block (||)(||)(|)
      chain tmp_ch = expand(zero_solves, or_ch, c_szs, or_start + cur_sz, csz_idx + 1, cur_ch);
      cur_ch.pos |= tmp_ch.pos;
      cur_ch.neg |= tmp_ch.neg;
    } else {
      if ( (cur_ch.pos | dist.pos) & (cur_ch.neg | dist.neg) ) {
        // use case here
      } 
    }
  } // for n < cur_sz
} // func

Gate Logic and Problem Solving

Back to the basics on boolean operations. These are hardware circuits that allow electricity to flow.


a && b              a || b              a ^ b
1 op                1 op                1 op

a--\                a--\                a--\
    ---if both          ---if either        ---if one
b--/                b--/                b--/
      
for all there are 2 inputs with 1 output
      
  0 0 = 0            0 0 = 0            0 0 = 0
  0 1 = 0            0 1 = 1            0 1 = 1
  1 0 = 0            1 0 = 1            1 0 = 1
  1 1 = 1            1 1 = 1            1 1 = 0
      
1 true condition    3 true conditions   2 true conditions

There are different outcomes for diferent operations. Only the & is true under a single condition. This makes finding true boolean statements for & chains easy to do. The | and ^ on the other hand have multiple states they could have come from, and only a single value telling you 1 of multiple states happened.

Let's solve some algebra problems to explain why these multiple answers are an issue.

x * y = ?

How do you come up with an answer? You don't have enough information. You know there are some variables and a multiplicaiton operator.

x ? y = ?

No way to come up with an answer, you don't even know the operator.

x * y = 100

Still fairly hard to deduce, a few answers come to mind, now you may be able to guess values.

4 * y = 100

Now we have enough information to come up with a solution for y.

Similarly, with boolean |

a | b = ?

No way to come up with an answer

a | b = 0

Must mean both a and b are 0, easy to solve

However

a | b = 1

There are 3 possible answers, and even if we are given a single value

1 | b = 1

There are 2 correct states left

Alternatively: & suffers the same problem when being solved for false.

Even by cheating, like in the above algorithms, and reducing the | to an ^ for 2 total possible answers instead of 3, we can not tell which is the case for an outcome of true.

This MUST be guessed, and given the unknowns it is impossible to be certain the first guess.

Arguments and Conclusion

It can be shown with the Basic BSAT solver that simple & chains are solvable in Polynomial-Time.

It can be shown with logic gates, that | has multiple true conditions, while & has multiple false conditions.

If the multiple | conditions are computed by distribution, into & chains such that the solution is true, then the problem can be solved in Polynomial-Time.

This table shows the exponential growth of problems being solved, in both: basic form, with expanded & chains as input; distributive form with reduced input.

input size basic_solve distributed_solve basic_instruction_count dist_instruction_count
518400 0.040417 0.005921 207373123 11781296
388800 0.032709 0.005744 139141279 13133453
155520 0.022558 0.002048 51339896 5432433
77760 0.005535 0.001331 25867540 4576715
38880 0.002473 0.000469 12321431 1581202
29170 0.001982 0.000175 9208968 1474001
19440 0.001473 0.000124 6178766 1366859
9720 0.00054 0.000185 3067639 616964
6480 0.000354 0.000162 2113132 581251
3240 0.000044 0.000034 1130063 330407

[ github: xlsx files should be with this markdown file. more data on function calls included ]

The slow down from distributed to basic can be attributed 2 factors: strtok and input size. The strtok operation is used to read lines from a buffer. These lines are bigger in memory, as the distributive solution has these longer string values reduced to a single 8 byte value no matter the size. The 8 byte value is computed once and used thenceforth. This results in the distributive solution having a reduced time to solve and instruction count. If the basic solver was given pre-binary-converted input, it would be: read 2 8-sized bytes, & 1. This would take considerably less time, but the growth of time-per-input would remain steady.

Solving for a simple & only expression takes relatively short time. The introduction of | complicates things by reducing what is actually multiple possibilities of & statements to a easier to read format for us, the user.

The placement and number of |s in a problem can make the difference between an end list of 5000 & chains or 500,000.

The amount of solvable input follows the time to complete a given Boolean SATisfiable test.

Back to the question of the friend who gave you a 10MB file. Someone I asked this question to responsed with, "I would say you have a 10MB file, but 100MB of data." I believe this is an accurate way to represent the problem at hand. The input may be 100 tokens of boolean expressions, but the data it represents could be in the 1000s or more. An &(|) statement is as much a reduction as 2^2 meaning 22, or 2^4 meaning 2222. The big difference being, while arithmetic gives infinite results - boolean does not. There is a single result, with 2 possile values, from a myriad of input. The common CNF format is | chains separated by &s, which is still a heavily reduced set of problems.