/Reduction-to-SAT-Problem

University algorithm-oriented project that accurately detects complex networks of influential mafia families represented as vertices and edges of rare graphs. The algorithms are being reduced to the Satisfiability Problem using a boolean formula that will be evaluated by an SAT Solver, deciding whether the initial clause has been satisfied or not.

Primary LanguageJava

*******************************************************************************
                       GIGEL AND THE MAFIA - Universiy Project
                            Algorithm Analysis 2020-2021

                            Fatu Miruna-Maria 321CA
*******************************************************************************

    The main goal of this project is to conduct an investigation on the mafia 
society in the town, which aims to detect complex networks of influential mafia 
families, ending with the arrest of as few families as possible through which 
all its allied families will be discovered and abolished. In this respect, the
problems encountered in this investigation process will be reduced to the SAT 
problem, querying and asking for help of an Ancient Oracle at every step in
order to validate our proposed solution. 
    Generally, the satisfiability (SAT) problem consists of determining if a
Boolean formula, that will be later evaluated by the Ancient Oracle, is 
satisfiable or unsatisfiable using the conjunctive normal form - a conjuction
of clauses, where every clause is a disjunction.


            -------- ~ TASK 1 - Planting the Spies ~ ---------

    The purpose of the first task of the project is to infiltrate a number of 
spies (k) in the networks of mafia families. In the interest of the mission to be 
accomplished and the spies not to be discovered, the families (nodes) that have
unidirectional connections (edges) between them must necessarily have different 
spies. This property points out the similarity of the current problem with the 
famous problem of k-colorability by reducing to the SAT problem.

    Consider the following notations:
        N - number of families      (nrFamilies)
        K - number of spies         (nrSpies)
        M - number of relations     (nrRel)

• formulateOracleQuestion()
    
    This method builds the whole key plan to successfully distribute spies to 
families by creating valid clauses, as will be seen in the implementation logic:

    ▶ The logic of choosing variables:

        For each mafia family there will be a k number of numerical variables,
                                            where k - number of available spies.
        -> Total variables = N * K 

        Demonstration:
                                            N -> 1 : N

    N = 1   var: 1 - family no.1 can be investigated by the spy no.1
            var: 2 - family no.1 can be investigated by the spy no.2
                                    ...
            var: K - family no.1 can be investigated by the spy no.K
    N = 2   var: K + 1 - family no.2 can be investigated by the spy no.1
                                    ...
            var: (N-1) * K + K: family no.2 can be investigated by the spy no.K
                                    ...
    N = N   var: N * (K-1) + 1: family no.N can be investigated by the spy no.1


    ▶ The logic of the created clauses:

            In order to create the normal conjunctive form to be analyzed by 
         the oracle, 3 types of clauses were designed:

        1. Clauses stating that every family must have at least one assigned 
        spy.

            1 ∨ 2 ∨ 3 ∨ 4  -> Assuming that there are four spies available for
                ...                   the mission
                ...
         N * (K-1) + 1 ∨ 1 * (K-1) + 2 ∨ 1 * (K-1) + 3  1 * (K-1) + 4

        2. Clauses requiring that each family should not have more than one 
        assigned spy.

            1 ∨ 2 
             ...         -> Example    
            1 ∨ K
            
        3. Clauses that establish that no family that has a direct connection
        with another one has the same assigned spy. 
             
    ▶ Time Complexity:

        The first nested "for" instructions in this method, used for creating
        the first two boolean clauses have:

            O((K-1)*(K-2)/2 + K) * O(N) 
                -> O(K^2 + K) * O(N) -> O(K^2)*O(N) 
                        --> O(K^2 * N)

        The second nested "for" instruction through which the third set of
        clause is created has:

                 --> O(N * K * adjacent edges)

      ` ~ Final complexity: O (N*K^2) + O(N * K * adjacent edges)

• readProblemData()
    The data in the input file is processes by constructing an unidirectional 
graph where the families represent the nodes, and the edges are represented by 
the relationships established between the families. The data extracted from 
the input is processed in O(M) complexity;

• decipherOracleAnswer()
    Extract the oracle answer which consists of a list of negative or positive 
numerical variables, where the negative variables are marked as FALSE, while
the positive ones are marked TRUE.
    - O(N*K) time complexity
        
• writeAnswer()
    The method writes to the output file the number of the assigned spy to each
family 
    - O(N) complexity;

    -------- ~ TASK 2 - Investigating the extended families ~ ---------

    The mission of the second task is to investigate an extended family of a 
given dimension. To simplify this problem, families and the connections between
them can be seen as a graph in which the extended family is a clique
of size K.

    Consider the following notations:
        N - number of families      (nrFamilies)
        K - number of spies         (nrSpies)
        M - number of relations     (nrRel)

    • formulateOracleQuestion()

    In the respect of implementing a process of reducing instances of the 
current problem to instances of the SAT Problem, there will be created CNF
boolean clauses completing an entire final formula that will be satistiable
only if the graph has a clique of size K.

▶ The logic of choosing the variables:

    A convinient way to start the implementation is by choosing a suitable
set of numerical variables, as following:

    - for every 1 <= i <= k there will be a list of variables x_iv 
                            ,where  i - index of the ith node in the clique
                                    v - current node

    Total number of variables = N * K

    Demonstration:          
                                            N -> 1 : N

    N = 1   var: 1 - family no.1 might be part of the clique
            var: 2 - family no.1 might be part of the clique
                                    ...
            var: K - family no.1 might be part of the clique
    N = 2   var: K + 1 - family no.2 might be part of the clique
                                    ...
            var: (N-1) * K + K: family no.2 might be part of the clique
                                    ...
    N = N   var: N * (K-1) + 1: family no.N might be part of the clique

▶ The logic of creating the clauses:

    The querying process of the Ancient Oracle involves defining three
logical constraints that align with the bounds of the problem:

    1. Clauses that implies the fact that there must be exactly one i_th
vertex in the clique for each i between 1 and the given dimension K.

                x_1v ∨ x_2v ∨ x_3v ∨ x_4v ,     1 <= i <= K

    2. A set of clauses involving that no vertex can be at the same time 
the i_th and the j_th (where i != j) vertex of the clique. In other words,
two different vertices can not be both the i_th vertex in the clique. 

                ¬x_iv ∨ ¬x_jv , i != j
                ¬x_iv ∨ ¬x_iw , v != w

    3. A list of clauses that emphasize the fact that if there in no edge
between v and w, the nodes v and w can not belong to the same clique.

                ¬x_iv ∨ ¬x_jw , i != j, v != w

▶ Time Complexity:

    The first set of "for" instructions in which the first type of clauses
are formed has a complexity of O(N*K).
    The second nested "for" intruction in this method contains two "for"
loops each of them having N * K operations. Thus, the entire "for"
construction will have O(N*K*N*K) which means a O(N^2*K^2) time complexity.

    Also, another way of computing the complexity of the instructions is by
counting the number of created clauses due to the fact that the purpose
of all "for" instrutions is to create suitable clauses for the CNF
formula.

    Total time complexity: O(N*K) + O((N^2*K^2)

• readProblemData()
    The data in the input file is processes by constructing an bidirectional 
graph using an adjacency matrix where the families represent the nodes, 
and the edges are represented by the relationships established between the
families. The data extracted from the input is processed in O(M) complexity 
(read the data & create the adjacency matrix), while creating the variables
takes up to O(N*K) complexity;

• decipherOracleAnswer()
    Extract the oracle answer which consists of a list of negative or positive 
numerical variables, where the negative variables are marked as FALSE, while
the positive ones are marked TRUE.
    - O(number of total variables) -> O(N*K) time complexity
        
• writeAnswer()
    The method writes to the output file all families' index that make up the
clique of size K.
    - O(N) complexity;

                -------- ~ TASK 3 - Mafia arrest ~ ---------

    In order to constitute an efficient investigation and the number of arrests
to be minimal for which the mafia network will be abolished, the implementation 
of this problem will be reduced to two main ideas:

        1. Creating a complementary graph of the complementary problem of this 
         task

        2. Reducing the current problem to a previous task - task2 in order to
        find out the maximal clique of the complementary graph of this problem

    The most important reason of reducing this problem to the SAT problem using 
a complementary graph is the fact that by successively querying the Oracle we 
can find the maximal clique of the complementary graph. Thus, the families 
(nodes) we are looking for are the ones that do not belong to the maximal 
clique because they represent the nodes that make up the minimal clique of the 
main graph which represent the families that should be arrested.

    Consider the following notations:
        N - number of families      (nrFamilies)
        K - number of spies         (nrSpies)
        M - number of relations     (nrRel)

▶ Time complexity: 

    By successively querying the Oracle and reducing the current problem to a
previously solved task -task 2-, the time complexity should be:

    Main complexity:

    O(N * p) , p = how many times the Oracle has been queried (by successively
                    using task2 functionalities)

• readProblemData()
    The data in the input file is processes by constructing an bidirectional 
graph using an adjacency matrix where the families represent the nodes, 
and the edges are represented by the relationships established between the
families. The data extracted from the input is processed in O(M) complexity 
(read the data & create the adjacency matrix).

• reduceToTask2()
    Writes to the input file of task 2, creating the adjacency matrix of the
complementary problem.
    -> O(N^2) time complexity

• extractAnswerFromTask2()
    Extracting all processed data from the task2 output file.
    -> O(dimension of the maximal clique found using the complementary graph)
        
• writeAnswer()
    The method writes to the output file all families' index that should be 
arrested.
    - O(N) complexity;


        -------- ~ BONUS - Mafia arrest 2nd stage ~ ---------

    This task closely follows the implementation logic of task 2 and 3, 
reducing the current problem to the SAT problem in a very similar way. Thus, 
the process of creating the variables and clauses follows the same algorithms
as the ones previously described for task 2, considering some essential changes:

        1. Approximating the value of the maximal clique size (K), reducing the
    current graph to its complementary problem

        2. Using the Weigthed Partial Max-SAT input format, the created clauses
    can belong to two different categories: hard clauses or soft clauses

        Consider the following notations:
        N - number of families      (nrFamilies)
        K - number of spies         (nrSpies)
        M - number of relations     (nrRel)

▶  The algorithm for approximating the value of maximum k can be demonstrated
as follows:

    Considering that for each node in the complementary graph there will be 
added edges between them and all the remaining nodes in the graph, are obtained
the following operations:

   - Adding N-1 edges -> there will be edges between the first node and all
                        the remaining nodes

                        -> clique dimension -> 2

   - Adding N-2 edges -> two nodes in the graph will have connections with all
                        the remaining nodes in the graph

                        -> clique dimension -> 3

                    .......

   - Adding N - k  -> clique dimension -> k + 1

    Therefore, by repeatedly subtracting the possible edges added to the graph
from the total number of edges of the graph (complementaryEdges), the 
approximate value of the maximal clique dimension of the complementary graph
will be determined.

▶ Determining the type of the clauses:

    The first set of clauses created will be the only set of soft clauses, 
each of them having a different weight than the previous one. By doing this,
the Oracle will try to fulfill as many clauses as it can in order to find the
maximal clique.

    Hard clauses are represented by the last two sets of clauses created in the
formulateOracleQuestion() method so that they establish the minimum necessary 
conditions that must be met so that a clique is valid. These refer to clauses 
no.2 and no.3 of the clauses used for task 2 (more details are provided in the
explanation of task 2).

 For further information, comments in the code should clarify any ambiguity.