Subset Sum
Installation
Via composer
composer require lmn/subsetsum
Overview
This algorithm works only for positive integers as target and set values.
Complexity
We compare this algorithm to algorithm where we try all possible combinations. In this base case, the complexity is exponential.
If we use dynamic programming we will end up with pseudo polynomial complexity O(n * m)
where n
is number of items in source set and m
is number of target increments.
Number of operations is on logarithmic scale. Calculating all combinations may seem to be linear but on logartihmic scale it means it is growing exponentialy. The same applies for dynamic programming, it may seem to be logarithmical, but on logarithmic scale it may be linear or polynomial (in this case it's linial because number of target incremest is fixed number = 100)
API
We recommend using SubsetSumBuilder
for generating your subsets. To create a builder just call SubsetSum::builder()
static method.
<?php
$subsetTable = SubsetSum::builder()
->withSet([1, 2, 5])
->withTarget(8)
->build();
$subset = $subsetTable->getSubset(8);
?>
You can also use other static methods of SubsetSum class, if you prefer to create your subset tables other way
SubsetSum::create
andSubsetSum::createWithRepetition
.
Set Of Values
Provide a set of possible values used in subset by calling withSet
method of a builder.
<?php
$subsetTable = SubsetSum::builder()
->withSet([1, 2, 5])
...
?>
Target
Subsets are calculated only to maximum target value. You have to define this maximum target value, by calling withTarget
method. This method accepts one argument and that is the maximum targe value.
To define the maximu target of a table, call withTarget
method with first argument of a maximum target value and second argument of taraget spacing.
<?php
$subsetTable = SubsetSum::builder()
->withTarget(800)
...
?>
To optimize calculation we try to use
ext-gmp
php extention. If this extention is not presen, then subset will still be calculated correctly, but if will not be fully optimized. To optimize this algorith manualy, see paragraph below.
If you dont want to install ext-gmp
php extention, and you want to optimize your algoritmus as much as possible, then you can manually set spacing for calculating targets. Use withTargetSpaced
method of a builder. Spacing should be greatest common divider
of set values and final target.
<?php
$subsetTable = SubsetSum::builder()
->withTargetSpaced(800, 100)
...
?>
This example will create targets spaced out by 100
{0, 100, 200, 300, 400, 500, 600, 700, 800}
You can also specify a custom target set unevenly spaced out by calling withTargetSet
method of a builder.
<?php
$subsetTable = SubsetSum::builder()
->withTargetSet([1, 2, 3, 5, 8, 13, 21, 34])
...
?>
Create Subset Sum
To create a Subset Sum table call build
method of a builder;
<?php
$subsetTable = SubsetSum::builder()
...
->build();
?>
Create Subset Sum Table With Repetition
To create a Subset Sum With Repetition table call buildWithRepetition
method of a builder;
<?php
$subsetTable = SubsetSum::builder()
...
->buildWithRepetition();
?>
Find Only Exact Match
Sometimes you may wish yo find subset only if it matches exactly the target value. You can use onlyExactSum
method of a builder. If exact match cannot be found, you will receive empty array after calling $subsetTable->getSubset();
.
<?php
$subsetTable = SubsetSum::builder()
...
->onlyExactSum();
?>
Calculate Result Subset
To calculate a subset from table call getSubset
method of a subsetTable. This method will return a array of values (subset) that can be used to sum to maximal target value.
<?php
$subsetTable = SubsetSum::builder()->...->build();
$subset = $subsetTable->getSubset()
?>
To receive a subset for target smaller then max target, you can call getSubsetForTarget
with target value as a argument.
<?php
$subsetTable = SubsetSum::builder()->...->build();
$subset = $subsetTable->getSubsetForTarget(100)
?>
Custom Compare Function
Sometimes you may want to change the algorithm that decides what is the best cell value. TO change this decision making function call withComperable
method of a builder with an argument of a class that implements Comeparable
interface.
You can also use
withComparableFunction
method of a builder with an callback argument.
<?php
$subsetTable = SubsetSum::builder()
->withComperable(new CustomComperable())
...
?>
Builder provides some predefined comperables.
- prefer greater
- prefer lower
Prefer Greater
To pick a subset that is equal or greater then target use preferGreaterSum
method of a builder
<?php
$subsetTable = SubsetSum::builder()
->preferGreaterSum()
...
?>
Prefer Lower
Tto pick a subset that is equal or lower then target use preferLowerSum
method of a builder
<?php
$subsetTable = SubsetSum::builder()
->preferLowerSum()
...
?>
Algorithm
To compute a subset sum in a polynomial time you have to use dynamic programming. This method will find subset in O(n * m)
where n
is number of items in source set and m
is number of target increments.
Let's assume you want to find a subset equal to 100
and you can use only values in set setOfValues = {10, 20, 50, 70}
. You would divide the target to smaller pecies, Actually, you would want to use the greatest common diviser
of a set of values to create those smaller target pieces. In this example, the GCD would be 10
and our target pieces would look like this setOfTargets = {0, 10, 20, 30, 40 50, 60, 70, 80, 90, 100}
. So our n
would be equal to count(setOfValues) == 4
and our m
would equal to count(setOfTargets) == 11
.
In this case it's faster to compute all combinations of values set (4 * 3 * 2 * 1 = 24) in comparison to our approach (11 * 4 = 44). If the set of values gets larger, we will see the advantage of our approach.
For Subset Sum
To calculate subset sum without repetition, we have to create a table where rows are items of set and columns are target pieces. We will fill this table from top to bottom and from left to right. In other words, we will iterate every row and for every row we iterate every column.
0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
10 | 0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
20 | 0 | 0 | 0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 |
50 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 10 | 20 | 30 |
70 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
Cell value equals to how close we can get to target number with current rows filled. So if the cell value is 0
it means we can create subset that will produce sum equal to column value. If the cell value is 10
that means we can produce a subset, which sum is 10 less then column value. To fill a cell value we will use this algorithm:
- subtract current cell's column value and row value. Let's call it
remainder
. This is a scenario where we assume that the best result could be done by using only subset of size one. The only item in this subset would be current row's value. - if
remainder
is greater then 0, then take value in row above in column of valueremainder
. Let's call this valueremainder optimized
. This is a scenario, where we create a subset with current row value and the best subset forremainder
. - last value is cell in current column and row above. Let's call this value
skip current
. This is a scenario, where we don't want to include current row value to a finnal subset and we expect to get better result by using only previous rows.
Compare all these three values (remainder
, remainder optimized
and skip current
), pick the value, that is the best. In this case we will pick a value that's closest to zero.
0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
10 | 0 | remainder optimized | 10 | skip current | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
20 | 0 | 0 | 0 | remainder | |||||||
50 | |||||||||||
70 |
0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
10 | 0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
20 | 0 | 0 | 0 | 10 | |||||||
50 | |||||||||||
70 |
0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
10 | 0 | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
20 | 0 | 0 | 0 | 0 | |||||||
50 | |||||||||||
70 |
For Subset Sum With Repetition
To create a subset where items can repeat, we will have to flip the table axes. Now we will use set items as columns and target values as rows. The assumption is, that there will always be more target values then set values. This gives us opportunity to repeat a set value in finnal subset. I will also include a best
column. This is just a visual guide to help you understand how the algorithm works. This columns is not physically present in the data structure.
20 | 50 | 70 | best | |
---|---|---|---|---|
0 | -20 | -50 | -70 | -20 |
10 | -10 | -40 | -60 | -10 |
20 | 0 | -30 | -50 | 0 |
30 | -10 | -20 | -40 | -10 |
40 | 0 | -10 | -30 | 0 |
50 | -10 | 0 | -20 | 0 |
60 | 0 | -10 | -10 | 0 |
70 | 0 | 0 | 0 | 0 |
80 | 0 | -10 | -10 | 0 |
90 | 0 | 0 | 0 | 0 |
100 | 0 | 0 | 0 | 0 |
Filling the table is basically the same as filling the table of classig subset table. We will iterate trough every row and for each row iterate trough every column. When the row is filled we will fill the best column at the current row. Best column just stores the best result in a row. We will consider only two values
- subtract current cell's column value and row value. Let's call it
remainder
. This is a scenario where we assume that the best result could be done by using only subset of size one. The only item in this subset would be current column's value. - if the remainder is greater then 0, then take the value in the best column in the
remainder
row. This is a scenarion where we add current column value to the best subset for theremainder
target. Let's call this valueremainder optimized
Current cell will have the better of the two values. In this case we will pick a value closest to zero. When the row is filled we will fill the best column with the best value in the row.
20 | 50 | 70 | best | |
---|---|---|---|---|
0 | -20 | -50 | -70 | -20 |
10 | -10 | -40 | -60 | -10 |
20 | 0 | -30 | -50 | remainder optimized |
30 | -10 | -20 | -40 | -10 |
40 | remainder | |||
50 | ||||
60 | ||||
70 |
20 | 50 | 70 | best | |
---|---|---|---|---|
0 | -20 | -50 | -70 | -20 |
10 | -10 | -40 | -60 | -10 |
20 | 0 | -30 | -50 | 0 |
30 | -10 | -20 | -40 | -10 |
40 | 20 | |||
50 | ||||
60 | ||||
70 |
20 | 50 | 70 | best | |
---|---|---|---|---|
0 | -20 | -50 | -70 | -20 |
10 | -10 | -40 | -60 | -10 |
20 | 0 | -30 | -50 | 0 |
30 | -10 | -20 | -40 | -10 |
40 | 0 | |||
50 | ||||
60 | ||||
70 |