Assignment Task
In the multiset partition problem we ask the question can we divide a multiset S into two partitions (S, and S2) that have equal sums? Note that it is not required that |S| = |S2l: in the extreme case, there may be one member of S in S, and all other members of S in S2. Recall also that multiset implies that S (and consequently S, and S2) can contain duplicates, unlike regular sets. For example, if S = {2,2,1,10,5,15,4,1} we can produce S1 = {5,15) and S2 = {2,2,1,1,10,4), which both sum to 20 (i.e., the difference between the sums of S, and S2 is 0).
This childishly simple problem is computationally expensive if we seek to find the best answer. For a set of cardinality n there are 2″ possible subsets, so a brute force approach (ie., trying all possibilities) would lead to exponential time complexity 0(2″). It is traditionally framed as an NP-complete decision problem (i.e.. producing a yes/no result, depending upon whether an equal-sum partition is or is not possible). We can also frame it as an optimisation problem that aims to minimise the absolute difference between the two sums. For instance, if S = {2,6,15,10) then the optimal partition is S1 = {2,15) and S2 = {6,10), which have sums of 17 and 16 respectively. giving an absolute difference of 1. It is the optimisation variant of the partition problem that we shall consider here.
We will compare several approaches to see how well they partition S to minimise the difference between sums over a range of cardinalities. Our multiset S can be implemented as an array in any programming language of your choice (choose wisely, the use of OO may unnecessarily complicate matters). The array should be populated with randomly generated numbers € N in the interval [10 x [S]..100 x [SI]. We will test the following 8 cardinalities of S: 8, 16, 32, 64, 128, 256, 512, and 1024. As our objective is to divide S into two partitions with equal sum, we will define the ideal sum to be equal to the sum of all values in S divided by 2 (although this may not be achievable in practice for a specific multiset S) and measure how close we can get to this ideal.
The Alogorithm to be compare are
A. Add the first half of the values in S to S, and the second half of the values in S to S2 (ie., split the array in half).
B. Add all values stored in even array indices to S, and all values stored in odd array indices to S2. Note: 0 is even.
C. Add all even values in S to S, and all odd values in S to S2.
D. Add the first value in S to S, and the second value in S to S2 and then iterate through the remaining values, adding them to whichever partition currently has the smallest sum.
E. Sort S into ascending order, then add values stored at even array indices to S, and values stored at odd array indices to S2.
F. Sort S into ascending order, then add the first value in S to S, and the second values in S to S2 and then iterate through the remaining values, adding them to whichever partition currently has the smallest sum.
G. Sort S into descending order, then add the first value in S to S, and the second values in S to S2 and then iterate through the remaining values, adding them to whichever partition currently has the smallest sum.
H. Iterate through all possible two-way partitions to find the one that has the lowest deviation from ideal (note: this may run indefinitely for large [S], so run only as far as practically possible).
I. Do some research and find a provably better approach (lower difference between sums) than A..G.
Task
1. Provide a written example walkthrough (similar to those given above above) for a small [S], such as 8, to make sure you fully understand how each method will work in practice (up to 50 words for each method, on average).
2. Provide a function implementing each algorithm (note that, if implemented concisely. A…G should be ~10 lines of code each, possibly even less; H and I may be longer).
3. Conduct some testing to make sure your function operates correctly by sending it some arrays containing known values and comparing the output to a paper-based walkthrough.
When done, complete the following tasks:
4. Provide a test harness that runs functions A..I; send them randomly generated arrays and collect the deviation from the ideal partition. Run each cardinality repeatedly and store all results for plotting.
5. Plot all results on a line graph with ISI on the x-axis and mean deviation from the ideal partition on the y-axis. Either plot in a language like MATLAB or Python with matplotlib, or export a text file (e.g., csv) from you program and plot in Excel. Add error bars to each data point to show the dispersion around the mean for each data point.
6. State the exact work and time complexity of each approach, along with some reasoning to explain how you came to this conclusion (e.g.. for each method), and draw some conclusions as to which method is the best both in term of computing cost and closeness of result to the ideal outcome up.